What can alt-metrics tell us about the use of digital data repositories?

Publishing findings in a peer-reviewed article is no longer the only way that a researcher can be recognised for his or her research outputs. Since non-traditional impacts of journal articles can be assessed using alt-metrics, it also makes sense to determine what alt-metrics reveal for datasets and other research outputs.

The scientific community has been moving towards increased openness, and academics have begun to make datasets, videos, presentations, and a plethora of other research outputs freely available and citable through digital repositories, notably figshare and Dryad. Many researchers have embraced this new, open way of doing science because sharing data and other research outputs can help to make research more transparent, replicable, creditable, and flexible. These and other advantages (along with their associated challenges) have been extensively covered elsewhere, such as in figshare’s post in the Wellcome Trust blog and a Commentary in BMC Research Notes.

Importantly, research outputs in digital repositories like figshare and Dryad are all accessible via DOIs and Handles. DataCite, which manages the DOIs for both figshare and Dryad, found that the most frequently resolved DOIs in January and February pointed to items in the figshare repository. Clearly, people are viewing these research outputs, and even embracing their practical use as supplementary material for papers. Academics are also starting to cite specific datasets; this can be done reliably using DOIs, since those links to deposited research items will remain stable even if the underlying URL changes. Mark Hahnel (founder of figshare) directed me towards a recent paper, also available as a pre-print on arXiv, in which the authors included their figshare dataset in the references section, not simply as a supplement.
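For the curious, here's a minimal sketch of how that stability works in practice: the doi.org resolver simply answers with an HTTP redirect to whatever landing page is currently registered for a DOI, so the citation keeps working even when the repository URL moves. The DOI below is a placeholder, not a real item.

```python
import requests

# Placeholder DOI for illustration only; substitute any real
# figshare or Dryad DOI to try it.
doi = "10.6084/m9.figshare.0000000"

# doi.org replies with an HTTP redirect pointing at the item's current
# landing page, so the DOI itself never has to change.
resp = requests.head(f"https://doi.org/{doi}", allow_redirects=False)
print(resp.status_code)              # 302/303 when the DOI resolves
print(resp.headers.get("Location"))  # the repository URL it points to
```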

DOIs and Handles make repository items citable in scholarly articles, but they also allow Altmetric to track the associated online attention. I wondered if viewing the alt-metrics of repository items might provide a unique snapshot of the open data movement’s early days, and wanted to know what people were saying about the various research outputs that are shared online. For this week’s Interactions, I’ll take you through some conversations surrounding figshare and Dryad repository items.

 

Conversations about research outputs

A couple of key differences between the data repositories are worth mentioning before I get into the conversations. First of all, Dryad is devoted to publishing datasets only; these data must also be associated with a published, peer-reviewed journal article. In contrast, published items in figshare need not be datasets or even be peer-reviewed. Under figshare’s model, all research outputs, including but not limited to datasets, blog posts, posters, figures, audio clips, and lecture slides, may be shared.

With respect to attention surrounding figshare items, I found that all mentions originated from social media (predominantly Twitter) and blogs. Similar trends held for Dryad, although two items had also been mentioned in the news (e.g., this New Scientist article). Overall, mentions of these repository items could be grouped by a few general characteristics (see links for examples):

  • Discussion of data repository services and/or open data [tweet (figshare)] [blog post (figshare)]
  • Promotion and sharing of repository items by author(s) [tweet (figshare)] [blog post (figshare)]
  • Promotion, sharing, and use of repository items by users [tweet (figshare)] [blog post (Dryad)]
  • Alerts of new repository items [tweet by figshare] [tweet by Dryad]

If you use the Altmetric Explorer and would like to browse through the conversations yourself, you can view all mentioned figshare items with a filter set for the DOI prefix 10.6084; Dryad items have the DOI prefix 10.5061.
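If you'd rather work from an exported list of DOIs (say, a spreadsheet downloaded from the Explorer), the same prefix trick is easy to script. The sketch below is purely illustrative: the DOIs and the helper function are made up, but the prefixes are the real DataCite prefixes mentioned above.

```python
# Hypothetical list of mentioned DOIs; in practice this might come
# from an Altmetric Explorer export.
mentioned_dois = [
    "10.6084/m9.figshare.0000001",   # a figshare item
    "10.5061/dryad.example1",        # a Dryad item
    "10.1371/journal.pone.0000000",  # a journal article, ignored here
]

PREFIXES = {"figshare": "10.6084", "dryad": "10.5061"}

def repository_items(dois, prefix):
    """Keep only DOIs registered under the given repository prefix."""
    return [d for d in dois if d.startswith(prefix + "/")]

print(repository_items(mentioned_dois, PREFIXES["figshare"]))
print(repository_items(mentioned_dois, PREFIXES["dryad"]))
```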

 

A survey of 100 popular figshare items

Is online attention greater for particular kinds of research outputs on figshare? To find out, I looked at 100 of the most highly-mentioned figshare items from the Altmetric database and grouped them according to the following categories: data, document, figure, git repository, image, multimedia (including audio and video), poster, or presentation slides. I then plotted the frequency (out of 100) of each category along with the average Altmetric score for each. (Take a look at the data on figshare.)

[Figure: Frequency and average Altmetric score of the top 100 figshare items, by category]
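If you want to reproduce the tally, here's a rough sketch of the grouping step, assuming the items were exported to a CSV with one row per item. The file name and column names (item_type, altmetric_score) are my own placeholders, not the actual headers of the dataset linked above.

```python
import pandas as pd

# One row per figshare item, with its category and Altmetric score.
items = pd.read_csv("top100_figshare_items.csv")

# Count items and average the scores within each category.
summary = (
    items.groupby("item_type")["altmetric_score"]
         .agg(frequency="count", mean_score="mean")
         .sort_values("frequency", ascending=False)
)
print(summary)
```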

Even though data, documents, and presentation slides were the most plentiful among the top 100 figshare items, posters actually had a very high average Altmetric score. It’s hard to guess why this would be the case, but perhaps the brief communication style of posters makes them more accessible to a wide audience of scholars. However, the average score for posters in my sample was probably skewed by Priem et al.’s very popular “Prevalence and use of Twitter among scholars”, whose score at the time of writing was 303.8 (see Altmetric details)!

In terms of metrics, Altmetric calculates a score based on the number of online mentions of figshare DOIs, while figshare counts the number of views as well as social media shares. I chatted with Mark about this, and he pointed out that the current leader in figshare views (12,211 at the time of writing) is a dataset called “GenoCAD Training Set I”, which appears to be geared primarily towards synthetic biologists. In spite of the remarkably high number of views (presumably by those synthetic biologists), Altmetric saw only 6 tweets (from 4 accounts), giving the dataset an Altmetric score of 2.

It may seem odd that Altmetric’s collection of highly-mentioned figshare items doesn’t resemble figshare’s highly-viewed list. In fact, Mark and I observed that only 1 item in figshare’s top 5 by views matched Altmetric’s top 5 in the figshare set. Still, this apparent discrepancy shouldn’t be surprising, given that mentions and views measure completely different kinds of user engagement. Specifically, mentions require users to take direct action (like pressing a “tweet” button or writing a blog post), whereas viewing a page is passive. And neither mentions nor views guarantee subsequent academic engagement, such as citing a dataset in a forthcoming paper.
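As a toy illustration of that comparison, you can treat the two top-5 lists as sets and check the intersection. The identifiers here are invented; with the real lists, the overlap works out to a single item.

```python
# Made-up item identifiers, ranked by views vs. by Altmetric mentions.
top5_by_views    = ["A", "B", "C", "D", "E"]
top5_by_mentions = ["A", "F", "G", "H", "I"]

# Set intersection gives the items that appear in both rankings.
overlap = set(top5_by_views) & set(top5_by_mentions)
print(f"{len(overlap)} of 5 items appear in both top-5 lists: {overlap}")
```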

Whichever metric you take as the most suitable for measuring activity surrounding data repositories, people seem to be actively sharing research outputs online, with the most attention paid to items like posters. With respect to open data, most conversations currently appear to be centred on promotion rather than reuse and uptake – however, dataset authors who blog about their own data have provided the most comprehensive discussions and calls to action for the open data movement. It seems safe to say that all of this promotional activity will convince more academics to share their data and research outputs, and as such, I expect that more digital datasets will be used and cited in the academic literature in the days to come. Perhaps in another year or so, the alt-metrics of repository items will paint an even richer picture of the benefits and applications of open data.

Thanks to Mark Hahnel for his helpful comments and figshare-related links.