Two widely shared articles have made the rounds lately, each suggesting that journal usage numbers can be misleading.
From The Scholarly Kitchen – When the Wolf Finally Arrives: Big Deal Cancelations in North American Libraries
Many libraries seem to have found that demand for the content included in their Big Deals — even what they thought was “core” content — was not nearly as robust as they had believed it to be based on usage data that had been provided by the publishers.
This Nature News article on the ongoing licence struggle between Elsevier and German Universities – German scientists regain access to Elsevier journals
The loss of access to Elsevier content didn’t overly disturb academic routines, researchers say, because they found other ways to get papers they needed, or because Elsevier journals happened not to be of prime importance in their fields
In a number of cases now, subscriptions to scholarly journals that appear to be well used have turned out to be unnecessary. What is going on here? Librarians have the usage data for these journals. The numbers! The undeniable evidence that there was a use or view! How can these be misleading?
I don’t think there is one big problem here. I think there are lots of little ones. Together, they show that we need to seriously reevaluate how we calculate cost-per-use, and whether there truly is a need to subscribe to some journals that have high usage counts.
#1. Articles are being downloaded/viewed but aren’t used or needed
I know… this is the most boring and obvious one. We all know that an article being downloaded doesn’t mean that the user will find value in it; a download doesn’t mean actual use. Users will often download an article, read the abstract (which they could have read online for free), and then discard it.
We excuse these instances too easily, though. We assume that a large number of downloads means a decent percentage of users must have found the content useful, that the proportion of unused downloads is roughly the same for every journal and averages out, and that a high usage number is the only real way we have to determine value.
We don’t take into account how much the branding and discoverability of journals factor in here. I’m more likely to stumble across a less useful journal article on ScienceDirect than a more useful article on a smaller publisher’s website.
The larger the vendor/publisher, the better their discovery and branding, the more likely users will find their content, and the more likely there will be downloads of content that users don’t need or actually use. Plain and simple.
I think there are other ways we can calculate value beyond high usage numbers. Read on.
#2. The existence of open access versions
When there is a choice between an open access version and the subscribed publisher version, our search tools usually default to the publisher version. Those 3,000 downloads of a journal that justified re-subscribing to it? Likely a large chunk of those downloads came from articles that have green open access versions online. You’re paying for something that is already online for free.
That’s why, when some journals are cancelled, the demand appears to disappear. Users can find the content online elsewhere.
What can we do to fix this? Tinker with your search tools as much as you can so they link to open access versions first. That way your usage stats will be a better representation of what content you actually need to pay for. Encourage use of open access versions over pay-walled versions. Promote open access full-text finders like Unpaywall or the OA Button. Look at building open access full-text finders right into your search or discovery tools.
It would be very cool if there were a way to check for OA versions when analyzing your usage statistics: to actually see what percentage of the use of a subscribed journal had OA copies available. Maybe designing a tool like this would be a good future project for the open access community?
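A minimal sketch of what such a tool might look like, using Unpaywall’s public REST API (the endpoint and its `is_oa` field are real; the function names, the email address, and the idea of feeding it DOIs pulled from a usage report are my own illustrative assumptions):

```python
"""Rough sketch: what share of a journal's downloaded DOIs have an OA copy?"""
import json
import urllib.request

# Real Unpaywall endpoint; it requires an email parameter for polite use.
UNPAYWALL = "https://api.unpaywall.org/v2/{doi}?email={email}"


def has_oa_copy(doi: str, email: str) -> bool:
    """Ask Unpaywall whether any open access copy of this DOI exists."""
    with urllib.request.urlopen(UNPAYWALL.format(doi=doi, email=email)) as resp:
        return bool(json.load(resp).get("is_oa", False))


def oa_share(dois, is_oa=has_oa_copy, email="you@example.org"):
    """Fraction of downloaded DOIs (from a usage report) with an OA version.

    `is_oa` is injectable so the logic can be tested without network calls.
    """
    if not dois:
        return 0.0
    hits = sum(1 for doi in dois if is_oa(doi, email))
    return hits / len(dois)
```

If a journal’s `oa_share` came back at, say, 0.8, you’d know most of the usage that justified the subscription could have been served by free copies.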
This takes us nicely into my next reason though…
#3. Individual articles skewing the usage count of a journal
Perhaps the biggest problem with the Journal Impact Factor is its long tail: it is influenced heavily by a small number of highly cited papers. A very similar effect occurs with journal usage. Individual articles with high download counts raise the total for the entire journal. You may be paying for a whole journal while almost all of your use of it comes from three or four articles.
Add to this the fact that those three or four articles may also have open access versions available online, and it’s clear this is a big problem.
How would we measure this skew? Article-level usage metrics, perhaps? Something like a citation frequency plot (which was designed to expose the long tail behind the Journal Impact Factor), but showing how many accesses came from each article in the journal?
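A simple first cut at that measurement, assuming you can get per-article download counts out of your platform (the function name and the example numbers are made up for illustration):

```python
def top_n_share(downloads, n=4):
    """Fraction of a journal's total downloads that came from its
    n most-downloaded articles. Values near 1.0 mean a few articles
    are carrying the whole journal's usage count."""
    total = sum(downloads)
    if total == 0:
        return 0.0
    return sum(sorted(downloads, reverse=True)[:n]) / total


# Hypothetical per-article download counts for one journal, one year:
journal = [950, 400, 120, 30, 10, 10, 5, 5, 2, 1]
print(top_n_share(journal, n=4))  # how much of the total the top 4 articles account for
```

If the top four articles account for most of a journal’s usage, the “whole journal” subscription is really buying access to four papers.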
I think individual article use skewing our metrics might be a bigger problem than we realize, but even if it does turn out to be a big problem, what can we do about it?
The best option would be to reach out to the authors of those papers and convince them to make them open access.
Another option would be if we could work with publishers to set up a true Demand/Patron Driven Access (DDA or PDA) option for journal articles. With the current journal article DDA models you have to purchase a new copy of the article every time a different patron wants access. Imagine if instead we had a model similar to Ebook DDA, where a purchase means institutional and permanent access to that article. You wouldn’t have to do an individual purchase every time. You could cancel your journal subscription and just subscribe on a cheaper article-level basis.
At a certain level, some journals are a bit like much smaller Big Deals anyway. While they do provide a curated collection with a lot less junk, what we are really paying for are the “big” articles in the journal that everyone wants. If there were a way for us to pay for institutional access to just those big articles, that would be ideal.
#4. Users not using Citation Management Software
It’s a well known frustration/joke that it is easier to search across the entire internet than it is to just search your own files on your computer.
I know a lot of researchers who don’t bother storing journal article PDFs on their computers for this exact reason. Or, even if they do, they can’t find them afterwards.
This results in another misleading boost in our usage statistics: it’s the same researchers coming back to access the same articles again and again.
How do we solve this? The solutions proposed in #3 would help, but they won’t solve the deeper problem: researchers not being able to find their articles again.
Promotion and adoption of citation management software is the solution here. If researchers have an easy way to rediscover their articles, they won’t need to return to the publisher webpage.
A lot of my own research and article-downloading time is spent trying to find an article I remember reading a while back, but whose title, journal, and other metadata I can’t recall. If I had just saved the PDF to my citation manager when I first found it interesting or useful (which I have started doing again), this wouldn’t be a problem. I wouldn’t need to dig through loads of PDFs looking for a specific fact or graph; I would only need to search across my previously saved PDFs, and I’d find it a lot quicker.
I don’t think I’ve listed all the reasons why usage statistics can be flawed here. However, this blog post is starting to run a bit long.
I think there are clearly ways we can adjust usage metrics, and add context when analyzing them, to get a better indicator of value.
Most libraries already take usage statistics with a grain of salt. They use metrics from their faculty’s publications and citations to help determine value. In its recent Big Deal cancellation, the U of Montreal even sent out lists for faculty to rank journals by importance. I’ve seen articles in which libraries checked course syllabuses to see which journals were included and worth subscribing to (which I can’t find now… see my frustrations from #4).
Now, we still disagree a lot on how to determine value. There are big debates in libraries about what counts as a “good” cost-per-use figure. I’ve worked at libraries where $20.00 per article was completely acceptable, and at others that would cancel if it rose above $4.00. The library’s budget, size, and supporters all need to be taken into account.
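The cost-per-use debate above is ultimately simple arithmetic, and the adjustments argued for in #2 and #4 slot straight into it. A sketch with made-up numbers (the `discounted` parameter, meaning downloads you choose not to count because an OA copy could have served them or they were repeat visits, is my own illustrative framing):

```python
def cost_per_use(subscription_price, downloads, discounted=0):
    """Cost-per-use for one journal.

    `discounted` = downloads excluded from the count, e.g. accesses an
    OA copy could have served, or repeat visits to the same articles.
    """
    counted = downloads - discounted
    if counted <= 0:
        return float("inf")  # no countable use at all
    return subscription_price / counted


# Hypothetical journal: $6,000/year, 3,000 recorded downloads.
print(cost_per_use(6000, 3000))                   # naive figure: $2.00/use
print(cost_per_use(6000, 3000, discounted=2400))  # if 80% had OA copies: $10.00/use
```

A journal that looks like a bargain at the naive rate can land well past anyone’s cancellation threshold once the misleading downloads are stripped out.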
However, I think if we start being a bit more critical of usage stats coming from large publishers, pushing users to open access versions, looking at article-level metrics, and encouraging the use of citation management software, we can come closer to the goal of usage metrics being an accurate depiction of value.