The problem with using cost-per-use analysis to justify journal subscriptions

Two widely shared articles lately have made some interesting statements about journal usage numbers being misleading.

From The Scholarly Kitchen – When the Wolf Finally Arrives: Big Deal Cancelations in North American Libraries

Many libraries seem to have found that demand for the content included in their Big Deals — even what they thought was “core” content — was not nearly as robust as they had believed it to be based on usage data that had been provided by the publishers.

This Nature News article on the ongoing licence struggle between Elsevier and German Universities – German scientists regain access to Elsevier journals

The loss of access to Elsevier content didn’t overly disturb academic routines, researchers say, because they found other ways to get papers they needed, or because Elsevier journals happened not to be of prime importance in their fields.

In a number of cases now, subscriptions to scholarly journals that appeared to be well used have turned out to be unnecessary. What is going on here? Librarians have the usage data for these journals. The numbers! The undeniable evidence that there was a use or view! How can these be misleading?

I don’t think there is one big problem here. I think there are lots of little ones. Together, I think they show that we need to seriously reevaluate how we calculate cost-per-use, and whether there truly is a need to subscribe to some journals that have high usage counts.

#1. Articles are being downloaded/viewed but aren’t used or needed

I know… this is the most boring and obvious one. We all know that an article being downloaded doesn’t mean that the user will find value in it. A download doesn’t mean actual use. Users will often download an article, read the abstract – which they could have read online for free – and then discard it.

We excuse these instances too easily, though. We assume that a large number of downloads means a decent percentage of users must have found the content useful, that the proportion of unused downloads is roughly the same for every journal and averages out, and that a high usage number is the only real way we have to determine value.

We don’t take into account how much the branding and discoverability of journals factor in here. I’m more likely to stumble across a less useful journal article on ScienceDirect than a more useful article on a smaller publisher’s website.

The larger the vendor/publisher, the better their discovery and branding, the more likely users will find their content, and the more likely there will be downloads of content that users don’t need or actually use. Plain and simple.

I think there are other ways we can calculate value beyond high usage numbers. Read on.

#2. The existence of open access versions

When there is a choice between an open access version and the subscribed publisher version, our search tools usually default to the publisher version. Those 3,000 downloads of a journal that justified re-subscribing to it? Likely a large chunk of those downloads came from accessing articles that have green open access versions online. You’re paying for something that is already online for free.

That’s why, when some journals are cancelled, the demand appears to disappear. Users can find the content online elsewhere.

What can we do to fix this? Tinker with your search tools as much as you can so they link to open access versions first. That way your usage stats will be a better representation of what content you actually need to pay for. Encourage use of open access versions over pay-walled versions. Promote open access full-text finders like Unpaywall or the OA Button. Look at building open access full-text finders right into your search or discovery tools.

It would be very cool if there were a way to check for OA versions when analyzing your usage statistics – to actually see what percentage of your use of a subscribed journal had OA copies available. Maybe designing a tool like this would be a good future project for the open access community?
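
As a rough sketch of what such a tool could do: take a journal’s per-article usage records, check each DOI’s OA status (in practice you could look up DOIs via the Unpaywall API, whose JSON response includes an `is_oa` flag – the records below are made up for illustration), and report what share of a journal’s downloads had a free copy available.

```python
def oa_download_share(records):
    """Fraction of total downloads that had an open access copy available.

    records: list of dicts with 'doi', 'downloads', and 'is_oa' keys.
    In a real tool, 'is_oa' would come from an Unpaywall API lookup.
    """
    total = sum(r["downloads"] for r in records)
    oa = sum(r["downloads"] for r in records if r["is_oa"])
    return oa / total if total else 0.0

# Hypothetical usage report for one subscribed journal:
usage = [
    {"doi": "10.1000/a", "downloads": 1200, "is_oa": True},
    {"doi": "10.1000/b", "downloads": 900,  "is_oa": True},
    {"doi": "10.1000/c", "downloads": 400,  "is_oa": False},
    {"doi": "10.1000/d", "downloads": 500,  "is_oa": False},
]
print(f"{oa_download_share(usage):.0%} of downloads had an OA copy")  # → 70%
```

If a large share of a journal’s use could have been satisfied by green OA copies, its effective cost-per-use is much higher than the raw number suggests.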

This takes us nicely into my next reason though…

#3. Individual articles skewing the usage count of a journal

Perhaps the biggest problem with the Journal Impact Factor is its long tail: it is influenced heavily by small numbers of highly cited papers. A very similar effect occurs with journal usage. Individual articles with high download counts raise the total for the entire journal. You may be paying for a whole journal, but almost all of your use of that journal comes from three or four articles.

Add this to the fact that those three or four articles may also have open access versions available online, and it’s clear this is a big problem.

How would we measure this skew? Journal article-level usage metrics, perhaps? Something like a citation frequency plot – which attempts to expose the long tail of the Journal Impact Factor – but showing how many accesses came from each journal article?
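
A crude but quick way to quantify the skew, if article-level counts were available (the download counts below are invented): what fraction of a journal’s total use comes from its top handful of articles?

```python
def top_k_share(downloads, k=3):
    """Fraction of a journal's total downloads accounted for by its
    k most-downloaded articles. downloads: per-article download counts."""
    total = sum(downloads)
    if total == 0:
        return 0.0
    return sum(sorted(downloads, reverse=True)[:k]) / total

# A hypothetical 50-article journal where three articles dominate:
counts = [950, 700, 450] + [10] * 47
print(f"top 3 articles = {top_k_share(counts, 3):.0%} of all use")  # → 82%
```

A number like that, journal by journal, would show immediately which subscriptions are really paying for three or four papers.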

I think individual article use skewing our metrics might be a bigger problem than we think, but even if it does turn out to be a big problem, what can we do about it?

The best option would be to reach out to the authors of those papers and convince them to make them open access.

Another option would be to work with publishers to set up a true Demand-Driven or Patron-Driven Acquisition (DDA or PDA) option for journal articles. With the current journal article DDA models, you have to purchase a new copy of the article every time a different patron wants access. Imagine if instead we had a model similar to ebook DDA, where a purchase means permanent, institution-wide access to that article. You wouldn’t have to make an individual purchase every time. You could cancel your journal subscription and just subscribe on a cheaper article-level basis.

At a certain level, some journals are a bit like much smaller Big Deals anyway. While they do provide a curated collection of resources with a lot less junk, what we are really paying for are the “big” articles in the journal that everyone wants. If there was a way for us to pay just for institutional access to those big articles, that would be ideal.

#4. Users not using Citation Management Software

It’s a well known frustration/joke that it is easier to search across the entire internet than it is to just search your own files on your computer.

I know a lot of researchers who don’t bother storing journal article PDFs on their computers for this exact reason. Or, even if they do, they can’t find them afterwards.

This results in another misleading boost in our usage statistics. It’s the same researchers coming back to access the same articles again and again.

How do we solve this? The solutions proposed in #3 would help, but they won’t solve the deeper problem: researchers not being able to find their articles again.

Promotion and adoption of citation management software is the solution here. If researchers have an easy way to rediscover their articles, they won’t need to return to the publisher webpage.

A lot of my own personal research and article downloading time is spent trying to find an article I remember reading a while back, but whose title, journal, and other metadata I can’t remember. If I had just saved that article PDF to my citation manager when I first found it interesting or useful (which I have started doing again), this wouldn’t be a problem. I wouldn’t need to dig through loads of PDFs looking for a specific fact or graph. I would only need to search across my previously saved PDFs and find it a lot quicker.


I don’t think I’ve listed all the reasons why usage statistics can be flawed here. However, this blog post is starting to run a bit long.

I think there are clearly ways we can adjust usage metrics – and use context when analyzing them – to give us a better indicator of value.

Most libraries already take usage statistics with a grain of salt. They use metrics from their faculty’s publications and citations to help determine value. In the recent U of Montreal Big Deal cancellation, the library even sent out lists for faculty to rank journals by importance. I’ve seen articles in which libraries have used course syllabuses to see what journals were included and worth subscribing to (which I can’t find now – see my frustrations from #4).

Now, we still disagree a lot on how to determine value. There are big debates in libraries about what counts as a “good” cost-per-use metric. I’ve worked at some libraries where $20.00 per article is completely acceptable, while others will cancel if it is more than $4.00. The library’s budget, size, and supporters all need to be taken into account.
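
The underlying calculation is simple; it’s the threshold that is institution-specific. A minimal sketch (the subscription costs and download counts are hypothetical; the $4.00 and $20.00 thresholds are the examples above):

```python
def cost_per_use(annual_cost, downloads):
    """Annual subscription cost divided by yearly downloads."""
    return annual_cost / downloads if downloads else float("inf")

# (annual cost, downloads, institution's acceptable cost-per-use)
scenarios = [(6000, 3000, 4.00), (6000, 250, 20.00)]
for cost, uses, threshold in scenarios:
    cpu = cost_per_use(cost, uses)
    verdict = "acceptable" if cpu <= threshold else "candidate for cancellation"
    print(f"${cpu:.2f}/use vs ${threshold:.2f} threshold -> {verdict}")
```

Everything in this post is about the numerator and denominator of that division: inflated downloads, OA-substitutable downloads, and repeat downloads all make the ratio look better than it really is.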

However, I think if we start being a bit more critical of usage stats coming from large publishers, pushing users to open access versions, looking at article level metrics, and encouraging use of citation management software, we can come closer to the goal of usage metrics being a more accurate depiction of value.


About Ryan Regier

Doing Library Stuff. Follow me on twitter at: @ryregier

3 Responses to The problem with using cost-per use analysis to justify journal subscriptions

  1. Angela Cochran says:

    Nice post, Ryan. I have a few questions or thoughts on this.

    1. Do you think there is value in pre-publication peer review? It strikes me that the value of Green OA versions is that the user knows that it passed peer review and got the stamp of approval from the journal (otherwise the post would be about preprints). If libraries moved en masse to accessing only green archived papers, peer review would need to be taken over by scientific communities and not facilitated by journals (given that the income to offset the expense goes away). So far, we have not seen any success in getting to that point and absolutely no funding for it.

    2. What if a publisher offered a library XXX number of downloads from its collection of journals for XXX amount of money? When the library exceeds the downloads, they get turned off and need to purchase more downloads. The management of this would be easily mechanized by publisher platforms, but what about libraries? I agree that institutional library patrons download first and then decide whether to use it later. How would the library change this behavior, keeping in mind that to the patron, everything is free?

    What do you think of a shared risk model? Your subscription is priced based on a mutually agreed upon estimation of usage. If you exceed that usage, the risk is on the publisher, but your rates may go up the next time around to account for it. On the flip side, if you don’t use your allotted downloads, you are paying for more than you used but may be able to reduce the rates at the next negotiation.


  2. Ryan Regier says:

    Hi Angela.

    Oy. Great Questions.

    1. I honestly am still not confident enough in my knowledge of the current state of peer review to make a real judgement or prediction here. In my post I was mostly using Green OA to refer to peer-reviewed post-prints, but that’s an overly simplified way of looking at it. There will be lots of pre-prints mixed in there, and you’re right, it’s all tied together. Libraries can’t cancel a bunch of journals without there being an effect on scholarly publishing processes.

    It seems pretty clear that no matter what we do, peer reviewers can only catch a certain percentage of problems with a paper. It’s not perfect, but some sort of peer review is still better than nothing. I DO still see value in pre-publication peer review because it can stop some of the more glaring mistakes before they are “published”.

    However, I’m not sure what the solution is here if journals aren’t organizing peer review and scholarly communities can’t handle it either. Maybe this is only a small piece of the problem though? We still don’t really know what “good” peer review is or how to structure/incentivize it. Maybe if/when this gets resolved, scholarly communities will be willing to dive in?

    2. I’m not that big a fan of this type of pay-per-download model for journal articles. We’ve tried it before, and we either go way under our allotted downloads or way over. Usage is tough to predict; so many little factors can shrink or boost it. It’s just a lose-lose model for libraries. We only win if we hit extremely close to that allotted download number, which is very unlikely.

    However, if we put in place the model I suggested in this post – where a purchased/downloaded article immediately becomes accessible to everyone at that institution permanently – that would be a model I would want to try. I think it’s the repeated re-downloading of popular articles that really kills us, and this model could solve that.

    So instead of paying for a certain number of downloads, we would pay for a certain number of (institutionally accessible) articles from a journal. Our discovery infrastructure and link resolvers aren’t really set up to handle this kind of one-off purchase currently, but I think they could be. What do you think about this kind of model?

    • Angela Cochran says:

      I think what you propose turns the value proposition on its head. Currently the question libraries are asking is what value a collection has to the patrons. This is measured in the number of ways you outlined. Using downloads, libraries can quantify the cost per download and use that to try and determine value (though as you note, it’s not the whole story). If all those downloads are for 25-30 articles in a collection, then the value lies in those papers alone. So how much are you willing to pay for them? If 1000 patrons in your school used the same 50 papers exclusively, how much would you expect to pay for those papers?

      What do you do with the outliers? Let’s say you have 50 papers each downloaded more than 25 times. Then you have 500 papers each downloaded less than 5 times. Do we need a tiered pricing model where you pay more for the 50 papers and less for the 500 papers? Or is it the other way around?

      Your objection is to paying for content that isn’t used but your goal is to provide access to any papers your patrons want. I can’t think of a way to make that work. But I’ve only been at it for less than a few hours. FWIW, I would like to explore the spread of papers being used. I know what our most downloaded content is and I know how many downloads a journal gets. The next step will be to show what’s not being downloaded. I’ve started doing some of this analysis with citation information but not with download information. I don’t even know if stats that granular are available to me. That’s a whole other project.
