Coverage and Overlap of 1findr, Dimensions, Scopus, and Web of Science

Quick post here. 1science released a new journal article and open access search tool, 1findr, today, and I thought it would be interesting to compare it to a similar search tool that launched a couple of months ago: Dimensions.

Richard Poynder shared the graph below on Twitter, but I wanted to get a look at the real numbers:

Just a quick note before I begin: publication counts in these tools are never static. Records are constantly being added, removed, cleaned up, and merged. The numbers below are as of April 25, 2018, and will likely be different when you look at them.

EDIT: Tables on the WordPress mobile site are very hard to read. I recommend viewing the desktop site or using a mobile device with a wide screen.

So without further ado:

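| Year | 1findr: Articles | 1findr: OA Articles | 1findr: OA % | Dimensions: Articles Only | Dimensions: OA Articles | Dimensions: OA % |
|------|------------------|---------------------|--------------|---------------------------|-------------------------|------------------|
| 2017 | 3,626,489 | 1,173,727 | 32.4% | 3,520,674 | 961,748 | 27.3% |
| 2016 | 3,920,997 | 1,621,757 | 41.4% | 3,294,695 | 1,077,309 | 32.7% |
| 2015 | 3,981,667 | 1,772,926 | 44.5% | 3,153,782 | 1,034,387 | 32.8% |
| 2014 | 3,970,157 | 1,750,603 | 44.1% | 2,972,304 | 935,980 | 31.5% |
| 2013 | 3,752,255 | 1,625,295 | 43.3% | 2,845,397 | 854,580 | 30.0% |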

Note that 1findr searches only journal articles, while Dimensions searches across all scholarly documents (e.g. book chapters, proceedings), so I only looked at the number of journal articles in Dimensions.

You can check these numbers yourself – or look at older years – by wildcard searching (“*”) and then filtering by year, publication type and open access.

It’s clear that 1findr indexes more journal articles than Dimensions and also finds more open access, in every year from 2013 to 2017.
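The % columns are just the OA count divided by the article count for each year. A minimal sketch that recomputes them from the table, assuming the counts are hand-copied from the April 25, 2018 searches (the dictionary layout and the `oa_share` helper are my own illustration, not anything exported by 1findr or Dimensions):

```python
# Journal article counts as of April 25, 2018, hand-copied from the table.
# Each tuple is (articles, oa_articles). Layout is illustrative only.
findr = {
    2017: (3_626_489, 1_173_727),
    2016: (3_920_997, 1_621_757),
    2015: (3_981_667, 1_772_926),
    2014: (3_970_157, 1_750_603),
    2013: (3_752_255, 1_625_295),
}
dimensions = {
    2017: (3_520_674, 961_748),
    2016: (3_294_695, 1_077_309),
    2015: (3_153_782, 1_034_387),
    2014: (2_972_304, 935_980),
    2013: (2_845_397, 854_580),
}


def oa_share(articles: int, oa_articles: int) -> float:
    """Percentage of articles flagged as open access, rounded to one decimal."""
    return round(100 * oa_articles / articles, 1)


for year in sorted(findr, reverse=True):
    print(f"{year}: 1findr {oa_share(*findr[year])}%, "
          f"Dimensions {oa_share(*dimensions[year])}%")
```

Keeping the raw counts rather than the percentages makes it easy to re-run the check when the numbers change, which they will.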

If we look at all the publications in Dimensions, these are the numbers:

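| Year | Dimensions: All Publications | Dimensions: All Open Access | OA % |
|------|------------------------------|-----------------------------|------|
| 2017 | 4,315,463 | 1,015,118 | 23.5% |
| 2016 | 4,072,411 | 1,271,029 | 31.2% |
| 2015 | 3,942,903 | 1,076,957 | 27.3% |
| 2014 | 3,798,820 | 977,338 | 25.7% |
| 2013 | 3,646,656 | 897,109 | 24.6% |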

When you look at all items, the percentage of open access goes down, because other publication formats haven’t seen the same advances in open access as journal articles have.

Also, it’s interesting that when you compare all items in Dimensions (journal articles and more) against all items in 1findr (journal articles only), Dimensions has more content in 2016 and 2017, but 1findr still finds more open access items in every year.
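Whether Dimensions is larger overall depends on the year. A small sketch that checks each year, using the hand-copied counts from the two tables above (the `compare` helper is a hypothetical name for illustration):

```python
# Counts as of April 25, 2018, as (records, oa_records).
# Dimensions: all publication types. 1findr: journal articles only.
dimensions_all = {
    2017: (4_315_463, 1_015_118),
    2016: (4_072_411, 1_271_029),
    2015: (3_942_903, 1_076_957),
    2014: (3_798_820, 977_338),
    2013: (3_646_656, 897_109),
}
findr = {
    2017: (3_626_489, 1_173_727),
    2016: (3_920_997, 1_621_757),
    2015: (3_981_667, 1_772_926),
    2014: (3_970_157, 1_750_603),
    2013: (3_752_255, 1_625_295),
}


def compare(year: int) -> tuple[bool, bool]:
    """Return (Dimensions has more records, 1findr has more OA records)."""
    d_records, d_oa = dimensions_all[year]
    f_records, f_oa = findr[year]
    return d_records > f_records, f_oa > d_oa


for year in sorted(findr, reverse=True):
    more_records, more_oa = compare(year)
    print(f"{year}: Dimensions larger? {more_records}; 1findr more OA? {more_oa}")
```

With these counts, Dimensions only comes out larger in 2016 and 2017, while 1findr has more OA records in all five years.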

Dimensions having more content is nice, but its numbers may be a little skewed because it indexes very differently formatted items. For example, it seems to index some encyclopedia entries individually as chapters.

Let’s look at some counts for Scopus and Web of Science here. Looking at only journal articles again:

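| Year | Scopus: Articles | Scopus: OA | Scopus: OA % | WoS: Articles | WoS: OA | WoS: OA % |
|------|------------------|------------|--------------|---------------|---------|-----------|
| 2017 | 1,965,650 | 145,249 | 7.4% | 2,479,945 | 742,013 | 29.9% |
| 2016 | 1,941,866 | 127,032 | 6.5% | 2,427,654 | 767,666 | 31.6% |
| 2015 | 1,929,498 | 110,814 | 5.7% | 2,322,805 | 720,675 | 31.0% |
| 2014 | 1,908,806 | 108,667 | 5.7% | 2,134,379 | 619,110 | 29.0% |
| 2013 | 1,833,120 | 95,981 | 5.2% | 2,062,518 | 564,397 | 27.4% |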

Web of Science here includes all of its expanded databases.

Web of Science uses Unpaywall to identify Gold, Hybrid, and Green open access. This is why Web of Science finds so much more open access than Scopus does. Scopus currently only identifies Gold open access (i.e. publications in a fully open access journal), which is why its OA counts are so low. Scopus does index a lot of open access content; it just doesn’t identify or label it the way Web of Science does.

It’s a particularly big deal that Scopus only finds Gold OA because, according to this recent study, Gold is actually one of the smallest categories of open access:

[Figure: OA over the years]

Let’s look at all four of these search tools together now. You can see them in a Google Sheet here.

[Figure: OA comparison of 1findr, Dimensions, Web of Science, and Scopus]

By size of journal article coverage and open access, we can rank these tools like so:

  1. 1findr
  2. Dimensions
  3. Web of Science
  4. Scopus
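A ranking like this is only meaningful if it holds in every year and for both measures. A sketch that checks that, using the journal article counts from the tables in this post (the `counts` layout and `rank` helper are my own illustration):

```python
# (articles, oa_articles) per tool and year, as of April 25, 2018,
# hand-copied from the tables in this post.
counts = {
    "1findr": {2017: (3_626_489, 1_173_727), 2016: (3_920_997, 1_621_757),
               2015: (3_981_667, 1_772_926), 2014: (3_970_157, 1_750_603),
               2013: (3_752_255, 1_625_295)},
    "Dimensions": {2017: (3_520_674, 961_748), 2016: (3_294_695, 1_077_309),
                   2015: (3_153_782, 1_034_387), 2014: (2_972_304, 935_980),
                   2013: (2_845_397, 854_580)},
    "Web of Science": {2017: (2_479_945, 742_013), 2016: (2_427_654, 767_666),
                       2015: (2_322_805, 720_675), 2014: (2_134_379, 619_110),
                       2013: (2_062_518, 564_397)},
    "Scopus": {2017: (1_965_650, 145_249), 2016: (1_941_866, 127_032),
               2015: (1_929_498, 110_814), 2014: (1_908_806, 108_667),
               2013: (1_833_120, 95_981)},
}

ARTICLES, OA = 0, 1  # positions inside each (articles, oa_articles) tuple


def rank(year: int, field: int) -> list[str]:
    """Tool names sorted from largest to smallest count for one year."""
    return sorted(counts, key=lambda tool: counts[tool][year][field], reverse=True)


# Collect every distinct ordering across all years and both measures.
orderings = {tuple(rank(year, field))
             for year in range(2013, 2018)
             for field in (ARTICLES, OA)}
print(orderings)
```

If the set holds a single ordering, the ranking is stable across 2013 to 2017 for both article counts and OA counts.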

It’s kinda shocking that these tools’ counts differ by as much as two million articles per year, especially for 1findr. Look at how it stacks up against Dimensions, the next largest:

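| Year | # of additional articles in 1findr | # of additional OA articles identified by 1findr |
|------|------------------------------------|--------------------------------------------------|
| 2017 | 105,815 | 211,979 |
| 2016 | 626,302 | 544,448 |
| 2015 | 827,885 | 738,539 |
| 2014 | 997,853 | 814,623 |
| 2013 | 906,858 | 770,715 |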
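These gaps are per-year subtractions of the Dimensions journal article counts from the 1findr counts in the first table. A sketch using the same hand-copied numbers (variable names are illustrative). Recomputed this way, 2013 comes out at 906,858 more articles and 770,715 (1,625,295 minus 854,580) more OA articles.

```python
# Journal article counts as of April 25, 2018: (articles, oa_articles).
findr = {
    2017: (3_626_489, 1_173_727),
    2016: (3_920_997, 1_621_757),
    2015: (3_981_667, 1_772_926),
    2014: (3_970_157, 1_750_603),
    2013: (3_752_255, 1_625_295),
}
dimensions = {
    2017: (3_520_674, 961_748),
    2016: (3_294_695, 1_077_309),
    2015: (3_153_782, 1_034_387),
    2014: (2_972_304, 935_980),
    2013: (2_845_397, 854_580),
}

# Per-year gap: (extra articles in 1findr, extra OA articles found by 1findr).
differences = {
    year: (findr[year][0] - dimensions[year][0],
           findr[year][1] - dimensions[year][1])
    for year in findr
}

for year in sorted(differences, reverse=True):
    extra_articles, extra_oa = differences[year]
    print(f"{year}: {extra_articles:,} more articles, {extra_oa:,} more OA")
```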

Another interesting tidbit: Web of Science and Dimensions both identify OA with Unpaywall, while 1science’s 1findr uses its own OA identification tool – which some libraries have been using to do some very interesting stuff. It’ll be interesting to see a full comparison between the two in the future.

(oaDOI is the old name for Unpaywall)

1science and Unpaywall appear to have very similar methods for finding open access, except that 1science appears to use some sort of web scraping and perhaps Google Scholar somehow?

Final thought: More, of course, does not mean better. The real strength of Web of Science and Scopus is the ability to do advanced searches, mass export items, and look at links between articles by citation. I know both Dimensions and 1findr are looking at advancing their capabilities in these areas. We are in interesting times for scholarly literature discovery.


About Ryan Regier

Doing Library Stuff. Follow me on Twitter at: @ryregier

8 Responses to Coverage and Overlap of 1findr, Dimensions, Scopus, and Web of Science

  1. Aaron Tay says:

    Thanks Ryan for the first interesting preliminary analysis.

    I’m curious why you restricted Dimensions to articles only. Shouldn’t you include proceedings and maybe preprints in the results? For 2017, the effect of proceedings and preprints in Dimensions is quite significant: the count rises to 3,885,861, which is more than 1Findr.

    I have always suspected 1Science’s OA detection is better than unpaywall though (but Jason Priem obviously disagrees), which probably explains the greater % of OA found.

    Did you try Summon, perhaps with “Add results beyond your library’s collection” enabled? It has an open access indicator now (Primo’s arrives in the May release), but I suspect it won’t be as accurate.

    Another one to try is https://www.lens.org. It doesn’t have an open access indicator, but it is also a good gauge of what Microsoft Academic has.

    It’s nice that these discovery services have a “transparent” search, where you can search everything and then slice the results by facets. For indexes without such a search, like Primo or Google Scholar, you need all sorts of workarounds that may not work.

    Finally, I do agree that from the point of view of researchers, these differences in numbers are not that significant: once you go past 60-80 million records or so, you usually have most of what people are likely to be interested in, and it is the features that become important.

    I’m curious to see how the free or freemium discovery services will be positioned, and whether they can eat into the traditional A&I or web-scale discovery business, at least for some use cases.

  2. Ryan Regier says:

    Thanks Aaron,

    Including preprints and proceedings in the Dimensions comparison is not something I considered! I was thinking I’d only focus on journal articles so the comparison was “fair” with 1findr. I tested a number of proceedings available in Dimensions to see if they were labelled as journal articles and indexed by 1findr, but I couldn’t find any of them in 1findr. Clearly 1findr doesn’t index many proceedings (though I’m sure they do index some, because the distinction between journal articles and proceedings can be confusing), so I think it’s best I left them out here.

    I looked at preprints too, and this is a bit confusing because both 1findr and Dimensions consider preprints green open access and “attach” them to the final published article. So I’m not sure a filter would be worth it. Dimensions seems to already capture preprint OA through Unpaywall. Also, the preprint filter numbers in Dimensions are very small, only 10,000 preprints for 2017 compared to 3 million journal articles, so including them would make less than a 0.5% difference to the numbers I have above.

    I agree that 1Science seems to index a bit more. Whatever they are doing with web crawling suggests they capture more bronze open access. I wouldn’t be surprised if Unpaywall captures more green open access because of the outreach they’ve done with libraries, though.

    Searching Summon for numbers is a really good idea. I should have added that!

    I did look at Lens! Also ScienceOpen, SciLit, and Semantic Scholar. Hahaha. None of them has OA filters as good as those in Dimensions and 1findr, though. A topic for another blog post!

    Also, I completely agree with the last bit. I’m still not sure how useful these tools are to researchers. For measuring OA output and doing faint-hope searches they may be helpful, but otherwise…

    1science mentioned they are working on building their own citation database, but are planning to use more sources (TDM papers?) than just the open citations in Crossref. It’ll be interesting to see what they do.

  3. Pingback: The Future of Library Access: Open Access Linking and “Hybrid” Interlibrary Loan | A Way of Happening

  4. David Groenewegen says:

    Ryan,

    Thanks for that comparison, very interesting. I did a much smaller scale “study” that seems to reflect your overall conclusions while perhaps shedding a bit of extra light.

    Basically I looked myself up (as you do) and found that:

    1Findr found more of my pubs (19) than Dimensions (10), but there were 4 duplicates in the former, which raises some questions about how tidy the 1Findr database is. How much of the larger dataset is duplication?

    Incidentally, I have 27 unduplicated pubs in Google Scholar. So they both have a way to go.

    One of the “OA” pubs in my list according to 1Findr had an OA copy attached (this sucking of OA articles into the 1Findr database is interesting – will it impact on download numbers in IRs?). However, the OA “copy” was not the article referred to in the record. Instead it was a conference talk I gave the same year on a similar topic, with a similar name. Somewhere along the line the algorithm had found the talk in an IR, thought the metadata was close enough and stuck them together, even though my article had a co-author and my talk didn’t. Which raises the question of how many of those “extra” OA articles are actually what they are claimed to be.

    https://1findr.1science.com/search?query=david%20groenewegen&filters%5Bauthor%5D%5B0%5D=david%20groenewegen&sort=yeardesc

    https://app.dimensions.ai/discover/publication?or_facet_researcher=ur.01076702425.20

    Still, early days, and good to see some much needed competition in the market.

    • Ryan Regier says:

      David, Thank you for this!

      I was wondering about duplication and version linking, but wasn’t sure how to approach it!

      I’ll look more into this now, much appreciated!

  5. Pingback: Proliferation of free databases of literature | Christina's LIS Rant

  6. Bill Mischo says:

    Ryan: The numbers for Scopus look to be low. It is the experience of many of us that Scopus is more comprehensive than Web of Science for recent articles. A good example is the field of LIS; Scopus covers more journals and ISI has stopped indexing a number of major LIS journals.
    The numbers I get for Scopus (doing an Advanced Search PUBYEAR IS 20xx) are:
    2,955,360 for 2017
    2,884,619 for 2016
    2,842,576 for 2015
    2,886,299 for 2014
    which puts their coverage numbers significantly higher.

    Thanks,
    Bill

  7. Ryan Regier says:

    Thanks Bill,

    I limited to Document Type = Article when I did the Scopus searches, that’s why I have a smaller number.
