Commentary: When Open Access isn’t

This week, PLoS ONE published an interesting paper by Bo-Christer Björk and coworkers on the free global availability of articles from scientific journals. One of the principal findings in this study is that 20.4% of articles published in 2008 are now available as Open Access (OA):

Open Access to the Scientific Journal Literature: Situation 2009

Background: The Internet has recently made possible the free global availability of scientific journal articles. Open Access (OA) can occur either via OA scientific journals, or via authors posting manuscripts of articles published in subscription journals in open web repositories. So far there have been few systematic studies showing how big the extent of OA is, in particular studies covering all fields of science.

Methodology/Principal Findings: The proportion of peer reviewed scholarly journal articles, which are available openly in full text on the web, was studied using a random sample of 1837 titles and a web search engine. Of articles published in 2008, 8,5% were freely available at the publishers’ sites. For an additional 11,9% free manuscript versions could be found using search engines, making the overall OA percentage 20,4%. Chemistry (13%) had the lowest overall share of OA, Earth Sciences (33%) the highest. In medicine, biochemistry and chemistry publishing in OA journals was more common. In all other fields author-posted manuscript copies dominated the picture.

Conclusions/Significance: The results show that OA already has a significant positive impact on the availability of the scientific journal literature and that there are big differences between scientific disciplines in the uptake. Due to the lack of awareness of OA-publishing among scientists in most fields outside physics, the results should be of general interest to all scholars. The results should also interest academic publishers, who need to take into account OA in their business strategies and copyright policies, as well as research funders, who like the NIH are starting to require OA availability of results from research projects they fund. The method and search tools developed also offer a good basis for more in-depth studies as well as longitudinal studies.

Having just set up a mirror of the OA subset of PubMed Central, I know that it contains only ~10% of the articles deposited in PubMed Central and only ~1% of the articles indexed by PubMed. It was thus with equal doses of joy and scepticism that I read the numbers reported by Bo-Christer Björk and coworkers.
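As a back-of-the-envelope check, the figures above can be put side by side in a few lines of Python. This is only a sketch: it assumes the rounded percentages quoted in the abstract and the rough PubMed Central estimates given in the previous paragraph, and the variable names are purely illustrative.

```python
# Percentages of 2008 articles, as quoted in the abstract (rounded).
free_at_publisher = 8.5       # freely available at the publishers' sites
free_manuscript_copy = 11.9   # free manuscript versions found via search
overall_free = free_at_publisher + free_manuscript_copy

# Rough estimates from the PubMed Central mirror (assumed, approximate).
oa_share_of_pmc = 0.10        # OA subset as a fraction of PubMed Central
oa_share_of_pubmed = 0.01     # OA subset as a fraction of PubMed

# If both estimates hold, PubMed Central covers about this much of PubMed.
pmc_share_of_pubmed = oa_share_of_pubmed / oa_share_of_pmc

print(f"Free full text overall: {overall_free:.1f}%")
print(f"Implied PubMed Central coverage of PubMed: {pmc_share_of_pubmed:.0%}")
```

The point of the comparison is the gap: the paper's overall figure is far larger than the share of PubMed that sits in the OA subset, which is what prompts the scepticism.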

It soon became clear to me that the study did not adhere to the OA definition by the Budapest Open Access Initiative, which is as follows:

By ‘open access’ to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.

Bo-Christer Björk et al. do not define what exactly they mean by OA. However, from reading their paper it is pretty clear that any article for which they can get hold of free full text is counted as OA. The license under which the copy is distributed does not seem to matter, and they thus count the 90% of articles in PubMed Central that are published under non-OA licenses as OA. It does not even seem to matter whether the free full text is legal or not, implying that any article of which an illegal copy can be found somewhere on the web is counted as OA.

I have heard of Gold OA and Green OA. It is tempting to call this Black OA. But I won’t. Because it just isn’t OA.

11 thoughts on “Commentary: When Open Access isn’t”

  1. Lars Juhl Jensen Post author

    I agree 100%. The ~2 million articles in PubMed Central are free (“gratis”) to access, but it is only the ~200,000 that I am free (“libre”) to use as I see fit. I am just a bit baffled to see an OA journal publish a paper that conflates the two.

  2. stevanharnad


    First things first. 80% of the 2.5 million articles published annually in the planet’s 25,000 peer reviewed journals are not OA: They are accessible only to those users whose institutions subscribe to the journal in which they are published. The purpose of OA is to remedy that, and Green Gratis OA articles, self-archived online free for all users webwide, provide that remedy. Green Gratis OA makes it possible to use articles exactly the way subscribers can use them. If you need more uses than that (which ones, exactly?), then please wait until most or all articles are at least Green Gratis OA before you object that Green Gratis OA is not enough. Not now, when most users don’t even have that.

  3. Lars Juhl Jensen Post author

    Dear Stevan, I’m sure that you must have had this discussion countless times before with people like Peter Murray-Rust, and I have no plan to do a rerun. To make a long story short: the people who need “libre” OA are those who would like to do text mining or who would like to benefit from the development of such tools.

    I am actually not arguing about Gold OA vs. Green OA. As long as the Green OA manuscripts are deposited under a license that allows me to actually use them as I see fit, I am happy. But if they are deposited under licenses that do not permit me to redistribute derivative works of the content, they are quite frankly not of much use to me. I acknowledge that “gratis” may be enough for some, but as a text miner I have to leave out all content that is not “libre”.

  4. stevanharnad


    Repeating arguments unfortunately does not make them valid. Nor can practical and strategic questions be settled by definitions (especially when there is a definition corresponding to all the options).

    Yes, this is a Gratis/Libre (access/re-use) matter, not a Green/Gold (self-archiving/publishing) matter, but it so happens that the only practical strategy for going from the present 20% OA to 100% OA is Green OA (self-archiving) mandates; and Gratis OA (free access) can be mandated today but Libre OA (free access to article contents plus various re-use rights over and above free access) cannot.

    Peter Murray-Rust’s broader goal of increasing both (1) data access and (2) article content and data re-use rights (for example, text-mining) is a desirable goal, but currently it is at odds with the far more urgent — and reachable — goal of first reaching 100% free access to article content (Gratis OA).

    First things first. Reach for what is within your grasp. Don’t over-reach and miss what is within your grasp. The still-too-slow growth of Green Gratis OA self-archiving mandates is not accelerated and facilitated but slowed and encumbered, and the still dense confusion and misunderstanding surrounding OA are not resolved but enhanced and compounded by insisting that Gratis OA is not enough when we still don’t have Gratis OA and it’s fully within reach if we just keep our eyes on the ball (instead of insisting on a cure for world hunger along with OA)!

    Re-use rights (including text-mining) can wait; and data access is an entirely different matter from article content access. Do the doable, don’t disparage it. The rest will come with the territory, in due time.

    “On Patience, and Letting (Human) Nature Take Its Course”

  5. Lars Juhl Jensen Post author

    Dear Stevan, as I said, I do not want to go into a long, long discussion that has already taken place several times. For you the right to mine the text can wait. For Peter Murray-Rust, myself, and presumably many others it cannot. Your goal is to make the literature available for everyone to read, which I can only agree is better than the status quo. However, the problem that I am faced with on a daily basis is that there is already much more literature than I can possibly read. Making it more available for me to read thus does not help me unless you can give me more hours in a day. It is my firm belief that we need text mining to deal with the ever increasing flood of literature, and while I agree that reaching too high and getting nothing is one risk, there is also the opposite risk of reaching too low and getting too little. I think it would be most constructive if we just agree to disagree rather than waste time on debating something that I am sure we will never agree on.

  6. stevanharnad


    “the problem that I am faced with on a daily basis is that there is already much more literature than I can possibly read. Making it more available for me to read thus does not help me unless you can give me more hours in a day. It is my firm belief that we need text mining to deal with the ever increasing flood of literature, and while I agree that reaching too high and getting nothing is one risk, there is also the opposite risk of reaching too low and getting too little.”

    This is a non-issue, since all OA IR content (like all web content) is harvested, inverted and indexed by google, google scholar, citebase, scopus, scirus, etc. etc. That all comes with the Green Gratis OA territory.

  7. Lars Juhl Jensen Post author

    Stevan, I know that they are harvested by Google. However, I am sorry to tell you that they cannot all be freely crawled by whoever would like to do so. For example, PubMed Central explicitly disallows bulk download or crawling of their repository. If you attempt to use it systematically you will be blocked from accessing the repository.

    I am thus forced to draw one of two conclusions: 1) that depositing a manuscript in PubMed Central is not Green OA according to your definition, or 2) that Green OA, in contrast to what you say, does not always allow harvesting. In case of option 1, this shows that around 90% of the biomedical papers that were counted as OA in the PLoS ONE paper that this post is about are indeed also not OA according to your own definitions. In case of option 2, I have to reject your notion that harvesting Green OA content is a non-issue.

  8. stevanharnad


    LJJ: “…they are harvested by Google [but] they cannot all be freely crawled… For example, PubMed Central explicitly disallows… crawling.”

    (1) Google harvesting answers your prior point about the need for text-mining because of information overload.

    (2) PubMed Central (PMC) should have been (and, mark my words, will be) a harvester too, not (foolishly) a locus of direct deposit, as now. (The source of the foolishness, to be remedied soon, is funder mandates that needlessly insist on direct PMC deposit instead of Institutional Repository deposit plus automated PMC harvesting.)

    (3) Institutional Repositories will be the locus of deposit of their own research article output; they will allow (reasonable) crawling.

    (4) Gratis OA (free access) and Libre OA (free access plus re-use rights, including harvesting rights) continue to be distinct.

    (5) OA continues to be OA, whether Gratis or Libre.

    (6) All Green OA self-archiving continues to be OA, whether Gratis or Libre.

    (7) Most Green OA self-archiving and all Green OA self-archiving mandates continue to be Gratis OA not Libre OA (because of the extra complications and obstacles that need to be surmounted for Libre OA).

    (8) All of this is as it should be (except for the still too slow rate of adoption of Green Gratis OA mandates by institutions and funders, and the foolish and counterproductive insistence of some funder mandates on direct deposit in PMC).

    (9) All the rest (including everything you yearn for in re-use rights) will come with the territory, once Green Gratis OA mandates have had the chance to globalize.

    (10) It continues to be a retardant on this optimal, inevitable and reachable outcome to insist on *more* before we have even reached universal Green OA.

  9. Lars Juhl Jensen Post author

    Dear Stevan, thanks for the debate. I think it is time to end it, though. You have made your points, I have made my points, and we simply do not agree. Considering this, and that hardly anyone but us is following this thread by now, I think we both have things more important than this discussion to spend our time on.

