Why not use the Journal Impact Factor: a Seven Point Primer

By Iddo on October 11th, 2013

After a series of tweets and a couple of Facebook posts about the problems of the Journal Impact Factor (JIF), I was approached by a colleague who asked me: “so why are you obsessed with this”? My answer was that it irks me that I have to use the JIF next to my publications in so many different reports (grant reports, university annual activities, proposals, etc.) since it is a bad metric to evaluate the merit of my papers, and as a scientist, I do not like using bad metrics.

I assume that many readers of my blog constitute the proverbial choir on which my preaching would be wasted: specifically, those who understand what the Thomson-Reuters Journal Impact Factor is, and how it became such a poorly understood, overused and abused metric. However, for those who have no idea what I am talking about, or for those who are thinking “what is wrong with the Impact Factor?”, this post will hopefully be informative, if not valuable. It is a brief post. A lot has been written about the JIF and about the plausible alternatives that can be used to assess journal quality and impact, and I provide a list of further reading at the end. Finally, for those who, like me, think that there are many things wrong with this metric and its use, and would like to convey that information, I hopefully provide some basic arguments.

What is the Journal Impact Factor?

The JIF is supposed to be a proxy for a journal’s impact: i.e. how much influence the journal has in the scientific community. It is calculated as follows:

A = number of times that articles published in the journal in years X-1 and X-2 were cited in year X

B = number of citable items in the journal in years X-1 and X-2

JIF(X) = A/B

Seems simple enough. The ratio of the number of citations to the number of publications. The higher the ratio, the more the articles are being cited. Therefore, the journal’s impact is higher.
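To make the arithmetic concrete, here is a minimal sketch of the calculation in Python. The function name and the numbers are my own invention for illustration; they do not describe any real journal, and this is not how Thomson-Reuters actually implements it.

```python
def journal_impact_factor(citations_in_year_x: int, citable_items: int) -> float:
    """Return A / B, following the definition above.

    citations_in_year_x: A, citations received in year X by articles
                         published in years X-1 and X-2.
    citable_items:       B, number of citable items published in
                         years X-1 and X-2.
    """
    if citable_items <= 0:
        raise ValueError("the number of citable items must be positive")
    return citations_in_year_x / citable_items


# Hypothetical journal: 400 citable items over two years,
# cited 1,200 times in the measurement year.
print(journal_impact_factor(1200, 400))  # prints 3.0
```

Note that the result is a plain average: it says nothing about how those 1,200 citations are spread over the 400 items.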

Why is the JIF a bad metric? There are several reasons.

I. The JIF does not measure what it’s supposed to measure

1. The distribution of citations over articles is highly skewed

An editorial in Nature from 2005 stated that:

For example, we have analysed the citations of individual papers in Nature and found that 89% of last year’s figure was generated by just 25% of our papers.

Furthermore:

The most cited Nature paper from 2002–03 was the mouse genome, published in December 2002. That paper represents the culmination of a great enterprise, but is inevitably an important point of reference rather than an expression of unusually deep mechanistic insight. So far it has received more than 1,000 citations. Within the measurement year of 2004 alone, it received 522 citations. Our next most cited paper from 2002–03 (concerning the functional organization of the yeast proteome) received 351 citations that year. Only 50 out of the roughly 1,800 citable items published in those two years received more than 100 citations in 2004. The great majority of our papers received fewer than 20 citations. (Emphases added by me).

— Nature 435, 1003-1004 (23 June 2005)

So there are few papers with very high citation counts. The distribution of citations per paper is enormously skewed. Not that one would expect all research published in a high-impact journal to be cited equally, but with such a skewed distribution it is evident that only a small minority of papers contribute to the journal’s impact factor. So it is not the research that is published on the whole in Nature that makes it so impactful. It is a handful of high impact papers that granted Nature its high impact factor. A recent analysis by Björn Brembs and colleagues shows that this is pretty much the situation all around. Even more so: the higher the overall JIF, the more skewed the citations-per-paper distribution. So each year, the impact factor of a high-impact journal is supported by a few “superstar” papers. Using a mean as a measure of central tendency in a skewed distribution is meaningless. (Get it? Meaningless. Haha. Groan.)
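A small sketch of why the mean misleads here. The citation counts below are entirely made up (two “superstar” papers, a few moderately cited ones, and a long tail); they are not Nature's data, only an assumed distribution with the same kind of skew.

```python
from statistics import mean, median

# Hypothetical citation counts for 100 papers in one journal:
# 2 superstar papers, 8 moderately cited papers, 90 rarely cited papers.
citations = [500, 300] + [10] * 8 + [2] * 90

mean_citations = mean(citations)      # what a JIF-style average reports: 10.6
median_citations = median(citations)  # what a typical paper receives: 2.0

# Share of all citations contributed by the top 25% most cited papers.
ranked = sorted(citations, reverse=True)
top_quarter = ranked[: len(ranked) // 4]
top_quarter_share = sum(top_quarter) / sum(citations)

# Number of papers that actually reach the journal's average.
papers_above_mean = sum(1 for c in citations if c > mean_citations)

print(f"mean = {mean_citations}, median = {median_citations}")
print(f"top 25% of papers hold {top_quarter_share:.0%} of citations")
print(f"{papers_above_mean} of {len(citations)} papers exceed the mean")
```

In this toy journal the average is more than five times the median, and only 2 papers out of 100 are cited more often than the “average” paper.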

2. The JIF is a negotiated, irreproducible metric

Look at the equation describing how JIF is calculated. Note that the definition of B, the denominator, is “citable items”. It follows that the lower B is, the higher a journal’s overall impact factor. So the value of B is important for the overall impact factor. How, exactly, is the number of citable items in each journal determined? Yeah, good luck with answering that. From the editors of PLoS Medicine:

During discussions with Thomson Scientific over which article types in PLoS Medicine the company deems as “citable,” it became clear that the process of determining a journal’s impact factor is unscientific and arbitrary. After one in-person meeting, a telephone conversation, and a flurry of e-mail exchanges, we came to realize that Thomson Scientific has no explicit process for deciding which articles other than original research articles it deems as citable. We conclude that science is currently rated by a process that is itself unscientific, subjective, and secretive.

During the course of our discussions with Thomson Scientific, PLoS Medicine’s potential impact factor, based on the same articles published in the same year, seesawed between as much as 11 (when only research articles are entered into the denominator) to less than 3 (when almost all article types in the magazine section are included, as Thomson Scientific had initially done).

— The PLoS Medicine Editors (2006) The Impact Factor Game. PLoS Med 3(6): e291.

So the JIF is not an objectively determined metric. Rather, it is determined by negotiation and committee. Not the kind of metric you would use to objectively assess anything.
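The sensitivity to the denominator is easy to see with a toy calculation. The counts below are hypothetical; I picked them only so that the swing resembles the one the PLoS Medicine editors describe, and they are not the journal's actual figures.

```python
def jif(citations: int, citable_items: int) -> float:
    """Citations divided by citable items (A / B)."""
    return citations / citable_items


# Hypothetical journal: the citation count A is fixed; only the
# decision about which article types count as "citable" changes B.
citations = 660
citable_items_by_policy = {
    "research articles only": 60,
    "almost all article types": 240,
}

jif_by_policy = {
    policy: jif(citations, b) for policy, b in citable_items_by_policy.items()
}

for policy, value in jif_by_policy.items():
    print(f"{policy}: JIF = {value}")
# research articles only: JIF = 11.0
# almost all article types: JIF = 2.75
```

Same journal, same articles, same citations. The only thing that changed is a classification decision made behind closed doors.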

3. The JIF can be inflated by delays between online and print publications

Many articles have two publication dates: upon acceptance, the article gets published on the journal’s website. This is known as a “pre-publication” or “prepub”. The second date can be a few weeks or months later, when the publication coincides with the journal’s print date. The latter is often used as the “official” date of publication. However, the article has existed for a while before the official date, gaining readership and perhaps citations. An article by Tort and colleagues shows that this lag is a highly significant contributor to a journal’s JIF.


II. The JIF does not measure what people think it should measure

4. The JIF is abused as a surrogate measure and as a predictive metric

The JIF is commonly used as a measure by hiring committees, promotion and tenure committees and some grant review panels to predict a scientist’s success. The logic in doing so is as follows: if she publishes in a “good place”, then her research has merit, and will have impact in the scientific community. Furthermore, if she published in good places, she will continue to do so. Those assessing institutions have good reason to need a predictive measure: universities and grant agencies invest money in their faculty, and would like to know if these scientists would be successful. But the JIF is a poor metric for assessing, let alone predicting, an individual scientist’s impact. In fact, many scientific councils urge universities, institutes and grant agencies NOT to use the JIF for these purposes. Moreover, Thomson-Reuters themselves maintain:

In the case of academic evaluation for tenure it is sometimes inappropriate to use the impact of the source journal to estimate the expected frequency of a recently published article. Again, the impact factor should be used with informed peer review. Citation frequencies for individual articles are quite varied.

That is a very qualified statement, and since Thomson-Reuters have a vested interest in the use of the JIF in the community, I cannot blame them. I contend that it is totally inappropriate and unscientific to use the JIF of publications as a predictor for an individual’s impact, now or in the future. That is simply not what the JIF measures!

Furthermore, the JIF is a poor predictor of individual article impact. Since the distribution of citations-per-paper is so skewed, it follows that one cannot predict how well a paper will be cited based on the impact factor of the journal it was accepted in. Yet people continue to assess article importance based on “where it got into”. This type of assessment as a surrogate measure is poor practice, and it biases readers’ impressions as to the merit of the article.

III. The “Impact Factor culture” is bad for science and science publication

5. JIF bias is exacerbated by poor editorial policies

Some journals limit the number of citations that can be used in an article. This encourages a more subtle form of bad science writing practice. The author, faced with a limit on the number of possible cited items, would tend to cite review articles rather than the original research articles. The reason being that one review article may cover material in several research articles. Consequently, this inflates the number of citations review articles get, and “steals citations” from research articles. As a result, journals which publish review articles (either fully or partially) get a higher impact factor. As it is, review articles are more highly cited than research articles, but limitations on number of citations exacerbate this situation. The dynamic I described renders the JIF even more unreliable, favoring review articles over research articles.

6. Papers get retracted more from high JIF ranking journals

Again, referring to Brembs’s paper: JIF is a statistical predictor for the retraction rate in a journal. In other words, the higher a journal’s JIF, the higher the frequency of papers that are retracted from that journal. Brembs does not report on why that is. There are probably several contributing factors: a high-profile publication gets to be read and scrutinized more; competitors in the field may be “out to get you”. But also, there may be pressure to publish high and therefore “cut corners” when performing research. We don’t know whether retractions from high ranking journals are due to a higher fraction of poor papers than in low ranking journals, or whether that is because the papers in high ranking journals tend to be scrutinized more carefully by more people. Be that as it may, the correlation of JIF and retraction rate is disturbing.

7. I suspect that the extraordinary importance placed on the JIF delays scientific progress

Because of the strong incentives to publish in high-impact journals, researchers can sequester their findings for years, delaying actual communication of their research. It can take an extraordinarily long amount of time to publish findings in a high-impact journal. These journals have very low acceptance rates, so a paper can get delayed for years while the research and papers get revised and resubmitted over and over. Even worse, research would get tailored towards what is perceived as impactful and “sexy”. Another problem, raised recently by the editor in chief of Science, Bruce Alberts, is the overcrowding of fashionable fields:

But perhaps the most destructive result of any automated scoring of a researcher’s quality is the “me-too science” that it encourages. Any evaluation system in which the mere number of a researcher’s publications increases his or her score creates a strong disincentive to pursue risky and potentially groundbreaking work, because it takes years to create a new approach in a new experimental context, during which no publications should be expected. Such metrics further block innovation because they encourage scientists to work in areas of science that are already highly populated, as it is only in these fields that large numbers of scientists can be expected to reference one’s work, no matter how outstanding.

Caveat: I could not find research supporting these points, and I really only wrote them based on personal experience and conversations with colleagues. If anyone out there knows of supporting research (how you would even begin is an interesting question), let me know.

Concluding with an excerpt from an editorial by Kai Simons, published in 2008, unfortunately still true today:

There are no numerical shortcuts for evaluating research quality. What counts is the quality of a scientist’s work wherever it is published. That quality is ultimately judged by scientists, raising the issue of the process by which scientists review each others’ research. However, unless publishers, scientists, and institutions make serious efforts to change how the impact of each individual scientist’s work is determined, the scientific community will be doomed to live by the numerically driven motto, “survival by your impact factors.”