Why not use the Journal Impact Factor: a Seven Point Primer
What is the Journal Impact Factor?
The JIF is supposed to be a proxy for a journal’s impact: i.e. how much influence the journal has in the scientific community. It is calculated as follows:
A = number of times that articles published in the journal in years X-1 and X-2 were cited in year X
B = number of “citable items” published in the journal in years X-1 and X-2
JIF(X) = A/B
Seems simple enough. The ratio of the number of citations to the number of publications. The higher the ratio, the more the articles are being cited. Therefore, the journal’s impact is higher.
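As a minimal sketch of the arithmetic, the ratio can be written as a small Python function. The function name and the numbers are made up for illustration and do not describe any real journal.

```python
def journal_impact_factor(citations_in_year_x: int, citable_items: int) -> float:
    """JIF(X) = A / B.

    A: citations received in year X by items published in years X-1 and X-2.
    B: number of "citable items" published in years X-1 and X-2.
    """
    if citable_items <= 0:
        raise ValueError("B (citable items) must be positive")
    return citations_in_year_x / citable_items


# Hypothetical journal: 200 citable items in the two previous years,
# cited 500 times in total during year X.
example_jif = journal_impact_factor(500, 200)
print(example_jif)  # 2.5
```

Note that the result is a single average for the whole journal; it says nothing about how the 500 citations are spread over the 200 items.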
Why is the JIF a bad metric? There are several reasons.
I. The JIF does not measure what it’s supposed to measure
1. The distribution of citations over articles is highly skewed
An editorial in Nature from 2005 stated that:
For example, we have analysed the citations of individual papers in Nature and found that 89% of last year’s figure was generated by just 25% of our papers.
The most cited Nature paper from 2002–03 was the mouse genome, published in December 2002. That paper represents the culmination of a great enterprise, but is inevitably an important point of reference rather than an expression of unusually deep mechanistic insight. So far it has received more than 1,000 citations. Within the measurement year of 2004 alone, it received 522 citations. Our next most cited paper from 2002–03 (concerning the functional organization of the yeast proteome) received 351 citations that year. Only 50 out of the roughly 1,800 citable items published in those two years received more than 100 citations in 2004. The great majority of our papers received fewer than 20 citations. (Emphases added by me.)
— Nature 435, 1003-1004 (23 June 2005)
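To see why a skewed distribution makes an average misleading, here is a small sketch comparing the mean (which is what a JIF-style ratio reports) with the median. The ten citation counts are invented for illustration; they are not Nature's data.

```python
from statistics import mean, median

# Hypothetical citation counts for ten papers in one journal (not real data):
# one blockbuster paper and nine modestly cited ones.
citations = [500, 30, 20, 15, 12, 10, 8, 5, 3, 2]

jif_like_mean = mean(citations)    # what a JIF-style average reports
typical_paper = median(citations)  # what a typical paper actually receives

# Share of all citations contributed by the single most cited paper.
top_share = max(citations) / sum(citations)

print(jif_like_mean, typical_paper, round(top_share, 2))  # 60.5 11.0 0.83
```

In this made-up example the average is more than five times the median, and one paper accounts for over 80% of the citations.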
2. The JIF is a negotiated, irreproducible metric
Look at the equation describing how the JIF is calculated. Note that B, the denominator, is defined as the number of “citable items”. It follows that the lower B is, the higher a journal’s overall impact factor, so how B is determined matters a great deal. How, exactly, is the number of citable items in each journal determined? Yeah, good luck with answering that. From the editors of PLoS Medicine:
During discussions with Thomson Scientific over which article types in PLoS Medicine the company deems as “citable,” it became clear that the process of determining a journal’s impact factor is unscientific and arbitrary. After one in-person meeting, a telephone conversation, and a flurry of e-mail exchanges, we came to realize that Thomson Scientific has no explicit process for deciding which articles other than original research articles it deems as citable. We conclude that science is currently rated by a process that is itself unscientific, subjective, and secretive.
During the course of our discussions with Thomson Scientific, PLoS Medicine’s potential impact factor— based on the same articles published in the same year—seesawed between as much as 11 (when only research articles are entered into the denominator) to less than 3 (when almost all article types in the magazine section are included, as Thomson Scientific had initially done).
— The PLoS Medicine Editors (2006) The Impact Factor Game. PLoS Med 3(6): e291.
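The effect of the denominator can be sketched in a few lines. The item counts and the citation total below are hypothetical, chosen only so that the two results fall in the range the editors describe; they are not PLoS Medicine's actual figures, and the function and type names are my own.

```python
# Hypothetical counts of items published in the two-year window, by type.
items_published = {"research": 100, "editorial": 150, "perspective": 100, "news": 50}
citations_received = 1100  # hypothetical value of A, identical in both scenarios


def jif(citations: int, items: dict[str, int], citable_types: set[str]) -> float:
    """Divide citations by the number of items whose type counts as citable."""
    denominator = sum(n for kind, n in items.items() if kind in citable_types)
    return citations / denominator


narrow = jif(citations_received, items_published, {"research"})
broad = jif(citations_received, items_published, set(items_published))
print(narrow, broad)  # 11.0 2.75
```

Nothing about the journal or its citations changes between the two calls; only the decision about which item types count as citable does.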
3. The JIF can be inflated by delays between online and print publications.
Many articles have two publication dates. Upon acceptance, the article is published on the journal’s website; this is known as a “pre-publication” or “prepub”. The second date can be a few weeks or months later, when the article appears in the journal’s print issue. The latter is often used as the “official” date of publication. However, by then the article has already existed for a while, gaining readership and perhaps citations. An article by Tort and colleagues shows that this lag is a highly significant contributor to a journal’s JIF.
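The mechanism can be illustrated with a toy model. This is my own sketch, not the analysis Tort and colleagues performed: it assumes, purely for illustration, that an article's monthly citation rate rises linearly for its first 36 months online and then levels off.

```python
def monthly_citation_rate(age_months: int) -> float:
    """Toy assumption: citation rate rises linearly for 36 months, then plateaus."""
    if age_months < 0:
        return 0.0
    return min(age_months, 36) / 36.0


def toy_jif(lag_months: int) -> float:
    """Average citations in census year X per item officially dated X-1 or X-2.

    Month 0 is January of year X; official (print) dates cover months -24..-1.
    Each item went online `lag_months` before its official date.
    """
    total = 0.0
    official_dates = range(-24, 0)
    for official in official_dates:
        online = official - lag_months
        # Citations accrue from the online date, but membership in the
        # JIF window is decided by the official date.
        total += sum(monthly_citation_rate(m - online) for m in range(12))
    return total / len(official_dates)


no_lag = toy_jif(0)
six_month_lag = toy_jif(6)
print(round(no_lag, 2), round(six_month_lag, 2))
```

Under these assumptions, the articles in the lagged scenario are older, and therefore further along their citation curve, during the census year, so the same articles yield a higher ratio.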
II. The JIF does not measure what people think it should measure
4. The JIF is abused as a surrogate measure and as a predictive metric
The JIF is commonly used as a measure by hiring committees, promotion and tenure committees and some grant review panels to predict a scientist’s success. The logic in doing so is as follows: if she publishes in a “good place”, then her research has merit, and will have impact in the scientific community. Furthermore, if she published in good places, she will continue to do so. Those assessing institutions have good reason to need a predictive measure: universities and grant agencies invest money in their faculty, and would like to know if these scientists would be successful. But the JIF is a poor metric for assessing, let alone predicting, an individual scientist’s impact. In fact, many scientific councils urge universities, institutes and grant agencies NOT to use the JIF for these purposes. Moreover, Thomson-Reuters themselves maintain:
In the case of academic evaluation for tenure it is sometimes inappropriate to use the impact of the source journal to estimate the expected frequency of a recently published article. Again, the impact factor should be used with informed peer review. Citation frequencies for individual articles are quite varied.
III. The “Impact Factor culture” is bad for science and science publication
5. JIF bias is exacerbated by poor editorial policies
Some journals limit the number of references an article may cite. This encourages a subtle form of bad science writing practice. An author faced with such a limit will tend to cite review articles rather than the original research articles, because one review article may cover the material of several research articles. This inflates the number of citations review articles get, and “steals citations” from research articles. As a result, journals which publish review articles (either exclusively or in part) get a higher impact factor. Review articles are already more highly cited than research articles, and limits on the number of references exacerbate this. The dynamic I described renders the JIF even more unreliable, favoring review articles over research articles.
6. Papers get retracted more from high JIF ranking journals
Again, referring to Brembs’s paper: JIF is a statistical predictor of the retraction rate in a journal. In other words, the higher a journal’s JIF, the higher the frequency of papers that are retracted from that journal. Brembs does not report on why that is. There are probably several contributing factors: a high-profile publication gets read and scrutinized more; competitors in the field may be “out to get you”. But there may also be pressure to publish high and therefore to “cut corners” when performing research. We don’t know whether retractions from high ranking journals are due to a higher fraction of poor papers than in low ranking journals, or whether the papers in high ranking journals are simply scrutinized more carefully by more people. Be that as it may, the correlation of JIF and retraction rate is disturbing.
7. I suspect that the extraordinary importance placed on the JIF delays scientific progress
Because of the strong incentives to publish in high-impact journals, researchers can sequester their findings for years, delaying actual communication of their research. It can take an extraordinarily long time to publish findings in a high-impact journal. These journals have very low acceptance rates, so a paper can be delayed for years while the research and the manuscript are revised and resubmitted over and over. Even worse, research gets tailored towards what is perceived as impactful and “sexy”. Another problem, raised recently by the editor in chief of Science, Bruce Alberts, is the overcrowding of fashionable fields:
But perhaps the most destructive result of any automated scoring of a researcher’s quality is the “me-too science” that it encourages. Any evaluation system in which the mere number of a researcher’s publications increases his or her score creates a strong disincentive to pursue risky and potentially groundbreaking work, because it takes years to create a new approach in a new experimental context, during which no publications should be expected. Such metrics further block innovation because they encourage scientists to work in areas of science that are already highly populated, as it is only in these fields that large numbers of scientists can be expected to reference one’s work, no matter how outstanding.
Caveat: I could not find research supporting these points; I wrote them based only on personal experience and conversations with colleagues. If anyone out there knows of supporting research (how you would even begin is an interesting question), let me know.
Concluding with an excerpt from an editorial by Kai Simons, published in 2008 and unfortunately still true today:
There are no numerical shortcuts for evaluating research quality. What counts is the quality of a scientist’s work wherever it is published. That quality is ultimately judged by scientists, raising the issue of the process by which scientists review each others’ research. However, unless publishers, scientists, and institutions make serious efforts to change how the impact of each individual scientist’s work is determined, the scientific community will be doomed to live by the numerically driven motto, “survival by your impact factors.”
- Deep impact: unintended consequences of journal rank
- The San Francisco Declaration on Research Assessment
- Cash for papers: putting a premium on publications
- Impact factor distortions
- The Assessment of Science: The Relative Merits of Post-Publication Review, the Impact Factor, and the Number of Citations