Taming the Impact Factor
Quite a bit has been written about how the journal impact factor (JIF) is a bad metric. The JIF is supposed to measure a journal’s impact using a formula that normalizes the number of citations the journal receives in a given time frame (typically a year) by the number of articles it published. It is calculated exclusively by Thomson-Reuters, and is trademarked by this company.
Reminder: a journal’s impact factor (JIF) is calculated as follows:
Journal X’s 2008 impact factor =
Citations in 2008 (in journals indexed by Thomson-Reuters Scientific) to all articles published by Journal X in 2007–2008
divided by
Number of articles deemed to be “citable” by Thomson-Reuters Scientific that were published in Journal X in 2007–2008.
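As a rough sketch of the arithmetic above (the function name and all the numbers are made up for illustration, not real journal data), the calculation is a single division. Because the denominator counts only items deemed "citable", the same citation count can yield a different value when that classification changes:

```python
def impact_factor(citations: int, citable_items: int) -> float:
    """Citations in the census year divided by the number of citable items
    published in the window used in the formula above."""
    if citable_items <= 0:
        raise ValueError("citable_items must be positive")
    return citations / citable_items


# Hypothetical Journal X: 1200 citations to 400 citable items.
print(round(impact_factor(1200, 400), 3))  # 3.0

# Same citations, but 50 items no longer counted as citable.
print(round(impact_factor(1200, 350), 3))  # 3.429
```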
Among the criticisms leveled at the JIF is that it is subject to editorial manipulation (mainly by lowering the denominator of “citable articles”) and that it is irreproducible. The JIF is also an arithmetic mean of a non-Gaussian distribution and, as such, it is the wrong measure of central tendency to use: if forced to pick one, a median would be more appropriate. The reason the distribution is non-Gaussian is that, inevitably, a few papers are cited a large number of times while most papers are cited only a few times, giving a very skewed, exponential distribution. The JIF also lacks robustness, as the JIF of a journal in any given year can be dramatically skewed by a single paper.
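To make the mean-versus-median point concrete, here is a small sketch using an invented, deliberately skewed list of per-paper citation counts (these are not real data). It shows how far apart the mean and median sit, and how one heavily cited paper moves the mean while barely touching the median:

```python
import statistics

# Hypothetical citation counts for 14 papers in one journal: most are
# cited a few times, one is cited 40 times.
counts = [0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 8, 12, 40]

print(round(statistics.mean(counts), 2))  # 5.86
print(statistics.median(counts))          # 2.5

# Add a single blockbuster paper with 3000 citations.
with_outlier = counts + [3000]

print(round(statistics.mean(with_outlier), 2))  # 205.47
print(statistics.median(with_outlier))          # 3
```

The mean jumps roughly 35-fold because of one paper, while the median moves from 2.5 to 3.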
Moreover, the JIF is misused to evaluate individual researchers’ achievements, when in fact this is a completely wrong application of the metric, as even Thomson-Reuters state:
In the case of academic evaluation for tenure it is sometimes inappropriate to use the impact of the source journal to estimate the expected frequency of a recently published article.
The European Association of Science Editors have issued a statement that “journal impact factors are used only – and cautiously – for measuring and comparing the influence of entire journals, but not for the assessment of single papers, and certainly not for the assessment of researchers or research programmes”.
Yet journals continue to trumpet their impact factors as a major selling point. Every year, around this time, the new JIFs are published. Editors are quick to herald new rises in their JIFs, no matter how small, without pausing to think about the meaning of what may very well be temporal noise (1.5%? Really?).
One question which I think has not been looked into is the statistical significance of differences between journals’ impact factors. The JIFs are calculated to the third decimal place. Is an IF of 4.872 really different from 4.875? Is 6.541 significantly larger than 5.081? Or 4.971? As pointed out in NeuroDojo a year ago, we need some measure of distribution to gauge whether differences between journals (or between the same journal in 2010 and 2011) are indeed meaningful.
Only the distribution of citations per journal per year is not Gaussian, it’s exponential. So using symmetrical error bars around the mean (which should really be a median, but let’s not get into this) is not appropriate here.
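One possible way around symmetrical error bars, offered here as a suggestion rather than anything the JIF's publishers do, is a percentile bootstrap: resample the per-paper citation counts with replacement many times and read the interval off the resampled statistics. The interval is then free to be lopsided. The function name, the data, and the parameter defaults below are all assumptions for illustration:

```python
import random
import statistics


def bootstrap_interval(counts, stat=statistics.mean, n_boot=2000,
                       alpha=0.05, seed=0):
    """Percentile bootstrap: return (point estimate, lower, upper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(counts)
    # Recompute the statistic on n_boot resamples drawn with replacement.
    estimates = sorted(stat(rng.choices(counts, k=n)) for _ in range(n_boot))
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return stat(counts), lower, upper


# Hypothetical skewed citation counts (not real data).
counts = [0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 8, 12, 40]

point, lower, upper = bootstrap_interval(counts)
print(f"mean {point:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")

point, lower, upper = bootstrap_interval(counts, stat=statistics.median)
print(f"median {point}, 95% interval [{lower}, {upper}]")
```

If two journals' intervals overlap heavily, a difference in the third decimal place is unlikely to mean much.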
So: any takers for the following project?
1. Get citation data for journals X and Y, 2009, 2010
2. Find distribution of citations for X in 2009, and X in 2010. Same for Y.
3. Compare distributions D(X, 2009) with D(X, 2010), perhaps using the two-sample Kolmogorov-Smirnov test. (Anybody have other ideas?)
3.1 Do more of (3) for more journals, years.
4. Publish, thanking me profusely in the Acknowledgment sections.
5. Watch the citations roll in…
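Step 3 of the project above might be sketched roughly as follows. This is only an illustration under stated assumptions: the citation counts are invented, the function names are mine, and because citation counts are integers with many ties, the p-value here comes from a permutation test on the Kolmogorov-Smirnov statistic rather than from the classical KS formula, which assumes continuous data.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest gap between the two empirical cumulative distributions."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in set(a_sorted) | set(b_sorted):
        # Fraction of each sample that is <= x.
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


def ks_permutation_test(a, b, n_perm=2000, seed=0):
    """Return (observed KS statistic, permutation p-value)."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Shuffle the labels: under the null, year does not matter.
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical per-paper citation counts for journal X (not real data).
x_2009 = [0, 1, 1, 2, 2, 3, 5, 8, 13, 40]
x_2010 = [0, 0, 1, 2, 3, 3, 4, 9, 15, 60]

d, p = ks_permutation_test(x_2009, x_2010)
print(f"KS statistic {d:.3f}, permutation p-value {p:.3f}")
```

A large p-value would suggest that the two years' citation distributions cannot be told apart, whatever the third decimal place of the JIF says.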
Now, I know what you are thinking, gentle reader. Why even bother assessing the differences between the scores of two journals using a metric that is essentially bad? Well, the JIF is bad not because of any intrinsic reason (well, using the mean instead of the median kinda sucks) but mainly because it has been misused and hyped so much. This misuse and hyperbole are mainly due to ignorance of what one can and cannot measure using the JIF. Therefore, investing a bit more in understanding the JIF can let us inform others of how it can be used properly: for assessing journals (and not people or articles), and even then with extreme caution. Let’s face it, use of the JIF is not going away anytime soon, so it needs to be tamed.
Thanks to all the people who prompted the writing of this post, via a Twitter conversation over the past couple of days.
There is so much written about the problems of Impact Factors (including by me!) that it is easy to overlook that they do have a purpose. In these days when new journals are popping up like mushrooms in a tropical rainforest, an Impact Factor helps researchers recognize that a journal is a real, ongoing, viable outlet for a publication.
Impact Factor is not the only way to do this. But we should recognize that researchers need some way to figure out if publishing in a journal is better than burying a manuscript in their back yard.
[…] Taming the Impact Factor by Iddo Friedberg […]
Though the impact factor is a very popular measure of journal quality, there are more robust (if less popular) standards, like Eigenfactor. Eigenfactor scores are intended to measure how likely a given journal is to be used. These scores are considered to reflect how often an average researcher would refer to a given journal. Visit eigenfactor.org for more information.