Contrary to what one might expect, Nobel laureates and Fields medalists have a rather large fraction (10% or more) of uncited publications. This is the case for (in total) 75 examined researchers from the fields of mathematics (Fields medalists), physics, chemistry, and physiology or medicine (Nobel laureates). We study several indicators for these researchers, including the h-index, total number of publications, average number of citations per publication, the number (and fraction) of uncited publications, and their interrelations. The most remarkable result is a positive correlation between the h-index and the number of uncited articles. We also present a Lotkaian model, which partially explains the empirically found regularities.

Introduction

We were surprised to notice that a winner of the Fields medal (the highest award in mathematics, named after John Charles Fields and awarded by the International Mathematical Union) had a non-negligible percentage of uncited publications. Checking some other awardees revealed that this was not an exception. For some of them, this percentage turned out to be more than 10%, excluding editorials or book reviews. This took us by surprise because we expected that top scientists, and especially mathematicians, write only publications dealing with difficult and important problems, which, once solved, lead to high-quality and, hence, highly cited publications. Even if a publication by such a visible author is not among the author's best work, one would expect that, because of the author's status, it would not be ignored. In other words, one expects the Matthew Effect to come into play. This observation led to an investigation of uncitedness among outstanding researchers. More particularly, we looked at Fields medalists and Nobel Prize winners.

We investigated the field of mathematics (Fields medal winners: in principle four winners every 4 years) and Nobel Prize winners in the fields of physics, chemistry and physiology or medicine (one, two, or three winners a year). It became immediately clear that even among these eminent scientists having many uncited publications was quite common. In most cases, we found more than 10% uncited publications, and this held across all fields studied. Of course, an uncited publication may, in principle, gain citations at some time in the future. This is related to the phenomenon of Delayed Recognition (Garfield, 1980; Glänzel, Schlemmer, & Thijs, 2003). Publications that remain uncited for a prolonged period of time and subsequently receive several citations are known as Sleeping Beauties (van Raan, 2004; Burrell, 2005). In other words, a publication that is as yet uncited is not necessarily a “never” cited publication; see the methodological part in the next section.

This phenomenon (uncited publications of top researchers) has apparently not yet been addressed in the literature, although Glänzel, Debackere, Thijs, & Schubert (2006, p. 267) note in passing: “The fact that a document is less frequently cited or even (still) uncited several years after publication provides information about its reception by colleagues but does not reveal anything about its quality or the standing of its author(s) in the community. Uncited papers by Nobel Prize winners may just serve as an example.”

Data were collected during the period October-November 2010 using Thomson Reuters' Web of Science (WoS). While collecting the total number of publications and the number of uncited publications, we also collected the readily available average number of citations per publication and the author's h-index (Hirsch, 2005).

Next, we studied the scatter plots resulting from the relations between any two of the above-mentioned indicators. We especially looked for increasing or decreasing relationships. This revealed (details are given in the next section) an increasing relationship between the total number of publications and the h-index, the h-index and the average number of citations per publication, between the number of uncited publications and the total number of publications, and between the number of uncited publications and the h-index. Decreasing relationships were found between the number of uncited publications and the average number of citations per publication and between the fraction of uncited publications and the average number of citations per publication.

In the third section, we examine how these indicators are related in a continuous Lotkaian system. The Lotka system can be considered a first approximation of the reality that publications are cited in a very skewed way. In this setting, one can prove most of the decreasing and increasing relationships that were found empirically (sometimes needing an extra condition). This yields a partial explanation (“partial,” because we assume Lotkaian systems) of the above findings. Although the Lotkaian system has some drawbacks (it lacks the 0 as frequency and is only an approximation of reality), we believe that it is a good approximation that still allows for a heuristic explanation of the relations found: The mathematics of more intricate models quickly grows too complicated.

This article ends with conclusions, open problems, and other suggestions for further research.

Methodology

We obtained the list of recent Fields medalists from the website of the International Mathematical Union (http://www.mathunion.org/general/prizes/fields/prizewinners/) and recent Nobel laureates in physics, chemistry, and physiology or medicine from the website of the Nobel Prize committee (http://nobelprize.org/). We restricted the number of laureates to between 15 and 26 per discipline to find clear relations between any two of the indicators mentioned above and to obtain manageable clouds of points. Bringing all fields together already yields clouds of 75 points. In practice, we included 18 Fields medalists (mathematics) between 1990 and 2006, 16 Nobel laureates in physics between 2004 and 2010, 15 Nobel laureates in chemistry between 2004 and 2010, and 26 Nobel laureates in physiology or medicine between 1999 and 2010. Scientists with very common names were not included as it was too difficult to collect correct data. We consider it acceptable to omit a few names and think that this does not bias the data set that we used. Citation data collection took place during the period October-November 2010 from WoS (including proceedings).

Using the advanced search facility, queries were performed as follows. First, we did a search on the author's name followed by the first initial and an asterisk. This often revealed a second initial so that a specific query could be made. If such a specific query was possible, we searched for the author with one or two initials (no asterisk anymore), otherwise, we just continued with the original result. The result was then limited to “possible” subfield categories. This list was then “analyzed” for institutes. The result showed the institute or institutes where the scientist worked during his or her career and, more important, revealed homonyms working in the same field. These were deleted.

The following data were collected:

· T₁: the total number of publications

· n₁: the total number of uncited publications

· μ: the average number of citations per publication

· h: the h-index

The average number of citations per publication was obtained from the WoS citation report. The number n₁ (the total number of uncited publications) may include many recent publications. Further on, we will, however, consider only uncited publications published before the year 2006, because it is very unlikely that they will gain any citations later on. These publications will be called “never-cited” publications.

Of course, no one can be absolutely certain that these will never be cited, but the time period used (2005 or older) guarantees that most of these publications will indeed be never cited. To verify this claim, we examined all publications (n=332) published in 1990 in the five journals that were ranked highest in the Journal Citation Reports category Biology (ranked according to the impact factor). It was found that 70 publications had not yet been cited after 5 years; of these, only four (5.7%) gained citations in later years. Moreover, their citation numbers are rather low: 1, 1, 2, and 8, respectively (in January 2011). This small case study illustrates how extraordinary it is for publications to gain a first citation after more than 5 years. Indeed, as shown theoretically by Burrell (2002), the longer a publication remains uncited, the less likely it is to ever gain a citation. We therefore conclude that virtually all “never-cited” publications will indeed never be cited.

The never-cited publications constitute the real objective of our study. Therefore, we also collected the following data, by appropriately limiting publication years:

· T₂: the total number of publications published strictly before 2006

· n₂: the total number of never-cited publications, i.e., those publications included in T₂ that were uncited at the moment of data collection

We will study not only the never-cited indicators n₂ and n₂/T₂ (the fraction of never-cited publications) in relation to the other indicators but also the relations between h, T₁, T₂ and μ. Note that we do not use versions of h or μ that are restricted to publications before 2006. We found out that the results are the same qualitatively and even quantitatively as there is almost no difference between the two “μ”-versions and the h-index just stays unchanged in almost all cases (because recent publications usually do not contribute to the h-index). Results are described in the next section.
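As an illustration of how the indicators defined above relate to a raw publication record, the following is a minimal sketch rather than the procedure actually used (the data in the article came from WoS reports). The `Publication` class, the `indicators` function, and the sample record are hypothetical and serve only to make the definitions concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    year: int       # publication year
    citations: int  # citations received at the moment of data collection


def h_index(citation_counts):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def indicators(pubs, cutoff_year=2006):
    """Compute T1, n1, mu, h over all publications and T2, n2 over
    publications published strictly before cutoff_year."""
    counts = [p.citations for p in pubs]
    old = [p for p in pubs if p.year < cutoff_year]
    t1 = len(pubs)
    t2 = len(old)
    n2 = sum(1 for p in old if p.citations == 0)
    return {
        "T1": t1,
        "n1": sum(1 for c in counts if c == 0),
        "mu": sum(counts) / t1 if t1 else 0.0,
        "h": h_index(counts),
        "T2": t2,
        "n2": n2,
        "n2/T2": n2 / t2 if t2 else 0.0,
    }


# Hypothetical record, invented for illustration only.
record = [
    Publication(1995, 40), Publication(1998, 12), Publication(2001, 0),
    Publication(2004, 3), Publication(2004, 0), Publication(2008, 5),
    Publication(2010, 0),
]
result = indicators(record)
```

In this toy record the 2010 publication counts toward n₁ but not toward n₂, which is the distinction between “uncited” and “never-cited” drawn above.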

Empirical Results

Figure 1 depicts the relation between h and T₂. The corresponding scatter plot relating h and T₁ is basically identical and hence not shown. For all fields these plots show an increasing relationship and a roughly concave shape. They correspond to the results shown in Liu, Rao, & Rousseau (2009) for horticulture journals. In the next section we will show how this shape can be explained assuming a Lotkaian distribution.

Figure 1. Empirical relation between T₂ (horizontal axis) and h (vertical axis).

The relation between h and μ is depicted in Figure 2. Although Figure 2 is more scattered than the previous figure, we can still see the (expected) increasing relation between h and μ. This will be partially explained in the next section.

Figure 2. Empirical relation between h (vertical axis) and μ (horizontal axis).

Now we come to the main topic of this contribution, namely, uncitedness, or better never-citedness. Figure 3 shows box plots of the relative number of uncited publications per field. These show that the median fraction of uncited papers lies at roughly 10% to 20%. We now turn to the relations between uncitedness and other indicators. Figure 4 clearly shows the increasing relation between n₂ (total number of never-cited publications) and T₂ (total number of publications published strictly before 2006). This is an interesting result: In general, the more publications an author has, the more never-cited publications that author has. This scientometric observation, which we will partially explain in the next section, provides a rationale for the fact that even highly visible scientists, such as Nobel laureates, can have many never-cited publications. If we assume that some percentage of all publications will remain uncited, then an increasing relation between the number of never-cited publications and the total number of publications automatically follows. Of course, this is an explanation based solely on descriptive statistics. A complete explanation would have to take the content and potential impact of these never-cited publications into account; this is, however, beyond the scope of the present article.

Figure 3. Box plot of relative uncitedness (n₂/T₂) per field (a: mathematics, b: chemistry, c: physics, d: physiology or medicine).

Figure 4. Empirical relation between n₂ (vertical axis) and T₂ (horizontal axis).

Figure 5 shows the relation between n₂ and μ. Although quite scattered, we may infer, at least visually, that this relation is decreasing. This visual observation is, however, not confirmed statistically, because the correlation coefficient is not significantly different from zero. So, we must admit that a decreasing trend is weak at best. Yet, on logical grounds, such a decreasing trend seems to be expected: The higher the (absolute) number of never-cited publications, the lower the average number of citations per publication (in general). In the next section, it will be shown that a decreasing relation is also expected in the Lotkaian model.
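The article does not say which significance test was applied to the correlation coefficient. As one possible way to check whether a correlation differs significantly from zero, the sketch below assumes a two-sided permutation test on Pearson's r. The function names and the sample values are hypothetical and are not taken from the study's data.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the null hypothesis r = 0.

    y is shuffled repeatedly; the p-value is the share of shuffles whose
    |r| is at least as large as the observed |r| (with add-one smoothing).
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Made-up values standing in for (n2, mu) pairs; illustration only.
n2_values = [3, 12, 7, 25, 9, 15, 4, 30]
mu_values = [55.0, 20.5, 61.2, 34.8, 18.3, 47.9, 29.4, 40.1]
p_value = permutation_p_value(n2_values, mu_values)
```

A p-value above the chosen significance level (commonly 0.05) would correspond to the situation described above, in which a visually suggested trend is not confirmed statistically.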

Figure 5. Empirical relation between μ (vertical axis) and n₂ (horizontal axis).

Figure 6 depicts an increasing relation between n₂ and h. At first sight, this might be surprising because an increasing h-index implies that publications are more cited and this should decrease n₂. However, this is not the case. Uncited publications never contribute to the h-index; hence, there exists no direct relationship between the h-index and the number of uncited publications. The increasing relationship arises because both indicators are correlated with T₂: there exists an increasing relation between h and T₂ (Figure 1) and an increasing relation between n₂ and T₂ (Figure 4). Hence, based on these facts, the increasing relation shown in Figure 6 is not surprising. Further explanations will be given in the next section.

Figure 6. Empirical relation between n₂ (vertical axis) and h (horizontal axis).

We further investigated the relations between the fraction of never-cited publications, n₂/T₂, and each of T₂, n₂, μ, and h. Only weak relations could be found. We show only (Figure 7) the cloud of points depicting the relation between n₂/T₂ and μ. This shows a (weak) decreasing relation, which will also be proved in the next section. A referee suggested also showing the relation between n₂/T₂ and n₂. However, the resulting cloud of points is very scattered, and no conclusions could be drawn from it. Hence, it is not included.

Figure 7. Empirical relation between n₂/T₂ (vertical axis) and μ (horizontal axis).

Theory

In this section, we will show that several empirical results of the previous section can also be found theoretically in a continuous Lotkaian framework. Here, we do not make a distinction between “old” and “all” publications; a similar remark holds for citedness. We use the following notations:

· T: total number of publications

· μ: average number of citations per publication

· h: h-index

· n: total number of noncited publications

We work in a Lotkaian framework, where the density of publications with citation density j is given by

$$ f(j) = \frac{C}{j^{\alpha}} \tag{1} $$

with C > 0, α > 1, j ≥ 1. It is well-known that

$$ T = \frac{C}{\alpha - 1} \tag{2} $$

(see e.g., Egghe, 2005), and if α > 2,

$$ A = \frac{C}{\alpha - 2} \tag{3} $$

where A denotes the total number of citations received by these T publications. Hence,

$$ \mu = \frac{A}{T} = \frac{\alpha - 1}{\alpha - 2} \tag{4} $$

or, equivalently:

$$ \alpha = \frac{2\mu - 1}{\mu - 1} \tag{5} $$

In Egghe and Rousseau (2006), we proved the following formula for the h-index:

$$ h = T^{1/\alpha} \tag{6} $$

This formula corresponds to a concave function, which explains the shape we empirically found in Figure 1.

Combining Equations 5 and 6 yields:

$$ h = T^{\frac{\mu - 1}{2\mu - 1}} \tag{7} $$

Equation 7 connects the three indicators T, μ and h. This leads to a partial explanation of Figures 1 and 2, given in Proposition 1.

Proposition 1. If μ and T are strictly increasing then h is strictly increasing. The same conclusion holds when one of the two parameters is constant and the other one is strictly increasing.

Proof. This result readily follows from Equation 7, $h = T^{(\mu - 1)/(2\mu - 1)}$, and the fact that the exponent $(\mu - 1)/(2\mu - 1) = 1/\alpha$ (see Equation 5) is, for μ > 1, a strictly increasing, positive function of μ.

Similarly, we have Proposition 2.

Proposition 2. If μ and T are strictly decreasing then h is strictly decreasing. The same conclusion holds when one of the two parameters is constant and the other one is strictly decreasing.

This is proved in the same way as Proposition 1. Details are left to the reader.
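Propositions 1 and 2 can be checked numerically. The sketch below assumes the continuous Lotkaian relations α = (2μ − 1)/(μ − 1) and h = T^(1/α) (the latter from Egghe & Rousseau, 2006). The function names `lotka_alpha` and `lotka_h` and the input values are hypothetical and chosen only for illustration.

```python
def lotka_alpha(mu):
    """Lotka exponent implied by the mean number of citations mu (mu > 1)."""
    if mu <= 1:
        raise ValueError("the continuous Lotkaian model requires mu > 1")
    return (2 * mu - 1) / (mu - 1)


def lotka_h(total_pubs, mu):
    """Model h-index, h = T ** (1 / alpha), for T publications and mean mu."""
    return total_pubs ** (1 / lotka_alpha(mu))


# Proposition 1: h grows when T grows (mu fixed) or mu grows (T fixed).
h_base = lotka_h(400, 2.0)
h_more_pubs = lotka_h(500, 2.0)
h_more_cites = lotka_h(400, 3.0)

# Concavity in T: equal steps in T give shrinking gains in h.
first_gain = lotka_h(200, 2.0) - lotka_h(100, 2.0)
second_gain = lotka_h(300, 2.0) - lotka_h(200, 2.0)
```

The shrinking gains in the last two lines reflect the concave shape mentioned in connection with Figure 1.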

We approximate the number of noncited papers, denoted as n, with the expression:

$$ n = \int_{1}^{2} \frac{C}{j^{\alpha}}\,dj \tag{8} $$

A referee remarked that Equation 8, in fact, approximates the number of papers with one citation rather than zero citations. This is correct because j ≥ 1: Lotkaian systems do not include the sources that produce 0 items (here, the uncited or never-cited papers). The theoretical analysis here is thus concerned with lowly cited rather than uncited publications. The main reason for using a Lotkaian system instead of other distributions that do include the zero is simplicity. We admit that Equation 8 is, at best, a crude approximation, but our approach has the advantage of yielding relatively simple formulae (see also the final section).

Alternatively, in the empirical part, we could have studied the total number of lowly cited publications, instead of the noncited ones. By lowly cited publications we mean those with at most one citation (or at most 2 citations). We are convinced that for lowly cited publications very similar graphs would have emerged.

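Equation 8 boils down to calculating:

$$ n = \int_{0}^{1} \frac{C}{(j+1)^{\alpha}}\,dj \tag{9} $$

now for j ≥ 0. It is easily seen that

$$ n = \frac{C}{\alpha - 1}\left(1 - 2^{1-\alpha}\right) \tag{10} $$

But, by Equation 2 (T = C/(α − 1)), we have:

$$ n = T\left(1 - 2^{1-\alpha}\right) \tag{11} $$

Using Equation 5 (α = (2μ − 1)/(μ − 1), so that 1 − α = −μ/(μ − 1)), Equation 11 becomes:

$$ n = T\left(1 - 2^{-\frac{\mu}{\mu - 1}}\right) \tag{12} $$

which shows the connection between the three indicators T, μ and n. Finally,

$$ \frac{n}{T} = 1 - 2^{-\frac{\mu}{\mu - 1}} \tag{13} $$

which brings us to Proposition 3.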

Proposition 3. If μ is strictly increasing, then n/T is strictly decreasing.

Proof. This result follows from Equation 13 because $1 - 2^{-\mu/(\mu - 1)}$ is positive and strictly decreasing for μ > 1.

Proposition 3 explains the decreasing trend observed in Figure 7. We formulate two other propositions.

Proposition 4. If μ is strictly increasing and T is strictly decreasing then n is strictly decreasing. If one of the two variables (μ or T) is constant, then the same conclusion holds.

Proposition 5. If μ is strictly decreasing and T is strictly increasing then n is strictly increasing. If one of the two variables (μ or T) is constant, then the same conclusion holds.

The proofs follow along the same lines as the proof of Proposition 3, using Equation 12.
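Propositions 3 to 5 can be illustrated with a few numbers. The sketch assumes that n is approximated by the Lotka density integrated over citation densities between 1 and 2, which gives n/T = 1 − 2^(−μ/(μ − 1)) for μ > 1. The function names and inputs are hypothetical.

```python
def low_cited_fraction(mu):
    """Model fraction n/T = 1 - 2 ** (-mu / (mu - 1)), valid for mu > 1."""
    if mu <= 1:
        raise ValueError("the continuous Lotkaian model requires mu > 1")
    return 1 - 2 ** (-mu / (mu - 1))


def low_cited_count(total_pubs, mu):
    """Model count n = T * (n/T) of lowly cited publications."""
    return total_pubs * low_cited_fraction(mu)


# Proposition 3: the fraction falls as mu rises.
fraction_low_mu = low_cited_fraction(1.5)
fraction_high_mu = low_cited_fraction(2.0)

# Proposition 5: lower mu together with higher T gives a larger n.
n_before = low_cited_count(100, 2.0)
n_after = low_cited_count(120, 1.5)
```

Under this assumed formula the fraction stays above 1/2 for every μ > 1, well above the empirical 10% to 20%. This is in line with the authors' remark that the model describes lowly cited rather than uncited publications and is a crude approximation.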

Proposition 5 partially explains Figure 4. The next proposition partially explains Figures 5 and 6.

Proposition 6. If μ is strictly decreasing and h is strictly increasing, then n is strictly increasing. If one of the two variables (μ or h) is constant, then the same conclusion holds.

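Proof. Equation 7 gives $T = h^{(2\mu - 1)/(\mu - 1)}$. Because $(\mu - 1)/(2\mu - 1)$ is a strictly increasing, positive function of μ, its reciprocal increases when μ decreases, and we see that T is strictly increasing. Combining this result with Proposition 5 yields that n is strictly increasing.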
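The argument of Proposition 6 can be traced numerically by expressing n through h and μ. The sketch assumes the Lotkaian relations α = (2μ − 1)/(μ − 1), T = h^α, and n = T(1 − 2^(1 − α)). The function name `n_from_h` and the inputs are hypothetical.

```python
def n_from_h(h, mu):
    """Model n as a function of h and mu (mu > 1, h > 1).

    Step 1: alpha from mu. Step 2: T = h ** alpha (inverse of h = T ** (1/alpha)).
    Step 3: n = T * (1 - 2 ** (1 - alpha)).
    """
    if mu <= 1:
        raise ValueError("the continuous Lotkaian model requires mu > 1")
    alpha = (2 * mu - 1) / (mu - 1)
    total_pubs = h ** alpha
    return total_pubs * (1 - 2 ** (1 - alpha))


# Proposition 6: mu decreasing and h increasing gives a larger n.
n_start = n_from_h(2, 2.0)
n_end = n_from_h(3, 1.5)

# Either change on its own also increases n.
n_only_h = n_from_h(3, 2.0)
n_only_mu = n_from_h(2, 1.5)
```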

Clearly, other relationships can be found, but as we do not need them in this article, this is left to the reader.

Conclusions and Open Problems

This article shows that the group of Fields medalists (mathematics) and Nobel Prize laureates in physics, chemistry, and physiology or medicine have a sizable fraction of never-cited publications. Although, in the present article, we have not investigated “average” scientists, we hypothesize that their fraction of never-cited publications is of similar proportions (further research should confirm this hypothesis). In this regard, top scientists do not seem exceptional. A similar observation can be made for other relations between the number of publications, the average number of citations per publication and the h-index. Empirically we found (omitting indices for reasons of simplicity):

T ↗ ⇒ h ↗, μ ↗ ⇒ h ↗, T ↗ ⇒ n ↗, h ↗ ⇒ n ↗, μ ↗ ⇒ n ↘, μ ↗ ⇒ n/T ↘

where the symbols ↗ and ↘ stand for strictly increasing and strictly decreasing, respectively. The most remarkable result is that when h increases, n also increases, which follows from the fact that, empirically, n and h are both increasing functions of T.

These results are partially confirmed in the theoretical section. We explicitly acknowledge the concern of one referee that the law of Lotka does not contain the zero-citation case and that it cannot be used to explain the experimental findings in this paper on never-cited publications. Although we evidently agree that the law of Lotka does not contain the zero-citation case, we do not agree that the law of Lotka cannot be used here. Lotka's law starts from one citation and hence includes the lowly cited publications; in these cases, the law of Lotka has high values (there are many publications with 1, 2, … citations), just like the case of zero citations (there are many publications with zero citations). In this article, we use regularities of the law of Lotka for lowly cited publications to provide a heuristic partial explanation of the empirical results obtained for uncited publications. The fact that the approximation can partially explain several empirical regularities obtained in this paper illustrates that the approximation is not without merit. We agree that it would be interesting to explore how the indicators studied here are interrelated in a framework that is based on a distribution that does include the 0, such as a Lomax or Pareto type 2 distribution. We leave this as an open problem.

Our results do not explain in a concrete way why Nobel laureates and Fields medalists write relatively many publications that are never-cited. We only show that their publication-citation pattern follows the usual distributional lines as, e.g., explained by Lotka's law. Further explanations are needed, which would require the help of experts in the respective fields and/or of Nobel laureates themselves.

Another interesting idea would be to focus on highly cited publications instead of lowly cited ones, and their relation with some of the indicators used here. Of course, one could also study other scientific fields besides the ones we explored.

Finally, it would be interesting to compare the top scientists studied here with “average” ones to discover how general the phenomena studied are. It seems likely that the number and fraction of never-cited publications of an average scientist are at least as high as those of a Nobel laureate or Fields medalist. Indeed, as recently remarked by Danell (2011, p. 51): “Highly cited authors tend to write the highly cited articles, but all authors can write uncited articles.”

References

· Burrell, Q.L. (2002). Will this paper ever be cited? Journal of the American Society for Information Science and Technology, 53(3), 232–235.

· Burrell, Q.L. (2005). Are “sleeping beauties” to be expected? Scientometrics, 65(3), 381–389.

· Danell, R. (2011). Can the quality of scientific work be predicted using information on the author's track record? Journal of the American Society for Information Science and Technology, 62(1), 50–60.

· Egghe, L. (2005). Power Laws in the information production process: Lotkaian informetrics. Oxford (UK): Elsevier.

· Egghe, L., & Rousseau, R. (2006). An informetric model for the Hirsch index. Scientometrics, 69(1), 121–129.

· Garfield, E. (1980). Premature discovery or delayed recognition – Why? Current Contents, #21 (pp. 5–10). Retrieved from http://garfield.library.upenn.edu/essays/v4p488y1979-80.pdf

· Glänzel, W., Debackere, K., Thijs, B., & Schubert, A. (2006). A concise review on the role of author self-citations in information science, bibliometrics and science policy. Scientometrics, 67(2), 263–277.

· Glänzel, W., Schlemmer, B., & Thijs, B. (2003). Better late than never? On the chance to become highly cited only beyond the standard bibliometric time horizon. Scientometrics, 58(3), 571–586.

· Hirsch, J.E. (2005). An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102(46), 16569–16572.

· Liu, YX., Ravichandra Rao, I.K., & Rousseau, R. (2009). Empirical series of journal h-indices: The JCR category Horticulture as a case study. Scientometrics, 80(1), 59–74.