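Wu Yishan's Blog http://blog.sciencenet.cn/u/Wuyishan Research Fellow, Chinese Academy of Science and Technology for Development; doctoral supervisor, Department of Information Management, Nanjing University

Even Nobel Laureates' Papers Are Not Always Cited

8927 views 2011-11-19 07:35 | Personal category: Scientometrics research | System category: Research notes | Papers

Wu Yishan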

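Issue 8 (2011) of JASIST carries an article by the well-known Belgian scientometrician and Derek de Solla Price Medal winner Leo Egghe and his colleagues: Thoughts on Uncitedness: Nobel Laureates and Fields Medalists as Case Studies.

They studied citation data for the papers of 75 Nobel laureates and Fields medalists in total and found that, even for this scientific elite, more than 10% of the papers have never been cited. They also examined how various indicators correlate with the h-index and obtained an interesting result: the h-index of these elite scientists is highly correlated with their number of uncited publications!

They also tried to use Lotka's law to give a partial explanation of the fact that a sizable share of the papers of elite scientists goes uncited.

In an article published in issue 1 (2011) of JASIST, R. Danell made a remark that sounds almost like a truism:

“Highly cited authors tend to write the highly cited articles, but all authors can write uncited articles.”

“All authors” of course includes scientific elites such as Nobel laureates and Fields medalists.

The original text follows (http://onlinelibrary.wiley.com/doi/10.1002/asi.21557/full):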

Abstract


Contrary to what one might expect, Nobel laureates and Fields medalists have a rather large fraction (10% or more) of uncited publications. This is the case for (in total) 75 examined researchers from the fields of mathematics (Fields medalists), physics, chemistry, and physiology or medicine (Nobel laureates). We study several indicators for these researchers, including the h-index, total number of publications, average number of citations per publication, the number (and fraction) of uncited publications, and their interrelations. The most remarkable result is a positive correlation between the h-index and the number of uncited articles. We also present a Lotkaian model, which partially explains the empirically found regularities.


Introduction


We were surprised to notice that a winner of the Fields medal (the highest award in mathematics, named after John Charles Fields and awarded by the International Mathematical Union) had a non-negligible percentage of uncited publications. Checking some other awardees revealed that this was not an exception. For some of them, this percentage turned out to be more than 10%, excluding editorials or book reviews. This took us by surprise because we expected that top scientists, and especially mathematicians, write only publications dealing with difficult and important problems, which, once solved, lead to high quality and, hence, highly cited publications. Even assuming that a publication by such a visible author is not his best work, one expects that, because of his status, this work would not be ignored. In other words, one expects the Matthew Effect to come into play. This observation led to an investigation of uncitedness among outstanding researchers. More particularly, we looked at Fields medalists and Nobel Prize winners.

We investigated the field of mathematics (Fields medal winners: in principle four winners every 4 years) and Nobel Prize winners in the fields of physics, chemistry and physiology or medicine (one, two, or three winners a year). It became immediately clear that even among these eminent scientists having many uncited publications was quite common. In most cases, we found more than 10% uncited publications, and this over all studied fields. Of course, an uncited publication may, in principle, gain citations at some time in the future. This is related to the phenomenon of Delayed Recognition (Garfield, 1980; Glänzel, Schlemmer, & Thijs, 2003). Publications that remain uncited for a prolonged period of time and subsequently receive several citations are known as Sleeping Beauties (van Raan, 2004; Burrell, 2005). In other words, such an as yet uncited publication is not necessarily a “never” cited publication; see the methodological part in the next section.

This phenomenon (uncited publications of top researchers) has apparently not yet been addressed in the literature, although Glänzel, Debackere, Thijs, & Schubert (2006, p. 267) note in passing: “The fact that a document is less frequently cited or even (still) uncited several years after publication provides information about its reception by colleagues but does not reveal anything about its quality or the standing of its author(s) in the community. Uncited papers by Nobel Prize winners may just serve as an example.”

Data were collected during the period October-November 2010 using Thomson Reuters' Web of Science (WoS). While collecting the total number of publications and the number of uncited publications, we also collected the readily available average number of citations per publication and the author's h-index (Hirsch, 2005).

Next, we studied the scatter plots resulting from the relations between any two of the above-mentioned indicators. We especially looked for increasing or decreasing relationships. This revealed (details are given in the next section) an increasing relationship between the total number of publications and the h-index, the h-index and the average number of citations per publication, between the number of uncited publications and the total number of publications, and between the number of uncited publications and the h-index. Decreasing relationships were found between the number of uncited publications and the average number of citations per publication and between the fraction of uncited publications and the average number of citations per publication.

In the third section, we examine how these indicators are related in a continuous Lotkaian system. The Lotka system can be considered a first approximation of the reality that publications are cited in a very skewed way. In this setting, one can prove most of the decreasing and increasing relationships that were found empirically (sometimes needing an extra condition). This yields a partial explanation (“partial,” because we assume Lotkaian systems) of the above findings. Although the Lotkaian system has some drawbacks (it lacks the 0 as frequency and is only an approximation of reality), we believe that it is a good approximation that still allows for a heuristic explanation of the relations found: The mathematics of more intricate models quickly grows too complicated.

This article ends with conclusions, open problems, and other suggestions for further research.


Methodology


We obtained the list of recent Fields medalists from the website of the International Mathematical Union (http://www.mathunion.org/general/prizes/fields/prizewinners/) and recent Nobel laureates in physics, chemistry, and physiology or medicine from the website of the Nobel Prize committee (http://nobelprize.org/). We restricted the number of laureates to between 15 and 26 per discipline to find clear relations between any of the two indicators mentioned above and to obtain manageable clouds of points. Bringing all fields together already yields clouds of 75 points. In practice, we included 18 Fields medalists (mathematics) between 1990 and 2006, 16 Nobel laureates in physics between 2004 and 2010, 15 Nobel laureates in chemistry between 2004 and 2010, and 26 Nobel laureates in physiology or medicine between 1999 and 2010. Scientists with very common names were not included as it was too difficult to collect correct data. We consider it acceptable to delete a few names and think that this does not bias the data set that we used. Citation data collection took place during the period October-November 2010 from WoS (including proceedings).

Using the advanced search facility, queries were performed as follows. First, we did a search on the author's name followed by the first initial and an asterisk. This often revealed a second initial so that a specific query could be made. If such a specific query was possible, we searched for the author with one or two initials (no asterisk anymore), otherwise, we just continued with the original result. The result was then limited to “possible” subfield categories. This list was then “analyzed” for institutes. The result showed the institute or institutes where the scientist worked during his or her career and, more important, revealed homonyms working in the same field. These were deleted.

The following data were collected:

·            T1: the total number of publications

·            n1: the total number of uncited publications

·            μ: the average number of citations per publication

·            h: the h-index
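The four indicators listed above can be computed from a plain list of per-publication citation counts. The following is a minimal sketch, not the authors' procedure (they read these values off the WoS citation report); the function names and the sample counts are hypothetical and chosen only for illustration.

```python
from statistics import mean


def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def indicators(citations):
    """Return T1, n1, mu and h for one author's list of citation counts."""
    return {
        "T1": len(citations),                       # total number of publications
        "n1": sum(1 for c in citations if c == 0),  # uncited publications
        "mu": mean(citations) if citations else 0.0,  # citations per publication
        "h": h_index(citations),
    }


# Hypothetical citation counts for one author (not data from the article).
example = [25, 12, 8, 5, 3, 1, 0, 0]
result = indicators(example)
print(result)
```

Note that the two uncited publications lower μ but leave h untouched, which is the point the article returns to when discussing Figure 6.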

The average number of citations per publication was obtained from the WoS citation report. The number n1 (the total number of uncited publications) may include many recent publications. Further on, we will, however, consider only uncited publications published before the year 2006, because it is very unlikely that they will gain any citations later on. These publications will be called “never-cited” publications.

Of course, no one can be absolutely certain that these will never be cited, but the time period used (2005 or older) guarantees that most of these publications will indeed be never cited. To verify this claim, we examined all publications (n=332) published in 1990 in the five journals that were ranked highest in the Journal Citation Reports category Biology (ranked according to the impact factor). It was found that 70 publications had not yet been cited after 5 years; of these, only four (5.7%) gained citations in later years. Moreover, their citation numbers are rather low: 1, 1, 2, and 8, respectively (in January 2011). This small case study illustrates how extraordinary it is for publications to gain a first citation after more than 5 years. Indeed, as shown theoretically by Burrell (2002), the longer a publication remains uncited, the less likely it is to ever gain a citation. We therefore conclude that virtually all “never-cited” publications will indeed never be cited.

The never-cited publications constitute the real objective of our study. Therefore, we also collected the following data, by appropriately limiting publication years:

·            T2: the total number of publications published strictly before 2006

·            n2: the total number of never-cited publications, i.e., those publications included in T2 that were uncited at the moment of data collection.

We will study not only the never-cited indicators n2 and n2/T2 (the fraction of never-cited publications) in relation to the other indicators but also the relations between h, T1, T2 and μ. Note that we do not use versions of h or μ that are restricted to publications before 2006. We found out that the results are the same qualitatively and even quantitatively as there is almost no difference between the two “μ”-versions and the h-index just stays unchanged in almost all cases (because recent publications usually do not contribute to the h-index). Results are described in the next section.
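The restriction to publications from before 2006 can be sketched as a simple filter on (year, citations) pairs. This is an illustrative sketch under the assumption that each publication is represented by its publication year and current citation count; the function name, the record layout, and the sample records are hypothetical.

```python
def never_cited_indicators(records, cutoff_year=2006):
    """Compute T2, n2 and n2/T2 from (publication_year, citations) pairs.

    Only publications published strictly before cutoff_year are counted.
    """
    old = [cites for year, cites in records if year < cutoff_year]
    t2 = len(old)
    n2 = sum(1 for cites in old if cites == 0)
    fraction = n2 / t2 if t2 else 0.0
    return t2, n2, fraction


# Hypothetical records (not data from the article).
records = [(1998, 40), (2001, 0), (2003, 7), (2005, 0), (2006, 0), (2009, 2)]
t2, n2, fraction = never_cited_indicators(records)
print(t2, n2, fraction)
```

The uncited 2006 paper in the sample is deliberately excluded: it counts toward n1 but not toward n2, because it may still pick up a first citation.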


Empirical Results


Figure 1 depicts the relation between h and T2. The corresponding scatter plot relating h and T1 is basically identical and hence not shown. For all fields these plots show an increasing relationship and a roughly concave shape. They correspond to the results shown in Liu, Rao, & Rousseau (2009) for horticulture journals. In the next section we will show how this shape can be explained assuming a Lotkaian distribution.

Figure 1. Empirical relation between T2 (horizontal axis) and h (vertical axis). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]


The relation between h and μ is depicted in Figure 2. Although Figure 2 is more scattered than the previous figure, we can still see the (expected) increasing relation between h and μ. This will be partially explained in the next section.

Figure 2. Empirical relation between h (vertical axis) and μ (horizontal axis). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]


Now we come to the main topic of this contribution, namely, uncitedness, or, better, never-citedness. Figure 3 shows box plots of the relative number of uncited publications per field. These illustrate that the median share of uncited papers lies around 10% to 20%. We now turn to the relations between uncitedness and other indicators. Figure 4 clearly shows the increasing relation between n2 (total number of never-cited publications) and T2 (total number of publications published strictly before 2006). This is an interesting result: The more publications an author has, the more never-cited publications (in general). This scientometric observation, which we will partially explain in the next section, provides a rationale for the fact that even highly visible scientists, such as Nobel laureates, can have many never-cited publications. If we assume that some percentage of all publications will remain uncited, then an increasing relation between number of never-cited publications and total number of publications automatically follows. Of course, this is an explanation based solely on descriptive statistics. A complete explanation would have to take the content and potential impact of these never-cited publications into account; this is, however, beyond the scope of the present article.

Figure 3. Box plot of relative uncitedness (n2/T2) per field (a: mathematics, b: chemistry, c: physics, d: physiology or medicine). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]


Figure 4. Empirical relation between n2 (vertical axis) and T2 (horizontal axis). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]


Figure 5 shows the relation between n2 and μ. Although quite scattered, we may infer, at least visually, that this relation is decreasing. This visual observation is, however, not confirmed statistically, because the correlation coefficient is not significantly different from zero. So, we must admit that a decreasing trend is weak at best. Yet, on logical grounds, such a decreasing trend seems to be expected: The higher the (absolute) number of never-cited publications, the lower the average number of citations per publication (in general). In the next section, it will be shown that a decreasing relation is also expected in the Lotkaian model.
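Whether a scatter plot's visual trend is statistically supported is usually checked with a correlation coefficient and its test statistic. The sketch below assumes a Pearson correlation (the article does not say which coefficient was used) and computes the usual t statistic with n − 2 degrees of freedom, which would then be compared against a t table; the data are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom.

    Undefined for |r| == 1 (division by zero), so callers should guard for that.
    """
    return r * math.sqrt((n - 2) / (1 - r * r))


# Made-up toy values, not the article's data.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)
t = t_statistic(r, len(xs))
print(round(r, 3), round(t, 3))
```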

Figure 5. An empirical relation between μ (vertical axis) and n2 (horizontal axis). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]


Figure 6 depicts an increasing relation between n2 and h. At first sight, this might be surprising because an increasing h-index implies that publications are more cited and this should decrease n2. However, this is not the case. Uncited publications never contribute to the h-index; hence, there exists no direct relationship between the h-index and the number of uncited publications. The increasing relationship arises because both indicators are correlated with T2: there exists an increasing relation between h and T2 (Figure 1) and an increasing relation between n2 and T2 (Figure 4). Hence, based on these facts, the increasing relation shown in Figure 6 is not surprising. Further explanations will be given in the sequel.
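The argument that h and the number of never-cited papers rise together only because both rise with the publication count can be illustrated with a deliberately artificial model. In the sketch below every tenth paper is uncited and the cited papers receive `total_pubs // rank` citations; both rules are hypothetical assumptions made for illustration and are not the article's data or its Lotkaian model.

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def toy_author(total_pubs, uncited_every=10):
    """Return (n, h) for an artificial author with total_pubs publications.

    Assumption: one in every `uncited_every` papers is never cited, and the
    remaining papers get total_pubs // rank citations (a Zipf-like pattern).
    """
    n = total_pubs // uncited_every
    cited = total_pubs - n
    citations = [total_pubs // rank for rank in range(1, cited + 1)] + [0] * n
    return n, h_index(citations)


rows = [(size,) + toy_author(size) for size in (50, 100, 200, 400)]
for total_pubs, n, h in rows:
    print(total_pubs, n, h)
```

In this toy setting n and h increase together across authors even though no uncited paper ever contributes to h; the shared driver is the number of publications.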

Figure 6. An empirical relation between n2 (vertical axis) and h (horizontal axis). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]


We further investigated the relation between the fractions of never-cited publications (fraction with respect to T2): n2/T2 and T2, n2, μ and h. Only weak relations could be found. We show only (Figure 7) the cloud of points depicting the relation between n2/T2 and μ. This shows a (weak) decreasing relation, which will also be proved in the next section. A referee suggested also showing the relation between n2/T2 and n2. However, the resulting cloud of points is very scattered, and no conclusions could be drawn from it. Hence, it is not included.

Figure 7. Empirical relation between n2/T2 (vertical axis) and μ (horizontal axis). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]



Theory


In this section, we will show that several empirical results of the previous section can also be found theoretically in a continuous Lotkaian framework. Here, we do not make a distinction between “old” and “all” publications; a similar remark holds for citedness. We use the following notations:

·            T: total number of publications

·            μ: average number of citations per publication

·            h: h-index

·            n: total number of noncited publications

We work in a Lotkaian framework, where the density of publications with citation density j is given by

  • $f(j) = \dfrac{C}{j^{\alpha}}$ (1)

with C > 0, α > 1, j ≥ 1. It is well-known that

  • $T = \dfrac{C}{\alpha - 1}$ (2)

(see e.g., Egghe, 2005), and if α > 2,

  • $A = \dfrac{C}{\alpha - 2}$ (3)

where A denotes the total number of citations received by these T publications. Hence,

  • $\mu = \dfrac{A}{T} = \dfrac{\alpha - 1}{\alpha - 2}$ (4)

or, equivalently:

  • $\alpha = \dfrac{2\mu - 1}{\mu - 1}$ (5)
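The relation between the Lotka exponent and the average number of citations can be checked numerically. The sketch below assumes the standard continuous Lotka density f(j) = C / j^α on j ≥ 1, for which the total number of publications is C/(α − 1) and, when α > 2, the total number of citations is C/(α − 2); the function names are hypothetical.

```python
def lotka_totals(c, alpha):
    """Return (T, A): total publications and total citations.

    Assumes the continuous Lotka density f(j) = c / j**alpha for j >= 1.
    A is finite only when alpha > 2.
    """
    if alpha <= 2:
        raise ValueError("alpha must exceed 2 for a finite citation total")
    return c / (alpha - 1), c / (alpha - 2)


def mu_from_alpha(alpha):
    """Average citations per publication, mu = A / T."""
    return (alpha - 1) / (alpha - 2)


def alpha_from_mu(mu):
    """Inverse relation: recover the Lotka exponent from mu (needs mu > 1)."""
    if mu <= 1:
        raise ValueError("mu must exceed 1")
    return (2 * mu - 1) / (mu - 1)


total_pubs, total_cites = lotka_totals(100, 3)
print(total_pubs, total_cites, mu_from_alpha(3), alpha_from_mu(2))
```

A heavier tail (α closer to 2) gives a larger μ, and μ approaches 1 as α grows, which is why the propositions are stated for μ > 1.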

In Egghe and Rousseau (2006), we proved the following formula for the h-index:

  • $h = T^{1/\alpha}$ (6)

This formula corresponds to a concave function, which explains the shape we empirically found in Figure 1.

Combining Equations 5 and 6 yields:

  • $h = T^{\frac{\mu - 1}{2\mu - 1}}$ (7)

Equation 7 connects the three indicators T, μ and h. This leads to a partial explanation of Figures 1 and 2, given in Proposition 1.

Proposition 1. If μ and T are strictly increasing then h is strictly increasing. The same conclusion holds when one of the two parameters is constant and the other one is strictly increasing.

Proof. This result readily follows from Equation 7 and the fact that $\frac{\mu - 1}{2\mu - 1}$ is, for μ > 1 (see Equation 4), a strictly increasing, positive function of μ.
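The monotonicity step in this proof can be written out explicitly. The derivation below is a sketch that assumes the model h = T^{1/α} of Egghe and Rousseau (2006) with α = (2μ − 1)/(μ − 1), and additionally assumes T > 1 so that ln T is positive; the name g for the exponent is introduced here only for convenience.

```latex
\begin{align*}
g(\mu) &= \frac{\mu - 1}{2\mu - 1}, \qquad
g'(\mu) = \frac{(2\mu - 1) - 2(\mu - 1)}{(2\mu - 1)^{2}}
        = \frac{1}{(2\mu - 1)^{2}} > 0, \\
h &= T^{g(\mu)}, \qquad
\frac{\partial h}{\partial T} = g(\mu)\, T^{g(\mu) - 1} > 0, \qquad
\frac{\partial h}{\partial \mu} = T^{g(\mu)} \ln(T)\, g'(\mu) > 0
\quad (\mu > 1,\ T > 1).
\end{align*}
```

Both partial derivatives are positive, so h grows whenever T or μ grows while the other does not shrink.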

Similarly, we have Proposition 2.

Proposition 2. If μ and T are strictly decreasing then h is strictly decreasing. The same conclusion holds when one of the two parameters is constant and the other one is strictly decreasing.

This is proved in the same way as Proposition 1. Details are left to the reader.

We approximate the number of noncited papers, denoted as n, with the expression:

  • $n = \int_{1}^{2} f(j)\,dj$ (8)

A referee remarked that Equation 8, in fact, approximates the number of papers with one citation rather than zero citations. This is correct because j ≥ 1: Lotkaian systems do not include the sources that produce 0 items (here, the uncited or never-cited papers). The theoretical analysis here is thus concerned with lowly cited rather than uncited publications. The main reason for using a Lotkaian system instead of other distributions that do include the zero is simplicity. We admit that Equation 8 is, at best, a crude approximation, but our approach has the advantage of yielding relatively simple formulae (see also the final section).

Alternatively, in the empirical part, we could have studied the total number of lowly cited publications, instead of the noncited ones. By lowly cited publications we mean those with at most one citation (or at most 2 citations). We are convinced that for lowly cited publications very similar graphs would have emerged.

Equation 8 boils down to calculating:

  • $\int_{0}^{1} \dfrac{C}{(j + 1)^{\alpha}}\,dj$ (9)

now for j ≥ 0. It is easily seen that

  • $n = \dfrac{C}{\alpha - 1}\left(1 - 2^{1 - \alpha}\right)$ (10)

But, by (2), we have:

  • $n = T\left(1 - 2^{1 - \alpha}\right)$ (11)

Using Equation 5, Equation 11 becomes:

  • $n = T\left(1 - 2^{-\frac{\mu}{\mu - 1}}\right)$ (12)

which shows the connection between the three indicators T, μ and n. Finally,

  • $\dfrac{n}{T} = 1 - 2^{-\frac{\mu}{\mu - 1}}$ (13)

which brings us to Proposition 3.

Proposition 3. If μ is strictly increasing, then n/T is strictly decreasing.

Proof. This result follows from Equation 13 because $\frac{\mu}{\mu - 1}$ is positive and strictly decreasing for μ > 1.
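Proposition 3 can be checked numerically under one concrete reading of the model. The sketch below assumes that the low-cited share is approximated by the integral of the Lotka density C / j^α over 1 ≤ j ≤ 2 divided by T, which gives 1 − 2^(1 − α), or 1 − 2^(−μ/(μ − 1)) in terms of μ; this closed form and the function name are assumptions made for illustration.

```python
def low_cited_fraction(mu):
    """Model share n/T of low-cited papers as a function of mu (mu > 1).

    Assumption: n is the integral of c / j**alpha over 1 <= j <= 2 and
    alpha = (2*mu - 1) / (mu - 1), so n/T = 1 - 2**(-mu / (mu - 1)).
    """
    if mu <= 1:
        raise ValueError("mu must exceed 1")
    return 1 - 2 ** (-mu / (mu - 1))


values = [low_cited_fraction(mu) for mu in (2, 3, 5, 10)]
print([round(v, 3) for v in values])
```

Under this assumption the share falls as μ rises and approaches 1/2 from above. The absolute level is much higher than the 10% to 20% observed empirically, which is consistent with the approximation capturing low-cited rather than uncited papers and being useful only qualitatively.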

Proposition 3 explains the decreasing trend observed in Figure 7. We formulate two other propositions.

Proposition 4. If μ is strictly increasing and T is strictly decreasing then n is strictly decreasing. If one of the two variables (μ or T) is constant, then the same conclusion holds.

Proposition 5. If μ is strictly decreasing and T is strictly increasing then n is strictly increasing. If one of the two variables (μ or T) is constant, then the same conclusion holds.

The proofs follow along the same lines as the proof of Proposition 3, using Equation 12.

Proposition 5 partially explains Figure 4. The next proposition partially explains Figures 5 and 6.

Proposition 6. If μ is strictly decreasing and h is strictly increasing, then n is strictly increasing. If one of the two variables (μ or h) is constant, then the same conclusion holds.

Proof. By Equation 7 and because $\frac{\mu - 1}{2\mu - 1}$ is a strictly increasing, positive function of μ, we see that T is strictly increasing. Combining this result with Proposition 5 yields that n is strictly increasing.

Clearly, other relationships can be found, but as we do not need them in this article, this is left to the reader.


Conclusions and Open Problems


This article shows that the group of Fields medalists (mathematics) and Nobel Prize laureates in physics, chemistry, and physiology or medicine have a sizable fraction of never-cited publications. Although, in the present article, we have not investigated “average” scientists, we hypothesize that their fraction of never-cited publications is of similar proportions (further research should confirm this hypothesis). In this regard, top scientists do not seem exceptional. A similar observation can be made for other relations between the number of publications, the average number of citations per publication and the h-index. Empirically we found (omitting indices for reasons of simplicity):

where the symbols ↗ and ↘ stand for strictly increasing and strictly decreasing, respectively. The most remarkable result is that n increases when h increases, which happens because, empirically, both n and h are increasing functions of T.

These results are partially confirmed in the theoretical section. We explicitly acknowledge the concern of one referee that the law of Lotka does not contain the zero-citation case and that it cannot be used to explain the experimental findings in this paper on never-cited publications. Although we evidently agree that the law of Lotka does not contain the zero-citation case, we do not agree that the law of Lotka cannot be used here. Lotka's law starts from one citation and hence includes the lowly cited publications; in these cases, the law of Lotka has high values (there are many publications with 1, 2, … citations), just like the case of zero citations (there are many publications with zero citations). In this article, we use regularities of the law of Lotka for lowly cited publications to provide a heuristic partial explanation of the empirical results obtained for uncited publications. The fact that the approximation can partially explain several empirical regularities obtained in this paper illustrates that the approximation is not without merit. We agree that it would be interesting to explore how the indicators studied here are interrelated in a framework that is based on a distribution that does include the 0, such as a Lomax or Pareto type 2 distribution. We leave this as an open problem.

Our results do not explain in a concrete way why Nobel laureates and Fields medalists write relatively many publications that are never-cited. We only show that their publication-citation pattern follows the usual distributional lines as, e.g., explained by Lotka's law. Further explanations are needed, which would require the help of experts in the respective fields and/or of Nobel laureates themselves.

Another interesting idea would be to focus on highly cited publications instead of lowly cited ones, and their relation with some of the indicators used here. Of course, one could also study other scientific fields besides the ones we explored.

Finally, it would be interesting to compare the top scientists studied here with “average” ones to discover how general the phenomena studied are. It seems likely that the number and fraction of never-cited publications of an average scientist is at least as high as those of a Nobel laureate and Fields medalist. Indeed, as recently remarked by Danell (2011, p. 51): “Highly cited authors tend to write the highly cited articles, but all authors can write uncited articles.”


References


·          Burrell, Q.L. (2002). Will this paper ever be cited? Journal of the American Society for Information Science and Technology, 53(3), 232–235.


·          Burrell, Q.L. (2005). Are “sleeping beauties” to be expected? Scientometrics, 65(3), 381–389.


·          Danell, R. (2011). Can the quality of scientific work be predicted using information on the author's track record? Journal of the American Society for Information Science and Technology, 62(1), 50–60.


·          Egghe, L. (2005). Power Laws in the information production process: Lotkaian informetrics. Oxford (UK): Elsevier.

·          Egghe, L., & Rousseau, R. (2006). An informetric model for the Hirsch index. Scientometrics, 69(1), 121–129.


·          Garfield, E. (1980). Premature discovery or delayed recognition – Why? Current Contents, #21 (pp. 5–10). Retrieved from http://garfield.library.upenn.edu/essays/v4p488y1979-80.pdf

·          Glänzel, W., Debackere, K., Thijs, B., & Schubert, A. (2006). A concise review on the role of author self-citations in information science, bibliometrics and science policy. Scientometrics, 67(2), 263–277.

·          Glänzel, W., Schlemmer, B., & Thijs, B. (2003). Better late than never? On the chance to become highly cited only beyond the standard bibliometric time horizon. Scientometrics, 58(3), 571–586.

·          Hirsch, J.E. (2005). An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102(46), 16569–16572.


·          Liu, YX., Ravichandra Rao, I.K., & Rousseau, R. (2009). Empirical series of journal h-indices: The JCR category Horticulture as a case study. Scientometrics, 80(1), 59–74.


·          van Raan, A.F.J. (2004). Sleeping beauties in science. Scientometrics, 59(3), 461–466.




https://blog.sciencenet.cn/blog-1557-509533.html
