wl2119's personal blog http://blog.sciencenet.cn/u/wl2119


What Is a Citation

Views: 5927 | 2014-6-12 23:22 | Personal category: Theoretical foundations of bibliometrics | System category: Research notes | Tags: impact factor, citation, self-citation

I. Definitions of citation in information science and bibliometrics

Susan Cozzens (1989): "citation is only secondarily a reward system".

Linda C. Smith: "citations are signposts left behind after information has been utilized".

Blaise Cronin defined citations as "frozen footprints in the landscape of scholarly achievement … which bear witness to the passage of ideas".

Cronin also pointed to a problem: “If authors can be educated as to the informational role of citations and encouraged to be more restrained and selective in their referencing habits, then it should be possible to arrive at a greater consistency in referencing practice generally.” In other words, referencing habits still need to be standardised.

Chapman delineated 25 shortcomings, biases, deficiencies, and limitations of citation analysis.

Wouters (1997) devoted a large monograph to citation culture, and in 1998 Leydesdorff initiated the discussion about a reappraisal of existing theories of citation.

Westney (1998) holds that citations are nevertheless indicators of scholarly impact: “Despite its flaws, citation analysis has demonstrated its reliability and usefulness as a tool for ranking and evaluating scholars and their publications. No other methodology permits such precise identification of the individuals who have influenced thought, theory, and practice in world science and technology.”

Glänzel and Schoepflin (1999) interpret citations more pragmatically as “one important form of use of scientific information within the framework of documented science communication,” “a formalised account of the information use and can be taken as a strong indicator of reception at this level.”

Weinstock (1971) listed 15 reasons for citing:

1.  Paying homage to pioneers

2.  Giving credit for related work (homage to peers)

3.  Identifying methodology, equipment, etc.

4.  Providing background reading

5.  Correcting one's own work

6.  Correcting the work of others

7.  Criticising previous work

8.  Substantiating claims

9.  Alerting to forthcoming work

10.  Providing leads to poorly disseminated, poorly indexed, or uncited work

11.  Authenticating data and classes of facts (physical constants, etc.)

12.  Identifying original publications in which an idea or concept was discussed

13.  Identifying original publications or other work describing an eponymic concept or term

14.  Disclaiming work or ideas of others (negative claim)

15.  Disputing priority claims of others (negative homage)

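Garfield and Weinstock pointed out that a citation does not only signal a positive role; it can also carry a neutral or negative meaning. The most common negative citation is one that overturns or refutes earlier findings.

Garfield gives five reasons why a paper is not cited: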

1. The first and most important one is lacking relevance of the topic. Irrelevant topics are obviously not cited.

2. Unawareness is the second reason; it is due to insufficient retrieval of published information relevant for an author's research work. Citations omitted by reason of unawareness are sometimes added by referees reviewing papers prior to acceptance for publication in a journal. According to Garfield (1986), citations missed through unawareness have little to do with the bibliographic amnesia he described.

3. Disregard is simply a reason that is already beyond the borderline to unethical communication behaviour. Results by colleagues relevant for the author's research to be published are this way demonstratively ignored.

4. The fourth reason is a consequence of obsolescence as an expression of ‘natural’ obliteration.

5. Finally, the fifth reason occurs rather seldom in scientific literature, as it is an expression of the disappearance of the ‘users’ of information. In other words, literature still being relevant in the context of the research topic is not cited because there are no more authors who could cite it. We can consider such topics extinct.

“if a paper receives 5 or 10 citations a year throughout several years after its publication, it is very likely that its content will become integrated into the body of knowledge of the respective subject field; if, on the other hand, no reference is made at all to the paper during 5 to 10 years after publication, it is likely that the results involved do not contribute essentially to the contemporary scientific paradigm system of the subject field in question” (Braun et al., 1985).

In short (Braun et al., 1985): a paper cited 5 or 10 times a year for several years after publication has had its content absorbed into the field's body of knowledge; conversely, a paper that goes uncited for 5 to 10 years after publication has contributed essentially nothing to research in the field.

II. Self-citation

Self-citations come in two kinds: author self-citations and journal self-citations.

Journal self-citation occurs if a paper published in a given journal is cited by a paper published in the same journal. A low share of journal self-citations (for instance, < 10%) is, for example, characteristic of review journals (see Schubert and Braun, 1993).

Glänzel and Schoepflin (1995): the ageing of journal self-citations can differ significantly from that of “foreign” citations.

That is, journal self-citations decay at a markedly different rate from citations overall: the overall citation curve is relatively flat, whereas journal self-citations show a pronounced peak.

Author self-citation occurs if an author refers to his or her own paper, that is, if he or she was the author or one of the co-authors of the cited paper. The spectrum of these self-citations ranges from the obvious case, if an author refers to his own work, to more hidden forms, if the co-authors of the author in question are citing him or themselves in another paper.

Bonzi and Snyder (1991) and Nederhof et al. (1993) regard self-citation as perfectly natural.

MacRoberts and MacRoberts gave a first overview of the unsolved problem of self-citations in their critical review on problems of citation analysis in 1989.

Snyder and Bonzi (1998) and Aksnes (2002): a self-citation occurs whenever the set of co-authors of the citing paper and that of the cited one are not disjoint, that is, if these sets share at least one author.
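The set-based definition above translates almost directly into code. The following is only a minimal sketch: the function names are hypothetical, and matching authors by a normalised name string is an assumption made for illustration (real bibliographic data would need proper author disambiguation).

```python
def is_self_citation(citing_authors, cited_authors):
    """Return True if the two author sets are not disjoint.

    Assumption: authors are identified by name strings, compared
    case-insensitively after trimming whitespace.
    """
    citing = {name.strip().lower() for name in citing_authors}
    cited = {name.strip().lower() for name in cited_authors}
    return not citing.isdisjoint(cited)


def self_citation_share(citing_papers, cited_authors):
    """Fraction of citing papers that are self-citations of one cited paper.

    `citing_papers` is a list of author lists, one per citing paper.
    """
    if not citing_papers:
        return 0.0
    hits = sum(is_self_citation(authors, cited_authors) for authors in citing_papers)
    return hits / len(citing_papers)
```

For example, `is_self_citation(["A. Smith", "B. Lee"], ["b. lee", "C. Wu"])` is true because the two papers share one co-author, which also covers the "hidden" case where a co-author rather than the first author does the citing.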

Glänzel et al. (2003): self-citations and foreign citations proved not to be independent variables. Moreover, the conditional expectation of self-citations for a given number of foreign citations could be characterised by a square-root law; that is, the expected number of self-citations grows roughly with the square root of the number of foreign citations.

The research also shows that low visibility goes with high self-citation shares.

 


Glänzel et al. (2003) also studied the distributions of author self-citations and foreign citations and how each ages over time; self-citations were found to decay slowly.

III. Factors influencing citation impact

The reference list of a paper can be used to tell how "hard" or "soft" its field is.

Price (1970) used the share of references not older than five years in all references of a journal to distinguish between hard science, soft science, technology, and non-science (Price Index). Moed redefined this index in 1989 as the share of most recent references for individual papers (Price Index per paper).
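As a rough illustration of the Price Index, the sketch below computes the share of recent references for one reference list. The function name is hypothetical, and measuring a reference's age as the citing year minus the reference's publication year is an assumption; the same function could be fed the pooled references of a journal (Price's version) or those of a single paper (Moed's per-paper version).

```python
def price_index(citing_year, reference_years, max_age=5):
    """Share of references that are at most `max_age` years old.

    Assumption: age = citing_year - publication year of the reference.
    """
    if not reference_years:
        raise ValueError("no references given")
    recent = sum(1 for year in reference_years if citing_year - year <= max_age)
    return recent / len(reference_years)
```

With a 2014 paper citing works from 2013, 2010, 2009, 2008 and 2000, three of the five references are at most five years old, so the index is 0.6.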

Line (1981) analysed the structure of social science literature and investigated what makes social science different.

Egghe (1997) and Glänzel and Schoepflin (2001) have published further studies on these topics.

Glänzel and Schoepflin (1999): the percentage of references to serials characterises typical differences in the communication behaviour in the sciences, social sciences and humanities. In the social sciences and, even more, in the humanities a considerable part of cited information originates in non-science literature. In the case of engineering, the target is in part outside the scientific community; information is used, e.g., for the advancement of technology.

Citation impact is mainly influenced by the following five factors:

1. the subject matter and, within the subject, the “level of abstraction”

2. the paper’s age

3. the paper’s “social status” (through the author(s) and the journal)

4. the document type

5. the observation period

Citation patterns are strongly influenced by subject characteristics. Without normalisation, citation measures are therefore not appropriate for cross-field comparisons.

A journal's social status and the observation window of the study are important factors influencing citation impact.

IV. Indicators for measuring citation impact

Impact Factor

The most important measure is the Impact Factor (Garfield, 1979).

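$IF_n(J) = \dfrac{c_n}{p_{n-1} + p_{n-2}}$

where $c_n$ is the number of citations received in year $n$ by the papers that journal $J$ published in years $n-1$ and $n-2$, and $p_{n-1}$ and $p_{n-2}$ are the numbers of papers published in those two years.

Immediacy Index: $II_n(J) = \dfrac{c_n}{p_n}$

where $c_n$ here counts the citations received in year $n$ by the $p_n$ papers published in that same year.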
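The two ratios are easy to mix up because both are written with $c_n$, so a small sketch may help. The function and parameter names are hypothetical; the only thing assumed is the standard reading that the Impact Factor counts citations in year n to papers from the two preceding years, while the Immediacy Index counts citations in year n to papers from year n itself.

```python
def impact_factor(citations_to_prev_two_years, papers_prev_year, papers_two_years_ago):
    """IF_n(J): citations in year n to papers of years n-1 and n-2,
    divided by the number of papers published in those two years."""
    citable = papers_prev_year + papers_two_years_ago
    if citable == 0:
        raise ValueError("journal published no papers in years n-1 and n-2")
    return citations_to_prev_two_years / citable


def immediacy_index(citations_to_same_year, papers_same_year):
    """II_n(J): citations in year n to papers of year n,
    divided by the number of papers published in year n."""
    if papers_same_year == 0:
        raise ValueError("journal published no papers in year n")
    return citations_to_same_year / papers_same_year
```

A journal with 80 and 70 papers in the two preceding years that collects 300 citations to them has an Impact Factor of 2.0; if its 120 papers of the current year collect 30 citations in that year, its Immediacy Index is 0.25.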

The debate over the Impact Factor has never stopped. Studies discussing its shortcomings make the following points:

1.  There is no normalisation for reference practices and traditions in the different fields and disciplines (Pinski and Narin, 1976).

2.  "There is no distinction in regard to the nature and merits of the citing journals" (Tomer, 1986).

3.  There is a bias in favour of journals with large papers, e.g. review journals tend to have higher impact factors (Pinski and Narin, 1976).

4.  Citation frequency is subject to age bias (Asai, 1981; Rousseau, 1988; Glänzel and Schoepflin, 1995; Moed et al., 1998).

5.  There is no indication of the deviations from this statistic (see, for instance, Schubert and Glänzel, 1983).

6.  The average time for a journal article from publication to peak in citations is not always two years, or as Garfield (1986b) writes, "if we change the two-year based period used to calculate impact, some type of journals are found to have higher impacts" (cf. also Glänzel and Schoepflin, 1995; Moed et al., 1998).

7.  One single measure might not be sufficient to describe citation patterns of scientific journals.

8.  The concept of citable document is not operationalised adequately. As a result, journal impact factors published in ISI’s Journal Citation Reports are inaccurate for a number of journals (Moed and van Leeuwen, 1995, 1996).

9.  In the calculation of JCR impact factors, errors are made due to incorrect identification of (cited) journals, for instance for the journal Angewandte Chemie International Edition (Braun and Glänzel, 1995; van Leeuwen et al., 1997).

Lindsey (1978) introduced the Corrected Quality Ratio, defined as

$\dfrac{(\text{number of citations})^{3/2}}{(\text{number of publications})^{1/2}}$

Because it is hard to interpret, it has not been applied in practice.

Allison proposed an interesting but undeservedly neglected approach in 1980, using the statistical function

$\dfrac{\text{standard deviation} - \text{mean value}}{(\text{mean value})^2}$

as an inequality measure of distributions of scientific productivity and citation impact.

Schubert and Glänzel (1983) studied the statistical reliability of journal Impact Factors.

Asai (1981) introduced an Adjusted Impact Factor, which counts the weighted sum of citations over a period of four years instead of one year as in the case of the original Impact Factor.

Pinski and Narin (1976) provide for each journal a size-independent Influence Weight determined by the number of the journal’s citations and references.

Geller (1978) suggested a 'corrected' influence weight that could be interpreted as the probability that a given journal will be cited from the other journals.

Glänzel and Schoepflin (1995) and Moed et al. (1998): a three-year citation window proved to be a good compromise between the relatively fast obsolescence of technology-oriented literature, of most areas in the life sciences and of experimental physics literature on the one hand, and the slowly ageing theoretical and mathematical topics in physics on the other.

Glänzel and Moed (2002) give an overview of applications, problems and limitations of the Impact Factor.

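Braun et al. (1985, Scientometric Indicators) and Schubert & Braun (1986, Scientometrics):

The Impact Factor should not be used as the criterion for ranking or selecting journals.

The journal Impact Factor can, however, serve as an auxiliary tool in research evaluation, in particular as a statistical instrument when studying how authors or journals are cited.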


The distribution of citations among the documents in a journal appears to be skewed (Seglen, 1997).

A mean-based indicator such as the Impact Factor is therefore not an appropriate representative value for such a skewed distribution.

Glänzel and Schoepflin (1994): one of the parameters suggested is the percentage of uncited articles in a journal.

UNCF (the uncitedness factor) for journal X in year T is defined as the percentage of articles, notes and reviews published in X in years T-2 and T-1 and not cited in year T, relative to the total number of articles, notes and reviews published in X in the years T-2 and T-1.
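The UNCF definition can be sketched in a few lines. The function name and the input layout (one citation count per article, note or review published in T-2 and T-1, counting only citations received in year T) are assumptions made for illustration.

```python
def uncitedness_factor(citations_in_year_t):
    """UNCF in percent.

    `citations_in_year_t` holds one entry per article, note or review
    published in years T-2 and T-1: the citations it received in year T.
    """
    if not citations_in_year_t:
        raise ValueError("no documents published in years T-2 and T-1")
    uncited = sum(1 for cites in citations_in_year_t if cites == 0)
    return 100.0 * uncited / len(citations_in_year_t)
```

For eight documents with citation counts `[0, 3, 0, 1, 7, 0, 2, 0]`, four are uncited in year T, giving a UNCF of 50.0.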

H. F. Moed found a significant negative correlation between UNCF and IF: the Spearman rank coefficient is -0.96 and the Pearson (linear) coefficient is -0.55.



The higher the level of aggregation at which citations are studied, the more heterogeneous the samples and statistical distributions become. When citation patterns are studied at the national or supranational level, a single indicator such as the mean citation rate therefore cannot reflect the overall picture. For example, the two journals Trends in Genetics and American Journal of Respiratory and Critical Care Medicine have almost the same mean citation rate, namely 7.0 and 6.9, respectively (publication period 1995-1996, 3-year citation windows). The shapes of the two distributions are, however, characterized by different features.


Highly cited publications characterise the ‘high end’ of citation impact.

To determine the authors or papers with most citations, usually a fixed number of items (Vlachy, 1986) or a certain quantile is selected from a rank statistic (for instance, ‘the top decile papers’ (Hofer Gee and Narin, 1986)). In some lists, papers or authors are considered highly cited if the number of citations received by them simply exceeds a given fixed value (for instance, “papers cited more than 400 times” (Garfield)). These a priori criteria reflect neither field-specific peculiarities nor deviations caused by the particular choice of publication and citation periods.

Garfield himself noted that “in some fields with fewer researchers, 100 citations may qualify a work”.

According to Glänzel and Schubert (1992), thresholds determining highly cited papers should meet the following criteria:

1.  They should be great enough to guarantee that the selected items form a real elite. On the other hand, they should be small enough to obtain a statistical population of elite items.

2.  They should be flexible in order to compensate for the unequal publication and citation behaviour in different science fields and to allow "fine tuning" in order to adjust the size of selected groups.

3.  The threshold should be time invariant with respect to the citation window.

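A paper is highly cited if the number of citations it has received during a given period exceeds

$k_s(j) = s \cdot \max(1, x_j)$

where $x_j$ is the average citation rate of the reference standard.

In verbal terms, a paper is considered highly cited if it has received at least $s$ citations and the number of citations amounts to at least $s$ times the reference standard. The factor $\max(1, x_j)$ is fixed for a given reference standard; it filters noise and makes sure that the mean citation rate of highly cited papers increases with rising thresholds. The value of $s$ nevertheless remains a subjective coefficient, and when the citation window is too long $s$ no longer helps to separate out highly cited papers. Even so, this formula remains the best available way of selecting highly cited papers.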
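A minimal sketch of this threshold rule follows. The function names are hypothetical, and the comparison uses "at least" (`>=`) as in the verbal description; the text also says "exceeds", so a strict `>` would be an equally defensible reading.

```python
def highly_cited_threshold(s, reference_mean):
    """k_s(j) = s * max(1, x_j), where x_j is the mean citation rate
    of the reference standard (for example, the paper's field or journal)."""
    return s * max(1.0, reference_mean)


def is_highly_cited(citations, s, reference_mean):
    """True if the paper reaches the threshold (assumption: >= rather than >)."""
    return citations >= highly_cited_threshold(s, reference_mean)
```

With s = 10, a paper in a field averaging 3.5 citations needs 35 citations, while in a field averaging 0.4 citations the floor of 1 keeps the threshold at 10 instead of letting it drop to 4. This is the noise filter mentioned above.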


"Citations do not represent quality; they are an index of reception by the scientific community."


Normalised counterparts of the IF

MOCR (Mean Observed Citation Rate) is the number of Citations per Publication (CPP).

MECR (Mean Expected Citation Rate) is the average citation rate of all papers published in the same journal in the same year.

RCR (the Relative Citation Rate) is defined as the ratio of the observed Citation Rate per Publication to the Expected Citation Rate per Publication, that is,

$RCR = MOCR / MECR$ (Schubert et al., 1989).

$FECR = \dfrac{1}{n} \sum_i f_i$,

where $f_i$ is the weighted average of the impact of those subfields to which the $i$th paper was assigned.
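To make the relative indicator concrete, here is a small sketch of RCR for a set of papers. The function name and the input layout are assumptions: each paper is given as a pair of its observed citation count and its expected citation rate (the mean citation rate of its journal and publication year).

```python
def relative_citation_rate(papers):
    """RCR = MOCR / MECR for a list of (observed, expected) pairs."""
    if not papers:
        raise ValueError("no papers given")
    n = len(papers)
    mocr = sum(observed for observed, _ in papers) / n  # mean observed citation rate
    mecr = sum(expected for _, expected in papers) / n  # mean expected citation rate
    if mecr == 0:
        raise ValueError("expected citation rate is zero")
    return mocr / mecr
```

For three papers with pairs `(10, 5.0)`, `(2, 4.0)` and `(6, 3.0)`, MOCR is 6.0 and MECR is 4.0, so RCR is 1.5: the set is cited 50% more than the journals it appeared in would lead one to expect. Replacing the journal-based expectation with the field-based values $f_i$ would give the analogous ratio against FECR.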

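SNIP

SNIP (Source Normalized Impact per Paper) was introduced by Moed in 2010 on the basis of the Scopus database. As the name suggests, it is a source-normalised measure of impact per paper. It was created to allow the impact of journals to be compared across disciplines within Scopus, and it is the leading example of the "source-normalised citation indicators".

$SNIP = RIP / RDCP$

RIP (raw impact per paper): the mean number of citations to all papers the journal published in the preceding 3 years;

RDCP (relative database citation potential): the relative citation potential, within the database, of the subject field the journal belongs to;

$RDCP = R_j^{db} / M^{db}$

$R_j^{db}$ is the mean number of references per paper over all papers of journal $j$ in the database during the preceding 10 years; $M^{db}$ is the mean of $R_j$ over all journals of that subject field in the database. RDCP can therefore be said to express the position of journal $j$ within its subject field.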
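The two-step ratio can be sketched as follows. This follows the simplified definitions given in this post rather than the full procedure used for Scopus, and the function and parameter names are hypothetical; the inputs are assumed to be already computed.

```python
def snip(rip, journal_citation_potential, database_citation_potential):
    """SNIP = RIP / RDCP, with RDCP = R_j / M.

    rip: raw impact per paper of the journal
    journal_citation_potential: R_j, mean references per paper for journal j
    database_citation_potential: M, the mean of R_j over the field's journals
    """
    if journal_citation_potential <= 0 or database_citation_potential <= 0:
        raise ValueError("citation potentials must be positive")
    rdcp = journal_citation_potential / database_citation_potential
    return rip / rdcp
```

Two journals with the same raw impact of 3.0 end up far apart once citation potential is taken into account: where reference lists are twice the field average (RDCP = 2.0) the SNIP is 1.5, and where they are half the average (RDCP = 0.5) it is 6.0.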

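The main features of SNIP are:

1. It removes subject-specific differences in citation behaviour, which makes cross-disciplinary evaluation possible.

2. It measures a journal's citations with its subject field taken into account.

3. It inherits Garfield's idea of "citation potential": the mean number of references per paper within a subject field.

4. To use the indicator, the "subject field" must first be delimited.

5. The coverage of the database must also be taken into account.

Moed also pointed out the shortcomings of SNIP:

1. SNIP does not consider the type of journal; review journals have many references per article, so their SNIP tends to be too high.

2. SNIP does not consider the growth trend of the literature in a field.

3. SNIP does not consider self-citation.

4. SNIP does not consider the extent of cross-disciplinary citation.

5. SNIP fluctuates too much from year to year.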

 

h-index

J. E. HIRSCH (2005) introduced the h-index as a performance measure. Most of the literature is derived from this paper (nearly 200 citing items until the end of 2008).

L. BORNMANN and H. D. DANIEL (2007) reviewed the h-index in “What do we know about the h index?”.

HIRSCH (2005) suggested a new indicator for the assessment of the research performance of individual scientists. This measure, called the h-index, is designed for application at the micro level and measures both publication activity and citation impact. According to his definition, “a scientist has index h if h of his or her Np papers have at least h citations each and the other (Np-h) papers have <=h citations each”.
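Hirsch's definition can be checked with a short routine that ranks papers by citations. This is one common way to compute it, shown as a sketch; the function name is hypothetical.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break     # every later paper has even fewer citations
    return h
```

For citation counts `[10, 8, 5, 4, 3]` the fourth-ranked paper still has 4 citations but the fifth has only 3, so h = 4. For `[25, 8, 5, 3, 3]` the single highly cited paper does not help beyond rank 1 and h = 3.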

Hirsch’s idea appears to trace back to Sir Arthur Eddington (EDWARDS, 2005). As soon as it was published it attracted attention in both the physics and the scientometrics communities (BALL, 2005; DINIZ BATISTA et al., 2005; POPOV, 2005; BORNMANN & DANIEL, 2005; BRAUN et al., 2005; GLÄNZEL, 2006; VAN RAAN, 2005). The last of these also showed that scientific performance cannot be represented by a single indicator.

Glänzel (2006) analyzed the mathematical properties of the h-index irrespective of its utility aspects.

Using Gumbel’s extreme value theory, he derived these properties for the class of distributions obeying an asymptotic power law (“asymptotically Paretian distributions”, the most typical class of distributions in scientometrics, among other fields).

Let $X$ be a random variable. In our case $X$ represents the citation rate of a paper. The probability distribution of $X$ is denoted by $p_k = P(X = k)$ for every $k \ge 0$ and the cumulative distribution function is denoted by $F(k) = P(X < k)$. Put $G_k = G(k) := 1 - F(k) = P(X \ge k)$.

Gumbel’s $r$-th characteristic extreme value $u_r$ is then defined as $u_r := G^{-1}(r/n) = \max\{k : G(k) \ge r/n\}$, where $n$ is the size of a given sample with distribution $F$. The theoretical h-index $H$ can consequently be defined as

$H := \max\{r : u_r \ge r\} = \max\{r : \max\{k : G(k) \ge r/n\} \ge r\}$.

If there exists an index $r$ such that $u_r = r$, then obviously $H = r$ and we can write $H = u_H$.
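The definition via characteristic extreme values can be tried out on a concrete sample. The sketch below replaces the theoretical $G$ with the empirical one, $G(k) = \#\{x_i \ge k\}/n$; that substitution and the function names are assumptions made for illustration. Under it, $u_r$ is simply the $r$-th largest citation count and $H$ coincides with the usual h-index.

```python
def characteristic_extreme_value(citations, r):
    """u_r = max{k : G(k) >= r/n} for the empirical G(k) = #{x >= k} / n.

    G(k) >= r/n is equivalent to "at least r papers have >= k citations",
    so integer counts are compared to avoid floating-point issues.
    """
    n = len(citations)
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= n")
    best = 0
    for k in range(0, max(citations) + 1):
        if sum(1 for x in citations if x >= k) >= r:
            best = k  # the count is non-increasing in k, so the last hit is the max
    return best


def theoretical_h(citations):
    """H = max{r : u_r >= r}, or 0 if no such r exists."""
    n = len(citations)
    candidates = [r for r in range(1, n + 1)
                  if characteristic_extreme_value(citations, r) >= r]
    return max(candidates, default=0)
```

For the sample `[10, 8, 5, 4, 3]` the characteristic values are $u_1, \dots, u_5 = 10, 8, 5, 4, 3$; the largest $r$ with $u_r \ge r$ is 4, and since $u_4 = 4$ this is also a case where $H = u_H$.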


The total number of papers and the total number of citations are strongly related to the h-index:

1. The h-index is proportional to the $(\alpha+1)$-th root of the number of publications; in the case of Price distributions this results in a square-root law.

2. The highest citation impact is also a power function of the h-index.

3. The property $H_c/H^2 = \text{constant}$ for Paretian distributions with $\alpha > 1$ confirms one important finding by HIRSCH (2005). Note that this constant depends on the parameter $\alpha$; different distributions thus result in different ratios $H_c/H^2$.

4. The number of h-citations is a function of the square of the h-index and a constant dependent on the Pareto exponent, and in the Price case this coefficient becomes a logarithmic function of the number of publications.

5. The relatively low number of papers of most individuals does, however, not allow reliable statistical analysis of extreme values (cf. GLÄNZEL & SCHUBERT, 1988).

The h-index is a simple yet information-rich indicator that can be used for evaluation at any level: countries, institutions, subject fields, journals and even individual authors. Its strengths (Wolfgang Glänzel, Science Focus, 2006, 1 (1), 10-11):

- The h-index is an extremely simple and comprehensible composite indicator which can be applied to any level of aggregation but favorably to the assessment of research performance of individual scientists.

- This indicator combines citation impact with publication activity measures.

- The h-index is a robust cumulative indicator. Increasing publications alone does not have immediate effect on this index.

- The h-index measures “durable” performance, not only single peaks.

- Any document type can be included since the h-index is not changed by adding uncited papers.

- The h-index correlates with other bibliometric indicators of ‘significance’.

Of course, the h-index also has many weaknesses:

- The h-index puts newcomers at a disadvantage since both publication output and observed citation rates will be relatively low.

- The index allows scientists to rest on their laurels (‘your papers do the job for you’) since the number of citations received might increase even if no new papers are published.

- This indicator is based on rather long-term observations. Therefore, it does not show decay in a scientist’s career, for the same reason as above.

- The index is not independent of subject-specific communication behaviour and cannot be normalised in a similar manner as other publication- or citation-based indicators.

- An important problem arises in finding appropriate reference standards for comparison, even in the same subject field.

- The indicator is suited for the micro level but at higher levels of aggregation there are more versatile indicators. The application of appropriate indicator sets instead of one single measure can provide a more adequate and multifaceted picture of reality.

- By definition, the h-index cannot exceed the number of publications. Thus it puts small but highly cited paper sets at a disadvantage (‘small is not beautiful’).

- According to my experience, the h-index is certainly useful for identifying outstanding performance but it seems to fail in assessing fair and good performance. The reason can be found in the skewed rank-frequency distributions which are characterized by extremely long tails with many ties (Glänzel and Persson, 2005; Glänzel, 2006).



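The h-index is significantly related to the number of papers and the impact factor.



The h-index shows a significant positive correlation with $n^{1/3} \cdot IF^{2/3}$, with $r^2 > 0.95$.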


……




https://blog.sciencenet.cn/blog-713101-802888.html
