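Science recently published a special issue called Dealing with Data. In it, the first figure of an article titled "Metaknowledge" compares how much information three different readers extract from the same paper: a newcomer to the field, a well-known professor, and a computer. It's interesting and worth thinking about, so I'm posting it here. I'm not sure whether this raises copyright issues; if it does, I'll take it down.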
Fig. 1. Readers vary in the information they extract from an article. A new graduate student perceives a tiny fraction of available information, focusing on familiar authors, terms, references, and institutions. Her evaluation is limited to categorical classification (e.g., of the authors) into known and unknown (“important” and “unimportant”). For comparison she has the small collection of papers she has read. A leading scientist perceives a wealth of latent data, assembling individuals into mentorship relations and locating terms, as well as graphical and mathematical idioms, in historical and theoretical context. His evaluations generate rank orders based on his experience in the field. He can compare a paper to thousands, and searches a large literature efficiently. An appropriately trained computer would complement this expertise with quantification and scale. It can rapidly access quantitative and relational information about authors, terms, and institutions, and order these items along a range of measures. For comparison it can already access a large fraction of the scientific literature—millions of articles and an increasing pool of digitized books; in the future it will scrape further data from Web pages, online databases, video records of conferences, etc.