Safemetrics Sharing http://blog.sciencenet.cn/u/jerrycueb Doing safety science research with habits and an attitude of diligence, humility, rigor, discipline and persistence. 'Wonder en is gheen Wonder'

Bibliometric data visualization

Read 4289 times | 2017-7-3 00:48 | Personal category: Knowledge mapping | System category: Research notes

Instructions
The bibliometric data is collected from Web of Knowledge (formerly known as ISI Web of Knowledge). Web of Science (WOS) is an academic citation indexing and search service, combined with web linking and provided by Thomson Reuters. WOS covers the sciences, social sciences, arts and humanities.
There are other well-known databases, such as Google Scholar and Scopus, but at the time of our study WOS was the best candidate for retrieving the data required for our particular case study.
Retrieving the data

On the main search page of Web of Knowledge/Science, enter the search term. To have citation data included in the results, select “Web of Science Core Collection” as the database.

Use Refine Results to narrow the results down to the most relevant article records. This step is recommended because it reduces the amount of data, which makes for a better and smoother visualization experience.


On the results page, use the “Save to other file formats” option to retrieve the data.


At most 500 records can be downloaded at a time, so retrieving 10,000 citation records means repeating the process 20 times. To be clear, for each following download you enter the next range in the records field: 501 to 1000, and so on.
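
The record ranges to type in for each download can be worked out in advance. The following is a minimal sketch in Python, separate from the R scripts used in this article; the function name `record_ranges` is made up for illustration, and the batch size of 500 is the limit mentioned above.

```python
def record_ranges(total_records, batch_size=500):
    """Return the inclusive (first, last) record range for each download."""
    if total_records < 0 or batch_size <= 0:
        raise ValueError("total_records must be >= 0 and batch_size > 0")
    ranges = []
    for first in range(1, total_records + 1, batch_size):
        # The last batch may be shorter than batch_size.
        last = min(first + batch_size - 1, total_records)
        ranges.append((first, last))
    return ranges


if __name__ == "__main__":
    batches = record_ranges(10_000)
    print(len(batches))   # 20
    print(batches[:2])    # [(1, 500), (501, 1000)]
```

If the total is not a multiple of 500, the final range simply stops at the last record, for example 1001 to 1200 for 1,200 records.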

Please note that, for now, the import in our custom R scripts only works for tab-delimited files, so you need to download this format from the Web of Knowledge site.
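
If you want to look at the downloaded batches before running the R scripts, a tab-delimited file can be read with any CSV reader that accepts a tab as the separator. Here is a small sketch in Python, independent of the R scripts. It assumes each file starts with a header row of field names; the sample header (AU, TI, PY) and the function names are only illustrative.

```python
import csv
import io


def read_tab_delimited(text):
    """Parse one tab-delimited export: a header row, then one row per record."""
    reader = csv.DictReader(
        io.StringIO(text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters in titles as plain text
    )
    return list(reader)


def merge_batches(texts):
    """Combine the records of several downloaded batches into one list."""
    records = []
    for text in texts:
        records.extend(read_tab_delimited(text))
    return records


if __name__ == "__main__":
    # Hypothetical sample data standing in for two downloaded batches.
    batch_one = "AU\tTI\tPY\nSmith, J\tFirst paper\t2010\n"
    batch_two = "AU\tTI\tPY\nLee, K\tSecond paper\t2012\n"
    for record in merge_batches([batch_one, batch_two]):
        print(record["PY"], record["TI"])
```

When reading real files, the text would come from something like `open(path, encoding="utf-8-sig").read()`; the encoding is an assumption and depends on the export option you chose.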


Using Analysis Scripts

First, you need to install the R programming language and RStudio. You can follow the instructions at the RStudio installation page. Gephi, which is used for visualization, can be downloaded from a separate installation site. If you are having problems with Gephi on a recent OS X installation, these instructions should work.

A folder named "literature review" is provided for download at the end of this article. The main task is to download and unzip this folder, put your raw data into the input folder, open "exploration.Rmd" and click the "Knit HTML" button (note that you have to install some required packages beforehand; more explanation is provided below). After a few seconds, depending on the amount of raw data, you will end up with a graphical description of your data in HTML format, plus the files required for visualizing your data in Gephi.

To prepare RStudio once and for all, follow these steps:

R Setup

RStudio needs very little setup. You just need to install the packages using the install.packages("packagename") command from the R console, or use RStudio's GUI option. As you can see from the picture below, you need to install the following packages: "splitstackshape", "reshape", "plyr" and "stringr".

[Image: Rstudio1.JPG]


The second thing that needs attention is the working directory. Set your working directory to the project base directory.

[Image: Screenshot 2014-11-26 10.18.31.png]

Recommended: After you have downloaded the "literature review" zip file and extracted it on your local machine (the folder is provided for download at the end of this article), set the working directory to your "literature review" folder. If you haven't installed the required packages beforehand, do so now; otherwise you will receive a number of errors about the missing packages. Once the packages are installed, just click "Knit HTML" in the "exploration.Rmd" script.


[Image: Screenshot 2014-11-26 10.33.55.png]

Everything should go smoothly, and after a few seconds (depending on the amount of raw data) you will end up with the generated nodes and edges files in the output folder.
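
A quick way to confirm that the run produced what Gephi needs is to check the output folder for the four files used in the import steps of this article. This is only a sketch in Python, not part of the R scripts; it assumes the folder is literally named "output" inside the working directory, and the function name `missing_outputs` is made up for illustration.

```python
from pathlib import Path

# File names taken from the Gephi import steps in this article.
EXPECTED_FILES = (
    "author_nodes.csv",
    "author_edges.csv",
    "citation_nodes.csv",
    "citation_edges.csv",
)


def missing_outputs(output_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in output_dir."""
    folder = Path(output_dir)
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    # "output" is an assumed folder name; adjust it to your project layout.
    missing = missing_outputs("output")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All node and edge files are present.")
```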


Visualization with Gephi

This example continues from the previous section and assumes you have the author_edges.csv file ready and Gephi installed. Start up Gephi and create a new blank project. Then you are ready to load the data. As shown in the image below, go to the Data Laboratory tab, select the Import Spreadsheet option and load the CSV file. In this case, choose the Edges table option and semicolon as the separator.


[Image: Untitled.png]

Next you can explore and visualize the data using Gephi. A good guide for getting started is the Gephi Quickstart Tutorial. Because the datasets are large in many cases, it is a good idea to use Gephi's filtering options. The Giant Component filter is a good starting point, because it leaves only the largest connected component of the graph visible for analysis. In other words, it eliminates the small, random nodes that are not connected to the larger whole.
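
To make the idea of the giant component concrete, here is a small sketch in Python that finds the largest connected component of an edge list with a breadth-first search. It is not how Gephi is implemented; it treats every edge as undirected, which is an assumption, and the node names are invented for the example.

```python
from collections import defaultdict, deque


def giant_component(edges):
    """Return the set of nodes in the largest connected component.

    Edges are (source, target) pairs and are treated as undirected.
    """
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)

    seen = set()
    largest = set()
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        if len(component) > len(largest):
            largest = component
    return largest


if __name__ == "__main__":
    # Hypothetical co-author pairs: one group of three and one isolated pair.
    sample_edges = [("a", "b"), ("b", "c"), ("x", "y")]
    print(sorted(giant_component(sample_edges)))
```

In this sample the pair x and y is dropped, which is the same effect the filter has on small groups that are not connected to the main network.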


Import author network

  1. New project

  2. Data laboratory

  3. Import spreadsheet, Nodes table, author_nodes.csv

  4. Change Freq and TotalTimesCited to Integer

  5. Import spreadsheet, Edges table, author_edges.csv

  6. Change YearPublished and TimesCited to Integer. Uncheck the Create missing nodes box


Import citation network to Gephi

  1. New project

  2. Data laboratory

  3. Import spreadsheet, Nodes table, citation_nodes.csv

  4. Change YearPublished to Integer

  5. Import spreadsheet, Edges table, citation_edges.csv

  6. Change YearPublished to Integer
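
Changing a column to Integer only makes sense if every value in it is a whole number, so it can help to check the CSV files first. The sketch below, in Python and separate from the R scripts, lists the values that would not parse as integers. It assumes the files are semicolon-separated like author_edges.csv above; the sample rows, the Id column and the function name are invented for illustration.

```python
import csv
import io


def non_integer_values(csv_text, columns, delimiter=";"):
    """Map each column name to the values in it that int() cannot parse."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    problems = {name: [] for name in columns}
    for row in reader:
        for name in columns:
            # A missing or empty cell is reported as an empty string.
            value = (row.get(name) or "").strip()
            try:
                int(value)
            except ValueError:
                problems[name].append(value)
    return problems


if __name__ == "__main__":
    # Hypothetical rows; real files come from the output folder.
    sample = "Id;Freq;TotalTimesCited\nA;3;12\nB;NA;7\n"
    print(non_integer_values(sample, ["Freq", "TotalTimesCited"]))
```

An empty list for a column means all of its values are whole numbers.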


Possible analysis steps, citation network

  1. Layout => ForceAtlas2

    1. Try Disseminate hubs after a while

  2. Statistics => PageRank

  3. Partition => Nodes => Origin

  4. Ranking => Size => InDegree

    1. Possibly increase max to ~40-50

  5. Ranking => Color => PageRank
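
To show what the two measures used in these steps mean, here is a short sketch in Python that computes the in-degree and a basic PageRank for a directed edge list. It is a textbook power iteration, not Gephi's own code; the damping factor of 0.85 is a common choice, and the sketch assumes each edge points from the citing document to the cited one. The node names are invented.

```python
def in_degree(edges):
    """Count incoming edges per node; nodes that are never cited get 0."""
    counts = {node: 0 for edge in edges for node in edge}
    for _source, target in edges:
        counts[target] += 1
    return counts


def pagerank(edges, damping=0.85, tolerance=1e-9, max_iterations=200):
    """Basic PageRank by power iteration on (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iterations):
        # Rank held by nodes without outgoing edges is spread over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tolerance:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical citations: papers a and b both cite paper c.
    sample_edges = [("a", "c"), ("b", "c")]
    print(in_degree(sample_edges))
    print(pagerank(sample_edges))
```

In-degree simply counts how often a document is cited within the network, while PageRank also takes into account how important the citing documents are, which is why the steps above use one for node size and the other for color.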




https://blog.sciencenet.cn/blog-554179-1064249.html
