The bibliometric data is collected from Web of Knowledge (formerly known as ISI Web of Knowledge). Web of Science (WOS) is an academic citation indexing and search service, combined with web linking, that is provided by Thomson Reuters. WOS covers the sciences, social sciences, arts and humanities. There are other well-known databases, such as Google Scholar and Scopus, but at the time of our study WOS was the best candidate for retrieving the data required for our particular case study.

Retrieving the data

From the main search page of Web of Knowledge/Science, enter the search term. To include citation data in the results, select “Web of Science Core Collection” as the database. Use Refine Results to narrow the results down to the most relevant article records. This step is recommended because a smaller amount of data gives a better and smoother visualization experience. On the results page, use the option “Save to other file formats” to retrieve the data. At most 500 records can be downloaded at a time, so retrieving 10,000 records means repeating the process 20 times. For each subsequent download, enter the next range in the Records field: 501 to 1000, and so on. Please note that for now the import in our custom R scripts only works for tab-delimited files, so you need to download this format from the Web of Knowledge site.

Using Analysis Scripts

First, you need to install RStudio and the R programming language. You can follow the instructions at the RStudio installation page. Gephi, which is used for visualization, can be downloaded from a separate installation site. If you are having problems with Gephi on a recent OS X installation, these instructions should work.
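As a small illustration of the 500-record download limit described under Retrieving the data, the record ranges to enter for each download can be worked out in advance. The following is a minimal Python sketch, not part of the article's R scripts; the function name record_ranges is hypothetical and used only for illustration.

```python
def record_ranges(total_records, batch_size=500):
    """Return the (first, last) record numbers for each download batch."""
    ranges = []
    for first in range(1, total_records + 1, batch_size):
        # The last batch may be shorter than batch_size.
        last = min(first + batch_size - 1, total_records)
        ranges.append((first, last))
    return ranges


batches = record_ranges(10_000)
print(len(batches))  # number of downloads needed
for first, last in batches[:3]:
    print(f"{first} - {last}")  # value to type into the Records field
```

For 10,000 records this gives 20 batches, starting with 1 - 500 and 501 - 1000, which matches the manual procedure described above.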
A folder named "literature review" is provided for download at the end of this article. The main task is to download and unzip this folder, put your raw data into the input folder, open "exploration.Rmd" and click the "Knit HTML" button (note that you have to install some required packages beforehand; more explanation is provided below). After a few seconds, depending on the amount of raw data, you will end up with a graphical description of your data in HTML format, plus the files required for visualizing your data in Gephi. To prepare RStudio once and for all, follow these steps:

R Setup

RStudio needs very little setup. You just need to install the packages using the install.packages("packagename") command from the R console, or use RStudio's GUI option. As you can see from the picture below, you need to install the following packages: "splitstackshape", "reshape", "plyr" and "stringr".

The second thing to pay attention to is the working directory. Set your working directory to the project base directory. Recommended: after you have downloaded the "literature review" zip file and extracted it on your local machine (the folder is provided for download at the end of this article), set the working directory to your "literature review" folder.

If you haven't installed the required packages beforehand, do so now; otherwise you will receive a number of errors referring to the missing packages. Once the packages are installed, just click "Knit HTML" in the "exploration.Rmd" script. Everything should go smoothly, and after a few seconds (depending on the amount of raw data) you will end up with the generated nodes and edges in the output folder.

Visualization with Gephi

This example continues from the previous section and assumes you have the author_edges.csv file ready and Gephi installed. Start Gephi and create a new blank project. Then you are ready to load the data.
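To make it more concrete what a semicolon-separated edges file such as author_edges.csv might contain, here is a minimal Python sketch that derives co-author pairs from tab-delimited text. It is not the logic of exploration.Rmd, which is written in R and may work differently. The assumptions are: the export has a header row with an "AU" column listing authors separated by semicolons, and the edges file uses the "Source" and "Target" column names that Gephi's edges table import looks for. The sample rows and the function names are made up for illustration.

```python
import csv
import io
from itertools import combinations


def author_edges(tab_delimited_text):
    """Return sorted, de-duplicated co-author pairs from tab-delimited text."""
    reader = csv.DictReader(
        io.StringIO(tab_delimited_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
    )
    edges = set()
    for row in reader:
        # Assumption: authors sit in the "AU" column, separated by ";".
        authors = sorted(
            {name.strip() for name in row["AU"].split(";") if name.strip()}
        )
        # Every pair of co-authors on one record becomes one undirected edge.
        for a, b in combinations(authors, 2):
            edges.add((a, b))
    return sorted(edges)


def edges_to_csv(edges):
    """Format edges as semicolon-separated text with a Source;Target header."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return out.getvalue()


# Made-up sample standing in for a downloaded tab-delimited file.
sample = (
    "AU\tTI\n"
    "Smith, J; Lee, K; Chen, W\tFirst made-up title\n"
    "Lee, K; Smith, J\tSecond made-up title\n"
)
print(edges_to_csv(author_edges(sample)))
```

Sorting the author names before pairing them means that the same two co-authors always produce the same (Source, Target) pair, so repeated collaborations collapse into a single edge in this sketch.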
Using the image below as a guide, go to the Data Laboratory tab, select the Import Spreadsheet option and load the CSV file. In this case, choose the Edges table option and semicolon as the separator.

Next, you can explore and visualize the data using Gephi. A good guide for getting started is the Gephi Quickstart Tutorial. Because the datasets are large in many cases, it is a good idea to use Gephi's filtering options. The Giant Component filter is a good starting point, because it leaves only the largest connected component of the graph visible for analysis. In other words, it eliminates the small, isolated groups of nodes that are not connected to the larger whole.

Import author network
Import citation network to Gephi
Possible analysis steps, citation network
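The idea behind the Giant Component filter mentioned earlier, keeping only the largest connected component, can be illustrated outside Gephi. The following minimal Python sketch treats the edge list as an undirected graph and finds the largest component with a breadth-first search. It is not Gephi's implementation, and the node names and the function name giant_component are made up for illustration.

```python
from collections import defaultdict, deque


def giant_component(edges):
    """Return the set of nodes in the largest connected component."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    largest = set()
    for start in list(neighbours):
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from start.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        if len(component) > len(largest):
            largest = component
    return largest


# Made-up edge list: one chain of four nodes plus one isolated pair.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("X", "Y")]
core = giant_component(edges)
kept = [edge for edge in edges if edge[0] in core and edge[1] in core]
print(sorted(core))
print(kept)
```

In this example the pair X and Y is dropped because it is not connected to the larger group, which is the effect the filter has on small disconnected clusters.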