Design and Application of Data Fusion Software on Papers Indexed by SCI and EI
Abstract: [Objective] To design software that detects duplicate records in, and fuses, literature data exported from the SCI and EI databases. [Context] The software helps analysts obtain a fused dataset of SCI and EI records in a uniform format, better meeting the need of micro-level subject intelligence analysis to flexibly build journal literature datasets from multiple sources. [Methods] Two automatic algorithms and one semi-automatic algorithm are used to identify duplicates between SCI and EI records accurately. Data fusion is based on a detailed, field-level text analysis of the full-record fields of both databases. [Results] The software automatically marks records duplicated between SCI and EI and generates a de-duplicated, fused data table. [Conclusions] The software effectively solves the problem of building a unified analysis dataset from two different journal literature sources, and its design ideas can also be applied to other databases.
Key words: Duplicate checking; Data fusion; EI; SCI; Software design
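The abstract does not say which rules the two automatic algorithms and the semi-automatic algorithm use. As an illustration only, the sketch below shows one plausible way to combine automatic and semi-automatic duplicate checking. Every rule, field name (`doi`, `title`, `year`), function name, and the 0.9 threshold is an assumption made for this example and is not taken from the paper.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def classify_pair(sci: dict, ei: dict, review_threshold: float = 0.9) -> str:
    """Return 'duplicate', 'review', or 'distinct' for one SCI/EI record pair.

    The three rules are hypothetical stand-ins for the paper's algorithms.
    """
    # Automatic rule 1 (assumed): both records carry the same DOI.
    doi_a = (sci.get("doi") or "").strip().lower()
    doi_b = (ei.get("doi") or "").strip().lower()
    if doi_a and doi_a == doi_b:
        return "duplicate"

    # Automatic rule 2 (assumed): same normalized title and same year.
    title_a = normalize_title(sci.get("title") or "")
    title_b = normalize_title(ei.get("title") or "")
    same_year = sci.get("year") == ei.get("year")
    if title_a and title_a == title_b and same_year:
        return "duplicate"

    # Semi-automatic rule (assumed): near-identical titles in the same year
    # are flagged so that a human analyst can confirm or reject the match.
    if title_a and title_b and same_year:
        ratio = SequenceMatcher(None, title_a, title_b).ratio()
        if ratio >= review_threshold:
            return "review"

    return "distinct"
```

In this sketch, pairs labelled `review` would be shown to the analyst, while pairs labelled `duplicate` could be marked automatically before the fused table is generated.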
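Funding:
This work is one of the research outcomes of the Young Talent Frontier Project "Optimized Design of Auxiliary Tools for Subject-Oriented Knowledge Services" (Project No. 青1209) of the National Science Library, Chinese Academy of Sciences.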
Corresponding author: Yu Jian (于健), E-mail: yuj@mail.las.ac.cn
Full-text PDF download link: http://www.infotech.ac.cn/CN/abstract/abstract3977.shtml