Physics Subject Librarian's Sharing Blog http://blog.sciencenet.cn/u/PSLibrarian Life Is Art, Sharing Is Best


Design and Implementation of SCI/EI Literature Data Fusion Software

Read 3282 times | 2015-1-12 18:30 | Personal category: Data analysis tools | System category: Paper exchange | Tags: SCI, software design, fusion, duplicate checking


Design and Application of Data Fusion Software on Papers Indexed By SCI and EI

Abstract

[Objective] To design software that checks for duplicates among literature records from the SCI and EI databases and fuses them. [Context] The software helps analysts obtain a fused dataset of SCI and EI records in a single format, better meeting the need of fine-grained subject intelligence analysis to flexibly build journal literature datasets from multiple sources. [Methods] Two automatic algorithms and one semi-automatic algorithm are used to check SCI and EI records for duplicates accurately. Data fusion builds on a detailed text-level analysis of the full-record fields of both databases. [Results] The software automatically marks records duplicated between SCI and EI and generates a de-duplicated fused data table. [Conclusions] The software effectively solves the problem of building a unified analysis dataset from two different journal literature sources, and its design ideas can also be applied to other databases.

Keywords: Duplicate checking, Data fusion, EI, SCI, Software design
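The abstract does not say which two automatic algorithms and which semi-automatic algorithm the software uses, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes (hypothetically) that the two automatic rules are an exact DOI match and an exact match on normalized title plus year, and that the semi-automatic rule sends near-identical titles to a human for review. The field names `title`, `year`, and `doi`, the 0.9 threshold, and all function names are invented for the example.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse every non-alphanumeric run to one space."""
    return re.sub(r"[^a-z0-9]+", " ", (title or "").lower()).strip()


def classify_pair(sci: dict, ei: dict, review_threshold: float = 0.9) -> str:
    """Return 'duplicate', 'review', or 'distinct' for one SCI/EI record pair."""
    # Assumed automatic rule 1: identical, non-empty DOI (case-insensitive).
    sci_doi = (sci.get("doi") or "").strip().lower()
    ei_doi = (ei.get("doi") or "").strip().lower()
    if sci_doi and sci_doi == ei_doi:
        return "duplicate"
    # Assumed automatic rule 2: identical normalized title and same year.
    t1 = normalize_title(sci.get("title", ""))
    t2 = normalize_title(ei.get("title", ""))
    if t1 and t1 == t2 and sci.get("year") == ei.get("year"):
        return "duplicate"
    # Assumed semi-automatic rule: near-identical titles need a human decision.
    if SequenceMatcher(None, t1, t2).ratio() >= review_threshold:
        return "review"
    return "distinct"


def fuse(sci_records: list, ei_records: list):
    """Return (fused table, pairs awaiting manual review).

    The fused table holds every SCI record plus each EI record that no
    automatic rule marked as a duplicate. Inputs are not modified.
    """
    fused = [dict(r, source="SCI") for r in sci_records]
    needs_review = []
    for ei in ei_records:
        verdicts = [classify_pair(sci, ei) for sci in sci_records]
        if "duplicate" in verdicts:
            # Record that the matching SCI row is indexed by both databases.
            fused[verdicts.index("duplicate")]["source"] = "SCI+EI"
            continue
        if "review" in verdicts:
            # Keep the EI row for now and queue the pair for a human check.
            needs_review.append((sci_records[verdicts.index("review")], ei))
        fused.append(dict(ei, source="EI"))
    return fused, needs_review


# Made-up sample records, for illustration only.
SCI_SAMPLE = [
    {"title": "Data Fusion of Bibliographic Records", "year": 2014, "doi": "10.1000/abc"},
    {"title": "Graphene: Optical Properties!", "year": 2013, "doi": ""},
    {"title": "Quantum dots in silicon photonics", "year": 2012, "doi": ""},
]
EI_SAMPLE = [
    {"title": "Data fusion of bibliographic records.", "year": 2014, "doi": "10.1000/ABC"},
    {"title": "Graphene - optical properties", "year": 2013, "doi": None},
    {"title": "Quantum dot in silicon photonics", "year": 2012, "doi": ""},
    {"title": "Unrelated paper on plasma physics", "year": 2011, "doi": ""},
]
```

Calling `fuse(SCI_SAMPLE, EI_SAMPLE)` returns the fused rows together with a list of pairs that still need a manual decision. Comparing every SCI record with every EI record is quadratic, which is acceptable for a sketch but would call for indexing by DOI or normalized title on larger exports.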
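Funding:

This paper is one of the research outcomes of the Young Talent Frontier Project of the National Science Library, Chinese Academy of Sciences, "Optimized Design of Auxiliary Tools for Subject-Oriented Knowledge Services" (project No. 青1209).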

Corresponding author: Yu Jian (于健), E-mail: yuj@mail.las.ac.cn

Full-text PDF download link: http://www.infotech.ac.cn/CN/abstract/abstract3977.shtml



https://blog.sciencenet.cn/blog-260374-858821.html
