T. Theodosiou et al. / Journal of Biomedical Informatics 44 (2011) 919–926
The basic assumption is that the terms that have almost the same probability of being observed within the results of a query and also in the entire PubMed database (excluding the results of the query) do not contain important information specific to the documents related to the query.
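Put another way: if the probability of observing a term in the retrieved document set is no different from its probability of appearing in the rest of the database (excluding the retrieved set), then the term carries no important information, where importance is judged relative to the documents relevant to the search strategy.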
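To make the assumption concrete, here is a minimal sketch that compares a term's probability inside a query's results with its probability in the rest of the database. It is not MeSHy's actual scoring: the two-proportion z-statistic is swapped in purely for illustration, the function name `two_proportion_z` is hypothetical, and the database size and term counts are invented (only the query size of 3328 comes from the example in this post).

```python
import math


def two_proportion_z(k_query, n_query, k_total, n_total):
    """Compare a term's frequency inside the query results with its
    frequency in the rest of the database (query results excluded).

    k_query: documents in the query results that contain the term
    n_query: documents in the query results
    k_total: documents in the whole database that contain the term
    n_total: documents in the whole database

    Returns (p_in, p_out, z). A z close to 0 means the two
    probabilities are nearly equal, so under the assumption the term
    says little that is specific to the query.
    """
    k_rest = k_total - k_query          # term occurrences outside the query
    n_rest = n_total - n_query          # documents outside the query
    p_in = k_query / n_query
    p_out = k_rest / n_rest
    p_pool = k_total / n_total          # pooled proportion for the standard error
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_query + 1 / n_rest))
    z = (p_in - p_out) / se if se > 0 else 0.0
    return p_in, p_out, z


# Hypothetical counts, chosen only to show the two extremes.
N_TOTAL = 20_000_000   # assumed database size, not a real PubMed figure
N_QUERY = 3328         # size of the 'jamia[jour]' result set quoted in the post

# A term that is about equally common inside and outside the results.
common = two_proportion_z(333, N_QUERY, 2_000_000, N_TOTAL)

# A term that is far more common inside the results than outside.
specific = two_proportion_z(1000, N_QUERY, 50_000, N_TOTAL)

print("common term:   p_in=%.4f p_out=%.4f z=%.2f" % common)
print("specific term: p_in=%.4f p_out=%.4f z=%.2f" % specific)
```

The first term would be discarded as uninformative, while the second would be kept as characteristic of the query. Any other measure of the gap between the two probabilities could be substituted for the z-statistic without changing the idea.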
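Searching with "jamia[jour]" retrieves all articles published in the Journal of the American Medical Informatics Association; the result looks like this: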
'jamia[jour]' < 3328 PubMed documents | MeSHy
Might this approach put too much weight on rare terms?