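I: Problems addressed
1: Classes of user queries
(1): Specific queries: e.g. "Does Netscape support the JDK 1.1 code-signing API?"
(2): Broad-topic queries: e.g. "Find information about the Java programming language"
2: Difficulties
(1): For specific queries, there are too few relevant terms to match, so the desired result is hard to retrieve
(2): For broad-topic queries, far too many results match, so it is hard to find the authoritative or definitive pages
3: Limits of text-based analysis
Most popular pages are not sufficiently self-descriptive. For example, www.harvard.edu is an authoritative page, yet the term "harvard" tends to appear on a large number of other pages, while www.harvard.edu itself uses it comparatively little.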
II: Solution
1: Exploit the link structure
2: Introduce the concepts of Authorities and Hubs
(1): Authorities: pages that are recognized as providing significant, trustworthy, and useful information on a topic.
(2): Hubs: index pages that provide lots of useful links to relevant content pages (topic authorities).
(3): In-degree: the number of links pointing to a page; one simple measure of authority.
(4): Out-degree: the number of links from a page to other pages.
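As a small illustration of the in-degree and out-degree definitions, the sketch below counts both from a list of links. The edge list and page names are hypothetical toy data, not taken from the source.

```python
from collections import Counter


def degrees(edges):
    """Return (in_degree, out_degree) counters for (source, target) links."""
    in_degree = Counter()
    out_degree = Counter()
    for source, target in edges:
        out_degree[source] += 1  # one more link leaving `source`
        in_degree[target] += 1   # one more link pointing at `target`
    return in_degree, out_degree


# Hypothetical toy link graph: "c" is pointed to by two pages.
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("a", "d")]
in_degree, out_degree = degrees(edges)
```

In-degree alone is only a crude authority signal, since it counts every link equally regardless of topic; that is the gap the hub and authority scores are meant to close.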
3: Root Set and Base Set
(1): p: the query
(2): Root Set $R_p$: the pages returned as results for the query
(3): Base Set: an expansion of the Root Set
1: all pages linked to by pages in root set
2: all pages that link to a page in root set
4: Subgraph construction algorithm
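The source gives no details for this step, so what follows is only a minimal sketch built from the Root Set and Base Set definitions above. The function names, the `out_links`/`in_links` dictionaries, and the `max_in_links` cap on incoming pages per root page are all assumptions made for illustration.

```python
def build_base_set(root_set, out_links, in_links, max_in_links=50):
    """Expand a root set into a base set.

    out_links[p] lists the pages p links to; in_links[p] lists the pages
    linking to p. max_in_links is an assumed cap that keeps one very
    popular root page from flooding the base set.
    """
    base = set(root_set)
    for page in root_set:
        # Rule 1: every page that a root page links to.
        base.update(out_links.get(page, ()))
        # Rule 2: pages that link to a root page (capped, sorted for determinism).
        base.update(sorted(in_links.get(page, ()))[:max_in_links])
    return base


def induced_subgraph(pages, out_links):
    """Keep only the links whose two endpoints both lie in `pages`."""
    return {
        (p, q)
        for p in pages
        for q in out_links.get(p, ())
        if q in pages and q != p
    }


# Hypothetical toy web: "r1" is the only page returned for the query.
out_links = {"r1": ["x"], "y": ["r1"], "x": ["z"]}
in_links = {"r1": ["y"], "x": ["r1"], "z": ["x"]}
base = build_base_set({"r1"}, out_links, in_links)
edges = induced_subgraph(base, out_links)
```

Here "z" stays outside the base set because it is two hops from the root page, so the link from "x" to "z" is dropped from the subgraph.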
5: Computing Hubs and Authorities
(1): Each page $p\in S$ (where $S$ is the base set) has
Authority score: $a_p$ (vector $a$)
Hub score: $h_p$ (vector $h$)
(2): Initialize all $a_p = h_p = 1$
(3): Maintain normalized scores:
$\sum_{p\in S} (a_p)^2 = 1$
$\sum_{p\in S} (h_p)^2 = 1$
(4): Iterative algorithm
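The source does not spell out the iteration, so the following is a hedged sketch of the standard HITS update as it is usually stated: $a_p = \sum_{q \to p} h_q$, then $h_p = \sum_{p \to q} a_q$, followed by the normalization in step (3). The function names, the fixed iteration count, and the toy graph are assumptions, not the author's implementation.

```python
import math


def normalize(scores):
    """Scale scores so that their squares sum to 1 (step 3)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {p: v / norm for p, v in scores.items()}


def hits(pages, edges, iterations=50):
    """Run a fixed number of hub/authority updates on a link graph.

    edges is an iterable of (source, target) links between pages in S.
    The iteration count is an assumed stopping rule; a convergence
    threshold would work as well.
    """
    edges = list(edges)
    auth = {p: 1.0 for p in pages}  # step 2: initialize all a_p = 1
    hub = {p: 1.0 for p in pages}   # step 2: initialize all h_p = 1
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages pointing to p.
        new_auth = {p: 0.0 for p in pages}
        for source, target in edges:
            new_auth[target] += hub[source]
        # Hub update: sum the new authority scores of pages p points to.
        new_hub = {p: 0.0 for p in pages}
        for source, target in edges:
            new_hub[source] += new_auth[target]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return auth, hub


# Hypothetical toy graph: two hub-like pages linking to two content pages.
pages = ["h1", "h2", "a1", "a2"]
edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1")]
auth, hub = hits(pages, edges)
```

On this toy graph "a1" ends up with the highest authority score because both hub pages point to it, and "h1" ends up as the stronger hub because it points to both content pages.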