Algorithm Study (10): HITS

3570 views | 2013-3-15 14:42 | Category: Research Notes | Study

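I: The problem addressed

1: Types of user queries

(1): Specific queries, e.g. "Does Netscape support the JDK 1.1 code-signing API?"

(2): Broad-topic queries, e.g. "Find information about the Java programming language"

2: Difficulties

(1): For specific queries, there are so few relevant terms that the desired results are hard to find.

(2): For broad-topic queries, there are far too many relevant results, so it is hard to pick out the authoritative or definitive pages.

3: Limitations of text-based analysis

Most popular pages are not sufficiently self-descriptive. For example, www.harvard.edu is an authoritative page, yet the term "harvard" tends to appear on a large number of other pages, while www.harvard.edu itself uses it comparatively little.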

II: The solution

1: Exploit the link structure

2: Introduce the concepts of Authorities and Hubs

(1): Authorities: pages that are recognized as providing significant, trustworthy, and useful information on a topic.

(2): Hubs: index pages that provide lots of useful links to relevant content pages (topic authorities).

(3): In-degree: the number of pointers to a page; it is one simple measure of authority.

(4): Out-degree: the number of pointers from a page to other pages.
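To make the two degree measures concrete, here is a minimal sketch that counts them on a toy link graph. The page labels and the `degrees` helper are made up for illustration and are not from the original notes; links are assumed to be given as (source, target) pairs.

```python
from collections import defaultdict


def degrees(edges):
    """Return (in_degree, out_degree) dicts for a list of (source, target) links.

    Duplicate links are counted once per occurrence.
    """
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1   # src points to one more page
        in_deg[dst] += 1    # dst is pointed to by one more page
    return dict(in_deg), dict(out_deg)


# Hypothetical toy graph: two hub-like pages and two authority-like pages.
edges = [
    ("hub1", "authA"),
    ("hub1", "authB"),
    ("hub2", "authA"),
    ("authB", "authA"),
]
in_deg, out_deg = degrees(edges)
print("in-degree:", in_deg)
print("out-degree:", out_deg)
```

In this toy graph "authA" has the largest in-degree, which is the sense in which in-degree acts as a first, crude measure of authority.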

3: Root Set and Base Set

(1): p: the query string

(2): Root Set R_p: the pages returned as results for the query

(3): Base Set: the Root Set expanded with

1: all pages linked to by pages in root set

2: all pages that link to a page in root set
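The expansion from Root Set to Base Set described above can be sketched as follows. This is only an illustration under stated assumptions: the link graph is given as two dictionaries (`out_links`, `in_links`), the function name `build_base_set` is hypothetical, and the optional cap `d` on in-linking pages per root page is an assumption added here to keep the set from exploding around very popular pages; it is not spelled out in these notes.

```python
def build_base_set(root_set, out_links, in_links, d=None):
    """Expand a root set into a base set.

    out_links[p]: pages that p links to.
    in_links[p]:  pages that link to p.
    d (assumed, optional): at most d in-linking pages are added per root page.
    """
    base = set(root_set)
    for p in root_set:
        # 1: all pages linked to by pages in the root set
        base.update(out_links.get(p, ()))
        # 2: pages that link to a page in the root set (optionally capped);
        #    sorted only to make the choice deterministic in this sketch
        sources = sorted(in_links.get(p, ()))
        if d is not None:
            sources = sources[:d]
        base.update(sources)
    return base


# Hypothetical toy data.
out_links = {"r1": {"x"}, "r2": {"y", "r1"}}
in_links = {"r1": {"r2", "u", "v"}, "r2": {"w"}}

base = build_base_set({"r1", "r2"}, out_links, in_links)
capped = build_base_set({"r1", "r2"}, out_links, in_links, d=1)
print(sorted(base))
print(sorted(capped))
```

With no cap, every neighbour of a root page is pulled in; with `d=1`, only one in-linking page per root page is kept.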

4: Subgraph construction algorithm



5: Computing Hubs and Authorities

(1): For each $p\in S$, define

                Authority score : $a_p$ (entry of vector $a$)

                Hub score       : $h_p$ (entry of vector $h$)

(2): Initialize all $a_p = h_p = 1$

(3): Maintain normalized scores:

                  $\sum_{p\in S} (a_p)^2 = 1$

                  $\sum_{p\in S} (h_p)^2 = 1$

(4): Iterative algorithm
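The iteration itself is not reproduced in these notes, so the following is a minimal sketch of the HITS update as it is commonly stated, not necessarily the exact listing the author had in mind. Each round sets $a_p$ to the sum of $h_q$ over pages $q$ that link to $p$, then sets $h_p$ to the sum of $a_q$ over pages $q$ that $p$ links to, then rescales both vectors so their squared entries sum to 1. The function names, the fixed iteration count, and the toy graph are assumptions for illustration.

```python
import math


def normalize(scores):
    """Scale scores so that the sum of squares is 1 (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {p: v / norm for p, v in scores.items()}


def hits(pages, edges, iterations=50):
    """Return (authority, hub) score dicts for the pages in S.

    edges: (source, target) links; both endpoints must be in `pages`.
    iterations: assumed fixed number of rounds instead of a convergence test.
    """
    a = {p: 1.0 for p in pages}  # initialize all a_p = 1
    h = {p: 1.0 for p in pages}  # initialize all h_p = 1
    for _ in range(iterations):
        # Authority update: a_p = sum of h_q over links q -> p
        new_a = {p: 0.0 for p in pages}
        for q, p in edges:
            new_a[p] += h[q]
        # Hub update: h_p = sum of (updated) a_q over links p -> q
        new_h = {p: 0.0 for p in pages}
        for p, q in edges:
            new_h[p] += new_a[q]
        a = normalize(new_a)
        h = normalize(new_h)
    return a, h


# Hypothetical toy graph: h1 and h2 act as hubs, a1 and a2 as authorities.
pages = ["h1", "h2", "a1", "a2"]
edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1")]
a, h = hits(pages, edges)
print("authority:", a)
print("hub:", h)
```

On this toy graph "a1" ends up with the largest authority score and "h1" with the largest hub score, since "h1" points to both authorities and "a1" is pointed to by both hubs.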

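To repost this article, please contact the original author for permission, and note that it comes from 沈成光's blog on ScienceNet (科学网).
Link: https://blog.sciencenet.cn/blog-796597-670554.html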
