This dataset contains webpages from 4 universities, each labeled as a professor, student, project, or other page.
The goal of the project is to develop a probabilistic, symbolic knowledge base that mirrors the content of the World Wide Web.
If successful, this will make text information on the web available in computer-understandable form, enabling much more sophisticated information retrieval and problem solving.
Project ideas:
* Learn classifiers that predict the type of a webpage from its text.
* Can you improve accuracy by using graphical models to exploit correlations between pages that link to each other?
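
As a starting point for the first idea, here is a minimal sketch of a text classifier, assuming a multinomial Naive Bayes model with add-one smoothing. The choice of model, the `NaiveBayes` class, the `tokenize` helper, and the tiny training sentences are all illustrative assumptions; they are not taken from the dataset or from the project's own code.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy examples standing in for real page text (NOT from the dataset).
train_docs = [
    "professor of computer science research interests publications teaching",
    "associate professor office hours courses taught publications",
    "graduate student advisor my resume hobbies",
    "phd student first year homepage my advisor",
    "project overview members funding research group goals",
    "project description software release members",
]
train_labels = ["professor", "professor", "student", "student", "project", "project"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("I am a phd student working with my advisor"))
```

With the real data, the toy sentences would be replaced by the text extracted from each page, and accuracy would be measured on held-out pages, for example by training on some universities and testing on another.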
Dataset download and related papers:
http://www.cs.cmu.edu/~webkb/