cjpnudt's personal blog http://blog.sciencenet.cn/u/cjpnudt


[Paper Reading] KDD15-040: A Self-Exciting Point Process Model for Predicting Tweet Popularity

2402 views | 2015-12-14 14:57 | Personal category: Paper Reading | System category: Research Notes

Title: SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity

Social networking websites allow users to create and share content. Big information cascades of post re-sharing can form as users of these sites re-share others’ posts with their friends and followers. One of the central challenges in understanding such cascading behaviors is in forecasting information outbreaks, where a single post becomes widely popular by being reshared by many users.

In this paper, we focus on predicting the final number of reshares of a given post. We build on the theory of self-exciting point processes to develop a statistical model that allows us to make accurate predictions. Our model requires no training or expensive feature engineering. It results in a simple and efficiently computable formula that allows us to answer questions, in real-time, such as: Given a post’s resharing history so far, what is our current estimate of its final number of reshares? Is the post resharing cascade past the initial stage of explosive growth? And, which posts will be the most reshared in the future?

We validate our model using one month of complete Twitter data and demonstrate a strong improvement in predictive accuracy over existing approaches. Our model gives only 15% relative error in predicting final size of an average information cascade after observing it for just one hour.

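Social networking sites let users create and share content. When posts are reshared again and again among users and their friends, large information cascades can form. A central challenge in understanding this cascading behavior is forecasting information outbreaks, in which a single post ends up being reshared by a large number of users.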

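This paper focuses on predicting the final number of reshares of a given tweet. Building on the theory of self-exciting point processes, we construct a statistical model that makes accurate predictions. Our model requires neither training nor expensive feature engineering. It yields a simple, efficiently computable formula that can answer questions in real time: given a tweet's resharing history so far, what is our current estimate of its final number of reshares? Has the tweet passed its stage of explosive growth? Which tweets will be reshared the most?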
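To make the term "self-exciting point process" concrete, here is a minimal sketch of a generic Hawkes-style intensity, in which every past reshare temporarily raises the rate of future reshares. This is not the formula from the SEISMIC paper: the exponential kernel, the parameter names `mu`, `alpha`, and `beta`, and the sample timestamps are all assumptions made for illustration.

```python
import math


def intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity of a Hawkes process with an exponential kernel.

    lambda(t) = mu + sum over past events t_i < t of
                alpha * beta * exp(-beta * (t - t_i))

    mu    : baseline rate (assumed, illustrative)
    alpha : average number of events directly triggered by one event
    beta  : decay rate of the excitation
    """
    excitation = sum(
        alpha * beta * math.exp(-beta * (t - t_i))
        for t_i in event_times
        if t_i < t
    )
    return mu + excitation


def expected_cascade_size(alpha):
    """Expected total number of events descending from one event (itself included).

    Each event triggers alpha children on average, so the expected size is the
    geometric series 1 + alpha + alpha**2 + ... = 1 / (1 - alpha), for alpha < 1.
    """
    if not 0 <= alpha < 1:
        raise ValueError("alpha must be in [0, 1) for the cascade to stay finite")
    return 1.0 / (1.0 - alpha)


# Hypothetical reshare timestamps (in hours), purely for illustration.
reshare_times = [0.0, 0.5, 0.6]
print(intensity(0.7, reshare_times))  # high: three recent reshares
print(intensity(5.0, reshare_times))  # close to mu: excitation has decayed
```

The second function shows why such models can give a closed-form estimate of final cascade size: when each event triggers fewer than one further event on average, the total is a convergent geometric series.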

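We validate our model on one month of complete Twitter data and show a strong improvement in predictive accuracy. After observing a cascade for just one hour, our model predicts the final size of an average cascade with only 15% relative error.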
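As a quick illustration of what a 15% relative error means, the sketch below computes the relative error of a predicted final reshare count against the true one. The function name and the numbers are hypothetical; the paper's exact evaluation metric is not reproduced here.

```python
def relative_error(predicted, actual):
    """Absolute difference between prediction and truth, as a fraction of the truth."""
    if actual <= 0:
        raise ValueError("actual must be positive")
    return abs(predicted - actual) / actual


# Hypothetical cascade: 100 final reshares, predicted as 115 after one hour.
print(relative_error(115, 100))  # 0.15, i.e. 15% relative error
```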


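What makes this paper impressive? In my view it is not the final accuracy. You could get 80% right with your eyes closed. Why? Because the vast majority of tweets are never retweeted. For the rest, summarizing a few regularities would get you past 50%. What really impresses me is this: first, the authors focus on such a basic problem, studying nothing more than the number of reshares; second, they managed to get the complete Twitter data. Impressive!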

The paper is also a good reference for writing style: look at how it presents a model.

https://blog.sciencenet.cn/blog-656867-943325.html
