Title: SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity
Social networking websites allow users to create and share content. Big information cascades of post re-sharing can form as users of these sites re-share others’ posts with their friends and followers. One of the central challenges in understanding such cascading behaviors is in forecasting information outbreaks, where a single post becomes widely popular by being reshared by many users. In this paper, we focus on predicting the final number of reshares of a given post. We build on the theory of self-exciting point processes to develop a statistical model that allows us to make accurate predictions. Our model requires no training or expensive feature engineering. It results in a simple and efficiently computable formula that allows us to answer questions, in real-time, such as: Given a post’s resharing history so far, what is our current estimate of its final number of reshares? Is the post resharing cascade past the initial stage of explosive growth? And, which posts will be the most reshared in the future? We validate our model using one month of complete Twitter data and demonstrate a strong improvement in predictive accuracy over existing approaches. Our model gives only 15% relative error in predicting final size of an average information cascade after observing it for just one hour.
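Social networks let users create and share content. When tweets are reshared again and again between users and their friends, large information cascades can form. The core challenge in understanding this cascading behavior is predicting information outbreaks, where a single post ends up being reshared by a large number of users. The paper focuses on predicting the final number of reshares of a given tweet. It builds a statistical model on the theory of self-exciting point processes, and the model makes accurate predictions without any training or expensive feature engineering. It yields a simple, efficiently computable formula that can answer questions in real time: given a tweet's resharing history so far, what is the current estimate of its final number of reshares? Is the tweet past its explosive-growth phase? Which tweets will be reshared the most? The model is validated on one month of complete Twitter data and shows a strong improvement in predictive accuracy. After observing a cascade for one hour, the model's relative error in predicting its final size is only 15% on average.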
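To make the idea of a self-exciting point process a bit more concrete, here is a minimal Python sketch. It is not the formula from the paper: it assumes a plain exponential memory kernel and a single constant branching ratio (the average number of reshares each reshare triggers), and the names `intensity`, `expected_final_count`, `relative_error`, `branching_ratio` and `decay` are made up for illustration. It only shows the general mechanism: every past reshare raises the rate of future reshares, and as long as the branching ratio is below 1, the expected final size can be written in closed form from the history so far.

```python
import math


def intensity(t, event_times, branching_ratio, decay):
    """Rate of new reshares at time t for a toy self-exciting process.

    Each past event at time s contributes
    branching_ratio * decay * exp(-decay * (t - s)),
    so recent events excite the process more than old ones.
    """
    return sum(
        branching_ratio * decay * math.exp(-decay * (t - s))
        for s in event_times
        if s <= t
    )


def expected_final_count(t, event_times, branching_ratio, decay):
    """Expected total number of events, given the history observed up to t."""
    if not 0 <= branching_ratio < 1:
        raise ValueError("branching_ratio must be in [0, 1) for a finite cascade")
    observed = sum(1 for s in event_times if s <= t)
    # Expected number of direct children that observed events have not produced yet
    # (the integral of the intensity from t to infinity).
    pending = sum(
        branching_ratio * math.exp(-decay * (t - s))
        for s in event_times
        if s <= t
    )
    # Each future event starts its own sub-cascade of expected size 1 / (1 - ratio).
    return observed + pending / (1.0 - branching_ratio)


def relative_error(predicted, actual):
    """Relative error |predicted - actual| / actual, the metric quoted in the abstract."""
    return abs(predicted - actual) / actual


if __name__ == "__main__":
    history = [0.0, 0.4, 0.5, 1.3]  # toy reshare times in hours, not real data
    print(intensity(1.5, history, branching_ratio=0.6, decay=1.0))
    print(expected_final_count(1.5, history, branching_ratio=0.6, decay=1.0))
```

In this toy version the prediction is just the observed count plus a correction for reshares still to come, and the correction shrinks toward zero as the cascade ages. That shape is what makes a real-time estimate cheap to compute.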
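What makes this paper impressive? I don't think it is the final accuracy. You could get 80% right with your eyes closed. Why? Because the vast majority of tweets are never reshared at all. For the rest, summarizing a few simple regularities would get you above 50%. What really impresses me is this: first, the authors focus on such a basic problem, studying nothing but the number of reshares; second, they managed to get the complete tweet data. Impressive!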
The paper is also a good reference for how to write up and present a model.