Identifying the focuses of online news and tracking their evolution plays an important role in news retrieval, news recommendation, and public opinion analysis. Existing news focus identification methods suffer from unclear focus identification, skewed evolution tracking, and an inability to capture the intensity distribution of focus coverage. Based on an in-depth analysis of the characteristics of news reports and of the principles of the LDA (Latent Dirichlet Allocation) topic model, this paper introduces the publication time of report documents into the LDA model and proposes DST-LDA (Dynamic Subtopic and Time based Topic Model), a news focus evolution tracking method that jointly models focuses and time. The model avoids the heavy dependence of previous tracking algorithms on time segmentation and produces three distribution matrices: document-focus θ, focus-word φ, and focus-time π. By selecting the feature words and feature times of each news focus, it efficiently separates the news focuses and identifies the temporal distribution and reporting intensity of each focus over its lifetime. DST-LDA was evaluated experimentally on four news datasets and compared with other mainstream algorithms. The experiments show that the proposed algorithm achieves good results in tracking the evolution of news focuses.
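The abstract does not specify DST-LDA's generative process or inference procedure, so the sketch below is only one plausible reading of "jointly modeling focus and time", not the authors' method. It assumes: publication times are discretized into integer bins (e.g. day indices); every word token's focus also emits its document's time bin from π, a discrete variant of the Topics-over-Time idea; and θ, φ, π are estimated by collapsed Gibbs sampling with symmetric priors α, β, γ. The function names `gibbs_dst_sketch` and `focus_features`, the prior values, and the rule "feature time = fewest bins holding 80% of π_k" are all hypothetical.

```python
import numpy as np


def gibbs_dst_sketch(docs, times, V, T, K, alpha=0.1, beta=0.01, gamma=0.01,
                     iters=200, seed=0):
    """Collapsed Gibbs sampler for a HYPOTHETICAL focus-time joint model.

    docs:  list of documents, each a list of word ids in [0, V)
    times: publication time bin of each document, an int in [0, T)
    Returns theta (D x K), phi (K x V) and pi (K x T).
    """
    rng = np.random.default_rng(seed)
    D = len(docs)
    n_dk = np.zeros((D, K))  # tokens in document d assigned to focus k
    n_kw = np.zeros((K, V))  # occurrences of word w assigned to focus k
    n_kt = np.zeros((K, T))  # tokens of focus k whose document is in time bin t
    n_k = np.zeros(K)        # total tokens assigned to focus k

    def bump(d, k, w, t, delta):
        # Add (delta=+1) or remove (delta=-1) one token from every count table.
        n_dk[d, k] += delta
        n_kw[k, w] += delta
        n_kt[k, t] += delta
        n_k[k] += delta

    # Random initial focus assignment for every token.
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            bump(d, k, w, times[d], 1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            t = times[d]
            for i, w in enumerate(doc):
                bump(d, z[d][i], w, t, -1)
                # p(z=k | rest) is proportional to
                # (n_dk + alpha) * P(w | phi_k) * P(t | pi_k), all without this token.
                p = ((n_dk[d] + alpha)
                     * (n_kw[:, w] + beta) / (n_k + V * beta)
                     * (n_kt[:, t] + gamma) / (n_k + T * gamma))
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                bump(d, k, w, t, 1)

    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + K * alpha)
    phi = (n_kw + beta) / (n_k[:, None] + V * beta)
    pi = (n_kt + gamma) / (n_k[:, None] + T * gamma)
    return theta, phi, pi


def focus_features(phi, pi, vocab, top_n=5, share=0.8):
    """Per focus: top-n feature words and the fewest time bins holding `share` of pi_k."""
    features = []
    for k in range(phi.shape[0]):
        words = [vocab[w] for w in np.argsort(phi[k])[::-1][:top_n]]
        order = np.argsort(pi[k])[::-1]          # time bins by decreasing mass
        covered = np.cumsum(pi[k][order])
        n_bins = int(np.searchsorted(covered, share)) + 1
        features.append((words, sorted(order[:n_bins].tolist())))
    return features


if __name__ == "__main__":
    vocab = ["quake", "rescue", "aid", "vote", "poll", "result"]
    docs = [[0, 1, 0, 2], [1, 2, 0], [3, 4, 5], [4, 3, 5, 5]]
    times = [0, 0, 3, 3]  # hypothetical day indices
    theta, phi, pi = gibbs_dst_sketch(docs, times, V=len(vocab), T=4, K=2)
    for words, bins in focus_features(phi, pi, vocab, top_n=3):
        print(words, bins)
```

Because the time bin is generated per token inside the same sampler rather than by slicing the corpus into fixed epochs, π is learned alongside φ, and each row π_k can be read directly as the coverage-intensity profile of focus k over time.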