Spam filtering is essentially a binary text classification problem, and feature selection is one of its key components. To address the specific characteristics of spam filtering, this paper improves two traditional feature selection methods, document frequency and mutual information, using the idea of “differential contribution”, and designs new feature selection methods for spam filtering. Experimental results show that feature selection based on differential contribution effectively improves the accuracy of spam filtering.
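As a rough illustration of the two traditional criteria named above, the following Python sketch scores each term by document frequency and by the pointwise mutual information between the term and the spam class. The abstract does not state the differential-contribution formulas, so the third score, `df_diff` (the absolute gap between a term's document-frequency rate in spam and in legitimate mail), is only an assumed stand-in for the idea and not the paper's definition. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def doc_freq(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def score_terms(spam_docs, ham_docs):
    """Return {term: (df, mi, df_diff)} for every term in the corpus.

    df      : number of documents (spam or ham) containing the term
    mi      : log( P(term, spam) / (P(term) * P(spam)) ), estimated from counts
    df_diff : |DF rate in spam - DF rate in ham|; an ASSUMED illustration of
              a "differential" score, not the formula used in the paper
    """
    n_spam, n_ham = len(spam_docs), len(ham_docs)
    n = n_spam + n_ham
    df_spam, df_ham = doc_freq(spam_docs), doc_freq(ham_docs)

    scores = {}
    for term in set(df_spam) | set(df_ham):
        a, b = df_spam[term], df_ham[term]  # Counter returns 0 if absent
        df = a + b
        # A term that never occurs in spam gets -inf rather than log(0).
        mi = math.log(a * n / (n_spam * df)) if a > 0 else float("-inf")
        df_diff = abs(a / n_spam - b / n_ham)
        scores[term] = (df, mi, df_diff)
    return scores


# Toy corpus: each document is already tokenized.
spam_docs = [["free", "money", "now"], ["free", "offer"]]
ham_docs = [["meeting", "now"], ["project", "meeting"]]
scores = score_terms(spam_docs, ham_docs)
```

In this toy corpus, "now" appears once in each class, so plain document frequency ranks it as highly as "free", while the class-difference score for "now" is zero. That is the kind of class-neutral term a difference-based criterion is meant to discount.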