Because written text often expresses a single meaning with several differently spelled words, sense-based text classification is studied as a replacement for keyword-based classification algorithms, with the aim of improving classification accuracy. From an analysis of the Synset structure in WordNet, it is argued that a sense-based text classification system that accounts for both word senses and the relations between senses can be applied to web page classification. The system also recognizes the mismatch between a fixed category structure and the steadily growing number of documents within that structure, and therefore adds a category-expansion function based on a category-information clustering method. Simulation experiments show that the system improves classification accuracy by 13% compared with existing semantics-based classification systems. The expansion function based on category-information clustering also yields higher-quality new categories than a system that uses similarity-based clustering.
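The motivation for classifying by word sense rather than by keyword can be illustrated with a minimal sketch. It is not the system described in the paper: the table `TOY_SENSES`, the sense identifiers, and the helper names `sense_features` and `overlap` are all hypothetical, and a real system would look senses up in WordNet rather than in a hand-written dictionary.

```python
from collections import Counter

# Hypothetical toy sense table (an assumption for illustration only):
# each surface word maps to a made-up sense identifier.
TOY_SENSES = {
    "car": "SENSE_CAR",
    "automobile": "SENSE_CAR",
    "auto": "SENSE_CAR",
    "doctor": "SENSE_DOCTOR",
    "physician": "SENSE_DOCTOR",
}

def sense_features(text):
    """Count sense identifiers instead of surface words.

    Words missing from the toy table are kept unchanged.
    """
    tokens = text.lower().split()
    return Counter(TOY_SENSES.get(tok, tok) for tok in tokens)

def overlap(a, b):
    """Size of the multiset intersection of two feature bags."""
    return sum((a & b).values())

doc = "automobile repair"
category_profile = "car repair"

# Keyword matching sees "automobile" and "car" as unrelated words.
keyword_score = overlap(Counter(doc.split()), Counter(category_profile.split()))

# Sense matching maps both words to the same identifier.
sense_score = overlap(sense_features(doc), sense_features(category_profile))

print(keyword_score, sense_score)
```

In this toy case the keyword overlap counts only the shared word "repair", while the sense overlap also matches "automobile" with "car". This is the effect the sense-based approach relies on.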