[Purpose/Significance] Traditional authorship identification methods are ill-suited to the large volume of online text now emerging. This paper reviews the typical methods and key problems of text authorship identification in recent years and offers an objective analysis and evaluation, with a view to providing new ideas for further research. [Method/Process] The paper analyzes the state of authorship identification research in China and abroad in terms of application areas, stylistic feature selection, author identity modeling, and performance evaluation metrics, and traces the development and trends of the field. [Result/Conclusion] Authorship identification must adapt to short texts, non-standard texts, and massive, high-dimensional, multilingual settings. It needs multi-level features with greater expressive and descriptive power, together with corresponding author identity modeling methods, and should draw on the latest results in information retrieval, machine learning, and natural language processing to improve efficiency and accuracy.