Statistical parsing uses a probabilistic evaluation model to score the likelihood of each candidate parse tree and selects the tree with the highest probability as the final parsing result. The core of statistical parsing is therefore the probabilistic evaluation model, and the essential difference among such models lies in which contextual features they use to assign probabilities to parse trees. Although a large number of probabilistic evaluation models have been proposed in statistical parsing research, different models rely on different types of features. How should the contribution of these feature types to parsing be assessed? To address this question, this study proposes a feature-type analysis model for statistical parsing that quantitatively analyzes, from an information-theoretic perspective, how well different types of contextual features predict syntactic structure. The basic idea is to use the entropy and conditional entropy measures of information theory to show whether a feature type captures the main information needed to predict syntactic structure. If the uncertainty (entropy) of the current syntactic structure drops markedly once a feature type is added, that feature type is considered to capture some of the main contextual information that influences syntactic structure. The information-theoretic model of feature-type analysis uses four measures (predictive information quantity, predictive information gain, predictive information relevance, and total predictive information) to quantify, from different angles, the predictive power of individual feature types and of feature-type combinations for the current target. Experiments using the Penn Tree Bank as the training set systematically quantified the predictive effect of different contextual feature types on parsing rules, and derived a series of ... regarding different ...
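The entropy-based idea in the abstract can be illustrated with a minimal sketch. It estimates the entropy of the parsing rule, H(R), and the conditional entropy given one feature type, H(R | F), from co-occurrence counts, and reports the drop H(R) - H(R | F). The abstract does not define its four measures, so treating this drop as the quantity of interest is an assumption made here, not the paper's stated formula. The function names and the toy rule and feature values are hypothetical and are not drawn from the Penn Tree Bank.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(Y) in bits, estimated from a list of observed labels."""
    counts = Counter(labels)
    total = len(labels)
    # p * log2(1/p) summed over the distinct labels
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def conditional_entropy(features, labels):
    """H(Y | X) in bits: entropy of the labels within each feature value,
    weighted by the relative frequency of that feature value."""
    total = len(labels)
    groups = {}
    for x, y in zip(features, labels):
        groups.setdefault(x, []).append(y)
    return sum(len(ys) / total * entropy(ys) for ys in groups.values())


def entropy_reduction(features, labels):
    """H(Y) - H(Y | X): how much knowing the feature lowers uncertainty about the label."""
    return entropy(labels) - conditional_entropy(features, labels)


# Toy observations (invented for illustration): the rule used to expand a node,
# together with two hypothetical context feature types recorded for that node.
rules = ["NP -> DT NN", "NP -> DT NN", "NP -> PRP", "NP -> PRP"]
parent_label = ["S", "VP", "S", "VP"]      # carries no information about the rule here
head_tag = ["NN", "NN", "PRP", "PRP"]      # fully determines the rule here

print(entropy(rules))                          # 1.0
print(entropy_reduction(parent_label, rules))  # 0.0
print(entropy_reduction(head_tag, rules))      # 1.0
```

In this toy data the hypothetical `head_tag` feature removes all uncertainty about the rule while `parent_label` removes none, which is the kind of contrast the analysis model is meant to quantify on real treebank counts.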