When applied to grapheme-to-phoneme conversion, the ID3 algorithm is slow and easily affected by data sparseness. To overcome these shortcomings, this paper proposes a new decision tree algorithm for grapheme-to-phoneme conversion, the conditional mixincrementing algorithm (CMI). Building on ID3, CMI selects decision attributes with a mutual information measure guided by prior phonetic knowledge, and introduces two parameters, minimum confidence and maximum support, to control leaf nodes. Experimental results show that CMI simplifies the computation and reduces the effect of sparse data on the predictive performance of the resulting decision tree. Under the same experimental conditions, CMI is 3.3 times faster than ID3 and improves the prediction accuracy of the decision tree by 11.6%.
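The abstract does not give the details of CMI, so the following is only a minimal sketch of the two ideas it names: choosing the split attribute by mutual information with the phoneme label, and stopping at a leaf based on two thresholds. Everything here is an assumption made for illustration. The function names, the toy data, and in particular the reading of the thresholds (a node becomes a leaf when the majority label's share reaches `min_confidence`, or when the node holds no more than `max_support` samples) are hypothetical and may differ from the paper's definitions. The guidance from prior phonetic knowledge is not modeled.

```python
import math
from collections import Counter


def mutual_information(samples, attr):
    """Estimate I(attr; label) in bits from raw counts."""
    n = len(samples)
    joint = Counter((x[attr], y) for x, y in samples)
    attr_counts = Counter(x[attr] for x, _ in samples)
    label_counts = Counter(y for _, y in samples)
    mi = 0.0
    for (a, y), c in joint.items():
        mi += (c / n) * math.log2(c * n / (attr_counts[a] * label_counts[y]))
    return mi


def is_leaf(samples, min_confidence, max_support):
    """Assumed stopping rule (not taken from the paper): stop when the
    majority label is confident enough or the node is too small to split."""
    top_count = Counter(y for _, y in samples).most_common(1)[0][1]
    return top_count / len(samples) >= min_confidence or len(samples) <= max_support


def build_tree(samples, attrs, min_confidence=0.9, max_support=2):
    """Grow a tree; leaves are labels, internal nodes are dicts."""
    majority = Counter(y for _, y in samples).most_common(1)[0][0]
    if not attrs or is_leaf(samples, min_confidence, max_support):
        return majority
    # Split on the attribute sharing the most information with the label.
    best = max(attrs, key=lambda a: mutual_information(samples, a))
    rest = [a for a in attrs if a != best]
    children = {}
    for value in {x[best] for x, _ in samples}:
        subset = [(x, y) for x, y in samples if x[best] == value]
        children[value] = build_tree(subset, rest, min_confidence, max_support)
    return {"attr": best, "children": children, "default": majority}


def predict(tree, x):
    """Walk the tree; fall back to the node's majority label on unseen values."""
    while isinstance(tree, dict):
        tree = tree["children"].get(x[tree["attr"]], tree["default"])
    return tree


# Toy data (invented): pronunciation of the letter "c" given its neighbours.
samples = [
    ({"prev": "#", "next": "a"}, "k"),
    ({"prev": "#", "next": "e"}, "s"),
    ({"prev": "a", "next": "a"}, "k"),
    ({"prev": "a", "next": "e"}, "s"),
    ({"prev": "i", "next": "o"}, "k"),
    ({"prev": "i", "next": "i"}, "s"),
]
tree = build_tree(samples, ["prev", "next"], min_confidence=1.0, max_support=1)
print(predict(tree, {"prev": "o", "next": "e"}))
```

In this toy set the following letter fully determines the phoneme while the preceding letter carries no information, so the sketch splits on `next` first. Raising `max_support` or lowering `min_confidence` makes the tree stop earlier, which is one way such thresholds could limit the influence of sparsely supported branches.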