Traditional feature selection methods generally take accuracy as the sole optimization objective and do not adequately account for skew in the class distribution of the data, so their performance on imbalanced datasets is unsatisfactory. The proposed method repeatedly draws, by sampling with replacement, multiple random subsets from the majority-class samples of an imbalanced dataset, with each drawn subset containing the same number of samples as the minority class; each drawn subset is then combined with the minority-class samples to form a new training set. Feature selection is performed on each new training set, and the resulting feature subsets are combined by ensemble voting: the final feature subset of the dataset consists of all features that receive more than half of the votes. Experimental results on imbalanced UCI datasets show that the proposed method performs well and is an effective feature selection approach for class-imbalance problems.
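The undersample-then-vote procedure described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-round feature scorer (absolute correlation with the label) and the parameter names (`n_rounds`, `k`) are assumptions, since the abstract does not fix a particular base feature selector.

```python
import numpy as np

def ensemble_feature_selection(X, y, n_rounds=11, k=2, seed=0):
    """Sketch of the method from the abstract: repeatedly undersample the
    majority class (with replacement) down to the minority-class size,
    run feature selection on each balanced set, and keep the features
    chosen by a strict majority of the rounds.

    The per-round scorer below (|correlation| with a binary label) is a
    hypothetical stand-in for whichever base selector the paper uses."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    majority = classes[np.argmax(counts)]
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y == majority)

    votes = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_rounds):
        # draw |minority| majority-class samples with replacement and
        # combine them with the full minority class into a balanced set
        sampled = rng.choice(maj_idx, size=min_idx.size, replace=True)
        idx = np.concatenate([min_idx, sampled])
        Xb = X[idx]
        yb = (y[idx] == minority).astype(float)

        # stand-in selector: score features by |correlation| with the label
        Xc = Xb - Xb.mean(axis=0)
        yc = yb - yb.mean()
        denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
        scores = np.abs(Xc.T @ yc) / denom

        # each round votes for its top-k features
        votes[np.argsort(scores)[-k:]] += 1

    # final subset: features winning more than half of the votes
    return np.flatnonzero(votes > n_rounds / 2)
```

Because every round trains on a balanced sample, no single round is dominated by the majority class, and the majority vote filters out features that only look informative on one particular resample.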