Computer Engineering ›› 2025, Vol. 51 ›› Issue (5): 188-195. doi: 10.19678/j.issn.1000-3428.0068887

• Advanced Computing and Data Processing •

Twin-Hypersphere Support Vector Machine Algorithm Based on Weighted Local Density

WANG Mengzhen, ZHANG Desheng, ZHANG Xiao

  1. College of Science, Xi'an University of Technology, Xi'an 710054, Shaanxi, China
  • Received: 2023-11-22 Revised: 2024-03-01 Online: 2025-05-15 Published: 2024-05-06
  • Corresponding author: WANG Mengzhen, E-mail: 2387517846@qq.com
  • Supported by:
    General Program of the National Natural Science Foundation of China (12171388).

Abstract: The Twin-Hypersphere Support Vector Machine (THSVM) algorithm describes the distribution of samples using a pair of hyperspheres. It ignores the density information of the samples, which makes it sensitive to noise, and it assigns the same weight to every feature, ignoring the differing influence of features on the classification result. To address these issues, this paper proposes a THSVM algorithm based on weighted local density (WLDTHSVM). First, information gain is used to calculate the weight of each feature, and the feature weights are applied in computing the Euclidean distance and the kernel function, which reduces the impact of irrelevant or weakly relevant features on the similarity between samples. Second, a new weighted local density function is constructed from the feature-weighted Euclidean distance. It considers both the class information of each sample's nearest neighbors and the influence of different features on the distances between samples, and the normalized weighted local density is combined with the error term to improve the model's robustness to noise. Finally, a feature-weighted decision function is used to determine the class of a test sample. The feasibility and effectiveness of the proposed algorithm are verified on UCI datasets and two artificial datasets. The experimental results show that, compared with algorithms such as the Support Vector Machine (SVM), the Twin Support Vector Machine (TWSVM), and THSVM, the WLDTHSVM algorithm improves average accuracy on 11 UCI datasets by up to 2.76 percentage points and performs well on noisy datasets.

Key words: Support Vector Machine (SVM), local density, feature weighting, information gain, kernel function
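
The first step described in the abstract (information-gain feature weights applied to the Euclidean distance and the kernel function) can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything specific here is an assumption made for illustration: the equal-width binning of continuous features, the normalization of gains so that the weights sum to 1, the placement of the weights inside the squared distance, the RBF form of the kernel, and all function names. It is not the WLDTHSVM algorithm itself and omits the local density and the hypersphere optimization.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, n_bins=4):
    """Information gain of one continuous feature.

    Assumption: the feature is discretized into equal-width bins first.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature
    groups = {}
    for v, lab in zip(values, labels):
        b = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        groups.setdefault(b, []).append(lab)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def feature_weights(X, y, n_bins=4):
    """Per-feature weights: information gains normalized to sum to 1 (assumed)."""
    gains = [information_gain(list(col), y, n_bins) for col in zip(*X)]
    total = sum(gains)
    if total == 0:  # no feature is informative: use uniform weights
        return [1.0 / len(gains)] * len(gains)
    return [g / total for g in gains]


def weighted_sq_distance(a, b, w):
    """Feature-weighted squared Euclidean distance: sum_k w_k * (a_k - b_k)^2."""
    return sum(wk * (ak - bk) ** 2 for wk, ak, bk in zip(w, a, b))


def weighted_distance(a, b, w):
    """Feature-weighted Euclidean distance."""
    return math.sqrt(weighted_sq_distance(a, b, w))


def weighted_rbf(a, b, w, gamma=1.0):
    """RBF kernel evaluated on the feature-weighted squared distance."""
    return math.exp(-gamma * weighted_sq_distance(a, b, w))


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 carries no class information.
    X = [[0.0, 5.0], [0.1, 1.0], [0.9, 5.0], [1.0, 1.0]]
    y = [0, 0, 1, 1]
    w = feature_weights(X, y)
    print("weights:", w)
    print("distance:", weighted_distance(X[0], X[3], w))
    print("kernel:", weighted_rbf(X[0], X[3], w))
```

In this toy example the uninformative second feature receives zero weight, so its large raw differences no longer affect the distance or the kernel value, which is the effect the abstract attributes to feature weighting.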
