Twin-Hypersphere Support Vector Machine Algorithm Based on Weighted Local Density

doi:10.19678/j.issn.1000-3428.0068887

Abstract:

The Twin-Hypersphere Support Vector Machine (THSVM) algorithm describes the distribution of samples using a pair of hyperspheres. It does not consider the density information of the samples, is sensitive to noise, and assigns the same weight to every feature, ignoring the different influence that individual features have on the classification result. To address these issues, this paper proposes a THSVM algorithm based on weighted local density (WLDTHSVM). First, information gain is used to calculate the weight of each feature, and the feature weights are applied in the computation of the Euclidean distance and the kernel function, which reduces the impact of irrelevant or weakly relevant features on sample similarity. Second, a new weighted local density function is constructed from the feature-weighted Euclidean distance. This function considers both the class information of a sample's nearest neighbors and the influence of different features on the distance between samples, and the normalized weighted local density is combined with the error term to make the model more robust to noise. Finally, a feature-weighted decision function determines the class of each test sample. The feasibility and effectiveness of the WLDTHSVM algorithm are verified on artificial datasets and UCI datasets. The experimental results show that, compared with algorithms such as the Support Vector Machine (SVM), the Twin Support Vector Machine (TWSVM), and THSVM, the WLDTHSVM algorithm improves average accuracy on 11 UCI datasets by up to 2.76 percentage points and classifies noisy datasets well.

Keywords: Support Vector Machine (SVM), local density, feature weighting, information gain, kernel function
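
As an illustration of the first step described in the abstract, the following Python sketch computes information-gain feature weights on a toy dataset with discrete features and applies them to a squared Euclidean distance and a Gaussian-style kernel. The abstract does not give the paper's exact formulas, so the normalization of the weights to sum to 1, the Gaussian form of the kernel, the `gamma` parameter, and all function names are assumptions made for illustration only. Continuous features would first need to be discretized before information gain can be computed this way.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Information gain of one discrete feature column with respect to the labels."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def feature_weights(X, y):
    """Per-feature information gains, normalized to sum to 1 (assumed normalization)."""
    gains = [information_gain([row[j] for row in X], y) for j in range(len(X[0]))]
    total = sum(gains)
    if total == 0:
        # No feature is informative: fall back to uniform weights.
        return [1.0 / len(gains)] * len(gains)
    return [g / total for g in gains]


def weighted_sq_distance(a, b, w):
    """Feature-weighted squared Euclidean distance between two samples."""
    return sum(wj * (aj - bj) ** 2 for wj, aj, bj in zip(w, a, b))


def weighted_rbf_kernel(a, b, w, gamma=1.0):
    """Gaussian kernel evaluated on the feature-weighted distance (assumed form)."""
    return math.exp(-gamma * weighted_sq_distance(a, b, w))


# Toy data: feature 0 determines the class, feature 1 is irrelevant.
X = [[0, 1], [0, 0], [1, 1], [1, 0]]
y = [0, 0, 1, 1]

w = feature_weights(X, y)
same_class = weighted_rbf_kernel(X[0], X[1], w)   # samples differ only in feature 1
other_class = weighted_rbf_kernel(X[0], X[2], w)  # samples differ only in feature 0
print(w, same_class, other_class)
```

In this toy case the irrelevant feature receives zero weight, so two samples that differ only in that feature are treated as identical by the kernel, which is the effect the abstract attributes to feature weighting.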

WANG Mengzhen, ZHANG Desheng, ZHANG Xiao. Twin-Hypersphere Support Vector Machine Algorithm Based on Weighted Local Density[J]. Computer Engineering, 2025, 51(5): 188-195.

https://www.ecice06.com/CN/Y2025/V51/I5/188

Figures/Tables: 9

Fig.1 Geometric interpretation of the WLDTHSVM algorithm

Fig.2 Procedure of the WLDTHSVM algorithm

Fig.3 Simulation results of WLDTHSVM and THSVM on the Ripley dataset

Fig.4 Simulation results of WLDTHSVM and THSVM on the Banana dataset

Fig.5 Simulation results of the four algorithms under different noise ratios

