
Computer Engineering ›› 2024, Vol. 50 ›› Issue (7): 123-132. doi: 10.19678/j.issn.1000-3428.0068211

• Artificial Intelligence and Pattern Recognition •

Incremental Sparse Density-Weighted Twin Support Vector Regression

Weijie DING, Binjie GU*, Feng PAN

  1. Key Laboratory of Advanced Process Control for Light Industry, Ministry of Education, Jiangnan University, Wuxi 214122, Jiangsu, China
  • Received: 2023-08-10  Online: 2024-07-15  Published: 2023-12-05
  • Contact: Binjie GU
  • Supported by: National Natural Science Foundation of China(51961125102)

Abstract:

The Density-Weighted Twin Support Vector Regression(DWTSVR) is a regression algorithm that reflects the intrinsic distribution of the data and offers high prediction accuracy and strong robustness; however, it is unsuitable when training samples arrive incrementally. To address this problem, this paper proposes the Incremental Sparse DWTSVR(ISDWTSVR). First, each newly added sample is checked for being an outlier, and appropriate weights are assigned to the valid samples, which reduces the impact of abnormal samples on the generalization performance of the model. Next, combining the ideas of matrix dimensionality reduction and principal component analysis, a set of basis column vectors is selected from the original kernel matrix to replace the original features, so that the columns of the kernel matrix are sparsified and a sparse solution is obtained. Then, the Newton iteration method and an incremental learning strategy are used to adjust the model information from the previous moment, yielding an incremental update of the model; in addition, the matrix inversion lemma is applied to avoid computing matrix inverses directly during the incremental update, which further accelerates training. Finally, simulation experiments are performed with ISDWTSVR on UCI benchmark datasets, and the results are compared with those of existing representative algorithms. The experimental results show that ISDWTSVR inherits the generalization performance of DWTSVR. On the large-scale Bike-Sharing dataset, the average CPU time to update the model after adding one new sample is 5.13 s, which is 97.94% shorter than that of DWTSVR. Thus, ISDWTSVR effectively eliminates the need to retrain the model from scratch and is suitable for online learning on large-scale datasets.
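
The computational core of the incremental update is the matrix inversion lemma: when one sample is added, the inverse required by the Newton step can be corrected from the previous inverse by a low-rank update instead of being recomputed from scratch. The NumPy sketch below illustrates only this rank-one update idea on a regularized Gram matrix; the function name sherman_morrison_update and the toy maintained matrix are illustrative assumptions, not the paper's actual ISDWTSVR formulation.

    import numpy as np

    def sherman_morrison_update(A_inv, u, v):
        # (A + u v^T)^{-1} = A^{-1} - (A^{-1} u v^T A^{-1}) / (1 + v^T A^{-1} u)
        Au = A_inv @ u                      # column vector, shape (d, 1)
        vA = v.T @ A_inv                    # row vector, shape (1, d)
        denom = 1.0 + (v.T @ Au).item()     # scalar 1 + v^T A^{-1} u
        return A_inv - (Au @ vA) / denom

    # Toy example: maintain (lambda*I + X^T X)^{-1} as samples x_t arrive one by one.
    rng = np.random.default_rng(0)
    d, lam = 5, 1e-2
    A_inv = np.eye(d) / lam                 # inverse of lambda*I before any data arrives
    X = rng.standard_normal((200, d))
    for x_t in X:                           # each new sample adds the rank-one term x_t x_t^T
        x_t = x_t.reshape(-1, 1)
        A_inv = sherman_morrison_update(A_inv, x_t, x_t)

    # Compare with retraining from scratch (direct inversion).
    A_inv_direct = np.linalg.inv(lam * np.eye(d) + X.T @ X)
    print(np.allclose(A_inv, A_inv_direct))

Each such update costs only a few matrix-vector products, which is the kind of saving that lets a new sample be absorbed in seconds rather than triggering a full retraining pass.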

Key words: Twin Support Vector Regression(TSVR), incremental learning, sparsification, density-weighted, Newton iteration method