Generalized Separable Nonnegative Matrix Factorization Algorithm Based on Orthogonal Constraints

doi:10.19678/j.issn.1000-3428.0064749

Abstract

Separable Nonnegative Matrix Factorization (NMF) is a special NMF method that represents an entire dataset by extracting a subset of its samples or key topics. Generalized Separable Nonnegative Matrix Factorization (GSNMF) extends separable NMF so that two types of features, key samples and key topics, are obtained simultaneously, which makes the decomposition results more interpretable. However, when GSNMF processes certain datasets, a defect in its selection method causes it to select features from only the rows or only the columns, and the interpretability advantage is lost. To address this, orthogonal constraints are introduced to correct the selections made by GSNMF, yielding the GSNMF with Orthogonal constraints (OGSNMF) algorithm. OGSNMF exploits nonnegativity together with orthogonal constraints to restrict the row and column iteration matrices during the iterative process. This ensures that row and column features are obtained simultaneously and yields more accurate decomposition results. To verify the effectiveness of the algorithm, the relative approximation error is used as the experimental metric, and the distribution of the rank of the decomposition results over rows and columns is used as the evaluation criterion. Experimental results show that, compared with the original algorithm, OGSNMF improves the relative approximation error by 1 to 3 percentage points, indicating that less information is lost during decomposition. OGSNMF also ensures that both row and column features are obtained, making the decomposition results more interpretable.

Key words: dimension reduction, Nonnegative Matrix Factorization (NMF), Generalized Separable Nonnegative Matrix Factorization (GSNMF), orthogonal constraint, data representation
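The abstract credits OGSNMF's fix to combining nonnegativity with orthogonal constraints. The sketch below illustrates only the general mechanism, not the OGSNMF update rules, which this page does not give. For a nonnegative matrix X, each off-diagonal entry of $ X^{T}X $ is a sum of nonnegative products, so $ X^{T}X $ is diagonal exactly when the columns of X have disjoint supports. An orthogonality penalty such as $ \|X^{T}X-I\|_F^{2} $ therefore pushes nonnegative factors toward selection-like structure. The penalty form is a common choice in orthogonal NMF and is assumed here rather than taken from the paper. The function names are hypothetical.

```python
import numpy as np


def orthogonality_penalty(X: np.ndarray) -> float:
    """Squared Frobenius distance ||X^T X - I||_F^2 (an assumed, generic penalty)."""
    k = X.shape[1]
    return float(np.linalg.norm(X.T @ X - np.eye(k), "fro") ** 2)


def has_disjoint_column_supports(X: np.ndarray, tol: float = 1e-12) -> bool:
    """True if every row has at most one entry above tol, i.e. column supports are disjoint."""
    return bool(np.all((X > tol).sum(axis=1) <= 1))


# A pure column-selection matrix: picks columns 0 and 2 out of 4.
S = np.zeros((4, 2))
S[0, 0] = 1.0
S[2, 1] = 1.0

# A nonnegative matrix whose two columns overlap in row 1.
Q = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0],
              [0.0, 0.0]])

print(orthogonality_penalty(S), has_disjoint_column_supports(S))
print(orthogonality_penalty(Q), has_disjoint_column_supports(Q))
```

S incurs zero penalty because its columns never share a row. Q mixes both columns in row 1, so $ Q^{T}Q $ gains off-diagonal mass and the penalty becomes positive.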

Junhang CHEN, Zuyuan YANG, Mingyang LIU, Lingjiang LI. Generalized Separable Nonnegative Matrix Factorization Algorithm Based on Orthogonal Constraints[J]. Computer Engineering, 2023, 49(8): 46-53.

https://www.ecice06.com/CN/Y2023/V49/I8/46


Table 1 Factorization results of the 0-K problem

Dataset	r/classes	n/samples	GSNMF (r₁, r₂)	GSNMF relative approximation error/%
ORL	6	9	[0, 6]	83.37
Frey	5	51	[0, 5]	87.30
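The (r₁, r₂) column can be read against the generalized separable model introduced by Pan and Gillis for GSNMF. That model is restated below in our own notation. This page does not say which of r₁ and r₂ counts selected columns and which counts selected rows, so taking r₁ = |𝒦| and r₂ = |ℒ| is an assumption.

```latex
M \approx M(:,\mathcal{K})\,P_1 + P_2\,M(\mathcal{L},:),
\qquad M \in \mathbb{R}_+^{m \times n},\quad
P_1 \in \mathbb{R}_+^{|\mathcal{K}| \times n},\quad
P_2 \in \mathbb{R}_+^{m \times |\mathcal{L}|},
\qquad r = |\mathcal{K}| + |\mathcal{L}| = r_1 + r_2 .
```

A 0-K result such as [0, 6] means one index set is empty. The model then collapses to ordinary separable NMF along a single direction, for example $ M \approx P_2 M(\mathcal{L},:) $ when 𝒦 is empty, so only one kind of feature is returned.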

Table 2 Number of categories and total samples of the datasets (unit: count)

Dataset	Features	Categories	Total samples extracted
ORL	1 024	5	54
Frey	560	5	55
UMIST	1 024	7	126
AR	1 024	7	42
CMU_PIE	1 024	7	140

Table 3 Rank assignment and relative approximation error (%) of each algorithm

Dataset	r	OGSNMF (r₁, r₂)	OGSNMF relative approximation error/%	GSPA (r₁, r₂)	GSPA relative approximation error/%	GSFGM (r₁, r₂)	GSFGM relative approximation error/%	SPA* relative approximation error/%	SPA-C relative approximation error/%	SPA-R relative approximation error/%
ORL	5	[4, 1]	84.41	[0, 5]	83.37	[0, 5]	83.37	83.37	83.41	83.37
Frey	5	[2, 3]	87.89	[0, 5]	87.29	[0, 5]	87.29	87.29	84.47	87.29
UMIST	7	[2, 5]	72.01	[0, 7]	71.20	[0, 7]	71.20	71.20	70.40	71.20
AR	7	[1, 6]	77.94	[0, 7]	76.48	[0, 7]	76.48	76.48	77.82	76.48
CMU_PIE	7	[3, 4]	78.36	[0, 7]	77.79	[0, 7]	77.79	77.79	77.99	77.79
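Table 3 compares OGSNMF against SPA-based baselines. The sketch below assumes that SPA denotes the successive projection algorithm for separable NMF (Gillis and Vavasis). It also assumes that SPA-C and SPA-R apply SPA to the columns and to the rows (via the transpose), respectively. Under those assumptions, it selects columns and rows of a toy separable matrix. The scoring function is also an assumption. Because the abstract reads a higher value as less information lost, the score is computed as 100(1 − ‖M − M̂‖_F / ‖M‖_F); the paper's exact definition is not shown on this page. Coefficients are fitted by unconstrained least squares rather than nonnegative least squares, which keeps the example dependency-free.

```python
import numpy as np


def spa(M: np.ndarray, r: int) -> list[int]:
    """Successive projection algorithm: return r column indices of M."""
    R = np.array(M, dtype=float)  # residual matrix, updated in place
    selected: list[int] = []
    for _ in range(r):
        j = int(np.argmax(np.sum(R * R, axis=0)))  # column with largest residual norm
        selected.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(u, u @ R)  # project every column onto the complement of u
    return selected


def score_pct(M: np.ndarray, M_hat: np.ndarray) -> float:
    """ASSUMED metric: 100 * (1 - ||M - M_hat||_F / ||M||_F); higher is better."""
    return 100.0 * (1.0 - np.linalg.norm(M - M_hat) / np.linalg.norm(M))


# Toy separable matrix: the last two columns are convex combinations of the first three.
W = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
              [1, 1, 0], [0, 1, 1], [1, 0, 1]], dtype=float)
H2 = np.array([[0.5, 0.2], [0.3, 0.3], [0.2, 0.5]])
M = W @ np.hstack([np.eye(3), H2])

K = spa(M, 3)                                        # column indices (SPA on columns)
L = spa(M.T, 2)                                      # row indices (SPA on the transpose)
H_hat, *_ = np.linalg.lstsq(M[:, K], M, rcond=None)  # unconstrained fit, not NNLS
print(sorted(K), L, round(score_pct(M, M[:, K] @ H_hat), 6))
```

On this exactly separable input, SPA recovers the three generating columns and the fit is exact, so the score is 100 up to rounding. Real face data is only approximately separable, which is why the values in Table 3 are much lower.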

Table 4 Complexity of different algorithms

Algorithm	Complexity
SPA*	$ O(m+n+mn) $
SPA-C	$ O\left(m\right) $
SPA-R	$ O\left(n\right) $
GSPA	$ O\left(mn\right) $
GSFGM	$ O({m}^{2}n+{n}^{2}m+{n}^{3}+{m}^{3}) $
OGSNMF	$ O({m}^{2}n+{n}^{2}m+{n}^{3}+{m}^{3}) $
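To give a sense of scale, the sketch below evaluates the expressions in Table 4 for the dataset sizes in Table 2. It assumes that m is the feature count and n the number of extracted samples, since the page does not define m and n. Constant factors are ignored, so the numbers compare growth rates, not measured running times.

```python
# (m, n) per dataset, ASSUMING m = feature count and n = extracted-sample count (Table 2).
DATASETS = {
    "ORL": (1024, 54),
    "Frey": (560, 55),
    "UMIST": (1024, 126),
    "AR": (1024, 42),
    "CMU_PIE": (1024, 140),
}


def gspa_ops(m: int, n: int) -> int:
    """Expression listed for GSPA in Table 4: mn."""
    return m * n


def ogsnmf_ops(m: int, n: int) -> int:
    """Expression listed for GSFGM and OGSNMF in Table 4: m^2 n + n^2 m + n^3 + m^3."""
    return m**2 * n + n**2 * m + n**3 + m**3


for name, (m, n) in DATASETS.items():
    g, o = gspa_ops(m, n), ogsnmf_ops(m, n)
    print(f"{name:8s} mn = {g:>9,}  cubic = {o:>14,}  m^3 share = {m**3 / o:.0%}")
```

For UMIST (m = 1 024, n = 126), the cubic expression is 1 224 119 800 against mn = 129 024. The $ m^3 $ term alone accounts for about 88% of it, so with image data (m much larger than n) the cost of GSFGM and OGSNMF is governed by the feature dimension.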

Fig.1 Effect of OGSNMF parameters on the factorization results on the UMIST dataset

Table 5 Change of row rank and column rank as parameters $ {\gamma }_{1} $ and $ {\gamma }_{2} $ vary

Fixed $ {\gamma }_{2} $, varying $ {\gamma }_{1} $		Fixed $ {\gamma }_{1} $, varying $ {\gamma }_{2} $
r₁	r₂	r₁	r₂
3	4	1	6
1	6	1	6
4	3	1	6
3	4	1	6
2	5	1	6
3	4	2	5
1	6	2	5
1	6	1	6
1	6	2	5
1	6	1	6
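Table 5 does not list the γ values themselves, but the reported splits can be checked directly. The sketch below copies the (r₁, r₂) pairs and confirms that every setting keeps both r₁ and r₂ positive, so none falls into the 0-K case of Table 1. It also confirms that r₁ + r₂ stays at 7, which matches r = 7 for UMIST in Table 3, assuming Table 5, like Fig.1, refers to UMIST.

```python
# (r1, r2) pairs copied from Table 5, in table order.
VARY_GAMMA1 = [(3, 4), (1, 6), (4, 3), (3, 4), (2, 5), (3, 4), (1, 6), (1, 6), (1, 6), (1, 6)]
VARY_GAMMA2 = [(1, 6), (1, 6), (1, 6), (1, 6), (1, 6), (2, 5), (2, 5), (1, 6), (2, 5), (1, 6)]


def summarize(pairs):
    """Return (no 0-K split, set of r1 + r2 totals, (min r1, max r1))."""
    both_sides = all(r1 > 0 and r2 > 0 for r1, r2 in pairs)
    totals = {r1 + r2 for r1, r2 in pairs}
    r1_values = [r1 for r1, _ in pairs]
    return both_sides, totals, (min(r1_values), max(r1_values))


for label, pairs in (("vary gamma1", VARY_GAMMA1), ("vary gamma2", VARY_GAMMA2)):
    print(label, summarize(pairs))
```

Varying $ {\gamma }_{1} $ moves r₁ between 1 and 4, while varying $ {\gamma }_{2} $ keeps it between 1 and 2.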

Fig.2 Visualization of the factorization results of the Frey dataset

Fig.3 Visualization of the factorization results of the ORL dataset

Fig.4 Visualization of the factorization results of the AR dataset

Fig.5 Visualization of the factorization results of the UMIST dataset

Fig.6 Visualization of the factorization results of the CMU_PIE dataset
