Heterogeneous Hypernetwork Representation Learning Based on Importance Sampling

doi:10.19678/j.issn.1000-3428.0069679

Abstract:

Heterogeneous hypernetworks can model the various high-order tuple relations found in the real world and thereby represent the heterogeneous high-order information within a hypernetwork. However, heterogeneous hypernetworks exhibit different degrees of indecomposability, and existing methods do not fully account for the indecomposability of high-order tuple relations (hyperedges). To address this issue, this paper proposes HRIS, a heterogeneous hypernetwork representation learning method based on importance sampling that incorporates tight high-order tuple relations into hypernetwork representation learning. First, HRIS introduces the concept of judgment nodes and combines indecomposability factors with tuple similarity to improve the random-walk sampling of important nodes, thereby capturing tight high-order tuple relations within the hypernetwork. Second, to make the walk sequences more global and diverse, it adopts the random swap technique from data augmentation to mitigate overfitting and proposes a node-degree-based random deletion method to improve robustness. Finally, it constructs NSE-skip-gram, a skip-gram model with enhanced negative sampling, to obtain high-quality node representation vectors. Experiments on four real-world hypernetwork datasets show that HRIS significantly outperforms the baseline methods on the link prediction task. On the hypernetwork reconstruction task, HRIS improves on the best baseline method by an average of 3.75 and 9.79 percentage points on the Global Positioning System (GPS) and drug datasets, respectively, across all reconstruction ratios.
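The walk-and-augment pipeline summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation and makes several loud assumptions: the judgment-node mechanism is omitted, the per-hyperedge indecomposability factors `phi` are supplied by hand, tuple similarity is approximated by Jaccard overlap between hyperedges, and "degree-based random deletion" is guessed to mean dropping a node with probability proportional to its degree. All function names (`build_incidence`, `weighted_walk`, `random_swap`, `degree_random_delete`) are hypothetical.

```python
import random
from collections import defaultdict


def build_incidence(hyperedges):
    """Map each node to the indices of the hyperedges (tuples) that contain it."""
    node_to_edges = defaultdict(list)
    for idx, edge in enumerate(hyperedges):
        for node in edge:
            node_to_edges[node].append(idx)
    return node_to_edges


def jaccard(a, b):
    """Stand-in tuple similarity (an assumption): Jaccard overlap of two hyperedges."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def weighted_walk(hyperedges, node_to_edges, phi, start, length, rng):
    """Node -> hyperedge -> node walk. The next hyperedge e is drawn with weight
    phi[e] * (1 + similarity to the previous hyperedge); phi must be positive.
    This weighting is hypothetical, not the HRIS formula."""
    walk, node, prev_edge = [start], start, None
    while len(walk) < length:
        candidates = node_to_edges[node]
        weights = []
        for e in candidates:
            sim = 0.0 if prev_edge is None else jaccard(hyperedges[prev_edge], hyperedges[e])
            weights.append(phi[e] * (1.0 + sim))
        edge = rng.choices(candidates, weights=weights, k=1)[0]
        # Move to another member of the chosen tuple (stay put if it is a singleton).
        others = [v for v in hyperedges[edge] if v != node] or [node]
        node = rng.choice(others)
        walk.append(node)
        prev_edge = edge
    return walk


def random_swap(seq, n_swaps, rng):
    """EDA-style random swap: exchange the positions of two random entries n_swaps times."""
    seq = list(seq)
    for _ in range(n_swaps):
        if len(seq) < 2:
            break
        i, j = rng.sample(range(len(seq)), 2)
        seq[i], seq[j] = seq[j], seq[i]
    return seq


def degree_random_delete(seq, degree, p_max, rng):
    """Hypothetical degree-based deletion: drop node v with probability
    p_max * degree[v] / (max degree in seq); always keep at least one node."""
    max_deg = max(degree[v] for v in seq)
    kept = [v for v in seq if rng.random() >= p_max * degree[v] / max_deg]
    return kept or [seq[0]]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy heterogeneous hypernetwork of (user, location, activity) tuples.
    hyperedges = [("u1", "l1", "a1"), ("u1", "l2", "a1"), ("u2", "l1", "a2")]
    phi = [1.0, 0.5, 2.0]  # hypothetical per-hyperedge indecomposability factors
    node_to_edges = build_incidence(hyperedges)
    degree = {v: len(es) for v, es in node_to_edges.items()}
    walk = weighted_walk(hyperedges, node_to_edges, phi, "u1", 8, rng)
    augmented = degree_random_delete(random_swap(walk, 2, rng), degree, 0.3, rng)
    print(walk)
    print(augmented)
```

The augmented sequences would then serve as the training corpus for a skip-gram model with negative sampling (for instance gensim's `Word2Vec` with `sg=1` and `negative` greater than 0); the NSE-skip-gram enhancement described in the paper is not reproduced here.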

Key words: representation learning, high-order tuple relation, importance sampling, data augmentation, negative sampling enhancement, link prediction, hypernetwork reconstruction

XIA Qingqing, ZHU Yu, WANG Xiaoying, HUANG Jianqiang, CAO Tengfei. Heterogeneous Hypernetwork Representation Learning Based on Importance Sampling[J]. Computer Engineering, 2025, 51(11): 133-143.

https://www.ecice06.com/CN/Y2025/V51/I11/133

Figures/Tables (8)

Fig.1 Heterogeneous hypernetwork

Fig.2 Calculation framework of tuple similarity

Fig.3 HRIS method framework

Fig.4 GPS hypernetwork reconstruction results

Fig.5 drug hypernetwork reconstruction results

Fig.6 Parameter sensitivity analysis
