
计算机工程 ›› 2023, Vol. 49 ›› Issue (2): 150-160,174. doi: 10.19678/j.issn.1000-3428.0063898

• 人工智能与模式识别 •

基于多通道图卷积自编码器的图表示学习

袁立宁1, 胡皓1, 刘钊2   

  1. 中国人民公安大学 信息网络安全学院, 北京 100038;
    2. 中国人民公安大学 研究生院, 北京 100038
  • 收稿日期:2022-02-12 修回日期:2022-03-22 发布日期:2022-05-24
  • 作者简介:袁立宁(1995-),男,硕士研究生,主研方向为机器学习、图神经网络;胡皓,硕士研究生;刘钊(通信作者),讲师、博士。
  • 基金资助:
    国家重点研发计划“基于大数据技术文物安全综合信息应用平台关键技术研究”(2020YFC1522600);中央高校基本科研业务费专项资金“视频中显著物体检测研究方法”(2019JKF425)。

Graph Representation Learning Based on Multi-Channel Graph Convolutional Autoencoders

YUAN Lining1, HU Hao1, LIU Zhao2   

  1. School of Information Cyber Security, People's Public Security University of China, Beijing 100038, China;
    2. Graduate School, People's Public Security University of China, Beijing 100038, China
  • Received:2022-02-12 Revised:2022-03-22 Published:2022-05-24

摘要: 针对基于图卷积的自编码器模型对原始图属性和拓扑信息的保留能力有限、无法学习结构和属性之间深度关联信息等问题,提出基于多通道图卷积自编码器的图表示学习模型。设计拓扑和属性信息保留能力实验,验证了基于图卷积的自编码器模型具备保留节点属性和拓扑结构信息的能力。构建特定信息卷积编码器和一致信息卷积编码器,提取图的属性空间特征、拓扑空间特征以及两者关联特征,生成属性嵌入、拓扑嵌入和一致性嵌入,同时建立与编码器对称的卷积解码器,还原编码器过程。使用重构损失、局部约束和一致性约束,优化各编码器生成的低维嵌入表示。最终将蕴含不同图信息的多种嵌入进行融合,生成各节点的嵌入表示。实验结果表明,该模型在BlogCatalog和Flickr数据集上节点分类的Micro-F1和Macro-F1明显高于基线模型,在Citeseer数据集上节点聚类的精度和归一化互信息相比于表现最优的基线模型提升了11.84%和34.03%。上述实验结果证明了该模型采用的多通道方式能够在低维嵌入中保留更丰富的图信息,提升图机器学习任务的性能表现。

关键词: 图表示学习, 图卷积网络, 自编码器, 节点分类, 节点聚类

Abstract: This study proposes a graph representation learning model based on multi-channel graph convolutional autoencoders to address the limited ability of graph convolutional autoencoder models to retain the original graph's attribute and topology information, and their inability to learn deep associations between structure and attributes. First, topology- and attribute-information retention experiments are designed to verify that graph convolutional autoencoder models can retain node attribute and topological structure information. Second, specific-information and consensus-information convolutional encoders are constructed to extract attribute-space features, topology-space features, and their associations, generating attribute, topology, and consensus embeddings. Third, convolutional decoders symmetric to the encoders are built to reverse the encoding process. Fourth, reconstruction loss, local constraints, and consensus constraints are introduced to optimize the low-dimensional embeddings generated by each encoder. Finally, the multiple embeddings containing different graph information are fused to generate an embedding representation for each node. Experimental results show that the Micro-F1 and Macro-F1 of the proposed model for node classification on the BlogCatalog and Flickr datasets are significantly higher than those of the baseline models, and that its Clustering Accuracy(Cluster-Acc) and Normalized Mutual Information(NMI) for node clustering on the Citeseer dataset are 11.84% and 34.03% higher, respectively, than those of the best-performing baseline model. These results demonstrate that the multi-channel approach adopted by the model retains richer graph information in the low-dimensional embeddings and improves performance on graph machine learning tasks.
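The multi-channel pipeline described above — graph-convolutional encoders producing attribute and topology embeddings, fusion of the channels, and a decoder reconstructing the graph — can be sketched minimally in NumPy. This is not the authors' implementation: the single-layer encoders, random weights, and the inner-product decoder (standing in for the paper's symmetric convolutional decoders) are illustrative assumptions; only the standard GCN propagation rule ReLU(D^{-1/2}(A+I)D^{-1/2}HW) and the two-channel structure follow the abstract.

```python
import numpy as np

def normalize_adj(A):
    # Symmetric GCN normalization: D^{-1/2} (A + I) D^{-1/2}
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

def gcn_layer(A_norm, H, W):
    # One graph-convolution step: ReLU(A_norm @ H @ W)
    return np.maximum(A_norm @ H @ W, 0.0)

rng = np.random.default_rng(0)
n, f, d = 4, 5, 2                        # nodes, input features, embedding dim
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)  # toy undirected graph
X = rng.normal(size=(n, f))                # node attribute matrix

A_norm = normalize_adj(A)
W_attr = rng.normal(size=(f, d))           # attribute-channel weights (input: X)
W_topo = rng.normal(size=(n, d))           # topology-channel weights (input: A)

Z_attr = gcn_layer(A_norm, X, W_attr)      # attribute embedding
Z_topo = gcn_layer(A_norm, A, W_topo)      # topology embedding
Z = np.concatenate([Z_attr, Z_topo], axis=1)  # fused per-node embedding

# Inner-product decoder: reconstruct adjacency from the fused embedding,
# then score reconstruction quality (one of the losses the model optimizes).
A_rec = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))
recon_loss = np.mean((A - A_rec) ** 2)
```

In the full model, the reconstruction loss would be combined with the local and consensus constraints and back-propagated through trainable encoder/decoder weights; the sketch only shows how the two channels produce and fuse embeddings in a single forward pass.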

Key words: graph representation learning, Graph Convolution Network(GCN), autoencoder, node classification, node clustering
