
Computer Engineering ›› 2023, Vol. 49 ›› Issue (11): 123-130. doi: 10.19678/j.issn.1000-3428.0065936

• Artificial Intelligence and Pattern Recognition •

Adversarial Training Based Pseudo Label Constraint Auto-Encoder

Kun FU, Minglei SUN, Yuhan HAO, Yinghua LIU

  1. School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China
  • Received: 2022-10-08 Online: 2023-11-15 Published: 2023-02-09
  • About the authors:

    FU Kun (b. 1979), female, associate professor, Ph.D.; her research interests include social network analysis and network representation learning

    SUN Minglei, master's student

    HAO Yuhan, master's student

    LIU Yinghua, master's student

  • Funding:
    General Program of the National Natural Science Foundation of China (62072154)



Abstract:

Nodes in social networks often suffer from missing labels, labeling errors, and high manual labeling costs, all of which harm supervised and semi-supervised network representation learning. This paper proposes a self-supervised network representation learning model, the Adversarial Training based Pseudo Label Constraint Auto-Encoder (AT-PLCAE). A pseudo-label-constrained autoencoder is designed that shortens the distance between the pseudo-labels of the original graph and those of the network representations, reducing the information loss incurred during encoding and thereby constraining and guiding the model to learn effectively. An adversarial network matched to the pseudo-label-constrained autoencoder is also designed to organize the latent space structure of the representations. By forcing the posterior distribution of the latent representations to match a specific prior distribution over the input, the model alleviates overfitting and improves generalization. Node classification experiments on four public datasets (Cora, Citeseer, Wiki, and Pubmed) show that AT-PLCAE outperforms the baseline methods in classification accuracy: relative to the best baseline accuracy, it improves by 0.018 on Cora and by 0.011 on both Citeseer and Pubmed. Ablation results further show that the adversarial training designed for the pseudo-label-constrained autoencoder enhances the generalization ability of the model.
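To make the pseudo-label constraint concrete, the following is a minimal numpy sketch of one plausible reading of the abstract, assuming pseudo-labels are soft cluster assignments and the constraint is a KL divergence between the assignments computed on the original features and on the learned representations. All function names, the toy linear "encoder", and the choice of KL as the distance are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def soft_pseudo_labels(X, centers, tau=1.0):
    """Softly assign each row of X to cluster centers via a softmax
    over negative squared Euclidean distances (temperature tau)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (n, k)
    logits = -d2 / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def pseudo_label_constraint(P_x, P_z, eps=1e-12):
    """Mean KL(P_x || P_z): a distance between the pseudo-labels of the
    original graph features and those of the learned representations."""
    return float((P_x * (np.log(P_x + eps) - np.log(P_z + eps))).sum(1).mean())

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))       # original node features (toy data)
Z = X @ rng.normal(size=(5, 2))   # stand-in "encoder": a random linear map
Cx, Cz = X[:3], Z[:3]             # toy cluster centers in each space
loss = pseudo_label_constraint(soft_pseudo_labels(X, Cx),
                               soft_pseudo_labels(Z, Cz))
```

Minimizing such a loss with respect to the encoder would push the representations to preserve the cluster structure of the input, which matches the abstract's stated goal of reducing information loss during encoding.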

Key words: self-supervised learning, network representation learning, pseudo label, auto-encoder, adversarial training, generalization