
Computer Engineering ›› 2022, Vol. 48 ›› Issue (4): 16-21. doi: 10.19678/j.issn.1000-3428.0061221

• Hot Topics and Reviews •

Multi-label Financial Text Classification Algorithm Based on Graph Deep Learning

JIN Yucheng1,2, WANG Qingqin1,2, GAO Jian3, MIAO Zhongchen3, LIN Yuefeng3, XIANG Yali1,2, XIONG Yun1,2

  1. School of Computer Science and Technology, Fudan University, Shanghai 200438, China;
  2. Shanghai Key Laboratory of Data Science, Shanghai 200438, China;
  3. Shanghai Financial Futures Information Technology Co., Ltd., Shanghai 200120, China
  • Received: 2021-03-22 Revised: 2021-05-18 Published: 2021-06-04
  • About the authors: JIN Yucheng (born 1998), male, master's student, research interest: data mining; WANG Qingqin, master's student; GAO Jian (corresponding author), master's degree; MIAO Zhongchen and LIN Yuefeng, Ph.D.; XIANG Yali, master's student; XIONG Yun, professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China (U1636207, U1936213).

Abstract: Multi-label financial text classification algorithms retrieve relevant information from massive volumes of financial news according to user needs. To further improve label recognition for financial text, this study proposes a multi-label financial text classification algorithm based on graph deep learning that models the correlations between labels. Graph deep learning uses deep networks to learn local and global graph structure features and can therefore capture complex relationships between nodes. Modeling label correlations enables knowledge transfer between labels, which is key to building an algorithm with strong generalization ability. The proposed algorithm combines label correlation information with label-specific feature representations of news text, obtained from a bidirectional gated recurrent network and a label attention mechanism, and uses a graph neural network to learn the complex dependencies between labels. Experimental results on real-world datasets show that explicitly modeling label correlations greatly strengthens the model's generalization ability, with particularly large gains on tail labels. Compared with the CAML, BIGRU-LWAN, and ZACNN algorithms, the proposed algorithm improves the macro F1 score by up to 3.1% on all labels and by up to 6.9% on tail labels.

Key words: multi-label text classification, deep learning, graph neural network, attention network, financial text
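The idea of knowledge transfer between labels described in the abstract can be illustrated with a small sketch. The abstract does not specify how the label graph is built or which graph neural network is used, so everything below is an assumption made for illustration: the graph is built from label co-occurrence counts, the propagation step follows the standard GCN-style symmetric normalization, and the learned weight matrix and nonlinearity of a real layer are omitted. The function names and the toy data are hypothetical.

```python
from math import sqrt


def cooccurrence_adjacency(label_sets, num_labels):
    """Count how often two labels appear on the same document.

    Assumed graph construction; the source only says that label
    correlation information is used.
    """
    adj = [[0.0] * num_labels for _ in range(num_labels)]
    for labels in label_sets:
        for i in labels:
            for j in labels:
                if i != j:
                    adj[i][j] += 1.0
    for i in range(num_labels):
        adj[i][i] += 1.0  # self-loop so each label keeps part of its own features
    return adj


def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}, as in a GCN-style layer."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    return [[adj[i][j] / sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def propagate(norm_adj, features):
    """One parameter-free propagation step over the label graph.

    Each label's new feature vector is a weighted sum of the features of
    the labels it co-occurs with (learned weights are left out here).
    """
    n, d = len(features), len(features[0])
    return [
        [sum(norm_adj[i][k] * features[k][c] for k in range(n)) for c in range(d)]
        for i in range(n)
    ]


# Toy data (hypothetical): label 0 is a frequent "head" label, labels 1 and 2
# co-occur with it, and label 3 never co-occurs with anything.
label_sets = [{0, 1}, {0, 1}, {0, 2}, {3}]
norm_adj = normalize(cooccurrence_adjacency(label_sets, num_labels=4))

# Only the head label starts with a non-zero feature.
features = [[1.0], [0.0], [0.0], [0.0]]
out = propagate(norm_adj, features)
print(out)
```

In this toy setup, labels 1 and 2 receive part of the head label's feature in proportion to how often they co-occur with it, while the isolated label 3 receives nothing. That is the intuition behind why modeling label correlations might help tail labels with few training examples.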
