Computer Engineering ›› 2026, Vol. 52 ›› Issue (2): 245-252. doi: 10.19678/j.issn.1000-3428.0069385

• Cyberspace Security •

Text Steganalysis Method Based on Hierarchy-Aware Matching

JIA Jianghao, ZHANG Ziwei, GAO Liting, WEN Juan, XUE Yiming   

  1. College of Information and Electrical Engineering, China Agricultural University, Beijing 100089, China
  • Received: 2024-02-21  Revised: 2024-09-13  Published: 2026-02-04

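  • About the authors: JIA Jianghao, male, master's student, main research interest: text steganalysis; ZHANG Ziwei, doctoral student; GAO Liting, master's student; WEN Juan (corresponding author), associate professor, Ph.D., E-mail: wenjuan@cau.edu.cn; XUE Yiming, professor.
  • Supported by:
    National Natural Science Foundation of China (62272463).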

Abstract: Existing text steganalysis models have difficulty learning and extracting the multiple layers of useful information that are actually present in stego data. To address this issue, a text steganalysis method based on hierarchy-aware matching, HAM-Stega, is proposed. The method exploits the matching relationship given by the relative distance between text information and label information in steganographic data, and obtains the feature-matching relationships between the text and both coarse-grained and fine-grained labels in a hierarchy-aware manner. On this basis, a joint embedding loss function and a matching learning loss function are designed to guide classification learning on the text feature representations and to obtain the final hierarchical classification information. Experimental results show that on Large, a multi-distribution mixed dataset that is closer to real-world scenarios, HAM-Stega improves detection accuracy by 1.25 to 7.42 percentage points over the comparison models, indicating that the proposed model has effective steganalysis detection capability on mixed datasets. In addition, HAM-Stega can extract and detect the other layers of useful information present in steganographic data, such as the steganographic algorithm used for the stego text, the embedding rate, and the corpus type. On the hierarchical classification metrics Macro-F1 and Micro-F1, it improves on the pretrained BERT model by 5.41 and 4.36 percentage points, respectively.

Key words: information security, text steganalysis, hierarchy-aware matching, Graph Neural Network (GNN), BERT
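
The abstract does not give the loss formulas, so the following is only a rough illustration of what distance-based matching between a text representation and its coarse- and fine-grained labels might look like. It assumes a margin (hinge) loss over Euclidean distances; the function names, the margins, and the ordering "fine-grained label closest, coarse-grained label next, unrelated label farthest" are assumptions made for this sketch and are not taken from HAM-Stega.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hinge(closer_dist, farther_dist, margin):
    """Zero once closer_dist is smaller than farther_dist by at least margin."""
    return max(0.0, closer_dist - farther_dist + margin)


def matching_loss(text, fine, coarse, negative, m_fine=0.2, m_coarse=0.1):
    """Hypothetical hierarchy-aware matching loss (illustrative assumption).

    Encourages the text embedding to lie closest to its fine-grained label,
    next closest to its coarse-grained label, and farthest from an
    unrelated (negative) label.
    """
    d_fine = euclidean(text, fine)
    d_coarse = euclidean(text, coarse)
    d_neg = euclidean(text, negative)
    return hinge(d_fine, d_coarse, m_fine) + hinge(d_coarse, d_neg, m_coarse)


# Toy 2-D embeddings, chosen by hand for illustration only.
text = [1.0, 0.0]
fine = [1.0, 0.0]       # distance 0.0 from the text
coarse = [1.0, 0.5]     # distance 0.5 from the text
negative = [-1.0, 0.0]  # distance 2.0 from the text

well_ordered = matching_loss(text, fine, coarse, negative)
mis_ordered = matching_loss(text, negative, coarse, fine)
```

In this toy setup the loss is zero when the distances already respect the assumed ordering, and positive when the fine-grained and unrelated labels are swapped, which is the behaviour a matching loss of this kind is meant to have.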

