Natural Language Steganalysis Method Based on Word2vec

doi:10.19678/j.issn.1000-3428.0050407

Computer Engineering, 2019, Vol. 45, Issue (3): 309-314.

YU Jingmin^a,b, XIANG Lingyun^a,b,c, ZENG Daojian^a,b

Changsha University of Science and Technology: a. Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation; b. School of Computer and Communication Engineering; c. Hunan Provincial Key Laboratory of Smart Roadway and Cooperative Vehicle-Infrastructure Systems, Changsha 410114, China

Received: 2018-02-05 Online: 2019-03-15 Published: 2019-03-15
About the authors: YU Jingmin (born 1993), male, master's student, whose research interests are steganalysis and natural language processing; XIANG Lingyun and ZENG Daojian, lecturers, Ph.D.
Funding:
National Natural Science Foundation of China (61202439, 61602059); Key Scientific Research Project of the Education Department of Hunan Province (16A008).

Abstract:

To represent the semantic information of text content numerically and to improve the accuracy of detecting stego texts produced by synonym substitution, a novel natural language steganalysis method is proposed. Word2vec is trained on a large-scale corpus to obtain multi-dimensional word vectors that contain rich semantic information. The cosine distance between the word vector of a synonym and those of its context words is used to measure the correlation between two words, and from it the fitness of a synonym in a specific context is calculated. Detection features are extracted according to the effect that synonym substitutions in the embedding process have on the context fitness of the synonyms in a text, and these features form a feature vector. A Bayesian classification model is then trained on the feature vectors to identify stego texts. Experimental results show that, for stego texts with different embedding rates, the method achieves an average detection precision of 97.71% and an average recall of 92.64%, indicating good detection performance.

Key words: natural language, word vector, synonym substitution, steganalysis, context fitness
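
The context-fitness idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact fitness formula, so the code below makes two labeled assumptions: fitness is taken as the mean cosine similarity between a synonym and its context words, and the three-dimensional vectors are hand-made toy values rather than real Word2vec output. The names `cosine_similarity`, `context_fitness`, `rank_synonyms`, and `TOY_VECTORS` are hypothetical and do not come from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def context_fitness(word, context_words, vectors):
    """ASSUMPTION: fitness = mean cosine similarity to the context words.

    The paper's actual definition may differ; this is only one plausible choice.
    Context words without a vector are skipped.
    """
    sims = [
        cosine_similarity(vectors[word], vectors[c])
        for c in context_words
        if c in vectors
    ]
    return sum(sims) / len(sims) if sims else 0.0


def rank_synonyms(synonyms, context_words, vectors):
    """Return (synonym, fitness) pairs sorted from best to worst fit."""
    scored = [
        (s, context_fitness(s, context_words, vectors))
        for s in synonyms
        if s in vectors
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors invented for illustration; real vectors would come from a
# Word2vec model trained on a large corpus.
TOY_VECTORS = {
    "big": [0.9, 0.1, 0.0],
    "large": [0.8, 0.2, 0.1],
    "grand": [0.1, 0.2, 0.9],
    "house": [0.7, 0.3, 0.0],
    "garden": [0.6, 0.4, 0.1],
}

if __name__ == "__main__":
    ranking = rank_synonyms(["big", "large", "grand"], ["house", "garden"], TOY_VECTORS)
    for synonym, fitness in ranking:
        print(f"{synonym}: {fitness:.3f}")
```

In a sketch like this, a synonym substitution that lowers the fitness of a word in its context is the kind of signal the detection features would be built from. How the paper turns these scores into a feature vector is not specified in the abstract.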

CLC number:

TP391

YU Jingmin, XIANG Lingyun, ZENG Daojian. Natural Language Steganalysis Method Based on Word2vec[J]. Computer Engineering, 2019, 45(3): 309-314.

https://www.ecice06.com/CN/Y2019/V45/I3/309
