Two-layer Image-text Semantic Network for Multi-modal Retrieval Based on Latent Semantic

doi:10.3969/j.issn.1000-3428.2016.07.050

Computer Engineering

DONG Yongliang, CHAI Xuqing

(College of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan 453000, China)

Received: 2015-10-19 Online: 2016-07-15 Published: 2016-07-15
About the authors: DONG Yongliang (born 1978), male, lecturer, M.S.; research interests: big data and information retrieval. CHAI Xuqing, engineer, M.S.
Funding:
Science and Technology Department of Henan Province Fund Project (142102310524); Education Department of Henan Province Fund Projects (15A520081, 17A520009, SKL-2016-1992, SKL-2016-1167).

Abstract

Abstract: To improve the accuracy of similarity matching in multi-modal information retrieval while keeping retrieval results interpretable, a two-layer multi-modal semantic network is proposed. First, a sub-semantic network is built for the data of each single modality, and the nodes in each sub-semantic network are clustered into different groups. Second, treating each group of a sub-semantic network as a node, a multi-modal semantic network is built based on semantic relationships, and the nodes of this multi-modal semantic network are further clustered into different groups. At retrieval time, relevant information is found by traversing these steps in the reverse order of construction. Experimental results show that the proposed method achieves higher retrieval accuracy than methods based on hash indexing, low-rank matrix embedding, or deep neural networks.

Key words: multi-modal, latent semantic, hierarchical model, clustering algorithm, cross-modal retrieval, deep neural network
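The two-layer construction and the reverse-order retrieval described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. It assumes that images and texts have already been mapped into a shared latent concept space (each item is a sparse `{concept: weight}` vector), uses cosine similarity as the semantic relationship, and substitutes a simple threshold-graph clustering (connected components) for whatever clustering algorithm the paper actually uses. The functions `build_two_layer` and `retrieve`, the thresholds `t_sub` and `t_top`, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical sketch: every item (image or text) is assumed to be already
# mapped into a shared latent concept space as a sparse {concept: weight} dict.


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def threshold_clusters(vectors, threshold):
    """Stand-in clustering: link nodes with similarity >= threshold and
    return the connected components (lists of indices) via union-find."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i in range(len(vectors)):
        groups[find(i)].append(i)
    return list(groups.values())


def centroid(vectors):
    """Component-wise mean of sparse vectors."""
    acc = defaultdict(float)
    for v in vectors:
        for k, w in v.items():
            acc[k] += w / len(vectors)
    return dict(acc)


def build_two_layer(modal_data, t_sub=0.5, t_top=0.5):
    """modal_data: {modality: {item_id: vector}}.
    Layer 1: one sub-semantic network per modality, clustered into groups.
    Layer 2: each group becomes a node; group nodes from all modalities
    are clustered into multi-modal groups."""
    sub_groups = []  # entries: (modality, [item_ids], centroid vector)
    for modality, items in modal_data.items():
        ids = list(items)
        vecs = [items[i] for i in ids]
        for members in threshold_clusters(vecs, t_sub):
            sub_groups.append((modality, [ids[m] for m in members],
                               centroid([vecs[m] for m in members])))
    top_groups = threshold_clusters([g[2] for g in sub_groups], t_top)
    return sub_groups, top_groups


def retrieve(query, target, modal_data, sub_groups, top_groups, k=3):
    """Traverse in reverse order of construction: best multi-modal group ->
    its sub-groups of the target modality -> items ranked by similarity.
    Returns [] if the chosen group has no member of the target modality."""
    best = max(top_groups,
               key=lambda tg: max(cosine(query, sub_groups[g][2]) for g in tg))
    candidates = [i for g in best if sub_groups[g][0] == target
                  for i in sub_groups[g][1]]
    candidates.sort(key=lambda i: cosine(query, modal_data[target][i]),
                    reverse=True)
    return candidates[:k]


if __name__ == "__main__":
    data = {  # toy data, invented for illustration
        "image": {"img_beach": {"sea": 0.9, "sand": 0.8},
                  "img_wave": {"sea": 1.0, "sky": 0.3},
                  "img_cat": {"animal": 1.0, "indoor": 0.5}},
        "text": {"txt_coast": {"sea": 0.8, "sand": 0.6, "sky": 0.2},
                 "txt_pet": {"animal": 0.9, "indoor": 0.7}},
    }
    sub, top = build_two_layer(data)
    print(retrieve({"sea": 1.0, "sand": 0.5}, "text", data, sub, top))  # ['txt_coast']
```

In this sketch, retrieval walks the layers top-down: it selects the multi-modal group whose member groups best match the query, keeps only the sub-network groups of the target modality inside it, and ranks their items. Restricting the ranking to one group makes each result traceable to a specific multi-modal group and sub-network cluster, at the cost of missing relevant items that were clustered elsewhere.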

CLC number: TP319

DONG Yongliang, CHAI Xuqing. Two-layer Image-text Semantic Network for Multi-modal Retrieval Based on Latent Semantic[J]. Computer Engineering, doi: 10.3969/j.issn.1000-3428.2016.07.050.

http://www.ecice06.com/CN/Y2016/V42/I7/299
