Deep Image Description Model Based on IndRNN and BN

Computer Engineering, 2021, Vol. 47, Issue (10): 194-200. doi: 10.19678/j.issn.1000-3428.0058761

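CAO Yukun, WEI Jianqiang, SUN Tao, XU Yue

College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 201306, China

Received: 2020-06-27; Revised: 2020-09-21; Published: 2020-10-12
About the authors: CAO Yukun (born 1976), female, associate professor, Ph.D.; her main research interests are natural language processing, deep learning, and knowledge graphs. WEI Jianqiang, SUN Tao, and XU Yue are master's students.
Funding: National Natural Science Foundation of China, Young Scientists Fund project "Application and Key Technologies of Proxy Re-encryption in Secure Data Sharing for the Smart Grid" (61802249).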

Abstract

Existing image description models suffer from shallow decoders and low training efficiency, and the sentences they generate are weak in linguistic coherence and content diversity. To address these problems, a deep image description model, Deep-NIC, based on the Independently Recurrent Neural Network (IndRNN) is proposed. The decoder unit is built from independently recurrent neurons combined with Batch Normalization (BN), and a deep decoder is formed by stacking multiple decoder units. Google's Inception V3 serves as the encoder, completing the deep image description model. Comparative experiments on the MS COCO 2014 dataset show that, compared with the baseline model NIC, Deep-NIC improves the BLEU-4, METEOR, and CIDEr scores by 3.2%, 10.3%, and 8.18%, respectively; it is also easier to train and fits the data better.

Key words: image description, deep image description model, deep decoder, Independently Recurrent Neural Network (IndRNN), Batch Normalization (BN)
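
The decoder construction described in the abstract (independently recurrent neurons plus batch normalization, stacked into several layers) can be illustrated with a small NumPy sketch. This is only an illustration under stated assumptions, not the authors' implementation: the class name `IndRNNBNLayer`, the helper `deep_decoder`, the ReLU activation, the placement of BN on each layer's output, the layer sizes, and the weight initialization are all hypothetical choices made here. What it shows is the defining IndRNN property that each hidden neuron has a single scalar recurrent weight, so the recurrence is an element-wise product rather than a full matrix multiply.

```python
import numpy as np


def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch axis, then scale and shift."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


class IndRNNBNLayer:
    """Hypothetical decoder unit: IndRNN recurrence followed by batch normalization."""

    def __init__(self, input_size, hidden_size, rng):
        # Input-to-hidden weights are a full matrix.
        self.W = rng.normal(0.0, 0.1, size=(input_size, hidden_size))
        # Recurrent weights are a vector: one scalar per neuron (the IndRNN idea).
        self.u = rng.uniform(-1.0, 1.0, size=hidden_size)
        self.b = np.zeros(hidden_size)
        # BN scale and shift (assumed placement: on the layer output).
        self.gamma = np.ones(hidden_size)
        self.beta = np.zeros(hidden_size)

    def forward(self, xs):
        """xs has shape (time, batch, input_size); returns (time, batch, hidden_size)."""
        steps, batch, _ = xs.shape
        h = np.zeros((batch, self.u.shape[0]))
        outputs = []
        for t in range(steps):
            # h_t = ReLU(W x_t + u * h_{t-1} + b), with * element-wise.
            h = np.maximum(xs[t] @ self.W + self.u * h + self.b, 0.0)
            outputs.append(batch_norm(h, self.gamma, self.beta))
        return np.stack(outputs)


def deep_decoder(xs, layers):
    """Stack decoder units: each layer consumes the previous layer's output sequence."""
    for layer in layers:
        xs = layer.forward(xs)
    return xs


# Usage: a 4-layer stack on a random sequence (5 time steps, batch of 4, 8 features).
rng = np.random.default_rng(0)
layers = [IndRNNBNLayer(8, 16, rng)] + [IndRNNBNLayer(16, 16, rng) for _ in range(3)]
xs = rng.normal(size=(5, 4, 8))
out = deep_decoder(xs, layers)
print(out.shape)  # (5, 4, 16)
```

In this sketch, BN normalizes the sequence handed to the next layer while the recurrent state itself is left unnormalized; other placements (for example, before the activation) are equally plausible readings of the abstract.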

CLC number: TP399

Cite this article: CAO Yukun, WEI Jianqiang, SUN Tao, XU Yue. Deep Image Description Model Based on IndRNN and BN[J]. Computer Engineering, 2021, 47(10): 194-200.

http://www.ecice06.com/CN/Y2021/V47/I10/194

Figures/Tables: 10
