Computer Engineering, 2012, Vol. 38, Issue (24): 161-165. doi: 10.3969/j.issn.1000-3428.2012.24.038

• Artificial Intelligence and Recognition Technology •

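News Story Unit Segmentation Based on Multi-modal Feature Fusion

LIU Jia-qi (1), FENG Hua-min (1, 2), YAN Jian-peng (1)

  (1. School of Telecommunication Engineering, Xidian University, Xi’an 710071, China; 2. Beijing Electronic Science and Technology Institute, Beijing 100070, China)
  • Received: 2011-11-22  Revised: 2012-02-10  Online: 2012-12-20  Published: 2012-12-18
  • About the authors: LIU Jia-qi (1987-), male, master's student, research interests: video retrieval and video semantic extraction; FENG Hua-min, professor, Ph.D.; YAN Jian-peng, master's student
  • Funding:
    National Natural Science Foundation of China (60972139); Beijing Natural Science Foundation (4092041)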

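Abstract: Based on an analysis of news video structure, this paper proposes a news story unit segmentation method based on multi-modal feature fusion. The news video is first separated into an audio stream and a video stream. Silent (mute) intervals are selected as audio candidate points, and shot-boundary cut points are taken as video candidate points. Anchorperson shots and topic captions are then detected: anchorperson shots are selected as candidate intervals, and the start and end positions of each topic caption are recorded. Finally, the audio candidate points, video candidate points, anchorperson shots, and topic captions are fused along a common time axis to segment the news video into story units. Experimental results show that the method achieves a recall of 83.18% and a precision of 83.92%.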
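The abstract names the four cues but not the exact rule that combines them. As an illustration only, the Python sketch below shows one plausible time-axis fusion, assuming every cue has already been extracted as timestamps in seconds: a shot cut is kept as a story boundary if it falls inside (or within a tolerance of) a silent interval, does not lie inside an on-screen topic caption, and coincides with the start of an anchorperson shot or of a new topic caption. It also shows how recall and precision of detected boundaries can be computed against a manually labelled ground truth using one-to-one matching within a tolerance. The function names (`fuse_boundaries`, `recall_precision`), the tolerances (0.5 s and 1.0 s), the acceptance rule, and the toy timestamps are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def inside(t: float, iv: Interval, tol: float = 0.0) -> bool:
    """True if time t lies in [start - tol, end + tol]."""
    return iv[0] - tol <= t <= iv[1] + tol


def fuse_boundaries(silences: List[Interval],
                    shot_cuts: List[float],
                    anchor_shots: List[Interval],
                    captions: List[Interval],
                    tol: float = 0.5) -> List[float]:
    """Hypothetical time-axis fusion rule for illustration, not the authors' exact method."""
    boundaries = []
    for cut in sorted(shot_cuts):
        # Audio cue: the video cut must fall inside (or within tol of) a silent interval.
        if not any(inside(cut, s, tol) for s in silences):
            continue
        # A topic caption is still on screen (cut strictly inside it): mid-story, so skip.
        if any(c[0] + tol < cut < c[1] - tol for c in captions):
            continue
        # Semantic cue: an anchorperson shot or a new topic caption starts at this cut.
        starts_anchor = any(abs(cut - a[0]) <= tol for a in anchor_shots)
        starts_caption = any(abs(cut - c[0]) <= tol for c in captions)
        if starts_anchor or starts_caption:
            boundaries.append(cut)
    return boundaries


def recall_precision(detected: List[float], truth: List[float],
                     tol: float = 1.0) -> Tuple[float, float]:
    """Greedy one-to-one matching of detected vs. ground-truth boundaries.

    recall = matched / len(truth), precision = matched / len(detected).
    """
    unmatched = sorted(truth)
    hits = 0
    for d in sorted(detected):
        # Closest still-unmatched ground-truth boundary, if any remain.
        best = min(unmatched, key=lambda g: abs(g - d), default=None)
        if best is not None and abs(best - d) <= tol:
            unmatched.remove(best)  # each true boundary may be matched only once
            hits += 1
    recall = hits / len(truth) if truth else 0.0
    precision = hits / len(detected) if detected else 0.0
    return recall, precision


if __name__ == "__main__":
    # Toy timestamps (invented for illustration).
    silences = [(9.8, 10.4), (35.0, 35.6), (60.1, 60.9)]
    shot_cuts = [5.0, 10.0, 22.0, 35.2, 48.0, 60.5]
    anchor_shots = [(10.0, 18.0), (60.5, 70.0)]
    captions = [(12.0, 30.0), (35.5, 50.0)]

    found = fuse_boundaries(silences, shot_cuts, anchor_shots, captions)
    print(found)  # → [10.0, 35.2, 60.5]
    print(recall_precision(found, [10.0, 35.0, 48.0, 61.0]))  # → (0.75, 1.0)
```

Requiring an audio cue and a video cue to agree lets such a scheme discard shot cuts inside a story (no pause in speech) and pauses inside a single shot (no cut); the anchorperson-shot and caption cues then decide which of the remaining agreements open a new story.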

Key words: news video, multi-modal feature, caption, audio, story unit segmentation
