Computer Engineering

• Artificial Intelligence and Recognition Technology •

High Quality Microblog Extraction Based on Kernel Principal Component Analysis and Wavelet Transformation

PENG Min (彭敏) 1,2, FU Hui (傅慧) 1, HUANG Jimin (黄济民) 1, HUANG Jiajia (黄佳佳) 1, LIU Jiping (刘纪平) 1,2

  1. School of Computer, Wuhan University, Wuhan 430000, China
  2. Institute of Shenzhen, Wuhan University, Shenzhen, Guangdong 518000, China
  • Received: 2014-11-17 Online: 2016-01-15 Published: 2016-01-15
  • About the authors: PENG Min (born 1973), female, professor and postdoctoral researcher, whose main research interests are principal component analysis and natural language processing; FU Hui, master's student; HUANG Jimin, undergraduate student; HUANG Jiajia, doctoral student; LIU Jiping, lecturer.
  • Supported by:
    National Natural Science Foundation of China (61472291, 61303115); Basic Research Fund of the 2013 Shenzhen Knowledge Innovation Program.

Abstract: Online social media contain large amounts of noisy and redundant information, which makes filtering and screening messages a challenge. To obtain high-quality messages, this paper proposes a high-quality microblog extraction framework based on Kernel Principal Component Analysis and Wavelet Transformation (KPCA-WT). Within this framework, an extraction algorithm based on multi-feature fusion transforms the message features into the wavelet domain to better capture the fine-grained differences between feature signals. The weight of each feature is estimated with the Expectation Maximization (EM) algorithm, and the weighted features are fused into a comprehensive value for each message. To reduce the influence of noisy features on quality extraction and to speed up computation, KPCA is introduced to transform the features. Experimental results show that the proposed framework extracts microblogs of higher quality and greatly reduces running time.

Key words: information extraction, feature fusion, wavelet transformation, Expectation Maximization (EM) algorithm, Kernel Principal Component Analysis (KPCA)
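
As a rough illustration of why a wavelet-domain view can expose detail differences between feature signals, the following minimal sketch applies a single-level Haar transform to one feature signal. It is not the paper's implementation: the choice of the Haar wavelet, the function names `haar_dwt` and `haar_idwt`, and the sample values are all assumptions made for illustration, and the EM weighting and KPCA steps are not shown.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(signal):
    """Single-level Haar transform of an even-length sequence.

    Returns (approx, detail): approx holds scaled pairwise sums (the
    coarse trend), detail holds scaled pairwise differences (local change).
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / SQRT2 for a, b in pairs]
    detail = [(a - b) / SQRT2 for a, b in pairs]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt and recover the original sequence."""
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / SQRT2)
        signal.append((a - d) / SQRT2)
    return signal


# Hypothetical values of one feature (for example, a repost count)
# observed over four consecutive microblogs.
feature_signal = [4.0, 4.0, 4.0, 8.0]

approx, detail = haar_dwt(feature_signal)
reconstructed = haar_idwt(approx, detail)

print("approximation:", approx)
print("detail:", detail)
print("reconstructed:", reconstructed)
```

In this sketch the first detail coefficient is zero because the first two values are equal, while the second is non-zero because the last two values differ. This is the sense in which the detail coefficients isolate local differences that the raw values mix with the overall level.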
