
Computer Engineering

• Artificial Intelligence and Recognition Technology •

Improved Automatic Classification Algorithm of Software Bug Report

HUANG Wei 1, LIN Jie 1, JIANG Yu’e 1, JIANG Binhua 2

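  (1. Faculty of Software, Fujian Normal University, Fuzhou 350108, China; 2. Department of Pathology, Nanjing Medical University, Nanjing 210029, China)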
  • Received: 2014-06-19; Online: 2015-06-15; Published: 2015-06-15
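  • About the authors: HUANG Wei (born 1991), male, master's student, main research interest: data mining; LIN Jie, associate professor, Ph.D.; JIANG Yu’e and JIANG Binhua, professors, Ph.D.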
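  • Funding:

    Major International (Regional) Joint Research Project of the National Natural Science Foundation of China (81320108019); Natural Science Foundation of Fujian Province (2014J01220).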

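Abstract:

Automatic classification of software bug reports can save a great deal of manual effort and time. However, bug reports submitted by users are highly subjective and loosely worded, which makes automatic classification ineffective. To address this, two improved feature dimension reduction algorithms are proposed on the basis of the traditional Term Frequency-Inverse Document Frequency (TF-IDF) algorithm. They combine the frequency of a term within a document with the term's distribution across documents of the same category and of different categories. After dimension reduction, the remaining terms are weighted to further improve the effect of feature reduction. Experimental results show that automatic bug report classification using the proposed algorithms clearly outperforms existing algorithms in precision, recall, F1 score and accuracy.

Keywords: feature dimension reduction, bug report, automatic text classification, Term Frequency-Inverse Document Frequency (TF-IDF), feature weight, frequency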
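The abstract builds on the traditional TF-IDF algorithm and describes combining a term's frequency within documents with its distribution across documents of the same and of different categories. The exact formulas of the two improved algorithms are not given on this page, so the following is only a minimal sketch under stated assumptions. `tfidf_scores` implements the standard TF-IDF baseline (length-normalized term frequency times log(N/df), keeping each term's maximum weight), and `select_top_k` performs the dimension reduction step by keeping the k best terms. `class_aware_scores` is a hypothetical class-distribution score written only to show the kind of information the abstract mentions; it is not the authors' formula. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data: tokenized bug-report descriptions and their categories.
reports = [
    ["the", "crash", "null", "pointer"],
    ["the", "crash", "on", "startup"],
    ["the", "button", "color", "wrong"],
    ["the", "button", "misaligned"],
]
labels = ["crash", "crash", "ui", "ui"]


def tfidf_scores(docs):
    """Traditional TF-IDF: tf = count / doc length, idf = log(N / df).

    Returns each term's maximum TF-IDF weight over all documents.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    scores = {}
    for doc in docs:
        for term, count in Counter(doc).items():
            weight = (count / len(doc)) * math.log(n_docs / df[term])
            scores[term] = max(scores.get(term, 0.0), weight)
    return scores


def select_top_k(scores, k):
    """Dimension reduction: keep the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def class_aware_scores(docs, labels):
    """HYPOTHETICAL class-distribution score, NOT the paper's formula.

    For term t and class c:
      tf_c  = occurrences of t in class-c docs / total tokens in class-c docs
      in_c  = fraction of class-c docs that contain t
      out_c = log((N_other + 1) / (df_other + 1)), where N_other is the number
              of docs outside c and df_other the number of those containing t
    score(t) = max over classes c of tf_c * in_c * out_c
    """
    n_docs = len(docs)
    n_class = Counter(labels)
    term_count = {c: Counter() for c in n_class}  # token counts per class
    doc_freq = {c: Counter() for c in n_class}    # document frequency per class
    total_df = Counter()                          # document frequency over all docs
    for doc, label in zip(docs, labels):
        term_count[label].update(doc)
        doc_freq[label].update(set(doc))
        total_df.update(set(doc))
    scores = {}
    for c in n_class:
        total_tokens = sum(term_count[c].values())
        if total_tokens == 0:
            continue  # class containing only empty reports
        n_other = n_docs - n_class[c]
        for term, count in term_count[c].items():
            tf_c = count / total_tokens
            in_c = doc_freq[c][term] / n_class[c]
            df_other = total_df[term] - doc_freq[c][term]
            out_c = math.log((n_other + 1) / (df_other + 1))
            scores[term] = max(scores.get(term, 0.0), tf_c * in_c * out_c)
    return scores


if __name__ == "__main__":
    print(select_top_k(tfidf_scores(reports), 3))
    print(select_top_k(class_aware_scores(reports, labels), 3))
```

On the toy data, "the" appears in every report and receives zero weight under both scores, while category-specific terms such as "crash" and "button" keep positive class-aware scores.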

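The abstract reports results in terms of precision, recall, F1 score and accuracy. As a minimal sketch of how these four metrics are commonly computed for a multi-category classifier, the function below counts true positives, false positives and false negatives per category. The function name `evaluate` and the choice of macro averaging (an unweighted mean over categories) are assumptions, since the page does not state which averaging the experiments used.

```python
def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1.

    Macro averaging is an assumption; the source does not say how
    per-category scores are combined.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)  # correctly predicted as c
        fp = sum(1 for t, p in pairs if t != c and p == c)  # wrongly predicted as c
        fn = sum(1 for t, p in pairs if t == c and p != c)  # c reports that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
    }


if __name__ == "__main__":
    # Hypothetical categories for four bug reports.
    print(evaluate(["crash", "crash", "ui", "ui"],
                   ["crash", "ui", "ui", "ui"]))
```

Accuracy counts every correct prediction equally, whereas macro-averaged precision, recall and F1 give each category the same weight, which matters when some bug categories are much rarer than others.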