Abstract: Recovering traceability links among software process products is essential to many fields, such as software maintenance and requirements tracing. This paper proposes a method based on Latent Semantic Indexing (LSI) that automatically extracts traceability links between program source code and the related Chinese documentation. The method is an improvement on the vector space model: it determines the degree of association between artifacts by analyzing the latent semantic structure among texts, rather than by matching terms. Experimental results show that the method does not rely on a thesaurus or knowledge base predefined for the code and documentation, and that it improves recall and precision to some extent.
Key words:
software maintenance,
traceability association mining,
Latent Semantic Indexing (LSI),
Information Retrieval (IR),
Cross-Language Information Retrieval (CLIR)
CLC number:
YANG Xue-Min, ZHANG Yi-Kun, CUI Ying-An, ZHANG Bao-Wei, XIA Hui. Research on Code and Documentation Traceability Association Mining Based on LSI[J]. Computer Engineering, 2011, 37(8): 34-36.
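The abstract describes LSI as an improvement on the vector space model that scores code-documentation associations through latent semantic structure rather than direct term matching. The following is a minimal sketch of that general technique in Python with NumPy, not the authors' implementation. It assumes that every artifact has already been tokenized (identifiers split into words, Chinese text segmented into words) and that source files carry comments, so that English and Chinese terms can co-occur. It also uses raw term frequencies instead of a weighting scheme such as tf-idf, and applies a hypothetical similarity threshold. The function names `lsi_similarity`, `candidate_links`, and `precision_recall` are illustrative only.

```python
import numpy as np


def lsi_similarity(code_docs, doc_docs, k):
    """Cosine similarity between code and documentation in a k-dim LSI space.

    code_docs, doc_docs: lists of pre-tokenized artifacts (lists of terms).
    Returns a matrix sim where sim[i, j] compares code i with document j.
    """
    corpus = code_docs + doc_docs
    vocab = sorted({term for doc in corpus for term in doc})
    index = {term: i for i, term in enumerate(vocab)}

    # Term-document matrix A (terms x artifacts) with raw term frequencies
    # (an assumption; weighted schemes are common in practice).
    A = np.zeros((len(vocab), len(corpus)))
    for j, doc in enumerate(corpus):
        for term in doc:
            A[index[term], j] += 1

    # Truncated SVD: keep the k largest singular values.
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    # Artifact coordinates in the latent space: rows of V_k * S_k.
    D = Vt[:k].T * s[:k]

    # Normalize rows so that dot products are cosine similarities.
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    D = D / norms

    n = len(code_docs)
    return D[:n] @ D[n:].T


def candidate_links(sim, threshold):
    """Propose (code, document) pairs whose similarity reaches the threshold."""
    return {(i, j)
            for i in range(sim.shape[0])
            for j in range(sim.shape[1])
            if sim[i, j] >= threshold}


def precision_recall(retrieved, relevant):
    """Precision and recall of retrieved links against known true links."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy artifacts: code files with identifiers plus Chinese comment terms.
    code = [["stack", "push", "pop", "栈"], ["file", "read", "文件"]]
    docs = [["栈", "入栈", "出栈"], ["文件", "读取"]]
    sim = lsi_similarity(code, docs, k=2)
    links = candidate_links(sim, threshold=0.7)  # threshold is hypothetical
    print(precision_recall(links, {(0, 0), (1, 1)}))
```

This toy corpus is small enough that each code file shares a term with its matching document. LSI matters more in larger corpora, where indirect co-occurrence can relate artifacts that have no terms in common. The choice of `k` and the threshold trades recall against precision. Both values have to be tuned empirically.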