Abstract: Many programming resources on the Internet are intrinsically related through the knowledge concepts they cover, yet no hyperlinks connect them. If the names of the algorithmic knowledge in these resources are extracted and organized into an expert knowledge file, that file can be used to recognize the knowledge points each programming resource contains, so that the resources can be linked to one another by knowledge point. To obtain these terms of algorithmic knowledge automatically, this paper proposes a discovery method based on natural language processing. The method discovers the string patterns of sentences that contain terms of algorithmic knowledge, uses these patterns to extract from programming resources the strings likely to contain such terms, identifies the segmented words most likely to appear in terms of algorithmic knowledge, and derives the terms from these words. Experimental results show that, compared with the manually compiled set of terms of algorithmic knowledge from previous work, the method increases the number of algorithmic knowledge points by 11.2% and the number of terms of algorithmic knowledge by 13.6%.
Key words:
knowledge discovery,
pattern discovery,
natural language processing,
terms of algorithmic knowledge,
Chinese word segmentation,
part-of-speech tagging
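The four stages summarized in the abstract (pattern matching, candidate extraction, key-word selection, term assembly) can be illustrated with a toy Python sketch. It rests on several assumptions that are not taken from the paper: the two regular expressions in `PATTERNS` are hand-written stand-ins for the discovered string patterns; `tokenize` replaces the Chinese word segmentation and part-of-speech tagging the method relies on with simple lowercase English tokens; and the ratio score in `key_words` (threshold `min_ratio`) is an illustrative heuristic, not the paper's scoring. All function names and the sample sentences in `DOCS` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical sentence patterns; the paper discovers its patterns from data,
# these two are hand-written for illustration only.
PATTERNS = [
    re.compile(r"solved (?:by|with|using) ([a-z ]+)"),
    re.compile(r"\b(?:apply|use) (?:the )?([a-z ]+?) algorithm"),
]

# Hypothetical sample corpus (the real method targets Chinese text).
DOCS = [
    "This problem can be solved with dynamic programming.",
    "Solved with binary search, then check the bounds.",
    "The task is solved by dynamic programming over subsets.",
    "It is solved with binary search on the answer.",
    "Solved using depth first search.",
    "We apply the Dijkstra algorithm to each query.",
    "Iterate over all subsets and print the answer on one line.",
]


def tokenize(text):
    """Stand-in for Chinese word segmentation: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_candidates(docs):
    """Apply every pattern to every document; return the captured strings."""
    candidates = []
    for doc in docs:
        for pattern in PATTERNS:
            candidates.extend(m.strip() for m in pattern.findall(doc.lower()))
    return candidates


def key_words(candidates, docs, min_ratio=0.75):
    """Keep words whose occurrences fall mostly inside candidate strings.

    Assumed score: occurrences inside candidates / occurrences in the corpus.
    """
    inside = Counter(w for c in candidates for w in tokenize(c))
    overall = Counter(w for d in docs for w in tokenize(d))
    return {w for w, n in inside.items() if n / overall[w] >= min_ratio}


def assemble_terms(candidates, keys):
    """Reduce each candidate to its longest run of consecutive key words."""
    terms = Counter()
    for cand in candidates:
        best, run = [], []
        for word in tokenize(cand) + [None]:  # None flushes the final run
            if word in keys:
                run.append(word)
            else:
                if len(run) > len(best):
                    best = run
                run = []
        if best:
            terms[" ".join(best)] += 1
    return terms


if __name__ == "__main__":
    candidates = extract_candidates(DOCS)
    keys = key_words(candidates, DOCS)
    print(sorted(assemble_terms(candidates, keys).items()))
    # [('binary search', 2), ('depth first search', 1), ('dijkstra', 1), ('dynamic programming', 2)]
```

In this toy run, trailing context such as "over subsets" or "on the answer" is trimmed from the candidates because those words also occur outside candidate strings and so fall below the ratio threshold.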
CLC number:
ZHU Guojin,ZHENG Ning. Discovering Terms of Algorithmic Knowledge Based on Natural Language Processing[J]. Computer Engineering, 2014, 40(12): 126-131.