Abstract: Ontology is the core of semantic retrieval. Ontology construction consists mainly of acquiring domain concepts and acquiring the relations between those concepts, and domain concept acquisition is the foundation of the whole process. This paper acquires concepts with a method based on the maximum entropy model: noun phrases are mined from domain texts, an improved TF-IDF formula is used to extract the domain-specific phrases among them, and the extracted phrases are manually corrected to obtain the ontology concepts. Experiments show that the method improves the accuracy and completeness of the concepts.
Keywords:
ontology,
maximum entropy model,
natural language processing
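The domain-filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the improved TF-IDF formula, so the code below substitutes plain TF-IDF with a smoothed inverse document frequency; it is not the authors' formula. The function name `domain_scores`, the use of a general background corpus for the document frequency, and the toy phrases are all assumptions made for illustration. Candidate noun phrases are assumed to have been extracted already (in the paper, by the maximum entropy model).

```python
import math
from collections import Counter


def domain_scores(domain_docs, background_docs):
    """Score candidate phrases with plain TF-IDF (a stand-in, not the paper's formula).

    domain_docs and background_docs are lists of documents, each document
    being a list of candidate noun phrases (hypothetical input format).
    """
    # Term frequency: relative frequency of each phrase in the domain corpus.
    tf = Counter(phrase for doc in domain_docs for phrase in doc)
    total = sum(tf.values())

    # Document frequency is counted over the background corpus (assumption).
    background_sets = [set(doc) for doc in background_docs]
    n_background = len(background_sets)

    scores = {}
    for phrase, count in tf.items():
        df = sum(1 for doc in background_sets if phrase in doc)
        # Smoothed IDF: phrases absent from the background corpus score highest.
        idf = math.log((1 + n_background) / (1 + df)) + 1
        scores[phrase] = (count / total) * idf
    return scores


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    domain = [
        ["ontology", "semantic retrieval", "system"],
        ["ontology", "maximum entropy model", "system"],
    ]
    background = [["system", "weather"], ["system", "market"], ["football match"]]
    scores = domain_scores(domain, background)
    for phrase in sorted(scores, key=scores.get, reverse=True):
        print(f"{phrase}: {scores[phrase]:.3f}")
```

Phrases that are frequent in the domain texts but rare in the background corpus rank highest. In the paper's pipeline the top-ranked phrases would then be corrected manually before being accepted as ontology concepts.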
CLC number:
WEI Xiao-li (韦小丽); SUN Yong (孙涌); ZHANG Shu-kui (张书奎); MIAO Yan-jun (苗艳军). Ontological Concept Extraction Method Based on Maximum Entropy Model (基于最大熵模型的本体概念获取方法)[J]. Computer Engineering (计算机工程), 2009, 35(24): 114-116.