Computer Engineering ›› 2023, Vol. 49 ›› Issue (1): 303-310. doi: 10.19678/j.issn.1000-3428.0063516

• Development Research and Engineering Application •

Open Domain Answer Selection Model Fusing Double Matching-Focus

HE Junfei1, ZHANG Huibing1, HU Xiaoli2   

  1. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China;
    2. Teaching Practice Department, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  • Received: 2021-12-13 Revised: 2022-02-28 Published: 2022-06-30

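  • About the authors: HE Junfei (born 1996), male, master's student, whose main research interest is question answering systems; ZHANG Huibing, professor, Ph.D.; HU Xiaoli (corresponding author), senior experimentalist, M.S.
  • Supported by:
    National Natural Science Foundation of China (62267003, 62177012, 61967005); Guilin Scientific Research and Technology Development Program (2020010304).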

Abstract: The open-domain answer selection model is an important part of a Question Answering (QA) system; it finds the answer that best matches a question by scoring the different candidate answers to that question. Existing answer selection models for open-domain QA systems rarely attend to the fusion of word-level and sentence-level matching, so the matching relationship either lacks contextual semantic connections or loses the grammatical and semantic details of individual words. Based on the adjacent similarity principle, this paper proposes an answer selection model that fuses double matching-focus. First, in view of the multi-sentence association that characterizes question answering tasks, a word embedding method is designed that embeds both the question-answer succession relationship and the question-answer semantic relationship of words into the word vectors; the cosine similarity of word pairs is then computed directly from these vectors to obtain the word-level matching focus. Then, the sentence-level matching focus of word pairs is extracted by an encoder-decoder model with an attention mechanism. Finally, the two focus distribution matrices are aligned using the question as the reference, and the relative distance between the focuses is used to fuse the word-level and sentence-level matching matrices, yielding the question-answer relevance score. Experimental results on two public QA datasets, Wiki-QA and TREC-QA, show that, compared with the multi-hop attention model and the hierarchical ranking model, the proposed model improves the mean Average Precision (mAP) by 0.080 1 and 0.057 1 and the Mean Reciprocal Rank (MRR) by 0.017 6 and 0.006 6, respectively.
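
The word-level step described in the abstract (cosine similarity computed directly on word-pair vectors) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy two-dimensional vectors stand in for the paper's specially trained embeddings, and the function names, along with the choice to define a question word's "focus" as its most similar answer word, are assumptions made only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(q_vecs, a_vecs):
    """Rows index question words, columns index answer words."""
    return [[cosine(q, a) for a in a_vecs] for q in q_vecs]


def word_level_focus(q_vecs, a_vecs):
    """ASSUMPTION: the focus of a question word is its most similar answer word.

    Returns one (answer_word_index, similarity) pair per question word.
    """
    focus = []
    for row in similarity_matrix(q_vecs, a_vecs):
        best = max(range(len(row)), key=row.__getitem__)
        focus.append((best, row[best]))
    return focus


# Toy embeddings (hypothetical): 2 question words, 3 answer words.
question = [[1.0, 0.0], [0.0, 1.0]]
answer = [[0.0, 2.0], [3.0, 0.0], [1.0, 1.0]]
focus = word_level_focus(question, answer)
```

Here the first question word is matched to the second answer word and the second question word to the first, since those pairs point in the same direction. How the paper actually builds the focus distribution matrix and fuses it with the sentence-level one is not specified on this page, so the sketch stops at the similarity step.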

Key words: Question Answering (QA) system, adjacent similarity, matching-focus, word embedding vector, translation model
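
For readers unfamiliar with the two metrics reported in the abstract, the following sketch shows the standard definitions of Mean Reciprocal Rank and mean Average Precision for ranked candidate answers. The function names and the toy scores and labels are hypothetical; the paper's own evaluation script is not shown on this page and may differ in details such as how questions without a correct answer are handled.

```python
def rank_by_score(scores, labels):
    """Return relevance labels reordered by descending model score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def reciprocal_rank(ranked_labels):
    """1 / rank of the first correct answer, or 0.0 if there is none."""
    for rank, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked_labels):
    """Mean of precision@k over the ranks k that hold a correct answer."""
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Two toy questions (hypothetical scores; label 1 marks a correct answer).
ranked = [
    rank_by_score([0.2, 0.9, 0.5], [1, 0, 1]),
    rank_by_score([0.8, 0.1], [1, 0]),
]
mrr = mean(reciprocal_rank(r) for r in ranked)
map_score = mean(average_precision(r) for r in ranked)
```

Both metrics are averaged over questions, so a gain of a few hundredths, as reported above, reflects correct answers moving up the ranking across the whole test set rather than on any single question.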

CLC Number: