Computer Engineering ›› 2020, Vol. 46 ›› Issue (4): 70-76,84. doi: 10.19678/j.issn.1000-3428.0054276

• Artificial Intelligence and Pattern Recognition •

Phrase Mining in Ecommerce Based on Cooperative Training

XU Yong1, LIU Jingping1, XIAO Yanghua1, ZHU Muhua2   

  1. School of Computer Science, Fudan University, Shanghai 200433, China;
  2. Alibaba Network Technology Co., Ltd., Hangzhou 311121, China
  • Received: 2019-03-18 Revised: 2019-05-09 Online: 2020-04-15 Published: 2020-04-07

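  • About the authors: XU Yong (b. 1994), male, master's student, main research interest: knowledge graphs; LIU Jingping, Ph.D.; XIAO Yanghua (corresponding author), professor; ZHU Muhua, Ph.D.
  • Funding:
    General Program of the National Natural Science Foundation of China, "Research on Key Technologies of Query Processing for Large-Scale Knowledge Graphs" (61472085).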

Abstract: Text in the ecommerce domain often does not follow the conventions of expression found in general-domain text, so traditional phrase mining methods achieve low precision on it. To address this, the paper proposes a phrase mining method based on cooperative training. A phrase classification model built on semantic features detects the inverted-order expressions found in ecommerce text. A cooperative-training phrase mining framework is then constructed to reduce the cost of labeling training data in the domain corpus. On this basis, the Stacking method is used to combine the strengths of the statistical model and the semantic model, improving overall mining performance. Experimental results on a Taobao query corpus show that the proposed method achieves higher precision and recall than the ClassPhrase and AutoPhrase methods.
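The cooperative-training idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the two co-trained models correspond to a statistical view and a semantic view of each candidate phrase, and every name in it (`CentroidClassifier`, `co_train`, the one-number features, the `threshold` value) is hypothetical. Each model labels the unlabeled candidates it is confident about and hands them to the other model as extra training data, which is how co-training reduces manual labeling. The Stacking step is not covered here.

```python
from statistics import mean


class CentroidClassifier:
    """Toy one-feature nearest-centroid classifier (a stand-in, not the paper's model)."""

    def fit(self, xs, ys):
        # Needs at least one example of each class; mean() raises on empty input.
        self.pos = mean([x for x, y in zip(xs, ys) if y == 1])
        self.neg = mean([x for x, y in zip(xs, ys) if y == 0])
        return self

    def predict(self, x):
        """Return (label, confidence), confidence in [0, 1] from the distance margin."""
        d_pos, d_neg = abs(x - self.pos), abs(x - self.neg)
        label = 1 if d_pos < d_neg else 0
        total = d_pos + d_neg
        confidence = abs(d_pos - d_neg) / total if total else 0.0
        return label, confidence


def fit_view(train, view):
    """Fit a classifier on one view (0 = statistical, 1 = semantic; assumed split)."""
    xs = [feats[view] for feats, _ in train]
    ys = [label for _, label in train]
    return CentroidClassifier().fit(xs, ys)


def co_train(labeled, unlabeled, rounds=3, threshold=0.6):
    """labeled: [((stat, sem), label)], unlabeled: [(stat, sem)]."""
    train_a = list(labeled)  # training set of the statistical-view model
    train_b = list(labeled)  # training set of the semantic-view model
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model_a, model_b = fit_view(train_a, 0), fit_view(train_b, 1)
        remaining = []
        for feats in pool:
            label_a, conf_a = model_a.predict(feats[0])
            label_b, conf_b = model_b.predict(feats[1])
            if conf_a >= threshold and conf_a >= conf_b:
                train_b.append((feats, label_a))  # statistical model teaches semantic model
            elif conf_b >= threshold:
                train_a.append((feats, label_b))  # semantic model teaches statistical model
            else:
                remaining.append(feats)  # neither model is confident yet
        pool = remaining
    # Refit so the returned models include the last round's pseudo-labels.
    return fit_view(train_a, 0), fit_view(train_b, 1), pool
```

Calling `co_train(seed, candidates)` with a small labeled seed set and a list of unlabeled `(stat, sem)` pairs returns the two refitted models plus the candidates neither model could label confidently. In a real system the toy classifier would be replaced by actual statistical and semantic models.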

Keywords: ensemble learning, phrase mining, cooperative training, deep learning, named entity recognition (NER)
