
Computer Engineering (计算机工程)

• Artificial Intelligence and Recognition Technology •

Automatic Identification of Cheating Users on Crowdsourcing Platform

CHEN Xia 1, MIN Huaqing 2, SONG Hengjie 2

  1. School of Information Science and Technology, Lingnan Normal University, Zhanjiang, Guangdong 524048, China
  2. School of Software Engineering, South China University of Technology, Guangzhou 510006, China
  • Received: 2015-10-19 Online: 2016-08-15 Published: 2016-08-15
  • About the authors: CHEN Xia (born 1976), female, lecturer, M.S.; main research interests: data mining and artificial intelligence. MIN Huaqing and SONG Hengjie, professors, Ph.D.
  • Supported by:
    National Natural Science Foundation of China (61402399); Zhanjiang Science and Technology Research Program (2015B01050); Natural Science Foundation of Lingnan Normal University (QL1410, YL1505).

Abstract: Crowdsourcing harnesses the collective intelligence of a distributed online population to complete a wide variety of tasks. In practice, however, crowdsourcing platforms commonly attract cheating users who work carelessly and only want the reward. Their unreliable answers degrade the quality of the collected task data and limit what crowdsourcing can solve. To address this problem, this paper proposes a method for automatically identifying cheating users. It analyzes the answering behavior of users on the Baidu Crowdsourcing Platform (BCP) and summarizes the types of cheating users found there. Based on the behavioral characteristics of these cheating users, a logistic regression model is built for crowdsourcing users; each user's reliability is computed from the values of that user's behavioral features, and cheating users are then identified automatically from that reliability. Experimental results show that, compared with the baseline methods of majority voting, gold question set, and SpEM, the proposed method achieves higher identification accuracy, reaching 97%.

Key words: crowdsourcing, cheating user, behavioral characteristics, logistic regression model, reliability, accuracy
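
The abstract describes scoring each user's reliability with a logistic regression model over behavioral features and then flagging low-reliability users. The sketch below illustrates that general idea only; it is not the authors' implementation. The two features, the toy data, the labels, the gradient-descent settings, and the 0.5 threshold are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def reliability(w, b, x):
    """Estimated probability that a user with feature vector x is reliable."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def is_cheater(w, b, x, threshold=0.5):
    """Flag a user whose reliability falls below the (assumed) threshold."""
    return reliability(w, b, x) < threshold


# Hypothetical features per user, each scaled to [0, 1]:
#   [agreement rate with other users, share of answers on the most-used option]
X = [[0.90, 0.30], [0.80, 0.40], [0.85, 0.35],
     [0.30, 0.90], [0.20, 1.00], [0.40, 0.95]]
y = [1, 1, 1, 0, 0, 0]  # made-up labels: 1 = reliable, 0 = cheating

w, b = fit_logistic(X, y)

careful_user = [0.88, 0.30]
careless_user = [0.25, 0.98]
print("careful user flagged:", is_cheater(w, b, careful_user))
print("careless user flagged:", is_cheater(w, b, careless_user))
```

In this sketch the reliability score is simply the model's predicted probability of the "reliable" class, so the threshold trades missed cheaters against wrongly flagged users and would need tuning on labeled data.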

CLC Number: