
Computer Engineering, 2021, Vol. 47, Issue (4): 277-284. doi: 10.19678/j.issn.1000-3428.0057442

• Development Research and Engineering Application

Mouse Trajectory Recognition Based on Feature Group Hierarchy and Semi-Supervised Learning

KANG Lulu1,3, FAN Xingrong2, WANG Qianzhu1,3, YANG Xiaoya1,3, MING Rui1

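  1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;
    2. School of Computer Science and Information Engineering, Chongqing Technology and Business University, Chongqing 400067, China;
    3. Electronic Information and Networking Research Institute, Chongqing University of Posts and Telecommunications, Chongqing 400065, China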
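  • Received: 2020-02-20; Revised: 2020-03-27; Published: 2020-04-14
  • About the authors: KANG Lulu (born 1994), female, master's student; main research interests: machine learning and data mining. FAN Xingrong, lecturer, Ph.D. WANG Qianzhu, senior engineer, Ph.D. YANG Xiaoya, master's student. MING Rui, undergraduate student.
  • Funding:
    Natural Science Foundation of Chongqing (cstc2018jcyjAX0587); Key Demonstration Project of the Chongqing Major Thematic Special Program for Science and Technology (cstc2018jszx-cyztzxX0035); China Mobile Research Fund (MCM20170203).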

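Abstract: Traditional time series classification methods recognize mouse trajectories poorly because they do not fully mine trajectory features and struggle with imbalanced data and scarce labeled samples. This paper proposes a mouse trajectory recognition method that combines feature group hierarchy with semi-supervised learning. The method first builds hierarchical groups of mouse trajectory features from different perspectives. Then, following the idea of semi-supervised learning, it uses multiple random forest models to pseudo-label unlabeled samples, selects the portion of samples on which the models' label predictions agree and whose confidence is high, and adds them to the training set. Finally, a random forest model is trained on the expanded training set using the basic and auxiliary feature groups to distinguish human from machine mouse trajectories. Experimental results show that the method recognizes mouse trajectories effectively, reaching a precision of 97.83%, a recall of 94.72%, and a harmonic mean of 96.56%.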
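The pseudo-labeling step described in the abstract (several random forests label an unlabeled pool, and only samples on which the forests agree with high confidence join the training set) can be sketched with scikit-learn as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the number of forests, the way they differ (here only by random seed), the confidence measure (each forest's top-class probability), the 0.9 threshold, the single labeling round, and the helper names `select_pseudo_labels` and `self_train` are all hypothetical. The feature matrix stands in for the concatenated basic and auxiliary feature groups, whose contents the abstract does not specify.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score


def select_pseudo_labels(models, X_unlabeled, threshold=0.9):
    """Pick unlabeled samples on which every model predicts the same label
    and every model's top-class probability is at least `threshold`.

    Hypothetical helper: the agreement rule and the confidence measure are
    assumptions, not taken from the paper.
    """
    preds = np.stack([m.predict(X_unlabeled) for m in models])  # (n_models, n_samples)
    conf = np.stack([m.predict_proba(X_unlabeled).max(axis=1) for m in models])
    agree = np.all(preds == preds[0], axis=0)      # all models give the same label
    confident = np.all(conf >= threshold, axis=0)  # all models are confident
    idx = np.flatnonzero(agree & confident)
    return idx, preds[0, idx]


def self_train(X_lab, y_lab, X_unl, n_models=3, threshold=0.9):
    """One round of pseudo-labeling, then train a final forest.

    Hypothetical helper: the forests differ only by random seed here.
    """
    models = [
        RandomForestClassifier(n_estimators=50, random_state=seed).fit(X_lab, y_lab)
        for seed in range(n_models)
    ]
    idx, pseudo_y = select_pseudo_labels(models, X_unl, threshold)
    # Expanded training set = labeled samples + confidently pseudo-labeled ones
    X_aug = np.vstack([X_lab, X_unl[idx]])
    y_aug = np.concatenate([y_lab, pseudo_y])
    final = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_aug, y_aug)
    return final, len(idx)


if __name__ == "__main__":
    # Synthetic, imbalanced stand-in for trajectory features (about 10% class 1)
    X, y = make_classification(n_samples=3000, n_features=20,
                               weights=[0.9, 0.1], random_state=0)
    X_lab, y_lab = X[:300], y[:300]  # small labeled set
    X_unl = X[300:2500]              # labels treated as unknown
    X_test, y_test = X[2500:], y[2500:]

    model, n_added = self_train(X_lab, y_lab, X_unl)
    pred = model.predict(X_test)
    print("pseudo-labeled samples added:", n_added)
    print("precision:", precision_score(y_test, pred, zero_division=0))
    print("recall:", recall_score(y_test, pred, zero_division=0))
```

Requiring every forest to agree and be confident trades coverage for label quality. With imbalanced data, confident agreement tends to favor the majority class, so the class mix of the added samples is worth inspecting; the abstract does not say how the authors handle this.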

Key words: mouse trajectory recognition, feature group hierarchy, semi-supervised learning, random forest model, imbalanced data
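As an arithmetic check on the reported figures, the sketch below compares the standard F1 score with a weighted harmonic mean of precision P and recall R. With P = 97.83% and R = 94.72%, the ordinary F1 is about 96.25%, while the reported 96.56% is reproduced by 5PR/(2P + 3R), which equals the F-beta score with β² = 2/3 (weighting precision slightly more than recall). That the paper uses this weighting is an inference from the numbers, not something the abstract states; the function names are illustrative.

```python
def f1(p, r):
    """Standard (unweighted) harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


def f_beta(p, r, beta2):
    """F-beta score written in terms of beta squared."""
    return (1 + beta2) * p * r / (beta2 * p + r)


P, R = 0.9783, 0.9472  # precision and recall reported in the abstract

print(f"F1                  = {f1(P, R):.2%}")                # 96.25%
print(f"5PR / (2P + 3R)     = {5 * P * R / (2 * P + 3 * R):.2%}")  # 96.56%
print(f"F-beta, beta^2=2/3  = {f_beta(P, R, 2 / 3):.2%}")     # 96.56%
```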
