Computer Engineering, 2021, Vol. 47, Issue (6): 312-320. doi: 10.19678/j.issn.1000-3428.0058006

Development Research and Engineering Application

Study on Financial Fraud Account Detection Based on Imbalanced Datasets

LÜ Fang (1,2), TANG Fenghe (1,2), HUANG Junheng (1,2), WANG Bailing (1,2)

  1. School of Computer Science and Technology, Harbin Institute of Technology (Weihai), Weihai, Shandong 264209, China;
  2. Research Institute of Cyberspace Security, Harbin Institute of Technology (Weihai), Weihai, Shandong 264209, China
  • Received: 2020-04-08 Revised: 2020-06-19 Published: 2020-07-03
  • Supported by: National Key Research and Development Program of China, "Cyberspace Security" Key Special Project (2017YFB0801804)
  • Contact: E-mail: wbl@hit.edu.cn

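  • About the authors: LÜ Fang (born 1990), female, Ph.D. candidate, main research interests: financial security, data mining, and graph mining; TANG Fenghe, undergraduate student; HUANG Junheng, associate professor; WANG Bailing (corresponding author), professor, Ph.D.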

Abstract: To detect bank accounts involved in fraud, this paper proposes iForest-SMOTE, a detection framework for imbalanced financial datasets. Based on the dynamic transaction characteristics of accounts, transaction behavior features are extracted along three dimensions: statistical, temporal (sequential), and supervisory information. To address the cross-region sample synthesis problem that the oversampling technique ADASYN exhibits on financial account datasets, a data-balancing preprocessing strategy based on the iForest algorithm is proposed. The strategy uses iForest to perform mixed sampling, which removes noisy majority-class samples while making it easier for the classifier to learn the minority class. On this basis, a random forest classifier is designed to detect accounts involved in financial fraud. Experimental results on real-world financial account transaction datasets show that, compared with ADASYN, SMOTE, and other sampling techniques, iForest-SMOTE has a clear advantage in recall and accuracy, and its F-value is at least 2.13 percentage points higher than that of the other algorithms.
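The abstract does not specify exactly how iForest drives the mixed sampling, so the following is only a minimal pure-Python sketch under one plausible reading, which is an assumption and not the authors' implementation: an isolation forest scores only the majority (normal) accounts, the highest-scoring fraction is dropped as noise, and classic SMOTE then interpolates new minority (fraud) samples until both classes are the same size. The function names, the `noise_fraction` parameter, and the label convention (0 = normal, 1 = fraud) are hypothetical, and the random forest classification stage is left out.

```python
import math
import random


def c_factor(n):
    """Mean path length of an unsuccessful BST search over n points (iForest normaliser)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # H(n-1) ~ ln(n-1) + Euler's gamma
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Grow one isolation tree with random axis-aligned splits."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo, hi = min(p[dim] for p in points), max(p[dim] for p in points)
    if lo == hi:  # feature is constant on this node: treat it as a leaf
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus the c(size) correction for unsplit points."""
    if "dim" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def iforest_scores(data, n_trees=100, sample_size=64, seed=0):
    """Anomaly score s(x) = 2^(-E[h(x)] / c(psi)); values close to 1 are anomalous."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi)) if psi > 1 else 0
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(psi)
    scores = []
    for x in data:
        mean_h = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_h / norm) if norm > 0 else 0.5)
    return scores


def smote(minority, n_new, k=5, seed=0):
    """Classic SMOTE: interpolate towards one of the k nearest minority neighbours."""
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        nb = rng.choice(sorted(others, key=lambda p: math.dist(p, base))[:k])
        gap = rng.random()  # uniform in [0, 1), so the new point lies on the segment
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


def iforest_smote(X, y, noise_fraction=0.05, seed=0):
    """Hypothetical mixed sampling: drop the most anomalous majority samples (label 0),
    then SMOTE the minority (label 1) until both classes are the same size."""
    majority = [x for x, label in zip(X, y) if label == 0]
    minority = [x for x, label in zip(X, y) if label == 1]
    scores = iforest_scores(majority, seed=seed)
    n_drop = int(len(majority) * noise_fraction)
    ranked = sorted(range(len(majority)), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[:n_drop])
    kept = [x for i, x in enumerate(majority) if i not in dropped]
    synthetic = smote(minority, len(kept) - len(minority), seed=seed)
    X_bal = kept + minority + synthetic
    y_bal = [0] * len(kept) + [1] * (len(minority) + len(synthetic))
    return X_bal, y_bal


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy 2-D data: 195 normal accounts, 5 far-off noisy normal ones, 20 fraud accounts
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(195)]
    X += [(rng.uniform(8, 10), rng.uniform(8, 10)) for _ in range(5)]
    X += [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(20)]
    y = [0] * 200 + [1] * 20
    X_bal, y_bal = iforest_smote(X, y, noise_fraction=0.025)
    print(y_bal.count(0), y_bal.count(1))  # 195 195
```

In the toy run, five majority samples are discarded as noise and 175 synthetic fraud samples are added, so both classes end up with 195 samples. For real data, scikit-learn's `IsolationForest` and imbalanced-learn's `SMOTE` offer tested implementations of the two steps, and the balanced set would then be passed to a random forest classifier.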

Key words: isolation forest, imbalanced classification, fraud account detection, random forest, feature mining
