Automatic Identification Method of API Key in Source Code

doi:10.19678/j.issn.1000-3428.0048197

Computer Engineering, 2018, Vol. 44, Issue (6): 117-121, 129.


XUE Min¹, FANG Yong², HUANG Cheng¹, LIU Liang²

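1. College of Electronics and Information, Sichuan University, Chengdu 610065, China; 2. College of Cybersecurity, Sichuan University, Chengdu 610207, China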

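Received: 2017-07-31; Online: 2018-06-15; Published: 2018-06-15
About the authors: XUE Min (born 1993), female, master's student, main research area: Web security; FANG Yong, professor, Ph.D.; HUANG Cheng, Ph.D.; LIU Liang, lecturer, Ph.D.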



Abstract: Leakage of Application Programming Interface (API) keys may allow the associated services to be used maliciously, causing economic losses that are difficult to estimate. To address this, the common characteristics of API keys across different projects' code are extracted through basic statistical analysis of sample features and static structural analysis of the source code. On this basis, a machine-learning-based method is built to automatically identify API keys in source code. Results of 10-fold cross-validation experiments show that the method achieves better retrieval performance than traditional detection approaches such as full-text matching search, keyword search, and information-entropy search.
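Information-entropy search, one of the traditional baselines named in the abstract, flags strings that look random. A minimal sketch of that baseline follows, assuming it scores quoted string literals by their Shannon entropy in bits per character. The threshold (4.0 bits/char), the minimum length (20), the simplified literal regex, and all names below are illustrative assumptions, not values or code from the paper; this is the baseline being compared against, not the paper's machine-learning method.

```python
import math
import re
from collections import Counter

# Hypothetical parameters chosen for illustration only.
ENTROPY_THRESHOLD = 4.0  # bits per character
MIN_LENGTH = 20          # ignore short literals

# Simplified single- or double-quoted string literal (no escape handling).
STRING_LITERAL = re.compile(r"""(["'])([^"'\\\n]+)\1""")


def shannon_entropy(s: str) -> float:
    """Shannon entropy of s in bits per character, from character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def entropy_candidates(source: str,
                       threshold: float = ENTROPY_THRESHOLD,
                       min_length: int = MIN_LENGTH) -> list:
    """Return string literals in source that are long and high-entropy enough to look like keys."""
    hits = []
    for match in STRING_LITERAL.finditer(source):
        value = match.group(2)
        if len(value) >= min_length and shannon_entropy(value) >= threshold:
            hits.append(value)
    return hits


if __name__ == "__main__":
    # Hypothetical snippet; the "key" is a made-up random-looking string.
    sample = 'title = "hello world, this is text"\napi_key = "x9Qv2LmZ7pT4rW8sK1bN6dH3"'
    print(entropy_candidates(sample))  # prints ['x9Qv2LmZ7pT4rW8sK1bN6dH3']
```

A filter like this also flags other random-looking but harmless strings (hashes, UUIDs, encoded data) and misses keys whose entropy falls below the threshold, which is one plausible reason a single-feature baseline can trail a method that learns from several combined features.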

Keywords: Application Programming Interface (API) key, source code, machine learning, static structure, information entropy

CLC number: TP309


Cite this article: XUE Min, FANG Yong, HUANG Cheng, LIU Liang. Automatic Identification Method of API Key in Source Code[J]. Computer Engineering, 2018, 44(6): 117-121, 129.

https://www.ecice06.com/CN/Y2018/V44/I6/117


