[1]
[2]
[3]
[4] HU W M, TIAN G D, KANG Y X, et al. Dual sticky hierarchical Dirichlet process hidden Markov model and its application to natural language description of motions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(10): 2355-2373. doi: 10.1109/TPAMI.2017.2756039
[5] CHEN P H, LIN C J, SCHÖLKOPF B. A tutorial on ν-support vector machines. Applied Stochastic Models in Business and Industry, 2005, 21(2): 111-136. doi: 10.1002/asmb.537
[6] TEIXEIRA J, SARMENTO L, OLIVEIRA E. A bootstrapping approach for training a NER with conditional random fields[C]//Proceedings of Portuguese Conference on Artificial Intelligence. Berlin, Germany: Springer, 2011: 664-678.
[7] ABBAS G, PHILIPPE L, AHMAD R, et al. Context-aware adversarial training for name regularity bias in named entity recognition. Transactions of the Association for Computational Linguistics, 2021, 9: 586-604. doi: 10.1162/tacl_a_00386
[8] EMELYANOV A A, ARTEMOVA E. Multilingual named entity recognition using pretrained embeddings, attention mechanism and NCRF[EB/OL]. [2022-12-02]. https://arxiv.org/abs/1906.09978.
[9] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional Transformers for language understanding[EB/OL]. [2022-12-02]. https://arxiv.org/abs/1810.04805.
[10] JOSHI A, LAL R, FININ T, et al. Extracting cybersecurity related linked data from text[C]//Proceedings of 2013 IEEE International Conference on Semantic Computing. Washington D. C., USA: IEEE Press, 2013: 252-259.
[11] JIN Y S, CHI C Y, ZHAN X G. A method for text information extraction based on hidden Markov model clustering. Journal of Information, 2008, 27(3): 96-98. doi: 10.3969/j.issn.1002-1965.2008.03.032 (in Chinese)
[12] LI Z R. Information extraction and information visualization based on conditional random fields[D]. Beijing: North China University of Technology, 2017. (in Chinese)
[13] GRISHMAN R. Adaptive information extraction and sublanguage analysis[C]//Proceedings of IJCAI'01. Washington D. C., USA: IEEE Press, 2001: 1-4.
[14] ZENG D, LIU K, LAI S, et al. Relation classification via convolutional deep neural network[C]//Proceedings of COLING'14. Washington D. C., USA: IEEE Press, 2014: 2335-2344.
[15]
[16] WANG L L, CAO Z, DE MELO G, et al. Relation classification via multi-level attention CNNs[C]//Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, USA: Association for Computational Linguistics, 2016: 1298-1307.
[17] ZHANG S, ZHENG D, HU X, et al. Bidirectional long short-term memory networks for relation classification[C]//Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation. Washington D. C., USA: IEEE Press, 2015: 73-78.
[18] ZHOU P, SHI W, TIAN J, et al. Attention-based bidirectional long short-term memory networks for relation classification[C]//Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Stroudsburg, USA: Association for Computational Linguistics, 2016: 207-212.
[19] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2017: 6000-6010.
[20] RADFORD A, WU J, CHILD R, et al. Language models are unsupervised multitask learners. OpenAI Blog, 2019, 1(8): 9.
[21]
[22] JOSHI M, CHEN D Q, LIU Y H, et al. SpanBERT: improving pre-training by representing and predicting spans. Transactions of the Association for Computational Linguistics, 2020, 8: 64-77. doi: 10.1162/tacl_a_00300
[23]
[24]
[25] YAMADA I, ASAI A, SHINDO H, et al. LUKE: deep contextualized entity representations with entity-aware self-attention[EB/OL]. [2022-12-02]. https://arxiv.org/abs/2010.01057.
[26]
[27] WEI J, ZOU K. EDA: easy data augmentation techniques for boosting performance on text classification tasks[EB/OL]. [2022-12-02]. https://arxiv.org/abs/1901.11196.
[28]
[29] SNELL J, SWERSKY K, ZEMEL R. Prototypical networks for few-shot learning[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2017: 4080-4090.
[30]
[31] BROWN T B, MANN B, RYDER N, et al. Language models are few-shot learners[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2020: 1877-1901.
[32]
[33] SCHICK T, SCHÜTZE H. Exploiting cloze questions for few shot text classification and natural language inference[EB/OL]. [2022-12-02]. https://arxiv.org/abs/2001.07676.
[34]
[35]
[36] CAELLES S, MANINIS K K, PONT-TUSET J, et al. One-shot video object segmentation[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2017: 221-230.
[37] FINN C, ABBEEL P, LEVINE S. Model-agnostic meta-learning for fast adaptation of deep networks[C]//Proceedings of International Conference on Machine Learning. [S. l.]: PMLR, 2017: 1126-1135.