
Computer Engineering, 2022, Vol. 48, Issue (1): 312-320. doi: 10.19678/j.issn.1000-3428.0060051

• Development Research and Engineering Application •

Hybrid Advertising Click-through Rate Prediction Model Based on Feature Enhancement Combination

JIANG Xingyu, HUANG Xianying, CHEN Yujing, XU Fu

  1. College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China
  • Received: 2020-11-18 Revised: 2021-01-20 Published: 2021-01-21
  • About the authors: JIANG Xingyu (born 1994), male, master's student, whose main research interests are deep learning and recommender systems; HUANG Xianying (corresponding author), professor; CHEN Yujing, undergraduate student; XU Fu, master's student.
  • Funding:
    National Social Science Fund of China (17XXW005); Chongqing University of Technology Graduate Innovation Project (clgycx20203118).

Abstract: Traditional Click-Through Rate (CTR) prediction models mostly perform feature interaction at a single feature level and fail to make full use of the effective information available at different feature levels. This paper proposes a hybrid advertising CTR prediction model, APNN, based on feature enhancement aggregation. First-order information importance is introduced into the embedding layer of the CTR prediction model for feature enhancement. The Attentional Factorization Machine (AFM) model and the Product-based Neural Network (PNN) model are then used to fuse the interaction features of different feature levels with the enhanced embedding features. On this basis, multiple fully connected layers are used to extract more latent high-order information from the fused features. Experimental results show that, compared with AFM, PNN, FNN, and other models, the proposed APNN model achieves higher prediction accuracy. On the Criteo dataset, its AUC and LogLoss improve on those of the PNN model by 1.74 and 1.42 percentage points, respectively.
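The pipeline outlined in the abstract (enhance the embeddings, compute interactions at two feature levels, fuse, then apply fully connected layers) can be illustrated with a minimal forward-pass sketch. This is one possible reading and not the authors' implementation: the abstract does not give the equations, so the softmax weighting used for first-order importance, the simplified single-vector attention, fusion by concatenation, all dimensions, and every name (`apnn_like_forward`, `attn_vec`, `random_matrix`, and so on) are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dense(vec, weights, bias):
    # One fully connected layer followed by ReLU.
    return [max(0.0, sum(w * v for w, v in zip(row, vec)) + b)
            for row, b in zip(weights, bias)]


def apnn_like_forward(embeddings, first_order, attn_vec, layers, out_w, out_b):
    # 1. Feature enhancement (assumed form): scale each field embedding by a
    #    softmax over the fields' first-order weights.
    importance = softmax(first_order)
    enhanced = [[a * e for e in emb] for a, emb in zip(importance, embeddings)]

    n = len(enhanced)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    products = [[x * y for x, y in zip(enhanced[i], enhanced[j])]
                for i, j in pairs]

    # 2a. AFM-style branch: attention-weighted sum of element-wise products.
    #     Simplified attention: one projection vector, no hidden layer.
    scores = softmax([sum(h * p for h, p in zip(attn_vec, prod))
                      for prod in products])
    afm_out = [sum(s * prod[d] for s, prod in zip(scores, products))
               for d in range(len(attn_vec))]

    # 2b. PNN-style branch: one inner product per field pair.
    pnn_out = [sum(prod) for prod in products]

    # 3. Fusion (assumed to be concatenation) followed by the MLP.
    hidden = [e for emb in enhanced for e in emb] + afm_out + pnn_out
    for weights, bias in layers:
        hidden = dense(hidden, weights, bias)
    logit = sum(w * h for w, h in zip(out_w, hidden)) + out_b
    return sigmoid(logit)


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Toy setup: 4 feature fields with 3-dimensional embeddings (arbitrary sizes).
rng = random.Random(0)
num_fields, dim = 4, 3
embeddings = random_matrix(num_fields, dim, rng)
first_order = [rng.uniform(-1, 1) for _ in range(num_fields)]
attn_vec = [rng.uniform(-1, 1) for _ in range(dim)]
num_pairs = num_fields * (num_fields - 1) // 2
in_dim = num_fields * dim + dim + num_pairs
layers = [(random_matrix(8, in_dim, rng), [0.0] * 8),
          (random_matrix(4, 8, rng), [0.0] * 4)]
out_w = [rng.uniform(-0.1, 0.1) for _ in range(4)]

ctr = apnn_like_forward(embeddings, first_order, attn_vec, layers, out_w, 0.0)
print(f"predicted CTR: {ctr:.4f}")
```

The weights here are random, so the printed value has no meaning beyond showing that the three feature groups (enhanced embeddings, attention-pooled products, and pairwise inner products) can be fused into a single input for the fully connected layers.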

Key words: Click-Through Rate (CTR) prediction, first-order information importance, feature enhancement, Factorization Machine (FM), Deep Neural Network (DNN)

CLC Number: