
Computer Engineering, 2025, Vol. 51, Issue (9): 231-241. doi: 10.19678/j.issn.1000-3428.0069154

• Graphics and Image Processing •

Robust Scene Graph Generation Network for Traffic Scenes

ZHOU Wei¹, MIN Weidong¹,²,³,*

  1. School of Mathematics and Computer Science, Nanchang University, Nanchang 330031, Jiangxi, China
  2. Institute of Metaverse, Nanchang University, Nanchang 330031, Jiangxi, China
  3. Jiangxi Key Laboratory of Smart City, Nanchang 330031, Jiangxi, China
  • Received: 2024-01-02  Revised: 2024-05-15  Online: 2025-09-15  Published: 2025-09-26
  • Corresponding author: MIN Weidong
  • Funding: National Natural Science Foundation of China (62076117); Jiangxi Key Laboratory of Smart City (20192BCD40002)


Abstract:

A traffic scene graph is a structured representation of a traffic scene and plays an important role in intelligent transportation. Current scene graph generation methods predict the relationships between entity pairs to generate unbiased scene graphs. However, because of the long-tailed distribution of the datasets and the ambiguous feature representations of entity relationships, the traffic scene graphs produced by existing methods cannot provide accurate and semantically rich traffic scene information for downstream tasks. To address these issues, this study proposes CSE-CFGB, a traffic scene graph generation network based on Contextual Semantic Embedding (CSE) and Coarse-Fine-Grained Blending (CFGB). The CSE module builds distinctive semantic representations of entities and predicates, and the CFGB network robustly predicts the relationship predicates between entities. CFGB comprises three branches: the Main Branch (MB) uses the CSE representations to predict entity relationships directly; the Coarse-grained Branch (CB) learns robust features of head predicates through a reweighting mechanism; and the Fine-grained Branch (FB) refines the learning of tail predicates through logit adjustment. In addition, a branch weight table lets the two auxiliary branches cooperate to help the MB balance its predictions for head and tail predicates. Experiments on the Visual Genome dataset show that, in the PredCls task, the proposed network achieves average performance scores (Mean@50 and Mean@100) of 49.5% and 51.7%, respectively, indicating that it effectively addresses ambiguous entity relationship representations and the long-tailed distribution of the dataset during model training.

Key words: scene graph generation, long-tail distribution, feature representation, Contextual Semantic Embedding (CSE), Coarse-Fine-Grained Blending (CFGB)
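The abstract names a reweighting mechanism for the CB, a logit adjustment method for the FB, and a branch weight table, but does not give their formulas. As a rough illustration only, the sketch below assumes a frequency-based class weight w_c ∝ n_c^(-β) for the CB, the widely used additive logit-adjusted cross-entropy (adding τ·log prior to each logit during training) for the FB, and a per-predicate weighted sum of branch logits for the table. All function names (`frequency_weights`, `logit_adjusted_ce`, `blend`, and so on) are hypothetical and are not the authors' code.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax of a single logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def frequency_weights(counts: Sequence[int], beta: float = 1.0) -> List[float]:
    """ASSUMED CB reweighting: w_c proportional to n_c ** -beta, rescaled to mean 1.

    beta = 0 disables reweighting; beta = 1 is plain inverse frequency.
    All counts must be positive.
    """
    raw = [n ** -beta for n in counts]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def reweighted_ce(logits: Sequence[float], label: int,
                  weights: Sequence[float]) -> float:
    """Cross-entropy scaled by the weight of the ground-truth predicate."""
    return -weights[label] * log_softmax(logits)[label]


def logit_adjusted_ce(logits: Sequence[float], label: int,
                      counts: Sequence[int], tau: float = 1.0) -> float:
    """ASSUMED FB loss: additive logit adjustment.

    tau * log(prior_c) is added to every logit during training only, so a rare
    (tail) predicate incurs a larger loss for the same raw logits; at inference
    the raw, unadjusted logits are used.
    """
    total = sum(counts)
    adjusted = [z + tau * math.log(n / total) for z, n in zip(logits, counts)]
    return -log_softmax(adjusted)[label]


def blend(mb: Sequence[float], cb: Sequence[float], fb: Sequence[float],
          table: Sequence[Tuple[float, float, float]]) -> List[float]:
    """HYPOTHETICAL branch weight table: one (w_mb, w_cb, w_fb) row per predicate."""
    return [a * m + b * c + g * f
            for m, c, f, (a, b, g) in zip(mb, cb, fb, table)]


if __name__ == "__main__":
    counts = [90, 10]   # toy frequencies: one head and one tail predicate
    equal = [0.0, 0.0]  # identical raw logits for both predicates
    w = frequency_weights(counts)
    print(w)                                    # approximately [0.2, 1.8]
    print(reweighted_ce(equal, 1, w))           # 1.8 * ln 2, about 1.2477
    print(logit_adjusted_ce(equal, 1, counts))  # ln 10, about 2.3026
    print(logit_adjusted_ce(equal, 0, counts))  # -ln 0.9, about 0.1054
```

With equal raw logits, the logit-adjusted loss is about 2.30 for the tail predicate but only about 0.11 for the head predicate, which is the mechanism by which such a loss shifts attention toward tail classes. How the actual CB and FB losses and the table entries are set in CSE-CFGB is not stated in the abstract.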
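The abstract reports Mean@50 and Mean@100 for PredCls without defining them. Several unbiased scene graph generation works use Mean@K for the average of Recall@K (R@K) and mean Recall@K (mR@K); the sketch below assumes that definition, which the abstract itself does not confirm. In PredCls the ground-truth boxes and object labels are given, so a predicted (subject, object, predicate) triplet counts as a hit only when all three fields match a ground-truth triplet. The function `predcls_recalls` is hypothetical and simplified.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# (subject index, object index, predicate id)
Triplet = Tuple[int, int, int]


def predcls_recalls(images: Sequence[Tuple[List[Triplet], List[Triplet]]],
                    k: int) -> Tuple[float, float, float]:
    """Return (R@K, mR@K, Mean@K) over images given as (ranked_preds, gt_triplets).

    ranked_preds must be sorted by descending confidence. Mean@K is ASSUMED to
    be the average of R@K and mR@K.
    """
    image_recalls: List[float] = []
    per_predicate: Dict[int, List[float]] = defaultdict(list)
    for ranked_preds, gts in images:
        if not gts:
            continue  # images without ground-truth relations are skipped
        top_k = set(ranked_preds[:k])
        hits = [t in top_k for t in gts]
        image_recalls.append(sum(hits) / len(gts))
        # Per-image recall of each predicate class present in the ground truth.
        tally: Dict[int, List[int]] = defaultdict(lambda: [0, 0])  # [hits, total]
        for (_, _, pred), hit in zip(gts, hits):
            tally[pred][0] += int(hit)
            tally[pred][1] += 1
        for pred, (h, n) in tally.items():
            per_predicate[pred].append(h / n)
    if not image_recalls:
        raise ValueError("no image has ground-truth relations")
    r_at_k = sum(image_recalls) / len(image_recalls)
    # Simplification: average only over predicate classes seen in the ground truth.
    class_recalls = [sum(v) / len(v) for v in per_predicate.values()]
    mr_at_k = sum(class_recalls) / len(class_recalls)
    return r_at_k, mr_at_k, (r_at_k + mr_at_k) / 2
```

Because mR@K first averages within each predicate class and then across classes, a frequent predicate cannot dominate it the way it can dominate R@K, which is why long-tail methods report both. A full benchmark evaluation would also average over every predicate class of the dataset split, including classes that never appear in the evaluated ground truth.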