
Computer Engineering (计算机工程)


Research on Optimization Methods for Heterogeneous Edge-Side Model Training Based on Infrastructure as Code

  • Published: 2026-02-11


Abstract: With the widespread adoption of edge intelligence in industrial inspection, mobile devices, and smart security applications, deep learning models are increasingly shifting from cloud-centric to edge-side deployment. However, when cloud-pretrained models are fine-tuned on edge devices, limited computational and memory resources often degrade training efficiency, so model hyperparameters and resource parameters must be optimized jointly to maintain both high accuracy and training performance. Existing methods predominantly target hyperparameter optimization in server-side environments and insufficiently account for the heterogeneity of edge devices and the dynamic variability of edge resources, which can cause substantial drops in accuracy and training efficiency during edge-side fine-tuning. To address these challenges, this paper proposes a Heterogeneity-Aware Hyperparameter Optimization method (H2PO). Leveraging Infrastructure as Code (IaC), H2PO provides a unified interface for managing heterogeneous devices and dynamically changing resources, and integrates a lightweight prediction model to guide online hyperparameter adjustment during training, enabling hyperparameters to adapt in real time to fluctuations in resource availability. Experiments on heterogeneous devices show that, under resource-constrained conditions, H2PO improves model accuracy by 2.5% and resource utilization by 4.2%; compared with existing methods, it reduces training time overhead by up to 71.3% and is applicable to different deep learning models.
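
The abstract describes IaC as providing a unified interface over heterogeneous devices and dynamic resources, but gives no implementation details. The following Python sketch is purely illustrative, and every name in it (DEVICE_SPECS, EdgeDevice, query_resources) is a hypothetical placeholder rather than the paper's API: a declarative, version-controllable device specification is materialized into objects that expose the same resource-query interface regardless of the underlying hardware.

# Illustrative IaC-style unified device interface (hypothetical names;
# the paper's actual interface is not described in the abstract).
from dataclasses import dataclass

# Declarative spec, the kind of artifact IaC keeps under version control.
DEVICE_SPECS = [
    {"name": "jetson-nano", "accelerator": "gpu", "memory_mb": 4096, "cores": 4},
    {"name": "raspberry-pi-4", "accelerator": "cpu", "memory_mb": 8192, "cores": 4},
]

@dataclass
class EdgeDevice:
    name: str
    accelerator: str
    memory_mb: int
    cores: int

    def query_resources(self) -> dict:
        # Uniform resource view, regardless of the underlying hardware.
        return {"accelerator": self.accelerator,
                "memory_mb": self.memory_mb,
                "cores": self.cores}

devices = [EdgeDevice(**spec) for spec in DEVICE_SPECS]
for dev in devices:
    print(dev.name, dev.query_resources())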
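
Likewise, the online tuning loop guided by a prediction model can be sketched as follows. This is a toy sketch, not the paper's algorithm: the resource monitor, the throughput predictor, and the candidate set are all placeholder assumptions. At each epoch, the predictor scores candidate hyperparameters (here, only the batch size) under the currently available resources, and the best-scoring candidate is used for the next round of fine-tuning.

# Toy online hyperparameter adjustment loop (hypothetical; the abstract
# only states that a prediction model guides online tuning).
import random

def available_memory_mb() -> int:
    # Stand-in for a real resource monitor; returns a fluctuating value.
    return random.randint(1024, 4096)

def predict_throughput(batch_size: int, memory_mb: int) -> float:
    # Toy predictor: larger batches help until memory becomes the bottleneck.
    if batch_size * 16 > memory_mb:   # assume ~16 MB of memory per sample
        return 0.0                    # infeasible under current memory
    return batch_size * memory_mb / (memory_mb + batch_size)

CANDIDATE_BATCH_SIZES = [16, 32, 64, 128, 256]

for epoch in range(3):
    mem = available_memory_mb()
    # Apply the candidate the predictor scores highest under current resources.
    best = max(CANDIDATE_BATCH_SIZES,
               key=lambda bs: predict_throughput(bs, mem))
    print(f"epoch {epoch}: available memory = {mem} MB -> batch_size = {best}")
    # ... one epoch of fine-tuning with `best` would run here ...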