
Computer Engineering ›› 2025, Vol. 51 ›› Issue (11): 317-327. doi: 10.19678/j.issn.1000-3428.0069470

• Development Research and Engineering Application •

Mobile Real-Time Video Super-Resolution Display Based on OpenGL ES

LU Xiaohua, WANG Ci*

  1. School of Communication and Electronic Engineering, East China Normal University, Shanghai 200241, China
  • Received: 2024-03-04 Revised: 2024-06-12 Online: 2025-11-15 Published: 2024-08-15
  • Contact: WANG Ci
  • Funding:
    Shanghai Municipal Science and Technology Major Project (2021SHZDZX)


Abstract:

Current mainstream video super-resolution algorithms are applied mainly in server-side or offline video-conversion scenarios. When deployed on mobile devices, they suffer from heavy computation and slow inference. In Real-Time audio and video Communication (RTC) scenarios in particular, these algorithms can satisfy the accuracy requirements for image quality but rarely meet the processing-time requirements, which limits their practical application. This paper proposes OGSR, a real-time video super-resolution technique based on an improved Convolutional Neural Network (CNN) and OpenGL ES. First, grouped convolution and channel shuffle are used to optimize the network model without noticeably degrading super-resolution image quality, reducing the computational cost of forward inference severalfold. Second, through the OpenGL ES graphics-acceleration interface, the model parameters and channel data are laid out as texture data in the format that samples fastest and uploaded to graphics memory for parallel computation on the GPU. Finally, the core super-resolution module is implemented in a GPU Shader, which inversely computes the channel index and model-parameter index from the rendered pixel coordinates, achieving fine-grained concurrency at the pixel level. The experimental results show that OGSR upscales QVGA (320×240 pixels) and nHD (640×360 pixels) video frames by a factor of three at 15-30 frames/s on mobile phones of various models, while the quality of the enlarged images deviates from that of the standard CNN model by less than 2%. OGSR therefore meets the requirements of real-time business scenarios and delivers a significant performance improvement.
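The grouped-convolution and channel-shuffle step summarized above can be sketched as follows. This is a minimal NumPy illustration of the general ShuffleNet-style technique, not the paper's actual model; the layer sizes and group count are hypothetical:

```python
import numpy as np

def channel_shuffle(x, groups):
    """Shuffle the channels of a (C, H, W) feature map so that information
    mixes across groups after a grouped convolution (ShuffleNet-style)."""
    c, h, w = x.shape
    assert c % groups == 0
    # (groups, C//groups, H, W) -> swap the two group axes -> flatten back
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution; grouping divides it by `groups`."""
    return (c_in // groups) * (c_out // groups) * k * k * groups

# Hypothetical layer: 32 -> 32 channels, 3x3 kernel
standard = conv_params(32, 32, 3)            # 9216 weights
grouped = conv_params(32, 32, 3, groups=4)   # 2304 weights: a 4x reduction

x = np.arange(4 * 2 * 2).reshape(4, 2, 2)
y = channel_shuffle(x, groups=2)
# Channel order 0,1,2,3 becomes 0,2,1,3: each new group mixes both old groups.
```

The shuffle costs only a memory permutation, which is why it is a popular way to restore cross-group information flow after the parameter savings of grouped convolution.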

Key words: video super-resolution, grouped convolution, channel shuffle, OpenGL ES, texture
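The shader-side trick described in the abstract, recovering a channel index and parameter index from the pixel coordinate being rendered, can be modeled with a small Python sketch of one possible texture layout. The side-by-side tiling scheme below is an assumption for illustration only; the paper's actual layout may differ:

```python
# Assumed layout: feature-map channels packed side by side into one 2D
# texture, channel c occupying the tile at column (c % tiles_per_row),
# row (c // tiles_per_row).

def pack_coord(c, x, y, w, h, tiles_per_row):
    """Texture coordinate of pixel (x, y) of channel c after packing."""
    tile_x, tile_y = c % tiles_per_row, c // tiles_per_row
    return tile_x * w + x, tile_y * h + y

def unpack_coord(tx, ty, w, h, tiles_per_row):
    """Inverse mapping a fragment shader would compute per rendered pixel:
    recover (channel, x, y) from the texture coordinate being shaded."""
    tile_x, x = divmod(tx, w)
    tile_y, y = divmod(ty, h)
    return tile_y * tiles_per_row + tile_x, x, y

# Round trip for every pixel of an 8-channel 4x3 layout, 4 tiles per row
for c in range(8):
    for y in range(3):
        for x in range(4):
            assert unpack_coord(*pack_coord(c, x, y, 4, 3, 4), 4, 3, 4) == (c, x, y)
```

In an actual fragment shader the inverse mapping would run on `gl_FragCoord`, letting every output pixel independently fetch its own channel slice and weights, which is the pixel-level concurrency the abstract refers to.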