
Computer Engineering ›› 2025, Vol. 51 ›› Issue (11): 317-327. doi: 10.19678/j.issn.1000-3428.0069470

• Development Research and Engineering Application •

Mobile Real-Time Video Super-Resolution Display Based on OpenGL ES

LU Xiaohua, WANG Ci*

  1. School of Communication and Electronic Engineering, East China Normal University, Shanghai 200241, China
  • Received: 2024-03-04 Revised: 2024-06-12 Online: 2025-11-15 Published: 2024-08-15
  • Contact: WANG Ci
  • Funding:
    Shanghai Municipal Science and Technology Major Project (2021SHZDZX)


Abstract:

Current mainstream video super-resolution algorithms are applied mainly in server-side or offline video-conversion scenarios. When deployed on mobile devices, they suffer from heavy computation and slow inference. In Real-Time audio and video Communication (RTC) scenarios in particular, these algorithms can satisfy the accuracy requirements for image quality but rarely meet the processing-time requirements, which limits their practical application. This paper proposes OGSR, a real-time video super-resolution technique based on an improved Convolutional Neural Network (CNN) and OpenGL ES. First, grouped convolution and channel shuffle are used to optimize the network model without noticeably degrading super-resolution image quality, reducing the computational cost of forward inference severalfold. Second, through the OpenGL ES graphics-acceleration interface, the model parameters and channel data are laid out as texture data in the format that samples fastest and uploaded to graphics memory for parallel computation on the GPU. Finally, the core super-resolution module is implemented in a GPU Shader, which inversely computes the channel index and model-parameter index from the rendered pixel coordinates, achieving fine-grained concurrency at the pixel level. The experimental results show that OGSR upscales QVGA (320×240 pixels) and nHD (640×360 pixels) video frames by a factor of three at 15-30 frames/s on mobile phones of various models, while the quality of the enlarged images deviates from that of the standard CNN model by less than 2%. OGSR therefore meets the requirements of real-time business scenarios and delivers a significant performance improvement.
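The grouped-convolution and channel-shuffle step summarized above can be sketched as follows. This is a minimal NumPy illustration of the general ShuffleNet-style technique, not the paper's actual model; the layer sizes and group count are hypothetical:

```python
import numpy as np

def channel_shuffle(x, groups):
    """Shuffle the channels of a (C, H, W) feature map so that information
    mixes across groups after a grouped convolution (ShuffleNet-style)."""
    c, h, w = x.shape
    assert c % groups == 0
    # (groups, C//groups, H, W) -> swap the two group axes -> flatten back
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution; grouping divides it by `groups`."""
    return (c_in // groups) * (c_out // groups) * k * k * groups

# Hypothetical layer: 32 -> 32 channels, 3x3 kernel
standard = conv_params(32, 32, 3)            # 9216 weights
grouped = conv_params(32, 32, 3, groups=4)   # 2304 weights: a 4x reduction

x = np.arange(4 * 2 * 2).reshape(4, 2, 2)
y = channel_shuffle(x, groups=2)
# Channel order 0,1,2,3 becomes 0,2,1,3: each new group mixes both old groups.
```

The shuffle costs only a memory permutation, which is why it is a popular way to restore cross-group information flow after the parameter savings of grouped convolution.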

Key words: video super-resolution, grouped convolution, channel shuffle, OpenGL ES, texture
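The shader-side trick described in the abstract, recovering a channel index and parameter index from the pixel coordinate being rendered, can be modeled with a small Python sketch of one possible texture layout. The side-by-side tiling scheme below is an assumption for illustration only; the paper's actual layout may differ:

```python
# Assumed layout: feature-map channels packed side by side into one 2D
# texture, channel c occupying the tile at column (c % tiles_per_row),
# row (c // tiles_per_row).

def pack_coord(c, x, y, w, h, tiles_per_row):
    """Texture coordinate of pixel (x, y) of channel c after packing."""
    tile_x, tile_y = c % tiles_per_row, c // tiles_per_row
    return tile_x * w + x, tile_y * h + y

def unpack_coord(tx, ty, w, h, tiles_per_row):
    """Inverse mapping a fragment shader would compute per rendered pixel:
    recover (channel, x, y) from the texture coordinate being shaded."""
    tile_x, x = divmod(tx, w)
    tile_y, y = divmod(ty, h)
    return tile_y * tiles_per_row + tile_x, x, y

# Round trip for every pixel of an 8-channel 4x3 layout, 4 tiles per row
for c in range(8):
    for y in range(3):
        for x in range(4):
            assert unpack_coord(*pack_coord(c, x, y, 4, 3, 4), 4, 3, 4) == (c, x, y)
```

In an actual fragment shader the inverse mapping would run on `gl_FragCoord`, letting every output pixel independently fetch its own channel slice and weights, which is the pixel-level concurrency the abstract refers to.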