1 |
REN Z L, ZHANG Q S, GAO X Y, et al. Multi-modality learning for human action recognition. Multimedia Tools and Applications, 2021, 80 (11): 16185- 16203.
doi: 10.1007/s11042-019-08576-z
|
2 |
VIELZEUF V, LECHERVY A, PATEUX S, et al. CentralNet: a multilayer approach for multimodal fusion[C]//Proceedings of European Conference on Computer Vision. Berlin, Germany: Springer, 2019: 575-589.
|
3 |
周雪雪, 雷景生, 卓佳宁. 基于多模态特征学习的人体行为识别方法. 计算机系统应用, 2021, 30 (4): 146- 152.
URL
|
|
ZHOU X X, LEI J S, ZHUO J N. Human action recognition algorithm based on multi-modal features learning. Computer Systems & Applications, 2021, 30 (4): 146- 152.
URL
|
4 |
|
5 |
|
6 |
XU C, WU X, LI Y C, et al. Cross-modality online distillation for multi-view action recognition. Neurocomputing, 2021, 456, 384- 393.
doi: 10.1016/j.neucom.2021.05.077
|
7 |
CRASTO N, WEINZAEPFEL P, ALAHARI K, et al. MARS: motion-augmented RGB stream for action recognition[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2020: 7874-7883.
|
8 |
STROUD J C, ROSS D A, SUN C, et al. D3D: distilled 3D networks for video action recognition[C]//Proceedings of IEEE Winter Conference on Applications of Computer Vision. Washington D. C., USA: IEEE Press, 2020: 614-623.
|
9 |
LI Y X, LU Z C, XIONG X H, et al. PERF-Net: pose empowered RGB-flow net[C]//Proceedings of IEEE/CVF Winter Conference on Applications of Computer Vision. Washington D. C., USA: IEEE Press, 2022: 798-807.
|
10 |
KENDALL A, GRIMES M, CIPOLLA R. PoseNet: a convolutional network for real-time 6-DOF camera relocalization[C]//Proceedings of IEEE International Conference on Computer Vision. Washington D. C., USA: IEEE Press, 2016: 2938-2946.
|
11 |
WU M C, CHIU C T, WU K H. Multi-teacher knowledge distillation for compressed video action recognition on deep neural networks[C]//Proceedings of 2019 IEEE International Conference on Acoustics, Speech and Signal Processing. Washington D. C., USA: IEEE Press, 2019: 2202-2206.
|
12 |
SHAHROUDY A, LIU J, NG T T, et al. NTU RGB+D: a large scale dataset for 3D human activity analysis[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2016: 1010-1019.
|
13 |
CHEN C, JAFARI R, KEHTARNAVAZ N. UTD-MHAD: a multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor[C]//Proceedings of IEEE International Conference on Image Processing. Washington D. C., USA: IEEE Press, 2015: 168-172.
|
14 |
WANG J, NIE X H, XIA Y, et al. Cross-view action modeling, learning, and recognition[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2014: 2649-2656.
|
15 |
KUEHNE H, JHUANG H, GARROTE E, et al. HMDB: a large video database for human motion recognition[C]//Proceedings of International Conference on Computer Vision. Washington D. C., USA: IEEE Press, 2012: 2556-2563.
|
16 |
DAVOODIKAKHKI M, YIN K K. Hierarchical action classification with network pruning[C]//Proceedings of International Symposium on Visual Computing. Berlin, Germany: Springer, 2020: 291-305.
|
17 |
CAO Z, SIMON T, WEI S H, et al. Realtime multi-person 2D pose estimation using part affinity fields[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2017: 1302-1310.
|
18 |
HARA K, KATAOKA H, SATOH Y. Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet?[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2018: 6546-6555.
|
19 |
WU H B, MA X, LI Y B. Spatiotemporal multimodal learning with 3D CNNs for video action recognition. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32 (3): 1250- 1261.
doi: 10.1109/TCSVT.2021.3077512
|
20 |
DHIMAN C, VISHWAKARMA D K. View-invariant deep architecture for human action recognition using two-stream motion and shape temporal dynamics. IEEE Transactions on Image Processing, 2020, 29, 3835- 3844.
doi: 10.1109/TIP.2020.2965299
|
21 |
SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]//Proceedings of IEEE International Conference on Computer Vision. Washington D. C., USA: IEEE Press, 2017: 618-626.
|
22 |
PEREZ-RUA J M, VIELZEUF V, PATEUX S, et al. MFAS: multimodal fusion architecture search[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2020: 6959-6968.
|
23 |
JOZE H R V, SHABAN A, IUZZOLINO M L, et al. MMTM: multimodal transfer module for CNN fusion[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2020: 13286-13296.
|
24 |
MOON G, KWON H, LEE K M, et al. IntegralAction: pose-driven feature integration for robust human action recognition in videos[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2021: 3334-3343.
|
25 |
DE BOISSIERE A M, NOUMEIR R. Infrared and 3D skeleton feature fusion for RGB-D action recognition. IEEE Access, 2020, 8, 168297- 168308.
doi: 10.1109/ACCESS.2020.3023599
|
26 |
QIU Z F, YAO T, NGO C W, et al. Learning spatio-temporal representation with local and global diffusion[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2020: 12048-12057.
|
27 |
LIU M Y, YUAN J S. Recognizing human actions as the evolution of pose estimation maps[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2018: 1159-1168.
|
28 |
XU W Y, WU M Q, ZHAO M, et al. Fusion of skeleton and RGB features for RGB-D human action recognition. IEEE Sensors Journal, 2021, 21 (17): 19157- 19164.
doi: 10.1109/JSEN.2021.3089705
|
29 |
ISLAM M M, IQBAL T. HAMLET: a hierarchical multimodal attention-based human activity recognition algorithm[C]//Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems. Washington D. C., USA: IEEE Press, 2021: 10285-10292.
|
30 |
DAS S, SHARMA S, DAI R, et al. VPN: learning video-pose embedding for activities of daily living[C]//Proceedings of European Conference on Computer Vision. Berlin, Germany: Springer, 2020: 72-90.
|
31 |
WANG Y C, XIAO Y, XIONG F, et al. 3DV: 3D dynamic voxel for action recognition in depth video[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2020: 508-517.
|
32 |
ZHU J G, ZOU W, ZHU Z, et al. Action machine: toward person-centric action recognition in videos. IEEE Signal Processing Letters, 2019, 26 (11): 1633- 1637.
doi: 10.1109/LSP.2019.2942739
|