| 1 |
MASCARO E V, SLIWOWSKI D, LEE D. HOI4ABOT: human-object interaction anticipation for human intention reading collaborative roBOTs[EB/OL]. [2025-03-11]. https://arxiv.org/abs/2309.16524.
|
| 2 |
BENMESSABIH T , SLAMA R , HAVARD V , et al. Online human motion analysis in industrial context: a review. Engineering Applications of Artificial Intelligence, 2024, 131, 107850.
doi: 10.1016/j.engappai.2024.107850
|
| 3 |
DRAGAN A D, SRINIVASA S. Formalizing assistive teleoperation[M]//ROY N, NEWMAN P, SRINIVASA S. Robotics: science and systems Ⅷ. Cambridge, USA: MIT Press, 2012: 73-80.
|
| 4 |
WANG Z K, DEISENROTH M, BEN AMOR H, et al. Probabilistic modeling of human movements for intention inference[M]//ROY N, NEWMAN P, SRINIVASA S. Robotics: science and systems Ⅷ. Cambridge, USA: MIT Press, 2012: 433-440.
|
| 5 |
KOPPULA H S , SAXENA A . Anticipating human activities using object affordances for reactive robotic response. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38 (1): 14- 29.
doi: 10.1109/TPAMI.2015.2430335
|
| 6 |
DANG L M , MIN K , WANG H X , et al. Sensor-based and vision-based human activity recognition: a comprehensive survey. Pattern Recognition, 2020, 108, 107561.
doi: 10.1016/j.patcog.2020.107561
|
| 7 |
ZIAEEFARD M , BERGEVIN R . Semantic human activity recognition: a literature review. Pattern Recognition, 2015, 48 (8): 2329- 2345.
doi: 10.1016/j.patcog.2015.03.006
|
| 8 |
GRAUMAN K, WESTBURY A, BYRNE E, et al. Ego4D: around the world in 3, 000 hours of egocentric video[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2021: 18995-19012.
|
| 9 |
PASCA R, GAVRYUSHIN A, HAMZA M, et al. Summarize the past to predict the future: natural language descriptions of context boost multimodal object interaction anticipation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2024: 18286-18296.
|
| 10 |
MUR-LABADIA L, MARTINEZ-CANTIN R, GUERRERO J J, et al. AFF-ttention! Affordances and attention models for short-term object interaction anticipation[C]//Proceedings of European Conference on Computer Vision. Berlin, Germany: Springer, 2025: 167-184.
|
| 11 |
FURNARI A, BATTIATO S, FARINELLA G M. Leveraging uncertainty to rethink loss functions and evaluation measures for egocentric action anticipation[C]//Proceedings of ECCV'19. Berlin, Germany: Springer, 2019: 389-405.
|
| 12 |
PEI B Q, CHEN G, XU J L, et al. EgoVideo: exploring egocentric foundation model and downstream adaptation[EB/OL]. [2025-03-11]. https://arxiv.org/abs/2406.18070.
|
| 13 |
RAJASEGARAN J, RADOSAVOVIC I, RAVISHANKAR R, et al. An empirical study of autoregressive pre-training from videos[EB/OL]. [2025-03-11]. https://arxiv.org/abs/2501.05453.
|
| 14 |
CHO H, KANG D U, CHUN S Y. Short-term object interaction anticipation with disentangled object detection @ Ego4D short term object interaction anticipation challenge[EB/OL]. [2025-03-11]. https://arxiv.org/abs/2407.05713.
|
| 15 |
|
| 16 |
RAGUSA F, FARINELLA G M, FURNARI A. StillFast: an end-to-end approach for short-term object interaction anticipation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Washington D.C., USA: IEEE Press, 2023: 3636-3645.
|
| 17 |
|
| 18 |
KIM S, HUANG D J, XIAN Y Q, et al. PALM: predicting actions through language models[C]//Proceedings of ECCV'24. Berlin, Germany: Springer, 2024: 140-158.
|
| 19 |
|
| 20 |
TRAN V, WANG Y, ZHANG Z K, et al. Knowledge distillation for human action anticipation[C]//Proceedings of the IEEE International Conference on Image Processing (ICIP). Washington D.C., USA: IEEE Press, 2021: 2518-2522.
|
| 21 |
MANOUSAKI V, PAPOUTSAKIS K, ARGYROS A. Graphing the future: activity and next active object prediction using graph-based activity representations[C]//Proceedings of Advances in Visual Computing. Berlin, Germany: Springer, 2022: 299-312.
|
| 22 |
RASOULI A, KOTSERUBA I, TSOTSOS J K. Pedestrian action anticipation using contextual feature fusion in stacked RNNs[EB/OL]. [2025-03-11]. https://arxiv.org/abs/2005.06582.
|
| 23 |
OSMAN N, CAMPORESE G, COSCIA P, et al. SlowFast rolling-unrolling LSTMs for action anticipation in egocentric videos[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). Washington D.C., USA: IEEE Press, 2021: 3430-3438.
|
| 24 |
GIRDHAR R, GRAUMAN K. Anticipative video Transformer[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2022: 13485-13495.
|
| 25 |
GU X, QIU J N, GUO Y, et al. TransAction: ICL-SJTU submission to EPIC-kitchens action anticipation challenge 2021[EB/OL]. [2025-03-11]. https://arxiv.org/abs/2107.13259.
|
| 26 |
MIECH A, LAPTEV I, SIVIC J, et al. Leveraging the present to anticipate the future in videos[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Washington D.C., USA: IEEE Press, 2020: 2915-2922.
|
| 27 |
ZHANG T Y, MIN W Q, ZHU Y, et al. An egocentric action anticipation framework via fusing intuition and analysis[C]//Proceedings of the 28th ACM International Conference on Multimedia. New York, USA: ACM Press, 2020: 402-410.
|
| 28 |
DESSALENE E , DEVARAJ C , MAYNORD M , et al. Forecasting action through contact representations from first person video. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45 (6): 6703- 6714.
doi: 10.1109/TPAMI.2021.3055233
|
| 29 |
TAI T M, FIAMENI G, LEE C K, et al. Unified recurrence modeling for video action anticipation[C]//Proceedings of the 26th International Conference on Pattern Recognition (ICPR). Washington D.C., USA: IEEE Press, 2022: 3273-3279.
|
| 30 |
张天予, 闵巍庆, 韩鑫阳, 等. 视频中的未来动作预测研究综述. 计算机学报, 2023, 46 (6): 1315- 1338.
|
|
ZHANG T Y , MIN W Q , HAN X Y , et al. A survey on future action anticipation in videos. Chinese Journal of Computers, 2023, 46 (6): 1315- 1338.
|
| 31 |
NI Z F , VALLS MASCARÓ E , AHN H , et al. Human-object interaction prediction in videos through gaze following. Computer Vision and Image Understanding, 2023, 233, 103741.
doi: 10.1016/j.cviu.2023.103741
|
| 32 |
LIU M, TANG S Y, LI Y, et al. Forecasting human-object interaction: joint prediction of motor attention and actions in first person video[C]//Proceedings of ECCV'20. Berlin, Germany: Springer, 2020: 704-721.
|
| 33 |
THAKUR S, BEYAN C, MORERIO P, et al. Enhancing next active object-based egocentric action anticipation with guided attention[C]//Proceedings of the IEEE International Conference on Image Processing (ICIP). Washington D.C., USA: IEEE Press, 2023: 1450-1454.
|
| 34 |
WANG X, ZHANG S W, QING Z W, et al. OadTR: online action detection with transformers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2022: 7545-7555.
|
| 35 |
GIRASE H, AGARWAL N, CHOI C, et al. Latency matters: real-time action forecasting Transformer[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2023: 18759-18769.
|
| 36 |
GUERMAL M, ALI A, DAI R, et al. JOADAA: joint online action detection and action anticipation[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Washington D.C., USA: IEEE Press, 2024: 6875-6884.
|
| 37 |
CHEN J X, LI X Y, CAO J H, et al. RHINO: learning real-time humanoid-human-object interaction from human demonstrations[EB/OL]. [2025-03-11]. https://arxiv.org/abs/2502.13134.
|
| 38 |
FERNANDO B, HERATH S. Anticipating human actions by correlating past with the future with Jaccard similarity measures[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2021: 13219-13228.
|
| 39 |
ROY D, FERNANDO B. Action anticipation using latent goal learning[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Waikoloa, USA: IEEE Press, 2022: 808-816.
|
| 40 |
XU X Y, LI Y L, LU C W. Learning to anticipate future with dynamic context removal[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2022: 12724-12734.
|
| 41 |
莫凌飞, 蒋红亮, 李煊鹏. 基于深度学习的视频预测研究综述. 智能系统学报, 2018, 13 (1): 85- 96.
|
|
MO L F , JIANG H L , LI X P . Review of deep learning-based video prediction. CAAI Transactions on Intelligent Systems, 2018, 13 (1): 85- 96.
|
| 42 |
LIU T S, LAM K M. A hybrid egocentric activity anticipation framework via memory-augmented recurrent and one-shot representation forecasting[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2022: 13894-13903.
|
| 43 |
WU C Y, LI Y H, MANGALAM K, et al. MeMViT: memory-augmented multiscale vision Transformer for efficient long-term video recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2022: 13577-13587.
|
| 44 |
DIKO A, AVOLA D, PRENKAJ B, et al. Semantically guided representation learning for action anticipation[C]//Proceedings of the ECCV'25. Berlin, Germany: Springer, 2025: 448-466.
|
| 45 |
CAO C Q , SUN Z , LÜ Q Y , et al. VS-TransGRU: a novel Transformer-GRU-based framework enhanced by visual-semantic fusion for egocentric action anticipation. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34 (11): 11605- 11618.
doi: 10.1109/TCSVT.2024.3425598
|
| 46 |
SENER F, SINGHANIA D, YAO A. Temporal aggregate representations for long-range video understanding[C]//Proceedings of ECCV'20. Berlin, Germany: Springer, 2020: 154-171.
|
| 47 |
GUO H J, AGARWAL N, LO S Y, et al. Uncertainty-aware action decoupling Transformer for action anticipation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2024: 18644-18654.
|
| 48 |
WANG J H, CHEN G, HUANG Y F, et al. Memory-and-anticipation Transformer for online action understanding[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2024: 13778-13789.
|
| 49 |
CAMPORESE G, COSCIA P, FURNARI A, et al. Knowledge distillation for action anticipation via label smoothing[C]//Proceedings of the 25th International Conference on Pattern Recognition (ICPR). Washington D.C., USA: IEEE Press, 2021: 3312-3319.
|
| 50 |
ROY D, FERNANDO B. Predicting the next action by modeling the abstract goal[C]//Proceedings of the Conference on Computer Vision and Pattern Recognition. Berlin, Germany: Springer, 2024: 162-177.
|
| 51 |
QI Z B , WANG S H , ZHANG W G , et al. Uncertainty-boosted robust video activity anticipation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46 (12): 7775- 7792.
doi: 10.1109/TPAMI.2024.3393730
|
| 52 |
HAN X , ZHANG Z Y , DING N , et al. Pre-trained models: past, present and future. AI Open, 2021, 2, 225- 250.
doi: 10.1016/j.aiopen.2021.08.002
|
| 53 |
VONDRICK C, PIRSIAVASH H, TORRALBA A. Anticipating visual representations from unlabeled video[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2016: 98-106.
|
| 54 |
ZHONG Y, ZHENG W S. Unsupervised learning for forecasting action representations[C]//Proceedings of the 25th IEEE International Conference on Image Processing (ICIP). Washington D.C., USA: IEEE Press, 2018: 1073-1077.
|
| 55 |
WU Y , ZHU L C , WANG X H , et al. Learning to anticipate egocentric actions by imagination. IEEE Transactions on Image Processing, 2021, 30, 1143- 1152.
doi: 10.1109/TIP.2020.3040521
|
| 56 |
|
| 57 |
ROTONDO T, FARINELLA G, TOMASELLI V, et al. Action anticipation from multimodal data[C]//Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. Prague, Czech Republic: Science and Technology Publications, 2019: 154-161.
|
| 58 |
ZATSARYNNA O, ABU FARHA Y, GALL J. Multi-modal temporal convolutional network for anticipating actions in egocentric videos[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Washington D.C., USA: IEEE Press, 2021: 2249-2258.
|
| 59 |
SHEN Y, NI B B, LI Z F, et al. Egocentric activity prediction via event modulated attention[C]//Proceedings of ECCV'18. Berlin, Germany: Springer, 2018: 202-217.
|
| 60 |
MANOUSAKI V, BACHARIDIS K, PAPOUTSAKIS K, et al. VLMAH: visual-linguistic modeling of action history for effective action anticipation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). Washington D.C., USA: IEEE Press, 2023: 1909-1919.
|
| 61 |
ZHONG Z Y, SCHNEIDER D, VOIT M, et al. Anticipative feature fusion Transformer for multi-modal action anticipation[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Washington D.C., USA: IEEE Press, 2023: 6057-6066.
|
| 62 |
KIM M H , JUNG J W , LEE E G , et al. Disentangled adaptive fusion Transformer using adversarial perturbation for egocentric action anticipation. Expert Systems with Applications, 2025, 282, 127648.
|
| 63 |
GHOSH S, AGGARWAL T, HOAI M, et al. Text-derived knowledge helps vision: a simple cross-modal distillation for video-based action anticipation[EB/OL]. [2025-03-11]. https://arxiv.org/abs/2210.05991.
|
| 64 |
WANG S S , ZHANG C , WANG L , et al. Long and short-term collaborative decision-making Transformer for online action detection and anticipation. Pattern Recognition, 2025, 168, 111773.
|
| 65 |
|
| 66 |
NAGARAJAN T, LI Y H, FEICHTENHOFER C, et al. Ego-topo: environment affordances from egocentric video[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2020: 160-169.
|
| 67 |
HUANG Y, YANG X S, XU C S. Multimodal global relation knowledge distillation for egocentric action anticipation[C]//Proceedings of the 29th ACM International Conference on Multimedia. New York, USA: ACM Press, 2021: 245-254.
|
| 68 |
CHANG C Y, HUANG D A, XU D F, et al. Procedure planning in instructional videos[C]//Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer, 2020: 334-350.
|
| 69 |
DAMEN D M, DOUGHTY H, FARINELLA G M, et al. Scaling egocentric vision: the "equation missing" dataset[C]//Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer, 2018: 753-771.
|
| 70 |
GIBSON J J. The theory of affordances [M]//JEN J G, WILLIAM M, CINDI K, et al. The people, place, and space reader. London, UK: Routledge, 2014: 56-60.
|
| 71 |
DO T T, NGUYEN A, REID I. AffordanceNet: an end-to-end deep learning approach for object affordance detection[C]//Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). Washington D.C., USA: IEEE Press, 2018: 5882-5889.
|
| 72 |
MYERS A, TEO C L, FERMVLLER C, et al. Affordance detection of tool parts from geometric features[C]//Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). Washington D.C., USA: IEEE Press, 2015: 1374-1381.
|
| 73 |
NGUYEN A, KANOULAS D, CALDWELL D G, et al. Object-based affordances detection with convolutional neural networks and dense conditional random fields[C]//Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Washington D.C., USA: IEEE Press, 2017: 5908-5915.
|
| 74 |
NAGARAJAN T, FEICHTENHOFER C, GRAUMAN K. Grounded human-object interaction hotspots from video[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2019: 8687-8696.
|
| 75 |
LUO H C , ZHAI W , ZHANG J , et al. Learning visual affordance grounding from demonstration videos. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35 (11): 16857- 16871.
doi: 10.1109/TNNLS.2023.3298638
|
| 76 |
LI G, JAMPANI V, SUN D Q, et al. LOCATE: localize and transfer object parts for weakly supervised affordance grounding[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2023: 10922-10931.
|
| 77 |
ROY D, RAJENDIRAN R, FERNANDO B. Interaction region visual Transformer for egocentric action anticipation[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Washington D.C., USA: IEEE Press, 2024: 6726-6736.
|
| 78 |
LIU S W, TRIPATHI S, MAJUMDAR S, et al. Joint hand motion and interaction hotspots prediction from egocentric videos[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2022: 3272-3282.
|
| 79 |
JIANG J J , NAN Z X , CHEN H , et al. Predicting short-term next-active-object through visual attention and hand position. Neurocomputing, 2021, 433, 212- 222.
|
| 80 |
GUAN J Q, YUAN Y, KITANI K M, et al. Generative hybrid representations for activity forecasting with no-regret learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2020: 170-179.
|
| 81 |
FATHI A, REN X F, REHG J M. Learning to recognize objects in egocentric activities[C]//Proceedings of the CVPR'11. Washington D.C., USA: IEEE Press, 2011: 3281-3288.
|
| 82 |
YIN L, YE Z F, REHG J M. Delving into egocentric actions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2015: 287-295.
|
| 83 |
FATHI A, LI Y, REHG J M. Learning to recognize daily actions using gaze[C]//Proceedings of ECCV'12. Berlin, Germany: Springer, 2012: 314-327.
|
| 84 |
LI Y, LIU M, REHG J M. In the eye of beholder: joint learning of gaze and actions in first person video[C]//Proceedings of ECCV'18. Berlin, Germany: Springer, 2018: 639-655.
|
| 85 |
刘华虓, 于金艳, 宋申苧, 等. 移动互联网信息无障碍研究综述. 吉林大学学报(理学版), 2025, 63 (1): 124- 138.
|
|
LIU H X , YU J Y , SONG S N , et al. A review on information accessibility in mobile internet. Journal of Jilin University (Science Edition), 2025, 63 (1): 124- 138.
|
| 86 |
DAMEN D , DOUGHTY H , FARINELLA G M , et al. Rescaling egocentric vision: collection, pipeline and challenges for EPIC-KITCHENS-100. International Journal of Computer Vision, 2022, 130 (1): 33- 55.
|
| 87 |
SONG Y L , BYRNE E , NAGARAJAN T , et al. Ego4D goal-step: toward hierarchical understanding of procedural activities. Advances in Neural Information Processing Systems, 2023, 36, 38863- 38886.
|
| 88 |
FURNARI A, FARINELLA G. What would you expect? Anticipating egocentric actions with rolling-unrolling LSTMs and modality attention[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2019: 6251-6260.
|
| 89 |
|
| 90 |
ROY D , FERNANDO B . Action anticipation using pairwise human-object interactions and Transformers. IEEE Transactions on Image Processing, 2021, 30, 8116- 8129.
|
| 91 |
QI Z B , WANG S H , SU C , et al. Self-regulated learning for egocentric video activity anticipation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45 (6): 6715- 6730.
|
| 92 |
LIU X , HAO C , YU Z T , et al. From recognition to prediction: leveraging sequence reasoning for action anticipation. ACM Transactions on Multimedia Computing, Communications, and Applications, 2024, 20 (11): 1- 19.
|
| 93 |
THAKUR S, BEYAN C, MORERIO P, et al. Leveraging next-active objects for context-aware anticipation in egocentric videos[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Washington D.C., USA: IEEE Press, 2024: 8642-8651.
|
| 94 |
TONG Z, SONG Y B, WANG J, et al. VideoMAE: masked autoencoders are data-efficient learners for self-supervised video pre-training[EB/OL]. [2025-03-11]. https://arxiv.org/abs/2203.12602.
|