[1] CHENG F, BERTASIUS G. Tallformer: temporal action localization with long-memory transformer[C]//Proceedings of the 17th European Conference on Computer Vision. Cham, Switzerland: Springer Press, 2022: 503-521.
[2] TIRUPATTUR P, DUARTE K, RAWAT Y S, SHAH M. Modeling multi-label action dependencies for temporal action localization[C]//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, TN, USA: IEEE/CVF Press, 2021: 1460-1470.
[3] ZHANG M, HU H Y, LI Z J. Temporal action localization with coarse-to-fine network[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 10(2): 96378-96387.
[4] HE B, YANG X T, KANG L, CHENG Z Y, ZHOU X F, SHRIVASTAVA A. ASM-Loc: action-aware segment modeling for weakly-supervised temporal action localization[C]//Proceedings of 2022 Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, LA, USA: CVPR Press, 2022: 13915-13925.
[5] HU Y F, FU J, CHEN M Y, GAO J Y, DONG J F, FAN B, LIU H M. Learning proposal-aware re-ranking for weakly-supervised temporal action localization[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(1): 207-220.
[6] GUO W B, YANG X M, JIANG Z Y, WU K W, XIE Z. Weakly supervised timing action localization with multi-time scale consistency[J]. Computer Engineering and Applications, 2023, 59(10): 151-161. (in Chinese)
[7] HOU Y H, LI Y Y, GUO Z H. Weakly supervised temporal action localization based on contrastive learning[J]. Journal of Tianjin University (Natural Science and Engineering Technology Edition), 2023, 56(1): 73-80. (in Chinese)
[8] ZHOU J X, WU Y. Temporal feature enhancement dilated convolution network for weakly-supervised temporal action localization[C]//Proceedings of 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Waikoloa, HI, USA: IEEE/CVF Press, 2023: 6028-6037.
[9] HUANG L, WANG L, LI H. Weakly supervised temporal action localization via representative snippet knowledge propagation[C]//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, LA, USA: IEEE/CVF Press, 2022: 3262-3271.
[10] QING Z W, SU H S, GAN W H, WANG D L, WU W, WANG X, QIAO Y, YAN J J, GAO C X, SANG N. Temporal context aggregation network for temporal action proposal refinement[C]//Proceedings of 2021 Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, TN, USA: CVPR Press, 2021: 485-494.
[11] SRIDHAR D, QUADER N, MURALIDHARAN S, LI Y, DAI P, LU J. Class semantics-based attention for action detection[C]//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, QC, Canada: IEEE/CVF Press, 2021: 13719-13728.
[12] ZHENG Z, WANG P, LIU W, LI J, YE R, REN D. Distance-IoU loss: faster and better learning for bounding box regression[C]//Proceedings of the AAAI Conference on Artificial Intelligence. [S. l.]: AAAI Press, 2020: 12993-13000.
[13] HUANG J, KONG M, CHEN L Y, et al. Temporal RPN learning for weakly-supervised temporal action localization[C]//Proceedings of the 15th Asian Conference on Machine Learning. New York, USA: PMLR Press, 2024: 470-485.
[14] LIU Y, ZHU H, REN H, SHI J, WANG D. Fusion detection network with discriminative enhancement for weakly-supervised temporal action localization[J]. Expert Systems with Applications, 2024, 238(2): 122000-122010.
[15] PAUL S, ROY S, ROY-CHOWDHURY A K. W-TALC: weakly-supervised temporal activity localization and classification[C]//Proceedings of the 15th European Conference on Computer Vision. Cham, Switzerland: Springer Press, 2018: 588-607.
[16] ZHANG C, CAO M, YANG D, CHEN J, ZOU Y. Cola: weakly-supervised temporal action localization with snippet contrastive learning[C]//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, TN, USA: IEEE/CVF Press, 2021: 16005-16014.
[17] GAO J, CHEN M, XU C. Fine-grained temporal contrastive learning for weakly-supervised temporal action localization[C]//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, LA, USA: IEEE/CVF Press, 2022: 19967-19977.
[18] LIU P, WANG C, QIN J, LIN G. Feature enhancement and foreground-background separation for weakly supervised temporal action localization[C]//Proceedings of the 5th ACM International Conference on Multimedia. New York, USA: ACM Press, 2024: 1-7.
[19] QU S, CHEN G, LI Z, ZHANG L, LU F, KNOLL A. ACM-Net: action context modeling network for weakly-supervised temporal action localization[EB/OL]. [2021-04-07]. https://arxiv.org/abs/2104.02967.
[20] HONG F, FENG J, XU D, SHAN Y, ZHENG W. Cross-modal consensus network for weakly supervised temporal action localization[C]//Proceedings of the 29th ACM International Conference on Multimedia. New York, USA: ACM Press, 2021: 1591-1599.
[21] JIANG H, TANG H, YAN M, ZHANG J, XU M, HU Y, ZHU J, NIE L. Revisiting unsupervised temporal action localization: the primacy of high-quality actionness and pseudolabels[C]//Proceedings of the 32nd ACM International Conference on Multimedia. New York, USA: ACM Press, 2024: 5643-5652.
[22] WANG C, WANG J, XU W. Double branch synergies with modal reinforcement for weakly supervised temporal action detection[J]. Journal of Visual Communication and Image Representation, 2024, 99: 104090-104097.
[23] LI Z, WANG Z, LIU Q. Weakly supervised temporal action localization with actionness-guided false positive suppression[J]. Neural Networks, 2024, 175: 106307-106318.
[24] YUN W, QI M, WANG C, MA H. Weakly-supervised temporal action localization by inferring salient snippet-feature[C]//Proceedings of the AAAI Conference on Artificial Intelligence. [S. l.]: AAAI Press, 2024: 6908-6916.
[25] YANG W, ZHANG T, YU X, QI T, ZHANG Y, WU F. Uncertainty guided collaborative training for weakly supervised temporal action detection[C]//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, TN, USA: IEEE/CVF Press, 2021: 53-63.
[26] LUO Z, GUILLORY D, SHI B, KE W, WAN F, DARRELL T, XU H. Weakly-supervised action localization with expectation-maximization multi-instance learning[C]//Proceedings of the 16th European Conference on Computer Vision. Cham, Switzerland: Springer Press, 2020: 729-745.
[27] RADFORD A, KIM J W, HALLACY C, RAMESH A, GOH G, AGARWAL S, SASTRY G, ASKELL A, MISHKIN P, CLARK J, et al. Learning transferable visual models from natural language supervision[C]//Proceedings of the International Conference on Machine Learning. New York, USA: PMLR Press, 2021: 8748-8763.
[28] JU C, HAN T, ZHENG K, ZHANG Y, XIE W. Prompting visual-language models for efficient video understanding[C]//Proceedings of the 17th European Conference on Computer Vision. Cham, Switzerland: Springer Press, 2022: 105-124.
[29] LEI J, YU L, BERG T L, BANSAL M. Tvqa+: spatio-temporal grounding for video question answering[EB/OL]. [2019-04-25]. https://arxiv.org/abs/1904.11574.
[30] CARREIRA J, ZISSERMAN A. Quo vadis, action recognition? a new model and the kinetics dataset[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: CVPR Press, 2017: 4724-4733.
[31] FEICHTENHOFER C. X3D: expanding architectures for efficient video recognition[C]//Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: CVPR Press, 2020: 200-210.
[32] CICEK O, ABDULKADIR A, LIENKAMP S S, BROX T, RONNEBERGER O. 3D U-Net: learning dense volumetric segmentation from sparse annotation[C]//Proceedings of the 19th International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham, Switzerland: Springer Press, 2016: 424-432.
[33] LI X, ZHONG Z, WU J, YANG Y, LIN Z, LIU H. Expectation-maximization attention networks for semantic segmentation[C]//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE/CVF Press, 2019: 9166-9175.
[34] ALDOUS D, FILL J. Reversible Markov chains and random walks on graphs[J]. Journal of Theoretical Probability, 1999, 2(1): 91-100.
[35] PENNINGTON J, SOCHER R, MANNING C D. Glove: global vectors for word representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics Press, 2014: 1532-1543.
[36] LIN Z, ZHAO Z, ZHANG Z, WANG Q, LIU H. Weakly-supervised video moment retrieval via semantic completion network[C]//Proceedings of the AAAI Conference on Artificial Intelligence. [S. l.]: AAAI Press, 2020: 11539-11546.
[37] IDREES H, ZAMIR A, JIANG Y, GORBAN A, LAPTEV I, SUKTHANKAR R, SHAH M. The THUMOS challenge on action recognition for videos "in the wild"[J]. Computer Vision and Image Understanding, 2017, 155: 1-23.
[38] CABA HEILBRON F, ESCORCIA V, GHANEM B, et al. ActivityNet: a large-scale video benchmark for human activity understanding[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2015: 961-970.
[39] LIN T, ZHAO X, SU H, WANG C, YANG M. BSN: boundary sensitive network for temporal action proposal generation[C]//Proceedings of the 15th European Conference on Computer Vision. Cham, Switzerland: Springer Press, 2018: 3-21.
[40] LONG F, YAO T, QIU Z, TIAN X, LUO J, MEI T. Gaussian temporal awareness networks for action localization[C]//Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: CVPR Press, 2019: 344-353.
[41] ZHAO P, XIE L, JU C, ZHANG Y, TIAN Q. Bottom-up temporal action localization with mutual regularization[C]//Proceedings of the 16th European Conference on Computer Vision. Cham, Switzerland: Springer Press, 2020: 539-555.
[42] CHEN M, GAO J, YANG S, XU C. Dual-evidential learning for weakly-supervised temporal action localization[C]//Proceedings of the 17th European Conference on Computer Vision. Cham, Switzerland: Springer Press, 2022: 192-208.
[43] TANG X, FAN J, LUO C, ZHANG Z, ZHANG M, YANG Z. DDG-Net: discriminability-driven graph network for weakly-supervised temporal action localization[C]//Proceedings of the European Conference on Computer Vision. Cham, Switzerland: Springer Press, 2023: 6599-6609.
[44] REN H, YANG W, ZHANG T, ZHANG Y. Proposal-based multiple instance learning for weakly-supervised temporal action localization[C]//Proceedings of the 2023 IEEE Conference on Computer Vision and Pattern Recognition. Vancouver, BC, Canada: CVPR Press, 2023: 2394-2404.
[45] ZHANG S C, ZHAO C H. Cross-video contextual knowledge exploration and exploitation for ambiguity reduction in weakly supervised temporal action localization[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 34(6): 4568-4580.
[46] HUANG L, WANG L, LI H. Foreground-action consistency network for weakly supervised temporal action localization[C]//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, QC, Canada: IEEE/CVF Press, 2021: 7982-7991.
[47] LI J, YANG T, JI W, WANG J, CHENG L. Exploring denoised cross-video contrast for weakly-supervised temporal action localization[C]//Proceedings of the 2022 IEEE Conference on Computer Vision and Pattern Recognition. New Orleans, LA, USA: CVPR Press, 2022: 19882-19892.
[48] CAO Y X. Research on incremental positioning method of weakly supervised sequential motion[D]. Shaanxi: Xi'an University of Technology, 2024. (in Chinese)
[49] ZHAO Y, ZHANG H, GAO Z, GUAN W, WANG M, CHEN S. A snippets relation and hard-snippets mask network for weakly supervised temporal action localization[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(8): 7202-7215.
[50] CHO K, MERRIENBOER B V, GULCEHRE C, BAHDANAU D, BOUGARES F, SCHWENK H, BENGIO Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics Press, 2014: 1724-1734.
[51] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.