[1] GU J, STEFANI E, WU Q, et al. Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions[C]. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022.
[2] ANDERSON P, WU Q, TENEY D, et al. Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018: 3674-3683.
[3] HE K, HUANG Y, WU Q, et al. Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision[C]. Advances in Neural Information Processing Systems, 2021, 34.
[4] ZHU W, WANG X E, FU T J, et al. Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation[J]. arXiv preprint, 2021.
[5] WEI H, WANG L. Visual Navigation Using Projection of Spatial Right-Angle in Indoor Environment[J]. IEEE Transactions on Image Processing, 2018, 27(7): 3164-3177.
[6] ZHOU G, HONG Y, WU Q. NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models[C]. Proceedings of the AAAI Conference on Artificial Intelligence, 2024: 7641-7649.
[7] LIN B, NIE Y, WEI Z, et al. NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning[J]. arXiv preprint arXiv:2403.07376, 2024.
[8] MA C Y, LU J, WU Z, et al. Self-Monitoring Navigation Agent via Auxiliary Progress Estimation[J]. arXiv preprint arXiv:1901.03035, 2019.
[9] RAWAL N, BIGAZZI R, BARALDI L, et al. AIGeN: An Adversarial Approach for Instruction Generation in VLN[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024: 2070-2080.
[10] ZHU F, LIANG X, ZHU Y, et al. SOON: Scenario Oriented Object Navigation with Graph-based Exploration[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.
[11] THOMASON J, MURRAY M, CAKMAK M, et al. Vision-and-Dialog Navigation[C]. Conference on Robot Learning, 2019: 394-406.
[12] FRIED D, HU R, CIRIK V, et al. Speaker-Follower Models for Vision-and-Language Navigation[C]. Advances in Neural Information Processing Systems, 2018, 31: 3314-3325.
[13] HONG Y, RODRIGUEZ-OPAZO C, QI Y, et al. Language and Visual Entity Relationship Graph for Agent Navigation[C]. Advances in Neural Information Processing Systems, 2020, 33: 7685-7696.
[14] 王少桐, 况立群, 韩慧妍, 熊风光, 薛红新. 基于优势后
见经验回放的强化学习导航方法[J]. 计算机工程, 2024,
50(1): 313-319.
WANG S T, KUANG L Q, HAN H Y, et al. Reinforcement Learning Navigation Method Based on Advantage Hindsight Experience Replay[J]. Computer Engineering, 2024, 50(1): 313-319.
[15] 闫皎洁, 张锲石, 胡希平. 基于强化学习的路径规划技
术综述[J]. 计算机工程, 2021, 47(10): 16-25.
YAN J J, ZHANG Q S, HU X P. Review of Path Planning Technology Based on Reinforcement Learning[J]. Computer Engineering, 2021, 47(10): 16-25.
[16] CHEN H, SUHR A, MISRA D, et al. Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 12538-12547.
[17] CHAPLOT D S, SATHYENDRA K M, PASUMARTHI R K, et al. Gated-Attention Architectures for Task-Oriented Language Grounding[C]. Proceedings of the AAAI Conference on Artificial Intelligence, 2018, 32(1).
[18] DRIESS D, XIA F, SAJJADI M S M, et al. PaLM-E: An Embodied Multimodal Language Model[C]. International Conference on Machine Learning, 2023.
[19] LONG Y, LI X, CAI W, et al. Discuss Before Moving: Visual Language Navigation via Multi-expert Discussions[C]. IEEE International Conference on Robotics and Automation, 2024.
[20] LI X, LIU M, ZHANG H, et al. Vision-Language Foundation Models as Effective Robot Imitators[C]. International Conference on Learning Representations, 2024.
[21] AHN M, BROHAN A, BROWN N, et al. Do As I Can, Not As I Say: Grounding Language in Robotic Affordances[J]. arXiv preprint arXiv:2204.01691, 2022.
[22] RAJVANSHI A, SIKKA K, LIN X, et al. SayNav: Grounding Large Language Models for Dynamic Planning to Navigation in New Environments[C]. Proceedings of the International Conference on Automated Planning and Scheduling, 2024, 34: 464-474.
[23] GUO Y, OH J, SINGH S, et al. Generative Adversarial Self-Imitation Learning[J]. arXiv preprint arXiv:1812.00950, 2018.
[24] KE L, LI X, BISK Y, et al. Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 6741-6749.
[25] OSTROVSKI G, BELLEMARE M G, VAN DEN OORD A, et al. Count-Based Exploration with Neural Density Models[C]. International Conference on Machine Learning, 2017: 2721-2730.
[26] WANG H, WANG W, LIANG W, et al. Structured Scene Memory for Vision-Language Navigation[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[27] LI J, LI D, XIONG C, et al. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation[C]. International Conference on Machine Learning, 2022: 12888-12900.
[28] DABNEY W, OSTROVSKI G, BARRETO A. Temporally-Extended ε-Greedy Exploration[C]. International Conference on Learning Representations, 2021.
[29] HU E J, SHEN Y, WALLIS P, et al. LoRA: Low-Rank Adaptation of Large Language Models[C]. International Conference on Learning Representations, 2022.
[30] TIAN H, MENG J, ZHENG W S, et al. Loc4Plan: Locating Before Planning for Outdoor Vision and Language Navigation[C]. Proceedings of the ACM International Conference on Multimedia, 2024: 4073-4081.
[31] ZHAI X, MUSTAFA B, KOLESNIKOV A, et al. Sigmoid Loss for Language Image Pre-Training[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 11941-11952.