1 |
SUTTON R S, BARTO A G. Reinforcement learning: an introduction. Cambridge, USA: MIT Press, 2018.
|
2 |
MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529-533.
doi: 10.1038/nature14236
|
3 |
KOBER J, BAGNELL J A, PETERS J. Reinforcement learning in robotics: a survey. The International Journal of Robotics Research, 2013, 32(11): 1238-1274.
doi: 10.1177/0278364913495721
|
4 |
SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016, 529(7587): 484-489.
doi: 10.1038/nature16961
|
5 |
|
6 |
|
7 |
NAGABANDI A, KAHN G, FEARING R S, et al. Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning[C]//Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). Washington D. C., USA: IEEE Press, 2018: 7559-7566.
|
8 |
LORIS R, JEYHOON M, PAOLO F, et al. Model-based reinforcement learning variable impedance control for human-robot collaboration. Journal of Intelligent & Robotic Systems, 2020, 100: 417-433.
|
9 |
SONTAKKE S, MEHRJOU A, ITTI L, et al. Causal curiosity: RL agents discovering self-supervised experiments for causal representation learning[C]//Proceedings of International Conference on Machine Learning. [S. l. ]: IMLS, 2021: 9848-9858.
|
10 |
WANG Z H, XIAO X S, ZHU Y K, et al. Task-independent causal state abstraction[C]//Proceedings of the 35th International Conference on Neural Information Processing Systems, Robot Learning. Washington D. C., USA: IEEE Press, 2021: 1-10.
|
11 |
TOMAR M, ZHANG A, CALANDRA R, et al. Model-invariant state abstractions for model-based reinforcement learning[EB/OL]. [2023-10-12]. https://arxiv.org/pdf/2102.09850.
|
12 |
ZHANG A, MCALLISTER R, CALANDRA R, et al. Learning invariant representations for reinforcement learning without reconstruction[EB/OL]. [2023-10-12]. https://arxiv.org/pdf/2006.10742.
|
13 |
DING W H, LIN H H, LI B, et al. Generalizing goal-conditioned reinforcement learning with variational causal reasoning[EB/OL]. [2023-10-12]. https://arxiv.org/pdf/2207.09081v6.
|
14 |
|
15 |
|
16 |
WANG Z Y, SCHAUL T, HESSEL M, et al. Dueling network architectures for deep reinforcement learning[C]//Proceedings of International Conference on Machine Learning. [S. l. ]: IMLS, 2016: 1995-2003.
|
17 |
|
18 |
SCHULMAN J, LEVINE S, ABBEEL P, et al. Trust region policy optimization[C]//Proceedings of International Conference on Machine Learning. [S. l. ]: IMLS, 2015: 1889-1897.
|
19 |
MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]//Proceedings of International Conference on Machine Learning. [S. l. ]: IMLS, 2016: 1928-1937.
|
20 |
HAARNOJA T, ZHOU A, ABBEEL P, et al. Soft Actor-Critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor[C]//Proceedings of International Conference on Machine Learning. [S. l. ]: IMLS, 2018: 1861-1870.
|
21 |
DEPEWEG S, HERNÁNDEZ-LOBATO J M, DOSHI-VELEZ F, et al. Learning and policy search in stochastic dynamical systems with Bayesian neural networks[EB/OL]. [2023-10-12]. https://arxiv.org/pdf/1605.07127.
|
22 |
DEISENROTH M, RASMUSSEN C E. PILCO: a model-based and data-efficient approach to policy search[C]//Proceedings of International Conference on Machine Learning. [S. l. ]: IMLS, 2011: 465-472.
|
23 |
SPIRTES P, GLYMOUR C. An algorithm for fast recovery of sparse causal graphs. Social Science Computer Review, 1991, 9(1): 62-72.
doi: 10.1177/089443939100900106
|
24 |
SPIRTES P, GLYMOUR C, SCHEINES R. Causation, prediction, and search. Cambridge, USA: MIT Press, 2001.
|
25 |
CHICKERING D M. Optimal structure identification with greedy search. Journal of Machine Learning Research, 2002, 3(Nov): 507-554.
|
26 |
CAI R C, QIAO J, ZHANG Z, et al. SELF: structural equational likelihood framework for causal discovery[C]//Proceedings of the AAAI Conference on Artificial Intelligence. [S. l. ]: AAAI Press, 2018: 1-8.
|
27 |
SPIRTES P, GLYMOUR C, SCHEINES R. Constructing Bayesian network models of gene expression networks from microarray data[C]//Proceedings of the Atlantic Symposium on Computational Biology. Atlantic, USA: [s. n. ], 2000: 255-259.
|
28 |
MALINSKY D, SPIRTES P. Causal structure learning from multivariate time series in settings with unmeasured confounding[C]//Proceedings of the 2018 ACM SIGKDD Workshop on Causal Discovery. New York, USA: ACM Press, 2018: 23-47.
|
29 |
GERHARDUS A, RUNGE J. High-recall causal discovery for autocorrelated time series with latent confounders. Advances in Neural Information Processing Systems, 2020, 33: 615-625.
|
30 |
CAI R C, WU S Y, QIAO J, et al. THP: topological Hawkes processes for learning Granger causality on event sequences[EB/OL]. [2023-10-12]. https://arxiv.org/pdf/2105.10884.
|
31 |
HUANG B, ZHANG K, ZHANG J, et al. Causal discovery from heterogeneous/nonstationary data. Journal of Machine Learning Research, 2020, 21(89): 1-53.
|
32 |
|
33 |
PEARL J. Causality: models, reasoning and inference. Cambridge, UK: Cambridge University Press, 2000.
|