
Computer Engineering ›› 2024, Vol. 50 ›› Issue (10): 418-428. doi: 10.19678/j.issn.1000-3428.0068701

• Development Research and Engineering Application •

Optimal Decision-making Algorithm for Electric Vehicle Aggregator Based on Hybrid Action Reinforcement Learning

KONG Yueping*, YANG Shihai, DUAN Meimei, DING Zecheng, FANG Kaijie

  1. Marketing Service Center of State Grid Jiangsu Electric Power Co., Ltd., Nanjing 210019, Jiangsu, China
  • Received:2023-10-29 Online:2024-10-15 Published:2024-01-25
  • Contact: KONG Yueping
  • Supported by:
    Science and Technology Project of State Grid Jiangsu Electric Power Co., Ltd. (J2022127)

Abstract:

Under the centralized management of an aggregator, electric vehicles (EVs) can form a large-scale pool of flexible, adjustable resources that arbitrages in the energy market and provides ancillary services to the grid. To this end, this study proposes a decision optimization algorithm for EV aggregators based on hybrid action reinforcement learning. The algorithm uses continuous actions to optimize market bidding decisions and discrete actions to control dynamic switching between different power disaggregation strategies, thereby jointly optimizing market bidding and power disaggregation. In addition, the study proposes an EV aggregate flexibility modeling method that accounts for the value of unit flexibility, maximizing the total daily flexibility value while ensuring that the charging demand of each vehicle is met. Simulation results show that dynamic strategy switching makes full use of the respective strengths of the priority-based and proportional disaggregation strategies in slowing battery degradation and maintaining the bidirectional regulation flexibility of the batteries. Compared with algorithms that optimize only the bidding decision, the proposed algorithm further improves the operating economy of EV charging stations.

Key words: reinforcement learning, hybrid action output, electric vehicle (EV) aggregator, power disaggregation, market bidding
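The hybrid action described in the abstract pairs a continuous bidding decision with a discrete choice of power disaggregation strategy. The sketch below is only an illustration of that idea, not the authors' implementation: the function names, the charging-only simplification, the per-vehicle `headroom_kw` limits, and the `priority` scores are all assumptions introduced here, and the abstract does not specify how either strategy is actually defined.

```python
from typing import List, Tuple


def proportional_split(total_kw: float, headroom_kw: List[float]) -> List[float]:
    """Share an aggregate charging command in proportion to each EV's headroom."""
    total_headroom = sum(headroom_kw)
    if total_headroom <= 0:
        return [0.0] * len(headroom_kw)
    # Never allocate more than a vehicle's headroom, so cap the scale at 1.
    scale = min(1.0, total_kw / total_headroom)
    return [h * scale for h in headroom_kw]


def priority_split(
    total_kw: float, headroom_kw: List[float], priority: List[float]
) -> List[float]:
    """Fill EVs one at a time, highest (hypothetical) priority score first."""
    alloc = [0.0] * len(headroom_kw)
    remaining = total_kw
    order = sorted(range(len(headroom_kw)), key=lambda i: priority[i], reverse=True)
    for i in order:
        alloc[i] = min(headroom_kw[i], remaining)
        remaining -= alloc[i]
        if remaining <= 0:
            break
    return alloc


def disaggregate(
    hybrid_action: Tuple[float, int],
    headroom_kw: List[float],
    priority: List[float],
) -> List[float]:
    """Apply a hybrid action: (continuous aggregate power in kW, discrete strategy index)."""
    aggregate_kw, strategy = hybrid_action
    if strategy == 0:  # assumed encoding: 0 selects the priority strategy
        return priority_split(aggregate_kw, headroom_kw, priority)
    return proportional_split(aggregate_kw, headroom_kw)  # any other index: proportional
```

For example, with headroom `[2.0, 4.0, 6.0]` kW and an aggregate command of 6 kW, the proportional strategy spreads the power across all three vehicles, while the priority strategy concentrates it on the vehicles with the highest scores. A learning agent would choose the strategy index at each step alongside the continuous bid.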