
Separated Trust Regions Policy Optimization Method
Published on Feb 4, 2025
In this work, we propose a moderate policy update method for reinforcement learning, which encourages the agent to explore more boldly in early episodes but updates the policy more cautiously. Based on