# Soft-Actor-Critic-and-Extensions

PyTorch implementation of **Soft-Actor-Critic** with the extensions **PER** + **ERE** + **Munchausen RL** and the option of multiple environments for parallel data collection and faster training.

_____________

This repository includes the newest Soft-Actor-Critic version ([Paper 2019](https://arxiv.org/abs/1812.05905)) as well as extensions for SAC:

- **P**rioritized **E**xperience **R**eplay ([PER](https://arxiv.org/abs/1511.05952))
- **E**mphasizing **R**ecent **E**xperience without Forgetting the Past ([ERE](https://arxiv.org/abs/1906.04009))
- Munchausen Reinforcement Learning [Paper](https://arxiv.org/abs/2007.14430)
- D2RL: DEEP DENSE ARCHITECTURES IN REINFORCEMENT LEARNING [Paper](https://arxiv.org/pdf/2010.09163.pdf)
- N-step Bootstrapping
- Parallel Environments

In the paper's implementation of ERE the authors used an older version of SAC, whereas this repository contains the newest version of SAC as well as a Proportional Prioritization implementation of PER.

#### TODO:
- add IQN Critic [X] (with the IQN critic it's 10x slower... need to fix that)
- adding D2RL IQN Critic [ ]
- create distributed SAC version with ray [ ]
- added N-step bootstrapping [X]
- Check performance with all add-ons [ ]
- added pybulletgym

#### Dependencies
Trained and tested on:

- Python 3.6
- PyTorch 1.7.0
- Numpy 1.15.2
- gym 0.10.11
- pybulletgym
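For readers unfamiliar with the proportional prioritization variant of PER mentioned above, here is a minimal sketch in plain Python. It is not the code from this repository: the function names are hypothetical, and the values `alpha=0.6` and `beta=0.4` are only assumed defaults taken from the PER paper. Each transition `i` with priority `p_i` is sampled with probability `p_i^alpha / sum_k p_k^alpha`, and the resulting bias is corrected with importance-sampling weights `(N * P(i))^(-beta)`.

```python
import random


def per_probabilities(priorities, alpha=0.6):
    """Proportional prioritization: P(i) = p_i^alpha / sum_k p_k^alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs, beta=0.4):
    """Importance-sampling weights (N * P(i))^(-beta), scaled so the largest is 1."""
    n = len(probs)
    weights = [(n * p) ** (-beta) for p in probs]
    max_w = max(weights)
    return [w / max_w for w in weights]


def sample_indices(probs, batch_size, rng=random):
    """Draw transition indices (with replacement) according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


# Toy buffer with four transitions; a priority is usually |TD error| + epsilon.
priorities = [0.1, 0.5, 2.0, 1.0]
probs = per_probabilities(priorities)
weights = importance_weights(probs)
batch = sample_indices(probs, batch_size=3, rng=random.Random(0))
```

Transitions with a large priority are drawn more often but receive a smaller weight in the loss, which is what keeps the prioritized updates from drifting too far from the uniform-sampling objective.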

## How to use:
The new script combines all extensions; each add-on is enabled by setting the corresponding flag.

`python run.py -info sac`

**Parameters:**

To see the options: `python run.py -h`

- `-env`: Environment name, default = Pendulum-v0
- `-per`: Adds Prioritized Experience Replay to the agent if set to 1, default = 0
- `-munchausen`: Adds Munchausen RL to the agent if set to 1, default = 0
- `-dist`, `--distributional`: Uses a distributional IQN critic network if set to 1, default = 0
- `-d2rl`: Uses deep actor and deep critic networks if set to 1, default = 0
- `-n_step`: Number of steps used for n-step bootstrapping, default = 1
- `-ere`: Adds Emphasizing Recent Experience to the agent if set to 1, default = 0
- `-info`: Information or name of the run
- `-frames`: Number of training interactions with the environment, default = 100000
- `-eval_every`: Number of interactions after which the evaluation runs are performed, default = 5000
- `-eval_runs`: Number of evaluation runs performed, default = 1
- `-seed`: Seed for the env and torch network weights, default = 0
- `-lr_a`: Actor learning rate, default = 3e-4
- `-lr_c`: Critic learning rate, default = 3e-4
- `-a`, `--alpha`: Entropy alpha value; if not chosen, the value is learned by the agent
- `-layer_size`: Number of nodes per neural network layer, default = 256
- `-repm`, `--replay_memory`: Size of the replay memory, default = 1e6
- `-bs`, `--batch_size`: Batch size, default = 256
- `-t`, `--tau`: Soft update factor tau, default = 0.005
- `-g`, `--gamma`: Discount factor gamma, default = 0.99
- `--saved_model`: Load a saved model to perform a test run
- `-w`, `--worker`: Number of parallel workers (attention: the batch size increases proportionally to the number of workers), default = 1
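To make the roles of `-n_step`, `-g` and `-t` more concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the code from `run.py`: the function names are hypothetical, parameters are plain lists of floats instead of network tensors, and `bootstrap_value` stands for whatever target value estimate the agent uses after the n-th step. Episode termination inside the n-step window is not handled.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """n-step target: sum_{k<n} gamma^k * r_k + gamma^n * bootstrap_value.

    n is len(rewards); with a single reward this is the usual 1-step target.
    """
    ret = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret


def soft_update(target_params, online_params, tau=0.005):
    """Soft (Polyak) update: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t
            for t, o in zip(target_params, online_params)]


# Three rewards of 1.0 followed by a bootstrap estimate of 10.0.
target = n_step_return([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5)

# One soft update step moving the target parameters halfway to the online ones.
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.5)
```

A larger `-n_step` propagates reward information faster at the cost of more off-policy bias, while a small `-t` makes the target network change slowly and keeps the critic targets stable.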

## Old scripts
With the old scripts you can still run three different SAC versions.

*Run regular SAC:* `python SAC.py -env Pendulum-v0 -ep 200 -info sac`

*Run SAC + PER:* `python SAC_PER.py -env Pendulum-v0 -ep 200 -info sac_per`

*Run SAC + ERE + PER:* `python SAC_ERE_PER.py -env Pendulum-v0 -frames 20000 -info sac_per_ere`

For further input arguments and hyperparameters, check the code.

### Observe training results
`tensorboard --logdir=runs`

## Results
The extensions do not always improve the algorithm. Whether they help depends on the environment, as the authors of the ERE paper also mention.

![Pendulum](imgs/SAC_PENDULUM.jpg)
![LLC](imgs/SAC_LLC.jpg)

- All runs without hyperparameter tuning

## PyBullet Environments
![HalfCheetah](/imgs/HalfCheetahBulletEnv-v0.png)
![HalfCheetah](/imgs/HalfCheetahBulletEnv-v0-D2RL.png)
![Hopper](/imgs/HopperBulletEnv-v0.png)

## Comparison SAC and D2RL-SAC
![D2RL-Pendulum](imgs/Base_D2RL_SAC.png)

## Comparison SAC and M-SAC
![munchausenRL](imgs/SAC_MSAC_Pendulum_.png)
![munchausenRL2](imgs/SAC_MSAC_LL.png)

## Help and issues:
I'm open to feedback, bug reports, improvements, or anything else. Just leave me a message or contact me.

## Author
- Sebastian Dittert

**Feel free to use this code for your own projects or research.**

```
@misc{SAC,
  author = {Dittert, Sebastian},
  title = {PyTorch Implementation of Soft-Actor-Critic-and-Extensions},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/BY571/Soft-Actor-Critic-and-Extensions}},
}
```
