Quadruped Robot Locomotion Using DDPG Agent

This example shows how to train a quadruped robot to walk using a deep deterministic policy gradient (DDPG) agent. The robot in this example is modeled using Simscape™ Multibody™. For more information on DDPG agents, see Deep Deterministic Policy Gradient (DDPG) Agent.

The example code may involve computation of random numbers at various stages, such as initialization of the agent, creation of the actor and critic, resetting the environment during simulations, generating observations (for stochastic environments), generating exploration actions, and sampling mini-batches of experiences for learning. Fixing the random number stream preserves the sequence of random numbers every time you run the code and improves reproducibility of results. You will fix the random number stream at various locations in the example.

Fix the random number stream with seed 0 and the Mersenne Twister random number algorithm. For more information on random number generation, see rng.
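A minimal sketch of this command, using the seed and generator described above:

% Fix the random number stream with seed 0 and the Mersenne Twister algorithm.
% The output stores the previous generator state so it can be restored later.
previousRngState = rng(0,"twister");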
The output previousRngState is a structure that contains information about the previous state of the stream. You will restore this state at the end of the example.

Load the necessary parameters into the base workspace in MATLAB® using the script initializeRobotParameters provided with the example.

The environment for this example is a quadruped robot, and the training goal is to make the robot walk in a straight line using minimal control effort. Open the model rlQuadrupedRobot.

The robot is modeled using Simscape™ Multibody™, with its main structural components consisting of four legs and a torso. The legs are connected to the torso through revolute joints that enable rotation of the legs with respect to the torso. The joints are actuated by torque signals provided by the RL Agent block.

The robot environment provides 44 observations to the agent, each normalized between –1 and 1. These observations are:

Y (vertical) and Z (lateral) position of the torso center of mass
Quaternion representing the orientation of the torso
X (forward), Y (vertical), and Z (lateral) velocities of the torso at the center of mass
Roll, pitch, and yaw rates of the torso
Angular positions and velocities of the hip and knee joints for each leg
Normal and friction forces due to ground contact for each leg
Action values (torque for each joint) from the previous time step

For all four legs, the initial values for the hip and knee joint angles are set to –0.8234 and 1.6468 radians, respectively. The neutral positions of the joints are at 0 radians. The legs are in the neutral position when they are stretched to their maximum and aligned perpendicular to the ground.

The agent generates eight actions normalized between –1 and 1. After multiplying by a scaling factor, these correspond to the eight joint torque signals for the revolute joints. The overall joint torque bounds are +/– 10 N·m for each joint.

The following reward is provided to the agent at each time step during training. This reward function encourages the agent to move forward by providing a positive reward for positive forward velocity. It also encourages the agent to avoid early termination by providing a constant reward (25 T_s/T_f) at each time step. The remaining terms in the reward function are penalties that discourage unwanted states, such as large deviations from the desired height and orientation or the use of excessive joint torques.

r_t = v_x + 25 \frac{T_s}{T_f} - 50 \hat{y}^2 - 20 \theta^2 - 0.02 \sum_i \left(u^i_{t-1}\right)^2

where:

v_x is the velocity of the torso center of mass in the x-direction.
T_s and T_f are the sample time and final simulation time of the environment, respectively.
ŷ is the scaled height error of the torso center of mass from the desired height of 0.75 m.
θ is the pitch angle of the torso.
u^i_{t-1} is the action value for joint i from the previous time step.

During training or simulation, the episode terminates if any of the following situations occur:

The height of the torso center of mass from the ground is below 0.5 m (fallen).
The head or tail of the torso is below the ground.
Any knee joint is below the ground.
Roll, pitch, or yaw angles are outside bounds (+/– 0.1745, +/– 0.1745, and +/– 0.3491 radians, respectively).

Specify the parameters for the observation set and the action set, then create the environment using the reinforcement learning model. During training, the function quadrupedResetFcn provided with the example introduces random deviations into the initial joint angles and angular velocities.
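A minimal sketch of these setup steps is shown below. The model name, agent block path, and specification names come from this example's files, and quadrupedResetFcn is the reset function provided with the example:

% Load the robot parameters and open the model.
initializeRobotParameters
mdl = "rlQuadrupedRobot";
open_system(mdl)

% Create specifications for the 44 observations and 8 actions, both
% normalized between -1 and 1.
obsInfo = rlNumericSpec([44 1]);
obsInfo.Name = "observations";
actInfo = rlNumericSpec([8 1],LowerLimit=-1,UpperLimit=1);
actInfo.Name = "torque";

% Create the Simulink environment interface and assign the reset function
% that randomizes the initial conditions at the start of each episode.
blk = mdl + "/RL Agent";
env = rlSimulinkEnv(mdl,blk,obsInfo,actInfo);
env.ResetFcn = @quadrupedResetFcn;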
DDPG agents use a parametrized Q-value function approximator as a critic to estimate the value of the policy. They also use a parametrized deterministic policy over continuous action spaces, which is learned by a continuous deterministic actor. For more information on creating a deep neural network value function, see Create Policies and Value Functions. For an example that creates neural networks for DDPG agents, see Compare DDPG Agent to LQR Controller.

First specify the following options for training the agent. For more information, see rlDDPGAgentOptions.

Specify a capacity of 1e6 for the experience buffer to store a diverse set of experiences.
Specify the MaxMiniBatchPerEpoch value to perform a maximum of 200 gradient step updates for each learning iteration.
Specify a mini-batch size of 256 for learning. Smaller mini-batches are computationally efficient but may introduce variance in training, while larger batches can make training more stable but require more memory.
Specify a learning rate of 1e-3 for the agent's actor and critic optimizer algorithms. A large learning rate causes drastic updates that may lead to divergent behavior, while a low value may require many updates before reaching the optimum.
Specify a gradient threshold value of 1.0 to clip the computed gradients and enhance the stability of the learning algorithm.
Specify a standard deviation of 0.1 and a mean attraction coefficient of 1.0 for the exploration noise to improve exploration of the agent's action space.
Specify a sample time Ts = 0.025 s.

When you create the agent, the initial parameters of the actor and critic networks are initialized with random values. Fix the random number stream so that the agent is always initialized with the same parameter values. Then create the rlDDPGAgent object for the agent with 256 hidden units per layer.
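Based on the options listed above, the agent construction might be sketched as follows. Ts is assumed to be defined in the base workspace by initializeRobotParameters, and obsInfo and actInfo are the specifications created earlier:

% Specify the DDPG agent options described above.
agentOptions = rlDDPGAgentOptions( ...
    SampleTime=Ts, ...
    ExperienceBufferLength=1e6, ...
    MiniBatchSize=256, ...
    MaxMiniBatchPerEpoch=200);

% Optimizer options for the actor and critic.
agentOptions.ActorOptimizerOptions  = rlOptimizerOptions(LearnRate=1e-3,GradientThreshold=1);
agentOptions.CriticOptimizerOptions = rlOptimizerOptions(LearnRate=1e-3,GradientThreshold=1);

% Exploration (Ornstein-Uhlenbeck) noise options.
agentOptions.NoiseOptions.StandardDeviation = 0.1;
agentOptions.NoiseOptions.MeanAttractionConstant = 1.0;

% Fix the random stream so the agent networks always start from the same
% initial values, then create the agent with 256 hidden units per layer
% in its default actor and critic networks.
rng(0,"twister");
initOptions = rlAgentInitializationOptions(NumHiddenUnit=256);
agent = rlDDPGAgent(obsInfo,actInfo,initOptions,agentOptions);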
To train the agent, first specify the following training options:

Run the training for at most 10,000 episodes, with each episode lasting at most maxSteps time steps.
Display the training progress in the Reinforcement Learning Training Monitor dialog box (set the Plots option).
Stop training when the greedy policy evaluation exceeds 300.

To train the agent in parallel, also specify the following options. Training in parallel requires Parallel Computing Toolbox™ software. If you do not have Parallel Computing Toolbox software installed, set UseParallel to false.

Set the UseParallel option to true.
Train the agent using asynchronous parallel workers.

For more information, see rlTrainingOptions.

In parallel training, workers simulate the agent's policy with the environment and store experiences in the replay memory. When workers operate asynchronously, the order of stored experiences may not be deterministic, which can make the training results differ between runs. To maximize the likelihood of reproducibility:

Initialize the parallel pool with the same number of parallel workers every time you run the code. For information on specifying the pool size, see Discover Clusters and Use Cluster Profiles (Parallel Computing Toolbox).
Use synchronous parallel training by setting trainOpts.ParallelizationOptions.Mode to "sync".
Assign a random seed to each parallel worker using trainOpts.ParallelizationOptions.WorkerRandomSeeds. The default value of -1 assigns a unique random seed to each parallel worker.

Fix the random stream for reproducibility, then train the agent using the train function. During training, evaluate the greedy policy performance by taking the mean of the cumulative reward over 5 simulations every 25 training episodes. Due to the complexity of the robot model, this process is computationally intensive and takes several hours to complete. To save time while running this example, load a pretrained agent by setting doTraining to false. To train the agent yourself, set doTraining to true.

To validate the performance of the trained agent, fix the random stream for reproducibility and simulate the agent within the robot environment. For more information on agent simulation, see rlSimulationOptions and sim.

Restore the random number stream using the information stored in previousRngState.
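Taken together, the training, simulation, and restore steps described above might be sketched as follows. This is a sketch rather than the exact example code: maxSteps is assumed to be defined by initializeRobotParameters, the pretrained parameter file name comes from this example's files, and the rlEvaluator object and the Evaluator argument of train require a recent Reinforcement Learning Toolbox release:

% Specify the training options described above, including parallel training.
% Set UseParallel to false if Parallel Computing Toolbox is not installed.
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=10000, ...
    MaxStepsPerEpisode=maxSteps, ...
    Plots="training-progress", ...
    StopTrainingCriteria="EvaluationStatistic", ...
    StopTrainingValue=300, ...
    UseParallel=true);
trainOpts.ParallelizationOptions.Mode = "async";

% Evaluate the greedy policy over 5 simulations every 25 training episodes.
evaluator = rlEvaluator(NumEpisodes=5,EvaluationFrequency=25);

% Train the agent, or load pretrained parameters to save time.
doTraining = false;
if doTraining
    rng(0,"twister");
    trainResult = train(agent,env,trainOpts,Evaluator=evaluator);
else
    % Load pretrained agent parameters provided with the example.
    load("rlQuadrupedAgentParams.mat","params")
    agent = setLearnableParameters(agent,params);
end

% Simulate the trained agent, then restore the previously saved
% random number generator state.
rng(0,"twister");
simOptions = rlSimulationOptions(MaxSteps=maxSteps);
experience = sim(env,agent,simOptions);
rng(previousRngState);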
