Stable Baselines3

Basic usage

  1. Create the environment: gym.make() / make_vec_env()
  2. Define and instantiate the model: model = PPO("MlpPolicy", env)
  3. Train the agent for a chosen number of timesteps: model.learn(total_timesteps=int(n))
  4. Wrap the environment in a Monitor for evaluation: eval_env = Monitor(env)
  5. Evaluate the policy: evaluate_policy(model, eval_env, n_eval_episodes, deterministic) (see the end-to-end sketch after this list)
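
A minimal end-to-end sketch of these five steps, assuming CartPole-v1 and PPO as placeholder choices of environment and algorithm:

    import gymnasium as gym
    from stable_baselines3 import PPO
    from stable_baselines3.common.monitor import Monitor
    from stable_baselines3.common.evaluation import evaluate_policy

    # 1. Create the training environment
    env = gym.make("CartPole-v1")

    # 2. Instantiate the model with an MLP policy
    model = PPO("MlpPolicy", env, verbose=1)

    # 3. Train for a fixed number of timesteps
    model.learn(total_timesteps=int(1e4))

    # 4. Wrap a fresh environment in a Monitor for evaluation
    eval_env = Monitor(gym.make("CartPole-v1"))

    # 5. Evaluate the trained policy
    mean_reward, std_reward = evaluate_policy(
        model, eval_env, n_eval_episodes=10, deterministic=True
    )
    print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")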

Functions

Training

  • make_vec_env("EnvName", n_envs=n): Create a vectorized environment of n environments, i.e. stack multiple independent copies of the environment into a single vectorized environment.
  • model.save("filename"): Save the model to the given file (see the sketch after this list).
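
A short sketch of vectorized training and saving, assuming CartPole-v1 and 4 parallel environments as placeholder choices:

    from stable_baselines3 import PPO
    from stable_baselines3.common.env_util import make_vec_env

    # Stack 4 independent copies of the environment into one vectorized env
    vec_env = make_vec_env("CartPole-v1", n_envs=4)

    model = PPO("MlpPolicy", vec_env, verbose=1)
    model.learn(total_timesteps=int(2e4))

    # Save the trained model to ppo_cartpole.zip
    model.save("ppo_cartpole")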

Evaluation

  • evaluate_policy(model, env, n_eval_episodes, deterministic): Run the policy for n_eval_episodes and return the average return per episode.
    • deterministic: Whether to use deterministic or stochastic actions
  • Monitor: A wrapper used to record the episode reward, length, time and other data (see the sketch after this list)
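
A sketch of the evaluation helpers, assuming a saved "ppo_cartpole" model and CartPole-v1 as placeholders; the Monitor wrapper records episode reward, length and time so that evaluate_policy can report them reliably:

    import gymnasium as gym
    from stable_baselines3 import PPO
    from stable_baselines3.common.monitor import Monitor
    from stable_baselines3.common.evaluation import evaluate_policy

    model = PPO.load("ppo_cartpole")
    eval_env = Monitor(gym.make("CartPole-v1"))

    # Average return over n_eval_episodes, using deterministic actions
    mean_reward, std_reward = evaluate_policy(
        model, eval_env, n_eval_episodes=10, deterministic=True
    )

    # Per-episode returns and lengths instead of the aggregated mean
    episode_rewards, episode_lengths = evaluate_policy(
        model, eval_env, n_eval_episodes=10, return_episode_rewards=True
    )
    print(mean_reward, std_reward, episode_rewards, episode_lengths)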

Load model

  • PPO.load(checkpoint, custom_objects=custom_objects, print_system_info=True): Load a model from the Hugging Face Hub using the load method of the algorithm class (see the sketch after this list)
    • checkpoint: path returned by huggingface_sb3.load_from_hub()
    • custom_objects: Dictionary of objects to replace upon loading, such as the learning rate
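
A sketch of loading a checkpoint from the Hugging Face Hub, assuming the repo sb3/ppo-CartPole-v1 and filename ppo-CartPole-v1.zip as example identifiers; custom_objects replaces stored attributes (e.g. schedules) that may not unpickle cleanly across versions:

    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import PPO

    # Download the checkpoint file from the Hub (example repo/filename)
    checkpoint = load_from_hub(
        repo_id="sb3/ppo-CartPole-v1",
        filename="ppo-CartPole-v1.zip",
    )

    # Objects to replace upon loading, e.g. learning rate and clip schedules
    custom_objects = {
        "learning_rate": 0.0,
        "lr_schedule": lambda _: 0.0,
        "clip_range": lambda _: 0.0,
    }

    model = PPO.load(checkpoint, custom_objects=custom_objects, print_system_info=True)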