Learn Reinforcement Learning by playing Doom

Nov. 21, 2024 · 5 min read


Introduction

At the PyConES 24 held in Vigo, Nagarro presented a talk about ViZDoom, a platform that combines the classic video game Doom with machine learning and artificial intelligence.

In this article, you will see how ViZDoom allows developers and researchers to train intelligent agents in complex and realistic environments, using reinforcement learning techniques and other AI methodologies. The aim is to raise awareness of the application and encourage the reader to experiment with it.

What is ViZDoom?

ViZDoom is a simulation platform that uses the Doom video game as an environment for developing and training artificial intelligence agents. Created as a Reinforcement Learning environment, ViZDoom offers an interactive and visually complex space where AI algorithms can learn to make strategic decisions, navigate three-dimensional environments, and compete against other agents using only visual information, i.e. the frames from the screen buffer, without access to the values of the game variables.

Thanks to its integration with Python and libraries such as TensorFlow and PyTorch, ViZDoom is an accessible tool for researchers and developers alike. It combines the best of video games and AI, making it an ideal platform for experimenting and advancing the field of applied artificial intelligence.

ViZDoom is based on ZDoom, the most popular and modern source-port of Doom. This means that it has support for a huge range of tools and resources that can be used to create custom scenarios, availability of detailed documentation of the engine and tools, and the support of the Doom community.

Main features

Advantages

  • Easy setup and customisation: Provides a fast and accessible environment for experimentation without the need to develop a graphics engine from scratch. It also allows customisation of scenarios.
  • Intuitive interface: Includes APIs for popular languages such as Python and C++, making it easy to integrate with deep learning libraries such as TensorFlow or PyTorch.
  • Active community: It has a user base that shares scenarios, scripts and results, enriching the ecosystem. Competitions have also been organised worldwide.
  • Open Source and Farama Foundation: In 2022 ViZDoom joined the Farama Foundation, which ensures the project meets minimum quality and maintenance standards.
  • Multi-platform: Compatible with Linux, macOS and Windows.
  • Fast and light: The install is only a few MB in size, and no powerful hardware is needed to run it.

Disadvantages

  • High computational cost for the training phase: If you don't have the right hardware, training complex models can be time-consuming, especially for tasks involving deep neural networks.
  • Stronger modern competition: Tools such as Unity ML-Agents offer more versatile environments and more modern graphics.

Create and play with your agent

The repository includes several examples of algorithms that can be used to train and run agents; they can be found in the examples/ folder.

The most interesting thing, however, is to create an agent from scratch. In this case, the Proximal Policy Optimisation (PPO) Reinforcement Learning algorithm will be used, together with the gymnasium library for communication between the algorithm and the environment. The main steps to follow are detailed below:

Step 1: Create the function that will mount the agent environment.

import gymnasium
from vizdoom import gymnasium_wrapper  # registers the Vizdoom-* Gymnasium environments

# Environment creation function
def create_env():
    env = gymnasium.make('VizdoomBasic-v0')
    env.unwrapped.game.set_doom_game_path("../scenarios/doom2.wad")
    # The window is hidden during training to save resources
    env.unwrapped.game.set_window_visible(False)
    env.unwrapped.game.add_game_args("+vid_forcesurface 1")
    return env
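Before training, it can be useful to sanity-check the environment with a random agent. The helper below is a sketch of our own (the name run_random_episode is not part of the article's code), assuming only the standard Gymnasium step API:

```python
def run_random_episode(env, max_steps=1000):
    """Play one episode with random actions; returns total reward and step count."""
    obs, info = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        # Sample a random action from the environment's action space
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        steps += 1
        if terminated or truncated:
            break
    return total_reward, steps
```

Calling run_random_episode(create_env()) should finish an episode quickly, confirming the environment is wired up correctly before any training time is spent.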

Step 2: Using the above function, two environments are created: one for training and one for the evaluation phase.

from stable_baselines3.common.env_util import make_vec_env

training_env = make_vec_env(
    create_env,
    n_envs=1,
)
eval_env = make_vec_env(
    create_env,
    n_envs=1,
)

Step 3: The agent is created using the PPO algorithm:

from stable_baselines3 import PPO
from stable_baselines3.common import policies

def create_agent(env, **kwargs):
    return PPO(
        policy=policies.ActorCriticCnnPolicy,
        env=env,
        n_steps=4096,
        batch_size=32,
        learning_rate=1e-4,
        tensorboard_log="logs/tensorboard",
        verbose=0,
        **kwargs
    )

agent = create_agent(training_env)

Step 4: Set up the evaluation callback, which will periodically review the agent's performance during training and save the best model found so far. The agent will be evaluated for 10 full episodes every 5000 timesteps.

from stable_baselines3.common import callbacks

evaluation_callback = callbacks.EvalCallback(
    eval_env,
    n_eval_episodes=10,
    eval_freq=5000,
    log_path="logs/evaluations/basic",
    best_model_save_path="logs/models/basic",
)

An episode is a complete sequence of interactions from an initial state to a terminal state in the environment, and a timestep represents a single step in which the agent observes, takes an action and receives a reward.
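The distinction can be sketched in a few lines: a fixed budget of timesteps spans a variable number of episodes, depending on how quickly each one terminates. The helper name below is illustrative, not part of the article's code:

```python
def count_episodes(env, total_timesteps):
    """Count how many complete episodes fit into a fixed timestep budget."""
    episodes, t = 0, 0
    obs, info = env.reset()
    while t < total_timesteps:
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        t += 1  # one timestep: observe, act, receive a reward
        if terminated or truncated:
            episodes += 1  # one episode finished: reset to start the next
            obs, info = env.reset()
    return episodes
```

This is exactly the bookkeeping the EvalCallback does internally: eval_freq counts timesteps, while n_eval_episodes counts episodes.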

Step 5: Last step, let's train!

agent.learn(
    total_timesteps=40000,
    tb_log_name="ppo_basic",
    callback=evaluation_callback
)
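Once training finishes, the best checkpoint saved by the callback can be loaded with PPO.load("logs/models/basic/best_model") and watched in action. The evaluation loop below is a minimal sketch of our own (watch_agent is a hypothetical helper), assuming a plain Gymnasium environment from create_env() rather than the vectorised wrapper:

```python
def watch_agent(agent, env, episodes=3):
    """Run the trained agent deterministically and collect per-episode rewards."""
    rewards = []
    for _ in range(episodes):
        obs, info = env.reset()
        done, total = False, 0.0
        while not done:
            # The trained policy picks the action instead of random sampling
            action, _state = agent.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, info = env.step(action)
            total += reward
            done = terminated or truncated
        rewards.append(total)
    return rewards
```

To actually see the agent play, recreate the environment with set_window_visible(True) before passing it to the loop.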

Note: These steps have been simplified in order not to overload the article. The full code can be found at this link.

Results

Below are some results of the generated agents:

Example 1: The first example works quite well: the agent identifies the enemy and moves to face it before shooting. It is also the simplest scenario:

Example 2: On an open map the agent stands in the centre and rotates 360° while firing at enemies in the field of view:

In Machine Learning, prolonged training of a model does not guarantee continuous improvement in its performance, as it may reach a performance ceiling. This case is particularly interesting because the ceiling can be observed visually: the agent starts to lose life when it runs out of bullets, as reloading is not an action in its decision set.

Example 3: This last example requires tuning the PPO model's parameters or considering another algorithm, as the agent is only able to shoot the enemy on the left, ignoring the enemy on the right:

Conclusion

ViZDoom is an excellent initiative for those who wish to experiment at the intersection of video games and artificial intelligence. Its platform offers several modes for training agents, allowing learning to be adapted to different scenarios and levels of complexity.

One of the most prominent features is the ability to train agents using visual inputs, thus emulating the way we humans perceive our environment. This not only enriches the learning process, but also provides results that are more realistic and applicable to real-world situations.

Furthermore, by contributing to the ViZDoom community, developers and researchers have the opportunity to improve and extend this tool, fostering collaboration and collective advancement in the field of machine learning. Participating in this project is an effective way to drive innovation and support others in their efforts to explore new technological frontiers.

Links of interest
