Explainable deep reinforcement learning via online mimicking

Author name
Nikolaos Makris
Title
Explainable deep reinforcement learning via online mimicking
Year
2024-2025
Supervisor

George Vouros

Summary

This study proposes a method for training interpretable reinforcement learning policies in continuous action spaces, in close interaction with the original deep models, while also examining the effects of training interpretable policies on the original models. The goal is to confirm the feasibility of the proposed method while analyzing the trade-offs between optimal performance and policy interpretability. This work extends previous studies in the field of Explainable Deep Reinforcement Learning (XDRL).

Existing research has primarily focused on Explainable Deep Q-Networks (XDQN) and on the interpretability of Actor-Critic methods in discrete action spaces, without considering the trade-offs between optimal performance and interpretability. In the proposed framework, the original policy model, Soft Actor-Critic (SAC), and the interpretable policy model, XGBoost, interact during training and influence each other's training. The XGBoost model is first trained to approximate the SAC policy accurately. SAC is then readjusted to align more closely with XGBoost, which reduces prediction discrepancies and, consequently, increases the fidelity of the interpretable model.
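The mimicking step described above can be illustrated with a minimal, self-contained sketch. It is not the thesis implementation: the teacher policy is a hypothetical one-dimensional function standing in for a trained SAC actor, and a small hand-written gradient-boosted ensemble of decision stumps is swapped in for XGBoost so that the example runs with the Python standard library only. All function names and hyperparameters are assumptions made for illustration.

```python
import math

def teacher_policy(state):
    """Hypothetical stand-in for a trained SAC actor (1-D state, bounded action)."""
    return math.tanh(2.0 * state)

def fit_stump(xs, residuals):
    """Return the single threshold split that best fits the residuals (squared error)."""
    best = None
    order = sorted(set(xs))
    for lo, hi in zip(order, order[1:]):
        thr = 0.5 * (lo + hi)
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, thr, lmean, rmean)
    return best[1], best[2], best[3]

def fit_boosted_stumps(xs, ys, rounds=100, lr=0.3):
    """Gradient boosting for squared loss: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, lval, rval = fit_stump(xs, residuals)
        stumps.append((thr, lr * lval, lr * rval))
        preds = [p + (lr * lval if x <= thr else lr * rval)
                 for x, p in zip(xs, preds)]
    return base, stumps

def predict(model, x):
    """Sum the base value and the contribution of every stump."""
    base, stumps = model
    return base + sum(l if x <= thr else r for thr, l, r in stumps)

# Collect (state, action) pairs from the teacher, then fit the interpretable mimic.
states = [i / 20.0 - 1.0 for i in range(41)]
actions = [teacher_policy(s) for s in states]
mimic = fit_boosted_stumps(states, actions)

# Fidelity: mean squared discrepancy between mimic and teacher on the same states.
mse = sum((predict(mimic, s) - a) ** 2 for s, a in zip(states, actions)) / len(states)

# Reference point: error of a constant predictor (the mean action).
mean_action = sum(actions) / len(actions)
baseline_mse = sum((a - mean_action) ** 2 for a in actions) / len(actions)
```

Here `mse` plays the role of the prediction discrepancy: the lower it is relative to `baseline_mse`, the more faithfully the tree-based model reproduces the teacher's actions. In the framework summarized above, the teacher would additionally be readjusted toward the mimic, which this sketch does not attempt.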

The readjustment of SAC is achieved using the Dual Gradient Descent method, which is applied in constrained optimization problems. All experiments were conducted in the OpenAI Gym environment, using four setups with continuous action spaces of increasing dimensionality to evaluate the framework's effectiveness. It was found that, because of the close interaction between the two models during training, the final SAC policy differs significantly from the optimal SAC policy (i.e., the one obtained by training SAC alone). As expected, this discrepancy becomes more pronounced as the complexity of the experimental setup increases. Nevertheless, the interaction between the two models leads to convergence toward policies that, while not necessarily optimal, are interpretable.

In fact, the results indicate that the predictions of the final SAC policy and of the XGBoost model align closely, making the two interchangeable regardless of the complexity of the experimental setup. The thesis contributes a novel framework that facilitates the integration of interpretable policy models into Deep Reinforcement Learning methods. This is achieved through the interaction of the SAC and XGBoost policy models via the Dual Gradient Descent optimization method. The thesis also provides insights into the trade-off between optimal performance and policy interpretability.
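For readers unfamiliar with Dual Gradient Descent, the following minimal sketch shows the general technique on a toy scalar problem. It is only an illustration under stated assumptions: the summary does not give the actual objective or constraint used in the thesis, so the quadratic `task_loss`, the linear `constraint`, and all step sizes below are hypothetical. Loosely, the loss plays the role of the SAC objective and the constraint plays the role of a bound on the discrepancy between the two policies.

```python
def task_loss(x):
    """Hypothetical objective; its unconstrained minimum is at x = 3."""
    return (x - 3.0) ** 2

def task_loss_grad(x):
    """Derivative of task_loss with respect to x."""
    return 2.0 * (x - 3.0)

def constraint(x):
    """Hypothetical constraint g(x) = x - 1; the point x is feasible when g(x) <= 0."""
    return x - 1.0

def dual_gradient_descent(x=0.0, lam=0.0, outer_steps=200, inner_steps=50,
                          lr_x=0.1, lr_lam=0.5):
    """Alternate between minimizing the Lagrangian in x and ascending in the multiplier."""
    for _ in range(outer_steps):
        # Primal step: approximately minimize L(x, lam) = task_loss(x) + lam * g(x).
        for _ in range(inner_steps):
            grad_lagrangian = task_loss_grad(x) + lam * 1.0  # dg/dx = 1
            x -= lr_x * grad_lagrangian
        # Dual step: gradient ascent on lam, projected onto lam >= 0.
        lam = max(0.0, lam + lr_lam * constraint(x))
    return x, lam

x_star, lam_star = dual_gradient_descent()
```

The unconstrained minimizer (x = 3) violates the constraint, so the multiplier grows until the primal solution is pushed onto the constraint boundary at x = 1. The constrained solution therefore has a higher task loss than the unconstrained one, which mirrors, in a very simplified form, the trade-off between optimal performance and interpretability discussed in the summary.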