For future applications of artificial intelligence, in particular reinforcement learning (RL), we develop a resilience-based explainable RL agent that decides when to activate mitigation systems. The agent is trained with Deep Q-learning, and resilience serves as the reward function. We investigate two explainable reinforcement learning methods: the decision tree as a policy-explaining method and the Shapley value as a state-explaining method.
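As a minimal sketch of this setup (in Python/PyTorch), the code below shows a small Q-network over the reactor state with two actions (do nothing vs. activate mitigation) and a one-step temporal-difference update whose reward slot is filled by the resilience score. The state dimension, network sizes, and all variable names are illustrative assumptions, not the exact configuration used in the paper.

import random

import torch
import torch.nn as nn

STATE_DIM = 4   # e.g. temperature, pressure, concentration, coolant flow (assumed)
N_ACTIONS = 2   # action 0: do not intervene, action 1: activate the mitigation system

class QNetwork(nn.Module):
    """Small fully connected Q-network mapping a state to two action values."""
    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def select_action(q_net: QNetwork, state: torch.Tensor, epsilon: float) -> int:
    """Epsilon-greedy choice between intervening and not intervening."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(state).argmax().item())

def td_update(q_net, target_net, optimizer, s, a, r, s_next, done, gamma=0.99):
    """One-step TD update; r is the resilience-based reward for the transition."""
    q = q_net(s)[a]
    with torch.no_grad():
        target = r + gamma * target_net(s_next).max() * (1.0 - done)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()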
The policy can be visualized in the agent’s state space with a decision tree for better understanding. We compare the agent’s decision boundary with the runaway boundaries defined by two runaway criteria, namely the divergence criterion and the modified dynamic condition. The Shapley values explain the contribution of the state variables to the behavior of the agent over time. The results show that the decisions of the artificial agent in a resilience-based mitigation system can be explained and presented in a transparent way.
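Continuing the sketch above, the two explanation methods could be realized roughly as follows: a shallow scikit-learn decision tree is distilled from the greedy policy so that its split thresholds can be read as decision boundaries in the state space, and Shapley values of the Q-function are estimated with the shap library. The logged states here are random placeholders and all helper names are assumptions.

import numpy as np
import shap
import torch
from sklearn.tree import DecisionTreeClassifier, export_text

def greedy_actions(q_net, states: np.ndarray) -> np.ndarray:
    """Greedy (argmax-Q) action for each logged state."""
    with torch.no_grad():
        q = q_net(torch.as_tensor(states, dtype=torch.float32))
    return q.argmax(dim=1).numpy()

q_net = QNetwork()   # a trained network is assumed; a fresh one is used for illustration
states = np.random.rand(1000, STATE_DIM).astype(np.float32)  # placeholder for visited states
actions = greedy_actions(q_net, states)

# Policy-explaining method: a shallow tree approximating the agent's policy;
# its split thresholds are the decision boundaries drawn in the state space.
tree = DecisionTreeClassifier(max_depth=3).fit(states, actions)
print(export_text(tree))

# State-explaining method: Shapley values of the maximal Q-value, attributing
# the agent's preference to the individual state variables.
def q_of_best(x: np.ndarray) -> np.ndarray:
    with torch.no_grad():
        return q_net(torch.as_tensor(x, dtype=torch.float32)).max(dim=1).values.numpy()

explainer = shap.KernelExplainer(q_of_best, shap.sample(states, 100))
shap_values = explainer.shap_values(states[:50])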
Exothermic reactions carried out in batch reactors require careful operation, because inadequate operating conditions can lead to thermal runaway and, in the worst case, an explosion. Therefore, a well-designed intervention action is necessary to avoid undesired events. For this problem, we propose resilience-based reinforcement learning, in which an artificial agent decides whether to intervene based on the current state of the system. One of our goals is to design resilient systems, i.e., systems that can recover after a disruption. We therefore developed a resilience calculation method for reactors in which dynamic predictive time to failure and time to recovery are used for a better resilience evaluation. Moreover, if the process state is outside the design parameters, we suggest omitting the adaptation and recovery phases from the calculation. We use Deep Q-learning to learn when to intervene in the system to avoid catastrophic events, with the resilience metric as the reward function for the learning process. The results show that the proposed methodology is applicable to developing resilience-based mitigation systems and that the agent can effectively distinguish between normal and hazardous states.
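The paper develops its own resilience calculation; as a hedged illustration only, the sketch below assumes a simple ratio-style score built from the dynamically predicted time to failure (TTF) and time to recovery (TTR), with the recovery term dropped whenever the state is outside the design parameters, as described above. The normalization horizon and the functional form are assumptions, not the published metric.

HORIZON = 10.0  # assumed normalization horizon for the predicted time to failure

def resilience_reward(ttf: float, ttr: float, within_design: bool) -> float:
    """Illustrative resilience score in [0, 1] used as the RL reward.

    ttf: dynamically predicted time until a runaway boundary is reached
    ttr: dynamically predicted time needed to recover normal operation
    within_design: False when the process state is outside the design
        parameters; then the adaptation/recovery phases are excluded.
    """
    if ttf <= 0.0:
        return 0.0                        # failure boundary already reached
    survival = min(ttf / HORIZON, 1.0)    # longer predicted survival scores higher
    if not within_design:
        return survival                   # one possible reading: no recovery credit
    return survival * (1.0 / (1.0 + ttr))  # faster predicted recovery scores higher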