In this review article, we explore the application of reinforcement learning (RL) at the different levels of hierarchical chemical process control, where RL can improve the efficiency and robustness of chemical process operations. RL algorithms are well suited to sequential decision making; since chemical process control requires decisions to be made continuously, RL is a natural fit thanks to its ability to handle dynamic, nonlinear, and uncertain environments. Reinforcement learning has already shown great potential in solving complex tasks, making it a promising approach for the challenges of chemical process control.
We investigate the potential of reinforcement learning compared to traditional control methods. We present advanced multi-agent RL structures, which can tackle large-scale chemical processes beyond the capabilities of a single agent. We introduce CRISP-RL (CRoss-Industry Standard Process for the development of Reinforcement Learning applications), a paradigm for developing, deploying, and maintaining reinforcement learning projects that provides a methodology for handling and solving complex RL tasks. Finally, we describe the current challenges and future directions for the integration of reinforcement learning into chemical process control.
Looking toward future applications of artificial intelligence, namely reinforcement learning (RL), we develop a resilience-based explainable RL agent that decides when to activate mitigation systems. The applied reinforcement learning algorithm is Deep Q-learning, and the reward function is the resilience metric. We investigate two explainable reinforcement learning methods: the decision tree as a policy-explaining method and the Shapley value as a state-explaining method.
The policy can be visualized in the agent’s state space using a decision tree for better understanding. We compare the agent’s decision boundary with the runaway boundaries defined by runaway criteria, namely the divergence criterion and the modified dynamic condition. The Shapley value quantifies the contribution of the state variables to the behavior of the agent over time. The results show that the decisions of the artificial agent in a resilience-based mitigation system can be explained and presented in a transparent way.
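As an illustration of how such post-hoc explanations can be produced, the following Python sketch fits a shallow decision-tree surrogate to the greedy policy of a Q-function and computes Shapley values with the shap package. The Q-function, the state variables (temperature, temperature gradient, concentration), and all numeric values are placeholders chosen for the example, not the trained network or reactor model used in this work.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
import shap

rng = np.random.default_rng(0)

# Placeholder for a trained Q-network: maps states (T, dT/dt, concentration)
# to Q-values of two actions (0 = no intervention, 1 = intervene).
def q_values(states):
    t, dtdt, c = states[:, 0], states[:, 1], states[:, 2]
    q_intervene = 0.02 * (t - 380.0) + 5.0 * dtdt + 0.5 * c
    return np.column_stack([np.zeros_like(q_intervene), q_intervene])

# Sampled reactor states standing in for trajectories of the trained agent.
states = np.column_stack([
    rng.uniform(300, 450, 5000),   # temperature [K]
    rng.uniform(-1, 3, 5000),      # temperature gradient [K/s]
    rng.uniform(0, 2, 5000),       # reactant concentration [mol/L]
])
actions = np.argmax(q_values(states), axis=1)

# Policy-explaining method: a shallow decision tree as a surrogate of the
# policy, whose splits approximate the agent's decision boundary.
tree = DecisionTreeClassifier(max_depth=3).fit(states, actions)

# State-explaining method: Shapley values of the Q-value difference, showing
# each state variable's contribution to the intervention decision.
explainer = shap.KernelExplainer(
    lambda x: q_values(x)[:, 1] - q_values(x)[:, 0], shap.sample(states, 100)
)
shap_values = explainer.shap_values(states[:50])
```

The fitted tree can then be plotted over the state space and compared with the runaway boundaries, while the Shapley values can be tracked along a trajectory to show which variables drive the agent's decisions over time.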
We analyzed a special class of graph traversal problems in which the distances are stochastic and the agent can cover only a limited range in one go. We showed that both constrained shortest Hamiltonian pathfinding problems and disassembly line balancing problems belong to the class of constrained shortest pathfinding problems, which can be represented as mixed-integer optimization problems. Reinforcement learning (RL) methods have proven their efficiency on many complex problems. However, researchers have found that learning time increases drastically as the state and action spaces grow. In continuous cases, approximation techniques are used, but these methods have several limitations in mixed-integer search spaces. We present the Q-table compression method, a multistep approach combining dimension reduction, state fusion, and space compression techniques that projects a mixed-integer optimization problem onto a discrete one. The RL agent is then trained with an extended Q-value-based method to deliver a human-interpretable model for optimal action selection. Our approach was tested on selected constrained stochastic graph traversal use cases, and the results are compared with a simple grid-based discretization method.
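A minimal sketch of the underlying idea, projecting a mixed-integer state onto a discrete, table-friendly one and updating a Q-table over it, is given below. It assumes a hypothetical traversal problem with a node index and a continuous remaining-range variable; the bin edges and hyperparameters are illustrative, and the full compression method additionally involves dimension-reduction and state-fusion steps not shown here.

```python
import numpy as np
from collections import defaultdict

# Illustrative grid-based discretization (the baseline the compression method
# is compared against): bin edges for the remaining travel range are assumed.
remaining_range_bins = np.linspace(0.0, 100.0, 11)

def encode_state(node_id, remaining_range):
    """Fuse a discrete node index with a binned continuous variable."""
    range_bin = int(np.digitize(remaining_range, remaining_range_bins))
    return (node_id, range_bin)

# Tabular Q-learning update over the resulting discrete state space.
Q = defaultdict(float)
alpha, gamma = 0.1, 0.95

def q_update(state, action, reward, next_state, next_actions):
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Usage example with placeholder values:
s = encode_state(node_id=3, remaining_range=42.5)
s_next = encode_state(node_id=7, remaining_range=17.0)
q_update(s, action=7, reward=-12.3, next_state=s_next, next_actions=[1, 4, 9])
```

Because the learned table is indexed by human-readable state tuples, the resulting greedy policy can be inspected directly, which is what makes the Q-value-based model interpretable.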
Exothermic reactions carried out in batch reactors require careful operation because inadequate conditions can lead to thermal runaway and, in the worst case, an explosion. Therefore, a well-designed intervention action is necessary to avoid undesired events. For this problem, we propose resilience-based reinforcement learning, in which the artificial agent decides whether or not to intervene based on the current state of the system. One of our goals is to design resilient systems, i.e., systems that can recover after a disruption. Therefore, we developed a resilience calculation method for reactors in which dynamic predictive time to failure and time to recovery are used for a better resilience evaluation. Moreover, if the process state is outside the design parameters, we do not include the adaptation and recovery phases in the calculation. We suggest using Deep Q-learning to learn when to intervene in the system to avoid catastrophic events and propose the resilience metric as the reward function of the learning process. The results show that the proposed methodology is applicable to the development of resilience-based mitigation systems, and the agent can effectively distinguish between normal and hazardous states.
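To make the role of the reward concrete, the sketch below shows one possible resilience-style reward built from predicted time to failure and time to recovery; the specific formula, weights, and horizon are assumptions for illustration and do not reproduce the exact resilience metric developed in this work. The returned value would be attached to each transition used in the Deep Q-learning update.

```python
def resilience_reward(perf, perf_nominal, t_to_failure, t_to_recovery,
                      within_design_limits, horizon=600.0):
    """Illustrative resilience-style reward (assumed form, not the exact metric).

    perf / perf_nominal   -- current vs. nominal process performance
    t_to_failure          -- dynamically predicted time until runaway [s]
    t_to_recovery         -- predicted time needed to return to normal [s]
    within_design_limits  -- False once the state leaves the design envelope
    horizon               -- normalization horizon [s], an assumed constant
    """
    absorption = min(perf / perf_nominal, 1.0)
    survivability = min(t_to_failure / horizon, 1.0)
    if not within_design_limits:
        # Outside the design parameters the adaptation/recovery phases are not
        # credited, mirroring the rule described above.
        return 0.5 * (absorption + survivability)
    recoverability = horizon / (horizon + t_to_recovery)
    return (absorption + survivability + recoverability) / 3.0

# Usage example with placeholder process values:
r = resilience_reward(perf=0.8, perf_nominal=1.0, t_to_failure=120.0,
                      t_to_recovery=300.0, within_design_limits=True)
```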
The operation of semi-batch reactors requires caution because the fed reagents can accumulate, leading to hazardous situations due to the loss of controllability. This work aims to develop a method that explores the optimal operational strategy of semi-batch reactors. Since reinforcement learning (RL) is an efficient tool for finding optimal strategies, we tested the applicability of this concept. We developed a problem-specific RL-based solution for the optimal control of semi-batch reactors in their different operating phases. The RL controller varies the feeding rate directly in the feeding phase, while in the mixing phase it acts as the master in a cascade control structure. The RL controllers were trained with different neural network architectures to identify the most suitable one. The developed RL-based controllers performed very well and were able to keep the temperature at the desired setpoint in the investigated system. The results confirm the benefit of the proposed problem-specific RL controller.
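The following Python sketch illustrates how such a phase-dependent controller can be wired together: in the feeding phase the agent's action directly sets the feed rate, while in the mixing phase it supplies the setpoint of an inner (slave) PI loop acting on the jacket. The agent interface, PI gains, and physical limits are assumptions for the example rather than the tuned controllers reported here.

```python
MAX_FEED_RATE = 0.02          # [kg/s], assumed upper bound on the feed rate
T_MIN, T_MAX = 290.0, 360.0   # [K], assumed range of the jacket setpoint

class PI:
    """Simple PI controller used as the inner (slave) loop of the cascade."""
    def __init__(self, kp=2.0, ki=0.05, dt=1.0):
        self.kp, self.ki, self.dt, self.integral = kp, ki, dt, 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral   # coolant flow signal

class CascadeRLController:
    """Illustrative wiring of an RL agent into the two operating phases."""
    def __init__(self, agent, pid):
        self.agent = agent   # maps an observation to a continuous action in [0, 1]
        self.pid = pid       # inner controller manipulating the coolant flow

    def control_step(self, obs, phase):
        action = self.agent.act(obs)
        if phase == "feeding":
            # Direct manipulation: the agent's action scales the reagent feed.
            feed_rate = action * MAX_FEED_RATE
            jacket_setpoint = obs["jacket_temperature"]    # setpoint unchanged
        else:
            # Mixing phase: the agent acts as the master of the cascade loop.
            feed_rate = 0.0
            jacket_setpoint = T_MIN + action * (T_MAX - T_MIN)
        coolant_flow = self.pid.update(jacket_setpoint, obs["jacket_temperature"])
        return feed_rate, coolant_flow

# Usage with a stub agent that always outputs a mid-range action:
class StubAgent:
    def act(self, obs):
        return 0.5

controller = CascadeRLController(StubAgent(), PI())
feed, coolant = controller.control_step(
    {"jacket_temperature": 310.0, "reactor_temperature": 335.0}, phase="feeding"
)
```

In training, the stub agent would be replaced by the neural-network policy, and the reward would penalize deviations of the reactor temperature from the desired setpoint.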