In this review article, we explore the application of reinforcement learning (RL) at the different levels of hierarchical chemical process control, where it can improve the efficiency and robustness of chemical process operations. RL algorithms are designed for sequential decision making; because chemical process control requires decisions to be taken continuously, RL is a natural fit, owing to its ability to handle dynamic, nonlinear, and uncertain environments. RL has already shown great potential in solving complex tasks, making it a promising approach to the challenges of chemical process control.
We investigate the potential of RL compared to traditional control methods. We present advanced multi-agent RL structures, which can tackle large-scale chemical processes beyond the capabilities of a single agent. We introduce CRISP-RL (CRoss-Industry Standard Process for the development of Reinforcement Learning application), a paradigm for deploying and maintaining reinforcement learning projects that provides a methodology for handling and solving complex RL tasks. Finally, we describe the current challenges and future directions for the integration of reinforcement learning into chemical process control.
The operation of semi-batch reactors requires caution because the fed reagents can accumulate, leading to hazardous situations due to the loss of controllability. This work aims to develop a method for finding the optimal operational strategy of semi-batch reactors. Since reinforcement learning (RL) is an efficient tool for finding optimal strategies, we tested the applicability of this concept. We developed a problem-specific RL-based solution for the optimal control of semi-batch reactors in different operation phases. In the feeding phase, the RL controller manipulates the feeding rate directly, while in the mixing phase it acts as the master in a cascade control structure. The RL controllers were trained with different neural network architectures to identify the most suitable one. The developed RL-based controllers were able to keep the temperature at the desired setpoint in the investigated system. The results confirm the benefit of the proposed problem-specific RL controller.
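The cascade arrangement described for the mixing phase can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `master_policy` is a hypothetical stand-in for the trained RL policy (a fixed proportional rule, so that the example runs without training), the slave loop is a plain PI controller, and the first-order jacket and reactor dynamics with their time constants are invented toy values, not the model used in the work.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Slave loop: discrete PI controller tracking the jacket setpoint."""
    kp: float
    ki: float
    integral: float = 0.0

    def step(self, setpoint: float, measurement: float, dt: float) -> float:
        error = setpoint - measurement
        self.integral += error * dt
        return self.kp * error + self.ki * self.integral


def master_policy(reactor_temp: float, target_temp: float) -> float:
    """Hypothetical stand-in for the trained RL policy (master loop).

    A trained agent would map the observed state to a jacket-temperature
    setpoint; a fixed proportional rule is used here so the sketch runs.
    """
    return target_temp + 2.0 * (target_temp - reactor_temp)


def simulate(target_temp=350.0, steps=300, dt=1.0,
             tau_jacket=5.0, tau_reactor=50.0):
    """Toy first-order jacket and reactor dynamics (assumed values)."""
    reactor_temp, jacket_temp = 300.0, 300.0
    slave = PIController(kp=2.0, ki=0.5)
    for _ in range(steps):
        # Master: decides where the jacket temperature should be.
        jacket_setpoint = master_policy(reactor_temp, target_temp)
        # Slave: moves the jacket inlet temperature to reach that setpoint.
        inlet_temp = slave.step(jacket_setpoint, jacket_temp, dt)
        # Explicit Euler update of both temperatures from the old state.
        next_jacket = jacket_temp + dt * (inlet_temp - jacket_temp) / tau_jacket
        next_reactor = reactor_temp + dt * (jacket_temp - reactor_temp) / tau_reactor
        jacket_temp, reactor_temp = next_jacket, next_reactor
    return reactor_temp, jacket_temp


final_reactor_temp, final_jacket_temp = simulate()
```

In this structure the master never touches the actuator; it only issues setpoints, so the fast slave loop absorbs jacket-side disturbances before they reach the reactor temperature.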
Exothermic reactions are often performed in semi-batch reactors (SBRs) because the generated reaction heat can be kept under control more easily in this configuration. However, an unsuitable control system can lead to the development of thermal runaway, which may cause lethal damage. Nonlinear model predictive control (NMPC) with implemented thermal runaway criteria is a promising tool for operating SBRs. However, engineers should always consider plant-model mismatch, because uncertain predictions can cause undesirable scenarios. A novel control framework is proposed for operating SBRs; it consists of NMPC with an implemented runaway criterion, an extended Kalman filter, and a parameter identification algorithm. Both multi-stage NMPC and NMPC based on the worst-case scenario are investigated and tested in terms of their ability to handle parameter uncertainty. The former is 38 times slower than the latter, with no noticeable increase in reactor performance. NMPC initialized on the worst-case scenario, with the uncertain kinetic parameters updated during operation, results in a promising control structure for SBRs.
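A likely contributor to the cost gap between the two formulations is the number of scenarios each has to optimize over: a multi-stage scheme branches on every parameter realization at each stage of its robust horizon, whereas a worst-case scheme carries a single realization. The sketch below only counts scenarios; the three parameter values, the robust horizon of 2, and the "larger rate constant is more hazardous" ranking are illustrative assumptions, since the source does not report its scenario tree.

```python
from itertools import product


def scenario_tree(realizations, robust_horizon):
    """Enumerate every parameter sequence up to the robust horizon."""
    return list(product(realizations, repeat=robust_horizon))


def worst_case(realizations, severity):
    """Pick the single realization with the highest severity score."""
    return max(realizations, key=severity)


# Hypothetical relative values of one uncertain kinetic parameter.
k_realizations = (0.8, 1.0, 1.2)

# Multi-stage: one predicted trajectory per branch of the tree.
multi_stage = scenario_tree(k_realizations, robust_horizon=2)

# Worst case: a single trajectory; a faster reaction is assumed
# to be the more hazardous one for this illustration.
single = worst_case(k_realizations, severity=lambda k: k)

print(len(multi_stage), "scenarios versus 1, using k =", single)
```

The scenario count grows as the number of realizations raised to the power of the robust horizon, so the optimization problem grows quickly with either quantity.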
Several exothermic reactions are carried out in semi-batch reactors (SBRs). An unsuitable control system can lead to dangerous situations if thermal runaway develops. Reactor runaway can be avoided by applying nonlinear model predictive control (NMPC) with an implemented runaway criterion to manipulate the feed flow rate of the reagent, provided that the prediction horizon is chosen correctly. For this purpose, the process safety time (PST) of the system is determined. Two different operation modes are considered. In the first mode, the loaded reagent is preheated to avoid accumulation, while in the second mode only the heat produced by the reactions heats up the reactor. A simple PID controller and the proposed NMPC were tested in both cases. The results are compared in terms of batch time and energy consumption, subject to safe operation. Runaway criteria can be successfully implemented in NMPC for the intensification of SBRs.
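To make the idea of a runaway criterion concrete, the sketch below implements one simple possibility: a slope-based heat-balance check in the spirit of the Semenov criterion, which flags a state when heat generation rises faster with temperature than heat removal does. This is not necessarily the criterion used in the work, which the abstract does not name. The first-order kinetics and every parameter value (`k0`, `e_act`, `dh_r`, `ua`, the volume) are assumed purely for illustration.

```python
import math

R_GAS = 8.314  # universal gas constant, J/(mol K)


def heat_generation(temp_k, conc, volume, k0, e_act, dh_r):
    """Heat released by an assumed first-order reaction, in W."""
    rate = k0 * math.exp(-e_act / (R_GAS * temp_k)) * conc  # mol/(m^3 s)
    return volume * (-dh_r) * rate


def runaway_flag(temp_k, conc, volume=1.0, k0=1e9,
                 e_act=80_000.0, dh_r=-100_000.0, ua=5_000.0):
    """True if d(Q_gen)/dT exceeds d(Q_rem)/dT at the given state.

    For Arrhenius kinetics d(Q_gen)/dT = Q_gen * E / (R T^2); for
    jacket cooling Q_rem = UA (T - T_c), the slope is simply UA.
    """
    q_gen = heat_generation(temp_k, conc, volume, k0, e_act, dh_r)
    gen_slope = q_gen * e_act / (R_GAS * temp_k ** 2)
    return gen_slope > ua


cold_state = runaway_flag(temp_k=300.0, conc=1000.0)
hot_accumulated = runaway_flag(temp_k=360.0, conc=1000.0)
hot_low_accumulation = runaway_flag(temp_k=360.0, conc=100.0)
```

With these assumed values the flag is raised only for the hot state with a large accumulated reagent concentration, which is why limiting the feed rate is an effective handle. A predictive controller could evaluate such a flag at every step of its prediction horizon and treat it as a constraint on the feed trajectory.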