I. Introduction
Reinforcement learning (RL) has emerged as an effective tool for solving complex decision-making problems in a wide range of fields. In recent years, it has sparked growing interest in the process industries, where it has shown promise in optimizing processes, increasing efficiency, and improving safety [1]. Testing RL in simulated environments, laboratory experiments, and pilot-scale setups has yielded promising results and brought RL closer to real-world applications. Driven by these results and by rapid developments in computational technologies, numerous technology organizations have created and supported research institutes to accelerate RL research in robotics and language models. Although these advances have largely taken place outside the process industries, most RL methodologies combine learning with control techniques that have been studied extensively in the optimization and control of process industries [2]. At the same time, operational drifts and time-varying process dynamics call for modifications to classical techniques and for more sophisticated, adaptable solutions. To understand and contribute to RL theory in the process industries while improving its real-time applicability, researchers and practitioners should analyze the operational levels of process control holistically [3]. A systematic outline of these levels, the control hierarchy, was first given in [4], without considering possible faults at the production, supervision, and execution levels. Fig. 1 generalizes this presentation of the control hierarchy to account for complex real-world applications, and this study provides representative RL applications for each level.
Fig. 1. A schematic of the modern control hierarchy based on [4]. The production level makes high-level decisions based on market demand and on political and social events. The supervision level includes real-time optimization of operational targets, controller tuning, and other hyperparameter optimization. The execution level monitors the process variables and ensures that they meet the criteria defined by the higher levels. At the instrumentation level, sensors provide these observations and actuators deliver the control actions to the plant.