Identification and Expert Approach to Controlling the Cement Grinding Process Using Artificial Neural Networks and Other Non-Linear Models

This paper presents preliminary research on the identification and control of a multi-dimensional, non-linear, and non-stationary cement grinding process using artificial neural networks and various other non-linear models. The primary objective was to establish a precise model that accurately characterizes the functioning of the grinding system. Several model structures were employed, including NARX models based on feed-forward networks, the Elman, Jordan, and Layer-Recurrent Network (LRN) recurrent networks, as well as MTL (Multi-Task Learning) and traditional NARX non-linear models. The non-linear models modeled the system significantly better than the linear models. Another notable outcome of this research is the proposal of a neurocontroller, functioning as an expert system, which can provide control signals to operators. The development and implementation of such a neurocontroller have the potential to enhance the quality, simplicity, and efficiency of cement grinding process control.


I. INTRODUCTION
This article presents preliminary investigations into the identification and expert control of the cement grinding process using artificial neural networks and other non-linear models. Various identification techniques, models, and approaches were employed with the aim of selecting the most accurate ones. The research builds on part of the author's preliminary work reported in [1] and [2], extending and continuing it.
These initial studies delve into the realm of identifying and expertly controlling the cement grinding process through the application of artificial neural networks and other nonlinear models. A range of identification techniques and models have been utilized, alongside various approaches, all geared towards extracting the most accurate and top-quality solutions [1], [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Vivek Kumar Sehgal.
Ongoing economic growth, both on a national and global scale, continuous technological progress, and the ongoing fourth industrial revolution are driving numerous businesses and manufacturing facilities to pursue innovation and the modernization of their existing solutions. Environmental and economic considerations are also becoming increasingly prominent. The cement industry is no different, given its pivotal role in the production of mineral building materials.
One of the crucial phases in cement production is the precise, well-regulated, and highly complex comminution process. This operation can occur in a horizontal, multichamber cement ball mill, operating within either an open or closed loop system. Notably, such equipment is recognized for its low efficiency, with approximately 30% of the supplied energy actually being utilized in the grinding process. Consequently, companies are actively exploring methods to conserve energy, whether through the modernization of the technological process and its components or the application of advanced, energy-efficient control algorithms for system components. The knowledge necessary for improving these processes typically originates from experts overseeing their operation [3], [4], [5].
Very often, conducting experiments is neither simple nor feasible for the researcher, for several reasons. The two main limitations are economic and safety issues. The many different systems and facilities in power plants, cement plants, refineries, or heating plants cannot simply be tested across their full operating range. Doing so would involve huge energy costs, possible damage to components, or risk to the life or health of researchers and third parties. Measurement difficulties also generate considerable problems, which often include unmeasurable disturbances or the impossibility of measuring some parameters. Time constraints are another common problem, arising in systems with very fast or very slow dynamics.
When adapting a model to project requirements, it is essential to consider its level of complexity. In the vast majority of cases, large, complex, and extensive models are not a good idea, for several reasons. Complex models are often difficult to handle, highly impractical to operate, and time-consuming to design. Furthermore, they frequently prove to be sluggish and can be sensitive to certain unmeasurable variables. This suggests that a good model should be relatively simple to operate while incorporating the key features of the system being modeled, taking into account its intended purpose. This purpose may include, for example, replicating selected variables (emulating trends), responding to disturbances, or predicting system behavior.
In analyzing the cement mill system, it is important to emphasize that it is a Multi-Input, Multi-Output (MIMO) dynamic system. Furthermore, this complex system exhibits high non-stationarity and stochasticity, which makes it difficult to model. In addition to the characteristics of the system itself, further challenges include numerous, often unmeasurable disturbances with stochastic patterns and the presence of stochastic features in the input signals. This article focuses on research aimed at improving the energy consumption of the cement grinding process using advanced control algorithms based on both classical solutions and those leveraging artificial intelligence.
In the case of the present research, only a "black-box" approach is employed. The studied cement mill system is highly complex and poses a challenge for the model designer. Taking into account the difficulties associated with its modeling, the following factors should be highlighted:
• The system is dynamic and non-linear.
• The collected control measurement data is not optimal, depending on the operator's skills and disturbances.
• There is significant multi-dimensionality in the system.
• High parameter fluctuations and very large delays.
• Disturbances exhibit a stochastic and sometimes unmeasurable nature.
• The system is non-stationary, and disturbances are non-stationary as well.
• Inaccuracies exist in the measurement of certain variables.
The next section will present a detailed review of the literature on this problem.

II. REVIEW OF LITERATURE ON THE PROBLEM AND RELATED WORKS
In many cases, the analysis of industrial processes makes it practically impossible to propose an appropriate model of the object and reproduce the physical laws influencing its behavior. Observing very complex industrial processes, such as clinker production in a rotary kiln, chemical reactors, cement grinding, combustion processes, or nuclear energy management, it becomes obvious that the system's operation is often influenced by many external, disturbing and even unmeasurable factors. All these facts indicate that it would be unrealistic and impractical to design a mathematical model for one of these complex systems. In such cases, the solution may be process identification, which consists in trying to find a mapping of matrices containing measurement data (from the input and output of the system) to the vector of model parameters [6], [7], [8], [9].
Identification is the process by which the static and dynamic properties of control elements and systems are determined. It involves identifying the relationships between input and output signals in a facility, control system, or automation elements. To obtain an appropriate mathematical description, the process or object is subjected to various types of experiments, and the model parameters are adjusted to fit the data obtained at the experimental stage. Identification is a key process in control because it yields accurate values of the model parameters, which are then used to tune the controller. As a result of identification, an automatic control system can be designed so as to ensure the correct operation of the actual object or process [6], [7], [8], [9], [10], [11], [12].
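As a minimal sketch of this idea, assuming a linear ARX structure and synthetic data (neither is taken from the study), the parameter vector can be estimated by least squares from a matrix built from measured inputs and outputs. The function name `fit_arx` and the orders `na` and `nb` are hypothetical and serve only as an illustration.

```python
import numpy as np

def fit_arx(u, y, na=2, nb=2):
    """Estimate ARX parameters by least squares.

    Assumed model (illustrative only):
        y(k) = a_1*y(k-1) + ... + a_na*y(k-na)
             + b_1*u(k-1) + ... + b_nb*u(k-nb)
    Returns theta = [a_1..a_na, b_1..b_nb].
    """
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    n = max(na, nb)
    rows = []
    for k in range(n, len(y)):
        past_y = [y[k - i] for i in range(1, na + 1)]
        past_u = [u[k - j] for j in range(1, nb + 1)]
        rows.append(past_y + past_u)
    Phi = np.asarray(rows)      # measurement (regressor) matrix
    target = y[n:]              # outputs to be explained
    theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    return theta

# Synthetic, noise-free first-order process used only for demonstration.
rng = np.random.default_rng(0)
u = rng.standard_normal(500)
y = np.zeros(500)
for k in range(1, 500):
    y[k] = 0.8 * y[k - 1] + 0.5 * u[k - 1]

theta = fit_arx(u, y, na=1, nb=1)
print(theta)  # estimates of the coefficients 0.8 and 0.5
```

With real plant data, disturbances and measurement noise would prevent such an exact recovery, which is one reason the estimated model has to be checked on separate data.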
The characteristics of industrial processes have been discussed in numerous works, both in the previous century and in the current one. This is a very interesting, valuable topic worthy of development and exploration. Identification of various automation systems provides many valuable insights and allows for a rational approach to their control.
The issue of cement grinding has been addressed in many works in the fields of automation, computer science, technological sciences, and materials engineering. The grinding process is quite complex and requires a lot of attention, as well as established principles and approaches, which opens a wide range of research opportunities in this direction.
The authors of reference [13] addressed the issue of identification and control of the raw material mixing process in the cement industry. They emphasized that this process is multidimensional and coupled, since the feeders do not contain chemically homogeneous raw materials, and there are numerous significant delays in the system. Furthermore, the research was complicated by disturbances resulting from changes in the chemical composition of the raw materials relative to their long-term composition. These changes, in turn, led to variations in system parameters.
Continuing the review of the literature, several studies in references [14], [15], [16], [17], [18], [19], [20], and [21] encompass a range of investigations on the given subject, including the development of a model predictive controller with adaptive models. This controller possesses the capability to adapt to parameter variations and ensure the smooth operation of industrial plants. Real-time data were collected from a Vertical Roller Mill (VRM), and a correlation analysis was carried out, demonstrating the utilization of outlet temperature and pressure difference as output variables, coupled with tensile stress and the speed of the assisting fan as input variables. The fundamental VRM model was established through data-driven system identification methods.
The findings demonstrated that the Dual Adaptive Model Predictive Controller (DAMPC) yields reduced overshoot and shorter settling times in the presence of parametric variations. Additionally, simulation experiments were conducted to evaluate its effectiveness in tracking reference signals and rejecting slow internal disturbances, all while accounting for parameter variations.
Article [22], however, presents a distinct research approach. In previous works by the same authors, the mill's identification involved breaking it down into Multiple Input Single Output (MISO) systems. Notably, in this study [22], the rough neural network (R-NN) was introduced for mill identification without relying on MISO structures. The R-NN is a neural structure designed on the foundation of rough sets theory, specifically tailored to address uncertainties and ambiguities. Furthermore, an algorithm for stochastic gradient descent was proposed to train the R-NN. The simulation results effectively showcase the method's efficiency and underscore the soundness of the solutions put forth by the authors.
Article [14] focuses on the application of technical computer science research in the identification of a multidimensional cement mill process using nonlinear autoregressive models with exogenous inputs (NARX) and a wavelet network. This research leverages a MATLAB-based system identification toolkit. NARX identification, based on input/output sample sequences collected from an actual cement mill process, is employed for modeling the black-box nonlinear process within the cement mill.
Study [19] aims to identify and model a rotary cement kiln using a Multi-Layer Perceptron (MLP) artificial neural network, which establishes a reliable empirical relationship between input parameters (flow, temperature, and pressure) and the quality of cement production. The cement production process relies heavily on the effective operation of the rotary kiln, yet controlling the kiln is challenging due to factors such as system nonlinearity and the high dimensionality resulting from chemical reactions. Therefore, identifying, modeling, predicting, and simulating the kiln system play a crucial role in optimizing cement production. The MLP was trained using 24 months of historical data and validated against six months of subsequent production data, demonstrating its superiority over conventional modeling methods.
Article [21] highlights the extensive research efforts in the field of cement kiln control strategies. It provides insights into the complexities of cement kiln operation and explores various modeling techniques, including traditional and intelligent methods, with a focus on Model Predictive Control (MPC) as the preferred controller for regulating the Burning Zone Temperature (BZT) in cement kilns. The review also discusses key parameters and tuning strategies that influence MPC control performance.
Paper [23] provides an introduction to neural network models and deep learning, explaining modeling structures and the backpropagation algorithm for parameter adjustment. The authors further explore how deep neural network models can aid in understanding brain computations. Publication [24] aims to present the latest advancements in artificial neural network architecture, methodology, and applications. The book comprises two parts: the first covers the architecture, design, optimization, and analysis of artificial neural networks, while the second delves into the applications of artificial neural networks in various domains. In another publication [25], a Siamese neural network is introduced, which can find applications in various fields. Work [26] presents a comprehensive review of research on the interpretability of neural networks. The definition of interpretability is first explained, and its significance is discussed. A new taxonomy is proposed, organized along three dimensions: type of engagement (passive vs. active interpretive approach), type of explanation, and focus (from local to global interpretability). Finally, existing interpretability assessment methods are summarized, and possible research directions inspired by the new taxonomy are suggested. Article [27] aims to provide a general overview of Multi-Task Learning (MTL), especially in deep neural networks. It presents two of the most common MTL methods in deep learning, offers a literature review, and discusses recent advancements. In particular, it seeks to assist practitioners in applying MTL, shedding light on how MTL works and providing guidance on selecting suitable auxiliary tasks.
On the other hand, publication [28] describes a hybrid modeling and control scheme for fuel cell systems using neural networks. Several feature selection algorithms were tested to reduce dimensionality, with the aim of eliminating irrelevant variables concerning the control objective.
Continuing the literature review on neural network applications, article [29] developed an adaptive neural network boundary control scheme for a flexible manipulator subject to input constraints, model uncertainties, and external disturbances. Article [30] proposes a soft-sensing system, GK-ARFNN, for predicting effluent total phosphorus in wastewater treatment processes. It combines Gustafson-Kessel (GK) clustering and a hierarchical adaptive second-order (HAS) optimization algorithm in an adaptive recursive fuzzy neural network (ARFNN). The GK clustering establishes fuzzy rules, and the ARFNN's recursive layer enhances dynamic mapping. The HAS algorithm adjusts parameters online, improving generalization and prediction accuracy. The article analyzes algorithm convergence, highlighting its effectiveness for practical industrial processes. Simulation results demonstrate the GK-ARFNN system's satisfactory accuracy in predicting effluent total phosphorus in wastewater treatment plants (WWTPs). Articles [31], [32] propose advanced self-organizing fuzzy neural networks for modeling nonlinear systems in industrial processes. The first introduces SOFNN-ALA, employing an adaptive learning algorithm for concurrent structure identification and parameter estimation, enhancing generalization and convergence speed. The second presents SOFNN-HPS, featuring asymmetric Gaussian functions and a hierarchical pruning scheme to balance accuracy and network complexity. This approach achieves a compact structure and robust generalization, validated through benchmark tests and a water quality prediction experiment in wastewater treatment.
The next articles [33], [34], [35], [36], [37], [38], [39] present various approaches to developing and improving neural network models for nonlinear system modeling, with a focus on addressing challenges such as feature extraction, adaptability, and generalization performance. One article introduces an efficient self-organizing fuzzy neural network (SOFNN) with incremental deep pretraining (IDPT), named IDPT-SOFNN. The IDPT is designed to extract effective features, enhancing pretraining efficiency with a more compact structure. Additionally, the SOFNN can dynamically adjust its structure based on current error and error-reduction rate, leading to better modeling performance. Another study proposes a SOFNN modeling methodology based on an adaptive quantum particle swarm optimization algorithm (AQPSO). This approach aims to achieve a suitable number of fuzzy rules and optimal premise parameters, balancing system accuracy and network complexity. The AQPSO-SOFNN demonstrates high prediction accuracy with a parsimonious network topology across various testing cases. In an effort to enhance model generalization for nonlinear system modeling, a self-organizing reciprocal modular neural network (SORMNN) is introduced. This model imitates the modular structure with inter-module connections observed in human brains, leading to higher training accuracy and better generalization ability compared to other modular neural networks. Another article presents a novel online self-organizing modular neural network (OSOMNN) for dynamic nonlinear system modeling. OSOMNN incorporates an online task decomposition algorithm and a self-organizing algorithm for subnetworks. This dynamic approach allows for automatic adjustment of subnetwork modules and optimization of subnetwork structures, resulting in improved generalization performance. Fuzzy neural networks (FNNs) are explored in one study, highlighting their application in modeling nonlinear dynamic systems. To address recurrent and self-organizing design limitations, a self-organizing recurrent fuzzy neural network based on multivariate time series analysis (SORFNN-MTSA) is proposed. The SORFNN-MTSA integrates adaptive recurrent values and a self-organization mechanism to optimize network structure efficiently. Lastly, an online adjusting radial basis function neural network (OA-RBFNN) is introduced to improve prediction accuracy and achieve a compact structure. The OA-RBFNN combines the sliding window strategy and clustering algorithm for online modeling, demonstrating competitive prediction performance and a more compact network structure. In summary, these articles contribute diverse methodologies to enhance the effectiveness of neural network models in nonlinear system modeling, addressing various challenges and showcasing improved performance in terms of accuracy, adaptability, and generalization.
Artificial neural networks find a myriad of important applications in various domains. For instance, the articles [40], [41], [42], [43], [44], [45], [46] showcase the implementation of artificial intelligence methods based on neural networks in tasks such as electroencephalographic signal classification, control systems, musical notation classification using an FPGA (Field Programmable Gate Array) system, and more. These versatile applications highlight the growing significance of neural networks in solving complex real-world problems.
The research presented in this article consists of two stages. The first stage is to identify the properties of the object in order to design its model. The goal is to learn the properties of the process and reflect its dynamics, as well as to enable further research into the design of controllers that improve its safe, efficient, and optimal control. The second stage is the identification of the inverse process model for control purposes. The goal is to create a quasi-optimal controller that acts as an expert system. Finally, the selected models will be compared according to appropriate quality indicators.
The next chapter describes the cement grinding process and presents the mill system in detail.

III. CEMENT GRINDING PROCESS AND PRESENTATION OF THE MILL SYSTEM
The cement grinding process is highly complex and depends on various factors. This chapter addresses issues related to the operation of the grinding system. To better understand its characteristics, it is necessary to examine the processes occurring in the ball mill during material grinding. The basic system used in the cement industry for grinding cement is a mill with grinding balls, together with the auxiliary elements without which it cannot function.
A ball mill is a device consisting of a rotating cylinder divided into chambers by partitions. Its interior is filled with special balls. Figure 1 shows detailed visualizations of the cement mill, including its exterior and a cross-section of its interior. There are two basic types of ball mills used in the cement industry: open and closed (this article examines the closed-loop mill shown in Figure 1). Each of them has different characteristics and, most importantly, different goals. These factors influence the selection of a specific system for the production of specific types of cement.
The ball mill feed contains ingredients such as clinker, gypsum, and various additives in appropriate proportions, as well as fly ash, industrial dust, the grinding activator IMOFLOV, and slag, depending on the cement type. The grinding principle is simple and effective: the mill rotates at a constant speed, causing the grinding balls and material to rise. The falling balls then hit the material, causing it to fragment. These impacts significantly affect the durability of the balls, whose wear increases steadily during operation. However, it is essential to remember that several phenomena occur concurrently during cement grinding. Comminution and mixing of the components take place in the mill simultaneously, along with the transport of the product and the potential return of oversize material to the mill's input (see Figure 1). It should be noted that, because grinding occurs inside the rotating cylinder, valuable process parameters are difficult to observe and measure. A valuable process variable indicating the degree of cement comminution is the specific surface area according to Blaine. However, this measurement is not always carried out continuously or quasi-continuously. It is often performed cyclically, at intervals of several hours, in the laboratory of the plant. Therefore, other process variables that indicate the process status should be considered.
A cement mill operating in an open loop configuration does not have any internal feedback loop. In such a configuration, the fresh feed to the mill is comminuted, and the finished product is obtained at the output. In a closed loop configuration (see Figure 1), the situation is much more complex and requires a good understanding of the process by the operator. This particular case is the subject of research in this work. In this configuration, the feed to the mill at the input undergoes grinding, and the product at the output is transferred through a bucket conveyor to a separator. The separator's task is to separate the ground product from the oversize material, which returns to the mill's input as a disturbing variable.

A. CLOSED-LOOP MILLING AS A MIMO SYSTEM
Viewed from the perspective of modeling and control, the cement mill is a complex system with multiple inputs and outputs, exhibiting dynamic and discrete behavior. Furthermore, this system is characterized by high variability and unpredictability, and it is multidimensional in nature. Additionally, the quality of measurement data is suboptimal, often relying on operator judgment, making the effectiveness of grinding control highly dependent on operator expertise.
When examining the challenges, it is essential to consider the presence of various, frequently unmeasurable, stochastic and time-varying disturbances, as well as the inherent stochastic characteristics of input signals and substantial parameter fluctuations. Measurement errors can also arise due to difficulties in obtaining accurate measurements or inaccuracies in the measuring devices.
Regarding the issue of non-stationarity, it is crucial to acknowledge that the properties of the system itself change over time. For instance, the grinding balls within the drum undergo continuous wear and thus evolve in their characteristics over time. Similarly, components such as clinker, slag, gypsum, and others are subject to variations over time due to factors like humidity, particle size distribution, and source of origin. These and numerous other factors significantly impact the mill's operation.
As in many industrial processes, the cement mill model involves categorizing process variables into control variables (inputs) and output variables. Several variables have been excluded from this analysis, including certain auxiliary and extraneous variables. This decision was made due to measurement inaccuracies or the lack of relevance of these specific process variables.

FIGURE 2. Cement mill model as a MIMO system (based on [2]).
An illustration of the cement mill model in the form of a MIMO (Multi-Input, Multi-Output) system with 6 inputs and 6 outputs is depicted in Figure 2. Considering the parameters important for process control, the input and output variables listed in Table 1 can be distinguished.
Influencing the cement grinding process in a closed-loop configuration primarily involves manipulating the setpoints of the control systems. Naturally, the system's dynamics can be disregarded due to the slow-changing nature of the process. The entire control and modeling process is significantly complicated by numerous non-stationary and stochastic disturbances. To make it feasible, the operator must stabilize the selected operating point of the system based on their knowledge of the static characteristics of the components constituting the system.
It is worth emphasizing that these investigations will focus on a specific cement grade, namely CEM II B-M (V-LL) 32.5 R. Naturally, each different grade would necessitate changes in the static characteristics of the mill's components. Figure 3 illustrates the real appearance of a cement mill operating in a closed-loop configuration within an industrial facility.

B. COLLECTING DATA FROM THE REAL SYSTEM
Properly training artificial intelligence algorithms should ideally be supported by a substantial amount of process data. This is no different in the case of the current research. The author collected measurement data from the real-world system, covering a 30-day period of system operation. These data were archived by the CEMAT PCS7 system and then extracted into a *.csv file format, which was subsequently processed using MATLAB R2020b. The data, originating from such a complex and demanding process, were acquired during the regular operation of the cement mill system. They encompass various operational states, including startups and shutdowns, the mill's operation at multiple operating points, and periods of inactivity.
The data collected by the author were sampled by the system with a sampling period of T = 5 seconds. The file contained approximately 520,000 records. These prepared data were divided into two sets: one for model training and the other for model validation. It was considered reasonable to use around 200,000 records for each of the training and validation processes. This represents approximately 11.5 days of the mill's operation for each stage, which is a sufficient dataset that covers various operational states of the real-world system. Naturally, each of the data sets (training and validation) consists of distinct records retrieved from the system.
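The figures above can be checked with a short calculation. The sketch below converts record counts to days of operation and builds two non-overlapping index ranges. The text does not state where the two sets sit within the recording, so the contiguous, chronological placement is an assumption, and the names `records_to_days` and `chronological_split` are hypothetical.

```python
T_SAMPLE_S = 5        # sampling period from the text, in seconds
N_TOTAL = 520_000     # approximate number of archived records
N_TRAIN = 200_000     # records used for training
N_VALID = 200_000     # records used for validation

def records_to_days(n_records, t_sample_s=T_SAMPLE_S):
    """Convert a number of samples into days of plant operation."""
    return n_records * t_sample_s / 86_400  # 86,400 seconds per day

def chronological_split(n_total, n_train, n_valid):
    """Return two disjoint index ranges (assumed contiguous blocks)."""
    if n_train + n_valid > n_total:
        raise ValueError("training and validation sets exceed the data")
    train = range(0, n_train)
    valid = range(n_train, n_train + n_valid)
    return train, valid

train_idx, valid_idx = chronological_split(N_TOTAL, N_TRAIN, N_VALID)
print(round(records_to_days(N_TOTAL), 1))  # days covered by the whole file
print(round(records_to_days(N_TRAIN), 1))  # days covered by each subset
```

Keeping the validation block strictly after the training block is one simple way to guarantee that the two sets share no records.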
The next chapter will contain a detailed description of the identification process, related aspects, and an overview of various identification models.

IV. PROCESS IDENTIFICATION AND MODELING
When considering the overall identification problem, it is essential to note that it is intrinsically linked to computer technology, as the collected measurements take the form of time series. This is also related to discrete methods of measuring input and output signals, that is, at discrete time instants separated by the sampling period T. In the identification process, a model was constructed by analyzing the dependencies between different sets of data and the facts derived from them. Therefore, this approach does not rely on the analysis of physical phenomena within the object.
Examining the individual stages of the identification process, the first and most crucial step is to correctly define the task's objective and select the user-relevant variables. In this context, prior knowledge of the process being identified is essential. This knowledge results from system observations and the acquired experience of an observer. It facilitates the proper selection of "black-box" structures and, consequently, the success of identification. Other critical stages include experiment planning and selecting the appropriate model structure. In the first of these, the experimenter is tasked with selecting input-output signals, the sampling interval, data filtering, and the removal of undesired signal components. In the second, the goal is to choose an appropriate model class and the tools required for modeling. The next stage involves selecting the identification method and an appropriate model quality indicator.
Then, parameter estimation for the model can be undertaken, followed by model verification. It is essential to note that if the designed model fits the actual measurements reasonably well, it does not necessarily mean it will perform well with all other system measurements. Verification should be conducted on a different dataset than the one used in the learning process. This ensures the correctness of the identification process. The ultimate test is the use of the created model in the control process and evaluating its performance.
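To illustrate what a model quality indicator might look like, the sketch below computes the mean squared error and a normalized fit percentage on held-out data. These two indicators are common choices and are assumed here for illustration; the text above does not specify which indicators were used. The function names are hypothetical.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between measurements and model output."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def fit_percent(y_true, y_pred):
    """Normalized fit: 100 * (1 - ||y - y_hat|| / ||y - mean(y)||).

    100 means a perfect match; 0 means the model is no better than
    predicting the mean of the measured signal.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    num = np.linalg.norm(y_true - y_pred)
    den = np.linalg.norm(y_true - y_true.mean())
    return float(100.0 * (1.0 - num / den))

# Tiny made-up validation signal, used only to exercise the functions.
y_valid = np.array([1.0, 2.0, 3.0, 4.0])
perfect = fit_percent(y_valid, y_valid)
baseline = fit_percent(y_valid, np.full(4, y_valid.mean()))
```

Evaluating such indicators on the validation set rather than the training set is what reveals whether a model that fits the training measurements also generalizes.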

A. NARX NEURAL MODEL
To model or identify dynamic systems with substantial nonlinearity, the NARX model (Nonlinear AutoRegressive with eXogenous input) may be necessary. This model is defined by Equation 1. Its neural counterpart, known as NNARX, is a neural model commonly applied in time series forecasting. NNARX models incorporate a feedback mechanism that enables the transfer of information from past output values to the input, thereby enhancing the accuracy of predictions. These models find applications in diverse fields, including finance, economics, and automation.
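In its commonly used form, a NARX predictor can be written as follows, where the lag orders $n_y$ and $n_u$ are symbols introduced here for illustration and are not taken from the study:

$$\hat{y}(k) = f\big(y(k-1), \dots, y(k-n_y),\; u(k-1), \dots, u(k-n_u)\big)$$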
where y(k) is a delayed output signal, u(k) is a delayed input signal, f (.) is a non-linear function, approximated e.g. by a neural network and ŷ(k) is a prediction of the next value at the signal output.The NARX neural model is visually represented in Figure 4, providing an effective illustration of its structure and practical prediction scheme.
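The way delayed signals enter the NARX model can be sketched in a few lines of Python. The sketch assumes the commonly used form ŷ(k) = f(y(k-1), ..., y(k-n_y), u(k-1), ..., u(k-n_u)). The delay orders n_y and n_u, the helper name narx_regressor, and the toy signals are illustrative assumptions and are not taken from the study.

```python
def narx_regressor(y, u, k, n_y, n_u):
    """Build the regressor [y(k-1..k-n_y), u(k-1..k-n_u)] for scalar signals."""
    if k < max(n_y, n_u):
        raise ValueError("not enough past samples for the requested delays")
    past_outputs = [y[k - d] for d in range(1, n_y + 1)]  # delayed outputs
    past_inputs = [u[k - d] for d in range(1, n_u + 1)]   # delayed inputs
    return past_outputs + past_inputs


# Toy signals (hypothetical values, for illustration only).
y = [0.0, 0.1, 0.3, 0.6, 1.0]
u = [1.0, 1.0, 0.5, 0.5, 0.0]

# Regressor used to predict y(4) from two past outputs and three past inputs.
phi = narx_regressor(y, u, k=4, n_y=2, n_u=3)
```

The nonlinear function f, for example a neural network, would then map phi to the prediction ŷ(4).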

B. ELMAN RECURRENT NETWORK
Elman neural networks are recurrent neural networks commonly used in tasks involving time series modeling and sequence prediction. The network is named after Jeffrey Elman, who introduced it in 1990. An Elman network is a two-layer neural network in which the first (usually nonlinear) state layer is subject to one-step feedback. In this respect it is similar to Hopfield networks.
When considering the signals in an Elman network, it is worth dividing them into three components. Equation 2 presents the R-element input signal vector in the k-th step of the network's operation. Equation 3 gives the N-element output signal vector of the first layer of the network in its k-th step (the network state vector). Equation 4, in turn, represents the N-element output signal vector of the second (output) layer of the network in its k-th step.
The operation principles of the first layer (Equation 5) and the second layer (Equation 6) of the discussed network are analogous to the equations describing a nonlinear, stationary, discrete-time state-space dynamic system.
$$v(k) = F^{(1)}\left(W^{(c)} v(k-1) + W^{(1)} u(k-1) + b^{(1)}\right) \qquad (5)$$
For such a neural structure, training algorithms that take large steps (e.g. the Levenberg-Marquardt algorithm) are not recommended. The Elman network can be used as a model of a discrete, nonlinear dynamic system of the N-th order, trained using sequences of data obtained from the input and output of the object; hence its application in the current research. Moreover, it can be used for time sequence recognition and sequence reproduction.
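A single step of the Elman recursion can be sketched as follows. The state update follows Equation 5 with a hyperbolic tangent as F^(1). The output layer is assumed to be linear, y(k) = W^(2) v(k) + b^(2), because Equation 6 is not reproduced here. The function names and all weight values are hypothetical.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def elman_step(v_prev, u_prev, Wc, W1, b1, W2, b2):
    """One Elman step: state from Equation 5, assumed linear output layer."""
    pre = [c + i + b for c, i, b in
           zip(matvec(Wc, v_prev), matvec(W1, u_prev), b1)]
    v = [math.tanh(p) for p in pre]                  # v(k), the network state
    y = [o + b for o, b in zip(matvec(W2, v), b2)]   # y(k), assumed linear
    return v, y


# Hypothetical weights: N = 2 state neurons, R = 1 input, 1 output.
Wc = [[0.5, 0.0], [0.0, 0.5]]   # context (feedback) weights
W1 = [[1.0], [-1.0]]            # input weights
b1 = [0.0, 0.0]
W2 = [[1.0, 1.0]]               # output weights
b2 = [0.0]

v = [0.0, 0.0]                  # initial state
for u_prev in ([1.0], [0.0]):   # two consecutive input samples
    v, y = elman_step(v, u_prev, Wc, W1, b1, W2, b2)
```

The state v carries information from the first input sample into the second step even though the second input is zero, which is the role of the one-step feedback.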

C. JORDAN RECURRENT NETWORK
Jordan neural networks are recurrent neural networks commonly used in various tasks. Their structure is very similar to that of Elman networks. The difference is that in Jordan networks the delayed feedback connection originates from the output layer and therefore encompasses both layers of the network.
The operation principles of the first and second layers of the discussed network are given by Equations 7 and 8. The first of these corresponds to a nonlinear state equation, while the second is a nonlinear output equation. An analogy can be drawn with Equations 5 and 6 for Elman networks.
$$v(k) = F^{(1)}\left(W^{(c)} y(k-1) + W^{(1)} u(k-1) + b^{(1)}\right) \qquad (7)$$
Like Elman networks, Jordan networks can be used as models of discrete, nonlinear dynamic systems of the N-th order, and they should be trained using sequences of data obtained from the input and output of the object. In terms of applications, they are likewise suitable for modeling (identification) of discrete, nonlinear dynamic systems, as well as for the recognition and reproduction of time sequences.
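The difference from the Elman network can be made concrete with a sketch of one Jordan step. The first layer follows Equation 7 with a hyperbolic tangent as F^(1), so the feedback term uses the previous network output y(k-1) rather than the previous state. The output layer is again assumed to be linear because Equation 8 is not reproduced here, and all names and weight values are hypothetical.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def jordan_step(y_prev, u_prev, Wc, W1, b1, W2, b2):
    """One Jordan step: feedback comes from the previous output y(k-1)."""
    pre = [c + i + b for c, i, b in
           zip(matvec(Wc, y_prev), matvec(W1, u_prev), b1)]
    v = [math.tanh(p) for p in pre]                  # first-layer output v(k)
    y = [o + b for o, b in zip(matvec(W2, v), b2)]   # y(k), assumed linear
    return v, y


# Hypothetical weights: 2 hidden neurons, 1 input, 1 output.
Wc = [[0.5], [0.5]]     # feedback weights acting on the single output
W1 = [[1.0], [0.0]]
b1 = [0.0, 0.0]
W2 = [[1.0, 0.0]]
b2 = [0.0]

y = [0.0]                       # initial output
for u_prev in ([1.0], [0.0]):   # two consecutive input samples
    v, y = jordan_step(y, u_prev, Wc, W1, b1, W2, b2)
```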

D. LRN-LAYER RECURRENT NETWORK
Layer-Recurrent Networks (LRNs) are dynamic, recurrent neural networks that are structurally very similar to Elman networks. In an Elman network, the activation functions of the internal layer neurons cannot be changed: the network has only two layers and uses the "tansig" (hyperbolic tangent) function for the hidden layer and the "purelin" (linear) function for the output layer. In LRNs, changing the activation functions is permissible. In addition, LRNs have a single delayed feedback loop around each layer of the network except for the last layer. This is a key difference that sets them apart from Jordan and Elman networks.
LRNs transmit the output of each layer to the next layer, creating a hierarchy of representations. This allows for an understanding of complex relationships and dependencies between different time steps of input data. Such a hierarchical structure also enables the modeling of long-term dependencies in sequential data. This is often a challenge in other types of neural network architectures and significantly distinguishes LRNs from them. They are sometimes used in conjunction with other types of neural network architectures, such as Convolutional Neural Networks (CNNs).
The next section will be a presentation of selected quality indicators that will allow for an effective and accurate assessment of the quality of the tested systems.

E. LINEAR REGRESSION MODEL WITH LASSO REGULARIZATION FOR MULTIPLE OUTPUT VARIABLES
Multi-Task Learning (MTL) aims to improve the generalization performance of a model by simultaneously learning multiple related tasks. In the context of linear regression, MTL allows us to leverage shared information across multiple output variables. The following describes a linear regression model with Lasso regularization applied to each output variable separately.
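Consider a linear regression model with Lasso regularization for a single output variable:
$$\hat{Y}_i = \beta_{0i} + \beta_{1i} X_{1i} + \beta_{2i} X_{2i} + \dots + \beta_{pi} X_{pi} + \varepsilon_i$$
Here, $\hat{Y}_i$ represents the predicted value for the i-th output, $\beta_{0i}, \beta_{1i}, \dots, \beta_{pi}$ are regression coefficients, $X_{1i}, X_{2i}, \dots, X_{pi}$ are input features, and $\varepsilon_i$ is the error term.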
To prevent overfitting and encourage sparsity in the model, Lasso regularization is applied. The cost function for the i-th output, $J(\beta_i)$, combines the squared prediction error over all samples with an L1 penalty on the regression coefficients. Here, $J(\beta_i)$ is the cost function for the i-th output, m is the number of samples, $\hat{Y}_{ij}$ is the predicted value for the i-th output and j-th sample, $Y_{ij}$ is the actual value, and λ is the regularization parameter.
Another key aspect is model training. The objective is to minimize the cost function for each output variable, i.e. $\min_{\beta_i} J(\beta_i)$. This is achieved by adjusting the regression coefficients for each output using an optimization algorithm. The parameters are as follows: X is the input data matrix, Y is the output data vector for a specific column, and λ is the regularization parameter, which controls the strength of regularization. The code iterates through each output column, independently training a model for each. This allows the model to capture task-specific patterns while benefiting from shared information.
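The per-output training loop described above can be sketched in plain Python. The sketch assumes the common cost J(β_i) = 1/(2m) Σ_j (Ŷ_ij - Y_ij)² + λ Σ_k |β_ki| with an unpenalized intercept, and it uses cyclic coordinate descent purely for illustration. It does not reproduce the solver of the MATLAB function used in the study, and the function names and toy data are hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used in Lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_fit(X, y, lam, n_iter=100):
    """Fit intercept and coefficients for one output by coordinate descent."""
    m, p = len(X), len(X[0])
    x_mean = [sum(row[k] for row in X) / m for k in range(p)]
    y_mean = sum(y) / m
    Xc = [[row[k] - x_mean[k] for k in range(p)] for row in X]  # centered inputs
    yc = [val - y_mean for val in y]                            # centered output
    beta = [0.0] * p
    for _ in range(n_iter):
        for k in range(p):
            rho, norm = 0.0, 0.0
            for j in range(m):
                # residual with the contribution of feature k left out
                partial = yc[j] - sum(beta[q] * Xc[j][q]
                                      for q in range(p) if q != k)
                rho += Xc[j][k] * partial
                norm += Xc[j][k] ** 2
            beta[k] = soft_threshold(rho / m, lam) / (norm / m) if norm > 0 else 0.0
    beta0 = y_mean - sum(b * xm for b, xm in zip(beta, x_mean))
    return beta0, beta


# Toy data: two inputs, two outputs (hypothetical values).
X = [[1.0, 1.0], [2.0, -1.0], [3.0, -1.0], [4.0, 1.0]]
Y = [[3.0, 0.05], [5.0, -0.05], [7.0, -0.05], [9.0, 0.05]]

# One independent Lasso model per output column, as described in the text.
models = []
for col in range(len(Y[0])):
    y_col = [row[col] for row in Y]
    models.append(lasso_fit(X, y_col, lam=0.1))
```

In this toy case the first output depends strongly on the first input, so its coefficient is only shrunk, while the weak dependence of the second output on the second input is removed entirely by the L1 penalty.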

V. QUALITY INDICATORS USED IN THE RESEARCH
In order to effectively assess the quality of the proposed solutions, it was decided to conduct extensive simulation tests, followed by practical validation. The assessment was made using the following quality criteria [47], [48]:
• ISE - Integral of Squared Error, defined by
$$ISE = \int_{0}^{\infty} e^{2}(t)\,dt$$
where e(t) is the control error.
• IAE - Integral of Absolute Error, defined by
$$IAE = \int_{0}^{\infty} |e(t)|\,dt$$
where e(t) is the control error.
• MOE - Minimum of Energy, which is the integral of the squared control signal u(t):
$$MOE = \int_{0}^{\infty} u^{2}(t)\,dt$$
• Correlation - Pearson's linear correlation coefficient. For the column $X_a$ in the matrix X and the column $Y_b$ in the matrix Y, with means $\bar{X}_a$ and $\bar{Y}_b$, and with n being the length of each column, the correlation coefficient corr(a, b) is defined by:
$$corr(a, b) = \frac{\sum_{i=1}^{n} (X_{a,i} - \bar{X}_a)(Y_{b,i} - \bar{Y}_b)}{\sqrt{\sum_{i=1}^{n} (X_{a,i} - \bar{X}_a)^{2} \sum_{i=1}^{n} (Y_{b,i} - \bar{Y}_b)^{2}}}$$
• RMSE - Root Mean Squared Error, a performance metric commonly used in regression analysis to assess the accuracy of a model's predictions. It represents the square root of the average of the squared differences between the actual values $y_i$ and the predicted values $\hat{y}_i$:
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}}$$
• ME - Max Error, the maximum absolute error between actual and predicted values:
$$ME = \max_{i} |y_i - \hat{y}_i|$$
The indicators used include integral criteria, the correlation coefficient and other measures, making them well suited to validating the methods used in the research discussed. The subsequent stages of the article present a sequence of experiments conducted by researchers along with the results obtained during these tests.
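For sampled data, the indicators listed above can be computed as in the following sketch. The integrals are approximated by rectangle sums with a sampling step dt, which is an assumption about the discretization rather than a description of the authors' implementation. The function names and the toy data are illustrative.

```python
import math


def ise(e, dt):
    """Integral of squared error, rectangle-rule approximation."""
    return dt * sum(x * x for x in e)


def iae(e, dt):
    """Integral of absolute error, rectangle-rule approximation."""
    return dt * sum(abs(x) for x in e)


def moe(u, dt):
    """Integral of the squared control signal (energy indicator)."""
    return dt * sum(x * x for x in u)


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def max_error(actual, predicted):
    """Maximum absolute error between actual and predicted values."""
    return max(abs(a - p) for a, p in zip(actual, predicted))


def pearson(a, b):
    """Pearson's linear correlation coefficient of two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Toy data (hypothetical), sampled every 5 seconds.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
error = [a - p for a, p in zip(actual, predicted)]
```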

VI. RESEARCH TOWARDS THE DEVELOPMENT OF A MODEL OF A CEMENT MILL
Initial investigations were carried out using a comprehensive and multifaceted approach to develop an optimal cement mill model. The goal was to gain a thorough understanding of the situation and assess the potential of the models used for system parameter estimation.
In this section, tests involving four different neural network structures are presented, aimed at constructing a model for the cement mill object. The final subsection of this section provides a summary of the research procedures and evaluates the quality of the models.
The construction of these models involved the application of various approaches and neural network training methods. Each of these methods required a distinct approach, and they possess different characteristics. The quality of learning is also influenced by the chosen training method. For each structure, training was conducted using 200,000 samples with a sampling time of T = 5 seconds, providing insight into the system's performance over an 11.5-day duration. Subsequently, the next 200,000 records served as test data to validate the model's functionality. Depending on the specific case, the networks were trained using methods such as the Levenberg-Marquardt algorithm ("trainlm"), backpropagation with Bayesian regularization ("trainbr"), as well as gradient descent with momentum and adaptive learning rate adjustment ("traingdx").
Table 2 provides a comparison of tests related to the neural network model. The information should be interpreted as follows: for each number of neurons (the third column in Table 2), there are three variants of the number of TDL (Tapped Delay Line) delays. For instance, for the NARX neural network model with the "traingdx" learning method, a total of 9 different models were obtained, because for each number of neurons three different variants of the sample delay were examined.
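The quoted duration and the sequential use of the records can be checked with a short sketch. The sample counts and the sampling time are taken from the text, while the helper names and the stand-in record list are hypothetical.

```python
SAMPLING_TIME_S = 5    # T = 5 s, as stated in the text
N_TRAIN = 200_000      # training samples, as stated in the text
N_TEST = 200_000       # test samples, as stated in the text


def duration_days(n_samples, sampling_time_s):
    """Length of the recorded period in days."""
    return n_samples * sampling_time_s / 86_400


def sequential_split(records, n_train, n_test):
    """First n_train records for training, the next n_test for testing."""
    if len(records) < n_train + n_test:
        raise ValueError("not enough records for the requested split")
    return records[:n_train], records[n_train:n_train + n_test]


train_days = duration_days(N_TRAIN, SAMPLING_TIME_S)

# Small stand-in for the process records, to show the split without real data.
records = list(range(10))
train, test = sequential_split(records, n_train=6, n_test=4)
```

With these figures the training period comes to roughly 11.57 days, consistent with the approximately 11.5 days quoted above, and the test records follow the training records in time without overlap.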

A. NARX NEURAL MODEL BASED ON A TWO-LAYER NON-LINEAR NETWORK (SERIES-PARALLEL MODEL)
In the initial phase, a nonlinear NARX neural model was examined in a series-parallel configuration (Fig. 5). In this setup, appropriately delayed input and output values from the actual process are fed into the model, and predicted output values are obtained from the model's output. For this structure, it was necessary to introduce TDL (Tapped Delay Line) signal delay lines. A two-layer neural network structure was employed. The first layer, the nonlinear hidden layer, uses the hyperbolic tangent activation function (tansig). The second layer, serving as the output layer, uses linear functions (purelin). The investigation involved examining the structure with various numbers of input signal delays, specifically 5, 15, and 35 data samples. However, it is important to note that not all model variants underwent this scrutiny. Multiple training methods were experimented with, including "trainlm", "traingdx", and "trainbr", as well as different numbers of hyperbolic tangent neurons in the hidden layer (20, 30, 40). This imposed a significant computational load on the PC used for the tests, a Lenovo Y700 laptop running Windows 10, equipped with an Intel Core i7-6700HQ CPU operating at 2.60 GHz and 16 GB of RAM. Due to technical limitations, not all tests could be executed, and as a result, some of the outcomes are not presented in this study.
The information in Tables 4 and 5 (in Appendix) outlines the segmented output values obtained from the NARX neural model under examination. To illustrate the model's performance, three distinct instances were selected for each of the six parameters, featuring varying numbers of TDL delays.
The authors present one best result for each TDL value. This stage originally comprised 27 variants, resulting in 27 different models. It is not possible to include all the results, so it was decided to select the three best results, one for each TDL value. The reason is that manipulating this variable was observed to have the greatest impact on the quality and fit of the model. Another very important factor is the training method; in each case the Levenberg-Marquardt method turned out to be the best. The "traingdx" learning method (gradient descent with momentum and an adaptive learning rate) is not suitable for the NARX neural model based on a two-layer unidirectional network. This is shown by all the results: for each model and each of the output values of the system, the quality indicators differ from those of the other learning methods by as much as several orders of magnitude.
Contrary to the assumption that a higher count of TDL delays consistently leads to improved results, the findings suggest a nuanced relationship between the model's efficacy and its structural composition. Notably, the 5-delay TDL model encompasses 60 network inputs, the 15-delay model involves 180 inputs, and the 35-delay model incorporates 420 inputs. These substantial disparities in input dimensions result in significantly heightened computational complexity at each operational stage of the model.
Upon closer inspection of the results, it becomes apparent that a model employing 35 TDL delays fails to yield a substantial improvement in quality indicators across all system output values. Consequently, opting for a model with either 5 or 15 delays emerges as a judicious choice. This conclusion is underpinned by a comprehensive assessment of all considered quality indicators concurrently, emphasizing the paramount importance of such a holistic approach in model selection.
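The growth in model size with the number of delays can be illustrated with simple arithmetic. The count of 12 delayed signals is inferred from the reported 60 inputs at 5 delays and is therefore an assumption, as is the decision to count only first-layer weights and biases for a hidden layer of 30 neurons.

```python
DELAYS = (5, 15, 35)
N_SIGNALS = 60 // 5   # 12 delayed signals, inferred from 60 inputs at 5 delays


def first_layer_parameters(n_inputs, n_hidden):
    """Weights plus biases of a fully connected first layer."""
    return n_inputs * n_hidden + n_hidden


# Network inputs for each TDL length.
inputs = {d: d * N_SIGNALS for d in DELAYS}

# First-layer parameter counts for a hidden layer of 30 neurons.
params_30 = {d: first_layer_parameters(n, 30) for d, n in inputs.items()}
```

Under these assumptions, moving from 5 to 35 delays increases the first-layer parameter count roughly sevenfold, which is in line with the computational burden described above.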

B. NEURAL MODEL BASED ON ELMAN NETWORK
Further research checked the capabilities of the Elman network to identify a nonlinear dynamic object (Fig. 6). A detailed analysis of the model is presented in Section IV-B. For this structure, it was not necessary to introduce TDL signal delay lines, because the network contains internal feedback that accounts for the dynamics of the tested system.
It was decided to examine the structure for various numbers of hidden layer neurons. Only one training method was used, "traingdx", which is recommended for this type of neural network.
Analyzing the results for the Elman network model in Tables 4 and 5 (in Appendix), it can be concluded that this type of neural network performs poorly in the task of modeling a dynamic nonlinear system such as a cement mill. The quality indicators obtained are significantly worse than those of the nonlinear NARX neural model. The best results were obtained for a network with 20 neurons in the hidden layer, for each output value of the tested model.

C. NEURAL MODEL BASED ON JORDAN NETWORK
As part of the next stage of testing, the possibility of using the recurrent Jordan network to identify a nonlinear dynamic object was checked (Fig. 7). A detailed analysis of the model is presented in Section IV-C. For this structure, it was not necessary to introduce TDL signal delay lines, because the network contains internal feedback that accounts for the dynamics of the tested system. As in the case of the Elman network, the structure was examined for various numbers of hidden layer neurons. Only one training method was used, "traingdx", which is recommended for this type of neural network. The approach is therefore analogous to that in the previous subsection.
The results for the model using the Jordan network are presented in Tables 4 and 5 (in Appendix). Based on them, it is concluded that the Jordan recurrent neural network, like the Elman network, does not perform very well in the task of modeling the cement mill system. The quality of the model fit is at a level similar to that of the ARX model and the Elman network. In this case, the best results are also obtained for a network with 20 neurons in the hidden layer, for each output value of the tested model. This is analogous to the Elman network.

D. NEURAL MODEL BASED ON LRN
The sixth stage of the study concerned tests using a dynamic and recurrent LRN network as a model of the cement mill system (Fig. 8). A detailed discussion of the structure of this network is presented in Section IV-D. For this architecture, as for the Elman and Jordan networks, there was no need to introduce TDL signal delay lines, because the network contains internal feedback that accounts for the dynamics of the tested system.
However, with this structure it is possible to modify the number of samples in the feedback loop between the output of the hidden layer and the input of the network. Based on many trials, the author decided that the best choice would be a feedback depth of 10 samples. The network was trained using the approach known from the Elman and Jordan network learning procedure. The structure was therefore examined again for different numbers of hidden layer neurons. The procedure was based on the "traingdx" learning method, which the literature recommends for recurrent neural networks as working best when training such systems. The approach is therefore analogous to those of Subsections VI-B and VI-C.
Analyzing the results from Tables 4 and 5 (in Appendix) obtained for the model based on the LRN recurrent neural network, it can be concluded that there has been no breakthrough in improving the quality of the model. The quality indicators show that the LRN network achieved slightly better results than the Jordan network in individual cases, and results quite close to those of the Elman network. Generally speaking, however, this does not indicate a clear superiority of the LRN-based model over the Jordan-based model. The differences may result from the training error in a given trial and should be verified over several training attempts in future studies.
Tables 4 and 5 (in Appendix) also lead to the interesting conclusion that, as with the other recurrent networks used in the study (Elman and Jordan), the best results are achieved for the variant of the model based on a network with 20 neurons in the hidden layer. This is visible for each output value of the tested model.

E. NARX NEURAL MODEL BASED ON A TWO-LAYER NON-LINEAR NETWORK (PARALLEL MODEL)
In the seventh and last stage of research on creating a neural model of the cement grinding process, the nonlinear NARX neural model in a parallel architecture was checked (Fig. 9). This type of model is also called the NNOE (Neural Nonlinear Output Error) model, because the model input draws some of its data from the model's own output instead of from the output of the actual process.
In this model configuration, the model input receives appropriately delayed input values from the real process. The delayed output values are taken from the model output instead of from the process. Therefore, as in Subsection VI-A, TDL signal delay lines are used.
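The difference between the two configurations can be sketched with a toy example. In series-parallel mode the regressor uses measured past outputs, while in parallel mode it uses the model's own past predictions. The first-order process, the deliberately mismatched model coefficient, and all names below are hypothetical and serve only to show how feeding predictions back can let the error grow.

```python
def simulate(f, y_meas, u, n_y, n_u, parallel):
    """Run model f over the data in parallel or series-parallel mode."""
    start = max(n_y, n_u)
    y_hat = list(y_meas[:start])     # initialize with measured values
    for k in range(start, len(u)):
        source = y_hat if parallel else y_meas   # where past outputs come from
        past_y = [source[k - d] for d in range(1, n_y + 1)]
        past_u = [u[k - d] for d in range(1, n_u + 1)]
        y_hat.append(f(past_y, past_u))
    return y_hat


def toy_model(past_y, past_u):
    """Imperfect model of the toy process (coefficient 0.6 instead of 0.5)."""
    return 0.6 * past_y[0] + past_u[0]


# Toy process: y(k) = 0.5 * y(k-1) + u(k-1), driven by a unit step.
u = [1.0] * 6
y_meas = [0.0]
for k in range(1, 6):
    y_meas.append(0.5 * y_meas[k - 1] + u[k - 1])

y_sp = simulate(toy_model, y_meas, u, n_y=1, n_u=1, parallel=False)
y_par = simulate(toy_model, y_meas, u, n_y=1, n_u=1, parallel=True)

final_error_sp = abs(y_sp[-1] - y_meas[-1])
final_error_par = abs(y_par[-1] - y_meas[-1])
```

In this toy case the final error of the parallel run exceeds that of the series-parallel run, because each prediction error is fed back into later predictions.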
It was decided to test only the three best models, i.e. those that performed best in the testing phase in terms of quality indicators (Tables 4 and 5 in Appendix). The research was carried out in the Simulink environment, which significantly facilitated model simulation. Tables 4 and 5 (in Appendix) include the tests for the parallel NARX neural models that achieved the best results in Section VI-A, where series-parallel models were tested. The aim was to check their effectiveness when the past output data fed to the model are the model's own predictions instead of the original process data. Only the three best models were chosen, because research on systems that do not describe the cement grinding process well could not provide valuable results. The results for the parallel model are expected to be slightly worse than for the series-parallel model, because the prediction error grows as predictions are fed back. The main selection criterion was the ISE index, but attention was also paid to a sufficiently low value of the IAE index (among the 10 lowest values across the models) and a high level of correlation.
The data presented in Tables 4 and 5 (in Appendix) show that the NARX parallel model does not perform as well in predicting the output values of the cement mill system. This is as expected. Compared with the recurrent models, its quality is at a similar or slightly worse level.
From the results in Tables 4 and 5 (in Appendix) it can be seen that the best results according to the quality indicators are achieved by a model with 15 delays in the TDL line, trained with the Levenberg-Marquardt method and having 30 neurons in the hidden layer. In second place is a model with 35 TDL line delays, trained with the same method and also having 30 neurons in the first layer. Last place was taken by a parallel model with 15 TDL line delays and 40 neurons. This suggests that the appropriate selection of the number of neurons in the hidden layer may matter more for a good model than the number of TDL delays. As the results showed, however, it is worth paying attention to both of these issues.

F. CLASSIC ARX NONLINEAR MODEL
At this stage, the classical nonlinear ARX (NARX) model was tested. Using the configuration panel of the System Identification Toolbox in MATLAB, the user can modify the appropriately delayed input and output values from the real process that are fed into the model. The model output then generates predicted output values. The main principle of operation of such an identification structure is presented in Section IV-A using the example of a neural model. In this case, the configuration panel also allows the user to add linear blocks working in parallel with the non-linear blocks. This is examined in detail in the course of this comparative research.
The structure was examined for two variants of input signal delays, namely 5 and 15 data samples. Unlike the research on the NARX neural model from Section VI-A, in this case it was not possible to carry out tests with 35 delays at all due to hardware limitations. Therefore, this part of the research was abandoned and tests were carried out only with two different signal delays. The remaining advanced settings offered by the toolbox were left at their defaults, as this is preliminary research on the construction of a cement mill model. For the sake of clarity, again only the top three results are presented.
Observing the results presented in Tables 4 and 5 (in Appendix), it can be concluded that the nonlinear ARX model, as part of preliminary research, presents a similar level of quality to the recurrent models of Elman, Jordan, and LRN. However, certain restrictions should be noted. Neither the wavelet network nor the tree structure, used as the nonlinearity function in the model, significantly affects the quality of prediction. The same applies to the number of sample delays. However, the presence of a linear block is crucial when designing a model in the MATLAB toolbox.

G. LINEAR REGRESSION MODEL WITH LASSO REGULARIZATION
In this part of the research, a multi-task learning model was used to predict various output variables related to the cement mill system.
The primary aspect of this testing phase involves training the MTL model for each output variable. In the loop, each output variable is chosen as the target value, and a linear regression model (fitrlinear) is trained using the input data. To mitigate overfitting, Lasso regularization is applied, with the regularization parameter (lambda) set to a configurable value, for example, 0.1. The trained model is then saved, and predictions are generated for the input data.
The study aims to construct a predictive model for various components of the mill system, treating each output variable as a distinct task. Although the current implementation utilizes linear regression and Lasso regularization, further enhancements can be explored by experimenting with different regularization parameters, investigating additional features, or considering alternative algorithms. The ultimate objective is to optimize the MTL model for improved prediction of the operational aspects of the mill system based on the provided input features. The results for the sample study are included in Tables 4 and 5 (in Appendix).
For the MTL baseline model, it is advisable to employ cross-validation or the Bayesian Information Criterion (BIC) for selecting the optimal lambda parameter instead of relying on subjective settings. This makes the choice systematic and may lead to better results. In this case, however, the differences were not significant. The reason is the very high complexity of the signals.
The results of this stage of research are unsatisfactory. This was expected, since the approach used is linear. The results are therefore somewhat similar to what was achieved in the tests with ARX and ARMAX.

H. SUMMARY OF COMPARATIVE STUDIES OF MODELS WITH REAL DATA
Neural NARX models performed best in this study. The remaining structures did not fare well enough. Moreover, Tables 4 and 5 (in Appendix) clearly show that models with 15 TDL delays and 30 neurons in the hidden layer, trained with the Levenberg-Marquardt method, performed best. Next comes a network with 35 TDL delays, followed by models with 5 delays and 30 neurons in the first nonlinear layer.
It was decided to present an example result. In addition to a graphical visualization of the designed neural structure, including the number of neurons, inputs, outputs and activation functions, the neural network training tool also provides information about the selected learning algorithms. Users can also track in real time the training progress, number of epochs, gradient progress, validation tests and training time, which in the case of this large-scale study amounted to up to 50 hours. The training tool can also draw graphs such as those shown in Figure 10. They report the lowest mean square error for a given epoch, the training state, and the regression results.
The graphical representation in Figure 11 illustrates that the NARX neural model, when configured in a series-parallel architecture, effectively addresses the challenge of modeling a dynamic and non-linear system, such as a cement mill. The outcomes produced by the network exhibit substantial agreement with the original data derived from an actual industrial process. Upon revisiting Tables 4 and 5 (in Appendix), it is noteworthy that the number of neurons in the hidden layer plays a pivotal role in crafting the optimal model. For nearly every set of TDL delays, networks featuring 30 neurons in the first layer consistently yielded the most favorable results. An exception arises with the NARX structure incorporating 35 TDL delays and 40 neurons, where, due to technical constraints, the network training could not be executed reliably, leading to the omission of results in the article. The analysis reveals that a larger number of delayed samples does not necessarily enhance the modeling process. Consequently, there is no assurance that incorporating more historical input significantly improves the description of the cement mill dynamics, as the optimal results were consistently achieved with 15 samples back.
The decision was made to exclusively present the simulation results of the best-performing structure, as graphical observation indicates remarkable similarity among the graphs of all the optimal models. Thus, showcasing results for each model separately is deemed unnecessary, ensuring transparency in result presentation. It is visually evident that the predicted data aligns closely with the actual signals from the industrial facility, confirming a high level of accuracy in the model fit. Notably, the conducted tests involved 200,000 records for both the training and verification processes, equivalent to approximately 11.5 days of mill operation for each stage. It is essential to highlight that distinct datasets were employed for each stage of the testing process, maintaining the integrity of the evaluation.
The next section describes the extension of the existing research by an attempt to find a rational proposal for a neurocontroller working as an expert system predicting the cement mill control settings.

VII. CONCEPT AND RESEARCH OF A NEUROCONTROLLER APPROXIMATING EXPERT'S DECISIONS
This section explores a neurocontroller aimed at approximating expert decisions. To achieve this goal, it employs the concept of the inverse model of a given process. Specifically, the inverse model of the tested system is identified, an approach referred to as Inverse Modeling. Figure 12 illustrates the general structure used for this purpose. A notable distinction can be observed compared to the tests discussed in Section VI: here, predictions of the input feeds to the cement mill are made based on historical input and output data of the mill.
In overseeing the mill's operation, it is imperative to maintain a stable degree of cement grinding, which is gauged through the measurement of specific surface area. Simultaneously, efforts should be directed towards minimizing electricity consumption while ensuring optimal product quality. Given the inherent delays in controlling milling systems and the unavailability of continuous measurements for product quality, it becomes essential to rely on various indirect indicators for effective control. These indicators encompass factors such as the mill's filling degree, the load on the elevator conveyor, and other pertinent variables of lesser significance. Given the intricate nature of the process, the decision was made to leverage the expertise of professionals in the development of the proposed regulatory system.
This section undertakes tests employing two distinct approaches to address the identified problem, with the objective of thoroughly assessing the efficacy of the resulting neurocontroller concept. The research conducted in Section VI demonstrated that the NARX nonlinear neural model is the most suitable for describing the intricate, multidimensional, and non-stationary nature of the cement grinding process. Consequently, this model was employed in the subsequent research outlined in this section.
The research consisted of two phases, detailed in Subsection VII-A. In the initial phase, a nonlinear NARX neural model, utilizing a feed-forward network with a hyperbolic tangent activation function in the hidden layer and trained through the Levenberg-Marquardt method, was developed using real process data. This training involved 200,000 samples with a sampling time of T = 5 seconds, representing the system's operation over an 11.5-day period, as depicted in Figure 12. Subsequently, another 200,000 records were employed to assess the model's performance. This methodology aligns with previous research phases. Notably, in this instance, the output values of the process served as input data for the NARX model, with the expectation that the input values of the tested grinding system would be predicted at the output. This approach aims to create a neurocontroller capable of predicting new input signals for the mill system based on the input-output data of the process.
In the second testing stage (Section VII-A), the neural controller based on the NARX model underwent further evaluation. This time, an energy quality indicator was employed to scrutinize each feed to the mill predicted by the neurocontroller in terms of resource savings. This evaluation is pivotal in determining the suitability of artificial intelligence for controlling such complex processes.

A. NEUROCONTROLLER BASED ON THE NARX STRUCTURE FIRST TEST
In the first phase of the research, the nonlinear NARX neural model was examined in a series-parallel configuration, as illustrated in Figure 12. In this configuration, the model's input is supplied with appropriately delayed input and output values taken from the real process, and the model's output provides predicted values of the feeds to the system. Once again, a tapped delay line (TDL) for the signals proved necessary. The model is a two-layer neural network, similar to the approach outlined in Section VI-A. The first, nonlinear hidden layer uses a hyperbolic tangent activation function ("tansig"), while the second layer, serving as the output layer, consists of linear functions ("purelin"). Table 3 summarizes the tests conducted on the NARX model, which is depicted in Figure 12.
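The two-layer structure described above (hyperbolic tangent hidden layer, linear output layer) can be sketched as a plain forward pass. The weights, layer sizes, and function name below are hypothetical and serve only to show the computation; they are not the trained values from the study.

```python
import math


def forward(x, W1, b1, W2, b2):
    """Two-layer network: tanh hidden layer followed by a linear output layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]


# Hypothetical sizes: 3 regressor values, 2 hidden neurons, 1 predicted feed.
W1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0]]
b2 = [0.2]
print(forward([0.0, 0.0, 0.0], W1, b1, W2, b2))
```

In the neurocontroller setting, x would be the delayed input and output values delivered by the delay lines, and the output vector would hold the predicted feeds.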
In regulating the mill's operation, it is essential to maintain a consistent degree of cement grinding, based on the measurement of the specific surface area. At the same time, the goal is to minimize electricity consumption while ensuring optimal product quality. Given the inherent delays in controlling milling systems and the absence of continuous product-quality measurements, control has to rely on various indirect indicators. These include the mill's filling degree, the load on the elevator conveyor, and other less significant variables. Owing to the complexity of the process, the decision was made to draw on expert knowledge in developing the proposed control system.
A nonlinear NARX neural model built on a feed-forward network with a hyperbolic tangent activation function in the hidden layer was tested. The model was trained with the Levenberg-Marquardt method on real process data, as illustrated in Figure 12. Training used 200,000 samples with a sampling time of T = 5 seconds, covering approximately 11.5 days of system operation. Subsequently, an additional 200,000 records were used to assess the model's performance, in line with the methodology of the previous research phases.
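The stated record length can be checked with simple arithmetic, and the division into a training block and a verification block can be sketched in the same place. The helper names are hypothetical, and the assumption that the verification block directly follows the training block in time is made only for illustration.

```python
SAMPLE_TIME_S = 5    # sampling time T from the text
N_TRAIN = 200_000    # records used for training
N_TEST = 200_000     # records used for verification


def duration_days(n_samples, sample_time_s=SAMPLE_TIME_S):
    """Length of a record in days (86,400 seconds per day)."""
    return n_samples * sample_time_s / 86_400


def chronological_split(records, n_train=N_TRAIN, n_test=N_TEST):
    """Return (train, test) without shuffling, so that time order is preserved."""
    if len(records) < n_train + n_test:
        raise ValueError("not enough records for the requested split")
    return records[:n_train], records[n_train:n_train + n_test]


print(round(duration_days(N_TRAIN), 2))  # 11.57
```

Keeping the split chronological matters for a non-stationary process, because shuffling would let the model see operating conditions from the verification period during training.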
In this case, the model's input received appropriately delayed input and output values from the real process, while the output provided predicted values of the feeds to the system (see Fig. 12). A tapped delay line (TDL) for the signals proved necessary. This approach aims to create a neurocontroller capable of predicting new input signals for the mill system from the input and output data of the process. In this scenario, two case studies were undertaken, differing in the number of neurons in the hidden layer.

B. SUMMARY OF COMPARATIVE STUDIES OF CONTROLLERS WITH REAL DATA
Examining the data in Table 3 shows that the NARX neural model in a series-parallel architecture serves its purpose as a neurocontroller. The neural network closely reproduces all input data of the cement mill facility, matching the signals provided by the expert/operator. This is supported by the quality metrics and is visible in Figure 13. It remains difficult, however, to say which of the designed models is more accurate, as both yield precise predictions.
Analysis of the MOE results for the actual process and for the model in a series-parallel configuration leads to a noteworthy observation. Although the model was built from process data, the control signals it proposes have more energy-efficient waveforms and values. The difference is imperceptible to the naked eye (Fig. 13) and relatively small, but the energy results indicate that further development of this concept is warranted, particularly given the preliminary nature of these studies.
The purpose of designing the neurocontroller was to check whether it could function as an expert system providing control signals to operators managing a given industrial facility. An additional, optional aim was to evaluate it against the selected energy index, motivated by the search for energy-saving control systems in a broad sense. The tests showed that a neurocontroller based on the NARX neural model (using the feed-forward network) in a series-parallel architecture can improve energy indicators. The proposed expert system based on a neural controller is characterized by high accuracy, consistent with the actual decisions of experts.
The energy results show that, although the difference is relatively small and these are preliminary studies, the concept is worth developing. The MOE quality indicator for the series-parallel model is also slightly lower than for the real data from the industrial process. This suggests that the neural network used in the NARX model does more than approximate: because it generalizes rather than merely memorizes, it can ignore some sudden jumps in the control signals.
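Why ignoring sudden jumps can lower an energy-type index may be illustrated with a small numerical sketch. The definition of MOE is not reproduced in this section, so the sum of squared control values is used here purely as an assumed stand-in, and a moving average stands in for the smoothing effect of generalization; neither is the method used in the study, and the feed values are made up.

```python
def control_energy(u):
    """Sum of squared control values: an assumed, generic energy-type index."""
    return sum(v * v for v in u)


def moving_average(u, window=3):
    """Centered moving average; at the edges only the available neighbours are used."""
    half = window // 2
    out = []
    for k in range(len(u)):
        lo, hi = max(0, k - half), min(len(u), k + half + 1)
        out.append(sum(u[lo:hi]) / (hi - lo))
    return out


# Made-up feed signal with one upward and one downward jump.
operator_feed = [10.0, 10.0, 14.0, 10.0, 10.0, 6.0, 10.0, 10.0]
smoothed_feed = moving_average(operator_feed)
print(control_energy(operator_feed), control_energy(smoothed_feed))
```

For this record both signals have the same mean, so the smoothed one has the lower sum of squares simply because it varies less, which is consistent with the small improvement reported above.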
It is worth considering the use of a neurocontroller as an expert system that provides the operator with signals for controlling the cement mill. This requires further, extensive research, given that the process is non-linear, non-stationary, multi-dimensional and subject to delays and parameter fluctuations.
The following sections discuss the research results in detail and then present the conclusions and open problems.

VIII. DISCUSSION OF THE RESEARCH RESULTS
Analyzing the results on the identification and modeling of the cement grinding process from Section VI, together with the results on the neurocontroller concept from Section VII, leads to several important conclusions.
It is worth focusing first on the problem of identifying an industrial process in the form of a cement mill operating in a closed system. As described in Section III, this object is highly complex. It is characterized by multidimensionality, non-stationarity, high non-linearity, numerous disturbances and delays, as well as measurement uncertainties and parameter fluctuations.
The research examined various model structures. These included classical nonlinear models (NARX). In addition, the NARX nonlinear neural model (feed-forward network) was used in series-parallel and parallel architectures, as well as Elman, Jordan and LRN recurrent networks.
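The difference between the series-parallel and parallel NARX architectures lies in where the delayed outputs come from. The sketch below is a minimal illustration under stated assumptions: the function names, the toy linear model, and the data are hypothetical and stand in for a trained network.

```python
def simulate(model, u, y_measured, n_y=2, mode="series-parallel"):
    """One-step NARX predictions over a record.

    model(past_y, u_prev) -> predicted y(k); a hypothetical callable.
    "series-parallel": delayed outputs are taken from measurements.
    "parallel": delayed outputs are the model's own earlier predictions.
    """
    if mode not in ("series-parallel", "parallel"):
        raise ValueError(mode)
    history = list(y_measured[:n_y])   # both modes start from measured values
    predictions = []
    for k in range(n_y, len(u)):
        past_y = history[-n_y:][::-1]  # y(k-1), y(k-2), ...
        y_hat = model(past_y, u[k - 1])
        predictions.append(y_hat)
        history.append(y_measured[k] if mode == "series-parallel" else y_hat)
    return predictions


def toy_model(past_y, u_prev):
    """Stand-in for a trained network (made-up coefficients)."""
    return 0.5 * past_y[0] + 0.2 * past_y[1] + 0.1 * u_prev


u = [1.0, 1.0, 1.0, 1.0, 1.0]
y = [0.0, 0.0, 1.0, 1.0, 1.0]
print(simulate(toy_model, u, y, mode="series-parallel"))
print(simulate(toy_model, u, y, mode="parallel"))
```

In the series-parallel mode every prediction is anchored to measured data, which typically makes training and one-step prediction easier, whereas in the parallel mode prediction errors are fed back and can accumulate over the record.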
An important finding is the superiority of the nonlinear neural NARX model based on the feed-forward network over the other models. It should be clearly noted that the prediction accuracy of the NARX model does not depend solely on its architecture; the nonlinear element plays a major role. The results from Section VI showed this very clearly: the seemingly identical NARX structure achieved different quality metrics each time. As it turned out, the cement grinding process is best reflected by the NARX model based on a two-layer feed-forward network with a hyperbolic tangent activation function in the hidden layer. However, this does not mean that it is the optimal solution. The studies in Subsection VI-F, which used other nonlinear elements in the NARX model, did not reach the quality obtained in Subsection VI-A. As these are preliminary studies, their continuation can be expected. It will also be worth exploring other options, such as modifying the activation function and the number of network layers.
As the research has shown, the structure of the network and the nonlinear element play the greatest role; these are the issues that most affect the accuracy of the model. The method of training the neural network is the next most important factor, because each structure has its own preferences in this respect. Other aspects, such as the number of TDL delays or the number of neurons, do not dramatically change the quality indicators. It is therefore worth tuning them at the final stage of research, after the appropriate structure of the neural model has been selected.
Based on the tabular and graphical results from Section VI, classical nonlinear models and recurrent networks show similar quality of operation and similar accuracy in reflecting the behavior of the cement mill system. However, their quality is inferior to that of the NARX neural model in a series-parallel architecture based on a feed-forward network.
The next part of the research concerned the search for a rational method to support the control of the cement grinding process. The search for a neural controller benefited from a better understanding of the effectiveness of the architectures than was available during the preceding search for an appropriate structure of the grinding system model.
The author decided to focus on only one model, the most accurate one, which could offer the best results. This approach is understandable in preliminary research. The purpose of designing the neurocontroller was to check whether it could function as an expert system providing control signals to operators managing a given industrial facility. An additional, optional aim was to evaluate it against the selected energy index, motivated by the search for energy-saving control systems in a broad sense. The tests showed that a neurocontroller based on the NARX neural model (based on a feed-forward network) in a series-parallel architecture can improve energy indicators. The proposed expert system based on a neural controller is characterized by high accuracy, consistent with the actual decisions of experts.

IX. INSIGHTS INTO THE SUPERIORITY OF NARX NEURAL MODELS
Throughout our preliminary research on the identification and control of the complex cement grinding process, the use of various non-linear models, particularly the NARX neural model, has yielded promising results. An important observation from our study is the better performance of the NARX neural structure compared to the other methods.
The strong predictive capability of the NARX neural model can be attributed to its ability to capture complex non-linear dependencies within the multi-dimensional and non-stationary cement grinding process. In contrast to conventional linear models, the NARX neural model accommodates the dynamic interactions and intricate relationships inherent in the grinding system. Including external (exogenous) input data in the structure of the NARX neural model further increases its ability to adapt to diverse process dynamics.
The success of the NARX neural model in this specific application may be traced back to its intrinsic capacity to learn and adapt to the non-linearities present in the data, providing a more accurate representation of the grinding system. Moreover, the feedback loops inherent in NARX neural models, stemming from the recurrent connections, contribute significantly to capturing temporal dependencies crucial for modeling the time-varying characteristics of the cement grinding process.
As industries increasingly seek precise and efficient control strategies for their processes, our findings suggest that the choice of modeling technique is pivotal. The preference for the NARX neural model over alternative approaches is not only supported by empirical evidence but also aligns with the intrinsic characteristics of the cement grinding process. The ability of the NARX neural model to effectively handle non-linearities positions it as a promising tool for accurate identification and control in similar industrial processes.
In light of these findings, future research avenues could explore the specific aspects of the cement grinding process that contribute to the efficacy of the NARX neural model. Additionally, further investigations into the optimal configuration and parameter tuning of the NARX neural model could provide insights into maximizing its performance in real-world applications.
In conclusion, the superiority of the NARX neural model in our study underscores its potential as a valuable tool for advancing the precision and efficiency of cement grinding process control. The insights gained from this research pave the way for the development of more robust and effective control strategies in the broader domain of industrial processes.

X. CONCLUSION AND OPEN PROBLEMS
The aim of this work was to design a model of a cement mill facility operating in a closed system. A further aim was to propose a sensible and practical method of controlling the cement grinding process.
The authors reviewed the literature in detail and thoroughly examined the problems of identification in industry, neural networks and the cement grinding process. Existing solutions in this area were reviewed, as well as possible directions for further research. This made it possible to effectively undertake research towards designing a model of a cement mill facility operating in a closed system. Moreover, rational control of the cement grinding process was proposed.
The article presents the possibilities of using classical nonlinear models and nonlinear neural models for identifying and modeling the cement mill system. Moreover, a controller based on neural algorithms was proposed for reproducing expert decisions in controlling a grinding facility. The use of artificial intelligence and machine learning in industrial processes merits continued research and development. Neural networks can approximate a wide range of non-linearities, and their structure can be tuned using real data recorded during measurement experiments and everyday operation of such an object.
It is believed that this preliminary research lays the foundations for further work on the problem and for the search for an optimal solution in cement mill control and modeling. In the era of the ongoing fourth industrial revolution and the search for ever more innovative control methods, above all those based on artificial intelligence, such algorithms are valued and worth further research.
As part of the research, the authors first examined the cement mill system in a closed architecture for several days during its actual, daily operation. The data that could be used to design the model and controller were then archived. After analyzing the literature, classical and neural models worth examining were selected. It was also decided to assess the quality of nonlinear models. This yielded several dozen classical and neural models, tested by computer simulation and thoroughly assessed. After the best structure had been selected, a controller was proposed that would suggest control decisions and be able to improve energy indicators.
The designed models were verified on data sets from real operating conditions of the system. These sets were different from those used to create the models. Moreover, the models take into account the non-stationarity of the object and parameter fluctuations and are based on many records collected from the real process. Data from several days of mill operation (approx. 11.5 days) were also tested. This resulted in the most comprehensive model possible. Everything was tested using selected quality indicators.
A method of using a neural model to implement a neurocontroller, which is expected to act as an expert system when controlling the cement grinding process in a closed loop, was also proposed and examined. Its task is to support the operator's decisions. For this purpose, the most accurate model from the research phase on the identification of the cement mill system was used (Section VI). This enabled the identification of an inverse model for control purposes. The controller was tested using quality indicators, and its ability to reduce control energy indicators was checked. It is also resistant to the disturbances and non-stationarity occurring in cement grinding, which follows, among other things, from the tests performed on an extensive data set.
The method based on artificial neural networks and nonlinear models has many advantages. First of all, it enables precise modeling and the search for a rational method of controlling the cement grinding process, taking into account its non-linear nature. Additionally, the ability to fine-tune the network structure based on actual measurement data allows for better results in practical applications.
The work shows that important properties of neural networks can be successfully used to model even such complex, non-stationary, multi-dimensional and non-linear objects as cement mills. The identification of this object as part of preliminary research was successful and gives grounds for optimism about the next phase of searching for an appropriate structure. The most important issue, i.e. the search for an appropriate method of controlling a given process, also provides enormous scope for further research. It was confirmed that the controller based on the neural model is able to approximate the control signals given by experts on a real object. It is also able to slightly reduce the consumption of the raw materials fed to the mill inlet. In the future, it should also be checked how it affects the energy indicators of the facility itself, including the electricity consumption of the mill and of the tower bucket conveyor that operates in this system. An important role in improving the energy indicators of the controller was played by collecting control signals set by the best, highly experienced facility operators.
The work thus leaves several open problems. They mainly concern the modeling of a cement mill at selected operating points, for various types of cement and taking into account various model structures. This translates into the search for various types of control methods, both for selected operating points and as expert systems that provide operators with control signals. Further research will address this issue and may result in practical implementation in a real facility in an industrial plant.
It is planned that the algorithms proposed in the article will be applied in practice in cement mill systems in the cement industry. Their intended use is as an expert system supporting operators' decisions in the control and prudent management of the cement grinding process.
To sum up, preliminary research into the identification and control of the cement grinding process using artificial neural networks and other nonlinear models has shown promising results. This method can contribute to improving the efficiency and quality of the cement grinding process, which is important in the industry in the era of the ongoing industrial revolution and the development of artificial intelligence in many applications.

FIGURE 3. The cement mill examined in this article in its actual form (based on [1]).
corresponds to the nonlinear state equation showing how the current values of state variables v(k) depend on the previous values of state variables v(k − 1) and the previous values of input signals u(k − 1). Equation 6 is the nonlinear output equation, illustrating how the current values of output signals y(k) depend on the current values of state variables v(k).

FIGURE 11. Comparison of actual measurements with those predicted by the NARX neural model (based on [1]).

FIGURE 13. Comparison of actual measurements with those predicted by the NARX neural controller (based on [1]).

TABLE 1. Comparison of tests for research on the neural model.

TABLE 2. Comparison of tests for research on the neural model.

TABLE 3. Test results of the neurocontroller based on the NARX model.

Table 3
presents a compilation of tests conducted as an outcome of the investigation into the NARX model, which is visually represented in Figure