Task-Relevant Encoding of Domain Knowledge in Dynamics Modeling: Application to Furnace Forecasting From Video

Waste incineration plants are complex dynamical systems that rely on expert human operators to maintain steady combustion by observing real-time in-chamber video feeds. Real-time plant forecasting provides vital operational support in decision making, and applying machine learning to automatically learn dynamics forecast models from video feeds is an attractive means to realise this. However, learning complex dynamics in systems that require cost-efficiency remains an open research problem. Specifically, modelling plant dynamics in real-time is challenging due to uncertainties caused by inhomogeneous waste inputs, which call for complex learning that in turn impedes real-time modelling. To address this, this paper presents a real-time data-driven framework for generating video forecasts by incorporating task-relevant domain knowledge during learning. Specifically, this method combines dynamics modelling and forecasting using dynamic mode decomposition with Fourier transformations informed by expert operator heuristic knowledge, which encode task-relevant frequency information inside the learning process. Experiments in this paper demonstrate that the proposed framework captures intuitive physical aspects of the underlying physiochemical process, with a greatly reduced computational runtime in comparison to standard approaches, allowing for application in real-time domains. Forecasted video predictions are accurate over short time horizons, and capture important system characteristics over longer time periods.


I. INTRODUCTION
Municipal industrial waste-incineration plants are an increasingly popular alternative to landfills [1] for waste disposal and energy recuperation [2]. While prevention and minimization are key to waste management, international policy objectives highlight the need for improving the environmental performance of installations [3], often via innovative technologies and automation [4]. Incineration automation is particularly desired to ensure consistent environmental and energy outcomes [5].
A key operational objective in waste-incineration automation is real-time forecasting of long-term behavioural trends within a combustion chamber (e.g., combustion growth/decay over a period), to assess stability and ensure consistency of energy output. Currently, manual forecasting is performed by operators, who use in-chamber video feeds to make speculative estimates of changes in the dynamical process, supported by simple automated solutions such as image classification [6]. Automated methods for modelling combustion and generating real-time video forecasts of future chamber behaviour can help support operator decision making and provide quantitative combustion assessments.
The associate editor coordinating the review of this manuscript and approving it for publication was Yi Zhang.
Traditional plant modelling and forecasting (e.g., in manufacturing or energy production domains) often uses first-principle analytical models, with detailed understanding of the known physiochemical process and prior identification of control parameters and process laws [7]. However, waste-incineration plants present challenges not traditionally addressed, as combustion involves dynamical behaviour that is complex to model, with high uncertainty about the composition of the waste input. Specifically, inhomogeneous waste inputs [8] (e.g., uncertainty in qualitative characteristics such as calorific and moisture content of waste [9], [10]) cannot be determined a priori, and result in combustion events that are challenging to model, e.g., flame stagnation.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In the absence of detailed analytical models addressing uncertainties, an intuitive forecasting approach is to use data-driven machine learning to automatically learn models of combustion behaviour from the video feed. Specifically, in the context of forecasting long-term behavioural trends, spatio-temporal decomposition methods are well-suited to explicitly decompose high-dimensional time-series measurements into coherent spatial structures and corresponding time dynamics [11], and as such can be used to extract task-relevant dynamics from video feeds, i.e., extract components of a video associated with known dynamical frequencies.
However, naive application of spatio-temporal methods (e.g., dynamic mode decomposition (DMD) [12]) to real-time video systems is challenging, due to the high computational cost and complexity. Real-time implementations of DMD with reduced computational cost (e.g., Streaming DMD [13]) unfortunately necessitate projecting measurements to a significantly low-rank subspace, which is challenging in situations involving non-autonomous (input-driven) uncertain dynamics, such as those caused by waste inhomogeneity. In addition, complexity and uncertainty in dynamics can result in unstable learnt dynamics, requiring complex modelling often via deep learning [14], [15].
As such, practical means for applying dynamical learning techniques in domains that require both cost-efficiency and intuitive modelling remain an open research question.
To address this, this paper proposes the Task-Relevant Encoding of Domain Knowledge (TREK) framework for the real-time modelling and forecasting of dynamical systems from high-dimensional, long-term measurements with uncertainties, by incorporating task-relevant domain knowledge into the learning process. Specifically, cost-efficiency is achieved by encoding expert heuristic knowledge about the dynamical system (e.g., combustion decay rates) as part of the decomposition process (FIGURE 1), thereby learning only dynamic modes relevant to the task. This approach exploits the relationship between Fourier transformations and dynamic mode decomposition to preprocess measurements with uncertain dynamics into a form amenable to decomposition. Specifically, a Fourier spectrum manipulation is used to extract task-relevant frequency components from a sequence, which are then mapped back to the input space as a low-rank, task-relevant approximation. As such, the Fourier transformation converts high-rank measurements into a low-rank coordinate system amenable to decomposition with DMD, while remaining agnostic about the complexity or uncertainty of the underlying system. By exploiting this relationship, dynamic modes can be quickly learnt that capture intuitive physical interpretations of task-relevant combustion behaviour. Results in this paper demonstrate that this framework learns interpretable models that capture important physiochemical characteristics of combustion, and can generate video forecasts in real-time, which remain accurate within a reasonable forecast horizon.
The key contributions of this paper are as follows:
(i) Identification of limitations: This paper presents first evidence that current spatio-temporal methods are unable to model complex dynamics in the presence of uncertainties in real-time, severely limiting applications in real-world scenarios.
(ii) Novel real-time framework: A computationally efficient framework is presented to address these limitations, by exploiting spectral manipulation to reduce learning complexity. This framework is agnostic to the complexity of the underlying dynamics, and can provide robust and real-time dynamical learning for real-world systems.
(iii) Evaluation and results: This framework is evaluated in comparison to commonly used data-driven dynamics learning methods, in (a) a high-dimensional signal simulation example, and (b) a dynamics learning task using real in-furnace video data. The framework extracts intuitive dynamical characteristics of the furnace in real-time, with potential for application to other fields that require time-efficient dynamical learning.
The remainder of the paper is organised as follows: §II presents a background of operational control for industrial waste incineration plants and data-driven dynamical modelling approaches, §III outlines prior work in extracting intuitive dynamic models via spatio-temporal decomposition, and §IV presents the novel approach of exploiting spectral manipulation for performing decompositions in real-time. §V presents an experimental evaluation of the approach, both in an intuitive simulation example, and applied to a long-term industrial furnace video feed. Finally, a discussion and future applications are presented in §VI and §VII.

II. RELATED WORK
In the following section, the role of human-expert knowledge in industrial furnace control is identified, as well as prior state-of-the-art machine learning methods for modelling dynamic behaviour.

A. INDUSTRIAL FURNACE CONTROL AND FORECASTING
To motivate the proposed approach, consider a general overview [16] of the standard waste-incineration process (FIGURE 2), whereby: (a) waste is inserted into a feeder via a separate crane [8], and released into the chamber, (b) actuated staircase grates slowly agitate and transport waste [17], (c) ignition and combustion take place in a central region, (d) monitoring and control are typically performed by PID controllers using in-furnace sensors [18], with refined manual control performed by human operators monitoring in-furnace video cameras.
While this process seems relatively straightforward and autonomous, operators are vital in maintaining combustion stability by monitoring in-chamber video and periodically overriding process controls (e.g., actuation speed or airflow) to ensure consistent uniform combustion on the staircase. A key factor that affects control decisions is the large lag time inherent in the system, often of the order of tens of minutes between observations and system state changes. Specifically, this delay is caused by numerous inherently slow processes, e.g.: (i) the slow agitation of the solid waste along the staircase that is necessary to ensure consistent and complete combustion, or (ii) variance in ignition and combustion dynamics caused by waste moisture and composition.

B. OPERATOR DOMAIN KNOWLEDGE
To maintain stable combustion given these delays, it is imperative that operators accurately anticipate changes in combustion behaviour ahead of time, and provide control inputs accordingly. As such, operators develop forecasting heuristics to foresee long-term furnace state changes. This approach enables a generalised comprehensive outlook on the slow, delay-induced chamber dynamics, while being unconcerned with insignificant short-term events such as local waste movement. Specifically, operators are attentive to factors that induce long-term behaviour seen in the video feed over a period of 5-20 minutes, e.g.: (i) whether waste fails to incinerate completely within the time period, indicating either an inadequate or oversaturated feed rate, (ii) changes in the fire strength (e.g., change in colour, peak height or distribution of flames).
These heuristics can be described as task-relevant domain knowledge. They are directly related to specific aspects of furnace state behaviour, and as such will vary depending on the combustion scenario. This knowledge of long-term dynamical trends is quantified and collated as the set of task-relevant domain knowledge, $\tau = \{\tau_1, \tau_2, \ldots, \tau_D\} \in \mathbb{R}^{D}$, that the operator employs when forecasting (e.g., the approximate time for waste to traverse the system is $\tau_d = 10$ minutes). Operators compare this knowledge against current observations to inform which control operations should be applied ahead of time to ensure stability. While this knowledge is only approximate (given variations in waste content), it provides key temporal information for forecasting, and is crucial for informing operational decisions in advance of furnace state changes.
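As a small illustration of how such timescales might be turned into the frequency information used later for spectral selection, the following Python sketch converts operator timescales in minutes into frequencies in Hz. The conversion (frequency as the reciprocal of the timescale), the function name `timescale_to_frequency_hz` and the list of values are assumptions made for illustration; only the 10-minute traversal time and the 5-20 minute observation window come from the text.

```python
def timescale_to_frequency_hz(minutes):
    """Frequency (Hz) of a process that completes one cycle in `minutes`.

    Assumption: a heuristic timescale T maps to the frequency 1 / T.
    """
    if minutes <= 0:
        raise ValueError("timescale must be positive")
    return 1.0 / (minutes * 60.0)


# Hypothetical operator heuristics in minutes. The 10-minute traversal time
# is quoted in the text; 5 and 20 are the bounds of the observation window.
tau_minutes = [5.0, 10.0, 20.0]
tau_hz = [timescale_to_frequency_hz(m) for m in tau_minutes]

for minutes, hz in zip(tau_minutes, tau_hz):
    print(f"{minutes:5.1f} min -> {hz:.6f} Hz")
```

Under this assumption, every heuristic in the 5-20 minute window corresponds to a frequency far below that of short-term events such as flame flicker, which is why a low-frequency selection is a natural encoding of $\tau$.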

C. DATA-DRIVEN MODELLING
To help support operator decisions, automating this speculative heuristic forecasting can provide numerous operational advantages, including: (i) using forecasts of expected combustion changes ahead of time to ensure furnace efficiency, (ii) a quantitative means of assessing different control strategies, (iii) the ability to incorporate models into simulations for offline operator training.
In the absence of detailed understanding of the physiochemical system, and without explicit control laws due to waste inhomogeneities, an appealing approach to automating forecasting is to use machine learning methods to model the dynamics from the available video data, and use learnt models to generate videos of forecasted future behaviour [19]. While traditional stochastic models are often used for forecasting (e.g., vector autoregressive moving average (VARIMA) [20]), these do not generalise well to high-dimensional complex inputs [21]. As such, prior work forecasting industrial furnace dynamics [22], [23] is generally limited to multivariate, but low-dimensional forecasting, and is unsuitable for generating high-dimensional video forecasts. To address this, state-of-the-art methods for generating video forecasts are often deep-learning based [24], [25], and specifically for learning combustion dynamics this includes recurrent and convolutional neural network approaches [19], [26]. These methods are subject to the standard deep-learning challenges, e.g., lengthy model training time and large sample requirements. However, an additional key deficiency is the lack of interpretability in models (known as the black box problem). Learning interpretable models is of paramount importance [27], not only for furnace safety and integration with existing tools [28], but also to enable the desired aim of extracting frequency components corresponding to the long-term dynamics, while ignoring short-term anomalies. Standard dynamical modelling approaches that learn black box models of the dynamics from video [19] require equally complex post-hoc analysis to derive physically intuitive meanings of the model structure. As such, a key challenge is designing a modelling framework that provides both accurate generated video forecasts and intuitive coherent outputs that explain the underlying physiochemical processes in terms of the dynamical behaviour.

III. PRELIMINARIES
To address the problems of dynamics modelling from video, a pertinent approach is spatio-temporal decomposition, which explicitly extracts intuitive spatial structures and corresponding time dynamics from high-dimensional measurement time-series. These structures provide clear physical interpretations of the underlying process, well-suited to the application goals of data-driven plant modelling. As opposed to traditional machine learning methods, which require numerous samples for generalised modelling, decomposition requires only a single sample sequence from which to infer dynamics, making it far more suitable for data-limited environments. Decomposition has previously been applied to many tasks involving complex dynamical systems including jet [29] and nuclear reactor analysis [30], soft robot identification [31], epidemiology [32], and even financial trading [33].
Specifically, given $K$ time-sequential measurements $X \in \mathbb{C}^{P \times K}$ from a dynamic system of the form $dx/dt = f(x, t)$, spatio-temporal decomposition describes this system as a superposition of empirically estimated basis vectors, known as the dynamic modes [34]. While outside the scope of this paper, the interested reader is referred to [35] for a formal rigorous definition, and [11] for illustrative concrete examples.
In the following, a brief outline of dynamic mode decomposition (DMD) [12] is presented for extracting these modes. For detailed derivations and additional implementations, the reader is referred to the following well-established texts [11], [34], [35].

A. DYNAMIC MODE DECOMPOSITION
Dynamic mode decomposition is a common approach to estimating dynamic modes, by assuming that dynamics are driven by a linear operator $A \in \mathbb{C}^{P \times P}$, where:
$$x_{k+1} = A x_k \tag{1}$$
This is commonly described as finding the operator $A$ that minimises the error between two snapshot matrices [11]:
$$A = \arg\min_{A} \lVert X' - A X \rVert_F \tag{2}$$
which can be solved in a least-squares sense by:
$$A = X' X^{\dagger} \tag{3}$$
where $X = [x_1, \ldots, x_{k-1}]$, $X' = [x_2, \ldots, x_k]$, $\dagger$ denotes the Moore-Penrose pseudoinverse, and $\lVert \cdot \rVert_F$ is the Frobenius norm. In the context of video processing, a measurement $x_k$ can be a tall vector consisting of RGB values for each pixel in a frame. Given this linear dynamics form, it is assumed that measurements at time $t_k$ are represented as a Fourier-like expansion of $M$ spatio-temporal structures $\phi_m \in \mathbb{C}^{P}$ (also known as dynamic modes), characterised by growth rates $\delta$ and frequencies $\omega$ [36]:
$$x(t_k) = \sum_{m=1}^{M} b_m \phi_m e^{(\delta_m + i\omega_m) t_k} \tag{4}$$
where $b_m$ is the amplitude of the corresponding mode. This is more commonly expressed in discrete form as [11]:
$$x_k = \Phi \Lambda^{k} b \tag{5}$$
where $\Phi = [\phi_1, \ldots, \phi_M]$ collects the dynamic modes, $b = [b_1, \ldots, b_M]^{\top}$ collects the amplitudes, and $\Lambda = \lambda I_M \in \mathbb{C}^{M \times M}$ is a matrix of eigenvalues that define the dynamical time evolution of the dynamic modes as growth/decay and oscillation frequencies respectively [34]. As such, the goal of DMD is to find these $M$ dynamic modes via an eigendecomposition of $A$.
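To make the least-squares estimate of the linear operator concrete, the following Python sketch fits $A$ from a snapshot sequence using the pseudoinverse and reads growth/decay and oscillation rates from its eigenvalues. The toy system (a damped two-dimensional rotation), the helper name `fit_linear_operator` and all numeric values are assumptions chosen for illustration, not the paper's data.

```python
import numpy as np


def fit_linear_operator(snapshots):
    """Least-squares estimate of A such that x_{k+1} is approximately A x_k.

    snapshots: array of shape (P, K), one measurement per column.
    """
    X = snapshots[:, :-1]       # all measurements except the last
    X_next = snapshots[:, 1:]   # the same measurements shifted by one step
    return X_next @ np.linalg.pinv(X)


# Hypothetical toy system: a slowly decaying rotation in the plane.
theta, decay = 0.3, 0.98
A_true = decay * np.array([[np.cos(theta), -np.sin(theta)],
                           [np.sin(theta), np.cos(theta)]])
x = np.array([1.0, 0.0])
columns = []
for _ in range(50):
    columns.append(x)
    x = A_true @ x
snapshots = np.stack(columns, axis=1)

A_est = fit_linear_operator(snapshots)
eigvals = np.linalg.eigvals(A_est)
growth = np.log(np.abs(eigvals))   # per-step growth (+) or decay (-) rate
freq = np.angle(eigvals)           # per-step oscillation frequency in radians
```

For a video, $P$ is the number of pixel values per frame, so forming the full $P \times P$ operator in this way quickly becomes impractical, which motivates the low-rank formulation that follows.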
Practically, finding this eigendecomposition is computationally challenging, and commonly a low-rank approximation $\tilde{A} \in \mathbb{C}^{N \times N}$ is sought instead. Specifically, proper orthogonal decomposition (POD), also known as principal component analysis (PCA), is first performed to find a low-rank subspace spanned by the first $N$ highest energy spatial modes. POD spatial modes are not the same as the spatial structures of DMD dynamic modes, as POD modes are extracted solely based on sequence energy, ignoring dynamics. Given these highest energy spatial modes $\tilde{U} \in \mathbb{C}^{P \times N}$, a low-rank approximation $\tilde{X}$ of the data is given by the singular value decomposition [37]:
$$X \approx \tilde{X} = \tilde{U} \tilde{\Sigma} \tilde{V}^{*} \tag{6}$$
where $\Sigma$ is a diagonal matrix of the singular values of $X$, $U$ and $V$ are matrices of the left and right singular vectors respectively, and the tilde denotes truncation to the first $N$ components.
Selecting the number $N$ of columns of $U$ to use as the basis $\tilde{U}$ is often performed by evaluating the singular values in $\Sigma$ against some threshold. This paper follows the criterion of [36], in which the singular values of $X$, sorted in decreasing order, are evaluated against a tunable threshold $\varepsilon_1$.
Given the low-rank POD modes $\tilde{U}$, this defines the subspace on which the low-rank operator $\tilde{A}$ evolves the dynamics:
$$\tilde{A} = \tilde{U}^{*} A \tilde{U} \tag{8}$$
and the goal becomes finding an eigendecomposition of $\tilde{A}$. Note that it is commonly assumed there is a one-to-one correspondence between the number of spatial and spectral modes, i.e., that the number of POD modes (the spatial complexity $N$) is equal to the number of dynamic modes (the spectral complexity $M$). For further details on dealing with situations not of this form, see [36].
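The low-rank procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the widely used exact-DMD identity that the projected operator equals $\tilde{U}^{*} X' \tilde{V} \tilde{\Sigma}^{-1}$, where $X'$ is the snapshot matrix shifted by one step; the function name `reduced_dmd`, the synthetic three-mode data and the fixed rank are assumptions for illustration, and the rank is passed in directly rather than chosen by a threshold.

```python
import numpy as np


def reduced_dmd(snapshots, rank):
    """Rank-truncated DMD of a (P, K) snapshot matrix.

    Returns (modes, eigvals, amplitudes) so that snapshot k is approximately
    modes @ (amplitudes * eigvals**k), with k = 0 for the first column.
    """
    X, X_next = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U_r, s_r, V_r = U[:, :rank], s[:rank], Vh[:rank].conj().T
    # Projected operator on the POD subspace (rank x rank).
    A_tilde = (U_r.conj().T @ X_next @ V_r) / s_r
    eigvals, W = np.linalg.eig(A_tilde)
    # Lift the eigenvectors back to the measurement space.
    modes = ((X_next @ V_r) / s_r) @ W
    # Amplitudes that best explain the first snapshot.
    amplitudes = np.linalg.lstsq(modes, snapshots[:, 0], rcond=None)[0]
    return modes, eigvals, amplitudes


# Hypothetical data: three complex modes with known growth rates/frequencies.
rng = np.random.default_rng(0)
P, K, dt = 64, 100, 0.05
true_exponents = np.array([-0.1 + 1.0j, 0.0 + 3.0j, -0.5 + 6.0j])
true_eigvals = np.exp(true_exponents * dt)
true_modes = rng.standard_normal((P, 3)) + 1j * rng.standard_normal((P, 3))
steps = np.arange(K)
snapshots = true_modes @ (true_eigvals[:, None] ** steps)

modes, eigvals, amplitudes = reduced_dmd(snapshots, rank=3)
reconstruction = modes @ (amplitudes[:, None] * eigvals[:, None] ** steps)
```

Because the data here has exactly three modes, choosing `rank=3` recovers the eigenvalues; with a rank that is too small, the projected operator no longer matches the true dynamics.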

B. STREAMING DMD
In the context of real-time systems, DMD can be prohibitively computationally expensive. Specifically, for a measurement sequence $X \in \mathbb{C}^{P \times K}$, computing the low-rank basis (6) involves a singular value decomposition (see TABLE 1), which is computationally infeasible for real-time systems involving high-resolution, long-term video sequences where both $P$ and $K$ are very large.
Streaming DMD (SDMD) [13] is well suited for large values of $P$ or $K$, as it bypasses the need for computationally expensive SVD operations to find the low-rank subspace. Specifically, an iterative Gram-Schmidt approach to the low-rank approximation of (6) is used, which is shown to be equivalent to standard DMD. The interested reader is referred to the derivations in [13].
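The basis-building idea behind an iterative Gram-Schmidt approach can be sketched as follows. This is a simplified illustration of growing an orthonormal basis one snapshot at a time, not the full streaming DMD algorithm of [13], which also updates the projected operator; the function name `streaming_basis`, the tolerance and the random test data are assumptions.

```python
import numpy as np


def streaming_basis(snapshot_iter, tol=1e-8):
    """Grow an orthonormal basis incrementally from a stream of snapshots.

    A snapshot extends the basis only if its component outside the current
    span is larger than `tol` relative to its own norm.
    """
    Q = None
    for x in snapshot_iter:
        x = np.asarray(x, dtype=complex)
        norm_x = np.linalg.norm(x)
        if Q is None:
            if norm_x > tol:
                Q = (x / norm_x)[:, None]
            continue
        # Remove the part of x already explained by the basis, twice over
        # to limit loss of orthogonality in floating point.
        residual = x - Q @ (Q.conj().T @ x)
        residual = residual - Q @ (Q.conj().T @ residual)
        norm_r = np.linalg.norm(residual)
        if norm_r > tol * norm_x:
            Q = np.hstack([Q, (residual / norm_r)[:, None]])
    return Q


# Hypothetical rank-3 data: 200 snapshots of dimension 50.
rng = np.random.default_rng(3)
data = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 200))
Q = streaming_basis(data.T)   # iterating over data.T yields the columns
```

Each update costs on the order of $P$ times the current basis size, so the approach is cheap only while the basis stays small, which is the condition discussed in the limitations below.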

C. EXTRACTING TASK-RELEVANT DYNAMIC MODES
Given a learnt decomposition approximating the global dynamics (§III-A), further analysis can be performed to extract only modes associated with known frequencies. Prior domain knowledge $\tau$ of the system (§II-B) can be used to select only task-relevant dynamics, known as task-relevant dynamic modes, from the global model in a post-hoc manner, i.e., $\{\Phi_\tau, \Lambda_\tau\} \subseteq \{\Phi, \Lambda\}$. Reconstructing a dynamical system using only these task-relevant modes, as outlined in FIGURE 3, is formally given as the linear reconstruction of these task-relevant terms:
$$\hat{x}_k = \Phi_\tau \Lambda_\tau^{k} b_\tau \tag{9}$$
where $\Phi_\tau$, $\Lambda_\tau$ and $b_\tau$ are the retained modes, eigenvalues and amplitudes. In the context of long-term furnace forecasting, these task-relevant modes are dynamic modes associated with task-relevant changes in the furnace dynamics, e.g., long-term (low frequency) changes in combustion. In summary, this approach first decomposes the entirety of the global dynamics with a DMD eigendecomposition, and then performs post-hoc extraction of task-relevant frequencies by using domain knowledge $\tau$ to identify locally relevant components.
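A minimal sketch of this post-hoc selection is given below, assuming a decomposition (modes, eigenvalues, amplitudes) is already available and that "task-relevant" means an oscillation frequency at or below a cutoff. The helper names, the cutoff and the hand-built decomposition are hypothetical and serve only to show the mechanics.

```python
import numpy as np


def select_task_relevant(modes, eigvals, amplitudes, dt, max_freq_hz):
    """Keep only the modes whose oscillation frequency is within the cutoff."""
    freqs_hz = np.angle(eigvals) / (2.0 * np.pi * dt)
    keep = np.abs(freqs_hz) <= max_freq_hz
    return modes[:, keep], eigvals[keep], amplitudes[keep]


def reconstruct(modes, eigvals, amplitudes, n_steps):
    """Linear reconstruction: snapshot k = modes @ (amplitudes * eigvals**k)."""
    k = np.arange(n_steps)
    return modes @ (amplitudes[:, None] * eigvals[:, None] ** k)


# Hypothetical global decomposition: slow (+/-1 Hz) and fast (+/-20 Hz) modes.
rng = np.random.default_rng(4)
dt, n_steps = 0.01, 300
mode_freqs_hz = np.array([-20.0, -1.0, 1.0, 20.0])
eigvals = np.exp(2j * np.pi * mode_freqs_hz * dt)
modes = rng.standard_normal((32, 4)) + 1j * rng.standard_normal((32, 4))
amplitudes = np.ones(4, dtype=complex)

modes_tau, eigvals_tau, amps_tau = select_task_relevant(
    modes, eigvals, amplitudes, dt, max_freq_hz=2.0)
x_tau = reconstruct(modes_tau, eigvals_tau, amps_tau, n_steps)
```

The selection itself is cheap; the cost of this route lies in having to compute the full decomposition before any mode can be discarded.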

D. LIMITATIONS
However, while post-hoc extraction (§III-C) of dynamic modes via either standard or streaming-based DMD is appealing, these methods suffer from a number of key problems that limit application in complex real-time systems such as furnace modelling, specifically:
(i) Inefficiency of decomposition: As outlined in §III-C and FIGURE 3, the standard DMD approach first decomposes the global dynamics, and then selects the task-relevant dynamic modes by consulting $\tau$. This all-inclusive approach introduces inefficiencies, and does not allow for selective decomposition of only the relevant dynamic modes based on $\tau$. Alternative formulations such as Bayesian priors [39], [40] can be used to bias learning in favour of known task-knowledge, however these often involve computationally expensive Monte Carlo sampling, making them ill-suited to this domain.
(ii) Uncertain dynamics: In the context of furnace video decomposition, the underlying physiochemical combustion processes are non-autonomous, with inhomogeneous input waste driving the dynamics. While the DMD formulation assumes a linear dynamical system (4), the relationship to Koopman spectral analysis [34], [41], [42] allows for learning non-linear dynamical systems, either via approximation to a linear system with the standard DMD approach or via the use of embedding functions to explicitly linearise the dynamics in a higher-dimensional space [14], [43]. However, linear approximation can be unstable and provide poor forecasting, and finding appropriate embedding functions for real-world systems is challenging, often requiring complex deep learning [14], [15] even for relatively simple non-linear systems.
(iii) Computation speed of Streaming DMD: While iterative approaches to DMD greatly reduce the computational complexity (as seen in TABLE 1), this speed benefit is conditional on $N \ll \min(K, P)$, i.e., on there being a low-rank subspace that can be easily found via iteration. In the context of real-world systems, it is often not easy to find this low-rank subspace [44], with factors such as noise [36] or rank-deficiency [34] either requiring a large $N$, or producing spurious results that require alternative (computationally expensive) DMD formulations such as delay-embeddings [36].
As such, real-time modelling of complex systems such as furnace dynamics is challenging even with iterative streaming approaches. The uncertainty of the system introduces an inherent contradiction: the task requires speed, yet overcoming the uncertainty requires computationally expensive methods.

IV. METHOD
To address the problem of real-time learning of dynamical systems, this paper presents a method for approximating uncertain dynamical systems within an efficient low-rank framework, by exploiting the relationship between spectrum manipulation with Fourier transformations and DMD, as outlined in FIGURE 3.
[FIGURE 3: Step 1) Preprocess via a low-cost FFT to find $D$ bases, with one-to-one correspondence to terms in $\tau$. Step 2) DMD characterises only the $D$ terms related to the task-relevant dynamics, with no need for additional post-processing. Output: Given task-relevant dynamic modes, forecasts are computed for any $t$ timesteps into the future.]

A. OVERVIEW
As a high-level overview, this framework (i) uses a discrete Fourier transformation as an alternative to POD preprocessing, to instead decompose the sequence into a set of spectral Fourier components, (ii) uses spectrum manipulation to extract only task-relevant frequencies defined by τ , thereby approximating a low-rank spectral subspace of rank D < N , (iii) performs streaming DMD on data in this low-rank subspace to learn real-time decomposition models.
In the context of DMD, this approach can be seen as analogous to the standard preprocessing approach of finding $N$ maximum energy spatial (POD) modes (6), except that in this case the aim is to find $D$ spectral (FFT) modes at the frequencies defined by the task-knowledge $\tau$. As such, both the computational advantages of FFT over POD, and the ability to selectively extract relevant Fourier terms, enable this framework to be both computationally efficient and embeddable with task-relevant knowledge.

B. METHOD
To illustrate this approach, without loss of generality consider the standard application of a discrete Fourier transform (DFT) to a one-dimensional time-series $x \in \mathbb{C}^{K}$ sampled at rate $f_s$. A discrete Fourier transform decomposes this sequence into $J = K$ frequency components, with sample frequencies $s = [0, 1, 2, \ldots, (K-1)]\, f_s / K$, giving Fourier terms $y \in \mathbb{C}^{J}$. Each term $y_j$ is expressed as a weighted summation of all elements in $x$, with corresponding trigonometric dynamics [45]:
$$y_j = \sum_{k=0}^{K-1} x_k e^{-i 2\pi k j / J} \tag{10}$$
with inverse transformation:
$$x_k = \frac{1}{J} \sum_{j=0}^{J-1} y_j e^{i 2\pi k j / J} \tag{11}$$
As such, (10) shows that any sequence can be expressed in $J$ discrete terms $y_j$, with each term driven by a complex cosine/sine pair. In the context of DMD, this has a clear relationship to the linear dynamical system in (4), and as such this sequence is perfectly expressible with a linear operator with $J$ complex eigenvalues. This relationship has previously been analysed in [46], where it was shown that under certain conditions applying DMD to mean-subtracted linearly independent datasets is equivalent to the DFT. In addition, this is a trivial solution to Koopman's theory in $J$-dimensional space [47], which guarantees that the sequence is decomposable.
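As a quick numerical check of the transform pair, the following sketch evaluates the forward and inverse sums directly and compares them with NumPy's FFT routines. The function names `dft` and `idft`, the test signal and the sampling rate are assumptions for illustration; the direct sums cost $O(K^2)$ and are only meant to expose the definition, not to replace the FFT.

```python
import numpy as np


def dft(x):
    """Direct evaluation of y_j = sum_k x_k exp(-i 2 pi k j / K)."""
    K = len(x)
    k = np.arange(K)
    j = k[:, None]                      # one row per output term y_j
    return (x * np.exp(-2j * np.pi * j * k / K)).sum(axis=1)


def idft(y):
    """Direct evaluation of x_k = (1 / J) sum_j y_j exp(+i 2 pi k j / J)."""
    J = len(y)
    j = np.arange(J)
    k = j[:, None]                      # one row per output sample x_k
    return (y * np.exp(2j * np.pi * k * j / J)).sum(axis=1) / J


# Hypothetical signal: 2 s sampled at 50 Hz with a slow and a fast component.
fs, K = 50.0, 100
t = np.arange(K) / fs
x = np.cos(2 * np.pi * 1.0 * t) + 0.3 * np.sin(2 * np.pi * 12.0 * t)

y = dft(x)
x_back = idft(y)
# Bin frequencies in Hz; NumPy reports the upper half as negative frequencies.
bin_freqs = np.fft.fftfreq(K, d=1.0 / fs)
```

Each term `y[j]` is a single complex exponential in time, which is the sense in which the Fourier terms resemble the eigenvalue-driven dynamics of a linear system.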
However, while (10) states that the sequence is driven by a linear dynamical system in a $J$-dimensional space (equivalent to a $K$-dimensional space, as all terms $y_j$ are used), this is not practically useful, as DMD requires a low-rank (linear) subspace of dimension $N < K$ on which the linear operator $\tilde{A}$ (8) acts.
To address this, note that the Fourier bases are orthogonal, and therefore the inverse transformation (11) using only a subset of $D < J$ terms will also remain linear. Performing the inverse transformation using $D$ terms results in a new measurement sequence perfectly expressible in a $D$-dimensional space. As such, this approach is amenable to using the task-relevant domain knowledge $\tau \in \mathbb{R}^{D}$ to select only the subset of coefficients relevant to the task at hand, forming a spectrally-rank-reduced sequence $\tilde{x}_\tau \in \mathbb{C}^{K}$. Specifically, given task-relevant frequency information $\tau = \{\tau_1, \tau_2, \ldots, \tau_D\}$, the inverse FFT computation can be performed as:
$$\tilde{x}_{\tau,k} = \frac{1}{J} \sum_{j=0}^{J-1} \zeta\, y_j e^{i 2\pi k j / J} \tag{12}$$
where $\zeta$ is an indicator function determining if a specific term $y_j$ should be included in the reconstruction. For example, a low-pass filter could be given as:
$$\zeta = \begin{cases} 1 & \text{if } s_j < \tau \\ 0 & \text{otherwise} \end{cases} \tag{13}$$
indicating that if the corresponding sample frequency $s_j$ of this $y_j$ is less than a desired task-relevant term $\tau$, it should be included in the reconstruction. As such, the number of dynamic modes $N$ used for the DMD computation is given by the number of terms $y_j$ where $\zeta = 1$. By incorporating task-relevant frequency information $\tau$, the sequence $X$ is transformed to a new sequence $\tilde{X}$ that is guaranteed to be linear in terms of $D < K$ frequency components, and as such is a linear solution in $D$ dimensions. In the context of furnace modelling from video, this allows for the extraction of key long-term characteristics which are indicative of incineration state changes, and the discarding of unnecessary frequencies such as flame flickering.
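The spectral selection step can be sketched with NumPy's FFT applied along the time axis of a measurement matrix. The function name `spectral_select`, the two-component test data and the 2 Hz cutoff are assumptions for illustration; the indicator is written on the absolute frequency because NumPy reports the upper half of the spectrum as negative frequencies, which is an implementation choice rather than something stated in the text.

```python
import numpy as np


def spectral_select(X, fs, indicator):
    """Keep only the Fourier terms for which `indicator` is true.

    X: (P, K) measurements, one column per time step, sampled at rate fs.
    indicator: maps an array of bin frequencies (Hz) to a boolean mask.
    Returns the filtered sequence and the number of retained terms.
    """
    K = X.shape[1]
    Y = np.fft.fft(X, axis=1)                # forward transform over time
    freqs = np.fft.fftfreq(K, d=1.0 / fs)    # signed bin frequencies in Hz
    zeta = indicator(freqs)
    X_tau = np.fft.ifft(Y * zeta, axis=1)    # inverse using kept terms only
    return X_tau, int(np.count_nonzero(zeta))


# Hypothetical data: a slow (1 Hz) and a fast (20 Hz) spatial pattern.
fs, K, P = 100.0, 400, 30
t = np.arange(K) / fs
rng = np.random.default_rng(1)
u_slow, u_fast = rng.standard_normal((2, P))
slow = np.outer(u_slow, np.cos(2 * np.pi * 1.0 * t))
fast = np.outer(u_fast, np.cos(2 * np.pi * 20.0 * t))
X = slow + fast


def low_pass(f):
    """Low-pass indicator with a hypothetical 2 Hz cutoff."""
    return np.abs(f) < 2.0


X_tau, n_kept = spectral_select(X, fs, low_pass)
```

Because only `n_kept` Fourier terms contribute to `X_tau`, its rank cannot exceed `n_kept`, regardless of how many fast components the original sequence contained. In this example both frequencies fall exactly on FFT bins; for frequencies between bins some spectral leakage into neighbouring terms would be expected.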

V. EXPERIMENTS
To evaluate the proposed TREK framework, this section outlines experiments examining the suitability for both modelling and forecasting. Without loss of generality, experiments use the Total-Least Squares implementation of standard DMD [48] and Streaming DMD [49], which minimises measurement error bias and is more suitable for real world systems such as video.

A. SIMULATION
To demonstrate this approach, consider the following illustrative example of decomposing a high-dimensional sequence with a large number of measurements, into its constituent dynamic modes. As outlined in §III, decomposition and forecasting requires only a single measurement sequence, as opposed to traditional machine learning.
In this example, a sequence is generated from $N = 101$ random spatial modes, each of size $P = 12000$ (equivalent to the dimensionality of an RGB image), drawn from a random $P$-dimensional orthonormal basis. These modes are evolved according to (4) using $N$ corresponding linearly spaced frequencies $\delta = [-\Delta t^{-1}, \ldots, \Delta t^{-1}]$, where $\Delta t = 10^{-2}$, for $t = 150$ s. The result is a data sequence $X \in \mathbb{C}^{12000 \times 15000}$ comprised of exactly 101 dynamic modes. The sequence energy is shown in FIGURE 4(a) (light grey), where it is seen that this is a relatively complex mixture of trigonometric functions.
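A scaled-down version of this kind of synthetic sequence can be generated as sketched below. Several details are assumptions rather than the paper's exact setup: the sizes are reduced (11 modes of dimension 120 over 15 s), the modes have unit amplitude, the exponents are taken as purely oscillatory, and the linearly spaced values are interpreted as angular frequencies. The function name `make_sequence` is hypothetical.

```python
import numpy as np


def make_sequence(n_modes, dim, dt, duration, seed=0):
    """Superpose `n_modes` orthonormal spatial modes with oscillatory dynamics.

    Returns the (dim, K) sequence, the spatial modes and the frequencies used.
    """
    rng = np.random.default_rng(seed)
    # Orthonormal spatial modes from the QR factorisation of a random matrix.
    modes, _ = np.linalg.qr(rng.standard_normal((dim, n_modes)))
    # Linearly spaced frequencies in [-1/dt, 1/dt] (assumed angular, rad/s).
    omega = np.linspace(-1.0 / dt, 1.0 / dt, n_modes)
    t = np.arange(int(round(duration / dt))) * dt
    dynamics = np.exp(1j * omega[:, None] * t[None, :])
    return modes @ dynamics, modes, omega


X, modes, omega = make_sequence(n_modes=11, dim=120, dt=1e-2, duration=15.0)
```

Increasing `n_modes`, `dim` and `duration` towards the values quoted in the text scales the memory footprint accordingly, since the full sequence is held as a dense complex matrix.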
The goal is to characterise only long-term trends in the sequence, corresponding to the task-relevant information $\tau = [-1\,\text{Hz}, 1\,\text{Hz}]$, i.e., $D = 2$, shown in FIGURE 4(b) (light grey). Experiments are performed using an Intel i9-9900K 3.60GHz with 128GB RAM, and are conducted in Python 3, with the experimental notebook located at the address given in footnote 1. In this paper, the standard DMD approach uses the well-known PyDMD implementation [50] and the streaming DMD approach is based on the dmdtools codebase [51] supplementary to [13]. Without loss of generality, a forked version of dmdtools is used in this paper [52], which includes additional Python implementations and minor bug-fixes.
Initially, standard DMD ( §III-A) is used to decompose this sequence into N = 101 dynamic modes. Given this decomposition, a reconstruction of the sequence is performed, as shown in FIGURE 4(a). In this, it is seen that the DMD reconstruction utilising all learnt dynamics (both short and long term) perfectly matches the sequence energy, and therefore characterises the dynamics of the sequence. However, as the goal is to extract long-term dynamics from this sequence, a second reconstruction is generated utilising only the long-term dynamics learnt from the decomposition, following the post-hoc extraction methodology outlined in §III-C. From this, FIGURE 4(b) demonstrates that DMD has also learnt the long-term components, as the reconstruction matches the long-term dynamical component of the sequence. Additionally, as shown in TABLE 2, standard DMD correctly extracts all 101 dynamic components from the sequence (including the desired task relevant ones in τ ), resulting in a low mean squared error (MSE) between the original sequence and the DMD reconstruction. However, while accurate, this approach is computationally expensive, due to computing the low-rank basis (6) of X as seen in TABLE 1. In fact, the time taken to compute a DMD model vastly exceeds the sequence runtime (as seen in TABLE 2), and is therefore unsuitable for real-time systems.
To address this slow learning speed, streaming DMD (SDMD) can be used as an iterative solution, and as such this method is evaluated on the same sequence. As shown in TABLE 2, SDMD also correctly identifies the dynamic components, and obtains a similarly small MSE. However, the runtime remains high, due to the high rank requirement of the data (N = 101) dramatically scaling the computation time (as seen previously in TABLE 1).
Given the poor runtime of both standard DMD and the iterative SDMD, a naive approach to address this problem is to choose fewer modes ($N = 2$) to decrease runtime. However, this results in poor dynamics learning, seen in the failure to characterise any relevant frequencies in FIGURE 4(b), and the high MSE in TABLE 2. This failure arises because the dynamics must evolve along a low-rank operator (8). Simply choosing $N = 2$ for a system explicitly containing 101 dynamic modes provides no guarantee that this is the correct dimensionality for the low-rank operator, and as such no correct dynamic modes are extracted.
As an alternative to the above approaches, the TREK framework is applied with the aim of extracting the task-relevant dynamic components of the sequence, within a reasonable running time. Following the methodology in §IV, task-relevant information τ is used to inform the extraction of N = 2 task-relevant spectral components from the data with (10) and (12). Streaming DMD is applied on this spectral low-rank data to decompose only the task-relevant (long-term) dynamics. The results for the corresponding reconstruction are shown in FIGURE 4(b), where it is seen that, like standard DMD, the TREK reconstruction has correctly characterised the long-term dynamics. Results in TABLE 2 show that the MSE as compared to the long-term dynamics similarly remains small, and that the computation time is vastly more suitable for real-time use, with a 98.8% decrease in runtime. Note that the non-task-relevant dynamics are not learnt with TREK, and as such there is a high MSE when comparing against all dynamic characteristics in TABLE 2. However, this is both unsurprising and inessential for the goals of this approach.
To evaluate the robustness of the approach, this experiment is repeated while varying the number of constituent spatial modes in the range N = [10, 90], thereby exploring application of each method to dynamical systems of varying complexity. These results are shown in FIGURE 4(c), where initially it is seen that the runtime for DMD remains high regardless of model complexity, due to the expensive SVD-based preprocessing stage being independent of the number of modes (seen in TABLE 1). In comparison, the runtime of SDMD increases with model complexity, due to the larger iterative cost which includes the number of modes (also seen in TABLE 1). As such, both standard methods fail to complete model learning within a reasonable timeframe. In comparison to these, the runtime for TREK remains low regardless of the underlying model complexity, because it extracts only task-relevant dynamic modes as part of the pre-processing stage, thereby ensuring the number of useful modes remains small during computation.
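As an illustration of the standard DMD baseline used in this synthetic comparison, the following is a minimal sketch of SVD-based (exact) DMD and its reconstruction. It is not the authors' implementation: the function names `exact_dmd` and `reconstruct`, the toy dimensions (20 spatial points, 200 snapshots), and the two chosen oscillation frequencies are assumptions made for illustration, standing in for the 101-mode sequence described above.

```python
import numpy as np

def exact_dmd(X, rank):
    """Exact DMD of a (P, K) snapshot matrix, truncated to `rank` modes."""
    X1, X2 = X[:, :-1], X[:, 1:]
    # Low-rank basis from the SVD of the first K-1 snapshots.
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, V = U[:, :rank], s[:rank], Vh[:rank].conj().T
    # Reduced operator U^H X2 V S^{-1}; dividing by s scales each column.
    A_tilde = (U.conj().T @ X2 @ V) / s
    eigvals, W = np.linalg.eig(A_tilde)
    modes = ((X2 @ V) / s) @ W
    # Mode amplitudes fitted to the first snapshot.
    amps = np.linalg.lstsq(modes, X[:, 0].astype(complex), rcond=None)[0]
    return eigvals, modes, amps

def reconstruct(eigvals, modes, amps, n_steps):
    """Evolve each mode by powers of its eigenvalue and sum the result."""
    time_dynamics = eigvals[:, None] ** np.arange(n_steps)[None, :]
    return ((modes * amps) @ time_dynamics).real

# Toy sequence (assumed): one slow and one fast oscillation, rank 4 in total.
rng = np.random.default_rng(0)
t = np.arange(200)
v = rng.standard_normal((20, 4))
X = (np.outer(v[:, 0], np.cos(0.05 * t)) + np.outer(v[:, 1], np.sin(0.05 * t))
     + np.outer(v[:, 2], np.cos(0.9 * t)) + np.outer(v[:, 3], np.sin(0.9 * t)))

eigvals, modes, amps = exact_dmd(X, rank=4)
X_hat = reconstruct(eigvals, modes, amps, X.shape[1])
mse = float(np.mean((X - X_hat) ** 2))
```

In this sketch each real oscillation is represented by a complex-conjugate pair of eigenvalues, so the truncation rank must be at least twice the number of oscillating components for the reconstruction to be accurate.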

B. FURNACE EXPERIMENT
The proposed TREK framework is applied to a real-plant industrial furnace video feed, with the aim of modelling and forecasting dynamic furnace state changes. Input data consists of a three-color RGB sequence of video frames, sampled at 30 Hz with original size (1920, 1080, 3), resized to (40, 75, 3) prior to learning (i.e., P = 40 × 75 × 3 = 9000).
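To make the snapshot dimension concrete, the sketch below flattens already-resized frames into the columns of a snapshot matrix. The helper name `frames_to_snapshots` and the random stand-in frames are assumptions for illustration; the resizing step itself is omitted because the source does not specify how it is performed.

```python
import numpy as np

def frames_to_snapshots(frames):
    """Stack (K, H, W, C) frames into a (P, K) snapshot matrix, P = H*W*C."""
    frames = np.asarray(frames, dtype=np.float64)
    n_frames = frames.shape[0]
    # Each frame becomes one column of length P.
    return frames.reshape(n_frames, -1).T

# Stand-in for five resized RGB frames of size (40, 75, 3).
frames = np.random.default_rng(0).random((5, 40, 75, 3))
X = frames_to_snapshots(frames)
print(X.shape)  # (9000, 5)
```

Any column can be mapped back to an image with `X[:, k].reshape(40, 75, 3)`, which is how spatial modes of length P can be displayed as images.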

1) MODELLING - BURNOUT POINT DETECTION
Initially, the proposed approach is evaluated for modelling a dynamical event, the shifting of a burnout-point, occurring over 20 minutes (K = 36000 frames). A burnout-point is the primary combustion position in the chamber, and the general control aim is to maintain a steady consistent burn at this fixed position. An example of burnout-point shifting is shown in FIGURE 5, where it is seen that initially the fire burns uniformly on the platform. Over the next 10 minutes, inhomogeneity in waste combustion causes waste at the centre to fail to ignite, resulting in waste splitting into two separate upper and lower combustion regions. Following this, at 20 minutes the lower of the two regions extinguishes due to lack of combustible material. This is an unstable state, and the operator will need to actuate the platform to drive reignition. The ability to model this event would be a useful operator tool to help in the control decision making process. Specifically, the operator is interested in characterising long-term dynamic activity with periods longer than five minutes, while being uninterested in short-term events. As such, the prior task-relevant information involved in this task is the frequency τ = 1/300 Hz (a period of 300 seconds).
To investigate modelling this event, an experiment is performed to extract the long-term dynamics from the sequence. Specifically, the aim is to learn a dynamic model that has: (i) dynamic modes that are interpretable in the context of the plant, (ii) a fast computation rate, exceeding the 30 Hz frame rate of the video.
Model performance is evaluated by examining three key criteria [11]: (i) the stability of the learnt eigenvalues, indicating the growth and oscillation of the learnt dynamics model, (ii) the coherency of the extracted spatial modes, i.e., the ability to qualitatively explain mechanics of the underlying furnace system, (iii) the mean squared error (MSE) between the reconstructed sequence and the original video, quantifying the prediction accuracy of the modelling.
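Two of these criteria are directly computable, and can be sketched as follows. The function names and the tolerance `tol` are hypothetical choices for illustration, not values taken from the paper; the coherency of the spatial modes is a qualitative judgement and is not represented here.

```python
import numpy as np

def on_unit_circle(eigvals, tol=1e-3):
    """Flag discrete-time eigenvalues whose magnitude is within `tol` of 1.

    Magnitudes above 1 indicate growth, below 1 indicate decay.
    """
    return np.abs(np.abs(eigvals) - 1.0) <= tol

def mse(X, X_hat):
    """Mean squared error between a sequence and its reconstruction."""
    return float(np.mean(np.abs(np.asarray(X) - np.asarray(X_hat)) ** 2))

# Example: one neutrally stable oscillation, one growing and one decaying mode.
example_eigvals = np.array([np.exp(0.1j), 1.2, 0.5])
stable_mask = on_unit_circle(example_eigvals)
example_error = mse(np.zeros((3, 4)), np.ones((3, 4)))
```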
Initially, the standard DMD approach is applied, by selecting a low-rank basis of spatial modes by (6) using the singular value thresholding approach by (7), resulting in N = 9000, and then computing an eigendecomposition of Ã. The results for this are seen in FIGURE 6 (a-b), where it is shown that modelling is very unstable and of poor quality. Specifically, learnt eigenvalues in FIGURE 6 (a) lie far from the unit circle (along which stable eigenvalues should reside), and undergo extreme growth, decay, and oscillation, none of which is present in the original video sequence. This instability results in uninformative unstable dynamic modes (FIGURE 6 (b)), and the corresponding spatial modes do not show any intuitive structures with which to interpret the sequence, instead highlighting hyper-localised regions of combustion. This is expected: given the complexity and uncertainty in the video, as discussed in §III-D, fitting linear approximating models to (potentially) non-linear data can result in uninformative models. Given this instability, no useful dynamics are extracted, resulting in an unbounded reconstruction error (seen in TABLE 4).
Additionally, given the data is full-rank (N = P), SDMD cannot be used to provide real-time updates, due to the reliance on a low-rank basis (§III-B). Simply selecting fewer modes to decrease Streaming DMD runtime is meaningless, resulting again in unstable eigenvalues (FIGURE 6 (a)).
As such, standard DMD approaches fail to learn this complex video data, and reconstruction or modelling of this sequence will be inaccurate and unstable. Additionally, due to the size of P and K, model training takes approximately 74 minutes to compute, and as such is infeasible for practical application on this 20 minute sequence. It is clear that standard DMD is not applicable to this scenario, and this approach suffers from limitations as discussed in §III-D.
To address these problems, the TREK approach as outlined in §IV is used to first extract spectral components based on task-relevant information τ, and then learn a low-rank Streaming DMD model. Initially, the task-relevant Fourier transform (10) is applied to this data sequence to extract all spectral components. Subsequently the inverse FFT (12) is performed, using the low-pass indicator function (13). As such, extracting only non-zero task-relevant spectral components results in N = 8 Fourier modes with frequencies less than τ.
Eigenvalues from the learnt dynamic model are shown in FIGURE 6 (a), where it is seen that these are stable (as they sit on the unit circle). In addition, the learnt dynamic modes are shown in FIGURE 6 (c) where it is seen that three key spatial regions that characterise this burnout event are identified: an upper, middle, and lower region of the fire. This corresponds to the spatial components that describe the burnout event as seen in FIGURE 5, i.e., a separation of a single region into three separate components. The corresponding learnt dynamics show that these three spatial regions are modelled together to characterise the split and decaying behaviour. As such, through examination of the outputs, we can characterise the event that occurred and analyse it in terms of spatial-temporal dynamics.
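The task-relevant filtering step used in this experiment (temporal Fourier transform, low-pass indicator, inverse transform) can be sketched as below. This is a simplified illustration under stated assumptions rather than a reproduction of (10), (12) and (13): the function name `lowpass_fft`, the strict inequality on |f|, and the toy sampling rate and sequence length are all assumed, and the subsequent Streaming DMD step is not shown.

```python
import numpy as np

def lowpass_fft(X, fs, tau):
    """Keep only temporal frequencies with |f| < tau in a (P, K) sequence.

    fs is the sampling rate in Hz and tau the cut-off frequency in Hz.
    Returns the filtered sequence and the number of retained Fourier bins.
    """
    n_samples = X.shape[1]
    freqs = np.fft.fftfreq(n_samples, d=1.0 / fs)
    spectrum = np.fft.fft(X, axis=1)
    keep = np.abs(freqs) < tau           # low-pass indicator
    spectrum[:, ~keep] = 0.0             # discard non-task-relevant bins
    return np.fft.ifft(spectrum, axis=1).real, int(keep.sum())

# Toy data (assumed): a slow component (period 500 s) plus a fast one (10 s),
# sampled at 1 Hz for 1000 s, filtered with tau = 1/300 Hz.
fs, n_samples, tau = 1.0, 1000, 1.0 / 300.0
t = np.arange(n_samples) / fs
slow = np.vstack([np.cos(2 * np.pi * 0.002 * t), np.sin(2 * np.pi * 0.002 * t)])
fast = 0.5 * np.vstack([np.cos(2 * np.pi * 0.1 * t), np.sin(2 * np.pi * 0.1 * t)])
X_low, n_kept = lowpass_fft(slow + fast, fs, tau)
```

The number of retained bins depends on the sequence length and sampling rate, since these set the frequency resolution fs/K; the toy values here are not intended to reproduce the mode counts reported for the furnace data.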

2) LONG-TERM FORECASTING - PATTERN DETECTION
Experiments in §V-B1 demonstrate the proposed method is suitable for real-time analysis of furnace events. However, a desirable property for dynamical modelling is the ability to forecast. To evaluate the suitability of the proposed approach, an experiment is outlined to detect patterns in long-term forecasts. Specifically, in the context of an industrial furnace, the aim is to detect a 10 minute periodic event, the automated grate control (AGC), which actuates platform rollers to sift waste and, as a side effect, spews smoke and debris.
In this experiment, a video sequence of 80 minutes (K = 143,999 frames) is split into two independent sequences, a training sequence covering the first 40 minutes, and a testing sequence covering the latter 40 minutes. As in §V-B1, TREK is applied to identify frequency components with periods longer than five minutes (i.e., frequencies below τ), resulting in N = 20 dynamic modes, and decomposition is applied on the low-rank sequence.
The results are shown in FIGURE 7. Initially, it is seen in FIGURE 7 (a) that the energy of the video sequence is stable and oscillates with a fixed period. The energy of the TREK reconstruction also follows this trend, for both the training and forecasting period. As such, the framework has learnt a model that captures the general trend of the measurement sequence and forecasts oscillatory behaviour.
To understand what drives this oscillatory behaviour, and determine if the learnt model has learnt a characteristic decomposition, the frequency-power graph of the learnt dynamics is plotted in FIGURE 7(b). In this, it is seen that the eigenvalues characterise the sequence in terms of long-term oscillatory components (with oscillation periods between 6 and 40 minutes). FIGURE 7(d) shows a visual output of one of these components, the learnt component with a period of ten minutes (denoted M6). It is seen that spatially, this mode is characterised by a hazy smoke field region surrounding the main flame body (light red), and combined with the corresponding dynamics showing a 10 minute period, it is clear that this component corresponds to the underlying AGC process. As such, the TREK framework has extracted key spatio-temporal components of a long-term video sequence, and is able to forecast these dynamics for a long horizon. Importantly, the components remain intuitive in the context of the furnace system, and can be used to infer underlying physiochemical characteristics of the plant.
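The oscillation period associated with a learnt eigenvalue follows from its phase angle. As a minimal sketch, assuming discrete-time eigenvalues λ obtained with a fixed sampling interval Δt, the period is 2πΔt / |arg λ|. The function name `oscillation_period` is hypothetical, and the example eigenvalue is constructed by hand rather than taken from the reported model.

```python
import numpy as np

def oscillation_period(eigvals, dt):
    """Period in seconds of each discrete-time eigenvalue sampled every dt.

    Eigenvalues with zero phase angle do not oscillate and map to infinity.
    """
    angle = np.abs(np.angle(np.asarray(eigvals, dtype=complex)))
    with np.errstate(divide="ignore"):
        return np.where(angle > 0, 2.0 * np.pi * dt / angle, np.inf)

dt = 1.0 / 30.0                                   # 30 Hz video
lam_ten_min = np.exp(2j * np.pi * dt / 600.0)     # hand-built 10 minute mode
periods = oscillation_period(np.array([lam_ten_min, 1.0]), dt)
```

Dividing the resulting periods by 60 gives values in minutes, which is the scale on which the 6 to 40 minute components are described.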
To examine the robustness of the approach, the above experiment is repeated on 18 independent train/test video instances, from a 13 hour video feed. The mean average error of the reconstruction for multiple forecast horizons is shown in FIGURE 7(c). In this, it is seen that TREK is accurate both for reconstructing the training data (while providing intuitive decompositions of the data) and for short-term horizon tasks (up to one minute). However, accurate long-term forecasting remains challenging, due to the problems of inhomogeneity of inputs and non-stationary driven dynamics. This is a common limitation of spatio-temporal modelling, and as such, while the proposed approach is suitable for real-time learning and forecasting, incorporating control and inputs into forecasting remains an open problem.

VI. DISCUSSION
In this paper, a framework is derived and evaluated for the learning of combustion dynamics of an industrial furnace from video-feed data, via a combination of dynamic mode decomposition for extracting intuitive dynamic modes, and Fourier analysis for incorporating task-relevant information. The findings outlined here show that by including this information, complexities and uncertainties in the data can be mitigated during dynamics modelling, resulting in computationally inexpensive, stable predictions. Even when observed combustion comprises multiple complex interacting dynamical processes, the use of task-relevant information eliminates undesired dynamical information, allowing for learning relevant long-term dynamical patterns. In forecasting tasks, the use of task-relevant information can result in longer, more accurate long-term predictions, which can be made quickly and at a reduced computational cost compared to traditional modelling methods. This long-term forecasting, coupled with low-cost predictions, demonstrates that this approach for learning dynamics is suitable for real-world scenarios involving uncertain, high-dimensional dynamics.

VII. CONCLUSION AND FUTURE WORK
Generating video forecasts of in-furnace dynamic behaviour is key to supporting plant operator decision making. The approach presented in this paper allows for learning complex dynamics in real time, in a framework that captures intuitive spatio-temporal physical aspects. In the context of real-world dynamics modelling, this method in its current form would find utility in forecasting long-term, steady dynamical behaviour from video. Given the low computational cost, there is potential for integrating these forecasts with existing real-time control tools, such as state identification or classification. Additionally, the ability of this method to remain agnostic to the complexity of the underlying dynamics has wider implications outside of the field of combustion monitoring from video. This framework may also find utility in other scenarios requiring real-time forecasting with dynamics uncertainty, e.g., real-time adaptation for robot control, specifically in highly non-linear systems such as soft robotics or those involving environmental stimuli and interaction. Not only would the ability to encode task-relevant information into dynamics modelling enhance applications, but this approach enables the analysis of non-linear dynamical systems previously thought to be too complex for standard spatio-temporal approaches.