Online Unsupervised Occupancy Anticipation System Applied to Residential Heat Load Management

Human preferences and lifestyles significantly impact buildings’ energy consumption. Consequently, a better understanding of occupants’ behavior is crucial to decrease energy consumption and maintain occupants’ comfort. Occupant-centric control (OCC) strategies are effective approaches to fulfil such a purpose. As such, occupancy detection and prediction are of prime importance, particularly to manage Electric Space Heating (ESH) systems, due to the relatively slow dynamics of the temperature in dwellings. This paper proposes an Explicit Duration Hidden Markov Model (EDHMM) for unsupervised online presence detection and a hazard-based approach for occupancy prediction. Moreover, a control strategy using a cost function, weighted by occupancy predictions, and a load-shifting strategy based on time-varying electricity price are put forward. This work initially validates the consistency of the proposed approach by using synthetic data generated by a Monte Carlo simulation. Subsequently, the performance of our framework is compared with previous methods presented in the literature through experimental validation. Results demonstrate that the proposed EDHMM approach is efficient in detecting occupancy states. Besides, the results of the field implementation show the potential of the proposed control strategy to preserve occupants’ thermal comfort while decreasing the heating energy consumption.

Due to the increasing energy consumption of the residential sector, energy efficiency, and grid reliability have become significant concerns for the Distribution System Operators (DSOs), especially in Nordic countries like Canada. In Quebec, where this study is conducted, ESH systems account for about 62% of the yearly household energy consumption [1]. These systems can significantly increase power demand during peak load hours as a great number of customers simultaneously heat their homes [2]. Over the last decades, many electricity suppliers have promoted Demand-Side Management (DSM) strategies based on price [3] and incentive-penalty [4] to exploit energy demand flexibility [5]. These conditions alongside the growing availability of smart appliances have led to the rapid development of Home Energy Management Systems (HEMSs) [6]. Energy consumption in residential buildings is influenced by building characteristics, climate, indoor environment quality, and occupants' behavior [7]. Among these factors, human behavior has been revealed as a major issue towards ensuring the effectiveness of the deployed residential energy management strategies [8]. Occupants' behavior refers to the interaction between occupants and buildings in order to preserve a healthy indoor environment and acquire the desired comfort and security [9], [10]. Occupancy is an essential aspect of the occupants' behavior concept, which deals with tracking the presence/absence of people in a building space [11]. Several occupancy detection methods have been integrated with control strategies to mainly manage lighting and HVAC systems. These OCC strategies focused on the presence/absence of occupants (occupancy-based controls) have demonstrated up to 30% of energy savings [12]. However, instantaneous occupancy detection is not adequate for ESH systems as they require some time to affect the temperature of a room. Moreover, occupancy is a stochastic process complex to be anticipated and difficult to be directly measured. Therefore, the combination of indoor temperature slow dynamic and the stochastic nature of the occupants' presence is revealed as a challenging problem to be solved.

A. LITERATURE STUDY
Several techniques have been proposed for occupancy detection systems. Rueda et al. [13] categorized them into analytical, data-driven, and knowledge-based methods. Among them, data-driven methods, such as Artificial Neural Networks (ANN) [14], Support Vector Machines (SVM) [15], K-nearest neighbors (KNN) [16], and Hidden Markov Models (HMM) [17], are considered as the most utilized ones for occupancy detection [13]. However, supervised learning algorithms, like ANN, SVM, and KNN, need a training data set comprising ground truth occupancy evidence which limits their application in practice. As an alternative approach, some studies [17]- [23] have explored the use of unsupervised learning methods for occupancy detection in which HMM has been one of the most popular ones, demonstrating an accuracy between 80% and 90% [13]. Most of the used HMMs are nonetheless first order which means that they are memoryless stochastic processes and their inherent state duration follows a geometrical distribution that is inadequate to model realworld occupants' behavior [12]. Moreover, the use of timevariant transition probabilities on HMM has been mostly neglected, restricting the temporal dependency of occupancy dynamics representation [24].
Concerning the occupancy prediction methods, Kleiminger et al. [25] grouped them into schedule-based and context-aware approaches. Schedule-based algorithms predict occupancy using historical data on the household occupancy state. Kleiminger et al. [25] and Li and Dong [26] present a comparative performance analysis of the stateof-the art of these approaches, stressing that predictions of up to 90% accuracy can be achieved. Moreover, algorithms, such as ANN [27], KNN [28], k-means, expectationmaximization [23], and Markov models [26], [29] have been explored as alternative to predict occupancy at horizons from 15 minutes up to 24 hours ahead. Context-aware approaches estimate future occupancy states of a home according to its occupants' current context (e.g., location or activity). For instance, Krumm and Brush [30] and Gupta et al. [31] use information from GPS to estimate when a person would be at home or away and dynamically control the thermostat of the rooms. Although context-aware approaches are promoted as one of the best-performing techniques for occupancy prediction, they depend on the permanent operation of specific devices by individuals (e.g., smartphone and GPS). As a result, these methods are more intrusive regarding individuals' lifestyles, and are more sensitive to human errors, such as forgetting the phone at home. Therefore, in this work, schedule-based algorithms are preferred.
Several occupancy detection and prediction methods have been integrated with control strategies, such as model predictive control (MPC), to manage HVAC systems [32]. For instance, in [29], Dobbs et al. present an occupancypredicting control algorithm for HVAC systems of a building and compare their approach with a scheduled controller and an occupancy-triggered controller. It is shown through simulation that the proposed approach can achieve up to 19% of energy savings compared to the scheduled controller. Furthermore, other approaches have reported energy savings between 6% to 17% [25] by combining occupancy information and MPC strategies. However, most of the occupancybased controls (OBCs) field implementations have been done in academic and office buildings. In fact, only 10% of OCC studies have been performed in residential buildings [33]. Another worth noting aspect is that the main objective of OBC strategies is to reduce energy consumption while few studies have integrated load-shifting demand management strategies to improve user's flexibility [32]. Therefore, more efforts to integrate OBC strategies and time-of-use (ToU) pricing programs are still required.

B. CONTRIBUTION AND ORGANIZATION
In light of the discussed matters, this paper proposes an online unsupervised technique for automatic occupancy detection and prediction in residential buildings. This study falls within the overall objective to optimize residential energy consumption by intelligently managing ESH system (baseboard heaters) concerning the presence of the occupants, preserving their thermal comfort. Considering this, the main contributions of this work are summarized as follows: • Firstly, an Explicit Duration Hidden Markov Model (EDHMM) is developed for unsupervised online occupancy detection.
• Secondly, a hazard-based model is proposed to predict the occupancy probability of each zone of the residence with a 24-hour horizon.
• Finally, a finite-horizon adaptive optimization approach based on a cost function, weighted by occupancy predictions, and a load-shifting strategy for TOU price is formulated.
Unlike the other existing papers in the literature, this work considers the limitations of Markov models for occupancy detection to consistently capture presence/absence duration and realistically predict individuals' presence. Moreover, this study utilizes an adaptive learning strategy based on a sliding window of fixed size (last N days) to daily retrain the system. It should be noted that only a few studies have integrated OBC techniques with load-shifting strategies to manage ESH systems in dwellings. The proposed approach minimizes human intervention during the system learning process and provides an online occupancy detection and prediction by thermal zone. Therefore, the HEMS is empowered to detect and anticipate the presence/absence of occupants, allowing it to automatically adjust the thermal preferences of each room, maintaining a balance between comfort and energy cost. An experimental proof of concept of the proposed framework has also been carried out. In this regard, the proposed method has been thoroughly tested for three months during Winter in an apartment located in the province of Quebec in Canada.
The rest of this paper is organized as follows. Section II presents the combination of an EDHMM formulation and a Hazard-based model to detect/predict occupancy. Section III illustrates the use of occupancy predictions, an Interpretable Machine Learning model, and a Finite-Horizon controller to improve the performance of the HEMS in residential environments. Section IV describes the design of the case study for the validation purposes. Section V presents the obtained results through simulation and experiments. Finally, the conclusions of the work are drawn in Section VI.

II. OCCUPANCY DETECTION AND PREDICTION STRATEGY
Environmental variables inside a closed space and power consumption are highly correlated with humans' presence [10]. In this sense, due to the difficulty in directly measuring occupancy, Hidden Markov Models (HMMs), are stressed as a suitable method for modeling household/room occupancy through unsupervised learning [20]. HMMs can effectively model the temporal structure of data. Moreover, they allow the use of a priori knowledge and supports the explicit integration of duration as well as the time dependency of states. The latter is indispensable for a realistic representation of the occupancy profiles [24], [34].
Let v = v 1 , v 2 , . . . , v K be a sequence of measurements from the sensors and let z = z 1 , z 2 , . . . , z K be a sequence of latent variable taking values from two possible occupancy states s = {0, 1}, where K is the length of the sequence. In a HMM, two assumptions are made. First, the probability of a particular state only depends on the previous state, P(z k+1 |z 1 , . . . , z k ) = P(z k+1 |z k ), where k is the discrete-time index. Second, the probability of an observation, only depends on the state that produced the observation, Nonetheless, literature has revealed that traditional firstorder Markov models are not able to consistently predict state duration and thus, their generated occupancy profiles often fail to reflect a realistic behavior [35], [36]. In this regard, this work puts forward an Explicit Duration Hidden Markov Model to effectively handle this issue. In an EDHMM, a hypothesis of medium-term independence is introduced. Such hypothesis implies that the probability of changing a state depends not only on the current state, but also on its duration, as shown in (1).
where d k is the duration of the current state. It is considered as a deterministic incremental counter restarted when a transition is made, expressed by, Furthermore, as presented in [12], the transition probability between the states can be formulated as, where h s (d ) is the discrete hazard function of state s, given by equation (4), which can be estimated either using a parametric or a non-parametric approach.
where f s (d ) and S s (d ) are the state duration probability and the survival function, associated with state s, respectively. Besides, due to the time-dependent dynamics of occupancy [24], the use of a time-variant transition probability A is proposed. Therefore, the hazard function is computed in hourly intervals t ∈ [0, 1, · · · , 23]. This results in a hazard function with the form h t s (d ). The transition probability matrix A for the two-state EDHMM can be represented by, where h t 0 (d ) and h t 1 (d ) are the discrete hazard functions for the absence (s = 0) and presence (s = 1) states, respectively.
Following the above-presented mathematical formulation, this work proposes an occupancy detection and prediction approach through a three-step process. First, Section II-A explains the start-up process that solves a cold-start problem to collect the necessary information to initialize the EDHMM learning. Second, Section II-B describes an online detection procedure to detect occupancy at five-minute intervals. Third, Section II-C presents the utilized approach to predict the occupancy probability with a 24-hour horizon.

A. START-UP PROCESS
Since the hidden states are unknown, measured data are used to estimate the transition matrix and emission distributions. For this purpose, the start-up process depicted in FIGURE 1 is proposed.

1) DATA PRE-PROCESSING
Measured signals can contain noise from different sources (e.g., the sensor itself, EMI, and acquisition/communication errors). Therefore, a denoising process based on a kernel smoother (KS) and a decomposition process using the Wavelet transform (WT) are proposed in this work. Kernel smoother is a statistical technique widely used to reduce signals noise [37]. In this work, a Gaussian kernel is used in order to give less weight to the more distant neighbours. Wavelet transform has been widely employed for image denoising, edge, and transient detection [34], [38]. Accordingly, WT is used to decompose the denoised signal into low and highfrequency components, represented by the approximation and the details of the measured variables. Principal Component Analysis (PCA) is also performed to ensure a new set of decorrelated variables [39].

2) CLUSTERING AND PARAMETER LEARNING
Due to the lack of prior knowledge and hidden states information, a Gaussian Mixture Model (GMM) is utilized to estimate a preliminary occupancy profile from the set of observations. GMM is an unsupervised probabilistic model that assumes that all the data points are generated from a mixture of a finite number of Gaussian distributions N (v k | µ i , σ i ), expressed by, denotes the set of parameters, µ i and σ i are the mean and variance of the i th Gaussian distribution, G is the number of Gaussian distributions, and φ is the mixture component weight, which must satisfy the condition G i=1 φ i = 1. The probability that an observation belongs to a given distribution is estimated from (7) through, where p i is the probability vector in which each element indicates the probability that an individual belongs to a cluster and G is the number of clusters. Before applying the GMM, the number of clusters G should be fixed. In this respect, approach based on minimizing the Root-Mean-Square Error (RMSE) between the model estimation and a prior average occupancy profile is employed. Due to the close correlation between the motion information captured by the PIR sensors and occupancy, these data are used as a reference to estimate an initial average profile for each zone. In accordance with the above-explained case, this work assumes that the conditional probability of PIR events given the time of day converges to the likelihood of occupancy at that instant over a long analysis period. Therefore, the average daily motion profile detected by the PIR sensors converges to the actual daily occupancy profile of the dwelling/room. Furthermore, this work presumes a stationary hypothesis which leads to discarding the effects of seasons, vacations, and other long-term behavioral changes. Therefore, as depicted in FIGURE 2, a threshold on the average daily motion for each zone is used to automatically decide on the reference average profile that will be used as the prior information to determine the value of G. Once the prior information for each zone is determined, the performance of the GMM is evaluated for a maximum number of 12 clusters, and the value of G that minimizes the RMSE is selected to estimate the preliminary occupancy profile,z.
After estimatingz, this profile is used to fit the emission distributions and compute the survival functions. To do so, maximum likelihood estimation algorithm and the Kaplan-Meier estimator are utilized. In the case of emission distributions, three distributions (Normal, Weibull and LogLogistic) are used to fit the observations, v, corresponding to each occupancy state, Subsequently, the distribution with the best fit Mm is calculated using the MLE algorithm, as follows, where L is the log-likelihood operator, and M m represents each of the distributions with their parameters. FIGURE 3 shows an example of the estimated emission distribution for the first principal component of living room observation. For the case of absent state, the three tested distributions are presented to illustrate the suitability of using MLE algorithm to achieve a good distribution fit to the data. On the other hand, the non-parametric Kaplan-Meier estimator (KME) is used to compute the survival functions. Unlike the parametric models, KME, expressed by (9), does not restrict the survival function shape to a particular distribution. Therefore, KME assists with a better representation of duration distributions with multimodal shape.
where e i is the number of events (arrive/depart) that happened, d is the duration, and n i is the number of presence/absence periods that are still active within d .

3) STATE SEQUENCE ESTIMATION
Once the learning procedure of the parameters is performed, the most probable occupancy sequence, z, must be estimated considering the observation sequence, v,. For this purpose, a dynamic programming approach is used that follows the classical procedure of the Viterbi algorithm while explicitly including the state duration to compute the transition probabilities. The proposed decoding algorithm is divided into two stages: Step: To estimate the most probable sequence of hidden states at each time-step, the next hypothesis is followed: if the state s = {0, 1} is part of the optimal sequence, what would be its best previous state? In this regard, if ψ q,k−1 represents the highest probability along a single path for the first k − 1 observations, which ends at state q ∈ s, and the transition probabilities from state q to s are given by the matrix A q,s (equation (5)), then, the best previous state q for the current state s is estimated by solving the following optimization problem, 1] ln A q,s + ψ q,k−1 (10) where the initial values ψ s,1 for all the states s are estimated by using the initial probability distribution π s and the likelihood of the first observation v 1 of each state, as follows, Therefore, knowing the best previous stateq = Q s,k , the probability of the sequence for each state is updated by including the probability of the observation at timestep k, which can be expressed as, Additionally, in order to estimate transition probabilities, at each time-step k, duration needs to be computed. For this purpose, an incremental counter d is used. This counter is restarted (d s,k = 1) when a transition between the best previous stateq and the current state s is made, based on, where δ q, s is the Kronecker delta, defined by, • Backtrack the Most Probable Path: Once the recursive step is completed, the last hidden state is calculated by maximum likelihood, based on, Then, the most likely sequence is selected by taking the most probable previous state, expressed as, Algorithm 1 summarizes the decoding procedure to estimate the most probable sequence of states for a finite state machine (FSM) with two-states.

4) PARAMETERS UPDATE
Once the new occupancy sequence z is estimated, it is used to update the emission and duration distributions by applying the same procedure as presented in Equation 8.

B. ON-LINE PROCESS
After the start-up procedure, as shown in FIGURE 4, two processes are executed to estimate the real-time occupancy state and to perform the model parameters adaptation over time.

Algorithm 1: Occupancy Detection
Input: -Observations sequence: v 1 , v 2 , · · · , v K -Emission distribution parameters Θ s -Duration distributions Output: Hidden states sequence: Define initial probability distribution (π s ) 4 Define matrices ψ, Q and d of dimension |s|×K Maximum a posteriori (MAP) estimation is computed at 5-minutes intervals to determine at instant k the most probable hidden state z k of each of the house zones, expressed as, whereq = z k−1 is the estimated occupancy state at the previous instant (5 minutes ago), and v k is the observations at instant k.

2) PARAMETERS UPDATE
HEMS and occupancy detection systems operate in timevarying environments. Therefore, adaptive capabilities are essential for them. As shown in FIGURE 4, every day at midnight, the system performs an update of the model parameters. To do so, a sliding window with a fixed size of N days is utilized, in which the most recent information is included in the model and the oldest is discarded. It should be noted that studying the appropriate size of the learning window and when new data could be relevant for retraining the model is outside of the scope of this work [40].

C. OCCUPANCY PREDICTION
This step aims to predict the presence probability of each zone of the dwelling at a 24-hour horizon, N . This predictive model is an essential step for the control strategy to determine optimal control actions considering future observation forecasts. Accordingly, a hazard-based model and a Monte-Carlo simulation are utilized. Literature study shows that hazard-based models are a suitable generative approach for modeling occupancy profiles and occupants' behavior [12]. In fact, this method has proved its potential for applications, such as in-home and out-of-home activities generation [41], and windows opening and closing behavior modeling [42]. The prediction technique takes advantage of the duration distributions estimated during the learning process of the EDHMM. Likewise, the matrix A (equation (5)) is used to determine the transition probability of the states during the prediction horizon. Algorithm 2 describes the process used for the prediction of the zonal occupancy probability β. In this procedure, the transition signal, τ k ∈ {0, 1} is sampled in every time-step using a Bernoulli distribution with α = h t î z i−1 (d) (see line 7 of the Algorithm 2). According to this transition signal, the algorithm updates the probable occupancy state,ẑ i by using an XOR function, and restarts the duration counter,d i , which determines the elapsed time in each state. The prediction technique takes as initial state the current occupancy status, estimated by the EDHMM, and computes the current duration according to the information of the last detected transition (arrival/departure). The process is repeated M times, and the average of the generated profiles  is used as the presence probability during the optimization process.

III. FINITE-HORIZON ADAPTIVE CONTROLLER
This work proposes a control strategy to adjust the thermostat set-points of each thermal zone in the residence. By adjusting the thermostat set-points, the control strategy pre-heats the thermal mass of the residence before peak hours, which implicitly modifies the energy consumption patterns. Moreover, it utilizes the thermal preferences of the users and the occupancy predictions to improve the overall performance. The control strategy has two steps of learning and optimizing the ESH management. The first step uses an Interpretable Machine Learning (IML) model to learn the thermal dynamics of the residence using historical data. The second step employs the results of the IML model and the occupancy predictions to formulate an optimization problem that minimizes the costs and guarantees the occupants' comfort within a finite horizon. Section III-A explains the learning procedure of the thermal dynamics, and Section III-B presents the optimization formulation of the proposed controller. Another worth noting feature of the proposed control strategy is the deployment of an event-based adaptation approach which allows the re-estimation of control signals when changes in occupancy states are detected. Section III-C describes the event-based adaptation method.

A. LEARNING OF THE RESIDENCE THERMAL DYNAMICS
In this work, a linear model based on an equivalent Resistance-Capacitance network (2R1C) is used to estimate the thermal dynamics of the house in each thermal zone [43]. Equation (18) presents the energy balance between the heat flux imposed into the environment by the space heaters and the thermal losses of the rooms [2]. (18) where T in is the internal temperature, T ext is the outside temperature, and T avg is the average temperature of the surrounding zones, which allows to account for internal heat exchange between the house rooms. in is the rated electrical power consumption of the heating system (applied thermal flux), and db are the thermal gains and losses of the room (e.g., solar gains and air infiltration). The latter have not been considered in this study. Additionally, C in , R ext and R wl represent the internal thermal mass, the thermal resistance that isolates the building, and the thermal resistance of internal walls (thermal zones divisions), respectively. Therefore, the mathematical expression of the state-space model for a residence with two thermal zones (η 1 and η 2 ) can be expressed as, where A ∈ R 2×1 , B ∈ R 2×1 and C ∈ R 1×1 are the parameter matrices that the IML model finds by minimizing the sum-of-squares loss function between Equation (19) and the actual measured output Y over a set of historical data.

B. OPTIMIZATION
To maximize the individual welfare of the customers, a bi-objective cost function is introduced in this study to perform the optimization process. The first term of the cost function guarantees the occupants' thermal comfort, and the second one minimizes the customer' payments for any type of dynamic tariff. Occupants' thermal comfort is expressed by the difference between the predicted temperaturex η a i (obtained by Equation (19)), and the reference temperature x η a i (preset by the user). Moreover, the occupancy predictions, β η a i are used as weights for the comfort term. This allows the control strategy to guarantee the thermal comfort of the customers when they are actually present in a specific room of the residence. Consequently, a seamlessly integration of the occupancy predictions into the control strategy, as shown in Equation (20), is achieved using this formulation.
In Equation (20), i represents the dynamic tariff and u η a i is the predicted power consumption of the ESH. Thereby, considering the cost function and system constraints, Equation (21) shows the complete formulation of the optimization problem.
| u i | 0.25u max x 0 = Initial state u 0 = Initial state (21) where x min and x max are the possible allowed temperature interval in the rooms of the house (user-defined), u max is the maximum output power of the ESHs at each thermal zone, and | u i | denotes the changes in heating power which is restricted to 25% of u max with the aim of avoiding discomfort to users. Equation (21) follows the rules of Disciplined Convex Programming approach that ensures the convexity of the formulation. It guarantees that if a solution is found, it will be unique and globally optimal [44].

C. EVENT-BASED ADAPTATION
An event-based adaptation strategy is developed in this paper to reduce the risk of discomfort to users when unexpected events occur. This strategy utilizes the changes of the occupancy state detected by the EDHMM as a trigger to reestimate control actions. FIGURE 5 shows an example of how the adaptation process works. The system plans a series of control actions based on the energy price and the predicted occupancy probability for the control horizon. Accordingly, the system seeks to provide a temperature close to the predefined setpoint when the occupancy probability indicates a high likelihood of presence in the house. However, there may be cases in which occupants arrive to/leave the residence earlier than expected. These can cause discomfort since the temperature will be below the predicted value. Hence, to reduce the users' discomfort and energy waste, this work takes advantage of the information provided by the online occupancy detection system. Accordingly, once an arrival/departure is detected, the control system automatically updates its plan.

IV. DESCRIPTION OF THE CASE STUDY
The EDHMM, the IML model, and the finite-horizon adaptive controller are data-driven models. For this reason, the proposed framework uses a network of sensors to monitor the environmental variables and the energy consumption of appliances. From the collected data, the proposed occupancy detection and prediction approach are trained and validated. Moreover, the gathered information is used to tune the thermal model of the house employed on the optimization problem. Hereinafter, Section IV-A presents the design of the experimental setup, and Section IV-B describes the performance metrics of the study.

A. EXPERIMENTAL SETUP
Occupancy detection systems are mainly based on the deployment of environmental sensors (e.g., carbon dioxide (CO 2 ), temperature, humidity, and light sensors), specialized devices (e.g., PIR sensors, smart meters, and cameras), and other technologies such as WiFi and Bluetooth [13], [45]. Literature reveals that each of these sensors has unique advantages and limitations for occupancy detection applications [16], [46], [47]. Consequently, sensor fusion has been promoted as an efficient way to leverage the strengths of different sensors to improve system performance. In this work, information on CO 2 concentration, movement, relative humidity, temperature, heating, and lighting consumption are collected. For this purpose, the sensors network described in FIGURE 6 has been deployed in an apartment located in the province of Quebec, Canada. TABLE 1 presents the list of the sensors used and a brief description of their specifications. Furthermore, FIGURE 7 shows a floor plan of the apartment with the location of the sensors. The deployed sensors network leverages the WiFi communication infrastructure already available in place. Message Queuing Telemetry Transport (MQTT) and Hypertext  Transfer Protocol (HTTP) are the main communication protocols used for data exchange between the devices and weather data acquisition. Furthermore, Telegraf, Influxdb, and Chronograf 1 are used for collecting the information transmitted through the MQTT broker, time-series data storage, and as a graphical interface to monitoring the system, respectively. A Python module utilizing the BACnet IP 2 protocol is also used to get information from the environmental variable sensors. Occupancy app and HEMS app modules, presented in FIGURE 6, correspond to the occupancy detection/prediction model and the controller discussed in Sections II and III, respectively. To validate the performance of the proposed methods, the occupants of the apartment were asked to fill out a form with the actual occupancy. FIGURE 8 illustrates a portion of the ground truth occupancy data for the two zones of the apartment.
It should be noted that the proposed experimental setup is based entirely on open source development tools. Besides, a container-based infrastructure has been adopted to deploy the framework on the embedded system. A Raspberry Pi 3 B+ along with the open platform Docker [50] are used in this regard.

B. PERFORMANCE METRICS
In order to evaluate and compare the performance of the proposed system, the following metrics have been used.
• Accuracy: This metric is used to quantify the proportion of correct estimations performed by the model. The accuracy can be expressed as, where,ẑ k and z k are the estimated and actual occupancy states sequences, respectively. Moreover, [ẑ k = z k ] is a Iverson bracket, defined by, • Precision and Recall: Precision, also called positive predictive value (PPV), is the fraction of retrieved positive instances among all retrieved instances. Recall, also known as true positive rate (TPR), is the fraction of positive instances retrieved from all the positive cases. These metrics can be expressed as, where TP, FP, and FN are the true positives, the false positives, and the false negatives, respectively.
• F1-Score: This metric is the harmonic mean of precision and recall, and is approximately the average of the two when they are close. The F1-score can be expressed as, • MCC: The Matthews correlation coefficient (MCC) is a balanced measure that can be used even if the classes have very different sizes. The MCC returns a value between −1 and +1, where +1 represents a perfect prediction, 0 no better than a random prediction, and −1 indicates total disagreement between prediction and observation. This indicator is defined by,

V. SIMULATION AND EXPERIMENTAL VALIDATION
In this section, numerical examples are provided to illustrate the performance of the proposed framework. In this regard, the collected data over three months of cold weather during 2020-2021 are utilized. Initially, the consistency of the EDHMM under variations in the input data is evaluated through simulation in Section V-A. Afterwards, in Section V-B, the effectiveness of the proposed occupancy detection approach and the control strategy is investigated using an experimental case study.

A. SIMULATION ANALYSIS
This work uses the Monte Carlo simulation method to generate 200 synthetic lighting consumption and environmental variable profiles in order to evaluate the consistency of the EDHMM. Thus, the in situ data collected by the sensor network is modeled, assuming they follow a normal distribution. The simulation analysis is carried out for a period of two months of systematic data. Once the data is generated, the EDHMM is used to estimate the occupants' presence. Then, for each of the simulated scenarios, the model performance is evaluated using the metrics described in Section IV-B. In order to validate the performance of the proposed EDHMM, it is compared with two other generative methods in the literature. The first one is an homogeneous first-order HMM, presented by Candanedo et al. [17], and the second one is an in-homogeneous first-order HMM with multinomial logistic regression (iHMM-MLR), introduced by Chen et al. [24]. To perform the validation analysis, this work utilizes the data collected over a period in which windows and internal doors were usually closed. Furthermore, in this case study, the system uses a time window of N = 21 days for the learning process, and the occupancy of each zone is detected at 5-minute intervals. TABLE 3 shows the results of the three approaches under five evaluation criteria. These results illustrate that among the baseline approaches, the iHMM-MLR has a better estimation of occupancy states. It can be attributed to the in-homogeneous character of its transition matrix which captures the occupancy time-variant dependency. Moreover, it is seen that the proposed EDHHM outperforms the state-ofthe-art methods under the five criteria, achieving an accuracy up to 95.7%. In fact, this improvement highlights the potential of explicitly integrating the state duration and temporal variability in HMM for applications, such as occupancy state detection. It should be noted that the number of the states of the proposed model can exceed two. Accordingly, the EDHMM can be exploited in other applications, such as activity detection or the estimation of the number of people in residential and commercial buildings. FIGURE 9 shows the EDHMM daily average accuracy for about two months of data. These findings reveal that for 80% of the testing days, the model accuracy is higher than 88.5%. Furthermore, over the analysis period, some outlier days have been identified in which a decrease in model performance can be observed. On these atypical days, the EDHMM accuracy has been between 75% and 80%. Such findings are expected since the model has little or no information about these anomalous patterns in the learning data, making it difficult to anticipate these sporadic changes in behavior.

2) CONTROL STRATEGY
As presented in Section III, an occupancy-based control strategy is proposed to control the ESH systems. In fact, for the considered case study, two electric baseboard heaters with nominal power of 1.5kW and 1kW are controlled for zone 1 and 2, respectively. Moreover, to deploy the proposed strategy, this work uses the Python-embedded modeling language for convex optimization (CVXPY) [51] combined with the Splitting Conic Solver (SCS). The latter is selected due to its suitability to solve very large convex cone programs [52]. Furthermore, this work defines a fixed optimization horizon of 24 hours with a discrete time-step of 15 minutes.
Occupancy prediction is an nontrivial part of the control strategy. FIGURE 10 shows an example of the results obtained by the proposed hazard-based approach and the Monte-Carlo procedure while using 500 iterations to predict occupancy. In that figure, the actual occupancy value for zone 2, the predicted occupancy probability (β) for a 24-hour horizon, and the uncertainty of the prediction (µ ± σ ) are illustrated. These results show that occupancy probability is a time series that can take values between 0 and 1. However, due to the binary nature of occupancy, a threshold to transform occupancy probability values into a binary signal is often used in the literature [23]. Nevertheless, selecting a suitable threshold value to minimize the prediction error is necessary. An analysis to evaluate the impact of the threshold value on the prediction error is performed. As shown in TABLE 4 and FIGURE 11, the prediction accuracy is considerably affected by the selected threshold value. Furthermore, it is observed that the RMSE of the predicted presence probability is similar or even lower than the one obtained by the optimal threshold value. Consequently, this work uses β to put more or less pressure on the comfort objective of the cost function.
Furthermore, to validate the performance of the proposed control strategy, it is compared with an always-on strategy, which seeks to maintain the reference setpoint at all times, regardless of whether the room is occupied or not. Besides,   it should be noted that a deterministic control is performed for this work. However, the uncertainty of the occupancy prediction and weather variables are not integrated into this study.
To compare the two control strategies, each of them is applied for several days in the testing apartment. In order to make a fair performance comparison, a similar day strategy is used. Since the focus of this study is thermal comfort and the heating system energy consumption, five days on which the outside temperature maintains a similar daily average (in this case −1 • C) are selected for each of the control strategies. Likewise, for each of the analysis periods, weekdays and weekends are included. TABLE 5 presents the obtained results by the two control strategies. As expected, the minimum discomfort is obtained in the always-on strategy. Although the discomfort MAE and deviation are higher in the proposed OBC strategy, as presented in FIGURE 12, the internal temperature difference is less than ±0.85 • C at 95.5% of the time. This result confirms that the proposed approach can guarantee people's comfort.  From TABLE 5, the proposed OBC strategy can reduce the consumed energy for heating by 31.8%. FIGURE 13(a) to FIGURE 13(d) depict that these savings are achieved because the system reduces the room temperature during periods of absence. Moreover, as shown in FIGURE 13(e) and FIGURE 13(f), some of the energy consumption is shifted away from the periods when energy has the highest price. Therefore, the preheating of rooms is performed when it is estimated that a period of presence may begin shortly or if a period of peak cost starts soon. Likewise, the setpoint is reduced when it is predicted that the occupancy will end in a short time. In this way, energy reduction is achieved while the occupant's comfort is ensured.

VI. CONCLUSION
This paper presents an unsupervised online system for automatic occupancy detection and prediction applied to a residential space heating control scheme. To do so, an EDHMM is developed for unsupervised online presence detection, and a hazard-based approach is devised for occupancy prediction. Furthermore, a control strategy based on a weighted cost function and a load-shifting strategy based on dynamic pricing are suggested to enhance the energy and cost economies. Compared to other similar methods, the strengths of the proposed EDHMM lie in its unsupervised learning, the integration of occupancy state duration, and its real-time capability, which are of prime importance for accurate occupancy modeling and its application in real-life scenarios. The deployment of the proposed occupancy-based control strategy has resulted in 35% of energy savings and 31.8% of cost reduction in the analyzed case study. Furthermore, the experimental tests demonstrate that the system has ensured the thermal comfort of the occupants over 95.5% of the time. It is worth noting that the experimental setup has been developed using low-cost embedded systems, open source development tools, and common communication standards for IoT implementations to be conveniently replicable and scalable. The obtained results through simulation and experiments indicate that the features of the proposed occupancy-based control strategy regarding the anticipation of the occupants' presence and use of a dynamic energy price are highly effective in proactively managing ESH systems and minimizing the costs as well as users' discomfort. Moreover, up to three months of simulation and experimental results confirm the model consistency.
In future, it is expected to expand the analysis performed in this article to a scenario with multiple residences, for example, a neighborhood. Therefore, it will be essential to study the design of tariff schemes and identify the number of customers needed to change the load profile in the distribution network without creating new peaks.  SIMON SANSREGRET received the B.A.Sc. and M.A.Sc. degrees in mechanical engineering from the University of Sherbrooke, with a focus on energy. He has been a Researcher with the Laboratoire des technologies de l'énergie (LTE), Hydro-Québec Research Institute, since 2001. He is currently a member of the Ordre des ingénieurs du Québec. His expertise is related to energy efficiency and demand respond in building sector. In recent years, he has been devoted to the development of simulation tools in order to improve the energy efficiency of commercial and institutional buildings. He was responsible for the development of simulation software called SIMEB, an interface to EnergyPlus Simulation Engine. He has published several scientific articles in connection with the building energy simulation, model calibration, and visualization of building performance data. He contributed to various projects related to energy consumption and demand response in the residential sector. He was on the Board of Directors of the Canadian Chapter of International Building Performance Association (IBPSA-Canada), from 2010 to 2016.
MICHAËL FOURNIER received the B.S. degree in physical engineering from the Ecole Polytechnique de Montréal, Canada, in 2000, and the master's degree in electrical engineering from the Université du Québec a Trois-Rivières, Trois-Rivières, Canada, in 2003. He has been with Greenlight Power Technologies (now Greenlight Innovation) and the Hydrogen Research Institute, where he worked as a Professional Researcher in fuel cell modeling and integration. Since 2006, he has been a Researcher at the Laboratoire des technologies de l'énergie, Institut de recherche d'Hydro-Québec, Shawinigan, Canada. His current research interests include demand response, power load profile analysis, sustainable energy vectors, energy efficiency, and management of residential loads. He is a member of the l'Ordre des ingénieurs du Québec.