Enabling Preventive Conservation of Historic Buildings Through Cloud-Based Digital Twins: A Case Study in the City Theatre, Norrköping

Historic buildings require good maintenance to sustain their function and preserve embodied heritage values. Previous studies have demonstrated the benefits of digitalization techniques in improving maintenance and managing threats to historic buildings. However, a solution is still lacking that can consistently organize data collected from historic buildings to reveal their operating conditions in real time and to facilitate various data analytics and simulations. This study aims to provide such a solution to help achieve preventive conservation. The proposed solution integrates the Internet of Things and ontology to create digital twins of historic buildings. The Internet of Things reveals the latest status of historic buildings, while ontology provides a consistent data schema for representing them. This study also gives a reference implementation built on public cloud services and open-source software libraries, which makes the solution easier to reuse in other historic buildings. To verify the feasibility of the solution, we conducted a case study in the City Theatre, Norrköping, Sweden. The obtained results demonstrate the advantages of digital twins in providing maintenance knowledge and identifying potential risks caused by fluctuations of relative humidity.

Several digitalization techniques have been used to monitor and model historic buildings. Considerable research has been devoted to 3D geometry modeling of historic buildings [11]. Many techniques, such as 3D computer graphics, photogrammetry, and laser scanning, are used for modeling, managing, and preserving architectural heritage [7]. However, information stored in such systems is usually static and cannot reflect the real-time operating status of historic buildings. Another body of studies deployed wired or wireless sensor networks (WSNs) to monitor the status of historic buildings, such as indoor environment [12], [13] and structural health [14]. Nevertheless, such monitoring studies primarily focused on data collection, transmission, and storage, with little emphasis on data analytics and on converting analysis results into applications that benefit the conservation of historic buildings. Also, preserving historic buildings involves many specialists with different backgrounds, and the solutions proposed in these studies do not effectively support knowledge sharing and collaboration. To discover insights and facilitate preservation, some studies [15], [16] have attempted to create digital twins (DTs) of historic buildings. However, the solutions provided in these studies are difficult to transfer to other historic buildings due to the lack of a consistent, reusable, and extendable data format to represent buildings. Lacking a consistent data representation also restricts interoperability between subsystems of a building and limits the implementation of applications [17].
In this study, we aim to promote the preventive conservation of historic buildings through digitalization techniques, with a focus on mitigating the challenges in creating DTs of historic buildings. The main contributions of this work are as follows:
• Proposed a solution that can consistently represent the data in historic buildings, reflect the latest operating status of historic buildings, and facilitate further data analytics. The solution combines Internet of Things (IoT) and ontology to create DTs of historic buildings. IoT enables collecting the latest conditions of historic buildings through various sensors and other devices. Ontology provides a consistent schema for representing physical entities in historic buildings, such as spaces and assets. Subsequent analytics can be performed using historical and real-time data streams acquired from historic buildings.
• Presented a reference implementation using hardware, open-source software libraries, and the public cloud, i.e., Microsoft Azure, to reduce reinvention and make the solution easier to reproduce and reuse in other historic buildings. Because preserving a historic building is a long-term process, the use of public cloud services also eases the challenge of dealing with a large volume of collected data. Furthermore, the reference implementation can be packaged as software as a service (SaaS), which eases the delivery of applications and the transfer to other historic buildings.
• Validated the implementation by conducting a practical case study in the City Theatre, Norrköping, Sweden. The City Theatre is representative of a class of historic buildings with large open interior spaces, where activities bring together many people in the same place over a period of time. The case study demonstrated the functionalities and benefits of the created DT.
The remainder of this paper is structured as follows. After a discussion of related work in Section II, the detailed design of the solution is described in Section III. Then, a reference implementation, including used hardware, software, and public cloud services, is given in Section IV. After that, a description of the case study building and the data analysis method is given in Section V. Section VI presents and discusses the obtained results. The last section concludes the paper.

II. RELATED WORK
The reviewed studies in this section focus on the contents of preventive conservation of historic buildings, the background of digital transformation, the concept of DT, the application of DT in different fields, including the built environment, as well as solutions provided by cloud and industrial vendors for creating DTs.
Preventive conservation concentrates on distinct aspects for different types of historic buildings. For historic buildings having interior spaces, e.g., exhibition rooms, keeping an appropriate indoor environment is important [2]. Improper environmental conditions, such as temperature and relative humidity (RH), might harm buildings and collections. For example, RH levels greater than 75% at 20 °C promote the growth of fungi on surfaces and chemical deterioration in most organic materials [2]. Large fluctuations in temperature or RH could cause physical damage to materials [18]. Beyond thermal conditions, good air quality is also necessary for occupants such as staff and visitors. Most occupants will be satisfied with a steady-state carbon dioxide (CO2) concentration under 1,200 parts per million (ppm) [19]. For historic buildings where optimizing environmental factors is difficult, e.g., monuments, Van Balen [3] claimed that preventive conservation is more concerned with responding quickly to urgent events such as disasters. In addition to managing threats, Lucchi [6] identified another major area of preventive conservation, i.e., energy modeling of historic buildings. Energy modeling aims to improve the energy efficiency of historic buildings to reduce costs. All these conservation studies could benefit from a deeper understanding of the operating conditions of historic buildings.
Digital transformation aims to integrate information and communication technologies (ICTs) to improve an entity by causing considerable changes to its attributes [20]. IoT and cloud computing are two representative technologies that significantly promote digital transformation [21]. The integration of IoT and cloud computing could benefit from both the data gathering capability of IoT and the data storage and processing capability of cloud computing [22]. A lot of data have been accumulated during the digital transformation and conservation of historic buildings. However, organizing data is more complex than collecting it [23]. The complexity is due to the large amount of data and the heterogeneity of the sources. The data originate from analytical studies, surveys, diagnostics, and monitoring and must be continuously updated during performance assessment, planning, and execution [11]. The validity of data can also be an issue. Anomalies in data should be detected [24] and classified either as data source errors, such as sensor failure, or as indications of abnormal status in a building. Therefore, it is challenging to effectively organize the collected data during the digital transformation of historic buildings.
Although the precise concept of DTs varies among scholars, most people believe that a DT represents a physical entity [25], [26]. Furthermore, a DT must evolve to reflect the changes of its physical counterpart [27]. Thus, creating a DT of a physical object is useful when the physical object changes over time [28]. DTs have been applied in various sectors, such as manufacturing [29], aerospace [26], energy [30], and healthcare [31]. The built environment research community has various views on the capabilities that a DT must offer. Boje et al. [32] believed that a DT's ability to actuate real-world activities, whether proactive or reactive, in response to environmental changes is crucial. Jiang et al. [33] considered that whereas a DT necessitates data transfer from the physical entity to the virtual entity, feedback is not required. Also, the virtual entity can control its physical counterpart, but this is not required. We believe that the ultimate form of a DT in the built environment must include direct interactions from virtual to physical space. However, in practical application, a DT can gradually evolve, e.g., starting from effective monitoring to real-time data analysis, and finally including automatic control.
Within studies on modeling historic buildings, a concept usually confused with DT is historic building information modeling (HBIM). Although both DT and HBIM necessitate using a virtual model to represent a physical entity [33], there are some differences between them. First, while 3D geometric models, often obtained from 3D point clouds by laser scanning and photogrammetry [7], are typically used in HBIM, they are not necessarily required for DTs [34]. Second, the dataset in HBIM is often static. The information update for such a dataset is usually manually carried out by professional operators, resulting in a loss of temporal correlation [35]. However, DT is constantly updated [34] which accurately reflects the current state of its physical counterpart [33]. Last, the application of DT is more focused on extracting useful information from data. By analyzing historical and real-time data acquired from the physical entity, a DT enables preventive maintenance, performing what-if analysis, and simulations [26]. Therefore, while HBIM is appropriate for modeling, managing, and restoring historic buildings by utilizing reality-based recording data [7], DT is suitable for preventive conservation of historic buildings.
Several studies have applied DTs to the built environment. Khajavi et al. [36] recommended combining three necessary components to construct the DT of a building: data components from existing building information modeling (BIM), WSN, and data integration as well as analytics. Lu et al. [37] proposed a hierarchical architecture for creating DTs at the building and city levels and demonstrated the benefits of an anomaly detection service in managing the health of building assets. Angjeliu et al. [15] created a DT model of a historic masonry building by integrating data such as geometry, visual observation, construction process, and material properties. The DT model is calibrated with dynamic measurements and used to investigate the structural response of the system or to support activities connected to planned preventive maintenance. Zhang et al. [16] presented a DT-based approach for optimizing relative humidity in underground heritage sites. Their self-developed web-based DT platform uses IoT technology to control the ventilation system automatically. However, the solutions proposed in these studies to create DTs are difficult to generalize to other historic buildings due to the lack of a consistent, reusable, and extendable data format to represent buildings.
Representing a building aims to conceptualize the whole building or selected parts of a building. The conceptualization contains the objects, concepts, and other entities that are presumed to exist in the area of interest, together with the relationships among them [38]. An ontology provides an explicit data format for a conceptualization [39]. Several studies have addressed the demand for a consistent data format. Balaji et al. [17] proposed a standardized metadata schema called Brick for representing buildings. The schema defines a concrete ontology for sensors, subsystems, and their relationships, enabling the development of applications. Hammar et al. [10] developed an ontology, namely RealEstateCore, to help property owners describe the data generated by interactions within buildings as well as the management, storage, and exchange of these data. Digital Twins Definition Language (DTDL) [40] is a language for specifying DT models. DTDL is built on a JavaScript Object Notation (JSON) variation known as JSON-LD that is intended to be usable as JSON and in Resource Description Framework (RDF) systems. While DTDL allows users to generate models of any entity from scratch, it is preferable to use DTDL in conjunction with existing domain knowledge.
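To make the shape of a DTDL model concrete, the following minimal sketch builds a DTDL (version 2) interface as a Python dictionary and serializes it as JSON. The model identifier `dtmi:example:heritage:Room;1`, the property names, and the helper functions are hypothetical and chosen only for illustration; they are not taken from RealEstateCore or from the implementation described in this paper.

```python
import json

# Hypothetical DTDL v2 interface for a room. The identifier and all field
# names are illustrative assumptions, not RealEstateCore definitions.
ROOM_MODEL = {
    "@context": "dtmi:dtdl:context;2",
    "@id": "dtmi:example:heritage:Room;1",
    "@type": "Interface",
    "displayName": "Room",
    "contents": [
        {"@type": "Property", "name": "temperature", "schema": "double"},
        {"@type": "Property", "name": "relativeHumidity", "schema": "double"},
        # A relationship lets a room twin point at the assets it contains.
        {"@type": "Relationship", "name": "hasAsset"},
    ],
}


def to_json(model: dict) -> str:
    """Serialize the model; the result is plain JSON that is also JSON-LD."""
    return json.dumps(model, indent=2)


def property_names(model: dict) -> list:
    """List the names of all Property entries declared by the interface."""
    return [c["name"] for c in model["contents"] if c["@type"] == "Property"]


if __name__ == "__main__":
    print(to_json(ROOM_MODEL))
    print(property_names(ROOM_MODEL))
```

In practice, such an interface would typically extend or reference classes from an existing ontology instead of being written from scratch, which is the point made in the paragraph above.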
To reduce reinvention and accelerate the application of DTs, several public cloud and industrial vendors have provided services or software for creating DTs. Some services, e.g., Microsoft Azure Digital Twins [41] and Amazon Web Services (AWS) IoT TwinMaker [42], are for general purposes. Others, such as Google Cloud Digital Supply Chain [43], Siemens Intosite [44], and General Electric (GE) Digital Twin [45], are industry-specific. There are also a few open-source DT frameworks like Eclipse Ditto [46]. Although these tools do not provide out-of-the-box solutions for creating DTs of historic buildings, some of them could be leveraged to a certain extent to reduce duplication efforts.
Against this background, this paper describes our work on applying DT to the preventive conservation of historic buildings. While earlier modeling work documented historic buildings through 3D geometry models, we created a parametric DT through ontology. Using ontology also addresses the lack of a consistent data format for representing historic buildings in previous studies. Further, we provided a reference implementation by reusing and extending the RealEstateCore ontology as well as utilizing open-source software libraries and Microsoft Azure services. Although these digital technologies exist, they are not dedicated to preserving historic buildings. The focus of this study is to provide a methodology for using these technologies to facilitate preserving historic buildings. For example, although Microsoft Azure Digital Twins allows users to generate models of any entity from scratch, it does not provide a domain-specific data format for representing historic buildings. We need to combine it with existing domain knowledge to mitigate reinvention. The RealEstateCore ontology provides some domain knowledge to represent building components and subsystems, but we still need to extend it to make it suitable for the application scenario of preserving historic buildings. Furthermore, we do not bind our methodology to Microsoft Azure. Services provided by other cloud vendors may be used to implement this solution. The cloud-based nature also allows our implementation to be packaged as SaaS, making it easy to deliver various applications and to scale as the solution is rolled out to more historic buildings. These efforts make our solution easier for other researchers to reproduce, reuse, and adapt in their own work.

III. SYSTEM DESIGN
This section presents the overall design of the solution. First, the main modules and their functions are outlined. Then, the detailed architecture is given.

A. OVERVIEW
As shown in Fig. 1, the solution involves four main modules, namely physical entities, virtual models, data warehouse, and functional services, as well as connections, i.e., interaction and synchronization, between the modules.

1) PHYSICAL ENTITIES
Physical entities are real-world objects. An object could be a space or an asset in a space. A space ranges from a whole historic building to specific areas inside a building, such as a subbuilding, a floor, or a specific room. Assets are objects that are located in a historic building but are not part of the structure. Furniture, housed collections, and infrastructure such as electric power, water, lighting, and heating, ventilation, and air-conditioning (HVAC) systems are examples of assets.
The purpose of the application determines the physical entities that need to be modeled. When it comes to preserving collections in an exhibition room, the physical entities could be the indoor environment of the room. When energy optimization is desired for historic buildings, the physical entities could be energy-consuming equipment and the areas they serve.

2) VIRTUAL MODELS
Virtual models are digital representations of physical entities. Such representations aim to extract, define, and characterize essential attributes and behaviors of physical entities. Any model that sufficiently accurately depicts the physical entities can be used to create digital twins. This study proposes to build virtual models on a standardized metadata schema that establishes a concrete ontology for spaces, assets, and their relationships.
Besides providing a complete physical and functional description of the physical entities of interest, virtual models should be able to receive input and evolve with the status changes of the modeled physical entities. Virtual models should also provide interfaces through which other modules are notified of model changes and can retrieve model data. In this way, virtual models contain the necessary information for performing specific subsequent tasks to enable the development of portable applications.

3) DATA WAREHOUSE
The data warehouse entails defining data structures, operating procedures, and data storage. Operating status, model definitions, and alert rules are examples of data related to physical entities. Operating status refers to past and current situations of physical entities, such as indoor environment and energy consumption. Model definitions include virtual models that depict physical entities and data analysis models that try to extract insights from data. Alert rules describe which preventative conservation actions are taken when certain conditions are met.
Because preserving historic buildings is a long-term process, evolving data related to the physical entities will accumulate. The data warehouse should be capable of handling, processing, and analyzing large monitoring datasets to extract useful information for practical use.

4) FUNCTIONAL SERVICES
Functional services aim to assist users in achieving smart maintenance of historic buildings. This goal is accomplished by integrating and analyzing data from various sources. There are three categories of functional services:
• Monitoring entails visualizing historical and real-time operating status data to help users keep track of events and behaviors.
• Analytics uses methods like exploratory data analysis and machine learning to extract valuable insights from data obtained from historic buildings.
• Decision-making focuses on using the retrieved insights to guide actions for smart maintenance, such as assessing current situations and diagnosing previous issues. Decision-making also includes notifying the facility managers, through pre-defined alert rules, when maintenance is necessary or a subsystem fails.
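As an illustration of how pre-defined alert rules could drive the decision-making service, the sketch below evaluates a sensor reading against two rules. The thresholds reuse the values cited in Section II (RH above 75% and CO2 above 1,200 ppm); the rule structure, field names, and messages are assumptions made for this example and do not describe the rules actually deployed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class AlertRule:
    """A hypothetical alert rule: a predicate on one named parameter."""
    name: str
    parameter: str
    condition: Callable[[float], bool]
    message: str


# Illustrative rules only; thresholds come from the literature in Section II.
RULES: Sequence[AlertRule] = (
    AlertRule("high_rh", "rh_percent", lambda v: v > 75.0,
              "RH above 75%: risk of fungal growth"),
    AlertRule("high_co2", "co2_ppm", lambda v: v > 1200.0,
              "CO2 above 1,200 ppm: occupants may be dissatisfied"),
)


def evaluate(reading: Dict[str, float],
             rules: Sequence[AlertRule] = RULES) -> List[str]:
    """Return the messages of all rules triggered by the reading.

    Parameters missing from the reading are skipped, not treated as alerts.
    """
    return [
        rule.message
        for rule in rules
        if rule.parameter in reading and rule.condition(reading[rule.parameter])
    ]


if __name__ == "__main__":
    print(evaluate({"rh_percent": 80.0, "co2_ppm": 600.0}))
```

Keeping rules as data separate from the evaluation logic means that new conservation criteria can be added without changing the service code.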

5) INTERACTION AND SYNCHRONIZATION
Interactions and synchronizations are connections between modules that allow data and commands to be shared. Synchronizations focus on data, while interactions focus on commands. Examples of synchronizations are:
• Physical entities-Data warehouse: the latest status of physical entities is stored in the data warehouse in a timely manner.
• Virtual models-Data warehouse: virtual models obtain the latest stored status from the data warehouse to evolve and reflect the changes of their physical counterparts.
Examples of interactions are:
• Physical entities-Virtual models: by operating virtual models, the corresponding commands are issued to control the associated physical counterparts to accomplish timely adjustment or optimization.
• Virtual models-Functional services: functional services search the virtual models to obtain information about the digital twins based on their properties, models, and relationships.
• Data warehouse-Functional services: functional services create queries that locate relevant resources in a building. Then, a functional service adjusts its behavior or applies an anomaly detection algorithm to the collection of retrieved resources.

B. ARCHITECTURE
Fig. 2 depicts the architecture. The architecture has two parts: the local part and the cloud part.

1) THE LOCAL PART
The local part consists of the edge platform and devices that retrieve information from historic buildings. The devices are the data sources of the entire system. According to their functions, they are divided into three groups:
• Sensors and collectors: sensors measure parameters of interest in historic buildings, such as temperature and RH. Collectors obtain measurements from the sensors.
• Actuators and controllers: actuators refer to the equipment in the historic building that can be controlled, such as fans and heaters. Controllers send control signals to this equipment and read its operating status.
• Other systems and brokers: other systems refer to existing systems, e.g., the building management system (BMS), in a historic building. Brokers help retrieve information from such systems and forward commands to them.
An edge platform has certain storage, computing, and networking resources that serve two purposes. One is to act as a gateway that reports data from local devices to the cloud platform and receives control commands issued from the cloud platform. The other is to store and process data locally, such as taking over part of the artificial intelligence (AI) and analytics workload offloaded from the cloud platform, which is helpful for sensitive data or critical applications that require high real-time performance. Because the data do not need to be uploaded to the cloud platform, they can be processed locally without delay.
The edge platform can communicate with devices, i.e., collector, controller, and broker, through wired or wireless networks.

2) THE CLOUD PART

a: CLOUD GATEWAY
In the cloud part, the cloud gateway is the entry. The cloud gateway manages local edge platforms and enables bi-directional communication between the cloud services and the edge. For example, when receiving a message from an edge, the cloud gateway notifies the downstream components to consume the message, such as updating the status of DT models and storing data.

b: DIGITAL TWIN MODELS
The digital twin models are built on ontology. Ontology provides a consistent data representation to store metadata of physical entities and capture all significant relationships between physical entities. Ontology-based digital twin models provide machine-readable data formats and querying tools, allowing other modules to reason about the semantics of the data.

c: STORAGE
Storage includes databases for storing structured and unstructured data. Other modules can write to or read from the databases through application programming interfaces (APIs) provided by storage. Structured data include telemetry collected by sensors, the status of equipment, or other organized data. Unstructured data include texts and images. Texts may be a maintenance guide for equipment or collections. Images may be photographs of collections, such as paintings and sculptures, taken periodically to track how their surfaces have deteriorated.

d: INSIGHTS
Based on historical and real-time data, we can do a series of data analytics to discover insights from data. Data analytics usually start with exploratory data analysis (EDA). EDA is a method that uses various techniques (mostly graphical) to maximize insight into a data set [47]. Based on the understandings obtained from EDA, appropriate machine learning (ML) algorithms can be selected to build predictive models to perform functions such as anomaly detection and energy prediction. After these predictive models are trained and validated, they can be deployed for performing streaming data analysis.

e: APPLICATIONS
The ultimate goal of the system is to provide applications for preventive conservation. Applications can take many forms, including Web Apps, REST APIs, and Streaming APIs. The Web App can be a visualization application of data and models or an interactive simulation application such as energy prediction and occupancy prediction. An API is a contract between our system and information users. For example, the API designed for an indoor environment service could specify that the user supplies a building identity (ID) and a room ID. Our system replies with two indoor environmental conditions, the first being the temperature and the second being the relative humidity.
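The API contract described above can be sketched as a single typed function. The function name, the identifiers `city-theatre` and `salon`, the in-memory store, and the numeric values are all placeholders invented for this example; the actual service would read the latest values from the digital twin or the database.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class IndoorEnvironment:
    """Reply of the hypothetical indoor environment service."""
    temperature_c: float               # first value: temperature
    relative_humidity_percent: float   # second value: relative humidity


# Placeholder store keyed by (building ID, room ID); the values are made up.
_LATEST: Dict[Tuple[str, str], IndoorEnvironment] = {
    ("city-theatre", "salon"): IndoorEnvironment(21.5, 38.0),
}


def get_indoor_environment(building_id: str, room_id: str) -> IndoorEnvironment:
    """Return the latest temperature and RH for the given building and room.

    Raises LookupError when no data is known for the requested room.
    """
    try:
        return _LATEST[(building_id, room_id)]
    except KeyError:
        raise LookupError(
            f"no data for building={building_id!r}, room={room_id!r}"
        ) from None


if __name__ == "__main__":
    print(get_indoor_environment("city-theatre", "salon"))
```

Exposed through a REST endpoint, the same contract would map the two identifiers to path or query parameters and the reply to a two-field JSON object.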

IV. SYSTEM IMPLEMENTATION
Our implementation is a complete IoT system covering data collection with multiple sensors, data communication through an edge platform, data storage in the Microsoft Azure cloud, and the creation of DTs of historic buildings with ontology. This implementation emphasizes modeling the indoor environment to provide a deeper understanding of the operating conditions of historic buildings. The implementation does not include detecting other factors, such as pest infestation, natural disasters, theft, and vandalism, that might cause damage to historic buildings and housed collections. We have also reserved interfaces for further expansion, e.g., connecting equipment in historic buildings to exchange operation status and control commands.

A. THE LOCAL PART
The hardware used, including the sensors, collector, and edge platform, is depicted in Fig. 3.

1) SENSORS
Five sensing devices are used for measuring six indoor environmental parameters.
All sensing devices were calibrated before being deployed in the case study building. Among the five sensing devices, the DHT22 and the Grove-Piezo sensor were calibrated by ourselves. The other three sensing devices, i.e., MH-Z16, PPD42NS, and MIKOE-1630, were calibrated by the manufacturers. In addition, all sensing devices underwent two weeks of continuous operational testing in a test room before deployment.

2) COLLECTOR
An Arduino Uno Rev3 SMD (Arduino, Somerville, MA, USA) together with a Grove Base Shield V2 (Seeed Technology, Shenzhen, CHN) is used as the collector. The collector has abundant interfaces, including an SPI connection, a UART connection, six analog input/output (IO) pins, and 14 digital IO pins. These interfaces enable the collector to obtain readings from sensors and communicate with the edge platform. A 16-bit cyclic redundancy check (CRC) is used to detect errors during data transmission between the collector and the edge platform.
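The error-detection step can be illustrated with the following sketch of a 16-bit CRC applied to a message frame. The text does not state which CRC-16 variant is used, so the CRC-16/CCITT-FALSE parameters (polynomial 0x1021, initial value 0xFFFF), the frame layout (payload followed by a big-endian checksum), and the example payload are assumptions.

```python
def crc16_ccitt_false(data: bytes, init: int = 0xFFFF, poly: int = 0x1021) -> int:
    """Bitwise CRC-16/CCITT-FALSE (assumed variant; MSB first, no reflection)."""
    crc = init
    for byte in data:
        crc ^= byte << 8                      # feed the next byte into the top bits
        for _ in range(8):
            if crc & 0x8000:                  # top bit set: shift and apply polynomial
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def frame(payload: bytes) -> bytes:
    """Append the 2-byte big-endian checksum to the payload (assumed layout)."""
    return payload + crc16_ccitt_false(payload).to_bytes(2, "big")


def verify(framed: bytes) -> bool:
    """Recompute the checksum on the receiving side and compare."""
    if len(framed) < 2:
        return False
    payload, received = framed[:-2], int.from_bytes(framed[-2:], "big")
    return crc16_ccitt_false(payload) == received


if __name__ == "__main__":
    message = frame(b"T=21.5;RH=38.0")        # hypothetical sensor payload
    print(verify(message))                    # intact frame
    corrupted = bytes([message[0] ^ 0x01]) + message[1:]
    print(verify(corrupted))                  # frame with one flipped bit
```

On the receiving side, a frame that fails verification would simply be discarded or requested again, which is one plausible way the edge platform could handle transmission errors.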

3) EDGE PLATFORM
The edge platform includes a computing module, i.e., Raspberry Pi CM3+ Dev Kit, and a network module, i.e., ZTE MF833V. The used Raspberry Pi CM3+ Dev Kit (Raspberry Pi Foundation, Cambridge, GBR) has 1 GB RAM and 32 GB flash storage, as well as a processor with a clock speed of 1.2 GHz. The ZTE MF833V (ZTE, Shenzhen, CHN) is a 4G USB modem that allows users to connect to mobile broadband. The download speed for the specific 4G network is up to 150 Mbps, and the upload speed is up to 50 Mbps.

B. THE CLOUD PART
The cloud part processes the collected data in real time, including data visualization and system health monitoring. We have also reserved interfaces to integrate machine learning and streaming analytics to detect anomalies in the six environmental parameters in real time.
As shown in Fig. 4, several cloud services provided by Microsoft Azure are used to implement the cloud part. The scale tiers, key specifications, and costs of the services used are listed in Table 1 of the Appendix. For each Azure service used, the scale tier was determined by present needs. These services can also be scaled up to fulfill future research needs.

a: IoT HUB
Azure IoT Hub [48] is a cloud-based managed service that acts as a central messaging hub for enabling communication between an IoT application and its linked IoT devices. IoT Hub serves as the cloud gateway.

b: FUNCTIONS
Azure Functions [49] is a cloud service that provides users with event-driven serverless computing. This study uses Azure Functions to transfer messages between different cloud modules, such as updating status of digital twin models and writing data to databases.
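One step such a function might perform is translating an incoming telemetry message into an update for the corresponding twin. Azure Digital Twins accepts twin updates as JSON Patch documents, so the sketch below builds such a patch as plain Python data without calling any Azure SDK. The telemetry field names, the twin property names, and the mapping between them are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical mapping from telemetry field names to twin property names.
FIELD_TO_PROPERTY: Dict[str, str] = {
    "temp": "temperature",
    "rh": "relativeHumidity",
    "co2": "co2",
}


def telemetry_to_patch(message: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build a JSON Patch (RFC 6902) document from one telemetry message.

    Only mapped fields are included; unknown fields are ignored. The "add"
    operation is used because, per RFC 6902, adding to an existing object
    member replaces its value, so it covers both first-time and later updates.
    """
    patch: List[Dict[str, Any]] = []
    for field, prop in FIELD_TO_PROPERTY.items():
        if field in message:
            patch.append({"op": "add", "path": f"/{prop}", "value": message[field]})
    return patch


if __name__ == "__main__":
    # Example message as it might arrive from the cloud gateway (made-up values).
    print(telemetry_to_patch({"temp": 21.5, "rh": 40.0, "deviceId": "box-1"}))
```

The resulting list can be serialized to JSON and passed to whichever client is used to update the twin; the same message would be written to the database in a separate step.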

c: DIGITAL TWINS
Azure Digital Twins [41] is adopted to deploy DTs. The DT models of historic buildings are defined by using DTDL [40] and the RealEstateCore ontology [50]. As shown in Fig. 5, the RealEstateCore ontology has three major classes: Space, Asset, and Capability. We reused the definitions of subclasses under the Space category. We extended several subclasses under the Asset and Capability categories to represent used hardware as well as their capabilities.

d: SQL DATABASE AND BLOB STORAGE
Azure SQL Database [51] and Blob Storage [52] are utilized to provide storage resources. Azure SQL Database stores structured data, such as all collected sensor data. Blob Storage stores unstructured data, for example, the definitions of DT models.

e: WEB APPS
Azure Web App Service [53] is used to host a web app called Minerva (see https://historicbuildings.azurewebsites.net). We developed Minerva using a low-code open-source framework, i.e., Dash Plotly [54], and a Bootstrap component package, namely Dash Bootstrap Components [55]. The Minerva user interface supports visualizing real-time and historical data as well as sharing these data.

V. CASE STUDY
This section presents a practical application of the implemented system in a historic building. First, a brief description of the historic building is provided. Then, the deployment of local devices is introduced. After that, the data analysis method is given. The results, including the created DT and its main functions, the impact of the occupants on the indoor environment, and the fluctuation analysis of RH, are given in Section VI.

A. DESCRIPTION OF THE CITY THEATRE
The City Theatre (see Fig. 6) is located in the city center of Norrköping, Sweden. The building was erected in 1908 in the Art Nouveau style. Since 1990, the City Theatre has been listed as a protected building under the national historic legislation. Now, the building is used as a platform for performing shows that reflect current society. The building has a salon room (see Fig. 7) with a seating capacity of 600 persons. Before the COVID-19 pandemic, in each of 2018 and 2019, more than 50 shows were performed, attracting over 13,000 audience members.
A case study in the City Theatre could be a good inspiration for the digital transformation of similar historic buildings, e.g., theatres and churches, that have large open interior spaces and draw many people to the same place for some time during activities.

B. DEPLOYMENT OF THE LOCAL DEVICES
To collect environmental conditions that are close to the average values of the seating area in the salon room, a sensor box (see Fig. 8) is deployed under the fence of the second floor of the grandstand, which is near the spatial center of the salon (see Fig. 7). The sensor box is a plastic box that packages an edge platform, a collector, and five sensors as described in Section IV-A. Data collection started on March 16, 2021, and will continue until at least the end of 2023. All six environmental parameters, i.e., temperature, RH, CO2 concentration, dust concentration, harmful gas concentration, and vibration, are collected every 15 s. Deploying more sensor boxes would provide a better understanding of the spatial distribution of environmental parameters. However, the location where the sensor box is deployed in this paper is representative, and the collected environmental data are assumed to approximate the average values of the seating area. In addition, the methodology to create DT models and the data analysis method (see Section V-C) are also applicable when more sensor boxes are deployed.

C. DATA ANALYSIS METHOD
The data analysis in this study serves two purposes. One is to investigate the impact of occupants on the indoor environment when shows are performed in the City Theatre, together with the relationship between changes in different environmental parameters. The other is to inspect the fluctuations in indoor relative humidity throughout a whole calendar year to see whether they pose any risk to conservation.
After preliminary observations of the collected data, we found that the sampling rate was much higher than needed to capture changes in real time. Therefore, we downsampled the data before further analysis. After removing invalid data, the original data, collected every 15 s, were downsampled by averaging the measurements over every five-minute interval. Due to occasional network or power issues, some readings (< 2%) were lost during transmission. Therefore, after downsampling, missing values were filled in using the nearest previous value. All subsequent times are presented in the 24-hour system and in local time (Greenwich Mean Time (GMT) +1 in winter time and GMT +2 in summer time). All shows performed in the City Theatre between September 2021 and February 2022 lasted over two hours.
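The downsampling and gap-filling steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the implementation used in the study; the function name `downsample_5min` and the `(timestamp, value)` input layout are assumptions made for the example.

```python
from datetime import datetime, timedelta
from statistics import mean


def downsample_5min(readings, start, end, step=timedelta(minutes=5)):
    """Average (timestamp, value) readings into fixed bins over [start, end).

    `readings` is assumed to contain only valid measurements. Bins without
    any reading take the average of the nearest previous bin (forward fill).
    Bins before the first reading stay None because nothing precedes them.
    """
    bins = {}
    for ts, value in readings:
        if start <= ts < end:
            index = (ts - start) // step  # timedelta // timedelta gives an int
            bins.setdefault(index, []).append(value)

    result = []
    previous = None  # last known bin average, reused to fill gaps
    for index in range((end - start) // step):
        if index in bins:
            previous = mean(bins[index])
        result.append((start + index * step, previous))
    return result
```

With 15 s sampling, each five-minute bin averages up to 20 readings, so a short transmission gap inside a bin is absorbed by the average and only fully empty bins need filling.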

1) THE IMPACT OF OCCUPANTS ON INDOOR ENVIRONMENT
The analysis of the impact of occupants on the indoor environment is carried out by combining exploratory data analysis (EDA) and classical statistical analysis. EDA is adopted to observe and compare the changes in the indoor environment before and during shows, as well as analyze what factors cause these changes. The classical statistical analysis focuses on quantitatively studying the impact of different occupancy levels on the indoor environment.
To quantitatively study the impact of different occupancy levels on temperature, as shown in Fig. 9, we use the show start time as the boundary between two equal periods, namely before the show and during the show, each with a two-hour duration. We then use the average of the measurements in each period as its representative value. The difference between these two averages is the change brought by occupants. We denote the average temperature in the before-the-show period as $T_{\text{before}}$ and the average temperature in the during-the-show period as $T_{\text{during}}$; the temperature change is then calculated as $\Delta T = T_{\text{during}} - T_{\text{before}}$.
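As a minimal sketch of this calculation, the function below averages the samples in the two hours before and the two hours after the show start and returns their difference. The name `delta_t` and the `(timestamp, temperature)` input layout are illustrative assumptions, not part of the study's code.

```python
from datetime import datetime, timedelta
from statistics import mean


def delta_t(samples, show_start, window=timedelta(hours=2)):
    """Return mean(during) - mean(before) for (timestamp, temperature) samples.

    "before" covers [show_start - window, show_start) and
    "during" covers [show_start, show_start + window).
    """
    before = [v for ts, v in samples if show_start - window <= ts < show_start]
    during = [v for ts, v in samples if show_start <= ts < show_start + window]
    if not before or not during:
        raise ValueError("no samples in one of the two periods")
    return mean(during) - mean(before)
```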
All days were chosen between September 2021 and February 2022 (during the heating season) to mitigate the impact of outdoor weather and obtain a relatively consistent indoor temperature. The number of days with shows is three times the number of days without shows. This ratio ensures equal group sizes after the days with shows are split into three groups with different occupancy levels. Days without shows were selected so that their distribution over the months and their ratio of weekdays to weekends are taken into account.
Then, the days with shows are divided into three occupancy level groups, namely High, Medium, and Low. A control group called Zero is formed from days without shows, using the same time period of the day. For the Zero group, the changes represent natural changes that act as a baseline. In contrast, for the other three groups, the changes are brought by the combined effect of occupants and the operation of HVAC systems. Hence, we have four groups in total, corresponding to four occupancy levels.
In this study, we have one independent categorical variable, i.e., the occupancy level, and one dependent continuous variable, i.e., $\Delta T$. To determine whether there are any statistically significant differences between the means of $\Delta T$ in the four occupancy level groups, we conduct a one-way analysis of variance (ANOVA). This reveals whether different occupancy levels result in different temperature changes. For one-way ANOVA, the null hypothesis $H_0$ is that there is no difference among group means, i.e., $\mu_1 = \mu_2 = \mu_3 = \mu_4$, where $\mu_k$ denotes the mean value of $\Delta T$ in group $k$. The alternative hypothesis $H_1$ is that the means are not all equal. The significance level is set to 0.05.
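For illustration, the test statistic of a standard one-way ANOVA can be computed as in the sketch below. It uses only the Python standard library and returns the F statistic with its degrees of freedom. Obtaining the p-value additionally requires the F distribution (available in statistical packages), which is deliberately left out here. The function name `one_way_anova_f` is an assumption made for the example.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of observation lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

$H_0$ is rejected at the 0.05 level when the p-value associated with F at (df_between, df_within) degrees of freedom falls below 0.05.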

2) FLUCTUATION ANALYSIS OF RELATIVE HUMIDITY
The approach proposed in the European standard EN 15757:2010 [18] is used to inspect the fluctuation of RH. Three statistics, i.e., yearly average level, seasonal cycle, and fluctuation, characterize indoor RH. The yearly average level is calculated as the arithmetic mean of the RH readings during a whole calendar year. The seasonal cycle is determined by computing the centered moving average (CMA) for each reading, which is the arithmetic mean of all RH readings collected in a 30-day window spanning 15 days before and 15 days after the time of that reading. A fluctuation is calculated as the current RH reading minus the 30-day CMA determined for that reading.
The 7th and 93rd percentiles of the fluctuations recorded during the monitoring period are used to determine the lower and upper bounds of the safe band of RH variations. If the absolute values of both the 7th and 93rd percentiles are less than 10%, a 10% RH deviation from the seasonal RH level can be accepted [18].
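The three steps above (30-day CMA, fluctuation, percentile-based safe band) can be sketched as follows. This is a simplified illustration, not the text of the standard: it assumes readings sorted by time, truncates the window at the start and end of the series, and uses linear interpolation for percentiles. All function names are hypothetical.

```python
from datetime import datetime, timedelta


def centered_moving_average(times, values, half_window=timedelta(days=15)):
    """CMA over [t - half_window, t + half_window]; `times` must be sorted."""
    cma = []
    lo = hi = 0      # the current window is values[lo:hi]
    running = 0.0
    for t in times:
        while hi < len(times) and times[hi] <= t + half_window:
            running += values[hi]
            hi += 1
        while times[lo] < t - half_window:
            running -= values[lo]
            lo += 1
        cma.append(running / (hi - lo))
    return cma


def percentile(sorted_values, p):
    """Percentile with linear interpolation, p in [0, 100]."""
    rank = (len(sorted_values) - 1) * p / 100
    low = int(rank)
    high = min(low + 1, len(sorted_values) - 1)
    return sorted_values[low] + (rank - low) * (sorted_values[high] - sorted_values[low])


def rh_safe_band(times, rh):
    """Return (cma, fluctuations, lower_bound, upper_bound) in % RH."""
    cma = centered_moving_average(times, rh)
    fluctuations = [r - c for r, c in zip(rh, cma)]
    ordered = sorted(fluctuations)
    lower, upper = percentile(ordered, 7), percentile(ordered, 93)
    # If both percentiles are within 10 % RH, a 10 % deviation is accepted.
    if abs(lower) < 10 and abs(upper) < 10:
        lower, upper = -10.0, 10.0
    return cma, fluctuations, lower, upper
```

A reading is then considered outside the safe band when its fluctuation is below the lower bound or above the upper bound.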

VI. RESULTS AND DISCUSSION
This section summarizes obtained results and findings. First, the created DT and its main functions are presented. Then, the impact of occupants on the indoor environment is shown. Finally, the fluctuation of RH in the City Theatre is analyzed.

A. THE CREATED DIGITAL TWIN
A limited parametric DT of the City Theatre is created with the locally deployed sensor box. Fig. 10 shows the twin graph of the created DT. In the twin graph, the circular nodes represent virtual models of physical objects. The arrows indicate relationships between virtual models. The virtual models and relationships correspond one-to-one with physical objects and relationships in the real world. For example, the top node The City Theatre corresponds to the entire building. It is followed by the node The salon room. The relationship between the two nodes is called isPartOf, i.e., the salon room is part of the City Theatre. The DT is open to extension and modification. When new physical objects need to be modeled, corresponding virtual models and relationships can be added to the twin graph. Existing virtual models can also be modified, e.g., by adding more properties to model their physical counterparts more precisely.
Such a DT provides three capabilities. The first is that virtual models reflect the latest status of physical objects because they are fed with sensor data collected in near real time. Therefore, by exploring the properties of virtual models, users can obtain the current conditions of physical objects. Fig. 11 illustrates such a use case: by clicking on the node representing the capability of measuring temperature, users can see the latest indoor temperature of the salon room.
The second is that the DT provides knowledge of a historic building. The twin graph can be searched to learn more about the virtual models and their relationships. These queries are written in a custom SQL-like query language. For instance, suppose several other rooms in the City Theatre are equipped with the same sensor box as the salon room. We may then ask the DT, through a single query instruction, to find all rooms whose current CO2 concentration is below 500 ppm.
The third capability lies in collaborating with data analytics to uncover valuable insights from historical and real-time data to help optimize operations and achieve preventive conservation of historic buildings. The dashboard page of Minerva shown in Fig. 12 presents the collected readings of six environmental parameters in a typical week. Some correlations can be found between changes in readings of different environmental parameters when there were human activities in the salon room. As depicted in Fig. 12a and Fig. 12c-e, five peaks can be found in each plot of temperature, CO2, dust, and harmful gas from October 6 to 10. These drastic environmental changes were caused by the shows performed on those five days. Also, as shown in Fig. 12e, an extra peak of harmful gas concentration can be found on October 4. The air quality sensor MIKROE-1630 (see Section IV-A1) used in this study can detect the presence of various harmful gases, including CO2, whereas the CO2 concentration measured by the CO2 sensor MH-Z16 showed no significant increase on October 4. We therefore speculate that this peak was caused by harmful gases other than CO2. Such differences between the concentrations of CO2 and overall harmful gas might be used to develop an anomaly detection service in the DT in future work. It is worth mentioning that the plot of vibration readings (see Fig. 12f) shows a constant value because we have not yet determined the threshold of collected interruptions for distinguishing whether a vibration occurred. The threshold will be decided in future work. In the following subsection, the impact of occupants on the indoor environment is analyzed in detail.
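To make the room query discussed in this subsection concrete, the sketch below models a tiny twin graph in memory and filters rooms by their latest CO2 reading. It is a conceptual illustration only: it does not use the cloud service's API or its query language, and the class names, the `co2_ppm` property, the rooms other than the salon room, and all values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Twin:
    twin_id: str
    model: str                                  # hypothetical model name, e.g. "Room"
    properties: dict = field(default_factory=dict)


@dataclass
class TwinGraph:
    twins: dict = field(default_factory=dict)          # twin_id -> Twin
    relationships: list = field(default_factory=list)  # (source, name, target)

    def add_twin(self, twin):
        self.twins[twin.twin_id] = twin

    def relate(self, source_id, name, target_id):
        self.relationships.append((source_id, name, target_id))

    def parts_of(self, target_id):
        """Twins that have an isPartOf relationship pointing at target_id."""
        return [self.twins[s] for s, name, t in self.relationships
                if name == "isPartOf" and t == target_id]


def rooms_below_co2(graph, building_id, limit_ppm=500):
    """IDs of rooms in the building whose latest CO2 reading is below the limit."""
    return [t.twin_id for t in graph.parts_of(building_id)
            if t.model == "Room"
            and t.properties.get("co2_ppm", float("inf")) < limit_ppm]


# Hypothetical example data; only the salon room exists in the case study.
graph = TwinGraph()
graph.add_twin(Twin("city-theatre", "Building"))
graph.add_twin(Twin("salon-room", "Room", {"co2_ppm": 950}))
graph.add_twin(Twin("foyer", "Room", {"co2_ppm": 430}))
for room_id in ("salon-room", "foyer"):
    graph.relate(room_id, "isPartOf", "city-theatre")

quiet_rooms = rooms_below_co2(graph, "city-theatre")
```

A room without a CO2 reading is treated as not matching, which avoids reporting rooms whose sensor data have not yet arrived.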

B. THE IMPACT OF OCCUPANTS ON INDOOR ENVIRONMENT
When shows are performed in the salon room, the presence of the audience can affect the indoor environment. Fig. 13 provides a typical example of such an impact. The show of the day started at 19:00 and ended around 21:20. As depicted in Fig. 13a, at around 19:00, as the audience entered the room, the temperature rose and reached the maximum value of 25.6 °C at around 19:30. After that, the temperature fluctuated around a relatively high value. After 20:00, the temperature first fell and then rose again. This change is attributed to a short break in the show, during which some audience members left the salon for a rest and then reentered the room. After the show ended, the audience left, and the temperature dropped to below 24 °C. A similar pattern can be found in the changes of CO2 concentration (see Fig. 13c). The change in relative humidity (see Fig. 13b) is the combined effect of moisture and temperature. Relative humidity fluctuated more during a show than when there was no audience.
As seen in Fig. 13d, there were two time periods when the dust concentration fluctuated at a high level (∼20,000 pcs/L). The first period was shorter and took place between 17:00 and 18:00, i.e., before the show started, during which we believe that the actors and staff were doing pre-show preparation work in the salon room. The second period was longer and lasted from the start of the show (around 19:00) to one hour after the show ended (around 22:20). The high dust concentrations in both periods are mainly due to the operation of the ventilation system, which increases air movement in the salon room and thereby stirs up small particles. The movement of air makes the environmental conditions relatively uniform, which suggests that data collected by one sensor box (see Fig. 8) are sufficient to reflect the indoor environment of the salon room.
We can see that CO2 concentration is a more sensitive indicator of the presence of occupants than temperature and relative humidity. This is reflected in two aspects. One is the quick response. Comparing Figs. 13a and 13c, we find that some audience members had already entered the salon room before the show began, resulting in a quick rise in CO2 concentration, whereas the temperature rise lagged behind.
The other is that the difference in CO2 concentration can reflect occupancy levels to a certain extent. A larger audience leads to a higher CO2 concentration. Fig. 14 shows the daily maximum CO2 concentration during a whole week, from October 4 (Monday) to October 10 (Sunday), 2021. CO2 concentrations were below 500 ppm on Monday and Tuesday because no show was performed on those two days. From Wednesday to Sunday, a show was performed each day. The maximum CO2 concentrations during these five shows all exceeded 900 ppm. On Saturday, the maximum CO2 concentration even exceeded 1,200 ppm, which might cause discomfort to some audience members. Such differences in CO2 concentration between shows reflect differences in the number of people attending different shows. We infer that the Saturday show drew the largest audience during that week. In addition, the authors suggest that when the audience is large, the airflow of the ventilation system can be appropriately increased to keep the CO2 concentration below 1,200 ppm and hence ensure the comfort of the audience. Fig. 15 shows the daily maximum CO2 concentration in a full calendar year, which indicates the activities in the City Theatre throughout the year. When there was no show, CO2 concentrations were often lower than 500 ppm. Due to the COVID-19 pandemic, there was no show in the City Theatre until September 2021. Moreover, the City Theatre drew larger audiences in October. The occupancy information can be further explored in future work, such as using changes in indoor environmental parameters to estimate occupancy levels in historic buildings in real time.
The quantitative study of the temperature changes supports the claim that CO2 concentration reflects occupancy levels. We used the average CO2 concentration in the during-the-show period to assign the days with shows to the different occupancy level groups. We then investigated whether there is a statistically significant difference in $\Delta T$ among the occupancy level groups. The days in each group are shown in Fig. 16. A lowercase n represents the number of $\Delta T$ data points in an occupancy group. $\Delta T$ is presented as mean ± standard deviation. $\Delta T$ increased from the Zero (n = 19, 0.02 ± 0.26 °C), to Low (n = 19, 0.59 ± 0.25 °C), to Medium (n = 19, 1.00 ± 0.29 °C), and to High (n = 19, 1.72 ± 0.46 °C) occupancy groups.
To determine whether the data have outliers, a boxplot (see Fig. 17) of $\Delta T$ for each occupancy level was used. Any data points that are more than 1.5 box lengths from the edge of their box are classified as outliers, illustrated as diamond dots. There was no outlier in the $\Delta T$ data, as assessed by inspection of the boxplot.
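The outlier rule used in the boxplot can be written down as a short check: a point is flagged when it lies more than 1.5 box lengths (interquartile ranges) below the first quartile or above the third quartile. The sketch below is an illustration with the Python standard library. The quartile method ("inclusive") is an assumption, and plotting tools may compute quartiles slightly differently.

```python
from statistics import quantiles


def boxplot_outliers(values):
    """Return the values lying more than 1.5 IQR outside the box."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # the "box length"
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]
```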
The Shapiro-Wilk test was conducted to test the normality of the $\Delta T$ data. Four tests were run, one for the dependent variable $\Delta T$ in each group. The $\Delta T$ data were normally distributed for each occupancy level, as assessed by the Shapiro-Wilk test (p > .05). The assumption of homogeneity of variances was violated, as assessed by Levene's test for equality of variances (p < .05). This means that the standard one-way ANOVA is not applicable in this case. Instead, a modified version of the ANOVA that does not assume equal variances, i.e., Welch's ANOVA, is used. $\Delta T$ was statistically significantly different among the occupancy levels. Therefore, the statistically significant difference in $\Delta T$ confirms that CO2 concentration can reflect occupancy levels. Consequently, for historic buildings like the City Theatre, the indoor CO2 concentration might be used to estimate occupancy levels and thus to make conservation decisions adaptively. For example, since a higher occupancy level leads to higher internal heat gain, a more efficient heating strategy such as adaptive heating is recommended. With adaptive heating, the heating system of the case study building or similar historic buildings is controlled based on occupancy levels. Hence, one goal of preventive conservation, i.e., saving energy, could be achieved.
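For readers unfamiliar with Welch's ANOVA, the sketch below computes its test statistic and degrees of freedom from raw group data. Each group is weighted by its size divided by its sample variance, so groups with unequal variances are handled without pooling. This is a generic textbook formulation written with the Python standard library, not the software used in the study. The p-value again requires the F distribution and is omitted.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Return (F, df1, df2) of Welch's ANOVA for a list of observation lists."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Weight of each group: n_i / s_i^2 (sample variance).
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_w = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_w

    numerator = sum(w * (m - weighted_mean) ** 2
                    for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_w) ** 2 / (n - 1)
              for w, n in zip(weights, sizes))
    denominator = 1 + 2 * (k - 2) * lam / (k ** 2 - 1)

    f_stat = numerator / denominator
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)  # generally not an integer
    return f_stat, df1, df2
```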

C. FLUCTUATION ANALYSIS OF RELATIVE HUMIDITY
The yearly mean value of RH was 32.1%. The 7th percentile of the fluctuations is −7.0%, while the 93rd percentile is 6.9%. Since the absolute values of both percentiles are lower than 10%, a 10% deviation from the 30-day CMA level is adopted as the upper and lower bounds of the safe band. The measured values (see Fig. 18) show that the RH is higher in summer months and lower in winter months. Also, in summer months, the RH fluctuated more, and more risky data points were measured than in winter months. The authors suggest that this difference is mainly related to indoor heating: in winter, the operation of the heating system keeps the temperature and RH more stable.
The authors propose that some proactive interventions could be made during the summer months to reduce fluctuations in RH and thus achieve better heritage conservation. It is recommended to cycle the ventilation system on during the summer months, even when the salon room is not occupied, to lower RH and mitigate its fluctuations. Adding a dehumidifier would also help to control the RH better.

VII. CONCLUSION
This study aimed to facilitate the preventive conservation of historic buildings by integrating digitalization techniques to create digital twins. A solution was proposed that combines Internet of Things and ontology to consistently represent the data in historic buildings, reflect the latest operating status of historic buildings, and enable further data analysis. An implementation was provided to verify the proposed solution. The implementation is a complete Internet of Things system that we developed using hardware, open-source software libraries, the RealEstateCore ontology, and Microsoft Azure. A practical case study was also conducted in a historic building located in Norrköping, Sweden, to demonstrate the functions of the implemented Internet of Things system. The results indicated that a digital twin that reflects the latest status of a historic building can be created and fed with real-time sensor data. The insights discovered from the digital twin provided facts for improving the indoor environment of the case study building to achieve both heritage conservation and human comfort. For future work, monitored indoor environmental conditions could be used to estimate occupancy levels. Further operating data, such as energy consumption, could also be integrated into the digital twin to model the energy behavior of the historic building and to reduce operating costs.
Table 1 summarizes the Azure services used in this manuscript. All services used are hosted in the Azure region North Europe. Costs are in Swedish krona (kr). It is worth noting that the monthly paid services, such as IoT Hub, SQL Database, and Web App, currently support the operation of three sensor boxes: in addition to the one described in this research, two other sensor boxes were deployed in other case study buildings. Moreover, it is estimated that supporting the operation of 100 sensor boxes like the one in this study would cost around 3,000 kr per month in Azure services.