Transmission Probability of SARS-CoV-2 in Office Environment Using Artificial Neural Network

In this paper, curve-fitting and artificial neural network (ANN) models were developed to predict R-Event, defined as the expected number of new infections that arise in any event occurring over a total time in any space. Real-time data for the office environment were gathered in the spring of 2022 in a naturally ventilated office room in Roorkee, India, under composite climatic conditions. To ascertain the merit of the proposed models, the performance of the ANN approach was compared against that of the curve-fitting model using conventional statistical indicators, i.e., correlation coefficient (R), root mean square error (RMSE), mean absolute error (MAE), Nash-Sutcliffe (NS) efficiency index, mean absolute percentage error (MAPE), and a20-index. Eleven input parameters, namely indoor temperature ($T_{In}$), indoor relative humidity ($RH_{In}$), area of opening ($A_{O}$), number of occupants ($O$), area per person ($A_{P}$), volume per person ($V_{P}$), $CO_{2}$ concentration ($CO_{2}$), air quality index ($AQI$), outer wind speed ($W_{S}$), outdoor temperature ($T_{Out}$), and outdoor humidity ($RH_{Out}$), were used in this study to predict the R-Event value as an output. The primary goal of this research is to establish the link between $CO_{2}$ concentration and the R-Event value, eventually providing a model for prediction purposes. In this case study, the correlation coefficients of the ANN model and the curve-fitting model were 0.9992 and 0.9557, respectively, which shows that the ANN model is more accurate than the curve-fitting model in R-Event prediction. The results indicate that the proposed ANN prediction performance (R = 0.9992, RMSE = 0.0018708, MAE = 0.0006675, MAPE = 0.8643816, NS = 0.9984365, and a20-index = 0.9984300) is reliable and highly accurate for predicting the R-Event for offices.
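The statistical indicators listed above can be computed from paired observed and predicted values. The following is a minimal sketch using common textbook definitions; the function name `regression_metrics` is hypothetical, and the a20-index definition (share of samples whose predicted/observed ratio lies between 0.80 and 1.20) is an assumption, since the exact formulas used by the authors are not given in this section.

```python
import math


def regression_metrics(actual, predicted):
    """Goodness-of-fit indicators for paired observed/predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    errors = [p - a for a, p in zip(actual, predicted)]

    # Pearson correlation coefficient (R)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    ss_a = sum((a - mean_a) ** 2 for a in actual)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(ss_a * ss_p)

    sse = sum(e * e for e in errors)
    return {
        "R": r,
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(e) for e in errors) / n,
        # MAPE in percent; requires non-zero observed values
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        # Nash-Sutcliffe efficiency: 1 means a perfect fit
        "NS": 1.0 - sse / ss_a,
        # a20-index (assumed definition): share of samples whose
        # predicted/observed ratio lies within 0.80 to 1.20
        "a20": sum(1 for a, p in zip(actual, predicted)
                   if 0.8 <= p / a <= 1.2) / n,
    }
```

A perfect predictor gives R = 1, RMSE = 0, NS = 1, and a20 = 1, which is the direction in which the reported ANN values point.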


I. INTRODUCTION
Governments of several nations are predicting the occurrence of a fourth wave of Coronavirus Disease (COVID-19) in their countries, while many countries are already facing a critical situation. The Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) causes COVID-19 and has been spreading rapidly across the world since the beginning of 2020. On 30th January 2020, the World Health Organization (WHO) declared the outbreak a "Public Health Emergency of International Concern", and it was designated a "pandemic" on 11th March 2020. Many countries immediately implemented preventive measures such as social (physical) distancing and lockdowns. This resulted in negligible or limited outdoor activities during the COVID-19 pandemic; consequently, people spent more time indoors. SARS-CoV-2 is abbreviated as 'SC-2' in this study.
The associate editor coordinating the review of this manuscript and approving it for publication was Binit Lukose.
This outbreak of SC-2 has caused immense and wide-ranging losses (human lives, economic loss, physical and psychological health loss, educational and cultural loss, job loss) to every nation [1], [2], [3], [4], [5], [6]. Doubts regarding the transmission pathways of SC-2 have been raised because of the rapid rise in COVID-19 symptoms and mortalities throughout the world. The WHO COVID-19 Dashboard [7] shows that the death toll will soon cross the 6.5 million mark, along with more than 5 billion total cases of COVID-19 as reported to the WHO across the globe.
Contact transmission and droplet transmission are widely recognized by the public [8]. However, in addition to the more commonly established transmission routes of direct contact with sick individuals or contaminated surfaces and larger respiratory droplets, inhaling tiny airborne droplets (aerosols) is a likely third route of infection. According to several recent studies, droplets of different sizes are generated when an infected person breathes, speaks, sneezes, or coughs, and the larger droplets swiftly settle within a range of 1-2 metres [9], [10]. As illustrated in Figure 1, droplets with smaller diameters can travel a longer distance, depositing more than 2 metres away from the virus-emitting source [11]. Smaller particles carry a higher viral concentration and, owing to the negligible influence of gravity and the presence of air flow, can travel up to 6 metres. Figure 2(b) illustrates how droplet and airborne transmission relate to the distance between infected and susceptible persons. Susceptible person '1' is at high risk because of the short distance from infected person 'I', whereas susceptible person '2' is at moderate risk and susceptible person '3' is at low risk because of a distance of more than 6 metres from person 'I'. Additionally, Figure 2(a) shows that smaller particles penetrate more deeply into the human respiratory system and have a greater effect.
Several cases of suspected airborne transmission of various infectious illnesses (including COVID-19) in indoor spaces have been recorded. The first instance comprised thirty-seven confirmed cases among sixty-eight people who shared a bus in China for a hundred-minute journey [27]. The Diamond Princess cruise ship is the second notable occurrence, with 712 confirmed cases recorded among the 3711 people traveling on the ship [28]. The third instance had 9 confirmed cases; an index patient (who had earlier visited Wuhan, the epicentre of the pandemic) living in an apartment infected 4 other neighbouring residents in a housing complex in China [29]. In an experimental study conducted by Van Doremalen et al. (2020) [30], SC-2 remained viable in aerosol form and suspended in the air for the whole 3-hour experiment. Shrestha et al. [31] studied SC-2 aerosol dispersion in a virtual office building and concluded that unventilated spaces are more prone to transmission. They found that when a building is under-ventilated, significant aerosol build-up can develop even with low occupancy.
TABLE 1. Respiratory activities in office environment and COVID-19 [22], [23], [24], [25].
Poor ventilation in a building's interior areas has been linked to an increase in the spread of other respiratory diseases apart from SC-2. Given the current priorities, however, researchers have pointed out that improper ventilation is an important factor in the spread of SC-2 inside different types of buildings. After recognising the significance of ventilation systems in preventing the transmission of SC-2, various researchers, policymakers, building-related research institutes, and Heating, Ventilation and Air-Conditioning (HVAC) associations and societies of different countries have published revised guidelines on the management and operation of ventilation systems during the pandemic. Several nations across the globe, including the United States, India, China, Japan, Canada, and the European Union, have released recommendations to combat COVID-19 [32], [33]. Nearly all the guidelines suggest that it is beneficial to increase outdoor air ventilation throughout the day in a building. Table 2 presents a comparative overview of several important guidelines and their recommended temperature and relative humidity limits. These guidelines also consider natural ventilation, specific spaces, $CO_{2}$ as a proxy indicator, door/window opening, etc. for effective prevention of the spread of SC-2 in buildings.
In buildings, Natural Ventilation (NV) is one of the most basic techniques for reducing energy use. NV permits substantial air changes per hour at minimal operating cost, with effective heat removal. Airflow through a building with NV is driven by both buoyancy forces and wind effects, although the wind pressure component is generally not considered in the design principles. The buoyancy forces are created by temperature differences within the structure, while the wind acts from the outside. Buildings can be designed to take advantage of one or both of these driving forces [46], [47], [48].
According to a research study [48], the aerosol suspension duration increased tenfold in a poorly ventilated room. Under steady-state conditions, the interior ventilation and the indoor $CO_{2}$ concentration have a specific relationship [49]. As a result, if a suitable device to measure interior ventilation is not available, the indoor $CO_{2}$ concentration can be used as a proxy indicator. Indoor $CO_{2}$ levels reflect a building's ventilation performance in relation to the air changes, air flow pattern (depending upon the indoor and outdoor environment and anthropogenic activities indoors), occupant density, opening area, and spatial volume of the occupied space. Conventional $CO_{2}$ measurement strategies fail to capture spatial or temporal variations in the $CO_{2}$ concentration level. Recent technological developments in $CO_{2}$ monitoring and prediction provide opportunities to monitor $CO_{2}$ and accurately predict the concentration levels inside buildings, which gives deeper insight into individuals' $CO_{2}$ exposure. Additionally, the exact air changes per hour (ACH) are hard to determine in a real-time office working environment because conditions are highly dynamic in naturally ventilated building spaces. For regular indoor environments, the $CO_{2}$ content should be kept at 1000 ppm or below. However, according to the SAGE-EMG guidelines [50] for COVID-19-like situations, $CO_{2}$ levels should be kept below 800 parts per million (ppm) in indoor environments where a lot of aerosols are produced. A space in which the concentration reaches 1500 ppm or more should be prioritised for improvement. $CO_{2}$ could be used as a proxy to detect the chances of infection spread due to SC-2, as both the spread of COVID-19 and the $CO_{2}$ concentration depend upon occupant density inside a confined space.
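The steady-state link between ventilation and indoor $CO_{2}$ mentioned above can be illustrated with a single-zone, well-mixed mass balance, in which the indoor concentration equals the outdoor concentration plus the ratio of the occupants' $CO_{2}$ generation to the outdoor air flow. The sketch below is illustrative only: the function names, the per-person generation rate, and the example flow rate are assumptions, not values taken from this study.

```python
def steady_state_co2_ppm(outdoor_ppm, occupants, gen_l_per_s, vent_l_per_s):
    """Well-mixed steady-state indoor CO2 (ppm) from a single-zone mass balance.

    gen_l_per_s: CO2 generated per person in L/s (assumed input)
    vent_l_per_s: total outdoor air flow in L/s (assumed input)
    """
    if vent_l_per_s <= 0:
        raise ValueError("ventilation flow must be positive")
    return outdoor_ppm + 1e6 * occupants * gen_l_per_s / vent_l_per_s


def co2_band(ppm):
    """Classify a reading against the 800 ppm and 1500 ppm levels quoted above."""
    if ppm < 800:
        return "below 800 ppm"
    if ppm < 1500:
        return "800 ppm to below 1500 ppm"
    return "1500 ppm or more: prioritise for improvement"


# Hypothetical example: 4 occupants, 0.005 L/s of CO2 each, 40 L/s of outdoor air
level = steady_state_co2_ppm(420.0, 4, 0.005, 40.0)
print(round(level), co2_band(level))
```

In a naturally ventilated room the flow rate changes continuously, so this relation only indicates the direction of the effect: less outdoor air or more occupants means a higher $CO_{2}$ level.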
Rudnick and Milton [51] evaluated the risk of indoor airborne viral transmission based on carbon dioxide concentration. They developed a model that employs $CO_{2}$ concentration as a marker for exhaled-breath exposure and computes the proportion of inhaled air that has already been exhaled by someone in the building. The authors created this $CO_{2}$-based risk model without assessing the outside air supply rate or assuming that it stays constant over time, and without presuming that the concentration of an infectious agent has reached steady state. Similarly, in 2021, Peng and Jimenez [52] proposed reconsidering the $CO_{2}$ concentration as a COVID-19 infection risk proxy for different indoor environments and activities. They derived analytical expressions for $CO_{2}$-based risk proxies and applied them to a variety of common indoor settings. They noted as a limitation that infection risk estimates carry large uncertainties, mainly from virus exhalation rates, but recommended installing low-cost $CO_{2}$-based indoor infection risk monitoring systems to promote public health and safety. Bazant and Bush [53] recommended limiting the time spent in a shared place with an infected individual to minimise COVID-19 indoor airborne transmission. Subsequently, Bazant et al. [54] reformulated this safety recommendation in terms of occupancy duration and mean exhaled carbon dioxide content in an indoor environment, allowing the use of $CO_{2}$ sensors in the risk assessment of airborne transmission of pathogenic viruses. The authors developed a mathematical model to forecast the probability of airborne transmission based on real-time $CO_{2}$ observations. They concluded that, because $CO_{2}$ levels are a direct indicator of ventilation and air mixing, they can also be used to assess the risk of airborne transmission of viruses like SC-2. Wang et al. [55] presented a review of the airborne transmission of respiratory viruses. Carbon dioxide sensors can be used to monitor and improve ventilation by acting as indicators of the accumulation of exhaled air. Although the ventilation type should also be considered, it has been recommended to keep carbon dioxide levels below 700 to 800 ppm. Martinez et al. [56] developed a novel agent-based simulator named "ArchABM". ArchABM is intended to aid in the design of new buildings or the adaptation of existing ones by calculating sufficient room sizes and ventilation characteristics and assessing the effect of policies, while accounting for IAQ as a result of complex human-building interaction patterns. In that work, a previously published aerosol model was modified to compute time-dependent carbon dioxide ($CO_{2}$) and viral quanta concentrations in each room, as well as inhaled $CO_{2}$ and virus quanta concentrations for each occupant over the course of a day as a measure of physiological response. Recently, Tang et al. [57] considered spatio-temporal airborne viral infection risk assessment in real time using $CO_{2}$ concentration field monitoring and occupancy in the built environment. The authors suggested a technique that can assess real-time airborne disease infection risks via aerosol transmission. The method was applied in a university office, and the cumulative infection risk was evaluated for three occupants over three working days. The current work extends this research by introducing AI to predict airborne transmission using real-time collected data of $CO_{2}$ and other environmental parameters.
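The $CO_{2}$-based risk idea reviewed above can be illustrated with the rebreathed-fraction form commonly attributed to Rudnick and Milton [51]: the fraction of inhaled air that was previously exhaled is estimated from the indoor $CO_{2}$ excess, and the infection probability follows an exponential dose-response relation. This is a minimal sketch; the function names are hypothetical, and the exhaled-breath $CO_{2}$ level (38000 ppm) and the quanta generation rate are assumed inputs, not values from this study.

```python
import math


def rebreathed_fraction(indoor_ppm, outdoor_ppm, exhaled_ppm=38000.0):
    """Fraction of inhaled air that was previously exhaled by someone indoors.

    exhaled_ppm is the CO2 added to exhaled breath (assumed typical value).
    """
    return max(indoor_ppm - outdoor_ppm, 0.0) / exhaled_ppm


def airborne_infection_probability(indoor_ppm, outdoor_ppm, infectors,
                                   quanta_per_h, hours, occupants):
    """Exponential dose-response: P = 1 - exp(-f * I * q * t / n)."""
    f = rebreathed_fraction(indoor_ppm, outdoor_ppm)
    dose = f * infectors * quanta_per_h * hours / occupants
    return 1.0 - math.exp(-dose)


# Hypothetical example: one infector among 4 occupants for 8 hours,
# with an assumed quanta generation rate of 10 quanta/h
for ppm in (600.0, 800.0, 1000.0):
    p = airborne_infection_probability(ppm, 420.0, 1, 10.0, 8.0, 4)
    print(ppm, round(p, 4))
```

The probability rises with the indoor $CO_{2}$ excess, which is the behaviour that motivates using $CO_{2}$ as a risk proxy.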
Humidity and temperature are important parameters of an indoor environment and affect the virus survival rate. Both also play an important role in indoor human comfort and directly affect the occupant's perception. Mecenas et al. [58] reviewed the literature on the effects of humidity and temperature on COVID-19 transmission and found that it is important to include these parameters in future studies, as they can affect the spread of the disease. Several guidelines of different countries also consider relative humidity and temperature as important parameters. Many of these guidelines (as mentioned in Table 2) recommend boundary conditions for indoor relative humidity and temperature, considering both human comfort indoors and the prevailing spread of SC-2. Additionally, outdoor environmental conditions, occupant behaviour, and indoor occupant density are among other important parameters for reducing the spread of SC-2 in an indoor environment [59], [60]. Several organizations and governments of different countries around the world have issued guidelines on COVID-19-appropriate behaviour in office environments [35], [39], [43], [45], [50], [61], [62]. Reduced occupancy, personal hygiene, good ventilation, proper personal safety, timely training, circulation of authentic information, and workers' health monitoring were some of the important recommendations to employers for a safe working environment.
VOLUME 10, 2022
A suitable door and window opening strategy, adapted to spatiotemporal variation and occupant activity patterns, helps to reduce the indoor $CO_{2}$ concentration efficiently and thus affects the probability of SC-2 transmission in an office environment. Table 2 shows that nearly all the guidelines recommend opening doors/windows to enhance natural ventilation and dilute exhaled stale air indoors. However, in some cases, opening doors/windows is not suitable, especially in cases of extreme exterior noise, visual privacy concerns, high levels of exterior pollution (depending on $AQI$) infiltrating indoors, unwanted excessive light, uncomfortable air movements/drafts, invasion of flies and other insects, etc., which can reduce occupant concentration and increase errors, resulting in productivity loss. $AQI$ is also an important parameter as it represents the pollutant level; PM$_{2.5}$ and PM$_{10}$ are the most important components of $AQI$.
Tupper et al. [63] introduced a concept named "event R" in relation to SC-2. According to Tupper et al. [63], "event-R is the expected number of new infections due to the presence of a single infectious individual at an event". Using reports of small outbreaks, the authors obtained a fundamental relationship between "event R" and four other parameters, namely proximity of individuals, exposure duration, degree of mixing, and transmission intensity. The event-R is also termed R-Event. REHVA also used R-Event alongside infection probability in the updated version (Version 2.1) of its airborne COVID-19 prediction calculator [64].
Currently, scientific investigations throughout the world into the transmission of different variants of the SC-2 virus in indoor settings are trying to determine the infection probability accurately. The worldwide scientific community looks at two types of infection probability in enclosed spaces: (i) based on air change rate, and (ii) based on $CO_{2}$ exhaled by the occupants. In this study, the second type, based on $CO_{2}$ exhaled by the occupants, is linked with real-time dynamic spatiotemporal indoor and outdoor environmental parameters to predict the R-Event value. The R-Event value is an enhanced and more informative substitute for infection probability. The Version 2.1 calculator developed by REHVA (mentioned above) is used to find the R-Event value under the existing conditions, and this value is recorded simultaneously with the other real-time parameters to develop a link between exhaled $CO_{2}$ and R-Event in an office environment. A single unknown infected occupant (among four static occupants) in the office environment is considered as the infection source to find the possibility of airborne transmission (through aerosols and droplets only). This study aims to help readers accurately predict the R-Event value using indoor $CO_{2}$ levels in an office room. Along with $CO_{2}$, indoor as well as outdoor environmental parameters, occupancy, and occupant information are also required to obtain prediction results.
In this work, curve fitting and a supervised machine learning-based ANN model have been used to forecast the R-Event value as a proxy for the probability of SC-2 spreading. The objectives and novelty of the proposed study are explained in Section II. The details of the input parameters used to construct these models are given in Section III. Section IV presents the description of the curve fitting and ANN models. The findings of the curve fitting model and the results of the ANN models are described in Section V. The last section of this study contains the conclusion, limitations, and future scope of the work.

II. NOVELTY AND OBJECTIVES
If an infected occupant is present inside a building with susceptible occupants and all are exhaling and inhaling at the same time, lower ventilation rates and increased occupant density lead to a build-up of carbon dioxide ($CO_{2}$) together with the free-floating SC-2 virus responsible for the airborne spread of COVID-19. Various simulation studies and numerical models have been developed to predict infection probability using subjective and objective data. However, only a few researchers have attempted to establish a link between occupant $CO_{2}$ emissions and the transmission of SC-2. No real-time study has yet published a model based on advanced computing techniques such as artificial intelligence (AI).
There is very limited literature on the relation between indoor $CO_{2}$ concentration and viral transmission [48], [51], [52], [54], [56], [57]. However, the indoor $CO_{2}$ concentration can be used to forecast the number of infected people, because $CO_{2}$ can serve as a proxy for the SC-2 concentration indoors. The $CO_{2}$ concentration has not yet been linked with the amount of infectious particles present in the air under real-time dynamic conditions such as an uninterrupted office or classroom environment. The objectives of this research study are threefold: (i) collecting real-time spatial and temporal data of both environmental and personal parameters to establish a relationship among the parameters using machine learning (ML); (ii) developing a novel link between $CO_{2}$ and the infection probability-based R-Event value; and (iii) comparing the novel ANN and curve-fitting models for R-Event prediction.
Further contributions of this study are the development of a $CO_{2}$ concentration database based on high-resolution real-time office-environment data and the investigation of individuals' exposure to human-exhaled $CO_{2}$ by mapping occupants' activities inside the office and linking them with R-Event.

III. MATERIALS AND METHODS
$CO_{2}$ was measured continuously during office hours using the EXTECH "Indoor Air Quality Meter/Datalogger" Model EA80. The EA80 measures $CO_{2}$ concentrations using a maintenance-free $CO_{2}$ sensor with a range of 0 to 6,000 ppm. The device was kept away from the static occupants, as $CO_{2}$ exhaled directly over the device would degrade the accuracy of the measurement. Temperature and humidity were measured by the EXTECH "Heat Stress WBGT Meter" Model HT200. The HT200 enables accurate measurements of Air Temperature (TA) and Relative Humidity (RH). The Heat Stress Index measures how hot it feels when humidity is combined with radiant heat, air movement, and temperature. Both instruments were set at a height of 0.8 m above the floor level. The technical specifications of the two devices are presented in Table 3 and Table 4. Figure 3(a) and (b) show, respectively, the $CO_{2}$ measurement device EXTECH Model EA80 and the heat stress meter EXTECH Model HT200 used for temperature and humidity measurement. The instruments were produced by FLIR Commercial Systems Inc. under the EXTECH Instruments brand. As per the stringent guidelines of FLIR Commercial Systems, the instruments hold ISO-9001 certification for Quality Management System; this is the prime reason for selecting these models for this study. Both instruments were calibrated and inspected before use. More accurate models are available for the same purpose, but owing to economic constraints these models were used in this study. The calculation process was carried out in a MATLAB R2021a environment on a laptop computer (Dell Inspiron 15R, Intel(R) Core(TM) i5-4200U CPU @ 1.60 GHz 2.30 GHz, 6 GB RAM).
A total of 638 data sets out of 950 data sets were used to build the prediction model. According to the literature, the amount of data required to develop an ML-based model is the number of input parameters multiplied by ten (here, 11 × 10 = 110 data sets). This study includes eleven input variables, namely indoor temperature ($T_{In}$), indoor relative humidity ($RH_{In}$), area of opening ($A_{O}$), number of occupants ($O$), area per person ($A_{P}$), volume per person ($V_{P}$), $CO_{2}$ concentration ($CO_{2}$), air quality index ($AQI$), outer wind speed ($W_{S}$), outdoor temperature ($T_{Out}$), and outdoor humidity ($RH_{Out}$), and one output variable, Event-R (R-Event). The dataset was selected on the basis of having at least two subjects in the office room. Single-occupancy data sets were not considered for this study, so 312 data sets were excluded from the whole dataset. The $CO_{2}$ concentration inside the office is mainly affected by the occupancy of the office room, as the workers themselves are the emitting source. The output R-Event is the expected number of new infections that arise in any event occurring over a total time 'T' in any space. The mathematical representation of R-Event is presented in equation 1 [63]. However, for this study, the R-Event is numerically worked out by using the REHVA Version 2.1 calculator for SC-2 infection probability prediction. The mathematical representation of the REHVA calculator-based R-Event is presented in equation 2 [64]. The conditions inside the NV office building also depend upon the external environment around the building. Office blocks near dense areas, marketplaces, and busy roads are generally affected by external conditions. Hence, outdoor environmental parameters are also considered in this study.
where $T$ is the total event duration, $t$ is the contact time between a single susceptible person and an infected person, $\beta$ is the constant per-unit-time probability of transmission, $1 - e^{-\beta t}$ is the probability of any susceptible person being infected, and $k$ is the total number of susceptible persons contacting the infectious person. Simply, it can be written as
(2)
where $IP$ is the infection probability.
Figure 4 depicts the methodology adopted for this study. The artificial neural network (ANN) based novel approach for predicting SC-2 transmission linked with exhaled $CO_{2}$ as a proxy indicator in an office environment starts with database preparation. The database preparation includes three main steps: (i) collection of data, (ii) normalization of data, and (iii) splitting of data for training, testing, and validation.
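To make the quantities defined above concrete, the following minimal sketch computes the per-person infection probability $1 - e^{-\beta t}$ and an expected number of new infections. The product form $k \times IP$ for a single infectious person is an assumption made here for illustration; it is not a reproduction of equation 1, equation 2, or the REHVA calculator. The function names and the numerical values are hypothetical.

```python
import math


def per_person_infection_probability(beta, contact_time):
    """Probability that one susceptible person is infected: 1 - exp(-beta * t)."""
    return 1.0 - math.exp(-beta * contact_time)


def expected_new_infections(k_susceptible, infection_probability):
    """Expected new infections from one infectious person.

    Assumed form for illustration: number of susceptible contacts times
    the per-person infection probability.
    """
    return k_susceptible * infection_probability


# Hypothetical example: 3 susceptible occupants, beta = 0.01 per hour, 8 hours
ip = per_person_infection_probability(0.01, 8.0)
print(round(ip, 4), round(expected_new_infections(3, ip), 4))
```

Under this reading, the expected number of new infections grows with both the number of susceptible contacts and the contact time.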

A. COLLECTION OF DATA
The data used in this work were generated using a $CO_{2}$ analyzer and a heat stress meter in an office room of the Central Building Research Institute, Roorkee, a national laboratory of CSIR, India. The measurements were taken during February and March 2022. The coordinates of the tested location are 29°51'54'' N and 77°54'10'' E. The floor area of the office room was 24 m$^{2}$ and the volume of the room was 84 m$^{3}$. Approximately 40% of the floor area was covered by furniture (storage cabinet, table, chair, etc.), and approximately 20% of the volume was taken up by the static furniture, mostly within one metre of the floor. The office is on the ground floor and has one large glass and aluminum frame window on the south wall, at a sill height of 1.2 m, with a height and width of 1.5 m and 2.5 m, respectively. The window is divided into three parts; the middle portion is fixed, while the parts at both ends of the window were operable and considered as variable components of the room. The office room has two wooden doors, a front door and a side door. The front door was on the north wall, had dimensions of 1.2 m × 2.4 m, and opened into the corridor. The front door had a ventilator above it with dimensions of 1.2 m × 0.5 m. During the study period, this ventilator was closed at all times and was considered a fixed component of the room, while the front door was a variable component in this study. The side door was situated on the east wall of the office room. Its dimensions were 0.9 m × 2.0 m, and it was considered a variable component of the room. The front door was a double door and the side door was a single door. The details of the room components are presented in Table 5.
Considering the Hawthorne effect [65], neither the static nor the dynamic occupants (subjects) were told about the purpose of the study, as this would affect their normal behaviour (talking/working/activity level/free movement, etc.). Occupants tend to change their behaviour once they know that they are being monitored; their breathing patterns, movements, and other activities may also change if they know the purpose of the study. Only one occupant (the first author) knew the purpose of this study, as he was recording the details for it. The details of the subjects are tabulated in Table 6. All the static subjects were sitting in the longitudinal direction ('L' shape) at a distance of approximately one metre from each other, as shown in Figure 5.
The 3-dimensional representation of the 12 different cases considered during monitoring in the office room, with variable scenarios of two doors and two operable windows along with the furniture arrangement, is shown in Figure 6. The green-numbered operational cases (1, 2, 4, 5, and 10) were observed during this study. The data were collected forty-four times a day at 10-minute intervals from Monday to Friday. The data collection schedule is shown in Figure 7. Figure 8 presents the plot matrix of the dataset used in this study. 'R' and 'P' are the two key quantities reported in the plot matrix. The correlation coefficient 'R' is at its maximum, with a value of 1, between the input parameters area per person ($A_{P}$) and volume per person ($V_{P}$) (dark red circle at the center of Figure 8). The correlation coefficient is at its minimum at two points: between area per person ($A_{P}$) and the number of occupants ($O$), and between volume per person ($V_{P}$) and the number of occupants ($O$). The minimum value of R is −1 at these points and is shown by the dark blue circles at the center of Figure 8. The maximum value of P is 0.93 and the minimum value of P is 0.
The eleven collected input parameters are indoor temperature ($T_{In}$), indoor humidity ($RH_{In}$), area of opening ($A_{O}$), number of occupants ($O$), area per person ($A_{P}$), volume per person ($V_{P}$), $CO_{2}$ concentration ($CO_{2}$), air quality index ($AQI$), outer wind speed ($W_{S}$), outdoor temperature ($T_{Out}$), and outdoor humidity ($RH_{Out}$). According to the input data, the $CO_{2}$ concentration inside the office room ranged from 327 to 964 ppm, and the number of occupants during this study ranged from 2 to 5 persons. The output parameter R-Event ranged from 0.04 to 0.24. The statistical parameters of the dataset are tabulated in Table 7.
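The two occupancy-derived inputs can be illustrated with a short sketch. It assumes that area per person and volume per person are obtained by dividing the gross floor area (24 m²) and room volume (84 m³) by the number of occupants; this section does not state how the authors derived them (for example, whether furniture was deducted), so the function name and the formula are assumptions.

```python
FLOOR_AREA_M2 = 24.0    # floor area of the office room reported in this section
ROOM_VOLUME_M3 = 84.0   # room volume reported in this section


def per_person_space(occupants):
    """Return (area per person in m2, volume per person in m3).

    Assumes gross floor area and volume are shared equally by the occupants.
    """
    if occupants < 1:
        raise ValueError("at least one occupant is required")
    return FLOOR_AREA_M2 / occupants, ROOM_VOLUME_M3 / occupants


# Occupancy range observed in the study: 2 to 5 persons
for o in range(2, 6):
    area, volume = per_person_space(o)
    print(o, round(area, 2), round(volume, 2))
```

Under this assumption both quantities are the same kind of constant divided by the number of occupants, so they are proportional to each other and decrease as occupancy rises, which is consistent with the strong correlations among $A_{P}$, $V_{P}$, and $O$ described for Figure 8.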

B. STANDARDIZATION OF SELECTED DATA
The data were normalized once the database had been finalized after data collection. Normalization is an important step that maps the data into a fixed range such as 0 to +0.8, 0 to +0.9, 0 to 1, or −1 to +1 [66]. Because the values of the selected input parameters vary over different ranges, normalization was employed to make the computations easier. In this research work, the data were normalized to the range of 0 to +0.8 by using equation 3.
where $N_{normalized}$ is the normalized value, $N$ is any value in the selected database, $N_{min}$ is the minimum value of the selected dataset, and $N_{max}$ is the maximum value of the selected dataset.
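The normalization step can be sketched as a min-max scaling onto the range 0 to +0.8. The linear form used below, (high − low) × (N − N_min) / (N_max − N_min) + low, is the usual min-max formula and is assumed here; it is not a reproduction of equation 3. The function name `normalize` is hypothetical.

```python
def normalize(values, low=0.0, high=0.8):
    """Min-max scale a list of numbers onto [low, high] (assumed linear form)."""
    n_min, n_max = min(values), max(values)
    if n_max == n_min:
        raise ValueError("cannot normalize a constant column")
    scale = (high - low) / (n_max - n_min)
    return [low + (v - n_min) * scale for v in values]


# Example with the CO2 range reported in this study (327 to 964 ppm)
print(normalize([327.0, 645.5, 964.0]))
```

Each input column would be scaled separately with its own minimum and maximum, so that all eleven inputs share the same range before training.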

C. FILTRATION OF DATA
Kumar and Saini [67] used 70% of the data for training, 15% for validation, and 15% for testing to implement the multi-layer feed-forward backpropagation learning algorithm. In that study, 4808 randomly selected datasets were used for training, and the remaining 2060 datasets were split into two equal groups of 1030 datasets each for testing and validation. Similarly, in another study, Kumar et al. [68] divided the data into three parts: 70% for training, 15% for testing, and the remaining 15% for validation in ANN and ABC-ANN. Following this review of the existing literature on ANN, the data in this study were split into three parts comprising 70%, 15%, and 15% of the total dataset for training, testing, and validation, respectively. After splitting the 638 datasets, 446 datasets were used for training, 96 for testing, and 96 for validation.
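The 70%/15%/15% split described above can be sketched as a random partition of sample indices. The function name `split_dataset`, the fixed seed, and the rounding rule are assumptions for illustration; the authors' MATLAB procedure is not described in this section.

```python
import random


def split_dataset(n_samples, test_frac=0.15, val_frac=0.15, seed=42):
    """Randomly partition sample indices into training, testing, and validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_test = round(n_samples * test_frac)
    n_val = round(n_samples * val_frac)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]  # remainder, about 70%
    return train, test, val


train, test, val = split_dataset(638)
print(len(train), len(test), len(val))
```

With 638 samples this rounding yields 446, 96, and 96 indices, matching the counts reported above.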

IV. PREDICTION OF R-EVENT
In this research article, two methods were employed to develop the correlation and predict the values of R-Event, a proxy for infection probability. The first is a curve fitting method that applies a first-degree (linear) equation to the input and output parameters. The second is an ANN, an artificial intelligence technique that belongs to supervised machine learning.

A. CURVE FITTING METHOD
In the curve fitting procedure, the relationship between the first input parameter and the output parameter was defined first, which yields a constant term that depends on the remaining input parameters. The same approach was then applied to the second input parameter and so on until the last one, so that a functional relationship and a constant were obtained for each input parameter in turn. In detail, a first-order linear equation was used to determine the functional correlation between R-Event and the indoor temperature ($T_{In}$). A scatter diagram of R-Event as a function of $T_{In}$ is presented in Figure 9(a), and equation 4 shows the relationship between R-Event and $T_{In}$.
Let $N_1 = 0.2875$ be the constant term, which is itself a function of the remaining inputs, i.e., $RH_{In}$, $A_O$, $O$, $A_P$, $V_P$, $CO_2$, $AQI$, $W_S$, $T_{Out}$, and $RH_{Out}$. Equation 4 can therefore be updated and expressed as equation 5.

To extend the correlation to the indoor relative humidity ($RH_{In}$), the relationship between $N_1$ and $RH_{In}$ is drawn, as shown in Figure 9(b). The resulting equation between $N_1$ and $RH_{In}$ is expressed as equation 6.
Substituting the value of $N_1$ into equation 5 gives equation 7. Let $N_2 = 0.2193$ be the constant term, which is a function of the remaining inputs, i.e., $A_O$, $O$, $A_P$, $V_P$, $CO_2$, $AQI$, $W_S$, $T_{Out}$, and $RH_{Out}$. Equation 7 can therefore be expressed as:

$$\text{R-Event} = -0.0586\,T_{In} + 0.1641\,RH_{In} + N_2 \tag{8}$$

To extend the correlation to the area of opening ($A_O$), the relationship between $N_2$ and $A_O$ is drawn, as shown in Figure 9(c). The resulting equation between $N_2$ and $A_O$ is expressed as equation 9.
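For orientation, the first steps of the chain can be written out explicitly. The forms below are inferred from equation 8 and the stated constants $N_1 = 0.2875$ and $N_2 = 0.2193$; they are a reconstruction under that assumption, not a reproduction of the typeset equations.

```latex
\begin{align}
\text{R-Event} &= -0.0586\,T_{In} + 0.2875 \tag{4}\\
\text{R-Event} &= -0.0586\,T_{In} + N_1 \tag{5}\\
N_1 &= 0.1641\,RH_{In} + 0.2193 \tag{6}\\
\text{R-Event} &= -0.0586\,T_{In} + 0.1641\,RH_{In} + 0.2193 \tag{7}\\
\text{R-Event} &= -0.0586\,T_{In} + 0.1641\,RH_{In} + N_2 \tag{8}
\end{align}
```

Each later input is handled in the same way: the current constant is regressed on the next input, the result is substituted back, and the new intercept becomes the next constant.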
Substituting the value of $N_2$ into equation 8 gives equation 10. Let $N_3 = 0.3590$ be the constant term, which is a function of the remaining inputs, i.e., $O$, $A_P$, $V_P$, $CO_2$, $AQI$, $W_S$, $T_{Out}$, and $RH_{Out}$. Equation 10 can therefore be expressed as equation 11.

To extend the correlation to the number of occupants ($O$), the relationship between $N_3$ and $O$ is drawn, as shown in Figure 9(d). The same substitution steps lead to equation 14, in which $N_4$ is the constant term.

To extend the correlation to the area per person ($A_P$), the relationship between $N_4$ and $A_P$ is drawn, as shown in Figure 9(e). The resulting equation between $N_4$ and $A_P$ is expressed as equation 15. Substituting the value of $N_4$ into equation 14 gives equation 16. Let $N_5 = 0.0414$ be the constant term, which is a function of the remaining inputs, i.e., $V_P$, $CO_2$, $AQI$, $W_S$, $T_{Out}$, and $RH_{Out}$. Equation 16 can therefore be expressed as equation 17.

To extend the correlation to the volume per person ($V_P$), the relationship between $N_5$ and $V_P$ is drawn, as shown in Figure 9(f). The resulting equation between $N_5$ and $V_P$ is expressed as equation 18.
Substituting the value of $N_5$ into equation 17 gives equation 19. Let $N_6 = 0.0414$ be the constant term, which is a function of the remaining inputs, i.e., $CO_2$, $AQI$, $W_S$, $T_{Out}$, and $RH_{Out}$. Equation 19 can therefore be expressed as equation 20.

To extend the correlation to human-exhaled carbon dioxide ($CO_2$), the relationship between $N_6$ and $CO_2$ is drawn, as shown in Figure 9(g). The resulting equation between $N_6$ and $CO_2$ is expressed as equation 21.

To extend the correlation to the air quality index ($AQI$), the relationship between $N_7$ and $AQI$ is drawn, as shown in Figure 9(h).

To extend the correlation to the wind speed ($W_S$), the relationship between $N_8$ and $W_S$ is drawn, as shown in Figure 9(i).

To extend the correlation to the outdoor temperature ($T_{Out}$), the relationship between $N_9$ and $T_{Out}$ is drawn, as shown in Figure 9(j). The resulting equation between $N_9$ and $T_{Out}$ is expressed as equation 30.
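The input-by-input procedure can be sketched as a stagewise linear fit. This is only an illustration: it assumes that the per-sample "constant" at each step is the part of R-Event not yet explained by the inputs already fitted, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def stagewise_linear_fit(X, y):
    """Fit y ~ sum_k a_k * X[:, k] + c by regressing on one input at a time."""
    X = np.asarray(X, dtype=float)
    remainder = np.asarray(y, dtype=float).copy()
    slopes = []
    intercept = 0.0
    for k in range(X.shape[1]):
        # First-order fit of the current remainder (N_k) against input k.
        slope, intercept = np.polyfit(X[:, k], remainder, deg=1)
        slopes.append(slope)
        # What is still unexplained becomes the target for the next input.
        remainder = remainder - slope * X[:, k]
    # The intercept of the last fit plays the role of the final constant.
    return np.array(slopes), float(intercept)

def predict(X, slopes, intercept):
    """Evaluate the fitted linear expression for new inputs."""
    return np.asarray(X, dtype=float) @ slopes + intercept

# Synthetic, normalized demo data (not from the study).
rng = np.random.default_rng(1)
X_demo = rng.uniform(0.0, 0.8, size=(200, 3))
y_demo = 0.3 * X_demo[:, 0] - 0.1 * X_demo[:, 1] + 0.2 * X_demo[:, 2] + 0.05
slopes_demo, const_demo = stagewise_linear_fit(X_demo, y_demo)
```

A stagewise fit of this kind is not equivalent to a joint least-squares regression when the inputs are correlated, which may partly explain why a sequential linear model fits less closely than a jointly trained one.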
Substituting the value of $N_9$ into equation 29 gives equation 31. Let $N_{10} = 0.0793$ be the constant term, which is a function of the remaining input parameter, i.e., $RH_{Out}$. Equation 31 can therefore be expressed as equation 32.

To extend the correlation to the outdoor relative humidity ($RH_{Out}$), the relationship between $N_{10}$ and $RH_{Out}$ is drawn, as shown in Figure 9(k), which leads to equation 35. For all the parameters, the coefficient of correlation (Cc) can then be expressed in computational form as equation 36, and the final equation can be written as equation 37.

The first ANN, known as the perceptron, was created in 1958 by psychologist Frank Rosenblatt with the goal of modelling how the human brain processes visual input and learns to recognize things. Over the last four decades, artificial neural networks have made significant contributions to implementing many types of intelligent information processing, to understanding the core roles of actual neurons and brain activity, and to many engineering applications [69]. The ANN modelling technique is a computer-based methodology for simulating several key aspects of the human nervous system, notably the capacity to solve problems by applying knowledge gained from previous experience to new challenges or case scenarios. Learning models such as ANNs build a network of links between the chosen outputs and the available inputs through neurons arranged in hidden layers. Every ANN neuron has a weight parameter and is linked to the neurons of the preceding layer (hidden or input), with each connection having its own assigned weight. The number of neurons in the ANN's hidden layer has a significant impact on model learning performance.
If the number of neurons in the hidden layer is too low, the network learning function cannot converge to an ideal value, which results in oscillatory behavior of the error function, and the network is unable to learn the relationship between the input-output patterns. Overtraining is the primary cause of overfitting [70]. A trained ANN can identify causal relationships between inputs and outputs by comparing measured outputs with predicted outputs to get the best possible result [71]. Figure 10 shows the usual structure of a neural network, as well as an overview of a single node. An advantage of neural networks is that they can model complicated natural systems with many inputs more easily and more accurately [72]. As a result, an artificial structure may be built in the manner of natural networks, and the relationship between its components can be determined by altering the weight of each connection. After training, applying a given input to the neural network produces a definite output. The most crucial aspect of training is to reduce the errors and maximize the correlation coefficient (ideally R = 1). This is accomplished by adjusting the weights throughout the learning phase until the error function reaches its minimum. The R and MSE metrics are used to assess the network's performance; they are expressed in equations 38 and 39, respectively.
R represents the Pearson correlation coefficient, $x_i$ is the $i$-th measured value (x-variable) in a sample, $\bar{x}$ is the mean of the measured values, $y_i$ is the $i$-th predicted value (y-variable) in a sample, and $\bar{y}$ is the mean of the predicted values.
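For reference, the standard definitions of the Pearson correlation coefficient and the mean square error in this notation are given below. The sample count $n$ is an assumed symbol, and the equation numbers follow the references to equations 38 and 39 in the text.

```latex
\begin{align}
R &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
          {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{38}\\
MSE &= \frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2 \tag{39}
\end{align}
```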
Modifying the weights $w$, approximating $y_i$, and computing the resulting MSE is an iterative process. At the initial stage the errors are high because the weights are selected randomly. The goal of network learning is to find the weights that give the least error across all datasets. Estimating the weights by trial and error would take a great deal of time and effort. Within network training, the gradient-descent strategy is an efficient way to find the weights with the least error. As the name indicates, gradient descent uses the error gradient to descend the error surface [73]. Backpropagation determines the error, which depends on the network's output and hence on the weighted outputs of the hidden neurons and on the weights. Werbos [74] proposed backpropagation as a mechanism for learning, and it was later also suggested by Rumelhart [75]. Backpropagation is a gradient descent technique in which the network's weights move in the direction opposite to the slope of the performance function. This study proposes an ANN model to predict the R-Event for an office environment within the observed boundary conditions of the eleven input variables.
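The iterative weight update can be illustrated on the simplest possible case, a single linear neuron trained by gradient descent on the MSE. The sketch below is not the Levenberg-Marquardt routine used later in this work; the learning rate, epoch count, and synthetic target are assumptions chosen for illustration.

```python
import numpy as np

def gradient_descent(x, t, lr=0.5, epochs=500, seed=0):
    """Train y = w * x + b by gradient descent on the mean square error."""
    rng = np.random.default_rng(seed)
    w, b = rng.normal(size=2)  # random initial weights give a high initial error
    history = []
    for _ in range(epochs):
        y = w * x + b                      # approximate the targets
        err = y - t
        history.append(float(np.mean(err ** 2)))
        grad_w = 2.0 * np.mean(err * x)    # dMSE/dw
        grad_b = 2.0 * np.mean(err)        # dMSE/db
        w -= lr * grad_w                   # step against the gradient
        b -= lr * grad_b
    return w, b, history

# Synthetic normalized data: the target follows t = 0.5 * x + 0.1.
x = np.linspace(0.0, 0.8, 50)
t = 0.5 * x + 0.1
w, b, mse_history = gradient_descent(x, t)
```

In a multi-layer network the same update is applied to every weight, with backpropagation supplying the gradient of the error with respect to each one.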

1) MODELING OF PROPOSED ANN MODEL
Modeling is the procedure of representing a real-world object or phenomenon as a sequence of computational statements. It is essential to find the network's optimal design, which delivers high accuracy and a well-configured structure at the same time. Figure 11 represents the ANN architecture for this study, with eleven input parameters and R-Event as the output parameter. The ideal number of neurons and hidden layers was determined by trial and error because there is no formula for determining the exact number of hidden layers or of neurons in each layer. The optimum structure was found after testing various architectures with different numbers of neurons in each layer. The MSE and R of the network were calculated for each number of neurons in the hidden layer, while the network's other parameters remained unchanged. Figure 12 shows the network performance based on the MSE value. In this work, ANN models with 3 to 17 hidden neurons were trained. With 3 neurons, the correlation coefficient of training was 0.9905, and with 17 neurons it reached 0.9999. At least 20 iterations were performed on each of the ANN models. Figure 13 illustrates the optimal design of the proposed ANN model. The trial-and-error method was used to find the MSE and R of the data.
The candidate networks were ranked on the basis of their MSE and R performance: a higher R gives a better rank, whereas a lower MSE gives a better rank, as presented in Table 8. To find the best network overall, the R and MSE ranks of each candidate were totaled over training, testing, and validation. The trial with 8 neurons has the lowest total rank among all the trials. The network with a single hidden layer and 8 neurons in the hidden layer therefore gives the best performance among the candidates. The database features were normalized linearly between 0 and 0.8 to speed up learning and lead to faster convergence. The computations were done with the ANN toolbox in the MATLAB (R2021a) environment. The training algorithms available in the toolbox include Bayesian Regularization, Scaled Conjugate Gradient, and Levenberg-Marquardt. Of these, the Levenberg-Marquardt (LM) training algorithm was employed in this work because of its appropriate convergence, high accuracy, and low time consumption [76], [77], [78]. This approach usually requires more memory but less time. Training terminates automatically when generalisation stops improving, as indicated by a rise in the MSE of the validation samples. The method divides the data into three portions at random: 70% for training, 15% for validation, and 15% for testing; the same procedure is used in this study. TANSIG (equation (40)) and PURELIN (equation (41)) were chosen as the activation functions in the hidden and output layers, respectively.
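For reference, the two activation functions have the following standard forms, where $n$ denotes the net input to a neuron (an assumed symbol); the equation numbers follow the references in the text.

```latex
\begin{align}
\mathrm{tansig}(n) &= \frac{2}{1 + e^{-2n}} - 1 \tag{40}\\
\mathrm{purelin}(n) &= n \tag{41}
\end{align}
```

TANSIG is mathematically identical to the hyperbolic tangent and bounds the hidden-layer outputs to the interval (−1, 1), while PURELIN leaves the output layer unbounded.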
As mentioned above and presented in Table 8, networks with 3 to 17 hidden neurons were trained. Figure 12 presents the MSE values for the different numbers of neurons for training, testing, and validation. The training, testing, and validation MSE were recorded together with the R value for every network trained. The minimum MSE for training, testing, and validation occurs with 16, 10, and 8 neurons, with values of 0.00000103, 0.00002730, and 0.00002546, respectively. The maximum MSE for training, testing, and validation occurs with 3 neurons, with values of 0.00066024, 0.00105268, and 0.00062585, respectively.
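The rank-sum selection of the hidden-layer size can be sketched as follows. The scores below are hypothetical placeholders, not the values of Table 8, and the helper names are assumptions.

```python
import numpy as np

def rank_ascending(values):
    """Return ranks where 1 is the smallest value (ties keep input order)."""
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=int)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks

def best_by_rank_sum(neurons, r_scores, mse_scores):
    """Pick the candidate with the lowest total rank over train/test/validation."""
    r_scores = np.asarray(r_scores, dtype=float)
    mse_scores = np.asarray(mse_scores, dtype=float)
    total = np.zeros(len(neurons), dtype=int)
    for col in range(r_scores.shape[1]):
        total += rank_ascending(-r_scores[:, col])   # higher R earns a better rank
        total += rank_ascending(mse_scores[:, col])  # lower MSE earns a better rank
    return neurons[int(np.argmin(total))], total

# Hypothetical scores for three candidate sizes; columns are train, test, validation.
neurons = [3, 8, 17]
r_scores = [[0.95, 0.94, 0.95], [0.98, 0.99, 0.99], [0.99, 0.96, 0.97]]
mse_scores = [[9e-4, 9e-4, 9e-4], [2e-4, 1e-4, 1e-4], [1e-4, 5e-4, 4e-4]]
best, totals = best_by_rank_sum(neurons, r_scores, mse_scores)
```

In this toy example the largest network fits the training data best, but the mid-sized one wins on the combined ranking because it performs better on the testing and validation sets.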
In Figure 13(a), the errors lie in the range −0.01918 to 0.03197 (normalized values). The minimum MSE, 0.00027214, was reached at epoch 92. In Figure 13(b), the green line shows the validation MSE, the blue line the training MSE, and the red line the testing MSE. Figure 13(c) shows the learning process of the ANN.

2) PERFORMANCE INDICES
To evaluate the performance of the curve fitting and ANN models, basic performance indices, namely R, MAE, RMSE, MAPE, NS, and the a20-index, are considered [79], [80]. The equations for these performance indices are presented in equations 42 to 47. R is the Pearson correlation coefficient, MAE is the mean absolute error, RMSE is the root mean square error, MAPE is the mean absolute percentage error, and NS is the Nash-Sutcliffe efficiency index. A model is more accurate when R and NS approach 1 and when MAE, RMSE, and MAPE approach zero. These performance functions were used to evaluate the models' ability to predict R-Event accurately. The equations of R and MSE have already been given in subsection 4.2 as equations 38 and 39, respectively.
$$\text{a20-index} = \frac{m20}{N} \tag{47}$$

where $N$ is the total number of values in the experimental dataset, $x_i$ is the $i$-th measured value, $\bar{x}$ is the mean of the experimental values, $y_i$ is the $i$-th predicted value, and $\bar{y}$ is the mean of the predicted results.
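The indices can be computed together in a short routine. The sketch below uses the commonly cited definitions; in particular, it assumes that $m20$ counts the samples whose measured-to-predicted ratio lies between 0.80 and 1.20, and the function name and sample values are hypothetical.

```python
import numpy as np

def performance_indices(measured, predicted):
    """Return R, MAE, RMSE, MAPE (%), NS, and a20-index for two 1-D series."""
    x = np.asarray(measured, dtype=float)
    y = np.asarray(predicted, dtype=float)
    err = x - y
    ratio = x / y  # assumes no predicted value is exactly zero
    return {
        "R": float(np.corrcoef(x, y)[0, 1]),
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        # MAPE is undefined when a measured value is zero.
        "MAPE": float(100.0 * np.mean(np.abs(err / x))),
        "NS": float(1.0 - np.sum(err ** 2) / np.sum((x - x.mean()) ** 2)),
        # Fraction of samples within +/-20% of the prediction (assumed m20 / N).
        "a20": float(np.mean((ratio >= 0.8) & (ratio <= 1.2))),
    }

# Hypothetical measured and predicted values, only to exercise the function.
scores = performance_indices([0.2, 0.4, 0.6], [0.21, 0.39, 0.6])
```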

V. RESULTS AND DISCUSSION
Curve fitting and ANN models were used in this study to develop the correlation between the input parameters and R-Event. MAE and RMSE quantify the error in the predictions, whereas R and NS quantify the similarity and proximity between measured and predicted values of the same parameters. The R of the ANN model with 8 hidden neurons for training, testing, validation, and overall is presented in Figure 14. The training, testing, and validation R values are 0.99978, 0.99632, and 0.99965, respectively, and the overall R is 0.99922.
In this study, the various performance indices show that the ANN outperformed the curve fitting model. As shown in Figure 15, in the scatter plot of measured versus predicted values for the ANN model, nearly all the data lie within the 10% to −20% reference lines. For the curve-fitting model, however, the data exceed the 30% to −30% reference lines. The performance indices of both models are compared in Table 9. The ANN model performs better than the curve-fitting model, and all its error indices are lower. The correlation coefficient of the ANN model is 4.55% higher than that of the curve-fitting model. The MAE, MAPE, MSE, and RMSE of the ANN model are 93.35%, 91.82%, 98.16%, and 86.55% lower, respectively, than those of the curve-fitting model. The Nash-Sutcliffe efficiency index (NS) and a20-index of the ANN model are 9.36% and 13.75% higher, respectively, than those of the curve-fitting model. Figure 16 shows the results obtained by the curve fitting model. The errors in the dataset lie between −0.05 and 0.08. The frequency-error histogram (Figure 16(b)) shows that most errors lie between −0.04 and 0.04 and that the errors are distributed around a frequency peak at zero.
In this study, the ANN model surpassed the curve fitting model with a correlation coefficient (R) of 0.9992. Figure 17 shows the results obtained by the ANN model. The errors in the dataset lie between −0.04 and 0.01. All the datasets show smaller errors with the ANN model than with the curve-fitting model. The frequency-error histogram (Figure 17(b)) shows that most errors lie between −0.005 and 0.005, with the frequency peak at zero. The comparison between Figure 16 and Figure 17 shows that the ANN is more effective in predicting R-Event for office built environments. The error trends of the two models are visibly different, as the ANN error graphs show less deviation than the curve-fitting graphs.

A. ANN FORMULATION
As stated earlier, the proposed ANN model demonstrated high accuracy and interpretability. An explicit formula for R-Event can be obtained directly from the ANN model as a function of the weights, biases, and associated activation functions. The weights and biases for the input and output layers found in this study are presented in Table 10. The final equation to predict the R-Event is expressed as equation 48. The generalized formulation for the input to the hidden layer, $Y_i$, is represented in equation 49 and is calculated using equation 50, where $N_{i,normalized}$ are the normalized inputs, $W_{IH}$ is the weight matrix between the input layer and the hidden layer, $B_{IH}$ are the biases between the input layer and the hidden layer, and $W_{HO}$ and $B_{HO}$ are the weights and bias from the hidden layer to the output layer. The developed model can predict the R-Event for normalized inputs within the defined ranges. The proposed model includes eleven parameters to predict the indoor R-Event values for office buildings. Substituting the values from Table 10 into equation 49 gives equation 50.
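A forward pass through a network of this shape (eleven inputs, eight TANSIG hidden neurons, one PURELIN output) can be sketched as below. The weights and biases are random placeholders rather than the values of Table 10, and the function and variable names are assumptions.

```python
import numpy as np

def tansig(n):
    """Hyperbolic tangent sigmoid, written in the 2 / (1 + exp(-2n)) - 1 form."""
    return 2.0 / (1.0 + np.exp(-2.0 * n)) - 1.0

def purelin(n):
    """Linear (identity) transfer function."""
    return n

def predict_r_event_normalized(x_norm, w_ih, b_ih, w_ho, b_ho):
    """Evaluate the normalized output for one normalized input vector."""
    hidden = tansig(w_ih @ x_norm + b_ih)        # hidden-layer outputs, shape (8,)
    return float(purelin(w_ho @ hidden + b_ho))  # scalar normalized R-Event

# Placeholder parameters with the shapes implied by the architecture.
rng = np.random.default_rng(0)
w_ih = rng.normal(size=(8, 11))   # input-to-hidden weights
b_ih = rng.normal(size=8)         # hidden-layer biases
w_ho = rng.normal(size=8)         # hidden-to-output weights
b_ho = float(rng.normal())        # output bias
x_norm = rng.uniform(0.0, 0.8, size=11)  # eleven inputs scaled to [0, 0.8]
r_norm = predict_r_event_normalized(x_norm, w_ih, b_ih, w_ho, b_ho)
```

With trained weights substituted for the placeholders, the result would still need to be mapped back from the normalized range to obtain R-Event in its original units.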

VI. LIMITATIONS OF THE STUDY
This study has certain limitations. Its scope is restricted to office building indoor environments. A single source of infection is assumed, and only the airborne transmission route is considered, in an office environment operating under natural ventilation in composite climatic conditions. The research is limited by a number of technological, economic, social, and temporal restrictions. The procedure, however, stays the same and may be applied to a variety of buildings in different climatic regions with varying environmental conditions. Mechanical devices such as fans, air conditioning systems, and exhaust systems were not used in this study and so are not taken into account. They can, however, have a significant impact on the spread of airborne infections, which restricts the scope of the study. Only diurnal fluctuations in the environment are taken into account, since office hours preclude the observation of nocturnal variations. The occupants were healthy people between the ages of 21 and 30, with a normal breathing rate of 12-20 breaths per minute and no respiratory disorders. The data gathered are case-based and depend on other environmental parameters, such as pressure and wind direction, which are not taken into account in this study. The behaviour of old and new SARS-CoV-2 variants is extremely unpredictable, so the human-virus interaction is outside the scope of this investigation. Human behaviour varies greatly depending on a variety of conditions, which is a limitation in itself because each individual has a unique behavioural pattern based on their brain-prints and experiences. Most of these limitations will be addressed by the authors in future research.

VII. CONCLUSION AND FUTURE WORK
In this study, two predictive models, an ANN and a curve fitting model, were developed and tested to predict the R-Event inside an office room. Real-time data for the office environment were gathered in the spring of 2022 in a naturally ventilated (NV) office room situated in a composite climate. The main conclusions are as follows:
• Eleven parameters, namely indoor temperature ($T_{In}$), indoor relative humidity ($RH_{In}$), area of opening ($A_O$), number of occupants ($O$), area per person ($A_P$), volume per person ($V_P$), $CO_2$ concentration ($CO_2$), air quality index ($AQI$), outer wind speed ($W_S$), outdoor temperature ($T_{Out}$), and outdoor humidity ($RH_{Out}$), were used as inputs for the models. R-Event is the output, used as a proxy for the probability of SARS-CoV-2 infection spread.
• The data were analyzed first and then curve fitting was applied; afterwards, the analyzed data were trained, tested, and validated with the ANN.
• The ANN model can be used to predict R-Event accurately when the data for the necessary input parameters are available. This will help in saving time and effort.
• This study presents a novel computational model based on real-time collected data to predict the R-Event value as a proxy indicator for SARS-CoV-2 spreading in indoor office spaces.
• This work has certain limitations, as the model can only predict results within the limits of the dataset. Additionally, respiratory activities such as sneezing and coughing were not taken into account in this study.
• The movement of people outside the office (near doors and windows) is also an important parameter; however, due to highly dynamic conditions, this factor is not considered in this study.
• This model is created for naturally ventilated indoor settings (office rooms) where no mechanical equipment is used for ventilation; only door and window operations were used for ventilation purposes.
• The models developed in this study are useful for predicting the R-Event as a proxy for SARS-CoV-2 infection probability using the $CO_2$ level in other similar office environments and built spaces.
Some other general observations were also made during the study; these are as follows:
• Human activities affect the carbon dioxide concentration directly, and a higher activity level increases the chances of infection spread.
• The area and volume of the office are very important for a safe working environment. However, occupancy is the prime factor and is strongly correlated with R-Event. Hence, for a safe working environment, occupancy must be reduced to a safe level.
• The $CO_2$ level is the second most correlated parameter to R-Event after occupancy, showing that it is very important to ventilate indoor spaces to maintain low $CO_2$ levels and prevent SARS-CoV-2 spread.
• People moving outside also affect the $CO_2$ levels, but the effect is negligible unless they stand in front of the door for long conversations. This is a dangerous condition because it suddenly raises the $CO_2$ levels, as the incoming air itself has a high $CO_2$ level.
• Maintaining all doors and windows is recommended for better operational control. Proper maintenance improves the efficiency of full-scale door and window functioning. Additionally, nets are recommended to prevent the entry of insects into the office.
• Frequent talking inside the office environment is discouraged, especially without a mask, as it increases the chances of infection spreading.
• It is recommended to install $CO_2$ sensors in offices to monitor the $CO_2$ levels.
• Cross-ventilation strategies are comparatively better than single-sided ventilation at mitigating viral transmission inside office spaces and other indoor spaces.
• Training and enforcement of guidelines significantly reduce the risk of viral transmission.

FUTURE WORK
The authors will extend this work to establish the relationship between $CO_2$ and viral spreading probability in different seasons and under different ventilation scenarios. The authors are currently gathering data for mixed-mode ventilated and air-conditioned office rooms.

CONFLICTS OF INTEREST
The authors declare that there are no conflicts of interest regarding the publication of this paper.

NISHANT He has written research papers, review articles, patents, conference articles, and book chapters which have been published in peer-reviewed scientific international journals and books. His research interests include real time built environment problems, COVID-19, indoor human comfort, indoor environmental quality, comfort perceptions, building energy efficiency, artificial intelligence, and environmental engineering. He has carried out a great deal of research in the abovementioned areas. He was also a part of the team that provided ventilation related guidelines for preventing the spread of SARS-CoV-2 in Indian office and residential buildings. On the occasion of the Foundation Day of the CSIR-CBRI (a national laboratory committed to promoting research and scientific knowledge in the building industry and related disciplines), he also earned the prestigious Diamond Jubilee Best Technology Award for the technology "HVAC Ducting System for Integration of COVID-19 Disinfection Solutions," in 2021.
ASHOK KUMAR is currently an Outstanding Scientist (Scientist 'H'), a Senior Professor at AcSIR, Inventor, and an Architect-Planner. He has experience of more than 33 years. He has executed international and national projects related to IEQ and he is a Senior Member of the CSIR Task Group "WAYU" created to find indigenous solutions for ventilation problems and IAQ in COVID-19 like situations. He also invented several market-ready designs and technologies and their execution/applications to deal with IEQ problems of buildings. He has authored/coauthored over 110 research publications in international and national journals, conferences, reports, and book chapters. He is currently guiding five Ph.D. scholars and has handled more than 85 research and development projects. The total cost of the last five main projects is more than 30 crores. He has visited Germany, Japan, and Australia in recent years. He is a Principal/Alternative Member as an expert in more than a dozen committees of BIS, National Building Code 2016, Energy Conservation Building Code (ECBC 2017 and 2018), and ISO/WD 19467:2018. He was a Senior Member of the team that provided guidelines for reducing the spread of SARS-CoV-2 in Indian office and residential buildings. He has several awards to his credit from various prestigious societies and organizations. He also received many "Best Paper Awards" for his brilliant publications.

Fellow and an Assistant Professor at CSIR Faculty. He is currently working as a Ramanujam Fellow, an Assistant Professor at AcSIR, Inventor, and an Electrical and Instrumentation Engineer, working on IAQ for more than 15 years, and has invented devices as well as applications to tackle indoor air problems along with monitoring IAQ. He has authored/coauthored over 80 research publications in different international journals, conferences, and book chapters. His research interests include sensing applications, wireless sensor-actuator networks, and the Internet of Things.
ANIL KUMAR (Senior Member, IEEE) received the M.Tech. degree from the Delhi College of Engineering, New Delhi, and the Ph.D. degree from Manipal University. He is currently working as a Professor of computer science and engineering, an Accreditation Coordinator, and the Head of the Data Science Research Group, DIT University, India. He has more than 25 years of teaching and industrial experience. He has served at various reputed organizations, such as Manipal University, Bharti Vidyapeeth, Mody University Science and Technology, and DRDO. He has guided over ten research scholars. He has published more than 200 research articles and patents. His research interests include image processing algorithms, cryptography, artificial intelligence, signals and systems, neural networks, genetic algorithms, and machine learning. He has carried out various Indian government projects as a Principal Investigator. He is a Senior Member of ACM and CSI and also worked as an Executive Committee Member of the IEEE Computer Society India Council, in 2015 and 2016, and the IEEE Rajasthan Sub-Section, in 2018. He has also been a Consultant to various industries. He is a Reviewer for many international journals of IEEE, Elsevier, Springer, and ACM.
KRISHNA KUMAR (Graduate Student Member, IEEE) received the B.E. degree in electronics and communication engineering from the Govind Ballabh Pant Engineering College, Pauri Garhwal, and the M.Tech. degree in digital systems from the Motilal Nehru NIT Allahabad. He is currently pursuing the Ph.D. degree with the Indian Institute of Technology Roorkee. He worked as an Assistant Professor at BTKIT, Dwarahat. He is currently working as a Research and Development Engineer at UJVN Ltd. He has more than 11 years of experience and has published numerous research papers in international journals, such as IEEE, Elsevier, Taylor and Francis, Springer, and Wiley. His research interests include renewable energy and artificial intelligence.