A Phone-based Distributed Ambient Temperature Measurement System with An Efficient Label-free Automated Training Strategy

Enhancing the energy efficiency of buildings relies significantly on monitoring indoor ambient temperature. The limitations of conventional temperature measurement techniques, together with the omnipresence of smartphones, have redirected researchers' attention towards phone-based ambient temperature estimation methods. However, existing phone-based methods face challenges such as insufficient privacy protection, difficulty in adapting models to various phones, and hurdles in obtaining enough labeled training data. In this study, we propose a distributed phone-based ambient temperature estimation system which enables collaboration among multiple phones to accurately measure the ambient temperature in different areas of an indoor space. This system also provides an efficient, cost-effective training approach with a few-shot meta-learning module and an automated label generation module. We show that with just 5 new training data points, the temperature estimation model can adapt to a new phone and reach good performance. Moreover, the system uses crowdsourcing to generate accurate labels for all newly collected training data, significantly reducing costs. Additionally, we highlight the potential of incorporating federated learning into our system to enhance privacy protection. We believe this study can advance the practical application of phone-based ambient temperature measurement, facilitating energy-saving efforts in buildings.

I. INTRODUCTION
MONITORING indoor temperature is important for energy saving in building systems [1]. The indoor thermal environment significantly influences occupants' comfort, well-being, and productivity, while also playing a crucial role in the energy consumption of the residential, commercial, and industrial sectors [2]. By closely monitoring indoor temperature, we can identify potential inefficiencies and implement targeted measures to optimize energy usage. Numerous studies [3]-[5] have highlighted the importance of monitoring and controlling indoor temperature in achieving energy efficiency.
In a large indoor space such as a shopping mall, public office, or stadium, the temperature is not always the same across different small areas. We think that accurately monitoring the temperature of different areas serves as the foundation for developing advanced technologies aimed at enhancing human comfort and energy conservation. For example, the traditional centralized cooling system can be transformed into a fine-grained distributed cooling system. By considering the number of occupants and the temperature estimation results, the cooling intensity in specific areas can be adjusted to optimize energy usage. Besides, personalized micro thermal comfort zones can be created by leveraging fine-grained temperature monitoring. Instead of maintaining thermal comfort throughout the entire room, we can construct small thermal comfort zones for each person by applying mini-watt mobile cooling devices or implementing distributed cooling on furniture such as desks or chairs. This approach ensures individualized comfort while also promoting energy efficiency.
The conventional and most commonly employed method for measuring indoor temperature is the thermometer. However, it suffers from several drawbacks, including fixed location, limited measuring range, relatively high cost, and configuration error [6]. With these drawbacks in mind, we propose a distributed phone-based ambient temperature estimation system to help effectively monitor the indoor ambient temperature.

A. Estimating Ambient Temperature with Smartphones
Using smartphones to estimate ambient temperature has gained traction in recent years, owing to their widespread availability [7]-[11]. Though temperature sensors were previously integrated into smartphones, studies have indicated that their readings are influenced by various factors which introduce significant measurement bias [7]. In recent years, researchers have shifted their focus towards estimating ambient temperature based on phone battery temperature and other phone state data. In this context, we present a phone-based ambient temperature estimation model with a machine learning structure. This model takes phone battery data and various phone state features as input and estimates the current ambient temperature, while also generating the associated uncertainty of this estimation.

B. Crowdsourcing and Truth Inference
In many prior studies, crowdsourcing techniques have been employed to enhance the accuracy of temperature measurement [8]-[11]. Crowdsourcing is a data annotation method that assigns the same task to multiple participants and collects their answers. Because participants differ in cognitive level and some may behave maliciously, truth inference is introduced to aggregate all responses and derive a final accurate result. Typical truth inference algorithms include Mean [12], Participant-mine Voting (PM) [13], Confidence-aware Truth Discovery (CATD) [14], and ZenCrowd (ZC) [15], among others. These algorithms take into account different factors such as task difficulty, task domain, and participant reliability to infer the true value from multiple answers.
In recent years, truth inference algorithms have increasingly incorporated the uncertainty of reported answers [16]-[20]. However, these approaches still have certain limitations. First, they rely on self-reported uncertainty values for each answer as input, yet in most crowdsourcing scenarios organizers do not require participants to provide the uncertainty of their answers. This is mainly due to the abstract nature of uncertainty and the difficulty of expressing it. Second, self-reported uncertainty measures are highly subjective and their reliability cannot be guaranteed. Third, the performance of these algorithms is particularly sensitive to the initial parameter values chosen.
This study introduces a combined approach consisting of an automated uncertainty generation module and a truth inference module to aggregate multiple temperature estimation answers from diverse phones, while considering the uncertainty associated with each answer. This truth inference method, named the Confidence-based Tree-structure (CBTS) model, addresses the aforementioned limitations. It was initially introduced in our previous work [6]; in this study, we present a more comprehensive analysis and explore additional applications of this model. In the CBTS model, the uncertainty value is automatically and passively generated by the temperature estimation model based on relevant data from the phone's state. The model also does not depend on initial parameter settings. We further find that the CBTS model is more robust to fluctuations in the number of devices involved: even with a limited number of participants, it consistently demonstrates high performance.
In addition to using the CBTS model to aggregate multiple estimation answers, and considering the high cost of acquiring labeled data for each new phone to train an estimation model, this study also introduces a data annotation approach that leverages crowdsourcing and the CBTS model for automatic label construction. The proposed approach significantly reduces costs while facilitating the extension of the temperature estimation model to a broader range of new mobile phones.

C. Meta-learning and Few-shot Learning
A notable challenge of phone-based ambient temperature measurement is the variability of hardware specifications among different smartphones. A temperature estimation model that performs well on one type of phone cannot be directly applied to a new type of phone. To address this issue, a viable strategy is to train a general source temperature estimation model that can be fine-tuned with some new training data to adapt to a new phone. This strategy is referred to as meta-learning. On the other hand, the short duration that a new phone remains within a specific area makes it impractical to obtain a substantial amount of training data. Therefore, alternative training strategies are needed to develop a model that requires only a small quantity of new data and can be deployed rapidly. For this reason, few-shot learning is integrated into our system. In summary, we propose a new approach that addresses these challenges with a meta-learning-based few-shot learning strategy.
Meta-learning, also known as "learning to learn," involves training a model on a variety of tasks to enable it to adapt efficiently to similar tasks [21]. Unlike traditional machine learning methods that train with data samples, meta-learning learns from multiple predefined tasks. It has demonstrated substantial effectiveness in diverse domains [22]-[24]. Few-shot learning (FSL), in turn, is a machine learning paradigm tailored to situations where only a limited amount of labeled data is available for training [25]. Many recent advancements in FSL have embraced a meta-learning approach, such as the RNN memory-based FSL [26]-[28] and the metric-based FSL [29]. In our system, we adopt the Model-Agnostic Meta-Learning (MAML) framework [23], a widely used meta-learning-based few-shot learning framework, to train the temperature estimation model for each smartphone. The core concept of MAML is to obtain optimal initialization parameters, denoted as θ, from which a task can be optimized rapidly through one or more gradient descent steps using a small volume of available data [25].

D. Federated Learning
Privacy is also a challenge. The collaborative training process inevitably requires sharing data, which raises privacy concerns and makes individuals reluctant to participate in ambient temperature measurement tasks. To address this challenge, we propose the incorporation of Federated Learning (FL) into our system as a means to safeguard the privacy of smartphone users.
FL, initially introduced by Google in 2016 [30], has emerged as a promising approach for developing machine learning models that leverage data from multiple parties while ensuring data privacy [31]. This study specifically focuses on horizontal federated learning, a subtype of FL suitable for scenarios where diverse devices share a common set of features but possess distinct data samples [31]. The primary emphasis of horizontal federated learning lies in security, ensuring the protection of data exchange among clients to mitigate the risk of privacy leakage [32]. To this end, various strategies, including homomorphic encryption [33], differential privacy [34], and secure aggregation [35], are employed to safeguard data.
The primary aim of this study is to design a distributed system for estimating ambient temperature. Consequently, the focus on the efficiency and accuracy of federated learning is relatively limited. In this study, we only demonstrate a straightforward scenario of combining federated learning with our system to avoid privacy disclosure. The cryptographic technique we use is homomorphic encryption, which allows computation to be performed directly on encrypted data, yielding an encrypted result that, when decrypted, matches the result of the computation performed on the plaintext data [36].

E. Contributions
Despite extensive research on phone-based ambient temperature estimation with crowdsourcing, several drawbacks remain in the existing works [8]-[11]. Specifically, the works in [9] and [10] focus on city-scale estimation with daily resolution. They do not include any phone pattern information except the battery temperature. Moreover, their estimation is based on a physical model, which has relatively poor performance. The work in [11] addresses the issue of granularity in both time and space. However, its use of a physical model also results in a high estimation error. Furthermore, its experiments were limited to only three phones with sufficient data, without discussing the generalization to new phones or thoroughly researching the crowdsourcing aspect. Although a relatively comprehensive system was proposed in [8], it also exhibits certain shortcomings. To begin with, the ambient temperature estimation in that system relies on features such as CPU utilization and network information, which are no longer available on the latest phones. Besides, its crowdsourcing method considers the uncertainty of each phone instead of each estimation result. Since a single phone may provide both accurate and inaccurate estimations, assigning uncertainty to individual estimations would be a more reasonable approach. Additionally, aspects such as data collection for new phones and privacy protection were not adequately explored, which limits the practical value of the system.
In this study, we present a distributed and secure phone-based system specifically designed for indoor ambient temperature measurement. Our system comprises four essential modules: the ambient temperature measurement module, which enables each phone to estimate air temperature using a machine learning model; the crowdsourcing module, which collects multiple estimation answers for the same area from diverse phones to generate a more accurate result; the label generation module, which automatically generates labels for newly collected data through crowdsourcing; and the few-shot learning module, which enables each newly joined phone to rapidly acquire a functional estimation model. Additionally, we highlight the potential of incorporating federated learning to ensure robust user privacy protection. As this study builds upon our previous work [6], we have condensed the descriptions of the first and second modules in this paper.
Through the implementation of our system, we address several challenges commonly associated with existing phone-based ambient temperature measurement technologies. The key contributions of our work are summarized as follows:
1) We present a distributed cooperative system for estimating ambient temperature using mobile phones, which is highly practical and efficient.
2) We introduce a crowdsourcing-based approach for automatic data annotation to assist newly joined phones in building their data sets, which yields significant cost reductions.
3) We propose a few-shot learning strategy which facilitates rapid training of temperature estimation models and requires only a few data samples. It expedites the participation of new phones in temperature estimation tasks.
4) We demonstrate the viability of incorporating federated learning into the model training process, ensuring the protection of mobile phone users' privacy.

A. System Overview
We introduce our distributed phone-based cooperative ambient temperature estimation system through Fig. 1. In our system, we classify smartphone users into two categories:
• Contributors: These users possess a precise temperature estimation model and can participate in training a source model on the server.
• Participants: These users have just joined the system. They lack both trained estimation models and labeled data, and they do not contribute to the training of the source model. However, they can download the source model from the server and collect data to fine-tune a temperature estimation model for their own device.
In our system, we assume the presence of contributors in every crowdsourcing group. Contributors come from two sources. Some are users who have manually collected a substantial amount of labeled data, enabling them to train an adequate model. Others are users who were originally participants but have been in the system for a considerable duration and have refined their models to achieve high accuracy. This definition ensures a robust pool of contributors, as every participant has the potential to transition into a contributor after a certain period of time.
From Fig. 1, we can see that each contributor can provide an estimation of the ambient temperature. The blue numerical value represents the estimated temperature value, while the green value represents the uncertainty associated with this estimation. By using the CBTS truth inference model to aggregate all the estimation answers within a crowdsourcing group, a more accurate answer can be obtained. The methodology of this part is described in Sections II-B and II-C. Additionally, all the contributors participate in the training of a source model using the MAML framework. The trained source model is stored on a central server. We explain this part in Section II-E. Every time a new participant joins a crowdsourcing group, it downloads the source model from the server and commences data collection together with the other group members. The system incorporates an automated label generation module which assigns labels to all the newly collected data. The participant then uses these labeled data to establish a training data set and subsequently performs fine-tuning on the source model. Further elaboration on these procedures is presented in Section II-D. Finally, in Section II-F, the application of federated learning in this system is demonstrated as a means to safeguard user privacy.

B. Ambient Temperature Estimation Model
Our temperature estimation model incorporates a range of phone state features, including screen state, battery temperature, and others. All the features utilized in our model are listed in Table I. The process of feature construction has been previously described in a separate article and is reiterated in the appendix of this paper. In comparison to related studies, these selected features are less security-sensitive and can be acquired with relative ease. The key innovation of this temperature estimation model lies in its ability to estimate a distribution rather than a specific numerical value for the ambient temperature. Fig. 2 provides a detailed structure of our model. To begin with, we encode the 9 phone state features into a 32-dimensional vector, which is then decoded back to 16 dimensions. This encoding and decoding process, referred to as Embed for simplicity, consists of a fully connected layer followed by a ReLU activation layer. At the final stage, the latent vector is decoded by two distinct Embed modules in parallel, each of which decodes the latent vector into a single numerical value. The first value represents the expected value of the predicted distribution and serves as the temperature estimation result. The second value represents the variance of the predicted distribution, thereby quantifying the uncertainty associated with the estimation result. In the loss function, T is the observed ambient temperature (ground truth), µ is the expected value, and σ is the variance. The supervision process entails maximizing the probability of the target value within the predicted Gaussian distribution.

Table I (excerpt)
6  Before active, the continuous screen off time
7  Before off, the continuous screen active time
8  Final battery temperature when screen was activated
9  Final battery temperature when screen was turned off
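To make the supervision signal concrete, the following is a minimal sketch of a loss of this kind, assuming the standard Gaussian negative log-likelihood with mean µ and variance σ. The exact form used in our system may differ by constants or weighting; the function name `gaussian_nll` and the guard value `eps` are illustrative assumptions.

```python
import math

def gaussian_nll(t, mu, var, eps=1e-6):
    """Negative log-likelihood of the target t under N(mu, var).

    Minimizing this value maximizes the probability of t within the
    predicted Gaussian. `eps` is an assumed guard against a zero variance.
    """
    var = max(var, eps)
    return 0.5 * (math.log(2.0 * math.pi * var) + (t - mu) ** 2 / var)

# Same point estimate, two different stated uncertainties.
confident = gaussian_nll(t=25.0, mu=25.2, var=0.05)
cautious = gaussian_nll(t=25.0, mu=25.2, var=1.0)
```

When the estimate is close to the ground truth, a small predicted variance is rewarded; when the estimate is far off, the same small variance is penalized heavily. This is what allows the second output to act as an uncertainty value.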

C. CBTS Truth Inference Model
Truth inference models have been developed to leverage estimation results from multiple sources and generate a single, accurate answer during crowdsourcing. In our system, we employ the CBTS model for truth inference, which was first introduced in our previous work [6]. The structure of the CBTS model is illustrated in Fig. 3. For ease of illustration, we define an answer as a pair consisting of an estimated result (µ) and its associated uncertainty value (σ), both derived from a single phone. The core operation in this model is termed Aggregate (AGG), which combines two distinct answers to produce a new, single answer. As depicted in Fig. 3, AGG takes as input answer i (comprising µ_i and σ_i) and answer j (comprising µ_j and σ_j). Following the application of two Embed operations, AGG outputs a new answer k (comprising µ_k and σ_k). Within a group of multiple given answers, the CBTS model repeatedly aggregates the next answer with the most recently generated one until all answers in the group are combined, yielding the final inferred answer.
In our system, we employ the CBTS model to fulfill two main objectives. First, it aggregates multiple given answers within a crowdsourcing group to derive a more reliable estimation for one small area. Second, it generates inferred labels for new participants to help them construct a training data set. Participants can then use this training data set to fine-tune the source model and obtain their personalized estimation models.
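The sequential aggregation can be sketched as a left-to-right fold over the answers in a group. In the sketch below, the learned AGG operation is replaced by a hand-written stand-in (inverse-variance weighting), purely so that the control flow can be run; it is not the CBTS network. The names `agg_stand_in` and `infer_truth` are illustrative assumptions.

```python
from functools import reduce

def agg_stand_in(a, b):
    """Combine two (mu, var) answers into one.

    Stand-in for the learned AGG operation: inverse-variance weighting,
    so the answer with the smaller uncertainty gets the larger weight.
    """
    mu_a, var_a = a
    mu_b, var_b = b
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    mu = (w_a * mu_a + w_b * mu_b) / (w_a + w_b)
    var = 1.0 / (w_a + w_b)
    return mu, var

def infer_truth(answers, agg=agg_stand_in):
    """Fold all answers of one crowdsourcing group into a final answer."""
    if not answers:
        raise ValueError("a crowdsourcing group needs at least one answer")
    return reduce(agg, answers)

# Three phones report (estimated temperature, uncertainty) for one area.
group = [(24.8, 0.10), (25.6, 0.40), (25.0, 0.20)]
mu, var = infer_truth(group)
```

Because the fold accepts any number of answers, the same code path serves groups of different sizes, which mirrors how the tree structure handles a varying number of devices.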

D. Automatic Label Generation by Crowdsourcing
To address the challenges of limited training data and costly data annotation, we propose an automatic data annotation method that uses crowdsourcing to generate labeled data for each new participant. As depicted in Fig. 3, when a new participant joins a crowdsourcing group, it collects data alongside the other phones (contributors) within the group, operating at the same data acquisition frequency. Given that these contributors possess a trained temperature estimation model and can provide estimation answers based on the collected data, we employ the CBTS model to aggregate the estimation answers from all contributors and derive a final precise answer. This final answer is then taken as the inferred label for the corresponding data collected by the new participant. This approach removes the need for manual data labeling, resulting in significant cost reduction. Furthermore, subsequent experimental results demonstrate that substituting the true label with the inferred label does not compromise the effectiveness of model training.

E. Meta-learning-based Few-shot Learning
Given the diversity of hardware configurations and the transient nature of smartphones in fixed locations, we have incorporated the MAML framework into our system. This integration helps to train a model effectively with a limited number of data samples and within constrained timelines. The following subsections explain the entire framework.
1) Meta-Task: We use Fig. 5 to depict the data set partitioning. As outlined in Section II-A, we manually separate all the phones in our data set into two distinct groups: "Contributors" and "Participants". The data sets corresponding to the "Contributors" and "Participants" groups are denoted by D_c and D_p, respectively. The data of each contributor (denoted by C_n) in D_c is divided into two subsets: the training data set, D_ctn, and the validation data set, D_cvn, with a ratio of 7:3. The combination of all contributor training data sets is denoted by D_ct, and the combined validation data set of all contributors is denoted by D_cv. The same data partitioning strategy is employed for the participants' data set, D_p, with the letter 'c' in the above notations replaced by 'p'.
Unlike traditional training methods, the MAML framework trains the model using predefined tasks rather than individual data samples. Within our system, we define a task as the process of "training a model for temperature estimation using a support set, and subsequently evaluating its performance with a query set". This definition is in alignment with the foundational principles of MAML [23]. Here, the support set is used for task-level model training, whereas the query set is used for evaluation and for updating the original model parameters. We denote the sizes of the support and query sets within a single task as k_spt and k_qry, respectively. Importantly, all samples within a single task originate from the same phone. The details of the training methodology are given in Section II-E2.
The remaining portion of Fig. 5 illustrates the construction of task sets. We generate the training task set T_t using data from D_ct, while the validation task set T_v is derived from data within D_cv. The training task set T_t is used for meta-training and the validation task set T_v is used for meta-evaluation. For each individual task, (k_spt + k_qry) data samples are randomly drawn from a single phone within the corresponding data set. Then k_spt data samples are designated as the support set, while the remaining samples form the query set. Since we focus on few-shot learning, the value of k_spt is very small.
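The task construction described above can be sketched as follows. This is a minimal illustration, assuming each phone's data is a list of (features, label) pairs; the function name `build_tasks`, the dictionary layout, and the toy data are hypothetical.

```python
import random

def build_tasks(data_by_phone, k_spt, k_qry, n_tasks, seed=0):
    """Sample tasks so that every task draws all its samples from one phone."""
    rng = random.Random(seed)
    # Only phones with enough samples can supply a full task.
    phones = [p for p, rows in data_by_phone.items()
              if len(rows) >= k_spt + k_qry]
    if not phones:
        raise ValueError("no phone has k_spt + k_qry samples")
    tasks = []
    for _ in range(n_tasks):
        phone = rng.choice(phones)
        picked = rng.sample(data_by_phone[phone], k_spt + k_qry)
        tasks.append({
            "phone": phone,
            "support": picked[:k_spt],   # used for the task-level update
            "query": picked[k_spt:],     # used for evaluation / meta-update
        })
    return tasks

# Toy stand-in for D_ct: two phones, 30 (features, label) samples each.
toy = {
    "C1": [((i,), 20.0 + i) for i in range(30)],
    "C2": [((i,), 22.0 + i) for i in range(30)],
}
tasks = build_tasks(toy, k_spt=5, k_qry=10, n_tasks=4)
```

Sampling without replacement inside a task keeps the support and query sets disjoint, so the query loss measures adaptation rather than memorization.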
2) Meta-Training: The training methodology is concisely described in Algorithm 1. Its inputs are the task-level learning rate α, the meta-learning rate β, the task count n, the number of task-level update steps s_1, and the training task set T_t. Each training epoch consists of the following sequence of operations. To begin with, a training batch is constructed by randomly selecting n tasks from T_t. Then, we pick one task t_i from the training batch and make a copy of the original model parameters θ as θ'_i. Subsequently, we use the loss on the support set of task t_i to update the copied parameters θ'_i with the learning rate α for s_1 iterations. After that, we use the updated parameters θ'_i to calculate the query loss L_qry(f_{θ'_i}) on the query set. This process is repeated for each task in the batch, accumulating the query set loss values. Once the batch has been processed, the sum of these query loss values is used to update the original model parameters θ with the meta-learning rate β.
Throughout this process, we repeatedly take n tasks from the training task set T_t to construct the training batch and repeat the above steps until T_t is empty.
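The loop structure of meta-training can be illustrated on a toy problem. The sketch below is not our temperature model: it uses a two-parameter linear model with a squared-error loss, synthetic tasks, and the first-order approximation of MAML (the query gradient is evaluated at the adapted parameters and applied directly to θ, ignoring second-order terms). The query gradients are averaged over the batch rather than summed. All function names and hyperparameter values are illustrative assumptions.

```python
import random

def mse_and_grad(theta, samples):
    """Mean squared error of y = w * x + b and its gradient w.r.t. (w, b)."""
    w, b = theta
    n = len(samples)
    loss, gw, gb = 0.0, 0.0, 0.0
    for x, y in samples:
        err = w * x + b - y
        loss += err * err / n
        gw += 2.0 * err * x / n
        gb += 2.0 * err / n
    return loss, [gw, gb]

def adapt(theta, support, alpha, steps):
    """Task-level update: gradient steps on the support set, on a copy."""
    theta_i = list(theta)
    for _ in range(steps):
        _, g = mse_and_grad(theta_i, support)
        theta_i = [p - alpha * gp for p, gp in zip(theta_i, g)]
    return theta_i

def meta_train_epoch(theta, tasks, alpha, beta, n, s1, rng):
    """Consume the task set in random batches of n tasks (first-order)."""
    pool = list(tasks)
    rng.shuffle(pool)
    while pool:
        batch, pool = pool[:n], pool[n:]
        meta_grad = [0.0, 0.0]
        for task in batch:
            theta_i = adapt(theta, task["support"], alpha, s1)
            _, g_qry = mse_and_grad(theta_i, task["query"])
            meta_grad = [m + g for m, g in zip(meta_grad, g_qry)]
        theta = [p - beta * m / len(batch) for p, m in zip(theta, meta_grad)]
    return theta

def adapted_query_loss(theta, tasks, alpha, steps):
    """Mean query loss after fine-tuning on each task's support set."""
    total = 0.0
    for task in tasks:
        theta_i = adapt(theta, task["support"], alpha, steps)
        total += mse_and_grad(theta_i, task["query"])[0]
    return total / len(tasks)

def make_task(rng, offset, k_spt=5, k_qry=10):
    """Synthetic 'phone': same slope, phone-specific offset."""
    pts = [(x, 0.5 * x + offset)
           for x in (rng.uniform(0.0, 1.0) for _ in range(k_spt + k_qry))]
    return {"support": pts[:k_spt], "query": pts[k_spt:]}

rng = random.Random(0)
tasks = [make_task(rng, off) for off in (-1.0, 0.0, 1.0) for _ in range(20)]
theta0 = [3.0, -2.0]            # deliberately poor starting point
theta = list(theta0)
for _ in range(50):
    theta = meta_train_epoch(theta, tasks, alpha=0.1, beta=0.05,
                             n=6, s1=1, rng=rng)
before = adapted_query_loss(theta0, tasks, alpha=0.1, steps=1)
after = adapted_query_loss(theta, tasks, alpha=0.1, steps=1)
```

The helper `adapted_query_loss` follows the same pattern as meta-validation: fine-tune a copy of θ on the support set, then score it on the query set.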
3) Meta-Validation: Algorithm 2 describes the meta-validation methodology, which takes the task-level learning rate α, the number of fine-tuning steps s_2, and the validation task set T_v as inputs. In contrast to the training process, the validation process is considerably simpler. For each task within the validation task set T_v, we copy the parameters θ as θ'_i, employ the task's support set to calculate the loss L_spt(f_{θ'_i}), and update the model parameters θ'_i accordingly. Following s_2 iterations of refinement, the task's query set is used to assess model performance.

Algorithm 2 Meta-Validation
for every task t_i in T_v do
    copy the parameters θ as θ'_i
    for s_2 steps do
        compute L_spt(f_{θ'_i}) on the support set and update θ'_i with learning rate α
    end for
    use the query set to evaluate model performance
end for

F. Federated Learning
During the meta-training process, we observe that the gradient of each task is computed independently. Each task copies the original model parameters and performs calculations using its own data set, without interfering with the other tasks. Additionally, the model parameters are updated by aggregating the gradients from all tasks. These characteristics make the approach highly suitable for federated learning.
In this study, we present a straightforward illustration of the application of federated learning in our system. Homomorphic encryption is employed to safeguard the privacy of phone users. To begin with, the server initializes a source model. Subsequently, each contributor downloads the source model and conducts local training using its own data, thereby avoiding the need to upload data to the server. This decentralized approach ensures that sensitive data is not disclosed. Following the training phase, all phones use the public key to encrypt their respective gradients, which are then transmitted to the server through reliable communication channels. Since homomorphic encryption is used, the server can aggregate all gradients in their encrypted state without decryption. Once all individual gradients have been aggregated, the server uses the private key to decrypt the aggregated result, thereby retrieving the meta-learning gradient. Finally, the server uses this gradient to update the source model parameters. Algorithm 3 depicts the process of combining meta-learning and federated learning. The introduction of federated learning only minimally alters the original MAML training process.
In this scenario, all personal data is stored locally, eliminating the need for phone users to share their data with other users or the server, thereby preserving their privacy. This approach alleviates privacy concerns and enhances user willingness to participate. Considering that gradients may inadvertently disclose user information, we propose a rule wherein the server performs decryption only after aggregating all encrypted gradients. Alternatively, the task of encrypted gradient aggregation can be delegated to a trusted third party. To keep this article focused, we defer the discussion of other attack scenarios to future research. The specific homomorphic encryption method employed in this study is CKKS [37]. Due to the intricate mathematical nature of CKKS, we refrain from providing an in-depth explanation in this study.
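The encrypt, aggregate, then decrypt flow can be illustrated with a toy example. The sketch below does not use CKKS. It substitutes a textbook Paillier scheme, which is additively homomorphic and fits in a few lines of standard-library Python. The key is built from two small, publicly known Mersenne primes and offers no real security; the fixed-point factor `SCALE` and all function names are illustrative assumptions.

```python
import math
import random

# Toy key material (NOT secure): two Mersenne primes.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q                      # public key
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # private key part
MU = pow(LAM, -1, N)                                # private key part
SCALE = 10**6                  # assumed fixed-point scale for gradients

def encrypt(value, rng):
    """Client side: encrypt one float gradient entry with the public key."""
    m = round(value * SCALE) % N          # negatives wrap around modulo N
    r = rng.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = rng.randrange(1, N)
    return ((1 + m * N) * pow(r, N, N2)) % N2

def add_encrypted(c1, c2):
    """Server side: multiplying ciphertexts adds the hidden plaintexts."""
    return (c1 * c2) % N2

def decrypt(c):
    """Server side: decrypt with the private key and undo the encoding."""
    m = ((pow(c, LAM, N2) - 1) // N) * MU % N
    if m > N // 2:                        # map back to a signed value
        m -= N
    return m / SCALE

rng = random.Random(0)
# Local gradients of three contributors (two parameters each).
client_grads = [[0.12, -0.40], [-0.05, 0.10], [0.30, 0.25]]
encrypted = [[encrypt(g, rng) for g in grads] for grads in client_grads]

# The server aggregates ciphertexts only, then decrypts the sum once.
aggregated = encrypted[0]
for enc in encrypted[1:]:
    aggregated = [add_encrypted(a, b) for a, b in zip(aggregated, enc)]
meta_grad = [decrypt(c) for c in aggregated]
```

The server only ever decrypts the aggregated ciphertext, which follows the rule that decryption happens after all encrypted gradients have been combined.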

A. Data Set Description
Our data set [38] uses the feature set presented in Table I, and a concise introduction to these features is given in the appendices.

Algorithm 3 (fragment)
    ...
    decrypt and get the result
    ...
    initialize the homomorphic encryption method E
    build a task set T_c with size of n_c
    download the model parameters θ from server
    for task t_i, i = 1, 2, ..., n_c do
        copy the parameters θ as θ'_i
        ...
    encrypt the gradient g_k
    return the encryption result E(g_k) to server

Table II presents details of the data set. The data collection involved a total of eight distinct phones, of which two phones were of the same type. In the table, we refer to these two phones as OPPO R9m and OPPO R9m 2 to differentiate between them. Additionally, we collected data using the OPPO R9m phone on two separate occasions, with a time span of 6 months between them. Consequently, the training data set contains two records with the same phone model (C_1 and C_3). In the subsequent experiments, we treat these two records as if they were from two separate phones. The table also includes information about the operating system and the size of the data set for each phone.
We designated six phones as contributors, while the remaining three phones are regarded as participants. As illustrated in Fig. 5, the data from each phone was divided into two parts: 70% of the data was allocated to the training data set, and the remaining 30% was assigned to the validation data set. The combined training data set of contributors, denoted as D_ct, comprised a total of 10,282 samples, while the combined validation data set D_cv consisted of 4,405 samples. Similarly, for the data set of participants, the sizes of D_pt and

B. Crowdsourcing Group Construction
Given that the majority of modules in our system operate within crowdsourcing groups, we now describe how the training group set and the validation group set are constructed from the data of all 6 contributors:
• Step 1: Randomly select k phones from the pool of all 6 contributors, where k is a randomly chosen number ranging from 2 to 6.
• Step 2: Generate the common label set of these k phones, and randomly pick one common label T_com from it.
• Step 3: For each selected phone, randomly choose one data sample whose label is T_com. The selected k data samples are then considered to belong to the same crowdsourcing group.
Following this criterion, we use the training data set D_ct to establish a training group set which contains 6000 crowdsourcing groups. Similarly, data from the validation data set D_cv is used to create a validation group set which contains 1500 groups.
Fig. 6 illustrates the evaluation results from all six contributors, showing that each phone achieves relatively accurate estimations. Among them, phone C_6 demonstrates the best performance with a mean absolute error (MAE) of 0.190 °C. The highest MAE of 0.497 °C is observed on phone C_2. On average, the MAE across all contributors is 0.276 °C. We leave the more comprehensive analysis of the estimation model to the appendices.
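The group construction procedure of Section III-B (Steps 1 to 3) can be sketched as follows. The data layout (`data` maps a phone id to a list of `(features, label)` pairs), the function names, and the choice to redraw the phones when they share no label are assumptions made for illustration; the toy data is synthetic.

```python
import random


def build_group(data, rng, k_min=2):
    """Build one crowdsourcing group following Steps 1 to 3."""
    phones = list(data)
    while True:
        # Step 1: pick k phones, with k drawn uniformly from [k_min, number of phones].
        k = rng.randint(k_min, len(phones))
        chosen = rng.sample(phones, k)
        # Step 2: labels shared by every chosen phone.
        common = set.intersection(*({label for _, label in data[p]} for p in chosen))
        if common:  # assumption: redraw the phones when no label is shared
            break
    t_com = rng.choice(sorted(common))
    # Step 3: one random sample with label t_com from each chosen phone.
    return [
        (p, rng.choice([s for s in data[p] if s[1] == t_com]))
        for p in chosen
    ]


def build_group_set(data, n_groups, seed=0):
    """Build a group set, e.g. 6000 training groups or 1500 validation groups."""
    rng = random.Random(seed)
    return [build_group(data, rng) for _ in range(n_groups)]


# Hypothetical toy data: 6 contributors, labels on a 0.5 °C grid.
toy_rng = random.Random(1)
toy_data = {
    f"C{n}": [((toy_rng.random(),), 20.0 + 0.5 * toy_rng.randrange(10)) for _ in range(200)]
    for n in range(1, 7)
}
groups = build_group_set(toy_data, n_groups=50)
```

Each returned group is a list of `(phone id, sample)` pairs whose samples all carry the same label, so a group can be treated as k simultaneous measurements of one true temperature.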
2) CBTS Truth Inference Model Evaluation: We use the training group set described in Section III-B to train the CBTS model until the loss on the validation group set has not decreased for 20 consecutive epochs. During training, the gradients are back-propagated only once all k data instances have been aggregated into a final result. Subsequently, we use the data set D_cv to construct a testing group set consisting of 6000 distinct groups to evaluate the performance of the CBTS model.
Table III presents the evaluation results of the CBTS model and seven other baseline methods. The first three methods, namely D&S [39], PM [13], and ZC [15], are typical truth inference algorithms that are applicable to numeric tasks. The next two methods, MV-2 and MV-3, are modified versions of the Majority Voting (MV) [12] approach. In these two methods, we cluster all the answers and select the average of the largest cluster as the final answer. The number attached to each method name is the number of clusters. The final two methods are the aggregation approaches used in other similar phone-based ambient temperature measurement studies. The Mean method, employed in works [9]-[11], calculates the average of all the given answers to derive the final result. The Weighted Average (WA) method, used in the work [11], assigns a confidence value to each phone based on its historical performance and computes a weighted average using these confidence values. A detailed description of these baseline methods and the rationale behind their selection is provided in the appendices.
The MAE of the crowdsourcing results derived from the CBTS model is 0.136 °C. Compared to the ambient temperature measurement by a single phone, which has an average error of 0.276 °C, the application of crowdsourcing techniques reduces the error by approximately 50%. From the results in Table III, we see that the CBTS model outperforms all other methods listed. Additionally, we assessed each method under varying group scales. The findings indicate that, in comparison to other algorithms, the CBTS model and the revised MV method are less sensitive to group scale. The CBTS model achieves strong results even when the group size is as small as 3. Although the performance of MV can be very close to that of CBTS, CBTS is much more flexible: the MV method requires clustering parameters to be set in advance based on the number of participants, and its results can be obtained only after enough answers have been collected, whereas the CBTS model can provide a result no matter how many answers are collected.
Following the same group construction rules as before, for each data instance x_i in D_pt, we randomly sample k data instances from k different contributors in the data set D_c. All of these data instances are required to have the same label as x_i, so that we can assume they are collected from the same crowdsourcing group at the same time. Under this rule, every crowdsourcing group contains 1 data instance from 1 participant and k data instances from k different contributors. For each crowdsourcing group, we perform the following operations:
• Step 1: Compute the estimation results of all k contributor data instances using their estimation models.
• Step 2: Use the CBTS model to crowdsource all of these results, obtaining a final inferred answer.
• Step 3: Assign the inferred answer to the remaining data instance x_i, designating it as the inferred label.
We generate the inferred label for each data instance x_i in D_pt and compare it with the true label. Fig. 7 illustrates the bias between the true labels and the inferred labels. Moreover, we provide the specific errors in each sub-figure. From the results, we can see that the average error between inferred labels and true labels on all 3 phones is only 0.161 °C, which is lower than the 0.276 °C average MAE of a single contributor model.
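The labelling loop in Steps 1 to 3 can be sketched as below. The CBTS model itself is not reproduced here; the sketch substitutes a simple inverse-variance weighted mean as the aggregator, which is an assumption made only so that the example is runnable. The `models` dictionary stands in for the contributors' estimation models and is hypothetical.

```python
def inverse_variance_mean(estimates):
    """Stand-in aggregator (NOT the CBTS model): weight each answer by 1/variance."""
    weights = [1.0 / var for _, var in estimates]
    return sum(w * mean for w, (mean, _) in zip(weights, estimates)) / sum(weights)


def infer_label(contributor_samples, models, aggregate=inverse_variance_mean):
    """Steps 1 to 3 for one crowdsourcing group.

    `contributor_samples` maps a contributor id to its feature vector and
    `models` maps the same id to a callable returning (estimate, variance).
    """
    # Step 1: every contributor model estimates the temperature.
    estimates = [models[c](x) for c, x in contributor_samples.items()]
    # Step 2: aggregate the answers into one inferred answer.
    # Step 3: the caller assigns this value to the participant instance x_i.
    return aggregate(estimates)


def mean_absolute_error(inferred, true):
    return sum(abs(a - b) for a, b in zip(inferred, true)) / len(true)


# Hypothetical models: fixed bias and fixed predictive variance per contributor.
models = {
    "C1": lambda x: (x[0] + 0.3, 0.09),
    "C2": lambda x: (x[0] - 0.5, 0.25),
    "C3": lambda x: (x[0] + 0.1, 0.01),
}
true_labels = [21.0, 24.5]
inferred = [infer_label({c: (t,) for c in models}, models) for t in true_labels]
label_mae = mean_absolute_error(inferred, true_labels)
```

Comparing `label_mae` against the true labels corresponds to the check reported for Fig. 7; any aggregator with the same call signature can be passed as `aggregate`.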
2) Few-shot Learning with MAML Framework: In this section, we explore various few-shot learning strategies to train a new temperature estimation model for a newly added participant who possesses only 5 data instances. To further evaluate the efficacy of inferred labels, we train each model using both true labels and inferred labels. Apart from the MAML training strategy, we consider two baseline training methods:
• Pre-training (PT): In this method, we employ the entire training data set of all contributors, denoted as D_ct, to pre-train a model until convergence. Subsequently, we fine-tune the model using the 5 data instances collected from the selected participant.
• Direct-training (DT): This method involves training a model from scratch using the 5 data instances collected from the chosen participant.
To better evaluate each strategy, we repeat the process of training and evaluation 100 times. We use each strategy to train 100 models and evaluate their performance on the complete validation data set D_cvn of each participant P_n. For each training iteration of a specific participant, we randomly select 5 data instances from their corresponding training data set D_ctn. We also fine-tune each model for both 1 step and 20 steps, evaluating their performance accordingly. The results are illustrated in Fig. 8 and summarized in Table IV. In all experiments conducted, adhering to the notation settings in Algorithm 1 and Algorithm 2, we set the task-level learning rate (α) to 0.001 and the meta-learning rate (β) to 0.01. We chose a task batch size (n) of 400, with the task-level update steps (s_1) being 5. Depending on the specific experiment, the number of meta-level update steps could be either 1 or 20. For optimization, we employed the Adam optimizer.
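The repeated evaluation protocol above (100 trials, 5 randomly drawn instances per trial, MAE on the full validation set) can be sketched independently of the training strategy. In this sketch, `fit` is a hypothetical callable standing in for MAML, PT or DT fine-tuning, and the offset-correcting `fit_offset` is only a placeholder learner, not one of the strategies evaluated here. The synthetic participant data is likewise an assumption.

```python
import random
import statistics


def few_shot_trials(train, val, fit, shots=5, trials=100, seed=0):
    """Return the MAE of each trial on the complete validation set.

    `train` and `val` are lists of (x, y) pairs; `fit` takes the sampled
    support set and returns a predictor x -> y.
    """
    rng = random.Random(seed)
    maes = []
    for _ in range(trials):
        support = rng.sample(train, shots)      # 5 random instances
        predict = fit(support)                  # adapt a model to them
        errors = [abs(predict(x) - y) for x, y in val]
        maes.append(sum(errors) / len(errors))
    return maes


def fit_offset(support, source=lambda x: x):
    """Placeholder learner: correct a fixed source model by the mean residual."""
    offset = statistics.fmean(y - source(x) for x, y in support)
    return lambda x: source(x) + offset


# Hypothetical participant whose readings are shifted by +2 °C plus noise.
data_rng = random.Random(42)
samples = [(t, t + 2.0 + data_rng.gauss(0, 0.2)) for t in (18 + 0.1 * i for i in range(100))]
train, val = samples[:70], samples[70:]         # 70/30 split
maes = few_shot_trials(train, val, fit_offset)
mean_mae = statistics.fmean(maes)
```

Reporting the mean over all trials, rather than a single run, reduces the influence of which 5 instances happen to be drawn.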
In Table IV and Fig. 8, "TL" indicates the true label and "IL" represents the inferred label. The final row of Table IV presents the model performance across the evaluation data sets from all three participants. Upon comparing the plots in each figure, we can observe that using inferred labels to train a model does not introduce significant deviations in the results. When all other conditions remain constant, the prediction curves obtained by training with true labels closely align with those obtained by training with inferred labels. Interestingly, for participant P_1, some methods achieve better performance with inferred labels than with true labels. This could be attributed to the relatively small data quantity for P_1, which may introduce a degree of randomness. However, when examining the results across all participants, the overall performance with inferred labels is marginally lower than that achieved with true labels.
Among the various training strategies, MAML exhibits a substantial advantage over the others. Taking into account the average performance across three participants, MAML surpasses Direct-training by 59% and exceeds Pre-training by 36%. The results in Table IV show that training with MAML consistently achieves the lowest MAE across all three participants. These results indicate that even with only 5 new data instances, training with MAML can reduce the estimation error to less than 1 °C. Furthermore, we assessed the computational efficiency by fine-tuning over 20 epochs on our CPU (12th Gen Intel(R) Core(TM) i7-12650H 2.30 GHz), where the average duration across 100 trials was merely 0.017 seconds. Given that our data collection app samples data every 5 seconds, this allows for the rapid deployment of a new estimation model onto a new phone, which matters for extending the temperature measurement model in practice.
3) Federated Learning: As illustrated in the methodology section, incorporating federated learning into the system does not alter the original few-shot learning process. We use the TenSEAL library [40] to evaluate the performance of applying CKKS to the gradients. In the experiment, 100 virtual clients take part in the encrypted gradient computation, which is repeated 100 times; each time, we randomly generate a gradient for each client. The average time for encryption, summation and decryption over 100 clients is only 7.306 ms. The average error with respect to the result calculated in plain text is only 1.335e-10, so the encrypted result is almost exactly the same as that obtained by direct calculation.

IV. FUTURE WORK
This study presents a comprehensive technical overview of the system implementation. In future work, we will focus on enhancing the user experience and usability of the system. Specifically, we may try to establish a reward mechanism based on the duration of participation. We will also address and discourage the related malicious competition behavior. Moreover, we will try to integrate personal phone-based temperature estimation with various appliances and electronic devices to incentivize continuous participation.

V. CONCLUSION
In this study, we present a phone-based distributed ambient temperature measurement system with an efficient label-free automated training strategy. The primary objective of our system is to enable collaboration between multiple smartphones in large indoor spaces for real-time measurement of ambient temperature in different small areas. Through crowdsourcing, the system can achieve an MAE of only 0.136 °C in temperature measurement. Additionally, our system incorporates an automatic data annotation method that leverages crowdsourcing technology to provide labels for newly collected data. The MAE between the generated labels and the true labels is only 0.161 °C. To expedite the training of an accurate temperature estimation model for newly added phones, we employ a few-shot training framework, which achieves an MAE of 0.814 °C with just 5 data points. Furthermore, we demonstrate the integration of federated learning into our system to protect privacy. With this work, we aim to advance the practical application of phone-based ambient temperature estimation and pave the way for further energy-saving technologies. Our system has the potential to contribute to energy-efficient practices and encourage the adoption of innovative solutions in various domains.

Table I (fragment): Final battery temperature when the screen was activated; F_9: Final battery temperature when the screen was turned off.

In Fig. 1, the battery temperature is denoted as T, and t represents the duration of time.

PHONE CONTEXT SETTING
Taking into account the impact of indoor environmental characteristics and various phone contexts (such as being in the pocket, on the desk, or in the hand), we design our system to activate the ambient temperature estimation model only when the phone is detected to be exposed to the air. This strategy is also used in the work [?]. For a more comprehensive illustration, we refer readers to our previous work [?].

DESIGN OF LOSS FUNCTION
In Fig. 2, we illustrate our design by presenting three distinct predicted distributions.As inherent to the Gaussian distribution, increased variance results in a wider range of potential values and reduces the likelihood of observing values close to the expectation.That means the true temperature value possesses a greater probability of deviating from the expectation.Conversely, a smaller variance leads to a more concentrated distribution, thereby indicating that the ground truth is less likely to deviate significantly from the expected value.This is precisely why variance is used to denote the uncertainty of the prediction.
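A short numerical check makes the effect of the variance concrete. Assuming the predicted distribution is Gaussian as described above, the probability that the true temperature lies within ±d of the expectation is erf(d / (σ√2)). The tolerance of 0.5 °C and the three σ values below are illustrative assumptions, not the distributions plotted in Fig. 2.

```python
import math


def prob_within(d, sigma):
    """P(|T - mu| <= d) for T ~ N(mu, sigma^2)."""
    return math.erf(d / (sigma * math.sqrt(2.0)))


for sigma in (0.25, 0.5, 1.0):
    print(f"sigma = {sigma:.2f} °C -> P(|T - mu| <= 0.5 °C) = {prob_within(0.5, sigma):.3f}")
```

As σ grows from 0.25 °C to 0.5 °C to 1.0 °C, the probability of landing within 0.5 °C of the expectation falls from about 0.954 to 0.683 to 0.383, which is the sense in which a larger variance signals a less certain prediction.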

EFFECTIVENESS OF UNCERTAINTY VALUE
To provide a more intuitive representation of the effectiveness of uncertainty values, we plot the relationship between uncertainty and estimation bias using data samples from the validation data set of each contributor. The results in Fig. 3 show that as the uncertainty value increases, the distribution of prediction bias becomes more dispersed. This indicates that uncertainty is helpful for evaluating answers in the crowdsourcing process.

CROWDSOURCING TRUTH INFERENCE BASELINES DESCRIPTION
We select the crowdsourcing baseline methods from two research areas to compare with our CBTS model.
The first area is crowdsourcing truth inference. We select typical truth inference algorithms that are suitable for numeric tasks to compare with our CBTS truth inference model. Additionally, we include the Majority Voting (MV) method [?] as another baseline and apply a minor adjustment to tailor it for our purposes. Although this method is not originally designed for numeric tasks, it has demonstrated strong performance and thus warrants consideration in our evaluation. Specifically, we cluster all the answers and select the average of the largest cluster as the result. We conduct evaluation experiments with two different clustering numbers: 2 and 3. In cases where the number of given answers is smaller than the clustering number, we treat them as a single cluster and directly compute the average as the result.
The second area is phone-based ambient temperature measurement. In one such study, the researchers employed a weighted average approach, considering the confidence level of each phone. To reproduce this method, we assigned a confidence value to each phone based on its performance on the training data and adopted the weighted average method to crowdsource the answers. This method is referred to as Weighted Average (WA). In this study, we denote the MAE on the training data set of phone C_n as E_n, and we calculate the weight W_n for each phone from E_n. The final weights obtained for the six phones are as follows: W_1 = 0.144, W_2 = 0.08, W_3 = 0.262, W_4 = 0.141, W_5 = 0.136, W_6 = 0.237.
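The modified MV and WA baselines described above can be sketched as follows. Two points are assumptions rather than details taken from this study: the clustering step cuts the sorted answers at the k - 1 widest gaps (the clustering algorithm is not specified here), and the WA weights are renormalized over the phones present in a group. The example group is hypothetical; the weights are those listed above.

```python
WEIGHTS = {"C1": 0.144, "C2": 0.08, "C3": 0.262, "C4": 0.141, "C5": 0.136, "C6": 0.237}


def mean_answer(answers):
    """Mean baseline: plain average of all answers."""
    return sum(answers) / len(answers)


def mv_k(answers, k):
    """Modified Majority Voting: average of the largest of k clusters.

    Assumption: clusters are formed by cutting the sorted answers at the
    k - 1 widest gaps.
    """
    if len(answers) < k:                 # too few answers: treat as one cluster
        return mean_answer(answers)
    ordered = sorted(answers)
    gaps = sorted(range(1, len(ordered)),
                  key=lambda i: ordered[i] - ordered[i - 1], reverse=True)
    cuts = sorted(gaps[: k - 1])
    bounds = [0] + cuts + [len(ordered)]
    clusters = [ordered[a:b] for a, b in zip(bounds, bounds[1:])]
    return mean_answer(max(clusters, key=len))


def weighted_average(answers_by_phone, weights=WEIGHTS):
    """WA: confidence-weighted mean, renormalized over the phones present (assumption)."""
    total = sum(weights[p] for p in answers_by_phone)
    return sum(weights[p] * a for p, a in answers_by_phone.items()) / total


group = {"C1": 22.4, "C3": 22.1, "C6": 22.2, "C2": 24.0}   # hypothetical answers in °C
mv2 = mv_k(list(group.values()), k=2)
wa = weighted_average(group)
```

In this toy group, MV-2 discards the outlying answer from C2 entirely, whereas WA only down-weights it, which illustrates why the two baselines can disagree on small groups.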

Fig. 1. The schematic diagram of our distributed phone-based cooperative ambient temperature estimation system.

Fig. 2. The structure of the ambient temperature estimation model. This model takes 9 phone state features as input and outputs the temperature estimation result with the corresponding uncertainty value.

Fig. 4. The flow diagram of how to build a training data set and fine-tune the source model for the new participant.

Fig. 5. The diagram which illustrates the partition of data sets and task sets.

C. Distributed Ambient Temperature Measurement Evaluation
1) Ambient Temperature Estimation Model Evaluation: In this section, we evaluate the performance of the ambient temperature estimation model on each single phone. The training data set of each phone is used to train a specific ambient temperature estimation model. During the training process, 20% of the training data is reserved for determining the appropriate stopping point for training. Upon completion of the training process, the model of each contributor is evaluated on its respective validation data set.

Fig. 6. The estimation results of all six contributors.

Fig. 7. The comparison plots of the inferred label and the true label on the whole training data set of all participants (P_1 to P_3). In every sub-figure, the horizontal axis represents the true label, while the vertical axis portrays the inferred label.

Fig. 8. The results of few-shot learning using the Direct-training (DT), Pre-training (PT), and Model-Agnostic Meta-Learning (MAML) methods. For readability, the y-axis range of all the images is set between 12 and 35. Consequently, some of the outliers in figure (b) are not visible due to the restricted y-axis range.

Fig. 1. The illustration of what the features F_4 to F_9 represent.

Fig. 3. The relationship diagrams of uncertainty and estimation bias, constructed using the validation data sets of all contributors. In each diagram, the horizontal axis represents the level of uncertainty, while the vertical axis portrays the estimation bias. The colors in the diagrams indicate the true temperature value associated with each piece of data.

TABLE I. THE COLLECTED FEATURES FROM SMARTPHONES. The data set consists of a total of 21,014 sample data points collected from eight distinct phones; each data point contains nine feature values and one label value.

Algorithm 1 Meta-Training
Input: α, β, n, s_1, T_t
repeat
    shuffle T_t
    repeat
        build the training batch with n task samples
        for task t_i, i = 1, 2, ..., n do
            copy the parameters θ as θ′_i
            for step = 1, 2, ..., s_1 do
                calculate L_spt(f_θ′_i) with the support set
                update θ′

TABLE III. THE MAES OF THE CBTS MODEL AND OTHER BASELINE TRUTH INFERENCE ALGORITHMS, EVALUATED ON THE TESTING GROUP SET ACROSS DIFFERENT GROUP SCALES.

To evaluate the effectiveness of annotating new data using the CBTS truth inference model, we conduct the inferred label generation experiments on the training data set of participants, denoted as D_pt.

TABLE IV. THE PERFORMANCE OF MODELS TRAINED WITH DIFFERENT METHODS USING 5 PIECES OF DATA ON EACH PARTICIPANT FOR A 20-STEP UPDATE, MEASURED IN MAE (°C).