Unraveling the Temporal Importance of Community-scale Human Activity Features for Rapid Assessment of Flood Impacts

The objective of this research is to explore the temporal importance of community-scale human activity features for rapid assessment of flood impacts. Ultimate flood impact data, such as flood inundation maps and insurance claims, become available only weeks or months after the floods have receded. Crisis response managers, however, need near-real-time data to prioritize emergency response. This time lag creates a need for rapid flood impact assessment. Some recent studies have shown promising results for using human activity fluctuations as indicators of flood impacts. Existing studies, however, mainly used a single community-scale activity feature to estimate flood impacts and have not investigated the temporal importance of such features for indicating flood impacts. Hence, in this study, we examined the importance of heterogeneous human activity features in different flood event stages. Using four community-scale big data categories, we derived ten features related to the variations in human activity and evaluated their temporal importance for rapid assessment of flood impacts. Using multiple random forest models, we examined the temporal importance of each feature in indicating the extent of flood impacts in the context of the 2017 Hurricane Harvey in Harris County, Texas. Our findings reveal that 1) fluctuations in human activity index and percentage of congested roads are the most important indicators for rapid flood impact assessment during response and recovery stages; 2) variations in credit card transactions assumed a middle ranking; and 3) patterns of geolocated social media posts (Twitter) were of low importance across flood stages. The results of this research could enable crisis managers to rapidly identify hotspots with severe flood impacts at various stages and then to plan and prioritize effective response strategies.


I. INTRODUCTION
As climate change induces more frequent extreme weather events, populated areas are at greater risk of destruction and disruption due to floods at many scales and in every facet of society. Floods cause significant social and physical impacts on communities, such as property loss, loss of life, damage to infrastructure, and disrupted access to critical facilities [1]-[4]. In the aftermath of a flood, rapid assessment of either inundated areas or damages could facilitate rapid identification of hotspots with more severe flood impacts [5] during flood event stages. Using hotspot information enables crisis response managers to prioritize recovery efforts and resource allocation to these areas [6], [7]. The ultimate flood impact data, such as flood inundation maps, insurance claims, and compiled household surveys, become available only several weeks or months after the flood events have ended (for example, the Federal Emergency Management Agency flood inundation map for Hurricane Harvey in August 2017 became available in April 2018). This time lag makes the rapid assessment of flood impacts particularly critical.
The associate editor coordinating the review of this manuscript and approving it for publication was Senthil Kumar.
Several studies have investigated the use of satellite imagery [8]-[10] and aerial images from drones [11]-[13] for rapid flood impact assessment. Qiang et al. [8] used nighttime satellite images to model flood impacts on and recovery of the economy. Skakun et al. [9] analyzed the time series of satellite images to evaluate flood hazard and risk in Namibia. The rapid mapping of the Copernicus Emergency Management Service [10] can provide geospatial information of disasters within hours or days of a request. Popescu et al. [13] used features such as color, texture, and fractal types from drone-collected aerial images to predict flood size, achieving accuracy of 98.87%. While these data and assessments provide valuable insights regarding the extent and spatial variation of inundations, they have some limitations, such as their relatively coarse spatial and temporal resolution [14] and higher computation costs for processing a large amount of imagery data [15]. To enhance rapid flood impact assessment and complement the insights obtained from satellite and aerial images, researchers have investigated the usefulness of community-scale big data [16], [17]. The growth of sensing technologies and the "data for good" programs of some technology companies have increased the availability of community-scale big data [18], [19], such as activity data from cell phone signal densities, credit card transaction records, social media data and metadata, and traffic data. These community-scale big datasets have become more commonly available on the same day and at fine spatial resolution, affording capture of human activities, such as daily activity indexes, transaction activities, online communications, and mobility. Large-scale flood-caused perturbations cause disruptions at the smaller scale of human activities [20].
Hence, variations in human activities in a flood-affected community signal impacts on the community [21], which can be further used for the rapid assessment of flood impacts (e.g., [22], [23]). In particular, some recent studies have investigated single human activity features for rapid assessment of flood impacts. Fluctuations in the density of population activities (obtained through aggregate cell phone signals) or Waze road flooding reports could indicate local flood inundation status in an area outside a floodplain [21], [24]. Podesta et al. [20] compared the fluctuations in visits to points of interest (POIs) during normal periods to the Hurricane Harvey period and found that such fluctuations can be used to assess flood impacts. Yuan et al. [23] assessed the flood impacts of Hurricane Harvey through the analysis of variations in credit card transactions. Kryvasheyeu et al. [5] and Fan et al. [22] examined disaster-related Twitter posts to estimate damage in Hurricane Sandy. Yuan and Liu [25] evaluated both the posts related to a disaster and their sentiment expressions to examine damage by Hurricane Matthew. In addition, Fan et al. [26] found that most roads with null values of average speed (from traffic data) in Harris County during Hurricane Harvey were actually inundated by the floods. This study provides promising evidence that changes in human mobility obtained from traffic data can also provide signals about flood impact. While these studies point to the promise of garnering insights from community-scale big data about heterogeneous human activity-based features, little is known about the temporal importance of these features across flood stages. For instance, changes in human mobility before an event might be due to preparedness and evacuation response of people and might not be a strong indication of flood impacts [20]. Changes in human mobility during an event and in its immediate aftermath could be a stronger signal of flood disruption [16]. Hence, it is essential to discern the temporal importance of human activity-based features for rapid assessment of flood impacts.
In pursuit of this goal, we examined ten features related to community's activities, including daily activity index, transaction activities, online communications, and mobility in Harris County (Texas, USA) during the 2017 Hurricane Harvey. We investigated the fluctuations of these features from the normal period compared with the flood period and created a set of random forest models to explore the temporal changes in the importance of each feature across the flood stages. We present predicted results from one of the random forest models. It should be noted, however, that the purpose of this study was not to build a state-of-the-art prediction model of flood impacts based on these features, but rather to evaluate the temporal changes in the importance of features. Figure 1 illustrates the research framework that guided the study. We first started by identifying aspects of community realities that are susceptible to external disruptions. Then we specified the community-scale data and associated features, which can capture the temporal variations in community realities. With the specified community-scale big data and associated features, we created multiple random forest models and utilized the function for feature importance analysis to analyze temporal changes in the importance of various features in terms of the extent to which they could indicate flood impacts. The ultimate flood impacts were based on the flood insurance claim data and flood inundation map across the 142 ZIP codes in Harris County.

A. COMMUNITY REALITY
Community reality, the state of the day-to-day life of a community [27], can be captured based on various dimensions of human activities, such as mobility activities [20], [28], mobility and traffic patterns [29], [30], credit card transactions [23], and online communications [31]. Human activities get perturbed due to direct effects on households, infrastructure damage, and people's response behaviors. Hence, fluctuations in human activities could provide important insights regarding the state of community reality [32]. For example, human mobility activities were impacted by the COVID-19 pandemic [28]. Podesta et al. [20] showed that human visits to points of interest decreased during floods and thus the community had fewer mobility activities. In another study, Fan et al. [26] showed that road inundation can be inferred from the absence of average speed data on road segments. Yuan et al. [23] examined the extent to which flood impacts were associated with the variations in credit card transactions. Online communications play an important role in collective sense-making of communities as disasters unfold. Evaluation of social media posts could be used for aggregating the personal toll [33]. When analyzing sentiment expressions on Twitter during Hurricane Harvey, Yuan et al. [34] found that the impacted areas were more likely to indicate negative sentiment during disasters. Hence, the content of social media communications could shed light on this dimension of human activities [35].
VOLUME 10, 2022
As a result, community reality and its variations (during crises compared with the normal period) can reveal signals of flood impacts. For instance, if residents are economically impacted by the floods, or if they could not access businesses such as restaurants and groceries due to inundated/closed roads, or if businesses such as pharmacies are closed due to damage, we can capture the flood impacts by analyzing the community's transaction behaviors (e.g., credit card transactions). In this study, we harnessed the community-scale big data related to different dimensions of community reality (i.e., human activity index, traffic and mobility, credit card transactions, and online social media communications) to explore the extent to which different features can indicate the flood impacts in the context of the 2017 Hurricane Harvey.
The Mapbox data contains indices of telemetry-based human activity that vary across space and time. The spatial unit of data aggregation is a tile. The partition of tiles is based on Mercantile, a Python library that enables creating spatial-resolution grids. Human activity is collected, aggregated, and normalized by Mapbox based on the geographic location updates of users' cell phones. The dataset includes human activity data for the entire United States, and the raw data provided by Mapbox have a temporal resolution of 4 hours. The dataset contains an activity metric for each tile at each time point ($activity_{tile,t}$); the more users located in a tile at time t, the larger the value of the activity metric for that tile. We then aggregated the tiles (180 meter × 180 meter) at the ZIP code level to derive the daily activity metric for our further analysis.
The INRIX dataset includes location-based traffic condition data from both sensors and vehicles at the road segment level. The INRIX traffic data contains the average traffic speed of each road segment at 5-minute intervals and the corresponding historical average traffic speed. The traffic data includes all available road segments, from interstates to intersections and from country roads to neighborhoods. This dataset also provides each road segment's name, its geographic location defined by its start and end coordinates, and its length. We aggregated the road segments according to their locations by ZIP code to extract features related to changes in traffic conditions.
Credit card transaction records were obtained from SafeGraph. Each transaction record contains the transaction date, the cardholder's residential ZIP code, the number of unique cards from the ZIP code involved in transactions on that day, the number of unique transactions on cards from the ZIP code observed on that day, and the total amount spent on cards from the ZIP code on that day. By matching the ZIP codes to transactions, we aggregated the daily credit card transaction activities.
The Twitter data was obtained from the Twitter API which enables streaming the geo-located tweets generated by its users from given periods and regions [14]. Each tweet contains username, tweet content, geolocations, and post time [36]. Tweet geolocations by latitude and longitude were used to aggregate humans' daily online communication activities by ZIP code.

2) FEATURE ENGINEERING
Using these four community-scale big data categories, we specified ten features (Table 1) to capture the community state during floods. This section describes how each feature was defined and calculated with the given datasets. To examine signals of flood impacts, all features in Table 1 were evaluated based on their variations from the normal period (baseline period) to the flood period. We defined the normal period from August 1, 2017, to August 24, 2017, for the features derived from the Mapbox, INRIX, and SafeGraph datasets. For the Twitter dataset, we denoted the normal period from August 22, 2017, to August 24, 2017, due to the lack of Twitter data during August 1-21, 2017.

a: VARIATIONS IN THE AVERAGE DAILY ACTIVITY INDEX
We computed the average daily activity index by averaging the daily activity indices for all tiles within each ZIP code. Accordingly, variations in the daily activity index values can be a potential indicator for the flood impacts [21]. Using the mean of the daily activity index in the normal period as the baseline, we calculated the variation of the average daily activity index during the flood period (i.e., FE 1) using Equation (1), where t represents the date.
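The baseline comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the input layout are hypothetical, and because Equation (1) is not reproduced here, the sketch assumes that the variation is a relative change from the baseline mean. An absolute difference would be an equally plausible reading.

```python
from collections import defaultdict
from statistics import mean


def daily_zip_activity(tile_records):
    """Average the tile-level daily activity within each (ZIP code, date) pair.

    tile_records: iterable of (zip_code, date, activity) tuples, one per tile-day.
    """
    grouped = defaultdict(list)
    for zip_code, date, activity in tile_records:
        grouped[(zip_code, date)].append(activity)
    return {key: mean(values) for key, values in grouped.items()}


def relative_variation(daily_values, baseline_dates):
    """Change of each day's value relative to the mean over baseline_dates.

    daily_values: dict mapping date -> value for a single ZIP code.
    ASSUMPTION: variation = (value - baseline) / baseline.
    """
    baseline = mean(daily_values[d] for d in baseline_dates)
    if baseline == 0:
        raise ValueError("baseline mean is zero; relative change is undefined")
    return {d: (v - baseline) / baseline for d, v in daily_values.items()}
```

The same `relative_variation` helper could in principle be reused for any feature that is compared against its normal-period mean, provided the daily series for one ZIP code is passed in.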

b: VARIATIONS IN THE PERCENTAGE OF CONGESTED ROADS
Using the INRIX traffic data, we specified two features: variations in the daily maximum percentage of congested roads (i.e., FE 2) and variations in the daily average percentage of congested roads (i.e., FE 3). According to the geographic location of each road segment, we divided road segments by ZIP codes. For each 5-minute period t within a day, we computed the ratio between the current average speed and the speed limit of that road segment ($ratio_{v_t, v_{limit}}$). In this study, we used 50% speed loss to denote road congestion.
If $ratio_{v_t, v_{limit}} < 50\%$, a road was denoted as congested.
If an unflooded road is near a flooded road, this unflooded road is more likely to become congested [26]. Therefore, the variations in road congestion during flood periods can become a potential indicator for flood impacts. According to the status of road congestion, we computed the percentage of congested roads at period t within each ZIP code. Then, we calculated the maximum and the average of the 288 congestion percentages (1440 minutes per day / 5 minutes per period) for each ZIP code and day. Using as baselines the normal maximum daily percentage of congested roads and the normal average daily percentage, we computed the fluctuations of the daily maximum percentage of congested roads (i.e., FE 2) and the daily average percentage of congested roads (i.e., FE 3) during the flood period with Equations (2) and (3), respectively, where t represents the date.
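The daily congestion statistics described above can be sketched as follows. The function names and the nested-list input are hypothetical. The sketch also assumes that 5-minute periods with no reported speeds are skipped, which the text does not specify.

```python
def congestion_percentages(ratios_by_period, threshold=0.5):
    """Percentage of congested road segments in each 5-minute period.

    ratios_by_period: list of periods for one ZIP code and one day; each
    period is a list of speed ratios (current average speed / reference
    speed), one per road segment. A segment counts as congested when its
    ratio is below `threshold`.
    ASSUMPTION: periods without any reported speed are skipped.
    """
    percentages = []
    for ratios in ratios_by_period:
        if not ratios:
            continue
        congested = sum(1 for ratio in ratios if ratio < threshold)
        percentages.append(100.0 * congested / len(ratios))
    return percentages


def daily_congestion_features(ratios_by_period, threshold=0.5):
    """Return (daily maximum, daily average) of the congestion percentages."""
    percentages = congestion_percentages(ratios_by_period, threshold)
    if not percentages:
        raise ValueError("no 5-minute period with speed data")
    return max(percentages), sum(percentages) / len(percentages)
```

With complete data, `ratios_by_period` would hold 288 entries per day. The two returned values are the raw daily quantities; FE 2 and FE 3 are their fluctuations from the normal-period baselines.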

c: VARIATIONS IN THE CREDIT CARD TRANSACTIONS
Using credit card transaction data, we calculated the number of cards, the number of transactions, and the total spent per day for each ZIP code. Yuan et al. [23] found that variations in credit card transactions from the normal period compared with the flood period can be an indicator of flood impacts. A significant negative fluctuation signals more severe flood impacts. The study by Yuan et al. [23] used the average daily spent in the normal period (i.e., three weeks before Hurricane Harvey made landfall) as the baseline from which to compute fluctuations of total spent during the flood period. Accordingly, we introduced two additional variables (the number of cards and the number of transactions) and calculated their variations as potential indicators for flood impacts. Specifically, using the averages of the daily number of cards, daily number of transactions, and daily total spent in the normal period, we computed the variations of these three variables for the daily values of features FE 4, FE 5, and FE 6 with Equations (4)-(6).

d: VARIATIONS IN THE ONLINE COMMUNICATIONS ON TWITTER
Existing studies have demonstrated the use of tweet sentiments for assessing disaster impacts in hurricanes and floods [25], [37]. Yuan and Liu [25] employed variations in sentiment scores of Twitter posts from the normal period compared with the hurricane period to assess the disaster impacts and found an association between the sentiment score variations and flood impacts. In this study, we used the rule-based model called VADER [38] for the sentiment analysis with geolocated Twitter data. The VADER model calculates the normalized sentiment scores for Twitter data and provides a mechanism for denoting the sentiment polarities for a given text. If the normalized sentiment score ≥ 0.05, the given text has positive sentiment; if −0.05 < normalized sentiment score < 0.05, the given text has neutral sentiment; and if the normalized sentiment score ≤ −0.05, the given text has negative sentiment. Using the VADER model, we computed the normalized sentiment score for each geolocated tweet. According to their geolocations, we aggregated the daily tweets for a ZIP code and calculated the daily average sentiment score of all tweets within a ZIP code using Equation (7):
$$\overline{sentiment}_{z,d} = \frac{1}{n_{z,d}} \sum_{i=1}^{n_{z,d}} sentiment_i \quad (7)$$
where $n_{z,d}$ represents the number of tweets within ZIP code $z$ on day $d$, and $sentiment_i$ denotes the normalized sentiment score of tweet $i$. Also, we determined the daily numbers of tweets with positive, neutral, and negative sentiment. With the average sentiment scores and the numbers of tweets with positive, neutral, and negative sentiment in the normal period, we computed the daily values for FE 7, FE 8, FE 9, and FE 10 using Equations (8)-(11).
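The polarity thresholds and the daily averaging described above can be sketched as follows. The scores are taken as given inputs, so the VADER library itself is not called. The function names are hypothetical, and returning `None` for a day without tweets is an assumption about how empty ZIP-days might be represented.

```python
def polarity(score):
    """Map a normalized sentiment score to a polarity label.

    Thresholds follow the text: >= 0.05 positive, <= -0.05 negative,
    otherwise neutral.
    """
    if score >= 0.05:
        return "positive"
    if score <= -0.05:
        return "negative"
    return "neutral"


def daily_sentiment_summary(scores):
    """Average score and polarity counts for one ZIP code on one day.

    scores: list of normalized sentiment scores, one per geolocated tweet.
    ASSUMPTION: a day without tweets yields an average of None.
    """
    counts = {"positive": 0, "neutral": 0, "negative": 0}
    for score in scores:
        counts[polarity(score)] += 1
    average = sum(scores) / len(scores) if scores else None
    return average, counts
```

A ZIP-day with few tweets can leave one or more polarity counts at zero, and a ZIP-day with none leaves the average undefined, which is one way sparse Twitter data can weaken these features.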

C. RANDOM FOREST MODELS
Based on the concept of ensemble learning, random forest was developed as an extension of the bagging technique with tree-based algorithms. Compared with traditional machine learning approaches, which learn one hypothesis from the training data, the ensemble method aggregates multiple hypotheses [39], thereby reducing the errors and variances of a single hypothesis. Given the concept of the ensemble method, the technique of bagging within the tree models can reduce the variances of a single decision tree [40]. The bagging technique draws several subsets from the initial training dataset by random sampling with replacement and trains a separate decision tree on each subset. Thus, the bagging technique produces an ensemble of different tree models. Averaging the prediction results from the various tree models smooths out the data and enables more reliable classifications.
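As a toy illustration of bagging, the sketch below draws bootstrap samples, trains one very simple model per sample, and combines the models by majority vote. The one-feature threshold classifier is a deliberately simplified stand-in for a decision tree, and all names are hypothetical. This is not the scikit-learn implementation used in the study.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows uniformly at random, with replacement."""
    return [rows[rng.randrange(len(rows))] for _ in rows]


def train_stump(rows):
    """Fit a one-feature threshold classifier to (x, label) rows, labels in {0, 1}."""
    best = None
    for threshold in sorted({x for x, _ in rows}):
        for left_label in (0, 1):
            errors = sum(
                1
                for x, y in rows
                if (left_label if x <= threshold else 1 - left_label) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    _, threshold, left_label = best
    return lambda x: left_label if x <= threshold else 1 - left_label


def train_bagged_ensemble(rows, n_models, seed=0):
    """Train one stump per bootstrap sample of the training rows."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(rows, rng)) for _ in range(n_models)]


def bagged_predict(models, x):
    """Majority vote over the ensemble's individual predictions."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]
```

A random forest adds one more source of randomness on top of this scheme: each tree split considers only a random subset of the features.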
Using the bagging technique, random forest modeling involves not only random selection of subset of training dataset but also the random selection of features within the training dataset [41]. Therefore, random forest can improve the variable selection based on the enhancement of bagging technique [42]. In this study, we created daily random forest models to test the importance of our daily features (Table 1) for indicating flood impacts. In addition, we utilized the feature importance evaluation method within the random forest modeling to assess the temporal changes in the importance of each feature. The random forest package from the scikit-learn library uses the optimized version of the CART (classification and  regression trees) algorithm to yield the largest information gain at each node measured by the Gini index [43]. Accordingly, we employed the aggregated decrease in Gini impurity to evaluate the feature importance for our daily random forest models. The aggregated decrease in Gini impurity can be calculated using Equations 12 through 16 [44]. The descriptions of all the variables in these equations are summarized in Table 2.
$$\mathit{RFImp}(feature_i) = \frac{\sum_{t \in \text{trees within a RF model}} \mathit{NormImp}(feature_i, t)}{T}$$
Using Equations 12 through 16, we determined the temporal variation in the features' importance across daily random forest models. We examined the daily changes in the features' importance to identify the time periods within which each feature provides signals regarding flood impacts.
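The quantities behind a Gini-based importance score can be sketched as below. The sketch follows the commonly used mean-decrease-in-impurity formulation: a weighted impurity decrease per split, normalization within each tree, then an average over trees. It may differ in notation or detail from Equations 12 through 16. The function names and the dictionary layout are hypothetical.

```python
def gini_impurity(class_counts):
    """Gini impurity of a node from its per-class sample counts."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


def impurity_decrease(parent, left, right, n_total):
    """Weighted Gini decrease of one split.

    parent, left, right: per-class sample counts at the node and its children.
    n_total: number of training samples seen by the tree.
    """
    n_parent = sum(parent)
    weighted_children = (
        sum(left) / n_parent * gini_impurity(left)
        + sum(right) / n_parent * gini_impurity(right)
    )
    return n_parent / n_total * (gini_impurity(parent) - weighted_children)


def normalize_importances(raw):
    """Scale one tree's summed decreases per feature so they add up to 1."""
    total = sum(raw.values())
    if total == 0:
        return {feature: 0.0 for feature in raw}
    return {feature: value / total for feature, value in raw.items()}


def forest_importance(per_tree_raw):
    """Average the normalized per-tree importances over all trees."""
    normalized = [normalize_importances(tree) for tree in per_tree_raw]
    n_trees = len(normalized)
    return {
        feature: sum(tree[feature] for tree in normalized) / n_trees
        for feature in normalized[0]
    }
```

Here each entry of `per_tree_raw` is assumed to hold, for one tree, the sum of `impurity_decrease` over all splits that use a given feature.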

D. FLOOD IMPACTS
Flood impacts, as dependent variables of the random forest models, are represented by two measures: (1) the normalized number of claims and (2) the flood inundation percentage within a ZIP code. Specifically, with insurance claim data for Hurricane Harvey collected by the National Flood Insurance Program [45], we computed the number of claims for each ZIP code. Considering the effect of population size on the number of claims [14], we used the normalized number of claims as one of our flood impact measures. The normalized number of claims was calculated as the ratio between the number of claims and the population of that ZIP code from the US census data. Figure 3a shows the results for the normalized number of claims. For the flood inundation percentages, we used the flood inundation map of Hurricane Harvey (Figure 3b) produced by the Federal Emergency Management Agency. Overlaying this map on the Harris County map at the ZIP code level, we computed the flood inundation areas within a ZIP code and further calculated the flood inundation percentage for each ZIP code. Figure 3c shows the spatial distribution of the flood inundation percentages.
With these two flood impact measures, we classified our ZIP codes into two to four classes of flood impacts, which are used as class labels for the ZIP codes. For the evaluation of temporal variations of features' importance as indicators for flood impacts, we used these class labels as the input dependent variables for the random forest feature importance function.
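The labeling step can be sketched as follows. The normalization is the claims-to-population ratio described above. The text does not state the rule used to split ZIP codes into classes, so the equal-size rank-based binning here is purely an assumption for illustration. The function names are hypothetical.

```python
def normalized_claims(claims_by_zip, population_by_zip):
    """Number of claims divided by population, per ZIP code.

    ZIP codes with missing or zero population are left out.
    """
    return {
        zip_code: claims / population_by_zip[zip_code]
        for zip_code, claims in claims_by_zip.items()
        if population_by_zip.get(zip_code)
    }


def rank_based_classes(values_by_zip, n_classes):
    """Assign class labels 0..n_classes-1 of roughly equal size by rank.

    ASSUMPTION: equal-size bins; the study's actual class boundaries
    are not given in the text.
    """
    ordered = sorted(values_by_zip, key=values_by_zip.get)
    n = len(ordered)
    return {
        zip_code: rank * n_classes // n
        for rank, zip_code in enumerate(ordered)
    }
```

Calling `rank_based_classes` with `n_classes` set to 2, 3, or 4 would produce the three label sets used as dependent variables, under the binning assumption above.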

A. FEATURE IMPORTANCE FOR FLOOD INSURANCE CLAIMS
This section summarizes the rank of feature importance for indicating the flood impacts (using flood insurance claims as a measure). With the normalized number of claims of the 142 ZIP codes, we classified the flood impacts into two, three, or four classes. Taking the three-class classification of flood insurance claims as an example, we ranked the importance of the 10 features (Figure 4) and discuss the rank of feature importance in the response and recovery stages. Table 3 shows the analysis results for the rank of feature importance for indicating flood impacts using insurance claims as the measure. For each feature in Figure 4, we summarized the persistence period (the number of days over which the feature importance persisted and fluctuated only slightly; second column of Table 3). The third column of Table 3 shows the rank persistence ranges for all the features in their corresponding persistence periods. The last column reveals the final rank of each feature, calculated from the average of its ranks across the response stage. The feature importance analyses for the two-class and four-class classifications in the response stage are presented in Tables S1 and S2 in the supplementary information.
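The final-rank computation described above (ranking features each day, then ordering them by their average rank over a stage) can be sketched as follows. The function names are hypothetical, and breaking ties by feature name is an assumption, since the text does not say how ties are handled.

```python
def rank_features(importances):
    """Rank features for one day; rank 1 is the most important.

    importances: dict mapping feature name -> importance score.
    ASSUMPTION: ties are broken alphabetically by feature name.
    """
    ordered = sorted(importances, key=lambda f: (-importances[f], f))
    return {feature: position + 1 for position, feature in enumerate(ordered)}


def final_ranks(daily_importances):
    """Order features by their average daily rank across a stage.

    daily_importances: list of per-day importance dicts covering the stage.
    """
    daily_ranks = [rank_features(day) for day in daily_importances]
    n_days = len(daily_ranks)
    average_rank = {
        feature: sum(day[feature] for day in daily_ranks) / n_days
        for feature in daily_ranks[0]
    }
    ordered = sorted(average_rank, key=lambda f: (average_rank[f], f))
    return {feature: position + 1 for position, feature in enumerate(ordered)}
```

Each entry of `daily_importances` would come from one daily random forest model, so a stage of N days contributes N dictionaries.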

1) FEATURE IMPORTANCE FOR INDICATING FLOOD INSURANCE CLAIMS IN RESPONSE STAGE
According to Table 3, humans' daily activities (FE 1) and travel activities (FE 2 and FE 3) are the more reliable indicators for flood impacts measured by flood insurance claims. During hurricane and flood periods, some of the affected residents may choose to stay at home because they feel confident after having ridden out previous extreme events [46], while others may choose to follow the evacuation order [47]. Either evacuating or sheltering in place can be captured by the daily activity index and travel activities. For instance, staying at home could reduce travel activities, and evacuation could increase road congestion, both of which can result in variations of the average daily activity index and of congested roads. Therefore, variations in the average daily activity index, daily maximum percentage of congested roads, and daily average percentage of congested roads could provide indicators for rapid assessment of the extent of flood impacts in terms of insurance claims.
In addition, features related to credit card transactions (i.e., FE 4, FE 5, and FE 6) occupy the middle ranks, from fourth to sixth. Yuan et al. [23] found that variations in daily total expenditures from the normal period could capture the flood impact. The study showed that residents' credit card transactions (total spent) decreased in business sectors such as drugstores, health care, and groceries [23], which indicates that flood impacts can be captured by the changes in credit card transactions.
For the features derived from Twitter data (FE 7, FE 8, FE 9, and FE 10), changes in the number of neutral tweets (FE 9) demonstrated the greatest importance among the four features. Features related to changes in the number of positive and negative tweets show little importance as indicators for flood insurance claims. The quantity of Twitter data posted within a ZIP code region has a strong and positive relationship with the population of that region [48]. During Hurricane Harvey, less populated ZIP code regions produced few tweets, and dividing them into three sentiment polarities further reduced the number of tweets per category, which would result in null values for features FE 7, FE 8, FE 9, and FE 10. As a result, changes in the number of positive, negative, and neutral tweets, as well as variations in the average sentiment score, may be less important indicators for flood impact assessment compared with other features derived from humans' daily activities, travel activities, and credit card transactions. Table 4 shows the results of the analysis of the rank of feature importance (during the recovery stage) for indicating flood impacts in terms of insurance claims. The feature importance analyses for the two-class and four-class classifications based on flood insurance claims are presented in Tables S3 and S4 in the supplementary information.

2) FEATURE IMPORTANCE FOR INDICATING FLOOD INSURANCE CLAIMS IN RECOVERY STAGE
Compared with the rank of feature importance during the response stage for indicating flood insurance claims, there is no significant variation in the rank of features in the recovery stage. Features derived from humans' daily activities (FE 1) and travel activities (FE 2 and FE 3) still occupy the top three places; features related to credit card transactions (i.e., FE 4, FE 5, and FE 6) are still at the middle ranks; and features related to Twitter activities are at the bottom of the rankings in terms of feature importance. These results indicate that the importance of features for rapid assessment of flood impacts (based on flood claims) does not vary significantly during response and recovery stages; the most important features during the response stage retain their importance ranking in the recovery stage. Hence, since these features are calculated each day, a daily value of each of the important features could be a reliable indicator of flood impacts.

B. FEATURE IMPORTANCE FOR FLOOD INUNDATIONS
This section shows the rank of feature importance for indicating the flood impacts measured by the flood inundation. Based on the calculated flood inundation percentages of the 142 ZIP codes, we classified the flood impacts into two, three, and four classes. As in the section FEATURE IMPORTANCE FOR FLOOD INSURANCE CLAIMS, we used the three-class classification as an example and conducted an analysis of the rank of feature importance (Figure 5). (See the rank of the importance for two-class and four-class classifications of flood inundations in Figures S3 and S4 in the supplementary information.) Table 5 shows the analysis results for the rank of feature importance for predicting flood inundations in the response stage. For each feature presented in Figure 5, we summarized its persistence period, range of ranks in the persistence period, and its final rank. The feature importance analyses for the two-class and four-class classifications of flood inundations in the recovery stage are presented in Tables S5 and S6 in the supplementary information.

1) FEATURE IMPORTANCE FOR INDICATING FLOOD INUNDATIONS IN RESPONSE STAGE
Compared with the rank of feature importance for indicating flood insurance claims in the response stage (Table 3), Table 5 shows some variations in the ranks when flood impacts are measured based on the flood inundations (such as for variations of credit card transactions). In most cases, the overall feature importance ranking remains stable when the measurement of flood impacts changes from flood insurance claims to flood inundations. Table 6 shows the analysis results for the rank of feature importance (during the recovery stage) for indicating flood inundation extent. The feature importance analyses for the two-class and four-class classifications are presented in Tables S7 and S8 in the supplementary information.

2) FEATURE IMPORTANCE FOR INDICATING FLOOD INUNDATIONS IN RECOVERY STAGE
Compared with the rank of feature importance indicating flood inundations in response stage (Table 5), Table 6 reveals minor variations in their rank during the recovery stage. The significant variations are captured in the rank changes in the daily average percentage of congested roads (FE 3) and changes of the number of transactions (FE 5). In the recovery stage, variation of the daily average percentage of congested roads becomes more important (rank changes from 5th place to 2nd place), while changes of the number of transactions become less important (rank changes from 4th place to 7th place). For the remaining eight features, variations of humans' daily activities (FE 1) and travel activities (FE 2 and FE 3) are still more reliable indicators for flood inundations; changes of credit card transactions remain in the middle of the ranking (excluding FE 6 at 3rd place); and variations of online communications are still at bottom place (excluding FE 7 at 5th place).
Compared with the importance rank of features indicating flood insurance claims (Table 4), the main variations occur in the rank of the credit card transaction features when flood impacts are measured based on flood inundations (Table 6). The most significant variation falls in the importance rank of the changes of the total spent (FE 6), which moves from 7th to 3rd place. The importance rank of variations of the number of cards dropped from 4th place to 6th place, and that of changes of the number of transactions dropped from 5th place to 7th place, when the flood impact measure is changed from insurance claims to flood inundations. For the other seven features, there is no significant variation in their importance ranks. Hence, overall, the feature importance rankings for flood impact assessments related to two different flood impact measures (insurance claims and flood inundation extent) show very similar results. This result indicates that the features identified as being of top importance could provide reliable indicators of flood impacts when a rapid assessment is needed.

IV. DISCUSSIONS AND CONCLUDING REMARKS
Early and rapid estimation of flood inundation and losses within a community can empower crisis response managers to identify areas with severe flood impacts and to inform resource allocation during response and recovery. Flood inundation maps, insurance claims, and survey data reflecting flood impacts, however, become available only weeks or months after the event. Emerging community-scale big data categories capture fluctuations in community-scale activities between the normal period and the flood period, such as the human activity index, travel activities, credit card transactions, and online communications, and these fluctuations could provide weak signals of flood impacts on the community. For instance, disruptions in infrastructure or population response behaviors could change traffic and movement in the affected areas; thus, examining fluctuations in human activities and traffic could provide rapid and early indications of flood impacts.
While recent studies (e.g., [20], [21], [23], [33], [49]) demonstrated the potential of using human activity-based data for rapid impact assessment, the relative importance of features related to different aspects of human activities and their temporal significance was not known. Using four community-scale big data types, we derived ten features related to changes in the daily human activity index, daily travel activities (reflected in road congestion conditions), daily credit card transactions, and daily online communications in the context of the 2017 Hurricane Harvey in Harris County. Using the feature importance function of the random forest model, we explored the importance rank of these ten features in indicating the ultimate flood impacts, measured by both flood insurance claims and flood inundation, across flood stages. In three-class classifications of flood insurance claims and flood inundation, we found a stable ranking of the features derived from the four categories of community-scale human activities. Features derived from variations in the average daily activity index (FE 1), the daily maximum percentage of congested roads (FE 2), and the daily average percentage of congested roads (FE 3) are generally in the top three places of the importance ranking for both flood insurance claims and flood inundation. Changes in credit card transactions in terms of the number of cards (FE 4), the number of transactions (FE 5), and the total spent (FE 6) are generally in the middle of the importance scale in both the response and recovery stages. Features derived from variations in online communications on Twitter (FE 7, FE 8, FE 9, and FE 10) are mainly the least relevant of the four categories in terms of correlation with flood insurance claims and flood inundation in both stages. It is worth mentioning that the ranks of these features may vary across disaster cases and regions. The method itself, however, can generally be applied to other disaster cases and regions for which community-scale human activity data are available.
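The feature-ranking step described above can be sketched as follows. This is a minimal illustration in Python with scikit-learn, not the authors' implementation: it assumes the impurity-based `feature_importances_` attribute of `RandomForestClassifier` as the importance measure, the function name `rank_features` is hypothetical, and the data are synthetic with a signal deliberately planted in the first column so that the output can be checked. It does not reproduce any result of the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_features(X, y, feature_names, seed=0):
    """Fit a random forest and return (name, importance) pairs, most important first.

    Importance here is scikit-learn's impurity-based measure (an assumption;
    the study may have used a different importance function).
    """
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X, y)
    importances = model.feature_importances_
    order = np.argsort(importances)[::-1]
    return [(feature_names[i], float(importances[i])) for i in order]


# Synthetic stand-in for ten activity features over a set of spatial units.
rng = np.random.default_rng(0)
n_units = 300
feature_names = [f"FE {i}" for i in range(1, 11)]
X = rng.normal(size=(n_units, len(feature_names)))

# Three-class label (e.g. low / medium / high impact) that depends only on
# the first column. This planted signal exists purely to test the code.
y = np.digitize(X[:, 0], bins=[-0.5, 0.5])

ranking = rank_features(X, y, feature_names)
for place, (name, score) in enumerate(ranking, start=1):
    print(f"{place:2d}. {name}: {score:.3f}")
```

To examine temporal importance, one such model would be fitted per flood stage and per impact measure, with the features recomputed from that stage's data, and the resulting rankings compared.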
The study and its findings contribute to the emerging field of smart resilience, which focuses on harnessing community-scale big data and analytics techniques to enhance disaster resilience capabilities such as rapid impact assessment. The findings could also help public officials and emergency managers assess the impacts of floods before detailed flood maps and claims become available. For instance, in both the response and recovery stages, crisis response managers could first use changes in the average daily activity index (FE 1) and the daily average percentage of congested roads (FE 3) as indicators of flood impacts, as these remain in the top-three ranked features when flood impacts are measured by flood insurance claims and flood inundations. If the community-scale big data needed to determine FE 1 and FE 3 are not available in the response stage, changes in the total spent (FE 6) could be another reliable indicator of flood impacts. Once hotspots with severe flood impacts are identified, crisis response managers can allocate relief resources to these areas. The analysis of feature importance rank could also suggest which features crisis response managers should monitor across different flood stages.
This research demonstrates that data-driven machine learning models applied to community-scale big data could provide important insights into flood impacts in discrete regions within a community and across flood stages, based on fluctuations in community-scale human activities. One limitation of this study is that it does not consider relationships among the ten features derived from the four community-scale big data categories. We considered mainly first-order feature importance and will investigate the dynamics of feature interactions (relationships) across flood stages in future research. Findings on these temporal relationships could indicate which features influence others, which in turn could help in selecting the features to be monitored across flood stages.

APPENDIX
Supplementary information is available in the appendix.