Visual Analytics Platform for Centralized COVID-19 Digital Contact Tracing

The COVID-19 pandemic and its dramatic worldwide impact have required global multidisciplinary actions to mitigate its effects. Mobile phone activity-based digital contact tracing (DCT) via Bluetooth low energy technology has been considered a powerful pandemic monitoring tool, yet it sparked a controversial debate about privacy risks for people. In order to explore the potential benefits of a DCT system in the context of occupational risk prevention, this article presents the potential of visual analytics methods to summarize and extract relevant information from complex DCT data collected during a long-term experiment at our research center. Visual tools were combined with quantitative metrics to provide insights into contact patterns among volunteers. Results showed that crucial actors, such as participants acting as bridges between groups, could be easily identified, ultimately allowing more informed management decisions aimed at containing the potential spread of a disease.

The COVID-19 disease produced by the SARS-CoV-2 virus has become one of the largest pandemics in recent history, affecting all dimensions of human activity in 2020 and 2021. 1 Actions to mitigate its effects on health, economy, education, leisure, or any other aspect of human life have been explored from many perspectives, and science and technology have played a key role in shaping public policies for monitoring and reducing the impact of the pandemic. 2 Among the wide variety of technological initiatives, Bluetooth low energy (BLE)-based digital contact tracing (DCT) 3 has proven to be one of the most controversial technological approaches: its high potential to monitor and track the spread of the disease by tracing a person's contacts with other people carrying a mobile phone collides directly with ethical concerns 4 related to privacy rights and with individuals' intention to adopt DCT apps. 5 Distributed proposals, such as DP-3T ([Online]. Available: https://github.com/DP-3T/), offer high privacy protection. 6 However, some studies show that their effectiveness is lower than initially expected due to the lack of detail provided by such systems. 7 As an alternative, centralized technologies have been developed to offer much more detailed and traceable information. The development of these decision-making support systems calls for tools able to extract valuable information from massive interconnected data.
Simulation studies show that approximately 70% of the contacts of positive cases need to be traced with a maximum delay of one day in order to substantially contain the spread of the disease. 8 [...] positive cases in order to apply isolation policies. Our study is based on an experiment carried out at our research center during the COVID-19 pandemic using a centralized, nonanonymous DCT system that was tested by volunteer employees during their working time at the center's facilities. The possibility of identifying contact individuals provides much better insight into general and particular behaviors, and individual actions (e.g., in case of infection) can be directly controlled and monitored by occupational risk prevention services. The work presented here focuses on the power of visual analytics techniques as a decision-making support tool for the large amount of DCT data generated by each mobile phone. The platform created to this end securely integrates all data and communications and provides a detailed overview of human interactions at work. This information, when fed to visualization tools and simulators such as ArchABM, 9 has been used to fine-tune internal policies and monitor the potential impact of positive cases at our center.
The rest of this article is organized as follows. First, the "Related Work" section reviews prior work on DCT, including methods for modeling and visualizing contact tracing data. This is followed by the "Description of Work" section, where the general architecture and main elements of the DCT system are presented. The "Validation" section shows the results of the proposed DCT system deployed in a real organization. Finally, general conclusions, observations on system scalability, and some future work ideas are presented in the "Conclusion and Future Work" section.

RELATED WORK
A plethora of ICT solutions have been developed during the pandemic period, 10 including DCT apps, social networking and informative apps, simulation models, mask-wearing detectors, and disease incidence maps. This section summarizes those activities primarily related to DCT data generation and analysis.

Digital Contact Tracing
Many countries have developed and supported various types of DCT systems. Azad et al. 11 presented an in-depth analysis of a wide variety of DCT applications, covering 26 apps from 18 countries, and Barrat et al. 12 analyzed contact tracing applications and their impact by establishing a theoretical framework and running simulations based on the resulting models. The authors conclude that the massive adoption of DCT apps could lead to a strong suppression of the spread of the disease.

Theoretical Models and Simulation
Theoretical COVID-19 spreading models and simulators offer the possibility of estimating the consequences of specific policies and behaviors. There is abundant literature related to modeling and simulating the infection risk of the disease to evaluate the mitigation impacts of policies and actuation plans.
Singh et al. 13 applied a simulator to evaluate scenarios in the Madrid metropolitan area. Lelieveld et al. 14 12 developing and running simulation models to evaluate the effectiveness of manual and DCT. Martinez et al. 9 implemented an agent-based simulator based on an aerosol model where one of the input parameters was the number of agents using DCT.

Visualization
Dense and massive contact-tracing data require intuitive representations that allow experts to gain insight into what is relevant. Due to the nature of DCT data (contacts between a multitude of mobile devices located within a certain proximity range), graphs offer good visual representation properties, such as intuitive topological information, the ability to show indirect relations between nodes, and metrics that characterize individual nodes. However, most of the research efforts observed in the literature focus on exploiting individual traces with the aim of applying mitigation actions, such as isolation protocols. Among those that have proposed global analytic and visualization methods, Luo et al. 15 presented a method to visualize transmission networks by plotting graphs colored by geolocation data. Extending data representation to a broader set of visualizations, Dixon et al. 16 proposed a platform applied in the state of Indiana (U.S.) where different sources of public health data are integrated and presented in dashboards with maps, bar charts, etc., that provide a general overview and help to identify trends and outbreaks. Baumgartl et al. 17 applied synchronized visualization to detect and analyze outbreaks in German hospitals. Antweiler et al. 18 described a dashboard-based solution that combines graph, temporal, and geographical views, while Sondag et al. 19 proposed a dynamic tree-based visualization where domain-specific knowledge is used for clustering purposes. Moreover, these authors analyze existing work in graph visualization, concluding that the different proposed methods to show global information (also based on other visual paradigms, such as self-organizing maps and choropleth maps) tend to be static, while their method (as well as ours) offers interactive dynamic behavior.
To the best of our knowledge, a system based on nonanonymized and centralized DCT information that combines the global vision of contact networks (by using graphs) with individual graph-related metrics and interactive methods for filtering and visualization has not been fully addressed in the current state of the art. Therefore, the main contribution of this article is to present a nonanonymized DCT system that shows information on a visual-analytics platform that combines different visual paradigms, with special focus on the combination of interactive graphs and numerical graph-related metrics.

DESCRIPTION OF WORK
The proposed system consists of an end-to-end solution that securely captures BLE-based contact tracing data, stores preprocessed information, and provides visual analytics interfaces for data understanding and decision-making. The developed interactive graphs add individual detail to the global vision and allow occupational risk prevention officers to filter data by node metrics, combining the users' qualitative criteria with metrics-related quantitative criteria.
Data privacy has been one of the main concerns during the research activity. As the DCT technology developed is centralized and nonanonymous, several privacy-related measures have been adopted.

› The DCT app has been used by volunteers who have been informed about the research project and all privacy-related aspects via internal communications. An information office is available to clarify any concerns that volunteers might have. Privacy-related suggestions made to this office have been included in the final system. The privacy policies state that personal information will be accessed only by the data controller, which is the center's dedicated occupational risk prevention service. Only anonymized information will be used for scientific publications, and no other use of these data will be allowed.
› Users can set the DCT app tracking hours according to their work schedule, and the app can be enabled/disabled at any moment.
› Contact tracing information is shared between mobile devices by means of anonymous 16-bit pseudorandomized keys that are generated daily. Associations between keys and the corresponding user IDs can only be established by the data controller.
› The source code of the app has been made available to all participants. The running analytics platform with real data is only available to the occupational risk prevention services.
› All communications have been secured following state-of-the-art security strategies, via hypertext transfer protocol secure (HTTPS) and secure sockets layer (SSL)-based connections.
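The key-based measure above can be illustrated with a minimal sketch. The article does not describe how the daily keys are derived, so the HMAC-based derivation, the `daily_key` and `resolve_key` names, and the secret held by the data controller are all assumptions made for illustration only. Note that a 16-bit key space can produce collisions, so the resolution step returns every candidate user.

```python
import hashlib
import hmac
from datetime import date


def daily_key(secret: bytes, user_id: str, day: date) -> int:
    """Derive a 16-bit pseudorandomized key for one user and one day.

    Hypothetical scheme: HMAC-SHA256 over "user|date", truncated to 2 bytes.
    """
    message = f"{user_id}|{day.isoformat()}".encode("utf-8")
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "big")


def resolve_key(secret: bytes, key: int, day: date, user_ids):
    """Data-controller side: list the users whose key for that day matches.

    Only the holder of `secret` can perform this association.
    """
    return [u for u in user_ids if daily_key(secret, u, day) == key]


if __name__ == "__main__":
    secret = b"held-only-by-the-data-controller"  # placeholder value
    today = date(2021, 3, 1)
    key = daily_key(secret, "user-59", today)
    print(key, resolve_key(secret, key, today, ["user-59", "user-65"]))
```

In this sketch, devices would only ever broadcast the integer key, while the mapping back to a user ID requires the secret.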

General Architecture
A high-level architecture representation of the whole system with four main elements (from data capturing to exploitation) is depicted in Figure 1. User identity, authentication, data management/storing, and processing modules are deployed on cloud infrastructures.
All communications have been secured (SSL/HTTPS), and user authentication and identity management have been deployed with Keycloak.

Underlying Technologies
The system has been developed based on a set of open-source base technologies. The DCT app is based on React Native, and local data have been stored in SQLite3.
The backend is based on the following technologies. The frontend of the system monitoring and data analytics platform is based on Vue.js, with ECharts as the main visualization library.

Data Collection
The data capturing layer consists of an in-house developed Android app based on React Native and published in Google Play, ready for the volunteers to install on their mobile devices. Some static BLE nodes were set up as beacons in fixed places and were used to provide indoor location data of moving nodes (volunteers). The DCT functionality of the app can be configured to work only during specific hours. Contacts are only traced between recognized devices that have the organization's app installed. Users can report through the app if they feel symptoms or become infected with COVID-19. In order to give complete control to each volunteer, tracking can be disabled at any moment by pushing the main button of the application. Figure 2 shows the main screens of the DCT app.
The app runs as a background process and provides the following data captured via BLE: Timestamp, Origin ID, Target ID, and received signal strength indicator (RSSI). By processing consecutive events, contact duration can be calculated.

Data Management
Once the DCT app-enabled devices are successfully logged in, contact tracing information is automatically sent to the central system database pertaining to the organization, in this case, our research center. This information is preprocessed and organized before it is finally stored. Preprocessing tasks include the following.
› Filtering: This module allows filtering the events that are relevant for analysis. For instance, very weak signal connections or very short ones can be filtered out according to preconfigured parameters. Because our use case is set in a research context, we have stored all events; however, a real use case may benefit from prior filtering.
Several parameters of the capturing and communications system can be configured. Volunteers can select the activation period to match it with their own working hours. System administrators can set the BLE advertising/scanning periods (that affect the sensitivity of contact tracing and battery consumption), and data uploading frequency.
Collected raw data are consolidated by the preprocessing module, which groups detected IDs using a timespan threshold between consecutive contacts. The resulting connections are recorded as contact events that include data such as detected user ID, RSSI, timestamp, and contact duration.
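The consolidation step described above can be sketched as follows. This is a minimal illustration, not the platform's actual preprocessing module: the `RawEvent` and `ContactEvent` types, the `consolidate` function, and the 60-second `max_gap` default are assumptions. Consecutive detections of the same origin/target pair are merged into one contact event as long as the gap between them stays within the threshold.

```python
from collections import defaultdict
from typing import NamedTuple


class RawEvent(NamedTuple):
    timestamp: float  # seconds since an arbitrary epoch
    origin_id: str
    target_id: str
    rssi: float


class ContactEvent(NamedTuple):
    origin_id: str
    target_id: str
    start: float
    duration: float  # seconds between first and last detection
    mean_rssi: float


def _close_run(origin, target, run):
    """Turn a run of consecutive detections into one contact event."""
    duration = run[-1].timestamp - run[0].timestamp
    mean_rssi = sum(e.rssi for e in run) / len(run)
    return ContactEvent(origin, target, run[0].timestamp, duration, mean_rssi)


def consolidate(events, max_gap=60.0):
    """Merge detections of the same pair separated by at most `max_gap` seconds."""
    by_pair = defaultdict(list)
    for ev in events:
        by_pair[(ev.origin_id, ev.target_id)].append(ev)

    contacts = []
    for (origin, target), evs in by_pair.items():
        evs.sort(key=lambda e: e.timestamp)
        run = [evs[0]]
        for ev in evs[1:]:
            if ev.timestamp - run[-1].timestamp <= max_gap:
                run.append(ev)  # still the same contact
            else:
                contacts.append(_close_run(origin, target, run))
                run = [ev]  # gap too long: start a new contact
        contacts.append(_close_run(origin, target, run))
    return sorted(contacts, key=lambda c: c.start)
```

A single isolated detection yields a contact event of duration zero, which a later duration filter can discard.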

Data Anonymization
As mentioned earlier, privacy is one of the biggest concerns when it comes to centralized DCT systems. Therefore, no personal data are stored in mobile devices, and all communications are based on dynamic pseudorandomized keys.

Visual Analytics
The primary rationale behind the designed visual analytics approach is that contact tracing information contains both numeric and relational information, which are best represented by graphs. However, static graphs can still be hard to understand and might require processing (interactive pruning or browsing) and metrics that help to identify the most relevant facts.
The visual analytics module includes several interactive data visualization services tailored to the type of data (contact networks) and the knowledge that experts aim to extract from them. General activity information is represented in Figure 3. Information related to the connected devices can be visualized after selecting the desired time period. These visualizations help to monitor the usage and adoption of the DCT system as well as global statistics (number of unique users, average contact duration per user, average number of encounters per user, average RSSI per encounter, total sum of duration of all contacts, total number of encounters).
The core information is presented in the interactive graph view (see Figure 4), where the main activity groups can be seen. Nodes can be colored by different criteria, such as department or role. Once the time period is selected, the visual analytics module shows the corresponding contact graph; when a node is selected, its direct contacts are highlighted (see Figure 5). The visualization is supported by three main metrics that provide quantitative insights regarding node connections. The summary information of these metrics is presented with bar charts on the upper side. These quantitative parameters enable comparative studies between periods, groups, conditions, etc., and complement the intuitive visual information provided by the graph representation. The calculated metrics are as follows.
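As an illustration of the global statistics listed above, the following minimal sketch computes them from a list of contact events. The tuple layout `(user_a, user_b, duration_s, rssi)` and the `global_stats` name are assumptions, and so is the convention that each encounter counts toward both participants when averaging per user; the platform may define these averages differently.

```python
from collections import defaultdict


def global_stats(contacts):
    """Compute summary statistics for contacts in a selected time period.

    contacts: iterable of (user_a, user_b, duration_s, rssi) tuples.
    """
    duration_by_user = defaultdict(float)
    encounters_by_user = defaultdict(int)
    total_duration = 0.0
    total_rssi = 0.0
    n_encounters = 0

    for user_a, user_b, duration, rssi in contacts:
        # Assumption: an encounter is attributed to both participants.
        for user in (user_a, user_b):
            duration_by_user[user] += duration
            encounters_by_user[user] += 1
        total_duration += duration
        total_rssi += rssi
        n_encounters += 1

    n_users = len(duration_by_user)

    def mean(total, count):
        return total / count if count else 0.0

    return {
        "unique_users": n_users,
        "total_encounters": n_encounters,
        "total_duration": total_duration,
        "avg_duration_per_user": mean(sum(duration_by_user.values()), n_users),
        "avg_encounters_per_user": mean(sum(encounters_by_user.values()), n_users),
        "avg_rssi_per_encounter": mean(total_rssi, n_encounters),
    }
```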
a) Number of direct contacts of a given node.
b) Degree centrality is measured by the number of direct contacts $X_{a,g}$ ($a \neq g$) of a given node $a$, normalized with respect to the total number of nodes $n$. It measures, therefore, the number of direct links to other actors in the network, meaning that nodes with a high degree centrality often serve as "hubs" or major channels:
$$C_D(a) = \frac{\sum_{g=1,\, g \neq a}^{n} X_{a,g}}{(n-1)(n-2)}$$
c) Betweenness centrality is a metric that characterizes the relevance of a node for establishing short pathways between other nodes 20 (i.e., nodes serving as "bridges"):
$$C_B(a) = \sum_{g,h \in V,\; g,h \neq a} \frac{p_{g,h}(a)}{p_{g,h}}$$
The abovementioned equation represents the betweenness centrality, where $p_{g,h}$ is the number of geodesic (shortest) paths that connect nodes $g$ and $h$, while $p_{g,h}(a)$ is the number of those geodesic paths that pass through node $a$.
Raw data tend to create dense networks that convey little visual information (upper graph in Figure 5); filtering the data by contact duration or RSSI helps to reveal contact patterns. In a fully connected network, the number of edges $n_e$ has a quadratic relation with the number of nodes $n$ ($n_{e,\max} = n(n-1)/2$). In order to allow analysts to browse and filter dense networks, the visual analytics platform provides interactive services to change the aforementioned parameters (see Figure 4) and prune the initial graph. Each time a new graph is defined, the quantitative metrics are recalculated. By changing the time of the analysis period, the effects of specific daily actions, such as lunchtime, can be observed. With all this information, the occupational risk prevention services can provide guidelines and policies to mitigate the risk and monitor the effect of these proposed actions.
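The pruning and metric recalculation described above can be sketched in a few lines. This is an illustrative implementation, not the platform's code: the function names and the `(user_a, user_b, duration_s, rssi)` tuple layout are assumptions. Betweenness is computed with Brandes' algorithm for unweighted, undirected graphs and left unnormalized, and degree centrality uses the common $(n-1)$ normalization, which may differ from the normalization applied in the platform.

```python
from collections import deque


def build_graph(contacts, min_duration=0.0, min_rssi=float("-inf")):
    """Build an undirected contact graph, pruning weak or short contacts.

    Every user stays in the graph as a node, even if all edges are pruned.
    """
    adj = {}
    for a, b, duration, rssi in contacts:
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if a != b and duration >= min_duration and rssi >= min_rssi:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degree_centrality(adj):
    """Direct contacts of each node divided by (n - 1)."""
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Unnormalized betweenness via Brandes' algorithm (BFS per source)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in non-decreasing distance
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):        # accumulate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair (g, h) was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}


def top_nodes(scores, k=5):
    """Rank nodes by score, highest first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Calling `build_graph` again with a different `min_duration` or `min_rssi` and recomputing both metrics mirrors the interactive behavior described above, where each new filter setting triggers a recalculation.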
Individual cases can be tracked under the user info option (see Figure 6). This option provides precise information related to the selected user's contacts during the specified time period. These contacts can include BLE beacons installed in the building's rooms, which provide a glimpse of the locations visited by the individual and the time spent in those places. All these data provide valuable information, allowing the occupational risk prevention services to adopt specific measures for each particular positive COVID-19 case.

VALIDATION
The presented system has been developed and deployed in our research center's facilities in San Sebastian (Spain). A total of 98 unique volunteers have participated by installing the DCT app on their phones, with an average concurrency of approximately 30 users (more detailed information can be found in Figure 3). The backend has been deployed in commercial cloud infrastructures, and the user interfaces were available to the occupational risk prevention services from October 2020 to March 2022. The system has been reviewed and refined during this period according to the participants' recommendations.
During the validation period, two main activities were carried out. First, when positive cases appeared, their contact information during the previous days was analyzed in order to define the group of persons who could potentially get infected and, therefore, should remain at home until a negative PCR test result was obtained. This process helped to identify people whose workplace was not close to that of the positive cases but who unexpectedly had direct contact with them during lunchtime, meetings, etc.
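The first activity, defining the group of potentially exposed persons for a positive case, can be sketched as a simple query over the stored contact events. The tuple layout `(user_a, user_b, start_ts, duration_s)`, the `exposed_contacts` name, the lookback window, and the choice to accumulate contact time per person before applying a minimum duration are all illustrative assumptions rather than the criteria actually applied by the occupational risk prevention services.

```python
from collections import defaultdict


def exposed_contacts(contacts, positive_user, detected_at, lookback_s,
                     min_duration=0.0):
    """Return {user: accumulated contact seconds} for direct contacts of
    `positive_user` within the lookback window before `detected_at`.

    contacts: iterable of (user_a, user_b, start_ts, duration_s) tuples.
    """
    window_start = detected_at - lookback_s
    totals = defaultdict(float)
    for user_a, user_b, start, duration in contacts:
        if user_a == user_b or positive_user not in (user_a, user_b):
            continue  # not a contact of the positive case
        if not (window_start <= start <= detected_at):
            continue  # outside the analyzed period
        other = user_b if user_a == positive_user else user_a
        totals[other] += duration
    return {u: t for u, t in totals.items() if t >= min_duration}
```

Raising `min_duration` narrows the result to the longest exposures, while setting it to zero lists every person detected near the positive case in the window.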
The second activity was related to the use of the developed platform, which allowed occupational risk prevention services to monitor global activity and analyze the effect of general policies, anomalous behaviors, etc. As can be seen in Figure 5, a direct graph representation of raw data does not provide clear visual information. However, filtering contacts by duration and combining the result with quantitative metrics (e.g., users with the highest betweenness centrality) provides good insight into the general behavior. Figure 7 shows the evolution from the densely connected network represented in Figure 5 to more structured graphs. Some clusters can be identified by filtering out connections that last less than 15 minutes. In this case, the users with the highest betweenness centrality are [59, 65, 64, 33, 37], where User 59 is the main bridge between the red group and the rest. The analysis of these five users showed that all of them had transversal responsibilities within the organization (management and IT).
By setting the filter to 30 minutes, the structure shows the following users with the highest betweenness centrality: [59, 65, 68, 37, 47], confirming that employees with transversal roles are those with the highest impact in connecting nodes. By increasing the time to 45 minutes, the connection between User-57 and User-27 can be clearly identified as the bridge between two groups. This effect can be confirmed for connections that last 60 minutes, while for longer periods the different groups remain isolated.
From this information, it can be concluded that employees with transversal roles were the ones connecting different groups (mainly due to the face-to-face meetings they had with representatives of each group). Therefore, they were encouraged to organize teleconferences with people belonging to other groups, even if they were sharing the same facilities.

Occupational Risk Prevention Services
The visual analytics platform was at the disposal of occupational risk prevention officers. Interviews with them confirmed that while individual tracking of positive cases was feasible with nonanonymized tabular information, graph representations offered quick and valuable insight into indirect contacts.
Visual analytics and graph statistics were identified as even more relevant for general monitoring purposes. Officers concluded that tabular data were not manageable for obtaining a global view and that, instead, graphs were able to provide high-value insights in an intuitive form. Moreover, centrality metrics were used to determine the most relevant nodes in messy graphs. The combination of centrality metrics with interactive graph filters was identified as the most valuable method to understand general behaviors, individual high-risk cases, and the effects of established policies.

CONCLUSION AND FUTURE WORK
Visual analytics techniques are essential for understanding complex and massive data, such as the large networks generated by DCT data that evolve over time. Moreover, interactive visualizations allow occupational risk prevention services to browse these data and identify the best match between what is implicit in the observed data and their own experience and knowledge. In the case of COVID-19 contact tracing data, we proposed the use of graphs as a visual paradigm that provides quick and intuitive insight, together with quantitative network measures that provide measurable and objective metrics. In this sense, we present a complete DCT system based on a mobile service that creates individual traces and transmits the information to a central system, where a data monitoring and visualization platform provides insight into general patterns, individual contacts (when employees get infected), effects of adopted policies, etc. The experience gathered in our organization by applying the proposed system and methodology to address the COVID-19 crisis has demonstrated that continuous monitoring of contacts and real-time data availability, combined with tailored visualization methods, are able to shift complex management decisions from being merely intuitive to being based on data-driven global insights with both visual and quantitative backup.
[Figure caption fragment: ... filtering (upper graph). When one node is selected, its connections are highlighted.]
Note: Transversal roles are considered those that regularly interact with other departments (e.g., Department Heads, IT Service, Administrative Department, etc.).
Privacy has been one of the main concerns related to centralized DCT systems. The fundamental aspects for the massive adoption of the DCT app (the vast majority of employees working on-site have participated as volunteers in the initiative) have been clear data usage policies, with a special focus on cybersecure system design, and the continuous availability of DCT system managers to clarify any concern that employees might have.

Scalability
The implemented architecture allows the system to scale up to bigger organizations. Data capturing and storage systems are operating far below their technological limits, and both would allow the creation of new instances (with load balancing, etc.) for much higher numbers of connections. Data processing and visualization tools are more limited in terms of scalability. Even if there are no strict real-time requirements, good usability requires that results be delivered within a few seconds and that user interfaces behave smoothly. As data are processed in backend infrastructures, concurrent processing methods and hardware improvements would address the need for better performance. Experimental tests have shown that the implemented web graph visualization offers good responsiveness for a few hundred nodes. A higher number of nodes would require GPU-based rendering or static graph representations, and the resulting graphs would become difficult to interpret. Thus, preprocessing methods for clustering and filtering become as relevant as improvements oriented to computational performance.

Future Work
Apart from the aforementioned technical challenges in scalability, the application of DCT in larger organizations or massive events would create big and complex graphs, which would be much harder to interpret. Future research will focus on finding effective visual paradigms to represent the evolution of contact tracing data over time and on providing new interactive filtering tools for big graphs. The integration of DCT data with business-related information and key performance indicators will also be part of the foreseen future work.

ACKNOWLEDGMENTS
This research work has been carried out within the context of the RAPID initiative ([Online]. Available: https://rapid.vicomtech.org/), fostered by the Basque Government as part of the fast reaction program (PRAP Euskadi, led by SPRI, the entity of the Economic Development, Sustainability, and Environment Department of the Basque Government for promoting the Basque industry) with the aim of boosting the Basque industrial sector by maintaining the productive activity in the context of the threat of the COVID-19 pandemic. Three research centers of BRTA (Basque Research and Technology Alliance) have collaborated in this R&D initiative: Tecnalia, Ikerlan, and Vicomtech. Among the different research lines carried out in the RAPID initiative, Vicomtech has been responsible for the centralized BLE-based DCT system and the visual analytics of the obtained data, which has been selected as one of the representative cases in the OECD pandemic reaction report.
FIGURE 7. Interactive filtering methods help to create meaningful representations and to discover patterns as a key element for data-driven decision-making. Each graph corresponds to a contact duration range, and node colors represent the organization's departments. Global scores rank the most relevant nodes. Node sizes represent node attributes such as number of contacts (in this case) or degree centrality.