A Proactive and Practical COVID-19 Testing Strategy

To reopen the economy safely during the COVID-19 pandemic, governments need the capability to proactively identify new and often asymptomatic infections and to trace their contacts. Policymakers and public health professionals need a sampling-testing method that can achieve broad population coverage without overwhelming medical workers. We observe that COVID-19 high-risk groups are located in the hubs and cliques of our geo-social network, formed by the close encounters of people during daily life. These individuals are the de facto “canary in a coal mine”. We propose that nations offer free and anonymous testing services to them. With open-source computer algorithms and datasets, only a small fraction of the population selected for COVID-19 testing can cover the majority of high-exposure-risk individuals. A 0.3% sampled testing for a megacity covers 3/4 of its entire population. A 3% sampled testing for a rural town covers 3/4 of its entire population. With government oversight and public consent, this approach can serve each province/state or city/township for decentralized daily testing planning. However, to protect privacy, we recommend constructing the geo-social network of anonymized cellphones, not named individuals. This infrastructure should be dismantled once the pandemic is largely over. This can be achieved by policymakers, health workers, and engineers together in solidarity.


PROBLEM FORMULATION
The COVID-19 pandemic puts governments worldwide in a dilemma. Before social distancing and stay-at-home orders, rapid chain infection occurred. Strict stay-at-home orders save lives but risk economic recession. Public opinion is growing increasingly polarized and has led to armed protests [1]. If the economy collapses in any nation, the ensuing mass unemployment and social unrest can expose the most fragile families to the pandemic. Reopening the economy safely is, thus, a necessary public health policy.
However, recklessly loosening stay-at-home policies and reopening the economy in hard-hit nations can be risky. Asymptomatic COVID-19 patients can infect others in offices or onboard public transportation. Droplets and aerosols from people talking can carry the virus [2]. If the chain of community infection goes undetected, it can grow like wildfire. Hospitals will again be overwhelmed and the pandemic can become endemic. Thus, a prerequisite to reopening the economy is the ability to rapidly identify new cases among the asymptomatic population [3]. That enables contact tracing of community infection and the subsequent containment of local outbreaks. There are other prerequisites, such as a declining number of patients and universal availability of PPE, which are equally important but will not be discussed in this study.
To that end, Mr. B. Gates prescribed a drastic increase in nucleic acid testing capability for COVID-19 [4]. Meanwhile, a Harvard panel report proposed daily proactive testing of 5-20 million people in the United States alone, or 2-6% of the total population [5]. The report did not specify how it arrived at that estimate or how reliable the estimate is. The challenge to that testing capability lies not only in the production and distribution of test kits but, more crucially, in the logistics of the actual tests. There might not be enough medical workers and lab technicians in the US to conduct 20 million tests a day. Peto et al. [3] advocate universal weekly random testing of 13% of the U.K. population to reach 90% coverage. That translates to roughly 2% of the entire population tested daily. This is a huge logistical challenge for the U.K. as well. Similar random sampling schemes are being developed for India [6]. However, none of these testing schemes has materialized since its conception, probably because governments deem them impractical. This logistical challenge becomes readily solvable if only a selected 0.1-0.3% sample of the total population needs to be tested daily or weekly. As of now, megacities in the United States, Europe, and China already have that testing capacity.
We propose the daily testing of only a small subset of the asymptomatic population, specifically targeting the hubs and cliques in a geo-social network of anonymous cellphones. If any result comes back positive, then the people around that person need further testing as part of contact tracing. The geo-social network of anonymous cellphones in a given area during a given time period consists of vertices and links. The vertices are the cellphones, carried by their owners who are active in the economy. The links among them indicate significant close encounters, such as working in the same office, living in the same house, or sharing the same ride.
Figure 1 illustrates a simple geo-social network of three young working professionals. Mary, Giuseppe, and Lee work in a small consulting firm. Mary shares her house with a partner and jogs to the office daily with a group of X (5 to 20) people. Giuseppe shares a house with his parents and 2 siblings and drives alone to the office daily. Lee lives alone in his condo and takes a 40-min metro ride to the office daily with Y (10 to 50) people in a train car. In the geo-social network graph, we use F to denote family members and C to denote the commuters they meet daily. For simplicity, we assume that other family members stay strictly at home, the commuters interact with no one else, and all the people in this graph are asymptomatic.
There exists no full-scale study on the COVID-19 exposure of individuals. Patients of old age or with preexisting medical conditions have the highest death rate once infected, but not necessarily the highest chance of exposure before getting ill. We observe that two types of people might be the most exposed to COVID-19 due to their distinctive niches in the geo-social network. We could focus our limited testing capabilities on them.
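The sample network of Mary, Giuseppe, and Lee can be written down as a small adjacency structure. The following is a minimal sketch, not the authors' code: the function name, the vertex labels, and the choices X = 5 and Y = 10 are illustrative assumptions, and household members are assumed to be linked to one another because they share a house.

```python
from collections import defaultdict
from itertools import combinations


def build_sample_network(x=5, y=10):
    """Build the three-colleague sample network as an adjacency dict.

    x: size of Mary's jogging group (5 to 20 in the text).
    y: number of riders near Lee in the train car (10 to 50 in the text).
    """
    adj = defaultdict(set)

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    def link_all(members):
        # Everyone in the group has a close encounter with everyone else.
        for a, b in combinations(members, 2):
            link(a, b)

    # The three colleagues share one office.
    link_all(["Mary", "Giuseppe", "Lee"])
    # Households (F = family member); assumed fully interlinked.
    link_all(["Mary", "F_Mary_1"])
    link_all(["Giuseppe"] + [f"F_Giuseppe_{i}" for i in range(1, 5)])
    # Commuters (C) interact with no one else, per the simplifying assumption.
    for i in range(1, x + 1):
        link("Mary", f"C_Mary_{i}")
    for i in range(1, y + 1):
        link("Lee", f"C_Lee_{i}")
    return adj


network = build_sample_network(x=5, y=10)
# Number of direct links per colleague: a first, crude indicator of a hub.
degree = {name: len(network[name]) for name in ("Mary", "Giuseppe", "Lee")}
print(degree)
```

With these assumed values, Lee has the most direct links because of his metro ride, even though he lives alone.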
Around the world, senior government officials have been disproportionately hit by COVID-19. The list includes the prime ministers of Britain and Russia, the first ladies of Spain and Canada, the first family of Brazil, and numerous ministers around the world. This likely resulted from busy daily schedules that require meeting a large number of people, often internationally. In other words, the "hubs" in our geo-social network are the most exposed to infection risk. In a sense, they are the canary in a coal mine. Timely testing for them could buy time for their local communities.

People spending long hours in close quarters have seen horrendous local outbreaks of COVID-19. Well-known cases include the Diamond Princess, USS Theodore Roosevelt, USS Kidd, and many hospital wards, retirement homes, factories [7], and prisons [8] around the world. In a geo-social network, these communities are known as cliques because each member is in close vicinity of all other members and, therefore, geo-socially interconnected. Such cliques are often exposed to airborne droplets carrying the virus, which leads to unusually high percentages of local infection.
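The clique property described above, where every member is directly linked to every other member, can be checked mechanically. The following is a minimal sketch on hypothetical toy data; the function name and the vertex labels are assumptions made for illustration.

```python
from itertools import combinations


def is_clique(adj, members):
    """Return True if every pair of members is directly linked."""
    return all(b in adj.get(a, set()) for a, b in combinations(members, 2))


# Hypothetical toy data: four crew members sharing one cabin,
# plus a visitor who met only one of them.
adj = {
    "crew_1": {"crew_2", "crew_3", "crew_4", "visitor"},
    "crew_2": {"crew_1", "crew_3", "crew_4"},
    "crew_3": {"crew_1", "crew_2", "crew_4"},
    "crew_4": {"crew_1", "crew_2", "crew_3"},
    "visitor": {"crew_1"},
}

cabin = ["crew_1", "crew_2", "crew_3", "crew_4"]
cabin_is_clique = is_clique(adj, cabin)
with_visitor_is_clique = is_clique(adj, cabin + ["visitor"])
print(cabin_is_clique, with_visitor_is_clique)
```

The cabin forms a clique, while adding the visitor breaks the property because the visitor is not linked to the other three crew members.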
Thus, our goal is to identify, in each geo-social network of a workforce embracing economic reopening, the hub and clique individuals for daily COVID-19 testing, even though they are asymptomatic. If any hub or clique individual tests positive for COVID-19, his/her immediate daily interaction circle in the geo-social network needs to be tested, and the patients quarantined. We argue that this is an efficient sampling strategy for COVID-19 testing in a reopened economy.
Each city has its own logistical constraints on testing. Some cities can afford to test 1% of their workforce daily; others may afford only 0.1%. Which testing percentage is sufficient? How should sufficiency be measured? How can each city perform its own rapid assessment on a daily basis?

METHODOLOGY AND EXPERIMENT
To address these questions, we conducted a pilot study using existing social network tools on two real-world social network datasets. The simplest approach is to single out the individuals with the most links in the geo-social network for testing. The problem with that approach is that those individuals are often in the same local community and, thus, have highly overlapping geo-social networks [9]. Examples include doctors and nurses working in the same emergency room, or the members of congress of the same nation. If we concentrate our testing resources on them, we will miss the big picture in the population and raise a social inequality issue. Thus, we aim to find the individuals with the most links in the geo-social network such that the individuals directly linked to them cover the maximum percentage of the population. This can be achieved by dividing the geo-social network of megacities like Wuhan or NYC into small communities or cliques. We can then identify the hubs in each community.
Unfortunately, in mathematics and computer science, this problem is NP-hard. With known exact methods, the computation time grows exponentially with the size of the population. Heuristic algorithms exist that can produce imperfect yet usable solutions within a limited time budget. They were developed over the past two decades, and not just to analyze social networks and internet traffic [9]: they are also the workhorses behind Internet search engines such as Google [10] and Microsoft Bing [11].
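To make the overlap problem and the idea of a heuristic concrete, the following sketch compares the naive "most links first" choice with a generic greedy maximum-coverage heuristic. This greedy rule is a textbook stand-in chosen for illustration; it is not one of the algorithms used in this study, and the toy network and all function names are assumptions. Coverage here counts the vertices directly linked to a tested vertex.

```python
from collections import defaultdict


def build_graph(edges):
    """Turn a list of undirected links into an adjacency dict."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def coverage(adj, tested):
    """Share of all vertices directly linked to at least one tested vertex."""
    covered = set()
    for v in tested:
        covered |= adj[v]
    return len(covered) / len(adj)


def top_degree(adj, budget):
    """Naive choice: the vertices with the most links (ties broken by name)."""
    return sorted(adj, key=lambda v: (-len(adj[v]), v))[:budget]


def greedy_cover(adj, budget):
    """Greedy heuristic: repeatedly pick the vertex adding the most uncovered contacts."""
    covered, chosen = set(), []
    for _ in range(budget):
        candidates = [v for v in sorted(adj) if v not in chosen]
        best = max(candidates, key=lambda v: len(adj[v] - covered))
        chosen.append(best)
        covered |= adj[best]
    return chosen


# Hypothetical toy network: hubs A and B share the same five contacts,
# while hub C serves a separate community of four.
edges = [("A", "B")]
edges += [(hub, f"n{i}") for hub in ("A", "B") for i in range(1, 6)]
edges += [("C", f"m{i}") for i in range(1, 5)]
adj = build_graph(edges)

baseline = top_degree(adj, 2)
greedy = greedy_cover(adj, 2)
baseline_cov = coverage(adj, baseline)
greedy_cov = coverage(adj, greedy)
print(baseline, round(baseline_cov, 2))
print(greedy, round(greedy_cov, 2))
```

On this toy network, the naive choice spends both tests on A and B, whose contacts overlap, and covers 7 of 12 vertices. The greedy choice reaches the second community and covers 10 of 12.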
The heuristic algorithms examined here were developed in academia and are open-source. We also share crude yet simple Python snippets [12], [13] for using these models with real-world datasets. We hope that the public health sector can integrate these methods without hiccups. In our pilot study, both algorithms can analyze geo-social networks with millions of vertices (people) in several minutes on a Linux workstation. This indicates the feasibility of decentralized day-to-day operations in each municipality at no additional cost.
The Louvain algorithm was created by Blondel et al. [14] from the University of Louvain, Belgium. It is a bottom-up clustering algorithm that finds communities large or small, often very different in size. The METIS algorithm [15], [16] was created by G. Karypis and V. Kumar from the University of Minnesota, USA. It supports parallel processing to partition social networks into communities of similar sizes.
The first dataset we tested is a Google+ social network dataset [17], comprising 107 614 people and 13 673 453 links among them. On average, each person is connected to 127 others. This number is comparable to the number of people a working professional meets daily in a busy metropolis when using public transportation. It is a densely connected network.
The second dataset we tested is an Internet server topology dataset [18], originally assembled to study the transmission of computer viruses. It has 1 696 415 vertices (machines) and 11 095 298 links among them. On average, each machine is connected to 6.5 others. This number is comparable to the number of people a working professional meets daily in a small town without using public transportation. It is a sparsely connected network.
To be clear, we do not assume that COVID-19 transmits along cyber-social networks. We use the two datasets above because their network structures are similar to the geo-social networks of a workforce that has close-range physical interactions daily in a reopened economy.
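The average connectivity quoted for the two datasets can be reproduced from the vertex and link counts given above. This small check assumes the reported averages are the simple ratio of links to vertices; the dictionary keys are labels chosen for this sketch.

```python
# (vertices, links) as reported in the text for each dataset.
datasets = {
    "Google+": (107_614, 13_673_453),
    "Internet topology": (1_696_415, 11_095_298),
}

# Assumed definition: average connectivity = links / vertices.
avg_links = {
    name: links / vertices for name, (vertices, links) in datasets.items()
}

for name, value in avg_links.items():
    print(f"{name}: {value:.1f} links per vertex")
```

The ratios come out at about 127 and 6.5, matching the dense and sparse figures stated in the text.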
Our study is designed in the following four steps. First, we partition the network datasets into U clusters using the METIS algorithm and the Louvain algorithm. Then, in each cluster, we single out the K individuals who have the most connections within the cluster. In total, U × K individuals are chosen for COVID-19 testing. As a simpler baseline, we single out the top U × K individuals with the most links in the complete geo-social network. We set the parameter U to the total number of individuals S divided by 100 or 1000. In this way, the total number of individuals chosen (U × K) is a fixed percentage of the total population. The evaluation metric is the coverage of the tested individuals, defined as the number of individuals immediately linked to the tested individuals divided by the total number of individuals. The four steps are illustrated in Figure 2 using the sample described in Figure 1.
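The four steps above can be sketched end to end on a toy network. Note the loud assumption: the partitioner below is a simple breadth-first chunking used only as a stand-in, not METIS or Louvain, and the toy network, function names, and the values U = 2 and K = 1 are all hypothetical.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Turn a list of undirected links into an adjacency dict."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def bfs_partition(adj, num_clusters):
    """Stand-in partitioner (NOT METIS or Louvain): visit vertices
    breadth-first and cut the visit order into near-equal chunks."""
    order, seen = [], set()
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in sorted(adj[v]):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    size = -(-len(order) // num_clusters)  # ceiling division
    return [order[i:i + size] for i in range(0, len(order), size)]


def select_hubs(adj, clusters, k):
    """In each cluster, pick the k vertices with the most links inside it."""
    chosen = []
    for cluster in clusters:
        members = set(cluster)
        ranked = sorted(cluster, key=lambda v: (-len(adj[v] & members), v))
        chosen.extend(ranked[:k])
    return chosen


def top_degree(adj, budget):
    """Baseline: the vertices with the most links in the whole network."""
    return sorted(adj, key=lambda v: (-len(adj[v]), v))[:budget]


def coverage(adj, tested):
    """Vertices directly linked to a tested vertex, divided by all vertices."""
    covered = set()
    for v in tested:
        covered |= adj[v]
    return len(covered) / len(adj)


# Hypothetical toy network: community "a" has two overlapping hubs (a0, a1),
# community "b" has one hub (b0); a6 and b1 bridge the two communities.
edges = [("a0", "a1"), ("a6", "b1")]
edges += [(hub, f"a{i}") for hub in ("a0", "a1") for i in range(2, 7)]
edges += [("b0", f"b{i}") for i in range(1, 6)]
adj = build_graph(edges)

U, K = 2, 1
clusters = bfs_partition(adj, U)
hubs = select_hubs(adj, clusters, K)
baseline = top_degree(adj, U * K)
cov_hubs = coverage(adj, hubs)
cov_baseline = coverage(adj, baseline)
print(hubs, round(cov_hubs, 2))
print(baseline, round(cov_baseline, 2))
```

In this toy case, the per-cluster choice tests one hub in each community, while the baseline tests two hubs of the same community and covers fewer vertices. A real deployment would replace `bfs_partition` with an actual graph partitioning or community detection library.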

FINDINGS
The following two tables list the coverage rates from three different algorithms on two real-world datasets. They tell us, to an extent, how well geo-social network sampling and testing cover the population in a reopened economy. The "Coverage" percentages are calculated as the percentage of people who had close contact with the COVID-19 test subjects, out of the general population.
On both datasets and at all sampling percentages, the METIS algorithm consistently outperforms the other algorithms in terms of coverage rate. This does not indicate that the Louvain algorithm is inferior. It was designed to identify natural-looking subcommunities large and small. Its most suitable use would be to visualize and trace local community transmission.
On the densely connected Google+ dataset, we are in effect running a simulation of busy urban life such as that in NYC or Wuhan. The results listed in Table 1 indicate that the METIS algorithm, used to sample 0.3% of the population, can effectively represent [...]
With location data sitting idle with telecommunication service providers and tech giants, the general public and national governments may want to discuss and decide whether or not to make use of it during the pandemic [20], [21]. People have valid reasons to worry about privacy [22], but these are not normal times [23]. Safe and moral use of this data flow requires the mandatory erasure of any and all personal details from the dataset, rendering it anonymous to everyone except the individual concerned. For example, only citizens themselves can know that they are hubs of the geo-social network.
This pilot study is a baby step toward introducing the importance of social network analyses to the field of public health. We have already seen the use of traditional S-I-R modeling for infectious diseases since the onset of the pandemic. The S-I-R models assume equal infection risk for all individuals and, thus, are insufficient alone. Social network analyses provide insights into the exposure risk of each individual and, thus, can be integrated into S-I-R models for S-E-I-R modeling. Because of limited data, our model assumes that everyone has equal immunity. If more detailed information about individuals can be collected, we hope to improve our model by considering the covariates affecting personal immunity.
To battle the pandemic and potentially endemic COVID-19 as a planetary challenge, interdisciplinary teamwork among epidemiologists, computer scientists, data scientists, and lawmakers is needed. We hope to see our model revised and applied in policies and day-to-day operations [28]. Modeling can only tell us so much. Politics does the rest [29]. The bottom line against dystopian use of location data is to construct a geo-social network of anonymous cellphones, not of people without privacy. Make this a service instead of surveillance. And this service should only be temporary during the pandemic. Our planet after the pandemic does not need Geoslavery [22].

Figure 2. Selecting the hubs from a sample geo-social network.

Figure 1. Sample geo-social network of a small consulting company.

Table 2. Coverage Percentage out of Geo-Social Network Sampling Test on Skitter Dataset.

Table 1. Coverage Percentage out of Geo-Social Network Sampling Test on Google+ Dataset.
[...] combat COVID-19. It is important that people are aware of this option, can debate it, and can make a decision for their own nation. We do not yet know how long this pandemic will last or how bad it can get. Therefore, all options should stay on the table. For epicenters of the pandemic, governments might want to integrate all possible measures to turn the tide against the pandemic.