Citation Information : Connections. Volume 41, Issue 1, Pages 25-46, DOI: https://doi.org/10.21307/connections-2021.022
License : (CC-BY-4.0)
Published Online: 17-June-2021
At its core, contact tracing is a form of egocentric network analysis (ENA). One of the biggest obstacles for ENA is informant accuracy (i.e., the share of true contacts identified), which is even more pronounced for interaction-based network ties because they often represent episodic relational events rather than enduring relational states. This research examines the effect of informant accuracy on the spread of COVID-19 through an egocentric, agent-based model. Overall, when the average person transmits COVID-19 to 1.62 other people (i.e., the
Issues regarding the reliability and validity of relational data have long been a concern for social network researchers (Perry et al., 2018), especially when such data are supposed to represent observable behavior (e.g., A interacted with B this week). Although considerable research has demonstrated the biases and cognitive limitations participants exhibit when reporting their network ties (see Smith et al., 2020, for a review), less work has examined the applied consequences of these issues. In other words, in which applied contexts should researchers look more closely at the outcomes of informant accuracy?
One applied area where the reliability and validity of relational data matter is contact tracing. Contact tracing is a strategy used to help contain infectious diseases that spread through interpersonal contact. Put simply, contact tracing is the process of retrospectively identifying persons who may have had interpersonal contact with a confirmed infectious individual (Eames and Keeling, 2003). The logic is straightforward: identify contacts who have interacted with an infected patient and remove them from the social system to (hopefully) prevent further spread of the disease. The method has been credited as effective because it allows at-risk contacts to be tested and allows hotspots and clusters of disease to be identified (Klinkenberg et al., 2006).
At its core, contact tracing is simply a form of egocentric network analysis (ENA). ENA is the study of individuals (i.e., egos) and people in the ego’s immediate social environment (i.e., alters; Perry et al., 2018). Conducting a reliable and valid ENA study is challenging, and there is a rich academic/applied literature dedicated to improving the rigor behind these efforts (Crossley et al., 2015; McCarty et al., 2019; Perry et al., 2018). However, there has been scant research determining how varying levels of reliability and validity might influence how effective contact tracing can be at containing infectious diseases.
This research is organized as follows. First, we review the literature behind contact tracing and our key factor of interest: informant accuracy. Second, we set up the details behind the egocentric agent-based model, examining the effect of informant accuracy and multi-level tracing on the spread of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the biologic strain of coronavirus that causes the illness known as COVID-19. Third, we conduct a sensitivity analysis to examine the impact of the timing of the tracing and the percent of asymptomatic cases in the population. Finally, we discuss the results in terms of the wider literature on contact tracing and ENA.
Contact tracing is a well-known type of social network intervention for understanding how infectious diseases spread (Valente, 2010). Indeed, rudimentary versions were used during waves of the bubonic plague in the sixteenth century (Cohn and O’Brien, 2020). Contact tracing has been employed against infectious diseases caused by various pathogens (e.g., STIs: Macke and Maher, 1999; SARS: Donnelly et al., 2003; TB: Mandalakas et al., 2017). It is considered effective at reducing the prevalence of infections, especially when dealing with a small number of isolated cases (e.g., STIs) or with novel viruses (Eames et al., 2010).
Although contact tracing is generally considered a useful means of gathering potential transmission data, it is not always sufficient by itself as a control measure to contain an epidemic (Eames et al., 2010). Moreover, the effectiveness of contact tracing depends on the transmission dynamics of the outbreak (Klinkenberg et al., 2006) and on the timing of the tracing itself (Kretzschmar et al., 2020). For example, Cheng et al. (2020) conducted a contact tracing assessment for COVID-19. They found that contact tracing alone would be inadequate because of the high transmission rate before and near an individual’s symptom onset. Instead, contact tracing should be implemented at least alongside other interventions, such as social distancing and mask-wearing. Likewise, Hellewell et al. (2020) used a stochastic transmission model to investigate the potential efficacy of contact tracing and case isolation for COVID-19. They simulated a variety of outbreaks with different R0 values and different percentages of transmission occurring before symptom onset. They found that the probability that contacts are traced must be high (i.e., 80% or more) to control COVID-19 transmission.
Additionally, agent-based models (ABMs) have been used to simulate contact tracing. Kucharski et al. (2020a, b) employed an ABM to simulate a variety of responses to COVID-19 (e.g., no control measures, self-isolation away from the household, quarantining, self-isolation in the household). The authors found that combining isolation of symptomatic cases with tracing the contacts of positive cases reduced the spread of COVID-19 more than implementing the measures individually. Furthermore, the authors note that when the proportion of asymptomatic cases is high, many contacts would need to be traced and tested to capture transmission at a higher network level (i.e., transmission through secondhand contacts).
At the end of the day, contact tracing is theoretically useful for understanding contagion dynamics and practically useful for mitigating the spread of infectious diseases. However, as Eames et al. (2010) point out, collecting contact tracing data is not easy. It requires material resources to compensate tracers. It also requires informational resources to train tracers to overcome the many obstacles to collecting reliable and valid contact network data. Overcoming these obstacles is important because contact tracing is only as effective as the data it collects.
Because contact tracing is a form of ENA, it is prone to a host of reliability and validity issues (for a review, see Perry et al., 2018). For contact tracing, perhaps the most sensitive data collection issue is informant accuracy: the difference between an ego’s perceptions of their network and their actual network (Bernard et al., 1984). Studies that compare perceived and actual networks are commonly known as accuracy studies. They were most notably popularized by Bernard, Killworth, and Sailer, in what are known as the BKS studies (for a brief review, see Bernard et al., 1981). The general theme of the BKS studies is that individuals’ self-reports of their network behaviors do not align well with their actual behaviors. Moreover, the structures of self-reported and actual behavioral networks differ from one another as well (Quintane, 2012). Self-reported data on perceived network contacts may be useful in their own right as cognitive social structures (Krackhardt, 1987). For contact tracing, however, minimizing the error between perceived and actual contacts is clearly a central concern. Indeed, since the first handful of accuracy studies accumulated, there has been debate, research, and theoretical work on three questions: why perceived and actual contacts diverge (e.g., Corman and Scott, 1994; Pilny et al., 2017), how to mitigate some of these biases in data collection (Kogovšek et al., 2002), and how to account for informant (in)accuracy in inferential network analyses (Butts, 2003).
Nevertheless, it is important to understand why individuals have trouble recalling network ties in the first place. For starters, humans do not store information like network contacts in a vacuum. Such information is primarily organized through cognitive schemas that cluster ties based on how they relate to one another (McCarty et al., 2019). For instance, Brashears and Quintane (2015) found that alters tend to be better remembered in terms of common group membership (e.g., role relations like family, work, and neighbors) and ‘chunked’ structures like triads. Moreover, contacts are better remembered if interactions with them are repeated and represent long-term, consistent interactions (Freeman et al., 1987). Likewise, contacts who are not very popular themselves (e.g., less central) are often more difficult to remember than popular ones (Marin, 2004). Perhaps more importantly for contact tracing, alters to whom the ego feels less close, and with whom the ego interacts less frequently, are more prone to recall issues (Brewer, 2000). The key implication is that certain network ties are more likely to be forgotten than others.
For extracting contact tracing data, all of the above issues may be present. They may be further complicated by the fact that such data represent relational events, not relational states. A relational event can be defined as a ‘discrete event generated by a social actor and directed toward one or more targets’ (Butts, 2008, p. 159), while relational states can be viewed as ‘continuously persistent relationships between nodes’ (Borgatti et al., 2013, p. 3). The transmission of COVID-19 does not require an established relational state between dyads; a single relational event will suffice.
The key difference is the nature of the tie: relational events are episodic, while relational states are more enduring. Psychological theories of recall generally hold that information (e.g., interpersonal contacts) is better remembered if it is encoded in a meaningful way. Suppose a relational event occurs with somebody with whom there is no meaningful relational state (e.g., friendship, work relationship) and no other informational cue. Such an event may be more difficult to recall because the episode is harder to encode in an elaborative (i.e., meaningful) fashion (e.g., Craik and Lockhart, 1972). Some relational events are divorced from relational states and other helpful memory schemas, such as elevator conversations, interactions with waiters or bartenders, or chats with a fellow parent at the park. These events may therefore be more prone to recall problems.
Moreover, individual-level differences can also exacerbate informant accuracy problems. In other words, characteristics of the ego, rather than the alter, can influence informant accuracy. A substantial body of research has demonstrated that factors like gender (Brashears et al., 2016), age (Hsieh, 2014), mood (Hlebec and Ferligoj, 2001), and occupation (Marineau et al., 2018) can influence network recall as well.
As such, previous ENA research has demonstrated that informant accuracy problems exist and are influenced by characteristics of both the ego and the alter. Given this phenomenon, we consider how informant accuracy affects the spread of COVID-19 when contact tracing is implemented. Important questions can be answered by analyzing the relationship between informant accuracy and viral spread, such as ‘how accurate does contact tracing need to be to have any effect?’ and ‘at what point do we start to see diminishing returns?’. Moreover, contact tracing requires participants to engage in the unpleasant task of going into quarantine. There is therefore also a challenge to be economical and to require as few people as possible to quarantine. As such, we ask:
R1: What is the relationship between informant accuracy during contact tracing and (a) the spread of COVID-19 and (b) number of quarantines?
In addition to the effect of informant accuracy, it is important to ask whether there are viable alternative strategies for collecting contact tracing data. Because contact tracing is a respondent-driven technique, we consider how a snowball sample may be a useful strategy for finding potentially infectious contacts (Johnson, 1990). In the context of social network research, Borgatti et al. (2013) defined snowball sampling as the process of gathering network data ‘on any qualifying actor with a tie to any actor already selected, up to K waves or until quotas or cost limits are reached’ (p. 34). Such a technique has been used in other studies attempting to gain access to at-risk populations. For instance, Kendall et al. (2008) were able to collect up to five waves of data for an HIV at-risk population in Brazil.
In the context of COVID-19, a qualified actor is someone who has tested positive for the virus. When the actor is interviewed for contact tracing data, the practitioner elicits a set of alters from the actor and instructs those alters to get tested and quarantine. This traditional method can be described as first-level contact tracing because it uses only the first wave of data from the initially infected individual. Second-level contact tracing simply repeats the process for the set of initial alters, which is to say it traces the contacts of the contacts. A practitioner would then elicit alters from these contacts and require them to get tested and quarantine for the time being. This strategy was used in the early stages of the COVID-19 outbreak in South Korea (Schneider et al., 2020). Third-level contact tracing would repeat the same process for the new alters, and so on.
One of the key advantages of snowball sampling is accessing hidden and ‘hard to reach’ populations (Browne, 2005), such as those individuals who are spreading COVID-19 but may not yet know it. In this sense, it allows researchers to work backward to catch up on the viral diffusion process. As such, we ask:
R2: What is the relationship between multi-level tracing and (a) the spread of COVID-19 and (b) the number of quarantines?
Finally, informant accuracy and multi-level tracing may be interdependent in their effectiveness at mitigating the spread of COVID-19. That is, each may have a main effect by itself, but the two may also depend on each other in complex ways. For instance, different levels of tracing may have different informant accuracy thresholds for containing the spread. To our knowledge, no empirical work has analyzed the relationship between informant accuracy and different levels of tracing. As such, we ask our third research question:
R3: How do informant accuracy and multi-level contact tracing interact to influence (a) the spread of COVID-19 and (b) the number of quarantines?
Agent-based models (ABMs) are constructed to ‘simulate simultaneously multiple agents, or actors, who behave in ways that impact one another’ (Larson, 2012, p. 84). They are particularly useful for understanding how simple rules guiding agents’ behavior, or other manipulations of input factors, can influence the emergence of complex social structures (Corman, 1996). ABMs are very common in public health and epidemiology. There, the goal is usually to build models of infectious disease dynamics that can help inform policy and responses to epidemics (e.g., Epstein, 2009).
An ‘egocentric’ ABM does not radically depart from the basic mechanics of social simulation, because ABMs already typically focus on interactions between agents. However, the current egocentric approach places special emphasis on the ENA components, treating them as factors that can be manipulated (i.e., informant accuracy and level of tracing). In the current model, egocentric interaction networks are collected during each iteration (i.e., a day), allowing the user to inspect the network-interaction history of any agent.
What follows is a description of the key building blocks of the current ABM, which we call ConTrace (see Fig. 1 for a visualization). Following Hammond’s (2015) suggestions for best practices in reporting ABMs, we specify the following: Properties, Actions, Rules, Time, and Environment (PARTE). Documenting every detail of an ABM can be overwhelming, so what follows is an abbreviated summary. More fine-grained details can be found in an online appendix (note 1). The appendix also includes additional robustness checks, model verifications, and the ConTrace model in NetLogo.
The developed ABM of infectious disease tracing and quarantine is based on a slightly modified susceptible-exposed-infectious-removed (SEIR) model, expanded with procedures for contact tracing and quarantine. The SEIR model, an extension of the SIR model, divides the population into four groups: (i) susceptible, (ii) exposed, (iii) infectious, and (iv) removed (here, recovered). Individuals can pass through all four phases during an epidemic outbreak.
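For reference, the classic deterministic SEIR model that ConTrace modifies is usually written as the system below. Here N = S + E + I + R, β is the transmission rate, σ is the rate at which exposed people become infectious (the inverse of the mean incubation period), and γ is the removal rate. These symbols are standard textbook notation, not parameter names taken from ConTrace. ConTrace replaces these population-level rates with the agent-level rules described below.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
\frac{dE}{dt} &= \beta \frac{S I}{N} - \sigma E, \\
\frac{dI}{dt} &= \sigma E - \gamma I, \\
\frac{dR}{dt} &= \gamma I.
\end{aligned}
```

In this deterministic form without births or background deaths, the basic reproduction number is R_0 = β/γ. In an agent-based model, R0 is an emergent quantity that has to be measured from the simulated infections rather than read off a formula.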
Figure 2 illustrates the structure and flow of the model. Ovals represent human agents and rhombuses represent model decisions. A ‘susceptible’ contact becomes ‘exposed’ after making contact with somebody who has COVID-19. Exposed contacts can be traced once the infectious agent who exposed them becomes symptomatic, goes to the doctor, and is contact traced. The probability that an exposed contact is quarantined depends on the level of informant accuracy. If the exposed contact is not traced, they remain in the system. Meanwhile, if infected, they may continue to spread COVID-19, depending on the transmission and asymptomatic rates. They then undergo contact tracing when symptoms develop (i.e., once the incubation period is over).
Properties represent the mutable and immutable attributes of agents in the system, which can be either observable or unobservable to other agents. In the present ABM, only one type of agent, people, is used. However, the agents are classified into six groups based on four attributes (see Table 1): (i) susceptible, (ii) non-infected contacts, (iii) presymptomatic contacts, (iv) asymptomatic contacts, (v) patients, and (vi) recovered. The classification considers both the traditional grouping used in previous studies and the current modeling functions. For example, it is necessary to separate people who have never been exposed to the disease from those who have been exposed but not infected. We regard the former as ‘susceptible people’ and the latter as ‘non-infected contacts’. Both remain susceptible to the infectious disease, but only the exposed contacts may be traced and quarantined. We define the ‘contact’ state of ‘patients’ and ‘recovered people’ as false even though they were exposed to the disease. This arbitrary state setting allows us to focus on tracing and quarantining the contacts in the model. Agents who die of the disease have no category because they are removed from the model.
The agents are created in the setup procedure of the model. During setup, a defined number of agents are created and randomly distributed in the simulation window. All of them are susceptible people except one presymptomatic individual. To help visualize the process, we use the following color code (see Fig. 3): susceptible people are green, non-infected contacts are magenta, presymptomatic and asymptomatic individuals are yellow, patients are orange and dark red, and recovered people are blue. When an agent is isolated (a patient) or quarantined (a contact), its shape changes from a person to a sheltered person.
Actions are the catalog of behaviors that each agent performs within the simulation. The agents take three basic actions in the simulation. The first is mobility, which refers to how agents move around in the environment. Like previous SEIR simulations, we assume agents move a certain distance in a random direction. The second is interaction, which assumes that agents interact with one another in a way that allows for potential transmission of COVID-19. For instance, Johns Hopkins University defines contacts as sustained interaction within six feet for at least 15 min (Gurley, 2020). Through being mobile and interacting, agents can change the properties of other agents, which represents viral transmission. Consequently, this changes the environment, because infected agents eventually remove themselves from the system and quarantine, where they either become immune or die.
The resulting contact network is fixed. This means that interactions do not follow a stochastic model for the selection of new ties, as they would in inferential network models (e.g., a stochastic actor-oriented model); instead, the placement of the agents is random. However, random placement of objects in a fixed space tends to produce clusters, a phenomenon commonly known as the clustering illusion (Gilovich, 1991, pp. 19-20). As a result, what emerges is a typical ‘small-world’ network with an above-average clustering coefficient. Users can also specify a ‘traveler’ percentage to manipulate the number of agents that travel to a random portion of the space, unrestricted by the current mobility settings. The purpose of this function is to introduce potential super-spreaders who travel more widely across the space and make more contacts across clusters (i.e., brokering). Indeed, the current default setting (mobility = 2 and travelers = 10%) tended to produce a contact network for which the hypothesis of a power-law degree distribution could not be rejected (GOF = 0.049, p = 0.67). This result uses the goodness-of-fit procedure of Clauset et al. (2009).
In general, the lower the mobility settings, the more clustered the contact network will be because the agents are more restricted to their initial local placements. Likewise, the higher the traveler settings, the more centralized the network will be until that value reaches 50%. That is, after 50%, the majority of agents will have unrestricted mobility. For instance, if set at 100%, a Bernoulli random contact network will be extracted because 100% of the agents are placed at random areas of the map after each interaction (Table 2).
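The movement and contact rules above can be sketched in Python as follows. This is a hypothetical illustration, not the ConTrace NetLogo code. It rests on several assumptions:

- the 30 × 30 world resolution is read as a continuous square of side 30;
- edges wrap around;
- a fixed subset of agents is flagged as travelers at setup;
- a daily tie is any pair of agents within an infection radius of 4.

The names setup_agents, move_agents, torus_distance, and contact_pairs are illustrative only.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Agent:
    x: float
    y: float
    traveler: bool = False  # hypothetical flag: relocates to a random spot each day


def setup_agents(n, world_size=30.0, traveler_pct=0.10, rng=None):
    """Place n agents uniformly at random; a fixed share become travelers (assumed)."""
    rng = rng or random.Random()
    return [Agent(rng.uniform(0, world_size), rng.uniform(0, world_size),
                  rng.random() < traveler_pct) for _ in range(n)]


def move_agents(agents, world_size=30.0, mobility=2.0, rng=None):
    """One day of movement: travelers jump anywhere, others take one random step."""
    rng = rng or random.Random()
    for a in agents:
        if a.traveler:
            a.x, a.y = rng.uniform(0, world_size), rng.uniform(0, world_size)
        else:
            heading = rng.uniform(0, 2 * math.pi)
            # Edges wrap around (a torus); this topology is an assumption.
            a.x = (a.x + mobility * math.cos(heading)) % world_size
            a.y = (a.y + mobility * math.sin(heading)) % world_size


def torus_distance(x1, y1, x2, y2, world_size):
    """Shortest distance between two points when the edges wrap around."""
    dx = abs(x1 - x2) % world_size
    dy = abs(y1 - y2) % world_size
    return math.hypot(min(dx, world_size - dx), min(dy, world_size - dy))


def contact_pairs(agents, radius=4.0, world_size=30.0):
    """Undirected ties of one day's contact network: all pairs within `radius`."""
    return [(i, j)
            for i in range(len(agents))
            for j in range(i + 1, len(agents))
            if torus_distance(agents[i].x, agents[i].y,
                              agents[j].x, agents[j].y, world_size) <= radius]
```

Setting traveler_pct=1.0 relocates every agent uniformly each day. In this sketch, the daily ties then no longer depend on where agents were the day before, which matches the text’s point that the extracted network approaches a Bernoulli random network.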
Rules are the heart of any ABM. They define ‘how agents choose an action, update properties, and interact with each other and their environment’ (Hammond, 2015, p. 176). Given we aim to investigate the effect of contact tracing, we begin with the basic SEIR rules followed by the inclusion of contact tracing rules.
Before articulating specific rules, the following assumptions are made:
We assume agents’ backgrounds, such as age, gender, occupation, health history, etc., are uniform as these features should not determine whether they should be traced or quarantined.
We assume agents’ mobilities are uniform to simplify the model.
We assume patients, when under quarantine, are so well isolated that they do not infect other people in this model.
We assume the recovered people are fully immune to the disease as we focus on the effect of quarantine during one epidemic outbreak event.
The basic SEIR model rules are set as below in each tick, which we interpret as a full day:
All agents move a certain distance in a random direction, unless a traveler percentage is set.
Each presymptomatic or asymptomatic ego defines all susceptible people and unquarantined non-infected alters within its infection radius as its contacts (Fig. 4) and records these agents’ IDs in its egocentric contact-history list. At the transmission rate, the ego infects one of these contacts and records the infected agent’s ID in its infection-history list.
If the presymptomatic individuals pass the incubation period, they become patients and are immediately isolated.
If the patients pass the disease period, they either recover and become immune, or die at the fatality rate.
The asymptomatic individuals have no symptoms. Unless they are traced and quarantined, they infect the susceptible people and non-infected contacts within their infection radius. They become recovered and immune after 14 days.
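One tick of the basic rules can be sketched in Python as below. This is a minimal, hypothetical illustration, not the ConTrace implementation. It makes the following assumptions:

- neighbors[i] is precomputed for the day and lists the agents inside agent i’s infection radius;
- "infects one of these contacts at the transmission rate" means at most one infection per ego per day, occurring with probability transmission_rate;
- newly infected agents start counting their incubation on the next day;
- release from quarantine after 14 days is omitted for brevity.

The state labels, the Person class, and seir_day are illustrative names.

```python
import random
from dataclasses import dataclass, field

SUSCEPTIBLE = "susceptible"
CONTACT = "non-infected contact"
PRESYMPTOMATIC = "presymptomatic"
ASYMPTOMATIC = "asymptomatic"
PATIENT = "patient"
RECOVERED = "recovered"
INFECTIOUS = (PRESYMPTOMATIC, ASYMPTOMATIC)


@dataclass
class Person:
    state: str = SUSCEPTIBLE
    days_in_state: int = 0
    quarantined: bool = False  # also used for isolating patients
    contact_history: list = field(default_factory=list)
    infection_history: list = field(default_factory=list)


def seir_day(people, neighbors, transmission_rate, asymptomatic_rate,
             incubation_days, disease_days, fatality_rate, rng):
    """Apply one day of the basic rules. Dead agents are replaced by None.
    Returns the indices of agents who became patients today."""
    infected_today = set()
    for i, p in enumerate(people):
        if p is None or p.state not in INFECTIOUS or p.quarantined:
            continue  # only free presymptomatic/asymptomatic agents infect others
        contacts = [j for j in neighbors.get(i, [])
                    if people[j] is not None and not people[j].quarantined
                    and people[j].state in (SUSCEPTIBLE, CONTACT)]
        p.contact_history.extend(contacts)
        for j in contacts:
            people[j].state = CONTACT  # exposed; may still be infected
        candidates = [j for j in contacts if j not in infected_today]
        if candidates and rng.random() < transmission_rate:
            target = rng.choice(candidates)  # at most one infection per ego per day
            p.infection_history.append(target)
            infected_today.add(target)

    new_patients = []
    for i, p in enumerate(people):
        if p is None:
            continue
        if i in infected_today:
            # Incubation starts accumulating on the following day.
            p.state = ASYMPTOMATIC if rng.random() < asymptomatic_rate else PRESYMPTOMATIC
            p.days_in_state = 0
            continue
        p.days_in_state += 1
        if p.state == PRESYMPTOMATIC and p.days_in_state >= incubation_days:
            p.state, p.days_in_state, p.quarantined = PATIENT, 0, True  # isolated at once
            new_patients.append(i)
        elif p.state == ASYMPTOMATIC and p.days_in_state >= 14:
            p.state, p.days_in_state = RECOVERED, 0
        elif p.state == PATIENT and p.days_in_state >= disease_days:
            if rng.random() < fatality_rate:
                people[i] = None  # deaths leave the model
            else:
                p.state, p.days_in_state, p.quarantined = RECOVERED, 0, False
    return new_patients
```

Returning the new patients makes the next step, contact tracing, a separate function that reads their contact-history lists.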
The main rule difference between our model and other SEIR models is that presymptomatic and asymptomatic individuals are the only agents who infect others.
Since all presymptomatic individuals have a constantly updated contact-history and infection-history in this model, when they enter the patient phase, we can trace all their contacts and then quarantine part of or all the contacts. The tracing rules in each day are set as below:
Identify the new patients and generate a full contact list based on the contact-history of all these new patients. The contacts in this list are regarded as the 1st-level contacts.
Examine the ‘infected’ status of the 1st-level contacts. If any presymptomatic or asymptomatic individuals are found, trace their contacts using their contact-history and generate the next-level contact list, regarded as the 2nd-level contacts. Some of these contacts may already appear in the 1st-level contact list, because a person may be counted as a contact by more than one patient. If this happens, we exclude from the 2nd-level list the contacts already included in the 1st-level list.
Examine the ‘infected’ status of the 2nd-level contacts. If any presymptomatic or asymptomatic individuals are found, trace their contacts using their contact-history and generate the 3rd-level contact list. Exclude the contacts who have been included in the 2nd-level contact list.
Repeat the above procedures until all contacts are traced.
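The level-by-level tracing rules can be sketched as a breadth-first expansion in Python. Level 1 is the union of the new patients’ contact histories. Each later level is built from the contact histories of the infected (presymptomatic or asymptomatic) members of the previous level. This is a hypothetical sketch; the function name trace_levels and its arguments are illustrative.

The sketch uses a slightly stricter exclusion than the text. The text excludes contacts already on the previous list, while this version excludes anyone listed at any earlier level. That choice is an assumption, and it also guarantees the loop terminates when max_level is None ("until all contacts are traced").

```python
def trace_levels(new_patients, contact_history, infectious, max_level=None):
    """Build contact lists by tracing level.

    new_patients: ids of agents who became patients today.
    contact_history: dict mapping agent id -> ids recorded as its contacts.
    infectious: set of ids currently presymptomatic or asymptomatic.
    Returns [level_1, level_2, ...], each a sorted list of ids.
    """
    levels = []
    already_listed = set(new_patients)  # patients are not listed as contacts
    sources = set(new_patients)
    while sources and (max_level is None or len(levels) < max_level):
        level = set()
        for ego in sources:
            level.update(contact_history.get(ego, []))
        level -= already_listed  # drop people counted by more than one ego
        if not level:
            break
        levels.append(sorted(level))
        already_listed |= level
        # Only infected members of this level are traced further.
        sources = level & infectious
    return levels


history = {0: [1, 2], 1: [3, 0], 2: [4], 3: [5, 1]}
print(trace_levels([0], history, infectious={1, 3}))
```

With patient 0 and infected agents 1 and 3, the levels are [1, 2], then [3] (traced from agent 1), then [5] (traced from agent 3).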
In the model, we can test the effect of quarantine up to different contact levels. The quarantine rules are specified below:
Calculate the total number of contacts at a certain level based on the particular contact list.
Use the total number and the defined quarantine rate to calculate the number of contacts to be quarantined. If the number has a decimal, round down the number.
Randomly choose and quarantine the number of contacts in the list.
When quarantining beyond 1st-level contacts, we identify the presymptomatic and asymptomatic individuals from the quarantined agents, trace their contacts, calculate the number of contacts to be quarantined in the same way as stated above, and then quarantine that number of the contacts.
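The quarantine rules can be sketched in Python as follows. For each list, floor(n × rate) contacts are chosen at random. Beyond the 1st level, only infected agents among those actually quarantined seed the next list. This is a hypothetical sketch, not the ConTrace code. The names quarantine_from_list and quarantine_by_level are illustrative, and quarantine_rate plays the role of the ‘%-contacts-quarantined’ (informant accuracy) setting.

```python
import math
import random


def quarantine_from_list(contact_list, quarantine_rate, rng):
    """Quarantine floor(n * rate) randomly chosen contacts from one list."""
    # The tiny epsilon guards against binary rounding (0.29 * 100 is just below 29).
    n_quarantine = math.floor(len(contact_list) * quarantine_rate + 1e-9)
    return rng.sample(contact_list, n_quarantine)


def quarantine_by_level(first_level, contact_history, infectious,
                        quarantine_rate, max_level, rng):
    """Return all quarantine orders issued from the 1st level up to max_level.

    Beyond level 1, only infected agents that were actually quarantined
    have their contacts traced for the next level.
    """
    orders = []
    listed = set(first_level)
    current = sorted(listed)
    for _ in range(max_level):
        if not current:
            break
        chosen = quarantine_from_list(current, quarantine_rate, rng)
        orders.extend(chosen)
        seeds = [c for c in chosen if c in infectious]
        next_level = {a for s in seeds for a in contact_history.get(s, [])} - listed
        listed |= next_level
        current = sorted(next_level)
    return orders
```

Because the next level grows only from quarantined infected agents, lower informant accuracy shrinks every subsequent wave, not just the first one.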
Time refers to the unit of analysis representing a passage of time. Operationally, it is usually represented by terms such as ‘ticks’, ‘iterations’, or ‘rounds’. In the current simulation, each tick represents a 24-hr day. Moving from day 1 to day 2, the model is put into action, and what happens on day 2 has implications for what happens on day 3, and so on. For instance, if a contact is infected on day 33, the contact’s incubation period begins to accumulate when day 34 begins.
Finally, the environment represents the geometric space and the components in the background. The current simulation is defined by (i) size of the space (i.e., world resolution) and (ii) number of agents. In other words, agents are moving and interacting with one another in an open space, defined by how big it is and how many other agents are also included.
The developed model is verified in three ways. First, we carefully go through the model procedures to ensure the conceptual rules are properly translated into program code. Second, we compare our data on the susceptible, infected, and recovered groups with data from classic epidemic models and confirm that our model produces a typical epidemic pattern for these groups. Third, we consult with public health experts to ensure the model elements and assumptions are appropriate.
Once the model is verified, we run a series of one-factor-at-a-time (OFAT) tests (Ten Broeke et al., 2016) to validate the model and calibrate two emergent parameters: (i) the number of contacts per person and (ii) the basic reproduction number (R0) of COVID-19. Additional details on these tests are included in the online appendix (note 2). The main goal is to explore reasonable values of the transmission rate, population size, and world resolution to find settings that reproduce typical outcomes from two literatures. These are interaction contacts in the social network analysis literature and R0 (i.e., the average number of people an infected person infects) in the epidemiological literature.
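Both calibrated quantities can be read off the agents’ history lists. The sketch below is a minimal Python illustration of R0 as the mean length of infected agents’ infection-history lists and of contacts per person as the mean number of distinct IDs in each contact-history list. Several aspects are assumptions rather than ConTrace’s documented procedure, whose details are in the online appendix: exactly which agents are averaged, and whether contacts are counted per day or over a whole history. The function names are illustrative.

```python
def estimate_r0(infection_histories, ever_infected):
    """Mean number of secondary infections per infected agent.

    infection_histories: dict agent id -> ids that agent infected.
    ever_infected: ids to average over; ideally only agents whose infectious
    period has ended, so unfinished histories do not bias R0 downward (assumption).
    """
    counts = [len(infection_histories.get(i, [])) for i in ever_infected]
    return sum(counts) / len(counts) if counts else 0.0


def mean_unique_contacts(contact_histories):
    """Mean number of distinct agents recorded in each contact-history list."""
    sizes = [len(set(h)) for h in contact_histories.values()]
    return sum(sizes) / len(sizes) if sizes else 0.0


print(estimate_r0({0: [1, 2, 3], 1: [4], 3: [5, 6]}, ever_infected=[0, 1, 2, 3]))
```

Agent 2 infected nobody and has no entry, so it counts as zero, giving (3 + 1 + 0 + 2) / 4 = 1.5.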
After calibrating our model, we found the following: with a world resolution of 30 × 30 and a population size of 1,000, an agent with an infection radius of 4 has roughly 14 to 15 contacts. This approximately reflects the average number of contacts in the USA and European countries reported by social network and tracing studies (e.g., Del Valle et al., 2007; Mossong et al., 2008; Rothwell, 2020). However, there is uncertainty regarding the true R0 value of COVID-19 (Liu et al., 2020a, b). As such, we report our testing results for three situations.
Based on a meta-analysis of COVID-19 R0 estimation studies, Liu et al. (2020a, b) conclude that the best estimate is likely a number between two and three. Individual studies vary toward the lower and higher ends of that range. For instance, a number of studies have estimated the typical R0 to be near 2.20 (see Table 3). Thus, in the lower-end situation, we proceed with settings that produce an average R0 of 2.11. However, some studies have estimated R0 to be slightly higher: 2.50 (Imai et al., 2020), 2.55 (Majumder and Mandl, 2020), and 2.68 (Wu et al., 2020). As such, we create a higher-end situation with an average R0 of 2.56.
However, both of these contexts assume that COVID-19 is spreading with few public health interventions to slow it down. Given the rise of measures like mask-wearing, we include an additional situation that assumes such an intervention. For instance, Li et al.’s (2020b) simulation suggests that if about half the population regularly wears masks, R0 can drop to between 1.60 and 1.70. As such, we modified the transmission rate to produce a lowest-spread situation with an average R0 of 1.62 (see Table 3).
To manipulate informant accuracy, we simply adjust the ‘%-contacts-quarantined rate’. For instance, when a presymptomatic patient (i.e., ego) finally develops symptoms and becomes sick, we assume they are contact traced and their contacts (i.e., alters) are told to quarantine immediately for 14 days. To add heterogeneity to this manipulation, the set percentage serves as the mean of a normal distribution (SD = 5%) from which each value is drawn.
The user can also manipulate false positives. Here, a false positive means that an agent being contact traced names an agent who is not on their contact list, and that agent is sent into quarantine. People tend to make false-positive errors less often than false-negative ones (Bernard et al., 1982). We therefore draw the rate from a normal distribution with a mean of 10% (SD = 5%). The online appendix reports results for values ranging from 0 to 30%.
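The two randomized rates can be sketched in Python with the standard library’s Random.gauss. Each traced ego gets its own informant-accuracy rate drawn around the set percentage (SD = 5%). A false-positive rate is drawn the same way around 10%. The following details are assumptions, not documented ConTrace behavior:

- clipping the draws to [0, 1];
- reading the false-positive rate as floor(rate × number of true contacts) wrongly named agents.

The names draw_rate and false_positive_names are illustrative.

```python
import random


def draw_rate(mean, rng, sd=0.05):
    """One ego's rate ~ Normal(mean, sd), clipped to [0, 1] (clipping is assumed)."""
    return min(1.0, max(0.0, rng.gauss(mean, sd)))


def false_positive_names(ego, contacts, population, fp_mean, rng, sd=0.05):
    """Pick non-contacts that the ego wrongly names for quarantine.

    Hypothetical reading: the number named is floor(rate * number of true
    contacts), capped by how many non-contacts exist.
    """
    true_contacts = set(contacts)
    pool = [a for a in population if a != ego and a not in true_contacts]
    rate = draw_rate(fp_mean, rng, sd)
    k = min(len(pool), int(rate * len(contacts)))
    return rng.sample(pool, k)


rng = random.Random(42)
accuracy = draw_rate(0.80, rng)  # e.g., informant accuracy set at 80%
wrongly_named = false_positive_names(0, list(range(1, 21)), range(1000), 0.10, rng)
```

In a full simulation, the drawn accuracy would serve as the quarantine rate for that ego’s true contact list. The wrongly named agents would be quarantined in addition to the correctly named ones.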
To manipulate the level of contact tracing, we analyze 1st-, 2nd-, and 3rd-level waves of a snowball sample for each ego that is infectious once their incubation period ends. 1st-level tracing represents the traditional baseline approach: infectious egos are contact traced and their alters are quarantined. In 2nd-level tracing, the alters of the initial set of alters are traced as well and correspondingly quarantined. Finally, in 3rd-level tracing, the alters from the second wave are traced and then quarantined (see Fig. 6).
There are a variety of outcomes researchers can measure to gauge the severity of an epidemiological outbreak (Rainwater-Lovett et al., 2016). For our purposes, we are most interested in containment strategies that would avoid the need for herd immunity. As such, prevalence, the total number of infections as a percentage of the population, is used as the key outcome. There is no precise rule of thumb for a critical value of prevalence, but the lower, the better (note 3).
However, low prevalence may come at the cost of excess quarantine orders. To account for this, we also look at the number of quarantines administered because of contact tracing. This can roughly be interpreted as a measure of efficiency, because the ideal strategy would pair the lowest infection prevalence with the fewest quarantine orders. For instance, if everybody quarantined all the time, the spread of the disease would halt. In real life, however, that would not be an ideal policy, as excessive quarantines come at drastic social (e.g., isolation) and economic (e.g., lost economic output) costs (e.g., Elmer and Stadtfeld, 2020). Note that this number can exceed 100%, because somebody can be quarantined more than once if they are listed as a contact multiple times over the course of the simulation.
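The two outcomes can be computed as follows. This is a hypothetical Python sketch, assuming prevalence counts every agent infected at any point during a run and that each quarantine order is logged once (so the same agent can appear several times). This is why the quarantine measure, unlike prevalence, can exceed 100%.

```python
def outcome_metrics(ever_infected: set[int], quarantine_orders: list[int],
                    population_size: int) -> tuple[float, float]:
    """Hypothetical sketch: (prevalence %, quarantines %) for one simulation run.

    ever_infected: ids of agents infected at any point in the run.
    quarantine_orders: one entry per order issued; repeats are allowed.
    """
    prevalence = 100 * len(ever_infected) / population_size
    # Counts orders, not distinct agents, so this can go above 100%.
    quarantines = 100 * len(quarantine_orders) / population_size
    return prevalence, quarantines
```

For example, in a population of four where one agent was infected and five orders were issued (two agents quarantined twice), the function returns 25% prevalence and 125% quarantines.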
The model (ConTrace) was created using NetLogo 6.1.1 (Wilensky, 1999) and is available for download in the online appendix. For informant accuracy, we vary the value from 0 (i.e., the basic SEIR model) to 100 percent in increments of five (e.g., 0, 5, 10, etc.). For each value, the simulation is run 30 times. Finally, we repeat the process for 1st-, 2nd-, and 3rd-level contact tracing and for R0 values of 1.62, 2.11, and 2.56. This results in 5,400 simulations to answer R1, R2, and R3. The median values of prevalence and number of quarantines are the key outcomes of interest.
The three situations (S1, S2, and S3) represent estimates of the spread of COVID-19 under three different R0 values. S1 represents the lowest-spread condition, where the average R0 is 1.62. This R0 assumes that some public health intervention, like mask-wearing, is in place (Li et al., 2020a, b). By contrast, S2 and S3 use two R0 values between two and three, the best range estimates of the typical R0 of COVID-19 (Liu et al., 2020a, b). In the lower-end situation (S2), the average R0 is 2.11; in the higher-end situation (S3), it is 2.56. The main effects of informant accuracy are plotted in Figure 7 and the main effects of contact tracing level in Figure 8. To understand how these two factors interact, the three-way interaction (informant accuracy, level of contact tracing, and situation) is plotted in Figure 9.
Because collecting egocentric data is challenging, R1 asked how informant accuracy would influence the spread of COVID-19. Figure 7 plots the main effect of informant accuracy in each situation. Two key trends emerge. First, there is a clear linear trend: the higher the informant accuracy, the less prevalent COVID-19 is and the fewer people have to quarantine. Second, the steepness of the slope depends on the R0 of each situation. That is, the lower the R0, the less accurate the informant needs to be.
For instance, to keep prevalence under 10% when R0 = 1.62, a patient only needs to be 45% accurate in naming their contacts. However, when R0 = 2.11, that patient needs to be at least 75% accurate, and this comes at the cost of quarantining more people (nearly 40%). When R0 = 2.56, informant accuracy has less of an impact: even with 100% informant accuracy, about half of the population will have to go under quarantine.
R2 asked what impact multi-level contact tracing would have, assuming a traditional snowball design. To look at the main effect of level of contact tracing, Figure 8 plots the average prevalence and quarantine values for each of the three situations. The clear theme that emerges is that there are significant differences in prevalence and quarantines between 1st- and 2nd-level tracing, but smaller differences between 2nd- and 3rd-level tracing.
However, there is one caveat. When comparing the situations where R0 = 2.11 and R0 = 2.56, there appear to be only small differences in prevalence but significant differences in quarantines. That is, overall, the main effect of multi-level tracing emerges mainly in S1, where R0 = 1.62.
Figure 9 plots the main results across the interaction between informant accuracy and level of contact tracing, and tells a much more complete story. The levels of contact tracing are distinguished by color: blue lines represent 1st-level tracing, red lines 2nd-level tracing, and green lines 3rd-level tracing. The two outcomes are distinguished by line type: solid lines represent prevalence and dashed lines represent quarantines. The Y-axis represents the percentage of the population for each outcome. Quarantines can exceed 100% because an agent can be quarantined more than once (e.g., after their 14-day quarantine is over, they can be traced again if they made contact with a different infected agent). Finally, informant accuracy is plotted on the X-axis.
To begin answering R3, we first look at 1st-level contact tracing in each situation and examine how prevalence and quarantines (Y-axis) differ across levels of informant accuracy (X-axis). Overall, the results are contingent on the transmission dynamics of COVID-19. When R0 = 1.62, a key inflection point emerges at about 75% informant accuracy, with prevalence under 5% and less than 20% of the population having to go under quarantine. When R0 = 2.11 or R0 = 2.56, there is no viable 1st-level contact tracing strategy. Even at 100% informant accuracy, the outbreak is not contained: the vast majority of agents end up contracting COVID-19, even though most agents have to go under quarantine at least once. In other words, it is the worst of both worlds: heavy infection rates and many quarantine mandates.
Overall, 2nd- and 3rd-level tracing look strikingly similar. For instance, when R0 = 1.62, the key inflection point for both 2nd- and 3rd-level tracing drops to about 45% informant accuracy, yielding prevalence under 5% with less than 20% of the population having to go under quarantine. In other words, patients only have to be about half right about their contacts if those contacts are traced as well. When R0 = 2.11, the critical informant accuracy level rises to about 75% for both strategies. However, when R0 = 2.56, 2nd-level tracing needs to be about 90% accurate for any viable results, whereas 3rd-level tracing can dip to about 75% informant accuracy without requiring excessive quarantines (~20%).
In light of these results, we see three general themes, all revolving around a critical value of 75%, but in different contexts:
Traditional 1st-level contact tracing is only really effective when other interventions, like mask-wearing, drive R0 down to near 1.62. In that case, patients need to be, on average, 75% accurate in naming their contacts.
When R0 is at the lower end of the range between two and three (e.g., 2.11), 2nd-level tracing may be a viable strategy. In this case, patients will have to be, on average, 75% accurate in naming their contacts.
If R0 is at the higher end of the range between two and three (e.g., 2.56), contact tracing alone will not be effective at mitigating the spread of COVID-19. The only viable strategy is 3rd-level contact tracing, in which respondents will still need to be at least 75% accurate.
Sensitivity analysis is ‘a method that measures how the impact of uncertainties of one or more input variables can lead to uncertainties on the output variables’ (Pichery, 2014). Here, we explore how two input factors not included in the original model might influence prevalence: (i) the timing of contact tracing and (ii) asymptomatic cases. We view these as crucial factors because, in real-world settings, the timing of contact tracing varies with the turnaround time of test results and with the resources available to do the tracing. Likewise, it is unclear what percentage of COVID-19 cases are asymptomatic, which matters because asymptomatic individuals are highly unlikely to quarantine. In these tests, we manipulate the timing of contact tracing and the percentage of asymptomatic cases while holding informant accuracy constant at its ideal average of 100% to demonstrate general trends (see Fig. 10).
The results show that the impact of contact tracing on the spread of COVID-19 is sensitive to both the timing of contact tracing and the percentage of asymptomatic cases. When R0 = 1.62, there is more time to spare, as increases in prevalence only take shape after the third day. However, this still depends on the share of the population that is asymptomatic. Once the population is at least 50% asymptomatic, neither the timing nor the method of tracing has much impact, as the majority of the population will become infected.
In 2nd-level tracing, the constraints imposed by timing and R0 can be relaxed considerably, but only if the asymptomatic rate does not exceed 50%. For instance, when R0 = 1.62, noticeable differences in prevalence do not emerge until the fourth day. In higher-R0 situations (e.g., 2.11), this shifts to the third day. Finally, in even higher-R0 situations (e.g., 2.56), contact tracing seems to be ineffective even if it occurs on the same day.
Because contact tracing is essentially a form of ENA, the current research investigated the impact of contact tracing on the spread of COVID-19 through an egocentric, agent-based model. After creating a basic SEIR model of COVID-19 for three different contexts, we manipulated two aspects of contact tracing: (i) informant accuracy (i.e., how accurately contact tracing elicited alters) and (ii) level of tracing (i.e., the number of snowball waves used). Finally, we conducted a post hoc sensitivity analysis to see how (i) the timing of contact tracing and (ii) the share of asymptomatic cases influenced the spread of COVID-19. Below, we discuss the results in light of research on informant accuracy, best practices for egocentric data collection, and the wider literature on contact tracing.
One general theme of the current results is that higher levels of informant accuracy result in better mitigation of COVID-19 and fewer quarantines. However, how accurate are informants with respect to their contacts? This question may be difficult to answer because informant accuracy is measured in many different ways, especially in the seminal BKS studies. For instance, Bernard et al. (1982) provide 15 different informant accuracy measures. In the context of contact tracing and the current study, the most relevant measures are omission errors, which Bernard et al. (1982) label T2: the number of people actually communicated with who were not recalled (i.e., the number of false negatives). In their study of 57 users (i.e., students and scientists) of the Electronic Information Exchange System (EIES), the average T2 percentage was 66%. In other words, the equivalent informant accuracy rate in the current study would be only 34%. By contrast, in a study of teletype workers (Killworth and Bernard, 1976), informant accuracy was about 50%. For reasonable COVID-19 prevalence, such accuracy levels would mean that contact tracing needs to be paired with mask-wearing and 2nd-level tracing to be effective.
However, some contextual factors give reason to suspect that informants might be somewhat better able to remember their contacts during contact tracing. Most notably, the BKS studies often cover longer periods, ranging from several weeks to months, and it can be extremely easy to forget whom you bumped into several months ago. Contact tracing, on the other hand, poses short-term rather than long-term memory challenges. According to Johns Hopkins University, typical contact tracing only requires patients to remember contacts from the past five to seven days, the average incubation period (Gurley, 2020). In any case, more research may be needed to establish typical baseline T2 values of informant accuracy for contacts from the past several days.
Nevertheless, the current research has demonstrated that informant accuracy plays a big role in COVID-19 prevalence and in the number of quarantines issued. For instance, when R0 = 1.62, a decrease from 75 to 70% informant accuracy raises prevalence from under 5% to just over 40% and moves the number of quarantines from just under 20% to an infeasible level of over 100%. As such, what are some ways in which informant accuracy can be improved? The next section discusses the current results in light of these efforts.
Improving informant accuracy for ENA has long been a concern for social network researchers. For instance, Hsieh (2014) formalized the retrieval cue approach for ENA. This approach assumes that the ‘successful recall of an event depends primarily on how well the retrieval cues match the event’s representations in one’s memory organization’ (p. 3). A retrieval cue is any additional piece of information that helps an ego remember a past event. The retrieval cue approach mirrors Tulving’s (1974) theory of cue-dependent forgetting. This theory assumes that forgetting (i.e., the inability to recall something in the present that could be recalled in the past) does not mean that the memory is lost, only that it is temporarily inaccessible. In Hsieh’s study, participants were randomly assigned to a retrieval cue condition in which they were instructed to look at their (i) cell phone contact list, (ii) last 30 emails, and (iii) friend lists on Facebook and Twitter as retrieval cues. Hsieh found that this approach yielded more contacts than a baseline approach.
However, although Hsieh’s (2014) recall aid might be a good strategy for remembering relational states, it may not be as useful for certain relational events. For instance, a brief conversation with a stranger in an elevator is unlikely to turn up in somebody’s cell phone, social media, or e-mail. As such, are there recall aids that may be better tailored for recalling relational events, rather than states?
One possibility may be the context-based recall aid designed by Bidart and Charbonneau (2011). The context-based name generator begins by asking individuals about social context cues in everyday life (e.g., work activity, shopping, home life). Then, once relevant contexts have been triggered, contact names corresponding to those contexts are generated. As Bidart and Charbonneau explain, the context-based approach is motivated by field theory (Feld, 1981). Field theory regards situated action (e.g., activity foci) as a key unit of analysis. For instance, if certain activity foci can be activated (e.g., transportation), more incidental relational events might be more easily recalled. Indeed, Pilny and Huber (2021) tested three different contact tracing aids and found that the context-based instrument elicited significantly more contacts and places visited. If informant accuracy significantly influences the efficacy of contact tracing, then more work is needed to develop recall aids tailored toward relational events rather than relational states.
Second, the timing of contact tracing matters greatly. The current results largely replicate the models put forth by Kretzschmar et al. (2020), who find that the effectiveness of contact tracing declines significantly with a delay of even one day. After three days, contact tracing essentially has no effect except in contexts of mask-wearing. This is because contact tracing depends not just on quarantining contacts, but on when those contacts are quarantined. A significant delay simply allows those contacts to remain in the system and further transmit COVID-19. In other words, the damage has already been done.
For applied implications, the results of this study strongly suggest that practitioners in charge of training contact tracers should spend time on techniques that improve informant accuracy. We suggest the following techniques:
Encouraging the use of contextual recall aids (e.g., activating relevant foci that put seemingly random interactions in context).
Improving basic qualitative interview techniques (e.g., the use of probing questions, establishing rapport and trust).
Conducting the contact-elicitation portion of the interview first to reduce interviewee fatigue.
Clearly establishing the definition of a contact to reduce differential interpretation (e.g., interactions within six feet for at least 10 min).
The simulations found that multi-level tracing was more effective at reducing the prevalence of COVID-19 than traditional single-level tracing. The reasoning seems quite simple. Not only were more contacts traced, but these contacts were not random; they were the contacts of the initial contacts. That is, 2nd-level tracing reached contacts who were at much greater risk of contracting COVID-19 because they had been in contact with someone who had contact with an infectious person.
Moreover, multi-level contact tracing was less sensitive to informant accuracy, timing, and the percentage of asymptomatic cases. However, is multi-level contact tracing practical in real life? Though some countries, like South Korea, have used versions of multi-level tracing (Schneider et al., 2020), such approaches require fast implementation to be effective. For instance, suppose the average number of contacts in the simulations is 14. Not only does a tracer have to reach those 14 contacts to notify them to quarantine and get tested, but those 14 contacts also need to be traced. Then, the contacts of those 14 contacts need to be notified to quarantine and get tested, resulting in an additional 196 notifications (i.e., 14²).
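The 14 to 196 arithmetic generalizes to avg_contacts^d notifications at wave d. The Python sketch below assumes that every agent has exactly the average number of contacts and that no contact appears in more than one wave (a tree-like network). In clustered networks, where alters share contacts, it therefore overstates the burden. The function name is hypothetical.

```python
def notifications_per_wave(avg_contacts: int, levels: int) -> list[int]:
    """Notifications needed at each snowball wave, assuming every agent has
    exactly `avg_contacts` contacts and no agent appears in two waves."""
    return [avg_contacts ** wave for wave in range(1, levels + 1)]


waves = notifications_per_wave(14, 3)
print(waves)       # [14, 196, 2744]
print(sum(waves))  # 2954 notifications in total for 3rd-level tracing
```

Under these assumptions, the third wave alone would add 2,744 notifications, which illustrates why the speed of deployment becomes the binding constraint for multi-level tracing.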
The growing need for quicker contact tracing has sparked interest in advanced information and communication technologies. For instance, smartphones and their applications have been used to trace contacts and inform individuals if they have been near someone infected with Ebola (see Danquah et al., 2019). More recently, Apple and Google have been developing similar technology to trace the spread of COVID-19 via a phone’s Bluetooth (Greenberg, 2020), while other countries, such as Singapore, have distributed wearable dongles to do so (Asher, 2020). While these novel approaches are exciting, they are still fraught with complications and privacy issues (Danquah et al., 2019). Indeed, a recent survey by Avira found that 71% of Americans would not be willing to use a contact tracing smartphone application.
Another way to administer contact tracing quickly may be to move the procedure from in-person interviews and phone calls to electronic surveys. There is a considerable amount of ENA research using surveys to collect data on alters, going back to the General Social Survey (Burt, 1984). However, questions remain about how reliable surveys are compared to the more ‘gold-standard’ method of personal interviews. Nevertheless, some recent research may be beginning to make progress in this vein.
For instance, Hogan et al. (2019) report on the efficacy of Network Canvas, a digital egocentric network data collection tool designed to ease the burden of collecting egocentric data. In a similar vein, Hollstein et al. (2020) tested four different egocentric tools that emphasized visualization. They found that most participants preferred concentric circles over funnel tools and free designs, although these instruments did not significantly influence network size or composition (see also Eddens and Fagan, 2018). For multi-level tracing, we recommend that researchers continue to explore the design of electronic instruments for egocentric networks, while paying more attention to completion times and the effort needed to complete the tracing. As the current results show, multi-level tracing is more robust to low informant accuracy, so some accuracy can be sacrificed if the instrument can be deployed quickly and to many people. For instance, when R0 = 1.62, respondents only need to be about 50% accurate in 2nd-level tracing.
There are several limitations worth noting. The most obvious is that the contact network is largely ‘fixed’ and does not follow a stochastic selection process that includes endogenous factors like closure/centralization or exogenous tendencies like mixing (e.g., attribute homophily). This is especially notable because the social network structure underlying any infectious disease is just as important as the disease’s biological properties in determining how contagious that disease is, as captured by metrics like R0 (Hébert-Dufresne et al., 2020). This suggests that the results of the current simulations cannot be generalized beyond network structures similar to those reported in Table 2. This matters because different contexts, communities, and cities may have varying structures of interpersonal contact. For instance, de Anda-Jáuregui et al. (2020), using cell phone data, report on the contact network of Mexico City. Although that network showed similar levels of centralization and clustering, it was much more fragmented (i.e., it had many separate components). Future work may consider how manipulating the contact network structure affects the spread of COVID-19 and how contact tracing can leverage such general network selection tendencies (e.g., Prem et al., 2017).
Finally, beyond the dynamics of the contact network, there are still unknowns related to the dynamic spread of COVID-19. Various studies have reported different R0 values, secondary transmission rates, and growth rates for COVID-19. Without precise values for these transmission dynamics, researchers are making educated guesses about how COVID-19 spreads. Indeed, as the current results suggest, different transmission dynamics, such as varying secondary transmission rates, are likely to yield different results.
By now, informant accuracy is a well-known and well-established issue in egocentric network data collection (Corman et al., 2021). Contact tracing is an applied form of ENA in which researchers can move beyond merely demonstrating that informant inaccuracy exists and begin to investigate the consequences of varying levels of informant accuracy. The current research shows that informant accuracy is critical to controlling the diffusion of an infectious disease. Overall, the results suggest that if contact tracing is to be effective, it must be fast, accurate, and accompanied by other interventions like mask-wearing to drive down the average R0. Moreover, the results show the promise of multi-level tracing because it is more robust to lower levels of informant accuracy. How researchers and practitioners can deploy fast and accurate contact tracing instruments could be a vital next step in helping control outbreaks of various infectious diseases.
The first two authors contributed equally. This research was supported by the National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant UL1TR001998. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
Anonymous link to Appendix: https://osf.io/493n7/?view_only=dc6bbca1533f4341ad188e5fe08b32c1
The online appendix also details a typical infection network, which closely mirrors a core-periphery infection structure. The over-dispersion parameter (k) is 0.14, in line with typical estimations of the k-value for COVID-19 (Endo, 2020).
Fine et al. (2011), for example, define the critical level of infection prevalence for herd immunity to develop ($H_c$) as a function of the R0 value of the infection:

$$H_c = 1 - \frac{1}{R_0}$$

Based on this equation, the critical prevalence level ($H_c$) for the average-context spread condition (R0 = 2.11) would be 52.60%. Likewise, the critical prevalence level for the high-context spread condition (R0 = 2.56) would be 60.93%. Finally, for the context assuming mask-wearing (R0 = 1.62), the critical level would be 38.27%.
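The thresholds in this note can be checked with a few lines of Python that evaluate $H_c = 1 - 1/R_0$ for each situation. The reported figures match when the result is truncated, not rounded, to two decimals (rounding would give 52.61% and 60.94% for the two higher-R0 conditions). The helper names are illustrative.

```python
import math


def herd_immunity_threshold(r0: float) -> float:
    """Critical prevalence H_c = 1 - 1/R0, expressed as a percentage."""
    return 100 * (1 - 1 / r0)


def truncate_2dp(x: float) -> float:
    """Truncate (not round) to two decimal places, as in the reported values."""
    return math.floor(x * 100) / 100


for r0 in (1.62, 2.11, 2.56):
    print(f"R0 = {r0}: H_c = {truncate_2dp(herd_immunity_threshold(r0)):.2f}%")
```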