Pervasive Therapy: Designing Conversation-Based Interfaces for Ecological Momentary Intervention

The unmet need for mental health treatment has motivated considerable research on the design and evaluation of pervasive technology to support people's mental health. An enduring idea is the use of conversation-based interfaces to deliver mental health support, which is now a realistic prospect given their widespread use in consumer devices. The ubiquity and varied characteristics of these devices could create a reality where mental health support is only an utterance away, enabling the delivery of ecological momentary interventions: in-the-moment mental health support in natural settings. We term these systems "pervasive therapy" and in this article consider their future and the open challenges of designing them. Our discussion covers major research directions for their design, including intervention and conversation planning, context-aware conversation-based interactions, and the involvement of clinicians. Throughout this article, we focus on what a realistic vision is for the near future of pervasive therapy.

Mental illness is a major cause of disability worldwide; however, many people do not receive treatment. This is due to a combination of structural and attitudinal barriers. Structural barriers relate to the availability, accessibility, and affordability of care. Attitudinal barriers include a reluctance to engage with treatment, due in part to the deep and enduring stigma surrounding mental illness.1 Technology has the potential to address both types of barrier through the development of digital treatments that can be deployed at scale, and by removing barriers to help seeking through the provision of flexible and relatively anonymous intervention options.2

One compelling vision for the use of technology to support mental health treatment is the delivery of ecological momentary interventions (EMIs); these are a set of "treatments that are provided to people during their everyday lives (i.e., in real time) and in natural settings (i.e., real world)."3 As many health issues are related to people's everyday behaviors, EMIs are well positioned to be impactful through promoting health behavior change. They have been used to support people with both their physical and mental health, with goals relating to exercise, diet, substance use, and anxiety; they can be used independently as well as in conjunction with other treatments. In the case of mental health, EMIs can be used between sessions with clinicians, as with the established psychotherapy practice of "homework" activities.3 As demonstrated by Balaskas et al.,4 research is exploring the use of EMIs to support a range of therapeutic intervention strategies including acceptance and commitment therapy (ACT), cognitive behavioral therapy (CBT), and motivational interviewing. With advances in sensor technology, mobile computing, and machine learning, the feasibility of EMI technology is dramatically improving.4 What is more, advances in speech and language technologies are increasing the viability of conversation-based interaction, whereby people and computers interact through the exchange of textual or spoken natural language. With the emergence of voice assistants and chatbots, devices affording such interaction have surged in popularity to the extent that people have near-constant access to them. These interfaces offer a promising foundation for realizing the enduring vision of conversation-based interfaces delivering mental health treatment.5 Part of this vision's allure is that conversation-based interaction could resemble, and draw upon, established talk-based mental health treatment.
Furthermore, research has repeatedly found that talking and writing about one's experiences can positively impact both physical and mental health.6 Taking these technology developments together, we can envision the use of conversation-based interfaces as part of the delivery of EMIs: a future in which natural language forms part of our interactions with mental health interventions that leverage the full range of affordances of pervasive computing technology. We term the technology delivering such interventions pervasive therapy systems: systems that use conversation-based user interfaces to deliver in-the-moment interventions for mental health.
Pervasive therapy systems may, in the long term, enable novel approaches to patient-centered mental health support; this would, however, require building a substantial and credible clinical evidence base. For the moment, we believe that the most value will be found in using them to advance the delivery of established therapeutic practices. Similarly, there is a tendency when considering advanced computer systems, particularly those using artificial intelligence (AI), to envisage a jump from current practice to full automation. However, doing so ignores many different possibilities for the role of such systems in treatment, and the potential for more collaborative approaches combining the complementary strengths of clinicians and technology, not least in addressing difficult issues of responsibility, risk, and oversight.
In this article, we consider a future vision for conversation-based EMI systems, which we term pervasive therapy, and the open challenges arising from it. We consider the architecture of these systems and identify three major research directions that will contribute to their emergence. We also discuss the potential challenges of introducing these technologies. Our aim is to identify ambitious yet realistic directions for research and development in the short to medium term, advancing toward a long-term vision of increased availability of care.

PERVASIVE THERAPY SYSTEMS
By integrating the capabilities of conversation-based interaction and EMI architectures, we constructed a conceptual architecture for pervasive therapy systems (see Figure 1). The purpose of this architecture is to support articulation of, and reasoning around, the design decisions involved in developing such systems in order to inform future research and system design.
A recent review of EMIs for mental health 4 observed that existing systems typically collect data about the user from the sensors of their smartphone and connected wearables, as well as by prompting the user to interact with smartphone applications. The collected data is then used to infer opportune moments to intervene, and potentially choose which intervention components to deliver.
Integrating this approach to EMI with a conversation-based interaction architecture requires two major changes. First, it needs capabilities for operating with natural language (i.e., processing and generating utterances). Second, it has to allow for planning of interactions at a conversational level as well as at an intervention level. At the intervention level, a choice needs to be made regarding the intervention to be used and, for a given intervention, how and when to deliver it. At the conversational level, the system needs to plan interactions that deliver the selected intervention components. To support reasoning about these two levels of operation, we conceptualize the general architecture of conversation-based EMI systems as containing two "controllers," which are illustrated in Figure 1. An intervention controller, which is responsible for planning the treatment, delegates intervention actions to a conversation controller, which is responsible for delivering the actions to the patient using conversation-based interaction.
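As a concrete illustration of this split, the following Python sketch shows an intervention controller delegating an action to a conversation controller. All class, method, and field names are hypothetical; a real system would plan multi-turn conversations and draw on a much richer patient model.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative sketch of the two-controller split: the intervention
# controller decides *what* to deliver; the conversation controller
# decides *how* to deliver it. All names here are assumptions.

@dataclass
class InterventionAction:
    component: str   # e.g., "mood_logging"
    rationale: str   # why the controller chose this moment

class ConversationController:
    def deliver(self, action: InterventionAction) -> str:
        # A real controller would plan a multi-turn conversation;
        # here we only produce an opening utterance.
        return f"Let's do a quick {action.component.replace('_', ' ')} check-in."

class InterventionController:
    def __init__(self, conversation: ConversationController):
        self.conversation = conversation

    def on_patient_state(self, state: dict) -> Optional[str]:
        # Toy decision rule: suggest mood logging when the patient seems free.
        if state.get("activity") == "idle":
            action = InterventionAction("mood_logging", "patient appears free")
            return self.conversation.deliver(action)
        return None

controller = InterventionController(ConversationController())
print(controller.on_patient_state({"activity": "idle"}))
```

The value of the split is that either side can evolve independently, for example replacing the scripted opening with a richer dialogue planner without touching the treatment logic.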
The architecture needs to recognize the wide range of devices through which the patient and system could interact. This range of devices, which could include smart speakers, computers, cars, smartphones, and wearables, has the potential to improve both the availability and sensing capabilities of the system. With this diverse range of conversation-based interfaces come the design challenges of when and how to use each device. To guide this use, systems could draw on sensor-informed models of the patient and their environment, as well as clinician and patient configurations.
Key to this pervasive therapy vision is the involvement of clinicians. Previous EMI research has involved clinicians in a range of ways. 4 Clinicians have been involved in tailoring treatments to align with patient needs, and have provided support by communicating with patients through the system, for example, by providing patients with feedback. To accommodate clinician involvement, we include in the architecture a clinician interface. The interface could support clinicians in a variety of roles, ranging from light-touch supervisory monitoring to highly blended delivery in which the EMI is used as an adjunct to treatment, for example by providing support between in-person therapy sessions.
Three aspects of this architecture offer important directions for future research:

1) integrating intervention and conversation planning;
2) designing context-aware, conversation-based interactions; and
3) designing for clinician involvement.
We discuss the key considerations of these directions in the following sections.

INTERVENTION AND CONVERSATION CONTROLLERS
We have conceptualized conversation-based EMI systems as being managed by two controllers: an intervention and a conversation controller.
The intervention controller manages the EMI treatment, which involves deciding which intervention components to deliver and when to deliver them. Using the clinician interface, clinicians can configure the intervention controller to run a treatment program aligned to the patient's needs. A range of different treatment programs would be available to support different therapeutic intervention strategies (e.g., ACT, CBT, and motivational interviewing), adaptable to both the health challenges experienced by the patient (e.g., anxiety), and to who the patient is (e.g., an adolescent). Common intervention components, such as mood logging, may be shared across treatment programs.
The patient data and model represent the system's understanding of the patient and their context. They would describe and track factors relating to the patient's health, treatment plan, and interaction preferences, as well as their momentary state and activities.
The intervention controller uses the inferred state of the patient in conjunction with knowledge of the patient's treatment plan and progress to determine whether the conditions for a particular intervention action are satisfied. For example, on inferring that a patient experiencing depression has exercised, the system could attempt to initiate a reflection activity as part of a CBT program. Upon determining that an intervention action should be delivered, the intervention controller passes the action to the conversation controller.
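A minimal sketch of this kind of condition checking might look as follows; the rule structure and field names are invented for illustration.

```python
# Hypothetical treatment plan rules: each rule pairs sensed-state
# conditions with the intervention component to deliver when they hold.

def eligible_actions(patient_state, treatment_plan):
    """Return the components whose trigger conditions are all satisfied."""
    actions = []
    for rule in treatment_plan["rules"]:
        if all(patient_state.get(k) == v for k, v in rule["when"].items()):
            actions.append(rule["deliver"])
    return actions

plan = {"rules": [
    {"when": {"recent_activity": "exercise"}, "deliver": "cbt_reflection"},
    {"when": {"mood_logged_today": False}, "deliver": "mood_logging"},
]}
state = {"recent_activity": "exercise", "mood_logged_today": True}
print(eligible_actions(state, plan))  # exercise detected, so reflection is eligible
```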
The conversation controller enacts the intervention action it receives using conversation-based interaction. There is considerable variation within the design of conversation-based interactions, particularly when multiple device types are involved. For example, interactions may differ in their use of text or speech, their use of social talk, and the formality of their language. To align the interaction characteristics to the patient, the conversation controller can take configurations from both the patient and the clinician. Common configurations that suit specific patient groups well (e.g., young adults) can be made available as starting configurations. Informed of the inferred state of the patient through the patient model, the conversation controller attempts to construct appropriate interactions to be performed by the connected devices. These interactions may consist of multiple exchanges between the patient and system, with the patient's input informing the system's subsequent outputs.
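A constrained, fully scripted exchange of the kind described here might look like the following sketch, where outputs are templated against the patient model and input is limited to predefined options. The script content and structure are purely illustrative.

```python
# Hypothetical conversation script: templated utterances with patient
# input restricted to predefined response options.

SCRIPT = {
    "mood_check": {
        "utterance": "Hi {name}, how are you feeling right now?",
        "options": ["good", "okay", "low"],
        "followup": {
            "good": "Great to hear. Anything you'd like to note down?",
            "okay": "Thanks for sharing. Would a short breathing exercise help?",
            "low": "I'm sorry to hear that. Shall we try a brief grounding activity?",
        },
    }
}

def open_step(step_id, patient):
    """Render the opening utterance from the patient model; return it with the allowed replies."""
    step = SCRIPT[step_id]
    return step["utterance"].format(**patient), step["options"]

def respond(step_id, choice):
    """Follow the script for a predefined choice; reprompt on anything else."""
    step = SCRIPT[step_id]
    if choice not in step["options"]:
        return "Sorry, please pick one of: " + ", ".join(step["options"])
    return step["followup"][choice]

utterance, options = open_step("mood_check", {"name": "Sam"})
print(utterance)
print(respond("mood_check", "low"))
```

Keeping input to a closed set of options sidesteps natural language understanding entirely, which is exactly the trade-off the next paragraph examines.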
The conversation controller operates with natural language. A starting point for creating this part of the system is to keep the conversation-based interactions constrained. Conversations could be highly transactional, with the system's outputs fully scripted within the conversation controller and patient input restricted to predefined responses. To enable more sophisticated interactions, we would want the patient to be able to give richer input and the system to adapt its output to them. An example of richer input is the patient interacting with the system using unconstrained natural language. To adapt the system output to the patient, natural language processing techniques could be used to customize scripted utterances to reference the patient and their data. If the interaction involves voice input, then voice recognition will be required. Voice is a rich modality; in addition to extracting textual content from voice input, the system may attempt to infer further information, such as the patient's affective state. However, we must be cautious around the use of unconstrained natural language input. Limited input opportunity may frustrate the patient and limit the effectiveness of the system. Conversely, insufficient natural language understanding not only risks a poor patient experience, whereby the system's capabilities fall short of the patient's expectations,7 but misunderstandings also pose a risk to patient wellbeing (e.g., leading to inappropriate advice).8,9

CONTEXT-AWARE, CONVERSATION-BASED INTERACTION
Pervasive therapy systems have the ability to deliver interventions using a range of devices with conversation-based interfaces that have varied characteristics. Deciding when and how to use which device is hence an important aspect of delivery.

Interaction Prompts
Interactions could be patient initiated; however, systems could also initiate interaction by "prompting" the patient, something EMI systems typically do to encourage engagement.4 Approaches to prompt timing include prompting the patient at specific, random, or event-based times, that is, when the system senses a certain situation, for example, using heart rate, sleep, or location data.4 Sensor data could also inform which device is most appropriate to deliver a prompt from. The prompt could be a notification encouraging the patient to initiate an interaction, or the first utterance of a conversation-based interaction.
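An event-based trigger combined with device selection could be sketched as follows; the conditions, thresholds, and device rule are assumptions for illustration, not validated heuristics.

```python
# Toy event-based prompting: sensed context decides both whether this
# is an opportune moment and which device should deliver the prompt.
# All field names and thresholds are illustrative assumptions.

def choose_prompt(context):
    # Skip non-opportune moments: the patient is asleep or appears exerted.
    if context["asleep"] or context["heart_rate"] > 100:
        return None
    # Device choice: prefer the device closest to hand.
    device = "smartwatch" if context["wearing_watch"] else "smartphone"
    return {
        "device": device,
        "prompt": "Would now be a good time for a quick check-in?",
    }

print(choose_prompt({"asleep": False, "heart_rate": 72, "wearing_watch": True}))
```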

Delivery Modalities
Interventions can be delivered by different modalities to suit the inferred context of the patient. A comparison of text and speech interaction for self-reporting and reflection found both modalities offered value and carried different challenges. 10 While text interaction allows people to take their time considering the prompt and their response, speech interaction encourages immediate response and, although it can be convenient and engaging, risks annoying the user more. 10 A factor affecting modality suitability is the privacy of the patient's environment as people are less likely to engage in speech interaction in a public setting. 7,11 Privacy will be important given the sensitive nature of one's health; however, what constitutes a private setting may differ considerably between users. The capability of a pervasive therapy system to provide opportune interventions is tied to its capability to accurately sense and infer the patient's context (e.g., if other people are present).
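One way to operationalize this is a conservative modality rule that uses speech only when the patient prefers it and the setting is confidently inferred to be private; the field names and the privacy inference are assumptions for illustration.

```python
# Toy modality selection: fall back from speech to text whenever the
# sensed environment is not confidently private. Names are assumptions.

def select_modality(patient_prefs, sensed_context):
    # Treat the setting as private only when we are confident nobody else
    # is present; when sensing is uncertain, default to the more discreet
    # text modality.
    private = sensed_context.get("others_present") is False
    if patient_prefs.get("prefers_speech") and private:
        return "speech"
    return "text"

print(select_modality({"prefers_speech": True}, {"others_present": False}))
print(select_modality({"prefers_speech": True}, {"others_present": True}))
```

Defaulting to text when context is ambiguous reflects the point above: what counts as private differs between users, so the safe fallback is the less exposing modality.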

Human-Like Interaction
In attempts to create conversation-based interfaces that deliver effective mental health treatment, research has explored emulating successful aspects of human delivery.
A research focus is therapeutic alliance (i.e., the working relationship between patient and clinician), which is associated with positive therapeutic outcomes and has been demonstrated to arise from conversation-based interactions.9 Methods of fostering therapeutic alliance include social and relational talk, for example, expressing empathy, humor, continuity behaviors, and references to mutual knowledge between the patient and the system.9 Prior work has found these techniques to have positive effects on both the patient experience and health outcomes.9 Yet, human-like interaction may not be the best way forward, at least for some patients. Research studying people's experiences with voice assistants is finding that people question the appropriateness of human-like behavior.7,11,12 Instead of human-like social behavior, direct and utility-focused interaction may better suit some patients.12 What is more, human likeness may inflate patient expectations beyond the system's actual capabilities.7

Patient Agency
Critical to both the patient's interaction experience and wellbeing is the fulfillment of psychological needs, including feelings of competence and autonomy.13 To support feelings of competence, we need to ensure that the patient understands how to make use of the system's capabilities, as well as what lies beyond them; however, as previously discussed, how to align system capabilities and patient expectations is an open challenge. To support the patient's autonomy, personalization is a promising approach: patients should be able to customize both when and how they interact with the system.13

CLINICIAN INVOLVEMENT
A critical design decision for a pervasive therapy system is how clinicians will be involved. Much of the delivery of evidence-based mental health treatment is through professional mental health services, primarily delivered by or with clinicians. Working with clinicians provides a means to leverage their skills, work practices, and judgement to address important issues concerning oversight, responsibility, and risk.
Clinician involvement with the EMI system can occur in two ways. The system may be used as a component of a "blended care" treatment, which combines in-person and digital treatment. 14 In addition, clinicians may interact with the system directly to contribute to the treatment delivery. Involving clinicians in digitally supported treatment can also help challenge perceptions and criticisms of digital treatments (e.g., distancing clinicians from patients) 15 as well as extrinsically motivate patients to engage with the treatment. 16 Higher intensity interventions for people in higher levels of distress cannot realistically be based on unsupervised, standalone technology, and if these technologies are to be used with a broader cohort of patients (beyond those with mild levels of distress), clinician involvement should be considered from the outset.
Miner et al.15 lay out four high-level care delivery approaches. Two of the approaches are the extremes of "human only" and "AI only" delivery. While "AI only" options should be treated with caution, as discussed previously, it has been argued that AI has the potential to mitigate some challenges of accessibility and burnout found in human-only delivery.15 Between these extremes is the opportunity for collaborative human-AI delivery. One collaborative approach is "human delivered, AI informed,"15 which would be limited by many of the access challenges of "human only" care. A second collaborative approach is "AI delivered, human supervised" care, which Miner et al. described as "one of the less developed but more alluring ideas of AI psychotherapy."15 The pervasive therapy architecture outlined previously exemplifies a largely AI-delivered, human-supervised approach, which can manifest in multiple ways. We identify and describe four (non-mutually exclusive) forms of clinician involvement in a pervasive therapy system (see Table 1).

Configuration
The first point for clinician involvement is configuration, giving the clinician the opportunity to tailor the intervention to the patient and their individual mental health needs at the start of the treatment. The main methods for clinicians to tailor a pervasive therapy system are intervention and conversation controller configuration, describing what interventions to deliver and how to deliver them. This activity may extend beyond purely medical considerations to ones of technology engagement. The task of configuration is potentially well suited to being done collaboratively with the patient, particularly if the treatment is being used as part of a blended care approach. This interaction involving the patient, clinician, and system could help establish the involvement of the clinician in the system, the reputability of the system, and the patient's future engagement.
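The output of such a configuration session might resemble a structured object like the sketch below, split along the two-controller lines described earlier (what to deliver versus how to deliver it). All field names and defaults are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical configuration a clinician interface might produce. The
# first two fields configure the intervention controller; the last two
# configure the conversation controller.

@dataclass
class PervasiveTherapyConfig:
    treatment_program: str = "cbt"      # e.g., "cbt", "act"
    components: list = field(default_factory=lambda: ["mood_logging"])
    tone: str = "neutral"               # conversation style for this patient
    quiet_hours: tuple = (22, 7)        # no system-initiated prompts overnight

# A clinician tailoring the defaults for an individual patient.
config = PervasiveTherapyConfig(treatment_program="act", tone="informal")
print(config)
```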

Monitoring
While patients are using a pervasive therapy system, clinicians could monitor their progress, and the actions of the system, using the system's clinician interface. As part of a blended care approach, this activity can be used to inform in-person treatment. Monitoring can also be used to identify patients who are having difficulty engaging with the treatment or who are indicated as being at potential risk of harm. Where a patient is observed to be at higher risk, the clinician can take appropriate action, for example, moving to a higher intensity of treatment within a stepped care model. Prior research by Matthews and Doherty,17 which involved clinicians in the development of a digital mood logging application, identified the importance of setting boundaries with regard to patient monitoring. If patients are to be monitored by clinicians, care must be taken to set appropriate expectations, so that patients do not come to assume their data is being continuously watched.

TABLE 1. Forms of clinician involvement in a pervasive therapy system.

Configuration: Clinicians tailor the pervasive therapy system for an individual patient at the start of the treatment.
Monitoring: Clinicians can observe the actions of the pervasive therapy system and the data it collects about their patient(s).
Adjustment: Clinicians can change the intervention delivery; this may involve reconfiguring the intervention or conversation controllers.
Communication: Clinicians and patients can exchange messages through the medium of the pervasive therapy system.

Pervasive therapy involves collecting substantial amounts of patient data, including clinical measures, sensor data, and natural language data. Natural language data may be particularly verbose from a clinician perspective. An open challenge is to develop methods to support effective and efficient clinician review and action, potentially across a large number of patients. Inference and prioritization methods could be investigated as a means of highlighting to clinicians the patients most in need of their attention. However, if any method of prioritization is to be considered, rigor is necessary to ensure safety. A possible system failure that could cause serious harm would be the system classifying a high-risk patient as low risk, resulting in the patient not receiving appropriate care. For example, a naive summarization of speech content may emphasize obvious symptoms (e.g., reports of tiredness) while neglecting subtle features (e.g., tone of voice or discontinuities in delivery) that could indicate the patient is at higher risk.
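As a toy illustration of prioritization, the following sketch orders patients by simple risk and engagement signals. The signals and weights are arbitrary assumptions, and, per the caution above, any real scoring scheme would need rigorous clinical validation so that high-risk patients are never mis-ranked.

```python
# Hypothetical clinician-facing triage: order patients so that review
# time goes where it is most needed. Weights are illustrative only.

def triage(patients):
    def score(p):
        s = 0
        s += 10 if p.get("risk_flag") else 0            # explicit risk indicators first
        s += 3 if p.get("days_since_contact", 0) > 7 else 0
        s += 2 if p.get("engagement", 1.0) < 0.3 else 0  # disengagement is a warning sign
        return s
    return sorted(patients, key=score, reverse=True)

patients = [
    {"id": "a", "risk_flag": False, "days_since_contact": 2, "engagement": 0.9},
    {"id": "b", "risk_flag": True, "days_since_contact": 1, "engagement": 0.8},
    {"id": "c", "risk_flag": False, "days_since_contact": 10, "engagement": 0.2},
]
print([p["id"] for p in triage(patients)])
```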

Adjustment
A level of clinician involvement, beyond monitoring patients, is adjustment. Adjustment allows clinicians to reconfigure the treatment in an attempt to better match the changing needs of the patient. Adjustment could include reconfiguring the intervention and conversation controllers, and scheduling additional intervention components. For example, in cases of poor engagement, the system could be reconfigured to interact in a manner better suited to the patient (e.g., use more social language) or to deliver a greater proportion of prompts using a specific device that works well for the patient (e.g., via smartwatch).
A challenge when designing for adjustment is determining how transparent adjustments are to patients. Observable changes to a patient's treatment without sufficient explanation risk disrupting and unsettling the patient, which could negatively impact their engagement and wellbeing. The therapeutic reasoning for adjustments could be explained to patients to help them understand why they have happened; however, care will be needed to ensure explanations are accessible to all, including those with lower levels of mental health literacy.

Communication
Communication, whereby the system allows explicit exchanges between the patient and clinician, is a higher level of clinician involvement. Two key design challenges for incorporating communication are deciding how rigid and how synchronous the communication is.
A more rigid implementation would allow patients to send predefined communications with specific purposes, such as requesting feedback or an intervention adjustment. In contrast, the communication could be more flexible allowing patients to express themselves verbally in a more free-form fashion. More rigid designs risk not allowing the patient to communicate valuable information while more flexible designs risk patients sending communications that the clinicians are ill equipped to respond to using the system.
How synchronous the communication is refers to how immediately patients can expect their communications to be seen and responded to. A synchronous design would have clinicians always available, allowing for quick message exchange or even real-time conversation. Designs like this put a greater demand on clinicians, and there is always the risk that demand for clinicians surpasses supply, leaving patients who are expecting interaction unattended. Communication could instead be asynchronous, whereby patient communications are responded to not immediately, but within a reasonably short time period. Asynchronous designs, which express to the patient a realistic timeline for receiving responses, appear a feasible approach, as previously demonstrated in work studying guided Internet-based cognitive behavioral therapy.16 Doherty et al.16 developed and studied an online mental health intervention platform that involved clinicians providing platform-mediated feedback to patients. They found that clinicians engaged more with the more engaged patients, and that clinicians found it challenging to provide effective support to less engaged patients. As such, there may be a need to support clinicians in distributing their time between patients, as well as in supporting each patient effectively. Through the analysis of clinician support messages to patients, effective support strategies can be identified, which presents an opportunity to design interfaces that support clinicians in supporting patients.18
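An asynchronous design of this kind could be sketched as a reviewed message queue that states a realistic response window up front; the class and its fields are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta

# Sketch of asynchronous patient-clinician messaging: patient messages
# queue for clinician review, and the patient is told a realistic
# response window rather than led to expect an immediate reply.

class MessageQueue:
    def __init__(self, response_window_hours=48):
        self.window = timedelta(hours=response_window_hours)
        self.pending = deque()  # messages awaiting clinician review

    def send(self, patient_id, text, now=None):
        now = now or datetime.now()
        self.pending.append({"patient": patient_id, "text": text, "sent": now})
        due = now + self.window
        # Set a concrete expectation instead of implying instant availability.
        return f"Message received. Your clinician will reply by {due:%A %H:%M}."

q = MessageQueue()
print(q.send("p1", "Could we adjust my evening prompts?"))
```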

RISKS AND ETHICAL CONCERNS
A pervasive therapy system could present significant opportunities for bridging the mental healthcare gap, but logistical and ethical considerations must be addressed for the main stakeholders in these systems: clinicians and patients.
Clinicians are often overloaded, with burnout a recognized issue; not only does this affect clinicians' own wellbeing, but it can also be tied to worse outcomes for patients.15 To safely and effectively adopt a pervasive therapy system, care must be taken to integrate the system into existing workflows, which includes sufficiently training clinicians. Fiske et al.19 described a lack of framework around how clinicians should be trained for, and engage with, digital mental health interventions involving AI. Not only would clinicians need to become comfortable with the use of a technical system within their practice, but they would also likely need to provide a level of support and guidance for new patients using the system. This guidance could relate to system navigation and expectation management (e.g., of the level of clinician involvement and AI capabilities), as well as recognizing when a patient may be better suited to solely face-to-face therapy. It would benefit all stakeholders to involve clinicians in the design and deployment of future pervasive therapy systems to better understand the opportunities and limitations of integrating such a system into an established healthcare framework.
Patients' interaction with a pervasive therapy system may influence their future mental healthcare engagement. As has been noted with conversational agents, user expectations may exceed actual system capabilities, with this gulf between expectations and reality leading to feelings of frustration and subsequent attrition.7 Of notable concern here is the possibility that patients who become disillusioned with the system could subsequently lose confidence in human practitioners or therapy as a whole.15 Conversely, if these systems are available all the time and satisfy expectations, there is a risk that patients may overengage, becoming dependent on them.19 This prospect could be mitigated through clinician involvement, for example, in ensuring informed consent, to aid patients in understanding the capacity of the system and its shortfalls, as well as highlighting when these shortfalls reflect the system rather than the therapeutic methods. In addition, patients and clinicians should be aware of the potential limitations of system insights, so as to encourage human autonomy in decisions relating to patient treatment.20 This further highlights the importance of training clinicians so that they can effectively induct patients.
Pervasive therapy systems will collect substantial amounts of sensitive data about each patient. For these systems to operate, each conversation-based interface must be sufficiently informed about the patient, requiring some level of information sharing between devices. Our architecture illustration presents a centralized approach, whereby each device communicates with a central entity, which collates the information from each device. We chose to illustrate a centralized approach because this reflects contemporary consumer technology, which uses the "cloud" as a central entity to share data between a patient's devices. These cloud-device ecosystems are typically operated by commercial companies that people would need to trust to protect and not misuse their data (e.g., using the patient model to inform marketing). 7 Alternatively, a decentralized approach could be taken whereby data are stored locally on devices and exchanged directly between them. This approach offers patients greater control of their data, but is a departure from contemporary consumer technology. For either approach, care would need to be taken to ensure patients fully understand how and why their data is being used, so that they are informed enough to decide what they do or do not consent to. 20 This understanding should include how the patient's use of the system may be analyzed to inform research and development, for example, if their data may be used for the training of machine learning models. 20 It should also include the conditions under which their data may be shared with third parties (e.g., due to legal proceedings).
An open challenge of AI-involved therapies is determining who has a responsibility to act when patients disclose information indicating that they are at risk of harming themselves or others. Laws are in place in certain parts of the world whereby clinicians must disclose if they feel patients are at risk; if clinicians fail to do so and patients act on these feelings, the clinicians could be held liable.15 Similar laws have not yet been established for interventions involving AI, but would bring clarity for both patients and clinicians, ideally before they are needed. In the interim, Fiske et al. noted that a solution to this concern in computer-guided therapies is to mandate continuous monitoring by a medical professional,19 although this is labor intensive for clinicians and so may not be feasible in a pervasive therapy system that is always available. Automated approaches could be used; however, as previously discussed in relation to patient monitoring, automated risk detection is not infallible. As such, there is a complex and unavoidable design challenge of balancing AI and human risk detection when designing systems for mental health support.

CONCLUSION
Future systems applying conversation-based interaction to EMI have the potential to be effective tools for supporting people with their mental health. Open challenges remain around how to create systems capable of planning and delivering interventions using conversation-based interaction, how to design context-aware, conversation-based interactions, and how to involve clinicians. Both the technical constraints of the system (e.g., its natural language understanding capabilities) and human factors (e.g., clinician capacity and patient perceptions) will have a profound impact on the design. It is worth noting that a number of the concerns raised are not immediately answerable, nor should we assume that they can be easily addressed with technical solutions. By working with both clinicians and patients, we may be able to address these concerns and productively explore these research directions. This could enable us to create pervasive therapy systems that afford patients different interaction modes to suit different contexts, allowing for "real time, real world" interventions to be provided at many different "real times" in a myriad of "real worlds."