Investigation of a Web-based Explainable AI Screening for Prolonged Grief Disorder

Losing a loved one through death is known to be one of the most challenging life events. To help the bereaved and their therapists monitor and better understand the factors that contribute to Prolonged Grief Disorder (PGD), we co-designed and studied a web-based explainable AI screening system named "Grief Inquiries Following Tragedy (GIFT)." We used an initial iteration of the system to collect PGD-related data from 611 participants. Using this data, we developed a model that could be used to screen for PGD and explain the different factors contributing to it. Our results showed that a Random Forest model using Bereavement risk and outcome features performed best in detecting PGD (AUC=0.772), with features such as a negative interpretation of grief and the ability to integrate stressful life events contributing strongly to the model. Afterwards, five grief experts were asked to provide feedback on a mock-up of the results generated by the GIFT model and to discuss the potential value of the explanatory AI model in real-world PGD care. Overall, the grief experts were generally receptive towards using such a tool in a clinical setting and acknowledged the benefit of offering users a personalized result based on the explainable AI model. Our results also showed that, in addition to the explainability of the model, the grief experts preferred a more "empathetic" and "actionable" AI system, especially when designing for patient end-users.


I. INTRODUCTION
One out of ten individuals who experience the death of a loved one is at risk of Prolonged Grief Disorder (PGD) [1], a mental disorder characterized by intense, distressing and disabling symptoms in which mourners experience protracted and preoccupying yearning, emotional numbness, identity disruption and a lack of meaning in the absence of their deceased loved ones. Despite high levels of grief in the first 12 months, normal grievers gradually come to terms with their sadness, and their grief decreases as they move toward acceptance of the loss. However, grievers who develop PGD often describe feeling "stuck" in their grief and suffer chronic symptoms such as emptiness and bitterness. Overall, this condition poses one of the highest risks for suicide among mourners. When left untreated, individuals with PGD have been shown to suffer from these symptoms for a protracted period of time, even lasting up to decades. Previously, ample evidence was put forward to support its inclusion in the DSM-5 (the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition, a standardized classification of psychiatric disorders used by mental health professionals in the United States) [2] as a mental disorder, but there were concerns about its symptomatic overlap with other disorders such as Major Depressive Disorder (MDD) and Posttraumatic Stress Disorder (PTSD). It was not until 2020 and 2021 that PGD was officially included in the ICD-11 (International Classification of Diseases, Eleventh Revision, a global standard for health information and causes of death) [3] and the DSM-5-TR with valid diagnostic criteria (see [4], [5]), making it a newly recognized, distinct mental disorder that requires attention and research to develop effective interventions.
Due to the recency of this disorder's official recognition and low public awareness of it, mourners with PGD often lack awareness of their condition and are reluctant to seek help from mental health professionals [6]. Administering grief interventions to normal grievers as a preventative measure, regardless of their risk of developing PGD, could be counterproductive, as it could instead disrupt their natural coping process [7], [8]. On the other hand, delaying targeted treatment for grievers with PGD could be detrimental to their future well-being. As such, it is particularly important to identify grievers who are vulnerable to PGD so that clinicians can start treatment early and possibly prevent the disorder from worsening. In addition, it is also important to be able to explain to bereaved individuals the nature of their grief and the associated risk factors so that they become more aware of their condition. Yet this can be difficult, as bereaved individuals who suffer from more intense and protracted grief syndromes may be reluctant to visit mental health professionals [6], suggesting a need for alternative approaches to support the bereaved.
With the proliferation of digital devices and expanding internet coverage, online activities are increasingly interwoven into our everyday experiences, including the experience of losing a loved one and mourning the loss [9]. Vanderwerker and Prigerson showed as early as 2004 that more than half of the bereaved used online platforms for support [10], and other studies show how social network platforms have been used by bereaved individuals to maintain continuing bonds with the deceased [11]-[14]. Such examples highlight the potential of online technology to support people in grief, and the accessibility and anonymity of these platforms could provide a valuable resource to help them cope with their losses. Furthermore, recent advances in artificial intelligence have also enabled the development of automatic algorithms that can help screen for and detect various mental health issues such as depression [15] and PTSD [16]. Despite preliminary findings in recent psychological and computer science studies, the use of technology to prevent the development of PGD remains scarce. In particular, given the emerging development of explainable AI systems, we see great potential in using this technology to develop a user-friendly system that could help bereaved individuals monitor their condition while providing meaningful feedback about their grief, detecting early signs of PGD and helping prevent its occurrence.

A. RESEARCH AIMS
In this work, our main aim is to develop a web-based system to support people in the early stages of bereavement (the first 12 months), especially those prone to experiencing prolonged and intense grief. In particular, we seek to examine how classification and explainable AI models can be used as part of this system to support such individuals in a broader mental healthcare service setting. Hence, while we aim to develop an explainable AI model based on the existing grief self-monitoring system, we also aim to use this model as a probe in a qualitative examination to further understand potential use cases and optimize the system so that it can be implemented effectively to support grief care. Overall, the objectives of our study can be summarized as follows: 1) to develop and evaluate an AI model that is effective in screening for PGD and describing the various risk factors for this condition, which can be used as part of a web-based platform to support bereaved individuals during the early stages of grief; and 2) to carry out a preliminary qualitative examination regarding the use of explainable AI models and identify the opportunities and challenges of deploying such models in the grief counseling service process. Overall, our study is divided into four key phases. The first phase concentrated on the development of an internet-based application to collect data related to the experiences of bereaved people. We developed a research instrument termed GIFT (Grief Inquiries Following Tragedy) that was deployed to collect data from 611 participants. In the second phase, we used this data to develop a model to screen for PGD and evaluated the model using cross-validation. The results showed that certain supervised learning models such as Random Forest can provide satisfactory classification performance, better than other models such as logistic regression (LR) and support vector machines (SVM).
In the third phase, we used the SHapley Additive exPlanations (SHAP) feature attribution method to explain the underlying risk factors that contributed to the classification of PGD for each user. In the final phase, we created a mockup of the results from the explainable model, which was then evaluated by grief experts for its potential value.
The remainder of this paper is organized as follows: in Section 2, we discuss related research. In Section 3, we present our model, explaining the features we designed. In Section 4, we evaluate our model with the collected data and highlight how the different risk factors contributed to the classification of PGD. In Section 5, we describe the themes derived from our interviews with grief experts regarding the applicability and contexts of using such an AI model and the potential use of such a system in grief care. In Section 6, we discuss the findings resulting from this work. Finally, Section 7 concludes the paper.

II. RELATED RESEARCH
In this section, we highlight related research in four key domains. First, due to the relative novelty of PGD as a recognized mental health disorder, we provide an overview of this condition by highlighting previous theoretical studies related to loss and grief (e.g., Bowlby's attachment theory [17], Kübler-Ross's five stages of grief [18] and Neimeyer's meaning reconstruction theory [19]). Next, as digital technology increasingly mediates and influences how we mourn and grieve, we outline several studies in the field of Human-Computer Interaction that have shown how such technology can play a role in grief care (such as helping establish continuing bonds [9], [13] or providing social support [20]). However, few studies have explored diagnosing and treating PGD using state-of-the-art technologies such as machine learning. Given that a key objective of this study is to develop a machine learning model to help screen for PGD, we next provide an overview of how machine learning models have been developed and used in previous studies to monitor and screen for similar mental health conditions, to provide context for our research. Finally, we focus more specifically on explainable AI, discussing the value of such models and highlighting past studies that have developed explainable and interpretable models for use in healthcare diagnosis and screening.

A. MALADAPTIVE REACTIONS TO GRIEF AND PROLONGED GRIEF DISORDER
Even though recent studies show that the majority of mourners demonstrate some resilience against the stress of losing a loved one [21]-[23], and some researchers even note enhanced spirituality and meaning making as positive outcomes of coping with loss [24], [25], a subset (10%) of mourners can develop a maladaptive response to their losses referred to as Prolonged Grief Disorder (PGD) [26]. Individuals with PGD can experience atypical dysfunction in their daily life for a prolonged period after their loss (a pervasive yearning for the deceased, intense emotional pain, etc.) [27]. PGD was recently introduced into both the Diagnostic and Statistical Manual of Mental Disorders, 5th Edition, Text Revision (DSM-5-TR) and the International Classification of Diseases, 11th Revision (ICD-11) as a mental disorder [27]. The PG-13, a diagnostic tool developed by Prigerson et al., specifies six necessary criteria for identifying PGD [28], [29]: (A) the duration criterion (at least 12 months after the loss); (B) a significant degree of yearning for and preoccupation with thoughts of the deceased; (C) at least three of eight clinically significant cognitive, emotional and behavioral symptoms (e.g., avoiding reminders of the loss, disbelief or emotional numbing over the loss, identity disruption or difficulty trusting others); (D) the impairment criterion (experiencing social or occupational dysfunction); (E) the duration and severity of bereavement exceed the social, cultural, or religious norms of the individual's culture and context; and (F) the symptoms are not better explained by another condition such as Major Depressive Disorder (MDD) or Posttraumatic Stress Disorder (PTSD) [30]. Because PGD was only recently recognized, researchers and healthcare providers rely primarily on the PG-13 and PG-13-R [9], [31]-[33] (see Sekowski and Prigerson's comparison of PGD diagnostic tools in [34]) as the state-of-the-art diagnostic approach.
However, it is also important for future studies to provide a more in-depth understanding of the field experience of accurately diagnosing PGD, in order to advance the treatment of this mental disorder.
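To make the logic of these criteria concrete, the checks could be sketched as a simple rule-based function. This is a hypothetical illustration only: the boolean inputs and the symptom-count threshold are our simplification of the criteria described above, not the validated PG-13 scoring procedure.

```python
def meets_pgd_criteria(months_since_loss: int,
                       has_yearning: bool,
                       symptom_count: int,
                       has_impairment: bool,
                       exceeds_cultural_norms: bool,
                       better_explained_by_other_disorder: bool) -> bool:
    """Hypothetical rule-based sketch of criteria A-F described above.

    Not a substitute for the validated PG-13 scoring procedure.
    """
    return (months_since_loss >= 12                      # A: duration criterion
            and has_yearning                             # B: yearning/preoccupation
            and symptom_count >= 3                       # C: >= 3 of 8 symptoms (assumed threshold)
            and has_impairment                           # D: functional impairment
            and exceeds_cultural_norms                   # E: exceeds cultural norms
            and not better_explained_by_other_disorder)  # F: differential diagnosis
```

In practice, each boolean would itself be derived from scored questionnaire items rather than supplied directly.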
For the individuals who suffer from PGD, psychotherapeutic interventions are needed to support the adaptation and acceptance of their losses. On the other hand, offering grief counseling and therapy to normal grievers could have a deleterious effect and even disturb the natural bereavement process [7], [35]. Since providing grief counseling to normal grievers may be unwarranted, a screening tool that helps determine who might develop more severe forms of grief and benefit from psychotherapeutic intervention is of importance. In addition, since it could be difficult for grievers to recognize early stages of complicated grief and understand the underlying factors that contribute to this condition, it would be beneficial to devise a mechanism that is able to meaningfully explain to bereaved individuals their risk factors for prolonged grief.

B. THE USE OF TECHNOLOGY IN GRIEF CARE
Technology is playing a larger role in many people's bereavement-related activities and grief coping experiences, as grievers frequently use it to support their coping before turning to psychological interventions [10], [36], [37]. Studies as early as 2004 have demonstrated that more than half of bereaved individuals have used online bereavement support, and such resources have been shown to have potential in preventing and protecting the bereaved from further mental disorders [10], [36]. However, Krysinska and Andriessen cautioned that resources created by professional bereavement organizations are often not immediately available, calling the quality of online bereavement support and the authenticity of its information into question [37]. Regardless, bereaved individuals often turn to online bereavement forums for emotional support and to share experiences anonymously. Massimi et al. have described an exploratory design of an online bereavement support forum [38]. Social network users frequently post reminiscing photos or messages to their deceased loved ones, maintaining continuing bonds with the deceased through such practices. Funeral companies have begun to stream memorials or funerals online [39]. In addition, psychologists are also offering grief interventions or advice through emails or their personal pages [40]-[42]. Overall, these examples demonstrate the potential of technology to support bereavement-related activities.
In the context of self-management, the investigation of technology use in grief care is a relatively understudied area.
Prior studies tend to focus more on the management of chronic conditions such as diabetes or heart disease, aiming to help users better manage their conditions through dietary restrictions or a healthy physical exercise regime [43], [44]. While ample self-management applications have been studied and used to support patients in popular mental health fields such as depression, most focus on such disorders when they arise from specific illnesses such as cancer or stroke [45], [46]. For grief, early initiatives to develop digital self-help technology for bereaved individuals include "My Grief", a mobile application built to educate users, help them monitor their grief and provide self-guided exercises based on established Cognitive Behavioral Therapy principles [47]. Prigerson's team has offered an online PGD self-screening tool named the Grief Intensity Scale (based on an adapted PG-13, manuscript under review) since 2015 and had received over ten thousand (10,818) submissions by 2021, showing a strong yet largely unmet demand for grief self-management. Overall, despite the obvious and time-sensitive demand for grief self-management resources, few studies have offered guidelines and reflections on best practices for such technology.

C. MACHINE LEARNING MODELS TO MONITOR FOR PSYCHOLOGICAL DISORDERS
Following recent improvements in artificial intelligence, a number of studies have begun to examine whether models could be constructed to automatically and accurately screen for and monitor the occurrence of various psychological disorders and conditions. Research in this context can be divided into 1) studies that develop machine learning models to automatically screen users for various psychological conditions using their social media profiles and 2) studies that develop machine learning models to monitor users for various psychological conditions based on a set of predetermined variables (e.g. demographic, lifestyle and psycho-social factors) [48], [49]. One example of the former type is a study that sought to develop machine learning models to automatically screen social network users (Facebook, Twitter etc.) for depression using text and visual features [50], predetermined linguistic cues [15] or networked graph features [51]. While such models could be useful in screening for psychological disorders from a public health perspective, they require users to be members of such services and have an active digital profile. An example of a study in the second group is one that aims to classify people with cancer into those with low and high levels of depression [52]. Another study used features such as physical health disorders, demographics and psychiatric disorders to predict suicide risk [53]. When screening and monitoring for complicated forms of grief, however, most methods rely on non-automated approaches, and no models have yet been developed to detect maladaptive forms of grief.

D. EXPLAINABLE AI IN HEALTHCARE
Although the recent development of data-driven AI promises to automate diagnosis, screening and monitoring of patients [54], many AI models function as 'black boxes'. Often, it is not possible for human users to understand how these models, such as deep neural networks, combine low-level features through a large number of neurons across multiple layers to arrive at a prediction. Such AI, especially when used in healthcare, presents a significant challenge for a number of reasons. First, no AI system is perfect; despite their impressive performance, they can still make errors. Due to this black-box nature, we cannot understand why a particular mistake was made, which prevents us from improving the system to avoid similar mistakes in the future [55]. Second, it is important for clinicians and patients to understand why a given machine response was made in order to make informed decisions regarding subsequent treatments [56]. This is particularly relevant in mental health, where treatments often take the form of psychotherapies, which rely on the clinician's in-depth understanding of the underlying issues related to the particular disorder. In other words, simply knowing that a patient has a particular mental health disorder (e.g. grief disorder) is not sufficient. We need the AI to tell us why patients are experiencing symptoms related to grief disorder, so that clinicians can tailor the psychotherapy program to tackle the underlying problems. Although in recent years researchers have attempted to develop explainable and interpretable AI for healthcare domains, especially those involving medical image analysis such as dermatology [57], oncology [58], radiology [59] and pathology [60], to our knowledge none has attempted to use explainable AI for psychological disorders, where the explainability of the models can play a crucial role in informing therapies.
In grief disorder for instance, knowing that the source of a client's preoccupation with the loved one's absence results from distress over unresolved conflict or regrets or unfulfilled wishes in the relationship can help the clinicians focus on addressing "unfinished business" with the deceased through the use of imaginal conversation procedures [61].

III. DEVELOPMENT OF AN EXPLAINABLE CLASSIFICATION MODEL FOR PGD
To develop a web-based system to support people in the early stages of bereavement, we explored two key approaches for monitoring poor bereavement adjustment outcomes during the initial stages of our research. The co-design process we carried out led to two potential PGD screening prototypes: a Natural Language Processing-based "My Grief Journal" and a risk factor-based "Grief Inquiry Following Tragedy (GIFT)" application. Following an evaluation session with mental health experts on both the My Grief Journal and GIFT systems, it was determined that the risk factor approach used in the GIFT system would be more precise and better able to monitor for PGD. Therefore, the GIFT system was selected for further development. An in-depth description of the My Grief Journal and GIFT systems, as well as the co-design session and pilot study, can be found in an earlier publication [62].
The GIFT application was then deployed online to collect data from bereaved users. This data was then used to develop a model to screen for PGD and explain users' risk factors. Our aim is to integrate this model with the existing GIFT system to create a single online application that both screens for PGD and provides bereaved users with a user-friendly translation of the psychological measurements and factors that could lead to this condition.

A. DATA COLLECTION
Amazon's Mechanical Turk (MTurk) was selected as the primary portal for data collection. Previous studies have shown that MTurk helps researchers gather low-cost yet high-quality data from diverse samples of participants, making it a relatively affordable and reasonable choice for an exploratory study like ours [63], [64]. The study participants recruited through MTurk were invited to interact with GIFT through computers or mobile devices. When recruiting participants, the following inclusion criteria were used to select participants who: 1) were over 18 years old; 2) did not belong to a vulnerable population (prisoners, pregnant women, children or any other class of subjects requiring special consideration); 3) were grieving the death of a loved one; 4) focused on a loved one who died more than 6 months ago; 5) focused on a loved one who died less than 5 years ago; and 6) could read English well.

B. REVISED GIFT FOR DATA COLLECTION
Several changes were made to the GIFT application in order to use it for data collection in our study. First, due to ethical requirements, we added a one-page, digitally signed informed consent form to the system. In addition, as participants were not expected to revisit the system after completing the study, the login function was altered: users no longer needed to create a profile with a username and password. To ensure the accuracy of the results, we also added several attention checks to the data collection module in the questionnaire. Attention checks are questions purposely designed to verify that users have paid attention while answering, often by presenting choices that are not valid [65]. For example, the answer "son" to the question "I am the lost person's..." would be impossible if the participant had indicated that their gender was female. After participants completed all the questions in the data collection module, visual feedback was provided, such as their degree of grief and their areas of post-traumatic growth. In addition, a completion code was provided to participants to allow them to revisit the feedback after the study had ended. Screenshots of the GIFT application employed in the data collection study are shown in Figure 1.

C. RISK FACTOR MEASUREMENTS FOR PGD
The measurements included in the data collection module of the GIFT application were originally selected based on the potential risk factors for complicated grief reported by Burke and Neimeyer [66] and several other review papers [67], [68]. These risk factors can be categorized into the following groups: Background Risk Factors (including the socio-demographic factors of the bereaved and the factors related to the deceased), Bereavement Risk Factors (including interpersonal, intrapersonal and situational factors), and Bereavement Outcome Factors (Posttraumatic Stress Disorder, Major Depressive Disorder, Resilience, Integration of Stressful Life Experiences and Posttraumatic Growth). This categorization drew on the categories and frameworks presented in previous studies examining the relationships between different risk factors [66]-[68]. The selected measures were later reviewed and further refined by researchers and external experts.

1) Socio-Demographic Factors of the Bereaved and Factors Related to the Deceased
The personal information questionnaire was developed based on several hypothesized socio-demographic risk factors and information about the deceased: gender, age, relationship with the deceased (spouse or parent), education level, marital status, religion, frequency of religious activities, importance of faith, multiple prior losses, the recency of the death, violent death, pre-loss frequency of contact, the deceased's gender and the deceased's age. This information helped researchers gain a more thorough understanding of the loss circumstances and validate whether these hypothesized risk factors contributed to the classification of PGD. A total of 21 questions were included. These represented the Background Risk Factors.

2) Bereavement Risk Factors (BRF)
The Bereavement Risk Factors (BRF) questionnaire sought to evaluate several evidence-based risk factors for prolonged grief. Each of the BRF questions corresponded to a specific risk factor, and the questionnaire comprised 25 questions (marked as CG1-CG25 in our model). Five items were derived from confirmed predictors: neuroticism (anxiety-proneness), pre-death dependency on the deceased, low social support, insecure attachment style and discovering the body. Other candidate items were derived from review papers and clinical practice. The questionnaire was a work-in-progress measure co-developed by the researchers and psychotherapy experts specializing in bereavement research and treatment. These factors were collectively termed Bereavement risk factors for PGD.

3) CESD-R
Depression has been cited as an important factor that could be useful in screening for PGD, as syndromal grief often exhibits responses similar to depression. Ample literature has pointed out the high association between PGD and depression. While the two share several symptoms, they are nevertheless distinct enough to be diagnosed as separate conditions [69]-[71]. In the GIFT application, depression was measured using the CESD-R, a 20-item screening instrument measuring depression and depressive disorder based on the criteria defined in the DSM-5 [72]. Depression was considered a factor related to the Bereavement outcome of PGD.

4) PCL-C
Post-traumatic Stress Disorder (PTSD) shares several symptoms with grief but has its own characteristics, such as the tendency to avoid thoughts or reminders of the traumatic event. The severity of PTSD symptoms has also proven predictive of PGD [73]. Furthermore, PGD has been found to be a predictor of PTSD, indicating the high association between these two similar but distinct mental disorders [74]. To measure PTSD, we selected the PCL-C, a widely adopted self-administered scale for PTSD [75]. The PCL-C comprises 17 items that measure the symptoms of PTSD defined in the American Psychiatric Association's Diagnostic and Statistical Manual (DSM-5). PTSD was considered a risk factor related to the Bereavement outcome of PGD.

5) CD-RISC-10
Resilience is defined as one's ability to regain emotional equilibrium after experiencing a potentially traumatic event. Individuals with higher resilience are believed to adjust better when encountering stressful life challenges. The Connor-Davidson Resilience Scale was developed to measure an individual's resilience and has three versions: 25-item, 10-item and 2-item [76]. Researchers must purchase the questionnaire from the developer to be authorized to use it in a study. The 10-item scale was selected for this study out of concern that the 25-item version might overburden participants completing the full set of questionnaires. Resilience was considered a factor related to the Bereavement outcome of PGD.

6) ISLES-SF
Meaning making has been hypothesized to be the crucial mechanism that facilitates adjustment to a stressful life event [77]. The ISLES-SF consists of six items and has been validated to perform well in measuring meaning-making ability after the loss of a loved one. Items 1-3 of the ISLES-SF measure the comprehensibility of the event, and items 4-6 measure one's sense of having a footing in the world after the stressful life event. The integration of stressful life experiences was considered a factor related to the Bereavement outcome of PGD.

7) PTGI
Experiencing post-traumatic growth following trauma is not a rare phenomenon. The types of growth are well documented and can be measured with the PTGI, a 21-item scale measuring positive outcomes following the experience of trauma. Post-traumatic growth is divided into five factors: New Possibilities, Relating to Others, Personal Strength, Spiritual Change, and Appreciation of Life [78]. Post-traumatic growth was considered to be related to the Bereavement outcome of PGD.

8) PGD
The level of complication due to grief was measured using the PG-13 questionnaire. The PG-13 is a robust diagnostic tool for Prolonged Grief Disorder that is widely used in studies related to PGD [28]. The study employed the PG-13 criteria to classify participants as normal grievers or PGD grievers. The PG-13 can be acquired by contacting the researchers who developed the measure.

D. DATA COLLECTION PROCEDURE
The data collection module in the GIFT system was used to collect data to train the model. The details of the study were posted on the Mechanical Turk (MTurk) website. Participants accepted the "task" on MTurk, proceeded to the GIFT website to sign the informed consent form, and then completed the data collection module. Participants received a validation code after completing the questions in the module and submitted the code back to MTurk for validation. Researchers then reviewed their answers to determine whether the submissions were eligible for inclusion in the study.
To assess the validity of the questionnaire answers, we implemented several attention check questions within the questionnaire system to capture "click-through" users. These were questions related to the recruitment criteria or with a logical relationship to other answers, such as asking participants the gender of the deceased and their relationship to the deceased (e.g., a female deceased should correspond to "aunt", a male to "uncle"). We also tested the system with around five researchers to gauge the average time needed to complete the questionnaire and compared it against our study participants' times. If the completion time was less than eight minutes, the research staff reviewed the submission more carefully. After completing the questionnaires, participants were provided with feedback related to their answers (e.g., their grief level (PG-13 score)). The study protocol was reviewed and approved by the Institutional Review Boards of both the Eindhoven University of Technology (case number Archie 533) and the University of Memphis (PRO-FY2017-286).
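The consistency and timing checks described above could be sketched as follows. This is a hypothetical illustration: the field names, the lists of valid relations, and the data layout are assumptions, with only the eight-minute threshold taken from our procedure.

```python
# Hypothetical sketch of the manual-review flags described above:
# a logical-consistency attention check plus a completion-time check.
# Field names and relation lists are illustrative assumptions.
VALID_RELATIONS = {
    "female": {"aunt", "mother", "grandmother", "sister", "daughter", "wife"},
    "male": {"uncle", "father", "grandfather", "brother", "son", "husband"},
}

def flag_submission(answers: dict) -> list:
    """Return a list of reasons a submission needs closer manual review."""
    flags = []
    gender = answers.get("deceased_gender")
    relation = answers.get("deceased_relation")
    # Attention check: the stated relationship must match the deceased's gender.
    if gender in VALID_RELATIONS and relation not in VALID_RELATIONS[gender]:
        flags.append("inconsistent gender/relationship answer")
    # Timing check: submissions faster than the 8-minute baseline are reviewed.
    if answers.get("completion_minutes", 0) < 8:
        flags.append("completion time under 8 minutes")
    return flags
```

A clean submission returns an empty list; any flags trigger the closer review described above.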

E. TRAINING THE MODEL
When building the model, we examined different combinations of the three risk feature groups described earlier in section 4.3: Background Risk Factors (socio-demographic factors of the bereaved and factors related to the deceased), Bereavement Risk Factors (interpersonal, intrapersonal and situational factors, mainly from the BRF scales), and Bereavement Outcome Factors (Post-traumatic Stress Disorder, Integration of Stressful Life Experiences, etc.). We examined seven different combinations of the three feature groups: 1) Background only, 2) Bereavement Risk factors only, 3) Bereavement Outcome factors only, 4) Background and Bereavement Risk factors, 5) Background and Outcome factors, 6) Bereavement Risk and Outcome factors, and 7) Background, Bereavement Risk and Outcome factors. The exact features used in each group are described in Table 1.
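The seven combinations above are simply the non-empty subsets of the three feature groups, which can be enumerated programmatically (the group names here are shorthand for illustration):

```python
from itertools import combinations

GROUPS = ["Background", "Bereavement Risk", "Bereavement Outcome"]

# All non-empty subsets: 3 single groups + 3 pairs + 1 triple = 7 feature sets.
feature_sets = [
    subset
    for r in range(1, len(GROUPS) + 1)
    for subset in combinations(GROUPS, r)
]

for fs in feature_sets:
    print(" + ".join(fs))
```

Enumerating the sets this way guarantees every combination is evaluated exactly once in the experiment loop.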
The collected data was then used to train a classification model to screen for PGD. Several classification algorithms were examined, including Linear Regression, SVM, Random Forest, XGBoost and a KNN classifier. Missing non-categorical features were filled in with the mean value and missing categorical values were filled in with the most frequent category.
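A minimal sketch of the imputation strategy just described (mean for numeric features, most frequent category for categorical ones), using only the standard library; the actual pipeline presumably relied on a library such as scikit-learn:

```python
from statistics import mean, mode

def impute_column(values: list, categorical: bool) -> list:
    """Fill None entries with the column mean (numeric) or mode (categorical)."""
    observed = [v for v in values if v is not None]
    fill = mode(observed) if categorical else mean(observed)
    return [fill if v is None else v for v in values]

print(impute_column([1.0, None, 3.0], categorical=False))          # [1.0, 2.0, 3.0]
print(impute_column(["low", None, "low", "high"], categorical=True))
```

In a scikit-learn pipeline the equivalent would be `SimpleImputer(strategy="mean")` and `SimpleImputer(strategy="most_frequent")` applied to the respective column subsets.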

F. EXPLAINING THE RISK FACTORS BEHIND THE CLASSIFICATION OF PGD
Shapley values were also calculated for each individual sample to highlight the contributing risk factors for each user based on our model. From these results, we created several mockup visualizations of the explainable AI component for use in the feedback session with grief experts. One was based on a simplified horizontal bar graph (Figure 2, bottom): bars extending to the right denote features that push the probability of PGD higher for the individual user, while bars extending to the left denote features that push it lower. The other was based on the force plot described in an earlier publication [79] (Figure 2, top): arrows pushing towards the right denote features that increase the probability of PGD, arrows pushing towards the left denote features that decrease it, and the size of each arrow denotes the degree of impact that feature has on the classification probability. It should be noted that in the mockup visualizations, we decided to show feature types from both the bereavement risk factors and bereavement outcome factors if they showed a high contribution to the classification of PGD. This includes features such as the patients' levels of depression and PTSD, as we felt it could be useful to highlight to practitioners the degree to which certain comorbidities influenced the classification of PGD for each sample.
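The simplified bar-graph mockup can be illustrated with a small text-based sketch: positive Shapley values draw bars to the right (pushing the PGD probability higher), negative values to the left. The feature names and values below are invented for illustration and are not taken from the model:

```python
def render_bars(contributions: dict, width: int = 10) -> list:
    """Render per-feature Shapley contributions as left/right text bars."""
    rows = []
    # Largest absolute contribution first, as in the mockup.
    for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
        n = round(abs(value) * width)
        left = "#" * n if value < 0 else ""    # pushes PGD probability lower
        right = "#" * n if value > 0 else ""   # pushes PGD probability higher
        rows.append(f"{left.rjust(width)}|{right.ljust(width)}  {name} ({value:+.2f})")
    return rows

demo = {"negative grief interpretation": 0.4,
        "integration of stressful events": -0.3,
        "social support": -0.1}
print("\n".join(render_bars(demo)))
```

The production mockups used proper charts rather than text, but the underlying mapping from signed Shapley value to bar direction and length is the same.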

IV. RESULTS OF THE SCREENING MODEL ACCURACY EVALUATION AND THE CONTRIBUTING RISK FACTORS
In this section, we present the results of the model accuracy evaluation. We first describe the evaluation approach, followed by a discussion of data quality, and then present the evaluation results.

A. EVALUATION SETUP
Prior studies that simulated the effect of different validation strategies (bootstrap sampling, non-repeated cross-validation, repeated train-test splits, etc.) for classification models have shown that repeated cross-validation resulted in the lowest bias and true error rate when tested with a sample size similar to the one used in our study (N=600) [80]. As such, we adopted this approach in our evaluation. To account for the imbalanced dataset, we used a Stratified Group 4-fold cross-validation approach, which ensures an equal ratio of positive and negative samples when dividing the training and validation sets. When training the model, grid search was used to fine-tune the hyper-parameters; Table 2 shows the parameter combinations examined during the grid search. To evaluate the performance of the model, we used the Area Under the Curve (AUC) of the ROC curve. The cross-validation evaluation was repeated 5 times for each model and feature set combination, and we used the averaged AUC score to represent the performance of the model.
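The stratified splitting at the core of this setup can be sketched as follows. This illustration covers only the fold construction (the grouping component, grid search and AUC scoring are elided), and the class counts roughly mirror the 4.9% PGD prevalence in our sample:

```python
import random

def stratified_kfold_indices(labels, k=4, seed=0):
    """Yield (train_idx, test_idx) pairs, preserving the class ratio per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    # Shuffle and deal out each class separately so every fold keeps the ratio.
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for test in folds:
        train = [i for f in folds if f is not test for i in f]
        yield train, test

labels = [1] * 30 + [0] * 570  # ~4.9% positives, as in our dataset
for train, test in stratified_kfold_indices(labels):
    pos_ratio = sum(labels[i] for i in test) / len(test)
    print(f"test size={len(test)}, positive ratio={pos_ratio:.3f}")
```

In practice this corresponds to scikit-learn's `StratifiedKFold` (or `StratifiedGroupKFold` when repeated measures from the same participant must stay in one fold), repeated with different shuffles and averaged.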

B. DATA QUALITY
Overall, a total of 829 users requested the personal login link and 778 signed the informed consent form, though some dropped out mid-study. 611 participants completed the mandatory measures used in the final analyses. Around 4.9% of the participants screened positive for PGD. More than half of our sample was female (64%) and the mean age of participants was 39 years. Table 3 outlines the socio-demographic characteristics of the participants and Table 4 outlines the factors related to their losses. Table 5 shows the performance of each feature set and learning algorithm in our experiment. The best performing model was the Random Forest model using the features from the Bereavement Risk Factors and Bereavement Outcome Factors feature sets (AUC=0.772). On average, this model classified users correctly 93.3% of the time (non-PGD users correctly 95.14% and PGD users correctly 59.3%). It was followed by the Random Forest model using only the Bereavement Outcome Factors (AUC=0.764), which correctly classified users 92.04% of the time (non-PGD users correctly 93.79% and PGD users correctly 58.67%). Table 6 shows the confusion matrices of the best and worst iterations of the best performing model (the Random Forest model using features from the Bereavement Risk Factors and Bereavement Outcome Factors feature sets). In addition, Figure 3 shows the trade-off in accuracy when adjusting the threshold for detecting PGD and non-PGD users; the highest average accuracy for detecting both classes occurs at a threshold of 0.31 (non-PGD users identified correctly 86.91% of the time and PGD users 93.3%). In terms of algorithms, our results showed that the Random Forest algorithm generally performed best across all feature set combinations.
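The threshold trade-off shown in Figure 3 amounts to sweeping the decision threshold over the predicted PGD probabilities and computing per-class accuracy at each point. A sketch with synthetic probabilities (not the study's actual predictions):

```python
def per_class_accuracy(probs, labels, threshold):
    """Return (non-PGD accuracy, PGD accuracy) at a given decision threshold."""
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < threshold)
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= threshold)
    return tn / labels.count(0), tp / labels.count(1)

# Synthetic predictions for illustration only.
probs  = [0.05, 0.20, 0.35, 0.40, 0.60, 0.90]
labels = [0,    0,    0,    1,    1,    1]
for t in (0.31, 0.50):
    non_pgd, pgd = per_class_accuracy(probs, labels, t)
    print(f"threshold={t}: non-PGD={non_pgd:.2f}, PGD={pgd:.2f}")
```

Lowering the threshold below 0.5 trades non-PGD accuracy for PGD sensitivity, which is why 0.31 balanced the two classes best in our data.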

D. OVERALL RISK FACTORS CONTRIBUTING TO THE CLASSIFICATION OF PGD
As described in section III-E, SHapley Additive exPlanations (SHAP) were used to examine the risk factors that contributed to the classification of PGD in our models. As the best performing models all used the Random Forest algorithm, we used the more optimized TreeExplainer method to calculate the Shapley values and determine the local and global importance of each feature. This method enables the exact computation of Shapley values by leveraging the internal structure of tree-based models (see [81]). Figure 4 shows the results.
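Given per-sample Shapley values (as produced by TreeExplainer), global importance is commonly summarized as the mean absolute Shapley value per feature, as in the shap library's summary plots. A standard-library sketch over an invented matrix of per-sample values:

```python
def global_importance(shap_values: dict) -> list:
    """Rank features by mean |Shapley value| across all samples."""
    scores = {f: sum(abs(v) for v in vals) / len(vals)
              for f, vals in shap_values.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Invented per-sample Shapley values for two features, three samples each.
demo = {
    "negative grief interpretation": [0.30, -0.10, 0.40],
    "depression level":              [0.10,  0.05, 0.15],
}
print(global_importance(demo))
```

Taking the absolute value before averaging matters: a feature that pushes some users strongly towards PGD and others strongly away would otherwise look unimportant globally.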

V. FEEDBACK REGARDING THE USE OF EXPLAINABLE AI MODELS FOR SCREENING WITHIN GRIEF CARE
After the development of the model, we carried out a qualitative feedback study to examine, preliminarily, whether and how an AI-powered system and its explainable features could contribute to clinical practice and enhance acceptance among clinical stakeholders. More specifically, we aimed to investigate whether users would adopt such models in their practice and, if so, what the possible use cases and potential challenges would be. It should be noted that we adopted a qualitative approach in this part of the study because this particular use of machine learning and explainable AI (in grief care, to diagnose PGD, a mental health disorder that has only recently been formally defined) takes place in a relatively novel context with little prior research, and we therefore felt it was important to first understand the stakeholders and the possible use context rather than evaluate the effect of specific explainable AI strategies. As many previous attempts to introduce AI have suffered from skepticism among medical stakeholders (even with highly accurate models), we believe such an understanding plays an important role in developing an AI system that is well accepted and put into actual practice [82]-[84]. Overall, 5 practitioners and experts in grief care (1 grief care nurse (female), 1 psychiatrist (male), and 3 practicing clinical psychologists (all female)) were recruited and interviewed to provide qualitative feedback on the explainable AI system. Participants were first given an explanation of the overall GIFT system, the classification model developed (e.g., which risk factors were used) and the concept of explainable AI. Afterwards, through semi-structured interviews, they were asked to discuss their perceptions of how machine learning models and explainable AI systems could be used in clinical practice and to provide feedback on the mockup of the results generated by the GIFT explainable model.
In particular, given the lack of existing research on the use of explainable AI systems in mental healthcare in general and grief care in particular, we decided to focus our inquiry on: 1) How can a screening and explainable AI tool such as this be used in your clinical practice with grief patients? 2) What would the potential advantages and pitfalls of such a system be? and 3) What would be the best way to present data in the explainable AI model to your patients/the clinical staff?
The interviews were then analyzed using thematic analysis [85]. First, the interview data were read through to gain an initial overall understanding. Afterwards, three researchers with experience in HCI for healthcare and machine learning coded the data. Codes showing similar patterns were then grouped into themes, which were subsequently refined.

A. INTERVIEW RESULTS
Generally, all of our participants agreed that an explainable AI system able to offer personalized screening results would be useful both to grievers (or patients) and to healthcare practitioners. For practitioners, such a tool would be useful in providing an overview of the grief status and the underlying risk factors of each patient. For bereaved individuals, such a personalized system could help them better understand their own symptoms and whether they are potentially experiencing any complications. Interestingly, while self-help screening and assessment tools in the form of online questionnaires are widely available on the internet for a variety of mental health conditions, P01 believed that a personalized report generated by the explainable AI model would better enhance their acceptance among grievers. This is particularly valuable, as the literature has shown that personalized cognitive-behavioral therapy (CBT) is more effective than standard CBT for conditions such as autism spectrum disorder [86]. Considering that grief is a highly personal experience, it is likely that by improving users' acceptance of the screening results, the system could further contribute to better self-monitoring of their mental wellbeing.
It is hard for people to recognize states such as depression, and it is difficult for them to make sense of whether their symptoms are normal or abnormal. Such a system would be useful in allowing patients to more objectively understand their condition.

Participants were also surprisingly receptive towards the use of such a tool in their clinical practice. For instance, during the interviews P01, P02, and P03 highlighted that they would be willing to integrate such a system into their intake process before patients visit the therapists. Furthermore, participants also discussed the value of such tools when employed longitudinally to monitor changes in the patients' risk factors.
I could see it being something that could be a part of that that's generated from that intake process...the process there was the person would have an intake and like a full psychosocial intake with another. I was often doing them with the person, and I would be at the computer asking them the questions, filling out the reports, the different questionnaires, and then whichever counselor was assigned to that new patient, they would have that full intake to review before they met with the person. (P01, Licensed Psychologist)

At regular intervals, to be able to monitor progress on symptoms is typically how that would be used. So this is something that could kind of happen at those frequent interviews as well. And then certainly at the end of a treatment course to be able to speak to and give feedback on what's changed and what hasn't changed. (P02, Social Worker)

P03 further suggested that a personalized risk factor report can support mental health professionals in triaging patients, redirecting them to necessary treatments or helping identify an area to focus on. Such tools, they believed, would be valuable for counsellors who are not experts in grief but have to deal with grieving patients in their practices.
Many people, first of all, are still not trained properly [for grief therapy]. So I think this would be really helpful to kind of first of all to also give people a sense. What are the main areas for indicating possible risk and also possible protective factors? And then what we need to tackle? What can we maximize? So at least it gives some people the kind of ideas and especially for those who are not really in grief counseling and therapy. (P03, Clinical Psychologist)

Due to the highly emotional nature of the grief experience [27], our participants also expressed concerns regarding the use of an automated screening model as part of a self-monitoring system. P03 mentioned that users could experience severe negative emotions when using the system unsupervised and that some safety measures should be applied to the process. This aspect is discussed further in section 6.1.2.
We are not sitting there keeping an eye on them to see whether they may react into hysterical cry and so on. So maybe [some] kind of safety measures would be still applicable, just like any other kind of research studies that we do. (P03, Clinical Psychologist)

In-depth analysis of the interview data allowed us to identify three key themes, each illustrating a unique perspective on how experts in grief care viewed the use of machine learning models and explainable AI in their practice.

1) Screening result as a "rule of thumb" rather than a "ruler"

Previous studies demonstrated that the accuracy of an AI model plays a key role in the level of trust users have in the system (which in turn significantly influences their acceptance of it) [87], [88]. Hence, we were uncertain whether the performance of our model was sufficient for practitioners to adopt it in a clinically meaningful way. To our surprise, our participants did not seem to place a strong emphasis on the accuracy of the models. P04 and P05 indicated that in treating psychological conditions, most conventional psychometric tools are used as a reference rather than to provide a concrete diagnosis. P02 further mentioned that in mental healthcare, such a mechanism relies heavily on users' self-reports and the screening results can therefore be rather subjective.
The lower accuracy of a machine learning model might not be so bad in terms of acceptance, as the psychometric tools that they use today don't have 100% [accuracy]; most people use such tools only as a reference and always explain to the patients that these measures don't allow us to understand everything about your condition and are only used as reference. (P05, Psychiatrist)

We can't really make a diagnosis from these types of survey questions, but I think it could kind of allude to [the symptoms...like,] people who report symptoms in this level often find it helpful to talk with their doctor or talk with a mental health professional about those to see for further assessment or something like that. (P02, Social Worker)

In this sense, the explainable nature of our AI model could be useful in diagnosis, as it presents clinicians with the risk factors believed to have resulted in PGD. They could then compare these factors with their own mental model of what they believe to be the causes and symptoms of prolonged grief and form their own judgement. As such, the result is seen more as a "rule of thumb" approximation than an objective measurement of the condition.

2) From explainable to actionable AI
Participants emphasized in the interviews that patients tend to have little knowledge about their risk factors and symptoms, and hence would appreciate more concrete and actionable advice on what they could do after receiving the diagnosis. Although some of the risk factors only provided clues as to which therapies or actions could be helpful in confronting them, participants felt that the current system could benefit from linking these risk factors directly to the kind of actionable advice that therapists usually provide during treatment. This was not surprising, as nudging users towards positive and even therapeutic behavioral changes is usually the purpose of a self-monitoring system. By raising awareness of problematic symptoms or behaviors, such systems generally aim to increase the likelihood that users will want to change their behaviors [89]. As such, participants felt that after helping users understand their grief profile, the natural next step should be to make clear what actions could be taken to address the factors posing risks to their coping. For example, P01 and P02 suggested that patients with more severe symptoms and a predicted risk of PGD should be provided with a warning and advice on how and where they could receive medical assistance.
If at some point the questionnaire [indicates] a clinical level, then they should receive some sort of warnings or the action items. (P01, Licensed Psychologist)

[The advice can be,] you might want to consider reaching out to either, like, your family doctor or a mental health professional to discuss those, just to give some sort of kind of actionable [advice]. (P02, Social Worker)

P03 and P02 further pointed out two specific follow-up actions the system could provide: i) connecting users to clinically validated self-help resources and ii) connecting users to mental health professionals in relevant domains based on the model results.
If I actually didn't know that I was that bad. But after I do this, I realized that all the score or the red bar becomes so confronting and that I can't really deny my bad situation...Then how can we bring a closure for this portion? For me, I imagine that it could be the kind of national hotline information or a link to where they can then seek help if they would like to. (P03, Clinical Psychologist)

We hear so much from people about how hard it is to understand how to find somebody in the mental health field to connect with...if there were, like, a couple of links that they could use to start or something like that, that could, you know, make it a little easier to connect to the right place. (P02, Social Worker)

However, some participants cautioned against the AI system directly providing actionable advice for patients to self-manage their risk factors. This was understandable given that in PGD treatment, the mental health professional plays a critical role in offering guidance and adjusting the therapy based on the progress of the disorder [90]. However, P01 did agree that the system could at least motivate users to implement protective factors (activities to improve resilience, etc.) against PGD.
If the protective factor is something that they can enact, then I could see, like, learning that being something that somebody might sort of seek out or enhance more in their life. (P01, Licensed Psychologist)

The participants' differing perceptions of risk and protective factors were brought to our attention. Concretely, our participants thought that implementing protective factors was something that could be done by an individual, even with limited knowledge of PGD, but trying to amend risk factors should be done with extra caution and under the guidance of professionals. In other words, in comparison to risk factors, which should be diagnosed and treated by professional healthcare workers, protective factors could be considered a hint for offering self-manageable action advice when coping with grief. Furthermore, this is more aligned with the purpose of developing a self-help system that enables users to perform protective actions that are safe and less likely to cause undesirable outcomes. However, more studies are certainly necessary to examine how action advice based on protective factors can be recommended effectively.

3) From explainable to empathetic AI
Participants often stated that users need a "warm hand" to support and comfort them in the process of coping with bereavement when using the AI system. Although the use of AI has been increasingly explored in healthcare, our interviews, supported by ample literature, showed that patients and healthcare professionals generally have limited trust in medical AI systems [91]-[93]. For example, in She's interview study [94], users complained about receiving "canned" responses and did not learn anything new when the self-monitoring app "simply confirmed that they were indeed not doing well." Furthermore, as the grief literature has highlighted, the bereaved can show a tendency towards denial [18], avoiding reminders of the deceased or becoming emotionally numb to the loss and grief for a period of time [27]. These reactions are normal among individuals who are not yet ready to accept an emotionally distressing event involving death. Therefore, when an explainable AI model has to deliver undesirable screening outcomes, it could be difficult for the bereaved to accept such results. For such users, participants suggested that a human-mediated approach might be necessary, in which a mental health professional carefully guides the patient through the explainable AI results and addresses their concerns and denial in a sensitive manner. In particular, participants cautioned against using GIFT in an unsupervised manner with patients in the earliest stages of grief, as it could be too overwhelming.
For someone who's just bereaved and wants to learn more about their experience, it might be overwhelming or just sort of not as helpful for them to see. (P01, Licensed Psychologist)

We need to have the kind of safety net in case they are so triggered and become so hysterical. Right. We want to ensure that there is someone being with them or we need to hold them in a room, [not] just send them off and say bye. (P01, Licensed Psychologist)

Furthermore, participants suggested that the explainable results should be presented in a manner that is "sensitive" for people in grief, hinting that the system should incorporate the cognitive and affective empathy approaches commonly used in psychotherapy [95], [96] when explaining the results to users. For example, P04 mentioned that the system could try to recognize and relate to what the bereaved is going through and be considerate of their current feelings when explaining the risk factors. In addition, if certain risk factors that are difficult to change (particularly those related to demographics) were found to be predictors of PGD, the reasoning behind them should be explained to users carefully rather than just highlighting their presence through the model.

For example, if it was someone who identifies as homosexual and that's a risk factor, I think you would really want to couch that in an understanding of, like, you know, why that's a risk factor. (P01, Licensed Psychologist)

Although this feedback did not understate the importance of a model's performance or explainability, it did show that presenting the results in a thoughtful manner could potentially lead to more desirable outcomes.
Perhaps they want some answers to help with their coping; they want some more humane and empathetic explanation of their condition, for example, we know that this person is no longer in your life, that's why it is quite painful to you...maybe such an interpretation/assessment would be better. (P04, Grief care Nurse)

This emphasis on sensitivity even seems to extend to the design of the user interface for the explainable AI system. For example, one participant pointed out that numeric results or charts could be perceived as too "mechanical" and "impersonal", and that users might appreciate a less mechanical visualization of the explainable results in the form of emojis (smiling faces) or verbal narratives.
The way that you are analyzing people's characteristics using numbers seems very mechanical....Perhaps instead you can use facial expression scales (emoji) instead of these graphs...or a sort of thermometer to show it as hot and cold temperatures....perhaps a heart mark, a smiley face or a slightly sad face...some sort of illustration like this [might be better]. (P04, Grief care Nurse)

Another important aspect of designing an empathetic AI, participants felt, is that it should be up to patients themselves to decide whether they are shown the explainable results and whether an AI system is used in their treatment. In addition, patients should also have a choice in determining the level of information disclosed through the model. Participants also felt that the context in which the tool is implemented is another factor affecting the level of information that should be disclosed.
When patients want to see [the results]...they should be given a choice...Predetermining a time [in their grief] to automatically present the system to them seems a bit...for people who want to understand their grief and their strengths, this would be helpful to them...Maybe we can make a leaflet to explain to them that there is such a system available and allow participants to access it at a time they want. (P04, Grief care Nurse)

For people who chose to do so, they might want to see all the information. However, for people who were referred to use [the tool] by the doctor [during the treatment session], perhaps there should be a separate report section for the doctor and one that aims to explain information for the patient. (P05, Psychiatrist)

For bereaved individuals, providing advice on what the results imply and how they should understand their grief experiences can be much more critical than providing analytical results and numeric outcomes. An empathetic AI may provide explanation through narratives and storytelling rather than numerical visualizations.

VI. DISCUSSION
To date, there has been a lack of research exploring the use of digital interventions for PGD in the early stages of bereavement [97]. Through our study, we sought to develop GIFT, a tool that can be used to screen for PGD and help bereaved individuals and their therapists better understand their grief, tasks which form the cornerstone of developing effective preventive interventions for PGD. This involved developing a machine learning model trained to classify PGD and to explain to individual users the risk factors that might lead to this condition. While there have been prior studies exploring this topic [26], [66], [67], [98], none have included as many factors as our study, and most aimed at examining whether these factors were associated with the development of PGD in the general population, not at developing a machine learning model that could be used to screen for and prevent PGD. Our best performing model achieved an acceptable AUC score of 0.77 (F-score=0.73, Accuracy=93.3%). Similar models have been developed to screen for other mental disorders or mental health conditions using various data sources, such as depression using social network data (F-score=0.73 using decision trees) [15], depression in senior citizens based on demographic and co-morbidity variables (Accuracy=97.2% using artificial neural networks) [99], anxiety in seafarers based on working conditions and the presence of chronic diseases (Accuracy=89.4% using gradient boosting on decision trees) [100], and PTSD based on demographics, trauma type and psychological co-morbidities (AUC=0.75 using a Target Information Equivalence Algorithm and SVM) [16]. However, to our knowledge, our study is the first to develop a classification machine learning model for PGD.
In addition, while previous studies focused on the training and evaluation of the model [52], [53], [99], in our study we also co-designed and developed an online platform that implements the model by collecting data from users and providing them with meaningful explanations about their grief.
Overall, the results of our experiment evaluating the performance of classification models built with different feature sets provided useful insights into the effectiveness of various risk factors in screening for PGD. Interestingly, socio-demographic factors (gender, low income, education level, religion, etc.) did not seem to be effective features for the classification of PGD. Despite previous literature highlighting how such factors could play a role in the development of complicated forms of grief [67], our evaluation showed that intrapersonal bereavement risk factors and bereavement outcome factors tended to have a larger impact on our models [66], [101]. As such, our findings on the predictive power of demographic factors were misaligned with previous literature [66]. Demographic factors such as being the spouse of the deceased, gender (being female) and age (being younger) did not seem strong enough to make an overall impact on the model. Most of the previous literature used the Inventory of Complicated Grief (ICG) to screen for PGD [102], and there have not been enough studies examining the association between demographic factors and PGD using the PG-13 [28], the state-of-the-art diagnostic tool for PGD. More studies are certainly needed to further clarify the predictive power of demographic factors for PGD.
Factors associated with the successful integration of stressful life experiences, post-traumatic stress and depression played prominent roles, and many of these findings align with previous literature. The ISLES, which measures an individual's ability to adapt to stressful life events, is regarded as capturing one of the critical factors in meaning reconstruction, and researchers such as Neimeyer [103]-[105], Burke [66], [106] and Currier [107], [108] have demonstrated that meaning reconstruction is one of the key predictors of successful adjustment post-loss. Bereavement-related depression and PTSD are known to share many symptoms indicative of grievers developing PGD, such as severe emotional distress and intrusive memories, and a group of studies has also indicated that PGD is often associated with bereavement-related PTSD and depression [109], [110]. Taken as a whole, our results indicate that measures denoting psychological responses towards the loss (e.g. the ability to understand and make meaning out of the loss, the tendency to avoid thoughts about the loss, signs and symptoms of depression) tended to be stronger predictors of PGD than situational features related to the loss (e.g. whether users discovered the body, the suddenness of the death) or features related to the specific relationship characteristics between the individual and their lost loved ones (e.g. whether there were problems and complications in the relationship). Such features may be too circumstantial to have a strong classification effect on PGD for the majority of users. However, they should not be discarded entirely in an explainable model, as some of these features (i.e. the level of dependency on the deceased, a lengthy illness) do contribute strongly to the probability of PGD for specific individuals.
Surprisingly, although high levels of neuroticism have been considered a risk factor for PGD [36], [111], this factor did not contribute strongly to the classification model in our study. As this characteristic is also associated with other bereavement outcome features such as depression [112], which has a stronger impact on the model, neuroticism may be more of a covariate than a direct predictor of PGD. While social-support-based features have been shown to be significant risk factors in a number of studies [10], [113], [114], only a few were moderately strong predictors in our model. For most users, features such as a lack of social support while grieving (i.e., not having someone to talk openly with about their grief) were not strong predictors of PGD. One interpretation is that such social support might only predict PGD if the griever felt it was needed or was dissatisfied with the support received [115]. However, features such as poor family dynamics and caregiver burden seem to contribute moderately to our models for some users, indicating that social-support-based bereavement risk factors associated with family relationships have a better overall effect in classifying PGD. In addition, compared with measures representing the nature of the relationship with the deceased (perceived level of dependency, whether the relationship was perceived to be problematic, etc.), measures related to how participants form relationships with others in general (such as attachment styles) played a stronger role in classifying PGD. In particular, the measure of insecure attachment styles was a strong predictor of PGD in our models, confirming previous studies suggesting that avoidant or anxious attachment styles could be associated with complicated forms of grief after loss [116], [117].
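The kind of feature-importance analysis discussed above can be sketched as follows. This is an illustrative example only, not the authors' code: it trains a Random Forest on synthetic data in which hypothetical outcome-style features (stand-ins for ISLES, depression, and attachment measures) carry the signal while a demographic feature does not, and then ranks the global feature importances.

```python
# Illustrative sketch (synthetic data, hypothetical feature names):
# training a Random Forest classifier and ranking feature importances.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 600
# Synthetic features: outcome-style measures drive the label; age does not.
isles = rng.normal(size=n)           # integration of stressful life events
depression = rng.normal(size=n)      # bereavement-related depression
attachment = rng.normal(size=n)      # insecure attachment
age = rng.integers(18, 80, size=n)   # demographic feature with no signal here
logits = 1.2 * depression - 1.0 * isles + 0.8 * attachment
y = (logits + rng.normal(scale=0.5, size=n) > 1.0).astype(int)
X = np.column_stack([isles, depression, attachment, age])
names = ["ISLES", "depression", "attachment", "age"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
ranked = sorted(zip(names, clf.feature_importances_), key=lambda t: -t[1])
print(f"AUC={auc:.2f}")
print(ranked)
```

On data constructed this way, the importance ranking recovers the planted structure (the depression-style feature dominates, age contributes little), mirroring how the global ranking in our evaluation separated outcome measures from demographic ones.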
Finally, the qualitative feedback session with grief researchers and practitioners on our explainable AI UI mockup highlighted several interesting insights into the use of machine learning models in grief care. While earlier work aiming to develop models for diagnosing mental health conditions tended to emphasize performance [99], our findings echoed those of more recent studies seeking to put such models into practice, showing that the explainability of the model can be equally essential in enhancing user acceptance of the system [88], [118]. In the context of PGD in particular, interview participants mentioned that, in practice, diagnosing whether a user has a mental health condition is often subjective; while practitioners do use the results from diagnostic tools, these serve only as a reference, and practitioners also examine other factors such as symptoms or risk factors. Participants viewed the self-monitoring system in a similar manner and regarded the accuracy and predicted score from the system as a reference rather than a definitive index. The explainable aspects of the model were thought to allow users to better trust the diagnosis, to increase awareness and understanding of their condition, and to help practitioners form their own judgments about the patient's condition. Similar benefits of an interpretable machine learning model (particularly when used as part of a decision support system) have also been reported for clinicians in medical domains beyond mental healthcare, such as diagnosis in pathology and oncology [119].
Furthermore, compared with conventional explainable AI systems, which tend to be designed for medical staff [59], [60], our results also highlighted the importance of moving from an explainable AI system to a more actionable and empathetic one when designing for patients or end-users in grief care. One reason for this could be the sensitive nature of therapeutic care for bereaved individuals. Having so recently lost a loved one, such individuals can become "stuck" in their grief, finding it difficult to take the actions necessary to move forward in a positive manner with their losses [27], [28], [110]. It is therefore understandable that participants felt an actionable model able to suggest tailored mental health resources would be more useful than a model that could only explain risk factors. In addition, the importance of patient choice in deciding whether they should be presented with the results draws a parallel with treatment practices found in the therapeutic alliance and patient-centered care [120], suggesting that participants frame the use of AI through the same principles they apply in their therapeutic practice. Finally, while the results from an explainable AI model could be presented empathetically through careful mediation and dialogue with therapy staff, our results also highlight how the interface itself could be designed to convey an empathetic explanation of the condition to patients (e.g., through pictograms or narration instead of bars or numbers). This, we believe, points to an interesting design opportunity: an explainable AI interface could be designed not only to be easy to interpret [121], but also to be comforting and sensitive to the emotional state of mental health patients.

VII. CONCLUSION
In our research, we aimed to develop an online application to support bereaved individuals in the early stages of grief, especially those at risk of developing Prolonged Grief Disorder (PGD), by building a screening system with explainable AI features and interviewing grief experts about the contexts in which such systems could be applied and deployed to support grief care. We utilized a previously developed "Grief Inquiries Following Tragedy" (GIFT) application for data collection and feature demonstration. The application helps screen bereaved individuals for PGD and, at the same time, helps users better understand their condition and the risk factors associated with developing more complicated forms of grief. An earlier iteration of the application was deployed to collect data online from 611 users who had recently lost a loved one, and based on this data we developed an explainable model for PGD that is used as part of the application. After experimenting with different machine learning algorithms and feature-set combinations of PGD risk factors, we found that a Random Forest model trained on the bereavement risk and outcome factors achieved the best classification performance. Five experts in grief care were then interviewed to provide qualitative feedback on the use of screening and explainable AI systems in their practice as a means to screen for and monitor PGD. A thematic analysis of the interview results highlighted three key themes, including how screening models in mental health could benefit from a more empathetic and actionable AI, and the importance of patient choice in deciding whether the explainable results should be shown.
Several limitations of this study should be noted. First, the generalizability of our study should be clarified for a clear interpretation of the results. Participants were recruited mainly from within the United States, and most were native English speakers. Therefore, the results may not fully apply to bereaved individuals from different cultural backgrounds or with different contexts of loss: normative beliefs about bereavement and rituals after the death of loved ones vary across cultures, so the results should be interpreted with caution, and more intercultural studies would certainly be beneficial. Secondly, the applicability of our model to individuals who have experienced multiple losses still needs further evaluation. In the current model, classification was based on the risk factors associated with a single death event, and it is hard to conclude that the impact of these risk factors would remain the same when grievers experience multiple deaths. More investigation of the model's applicability to such users is needed. Thirdly, when evaluating the machine learning model, we did not use a truly independent test set collected from a separate sampling batch, owing to the limited scope of the study. While we believe our sampling is still valid and the model evaluation results are sufficiently generalizable (we were able to recruit a wide range of participants across age groups, loss types, etc.), reaffirming the model's performance with data collected during a separate time period or from a specific user group (such as patients at clinics seeking help with grief) would further strengthen its general applicability. In addition, readers should be cautioned about the imbalanced dataset.
While we experimented with various resampling approaches (using methods such as SMOTE oversampling), we did not find significant improvements in accuracy. For example, for the best-performing feature set (bereavement risk and outcome factors), the best model after SMOTE oversampling was logistic regression (AUC=0.72, F-score=0.62, Accuracy=0.887), which underperformed the non-SMOTE Random Forest model reported in our study (AUC=0.77, F-score=0.73, Accuracy=0.932). This appears to be because oversampling produced a higher number of false positives. Finally, due to the difficulty of recruiting experienced experts and practitioners in grief care, the feedback session was carried out with a relatively small number of participants. While we believe the results highlight several interesting observations that we hope will encourage further research into the use of explainable models in mental healthcare, they should still be interpreted with caution given the limited sample size.
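The shape of the imbalance experiment above can be sketched as follows. This is not our study's code: it uses synthetic data with a minority positive class, and simple random duplication of minority rows stands in for SMOTE, so the numbers are illustrative only and will not match the results reported here.

```python
# Sketch of the class-imbalance check: compare a classifier trained on the
# raw, imbalanced data against one trained after oversampling the minority
# class. (Random duplication stands in for SMOTE; synthetic data only.)
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, f1_score

rng = np.random.default_rng(1)
n = 600
X = rng.normal(size=(n, 5))
# Minority positive class, loosely mirroring a ~1-in-10 PGD prevalence.
y = (X @ np.array([1.5, -1.0, 0.8, 0.0, 0.0])
     + rng.normal(scale=1.0, size=n) > 2.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

def fit_eval(Xf, yf):
    m = LogisticRegression().fit(Xf, yf)
    return (roc_auc_score(y_te, m.predict_proba(X_te)[:, 1]),
            f1_score(y_te, m.predict(X_te)))

auc_raw, f1_raw = fit_eval(X_tr, y_tr)

# Oversample: duplicate minority-class rows until the classes are balanced.
pos = np.flatnonzero(y_tr == 1)
extra = rng.choice(pos, size=(y_tr == 0).sum() - pos.size, replace=True)
X_bal = np.vstack([X_tr, X_tr[extra]])
y_bal = np.concatenate([y_tr, y_tr[extra]])
auc_bal, f1_bal = fit_eval(X_bal, y_bal)

print(f"raw:      AUC={auc_raw:.2f} F1={f1_raw:.2f}")
print(f"balanced: AUC={auc_bal:.2f} F1={f1_bal:.2f}")
```

Balancing the training set shifts the model toward predicting the positive class more often, which raises recall at the cost of more false positives, the same trade-off we observed with SMOTE in our evaluation.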
Following the development of the explainable model in this study, our future work will involve refining the current GIFT system into Empowered to Grieve (EtG), a single-function online application that screens for PGD in the first 12 months of bereavement and provides a user-friendly translation of psychological measurements and feedback to bereaved users. We aim to carry out a longitudinal study using the refined application to evaluate whether an AI-powered screening tool is beneficial in the early stages of grief care. Through this study, we will investigate whether bereaved individuals accept and trust suggestions provided by an AI system and whether such models affect their coping outcomes, as well as track the long-term impact of our system through pre-post evaluation of grief intensity.
WAN JOU SHE, PhD, is a research associate in medicine at the Center for Research on End of Life Care, Weill Cornell Medical College. She received her PhD degree in Industrial Design, specializing in Designed Intelligence, from the Eindhoven University of Technology, The Netherlands, in 2018. Wan Jou's research interests include digital health, self-monitoring systems, explainable AI, and Prolonged Grief Disorder.
CHEE SIANG ANG, PhD, is a Senior Lecturer in Multimedia and Digital Systems in the School of Computing, University of Kent. His main research area is digital health, where he investigates, designs, and develops new technologies that provide treatment and (self-)management of health conditions through effective prevention, early intervention, personalised treatment, and continuous monitoring.

ROBERT A. NEIMEYER, PhD, is Professor in the Department of Psychology, University of Memphis, where he also maintains an active clinical practice. Neimeyer also serves as Director of the Portland Institute for Loss and Transition, which offers training and certification in grief therapy. Since completing his doctoral training at the University of Nebraska in 1982, he has published 30 books, including a series of volumes on Techniques of Grief Therapy and Grief and the Expressive Arts, the latter with Barbara Thompson, and serves as Editor of the journal Death Studies. The author of over 500 articles and book chapters, he is currently working to advance a more adequate theory of grieving as a meaning-making process, both in his published work and through his frequent professional workshops for national and international audiences. Neimeyer served as President of the Association for Death Education and Counseling (ADEC) and Chair of the International Work Group for Death, Dying, and Bereavement. In recognition of his contributions, he has been granted the Eminent Faculty Award by the University of Memphis, made a Fellow of the Clinical Psychology Division of the American Psychological Association, and given Lifetime Achievement Awards by both the Association for Death Education and Counseling and the International Network on Personal Meaning.

LAURIE A. BURKE, PhD, is a research assistant professor at the University of Memphis, where her work focuses on aspects of death, dying, loss, and grief processes, including violent death loss, complicated grief, social support, and families bereaved by terminal illness. A major research interest of hers is complicated spiritual grief, a spiritual crisis following loss. She led the development and validation of the Inventory of Complicated Spiritual Grief (ICSG) and is currently testing its revised version, the ICSG-R. She is a licensed clinical psychologist and maintains an active private practice in Portland, OR, serving grieving individuals, with a primary focus on assisting traumatically bereaved adults (i.e., individuals grieving losses from homicide, suicide, or fatal accident).

YIHONG ZHANG, PhD, is currently a postdoctoral researcher at the Department of Multimedia Engineering, Osaka University. He received his PhD in computer science in 2016 from The University of Adelaide, Australia. Since then, he has worked as a postdoctoral researcher at Kyoto University, Japan, and Nanyang Technological University, Singapore. Yihong's research interests include social computing, data mining, and statistical modeling.