Exploring the Determinants of Users’ Continuance Usage Intention of Smart Voice Assistants

The use of personal voice-assistants like Amazon Alexa and Google Assistant has been on the rise recently. To ensure the long-term success and widespread diffusion of these products, it is important to evaluate their continued usage rather than only the initial adoption intention. The majority of research evaluating continuance usage does so via an expectation-confirmation approach. In this work, however, a user engagement-based approach is taken for evaluating the utilitarian and hedonic attitudes of the users towards continued usage. This is augmented with additional contextual constructs like trust, privacy risk, and satisfaction. At present, there is little empirical evidence of user engagement with voice-assistants. Moreover, the present work focuses on the continuance usage of late adopters by considering two unique personal factors (slowness of adoption and skepticism). By evaluating the engagement aspect of the laggard group, the current findings contribute to theory by providing a better understanding of how the proposed antecedents determine the continuance intention. To build the research framework, data are collected from 244 late adopters of voice-assistants who use these devices in their daily life. All the proposed hypotheses are supported except the effect of privacy risk. Implications for both theory and practice are provided based on the findings.


I. INTRODUCTION
In recent years, voice-assistants have become a top-selling product in the technology category. Advancements in Artificial Intelligence (AI), Machine Learning (ML) and Natural Language Processing (NLP) have made it possible for modern voice-assistants (e.g., Amazon Alexa, Apple Siri, Google Assistant, Microsoft Cortana or Samsung Bixby) to be programmed to think like human beings and imitate the intelligence of human conversations [1]. One reason behind the popularity of voice-assistants is their ability to facilitate human-machine interactions in a natural and intuitive way, similar to human-human interpersonal conversations. In 2019, the global voice-assistant market was valued at 11.9 billion US dollars, and it is expected to grow to over 35.5 billion US dollars by 2025 [2]. Although these numbers are promising, as of 2020 fewer consumers than in previous years say that they are likely to become first-time voice-assistant owners [3]. For any product to be successful in the long run, its repeated usage is necessary in addition to adoption by new consumers [4]. Thus, continuance usage has a greater impact on the long-term viability and sustainability of any product [4][5]. Voice-assistants not only offer unprecedented opportunities and use-cases for consumers but also bring with them a number of concerns, making them a hot topic for current research [6]. For example, most of the extant research on voice-assistants focuses either on understanding consumers' interactions with them [1][7][8] or on the utilitarian and hedonic benefits that they provide [9][10][11]. The current generation of voice-assistants is technologically far superior to earlier ones and possesses a variety of novel skills to improve the level of engagement with consumers.
For example, Amazon has recently developed a Neural Text-to-Speech (NTTS) technology and embedded it in its Alexa skills to create a more natural and intuitive voice experience for users [12]. This allows Alexa to respond in either a happy/excited or a disappointed/empathetic tone, with varying levels of intensity, depending upon the user's mood. Moreover, the NTTS technology enables Alexa to adapt to different speaking styles based upon the users' preferences and geolocation. Similar efforts are being made by other vendors like Google to equip their voice-assistants with real-time emotions and to make them more realistic by giving them varying moods and having them respond to the emotions of the users. The basic objective behind detecting the user's emotion and modifying the assistant's behavior accordingly is to create better engagement with the users. In fact, the ability of these voice-assistants to produce happy or empathetic tones was found to increase the engagement and satisfaction of the users by over 30% [12]. While the current literature has focused on unearthing the utilitarian and hedonic experiences of users with voice-assistants [9][10][11], the effect of user engagement on continuance usage is relatively underexplored [1]. In recent years, user engagement has received a lot of research interest, as it enables companies to keep their customers engaged with specific products/services by coming up with innovative solutions that allow them to remain competitive in the market. With the onset of the Internet of Things (IoT) paradigm, there has been a tremendous increase in the number of different services available to users (e.g., smart-home, smart-city, smart-health, voice-assistants, etc.), which leaves them overwhelmed by the variety of services available. Therefore, choosing which services to adopt and use continuously is a challenging task.
If the users are not engaged with a particular product or service, they will abandon it after the first use [13]. Thus, it becomes very important not only to assess the role of user engagement in emerging services provided by voice-assistants, but also to examine its strength and its impact on continuance usage [14]. One of the objectives of this work is to address this limitation and help fill the research gap by providing a theoretical conceptualization, followed by an empirical validation, of a research model focusing on the user engagement aspect. Another limitation of the current studies on voice-assistants is that the majority of them investigate initial acceptance, e.g., the works in [1,8,10]. Although initial acceptance is crucial for understanding initial popularity and market growth, long-term viability and sustainability are better determined by the continued usage decision [4]. Attracting consumers for the first time is relatively easier than retaining them, because continued use depends on the cognitive beliefs and affects that consumers form based upon their own usage experiences with the voice-assistants. Initial use and repeated or continued use are two theoretically distinct concepts that are affected by different sets of factors [15]. Therefore, the post-adoption and initial-adoption stages are different [16] and should be explored with different theoretical frameworks. Consequently, this study considers a post-adoption scenario by including the concept of user engagement, along with its determinants, in the analysis of continuance usage. A few existing studies have stressed the importance of certain personal factors (e.g., personal innovativeness, self-efficacy, technology readiness, etc.) for explaining the continuance usage of different information systems [16][17][18]. However, very few works have actually investigated the effect of these personal factors on continuance usage [18].
For example, the researchers in [19] consider three personal factors and their effect on technological innovations. Most of the works that consider personal factors focus on the early adopters, i.e., the users who are in the initial stage of the diffusion of innovation curve [20]. Little empirical evidence exists with regard to the late majority/laggards, i.e., the late adopters [19]. However, the continuance usage scenario is conceptually closer to the late adopters than to the early adopters [19,21]. We anticipate that, for the voice-assistant usage scenario, the late adopters are the most appropriate group to focus upon for the following main reasons. First, a better understanding of the preferences and purchase behavior of this last 50% of users will give better insight into the process of diffusion and the success of voice-assistants. Second, by focusing on the late adopters, post-adoption behaviors can be more easily understood [19]. Third, since the late adopters are difficult to convince about a product [19], their feedback is critical for removing the obstacles/drawbacks that voice-assistants might currently have. Thus, understanding the perceptions of the late adopters is important. However, to the best of our knowledge, the current works related to voice-assistant adoption and usage do not explore this scenario. This study attempts to fill this gap by including the personal factors of the late adopters that are associated with continuance usage intention. Specifically, based upon the work of the authors in [19], two key personal factors that are the hallmark of late adopters are considered: slowness of adoption and skepticism. Overall, the contribution of this work is threefold. First, despite considerable research on the acceptance of voice-assistants and users' behavioral intention towards them, studies focusing on the continuance usage aspect remain few.
Since voice-assistants have been on the market for some time now, it becomes important to delve into the details of continuance usage, rather than initial acceptance, to ensure their long-term sustainability. Second, the aspect of user engagement, although very important, has been largely overlooked in the voice-assistant usage scenario. The utilitarian and hedonic aspects are undoubtedly important dimensions to consider while using voice-assistants, a fact that most of the extant research focuses on. However, the level of engagement that these devices can provide may determine the attitude and satisfaction of the users towards using voice-assistants on a continued basis. Unlike any other human-computer interaction scenario, the ability of voice-assistants to engage in meaningful conversations and react verbally with emotions based upon the user's mood enables them to create an engaging environment and develop a deep bond with their users. Therefore, by including user engagement as one of the core constructs that determine continuance usage, this work tries to advance the current literature. Third, this work is unique because it considers the personal factors related to the user experience that characterize the late adoption scenario. With this threefold objective, the research model proposed in this work aims to advance the current theoretical understanding of continuance usage and to help managers and practitioners identify in detail the factors that will stimulate continued usage. The remainder of this work is organized as follows: Section II presents the pertinent literature review together with the research hypotheses and proposed theoretical model. Section III presents the methodology and study design. The results are presented in Section IV.
Section V provides the discussion and research implications, while the conclusion along with suggestions for future research is outlined in Section VI.

A. USER STUDIES ON VOICE-ASSISTANTS
Extant research focuses on two distinct aspects of voice-assistants: (a) the technological aspect, which tries to improve speech recognition, incorporate emotions into the voice, or improve the privacy and security of voice-assistants, and (b) the user behavior and acceptance aspect, which primarily focuses on user interactions with voice-assistants along with the factors that lead to the acceptance and adoption of these smart devices. Since the objective of the current work is more closely related to the second aspect, we present an up-to-date status of the literature related to the usage and acceptance of voice-assistants. The authors in [1] focus on the voice interactions between users and the household usage of the Amazon Alexa smart speaker. The Uses and Gratification Theory approach is used for understanding the motivations for adopting and using these devices. While utilitarian benefits are found to be an important factor, the effect of hedonic benefits on usage is less significant and relevant only to small households. The authors in [8] investigate the mechanisms by which users develop virtual friendships with voice-assistants. Using a Social Response Theory approach, they observe that voice-assistants can provide a sense of intimacy, understanding and enjoyability. Similar work is done by the researchers in [22], who propose and validate a theoretical model based on the Parasocial Relationship Theory for explaining the continuance intention to adopt and use voice-assistants. They focus on user tasks and find that task attraction, physical attraction and social attraction are the most prominent factors affecting the adoption of these devices. The authors in [9] extend the Wixom and Todd Information System Success Model mainly with utilitarian factors (perceived usefulness and perceived ease of use) and hedonic factors, together with the trust and risk issues that affect user attitudes towards voice-assistant usage.
Further, they investigate the relationships separately based on gender. The authors in [10] examine the effects of the Technology Acceptance Model (utilitarian benefits) on consumer engagement and loyalty. Although they include the concept of consumer engagement in their proposed model, they treat it as a simple construct with a single dimension. However, current research on consumer engagement suggests that it is a much more complex and multidimensional concept combining the behavioral aspect, the psychological aspect and a combination of both [23][24]. Therefore, there is a need to conceptualize this multidimensional aspect of engagement for holistically examining its influence on continuance usage. The authors in [11] use multiple models (Technology Acceptance Model, Theory of Planned Behavior and Value-based Adoption Model) for comparing the usage intention of voice-assistants and further conduct a multiple regression analysis for evaluating the impact of each of the factors on the usage intention. They conclude that the hedonic aspects of voice-assistant usage have a greater impact than its utilitarian aspects. Based on Perceived Value Theory, the authors in [25] develop a comprehensive model for explaining potential customers' intentions to adopt voice-assistants. The reported results are similar to those in [11], meaning that the hedonic dimension has a greater significance than the utilitarian dimension. Further, the works in [25][26] also consider the utilitarian and hedonic aspects for explaining the usage of voice-assistants. A diffusion analysis of voice-assistants among end-consumers is done by the authors in [27] using a multivariate probit model, which shows that the acceptance and usage of these devices vary with consumer age. As evident from this literature review, fewer voice-assistant studies use continuance usage as the dependent variable than use adoption intention.
This implies, first, that current voice-assistant acceptance studies focus more on initial acceptance than on continued usage. Second, the effect of user engagement on continuance usage is underexplored. Only one study explores the effect of user engagement, and it oversimplifies the concept in a way that does not fit the multidimensional conceptualization of this factor. The concept of user engagement and its multidimensional nature is discussed in detail in the next sub-section. Third, the importance of utilitarian and hedonic benefits is established, as most of the works consider these factors and find that their relationships with either adoption intention or continuance usage (depending on the study objectives) are significant, though to varying degrees. Finally, to the best of our knowledge, the effects on continuance usage of personal characteristics pertaining specifically to the late adopters of voice-assistants are not accounted for by extant research. Keeping in mind these research gaps, the current work proposes a theoretical model that aims to better explain the continuance usage of voice-assistants.

B. CONCEPTUALIZING USER ENGAGEMENT
The concept of user engagement has recently attracted significant research interest from both an academic and a practitioner viewpoint [28]. Extant research has conceptualized user engagement in different ways [23][24][29], broadly categorizing it into three dimensions that capture the cognitive, affective, and behavioral aspects. This work adopts the conceptualization proposed by the authors in [29] and considers user engagement as a higher order construct [14]. The cognitive dimension refers to the user's thoughts, knowledge, concentration, and interest in a specific object. For example, in the case of voice-assistants, cognitive engagement refers to the user's interest in the services provided by these devices. Music streaming is one such service that is heavily used when engaging with a voice-assistant [30]. Since voice-assistants use sophisticated artificial intelligence algorithms, they can identify the user's music listening habits and preferences over a period of time and curate exclusive playlists for them. Similarly, using these devices for in-car navigation helps to reduce visual distractions, and consequently the chances of accidents, while the voice prompts still give an accurate description of how to reach the destination. A variety of such services provided by voice-assistants match the cognitive dimension of user engagement well. This cognitive effect has also been confirmed by other researchers in the psychological stream [31][32]. The second dimension of engagement is the affective/emotional aspect. This dimension refers to the state of emotional activity, i.e., a feeling of positive or negative inspiration or of pride related to and caused by using the engagement object (voice-assistants in the present case). The affective/emotional aspect of engagement is extremely relevant for the current scenario.
Voice-assistants are different from other conventional smart devices as they possess human-like characteristics, i.e., they are anthropomorphic by nature [33]. Anthropomorphism is defined as the tendency to attribute human characteristics, intentions or emotions to the actual or perceived behavior of non-human actors [34]. For example, one study on the use of voice-assistants by elderly people [35] shows that these devices can provide socioemotional support by giving users an engaging experience that helps to reduce their loneliness. Thus, the users of voice-assistants do not just perceive them as static objects but interact with them continuously, engaging through short conversations, commands or queries that help build relationships with these devices [36]. Therefore, the affective aspect of engagement is very prominent while using voice-assistants. The third dimension of engagement, i.e., the behavioral aspect, refers to the user's behavioral manifestations towards an object of interest [37]. When users envision different ways in which they can interact or engage with voice-assistants, it naturally leads to greater involvement. The way different users engage with their voice-assistants is based on their lifestyle patterns, daily schedules, the various activities that they do, and much more. In fact, voice-assistants have become a central focal point in a smart-home setup, as much smart-home equipment is voice enabled. Creating automatic routines for home automation is not only a useful and convenient way of doing certain things, but also opens new opportunities for users to interact with their voice-assistants [38]. The versatility of voice-assistants, especially their capability to integrate with a myriad of IoT devices and sensors, for example in a smart-home setup, will lead to an effective and productive usage of these devices.
It will help the users believe that engagement with voice-assistants helps to satisfy their needs, which will ultimately lead to their continued usage. The behavioral manifestation towards voice-assistants is also seen in the case of people with certain special needs, e.g., those who are physically challenged, as it reduces their reliance on caregivers and helps them regain some independence [39]. As evident from the above discussion, user engagement is an extremely relevant factor for voice-assistants; however, this aspect has not been investigated much. The current literature dedicated to user engagement finds that companies who promote their brand online need to maintain an active engagement with customers through repeated transactions [28,40]. In fact, the majority of the studies focus on this online consumer-brand engagement, e.g., [14,28,29,40]. However, user engagement is context-dependent and has varying levels of intensity [41], together with the different dimensions that are discussed above. The multidimensional conceptualization of user engagement will enable a holistic examination of its influence on the continuance usage of voice-assistants, which is missing from the current works. Extant research has shown that all three dimensions of user engagement are related to positive experiences when using a service [14,28,29,41]. For example, the cognitive dimension leads to a sense of pleasure [29,42], the affective dimension leads to feelings and a sense of bonding evoked by the interactions [14,28,29], while the behavioral dimension leads to a higher level of activation that may lead to a routine behavior [37,43]. All these aspects of engagement are highly relevant in the voice-assistant usage scenario, as explained above. Therefore, each of these engagement dimensions is expected to increase the users' continuance usage of voice-assistants, an expectation that is also supported by previous research [29,32,42].
This is stated formally as:
H1: User engagement with the voice-assistants is positively related to continuance usage.

C. ATTITUDE AND CONTINUANCE USAGE
It has been suggested theoretically that user attitude can be treated as a composite component comprising two distinct aspects: the utilitarian and hedonic attributes [45,46]. This bidimensional nature of user attitude arises because users engage with a system for two main reasons: affective (hedonic) gratification and instrumental (utilitarian) benefit [45]. Voice-assistants serve a dual-purpose role. From a utilitarian perspective, these devices provide a useful and convenient way for users to fulfill certain tasks, for example searching for information, adding items to the shopping cart, looking up customer service information or turning various electronic equipment on/off [46]. The ability of these devices to provide completely hands-free navigation not only simplifies usage, but also provides users with multitasking opportunities [9]. The learning curve associated with using voice-assistants is low, as everything can be controlled merely by using one's voice, which makes them ideal for the elderly and children alike [9]. Therefore, as utilitarian devices, voice-assistants are expected to create a positive attitude in the user's mind. At the same time, several studies have also established the hedonic aspect of voice-assistants [1,8,9,11] and its importance in shaping user attitude. These dual aspects are seen to positively shape attitude in several other contexts as well, for example in smart-homes [47][48], smart-wearables [49][50], mobile services [51] and many more. Any type of information system is used for both utilitarian and hedonic reasons, and these are the main motivators that drive system usage. Therefore, it is no surprise that the majority of acceptance-related studies take these as determinants of system usage.
Since the overall attitude that users form towards any information system is a combination of these two aspects, in this work attitude is treated as a composite construct that shapes the overall perception of the users, leading to continued system usage. Hence, the following hypothesis:
H2: Attitude is positively related to continuance usage.

D. TRUST, RISK, AND CONTINUANCE USAGE
The effects of trust and privacy risks have been investigated in a number of different technological settings, such as internet banking [52], mobile payments [53], smart-home adoption [47][48], and even voice-assistants [9]. Trust can be defined as the extent to which users believe that using a technology will be reliable, credible and safe [54]. Privacy risk, by contrast, works in the opposite direction to trust: users weigh it when making decisions from a cost-benefit perspective [55]. For example, voice-assistants gather a lot of information about their users beyond the users' knowledge and control. This prevents many individuals from using these devices, as they are concerned about their privacy [56]. Voice-assistants can do a variety of tasks, from making appointments, placing orders, and looking up information to making payments, for which they need an extensive set of software permissions that individuals normally provide [57]. Thus, although voice-assistants provide several benefits, these benefits are associated with a new set of risk factors. For users to take full advantage of the services provided by voice-assistants, both trust and risk are significant factors that need to be weighed carefully. In a smart-home setup, since voice-assistants can automate several tasks, trusting this automation is the first step in reducing risk perceptions. The level of personalized services that voice-assistants provide will depend to a great extent on the information that they collect from the users. In fact, such a privacy-personalization paradox is common with the adoption of newer technologies [58], which makes trust and risk important factors in this context. Extant research has found trust to be an important factor that shapes the continuance usage of any information system [59][60].
The effects of privacy risks are just the opposite: they have a dampening effect, deterring potential customers from using a product or service [1,9]. Therefore, both these factors are relevant and important ones that affect long-term system usage. Consistent with the previous literature, the conceptual framework proposed in this work also contends that trust and privacy risk will influence continuance usage. Thus, it is hypothesized:
H3: Trust is positively related to continuance usage.
H4: Privacy risk is negatively related to continuance usage.

E. SATISFACTION AND CONTINUANCE USAGE
The satisfaction that users obtain after purchasing and experiencing a product is one of the prime determinants of continued usage intention [61]. The current literature on user satisfaction is characterized by the prevalence of the confirmation-disconfirmation paradigm, which is the basis of the Expectation Confirmation Model (ECM) [62]. ECM postulates that the users' continuance intention to use any information system or service is positively affected by their overall satisfaction in using it. Although users perceive overall satisfaction in terms of the utilitarian and hedonic aspects of service consumption, it is an emotional reaction to the perceived difference between their expectations and the actual service performance received [62]. Several studies related to smart products and services have established the relationship between satisfaction and continuance usage [27,50]. Therefore, it is hypothesized:
H5: Satisfaction is positively related to continuance usage.

F. SLOWNESS OF ADOPTION, SKEPTICISM, AND CONTINUANCE USAGE
Current research has shown that the adoption of technologies and their continued use are affected by different types of personal factors, such as self-efficacy, personal innovativeness, technology anxiety and technology familiarity [16,17,18]. The authors in [19] pointed out that slowness of adoption and skepticism are the main barriers encountered while using new technologies. Whenever customers are resistant to a technology, its acceptance and usage are delayed. In fact, around 50% of customers belong to this group of late adopters as per the diffusion of innovation curve [19,20]. Users belonging to this group are usually skeptical towards any novel product [19]. A detailed analysis of the late adopters has been done by the authors in [19,63], focusing on the personal characteristics that drive the late adoption of innovation. Based on that analysis, this work proposes slowness of adoption and skepticism as factors that affect the continuance usage of voice-assistants. In recent years, voice-assistants have been evolving at a rapid pace, and their presence is almost ubiquitous now, as they are embedded in all modern smartphone operating systems. With technological advancements, the services provided by voice-assistants have also evolved, and they are now an integral part of a smart-home setup. Thus, the world of digital technology is changing rapidly, with new types of smart products being introduced very frequently. Therefore, it is critical to create products that not only have a high rate of adoption, but are also used continuously for a prolonged period of time. Considering the maturity of the technologies that voice-assistants rely on, it is important to understand the phenomenon of late adoption, as it can give insight into the reasons for their slow diffusion.
Thus, in the voice-assistant context, it is important to focus on and understand the determinants of late adoption, which will ultimately ensure faster market diffusion along with continued usage. Slowness of adoption refers to how slowly users accept a new technology or service [19,20,63]. This is one of the hallmark traits of the late adopters [19,20]. Extant research stresses the importance of including slow adopters among the users of a system or service for understanding long-term continuance usage [63,64]. Normally, slow adopters have a conservative and traditional mindset, and they tend to wait until the products are mature and the prices are low [63,64]. They tend to be more loyal customers than the early adopters [64]. Therefore, it is expected that slow adopters will have a higher level of continuance usage intention. Skepticism is a feature of those users who take a cautious approach towards adopting a new technology or service [19]. Normally, these users resist any change in their regular habits, which is often caused by using a new technology [19,64]. These users therefore dislike the uncertainties that are associated with any new technology, and they want the technology to be developed and matured first. The authors in [17] point out that users with a high level of skepticism will in turn have a higher degree of resistance and will not be motivated to use any new technology. However, once the technology matures and they adopt these products/services, they tend to be more loyal customers than the early adopters [19,64]. Considering that voice-assistants are no longer new in the market, focusing on the late adopters will give companies a chance to ensure continued usage, which will lead to long-term sustainability.
Keeping in mind the above arguments, the following hypotheses are proposed:
H6: Slowness of adoption is positively related to continuance usage.
H7: Skepticism is positively related to continuance usage.
Continuance usage is the final dependent variable. Fig. 1 shows the proposed research framework.

A. DATA COLLECTION AND SAMPLE
An online questionnaire is created and distributed via Google Forms for the purpose of data collection. The context of the administered survey is restricted to the users of the Amazon Alexa and Google Assistant voice-assistants. These two voice-assistants are chosen for the following reasons. First, Amazon and Google are the current market leaders in voice-assistants. As of 2020 the global market share of Alexa stands at 31.7% and that of Google Assistant is very close at 31.4%, so that together they capture about 63% of the market [65]. Second, both these voice-assistants are mature and advanced, supporting a variety of skills and routines that enable the users to perform a multitude of tasks. Moreover, the majority of smart-home products and accessories are compatible with either or both voice-assistants. One challenge with regard to sample selection is to identify those who use the voice-assistants continuously and are at the same time late adopters. To ensure that the questionnaire is filled out by respondents who are familiar with the present study context, some screening questions are used at the beginning of the survey. First, the respondents are asked if they have any previous experience using voice-assistants (Amazon Alexa and Google Assistant in particular). Participants who respond "no" to this question are unable to take part in the remaining survey. Second, the participants who respond "yes" to this question are then asked if they have used these voice-assistants within the past three months. Those who answer "no" to this question are also screened out from the survey to ensure that the participants under consideration are regular users of voice-assistants, and not one-time users who do not use them on a regular basis. These two screening questions ensure that a continued usage scenario of the voice-assistants is simulated.
Third, the participants who respond "yes" to the second question are asked about their total usage experience with the voice-assistants. This third question is not used for screening, as all the participants are allowed to complete the remaining part of the survey; however, for the purpose of data analysis only those respondents are retained who report that their usage experience with voice-assistants is within one year. Considering that the voice-assistants have been in the commercial market for some time now (Alexa was launched in 2014 and Google Assistant in 2016), those who started using them within the past year fit well into the late adopter group. Thus, by using these three questions at the beginning of the survey we aim to analyze the data from those who are not only late adopters but also use their voice-assistants on a continuous basis. A total of 731 responses are collected, out of which 259 respondents report a usage experience within one year. After data cleansing and removing incomplete responses the final sample size is 244. The mean age of the respondents is 31.25 years. The proportion of males and females is almost equal (52% male and 48% female). Most of the respondents (around 43%) report a monthly household income between 30,000 and 60,000 Thai Baht (THB). Around 67% of the respondents use Amazon Alexa, while the remaining 33% use Google Assistant.

Attitude (Utilitarian) [32]. Using voice-assistant is:
UA1: Not helpful - helpful
UA2: Unnecessary - necessary
UA3: Impractical - practical
UA4: Ineffective - effective
UA5: Useless - useful
Attitude (Hedonic). Using voice-assistant is:
HA1: Not fun - fun
HA2: Dull - exciting
HA3: Not delightful - delightful
HA4: Not thrilling - thrilling
HA5: Not enjoyable - enjoyable
Continuance usage:
I intend to continue using my voice-assistants, rather than discontinue its use
CU3: I will consistently use voice-assistants as much as possible
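The screening and sample-selection steps of this subsection can be sketched in code. The following is only a minimal illustration under stated assumptions: the `Response` record, its field names, and the use of months as the unit of usage experience are hypothetical and are not taken from the actual survey instrument.

```python
from dataclasses import dataclass

@dataclass
class Response:
    # All field names are hypothetical; they mirror the three opening questions.
    used_before: bool         # Q1: any prior experience with Alexa / Google Assistant
    used_last_3_months: bool  # Q2: used a voice-assistant within the past three months
    experience_months: int    # Q3: total usage experience (assumed unit: months)
    complete: bool            # assumed flag: every questionnaire item was answered

def passes_screening(r):
    """Q1 and Q2 act as hard screens for continued (regular) usage."""
    return r.used_before and r.used_last_3_months

def is_late_adopter(r, max_months=12):
    """Q3 is applied at analysis time: usage experience within one year."""
    return r.experience_months <= max_months

def select_sample(responses):
    """Apply screening, the late-adopter filter, and removal of incomplete cases."""
    screened = [r for r in responses if passes_screening(r)]
    late = [r for r in screened if is_late_adopter(r)]
    return [r for r in late if r.complete]
```

Keeping the late-adopter filter separate from the two screening questions mirrors the text: the third question does not stop anyone from finishing the survey and is applied only when the analysis sample is formed.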

B. MEASUREMENT ITEMS
To ensure good validity, all the measurement items are adapted from extant literature, with relevant modifications to the questionnaire wording to suit the present context. All the items (except attitude) are measured on a 7-point Likert scale, ranging from 1 (strongly disagree) to 7 (strongly agree). The attitude construct (utilitarian and hedonic) is measured on a 7-point semantic differential scale. Since the wording of the items is slightly modified, the draft questionnaire is first shown to four experts in the field of information systems and technology adoption in order to ensure content validity. Based upon the feedback received, the wording of a few items is modified to make the questionnaire more structured, after which it is pre-tested on a small sample of 14 university students.
These 14 university students also have experience using the voice-assistants, and they are not a part of the main survey. The final questionnaire, comprising all the constructs of interest along with their original source of adaptation, is provided in Table II.

C. COMMON METHOD VARIANCE
Common Method Variance (CMV) is a well-known problem associated with any type of survey research and is defined as the systematic error variance that is shared among variables which are measured with the same source or method [66]. Extant research outlines two main approaches for minimizing the effects of CMV: (a) carefully designing the survey and (b) using statistical remedies to control the impact after data collection. Thus, the first approach is applied during the design phase before data collection (procedural remedy), while the second one is applicable after data has been collected (statistical remedy). A marker-variable technique is used as the procedural remedy as outlined by the authors in [67]. This technique uses a special variable that is deliberately prepared and incorporated into the survey. While creating this special marker-variable it should be ensured that it is theoretically unrelated to at least one variable in the study. In this work two marker-variables are used that are unrelated to the context under investigation: (a) I believe that the current pandemic (COVID-19) will come to an end by December 2020, and (b) The survey was of appropriate length. None of the marker-variables are significantly related to the continuance usage construct, and they have extremely low path coefficients (|β| < 0.015, p < 0.01), indicating that CMV will not have any significant impact. A Harman's single-factor test is carried out as a part of the statistical remedy as outlined by the authors in [66,67]. For carrying out this test an Exploratory Factor Analysis (EFA) is conducted in SPSS version 17.0, and it is observed that for the original unrotated solution multiple factors emerge and the first factor accounts for at most 34% of the overall variance. This is below the recommended level of 50% [66].
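The logic of Harman's single-factor test can be illustrated with a short sketch. This is not the SPSS procedure used in the study: as an assumption, the unrotated first factor is approximated here by the first principal component of the item correlation matrix, and all function names are hypothetical. The share of variance carried by that component is its eigenvalue divided by the number of items.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long item columns.
    Assumes every column has non-zero variance."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    p = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
            for j in range(p)
        ]
        for i in range(p)
    ]

def largest_eigenvalue(matrix, iterations=1000):
    """Power iteration on a symmetric positive semi-definite matrix."""
    p = len(matrix)
    # Non-uniform start vector to reduce the chance of starting orthogonal
    # to the dominant eigenvector.
    v = [float(i + 1) for i in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    eig = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        eig = math.sqrt(sum(x * x for x in w))
        v = [x / eig for x in w]
    return eig

def first_factor_share(columns):
    """Share of total variance captured by the first unrotated component."""
    corr = correlation_matrix(columns)
    return largest_eigenvalue(corr) / len(columns)

def passes_harman(share, threshold=0.5):
    """True when the first factor explains less than the 50% cut-off."""
    return share < threshold
```

With the value reported in the text, `passes_harman(0.34)` evaluates to true, which matches the conclusion that a single factor does not dominate the variance.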
Additionally, a full collinearity test is performed by calculating the Variance Inflation Factor (VIF) values for all the independent and dependent variables in the research model. The highest VIF value, 3.88, is obtained for the dependent variable (continuance usage), which is below the threshold level of 5 recommended for covariance-based methodologies [68]. Together, these results indicate that CMV should not have much influence on the findings.
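For readers unfamiliar with the full collinearity test, the sketch below shows one way to obtain the VIF values. It relies on a standard identity: when every variable is regressed on all the others, the VIF of variable j equals the j-th diagonal element of the inverse of the correlation matrix. The function names are hypothetical, and the input is assumed to be the correlation matrix of the construct scores, which is not reported in the text.

```python
def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (pure Python)."""
    n = len(matrix)
    aug = [
        [float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def full_collinearity_vifs(corr):
    """VIF of each variable when regressed on all remaining variables."""
    inv = invert(corr)
    return [inv[i][i] for i in range(len(corr))]

def max_vif_below(corr, threshold=5.0):
    """Check the largest VIF against a chosen cut-off (5 in the text)."""
    return max(full_collinearity_vifs(corr)) < threshold
```

For two variables with correlation r, the function returns 1 / (1 - r²) for both, which is the textbook two-variable case.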

IV. DATA ANALYSIS AND RESULTS
The data analysis is done using Partial Least Squares Structural Equation Modelling (PLS-SEM). PLS is chosen for the following main reasons. First, the primary objective of this work is to explain the variance in one dependent construct (continuance usage). Extant research has shown that the PLS algorithm maximizes the variance explained in the final dependent variable [69], making it suitable for the present case. Second, the user engagement and attitude constructs that are considered in this work are treated as second-order constructs. This is because each of the sub-dimensions of user engagement (cognitive/affective/behavioral) as well as attitude (utilitarian/hedonic) has a distinct meaning and can be viewed as a cause of its respective higher-order factor. In such a scenario, the second-order reflective-formative model is a better empirical approach, where the first-order (sub) dimensions are measured reflectively, while the second-order dimension is measured formatively [70]. This type of reflective-formative approach provides a simple and effective way of examining the total effects of user engagement and attitude as well as the role of their sub-dimensions. Extant research has shown that the PLS approach is a better alternative than covariance-based methods when the research framework contains both reflective and formative measures [71]. Third, there is a consensus among information systems researchers that PLS is robust to smaller sample sizes and more tolerant of data that is not normally distributed [69,71]. The measurement model is discussed first, followed by the structural model. The reliability values are reported in Table III, which indicates that both values are above the recommended level of 0.70 [73] and below the upper limit of 0.95 [71]. Moreover, all the item loadings are statistically significant (p < 0.01). For evaluating the convergent validity, the factor loading of each individual item is checked against a threshold level of 0.70 [71,74].
Following this procedure two items are removed (one from the behavioral dimension, and one from hedonic attitude) as they have loadings of less than 0.70. Additionally, each of the constructs has an Average Variance Extracted (AVE) score equal to or greater than the threshold value of 0.50 [69,71,73]. All these results are shown in Table III and indicate that the requirements for convergent validity are satisfied. The discriminant validity is assessed by using the Fornell and Larcker criterion [75]. As per this criterion four metrics should be checked: (a) the item reliability for each measure, (b) the CR value of each construct, (c) the AVE value of each construct, and (d) the inter-construct correlations. The inter-construct correlation matrix is shown in Table IV. From Table IV it is evident that the square root of AVE for each latent variable exceeds its correlations with all other latent variables, thereby satisfying the Fornell-Larcker criterion. Further, the correlations among the latent variables are significantly less than 1: 95% confidence intervals are constructed for each correlation coefficient, and since none of them includes one, discriminant validity is established. However, in some cases the correlations between the first-order constructs and their respective second-order constructs exceed the square root of AVE. As per extant research in [70,71] this is an expected result and not critical in the case of second-order models, because the same underlying indicators are used for the higher-order and lower-order components. [69,71,72]. The presented research model explains 79.4% of the variance in the final dependent construct, i.e., the continuance usage of the voice-assistants. The obtained R² value has substantial predictive power and is statistically significant. The structural model is presented in Fig. 2 [73].
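The convergent and discriminant validity checks described for the measurement model can be made concrete with a small sketch. It uses the standard formulas for standardized loadings: AVE is the mean of the squared loadings, and CR is the squared sum of loadings divided by that quantity plus the summed error variances. The function names are hypothetical, and any loadings or correlations passed in are illustrative inputs rather than values from Tables III and IV.

```python
import math

def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings of one construct."""
    return sum(l * l for l in loadings) / len(loadings)

def composite_reliability(loadings):
    """CR: (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    s = sum(loadings)
    error = sum(1.0 - l * l for l in loadings)
    return s * s / (s * s + error)

def convergent_validity_ok(loadings, min_loading=0.70, min_ave=0.50):
    """Thresholds follow the text: loadings of at least 0.70 and AVE of at least 0.50."""
    return (min(loadings) >= min_loading
            and average_variance_extracted(loadings) >= min_ave)

def fornell_larcker_ok(ave_by_construct, correlations):
    """correlations maps (construct_a, construct_b) to their correlation.
    The criterion holds when sqrt(AVE) of both constructs exceeds |r|."""
    for (a, b), r in correlations.items():
        root_a = math.sqrt(ave_by_construct[a])
        root_b = math.sqrt(ave_by_construct[b])
        if abs(r) >= root_a or abs(r) >= root_b:
            return False
    return True
```

As the text notes, pairs formed by a first-order dimension and its own second-order construct would typically be left out of the `correlations` argument, since sharing indicators makes high correlations expected there.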
Apart from measuring the R² value of the research model, the predictive relevance of the model in terms of the Q² statistic is also calculated. Specifically, Stone-Geisser's Q² value is taken as the criterion for predictive relevance and is obtained by using the blindfolding procedure. Blindfolding is a sample re-use technique, which systematically deletes data points and provides a prognosis of their original values [76]. A Q² value of 0.458 is obtained for the endogenous latent construct (voice-assistant continuance usage), which is sufficiently greater than 0, indicating the model's predictive relevance [76]. In this work user engagement has been treated as a composite (second-order) construct. As previously mentioned in the literature review section, extant research envisions engagement as a multi-dimensional concept. Therefore, whether this hierarchical decomposition of user engagement into sub-dimensions and its treatment as a second-order construct is justified needs to be examined. For this reason, an alternate model (Model 2) is tested in which user engagement is completely decoupled from its three sub-dimensions, since such a structure might be theoretically more appropriate. Re-running the PLS algorithm for Model 2 indicates that the various model-fit statistics are inferior when compared to the model proposed in this work. Table VI shows a comparison of the fit indices of the two models. For Model 2 the Chi-square value is higher than for the proposed model, which indicates that the model proposed in this work gives a better fit. Further, a Chi-square difference test is conducted to examine the statistical significance of the difference between the Chi-square values, and the result is found to be significant. This means that treating the user engagement construct as a higher-order one is fruitful, because in the decomposed model the introduction of additional paths from the user engagement dimensions to continuance usage leads to a deterioration in the model-fit indices.
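The Stone-Geisser statistic mentioned above is commonly computed as one minus the ratio of the summed squared prediction errors (SSE) to the summed squared errors of a trivial mean-based prediction (SSO), accumulated over all blindfolding rounds. The sketch below assumes the per-round sums are already available; the function names are hypothetical and the numbers in the usage line are purely illustrative, not values from this study.

```python
def q_squared(sse_by_round, sso_by_round):
    """Stone-Geisser Q^2 = 1 - sum(SSE_D) / sum(SSO_D) over blindfolding rounds D."""
    total_sso = sum(sso_by_round)
    if total_sso == 0:
        raise ValueError("SSO must be positive")
    return 1.0 - sum(sse_by_round) / total_sso

def has_predictive_relevance(q2):
    """A value above zero indicates predictive relevance for the construct."""
    return q2 > 0.0

# Illustrative (made-up) sums for two blindfolding rounds.
example_q2 = q_squared([30.0, 20.0], [100.0, 100.0])
```

Applied to the value reported in the text, `has_predictive_relevance(0.458)` evaluates to true.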

V. DISCUSSION
As mentioned in the Introduction section, this work has three objectives. The first objective is to highlight the continuance usage scenario of the voice-assistants, as opposed to the usage intention that the majority of the current voice-assistant literature focuses on. Since these smart-devices can provide virtual companionship to the users and their anthropomorphic features are often highlighted, the level of engagement provided will be a major driving force behind their continued usage. This is the second research question proposed in this work. Additionally, most of the extant voice-assistant literature focuses on behavioral intention and early adopters, and not much is known about the personal traits of the late adopters that shape the continuance usage scenario. Consequently, as the third objective of this study, slowness of adoption and skepticism are taken to be the reference personal traits of the late adopters for examining their effects on the continuance usage. The research model is augmented with certain other relevant contextual factors: trust, privacy risk, satisfaction, and attitude (utilitarian and hedonic). Given the unique characteristics possessed by the voice-assistants (hands-free, controlled by voice, and deep integration with various types of smart-home devices), the existing models of technology adoption are not comprehensive enough to explain their adoption and usage. The research model presented in this work has a high explanatory power (R² = 0.794), explaining 79.4% of the variance, and provides some important contributions to the field of technology adoption and human-computer interaction related to the voice-assistants.

A. THEORETICAL CONTRIBUTIONS
First, with respect to the adoption of the voice-assistants, the present work provides a new way of understanding the continuance usage scenario by focusing on the aspect of user engagement. In fact, in the proposed model user engagement not only has the highest significant regression weight towards continuance usage, but it also has a large effect size. Moreover, in this work the effect of user engagement is rigorously tested, which has seldom been done for any type of smart-product. The user engagement concept is more prevalent in an online brand community setting [14,28,29,41] and is an under-researched construct in relation to the adoption of smart-products. However, one of the goals behind modern technologies like artificial intelligence, machine learning or natural language processing is to improve the "humanness" of smart-devices like voice-assistants and to leverage their anthropomorphic features for providing better human companionship and assistance that can lead towards vivid user engagement. Thus, although engagement is a highly relevant construct in the adoption of smart-devices, its effect has not been examined in depth by the existing literature. The current theoretical findings lend support to the view that through the lens of engagement the users can envision their continued usage of the voice-assistants. Thus, the theoretical importance of user engagement is reinforced for explaining the continued usage scenario. The current literature on user engagement points towards a multidimensional nature; however, the way in which this multidimensional aspect is theorized is not clear and differs across studies. For example, some studies have modelled user engagement as a multidimensional composite construct (i.e., measuring the cumulative effect of the different dimensions) [10,14,77], whereas other studies have modelled it as a first-order multidimensional construct (i.e., measuring the individual effects of each dimension) [29,78].
Therefore, there is no consensus as to which approach of treating the user engagement construct will ultimately lead to a greater predictive capability of the overall research model. Due to this lack of agreement in theorizing the user engagement construct, in this work the framework's structure is evaluated by considering an alternative (Model 2). The results (from Table VI) suggest that there is flexibility in modelling the user engagement construct. The results are better when engagement is theorized as a higher-order composite construct: all the dimensions of user engagement (cognitive, affective, and behavioral) are confirmed, and the cumulative second-order construct (user engagement) significantly predicts the outcome variable, i.e., the voice-assistant continuance usage. Therefore, the current study provides, from a user engagement perspective, a consolidated understanding and future direction regarding how this construct should be modelled in conceptual frameworks related to the adoption and continued usage of smart technologies.
Another key contribution of this work is to focus on the adoption aspect of the voice-assistants under the theoretical lens of the continued usage scenario, and not the behavioral intention to adopt. While extant research has clearly demarcated the difference between the two approaches (continued usage vs. usage intention) of technology adoption [15,16], the novelty of this work lies in its selection of the antecedents for continued usage. From a theoretical perspective, the majority of the studies related to the adoption of smart products take an Expectation Confirmation Model (ECM) based approach (or some of its derivatives) while explaining the continuance usage intention. For example, the studies in [9,16,49,50] are mostly ECM based, where the satisfaction obtained after using a product is the prime factor affecting its continuance usage. While the role of satisfaction in determining the continued usage cannot be dismissed, for smart products like voice-assistants it might not be the best way to judge the continued usage scenario. This is primarily due to the different characteristics of voice-based interaction, which provides an entirely new way of interacting with devices equipped with these technologies. As mentioned before, the anthropomorphic capabilities possessed by the voice-assistants are unique and lead to better user engagement. Therefore, decoupling the central idea of satisfaction from continued usage and additionally incorporating the user engagement construct provides an alternative theoretical approach for evaluating the continued use construct. In the proposed model, both the impact and the effect size of the engagement construct on continued usage are far greater than those of satisfaction, with both constructs being significant.
This indicates that in case of technologies that incorporate artificial intelligence for making the devices smart and intelligent, ECM might not be the best theoretical tool to evaluate the continued usage scenario as it might overlook some aspects of technology adoption.
In line with existing research, this study also examines the effects of the utilitarian and hedonic attitudes that are formed by using the voice-assistants and how they shape the continuance usage scenario. However, attitude has been theorized as a second-order construct instead of the first-order construct that is commonplace in the existing literature on voice-assistant adoption [9,10,25,26]. Theoretically, such a proposition has the following advantages. First, extant research has established that attitude has a multidimensional nature with at least two components: utility and enjoyment [45,46]. In such a scenario, if the relationships between the second-order construct and its indicant lower-order factors are high, then the second-order construct represents an entity that reflects the essence of the specific lower-order factors, because the residual variance unexplained by the second-order construct in each indicator is small. For example, as evident from Fig. 2, for the present case both the utilitarian and hedonic dimensions are highly related to the attitude construct. Such a high and significant correlation indicates that attitude represents the variance shared by the two indicant factors: utilitarian and hedonic. While the significant effect of both these attitude dimensions on continuance usage is not a new finding, our conceptualization and treatment of attitude as a second-order construct is advantageous theoretically because it permits the creation of a simple causal structure for predicting the continuance usage without neglecting any specific details. Moreover, it also opens the possibility of including further specific details (apart from the utilitarian and hedonic aspects) that may serve as additional indicators of the continuance usage construct. If the additional indicators do not satisfactorily reflect the higher-order construct, i.e., if they have a weak correlation, then they might be best represented by an independent construct.
Therefore, by treating attitude as a second-order construct we can create a theoretical model that is flexible, permitting the addition of other specific constructs that might or might not be related to their specific higher-order levels. Having such theoretical flexibility is advantageous, especially when considering the adoption of smart products, because with advances in technology the smart products also evolve very quickly, which might change their adoption scenario. Companies keep adding new features, functionalities and services to their smart devices that might radically change the way people use them. For example, the authors in [79] show that fashion has an important role to play in technology acceptance. In fact, various smart devices like smartwatches, fitness trackers, smart-glasses, etc. are often perceived as fashionable items, which drives their usage [79,80,81]. Similar arguments can be made for the case of voice-assistants too. With rapid advancements in artificial intelligence technology, individuals may adopt and use the voice-assistants to look trendy and to enhance their social status within their peer groups. Such a fashion-centric attitude towards voice-assistant continuance usage has not been investigated by current works, and proposing attitude as a second-order construct gives the theoretical advantage and flexibility of adding this new attitude dimension. As contextual factors, the effects of trust and privacy risk are included in the proposed research model. While the effect of trust on the continuance usage intention is found to be significant, that of privacy risk is non-significant. Although these two factors are not new per se and have been used in numerous studies related to the adoption of smart devices, the current context is unique, and hence it is interesting to examine the effect of these two factors on the continuance usage of the voice-assistants.
Generally, the majority of the extant research points towards a positive significant effect of trust and a negative significant effect of privacy risk on the usage scenario. Thus, the current results diverge from prior work with respect to privacy risk. This is an important observation because, as mentioned previously, the unique features that are available with voice-based smart systems may make it difficult to explain their adoption in terms of the existing theories. The voice-assistants are like human companions, possessing anthropomorphic features with a certain degree of emotional support for their users. Moreover, the use of voice is convenient as it provides a hands-free way to do certain things, for example making payments while cooking or answering calls while driving, which makes the users inclined towards using these devices. By including the trust and risk factors in the model simultaneously, the users of the voice-assistants can easily create a mental model that allows them to judge the relative advantages and disadvantages of using these devices. This can be one possible explanation for the non-significant effect of privacy risk in this work, whereas in prior voice-assistant adoption works [1,9,22,26,82] it has been significant. Therefore, from a theoretical perspective, while engaging in adoption studies related to anthropomorphic technologies the dual effects of benefits and risks must be weighed simultaneously, as this will enable the users to create a cognitive model that better explains the adoption and usage scenario. The final key contribution of this work is the incorporation of the personal factors that characterize late adoption, slowness of adoption and skepticism, in order to explain the continuance usage. To the best of our knowledge, little of the extant literature has focused on the late adopters and their motivations for using smart devices like voice-assistants.
Therefore, the current work contributes to the literature on the adoption and diffusion of innovative technologies and devices by focusing on the specific aspects of this laggard group. Since the late adopters represent fifty percent of the users under the diffusion of innovation curve, understanding the factors that lead to their continued usage of any product/service is of utmost importance. Extant literature characterizes this type of user as being slow to adopt any new technology or innovation and being resistant to change [19,63,64]. Both these constructs are found to significantly affect the continued usage of the voice-assistants. This is in line with previous results in [63,64], which show this group to be more loyal customers having a high level of continuance intention. Moreover, it is also an indication that the current generation of voice-assistants are mature products with a reasonable price, which makes the late adopters use them on a continued basis. Therefore, including the personal factors related to the late adopters helps in advancing the literature on the continuance usage of smart devices by focusing specifically on this group of users.

B. PRACTICAL CONTRIBUTIONS
There are several implications for practice. First, for the manufacturers, it is important to make investments in voiceassistants keeping in mind the users' needs rather than following a market flooding strategy. They should know the users' usage pattern of the voice-assistants and the features that can improve the engagement with these devices. As technology improves continuously and we have a better understanding of natural language processing or artificial intelligence, the developers of the voice-assistants should focus on making the conversations with these devices more human-like and natural. For example, extant research has shown that the usability of voice-assistants depends on the accent of English spoken and varies among the native/nonnative speakers [83]. Similarly, the attractiveness of the voice (tone, timbre, sound intensity) also has effects on the usage scenario [84]. Therefore, the manufacturers must stress on the quality of the synthetic speech that the voice-assistants have for improving the engagement with these devices. The current findings also reveal the significant effects from both utilitarian and hedonic attitudes that the users have towards these devices. As utilitarian devices the users are engaged and motivated to use the voice-assistants for completing various tasks, for example searching for information, placing orders, making payments, seeking support and several others. The developers must focus on creating more application scenarios/skills for the voiceassistants focusing on this utilitarian value that provides convenience to the users. At the same time, the hedonic aspect should not be neglected too, and the developers may focus on creating more engaged and playful services like verbal games, quizzes among others to promote the continued usage scenario. For user engagement although all the three dimensions are significant, however, the effects from affective and behavioral dimensions are the strongest. 
Therefore, the practitioners are encouraged to closely consider the affective and behavioral aspects because they are the most salient dimensions of user engagement for explaining the voiceassistant continuance usage. To improve affection, the managers should consider how the voice-assistants can offer personal meaningful experiences to the users. For example, the idea of voice-assistants having emotions and acting as human companions can be made as the core marketing strategy for these devices that may be able to build a connection between the users and the company's brand. Specific target groups can be focused on, for example the elderly people since recent statistics show that globally there has been a rise in the number of such people living alone [85]. Therefore, the voice-assistants are ideal devices for them for providing companionship and assistance in various activities of daily living. Likewise, for focusing on the behavioral manifestation of the users the managers should consider the integration of the voice-assistants with a greater smart-home setup. This will provide an even greater opportunity for the users to interact and engage with voicebased services for a variety of purpose and increase the level of activation. The result from this work also has important implications as to how the practitioners should handle the issues of trust and privacy risks. The results demonstrate that trust is a more significant factor for the users. Although it can be argued that trust and risk are the opposite sides of the same coin, still it implies that the practitioners should market these products around the frame of trust rather than minimizing the effects of privacy risks. In fact, a survey in [86] found out that 75% of the users are willing to share their personal information with the brands that they trust. Amazon, Apple, Microsoft, Google and PayPal are found to be the top 5 trusted brands as per the survey reports. 
Interestingly, the place of Facebook on the list is 92, which may be due to some past trustcompromising activities and policies followed by the company. This gives a stern message to the marketers that the size and dominance of a company does not guarantee consumer trust. Thus, efforts should be made to leverage aspects such as guarantees with respect to product satisfaction, meeting the expectations of the users, having a transparent and user-friendly usage policy-all of which will be helpful for building and improving trust. At the same time anthropomorphizing the voice-assistants will also help to increase trust. Authors in [87] have found that giving human characteristics to technology improves the trust in the technology. Therefore, advertising the anthropomorphic features of voice-assistants will serve the dual purpose of improving user engagement as well as the feelings of trust. Finally, efforts should be given to create a cycle of high user satisfaction. Satisfaction is a key construct to guarantee continuance usage. Therefore, as the voice-assistants evolve with respect to new features and capabilities the users must be aware of the new skills. In the voice-assistant market there is no uniform product standard yet, and therefore it is highly possible that different companies will focus on different aspects of skill development that might confuse the users and create frustration. This can be a problematic scenario, especially for the late adopters since they prefer a stable environment and are generally resistant to changes. Therefore, it is extremely important for the developers to ensure that the new features that they incorporate to the voice-assistants are not too radical, nor they remove any features or functionalities that the devices currently have. The changes and advancements that are made should be incremental with an objective to provide a more stable and usable system, rather than providing gimmicky features. 
Therefore, marketers should prioritize how voice-assistants can assist users in their daily living, and should give them relevant examples of how any new skill or feature can benefit them and provide a sense of satisfaction.

V. LIMITATIONS AND FUTURE RESEARCH
While this research has certain strengths, such as strong predictive power, testing of alternative model structures, and the investigation of user engagement in the voice-assistant context, there are also limitations that must be acknowledged. The first limitation concerns the survey procedure. The data collected in this work is cross-sectional and can therefore measure the users' perceptions at only one point in time. Although this seems a valid choice for data collection given the objectives of this work (focusing on one specific stage of the adoption cycle, namely continuance usage), it might not fully capture the dynamic and interactive nature of the voice-assistant usage scenario. Therefore, future studies can extend the current findings by conducting a longitudinal survey to check whether user intentions change over time. Moreover, given the rapid pace at which artificial intelligence-based technologies are developing, the results of the current study cannot accurately represent where this technology is heading. New developments may change user habits and usage behavior, which is a further reason why a longitudinal survey might be a better option. Second, the focus of this work is on the continued usage of voice-assistants from a late adopter perspective. However, there may be many users who tried voice-assistants but discontinued their use for various reasons. Including such subjects would likely decrease the overall satisfaction level of the sample, which might in turn affect the continuance usage scenario. User commitment to a product or service is therefore an important aspect that has been neglected in this work, along with the reasons why certain users discontinue their use of voice-assistants. 
Future studies may therefore collect data in parallel from current users and from ex-users who discontinued their use of voice-assistants, in order to investigate the commitment and loyalty aspects that drive continuance usage. Third, the concept of user engagement used in this work is based on the work of the authors in [29]. Although this construct exhibits a high degree of reliability and validity in the current study, its theoretical basis lies in the brand value of a company and in online community settings. Brand engagement in a social media context may be different from engagement with emerging technologies and services provided by smart devices like voice-assistants. Therefore, future research may focus on developing a new