Systematic Literature Mapping of User Story Research

User stories are a widely used artifact in Agile software development. Currently, only a limited number of secondary studies have reviewed the research on the user story technique. These research reviews focused on specific research topics related to ambiguity of requirements, effort estimation, and the application of Natural Language Processing. To our knowledge, a systematic mapping of all user story research has not been performed. To this end, we study the academic literature to investigate what user story research has been performed, what types of problems have been identified, what sort of solutions or other types of research outcomes have been achieved, how mature the research is, and what research gaps exist. We followed Systematic Mapping Study guidelines to synthesize the currently available academic research on user stories. In total, we found 186 unique peer-reviewed studies, published in the period 2001-2021. We observed that research on the user story technique and its use has grown exponentially over the last seven years. Further, using a five-dimensional classification framework (requirements engineering activity, problem class, outcome class, type of research, type of publication), we observed several patterns in the classification of these studies across the different framework dimensions, which provided insights into the state-of-the-art and maturity of the research. We also identified four research gaps: the paucity of focused literature reviews; a lack of research on the role that user stories play in human cognition and interaction; a lack of comprehensive and mature solutions for resolving ambiguity issues with user stories early in the project; and a lack of validation and evaluation of proposed solutions. Several research opportunities are suggested, making our paper a useful reference for future research on user stories and allowing researchers to clearly position their contributions.


The associate editor coordinating the review of this manuscript and approving it for publication was Hailong Sun.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

I. INTRODUCTION
A rising trend in the adoption of Agile Software Development (ASD) practices has been observed after the publication of the Agile Manifesto [1], [2]. This trend can be explained by the rapidly changing business environment that requires adaptive and flexible systems to support business/organizational competencies [3], [4]. ASD addresses these needs by promising several benefits, including a high-quality product [1], [2], efficient resource usage [3], faster software development, and high adaptability of the requirements [4], [5]. ASD not only focuses on design and coding activities but also on requirements engineering (RE) activities that run through the project lifecycle [3], [6]. Those activities encompass requirements elicitation, documentation, analysis, negotiation, validation, and management [2], [7]. ASD offers flexible and iterative processes for identifying and changing requirements, even in late stages of the software development process [2], [8]. However, those processes require extensive interaction within the development team and between developers and users to assure software quality, timely delivery, customer satisfaction, and product conformance [9]-[12].
Regarding user-developer interaction, user stories have been proposed to express commitments between the development team and (a type of) user [13], [14]. These commitments, which express user expectations, are described in a semi-structured natural language, often using predefined templates [15], [16]. Providing such a user story template as a standard requirements description language encourages effective communication among stakeholders. Further, user stories are always written from a user perspective and actively involve users in the software development process. More effective communication among stakeholders improves productivity and customer satisfaction [17]-[21]. Throughout the project lifecycle, user stories help share the understanding of the expected system goals and functions and are also beneficial for monitoring progress towards developing the expected system features and identifying persisting problems [2], [15], [22], [23]. Despite these promises, empirical studies on the use and benefits of user stories have mixed results. On the one hand, studies have contributed empirical evidence that user stories are indeed advantageous in improving team productivity and software quality and in speeding up delivery [23]-[25]. On the other hand, some studies demonstrated that user stories are vulnerable to multiple interpretations and fail to capture complete requirements [21], [25]. These problems have encouraged researchers to improve the user story technique, for instance by leveraging the knowledge captured in conceptual models [26]-[29], documented user experiences [15], [30], and ontologies [31]-[33].
These proposed improvements intend to more unambiguously and completely capture and specify user requirements, by bridging knowledge gaps between user and developer communities and by developing a shared understanding of system features among project stakeholders. However, it is not known whether these solutions solve all the problems and how effective these solutions are.
In [34], we performed a systematic literature review of 36 studies on ambiguity in user stories. The aim of the review was to analyse and synthesise what is known about ambiguity in user stories in terms of how this problem manifests itself, and what causes and effects have been described in the literature. Furthermore, the review discusses and compares the different solutions that have been proposed and the empirical evidence of their effectiveness. The purpose of the review was to provide information that researchers interested in further investigating or solving the problem of imprecise or multiple interpretations of user stories, can use to motivate their research questions and to compare their outcomes with the most relevant related work.
The 36 studies reviewed were taken from a much larger set of 165 peer-reviewed studies investigating the user story technique or the use of user stories in ASD practice. Requirements ambiguity related problems were just one type of problem addressed by these studies. A comprehensive mapping of the 165 studies in terms of problems investigated, solutions proposed, and evidence of solution effectiveness, was not within the scope of our past systematic literature review as the sole focus of this review was the problem of ambiguous user stories. Also, a systematic search of the literature did not identify published literature studies of RE or ASD that systematically mapped all known research on the user story technique or the use of user stories in RE activities during ASD projects.
Our current study fills this gap by mapping the currently documented knowledge of applications, evaluations, and improvements of the user story technique. This paper extends our previous study by updating the systematic search for published peer-reviewed research on user stories (resulting in an extended set of 186 unique studies) and investigating what research on user stories has been performed, what type of problems have been addressed, what sort of outcomes these studies have produced, how mature the research on user stories is, and what research gaps and further research opportunities exist regarding user stories and their use during RE activities of ASD projects. We believe this systematic literature mapping can be a useful reference for future research on any topic related to the use of user stories in ASD or topics related to RE in Agile methodologies or ASD projects (and not just requirements ambiguity as in our previous study [34]) allowing researchers to clearly position their contributions.
Our systematic literature mapping was guided by the following research questions (RQs):

This paper is divided into six sections. The first section, which is this section, presents the knowledge gap and our study's objective. Section 2 describes and exemplifies key concepts of the user story technique and briefly discusses previous literature studies in ASD or RE, to contrast them with our study. Next, section 3 presents our literature search, selection and classification methodology. Subsequently, section 4 presents the results of our systematic literature mapping. Section 5 discusses these results and their implications in terms of future research opportunities. It also discusses the limitations of our mapping study. Finally, section 6 concludes the paper.

II. BACKGROUND AND RELATED WORK
As background for our paper, we first explain what a user story is in the context of ASD. Next, we discuss secondary studies (i.e., research reviews and literature mappings) in the domains of ASD and RE.

A. THE USER STORY TECHNIQUE
The user story is a lightweight RE artifact that has been promoted by major ASD methodologies like Scrum and SAFe. It allows for a standardized description of system features required by users, who are actively involved in the RE process. These requirements are always formulated from the perspective of a (type of) system user. Prospective system users are thus the source of the user stories, which, once written, act as a contract between these users and the development team. All changes to user stories need to be negotiated, and the planning and progress monitoring of system development can be based on prioritized lists of user stories. Thus, a user story becomes a 'unit' of system functionality based on which system development is managed. In ASD practice, templates are used to standardize user story formulation. The best known of these templates is the Connextra template, popularized by Mike Cohn [14]. In the survey reported in Lucassen et al. [24], 59% of the respondents indicated using this template for user story writing. The template is:
"As a <role>, I want <goal> so that <benefit>"
In this template, <role> describes a (type of) system user who wants the system to achieve or do something, <goal> describes the action to be performed by the system in support of the user, and <benefit> provides the rationale for this action for the user. An example user story following this template is:

Related to RE in ASD, Inayat et al. [35] and Heikkila et al. [11] conducted, respectively, a systematic literature review and a systematic mapping study, both published in 2015.
These studies provide insights into the challenges faced in traditional RE, how ASD overcame those challenges, and what kind of problems persist. Other systematic literature reviews and mapping studies investigating ASD, though not specifically focusing on RE activities in ASD projects, were reported in [36]-[40]. In the RE domain, systematic literature reviews and mapping studies were mainly conducted to identify frequently used techniques, current limitations of the techniques, and characteristics of RE artifacts [6], [41]-[49]. More specifically related to user stories, apart from our own review of the research on ambiguity in user stories [34], the systematic search of the literature that we conducted for this paper (see section 3) returned three reviews of user story research. Khan et al. [50] reviewed 24 papers on project effort estimation techniques that are based on user stories. The goal of the literature review was to identify those characteristics of user stories that affect the effort estimates. A similar review was performed by Duran et al. [51], who investigated, based on a set of 26 papers, which attributes related to people, software systems, teams, projects, and organizations are used to estimate the complexity of user stories. The most recent review (published in 2021) is the study of Raharjana et al. [52], who reviewed 38 studies that discuss the application of Natural Language Processing (NLP) techniques to user stories. Their review identifies different purposes of applying NLP, concludes that NLP helps manage user stories, and identifies opportunities and challenges in applying NLP techniques to user stories.
All these literature studies on user stories (i.e., [50]-[52], [228]) were systematic literature reviews that reviewed a limited set of papers (24 to 38) for answering specific review research questions. Given that the user story is the most popular RE artifact in ASD [15], [16], [24] and interaction with users during ASD remains challenging [2], [14], [53], a systematic mapping of all published research on user stories will contribute to an overview and better understanding of our knowledge regarding this technique. As a systematic literature mapping of 186 unique peer-reviewed studies on user stories, this paper distinguishes itself from the systematic literature reviews with specific user story research foci that were discussed in this section.
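To make the Connextra template from sub-section A more concrete, the following minimal Python sketch splits a story into its role, goal, and benefit parts. It is an illustration only, not a tool from the reviewed literature: the regular expression, the `UserStory` type, the `parse_user_story` function, and the treatment of the benefit clause as optional are all assumptions made for this example, and the sample story in the usage note is hypothetical.

```python
import re
from typing import NamedTuple, Optional


class UserStory(NamedTuple):
    """The three slots of the Connextra template (names are illustrative)."""
    role: str
    goal: str
    benefit: Optional[str]


# Illustrative pattern for "As a <role>, I want <goal> so that <benefit>".
# Assumption: the "so that" clause and the trailing period are optional.
CONNEXTRA = re.compile(
    r"^As an? (?P<role>.+?),? I want (?P<goal>.+?)"
    r"(?:,? so that (?P<benefit>.+?))?\.?$",
    re.IGNORECASE,
)


def parse_user_story(text: str) -> Optional[UserStory]:
    """Return the template slots, or None if the text does not fit the template."""
    match = CONNEXTRA.match(text.strip())
    if match is None:
        return None
    return UserStory(match["role"], match["goal"], match["benefit"])
```

For a hypothetical story such as "As a librarian, I want to search the catalogue by author so that I can find books quickly.", the function yields the role "librarian", the goal "to search the catalogue by author", and the benefit "I can find books quickly". A sentence that does not follow the template yields `None`. A purely syntactic check like this says nothing about whether a story is ambiguous or incomplete.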

III. METHODOLOGY
We followed the guidelines of Petersen et al. [54] for systematic mapping studies in Software Engineering to retrieve to the best possible extent all relevant studies from carefully selected digital libraries and to summarize these studies to provide a structured overview of the published research on user stories. This section presents our search strategy, the selection process, and the classification schema that we used to structure the literature overview and summarize the state-of-the-art, as an answer to our research questions (see Section 1).
The search strategy and selection process that are presented in this section were applied in our previous systematic literature review on ambiguity in user stories, as in the first search and selection phase of that review study, we aimed for exhaustiveness in our identification of peer-reviewed papers on user stories. In the second phase of the selection process, we retained only the papers that addressed ambiguity problems with user stories (i.e., 36 out of 165 papers). To make this paper self-contained, we repeat the description of the search strategy and selection process as they were published in [34]. However, we emphasize that the paper counts mentioned are different as we repeated the search and selection at the end of December 2021, focusing on recently published papers. This way, we could extend our set of relevant papers with 21 new papers, for a total of 186 papers to be analysed. The classification schema used for this analysis is new, hence not taken from [34].
The search strategy was designed by first defining our search space as consisting of the following digital libraries: Web of Science, Scopus, Science Direct, Google Scholar, IEEE Xplore, Association for Computing Machinery (ACM) digital library, and Association for Information Systems (AIS) e-Library. The reason for selecting these digital libraries was pragmatic: they are either freely accessible or our research institute provides access to them. Especially the inclusion of Google Scholar ensures that we cover with near certainty the entirety of the academic literature. On the other hand, it necessitates care in the selection of documents as Google Scholar also includes unpublished reports and other forms of 'grey literature' (see sub-section 3.2). That is why we found it useful to include also digital libraries that mainly or exclusively contain journals, proceedings, and books for which peer review is assured. These other libraries might compensate for flaws in the search engine of Google Scholar or can be used to verify if a certain document found with Google Scholar was likely to be peer-reviewed.
Relevant sources were then searched using the search string "user story OR user stories", which was applied by the digital libraries' search engines to the title, abstract or keywords of indexed documents, or any combination of these, depending on the search engine's functionality. We limited our search to documents published since 2001, which is the year of publication of the Agile Manifesto. To further limit the search to the appropriate ASD context, we concatenated another search string with names or abbreviations of Agile methodologies that prescribe or suggest the use of user stories (or artefacts like user stories). We found that user stories play a role in documenting requirements in several ASD methodologies that are well known and widely used: Scrum, Extreme Programming (XP), Scaled Agile Framework (SAFe), Behavior-Driven Development (BDD) and Feature-Driven Development (FDD). We explicitly included the BDD and FDD methods because they extend user stories (i.e., test scenarios in BDD) or offer an alternative to user stories (i.e., features in FDD), so it is plausible that papers reporting on research related to these ASD methodologies also investigate the user story artifact. We also found that hybrid ASD methodologies (e.g., Kanban and Scrum, RUP and XP) employ the user story technique, although not always in a primary role. The full search string we eventually used was:

Prior to the selection of this search string, different combinations of search terms were constructed using the PICOC method [55]. Those combinations were then tested as queries in the search engines of the selected digital libraries. The results were compared to select the search string that returned the most comprehensive set of studies.
We also deliberately added "agile" to the second term in the concatenation, to cover documents that do not explicitly mention a particular ASD methodology in their title, abstract or keywords. As a drawback, we noticed that the search engines returned several documents that incidentally mention the word "agile" and consider some concept of user story in a different context than ASD. We addressed this drawback through our choice of inclusion criteria that select papers based on relevance (see sub-section 3.2).

B. STUDY SELECTION
Next, inclusion and exclusion criteria were defined for deciding which documents returned by the search engines were relevant (i.e., inclusion) and of sufficient quality (i.e., exclusion) to be further considered. Given the high number of documents returned (n = 597), these criteria were manually applied by the corresponding author only. Fig. 1 summarizes the selection process. We first applied the following exclusion criteria:
• Text in English: Documents in other languages were immediately excluded.
• Peer-reviewed publications: We require that documents are published in journals, proceedings, or books for which we may reasonably assume or in case of doubt (e.g., retrieved via Google Scholar) can verify that they use peer review. This excludes unpublished research reports and 'grey literature' (e.g., practitioner guidelines, opinion articles, company white papers) as we wished to be assured to a reasonable extent of their quality, as independently assessed through the peer-review process. As we wished to map research studies only, editorials or introductions to special issues in academic outlets were also excluded.
• Secondary studies: Research review studies (e.g., the literature studies mentioned in Section 2) were excluded as we were looking for primary studies only. This criterion was easy to apply by inspecting the document's title and abstract.
For the documents that were not excluded by the former criteria, we then used the following criteria for deciding on a document's relevance, which were applied by inspecting each document's abstract:
• Agile Software Development (ASD) context: After some trials, we learned that the concept of user story is also used in other domains (e.g., healthcare, movies). The term "agile" may also appear in these other contexts, which is why such documents were returned. A document is only relevant if the research context is software development.
• Focus on user stories: Documents could have been returned by the search engines just because the term "user story" was mentioned. We consider a document as relevant if, based on the abstract, the document reports on research that has the user story technique (or its use) as object of study.
Next, we again applied exclusion criteria, in the following order:
• Full text is not available: For documents that were, according to the inclusion criteria, relevant based on their abstract, but for which we could not access the full text, we contacted the authors to obtain the full text through academic social networks such as ResearchGate, which often succeeded, but not always. We applied this exclusion criterion in the last phase of the selection because we reckoned that the abstract alone might not be sufficient for investigating our research questions.
• Completeness: To be able to classify studies (see sub-section 3.3), the documents must clearly state the research problem or issue investigated, the research question(s)/objective(s), the research methodology, and the research outcomes. If this information could not be retrieved from the abstract, we searched for it in the rest of the document.
The search was performed on the different databases iteratively, with the last run in December 2021. Duplicates returned by more than one search engine were eliminated. In total, 597 documents were automatically extracted by running the search query. This set contained 172 duplicates. The other 425 documents were submitted to our selection criteria.
In total, 160 papers (i.e., unique research studies relevant to our mapping study) were selected. The other documents were eliminated on the grounds of 'grey literature' (41 documents), non-English papers (1 document), full-text not available to us (55 documents), no focus on user stories as object of study in an ASD context (154 documents), and secondary studies (14 documents). No documents were rejected for reasons of incompleteness, which might be explained by the prior exclusion of non-peer reviewed publications.
Next, a limited snowballing process was applied to search for additional studies that were missed by our search strategy. To identify such papers, we analyzed the fourteen research reviews. We also searched for other relevant papers, not yet in our set of documents, published by (the few) authors that had more than one publication in our document set. All candidate papers were submitted to our inclusion and exclusion criteria. This snowballing process yielded 26 additional papers. Ultimately, 186 studies were selected as relevant for the literature mapping (see Table 3 in the Appendix). These studies were then classified and analyzed to investigate the RQs.
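The counts reported for the selection process can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in this sub-section. The variable names are illustrative, and the script is a consistency check written for this purpose, not part of the original selection procedure.

```python
# Counts as reported in the study selection description.
retrieved = 597    # documents returned by the search engines
duplicates = 172   # documents returned by more than one search engine

# Documents eliminated by the exclusion and inclusion criteria.
eliminated = {
    "grey literature": 41,
    "non-English": 1,
    "full text not available": 55,
    "no focus on user stories in an ASD context": 154,
    "secondary studies": 14,
}

snowballed = 26    # additional papers found through limited snowballing

screened = retrieved - duplicates              # submitted to the criteria
selected = screened - sum(eliminated.values()) # kept after screening
total = selected + snowballed                  # final set for the mapping

print(screened, selected, total)  # 425 160 186
```

The three results match the 425 screened documents, the 160 selected papers, and the 186 studies in the final set.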

C. CLASSIFICATION SCHEMA
To map the selected studies, we constructed a multidimensional classification schema, covering research area, research problem, research outcome, research type, and publication type, where each dimension corresponds to an RQ (see Section 1). As no significant patterns were found in author data (e.g., frequent authors and affiliations), we decided to exclude demographic data from our classification schema.
For two classification dimensions (i.e., research area, type of research), the initial values were pre-defined based on existing classification schemes. For two other dimensions (i.e., research problem, research outcome), the classes emerged during the classification itself by grouping similar papers.
The classification of the 186 papers was done by the corresponding author based on the contents of the full paper. Any doubts were discussed with the second author.

1) RESEARCH AREA
User stories are used as an RE artifact throughout the project lifecycle. We noticed that research on user stories assumes a certain context of use of the user story technique or the user stories themselves. This context relates to the RE activity in which the user story technique was used or for which the user stories were used. We accordingly classify studies in three distinct research areas (or contexts) following the main groups of activities in the RE process as identified in [56]: requirements elicitation and documentation, requirements analysis and negotiation, and requirements validation and management. These areas are distinct from each other, while being broad enough to allow for a more fine-grained classification within each research area.
• Requirements elicitation and documentation activities collect the requirements from stakeholders through means like discussions, interviews, and workshops. The process might also involve the use of models, such as a domain ontology, to facilitate understanding among stakeholders. The requirements are then adjusted into the user story format to establish a common basis for planning the software development process and facilitating a common understanding of what features should be developed.
• Requirements analysis and negotiation activities focus on analyzing the requirements documented in user stories to further specify and refine them or to support project management activities like project planning (e.g., prioritizing user stories or using user stories as a basis for effort and cost estimation). Analysis may involve specifying, representing, or visualizing user stories using different types of models. Also included are project management activities related to identifying and resolving prioritization and estimation conflicts caused by different perceptions of the requirements' importance and required development effort. Negotiations are conducted to achieve compromises that please all stakeholders to the greatest possible extent.
• Requirements validation and management activities refer to requirements testing and requirements (change) management processes, with related techniques, tools, and assessments. The requirements validation process involves the use of test-case techniques, acceptance tests, and other technical reviews to ensure that the requirements continually meet stakeholders' expectations. Meanwhile, requirements management deals with assessing which requirements are affected by changes to other requirements and ensuring that the documented requirements have been addressed during system design. Requirements management also concerns the monitoring of the implementation of the requirements.
This classification also shows that user stories were studied both as an artifact to document the output of RE activities (i.e., elicitation and documentation research area) and as an input to RE activities (i.e., analysis and negotiation and validation and management research areas).

2) RESEARCH PROBLEM
This classification dimension emerged during the analysis of the selected studies and resulted from a synthesis of the problems with the user story technique addressed by these studies. We finally settled on the following classes of problems:
• Ambiguity refers to problems regarding the articulation of requirements as user stories, which may cause doubtful, imprecise and multiple interpretations of the requirements [34]. These problems are typically caused by different uses of language to express requirements, limitations of the user story template, and differences in application domain knowledge and experience.
• Collaboration is a class of problems that refer to a lack of effective collaboration within the project team and between project stakeholders and that can be traced back to user stories (e.g., lack of user story validation, conflicts of interest, non-participation of stakeholders). In contrast to ambiguity, collaboration problems are not related to the application of the user story technique itself, but to the use of user stories during the project as a mechanism for facilitating communication and collaboration [34].
• System design covers problems related to dependencies amongst user stories, the complexity and accuracy of the requirements articulated as user stories, and the conformance of the system architecture to the user stories. Studies in this class emphasize the importance of having high-quality user stories to provide a basis for designing a reliable, flexible, adaptive, and responsive system which conforms to the requirements. This class contains papers that address problems related to the impact that user stories have on the quality of the system and its development [34].

3) RESEARCH OUTCOME
The values of this dimension were not pre-defined but emerged during the classification of the results and findings of the reviewed studies. We synthesized the research outcomes of the reviewed studies in six classes:
• Description: Research that observes the use of user stories in organizations, via surveys, questionnaires, or other observational methods. Research outcomes are findings resulting from an analysis of the observed practice of the user story technique.
• Explanation: This type of outcome confirms or rejects hypotheses related to the use of user stories. The explanation offers a confirmation or rebuttal of a hypothesized relationship between on the one hand the use of user stories or one or more properties of these user stories and on the other hand variables of interest (e.g., requirements understanding).
• Algorithm: This type of outcome comprises solutions that take the form of a prescription of a series of computational operations to be performed on user stories. Examples include algorithms for similarity checking, effort estimation and requirements prioritization.
• Model: This class was used for proposed solutions that use graphical models for understanding the interaction among user stories and analyzing dependencies between user stories. The models that are proposed are usually those used in RE in the context of more traditional software development methodologies (e.g., goal models, process models, use case diagrams, class diagrams), but new types of models or information visualizations are also included in this class.
• Prototype: Research that presents tools (typically, prototypes used in research or laboratory environments that have not yet been commercialized) for supporting the management of user stories.
• Framework: While the previous solutions focus on ranking or performing calculations on/with user stories (i.e., algorithms), visualizing collections of user stories (i.e., models) and managing such collections (i.e., prototypes), a wide variety of solutions were proposed for improving user story writing. Any artifact that is proposed as an instrument to help with user story writing (e.g., ontologies, taxonomies, sentence patterns, glossaries, templates), was classified as framework.

4) RESEARCH TYPE
With research type, we wish to capture the extent to which research outcomes are supported by empirical evidence.
Has feasibility been shown? Was the solution validated in a setting created by the researcher? Have problems been conceived based on research gaps or have they been observed in practice? Have proposed solutions been implemented and has their performance in practice been observed? Guided by these questions, our mapping study classified user story studies into three groups: proposed-of-solution, validation research and evaluation research. This classification is similar to the classification of RE papers by Wieringa et al. [57].
• Proposed-of-solution: This type of study elucidates a proposed solution (i.e., framework, algorithm, model, prototype) for a user story technique related problem, using an example or proof of concept, but without a proper validation or evaluation. Justification of the solution is obtained by comparing the proposed solution with related work or by demonstrating that the solution works (i.e., a proof of concept).
• Validation research: This type of study not just illustrates or demonstrates a proposed solution, but also validates the solution design or tests a hypothesis about the solution through an experiment or simulation study. Quantitative analysis of the experimental or simulation data is used to test the utility, quality, effectiveness, or efficiency of the solution.
• Evaluation research: This category of studies evaluates the utility, quality, effectiveness, or efficiency of a solution by observing its implementation in the real-world. Also, research that describes practices or problems or investigates hypothesized phenomena related to the use of the user story technique in practice, belongs to this class.

5) PUBLICATION TYPE
The identification of (the type of) publication venue is essential to identify at which academic level user story research has been acknowledged. This classification is also vital to locate scientific events where extensive knowledge can be gained and relevant feedback on user story research can be obtained. Here, we classified research publications into three types: journal articles, conference proceedings papers, and book chapters. We also investigated when the different studies were published to see if any trends in the research can be discerned.
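The five classification dimensions can be read as a record with one field per dimension. The following is a minimal sketch under stated assumptions: the names `MappedStudy`, `SOLUTION_OUTCOMES`, and the enum identifiers are hypothetical and are not part of the mapping study; only the category values are taken from this section. The consistency check encodes the idea that a proposed-of-solution study presupposes a solution-oriented outcome.

```python
from dataclasses import dataclass
from enum import Enum


class REActivity(Enum):
    ELICITATION_DOCUMENTATION = "requirements elicitation and documentation"
    ANALYSIS_NEGOTIATION = "requirements analysis and negotiation"
    VALIDATION_MANAGEMENT = "requirements validation and management"


class ProblemClass(Enum):
    AMBIGUITY = "ambiguity"
    COLLABORATION = "collaboration"
    SYSTEM_DESIGN = "system design"


class OutcomeClass(Enum):
    FRAMEWORK = "Framework"
    ALGORITHM = "Algorithm"
    MODEL = "Model"
    PROTOTYPE = "Prototype"
    DESCRIPTION = "Description"
    EXPLANATION = "Explanation"


class ResearchType(Enum):
    PROPOSED_OF_SOLUTION = "proposed-of-solution"
    VALIDATION = "validation research"
    EVALUATION = "evaluation research"


class PublicationType(Enum):
    JOURNAL_ARTICLE = "journal article"
    CONFERENCE_PAPER = "conference proceedings paper"
    BOOK_CHAPTER = "book chapter"


# Outcome classes that propose a solution (as opposed to describing or explaining).
SOLUTION_OUTCOMES = {
    OutcomeClass.FRAMEWORK,
    OutcomeClass.ALGORITHM,
    OutcomeClass.MODEL,
    OutcomeClass.PROTOTYPE,
}


@dataclass(frozen=True)
class MappedStudy:
    """One classified study; field names are illustrative assumptions."""
    reference: str
    year: int
    re_activity: REActivity
    problem_class: ProblemClass
    outcome_class: OutcomeClass
    research_type: ResearchType
    publication_type: PublicationType

    def __post_init__(self) -> None:
        # The mapping covers studies published in 2001-2021.
        if not 2001 <= self.year <= 2021:
            raise ValueError("publication year outside the mapped period")
        # A proposed-of-solution study must actually propose a solution.
        if (self.research_type is ResearchType.PROPOSED_OF_SOLUTION
                and self.outcome_class not in SOLUTION_OUTCOMES):
            raise ValueError("proposed-of-solution requires a solution outcome")
```

Encoding the categories as enumerations rather than free text keeps later cross-tabulations (e.g., outcome class against problem class) free of spelling variants.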

IV. RESULTS
A. (RQ1) WHAT RESEARCH AREAS RELATED TO THE USER STORY TECHNIQUE CAN BE IDENTIFIED?
Figure 2 shows the absolute and relative numbers of papers classified per RE activity. Apart from being classified into three distinct research areas related to broad classes of RE activities [56], the selected studies were also classified at a finer granularity, per activity within each area, following the definition of RE activities in [58]. For the requirements analysis and negotiation research area, this meant that the papers classified in that area studied user stories as a means to further specify system requirements, as a basis for prioritizing requirements, and as a basis for estimating project resources, including time, budget, and effort. The other two research areas were decomposed into their defining constituents.
Figure 2 shows that more than half of the selected studies investigated user stories in the context of requirements analysis and negotiation activities (53%, 98 documents). Surprisingly, despite being widely prescribed or suggested as a requirements documentation artifact in the main Agile methodologies, the user story technique has been researched much less in relation to requirements elicitation
By performing a thematic analysis, we identified twenty-two unique issues of interest (briefly, problems) that were discussed in our set of 186 papers. We aggregated these problems into the problem classes ambiguity, collaboration, and system design, as explained in sub-section 3.3.2. The distribution of studies amongst these classes is 23% (42 documents) for ambiguity, 26% (48 documents) for collaboration, and 52% (96 documents) for system design. By combining this classification with the classification according to research area, we obtain Figure 3 and Table 1.
Four unique problems were identified that could be classified as related to ambiguity. Ambiguity problems arise due to user stories having doubtful, imprecise, or multiple interpretations. Studies focusing on the vagueness of requirements formulated as user stories have investigated different sources of ambiguity that (potentially) result in interpretation problems. Other studies have investigated ambiguity problems as a consequence of multiple or uncertain interpretations of user stories. Three problems were distinguished: user stories being understood as inconsistent; user stories being perceived as insufficiently describing requirements (regarding completeness and precision); and user stories being judged as duplicating functionality [34]. These problems have mostly been situated in requirements elicitation and documentation (64%) as during these RE activities, the requirements are elicited, elucidated, and written as user stories. Other studies assume the context of requirements analysis and negotiation (36%), as it is during these activities that interpretation problems surface (e.g., during an analysis of the consistency and completeness of a set of user stories). No studies were found that situate ambiguity problems related to user stories in requirements validation and management activities.
Studies focusing on collaboration problems related to the use of user stories have investigated the role of user stories in human interaction during software development. The focus of these studies is on shortcomings (and solutions for these) that user stories have in facilitating communication and collaboration. Eight different problems have been identified in studies that assume a context of user story use during requirements analysis and negotiation (40%) and requirements validation and management (44%) activities. Two of these problems, conflicts of interest between stakeholders and communication challenges, have also been investigated in relation to requirements elicitation and documentation (17%).
Finally, regarding system design, studies have focused on problems with system development and the quality of the resulting systems that can be traced back to (lack of) user story quality or shortcomings of methods that rely on high-quality user stories as input. Considering that most studies have been conducted in the context of requirements analysis and negotiation activities (67%), the main issues investigated are project management problems with resource estimation, planning, prioritization, and other types of analysis based on user stories. Studies have also addressed shortcomings of the user story technique in capturing security/privacy constraints and non-functional requirements.

C. (RQ3) WHAT KIND OF OUTCOMES HAVE BEEN ACHIEVED?
We grouped the research outcomes of the 186 selected papers in six classes that emerged during the classification.
The outcome classes Framework, Algorithm, Model, and Prototype are solution-oriented, whereas Description and Explanation intend to increase the understanding of observed user story related problems and practices or the impact of the use or quality of user stories on other variables relevant to ASD.
The distribution of studies amongst research outcome classes is, in decreasing order of frequency, Algorithm (39%, 72 documents), Model (17%, 31 documents), Framework (14%, 26 documents), Description (12%, 22 documents), Prototype (11%, 20 documents), and Explanation (8%, 15 documents). Figure 4 shows the absolute frequencies of the studies within each outcome class for the research area (right-hand side, confer RQ1) and type of problem addressed (left-hand side, confer RQ2). Table 2 groups all mapped papers by research outcome (and outcome classes) and further classifies them by research area and problem class. In what follows, we comment on the main insights obtained from this mapping of research outcomes.
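As a quick consistency check, the reported percentages can be recomputed from the document counts. The snippet below is an illustrative sketch only; the names `OUTCOME_COUNTS` and `percentage_shares` are our own assumptions, while the counts are those reported above.

```python
from typing import Dict

# Document counts per outcome class, as reported in the text.
OUTCOME_COUNTS: Dict[str, int] = {
    "Algorithm": 72,
    "Model": 31,
    "Framework": 26,
    "Description": 22,
    "Prototype": 20,
    "Explanation": 15,
}


def percentage_shares(counts: Dict[str, int]) -> Dict[str, int]:
    """Return each class's share of the total, rounded to whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


if __name__ == "__main__":
    for name, pct in percentage_shares(OUTCOME_COUNTS).items():
        print(f"{name}: {pct}% ({OUTCOME_COUNTS[name]} documents)")
```

The counts sum to the 186 mapped studies, and the rounded shares reproduce the percentages quoted above.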
We identified four patterns in the data. First, the most frequent outcome, Algorithm, is clearly overrepresented in the system design problem class (i.e., 67% of the studies that propose algorithmic solutions versus 50% of the studies classified as addressing system design problems) and slightly overrepresented for the requirements analysis and negotiation research area (i.e., 61% of the studies that propose algorithmic solutions versus 45% of the studies that assume requirements analysis and negotiation activities as research context). For system design classified problems and requirements analysis and negotiation activities, Algorithm is also the most frequent outcome class; in both cases it is twice as frequent as any other kind of outcome. Inspecting the papers for those two partly overlapping 'bubbles' in Fig. 4 (i.e., 'bubbles' with sizes 48 and 44), we see that algorithms have mainly been proposed for system design type of problems related to project management activities such as effort/cost/time estimation, project planning optimization, and requirements/work prioritization. These problems are well-defined in terms of what the expected outcomes are and rely on requirements documented as user stories as input, where the quality of the results (e.g., estimation accuracy) strongly depends on the quality of the user stories as input. For these well-defined problems, algorithms are proposed as well-defined solutions. It is not surprising that machine learning techniques (e.g., NLP-based) were almost exclusively applied in the studies attributed to this pattern.
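The notion of overrepresentation used here compares the share of a category within one outcome class against a baseline share. A minimal sketch of that comparison follows, under two assumptions of ours: the bubble sizes 48 and 44 are taken to be the Algorithm studies in the system design class and in the analysis and negotiation area respectively (which matches the quoted 67% and 61%), and the baselines are the 50% and 45% quoted above. The function names and the ratio measure are hypothetical, chosen for illustration.

```python
def within_class_share(cell_count: int, class_total: int) -> float:
    """Share of an outcome class's studies that fall into one cell."""
    return cell_count / class_total


def representation_ratio(share: float, baseline: float) -> float:
    """Ratio above 1 means the cell is overrepresented relative to the baseline."""
    return share / baseline


ALGORITHM_TOTAL = 72  # studies in the Algorithm outcome class

# Assumed mapping of bubble sizes to cells (see lead-in).
system_design_share = within_class_share(48, ALGORITHM_TOTAL)
analysis_negotiation_share = within_class_share(44, ALGORITHM_TOTAL)

ratios = {
    "system design": representation_ratio(system_design_share, 0.50),
    "analysis and negotiation": representation_ratio(analysis_negotiation_share, 0.45),
}

if __name__ == "__main__":
    for cell, ratio in ratios.items():
        print(f"{cell}: representation ratio {ratio:.2f}")
```

A ratio is only one possible measure; a difference in percentage points would serve equally well for comparing a within-class share with its baseline.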
Second, studies proposing solutions classified as Model target problems classified as system design or ambiguity related, where they are clearly overrepresented for the latter problem class (i.e., 52% of the studies that propose the use of models as solution versus 38% of the studies classified in the ambiguity problem class). These studies are also overrepresented in the requirements analysis and negotiation research area (i.e., 58% of the studies that propose the use of models versus 18% of the studies that assume requirements analysis and negotiation activities as research context). For ambiguity related problems, Model is also the most frequent type of outcome/solution (i.e., respectively 38% of outcomes and 43% of solutions when excluding Description and Explanation outcome classes). A closer look at these studies shows that the use of models is particularly proposed to analyze inconsistencies in related sets of user stories (e.g., epics) that show up during requirements specification activities. By using models to visually depict related user stories, dependencies between user stories can be discovered and problems caused by such dependencies can be more easily diagnosed. Some studies classified in the Algorithm outcome class also support this purpose as they present NLP-based algorithms to generate models from sets of user stories, but do not focus on the use of those models to address ambiguity problems. Not surprisingly, the types of models proposed for requirements elicitation and documentation are mostly conceptual models (e.g., goal models) whereas they are software models (e.g., UML diagrams) for requirements analysis and negotiation.
Third, studies classified as Description, are mostly classified in the collaboration problem class (54% compared to 25% of all studies) and the analysis and negotiation group of RE activities (59% compared to 13% of all studies). These studies focus on getting a deeper understanding of how user stories facilitate communication and collaboration within ASD project teams during system specification activities and what shortcomings have been observed with using user stories in these activities.
Fourth, the use of various types of solution artifacts classified as Framework (see sub-section 3.2.3) is overrepresented in the ambiguity problem class (46% compared to 23% of all studies) and is also common in the system design problem class (50% compared to 52% of all studies). Solutions such as ontologies, taxonomies, glossaries, sentence patterns, controlled languages and user story template extensions have been proposed for improving user story writing (61% of studies in the Framework outcome class are classified in the requirements elicitation and documentation research area, where only 33% of all studies are situated). Studies in the Framework outcome class generally propose solutions for avoiding multiple interpretations (i.e., ambiguity) and improving the quality of requirements documented as user stories (i.e., system design).
As for the Prototype outcome class, we see no clear pattern in this class. We expected to find this type of outcome more frequently in studies that we classified as related to requirements management and validation activities. The relative scarcity of solutions for which working prototype software has been developed is an indication that many solutions are presented conceptually (e.g., as frameworks, algorithms, use of models), without being automated or supported by software, which may be a hindrance to their implementation in practice.
Finally, for the Explanation outcome class, we observe a scarcity of studies and no clear pattern of distribution over problem class and research area. The low number of studies investigating associations or causal relationships between the use and quality of user stories and other variables of interest in ASD might be explained by most papers being in engineering type of journals and conferences, apart from some outlets related to human and cognitive aspects of software engineering or human-computer interaction; searching for explanations of phenomena is more common in the social-behavioral sciences.
Table 2. Summary of proposed solutions or other research outcomes in the mapped literature (studies are identified by their reference in the bibliography at the end of this paper).

D. (RQ4) HOW HAS THE RESEARCH BEEN CONDUCTED?
The distribution of papers over research types provides an indication of the maturity level of the state-of-the-art in a research field [57]. Figure 5 shows that for the research on user stories, it ranges from proposed-of-solution (28%, 52 documents), which can be considered the lowest maturity level, over validation research (46%, 84 documents) to evaluation research (26%, 50 documents), which can be considered the highest maturity level. The research on user story ambiguity seems to be the least mature, with a large proportion of papers (40%) falling into the proposed-of-solution class. For user story research focusing on collaboration problems, the picture is different with 84% of the studies being of the validation or evaluation research types. For the system design class, 48% of the studies are of the validation research type and a further 24% are of the evaluation research type.
Other insights are obtained by mapping research type against both problem class and research outcome class (Figure 6). The high proportion of validation research among the papers that present algorithms (57%) is not surprising, as algorithms are typically tested using a quantitative analysis of performance attributes in benchmarking studies (on empirical data that is collected) or simulation studies (on data that is artificially generated) that are typically desk research studies in the computer lab. In contrast, validation research is underrepresented in the Model outcome class (23%). In the Framework outcome class it is also underrepresented (35%), except for the studies addressing ambiguity problems, where 50% of the proposed framework solutions were subjected to validation.
Most studies proposing models are of type proposed-of-solution (58%). This is particularly evident for papers that present modelling solutions for solving issues of ambiguity with user stories (10 out of 16 studies). Also, 50% of the papers proposing frameworks are of the proposed-of-solution type. There is also hardly any evaluation research for frameworks (15%) and models (19%). So, our data shows that solutions in the form of models and frameworks have been less validated or evaluated than algorithms and prototypes. By their nature, models and solution artifacts classified as frameworks might be harder to validate/evaluate. A closer look at the studies taught us that these solutions are proposed for less well-defined problems (e.g., compared to what algorithms are used for) and that the maturity of the solutions is therefore also less than for algorithms. This is different for prototypes, where evaluation research accounts for 30% of the studies (compared to 26% overall). Also, validation research is, with 50% of the studies, well represented in this outcome class. Looking into the studies, we see that a prototype, as a working software system, is relatively easy to test in a laboratory setting or to implement and evaluate in a real case-study. For studies classified as Description and Explanation, the most common research type is also evaluation research (respectively 50% and 60%), which is most evident for studies focusing on collaboration problems related to the use of user stories. The proposed-of-solution type of research is by definition missing for these research outcomes. Regarding collaboration problems, problem investigation studies were undertaken to understand how user stories have been exploited to improve communication between project stakeholders and within developer teams. Evaluation research was also prominent in studies assessing the benefits and identifying the potential impact of recommended solutions for user story related collaboration problems.
  increase in research on the user story technique in recent years. This increase can be observed for all publication types considered. Figure 8 shows that the earlier observed increase in research on the user story technique holds for each research type. This could indicate some potential for more validation and evaluation studies in the future, as also in the most recent period of our literature mapping, many studies were proposing and demonstrating solutions without validation or evaluation.

V. DISCUSSION
Our mapping study provides several insights into the state-of-the-art of the research on the user story technique or the use of user stories in RE activities during ASD projects. Our study is the first systematic mapping of the academic literature on user stories. Based on the patterns we discovered in the mapping data, we identify research gaps and suggest research opportunities.

A. STATE-OF-THE-ART AND MATURITY OF THE RESEARCH
First, plotting the 186 mapped studies over time (RQ5) indicates that the momentum for research on ASD's user story artifact is not over. While ASD has been around for at least twenty years (counting from the publication of the Agile Manifesto), research on user stories is in general recent, with 78% of the mapped studies being published in the 2015-2021 period. Clearly, this research topic is 'young'. Out of the studies that present solutions to problems with user stories (RQ3), 35% proposed a solution without any kind of validation or evaluation (RQ4). This signifies that this research area can also further mature.
Next, regarding the outcomes of the studies, we noted that most are solution-oriented (80%). Only a minor part of the studies focusses on problem investigation or the explanation of observed phenomena (20%) (RQ3). Comparing this finding with the large share of engineering type of journals and conferences (e.g., IEEE, ACM) used for scholarly communication of the research (RQ5), we observe that mostly Computer Science and Software Engineering researchers have been active in this area. The socio-behavioral research that is typical for (Management) Information Systems scholars is also present, but to a much lesser extent.
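The 80%, 20%, and 35% figures quoted in this sub-section can be traced back to the counts reported in the results. The sketch below recomputes them; it assumes (in line with the results for RQ4) that all 52 proposed-of-solution studies fall in the four solution-oriented outcome classes, and the variable names are our own.

```python
TOTAL_STUDIES = 186

# Solution-oriented outcome classes and their reported document counts.
SOLUTION_COUNTS = {"Algorithm": 72, "Model": 31, "Framework": 26, "Prototype": 20}

# Studies of the proposed-of-solution research type (no validation or evaluation).
PROPOSED_OF_SOLUTION = 52

solution_total = sum(SOLUTION_COUNTS.values())
solution_share = round(100 * solution_total / TOTAL_STUDIES)
problem_investigation_share = round(
    100 * (TOTAL_STUDIES - solution_total) / TOTAL_STUDIES
)
# Assumption: every proposed-of-solution study is solution-oriented.
unvalidated_share = round(100 * PROPOSED_OF_SOLUTION / solution_total)

if __name__ == "__main__":
    print(f"solution-oriented: {solution_share}%")
    print(f"description/explanation: {problem_investigation_share}%")
    print(f"solutions without validation or evaluation: {unvalidated_share}%")
```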
Our study of the kind of solutions proposed (RQ3) showed that algorithmic solutions, i.e., precise stepwise procedures for performing a task, are the most common type of solution and have typically been proposed to solve well-defined problems related to system design issues (RQ2) that show up during requirements analysis and negotiation activities (RQ1). Algorithms for supporting project management activities such as estimating project variables (e.g., effort, time, cost), optimizing project schedules and prioritizing work, taking user stories as input, are plentiful in this research area. Most of these algorithms have also been validated (RQ4).
For less well-defined problems such as user stories that suffer from vagueness (due to ambiguity in their formulation) and the negative consequences of possible imprecise or multiple interpretations of requirements documented as user stories, like inconsistency, insufficiency and duplication of functionality (RQ2), solutions have been proposed that involve the use of conceptual models or software models, or that employ different artifacts that help in user story writing (e.g., ontologies, taxonomies, glossaries, user story templates) (RQ3). While models help in understanding and analyzing relationships and dependencies between user stories, solution artifacts for user story writing aim at producing higher-quality inputs for system design activities by exploiting contextual information. In general, these types of solution have been less subjected to validation and evaluation than algorithms (RQ4).
Apart from problems of imprecise and multiple interpretations caused by ambiguity, and the impact that the quality of user stories has on system development and the quality of the resulting systems, a third class of problems investigated relates to human interaction during system development (RQ2). Although user stories have been introduced in ASD to facilitate communication with users and collaboration within project teams, several studies have highlighted problems with the use of user stories. There is no clear pattern on how such collaboration problems have been solved, maybe because of the predominant technical focus of the research that has looked more into how to improve the user story technique than into understanding how to make better use of user stories in RE activities (RQ3).
Related to this last observation, studies have mostly assumed a context of use of user stories in various RE activities related to estimation, prioritization, and system specification, where user stories are considered as an input to activities (RQ1). Only one quarter of the mapped papers have studied user stories in the context of RE activities in which they are used to elicit and document requirements, while the validation and management of requirements documented as user stories has been the least frequent context of research on user stories (RQ1). For validating, testing, and managing (changes of) user stories, relatively mature solutions in the form of algorithms and prototypes, typically validated or evaluated, have been developed (RQ3, RQ4).

B. RESEARCH GAPS
Based on this literature mapping of the academic research on user stories, we formulate the following four research gaps.
First, there is a lack of validation and evaluation research, as a considerable proportion of proposed solutions, even when published in peer-reviewed publication outlets, have not been tested in laboratory or real-world settings. This is particularly true for solutions proposing the use of models to analyze user story dependencies and for different kinds of solution artifacts (labeled in our study as frameworks) that assist in writing high-quality user stories. Thorough testing of proposed solutions not only increases the maturity of the research but also facilitates the transfer of the research results to practice.
Second, while ambiguity in user stories is a known problem leading to interpretation problems and adverse consequences like inconsistencies in expectations regarding the system, duplication of required functionality and insufficiently specified requirements, these problems have received relatively little attention of researchers (only 23% of the mapped studies directly address these problems). Apart from algorithmic solutions, most of the proposed solutions for dealing with interpretation problems or mitigating ambiguity in user story formulation, are of the model and framework types, which have been less validated and evaluated. Furthermore, while user stories are created during requirements elicitation and documentation, only one quarter of the mapped studies assumed these RE activities as the context of the research. Our mapping shows that studies predominantly focus on the use of a given set of user stories in later RE activities. On the other hand, many studies that we classified as being related to system design, investigate RE activities that critically depend on high-quality user stories. Therefore, we conclude that there is a lack of thoroughly tested solutions for improving the unambiguous formulation of requirements as user stories. Our previous research review on ambiguity in user stories confirmed this research gap [34].
Third, the number of studies investigating the use of user stories, problems that arise with their use, and the consequences of these problems for the ASD process and its outcomes, is relatively small. Most of the research has focused on developing solutions for specific problems that are situated in rather narrowly defined ASD RE contexts. We have not found many studies in the mapped literature that investigate the impact of these solutions on the quality of the RE process in ASD projects. In particular, the impact of problems and proposed solutions on human cognition and interaction has been under-investigated. A deeper understanding of these issues could inform the predominantly engineering-oriented research on user stories towards developing better solutions.
Fourth, our literature study was a systematic mapping of the peer-reviewed academic research on user stories. Critical reviews of the research findings or meta-analyses of studies that focus on the same research problem are scarce (e.g., [34], [50]-[52]). A systematic literature review focusing on certain aspects of our literature mapping, e.g., the earlier discussed research gaps, may reveal more focused knowledge gaps.
Concluding, based on these four research gaps, we see research opportunities related to more focused literature reviews, the role that user stories play in human cognition and interaction (e.g., the user story as a mediating artifact or boundary object), resolving ambiguity issues with user stories in early stages of RE, and validation and evaluation of (earlier) proposed solutions.

C. LIMITATIONS
Despite rigorously following systematic literature mapping guidelines, our research has limitations that could threaten the validity of our study. These limitations have also been discussed in [34], but we repeat them here for the sake of completeness.
First and foremost, because of access limitations of the digital libraries of our institute, we excluded papers for which we could not inspect a full-text version. After removing all duplicates from the initial search result, applying our exclusion and inclusion criteria, and adding the papers found through snowballing, a total of 241 relevant studies were identified, of which 55 (i.e., 23%) could not be obtained in full-text version. This is a relatively large proportion, even though we tried to obtain as many papers as possible by contacting the authors. On the other hand, the assumption that researchers have access to all academic literature is a limitation of the Systematic Literature Review methodology, even if it is not always acknowledged. Consequently, lacking a point of reference, it is hard to evaluate the severity of this limitation.
Second, we only mapped papers that were written in English. Given that we intended to map only the 'academic' literature, we do not believe this limitation is severe as the academic literature in Software Engineering is generally written in English. Not having mapped the 'grey literature' is not considered by us as a validity threat because this was a deliberate research design choice.
Third, the selection and initial classification of papers was performed by one (junior) researcher, while doubts and possible interpretation problems were discussed with a senior researcher. A larger team of researchers might reduce some of the inherent subjectivity in these processes, however, at the expense of an increased effort and time investment.

VI. CONCLUSION
Our study systematically mapped the academic research on the user story technique and the use of user stories for and in RE activities. We classified 186 studies on user stories according to the RE activities that were assumed as context for user story (technique) use, specifically to identify typical problems that have emerged during those activities.
We also investigated what kind of solutions have been proposed for these problems and whether these solutions have been validated or evaluated. We also plotted the studies of our mapping study in time and noticed a steady increase during the entire period covered by our literature study (2001-2021) for all publication types, with a sharp peak during the last six years.
Our systematic search of the academic literature found only four literature studies that were devoted to research on user stories, even if the concept is well known, intrinsically connected to RE activities in ASD projects, and the use of the technique promoted by major ASD methodologies like Scrum, XP and SAFe. Despite the benefits that some studies demonstrated, problems with user stories have been reported, and solutions have been proposed. However, a clear view of the state-of-the-art was lacking as the previous literature studies were systematic literature reviews that reviewed a limited set of papers (24 to 38) for answering specific review research questions.
Our literature mapping thus contributes to providing an overview of the current research on user stories, with a focus on solving problems related to the use of user stories for and in RE activities. We offer novel insights into patterns we discerned in the research and the maturity of this research area. Based on research gaps identified, we suggest the following topics for future research: more focused literature reviews and meta-analyses to assess how far problems have really been solved, more validation and evaluation studies of the proposed solutions, more research addressing problems related to the correct interpretation of requirements documented as user stories, and more research on the use of user stories to facilitate human cognition and interaction within ASD projects.
Table 3 for the literature mapping: