A Novel Case Base Reasoning and Frequent Pattern Based Decision Support System for Mitigating Software Risk Factors

Software risk management is crucial for the success of software project development. The existing literature offers models for risk management, but they are too complex to be used in practice. The information in existing studies is scattered over different articles, which makes it difficult to find the knowledge needed to establish relationships between risk factors and mitigations. This paper presents a novel model that automatically identifies the relationships between risk factors and mitigations by using an intelligent Decision Support System (DSS). The proposed model has four steps. Firstly, the input of the system has been designed, through which risk factors and mitigations are entered. Secondly, a rule-based machine learning approach has been used to mine associations between risks and mitigations. Thirdly, a Case-Based Reasoning (CBR) approach has been used to determine previous cases as rules. Finally, automated rules have been generated to develop an intelligent DSS that mitigates software risks. The proposed technique addresses highly cited limitations of existing risk handling, such as the lack of a generic DSS and of an intelligent relationship between software risks and mitigations. Automated rules have been discovered with a novel combination of CBR and frequent pattern mining. The proposed model is capable of mitigating upcoming risks in the future. A star schema has been implemented to support the proposed DSS. Moreover, 40 studies were identified from highly cited literature, from which 26 risk factors, 57 mitigations, 14 questions and 26 automated rules have been extracted. According to the validation by IT industry experts, the average effectiveness of the DSS is 51-55%. The novelty of the proposed research is that it uses two state-of-the-art methods (rule-based machine learning and CBR) to identify software risk mitigations. According to this expert validation, the proposed model reduces the chances of risks in software development.


I. INTRODUCTION
Software projects fail due to different types of risk factors, namely certain, uncertain, dependent and independent [1], [2]. Risk factors force the project team to compromise on the software project scope, objectives, planning, scheduling, budget and execution. Numerous factors affect the whole software development process, such as poor decisions by project managers and inappropriate hiring of employees by the human resource department [1].
Software risk management is crucial for the success of project development [3]-[5]. A major problem for the software industry is risks, which cannot be ignored during the software development phases [6]. Software projects are complex by nature and therefore liable to failure. Many researchers have therefore made efforts to minimize the failure of software projects. In 1995, the United States of America (USA) spent around $250B on 175,000 projects, but a good number of them failed or did not achieve the required results. One of the major reasons for these failures was wrong assessment and mismanagement by project managers [7]. Many other statistics report software projects failing due to wrong decisions at different levels of project development. It can be observed from the current literature that no formal DSS is available to support software developers in avoiding such failures. Incorporating an intelligent DSS into the process therefore becomes imperative. This research proposes an intelligent system to bridge the gap between project managers' decisions and software risks. The DSS works in four steps. First, domain experts or the literature supply knowledge, in the form of software risk factors along with their mitigations, to the Knowledge Base (KB) [8], [21]. In this regard, an exploratory survey has also been included in this research [24]. Second, a rule-based machine learning approach, Association Rule Learning, is used for mining associations: the Eclat algorithm finds associations between these risks and their mitigations in the form of frequent patterns and rules. Third, a Case-Based Reasoning (CBR) approach is used to determine the previous cases in the form of rules [9], [10].
Finally, an intelligent decision net is established between risk factors and risk mitigations.
The associate editor coordinating the review of this manuscript and approving it for publication was Justin Zhang.
The scope of this research work is limited to the risk factors and risk mitigations of the Risk Mitigation, Risk Monitoring and Risk Management (RMMM) plan, i.e., Risk Factors → Risk Mitigations.
The rest of this paper is organized as follows. Section II presents the contributions. Section III reviews the related work. Section IV describes the proposed design of the DSS. Section V presents the prototype development. Section VI explains the validation of the proposed model. Finally, Section VII concludes the paper.

II. CONTRIBUTION
The contributions of this research work are as follows:
1. The proposed technique addresses highly cited limitations of existing risk handling, such as the lack of a generic DSS and of an intelligent relationship between software risk factors and mitigations [1], [2]-[10], [19], [25]-[29], [36], [37].
2. The intelligent relationship between risks and mitigations, in the form of rules, has been discovered by using a novel combination of CBR and a frequent pattern base [8], [9].
3. The proposed model is capable of mitigating upcoming risks in the future, based on knowledge discovery that also learns new cases [37], [25].
4. A total of 26 highly cited risk factors and 57 risk mitigations have been identified from the existing literature, and the proposed DSS mitigates these risk factors [8], [24].
5. A star schema structure has been implemented to support the proposed DSS for knowledge discovery.

III. RELATED WORK
This section discusses the latest literature on risk factors, risk mitigations and related issues. In [11], Knowledge Management (KM) in software engineering is discussed. The paper describes the levels of knowledge, i.e., Data (raw facts and figures), Information (processed data) and Knowledge (meta-information), and the importance of KM in software engineering. It focuses on KM as a way to make better decisions, decrease time and cost, increase product quality and prevent human errors. It also reveals that KM is a prevention and mitigation strategy for risks. It represents the flow of information through the Experience Factory Organization (EFO), which has three phases: a) packaging experience by analyzing, evaluating and synthesizing raw facts and building models; b) building an experience base, or KB, of data, models and experience; and c) identifying and reusing the relevant experience of previous projects in the current project.
In [12], a DSS for software project management introduces a hybrid approach, because no single process model is best suited to all processes. It combines three different models, i.e., an analytical model, a discrete model and a continuous model. This hybrid structure provides both qualitative and quantitative suggestions for better results of software processes. However, this work does not propose a single generic DSS model.
In [13], a fuzzy rule-based system is introduced for the risk assessment of software projects using a fuzzy inference system. Approximately 17 million fuzzy rules are introduced for better comparison of different projects or risk mitigations, but instead of creating the whole rule base or KB, this work uses a heuristic approach for inference.
In [6], the Software Engineering Body of Knowledge (SWEBOK) explains risk management guidelines. In a Delphi study [14], 53 risk factors, e.g., project size, team size and scope creep, were collected from managers in Hong Kong, Finland and the United States. This study adopted three phases, i.e., brainstorming, narrowing down and ranking of risks. In [15], a technical report of the Software Engineering Institute discusses software risk management methodologies for managing risks during development and shows that risk mitigation can be achieved through a risk mitigation strategy and risk planning.
In [16], risk-based decision support is proposed for space programs. The main focus of this research is the reduction of technical, cost and schedule risks. A total risk architecture has been developed to improve the quality of decision making for mitigating risks. This architecture groups risk drivers into project risks, supportability risks and mission risks, and the decision on whether to accept these risks initializes a feedback loop. If the risks are not acceptable, the mitigation and avoidance strategies and the control paradigm are re-planned. If the risks are acceptable, the fault tree is evaluated and the risks are processed.
In [17], a goal-oriented framework for risk analysis has been proposed. It proposes an optimal solution in terms of multiple objectives, for example, risk factors, cost of treatment and rewards on goals. The framework helps in risk analysis and in reducing the chances of risks. A three-layer approach has been followed in this work: 1) an asset layer, 2) an event layer and 3) a treatment layer.
In [18], a risk-oriented model to assess strategic decisions in new product development projects has been proposed. This work uses a decision tree to handle risks such as budget increases and delays in project delivery.
VOLUME 8, 2020
In [19], a conceptual framework for Knowledge-Based Risk Management and its processes has been proposed. It proposes elements to build a Knowledge-Based Risk Management (KBRM) framework for Information Technology (IT) projects. It also suggests merging the knowledge management and risk management processes to improve Risk Response Planning (RRP).
In [20], the major focus is on the phases of risk identification and risk assessment. The work follows a three-step flow: a) risk identification, b) risk assessment and c) risk treatment or control. These steps are explained extensively through a case study.
In [25], risk management in Distributed Software Development (DSD) has been presented. A DSS has been developed for practitioners in order to assess risks and choose suitable control strategies. For the construction of a knowledge base, a systematic literature review has been conducted.
In [26], the risks and opportunities of software-as-a-service have been explored through a survey of IT executives. The authors introduce a research model based on the opportunity-risk framework. Data from a survey of 349 IT executives at various German companies has been collected and analyzed. Prominent factors found include security threats, cost, performance and economic risks.
In [27], Wallace's work focuses on five risk factors to measure software risks using fuzzy logic: Team, Planning, Complexity, Requirements and User. This work proposes a framework for risk assessment and management, with which a total of 243 fuzzy rules have been generated.
In [28], a study using automated search has been conducted for risks and risk mitigations in global software development. It extracted 85 risks and 77 risk mitigations and categorized them into four parts, i.e., software development, outsourcing rationale, project management and human resources.
In [19], an empirical study has been conducted for the software development risk management. A total number of 145 software projects have been investigated and a model for software risk management has been proposed. The final survey conducted in this study has 78 questions.
In [29], problems and solutions of global software development and collaboration have been discussed. The important barriers are cultural, temporal, geographic and linguistic distances. These barriers are overcome by building knowledge sharing infrastructure, synchronous communication and frequent site visits.
In [30], a tertiary study, i.e., a Systematic Literature Review (SLR) of existing SLRs on distributed software development, has been conducted. Of the fourteen SLRs, three relate to software design, software engineering education and requirements, four relate to the engineering process and seven relate to the management of distributed development. The main aim of this study is to identify the challenges of distributed software development and find their solutions.
In [31], the use of Global Software Engineering (GSE) jargon has been investigated. For identification of the problem, a Delphi-inspired study has been conducted with ten researchers in global software engineering. They developed an empirically based glossary for the important concepts in global software engineering. Then they developed a taxonomy for global software engineering based on generalization-specialization relationships. It is used to map and categorize the existing knowledge.
In [32], a study of the use of agile practices in global software engineering has been conducted. It covers ten years of research papers, from 1999 to 2009, which have been classified by research type and contribution.
In [33], a systematic literature review has been conducted to identify challenges and solutions in distributed software development projects, and an evidence-based distributed software development model has been proposed. A total of 54 related works, published from 1998 to 2009, have been studied.
In [34], a global software engineering knowledge management approach for intensive risk mitigation has been proposed. It focuses on a four-step approach, i.e., knowledge management, goal satisfiability, requirements and requirement maintenance. An agile management system using knowledge management is discussed in this work. The authors propose architecture-based and algorithmic-development-based approaches for the prediction of software risks and their mitigation. The articles [25], [35]-[37] have also discussed decision support systems in risk assessment, risk management, challenges and risk analytics, respectively.
After reviewing the related work, it has been concluded that a compact rule-based DSS is needed to link software risk factors and mitigations using a Rule-Based System (RBS) and CBR with data mining techniques.

IV. PROPOSED DECISION SUPPORT SYSTEM ENGINE DESIGN
The DSS engine algorithm and its mathematical proof are given below. Domain experts enter software risk factors and mitigations into the KB. The DSS engine then searches for existing rules stored in the KB. If rule(s) are found, the engine executes them and an association between risk factors and risk mitigations is generated. Otherwise:
a) It prioritizes the risk factors in relation to the risk mitigations already stored in the KB.
b) It creates new relations between the entered risk factors and risk mitigations. These relations are created from the input of domain experts as well as from the existing literature.
c) The Eclat algorithm, an association rule mining technique, is used to find frequent patterns, and the CBR approach is then used to determine new or updated rules. The new relations are also matched against previous cases of existing rules.
d) These relationships are then added as rules in the KB.
e) The new rules extracted from the KB are then executed.
f) Finally, the intelligent risk mitigation decision rules are generated.
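The engine flow above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the prototype's C# implementation: the KB is a hypothetical in-memory dictionary from a risk factor id to a set of mitigation ids, `run_dss_engine` and its parameters are illustrative names, the prioritization of step a) is not modelled, and frequent patterns that share a mitigation with the expert-supplied relations are assumed to be merged into the new rule.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Hypothetical in-memory KB: each rule maps a risk factor id to a set of mitigation ids."""
    rules: dict[str, frozenset[str]] = field(default_factory=dict)


def run_dss_engine(
    kb: KnowledgeBase,
    factor: str,
    expert_links: list[str],
    frequent_patterns: list[frozenset[str]],
) -> frozenset[str]:
    """Return the mitigation decision for `factor`, creating and storing a rule if none exists."""
    # An existing rule is found: execute it directly.
    if factor in kb.rules:
        return kb.rules[factor]
    # Step b): start from the relations supplied by domain experts or the literature.
    mitigations = set(expert_links)
    # Step c): merge every frequent pattern that shares a mitigation with those relations
    # (a simplifying assumption about how the mined patterns refine a rule).
    for pattern in frequent_patterns:
        if mitigations & pattern:
            mitigations |= pattern
    # Step d): add the new relationship to the KB as a rule.
    kb.rules[factor] = frozenset(mitigations)
    # Steps e)-f): execute the new rule, i.e. return its mitigations as the decision.
    return kb.rules[factor]
```

For example, with `kb = KnowledgeBase({"F1": frozenset({"M1", "M12"})})`, calling `run_dss_engine(kb, "F1", [], [])` reuses the stored rule, while an unseen factor is turned into a new rule that is kept in `kb.rules` for later calls.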
In CBR, a new case of a risk factor and its mitigation arrives as a problem statement. The Case Base contains learned, i.e., solved, cases. If a matching case exists, it is used as the proposed solution in the form of a solved case. Otherwise, the rule is revised and tested before being retained in the Case Base. Overall, the proposed model comprises identification of risk factors and mitigations in Step I, generation of automated rules in Step II, a star schema for the data warehouse in Step III, implementation of the Eclat algorithm in Step IV and CBR in Step V. The terms used in the mathematical algorithm below are as under. Step 1: BEGIN Step 2: Input 'X' Step 3: Select 'Y' Step
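A minimal sketch of this retrieve, reuse, revise and retain cycle is given below. The names `CaseBase`, `solve` and `revise` are hypothetical, a case is assumed to map a set of risk factor ids to a set of mitigation ids, and Jaccard similarity is an assumed retrieval measure. With the default `threshold=1.0`, only an exact match is reused, which mirrors the "if a matching case exists" condition in the text.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two risk factor sets (an assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class CaseBase:
    """Hypothetical case base: each solved case maps a set of risk factors to a set of mitigations."""

    def __init__(self) -> None:
        self.cases: dict[frozenset[str], frozenset[str]] = {}

    def retrieve(self, problem: frozenset[str]) -> tuple[float, Optional[frozenset[str]]]:
        """Retrieve: return the similarity and solution of the most similar solved case."""
        best_score, best_solution = 0.0, None
        for factors, solution in self.cases.items():
            score = jaccard(problem, factors)
            if score > best_score:
                best_score, best_solution = score, solution
        return best_score, best_solution

    def solve(
        self,
        problem: Iterable[str],
        revise: Callable[[frozenset[str], Optional[frozenset[str]]], Iterable[str]],
        threshold: float = 1.0,
    ) -> frozenset[str]:
        """Reuse a solved case if it is similar enough; otherwise revise it and retain the result."""
        key = frozenset(problem)
        score, solution = self.retrieve(key)
        if solution is not None and score >= threshold:
            return solution  # Reuse: the solved case is the proposed solution.
        # Revise: e.g. domain experts test and adapt the closest solution (if any).
        revised = frozenset(revise(key, solution))
        self.cases[key] = revised  # Retain: the tested case is learned for future problems.
        return revised
```

For instance, if the case base holds {F1} → {M1, M12}, a new problem {F1, F2} has a similarity of 0.5, so it is passed to `revise` together with {M1, M12}, and the revised case is retained for future problems.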

A. STEP I: IDENTIFICATION OF RISK FACTORS AND MITIGATIONS
A survey was conducted to gather information about risks and mitigations from different users, including software practitioners and developers. On the basis of their feedback, ratings have been assigned to the risk factors in terms of percentages. The survey covers 20 software risk factors and 50 risk mitigations, extracted from [8], [24]. To extend the research work, 6 further risk factors and 7 further mitigations have been collected from [21]. The survey therefore contains a total of 26 software risk factors and 57 risk mitigations. The survey questions are available in Appendix 'A'. All of these risk factors and risk mitigations are listed below in Tables 1, 2, 3 and 4, along with their brief descriptions.

B. STEP II: IDENTIFICATION OF RULES IN THE FORM OF INTELLIGENT RELATIONSHIP OF RISK FACTORS AND MITIGATIONS
The first 20 rules have been identified from [1], [3] and [8], whereas rules 21 to 26 are taken from [21]. These rules act as risk control techniques against software risks. Some of the rules have also been updated using [21]: M6 and M45 have been added to Rule 3, M53 to Rule 5, M54 to Rule 8 and M34 to Rule 12. These newly added mitigations are formatted in bold in the rules shown below. In this way, the KB has been improved through the review of [21]. Each rule is a conjunction of risk mitigations, and the risk mitigations against each risk factor are entered as input by the domain engineers. Software risk factors form the antecedent part of the rules and software risk mitigations form the consequent part. The arrow in each rule denotes the implication from the antecedent to the consequent.

C. STEP III: STAR SCHEMA FOR DATA WAREHOUSE
The fact table basically contains the above-mentioned 26 rules. This data warehouse is useful for consolidating data into one corporate database. Data mining then extracts meaningful data from that common database. The star schema, with its dimension and fact tables, is presented in Table 5.
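The star schema can be illustrated with Python's built-in `sqlite3` module. The table and column names (`dim_risk_factor`, `dim_mitigation`, `fact_rule`) and the factor id `F1` are hypothetical, since Table 5 is not reproduced here; the sketch only shows how a fact table that links one antecedent risk factor to the mitigations of its consequent can be joined back into a rule. The example rule reuses the prototype example of project size mitigated by M1 and M12.

```python
import sqlite3

# Hypothetical star schema: two dimension tables and one fact table of rules.
SCHEMA = """
CREATE TABLE dim_risk_factor (factor_id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE dim_mitigation (mitigation_id TEXT PRIMARY KEY, name TEXT NOT NULL);
-- One fact row per (rule, mitigation): the rule's antecedent factor and one consequent mitigation.
CREATE TABLE fact_rule (
    rule_id INTEGER NOT NULL,
    factor_id TEXT NOT NULL REFERENCES dim_risk_factor(factor_id),
    mitigation_id TEXT NOT NULL REFERENCES dim_mitigation(mitigation_id),
    PRIMARY KEY (rule_id, mitigation_id)
);
"""


def rule_text(conn: sqlite3.Connection, rule_id: int) -> str:
    """Join the fact table with both dimensions and render 'factor -> m1 AND m2 ...'."""
    rows = conn.execute(
        """
        SELECT f.name, m.name
        FROM fact_rule AS r
        JOIN dim_risk_factor AS f ON f.factor_id = r.factor_id
        JOIN dim_mitigation AS m ON m.mitigation_id = r.mitigation_id
        WHERE r.rule_id = ?
        ORDER BY m.mitigation_id
        """,
        (rule_id,),
    ).fetchall()
    if not rows:
        raise KeyError(f"no rule with id {rule_id}")
    factor = rows[0][0]
    return f"{factor} -> " + " AND ".join(mitigation for _, mitigation in rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO dim_risk_factor VALUES ('F1', 'Size of the project')")
    conn.executemany(
        "INSERT INTO dim_mitigation VALUES (?, ?)",
        [("M1", "Clear idea of the requirements"), ("M12", "Define goals and objectives")],
    )
    conn.executemany("INSERT INTO fact_rule VALUES (1, 'F1', ?)", [("M1",), ("M12",)])
    # Size of the project -> Clear idea of the requirements AND Define goals and objectives
    print(rule_text(conn, 1))
```

Keeping one fact row per mitigation lets a rule with any number of consequent mitigations be stored without changing the schema; note that the text ids are ordered lexicographically in this sketch.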

D. STEP IV: DRY RUN OF ECLAT ALGORITHM
The dry run of the Eclat algorithm follows a three-step approach [22], [23]. Step 1) Calculate the minimum support and generate the candidate and frequent itemsets.
Taking a minimum support (S) of 10% for Eclat and applying the formula below gives a minimum support count of 3. The total number of items in Step I of Section IV is 26. As a result, only items with at least 3 occurrences in Step I of Section IV are selected as frequent items.
$$\text{minimum support count} = S \times 26 = \frac{10}{100} \times 26 = 2.6 \approx 3 \tag{1}$$
On the basis of the 10% minimum support, the candidates and frequent items are generated. Candidate generation is a step in which frequent subsets are extended one item at a time and the candidates are tested against the actual data. In the first step, 56 candidates are generated from M1 to M50.

V. PROTOTYPE DEVELOPMENT
The DSS prototype in Fig. 1 has been developed in C# using Microsoft Visual Studio and SQL Server. Domain engineers can add software risk factors and software risk mitigations through fields such as Factor Id, Factor Name, Factor Description, Mitigation Id, Mitigation Name and Mitigation Description. Clicking the ADD button enters them into the Knowledge Base. The DSS then shows the rules and frequent patterns in the prototype software. For example, the size of the project is a risk factor that can be mitigated through a clear idea of the requirements from customers, the use of proper and exhaustive testing techniques, clearly defined goals and objectives, etc. The frequent patterns of risk mitigations, e.g., Clear-Idea-of-the-requirements (M1) and Define-Goals-and-Objectives (M12), are also generated automatically using the Eclat algorithm. Users can also add relationships by selecting the risk dependency and the severity of risk from the dropdown lists. A search option is also available for the convenience of users, who can search by risk factors or risk mitigations.
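The Eclat mining used in Step IV, which the prototype runs to produce patterns such as {M1, M12}, can be sketched as follows. This is a generic textbook Eclat under stated assumptions, not the authors' code: each rule is assumed to be one transaction whose items are mitigation ids, the minimum support count is assumed to be obtained by rounding up (so 10% of 26 gives 3, as in Eq. (1)), and the function names and sample data are illustrative. Converting to the vertical format once and intersecting tidsets depth-first is what lets Eclat avoid rescanning the database.

```python
from __future__ import annotations

import math


def to_vertical(transactions: list[set[str]]) -> dict[str, set[int]]:
    """Convert horizontal transactions into vertical format: item -> ids of transactions containing it."""
    tidsets: dict[str, set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def eclat(
    transactions: list[set[str]], min_support_percent: int = 10
) -> tuple[dict[frozenset[str], int], int]:
    """Mine all frequent itemsets depth-first; return ({itemset: support count}, minimum count)."""
    # Minimum support count, rounded up: 10% of 26 transactions = 2.6 -> 3.
    min_count = math.ceil(min_support_percent * len(transactions) / 100)
    frequent: dict[frozenset[str], int] = {}

    def extend(prefix: frozenset[str], candidates: list[tuple[str, set[int]]]) -> None:
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Extend the itemset one item at a time; the support of each new
            # candidate is the size of the tidset intersection.
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_count:
                    suffix.append((other, common))
            if suffix:
                extend(itemset, suffix)

    singles = sorted(
        ((item, tids) for item, tids in to_vertical(transactions).items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), singles)
    return frequent, min_count


if __name__ == "__main__":
    # Hypothetical data: 26 rules as transactions, only four of which list mitigations here.
    rules = [{"M1", "M12"}, {"M1", "M12", "M3"}, {"M1", "M12"}, {"M3"}] + [set() for _ in range(22)]
    patterns, min_count = eclat(rules)
    print(min_count)                           # 3
    print(patterns[frozenset({"M1", "M12"})])  # 3
```

With this hypothetical data, M3 occurs only twice and is pruned, while {M1, M12} survives with a support count of 3.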

VI. VALIDATION
This DSS tool and its rules have been validated by three experts from different IT companies based in Islamabad, Pakistan. Some past projects have been checked against the rules of the proposed prototype, and these rules are helpful in mitigating the risks of those projects. The experts then calculated the percentage of effectiveness using the following formulas.
$$\text{Effectiveness}\,\% = \frac{\text{Estimated resolved issues}}{\text{Estimated issues occurred}} \times 100 \tag{4}$$
$$\text{Average Effectiveness}\,\% = \frac{\text{Expert}_1\,\% + \text{Expert}_2\,\% + \text{Expert}_3\,\%}{3} \tag{5}$$
A prospective validation was performed for the decision rules, which were individually evaluated on the basis of the experts' past experience. The evaluation of the prospective case study was carried out in two phases. The first phase involved the participation of three experts from three different companies. In the second phase, the tool was used by the same experts. The first expert, Mr. Abdullah Abbasi (Software Quality Assurance Engineer, DPLIT), tested the rules on past projects, including a management information system and an electronic medical record system. He recommended rules 3, 4, 5, 6, 9, 10, 14 and 17, with an effectiveness of 48-50%. The second expert, Mr. Mudassar Khan (MTBC), tested the rules on a past rule-based system project. He recommended rules 4, 7, 8, 12, 13, 14, 18, 19 and 20, with an effectiveness of 55-60%. The third expert, Mr. Muhammad Imran (Manager, CU Online Department, COMSATS), tested the rules on past projects, including an automated calling unit and electronic data interchange. He recommended rules 1, 2, 3, 11, 14, 15 and 16, with an effectiveness of 50-55%.
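Eqs. (4) and (5) amount to the small computation below. The function names and the issue counts in the first call are hypothetical; the expert ranges are those reported above. Applying Eq. (5) separately to the lower and upper bounds of the three ranges is an assumed interpretation, and it reproduces the 51-55% average stated in the abstract.

```python
def effectiveness(resolved: float, occurred: float) -> float:
    """Eq. (4): estimated resolved issues as a percentage of estimated issues occurred."""
    if occurred <= 0:
        raise ValueError("estimated issues occurred must be positive")
    return 100 * resolved / occurred


def average_effectiveness(*expert_percentages: float) -> float:
    """Eq. (5): mean of the experts' effectiveness percentages (three experts in Section VI)."""
    return sum(expert_percentages) / len(expert_percentages)


if __name__ == "__main__":
    print(effectiveness(12, 25))  # hypothetical counts: 48.0
    # Lower and upper bounds of the ranges reported by the three experts.
    low = average_effectiveness(48, 55, 50)
    high = average_effectiveness(50, 60, 55)
    print(f"Average effectiveness: {low:.0f}-{high:.0f}%")  # Average effectiveness: 51-55%
```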
These results indicate that the rules can reduce the risk factors behind software development failures. Screenshots of the validation letters from the three experts are shown in Fig. 2, Fig. 3 and Fig. 4, respectively.

VII. CONCLUSION
In this study, an intelligent relationship between risk factors and mitigations, in the form of rules, has been presented by using a novel combination of CBR and a frequent pattern base. Risk factors and mitigations are the key parameters of this study. The study has a number of advantages. Firstly, the new approach to DSS addresses highly cited limitations of risk handling, such as the lack of a generic DSS and of an intelligent relationship between risks and mitigations. Secondly, it deals with each software risk and mitigation individually. The KB grows over time in the form of new relationships between risks and mitigations, which means that the proposed model is capable of mitigating future risks based on knowledge discovery. Thirdly, this study presents the idea of a knowledge-discovery-based DSS with a compact KB of the links between risks and mitigations. Fourthly, rule-based machine learning is conceptually a type of RBS and an Artificial Intelligence technique; using it, a total of 26 rules covering 26 risk factors and 57 mitigations has been identified in this study. The Eclat algorithm has several advantages: a) it uses a depth-first search technique, which reduces the memory requirement; b) it is typically faster than the Apriori algorithm; c) it does not need to scan the database each time; and d) it follows the vertical data format, whereas the Apriori and FP-tree (FP-Growth) algorithms use the horizontal data format.
The contributions of this research work are:
(1) The dataset has been extended, as six new risk factors and seven new risk mitigations have been introduced alongside the previously existing data.
(2) A rule-based machine learning approach (association rule mining), namely the Eclat algorithm, has been used.
(3) A Case-Based Reasoning (CBR) technique has been used for finding new and previous cases, which provides better reasoning for decision making.
(4) A star schema for a data warehouse has been proposed for data mining.
(5) A prototype has been developed and a prospective validation has been performed.
During this research, there were some threats to the validity of the proposed approach. The first was whether case-based reasoning and the frequent pattern base technique would lead to a successful decision support system. The second was how this work would be validated to prove the effectiveness of the proposed model. Both threats have been addressed: the DSS engine design proposed in Section IV and the validation in Section VI support the validity of the proposed approach.
Future studies need to focus on the risk monitoring and risk management parts of the RMMM plan.

APPENDIX A
Q13: Rank the software risk factors according to their severity.
Q14: Please link the following risk mitigations with the risk factors, according to your experience.