Evaluation of Critical Thinking in Online Software Engineering Teaching: A Systematic Mapping Study

Critical thinking consists in analysing and evaluating the coherence of reasoning. This ability is crucial to software quality (SQ), which is closely related to the engineer’s ability to judge and discriminate between solutions correctly, so students are required to analyse, evaluate and draw conclusions. Critical thinking therefore becomes a crucial part of the training of software engineers. The problem arises from the diversity of proposals and the lack of rigour in existing experiences, which make it difficult to find specific recommendations, especially in online contexts. This article reports a systematic mapping study (SMS) whose purpose was to detect, organise and characterise specific dimensions of the online teaching and learning of critical thinking for software engineering. Based on the results of the SMS, we propose a preliminary framework for the evaluation of critical thinking in the training of software engineers in the context of online higher education. We expect this proposal to serve as a basis for instructors of the discipline when evaluating critical thinking in online teaching.


I. INTRODUCTION
The ubiquity of information has resulted in an unprecedented dependence on technology in our society [1]. As a consequence, the IT industry's demand for engineers capable of creating high-quality software has increased [2], [3]. Software quality is closely related to the engineer's ability to judge and discriminate between solutions correctly, and to adapt to continuous changes in tools and techniques. This implies maintaining a high level of critical thinking in every decision made during software production [4].
Critical thinking is considered an essential skill in the training of future engineers and, more generally, in the 21st century [5]. However, studies investigating critical thinking in students on courses related to software production are limited, even though it is a competence required in the discipline [6], [7]. Likewise, the instruments for evaluating critical thinking with the widest coverage are generic and standardised [8], and do not consider the disciplinary context or the mode of study. The scarce evidence available is insufficient to establish a consensus on how to evaluate the components of critical thinking in students on courses related to software production, and has led to contradictory results [9]-[12].
In this scenario, there is a need to review the current state of the evaluation of critical thinking in software engineering training delivered online, to give teachers and students a guide to the most important elements to consider and the knowledge gaps that need to be addressed.
This article presents a systematic mapping study of the literature (SMS), carried out by applying the systematic mapping protocol proposed by Petersen [13]. Reviewing and analysing hundreds of research articles enabled us to establish the main topics currently being addressed in the discipline, and to identify missing areas and recent trends.
The principal contribution of this article, therefore, is to determine the central themes that need to be considered for the application of any strategy for teaching critical thinking in software engineering, in an online environment.
The article is organised as follows: Section II describes the main characteristics of the study; Section III presents the method used for systematic mapping; Section IV describes the main results; Section V offers answers to the Research Questions; Section VI, based on the results obtained, proposes a preliminary framework for evaluating critical thinking in the context of the teaching of software engineering online; Section VII summarises the biases of the work; and finally Section VIII presents the conclusions of the study.

II. BACKGROUND

A. CRITICAL THINKING
Many of the current definitions of critical thinking in the context of education are based on Bloom's taxonomy, in which the three highest levels of the hierarchy generally represent critical thinking: analyse, evaluate and create [14]. This has led to a variety of definitions with no apparent consensus between investigators, resulting in ambiguities when teaching and when carrying out practical assessments [15]. Nevertheless, there is an agreement that the skills of critical thinking consist principally in analysing arguments, drawing inferences, judging or evaluating, and taking decisions, with the object of guiding problem-solving [16].
Furthermore, critical thinking can be approached as a way of being (thought) or of acting (willingness) [17]. Willingness to employ critical thinking is the intrinsic motivation of subjects to assess, consciously and actively, the arguments presented [18]. The most important aspects of willingness are related to open-mindedness, curiosity, the desire to be informed and respect for external or different points of view [16].
Prior or domain-specific knowledge plays a central role, both in the skills of critical thinking and in the willingness to employ it, as it allows us to establish the topics to which critical thinking is to be applied [19].
Together, the skill of thinking critically, the willingness to do so, and previous knowledge about a topic, represent the central structure for establishing a teaching-learning strategy for critical thinking.
The commonest strategies in use today are fusion and immersion. In [20] the authors propose the fusion approach, which implies in-depth instruction in the material, plus separate instruction in critical thinking [21], as well as including everyday problems with the idea of teaching the students how to transfer the skills of critical thinking to specific contexts [22]. The immersion approach assumes that the students acquire critical thinking skills as a consequence of learning the course contents, and not as independent parts of the course [23].

B. EXPERIENCES OF TEACHING CRITICAL THINKING ONLINE
The literature reports various teaching proposals that show a positive impact on critical thinking, mainly following the immersion approach: by problem-based learning [24], technological learning [25] or flipping part of the contents [26]. However, in a completely online environment, but with a face-to-face culture in both students and teachers, this becomes a serious challenge [27].
Recent experiences indicate that it is possible to address the development of critical thinking in an online format with relative success using project-based teaching [28] and inquiry-based learning [29] to develop the willingness to employ critical thinking [30].
A critical thinking teaching model based on web simulations has also been proposed; it received positive evaluation when trialled in students on the Network and Communications course. This model includes a simulation process which starts with visual inspections, combined with online discussion to encourage socio-constructivist learning. It has been shown to have a significant impact on the acquisition of critical thinking in system design, assessed by pre and post-testing [31]. A different model of teaching design for active learning in online contexts has been proposed, which encourages critical thinking in problem-solving. In this model the teacher guides the reflective process of analysis, and designs learning activities which contribute to active, student-focused learning, establishing situations of critical analysis [32].
In [33], the authors report that Japanese and German students collaborated in a training programme in software engineering through learning based on collaborative projects online. Incorporation of an intercultural dimension produced a positive result in the framework of this teaching strategy, although it is more demanding for both students and teachers. Collaborative work proved to be a positive strategy for the development of critical thinking. A similar initiative in a computer sciences course applied a model of encouraging collaboration and the evaluation of online teamwork, based on evidence of constructive feedback between peers. This model showed that peer comments in the analysis of complex problems contribute to teamwork and improve the critical skills of the participants. Thus encouraging discussion emerges as a strategy for reflection on both individual and team performance [34].

C. EVALUATION OF CRITICAL THINKING IN THE ONLINE CONTEXT
Evaluating critical thinking skills and/or the willingness to employ critical thinking presents great challenges to both teachers and students. In this context, problems arise concerning the reliability and validity of the instruments available, and traditional evaluation formats are not appropriate for measuring critical thinking skills [16].
Since the aspects included are drawn mainly from the three levels of Bloom's taxonomy, creativity is an inherent element of these three levels. Furthermore, evaluation of this aspect introduces subjectivity and error, as Silva points out [21]. There are other approaches that suggest the use of open question problems, rather than multiple choice [35]. By the same token, the problems set should have more than one possible solution, allowing students to argue more than one point of view [36].
Standardised instruments have been used to enable participants to assess their own critical thinking; one of these is CAT (Critical Thinking Assessment Test) [37], proposed jointly with project-based teaching to integrate critical thinking by solving complex problems [9]. This experimental study, with a control group, showed evidence of an improvement in critical thinking skills among students using an active, project-based teaching strategy. Nevertheless a comment was made with respect to the low motivation of the students who were asked to answer the same test twice, which may have affected the results [8].
In other models for teaching software engineering, personalised instruments have been established for evaluating critical thinking. Integration of a model to encourage critical reading, SQ4R [38], has been used as an analysis strategy for software engineering. Students perceive this strategy as a useful tool for improving comprehension of course contents and for the effective development of software engineering projects [6]. Another work proposes standardised evaluations combined with observation and qualitative feedback from the participants, identifying opportunities for improving critical analysis [8]; then there are studies which include only personalised instruments for evaluating this skill, based on authentic learning [39], thought-based learning [40] and the generation of evidence of learning by means of formal marking systems or observation parameters [41].
Consequently, critical thinking is vitally important for software engineering students in higher education, since various techniques and solutions must be considered at every stage of the development process. Nevertheless, as far as we know, there are no proposals on how to explicitly link critical thinking skills to, and evaluate them together with, the competences required for developing high-quality software in virtual contexts and in the new scenarios of collaborative learning.

III. METHOD
The systematic mapping technique proposed by Petersen [13] offers a way of verifying, analysing and categorising results related to a specific topic or area of interest, thus allowing the scope of the investigation to be determined and the knowledge obtained to be classified.
Performing a systematic mapping study (SMS) involves following the stages described in Fig. 1 sequentially. Thus as the various stages of the process are completed, concrete results are obtained which form the direct input of the following stage, in order to achieve systematic mapping as the final result. The activities that make up the systematic mapping process are described in the following sections.

A. DEFINITION OF THE RESEARCH QUESTIONS
The first stage is the presentation of the Research Questions (RQ) which provide the methodological basis of the rest of the process. Table 1 details the three RQ of this investigation.

B. SEARCH EXECUTION
The first step towards obtaining results was to define key words for the study in English, with possible variants; these were obtained from the RQ (see Table 2). The final search string used was as follows:
(("assessing" OR "assessment") AND ("guideline" OR "approach" OR "method" OR "framework" OR "strategies" OR "model")) AND ("critical Thinking" OR "critical thinking skills") AND ("software developing" OR "software development" OR "software build" OR "software production" OR "software engineering") AND ("online learning" OR "online courses" OR "remote training" OR "online training" OR "remote education" OR "remote teaching")
In selecting data sources, we included the search engines available to us with complete access from the site where the study was carried out. The sources searched were ACM, Springer, ScienceDirect, IEEE Xplore, Wiley and WOS. The search was performed in January 2021. Fig. 2 presents a summary of the results.
Research works were selected first by the inclusion criterion through analysis of the title, abstract and key words; this produced a large number of works that make a significant contribution to the study area. At the same time the date range filter was applied and duplicates were eliminated.
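As an illustration, the search string above can be assembled programmatically from its keyword groups, which makes it easier to adapt when a database requires a different syntax. The following is a minimal sketch; the dictionary keys and the function names `or_group` and `build_query` are hypothetical labels introduced here, not part of the study protocol. The sketch produces a flat conjunction of OR-groups, which is logically equivalent to the nested form given in the text.

```python
# Keyword groups copied from the search string in the text. The dictionary
# keys are hypothetical labels chosen for this sketch.
KEYWORD_GROUPS = {
    "assessment": ["assessing", "assessment"],
    "approach": ["guideline", "approach", "method", "framework",
                 "strategies", "model"],
    "critical_thinking": ["critical Thinking", "critical thinking skills"],
    "software": ["software developing", "software development",
                 "software build", "software production",
                 "software engineering"],
    "online": ["online learning", "online courses", "remote training",
               "online training", "remote education", "remote teaching"],
}


def or_group(terms):
    """Quote each term, join the terms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(groups):
    """Join one OR-group per concept with AND (flat, not nested)."""
    return " AND ".join(or_group(terms) for terms in groups.values())


query = build_query(KEYWORD_GROUPS)
print(query)
```

Keeping the groups in a single structure means that a variant term can be added in one place and the query regenerated for every source.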
In the following step, the exclusion criterion was applied to the abstract of the article, eliminating short articles (fewer than 3 pages), theses, technical reports and tutorials. Only articles focusing on the evaluation of critical thinking and software development teaching were retained. Finally, a full-text analysis was carried out of the candidate documents, which were selected individually by secret ballot of all the authors. The concordance between the researchers was validated using Fleiss' Kappa Index, as proposed by Gwet [42], which gave a reliability of 87.1%. The final result was 21 articles relevant to the analysis.
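For readers unfamiliar with the agreement statistic mentioned above, the following minimal sketch computes Fleiss' kappa from a table of votes using the standard formula. The vote table is invented for illustration (three raters, two categories, "include" and "exclude") and does not reproduce the study's data or its reported value; the function name `fleiss_kappa` is our own.

```python
def fleiss_kappa(ratings):
    """Compute Fleiss' kappa.

    ratings[i][j] is the number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters (at least two).
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(ratings[0])

    # Share of all assignments that fall into each category.
    p_j = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each item.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_bar = sum(p_i) / n_items        # mean observed agreement
    p_e = sum(p * p for p in p_j)     # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical votes: columns are (include, exclude), three raters per article.
votes = [[3, 0], [2, 1], [0, 3], [1, 2]]
kappa = fleiss_kappa(votes)
```

A value of 1 indicates complete agreement, while values near 0 indicate agreement no better than chance.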

IV. RESULTS
The first group of articles selected were literature reviews; four articles were identified that addressed similar topics to that of the present proposal. Although none of them addressed exactly the topic of critical thinking in software engineering teaching in online contexts, they do represent an interesting basis for the study:
Chanin et al. [43] identify important contributions of education in engineering in a context of "software start-up". They describe a variety of best practices and methodologies particularly suitable for businesses. In this work we can recognise the emergence of skills linked to critical thinking.
Veras et al. [44] found that activities in online classes follow three main strategies: (1) project-based learning (38.3%); (2) problem-based learning and self-teaching (50.0%); and (3) team learning (7.7%). The studies reviewed report challenges such as teachers with work overload and time limitations, as well as difficulty in maintaining student motivation in hyper-mediatised environments. This context has a negative impact on the willingness to think critically.
Garousi et al. [45] report a meta-analysis aimed at consolidating the alignment of education in software engineering with the needs of the industry. Their results lead to the following conclusions: (1) software requirements, design and testing are the most important skills; and (2) the biggest knowledge gaps are found in model-management and the processes of software engineering, design (and technological architecture) and testing.
Finally, Anthonysamy et al. [46] compiled information on metacognitive knowledge, resource management and motivational belief strategies. The authors conclude that Self-regulated Learning Strategies (SRLS) have a positive correlation with non-academic results. These strategies are closely linked with a willingness in students to employ critical thinking.
Positive evolution and growing interest can be seen over the ten years included in this study. One particular aspect is the considerable increase in the number of studies related with virtual education in software engineering. This may have been promoted by the context of the COVID-19 pandemic and the quarantine processes applied in educational institutions as a result (see Fig. 4).
Fig. 2 reports the sources of the data. The largest numbers of articles came from Springer and ACM. As the filters were applied successively, all the sources were reduced by a similar percentage, and most of those selected (81%) were obtained from these two sources.
Finally, 38% of the results were drawn from journals, the remainder were articles from scientific conferences.
Fig. 6 shows the classification of the different categories emerging from the analysis performed in this study. Eight categories were identified:
• Software engineering (SE) teaching strategies: The various types of initiatives applied in the teaching-learning processes are described, such as: Project-based learning, Problem-based learning or Flipped classroom.
• Mode: Defines the format of the intervention or proposal, divided into "offline" (in person) and "online" (remote).
• Type of study: Defines the methodology applied to the evaluation of the results.
Finally:
• Type of study: 23.81% are "Reports of experiences", i.e. they present an approach without necessarily the rigour of a case report. 38.10% are "Proposals" of the authors themselves.
• Teaching strategies: 52.38% use (commercial) development exercises in local software production companies. In other words, although the experiments are performed in an online context, most of them are to solve local problems and respond to internal needs.
• Evaluation processes in software engineering: 42.86% do not define a direct strategy for measuring performance. This information is often omitted, and because it was outside the scope of the initiatives it was probably not considered important by the authors. Nevertheless, it may be noted that 71.43% of the studies do not declare any kind of evaluation of critical thinking in their experiments.
In the analysis by category, we note that 57.14% offer only relative findings, but have no clear evaluation mechanisms either for topics directly related with engineering or for teaching-learning processes.
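To relate the percentages reported in this section to article counts, they can be converted back under the assumption that each one is a share of the 21 selected articles. The sketch below is illustrative only: the dictionary labels and the helper `implied_count` are hypothetical names introduced here, and the denominator of 21 is our assumption rather than something stated for each figure.

```python
TOTAL_ARTICLES = 21  # number of articles selected in the study

# Percentages reported in the text, keyed by hypothetical short labels.
REPORTED = {
    "reports_of_experiences": 23.81,
    "proposals": 38.10,
    "commercial_development_exercises": 52.38,
    "no_performance_measurement_strategy": 42.86,
    "no_critical_thinking_evaluation": 71.43,
    "only_relative_findings": 57.14,
}


def implied_count(percentage, total=TOTAL_ARTICLES, tolerance=0.005):
    """Return the whole-number count whose share of `total` matches
    `percentage` to two decimal places, or None if there is no such count."""
    count = round(percentage * total / 100)
    if abs(100 * count / total - percentage) <= tolerance:
        return count
    return None


counts = {label: implied_count(pct) for label, pct in REPORTED.items()}
for label, count in counts.items():
    print(f"{label}: {count} of {TOTAL_ARTICLES}")
```

Under this assumption the six percentages correspond to 5, 8, 11, 9, 15 and 12 articles respectively.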
The articles selected are listed in Appendix A. No single author or research group predominates. Figure 4 shows that the oldest articles date from 2011, while the most recent and most numerous are from 2020.
If we analyse the key words declared by the authors of the 21 articles selected, important findings may be noted (see Fig. 7). Leaving aside the obvious results "software engineering education" and "online platforms" (28.4%), two frequent key words are "Problem-based learning" (10.4%) and "Project-based learning" (9.4%). Indeed, nearly 60% of the results mention a collaborative approach based on projects or joint problem solving. It is important to note, however, that the majority of the articles describe practical
VOLUME 4, 2016

V. DISCUSSION
This section seeks to give answers to the Research Questions formulated in Section III, based on the results obtained and the systematic maps.
A systematic map is drawn based on simultaneous counting and classification of the dimensions of interest; the diameter of the bubble is proportional to the number of investigations linked to the study dimensions (Figs. 8, 9 and 10). This allows a general view of the field of study.
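The counting behind such a map can be sketched in a few lines: each article is classified along two dimensions, the pairs are tallied, and each tally determines a bubble diameter. The records below are invented for illustration and are not the study's data; `bubble_counts`, `bubble_diameters` and the `scale` parameter are hypothetical names.

```python
from collections import Counter

# Hypothetical classification records, invented for illustration only.
RECORDS = [
    {"strategy": "Project-based learning", "mode": "online"},
    {"strategy": "Project-based learning", "mode": "online"},
    {"strategy": "Project-based learning", "mode": "offline"},
    {"strategy": "Problem-based learning", "mode": "online"},
    {"strategy": "Flipped classroom", "mode": "offline"},
]


def bubble_counts(records, x_dim, y_dim):
    """Count how many records fall into each (x, y) cell of the map."""
    return Counter((record[x_dim], record[y_dim]) for record in records)


def bubble_diameters(counts, scale=1.0):
    """Make each bubble's diameter proportional to the count in its cell."""
    return {cell: scale * n for cell, n in counts.items()}


counts = bubble_counts(RECORDS, "strategy", "mode")
diameters = bubble_diameters(counts, scale=10.0)
```

Cells with no articles are simply absent from the result, which corresponds to an empty position on the map.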

A. RQ1: WHAT RESEARCH TOPICS CHARACTERISE THE EVALUATION OF CRITICAL THINKING IN SOFTWARE ENGINEERING TEACHING?
Analysis of the articles revealed three central research topics: (i) Teaching strategies, (ii) Evaluation strategies, and (iii) Stages in the software development life cycle (see Fig. 8).
• Teaching strategies: approximately 42% of the articles addressed the investigation from the perspective of project-based learning, which was one of the most frequent strategies used to evaluate the development of critical thinking (see Fig. 11). Nevertheless, the great majority of the initiatives lacked any formal support for evaluating critical thinking. By contrast, the "flipped classroom" strategy is little used (around 4%) and, as with the other strategies, there was no mention of the use of instruments to evaluate thinking, nor of Use Scenarios for the software life cycle. This may be attributed to investigations indicating that this type of teaching strategy shows a low level of compliance with pre-class activities [47], and marked resistance when the course is taught for the first time [48], which would have a direct impact on the arguments proposed.
• Software engineering evaluation strategies: A considerable number of articles (9) give no details of how learning was evaluated (see Figure 12). Approximately 40% of the articles are split between "Observation" and the development of "Artifacts" that are evaluated. This agrees with the difficulties of setting up evaluations in the context of software engineering teaching described in [49].
• Stages in the software development life cycle (SDLC): There is a striking absence of studies which exclusively address stages such as testing and/or software evolution, or the maintenance processes associated with this stage (see Figure 13). This offers a niche for study in the widely researched connection between tests and software quality, and how these elements are linked with critical thinking skills. We note that nine articles report the use of a complete life cycle, but without giving details of the stages, which makes it impossible to investigate exactly which tasks might have an impact on the student's critical thinking.

B. RQ2: WHAT TOPICS RELATED WITH THE EVALUATION OF CRITICAL THINKING HAVE BEEN PUBLISHED IN THE FIELD OF SOFTWARE ENGINEERING TEACHING?
The articles reviewed report three main categories of instruments for measuring critical thinking:
• Universal or generic evaluation instruments, applied without specifying a discipline.
• Ad hoc evaluation instruments, i.e. adaptations or adjustments of existing instruments for measuring critical thinking.
• Others, i.e. instruments or artifacts developed specifically for the situation to be measured.
In this classification, 67.86% of the articles (see Fig. 9) do not report the use of instruments for evaluating critical thinking, despite mentioning it as a potential finding or element to be developed. This is unexpected, and reveals fundamental problems with the definition of the study, as mentioned in [16] and [50]; and also the lack of investigation into the evaluation of critical thinking in software engineering teaching environments [6].
Most of the other 32.14% fall into the test categories Ad hoc, Standardised and Other. These results may be related with the secondary nature attributed historically to critical thinking in software engineering teaching environments [6] and the complexities associated with setting up a standardised test like the CAT test [51], which requires several days' training.
In this context, only one article [31] proposes a framework close to the evaluation of critical thinking, although it is not applied specifically to software engineering teaching; this suggests a potential avenue which should be explored.
Fig. 5 shows that 67% of the articles refer to the online mode, while the remaining 33% are based on an offline format. This finding reinforces the predominance of remote work in line with industry, a trend which had already been growing steadily but accelerated during the last year, as mentioned in [52].

C. RQ3: WHAT IS THE PREDOMINANT METHOD OF EVALUATING CRITICAL THINKING IN SOFTWARE ENGINEERING TEACHING?
Similarly, one of the most frequent combinations is "project-based" learning in the online format combined with "teamwork" (see Fig. 10). However the studies reviewed provide few details on the evaluation instruments used, the evaluation strategy or the instruments with which the study was performed.

VI. FRAMEWORK FOR INTEGRATING CRITICAL THINKING INTO VIRTUAL SOFTWARE ENGINEERING TEACHING
Based on analysis of the articles selected, a preliminary framework can be proposed as a guide to setting up an initiative for evaluating critical thinking applied to software engineering teaching in an online context. The proposal is based on four components that interact constantly and repeatedly, and can be combined with the principal active learning strategies detected in the articles (Fig. 14).
Firstly, as proposed in [53], the inclusion of (a) active, collaborative learning strategies, such as project-based learning, problem-based learning, case studies and capstone studies, among others, promotes a suitable working space for developing an initiative of this type. Selection of one of these strategies will depend on the capacity and experience of the teacher or instructor, as well as the course characteristics. This component is therefore the functional base of the proposal.
Depending on the course characteristics, the second area to address (b) is the level(s) of instruction covered by the initiative. Direct or indirect instruction, or none, are the categories of participation of the instructor in the course group. These levels must be inclusive and capable of balanced combination, reflecting the different realities and needs of the IT industry.
After these two components, the next (c) is the process of deliberative analysis, a central element of the reflection process required to solve a particular software problem. Analysis, characterised by argument, questions and exchange of information, seeks to deliberate on the problems inherent in software development in a work environment that is of necessity collaborative. Evaluation allows the subject to connect and combine elements in search of a solution to a particular problem. Finally, creation allows all the previous elements to be synthesised into a proposal for a solution. These elements must be iterative, as they must be repeated until individual and group satisfaction is reached among the participants.
The online mode necessarily requires the incorporation of support tools to facilitate synchronous and asynchronous communication; to organise the documents needed for the work through a documentation manager; to manage and disseminate the source code for the whole project; and finally to organise and orchestrate the tasks assigned.
The evaluation component must be adjusted equally to all the other components and avoid the problems associated with generic methods of evaluation, as described in [8]. Thus, following [50], we propose the following dimensions for inclusion in the evaluation of critical thinking in software development environments that are close to industrial ones:
• Critical dimension: evaluation of the depth of the arguments and the analysis carried out in the Deliberation component, with two central elements: (a) evaluation of evidence collected and used which comes from direct events involving software programming, and (b) evaluation of the arguments used from the perspective of guiding a decision which is grounded in the quality of the software and the requirements.
• Synthesis dimension: associated with comprehension of the whole from its parts. It is therefore necessary to evaluate (a) whether the scope of the implications of any decisions taken is understood, and (b) the robustness of the arguments presented.
• Interaction of Dimensions: combines the Critical and Synthesis dimensions with the purpose of evaluating whether the student (a) understands the causality and the explanation before moving towards a decision.
Given the incipient stage of this proposal, the operationalisation of each of the components to be applied has not yet been developed. However, this area is included in the future research lines of the team, and can be considered as a support for researchers and instructors.

VII. THREATS TO VALIDITY
This section reports how the biases in the validation of the present work were treated [54].

A. INTERNAL VALIDITY
Threats to internal validity are factors which might affect the results of the present study. The following aspects were considered, with their respective mitigation plans:
• Search for studies: To mitigate this threat, we used the pre-defined search string in the principal electronic databases. Before carrying out the actual search in each site, we carried out a pilot search in all the databases selected to verify the accuracy of our search string.
• Bias in study selection: Studies were selected by applying explicitly defined inclusion and exclusion criteria. To avoid possible bias, we also carried out validation by cross-verification for all the studies selected.
• Bias in data extraction: To obtain consistent data and avoid biases in data extraction, we defined a summary of the results of the data found. First one of the authors constructed the results distribution tool. Then two authors distributed the number of studies equitably and obtained the data as per the data extraction form. The same two authors discussed their findings and shared them regularly to avoid bias in data extraction.

B. EXTERNAL VALIDITY
Threats to external validity are factors which restrict the capacity to generalise the results. The inherent threat to external validity is whether the primary studies report initiatives of critical thinking in software engineering education in the online mode. We mitigated this threat by choosing peer-reviewed studies and excluding grey literature (white papers, editorials, etc.).

C. CONCLUSION VALIDITY
Threats to the validity of the conclusions are issues which affect the ability to draw correct conclusions. Although we used the template of Petersen et al. [13], which assumes that it is impossible to identify all the relevant primary studies in existence, we managed this threat to validity by discussing our results in sessions with education professionals and software engineering academics. The number of primary studies obtained for this systematic mapping was such as to enable us to carry out a critical analysis of each one.

D. CONSTRUCT VALIDITY
Construct validity concerns the generalisation of the result to the concept or theory behind the execution of the study [54]. The principal threat is subjectivity in our results. To mitigate this threat, all three researchers carried out the main steps of the systematic mapping study independently. Subsequently they discussed their results in order to reach a consensus.

VIII. CONCLUSIONS
This article presents a preliminary proposal for a framework for the incorporation of critical thinking in the context of software engineering education online. The proposal was developed from a systematic mapping study of articles related with this topic. Three Research Questions were formulated, all of which were analysed using Petersen's approach [13]. The answers to these questions enabled us to detect key elements for evaluating critical thinking in the context of software engineering education, and these were the principal input for drafting our proposal.
Our proposal takes a generic approach, in other words it could be applied in other teaching contexts, not only software engineering. This is the case so long as the object of the topic shares essential features with software development. This is necessary in view of the multidisciplinary nature of this area. Moreover, the need is growing every day due to the way software and the software life cycle are used to solve problems.
It must be stressed that our results are preliminary, with a top-down approach; in future works we will explore operationalisation of the evaluation dimensions. Our work offers a route for research, and support for teachers of software engineering courses.
Future work, within the authors' study framework, will present deeper analysis of the proposal, followed later by a start-up version in an experimental initiative, to obtain indications of how the proposal can be adapted or improved.

JAIME DÍAZ
Ph.D. in Engineering Informatics, Master in Computer Science, engineer by profession. Currently working as a full-time professor in Universidad de la Frontera (Chile). Business processes and User eXperience Advisor. Technological projects evaluator at CORFO (a Chilean government organization). His research interests focus on HCI, eCommerce and education. He is interested in modelling human behavior by means of data science, with applications in education and in heuristic problems.