Identifying difficulties of software modeling through class diagrams: A long-term comparative analysis

Software modeling is a creative activity in which software components and their relationships are identified based on customer requirements. According to the literature, object-oriented software modeling rests on four fundamental pillars: abstraction, encapsulation, decomposition, and inheritance. However, despite the existence of guidelines and recommendations for implementing the object-oriented approach, novice software designers often fail to make good design decisions, leading to inefficient designs that are difficult to modify, understand, or distribute at the development level. The literature reveals that the most common difficulties faced by software designers are a lack of understanding and confusion of concepts related to the object-oriented approach, as well as difficulties in creating Unified Modeling Language diagrams, especially class diagrams. The work presented in this article uses a qualitative and quantitative approach to determine, in a group of university students, which difficulties are most recurrent and how they persist over time. A qualitative case study is the method used to generate the documents analyzed: the diagnostic and evaluation tests. Additionally, a thematic analysis was used to identify, analyze and report patterns within the data. To quantify the occurrences of the problems in the case study, as part of our quantitative approach, a comparative study was applied to the results of the diagnostic and evaluation tests in order to establish the similarities and differences among the cases observed, using the hierarchical clustering technique. The qualitative analysis identified 16 difficulties, while the quantitative analysis shows their number of occurrences and their persistence over time.
The difficulties reported in both analyses center on three: a) definition of attributes that could be a class, b) classes with inadequate or insufficient behavior, and c) incorrect use of multiplicity between classes. Each of these difficulties is analyzed in depth in this study.


I. INTRODUCTION
Software engineering (SE) is a branch of computer science that studies the creation of reliable, high-quality software based on engineering methods and techniques [1]. In other words, SE is the practical application of scientific knowledge to the design and construction of programs and to the associated documentation required to develop, operate and maintain them [2]. SE processes generally consist of five structural activities: Requirements Definition, System and Software Design, Unit Implementation and Testing, System Integration and Testing, and Operation and Maintenance [3].
Software design is a creative activity where the components of the software and their relationships are identified, based on customer requirements. Software design, particularly the object-oriented approach, is based on four principles: abstraction, encapsulation, decomposition and inheritance [4], [5]. This approach is one of the most widely used to represent the problem domain and involves the design of classes and the relationships between those classes. Such classes define both the objects in the system and their interactions [3]; the standardized language to express such a scenario is the Unified Modeling Language (UML).
Software design is the only way to accurately materialize customer requirements. Although guidelines and recommendations for object-oriented software design exist [5], [6], designers, particularly beginners, often fail to apply them in practice. As a result, designers have great difficulty in making good decisions, which leads to violations of the principles of the object-oriented approach and, in turn, to low-quality designs [7]-[9]. In the literature we find that the most common difficulties faced by software designers are:
• Lack of understanding of object-oriented concepts [10], [11], [12], [13].
• Confusion between concepts related to object/class, class/collection and concepts involving modeling [9].
• Confusion in the association of class/object, attribute/method and class/subclass related to inheritance concepts [14].
• Difficulties in creating UML diagrams, syntax and notation errors; errors related to attributes, association and classes [14], [15].
• Difficulty in organizing the information in the diagrams and in correctly using the generalization-specialization type association [16].
• Incorrect mapping of the problem concept, misuse of inheritance, relationships and function names [17].
• Lack of creation of classes for relevant aspects of the application [10].
The work presented in this article uses a qualitative and quantitative approach to determine, in a group of university students, which difficulties are most recurrent and how they persist over time. A qualitative case study is the method used to generate the documents analyzed: the diagnostic and evaluation tests. Additionally, a thematic analysis was used to identify, analyze and report patterns within the data.
In order to know the occurrences of the problems in the students of the case study, as part of our quantitative approach, a comparative study was applied to compare the results obtained between the diagnostic and evaluation tests and thus establish the similarities and differences among the cases observed, through the hierarchical clustering technique.
The qualitative analysis identified 16 difficulties, while the quantitative analysis shows their number of occurrences and their persistence over time. The difficulties reported in both analyses center on three: a) definition of attributes that could be a class, b) classes with inadequate or insufficient behavior, and c) incorrect use of multiplicity between classes; each of these is analyzed in depth in this study.
The results of this study help to identify the importance of software design and the most common problems that arise with the object-oriented approach. However, few references study the persistence of students' difficulties and misconceptions. Our study focuses on discovering students' perceptions in the long term when designing software, based on exercises that make it possible to analyze how students perceive object-oriented concepts and which difficulties they face. The results should support reflection among those who study the object-oriented approach and among educators who focus on teaching object-oriented software design.
The rest of the article is structured as follows. Section II reviews related works found in the literature and states the contribution of our work in this regard. Section III presents the research methodology. Section IV describes the qualitative study. Section V describes the quantitative study. Section VI presents the results of the study and Section VII presents a discussion of the recommendations that emerge from it and the threats to reliability. Finally, Section VIII concludes the paper.

II. RELATED WORK
In the literature we have found works related to how students implement software design concepts under the object-oriented approach, considering both qualitative and quantitative perspectives. Several of these works show the difficulties of students in software design and programming.
Some works in the state of the art study how students conceive fundamental concepts of object-classes, modeling and object-oriented programming (OOP) through longitudinal studies with a mixed approach. The researcher Xinogalos, based on a literature review and taking as a model the studies of Eckerdal and Thune [18], [19] compares conceptions about objects and classes from the literature with data from written exams that include open-ended questions applied to a group of students. In this study, the authors identify that the most referenced misconceptions are the confusion between object/class, class/collection and concepts involving modeling, the latter because students perceive class as an abstraction of some type of object/entity in the domain of the problem [9]. Through the exploration of mental models and evaluation of tests taken by students, the authors identify that the most problematic areas are related to the association between class/object, attribute/method and class/subclass including inheritance concepts, in addition to syntactic errors [14].
A couple of studies [20], [21] state that there is a relationship between students' performance in tests on software design questions and their ability to produce more reliable program code. In [20], the author studies the relationship between the ability to create a design and the ability to program, by analyzing UML diagrams versus source code. Based on a review of the literature and an analysis of the work submitted by the students, the author states that students have difficulty closing the gap between problem descriptions and code; however, the author does not explicitly define which problems students face when translating diagrams into code. In [21], through a planned quantitative method and correlation analysis, the authors conclude that students improved their programming performance thanks to their previous designs. In addition, in the same study, the researchers, based on test results and object-oriented metrics, conduct an empirical study using the hierarchical clustering technique to compare the design quality of students and their performance in terms of correctness. For this purpose, they analyze the diagrams made by the students at the beginning of the course and the corrections they make to those diagrams at the end of the course.
Several researchers regard the use of UML notation and modeling skills as tools for success in software design, and they study the difficulties presented by students in creating UML diagrams. In [15] the authors conduct a study to determine the typical mistakes students make when creating class diagrams, through error analysis and a quantitative approach. The authors consider the results of the final tests performed by the students, consisting of the creation of diagrams, to categorize four kinds of errors: a) syntactic errors including notation errors, b) attribute-related errors, c) association-related errors and d) class-related errors. In [17] the authors rely on a literature review and their experience as teachers to suggest that for many students the visualization of objects is not always obvious and that common errors in modeling include: incorrect mapping of the concept of the problem, misuse of inheritance, misuse of relationships and misuse of function names. The researchers propose a problem-based interactive exercise, consisting of a game called Chitty-Chitty Bang-Bang (CCBB), for a better understanding of object-oriented concepts, obtaining beneficial results in average students and particularly in below-average students, but little effect on brighter students. In [22] the authors study the difficulties that students face while learning to model UML diagrams, such as: difficulty in understanding the syntax and semantics of the diagram, difficulty in organizing information in diagrams, and difficulty in correctly using the generalization-specialization type association. In order to overcome these difficulties, the authors propose to explore pedagogical methods such as problem-based learning (PBL) and learning from erroneous examples (ErrEx), so that students are actively involved in the learning process. These three papers found that errors of syntax and semantics, inheritance and association are the most common among beginners.
The research preceding this study [10] explores the object-oriented software design decisions that students make and their possible causes through a qualitative study. A thematic analysis was applied to diagnostic and evaluation tests completed by students, which consisted of three exercises involving concepts of the object-oriented approach. The researchers identify that the most common errors are related to creating or not creating classes and objects, class behavior, class names, and confusion between subclass and superclass. This reflects a lack of abstraction in students, manifested as an excessive simplification of problems. As possible causes, the authors mention: strict copying of reality, influence of the structured approach, simplistic general description and lack of understanding of concepts of the object-oriented approach.
The contribution of this work lies in studying the difficulties that students face in the design of software under the object-oriented approach and in addition the persistence of these difficulties in the long term after academic instruction. The results of this study will allow the reflection of those who study the object-oriented approach and its influence on the learning of object-oriented software design.

III. RESEARCH METHODOLOGY
This section shows the research questions, the chosen methodology, and the details of the selected case study.

A. RESEARCH QUESTIONS
This research was guided by two research questions:
• What difficulties do students present when designing using class diagrams?
• Of the difficulties previously found, which are the most frequent when students design using class diagrams during the academic period?
This study requires a qualitative and quantitative data analysis perspective. On the one hand, qualitative research includes all non-numerical data. According to [23], qualitative research consists of extracting descriptions from observations taken from interviews, narrations, recordings, audio transcripts, written records of all kinds, among others. These data are generated by case studies, action research and ethnography. According to [24], the objective of qualitative research is understanding, and it focuses on the investigation of facts as well as the interpretation of events.
In particular, a qualitative case study is defined as a holistic and intensive description and analysis of a single instance, phenomenon or social unit [25]. In our work, the qualitative case study is the method used to generate the documents, which consist of the evaluation tests. In addition, a thematic analysis was used to identify, analyze and report patterns within the data [26].
On the other hand, quantitative research includes numerical data that try to determine the association or correlation between variables, the generalization and objectification of the results. After the study of the association or correlation, it aims, in turn, to make causal inference that explains why things happen or not, in a certain way [27].
In order to know the occurrences of the problems in case studies, as part of our quantitative approach, a comparative study was applied, to compare the results obtained in the diagnostic test applied to the students at the beginning of the academic period and the evaluation test applied at the end of the course. The comparison aims to understand unknown things from known, with the possibility of explaining and interpreting them; and also aims to systematize the information distinguishing the differences with similar cases [28].

B. RESEARCH METHOD
In this study, we have applied an instance of qualitative case study research called thematic analysis, which is defined as a research method that allows researchers to identify, analyze, organize, describe and report themes within a data set. The thematic analysis allowed the identification, analysis and reporting of patterns within the data [22]. In addition, to know the occurrence of problems in students, it was necessary to make use of a quantitative analysis. Most quantitative data analysis involves a process of abstraction that starts from qualitative data that have been previously analyzed and that are important for the research topic [29].

1) Setting
The case study was conducted with a group of students from the Faculty of Informatics of a university. Details are shown below:

2) Subject
The subject is Modeling and Software Design, a required subject of the fifth semester in the Faculty of Informatics, taught for 4 hours a week for 16 weeks. Students taking this subject must have previously studied Database Management, Programming I and Programming II, and have Software Engineering I as a co-requisite. The content of Programming II focuses on the teaching of object-oriented programming languages, which means that students already have prior knowledge of the concepts of this approach at the programming level.

3) Lecture Structure
The content of the subject during the 16 weeks of the semester is shown below:

4) Participants
The group of students in this subject initially consisted of 26 students. However, only 22 results were evaluated, since four students were excluded from the study: three of them were repeating the subject and one left the course. These four students were therefore not taken into account in the comparative statistics.

5) Test
The diagnostic test was performed in the second week, after the UML Diagram topic, and was composed of three exercises, while the evaluation test was given at the end of the academic period. The exercises presented to the students for this case study were chosen because they allow students to apply the concepts learned in the UML topic. The first exercise, called "Betting", involves the use of inheritance and the decomposition of the problem. The second exercise, called "Circle", is related to graphic objects and reveals the way in which the students conceive the problem. Finally, the exercise called "Hotel" is a transactional exercise whose characteristics reveal how students understand objects in an exercise that could be solved in a structured way. The statement of each exercise is shown below:

• Exercise Betting
It is required to make an application for the sports betting service, where a user must register for the bets. Bets can be received by transfer or by card. The system supports different types of bets, for example: Single bet (which team wins), special bet (which minute scores the first goal) and others.

• Exercise Circle
This application consists of drawing a small circle inside a larger circle. The smaller circle can move inside the larger circle, without getting out of it.

• Exercise Hotel
This application is responsible for booking rooms in a hotel. It is necessary to take into account the booking dates and the verification of room availability.

IV. QUALITATIVE CASE STUDY
The qualitative approach bases the analysis on the result of the software design exercises performed by the students. The thematic analysis process was based on the model proposed by Seidel [30], which consists of the following stages:
Two sets of qualitative data (diagnostic and evaluation test) were collected for this study. The data were obtained from the 3 design exercises already described in Subsection III-B5. A diagnostic test, together with an interview, was applied to 26 students at the beginning of the academic period; this corresponds to the first data set, published in a previous work (accepted for publication). At the end of the academic period an evaluation test was applied with the same exercises, yielding the second data set. In both the diagnostic and evaluation tests, students were asked to design each statement using class diagrams.
Prior to the diagnostic test, students received instruction on concepts related to the Unified Modeling Language (UML), so that all students had a standard language to design the exercises. In this instruction only sequence diagrams and class diagrams with their respective relationships were considered, and special attention was paid to the semantic structure of inheritance. Finally, the concepts related to the object-oriented approach, such as object, class and message, were also reviewed. In addition, it is important to emphasize that students had already taken subjects in which they programmed in object-oriented languages; this means that the concepts taught in class reinforced their knowledge of object-oriented concepts.

B. DATA ENCODING
After obtaining the results of the diagnostic and evaluation tests conducted with the students, we proceeded to assign codes to the collected documents. "A code in qualitative research is most often defined as a word or short phrase that symbolically assigns a summative, salient, essence-capturing, and/or evocative attribute to a portion of visual or language-based data" [31].
At this stage a deductive coding was carried out and the tests were coded using Atlas.ti software [32]. For this, we took as a reference the problems previously identified in a work published by the same authors of this study (accepted for publication); however, it is important to mention that some of the findings shown in this study have also been reported by other authors [33]-[36]. In the previous study, ten student design decisions were identified, which are shown below:
1) Tendency to create a third class between two other classes to associate them, instead of creating a many-to-many association between them.
2) Assigning the behavior of a real-life object to the class diagram as is, instead of using an abstraction of that concept at the software level.
3) Lack of creation of classes for relevant aspects of the application.
4) Designing classes with different names, but with the same structure.
5) Designing classes without any behavior, with special interest in defining classes only through their attributes.
6) Placing responsibilities on classes that should not be responsible for that behavior.
7) Creating classes that differ from their superclass or sibling classes only by their attribute values, when the behavior should be the same.
8) Assignment of complex behaviors as attributes.
9) Belief of students that placing an ID attribute in each class will allow them to access all instances of that class.
10) Definition of classes that are not concepts.
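Decision 7 in the list above can be pictured in code. The following is a minimal sketch, loosely based on the Betting exercise; the class names, the `factor` attribute, its values and the `payout` method are hypothetical and do not come from the students' diagrams or from a reference solution.

```python
# Decision 7 as it tends to appear: the subclasses add no behavior of
# their own, they only fix a different attribute value (hypothetical names).
class Bet:
    factor = 1.0

    def __init__(self, amount):
        self.amount = amount

    def payout(self):
        return self.amount * self.factor


class SingleBet(Bet):
    factor = 2.0  # only the value changes, the behavior is inherited as is


class SpecialBet(Bet):
    factor = 5.0  # same remark


# One possible alternative (an assumption, not the authors' solution):
# a single class whose instances carry the differing value.
class ParametrizedBet:
    def __init__(self, amount, factor):
        self.amount = amount
        self.factor = factor

    def payout(self):
        return self.amount * self.factor
```

Both designs compute the same result; the point of the sketch is only that the subclass hierarchy encodes data, not behavior.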
It is important to mention that in this study we did not limit ourselves to coding only the problems mentioned above. The experts were free to code problems that were not taken into account in the previous study. Consequently, in the present research we found 8 of the 10 problems previously identified and 9 additional problems; the 16 problems found are explained later. At this stage a total of 365 codes were obtained in the diagnostic test and 420 codes in the evaluation test. The coding was performed separately by two experts, and the peer evaluation technique was then applied in order to validate the coding process.

C. DATA REFINEMENT
This is the stage where the preliminary codes obtained in the previous stage were analyzed, matching similar codes or separating them. This stage was carried out in conjunction with the researchers involved in the coding in order to maintain the integrity of the coding.

D. GROUPING OF QUALITATIVE DATA
Finally, the codes generated in the refinement stage (Section IV-C) were grouped. The grouping was performed following the criteria of the research questions. These codes were then abstracted into categories through a thematic analysis. The details of the resulting categories and their respective acronyms are shown below:
1) Converting attributes into classes (CLA). This refers to extracting an attribute and converting it into a class, which creates a class that represents only data, without behavior. For example, the student designs a Circle class and a Position class, but the latter has only the attributes Xo and Yo.
2) Not considering the problem from a holistic perspective (HOL). This category is related to the fact that students do not conceive all the aspects necessary to solve the problem. For example, in the exercise Betting, the student should check all aspects that influence the resolution of the exercise, such as whether there are sufficient funds for a person to place a bet.
3) Not including the classes necessary for the design (NUM). This refers to omitting classes in the diagram in spite of their being explicitly mentioned in the statement, for example, the absence of the Account class in the exercise Betting.
4) Creation of classes that should be related to a concept that does not itself exist in the diagram (FUN). For example, a student has created a class named BetType, but the concept Bet does not exist in the diagram.
5) Incorrect use of multiplicity between classes (LIS), because the student does not identify the possible existence of several instances of the same class. For example, in the exercise Hotel, some students did not identify that multiple reservations can be made for the same room.
6) Classes with inadequate or insufficient behavior (COM). This category refers to classes that were created with a behavior foreign to the concept of the class, or whose behavior only partially represents the concept, for example, the creation of a Hotel class that has a reserveRoom() method.
7) Creating the same class multiple times on a single class diagram (INS), instead of instantiating the class multiple times. For example, the creation of different rooms, such as SingleRoom, DoubleRoom and TripleRoom, instead of just the class Room.
8) Defining attributes that could be a class (ATR). An example of this is when the student creates a Reserve class with an attribute roomNumber, and there is no Room class in the design.
9) Placement of different methods that could have been represented by a single method (MET), for example, defining move-left and move-right instead of move.
10) Classes built in the image and likeness of real-life concepts (REA), for example, a student who created a Room class with a toClean() method.
11) Creating a Main class, whose only function should be to trigger the start of the program, and filling it with functions that should be part of other classes (MAI).
12) Creation of classes whose name and behavior represent an action and not a concept (ACC). An example appeared in the exercise Circle, where a student diagrammed a class labeled SmallCircle and additionally two classes, one labeled Draw and one Move.
13) Creating confusing or erroneous relationships between classes (PCL). This refers to errors in UML syntax or semantics, for example, using the composition relationship instead of aggregation.
14) Construction of classes with attributes but no methods, even when they needed methods with distinct behaviors in the context of the exercise (SIC). For example, the Client class with the attributes name, lastName and id, and without any methods.
15) Construction of inheritance structures whose derived classes only differ from the base class by their attributes (HER). For example, students created a base class called Bet with an implemented bet() method and two subclasses, one called IndividualBet with an attribute called individualFactor and the other called ComplexBet with an attribute called complexFactor, both with the same implemented method inherited from the base class Bet.
16) Similarity to the entity-relationship model (REL). This refers to the fact that the learner tends to create an intermediate class for recording details of related classes.
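To make the ATR and LIS categories more concrete, the following minimal sketch contrasts a design in which the room is only an attribute with one possible alternative for the Hotel exercise. All names (`ReservationFlat`, `Room`, `Reservation`, `is_available`, `reserve`) are illustrative assumptions and are not taken from the students' diagrams or from a reference solution.

```python
from dataclasses import dataclass, field
from datetime import date


# Design showing ATR: the room is only an attribute (room_number), so
# there is no Room concept that could own the availability logic.
@dataclass
class ReservationFlat:
    room_number: int
    start: date
    end: date


# One possible alternative (hypothetical): Room is a class of its own and
# holds many reservations, which also makes the multiplicity explicit (LIS).
@dataclass
class Reservation:
    start: date
    end: date

    def overlaps(self, start: date, end: date) -> bool:
        # Two date ranges overlap when each starts before the other ends.
        return self.start < end and start < self.end


@dataclass
class Room:
    number: int
    reservations: list = field(default_factory=list)

    def is_available(self, start: date, end: date) -> bool:
        return not any(r.overlaps(start, end) for r in self.reservations)

    def reserve(self, start: date, end: date) -> Reservation:
        if not self.is_available(start, end):
            raise ValueError("room is not available for these dates")
        reservation = Reservation(start, end)
        self.reservations.append(reservation)
        return reservation
```

In this sketch the availability check lives in `Room`, so the behavior sits with the concept it belongs to, which is the kind of responsibility assignment that the COM category is concerned with.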

V. QUANTITATIVE ANALYSIS
The quantitative approach allows the search for patterns in the obtained data. Therefore, comparison is essential in any field of research as it allows the establishment of systematic similarities and differences among observed cases, as well as the possible development and testing of hypotheses and theories about their causal relationships [37]. A definition for the term comparison is "the act of observing two or more things in order to discover their relationships or estimate their differences and similarities" [38].

A. COMPARATIVE ANALYSIS
In a comparative analysis, a distinction is made between Most Similar System Design (MSSD) and Most Different System Design (MDSD). When applying MSSD, the research objects are chosen to be as similar as possible, except for the phenomenon whose effects we are interested in evaluating. In MDSD, by contrast, the strategy is to choose research units that are as different as possible; the basic logic is that differences cannot explain similarities [39]. Comparative analysis involves several techniques, including case study, statistical analysis and experimental research. In addition, it involves a focus on the analysis of a limited number of cases. The outcome is focused on obtaining data that lead to the definition of a problem or to the improvement of knowledge about it [37].
According to [40], comparative analysis presents two strategies, case studies and the study of variables, which are defined below:
• Case study: A small number of cases are defined and experimental rigor is sought through the identification of comparable effects of a phenomenon and the analysis of differences and similarities between them.
• Variables study: It aims to formulate broad generalizations about the objects to be studied and to test abstract hypotheses derived from theories applicable to the cases of study.
The efficiency of the different methods available for conducting comparative research will depend on their effectiveness in solving the problem of causal complexity analysis.
In this research the Hierarchical Agglomerative Clustering (HAC) method is applied. The main idea is to group similar data points into one group and to separate dissimilar observations into other groups, by calculating the distance between them. The hierarchical grouping is represented by dendrograms that allow a clear analysis of similarities and differences between the individuals in the case study, facilitating the monitoring of the persistence of difficulties and misconceptions of the students in object-oriented software design.

1) Variables of study
In this study, based on the results of the case studies, it was determined that the comparative entities are:
• Results of the diagnostic tests, performed on students at the beginning of the academic instruction.
• Results of the evaluation tests, conducted at the end of the academic instruction.

2) Definition of variables
For this research, two case studies were defined, represented by the results of the diagnostic test (Case 1) and the evaluation test (Case 2) of a group of 22 students, which is equivalent to the dependent variables.
On the other hand, the independent variables are represented by the categories of difficulties found during the qualitative study, which were classified as CLA, HOL, NUM, FUN, LIS, COM, INS, ATR, MET, REL, REA, MAI, ACC, PCL, SIC and HER, described in Section IV-D.
Once the comparable entities and the variables to be studied have been defined, the comparative strategy makes it possible to focus on those aspects that are most different and those that are most similar.
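One way to picture these variables is as two student-by-category tables, one per case. The sketch below uses invented occurrence counts for three hypothetical students (S1 to S3) and a subset of the categories, purely to show the data layout and how persistence between the diagnostic and evaluation tests could be read from it; none of the numbers come from the study.

```python
CATEGORIES = ["ATR", "COM", "LIS"]  # subset of the 16 categories, for brevity

# Invented occurrence counts per student and category (NOT study data).
diagnostic = {"S1": {"ATR": 2, "COM": 1, "LIS": 0},
              "S2": {"ATR": 0, "COM": 3, "LIS": 1},
              "S3": {"ATR": 1, "COM": 0, "LIS": 0}}
evaluation = {"S1": {"ATR": 1, "COM": 0, "LIS": 0},
              "S2": {"ATR": 0, "COM": 2, "LIS": 2},
              "S3": {"ATR": 0, "COM": 0, "LIS": 1}}


def persistent_difficulties(before, after):
    """Return, per student, the categories that appear in both tests."""
    return {
        student: [c for c in CATEGORIES
                  if before[student][c] > 0 and after[student][c] > 0]
        for student in before
    }


def totals(case):
    """Total number of occurrences of each category in one case."""
    return {c: sum(counts[c] for counts in case.values()) for c in CATEGORIES}


persistence = persistent_difficulties(diagnostic, evaluation)
```

Here `persistence` lists, for each hypothetical student, the difficulties that survive from the first test to the second, while `totals` gives the per-category occurrence counts for a single case.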

3) Variable dichotomization
Cluster analysis is extremely important in scientific research, in any branch of knowledge. Bearing in mind that classification is one of the fundamental objectives of science and to the extent that cluster analysis provides us with the technical means to carry it out, it will be essential in any investigation.
Therefore, once the cases and variables to be studied have been defined, the HAC method is used to categorize the students who took the diagnostic and evaluation tests into groups. The groupings made were based on the type and number of occurrences of problems found in the students of the case study on object-oriented software design.
Cluster analysis aims to sort individuals into groups so that the individuals within a group are as similar as possible to each other and as different as possible from the members of other groups. Hierarchical clustering groups data based on the distance between individuals. The technique builds successive groupings: individuals are progressively integrated into clusters which, in turn, are joined at a higher level into larger groups, until a final cluster containing all the analyzed cases is reached. However, it is not appropriate for very large data sets [41].
Based on the agglomerative hierarchical method, groups were generated in each phase of the process in search of the number of clusters that yields an optimal grouping. At the beginning, each observation is assigned to its own cluster. At each step, the similarity or distance between every pair of clusters is calculated and the two most similar clusters are merged into one [42].
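The agglomerative procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: it assumes Euclidean distance between cluster centroids as the similarity measure, merges exactly one pair of clusters per iteration, and stops when a requested number of clusters remains. The function names and the sample data are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def agglomerate(points, target_clusters):
    """Merge the two closest clusters until target_clusters remain.

    Clusters are returned as lists of indices into points.
    """
    if target_clusters < 1:
        raise ValueError("target_clusters must be at least 1")
    # Every observation starts in its own cluster.
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > target_clusters:
        # Find the pair of clusters whose centroids are closest.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([points[i] for i in clusters[a]])
                cb = centroid([points[i] for i in clusters[b]])
                d = math.dist(ca, cb)
                if best is None or d < best[0]:
                    best = (d, a, b)
        # Merge that pair; centroids are recomputed on the next pass.
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical two-variable data, chosen only to make the merges obvious.
data = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]]
groups = agglomerate(data, 3)
```

Recomputing every centroid on each pass is wasteful but keeps the sketch short; for a data set of the size considered here (tens of observations) the cost is negligible.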

4) Hierarchical Agglomerative Clustering (HAC)
The HAC algorithm aims to classify individuals. Fundamentally, it addresses the following problem: given a set of N individuals characterized by n variables Xj (j = 1, 2, ..., n), classify them in such a way that the individuals belonging to a group are as similar to each other as possible, while the different groups are as dissimilar from one another as possible.
With cluster analysis, we search for a set of groups to which individuals are assigned according to some criterion of homogeneity; in our study, the criterion is the type and number of occurrences of problems found in students' object-oriented software designs. Additionally, it is necessary to consider the possibility of reassignments throughout the process, to establish criteria for stopping and/or performing the grouping, and to define a measure of similarity or divergence with which to classify individuals into groups.
The Euclidean distance is the best-known and most easily understood dissimilarity measure, since its definition coincides with the most common concept of distance (the space between two points). It is recommended when the variables are homogeneous and measured in similar units and/or when the variance matrix is unknown.
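As an illustration of this measure, the following sketch computes the Euclidean distance between occurrence vectors and builds the corresponding pairwise distance matrix. The occurrence counts are hypothetical and are not data from the study; the function names are likewise chosen only for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length occurrence vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(rows):
    """Symmetric N x N matrix of pairwise Euclidean distances."""
    return [[euclidean(r, s) for s in rows] for r in rows]


# Hypothetical occurrence counts for three test results over four
# categories (for instance LIS, NUM, ATR, COM); illustrative values only.
results = [
    [2, 1, 0, 3],
    [2, 1, 0, 0],
    [0, 1, 4, 0],
]
matrix = distance_matrix(results)
```

The matrix is symmetric with zeros on the diagonal, so only the entries above the diagonal need to be inspected when looking for the closest pair.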
Given a set of N elements to be grouped and an N×N distance matrix, the basic process of Johnson's hierarchical clustering [43] can be structured according to the following scheme:
• Step 1: Assign each element to its own cluster, so that N elements yield N clusters. The distances/similarities among the clusters are equal to the distances between the elements they contain.
• Step 2: Find the closest (most similar) pair of clusters and combine them into a single cluster, so that fewer clusters remain in each phase.
• Step 3: Calculate the distances between the new cluster and each of the old clusters.
• Step 4: Update the matrix to specify the distances between the clusters formed as a result of the merger.
Steps 2, 3 and 4 can be repeated according to the researcher's criteria. These steps consist of searching for similarities between clusters; therefore, a distance measure between data points is required. For this purpose, we use the Euclidean distance function given in (1). The similarity between individuals is plotted using dendrograms.
$$d(a, b) = \sqrt{\sum_{j=1}^{n} \left(X_{aj} - X_{bj}\right)^{2}} \qquad (1)$$
where $X_{aj}$ and $X_{bj}$ denote the values of variable Xj for individuals a and b.
Step 1: The initial matrix was obtained from the study cases, formed by the group of diagnostic tests and the group of evaluation tests. Applying equation (1) to these data results in a matrix of 4 quadrants.
In Figure 1, the data from the matrix are shown from right to left, corresponding to the relationships diagnosis vs. diagnosis, diagnosis vs. evaluation, evaluation vs. diagnosis and evaluation vs. evaluation. The grey boxes represent the clusters formed, and the blue boxes are clusters with a single set of results. The yellow boxes are the cases of students who were part of the cluster, and the red boxes are cases that have already been part of a cluster.

FIGURE 2. Initial cluster
Step 2: To find the closest pair of clusters and combine them into a single cluster, we use the centroid method. This method consists of obtaining the average of the individuals contained in each cluster. For example, Figure 2 shows that Cluster 1 contains 6 individuals from the diagnostic test; applying the centroid method yields a total of 36, whose average is 6. Cluster 2 has a total of 12, whose average is 2, Cluster 4 has a total of 24, whose average is 3, and so on for the rest of the clusters.
Step 3: We then calculate the Euclidean distances between the centroids of each pair of clusters and identify the smallest distance recorded, which determines the first clusters to be grouped. The 2 clusters to be joined are those whose calculated distance is the smallest (Figure 3).
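The centroid arithmetic for Cluster 1 can be written out explicitly. This only restates the figures given in Step 2, under the assumption that the value of 36 is the sum of the per-individual values of the 6 individuals in the cluster; the symbols $x_i$ and $C_1$ are introduced here for illustration.

```latex
\bar{x}_{C_1} \;=\; \frac{1}{|C_1|} \sum_{i \in C_1} x_i \;=\; \frac{36}{6} \;=\; 6
```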

FIGURE 3. Initial distance matrix
Step 4: After updating the matrix, the 11 clusters initially obtained were reduced to 6 clusters, as shown in Figure 4. This pairing takes into account the results of Figure 3: cluster 1 and cluster 5 become one because the distance between them is the smallest. In the same way, cluster 2 and cluster 4, cluster 3 and cluster 6, cluster 7 and cluster 8, and cluster 10 and cluster 11 are joined together, while cluster 9 does not match any other cluster since its distances to them are very large. It is worth mentioning that the minimum distance between clusters is 1.
Applying the agglomerative clustering technique, we obtained 5 possible groupings to analyze. The initial grouping (Group 1) contained a total of 11 clusters, of which clusters 6 and 11 had only 1 individual each to analyze, which is why this grouping was discarded.
Group 2 consists of a total of 6 clusters and became the study group, since all of its clusters have at least 2 individuals to analyze. In addition, it makes it possible to analyze the similarities and differences between the individuals that make it up.
Group 3 is made up of a total of 4 clusters. In this case, clusters 3 and 4 together represent 5 of the 22 individuals to be analyzed, while clusters 1 and 2 comprise the majority of individuals.
Group 4 consists of a total of 3 clusters: cluster 4 has 2 individuals to analyze, cluster 2 has the same results as in the previous grouping, and cluster 1 comprises the majority of individuals.
Group 5 consists of a total of 2 clusters. Here, the previous clusters are almost entirely merged into one: cluster 1 comprises the majority of individuals, while cluster 2 has just 2 individuals to analyze.
Since Group 1 has 2 clusters with a single individual each, and Groups 3, 4 and 5 concentrate the majority of individuals in a single cluster, making it difficult to establish similarities and differences between them, we chose to analyze Group 2, formed by 6 clusters.
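The selection reasoning above (discard groupings that contain single-individual clusters or in which one cluster holds the majority of individuals) can be expressed as a simple check. This is a hedged sketch: the function name, the thresholds `min_size` and `max_share`, and the cluster sizes are all assumptions made for illustration and are not values reported in the study.

```python
def is_analyzable(cluster_sizes, min_size=2, max_share=0.5):
    """Heuristic check on a candidate grouping.

    Rejects a grouping if any cluster has fewer than min_size individuals
    or if a single cluster holds more than max_share of all individuals.
    Both thresholds are assumptions made for this sketch.
    """
    total = sum(cluster_sizes)
    if min(cluster_sizes) < min_size:
        return False
    return max(cluster_sizes) / total <= max_share


# Hypothetical cluster sizes for three candidate groupings.
candidates = {
    "A": [10, 8, 1, 1],     # contains single-individual clusters
    "B": [6, 5, 4, 3, 2],   # reasonably balanced
    "C": [17, 3],           # one dominant cluster
}
kept = [name for name, sizes in candidates.items() if is_analyzable(sizes)]
```

A check of this kind only filters candidates; choosing among the remaining groupings still depends on the researcher's judgment.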

B. CLUSTER ANALYSIS
After the prior analysis of the candidate groupings and their number of clusters, it was decided to work with the set of 6 clusters for the comparison of results. Clusters 1 to 4 share similar characteristics in terms of the number and type of occurrences per category, while clusters 5 and 6 represent characteristics that were found with the least frequency.

1) Cluster 1
This grouping shows that the category with the highest number of occurrences is ATR, followed by the FUN and COM categories. This set of characteristics appears only in the students' diagnostic tests (Figure 6).

2) Cluster 2
In this grouping we observe occurrences in both diagnostic and evaluation tests (Figure 7). In the diagnostic tests, the category with the most occurrences is LIS, followed by NUM and, in equal proportions, HOL, FUN, COM, REL and HER. In the evaluation tests, the category with the highest number of occurrences is LIS, followed by NUM and ATR.
When comparing the results obtained in the diagnostic and evaluation tests, it is observed that the problems related to the LIS and NUM categories persist and, in addition, occur in greater numbers in the evaluation tests.
The FUN and COM categories are maintained in both diagnostic and evaluation tests.
The categories HOL, REL and HER present a minimum number of occurrences in the diagnostic tests but do not appear in the evaluation tests. Conversely, the categories REA, ACC and ATR do not appear in the diagnostic tests but present a minimum number of occurrences in the evaluation tests.
Moreover, one of the students in this grouping (E2) presents the same number of occurrences in the LIS category in both the diagnostic and evaluation tests. This student overcomes the difficulties with the FUN category; however, the ACC category appears in the evaluation test.
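The kind of comparison made in this cluster (which categories persist, disappear, or newly appear between the two tests) can be sketched as follows. The occurrence counts are hypothetical: they only mimic the qualitative pattern described for this cluster and are not the study's data. The function name and the labels of the summary are likewise assumptions.

```python
def compare_categories(diagnostic, evaluation):
    """Classify each category by how its occurrences change between tests."""
    summary = {"overcome": [], "new": [], "increased": [],
               "decreased": [], "unchanged": []}
    for cat in sorted(set(diagnostic) | set(evaluation)):
        before = diagnostic.get(cat, 0)
        after = evaluation.get(cat, 0)
        if before == 0 and after == 0:
            continue  # category never observed in either test
        if after == 0:
            summary["overcome"].append(cat)
        elif before == 0:
            summary["new"].append(cat)
        elif after > before:
            summary["increased"].append(cat)
        elif after < before:
            summary["decreased"].append(cat)
        else:
            summary["unchanged"].append(cat)
    return summary


# Hypothetical counts per category code (illustrative values only).
diagnostic = {"LIS": 3, "NUM": 2, "HOL": 1, "FUN": 1, "COM": 1, "REL": 1, "HER": 1}
evaluation = {"LIS": 5, "NUM": 3, "ATR": 2, "FUN": 1, "COM": 1, "REA": 1, "ACC": 1}
summary = compare_categories(diagnostic, evaluation)
```

With these illustrative counts, LIS and NUM fall under "increased", FUN and COM under "unchanged", and the remaining categories split between "overcome" and "new".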

3) Cluster 3
This grouping is made up of data only from the diagnostic test (Figure 8). Two categories share the highest number of occurrences, COM and HOL, followed by the category ATR.

4) Cluster 4
In this group, occurrences are observed in both diagnostic and evaluation tests (Figure 9). In the diagnostic tests, the category with the most occurrences is COM, followed by SIC and, in equal proportions, FUN and INS. In the evaluation tests, the category with the highest number of occurrences is LIS, followed by the categories NUM and ATR.
The categories CLA, HOL, COM, INS, PCL and SIC register occurrences in the diagnostic tests but not in the evaluation tests. This could be because the concepts covered by these categories became clearer for this group of students. However, the categories ATR and MAI appear in the evaluation test. The problems related to the categories NUM and LIS clearly occur in greater numbers in the evaluation tests than in the diagnostic tests. On the other hand, the FUN category registers a minimal reduction of occurrences in the evaluation test, while the categories REA and ACC keep the same number of occurrences in diagnosis and evaluation. Furthermore, in this grouping student E10 presents problems with the LIS category in both the diagnostic and evaluation tests, with a greater number of occurrences in the evaluation test. This student overcomes the problems with the categories COM, INS and REA; however, new categories such as NUM and ATR appear. Overall, the student has fewer occurrences in the evaluation test (1 less), but the problem with the LIS category persists.

5) Cluster 5
Occurrences are observed in both diagnostic and evaluation tests (Figure 10). In the diagnostic tests, the categories with the most occurrences are NUM and FUN, equally, followed by HOL and ATR. In the evaluation tests, the predominant category is COM, followed by NUM and LIS. When comparing the results of the two tests, the COM category increases considerably in the evaluation tests, the NUM category presents the same number of occurrences in both, and the ATR category decreases in the evaluation tests.
In addition, the HOL, FUN and REL categories appear in the diagnostic tests but not in the evaluation tests. The opposite occurs with the LIS category, which in this group appears only in the evaluation tests.

6) Cluster 6
This group consists of evaluation test results only (Figure 11). The category with the highest number of occurrences is NUM, followed by the COM and REL categories, with the same number of occurrences.

VI. RESULTS
This section details the results by clusters, the findings of all clusters in general and finally a discussion.

A. RESULTS BY CLUSTERS
• Clusters 1 and 3 represent individuals from diagnostic tests only. In cluster 1, the category with the most occurrences is ATR, followed by FUN and COM, while in cluster 3 the categories with the highest number of occurrences are, equally, HOL and COM, followed by ATR. In addition, it is evident that in clusters 1 and 3 the categories that represent the greatest difficulty for students are COM and ATR.
• In Clusters 2 and 4, there is a greater number of occurrences in the evaluation tests. The categories that represent the greatest difficulty for students in both clusters are LIS, NUM and ATR.
• In Cluster 5, the category with the greatest difficulty in both diagnosis and evaluation is NUM, since its number of occurrences is maintained. In addition, in the evaluation test, the number of occurrences of the COM category increases considerably compared to the diagnostic test. The opposite happens with the ATR category, whose number of occurrences in the evaluation tests decreases compared to the diagnostic tests.
• Clusters 1, 2, 3 and 4 have in common that ATR is the category in which students present more problems. In addition, in Clusters 2, 4 and 5 the category in common with the most occurrences is NUM.
• Clusters 2 and 4 contain the particular cases of students E2 and E10, respectively; both students present problems in the LIS category in both diagnostic and evaluation tests.

B. OVERALL RESULTS
• The number of occurrences in the LIS and NUM categories increases considerably in the evaluation tests. This means that the problems in these categories not only persist but also grow in number of occurrences.

VII. DISCUSSION
This section discusses the difficulties identified in software modeling through class diagrams. The findings of this study have implications for both students and instructors of software modeling. Moreover, we present the limitations of this study, especially in the qualitative part.

A. DIFFICULTIES OF SOFTWARE MODELING
In the diagnostic tests, the two categories with the highest number of occurrences are COM and ATR, while in the evaluation tests they are LIS and ATR. We discuss these results below:

1) Defining attributes that could be a class (ATR)
From the comparative study, it can be seen that the category that persists in both the diagnostic and the evaluation tests is ATR, which is discussed below. This category refers to the simplification of a concept by defining it as an attribute of a class instead of conceiving it as a class in its own right, as its complexity warrants. Some students believe that using "few attributes" is a way to define a concept correctly. They showed this behavior when they placed an attribute called type in the Circle class and an attribute called typeOfBet in the Bet class.
Furthermore, difficulties related to misassigned and missing attributes have also been reported in the literature [34]. However, there is a strong tendency to think that a concept can be defined with attributes alone, leaving methods aside. This is also related to habits acquired in the structured approach, where data are used by functions; as [44] defines systems under the structured approach, "A software system is a system that manipulates and stores data", so that data play a leading role under this approach. The influence of the structured approach on the implementation of the object-oriented approach has already been discussed in the literature [10]. However, its manifestations go beyond giving more relevance to attributes; these results coincide with those analyzed in [45], where students assigned the methods to calculate an employee's salary to the Employee class, when these should belong to the Human Resources class. This behavior shows a clearly procedural design in which the Employee class is in control and the Human Resources class is merely data. Detienne [46], [47] also reports findings on the problems that novice learners have when decomposing large procedures into smaller functional units, reflecting the tendency to place all or most of the functional procedure in a single class. Finally, the findings of Ven Yu Sien [34] show a lack of identification of related concepts within the problem domain, as well as misassigned or unassigned attributes.
As in the previous category, students show a clear lack of abstraction by not being able to conceptualize a concept through a class with its own behavior or by reducing a concept to an attribute.
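The ATR pattern discussed in this subsection can be illustrated with a small contrast. This is only a sketch: apart from the names Bet and typeOfBet, which come from the students' exercises, every class, attribute and rule below (for example `BetType` and `payout_multiplier`) is hypothetical and introduced for illustration.

```python
from dataclasses import dataclass


# Variant 1: the concept is reduced to a plain attribute (the ATR pattern).
@dataclass
class BetWithAttribute:
    amount: float
    typeOfBet: str  # any rule tied to the type must live somewhere else


# Variant 2: the concept is promoted to a class with its own behavior.
@dataclass
class BetType:
    name: str
    payout_multiplier: float  # hypothetical rule attached to the type

    def payout(self, amount: float) -> float:
        return amount * self.payout_multiplier


@dataclass
class Bet:
    amount: float
    bet_type: BetType

    def potential_payout(self) -> float:
        # The bet delegates type-specific rules to the BetType object.
        return self.bet_type.payout(self.amount)


bet = Bet(amount=10.0, bet_type=BetType("single", 2.0))
```

In the second variant, adding a new kind of bet means adding data or behavior to `BetType` rather than adding conditionals wherever the string attribute is inspected.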

2) Classes with inappropriate or insufficient behavior (COM)
Initially, students presented the highest number of problems in the COM category, a problem that decreased by 30% after the instruction received.
The concepts of a class and an object are very similar in the object-oriented approach; however, an object is a concrete entity that exists in time and space, while a class represents only an abstraction [48]. That is why abstraction is a fundamental concept in the object-oriented approach. When defining a real-life object as a class, with its relevant attributes and methods, it is necessary to use abstraction to reduce the object to only the parts that are needed for the software being designed [45]. In this sense, the students in this study have difficulties in giving a class the right behavior, and this manifests in different ways: the methods associated with the class do not correspond to the concept that the class represents, there is an overload of methods with low cohesion between them, or the class has no behavior at all.
Different examples of the COM category were seen in the exercise Hotel, where the Room class has a method moveFurniture(), and in the exercise Betting, where students assigned to the Bet class behaviors related to verifying aspects of the event. We have also seen classes with an overload of methods with little cohesion between them, for example, a Bet class with methods related to the payment and to the registration of the gambler. Although at first glance the Bet class has one "behavior", it is a clear example of an overloaded class that does many different things. The overloading of methods in a class has also been cited in other works [49], [50]. We also found classes defined only with attributes, such as the Client class and the Hotel class, or an absence of methods in classes [51].
Many authors describe this problem when defining classes; some of them attribute it to the confusion of assigning the "real" behavior of the physical object to the software object. This finding was also reported in [52], a study in which students were asked to create a composite class consisting of several simple classes, where the composite class was called Room and the simple classes Mirror, Bed and Cupboard. The students placed the addMirror method in the Room class. The authors interpret this behavior as confusion on the students' part, since it is a possible situation in real life. This involves assigning erroneous behavior to the Room class; related results were also reported by [10].
Other studies, such as [9], report students' difficulties in conceiving a class as an abstraction of some kind of entity in the real-world problem domain, although some authors defend the idea that objects have the property of naturalness, understood as the property that allows mapping the physical objects of the problem domain to the software [53], [54].
Also, some students created classes built only with get() and set() methods, giving the false impression that these classes have behavior, when such methods merely allow the attributes of the class to be accessed from outside rather than expressing the behavior of the class itself. Students are often motivated to use get() and set() methods to hide the modules, which is a misinterpretation of the Information Hiding Principle [55]. The difficulty of defining objects has also been documented in the literature [45], [54], [56].
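One way to picture the alternative to an overloaded Bet class is to give payment and gambler registration their own classes. The sketch below is one possible decomposition under stated assumptions, not a model solution from the study: the classes `GamblerRegistry` and `Payment` and all method names are hypothetical.

```python
class Gambler:
    def __init__(self, name):
        self.name = name


class GamblerRegistry:
    """Registration of gamblers, kept out of the Bet class."""

    def __init__(self):
        self._gamblers = {}

    def register(self, name):
        # Return the existing gambler if the name is already registered.
        if name not in self._gamblers:
            self._gamblers[name] = Gambler(name)
        return self._gamblers[name]


class Payment:
    """Payment handling, kept out of the Bet class."""

    def __init__(self, amount):
        self.amount = amount
        self.settled = False

    def settle(self):
        self.settled = True


class Bet:
    """Only what concerns the bet itself."""

    def __init__(self, gambler, amount):
        self.gambler = gambler
        self.amount = amount

    def is_valid(self):
        return self.amount > 0


registry = GamblerRegistry()
ana = registry.register("ana")
bet = Bet(ana, 10)
payment = Payment(bet.amount)
payment.settle()
```

Each class now has a single, cohesive responsibility, so a change to how payments are settled does not touch the Bet class.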
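The difference between exposing data through accessors and giving a class behavior can be shown with a hypothetical Room class. The attribute and method names below (`check_in`, `check_out`, `is_available`) are assumptions made for this sketch and do not come from the students' solutions.

```python
class RoomDataOnly:
    """Accessors only: the class exposes data but enforces no rules."""

    def __init__(self, number, occupied=False):
        self._number = number
        self._occupied = occupied

    def get_occupied(self):
        return self._occupied

    def set_occupied(self, value):
        self._occupied = value


class Room:
    """State changes go through operations that enforce the class's rules."""

    def __init__(self, number):
        self._number = number
        self._occupied = False

    def is_available(self):
        return not self._occupied

    def check_in(self):
        if self._occupied:
            raise ValueError("room is already occupied")
        self._occupied = True

    def check_out(self):
        self._occupied = False


room = Room(101)
room.check_in()
```

With `RoomDataOnly`, any caller can set the occupancy flag to any value at any time; with `Room`, a second `check_in()` on an occupied room is rejected by the class itself.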

3) Incorrect use of multiplicity between classes (LIS)
In the evaluation tests, LIS is the category with the highest number of occurrences; it is discussed below.
Class diagrams show classes and the associations between them. Additionally, they make it possible to visualize the number of objects involved in an association through multiplicity. Multiplicity can define an exact number of objects involved or, if * is used, indicate that an indefinite number of objects take part in the association [3]. In this way, UML makes it possible to specify the role of the objects that participate in the association.
In this study, manifestations of the LIS category appeared in the exercise Betting. One of the expected multiplicities was between the Bet and Gambler classes, since the person making the bet could place several bets, and this was not considered by many students. Most of the students who defined the Bet and Gambler classes used a multiplicity of 1 to 1 instead of 1 to *. This is evidenced when students did not draw any multiplicity or when they wrote methods such as getAllBet() in the class Bet without specifying where or how all the bets are handled.
At the software design level, another relationship is aggregation, a type of association between two classes which means that an object (the whole) is formed by other objects (the parts) [3]. Its multiplicity must be defined when one wants to express the existence of more than one object of the same type. Aggregation can also be used to represent a physical container.
Students do not abstract globally, usually thinking that an object has a specific task. When students realize that the task is to manage a set of objects, they understand the need for some mechanism to deal with multiple instances; however, they are unable to define multiplicity correctly. The difficulty is also related to the fact that a whole and its parts are not always conceived as a container; rather, this whole/parts relationship is more conceptual [48].
The difficulty of defining multiplicity is a persistent problem that has manifested itself in several forms. A possible cause is the difference between the structured and object-oriented approaches, where conceptually there is no data and all elements are variables. This possible cause lies largely in a lack of understanding of the object concept rather than in a direct relation to problems with UML.
LIS is the category with the highest number of occurrences of problems in the Bet and Hotel exercises; in the Circle exercise, however, it appeared rarely because, unlike the first two exercises, solving it did not require multiple objects.
We found that 2 of the 22 students in the comparative study have problems in the same categories in both tests. In both cases the category in question is LIS. The first student (E2) presents the same number of occurrences in both tests, while the second (E10) presents a greater number of occurrences in the evaluation test.
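In code, a 1 to * association between Gambler and Bet typically shows up as a collection held by the "one" side. The following sketch is an illustration under that assumption; the method names `place_bet` and `get_all_bets` are hypothetical and only echo the getAllBet() method mentioned above.

```python
class Bet:
    def __init__(self, amount):
        self.amount = amount


class Gambler:
    """One gambler is associated with many bets (multiplicity 1 to *)."""

    def __init__(self, name):
        self.name = name
        self._bets = []  # the "*" end of the association

    def place_bet(self, amount):
        bet = Bet(amount)
        self._bets.append(bet)
        return bet

    def get_all_bets(self):
        # Return a copy so callers cannot modify the internal list.
        return list(self._bets)


gambler = Gambler("ana")
gambler.place_bet(5)
gambler.place_bet(7)
```

Placing the collection in Gambler answers the question left open by a getAllBet() method inside Bet: a single Bet object has no natural place to keep all the other bets.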

B. ADVICES
Several authors mention possible causes of the modeling difficulties that students usually present [57]-[60]. For example, with respect to assigning appropriate behavior to classes, some authors agree that students do not perceive that a class models some real-world phenomenon, something in the problem domain. In this sense, the literature contemplates the use of UML to carry out the design process correctly, as in the case of Prasad et al. [61], who claim that UML diagrams specify behaviors and scenarios of a given system at various levels of abstraction.
One of the difficulties shown in the diagnostic stage was the transfer of models from the structured approach, such as the Entity-Relationship model, to the object-oriented approach (REL). To address this, some authors propose tools to bridge the gap between object-oriented programming and procedural programming; one of them is the Web Plan Object Language (WPOL), proposed by Ebrahimi [62].
WPOL is a solution based on a Plan-Object paradigm, where a plan must exist to request, dictate and guide the creation of objects. With a similar intention, Xinogalos [63] uses the objectKarel tool in his study to help students in their transition from procedural/imperative programming to object orientation. Xinogalos introduces object-oriented programming concepts using the microworld approach with objectKarel for a clear, playful and practical presentation of objects and classes, without neglecting other fundamental concepts such as inheritance and polymorphism.

C. THREATS TO RELIABILITY
Qualitative research has been widely criticized for not providing enough information about the analysis of the data and how it has worked from the raw data to its conclusions. This study adheres to the quality criteria presented by Yvonna S. Lincoln, Egon G. Guba [64], W. Lawrence Neuman [65] and Sharan B. Merriam [25] in the educational context.
On the one hand, the work has reliability, that is, consistency of the results with the data collected. To ensure reliability, the researchers of this study, instead of requiring that people outside the research agree that, based on the data collected, the results make sense and are consistent and reliable, detailed the traceability of the source data and the decisions taken to reach their conclusions. The details of the environment and participants are also described, which will allow other researchers to apply this study in similar contexts.
On the other hand, validity means truthfulness, although in the qualitative context we could rather speak of authenticity, which means capturing a detailed view of the research process. To ensure validity in this research, we applied strategies such as triangulation, using several researchers so that each exercise was analyzed and coded separately. The codes and categories were agreed upon through peer debriefing techniques, thereby ensuring the credibility of the research.
In the presented research students go through different stages of learning: a) when the concepts are presented to the students, b) when they do exercises to try to learn the concepts, and c) when the students take the assessments. In this sense, it should be noted that there is a possible threat to the validity of the research because the stage in which the students present the problems was not identified, nor were the causes of the problems. It is important to recognize that the problems might have been caused by the approach of the teacher while teaching the topic rather than the approach of the students while learning it. Nevertheless, neither the identification of the stage nor the causes were considered within the scope of the study.
In addition, to avoid ethical conflicts regarding the handling of the data collected from the students, informed consent forms were prepared to guarantee the anonymity and confidentiality of the data. These forms were read and signed by the students before the research.

VIII. CONCLUSION
The work carried out allowed us to determine which are the most recurrent difficulties in object-oriented software design and their persistence in a group of university students.
The qualitative study approach was used to generate the documentation from the diagnostic test and student interviews. The thematic analysis of this documentation allowed us to identify, analyze and report patterns within the data, resulting in a total of 16 categories.
As a result of the quantitative approach, it was possible to determine the occurrences of the problems of the students in the case study. In addition, the results obtained between the diagnostic and evaluation tests were compared to establish similarities and differences between the cases observed, using the hierarchical clustering technique.
When comparing the number of occurrences of the categories in which students present greater difficulty, between the diagnostic and evaluation tests applied at the beginning and at the end of the course respectively, it was found that students present a greater number of difficulties in the LIS and NUM categories. The difficulties in these categories not only persist but also increase in number of occurrences in the evaluation test. The categories REA and ACC register a lower number of occurrences in the diagnostic tests, but their number increases in the evaluation tests.
On the other hand, the concepts related to the categories FUN, COM, ATR and REL have been partially overcome: their number of occurrences in the evaluation test is lower than in the diagnostic test.
The categories CLA, HOL, INS, MET, PCL, SIC and HER register a minimum number of occurrences in the diagnostic tests but none in the evaluation tests. This could be because the concepts covered by these categories became clearer for this group of students. Conversely, no occurrences are recorded in the MAI category in the diagnostic tests, but it appears in the evaluation tests.
Consequently, the comparative study allowed us to determine whether the students' difficulties in object-oriented software design had been overcome by the end of the course and, at the same time, which new difficulties they present when making design decisions.
The comparative study also shows that there are students who have difficulties in the same category, with a similar number of occurrences, in both diagnostic and evaluation tests. This is the case of students E2 and E10, both of whom present problems in the LIS category.