The Influence of Student Abilities and High School on Student Growth: A Case Study of Chinese National College Entrance Exam

Enabled by available educational data and data mining techniques, educational data analysis has become a hot topic. Current research mainly focuses on predicting problems and performance rather than revealing the underlying causal relationships. Based on a unique exam dataset, we extracted examinees' abilities from the HSEE (High School Entrance Exam) using the knowledge of educational experts, and then measured student growth from middle school to high school in the total score and in subject scores. We studied the impact of high school ranking and of student abilities in the HSEE on student growth with multiple linear regression models, in which high school ranking is divided into 5 levels, Level 1 being the best and Level 5 the poorest. We found that: 1) the higher the ranking of a high school, the higher its student growth in the total score, with exceptions in Level 4 and Level 5 schools. Growth in subject scores did not follow the same rule: Level 3 schools performed better than Level 2 schools in student growth in Physics, and Level 2 schools performed better than Level 1 schools in student growth in Chemistry. 2) Student abilities in the HSEE have different impacts on student growth in the total score and in subject scores. For student growth in the total score, the abilities of English memory and Math analysis and solution have larger positive influences than the other abilities. For student growth in subject scores, most abilities have a negative impact on growth in the same subject, except for English listening and memory. Our research can not only help educational authorities evaluate the impact of high schools on the variation of student abilities to ensure equity and efficiency, but also help students and parents choose schools based on student abilities and the characteristics of high schools.


I. INTRODUCTION
With the development of digital technology, all kinds of educational data are now stored, which shows great potential for educational analysis and improvement (Dutt, Ismail, and Herawan, 2017; Romero and Ventura, 2017) [30], [31]. For example, exam data contain rich information about teaching and learning. Previous research showed that exam data analysis can help us not only evaluate the design of the questions in an examination paper but also evaluate students' knowledge and abilities (Casalino et al., 2017) [32]. Furthermore, students' performance can also reflect the level of the educational institutions in the local area. If admitted to a high-quality high school, students can have an advantage in the National College Entrance Exam (NCEE), which is their primary way to compete for higher education resources. Thus, it is important for students and parents to choose a suitable high school.

The associate editor coordinating the review of this manuscript and approving it for publication was Laxmisha Rai.
Parents and students usually select schools based on the following criteria: 1) School quality: school ranking, academic achievement, category of career guidance, students' average socioeconomic status (SES) in the school, education programs [1], [2], exam grades, and the quality of incoming students [3]. 2) School type: public or private [4].
3) Location: travel distance and the number of schools in the same district [2]. 4) Students' family background: parental income and educational qualifications [1]. 5) Other considerations: the attendance of former schoolmates [3]. Among these considerations, school quality usually plays a decisive role. Analyzing students' achievements in schools can reveal the quality of schools and help predict students' scores in advance.
In summary, current research usually uses students' total scores or GPA but does not consider the abilities that the tests examine. Moreover, the influence of students' abilities on the variation of student performance in high school is seldom investigated. Analyzing students' abilities across schools can help students choose the most suitable school. In this paper, we measure students' abilities in the HSEE based on expert knowledge and calculate student growth from middle school to high school in the total score and in subject scores. Finally, we identify the determinants of student growth, i.e., the critical success factors facilitating it, including students' abilities in the HSEE, the size of the school, the average score of the school, and the ranking of the high school.
The rest of the paper is organized as follows. Part II reviews and summarizes related research on student growth, student performance, and educational data mining. Part III introduces the measurement of student growth and proposes two models for each aspect of student growth to evaluate its determinants. Part IV analyzes and discusses the results. Part V concludes the paper.

II. RELATED WORK
A. STUDENT GROWTH
Research on student growth evaluates changes in student performance and the related factors. Student growth, sometimes referred to as ''value-added'', measures the change in student performance over a certain period, for example, from Grade 1 to Grade 2. Current research normally measures student growth using student academic achievement, normalized standardized test scores, and student growth percentiles. Claudio Thieme et al. (2016) introduced environmental variables into two traditional evaluation models, the state model and the pure value-added model [16]. They used student academic achievement at both the beginning and the end of the educational process, compared and explained the results of four models based on Chilean education data, identified the significant environmental variables, and proposed a general model that excludes background effects when evaluating schools or students. Daniel F. McCaffrey et al. (2004) used a longitudinal data set containing students' scores in every grade. They presented a general multivariate longitudinal mixed model that incorporates the complex clustering structure of longitudinal student data linked to teachers, and showed how the model can separate the contributions of teachers or schools to student development [17]. Betebenner et al. (2010) showed that growth-to-standard approaches cannot describe student progress comprehensively, and introduced student growth percentiles as a normative description of growth that can accommodate, encourage, and extend standards-referenced goals [18].
Eric M. Anderman and his co-workers (2014) studied five growth models for academic value-added assessment: the Student Gain Score Model, the Covariate Adjustment Model, the Student Percentile Gain Model, Univariate Value-Added Response Models, and Multivariate Value-Added Response Models [19]. They identified differences among the models; for example, the Student Gain Score Model is mathematically and conceptually simple, whereas Multivariate Value-Added Response Models are highly complex. They suggested that education researchers choose appropriate models based on technical, educational, and policy considerations in practical analysis.

B. STUDENT PERFORMANCE
Student performance describes the overall performance of students; it is usually measured by scores and serves as the gold standard of educational performance. The prediction of student performance is a research hotspot that has been investigated widely. Student achievement analysis usually draws on machine learning, clustering [5], Knowledge Discovery in Databases, data mining, association rule mining [20], data dashboards and online learning platforms [6], regression analysis, etc. J.D. Rubright et al. (2019) used a hierarchical linear model to study demographic differences in United States Medical Licensing Examination (USMLE) scores and how students' prior academic achievement explains these differences. They found that prior examination performance and undergraduate performance can temper the demographic differences in USMLE scores [7]. H.J. Lim and H. Jung (2019) studied the factors that predict digital reading and those that influence print reading ability [8]. They confirmed that factors such as navigation and metacognitive summarizing have an important impact on students' reading performance, and that navigation was consistently a strong predictor of digital reading [8].
With the help of student performance analysis, students and parents can anticipate future performance before entering a school and make better choices of high schools. Moreover, teachers and schools can monitor students' learning progress and provide timely support.

C. EDUCATIONAL DATA MINING AND LEARNING ANALYTICS
Educational Data Mining (EDM) aims to mine knowledge from data in educational contexts to solve educational problems. It is open to any kind of analysis technique. Generally, five categories of methods are used in EDM [21]: prediction [11], [22], clustering [23], relationship mining [20], distillation of data for human judgment [24], and discovery with models [16]. Compared with EDM, Learning Analytics focuses on applying existing techniques to provide analytical support to the education system; it is related to academic analytics, action analytics [25], [26], and predictive analytics, and may employ social network analysis [14], [27].

III. METHODS AND MODELS
A. DATA INTRODUCTION
The forms and contents of the NCEE vary across provinces and over time. Students with talents in some areas, e.g., fine arts, music, dancing, athletics, and student leadership, may receive bonus points. In this study, ''student performance in the NCEE'' refers only to the original test scores in Chinese, Mathematics, English, Physics, and Chemistry and the total score in the NCEE. Test scores in other subjects (Biology) and bonus points (for student leadership, Art, or Sports) are not considered. The scores in the corresponding HSEE subjects are also included to measure ''pre-high-school'' abilities and student growth from middle school to high school. Table 1 shows the content of the original data. The sample was collected in Beijing municipality and includes 15,801 students who took the HSEE in 2011 and the NCEE in 2014.

B. DATA PREPROCESSING
1) STUDENT ABILITY MEASUREMENT
Exam scores reflect students' acquired knowledge and abilities. The Beijing Education Examination Authority organizes experienced educational experts and teachers to analyze the results of the NCEE and publishes official analysis reports on the NCEE and HSEE every year. The reports summarize the structure of abilities examined in every subject; for example, the structure of abilities examined in Math is shown in Table 2. We measure Student Abilities (SA) based on these reports and the actual items in the exams. In total, there are 19 measured student abilities in 5 subjects (Table 3).
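To make the measurement step concrete, the sketch below aggregates a student's item scores into ability scores and expresses each on a 0-1 scale. It assumes, for illustration only, that every item is assigned wholly to one ability, in the item-to-ability layout of Table 2; the item numbers, full marks, and the helper names `ABILITY_ITEMS` and `ability_scores` are hypothetical, not taken from the official reports.

```python
# Hypothetical mapping for illustration only: ability label -> item numbers.
# The real mapping for each subject comes from the official analysis reports.
ABILITY_ITEMS = {
    "SA12": [1, 2, 9],   # hypothetical items for Math "mathematical thinking"
    "SA13": [3, 4],      # hypothetical items for Math "analysis and solution"
}


def ability_scores(item_scores, ability_items, item_full_marks):
    """Return each ability's score as a fraction of that ability's full mark.

    item_scores and item_full_marks map item number -> points (earned / possible).
    An ability's raw score is the sum of its items; dividing by the summed full
    marks puts every ability on a common 0-1 scale.
    """
    scores = {}
    for ability, items in ability_items.items():
        earned = sum(item_scores[i] for i in items)
        possible = sum(item_full_marks[i] for i in items)
        scores[ability] = earned / possible
    return scores


# Usage with made-up numbers for one student.
full_marks = {1: 5, 2: 5, 3: 8, 4: 8, 9: 10}
student = {1: 5, 2: 4, 3: 8, 4: 0, 9: 6}
print(ability_scores(student, ABILITY_ITEMS, full_marks))  # {'SA12': 0.75, 'SA13': 0.5}
```

Dividing by the ability's own full mark is what makes abilities with different point totals comparable as regressors.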

2) THE STUDENT GROWTH IN TWO ASPECTS
Because the full marks differ across abilities and subjects, we normalize each score by dividing it by the corresponding full mark. We analyze student growth in two aspects: 1) the total score and 2) the subject scores.

a: STUDENT GROWTH IN TOTAL SCORE
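Student growth in total score (SGTS) measures the change in a student's overall performance from middle school to high school. We normalize each student's HSEE total score and NCEE total score by their full marks and then calculate the SGTS of student i as the difference between the two normalized scores:
$$\text{SGTS}_i = \frac{\text{NCEE}_i}{F_{\text{NCEE}}} - \frac{\text{HSEE}_i}{F_{\text{HSEE}}},$$
where $\text{NCEE}_i$ and $\text{HSEE}_i$ are the total scores of student i, $F_{\text{NCEE}}$ and $F_{\text{HSEE}}$ are the corresponding full marks, and $i = 1, 2, \ldots, 15801$.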

b: STUDENT GROWTH IN SUBJECT SCORE
Student growth in subject score (SGSS) measures the change in student performance in each subject from middle school to high school. As with SGTS, we first normalize each of the student's HSEE and NCEE subject scores by the corresponding full marks and then calculate the growth of student i in subject j:
$$\text{SGSS}_{ij} = \frac{\text{NCEE}_{ij}}{F_{\text{NCEE},j}} - \frac{\text{HSEE}_{ij}}{F_{\text{HSEE},j}},$$
where $i = 1, 2, \ldots, 15801$ and $j = 1, 2, 3, 4, 5$ indexes the five subjects.
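The normalization and differencing described for SGTS and SGSS can be written as one vectorized function. This is a minimal sketch assuming growth is the normalized NCEE score minus the normalized HSEE score, so a negative value means a lower share of the full mark, consistent with the decreases reported in the results. The function name is hypothetical and the full marks in the example are placeholders, not the actual exam full marks.

```python
import numpy as np


def normalized_growth(hsee, ncee, hsee_full, ncee_full):
    """Growth = NCEE score / NCEE full mark - HSEE score / HSEE full mark.

    Works for a vector of total scores (SGTS) or for an (n_students, n_subjects)
    matrix of subject scores with per-subject full marks (SGSS), via broadcasting.
    A negative value means the student's share of the full mark fell.
    """
    hsee = np.asarray(hsee, dtype=float)
    ncee = np.asarray(ncee, dtype=float)
    return ncee / np.asarray(ncee_full, dtype=float) - hsee / np.asarray(hsee_full, dtype=float)


# Total scores for two students; the full marks here are placeholders.
sgts = normalized_growth([480, 520], [500, 600], hsee_full=600, ncee_full=750)

# Subject scores: one student, two subjects, with placeholder per-subject full marks.
sgss = normalized_growth([[90, 100]], [[120, 100]],
                         hsee_full=[120, 100], ncee_full=[150, 100])
print(sgts, sgss)
```

Broadcasting the per-subject full marks across rows keeps SGTS and SGSS on exactly the same definition.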

C. MODEL
We use linear regression models to analyze the impact of pre-high-school abilities and high schools on student growth in two aspects: 1) the total score and 2) subject scores. To compare the influence of school characteristics, we use two models for each aspect, one including the ranking of the school and the other without it.

1) MODELLING STUDENT GROWTH IN TOTAL SCORE
SGTS can be modelled as a linear function of students' abilities in the HSEE, as shown in Model 1:
$$\text{SGTS}_i = \beta_0 + \sum_{j=1}^{19} \beta_j\,\text{Ability}_{ij} + \varepsilon_i,$$
where $\text{Ability}_{ij}$ ($j = 1, 2, \ldots, 19$) is the score of student i in ability j, $\beta_0$ and $\beta_j$ ($j = 1, 2, \ldots, 19$) are the parameters to be estimated, and $\varepsilon_i$ is the error term. Empirical analyses of the determinants of student academic achievement can be divided into two categories: the effects of school-related inputs, i.e., school effects, and the effects of non-school-related inputs, such as students' own characteristics, family background, and community. In this research, we also evaluate the impact of high schools, which we represent by their ranking; schools differ in resources, policies, school climate, and so on. Thus, we have Model 2:
$$\text{SGTS}_i = \beta_0 + \sum_{j=1}^{19} \beta_j\,\text{Ability}_{ij} + \sum_{k=1}^{4} \gamma_k\,\text{If\_rank}_{k,i} + \delta_1\,\text{AvgScore}_i + \delta_2\,\text{Size}_i + \varepsilon_i,$$
where $\text{If\_rank}_{k,i}$ ($k = 1, 2, 3, 4$) is a dummy variable that equals 1 if the high school in which student i is enrolled is ranked at Level k and 0 otherwise, so that Level 5 is the reference category; $\text{AvgScore}_i$ is the average HSEE score of the school in which student i is enrolled; and $\text{Size}_i$ is the number of students in that school in the same year.
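A minimal NumPy sketch of how the design matrices for Model 1 and Model 2 could be built and estimated by ordinary least squares. The function names `design_matrix` and `ols` are hypothetical; Level 5 is treated as the omitted reference category for the four rank dummies, matching the comparisons reported in the results. The same construction serves the subject models by swapping the dependent variable.

```python
import numpy as np


def design_matrix(abilities, rank_level=None, avg_score=None, size=None):
    """Stack [intercept, abilities, If_rank1..If_rank4, avg score, size].

    Model 1 uses only the abilities; Model 2 also passes rank_level (school level
    1..5) and the school-level covariates. Level 5 is the omitted reference
    category, so its students have all four rank dummies equal to 0.
    """
    abilities = np.asarray(abilities, dtype=float)
    n = abilities.shape[0]
    cols = [np.ones(n), abilities]
    if rank_level is not None:
        level = np.asarray(rank_level)
        cols.append(np.column_stack([(level == k).astype(float) for k in (1, 2, 3, 4)]))
    for extra in (avg_score, size):
        if extra is not None:
            cols.append(np.asarray(extra, dtype=float))
    # column_stack turns 1-D arrays into columns and keeps 2-D blocks as they are.
    return np.column_stack(cols)


def ols(X, y):
    """Ordinary least squares coefficients via a numerically stable solver."""
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta
```

For Model 1, `design_matrix(abilities)` alone is enough; once the average school score is dropped after the collinearity check, Model 2 is built by simply not passing `avg_score`.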

2) MODELING STUDENT GROWTH IN SUBJECT SCORE
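Similarly, we construct models for SGSS, estimated separately for each subject s ($s = 1, 2, \ldots, 5$). Model 3:
$$\text{SGSS}_{is} = \beta_{0s} + \sum_{j=1}^{19} \beta_{js}\,\text{Ability}_{ij} + \varepsilon_{is}$$
Model 4:
$$\text{SGSS}_{is} = \beta_{0s} + \sum_{j=1}^{19} \beta_{js}\,\text{Ability}_{ij} + \sum_{k=1}^{4} \gamma_{ks}\,\text{If\_rank}_{k,i} + \delta_{1s}\,\text{AvgScore}_i + \delta_{2s}\,\text{Size}_i + \varepsilon_{is}$$
We use adjusted R², AIC, and BIC to evaluate the performance of the models.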
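Adjusted R² is defined as
$$\bar{R}^2 = 1 - \frac{RSS/(n-k-1)}{TSS/(n-1)},$$
where RSS is the residual sum of squares, TSS is the total sum of squares of the original data, n is the number of cases, and k is the number of explanatory variables. Unlike R², adjusted R² penalizes the addition of explanatory variables, so it reflects the fit of models with different numbers of variables more fairly.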
AIC (Akaike Information Criterion) was proposed by the Japanese statistician Hirotugu Akaike in 1974 [28]. It provides a criterion that weighs the goodness of fit of a model against its complexity and is defined as follows:
$$AIC = 2k - 2\ln L$$
BIC (Bayesian Information Criterion) is similar to AIC; it was proposed by Schwarz in 1978 [29] and is defined as follows:
$$BIC = k\ln n - 2\ln L$$
In these equations, k is the number of variables, L is the maximized value of the likelihood function, and n is the number of cases. For both criteria, smaller values indicate a better trade-off between fit and complexity.
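The three criteria can be computed from an OLS fit as below. This sketch assumes Gaussian errors, for which the maximized log-likelihood has a closed form in the residual sum of squares. It counts k as the number of estimated coefficients including the intercept, so `n - k` in the code equals n minus the number of variables minus 1. The function name `fit_criteria` is hypothetical.

```python
import numpy as np


def fit_criteria(X, y):
    """Adjusted R^2, AIC and BIC for an OLS fit; X must include an intercept column.

    Assumes Gaussian errors, so the maximised log-likelihood is
    -n/2 * (log(2*pi) + log(RSS/n) + 1). Here k counts the estimated coefficients
    (including the intercept). Packages differ on whether the error variance is
    also counted; that adds the same constant to every model's AIC (or BIC) and
    does not change which model is preferred.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    adj_r2 = 1.0 - (rss / (n - k)) / (tss / (n - 1))
    log_l = -0.5 * n * (np.log(2.0 * np.pi) + np.log(rss / n) + 1.0)
    aic = 2.0 * k - 2.0 * log_l
    bic = k * np.log(n) - 2.0 * log_l
    return adj_r2, aic, bic
```

Comparing the tuples returned for two nested models gives the kind of comparison reported in Table 9: a higher adjusted R² together with lower AIC and BIC favors the larger model.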

IV. RESULTS AND DISCUSSION
A. DESCRIPTIVE STATISTICS
First, we exclude outliers. Some students had abnormal HSEE scores of zero in some subjects, which would bias the analysis of student growth, so these 3 records were deleted. After deleting the outliers, 15,798 cases remain in our analysis, including 8,719 male students and 7,079 female students. The average NCEE total score is 482.75, and the average HSEE total score is 463.47. There are 197 high schools in our analysis, divided into 5 levels, in which Level 1 is regarded as the best and Level 5 as the poorest: 32 schools in Level 1, 38 in Level 2, 39 in Level 3, 41 in Level 4, and 12 in Level 5. The mean normalized student growth is negative, and most of the means are below −0.1, which implies that student performance decreased by more than 10% of the full mark.
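The exclusion rule above amounts to a row filter. A small sketch, assuming HSEE subject scores are stored as a students-by-subjects array; the layout, the helper name `keep_mask`, and the numbers are illustrative.

```python
import numpy as np


def keep_mask(hsee_subject_scores):
    """True for students whose HSEE scores contain no zero subject score.

    Mirrors the exclusion rule described above: a zero in any HSEE subject is
    treated as an abnormal record because it would distort student growth.
    """
    scores = np.asarray(hsee_subject_scores, dtype=float)
    return ~np.any(scores == 0, axis=1)


# Made-up scores: three students, three subjects; the second has a zero.
hsee = np.array([[95, 88, 110], [0, 70, 101], [100, 99, 98]])
mask = keep_mask(hsee)
cleaned = hsee[mask]
print(mask, cleaned.shape)
```

The same boolean mask should be applied to every per-student array (NCEE scores, abilities, school identifiers) so that the rows stay aligned.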
We then examine the Pearson product-moment correlations between student growth in the subjects and in the total score. The Pearson correlation is a common index of the covariation between two variables and ranges from −1 to 1. As shown in Table 5, the correlation coefficients between the total score and the Math, Chemistry, and English scores exceed 0.47, higher than the correlations among the other subjects. Student growth in the total score has a relatively strong correlation with student growth in the subjects, especially Math, Physics, and Chemistry.
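The correlations and the significance marks in Table 5 can be checked with the standard t statistic for a Pearson coefficient. The sketch below is illustrative; the paper does not state which software or test variant it used, and the function name `pearson_r_t` is hypothetical.

```python
import numpy as np


def pearson_r_t(x, y):
    """Pearson product-moment correlation and its t statistic for H0: rho = 0.

    t = r * sqrt((n - 2) / (1 - r^2)) follows a t distribution with n - 2 degrees
    of freedom under the null; with n near 15,800, even small r is significant.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r = float(np.corrcoef(x, y)[0, 1])
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    return r, t


# Toy data: r = 6 / sqrt(60), so r^2 = 0.6.
r, t = pearson_r_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

Because the statistic grows with the square root of n, significance at the 0.01 level says little about strength here; the size of r is the more informative quantity.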
There are 15,798 cases in total: 5,351 students in Level 1 high schools, 4,430 in Level 2, 2,756 in Level 3, 1,860 in Level 4, and 509 in Level 5. As shown in Table 6, overall, the higher the ranking, the larger the student growth in the total score. However, there is an exception: Level 5 schools have a higher average high school score than Level 4 schools. Moreover, the average middle school score of Level 5 schools is higher than that of Level 4 schools. This suggests that the higher the average middle school score, the higher the student growth in high school, which shows the importance of student ability.
From Table 7, we find that: 1) Male students perform better than female students in Math, Chemistry, and Physics, while female students are better at English and Chinese and perform better in the total score.
2) The gender differences in high school are similar to those in middle school. 3) The average student growth in all 5 subjects and in the total score is negative: students' scores decreased by about 14% of the full mark. The reason might be that the NCEE is more difficult. However, students with higher HSEE scores decreased less, which means that these students were better at maintaining their performance.
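The gender comparison in Table 7 is based on t-tests. The paper does not say whether equal variances were assumed, so this sketch uses Welch's unequal-variance form as one reasonable choice; the function name `welch_t` and the input numbers are made up for illustration.

```python
import numpy as np


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom.

    Welch's version does not assume equal variances in the two groups, which is
    a safe default when group sizes differ (here, 8,719 vs 7,079 students).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va = a.var(ddof=1) / a.size          # squared standard error of mean(a)
    vb = b.var(ddof=1) / b.size          # squared standard error of mean(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    return t, df


# Made-up growth values for two small groups.
t, df = welch_t([1, 2, 3, 4], [2, 3, 4, 5])
```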

B. RELEVANCE ANALYSIS AND VARIABLE SELECTION
We examine the correlations among school characteristics and student abilities in the HSEE. From Figure 2 and Table 12, we find that the average HSEE score of a high school is strongly correlated with the size of the school and with the Level 1 dummy. The reason might be that students who attend top high schools generally achieve higher scores. A similar pattern holds for student abilities in the HSEE. Among the student abilities in the HSEE, abilities in the same subject are usually more strongly correlated with each other than with abilities in other subjects. Across subjects, the correlations among English, Math, and Chemistry abilities are much higher than the correlation between Chinese and Math abilities. The Mathematical Thinking (SA12) and Analysis and Solution (SA13) abilities in Math are strongly correlated with the memory (SA7), reading (SA8), and writing (SA9) abilities in English. The Chemistry abilities are strongly correlated with the memory (SA7) and reading (SA8) abilities in English. Overall, the correlations between different abilities are generally small, which indicates that the abilities are largely distinct and cannot substitute for one another.
From Table 8, we find that the average HSEE score of the high school and the Level 1 dummy have high VIFs (variance inflation factors), which indicates high collinearity. After removing the average HSEE score of the high school, the VIFs of the other variables all decrease to an acceptable level. Therefore, we remove the average score of the school from the models.
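The VIF of variable j is 1/(1 - R_j^2), where R_j^2 comes from regressing variable j on all the other explanatory variables. The NumPy sketch below computes it for every column; dropping a variable (here, the average HSEE score of the school) and recomputing mirrors the step described above. The function name `vif` is hypothetical.

```python
import numpy as np


def vif(X):
    """VIF_j = 1 / (1 - R_j^2) for each column j of X.

    X holds the explanatory variables only (no intercept column); each column is
    regressed on the remaining columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out
```

A near-linear combination (for example, a school average that largely tracks school size and the Level 1 dummy) inflates the VIFs of all the variables involved, which is why removing one of them lowers the others too.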

C. EVALUATION OF STUDENT ABILITIES AND HIGH SCHOOL'S IMPACT ON STUDENT GROWTH
We analyze the impact of pre-high-school abilities and high schools on student growth in two aspects: 1) students' total score and 2) students' subject scores.

1) MODEL FITNESS
As shown in Table 9, after the ranking and the size of the school are added to Model 1, adjusted R² increases and the AIC and BIC of Model 2 are smaller than those of Model 1; similarly, Model 4 performs better than Model 3. Thus, including the school ranking improves model performance.
Finally, we evaluate the assumptions of the linear regression model. As Table 8 shows, after deleting schavg (the average HSEE score of the school), the VIFs of all the independent variables are smaller than 6, so the models pass the multicollinearity check. Then the White test is conducted to test for heteroskedasticity. The result (Prob > chi2 = 0.0000) indicates that the models are heteroskedastic, so FWLS (feasible weighted least squares) is used to estimate the parameters.

TABLE 10 .
The range of each attribute. ***: The variable is significant at the 0.001 level. **: The variable is significant at the 0.01 level. *: The variable is significant at the 0.05 level. Ifrankn (n = 1, 2, 3, 4) is a dummy variable indicating whether the ranking of the high school equals n.
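The heteroskedasticity check and the FWLS re-estimation described above can be sketched as follows. The White statistic is n·R² from an auxiliary regression of squared OLS residuals on the regressors, their squares, and their cross products. The paper does not specify its FWLS variance model, so `fwls` below assumes the common exponential form, regressing log squared residuals on the regressors; both function names are illustrative. With 19 abilities plus school variables the auxiliary regression has several hundred columns, which this dense sketch handles but not efficiently.

```python
import numpy as np
from itertools import combinations_with_replacement


def _ols(X, y):
    """OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def white_test(X, y):
    """White's LM statistic n * R^2 and its degrees of freedom.

    X holds the explanatory variables without an intercept. Compare the statistic
    with a chi-square distribution with `df` degrees of freedom. Redundant columns
    (a dummy squared equals itself) are handled by taking df from the matrix rank.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    _, e = _ols(np.column_stack([np.ones(n), X]), y)
    e2 = e ** 2
    terms = [X[:, j] for j in range(p)]
    terms += [X[:, i] * X[:, j] for i, j in combinations_with_replacement(range(p), 2)]
    A = np.column_stack([np.ones(n)] + terms)
    _, u = _ols(A, e2)
    r2 = 1.0 - (u @ u) / np.sum((e2 - e2.mean()) ** 2)
    df = np.linalg.matrix_rank(A) - 1
    return n * r2, df


def fwls(X, y):
    """Feasible WLS with an exponential variance model (one common FWLS variant).

    Step 1: OLS residuals e. Step 2: regress log(e^2) on the regressors and take
    exp(fitted values) as the estimated error variance. Step 3: OLS on the data
    scaled by 1/sqrt(variance). Returns [intercept, slopes...].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = np.column_stack([np.ones(X.shape[0]), X])
    _, e = _ols(Z, y)
    g, _ = _ols(Z, np.log(e ** 2))      # assumes no residual is exactly zero
    w = 1.0 / np.sqrt(np.exp(Z @ g))
    beta, _ = _ols(Z * w[:, None], y * w)
    return beta
```

Weighting only changes the efficiency of the estimates, not what they estimate, so FWLS coefficients should stay close to OLS while their standard errors become more reliable.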

2) MODEL ANALYSIS
Table 10 shows that the effects of most of the student abilities in the HSEE and of the high school ranking are significant.
Regarding the size of the high school, school size is significant only for student growth in English, and the effect is positive, which suggests that larger high schools may better facilitate the learning of English. Table 10 also shows the impact on student growth of high schools at Levels 1, 2, 3, and 4 compared with Level 5 high schools. First, for student growth in the total score, the impact of a high school is consistent with its ranking: the higher the ranking, the larger the positive impact (0.143 > 0.071 > 0.063 > 0.03). Second, for student growth in Physics, the order of Levels 2, 3, and 4 is reversed, i.e., Level 3 has a larger positive effect than Level 4, and Level 4 has a larger positive effect than Level 2. Third, for student growth in Chemistry, Level 2 has a larger impact than Level 1, and there is no significant difference between Level 4 and Level 5 high schools. Fourth, for student growth in Math, the effect is consistent with the ranking, but again there is no significant difference between Level 4 and Level 5 high schools. Fifth, for student growth in English, the effect of Level 1 schools is positive, while the effects of Level 2, Level 3, and Level 4 high schools are not significant. Finally, for student growth in Chinese, the effect of the high school is consistent with the ranking, i.e., the higher the ranking, the larger the impact.
Next, we analyze the impact of student abilities on student growth. First, among the Chinese abilities, SA2 (comprehension and application) and SA4 (appreciation and exploration) have positive effects on student growth in Chinese, while SA5 (writing) has a negative effect. Surprisingly, except for SA1 (recognition), the other four Chinese abilities have negative effects on student growth in English, which suggests that higher Chinese abilities may undermine the learning of English. SA1, SA2, and SA4 have positive effects on student growth in Math, which suggests that these Chinese abilities can help the learning of Math. Similarly, SA1, SA2, and SA4 also have positive effects on student growth in Chemistry and Physics; however, SA5 has a negative impact on student growth in Chemistry and Physics. Regarding the total score, SA1 and SA2 have positive effects while SA3 (analysis and synthesis) and SA5 have negative effects, and the effect of SA4 is not significant.
Second, among the English abilities, SA6 (listening) and SA7 (memory) have positive effects on student growth in all subjects. SA8 (reading) has a positive effect on student growth in Chinese, English, Chemistry, and Physics but a negative effect on student growth in Math. SA9 (writing) has a negative impact on student growth in Math and in the total score.
Third, all the Math abilities have a negative impact on student growth in Chinese. For student growth in the other subjects, i.e., English, Math, Chemistry, and Physics, SA10 (spatial imagination and data processing), SA12 (mathematical thinking), and SA13 (analysis and solution) have positive effects while SA11 (operation solution) has a negative effect. Because of these mixed effects, SA10 has no significant effect on student growth in the total score, SA11 has a negative effect, and SA12 and SA13 have positive effects.
Fourth, all three Chemistry abilities have negative impacts on student growth in Chemistry, while their impacts on student growth in the other subjects are either positive or not significant. SA14 (absorption and integration), SA15 (analysis and solution), and SA16 (experiment exploration) have positive impacts on student growth in the total score. Fifth, similar to the Chemistry abilities, all three Physics abilities have negative effects on student growth in Physics, while their impacts on student growth in the other subjects are either positive or not significant. SA17 (comprehension and reasoning) and SA19 (application and exploration) have positive effects, while SA18 (experiment) has a negative effect on student growth in the total score.
Finally, to summarize the effects of the student abilities on student growth, we rank the effects in Table 11. Student abilities in the HSEE have different impacts on student growth. For student growth in the total score, English memory (SA7) and Math analysis and solution (SA13) have the two highest positive impacts. For student growth in a subject, most abilities have a negative effect on growth in the same subject; the exceptions are English listening (SA6) and memory (SA7), which have a positive influence on student growth in English. In terms of the average order of the abilities, English memory (SA7), Math analysis and solution (SA13), Physics comprehension and reasoning (SA17), and English listening (SA6) show stronger positive impacts on overall student growth, which suggests that these abilities are critical for student growth.
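One plausible way to obtain an average order like AvgO in Table 11 is to rank the abilities by coefficient within each regression and average the ranks across regressions. The paper does not describe the exact procedure (for example, how ties or non-significant coefficients were treated), so this sketch simply ranks the raw estimates; the function name `average_order` and the coefficient values are made up.

```python
import numpy as np


def average_order(coefs):
    """Average rank of each ability across outcome models.

    coefs has shape (n_outcomes, n_abilities): one row per dependent variable
    (e.g. SGTS and the five SGSS models). Within each row, rank 1 goes to the
    largest (most positive) coefficient; the ranks are then averaged per column.
    """
    coefs = np.asarray(coefs, dtype=float)
    order = np.argsort(-coefs, axis=1)            # column indices, largest first
    ranks = np.empty_like(order)
    rows = np.arange(coefs.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, coefs.shape[1] + 1)
    return ranks.mean(axis=0)


# Made-up coefficients: 2 outcomes x 3 abilities.
avg = average_order([[0.30, 0.10, -0.20],
                     [0.05, 0.20, -0.10]])
```

A lower average order means an ability tends to sit near the top of the coefficient ranking across outcomes; it says nothing about the size of the gaps between ranks.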

V. CONCLUSION
The analysis of educational data has become an important issue for education authorities and academic institutions. With the application of data mining techniques, educational data mining helps students and parents as well as education authorities. As a typical form of educational data, exam data contain rich information about student knowledge and the educational situation of an area.
With a unique dataset, we measured student growth from middle school to high school in two aspects: the total score and subject scores. We studied gender differences and the impact of students' abilities in the HSEE, high school ranking, the size of the high school, and the average score of the high school on student growth. Our findings are as follows: 1) Students' scores generally decreased in high school, possibly because of the increasing difficulty of the courses, but students with higher HSEE scores decreased less than others. 2) Compared with Level 5 schools, higher-ranked high schools generally show higher student growth in the total score and in subject scores. One exception is that Level 5 schools have higher high school scores than Level 4 schools; the reason may be that Level 4 and Level 5 schools have similar middle school scores and similar student growth in the total score. Moreover, Level 3 schools perform better than Level 2 schools in student growth in Physics. 3) Student abilities in the HSEE have different impacts on student growth. For student growth in the total score, English memory (SA7) and Math analysis and solution (SA13) show the two highest positive impacts. From the perspective of the average order of the abilities, English memory (SA7), Math analysis and solution (SA13), Physics comprehension and reasoning (SA17), and English listening (SA6) show stronger positive impacts on overall student growth than the other abilities, which indicates that these abilities are critical for student growth.
The contribution of this paper is twofold. On the one hand, based on the knowledge of educational experts, students' abilities are calculated from the HSEE, which provides a solid basis for evaluating students' abilities. On the other hand, our work shows how the impacts of school characteristics and student abilities differ. With the help of our research, students and parents can choose a school by considering not only its ranking but also the influence of student abilities. Moreover, education authorities can evaluate students' performance and the educational quality of schools according to our findings.
In addition, there are some limitations, which leave room for improvement and opportunities for future research. First, our data only include 15,798 students from a single cohort in one city. Second, our model does not consider the nested structure of the data. Future research can investigate student growth in other cities, in different years, or even in a different culture. Moreover, more complex models, such as hierarchical linear models and two-stage least squares, can be used to test the robustness of our results.

APPENDIX
Table 12 shows the correlation between school characteristics and student ability in HSEE.
From Figure 1, we find that schools of different levels have different distributions of student growth. As shown in Figures 1(c) and 1(d), Level 1 and Level 2 schools have higher student growth in Math and English, which might lead to the higher student growth in the total score shown in Figure 1(a). In Figure 1(f), Level 1 schools also show better student growth in Chemistry. In Figure 1(e), Level 3 schools have higher student growth in Physics. However, these descriptive statistics do not control for other variables, so they cannot show how school characteristics influence student growth.

FIGURE 2 .
The heat map of the correlation between school characteristics and student abilities in the HSEE. SAn (n = 1, 2, ..., 19) is the Student Ability, schavg is the average score of the school, schsize is the size of the school, and Ifrankn (n = 1, 2, 3, 4) is the dummy variable for school ranking Level n. Schavg has a strong correlation with schsize and HSrank1. The highest values are shown in red and the lowest in green; the other values are shaded according to their closeness to the maximum or minimum.

TABLE 1 .
The content of original data.

TABLE 2 .
The structure of abilities examined in Math. Item No. gives the questions that test the corresponding ability, and Score is the full mark of the ability.

TABLE 3 .
The abilities examined in 5 subjects.

TABLE 5 .
The Pearson product-moment correlations among the subject scores and the total score. **: The variable is significant at the 0.01 level.

TABLE 6 .
The comparison of the total score and SGTS among schools.

TABLE 7 .
t-test for gender.

TABLE 8 .
The VIF of variables before and after removing the average score of the school. SA is the Student Ability, schavg is the average score of the school, schsize is the size of the school, and Ifrankn (n = 1, 2, 3, 4) is the ranking of the school. After removing the average score of the school, the VIFs of all other variables decrease.

TABLE 9 .
Dependent variables and model fitness.

TABLE 11 .
The order of the coefficients. AvgO is the average order of each ability.

TABLE 12 .
The correlation of school characteristics and student abilities in HSEE. **: The variable is significant at the 0.01 level. *: The variable is significant at the 0.05 level.