Machine Learning Framework for Multi-Level Classification of Company Revenue

The planning and execution of a business strategy are important aspects of the strategic human resource management of a company. In previous studies, machine learning algorithms were used to determine the main factors relating employees to company performance. In this study, we introduce a method based on machine-learning algorithms for the classification of company revenue. Both annual and integrated datasets were examined to evaluate the classification performance of the framework under both binary and multiclass conditions. The performance of the proposed method was validated using six evaluation metrics: accuracy, precision, recall, F1-score, receiver operating characteristic curve, and area under the curve. As the experimental results indicate, the XGBoost classifier displayed the best classification performance among the three algorithms (XGBoost classifier, stochastic gradient descent classifier, and logistic regression) used in this study. Moreover, we confirmed that the important features of the trained XGBoost model accord with variables emphasized in human resource management studies. These results demonstrate that the proposed framework has strengths in terms of both classification performance and practical implementation. This study provides novel insights into the relationship between employees and the revenue levels of their employer.


I. INTRODUCTION
Estimating the future performance of a company has been a subject of significant interest in various fields. Researchers in the field of human resource management (HRM) have focused on the relationship between organizations and their performance. To reflect various elements of a business practice, strategic human resource management (SHRM) studies have been conducted based on traditional HRM topics. In contrast to previous research focusing mainly on specific subjects, studies on SHRM have emphasized the planning deployment of human resources and activities to achieve the goals of an organization [1].
Many SHRM studies have provided perspectives for both goal-oriented and mutually balanced plans through the alignment of HRM elements [2], [3]. Such perspectives are used to emphasize the role of HRM in supporting business strategies [4]. In this context, researchers have been interested in the relationships between HRM policy and company-level performance [5]-[7]. A number of studies conducted in a wide range of industries, such as banking [8], steel manufacturing [9], [10], and automotive assembly [11], have suggested that an integrative approach, addressing both goals and the means to achieve them, is essential in HRM practices [12].
In addition, it is important to consider the HR policy from an employee's perspective. Employees in an organization interact with HR systems more than HR managers do. In addition, employees are affected by the HRM policy itself [13]. Furthermore, because of individual differences, policies can be interpreted in different ways [14]. For this reason, survey data such as organizational culture, satisfaction, stress level, communication, and trust collected from the members of an organization are particularly useful because they can represent various types of feedback regarding the organization in a more realistic way. However, the association between revenue at the company level and employee characteristics has yet to be sufficiently studied.

VOLUME 9, 2021 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

To examine the various components of HRM, a dataset with a large amount of information is essential for research purposes. However, in several HRM studies, researchers have used relatively small datasets because it is extremely difficult to collect and handle data gathered from many different subjects. Notably, in the field of HRM, researchers are interested in the application of big data. The analysis of big data facilitates better decision-making processes in various HRM practices, including competitiveness, innovation, and efficiency in an organization [15], [16]. In addition, recent studies have shown that the application of various analysis methods has led to the identification of latent patterns or relationships [17], [18].
In previous studies, many researchers used different methods to investigate embedded patterns in various datasets. They used statistical modeling to analyze the relationships in structured data, including employee survey data. Owing to an increase in the amount of data and computing resources, diverse machine learning and deep learning methods have been applied to various domains. Machine learning algorithms have been used to estimate costs or forecast dangerous events in the economic domain [19]-[22]. In addition, previous studies have shown that machine learning models with time-series data can be applied to the stock market [23], [24].
In the field of human resource management, previous studies have demonstrated applicable models based on machine learning. These studies have suggested a model for the prediction of employee turnover using supervised machine-learning algorithms [25]. In addition, an online recruitment system using ensemble machine-learning models was introduced [26]. Several previous studies have focused on performance management related to employee performance evaluation [27]. In these studies, the researchers focused on the relationship between the characteristics of the collaboration process and the teamwork performance [28].
Based on previous studies applying machine learning models, we hypothesize that machine learning algorithms have the potential to classify company revenue levels from employee characteristics. To test this hypothesis, we extracted 312 employee-related variables as features from a large-scale employee survey dataset, based on the survey categories (from among the 1712 variables in the original dataset). The selected features were applied to three classifiers, i.e., logistic regression, stochastic gradient descent (SGD), and XGBoost classifiers. Finally, the performance of the classifiers was evaluated based on six evaluation metrics: accuracy, precision, recall, F1-score, receiver operating characteristic (ROC) curve, and area under the curve (AUC).
The objective of this research is to develop a novel methodology based on machine learning algorithms for the classification of company revenue in reference to employee status. The major contributions of this paper are as follows: (1) We propose a machine-learning-based framework for the classification of company revenue based on the characteristics or status of the employees.
In addition, we evaluate the performance of the model under various conditions, including both binary and multiclass classification, on a large dataset. Moreover, we compare the classification performance of popular machine learning classification algorithms: the XGBoost classifier, SGD classifier, and logistic regression. (2) Advancing beyond the simple training of machine learning models, our framework provides practical implications in terms of HRM. Because the methodology used in our research can be applied generally to companies in diverse situations, the framework meets the needs of company-level performance managers. In addition, we suggest several insights regarding HR policy design from the feature importance of the trained XGBoost model, corresponding with previous HRM studies.

II. METHODS

A. OVERVIEW
This study consists of four steps. First, we collected and combined three sub-datasets from the Human Capital Corporate Panel (HCCP) dataset and selected employee-related features from them. Second, complete data for all features were selected from the datasets, a log transform was applied to the financial information data, and two types of datasets were generated to validate the framework. Third, three types of machine learning models were constructed from the datasets. Finally, the models were evaluated using the performance indices.
The detailed steps are shown in Fig. 1.

B. DATA SOURCES
In this study, we utilized the HCCP dataset released by the Korea Research Institute for Vocational Education and Training (KRIVET) [29]. The HCCP dataset is a longitudinal panel dataset collected to investigate the status of human resources of different companies in Korea since 2005. This dataset is composed of three types of sub-datasets. First, a dataset of survey results was collected from company employees. Second, a dataset was collected from the management team of each company. Third, financial information data of the companies participating in the survey were gathered. Except for the financial information dataset, all data are from a 7-year period. Each dataset can be combined with the company ID and worker ID columns.
In total, 74 732 employees from 3113 companies participated in the survey. The detailed characteristics of the dataset are shown in Table 1.
In addition, the data investigated in the HCCP dataset were organized into 6906 features with 25 categories. There are 24 employee-related categories and a single financial information category. In the dataset, the features include variables from the management team (Nos. 1-6) and employees (Nos. 7-24). Based on the features of the management team, we can check company-related information (e.g., the scale of the company) rather than employee-related information. More employee-focused information (e.g., demographics, satisfaction, and stress levels) is contained in the employee features. The detailed categories of the features in the dataset are shown in Table 2.

C. DATA PREPROCESSING
1) MERGE SUB-DATASET AND SELECT CATEGORIES OF FEATURES
To organize the data based on a single subject, we combined the three sub-datasets. (a) In the company dataset, we selected features of the company to reflect the effect of the company scale. The row and feature dimensions of the original company dataset, covering the first through the seventh years, were (4275, 5043). After extraction of the company-scale features, only a dataset with dimensions of (4275, 2) remained (the company ID feature and the company scale feature). (b) The financial information dataset included 14 130 rows and 150 features. We selected two features from the original financial dataset (company ID and revenue). We set the revenue feature as the prediction target of the machine learning models. (c) Based on the employee dataset, we combined the features selected from the other two datasets (the company and financial datasets). The company-scale and revenue features are measured at the company level, whereas the worker dataset is at the subject level. In this study, we therefore combined the datasets based on the company ID to which each subject belongs and analyzed the dataset at the subject level. As a result, the employee dataset merged with the company-scale and revenue features had row and column dimensions of (74 774, 1715), as compared to the original employee dataset of (74 774, 1713).
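The merging step above can be sketched with pandas. This is a minimal illustration with toy data; the column names ("company_id", "worker_id", "scale", "revenue", "satisfaction") are placeholders, not the actual HCCP variable names.

```python
import pandas as pd

# Hypothetical miniature versions of the three sub-datasets.
employees = pd.DataFrame({
    "company_id": [1, 1, 2, 3],
    "worker_id": [10, 11, 20, 30],
    "satisfaction": [4, 3, 5, 2],
})
companies = pd.DataFrame({"company_id": [1, 2, 3], "scale": [2, 1, 3]})
finance = pd.DataFrame({"company_id": [1, 2, 3], "revenue": [500.0, 120.0, 900.0]})

# Company-level features are broadcast to every employee of that company,
# so the merged table stays at the subject (employee) level.
merged = (employees
          .merge(companies, on="company_id", how="left")
          .merge(finance, on="company_id", how="left"))
```

A left merge on the company ID preserves one row per employee while attaching the company-scale and revenue columns.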
In the HCCP dataset, there are a total of 21 categories of features related to employees and companies. To select features directly related to employees, we selected eight categories (demographics, monetary compensation, employment type, stress, organizational commitment, motivation, satisfaction, organizational culture, communication and trust, and talent management) from among the 21 categories. After selecting these employee-related categories, 312 features remained, and the dimensions of the dataset were (74 774, 312).

2) SELECT COMPLETE DATA FOR ALL FEATURES
In the HCCP dataset, non-response and unknown data were coded as −9 and −8, respectively. We confirmed that non-response or missing data were included in each variable. To proceed with a precise analysis, all non-responses and missing data included in the dataset were removed. After this step, the complete dataset had dimensions of (38 763, 312) without missing or non-response data.
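The removal of coded missing values can be sketched as follows, again with toy data; the variable names are illustrative, but the −9/−8 sentinel codes match those described above.

```python
import pandas as pd

# Toy frame where -9 marks a non-response and -8 an unknown value, as in HCCP.
df = pd.DataFrame({
    "stress": [3, -9, 4, 2],
    "motivation": [5, 4, -8, 1],
})

# Treat the sentinel codes as missing, then keep only complete rows.
clean = df.replace({-9: pd.NA, -8: pd.NA}).dropna().astype(int)
```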

3) DISTRIBUTION CONFIRMATION OF SELECTED FEATURES
After removing missing and non-response data, we checked the feature distributions. When training machine learning models, the distribution of the datasets or features is critical. Features with skewed distributions can disturb model training and degrade model performance. In this step, we confirmed the distributions of the selected features and applied a log transformation to features with skewed distributions.
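A log transform of a skewed, strictly positive feature such as revenue can be sketched as below. The values are arbitrary; `log1p` is one common choice, though the paper does not state which log variant it uses.

```python
import numpy as np

# Skewed, strictly positive revenue values (arbitrary units).
revenue = np.array([12.0, 35.0, 80.0, 150.0, 4200.0, 90000.0])

# log1p compresses the long right tail; revenue is positive, so this is safe
# and preserves the ordering of the values.
log_revenue = np.log1p(revenue)
```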

4) GENERATE ANNUAL AND INTEGRATED DATASET
Two types of datasets were constructed to develop and validate the proposed framework. The first type comprises seven annual datasets, from the first to the seventh year. From each annual dataset, the features available that year were drawn from the 312 features selected in the complete dataset. The dimensions of the first-year dataset were (8571, 33), and they were (5897, 40) for the second year, (4837, 41) for the third year, (4477, 49) for the fourth year, (5012, 49) for the fifth year, (4908, 50) for the sixth year, and (5061, 50) for the seventh year. For each of the seven datasets, training and test sets were constructed by splitting at a ratio of 9:1.
The second type is an integrated dataset that combines features belonging to the same category. Except for the revenue feature, the remaining features are categorical, and most are on a 5-point scale. To integrate features on the same scale, binary features were excluded, and only 5-point scale features were averaged. All seven annual datasets were used to construct the integrated dataset. Because there are differences between the features collected each year, some integrated features are missing when constructing a single integrated dataset. To solve this problem, we imputed missing feature values using the average of the existing values in the same feature. Finally, the single integrated dataset consisted of 21 integrated features, giving dimensions of (38 763, 21). In addition, the integrated dataset was split into training and test datasets at a ratio of 9:1.
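Category-wise integration and mean imputation can be sketched as follows. The item names (`sat_q1`, …) are hypothetical stand-ins for 5-point items within one survey category.

```python
import pandas as pd

# Three hypothetical 5-point items belonging to one survey category
# ("satisfaction"); one item is missing for the second respondent.
items = pd.DataFrame({
    "sat_q1": [5.0, 4.0, 3.0],
    "sat_q2": [4.0, None, 3.0],
    "sat_q3": [3.0, 4.0, 3.0],
})

# Average the 5-point items within the category into one integrated feature,
# ignoring item-level gaps ...
integrated = items.mean(axis=1, skipna=True)

# ... and impute any rows still missing with the feature's own mean,
# mirroring the imputation strategy described above.
integrated = integrated.fillna(integrated.mean())
```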
To validate the classification performance of the models, we constructed additional datasets with diverse conditions using different class levels of the same revenue feature. The classification performance was compared under three conditions: binary, three, and four classes.
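Deriving the binary, three-class, and four-class labels from the same revenue feature can be sketched with quantile binning. The paper does not state its exact cut points, so equal-frequency bins are an assumption here.

```python
import pandas as pd

# Log-transformed revenue values for eight hypothetical companies.
revenue = pd.Series([2.1, 2.5, 3.0, 3.3, 3.9, 4.4, 5.0, 5.8])

# Equal-frequency (quantile) binning yields balanced class labels at each
# of the three class levels used in the experiments.
binary = pd.qcut(revenue, q=2, labels=[0, 1])
three_cls = pd.qcut(revenue, q=3, labels=[0, 1, 2])
four_cls = pd.qcut(revenue, q=4, labels=[0, 1, 2, 3])
```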

5) EXPERIMENT FOR CLASSIFICATION PERFORMANCE
In this step, a total of 24 datasets were used to evaluate the performance of the proposed models (seven annual datasets × three conditions for classification labels + single integrated dataset × three conditions = 24 datasets in total). Three machine learning algorithms were used to compare the performance under diverse conditions. After the experiment, we checked the feature importance of the classification algorithm that achieved the best performance. To validate the framework in terms of HRM, we examined the selected features for both the annual and integrated datasets.

D. MACHINE LEARNING MODELS
1) XGBoost CLASSIFIER
The XGBoost algorithm is a gradient boosting method based on a supervised learning algorithm [30]. The classification and regression tree (CART) method is used as the basis of the algorithm and is an ensemble model composed of decision tree models, following the principle of additive learning.
The objective function of the XGBoost algorithm using additive learning is as follows:

Obj = Σ_i L(y_i, ŷ_i) + Σ_k Ω(f_k)    (1)

In (1), Obj indicates the objective function of the XGBoost algorithm. In this formula, the function L is the loss function used for training the algorithm. The function Ω is a regularization term and represents the complexity of the tree. The kth decision tree is represented by a function f_k, and the functions that represent the decision trees are elements of the function space F. The prediction for x_i is ŷ_i = Σ_k f_k(x_i) [30]. As a result, this algorithm considers the classification results from several decision trees rather than a single result.

2) STOCHASTIC GRADIENT DESCENT CLASSIFIER
The SGD method is an optimization algorithm used to determine the parameter values that minimize the objective function [31]. This method is extremely similar to the traditional gradient descent algorithm. However, it calculates the derivative of the error for only a single randomly chosen instance in the data rather than for all of the data. As a result, the algorithm is much faster than gradient descent [32]. SGD methods can be applied to various machine-learning models. A machine learning classification algorithm trained using the SGD method is called an SGD classifier.

3) LOGISTIC REGRESSION
Logistic regression is a supervised learning algorithm that predicts the probability of a data point belonging to a certain class as a value between 0 and 1 and classifies it into a category according to that probability [33]. This algorithm calculates the log-odds from the features and applies the sigmoid function to represent the probability that the data point belongs to the corresponding class. A logistic regression algorithm that performs multiclass classification is called multinomial logistic regression or softmax regression [34]. It operates in the same manner as traditional logistic regression, which performs binary classification, but differs in that the number of classes is three or more.
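A multinomial (softmax) logistic regression sketch on a three-class toy problem, standing in for the three-level revenue labels; in recent scikit-learn versions, `LogisticRegression` fits a multinomial model by default when the labels have more than two classes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Three-class synthetic problem standing in for three revenue levels.
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

# Each row of proba is a softmax distribution over the three classes.
proba = clf.predict_proba(X)
```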

E. EVALUATION METRIC
In our framework, we used six performance indices to evaluate the classification performance of the machine learning algorithms. To confirm the classification performance, we calculated the confusion matrix using the true positives (TPs), true negatives (TNs), false positives (FPs), and false negatives (FNs) [35]. The TP and TN indicate that the model correctly classified the company revenue level. In contrast, the FP and FN indicate that the model incorrectly classified the company revenue level. Detailed information on the confusion matrix is shown in Figure 2.
We can then obtain four indicators: precision, recall, F1-score, and accuracy:

precision = TP / (TP + FP)
recall = TP / (TP + FN)
F1-score = 2 × precision × recall / (precision + recall)
accuracy = (TP + TN) / (TP + TN + FP + FN)
In addition, to evaluate the performance, we checked the ROC curve and AUC. The ROC curve indicates the trade-off between the true positive rate (TPR) and the false positive rate (FPR). The formulas for the TPR and FPR are as follows:

TPR = TP / (TP + FN)
FPR = FP / (FP + TN)

A model with good performance shows an ROC curve closer to the upper-left of the graph, or an AUC sufficiently close to 1. In addition, to investigate the performance of the algorithms, we averaged the ROC curve and the AUC of each class in the multiclass classification.
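The evaluation metrics above can be computed with scikit-learn as sketched below; the labels and scores are toy values for a binary revenue-level task.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Toy true labels, hard predictions, and predicted P(class 1).
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
y_score = [0.2, 0.6, 0.9, 0.8, 0.4, 0.1, 0.7, 0.3]

acc = accuracy_score(y_true, y_pred)    # (TP+TN) / all
prec = precision_score(y_true, y_pred)  # TP / (TP+FP)
rec = recall_score(y_true, y_pred)      # TP / (TP+FN)
f1 = f1_score(y_true, y_pred)           # harmonic mean of prec and rec
auc = roc_auc_score(y_true, y_score)    # area under the ROC curve
```

For the multiclass conditions, the same functions accept an `average` argument (e.g., `average="macro"`), matching the averaged indices reported in the Results.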

F. TOOLS
All code for data preprocessing, machine learning algorithms, and performance evaluation was written in Python.

III. RESULTS
After preprocessing, the seven annual datasets together comprise 38 763 responses, as does the integrated dataset. The detailed numbers of subjects in each dataset are shown in Table 3. A log transformation was applied to the revenue feature to convert its skewed distribution into an approximately Gaussian one. Figure 3 shows the distribution of the revenue feature.
To evaluate the performance of the proposed framework, we applied preprocessed datasets with various class conditions to machine learning algorithms. We conducted a random search to find the optimal hyperparameters for comparing classifiers. The hyperparameters for the training algorithms are listed in Table 4.
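The random hyperparameter search can be sketched with scikit-learn's `RandomizedSearchCV`. The search space below is a small illustration; the paper's actual search spaces are listed in its Table 4 and are not reproduced here.

```python
from scipy.stats import uniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Sample 5 candidate values of the regularization strength C at random
# and pick the one with the best cross-validated score.
search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": uniform(0.01, 10.0)},  # C in [0.01, 10.01]
    n_iter=5, cv=3, random_state=0,
)
search.fit(X, y)
best_C = search.best_params_["C"]
```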

A. BINARY CLASSIFICATION
To evaluate the performance of the model in the binary class condition, we applied the annual and integrated datasets to the XGBoost classifier, SGD classifier, and logistic regression model. We also confirmed the classification performance using several indices. The experimental results are listed in Table 5. The XGBoost classifier showed the best performance among the classification algorithms for the annual and integrated datasets. The ROC curves from the integrated dataset are shown in Figure 4.
In the binary class condition, the maximum values of the indices were found for the XGBoost classifier for both the annual and integrated datasets. All index values of the trained XGBoost classifier (average precision of 83.2%, average recall of 83.1%, average F1-score of 82.9%, average accuracy of 83.2%, and average AUC of 92.7%) were higher than those of the SGD classifier and logistic regression (Table 5).

B. MULTICLASS CLASSIFICATION
In addition to the binary class condition, we conducted a multiclass classification using the XGBoost classifier, SGD classifier, and logistic regression model. In addition, we checked the classification performance of each model under various conditions. The experimental results are listed in Tables 6 and 7. Under the multiclass conditions, the XGBoost classifier showed the best classification performance among the classification algorithms. Figures 5 and 6 depict the ROC curves for the classification algorithms from the integrated dataset with three and four classes, respectively.
Similarly, under the three-class condition, the XGBoost classifier (average precision of 72.7%, average recall of 71.6%, average F1-score of 71.9%, average accuracy of 71.7%, and average AUC of 89.0%) showed average values higher than both the SGD classifier and logistic regression (Table 6). Under the four-class condition, the maximum indices were again found for the XGBoost classifier (average precision of 63.2%, average recall of 63.4%, average F1-score of 62.6%, average accuracy of 63.4%, and average AUC of 88.0%). XGBoost showed the best classification performance compared to the SGD classifier and logistic regression under all experimental conditions (Table 7).

C. FEATURE IMPORTANCE OF ALGORITHM
After the experiment on the classification performance, the top-10 most important features from the trained XGBoost classifier were compared for both the annual dataset and the integrated dataset with various class conditions. The scale, monetary compensation, and industry were selected as the top-three most important features from the annual and integrated datasets. Other features selected by feature importance were birth year of employee, contract condition, culture, education, commitment, communication & trust, and stress. Variations in the top-10 most important features were found depending on the experimental conditions. Detailed lists of the important features are provided in Tables 8 and 9.

IV. DISCUSSION
In this study, we proposed a framework for classifying company revenue using machine learning algorithms. For this purpose, we designed an experimental paradigm including various class conditions and compared the important features of the trained models. To validate the robustness of our framework, we applied three different classification algorithms with multiple datasets and checked the feature importance to confirm the adaptability in the HRM field.
The findings regarding the classification performance and top-10 important features of the trained models are used to evaluate the experimental paradigm, as shown in Tables 5-9.

A. FRAMEWORK PERFORMANCE
To classify the revenue, the experiment was conducted under three conditions. Under the first condition, the binary levels of the revenue were identified using classification algorithms. In other words, a company can be split into two groups based on its revenue. We conducted a multiclass classification using three and four classes under the second and last conditions for comparison with the first condition. All three conditions were cross-validated using both annual and integrated datasets.
The evaluation indices used in this study were the accuracy, precision, recall, F1-score, ROC curve, and AUC. In the comparison of the classification performance, we considered the recall (i.e., the proportion of actual positives correctly identified) and precision (i.e., the proportion of positive predictions that are correct). To compare the classification performance, the values of the ROC curve and AUC indices were averaged over every class. As a result, micro- and macro-averaged indices were used for comparison.
We selected three classification algorithms in our research, i.e., the XGBoost classifier, logistic regression, and the SGD classifier. Under every condition, the XGBoost classifier showed the best classification performance among the three algorithms. In addition, we found similar performance results in related studies predicting employee turnover based on the XGBoost model (AUC of 87.0%) and predicting employee performance based on machine learning algorithms (J48 algorithm: precision of 72.7%, recall of 70.8%, F1-score of 70.8%, and AUC of 85.9%) [36], [37].

B. EMPHASIZED FEATURES
In traditional HRM studies, researchers have investigated a wide range of variables in an organizational environment. In this study, we examined both the company-and employee-level variables and confirmed the top-10 most important features. The result suggests several insights from an academic and practical perspective.
First, monetary reward, scale of the company, industrial category of the company, and demographics (year of birth) were the most important features for the classification of revenue. All results showed the same trends for both the annual and integrated datasets. In relation to these results, monetary compensation has been considered a primary incentive for employees and a determinant of a company's productivity. Furthermore, empirical evidence in previous research has suggested that companies can utilize monetary compensation as a management tool to enhance performance in terms of short- and long-term revenue growth [38]-[40]. In addition, the scale of the company and industrial category have been regarded as influential components of performance. Researchers interested in high-performance HR practices and various components of an organization have focused on the size of the organization as an element affecting financial performance owing to market power and economies of scale [41].
Second, four additional features (organizational culture, satisfaction, stress level, and communication and trust) were selected as features less important than the first four. Organizational culture among employees has been a popular topic in studies on how the culture of an organization influences its outcomes [42]-[44]. In addition, the satisfaction and stress levels of employees are considered to have a strong association with the performance of an organization [45]-[48]. Communication and trust between employees produce several benefits for both employees and the organization [49]-[51]. Finally, contract conditions and industrial relations (e.g., labor unions) also appeared among the important features. Industrial relations cover various research areas, such as the industrial relations system, the type of workplace system, and the development of a business strategy within the company [52], [53].
In summary, we confirmed that the features selected from our framework correspond with the points emphasized in previous studies on HRM. This reveals the close relation of employee-and company-level features to the organizational performance.

C. PROPOSED FRAMEWORK FOR HRM PRACTICE
The classification model also suggests some practical implications for HRM practitioners. As strategic partners, HR managers are often asked to facilitate overall company-level performance. The purpose of this study is to provide a useful tool for such managers that will help them make better decisions in a complex business environment. Our framework is suitable for this purpose because the survey items used (features) can generally be applied to companies under various situations. HR managers can benchmark our research methodology and plan the measurements of their own organizational status. In addition, the XGBoost classifier identified the company revenue with an average accuracy of 83.2% under the binary class condition. Considering the complexity of organizational phenomena, a model with this level of performance is a useful guide for estimating the performance of a company in advance.
In terms of HR policy making, the feature importance showed some key elements of consideration. First, monetary compensation was robustly found among the top-10 most important features. Several types of cultural associations between employees and an organization have been studied [54], and the proportion of salary in relation to the total compensation can be determined by the associations. Notably, studies have shown that employees in east Asian countries have formed relational models in comparison to transactional models found in Western countries [55]. However, the feature importance of monetary reward in our research suggests that such a cultural difference may have also transformed during the last 20 years in Korea.
Second, variables from a company-level strategy (e.g., size and industry) are found among the top-10 most important features. Compared to the separative approach during the early stage of HRM research, recent studies have emphasized the integration of HR and corporate strategy. HR practitioners also need to shift their focus from administrative supporters to strategic partners in the development of a business strategy. Finally, organizational culture, satisfaction, stress level, communication, and trust repeatedly appeared among the top-10 most important features. A comprehensive HRM system can be achieved by focusing on specific features rather than relying on intuition.

V. CONCLUSION
The estimation and classification of revenue is critical for planning the business strategy in a company. In this study, we proposed a machine-learning framework for the classification of company revenue. We also designed an experimental paradigm to investigate the performance of our model under various conditions. The proposed framework identified revenue levels with a maximum AUC of 95.3% under the binary-class conditions, 92.7% for the three-class conditions, and 90.3% for the four-class conditions. The results suggest that our framework based on a dataset of employee surveys has the potential to reliably classify company revenue. In addition, more diverse variables from the HRM field (e.g., personality, leadership, and decision making), as well as a different approach with deep learning methods, need to be considered in a future study. We expect our framework to contribute to the practice of HRM.