Towards Improving Skin Cancer Diagnosis by Integrating Microarray and RNA-Seq Datasets

Many clinical studies have revealed the high biological similarities existing among different skin pathological states. These similarities create difficulties in the efficient diagnosis of skin cancer, and encourage to study and design new intelligent clinical decision support systems. In this sense, gene expression analysis can help find differentially expressed genes (DEGs) simultaneously discerning multiple skin pathological states in a single test. The integration of multiple heterogeneous transcriptomic datasets requires different pipeline stages to be properly designed: from suitable batch merging and efficient biomarker selection to automated classification assessment. This article presents a novel approach addressing all these technical issues, with the intention of providing new sights about skin cancer diagnosis. Although new future efforts will have to be made in the search for better biomarkers recognizing specific skin pathological states, our study found a panel of 8 highly relevant multiclass DEGs for discerning up to 10 skin pathological states: 2 healthy skin conditions a priori, 2 cataloged precancerous skin diseases and 6 cancerous skin states. Their power of diagnosis over new samples was widely tested by previously well-trained classification models. Robust performance metrics such as overall and mean multiclass F1-score outperformed recognition rates of 94% and 80%, respectively. Clinicians should give special attention to highlighted multiclass DEGs that have high gene expression changes present among them, and understand their biological relationship to different skin pathological states.


I. INTRODUCTION
S KIN cancer is a worrying complex disease taking on a wide range of skin pathological states (SPSs). The complex heterogeneity of its occurrence is determined by the abnormal and out of control proliferation of specific cells (squamous, basal, Merkel, melanocyte, keratinocyte, etc.) that lead to the development of multiple skin cancerous pathologies. Among them, non-melanoma skin cancer (NMSC) related pathologies are the most frequent in order of incidences, pathologies in this group are led by basal cell carcinoma (BCC), squamous cell carcinoma (SCC) and Merkel cell carcinoma (MCC) [1]. With regard to melanoma skin cancer (MSC), the main pathologies are primary melanoma (PRIMEL) and metastatic melanoma (METMEL), where METMEL has a higher mortality rate [2]. Recent epidemiological studies show a concerning global trend, the incidence and occurrence of both MSC and NMSC cases have already become the most common types of cancer in white populations [3]. This is supported by the statistical analysis of MSC rates on cohorts from United States whites, United Kingdom, Norway and Sweden which increased up to 3% annually during the last 3 decades [4]. With respect to NMSC cases, its incidence is around 20 times higher than MSC cases [5] despite being widely understudied. As a result of the fateful combination of both factors (incidence and occurrence), an extensive global alarm is being increased. Therefore, the possibility of suffering from any skin cancer type could be led by two main drivers. The first driver is the tumor evolution of other skin diseases previously considered precancerous states such as psoriasis (PS) [6], [7] or actinic keratosis (AK) [8]; the second driver is the tumor degeneration and mutation from healthy states such as normal skin (NSK) and nevus (NEV).
The narrow biological relationship among several SPSs may complicate the successful diagnosis of skin cancer. Certain researches have pointed out the difficulty in discerning among specific SPSs from the clinical, histological and molecular points of view: AK vs SCC [9], AK and SCC vs PRIMEL [10], SCC vs BCC and MSC [11], primary MCC (PMCC) vs metastatic MCC (MMCC) [12], etc. Different editions of the American Joint Committee on Cancer (AJCC) have gradually introduced the most outstanding clinical parameters for their diagnosis This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ (tumor mitotic rate, TNM classification, Breslow thickness, Clark levels, etc.). Consequently, the AJCC Cancer Staging Manual has been considered the gold standard by clinicians when making their diagnoses [13]. However, the way to diagnose this cancerous disease continues to be limited and each AJCC edition implies controversies and corrections on which are the best criteria to efficiently diagnose each SPS. Conversely, other studies insist on the possibility of differentiating them from the identification of gene expression patterns such as AK vs SCC [14]. Although previous studies show DEGs can discern among different pathologies, the biological complexity of the skin cancer may put its validity into question.
The opportunity to efficiently improve the discernment among multiple SPSs related to cancer from biological data involves taking into account a set of requirements. Firstly, different technological alternatives which allow gene expression to be quantified have to be inspected. Although microarray technology has been vastly used, RNA-seq technology is ultimately replacing it thanks to its various advantages described in the literature [15]. Nonetheless, the absence of open access datasets from experiments using the newest technologies suggests microarray analysis could still be considered. In addition to its low cost, it may not have been properly exploited yet by considering the combination of diverse skin cancer datasets containing samples of different SPSs. This fact gives the chance to reinforce the statistical robustness of the study as well as to obtain DEGs from a wider range of SPSs. This observation introduces the following challenge: how to adequately integrating data from both technologies in order to increase as much as possible the number of samples for each identified SPS of the study. Previous studies have underlined the good agreement among them in terms of similarity, complementarity and compatibility [16], [17]. In view of these advantages together with the proven consistency of applying multi-platform integration among both microarray platforms and technologies at gene expression level [18]- [22], this integrative approach is encouraged to continue carrying it out. However, the researchers have traditionally kept in mind the mandatory correction of eventual batch effects with the purpose of achieving an effective integration of multiple experiments over different microarray platforms [23], mainly coming from two manufacturers: Affymetrix [24] and Illumina [25]. By additionally taking into account experiments conducted on RNA-seq technology, the hypothetical influence of this factor may be modified in an unpredictable way. The treatment and the attempt of correction should never be disregarded; however, there is no certainty that a complete elimination of these effects will take place [26]. Among the multiple batch effect correction algorithms, ComBat [27] has proven to show the highest effectiveness when integrating microarrays [28] and, recently, has been validated in the integration of RNA-seq datasets from different sources: GTEx and TCGA projects [29]. In the case of favorably dealing with all these limitations, a new experimental challenge takes place: how to discern multiple SPSs by using changes in gene expression. Although hierarchical clustering highly helps in graphically showing such changes [30], methodological approaches based on multiclass classification are postulated as an innovative alternative when assessing the validity of DEGs for simultaneously diagnosing multiple SPSs [31]. Finally, the use of feature selection algorithms must be explored with the objective of selecting only informative DEGs, that in many cases can dramatically reduce the search space.
Under the fulfillment of the previous requirements, the integration of microarray and RNA-seq technologies at gene expression level [32] opens new possibilities for skin cancer analysis. In particular, this advance could improve the understanding of the hypothetical biological relationships and differences among SPSs that may be discerned in a simple simultaneous analysis. Clinicians could directly benefit from its validity in multiple ways. Firstly, the suspicions about the patient tumor evolution from healthy skin states to cancerous states, even through precancerous skin diseases, could be eventually assessed by presenting certain genetic susceptibility to change [33]. A patientoriented medical service could be derived from the above by knowing the genetic signs. Consequently, unnecessary medications or medical treatments such as radiation therapies, excision surgeries or medications supply could be prevented [34]. Certainly, clinical diagnosis could be supported by an intelligent diagnosis tool that offers another complementary point of view [35], [36].
Although our novel methodological approach is thought to be applied on any multiclass problem, this work shows its validity by addressing the improvement of skin cancer diagnosis, thus taking into account all the requirements previously discussed and offering the benefits motivated above. The integration of different skin cancer datasets from microarray and RNA-seq technologies based on gene expression analysis has not been widely explored by the scientific community. First of all, an exhaustive sample search of multiple SPSs was carried out from public data repositories. Next, 22 microarray and 5 RNA-seq series containing 1090 samples in total were collected. However, after applying a strict quality control phase, only 968 samples passed and were subjected to the preprocessing phase: 666 samples from Affymetrix and Illumina microarray platforms and 302 samples from Illumina RNA-seq platforms. Subsequently, the sample integration considered only those genes sharing a common annotation for all the series selected for this study. After merging multiple batches and applying batch effect correction on them, the challenge was to efficiently find valid genes simultaneously discerning up to 10 SPSs: from a priori healthy states (NSK and NEV) to cutaneous carcinomas (BCC, ISCC, PMCC and MMCC) or melanomas (PRIMEL and METMEL), including skin diseases with a higher risk of tumor degeneration that have already been cataloged as precancerous states (AK and PS). From the assessment of a highly heterogeneous multiclass dataset of 968 samples and almost 7700 genes, a DEGs subset was identified by applying a novel one-vs-one (OVO) multiclass gene selection algorithm. This was achieved by means of consciously tuning critical and highly selective parameters. Specifically, log2 fold change (LFC) and maximum number of selected DEGs (NMAX) among each pair of SPSs were considered. By relying on a widely used feature selection algorithm and assessing different subgroups of multiclass candidate DEGs, an ANOVA statistical test [37] assessed the influence of these critical parameters together with the use of different classification models and performance metrics. Finally, the biological relationship of these DEGs with skin cancer was determined by examining their functional properties and inspecting specific literature.

II. METHODS
A flowchart of our approach is presented here (Fig. 1). Each of the experimental steps of this proposed pipeline will be subsequently addressed in the following subsections. For reproducibility purposes, this methodological approach can be run step-by-step by using different functionalities published under the KnowSeq R/Bioc package [38] together with several R scripts which have been included in the next repository: https://github.com/jmggugr/skca-transcriptomics-integration/

A. Transcriptomics Technologies Integration
In order to obtain the integration of skin cancer datasets coming from different platforms and technologies, three steps have to be carried out (see left part in Fig. 1).
1) Raw Data Acquisition: One of the first steps involves carrying out an in-depth information search about skin cancerous pathologies and, subsequently, finding out the current availability of datasets. For example, AK and PS have been previously cataloged as precancerous skin diseases. Also, a wide range of SPSs related to cancer have been specified: from carcinomas (BCC, SCC or MCC) to melanomas (PRIMEL and METMEL), to even lymphomas or sarcomas. Next, the identification of transcriptomics webdata resources required to inspect the availability of the above SPSs together with healthy states (such as NSK or NEV) in public repositories such as NCBI GEO [39] and ArrayExpress [40] web platforms. Initially, samples on which drugs were applied, viruses were evaluated or were not directly extracted from tissue by means of punch biopsies or sliced sections were discarded. Moreover, only those SPSs for which a sufficiently representative number of samples were found and considered in order to increase the possibilities of characterizing their manifestation [41]. Under these considerations, Bowen's disease samples (also known as SCC in situ) were not finally considered (only two datasets containing data samples from this SPS were found, summing up to only 12 samples which  was considered too low for the study). Finally, no representative number of lymphoma and sarcoma samples was found, so they were not considered in this study. Since different microarray technologies and platforms were dealt with, several R packages from Bioconductor web platform [42] were used to acquire the RNA samples: GEOquery [43], affy [44] and oligo [45] for Affymetrix platforms and lumi [46] for Illumina platforms. In the case of RNA-seq series, SRA and FASTQ files containing raw information were directly downloaded in a programmatic manner before being preprocessed. Only those series whose samples were aligned to the GRCh37 reference genome, were considered for this study due to its greater current public availability. Specifically, the extensive RNA sample collection from 27 series used in this work led to the analysis of up to 10 SPSs (Table I).
Each of the series can be identified under accession ID, which shows most of them being submitted from United States and other countries where their population is predominantly white: Deutschland, Netherlands, Great Britain and Australia (Table II).
2) Preprocessing: This phase checks the quality of the samples under a detailed quality analysis process in order to remove potentially wrong samples. The quality of every microarray series was assessed using up to 6 quality tests: distance among samples, principal component analysis (PCA), Kolmogorov-Smirnov test based on the K a parameter, density distribution plots, standard deviation of the samples intensities and Hoeffding's D-statistic (normally executed with D < 0.15). These tests are available from the arrayQualityMetrics R package [47]. In order to discard all low quality samples (outliers), all of these tests were applied many times for each series. With respect to RNA-seq series, 5 samples were excluded by avoiding sample duplication. The total number of excluded samples from each series is specified in the last column of Table II.
Subsequently, each of the sequencing technologies requires a wide range of intra-array processing steps which have to be carefully performed when both are going to be finally integrated at gene expression level. Because of being processed from different platforms, a normalization procedure has to be applied on each microarray series. The Robust Multi-array Average (RMA) algorithm [48] was applied in this work by modularly performing background correction, normalization and summarization on the microarray data. The rma function from affy and oligo R packages was used for Affymetrix microarrays and the lumiExpresso from lumi R package was used for Illumina microarrays. Gene annotation of each series was provided by the annotate R package, which helps in mapping from the manufacturer chip identifiers to standardized symbols by using a wide range of annotation packages from the Bioconductor website. With respect to RNA-seq series processing, the proposed pipeline by Anders et al. [49] was partially followed, only changing certain tools. Once a large number of FASTQ and SRA files are available, several tools such as sra-toolkit [50], hisat2 [51], bowtie2 [52], samtools [53] and htseq [54] were used to get read count files containing the located genes in each sample. Before obtaining these files, gene annotation was retrieved by means of biomaRt R package [55], a data-mining tool which allows connecting to the Ensembl database [56]. After all these steps, other R packages such as cqn [57] helped in correcting and normalizing GC content bias, and NOISeq [58] was used to calculate the gene expression values.
3) Gene Expression Integration: After preprocessing each of the microarray and RNA-seq series individually, additional requirements have to be considered before inter-array normalization and integration [59]. First of all, each of the expression values of the genes transcribing the same gene identifier have to be summarized in a single value. In order to be consistent in assessing the impact of each gene selected in our analysis, all transcripts were gathered by applying the mean of them to each series separately. This parameter was selected after performing a comparative study versus median as it was done in our previous work [31], showing no statistically significant differences in classification performance. Next, several simultaneous steps were carried out on the 27 series (Fig. 2). 28 batches were established because different samples from GSE42677 series were processed by two different platforms.
Firstly, logarithmic transformation was performed on 2 series (GSE2503 and GSE3189) in order to adjust the scale of the gene expression values, establishing base 2 for all the batches ( Fig. 2A). Next, 16-bit depth homogenization was applied (Fig. 2B) after previously analyzing the maximum  (GSE98394). Thereafter, by having previously established a common gene annotation for all the considered series, only common genes from all the samples coming from the series / batches were identified and selected. At this point, batch effect correction shoud be considered because hypothetical batch effects could be appearing among all 28 batches considered (Fig. 2C). In order to deal with this issue, ComBat method [27] from sva R package [60] was considered, correcting and establishing a consistent sample distribution along all the samples from all batches (Fig. 2D). Finally, an inter-array normalization was applied by means of normalizeBetweenArrays function from limma R package [61]. This achieves consistency among all the samples put together and forces an identical empirical distribution on each of them based on quantile normalization (Fig. 2E). Before any new sample is properly assessed by this procedure, all these transformations are completely necessary and have to be applied in the same way. At the end of this procedure, the whole integrated dataset formed by p common genes and all n quality samples selected among N classes is achieved (matrix A in Fig. 1).

B. Machine Learning and Soft Computing
Bioinformatics researches and recent biological problems have been successfully benefited from the use of machine learning and soft computing techniques [62] in a wide range of topics such as expression profiling identification, feature selection and classification [63], protein sequences [64] and DNA sequences [65]. As the number of biological experiments and applications using high-throughput technologies continues to increase, this type of techniques for knowledge discovery will find new applications [66], [67].
1) OVO Multiclass DEG Selection: Traditionally, the gene selection from expression profiles analysis deals with the curse of dimensionality problem (np-hard), as it pits few n samples against thousands of p genes [68]. This issue becomes even more challenging when increasing the number of SPSs (in our work, N) (see nomenclature in Fig. 1).
For the purpose of addressing such challenge, this work presents a novel, simple and intuitive one-vs-one (OVO) multiclass DEGs selection approach based on the assessment of all possible pair comparisons of SPSs. This work defines the comparison of two SPSs as class pair comparison (CPC). The criterion for selecting DEGs for each CPC is a high LFC, which means higher discernment power at the gene expression level. The DEGs selection process ensures this criterion is satisfied by tuning the two parameters LFC and NMAX. On the one hand, LFC establishes a minimum threshold value for genes to be considered as DEGs throughout all CPCs. On the other hand, NMAX indicates the maximum number of DEGs selected for each CPC. An additional threshold can be established using p-value (PV). However, a constant value of 0.001 was set in the experimental setup of this work. By extending to a problem of N SPSs, the total number of CPCs amounts to (N 2 − N)/2. In particular, this work simultaneously analyzes 10 SPSs, which gives 45 CPCs. The expected maximum number of DEGs, globally for all the CPCs, would sum up to NMAX * (N 2 − N)/2. This novel multiclass OVO approach can be seen as a step forward with respect to the classical gene selection process, which is exclusively controlled by PV and LFC. The lack of capacity of the selected DEGs to discern among specific CPCs or different SPS subsets can be easily avoided by tuning the new proposed NMAX parameter. The union of all the DEG sets after considering each CPC needs to take into consideration the identification of repeated DEGs. These can occur as some genes may present a strong difference of gene expression for several CPCs. Nevertheless, such DEGs coincidences would help in reducing even further (up to p * ) the final candidate multiclass DEGs (where p * ≤ NMAX * (N 2 − N)/2 << p).
In order to strengthen the selection of DEGs, a number of experiments (M = 10 in this work) were performed, splitting in each of them the whole integrated dataset into two datasets: 90% for training and validation, and the remaining 10% for testing, in a cross-validation manner. Similar representativeness of each SPS was ensured within each of the dataset folds. The feature selection and parameter tuning processes were initially applied on the training dataset for each of these M experiments, thus returning different DEGs sets for each LFC and NMAX combination. With the aim of improving the reliability and the interpretability of the subsequent results, only those p * common genes matching all the M experiments for each parameter combination were selected. This choice discards spurious DEGs only emerging in specific executions and prevents subsequent classification biases. Before evaluating the different p * common gene set (finally naming p * to this set) within each of the M experiments, an additional assessment of their informative power was performed by means of the minimum-Redundancy Maximum-Relevance (mRMR) feature selection algorithm [63]. This algorithm returns a ranking according to the criterion of placing first those DEGs with the most relevant and the lowest redundant information among themselves with respect to the class variable. Then, from this ranking, proper assessment, described in the next subsection, allowed selecting a total of p * * genes from the previous set of p * genes.
In summary, twofold DEG selections were carried out: firstly, reducing the computational complexity from p thousands of genes to the p * most reliable candidate DEGs of the disease; secondly, exclusively selecting those p * * DEGs with higher informative capability for the intelligent diagnosis (see right part in Fig. 1).
2) Automated Classification Assessment: Three classification techniques assessed the informative power of different DEGs subsets from the ranking returned by mRMR: Support Vector Machines (SVM) [69], K-Nearest Neighbour (KNN) [70] and Naive Bayes (NB) [71]. K-fold cross validation technique (K-fold CV, where K = 10) [72] was used on the training set of each M experiment with the purpose of providing a realistic performance of the DEGs on new unseen data. Optimal hyperparameters were calculated for these methodologies: σ (kernel width) and γ for SVM, and k for KNN. The 10-fold CV classification assessment was repeated 10 times by randomly shuffling the dataset. The advantages of this process are twofold: first, this is statistically robust because of asymptotic convergence to a reliable estimation of the classifier performance [73]; and second, this prevents from achieving overfitting when assessing training and testing data. Finally, three metrics were used in order to measure the recognition rate by combining each classifier in association with different DEGs set sizes: accuracy (ACC), overall F1-score (OF1) and mean multiclass F1-score (MF1). These are calculated by using Equation (1), (2) and (3), respectively. Each of these metrics can be expressed as a function of certain parameters (precision (P) and recall (R)) or different rates (T p , T n , F p and F n ) which can be identified from a confusion matrix of N classes: The metrics related to F 1 -score [74] were considered particularly suited and robust for the multiclass study tackled, as they provide a better measurement of the recognition rate of each of the classes under unbalanced data. With regard to dealing with it, data balancing techniques such as SMOTE [75] were considered. However, no significant performance improvement was achieved, leading to be discarded and avoiding the introduction of additional complexity and artificial data.
Next, in order to assess the influence of the multiple factors considered for identifying multiclass DEGs, an ANOVA statistical test was performed over the entire dataset. Although factors such as assessed dataset type (TYPE), analyzed K-fold cross validation (KFOLD) or M experiment performed (EXPERIMENT) were also evaluated by this test, 4 factors were specifically highlighted because of their further relevance in the subsequent analysis. On the one hand, LFC and NMAX parameters were subjected to evaluation by tuning the proposed algorithm. On the other hand, the hypothetical differences of applying different classifiers in combination with a number of DEGs set sizes (Gen-Max) were also inspected by means of this test. By checking the validity of each factor (LFC, NMAX, classifier and GenMax), the different performance metrics (ACC, OF 1 and MF 1 ) were measured for both training and test sets.
Finally, a functional enrichment analysis was performed by means of DAVID 6.8 [76] in order to functionally annotate and interrelate the obtained DEGs using Gene Ontology (GO) terms [77].

III. RESULTS AND DISCUSSION
By taking into account the integration at gene expression level from 22 microarrays and 5 RNA-seq series containing multiple SPSs related to cancer, the opportunity to determine the skin cancer gene signature of up to N = 10 SPSs, formed by highly reliable multiclass DEGs, has been addressed in this work. The experimental analysis of this study has been conducted under the proposal of a novel OVO multiclass DEGs selection algorithm, which has been thoroughly tested by means of a complete and powerful ANOVA statistical test. The interpretation of the results obtained from this analysis has been used to select suitable parameter settings. By tuning our proposed algorithm, this study was focused on assessing the informative power of the p * identified multiclass DEGs. After selecting p * * multiclass DEGs from the previous one, their biological relationship to skin cancer was finally determined. This discussion has been guided on presenting all the results derived from the procedure above.

A. Impact of Tuning Algorithm Parameters
The statistical significance of each considered and highlighted factor (NMAX, LFC, GenMax, Classifier) was confirmed by means of the ANOVA statistical test, showing the influence of each of them on the classification performance (Table III). The most significant differences were exclusively explained by checking the scale depth when using MF1 (Fig. 3). While the lowest NMAX parameter value reflected one of the highest classification performances and discarded the consideration of a wide range of DEGs for each of the 45 CPCs, the impact of tuning LFC helped to elucidate the disadvantage of selecting high threshold values because the MF1 value dropped by more than 3%. Classification models results ranged from 80% to 82%, establishing these performances around 10 genes (see Classifier and GenMax factors in Fig. 3). Similar statistical results and distribution for each factor were achieved for ACC and OF1 as well, and these can be facilitated under petition. Next, in order to present the utility of the proposed algorithm in this work, a choice of parameters was required. The decision was motivated under the criterion of restrictively selecting DEGs while preserving the information of all considered SPSs for this study. For this purpose, NMAX = 1 was established as it presented one of the highest recognition rate for each performance metric assessed (as clearly showed and supported by the results of ANOVA statistical test), leading to drastically reducing the computational complexity to a maximum of (N 2 − N)/2 = 45 highly discerning DEGs. This choice prevents of arbitrarily tuning LFC and relying decision power on it in search of a sufficient threshold for discerning among multiple SPSs. Furthermore, this decision may avoid the removal of DEGs to discern those hardly distinguishable CPCs when applying highly restrictive LFC values. Hereafter, these setting parameters were used to identify the candidate multiclass DEGs and present a potential gene signature of skin cancer.

B. Selection of Informative DEGs
Although up to 45 genes could have potentially been returned by our proposed algorithm under the selected configuration, exclusively p * = 10 candidate multiclass DEGs appeared as common genes from the intersection of DEGs for each of the M = 10 experiments performed, as many of these genes were highly discriminating among several CPCs. However, in order to reduce the repertoire of candidate DEGs set for intelligent diagnosis, the informative capability of different subgroups of up to p * DEGs ordered by means of mRMR, was subjected to an automated classification assessment. This algorithm then established the following DEGs ranking: MLANA, LTF, MMP1, ADAMTS3, LY6D, SCGB2A2, KRT14, PI3, PMEL and S100A7. As a result, the classification results are presented when increasing the size of DEGs set following the previously established ranking, showing asymptotic convergence for the different performance assessments (Fig. 4). Differences among classifiers are not appreciated given the high discernment quality provided by the selected DEGs. By reducing the complexity of the study, the subsequent experimental analysis was limited to considering the first p * * = 8 DEGs given that the further average improvement of MF1 per gene is lower than 0.6%. The results associated with this size of DEGs set even improved those showed by GenMax parameter for ANOVA test, outperforming recognition rates of 94% OF 1 and 80% MF 1 when considering any classifier.
Afterwards, with the purpose of knowing the overall discernment capabilities of the 8 multiclass candidate DEGs, the number of SPSs and CPC cases being covered by each one of them and the information of the highest |LFC| for any CPC was summarized (Table IV).

C. Recognition of SPSs
Despite establishing parameter settings which help in exposing DEGs to discern from each CPC, difficulties in distinguishing among certain SPSs still can not be avoided. Most CPCs can be properly discerned from any of the 8 DEGs, presenting significant LFC values (Fig. 5). However, there is a small set of CPCs which are harder to distinguish when examining changes at gene expression level such as ISCC vs AK (LFC < 2) or PMCC vs MMCC (LFC < 1). This occurs when trying to offer a reliable diagnosis among a lot of SPSs which are close at the biological level.
By extensively checking how a new unseen sample could be classified, the different classification models assessed the 8 highlighted DEGs set (Fig. 6). The recognition rates confirm the real challenge of properly discerning the CPC cases previously highlighted, although presenting accuracy differences among models when classifying certain SPSs (for example, ISCC achieves 72% for NB, 76% for KNN and 77% for SVM). On the one hand, 3 SPSs achieved high recognition rates for SVM classification model: NSK (97%), BCC (∼100%) and PS  (∼98%). On the other hand, recognition rates dropped for the 7 remaining SPSs mainly due to the confusion with another SPS as previous studies had already advanced [9]- [12]: NEV (∼83% and confused with NSK above 4%), ISCC (77% and confused with AK above 20%), PMCC (∼58% and confused with MMCC above 37%), MMCC (45% and confused with PMCC above 54%), PRIMEL (91% and confused with METMEL above 2%),  METMEL (90% and confused with PRIMEL above 7%) and AK (∼65% and confused with ISCC above 31%). This fact remarks the difficulty of achieving reliable DEGs between precancerous and invasive states as they present molecular similarities. By considering the fusion of certain CPCs (for example, MCC formed by PMCC and MMCC, MSC formed by PRIMEL and METMEL or combining ISCC and AK), the recognition rates would have considerably increased the percentage up to 87-99% for these skin super-states in a more generalized study.

D. Determination of Potential Target Genes
One of the main justifications for separating specific SPSs is by using relevant biomarkers of their occurrence from gene expression analysis. In this case, our approach highlighted the informative capacity of these 8 candidate multiclass DEGs for an overall diagnosis of suffering from skin cancer (Fig. 7).
In view of these results, certain multiclass DEGs such as MLANA, MMP1, LY6D or PI3 appeared down-expressed for both SPSs and, among others, may discern better PMCC and MMCC with respect to other SPSs (Fig. 5). All these genes have previously proven to be of great importance for expression pattern characterization and skin cancer diagnosis: from inhibition in SCC (MLANA), positive dysregulation in BCC and AK (MMP1) to correlated overexpression in SCC and PS (PI3) [78]- [80]. Therefore, a preventive clinical analysis of these genes could help to avoid erroneous therapies by examining their hypothetical involvement in other SPSs addressed by this study.

E. Biological Interpretation of the Multiclass DEGs
In order to understand the functional properties of the 8 highlighted DEGs, an enrichment analysis based on GO terms  was performed from DAVID Bionformatics Database [75]. The three GO ontologies for biological processes (BP), cellular components (CC) and molecular functions (MF) were considered for our analysis. A total of 6 BPs, 6 CCs and 4 MFs were determined to be significant throughout these genes (Table V). As shown, MMP1, ADAMTS3 and LTF genes are highly related in terms of their proteolysis process and endo-and metalloendo-peptidase activity. According to the activity of proteolytic enzymes, this occurrence has been associated with angiogenesis and tumor progression of skin cancer [80], [81]. Subsequently, by exhaustively inspecting specific literature, the biological relationship of the 8 highlighted DEGs with skin cancer was consulted. On the one hand, the most remarkable inquiries underlined the dysregulation of up to 6 DEGs in MSC cases [83], [84] and development risk [85] (except ADAMTS3 and PI3) and the implication of up to 5 DEGs in PS development or inflammatory processes [80], [86] (except MLANA, ADAMTS3 and KRT14). On the other hand, the differentiating role of specific DEGs in NMSC cases was highlighted: the overexpression of ADAMTS3 in BCC [87] or the hypothetical implication of KRT14 in the malignant transformation of potential stem cells as origin of MCC [88]. Based on all these precedent evidences and the results shown (Fig. 7), the 8 multiclass DEGs highlighted by this approach should be particularly taken into account by being related to tumorigenesis and pathogenesis of skin cancer. Concretely, MLANA has been remarkably demonstrated to be upregulated in NEV [83], inhibited in SCC [78] and differentiated between MCC and PRIMEL by highlighting absence and overexpression by means of immunohistochemical analysis [89]. Further, multiple genetic dysregulations of DEGs have been reported in several studies: from the downregulation of LY6D, SCGB2A2 and KRT14 in METMEL with respect to PRIMEL [82] to the dysregulation in SCC versus NSK by showing inhibition of MLANA and SCGB2A2 or overexpression of MMP1 and PI3 [78], [90], [91]. Finally, the dysregulation of certain DEGs has been interestingly reflected in both SCC and PS in a similar way: from inhibition of SCGB2A2 together with overexpression of MMP1 and PI3 [80] to slight and strong upregulation of LTF in SCC and PS, respectively [80], [86]. Because of being a chronic inflammatory skin disease, special attention should be paid to the psoriasis evolution because the cancer development also generates inflammatory reactions around surrounding tissue [7]. From the preventive point of view, clinicians should remain attentive to the high gene expression variability of these specific DEGs by observing changes between NSK, PS and diverse SPSs related to cancer (see gene expression changes for all these DEGs in Fig. 7). In accordance with our results, this multiclass DEGs subset could represent a genetic signature offering clues about the overall state of the disease.

F. Limitations of the Approach
Although the validity of gene expression analysis has been proven by an enormous amount of scientific publications, certain limitations may be remarked when applying multiclass classification and transcriptomics resources integration. On the one hand, the identification of relevant multiclass DEGs is restricted to the establishment of proper thresholds for the discernment among CPCs by using statistical conditions such as LFC or PV. However, it is completely necessary to inspect the gene expression levels in order to avoid the selection of DEGs with similar levels and avoid mistakes under classification assessment. This concern disappears when final clinical diagnosis is doubtful among exclusively two specific SPSs (for example, PMCC vs MMCC, etc.), in which case gene expression analysis may find highly discerning DEGs making them outstanding target genes for therapeutic treatments. On the other hand, the consideration of multiple heterogeneous sources of transcriptomics data may lead to removal of relevant DEGs after their integration, due to the specific platform requirements. This eventuality could be widely assessed at the expense of discarding biological samples as long as it does not affect the representativeness of the SPS to be analyzed. Additionally, gene expression analysis could benefit from the use of another biological points of view. Copy number variation could help to shed light on explaining gene expression changes for each DEG within each SPS (see widening and outliers associated with each multiclass DEG in Fig. 7). The co-integration of both omic data from the same cohort of patients could noticeably improve the downstream analysis. Finally, the relevance of the highlighted DEGs could be more widely determined by taking clinical data such as gender, race or ethnicity together with their biological involvement to certain pathways.