Exploiting Ontology Recommendation Using Text Categorization Approach

The Semantic Web is considered the backbone of Web 3.0, and ontologies are an integral part of the Semantic Web. Although a growing number of ontologies is reported in different domains, owing to benefits such as handling data heterogeneity, automated information analysis, and reusability, finding an appropriate ontology for a user requirement remains a cumbersome task because of the time and effort required, context-awareness, and computational complexity. To overcome these issues, an ontology recommendation framework is proposed. The proposed framework employs text categorization and unsupervised learning techniques. Its benefits are twofold: 1) ontology organization according to the opinion of domain experts, and 2) ontology recommendation with respect to the user requirement. Moreover, an evaluation model is proposed to assess the effectiveness of the framework in terms of ontology organization and recommendation. The main consequences of the proposed framework are that 1) the ontologies of a corpus can be organized effectively, 2) little effort and time are required to select an appropriate ontology, 3) computational complexity is limited to the use of unsupervised learning techniques, and 4) since no context awareness is required, the framework can be applied to any corpus or online library of ontologies.


I. INTRODUCTION
In the digital era, the research community is striving to use information and computing technologies to manage the rapid growth in data volume. The amount of data doubles every 20 to 24 months [1], [2], and by 2020 the world was expected to have over 4 ZB of data. Extracting meaningful facts from this huge amount of data with minimal human involvement and effort remains a challenging task for the research community. A huge amount of data is available on the Internet, and the Semantic Web is considered a straightforward way to allow machines to understand and process these data precisely. Semantic Web standards enable data interoperability by constructing a distributed data space in which software agents and users can access and publish information from different data sources and locations [3]. Moreover, Semantic Web standards help applications carry out more of the work required to find, combine, and act upon information on the web without human intervention. Ontologies are an integral part of the Semantic Web and bear the potential to model different types of data [4]. Ontologies support sharing and reusing knowledge while providing automated reasoning about data. Owing to their structural support and formal representation of domain schemas [5], ontologies open several opportunities for researchers to automate the processing of web data, such as the ontology's effect on system quality [6], heterogeneity, automated information analysis, and reusability [7]. These diverse benefits have led the research community to explore the use of ontologies in domains such as agriculture, healthcare, and information technology [8]-[10]. A frequently used ontology design principle is to reuse the content of existing similar ontologies.
A large number of ontologies are available on the Internet for reuse instead of building one completely from scratch, such as the CMS Ontology,1 the Library Ontology,2 and the SNOMED CT Ontology.3 Reusing the classes, properties, and concepts (ontology components) of an existing ontology according to a user's needs can significantly reduce the cost in terms of time and effort [10]. Owing to the availability of numerous ontologies in different domains, searching for an appropriate ontology to reuse with respect to a user requirement is considered an ongoing challenge [11]. The existing efforts of the research community can be summarised as follows. 1) When search engines are used to find ontologies, a huge number of links and documents is returned in response to a user's query, and assistance is required to help the user find the right ontology according to the subject domain and conceptual details. Since some search engines use ranking algorithms, the visibility of some well-defined ontologies can be hindered [12]; moreover, this is in itself a tedious and time-consuming task. 2) Recommendation systems take keywords as input and recommend ontologies to users; however, these systems are context-aware or domain-specific, for example, related to biomedical science. 3) Ontology libraries can be used to find an appropriate ontology; however, few libraries support keyword search.
Like other studies [13]-[15], we consider ontology recommendation as an information retrieval problem and employ a text categorization approach to propose a framework for recommending ontologies with respect to the user requirement. First, an ontology repository spanning four domains is created by collecting ontologies from the literature and Internet resources. Second, a pool of requirements is created by conducting a survey of developers and domain experts; in this regard, 31 user requirements are collected to validate the effectiveness of the proposed framework. Finally, the ontology repository and requirements pool are given as input to the proposed framework. The framework organizes the ontologies into related groups (clusters), and whenever a user provides a requirement as input, the system performs the required steps (described in Section V) and recommends the most appropriate ontology. The proposed work overcomes the aforementioned limitations in that only the single most appropriate ontology is presented to the user instead of a plethora of results to choose from. Unlike ranking algorithms, the proposed system employs text categorization and unsupervised learners, overcoming the issue of the visibility of some well-defined ontologies. Furthermore, the proposed system eliminates the issue of domain-specific ontology search and can be extended to use ontologies from many fields, such as computer science, medicine, and education.

1 https://github.com/ayesha-banu79/Owl-Ontology/blob/master/College%20Mngt%20Sys.owl
2 https://github.com/ayesha-banu79/Owl-Ontology/blob/master/Library%20Ontology.owl
3 http://bioportal.bioontology.org/ontologies/SNOMEDCT

The main contributions of this research work are as follows:
• Instead of a single main repository containing all ontologies, we organize the ontologies into related groups according to domain expert opinion, thereby achieving better retrieval performance.
• An ontology recommendation framework is proposed. The proposed framework recommends the single most appropriate ontology to the user according to their requirement, rather than providing countless pages of results.
• A performance assessment model is presented to evaluate the effectiveness of the proposed framework in organizing ontologies, predicting the correct ontology group for a user requirement, and recommending an ontology.

• We have used various weighting methods in our experimentation and designed experiments to determine the most suitable weighting method among Binary, TF, TFIDF, TFC, LTC, and Entropy for unsupervised learners.

• We have used Fuzzy c-means, K-Means (Euclidean and Manhattan), and K-Medoids in our experimentation. For each ontology group, we identify the best unsupervised learner for organizing ontologies and predicting the correct ontology group for a user requirement.
The rest of the article is structured as follows. Section II presents the related work. Sections III and IV give a brief overview of ontologies and the text categorization approach, respectively. Section V describes the proposed ontology recommendation framework, and Section VI describes an evaluation model to assess its effectiveness. Sections VII and VIII present the experimental procedure and results, respectively. Finally, Section IX concludes the work.

II. RELATED WORK
This research work mainly focuses on ontology recommendation; however, we also present recent literature on the text categorization approach. Different researchers have made efforts to address the issue of appropriate ontology recommendation. A summary of their efforts and limitations follows.

A. ONTOLOGY RECOMMENDATION
Alani et al. [16] performed ontology search based on the content of the ontology and the user query. The authors used the query to represent domain names and used Web pages to find representative terms related to the query for query expansion. Experiments were performed in the biomedical domain. Jonquet et al. [17] introduced a web service for recommending ontologies in the biomedical domain; size, connectivity, and coverage were used for decision making for a query. The system uses a set of keywords or ontology metadata describing the domain to recommend an appropriate ontology. Martínez-Romero et al. [18] also proposed an ontology recommendation system for the biomedical domain, known as ''BiOSS'', which recommends ontologies on the basis of keywords provided by the user. BiOSS uses domain coverage, popularity, and semantic richness as evaluation parameters, and ontologies are suggested in order of their scores on these parameters.
Groza et al. [19] proposed an ontology ranking and selection system based on the Analytical Hierarchical Process (AHP). In this study, the authors tackled the problem of selecting, evaluating, and ranking ontologies, and used AHP to analyze different ontologies from different perspectives. The system is composed of three main modules: domain coverage, ontology measurement, and AHP. The proposed system was tested on ontologies of the tourism domain. Butt et al. [20] proposed a framework entitled RecoOn for recommending ontologies based on structure-less queries. The aim of RecoOn is to suggest the best-matched ontologies for a query consisting of multiple keywords. Experiments were conducted on the CBRBench ontology collection, with matching cost and the popularity of the ontology used as evaluation metrics.
Trokanas and Cecelja [21] proposed an algorithm for ontology evaluation and reuse. The algorithm uses knowledge about ontologies, presented in the form of terminologies and structure, to create compatibility metrics, and relies on high-level details of the ontology. Chemical and business process use cases were used to demonstrate the work. Aguilar et al. [22] proposed a hybrid recommender system for ontologies of the biomedical domain. The authors used the metadata stored for ontologies in the semantic repository and considered the quality and adaptability characteristics of ontologies during the recommendation process.
Brown et al. [23] applied ontology recommendation to recommending ontologies for software requirements planning. The process is divided into two phases: in the first, the requirement model is converted into an ontology; in the second, the converted ontology is compared with other ontologies related to the domain. A tool was developed for the second phase consisting of three components: a matchmaker, a persistence manager, and a query handler. Recommended ontologies are determined on the basis of these three components.
Zulkarnain et al. [24] proposed a methodology using reuse, coverage, and language as acceptance criteria for ontology recommendation. To verify the ontology recommender system, the authors used BioPortal's ontology recommender API. The recommended bio-ontology can then be reused and enhanced as needed. Martínez-Romero et al. [25] extended their previous work [18] and proposed an ontology recommendation system called ''NCBO Ontology Recommender 2.0'' for recommending biomedical ontologies. The system finds ontologies based on biomedical text or keywords, using coverage, detail, acceptance, and specialization as evaluation parameters, and recommends from more than 500 ontologies available on NCBO BioPortal. Faessler et al. [26] proposed JOYCE, a tool for selecting and tailoring ontologies. JOYCE identifies and assembles ontologies, or pieces of ontologies, from an ontology repository, with the aim of utilizing existing ontologies.
Finally, the related work on ontology recommendation and its limitations is summarized in Table 1. In addition, some ontology libraries and search engines are available; however, they are mostly domain-specific and their scope is limited. The main limitations of existing efforts for ontology recommendation are: 1) context-awareness, 2) limited scope, 3) the effort required to implement the conceptual models, and 4) the use of single- or multi-term keywords. Taking a user requirement description as the input of an ontology recommendation system can improve the search process regardless of the existence of numerous ontologies in any domain. As we consider ontology recommendation an information retrieval problem, a brief overview of the related literature is presented in the next section.

B. TEXT CATEGORIZATION
Traditional machine learning-based approaches for text categorization primarily focus on feature engineering and the classification of text documents. Machine learning models take text features as input, which are designed based on several statistical methods and word frequency. Several domains have benefited from machine learning and text categorization approaches. Hussain et al. [27] employed a machine learning and text categorization approach to automate the selection of design patterns. Their three-step methodology comprises pre-processing, unsupervised learning to identify similar objects, and selection of appropriate design patterns. The authors evaluated the performance of their proposed system and reported 18% better performance compared to supervised learners. They extended their research to include a large dataset of design patterns and employed several statistical methods for text features along with unsupervised learners [28]. Compared to the previous work, the extended approach provided four advantages: first, a semi-formal definition of design patterns was not necessary as a prerequisite; second, ground-truth class label assignment was not mandatory; third, no classification training was needed for each design pattern class; and fourth, the authors claimed that an appropriate sample size was not needed for accurate training. The authors in [29] proposed a framework for the selection and organization of design patterns, attempting to minimize the semantic-relationship gap between design patterns and their features; they presented a case study and employed a powerful deep learning algorithm, the Deep Belief Network. Several other studies have also employed machine learning-based solutions for text categorization in different domains. In [30], a spam detector was developed using machine learning.
The proposed solution uses a combination of a collection of features, pre-processing steps, or setup information, such as using or not using a stop-word list, lemmatization, keyword patterns, etc. Vilares et al. [31] present an unsupervised approach to multilingual sentiment analysis guided by syntax-based rules; the terms are weighted based on syntax-graph analysis. Text categorization approaches have also been tested on text corpora in several languages, such as Turkish [32], Arabic [33], and Croatian [34]. Author profiling is another significant task related to text categorization, where considerable progress has been observed. In this regard, Basile et al. [35] proposed an author profiling model consisting of a linear-kernel SVM, Part-Of-Speech (POS) tags, and n-grams.
Another area of study covers collaborative filtering, hash-based collaborative filtering, and binary codes for recommendation systems. Collaborative filtering algorithms recommend items to a user based on the customer's preferences and are able to match other users with common interests [36]. Binary codes aim to approximate user-item interactions and create hash tables to speed up retrieval. Using binary codes can reduce the query time to constant or sublinear complexity, and by learning binary codes the storage requirement can be minimized considerably, as storing each binary code needs just 4 bytes if the code length is 32 [37]. Various studies have evaluated the use of binary codes in e-commerce, to recommend individual items to users [36], [38]-[42], and for personalized fashion recommendation [43]. The accuracy of these models is lower than that of traditional models because they are highly constrained and can lack adequate flexibility to rank the Top-N objects correctly [39].
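As a concrete illustration of the storage and retrieval claims above (the bit patterns here are made up, not learned codes from any cited model), a 32-bit binary code fits in exactly 4 bytes, and comparing two codes reduces to a Hamming distance, i.e. an XOR followed by a population count:

```python
import struct

# Hypothetical 32-bit binary codes for a user and an item.
user_code = 0b10110010111000011010101011110000
item_code = 0b10110010111000011010101011110101

# A 32-bit code occupies exactly 4 bytes of storage.
packed = struct.pack("<I", user_code)

# Similarity comparison reduces to XOR + popcount (the Hamming distance),
# which is why hash-based lookups can run in constant or sublinear time.
hamming = bin(user_code ^ item_code).count("1")
```

Here the two codes differ in two bit positions, so their Hamming distance is 2; nearest-neighbour retrieval over such codes is what the hash tables mentioned above accelerate.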
In recent years, state-of-the-art approaches have moved dramatically from statistical and traditional machine learning methods to deep learning-based text categorization [44]. Convolutional Neural Networks (CNNs) are commonly employed in the field of image processing. Vieira and Moura [45] introduced the application of CNNs to text categorization; for several classification datasets, Kim employed a single-layer CNN, achieving impressive classification results. In [46], Liu et al. employed a Recurrent Neural Network (RNN) for text categorization. Unlike previous works, the authors focused on a multitask learning system that learns jointly across several related tasks. To enhance the ability of deep learning models to classify text accurately, several authors combined two models. The authors in [47] proposed a text classification model comprising a Long Short-Term Memory network (LSTM) and a CNN. LSTMs are also used in the field of healthcare: the authors in [48] and [49] used an LSTM and deep learning, respectively, for intelligent healthcare monitoring systems. In another attempt, the authors employed a combination of CNN and RNN [50], where the CNN extracts text features and the RNN is responsible for multi-label prediction.

III. BRIEF OVERVIEW OF ONTOLOGIES
People, software programs, and organizations communicate with each other despite their differences in needs, viewpoints, platforms, formats, and backgrounds [51]. An ontology consists of a set of terms, such as classes, subclasses, properties, and individuals, organized in a formal and hierarchical manner. In this regard, the term ontology can be described as ''a hierarchically structured set of terms to describe a domain that can be used as a skeletal foundation for a knowledge base'' [52].
Being the backbone of the Semantic Web, ontologies are regarded as an alternative to address data heterogeneity problems. The term ontology is defined as ''a formal, explicit specification of a shared conceptualization'' [20].
Ontologies consist of concepts or objects that can be used to express knowledge and relationships [52]. A concept can be any real-world object. There are no strict rules for describing the term concept in an ontology; however, a concept should reflect the same real-world phenomenon that a specific ontology is expressing. An ontology consists of a set of elements used in a formal and hierarchical manner; its four primary elements are classes, concepts, instances, and relationships [53]. Creating an ontology also promotes the analysis of knowledge in the domain, which in turn helps in reusing existing ontologies [55]. The graphical representation of an ontology modeling the concept of an institute is shown in Fig 1. This ontology contains 41 classes and 42 sub-classes. For example, the Person class contains two subclasses, Student and Employee; the Employee class in turn contains two subclasses, Administrative Staff and Faculty, creating a hierarchy of concepts as they appear in the real world.

Ontologies can be used in many research areas to support a wide range of tasks, such as natural language processing, knowledge representation, information retrieval, databases, online database integration, knowledge management, visual information retrieval, geographic information systems, digital libraries, and multi-agent systems [56]. Furthermore, many researchers use ontology-related systems in different fields such as diagnostics [57], recommendation and classification [58], [59], IoT security [60], content analysis [61], and opinion mining [62]. However, although ontology reuse is a well-established design principle, little attention is paid to reusing existing ontologies to reduce costs [10]. Consequently, the reuse and discovery of ontology terms remain a crucial challenge.
To address these issues, we propose a framework that recommends an appropriate ontology to the user on the basis of the user requirement.

IV. BRIEF OVERVIEW OF TEXT CATEGORIZATION APPROACH
The rapid increase in the amount of digital information available on the Internet has made it difficult for a user to search for relevant information. Consequently, the categorization of documents has become a challenging task, which researchers treat as an information retrieval problem. The research community has reported applications of the text categorization approach to information retrieval problems [63]. Text categorization is a process that algorithmically analyses given electronic documents and assigns them to related categories [64]. Automatic text categorization is used in machine learning, especially in the text-mining domain, and employs either unsupervised or supervised learning techniques. Supervised learning techniques require that class labels be assigned to documents, whereas unsupervised learning techniques use data attributes and similarity/dissimilarity measures to automate their learning process. A conventional text categorization framework involves pre-processing of text documents, feature extraction, feature selection, and classification of the documents [65].

A. PRE-PROCESSING
For the text categorization of large documents, it is necessary to pre-process the input documents and store the extracted information in an appropriate data structure for further steps [66]. The pre-processing step involves tokenization, removal of stop-words, lowercase conversion, and stemming (or lemmatization). Tokenization is the process of splitting a text stream into single words, phrases, or other meaningful parts. Stop-word removal discards frequent words that carry no meaning or information, such as prepositions, pronouns, and conjunctions. Subsequently, tokens are converted into lowercase to reduce duplication of words. Stemming is a word normalization process that reduces a word to its basic form [67].
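The steps above can be sketched in a few lines of Python. This is an illustrative sketch only: the stop-word list is a tiny sample (a real pipeline would use a full list, e.g. from NLTK), and a crude plural-stripping rule stands in for a proper stemmer or lemmatizer.

```python
import re

# Tiny illustrative stop-word list; real pipelines use a much fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "in", "to", "is", "are"}

def preprocess(text):
    # Tokenization: split the text stream into word tokens.
    tokens = re.findall(r"[A-Za-z]+", text)
    # Lowercase conversion: avoid duplicate features for "Class"/"class".
    tokens = [t.lower() for t in tokens]
    # Stop-word removal: discard frequent words carrying no information.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Normalization: a naive plural-stripping stand-in for stemming.
    tokens = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]
    return tokens

print(preprocess("The Students and the Employees of an Institute"))
# -> ['student', 'employee', 'institute']
```

The resulting token lists are what the indexing step below turns into numeric vectors.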

B. INDEXING
Indexing is the common way to convert textual documents into numeric vectors. The Vector Space Model (VSM) is the most common indexing method for describing a document as a numeric vector. Despite its simple data structure, the VSM enables efficient analysis of large document collections. The VSM was originally introduced for indexing documents and retrieving information; however, it is now used in various text mining and document retrieval systems [68]. In this study we use the word-by-document form of the VSM, where each word is represented by a numeric value demonstrating the importance (weight) of the word in a document. Equation 1 is used to construct the VSM (word-by-document matrix), where the entry of each word refers to its occurrence in the document [66]. A brief overview of each weighting method used is as follows:
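The construction of the word-by-document matrix can be illustrated with a minimal sketch: rows are vocabulary words, columns are documents, and entries are raw occurrence counts, to which any of the weighting methods described below can then be applied.

```python
def build_vsm(documents):
    """Build a word-by-document matrix from tokenized documents.
    Rows = vocabulary words (sorted), columns = documents,
    entries = raw occurrence counts."""
    vocab = sorted({w for doc in documents for w in doc})
    matrix = [[doc.count(word) for doc in documents] for word in vocab]
    return vocab, matrix

docs = [["ontology", "class", "class"], ["ontology", "property"]]
vocab, vsm = build_vsm(docs)
# vocab: ['class', 'ontology', 'property']
# vsm:   [[2, 0], [1, 1], [0, 1]]
```

With binary weighting, each nonzero count would simply be replaced by 1.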

1) BINARY
The binary weighting method is the simplest weighting method: as the name suggests, if a word occurs in the document its weight is 1, and if it does not occur its weight is 0.

2) TFIDF
TFIDF is a numerical metric designed to represent how significant a word is to a document in a collection or corpus. The TFIDF value increases proportionally with the number of times a word occurs in the document and is offset by the number of documents in the corpus that contain the word, which helps to account for terms that appear more often overall.
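A minimal sketch of this computation, using a standard formulation with a natural-log IDF (the exact variant used in the paper's experiments is not reproduced here, and the corpus below is made up):

```python
import math

def tf_idf(term, doc, corpus):
    """Standard TF-IDF: weight grows with in-document frequency and
    shrinks as the term appears in more documents of the corpus."""
    tf = doc.count(term)
    df = sum(1 for d in corpus if term in d)  # assumes df > 0
    idf = math.log(len(corpus) / df)
    return tf * idf

corpus = [["ontology", "class"], ["ontology", "property"], ["graph"]]
# "ontology" occurs in 2 of 3 documents, so its IDF (and weight) is lower
# than that of "class", which occurs in only 1 of 3 documents.
```

For instance, in the first document above, "class" receives a higher weight than "ontology" even though both occur once, because "ontology" is common across the corpus.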

3) ENTROPY
Entropy is a measure of unpredictability or imbalance. The entropy word weight characterizes a word's value in identifying a specific document: when a word occurs predominantly in one document, its entropy weight is high, and when the word appears evenly across the documents, its weight is low.
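One common formulation of the entropy weight (assumed here, since the exact formula is not reproduced in the text) is 1 + (1/log N) * sum over documents of p log p, where p is the word's in-document count divided by its global count and N is the number of documents. It yields 1 for a word concentrated in a single document and 0 for a word spread uniformly:

```python
import math

def entropy_weight(term_counts):
    """Global entropy weight for one word, given its count in each document.
    Concentrated in one document -> weight ~1; spread evenly -> weight ~0."""
    n_docs = len(term_counts)
    gf = sum(term_counts)  # global frequency of the word
    entropy = 0.0
    for tf in term_counts:
        if tf > 0:
            p = tf / gf
            entropy += p * math.log(p)
    return 1.0 + entropy / math.log(n_docs)

print(entropy_weight([5, 0, 0, 0]))  # concentrated -> 1.0
print(entropy_weight([2, 2, 2, 2]))  # uniform      -> 0.0
```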

4) TFC
The TFIDF weighting method does not take the length of documents into account. TFC is a variant of TFIDF in which length normalization is applied: TFC uses a normalized TFIDF weight for document terms.

5) LTC
LTC is another variant of TF-IDF, like TFC; however, it addresses the limitations of small datasets and normalizes the weights. Furthermore, instead of the raw word frequency, LTC uses the logarithm of the word frequency, thereby minimizing the impact of large frequency variations.

The main idea behind using these weighting methods in this research is to increase the accuracy of text categorization and to find the best-fit weighting method for the ontology corpus. Consequently, while proposing an ontology recommendation system, we also performed a comparative study of feature weighting methods: the five aforementioned methods were evaluated on an ontology corpus of four domains with three unsupervised learners. Three random terms are selected from our corpus, and their feature values are presented in Table 2 for the reader's better understanding. Table 2 shows how the different weighting schemes treat various terms in the VSM (depending on document length and term frequency) and assign weights to them. For a detailed explanation of the various weighting methods and their implications, readers can refer to [69]-[71].
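The length-normalization idea shared by TFC and LTC can be sketched as follows (a sketch under the common cosine-normalization formulation; the input vectors below are made-up values, not from Table 2): both divide each TF-IDF weight by the Euclidean norm of the document's weight vector, and LTC additionally damps the raw term frequency with a logarithm.

```python
import math

def tfc_ltc(tf_vector, idf_vector, use_log=False):
    """Length-normalized TF-IDF weights for one document.
    use_log=False -> TFC; use_log=True -> LTC (log-damped term frequency)."""
    tfs = [math.log(tf + 1.0) if use_log else float(tf) for tf in tf_vector]
    raw = [tf * idf for tf, idf in zip(tfs, idf_vector)]
    norm = math.sqrt(sum(w * w for w in raw))
    # Dividing by the vector norm makes long and short documents comparable.
    return [w / norm for w in raw]

weights = tfc_ltc([3, 1, 0], [0.4, 1.1, 1.1], use_log=True)
```

After normalization each document's weight vector has unit length, which is precisely how TFC and LTC remove the document-length bias of plain TFIDF.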

V. PROPOSED METHODOLOGY
The selection of an appropriate ontology with respect to the user requirement has become a complex process in terms of the time and effort required. The research community has reported various applications of machine learning and text categorization in several domains, such as author identification [72], web page classification [73], spam e-mail filtering [74], sentiment analysis [75], and design pattern classification and recommendation [28]. Although several frameworks and statistical methods have been introduced for ontology detection with respect to given user keywords and queries, to the best of our knowledge there is presently no comprehensive study on the application of the text categorization approach to ontology recommendation.
In this study, we propose a framework that employs unsupervised learning and a text categorization approach for ontology recommendation. The objectives of the proposed framework are: 1) to organize ontologies according to the opinion of domain experts, and 2) to select an appropriate ontology with respect to the user requirement. The layout of the proposed recommendation framework is shown in Fig 3, which describes its functionality in four phases. In the first phase, an ontology crawler is designed and implemented to obtain ontology terms and text. In the second phase, pre-processing activities are performed on the user requirements (in natural language) and the ontology data. In the third phase, unsupervised learning is employed to group similar ontologies and to determine the candidate ontology group for the user requirement being processed. Finally, in the fourth phase, an ontology is recommended for the given user requirement.

A. ONTOLOGY CRAWLING
An ontology describes information in semi-structured natural language text [76]. In other words, an ontology models any described concepts (also called classes) in terms of tasks, actions, functions, reasoning processes, and strategies [77]. The aim of the crawling phase is to retrieve the properties, classes, annotation properties, and metadata descriptions of ontologies and to create a text corpus for further processing. These ontologies are related to four different domains: Food and Drinks, Academics, Computer Science, and People. For each ontology in the repository, a separate file is created to represent the description of the ontology. Fig 4 presents the text corpus of an institute ontology after the ontology crawling step; the text corpus file contains all the classes, properties, and metadata descriptions of the aforementioned ontology. Table 3 presents an overview of the domains and sub-domains/categories used in this study. We used ''owlready'', a Python library, to extract the properties, classes, annotation properties, and metadata descriptions of the ontologies. These properties, classes, and descriptions are then given as input to the pre-processing activity.

B. PRE-PROCESSING
The aim of the second phase is to pre-process the text data retrieved from the ontology corpus; pre-processing prepares the data for the next phase (clustering). The set of pre-processing activities is shown in Fig 5. The first three activities remove stop-words, numbers, and punctuation, which carry no meaning. The word lemmatization activity groups the inflected forms of a word into a single item. Subsequently, to avoid duplication of words due to upper or lower case, all words are converted into lowercase. The aim of these activities is to reduce data sparsity and the feature set size. The next activity, word indexing, constructs the VSM, which contains the words of all the input documents represented as a word-by-document matrix; a feature vector is then generated for each ontology. Finally, the last pre-processing activity, namely the weighting methods, is applied to rank the words in the VSM. For each ontology group, we determine the best performer among the five weighting methods (Entropy, TFIDF, LTC, TFC, and Binary). The binary-weighted form of a VSM is shown in Fig 6. Furthermore, Table 3 presents the number of non-repeated words obtained after performing the pre-processing activities on each ontology group.

C. CLUSTERING
Clustering, also known as learning without a teacher (unsupervised learning), has been applied in a wide range of fields including engineering, informatics, computer science, life and medical sciences, economics, earth sciences, and social sciences [78]. There are several clustering algorithms, such as K-Medoids, K-Means, Agglomerative, and Fuzzy c-means. Based on their clustering properties, these algorithms can be grouped into certain schemes such as partitioning, hierarchical, model-based, grid-based, density-based, and soft-computing [28]. The research community agrees that no single unsupervised learning algorithm can be recommended as the best-performing learner. Like [14], we used K-Means, K-Medoids, and Fuzzy c-means to employ the proposed framework for ontology recommendation with respect to user requirements. A brief description of these algorithms (unsupervised learners) follows.

Algorithm 1 (K-Means): take k distinct points at random as initial centroids; assign each data object to the group whose centroid is closest; once all data objects are assigned, recalculate the positions of the k centroids; repeat the assignment and update steps until convergence is reached (the centroids no longer move).

1) K-MEANS
K-Means is a renowned partitioning-based iterative clustering algorithm that classifies the given data into groups (clusters) using the idea of a centroid [79]. In a cluster, the mean value of its data points is known as the centroid. For each data vector, K-Means calculates the distance between the data vector and each cluster centroid. For any given dataset, the algorithm partitions the data into a user-defined number of clusters, k. The working procedure of the K-Means algorithm is presented in Algorithm 1.
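The procedure of Algorithm 1 can be sketched in a few lines of Python. For reproducibility this sketch seeds the centroids with the first k points rather than a random sample, and the toy data points are made up:

```python
def k_means(points, k, iters=100):
    """Minimal K-Means sketch: assign each point to the nearest centroid,
    recompute each centroid as its cluster mean, and repeat until the
    centroids stop moving."""
    centroids = list(points[:k])  # deterministic seeding for the sketch
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        new_centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # convergence: centroids no longer move
            break
        centroids = new_centroids
    return centroids, clusters

pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centroids, clusters = k_means(pts, 2)
# The two clusters separate the points near (0, 0) from those near (5, 5).
```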

2) K-MEDOIDS
Like K-Means, K-Medoids is also a partitioning-based clustering algorithm, but K-Medoids is more robust than K-Means [80]. In the K-Medoids algorithm, medoids are the centrally located data objects of the clusters, initially selected randomly from the data objects D to form k clusters. The rest of the data objects in D are placed near the Medoids (central points) of the clusters. Subsequently, all data objects of a cluster are processed repeatedly to find new Medoids that represent the cluster in a better way. After each iteration, the location of the Medoids may change. We used K-Medoids with Euclidean and Manhattan distance. The working procedure of the K-Medoids algorithm is presented in Algorithm 2.

Algorithm 2 Working of K-Medoids
1 Randomly select k of the n data points as the initial Medoids.
2 Calculate the distance between each Medoid and the data points; find the closest Medoid for each data object and map the object to it.
3 Find new Medoids from the assigned data objects and repeat step 2 until there is no change in the assignments.
end
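A minimal sketch of this assign-and-update loop, assuming toy 2-D points and a deterministic initialization in place of the random selection described above (the paper's experiments used R):

```python
def kmedoids(points, k, dist, iters=100):
    """Toy K-Medoids: medoids are actual data objects, never synthetic means."""
    medoids = list(points[:k])  # deterministic init for illustration
    clusters = []
    for _ in range(iters):
        # Map each data object to its closest Medoid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist(p, medoids[i]))].append(p)
        # New Medoid of a cluster: the object with the lowest total distance to it.
        new = [min(c, key=lambda m: sum(dist(m, q) for q in c)) for c in clusters]
        if new == medoids:  # no change in the assignments: stop
            break
        medoids = new
    return medoids, clusters

# The two distance functions used in the paper's K-Medoids variants:
manhattan = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
euclidean = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```

Because medoids are data objects rather than means, a single outlier shifts the cluster representative far less than it shifts a K-Means centroid.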

3) FUZZY C-MEANS
Fuzzy c-means is a clustering algorithm that allows one data point to belong to several clusters, whereas K-Means assigns each data point to exactly one cluster. Fuzzy c-means was introduced by Dunn in 1973 and later improved by Bezdek in 1981 [81]. Fuzzy c-means works by assigning a membership value to each data point for each cluster center, based on the distance between the cluster center and the data point. The membership function, which represents the algorithm's fuzzy behavior, produces membership degrees ranging between 0 and 1. The working procedure of the Fuzzy c-means algorithm is presented in Algorithm 3.
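The membership computation can be illustrated as follows; the formula u_i = 1 / Σ_j (d_i / d_j)^(2/(m−1)) is the standard Fuzzy c-means membership update, while the function name and toy inputs are ours:

```python
import math

def fcm_memberships(x, centers, m=2.0):
    """Membership degrees of point x in each cluster center (fuzzifier m > 1).
    Standard FCM update: u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1))."""
    d = [math.dist(x, c) for c in centers]
    if 0.0 in d:  # x coincides with a center: crisp membership
        return [1.0 if di == 0.0 else 0.0 for di in d]
    return [1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(len(d)))
            for i in range(len(d))]
```

For any point, the degrees lie in [0, 1] and sum to 1, so a point close to one center still retains a small, nonzero membership in the others.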

D. ONTOLOGY RECOMMENDATION
The last phase of our proposed approach recommends ontology to the user on the basis of their requirement description.

Algorithm 3 Working of Fuzzy c-Means
Input: k: number of clusters, D: dataset
Result: k clusters
1 initialization;
2 if k == 1 then
3   Exit;
4 else
5   Input the dataset and the value of k.
6   Calculate the fuzzy membership matrix.
7   Compute the fuzzy centers.
8   Update the membership values.
end
When input documents are represented in the form of term vectors, the similarity between two documents is computed through their correlation. In order to suggest the appropriate document for a given requirement, similarity measures such as Dice Coefficient, Pearson Correlation, Cosine, and Extended Jaccard can be used [28]. For our proposed approach, however, we used a well-known similarity method named Cosine Similarity (CS), which measures the correlation between vectors regardless of document length. Moreover, it has been reported to perform better than other similarity measures in text clustering [82]. The CS between two vectors is the cosine of the angle between them.
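A minimal sketch of CS over two term vectors (illustrative only; the experiments were carried out in R):

```python
import math

def cosine_similarity(a, b):
    """CS(a, b) = (a . b) / (||a|| * ||b||); invariant to document length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Doubling every term weight of a document leaves its CS with any requirement vector unchanged, which is why the measure is insensitive to document length.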

VI. EVALUATION MODEL FOR THE PROPOSED APPROACH
In this section, we propose an evaluation model to assess the performance of the proposed framework. Firstly, we suggest measures to 1) determine the best weighting method, 2) evaluate the performance of the unsupervised learning techniques, namely K-Means, Fuzzy c-means and K-Medoids (Euclidean and Manhattan), in terms of organizing ontologies, and 3) determine the candidate ontology for a particular user requirement description using the unsupervised learners. Secondly, we suggest measures to evaluate the performance of the proposed framework in recommending an ontology for a user requirement. Thirdly, we describe the ontology corpus and the related user requirements, which are considered in four case studies.

A. EVALUATION OF UNSUPERVISED LEARNERS AND BEST WEIGHTING METHOD
The formation of clusters is an important process. However, it is equally important to test the accuracy and validity of the formed clusters. Several measures exist to evaluate the performance of the three clustering algorithms; we used the widely adopted measures Rand Index (RI), V-measure, Accuracy, F-measure, Adjusted Rand Index (ARI), Precision, and Recall [28], [81]. Higher values of these metrics indicate a more effective unsupervised learning algorithm. In this study, we used the One-vs-All (OVA) matrix method to compute the average accuracy for two purposes: 1) to select the best weighting method for each unsupervised learner on a target ontology corpus, and 2) to identify the best-performing unsupervised learner. We chose the OVA measure because of its wide application to multi-class problems: OVA considers the performance of the algorithm with respect to one class at a time before averaging the metrics [83]. Moreover, we use OVA for evaluating: 1) the best weighting method, 2) the performance of unsupervised learners in terms of organizing ontologies, and 3) the candidate ontology category for a particular requirement description.
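One possible reading of this OVA averaging, sketched on hypothetical labels (the exact aggregation used in the paper may differ):

```python
def ova_average_accuracy(y_true, y_pred):
    """One-vs-All average accuracy: binarize the multi-class problem per class,
    score each resulting binary problem, then average over the classes."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        # An item is "correct" for class c when membership in c agrees.
        hits = sum((t == c) == (p == c) for t, p in zip(y_true, y_pred))
        per_class.append(hits / len(y_true))
    return sum(per_class) / len(classes)
```

Because every class contributes one binary accuracy to the average, small classes weigh as much as large ones, which suits the unevenly sized ontology categories.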

B. EVALUATION OF ONTOLOGY RECOMMENDATION
Firstly, we apply the CS measure in order to recommend an appropriate ontology to the user. For evaluating the overall performance of the framework in recommending ontologies, we used the Ratio of Correctly predicted Ontology (RCO) for the user requirements. The value of RCO can be computed using Equation 2, where CSO is the number of correctly suggested ontologies and SO is the number of suggested ontologies.

RCO = (Number of CSO)/(Total Number of SO) (2)
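Equation 2 amounts to simple arithmetic; for example, with the overall counts reported later (27 correctly suggested ontologies out of 31), the hypothetical helper below reproduces the headline figure:

```python
def rco(correctly_suggested, suggested):
    """Equation 2: RCO = CSO / SO, expressed as a percentage."""
    return 100.0 * correctly_suggested / suggested

# Overall result reported in Section VIII-C: 27 correct out of 31 URs.
print(round(rco(27, 31)))  # -> 87
```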
C. ONTOLOGY CATEGORIES
Numerous researchers and Semantic Web experts in different domains (such as security and privacy, e-commerce, health, and bio) have developed ontologies for their own work and to motivate the reuse of ontologies; hence, a plethora of ontologies is available online. In this study, we formulated an ontology repository consisting of 95 ontologies from four domains: computer science, food and drinks, people, and academics. For each domain, the collected ontologies are grouped into certain sub-categories. Details of the domains, sub-domains/categories, and numbers of ontologies are presented in Table 3. We gathered these ontologies from the literature and the Internet. A brief introduction and descriptive statistics of each ontology domain are as follows.

1) COMPUTER SCIENCE
The computer science domain contains 35 ontologies, which are further divided into four categories: networking, cybersecurity, software systems, and sentiments/emotions. The sentiments/emotions category contains application-specific ontologies related to emotion and sentiment analysis, such as OntoSenticNet [84]. There are 13, 8, 11 and 3 ontologies in the categories of networking, cybersecurity, software systems, and sentiments/emotions, respectively. This case study includes 1540 non-repeated words of 35 ontologies after performing the pre-processing activities (Section V-B).

2) FOOD AND DRINKS
The Food and Drinks domain contains 21 ontologies, which are further divided into two categories: Food and Drinks. The Food category contains 12 ontologies related to eatable items such as pizza, ingredients, and recipes to make food. Subsequently, the Drinks category contains 9 ontologies related to different drinks such as wine, beer, coffee and so on. This case study includes 1206 non-repeated words of 21 ontologies after performing the pre-processing activities (Section V-B).

3) PEOPLE
The People domain contains 18 ontologies, which are further divided into three categories: Work-related, People's contacts, and Family hierarchy & history. The Work-related category consists of 6 ontologies; the People's contacts category contains 3 ontologies; the Family hierarchy & history category contains 9 ontologies. This case study includes 351 non-repeated words of 18 ontologies after performing the pre-processing activities (Section V-B).

4) ACADEMICS
The Academics domain contains 21 ontologies, which are further divided into three categories: Research & bibliography, Educational institute, and Books. Research & bibliography category contains 8 ontologies; the Educational institute category contains 5 ontologies; the Books category contains 8 ontologies. This case study includes 692 non-repeated words of 21 ontologies after performing the pre-processing activities (Section V-B).

D. USER REQUIREMENTS
In order to test the validity of the proposed framework in recommending ontologies, we test the accuracy of the system in recommending an ontology based on user requirements. We involved a cohort of graduate students who had studied the Semantic Web course. We trained the cohort and collected 31 requirements related to the above four ontology domains. We also involved three domain experts in the Semantic Web and ontologies in order to identify the ontologies for the 30 requirements. Each expert identified the 30 (out of 92) most appropriate ontologies for the given user requirements. In this section, we only provide a description of 13 user requirements (four for the computer science domain and three for each of the remaining ontology domains defined in Section VI-C) to evaluate the effectiveness of the proposed framework.

1) REQUIREMENTS OF THE COMPUTER SCIENCE DOMAIN
The user requirements (UR) of the computer science domain are given as follows.
UR-1: ''I need ontology so i can report the bugs of software. This ontology must have bug type, bug report, bug status, and report status report contains priority, severity, and report attributes. It should also provide a solution and fixed version and it generates a summary of the bug report, the ontology should also provide information such as bug is resolved by person or community.''

UR-2: ''An ontology is required for cyber systems. Mainly the ontology should focus on attack pattern detection. The ontology should contain the taxonomy of problems and concepts related to the cyber world, E.g. weakness and vulnerability of system, target, probing techniques, impact of attack, and types of attacks, attack steps, patterns, technique and description This ontology provides a vocabulary and representation for the Common Attack Pattern Enumeration and Classification (CAPEC) which provides a publicly available, community-developed list of common attack patterns along with a comprehensive schema and classification taxonomy. Attack patterns are descriptions of common methods for exploiting software systems. They derive from the concept of design patterns applied in a destructive rather than constructive context and are generated from in-depth analysis of specific real-world exploit examples.''

UR-3: ''An ontology that contains all the pieces of the configuration of a server, the ontology should have concepts ranging from the server implementation to the user database and the policy being maintained by the server. Moreover, basic authorization and authentication manager should be there for security reasons.''

UR-4: ''I am building an ontology based system for sentiment analysis so I need an ontology related to the sentiment analysis. Ontology must contain all behaviors or emotions such as happy, sad, angry, uncomfortable, pain etc.''

2) REQUIREMENTS OF THE ACADEMICS DOMAIN
The URs of the academics domain are given as follows.
UR-5: ''Ontology for university benchmark is required. It contains information of faculty such as dean, director, Full Professor, Clerical Staff, Professor, lecturer, teaching staff and students. This ontology contains information of Graduate Course, research articles and publications. Against each publication and research, article data is stored such as research interest, title, Publication author ad publication year.''

UR-6: ''Academic ontology that must contain the information conference, journal, and author. Organization name and author name are provided. It also must have the year of conference and journal. Conference name and journal name are provided. Publications of the author must be provided in the ontology. This publication must contain some pub id.it contain pages and titles of conference and journal.''

UR-7: ''Ontology that contains university information. It contains faculty such as associate professor, full professor, researcher, teachers, and external teacher. It includes courses and these courses are taught by some teacher. Last name and first name of each faculty and student are provided. Titles of courses are part of this university information system.''

3) REQUIREMENTS OF FOOD AND DRINKS DOMAIN
The URs of the food and drinks domain are given as follows.
UR-8: ''This ontology contains the detail of all coffees. Base, drink, and a topping of coffees must be defined properly. It contains the ingredients of the coffee like condensed milk, stream milk, level of sugar and water.''

UR-9: ''Food ontology that models the ingredients of pizza and provides the vegetable ingredients that are used to make pizza. It contains the types of pizza ad its sizes such as large pizza, small pizza, meat only pizza, medium pizza, and vegetarian pizza. This ontology also contains ingredients for meat only pizza.''

UR-10: ''Ontology that model the cocktails, drinks, and beverages. This ontology describes the ingredients of the drinks and cocktails. Hot sauces and Worcestershire sauce are used to serve the beverages and drinks. Alcoholic and nonalcoholic beverages must be included in this ontology separately such as brandies and rums, coconut milk and coffees.''

4) REQUIREMENTS OF THE PEOPLE DOMAIN
The URs of the people domain are given as follows.
UR-11: ''We are developing a system that involves keeping a tight record of bio-data of people. In this regard, we need an ontology that we can align with an ontology that we are developing. The ontology should contain complete contact details of a person involving Address, country, state, cell phone number, email address etc.''

UR-12: ''We are developing a system that involves keeping a record of a person's family. In this regard, we need an ontology that we can align with the ontology that we are developing. The ontology should contain complete family details of a person's children, spouse, parents, etc. moreover, according to a person's gender the close relations he has i.e. aunt, nephew, niece etc.''

UR-13: ''An ontology is required containing concepts related to artists and their prominent works and their early life details (e.g. born, school, died etc.).''

VII. EXPERIMENTAL PROCEDURE
In this section, we describe the tools used to perform the experiment. Moreover, for evaluating the proposed approach, we devised three pseudocodes for experimental procedures, which are: 1) organize ontologies, 2) determine appropriate ontology domain, and 3) select the most appropriate ontology.

A. TOOLS USED IN EXPERIMENTATION
We performed all the experiments on an Intel Core m3-7Y30 at 1.61 GHz with 8 GB RAM, running Windows 10. For the first phase of the proposed approach, we used the Spyder IDE with the OWLready package. However, Protégé can also be used for manual extraction of terms (classes, properties, and descriptions). For the rest of the phases, we used the R Project (R) for statistical computing. Subsequently, we used the ''tm'', ''wordcloud'', ''SnowballC'', ''xlsx'', ''clues'', ''factoextra'' and ''cluster'' R packages to perform the experiments. We adopted best software engineering and programming standards [85] to implement the proposed system.

B. PSEUDOCODE 1: ORGANIZE ONTOLOGIES
Pseudocode 1 describes two main activities. The first activity selects the best weighting method for the unsupervised learners (USLs) used in this study. The second activity selects the best unsupervised learner out of K-Means, K-Medoids-Euclidean, K-Medoids-Manhattan and Fuzzy c-means.

Pseudocode 1 Organize Ontologies
foreach (ontology o in group O) do
    generate a text file t(o) containing the terms of o.
    perform the pre-processing activities and generate a VSM of t.
end
foreach (weighting method wm) do
    apply wm to the VSM.
end
foreach (unsupervised learner ul) do
    apply ul to organize the ontologies O of a group into k clusters.
    evaluate the performance of ul with wm using the evaluation criteria (average accuracy, Section VI-A).
end
Select the best wm and ul with the highest accuracy for each ontology group.
End
Result: Best weighting method and corresponding best unsupervised learner for organizing ontologies.

C. PSEUDOCODE 2: DETERMINE ONTOLOGY DOMAIN
Pseudocode 2 describes how to determine an appropriate ontology domain (for example, Computer Science, Academics) for any given UR.

D. PSEUDOCODE 3: SELECT THE MOST APPROPRIATE ONTOLOGY
The aim of Pseudocode 3 is to describe how to select the most appropriate ontology for the given UR from the ontology domain/category determined earlier by Pseudocode 2.

VIII. RESULTS AND DISCUSSION
This section discusses the results and findings of the proposed study. The efficiency of the proposed framework is evaluated in terms of organization of ontologies, predicting correct ontology group and recommendation of ontologies with respect to user requirements.

A. ORGANIZATION OF ONTOLOGIES
Pseudocode 2 Determine Ontology Domain
foreach (ontology o in group O) do
    generate a text file t(o) containing the terms of o.
    perform the pre-processing activities and generate a VSM of ur and t.
end
foreach (weighting method wm and unsupervised learner ul) do
    apply wm to the VSM.
    apply ul to organize the ontologies O and the UR into k clusters.
    evaluate the performance of ul with the corresponding wm using the evaluation criteria.
end
Select the best wm and ul with the highest accuracy to determine the appropriate ontology group o for the selected ur.
End
Result: The suggested ontology group.

We assess the effectiveness of the proposed framework in organizing ontologies into related groups (clusters) with respect to the experts' opinion. We took the help of domain experts to identify the correct ontology group/category for each ontology, for example, whether an ontology belongs to the networking, cybersecurity, or software systems category. We used these opinions as true labels and measured the accuracy of the proposed system in assigning each ontology to the correct group. The experiments were performed according to the given procedure and the results are reported with respect to the proposed evaluation model. In the context of ontology organization, we used four algorithms and five weighting methods. The experimental results are shown in Fig 7. Furthermore, the results of the organization of ontologies are also presented in Table 4.

B. PREDICTING CORRECT ONTOLOGY GROUP FOR USER REQUIREMENT
We use Pseudocode 2 and the proposed evaluation model to assess the effectiveness of the proposed framework in predicting the candidate ontology group for a given user requirement. The experimental results are shown in Fig 8. Furthermore, the results are also presented in tabular form in Table 8. In this section, we discuss the experimental results with respect to the four ontology domains and 31 user requirements. In this regard, each USL is used with its best weighting method. The main findings of the experimental results in terms of predicting the appropriate ontology category for a given UR are as follows.
• In the case of the Academics ontology domain, we observe that Fuzzy c-means (Accuracy = 0.61) outperforms the rest of the USLs with their best weighting methods.
• In the case of the People ontology domain, we observe that K-Medoids-Manhattan (Accuracy = 0.80) outperforms K-Medoids-Euclidean (Accuracy = 0.61), K-Means (Accuracy = 0.42), and Fuzzy c-means (Accuracy = 0.42) with their best weighting methods.
• In the case of the Computer Science ontology domain, we observe that Fuzzy c-means (Accuracy = 0.72) outperforms the rest of the USLs with their best weighting methods.
• In the case of the Food and Drinks ontology domain, we observe that Fuzzy c-means (Accuracy = 0.88) outperforms the rest of the USLs with their best weighting methods.
• Finally, we observe that the performance of the USLs varies with respect to the nature and size of the data.

C. ONTOLOGY RECOMMENDATION
We use Pseudocode 3 and the proposed evaluation model to assess the effectiveness of the proposed framework in selecting the correct ontology from the candidate ontology group for a given user requirement. The candidate group is the one selected by the best-performing USL. In this regard, the cosine value of each ontology with respect to the user requirement is shown in Appendix A. The ontology with the highest cosine value is recommended as the right ontology for each UR. For example, in the case of UR-7 (Table 10 in Appendix A), the university information system ontology with the highest cosine value is recommended as the right ontology. The effectiveness of the proposed framework in predicting the correct ontology for a given UR is evaluated in terms of RCO.
The key findings of the experimental results are summarized as follows:
• The proposed framework correctly recommended 8 out of 8 ontologies (RCO = 100%) for the Academics domain, 5 out of 6 (RCO = 83%) for the People domain, 7 out of 9 (RCO = 77%) for the Computer Science domain, and 7 out of 8 (RCO = 87%) for the Food and Drinks domain.
• It is observed that the proposed framework recommends 27 ontologies correctly for the 31 URs, which yields an RCO of 87%.
• Moreover, it is also observed that the description of a UR plays a vital role in predicting an appropriate ontology for it.
Considering the promising results of the proposed system in the selection of appropriate ontologies, the proposed system can be utilized to recommend ontologies to users. The proposed system can help novice or expert ontology designers, data providers, and data and knowledge engineers to accurately find the appropriate ontology. These data providers and engineers are often overwhelmed by the search results or find it too time-consuming to locate an already existing ontology because of time constraints. Subsequently, they end up creating one which is already available, doubling the cost. Moreover, unlike Linked Open Vocabularies [86], the only general ontology search engine [87], which provides popularity-based ontologies in an unordered list, the proposed system recommends only the appropriate ontology to the user. This functionality can be enhanced to recommend the top three ontologies based on their cosine values.

IX. CONCLUSION AND FUTURE WORK
The aim of the proposed framework is to organize and recommend ontologies with respect to user requirements in order to reduce the effort and time of developers. The proposed framework employs a text categorization approach and unsupervised learning algorithms. The purpose of the proposed framework is to overcome the issue of ontology selection in terms of reusability. Moreover, we also proposed an evaluation model to assess the efficacy of the proposed framework. We evaluated the proposed framework in the context of four ontology domains with 31 URs.
The key implications of the results of the proposed framework are as follows. Firstly, no single algorithm can be described as the best algorithm for the organization of ontologies and the determination of the correct ontology group for a given UR. Secondly, for the determination of the correct ontology group for a given UR, Fuzzy c-means performs best for the Academics domain, whereas K-Medoids (Euclidean and Manhattan) performs better for the People and the Food and Drinks domains. Thirdly, it is observed that no single weighting method can be recommended as the best for all USLs across all four ontology domains. Fourthly, the proposed system recommends an appropriate ontology to the user with RCO = 87%. Fifthly, though the inclusion and exclusion of ontologies from the corpus might alter the presented results, it has no effect on the context of the proposed framework; this means that the proposed framework is not a context-aware system like existing approaches. Sixthly, unlike existing approaches for ontology recommendation, the proposed framework does not need a formal specification of ontologies.
In the future, we will focus on two aspects: 1) to use n-gram for construction of feature vectors rather than the use of individual words, and 2) to assess the effectiveness of the proposed framework by considering numerous ontologies from different domains and more user requirements while focusing on the other multi-label text categorization approaches.

APPENDIX A ONTOLOGY RECOMMENDATION RESULTS
This section presents the results of the ontology recommendation. Each requirement, its candidate ontology group, and the CS values are presented in the tables. Table 9 contains the results of the Computer Science ontology group, and Table 10 presents the results of the Academics ontology group. Similarly, Tables 11 and 12 present the results of the Food and Drinks and People ontology groups, respectively.

ACKNOWLEDGMENT
The authors are immensely grateful to the three domain experts who helped them in this research by identifying the ontologies against each requirement. They would also like to show their gratitude to the students of the Semantic Web course who provided them with the requirements for the four ontology groups.