A Non-Exclusive Multi-Class Convolutional Neural Network for the Classification of Functional Requirements in AUTOSAR Software Requirement Specification Text



I. INTRODUCTION
Complex systems, such as automotive software systems, are organized into subsystems that are designed and built separately before being unified to accomplish the intended functionality. The volume of requirements documents grows as domain experts draft them for each subsystem. Comprehending the design concepts pertinent to the diverse needs challenges the requirements engineers, as cohesive information is dispersed among the multiple sets of system requirements. Requirements Engineering (RE) has evolved as an indispensable component of the software development process and a prospective workaround. RE is a sub-discipline of software engineering that deals with creating and refining software requirements specifications (SRS). The outcome of a collaborative software project hinges on its RE phase [1]. A complex system often consists of various interdependent, inter-disciplinary modules, rendering SRS a challenging exercise. Hence, RE should double down on systematic and repeatable approaches which verify that the system requirements are comprehensive, consistent, and relevant. Software Requirement Classification (SRC) identifies the category to which a specific Software Requirement (SR) belongs [1]. There are two types of requirements in SRC. Functional Requirements (FRs) specify the user requirements and product features. In contrast, Non-Functional Requirements (NFRs) describe a product's quality attributes, design and implementation constraints, and external interfaces such as security, reliability, and stability [2]. "The steering system should turn the wheels towards the left when the driver turns left" is an example of a functional requirement. The related non-functional requirement specifies how fast and how smoothly the wheels should be aligned for the car to turn left. However, NFRs are beyond the scope of this study.

The associate editor coordinating the review of this manuscript and approving it for publication was Chiu-W. Sham. VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Under the functional requirements, there are separate subsystems/classes for each functionality. For instance, in the case of the chosen dataset (AUTOSAR data), separate SRS documents are explicitly maintained for the Communication class, with some general requirements kept under the General category. The number and types of subsystems vary across enterprises.
In the automotive sector, developers of single subsystems of an automotive system are unaware of more than half of their dependencies on other subsystems [3]. For example, different teams develop the component subsystems comprising an automobile. These teams specify the requirements native to their subsystem but have minimal awareness of the relationships with other subsystems. This results in ambiguity in the data: the same requirement is specified with different terms. Requirement engineers find it arduous to comprehend the relevant information dispersed across the requirements of the several constituent subsystems.

A. MOTIVATION
In the automotive industry, the software system's inputs, conditions, actions, and expected outputs are all comprehensively documented in software requirements specifications. Despite the requirement documentation being well described, automatic requirement categorization is challenging due to the innate ambiguity of natural languages and the recourse to multiple terminologies and sentence patterns to represent a specific requirement [4]. SRC holds great potential, as it classifies requirement statements that developers can comprehend while strategizing the system components pertinent to fulfilling a particular requirement. For example, effective classification makes prioritization and filtering of relevant requirements much easier.
The high ambiguity in the elicitation of requirement documents across the subsystems makes automatic classification more error-prone. For example, the Safety and Powertrain FR classes of the AUTOSAR SRS documents [5] share similar safety measures and associated software (SW) components. The classification becomes even more difficult if the dataset is imbalanced.
The AUTOSAR dataset selected in this paper also suffers from severe imbalance among classes and from ambiguity. Therefore, the requirement classification carried out in this paper is not designed as a single plain neural network but is instead viewed as a non-exclusive multi-class problem (NEMO). NEMO is a novel deep neural network (DNN) framework for multi-label classification introduced in this paper. NEMO turns this N-class classification problem into N two-class binary classification sub-problems. Each binary classifier is developed using the one-versus-the-rest approach. For example, among the N binary classifiers for N classes, the classifier developed for class c1 is trained with the data from class c1 annotated as positive and the data from all the other classes annotated as negative.
The outputs of the N DNN models are unified to infer the classes of the requirements. This paper proposes a deep learning (DL) method for automatically classifying functional requirements from SRS documents.
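The unification of the N binary outputs can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the `nemo_predict` helper, the 0.5 threshold, and the class probabilities shown are all assumptions.

```python
# Minimal sketch (not the paper's code): unify N one-vs-rest binary
# classifier outputs into a non-exclusive (multi-label) prediction.
def nemo_predict(binary_probs, threshold=0.5):
    """binary_probs: dict mapping class name -> probability that the
    requirement belongs to that class (one entry per binary classifier).
    Returns every class whose classifier fires; a requirement may
    therefore receive zero, one, or several labels."""
    return sorted(c for c, p in binary_probs.items() if p >= threshold)

probs = {"Communication": 0.91, "BSWGeneral": 0.62, "HMI": 0.08}
print(nemo_predict(probs))  # ['BSWGeneral', 'Communication']
```

Because each classifier fires independently of the others, a requirement may receive several labels at once, which is what makes the scheme non-exclusive.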

B. OVERVIEW
This paper focuses on multi-class mappings of various functional specifications from AUTOSAR SRS documents. Data analysis techniques and a DL framework, specifically a Convolutional Neural Network (CNN), classify functional software requirements into appropriate categories, making the classification process efficient and less laborious. This paper investigates three specific aspects of artificial intelligence (AI) techniques for the RE of SRS documents.
• Word embeddings to represent the SRS documents.
• CNNs to build the classifier framework to classify functional requirement classes.
• The classifier network is intended to categorize documents in a non-exclusive way.

II. RELATED WORKS
The classification of software requirements with textual analysis is an evolving topic in software engineering research to enhance software quality. Casamayor et al. [6] proposed a semi-supervised recommender system model, built using the Expectation-Maximization method, to aid requirement engineers in detecting and classifying NFRs from descriptive text. Slankas and Williams [7] utilized various approaches, including K-Nearest Neighbours, Support Vector Machine, and Naive Bayes classifiers, to extract and classify NFRs from the PROMISE NFR dataset into 14 distinct categories and assessed their performance. Reference [8] classified requirements documents from the PROMISE repository into Functional Requirements (FRs), NFRs, and subcategories of NFRs using the Support Vector Machine (SVM) method. Reference [9] employed a language model and the most common keywords for identifying NFRs from requirements documents. In [10], Word2Vec embeddings of the PROMISE dataset are sent through LSTM and GRU networks as input for classification. Reference [11] utilized two text vectorization approaches and four machine learning methods to categorize the requirements into two categories (functional requirements and non-functional requirements). In [12], a multi-label requirement classifier based on CNN classified NFRs into five categories: reliability, efficiency, portability, usability, and maintainability. Researchers have given less attention to FR classification, which is referenced in fewer journals than NFR classification [13]. Reference [14] designed five integrated models for categorizing FR statements using Naive Bayes, Support Vector Machine (SVM), Decision Tree, Logistic Regression, and Support Vector Classification (SVC) algorithms to enhance their accuracy.
The novelty in this approach is that DL techniques automatically classify FRs of enterprise applications into appropriate categories. A binary classifier is designed for each class for non-exclusive classification, considering the innate dependencies and data imbalance among classes.

III. DATA DESCRIPTION
AUTOSAR SRS documents are available in Portable Document Format (PDF). These data are converted to raw text for further processing. The text is carefully pre-processed and cleansed using appropriate regular expressions to remove as much noise as possible. The documents contain various descriptions as well as details of some diagrammatic representations, which are excluded from the analysis. The 21 functional requirements/specifications classes are selected from the AUTOSAR website to highlight data imbalance and interdependencies. Table 1 lists the classes and their specifics. Figure 1 is a pie chart that visualizes how unbalanced the classes are in terms of the number of sentences.

IV. CLASSIFICATION FRAMEWORK
Classification of software requirements involves four phases, as shown in Figure 2. They are:
• Data pre-processing: The raw unstructured data from the AUTOSAR documents is converted to text format, cleaned, and normalized to ensure the input data is suitable for further processing.
• Vectorisation of texts: The pre-processed data is given as an input to the AUTOSAR Doc2vec model to infer conceptual information from documentation as vectors (text embedding).
• Classification: The vectorized documents from Phase 2 are used to train and predict with the classification models, a Multi-Layer Perceptron (MLP) and a CNN. The pre-processed document from each class is given to 21 separate binary classifier models for classification.
• Evaluation: The requirements' predicted labels and true labels are used to calculate the performance measures presented in Section V.
A. PHASE 1: DATA PRE-PROCESSING
The AUTOSAR PDF files are converted into editable text documents prior to data cleansing. Tokens are produced from the raw text input during data pre-processing. Only comprehensible information is retained after eliminating other data such as tables, captions for figures and tables, page numbers, section titles, and punctuation. Sentences longer than 30 tokens are broken up into smaller ones.
As most SW components utilise capital letters for abbreviations, they are excluded from case folding, which is used to unify the cases throughout the entire text. Figure 3 displays a line graph of the distribution of requirement classes in relation to the number of sentences (in thousands) in the AUTOSAR SRS documentation before (in blue) and after (in red) pre-processing. When PDF files are converted to text format, the data changes noticeably, since tables are turned into distinct lines during the conversion. As they do not offer information that is helpful for the categorization process, the table contents and figure names were all eliminated during pre-processing.
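The pre-processing steps above can be sketched roughly as follows. This is a hedged sketch, not the paper's code: the regular expression, the all-caps heuristic for detecting abbreviations, and the helper names are assumptions.

```python
import re

# Illustrative pre-processing sketch (assumptions: token = whitespace-split
# word; abbreviations are ALL-CAPS tokens, which the paper excludes from
# case folding; sentences longer than 30 tokens are split into chunks).
MAX_TOKENS = 30

def case_fold(tokens):
    # Lower-case everything except all-caps abbreviations such as 'ECU' or 'CAN'.
    return [t if t.isupper() and len(t) > 1 else t.lower() for t in tokens]

def split_long(tokens, limit=MAX_TOKENS):
    # Break token sequences longer than `limit` into smaller chunks.
    return [tokens[i:i + limit] for i in range(0, len(tokens), limit)]

def preprocess(sentence):
    sentence = re.sub(r"[^\w\s-]", " ", sentence)   # strip punctuation noise
    tokens = case_fold(sentence.split())
    return split_long(tokens)

print(preprocess("The ECU shall Transmit the frame."))
```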

B. PHASE 2: DOCUMENT REPRESENTATION
In order to carry out classification, document representation/embedding is a crucial step that provides quantitative characteristics of the text: the text is transformed into a vector representation that captures semantic elements of the documents. The smallest unit of a written or spoken language with a practical meaning is a word, so the document representation is constructed over word embeddings. Word embeddings are rich vector representations of words that capture the syntactic and semantic links between them [15]. In this paper, two standard pre-trained word vector models are used to create the word embeddings for the SRS documents: a) Word2Vec vectors [16], and b) FastText's Common Crawl word vectors containing subword information. FastText and Word2Vec both aim to provide distinctive vector representations of words, and both learn word vectors from a word's immediate context; FastText is an extension of Word2Vec [17]. The manner of prediction differs between the two: Word2Vec predicts words from neighbouring words, whereas FastText works at a finer level, employing character n-grams and subword information to predict words. FastText vectors containing subword information were chosen in particular for their capacity to deduce a superior semantic vector representation for unseen words [18]. Since the AUTOSAR dataset includes domain-specific terminology, word vectors pre-trained on millions of general-domain documents may produce a poor semantic representation for certain words. Therefore, for improved semantic representation, the pre-trained word embeddings are retrained using the cleaned AUTOSAR data.
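FastText's character n-gram decomposition, which underlies its handling of unseen words, can be illustrated as follows. This is a minimal sketch of the idea, not the library's implementation; the n-gram range of 3 to 6 characters mirrors FastText's usual default.

```python
# Sketch of FastText-style character n-grams: each word is wrapped in
# boundary markers '<' and '>' and decomposed into overlapping character
# n-grams; the word vector is then built from the n-gram vectors, which
# lets even out-of-vocabulary words receive a representation.
def char_ngrams(word, n_min=3, n_max=6):
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

print(char_ngrams("CAN", n_min=3, n_max=3))  # ['<CA', 'CAN', 'AN>']
```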
The pre-processed AUTOSAR data of each class is coalesced into a single text file. To create the sentence/document vectors, the merged text file is used to train a Doc2Vec model along with the previously trained word vectors [19].
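As a rough, dependency-light illustration of how a fixed-length sentence vector can be derived from word vectors: the paper uses the Doc2Vec method for this, not averaging, so the mean-of-word-vectors below is only a stand-in for intuition, and the tiny 4-dimensional vocabulary is invented.

```python
import numpy as np

# Stand-in for the Doc2Vec step (the paper trains gensim-style Doc2Vec on
# the merged AUTOSAR text with pre-trained word vectors): here a sentence
# vector is simply the mean of its word vectors. Illustrative vocabulary.
word_vecs = {
    "ecu":      np.array([0.2, 0.1, 0.0, 0.4]),
    "shall":    np.array([0.0, 0.3, 0.1, 0.1]),
    "transmit": np.array([0.5, 0.0, 0.2, 0.0]),
}

def sentence_vector(tokens, vecs, dim=4):
    known = [vecs[t] for t in tokens if t in vecs]
    if not known:                      # fully out-of-vocabulary sentence
        return np.zeros(dim)
    return np.mean(known, axis=0)

v = sentence_vector(["ecu", "shall", "transmit"], word_vecs)
```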

C. PHASE 3: DEEP LEARNING TECHNIQUES
The goal is to discover the multi-class mappings of the different functional requirements to the 21 known classes. For this investigation, basic deep neural network designs, an MLP and a 1-D CNN, were used. The sentence embeddings were then divided into training and testing sets, with training receiving the largest share (80%).
The classifier model is trained for each of the 21 classes separately with the one-against-all (OAA) strategy [20]. In the OAA strategy, the text embeddings of the concerned class are labelled '0', and the text embeddings of the other classes are randomly sampled and labelled '1' for (binary) classification. The sampling ensures that the deviation between the number of texts/sentences in class '0' and class '1' is kept minimal. For instance, to train the classifier that detects whether a requirement belongs to the 'BSWGeneral' class, the text embeddings of the sentences belonging to 'BSWGeneral' are labelled '0', and the text embeddings belonging to the other 20 classes are randomly sampled and labelled '1'. The data division ought to be roughly equal: if the class of interest contains only a small amount of data, such as 100 sentences (as in the case of the Human Machine Interface (HMI) class in the AUTOSAR SRS documents), then the text embeddings from the remaining 20 classes are randomly selected, but only for an aggregate of 100 sentences.
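The balanced one-against-all sampling can be sketched as follows. The helper name and the toy embeddings are hypothetical; the labelling convention ('0' for the concerned class, '1' for the sampled rest) follows the text.

```python
import random

# Sketch of the balanced one-against-all sampling described above: the
# concerned class is labelled '0' and an equal-sized random sample drawn
# from all remaining classes is labelled '1'.
def oaa_dataset(embeddings_by_class, target, seed=0):
    pos = embeddings_by_class[target]
    rest = [e for c, embs in embeddings_by_class.items()
            if c != target for e in embs]
    rng = random.Random(seed)
    neg = rng.sample(rest, min(len(pos), len(rest)))
    X = pos + neg
    y = [0] * len(pos) + [1] * len(neg)   # '0' = concerned class, per the paper
    return X, y

data = {"HMI": [[0.1], [0.2]],
        "Communication": [[0.3], [0.4], [0.5]],
        "Tools": [[0.6]]}
X, y = oaa_dataset(data, "HMI")
```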
Separate binary classifiers for each class help to prevent the over-fitting issue that is brought on by imbalanced classes. The best DL model classification parameters are found via grid search.
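A plain grid search of the kind mentioned can be sketched as follows. This is illustrative only: the hyper-parameter grid and the `evaluate` scoring stand-in are invented, not the paper's tuned search space.

```python
from itertools import product

# Minimal grid-search sketch: evaluate every hyper-parameter combination
# and keep the best-scoring one. `evaluate` stands in for training and
# validating a model with the given parameters.
grid = {"filters": [32, 64], "kernel_size": [3, 5], "dropout": [0.2, 0.5]}

def grid_search(grid, evaluate):
    keys = list(grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, combo))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy scoring function so the sketch runs end to end.
toy = lambda p: p["filters"] / 64 - p["dropout"]
best, score = grid_search(grid, toy)
```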

D. MULTI-LAYER PERCEPTRON
A multilayer perceptron (MLP) is a subclass of feed-forward DNN. It uses one input layer and one output layer and has several hidden non-linear layers. The proposed architecture, shown in Figure 4, has two hidden layers. The model learns a mapping that matches the training inputs to the appropriate target labels.
Given a set of features X = {x1, x2, x3, ..., xm} and a target y, where X is the sentence vector matrix, xi ∈ R^n is the n-dimensional vector corresponding to the i-th sentence in an FR class, and y is the p × 1 vector of class labels. The MLP uses the Rectified Linear Unit (ReLU), a non-linear activation function, and Binary Cross-Entropy as the loss function for classification. Figure 4 shows the architecture of the model.
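The forward pass and loss described here can be sketched in NumPy as follows. This is an illustrative sketch, not the trained model: the layer sizes and the random weights are assumptions.

```python
import numpy as np

# NumPy sketch of the MLP forward pass the section describes (two hidden
# ReLU layers, sigmoid output, binary cross-entropy loss); sizes are
# illustrative, not the paper's tuned values.
rng = np.random.default_rng(0)
n, h1, h2 = 100, 64, 32                      # input dim and hidden sizes
W1, b1 = rng.standard_normal((h1, n)) * 0.1, np.zeros(h1)
W2, b2 = rng.standard_normal((h2, h1)) * 0.1, np.zeros(h2)
W3, b3 = rng.standard_normal((1, h2)) * 0.1, np.zeros(1)

relu = lambda z: np.maximum(z, 0.0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward(x):
    a1 = relu(W1 @ x + b1)
    a2 = relu(W2 @ a1 + b2)
    return sigmoid(W3 @ a2 + b3)[0]          # P(class '1')

def bce(p, y, eps=1e-12):
    # Binary cross-entropy for a single prediction p against label y.
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

x = rng.standard_normal(n)                   # one sentence vector
p = forward(x)
loss = bce(p, 1.0)
```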

E. ONE DIMENSIONAL-CONVOLUTIONAL NEURAL NETWORK
The Convolutional Neural Network (CNN) is a deep neural network that excels at automatically extracting features from input that may be processed in a ''grid-like structure'' [21]. In other words, CNNs are made to benefit from the location and order of the input pieces during learning, making them more suitable for pattern recognition tasks [22], [23]. Although MLP may learn non-linear models, its susceptibility to feature scaling is addressed with 1D-CNN. CNNs were first developed for image recognition, but they have also been successful in textual applications [24], [25].
Let xi ∈ R^n be the n-dimensional sentence vector corresponding to the i-th sentence in an FR class. A filter is used in a convolution operation to create a new feature vector from a sentence vector (input). Here, w ∈ R^k and b ∈ R are the weight and bias terms, respectively, and f is a non-linear activation function that learns the patterns in the vector. The Rectified Linear Unit (ReLU) is the activation function and Binary Cross-Entropy is the loss function. Figure 5 depicts the model architecture.

Table 2 gives the overall performance of the DL networks described above over two distinct feature scenarios: a retrained vector model and an original model using in-house vocabulary. FR classes whose data was comparable in size to that of other classes (text content interpreted as functional requirements) achieved model accuracies greater than 70%. Understandably, the classes with the lowest model accuracy (20%) are also the ones with the least data. This suggests that, if balanced data were available across classes, the problem has very promising potential for success. Because the data is highly skewed and the classes with more data reach accuracies of up to 95%, the total model accuracy in Table 2 exceeds 70%. Since the model's overall accuracy alone cannot be trusted, it is vital to examine how each class performs under the proposed model. Figure 6 shows the class-by-class accuracies for each of the 21 classes for the FastText retrained model.
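The convolution operation with a filter w of width k, producing features c_i = f(w · x_{i:i+k-1} + b) with f = ReLU, can be written out directly. The values below are illustrative; a real model learns w and b and applies many filters.

```python
import numpy as np

# NumPy sketch of a 1-D convolution: a filter w of width k slides over
# the input vector, producing c_i = ReLU(w . x_{i:i+k-1} + b).
def conv1d(x, w, b=0.0):
    k = len(w)
    c = np.array([np.dot(w, x[i:i + k]) + b for i in range(len(x) - k + 1)])
    return np.maximum(c, 0.0)                # ReLU activation

x = np.array([0.5, -1.0, 2.0, 0.0, 1.0])    # a 5-dimensional input
w = np.array([1.0, -1.0])                    # filter of width k = 2
features = conv1d(x, w)
print(features)  # [1.5 0.  2.  0. ]
```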

V. RESULTS AND DISCUSSION
Small validation sets (fewer than 50 sentences) might cause statistical uncertainty, since a single validation set may not accurately reflect the dataset as a whole. K-fold cross-validation is a method for resolving this conundrum: it involves repeating the training and testing process on k randomly chosen, non-overlapping divisions of the dataset and averaging the results over all k folds [26]. Figure 7 shows that there are several classes with model accuracy as low as 0%, or none at all. One can only assume that the explanation is a lack of data; additional infusions of data for these FR classes are required. There are also FR classes with very little data (149 lines), such as HMI, yet with highly accurate models (98%). The apparent concern is how to increase trust and confidence in the accuracy claims made by the 21 FR models. The accuracy of each of the 21 FR class models is shown in a histogram in Figure 7, along with a z-test statistic that indicates the level of confidence in each accuracy. The z-test statistic depends on the relative data size (in number of lines, aka specifications). According to Figure 7, the accuracy varies depending on how general the FR classes are. The FR classes ''Communication'' and ''BSW General'', which have large amounts of data (more than 2000 lines), have strong model accuracy and a high level of confidence (>80%). The content of poorly performing classes like ''HMI'' and ''Body and Comfort'' is particularly ambiguous or overly generalized, which confuses the classifier model. As a result, these classes have strong model accuracy (>90%) in classification but low confidence scores. Figure 8, which displays a heat-mapped confusion matrix of the 21 FR classes, makes this behavior obvious. The class ids in the rows and columns of the confusion matrix are aligned with the class ids in Table 1. The columns display the predictions, while the rows represent the ground truth.
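The two ideas in this discussion, k-fold splitting and qualifying an accuracy by how much data backs it, can be sketched as follows. This is illustrative: the normal-approximation interval below is one way to realize a z-style confidence measure, not necessarily the exact statistic used in the paper.

```python
import math

# Sketch of (a) k-fold splitting and (b) a normal-approximation 95%
# confidence interval for a per-class accuracy, which widens as the
# number of lines of data for that class shrinks.
def kfold_indices(n, k):
    # Partition indices 0..n-1 into k non-overlapping folds.
    fold = math.ceil(n / k)
    return [list(range(i, min(i + fold, n))) for i in range(0, n, fold)]

def accuracy_interval(acc, n, z=1.96):
    # Wide interval when n (lines of data) is small, narrow when large.
    half = z * math.sqrt(acc * (1 - acc) / n)
    return max(0.0, acc - half), min(1.0, acc + half)

folds = kfold_indices(10, 5)
lo, hi = accuracy_interval(0.98, 149)   # e.g. the HMI class: 98% on 149 lines
```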
Figure 8 makes it evident that several of the low-confidence classes in the confusion matrix predict false positives relative to the more confident classes. For instance, class '3' ('Communication') is by far the most precise and certain of all the classes, as seen in Figure 7. In contrast, poorly performing classes like class '8' ('HMI'), class '20' ('Tools'), and class '1' ('Body and Comfort') produce a large number of false positives, owing to the limited amount of data that these FR classes have access to. Finally, we assert that although the multi-class classifier network is widely used in multi-label classification, it might not be a workable option in cases where data are scarce. With more balanced data it will undoubtedly improve, but the training effort will grow considerably. For the model to achieve an accuracy and confidence level above 90%, deeper and more complex architectures are required, which is a training challenge from a computational point of view. The suggested classifier model accounts for the possibility that FR requirements might simultaneously describe several classes.

VI. CONCLUSION
This article discusses the general viability of explainable artificial intelligence models for automotive requirements engineering. Research indicates that Model-Based Engineering is widely employed in the automobile industry and is even used by some practitioners for requirements engineering. Domain-specific terms, however, have caused a number of issues with requirements engineering methods at automotive businesses. In this study, we present an NLP pipeline for categorizing functional requirements from AUTOSAR SRS documents into several classes. MLP and CNN were used in the classification model's development.
All 21 classes of AUTOSAR documents were used to train the twenty-one binary classifier models for multi-label categorization. The Doc2Vec model vectorizes the sentences in the AUTOSAR documents along with the word vectors of the sentences inferred from the pre-trained word vectors, Word2Vec and FastText. The word vectors were produced both from pre-trained public embeddings and by re-training the existing embedding models using in-house data. In both cases, the retrained vector classifier models outperformed the initial vector models in terms of accuracy. The maximum accuracy, however, is provided by the retrained vector CNN classifier model (77%).
SANJANASRI JP received the Ph.D. degree in August 2021 under the guidance of Dr. Soman KP. She currently serves as an Assistant Professor in Computational Engineering and Networking (CEN) with the Multi-Disciplinary Research Center, Amrita School of Computing, Amrita Vishwa Vidyapeetham. She has been closely associated with CEN since 2010. She has published several conference and journal papers. She has worked on projects in several areas, including image processing, speech processing, and IoT devices such as Raspberry Pi, Arduino, and Jetson Nano. Her specific research interests include machine learning, geometric deep learning, natural language processing, and big data.
VIJAY KRISHNA MENON is currently a Data Scientist (unicorn) and computational expert specialized in machine learning, deep learning, and other data-driven techniques in big data. He works predominantly with big data on financial analytics, smart grid, and other data-driven areas, including NLP and text mining. He was a consultant to various companies, providing his experience in NLP and data analytics. He is uniquely skilled in transforming complex mathematical models and algorithms into efficient computational implementations. He has a good research record on real-time analytics and streaming data of large volume and velocity, as well as on designing efficient data pipelines. The recent areas he has worked on include, but are not limited to, text mining, financial forecasting and stock market dynamics, data-driven cybersecurity, computational epidemiology (modeling COVID-19), and spatiotemporal analytics in various domains.
SOMAN KP currently serves as the Head and a Professor with CEN and the Associate Dean of the Amrita School of Computing, Amrita Vishwa Vidyapeetham. He has more than 25 years of research and teaching experience in artificial intelligence and data science related subjects at the Amrita School of Engineering, Coimbatore. He has around 450 publications to his credit in reputed journals, such as IEEE Transactions, IEEE ACCESS, and Applied Energy, and in conference proceedings. He published four books, namely Insight into Wavelets, Insight into Data Mining, Support Vector Machines and Other Kernel Methods, and Signal and Image Processing-The Sparse Way. His book Insight into Data Mining was translated into Chinese. He is the most cited author in Amrita Vishwa Vidyapeetham in the area of artificial intelligence and data science (more than 5500 citations). He was listed among the Top-10 computer science faculty by DST, Government of India, from 2009 to 2013, by Career 360 and MHRD from 2017 to 2018, and also in the list of the most prolific authors in the world, prepared by Springer Nature. Under his guidance, CEN is running the M.Tech. degree in computational engineering and networking (data science) and the B.Tech. degree in computer science and engineering (artificial intelligence). He has guided 12 Ph.D. students so far and is currently guiding ten research scholars.
ATUL K. R. OJHA is currently pursuing the Ph.D. degree with the National University of Ireland, Galway, where he also serves as an Adjunct Professor. He is also the Co-Founder of Panlingua Language Processing LLP, India. He has published several papers in the area of NLP and has conducted shared tasks and workshops at notable NLP conferences, such as COLING and EMNLP. His research interests include data analytics and NLP, machine translation and localization, speech-to-speech and sign MT, digital humanities, metrics for MT evaluation and translation quality, corpus mining, semantic/syntactic parsing, machine learning, cognitive translation analysis, hate and aggressive speech, big data, computational linguistics, and artificial intelligence.