A Survey of Automatic Text Summarization: Progress, Process and Challenges

With the evolution of the Internet and multimedia technology, the amount of text data has increased exponentially. This volume of text is a precious source of information and knowledge that needs to be summarized efficiently. Text summarization reduces a source text to a compact variant while preserving its knowledge and actual meaning. Here we thoroughly investigate automatic text summarization (ATS) and summarize the widely recognized ATS architectures. This paper outlines extractive and abstractive text summarization technologies and provides a deep taxonomy of the ATS domain. The taxonomy ranges from the classical ATS algorithms to modern deep learning ATS architectures. The workflow and significance of every modern text summarization approach are reviewed, together with its limitations and potential recovery methods, and the review also covers feature extraction approaches, datasets, performance measurement techniques, and challenges of the ATS domain. In addition, this paper concisely presents the past, present, and future research directions in the ATS domain.


I. INTRODUCTION
THE amount of textual material on the web and in other libraries is growing tremendously every day. Using this information has become an expensive and time-consuming activity, since the data expands in large quantities and includes irrelevant content or noise. Text summarization is a method used to condense such data. Manual text summarization is undoubtedly an effective way to preserve the meaning of the text; however, it is a time-consuming activity. Another approach is automatic text summarization (ATS), in which practical algorithms are programmed into computers to produce summaries of information. Text summarization creates a brief and accurate overview of a lengthy text document by concentrating on the essential parts that provide valuable details while maintaining the overall context. In natural language processing (NLP), automatic text summarization is a method of evaluating, comprehending, and extracting information from human language. Nowadays, people from every domain, from students to researchers and from business leaders to business analysts, work with numerous documents. People often struggle to find the relevant part within a document or set of documents, and this is where ATS can be very helpful. The fundamental goal of ATS is to create a compact and persuasive summary that retains the critical information from the document. ATS also aims to generate an overview that condenses the significant ideas of the input content into a small amount of space. Furthermore, ATS systems help users obtain the essential points of the original content without reading the complete document. Users profit from the automatically generated summaries, which save them a great deal of time and work.
This study aims to provide a thorough review of various ATS research projects. To acquire deep knowledge, researchers require a sense of what has already been done and of the further possibilities in this broad topic. Therefore, this study aims to assist academics and professionals in forming an idea of the evolution of ATS, its research progress, and future research directions. In addition, the apparent obstacles or limitations facing future research in this field are also discussed in this paper.
VOLUME 4, 2016
Text summarization was introduced by H.P. Luhn [1] in the 1950s and was implemented on the IBM 701, an early commercial computer. Using the bag-of-words approach, he counted word frequencies based on their occurrence. The most frequent words were then selected, and each sentence was assigned a score depending on how often those words occurred in it. Gradually, linguistic knowledge was taken into account, and methods began to use word types and word formation through natural language processing (NLP), with the extraction, categorization, and classification of texts as the main targets. After that, the evolution of NLP between 1990 and 2000 introduced the conversion of sentences into vectors and of words into their base forms. The introduction of advanced NLP techniques such as neural word embedding [2], Bag of Words (BoW) [3], and word2vec [4], and of modern deep learning approaches such as recurrent neural networks (RNN) [5] and long short-term memory (LSTM) [6], has brought significant progress to the ATS domain. The evolution of ATS from the 1950s to the present is reviewed in this study.
Text summarization processes from the 1970s to the early 2000s are considered traditional methods. Traditional text summarization processes require a thorough knowledge of the document to find the essential keywords. ATS has also become an appealing domain because of its influential role in the study and expansion of automation [7]. The improvement behind modern ATS is achieved by following a standard structure: with properly designed processes, the text is trimmed and interpreted so that summaries become more accurate and fluent.
Because this comprehensive survey investigates ATS in depth, it required a collection of scholarly research published between 1998 and 2021. We followed the systematic literature review (SLR) approach proposed by Kitchenham [8], [9], which consists of three phases: planning, conducting, and reporting the review. The SLR approach tries to answer the questions that could arise while progressing in this research field. The goal of this research was to examine the findings of several essential research disciplines. The necessary materials were assembled using the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) workflow diagram. The PRISMA workflow for this survey is shown in Figure 1.
The overall contributions of this paper are as follows:
• This article performs a systematic review of automatic text summarization, including the fundamental theories and evolutions.
• [...] approaches, text summarization algorithms, performance measurement, evaluation metrics, and challenges.
• The article compiles ATS architectures based on current methods, datasets, feature extraction, and summarization approaches. Moreover, this study explains the constraints and limitations of such methods.
• Finally, the study distinguishes the current difficulties and challenges of ATS architectures, along with future research directions.
The remainder of this paper is organized as follows: the literature review of existing ATS surveys is presented in Section II, the motivations and applications of ATS are described in Section III, the basic structure is provided in Section IV, the most commonly used datasets in ATS are described in Section V, the widely used pre-processing techniques are addressed in Section VI, the strategies for extracting features are described in Section VII, the main ATS approaches are discussed in Section VIII, and the algorithms are described in Section IX. The ATS approaches are reviewed in Section X, and the methods for measuring ATS performance are discussed in Section XI. The ATS challenges with potential research objectives are addressed in Section XII. Finally, Section XIII concludes the paper.

II. LITERATURE REVIEW OF EXISTING ATS SURVEY
We have investigated the existing surveys of the ATS domain, and a few of them are presented here to show the significance of this paper. Most surveys covered earlier methods and research on ATS. However, the recent trends, applicability, effects, limitations, and challenges of ATS techniques were not covered. Table 1 summarizes and compares the existing surveys on ATS.

Reference | Year | Main Purpose | Limitations
[10] | 2009 | Critical ways to summarize the texts and provides a taxonomy of the methods of summarization. | This paper doesn't include cognitive aspects, including visualization techniques and evaluations of the impact.
[13] | 2016 | Describes two definite summarization techniques, which are abstractive and extractive. | Introduces techniques and methods only.
[14] | 2017 | A study based on automated keyword extraction and text summarization. | They do not briefly review every approach they included; they missed some feature extraction models.
[15] | 2017 | Topic representation, frequency-driven, graph-based, and the effectiveness and limitations. | The recent approaches are not surveyed.
[16] | 2017 | Processes of extractive methods and multilingual text summarization are discussed. | A precise classification and idea about feature scores and extraction is missing.
[...] | [...] | [...] | A detailed classification and description of feature extraction are missing.
[18] | 2020 | To handle multi-documents for summarization based on recent research work and comparison. | Does not represent any brief discussion of any topic.
[...] their advantages or disadvantages. This research did not include abstractive or hybrid techniques. Saranyamol et al. [11] offered a thorough survey for analysts by introducing various aspects of ATS such as structure, strategies, datasets, and evaluation metrics. Gambhir et al. [16] attempted to analyze a hybrid approach including two text summarization methods. This study left many contemporary techniques out of its review. The research of Gholamrezazadeh et al. [10] presents a comprehensive and comparative study of the extractive ATS methods of the last decade. Several multilingual approaches are also discussed. Andhale et al. [13] provided a taxonomy of text summarization methods and a variety of techniques. Although the authors covered some time-consuming ATS processes, recent and more efficient methods such as machine learning were missed. Abualigah et al. [18] conducted research on how to handle multiple documents and massive web data for text summarization. The paper contains a comparative table of recent studies, but without details. Bharti et al. [14] presented a survey of research papers based on automated keyword extraction methods and techniques. It covers ideas about multiple databases that are used for document summarization.

III. MOTIVATION AND APPLICATION OF ATS
This study aims to provide an overview of current research in NLP and, more precisely, in ATS in order to advance knowledge about it. In addition, it supports the creation of new tools, methods, datasets, and resources that meet the needs of the research and industrial sectors. The advancement of NLP has made automatic text summarization usable for summarizing regular text documents and for sentiment analysis. Moreover, ATS promotes a versatile approach to research in various fields such as machine learning, natural language, cognitive science, and psychology. Drawing on multiple sources of information, this study discusses cutting-edge work and future directions in this exciting area. These collective findings are the motivation behind this research. An essential part of research on ATS is its application, which is presented below.
Recently, ATS has been extensively employed in applications based on information retrieval, information extraction, question answering, text mining, and analytics. ATS also improves the capabilities of search engines through various applications, including news summarization, email summarization, and domain-specific summarization. The applications of the ATS domain are presented below:
1) Books or Novel Summarization: ATS is used mainly to summarize long documents such as books, literature, or novels, as short documents are unsuitable for summarization. It is not easy to find context in short texts, whereas long documents are better summary material [19].
2) Social Posts or Tweet Summarization: Every day, millions of messages and posts are generated on social networking sites such as Facebook and Twitter. ATS can condense this valuable source of information into useful summaries [20].
3) Sentiment Analysis (SA): The analysis of people's views, feelings, and judgments regarding events and situations is known as sentiment analysis. SA classifies emotions, and mostly opinions from product reviews, as "Positive" or "Negative" using fuzzy logic. ATS is quite helpful for market analysts in summarizing the feelings or thoughts of hundreds of people [21].
4) News Summarization: ATS helps summarize news from many websites, such as CNN and other prominent news portals. ATS extracts the primary point of emphasis of a newspaper story, which is sometimes used as the story's headline [22].
5) Email Summarization: Email communications are unstructured and usually not syntactically well-formed domains for summarization. ATS usually extracts noun phrases and generates a summary of email messages using linguistic methods and machine learning algorithms [23].
6) Legal Documents Summarization: ATS discovers relevant prior instances based on legal questions and rhetorical functions to summarize a legal judgment document. A hybrid approach employs various methods, including keywords, critical phrase matching, and case-based analysis [24].
7) Biomedical Documents Summarization: ATS combines genetic clustering and connectivity information with a graph-based summarization process. Genetic clustering identifies the various themes of a biological document, whereas connectivity data demonstrate the relative significance of the study [25].
8) Summarization of Scientific Papers: Scientific documents are well-structured texts presenting numerous researchers' viewpoints on a similar topic. In addition, the critical points of a scientific document are found primarily in tables and figures rather than in generic text. A multi-document ATS framework combines two methods to produce a technical survey of scientific documents: first, it follows and gathers the citations, and then it uses summarization techniques to determine the content of the original and the related cited articles [26].

IV. STRUCTURE OF ATS
The basic architecture of ATS contains several sections, including an input layer that accepts two types of input: single documents and multiple documents. A single-document summarization (SDS) system selects the important sentences from the source document while respecting the maximum length of the summary [27], whereas in multi-document summarization (MDS), multiple documents are taken as input to generate the summary [28]. Generally, automatic text summarization goes through a set of stages: pre-processing, feature extraction, and the application of a summary generation algorithm or method to the source document. A single-document summarization system exploits the monolithic structure of the document, whereas a multi-document summarization system relies less on document structure. The fundamental issue with MDS arises from aggregating the many resources from which the data is taken, which may contain more redundant information than is typically present in a single document. Furthermore, it is difficult to assemble the data collected from multiple documents into a coherent text that forms a cohesive summary [29]. Summarization of news [30], scientific publications [31], emails [32], [33], product reviews [34], lecture feedback [35], [36], Wikipedia article generation [37], medical documents [38], and software project activities [39] are just a few examples of real-world applications of the multi-document summarization task [28]. Summarization can be done in two ways: abstractive or extractive. The abstractive approach is better suited than the extractive approach to producing concise summaries of contrasting views and preferences [40]. The details of both approaches are discussed in Section VIII.
The basic structure and steps of an ATS system are illustrated in Figure 2: the input (a single document or multiple documents) is pre-processed, features are extracted, and summarization approaches and algorithms are then applied to produce the summary of the document. These steps are discussed as follows.

A. PRE-PROCESSING
In the pre-processing phase, linguistic techniques are applied to the input text documents, including sentence segmentation, removal of punctuation marks, filtering of stop words, and stemming (reducing words to their common root). Widely used pre-processing techniques are discussed in Section VI.

B. FEATURE EXTRACTION
Extracting sentences by means of different features of the source document is vital for the entire summarization process. In the feature extraction phase, the selected features are applied to each sentence, and the highest-scoring sentences are chosen for the summary. The feature extraction techniques of ATS are elaborated in detail in Section VII.

C. SUMMARIZATION APPROACHES
The first and most crucial step in the summarization strategy is to determine efficient methodologies. Some methods select the essential words and sentences from the text, while others paraphrase sentences and condense the original content. Text summarization approaches are discussed in detail in Section VIII.

D. ALGORITHMS
Algorithms or methods are a more concrete way of defining text summarization. Different algorithms and methods under the various approaches are applied to obtain a better version of the summarized text. Text summarization algorithms are discussed in detail in Section IX. In the following section, we explore the datasets that are widely used in ATS applications and research.

V. DATASETS USED IN AUTOMATIC TEXT SUMMARIZATION
This survey presents an overview of the essential resources used to analyze and assess the ATS field. This section covers the datasets that are most often used for ATS evaluation. These text summarization datasets are mostly a collec[...] Table 2.

VI. PRE-PROCESSING TECHNIQUES IN ATS
Several pre-processing steps are performed to clean noisy and unfiltered text. Erroneous messages and chats, including slang or junk phrases, are known as "noisy" and "unfiltered" text. The approaches listed below are some of the most commonly used pre-processing procedures:
1) Parts Of Speech (POS) Tagging: The technique of grouping or organizing the words of a text according to speech categories such as nouns, verbs, adverbs, and adjectives is known as POS tagging [163].
2) Stop Word Filtering: Depending on the context, stop words are screened out either before or after textual analysis. "A", "an", and "by" are examples of stop words that can be identified and eliminated from plain text [164].
3) Stemming: Stemming reduces inflected and derived word forms to a primary or root form. Using linguistic processes such as affixation, text stemming transforms words so that their different word forms are taken into account [164].
4) Named Entity Recognition (NER): Words in the input text are recognized as names of items (e.g., person name, location name, company name) [165].
5) Tokenization: Tokenization is a text pre-processing technique that divides text streams into tokens, which can be words, phrases, symbols, or other meaningful pieces. The goal of this technique is to examine the words in a document [166], [167].
6) Capitalization: Diverse capitalization across documents can be problematic, so every letter in a document is converted to lowercase. All text and document words are then merged into a single feature space using this method [168].
7) Slang and Abbreviation: Slang and abbreviations are two different types of text anomalies that are addressed in the pre-processing stage. An abbreviation, such as SVM for support vector machine [169], is a shortened form of a word or phrase made up mainly of the first letters of its terms.
8) Noise Removal: Most textual data contain many unnecessary characters, such as punctuation and special characters. While important punctuation and special characters are required for human interpretation of documents, they can cause problems for classification algorithms [170].
9) Spelling Correction: Spelling correction is an optional pre-processing step. Typos are common in texts and documents, particularly in online media text datasets (e.g., Twitter) [171].
10) Lemmatization: Lemmatization is the process of replacing a word's suffix with a different one or removing the suffix to obtain the basic word form (the lemma). Its main application area is natural language processing [172], [173].
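As an illustration of how several of the steps listed above can be chained, the following minimal sketch combines sentence segmentation, lowercasing, noise removal, tokenization, stop-word filtering, and stemming. It is not taken from any of the surveyed systems: the stop-word list, the suffix list, and all function names are assumptions chosen for brevity, and the stemmer is a naive suffix stripper rather than an established algorithm.

```python
import re

# Tiny illustrative stop-word list (assumption); real systems use larger lists.
STOP_WORDS = {"a", "an", "the", "by", "of", "and", "is", "are", "in", "to"}

# Suffixes for a naive stemmer (assumption; this is not a Porter stemmer).
SUFFIXES = ("ing", "ed", "es", "s")


def split_sentences(text):
    """Sentence segmentation: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase the sentence and keep alphanumeric runs only.

    Lowercasing covers the capitalization step; dropping every other
    character covers punctuation and special-character noise removal.
    """
    return re.findall(r"[a-z0-9]+", sentence.lower())


def remove_stop_words(tokens):
    """Stop-word filtering against the illustrative list above."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Return one list of cleaned, stemmed tokens per sentence."""
    return [
        [naive_stem(t) for t in remove_stop_words(tokenize(s))]
        for s in split_sentences(text)
    ]
```

Calling `preprocess` on a short text returns a list of token lists, one per sentence, which can then be passed to a feature extraction step. POS tagging, NER, spelling correction, and lemmatization are omitted because they normally rely on trained models or dictionaries.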

VII. FEATURE EXTRACTION IN ATS
Feature extraction is a technique for discovering topic sentences and essential data traits or attributes in the source documents. ATS follows two phases to locate the important sentences in the text: feature extraction and text representation. This section describes the most commonly used features and text representation approaches for selecting sentences for text summarization.

A. FEATURES
Collecting the essential features is the first phase of the feature extraction process. To find a vital sentence in a document, the sentences must be represented as vectors or scored. Some features are used as attributes to describe the text for this task. The most prevalent features for calculating the score of a sentence and indicating the degree to which it belongs in a summary are given below:
1) Term Frequency (TF): The TF metric is used to determine the importance of terms in a single document [174]. As one of the most fundamental properties in ATS, it is commonly employed to represent a word's weight.
2) Term Frequency-Inverse Sentence Frequency (TF-ISF): According to the text summarization survey, the most relevant feature extraction approach measures the term frequency-inverse sentence frequency among the sentences in all documents [175]. The resulting weights appear to be reasonable indicators of meaningful sentences, and they are quick and straightforward to calculate.
3) Position Feature: It is usually considered that the beginning and last sentences would provide more informa[...]
4) [...] whether it is summary-worthy. In summarization, it may be wrong to assume that a sentence is worthy of mention based on its length. Sentences that are very long or comparatively short relative to the other sentences in the source material are usually not included in the summary [177].
5) Sentence-Sentence Similarity: The resemblance of the query sentences to other sentences in the text may be helpful for summarization. This feature extraction process can be performed in various ways [178].
6) Title Feature (Tif): Sentences containing terms [179] from the headline may suggest the document's theme and are more likely to be included in the summary.
7) Phrasal Information (PI): The proportion of phrases is always helpful in summarizing. A collection of phrases P includes adjective phrases (ADJP), noun phrases (NP), prepositions (PPM), and verbal phrases (VP) [180].
8) Title Similarity (TS): A sentence receives a good score if it has the most terms in common with the title. Title similarity can be determined from the number of words in a sentence that also appear in the title and the total number of words [116].
9) Sentence Position (SP): This feature determines where a sentence appears in the text. The importance of a sentence is decided by where it appears in the text, for example whether it is among the first five sentences of a paragraph [177].
10) Thematic Word (TW): This feature is associated with domain-specific terms that appear frequently in a text and are most likely relevant to the document's topic. The score is calculated as the ratio of the number of thematic words in the sentence to the maximum number of thematic words in a sentence [179].
11) Numerical Data (ND): A statement incorporating numerical data is generally important and is most likely to be included in the summary of a document. The score is calculated by dividing the amount of numerical data in a sentence by the length of the sentence [116].
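To make a few of these features concrete, the sketch below computes term frequency, a TF-ISF sentence score, title similarity, a sentence position score, and a numerical data score. The exact formulas in the cited works may differ: averaging the TF-ISF weights over the sentence, normalizing title similarity by the number of title words, and the linear decay over the first five sentences are assumptions made here for illustration, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_frequency(tokens):
    """TF: raw count of each term in a token list."""
    return Counter(tokens)


def tf_isf_scores(sentences):
    """Mean TF-ISF weight per sentence, with ISF(t) = log(N / sf(t)).

    N is the number of sentences and sf(t) the number of sentences
    containing t. Averaging over the sentence is an assumption.
    """
    n = len(sentences)
    sentence_freq = Counter(t for s in sentences for t in set(s))
    scores = []
    for s in sentences:
        tf = Counter(s)
        total = sum(tf[t] * math.log(n / sentence_freq[t]) for t in tf)
        scores.append(total / len(s) if s else 0.0)
    return scores


def title_similarity(sentence, title):
    """Share of distinct title words that also occur in the sentence."""
    title_words = set(title)
    if not title_words:
        return 0.0
    return len(title_words & set(sentence)) / len(title_words)


def position_score(index, window=5):
    """Assumed scheme: 1.0 for the first sentence, decaying linearly to 0
    after the first `window` sentences (index is zero-based)."""
    return max(0, window - index) / window


def numerical_score(sentence):
    """Fraction of tokens in the sentence that are purely numeric."""
    if not sentence:
        return 0.0
    return sum(t.isdigit() for t in sentence) / len(sentence)
```

In practice these per-sentence scores are combined, for example by a weighted sum or by the fuzzy rules discussed in Section IX, to rank sentences for inclusion in the summary.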

B. TEXT REPRESENTATION
Text representation models are used to put the input documents into a form better suited to processing. In NLP, text representation means translating words into numbers so that computers can comprehend and decode patterns within a language. Generally, these approaches establish a connection between a chosen term and its context words in the document. Some popular text representation methods, such as n-gram, bag-of-words, and word embedding, are discussed below:
1) N-gram: The n-gram is well suited to multi-language operation because it does not require any linguistic preparation. An n-gram is a sequence of N words or characters. The model is simple to create, and the text can be represented by a vector, which is usually of a reasonable size. Unigrams, bigrams, trigrams, quadgrams, and higher-order n-grams comprise the set of text n-grams [181], [182], [183], [184], [185], [63]. The n-gram has some limitations: although a larger N generally gives a better model, it also demands far more processing and memory. N-grams are also a sparse representation of language, as the model is based on the likelihood of terms co-occurring. All words that are not present in the training corpus are assigned a probability of zero.
2) Bag of Words (BoW): The most primitive sort of numerical text representation is the bag-of-words model [3]. A phrase, like a single term, can be expressed as a bag-of-words vector [65]. It is a shortened and simplified rendition of the content of a sentence in a text document. The BoW technique is used in computer vision, NLP, Bayesian spam filters, document categorization, and information retrieval with machine learning. BoW feature extraction approaches are used in [186], [101], [187], and [188].
The following are some of the issues related to BoW: if new phrases include new words, the vocabulary expands, and so does the length of the vectors. Furthermore, the vectors would have a large number of elements.
3) Term Frequency-Inverse Document Frequency (TF-IDF): IDF measures how important a word is, whereas term frequency (TF) measures how frequently a term appears in a text. The IDF value is needed because computing the TF alone is not sufficient to capture the significance of words. The inverse document frequency (IDF) was developed by K. Sparck Jones [189] as a strategy to be used with term frequency to reduce the impact of terms that are common throughout the corpus. The combination of TF and IDF is called term frequency-inverse document frequency (TF-IDF). However, TF-IDF has several drawbacks: it calculates the resemblance of texts directly in the word-count space, which can be slow with large vocabularies. It also presumes that the counts of different terms give independent evidence of similarity. Examples in which TF-IDF is proposed as the feature extraction approach are [190], [191], [192], and [175].
4) Word Embedding: Word embedding is a type of feature learning in which each word or phrase in a vocabulary is mapped to an N-dimensional vector of real numbers. Various word embedding algorithms have been proposed to convert n-grams into comprehensible inputs for machine learning systems. This study focuses on Word2Vec, GloVe, and FastText, three of the most widely used deep learning methods for word embedding [2], [193].
• Word2Vec: Word2Vec [4] is a technique for creating word embeddings. Skip-gram and continuous bag of words (CBOW) are two approaches, both utilizing neural networks, to obtain them. The CBOW technique takes the context of each word as input and attempts to predict the word that corresponds to that context. Skip-gram, rather than predicting the current word from its context, aims to optimize the classification of a word based on another word in the same sentence [2]. Several articles that focus on Word2Vec can be found in [50], [101], [194], [195], [196], [197].
• Global Vectors for Word Representation (GloVe): GloVe [198] is another robust word embedding approach that has been utilized for text categorization. This method is comparable to the Word2Vec process. Each word is represented by a high-dimensional vector and trained using the surrounding words over a large corpus. The GloVe word embedding approach is used in [103], [199], [200], [201].
• FastText: Several alternative word embedding representations disregard the morphology of words [202]. The Facebook AI Research team addressed this issue by proposing a new word embedding approach called FastText. The FastText word embedding is used in [203], [204], [205].
This section has covered the feature extraction methods. The approaches implemented in ATS over the years are detailed in the following section.
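The count-based representations described in this section (n-gram, bag-of-words, and TF-IDF) can be sketched in a few lines. The code below is a minimal illustration on pre-tokenized documents, not the implementation used in any cited work; it assumes the common variants in which TF is the relative frequency of a term in a document and IDF is log(N / df). The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous windows of n tokens, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for a list of token lists.

    The vocabulary is sorted so that vector positions are deterministic.
    Every new word adds one dimension to every vector.
    """
    vocab = sorted({t for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for d in docs:
        vec = [0] * len(vocab)
        for t in d:
            vec[index[t]] += 1
        vectors.append(vec)
    return vocab, vectors


def tf_idf(docs):
    """TF-IDF weights per document, as one {term: weight} dict each.

    TF is count / document length and IDF is log(N / df), where df is
    the number of documents containing the term (assumed variant).
    """
    n = len(docs)
    doc_freq = Counter(t for d in docs for t in set(d))
    weights = []
    for d in docs:
        tf = Counter(d)
        weights.append(
            {t: (c / len(d)) * math.log(n / doc_freq[t]) for t, c in tf.items()}
        )
    return weights
```

With this IDF variant, a term that occurs in every document receives a weight of zero, which reflects the stated purpose of IDF: reducing the influence of terms that are common throughout the corpus.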

VIII. AUTOMATIC TEXT SUMMARIZATION APPROACHES
Generally, ATS is a complex and time-consuming operation that often yields unsatisfactory results because computers lack a proper understanding of human language. Researchers have tried to achieve better performance and standard classifications for summary texts. Text summarization approaches vary by the number of input documents (single or multiple), by objective (generic, domain-specific, or query-based), and by performance. The following sections cover the performance-wise analysis, which is divided into two classes.

A. EXTRACTIVE TEXT SUMMARIZATION
The extractive text summarization method aims to identify important words and sentences in a text and use them to create a summary [22]. Sentences are selected from the original document based on their importance and reproduced word for word, so the summary is a subset of the original document's sentences. The process consists of three independent tasks [152]: [...] the topic words the text content. On the other hand, indicator representation scores depend on the features of the sentence. 3) Selecting the highest-scoring sentences to form the summary.
In extractive text summarization, two machine learning approaches are applied, supervised and unsupervised, as shown in Figure 3. The following section contains a brief overview of these subclasses.
• Supervised Learning Methods: In supervised learning methods, the first step is to learn how to label documents by training a model to distinguish summarized from non-summarized documents. The machine learning and neural network algorithms used by these methods require a labeled dataset for training, in which summarized and non-summarized texts are available with labels [152].
• Unsupervised Learning Methods: With unsupervised learning methods, the summarization process can be performed without any help from the user, such as selecting the introductory sentences of the document. These methods only require advanced algorithms, such as graph-based, concept-based, fuzzy logic, and latent semantic methods, to take user input and work automatically [152]. These approaches are beneficial for large volumes of data.
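The extractive process described in this section can be illustrated with a small unsupervised sketch. Since the first two tasks are only partly recoverable from the text above, the code assumes a simple word-frequency table as the intermediate representation and the average word frequency as the sentence score; only the third task, selecting the highest-scoring sentences, follows the text directly. The function name and the parameter `k` are hypothetical.

```python
import re
from collections import Counter


def extractive_summary(text, k=2):
    """Return the k highest-scoring sentences in their original order."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]

    # Task 1 (assumed): intermediate representation as a word-frequency table.
    freq = Counter(t for tokens in tokenized for t in tokens)

    # Task 2 (assumed): score each sentence by the mean frequency of its words.
    scores = [
        sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0
        for tokens in tokenized
    ]

    # Task 3: pick the k highest-scoring sentences; ties go to the earlier one.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:k]

    # Restore document order so the summary reads naturally.
    return [sentences[i] for i in sorted(ranked)]
```

Because the selected sentences are copied verbatim, the output is always a subset of the input sentences, which is the defining property of extractive summarization. No labeled summaries are needed, so this sketch belongs to the unsupervised class.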

B. ABSTRACTIVE TEXT SUMMARIZATION
Abstractive text summarization is the development and automation of the traditional method of text summarization [147]. The abstractive process identifies the key sections and main ideas of a text document and paraphrases them. The abstractive summarization process follows some common steps:
1) Analyzing the main contents of the text documents, utilizing a vocabulary set different from the source
2) Paraphrasing the relevant data that fit the semantics to create a summary that contains all the actual points of the source document, utilizing NLP models
Abstractive summarization approaches are of two types: structure-based and semantic-based. A brief discussion of these two types is provided below:
• Structure-based Methods: The structure-based approach continuously filters the most critical data from documents by applying abstract or cognitive algorithms. Tree-based, template-based, ontology-based, and rule-based algorithms are the most commonly used [147].
• Semantic-based Methods: The semantic-based approach attempts to refine the sentences by applying NLP to the entire document. This approach can easily find the noun and verb phrases. Its methods include the multimodal semantic method (MSM) [206], the semantic graph-based method (SGM) [207], the information item-based method (IIM) [208], and the semantic text representation model (STRM) [209], [147].

IX. AUTOMATIC TEXT SUMMARIZATION ALGORITHMS
In this section we explore the supervised, unsupervised, structured-based, semantic-based, extractive and abstractive graph-based and deep learning ATS methods and algorithms.

A. UNSUPERVISED LEARNING METHODS
Unsupervised extractive text summarization picks essential sentences from documents, much like regular ATS [210]. What distinguishes this process is that it selects sentences without using labeled summaries during training. In addition, unsupervised methods are effective because they do not require user feedback or human overviews to determine the essential features of the document. Unsupervised text summarization is more efficient than supervised summarization and better suited for summarizing lengthy texts. Various unsupervised techniques for summarizing texts or documents are discussed below:

1) Fuzzy Logic Based Method
The design of a fuzzy logic-based approach typically includes the selection of fuzzy rules and membership functions. It consists of four components: a fuzzifier, an inference engine, a defuzzifier, and a knowledge base [211]. The fuzzy-logic-based approach is also used for selecting the most important sentences from the source document or context. However, the fuzzy logic-based method requires a redundancy removal technique to achieve better results. Suanmali et al. [212] proposed a fuzzy logic approach for ATS where the summary is created by ordering the ranked sentences from the original text. In addition, papers such as [213], [214], [215] focused on this fuzzy logic-based method in their ATS research.
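The four components named above can be illustrated with a minimal sketch. The two input features (each assumed to be normalized to [0, 1]), the triangular membership functions, the rule table, and the output centroids are all illustrative assumptions; they are not the rule base of Suanmali et al. [212].

```python
LEVELS = ["low", "medium", "high"]
# Assumed crisp centroids for the three output classes.
CENTROIDS = {"unimportant": 0.0, "average": 0.5, "important": 1.0}


def tri(x, a, b, c):
    """Triangular membership function on [a, c] that peaks at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def fuzzify(x):
    """Fuzzifier: map a crisp feature in [0, 1] to low/medium/high degrees."""
    return {
        "low": tri(x, -0.5, 0.0, 0.5),
        "medium": tri(x, 0.0, 0.5, 1.0),
        "high": tri(x, 0.5, 1.0, 1.5),
    }


def build_rules():
    """Knowledge base (hypothetical): the higher both features, the more important."""
    rules = {}
    for i, a in enumerate(LEVELS):
        for j, b in enumerate(LEVELS):
            total = i + j
            if total >= 3:
                rules[(a, b)] = "important"
            elif total == 2:
                rules[(a, b)] = "average"
            else:
                rules[(a, b)] = "unimportant"
    return rules


RULES = build_rules()


def score_sentence(f1, f2):
    """Inference engine (min for AND, max to aggregate) plus defuzzifier."""
    m1, m2 = fuzzify(f1), fuzzify(f2)
    strength = {label: 0.0 for label in CENTROIDS}
    for (a, b), out in RULES.items():
        strength[out] = max(strength[out], min(m1[a], m2[b]))
    total = sum(strength.values())
    if total == 0.0:
        return 0.0
    # Defuzzifier: weighted average of the output centroids.
    return sum(strength[k] * CENTROIDS[k] for k in CENTROIDS) / total
```

Sentences would then be ranked by `score_sentence` and the top-ranked ones emitted in their original order; a redundancy removal step, as noted above, would still be needed on top of this.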

2) Concept-Based Method
The concept-based method extracts concepts and utilizes similarity measures to reduce redundancy from the original document [152]. The importance of the retrieved concepts is calculated, and the sentences are scored accordingly. This method shares the limitations of fuzzy logic, although fuzzy logic stands out because it handles ambiguous situations better. Sankarasubramaniam et al. [216] proposed a method that employs a sentence-concept bipartite graph structure to generate summaries, with concepts derived from Wikipedia. More examples of concept-based summarization that retrieve textual concepts from an external information base can be found in [217], [218].

3) Latent Semantic Analysis (LSA) Method
Latent semantic analysis (LSA) is an algebraic-statistical method for extracting hidden semantic structures of sentences and phrases [219]. LSA is an unsupervised learning technique that derives information and similar words from input documents. The significance of this method is that no outside training or template is necessary to find similar words that appear in separate sentences [220]. However, the LSA method has some limitations: it does not analyze word order, syntactic relations, or morphology. In addition, it relies solely on the information contained in the input document rather than on outside knowledge. Finally, its performance deteriorates on inhomogeneous datasets. Despite these limitations, this method works quite well for the semantic summarization of texts. [221], [222], [42], [54] are other examples of the LSA method for text summarization tasks.
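As a rough illustration of how LSA can drive sentence selection, the sketch below builds a term-by-sentence count matrix, applies a singular value decomposition, and picks one sentence per leading latent topic. The whitespace tokenizer, the raw-count weighting, and the function name `lsa_summary` are assumptions made for this example; published systems differ in weighting and selection rules.

```python
import numpy as np


def lsa_summary(sentences, k=2):
    """Pick up to k sentences, one for each of the k strongest latent topics."""
    # Term-by-sentence matrix of raw counts (assumed weighting scheme).
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in s.lower().split():
            A[index[w], j] += 1.0

    # Rows of Vt describe how strongly each sentence expresses each topic.
    _, S, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(S))

    chosen = []
    for topic in range(k):
        # Take the not-yet-chosen sentence with the largest topic weight.
        for j in np.argsort(-np.abs(Vt[topic])):
            if int(j) not in chosen:
                chosen.append(int(j))
                break
    # Return the selected sentences in document order.
    return [sentences[j] for j in sorted(chosen)]
```

Because the selection depends only on word co-occurrence in the input, the sketch also exhibits the limitations listed above: word order and syntax play no role.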

B. SUPERVISED LEARNING METHODS
Supervised learning methods are sentence-level classifiers that learn to distinguish between summarized and non-summarized sentences [152]. From a collection of documents and human-generated summaries, the model learns the characteristics of the sentences included in the summary. However, this approach has significant disadvantages: the reference summaries must be created manually, and classification requires many labeled training samples.

1) Machine Learning (ML) Method
The machine learning method is used to classify sentences into summary or non-summary classes using training data. These methods are applied when multiple documents require extractive summaries. Notably, in this case, each document's sentences are represented as vectors. Primarily, machine learning algorithms are trained on a set of documents whose sentences are labeled [174]. In the training phase, a collection of training documents is fed to the algorithm, and sentences are classified based on their weight. In most cases, a simple regression model works better than classifiers but still requires an extensive training dataset. This data feeds the single layer of a machine learning algorithm, whereas a neural network model has multiple layers. That is why neural network models are becoming more widely used and more user-friendly for ATS. In addition, the ML methods incorporate standard pre-processing algorithms from information retrieval, such as stop-word removal, case folding, and stemming. The stemming algorithm and the concept of sentences represented as vectors were proposed by Porter [223] and in [224], respectively. Trainable machine learning algorithms, such as C4.5 or naive Bayes [225], are mostly used; these algorithms are trained on a training set and tested on a separate test set. Several studies have focused on machine learning-based summarization tasks, as can be seen in [174], [226], [227], [12].
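The classification step described above can be sketched with a small naive Bayes classifier over binary sentence features. The three features used in the usage note (lead position, title-word overlap, sentence length) and the function names are hypothetical choices for illustration; they are not taken from [174] or [225].

```python
import math


def train_nb(X, y):
    """Fit a Bernoulli naive Bayes model.

    X: list of binary feature tuples, one per sentence.
    y: list of labels, 1 for summary sentences and 0 otherwise.
    """
    n_feat = len(X[0])
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = (len(rows) + 1) / (len(X) + 2)
        # Laplace-smoothed estimate of P(feature_i = 1 | class c).
        probs = [
            (sum(r[i] for r in rows) + 1) / (len(rows) + 2)
            for i in range(n_feat)
        ]
        model[c] = (prior, probs)
    return model


def predict_proba(model, x):
    """Return P(summary sentence | features x)."""
    logp = {}
    for c, (prior, probs) in model.items():
        lp = math.log(prior)
        for xi, p in zip(x, probs):
            lp += math.log(p if xi else 1.0 - p)
        logp[c] = lp
    m = max(logp.values())
    z = sum(math.exp(v - m) for v in logp.values())
    return math.exp(logp[1] - m) / z
```

With feature tuples of the form `(is_lead, has_title_word, is_long)`, a model trained on a handful of labeled sentences can score unseen sentences, and the highest-scoring ones form the extract. As the text notes, a realistic system needs far more labeled data than such a toy set.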

2) Neural Network (NN) based method
A neural network approach [228] uses a three-layered feedforward network that learns the features of sentences during model training. The feature matching phase is significant, and the relationship between the features is identified in several steps. Removing infrequent features and combining frequent features, followed by sentence ranking, are the steps used to identify meaningful sentences. The neural network-based method can also be trained to match a human's style or requirements, as the network learns from its training data. With multiple layers and an increasing number of hidden layers, NN algorithms, as an advanced version of ML, perform better than ML algorithms. The framework developed in [229] also uses neural nets to automatically classify the relevant sentences in the text. It incorporates a two-layer neural network with backpropagation, which is trained using the RankNet algorithm. An earlier TextSum system architecture including text preparation, keyword extraction, and summary creation was proposed in [230]. The system pre-processes the source document using two methods: stop word removal and stemming. [158], [59], [231], [232], and [176] are other studies in which a neural network was used for summarizing source documents.

3) Conditional Random Fields (CRFs) Method
Conditional random fields are statistical modeling techniques based on machine learning that provide structured prediction [233]. CRF uses non-negative matrix factorization (NMF) approaches to extract the correct features. The extracted features are then used to identify the introductory sentence of the document. CRF's main benefit is identifying suitable features by offering a more precise representation of sentences and sections. A vital problem of this technique is that it is domain-specific, which necessitates an external domain-specific framework for the training phase. Consequently, this methodology cannot be generically applied to any text without first creating a domain framework, which is time-consuming. Therefore, ML and NN-based algorithms are still a better option. Some studies have investigated conditional random field-based methods, as can be seen in [234], [235], [236], and [237].
Other methods for the extractive approach to text summarization have been described in various studies. These include optimization-based methods, statistical-based methods, topic-based methods, and sentence centrality or clustering-based methods. In the optimization-based method, researchers utilized a genetic algorithm to calculate the optimal weights; its drawbacks are high computational time and cost, and the number of iterations required must be defined. The topic-based method concentrates on topics in the input text. The limitation of the topic-based approach is that sentences will not appear in the summary if their score is not the highest, which affects the quality of the generated summary [238]. Sentence centrality or clustering-based methods handle repeated sentences and are suitable for multi-document summarization. They group different sentences on the same topic [239]. However, they require prior specification of the number of clusters, and redundancy removal techniques are required for similar sentences [240].
In the abstractive approach, the summary may include new wording that is not seen in the main text, which amounts to paraphrasing. Language generation and compression strategies are required to generate abstractive summaries. Abstractive text summarization is also divided into two categories, structured and semantic. A brief discussion of structured-based methods, semantic-based methods, and their subclasses is given below:

C. STRUCTURED BASED METHODS
In abstractive summarization, the source document is summarized using newly constructed sentences. In the structure-based method, phrases from source documents are interpreted in a specified structure without losing their meaning. Structure-based approaches mainly rely on preset forms and spatial reasoning schemas, such as templates, tree-based, ontology-based, and rule-based structures.

1) Tree-Based method
The tree-based method recognizes sentences that share common knowledge and facts and then merges them to provide an abstractive summary. The process of turning the resulting tree-like structure into text is called tree linearization [241]; the structure itself is built from many dependency trees. Dependency trees are a representation of the source text of a document. The tree-based model helps process multiple documents and identify the common information using a syntactic tree. These methods also produce less redundant summaries, but because they do not consider the context, they cannot detect the relationship between sentences. Therefore, they can overlook significant phrases in the text. Another issue with this method is its continuous focus on syntax rather than semantics. Despite these issues, this model stands out among structured-based methods because of the fluency of its summaries.

2) Template-Based Method
In the template-based method, the topic or content is extracted into possible phrases and speakers by finding similarities with a template space [242]. The template-based method is used when a document requires a predefined guideline or a human-made template for the summary. This method constructs informative and coherent summaries, as various phrases and speakers of the content are selected based on the template. Oya et al. [243] proposed a system that requires a human-made summary template and uses a fusion algorithm for multiple sentences.
In another study, Zhang et al. [244] proposed a speech act-based strategy to summarize Twitter topics. The majority of existing Twitter summarization algorithms are based on template-based summarization methods. The strategy provides abstractive summaries that are appropriate for the numerous, brief, and noisy nature of tweets. The issue with this method is that the templates for summarization are always predefined, which does not give much variety in the summaries. Therefore, it cannot produce summaries as fluent as those of the tree-based approach.

3) Rule-Based Method
The rule-based approach finds facts and reviews of essential concepts in source documents through questioning. The questions can be "What is the topic?", "What is the time frame of the story or topic?", etc., and answering these questions generates an abstractive summary. Gupta et al. [245] proposed a rule-based method to extract relevant lines from a text paragraph in the Hindi language. Some hand-crafted rules in the Hindi language are employed, such as "What are the person names in the table?", "What are the locations mentioned in the table?", "What are the special symbols contained in the table?", etc. In addition, Laskar et al. [246] suggested a method using the BERTSUM model [247], which uses a transformer-based architecture for abstractive summarization. Rule-based methods are used when input documents need to be represented as classes and lists of aspects, such as in query-based methods. This method requires the rules to be prepared in advance, which is a time-consuming process. Manually written rules make this method less efficient than the other methods mentioned earlier in this subsection.

4) Ontology-Based Method
Ontology is a knowledge-based approach that acts as a formal naming and definition of the entity types of a specific domain [186]. A knowledge base is applied in this method to improve the outcome of summarization. Ontology-based methods perform well when a document has a knowledge structure or repeatedly concerns the same topic. Therefore, this method focuses on specific domain-related documents and constructs coherent summaries. Similar to the rule-based method, this method is also time-consuming. Okumura et al. [248] proposed a WordNet ontology in their research work. In other work, Mohan et al. [249] proposed some methods for evaluating ontologies, such as ontometric, ontoclean, and evalexon. Preparing a suitable ontology is a very time-consuming process, and the ontology cannot be generalized to other domains.

D. SEMANTIC BASED METHODS
Semantic-based methods illustrate the linguistics of a document's texts into a natural language generation (NLG) system, with a significant focus on noun and verb phrase identification [147]. These methods are effective at making less redundant and grammatically correct sentences. A disadvantage of these methods is that they sometimes ignore critical information or data even when grammatically correct.

1) Multimodal Semantic Method
The multimodal-based method is used to apprehend both image and text concepts from a document [147]. Therefore the multimodal semantic model gathers notions and establishes relationships by expressing text and pictures in multimodal materials. The foundation of a semantic model is knowledge representation based on objects. Concepts are represented by nodes, whereas connections represent the relationships between concepts. The completeness, connection with others, and repetitions of an expression are checked using the information density measure. Finally, the selected ideas are translated into sentences to summarize. SimpleNLG is an example of such a system, which provides interfaces for direct control over the way phrases are created and merged and inflectional and morphological control. [250], [251] are examples of multimodal semantic methods utilized in text summarization.

2) Semantic Graph based Method
The semantic graph-based approach summarizes a document by building a graph for the original document called rich semantic graph (RSG) [252] and reducing the created semantic graph. Making brief, cohesive, and grammatically correct sentences with reduced networks is the strength of this method. The semantic-graph-based model mainly extracts semantic information by assigning weights to the nodes and edges of sentences. For this reason, this model works well in most cases but requires a semantic representation of the text. Several studies have proposed semantic graph-based methods for text summarization tasks, and some are [13], [207], [253], [252] and [254].

3) Information Item Method
The information-item-based method summarizes a text file based on an abstract representation of it instead of producing an abstract from the text file's words. An abstractive representation of the source material is used to construct a summary in this approach. The smallest component of a source document is an information item [255] [256]. The method assumes a logical flow of information in a text and retrieves information from it. A method based on information items delivers more concise and less redundant summaries.

E. EXTRACTIVE + ABSTRACTIVE
1) Graph Based Method
The graph-based method can be applied to both extractive and abstractive text summarization. This approach is an unsupervised learning method that rates the required sentences or terms using a graph. The purpose of the graphical process is to extract the most relevant sentences from a single text [152].
Graph-based ranking algorithms determine the relevance of a vertex in a graph based on global information iteratively extracted from the entire graph. For text summarization, the following specific graph-based techniques are applied.

1) LexRank: LexRank is a probabilistic graph-based technique for calculating sentence significance based on the notion of eigenvector centrality in a graph representation of sentences. A connectivity matrix based on intra-sentence cosine similarity is utilized as the adjacency matrix of the sentence graph [65]. [257], [258], [259], [260], [261] used the LexRank algorithm for graph-based text summarization tasks.
2) Hyperlink-Induced Topic Search (HITS): Hyperlink-induced topic search is a link analysis algorithm that determines authority and hub values. The results for the search query are retrieved, and then the computation is performed only on this set of results. A hub value is the sum of the scaled authority values of the pages it points to, and an authority value is the sum of the scaled hub values of the pages that point to it [262]. Some articles focused on the HITS ranking algorithm for graph-based text summarization tasks, as can be seen in [263], [264], [265], and [266].
3) PageRank: The PageRank algorithm utilizes the inbound links of specified pages to measure their significance or quality in order to rank the search results. PageRank gives a link more weight based on the importance of the page from which it originates [267]. Some articles proposed the PageRank algorithm for summarization [268], [269], [270], [271], [272].
4) TextRank: TextRank is an unsupervised method for automatic text summarization and for extracting the most important keywords from a document. Based on the content that two sentences share, TextRank estimates the degree of similarity between them [273]. This overlap is computed as the number of shared lexical tokens divided by each sentence's length [274]. [275], [276], [264], and [269] are examples where the TextRank algorithm is used.
5) Positional Power Function: The positional power function is a ranking method that calculates a vertex's score as a function that incorporates both the number and the score of its descendants [264]. [277], [278], and [279] are the papers where the positional power function is used.
6) Undirected Graph: Undirected graphs, in which the out-degree of a vertex is equal to the in-degree of the vertex, can also be used with a recursive graph-based ranking method. Undirected graphs exhibit slower convergence curves for weakly linked graphs in which the number of edges is proportional to the number of vertices [264]. Some studies have focused on undirected graph-based algorithms [280], [281], and [282].
7) Weighted Graphs: Multiple or partial connections between the units (vertices) retrieved from the text may be present in graphs created from natural language texts. It may therefore be beneficial to express the "strength" of the relationship between two vertices as a weight applied to the edge that links them, and to include it in the model. When calculating the score associated with a vertex in the graph, the edge weights are considered. It is worth noting that vertex weights may be integrated using a similar method. [263], [264], [65] are examples of articles in which the weighted graph algorithm is used.
8) Graph-based Attention Mechanism: In the graph model, the significance score of a sentence is determined by its relationship with all other sentences. Traditional attention and graph ranking algorithms are combined in this mechanism to compute the rank scores of the original sentences, resulting in varying significance ratings of the original sentences while decoding different states [283]. Some articles proposed graph-based attention mechanisms for the text summarization task [194], [197].
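To make the weighted, recursive ranking idea concrete, the following minimal sketch scores sentences with a PageRank-style iteration over a TextRank-like overlap graph. The whitespace tokenizer, the log-length normalization of the overlap, the damping factor of 0.85, the fixed number of iterations, and the function names are assumptions made for this example rather than the exact configuration of any cited system.

```python
import math


def overlap_similarity(a, b):
    """Shared tokens, normalized by the log lengths of both sentences."""
    wa, wb = a.lower().split(), b.lower().split()
    if len(wa) < 2 or len(wb) < 2:
        return 0.0  # log(1) = 0 would make the normalizer degenerate
    shared = len(set(wa) & set(wb))
    return shared / (math.log(len(wa)) + math.log(len(wb)))


def rank_sentences(sentences, d=0.85, iters=50):
    """Weighted PageRank-style scores on an undirected sentence graph."""
    n = len(sentences)
    # Edge weights express the "strength" of the link between two sentences.
    w = [
        [overlap_similarity(sentences[i], sentences[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iters):
        new_scores = []
        for i in range(n):
            # Each neighbor j passes on its score in proportion to w[j][i].
            rank = sum(
                w[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1.0 - d) + d * rank)
        scores = new_scores
    return scores
```

An extractive summary would take the top-scoring sentences and present them in document order. A sentence that shares no tokens with any other receives only the baseline score of `1 - d`, which reflects how these algorithms rely on global graph structure rather than on local sentence features.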

2) Deep learning Algorithm
Deep learning models help information-driven ATS to become more efficient, accessible, and user-friendly. These models are highly promising for ATS because they attempt to imitate human brain functions. Deep neural networks are commonly employed for NLP problems because their design fits well with the complicated structure of language; for example, each layer can handle a particular job before passing its output to the next. A few commonly known deep-learning models [284] for ATS are described below:
1) RNN Encoder-Decoder: The RNN encoder-decoder architecture uses the sequence-to-sequence paradigm. The sequence-to-sequence model converts the input sequence of the neural network into a corresponding sequence of letters, words, or sentences. Machine translation and text summarization are two examples of its NLP applications [285]. The challenge with the RNN seq2seq model is that it requires an extensive dataset, and training on such datasets is time-consuming. This is why the deep learning methods described later perform better. Other papers that proposed an RNN encoder-decoder for the text summarization task include [50], [286], [287], and [155].
2) Long Short-Term Memory (LSTM): The repeating unit of the LSTM architecture comprises input/read, memory/update, forget, and output gates [6], [288]. The chain structure is very similar to that of an RNN. At the first time step, the input is a randomly initialized vector; in later steps, the input of the current step is the output of the previous step. The forget gate is a single-layer neural network with a sigmoid activation function. The result of the sigmoid function determines whether the prior state's information should be forgotten or remembered. The memory gate controls the influence of remembered information on new information. The output gate controls the quantity of new information transmitted to the next LSTM unit. The LSTM shows promise in producing a concise abstractive summary. [5], [289], [290] used the LSTM-based method to summarize risk.

3) Gated Recurrent Unit (GRU): GRU is a simplified LSTM with two gates, a reset gate and an update gate, and no explicit memory. When all the reset gate elements approach zero, the previous hidden state information is discarded, and only the input vector influences the candidate hidden state. The update gate serves as a forget gate in this situation. LSTM contains a memory unit that offers more control, but the GRU consistently requires less computation time. Furthermore, LSTM makes it easier to tune the parameters, whereas the GRU takes less time to train [5]. [153], [291], [155] are studies where the authors focused on the GRU-based method for summarization tasks.
4) Restricted Boltzmann Machine (RBM): An RBM is a neural network with random probability distributions. The network consists of a visible layer of visible neurons (input nodes) and a hidden layer of hidden neurons (hidden nodes). Every hidden node is connected to every input node in a bidirectional manner. Every hidden node is also connected to the bias node. In the visible layer, the input nodes are not linked to one another. In addition, hidden nodes are not connected to one another within the hidden layer [292]. The network is known as a restricted Boltzmann machine because of its limited connections. [293], [294], [295], [296], [297], [298] are studies that focused on the RBM method for text summarization.
5) Naive Bayesian Classification: The naive Bayesian classification method is used to extract the essential keywords from the text [29]. The Bayes technique is a machine learning approach that estimates the distinguishing characteristics of keywords in a text and uses this information to retrieve the keywords from the input. Using the naive Bayesian classifier together with score and timestamp concepts improves the accuracy of summarization. [299], [226], [174] focused on the naive Bayesian classification method for text summarization.
6) Query Based: In query-based text summarization, the score of the sentences in a given document is based on the frequency counts of words or phrases [116]. Sentences containing query phrases receive higher scores than sentences containing single query terms. The sentences with the highest scores and their structural contexts are then extracted for the output summary. [79], [300], [301] focus on query-based methods for text summarization.
7) Generic Summarization: Generic summaries aim at summarizing the document's significant points [302]. A number of papers focus on generic methods for text summarization. Rather than repeating the same information, we provide the references here: [222], [303], [221], and [191].
8) Q-Network: A Q-network is used to approximate the optimal action-value function, which measures the long-term reward of an action for the agent. Based on a partial summary (the current state) and a candidate sentence (the action), the model generates a Q-value [304]. The output Q-value reflects the expected value when the agent picks the candidate sentence as part of the summary. [305], [306], [307] proposed a Q-network for text summarization.
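The gating behavior described for the GRU above can be sketched as a single recurrent step. The parameter names, the random initialization, and the convention that the update gate weights the candidate state (some formulations swap the roles of `z` and `1 - z`) are assumptions of this example, not details taken from [5].

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(n_in, n_hid, seed=0):
    """Small random weights for the update (z), reset (r), and candidate (h) parts."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in "zrh":
        p["W" + g] = rng.normal(0.0, 0.1, (n_hid, n_in))
        p["U" + g] = rng.normal(0.0, 0.1, (n_hid, n_hid))
        p["b" + g] = np.zeros(n_hid)
    return p


def gru_step(x, h_prev, p):
    """One GRU time step: returns the new hidden state."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev + p["bz"])  # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev + p["br"])  # reset gate
    # With r close to 0 the previous state is dropped from the candidate,
    # so only the input vector x influences it.
    h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h_prev) + p["bh"])
    # The update gate interpolates between the old state and the candidate.
    return (1.0 - z) * h_prev + z * h_cand
```

Compared with an LSTM step, there is no separate memory cell and one gate fewer, which is consistent with the lower computation and training time noted above.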
Besides these popular models, pre-trained language models such as BERT, GPT-2, TransformerXL, and XLNet have improved results on many NLP tasks [308], [309], ranging from sentiment analysis to question answering, natural language inference [310], named entity recognition, textual similarity, and paraphrasing [311]. These language models are pre-trained on vast amounts of text data and fine-tuned with various task-specific objectives. Pre-training with unsupervised objectives such as masked language modeling and next-sentence prediction makes these models broadly useful. In most situations, pre-trained language models are used as encoders for natural language understanding problems, including classification tasks at the sentence and paragraph level [247].

1) BERT: Bidirectional Encoder Representations from Transformers (BERT) [312] is a simple and powerful pre-trained model. BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. With only one additional output layer, the pre-trained BERT model can be fine-tuned to create state-of-the-art models for various tasks, such as language inference and question answering, without notable task-specific architectural modifications. BERT uses the Transformer mechanism, which learns contextual relations between words in a text. The Transformer includes two separate mechanisms: an encoder that reads the text input and a decoder that produces a prediction. Unlike directional models that read the text input sequentially (left-to-right or right-to-left), the Transformer encoder reads the entire sequence of words at once [313]. Therefore, it is regarded as bidirectional or, to be more accurate, non-directional. This capacity to integrate context from both sides significantly aids BERT in achieving better results.
2) GPT-2: OpenAI's GPT-2 [314] has shown a remarkable capability to produce more coherent and robust summaries than other language models. GPT-2 is not a novel design; it is nearly identical to the decoder-only transformer. GPT-2 is a large transformer-based language model trained on a massive dataset called WebText. The critical difference between GPT-2 and BERT is that GPT-2 is built using transformer decoder blocks, whereas BERT uses transformer encoder blocks. GPT-2 and some later models such as TransformerXL and XLNet are auto-regressive. The idea of autoregression is that whenever a token is produced and added to the sequence, the new sequence becomes the input to the model in its next step [315].
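The autoregressive loop described for GPT-2 can be illustrated with a toy sketch. The `next_token` callable stands in for a trained language model, and the bigram table, the `<eos>` marker, and the function names are hypothetical; a real decoder-only transformer would return a probability distribution over a vocabulary, from which a token is chosen.

```python
def generate(next_token, prompt, max_new_tokens=10, eos="<eos>"):
    """Greedy autoregressive decoding with an arbitrary next-token function."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # The model always sees the whole sequence produced so far.
        token = next_token(tokens)
        # The new token is appended and becomes part of the next input.
        tokens.append(token)
        if token == eos:
            break
    return tokens


# Toy stand-in for a language model: a fixed bigram lookup table.
BIGRAMS = {"the": "cat", "cat": "sat", "sat": "<eos>"}


def toy_model(tokens):
    return BIGRAMS.get(tokens[-1], "<eos>")
```

Calling `generate(toy_model, ["the"])` extends the prompt one token at a time until the end marker is produced. An encoder such as BERT has no such loop, since it reads the entire input at once.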

X. SUMMARY OF PAPERS REGARDING ATS
Recently, researchers have shown growing interest in the ATS domain, which has made this area an excellent research topic. In addition, the desire for a better text summarization method has received special attention. This section first presents the significant research on ATS and then explains the sub-domains of ATS, which are presented in Tables 3, 4, 5, and 6 with their respective state-of-the-art accuracy. Zhang et al. [63] proposed a sentence similarity computation approach using the free DUC 2003 datasets and evaluated it with ROUGE-1, ROUGE-2, etc. Mitra et al. [317] explored the extracts of two humans for the same paragraph using four quantities: optimistic evaluation, pessimistic evaluation, intersection, and union; the approach performs comparably better than a random selection of sections. Pal et al. [318] proposed a system with an online semantic dictionary such as WordNet and the Lesk algorithm. The proposed methodology achieved the best outcomes up to a 50% summary of the original text. Fattah et al. [176] explored the use of GA, MR, FFNN, PNN, and GMM for ATS, applied to Arabic and English articles. Ryang et al. [319] suggested a framework that constructs a summary within the context of reinforcement learning with respect to ROUGE scores, evaluated on DUC 2004. Nenkova et al. [55] attempted to compare the classifications of baselines on the DUC dataset, while the most challenging task was to provide a focused overview in response to a question/topic. Jing et al. [301] created a unique sentence-reducing algorithm that removes unnecessary phrases and makes intelligent reduction judgments based on syntactic information, context, and probability derived from content analysis. Silber et al. [320] provided a linear-time technique for lexical chain computing for ATS utilizing 24 documents, including human-generated summaries. Chua et al. [321] created a framework using the decay topic model (DTM). The Gaussian decay topic model (GDTM) was tested on Wikipedia links, with GDTM having the best overall performance. Sankarasubramaniam et al. [216] presented an analysis of multi-document summarization using TAC 2010. With the assistance of a closely connected paragraph, the performance of the system is significantly improved.
The papers from the four major sub-domains of ATS are discussed in the following subsections.

A. SUMMARY OF THE PAPERS REGARDING UNSUPERVISED LEARNING METHOD
Previously we mentioned that the extractive text summarization technique consists of two learning methods. In this section, we discuss the papers that cover the unsupervised learning methods of ATS.
Steinberger et al. [191] used latent semantic analysis (LSA) to locate semantically significant sentences using two different evaluation approaches. Suanmali et al. [212] proposed a fuzzy logic-based sentence extraction summarizer using the DUC 2002 dataset, where each sentence of the document was extracted and expressed as a vector of features. Erkan et al. [65] provided a way to determine graph-based sentence centrality scoring, suggesting a framework that comprises three distinct approaches for calculating centrality in similarity graphs. Yeh et al. [42] provided two new methods for ATS, the modified corpus-based approach (MCBA) and the LSA-based TRM approach (LSA + TRM) [328], together with a text connection map to extract semantically significant structures from a document. Alami et al. [101] used neural network-based techniques for ATS with the Sentence2Vec feature extraction approach, which produced the best outcomes. Gong et al. [222] proposed two text summarization approaches for creating a generic text summary by minimizing redundancy. Shen et al. [235] treated summarization as a sequence labeling problem and solved it by employing the effective sequence-labeling algorithm CRF. Froud et al. [316] suggested enhancing summarization using the latent semantic analysis model and Arabic document clustering measures with stemming. Mihalcea et al. [264] studied and evaluated a variety of graph-based ranking algorithms that enable automatic unsupervised sentence extraction from the perspective of ATS. Yousefi et al. [99] introduced an unsupervised deep neural network, evaluated on the SKE and BC3 email datasets, that employs global and local vocabularies to represent words as the AE input.

B. SUMMARY OF PAPERS REGARDING SUPERVISED LEARNING METHOD
In this section, we discuss the papers that cover the supervised learning methods of ATS.
Xu et al. [327] introduced a neural network architecture for extractive summarization, consisting of a sentence extraction model and a compression classifier. According to the results of Liu et al. [37], generating English Wikipedia articles can be addressed as a multi-document extractive summarization of the original documents with a decoder-only sequence transduction architecture. Xu et al. [322] introduced DISCOBERT, which employs discourse units as the minimal selection unit to reduce summarization redundancy and utilizes two types of discourse graphs. Alguliyev et al. [324], [323] described a strategy for sentence clustering using a discrete differential evolution technique, in which the NGD-based dissimilarity measure outperformed Euclidean distance.
Ferreira et al. [325] evaluated 15 sentence scoring methods on three distinct datasets (news, blogs, and articles) to improve sentence extraction results. Neto et al. [174] investigated an ML-based framework built on statistics-oriented techniques, in which Naive Bayes and the C4.5 decision tree were the best-performing classification methods. Fang et al. [326] investigated a graph-based ranking model using redundancy removal strategies to enhance the effectiveness of the summarization process. Kaikhah et al. [228] described artificial neural networks that generate summaries of news stories of different lengths, using feature fusion to select highly ranked sentences. Ledeneva et al. [331] provided a statistical approach for single-document extractive ATS that generates a text summary by extracting selected sentences from the source.

C. SUMMARY OF PAPERS REGARDING STRUCTURED TEXT SUMMARIZATION
Structure-based methods are a vital part of abstractive text summarization approaches. This section examines studies that discuss structure-based methods.
Li et al. [155] designed a methodology for ATS based on a seq2seq encoder-decoder architecture with a deep recurrent generative decoder (DRGD). Liu et al. [330] proposed an adversarial technique for ATS that trains a generative model and a discriminative model simultaneously. Hennig et al. [186] described how sentences can be mapped to nodes with several linguistic features, which are generated to test the efficiency of an SVM classifier. Genest et al. [256] presented a methodology for information extraction and natural language generation. Kikuchi et al. [329] developed an approach for summarizing a single text that captured both sentence and word relationships in a hierarchical tree and compared its ROUGE score to EDU selection. Song et al. [5] constructed a novel ATSDL system based on an LSTM-CNN that addresses numerous challenges in text summarization. Oya et al. [243] demonstrated an ATS system for meeting discussions based on modifying a word graph algorithm to build templates from human-generated summaries.

D. SUMMARY OF PAPERS REGARDING SEMANTIC TEXT SUMMARIZATION
Following the review of structure-based methods, this section focuses only on semantic-based approaches for ATS.
Wang et al. [154] introduced a joint attention and biased probability generation approach, evaluated on three datasets (DUC 2004, Gigaword, and LCSTS), in which the ConvS2S architecture enhanced with topic embedding and SCST provided the best results. Chen et al. [332] presented a sentence-level policy gradient strategy that bridges two neural networks hierarchically while preserving language fluency, using the CNN/Daily Mail dataset. Kryściński et al. [46] developed a method for verifying the factual consistency of abstractive neural models at the document-sentence level. Zhu et al. [206] proposed a multimodal objective function that utilizes the loss from summary generation. ROUGE and order ranking are used to produce the multimodal reference for both automatic and human performance measures. Khan et al. [254] proposed the Sem-Graph-Both-Rel method, which was compared to other summarization techniques based on three pyramid evaluation metrics. The work in [252] describes a strategy for creating an abstractive summary of a single document using a rich semantic graph reduction methodology that can reduce the original document to 50% of its size. Genest et al. [255] offered an abstractive summarization approach that aimed to achieve its objective accurately by managing the content and structure of the summary, using the TAC 2010 dataset.

The evaluation of text summarization is difficult. It is hard for machines to identify the key phrases or content that are important and add value to a summary. The placement of key phrases can change the meaning of the summary depending on the context, and locating this relevant information is challenging. As a result, automatic evaluation measures are necessary for reliable and effective evaluation. A review of previous research on text summarization identifies several methods for measuring summary quality. The evaluation metrics of the ATS domain are discussed below:

A. EXTRINSIC EVALUATION
Extrinsic evaluation determines the quality of an ATS-generated summary by how it influences other tasks such as text categorization, information retrieval, and question answering. A summary is considered good if it aids these tasks. There are numerous approaches to extrinsic evaluation. Relevance assessment determines whether the text is relevant to the topic, and reading comprehension determines whether the summary can be used to answer multiple-choice tests.

B. INTRINSIC EVALUATION
The intrinsic evaluation determines the quality of the summary based on a comparison between the machine-generated and human-generated summaries. A good summary is judged on two significant factors: quality and informativeness. Human experts may be required to evaluate machine-generated summaries using several quality measures, such as readability, non-redundancy, structure and coherence, referential clarity, conciseness and focus, and content coverage. Some valuable measures for intrinsically evaluating summaries are precision, recall, and F-measure. Researchers must assess the agreement between human-generated and automatically generated summaries. With the evaluation metrics mentioned earlier, it is also possible that two summaries produce different evaluation outcomes despite being equally good. The most frequently used evaluation metrics in the research are the following:
1 Precision Metric: The precision metric evaluates what fraction of the sentences chosen by the system were also chosen by humans. It is calculated by dividing the number of sentences common to the two summaries by the number of sentences in the system summary [152]:
$$P = \frac{|S_{\text{ref}} \cap S_{\text{sys}}|}{|S_{\text{sys}}|} \tag{1}$$
where $S_{\text{ref}}$ and $S_{\text{sys}}$ denote the sets of sentences in the reference and system summaries. Studies that used the precision metric include [333], [334].
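2 Recall Metric: The recall metric determines how many of the sentences selected by humans are also selected by the system. It is calculated by dividing the number of sentences common to the reference and system summaries by the number of sentences in the reference summary [152]:
$$R = \frac{|S_{\text{ref}} \cap S_{\text{sys}}|}{|S_{\text{ref}}|} \tag{2}$$
where $S_{\text{ref}}$ and $S_{\text{sys}}$ denote the sets of sentences in the reference and system summaries. These studies used the recall metric for evaluation: [226], [335], [99], [179].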
3 F-Measure Metric: The F-measure metric combines the recall and precision metrics. It is the harmonic mean of precision and recall [152]:
$$F = \frac{2 \cdot P \cdot R}{P + R} \tag{3}$$
where $P$ is the precision and $R$ is the recall. The studies in [336], [42], [212], and [337] used the F-measure metric for evaluation.
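The three sentence-level metrics above can be illustrated with a minimal Python sketch. It assumes that each summary is represented as a collection of sentence identifiers and uses the standard harmonic-mean form of the F-measure; the function name `sentence_prf` and the sample data are hypothetical and are not taken from [152].

```python
def sentence_prf(reference_sentences, system_sentences):
    """Return (precision, recall, f_measure) for two sets of selected sentences."""
    ref = set(reference_sentences)
    sys_ = set(system_sentences)
    overlap = len(ref & sys_)  # sentences chosen by both humans and the system

    precision = overlap / len(sys_) if sys_ else 0.0
    recall = overlap / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: sentence IDs picked by human annotators and by a system.
reference = ["s1", "s2", "s3", "s4"]
system = ["s2", "s3", "s5"]
p, r, f = sentence_prf(reference, system)
print(f"precision={p:.3f} recall={r:.3f} f_measure={f:.3f}")
```

Here two of the three system sentences appear in the reference, so precision is 2/3, while only two of the four reference sentences are recovered, so recall is 1/2.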
4 ROUGE Metric: Recall-Oriented Understudy for Gisting Evaluation (ROUGE) is a set of metrics for evaluating ATS and machine translation. It compares an automatically generated summary or translation to a set of reference summaries, typically human-generated ones. ROUGE consists of five measures: ROUGE-N, ROUGE-L, ROUGE-W, ROUGE-S, and ROUGE-SU. Examples where the ROUGE metrics were used can be found in [146], [338], [339], and [340].
• ROUGE-N: ROUGE-N is an n-gram recall measure that compares the n-grams of an ATS summary against those of a human-generated or pre-defined reference summary. ROUGE-1 (R1) is the unigram case [16], [341].
• ROUGE-L (R-L): ROUGE-L is based on the longest common subsequence (LCS) between the human-generated and automatically generated summaries. It evaluates the ratio of the length of the LCS of the two summaries to the length of the reference summary [16], [341].
• ROUGE-W: ROUGE-W uses the weighted longest common subsequence, which is an enhancement of the LCS [16].
• ROUGE-S: ROUGE-S (skip-bigram co-occurrence statistics) measures the percentage of skip-bigrams shared between the system and reference summaries. A skip-bigram is any pair of words in sentence order, with arbitrary gaps allowed between them [16], [341].
• ROUGE-SU*: ROUGE-SU extends ROUGE-S by using both skip-bigrams and unigrams as counting units, acting as a weighted average of ROUGE-S and unigram ROUGE-N. These metrics allow bigrams to be made up of non-adjacent words with a maximum of n words between them [16].
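The recall-oriented ROUGE variants listed above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the official ROUGE toolkit: it assumes a single reference, whitespace tokenization, no stemming or stop-word removal, and unlimited skip distance for ROUGE-S. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def overlap_recall(ref_counts, sys_counts):
    """Clipped overlap divided by the number of units in the reference."""
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, sys_counts[unit]) for unit, count in ref_counts.items())
    return matched / total


def rouge_n_recall(reference, system, n=1):
    """ROUGE-N: n-gram recall against the reference."""
    return overlap_recall(Counter(ngrams(reference, n)), Counter(ngrams(system, n)))


def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(reference, system):
    """ROUGE-L: LCS length relative to the reference length."""
    return lcs_length(reference, system) / len(reference) if reference else 0.0


def rouge_s_recall(reference, system):
    """ROUGE-S: recall over skip-bigrams (ordered word pairs with any gap)."""
    return overlap_recall(Counter(combinations(reference, 2)),
                          Counter(combinations(system, 2)))


reference = "the cat sat on the mat".split()
system = "the cat lay on the mat".split()
print(rouge_n_recall(reference, system, 1))  # unigram recall
print(rouge_n_recall(reference, system, 2))  # bigram recall
print(rouge_l_recall(reference, system))     # LCS-based recall
print(rouge_s_recall(reference, system))     # skip-bigram recall
```

In this example the system summary differs from the reference by one word, so five of six unigrams and three of five bigrams match, the LCS has length five, and ten of the fifteen reference skip-bigrams are recovered.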
5 Pyramid Method: The pyramid technique is used because no single best reference summary exists among the human-created model summaries. The fundamental aim is to generate a global standard summary by comparing human-generated reference summaries based on summary content units (SCUs). A good summary has more SCUs from higher pyramid levels than from lower levels, whereas a poor summary has more SCUs from lower tiers than from higher tiers [16].
6 Relative Utility: This measurement assigns a score between 0 and 10 to each sentence in the input document based on relevance. Higher-scored sentences are considered more suitable for the summary [342].
7 Basic Elements: A basic element consists of a modifier or an argument and its relation to the head. The goal of this strategy is to match distinct but equivalent expressions more easily [16].
8 Text Grammars: This strategy aids in the evaluation of text summaries. It focuses on identifying the structure of acceptable text in a formalized setting [16].
9 Factoid Score: The factoid score evaluates computer-generated summaries in terms of factoids, which are atomic units of information. Different pre-defined summaries are utilized, and the knowledge shared among them is evaluated [343].
10 Cohesion and Coherence: Cohesion attempts to account for relationships between text elements. The four significant forms of cohesion are reference, ellipsis, conjunction, and lexical cohesion [344]. Coherence refers to the overall unity of the text, which is accomplished by efficiently grouping and logically arranging ideas. It is expressed in terms of text-to-text relationships, such as elaboration, cause, and explanation. Mani et al. [345] addressed cohesion and coherence in their text summarization task.
11 BLEU: The Bilingual Evaluation Understudy (BLEU) metric assesses the output quality of machine translation systems with respect to reference translations [346].
The main task of this metric is counting the number of n-gram matches, located independently of position, between the system and the reference translations [347]. The BLEU metric can be computed as:
$$\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right) \tag{4}$$
where $p_n$ is the modified n-gram precision, $w_n$ is the weight of the n-gram order $n$, and $\mathrm{BP}$ is the brevity penalty.
12 CHRF: Character n-gram F-score (CHRF) generates a simple F-score by combining the recall and precision of character n-grams of maximum length 6 with several values of the parameter β (= 1, 2, or 3) [348]. Although this is not a common evaluation metric, several variants [349], [350] of CHRF are used to measure the performance of text representation techniques such as word embeddings. CHRF can be computed as:
$$\mathrm{CHRF}_{\beta} = (1 + \beta^2) \cdot \frac{\mathrm{CHRP} \cdot \mathrm{CHRR}}{\beta^2 \cdot \mathrm{CHRP} + \mathrm{CHRR}} \tag{5}$$
where $\mathrm{CHRP}$ and $\mathrm{CHRR}$ are the character n-gram precision and recall.
Automatic summary evaluation based on n-gram graphs (AutoSummENG), Qarla, ParaEval, GEMS, HowNet, and DEPEVAL are automatic evaluation methods that do not require human annotations, while others, such as the factoid score, relative utility, the pyramid method, and text grammars, are semi-automated and require some human annotations [185]. The performance measures used to evaluate text summarization have several issues that create challenges and prevent researchers from reaching firm conclusions. These issues are addressed in detail in Section XII.
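The BLEU and CHRF scores described above can be sketched as follows. This is a simplified illustration, assuming a single reference, uniform n-gram weights, no smoothing, and the commonly used brevity penalty for BLEU; for CHRF it assumes that precision and recall are averaged over character n-gram orders 1 to 6 and that whitespace is kept as an ordinary character. The function names are hypothetical, and the code is not the reference implementation of either metric.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Counts of contiguous n-grams in a list of tokens or a string of characters."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def bleu(reference, candidate, max_n=4):
    """Single-reference BLEU with uniform weights and no smoothing."""
    if not candidate:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        total = sum(cand.values())
        # Clipped counts: a candidate n-gram is credited at most as often
        # as it occurs in the reference (modified n-gram precision).
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_sum += math.log(clipped / total) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_sum)


def chrf(reference, candidate, max_n=6, beta=2.0):
    """Character n-gram F-score with precision and recall averaged over orders."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        ref = ngram_counts(reference, n)
        cand = ngram_counts(candidate, n)
        if not ref or not cand:
            continue  # one of the strings is shorter than n characters
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        precisions.append(overlap / sum(cand.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


ref_tokens = "the cat sat on the mat".split()
cand_tokens = "the cat lay on the mat".split()
print(bleu(ref_tokens, cand_tokens, max_n=2))
print(chrf("the cat sat on the mat", "the cat lay on the mat"))
```

With `max_n=2`, the unigram and bigram precisions of the example are 5/6 and 3/5, so the BLEU score is the geometric mean of the two. Larger β values in `chrf` weight recall more heavily than precision.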

XII. LIMITATIONS AND CHALLENGES
ATS aims to assist users by condensing the important information in the material to be summarized. As discussed in previous sections, although many summarization techniques can be used to create summaries from texts and documents, these techniques are still confined to either extracting specific parts of the original text and concatenating them into a shorter text, or abstracting and paraphrasing the original material in a broad sense. The ultimate goal of any ATS system is to summarize texts as closely as possible to a human-generated summary. However, existing ATS systems still face significant challenges in reaching this goal. Some common challenges include the 'anaphora problem' and the 'cataphora problem'. In this section, we explore and investigate the most common challenges of the ATS domain.
The challenges of ATS tasks for both extractive and abstractive summarization techniques are:
1) Evaluation: This is a commonly encountered difficulty in automatic text summarization. The same study evaluated with different datasets and metrics can yield different results. Datasets and metrics can be biased toward some techniques, for example toward extractive summarization. In that case, using a common dataset and metric can produce a good result. However, automatic evaluation techniques have several issues that should be addressed. Analyzing a summarization task with precision and recall may mislead researchers and may not support the desired conclusion, because precision and recall do not take the content of the source documents into account during comparison. The agreement that two experts would reach by chance is determined by the number and proportions of the classes that the researchers use [351]. In the case of an extractive summary, the scores generated by automated evaluation techniques such as ROUGE and BLEU are less meaningful than human evaluation scores. Because they lose sensitivity, some performance measures also fail to score higher-quality summaries properly. These techniques also cannot handle vocabulary diversity, which makes it difficult to match synonyms of the words used in the documents. In addition, automatic evaluation techniques require reference sentences, and collecting a large number of reference sentences is a difficult task. Lastly, semantic and syntactic correctness is ignored during scoring. This is a significant issue, as some of these metrics give good scores to trivial sentences and fail to penalize grammatically incorrect sentences [352], [353].
2) Important sentence selection: Usually, an ATS system selects the most relevant sentences from the original text and marks them as essential. While forming the summary, the selected sentences or words need to meet the benchmark standard.
However, assigning importance to sentences is very subjective, so the choice of sentences makes a difference to the resulting summary. This problem can be addressed by using user-specific data in a professional summarizer to produce summaries. Although vector representations and similarity matrices attempt to find word correlations, there is no reliable way to identify the most important sentences.
3) Lack of different scenario-based training data: There is much information on the Internet covering various scenarios, such as politics, cricket, and statistical data representation. The datasets used in summarization tasks do not cover every scenario for training.
4) Interpretability: Abstractive models provide condensed representations of the source content that express its essential concepts. Machines struggle with the complexities of human language and with how humans express emotions, particularly in written materials. Therefore, ensuring the interpretability of source content through abstractive models is a difficult task.
5) Interpreting long sentences and jargon: Most existing learning methods can only summarize short sentences and struggle when the algorithms encounter long sentences while processing the source text. Researchers should analyze this problem and build new architectures that reduce or eliminate it.
6) Anaphora problem: The anaphora problem is a prevalent difficulty in text summarization. In discourse, humans frequently substitute the subject with synonyms or pronouns. The 'anaphora problem' is determining which pronoun refers to which word.
7) Retaining the quality of the text: An ATS system should ensure the quality of the summarized text. From the user's perspective, the most desired quality of ATS is that it understands the source text while summarizing. Various machine learning techniques can be used to retain the quality of the summarized text.
8) Word sense ambiguity: Ambiguity in words makes a difference while summarizing sentences. This ambiguity may appear due to acronyms with more than one expansion, multiple usages of the same word in different contexts, etc. The expansion then has to match the topic, or the sense has to fit the subject, for better understanding. A related difficulty is the cataphora problem, the opposite of the anaphora problem, in which a pronoun or ambiguous term appears before the word it refers to. These problems can be addressed using a disambiguation algorithm.
9) Meaningful, intuitive, and robust: Summarized sentences must be meaningful and make sense to the users, and the representation must be robust across any domain the system faces.
10) Predefined template: Recently, natural language processing has made considerable progress in ATS, but these methods cannot generate new sentences on their own. Therefore, the template-based algorithm was introduced, where a specific template needs to be predefined for a particular summarization task.
11) Attaining a higher level of abstraction: Achieving higher-level abstraction is an open research topic in text summarization. Therefore, there are plenty of opportunities for researchers and linguists to address this problem.
In addition to the general challenges mentioned above, we also present a few limitations of the current algorithms used in the ATS domain. Table 7 presents the limitations that need to be solved to achieve better text summarization results.

TABLE 7. Limitations of the current algorithms used in the ATS domain.
| Approach | Category | Algorithm | Limitations |
|---|---|---|---|
| Extractive | Unsupervised | Fuzzy logic | It requires a redundancy removal technique in the post-processing phase to improve the summarization quality. |
| Extractive | Unsupervised | Concept-based | It needs to utilize similarity measures for reducing redundancy, which can affect the quality of the summary. |
| Extractive | Unsupervised | Latent semantic | The LSA-generated summary requires a large amount of time. |
| Extractive | Supervised | Machine learning | It needs a large set of data for training and improving the sentence selection to make a good summary. |
| Extractive | Supervised | Neural network | It is quite slow in the training and application phases. It also requires human intervention for training data. |
| Extractive | Supervised | Conditional random fields | Linguistic features are not considered in CRF. It also requires an external domain-specific corpus. |
| Abstractive | Structure-based | Tree-based | It ignores the context and significant phrases in the text, eventually failing to detect the relation between sentences. Another drawback is that it focuses on syntax, not semantics. |
| Abstractive | Structure-based | Template-based | The templates are pre-defined in this method, which creates a lack of diversity in the summaries. |
| Abstractive | Structure-based | Rule-based | Preparing the rules is a time-consuming process. Another challenge is that the rules need to be written manually. |
| Abstractive | Structure-based | Ontology-based | Preparing a suitable ontology is a very time-consuming process, and the ontology cannot be generalized to other domains. |
| Abstractive | Semantic-based | Multimodal semantic | An automatic evaluation of the framework is required, as it is currently evaluated manually by humans. |
| Abstractive | Semantic-based | Information item | It is difficult to create meaningful and grammatical sentences from text. The linguistic quality of summaries is also very low due to incorrect parses. |
| Abstractive | Semantic-based | Semantic graph | This method is limited to single-document abstractive summarization. |
| | | Deep learning | It needs human effort to build large training data manually. |
| | | Graph-based | It does not consider the importance of words and does not handle the dangling anaphora problem. |

XIII. CONCLUSION
Text summarization is an old topic, but the field continues to attract the interest of researchers. Nonetheless, the performance of text summarization is average in general, and the summaries created are not always ideal. As a result, researchers are attempting to improve existing text summarization methods. In addition, developing novel summarization approaches that produce higher-quality, human-standard, and robust summaries is a priority. Therefore, ATS should be made more intelligent by combining it with other integrated systems to perform better. Automatic text summarization is a prominent domain of research that is extensively implemented and integrated into diverse applications to summarize and reduce text volume. In this paper, we present a systematic survey of the vast ATS domain in various phases: the fundamental theories with previous research backgrounds, dataset inspections, feature extraction architectures, influential text summarization algorithms, performance measurement metrics, and challenges of current architectures. This paper also presents the current limitations and challenges of ATS methods and algorithms, which should encourage researchers to address these limitations and overcome new challenges in the ATS domain.
M. F. MRIDHA (Senior Member, IEEE) is currently working as an associate professor in the Department of Computer Science and Engineering of the Bangladesh University of Business and Technology. He also worked as a CSE department faculty member at the University of Asia Pacific and as a graduate coordinator from 2012 to 2019. He received his Ph.D. in AI/ML from Jahangirnagar University in 2017. He joined the Department of Computer Science and Engineering, Stamford University Bangladesh, as a lecturer in June 2007. He was promoted to senior lecturer in the same department in October 2010 and to assistant professor in October 2011. He then joined UAP in May 2012 as an assistant professor. His research experience, within both academia and industry, has resulted in over 80 journal and conference publications. His research interests include artificial intelligence (AI), machine learning, deep learning, natural language processing (NLP), big data analysis, etc. For more than ten years, he has supervised the thesis work of master's and undergraduate students. He has served as a program committee member for several international conferences and workshops and as an associate editor of several journals.
AKLIMA AKTER LIMA is a computer science student at the Bangladesh University of Business and Technology. She is well organized and determined in her work. Currently, she is working as an assistant researcher in the Advanced Machine Learning lab. She has experience working with TensorFlow, Keras, Matplotlib, etc., and is interested in machine learning and deep learning research. She is currently researching advanced driver assistance systems, stock exchange, automatic text summarization, etc.
KAMRUDDIN NUR (Senior Member, IEEE) is currently serving as an associate professor in the Department of Computer Science at American International University-Bangladesh (AIUB). He also served as the chairman of the Department of Computer Science and Engineering at Stamford University Bangladesh (SUB) and Bangladesh University of Business and Technology (BUBT). Dr. Nur completed his PhD at UPF, Barcelona, Spain, his Master's at UIU, and his Bachelor's at Victoria University of Wellington (VUW), New Zealand. Dr. Nur has authored many papers in prestigious IEEE and ACM journals and conferences, served as a TPC member, and reviewed articles for IEEE, ACM, and Springer journals and conferences. His research areas include ubiquitous computing, computer vision, machine learning, and robotic automation.
SUJOY CHANDRA DAS is a computer science student at the Bangladesh University of Business and Technology. He is determined, communicative, and sincere in his work. Currently, he is working as an assistant researcher in the Advanced Machine Learning lab. He has good communication and presentation skills. He has experience working with front-end development, TensorFlow, Keras, Matplotlib, etc., and is interested in deep learning research. He is currently researching advanced driver assistance systems and automatic text summarization.
MAHMUD HASAN is currently a PhD candidate in the Department of Computer Science at Western University, Canada. He completed his bachelor's degree in computer science at Chittagong University of Engineering & Technology in 2011. He started his professional career as a lecturer at Stamford University Bangladesh. He later joined Bangladesh University of Textiles as a lecturer. Mahmud received his MSc in Computer Science from Western University in 2014. He worked as a research software developer at Robarts Research Institute, as a staff software developer at the R&D unit of IBM Watson Health Imaging, and as a principal investigator at Acceo Tender Retail. He is also working as a research analyst in the Department of Medical Biophysics at Western University. Throughout his versatile career, Mahmud has worked closely with digital and medical image analysis, understanding, segmentation, registration, compression, classification, denoising, and computer vision. His primary focus is on biomedical image analysis using deep learning. He is also interested in other application areas of machine learning in general, such as NLP. He has a proven track record of publications in quality journals and talks presented at many conferences.
MUHAMMAD MOHSIN KABIR was born in Dhaka, Bangladesh. He received a B.Sc. degree in computer science and engineering from the Bangladesh University of Business and Technology (BUBT). He is currently working as a Lecturer and Research Assistant in the Department of CSE, BUBT. He also works as a Researcher in the Advanced Machine Learning Lab. He is always willing to learn new things with full enthusiasm and passion. He has experience working with Python, Keras, TensorFlow, Sklearn, Scipy, etc. He is particularly interested in deep learning, pattern recognition, computer vision, and natural language processing.
VOLUME 4, 2016