Covid-19 and E-Learning: An Exploratory Analysis of Research Topics and Interests in E-Learning During the Pandemic

E-learning has gained further importance, and the amount of e-learning research and applications has increased exponentially, during the COVID-19 pandemic. It is therefore critical to examine trends and interests in e-learning research and applications during the pandemic period. This paper aims to identify trends and research interests in e-learning articles related to the COVID-19 pandemic. Consistent with this aim, a semantic content analysis was conducted on 3562 peer-reviewed journal articles published since the beginning of the COVID-19 pandemic, using the N-gram model and the Latent Dirichlet Allocation (LDA) topic modeling approach. The findings revealed high-frequency bigrams such as “online learn”, “online education”, “online teach” and “distance learn”, as well as trigrams such as “higher education institution”, “emergency remote teach”, “education online learn” and “online teach learn”. Moreover, the LDA topic modeling analysis revealed 42 topics, of which “Learning Needs”, “Higher Education” and “Social Impact” were, in that order, the most studied. These topics also revealed the concepts, dimensions, methods, tools, technologies, applications, and measurement and evaluation models that were the focal points of the e-learning field during the pandemic. The findings of the study are expected to provide insights to researchers and future studies.


I. INTRODUCTION
E-learning environments have pioneered important changes in the field of education through the opportunities and advantages they provide. E-learning specifically focuses on the online dimension of learning-teaching processes [1], [2] and is used synonymously with concepts such as online learning, virtual learning, and web-based learning [3], [4]. E-learning offers alternatives to traditional classroom education: it allows all stakeholders to interact synchronously and asynchronously, and allows students to securely access all course materials without time and place restrictions [5], [6]. Thanks to these opportunities, e-learning provides educational communities with an effective learning-teaching experience and is thus widely accepted in all educational environments [7], [8]. For this reason, the number of articles in the field of e-learning in the research literature is increasing exponentially every day [9].
The associate editor coordinating the review of this manuscript and approving it for publication was Francesco Piccialli.
With the COVID-19 pandemic and the accompanying quarantines, e-learning has come to play an important role in our lives, and e-learning research during the pandemic has therefore been critical [10], [11]. The World Health Organization (WHO) declared COVID-19 a global public health emergency on 30 January 2020 and a pandemic on 11 March 2020 [12]. COVID-19 has caused significant change and transformation in education, as in almost every field [13], [14], [15], [16]. In the spring of 2020, most educational institutions around the world had to suspend face-to-face education until further notice [17], [18], [19]. Thus, the face-to-face courses that form the basis of the education system were rapidly transformed into distance and online courses [18], [19], [20], [21], [22]. Thanks to Information and Communication Technologies (ICT), educational institutions in most countries rapidly developed and implemented alternative delivery channels to move traditional classes at almost all levels to distance learning environments [10], [11], [14], [23], [24]. In this period, which is also called emergency remote teaching [25], various ICTs and e-learning opportunities were mobilized for e-learning applications that quickly put distance education into operation. In this pandemic period, all applications and activities related to distance education are characterized under the umbrella of e-learning [26].
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The widespread turn toward these sub-fields of e-learning, called emergency remote teaching or emergency e-learning during the COVID-19 pandemic, has greatly increased the number of e-learning articles on the pandemic [25], [27], [28]. In general, these articles primarily discuss the emergency remote teaching and emergency e-learning applications and activities that were quickly implemented during the pandemic, together with analyses and evaluations of their different dimensions. The topics given priority in these studies include student satisfaction with, and the quality of, e-learning activities and applications [29], [30]; the effect of e-learning on academic achievement [18]; interest in and readiness for e-learning [17]; perception of e-learning [16], [31]; e-learning adoption [32]; user experiences and expectations regarding e-learning and its dimensions [20], [21], [23], [25]; the benefits, advantages and challenges of e-learning [21], [24]; and strategies and conceptual modeling for emergency e-learning [11], [15].
Due to the distance education requirements mandated by the pandemic and quarantine conditions, the e-learning field has been an important focus of scientific research and practice during the pandemic. As a result, the number of domain-specific articles has increased exponentially. Despite this increase, there is a lack of studies that address pandemic-related e-learning research from a holistic perspective, with an in-depth content analysis that reveals the research landscape of the field in detail. Before the COVID-19 pandemic, various systematic reviews [2], [33], bibliometric analyses [34], [35], and content analysis studies based on topic modeling [6] dealt with e-learning research. Systematic reviews can only be conducted with a limited number of articles, as they are generally based on manual methods and require a lot of researcher effort [9]. Bibliometric studies reveal the scientific metrics and intellectual structure of a field only through certain bibliometric indicators, such as author, article, journal, subject area, country, and affiliation [2], and cannot provide in-depth insights based on semantic content analysis of the domain-specific literature [9].
In this regard, identifying the themes, trends and dynamics of a scientific field requires analyses in which machine learning-based text and data mining methods are applied effectively to data sets containing many domain-specific articles [6]. Topic modeling, which is a probabilistic and generative model, is the leading method used in such analyses [36]. Unlike systematic reviews and bibliometric analyses, the topic modeling approach enables the automatic analysis of a large scientific corpus containing a large number of articles with a systematic methodology [36]. Content analysis based on topic modeling reveals themes and trends in a scientific field from a comprehensive perspective. Although a number of topic modeling studies were carried out in the field of e-learning before the pandemic, to the best of our knowledge, no topic modeling study covering the pandemic period has yet revealed the themes, trends and perspectives of this field with a holistic approach. This creates a gap in the in-depth analysis of the pandemic dimension of the rich, interdisciplinary literature on e-learning.
Therefore, this study aimed to bridge this gap by comprehensively revealing research interests, themes, and trends in e-learning during the pandemic. First, the bibliometric characteristics of the articles about e-learning in the COVID-19 period were investigated. Second, N-gram based content analysis was implemented to identify prominent keywords and terminologies in the corpus. Then, an exploratory analysis was conducted using the LDA-based topic modeling approach, which discovered 42 trending topics that map the research landscape of the field. Within this context, the methodology of the study was designed to investigate the following research questions (RQ):
RQ1: What are the bibliometric characteristics of the articles?
RQ2: What is the distribution of prominent keywords and terminologies in the articles?
RQ3: What are the most focused research topics and interests in the articles?

II. METHOD
A semi-automated methodology was proposed in this study to analyze the empirical corpus of e-learning articles published during the COVID-19 pandemic. The methodology was based on Latent Dirichlet Allocation (LDA), a probabilistic approach to topic modeling used to discover hidden semantic structures in a domain-specific corpus. The research methodology, designed in accordance with the study's purpose, included the following stages. First, the empirical corpus for the study was prepared. Next, data preprocessing was applied to the corpus. The LDA-based topic modeling approach was then applied to the corpus. Finally, the discovered topics were interpreted and named. Figure 1 depicts the methodology in detail.

A. SEARCH STRATEGY AND DATA COLLECTION
The first stage of the methodology included the creation of the search strategy and the acquisition of the data constituting the experimental corpus. A search strategy was designed to capture the e-learning articles published during the pandemic, in order to obtain the data most compatible with the background and purpose of the study. Initially, e-learning and its synonyms (such as distance learning, online learning, online teaching) were determined by considering the domain-specific literature [6], [9], and these keywords were added to the search query. The selected keywords were examined by five field experts and researchers before the final decision was made. Then, keywords related to the pandemic (such as covid*, pandemic*, coronavirus*) were added to the query. Regarding publication type, only peer-reviewed journal articles were included in the data set, on the grounds that journal articles have passed peer review and reached a certain scientific maturity [37]. In addition, other criteria such as publication date (January 2020 to June 2021) and publication language (English) were added to the query. Then, the bibliometric database from which the articles would be obtained was chosen. The articles were searched in Web of Science (WoS) and Scopus, which are considered the two main bibliometric databases. The search returned more articles from Scopus than from WoS. Scopus has a broader scope of journals and includes all journals indexed by WoS [37], [38]. For this reason, the Scopus bibliometric database was chosen as the data source and the search query was built accordingly.
Using this search query, a total of 3562 articles (3371 research articles and 191 reviews) were obtained from the Scopus database on July 6, 2021. Then, an experimental corpus was created consisting of the title, abstract, author keywords and publication year of these articles, because the title, abstract, and author keywords are the sections that most clearly reveal the background, scope, method and purpose of an article [39].

B. DATA PREPROCESSING
Data preprocessing is a critical task that directly affects the success of the analysis, especially in text mining-based studies [40], [41]. For this reason, the preprocessing operations required for this corpus were implemented sequentially. Initially, all texts were converted to lowercase. Then, web links, publisher information, numeric expressions, punctuation, and symbols were deleted. Word tokenization was applied. In the next step, the English stop words (the, are, and, is, or, a, an, for, etc.) were deleted. To strip derived forms and reduce the words in the corpus to their roots, stemming was performed using the Snowball stemming algorithm [42]. Finally, each article in the corpus was modeled as a word vector based on the “bag of words” model to provide a numerical representation of the corpus. The article texts represented by these vectors were transformed into a document-term matrix (DTM). Thus, the corpus was converted into a numerical matrix form suitable for the topic modeling analysis [36], [41].
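The preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the study: the stop word list is deliberately tiny, the function names are hypothetical, and `crude_stem` is a toy suffix stripper that only stands in for the Snowball stemmer.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (assumption); a real run would use a full English list.
STOP_WORDS = {"the", "are", "and", "is", "or", "a", "an", "for", "of", "in", "to", "on"}

def crude_stem(token):
    """Toy suffix stripper standing in for Snowball stemming (NOT the Snowball algorithm)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Lowercase, strip links/digits/punctuation, tokenize, drop stop words, stem."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # web links
    text = re.sub(r"[^a-z\s]", " ", text)               # digits, punctuation, symbols
    tokens = text.split()                                # whitespace word tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [crude_stem(t) for t in tokens]

def document_term_matrix(docs):
    """Return (vocabulary, rows), where rows[d][v] counts term v in document d."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([counts[t] for t in vocab])
    return vocab, rows

# Two made-up abstracts, used only to exercise the functions.
docs = [
    "Online learning and teaching in the COVID-19 pandemic: see https://example.org",
    "Students are learning online; teachers adapted courses for the pandemic.",
]
vocab, dtm = document_term_matrix(docs)
```

Each row of `dtm` is the bag-of-words vector of one document over the shared vocabulary, which is the matrix form that a topic model takes as input.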
In addition, word-level content analysis based on the N-gram model was performed to reveal the domain-specific semantic structures and terminologies that are frequently mentioned in the corpus [9]. In the N-gram model, unigrams represent single words, bigrams represent sequences of two words, trigrams represent sequences of three words, and so on. Using this approach, the frequencies of unigrams, bigrams, and trigrams in the corpus were calculated, and the prominent semantic structures and terminologies in the corpus were thereby revealed.
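As a minimal sketch of this N-gram counting, the Python below extracts word-level n-grams from already preprocessed token lists and reports, for each n-gram, the number and percentage of documents that contain it. Counting each n-gram once per document is an assumption, chosen because the findings report how many documents include a term; the function names and the toy corpus are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous word sequences of length n, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_document_rates(tokenized_docs, n):
    """Map each n-gram to (documents containing it, percentage of all documents)."""
    doc_counts = Counter()
    for tokens in tokenized_docs:
        doc_counts.update(set(ngrams(tokens, n)))  # count once per document
    total = len(tokenized_docs)
    return {g: (c, round(100 * c / total, 2)) for g, c in doc_counts.most_common()}

# Toy stemmed documents, for illustration only.
corpus = [
    ["online", "learn", "higher", "education", "institution"],
    ["emergency", "remote", "teach", "online", "learn"],
    ["online", "education", "student"],
]
bigram_rates = ngram_document_rates(corpus, 2)
trigram_rates = ngram_document_rates(corpus, 3)
```

In this toy corpus the bigram "online learn" occurs in two of the three documents, so its entry is a count of 2 with a rate of 66.67%.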

C. FITTING AND IMPLEMENTING THE TOPIC MODELING
Topic modeling is a machine learning approach often used in text mining and natural language processing to discover hidden semantic structures known as “topics” in a body of text [36]. Topic modeling is grounded in probabilistic approaches and statistical algorithms. The Latent Dirichlet Allocation (LDA) algorithm, which is based on a probabilistic generative process, is an effective topic modeling technique for semantic content analysis [43]. LDA discovers hidden topics in text through unsupervised learning and requires no labeled training data, thus enabling systematic computational analysis of large numbers of text documents [36], [38], [43]. In this study, the MALLET package [44], which provides a basic infrastructure based on Gibbs sampling, was used to fit LDA topic models to the experimental corpus. The LDA model was fitted with various values of the number of topics K, and consistent topics with optimal topic-word distributions were obtained when K was equal to 42. Each of these topics was represented by the top 15 descriptive keywords that reflect its scope.
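To make the Gibbs sampling idea concrete, the following is a small collapsed Gibbs sampler for LDA in plain Python. It is an illustrative sketch only and is not MALLET's implementation: the hyperparameters `alpha` and `beta`, the iteration count, and the toy corpus are all assumptions. Each token is repeatedly reassigned to a topic with probability proportional to (document-topic count + alpha) × (topic-word count + beta) / (topic total + V × beta), where V is the vocabulary size.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0, top_n=5):
    """Collapsed Gibbs sampling for LDA over tokenized docs (lists of strings)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]        # tokens of doc d in topic k
    topic_word = [[0] * V for _ in range(num_topics)]   # tokens of word v in topic k
    topic_total = [0] * num_topics                      # all tokens in topic k
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional distribution over topics for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    def top_words(k):
        ranked = sorted(range(V), key=lambda v: topic_word[k][v], reverse=True)
        return [vocab[v] for v in ranked[:top_n]]

    return doc_topic, [top_words(k) for k in range(num_topics)]

# Toy stemmed corpus, for illustration only.
toy = [
    ["online", "learn", "student"],
    ["vaccine", "health", "stress"],
    ["online", "teach", "learn"],
]
doc_topic, topics = lda_gibbs(toy, num_topics=2, iterations=50)
```

In the study's setting the same procedure would be run with K = 42 and the top 15 words per topic; trying several K values and comparing the resulting keyword lists mirrors the model selection described above.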

III. FINDINGS
The results of the study are first presented descriptively to show the bibliometric characteristics of the corpus. The N-gram and topic modeling analyses are then presented to provide an overall portrait of e-learning studies during the COVID-19 pandemic.

A. DESCRIPTIVE ANALYSIS
To answer Research Question 1 (RQ1), a total of 3562 articles were analyzed (3371 research articles and 191 literature reviews). Of these, 1381 were published in 2020 and 2181 in 2021; their distribution by subject area is given in Table 1. Because some articles are indexed in more than one subject area, the article counts and rates summed over the subject areas can exceed the corpus totals. Table 1 shows that “Social Sciences”, “Medicine” and “Computer Science” are the subject areas with the most publications, while “Earth and Planetary Sciences” and “Veterinary” have the fewest. The distribution of the articles across the top 20 journals is given in Table 2.
Examined in detail, Table 2 reveals that the journals with the highest number of published articles are “Journal of Chemical Education”, “Sustainability Switzerland” and “Education Sciences”, respectively. The top 20 countries from which the highest numbers of articles originated are listed in Table 3. Table 3 shows that the highest numbers of articles originated from the “United States”, “India” and the “United Kingdom”, respectively.

B. N-GRAM BASED CONTENT ANALYSIS
This section presents the results of the N-gram (unigram, bigram and trigram) analysis employed to identify the high-frequency terms in the corpus of e-learning studies during COVID-19.
With regard to Research Question 2 (RQ2), the results of the unigram, bigram, and trigram analyses are given in Table 4, Table 5, and Table 6. As seen in Table 4, the highest ranked unigram in the corpus is the term “learn”, with a rate of 81.67%. In other words, a total of 2909 documents include the term “learn”. It is followed by the terms “online” (n=2539, f=71.28%) and “education” (n=2520, f=70.75%). As shown in Table 5, the highest ranked bigram in the corpus is the two-word sequence “online learn”, with a rate of 44.27%. Specifically, “online learn” occurs in a total of 1577 documents. It is followed by the two-word sequences “online education” (n=722, f=20.27%) and “online teach” (n=652, f=18.30%). As seen in Table 6, the highest ranked trigram in the corpus is the three-word sequence “higher education institution”, with a rate of 4.24%. Specifically, this three-word sequence occurs in 151 documents. It is followed by the three-word sequences “emergency remote teach” (n=137, f=3.85%) and “education online learn” (n=119, f=3.34%).
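The reported rates are consistent with reading f as the share of the 3562 documents that contain the term, that is, f = n / 3562 × 100. The short Python check below recomputes each rate from the document counts quoted above; the variable and function names are illustrative.

```python
TOTAL_ARTICLES = 3562  # corpus size reported in the study

# (documents containing the term, reported rate in percent), as quoted in the text.
reported = {
    "learn": (2909, 81.67),
    "online": (2539, 71.28),
    "education": (2520, 70.75),
    "online learn": (1577, 44.27),
    "online education": (722, 20.27),
    "online teach": (652, 18.30),
    "higher education institution": (151, 4.24),
    "emergency remote teach": (137, 3.85),
    "education online learn": (119, 3.34),
}

def rate(doc_count, total=TOTAL_ARTICLES):
    """Percentage of documents containing a term, rounded to two decimals."""
    return round(100 * doc_count / total, 2)

# Terms whose recomputed rate differs from the reported one (expected to be none).
mismatches = {term: (rate(n), f) for term, (n, f) in reported.items() if rate(n) != f}
```

For example, 2909 / 3562 × 100 rounds to 81.67, matching the rate reported for “learn”.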

C. TOPIC MODELING ANALYSIS
Finally, regarding Research Question 3 (RQ3), the results of the topic modeling analysis with LDA are given in Table 7. The analysis discovered 42 main topics. The rate of each topic is calculated from the number of articles in that topic. In addition, the top 15 keywords for each topic are given in the table.
A detailed analysis of Table 7 shows that the most studied topic during the COVID-19 period is “Learning Needs”, with a rate of 6.7%. It is followed by “Higher education” and “Social impact”, with rates of 5.94% and 4.61%, respectively. The least studied topic is “Disability training”, with a rate of 0.55%.
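One plausible way to obtain such topic rates, sketched below in Python, is to assign each article to the topic with the highest weight in its document-topic distribution and then report the share of articles per topic. The dominant-topic assignment is an assumption (the text states only that the rate is based on the number of articles in each topic), and the weights and topic names are made up for illustration.

```python
from collections import Counter

def topic_rates(doc_topic_weights, topic_names):
    """Percentage of documents whose highest-weight topic is each named topic."""
    dominant = [max(range(len(row)), key=row.__getitem__) for row in doc_topic_weights]
    counts = Counter(dominant)
    total = len(doc_topic_weights)
    return {topic_names[k]: round(100 * counts[k] / total, 2) for k in range(len(topic_names))}

# Hypothetical document-topic weights for four articles and three topics.
weights = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.5, 0.4, 0.1],
    [0.2, 0.2, 0.6],
]
rates = topic_rates(weights, ["Topic A", "Topic B", "Topic C"])
```

Under this assignment every article counts toward exactly one topic, so the rates sum to 100% up to rounding.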

IV. DISCUSSION
This study aimed to identify the current trends and research interests in e-learning studies related to the COVID-19 pandemic. To this end, the bibliometric characteristics of 3562 articles were first determined; then the high-frequency terms and topics in the articles were revealed using N-gram and LDA-based topic modeling approaches. The numbers of articles published in 2020 and in 2021 (up to almost midyear) were 1381 and 2181, respectively. As highlighted in previous studies, distance education and e-learning quickly became popular during the COVID-19 pandemic [15], [21], [25], [32], and the growth in the number of articles in this field has accelerated in recent years [34], [35]. Based on the subject areas of the articles, it can be concluded that publications have appeared in almost every field, with “Social Sciences”, “Medicine” and “Computer Science” being the top-ranked subject areas by number of articles. Bearing in mind that educational sciences is classified under social sciences [9], [35], the high ranks of “Social Sciences” and “Medicine” support previous findings [35], [45]. Regarding the distribution of the articles by journal, the top-ranked journals among a wide range are “Journal of Chemical Education”, “Sustainability Switzerland” and “Education Sciences”. Although previous work [35] has concluded that articles in the field of e-learning are mostly published in the journal “Computers and Education”, this study reveals a different set of leading journals. Considering the origins of the articles, the “United States”, “India” and the “United Kingdom” rank as the top three, respectively. In accordance with previous studies [9], [34], this study also finds the “United States” to be the top-ranked country by number of articles. In addition, countries such as the “United States”, the “United Kingdom” and “China” are generally top-ranked in the literature [9], and in this study as well.
N-gram analyses (unigram, bigram, and trigram) were employed to determine the frequent terms in the articles. The aim of these analyses was to identify prominent keywords and high-frequency terms in the corpus and to determine domain-specific contexts. The unigram results reveal terms such as “learn”, “online” and “education”, which are directly related to e-learning. The bigram results show keywords related to e-learning and distance learning, such as “online learn”, “online education”, “online teach” and “distance learn”. Trigrams such as “higher education institution”, “emergency remote teach”, “education online learn” and “online teach learn” are also frequent in the corpus. The first trigram, “higher education institution”, is an important indicator of the acceleration of e-learning studies in higher education during the pandemic [15], [23], [24], [32]. The remaining trigrams can be regarded as direct indicators of the rapid transition to e-learning and distance education [20], [25], which the results of the current study support.

V. CONCLUSION AND FUTURE STUDIES
This study aimed to reveal the research landscape described by e-learning studies related to the COVID-19 pandemic. The study is built on three pillars. First, descriptive analyses were performed to identify the bibliometric features of the field. Next, N-gram analyses were used to identify the most frequent terms and thereby determine the domain-specific context. Finally, LDA-based topic modeling was used to analyze the topic distributions of the articles. The results of the study are significant because they present the existing situation and research trends in the e-learning field during COVID-19. The findings on the bibliometric characteristics of the field demonstrate the following: e-learning studies accelerated during COVID-19; the journals in which the articles are published are considerably diverse; the studies are conducted in different fields and disciplines, particularly social sciences; and the “United States” is the top-ranked country of origin. The results of the N-gram analyses reflect the main frame of distance learning and e-learning. The trigram results in particular show that emergency distance education and studies in higher education are prominent. Finally, the results of the topic modeling analysis reveal that topics such as “Learning Needs”, “Higher Education”, “Social Impact” and “Blended Learning” are prominent.
Our findings show that the educational precautions in the early stages of COVID-19 focused on using distance education strategies and procedures as an emergency response. As the pandemic progressed, so did distance education activities. A complete understanding of the pandemic's short-, medium-, and long-term effects is still developing. Our study explored how the pandemic has impacted distance learning and traditional learning environments. Our findings revealed the e-learning concepts, methods, tools and technologies used in e-learning environments during the pandemic. The findings also demonstrated common e-learning activities, application areas, measurement and evaluation models used in e-learning, and the social and health dimensions of e-learning, all of which are focal points of the e-learning field during the pandemic.
In this study, descriptive analysis, N-gram-based content analysis, and LDA-based topic modeling analysis were performed on a total of 3562 articles published during COVID-19. Considering that the COVID-19 pandemic continues, the results of the current study are expected to provide a basis and insights for future studies. The semi-automated text mining methodology based on generative topic modeling proposed in this study can be applied to different sub-contexts of educational sciences to obtain up-to-date implications for practitioners and educators from various backgrounds. Educational regulations and policies can be updated by taking into account the trends and themes in the e-learning field revealed in this study. In today's e-learning discipline, where the number of publications is rapidly increasing, content analysis based on unsupervised machine learning may be a useful methodology and guide for the ever-expanding education communities in the near future. Taking these trends in emergency e-learning strategies into account, training programs for in-service teachers and educators can be updated, thus providing candidates with emergency response skills for pandemics. Furthermore, the results of the study can lead to detailed investigations that perform more comprehensive analyses of particular topics.