A Predictive Paradigm for Event Popularity in Event-Based Social Networks

Event-based social networks (EBSNs) have recently emerged as flexible online platforms on which people form online groups and organize offline events. The success of an offline event depends heavily on its number of participants, which in turn contributes to the growth of online groups and of the social network as a whole. In this paper, we study the problem of event popularity, where the popularity of an event is derived from its number of participants. We propose a predictive paradigm that generates features and trains regression methods to estimate the popularity of events. We first crawled datasets and generated features from them. Three well-known regression methods, i.e., support vector machine, random forest, and decision tree, were then used to predict the popularity of events. Extensive experiments were conducted on three city datasets in two different contexts. In the city context, each city dataset was converted into a data table, on which the three regression methods built predictive models to estimate the popularity of events. In the group context, each group in a city dataset was transformed into its own group data table, and regression models were built on that table. Overall, the proposed paradigm with random forest performs best in terms of the MAE and RMSE metrics. Moreover, this study shows that in the city context, event content is the factor that contributes most to people's engagement in events, whereas in the group context, event time is crucial in helping users plan to join events.


I. INTRODUCTION
Online social networks shape the way people work and communicate with each other. Moreover, with the rapid growth of online social networks, people have many opportunities to attend online events and offline activities. To combine online and offline events in one framework, event-based social networks (EBSNs) [1] have emerged, for example, Meetup, Douban, and Facebook. People can create and distribute events in these networks, and users can take part in any event that interests them. Many groups are created around similar themes, and the events of these groups are published with similar topics. For instance, groups with a start-up theme often issue events on business topics.
The associate editor coordinating the review of this manuscript and approving it for publication was Barbara Guidi.
Once an event is announced, its invitation is sent to users. Many works [2], [3], [4] address the problem of finding a list of users who are willing to attend the event.
Recommending an event to users has also been investigated by many researchers [5], [6], [7]. However, when creating a new event, the organizers want to estimate the number of participants in advance so that they can prepare the event as well as possible and save costs.
The success of an event depends on its number of participants: the more people come, the more successful the event is. Thus, the participant number is a key factor in evaluating the sustainability of a group's events and even the growth of a social network. Predicting the number of participants in an event is therefore also a challenging problem in social networks. Moreover, EBSNs currently offer no online tools that assist organizers in estimating the number of participants when they create a new event.
Event popularity is initially measured as the participant number. In this paper, we study diverse events with very different participant numbers, ranging from social events to sports activities. Therefore, we define a new metric based on the participant number to represent the popularity of events.
Predicting event popularity provides valuable information that helps administrators of social networks deploy more services for users. Thus, an advanced technique for event popularity prediction over online social network platforms is in high demand. In addition, the problem of event popularity has not been studied thoroughly. These observations open a new research problem: event popularity in these social networks.
In this paper, we study the problem of event popularity over event-based social networks. In doing so, we provide a deeper understanding of online social networks. The problem is formulated as follows: Given a new event $e^*$ published by a group $g$ within an EBSN dataset, the objective is to predict the popularity of this event based on the historical events in the EBSN dataset.
In this work, we propose a predictive paradigm that consists of four parts to estimate the popularity of events. Part 1 stores an EBSN dataset crawled from Meetup. Part 2 generates three main groups of features based on three main factors of events, i.e., venue, time, and content. Part 3 applies three regression methods, i.e., random forest, support vector machine, and decision tree, to estimate the popularity of events. In Part 4, the estimated popularity is sent to event organizers.
In experiments, we carry out the proposed paradigm in two different contexts using three crawled city datasets. In the first context, we consider each city as one EBSN dataset. The three groups of features of all events are generated from this EBSN dataset, and each regression method uses the generated features to build a predictive model. When a new event $e^*$ is created by a group $g$ and published in this EBSN, the paradigm generates the features of $e^*$ with respect to all past events in the EBSN dataset and provides them to the predictive models in order to estimate the popularity of $e^*$. In the other context, each group in a city is treated as one group dataset. Similar to the first context, features are first created only for events in the group dataset, and predictive models are then built on these features. Next, the features of $e^*$ are generated and used in the predictive models to forecast the popularity of $e^*$.
To summarize, the contributions of our work are:
• The problem of event popularity in event-based social networks is defined.
• We propose a predictive paradigm to address the problem.
• In the proposed paradigm, we generate features from a dataset and train regression methods based on the features to predict the event popularity.
• We conduct extensive experiments on Meetup datasets consisting of three famous cities in the world to illustrate the accuracy and efficiency of the proposed paradigm.
• This work can be implemented as an online tool for event organizers.
The remainder of this paper is organized as follows. Section II briefly reviews related work. EBSN terminology and the problem are explained in Section III. The event popularity paradigm is presented in Section IV. Section V reports the empirical study. Conclusions are given in Section VI.

II. RELATED WORK
The concept of popularity and the prediction of social trends have been studied in many works [8], [9], [10], [11], [12]. Zhao et al. [8] recently studied event popularity over microblogs. They addressed the problem of social trend popularity, which was measured as the length of time for which a social trend exists. Yin et al. [13] studied the problem of topic reading dynamics, where a topic was expressed by a set of keywords in Weibo posts. They proposed a model that can predict the users who are interested in specific topics on Weibo. In another work [14], they investigated the behaviors of users within the context of Covid-19 and proposed the SRFI model to predict the opinions of users about the pandemic through Chinese Sina blogs. Prediction of social trends about the vaccine was investigated in [15], where rough set theory was used to evaluate the network of public opinions.
Another study on popularity [16] defined the research problem of online news popularity, which can be expressed by the number of shares, likes, and comments. The authors first generated a list of features from articles and then used a boosting method to predict whether users would share a new article. Gao et al. [12] investigated the problem of future message popularity over the Weibo social network. They studied the process of resending new messages and predicted popularity with an extension of the Poisson model involving a time mapping process. The lifetime of online stories was examined in [17], which provided an extensive analysis of the quality and quantity of online articles in order to model social media interactions among readers. Lee et al. [11] studied the popularity of online content. They aimed to predict the likelihood of the lifetime of online content by using a hazard regression model, and used two datasets with rich content, i.e., forum.dpreview.com and forums.myspace.com. Almed et al. [18] defined the problem of popularity in user-generated content on YouTube, Digg, and Vimeo. They proposed a two-stage method to predict content popularity: in the first stage, they analyzed content behaviors and generated features; in the second stage, they used a regression model to predict the values of content popularity. Online content popularity has also been investigated over Digg and YouTube in [19].
Shang et al. [20] integrated social influence with homophily into a model to predict online content popularity. Dou et al. [21] also predicted online content popularity using rich information. They first selected contexts, then represented these contexts in a unified form, and finally used this form to predict popularity. They proposed a knowledge-based method to improve the accuracy of popularity prediction for online items. Lymperopoulos [22] classified the growth of online content into two patterns: linear and non-linear growth periods. They modelled the popularity of such content as a sequence of linear and non-linear phases and used these phases to predict popularity.
Liu et al. [1] investigated Meetup and defined it as an event-based social network (EBSN). Several research problems in EBSNs have since been defined, such as event recommendation [2], group recommendation [4], and active-friend recommendation [23], [24]. The problem of event attendee recommendation is to select the top N users who are likely to attend an event. However, the problem of event popularity remains to be explored in EBSNs.
To predict a list of attendees at events, Wang et al. [25] proposed a model that combines weak tie theory with a linear regression method. This study was conducted on data crawled from Facebook. In another work [10], Mehmood et al. analyzed the contents of events gathered from Twitter and proposed an LSTM-based model to predict the participant number of events. Bhowmick et al. [26] defined a new concept of topical micro-categories in the context of EBSNs and designed a new methodology to explore micro-categories, which were characterized by the popularity profile of Meetup events. Chen et al. [27] studied the event popularity problem on Twitter. They first considered an event as a set of messages involving hashtags and then designed a new model that combines hashtag-based and influence-based information to predict popularity. Madisetty et al. [28] investigated the problem of the social media popularity of events and proposed a deep learning model to estimate event popularity. Li et al. [9] studied the problem of group popularity. They proposed a deep neural network model built on group-based and time-based features to predict group popularity.
In our work, we focus on the popularity of events within the context of event-based social networks. In the following section, we illustrate the structure of EBSNs and define the event popularity problem within these networks.

III. DEFINITIONS AND PROBLEM
A. EBSN TERMINOLOGIES
Event-based social networks (EBSNs) are currently among the most active social networks. Meetup (meetup.com) is a well-known example of an EBSN and is used in 190 countries. This network has 300,000 groups, which create 10,000 online and offline events per week, and more than 52 million users. For each event, Meetup provides only the time, location, content, and list of participants. It does not provide the reasons why events are canceled or delayed, such as weather conditions. Moreover, there are no direct links between users in the Meetup network. EBSNs are built from four main entities, which are illustrated in Figure 1 and described as follows:

1) GROUPS
A group is initially created by one user and can be organized by several users. The group founder can offer a short description of the group's theme in order to attract more users. The group keeps a record of its past events; moreover, its upcoming events are announced to the group's members and to all users of the EBSN.

2) EVENTS
Any user in a group is allowed to create an event, and this user is defined as the event organizer. The event is published by the group and is described by detailed content. In addition, the time and location of the event help users plan their participation. Users send an RSVP with YES to confirm that they will attend the event; otherwise, they reply with an RSVP with NO. Hence, each event has a list of participants.

3) VENUES
A venue is a special entity in event-based social networks. A particular venue is represented by a physical address with a specific location given by latitude and longitude. In EBSNs, people first join online groups and then create offline events, which are hosted at venues where they meet each other. Thus, a venue stores a list of hosted events. Choosing a suitable venue to host events is crucial to attracting more users.

4) USERS
When a user joins an EBSN, he/she can become a member of one or several groups relevant to his/her interests. The user can even create his/her own group. Once an event is sent to the user, the user decides whether to engage in the event or decline it. In EBSNs, there are no connections that indicate whether users are friends.
Figure 1 also describes the procedure of creating and hosting events. For example, the event Meeting is first created by user $u_3$ and issued by group $g_2$. This event is then described by its content and hosted at venue $v_2$. In addition, users $u_3$ and $u_5$ engage in this event. The figure also makes clear that there is no online tool or model to help event organizers forecast participant numbers.

B. PROBLEM STATEMENT
To hold a new event, forecasting how many users want to take part is a contributing factor to its success. The participant number is measured by the RSVPs with YES for the event. In EBSNs, there are many different groups with diverse topics, so the number of users who want to take part varies from event to event. For example, the group with ID "15817402", a Web3 group in Sydney, changed the topics of its events many times, from blockchain to start-up topics. This group published 25 events in the period 2017-2018, and the number of participants in its events fluctuated widely, from 5 to more than 200. Hence, in this study, we propose a new metric for event popularity, in which $p_i$ denotes the popularity of event $e_i$, $N$ is the number of events issued by group $g$, and $|e_i|$ is the number of participants in $e_i$.
Event Popularity Problem: Given a new event $e^*$ issued by group $g$ in an EBSN dataset, we aim to predict the popularity $p^*$ of event $e^*$ based on past events in this dataset. To address this problem, we propose a predictive paradigm in the following section.

IV. EVENT POPULARITY PARADIGM
This section presents our paradigm. We first discuss the architecture of the paradigm and then present the feature generation. Finally, we build regression models based on the generated features.
A. ARCHITECTURE OF THE PROPOSED PARADIGM
Figure 2 presents the architecture of the paradigm, which consists of four parts. The paradigm works as follows. A given EBSN dataset is stored in Part 1 and modeled as relationships between the entities of the EBSN model. Part 2 describes the methods that yield features; specifically, three major factors are selected to generate the features of all events in the dataset. In Part 3, three regression methods are chosen to build predictive models based on the generated features. When a new event $e^*$ is given, the features of $e^*$ are generated with respect to the dataset and provided to the predictive models to obtain the popularity $p^*$ of this event, which is sent to the event organizer in Part 4.

B. FEATURE GENERATION
Given an EBSN dataset, we build features based on the four main entities and the structure of this dataset. Specifically, given an event $e^*$ in group $g$, we leverage the information of three factors, i.e., the venue, time, and content of $e^*$, to build the features of $e^*$. The features are grouped into three main categories: venue-based, time-based, and content-based features. For clarity, Table 1 describes the notations and Table 2 lists the generated features.

1) VENUE-BASED FEATURES
People engage in a new event for several reasons. The event may be hosted at a popular venue that is convenient to reach, or its location may be close to events that they attended previously. Thus, choosing a suitable location or a convenient venue is very important for attracting more users to the event. To generate a list of features for a new event $e^*$ in group $g$ with a physical location, we first collect the events relevant to $e^*$ as follows:
$$E^* = \{ e_i \mid e_i \in E \ \text{and} \ dis(e^*, e_i) \le r \}$$
where $E$ is the list of events extracted from a given EBSN dataset and $dis(e^*, e_i)$ is the Euclidean distance in kilometers. Given a radius threshold $r$, the list $E^*$ collects the events that lie within radius $r$ of event $e^*$. We generate features of $e^*$ with respect to $E^*$ as follows:
$$V_{av} = \frac{1}{|E^*|} \sum_{e_i \in E^*} |e_i|$$
where $V_{av}$ represents the average number of participants of the events in $E^*$. $|E^*|$ is the number of events in $E^*$, and it is itself considered one feature of event $e^*$. Three further features are also derived, i.e., $V_{min} = \min\{|e_i| : e_i \in E^*\}$, $V_{max} = \max\{|e_j| : e_j \in E^*\}$, and $V_{sd}$. In other words, $V_{min}$ and $V_{max}$ represent the smallest and the largest number of participants among the events in $E^*$, respectively, and $V_{sd}$ is the standard deviation of the number of participants in $E^*$.
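The venue-based features can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Event` record and its field names are hypothetical, venue coordinates are assumed to be already projected onto a plane measured in kilometers so that the Euclidean distance applies directly, and the population standard deviation is assumed for $V_{sd}$.

```python
import math
import statistics
from dataclasses import dataclass

@dataclass
class Event:
    # Hypothetical record; x_km and y_km are venue coordinates assumed to be
    # already projected onto a plane measured in kilometers.
    event_id: str
    x_km: float
    y_km: float
    participants: int  # |e_i|, the number of YES RSVPs

def dis(a: Event, b: Event) -> float:
    """Euclidean distance between two event venues, in kilometers."""
    return math.hypot(a.x_km - b.x_km, a.y_km - b.y_km)

def venue_features(new_event: Event, past_events: list, r: float = 0.5) -> dict:
    """Collect E* (past events within radius r) and summarize their attendance."""
    e_star = [e for e in past_events if dis(new_event, e) <= r]
    counts = [e.participants for e in e_star]
    if not counts:
        # No neighboring events: fall back to zeros (an assumption).
        return {"count": 0, "V_av": 0.0, "V_min": 0, "V_max": 0, "V_sd": 0.0}
    return {
        "count": len(counts),               # |E*|
        "V_av": statistics.fmean(counts),   # average attendance in E*
        "V_min": min(counts),
        "V_max": max(counts),
        "V_sd": statistics.pstdev(counts),  # population standard deviation
    }

past = [
    Event("e1", 0.0, 0.1, 10),
    Event("e2", 0.2, 0.0, 30),
    Event("e3", 3.0, 3.0, 200),  # outside the 0.5 km radius
]
features = venue_features(Event("new", 0.0, 0.0, 0), past)
print(features)
```

With real latitude and longitude values, the distance function would need to be replaced by one that converts degrees to kilometers; the rest of the sketch would stay the same.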
To better understand the relationship between event venues in the EBSN dataset, we first compute the distance similarity between each event $e_i$ in $E^*$ and $e^*$ (Equation 4), where $S_V^i$ is the distance similarity between the venue of $e^*$ and the venue of event $e_i$. Then, we obtain the list $ES$ as follows:
$$ES = \{ es_i \mid es_i = |e_i| \times S_V^i \ \text{and} \ e_i \in E^* \}$$
Finally, the features of event $e^*$ relevant to $ES$ are created as:
$$V_{av}^{ES} = \frac{1}{|ES|} \sum_{es_i \in ES} es_i$$
where feature $V_{av}^{ES}$ is the average of the values $es$ in list $ES$. Similar to list $E^*$, we also have features $V_{min}^{ES}$, $V_{max}^{ES}$, and $V_{sd}^{ES}$ of event $e^*$ based on list $ES$.
Event $e^*$ is published by group $g$; therefore, we build further features of $e^*$ that are relevant only to group $g$. We first select the list of events in $E$ that are issued by $g$, denoted by $E_g$:
$$E_g = \{ e_j \mid e_j \ \text{published in group} \ g \ \text{and} \ e_j \in E \}$$
Equation 4 is also used to compute the distance similarity between the venue of $e^*$ and the venue of $e_j$ in $E_g$. As a result, we have a list $ES_g = \{ es_j \mid es_j = |e_j| \times S_V^j \ \text{and} \ e_j \in E_g \}$ with $|E_g|$ elements.
Similar to the $E^*$ and $ES$ lists, we create five features of $e^*$ from $E_g$ and four further features of $e^*$ from $ES_g$. These nine features are described in Table 2.
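The similarity-weighted lists can be illustrated as follows. This is only a sketch: the distance similarity itself (Equation 4) is not reproduced here, so the similarity scores are supplied as given inputs with made-up values, and the tuple layout, the helper names, and the use of the population standard deviation are hypothetical choices.

```python
import statistics
from typing import Sequence

def weighted_list(counts: Sequence[int], sims: Sequence[float]) -> list:
    """Build es_i = |e_i| * S_i for each event, as in the ES lists."""
    return [c * s for c, s in zip(counts, sims)]

def summarize(values: Sequence[float]) -> dict:
    """Average, minimum, maximum, and standard deviation of a list."""
    if not values:
        return {"av": 0.0, "min": 0.0, "max": 0.0, "sd": 0.0}
    return {
        "av": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
        "sd": statistics.pstdev(values),  # population form is an assumption
    }

# Hypothetical past events: (group id, participants, similarity to e*).
# The similarity values are illustrative inputs, not outputs of Equation 4.
events = [("g1", 10, 1.0), ("g1", 30, 0.5), ("g2", 50, 0.25)]

# ES over all listed events (assumed to be the events already in E*).
es_all = weighted_list([p for _, p, _ in events], [s for _, _, s in events])

# ES_g restricted to the events published by group g1.
in_group = [(p, s) for g, p, s in events if g == "g1"]
es_g = weighted_list([p for p, _ in in_group], [s for _, s in in_group])

es_features = summarize(es_all)
es_g_features = summarize(es_g)
print(es_features)
print(es_g_features)
```

Keeping the similarity as an input means the same two helpers can be reused unchanged for any weighting, including a content-based one.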

2) TIME-BASED FEATURES
People often plan to take part in events on a specific day of the week and at a particular time, for instance, at 5 pm on Saturday. Moreover, if they unexpectedly have free time on a given day, they look for a suitable event to join. Hence, we separate the time-based factor into Day of Week and Hour of Day factors and generate features based on these two factors.

a: DAY OF WEEK
To build features based on this factor, we select only the events in $E^*$ that are hosted on the same day of the week as event $e^*$, such as Saturday; these events are denoted by $E_D$. The resulting features are listed in Table 2.
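The day-of-week selection can be sketched with Python's standard `datetime` module. The tuple layout and function names below are hypothetical, and the hour-of-day filter is assumed to work analogously to the day-of-week filter, which is an assumption rather than something stated above.

```python
from datetime import datetime
from statistics import fmean

def attendance_on_same_weekday(new_time: datetime, past_events: list) -> list:
    """Participant counts of past events held on the same weekday as the new event."""
    return [n for t, n in past_events if t.weekday() == new_time.weekday()]

def attendance_at_same_hour(new_time: datetime, past_events: list) -> list:
    """Participant counts of past events that start in the same hour of the day.

    Assumed to mirror the day-of-week selection.
    """
    return [n for t, n in past_events if t.hour == new_time.hour]

# Hypothetical data: (start time, number of participants).
new_time = datetime(2018, 3, 17, 17, 0)     # a Saturday, 5 pm
past_events = [
    (datetime(2018, 3, 3, 17, 0), 20),      # Saturday, 5 pm
    (datetime(2018, 3, 7, 17, 0), 8),       # Wednesday, 5 pm
    (datetime(2018, 3, 10, 10, 0), 40),     # Saturday, 10 am
]

e_d = attendance_on_same_weekday(new_time, past_events)
e_h = attendance_at_same_hour(new_time, past_events)
day_average = fmean(e_d) if e_d else 0.0
print(e_d, e_h, day_average)
```

The minimum, maximum, and standard deviation of each filtered list can be derived in the same way as the average.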

3) CONTENT-BASED FEATURES
Event $e^*$ announced in an EBSN usually comes with explicit content, which includes a title and a description of the event. This content has an impact on users' decisions about whether to attend. Therefore, we create features based on content similarity.
The content of each event can be represented as a vector of terms. Hence, given two events $e^*$ and $e_i$ with term vectors $T^*$ and $T_i$, respectively, the content similarity between the two events is computed as in Equation 12:
$$t(e^*, e_i) = \frac{T^* \cdot T_i}{\|T^*\| \, \|T_i\|} \quad (12)$$
where $t(\cdot, \cdot)$ is the cosine similarity score between two events, whose value lies in $[0, 1]$. A higher value of $t$ indicates that the two events are more relevant in content. In addition, we obtain two new lists of events that are relevant to $e^*$ as follows:
$$ES_C = \{ es_i \mid es_i = |e_i| \times t(e^*, e_i) \ \text{and} \ e_i \in E^* \} \quad (13)$$
$$ES_C^g = \{ es_j \mid es_j = |e_j| \times t(e^*, e_j), \ e_j \in E, \ \text{and} \ e_j \ \text{published by} \ g \}$$
These two lists, $ES_C$ and $ES_C^g$, yield eight features for $e^*$, as list $ES$ does. These eight features are also described in Table 2.
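The content similarity and the resulting weighted list can be sketched in plain Python. The term weighting is not specified above, so raw term counts are assumed here; the tiny stop-word set, the regular-expression tokenizer, and the sample event texts are likewise illustrative stand-ins.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analyzer would use a fuller one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}

def term_vector(text: str) -> Counter:
    """Represent a piece of event content as raw term counts without stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical new event and past events: (content, number of participants).
new_content = term_vector("Intro to blockchain for startups")
past = [("Blockchain for startups", 40), ("Saturday hiking trip", 25)]

# es_i = |e_i| * t(e*, e_i), the content-weighted attendance of each past event.
es_c = [n * cosine_similarity(new_content, term_vector(text)) for text, n in past]
print(es_c)
```

Because term counts are non-negative, the similarity stays within $[0, 1]$, which matches the range stated for $t$.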
Example of Obtaining Lists of Events: Figure 3 shows an example of an EBSN dataset that includes two groups, $g_1$ and $g_2$, and a set $E$ containing six events. When an upcoming event $e^*$ is published by $g_1$, we obtain the lists of events relevant to $e^*$ as follows. Given a threshold $r$, we obtain the list of events $E^* = \{e_1, e_2, e_4, e_5\}$, as shown by the circle in Figure 3. The distance similarity between $e^*$ and each event in $E^*$ is computed by Equation 4; therefore, we have the list of elements $ES$. $E_{g_1}$ contains events $e_1$, $e_2$, and $e_3$, from which $ES_{g_1}$ is also obtained. The events in list $E_D$ are $e_2$ and $e_5$. These lists of events are used to generate all features of $e^*$ with respect to the given EBSN dataset. All features are listed in Table 2.

C. REGRESSION METHODS
From the feature generation stage, we obtain the generated features of all events in $E$. In other words, we transform the given EBSN dataset into a data table $D = \{F, P\}$, where each row $D_i$ consists of the generated features $F_i$ (listed in Table 2) and the popularity $p_i$ of event $e_i$. We use $D$ to train regression models. For a new event $e^*$, we generate its features, denoted by $F^*$, which are fed into the trained models to predict the popularity $p^*$ of $e^*$. In this work, we select the decision tree (DT) [29], support vector machine (SVM) [30], and random forest (RF) [31] methods to predict the popularity of events.
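The training and prediction step can be sketched as follows. The paper states only that the algorithms are implemented in Python, so the use of scikit-learn and NumPy here is an assumption, and the data table is filled with synthetic numbers purely for illustration. The settings (100 trees for RF, an RBF kernel for SVM, a CART-style tree for DT) follow those reported in the experimental platform description.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the data table D = {F, P}: 60 events, 5 features.
rng = np.random.default_rng(0)
F = rng.random((60, 5))
P = F @ np.array([0.5, 1.0, 0.0, 2.0, 0.3]) + 0.05 * rng.standard_normal(60)

models = {
    "DT": DecisionTreeRegressor(random_state=0),   # scikit-learn trees are CART-based
    "SVM": SVR(kernel="rbf"),
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
}

# F* for a new event e*: one row with the same feature columns as F.
F_star = rng.random((1, 5))

predictions = {}
for name, model in models.items():
    model.fit(F, P)                                    # train on the data table
    predictions[name] = float(model.predict(F_star)[0])  # predicted popularity p*

print(predictions)
```

In practice, SVM regression is sensitive to feature scale, so standardizing the columns of $F$ before fitting may be worthwhile; whether this was done in the original experiments is not stated.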

V. EMPIRICAL STUDY
A. EBSN DATASETS
To gain an overview of event popularity, we select three well-known cities, i.e., Sydney, London, and San Francisco, and collect their datasets from Meetup. The selected cities provide large amounts of data with various event topics and many users. Each city is treated as an EBSN dataset. The datasets cover a period of two years, 2017-2018. For each city, we selected all groups that published at least 15 events in these two years. Furthermore, each selected event was hosted at a real physical venue with a specific location and had at least 5 participants. Table 3 gives the statistics of the three gathered datasets. According to this table, each user of each EBSN dataset engaged in an average of five events over the two-year period. The distributions of users over attended events in the three EBSN datasets are depicted in Figure 4. The majority of events had fewer than 50 participants.

B. EXPERIMENTAL SETUP
We use Lucene (https://lucene.apache.org/) to extract the terms that represent event contents [32]. Specifically, we remove all stop words and keep only the remaining terms in each event's content. Moreover, we keep only events with specific locations, which include longitude and latitude. Threshold $r$ is set to 0.5 km to obtain the events relevant to event $e^*$.
To gain a further understanding of how the factors affect the decisions of users and the popularity of events, experiments are conducted in two contexts of using the datasets.

1) THE CITY CONTEXT
Each city (or EBSN) dataset is considered one city dataset, which is used in the proposed paradigm. Specifically, we first sort all events in each city by event time and then divide them into two parts: 80% for training and 20% for testing. The training part is defined as the list of events $E$. We first transform $E$ into a data table $D = \{F, P\}$, which is then used to train the three selected regression methods. The features of each event $e^*$ in the testing part are generated with respect to $E$ and denoted by a feature vector $F^*$. This vector is fed into the trained models to predict the popularity $p^*$ of event $e^*$.
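The time-ordered 80/20 split described above can be sketched as follows. The dictionary layout, the field names, and the integer timestamps are hypothetical; the percentage is handled with integer arithmetic to avoid floating-point rounding at the cut point.

```python
def chronological_split(events: list, train_percent: int = 80) -> tuple:
    """Sort events by time and return (training part, testing part)."""
    ordered = sorted(events, key=lambda e: e["time"])
    cut = (len(ordered) * train_percent) // 100
    return ordered[:cut], ordered[cut:]

# Hypothetical city dataset: ten events listed in arbitrary order.
city_events = [
    {"id": f"e{i}", "time": t}
    for i, t in zip(range(1, 11), [3, 1, 2, 5, 4, 7, 6, 10, 9, 8])
]

train_part, test_part = chronological_split(city_events)
print([e["id"] for e in train_part])
print([e["id"] for e in test_part])
```

Splitting by time rather than at random keeps every testing event later than all training events, which mirrors the situation of predicting a newly announced event.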

2) THE GROUP CONTEXT
We treat each group in each city (or EBSN) as a group dataset. We first sort all events in a group dataset by event time and then split the group dataset into two parts: 80% for training and 20% for testing. The procedure for building the data table $D$ for the training part and the features of each event in the testing part is similar to that of the city context.

3) EXAMPLE OF GENERATING FEATURES OF TRAINING AND TESTING PARTS FOR THE TWO CONTEXTS
To further clarify how the features of events are built within the two contexts, we give the following example. Figure 5 shows how a given EBSN with two groups is split into training and testing parts. Events are sorted by event time, as shown in Figure 5. For the city context, we split the events into two parts: the testing part consists of event $e_5$ in group $g_1$ and event $e_{10}$ in group $g_2$, and the remaining events, denoted by $E$ (8 events), form the training part. The features of each event $e_i$ in $E$ are generated based on $E \setminus e_i$. Therefore, we have a data table $D$ that consists of the generated features of all events in $E$. Then, for each event in the testing part, we build its features with respect to all events in $E$.
For the group context, each group is defined as one group dataset. For example, dataset $g_1$ has five events. A given splitting time divides the events of $g_1$ into two parts: 80% for training and 20% for testing. The testing part of $g_1$ contains only event $e_5$, and the training part consists of $e_1$, $e_2$, $e_3$, and $e_4$. To build the features of all events in the training part, we first collect all events in this EBSN dataset that are held before the splitting time. Hence, we have the list $E = \{e_1, e_2, e_3, e_4, e_6, e_7, e_8, e_9\}$. The features of each event $e$ in the training part ($e_1$, $e_2$, $e_3$, $e_4$) are built with respect to $E \setminus e$. Thus, we obtain a training data table $D$ containing only four events and use $D$ to train the regression models. The features of $e_5$ in the testing part of $g_1$ are generated with respect to all events in $E$.
Table 4 reports the time needed to generate features for each group in each city within both contexts. Groups with few events clearly take less time to create features than groups with many events.
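The reference sets used in the group context can be sketched as follows. The data layout and the function name are hypothetical, and taking the time of the first testing event as the splitting time is an assumption. The sample timestamps are chosen so that, for group g1, the pool of reference events equals the list $E$ given in the example.

```python
def group_reference_sets(all_events: list, group: str, train_percent: int = 80) -> tuple:
    """For one group, map each training/testing event to the events its features use."""
    group_events = sorted(
        (e for e in all_events if e["group"] == group), key=lambda e: e["time"]
    )
    cut = (len(group_events) * train_percent) // 100
    train, test = group_events[:cut], group_events[cut:]
    if not test:
        raise ValueError("group has too few events to form a testing part")

    # Assumed splitting time: the time of the first testing event.
    split_time = test[0]["time"]
    # E: all events of the whole dataset held before the splitting time.
    pool = [e for e in all_events if e["time"] < split_time]

    # Training events use E \ e; testing events use all of E.
    train_refs = {
        e["id"]: [x["id"] for x in pool if x["id"] != e["id"]] for e in train
    }
    test_refs = {e["id"]: [x["id"] for x in pool] for e in test}
    return train_refs, test_refs

# Hypothetical dataset: g1 holds e1..e5 and g2 holds e6..e10, interleaved in time.
times = {"e1": 1, "e6": 2, "e2": 3, "e7": 4, "e3": 5,
         "e8": 6, "e4": 7, "e9": 8, "e5": 9, "e10": 10}
all_events = [
    {"id": eid, "group": "g1" if int(eid[1:]) <= 5 else "g2", "time": t}
    for eid, t in times.items()
]

train_refs, test_refs = group_reference_sets(all_events, "g1")
print(sorted(train_refs))
print(sorted(test_refs["e5"]))
```

Excluding each training event from its own reference set prevents an event's attendance from leaking into its own features.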

C. EVALUATION METRICS
MAE and RMSE are widely used to measure the performance of regression models. Therefore, we select them to evaluate the differences between actual and predicted values. These two metrics are defined in Equation 15 and Equation 16, respectively:
$$MAE = \frac{1}{M} \sum_{i=1}^{M} \left| p_i - p_i^{predicted} \right| \quad (15)$$
$$RMSE = \sqrt{\frac{1}{M} \sum_{i=1}^{M} \left( p_i - p_i^{predicted} \right)^2} \quad (16)$$
where $p_i$ and $p_i^{predicted}$ are the actual and predicted values of event popularity, and $M$ is the number of events in each testing part. MAE and RMSE are used to assess the performance of the three regression models in the city context. For the group context, we use two new metrics, nMAE and nRMSE (Equations 17 and 18), where $n$ is the number of groups in each city.
Platform: All algorithms are implemented in Python and executed on a machine with a 3.4 GHz dual-core CPU and 16 GB of RAM. The number of trees in the random forest model is set to 100. CART is used to build the tree model, and the RBF kernel is used in the support vector machine method.

1) PERFORMANCE OF PROPOSED PARADIGM IN THE CITY CONTEXT
Figure 6 shows the MAE and RMSE results of the three selected regression methods for the three cities. The three methods use all features (listed in Table 2) to build models on the training parts and then predict the popularity of each event in the testing parts. In general, decision tree (DT) yields the worst results on both metrics for all three cities. Support vector machine (SVM) gives the best MAE scores on the three datasets, whereas random forest (RF) is the best model in terms of RMSE.
We also compare the performance of these regression models with four different groups of features, i.e., all, venue-based, time-based, and content-based features. Figures 7, 8, and 9 show the results of each model for each group of features in the three cities. Overall, models built on all features (All) yield the best results. Models built on content-based features provide better results than those built on venue-based or time-based features. In addition, SVM with content-based features yields the best MAE for the three cities, and RF with content-based features is the best in terms of RMSE. DT remains the weakest method for every group of features.
The first context (an EBSN dataset) has many groups with diverse themes, and each group publishes many events with various topics. In addition, the participant numbers of different events differ greatly. Hence, the content of an event plays a critical role in attracting people to take part. Based on the results obtained from the different groups of features, we conclude that the content of offline activities is the most valuable factor in the city context. People often come to discuss a certain topic or attend with a specific purpose, for example, learning start-up skills. Thus, social network administrators need to improve the contents of events and follow social trends in order to keep users in their networks.

2) PERFORMANCE OF PROPOSED PARADIGM IN THE GROUP CONTEXT
We treat each group in a city as a dataset and split this dataset into two parts. The two metrics, nMAE and nRMSE, in Equations 17 and 18 are used to compare the performance of the three regression models.
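The evaluation metrics can be sketched in plain Python. MAE and RMSE follow their standard definitions. The group-level metrics are only described here through $n$, the number of groups in each city, so the sketch assumes one plausible reading, namely the per-group metric averaged over the $n$ groups; this reading is an assumption and may differ from Equations 17 and 18. The sample values are made up.

```python
import math

def mae(actual: list, predicted: list) -> float:
    """Mean absolute error between actual and predicted popularity values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual: list, predicted: list) -> float:
    """Root mean squared error between actual and predicted popularity values."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def mean_over_groups(metric, per_group: list) -> float:
    """Average a metric over groups (an ASSUMED reading of nMAE / nRMSE)."""
    return sum(metric(a, p) for a, p in per_group) / len(per_group)

# Hypothetical testing parts of two groups: (actual values, predicted values).
groups = [([1.0, 2.0], [1.0, 4.0]), ([3.0], [2.0])]

n_mae = mean_over_groups(mae, groups)
n_rmse = mean_over_groups(rmse, groups)
print(n_mae, n_rmse)
```

Under this reading, every group contributes equally to the city-level score regardless of how many testing events it has.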
The nMAE and nRMSE results of the three regression methods with all features for the three cities are shown in Figure 10. In general, RF outperforms the two other methods on both metrics, whereas DT is again the worst method in all three cities. In this context, each group is treated as one dataset to build predictive models. Many groups in each city have only a few events; therefore, the training data table built from one group dataset suffers from the problem of high-dimensional data, i.e., many features relative to the number of events. Moreover, the RF model is constructed from 100 trees, and each node of a tree is split on the best feature. These are the reasons why RF is better than SVM in the group context.
Similar to the first context, we also compare the performance of the three predictive models with the four groups of features, i.e., all, venue-based, time-based, and content-based features. The results of the comparisons are shown in Figure 11, Figure 12, and Figure 13. Overall, RF is still the best model for all four groups of features, whereas DT yields the worst metrics for all four groups.
Furthermore, RF built with time-based features yields better results on the two metrics than RF built with all features. In addition, RF trained with content-based features provides better results than RF trained with venue-based features. These findings for the group context differ from the results of the city context and can be explained as follows: (1) each group has only a few event topics, and some groups even have a single topic for all events; (2) in EBSNs, event organizers often select the same venue to host offline activities; (3) having attended previous events, users already know the topics and locations of events. Hence, time is the most important factor in pushing users to engage in new events; moreover, they select events that suit their free time.
We conclude that in a small context of social networks, such as the group context, time and content are the factors that contribute most to the success of events. Hence, organizers need to select a suitable time to hold events and offer attractive content in order to attract more participants.

VI. CONCLUSION
In this paper, we present a study on event popularity over event-based social networks. To this end, we propose a new paradigm that predicts the popularity of events by transforming a dataset into a data table that can be used by regression methods. The proposed paradigm first stores an EBSN dataset and then builds features from this dataset. Three well-known regression methods are used in the paradigm to build predictive models based on the generated features. Finally, the predicted popularity of events is sent to event organizers. The study is conducted on three cities in two contexts of using the datasets. Overall, RF is the best method for predicting event popularity in the two contexts. We find that in the context of the whole city, event content is the factor that contributes most to people joining events, whereas in the group context, event time is crucial to users' engagement in events. This study not only shows the impact of attractive content and suitable hosting time when organizers create offline activities but also helps administrators of social networks become aware of the importance of event contents. This work opens a promising direction for future work: time-optimized planning for events and users, in other words, how organizers can best attract users.
THANH TRINH received the Ph.D. degree in computer science from Shenzhen University, China, and the M.Sc. degree in information systems design from the University of Central Lancashire, U.K. He is currently a Lecturer with the Faculty of Computer Science, Phenikaa University. He has published many papers on his research topic. His research includes efficient query, database, social networks, classification, forecasting disasters, and climate change.
NHUNG VUONGTHI received the M.Sc. degree in information systems design from the University of Central Lancashire, U.K. She is currently a Lecturer with the Faculty of Digital Technologies and Cybersecurity, School of Business and Management, Vietnam National University. She has conducted several project consultations on her research topic. Her research interests include cybersecurity, data mining, and network optimization.