Predicting the Entrepreneurial Success of Crowdfunding Campaigns Using Model-Based Machine Learning Methods

A question of growing interest to investors, companies, and entrepreneurs involved in crowdfunding, particularly on the Kickstarter website, is which metrics make campaigns markedly successful. This study seeks to gauge the importance of key predictive variables or features based on statistical analysis, identify model-based machine learning methods that predict the success of a campaign based on performance assessment, and compare the selected machine learning algorithms. To achieve our research objectives and maximize insight into the dataset used, feature engineering was performed. Then, machine learning models, inclusive of Logistic Regression (LR), Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and tree-based methods (classification trees, bagging, and boosting), were fitted and compared via cross-validation approaches in terms of their resulting test error rates, F1 score, Accuracy, Precision, and Recall rates. Of the machine learning models employed for predictive analysis, the test error rates and the other classification metric scores obtained across the three cross-validation approaches identified bagging and gradient boosting as the more robust methods for predicting the success of Kickstarter projects. The major research objectives in this paper have been achieved by assessing the performance of key statistical learning methods, which guides the choice of learning methods or models and gives us a measure of the quality of the ultimately chosen model. However, Bayesian semi-parametric approaches are of future research consideration. These methods facilitate the usage of an infinite number of parameters to capture information regarding the underlying distributions of even more complex data.

to be funded, backers who pledge money to back the initiator's idea, and a mediator. The Kickstarter platform mobilizes both parties. It is open to creators and backers from many countries in the world. In fact, since its inception in 2009, Kickstarter has hosted over 170 000 successfully funded projects, taking in over 4.5 billion dollars from over 16 million backers. Kickstarter operates on an "all-or-nothing" funding system: no one is charged for a pledge towards a project unless it reaches its funding goal, which poses less risk for everyone involved. Every project consists of a target funding goal over a fixed period of time; a project is considered to be a success only if this goal is met. If projects do not reach their funding goal, creators do not receive any of the pledged amount and are not obligated to complete projects without the funds required to do so, and backers are not charged. Once a project is successfully funded, Kickstarter deducts a 5% fee from the funds raised by the campaign. This marker of the success or failure of a campaign enables researchers to apply classification algorithms. Prospective participants (creators and backers) are usually interested in knowing the probability that a Kickstarter campaign will achieve its goal. This potentially insulates them from investing time and money in projects that have little to no chance of being funded and, most importantly, directs them to projects with better prospects. A successful crowdfunding campaign can be attributed to a few factors [5], such as developer credibility and prior experience [3]. Variables such as the content of the campaign, financial incentives, developer and sponsors' characteristics, feedback perspective, duration of campaign, deadline, goal expectation, and precision of information provided have been investigated in relation to crowdfunding success [6-8]. However, determining which variables are critical is difficult. This study presents a case study in which feature selection and a comparison of the respective statistical models are used to assess predictions of crowdfunding success, shedding light on prediction model selection and optimization. More specifically, the aim of this study is to analyze and compare working models that can successfully predict the outcome of Kickstarter campaigns, to gauge the importance of key variables or features, such as Backers Count and amount in USD pledged, and to compare different machine learning algorithms, including Logistic Regression (LR), linear and quadratic discriminant analysis, and tree-based methods (classification trees, bagging, and boosting).

Data Description and Feature Engineering
The data used in this study result from crowdfunding campaigns conducted on the Kickstarter website between 2009 and 2017. The data were scraped in their original form by web robots. Projects with missing observations were removed from the original data, so that the data included only those projects that had reached their specified deadline and therefore had a distinct marker of outcome: success or failure. The resulting data without missing observations consisted of 82 228 projects with information recorded on 21 features. Notable among the features considered were: country from which the campaign was launched; goal/amount targeted; amount pledged over time; number of backers or backers count; project category (including art, design, food, games, movie, music, photography, publishing, and technology); amount pledged in USD; amount pledged per person; percent of goal achieved; length of Kickstarter; state from which the campaign was launched; backers as a percentage of population; days spent making the campaign; days from inception to deadline; response denoting success or failure; time and population factors categorized as short, medium, and long; and other features.

Feature engineering
To maximize insight into the dataset, feature engineering was performed. Summary statistics obtained from the data showed that 36 959 projects were considered successful, representing 44.95% of the total, and the remaining 45 269 were considered failures, representing 55.05%. The projects originated from 19 countries, with the majority of projects launched in the United States (about 96%) (see Table 1). Further descriptive statistics revealed that the state of California had the most projects (12 906) and Delaware had the least (49). It was also observed (see Table 2) that music projects seem to have been the most successful, followed closely by art and technology projects. Photography projects, however, were the least successful. A population factor was created by identifying cities with population size less than 93 794, between 93 794 and 1 211 704, and greater than 1 211 704. These were classified as low, medium, and highly populated cities, respectively. It is observed from the side-by-side bar chart in Fig. 1 that projects from highly populated cities are more likely to be successful than those from less populated cities.
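The population factor described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name and level labels are hypothetical, and treating both cut-offs as belonging to the "medium" level is an assumption, since the text does not say how the boundary values were handled. The three test populations are the minimum, median, and maximum city populations reported in this study.

```python
# Cut-offs taken from the text; everything else here is illustrative.
LOW_CUTOFF = 93_794
HIGH_CUTOFF = 1_211_704


def population_factor(city_population):
    """Map a city population to the low / medium / high factor level.

    Assumption: the boundary values themselves count as "medium".
    """
    if city_population < LOW_CUTOFF:
        return "low"
    if city_population <= HIGH_CUTOFF:
        return "medium"
    return "high"


# Minimum, median, and maximum city populations reported in the study.
levels = [population_factor(p) for p in (1_231, 422_908, 8_107_916)]
print(levels)
```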
Kickstarter advises stakeholders that projects lasting 30 days or less tend to have higher success rates. Hence, having projects successfully funded in time is crucial to project creators, not only for raising the initial funds to get the project ideas off the ground, but also for gaining exposure and attracting the attention of other potential investors. As observed in Fig. 2, if the number of days from the launch of the project to the deadline is less than or equal to 30 days, the project tends to be successful. Since the number of Kickstarter campaigns launched was much higher for the United States than for all other countries, our analysis is restricted to the projects in this country. In fact, for the US data, 35 337 projects were marked successful in contrast to 34 466 unsuccessful after "data cleaning" was performed. Stacked plots for the US dataset in Fig. 3 seem to tell a similar story to the full dataset. Some interesting trends and patterns were further observed in the data. A variable of interest, percentage of goal (Prct_goal), follows a bimodal distribution (Fig. 4) with excess zeros and excess successes (100 percent funded). Two variables that may be important predictors for determining Kickstarter success are the amount pledged and the backer count (Backers_count). The histograms show the truncated distributions of both pledged USD (Fig. 5) and the backer count (Fig. 6). The histograms are truncated at the 3rd quartile due to the extremely long right tail. Both histograms display a similar distribution, with most of the values lying at or near zero and a long right-skewed tail.
There are 15 project categories considered in the study: art, comics, crafts, dance, design, fashion, film/video, food, games, journalism, music, photography, publishing, technology, and theater. Certain categories tend to have higher rates of success, such as design, comics, and dance, while categories like journalism, food, and crafts tend to fail more often. All categories' average percents of goal are shown in Fig. 7. The number of days the Kickstarter accepted donations has an irregular distribution, with most campaigns lasting 30 days. This is most likely due to the recommendation that a Kickstarter campaign be 30 days or less. The amount of money that a Kickstarter project needs to earn to be deemed successful is reflected by goal. These values are chosen by the founders when they are setting up their campaigns. These amounts range from 1 dollar up to 100 million dollars. The median value is 5 000 dollars. Figure 8 shows how the goal is distributed, although it is truncated at the 95th percentile (60 000 dollars) due to the extremely long right tail.
A variable that takes on values 0 through 4, called "Twords", was created. It is based on the most common words used in the titles and blurbs of successful campaigns. A value of 0 means that none of the words appeared in the Kickstarter's title or blurb, while a 4 indicates the most appearances of successful keywords. This variable was created by looking at the names and blurbs associated with the top 10 percent of successful campaigns. The 50 most common words, not including "the", "and", "it", etc., were viewed for both name and blurb. The name is the name (title) of the Kickstarter campaign and the blurb is a short description that gives further information about the Kickstarter. If any of the top 50 words was present in the title, the title was given a value of 1 or 2, depending on whether such words appeared once or more than once (1 corresponding to one appearance and 2 corresponding to more than one appearance). The same values were assigned for the blurb, following the same rules as for the title. "Twords" was created by summing these two values; thus, it takes on values from 0 through 4.
Figure 9 is a stacked plot of the frequency of failed and successful projects. The plot seems to indicate that design, dance, comics, theater, and game projects are markedly successful on the Kickstarter platform, as evidenced by their higher success rates. In contrast, the plot suggests a very poor performance for journalism and craft related projects.
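The construction of "Twords" described above can be sketched as follows. This is a hedged illustration: the function names, the tokenization rule, and the example word lists are hypothetical (the study's actual top-50 lists are not reproduced here), and counting the total number of keyword appearances, capped at 2, is one reading of the rule stated in the text.

```python
import re


def keyword_score(text, top_words):
    """Return 0, 1, or 2: no keyword, exactly one appearance, or more than one."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = sum(1 for token in tokens if token in top_words)
    return min(hits, 2)


def twords(title, blurb, title_words, blurb_words):
    """Sum of the title score and the blurb score, so the result lies in 0..4."""
    return keyword_score(title, title_words) + keyword_score(blurb, blurb_words)


# Hypothetical keyword lists standing in for the study's top-50 lists.
title_words = {"new", "album", "game"}
blurb_words = {"help", "album", "first"}

score_full = twords("New Album: A New Sound", "Help us record our debut album",
                    title_words, blurb_words)
score_partial = twords("Handmade Pottery", "Help fund my studio",
                       title_words, blurb_words)
score_none = twords("Quiet Poems", "A chapbook of verse", title_words, blurb_words)
print(score_full, score_partial, score_none)
```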
The population of the city where the Kickstarter was launched was also explored. These values range between 1 231 and 8 107 916. The median city population is 422 908, with a mean of 1 233 572. States were also examined, with many states showing differing levels of success, as shown in Fig. 10.

Feature and variable selection
An attempt is made to establish possible relationships between continuous variables in the dataset. To achieve this, a correlation plot (see Fig. 11) was obtained for several selected variables. A closer look at the plot reveals highly positive correlations between some continuous variables. For example, "pledgedUSD" and "pledged" are highly correlated. This makes sense, as these variables contain very similar information. The same could be said of "days_spent_making_campaign" and "days_inception_to_deadline", and several other continuous variables. It is important to note that the presence of high correlation between these variables is an indicator of multicollinearity and may result in unreliable statistical inferences. To identify multicollinearity issues and address them, the Variance Inflation Factor (VIF), condition indices, and variance decomposition proportions are used as detection measures. The VIF for each term in the model measures the combined effect of the dependences among the regressors on the variance of that term [9]. One or more large VIFs indicate multicollinearity. Practical experience indicates that if any of the VIFs exceeds 5 or 10, it is an indication of multicollinearity. Furthermore, condition indices greater than 30 and variance decomposition proportions greater than 0.5 are recommended guidelines for detecting multicollinearity. First, the VIF, condition indices, and variance decomposition proportions of the variables are obtained "cursorily" by means of a linear model. Results regarding the VIF and variance decomposition proportion measures supported retaining the continuous variables "goal", "backers count", "pledge per person", and "length of Kickstarter" and removing the other continuous variables. In the presence of very large amounts of data with numerous potential predictors, such as that used in this Kickstarter project, it is infeasible for investigators or researchers to put all the potential predictors into a model, as many of these variables may not be associated with the outcome being predicted. In these scenarios, one may be interested in predicting an outcome and finding a "parsimonious" subset of variables that are associated with the outcome. This means that we can use a dimension reduction technique to determine the most important variables for analysis. In our case, we consider the use of the Least Absolute Shrinkage and Selection Operator (LASSO), which can assist investigators interested in predicting an outcome by selecting the subset of the variables that minimizes prediction error [10]. Here, the coefficients of some less contributive variables are forced to be exactly zero. Only the most significant or contributive variables are kept. The random forest approach, through the criterion called Gini Importance or Mean Decrease in Impurity (MDI) that calculates the importance of each feature, also presents us with a variable importance measure [11]. When both methods were applied, the variables goal, backers count or number of backers, pledge per person, length of Kickstarter project, project categories, time factor, and population factor were ranked as the most contributive or significant variables in minimizing prediction error.
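Two of the ideas above can be made concrete with a small sketch. For a predictor j, the VIF is 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors; with only two predictors, R_j^2 reduces to their squared correlation, which is the simplified case coded here. The soft-thresholding function is the standard coordinate-wise LASSO update (for standardized predictors) and shows how a coefficient can become exactly zero. The function names and the toy numbers are illustrative assumptions, not values from the Kickstarter data.

```python
from statistics import mean


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with two predictors, R^2 is the squared correlation."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


def soft_threshold(z, lam):
    """LASSO coordinate update: shrink z toward zero; exactly zero if |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Toy columns: the first two play the role of near-duplicate variables
# such as "pledged" and "pledgedUSD"; the third is unrelated.
pledged = [10, 20, 30, 40, 50]
pledged_usd = [11, 19, 32, 41, 49]
unrelated = [3, 1, 4, 1, 5]

vif_high = vif_two_predictors(pledged, pledged_usd)
vif_low = vif_two_predictors(pledged, unrelated)
print(round(vif_high, 1), round(vif_low, 2))
print(soft_threshold(0.3, 0.5), soft_threshold(2.0, 0.5))
```

In this toy case the near-duplicate pair gives a VIF far above the 5 to 10 guideline, while the unrelated pair stays close to 1.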

Classification algorithms
In this section, the machine learning algorithms explored in identifying the best predictive model for the Kickstarter data are explained. The classification algorithms employed are LR, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), classification trees, bagging, and boosting. Validation methods and the results of these methods are also reported.

LR
The LR model is a binary classification model for supervised learning in machine learning. In the LR model, the binary response $y_i$ follows a binomial distribution with probability of success $p_i$ and probability of failure $1 - p_i$, under the assumption that there are $n$ independent and identically distributed Bernoulli trials, that the number of trials is fixed, and that there are two and only two outcomes, labelled success and failure. This classification model models the probability of success as the conditional expected value of the response variable given the features $\mathbf{x}_i = (x_{i1}, \dots, x_{ip})$, that is, $p_i = E(y_i \mid \mathbf{x}_i)$, with the logit link function to the linear predictor, where $\beta_0, \beta_1, \dots, \beta_p$ are coefficient parameters of the features. The LR model is given by
$$\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}.$$
That is,
$$p_i = \frac{\exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip})}{1 + \exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip})},$$
which takes values between 0 and 1. The likelihood function of the LR model is
$$L(\boldsymbol{\beta}) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i},$$
where $y_i = 0$ or 1. For maximum likelihood estimation, this function can be maximized by taking the natural logarithm of the likelihood function, differentiating with respect to the parameters, equating to zero, solving the resulting equations using the iterative least squares method, and obtaining the estimates $\hat{\boldsymbol{\beta}}$ [12].
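As a rough illustration of the likelihood above, the following sketch evaluates the log-likelihood and maximizes it on a toy dataset. Note the swapped-in optimizer: plain gradient ascent is used here for brevity, not the iterative least squares procedure cited in the text. The function names, step size, and data are all illustrative assumptions.

```python
import math


def sigmoid(t):
    """Inverse of the logit link: maps the linear predictor to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def success_probability(beta, xi):
    """p_i for one observation; beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))


def log_likelihood(beta, X, y):
    """Natural log of prod p_i^y_i (1 - p_i)^(1 - y_i)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = success_probability(beta, xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total


def fit_gradient_ascent(X, y, steps=2000, step_size=0.1):
    """Maximize the log-likelihood by gradient ascent (not IRLS)."""
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            residual = yi - success_probability(beta, xi)
            grad[0] += residual
            for j, v in enumerate(xi):
                grad[j + 1] += residual * v
        beta = [b + step_size * g / len(X) for b, g in zip(beta, grad)]
    return beta


# Toy, non-separable data: one feature, binary response.
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
y = [0, 0, 1, 0, 1, 1]

beta_hat = fit_gradient_ascent(X, y)
ll_start = log_likelihood([0.0, 0.0], X, y)
ll_fitted = log_likelihood(beta_hat, X, y)
print(beta_hat, ll_start, ll_fitted)
```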

LDA
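Although the LR model is a relatively powerful yet simple linear classification algorithm, it has limitations that necessitate alternate linear classification algorithms. For example, when the two response classes are well-separated, the parameter estimates of this model become very unstable. Furthermore, for relatively small sample sizes, when the distribution of the features in the model is Gaussian, LDA is more stable than the LR model. LDA essentially models the distribution of features separately in each response class and then adopts Bayes' theorem to estimate probabilities. LDA makes predictions by estimating the probability that a new set of features belongs to each class. The class that gets the highest probability is the output class and a prediction is made. More intuitively, LDA can be derived from probabilistic models that model the conditional distribution of the data, $P(X \mid y = c)$, for each class $c$. LDA assumes that each data class follows or is modeled by a multivariate Gaussian distribution, $X \mid y = c \sim N_p(\boldsymbol{\mu}_c, \boldsymbol{\Sigma})$, where $p$ represents the number of features in the model. The covariance matrix is the same across all the $C$ classes, that is, $\boldsymbol{\Sigma}_1 = \cdots = \boldsymbol{\Sigma}_C = \boldsymbol{\Sigma}$. When LDA is used as a classifier, the class priors are estimated from the training data. This is done by finding the prior probabilities as the proportions of data in each class, $\hat{\pi}_c = n_c / n$, where $n_c$ is the number of training observations in class $c$ and $n$ is the total number of training observations. The class means and the covariance matrix are likewise estimated from the training data. The class means are
$$\hat{\boldsymbol{\mu}}_c = \frac{1}{n_c} \sum_{i:\, y_i = c} \mathbf{x}_i.$$
The covariance matrix is
$$\hat{\boldsymbol{\Sigma}} = \frac{1}{n - C} \sum_{c=1}^{C} \sum_{i:\, y_i = c} (\mathbf{x}_i - \hat{\boldsymbol{\mu}}_c)(\mathbf{x}_i - \hat{\boldsymbol{\mu}}_c)^{T}.$$
In general, the classification function prescribed for new data points is as below:
$$\hat{y}(\mathbf{x}) = \arg\max_{c} P(y = c \mid X = \mathbf{x}).$$
In the case of a binary classification as with our Kickstarter problem, $c \in \{0, 1\}$, the classification function is then represented as
$$\hat{y}(\mathbf{x}) = 1 \ \text{if} \ \log\frac{P(y = 1 \mid X = \mathbf{x})}{P(y = 0 \mid X = \mathbf{x})} > 0, \ \text{and} \ \hat{y}(\mathbf{x}) = 0 \ \text{otherwise}.$$
The general LDA classification function is
$$\delta_c(\mathbf{x}) = \mathbf{x}^{T} \hat{\boldsymbol{\Sigma}}^{-1} \hat{\boldsymbol{\mu}}_c - \frac{1}{2} \hat{\boldsymbol{\mu}}_c^{T} \hat{\boldsymbol{\Sigma}}^{-1} \hat{\boldsymbol{\mu}}_c + \log \hat{\pi}_c,$$
where a new observation is assigned to the class with the largest value of $\delta_c(\mathbf{x})$. LDA thus models the binary response with a linear combination of the features.
QDA is similar to LDA in terms of the derivation of parameters. However, the underlying difference is that QDA models/classifies the response with a non-linear combination of features. Furthermore, unlike the LDA classifier, QDA assumes that each class of the training data possesses its own covariance matrix. This means that an observation pertaining to the $c$-th class will be of the form $X \mid y = c \sim N_p(\boldsymbol{\mu}_c, \boldsymbol{\Sigma}_c)$, with its own class covariance matrix $\boldsymbol{\Sigma}_c$. The decision boundary between the two classes is quadratic rather than a hyperplane. The QDA discriminant function is
$$\delta_c(\mathbf{x}) = -\frac{1}{2} \log\lvert \hat{\boldsymbol{\Sigma}}_c \rvert - \frac{1}{2} (\mathbf{x} - \hat{\boldsymbol{\mu}}_c)^{T} \hat{\boldsymbol{\Sigma}}_c^{-1} (\mathbf{x} - \hat{\boldsymbol{\mu}}_c) + \log \hat{\pi}_c.$$
QDA estimates a covariance matrix for each class, and hence its number of effective parameters is greater than that of LDA. LDA is the less flexible of the two classifiers and therefore has lower variance, which makes it preferable when training observations are few; if the number of training observations is very large, as in our case, then the use of QDA for classification is plausible.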
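To illustrate the estimation steps for LDA (priors, class means, pooled variance, then the linear discriminant), here is a minimal single-feature sketch. With one feature the covariance matrix reduces to a pooled variance, and the discriminant becomes x * mu_c / sigma^2 - mu_c^2 / (2 sigma^2) + log(pi_c). The function names and the toy data are hypothetical and only serve as an illustration under these assumptions.

```python
import math
from statistics import mean


def fit_lda_1d(x, y):
    """Estimate class priors, class means, and the pooled (shared) variance."""
    classes = sorted(set(y))
    n = len(x)
    priors, means = {}, {}
    pooled_ss = 0.0
    for c in classes:
        xc = [v for v, label in zip(x, y) if label == c]
        priors[c] = len(xc) / n
        means[c] = mean(xc)
        pooled_ss += sum((v - means[c]) ** 2 for v in xc)
    variance = pooled_ss / (n - len(classes))
    return priors, means, variance


def lda_predict(x_new, priors, means, variance):
    """Assign x_new to the class with the largest linear discriminant."""
    def delta(c):
        return (x_new * means[c] / variance
                - means[c] ** 2 / (2 * variance)
                + math.log(priors[c]))
    return max(priors, key=delta)


# Toy data: class 0 centred at 2, class 1 centred at 7.
x = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
y = [0, 0, 0, 1, 1, 1]

priors, means, variance = fit_lda_1d(x, y)
print(priors, means, variance)
print(lda_predict(3.5, priors, means, variance),
      lda_predict(5.0, priors, means, variance))
```

With equal priors and a shared variance, the decision boundary in this toy case sits midway between the two class means, at 4.5. A QDA version would differ only in estimating a separate variance per class and adding the log-variance term to the discriminant.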

Tree-based methods
Tree-based methods in machine learning are popular algorithms for classification and regression. These methods are notable in terms of their high prediction accuracy, stability, and ease of interpretation. Furthermore, they are robust for investigating nonlinear relationships. Tree-based methods involve segmenting the feature space into regions. In terms of prediction, summaries of the training observations in each region are used: that is, the mean or the mode. There are so-called splitting rules used to segment the feature space. One merit of tree-based methods is their nonparametric nature; they make no underlying distributional assumptions about the feature space or the classifier structure. The tree-based methods employed in this project are classification trees, bagging, and boosting.

Classification trees
Classification trees are a type of decision tree algorithm. They are used to predict the membership of observations in the classes of a categorical response from measurements taken on features. The idea behind the prediction is that each observation belongs to the most commonly occurring class of the training observations in the region to which it belongs. A classification tree is comprised of branches that represent attributes and leaves that represent decisions. In practice, the decision process commences at the trunk and follows the branches until a leaf is reached. For a classification tree algorithm, the interest is in the class proportions among training observations in their respective regions as well as the class predictions corresponding to specific terminal node regions. The algorithm is an embodiment of the concept of recursive binary partitioning or splitting. This involves dividing up the $p$-dimensional space of the features into nonoverlapping rectangles. This division is accomplished recursively. The criterion used in making those binary splits is the so-called classification error rate, which is the proportion of training observations in a region that do not belong to the most common class. To define this classification error rate, also known as the misclassification error rate, we need to define the class proportion. For a node $s$, which represents a region $B_s$ with $N_s$ corresponding observations, the proportion of class $c$ observations in node $s$ is represented as
$$\hat{p}_{sc} = \frac{1}{N_s} \sum_{x_i \in B_s} I(y_i = c).$$
The majority class for node $s$ is represented as $c(s) = \arg\max_{c} \hat{p}_{sc}$, and hence the misclassification error can be written out as
$$E_s = 1 - \hat{p}_{s\,c(s)}.$$
Alternatively, two other measures that are used in place of the misclassification rate are the so-called Gini Index and the cross-entropy. The Gini Index is a measure of the total variance across the classes and is sometimes described as a measure of node purity. The Gini Index is represented as
$$G_s = \sum_{c=1}^{C} \hat{p}_{sc} (1 - \hat{p}_{sc}).$$
The cross-entropy is defined as
$$D_s = -\sum_{c=1}^{C} \hat{p}_{sc} \log \hat{p}_{sc}.$$
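The three node impurity measures can be compared on a small example. The sketch below computes them from the class labels falling in a single node; the function names and the example node (8 successes and 2 failures) are illustrative assumptions.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Proportion of each class among the observations in a node."""
    counts = Counter(labels)
    n = len(labels)
    return {c: k / n for c, k in counts.items()}


def misclassification_error(labels):
    """One minus the proportion of the majority class."""
    return 1.0 - max(class_proportions(labels).values())


def gini_index(labels):
    """Sum over classes of p * (1 - p)."""
    return sum(p * (1.0 - p) for p in class_proportions(labels).values())


def cross_entropy(labels):
    """Minus the sum over classes of p * log(p), natural logarithm."""
    return -sum(p * math.log(p) for p in class_proportions(labels).values())


# Hypothetical node: 8 successful and 2 failed campaigns.
node = ["success"] * 8 + ["failure"] * 2
node_error = misclassification_error(node)
node_gini = gini_index(node)
node_entropy = cross_entropy(node)
print(node_error, node_gini, node_entropy)
```

For this node the misclassification error is 0.2, the Gini Index is 0.32, and the cross-entropy is about 0.50; all three are zero for a pure node.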

Bagging
Bootstrapping is an increasingly popular and powerful concept that is used in machine learning. It simply refers to a resampling algorithm used to estimate statistics such as standard errors, means, and variances from a population by randomly resampling a dataset with replacement. The bootstrap facilitates understanding of the biases, variances, and features that exist in the resample, and its application extends to a variety of statistical learning methods, including those whose measure of variability is difficult to estimate. In essence, this method can be useful for testing the stability of a model, as multiple datasets are resampled, used, and tested on multiple models.

The aggregated bootstrap, or bagging, is an ensemble method that extends the bootstrap method in machine learning and is applied to decision trees, which suffer from very high variance. Decision trees generally suffer from high variance because splitting the training observations randomly and fitting classification/regression trees to the resulting random datasets may yield completely different inferences. Bagging comes to the rescue, as it can reduce the uncertainty associated with fitting decision trees to randomly split datasets. Essentially, bagging reduces the variance associated with decision trees. Using the bootstrap method, bagging repeatedly samples with replacement from a training dataset and generates $B$ different bootstrapped training datasets. A separate prediction model is fitted to each bootstrapped dataset. Each prediction model suffers from very high variance but low bias, especially for decision trees, but all prediction models are subsequently averaged together to obtain a low-variance prediction model. This "bagged" model is represented as
$$\hat{f}_{\mathrm{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{*b}(x),$$
where $\hat{f}^{*b}(x)$ is the prediction of the model fitted to the $b$-th bootstrapped training dataset.
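The bagging procedure can be sketched with very small trees. In the hedged example below, each base learner is a one-split tree (a decision stump) on a single feature, each stump is fitted to a bootstrap sample drawn with replacement, and the bagged classifier takes a majority vote, which is the classification analogue of averaging. The function names, the number of models, and the toy data are illustrative assumptions.

```python
import random
from collections import Counter


def fit_stump(x, y):
    """One-split tree: pick the threshold and orientation with fewest errors."""
    best = None
    for t in sorted(set(x)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if v <= t else right) != label
                         for v, label in zip(x, y))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda v: left if v <= t else right


def bagged_classifier(x, y, n_models=25, seed=0):
    """Fit one stump per bootstrap sample and predict by majority vote."""
    rng = random.Random(seed)
    n = len(x)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: n draws WITH replacement from the training data.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))

    def predict(v):
        votes = Counter(model(v) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Toy data: small feature values fail (0), large values succeed (1).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

bagged = bagged_classifier(x, y)
print([bagged(v) for v in (1, 2, 9, 10)])
```

An odd number of models is used so that a two-class vote cannot tie.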

Boosting (gradient boosting)
Boosting is another machine learning algorithm that reduces the variance resulting from the decision tree algorithm. It works in a similar way to bagging, except that with boosting, decision trees are grown in a sequential manner: that is, each decision/classification tree is grown using information from previously grown trees. Each new tree results from a fit to a modified version of the original dataset. Unlike the bagging algorithm, boosting does not involve bootstrapping. The gradient boosting algorithm is a type of boosting algorithm for classification trees that we employ in this project. It trains predictive models in a gradual, additive, and sequential manner. It identifies the shortcomings of the current decision trees by using gradients of the loss function of the predictive model. The desired loss function, $L(y, f(x))$, needs to be specified beforehand. A modified general algorithm for gradient tree boosting [13] is as follows.
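(1) Initialize the optimal constant model, which is a single terminal node tree, $f_0(x) = \arg\min_{\gamma} \sum_{i=1}^{N} L(y_i, \gamma)$.
(2) For $g = 1$ to $G$ (iterations):
(a) For $i = 1, 2, \dots, N$, compute $h_{ig} = -\left[ \partial L(y_i, f(x_i)) / \partial f(x_i) \right]_{f = f_{g-1}}$. These are referred to as pseudo/generalized residuals.
(b) Fit a regression tree to the targets $h_{ig}$, giving terminal regions $R_{jg}$, $j = 1, 2, \dots, J_g$.
(c) For $j = 1, 2, \dots, J_g$, compute $\gamma_{jg} = \arg\min_{\gamma} \sum_{x_i \in R_{jg}} L(y_i, f_{g-1}(x_i) + \gamma)$.
(d) Update $f_g(x) = f_{g-1}(x) + \sum_{j=1}^{J_g} \gamma_{jg} I(x \in R_{jg})$.
(3) Output $\hat{f}(x) = f_G(x)$.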

For gradient boosting classification algorithms, a loss function that can be assumed is the multinomial deviance. In this case, $K$ least squares trees, one for each of the $K$ classes, will be constructed at each iteration. Each tree, $T_{kg}$, will be fitted to its negative gradient, $h_{kg}$. Furthermore, a boosting classification algorithm will have lines 2(a)-2(d) in the algorithm repeated $K$ times at each iteration and will have a variant of the final output result in line (3), as $\hat{f}_k(x) = f_{kG}(x)$, $k = 1, 2, \dots, K$.
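The sequential structure of gradient boosting can be illustrated with a simplified sketch. To keep it short, the example below assumes a squared-error loss rather than the multinomial deviance, so the negative gradient in step 2(a) is simply the residual and the region constants in step 2(c) are the mean residuals in each region. Each tree is a one-split regression stump on a single feature. The shrinkage factor `learning_rate` is an extra tuning parameter that does not appear in the algorithm as stated in the text. All names and data are illustrative.

```python
from statistics import mean


def fit_regression_stump(x, r):
    """Least-squares stump: one split, mean of the targets in each region."""
    best = None
    for t in sorted(set(x))[:-1]:  # keep the right-hand region non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        ml, mr = mean(left), mean(right)
        sse = (sum((ri - ml) ** 2 for ri in left)
               + sum((ri - mr) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda v: ml if v <= t else mr


def gradient_boost(x, y, n_rounds=20, learning_rate=0.5):
    """Gradient tree boosting under squared-error loss (an assumption)."""
    f0 = mean(y)  # step 1: optimal constant model under squared error
    trees = []

    def predict(v):
        return f0 + learning_rate * sum(tree(v) for tree in trees)

    for _ in range(n_rounds):  # step 2
        # 2(a): negative gradient of squared error = current residuals
        residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
        # 2(b)-(d): fit a tree to the residuals and add it to the model
        trees.append(fit_regression_stump(x, residuals))
    return predict  # step 3


def mse(model, x, y):
    return mean((yi - model(xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 1, 0, 1, 1]

one_round = gradient_boost(x, y, n_rounds=1)
boosted = gradient_boost(x, y, n_rounds=20)
baseline_mse = mean((yi - mean(y)) ** 2 for yi in y)
print(baseline_mse, mse(one_round, x, y), mse(boosted, x, y))
```

For a two-class problem, the fitted values could be thresholded at 0.5 to obtain class labels; a deviance-based version would instead boost on the log-odds scale.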

Evaluating machine learning models
After exploring the machine learning models presented in Section 2, it is important to find metrics that quantify the performance of the predictive models. There are several metrics that are available for evaluating varying machine learning tasks. In this article, we focus on cross-validation approaches and classification metrics for evaluating our models. These metrics are inclusive of test error rates, Accuracy, Precision, Recall, and an F-measure (also sometimes known as the F1 score).

Cross-validation: Validation set approach and k-fold cross-validation
Cross-validation involves estimating the test errors associated with the algorithms considered in order to evaluate their performance. A good cross-validation method will give a robust measure of the various predictive models' performance throughout the whole dataset. The two cross-validation approaches considered in this article are the validation set approach and the $k$-fold cross-validation approach. The validation set approach, also known as the hold-out validation set approach, involves splitting the available set of observations into two nonoverlapping parts, called a training set and a test set (or hold-out set). For this project, one portion of the data was used for training and the remaining portion for testing. The predictive models of the various algorithms are fitted to the training set and the fitted models are used to predict observations for the test set. We can then obtain classification test error rates for model evaluation. The merit of the validation set approach is its simplicity in terms of implementation and its low computational complexity. However, the downside of this method is that it may suffer from high variance. This is a result of the uncertainty about which observations will end up in the hold-out set or the training set. Hence the result may be different for different splits. The $k$-fold cross-validation is the next measure employed for model assessment. It involves the observations being first randomly split into $k$ groups or folds. The first group is used as the test set, and the algorithm is fitted to the remaining $k - 1$ groups. The test error rate is then computed for the observations in the test set. The procedure is then repeated $k$ times. Each time, a different group is treated as the test set. As a result, there will be $k$ test error estimates, one per test set, and thus a reasonable approach is to average the classification test errors to get one estimate of the test error. In this article, the 5- and 10-fold validation approaches are considered. The merit of this method is its accurate estimation performance. The higher the value of $k$ chosen, the less biased the resulting estimate of the test error.
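The k-fold procedure described above can be sketched as follows. The helper names are hypothetical, and the `fit_majority` baseline (always predict the most frequent training class) is only a stand-in so that the example is self-contained; any function that takes training data and returns a prediction function could be passed instead.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Randomly split the indices 0..n-1 into k nonoverlapping folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_error(x, y, fit, k=5, seed=0):
    """Average the test error rate over the k held-out folds."""
    folds = k_fold_indices(len(x), k, seed)
    errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        wrong = sum(model(x[i]) != y[i] for i in test_idx)
        errors.append(wrong / len(test_idx))
    return sum(errors) / k


def fit_majority(x_train, y_train):
    """Stand-in baseline: always predict the most frequent training class."""
    majority = max(set(y_train), key=y_train.count)
    return lambda v: majority


x = list(range(20))
y = [0] * 12 + [1] * 8

folds = k_fold_indices(len(x), 5)
cv_error_5 = cross_validated_error(x, y, fit_majority, k=5)
cv_error_10 = cross_validated_error(x, y, fit_majority, k=10)
print(cv_error_5, cv_error_10)
```

The validation set approach corresponds to using a single split of this kind instead of averaging over k of them.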

Classification metrics
Classification metrics for evaluating predictive models are usually premised on a confusion matrix [14]. This matrix, when constructed, specifies the number of test cases that are correctly and incorrectly classified, and contains the information needed for constructing the metrics. In this article, we adopt the Accuracy, Precision, Recall rate, and F1 score as comparison metrics alongside the test error rates obtained from the cross-validation approaches. The Accuracy metric can be defined as the number of test cases correctly classified, that is, the sum of True Positive (TP) and True Negative (TN) cases, divided by the total number of test cases, which also includes False Negatives (FN) and False Positives (FP). This is represented as
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$
More precisely, the Accuracy metric is an overall measure of how correctly the classification model predicts the entire dataset. Owing to its relatively easy computation and interpretation, the Accuracy measure is widely used. However, a drawback of this measure is that, for highly unbalanced datasets, it masks classification errors for classes with few cases and thus may perform poorly [15]. Another useful metric is the Precision. The Precision is the ratio of TP cases to the sum of TP and FP cases. Intuitively, this metric measures how many of the cases classified as positive actually belong to the positive class. The higher the ratio, the better the precision of the classification model. This is represented as
$$\text{Precision} = \frac{TP}{TP + FP}.$$
The Recall rate is another classification metric that is employed in this article. It is defined as the ratio of TP cases to the sum of TP and FN cases (that is, the total number of actual positive cases). Thus, this metric is informative in part because it specifies the number of positive cases correctly predicted out of the total number of positive cases. It is worthy of mention that the Precision and Recall metrics form the building blocks of the F1 score, which is the last metric considered. The Recall metric is given as
$$\text{Recall} = \frac{TP}{TP + FN}.$$
Finally, the last metric we consider for model comparison is the F1 score. This is a combination of the Precision and Recall metrics via their harmonic mean, which is given as
$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.$$
A large F1 score requires both the Precision and the Recall to be high. It serves as a good compromise between Precision and Recall and tends to work well with highly imbalanced datasets, unlike the Accuracy metric.
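As a worked example of these four metrics, the sketch below computes them from the cells of a confusion matrix. The counts are hypothetical and are not taken from Table 3; the function name is likewise illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, Precision, Recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "test_error": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical test set of 100 campaigns.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(metrics)
```

With these counts, Accuracy is 0.85 (test error rate 0.15), Precision is 0.80, Recall is about 0.889, and the F1 score is about 0.842, which lies between the Precision and the Recall as a harmonic mean should.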

Results
The results of the six machine learning algorithms used for prediction and their corresponding test error rates, Accuracy, F1 score, Precision, and Recall rates resulting from the cross-validation approaches are tabulated in Table 3. Of the six methods used for prediction, the test error rates obtained across the three cross-validation methods suggest that bagging and gradient boosting are the most robust methods for predicting the success of Kickstarter projects. The test error rates for linear and quadratic discriminant analysis are close to each other; in fact, the misclassification rates are around 30% for both methods. The LR model comes close as the next best predictive model after the bagging and gradient boosting algorithms, as evidenced by its low test error rates of about 5%-6%. These results, interestingly, are also in line with the Accuracy metric considered. Overall, the LR, bagging, and gradient boosting models have relatively higher Accuracy rates and thus have better predictive performance. This is further evidenced by their F1, Precision, and Recall scores, which are also the highest among all models considered. The LDA, QDA, and tree models perform similarly but have lower Accuracy and F1 scores than their counterpart models.

Discussion and Future Work
This study sought mainly to investigate statistical learning methods and associated machine learning algorithms, based on feature engineering, that present us with the best predictive models for predicting the success of Kickstarter campaigns. The data used were web-scraped from Kickstarter, one of the biggest reward-based crowdfunding platforms in the world. Over 80 000 observations and 61 features were used. Because most of the Kickstarter projects (about 96%) originated in the United States, the emphasis of the study was placed on these projects. First, feature engineering was performed to target the most relevant variables. After a dimensionality reduction was performed with LASSO, the random forest procedure, and multicollinearity diagnostics, the variables of goal, backers count, time, and population were ranked as the most contributive and significant variables in minimizing the prediction error of any machine learning methods we planned to use. Six machine learning algorithms were then explored. The performance of these methods was validated with three cross-validation approaches, and classification metrics, such as test error rates, Accuracy, F1 scores, Precision, and Recall, were tracked. The results showed the bagging and gradient boosting methods for classification as having the lowest test error rates and overall very high Accuracy, Precision, and Recall rates, indicating that they are the better classification methods for predicting the success of Kickstarter campaigns. The major research question hence has been answered. However, it is important to note that for very complex data, the assumptions of some classification methods, such as LR analysis, which is a parametric approach, have been shown to be unrealistic and inflexible. This is because LR first assumes that the sample data come from a population that follows an identical probability distribution with a fixed number of parameters. The second assumption, independence of observations, is not always plausible for complex datasets. Hence the Bayesian nonparametric approach will be a more plausible approach and worthy of future consideration to promote generalization. Bayesian nonparametric models are more robust and valid across problems, as they allow the usage of an infinite number of parameters to capture the features of the distribution underlying the complex data. Moreover, if the interest is the identification and understanding of the effect of particular variables on the rate of success, then causal inference models rather than curve fitting should be further explored.

Fig. 7 Histogram display of the distribution of average percent of goal for each category.

Fig. 10 Histogram display of the average percent of goal for different states.
International Journal of Crowd Science | VOL.6 NO.1 | 2022 | 7-16