In Vitro Fertilization (IVF) Cumulative Pregnancy Rate Prediction From Basic Patient Characteristics

Tens of millions of women worldwide suffer from infertility each year. In vitro fertilization (IVF) is the best choice for many such patients. However, IVF is expensive, time-consuming, and both physically and emotionally demanding. The first question a patient usually asks before IVF is how likely she is to conceive, given her basic medical examination information. This paper proposes three approaches to predict the cumulative pregnancy rate after multiple oocyte pickup cycles. Experiments on 11,190 patients showed that first clustering the patients into groups and then building a support vector machine model for each group achieved the best overall performance. Our model could be a quick and economical approach for reliably estimating a patient's cumulative pregnancy rate from only her basic medical examination information, well before the actual IVF procedure starts. The predictions can help the patient make optimal decisions on whether to use her own oocytes or donor oocytes, how many oocyte pickup cycles she may need, whether to freeze embryos, etc. They can also reduce the patient's cost and time to pregnancy, and improve her quality of life.


I. INTRODUCTION
According to the World Health Organization (WHO) [33], infertility is "a disease of the reproductive system defined by the failure to achieve a clinical pregnancy after 12 months or more of regular unprotected sexual intercourse." For women under 60, infertility was ranked as the fifth most serious global disability [1]. Estimates from 25 international population surveys sampling 172,413 women indicated that 9% of them suffered from infertility [5]. Another study [14], based on household data from 277 demographic and reproductive health surveys of women aged 20-44, estimated that 48.5 million couples worldwide suffered from infertility in 2010. The 2006-2010 United States National Survey of Family Growth (NSFG) [7], which sampled 22,682 men and women aged 15-44, also found that 6.0% of women (1.5 million) suffered from infertility during that period. Assisted reproductive technology (ART) [23] can help these couples achieve pregnancy. The most common ART is in vitro fertilization (IVF) [8], which retrieves eggs from a woman's ovaries, fertilizes them in the laboratory, and then transfers the resulting embryos into the woman's uterus through the cervix. According to the 2015 ART National Summary Report [2], more than 99% of ART cycles performed in the United States in 2015 used IVF.
The timeline of a typical IVF procedure is shown in Fig. 1. During the patient's first visit, an initial consultation is conducted, her medical history is recorded, and a basic medical examination is performed. This process may take 1-2 days. At Day 3, the patient's basic characteristics, such as age, BMI, infertility duration, AFC, AMH, FSH, and pathogenesis, are available. If the patient decides to undergo IVF, the procedure usually takes three menstrual cycles. In the first menstrual cycle, additional examinations and controlled ovarian hyperstimulation (COH) are performed. Oocyte pickup and fertilization are done in the second menstrual cycle. Embryo or blastocyst transfer is performed in the third menstrual cycle. The entire process takes about 2-3 months. During this process, embryo morphology features can be extracted to determine the embryo quality, the number of embryos to transfer, the transfer plan, etc. If the patient fails to conceive after embryo transfer, she has to spend the same amount of time again to repeat the procedure, which is a heavy economic, physical, and emotional burden for many patients.
Cumulative pregnancy rate, i.e., the probability that a patient achieves pregnancy after multiple IVF cycles, is an important measure for evaluating different IVF approaches, and is usually the first question a patient asks before starting IVF. Given the long duration (2-3 months) and high cost of an IVF cycle (on average approximately $10,000-15,000 in the United States [12], and $4,500 in Tongji Hospital in China), it is important to accurately estimate the individualized cumulative pregnancy rate, so that the patient can make the most appropriate decisions on whether to use her own oocytes or donor oocytes, how many oocyte pickup cycles she may need, whether to freeze embryos, etc. Artificial intelligence, particularly machine learning [4], could be used for this purpose.
Machine learning has advanced rapidly in the medical field during the past few years. It has been used to predict the development of hepatocellular carcinoma [21], adult autism spectrum disorder [30], non-small cell lung cancer prognosis [32], human oocyte developmental potential [31], and the risk of acute myeloid leukaemia [3]; to identify a human neonatal immune-metabolic network associated with bacterial infection [22]; to classify skin cancer [9]; and to isolate individual cells for scalable molecular genetic analysis of single cells [6]. Machine learning has also been used to predict pregnancy outcomes from features obtained before and during IVF, including basic patient characteristics, embryo morphology, and so on. For example, decision trees [18], [19] have been used to investigate the relationship between the transfer outcome and 53 embryo, oocyte, and follicular features [20]; to predict the IVF outcome from 100 variables related to basic patient characteristics (e.g., age, body mass index) and derived from the different stages of the IVF cycle (e.g., the amount of hormone treatment, the measured ovarian volume) [17]; and to predict the IVF outcome from 69 features covering the patient's basic information, diagnosis, clinical tests, treatment methods, etc. [11]. Bayesian classifiers have been used to select the most promising embryos for transfer to the woman's uterus using features related to clinical data and embryo morphology [16], and to predict the implantation outcome of individual embryos in an IVF cycle from 18 features including age, infertility factor, treatment protocol, sperm, and embryo morphology [25]. Support vector machines (SVMs) [27] and Bayesian classifiers [26] have been used to predict the implantation outcomes of new embryos from 17 features related to patient characteristics, clinical diagnosis, treatment method, and embryo morphological parameters.
However, to our knowledge, no one has used only patient characteristics from basic medical examinations to predict the cumulative IVF pregnancy rate, as we are doing in this study.
In this paper, we propose supervised and unsupervised machine learning approaches for predicting the cumulative pregnancy rate from basic patient characteristics. We show that the approach that integrates unsupervised and supervised learning achieves the best performance. Our approach can substantially reduce the time and cost of predicting the cumulative IVF pregnancy rate, and thus can help patients make more appropriate decisions before IVF starts.
The remainder of this paper is organized as follows: Section II introduces our three machine learning approaches for cumulative pregnancy rate prediction. Section III presents the experimental results. Section IV discusses the benefits of our proposed approaches. Finally, Section V draws conclusions.

II. OUR PROPOSED MACHINE LEARNING APPROACHES
This section introduces the dataset used in our study, and the feature selection and machine learning approaches for cumulative pregnancy rate prediction from basic patient characteristics.

A. The Dataset
This study included 11,190 Chinese couples who suffered from infertility and received IVF treatment at Tongji Hospital (ranked 3rd in Gynaecology and Obstetrics in China), Huazhong University of Science and Technology, Wuhan, China, between January 2016 and March 2018. The number of IVF cycles per patient varied from 1 to 11, as summarized in Table I. Only basic patient characteristics obtained from the initial medical examination were used in our prediction, namely female age, female body mass index (BMI), infertility duration, antral follicle count (AFC), anti-Müllerian hormone (AMH), follicle-stimulating hormone (FSH), and 30 pathogeny factors.

B. Feature Selection
To select the most informative features, we performed logistic regression [13] using all basic patient characteristics, where each categorical feature was converted into binary indicator variables using one-hot encoding. We used only Cycle 1 pregnancy results as the labels for logistic regression, and excluded patients who did not receive a transfer in Cycle 1. Multiple logistic regression analysis showed that 14 features were significantly correlated with the pregnancy results (P < 0.01). Among them, three etiological factors (endometrial tuberculosis, chromosome abnormality, and others) involved fewer than 2% of all patients. They were removed to make the features more representative. As a result, 11 features were finally selected for further analysis; they are marked by asterisks in Table I.
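The two filtering rules above (significance at P < 0.01, then removal of rare factors) can be sketched as follows. This is a minimal, hypothetical Python sketch rather than the authors' code: `one_hot` and `select_features` are illustrative names, the p-values are assumed to come from a separately fitted multiple logistic regression (for example, the `pvalues` attribute of a fitted statsmodels `Logit` result), and continuous features absent from the prevalence table are assumed never to be rare.

```python
from typing import Dict, List, Sequence


def one_hot(values: Sequence[str], prefix: str) -> Dict[str, List[int]]:
    """Expand one categorical column into binary indicator columns, one per category."""
    categories = sorted(set(values))
    return {f"{prefix}={c}": [1 if v == c else 0 for v in values] for c in categories}


def select_features(p_values: Dict[str, float],
                    prevalence: Dict[str, float],
                    alpha: float = 0.01,
                    min_prevalence: float = 0.02) -> List[str]:
    """Keep features with p < alpha that are also present in >= min_prevalence of patients.

    Assumption: features missing from `prevalence` (e.g. continuous ones such as age)
    are treated as common (prevalence 1.0), so only rare binary factors are dropped.
    """
    return [name for name, p in p_values.items()
            if p < alpha and prevalence.get(name, 1.0) >= min_prevalence]


# Hypothetical values for illustration only (not results from the study).
pathogeny = ["PCOS", "none", "PCOS", "tuberculosis"]
indicators = one_hot(pathogeny, "pathogeny")
print(indicators["pathogeny=PCOS"])  # [1, 0, 1, 0]
print(select_features({"age": 0.001, "bmi": 0.2, "pathogeny=tuberculosis": 0.005},
                      {"pathogeny=tuberculosis": 0.01}))  # ['age']
```

In this toy example, the tuberculosis indicator passes the significance test but is dropped because it covers only 1% of patients, mirroring the removal of the three rare etiological factors.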

C. Cumulative Pregnancy Rate Prediction
Predicting the IVF outcome is extremely difficult using only basic patient characteristics, without controlled ovarian hyperstimulation details or embryo and endometrial features. According to previous research, embryo features are very important for machine-learning-based outcome prediction [11], [15]. When using only basic patient characteristics, we assume that patients with similar basic characteristics also have similar pregnancy rates. This is the best assumption we can make before the actual IVF starts. Once the patient starts IVF, more features can be extracted, and more individualized predictions can be made. However, these features are not available before IVF, and hence are not used in our model.
We constructed three different machine learning models, namely clustering, SVM, and clustering-SVM (C-SVM), and compared their performances using three measures. The pipeline of the three approaches is shown in Fig. 2. Only the 11 features marked with asterisks in Table I were used. We first used one-hot encoding to convert each categorical feature into numerical features, and then performed z-normalization to transform each feature to have mean 0 and standard deviation 1.
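A minimal Python sketch of the z-normalization step follows. The paper does not state whether the normalization statistics were computed on the training data only; this sketch assumes they were (so no test information leaks into training) and uses the population standard deviation. The function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_zscore(train: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and population standard deviation, estimated on training rows only."""
    columns = list(zip(*train))
    means = [statistics.fmean(col) for col in columns]
    # A constant feature has std 0; use 1.0 so it maps to all zeros instead of dividing by 0.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_zscore(rows: Sequence[Sequence[float]],
                 means: Sequence[float], stds: Sequence[float]) -> List[List[float]]:
    """Transform each feature to (x - mean) / std using the training statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical rows: [age, BMI] for three training patients, then one new patient.
train = [[28.0, 21.0], [34.0, 23.5], [40.0, 26.0]]
means, stds = fit_zscore(train)
print(apply_zscore([[34.0, 23.5]], means, stds))  # [[0.0, 0.0]]
```

Keeping `fit_zscore` and `apply_zscore` separate lets the same training statistics be reused for every test patient, which is also how a new patient would be normalized before prediction.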

D. Model 1: Clustering
In the training phase of the clustering approach, we first applied k-means clustering with k = 30 to all patients. We then identified all 30 × 29/2 = 435 unique pairs of clusters. For each pair, we performed the log-rank test [10], [24], [29] to check whether the difference between the cumulative pregnancy curves of the two clusters was significant. If the p-value of at least one of the 435 tests was larger than a predefined threshold α (α = 0.01 in our study), we identified the two clusters with the largest p-value (i.e., the two most similar clusters) and merged them. We repeated the log-rank tests on the remaining clusters until all p-values were smaller than α. We then recorded the center of each cluster and its corresponding cumulative pregnancy rate.
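The merging loop above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: each patient is encoded as a hypothetical `(cycle, pregnant)` record, pregnancy is treated as the event and the oocyte pickup cycle as discrete time, the initial 30 clusters are assumed to come from a standard k-means run (e.g., scikit-learn's `KMeans(n_clusters=30)`), and only the outcome records are merged here; a full implementation would merge the patients' feature vectors as well so that the cluster centers can be recomputed.

```python
import math
from itertools import combinations
from typing import List, Tuple

# Assumed encoding: (cycle, pregnant) where `cycle` is the oocyte pickup cycle in which
# pregnancy occurred (pregnant=True) or the last cycle observed without pregnancy (False).
Record = Tuple[int, bool]


def logrank_p(a: List[Record], b: List[Record]) -> float:
    """Two-sample log-rank test; returns the p-value of the chi-square (1 df) statistic."""
    event_times = sorted({t for t, e in a + b if e})
    o_minus_e, var = 0.0, 0.0
    for t in event_times:
        n1 = sum(1 for ti, _ in a if ti >= t)          # group a at risk at cycle t
        n2 = sum(1 for ti, _ in b if ti >= t)          # group b at risk at cycle t
        d1 = sum(1 for ti, e in a if ti == t and e)    # pregnancies in group a at cycle t
        d = d1 + sum(1 for ti, e in b if ti == t and e)
        n = n1 + n2
        if n < 2:
            continue
        o_minus_e += d1 - d * n1 / n                   # observed minus expected for group a
        var += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    if var == 0.0:
        return 1.0                                     # no information to separate the groups
    chi2 = o_minus_e ** 2 / var
    return math.erfc(math.sqrt(chi2 / 2.0))            # upper tail of chi-square with 1 df


def merge_clusters(clusters: List[List[Record]], alpha: float = 0.01) -> List[List[Record]]:
    """Merge the most similar pair (largest log-rank p-value) until every pair has p <= alpha."""
    clusters = [list(c) for c in clusters]             # copy so the input is not mutated
    while len(clusters) > 1:
        best_p, (i, j) = max(
            (logrank_p(clusters[i], clusters[j]), (i, j))
            for i, j in combinations(range(len(clusters)), 2))
        if best_p <= alpha:
            break
        clusters[i].extend(clusters[j])                # i < j, so deleting j keeps index i valid
        del clusters[j]
    return clusters
```

Each pass recomputes all pairwise tests, which matches the description of repeating the log-rank tests on the remaining clusters; with at most 30 clusters this costs at most 435 tests per pass.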
In the testing phase, when the basic characteristics of a new patient came in, we assigned the patient to the cluster with the closest centroid, and then used the corresponding cumulative pregnancy rate as the prediction.
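The paper does not specify how each cluster's cumulative pregnancy rate is estimated. The minimal Python sketch below assumes a Kaplan-Meier-style estimate over oocyte pickup cycles (a natural companion to the log-rank test), with each patient encoded as a hypothetical `(cycle, pregnant)` record, and then shows the nearest-centroid lookup used at test time. All names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical encoding: (cycle of pregnancy or last observed cycle, pregnancy observed?)
Record = Tuple[int, bool]


def cumulative_rates(records: Sequence[Record], max_cycle: int = 3) -> List[float]:
    """Kaplan-Meier-style cumulative pregnancy rate after cycles 1..max_cycle (an assumption)."""
    not_pregnant, rates = 1.0, []
    for c in range(1, max_cycle + 1):
        at_risk = sum(1 for t, _ in records if t >= c)
        events = sum(1 for t, e in records if t == c and e)
        if at_risk:
            not_pregnant *= 1.0 - events / at_risk
        rates.append(1.0 - not_pregnant)
    return rates


def centroid(rows: Sequence[Sequence[float]]) -> List[float]:
    """Mean feature vector of the (z-normalized) patients in one cluster."""
    return [sum(col) / len(col) for col in zip(*rows)]


def predict_rates(x: Sequence[float], centroids: Sequence[Sequence[float]],
                  rates: Sequence[List[float]]) -> List[float]:
    """Assign x to the cluster with the closest centroid and return that cluster's rates."""
    k = min(range(len(centroids)), key=lambda i: math.dist(x, centroids[i]))
    return rates[k]


records = [(1, True), (1, False), (2, True), (3, False)]
print(cumulative_rates(records))  # [0.25, 0.625, 0.625]
```

Under this estimator, a patient censored after cycle 1 still counts in the cycle-1 denominator but not in later ones; a simpler alternative would be the raw fraction of a cluster's patients who became pregnant by each cycle.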

E. Model 2: SVM
For the SVM classifier [28], we first performed 5-fold cross-validation on the training set to select the best kernel function (polynomial, RBF, or linear) and to determine whether a larger weight should be given to the minority class. Based on this search, we used the RBF kernel and set the per-class weights inversely proportional to the class frequencies in the training data. We then trained a probabilistic SVM classifier with penalty parameter C = 1.
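The class weighting and kernel described above can be illustrated with the minimal Python sketch below; the function names are hypothetical and the kernel width `gamma` is an assumption, since the paper does not report it. For reference, scikit-learn's `SVC(kernel="rbf", C=1.0, class_weight="balanced", probability=True)` would be one comparable configuration, where `class_weight="balanced"` uses the same n_samples / (n_classes * n_c) formula; the paper does not state which SVM library was used.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """w_c = n_samples / (n_classes * n_c): weights inversely proportional to class frequency."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * count) for c, count in counts.items()}


def per_class_penalty(C: float, weights: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Class-weighted soft-margin SVMs scale the penalty C by each class's weight."""
    return {c: C * w for c, w in weights.items()}


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


# Hypothetical labels: 1 = pregnant, 0 = not pregnant (imbalanced 3:1).
w = balanced_class_weights([0, 0, 0, 1])
print(per_class_penalty(1.0, w))  # {0: 0.6666666666666666, 1: 2.0}
```

With C = 1, misclassifying a minority-class patient is penalized three times as heavily as misclassifying a majority-class patient in this 3:1 example, which is the intended effect of inverse-frequency weighting.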

F. Model 3: C-SVM
The C-SVM approach was a sequential combination of the clustering approach and the SVM approach. In the training phase, it first used the clustering approach to group the patients into several clusters, and then trained an RBF SVM for each cluster to individualize the predictions for the patients within that cluster.
Fig. 2. Pipeline of the three approaches. All three use the 11 asterisk features in Table I, convert the categorical features to numerical features using one-hot encoding, and z-normalize each feature. Clustering is an unsupervised approach, SVM is a supervised approach, and C-SVM integrates both.
In the testing phase, when the basic characteristics of a new patient came in, we first assigned the patient to the cluster with the closest centroid, and then used the corresponding SVM to predict a more individualized cumulative pregnancy rate.
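The two-stage prediction of C-SVM can be sketched as a small wrapper. This is a hypothetical Python sketch, not the authors' code: each per-cluster model is any callable returning a pregnancy probability. With scikit-learn, for instance, it could be `lambda x, svc=svc: svc.predict_proba([x])[0, 1]` for an `SVC` fitted with `probability=True`, assuming class 1 denotes pregnancy.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


class ClusterThenClassify:
    """Hypothetical C-SVM-style wrapper: route a patient to the nearest cluster
    centroid, then score her with that cluster's own probabilistic classifier."""

    def __init__(self, centroids: List[Vector], models: List[Callable[[Vector], float]]):
        if len(centroids) != len(models):
            raise ValueError("C-SVM needs exactly one model per cluster")
        self.centroids = centroids
        self.models = models

    def assign(self, x: Vector) -> int:
        """Index of the cluster whose centroid is closest to x (Euclidean distance)."""
        return min(range(len(self.centroids)), key=lambda i: math.dist(x, self.centroids[i]))

    def predict_proba(self, x: Vector) -> float:
        """Pregnancy probability from the classifier trained on x's cluster."""
        return self.models[self.assign(x)](x)


# Stand-in models for illustration; in practice each would wrap a trained RBF SVM.
model = ClusterThenClassify([[0.0, 0.0], [3.0, 3.0]], [lambda x: 0.25, lambda x: 0.6])
print(model.assign([2.5, 2.0]), model.predict_proba([2.5, 2.0]))  # 1 0.6
```

Keeping the router and the per-cluster models separate means the clustering step can be reused unchanged from the clustering approach, with only the leaf predictors swapped for SVMs.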

III. PREDICTION RESULTS
This section compares the prediction performances of the three proposed approaches.

A. Area under the Curve (AUC)
First, we evaluated the performances of the three approaches by randomly sampling two thirds of the patients as training data, and the remaining one third as test data. We used the training data to train the three models and then validated them on the test data. Their receiver operating characteristic (ROC) curves are shown in Fig. 3, and the corresponding areas under the curve (AUCs) were also computed and indicated in the legend. Fig. 3 shows that SVM and C-SVM had similar AUC performances (0.69 and 0.70, respectively), both of which were higher than clustering (AUC=0.67).
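The evaluation protocol above (a random two-thirds/one-third split followed by ROC AUC) can be sketched in Python as follows. This is a minimal illustration with hypothetical function names; it computes the AUC as the probability that a randomly chosen pregnant patient receives a higher score than a randomly chosen non-pregnant one (the normalized Mann-Whitney statistic), which equals the area under the ROC curve.

```python
import random
from typing import List, Sequence, Tuple


def split_two_thirds(n: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly assign two thirds of n patient indices to training and the rest to testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(2 * n / 3)
    return idx[:cut], idx[cut:]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise comparison is quadratic in the test-set size, which is still manageable for a few thousand test patients; a rank-based formulation gives the same value in O(n log n) time.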

B. Cumulative Pregnancy Rate Prediction
Once we obtained the predicted probability and the corresponding cluster of each patient in the test data, we estimated the cumulative pregnancy rate of each cluster as the mean predicted probability of its patients. Fig. 4 shows the cumulative pregnancy rate curves obtained by the three approaches. Although SVM had a promising AUC in Fig. 3, its cumulative pregnancy rate predictions had large biases. In contrast, clustering and C-SVM, particularly C-SVM, had much smaller prediction errors.
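The per-cluster averaging described above can be sketched as a simple group-by mean. This hypothetical Python sketch shows a single cycle; repeating it for each cycle would give a cumulative curve, and comparing it with the observed per-cluster rate (here assumed to be the mean of the 0/1 outcomes) exposes the kind of bias visible in Fig. 4. All data below are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def cluster_means(cluster_ids: Sequence[Hashable], values: Sequence[float]) -> Dict[Hashable, float]:
    """Average a per-patient quantity within each cluster."""
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for c, v in zip(cluster_ids, values):
        sums[c] += v
        counts[c] += 1
    return {c: sums[c] / counts[c] for c in sums}


# Hypothetical test patients: cluster assignment, predicted probability, observed outcome.
clusters = [0, 0, 1, 1, 1]
predicted = [0.30, 0.50, 0.70, 0.60, 0.80]
observed = [0, 1, 1, 1, 0]
pred_rate = cluster_means(clusters, predicted)   # model's estimate per cluster
true_rate = cluster_means(clusters, observed)    # empirical rate per cluster
bias = {c: pred_rate[c] - true_rate[c] for c in pred_rate}
```

A model can rank patients well (high AUC) yet be poorly calibrated, so that these per-cluster means drift from the observed rates; this is one plausible reading of why SVM's good AUC did not translate into accurate cumulative rates.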

C. Stability of the Prediction Models
To test the stability of the three prediction models, we repeated each experiment 30 times, each time with different training and test data. Because Table I shows that fewer than 1% of patients had more than three cycles, we did not consider cycle numbers larger than three. The mean and standard deviation of the AUCs from the 30 runs are shown in the first part of Table II. On average, C-SVM achieved the best AUCs in all three cycles.
We also studied the stability of the three approaches using another measure, the root mean squared error (RMSE). For each model in each run, we concatenated the predicted cumulative pregnancy rates for the three cycles and n clusters into a 3n-element vector $\hat{y} = [\hat{y}_1, \ldots, \hat{y}_{3n}]$, and computed the RMSE between the predictions and the corresponding ground truth $y = [y_1, \ldots, y_{3n}]$:
$$\mathrm{RMSE} = \sqrt{\frac{1}{3n}\sum_{i=1}^{3n}\left(\hat{y}_i - y_i\right)^2}$$
A smaller RMSE means better performance. The mean and standard deviation of the RMSEs from the 30 runs are shown in the second part of Table II. Again, C-SVM achieved the best performance. We also performed an analysis of variance (ANOVA) test to check whether there was a statistically significant difference between each pair of algorithms. The p-values are shown in Table III, where the statistically significant ones are marked in bold. SVM and C-SVM were statistically significantly better than clustering on AUC, and clustering and C-SVM were statistically significantly better than SVM on RMSE. In summary, C-SVM achieved the best overall performance.
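The RMSE computation above can be sketched in Python as follows. This is a minimal illustration with hypothetical names and made-up rates; it assumes the 3n-element vector is ordered cycle by cycle (all n clusters for cycle 1, then cycle 2, then cycle 3), an ordering the paper does not specify but which does not affect the RMSE as long as predictions and ground truth use the same order.

```python
import math
from typing import List, Sequence


def flatten_cycles(rates_by_cycle: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-cycle lists of n per-cluster rates into one 3n-element vector."""
    return [r for cycle in rates_by_cycle for r in cycle]


def rmse(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Root mean squared error between predicted and observed cumulative pregnancy rates."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred))


# Hypothetical rates for n = 2 clusters over cycles 1-3 (not values from the study).
y_hat = flatten_cycles([[0.40, 0.55], [0.60, 0.70], [0.70, 0.78]])
y = flatten_cycles([[0.42, 0.50], [0.58, 0.72], [0.71, 0.80]])
print(round(rmse(y_hat, y), 4))  # 0.0265
```

Running this once per model and per run yields the 30 RMSE values whose mean and standard deviation are summarized for each approach.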

IV. DISCUSSIONS
This section discusses the advantages of our proposed approaches, particularly C-SVM, our best-performing model.

A. C-SVM Reduces the Time and Cost to Predict the IVF Cumulative Pregnancy Rate
Our C-SVM model uses only the basic medical examination information during the first visit (which takes two days and costs about $50 in Tongji Hospital in China) to predict the cumulative pregnancy rate, and the result can be known immediately after the visit.
Compared with conventional approaches in the literature, which use information obtained during IVF (which takes 2-3 months and costs about $4,500 in Tongji Hospital in China), our approach is much faster and more economical. It substantially saves the patient's time and cost, and represents a step towards precision medicine and individualized treatment.

B. Cumulative Pregnancy Rate Prediction Is More Informative than Single-Cycle Pregnancy Rate Prediction
The total duration and cost of IVF are significantly affected by the number of oocyte pickup cycles. Since oocyte pickup is time-consuming and expensive, a patient may choose to freeze the extra embryos from the first oocyte pickup cycle to reduce time and cost: the frozen embryos can be transferred if previous transfers fail, without the need to retrieve and fertilize fresh oocytes again. However, frozen embryo transfer may have a lower pregnancy rate than fresh embryo transfer. It is therefore important to know the cumulative pregnancy rate of fresh embryo transfers, so that the patient can decide whether saving the time and cost of another oocyte pickup cycle is worthwhile. Our C-SVM model can predict the cumulative pregnancy rates after one, two, or three oocyte pickup cycles, which gives patients the information they need for this decision.
It is very difficult for a patient with poor ovarian reserve to conceive using her own oocytes. Knowing her cumulative pregnancy rate with her own oocytes could greatly help her make a wiser decision: if that rate is much lower than she expects, she may choose donor oocytes, which may give a much higher pregnancy rate. In this way, the patient can avoid potentially multiple rounds of controlled ovarian hyperstimulation, shorten the time to pregnancy, reduce the overall cost, and hence improve her quality of life.

V. CONCLUSION
In this paper, we have developed three approaches (clustering, SVM, and C-SVM) to predict the IVF cumulative pregnancy rate over multiple oocyte pickup cycles using basic patient characteristics. The selected features included female age, female BMI, infertility duration, AFC, AMH, FSH, and five pathogeny factors (diminished ovarian reserve, perimenopause, paternal factor, PCOS, and intrauterine adhesion). Experimental results showed that the AUCs of SVM and C-SVM were better than that of clustering, and the prediction RMSEs of clustering and C-SVM were smaller than that of SVM. In summary, C-SVM achieved the best overall performance.
To the best of our knowledge, this is the first study that uses machine learning to predict the cumulative pregnancy rate over multiple IVF cycles from only basic patient characteristics, before the actual IVF starts. The predictions can help the patient make optimal decisions on whether to use her own oocytes or donor oocytes, how many oocyte pickup cycles she may need, whether to freeze embryos, etc. They can also reduce the patient's cost and time to pregnancy, and improve her quality of life.