Deep Learning Applied for Histological Diagnosis of Breast Cancer

Deep learning, currently one of the most popular research trends in computer science, improves on neural networks by using more and deeper layers, which allows higher levels of abstraction and more accurate data analysis. Although deep convolutional neural networks (DCNNs) have recently achieved promising results in data analysis, their need for large amounts of data limits their use in medical data analysis, since medical data are difficult to obtain. Breast cancer is a common cancer in women. To diagnose it, senior pathologists must examine breast cell shapes in histopathology images. The number of pathologists per capita is insufficient worldwide, especially in Africa, and human error may occur during diagnosis. After evaluating deep learning methods and algorithms for breast histological data processing, we tried to improve the accuracy of current systems. As a result, this study proposes two effective deep transfer learning-based models, which rely on DCNNs pre-trained on the large ImageNet image collection and improve on current state-of-the-art systems in both binary and multiclass classification. We transfer the ImageNet pre-trained weights of ResNet50 and DenseNet121 as initial weights and fine-tune these models with a deep classifier and data augmentation to detect various malignant and benign tissue samples in binary and multiclass classification. The proposed models have been examined with optimized hyperparameters in magnification-dependent and magnification-independent classification modes. In multiclass classification, the proposed system achieved up to 98% accuracy. In binary classification, it achieved up to 100% accuracy. The results outperform those of previous studies on all defined performance metrics for breast cancer CAD systems based on histological images.


I. INTRODUCTION
Breast cancer (breast carcinoma) is the most common type of cancer in women and, together with lung cancer, the most dangerous [1], [2]. Early detection is crucial to reducing the mortality rate, since breast cancer is often treatable when diagnosed early. Cancer starts from a benign state and, without appropriate treatment at the early stages, it becomes malignant. A common way to detect breast cancer is histological biopsy evaluation [3]. An experienced pathologist evaluates breast histopathology images at various levels of magnification. Sometimes complementary imaging such as mammography is needed to determine whether the sample tissue is malignant.
The associate editor coordinating the review of this manuscript and approving it for publication was Yiming Tang.

APPEARANCES OF CANCER
Breast cancer occurs when breast cells start to grow abnormally. The affected cells divide more frequently than healthy cells and form a mass or lump. The cells may spread first to the lymph nodes and then to other parts of the body.

HISTOPATHOLOGICAL EXAMINATION
A histological biopsy is a thorough examination of a sampled tissue under a microscope. The minimum magnification requirement for a proper diagnosis is 40×. However, the suspected areas have to be magnified to 100×, 200×, and 400× in order to enable the pathologist to evaluate cell shape [2].

A. PROBLEM STATEMENT
The diagnosis of breast cancer is operator dependent and requires an experienced pathologist. However, human factors such as exhaustion and lapses in concentration can cause the sample type to be misidentified during long, continuous procedures. When a case is missed, the cancer may grow, and the survival rate in that condition is low. Some countries have very few pathologists per capita: there is only one pathologist for every 100,000 people in Africa and for every 130,000 in China [4]. To counteract the lack of experienced pathologists, the possibility of human error, the time-consuming screening process, and the high cost, researchers have proposed and evaluated several Computer-Aided Diagnosis (CAD) techniques for early and automatic detection of breast cancer [5]. These techniques can significantly help the early diagnosis of cancer, but they are challenging to implement. Machine learning approaches are now at the frontier of CAD. With the rise of deep learning (a part of the machine learning family), many studies have used it to precisely detect the sample type in histology images. However, the variety of cell size, shape, color, and scale in histological images on the one hand, and the complex structure of human body cells, low image quality, and similarity between benign and malignant samples on the other, make the task challenging and prevent high accuracy. Additionally, the lack of large labeled datasets poses another major challenge.

B. RELATED WORKS

HISTOPATHOLOGY DATASETS
The foundation of any CAD system is data collected and labeled by experts in real decision-making situations. Besides BreakHis, introduced below, there are only three datasets for breast cancer histopathological diagnosis: Mitosatypia [6], Bioimaging [7], and SSAE [8]. They are not fully available and also have some issues with clinical value [9]. In addition, they contain only 120, 1401, and 37 images, respectively, each at a single magnification level, which is a meager amount of data. The newest public histopathological breast cancer dataset, BreakHis, was released in 2016 and has the highest clinical value [5]. It has 7909 images at four magnification levels (40×, 100×, 200×, and 400×).
Because our primary research dataset is BreakHis, we mostly reviewed previous studies that use the BreakHis dataset, in both binary and multiclass classification.
In binary classification, the aim is an algorithm that can predict whether a sample tissue is benign or malignant. In multiclass classification, the aim is an algorithm that can predict the exact subtype of the tissue. The benign subtypes are Adenosis (A), Fibroadenoma (F), Phyllodes Tumor (PT), and Tubular Adenoma (TA); the malignant subtypes are Papillary Carcinoma (PC), Mucinous Carcinoma (MC), Ductal Carcinoma (DC), and Lobular Carcinoma (LC).
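As a minimal illustration of how the two tasks relate, the sketch below maps the eight BreakHis subtypes to their binary label. The dictionary and function names are hypothetical and chosen only for this example; the abbreviations follow the text.

```python
# Hypothetical names for illustration; abbreviations follow the subtype list in the text.
BENIGN_SUBTYPES = {
    "A": "Adenosis",
    "F": "Fibroadenoma",
    "PT": "Phyllodes Tumor",
    "TA": "Tubular Adenoma",
}
MALIGNANT_SUBTYPES = {
    "PC": "Papillary Carcinoma",
    "MC": "Mucinous Carcinoma",
    "DC": "Ductal Carcinoma",
    "LC": "Lobular Carcinoma",
}


def binary_label(subtype: str) -> str:
    """Collapse a multiclass subtype abbreviation into its binary class."""
    if subtype in BENIGN_SUBTYPES:
        return "benign"
    if subtype in MALIGNANT_SUBTYPES:
        return "malignant"
    raise ValueError(f"unknown subtype: {subtype!r}")


print(binary_label("F"))   # a benign subtype
print(binary_label("LC"))  # a malignant subtype
```

A multiclass prediction therefore always implies a binary one, while the reverse does not hold.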

BINARY CLASSIFICATION
In the binary classification, the aim is to make an algorithm that is able to predict whether a sample tissue is benign or malignant.

MODELS
The BreakHis authors were the first to evaluate their dataset using a deep learning-based CAD system. They entrusted feature extraction and classification to a deep CNN [5] and improved their results by using AlexNet for transfer learning. Study [10] evaluated a CNN with multiple handcrafted features and compared the results with those obtained from raw images. It achieved its best results using residual blocks inspired by ResNet. Study [11] sought the best CNN model for this classification task, comparing AlexNet, ResNet, and GoogleNet, and found ResNet to be the best. The same study also stresses the need for data augmentation, fine-tuning of all layers, and large Whole Slide Images (WSI) instead of small patches. Work [9] chose the Inception-v3 model as a more efficient CNN than shallower models.
Study [12] evaluated AlexNet and the DeCAF feature extractor with a transfer learning strategy. This approach enabled the authors to extract features from the last layer of the pre-trained AlexNet and use them to train their classifier. The authors of [13] evaluated the effect of different dimensionality reduction methods on the extracted features: Correlation-Based Feature Selection (CBFS), Gaussian Random Projection (GRP), and Principal Component Analysis (PCA). That study used a pre-trained VGG network. Study [14] found that the last convolutional layer of a model provides more important features than the final fully connected layers. Study [15] introduced dual-stage fine-tuning, which first retrains a fully connected layer and then the whole network. Research [16] showed that fine-tuning the last three layers of a pre-trained AlexNet works better than Support Vector Machine (SVM) classification of concatenated features extracted from two pre-trained networks.
Study [17] introduced a model called deep domain knowledge-based features, which narrows the gap between the features extracted by a network pre-trained on other datasets and the specific target domain. That study retrained the pre-trained CNN on the BreakHis dataset for efficient feature extraction. Study [18] evaluated CNN features post-encoded with the Fisher Vector (FV). It extracted a set of local features from the last convolutional layer of the model and encoded them into the FV descriptor. This approach raises the issue of high dimensionality: the number of dimensions is very large, which increases the time complexity of the calculations. The issue is solved by embedding each block of the FV into a lower-dimensional feature space through a dimensionality reduction algorithm based on a multi-layer neural network model [19].
Study [20] proposed a model that captures multi-scale features by using different convolution models to ease the transition between the last convolutional layer and the fully connected layers. Study [21] proposed a modified CNN that uses the subclass information and label of each image as prior knowledge. Its authors claimed that the modified CNN could learn feature distances better in binary classification. Research [22] introduced a CNN divided into convolutional, pooling, and fully connected layers, and tested multiple hyperparameters to find the most suitable architecture for the BreakHis classification task. This newly designed convolutional model (NDCNN) achieved up to 90% accuracy in binary classification. Because malignant sample images may contain some benign regions, adopting a patch-based approach is challenging.
Study [23] tried to overcome this challenge by introducing a Multiple Instance Learning (MIL) approach based on the extraction of random 64 × 64 patches. Research [24] continued the MIL approach and proposed Multiple Instance Pooling (MIP) layers that select and extract the most distinctive features instead of extracting every feature. This was done by modifying the loss function to favor instances with higher activation.
Study [25] proposed a framework to avoid the mislabeling that occurs with a patch-based approach. The framework ignores less representative samples (high-confidence samples with lower entropy) and reduces the annotation cost. The remaining samples are given to the model for automatic annotation.
The authors of [26] introduced an interesting method that uses the pre-trained GoogleNet model and trains its last layer in a magnification-specific way. For testing, the four magnification-specific models are aggregated using the majority voting rule: each of the four models predicts the test image, and if at least two of them reach the same decision, that decision is used as the output of the network.
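The majority voting rule described for [26] can be sketched as follows. This is an illustrative reading of the rule, not the implementation from [26]: the function name is hypothetical, and the tie-breaking behavior (the first tied label in model order wins) is an assumption, since the text does not say how a 2-2 split is resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by the most models.

    `predictions` holds one predicted label per magnification-specific model.
    Assumption of this sketch: on a tie, the tied label that appears first
    in `predictions` wins.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# One prediction from each of the 40x, 100x, 200x, and 400x models.
votes = ["malignant", "benign", "malignant", "malignant"]
print(majority_vote(votes))
```

With four voters and two classes, a 2-2 split is possible, so any real system using this rule needs an explicit tie policy.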
Research [27] fed handcrafted Tamura features instead of raw images to a Deep Belief Network (DBN) that consists of four Restricted Boltzmann Machines (RBMs). Because autoencoders have shown interesting results in image classification [28], study [29] proposed a framework that uses Landmark ISOMAP (L-ISOMAP) to extract features from histology images. This method obtained high accuracy.

PREPROCESSING METHODS
Some studies applied a preprocessing technique to the raw data to improve their results. Study [30] evaluated the effect of data augmentation on results for 40× magnification images. Study [31] evaluated cluster-transformed images produced by different clustering algorithms and compared them with raw-image input to the same CNN.
Study [32] proposed k-means clustering of images as a preprocessing step to highlight the nuclei. It used a Discrete Wavelet Transform (DWT) to extract features from the cluster-transformed images and chose a Support Vector Machine (SVM) as the classifier for these features.
Research [37] worked on the color-texture variation of histopathology images. It evaluated the performance of multiple color-texture descriptors with different classifiers at each of the four magnification levels and built an integrated, magnification-independent model. Research [39] removed normalization to help the model learn color-texture variability. It found that grayscale transformation, as a stain normalization method, decreases accuracy. Study [52] claims that conventional normalization techniques increase the noise in the image and introduces a new normalization technique that controls the noise.

MULTICLASS CLASSIFICATION
In multiclass classification, the aim is an algorithm that can distinguish the exact type of a sample tissue from the other types. Study [56] used K-means and autoencoder approaches for image clustering and successfully classified histopathology images using Inception-ResNet-v2. Moreover, it evaluated the effect of data augmentation and obtained 95.3% accuracy for its best result.
Study [57] worked on obtaining two patches (patch sampling) to prevent information loss, combining a CNN with the K-means algorithm. Essential features were first extracted with ResNet50, and the model then reached 95% accuracy for four breast cancer types. A further study [58] improved classification performance by using a DCNN with a gradient boosting classifier. It used Inception-300 × 300 + GBT and achieved 93.5%, 95.3%, 96.1%, and 91.1% accuracy on the 40×, 100×, 200×, and 400× images, respectively.
In [27], an accuracy of 91% was achieved using a CNN with a mean-shift algorithm. Similarly, paper [59] generated higher-dimensional features to improve image classification accuracy and achieved 87% accuracy.
Research [60] used nucleus-guided training of a CNN to reduce the noise coming from the stroma, improving CNN classification while reducing computation time. In [35], classification accuracy was increased by using ReLU to reduce the likelihood of vanishing gradients, together with filters of sizes 3 × 3, 5 × 5, and 7 × 7. The proposed model obtained 93.2% accuracy in multiclass classification.
In [61], a Nottingham Grading System (NGS)-based approach was proposed to differentiate images into three subclasses. Study [62] focused on the limitations of histopathological image classification and proposed an Enhanced Loss Function (ELF) method to increase the classifier's performance. It shows that the ELF increases classification accuracy by 3% and the processing time by up to 30-40 seconds.
In [63], two methods for classifying histology images were proposed. The first relies on handcrafted features: Hu moments, color histograms, and Haralick textures are extracted from the BreakHis images and then used to train the classifier. The second uses transfer learning with pre-trained ResNet50, VGG16, and VGG19 networks. This research achieved its best result by transfer learning of the VGG16 network with a linear Support Vector Machine (SVM). In [64], a class Kernel Principal Component Analysis (KPCA) is introduced for extracting features, and a KPCA model is trained for each extracted feature. This process is repeated for all the images in the dataset, and finally a pre-trained KPCA model makes the decision. This approach achieved 92% accuracy in binary classification.
Research [5] used a pre-trained AlexNet for binary classification on the BreakHis dataset, together with sliding-window and random extraction techniques.
In [12], a pre-trained CaffeNet and linear regression were used for feature extraction and classification. Study [35] used a deep convolutional network that is able to learn discriminative features. It achieved an accuracy of 92-95% on images of various magnification levels in the BreakHis dataset.

C. MOTIVATION
Recent research has shown that deep learning methods, particularly Convolutional Neural Networks (CNNs), are highly effective for image analysis [12]. In fact, the CNN has become the leading machine learning tool for computer vision and image analysis. The recent breakthroughs in deep learning show great potential to increase the performance of applications. This progress drew our attention and motivated us to investigate and develop an efficient deep learning-based method to help solve a real-world problem in medical data analysis. This work focuses on the automatic detection of breast cancer in histopathology images.

D. CONTRIBUTIONS
Given the above, this work aims to:
• Develop a high-accuracy method that improves on the accuracies of previous works.
• Experiment with image magnification-dependent and magnification-independent approaches.
• Define the exact subtype of the sample tissues.
Achieving high accuracy improves the feasibility of CAD systems for breast cancer recognition in medical practice.

E. APPROACH
To achieve these goals, we adopted a bottom-up development approach and used the Agile SCRUM methodology during the development and integration of each subsystem. We chose this methodology because the project is a software development project. SCRUM enables us to obtain the highest efficiency through weekly sprints [65]. In each sprint, the work done in the previous week is reviewed, and a new sprint is then defined for the following week. The subtasks are:
• Survey the literature related to the topic.
• Evaluate image classification algorithms and state-of-the-art deep learning methods.
• Choose the most suitable algorithm for image recognition and design a new CAD system based on the structure of the efficient algorithms for our specific problem.
• Develop a preprocessing technique for the preparation of the dataset and experiments.
• Define a set of hyperparameters and optimize them for our specific problem.
• Implement the chosen models with the optimized preprocessing technique and hyperparameters for breast cancer detection.
• Perform test and analysis of the chosen models to achieve the highest accuracy.
• Conclude the study and suggest future work.

II. SYSTEM MODEL

A. TRANSFER LEARNING
The BreakHis dataset is too small to train a network from scratch and achieve high accuracy. One way to alleviate this problem is transfer learning, that is, fine-tuning a pre-trained CNN [66]. A pre-trained CNN has already been trained on an extensive dataset covering various domains. Pre-trained networks are now widely used in computer vision tasks [67]. Both ResNet and DenseNet are used in this research.
When a plain network is trained with a standard optimization algorithm, the training error first decreases as the number of layers increases, but then it increases. In other words, the training error gets worse when the network is too deep. Yet a highly accurate model requires a deep neural network, because a deep model can extract features better than shallow models by using its intermediate hidden layers [68].
The ResNet50 network is made of residual blocks. In traditional (plain) neural networks, every layer is connected only to the next layer. In a network built from residual blocks, every block is connected to the next layer and is also directly connected, through a shortcut connection, to a layer 2-3 layers deeper. Figure 4 shows a residual block. The input x is passed through a few convolutional layers (function f), producing f(x). A traditional CNN stops there, whereas ResNet adds the original input x to the result, giving f(x) + x. This is an element-wise addition (⊕). The information in x can therefore follow a shortcut to go much deeper into the network. As a CNN gets deeper, the path that information travels from the input layer to the output becomes longer. The same holds for the gradient flowing in the opposite direction during backpropagation, which can vanish before reaching the other end of the network. A ResNet is built by stacking many residual blocks. By letting intermediate activations skip ahead into deeper layers, the shortcuts mitigate the vanishing gradient problem and allow us to train a much deeper neural network without loss of performance [69], [70].
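The effect of the shortcut connection can be illustrated with a toy example in plain Python. This is only a sketch of the f(x) + x idea on flat lists of numbers, not a convolutional implementation; `dead_layers` is a hypothetical stand-in for layers that contribute nothing, used to show that the input still reaches the end of a deep stack when shortcuts are present.

```python
def residual_block(x, f):
    """Return f(x) + x, added element-wise."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("f must preserve the length of x for the shortcut addition")
    return [a + b for a, b in zip(fx, x)]


def dead_layers(x):
    """Hypothetical stand-in for layers whose output is all zeros."""
    return [0.0 for _ in x]


def run_stack(x, depth, use_shortcut):
    """Pass x through `depth` blocks, with or without the shortcut connection."""
    for _ in range(depth):
        x = residual_block(x, dead_layers) if use_shortcut else dead_layers(x)
    return x


signal = [1.0, -2.0, 3.0]
with_shortcut = run_stack(signal, depth=50, use_shortcut=True)
without_shortcut = run_stack(signal, depth=50, use_shortcut=False)
print(with_shortcut)     # the input survives all 50 blocks
print(without_shortcut)  # the input is lost in the plain stack
```

The same additive path that carries x forward also carries the gradient backward, which is the intuition behind the reduced vanishing-gradient problem.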
ResNet50 has five stages, and each stage has a residual block plus a convolutional block. Each residual block and each convolutional block has three convolutional layers. ResNet50 has approximately 23 million trainable parameters.
In DenseNet, by contrast, each layer receives additional inputs from all preceding layers (each layer gets information directly from all previous layers). The feature maps of all previous layers are concatenated into the input of the current layer. This makes the network thin and compact (fewer channels per layer): each layer is narrow and adds only a few feature maps. DenseNet simplifies the connectivity pattern among layers used in ResNet to ensure maximum information flow, needs fewer parameters than other CNNs, and avoids redundant feature maps. DenseNet has a feature layer (a convolutional layer), multiple dense blocks (the concatenated layers), and transition layers between the dense blocks [71].
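The channel bookkeeping implied by concatenation can be traced with a few lines of Python. The configuration used below (four dense blocks of 6, 12, 24, and 16 layers, a growth rate of 32, 64 initial channels, and transition layers that halve the channel count) is the one commonly reported for DenseNet-121; it is not stated in this paper and is an assumption of the sketch, as are the function names.

```python
def dense_block_channels(in_channels, num_layers, growth_rate):
    """Channels after a dense block: every layer appends `growth_rate` feature maps."""
    return in_channels + num_layers * growth_rate


def densenet_channel_trace(block_config, growth_rate, init_channels, compression=0.5):
    """Return the channel count at the output of each dense block.

    A transition layer after every block except the last multiplies the
    channel count by `compression` (assumed to be 0.5 here).
    """
    trace = []
    channels = init_channels
    for index, num_layers in enumerate(block_config):
        channels = dense_block_channels(channels, num_layers, growth_rate)
        trace.append(channels)
        if index < len(block_config) - 1:
            channels = int(channels * compression)
    return trace


# Assumed DenseNet-121 configuration (not given in the text).
trace = densenet_channel_trace((6, 12, 24, 16), growth_rate=32, init_channels=64)
print(trace)
```

Each layer adds only a few feature maps, yet the concatenated input seen by later layers grows steadily, which is why the transition layers are needed to keep the network compact.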
DenseNet achieves accuracy similar to ResNet with less than half the number of parameters [72]. We chose DenseNet121 over the other DenseNet versions because it has around 8 million parameters, fewer than DenseNet169, DenseNet201, and DenseNet264.
To apply transfer learning and fine-tuning, we replace the fully connected layer (FC-1000) in the ResNet50 and DenseNet121 architectures with a new top layer that contains a fully connected layer with eight outputs (FC-8). We then transfer the pre-trained weights and fine-tune the model with the new top layer by training ResNet50 and DenseNet121 with backpropagation on the patch-balanced dataset. Figure 5 presents the architecture of our proposed model for DenseNet121. ResNet50 is used in the same structure.
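To give a sense of scale for the head replacement, the sketch below counts the parameters of the original FC-1000 layer and of a single FC-8 layer. The feature widths entering the classifier (2048 for ResNet50 and 1024 for DenseNet121) come from the standard published architectures rather than from this paper, and the sketch assumes the new head is one fully connected layer; the names are hypothetical.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


# Assumed feature widths before the classifier (not stated in the text).
BACKBONE_FEATURES = {"ResNet50": 2048, "DenseNet121": 1024}

for name, width in BACKBONE_FEATURES.items():
    old_head = linear_params(width, 1000)  # ImageNet head, FC-1000
    new_head = linear_params(width, 8)     # eight BreakHis subtypes, FC-8
    print(f"{name}: FC-1000 has {old_head} parameters, FC-8 has {new_head}")
```

Only the small new head starts from random weights; everything below it starts from the transferred ImageNet weights and is then fine-tuned.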
Apart from DenseNet121 and ResNet50, we used some other pre-trained networks for comparison: ResNet101, VGG19, AlexNet, and SqueezeNet.

III. DATASET PREPARATION

A. DATASET PARTITIONING
Because the BreakHis dataset is not very large, the partitioning process is vital to get the most out of the proposed model. We divided the dataset into training, validation, and test sets with 6011, 1492, and 406 images, respectively. This ratio was chosen to increase training efficiency as much as possible. Figure 2 shows the BreakHis dataset partitioning in detail.
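The split sizes above can be checked and turned into fractions with a short calculation. The image counts are taken from the text; the variable names are illustrative only.

```python
# Image counts per split, as reported in the text.
SPLIT_SIZES = {"train": 6011, "validation": 1492, "test": 406}

total = sum(SPLIT_SIZES.values())
fractions = {name: size / total for name, size in SPLIT_SIZES.items()}

print(f"total images: {total}")
for name, fraction in fractions.items():
    print(f"{name}: {SPLIT_SIZES[name]} images ({fraction:.1%})")
```

The three sets add up to the 7909 images of BreakHis, which corresponds to roughly 76% for training, 19% for validation, and 5% for testing.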

B. DATA AUGMENTATION
To increase the diversity of the BreakHis dataset and boost the CAD system's performance, we implemented a data augmentation method. Every image in the training set is first resized to 224 × 224 pixels. Some of the images are then randomly flipped horizontally. We also apply color jitter, which changes the tone of the original colors in terms of Hue, Saturation, and Value (HSV). Some of the training images are also randomly rotated and cropped. After these steps, each image is converted to a tensor (a multidimensional array of numbers) and normalized. Figure 7 shows a batch of augmented data. Validation images are only normalized, without any flipping, cropping, or rotation. Test images are given to the trained model after being resized to the model's required input size, without any other change (raw images).
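The difference between the training and validation pipelines can be sketched in plain Python on a tiny grayscale image stored as nested lists. This is a simplified illustration rather than the pipeline used in the study: it covers only the random horizontal flip and the normalization, and the flip probability and the normalization constants (mean 0.5, standard deviation 0.5) are assumptions.

```python
import random


def hflip(image):
    """Mirror every row of the image left to right."""
    return [row[::-1] for row in image]


def normalize(image, mean=0.5, std=0.5):
    """Scale 0-255 pixels to 0-1, then standardize (assumed mean and std)."""
    return [[(pixel / 255 - mean) / std for pixel in row] for row in image]


def train_transform(image, rng, p_flip=0.5):
    """Training pipeline: random horizontal flip, then normalization."""
    if rng.random() < p_flip:
        image = hflip(image)
    return normalize(image)


def eval_transform(image):
    """Validation pipeline: normalization only, no random changes."""
    return normalize(image)


tiny_image = [[0, 255], [255, 0]]
rng = random.Random(0)  # seeded so the sketch is reproducible
print(train_transform(tiny_image, rng))
print(eval_transform(tiny_image))
```

Keeping the random steps out of the validation path ensures that validation scores vary only because the model changes, not because the inputs do.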

IV. IMPLEMENTATION

A. HARDWARE AND SOFTWARE
The proposed model was implemented and the test results were produced on a desktop with an AMD Ryzen Threadripper 1950X 16-core processor at 3.40 GHz, 128 GB of RAM, and an NVIDIA 1080 Ti GPU.
As for software, we used PyTorch in a Jupyter Notebook within the Anaconda environment. The implementation code of the proposed model is available in the GitHub Repository.

B. HYPERPARAMETERS OPTIMIZATION AND SETTINGS
We defined and tuned a set of hyperparameters and settings for our specific task. Table 3 shows these optimized hyperparameters and settings. Setting-3 and setting-4 performed best.

C. EVALUATION METRICS
In order to measure the accuracy of breast cancer CAD systems based on the chosen taxonomies, there are some metrics for fair comparison among different CAD systems.

1) IMAGE-LEVEL ACCURACY (ILA)
The total number of correctly classified images divided by the total number of images gives the image-level accuracy, which is the main metric of our study. Equation 1 shows how ILA is calculated.

2) PRECISION
Precision is the fraction of relevant samples among the retrieved samples. Equation 2 gives the formula for precision.

3) RECALL (Sensitivity)
The fraction of the total amount of relevant samples which were actually retrieved. Equation 3 shows the formula for the calculation of recall.

4) F1 SCORE
As a measurement of test accuracy, the F1 score is the harmonic mean of precision and recall. Equation 4 shows the formula for the calculation of the F1 score.
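The four metrics can be written compactly in Python using their standard definitions, which match the descriptions above. The function names and the example counts are hypothetical, and the sketch assumes that none of the denominators is zero.

```python
def image_level_accuracy(num_correct, num_images):
    """Correctly classified images divided by all images."""
    return num_correct / num_images


def precision(tp, fp):
    """Fraction of retrieved (predicted positive) samples that are relevant."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of relevant (actual positive) samples that were retrieved."""
    return tp / (tp + fn)


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical confusion counts for a binary benign/malignant test set.
tp, fp, fn, tn = 8, 2, 1, 9

ila = image_level_accuracy(tp + tn, tp + fp + fn + tn)
p = precision(tp, fp)
r = recall(tp, fn)
print(f"ILA={ila:.3f} precision={p:.3f} recall={r:.3f} F1={f1_score(p, r):.3f}")
```

For multiclass evaluation, precision, recall, and F1 are typically computed per class by treating that class as positive and all others as negative.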

V. RESULTS
We ran a set of experiments in both binary and multiclass classification with the most promising hyperparameter sets shown in Table 3. The models were tested with different settings to achieve the highest accuracy.

A. BINARY CLASSIFICATION
We evaluated different models with optimized hyperparameters to predict whether the sample tissues are benign or malignant. We first tested the models in a magnification-dependent way, on the 40×, 100×, 200×, and 400× image sets separately.

MAGNIFICATION DEPENDENT-100X
The experiments on the 40× images were repeated for the 100× magnification level with optimized CNNs, hyperparameters, and data augmentation. Table 5 presents the results and settings. Model-4 obtained 100% accuracy with setting-3. The results of model-3 and model-4 also improve on state-of-the-art results for breast cancer CAD systems. The code for these experiments is available in the GitHub CAD-100X-Binary Repository.

MAGNIFICATION DEPENDENT-200X
The proposed models were evaluated with the same approach as in the 40× and 100× experiments. The results are slightly lower than those for 40× and 100×. Model-4 achieved an [...]

MAGNIFICATION INDEPENDENT
The proposed CNNs are trained and tested with all of the BreakHis dataset images regardless of their magnification. Table 8 presents the models, settings, and accuracy results. Model-3 achieved an accuracy of 99.26%, which improves on state-of-the-art results. We also used more pre-trained models in this section for a broader comparison. The code for these experiments is available in the GitHub CAD-Magnification-Independent-Binary Repository.

B. MULTICLASS CLASSIFICATION
The second set of experiments addresses multiclass classification. First, the experiments were performed for each magnification group, and then the models were tested in a magnification-independent way.

MAGNIFICATION DEPENDENT-40X
Table 9 presents the settings and models used in the experiments. The highest accuracies come from model-3 and model-4 (as in binary classification). However, in multiclass classification, model-3 achieved higher accuracy than model-4. These results also improve on state-of-the-art results in CAD for breast cancer. The code for these experiments is available in the GitHub CAD-40X-Multiclass Repository.

MAGNIFICATION INDEPENDENT
The proposed models were also tested in a magnification-independent way, and the results are shown in Table 13. The code for these experiments is available in the GitHub CAD-Magnification-Independent-Multiclass Repository.

C. EVALUATION OF RESULTS
The results of the experiments are promising. A closer look at the misclassified images reveals that they are more or less the same for most of the models, which suggests that the models work quite well and that the dataset is not broad enough to provide more variety for learning. The confusion matrix is provided for the top two models in the magnification-independent multiclass category. It shows that Lobular Carcinoma (LC), a malignant subtype, is the most difficult tissue to classify. This difficulty comes from its very complicated cell structure compared with other malignant tissues [73]. Moreover, although the cell structure of LC is harder to recognize, only 626 LC images are available in the dataset, which is very few for building a robust classifier for this category. Interestingly, most of the mislabeled images are predicted as Ductal Carcinoma (DC). This results from the large number of ductal carcinoma images in the BreakHis dataset and the similarity of malignant sample tissues. Our proposed models resolve the classification difficulty in the Fibroadenoma (F) and Mucinous Carcinoma (MC) classes that study [63] mentioned. The misclassified images are mostly benign samples that are predicted as malignant.
Although this is an anomaly, there are very few false-negative predictions (malignant samples predicted as benign by the proposed models), which is the worst case for a CAD system.
The evaluation of all the experiments shows that Model-4 with setting-3 is the best model for binary classification. In multiclass classification, Model-3 with setting-3 outperformed the other models. The models are designed to be flexible, meaning that model-3 and model-4 can be combined into a bigger model to boost our CAD system's performance. Figure 9 visualizes the average performance of our pre-trained models.
As can be seen in Figure 9, the DenseNet121-based models (model-4 and model-6) work slightly better than [...]

Comparing the state-of-the-art results achieved on the BreakHis dataset (Table 1) with our best results (model-3 and model-4), our models improve on the state of the art in both the binary and multiclass schemes and in both the magnification-dependent and magnification-independent categories. We considered the same dataset as previous state-of-the-art studies, presented their methods, and improved the classification accuracies.
During the development, we implemented the bottom-up integration and testing approach. We split a CAD system into subsystems; Preprocessing, path/slides, feature extractor, transfer learning CNN, and postprocessing. Each part was individually tested with a range of configurations while other subsystems had a fixed configuration. By monitoring the changes in the outcome, we optimized each subsystem individually first and then integrated them to make a specialized CAD system for our specific problem. Finally, the subsystems were integrated to make a system, and the final system was evaluated. This systems engineering approach helped us to achieve our goal, together with Agile SCRUM methodology for software development.
Our main contribution in this research is the set of proposed models with optimized hyperparameters, which is a unique design for this specific problem. Previous studies have used ResNet networks but did not achieve high accuracy, mainly because of improper tuning and patch extraction approaches. Our tuning is unique to our models and has not been used in the past. Our study is the first to utilize the DenseNet121 CNN for this classification, whereas most previous works built their models on ResNet, VGG, AlexNet, or CaffeNet. The difficulty of working with DenseNet and the large size of the network may be the reasons it had not been applied to this specific problem. Our study shows that it is not only feasible to utilize DenseNet for breast cancer histological diagnosis, but also possible to create high-accuracy models with it. Our results improve on state-of-the-art results in all classifications (both binary and multiclass), which is very important when applying artificial intelligence in the medical domain.

VI. CONCLUSION
In this work, we investigated different methods and solutions for the automatic detection of breast cancer in histopathology images. The aim was to develop a high-accuracy method that can detect cancer at early stages, identify the exact type of the samples, and improve on the results of previous works.
We first surveyed the literature on the topic to identify previous approaches to the problem, together with state-of-the-art CAD systems for breast cancer recognition. Drawing on the literature review, we proposed different models for automatic breast cancer diagnosis based on deep learning and transfer learning frameworks. Then we presented and analyzed our image preprocessing methods (data augmentation, dimension reduction, etc.). Moreover, the design methodologies of deep neural networks were presented. Next, the architectures of ResNet50 and DenseNet121, our main deep learning models, were used in the transfer learning framework.
Based upon an extensive study of various deep convolutional neural network techniques, we developed a very effective transfer learning architecture that consists of a new fully-connected classifier and an input layer combined with pre-trained DenseNet121 and ResNet50 models. We introduced the dataset for our study (BreakHis), discussed its shortcomings, and specified our training, validation, and test sets in detail. We then implemented our proposed framework in Python with PyTorch in a Jupyter notebook from the Anaconda distribution. Because hyperparameters in CNNs are very important to the efficiency of the model, we provided a set of hyperparameters (learning rate, pooling size, learning rate scheduling, etc.). We optimized these hyperparameters and tested them on a fraction of the BreakHis dataset. After finding the best hyperparameter settings, we defined nine models and ran experiments on them with those settings.
We achieved accuracies of 100%, 100%, 99.02%, and 99.48% for 40×, 100×, 200×, and 400× images, respectively, in magnification-dependent binary classification. For multiclass classification in the magnification-dependent category, we obtained accuracies of 98.43%, 98.54%, 97.53%, and 97.40% for 40×, 100×, 200×, and 400× images, respectively. In the magnification-independent category, we achieved accuracies of 99.50% and 97.72% for binary and multiclass classification, respectively. Our results improve on state-of-the-art results in all categories, both magnification-dependent and magnification-independent. This promising result is another step toward digitalization and toward convincing medical experts to trust CAD systems for breast cancer detection.
Although this study evaluated the proposed models on the best available dataset for breast histological images, the research may still not be broad enough. The BreakHis dataset was built from only 82 patients, which makes the data diversity extremely limited. The dataset does not provide any information about the number of images from each specific patient, which is needed to calculate Patient-Level Accuracy (PLA). The proposed models show promising results, but before they are used in real-world settings, they have to be tested on additional datasets to compare the results and increase the variety of data.
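For reference, PLA as commonly defined in the BreakHis literature is the mean, over patients, of the fraction of each patient's images that are classified correctly. The sketch below shows how it would be computed if every image carried a patient identifier, which is the information described above as unavailable. The function names, record layout, and patient identifiers are hypothetical, and the records are invented for illustration.

```python
from collections import defaultdict


def patient_level_accuracy(records):
    """Mean of per-patient accuracy; records are (patient_id, true, predicted)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for patient_id, true_label, predicted_label in records:
        total[patient_id] += 1
        if true_label == predicted_label:
            correct[patient_id] += 1
    if not total:
        raise ValueError("no records given")
    scores = [correct[p] / total[p] for p in total]
    return sum(scores) / len(scores)


def image_level_accuracy(records):
    """Fraction of all images classified correctly, ignoring patients."""
    records = list(records)
    if not records:
        raise ValueError("no records given")
    hits = sum(1 for _, true_label, predicted in records if true_label == predicted)
    return hits / len(records)


# Hypothetical records for illustration only; NOT data from this study.
records = [
    ("P1", "malignant", "malignant"),
    ("P1", "malignant", "malignant"),
    ("P1", "malignant", "malignant"),
    ("P1", "malignant", "malignant"),
    ("P2", "benign", "benign"),
    ("P2", "benign", "malignant"),
]

pla = patient_level_accuracy(records)
ila = image_level_accuracy(records)
print(pla, ila)
```

The two measures diverge when patients contribute unequal numbers of images: in this invented example the patient with fewer images has the lower score, so PLA falls below the image-level accuracy.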
Moreover, the proposed models are less accurate on 400× images. This contradicts the expectation that "if the training data contains great detail (such as 400× breast histopathology image), the CNN network performs better in predictions". The 400× images provide higher magnification and more detail, yet the system performs worse on this group of images. Investigating the lower accuracy on 400× images is left for future work. One possible reason is that the total number of parameters in ResNet50 and DenseNet121 is not large enough to capture the fine detail of 400× images thoroughly.