Cellular Actin Cytoskeleton Morphology Identification for Mechanical Characterization Using Deep Learning

Actin cytoskeleton morphology can both affect and reflect cellular mechanical properties. However, due to the lack of efficient approaches to quantifying the actin cytoskeleton and mechanical properties, the study of mechanotransduction dynamics remains laborious. In this paper, a model to characterize cellular actin cytoskeleton morphology was built using a graph-to-vector embedding technique together with a neural network in machine learning (ML). The proposed ML model consists of a skip-gram model followed by a fully connected classifier. Images of NIH/3T3 cells treated with Latrunculin B at different concentrations were taken as the inputs, and the outputs were the actin cytoskeleton morphology labels defined by the treatment concentrations (i.e., the actin depolymerization level). The proposed model was also compared to a general convolutional neural network (CNN) and three commonly used transfer learning models (GoogleNet, Xception, and VGG16); the results demonstrated the capabilities of the proposed model in extracting actin cytoskeleton features, avoiding overfitting, and maintaining model generalization.

challenging, let alone investigating the correlation between the actin cytoskeleton morphology and cellular biomechanical properties.

In earlier research, Karlon et al. proposed an automated method using the intensity gradient in images of endothelial cells to quantify the cells' orientation distribution and cytoskeletal filament organization [6]. Even though this method provides rapid and accurate quantification results compared to purely manual measurements, it still requires considerable human effort. More effort was then devoted to related studies. For plant cells, Higaki et al. developed an image analysis framework to quantify cytoskeleton orientation, bundling, and density using measurements of fluorescence microscopic images [7]. Kimori et al. applied a mathematical method to quantify the morphology of biological structures of the actin cytoskeleton in plant cells [8]. For animal cells, Alizadeh et al. proposed a way to distinguish high-metastatic and low-metastatic osteosarcoma cancer cells with high accuracy using Zernike moments and geometric parameters as a measure […] However, these methods either focus on external cell morphology and cell cycling phase [12], [13], [14], [15] or require a high-throughput cell imaging platform (htCIP) that provides access to extracting high-content cellular and nuclear morphology of individual cells [17], and thus cannot be applied to quantify […] depolymerization level). In the ML model, the embedding tool outputs the embedded vectors of the cytoskeleton graphs, and the embedded vectors are then used by the fully connected layer to perform cytoskeleton classification. A test loss of 0.6313 and an accuracy of 80.49% were obtained, showing that both structure and quantity features were extracted efficiently from the actin cytoskeleton images.
Compared to the result presented in the previous work [18], more sample images were used to reduce the effect of overfitting and to improve the model reliability. To validate the model, the proposed model was compared to a general convolutional neural network (CNN) and three commonly used transfer learning models (GoogleNet, Xception, and VGG16). In a number of studies, CNN and transfer learning models have shown their ability to develop an internal representation of a two-dimensional image. The comparison results demonstrated the advanced capabilities of the proposed model in extracting actin cytoskeleton features, avoiding overfitting, and maintaining model generalization.

The actin cytoskeleton of single cells was cropped from the fluorescent cell images. Cells that overlapped with others were discarded during data preparation. To quantify the brightness of the actin cytoskeleton, the cropped RGB images were converted to grayscale, with the brightness of each pixel ranging from 0 to 255 [19]. To remove the effect of the background color, the brightness of pixels outside the cropped area was forcibly set to 0. To minimize the noise effect, pixels with brightness lower than the average brightness were set to 0 [20]. Since the graph of the actin cytoskeleton needs to be extracted, a Canny edge detector was applied to skeletonize the actin fibers in the images. In the skeletonized images, each non-zero pixel (i.e., each pixel with brightness larger than the average brightness) was treated as a node, with its brightness as the node feature. Graphs were explored from one randomly selected root node through four directions (i.e., up, down, left, and right) using breadth-first search, and edges were created between two adjacent nodes.
The graph G consists of the nodes and the edges, which represent the architecture and intensity features of the actin cytoskeleton (see Fig. 1).
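The preprocessing and graph-construction steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names (`preprocess`, `skeleton_to_graph`) are hypothetical, and the input is assumed to be a grayscale NumPy array that has already been cropped and skeletonized.

```python
from collections import deque

import numpy as np


def preprocess(gray):
    """Zero out pixels darker than the image's average brightness (noise removal)."""
    return np.where(gray >= gray.mean(), gray, 0)


def skeleton_to_graph(img):
    """Build a graph from a thresholded, skeletonized image.

    Each non-zero pixel becomes a node keyed by its (row, col) position,
    with its brightness as the node feature. Starting from one randomly
    selected root node, breadth-first search explores the four directions
    (up, down, left, right) and creates an edge between adjacent nodes.
    """
    nonzero = list(zip(*np.nonzero(img)))
    if not nonzero:
        return {}, set()
    rng = np.random.default_rng(0)
    root = nonzero[rng.integers(len(nonzero))]
    nodes, edges = {root: int(img[root])}, set()
    queue = deque([root])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # up, down, left, right
            nr, nc = r + dr, c + dc
            if 0 <= nr < img.shape[0] and 0 <= nc < img.shape[1] and img[nr, nc]:
                edges.add(frozenset({(r, c), (nr, nc)}))
                if (nr, nc) not in nodes:
                    nodes[(nr, nc)] = int(img[nr, nc])
                    queue.append((nr, nc))
    return nodes, edges
```

Note that a BFS from a single root only reaches one connected component; disconnected fiber fragments would need additional root nodes.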

A subgraph is the set of nodes that appears around a selected root node n within a designed distance (or walk depth) from the root node, see Fig. 2. The Weisfeiler-Lehman (WL) relabeling process was applied to sample and label the subgraphs [21], which lays the basis for the WL kernel [21], [22]. The subgraph extraction process is shown in Algorithm 1. The algorithm takes a given graph G from which the subgraph needs to be extracted, a randomly selected root node n, and […] graph embedding is to train the weight matrix of the hidden layer to find efficient representations for given graphs [24] (see Fig. 3). Note that the input of the embedding could be replaced by other encoding approaches, such as a one-hot hashing trick [25], without significantly affecting the embedding model structure. In addition, deep learning APIs, such as Keras, also provide embedding models with non-fixed input lengths.
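The two ingredients described above, rooted subgraphs bounded by a walk depth and WL relabeling, can be sketched in a few lines. This is an illustrative reading of Algorithm 1, not a transcription of it; the adjacency-dict representation and the function names are assumptions.

```python
def rooted_subgraph(adj, n, d):
    """Nodes reachable from root node n within walk depth d.

    `adj` maps each node to a list of its neighbors (4-adjacent pixels
    in the cytoskeleton graph); a plain BFS bounded to d levels.
    """
    frontier, seen = {n}, {n}
    for _ in range(d):
        frontier = {m for v in frontier for m in adj[v]} - seen
        seen |= frontier
    return seen


def wl_relabel(adj, labels):
    """One Weisfeiler-Lehman relabeling pass.

    Each node's new label combines its old label with the sorted multiset
    of its neighbors' labels; the full WL procedure then compresses these
    signatures into fresh integer labels and iterates.
    """
    return {v: (labels[v], tuple(sorted(labels[m] for m in adj[v])))
            for v in adj}
```

After a relabeling pass, two rooted subgraphs receive the same label exactly when their local neighborhoods match, which is what the WL kernel counts.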

In this work, to embed the actin cytoskeleton graphs, the skip-gram (SG) model was applied as the embedding layer; it is commonly used in Word2Vec embedding, yields the highest overall accuracy, and consistently produces the highest accuracy on semantic relationships [26], [27], [28]. The skip-gram model was trained to maximize the probability of predicting the subgraphs that exist in the graph presented at the input (see Algorithm 2). In detail, the SG model predicts the subgraphs sg for each given graph, i.e., it maximizes Pr(sg | Φ(G_i)), where Φ(G_i) is the embedding for G_i, and V and V′ refer to the input and output embeddings, respectively. The training process of the SG model is thus to iteratively minimize the loss J(Θ) = −log Pr(sg | Φ(G_i)) over a graph set. In the beginning, the weight matrix (i.e., embedding vector) was initialized with random values. Each given graph G_i ∈ G was fed as a one-hot vector [29] for e epochs to fit the subgraphs contained in graph G_i. During the embedding process, actin cytoskeleton images with similar structure and quantity were embedded closer than those without such similarities. Specifically, cell images taken under the same conditions were represented by vectors with closer distances in the vector space, which improves the later classification accuracy.
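The softmax form of Pr(sg | Φ(G_i)) can be made concrete with a toy NumPy sketch. The sizes, variable names, and random initialization below are placeholders (the actual model uses 7371 graphs and 128-dimensional vectors); `V` plays the role of the input (graph) embeddings Φ and `V_out` the output (subgraph) embeddings.

```python
import numpy as np

rng = np.random.default_rng(1)
num_graphs, vocab_size, dim = 4, 10, 8   # toy sizes, not the paper's 7371 x 128
V = rng.normal(size=(num_graphs, dim))       # input embeddings: one row per graph
V_out = rng.normal(size=(vocab_size, dim))   # output embeddings: one row per subgraph


def subgraph_probs(i):
    """Pr(sg | Phi(G_i)) for every subgraph sg, via a softmax over scores."""
    scores = V_out @ V[i]
    e = np.exp(scores - scores.max())        # numerically stable softmax
    return e / e.sum()


def loss(i, sg):
    """J = -log Pr(sg | Phi(G_i)) for one observed subgraph sg of graph G_i."""
    return -np.log(subgraph_probs(i)[sg])
```

Training lowers `loss` for subgraphs that actually occur in G_i, which pulls the rows of `V` for graphs sharing subgraphs toward each other, exactly the "similar images embed closer" behavior described above.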

Since the subgraphs are extracted from each node of each graph with the initial depth d, the size of the SG vocabulary is very large. This means that, for each input, weight updates make only very small changes to a huge number of weights even though there exists just one true example, which makes the training process very inefficient. Thus, we used negative sampling, which approximates the loss from the softmax layer by updating only a small subset of all the weights at once: the weights of the correct label and of only a small number of the incorrect labels [30]. This makes the network training more efficient.

After we obtained the embedding vectors of the actin cytoskeleton graphs, a neural network (NN) classifier was built to identify the cytoskeleton class. Linear neurons together with the rectified linear unit (ReLU) activation function were taken as the learning units. Softmax was selected as the last activation function to obtain the probability distribution. The dropout technique was applied to prevent overfitting [31]. Cross-entropy was taken as the loss function to update the weights and hyper-parameters in the network [32]. The training loss, which needs to be minimized, is quantified by the cross-entropy

H(p, q) = −Σ_x p(x) log q(x),

where q(x) is the predicted probability distribution and p(x) is the true probability distribution (i.e., the true label).
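The cross-entropy above is straightforward to evaluate; a minimal sketch (the function name and the small epsilon guard against log(0) are our additions):

```python
import numpy as np


def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_x p(x) log q(x).

    p is the true label distribution (one-hot for hard labels) and q the
    predicted softmax output; eps guards against log(0).
    """
    return -np.sum(p * np.log(q + eps))
```

For a one-hot label, this reduces to −log of the probability the model assigns to the correct class, so a confident correct prediction gives a loss near zero.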

To find the optimal hyper-parameters of the model […] to the previous study [34]. Therefore, the actin cytoskeleton classes were defined as three classes. Class 0 represents the untreated status (depolymerization = 0%), Class 1 represents the mid-treated status (depolymerization = 62% relative to the untreated case), and Class 2 represents the completely treated status (depolymerization = 100% relative to the untreated case). From each image, at least one cell was cropped. In the end, a total of 7371 images were preprocessed as aforementioned: 2850, 2197, and 2324 cells for Class 0, Class 1, and Class 2, respectively, were converted to actin cytoskeleton graphs.

For the embedding model, the target is to embed each actin cytoskeleton graph into a vector with a length of 128. Therefore, the hidden layer of the embedding model was represented by a weight matrix with 7371 rows (one for every graph in our dataset) and 128 columns (one for every hidden neuron). The end goal is simply to learn this hidden-layer weight matrix. The 1 × 128 graph vector then gets fed to the output layer to fit its corresponding subgraphs. The output layer is a softmax regression classifier. Specifically, each output neuron has a weight vector that is multiplied against the graph vector from the hidden layer, and then the output neuron applies the softmax function to the result.

For the classifier, the input layer had 128 neurons (equal to the vector dimension). Two fully connected hidden layers with 32 and 128 neurons, respectively, were developed, and ReLU was chosen as the activation for each neuron. One dropout layer was applied between these two hidden layers with a dropout rate of 0.5. Since the graphs needed to be […] (see Table 2) […] thus it will most likely lose the least information. The embedded vectors are shown in Fig. 4, in which the 3D distribution of the embedded vectors indicates that the vectors representing the graphs extracted from the actin cytoskeleton indeed have their own characteristic features under different treatment concentrations. The explained variance ratio of reducing the vector dimension from 128 to 3 is 97.96%, which shows that 3D-PCA captured most of the variance in the original data, so it is reasonable to assume that a strong association lies between the original data and the 3 principal components of the 3D hyperplane. Therefore, the embedding layer efficiently reduced the graphs without losing their structure and node features.
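The 3D-PCA projection and its explained variance ratio can be computed via a singular value decomposition. Since the actual embedding matrix is not available here, a random 7371 × 128 stand-in is used, so the resulting ratio will not reproduce the 97.96% reported above; the sketch only shows the computation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the learned 7371 x 128 embedding matrix (one row per graph).
X = rng.normal(size=(7371, 128)) @ rng.normal(size=(128, 128))

Xc = X - X.mean(axis=0)                       # center the embeddings
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
X3 = Xc @ Vt[:3].T                            # project onto 3 principal components
var = s**2 / (X.shape[0] - 1)                 # variance along each component
explained_ratio = var[:3].sum() / var.sum()   # fraction captured by the 3D view
```

A ratio near 1 means the 3D scatter in Fig. 4 is a faithful picture of the 128-dimensional embedding geometry.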
A more significant distinction could be generated in a higher-dimensional space.

Grid search was applied to find the optimal hyper-parameters. The estimator with the Adam optimizer, Glorot uniform initializer, an epoch number of 70, a batch size of 20, and a learning rate of 0.05 gave the best fitting result. Then the embedded vectors were fed to the classifier to fit their corresponding labels. The loss in the training process is shown in Fig. 5(A). As can be seen, the training loss steadily decreases from 1.8939 to 0.5922, and the validation loss decreases from 1.416 to 0.5033. The accuracy in the training process is shown in Fig. 5(B): the training accuracy increases from 46.16% to 80.29%, and the validation accuracy increases from 62.64% to 85.44%. Thus, the validation performance is better than that of the training process. This happens because we used dropout to perform learning regularization, whose behavior differs between training and validation. During the training process, a percentage (50%) of the features were set to zero, since the dropout rate was 0.5. However, during validation, all features were […] accuracy evaluation is reliable. As shown in Fig. 5(B) […] preliminary work (85.5%) [18]. This is because a much […]
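The train/validation asymmetry of dropout explained above can be demonstrated with a short sketch of inverted dropout (the convention used by Keras; the function name is ours):

```python
import numpy as np

rng = np.random.default_rng(0)


def dropout(x, rate=0.5, training=True):
    """Inverted dropout.

    During training, a fraction `rate` of the features is zeroed and the
    survivors are scaled by 1/(1 - rate) so the expected activation is
    unchanged. During validation, all features pass through untouched,
    which is why validation can score better than training.
    """
    if not training:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)
```

With `rate=0.5`, each training-time activation is either dropped to 0 or doubled, while at validation time the full, unperturbed feature vector is used.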

To further analyze the model performance, the confusion matrix is shown in Fig. 6, in which each entry denotes the number of predictions made by the model for actin cytoskeleton classification. Specifically, the entries with the same row and column indices are correct predictions for each class, and those with different row and column indices denote incorrect predictions. Moreover, we calculated the precision, recall, and F1-score for all classes (see Table 3).

For each class (i.e., actin cytoskeleton depolymerization degree), the corresponding cell mechanical properties, such as Young's modulus, could be pre-measured using an atomic force microscope and then saved in a database. For example, the Young's modulus of NIH/3T3 cells subject to actin cytoskeleton depolymerization similar to that in this work has been reported in a previous study [34]. Therefore, the cellular elasticity of NIH/3T3 cells can be directly obtained/predicted based on the actin cytoskeleton classification result generated by the proposed model. This implies that labor-intensive cellular mechanical quantification experiments can be saved by using the proposed deep learning classification model, which could significantly improve research efficiency and productivity.

Since deep neural network models may take days or even weeks to train on a very large dataset, a way to shortcut this process is to reuse pre-trained models that were developed for standard benchmark datasets [41], such as the ILSVRC. The best-performing models can be downloaded and reused directly, or integrated into a new model to solve other image recognition problems. Moreover, reusing pre-trained models helps capture the inherent data distribution more effectively.
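The per-class precision, recall, and F1-score reported in Table 3 follow mechanically from the confusion matrix. A minimal sketch, assuming the usual convention that rows are true classes and columns are predicted classes (the 2 × 2 example values below are hypothetical, not taken from Fig. 6):

```python
import numpy as np


def per_class_metrics(cm):
    """Precision, recall, and F1 for each class from a confusion matrix.

    Rows index true classes, columns index predicted classes, so the
    diagonal holds the correct predictions for each class.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                 # correct predictions per class
    precision = tp / cm.sum(axis=0)  # tp / (tp + false positives)
    recall = tp / cm.sum(axis=1)     # tp / (tp + false negatives)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

The F1-score is the harmonic mean of precision and recall, so it only approaches 1 when both are high for that class.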

In this study, GoogleNet, Xception, and VGG16 were used to predict the image labels. We excluded the top of each network (i.e., the global average pooling layer and the dense output layer), and added our own global average pooling layer on top of the output of the base model, followed by a classifier with one unit per class (i.e., three units in total). Since this is a multi-class classification problem, softmax was taken as the dense layer's activation function. The weights of the pre-trained models were frozen at the beginning of the training process. The training and test results are shown in Fig. 7(B ∼ D). For GoogleNet, the model loss decreases from 1.300 and 1.092 to 0.843 and 0.953 for training and validation, respectively, as the epoch number increases. The accuracy increases from 53.4% and 59.8% to 67.8% and 66.1% for training and validation, respectively. The test loss is 1.233, and the test accuracy is 59.5%. For Xception, the model loss decreases from 0.949 and 0.841 to 0.655 and 0.750 for training and validation, respectively. Meanwhile, its accuracy increases from 58.1% and 63.6% to 74.0% and 69.3% for training and validation, respectively. The test loss is 0.900, and the test accuracy is 66.8%. For VGG16, its loss decreases from 0.945 and 0.680 to 0.624 and 0.632 for training and validation, respectively. Its accuracy increases from 62.1% and 73.4% to 76.1% and 74.9% for training and validation, respectively. The test loss is 0.710, and the test accuracy is 73.9%. These results clearly show that all three models start overfitting quickly. The aforementioned models were built to solve complex image recognition tasks; therefore, dozens of convolutional layers were included in the network architectures, which can extract hundreds of features.
Even though transfer learning shows benefits in avoiding overfitting when the target dataset is relatively small, in many cases the model may not solve the overall problem. In particular, for sample images each containing the actin cytoskeleton of one single cell, those models were much too powerful. The over-abundant extracted features led to quick overfitting during the training process. Even when we tried to unfreeze the weights of the base models, overfitting still showed up within 4 epochs and resulted in poor model generalization. This observation indicates that the three applied pre-trained models are not optimal for making predictions on the morphology state of the cellular actin cytoskeleton when compared with the proposed approach.
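The transfer-learning setup used above, a frozen base model with a small trainable softmax head, can be illustrated without any deep-learning framework. In this sketch a fixed random projection stands in for the pre-trained convolutional feature extractor; all sizes, data, and names are toy assumptions, and only the head weights `W_head` are ever updated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "base model": a fixed random ReLU projection standing in for the
# pre-trained feature extractor; its weights W_base are never updated.
W_base = rng.normal(size=(64, 16)) / np.sqrt(64)


def features(x):
    return np.maximum(x @ W_base, 0.0)


def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


# Trainable head: one dense softmax unit per class (three classes here).
W_head = np.zeros((16, 3))
x = rng.normal(size=(200, 64))          # toy inputs standing in for images
y = rng.integers(3, size=200)           # toy class labels
onehot = np.eye(3)[y]

for _ in range(200):                    # gradient descent on the head only
    p = softmax(features(x) @ W_head)
    grad = features(x).T @ (p - onehot) / len(x)
    W_head -= 0.1 * grad
```

Because the base is frozen, this reduces to a convex softmax regression on fixed features, which is why head-only training is cheap and stable; unfreezing the base reintroduces the full capacity (and the overfitting observed above).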

Although the superiority of the proposed deep learning model has been demonstrated, it can be further improved by increasing the data size during training so that more elaborate cytoskeleton morphological structures can be detected. Moreover, in future work, more advanced techniques, such as the elite opposition-based learning and chaotic k-best gravitational search strategy (EOCS) [42] and the genetic algorithm (GA)-assisted AdaBoost random forest (GA-ADA-RF) model [43], could be applied to improve the model's performance in real-time response, optimization, and adaptability.