Deep Convolutional and Recurrent Neural Networks for Cell Motility Discrimination and Prediction

Cells in culture display diverse motility behaviors that may reflect differences in cell state and function, providing motivation to discriminate between different motility behaviors. Current methods to do so rely upon manual feature engineering. However, the types of features necessary to distinguish between motility behaviors can vary greatly depending on the biological context, and it is not always clear which features may be most predictive in each setting for distinguishing particular cell types or disease states. Convolutional neural networks (CNNs) are machine learning models allowing for relevant features to be learned directly from spatial data. Similarly, recurrent neural networks (RNNs) are a class of models capable of learning long term temporal dependencies. Given that cell motility is inherently spacio-temporal data, we present an approach utilizing both convolutional and long- short-term memory (LSTM) recurrent neural network units to analyze cell motility data. These RNN models provide accurate classification of simulated motility and experimentally measured motility from multiple cell types, comparable to results achieved with hand-engineered features. The variety of cell motility differences we can detect suggests that the algorithm is generally applicable to additional cell types not analyzed here. RNN autoencoders based on the same architecture are capable of learning motility features in an unsupervised manner and capturing variation between myogenic cells in the latent space. Adapting these RNN models to motility prediction, RNNs are capable of predicting muscle stem cell motility from past tracking data with performance superior to standard motion prediction models. This advance in cell motility prediction may be of practical utility in cell tracking applications.


INTRODUCTION
C ELL motility is an emergent property of living matter that spans the nanomolecular and macroscopic length scales, involving a complex regulatory network and dynamic reorganization of the cell's geometry [1], [2]. Cells can display a diverse set of motility behaviors, and these behaviors can provide a usef ul window for inference of a cell's functional state. Neoplastic transformation has long been appreciated to alter cell motility behaviors, increasing the migration rate of various models in culture and serving as a mechanism for metastasis [3], [4], [5], [6], [7]. The motility behaviors of cancer cells in culture can even be predictive of broader tumor progression [8].
Likewise, the migration of progenitor cells is critical in early development and tissue regeneration [9]. Skeletal muscle stem cells (MuSCs) provide an accessible cell culture system to study stem cell motility behaviors in vitro by timelapse imaging. During embryonic development, MuSC precursors must migrate from early stage developmental structures (somites) to their adult location along the edge of muscle fibers in the trunk and limbs [10], [11]. In the adult, motility continues to play a critical role, as MuSCs migrate along muscle fibers in vivo to sites of injury to initiate tissue repair [12], [13]. Motility behaviors are heterogeneous between MuSCs and change during stem cell activation [14], [15]. Heterogeneous fitness for regeneration within the MuSC pool is well appreciated [16], and analysis of heterogeneous motility behaviors may provide an additional lens through which to decompose different MuSC phenotypes.
Given the biological importance of motility behaviors, classification of cells based on motility behaviors has useful applications in research and diagnostics. Similarly, exploration of heterogeneity within the motility behaviors of a cell population may provide biological insights. However, it is often difficult to determine which features of motility behavior will be predictive of a phenotype of interest, or allow for discrimination of heterogeneous behavior. Different phenotype classification tasks and cell populations may require distinct feature sets to extract valuable biological information. A method to algorithmically determine relevant features of cell motility for a given classification or discrimination task is therefore advantageous.

Related Work
Quantitative investigation of cell morphology has allowed for inference of drug mechanisms [17], [18] and the recovery of gene interactions [19], demonstrating the versatility of cell phenotype analysis. Existing results combining cell morphology and motility analysis have shown that the two feature sets are complementary [20], suggesting that new cell motility analysis methods may likewise complement existing morphology analysis approaches. Inspired by these results, we focus our attention here exclusively on analysis of cell motility, which has received relatively less attention.
To date, a number of tools have been proposed that rely upon a set of handcrafted features to quantify cell motility behaviors, providing some remarkable results [21], [22], [23], [24], [25]. Neural progenitor cells were discriminated by morphology and motility behavior alone [25], and genes that affect motility have been identified solely from timelapse imaging data [21]. Similarly, a heuristic cell motility feature was identified as one of the most important sources of information to predict hematopoietic cell lineage decisions [20]. We have recently demonstrated that rates of cell state transitions and the ordered or random nature of these transitions may also be inferred from motility alone [15].
These results demonstrate the potential insights that may be gathered from more extensive analysis of cell motility. However, these methods rely upon engineering of a handcrafted feature set, and have thus far focused largely on features of speed and directional persistence. It is possible that more complex features may allow for improved discrimination of cell motility behaviors, but it is difficult to predict what these features may be in each context.
Convolutional neural networks provide an approach to learn relevant features from data, rather than handcrafting features based on a "best guess" of which features are relevant. In the field of computer vision, convolutional neural networks (CNNs) have recently made rapid advancements, demonstrating state-of-the-art performance on a variety of image classification tasks [26], [27], [28], [29]. CNNs utilize a set of parameterized kernels to extract spatial features, allowing distinct feature kernels to be learned for a given classification task [30]. In this way, CNNs are able to learn a "representation" of the problem's feature space. Feature space representations may also be learned in an unsupervised manner by training CNN autoencoder architectures to encode and decode [31], [32]. This approach may be useful for learning relevant motility features where an explicit classification task is not present.
While CNNs are most commonly applied to tasks involving analysis in images with two spatial dimensions and one channel dimension at a single time-point, convolution is a natural analytical tool for any input information with spatial dimensions. CNNs have been successfully applied to a diverse set of non-imaging domains, including natural language processing [33], bird song segmentation [34], and EEG recordings [35]. Perhaps most clearly mirroring our challenge of motion classification, CNNs have performed well in the classification of video recordings [36], [37], [38], [39].
Cell motility may be represented as a time series with one temporal dimension and two spatial dimensions, where the two spatial dimensions represent the Cartesian axes of motion (i.e., x, y). For the purposes of convolutional analysis, we treat each of the two spatial dimensions as channels. In this formulation, the value of the an x; y coordinate at time t is entered as an element in a ðTime Â CoordinateÞ matrix. Multiple problem domains have shown success in applying CNNs to multi-channel time series data in this manner [40], [41], [42], [43]. Similarly, convolutional layers with one spatial dimension and one channel dimension may allow for motility behavior classification and unsupervised feature learning without a priori definition of handcrafted features.
Deep neural networks have also been extensively applied to the analysis of sequential inputs, such as natural language sentences and biological polymer sequences [33], [44], [45], [46]. While simple 1D CNNs that consider raw sequence inputs can be effective, the introduction of recurrent units such as long short-term memory (LSTM) units to learn temporal relationships within the input sequence can improve performance and effectively learn long-term dependencies [47].
In the multi-channel time series representation described above, CNN layers may function as feature extractors that require no hand-engineering. Pairing these extracted features with recurrent neural network (RNN) units may also allow for motility behavior analysis capable of learning long-term dependencies across the time series. Similar approaches combining convolutional layers and recurrent units have proven effective in the analysis of biological polymer sequences [48] and in image classification [49].
Here, we investigate whether RNNs paired with CNN feature extractors could be effectively applied to the problem of cell motility behavior classification. We develop a tool we call Lanternfish to represent motility paths as multichannel time series, classify different motility behaviors, learn motility features in an unsupervised fashion, and predict future cell motility from past behavior using RNNs. Lanternfish represents cell motility as a simple multi-channel time series, where time series values are Cartesian coordinates. We demonstrate that RNNs with convolutional layers as feature extractors are sufficient to accurately distinguish experimentally observed cell motility behaviors. Autoencoder architectures based on these models capture variation between cell types and reveal heterogeneity within cell states in the latent space. Additionally, we show that our RNN model can be adapted to predict cell motility in subsequent frames more accurately than standard methods, with potential applications in the field of cell tracking.

RNN Classification Architectures
Our baseline RNN classification architecture utilizes an LSTM layer with n ¼ 256 hidden units, followed by two fully-connected layers with 256 and 128 units respectively. Each of these fully-connected layers is paired with a rectified linear unit activation [26]. The final layer is a fully-connected layer with n ¼ K units where K is the number of classes paired with a softmax activation. This baseline architecture is presented in Fig. 1A.
We modify this architecture slightly to yield an RNN with convolutional feature extractors. This architecture prepends a set of four convolutional layers with one spatial dimension and one channel dimension ("1D" convolutions for brevity). These 1D convolutional layers utilize size 3 kernels with unit strides. Following these convolutional layers is a max pooling layer with filter size 2 and stride s ¼ 2. Following the max pooling layer, the second architecture is identical to the first (Fig. 1B).

RNN Autoencoder Architecture
Our RNN autoencoder architecture strongly resembles the classification network. Following the fully-connected layers in the classification architecture, the RNN autoencoder appends a 1D upsampling layer and mirror 1D convolutional layers to return the input back to the original size (Fig. 1C). Mean-squared error (MSE) against the input sequence was utilized as a loss function for training.

RNN Motility Prediction Architecture
We adapt our RNN autoencoder architecture to a prediction architecture by removing the max pooling, fully-connected, and dropout layers. Sequences are convolved by four 1D convolutional layers, as in the autoencoder, before being passed to an LSTM and convolved by four more 1D convolutional layers. The final convolutional layer uses a linear activation function rather than a ReLU. Input sequences length t in are provided in the same multi-channel time series format as our other RNN architectures, and output sequences are multi-channel time series of length t out . The number of LSTM units is adjusted to n ¼ 2t out depending on the length of desired output sequences.

Baseline Kinematic Motion Prediction
A linear kinematic model is used for baseline motility predictions. The kinematic model calculates the mean velocitỹ across the last t time steps in the preceding track and projects the object byṽ for each predicted time step. The temporal window t is optimized by parameter search. This model assumes the moving particle exhibits ballistic motion.

Saliency Analysis
The "saliency" of regions within an input track, estimating their importance for the assignment of a specific class by a classification model, was performed in the manner described previously [53]. Briefly, for a given input track x, class k, and trained classification model f that outputs the class score for k, f k , each element in x is evaluated for influence on the score f k ðxÞ. This influence is computed as the gradient of f k with respect to x, @f k @x . We present the rectified gradients as saliency maps using a ReLU.

Computational Infrastructure
Nvidia GTX 1080 (Pascal) and Titan Xp GPUs were used for all experiments.

Cell Culture
Mouse embryonic fibroblasts, muscle stem cells, and myoblasts were cultured as previously described [15]. Neoplastic MEFs were generated as described and generously donated by the authors of [54].

Timelapse Cell Imaging
Timelapse cell imaging, cell segmentation, and cell tracking was performed as described [15]. Briefly, cells were imaged for 10 hours in DIC at 6.5 minute intervals using a stagetop incubator at 37 o C and 5 percent CO 2 . Images were segmented using common heuristic techniques and tracking was performed using a modified version of uTrack [55]. Cell tracking data is available on the "Heteromotility" Github repository https://github.com/cellgeometry/heteromotility.

Motility Simulations
To determine if RNN models could discriminate between different types of motion under ideal conditions, we trained RNN classification models on simulated data from 3 distinct, biologically relevant models of motion, namely random walks, Levy flights, and fractional Brownian motion. Random walks are a type of motion with normally distributed random step sizes and directionality. Random walks are observed in freely diffusing biomolecules [56]. Levy flights similarly display random directionality, but step sizes are instead chosen from a long-tailed Levy distribution. Levy flights are observed in multiple biological systems and optimize path finding [57], [58], [59], [60]. Fractional Brownian motion models a random walk with long term dependence, similarly relevant as a representation of regulated motion in biology [61], [62]. By starting with simulated data we can optimize parameters using large sample sizes that would be difficult to obtain with living cells.
Random walks, Levy flights, and fractional Brownian motion were simulated for classification, each with a mean displacement of m ¼ 5 ðx; yÞ units per time step. Simulations were carried out for T ¼ 100 time steps and restricted to a ð2048; 2048Þ pixel plane, representing the field-of-view that might be expected using a standard 4 megapixel microscopy camera. Example simulations are presented alongside experimentally measured cell motility tracks (Fig. 2). We note that simulated motion tracks do not reflect the full complexity of real cell motility tracks, motivating us to simulate motility more akin to real cell motion in Section 3.6.

RNNs Accurately Classify Simulated Motility Behaviors
Experiments to classify multi-channel time series representations of cell motility were performed using n ¼ 15;000 samples from each simulation class. RNN classifiers were trained and evaluated using 5-fold cross validation. In each training fold, 10 percent of the data was used for validation and model selection by early stopping. The held-out test set was unseen by the model at the time of either training or model selection and was used only for final model evaluation. Early stopping was performed in all models after the testing loss failed to improve for 3 consecutive epochs [63]. Models were evaluated based on the prediction accuracy on the testing set. Models were trained using crossentropy as a loss function. Optimization was performed with Adam using a learning rate of ¼ 0:001 [64]. Weights for each of the convolutional and fully-connected layers were regularized by the l 2 norm with strength ¼ 10 À6 . The optimizer, learning rate, and regularization strength were chosen empirically, and a rigorous search for optimization hyperparameters was not performed. Using this scheme, models fit rapidly on simulated data (Fig. 3A).
Both baseline RNN models and RNN models with convolutional feature extractors achieved near perfect classification accuracy (Fig. 3B). Likewise, a Random Forest model trained on hand-engineered features successfully discriminated these simulations. This result indicates that RNNs are capable of discriminating different types of motion with no hand-engineering of features.
To investigate how the RNN models may be learning to discriminate different motion models, we visualized filters from the first convolution layer of RNNs with convolutional feature extractors (Fig. 3C). Inspection of these filters reveals that several combinations of gradients are learned across the first and second dimension of motility. Interpreting filters in later layers becomes more challenging, but this result is in line with intuitions and suggests that at least some features we may engineer by hand (metrics of displacement, etc.) are being learned by convolutional feature extractors.

RNNs Accurately Discriminate Cell Types by Motility Behavior
After validating that RNNs were sufficient to distinguish simulated classes of motion, we applied the same classification networks to distinguish different types of experimentally measured cell motility. Cell motility was tracked in three different cell types by timelapse imaging for 10 hours, followed by segmentation and tracking by standard methods. Mouse embryonic fibroblasts (MEFs) are commonly used for in vitro cell culture assays, and neoplastic transformation of these cells has been demonstrated to alter their motility behaviors [15]. We tracked both wild-type and neoplastic (c-Myc overexpression, HRas-V12) MEFs to compare their motility behaviors. These two groups represent two "cell states" of the same cell type.
Muscle stem cells (MuSCs) are the obligate stem cell of the skeletal muscle, and their motility is known to be effected by their activation state [14]. Activated MuSCs commit to become myoblasts, a proliferating myogenic progenitor cell. We tracked both MuSCs and myoblasts to compare motility between these two myogenic cell types (see Methods for culture details).
We treat the cell population of origin for each track as a class label, such that we have four distinct classes: wild-type MEFs, neoplastic MEFs, MuSCs, and myoblasts. In the following experiments, we train classification models to discriminate these class labels on a pairwise basis, as well as together.
To determine if RNNs could distinguish cell types based on experimentally measured motility, we trained RNN classifiers to discriminate between MEFs and MuSCs. Models were trained for 1000 epochs using early stopping with a 15 epoch patience period. Training was performed on a total of n ¼ 562 MuSC and n ¼ 562 MEF tracks (both wild-type and neoplastic) using 5-fold cross validation. Equal numbers of MuSCs and MEFs were used to ensure class balancing. In each training fold, 10 percent of the data was reserved for model selecting during the training process. We report evaluation accuracies from the held-out test sets, which are used neither in model training nor selection.
In other domains, transfer learning between models trained on different data sets has proven advantageous to model performance [64]. It is possible that pre-training an RNN classification model on the simulated motility tracks described above, where training data is plentiful, will yield performance improvements when classifying experimentally measured cell motility, where data is expensive. To assess this possibility, we performed the classification experiments above both with random initializations (de novo training) and with weights transferred from a corresponding model trained to classify simulations (Simulation pretraining). Simulated pretraining did not appear to improve the performance on any tasks we evaluate, possibly due to discrepancies between simulated motion and real cell motility.
Mean accuracy on the held-out test set for this cell type classification task was 85:31% AE 0:62% for baseline RNN models and 94:84% AE 0:80% for RNN models with convolutional feature extractors when trained de novo, compared to a heuristic RF baseline model at 94:95% AE 0:39% (Fig. 4A). These results indicate that even with a small data set such as this, RNN models can be effectively trained to discriminate different types of cell motility with performance similar to hand-engineered features (Fig. 4A). Moreover, RNN models with convolutional feature extractors improve significantly upon the RNN baseline (p < 0:01, t-test), suggesting that convolutional layers allow the models to capture features that are missed by LSTMs alone.
To determine if RNN models could discriminate between the three distinct cell types we observe, we trained classifiers to discriminate MuSCs, myoblasts, and MEFs (both wild-type and neoplastic) in a three-way classification task. Our baseline RNN models achieved 58:99% AE 1:79% accuracy on this task, while RNN models with convolutional layers achieved 75:25% AE 1:45% accuracy. Both RNN approaches failed to match the heuristic baseline performance at 79:94% AE 1:10% accuracy (Fig. 4A).

RNNs Provide Discriminative Power between Stem Cell Activation States
To determine if RNNs can distinguish between more nuanced differences in cell state, RNN classifiers were trained to discriminate between myogenic activation states (MuSCs and myoblasts). Training was performed on n ¼ 334 MuSCs and myoblasts per class using 5-fold cross-validation. As with the cell type classification experiments, 10 percent of each training fold was used model selection by early stopping. We report evaluation metrics from the held-out test set. Mean test accuracy reached 85:47% AE 1:96% for RNN baseline models, 93:11% AE 0:65% for RNN models with convolutional layers, and 95:31% AE 0:32 for our heuristic baseline (Fig. 4A). As with cell type classification, the increase in performance provided by the addition of convolutional layers to the RNN baseline model is significant (p < 0:05, t-test). These results demonstrate that RNN models can discriminate between stem cell activation states based on motility alone, even with small data sets. The RNN models perform comparably to hand-engineered features paired with a Random Forest classifier on this task.
RNN classifiers were also trained in the same manner to discriminate between wild-type and neoplastic MEFs with transfer learning from either the simulated motion classifier or the cell type (MEF versus MuSC) classifier trained above. Training was performed on n ¼ 250 samples per class using 5-fold cross-validation. Validation data for model selection was reserved from each training fold as before. RNN models failed to achieve testing accuracy greater than 66.2 percent on this task (Fig. 4A). Pretraining using the cell type classification task proved to be of minimal benefit, with accuracy at 65:20% AE 2:55% for models with no pretraining and 66:20% AE 1:62% when using the cell type pretraining. The baseline heuristic model performed at 72:04% AE 1:85% accuracy, significantly outperforming the RNN models.
The more nuanced phenotypic difference between wildtype and neoplastic MEFs may be an inherently more challenging classification problem. The small available sample size likely compounds this difficulty and exacerbates the RNN classifiers' poor performance. This experiment highlights the fact that hand-engineered features may provide superior performance on some tasks.
To develop an understanding of how the RNN models were discriminating MuSC and myoblast motility, we performed saliency analysis to visualize the relevant aspects of our input tracks for classification [53]. The basic principle of saliency analysis is to identify the regions of a given input that have the most influence on whether or not the input is considered part of a particular class. This influence of input regions on a particular classification decision is estimated by the magnitude of the gradient on the input with respect to the class score (see Methods). Here, we find the temporal regions of cell motility tracks that provide the highest signal for the network to classify that input as either a MuSC or a myoblast (Figs. 4C and 4D). Qualitatively, it appears that large displacements in myoblast tracks provide strong gradients for the myoblast class, while tight turns and reversals of direction in MuSC tracks provide strong gradients for that class.
Saliency analysis of this form offers a unique window for interpreting motility classification models. While interpretability techniques are available for hand-engineered features, it is often difficult to dissect nuanced interactions between features that may be causal for classification decisions for a given sample. Saliency analysis offers the ability to highlight input regions, rather than pre-computed features, which lead to a classification result. These highlighted regions of input may be easier to interpret than a set of most influential features in some instances.

Cell Mimetic Pretraining
Transfer learning from RNN classifiers trained on random walk, Levy flight, and fractional Brownian motion simulations failed to improve classification performance on experimentally measured cell motility tasks. We reasoned that this may be due to the drastic differences between the behavior of these simulations and real cells. To remedy this, we attempted to generate simulated data that more accurately reflected real cell motility to enhance pre-training efficacy. For a set of real cell motility data, we measure the displacements and turning behavior of each cell. Displacements are measured simply as the Euclidean distance between each set of sequential timepoints. The turning direction at a point t i is determined as the angle between the vectors that connect points t iÀ1 to t i and t i to t iþ1 .
Cells are decomposed into a set of k clusters by k-means clustering on a set of parameters measured from these displacement and turn angle distributions. The number of clusters k ¼ 5 was chosen empirically to capture the diversity of the cell phenotypes while still leaving non-trivial numbers of cells in each cluster. For each cluster, a bounded Johnson distribution is fit to the aggregate distribution of displacements and a Gaussian mixture model is fit to the aggregate distribution of turn angles (Appendix, Fig. 8). Simulated samples are generated by randomly sampling displacement magnitudes and turn angles from the fitted Johnson distributions for T time steps. To represent a population of cells, the proportion of simulations generated from each cluster is equivalent to the cluster's prevalence in the original cell data. This approach may be conceptually likened to the bag-of-words model [66], in which k-means clustering is used to decompose features into a representative "vocabulary." By sampling from each of k clusters proportionally, we aim to capture and simulate heterogeneous phenotypes within a cell population, rather than simply reproducing a single averaged phenotype that may not be representative of any true cell phenotype.
We generated "cell mimetic" simulations for MuSCs, myoblasts, wild-type MEFs, and neoplastic MEFs by the above method, with n ¼ 15;000 simulated samples for each. RNN classifiers for myogenic state (MuSC versus myoblast) were pretrained by classifying between the two simulated data sets. Likewise, RNN classifiers were trained to distinguish the two MEF states. Both models achieved > 99% testing accuracy. The weights from this pretrained network were used to initialize RNN classifiers for the myogenic activation and neoplastic state classification tasks outlined above.
Mean testing accuracies were effectively unchanged at 93:41% AE 0:78% for the myogenic state classification task. Accuracy slightly decreased for the MEF neoplastic state classification task to 58:8% AE 2:83%. These results suggest that pre-training on "mimetic" simulations has little effect on final classification accuracy, and that transfer learning from simulations to experimentally measured data is not advantageous for the tasks explored here. However, we leave open the possibility that similar simulation and pretraining schemes may provide benefit in other contexts not evaluated in these experiments.

Autoencoders Allow Unsupervised Learning of Representations in Motion Feature Space
Results up to this point indicate that supervised classification of different cell motility phenotypes using RNN models is effective and comparable in performance to classification with hand-engineered features. We now shift our attention to the use of RNN models for unsupervised learning tasks.
In the analysis of motility data, supervised classification data is not always available. For instance, to explore the heterogeneity of types in a given population, there is no obvious method to generate supervised classification data that may be used to learn relevant feature kernels by optimization of a standard classification loss function. This would also be an issue in the identification of heterogeneous motility behaviors in patient biopsy samples, in which the distinguishing features are not known a priori.
Identifying different types of cell motility without an explicit discrimination task represents an unsupervised learning problem. A traditional method to approach this task is to perform unsupervised clustering on a set of heuristically defined features [67]. Recent work has also developed methods to learn features from data unsupervised learning tasks. Training neural networks as autoencoders is one such method which has been used in other contexts to learn relevant feature kernels where no obvious supervised discrimination problem is present [31], [32].
Autoencoders are generally formulated using an encoder model that maps the input to a compressed number of dimensions, followed by a decoder which maps this "latent representation" back to the input dimensionality. These models are often trained by optimizing the encoder and decoder to accurately reconstruct the input after this set of transformations. This use of autoencoders has proved successful when applied to RNA-sequencing data [68], [69] for cell type and cell state identification. Similarly, the latent dimensions of an autoencoder may serve to segregate cell states from cell motility data in an unsupervised fashion. To determine if autoencoders can learn unsupervised representations of cell motility, we trained RNN autoencoders on our multi-channel time series representations of cell motility.
An RNN autoencoder was formulated by using the same initial convolutional layers and LSTM unit as in our classification architecture. Following the LSTM unit, we again use a set of three fully-connected layers paired with ReLU activations. We set the central fully-connected layer to use n ¼ 16 hidden units, corresponding to 16 latent dimensions. The number of latent dimensions was chosen empirically and does not represent the result of a rigorous hyperparameter search. Following the fully connected layers, we utilize upsampling and convolutional layers to decode the latent representation back to the input dimensions (Fig. 1B).
RNN autoencoders were trained similarly to classification models, but the regularization strength was increased to ¼ 10 À5 . Regularization strength was increased after overfitting was observed empirically. The Adadelta optimizer was used with learning rate ¼ 0:1 [70]. The alternative optimizer was chosen empirically based on the time to convergence we observed in early training experiments.
To determine if the autoencoder could learn features from simulations in an unsupervised manner, the model was trained on n ¼ 15;000 samples of each class for three types of simulated motion (random walk, Levy flight, fractional Brownian motion) using 5-fold cross validation. Mean squared error was used as a loss function. Models consistently converged for each training fold (MSE ¼ 410 AE 11, pixel units). RNN autoencoder outputs consistently failed to capture the full extent of a the input motility track, seeming instead to capture a vague notion of the motility location and extent (Fig. 5A).
Do the latent dimensions capture information about the inputs, even if the reconstructions are poor? To answer this question, we utilized the output of the autoencoders' central layer (the encoded representation) as features to classify the input classes for each simulation. A Random Forest classifier was trained to distinguish the simulation classes from these features. Random Forests trained on autoencoder features achieved 56:36% AE 0:59% accuracy on this 3 class problem. This indicates that the features learned by these autoencoders contain some information about the three simulated classes, but are less useful for classification that the heuristic features we define as a baseline (Fig. 5C).
The encoded representation of an autoencoder may also be interpreted as a latent space where clusters within the data may be identified. Visualizing the latent space of our autoencoder trained on simulated tracks using t-SNE [71] reveals that despite the poor quality of reconstructions, the latent dimensions largely segregate fractional Brownian motion (fBm) simulations from random walks (RW) and Levy flights (PF), though the latter two are intermixed (Fig. 5B). It is possible that a similar examination of the latent space in an autoencoder trained on experimentally observed cell tracks will reveal structure within the data. To determine if our RNN autoencoder models were capable of learning latent spaces that segregate cell types and cell states, we trained an autoencoder on MuSC n ¼ 1407 and myoblast n ¼ 334 tracks (length T ¼ 60) using 5-fold cross-validation. The models consistently converged on each training fold (MSE ¼ 8574 AE 830, pixel units). Training a Random Forest classifier to discriminate MuSCs and myoblasts based on the 16 latent dimensions of this model, as for the simulated motility autoencoder above, yields a prediction accuracy 88:69% AE 0:57%. Again, this performance is inferior to that achieved using hand-engineered features, but does demonstrate that the latent dimensions capture variation between myogenic cell states (Fig. 3C).
Examining a t-SNE visualization of the latent space, it is evident that the latent dimensions capture variation between MuSCs and myoblasts, segregating myoblasts into one region of the latent space (Fig. 5D). Interestingly, the latent space does not separate the two cell types discretely, but rather presents a continuum between the MuSC and myoblast phenotypes, such that some MuSCs are similar to myoblasts, while others are distinct. This reflects the underlying biology, in which MuSCs transition over time from a quiescent state into the activated myoblast state [72], [73].
We apply Louvain community detection [74], as commonly employed in the field of single cell RNA-sequencing [75], to identify subpopulations within the latent space. Community detection identifies five subpopulations, capturing the majority of myoblasts in two of the five clusters (Fig. 5E). It is interesting that myoblasts are segregated into two non-overlapping clusters, suggesting two discrete states of myoblast motility. MuSC captured within the myoblast dominated clusters (2 and 3) may be more activated than counterparts in cluster 1, while cluster 4 may represent an intermediary state. We compare this cluster partition derived from autoencoder latent features to a baseline k-means clustering performed on heuristic motility features (See Appendix, Fig. 7). This heuristic feature baseline likewise separates the two myogenic activation states (MuSC, myoblast), but does not reveal substructure within the myoblast state like the learned features do.
Cluster identification within these autoencoder latent spaces is a promising approach to reveal substructure within cell populations. To determine if our autoencoder model was learning some heuristic features, we visualized the mean speed of each track in the latent space (Fig. 5F). It is readily apparent that the model learns to discriminate cells with different mean speeds, indicating that the model is capable of capturing some of the features we may have hand-engineered in a completely unsupervised manner. Combining motility and morphology analysis to further characterize these heterogeneous clusters is an interesting direction for future research.
Collectively, these results demonstrate that RNN autoencoders are capable of distinguishing cell types and revealing heterogeneous substructure from cell motility observations in an entirely unsupervised manner. The continuum of cell states between MuSCs and myoblasts suggested by the autoencoder latent space is consistent with known biology and suggestive of trajectories cells traverse through behavioral space. Application of these methods to other cell types may similarly suggest trajectories of cell state transition.

RNNs Predict Muscle Stem Cell Motility
Tracking individual cells in timelapse microscopy experiments is a difficult multi-object tracking problem [76]. Popular tracking methods utilize a motion model to predict cell motility in advance of the next frame to improve tracking performance [77]. This motion prediction is especially useful in the event of "missed detections," where a cell is not detected or segmented for a given set of frames but is detected later on. The most common motion models employed are based on linear kinematics, with Kalman filters serving as a popular choice [55]. Linear kinematics assume that particles possess an inertia and tendency to continue moving in the same direction as previously observed. These assumptions are equivalent to assuming objects exhibit ballistic motion.
However, cell motion does not adhere to ballistic assumptions in all cell types, with myogenic cells being an excellent example of such a system. In myogenic cells, a motile cell often makes a few rapid displacements punctuated by stopping periods or direction changes which violate intuitions about the inertia of moving objects. A motion model specifically tailored to the cell type of interest may therefore be useful to improve tracking performance, but such specific tailoring would require a prior knowledge of the very motion features that the live cell experiment is designed to analyze. Some way to tailor prediction models on the fly could help solve this problem.
Recurrent neural networks have been effectively utilized for sequence prediction in multiple fields [78], [79], [80], [81]. We adapted the convolutional RNN autoencoder model to a sequence prediction model by removing the pooling layers and fully-connected layers and altering the number of nodes in the central LSTM layer (Fig. 6C, see Methods). As a prediction task, we trained the RNN prediction model on t in ¼ 20 time steps of motion and predicted t out ¼ 10 time steps into the future. As a data set, we split MuSC motion paths into subpaths of length t total ¼ 30 for a total of n ¼ 8;676 paths.
A testing set of 10 percent of all tracks was held out for final model evaluation, and the remaining tracks were split with 80 percent used for training and 20 percent used for validation/model selection. Mean squared error between the predicted path and the ground truth path was used as the loss function and the Adam optimizer was used with learning rate ¼ 0:001. No regularization was used when training motility prediction models.
As a baseline for comparison, we performed a simple kinematic prediction of MuSC paths that assumes persistence of the velocity from preceding time points. The velocity for prediction was obtained by averaging instantaneous velocity for t ¼ 15 time points prior to the track terminus, where t was optimized by parameter search. This baseline model leads to an average mean squared error (MSE) in pixel units of 220 indicating that the RNN model is a superior motion predictor in the MuSC context (Fig. 6B). Representative track endings (length t out ¼ 10) produced by the RNN prediction model are displayed alongside the preceding track beginnings (length t in ¼ 20) and the ground truth track endings (Fig. 6A).
In most cases the motion prediction reasonably approximates the cell's ground truth direction, but does not closely mirror the exact path (Fig. 6A, inset i and ii). In some events, the RNN model fails to predict even the correct direction of motion in cases where the linear model performs well (Fig. 6A, inset iii). We performed the same experiment with mimetic myoblast simulations using n ¼ 10 5 total samples, holding out n ¼ 5000 samples for testing. Similar to the MuSC results, RNN motion predictors achieved a markedly lower MSE of 1194:99 AE 15:34, relative to the baseline kinematic model MSE of 9797:78 AE 38:76 (t-test p < 0:001).
These results indicate that convolutional RNN models effectively model some cases where linear models fail, and likewise fail in some cases where traditional linear models perform well. Comparing performance on average, RNN motion models have lower overall error compared to linear models, but the existence of clear failure cases highlights the need for further research prior to widespread application.
With future improvements, RNN motility prediction models may offer a way to fit a uniquely tailored motion model to specific cell biology contexts. Cell-context specific RNN motility predictors may be useful to improve multicell tracking performance, which remains a difficult problem in the field. Addressing challenges with cell tracking is a problem well suited for the type of learned modeling approach we present, as different cell types can exhibit highly different motility behaviors, making the a priori design of a universal motion prediction model difficult. Of note, recent work has suggested that cell motility prediction from morphology features is highly accurate in some contexts [82]. This suggests that the use of morphology features, in addition to motility information, may allow for more robust cell motility predictions.

CONCLUSION
Deep neural networks enable representation learning, or learning of features relevant for the description of a feature space. By representing cell motility as a multi-channel time series, we show that RNNs with convolutional feature extractors may be applied as an effective analytical tool. Our results demonstrate that these models are capable of discriminating between simulated models of motion and multiple types of experimentally measured cell motility behaviors, though these models are still inferior to handengineered features paired with random forest classifiers. In our experimentally measured cell motility data, we find that RNN models effectively discriminate between different cell types, and different states of myogenic progenitor activation. We also find that RNN autoencoders can learn latent spaces that distinguish between cell states and suggest cell state transition trajectories in an unsupervised fashion.
Adapting the convolutional RNN autoencoder for motility prediction, we find that the RNN model reduces prediction error relative to a standard linear model when predicting MuSC motility. Such prediction models may be useful for cell tracking. While we apply the methods described here to cell biology, there is no conceptual limitation that prevents application to other fields where discrimination based on motion recordings is desired. In the field of cell biology, analysis of motility with deep neural networks may allow for useful insights to be gathered in contexts where relevant features are non-obvious or laborious to construct.  . Displacement and turn angle distribution modeling for mimetic motility simulations. We fit cell motility displacement distributions using Johnson distributions due to their flexibility and tolerance for asymmetry. For turn angle distributions, we fit Gaussian Mixture Models with k ¼ 5 modes to tri-modal turn angle distributions. Here, we present distribution fits for one cluster of myoblasts (clusters derived by k-means as part of the mimetic simulation procedure, see Methods) and one cluster of wildtype MEFs. Fig. 7. Unsupervised clustering of myogenic cells using heuristic features. Applying k-means clustering (k ¼ 2) to heuristically derived features from all myogenic cells in our dataset identifies the different myogenic activation states (MuSC, myoblast), but finds minimal substructure within the myoblast population. Here, we project the heuristic feature space to 2-dimensions using t-SNE for visualization and indicate the cell state identity (left) and cluster identity (right) of each cell using color overlays. Jacob C. Kimmel received the PhD degree in developmental & stem cell biology from the University of California, San Francisco. He is a data scientist at Calico Life Sciences, where he uses quantitative biology methods to study the biology of aging. His research focuses on the application of machine learning methods to infer cellular phenotypes from cell behaviors and quantification of cell state dynamics.
Andrew S. Brack has been trained as a molecular and developmental biologist. In his PhD work, working under Prof. Malcolm Irving (FRS), he used a combination of biophysical and molecular biology approaches to understand the structural mechanisms that control skeletal muscle contraction. Following his PhD, he performed postdoctoral research at Kings College London with Dr. Simon Hughes and then at Stanford University with Dr. Tom Rando, where he studied skeletal muscle stem cell regulation. He is currently an associate professor of orthopedic surgery at UCSF, where his research focuses on understanding the cellular and molecular mechanism that control skeletal muscle regeneration.
Wallace F. Marshall received the PhD degree in biochemistry from the University of California, San Francisco, followed by postdoctoral research in cell biology at Yale University. Before he did undergraduate training in electrical engineering at SUNY Stony Brook. He is currently a professor of biochemistry and biophysics at UCSF, where his research focuses on how cells solve engineering problems related to establishment of cellular geometry, including analysis of organelle size control systems, mechanisms for regeneration in single cells, and cellular decision making.