Clustering and Classification for Time Series Data in Visual Analytics: A Survey

Visual analytics for time series data has received a considerable amount of attention. Different approaches have been developed to understand the characteristics of the data and obtain meaningful statistics in order to explore the underlying processes, identify and estimate trends, make decisions and predict the future. The machine learning and visualization areas share a focus on extracting information from data. In this paper, we consider not only automatic methods but also interactive exploration. The ability to embed efficient machine learning techniques (clustering and classification) in interactive visualization systems is highly desirable in order to gain the most from both humans and computers. We present a literature review of some of the most important publications in the field and classify over 60 published papers from six different perspectives. This review intends to clarify the major concepts with which clustering or classification algorithms are used in visual analytics for time series data and provide a valuable guide for both new researchers and experts in the emerging field of integrating machine learning techniques into visual analytics.


I. INTRODUCTION AND MOTIVATION
Recent years have seen an increasing use of time-oriented data in many fields such as networks and systems, meteorology, social media, behavior analysis, trajectory data, biological science, finance, and the like, where data is measured at a regular interval of (real) time. In this research work, we focus on time series data; it is therefore important to agree on a formal definition. Time series data is defined as an ordered collection of observations or sequence of data points made through time at often uniform time intervals [1]. Because of the diversity of its sources, its complexity, and its various underlying structures, we categorize the time series data used in our surveyed papers into four categories based on structure: univariate, multivariate, tensor fields and multifield.
Machine learning gives computers the ability to learn without explicit programming [2]. Alpaydin [3] gives a concise description of machine learning, which is ''optimizing a performance criterion using example data and past experience''.
The associate editor coordinating the review of this manuscript and approving it for publication was Alberto Cano.
Data plays a major role in machine learning, where the learning algorithm is utilized to discover and learn knowledge or properties from the data (learn from experience) without depending on a predetermined equation as a model [4]. In supervised learning, the dataset (the training set) is composed of pairs of input and desired output, and learning aims to generate a function that maps inputs to outputs. Each example is associated with a label or target. In unsupervised learning, the dataset (the training set) is composed of unlabeled inputs without any assigned desired output, and the aim is to find hidden patterns or substantial structures in data [5]. There are different types of supervised and unsupervised machine learning techniques, and each type comprises different algorithms that take various approaches to learning. Our focus in this work will be on classification as a supervised learning technique and clustering as an unsupervised learning technique with time series data.
Sacha et al. [6] highlight two main functions for machine learning. The first is to transform unstructured data into a form which facilitates human exploration, analysis and understanding. The second is to utilize unsupervised or semi-supervised algorithms to direct the analysis itself by recommending the best visualizations, verification, successions of steps in the exploration, etc., where the algorithm can automatically discover complex patterns from the raw data directly. This user-centric approach of interactive visualization utilizes human vision scalability for analyzing, exploring and understanding such data. It also assists data analysts in solving complex problems interactively by integrating automated data analysis and mining, such as machine learning-based methods, with interactive visualizations [7].
Machine learning algorithms provide a collection of automated analyses which can be much more efficient, accurate and objective in solving time series tasks. Machine learning also focuses on prediction [8], which has useful and widespread real-world applications.
The machine learning and visualization communities have been addressing time series issues from different perspectives. Machine learning has a strong algorithmic focus while interactive visualization has a strong human/visualization focus [9]. The essential difference between the fields is therefore the role of the user in data exploration and modeling. In machine learning, the goal is to dispense with the user, so that everything is automated. In this case, the user can play a limited role such as selecting the type of algorithm, and their influence should be restricted to a minimum. Interactive visualization takes the opposite point of view: the user leverages visual representations to extract knowledge from the data, discover patterns, and adjust models of the data under user steering. This main difference in philosophy may explain why both communities have remained relatively disconnected [10].
Based on the above, there is a strong incentive for both communities to be synergized in order to make progress and benefit from one another [7]. Combining automated analysis methods and interactive visualization has been shown to be an efficient approach for visual analytics. The visual analytics process aims to tightly couple automatic analysis methods and interactive visualization in order to gain knowledge from raw data and present a chance for analysts, through interaction tasks, to analyze, explore, reason, discover, and understand the data.

A. SURVEY SCOPE AND INTENDED AUDIENCE
Our focus will be on two important machine learning tasks, namely clustering and classification, and how they are integrated into visual analytics systems for time series data. From a broader point of view, existing works come from two different fields which can be classified into two categories: data mining approaches [1], [11]- [14] and visualization approaches [15]- [17].

1) FROM A DATA MINING PERSPECTIVE
Several surveys are available on clustering and classification for time series data. Liao [11] and Aghabozorgi et al. [12] provide an overview on clustering time series data. Xing et al. [13] present a review for time series data classification. Moreover, Yahyaoui and Al-Mutairi [14] also discuss some classification algorithms that are used with sequence data. Fu [1] provides an overall picture of the current time series data mining techniques including clustering and classification tasks. These previous works discuss in detail a wide range of clustering and classification algorithms that have been proposed and employed on time series data with a strong algorithmic focus. However, user influence is not considered in most of these works.
2) FROM THE VISUALIZATION PERSPECTIVE
Aigner et al. [15] provide a complete classification scheme for time-oriented data. A large part of their work involves a structured survey of existing techniques for visualizing time-oriented data, illustrated with numerous examples. Bach et al. [16] review a range of temporal data visualization techniques and classify them from a new perspective by depicting each technique as a series of operations performed on a conceptual space-time cube. However, their work does not provide much guidance for interaction design. Additionally, Ko et al. [17] present a survey that categorizes financial systems from the visual analysis perspective. Their focus is on financial data, which is one of several different kinds of time series data. In contrast, our work looks at time series data in general, primarily emphasizing clustering and classification tasks with a variety of visual analytics systems, which focus on combining machine learning algorithms and visualization techniques.

3) TOWARDS INTEGRATION AND CONVERGENCE
The idea of integration between machine learning algorithms and interactive visualization has been encouraged and promoted by both the visualization and machine learning communities. For example, several recent initiatives have been put into place to bring the two domains closer, such as the annual CD-MAKE conference and the MAKE-Journal [18], [19]. The recently organized Dagstuhl Seminars titled ''Information Visualization, Visual Data Mining and Machine Learning'' (12081) [10] and ''Bridging Information Visualization with Machine Learning'' (15101) [7] are other examples of efforts to bring researchers from both domains together to discuss important challenges and corresponding solutions for integrating the two fields.
To understand this interplay between both domains, the working group in the Dagstuhl Seminar ''Bridging Information Visualization with Machine Learning'' (15101) [7] developed a framework which conceptualizes how the incorporation of interactive visualizations and machine learning algorithms can be performed. This framework was inspired by the visual analytics framework of Keim et al. [20]. The group attempts to identify aspects of machine learning that can be influenced by the user, such as adjusting the parameters of models or switching between different kinds of model. Montes et al. [21] present a work which is considered one of the groundbreaking works in this trend. They combine visualization with machine learning techniques (clustering and classification) over time series data to understand the behavior of complex distributed systems. Recently, Sacha et al. [22] developed an ontology which maps out all major processes in machine learning and aims to provide visual analytics practitioners with a means to ''navigate'' the intricate landscape of machine learning, in order to uncover aspects which might be improved by introducing more machine or human capabilities.
To the best of our knowledge, there are no previous survey papers that offer a systematic review of the literature for time series clustering and classification that combine visualization techniques and machine learning algorithms for visual analytics. In this work, we specifically look at the convergence between automatic methods and interactive exploration, and how such automatic methods have been used in visual analytics systems (as shown in Fig. 1).
We provide a comprehensive and detailed survey on clustering and classification in visual analytics systems that have been applied to time series data. Although a large body of literature has covered the clustering and classification of time series data, its focus is either on algorithms or on interactive visualization. However, the idea of integration and convergence between both domains is beneficial; for instance, clustering is one of the most popular techniques to have been incorporated into visual analytics systems. Since visual representations are quite significant for interpreting and understanding the characteristics of clusters output by algorithms, direct adjustment of clustering algorithms is often facilitated through interactive interfaces that present new results ''on-demand'' [23].

4) CLUSTERING AND CLASSIFICATION OF TEMPORAL AND NON-TEMPORAL DATA
For time series data the presence of noise, its high dimensionality and high feature correlation pose challenges for designing effective and efficient clustering and classification algorithms compared to data without a temporal component [15], [24].
Analyzing time series data is nontrivial, and the appropriate analysis can even vary over time due to complex interrelations between time series variables. Xing et al. [13] mention three major challenges for time series analysis, especially in classification. First, many methods can only take input data as a vector of features. Unfortunately, there are no explicit features in sequence data. Second, feature selection is not easy because the dimensionality of the feature space can be high and computation can be costly. Third, since there are no explicit features, building an interpretable sequence classifier is burdensome in some applications.
Computing the similarity between two data objects is considered one of the main differences between clustering and classification of temporal and non-temporal data [11], [25]. The unique characteristics of time series data, such as noise (including outliers and shifts) and the varying length of time series, have made similarity measures one of the main challenges for clustering and classification of time series data [12]. When dealing with time series data, the biggest challenge lies in replacing the distance/similarity measure for static data with a suitable one for time series data, because the series may be scaled and translated differently on both the temporal and behavioral dimensions [24], [26]. In the context of visualization, classification and clustering tasks share a common goal, which is data abstraction. This is for subsequent visualization, to decrease the workload when computing visual representations and to minimize the perceptual effort required to interpret them.
Keim et al. [27] present the visual analytics mantra: ''Analyse First - Show the Important - Zoom, Filter and Analyse Further - Details on Demand''. Accordingly, it is not enough to only recover and display the data using visualization techniques; rather, it is essential to analyze the data according to its value of interest, displaying the most relevant aspects of the data, and at the same time providing interaction techniques which assist the user to gain details of the data on demand. Automatic analysis techniques are critical to the visual analytics process and are essential to incorporate in parallel with the interactive visual representation. Also, analysis techniques such as feature selection, dimensionality reduction and clustering support gaining insight into data and support human cognition in processing large volumes of data, enabling visualization to scale. Visual analytics also allows users to interact with these algorithms, in some cases through interactive interfaces, for example by directing the modification of algorithms, accepting user input, or switching between algorithms and displaying new results ''on-demand'' [20].

5) CLASSIFICATION
We classify the surveyed papers from six different perspectives: Time series Data Structures, Similarity Measures and Feature Extraction for Time series Data, Time series Analysis Techniques (Clustering and Classification), Visualization Analysis, and Evaluation Approaches.

6) SURVEY SCOPE
A variety of concepts and methods are involved in achieving the goal of extracting useful structures from large volumes of data, including statistics, machine learning, neural networks, data visualization, pattern recognition, and high-performance computing [15]. Time series analysis is dominated by traditional statistical methods such as autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) as well as machine learning techniques such as k-means and support vector machine (SVM). Machine learning methods have also proven capable in time series analysis, where they enable tasks such as clustering, classification and prediction [28], [29]. Recently, neural networks have been increasingly used with sequential data such as text data analysis, where the recurrent neural network (RNN) has gained popularity.
Aigner et al. [15] gave a brief overview of analytical methods for time-oriented data including clustering, classification, search and retrieval, pattern discovery and prediction, where visualization of temporal data can highly benefit from the analytical support. In this survey, we focus on clustering and classification. Other analytical tasks such as search and retrieval and pattern discovery are indirectly addressed by our inclusion of similarity measures, clustering and classification, since these operations are the bases of pattern discovery or search. Other analysis tasks that are outside the focus of this survey, but are widely used in the context of time series analysis, include prediction, which aims to infer from data collected in the past how the data will develop in the future. Linear regression, recurrent neural network (RNN), and long short-term memory (LSTM) are among the most recently used techniques for this task, besides common statistical techniques such as the autoregressive integrated moving average (ARIMA) model and the Box-Jenkins method.
To fulfill the scope of our survey, we have selected papers which focus on machine learning algorithms for time series clustering and classification tasks in visual analytics systems. The sixty-five publications which have been selected span a period of thirteen years. For all papers, we pay attention to time series similarity measures and feature extraction, clustering and classification algorithms, and visual analytics. We categorize the nature of time series data and evaluation techniques. Our findings on these are summarized in Table 1. Papers that focus on time series text visualization are out of our survey's scope.

B. INTENDED AUDIENCE
The intended audience of this survey are those who already have a background in visualization and possibly want to know more about machine learning tasks, in particular clustering and classification. These tasks could help them analyze, understand and visualize time series data. As a result, we do not go into detail about visualization (visualization techniques or visualization tasks and interaction methods) but focus more on machine learning tasks (clustering and classification) and how these algorithms have been adapted into visual analytics systems.

C. SEARCHED VENUES
For paper collection, we mainly used IEEE Xplore (e.g. TVCG, VAST and PacificVis), Springer (e.g. Visual Computer), ACM, Wiley (which includes EuroVis papers), ScienceDirect and SAGE. Using IEEE Xplore, forty-three papers were obtained, mostly from IEEE Transactions on Visualization and Computer Graphics, the IEEE Conference on Visual Analytics Science and Technology (VAST), and the IEEE Pacific Visualization Symposium (PacificVis). We include six papers from Springer and six papers from ACM. The other eleven examined papers were obtained from other digital libraries.

D. SURVEY STRUCTURE
Figure 2 shows the structure of this survey, which is derived from the main steps of the selected papers. We start with time series data structures, where we provide a general classification for time series data. All data structures, as described in Section II, refer to the main definition of time series data, and this section answers questions such as how time series data structures are different, along with providing some examples of this kind of data.
In Section III, we discuss similarity measures and feature extraction, which are important for time series data because the quality of analysis techniques (clustering and classification) is usually significantly influenced by their selection. Moreover, in this section we show how these techniques, along with clustering and classification techniques, have been adapted to gain and visualize relevant knowledge from the data.
Section IV reviews the time series analysis tasks. We provide a comprehensive explanation for popular clustering and classification algorithms that have been used in the surveyed visual analytics papers, how they are used with time series data, and how they have been adapted to interactive visualization.
Section V summarizes visualization techniques, visualization tasks and interaction methods that are used in surveyed visual analytics systems. Some of these techniques and tasks are beneficial for time series data, while others are shared when working with other kinds of data. We focus more on illustrating how these techniques and tasks are performed and adapted to assist in analyzing time series data. The evaluation approaches for the surveyed visual analytics systems are discussed in Section VI.
Our survey presents a structured review of the concept of integrating interactive visualizations and analysis techniques (clustering and classification) into the visual analytics systems for time series data. Through this, we have determined different research trends as well as some of the limitations and challenges involved in the integration and convergence of machine learning algorithms and interactive visualization. These are summarized in Section VII.

II. TIME SERIES DATA STRUCTURES
We classify time series data that has been used in our surveyed papers into four categories. This classification can be subsumed under the concepts of univariate, multivariate, tensor fields, and multifields. Hotz and Peikert [30] discuss the complex structure of scientific data and provide a clear definition of a multifield. Our four types or categories are generalized to include many related subtypes of time series data structures in order to achieve a comprehensive classification for time series data structures that can be embodied in visual analytics systems. The prevalent representatives in our surveyed papers are multivariate time series and tensor fields.

A. UNIVARIATE
The univariate time series is a sequence that contains only one data value per temporal primitive [15], [31]. It is a field of a single variable captured or observed through time. Temperature in a city spanning a period of time is a clear example of this type of data structure.

B. MULTIVARIATE
A multivariate time series is a set of time series which have the same timestamps [15], [31]. This kind of time series data structure is an array of variables or numbers at each point in time and can be a collection of multiple univariate series captured through time, such as temperature and pressure readings, or an associative multivariate series, such as 3-D acceleration measured from a tri-axial accelerometer, where each component of the multivariate series has the same units and sensor source. As time series data structures are an ordered collection of observations or sequence of data points made through time, most of the surveyed papers adopt this type of structure. This special type of multivariate time series data is relevant in many application fields including biology, medicine, finance and animation. Multivariate time series data have also been used in manufacturing systems and predictive maintenance [32], [33]. In the surveyed visual analytics papers, time series data, e.g., obtained from gene expression measurement [34]- [37], can be used by biologists to understand the correlation between different types of genes, analyze gene interactions, and compare regulatory behaviors for interesting genes. Moreover, medical experts utilize time series data, e.g., blood pressure measurements [38], to understand and deal with different cases such as monitoring illness progression, and understanding ecological and behavioral processes related to a disease, which may lead to improved disease diagnoses. Furthermore, time series data, e.g., obtained from sampled transactions over a period of time [39]- [41], stock markets [42], [43], and international financial markets [44], [45], can be used in the financial field and is usually analyzed to understand and forecast the market situation. It is useful to find correlations between the data and test hypotheses about the market, which helps to make the best decisions at the appropriate time under different business and economic circumstances. A multivariate time series can also represent time series data obtained from various data sets including metadata, e.g. patient records [46], [47], employment records [48], [49], and others [50], [51].

C. TENSOR FIELDS
Tensor fields are arrays of data arranged on a regular grid with a variable number of axes [4]. A tensor field can be described as a quantity associated with each point in space-time, a notion that has been extended to functions or distributions linked to points in space-time [30]. Because it deals with spatio-temporal data, this type of time series data structure is generalized to include many related subtypes: time series of graphs and networks, time series of spatial positions of moving objects, and time series of spatial configurations/distributions.

1) TIME SERIES OF GRAPH AND NETWORK
Time series data in the form of networks consist of associated attributes such as nodes and edges that reflect different kinds of behavior over time. Node or edge attributes of dynamic graphs can be introduced as time series. This kind of time series data helps understand different temporal patterns and evaluate the network dynamics in general [52]- [56]. The network view helps to visualize the connectivity of the sensors, which can enhance analysis, detection and exploration. As each machine (e.g. engines or computers) typically consists of a large number of sensors that produce massive data, time series data can be obtained from the nodes of such machines over a period of time, such as CPU load, memory usage, network load, and data center chiller sensor, helping to improve the understanding of how machines are used in practice and analyze the performance and behaviors of such systems [57]- [63]. Indeed, analyzing this data helps users and experts understand and evaluate the network dynamics.

2) TIME SERIES OF SPATIAL POSITIONS OF MOVING OBJECTS
Data on the spatial positions of moving objects with an associated time component is classified as trajectory data. It presents different places over time, providing a clear idea of spatio-temporal changes. A combination of interactive visualizations and automated analysis has been shown to be an efficient approach for analyzing, tracking, and representing this type of data in order to understand and recognize the mobility of a diversity of moving objects, such as vehicles [64]- [71] and aircraft [66], [67], which can lead to path discovery, movement analysis, and location prediction.

3) TIME SERIES OF SPATIAL CONFIGURATIONS AND DISTRIBUTIONS
Being able to extract useful insight from time series of spatial distributions and configurations is becoming more important due to the massive growth in data science and the rapid advancement of many technologies. In our surveyed papers, we consider discovering behavioral patterns and finding interesting events that might take place in certain municipalities [72] and public or business sectors as spatial configurations and distributions. This identification of regular configurations and distributions over time is represented by a total number of events and behaviors extracted from a chosen spatial scale. Personal mobility behaviors and movement patterns [73]- [81], behaviors of animals [82], [83], pattern changes in climate (weather) and the ozone layer [81], [84]- [90], and behavior capture data made through time at often uniform time intervals [91]- [96] can be regarded as instances of this type of data structure that take place in a specific spatial location.

D. MULTIFIELD
This kind of data, defined as a set of fields, provides enough flexibility to capture most types of compound datasets that occur in practice [30]. Combining multiple modality sensors such as gyroscopes, magnetometers and accelerometers with other environmental sensors is a good example of such a data structure type.
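As a rough illustration of the four categories in this section, the sketch below expresses each one as an array shape. It is an assumption-laden simplification and is not taken from any surveyed system: the sizes, the field names and the use of NumPy are hypothetical, and the graph and trajectory subtypes of tensor fields are not covered.

```python
import numpy as np

T = 24  # number of time steps (hypothetical)

# Univariate: one value per time step, e.g. temperature in a city.
univariate = np.zeros(T)

# Multivariate: several variables sharing the same timestamps,
# e.g. 3-D acceleration from a tri-axial accelerometer.
multivariate = np.zeros((T, 3))

# Tensor field (regular-grid case only): one value per cell of a
# hypothetical 10 x 20 spatial grid at every time step.
tensor_field = np.zeros((T, 10, 20))

# Multifield: a set of fields with different shapes over the same time axis.
multifield = {
    "acceleration": np.zeros((T, 3)),
    "temperature": np.zeros(T),
    "wind": np.zeros((T, 10, 20, 2)),  # a 2-D vector per grid cell
}

# Every structure is indexed by time along its first axis.
time_axes = [univariate.shape[0], multivariate.shape[0], tensor_field.shape[0]]
time_axes += [field.shape[0] for field in multifield.values()]
```

Keeping time as the leading axis is only a convention chosen here so that all four structures can be sliced by time step in the same way.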

III. SIMILARITY MEASURES AND FEATURE EXTRACTION FOR TIME SERIES DATA
Large time series data requires adequate preprocessing to gain an appropriate approximation of the underlying data representation. The aim of feature extraction is to generate a higher-level abstraction which represents the data while preserving the shape characteristics of the original data during dimensionality reduction. There are several dimensionality reduction techniques specifically designed for time series which exploit the frequential content of the signal and its usual sparseness in the frequency space [97]. In general terms, choosing the distance measure is important and assists in dealing with outliers, amplitude differences and time axis distortion. Furthermore, choosing important features in the data requires sufficient communication of knowledge from domain experts. Thus, the quality of mining approaches is significantly affected by the choice of similarity measures and feature extraction techniques to obtain relevant knowledge from the data. Similarity measures and feature extraction techniques used in the surveyed visual analytics papers are summarized in Table 1.

A. RAW DATA SIMILARITY
Most mining approaches utilize the concept of similarity between a pair of time series. While dealing with time series data, efficiency and effectiveness are the main targets of representation methods and similarity measures [98]. Tornai et al. [99] argue that the distance between two sequences as a measurement plays an important role in the quality of clustering and classification algorithms. The accuracy of such algorithms can be significantly impacted by the choice of similarity measures. See also Yahyaoui and Al-Mutairi [14] and Wang et al. [98].
Euclidean distance (ED) is a commonly used metric for time series. For two time series X and Y of length L, the Euclidean distance is the square root of the sum of the squared differences between each pair of corresponding points of X and Y [100]. Thus, the two time series that are being compared must have the same length, and the computational cost is linear in terms of temporal sequence length [101]. Along the horizontal axis, the distance between two time series is calculated by matching the corresponding points [102]. The Euclidean distance metric is very sensitive to distortion and noise [13], and it is not able to handle one of the elements being compressed or stretched [83]; therefore, this approach is not reliable, especially when computing similarity between time series with different time durations [103].
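The Euclidean distance described above can be sketched in a few lines. This is a minimal illustration rather than code from any surveyed system; the function name and the two toy series are hypothetical.

```python
import math
from typing import Sequence


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two time series of equal length."""
    if len(x) != len(y):
        # Points are matched one-to-one along the time axis,
        # so series of different lengths cannot be compared.
        raise ValueError("Euclidean distance needs series of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


base = [0.0, 1.0, 2.0, 1.0, 0.0]
shifted = [0.0, 0.0, 1.0, 2.0, 1.0]  # same shape, delayed by one time step

d_same = euclidean_distance(base, base)
d_shift = euclidean_distance(base, shifted)
```

The second pair has the same shape but is delayed by one step, yet its distance is clearly non-zero, which illustrates the sensitivity to time axis distortion noted above.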
Dynamic Time Warping (DTW) is another distance measure that is proposed to overcome some Euclidean distance limitations such as non-linear distortions. In DTW, the two time series do not have to be the same length, and the idea is to align (warp) the series before computing the distance [13]. However, two temporal points with completely different local structures might be mistakenly matched by DTW. This issue can be addressed by improving the alignment algorithm, e.g. shape dynamic time warping, which considers point-wise local structural information [104].
Due to its quadratic time complexity, DTW does not scale very well when dealing with large datasets. In spite of this, it is widely used in different applications, such as in bioinformatics, finance and medicine [105]. DTW has several local constraints, namely boundary, monotonicity and continuity constraints [103]. Moreover, some common misunderstandings about DTW are that it is too slow to be useful and the warping window size does not matter much; Wang et al. [98] and Mueen and Keogh [106] have attempted to correct these notions. Kotas et al. [107] have reformulated the matrix of the alignment costs, which led to a major increase in the noise reduction capability. Other surveys review distance measures such as Euclidean Distance (ED) [108], Dynamic Time Warping (DTW) [109], [110], and distance based on Longest Common Subsequence (LCSS) [98], [111].
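To make the alignment idea concrete, the following sketch implements the textbook dynamic-programming form of DTW with an optional warping window (a Sakoe-Chiba band). It is an illustration under stated assumptions, not the implementation of any cited work: the absolute difference is assumed as the local cost, and the function name and `window` parameter are hypothetical.

```python
import math
from typing import Optional, Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float],
                 window: Optional[int] = None) -> float:
    """DTW distance with an optional warping window of half-width `window`."""
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    # The band must be at least |n - m| wide, otherwise no path reaches the end.
    w = max(n, m) if window is None else max(window, abs(n - m))
    # cost[i][j] = cheapest alignment of x[:i] with y[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0  # boundary constraint: both series start together
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            local = abs(x[i - 1] - y[j - 1])
            # Monotonicity and continuity: extend only from the three
            # neighbouring cells (insertion, deletion, match).
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


base = [0.0, 1.0, 2.0, 1.0, 0.0]
shifted = [0.0, 0.0, 1.0, 2.0, 1.0]   # same shape, delayed by one time step
d_warped = dtw_distance(base, shifted)
d_no_warp = dtw_distance(base, shifted, window=0)  # no warping allowed
d_unequal = dtw_distance([0.0, 1.0, 2.0], [0.0, 1.0, 1.0, 2.0])
```

The nested loops fill an n-by-m table, which is the source of the quadratic cost mentioned above; a narrower window shrinks the number of cells visited and also changes which alignments are admissible.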
Correlation is a mathematical operation which is widely used to describe how two or more variables fluctuate together. Different types of correlation can be found by considering the level of measurement for every variable. Distance correlation can be used as a distance measure between two variables that are not necessarily of equal dimension. In time series data, it is used to detect a known waveform in random noise. Unlike DTW and LCSS, correlation also offers a linear complexity frequency space implementation in signal processing [83], [112].
Cross-correlation correlates two signals to produce a new signal whose peaks can indicate the similarity between the original signals; it is used as a distance metric [12] and can be carried out more efficiently in the frequency domain [112]. Autocorrelation occurs when the signal is correlated with itself, which is useful for finding repeating patterns [83]. Walker et al. [83] demonstrate that cross-correlation is a slow operation in time series space, but that it corresponds to point-wise multiplication in frequency space. Cross-correlation is also considered the best distance measure for detecting a known waveform in random noise. When processing the signal, the correlation has a linear complexity frequency space implementation, which cannot be achieved by DTW.
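The correspondence between cross-correlation and point-wise multiplication in frequency space can be sketched as follows. This is a minimal example assuming NumPy and real-valued signals; the function name, the ±1 template and the noise level are hypothetical choices made for illustration only.

```python
import numpy as np


def cross_correlation_fft(x, y):
    """Full linear cross-correlation of two real 1-D signals via the FFT."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x) + len(y) - 1            # length of the full linear correlation
    # Zero-padding both signals to length n avoids circular wrap-around.
    fx = np.fft.rfft(x, n)
    fy = np.fft.rfft(y, n)
    # Correlation in time = point-wise product with the conjugate in frequency.
    corr = np.fft.irfft(fx * np.conj(fy), n)
    # Negative lags end up at the tail; rotate so lags run in ascending order
    # from -(len(y) - 1) to len(x) - 1.
    return np.roll(corr, len(y) - 1)


# Detect a known waveform hidden in random noise.
rng = np.random.default_rng(0)
template = np.array([1, 1, 1, -1, -1, 1, -1], dtype=float)
signal = 0.1 * rng.standard_normal(100)
signal[30:37] += template             # hide the template at offset 30

corr = cross_correlation_fft(signal, template)
lags = np.arange(-(len(template) - 1), len(signal))
best_lag = int(lags[np.argmax(corr)])  # lag with the strongest match
```

The peak of the correlation signal gives the offset at which the template best matches the noisy series, which is how the peaks indicate similarity in the description above.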

B. FEATURE EXTRACTION
Feature extraction is a form of dimension reduction which helps to lower the computational cost of dealing with high-dimensional data and achieve higher accuracy of clustering and classification [116].Matching features from time series data should be extracted before applying learning algorithms to the vector of extracted features.Several feature-based techniques have been proposed to represent features with low dimensionality for time series data.Wang et al. [98]  Principal Component Analysis (PCA), as an eigenvalue method, is a technique which transforms the original time series data into low-dimensional features.As a feature extraction method, PCA is effectively applied to time series data [117]- [120].PCA [4] transforms data to a new set of variables whose elements are mutually uncorrelated, thus learning a representation of data that has lower dimensionality than the original input.PCA has been used as an effective dimensionality reduction method that eliminates the least significant information in the data and preserves the most significant.In the surveyed visual analytics papers, [41], [50], [54], [70], [84], [87], [91], [96] use PCA to reduce high-dimensional data and analyze the similarity of the time series data.PCA is a linear dimensionality reduction technique.
Multidimensional Scaling (MDS) is a very popular non-linear dimensionality reduction technique that is useful for effectively representing high-dimensional data in lower dimensional space. This technique has been used in the surveyed papers [36], [48], [54], [56], [57], [63], [78], [81], [84]; however, it struggles to separate k-Means clusters [84]. Jeong et al. [36] use MDS to gain a better understanding of gene interactions and regulatory behaviors. To this end, two different MDS representations are considered with respect to the time series data. One representation shows local differences among genes in the same cluster group, while the other shows global differences among all genes in all the clusters. MDS is also used to reveal the distributions of the time series data, helping to visualize the relations among time series [48].
Transforming time series data into a set of features cannot capture the sequential nature of series.k-gram is an example of a feature-based technique that aims to maintain the order of elements in series using short sequence segments of k consecutive symbols [14].k-grams [121] represent a feature vector of symbolic sequences of k-grams in time series data.Given a set of k-grams, this feature vector can represent the frequency of the k-grams (i.e.how often a k-gram appears in a sequence).It has only been mentioned in [47], [92].
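The k-gram frequency vector described above can be sketched as follows. The example assumes the time series has already been discretized into a symbolic sequence; the function name `kgram_features` and the optional `vocabulary` argument are hypothetical.

```python
from collections import Counter


def kgram_features(sequence, k, vocabulary=None):
    """Count how often each k-gram (k consecutive symbols) occurs.

    Returns (vocabulary, vector), where vector[i] is the frequency of
    vocabulary[i]. If no vocabulary is given, the k-grams observed in
    the sequence are used in sorted order.
    """
    counts = Counter(
        tuple(sequence[i:i + k]) for i in range(len(sequence) - k + 1)
    )
    if vocabulary is None:
        vocabulary = sorted(counts)
    return vocabulary, [counts[g] for g in vocabulary]


vocab, vector = kgram_features("abab", 2)
```

Passing the same `vocabulary` for every sequence yields feature vectors of equal length, which can then be fed to a learning algorithm.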
Discrete Fourier Transform (DFT) and Discrete Wavelet Transform (DWT) are rarely used in the surveyed visual analytics papers [38], [72], [82].However, these techniques are used in the data mining field and achieve good results, encouraging visual analytics researchers to adopt these techniques in future research.Discrete Fourier Transform (DFT) is one of the most common transformation methods [1].It has been used to transform original time series data into low dimensional time-frequency characteristics and index them to obtain an effective similarity search [122].DFT is used to perform dimensionality reduction and extract features into an index used for similarity searching.This technique has been continually improved and some of its limitations have been overcome [108], [123], [124].
Discrete Wavelet Transform (DWT) has also been used as a technique to transform original time series and obtain low-dimensional features that efficiently represent the original time series data [99], [125].Chan and Fu [126] use Haar Wavelet Transform for time series indexing, which shows the technique's effectiveness with regards to the decomposition and reconstruction of time series.With a large set of time series data, analysis tasks would face certain challenges in defining matching features; therefore, taking advantage of wavelet decomposition to reduce the dimensionality of data is beneficial [127].The classification task can be accurately performed utilizing the discrete wavelet transforms technique [128].
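As an illustration of wavelet-based dimensionality reduction, the following sketch computes a Haar decomposition by repeated pairwise averaging and differencing. It assumes a series whose length is a power of two and uses the unnormalized averaging convention; it is not the indexing scheme of Chan and Fu [126], and the function names are hypothetical.

```python
def haar_transform(series):
    """Haar wavelet decomposition using pairwise averages and differences.

    Returns [overall average, coarsest detail, ..., finest details].
    """
    n = len(series)
    if n == 0 or n & (n - 1):
        raise ValueError("series length must be a power of two")
    approx = list(series)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Coarser details are placed in front of the finer ones.
        details = [(a - b) / 2 for a, b in pairs] + details
        approx = [(a + b) / 2 for a, b in pairs]
    return approx + details


def haar_features(series, n_coeffs):
    """Keep only the first (coarsest) coefficients as low-dimensional features."""
    return haar_transform(series)[:n_coeffs]


coeffs = haar_transform([4, 2, 6, 8])
```

Keeping only the first few coefficients preserves the coarse shape of the series while discarding fine detail.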
Discretization is usually needed when applying feature extraction techniques to time series data; however, its use can cause information loss [13]. To address this issue, Ye and Keogh [129] introduce time series shapelets, which can be directly applied to time series. This technique measures similarity by comparing subsections of shapes (shapelets) instead of comparing the whole time series sequences. A binary decision maker decides whether each new sequence belongs to a class or not. The shapelet classifier has some limitations with multi-class problems, and to overcome this issue, Ye and Keogh [129] use the shapelet classifier as a decision tree. Xing et al. [130] have shown that early classification can be efficiently achieved by extracting local shapelet features.

IV. TIME SERIES ANALYSIS TECHNIQUES

A. CLUSTERING
Clustering is widely used as an unsupervised learning method. The aim of time series clustering is to define a grouped structure of similar objects in unlabeled data based on their similar features. Consequently, the data within one cluster is homogeneous, while data in different clusters is dissimilar. Features do not provide any information about the appropriate group for each object; they only describe each object in the dataset, helping clustering algorithms to learn and extract useful information about the data's structure. Due to the unique structure of time series data (e.g. high dimensionality, noise, and high feature correlation), clustering time series differs from traditional clustering; consequently, several algorithms have been improved to deal with time series.
Most works involving the clustering of time series can be classified into three categories [12]. The first category is whole time series clustering, where a set of individual time series is given, and the aim is to group similar time series into clusters with respect to their similarity. The second is subsequence clustering, which involves dividing the time series data at certain intervals using a sliding window technique and performing the clustering on the extracted subsequences of a time series [131]. The third category is the clustering of time points based on a combination of their temporal proximity and the similarity of the corresponding values. Some points might not be assigned to any cluster and are deemed noise.
Clustering algorithms embedded in visual analytics systems have received much attention from both the visual analytics and data mining communities for time series data.Unlike the classification task, this task does not require labeled data; therefore, the data is partitioned into groups of similar objects.Most of the existing works that perform time series clustering usually fall in one of the previously mentioned categories.Projection-based methods have received a lot of attention because a scatterplot is intuitive and easy to read.Scatterplots can also provide a unified embedding space for visualizing data and their similarities and show the embedded semantic content [132].Elzen et al. [54] propose a projection-based method to explore and analyze the change of dynamic networks by transforming each time-step network into a high-dimensional vector which is then projected onto a two-dimensional space using dimensionality reduction techniques.Dimensionality reduction is performed for each data window separately, which can then be sequentially visualized, obtaining the similarity across multiple time points evolving over time.Therefore, using the projection-based method can assist with clustering similar time series data so that conventional clustering algorithms can be applied to the projected data [54], [75], [84].
We provide a review of the existing time series clustering methods in the surveyed visual analytics papers, along with the research that has been conducted in the data mining community. These algorithms can be divided into five groups: partitioning methods, hierarchical methods, model-based methods, density-based methods, and grid-based methods. Table 1 summarizes the clustering algorithms used in the surveyed papers. Some papers adapted their clustering algorithms; therefore, an additional section has been introduced in Table 1 to include these clustering algorithms.

1) PARTITIONING METHODS
Partitioning methods are described as a process of partitioning unlabeled data into k groups. The k-Means (KM), k-Medoids (PAM), Fuzzy c-Means (FCM), and Fuzzy c-Medoids are the most popular algorithms for partitioning clustering. Kaufman and Rousseeuw [133] divide these algorithms into two categories: crisp (hard) clustering methods (including k-Means and k-Medoids) and fuzzy (soft) clustering methods (including Fuzzy c-Means and Fuzzy c-Medoids). While in hard clustering methods each object is assigned to only one cluster, in fuzzy clustering methods each object is assigned to more than one cluster with a probability. In such methods, the number of clusters must be pre-assigned, and most partitioning algorithms cannot tackle the problem of finding the number of clusters [133]. Another issue is that they are not straightforward when dealing with time series of unequal length because of the ambiguity of measuring cluster centers [11].

a: CRISP (HARD) CLUSTERING METHODS
k-Means [134] is a simple and widely used algorithm which divides a set of data into K groups represented by their mean values. After K cluster centers (centroids) are randomly initialized, each example is assigned to the nearest cluster. It iterates until it converges to a locally optimal partition of the data. For each iteration, each example is assigned to the closest cluster center, which will be recalculated based on the mean value of all examples of that particular cluster [135].
k-Means has been used to cluster time series data, achieving efficient clustering results due to its speed, simplicity, ease of implementation, and the possibility to assign the desired amount of clusters [43], [136].Most of the surveyed papers use commonly applied partitioning methods of clustering, especially the k-Means algorithm [34], [36], [38], [43], [52], [58], [74], [75], [77], [78], [84], [86], [87], [89], [90], [95].k-Means clustering can be performed on multivariate time series, where each time point is considered as a vector and the cluster labels are used as symbols to encode the time series [43].Zhao et al. [77], for instance, utilize the k-Means clustering algorithm to cluster visitors based on the time they spend at attractions, thus, it assists to group people in the same cluster if they have similar attraction preferences.k-Means could also be used with visualization techniques, as shown by Wu et al. [90], where it is used to determine the most appropriate and reasonable number of clusters for visualization.k-Means has also been adopted in a global radial map to divide all the stations into a number of groups, each having similar change rates [87].Li et al. [86] adopt the k-Means to generate clusters of slopes and map each cluster onto a ring in the global distribution view.In projection-based methods, k-Means is applied to the projected data [75], [84].
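The k-Means iteration (random initialization, assignment to the nearest centroid, recomputation of each centroid as a mean) can be sketched as below. This is a minimal illustration assuming equal-length numeric vectors and Euclidean distance; the function name, the `seed` argument, and the choice to keep the old centroid when a cluster becomes empty are assumptions of the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-Means on equal-length numeric vectors (illustrative sketch)."""
    rng = random.Random(seed)
    # Random initialization: pick k distinct examples as initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each example goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged to a (locally optimal) partition
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
```

For time series, each vector would be one series (or one time point of a multivariate series, as in [43]); series of unequal length would first need resampling or a different centroid definition.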
k-Medoids or PAM (partitioning around medoids) [133] is another partitioning algorithm. In this algorithm, a set of k representative samples is initially selected, then each example in the dataset is assigned to the nearest representative sample, constructing partitioned clusters. Although this algorithm is similar to the k-Means algorithm, it is more robust and differs only in how clusters are represented. Instead of using a mean, each k-Medoids cluster is represented by a representative data sample from that cluster. This algorithm is often used alongside the DTW distance measure to cluster time series data [137]. Andrienko et al. [69] use k-Medoids as a clustering algorithm, which could be better suited than k-Means as it uses medoids instead of means. However, it still has the same issue as k-Means, in that the number of subclusters must be known in advance.

b: FUZZY (SOFT) CLUSTERING METHODS
These algorithms aim to minimize an objective function that usually has numerous undesired local minima [138], allowing fuzzy partitioning instead of hard partitioning.Thus, each sample in the dataset could be assigned to more than one cluster with a membership that measures degrees of association to clusters.Even though fuzzy clustering algorithms are usually more time consuming, they provide more detailed information concerning the data structure [133].
Fuzzy c-Means [139], [140] is the most common fuzzy clustering algorithm and an extended version of k-Means.It provides both effective and significantly meaningful (fuzzy) data partition [141].This algorithm was later improved by many works [141]- [144].A dataset is divided into fuzzy groups that differentiate in representatives by minimizing the objective function (within groups) of weighted coefficients (e.g.distances between objects and cluster center), influencing the fuzziness of membership values.
Fuzzy c-Medoids [145] is another fuzzy partition algorithm, which is an extended version of k-Medoids. The candidate medoids, which act as cluster centres, are picked from the dataset so as to minimize the fuzzy dissimilarity of the objects in each cluster.

2) HIERARCHICAL METHODS
Hierarchical clustering defines a tree structure for unlabeled data by aggregating data samples into a tree of clusters. It can be used for time series of equal and unequal length [11], [12]. This method does not assume a value of k, unlike k-Means clustering. There are two main kinds of hierarchical clustering methods: agglomerative (bottom-up) and divisive (top-down) [12], [152].
An agglomerative algorithm (bottom-up) considers each object as a cluster, and then progressively integrates clusters.It is the more commonly used algorithm [11], [12] and is involved in many visual analytics works for time series data [85], [92].The merging process is repeated until eventually, all items are in one cluster or termination conditions are satisfied, such as the number of clusters being sufficient.The divisive algorithm (top-down) starts by grouping all objects into one cluster then divides the cluster until each object is in a separate cluster [12], [152].In their visual analytics system, Bernard et al. [91] mentioned two advantages of divisive clustering for time series data.Firstly, the hierarchical structure allows for multiple levels of detail with the same data elements in respective sub-trees.Secondly, the level of detail concept can be achieved with a single calculation.However, both algorithms predominantly suffer from an inability to perform adjustments once a combining or dividing decision has been implemented.Also, they do not have the ability to undo what has been previously done [133], [135], [153], [154].
The basic hierarchical clustering algorithm starts with assigning each vector to its own cluster.Then, it computes the distances between all clusters and saves these distances into a distance matrix.Next, it finds, through the distance matrix, the two closest clusters or objects which will produce a cluster.It updates the distance matrix and returns to the previous step until only one cluster remains [153].Hierarchical algorithms usually use a similarity or distance matrix to merge or split one cluster, and this can be visualized as a dendrogram [135].Lin et al. [155] present Symbolic Aggregate Approximation (SAX) representation and use hierarchical clustering to evaluate their work.Hierarchical clustering methods can also be divided based on the way that the similarity measure is calculated; examples include single-link clustering, average-link clustering, and complete-link clustering [135].CURE [156], BIRCH [157], and Chameleon [158] are some examples for improving the performance of hierarchical clustering algorithms.Hierarchical methods can produce multi-nested partitions that let different users select diverse partitions based on the similarity level that is required.However, it suffers from computational complexity in time and space, and using it to cluster many objects incurs a massive I/O cost.
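The basic agglomerative procedure can be sketched as follows, here with single-link distances. For brevity the sketch stores only the object-to-object distance matrix and derives cluster distances from it at each step, rather than updating a cluster-level matrix; the function name and the stopping criterion `n_clusters` are assumptions of the example.

```python
def single_link_clustering(values, n_clusters, dist):
    """Naive bottom-up (agglomerative) clustering with single linkage."""
    n = len(values)
    # Pairwise distance matrix between all objects.
    d = [[dist(values[i], values[j]) for j in range(n)] for i in range(n)]
    # Start with every object in its own cluster.
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: distance between the two closest members.
                link = min(d[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        # Merge the two closest clusters; this decision is never undone.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


series_means = [0, 1, 10, 11, 50]
two = single_link_clustering(series_means, 2, lambda u, v: abs(u - v))
three = single_link_clustering(series_means, 3, lambda u, v: abs(u - v))
```

Replacing `min` with `max` or an average in the linkage line gives complete-link or average-link clustering, and any time series distance (for example DTW) can be passed as `dist`.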
The hierarchical method is applied to determine the order of time series data before visualizing and launching interactive exploration [39], [50]. Wijk and Van Selow [159] conducted one of the pioneering works in visual analytics systems. They use a bottom-up hierarchical clustering approach to identify common and uncommon subsequences that occur in large time series. Users can then easily interact with the visualization, which allows them to select days, find similarities, etc. Battke et al. [34] overcame the issue of hierarchical clustering speed for large time series datasets by implementing the rapid neighbor-joining algorithm [160], and then attaching the produced trees to heat-map plots, allowing interactive specialized data exploration.
The hierarchical method creates aggregations which have been visualized as dendrograms, providing multiple levels of detail and an initial overview of similar groups.Visual analytics enhances interactivity, enabling users to change the level of detail by dragging the aggregation level slider [91] or by applying multiple-height branch-cuts to manually select clusters [37].

3) MODEL BASED METHODS
A self-organizing map (SOM), a model-based method developed by Kohonen [161], is a specific type of neural network (NN) that is used for model-based clustering.As an unsupervised learning method, self-organizing neural networks rely on neurons which are coordinated in a low-dimensional (often two-dimensional) structure.Those neurons are iteratively trained by the self-organizing procedure.SOM is one of the most common neural network models and is often used for data analysis.It is also described by Kohonen as an analysis and visualization tool for high-dimensional data [162].However, SOM can also be used for other applications, such as clustering, sampling, dimensionality reduction, vector quantization, and data mining [163], [164].The most important feature of SOM is produced in the output layer by the neighborhood relationship [165].
Various extensions have been developed to enhance the SOM's scope and performance, such as adaptive subspace SOM (ASSOM) [166], [167], the parameterized SOM (PSOM) [168], visualization induced SOM (ViSOM) [169], [170], and the Self-Organizing Mixture Network (SOMN) [171].The SOM uses a collection of neurons usually arranged in a 2-D hexagonal or rectangular grid to shape a discrete topological mapping of input space.At the beginning of the training process, weights are initialized by assigning small random numbers.In this algorithm, each training iteration has three stages.First, an input is presented every time, and then the best matching cell, or winning neuron, is selected.After that, the weight of the winner and its neighbors are updated.The process is repeated until the map converges and the weights have stabilized.In the feature space, the neighboring locations are always represented in the neighboring neurons in the network because they are updated at every step.During the mapping, the topology of the data is maintained as it was in the input space [11], [172], [173].
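The three training stages (present an input, select the winning neuron, update the winner and its neighbors) can be sketched as below. To keep the example short, the neurons are arranged in a one-dimensional chain rather than the 2-D grid described above, and the linear decay of the learning rate and neighborhood width is an assumption; all names and default values are hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to input x."""
    return min(range(len(weights)), key=lambda u: math.dist(weights[u], x))


def train_som(data, n_units, n_iter=500, lr0=0.5, sigma0=None, seed=0):
    """Train a 1-D self-organizing map (simplified illustrative sketch)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or n_units / 2
    # Weights are initialized with small random numbers.
    weights = [[rng.uniform(0.0, 0.1) for _ in range(dim)] for _ in range(n_units)]
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1 - frac)                   # decaying learning rate
        sigma = max(sigma0 * (1 - frac), 0.5)   # shrinking neighborhood
        x = rng.choice(data)                    # stage 1: present an input
        win = best_matching_unit(weights, x)    # stage 2: find the winner
        for u in range(n_units):                # stage 3: update winner and neighbors
            h = math.exp(-((u - win) ** 2) / (2 * sigma ** 2))
            weights[u] = [w + lr * h * (xi - w) for w, xi in zip(weights[u], x)]
    return weights


data = [(0.1, 0.1), (0.15, 0.05), (0.9, 0.9), (0.85, 0.95)]
som = train_som(data, n_units=5)
```

Because neighboring neurons are pulled towards the same inputs, nearby positions on the chain end up with similar weight vectors, which is the neighborhood-preserving property the text refers to.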
The self-organizing map (SOM) has been used to analyze temporal data, and is utilized for pattern discovery in temporal data with visual analytics e.g.[34], [44], [45], [70], [79], [91], [114].Recurrent SOM [174] and Recursive SOM [175] have enhanced SOM for mapping time series data [172].Fu et al. [176] use self-organizing maps to gather similar temporal patterns into clusters.A continuous sliding window is used to segment data sequences from numerical time series before applying the SOM algorithm.SOM also is used in [173] to cluster time series features.In many clustering works, SOM is chosen due to its advantages with regards to certain properties such as parameters selection, data analysis, and better visualization.However, one of its main disadvantages is that it does not work perfectly with time series of unequal length, as it is difficult to define the dimension of weight vectors [11].
Due to SOM being a robust algorithm, Schreck et al. [45] use it to render trajectory prototypes and represent data samples on the SOM grid using trajectory bundle visualization.Thus, the trajectory bundles can be visualized at the location of their underlying prototype pattern on the SOM grid.It also organizes the space of movement patterns by arranging prototype trajectories on the SOM grid; this means that neighboring patterns can be compared to each other, and the different patterns smoothly transit over the map.Bernard et al. [91] also use the SOM method as a projection technique to make a similarity-preserving color legend for human poses.The grid of the color legend is the result of a SOM that is trained using all feature vectors in the manner of a vector quantization scheme.Thus, the grid structure helps to arrange the most prominent human poses.Moreover, the SOM algorithm can support visualization by representing data on the SOM grid or using the grid of color as a result of the SOM model.The algorithm has also been used in [44] to visually analyze sets of trajectory data which are trained in unsupervised mode.Start and end points of trajectories are indicated over the SOM grid by different colors.The goal of their visualization is to produce maps of user-preferred trajectory clustering.The surveyed papers have shown that link-nodes and glyphs are the most widely adopted visualization techniques with model-based clustering, e.g.[44], [45], [91].

4) GRID-BASED METHODS
Grid-based clustering is another type of clustering method [35]. This method identifies a set of cells in a grid structure, providing grouped structures in unlabeled data. It is described as a process of quantizing the space into a set of cells that make up a grid. These cells are then used to perform clustering. The fast processing time distinguishes this approach from others: instead of depending on the number of data objects, it depends on the number of cells in each grid [177]. The two grid-based approaches in [177], [178] are typical examples of efficient clustering algorithms, particularly for very large datasets.
In EpiViz [35], a visual analytics tool for epigenetic features, the grid algorithm is implemented to find similar genes based on the values of their measurements, and it splits the scatter plot into 5 × 5 cells. Based on their measurements, one cluster of genes per measurement is created for each cell. The scatter plot shows each cluster of genes with a size proportional to the number of genes it contains. Thus, it can be said that the grid algorithm, as a machine learning algorithm, assists and interacts with the scatter plot as a visualization technique, which provides a classic visual analytics system. Therefore, the EpiViz paper could provide an ideal model with advantageous features resulting from integrating both machine learning algorithms and visualization techniques to obtain a very effective visual analytics system.
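The idea of quantizing a scatter plot into cells can be sketched as follows. This is not the EpiViz implementation; it is a minimal illustration that assumes two-dimensional points, known axis bounds, and a hypothetical function name, and it simply groups the points that fall into the same cell.

```python
from collections import defaultdict


def grid_clusters(points, n_cells=5, bounds=((0.0, 1.0), (0.0, 1.0))):
    """Group 2-D points by the grid cell they fall into.

    Returns a dict mapping (row, col) to the list of point indices in
    that cell. The cost depends on the number of points once, after
    which all further work can operate on the n_cells * n_cells cells.
    """
    (x_min, x_max), (y_min, y_max) = bounds
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        # Points on the upper boundary are clamped into the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * n_cells), n_cells - 1)
        row = min(int((y - y_min) / (y_max - y_min) * n_cells), n_cells - 1)
        cells[(row, col)].append(idx)
    return dict(cells)


cells = grid_clusters([(0.05, 0.05), (0.1, 0.15), (0.95, 0.95), (1.0, 1.0)])
sizes = {cell: len(members) for cell, members in cells.items()}
```

The `sizes` dictionary corresponds to the visual encoding described above, where each cluster is drawn with a size proportional to the number of items it contains.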

5) DENSITY-BASED METHODS
In density-based clustering, a cluster continues to expand as long as its points and their neighbors are closely packed together, and clusters are separated by subspaces in which the objects have low density. This kind of algorithm is more complex than other clustering algorithms such as partitioning clustering [12]. As it is based on data density, density-based clustering can distinguish noise data and does not require the number of clusters to be specified in advance, which can be more helpful for non-linear clustering. Landesberger et al. [78] highlight some of the advantages of using a density-based clustering technique in their visual analytics methodology for time series data. They state that density-based clustering is a fast algorithm which does not require pre-setting the number of clusters, is able to detect arbitrarily shaped clusters as well as outliers, and uses easily comprehensible parameters such as spatial closeness. DBSCAN [179], OPTICS [180] and LOF [181] are some of the common algorithms that work with the density-based concept.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) [179] is one of the most highly cited density-based methods. It depends on a density-based concept of clusters which is designed to detect clusters and noise in a set of data. For each point of a cluster, the eps-neighborhood must have a minimum number (minPts) of points. Therefore, the two parameters, eps and minPts, must be known for each cluster or, at the very least, for one point from the particular cluster. Every cluster contains two kinds of points: core points, which lie inside the cluster, and border points, which lie on the cluster's border. DBSCAN has good efficiency on large datasets and aims to discover clusters of arbitrary shapes. For example, Chae et al. [74] and Zhao et al. [77] both use DBSCAN in their visual analytics systems to group visitors into corresponding clusters. Zhao et al. [77] utilize the longest common subsequence (LCS) to measure the similarity of two visitors' sequences before applying DBSCAN.
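The roles of eps, minPts, core points, border points, and noise can be illustrated with a compact sketch of DBSCAN. It assumes Euclidean distance and a brute-force neighborhood query, counts a point as a member of its own eps-neighborhood, and uses hypothetical function names; other distances (for example one derived from LCS) could be substituted in `region_query`.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be re-labelled as a border point
            continue
        cluster += 1           # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Note that the number of clusters is an output of the procedure rather than an input, and the isolated point is reported as noise instead of being forced into a cluster.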
However, DBSCAN cannot deal with clusters of various densities, which is one of the main problems of this algorithm. In contrast, OPTICS (Ordering Points To Identify the Clustering Structure) [180] can deal with the issue of an unknown number of clusters with different densities [182]. Local Outlier Factor (LOF) [181] also shares certain notions with DBSCAN and OPTICS with regard to local density estimation, and depends on distances in an object's local neighborhood. Most clustering algorithms are developed to find and optimize clusters, and they usually ignore noise when the clustering result is produced, whereas LOF assigns to each object a degree of being an outlier.

B. CLASSIFICATION
Classification is described as mapping data into predefined classes.The classification task is referred to as a supervised learning method because the classifier is constructed using training data, and classes are known in advance.In this task, the algorithm is trained on dataset examples, and tries to assign each set of data into its appropriate class; in other words, assigning time series patterns to a specific category [13].
In classification, the aim is often to learn the unique features that distinguish classes from each other. Thus, when an unlabeled dataset is entered into the system, the classifier can automatically determine to which class each series belongs [183]. The k-nearest neighbors, decision tree, support vector machine and neural network algorithms are the most widely used for the time series classification task. Even though these algorithms have received much attention in the data mining and machine learning communities, embedding their use in visual analytics systems for time series data is still a relatively young and emerging field. The next sections provide a review of the few existing time series classification methods in the surveyed visual analytics papers, along with the works that have been adopted in the data mining community.

1) k-NEAREST NEIGHBORS (k-NN)
The k-nearest neighbors algorithm performs a straightforward function on data. There is no learning process; to produce an output for a new test input x, the k nearest neighbors of x are found in the training data at test time, and the majority class label among them is returned [4]. Although numerous classification algorithms have been used to classify time series, evidence shows that simple nearest neighbor classification is extremely difficult to beat [184], [185]. Xi et al. [186] and Rakthanmanon et al. [187] have shown that the simple combination of one-nearest-neighbor with dynamic time warping (DTW) distance produces good results, but it suffers from the computational complexity of the DTW algorithm [186].
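The majority-vote rule can be sketched in a few lines. The example assumes labeled, equal-length numeric vectors and uses Euclidean distance by default; the function name and the `dist` parameter are hypothetical, and a DTW distance function could be passed as `dist` with k = 1 to obtain the one-nearest-neighbor DTW combination mentioned above.

```python
import math
from collections import Counter


def knn_classify(train, x, k=3, dist=math.dist):
    """Return the majority class label among the k training examples nearest to x.

    `train` is a list of (vector, label) pairs. Nothing is learned in
    advance; all the work happens at query time.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b")]
near_a = knn_classify(train, (0.2, 0.2), k=3)
near_b = knn_classify(train, (5, 5.5), k=3)
```

Because every query scans the whole training set, the cost of the distance function dominates the running time, which is why an expensive measure such as DTW becomes the bottleneck.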

2) DECISION TREE (DT) AND RANDOM FORESTS
The decision tree (DT) is one of the most popular classifiers. It is generated by algorithms that identify various ways of dividing a dataset into branches [188]. The tree has three kinds of nodes. The root node has one or more outgoing branches. An internal node has one incoming branch, which is divided into two or more outgoing branches. The end (leaf) nodes represent classes, while the branches leading to them represent decisions. Starting at the root, the classifier makes decisions to reach the class label [189]. The decision tree can also be utilized under uncertainty as a model for sequential decision problems. It assists with describing the decisions that will be made, the cases that might happen, and the results that are related to each of the events and decisions.
C4.5 [190], ID3 [191], Classification And Regression Tree (CART) [192], [193], and CHi-squared Automatic Interaction Detector (CHAID) [194] are examples of decision tree algorithms.The complexity of a tree impacts its performance and accuracy.The criteria and pruning method that are used can control this complexity, and certain metrics can be used to measure it.These metrics include the depth of the tree, the overall number of nodes, the number of used attributes and the overall number of leaves.The rule induction always links to the decision tree induction, and every path from its root to its end can be converted to a set of rules [195].The decision tree's performance is better when it deals with discrete features.
Random Forests are an ensemble of bagged decision tree learners with randomized feature selection.Breiman [196] defines it as a collection of randomized decision trees, thus, it takes the decision tree concept a step further by producing many decision trees.In a random forest, each decision tree is learned from a random subset of features and a random subset of training examples [197], [198].It can be used for classification as well as regression.
For random forests, the training algorithm applies general techniques of bootstrap aggregating or bagging.In bagging, it trains an often large number of classifiers on random subsets of the training set, classifying by using the majority vote of all classifiers.In boosting, it operates as per bagging, but introduces weights for each classifier based on performance over the training set.
Decision trees (DT) have been adopted in visual analytics with high levels of accuracy. Xie et al. [40] use a decision tree in the VAET system, which highlights interesting events in e-transaction data. The system uses a probabilistic decision tree learner to estimate the salience of each transaction in a large time series. Then, the saliency values are visualized in a time-of-saliency map. This visualization allows analysts to explore, select and conduct a detailed examination of interesting transactions, displaying them in a new visual metaphor called KnotLines.

3) SUPPORT VECTOR MACHINES (SVM)
Support Vector Machines (SVM) is an effective classification method.It is widely used and has shown substantial achievement in solving sequential time series classification tasks [199]- [203].
The SVM discriminates between positive and negative examples, and through the use of said examples, it learns to classify and produce positive and negative classes [4].For linear cases, SVM [204]- [206] aims to find a class identity by mapping series into a high-dimensional feature space.Once the similarities between series have been measured, SVM separates two classes and enforces a larger margin hyperplane, which is the gap between classes.Thus, SVM acts as a large margin classifier for accurate classification and efficient generalization.
For non-linear cases, SVM often uses kernel functions, which represent a non-linear decision boundary that separates the positive and negative samples.The kernel function is appropriate with high-dimensional feature spaces and has been applied to measure the similarities between two given time series [201].Many kernel-based methods corresponding to different measures of similarities and which efficiently overcome time series classification problems have been proposed [201], [207]- [209].Multiple kernel learning is an optimization problem [201] whose solution has been proposed by [210].They present an efficient algorithm that solves the multiple kernel learning problem and works with many samples or multiple kernels which need to be combined.
The Support Vector Machine (SVM), as a time series classification model, has been integrated with visual analytics systems [82], [94], [113]. This integration allows scientists and domain experts (e.g., in biology) with little background in machine learning to build classification models with high levels of accuracy [82]. Lu et al. [113] supported the creation of an SVM model along with two other types of models, Linear Regression and Multilayer Perceptron, combining feature selection and model cross-validation through numerous interactive visualizations, which help analysts to build such a model. Kim et al. [94] developed a visual analytics tool that incorporates a machine learning algorithm (support vector machine) to predict coded undesired behaviors.

4) NEURAL NETWORKS (NN)
Neural networks are learning algorithms that mainly rely on statistics. This kind of algorithm learns from data using its own learned features [4]. Neural network algorithms have been used efficiently to solve several tasks. Classification, and time series classification in particular, has received attention using different kinds of neural networks, such as multi-layer perceptrons (MLP) [211], convolutional neural networks (CNN) [31], [212], [213], and recurrent neural networks (RNN) [214].
Multi-layer perceptrons (MLPs) are a type of neural network that has been used as a classifier. The architecture comprises fully connected layers, and each layer contains neurons with weighted interconnections between them [211]; these weights are called parameters. Neurons act as switching units that are interconnected through these weights. The aim of this model is to approximate a function (e.g., a classifier function) that maps the input values to a category (a class) by learning the parameters (weights) [4]. For time series classification, class labels should be given so that a learning function maps the series into an appropriate class. Thus, the weights are learned by finding the best relationship between time series and their appropriate classes [211]. From the visual analytics perspective, multi-layer perceptrons (MLPs) have been used by Lu et al. [113] in their visual analytics system. They use backpropagation and allow users to select which algorithm to use, set the number of folds for the stability test, train models to predict, and compare the available models.
Convolutional neural networks (CNNs) are a recently introduced kind of neural network developed for processing data that has a grid-structured topology, such as time series data (1-D grid) and image data (2-D grid of pixels). The CNN architecture comprises convolutional layers for spatially related feature extraction and fully connected layers used for classification. Convolutional layers are utilized as feature extractors, learning features by mapping the raw data into a feature space, and the trainable fully connected layers perform classification based on the features learned by the convolutional part. The convolutional part generally consists of multiple layers; each layer has three stages: the convolution stage (filter), the detector stage (activation), and the pooling stage [4]. The input and output of each stage are called feature maps [31]. In the training stage, the forward and backward propagation algorithms are used to train the CNN and estimate parameters. A gradient-based optimization method is utilized to minimize the loss function and update each parameter [213].
Unlike 2-D grid input (e.g., image data), convolutional neural networks for time series use a 1-D grid, so instead of holding raw 2-D pixel values, the input for time series classification is multiple 1-D subsequences. In this case, multivariate time series [31] are separated into univariate ones so that feature learning can be performed for each univariate series. At the end of feature learning, trainable fully connected layers are adopted to perform classification.
The univariate time series are fed as input into the convolutional layers, which learn features through convolution, activation, and pooling layers. The 1-D convolutional layer extracts features by applying dot products between transformed waves and a 1-D learnable kernel (filter) [215], computing the output of neurons that are connected to local temporal regions in the input. This stage is followed by the activation layer, which introduces non-linearity into the network, allowing more complex models to be learned [216]. In the pooling layer, a down-sampling operation is performed to reduce the resolution of the input time series [31], which in turn reduces complexity and generalizes features in the spatial domain. The feature maps extracted from multiple channels are fed into further convolutional layers and then passed as inputs to the fully connected layer. In the fully connected layer, the class scores are computed, where each of the resulting numbers corresponds to a specific class.
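The three stages described above (convolution, activation, pooling) can be sketched for a single univariate series as follows. This is a minimal illustration with a fixed, hand-picked filter rather than a learned one; the function names, the choice of ReLU as activation, and the non-overlapping max pooling are assumptions made for the example.

```python
def conv1d(signal, kernel):
    """Valid 1-D convolution stage: dot product of the kernel with each
    local temporal window (cross-correlation, as in most deep learning libraries)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def relu(values):
    """Detector stage: element-wise non-linearity."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Pooling stage: down-sample by keeping the maximum of each
    non-overlapping window; a trailing incomplete window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


# A triangular pulse and a simple difference filter (both hypothetical).
signal = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
kernel = [-1.0, 0.0, 1.0]
feature_map = max_pool1d(relu(conv1d(signal, kernel)))
```

In a trained CNN the kernel values are parameters updated by backpropagation, and several such filters run in parallel to produce multiple feature maps.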
Time series classification faces obstacles such as feature representations at different time scales, and time series data can be distorted by high-frequency perturbations and random noise [215]. Several multi-channel CNN architectures have been used for the task of time series classification [31], [212], [213], [215], [216]. The results of all adapted CNN classifiers are competitive in both classification accuracy and performance with regard to overcoming these challenges.
The classification algorithms applied in our surveyed papers are usually embedded in visual analytics systems [40], [82], [83], [94], [113]. The k-nearest neighbors, decision tree, support vector machine, and neural network algorithms are used in some recent works, but are not as common as clustering techniques.

V. VISUAL ANALYSIS
A. VISUALIZATION TECHNIQUES
Visualization transforms symbolic data into geometric data [217]. The result of this process can help people to understand the data by presenting it in a graphical format, helping users or analysts to observe, analyze, make decisions, and identify patterns and correlations. Visualization can also reveal information and relations between data which might not be recognized when looking at numerical data [218]. In this way, it can aid scientific discovery and enhance the likelihood of gaining deep and unexpected insights, which sometimes leads to new hypotheses. At a basic level, time series data (e.g., from sensors) is presented in 1-D charts, with multiple sensors displayed on the same chart or on linked charts. Different visualization techniques (ripple, stacked, river, stream) and interaction techniques (zoom, pan, select) allow the user to select the time duration and obtain visual feedback. Interaction with the linked view will highlight regions in the time series, and any pattern recognition techniques will highlight data in the time series, helping to understand and analyze data over time [38], [48], [59], [86], [90], [93]. With stacked, river, and stream graphs, each item is displayed as a colored current whose height changes continuously as it flows through time. The overall shape comprises all the items considered, and it can provide an overview of the topics that are important at points in time. Various possibilities for interaction are used, which allow users to browse and zoom into details of the time duration, as well as to select from the shape.
For time series data, a good visualization helps users not only to create interesting images or diagrams, but also to amplify cognitive performance. Thus, visualization should communicate with the mind to simplify the data complexity. Aigner et al. [15] present three main criteria, namely expression, relevance, and effectiveness, that need to be satisfied in order to achieve a good visualization, exploiting both human visual perception and the processing power of computers.
In this survey, visualization techniques are divided into nine categories. These classifications draw from the comprehensive vocabulary of the visualization taxonomy presented by Borkin et al. [219]. This taxonomy is used and modified to include all visualization techniques that are used in our surveyed papers, which are summarized in Table 1. From the surveyed papers, it can be noticed that while some techniques dominate others, they share the same goal, which is to present as much information as possible in the display to the user. Thus, there is a wide pool from which to select visualization techniques that can smoothly deal with big data in order to reduce data size and produce a visualization structure which allows the user to explore, analyse, and understand the data.
In the same context, Table 1 shows an increasing trend of using a variety of visualization techniques with time series data. Line plots, geographic maps, heat maps, histograms, and bar graphs are the most commonly used techniques in the surveyed papers. Most of them are used to give an overview of the dataset by displaying the time-dependent relations of actions. In contrast, some visualization techniques are rarely used, such as tessellation and streamgraphs, while others are presented as new visualization techniques, such as time-of-saliency and KnotLines.

B. INTERACTION TASKS
Visual analytics merges machine and human capabilities to facilitate the exploration, analysis, and understanding of data and methods, and to provide insights from exploratory analysis. Visual analytics presents the chance for analysts, through interaction tasks, to analyze, explore, reason, discover, and understand important structures in complex data and in the architecture of methods [20]. Thus, users can be involved in the process through interaction tasks that provide directed feedback to the system.
Early steps in visual analytics were taken by Tukey [220] in his work on exploratory data analysis, which encouraged support for direct interaction with data. Following this work, numerous interaction methods have been developed to support various types of data analysis and methods, assisting users and analysts to better understand, explore, analyze, and gain insights. Researchers in the field of visualization have made efforts to benefit from user interactions in order to achieve analytical reasoning and integrate users into a comprehensive visual analytics system [7]. Several works on different visualization tasks and interaction methods have been presented. These existing works can be classified into three categories, namely low-level tasks or interactions (e.g., [218], [221], [222]), high-level tasks (e.g., [223]-[225]), and multi-level tasks (e.g., [226], [227]).
In this work, we utilize the typology of abstract visualization tasks by Brehmer and Munzner [226]. Their typology provides potential for rigorous analysis as it does not only focus on low-level tasks and high-level tasks, but also addresses the gap between them; these tasks are termed multi-level tasks. This typology allows us to better interpret our survey from an interactive visual analytics perspective, given that it provides multi-level visualization tasks and a straightforward way of describing complex tasks as linked sequences of simpler tasks.
Brehmer and Munzner identified six main multi-level tasks which are related to visualization tasks in the surveyed papers. We briefly summarize each task with all its subtasks and comment on how they are used in the surveyed papers. In the high-level task (analysis), users or analysts can analyze data using visualization tools so that they can consume information in many domain contexts or produce new information using available resources such as existing data elements. In the mid-level task (search), users or analysts can search for elements of interest using visualization tools. The search task is classified into four types: lookup, browse, locate, and explore. In the low-level task (query), the users or analysts have already found targets; thus, they can identify, compare, or summarize the pre-found targets. The visualization tasks in our surveyed papers are summarized in Table 1 under the headings: Analysis, Search, Query, Encode, Manipulate, and Introduce.
From the surveyed papers, it can be noticed that low-level tasks are more commonly used than high-level abstract tasks.
As shown in Table 1, using visualization tools as high-level tasks to analyze data is rarely done in the surveyed papers. In contrast, low-level tasks are often used; for example, query tasks are often used to find targets. Selection and navigation interaction methods are also widely used to provide a range of different options which can be applied to any element in visualization systems. Moreover, the filtering method is frequently used when an individual view of sequence data needs to be filtered.

C. VISUALIZATION AND ANALYSIS TECHNIQUES
We have conducted the review from the perspective of the data mining and visualisation communities and how the two integrate to produce visual analytics systems. The data mining community utilises visualisation to a lesser extent, and with the specific goal of demonstrating the efficacy of methods under research. Images are intended to be static figures. There are many examples of t-SNE projections overlaid with colour to represent classification, used to convey how well a new technique performs or how well a data set can be processed. Another example is the use of heat maps to indicate which features from training sets contribute to the model classifier.
Visual analytics provides different perspectives and goals to satisfy user demands. Interaction becomes a key goal, where the system should impart more knowledge through the capability to interact with data or model parameters. This can lead to a different emphasis on the methods chosen to process the data. For example, t-SNE, although effective at revealing clusters, led to ineffective user interaction because of spatial inconsistency after the reduction to 2D used in the creation of the interactive user interface [131]. The alternative techniques PCA and UMAP projected similar data to similar spatial locations in 2D (Fig. 3). Feedback in user studies and from domain experts indicated that the latter dimension reduction approaches are more suited for deriving user interfaces [131].
Parallel coordinates is a familiar interaction tool in the visualisation community for exploring high-dimensional parameter spaces, but we saw no use of parallel coordinate visualisations as static images in the data mining literature. This is primarily because the technique is most useful for interrogating data when interaction is employed. Each axis can represent a parameter in the model or clustering approach, etc., allowing the user to experiment and gain feedback through alternate views [228] (Fig. 4). Indeed, the utility of these approaches comes from multiple coordinated views, where direct interaction in any of the views highlights the same selection in each view space (Fig. 3).
For temporal data with a spatial component, a common processing approach is to quantize locations, so that paths through the quantized locations create a motif which can be matched using similarity measures. 1-D curve similarity measures are employed directly on the data. Data from multiple sensors, weighted similarities, or higher-dimensional data are reduced in dimensionality (PCA, MDS, non-linear DR) before clustering. Similarity measures include Euclidean distance, among others.
User interaction is principally through the parallel coordinate plot to isolate the overlapping manifolds in the data [228].
If the similarity of each curve/path is computed against all other paths, the result is a symmetric square matrix where each entry represents the degree of similarity/dissimilarity. We can apply clustering techniques such as agglomerative clustering or DBSCAN to this matrix; hierarchical methods create a hierarchy which can be displayed as a tree structure (dendrogram [91]). Cuts can be taken through the tree to simplify the data. The tree provides a useful interaction interface to update and query results in the other linked windows. Dendrograms appear infrequently as static images in the machine learning literature, but again they make a useful interactive tool, since a cut through the dendrogram can produce a specific instance of a visualisation representing a different level of clustering (or data aggregation).
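The pipeline described above (pairwise distances, hierarchy, cut) can be sketched as follows. The sketch assumes equal-length series, Euclidean distance, and single-linkage agglomeration; these are illustrative choices, not those of any specific surveyed system, and all function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(series):
    """Symmetric square matrix of pairwise dissimilarities."""
    n = len(series)
    return [[euclidean(series[i], series[j]) for j in range(n)] for i in range(n)]


def single_linkage(dist):
    """Agglomerate bottom-up; return merges as (left_members, right_members, height)."""
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


def cut_tree(n_items, merges, height):
    """Flat clusters obtained by applying only the merges at or below `height`."""
    label = list(range(n_items))
    for left, right, d in merges:
        if d > height:
            break  # single-linkage heights are non-decreasing
        for i in right:
            label[i] = label[left[0]]
    groups = {}
    for i, lab in enumerate(label):
        groups.setdefault(lab, []).append(i)
    return sorted(groups.values())


# Four hypothetical series forming two obvious groups.
series = [[0, 0, 0], [0, 1, 0], [5, 5, 5], [5, 6, 5]]
merges = single_linkage(distance_matrix(series))
```

In an interactive system, the `height` argument is what a user would control by dragging a cut line across the dendrogram; each position yields a different level of aggregation in the linked views.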
Data can be segmented using time-based criteria (e.g., one hour, one day) or pattern-based criteria (e.g., recognising a pattern using a variety of similarity measures or change detection). The segmentation results are visualised or used as input for further processing steps [53]. Users can influence segmentations indirectly through the choice of segmentation algorithm or changes in its parameters, or directly by selecting and labelling the data.
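The two segmentation styles can be sketched as below. The change-based variant uses a deliberately simple rule (a jump between consecutive samples larger than a threshold) as a stand-in for the richer similarity or change-detection measures mentioned above; the function names and the `window` and `threshold` parameters are assumptions, and they are exactly the kind of values a user might adjust through the interface.

```python
def segment_by_window(values, window):
    """Time-based segmentation: fixed-length windows (the last may be shorter)."""
    return [values[i:i + window] for i in range(0, len(values), window)]


def segment_by_change(values, threshold):
    """Pattern-based segmentation (simplified): start a new segment whenever
    consecutive samples differ by more than `threshold`."""
    if not values:
        return []
    segments = [[values[0]]]
    for prev, cur in zip(values, values[1:]):
        if abs(cur - prev) > threshold:
            segments.append([cur])
        else:
            segments[-1].append(cur)
    return segments
```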
Visualising segmented data offers significant visual cues for determining outliers or clusters of data. For 1-D data, multiple segments can be plotted on charts (and multi-dimensional data on linked charts). Trends, clusters, and outliers can be detected visually [59]. Interaction can allow brushing in the chart to remove, select, label, or highlight groups of associated data. Queries can be generated using slope tools or ranges, and curves will either match or not match such queries. These queries can be stored for future use to act as triggers or stored procedures on the data.
Apart from 1-D charts, another main approach is to use a radial depiction of data. The data can be visualised as line or bar charts in a circle, with the x-axis around the circle and the y-axis pointing away from the circle. Typically, the x-axis represents time, with multiple axes radiating from the centre indicating durations (e.g., hours in a day, days in a month, or months in a year). Transformed data may place spatial coordinates on the x-axis (with a map central to the visualisation), and the y-dimension could be time, with distance from the y-dimension then indicating further attributes such as the intensity of the sensor (e.g., pollution levels [86], [89]), so that shells of data appear around the circle (stacked/river charts). Multiple small versions would create glyphs, or a single view can be linked to other views which offer more detail.
Calendar views [53], [78] also offer successful interaction, allowing visualizations to aggregate according to the days selected. Selections can involve months, a certain day of the week, and workdays versus weekends. A secondary view based on the above chart or network views can offer focus-and-context, associating the detailed view with the overall context of the annual view. The calendar view utilizes colored patterns to indicate different clusters; selected elements become active and larger, which causes unselected elements to become smaller. Unlike radial plots, the calendar view allocates the same amount of screen space to individual patterns, giving them equal visual importance [53] and making it possible to see at which time stamps the temporal clusters occur.
The similarity matrix also serves as a useful visualization and interaction tool, and is displayed using color mapping, e.g., resulting in a heatmap [52], [94]. Rows and columns can be sorted to reveal patterns. Individual selections in the heatmap highlight data pairs in the source data. Larger selections highlight groups of data with the degree of similarity chosen. Selections are linked to other views of the data. Sorting can also be applied to any of the other linked views; e.g., multiple bar charts can be sorted by decreasing similarity to a user-selected pattern [83], [85]. The similarity matrix can also be used for network graphs. Different colors and pixels are used to represent the data, emphasizing the relationships between elements. The similarity matrices show to what degree the clustering would change for the next parameter setting. In this kind of visualization, the user can select a similarity threshold and an algorithm, which helps to perceive the dynamic network from different perspectives.
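Two of the interactions mentioned above, sorting by similarity to a user-selected pattern and selecting pairs under a similarity threshold, reduce to small operations on the distance data. The sketch below assumes Euclidean distance and equal-length series; the function names and the idea of passing the selected pattern as an index are illustrative assumptions.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(series, pattern_index):
    """Indices of all series ordered from most to least similar to the
    user-selected pattern (smallest distance first)."""
    ref = series[pattern_index]
    return [i for _, i in sorted((euclidean(ref, s), i) for i, s in enumerate(series))]


def pairs_within_threshold(dist, threshold):
    """Pairs (i, j), i < j, whose distance is at most the chosen threshold;
    these are the cells a heatmap selection would highlight."""
    n = len(dist)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if dist[i][j] <= threshold]
```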
With regard to graph/network data, networks are often visualized directly as node-link diagrams, which results in clutter [52], [54], [78]. Standard techniques are used to simplify the graphs, such as edge bundling, weighting edges according to the linkage, or higher-order curves to emphasize path connectedness. Node-link diagrams can be converted to a matrix view, with each matrix element storing the edge weight between the two nodes. The matrix can be visualized directly (with edge weight mapped to color). The network view provides an overview of the clustered nodes which behave similarly over time, and edges reflect connections between these clusters.
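The conversion from a node-link representation to a matrix view is a simple data transformation, sketched below. The edge-list format `(u, v, weight)`, the assumption of an undirected graph, and the use of 0.0 for absent edges are choices made for this example.

```python
def adjacency_matrix(n_nodes, edges):
    """Build a weighted adjacency matrix from (u, v, weight) triples.

    Assumes an undirected graph, so the matrix is symmetric; entries
    without an edge stay at 0.0.
    """
    matrix = [[0.0] * n_nodes for _ in range(n_nodes)]
    for u, v, weight in edges:
        matrix[u][v] = weight
        matrix[v][u] = weight
    return matrix
```

Each cell of the resulting matrix can then be mapped to a color, and rows and columns can be reordered to bring clustered nodes together.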
Visual analytics systems offer direct views of the data (e.g., visualisation of the raw accelerometry data (Fig. 3 top)) or abstract views (Fig. 3 bottom), where data has undergone processing such as dimension reduction to create the interactive interface. Throughout our study, the essence of visual analytics is to provide multiple coordinated concrete and abstract views of data. This allows interaction with parameter spaces to enable human cognition to play a vital role in information and knowledge discovery.
Analysts usually change their exploration strategies and switch between analytical techniques and visualizations to collect different findings. However, these analytical techniques (black-box methods) might confuse end-users or provide results that do not lead to a solution to the problem, and some of them require user action; for example, k-means requires the number of clusters to be specified. To be beneficial in visual analytics, the analysis techniques should respond fast enough for efficient interaction, the parameters of the analytical technique have to be representable and understandable through the visualizations, and the parameters have to be adjustable by visual controls [20].
There are numerous challenges associated with visual analytics system usability and process understanding. To obtain more confidence, the user should be aware of the source of the data and the transformations that have been applied on its way through the processing stages (e.g., preprocessing, analysis tasks, and visualization techniques). Rapid feedback is important in visual analytics interfaces, which poses challenges for several of the domains related to visual analytics. Due to the complexity of human interaction, evaluating visual analytics systems is especially complicated, and integrating machine learning algorithms into these systems adds further complexity and opens questions such as how the model succeeds or determines what a good solution is, why a model predicts a value, or why a model provides a classification label, which are sometimes beyond the scope of interactive visualization issues. These questions are very important in order to understand the model outputs and provide appropriate visual representations and interaction techniques. Some works such as [229]-[231] shed light on the black boxes of classification and clustering algorithms and explain their decisions, which helps to understand these algorithms and enables the comparison of different prediction methods.

VI. EVALUATION APPROACHES
A systematic evaluation, controlled by a set of standards, identifies and validates the degree of achievement or value of proposed systems, techniques, methods, and algorithms. Since the design space of visualization systems is massive, Munzner [227], [232] subdivided this complex problem into four sequential layers that separately address various concerns, presenting a nested model for visualization design and validation. At the top level, details of a specific application domain are considered. Next is the design of the data and task abstraction. The following level concerns the design of visual encoding and interaction, while the last level involves the design of algorithms.
This research utilizes Munzner's work [227], [232], which presents different appropriate evaluation approaches at each design level, including field/case studies, controlled lab/user studies, usability studies, heuristic evaluation, and algorithm performance. These approaches were applied to our surveyed papers (summarized in Table 1). At the top level, field studies or case studies form the most common evaluation approach, where investigators gather qualitative data through semi-structured interviews and by observing people's actions in real-world settings. At the abstraction level, field studies or case studies are also used as qualitative validation, in which members of the target user group are observed and their use of the deployed system is documented. At the visual encoding and interaction idiom level, a controlled lab study or user study is used as an evaluation approach. Through this, quantitative measurements (e.g., time, errors, quality, and preferences) are collected as well as qualitative measurements (e.g., questionnaires and qualitative discussion). Also at this level are usability studies, another qualitative evaluation approach which aims to show that the deployed system is usable. Heuristic evaluation is another validation approach, drawing on both quantitative and qualitative measurements, that involves experts in the field to ensure that the visualization design does not violate any guidelines used to justify the usability of a visualization system. At the algorithm design level, quantitative evaluation is used to validate the performance of algorithms, such as their speed and computational complexity.
The evaluation approaches which have been applied to our surveyed papers are classified into five categories adopted from Munzner's work [227], [232]: case studies, usability studies, controlled user studies, algorithm performance, and others. Table 1 summarizes the evaluation approach used in each surveyed paper, organized by year. It should be noted that the case study approach is the most commonly used in the surveyed papers compared to other evaluation approaches. We have also noticed from our survey that other evaluation approaches are used, such as comparison against ground truth [52].

VII. INTEGRATION OF VISUALIZATION AND ANALYSIS TECHNIQUES
A. MODEL BUILDING VISUALIZATION
The above processes, visualizations, and interactions can result in a large corpus of labelled data suitable for visual and statistical interrogation. Additionally, labelled data is useful for model building, using data mining approaches as discussed earlier. Such models can be used to aid the user with further segmentation and labelling of the data [40], [57], [83], building predictive models for the future [113], and identifying patterns and behaviour of systems or individuals in the data [94]. By exposing algorithm choice through the interface, along with parameters, the user can play an interactive role in deciding the best approach for their data [44], as effective algorithms for time series analysis require careful choices of approaches and parameters in order to solve clustering and classification tasks. We notice from our survey that several interaction methods are not specific to data only. Interestingly, a variety of interactive tools are combined to support analysts in algorithm selection (e.g., [89], [113]), training (e.g., [44], [113]), and testing (e.g., [113]). Moreover, several systems interactively provide analysts with a variety of control options for time series analysis tasks, such as controlling algorithm parameters (e.g., [38], [44], [52], [78], [94]) or controlling thresholds (e.g., [35], [37], [40], [44], [52], [83], [88], [95]). For example, overplotting matched time series data leads to new interfaces where direct data selection can accept or reject data from the view without the need for further model training. Exposing model parameters to the user allows the understanding of their inter-relationships and how they impact the performance of the algorithm (e.g., feature detection) (Fig. 5).
Visual feedback of the model using the visualizations and interaction results in the effective capture of domain knowledge, fulfilling the definition of visual analytics and including humans in the loop. The models range from pure clustering, such as clustering patients on medical records, which can lead to predictions about how an individual patient's condition will evolve [46], to classification techniques such as SVM [82], [94]. Such models can be used to aid the user with further interactive clustering while representing data samples as discussed earlier. The user can interact with the view in (c) to select and "delete" undesirable matches [83].
Examples of the interaction between the end-user and a visual analytics system include choosing the analysis algorithm (such as k-means, hierarchical clustering, or the self-organizing map) and giving feedback during the analysis process, as with k-means, which requires the user to specify the number of clusters as input. Therefore, the implications of visual analytics and of the end-users' goals for the choice of analysis algorithms are fundamental and require further investigation in terms of what kind of visual controls are required to manage the algorithm and assess the quality of the proposed solutions side by side with the interactive visual representations.
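To make the role of the user-supplied number of clusters concrete, the following is a minimal sketch of k-means (Lloyd's algorithm) in which `k` is an explicit input, as it would be if bound to a slider or text field. Initialising the centroids with the first `k` points is an assumption made here for determinism; practical implementations typically use random or k-means++ initialisation.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; `k` is supplied by the user.

    Centroids start at the first k points (deterministic, for illustration).
    Returns (labels, centroids).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids
```

Re-running the function with a different `k` and redrawing the linked views is the simplest form of the feedback loop described above.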

B. EMERGING TRENDS
The merger of visual analytics and machine learning offers many potential opportunities for time series data analysis. However, a large effort is still needed at the algorithmic and software levels to help embed fast machine learning techniques in visual analytics systems. From a performance perspective, dealing with massive datasets, in terms of the quantity and speed of data to be visualized and interacted with in real time, is crucial in visual analytics systems. Therefore, response times are very important, and such factors can play a major role in an interactive visualization. Thus, developing fast machine learning for interactive visualization is one of the open research topics associated with integrating the two domains.
Also, one of the major technical barriers is that the existing software tools are highly divided between these two domains. For example, visualization tools are often written in programming languages like C++ or using libraries such as D3.js (a JavaScript library), which are powerful with regard to maintaining close control over the visualization technique and user interaction. On the other hand, most advanced machine learning algorithms are written using libraries in statistical or programming languages such as MATLAB, R, or Python (machine learning libraries like scikit-learn, TensorFlow, etc.), which aim to learn complex models from (often large amounts of) data but provide limited interactive information visualization. Therefore, there is an urgent need for a standard software environment which can assist visual analytics developers in integrating machine learning techniques effectively and efficiently into interactive visualization systems [10].
Recent visualization research has seen an increased use of sophisticated algorithms, especially projection-based methods which have a stochastic nature [54], Ali2019. Thus, the outputs of these algorithms may depend on different settings, e.g., random initialization, which sometimes have major effects on results and evaluations. These algorithms should be assessed in terms of their robustness, generalizability, stability, sensitivity, etc. The robustness of an algorithm concerns its ability to handle any kind of input. An algorithm's generalizability signifies that it can be generalized to a larger dataset (unknown data) than the dataset (small known data) used in the training process. Stability analysis refers to the analysis of errors in numerical computation (if the errors grow, the algorithm is numerically unstable, and if the errors are damped, the algorithm is stable). The sensitivity analysis of algorithms involves analyzing how the outputs change with respect to the inputs. Therefore, visual analytics developers must take into account these factors alongside others which may have major effects on visualization results [7].
Moreover, the embedding of some machine learning algorithms within visual analytics systems for time series data is still a relatively young and emerging field, even though these algorithms have received wide attention in the data mining and machine learning community. To mention but a few, Discrete Fourier Transform (DFT) and Discrete Wavelet Transform (DWT) have rarely been used as dimensionality reduction techniques by visual analytics researchers, although these techniques have achieved good results for time series data in the data mining field. Moreover, some clustering algorithms, such as the fuzzy clustering methods, and classification algorithms are currently under-represented in visual analytics works but are successful in the data mining community, and are therefore something that visual analytics researchers should include in their future works.
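To illustrate how the DFT can act as a dimensionality reduction step, the sketch below keeps only the leading Fourier coefficients of a real-valued series and reconstructs an approximation from them. It is a naive O(n^2) implementation for clarity, not an FFT, and the helper names `dft_features` and `reconstruct` are hypothetical.

```python
import cmath
import math


def dft(series):
    """Naive discrete Fourier transform of a real-valued sequence."""
    n = len(series)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series))
        for k in range(n)
    ]


def dft_features(series, n_coeffs):
    """Reduced representation: the first n_coeffs complex DFT coefficients."""
    return dft(series)[:n_coeffs]


def reconstruct(features, n):
    """Approximate a real series of length n from its leading coefficients.

    Uses conjugate symmetry of the DFT of real data, so every kept
    frequency index must lie strictly below n / 2.
    """
    if len(features) < 1 or len(features) - 1 >= n / 2:
        raise ValueError("need at least one coefficient, all below the Nyquist index")
    out = []
    for t in range(n):
        total = features[0].real
        for k in range(1, len(features)):
            total += 2 * (features[k] * cmath.exp(2j * math.pi * k * t / n)).real
        out.append(total / n)
    return out
```

For a smooth series, a handful of coefficients is often enough for a close reconstruction, which is why such features can serve as a compact input to clustering or to a projection view.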
There are several challenges which we perceive as interesting research directions for combining machine learning and visualization techniques. Firstly, there is no existing unified or systematic solution to support the user, which explains the scarcity of classification algorithms used in the surveyed papers. Secondly, there is a visualization challenge in terms of clarifying the reasons why such algorithms demonstrate impressive classification performance.
One interesting potential research direction of combining the two fields of machine learning and visualization techniques is building user-driven algorithms specifically geared for a visual analytics approach to overcome difficult challenges for time series data. Involving users in the process, through interactive methods, allows them to provide directed feedback to the system. Formulating a user-centric approach through combining automated analysis methods and interactive visualization is an efficient approach to visual analytics. This puts emphasis on the visualization community to apply visual interfaces to existing algorithms provided by the data mining community.
Deep learning algorithms (e.g. CNN, RNN and LSTM) are often perceived as black-box models due to their ambiguity and unclear working mechanisms [233]. Although these algorithms have been used for time series data by the data mining community, there is little work on CNN, RNN or LSTM with visual analytics. This leads to the other interesting potential research direction of combining the two fields, as there is no clear understanding of why deep classification algorithms achieve such high performance when solving such a task. Thus, visualization techniques are needed to explore such complex models as well as illustrate and explain their internal operation and working mechanisms. This would allow researchers to gain general insights and obtain an overview of how to control and improve such models. Efforts have been made in the field of computer vision to clarify the learned features of deep learning algorithms on image data. The existing methods of previous works can be categorized into two different groups: code inversion (e.g. [234]-[236]) and activation maximization (e.g. [237]-[239]). In the field of visualization, a set of visualizations have been developed to help machine learning experts clearly understand such deep complex models (e.g. [240], [241]). Liu et al. [240] have recently presented an interactive visual analytics approach which allows for the better understanding, diagnosis, and improvement of deep CNNs.

VIII. CONCLUSION
This research is considered a comprehensive survey for time series data, focusing equally on both machine learning and visualization from the visual analytics perspective. Time series data can be obtained from different sources, which have been categorized into four types based on the surveyed papers. During this research, we focused on two mining tasks: clustering and classification. We first review both tasks from the data mining perspective. They achieve great performance and accuracy when dealing with time series data. This success led us to review a promising field where both automated analysis techniques and interactive visualizations can be combined to easily understand, explore and analyze large and complex datasets. We cover over 60 papers in detail, which were selected with the criteria that every paper must involve time series data and visual analytics, using either clustering or classification tasks. It can be noticed from the surveyed papers that visual analytics works use clustering more often than classification. Because of a lack of labeled data, keeping humans in the analysis loop is paramount in order to help users adjust and explore the influence of different clustering choices during the analysis process. Visualization and interaction techniques are also surveyed in the reviewed papers and classified based on previous literature. Such classifications have been modified to be compatible with the surveyed papers. The evaluation approaches of every paper were also studied and categorized. As a result, researchers can use this review as a guide for new investigations. In the end, we believe that this paper is a starting point towards clarifying the major concepts that have been presented, and provides a valuable guide to the emerging field of integrating data mining techniques with visual analytics.

FIGURE 1. This survey focuses on the intersection between time series data, machine learning techniques (clustering and classification), and interactive visualizations.

FIGURE 2. The time series data pipeline used to structure our surveyed papers. From the surveyed papers, the items in the cloud are usually integrated into one visual analytics system which is evaluated using the various evaluation approaches surveyed in Section VI.
present a comprehensive review for time series measures, classifying them into four major categories: lock-step measures (e.g. Euclidean distance and Manhattan distance), elastic measures (e.g. longest common subsequence [LCSS] and dynamic time warping [DTW]), pattern-based measures (e.g. spatial assembling distance [SpADe]) and threshold-based measures (e.g. threshold query based similarity search [TQuEST]). Pattern-based measures and threshold-based measures are out of this work's scope as they are not used in the surveyed visual analytics papers.
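The difference between a lock-step measure and an elastic measure can be made concrete with a small sketch. The code below, assuming univariate series and an absolute-difference local cost, compares Euclidean distance with a textbook dynamic-programming DTW; it has no warping-window constraint and runs in O(nm) time, so it is illustrative rather than optimized.

```python
import math


def euclidean(a, b):
    """Lock-step measure: compares the i-th point of a with the i-th point of b."""
    if len(a) != len(b):
        raise ValueError("lock-step measures need series of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Elastic measure: dynamic time warping with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also matched an earlier point of b
                cost[i][j - 1],      # b[j-1] also matched an earlier point of a
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]
```

For two copies of the same bump shifted by one time step, the Euclidean distance is non-zero while DTW can align the bumps and report zero cost, which is the behaviour that makes elastic measures attractive for pattern matching on misaligned series.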
list several methods for reducing time series dimensionality as feature extraction, including Discrete Fourier Transformation (DFT), Discrete Wavelet Transformation (DWT), Discrete Cosine Transformation (DCT), Singular Value Decomposition (SVD), Adaptive Piecewise Constant Approximation (APCA), Piecewise Aggregate Approximation (PAA), Chebyshev polynomials (CHEB), and Symbolic Aggregate approXimation (SAX). The types of methods we discuss below are intended to provide examples of popular feature-based techniques, not to define a rigid taxonomy of methods.
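As one example from this list, Piecewise Aggregate Approximation replaces a series with the means of consecutive segments. The sketch below is a simplified variant that assumes integer segment boundaries (when the length is not divisible by the number of segments, segment sizes differ by at most one point, rather than using fractional weights); the function name `paa` is a hypothetical helper.

```python
def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each of n_segments chunks."""
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    reduced = []
    for s in range(n_segments):
        # Integer boundaries; every segment holds at least one point.
        start = s * n // n_segments
        end = (s + 1) * n // n_segments
        segment = series[start:end]
        reduced.append(sum(segment) / len(segment))
    return reduced
```

For instance, `paa([1, 2, 3, 4, 5, 6, 7, 8], 4)` averages consecutive pairs, so a series of length 8 is reduced to 4 values that can be fed into a distance measure or discretized further, as SAX does.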

FIGURE 3. UMAP clustering of time-series animal behavioural data leads to consistent neighbourhoods in the 2D interface (compared to t-SNE, which does not). Also shown are a k-nn cluster and how pattern matching in multivariate data is achieved through the interface [131].

FIGURE 4. Parallel coordinates plot with annular and linear axes, colour coded splines representing the data, and density plots on the annular axes. The view is coordinated with the (PCA dimension reduced) point data (top right). A density rendering based on the data is given (top left). User interaction is principally through the parallel coordinate plot to isolate the overlapping manifolds in the data [228].

FIGURE 5. Cross-correlation in frequency space is used to find matching time-series patterns with low computational complexity. (a) The user can interact with the cross-correlation threshold, (b) and in a linked view see where the matches occur in the overall time-series. (c) Overplotting allows the user to inspect matching patterns. The cluster centre is plotted. The user can interact with the view in (c) to select and "delete" undesirable matches [83].

TABLE 1. The selected visual analytics papers of time series data. The table provides an overview of the surveyed papers regarding similarity measures and feature extraction, time series analysis techniques (clustering or classification), visualization techniques, visualization tasks and interaction methods, evaluation approaches, and distribution of papers by year of publication.