An Embedded System for Collection and Real-Time Classification of a Tactile Dataset

Tactile perception of material properties in real-time using tiny embedded systems is a challenging task and of great importance for dexterous object manipulation in fields such as robotics, prosthetics and augmented reality. As the psychophysical dimensions of the material properties cover a wide range of percepts, embedded tactile perception systems require efficient signal feature extraction and classification techniques to process signals collected by tactile sensors in real-time. For this purpose, we developed two embedded systems, one that served as a vibrotactile stimulator system and one that recorded and classified the vibrotactile signals collected by its sensors. The quality of the collected data was first verified offline using the Fourier transform for feature extraction and then applying powerful machine learning classifiers such as support vector machines and neural networks. We then implemented the proposed memory-less signal feature extraction method in order to achieve real-time processing as the data is being collected. The experimental results show that the proposed method significantly reduces the computational complexity of feature extraction and still leads to high classification accuracy even when its features are fed to less complex classifiers, such as random forests, that can be easily implemented on embedded systems. Finally, we also show that low-cost, highly accurate, and real-time tactile texture classification can be achieved using the proposed approach with an ensemble of sensors.


I. INTRODUCTION
With the recent advances in hardware design and VLSI technology, mobile embedded systems such as IoT and Edge devices have started to offer artificial intelligence (AI) services [1], [2]. Considerable scientific and technological efforts have been devoted to developing tactile sensing embedded systems with prospective applications in many fields, such as telehealth systems (e.g. remote examination, palpation, and surgery), smart prosthetics, and robotics with the sense of touch [3]-[9]. Pinker [10] described the complexity of human tactile capabilities as "Think of lifting a milk carton. Too loose a grasp, and you drop it; too tight, and you crush it; and with some gentle rocking, you can even use the tugging on your fingertips as a gauge of how much milk is inside!" Unlike humans, who can effortlessly perform object perception and manipulation tasks, tactile-enabled embedded systems are still primitive and need more research and development for tasks such as discriminating material properties (such as texture, hardness, roughness, and friction) [11], [12], dexterous object manipulation/grasping [6], [13]-[15], slip detection [16], and so on. With high performance platforms, such as GPUs and AI-accelerators, raw signals coming from tactile sensors can be processed using high complexity time-frequency transforms and/or deep learning systems for feature extraction and AI processing [17]-[19]. However, for embedded and Edge/IoT systems the real-time processing of such streaming input signals of various modalities with different frequency spectrums is a big challenge [20]-[22].

The associate editor coordinating the review of this manuscript and approving it for publication was F. K. Wang.
As forwarding the streaming raw data to servers for cloud processing might not be optimal due to one or more of the cost, energy consumption, and privacy issues, tactile-enabled embedded systems must be equipped with efficient feature extraction and AI processing methods to be able to detect patterns of interest in real-time. The goal of feature extraction on embedded platforms is to transform the raw signals read from sensors into a more descriptive domain such that simpler AI processing algorithms can perform at accuracy levels that are close to those obtainable in high performance offline platforms. Tactile intelligence ultimately requires real-time processing of information collected by an array of various types of sensors with dense spatial arrangement for operating multiple points of contact [23], [24]. In this paper, our experimental tactile dataset is obtained via a single contact point with the textured surface, and the proposed feature extraction algorithm is designed to fit on a low-cost tiny embedded device. Ultimately, such methods co-designed under the resource constraints of the embedded and Edge/IoT devices are expected to serve better for more advanced tactile information processing tasks, such as classifying the aforementioned wider variety of texture classes and tactile experiences. Ideally, such a real-time feature extraction method should be memory-less (without buffering the signal values) and thus require fewer memory accesses to fetch data; otherwise, dynamic power consumption will be higher due to the digital signal activities over the internal and external interconnects of the embedded device.
The deep learning approach has received increased attention due to the fact that the features it learns to extract in its early layers have similar properties to those extracted by biological neurons in the primary visual cortex (V1) [17], [18] (V1-like features include Gabor-like edge filters, gratings, and color blobs [25]). In our earlier work [24], [26], [27], we have developed a neurocomputational model; self-organized on natural images, the model learnt to extract features that closely match structural and functional properties of Layer 4 of the cat primary visual cortex [26]. Kursun and Favorov [24] have shown that these features can be used to perform efficient texture image classification and that, as in deep convolutional neural networks (CNN) [17], [18], they can be used by subsequent cortical areas to develop gradually more complex and perceptually advanced features. Texture classification using tactile information can also benefit from extraction of such robust and perceptually salient general purpose features. As some of the prominent features that neurons in our CNN-like model learn correspond to average power in various frequency bands in their local receptive fields, in this study, we investigate the effectiveness of such bandpower features for real-time tactile information processing on embedded and Edge/IoT systems. Developing such intelligent embedded devices for tactile processing is a relatively new and emerging field with many general applications of tactile technology [7], [8], [13], [14], [28] and more specifically in neuroscience for real-time animal neurophysiological experiments that can utilize wearable tactile devices and for developing diagnostic tests of neuropathies [27], [29]-[33].
To test the effectiveness of the proposed feature extraction method for tactile information processing, we developed an embedded system to record a pilot dataset [34] of tactile signals collected by a set of sensors. The dataset is first used offline to evaluate the discriminative power of these features in classifying various materials with different textures. Secondly, another embedded system is developed to evaluate the real-time performance of the tactile information classification based on the proposed cumulative multi-bandpower (CMB) features. The embedded system continuously applies CMB in the time domain and computes the bandpowers of the input tactile signals in frequency bands of gradually increasing ranges. Even with a small number of bands, the embedded system achieves high classification accuracy by applying the proposed CMB feature extraction method and a Random Forest classifier (an ensemble of rule-based decision trees) in real-time.
The Fourier Transform (FT) decomposes a signal into its frequency components and provides a very high resolution and lossless description (features) of the input signal. However, its computational and space complexity might overwhelm low-cost embedded systems. Features extracted in the time domain can achieve levels of signal classification accuracy comparable to those extracted in the frequency domain [17]-[19]. In order to achieve real-time processing on tiny embedded systems, we can exploit the trade-offs between the descriptiveness of the representation and its computational complexity by performing the computations of the feature extraction in the time domain, instead of the frequency domain. These simpler features can be characterized as lossy and lower resolution approximations to the frequency/power spectrum of the signal [19], [35]. The total power of a signal can be considered one of the simplest features of such a low resolution approximation to spectral analysis. The total-power feature can be computed by summing the squares of all frequency harmonics in the wideband decomposed by the Fourier transform. Moreover, this sum can also be computed in the time domain per Parseval's theorem [36] (as described in more detail in Section II). That is, performing the Fourier transform is not required to compute the total power of a signal; instead, it can be computed by summing up the squares of the signal amplitudes across the given time window. In addition to the wideband computation suggested by Parseval's theorem, extending this idea further by computing the bandpowers in cumulative frequency bands will complement/enrich the set of features extracted in the time domain and help obtain finer approximations to the power spectrum. We call this the cumulative multi-bandpower (CMB) feature extraction method.
The contributions of this paper are the following:
• Design and development of a data collection and texture classification embedded system with tactile sensors (the collected dataset is available at [34]),
• Development of a novel feature extraction method for signal processing in embedded and IoT systems and comparison with the Fourier transform.

VOLUME 8, 2020

The rest of this paper is organized as follows. Section II discusses the required background including Parseval's theorem and the basics of exponential smoothing and low-pass filtering. Section III describes our proposed embedded system and the collected tactile dataset. The proposed CMB feature extraction method is introduced and discussed in Section IV. The real-time embedded classifier and the classification results based on the CMB features are presented in Section V. Finally, Section VI concludes the paper.

II. BACKGROUND
We review Parseval's theorem, how it can be used to extract a set of simple yet powerful features (bandpowers in various frequency ranges) on embedded platforms, and its relation to exponential smoothing as background for the proposed CMB (cumulative multi-bandpower) feature extraction method. Combining Parseval's theorem and the exponential smoothing technique, CMB leads to a simple yet efficient implementation on the proposed embedded system for tactile signal processing/classification. CMB is designed to avoid the computational and space complexity of the Fourier transform at runtime, given the computational/memory limitations of target embedded systems.
Based on Parseval's relation, the energy of a signal recording, $x[n]$, can be determined either by adding up the energy of each sample (i.e., $|x[n]|^2$) in the time domain, or by summing (integrating) $|X(j\omega)|^2 / 2\pi$ over frequency in the frequency domain, as shown in Eq. 1:

$$\sum_{n=-\infty}^{\infty} |x[n]|^2 = \frac{1}{2\pi} \int_{2\pi} |X(j\omega)|^2 \, d\omega \qquad (1)$$
As Parseval's theorem relates the signal's total power in the time and frequency domains, it allows keeping track of the power in real-time without the need of keeping a sliding window of past signal samples for transforming into the frequency domain. Therefore, the theorem offers a method for staying in the time domain yet being able to do useful feature extraction in the frequency domain.
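Parseval's relation can be checked numerically on a short window. The following is a minimal sketch, not part of the original system: it uses a naive DFT (the helper `dft` and the test signal are written here purely for illustration), for which the discrete form of the relation reads $\sum_n |x[n]|^2 = \frac{1}{N}\sum_k |X[k]|^2$, and compares it with a running sum of squares that needs no sample buffer.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def energy_time(x):
    """Running sum of squares: one accumulator, no window buffer."""
    acc = 0.0
    for sample in x:
        acc += sample * sample
    return acc


def energy_freq(x):
    """Energy computed from the spectrum; needs the whole window in memory."""
    spectrum = dft(x)
    return sum(abs(c) ** 2 for c in spectrum) / len(x)


# Hypothetical test signal: two tones plus a DC offset.
signal = [0.5 + math.sin(2 * math.pi * 5 * t / 64)
          + 0.3 * math.cos(2 * math.pi * 12 * t / 64) for t in range(64)]

print(energy_time(signal), energy_freq(signal))
```

The time-domain version touches each sample once and keeps a single accumulator, which is the property the rest of the paper builds on.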
Exponential smoothing offers an efficient, memory-less approach to applying a low-pass filter to the stream of samples of a given signal [36]. The exponential moving average filter on the signal $x[n]$ is defined as in Eq. 2:

$$S_{\alpha}[n] = \alpha\, x[n] + (1 - \alpha)\, S_{\alpha}[n-1] \qquad (2)$$

As the smoothing factor $\alpha$ of the exponential smoothing decreases, high frequencies are attenuated more strongly. The angular cutoff frequency, $\omega_c$, can be taken as the half-power point (or −3 dB point) and computed as in Eq. 3, which can be converted to ordinary frequency as $\omega_c f_s / 2\pi$:

$$\omega_c = \arccos\left(\frac{\alpha^2 + 2\alpha - 2}{2\alpha - 2}\right) \qquad (3)$$

Figure 1 plots the cut-off frequencies corresponding to various $\alpha$ values.

FIGURE 1. The frequency response (magnitude response) of exponential filtering. Lower smoothing factors, $\alpha$, yield low-pass filters with lower cut-off frequencies. The cut-off is the angular frequency (in $\pi$ rad/sample) at which the DTFT magnitude goes below the plotted −3 dB attenuation level.
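The relationship between the smoothing factor and the half-power cut-off can be explored with a few lines of code. This is a sketch under stated assumptions: it takes the first-order recursion $S[n] = \alpha x[n] + (1-\alpha) S[n-1]$ as the filter, derives the cut-off by setting its squared magnitude response to 1/2, and uses function names (`ema_gain_squared`, `ema_cutoff`, `cutoff_hz`) that are introduced here for illustration only.

```python
import math


def ema_gain_squared(alpha, omega):
    """|H(e^{j*omega})|^2 of S[n] = alpha*x[n] + (1 - alpha)*S[n-1]."""
    r = 1.0 - alpha
    return alpha ** 2 / (1.0 + r ** 2 - 2.0 * r * math.cos(omega))


def ema_cutoff(alpha):
    """Half-power angular frequency in rad/sample.

    Returns None when the gain never drops to -3 dB below the Nyquist
    frequency (this happens for large alpha, including alpha = 1).
    """
    if alpha >= 1.0:
        return None  # alpha = 1 passes the signal through unchanged
    c = (alpha ** 2 + 2.0 * alpha - 2.0) / (2.0 * alpha - 2.0)
    if c < -1.0:
        return None
    return math.acos(c)


def cutoff_hz(alpha, fs_hz):
    """Convert the angular cut-off to ordinary frequency: omega_c * fs / (2*pi)."""
    wc = ema_cutoff(alpha)
    return None if wc is None else wc * fs_hz / (2.0 * math.pi)


for a in (0.05, 0.2, 0.5, 0.8):
    print(a, ema_cutoff(a), cutoff_hz(a, 200.0))
```

Lower values of `alpha` give lower cut-off frequencies, and values of `alpha` close to 1 have no −3 dB point at all, so they act as nearly all-pass filters.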

III. DATA COLLECTION SYSTEM
In this section, we describe the proposed data collection embedded systems. Figure 2a shows the overall architecture of the data collection system. We have developed two embedded systems, one that serves as a vibrotactile stimulator system and one that records and classifies these tactile signals collected from tactile sensors. The vibrotactile stimulator system serves for data collection in a controlled environment; it controls a stepper motor that rotates the drum. Considering the fact that the psychophysical dimensions of the material properties cover a wide range of percepts (such as roughness, softness, warmness, and friction) [11] and they require complex spatiotemporal analysis, we have limited our study to the machine perception/discrimination of various textures that can be sensed by sensors attached to a probe/stick touching the material surfaces via a single touch point. As the probe rubs against the surface of the textured material on the stimulator, the sensors attached to the probe capture the vibrotactile signals for real-time classification. The probe is 3D printed with high printing density so that it transmits the vibrations at its tip without distortion.
The stimulator system consists of a control unit, a motor driver module, and the rotating drum module. The control unit reads the experiment specifications (including the speed and direction of rotation) to control the drum accordingly. Figure 2b shows the physical implementation of the system. The diameter of the drum is 7 cm and it rotates at a linear speed of 5 cm/s, which was chosen as a typical touch velocity. For each texture, 20 seconds of recordings are collected (corresponding to nearly five rotations of the drum).

For this study, we have explored a number of commercial off-the-shelf sensors and embedded boards. To acquire multimodal tactile information, we studied various sensors, including accelerometers, piezo sensors (e.g. Piezoelectric Polymer sensor), motion sensors (e.g. Vibration module based on the vibration sensor SW-420) and microphones. For the recording embedded system, we have used the Arduino UNO SoC board to read and collect data at a flexible sampling rate. Based on the limited memory and performance budget of the embedded system used in this study, we have chosen the 3-dimensional accelerometer sensor (MMA-7660 from NXP Company [37]) and an electret condenser microphone (CMA-4544PF-W from CUI Company [38]) as the sources of recordings in our tactile dataset. Although using a richer combination of sensors on a more expensive embedded board would achieve even higher classification accuracy, developing efficient methods for real-time signal processing on embedded systems will help improve the throughput of both low-cost and advanced embedded systems.
We have used the on-chip analog to digital converter (ADC) of the AVR processor available on the Arduino board. The AVR's ADC is set to work with the maximum available clock speed which is the Arduino's Clock/128 = 16 MHz/128 = 125 kHz. Based on the technical details of the AVR's ADC, each analog to digital conversion operation takes 13 ADC clocks that makes a total of 104 µs for each conversion. This yields the highest available sampling rate of 9615 Hz on our Arduino data collecting system. Knowing this limitation, we have set up the sampling rate of data collecting system to 200 Hz for the accelerometer to collect the motion data and to 8 kHz for the microphone to collect the sound data.
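The sampling-rate limit above follows from simple arithmetic, reproduced here as a short sketch. All constants are the figures quoted in the text; the helper `fits` is a hypothetical name added for illustration.

```python
F_CPU_HZ = 16_000_000        # Arduino system clock quoted in the text
ADC_PRESCALER = 128          # ADC clock divider quoted in the text
CLOCKS_PER_CONVERSION = 13   # ADC clocks per conversion quoted in the text

adc_clock_hz = F_CPU_HZ / ADC_PRESCALER
conversion_time_us = CLOCKS_PER_CONVERSION / adc_clock_hz * 1e6
max_sampling_rate_hz = adc_clock_hz / CLOCKS_PER_CONVERSION


def fits(requested_rate_hz):
    """True if a requested sampling rate is within the ADC limit."""
    return requested_rate_hz <= max_sampling_rate_hz


print(adc_clock_hz, conversion_time_us, max_sampling_rate_hz)
print(fits(200), fits(8000))
```

Both chosen rates (200 Hz and 8 kHz) fall below the limit of roughly 9615 Hz, which is why 8 kHz is close to the highest practical rate for the microphone channel on this board.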
At every sampling instance, the proposed data collection system reads a new data sample and sends it through serial USART communication to the computer for populating the tactile dataset. The 3-dimensional accelerometer sensor that we used for data collection measures movements in the X, Y, and Z directions and gives a total of 3 values per sample, from which the acceleration can also be computed. Movement recordings are integer values that are sent to the base computer for storage. The statistics of recorded values from the sensors are given in Table 1. As the readings to be transferred to the computer (for populating the dataset) have different orders of magnitude, the transfer time is not fixed because the Arduino's USART communication converts data to strings before the transfer. This negatively alters the sampling period of the sampling loop. To avoid sampling rate variation, we have set one of the AVR on-chip timers to interrupt the processor at a fixed interval (determined by the desired sampling frequency); the interrupt calls the sensor reading and data transfer routines.
We have used commercial off-the-shelf embedded boards and electrical components (AVR-based embedded boards, stepper motors, etc.) as well as our own designed and 3D printed mechanical components (including the rotating drum glued with different texture strips). The collected tactile dataset has 12 texture classes and Figure 3 shows an exemplary subset of texture strips that were used for the experiments. Textures include sandpapers of various grits, Velcro strips with various thicknesses, aluminum foil, and rubber bands of various stickiness. The dataset collected is available upon request and can be found at [34].
To validate the data collection system, we analyzed the recordings of the 12 texture classes using the discrete time Fast Fourier Transform (FFT). As outlined in Section V, one to three seconds of contact with these texture materials was sufficient to distinguish them. The recordings are first tested offline and shown to include discriminatory information for texture classes. We cropped a number of training examples from each texture; we used 256-sample windows for each example (which corresponds to about one second of accelerometer recording). Then, we applied the Fourier transform for feature extraction and tested how discriminable the classes are using various machine learning algorithms, including K-nearest-neighbor (KNN), support vector machines (SVM), and random forests (RF) [39], [40]. Window sizes smaller than 256 samples (e.g. 128) correspond to well under one second of the accelerometer recordings (128 samples span 0.64 s at 200 Hz, and much less for the sound signals sampled at 8 kHz) and result in significantly lower classification accuracy. However, even with a window length of 128 samples, the FFT-based implementation on the embedded system failed due to the data memory limitation. A method that avoids the window-based buffering of sample readings is proposed and discussed in the next section.
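A rough memory budget helps explain why windowed FFT processing is hard to fit on such a board. The sketch below rests on assumptions that are not stated in the text: 2 KB of SRAM (the figure for the ATmega328P used on the Arduino UNO), 4-byte floating-point samples, and an in-place complex FFT that stores real and imaginary parts. The function names are hypothetical.

```python
SRAM_BYTES = 2048      # assumption: ATmega328P SRAM on the Arduino UNO
BYTES_PER_FLOAT = 4    # assumption: single-precision samples


def fft_buffer_bytes(window, channels=1):
    """Bytes needed for in-place complex FFT buffers (real + imaginary)."""
    return window * 2 * BYTES_PER_FLOAT * channels


def window_seconds(window, fs_hz):
    """Duration covered by a window of samples at a given sampling rate."""
    return window / fs_hz


for n in (128, 256):
    print(n, fft_buffer_bytes(n), fft_buffer_bytes(n) / SRAM_BYTES,
          window_seconds(n, 200), window_seconds(n, 8000))
```

Under these assumptions a single 256-sample buffer already equals the whole SRAM, and a 128-sample buffer takes half of it before the stack, serial buffers, and classifier state are counted, which is consistent with the failure reported above.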

IV. PROPOSED CUMULATIVE MULTI-BANDPOWER FEATURE EXTRACTION METHOD
Embedding an efficient feature extraction method into the hardware platform shown in Figure 2 enables it to collect and classify the texture data in real time. Instead of buffering the samples of sensor readings in time windows for spectral analysis in the frequency domain, we propose a simple (memory-less) yet efficient (discriminatory) signal feature extraction method that is easily implemented on our intelligent embedded system for tactile classification. Such feature extraction methods are of significant importance for real-time, energy-efficient, mobile embedded systems. The memory and performance constraints of an embedded implementation may prohibit the use of resource demanding feature extraction methods such as the Fourier transform and deep learning. Our proposed feature extraction method computes a small yet descriptive set of statistics of the power spectral density of the streaming signals coming from the vibrotactile sensors. Parseval's theorem described in Section II allows the computation of the total power of a signal in the time domain. Combining this idea with exponential smoothing of the input signals, the bandpowers in various cumulative frequency bands can form a rich set of features extracted in the time domain in real-time. The proposed method, called the cumulative multi-bandpower (CMB) feature extraction method, is described below.
Let $x[n]$ denote the discrete readings obtained from a given sensor at time step $n$ (e.g., the data coming from one of the three channels of the accelerometer sensor; the feature extraction can be performed in parallel on each channel separately).
Parseval's theorem (Eq. 1) states that the total energy (thus, average power) of a signal can be calculated either using the amplitudes in the time domain or spectral power in the frequency domain. More specifically, summing power-per-sample across time (i.e. the sum of the squares of the amplitudes of the data samples) is another way of computing the total spectral power across frequency. On one hand, the FFT returns the power spectrum that precisely describes the distribution of power into individual frequency components of the given signal; on the other hand, working in the time domain with summations (accumulation of powers of samples, $x[n]^2$) provides a memory-less mechanism that can help the embedded system avoid the complex and computationally demanding FFT approach. To take advantage of both approaches, instead of using average power as the single feature over the whole frequency spectrum, we propose to extract an array of such features receiving their incoming samples from smoothened versions of the signal (i.e. low-pass filtered data samples with various pass-band/cut-off characteristics). We approximate the low-pass filters using exponential smoothing (see Eqs. 2 and 3) as defined in Eq. 4:

$$S_{\alpha_k}[n] = \alpha_k\, x[n] + (1 - \alpha_k)\, S_{\alpha_k}[n-1] \qquad (4)$$

for a set of $K$ smoothing factor values, $0 < \alpha_k < 1$, for $k = 1, \ldots, K$. Using lower smoothing factors, $\alpha_k$, yields low-pass filters with lower cut-off frequencies (i.e. lower values of $\alpha_k$ actually increase the level of smoothing; see Figure 1 for the relationship between exponential smoothing and low-pass filters). Let us assume the smoothing factors are sorted in decreasing order and let us include an additional $\alpha_0 = 1$, which does not perform any smoothing, $S_{\alpha_0}[n] = x[n]$, and which will be used for computing the average power of the signal as suggested by Parseval's theorem: $1 = \alpha_0 > \alpha_1 > \alpha_2 > \ldots > \alpha_K > 0$. Note that Eq. 4 can be computed memory-less without the need for storing the past $S_{\alpha_k}[n]$ values. In fact, the computational code performs the following assignment (Eq. 5):

$$S_{\alpha_k} \leftarrow \alpha_k\, x + (1 - \alpha_k)\, S_{\alpha_k} \qquad (5)$$
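The in-place update of the smoothed values can be sketched as a small bank of exponential smoothers. This is an illustrative sketch, not the authors' embedded code: the class name `SmootherBank` and the zero initial state are assumptions made here.

```python
class SmootherBank:
    """Bank of exponential smoothers with one state variable per alpha."""

    def __init__(self, alphas):
        # Expected order: alpha_0 = 1 > alpha_1 > ... > alpha_K > 0.
        self.alphas = list(alphas)
        # Assumption: every smoother starts from a zero state.
        self.state = [0.0] * len(self.alphas)

    def update(self, x):
        """Apply S_k <- alpha_k * x + (1 - alpha_k) * S_k for every k."""
        for k, a in enumerate(self.alphas):
            self.state[k] = a * x + (1.0 - a) * self.state[k]
        return self.state


bank = SmootherBank([1.0, 0.5, 0.1])
for sample in (1.0, 2.0, 3.0):
    print(bank.update(sample))
```

The storage is one number per smoothing factor regardless of how long the stream runs, and the first entry (with factor 1) simply tracks the latest raw sample.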
Having such an array of $S_{\alpha_k}$ data values, Parseval's theorem can now be applied to sum up the squares of these values in order to calculate average powers in gradually narrower bands of frequencies as shown in Figure 1 (due to the gradually lower cut-off frequencies of these consecutive low-pass filters). Let $F_{\alpha_k}$ denote the (average power) feature extracted for a given alpha value as in Eq. 6:

$$F_{\alpha_k}[n] = \frac{1}{n} \sum_{i=1}^{n} S_{\alpha_k}[i]^2 \qquad (6)$$

Note that we can avoid buffering $F_{\alpha_k}[n]$ values and again use exponential smoothing to estimate the sum of power-per-sample, $S_{\alpha_k}^2$, in our calculations (Eq. 7). Also note that using exponential smoothing applies exponentially decreasing weights over time, which fits well with the transient nature of the incoming data as the data changes from one texture class to another.

This set of $F_{\alpha_k}$, $k = 0, 1, \ldots, K$, features can discriminate signals based on not only their frequency components but also their amplitudes and DC-offsets ($c_0$ of FFT or the average amplitude). However, the following simple normalizations can be incorporated into the feature extraction for obtaining features sensitive only to frequency variations. To achieve DC-offset invariance, first we modify Eq. 6 as Eq. 8; for $k = 0$, we do not change the formulation of Eq. 6. As before, these $F$ values can be obtained using memory-less computation by exponential smoothing. Finally, the normalized features, $R$, are computed as ratios of these $F$ values.

It is straightforward to show that the $R_{\alpha_k}$ features are both amplitude-scaling and DC-offset invariant. Scaling the amplitude of the signal $x$ by a factor of $m$ scales all $F_{\alpha_k}[n]$ features for all $\alpha$ values by a factor of $m^2$, so the scaling can be cancelled out by using the ratios of the $F_{\alpha_k}$ values to each other. A good strategy to control the magnitude of the features would be to normalize each $F_{\alpha_k}$ by the previous feature, $F_{\alpha_{k-1}}$.
For simplicity of formulating the theory and the subsequent discussion, we choose to normalize by $F_{\alpha_1}$ and compute the normalized features, $R^{m,c}_{\alpha_k}$, of $x_{m,c}$ as the ratio of its $F_{\alpha_k}$ feature to its $F_{\alpha_1}$ feature. Clearly, $R_{\alpha_1} = 1$ and it can be omitted. Moreover, augmenting the set of $K - 1$ normalized features, $R_{\alpha_k}$, $k = 2, \ldots, K$, with $F_{\alpha_0}$ and $F_{\alpha_1}$ has the same descriptive power as the set of $K + 1$ unnormalized features, $F_{\alpha_k}$, $k = 0, 1, \ldots, K$. Keeping $F_{\alpha_0}$ and $F_{\alpha_1}$ alongside the scaling and DC-offset invariant normalized features can help machine learning classifiers detect average power and amplitude variations, as they also might be valuable sources of information.
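The amplitude-scaling part of this argument can be written out in a few lines. The derivation below is a sketch that takes $F_{\alpha_k}$ as the average of the squared smoothed samples, assumes a zero initial filter state, and covers scaling only (the DC-offset case depends on the modified features and is not repeated here). The superscript $(m)$ is notation introduced here for quantities computed from the scaled signal $m\,x[n]$.

```latex
\begin{align*}
S^{(m)}_{\alpha_k}[n] &= \alpha_k\, m\, x[n] + (1-\alpha_k)\, S^{(m)}_{\alpha_k}[n-1]
                       = m\, S_{\alpha_k}[n], \\
F^{(m)}_{\alpha_k}[n] &= \frac{1}{n}\sum_{i=1}^{n} \bigl(m\, S_{\alpha_k}[i]\bigr)^2
                       = m^2\, F_{\alpha_k}[n], \\
R^{(m)}_{\alpha_k}[n] &= \frac{m^2\, F_{\alpha_k}[n]}{m^2\, F_{\alpha_1}[n]}
                       = R_{\alpha_k}[n].
\end{align*}
```

The first line follows by induction on $n$ because the smoother is linear, and the last line shows why any common denominator among the $F$ features removes the factor $m^2$.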
As a first demonstration of the proposed feature extraction method, we use simple sine waves with various frequencies, amplitudes, phases, and DC-offsets. For this aim, we define the following six functions with $w_1 = 2\pi \times 600$ and $w_2 = 2\pi \times 250$ (frequencies of 600 Hz and 250 Hz):
• $x_5(t) = \sin(w_1 t + \omega_0)$, for $\omega_0 = 83$, and
• $x_6(t) = 10 + \sin(w_1 t)$
These signals, $x_1(t)$ through $x_6(t)$, are sampled for 1 s at a rate of $F_s = 4$ kHz and their Fourier transforms show frequency harmonics at either/both 600 Hz and 250 Hz, as expected. Figure 4 shows that the proposed features, $R_{\alpha_k}$, have frequency sensitivity as the plots for $x_1$, $x_2$, and $x_3$ have different feature values. Moreover, the figure shows that these features are invariant to amplitude-scaling and DC-offset as the plots for $x_1$, $x_4$, and $x_6$ are perfectly identical. The features are also very robust to phase changes as the differences between the features of $x_1$ and $x_5$ are negligibly small (≈0.0001 dB) in Figure 4, which demonstrates that the changes in the feature values are due to the difference in the power spectrum of signals.
The proposed feature extraction method is given in Algorithm 1. The time and space complexity of the CMB feature extraction algorithm are both linear in $K$, i.e. $O(K)$, where $K$ is the number of cumulative bands used by CMB (corresponding to the number of alpha values). The Fast Fourier Transform (FFT), on the other hand, has $O(n \log n)$ time complexity and $O(n)$ space complexity, where $n$ is the length of the FFT window with $n \gg K$, especially when the sampling rate is high. This difference makes CMB more applicable than FFT in real-time with a small compromise in accuracy.
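To make the $O(K)$ claim concrete, the following is a minimal streaming sketch in the spirit of the method. It is not a reproduction of Algorithm 1: the class name `CMBSketch`, the averaging factor `beta` used to track the mean of the squared smoothed samples, the zero initial state, and the example alpha values are all assumptions made here, and the DC-offset normalization step is omitted.

```python
import math


class CMBSketch:
    """Streaming band-power features with O(K) state and O(K) work per sample."""

    def __init__(self, alphas, beta=0.01):
        self.alphas = list(alphas)            # alpha_0 = 1 > alpha_1 > ... > alpha_K
        self.beta = beta                      # hypothetical power-averaging factor
        self.s = [0.0] * len(self.alphas)     # smoothed signal per band
        self.f = [0.0] * len(self.alphas)     # running average power per band

    def update(self, x):
        for k, a in enumerate(self.alphas):
            self.s[k] = a * x + (1.0 - a) * self.s[k]
            self.f[k] = self.beta * self.s[k] ** 2 + (1.0 - self.beta) * self.f[k]
        return self.f

    def ratios(self):
        """Features F_k / F_1 for k >= 2 (call after at least one nonzero sample)."""
        return [fk / self.f[1] for fk in self.f[2:]]


def extract(signal, alphas, beta=0.01):
    cmb = CMBSketch(alphas, beta)
    for x in signal:
        cmb.update(x)
    return cmb.ratios()


def sine(freq_hz, fs_hz=4000, seconds=1.0, amplitude=1.0):
    n = int(fs_hz * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / fs_hz) for t in range(n)]


alphas = [1.0, 0.5, 0.2, 0.05]               # example values, not from the paper
print(extract(sine(600), alphas))
print(extract(sine(250), alphas))
print(extract(sine(600, amplitude=7.0), alphas))
```

The two tones produce different ratio features, while rescaling the 600 Hz tone leaves the ratios unchanged up to rounding. The state is two numbers per band, independent of the window length.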

V. EXPERIMENTAL RESULTS
To evaluate the proposed embedded system for tactile classification, we have performed a wide range of experiments. In the first experiment, we have implemented and tested various classifiers on the texture classification task using both the proposed CMB features and the FFT features for comparison. We have used the following six classifiers [39], [40]:
• Random forest classifier, which is referred to as RF in the figures. The RF classifier is a majority voting ensemble of a number of decision trees. We have varied the number of trees for optimization purposes. With improved generalization capabilities, RF is one of the simplest yet most accurate classifiers in machine learning [40], [41].
• Support vector machine classifier with the radial basis function (RBF) kernel, which is referred to as RBF-SVM in the figures. As a nonlinear maximum margin classifier, RBF-SVM is one of the most successful classifiers in machine learning (especially with small/mid size datasets).
• Linear support vector machine classifier, which is referred to as Linear-SVM in the figures. Linear-SVM is less powerful than RBF-SVM but it has good generalization due to margin maximization. Moreover, it is easy to train, scales to a large number of samples, and the discriminant can be computed explicitly.
• K-nearest neighbors classifier using the Euclidean distance metric, which is referred to as KNN-Euclidean in the figures. Other distance metrics have also performed comparably in our experiments. This algorithm does not perform any training; it only stores the training dataset and measures the distance of a test example to these training examples to make its inference. KNN can serve as a good baseline for accuracy but even for K = 1, it is inefficient due to high time and space complexity.
• Multi-layer perceptron classifier, which is referred to as MLP in the figures. MLP has a number of neurons in each one of its hidden layers. The hidden units in each hidden layer extract nonlinear combinations of the inputs from the previous layer in order to define sufficiently nonlinear discriminants. With more layers and neurons, the number of parameters increases and the generalization reduces [39].
• Logistic regression (referred to as Logistic-R in the figures) is a statistical method that models the probability of classes (dependent variables) using a linear combination of features (predictors or independent variables).

Algorithm 1. Proposed Memory-Less Cumulative Multi-Bandpower (CMB) Feature Extraction Algorithm
We picked classifiers that are suitable for our embedded implementation and that are also straightforward to optimize without requiring many hyperparameters so that we can keep the focus on feature extraction (not the selection/optimization of an advanced classifier). The default settings were generally preferred to avoid over-fitting. For KNN, we used K = 1; for linear SVM, we used the default value for the regularization parameter (C = 1). For RBF-SVM, we used automatic scaling for the gamma value (gamma is inversely proportional to the RBF radius). For MLP, we used a single hidden layer with 100 neurons and the ReLU activation function, with a learning rate of 0.001 and momentum = 0.9. For random forests, we used 100 trees and the percentage of features to consider for the best split was set to 50%.
Snipping recordings of various lengths and applying feature extraction, we created the training and test sets for the machine learning classifiers. We created two datasets, one with shorter snips of 1 to 1.5 seconds in length (chosen randomly in that range to mimic variations in a typical finger swipe/touch) and one with longer snips of 2 to 3 seconds. We have repeated these experiments 30 times. Each training set contains 120 example snips (10 per class) and the test sets contain 480 examples (40 examples per class). For the implementation of the classifiers on PC for offline analysis, including optimization/validation of the classifier hyperparameters, we used the scikit-learn Python module for machine learning [42]. To extract features from the snips, we apply and compare the following three feature sets and use them as input to the aforementioned classifiers.
• The Fourier transform (FFT) features. The frequency spectra of the snips are computed by FFT and fed as the feature vectors to the classifiers. In the figures, for the short and long recordings, FFT features are referred to as FFT 1 (using one-second windows) and FFT 2 (using two-second windows), respectively.
• The third set of features is a tiny subset of our proposed CMB features, hence we named it Tiny-CMB (TCMB). Here, we use CMB only with the two most extreme $\alpha$ values: $\alpha = 1$, which computes the total power of the signal, i.e., $F_1[n] = \frac{1}{n} \sum_i x[i]^2$ (as also mentioned in Eq. 9), and $\alpha = 0$, which computes the variance of the signal. From the power and the variance, the square of the average of the signal window can also be computed, $\bar{x}^2 = F_1 - F_0$, which makes TCMB features interesting from the machine learning perspective as the mean and the standard deviation of classes play an important role in inference in machine learning [39]. In the figures, we refer to these features on the short and long recordings as TCMB1 and TCMB2, respectively.

Figure 5 compares the classification accuracies obtained with most classifier-feature-sensor combinations on the short and long recording datasets. The sensors include the X, Y, and Z channels of the accelerometer and the sound (denoted by S in the figures). The figure shows that increasing the length of data recordings for feature extraction improves the classification accuracy in most cases for the CMB features. However, for the FFT features, longer windows have led to no improvement or even accuracy loss in some cases, e.g., in Figure 6d when all channels of data are used. That small degradation might be due to the curse of dimensionality phenomenon [39]: using longer recordings leads to very high dimensional feature vectors (frequency components) for FFT. However, for the proposed CMB features, using longer or shorter recordings does not change the number of features (which depends only on the number of $\alpha$ values). The results summarized in the figure also suggest that the proposed CMB feature extraction method offers accuracy levels comparable to those of FFT and achieves that without the memory/computation demands that FFT has. Tables 2 and 3 show the details of classification results with the proposed CMB and FFT features in terms of the true positive and true negative rates for each class.
Here, the true positive rate (TP in the tables) of a class refers to the proportion of the actual textures of that class that are correctly identified as such by the classifier; while the true negative rate (TN) measures the proportion of texture examples of other classes that are correctly predicted to be nonmembers of that class by the classifier. The tables are color-coded in a way that values closer to 100% are dark green; and as the values get lower, the color gets closer to dark red. Comparing the corresponding cells of these two tables, we see that the FFT features offer better TP/TN. This comes from the fact the FFT provides more powerful (full resolution view of the spectrum) features that lead to more accurate classifications. However, the FFT implementation was not feasible on our tiny embedded platform. On the other hand, the CMB features can provide comparable levels of TP/TN in almost all cases with its simple implementation on the embedded board.
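To illustrate how little state such an implementation needs, the following is a minimal Python sketch of the two TCMB features (the power F_1 and the variance F_0) updated one sample at a time. It is not the code that runs on the board; the class name `StreamingTCMB` and its methods are hypothetical, and the variance is assumed to be the population variance obtained from the relation between power, variance, and squared mean stated above.

```python
class StreamingTCMB:
    """Running power (F1) and variance (F0) of a signal, one sample at a time.

    Hypothetical helper for illustration only. No past samples are stored:
    the state is a sample counter and two running averages.
    """

    def __init__(self):
        self.n = 0         # number of samples seen so far
        self.mean = 0.0    # running average of x[i]
        self.power = 0.0   # running average of x[i]**2, i.e. F1

    def update(self, x):
        """Fold one new sample into the running averages and return (F1, F0)."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.power += (x * x - self.power) / self.n
        return self.features()

    def features(self):
        """Return (F1, F0), using F0 = F1 - mean**2 (population variance)."""
        f1 = self.power
        f0 = self.power - self.mean ** 2
        return f1, f0


tcmb = StreamingTCMB()
for sample in [1.0, 2.0, 3.0, 4.0]:
    f1, f0 = tcmb.update(sample)
```

For the four samples above, the running averages give F_1 = 7.5 and F_0 = 1.25, so F_1 − F_0 = 6.25 is the square of the mean 2.5. Computing the variance as a difference of two averages can lose precision when the mean is large relative to the spread, so a fixed-point or low-precision port would need to check its numeric range.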
In the next experiment, we studied the accuracy of the CMB-based classification with respect to the number of features (i.e., the number of α values, which controls the number of cumulative bands). Figure 6 presents the results obtained in this set of experiments: the results on the accelerometer sensors are given in Figure 6a, and the results on the sound signals collected by the microphone sensor are given in Figure 6b. We see that increasing the number of bands (features) helps the sound data considerably more in achieving high classification accuracy. However, as the accelerometer carries information in a limited band of frequencies, its accuracy reaches its peak at a faster rate (using only a few α values). This observation means that the optimum number of bands in the CMB feature extraction method indeed depends on how the useful information is distributed over the frequency bands. Embedded system designers may exploit this to minimize the number of bands needed by the feature extraction for a given application.
As the random forest classifier [41] can be expressed as an ensemble of nested if-then-else statements, it is a good candidate for implementation on embedded systems: once trained, it needs minimal data memory and executes inference quickly. In the next experiment, we optimized the random forest classifier to better meet the resource limitations of our Arduino-based implementation. We used the Arduino UNO, one of the tiniest Arduino boards, to implement the proposed CMB feature extraction based random forest classifier and to show that the hardware requirements for extracting the features and for the subsequent classification are minimal. We tried 10 random forest classifiers, starting with a single decision tree and doubling the number of trees up to 1024. The execution time (on both the embedded and the PC implementations) and the classification accuracy of these random forest classifiers are reported in Table 4. Our experiment revealed that the Arduino system is not able to host more than 32 decision trees due to the lack of sufficient code/data memory. Nevertheless, even with a low number of trees in the random forest (such as 16 and 32, which are implementable on the host embedded system), a good classification accuracy is achieved. Compared to the implementation on our high-end PC with GPU, the embedded implementation needs about 20 times longer runtime for classification due to its limited computational power and lower clock frequency.
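The idea of expressing a trained forest as nested if-then-else statements with majority voting can be sketched as follows. This is a Python illustration only, not the Arduino code: the three trees, their thresholds, the feature order, and the class labels 0, 1, and 2 are all invented for the example and are not taken from the trained model.

```python
from collections import Counter

# Hypothetical decision trees. Each one is plain nested if/else code,
# so the "model" lives entirely in code memory and needs no data tables.
# f is a feature vector, e.g. f[0] and f[1] could be two CMB features.

def tree_0(f):
    if f[0] <= 0.5:
        if f[1] <= 0.2:
            return 0
        else:
            return 1
    else:
        return 2

def tree_1(f):
    if f[1] <= 0.3:
        return 0
    else:
        if f[0] <= 0.7:
            return 1
        else:
            return 2

def tree_2(f):
    if f[0] <= 0.4:
        return 0
    else:
        return 2

FOREST = (tree_0, tree_1, tree_2)

def predict(features):
    """Return the class that receives the most votes from the trees.

    Ties are resolved in favor of the class that was voted for first.
    """
    votes = Counter(tree(features) for tree in FOREST)
    return votes.most_common(1)[0][0]


label = predict((0.45, 0.5))  # trees vote 1, 1, 2, so the majority is class 1
```

Because every additional tree adds another block of branches, the code size grows with the number of trees, which is consistent with the memory limit on the number of trees reported above.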

VI. CONCLUSIONS
In this study, we developed an intelligent embedded system equipped with vibrotactile sensors to populate a tactile dataset and to classify tactile signals in real time as they are collected. For feature extraction, we proposed a novel power spectral method that we called CMB (cumulative multi-bandpower). CMB computes the band-powers of the input signals in cumulative frequency bands by combining the memory-less total power computation suggested by Parseval's theorem with the memory-less low-pass filtering achieved by exponential smoothing. Therefore, CMB works in the time domain and achieves real-time computations on the streaming tactile input signals.
For the classification of the textures, the CMB features are fed to a random forest classifier implemented as a majority-voting ensemble of rule-based decision trees. Although the combination of the more descriptive Fourier transform features and the more powerful support vector machines achieved a higher accuracy for offline classification, its implementation for real-time online processing on the embedded board was not feasible due to data and code memory limitations. Nonetheless, our embedded implementation combining the CMB features and the random forest classifier achieves comparable classification accuracy, especially when multiple sensors are fused for classification.