Introduction
Food authentication has recently attracted the attention of consumers for religious or lifestyle reasons [1]–[4]. For Muslims in particular, authentication of halal food is essential [5]. Pork is a food that Muslims cannot eat (The Holy Quran, 2:173; 5:3; 6:145; 16:115). However, pork adulteration in beef has been discovered in the market [3], [6]. Beef is sometimes mixed with pork for economic reasons [7], [8]; sellers adulterate beef with pork because pork is cheaper than beef [9].
Recent research has discussed meat authentication using visual detection. The procedure includes DNA isolation from fresh meat samples, amplification of specific DNA sequences, and detection using lateral flow assays. This method can authenticate horse meat and pork with high selectivity and reproducibility. However, the process still takes quite a long time, namely 25-30 minutes [10]. Another recent study used lateral flow sensing (LFS) and polymerase chain reaction (PCR) for the rapid visual detection of adulterated meat [11]. The samples in that study were adulterated beef prepared by mixing in duck meat in proportions of 0%, 0.01%, 0.05%, 0.1%, 0.5%, 1%, 5%, 10%, 50%, and 100%. The procedure took less than 2 hours. Various scientific methods have been developed to identify mixed meats, including gas chromatography (GC) and mass spectrometry (MS) [12], high-performance liquid chromatography (HPLC), nuclear magnetic resonance (NMR) spectroscopy [13], and Fourier transform infrared (FTIR) spectroscopy [14]. However, several factors have to be considered when using these tools, such as cost, time, and operator experience [15], [16]. The price of a GC-MS instrument was around USD 120,000 in 2017 [17], while testing one sample costs about USD 50. In addition, testing one sample can take about 1 to 2 days, depending on the complexity of the gases. Another consideration is the need for a person experienced in operating GC-MS instruments.
A solution is needed to meet these considerations using more practical and economical methods and instruments, and a faster analysis process with reliable results. This paper proposes the Optimized Electronic Nose System (OENS). An electronic nose (e-nose) is the main instrument in OENS. E-noses are devices with several advantages over other techniques for analyzing food smell, for example, the small amount of sample required, fast performance, simple usage, high sensitivity, and good correlation between the data from sensor analysis. The e-nose features offer five main categories of food analysis that can be used: monitoring, expiry checking, freshness evaluation, purity testing, and other food quality control investigations. Hence, the motivation for this study can be formulated as follows:
Several types of research have used an e-nose to identify pork adulteration in beef for food quality control. However, most of them were focused on the differentiation and classification of species of meat. Only a few researchers have tried to determine different gas contents, which can be used for halal authentication in food.
In the existing studies, e-nose systems have been developed for halal authentication. They show the potential of e-nose for halal authentication, even though their experiments were quite limited and did not perform classification or regression tasks. For example, e-nose with PCA was used to differentiate pure lard, pure chicken fats, beef fats, mutton fats, and adulterated samples [18]. Moreover, e-nose with PCA was employed to discriminate four meat samples and three types of sausage [12]. Furthermore, another study attempted binary classification to differentiate beef and pork using a Naïve Bayes classifier [19].
Based on these motivations, the main contribution of this study is to propose OENS for multiclass classification that differentiates seven mixtures of beef and pork. This study therefore brings the e-nose closer to practical application in halal authentication. In addition, the e-nose produces signals that are sent to a computer for processing and analysis. The proposed OENS can prevent the distortion of e-nose signal analysis by: (i) proper noise filtering, (ii) optimizing the sensor array, and (iii) optimizing the support vector machine (SVM) parameters.
The rest of this paper is organized as follows. Section 2 discusses previous works related to the topic of this study. Section 3 explains the details of OENS, including a specification of the materials and methods used in the experiment, such as the classification method and the discrete wavelet transform for signal processing. Section 4 describes the results of the experiment. Section 5 is the conclusion.
Related Works
E-nose can be used for food authentication and adulteration assessment, as summarized in TABLE 1. Research on detecting meat adulteration with an electronic nose is ongoing. Recent research can detect minced pork mixed into minced mutton [20]. The study made six mixtures, namely minced pork at 0%, 20%, 40%, 60%, 80%, and 100% by weight in minced mutton. To build the predictive model, the study used multiple linear regression (MLR), partial least squares analysis (PLS), and a backpropagation neural network (BPNN). The predictive R² for the six classes was 0.97.
An electronic nose for detecting adulteration levels in tomato juice is discussed in [21]. That work compared six earlier methods with the more recent and popular spectral clustering, using three measures of clustering performance, i.e., mutual information criteria (MI), precision, and Rand index (RI). Spectral clustering gave statistically significant results (alpha = 0.05) and thus outperformed the other methods. Rodriguez [22] studied two food adulteration cases (a pure variety of green coffee beans, and pure cayenne adulterated with bell pepper powder). That work aimed to report improvements in differentiating aroma samples with minimal differences in odor pattern.
Moreover, wine traceability and authenticity can be used to prevent outlawed adulteration practices, such as (i) adding ethanol, coloring, and flavoring compounds; (ii) diluting wine with water; and (iii) substituting cheaper wine. The combination of e-nose and multivariate statistical methods improved the traceability and classification of grapes and wine (especially the variety and geographical origin of the grapes) [23]. E-nose has also succeeded in detecting adulteration of mutton, which led to a model capable of detecting and estimating the adulteration of minced mutton with pork [24]. The volatile compounds in the samples were collected using a MOS-based e-nose. An optimal data matrix was then obtained using feature extraction methods, PCA, loading analysis, and SLDA.
Most studies using e-noses only distinguished between two or more products and did not consider possible noise contamination of the gas sensor signals. However, in certain conditions, noise can affect the raw signal by 20% [25], which influences the classification performance. While being sent to the computer, the signals can be interrupted and mixed with unwanted signals, which creates noise [26]–[28]. This noise, caused for example by air contaminated with certain substances or smells, may interfere with the authenticity of the information, and it should be removed to prevent distortion of the analysis and the classification process. Several researchers have used the discrete wavelet transform (DWT) to reduce noise in data signals [29]–[35]. However, these studies only applied the DWT method without selecting suitable parameters, such as the mother wavelet and the decomposition level, although these parameters could improve the performance obtained from the noise-filtered signal [36]–[38]. Apart from that, the number of sensors has also not been considered, even though using more sensors than necessary incurs extra costs. Some sensors provide no significant information on the samples, so costs can be decreased by eliminating unnecessary sensors. Several works also perform sensor array optimization to reduce data dimensions, electrical consumption, production cost, computational and traffic overhead, etc. [39]–[41]. For interested readers, recent developments and challenges in e-nose signal processing are summarized in [42].
Materials and Method
A. Materials
In this study, an e-nose was built using nine MQ series gas sensors from Zhengzhou Winsen Electronics Technology Co., Ltd. The gas sensors were also used to detect different types of gases in our previous study [19]. The list of gas sensors is given in TABLE 2. These gas sensors were connected to an Arduino microcontroller. For data communication, a universal serial bus (USB) interface was used to transfer the signals from the microcontroller to the computer. The gas sensors were placed in a sample chamber made of transparent glass. FIGURE 1 depicts the components of the e-nose system.
The samples used were ground beef and ground pork bought in fresh condition from the same store on the same date. In the experiment, seven combinations of beef and pork were used. Ground beef and ground pork were combined into samples of 100 g each in various compositions, divided into seven classes: the first and seventh classes were 100% beef and 100% pork, respectively, and the second, third, fourth, fifth, and sixth classes contained 10%, 25%, 50%, 75%, and 90% of beef from a total sample of 100 g, respectively. A scale was used to ensure that the weight of the mixture was appropriate. The compositions of the respective samples can be seen in TABLE 3. The following steps were used to collect the data samples:
the e-nose was turned on, and the sensors were warmed up for 15 minutes;
the sample was placed in the sample chamber with the gas sensors;
the processes of data retrieval and transfer to the computer using the USB interface took 15 to 20 minutes for each sample;
the sample chamber was cleaned using a flushing fan for 5 minutes after every sampling, so the next sampling was not affected by gas residue from the previous sampling.
As mentioned previously, the data were divided into seven classes, with 60 records for each class, giving 420 records in total. Each record had 10 digital outputs: S1 to S8, plus two outputs from S9, one for temperature and one for humidity. In this paper, the digital output is called the raw signal. The data of all 7 classes are shown in TABLE 3. For interested readers, our dataset is also available in [43], [44].
B. Proposed Methods
After the dataset had been generated, the raw signals were analyzed in several steps, as shown in FIGURE 2. The first step is signal pre-processing, which removes the noise and produces a reconstructed data signal. The next step is statistical parameter extraction, which takes the reconstructed data signal and extracts the characteristics of the signal. The third step is dimensional reduction, where the extracted features are analyzed to select only the sensors that have the largest impact on pork adulteration detection. The final step is constructing the classification model for the 7 classes. The data obtained from the previous steps are divided into testing data (30%) and training data (70%) to evaluate the classification model. The data acquired from the e-nose are processed on a computer using scikit-learn, a Python-based machine-learning library [45].
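The 70/30 division into training and testing data can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `stratified_split`, the fixed seed, and the decision to keep the class balance equal in both parts are assumptions, since the paper does not state how the split was drawn (in scikit-learn, `train_test_split` would be the usual tool).

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) with the same class balance in both parts."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)               # fixed seed: an assumption, for repeatability
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# 7 classes x 60 records, as in the dataset described above
labels = [c for c in range(1, 8) for _ in range(60)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

With 60 records per class, each class contributes 42 records to training and 18 to testing.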
1) Signal Pre-Processing
Signal pre-processing is carried out to eliminate noise in the signals [46]. In this research, the noise was caused by the internal sensors, by changes in ambient conditions such as humidity and temperature, and by changes in electrical conditions such as voltage and current. The signals produced by an e-nose are usually non-stationary, i.e., the statistical properties of the signal change with time [46], which makes noise reduction more complicated. This study used the discrete wavelet transform (DWT) and compared several mother wavelets to determine the one best suited for noise filtering. This technique examines the data from various aspects of signal analysis: trends, breakdown points, discontinuities, and similarities. The data produced by the e-nose are divided into 7 classes. The first step is to look at the shape of the signals. In the second step, the type of wavelet, the so-called mother wavelet, is determined; this choice is indispensable because mother wavelets vary and are grouped by their respective basic wavelet functions. The most popular mother wavelets in signal processing are Haar, dmey, coiflet, symlet, and Daubechies, all of which were compared in our experiment at several decomposition levels. The discrete wavelet transform of a given signal x(t) is
\begin{align*} dwt(m,n)=&\left\langle x(t),\omega_{m,n}(t)\right\rangle \\=&\frac{1}{\sqrt{2^{m}}}\int_{-\infty}^{\infty} x(t)\,\omega\left(\frac{t-n2^{m}}{2^{m}}\right)dt\tag{1}\end{align*}
The continuous wavelet transform with scale a and translation b is
\begin{equation*} T(a,b)=\frac{1}{\sqrt{a}}\int_{-\infty}^{+\infty} x(t)\,\omega\left(\frac{t-b}{a}\right)dt\tag{2}\end{equation*}
Discretizing the scale and translation as a = a_0^m and b = n b_0 a_0^m gives the wavelet
\begin{equation*} \omega_{m,n}(t)=\frac{1}{\sqrt{a_{0}^{m}}}\,\omega\left(\frac{t-nb_{0}a_{0}^{m}}{a_{0}^{m}}\right)\tag{3}\end{equation*}
and the dyadic choice a_0 = 2, b_0 = 1 reduces it to
\begin{equation*} \omega_{m,n}(t)=2^{-\frac{m}{2}}\,\omega(2^{-m}t-n)\tag{4}\end{equation*}
so that the DWT coefficients are
\begin{equation*} T_{m,n}=\int_{-\infty}^{\infty} x(t)\,\omega_{m,n}(t)\,dt\tag{5}\end{equation*}
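To make the decompose, filter, and reconstruct cycle concrete, the sketch below applies a one-level DWT to a synthetic signal. It is only an illustration under stated assumptions: it uses the Haar wavelet (one of the mother wavelets compared in this study) because its filters fit in two lines, not the db6 wavelet; the soft-threshold rule, the threshold value, and the synthetic signal are hypothetical, since the paper does not state how the detail coefficients were treated. With PyWavelets, `pywt.wavedec` and `pywt.waverec` with `'db6'` would be the natural replacement.

```python
import numpy as np


def haar_dwt(signal):
    """One-level Haar DWT: returns (approximation, detail) coefficients."""
    x = np.asarray(signal, dtype=float)
    if x.size % 2:                      # pad to even length by repeating the last sample
        x = np.append(x, x[-1])
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the (even-length) signal."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def denoise(signal, threshold):
    """Soft-threshold the detail coefficients (assumed rule), then reconstruct."""
    approx, detail = haar_dwt(signal)
    shrunk = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    return haar_idwt(approx, shrunk)[: len(signal)]


rng = np.random.default_rng(0)
clean = np.linspace(100.0, 200.0, 64)             # hypothetical slow sensor drift
noisy = clean + rng.normal(0.0, 5.0, clean.size)  # hypothetical additive noise
smooth = denoise(noisy, threshold=10.0)
print(np.abs(noisy - clean).mean(), np.abs(smooth - clean).mean())
```

A threshold of zero returns the input unchanged, which is a quick check that the forward and inverse transforms are consistent.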
2) Statistical Parameter Extraction
In this step, parameter extraction is performed to obtain the most relevant and informative values that represent the characteristics of the overall sensor response. The pre-processed sensor responses are averaged to get a single value [48]. In this research, several statistical parameters were extracted (e.g., standard deviation (ST), mean (M), kurtosis (K), and skewness (SK)). This study also combined the main parameters: mean with standard deviation (M + ST), mean with skewness (M + SK), mean with kurtosis (M + K), mean with standard deviation and skewness (M + ST + SK), mean with standard deviation and kurtosis (M + ST + K), and mean with all other parameters (M + ST + SK + K). For the M parameter, the mean of the reconstructed signal is given by
\begin{equation*} \overline{y(t)}=\frac{1}{N}\sum\limits_{i=1}^{N} y_{i}\tag{6}\end{equation*}
where y_i is the i-th of the N samples of the reconstructed signal y(t). The standard deviation and skewness are given by
\begin{equation*} \sigma=\sqrt{\frac{1}{N}\sum\limits_{i=1}^{N}{(y_{i}-\overline{y(t)})^{2}}}\tag{7}\end{equation*}
\begin{equation*} \alpha_{3}=\frac{1}{N\sigma^{3}}\sum\limits_{i=1}^{N}{(y_{i}-\overline{y(t)})^{3}}\tag{8}\end{equation*}
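As a worked example of Equations 6 to 8, the sketch below computes the four statistical parameters for one signal. The function name `extract_features` and the sample values are hypothetical. The kurtosis line is an assumption: the text names kurtosis (K) but gives no formula for it, so the fourth-moment analogue of Equation 8 is used here.

```python
import numpy as np


def extract_features(signal):
    """Return mean, standard deviation, skewness and kurtosis of one sensor signal."""
    y = np.asarray(signal, dtype=float)
    mean = y.mean()                                   # Eq. (6)
    std = np.sqrt(((y - mean) ** 2).mean())           # Eq. (7), 1/N normalization
    skew = ((y - mean) ** 3).mean() / std ** 3        # Eq. (8)
    kurt = ((y - mean) ** 4).mean() / std ** 4        # assumed fourth-moment analogue
    return {"M": mean, "ST": std, "SK": skew, "K": kurt}


features = extract_features([1.0, 2.0, 3.0, 4.0, 5.0])
print(features)
```

For this symmetric signal the mean is 3, the standard deviation is the square root of 2, and the skewness is 0. A constant signal has zero standard deviation, so the skewness and kurtosis are undefined in that case.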
3) Dimensional Reduction
The features generated can be spread across multiple dimensions; for this reason, dimension reduction is used to eliminate variables that do not have a significant role in detecting pork adulteration. Principal component analysis (PCA) is the dimensional reduction method that was used in this research. The eigenvector is used to consider the relationship between the variables. From the experimental results, the digital outputs are considered as PCA variables. The steps to perform principal component analysis are as follows:
calculate the covariance (Cov) using Equation 9, where x is the signal and y is the class target from the signal.
\begin{equation*} Cov(x,y)=\frac{\sum{xy}}{n}-(\overline x)(\overline y)\tag{9}\end{equation*}
calculate the eigenvalue using Equation 10, where A, \lambda, and I are a square matrix of size n x n, a scalar, and the identity matrix, respectively.
\begin{equation*} \det(A-\lambda I)=0\tag{10}\end{equation*}
calculate the eigenvector using Equation 11.
\begin{equation*} [A-\lambda I][X]=[{0}]\tag{11}\end{equation*}
determine the new variable (component) by multiplying the natural variable with the eigenvector.
The proportion of variance of component i, out of D components, is
\begin{equation*} \rho_{i}=\frac{\lambda_{i}}{\sum\limits_{j=1}^{D}{\lambda_{j}}}\times 100\%\tag{12}\end{equation*}
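The PCA steps listed above can be sketched with NumPy as follows. This is an illustration, not the study's code: the function name `pca` and the random stand-in data are hypothetical, and the covariance is taken between feature columns with the same 1/n normalization as Equation 9.

```python
import numpy as np


def pca(features, n_components):
    """PCA by eigendecomposition of the covariance matrix."""
    X = np.asarray(features, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False, bias=True)   # 1/n normalization, as in Eq. (9)
    eigvals, eigvecs = np.linalg.eigh(cov)            # symmetric matrix -> real eigenpairs
    order = np.argsort(eigvals)[::-1]                 # sort from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum() * 100.0           # proportion of variance, Eq. (12)
    scores = centered @ eigvecs[:, :n_components]     # new variables (components)
    return scores, ratio


rng = np.random.default_rng(0)
X = rng.normal(size=(420, 10))        # hypothetical stand-in for 420 records x 10 outputs
X[:, 1] = 2.0 * X[:, 0]               # make one column redundant
scores, ratio = pca(X, n_components=8)
print(scores.shape, np.round(np.cumsum(ratio), 1))
```

Because one column is an exact multiple of another, the last component carries essentially no variance, which is the situation in which dropping components loses no information.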
4) Optimizing the Classification Parameters
Classification is the process of assigning samples to classes. The assignment should match the real condition, i.e., if the meat sample is beef, then the sample should be classified into the beef class by OENS. In this research, OENS used an optimized support vector machine (SVM) as the classification method, since this method is capable of learning from the data and generating the classification classes by itself [50]. SVM is based on a hyperplane that separates objects belonging to different classes. SVM has two main parameters, which are C and gamma (γ).
C is the regularization parameter of the SVM algorithm. It trades off maximizing the decision margin against correctly classifying the training data, which helps prevent overfitting. The gamma parameter belongs to the kernelized SVM with the radial basis function (RBF) kernel and determines how far the influence of a single training sample reaches. Tuning these parameters can improve the accuracy as well as the performance of the algorithm.
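The effect of gamma can be seen directly from the RBF kernel, which scores the similarity of two samples as exp(-gamma * ||a - b||^2). The sketch below is a minimal illustration; the two sample points are hypothetical, and the gamma values are taken from the search range of Algorithm 1.

```python
import math


def rbf_kernel(a, b, gamma):
    """RBF kernel value exp(-gamma * ||a - b||^2) between two feature vectors."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)


a, b = (0.0, 0.0), (1.0, 0.0)         # two hypothetical samples at distance 1
for gamma in (0.001, 0.1, 10, 100):
    print(gamma, rbf_kernel(a, b, gamma))
```

A small gamma keeps the kernel value close to 1 even for distant samples, so each training sample influences a wide region; a large gamma drives the value toward 0, so each sample only influences its immediate neighborhood.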
Unfortunately, there are no exact parameter values for use in the classification process. Several researchers have tried several different value combinations for the parameters, but it takes a long time to execute this process [52]. Hence, this research developed an algorithm to find the best parameters, which can be seen in Algorithm 1. The values were determined based on an experiment with the value of C, ranging from 0.01 to 1000 and
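Algorithm 1 Optimized Parameters of SVM
# Assumes X (feature matrix) and y (class labels) are already loaded as arrays.
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

c_param = [0.001, 0.01, 0.1, 1, 10, 100, 200, 1000]
gamma_param = [0.001, 0.01, 0.1, 1, 10, 100]
scores_list = []
for c in c_param:
    for g in gamma_param:
        # train and score on every training/testing fold (cv = 3, 5, or 10)
        model = SVC(kernel="rbf", C=c, gamma=g)
        cv_list = cross_val_score(model, X, y, cv=10)
        # keep the mean fold accuracy together with the parameters that gave it
        scores_list.append((cv_list.mean(), c, g))
print(max(scores_list))  # (best mean accuracy, C, gamma)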
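Algorithm 1 scores each (C, gamma) pair over several training/testing partitions. How such partitions are formed in k-fold cross-validation can be sketched with the standard library alone. The function name `k_fold_indices` is hypothetical, and the folds here are contiguous for readability; because the records are ordered by class, the indices would need to be shuffled or stratified first in practice, which is an assumption about the procedure rather than something stated in the paper.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


for k in (3, 5, 10):
    folds = list(k_fold_indices(420, k))
    print(k, [len(test) for _, test in folds])
```

For the 420 records, 3-fold, 5-fold, and 10-fold cross-validation hold out 140, 84, and 42 records per fold, respectively.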
Results and Discussion
A sensor test was done to find out the response of the e-nose when executing sample testing [21]. The response generated by the e-nose sensors can be seen in FIGURE 3. Each class is indicated by a different color: Classes 1, 2, 3, 4, 5, 6, and 7 are shown in blue, green, red, cyan, magenta, yellow, and black, respectively. The response is plotted for each sensor. Different combinations of beef and pork lead to different gas sensor responses, because the response is influenced by the gas emitted from the meat sample, and different compositions of protein and lipid can produce different gases. The different drawing order of the classes indicates different response values for each gas sensor, which is a good sign of the capability to detect beef adulteration. For example, FIGURE 3(a) is a graph of the signals generated by Sensor 1 for the 7 classes. In total, 420 signals were recorded and stacked by class. These stacks are difficult to tell apart by eye; for example, the grouping will be incorrect when the data from Class 1 are close to those of Class 2. There was also some interference in each signal caused by noise, as can be seen in FIGURE 3(b), 3(c), and 3(h). The most severe noise is found in Sensor 8. This sensor is selective to toluene, acetone, and ethanol, and the volatility of these three compounds can cause unstable responses. Therefore, the raw signal has to be optimized by OENS to ensure that the result is appropriate.
A. Results of Proper Noise Filtering
This research used the discrete wavelet transform for noise reduction, using cross-validation to find the best parameters, namely the mother wavelet and the decomposition level. TABLE 4 shows that, in a comparison of mother wavelets, the db6 wavelet was compatible with the aims of this research. Over 20 experimental runs, db6 at decomposition level 1 gave a satisfactory result. Furthermore, this research also calculated the accuracy obtained with the raw data signal. The result was 87.61%, which means that the accuracy was increased by 1% by employing proper noise filtering using DWT with wavelet db6. The preprocessing result is shown in FIGURE 4. The signal looks smoother and the noise is reduced. As depicted in FIGURE 3(h), the original signal shows significant noise; it has been reduced after signal reconstruction by DWT with db6, as can be seen in FIGURE 4(h). After the signal was reconstructed, features were extracted from it by statistical parameter extraction. This research made 10 combinations of statistical parameters. These statistical parameters are used as features, as has been done in previous research [53]. Dimensional reduction is used in this study to see which features or variables affect the detection of pork mixed into beef.
B. Optimized Sensor Array
Besides reducing dimensionality, PCA is used in this study to optimize the sensor array. In these experiments, the e-nose produces ten digital outputs, which are considered as variables in PCA. Before entering PCA, however, the 10 digital outputs were processed with several statistical parameter extraction methods. In this manuscript, an example is presented using the mean (M) as the statistical parameter. Because only one parameter is extracted, there are only 10 resulting features. These ten features are used as input to PCA.
This research tried to reduce the number of variables. The first step is to calculate the covariance in order to reduce the number of dimensions or components. TABLE 5 shows the eigenvalue, proportion of variance, and cumulative variance contributed by each component. The next step is choosing the principal components (PCs) that will be used. If a cumulative variance of 50% does not capture enough of the variation, a cumulative variance above 50% is the better option for obtaining a significant result. PC 1 alone accounted for 57% of the variance. The cumulative variance reached 75% with PC 2, 87% with PC 3, 92% with PC 4, 96% with PC 5, 98% with PC 6, 99% with PC 7, and 100% with PC 8. PC 9 and PC 10 were not selected because they did not show a significant contribution. Based on the proportion of variance derived from the eigenvalues, 8 components had a substantial contribution (PC 1 to PC 8). The next step was calculating the eigenvectors, as shown in TABLE 6. The eigenvector was calculated for each gas sensor based on the PCs obtained previously and sorted from the largest to the smallest. Based on the PCA results, the e-nose data for detecting pork adulteration in beef were represented by the 8 most dominant components. Together these eight components had a proportion of variance of 100%; the most dominant were the MQ 135 factor, with a proportion of variance of 57%, the MQ 4 factor, with 19%, and the MQ 9 factor, with 12%.
In addition, for the eight selected components, this study determines which sensor is the dominant factor in each component. TABLE 7 shows that in the first component, the dominant factor is S5 or MQ 135. The most significant factor across all components is S1 or MQ 2, which is in component 8. TABLE 8 shows the dimensional reduction results for the ten statistical parameter extraction combinations. For several feature extraction methods, some components can be removed. For example, with the M parameter the dimensions can be reduced from 10 to 8 components using the SVM classifier, and with M + ST the dimensions can be reduced from 20 to 15. In contrast, M + SK does not reduce the dimensions with any of the four classifiers; all 20 components are still used. The combination that produces the most features is M + ST + SK + K, with 40 features, which can be reduced using the ANN classifier. FIGURE 5 shows the data after dimensional reduction using PCA. FIGURE 5(a) and 5(b) show the data before and after feature scaling using Standard Scaler (Z-score) normalization, respectively. Standardization was used to bring the widely distributed data onto a common scale. It can be inferred from FIGURE 5 that the data from the first class became more clustered compared to the other classes.
Plot diagram of the dimensional reduction result using PCA: (a) the data before normalization; (b) the data after normalization using Standard Scaler.
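The Standard Scaler (Z-score) normalization used for FIGURE 5(b) can be sketched as follows. The function name `standardize` and the three feature rows are hypothetical; the population standard deviation is used, which matches the behavior of scikit-learn's `StandardScaler`.

```python
import numpy as np


def standardize(features):
    """Z-score each column: subtract its mean and divide by its standard deviation."""
    X = np.asarray(features, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)


X = np.array([[400.0, 0.5], [420.0, 0.7], [380.0, 0.9]])   # hypothetical feature rows
Z = standardize(X)
print(Z.mean(axis=0), Z.std(axis=0))
```

After scaling, every column has zero mean and unit variance, so a sensor output with a large numeric range no longer dominates the principal components.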
C. Optimized Support Vector Machine (SVM) Parameters
The algorithm to find optimal SVM parameters from the 420 data required 16 seconds of execution time. The data is divided into two, namely, training data and testing data using cross-validation. This study compared three cross-validation types to get fair results, namely 3-fold, 5-fold, and 10-fold. The optimal values found for parameters C and
Meanwhile, for Class 2, 59 data were predicted correctly, and 1 data was predicted incorrectly; for Class 3, 58 data were predicted correctly, and 2 data were predicted incorrectly; 4 data were predicted incorrectly for Class 4, and 1 data was predicted incorrectly for Class 7, and 3 data were predicted incorrectly for Class 3. Lastly, for Class 5, 59 data were predicted correctly and 1 data was predicted incorrectly. In addition, TABLE 10 presents the evaluation results of the SVM with optimal parameters.
Furthermore, this research also compared several classification methods, i.e., artificial neural network (ANN) [54], linear discriminant analysis (LDA), K-nearest neighbors (KNN), and SVM, which, without the parameter optimization algorithm, achieved 89%, 54%, 87%, and 91%, respectively. SVM with parameters optimization algorithms, which are C and
Conclusion
In this study, an OENS was developed, employing 9 gas sensors and producing 10 digital outputs. The noise in the signals was reduced by reconstructing the signals using DWT with mother wavelet db6, which increased classification accuracy by 1%. Using the mean as the statistical parameter yields 10 features spread over 10 dimensions. PCA successfully reduced the number of components/dimensions from 10 to 8. Together these 8 components had a proportion of variance of 100%; the most dominant were the MQ 135 factor, with a proportion of variance of 57%, the MQ 4 factor, with 19%, and the MQ 9 factor, with 12%. Finally, the optimization algorithm supported the efficiency of the SVM classification process in obtaining the best solution, which was 98.10% on average. This result indicates that OENS has the potential to be developed for halal authentication and brings the e-nose closer to practical application.
For future work, the fingerprint of pork adulteration in smaller portions of beef will be developed.