IoT Based Railway Track Faults Detection and Localization Using Acoustic Analysis

Rail is one of the most energy efficient and economical modes of transportation. Regular railway track health inspection is an essential part of robust and secure train operation. Delayed investigations and late problem discovery pose a serious risk to the safe functioning of rail transportation. The traditional method of manually examining the rail track using a railway cart is inefficient and susceptible to mistakes and bias. It is imperative to automate inspection in order to avert catastrophes and save lives, particularly in zones where train accidents are numerous. This research develops an Internet of Things (IoT)-based autonomous railway track fault detection scheme that enhances the existing railway cart system to address these issues. In addition to data collection on Pakistani railway lines, this work contributes to railway track fault identification and classification based on acoustic analysis, as well as fault localization. Based on their frequency of occurrence, six types of track faults were targeted: wheel burnt, loose nuts and bolts, crash sleeper, creep, low joint, and point and crossing. The machine learning methods used were support vector machines, logistic regression, random forest, extra tree classifier, decision tree classifier, multilayer perceptron (MLP), and ensembles with hard and soft voting. The results indicate that acoustic data can successfully assist in discriminating track defects and localizing them in real time. MLP achieved the best performance, with an accuracy of 98.4 percent.


I. INTRODUCTION
Railways are a country's lifeline, particularly in developing nations, serving the public's transportation requirements as well as being the backbone of trade and supply lines. The railway market has strengthened over time, providing better opportunities for the public and the country's economy. Rail is one of the most energy efficient modes of transportation, accounting for 8% and 9% of global passenger and freight transit respectively, while consuming only 3% of total transportation energy [1]. Rail uses 12 times less energy and produces 7 to 11 times fewer greenhouse gases (GHGs) per passenger kilometer travelled than private automobiles and airlines, making it the most efficient means of motorized passenger transportation. Aside from shipping, freight rail is the most energy efficient and low carbon mode of transportation [1]. However, high performance railway operations must be provided to ensure the continuous running of trains and the safety of passengers.
The associate editor coordinating the review of this manuscript and approving it for publication was Tawfik Al-Hadhrami.
The general public, commuters, and tourists all travel by train, and their safety is compromised if railway tracks are unfit for day-to-day operations. Similarly, freight safety and dependability are critical components of the supply chain, necessitating fault-free and fault-tolerant railway tracks. Because mechanical and physical wear and tear develop over time, regular inspections are essential to reduce train derailment incidents. Rail freight traffic increased internationally between 2018 and 2019, with Europe and Turkey handling around 3.1 trillion ton-kilometers in 2019, slightly less than Asia/Oceania/Middle East, which handled over 3.5 trillion ton-kilometers of freight by rail in the same year [2]. Each year, China and India service approximately 773 and 770 billion passenger kilometers, respectively. Russia (175.8 billion), France (88.3 billion), Germany (77 billion), Ukraine (53.1 billion), and the United Kingdom (51.8 billion) are among the other nations with significant rail passenger traffic [3]. Pakistan is also a country where many people prefer traveling by rail, with an estimated 70 million people reported to have travelled by train between 2018 and 2019 [4]. Pakistan Railways (PR) earned 48.652 billion Pakistani Rupees (PKR) from its operations between 2020 and 2021 [5]. Although rail is a well-known mode of transportation throughout the country, it falls well short of global standards. Cracks, creep, loose fittings, crash sleeper, ballast, discontinuity, missing nuts and bolts, and wheel burnt are some of the key issues. A lack of regular visible maintenance and preemptive inspections, together with delayed problem detection, raises severe concerns about the security of rail transit operations in Pakistan. As a result, numerous severe incidents have occurred in recent years, resulting in significant human and financial loss.
According to PR's yearly reports [6], 127 incidents occurred between 2013 and 2020 due to derailing and track defects.
Railway tracks require regular and adequate maintenance, and neglect has a significant impact on the railway network [7]. To mitigate potential negative effects, a low-cost automated version of the conventional cart system, capable of monitoring the health of the railway track, must be developed and its viability evaluated. Such a system would support the regular, efficient, and accurate diagnosis and repair of tracks and minimize the possibility of accidents. Railway track condition monitoring, where railway tracks are regularly monitored to locate and fix faults, is critical for the ongoing running of railway traffic with a greater degree of safety and dependability. However, monitoring hundreds of thousands of miles of railway track necessitates a significant investment in both money and labor, making comprehensive manual monitoring impractical. Additionally, manual inspection is time consuming, prone to mistakes, and can be biased. In Pakistan, a railway cart is currently used for track inspection, with human specialists manually inspecting the track and determining where repairs are required. An automated railway track fault detection system would reduce human error, provide greater inspection range and accuracy, and reduce overall labor costs. The Internet of Things (IoT) has changed the way we interact with our environment. Smart cities, smart homes, pollution management, energy conservation, smart transportation, and smart industries are applied examples of IoT-driven developments. IoT is also used for data acquisition and telemonitoring in real time. This study proposes an IoT-based, smart, automated, and cost-effective track condition inspection approach. Common track faults such as low joint, wheel burnt, creep, crash sleeper, loose nuts and bolts, and point and crossing are investigated, with results presented in this study.
The rest of the paper is organized as follows. Section 2 provides a summary of other studies on locating such faults in railway tracks. Section 3 presents the data gathering techniques, data collection device, and proposed study approach. Section 4 provides the results and discussions, while Section 5 has the conclusion.

II. LITERATURE REVIEW
The key motives for inspecting railway lines are predictive maintenance, problem identification, and ultimately minimizing the possibility of train accidents. Periodic and frequent railway line examination is critical. Human inspection of hundreds of thousands of miles of track is time-consuming, labor-intensive, and susceptible to human error. Because of this, manually driven systems are insufficient to monitor the health of tracks routinely, reliably, frequently, and universally; thus, automatic identification and monitoring of track faults and cracks is vital. As a result, several automated systems have been developed to reduce effort and boost efficiency. Nondestructive evaluation (NDE) techniques such as electromagnetic approaches (eddy current testing [8], magnetic flux leakage (MFL) testing [9]), guided wave-based systems (ultrasonic testing [9], [10], guided wave detection [11]), vision-based systems, IoT-based systems, and acoustic-based systems have been employed for rail track inspection. More information on the tools and procedures used for rail track inspection is provided in [8] and [11]. The literature is categorized below into electromagnetic, guided wave, computer vision, IoT, and acoustic-based approaches.

A. ELECTROMAGNETIC APPROACHES
The concept of a train-based differential eddy current (EC) sensor system for fastener detection was presented in [12]. The sensor operates via electromagnetic induction, in which an alternating-current carrying coil generates an EC on the rail and other electrically conductive material in the area, and a pick-up coil measures the returning field. The results of both field measurements and lab testing show that the suggested approach can detect an individual fastening system from a height of 65 mm above the rail. A time domain feature of the measurement signal was also used to detect missing clamps within a fastening system.
The performance of a machine learning method to identify and analyze missing clamps within a fastening system, as measured by a train-based differential eddy current sensor, was examined in [13]. This study investigated six classification algorithms, with KNN being the highest performing model, achieving precision and recall of 96.64% and 95.52%, respectively.
A typical excitation coil (EC) sensor to simulate rail crack detection was presented by [14] and [15]. An alternating current (AC) bridge was included in the EC system by [16] to balance the large baseline signal. The sensor comprised an excitation coil and two detection coils combined to produce a three-winding transformer. In [17] the authors employed a differential pulse ECT sensor with an excitation coil and two detection coils to measure the plate thickness of various materials. An excitation coil and two Hall sensors were used in another differential ECT probe [17]. The detected pulse's characteristics, peak value, and time to zero were extracted for thickness description.
VOLUME 10, 2022
A sensitive magnetic induction head-based magnetic flux leakage (MFL) technique was developed in [18], [19]. An induction coil was connected to measure the change in magnetic flux detected in a magnetic core with an open gap. The maximum sensitivity, however, was attained when the distance was roughly equal to the fracture width, and the technique is highly dependent on the orientation of the sensor.
An MFL based multi-sensor technique, with a primary sensor and four auxiliary sensors positioned in the detecting direction was presented [20]. First, the root mean square (RMS) of the primary sensor signal's x-component was determined. The relative values of the sensors signal indicated faults in the data set greater than the threshold. The appropriate distances between these sensors were determined based on the magnitude of a flaw and the lift-off [20]. Finite element modelling and practical investigations demonstrate that this technology successfully suppresses vibration interference and improves flaw identification accuracy.
A technique for detecting the components perpendicular to the steel surface using a sensor probe consisting of a semi-circular yoke with induction coils at each end and a gradiometer with two anisotropic magnetic resistance sensors was proposed by [21]. In [22] the authors suggested a quantitative technique based on the pulsed MFL method to investigate the effect of sensor lift-off on magnetic field distribution, which impacts the detection capabilities for various types of damage. The approach employs a ferromagnetic one to direct additional magnetic flux to seep out. In [23], a ferrite is added to an MFL sensor to reduce the reluctance and raise the magnetic field strength above the faults, in order to detect minor imperfections in the rail surface. A magnetic sensor prototype was built using the best parameters determined by numerical parametric research [24].

B. GUIDED WAVE SYSTEMS
A non-destructive defectoscopic approach, or more precisely an ultrasonic test, conducted using the DIO 562 instrument, which also incorporated measurement data processing was proposed in [25]. During an ultrasonic examination, the equipment replicated the form of the rails. The measurement was evaluated using the PC and the specialist programme DIO 2000.
In [26], the authors introduced a contact-free rail diagnosis method based on ultrasound. Non-ablative laser sources were used to generate waves. Echo reception was accomplished using rotational laser vibrometry that measured angular velocity, elastic deformation, and rail angular displacement. The detection of rail defects was tracked using unique ultrasonic wave signal-based markers.
To achieve visual identification of the oblique fracture on the railhead surface, a quantitative detection approach integrating non-contact laser ultrasonic testing technology and variational mode decomposition (VMD) was presented in [27]. All scanning signals were preliminarily filtered using Wigner time-frequency distribution and fir1 filtering. VMD was also used to divide the signal into several intrinsic mode functions (IMF). The ideal IMF component was chosen based on the correlation coefficient (C) and SNR characteristics between different IMF components and the original signal. Finally, the time-domain and temporal features of signals were used to realize visual crack-induced surface wave energy using ultrasonic propagation pictures.

C. COMPUTER VISION BASED SYSTEMS
Computer vision-based track detection is gaining greater attention. Drones, rather than a moving cart, might enable cost-effective track inspection. An innovative method for calculating gauge measurement using drone footage was proposed by [28]. Track health was evaluated using computer vision algorithms from drone data. For data collection, a Da-Jiang Innovations (DJI) Phantom 3, equipped with a 4k camera and Sony sensors was employed. The images were transformed into hue, saturation, and value (HSV) color space to lessen the impact of changing weather conditions on lighting, and then a Gaussian smoothing filter was applied to reduce noise. Because railway tracks have a purple/pinkish hue, all colors between cyan and magenta were separated using various threshold masks to achieve track recognition. Morphological techniques were employed to delete any linked pixels below a certain threshold value, and then a Canny edge detector was utilized to achieve precise results.
The study presented by [29] used a camera capturing images at 30 frames per second to conduct a computer vision experiment. The camera was placed on a locomotive with the aim of providing a continuous, steady image for real-time railway track fault identification. An Inception V3 model pretrained on the ImageNet dataset was fine-tuned for binary classification. The model generalized effectively on actual vegetation pictures for vegetation overgrowth. A sun kink classifier achieved 97.5% accuracy in classifying professionally produced sun kink videos. The study [30] proposed a visual-based track inspection system (VTIS) employing TrackNet, a multiphase deep learning-based rail surface anomaly detection and classification approach.
A vision-based system for track inspection and defect identification was presented by [31]. A Gabor filter was used to breakdown the input picture, and texture characteristics retrieved using segmentation-based fractal texture examination. The track defects were classified using the AdaBoost classifier. The study by [32] proposed a vision-based autonomous rail inspection system employing the structured topic model (STM) to detect the presence (or absence) of sleepers or fasteners by evaluating real-time pictures captured by a digital camera, positioned beneath a diagnostic train. Similarly, [33] presented a railway track derailment monitoring system for automated visual inspection of railroad tracks that detects flaws using prerecorded videos. The scope of [33] was limited to the localization of rail problems, ballast, tie and tie plate, and spikes, tie plate holes, and anchors.
Deep convolutional neural network (DCNN) based cascade learning embedded vision inspection technique for rail fastener detection was presented in [34]. The two phases of the proposed technique were region location and defect detection. Initially, a modified Single Shot multibox Detector (SSD) model was used to identify the fastener locations within the collected railway images. To identify faulty fasteners, a key component identification approach based on an enhanced Faster Region Convolutional Neural Network (RCNN) was used. Extensive trials were carried out to show the effectiveness of the suggested method. The results of the experiments indicated that the suggested technique achieved an average precision of 95.38% and an average recall of 98.62%.
A vision-based rail track inspection system was presented in [35]. YOLOv3 was implemented and trained as the deep learning model, and the accuracy and recall rates for damaged fasteners were subsequently validated on the test dataset. In the study, a GoPro motion camera mounted on a rail maintenance vehicle was used to collect and record fastener images along a total of 20 kilometers of track. The accuracy and recall rates for faulty fastener detection were 89% and 95%, respectively.
Using image processing techniques such as Canny edge detection and 2D discrete wavelet transformation, the research presented in [36] enabled real-time identification of railway track faults. Due to its unique threshold amplitude, the Canny edge detector can recognize squats in real time, using a camera module installed on a specially constructed handheld Track Recording Vehicle (TRV). By applying a high sub-band frequency filter, the 2D discrete wavelet transformation validated the inference of the Canny edge detector regarding track damage and determined damage severity. The complete technique was implemented on a Raspberry Pi 3 B+ using the OpenCV API. When tested on a real train track, the algorithm's ability to detect track surface defects in real time proved reliable. In terms of detecting the degree of track surface degradation, wavelet transformation outperforms Canny edge detection, but its processing overhead becomes a bottleneck in real-time operation.

D. IoT BASED SYSTEMS
An autonomous robot based on a PIC microprocessor and obstacle sensors was presented by [37]. The vehicle was equipped with a GPS module to track the location of a crack and deliver Short Message Service (SMS) notifications through a Global System for Mobile communication (GSM) module.
A real-time railway fishplate monitoring system based on IoT was presented by [38]. The proposed system monitored the position of each bolt on each fishplate and notified the central railway monitoring center, neighboring stations, and incoming train drivers if any bolt became loose.
A robot mechanism prototype demonstrated in [39] was capable of detecting rail surface defects such as cracks, squats, corrugations, and rust. To diagnose problems, the system employed ultrasonic sensor inputs in conjunction with image processing, using OpenCV and deep learning techniques. Each robot was controlled locally by a Raspberry Pi 3, which communicated real-time data to an internet server. Four ultrasonic sensors were mounted above and on each side of the railway track surface to identify problems.
The study by [40] presented an automatic fault detection system, comprising several sensor modules, embedded on a moving robot. An infrared (IR) sensor, a limit switch [41], and ultrasonic sensors were included within the sensor layer, controlled by an LPC 1768 ARM microcontroller. If faults were detected, the location and type of fault were reported to the control room through the GSM module.
In study [42], an ultrasonic metal detecting sensor was used to identify cracks with greater precision. For crack detection, encoders and radio frequency transmitters were employed, with a constant flow of current between the encoders indicating that tracks were fault free. The transmitter would emit RF signals as long as the current remained constant. If there was a crack in the track, the current flow between the encoders would suffer an interruption. This, in turn, inhibits the transmitter from emitting RF signals, resulting in no signal being received by the locomotive's receiver, thus leading the microcontroller to bring the train to a halt.
A track recording vehicle (TRV) with an innovative design based on axle-based acceleration approach for rail track defect diagnosis was proposed in [43]. The system was reported to be 87% more effective than the conventional push trolley-based TRV system, according to site-specific testing. The authors of [44] proposed a unique automated system based on robotics and visual inspection, enabling local image processing while inspecting, a cloud storage of information consisting simply of photos of defective railway tracks, and robot localization within a range of 3-6 inches. The technology employed ML and applied it to the photos received from the tracks to determine potential faults. The areas were then identified, and a dedicated operator with directed spots/areas to examine could carry out a thorough examination.
The researchers of [45], [46], and [47] presented an IoT-based prototype vehicle for railway track crack monitoring. Ultrasonic sensors and infrared sensors were used to detect cracks and obstacles in rail tracks, respectively. Whenever a crack was detected in the rail track, the vehicle stopped automatically, and an alert message was sent to the authorities via a GSM module with the location determined by the GPS sensor. The systems presented by [46] and [47] employed a solar cell to charge the battery running the system and the vehicle.

E. ACOUSTIC BASED SYSTEMS
The authors of [48] proposed an acoustic analysis-based system for fault identification and diagnostics. Dataset collection was carried out via a SAMSAM TM NS-AM type rail point system, mounted with audio sensors. This study investigated faults such as ice blockage, ballast obstruction, and slackened nuts. MFCC features were used to conduct two tests on the entire dataset, one for fault detection and one for fault classification, achieving an accuracy of 94.1%. An autonomous railway track fault detection system that could distinguish three classes using acoustic analysis (normal track, wheel burnt, and superelevation) was presented in [49]. MFCC features were extracted from the fault audio and fed into support vector machine (SVM), logistic regression (LR), multilayer perceptron (MLP), convolutional neural network (CNN), decision tree (DT), and random forest (RF) models for classification. In terms of accuracy, the DT and RF models outperformed the others, both showing 97% accuracy in detecting the aforementioned faults. A railway track inspection system combining standard acoustic methods with deep learning models to improve performance was presented by [50]. The system employed two CNN models, convolutional 1D (Conv1D) and convolutional 2D (Conv2D), as well as one recurrent neural network (RNN) model, a long short-term memory (LSTM) network. The system reported 94.9%, 96.5%, and 93.3% accuracy with Conv1D, Conv2D, and LSTM, respectively.
Using acoustic emission (AE) monitoring data and knowledge transferred from an acoustic-related database, the study presented in [51] described a unique transfer learning method for assessing the structural states of rail tracks. In particular, the proposed CNN model (NA-AE) transferred lower-layer knowledge from a pre-trained AudioSet model to extract acoustic-specific features from the time-frequency spectrograms of over two months of AE monitoring data, collected from an in-service point rail; only the higher layers of the proposed model required training. Testing results indicate that the proposed NA-AE model performed well on the rail condition assessment task based on AE data, with a high macro-F1 score of 97.5 percent, and converged in 100 s.
A nondestructive single-sensor AE method to detect and localize cracks in steel rail tracks under stress was presented in [52]. AE signals were recorded by the AE sensor and converted to digital data by the AE collection module. The digital data were denoised to eliminate ambient and wheel/rail contact sounds, and the denoised data were processed and categorized to pinpoint fractures in the steel rail using a deep learning algorithmic model. The computational model was trained and validated using AE signals of pencil lead breaks at the head, web, and foot of steel rail. The deep learning-based AE method was also implemented on-site in order to detect cracks in the steel rail, with an accuracy of 78%, 80%, and 74% in the rail head, web, and foot, respectively.
Several researchers have investigated ways to identify defects in railway tracks; [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24] used electromagnetic approaches and [25], [26], [27] used ultrasonic waves to detect track faults. Ultrasonic waves are ineffective where the rail head surface is severely damaged or substantially worn [53], and they cannot detect smaller defects [53], [54]. For electromagnetic detection, the velocity effect can change the signal's amplitude, and the signal is susceptible to interference from the surrounding environment. Consequently, a well-designed algorithm is required to counteract these effects. Unlike ultrasonic inspection, electromagnetic inspection can identify faults close to the surface [54]. [28], [29], [30], [31], [32], [34], [35], [36] proposed vision-based systems employing fault images acquired by several camera types, exhibiting good accuracy in controlled environments. Due to environmental conditions such as light variations, dirty lenses, and temperature, the performance of these systems is substantially reduced in real-world applications. Thermographic cameras are expensive, and their cost does not justify mass rollout [53]. Papers [37], [38], [39], [40], [41], [42], [43], [44], [45], [46], [47] presented IoT-based systems for detecting defects in railway tracks. However, the deployment cost of sensors and equipment makes such systems costly. In addition, malfunctioning sensors demand accurate sensor replacement, further increasing system costs, and the maintenance of such systems requires specialized personnel. The studies presented in [48] and [49] used audio recordings of railway line faults, while [50] used the spectrogram of audio data obtained by [49]. Only three faults were detected in [48], with two faults detected in [49] and [50].
Papers [51], [52] used AE to detect cracks in railway tracks, and both achieved good accuracy. However, the sensor that records AE must be placed on the railway track. Due to the complicated geometry of the rail section and the high cost of AE sensors, cracks cannot be identified without accurate, manual sensor placement on the railway line.
This study develops an autonomous IoT-based railway track fault detection system to improve the current railway cart system. This work aims to mitigate the challenges of labor, bias, human intervention, and resource limitations. The acoustic data of six frequently occurring defects were gathered on Pakistan's operational railway tracks. This study considers a larger number of faults than previous investigations and contributes to the detection, classification, and localization of railway track faults based on acoustic analysis.

III. METHODOLOGY
This section describes the dataset acquisition technique and machine learning algorithms used for classification, along with the proposed methodology. All IoT systems share the generic architecture shown in Figure 2: a framework capable of detecting, responding, and acting or reacting, without human intervention, whenever it is exposed to a change or stimulus in the environment in which it is deployed.
The framework presented in this manuscript is designed for a real-world scenario. The microphone and GPS sensor are directly connected to the Raspberry Pi (RPi), which runs the Linux-based operating system Raspbian. The RPi is a credit card sized, low-cost computer [55]. The microphone and GPS are mounted on top of the RPi. The microphone records the acoustic signal caused by the friction between wheel and railway track. A stereo acoustic signal with a sampling frequency of 44100 Hz and the GPS location are recorded and sent every 5 seconds to a cloud via a Wi-Fi network (IEEE 802.11n), which saves local disk space. The upload runs in parallel with recording using the multiprocessing library, which enables parallel and distributed computing in Python [56]. Due to the memory constraint, the RPi storage is organized in round-robin fashion: the acoustic signal of the wheel-track interaction, its timestamp, and its location are stored locally for a short period and, once they have been pushed to the cloud, are overwritten by the latest data (acoustic signal, timestamp, and GPS location). If the internet connection is absent or interrupted, the acoustic signals are stored locally, and all files are subsequently pushed to the cloud with their timestamps and GPS locations when the connection is restored.
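The store-and-forward behaviour described above can be illustrated with a minimal sketch. The class name `RoundRobinBuffer`, the `upload` and `is_online` callbacks, and the policy of overwriting the oldest clip when the buffer is full are assumptions made for illustration; the paper does not specify the on-device implementation.

```python
from collections import deque


class RoundRobinBuffer:
    """Fixed-capacity local store for (timestamp, gps, audio) records.

    Hypothetical sketch: a deque with maxlen silently discards the oldest
    record when a new one arrives and the buffer is already full.
    """

    def __init__(self, capacity):
        self._clips = deque(maxlen=capacity)

    def record(self, timestamp, gps, audio):
        # Store the newest 5-second clip together with its metadata.
        self._clips.append((timestamp, gps, audio))

    def flush(self, upload, is_online):
        """Push buffered clips to the cloud, oldest first.

        `upload(clip)` returns True on success; `is_online()` reports
        connectivity. Clips are removed only after a successful upload,
        so nothing is lost during an interruption.
        """
        sent = 0
        while self._clips and is_online():
            if not upload(self._clips[0]):
                break
            self._clips.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._clips)
```

With this design, clips recorded while offline simply accumulate until the next successful call to `flush`, which preserves their original timestamps and GPS locations.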
For this work, an RPi model B+ with 10 GB of free storage out of 32 GB is used. With this capacity, the device can store up to 11627 audio clips of 5 seconds each, covering about 16 hours. This means the RPi can store data locally for 16 hours in case of internet interruptions. Data analytics, which predicts the fault type of an unseen acoustic signal, is performed on the cloud. The GPS module provides the location of the acoustic signal patch recorded on the faulty track. Each audio signal patch is 5 seconds long and the average speed of the cart is 35 km/h. With this arrangement, once a fault is detected, the fault lies within 48.6 m of the reported GPS location, which is the distance the cart travels during one patch. The schematic diagram is shown in Figure 1.
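The figures above can be checked with a short calculation. The per-clip size of 0.86 MB is an assumption back-computed from the reported 11627 clips in 10 GB; for comparison, an uncompressed 5-second stereo clip at 44.1 kHz with 16-bit samples would occupy 882,000 bytes.

```python
CLIP_SECONDS = 5
CART_SPEED_KMH = 35
FREE_STORAGE_MB = 10_000   # 10 GB of free storage, decimal megabytes
CLIP_SIZE_MB = 0.86        # assumption: implied by the reported clip count


def localization_window_m(speed_kmh, clip_seconds):
    """Distance in metres travelled by the cart during one audio patch."""
    return speed_kmh * 1000 / 3600 * clip_seconds


def clips_storable(free_mb, clip_mb):
    """Number of whole clips that fit in the free local storage."""
    return int(free_mb / clip_mb)


def buffer_hours(n_clips, clip_seconds):
    """Hours of recording covered by n_clips stored locally."""
    return n_clips * clip_seconds / 3600


window = localization_window_m(CART_SPEED_KMH, CLIP_SECONDS)   # about 48.6 m
n_clips = clips_storable(FREE_STORAGE_MB, CLIP_SIZE_MB)        # 11627 clips
hours = buffer_hours(n_clips, CLIP_SECONDS)                    # about 16.1 h
```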

A. DATA COLLECTION
A mechanical cart, shown in Figure 4 and provided by the PR administration at Walhar, district Rahim Yar Khan, was used for data collection. An onsite setup was established for dataset acquisition at the Walhar railway station. Two microphones were placed at the safest maximum distance (1.75 inches) from the point of contact between the wheel and track. Microphones were fitted to the cart's right and left sides to collect data. Figures 5 and 6 depict the arrangement of microphones affixed to the cart's left and right sides. The mechanical cart was powered by a generator, which kept the cart running at an average speed of 35 kilometers per hour. Two ECM-X7BMP microphones, shown in Figure 7, were set at the cart's left and right wheels. These are unidirectional electret condenser microphones with a 3-pole locking mini plug, a sensitivity of -44.0 ±3 dB, and an output impedance of 1.2 kΩ ±30%. Foam or fur was used to shield the microphone diaphragm from air gusts, because without a windshield wind may cause loud pops in the audio signal. Because foam windshields are typically the first line of defense against wind noise, they were also used to dampen cart vibrations and prevent their transmission to the microphone. Data was collected at a sampling frequency of 44.1 kHz. Subsequently, the recordings were manually labeled in order to structure the dataset. Each recorded audio clip was then split into 216 frames, with a window length of 1024 and a hop size of 512.
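As a worked check of the framing parameters, the sketch below counts frames for a given clip length and hop size. It is an illustration under assumptions, not the authors' code: the reported 216 frames is consistent with a 5-second clip at 22,050 Hz under a centered (padded) framing convention, as used by default in libraries such as librosa, whereas at 44.1 kHz the same settings would give 431 frames. Whether the recordings were resampled before feature extraction is not stated in the text.

```python
def frame_count(n_samples, hop_length=512, frame_length=1024, centered=True):
    """Number of analysis frames for a signal of n_samples samples.

    centered=True assumes the signal is padded so that frames are centred
    on multiples of hop_length; centered=False counts only frames that fit
    entirely inside the signal.
    """
    if centered:
        return 1 + n_samples // hop_length
    return 1 + (n_samples - frame_length) // hop_length


frames_22k = frame_count(5 * 22050)   # 216, matches the reported count
frames_44k = frame_count(5 * 44100)   # 431 at the recording rate
```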

B. PROPOSED METHODOLOGY
The architecture for the classification of six types of railway track faults is represented in Figure 8. The collected audio data was used to detect faulty tracks. Acoustic features from the audio data were used to train the machine and deep learning algorithms. This study employed 40 Mel-frequency cepstral coefficients (MFCC) per audio frame.
Each audio frame contains 40 Mel-frequency cepstral coefficients (MFCC). Total frames in a 5 sec acoustic audio signal are 216 creating a matrix B with 216 rows and 40 columns. Taking the mean of each column of the matrix B yields a vector D of size 40 and stored in matrix 'A'.   This resulted in a matrix 'A' with 1625 rows and 40 columns, with 1625 rows representing the frames and 40 columns representing the MFCC values respectively. Each element in matrix A represents an MFCC coefficient value for a specific frame from a specific crack class. These features were used to train and test the employed machine learning algorithms.
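The reduction from a per-frame MFCC matrix B to a single feature vector D, and the stacking of those vectors into A, can be sketched as follows. The MFCC values here are random placeholders, not real data, and plain Python lists are used only to keep the sketch dependency-free; an array library would do the same in one call.

```python
import random

N_FRAMES, N_MFCC = 216, 40  # frame and coefficient counts from the text


def clip_to_vector(B):
    """Column-wise mean of a (frames x coefficients) matrix, giving vector D."""
    n_rows = len(B)
    n_cols = len(B[0])
    return [sum(row[j] for row in B) / n_rows for j in range(n_cols)]


# Placeholder MFCC matrices for three hypothetical clips.
rng = random.Random(0)
clips = [
    [[rng.gauss(0.0, 1.0) for _ in range(N_MFCC)] for _ in range(N_FRAMES)]
    for _ in range(3)
]

# Each clip contributes one 40-dimensional row to the feature matrix A.
A = [clip_to_vector(B) for B in clips]
print(len(A), len(A[0]))  # 3 40
```

Averaging over frames discards temporal ordering within a clip, which keeps the feature vector small and fixed-length regardless of how many frames the clip has.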
MFCC is based on signal decomposition using a filter bank. The MFCC provides a discrete cosine transform (DCT) of the real logarithm of the short-term energy expressed on the Mel frequency scale. The equation below gives the commonly used approximation of the Mel scale from physical frequency:
$$\mathrm{mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$
where mel(f) denotes frequency in mels and f denotes frequency in hertz. The steps of MFCC computation are listed below [57]:
• Split the signal into short frames.
• Estimate the periodogram power spectrum for each frame.
• Apply the Mel filter bank to the power spectra and sum the energy in each filter.
• Take the logarithm of all filter bank energies.
• Take the DCT of the log filter bank energies. DCT coefficients 1-40 are retained, while the remainder are discarded.
Figure 9 illustrates the process of obtaining MFCC features.
FIGURE 9. Steps to derive MFCC features [18].
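The last three steps listed above (Mel filter bank, logarithm, DCT) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the widely used approximation mel(f) = 2595 log10(1 + f/700), triangular filters, an unnormalized DCT-II, and 64 Mel filters. The filter count is an assumption, since the source only states that 40 coefficients are retained. The input is a flat placeholder power spectrum rather than real audio.

```python
import math


def hz_to_mel(f_hz):
    """Assumed Mel approximation: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters equally spaced on the Mel scale (one row per filter)."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies, equally spaced in mels from 0 to Nyquist.
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct2(x, n_out):
    """Unnormalized DCT-II, keeping only the first n_out coefficients."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
        for k in range(n_out)
    ]


def mfcc_from_power(power, bank, n_coeffs):
    """Filter-bank energies -> log -> DCT for one frame's power spectrum."""
    energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
    log_energies = [math.log(e + 1e-10) for e in energies]  # epsilon avoids log(0)
    return dct2(log_energies, n_coeffs)


bank = mel_filterbank(n_filters=64, n_fft=1024, sample_rate=44100)
flat_power = [1.0] * (1024 // 2 + 1)  # placeholder spectrum, not real data
coeffs = mfcc_from_power(flat_power, bank, n_coeffs=40)
print(len(bank), len(bank[0]), len(coeffs))  # 64 513 40
```

In practice the per-frame power spectrum would come from a windowed FFT of each 1024-sample frame, and a library routine would normally replace this hand-written version.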

C. EXPERIMENT SETUP
Railway track inspection professionals from Pakistan Railways detected and confirmed the faults on the tested tracks. The cart was run on the defective tracks and audio signals were obtained for the fault types listed in Table 1.
Wheel burns, a typical example of which is shown in Figure 10, are caused by a locomotive's driving wheel slipping on the rail surface, causing the wheel to burn. Wheel burn is most evident on steep inclines or during rain [58]. When the locomotive's pulling power is insufficient to support the weight of the train, wheel slip occurs, causing the rail temperature to rise and the rail surface to melt.
Rail creep, as shown in Figure 11, is defined as a longitudinal movement of the rail in relation to the sleepers. Rail has a propensity to gradually shift in the direction of dominant traffic. Rail creep is common to all railway tracks, and its value ranges from almost nil in certain cases to about 130 mm per month [59].
A sleeper is a weight-bearing component of the railway system that is installed transversely to support the rail. Sleepers are often known as ''Ties'' since they connect the rails [60]. Nowadays, sleepers are made of pre-stressed concrete and are commonly referred to as Pre-Stressed Concrete (PSC) sleepers. The sleeper provides the permanent way with longitudinal and lateral support. It ensures that rails are properly gauged and aligned. The sleeper evenly distributes the weight from the rails to the upper ballast surface and works as an elastic medium between the rails and the ballast to cushion the blows and vibrations of moving loads [30]. Among the causes of sleeper faults are inappropriate screwing and unscrewing of fasteners, inaccuracy during ballasting and ballast deficiency during maintenance, poor tamping, derailments, inappropriate sleeper spacing, and non-alignment of sleepers in the track [60].
Bending cracks, sleeper breaks owing to derailment, cutting cracks, sleeper instability in the fastening area, and sleeper damage on dry land are the primary sleeper flaws [60]. A crashed sleeper, as shown in Figure 12, is dangerous for the flow of rail traffic.
Rail fastenings (nuts and bolts) keep the rails linked to the railway sleepers, give an adequate slope of the rail foot (1:20, 1:40) in the transverse plane, and prevent longitudinal movement of the steel rails [61]. Rails are firmly held in place in the rail seat by the fastening, which also prevents the rail from rotating around the outside edges of the rail foot [61]. If any of these fastenings is missing or loose, the train is in danger. Figure 13 shows a loose fastening.
In a railroad, a rail joint performs a jointing (connection) function. A high-quality railway joint may effectively reduce the impact of wheels passing over the jointing areas of the steel rail while also enhancing the stability and continuity of passing trains [62]. Joint bars, as one of the key parts of a railway, are widely employed in light rail and heavy rail to provide railroad transit safety. Figure 14 shows a low joint fault.
Points and crossings are important components of railway track. They are used to move railway vehicles from one track to another that is either parallel to or diverges from the first. Wear on rails, corrugation, and rail corrosion are some of the types of distress that occur at points and crossings [63]. Around 90% of severe railway accidents across the world occur at or near points and crossings [63]. This is because points are the track's weakest link. As a result, significant care must be taken to keep them in excellent working condition. Figure 15 shows a point and crossing.
The time domain and Mel spectrogram plots of the acoustic signals of all faults are shown in Table 2. There is a visual distinction between these track sounds: the Mel spectrograms in Table 2 display different intensity distributions over several frequency ranges. The experimental dataset was collected on the mainline, where traffic regularly flows. Because the assigned section contained only these faults at the time, the experiment was conducted on this line exclusively.

IV. RESULTS
This section focuses on the performance and outcomes of the various classifiers. Forty MFCC coefficients were extracted as features from the audio recordings of the six faults. Subsequently, a dataset with 1599 rows and 40 columns was maintained, where rows represent fault samples and columns comprise the extracted features (MFCC coefficients). The dataset was split into train and test datasets with a 70:30 ratio. In this work, fault classification was performed using classical Machine Learning (ML) models such as logistic regression (LR), AdaBoost (ADB), and random forest (RF), as well as deep learning models, namely an artificial neural network (ANN) and a multilayer perceptron (MLP). Table 3 lists the parameters used to tune the classifiers. The non-standardized feature vectors, each comprising 40 MFCC coefficients, were input to the ML models to be classified into six labels (creep, crash sleeper, loose nut bolt, low joint, point and crossing, and wheel burn). Subsequently, the ML models were validated on a validation dataset and evaluated on a test dataset. A 10-fold cross-validation was performed on the dataset and the results are shown in Table 4.
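The 70:30 split and the 10-fold partitioning described above can be illustrated with index arithmetic alone. The sketch below is a simplified, non-stratified version written without any ML library. The seed, the rounding rule for the test size, and the choice to run the folds over all 1599 rows are assumptions, since the source does not specify them.

```python
import random


def train_test_indices(n_rows, test_fraction, seed):
    """Shuffle row indices and split them into (train, test) lists."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_test = round(n_rows * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold_indices(n_rows, k, seed):
    """Yield (train, validation) index lists for each of k folds."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i in range(k):
        validation = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation


N_ROWS = 1599  # dataset size from the text

train_idx, test_idx = train_test_indices(N_ROWS, 0.30, seed=42)
print(len(train_idx), len(test_idx))  # 1119 480

fold_sizes = [len(val) for _, val in k_fold_indices(N_ROWS, 10, seed=42)]
print(fold_sizes)  # nine folds of 160 rows and one of 159
```

With six fault classes of possibly unequal size, a stratified variant that preserves class proportions in every fold would usually be preferable.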
The performance of the ML models on the test dataset is shown in Table 5 and visualized in Figure 16. It is evident from Table 5 that the deep learning model MLP outperformed the other ML models, suggesting that the MLP generalizes better. Further, the per-class model performance can be examined through the confusion matrix. Figure 17 shows the confusion matrix of the best-performing classifier, MLP.
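A confusion matrix and the per-class figures derived from it can be computed as sketched below. The six label names are taken from the text, but the true and predicted labels are invented toy values for illustration only and do not reproduce the results in Table 5 or Figure 17.

```python
LABELS = [
    "creep", "crash sleeper", "loose nut bolt",
    "low joint", "point and crossing", "wheel burn",
]


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def per_class_recall(matrix):
    """Recall per true class; 0.0 for classes with no samples."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Toy labels, invented for this sketch.
y_true = ["creep", "creep", "low joint", "wheel burn", "wheel burn", "low joint"]
y_pred = ["creep", "low joint", "low joint", "wheel burn", "wheel burn", "low joint"]

cm = confusion_matrix(y_true, y_pred, LABELS)
print(accuracy(cm))
print(dict(zip(LABELS, per_class_recall(cm))))
```

Off-diagonal entries show which fault types are confused with one another, which overall accuracy alone does not reveal.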

A. COMPARISON
A comparison was performed with other acoustic based systems. Researchers [48], [49], [50] presented acoustic based railway fault detection systems. Although the system presented in this manuscript also uses acoustic data, its data was collected on a main railway line. The number of faults covered by the proposed system is greater than that of the systems presented in [48], [49], [50]. The accuracy achieved by all systems, along with the classifier, features and faults, is shown in Table 6.
It is evident from Table 6 that the proposed system surpasses the other presented systems in terms of accuracy while covering a larger number of faults. Further work is now ongoing to repeat and augment the work with further faults across more stations.

V. CONCLUSION
Railway track monitoring and maintenance are essential for effective and safe railway operation. The absence of agreed, stable and effective track fault detection methods results in safety alerts, accidents and losses in terms of assets, time, and lives. Thus, satisfactory and timely track maintenance and fault prevention should be conducted as a matter of course. In many developing nations, the typical railway cart for track inspection involves manual inspection, relying heavily on human action and judgement for track defect identification. A smart IoT based railway cart is proposed to autonomously identify and localize railway track faults using acoustic analysis. Microphones and a GPS sensor connected to an RPi and positioned near the wheels of the cart were used to record the sound and to send an acoustic signal and a GPS location every five seconds to a remote cloud. A dataset was maintained by deriving forty MFCC features from the collected fault sounds. Different machine learning models were trained and evaluated on this data. Amongst them, MLP achieved the best accuracy of 98.4%. The authors are now preparing an IoT system for trains rather than railway carts, to gather more fault types and data from other typical railway terrains across Pakistan and other countries through international collaboration.