A Deep Learning Approach Towards Railway Safety Risk Assessment

Railway stations are essential aspects of railway systems, and they play a vital role in public daily life. Various types of AI technology have been utilised in many fields to ensure the safety of people and their assets. In this paper, we propose a novel framework that uses computer vision and pattern recognition to perform risk management in railway systems in which a convolutional neural network (CNN) is applied as a supervised machine learning model to identify risks. However, risk management in railway stations is challenging because stations feature dynamic and complex conditions. Despite extensive efforts by industry associations and researchers to reduce the number of accidents and injuries in this field, such incidents still occur. The proposed model offers a beneficial method for obtaining more accurate motion data, and it detects adverse conditions as soon as possible by capturing fall, slip and trip (FST) events in the stations that represent high-risk outcomes. The framework of the presented method is generalisable to a wide range of locations and to additional types of risks.


I. INTRODUCTION
Railway station environments are dynamic, and this dynamicity varies according to size and location. A variety of passengers transit the station, including families, old and disabled individuals and groups. Some stations are crowded at peak times because of the limited space, and increases in demand due to operational delays, design or layout deficiencies or management shortages can increase the risk of fall, slip and trip (FST) events.
FSTs are a leading cause of injury. In particular, falls due to slipping are statistically the main cause of accidents on crossways in built environments and railway stations [1]. The consequences of FSTs are not limited to the individual who suffered the accident, who may be seriously injured; FSTs can also affect railway operations, causing delays and disturbing the flow of people. Platforms, which offer access to trains, and escalators are hazards that form hot spots for FSTs. According to the RSSB Annual Report on Public Safety (2015/2016), over the last five years, the highest percentage of injuries from slips, trips and falls in stations occurred on stairs (38%), with platforms the second most common location (27%) [2]. Generally, the magnitude of falls worldwide rises with age: it has been reported that 32-42% of elderly adults (aged 70 years or older) fall each year, from 5 to 7 times [3]. Some factors previously presented as the most crucial in FST events in the station include intoxication, security, hurrying, station design, staff skills and training [4]. Other challenges, such as weather conditions, congestion, cultural differences, insufficient maintenance and unwanted events, may cause panic and FSTs [5], [6]. Much of the unsafe behaviour exhibited by passengers, employees and the public can be described via the theory of behaviour-based safety (BBS), which has been demonstrated to be an effective tool for promoting safety [7]. BBS includes three steps: observation, feedback and training. However, in railway stations, these observations are based on humans; thus, they also involve human error. Moreover, covering all points in the organisation is costly, time-consuming, or impossible [8], [9].
Worldwide, FSTs are a severe problem that leads to substantial numbers of injuries and has endemic societal and economic consequences that affect people of all ages. Despite logical, well-conceived attempts to diminish the number of casualties, these approaches have had mixed success [10]. FSTs are classified as causes of unintentional injury in many activities, both occupational and leisure-related, and their causes include loss of balance, which may result in falls to the ground or to lower levels. The factors to be considered include footwear, flooring conditions and visibility level as well as other external factors, such as crowding [11], [12]. Moreover, station design and layout factors, including corridors and entrances, egress routes and escalators, play an important role in safety and security and in preventing FSTs; however, the designs of some older stations include narrow areas [13]. Large numbers of people in these limited areas may lead to crowding in a wide range of settings, such as railway stations or entertainment venues (e.g., sports matches or music festivals), which raises the risk level of FSTs. The flow of people may also be affected by obstructions, which can result in pushing, falling or, in the worst case, trampling, which may increase the number of incidents. FSTs in crowded situations can have serious consequences; historically, many people have died or suffered serious injury during events such as religious pilgrimages. Such risks increase when the railway industry's growth level is inadequate to serve the market demand for train travel. In addition, such risks also increase for older passengers, for travellers carrying large luggage items and when intoxication is involved [14].
FIGURE 1. The overlap between FSTs and system aspects.
The associate editor coordinating the review of this manuscript and approving it for publication was Lorenzo Ciani.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
FSTs are associated with many aspects of accidents, such as human-to-station environment interactions, including infrastructure and trains. Many causes are attributable to such risks and they lead to many different consequences (see Fig. 1). Moreover, the nature and patterns of FSTs and their active control measures require more research [12].
Currently in railway stations, detecting such risks relies on CCTV or staff observations; however, this approach has the potential for human error and may not result in a timely response, which can exacerbate the consequences. Furthermore, accurate station area detection includes platforms, escalators and tunnels; the images can include the full range of the station and thus provide the potential for timely responses. Technological growth has helped to extend and improve protection, especially CCTV systems. In recent years, automated video surveillance has enhanced public safety awareness and led to innovative research in a wide range of fields, including disaster management, crime prevention and security, assistance for people with disabilities, productivity enhancements and monitoring critical infrastructure [15].
In railway stations, CCTV and analogue cameras aid in accurately monitoring station areas, including platforms, escalators and tunnels. However, the human behind the screens is the core of the system, which leads to possible human errors. In practice, a greater number of cameras in such areas reduces the monitors' ability to maintain an overview of events in real-time. In fact, in accidents, CCTV functions primarily as evidence; thus, the process involves working with historical records of events more than with real-time event detection. Artificial intelligence (computer vision-based) techniques have been suggested as a possible solution to these issues; at minimum, they could function as an important assisting element to overcome some of the limitations of the conventional methods of risk management at stations and improve the safety system. This paper proposes a monitoring method that uses computer vision to automatically and rapidly identify risks in stations by recognising unsafe actions, providing support for decision-makers in real-time and reducing the potential consequences of unwanted events.
A vision-based approach can be considered the most suitable for crowded critical locations such as railway stations. Computer vision technology has demonstrated its potential for practical, cost-effective, rapid visual data collection, and vision-based approaches have been adopted in many fields, such as construction, safety and quality management [16]-[18] and productivity management [19]-[21]. The railway industry has already seen benefits from such methods in applications including the detection of railway track-gauge irregularities [22], railway maintenance [23] and trespassing detection [24]. Similarly, computer-based image recognition has been applied to detect and recognise railway infrastructure and changes in the surrounding environment [25]. Deep learning techniques have also been proposed to evaluate rail quality using track geometry [26]. Such research focuses on how to utilise technology such as the convolutional neural network (CNN) to analyse big data collected by railway systems to build risk-recognition frameworks; in this case, for FST risks in railway stations (see Fig. 2). The railway industry creates big data that have potential value for improving the system. These massive data can be utilised to provide suitable solutions for safety and security risks. The goal is to tackle the changing risks that face the sector via image data. The data can cover a wide variety of aspects and take many forms, such as spatial-temporal data, videos or images and data fusion. The data used for monitoring can be collected at fixed points or by sensors installed on moving trains or other vehicles, such as drones [27], [28]. Moreover, these data-gathering systems and their configuration can integrate with the Internet of Things (IoT), which is a framework suitable for big data technology, smart stations, smart cities and smart maintenance [29]-[33].
The remainder of this paper is organised as follows: Section II reviews the related works. Section III provides background information about deep learning and risk management in the railway industry. Section IV presents the concepts of using deep learning for risk management decision making in railway stations. Section V presents the framework of the case study model. Section VI discusses the application of the CNN case study model in railway stations, Section VII provides a data analysis, and finally, a discussion and conclusions are given in Section VIII.

II. RELATED WORKS
In this section, we attempt to assign the previous works to various subsections; however, their topics are scattered across many fields and include a variety of perspectives. CNNs have been applied in a wide range of visual computing applications, including signal processing [34], [35], speech recognition [36], medical imaging [37]-[42], object detection [43]-[47], face recognition [48]-[51], robot control [52], [53], autonomous driving (AD) and control [53]-[55], crash detection, risk estimation and traffic monitoring [56], [57]. Some models have even been implemented on mobile devices, such as Google's FaceNet [58] and Facebook's DeepFace [59], which are used for face recognition [60], [61]. Other studies have different perspectives, such as energy efficiency and data availability [62]-[64] and deep learning technologies for civil engineering applications, infrastructure monitoring and pedestrian detection and tracking [52], [65]-[68]. From an occupational safety perspective, images and deep learning have been utilised to detect oil spills at a steel plant [69] and to augment safety in the construction industry [70]. Furthermore, the method used to detect and track humans underwater [71] has also been applied to the automatic detection of unsafe actions in onsite videos [8] and for transport security using X-ray security images [72].
While the main areas initially involved computer science and related technologies, researchers have been applying deep learning techniques in their own fields. In the railway industry, the main concerns of this research are railway operations and safety (risk management). Features generated from CCTV images or other cameras in stations are fed into deep learning models so that they can learn from passengers' actions over a period of time. The goal is to train the CNN to automatically extract feature sequences that represent unsafe acts from videos, detect the presence of such sequences, and then initiate actions to mitigate the possible risks. Depth sensors such as Kinect TM or multiple cameras have been used to detect and record unsafe actions by extracting 3D skeletal models of humans [73]. Additionally, machine learning techniques combined with various processing methods have been applied [74]- [77]. The studied technologies include multisensor fusion-based approaches [78], accelerometer-based approaches [79], [80], smartphone-based approaches [81], [82], vision-based approaches [83], [84] and systems based on video data. Such systems can assist in detecting falls by monitored individuals at their homes [85]. Moreover, a previous study aimed at protecting and detecting falls showed that several major categories of sensing equipment have been used (see Fig. 3) [76], [84], [86]- [94].
The next subsection presents a review of the previous works in some fields related to this study, which involves detecting proposed risks in the study framework in a railway station by applying a CNN.

A. RELATED WORKS IN THE CONSTRUCTION FIELD
In construction and other fields, unsafe human behaviour is an important root cause of accidents [95], [96]. To identify common unsafe actions, stereo cameras were used to collect motion data and construct a 3D skeleton model; then, pattern recognition was applied to manage worker safety in the construction field [8] and to detect problems occurring on the site. Several defect management systems based on image matching have been suggested [97]. To reduce operational constraints, two smartphones have been used as stereo cameras to acquire motion data and extract 3D human skeletons to track people working in construction fields [98]. Real-time machine learning models with CNN frameworks have been proposed to detect whether workers are wearing safety equipment, such as hats and vests, from images/videos [99] and to detect ground objects [100]. CNNs have also been used to detect safety guardrails [101], objects on roof construction sites [102], workers who fail to wear hard hats [103], [104], falls from heights [105], [106] and unsafe behaviours [73], and to maintain safe distances among objects to prevent accidents [107]. Additionally, to estimate risk and reduce accidents, deep learning has been recommended in the shipbuilding industry [108] and for ship bridge-collision assessment [109]. CNNs have been utilised for automated detection of employees near heavy equipment at construction locations [110], detection of construction vehicles [111], [112] and recognition of structural damage [65]. In fact, for advanced safety performance, computer vision combined with deep learning has been recommended because such approaches can automatically classify unsafe behaviour and conditions on construction sites [70].

B. RELATED WORKS IN CRACK DETECTION
Crack detection has been classified in previous studies into two general method types: image-based crack detection and crack detection based on machine learning. An image-based crack detection method was suggested to automate crack detection for safe and cost-effective bridge maintenance [113]. Additionally, the authors proposed automating the processes of bridge monitoring and maintenance for safe transportation infrastructure and compared the effectiveness of four crack-detection algorithms (wavelet, Fourier transforms, Sobel, and Canny [114]) for detecting healthy concrete surfaces [115], bridge damage [116] and corrosion [117]. Moreover, with the goal of automating concrete bridge deck inspections, a principal component analysis (PCA) algorithm was applied to mitigate the dimensionality problem of feature vectors to extract significant crack features from a database of bridge images [118].
In addition, automatic concrete crack detection in tunnels using deep fully convolutional networks was proposed in [66]. To achieve automated detection and reduce the computational cost of detecting large concrete surface cracks, a method based on percolation-based image processing was proposed in [119]. Tunnel crack features extracted based on detecting pixel intensity were classified by a support vector machine (SVM) algorithm to determine whether cracks were present in pre-processed images [120]. For safety inspection and structure health and reliability, an automated method based on a backpropagation neural network (BNN) was developed for crack detection [121]. For road crack detection, to deal with crack intensity inhomogeneity by capturing and utilising some unique crack characteristics, an automated method was suggested that extracts crack features based on a discriminative integral channel and then classifies the features using a random forest algorithm [122] to perform crack detection on 3D asphalt surfaces [123]. Due to their high performance and promising results, convolutional neural networks (CNNs) have been utilised in visual computing in many studies in the field [124]-[126] and for floor area detection [127].

C. RELATED WORKS IN RAILWAY SYSTEMS
Technology such as computer vision will play an essential role in railway system networks and provide effective methods to solve various problems. The vast distances and long tracks in many areas of the world and the growth of complexity pose challenges for maintenance and for fulfilling safety, security and quality requirements; in addition, there are cost restrictions, time consumption and reliability issues.
Climate change has led to extreme weather, while demand causes the industry to raise capacity and increase the number of trains in the system. Nevertheless, machine learning algorithms can estimate the exact abnormalities by monitoring rail tracks [27], perform risk assessment of rail failure [128], diagnose track circuit faults [129], provide early and precise detection methods that are essential for avoiding risks [130] and provide information for decision support [131]. It has been shown that video camera inspection is a flexible, effective and automatic method for monitoring rail tracks. Running rolling stock can provide high-resolution images from different angles regarding their surroundings, including tracks and other assets. These data enhance the machine learning and enable high-performance predictions of abnormal changes or unwanted events [132], [133]. Moreover, the use of vision allows for more frequent infrastructure inspections and reduces human errors [134], helping to avoid maintenance train collisions [135] and to monitor passenger safety at stations [136]. Using a robot for railway tunnel detection reduces worker risk and improves the detection efficiency [137], [138]. Additionally, computer vision has been analysed for use in autonomous emergency train stops [139].
Deep learning methods have been suggested for addressing many obstacles in the railway industry, such as poor or missing data; such methods are expected to improve prediction accuracy, optimise timing, reveal the types of maintenance that should be performed on rail infrastructure [33] and perform object detection for railway traffic [140].
Of the many applications of CNNs, in this subsection, we present those that are specifically related to railways. Such studies have been widely reported in the recent literature and use many data sources; they cover management, maintenance, safety and operations [141]. Image-processing approaches for implementing automatic detection have been suggested for monitoring railway infrastructure [128], rail track maintenance [133], railway track inspections and train component inspections [142]-[152], such as the rolling bearings of trains [153].
Sydney Trains conducted condition monitoring for inspections and prevention of overhead wiring teardowns using laser and computer vision technologies [163]. Similarly, deep learning has been implemented to conduct traffic signal detection [164], [165], predict train delays [166], detect rail fastener defects and ballast history [167]-[169], detect cracks in bolts as well as their shape and location [170], inspect railway ties [134], predict safety risks in communication-based train control systems (CBTCs) [171] and perform subgrade status inspections [172].
A CNN can be used to estimate crowd density at railway stations [173], to detect intrusions in track areas, such as pedestrians or large livestock, via images captured in railway areas [174], to monitor railway construction [152] and for intrusion detection at railway crossings [175]. From the security side, the method has been used for detecting violent crowd flows [176], protecting critical infrastructure [177] and identifying tools wielded by attackers, such as knives, guns and explosives [72].
A railway system contains a wealth of data, and visual processing technology can play an essential role in the industry's future. The most up-to-date applications were reviewed in [178].

III. DEEP LEARNING AND RISK MANAGEMENT
As one type of machine learning in AI, deep learning (DL) has been suggested as a method for risk management in railway stations. Accordingly, in this paper, we address some risks by utilising vision data from many points in the system, including both still frames and motion video. Currently, face recognition plays an important role in computer vision and has many applications, such as in autonomous vehicles, human-computer interactions, video surveillance, robotics, health care, medical imaging and homecare technology. Improvements in IT have enabled vision sensors to be installed in railway environments. For example, CCTV systems are widely used and rely on numerous camera sensors; these cameras are intended to prevent accidents and manage safety in railway environments.
In this study, we explore DL by utilising a convolutional neural network (CNN) to detect passenger falls. FSTs are common accidents in stations; their causes are sometimes related to human factors such as people running or to factors such as floor conditions (damaged, wet or muddy surfaces), a lack of lighting or poor step design. FST risks are correlated with other risks, such as overcrowding or emergency evacuation. In some cases, passenger falls can lead to overcrowding and panic; passengers can fall into the gap between the train and the platform, onto the track, or even under trains, and such incidents may escape notice by the train driver or station workers. A team on the platform may not notice a passenger trapped in the doors, people who are very close to the train or children, who might be at increased risk. CCTV cameras in stations can capture a vast amount of data, and such data are typically archived for some period before being deleted. The recorded data can be utilised by the police as evidence in criminal cases, and the system data can be utilised for monitoring all station operations; however, the outcomes currently depend on employees whose job is to watch video screens all the time. Moreover, CCTV management systems (in control rooms) are passive: they provide only a limited ability to maintain safety in stations. When an emergency situation occurs, it is very challenging to identify and manage the emergency immediately.
Human error in such cases can be high, and the locations of monitored cameras may not fully cover all station areas.
Accordingly, it is necessary to systematically observe the risks and any related factors relating to passengers in the station and raise a notification concerning any potential emergency condition in a timely manner.
Multiple cameras can cover all station areas, such as platforms, tunnels and tracks, while image-processing technology can detect real-time risks and then take actions such as notifying the train driver, the central control room (CCR) and station staff with the information, including the location, time and any alarm message. The captured images can be input to a smart system, which can be trained to recognise any pattern differences and can learn over time. This approach reduces the risk of human error and increases the reliability of real-time predictions. It is expected that a smart method such as a CNN would be able to identify passenger falls, running, overcrowding, or any behaviour or conditions that look suspicious. Some current techniques, such as Hitachi video analysis [179], are effective in detecting suspicious behaviour in real life. Moreover, a thermal camera has been used to detect human body temperature and thereby changes in emotion [180].
Video surveillance can play many roles in industry security and safety by utilising advanced detection algorithms and identifying risks in early stages, such as suicide, traffic flow, criminal activity, trespassers, smoke and fires. Advanced methods can detect objects and conduct video analytics to assist emergency responses and support decision-makers. It is expected that these detection techniques would aid in developing emergency response plans and communications schemes, which are critical in reducing risks from emergency events in railway stations. Additionally, the new technologies can contribute to measures for ensuring passenger egress and transit at critical station locations, such as tunnels and access points, for emergency responders. Moreover, advanced analytical video surveillance can cover a range of risks, such as collisions, derailments and intrusions from adjacent areas into unauthorised station locations such as a track [181] while managing other subsystems in real-time with minimum manpower and high efficiency (see Fig. 4). In the literature, it has been noted that achievements in deep learning can enable vision and video processing, classification, image captioning, segmentation, object detection, recognition of human actions from video, picture recovery, security, observation and so on [136], [182]-[184]. Applying new technology, including image processing, computer vision and machine learning, will provide both direct and indirect benefits: it will improve safety and security by detecting problems at early stages, save time and cost in the long term and lead to the automation of many processes in the railway system.

IV. DEEP LEARNING FOR DECISION MAKING IN RISK MANAGEMENT AT RAILWAY STATIONS
DL is a subset of machine learning that employs nonlinear algorithms to model data. There are many methods that employ this technique, but they generally share some commonalities, such as the way each layer receives the output from the former layer as inputs, as shown in Fig. 5. Advancements in hardware and increased data availability have contributed to the ability to effectively train deep CNNs to identify features not only from static images but also from videos [185], [186]. In addition, including a set of convolution layers in an NN framework has revolutionised image processing. The convolution operation can be defined as follows:
$$ s(t) = (x * w)(t) = \sum_{a=-\infty}^{\infty} x(a)\, w(t - a) $$
where x is the input and w is the kernel, which is an adaptive filter that the network learns [188].
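To make the convolution operation concrete, the following minimal Python sketch evaluates the discrete sum s(t) = Σ x(a) w(t − a) for finite sequences. The function name `convolve_1d` and the example values are illustrative assumptions, not part of the original study; note also that deep learning libraries commonly implement the closely related cross-correlation, in which the kernel is not flipped.

```python
def convolve_1d(x, w):
    """Full discrete convolution: s[t] = sum over a of x[a] * w[t - a].

    x is the input sequence and w is the kernel (both finite lists);
    terms whose kernel index falls outside w are treated as zero.
    """
    n_out = len(x) + len(w) - 1
    s = [0.0] * n_out
    for t in range(n_out):
        for a in range(len(x)):
            k = t - a  # kernel index paired with x[a]
            if 0 <= k < len(w):
                s[t] += x[a] * w[k]
    return s


# Hypothetical input and kernel, chosen only for illustration.
signal = [1.0, 2.0, 3.0]
kernel = [0.0, 1.0, 0.5]
print(convolve_1d(signal, kernel))  # [0.0, 1.0, 2.5, 4.0, 1.5]
```

In a CNN the kernel values are not fixed in advance as they are here; they are parameters that are adjusted during training.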
Video identification is more challenging than the identification of static images due to the complexities involved in capturing continuous spatial and temporal data [189]. In the past few years, DL has gained enormous power for object detection and tracking. Some object detection algorithms include the region-based convolutional neural network (RCNN), Faster-RCNN, the single shot detector (SSD) and you only look once (YOLO). Among these, Faster-RCNN and SSD achieve higher accuracy, while YOLO is more advantageous when speed is given preference over accuracy [190].
Many CNNs are configured to run on a graphics processing unit (GPU), a specialised type of electronic circuit that can swiftly manipulate and alter memory to accelerate the creation of images in a frame buffer [191].
Currently, most machine learning efforts rely on DL techniques, which connect the layers of an artificial neural network (ANN) to systematically identify patterns in the data that affect decision making. DL is a powerful method of machine learning; however, it requires large amounts of training data to be efficient. Such systems make it possible to make decisions without human input; moreover, the system can learn continuously. For instance, self-driving cars are able to make timely decisions about speed and direction from information captured in real-time from their surroundings.
Offering a decision making algorithm to enhance railway station safety and risk management would be a significant improvement in the use of CCTV data, passenger smartphones, ticketing systems, or other related subsystems in stations. In the initial phase of such applications, we can use DL to support the decision makers; later, in the more advanced phases, we can rely on AI as a highly accurate decision maker. In other words, individuals and AI technologies can cooperate to manage various decision-making challenges (uncertainty, complexity and equivocality) [192], [193]. Based on CCTV systems in stations, which can be updated and utilised to capture video frames and collect data reflecting human actions and motion, the resulting data contain spatial and temporal information from many locations in the station, such as platforms. Then, unsafe acts can be detected using a deep learning method, which is mainly based on a set of algorithms that attempt to model high-level abstractions in the data. The model is trained from multiple frames and the spatial features they contain. For a more comprehensive application, we compare the traditional risk management process to a CNN model process to present the steps of the two systems in parallel (see Fig. 6).
Both outcomes will support the decision-maker and reduce uncertainties to a low level in complex systems. The process improvement will support many field activities, such as maintenance and the management of passenger crowding. System reviews will enhance actions, add alternatives and redesign the processes regarding predictions, advanced analytics and, importantly, training the model. The cycle of control, continuous improvement and incorporating lessons learned is an essential part of a safety system; thus, this innovative approach fits well into that process, as shown in Fig. 7.

V. MODEL FRAMEWORK
Railway station monitoring is vital to guarantee that people and the rail system are safe and secure. A monitoring failure can have significant impacts not only on train delays and maintenance costs but also on passenger safety at the station and on society and the economy.
In the case model adopted in this paper, the goal is to manage the risk of falls by detecting and analysing passengers automatically among the enormous amount of data from CCTV cameras. The outcomes illustrate the practicality and efficiency of the proposed approach. This model relies on image-detection methods and introduces a risk management framework that uses a CNN to analyse the images or videos to detect risks. The proposed framework is depicted in Fig. 8. Video images can be used to identify deficiencies, such as interruptions to passenger flow that cause falls, which leads to overcrowding. In addition, they can be utilised to discover unwanted events that occur in the station.

VI. THE CNN CASE MODEL
The goal of the proposed model is to detect falls based on a CNN. To implement such a method, the system needs to be able to characterise the complex motions of passengers and address more than one passenger fall at the same time. When a fall is detected, the system should present the significant information to railway station control, such as the time and location. The difference between a CNN and ordinary neural networks is that each neuron in a CNN is locally connected to only a few neurons in the previous layer, not to all neurons, as is the case in ordinary neural networks. This enables CNNs to be used to construct deeper networks and, consequently, learn more complex features [194]. Furthermore, CNNs have demonstrated high performance and are relatively easy to train. A basic CNN can be characterised as having two parts: a convolutional base that extracts features from an image and a classifier (a fully connected layer) that classifies the image based on the detected features. Each frame undergoes a data acquisition phase that supplies the system with the digitised data from such images. These data may include many events or statuses and can be acquired from both internal and external networks, such as traffic and/or track conditions and weather. Then, manipulation or data mining processes such as feature selection, extraction and standardisation can be applied to process the raw data for analysis. The data can contain many layers, including the acquisition time and location. Next, an appropriate model and deep learning technique are used to perform feature detection and make predictions along with the actions and triggers to be activated when a threshold is breached. The goal is to create a proactive system that can avoid or mitigate unwanted events. The history of events and scenarios from many points in the system will improve the prediction accuracy, and the model is trained from past activities, as shown in Fig. 9.
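The trigger step described above, in which a prediction that breaches a threshold results in a notification carrying the time and location, can be sketched as follows. This is an illustrative sketch only: the names `FallAlert` and `alerts_for_frame`, the camera identifiers and the threshold value of 0.8 are assumptions and do not come from the proposed system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class FallAlert:
    """Information passed to station control when a fall is flagged."""
    camera_id: str
    location: str
    detected_at: datetime
    probability: float


def alerts_for_frame(predictions, threshold=0.8, now=None):
    """Return one alert per prediction whose fall probability meets the threshold.

    predictions: iterable of (camera_id, location, fall_probability) tuples,
    so several simultaneous falls at different locations are all reported.
    """
    now = now or datetime.now(timezone.utc)
    return [
        FallAlert(camera_id, location, now, probability)
        for camera_id, location, probability in predictions
        if probability >= threshold
    ]


# Hypothetical classifier outputs for three cameras at one time step.
frame_predictions = [
    ("cam-01", "Platform 2", 0.93),
    ("cam-02", "Stairs A", 0.10),
    ("cam-03", "Escalator 1", 0.85),
]
for alert in alerts_for_frame(frame_predictions):
    print(alert.location, alert.detected_at.isoformat(), alert.probability)
```

In practice the threshold would need to be tuned to balance missed falls against false alarms.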
CNNs have become the main tool used for recent innovations in the comprehension of images [195], videos [136], [196] and audio signals [182], [183]. In this study, we used the Keras library, which is an open source neural network library written in Python that supports easy and fast prototyping. Furthermore, it supports CNNs and runs seamlessly on both CPUs and GPUs. Keras is compatible with other Python code and can use raw images as inputs to the CNN model, which extracts features. A summary of the experimental configuration is shown in Table 1.
In this part of the study, we employed available processors to execute the framework; however, for large data, employing more powerful CPUs and GPUs is recommended.
We built the model layer by layer using the sequential model type, which is the simplest way to build a model in Keras. Next, to deal with input images as 2D matrices, we selected Conv2D layers with 64 nodes (filters) in each layer. A 3 × 3 filter matrix was used for the convolution kernel (see Fig. 10). A CNN structure includes convolutional layers that are the major building blocks; these layers learn the features that are suitable for differentiating between a 'falling' image and a 'not falling' image. Each convolutional layer employs a set of kernels that apply a convolution operation based on the outputs of the preceding layers. We adopted the rectified linear unit (ReLU) as an activation function because ReLU has previously been shown to work well in neural networks.
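To make the convolution and activation steps concrete, the following minimal sketch applies a single 3 × 3 kernel to a small single-channel image and then applies ReLU. It uses plain Python rather than Keras, and the image and kernel values are illustrative assumptions, not data from this study. As in most deep learning libraries, the "convolution" is computed without flipping the kernel (i.e., as a cross-correlation), with stride 1 and no padding.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding, no kernel flip)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


def relu(feature_map):
    """Replace negative responses with zero, element by element."""
    return [[max(0, value) for value in row] for row in feature_map]


# Illustrative 4 x 4 image with a dark-to-bright vertical edge.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A kernel that responds to dark-to-bright edges, and its mirror image.
edge_kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
mirrored_kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

response = relu(conv2d_valid(image, edge_kernel))
suppressed = relu(conv2d_valid(image, mirrored_kernel))
print(response)
print(suppressed)
```

The first kernel yields a positive response along the edge, whereas the mirrored kernel yields negative values that ReLU sets to zero. A trained convolutional layer learns many such kernels (64 per layer in the configuration described above) instead of using hand-chosen ones.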
For the output layer, we selected a dense layer, which is a standard type of layer used in various neural network architectures. To connect the convolution and dense layers, a flattening layer is preferable. In addition, we used dropout layers between the various layers to avoid overfitting [196]-[198].
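The effect of stacking these layers can be checked with simple shape and parameter arithmetic. The sketch below is an illustration under stated assumptions: a 64 × 64 RGB input, two Conv2D layers with 64 filters and 3 × 3 kernels (stride 1, no padding), a flattening layer and a dense output layer with a single unit. The input size, the number of convolutional layers and the size of the output layer are assumptions; the paper does not report them. Flattening and dropout layers are omitted from the count because they have no trainable parameters.

```python
def conv2d_output(shape, filters, kernel=3):
    """Output shape and parameter count of a stride-1, unpadded convolution."""
    height, width, channels = shape
    out_shape = (height - kernel + 1, width - kernel + 1, filters)
    # Each filter has kernel*kernel*channels weights plus one bias.
    params = (kernel * kernel * channels + 1) * filters
    return out_shape, params


def dense_params(inputs, units):
    """Parameter count of a fully connected layer (weights plus biases)."""
    return (inputs + 1) * units


input_shape = (64, 64, 3)                      # assumed input size
shape1, params1 = conv2d_output(input_shape, 64)
shape2, params2 = conv2d_output(shape1, 64)
flat_size = shape2[0] * shape2[1] * shape2[2]  # flattening keeps every value
params3 = dense_params(flat_size, 1)           # assumed single output unit
total = params1 + params2 + params3

print(shape1, params1)
print(shape2, params2)
print(flat_size, params3)
print(total)
```

Under these assumptions, most of the parameters sit in the dense layer that follows the flattening step, which is one reason dropout is commonly applied around that layer.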

A. EXPERIMENTS
1) DATASETS
The objective of the CNN in this study is to take input image data of passengers in a station and classify each image into one of two classes: either 'falling' or 'not falling'. The dataset was divided into separate frames with known labels (falling or not falling), which were then used as training data for the classifier. To construct the dataset, different activities and complex falling events from different locations were selected from many open source sequences, such as falls on stairs or in the gap between the train and platform, as shown in Fig. 11.
We implemented the proposed method after training to predict risk states in a railway system (at the station) and evaluated the performance of the model. For all the experiments, we used one computer equipped with an Intel Core i7 CPU, 64 GB of memory and an NVIDIA GeForce MX 150 GPU.
We gathered data consisting of both still frames and videos from open sources. Finding such data is challenging because of privacy concerns and limited availability; for example, data are deleted from data centres periodically, and the remaining data are difficult to locate and collect.
The data must be clustered, classified and labelled; the images show some risky passenger behaviours (see Fig. 12), and the collected videos and images cover many countries. The data raise significant concerns that should be considered in future station design and in daily operations, for example:
• People standing in risky positions near the gap between the train and the platform and close to moving trains.
• Some people trespass into the track areas and can be found in restricted areas.
• Some passengers cross the tracks to take shortcuts between locations.
• Some passengers are pressed against the train and their clothes become trapped in train doors.
• Elderly people fall on escalators and other passengers misuse the escalators.
• Children and those susceptible to fainting fall into the gap between the train and the platform.
• Impacts from technology and lifestyles, such as taking selfies behind the trains.
• Wheelchairs falling down stairs and escalators.
• Passengers standing in restricted areas.
However, the limited data available for each dataset are not sufficient for training deep learning models. Thus, we augmented the collected data with the Le2i dataset built by Charfi et al. [199], which covers many falling positions. Our model performs only binary classification: falling or not falling. We split the dataset into training and test datasets.

2) PRE-PROCESSING AND PREDICTIONS
In addition to being limited, the data (both images and videos) collected from the web require intensive cleaning. The variety of sources imposes many constraints: images with poor quality or obstructed vision (to the point that the targets cannot be seen) must be removed. After being trained, the model is applied to a test dataset, containing images that it has not seen before, to classify the risk of falling.
Using randomly selected open source images, we divided the data into three sets (training, testing and prediction). During CNN training, the accuracy increases with each training iteration; thus, the model performance on the validation data eventually reaches an acceptable level as the error decreases. After training, the prediction ability of the model was evaluated on the test sample (see Fig. 14).

3) THE EVALUATION
During the testing process, performance indicators can be calculated from the trained model output. We selected accuracy, precision, recall, F1-score and the receiver operating characteristic (ROC) curve as indicators for this study. For predictions, we focus on identifying the fall risk. Hence, we sampled data that present falls and fall-related behaviour, covering unsafe passenger positions. To estimate the risk and to identify the best classes, two dataset cases were studied: two categories, falling and not falling (case 1), and three categories, falling, not falling (normal or safe station) and unsafe behaviour (case 2); see Table 2 and Fig. 13. The prediction model classifies instances of passenger behaviour, and the two-class prediction (case 1) showed high results. When the prediction is positive and the ground-truth value is also positive, the prediction is called a true positive (TP). Similarly, false positive (FP), true negative (TN) and false negative (FN) values can be calculated [141]. These four values can be presented as a 2 × 2 contingency table, called a confusion matrix, as depicted in Fig. 15.
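The indicators named above follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions of accuracy, precision, recall and F1-score; the label vectors are invented for illustration (1 denotes 'falling', 0 denotes 'not falling') and are not results from this study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN and FN for a binary classification task."""
    tp = fp = tn = fn = 0
    for truth, prediction in zip(y_true, y_pred):
        if prediction == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1 from the four counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative labels only: 1 = 'falling', 0 = 'not falling'.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
print(counts)
print(metrics)
```

For fall detection, recall is of particular interest because a false negative corresponds to a missed fall.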
The ROC is a metric that reflects both the sensitivity and specificity of continuous variables and reveals the relationship between them. The ROC curve of the case study results is shown in Fig. 16.
From the perspective of the ROC curve, the model performs effectively in making falling predictions.
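An ROC curve of this kind is obtained by sweeping the decision threshold over the classifier scores and recording the false positive rate and true positive rate at each step; the area under the curve (AUC) summarises the result in a single number. The following minimal sketch uses invented labels and scores, not the outputs of the model in this study, and the function names are chosen for illustration.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step."""
    positives = sum(1 for label in y_true if label == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes are needed to draw an ROC curve")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Samples with identical scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Integrate the curve with the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Illustrative labels (1 = 'falling') and fall scores only.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

curve = roc_points(y_true, scores)
auc_value = area_under_curve(curve)
print(curve)
print(auc_value)
```

An AUC of 0.5 corresponds to random guessing and an AUC of 1.0 to perfect separation of the two classes.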
The lack of risk-class images in accident cases at stations means that an uneven number of pictures exists between the risk and non-risk groups. Thus, the data make it challenging to model safety vs. risk when training a deep learning model utilising the available images. In this study, we used 80% of the dataset for training, 10% for validation and the remaining 10% for model testing.
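When the classes are uneven, one common way to perform an 80/10/10 split is to split each class separately so that the rarer risk class appears in all three subsets in the same proportion. The sketch below illustrates this stratified approach; the paper does not state whether its split was stratified, so this is an assumption, and the class sizes and the function name `stratified_split` are invented for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split sample indices into train/validation/test sets, class by class."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, validation, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_validation = round(fractions[1] * len(indices))
        train += indices[:n_train]
        validation += indices[n_train:n_train + n_validation]
        test += indices[n_train + n_validation:]  # remainder goes to the test set
    return train, validation, test


# Illustrative imbalance only: 20 'falling' images and 80 'not falling' images.
labels = ["falling"] * 20 + ["not falling"] * 80
train, validation, test = stratified_split(labels)
print(len(train), len(validation), len(test))
```

With this scheme, the ratio of risk to non-risk images in the validation and test sets matches that of the full dataset, which makes the reported indicators easier to interpret.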

VIII. DISCUSSION AND CONCLUSIONS
In this study, we proposed a process for moving from conventional risk management to applying innovative technology to risk management; such an approach can improve safety and security throughout the entire railway industry paradigm. Many benefits can be gained from deep learning in risk management, such as the following.
• A real-time ability can be gained to help avoid risks.
• Many subsystems in the field can be integrated, including maintenance, security, traffic and passenger models, to form actions that consider multiple aspects.
• Lessons and experience can be integrated into the learning process and automated effectively via machine learning, which is critical for safety systems.
• The effectiveness of operations in stations and other areas linked to railway activities can be improved.
• Time and costs can be saved while improving accuracy to enable long-term quality improvements.
• Both passenger and workforce experiences can be improved, which reflects on the overall market image.
• Data gathering can be enhanced to more fully utilise effective connections between assets and people.
We proposed an efficient framework for railway systems that is based on a CNN and applies deep learning (DL) algorithms to foster the detection of unwanted events in railway stations. We adopted a CNN to extract events such as passenger falls, which may occur on stairs, escalators, or platforms. Different scenarios were anticipated, such as suicide or falling under moving or stopped trains. The fall event detection process can alert the station control centre, and then action can be taken to better clarify the situation, which might be an attack, crime, or intoxicated passenger incident. Timely detection will mitigate the risks to other passengers, lead to more rapid responses in emergency or evacuation situations and decrease the potential hospitalisation time. We presented datasets from open sources; however, compiling additional datasets containing training examples would improve the accuracy and cover a wider range of station risks. CNN-based methods require a large pool of labelled training data, and collecting and labelling such data is a complex task. Nevertheless, automatic detection can help with timely maintenance and risk control, and the results can be used as feedback to train the model to obtain improved accuracy. The results with the proposed model confirm that increasing the depth of a deep network can lead to better performance in terms of accuracy. Finding accident data (such as falls in stations) is challenging for many reasons, such as the lack of available data and passenger privacy concerns. Using methods such as computer vision techniques will improve timely risk management, detection and safety and ultimately affect risk management in railway systems. The proposed method could be generalised to detect other risks, such as people running, overcrowding, suspicious items or other complex activities, in addition to fall patterns in stations.
The CNN approach provides real-time, accurate visual monitoring of the risks in railway stations to assist safety or risk management operations, which are reflected in passenger services. The method is more suitable for real world conditions and is cost-effective (enabling, for example, 24-hour monitoring of CCTV cameras with the intention of identifying potential acts of vandalism). Image processing has been shown to be a promising technology that has the ability to improve station safety, manage risks, reduce dwell times and reduce the number of operators at stations. Moreover, image processing techniques are useful for detecting congestion, assessing flow, accessing dangerous zones, identifying people moving in forbidden areas and notifying train drivers about foreign objects ahead [19], [20].
Our results demonstrate that the proposed CNN model can automatically extract and classify risky behaviours (i.e., falling on the platform) with a high level of accuracy. The method carries high confidence that all the objects in a data sequence are detected and recognized. Nevertheless, this CNN model should be improved and implemented to automatically detect risk actions related to human behaviour or asset conditions both during normal operations and in any unanticipated conditions. Such models can lead to intervention by management or execute high-level automated actions; these can directly modify behaviours and mitigate risks or reduce the consequences of accidents. Moreover, the results can be used to provide designers, operators and decision-makers with direct visual outcomes and to allow them to learn how to deliver operations more safely. Additionally, the results indicate that the process can achieve efficient railway system detection under numerous conditions, including aspects such as:
• Safety and security
• Infrastructure and assets
• Maintenance and traffic management operations
• Quality and reliability
• Operations, passengers, train drivers, workforce management and so on.
The development of specialized algorithms for the model can overcome errors and improve response time. By capitalizing on existing CCTV systems and considering the locations and coverage of the cameras, costs are expected to be reduced over the long term and system efficiency improved. The model offers other benefits to stations that are worth further research, such as predictive maintenance, emergency plans, people counting, train positioning and security. However, data availability and quality remain a challenge because this technology depends heavily on large amounts of high-quality data. Finally, it is time to invest in AI to benefit railway systems, making them safer for staff, customers and the public.

ACKNOWLEDGMENT
The authors would like to acknowledge the support by the Railway Technical Research Institute and The University of Tokyo, Japan. They also wish to thank the European Commission for the Financial Sponsorship of the H2020-RISE ''RISEN: Rail Infrastructure Systems Engineering Network'', which enables a global research network that addresses the grand challenge of railway infrastructure resilience and advanced sensing in extreme environments (www.risen2rail.eu) [200].
SAKDIRAT KAEWUNRUEN received the Ph.D. degree in structural engineering from the University of Wollongong, Australia. He has expertise in transport infrastructure engineering and management, successfully dealing with all stages of the infrastructure life cycle and assuring the safety, reliability, resilience, and sustainability of rail infrastructure systems. He has over 400 technical publications. He is a Chartered Engineer and the principal investigator (PI) and chief investigator (CI) of research projects worth over $8m.
MIN AN is currently a Professor of construction and risk management at the University of Salford. He is also an Honorary Professor with Beijing Jiaotong University, China. His expertise is in the development of safety risk and reliability assessment techniques so that safety and reliability aspects can be considered in design, construction, and maintenance processes. His research work has been funded from a variety of sources, including EPSRC, EU, government agencies, and industry. He is/was the principal investigator (PI) of 31 financed contracts. He is an editorial board member and associate editor for 12 international journals.