Research on Time Characteristics of Near Miss in Bohai Sea

Near miss models are an effective way to analyze maritime traffic risk, but no existing work studies near misses from the temporal perspective. The duration of a near miss reflects the maritime traffic risk: the longer the duration, the higher the risk. This article considers the influence of ship maneuvering capability, speed, size, and the Convention on the International Regulations for Preventing Collisions at Sea (COLREGS) on the safety of ship navigation, and uses the navigation safety domain to detect near misses. The duration of a single near miss is defined as the time from the target ship sailing into the navigation safety domain of the own ship to its leaving, and an algorithm for calculating the duration of near misses is proposed. To measure the similarity of two ships' lengths, a similarity coefficient model of ships' lengths is defined. Based on the $k$-means algorithm, different values of the parameter $k$ are selected to cluster the durations of near misses. From the clustering results, the connections between the duration of near misses and the ship type, length, and speed are analyzed to identify a series of time characteristics of near misses, namely the high frequency of short-duration near misses and the strong connections between long-duration near misses, ship length, and ship speed. This is the first method proposed for studying near misses in the temporal dimension. The findings help researchers and maritime traffic management agencies to better understand maritime traffic risk from the temporal perspective.


I. INTRODUCTION
With the rapid development of the modern economy, water transport has become increasingly busy and ship collisions have become more frequent [1], [2]. The safe navigation of ships is of high societal concern [3]. Because actual ship collision accidents are rare, it is difficult to evaluate the overall risk of a water area from them [4]. In recent years, replacing actual ship collisions with near misses has become an effective way to evaluate maritime traffic risk [3]. The greater the number of near misses, the higher the maritime traffic risk.
Scholars have considered many factors and proposed a variety of conditions for detecting near misses. Berglund and Huttunen held that a distance to the closest point of approach (DCPA) ≤ 0.3 nm can be used as a detection condition for near misses [5]. Fukuto and Imazu regarded a DCPA of 1 nm and a time to the closest point of approach (TCPA) of 5 min as the conditions for detecting near misses, from the perspective of reducing the frequency of ship collision warnings and the workload of pilots [6]. Park set DCPA ≤ 0.15 nm and TCPA ≤ 3 min as the thresholds for VTS to detect the safe speed [7]. Langard et al. found that the average DCPA of ships taking collision avoidance actions was 0.64 nm, the TCPA was 26 min, and the distance between ships was 1.81 nm [8]. Kim and Jeong defined the invasion of the ship domain as the detection condition for near misses, to distinguish a near miss from an ordinary encounter [9]. Sang-Lok Yoo proposed that a DCPA of 0.1 nm, a TCPA of 3 min, and a distance between ships of 0.3 nm can be used as detection conditions for near misses, and then developed a charting system based on near miss density [4]. To find critical encounters, Goerlandt et al. defined a near miss as the overlapping of ship domain and ship contour, and analyzed the risk of collision between ships in the open waters of the Gulf of Finland [10]. Tan et al. held that the overlapping of ship domain and ship contour is not suitable for detecting near misses between merchant ships and fishing ships in restricted waters, and defined "shortest passing distance" ≤ 500 m and "time difference" ≤ 3 min as the detection conditions [11]. Zhang et al. held that DCPA and TCPA cannot fully reflect the risk of an encounter. The Vessel Conflict Ranking Operator (VCRO) model was built using the distance between ships, relative speed, and phase. Distances between ships of 1 nm and 6 nm were set as the high-risk and low-risk standards respectively, and the risks of Finnish waters were then analyzed [12]-[14]. At present, most work on maritime traffic risk based on near misses is carried out from the spatial perspective, focusing mainly on detection models for near misses and on the frequency and geographical distribution of near misses as a basis for evaluating maritime traffic risk. However, no work addresses maritime traffic risk based on near misses from the temporal perspective.

The associate editor coordinating the review of this manuscript and approving it for publication was Zhe Xiao.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Near miss is a critical situation that ships encounter each other [10], and the duration of near miss shows the characteristics of maritime traffic risk from temporal perspective. In this article, the navigation safety domain is used to detect near misses, and then the model for calculating duration of near misses is proposed, based on the processing for time synchronization of AIS data. Cargo ships, oil tankers, and passenger ships in Bohai Sea are selected as the research objects, and then the duration of near misses is counted. The k-means clustering model is used to cluster the duration of near misses. The results of clustering are analyzed, and then the correlations between the duration of near misses, the ship type, length, and speed are obtained.
The main contributions of this article are as follows:
1) The influence of ship maneuvering capability, speed, size, and COLREGS on the safety of ship navigation is considered, and a model for detecting near misses based on the navigation safety domain is proposed.
2) Because the absolute difference of two ships' lengths cannot fully show their similarity, a similarity coefficient model of ships' lengths is defined, which better measures the similarity of two ships' lengths.
3) A method for studying maritime traffic risk from the temporal perspective is provided for the first time. The duration of a single near miss is defined as the time from the target ship sailing into the navigation safety domain of the own ship to its leaving, and a model for calculating the duration of a near miss is proposed.
4) The findings, namely the high frequency of short-duration near misses and the strong connections between long-duration near misses, ship length, and ship speed, could help officers to detect encounters with high potential risk.
The rest of the paper is organized as follows. Section II outlines the conceptual basis of AIS data and k-means clustering algorithm, and presents the model for detecting near misses based on the navigation safety domain, the model for calculating duration of near miss, the similarity coefficient model of ships' lengths. Section III analyzes the connections between the duration of near misses and the ship type, length, and speed. Section IV gives the conclusion.

II. BACKGROUND KNOWLEDGE
A. AIS DATA
The determination of near misses is based on an extensive analysis of data obtained from the Automatic Identification System (AIS). The system transmits certain data about the navigational status of the ship to other ships and onshore receiving facilities continuously and automatically [10], [15]. The application of AIS is meaningful in protecting the marine environment, ensuring the safety of life at sea, improving the safety of navigation, and developing the shipping industry efficiently [16], [17]. The types of AIS data used by this article are summarized in Table 1.
There are various types of error in AIS data, such as data corruption, faulty position reports and erroneous Maritime Mobile Service Identity (MMSI) numbers [18], [19] because of sensor signal misalignment, AIS equipment fault, etc. These errors have been filtered out before analyzing the data. The system is self-organized by a time division multiplexing algorithm and provides updates to variable data at rates depending on the ship speed and manoeuvre situation [20]. The rate of data transmission is seen in Table 2. Specifically, the frequency with which AIS data is transmitted depends on whether a ship is underway (intervals of 2-12s depending on speed) or at anchor (intervals of 3min).
Goerlandt believed that for a given ship, the sample rate of about 5 minutes is enough to roughly estimate the number of near misses [10]. However, the sample rate of about 5 minutes is not sufficient for the detailed study of the duration of near misses. Generally, the higher the sample rate is, the more accurate the ship motion characteristics are. This necessitated the interpolation of raw AIS data according to a sample rate interval of 1min.
In practice, the data of ships' trajectories may not be received synchronously. The AIS trajectory sequence needs a fixed sample rate to detect the near misses, so the time-synchronization interpolation is needed to obtain the time synchronous and continuous ship trajectory data. It is necessary to segment the ship trajectory to ensure the correctness of the interpolation. The trajectory will be separated between the two points if the time interval between those two points is too long. The subtrajectory data after segmentation will be deleted when it is too little to provide important information [21].
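The segmentation and time-synchronization steps described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the 600 s gap threshold, the minimum segment size, and the use of linear interpolation are all assumptions made for the example; only the 1 min (60 s) grid comes from the text.

```python
from bisect import bisect_left

def segment(points, max_gap_s=600, min_points=2):
    """Split a non-empty, time-sorted list of (t, lat, lon) at gaps longer
    than max_gap_s and drop sub-trajectories with fewer than min_points
    samples. Both thresholds are illustrative assumptions."""
    segments, current = [], [points[0]]
    for prev, cur in zip(points, points[1:]):
        if cur[0] - prev[0] > max_gap_s:
            segments.append(current)
            current = []
        current.append(cur)
    segments.append(current)
    return [s for s in segments if len(s) >= min_points]

def resample(seg, step_s=60):
    """Linearly interpolate one segment onto timestamps that are multiples
    of step_s, so that different ships share the same time grid."""
    times = [p[0] for p in seg]
    t = -(-times[0] // step_s) * step_s  # first grid time >= segment start
    out = []
    while t <= times[-1]:
        i = bisect_left(times, t)
        if times[i] == t:
            out.append((t, seg[i][1], seg[i][2]))
        else:
            (t0, la0, lo0), (t1, la1, lo1) = seg[i - 1], seg[i]
            w = (t - t0) / (t1 - t0)
            out.append((t, la0 + w * (la1 - la0), lo0 + w * (lo1 - lo0)))
        t += step_s
    return out
```

Aligning every trajectory to the same multiples of 60 s is what makes the positions of two ships directly comparable at each time step.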
Because two selected ships may sail in the same water area during different periods of time, it is necessary to evaluate whether or not their trajectories occur in an overlapping timeframe,
where $T_1 \le T_2$ indicates that the trajectories of the two ships occur in an overlapping timeframe whose first time point is $T_1$ and whose last time point is $T_2$, and $T_1 > T_2$ indicates that the trajectories do not occur in an overlapping timeframe.
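The overlap test can be sketched in a few lines. The sketch assumes that $T_1$ is the later of the two trajectory start times and $T_2$ the earlier of the two end times; this reading is consistent with the description of $T_1$ and $T_2$ but is an assumption, since the defining equations are not reproduced in this text. The function name is hypothetical.

```python
def overlap_window(span_a, span_b):
    """span_a, span_b: (start_time, end_time) of two trajectories.
    Returns (T1, T2) if the trajectories overlap in time, else None.
    Assumes T1 = latest start and T2 = earliest end."""
    t1 = max(span_a[0], span_b[0])
    t2 = min(span_a[1], span_b[1])
    return (t1, t2) if t1 <= t2 else None
```

For example, trajectories spanning (0, 100) and (50, 200) overlap in the window (50, 100), whereas (0, 10) and (20, 30) do not overlap.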

B. THE MODEL FOR DETECTING NEAR MISSES BASED ON THE NAVIGATION SAFETY DOMAIN
As described in the introduction, scholars have considered many factors to detect near misses, such as DCPA and TCPA, but DCPA and TCPA do not fully reflect the severity of an encounter [12]. The distance between the two ships is also an important factor in detecting near misses, but there are no conclusions about the relationship between near misses and this distance. Therefore, the relationship between the distance and near misses is further analyzed. The ship domain is the area around a ship that operators want to keep free of other ships for safety reasons. It is a special form of safe distance, because the distance between the ship and the domain boundary differs with bearing. Since Fujii introduced the concept of ship domain, researchers have statistically analyzed the distances between ships in encounters and calculated the size of the domain according to expert experience [22]-[25]. At present, the concept of ship domain is widely used in maritime research, including recent applications to various traffic engineering problems: waterway capacity analysis [24], waterway collision risk analyses [27]-[30], near miss detection [13], [31]-[34], ship traffic simulation development [35], and ship collision avoidance analysis [36], [37].

$$k_{AD} = A_D / L = 10^{0.3591 \lg V_{own} + 0.0952}, \qquad k_{DT} = D_T / L = 10^{0.5441 \lg V_{own} - 0.0795} \tag{8}$$
Zhang used an AIS data-driven approach to discover that there is an area around the own ship that no ship enters, named the probabilistic ship domain [22]. The size of the probabilistic ship domain is basically similar to that of the Fujii ship domain, which indicates that it is reliable to use the ship domain to detect near misses. The sizes of domains vary quite significantly [38], and applying different domains to detect near misses affects the detection results. The probabilistic domain seems preferable for the safety analysis of marine traffic, but it depends heavily on the AIS data set and is still under development. The Fujii ship domain is small, so it can effectively detect the most critical encounters. However, a number of comments should be made on the use of this domain [10]:
(1) The domain is symmetric, which implies that the possible influence of COLREGS is not considered.
(2) As a result of the symmetry, passing behind the stern is considered as dangerous as passing in front of the bow, which is unrealistic.
(3) The largest ship has the largest domain, which means that the same encounter may be classified as dangerous for the larger ship but safe for the smaller ship.
Considering the influence of ship maneuvering capability, speed, size, and COLREGS on ship safety, Wang proposed a navigation safety domain named the quaternion ship domain (QSD) [39], shown in Fig. 2. The boundary functions are described by Eqs. (3)-(7). The power $k$ determines the shape of the domain, while $\{R_{fore}, R_{aft}, R_{starb}, R_{port}\}$ determines its size. To facilitate the study, $k$ is usually set to 2 [40]. The navigation safety domain is larger than the Fujii ship domain, but an encounter in which the target ship sails into the navigation safety domain of the own ship is still critical enough to deserve the operators' attention. Comments (1) and (2) above are addressed effectively by using the navigation safety domain to detect near misses.
where $R_{fore}$ and $R_{aft}$ are the longitudinal radii of the QSD in the fore and aft directions respectively; $R_{starb}$ and $R_{port}$ are the lateral radii of the QSD in the starboard and port directions respectively; $L$ is the own ship length; $k_{AD}$ and $k_{DT}$ are the gains of the advance $A_D$ and the tactical diameter $D_T$ respectively; and $V_{own}$ is the own ship speed in knots.
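As an illustration of how the domain size follows from ship length and speed, the sketch below evaluates the gains of Eq. (8) and then the four radii. The radii expressions are an assumption: they are the forms commonly given for the quaternion ship domain in the cited QSD literature and are not reproduced in this text, so they should be checked against Eqs. (3)-(7) of the published paper. The function names are hypothetical.

```python
import math

def qsd_gains(speed_kn):
    """Gains of advance and tactical diameter from Eq. (8); speed in knots (> 0)."""
    lg_v = math.log10(speed_kn)
    k_ad = 10 ** (0.3591 * lg_v + 0.0952)
    k_dt = 10 ** (0.5441 * lg_v - 0.0795)
    return k_ad, k_dt

def qsd_radii(length_m, speed_kn):
    """Four QSD radii in metres. ASSUMED forms (not taken from this text):
    R_fore  = (1 + 1.34*sqrt(k_AD^2 + (k_DT/2)^2)) * L
    R_aft   = (1 + 0.67*sqrt(k_AD^2 + (k_DT/2)^2)) * L
    R_starb = (0.2 + k_DT) * L
    R_port  = (0.2 + 0.75*k_DT) * L"""
    k_ad, k_dt = qsd_gains(speed_kn)
    base = math.sqrt(k_ad ** 2 + (k_dt / 2) ** 2)
    return {
        "fore": (1 + 1.34 * base) * length_m,
        "aft": (1 + 0.67 * base) * length_m,
        "starb": (0.2 + k_dt) * length_m,
        "port": (0.2 + 0.75 * k_dt) * length_m,
    }
```

Under these assumed forms the domain is longer ahead than astern and wider to starboard than to port, which reflects the asymmetry that comments (1) and (2) ask for.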
The position relationship between the target ship and the own ship is represented in Fig. 3, and the model for detecting near misses is described by Eqs. (9)-(13), where OS is the own ship; TS is the target ship; $\theta$ is the bearing of the target ship relative to the own ship; $\beta$ is the heading of the own ship; $\lambda_1$ and $\varphi_1$ are the longitude and latitude of the own ship; $\lambda_2$ and $\varphi_2$ are the longitude and latitude of the target ship; $L$ is the own ship length; and $r$ is the distance from the center of the own ship to the boundary of the domain. $f \le 0$ means that the own ship is involved in a near miss, and $f > 0$ means that it is not.
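A possible reading of the detection model is sketched below. It assumes that $f$ is the distance between the ships minus the domain radius $r$ in the direction of the target ship, that the domain boundary with $k = 2$ is a quarter-ellipse in each quadrant, and that a flat-earth (equirectangular) approximation is adequate at these ranges. None of these choices is confirmed by the text, and the function name and argument layout are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0

def near_miss_f(own, target, radii):
    """own = (lat_deg, lon_deg, heading_deg); target = (lat_deg, lon_deg);
    radii = dict with 'fore', 'aft', 'starb', 'port' in metres.
    Returns f = d - r (assumed form); f <= 0 means the target ship is
    inside the own ship's domain."""
    lat1, lon1, heading = own
    lat2, lon2 = target
    # Flat-earth offsets of the target relative to the own ship, in metres.
    north = math.radians(lat2 - lat1) * EARTH_RADIUS_M
    east = (math.radians(lon2 - lon1)
            * math.cos(math.radians((lat1 + lat2) / 2)) * EARTH_RADIUS_M)
    d = math.hypot(north, east)
    # Relative bearing theta = true bearing of target - own heading.
    theta = math.atan2(east, north) - math.radians(heading)
    x, y = math.cos(theta), math.sin(theta)  # x ahead, y to starboard
    rx = radii["fore"] if x >= 0 else radii["aft"]
    ry = radii["starb"] if y >= 0 else radii["port"]
    # Radius of the quadrant ellipse (k = 2) in direction theta.
    r = 1.0 / math.sqrt((x / rx) ** 2 + (y / ry) ** 2)
    return d - r
```

With a fore radius of 1000 m and an aft radius of 500 m, a target 500 m dead ahead gives $f < 0$ while a target 600 m dead astern gives $f > 0$, so the asymmetric domain treats the two passages differently.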
In an encounter, one ship is usually taken as the research object and defined as the own ship, in order to analyze its relationship with other ships. For the two ships involved in the same encounter, either ship can be selected as the own ship, and the other is then the target ship. The encounter formed between ship A and ship B is shown in Fig. 4. Ship A is shorter than ship B. In Fig. 4 (a), ship A is selected as the own ship, and the scene is named the A-B scene. In this scene, ship B is not inside the domain of ship A because ship A is short, so ship A is not in near miss. In Fig. 4 (b), ship B is selected as the own ship, and the scene is named the B-A scene. In this scene, ship A is inside the domain of ship B because of ship B's large domain, so ship B is in near miss.
According to the above example, a near miss scene with two ships encountering should be divided into four types: both A and B are in near miss, only A is in near miss, only B is in near miss, and neither A nor B is in near miss. The statuses of A and B are shown in Table 3. The flow diagram of the model for detecting near misses is shown in Fig. 5. The data structure is represented in Eqs. (14) and (15), where Dataset(i, j) is the data set of the near miss scenes formed by the ith trajectory and the jth trajectory, and traj(i, j, t) is the data of the ith trajectory and the jth trajectory at time t.
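The record traj(i, j, t) and the four encounter types can be sketched as below. The field list follows Eq. (15); the class name, the function name, the status labels, and the representation of Dataset(i, j) as a list of records are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrajRecord:
    """One synchronized sample of trajectories i and j, after Eq. (15)."""
    mmsi_i: int
    mmsi_j: int
    lat_i: float
    lon_i: float
    heading_i: float
    speed_i: float
    lat_j: float
    lon_j: float
    heading_j: float
    speed_j: float
    relative_speed: float
    t: int

def encounter_status(f_a_own, f_b_own):
    """f_a_own: value of f with ship A as own ship; f_b_own: with ship B as
    own ship. f <= 0 means that own ship is in near miss. The returned
    labels are illustrative names for the four types."""
    a_in, b_in = f_a_own <= 0, f_b_own <= 0
    if a_in and b_in:
        return "both"
    if a_in:
        return "only A"
    if b_in:
        return "only B"
    return "neither"

# Dataset(i, j) is assumed here to be the list of records with a near miss.
Dataset = list[TrajRecord]
```

In the example of Fig. 4, ship B lies outside the domain of ship A while ship A lies inside the domain of ship B, which corresponds to the status "only B".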

C. THE ALGORITHM FOR CALCULATING DURATION OF NEAR MISSES
In this article, a single near miss is defined as the process from the target ship sailing into the ship domain of the own ship to its leaving. The duration of this process is defined as the duration of a single near miss. This article presents an algorithm for calculating the durations of the near misses that occur between two selected ships. Algorithm 1 presents the body of the algorithm. Because the sample interval is 1 minute in this article, 1 minute is also selected as the time threshold.
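A minimal sketch of such a duration calculation is given below. It is a reconstruction under stated assumptions, not a transcription of Algorithm 1: the input is taken to be the sorted timestamps at which the target ship is inside the own ship's domain, and a new near miss is assumed to start whenever two consecutive timestamps are more than the time threshold apart. The function name is hypothetical.

```python
def near_miss_durations(times, threshold_s=60):
    """times: sorted timestamps (seconds) at which f <= 0 for one pair of
    ships. Returns the duration T = t_end - t_start of every near miss,
    splitting wherever the gap between samples exceeds threshold_s."""
    durations = []
    if not times:
        return durations
    t_start = times[0]
    for k in range(len(times) - 1):
        if times[k + 1] - times[k] > threshold_s:
            # The current near miss ended at times[k]; a new one starts next.
            durations.append(times[k] - t_start)
            t_start = times[k + 1]
    durations.append(times[-1] - t_start)  # close the last near miss
    return durations
```

For the timestamps [0, 60, 120, 300, 360] this yields two near misses lasting 120 s and 60 s. Under this reading, a near miss observed in a single sample has a duration of 0 s.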

D. SIMILARITY COEFFICIENT MODEL OF SHIPS' LENGTHS
Due to the complex patterns of maritime traffic, the absolute length difference between the own ship and the target ship cannot fully show the similarity of the two ships' lengths [41]. Therefore, the similarity coefficient model of ships' lengths is defined by Eqs. (16) and (17), where $L_{OS}$ is the own ship length, $L_{TS}$ is the target ship length, $L_{diff}$ is the absolute difference of the own ship length and the target ship length, and $L_{sim}$ is the similarity coefficient of the target ship length and the own ship length. $L_{sim} > 0$ means that the own ship is longer than the target ship, $L_{sim} < 0$ means that the own ship is shorter than the target ship, and $L_{sim} = 0$ means that the two lengths are equal. The closer $L_{sim}$ is to 0, the more similar the lengths of the two ships.

Algorithm 1 (fragment): $T \leftarrow t_{end} - t_{start}$; $t_{start} \leftarrow traj_{k+1}.t$; end while; return duration of near miss.
Consider two pairs of ships: in the first, the own ship is 400 m long and the target ship 350 m; in the second, they are 100 m and 50 m respectively. $L_{diff}$ has the same value, 50 m, in both cases, although the first pair is in fact more similar in length. In this example, $L_{sim}$ takes the values 0.125 and 0.33 respectively. The closer the value of $L_{sim}$ is to 0, the more similar the ships' lengths.
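For illustration only, the sketch below uses one candidate form of the coefficient, $L_{sim} = (L_{OS} - L_{TS}) / L_{OS}$. This form is an assumption: it matches the stated sign convention, it reproduces the value 0.125 for the 400 m and 350 m pair, and it is bounded above by 1 but unbounded below, as the asymmetric ranges reported later would require. It does not reproduce the 0.33 quoted for the second pair, so the exact form should be checked against Eq. (17) of the published paper. The function name is hypothetical.

```python
def length_similarity(l_os, l_ts):
    """Returns (L_diff, L_sim) for own ship length l_os and target ship
    length l_ts, both in metres.
    L_diff = |L_OS - L_TS| follows the definition in the text.
    L_sim = (L_OS - L_TS) / L_OS is an ASSUMED form, not taken from the text."""
    l_diff = abs(l_os - l_ts)
    l_sim = (l_os - l_ts) / l_os
    return l_diff, l_sim
```

With this form, `length_similarity(400, 350)` gives an absolute difference of 50 m and a coefficient of 0.125, and swapping the two ships makes the coefficient negative.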

E. K-MEANS CLUSTERING ALGORITHM
k-means clustering is a classic clustering algorithm that is widely used in data analysis because of its good clustering performance. Its objective function is

$$J = \sum_{p=1}^{k} \sum_{x_j \in C_p} \left\| x_j - c_p \right\|_2^2$$

where $\| x_j - c_p \|_2^2$ is the squared Euclidean distance between the $p$th cluster centroid and the $j$th data point; $C_p$ is the $p$th cluster; $j = 1, 2, \cdots, n$; $p = 1, 2, \cdots, k$; $n$ is the total number of data points; and $k$ is the number of clusters. The aim of the k-means algorithm is to minimize the objective function $J$.
In this article, the k-means clustering algorithm is used to cluster the durations of near misses, and the relationship between the duration of near misses and the similarity coefficient of ships' lengths is then analyzed. The data set $X = \{Duration_1, Duration_2, \cdots, Duration_{n-1}, Duration_n\}$ contains the durations of near misses, where $n$ is the number of near misses. The k-means clustering can be divided into the following steps:
Step 1: Randomly select $k$ initial cluster centers $\{c_1, c_2, \cdots, c_k\}$ from the data set $X$;
Step 2: Cluster the data set $X$ according to the cluster centers $\{c_1, c_2, \cdots, c_k\}$ to obtain $k$ clusters $\{C_1, C_2, \cdots, C_k\}$: for any $x_j \in X$, if $\| x_j - c_{p_1} \|_2^2 \le \| x_j - c_{p_2} \|_2^2$ for all $p_2 \ne p_1$, with $p_1, p_2 = 1, 2, \cdots, k$, then $x_j$ is assigned to $C_{p_1}$;
Step 3: Update the cluster centers to obtain the new centers $c_1^*, c_2^*, \cdots, c_k^*$, with $c_p^* = \frac{1}{n_p} \sum_{x_j \in C_p} x_j$, where $n_p$ is the number of samples contained in $C_p$;
Step 4: Check whether the termination condition is satisfied: the iteration stops when $c_p^* = c_p$, $p = 1, 2, \cdots, k$, and the clustering result $\{C_1, C_2, \cdots, C_k\}$ and cluster centers $\{c_1, c_2, \cdots, c_k\}$ are output when the clustering error is minimum; otherwise, return to Step 2.
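The four steps can be sketched for one-dimensional duration data as follows. This is a minimal illustration, not the implementation used in the article: the function name, the seeded random initialization, the iteration cap, and the handling of empty clusters are assumptions.

```python
import random

def kmeans_1d(data, k, seed=0, max_iter=100):
    """Cluster a list of durations into k groups.
    Returns (centers, clusters, sse)."""
    rng = random.Random(seed)
    centers = [float(c) for c in rng.sample(data, k)]  # Step 1
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for x in data:
            p = min(range(k), key=lambda q: (x - centers[q]) ** 2)
            clusters[p].append(x)
        # Step 3: recompute each center as the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [sum(c) / len(c) if c else centers[p]
                       for p, c in enumerate(clusters)]
        # Step 4: stop when the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    # Objective J: sum of squared distances to the assigned center.
    sse = sum((x - centers[p]) ** 2
              for p, c in enumerate(clusters) for x in c)
    return centers, clusters, sse
```

For the toy data [60, 120, 180, 2400, 2460, 2520] with k = 2, the procedure separates the short durations from the long ones, with centers 120 and 2460. Because the result of k-means can depend on the initial centers, practical use typically repeats the procedure with several seeds and keeps the run with the smallest objective value.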
The k-means clustering algorithm requires the parameter k to be chosen manually. However, it is difficult to determine an appropriate value of k when the data set is large or prior knowledge is lacking. The elbow method is effective for assessing the validity of clustering results; it looks at the percentage of variance explained as a function of the number of clusters. There should be an optimal value of k for the k-means algorithm, beyond which increasing k no longer reduces the Sum of Squared Errors (SSE) significantly. The value of k is increased step by step and the SSE value is recorded.
SSE is the sum of the squared Euclidean distances of each point to its cluster centroid. We can try to find the optimal number of clusters in a data set by looking for the number of clusters at which there is a knee in the plot of the evaluation measure against the number of clusters. Starting from k = 2, k is increased and the SSE, which decreases as k grows, is recorded. The k at which the difference between the SSE at k − 1 and the SSE at k is largest is selected as the optimal k. When k is increased further, the new clusters are similar to the previous ones and the SSE no longer changes significantly.

$$traj(i, j, t) = \{MMSI_i, MMSI_j, LAT_i, LON_i, heading_i, speed_i, LAT_j, LON_j, heading_j, speed_j, relative\_speed, t\} \tag{15}$$

III. EXPERIMENTAL RESULTS AND DISCUSSIONS
A. TIME CHARACTERISTICS OF MARITIME TRAFFIC RISK BASED ON NEAR MISSES
To study the characteristics of maritime traffic risk based on near misses from the temporal perspective, an expert questionnaire was used for data collection. The 30 interviewees included engineers, professors engaged in research on maritime traffic risk, captains and chief officers with practical collision avoidance experience, and doctoral students with relevant research interests. After confirming that they fully understood the purpose of this study, they were asked to fill in the questionnaire according to their knowledge. Maritime traffic risk is divided into 5 levels; the higher the level, the greater the risk. The interviewees judged the level of maritime traffic risk for different durations of near misses, and the statistical result is shown in Fig. 6. The number in each grid cell represents the expert scoring results.
It can be seen that within 0-30 minutes, the longer the duration of a near miss, the greater the maritime traffic risk; after 30 minutes, the maritime traffic risk decreases but remains at a high level. On the whole, a long-duration near miss means high maritime traffic risk. Generally, a short-duration near miss means that the two ships are far enough apart to escape danger quickly through maneuvering, whereas a long-duration near miss means that the two ships are in a dangerous situation for a long time, and the operators must follow the dynamics of both ships to avoid a collision until the ships are out of danger, which increases the operators' workload. When a near miss lasts more than 30 minutes, the operators adapt to the new navigation state, so the risk level for the operators is reduced, although it is still high. Although these characteristics are easy to understand, they have not yet been analyzed quantitatively in detail. Accurate research results are helpful to the collision avoidance decisions of ships. Therefore, research on maritime traffic risk from the temporal perspective is of high theoretical and practical value.

Table 4. Because cooperative operations between merchant ships and engineering ships may be wrongly detected as near misses or collisions [4], engineering ships such as tugs are not considered in this article. Only cargo ships, oil tankers, and passenger ships are selected. Finally, 16,419 trajectories are obtained, which include the information of 2,224 cargo ships, 333 oil tankers, and 14 passenger ships.

C. THE NEAR MISS FORMED BETWEEN TWO SHIPS
A statistical study is made of the durations of the cases in which both ships are in near miss and of the cases in which only one ship is in near miss in the study waters, and the results are shown in Table 5. It can be seen from Table 5 that the cases in which both ships are in near miss are fewer and their average duration is shorter. When both ships are in near miss, each ship is inside the domain of the other, and both ships have collision risk. When only one ship is in near miss, only that ship has collision risk. Therefore, the scenes in which both ships are in near miss carry higher risk, and their number is smaller. High-risk scenes should be avoided, and when they are unavoidable the ships should leave the danger as soon as possible. This explains why the average duration of near misses is short when both ships are in near miss.

D. CHARACTERISTICS OF DURATION OF NEAR MISSES BASED ON SHIP TYPE
Cargo ships, oil tankers, and passenger ships are selected as the research objects to count the number of ships which involved in near misses, and the result is shown in Table 6. It can be seen that the number of cargo ships is largest and that of the passenger ships is smallest. According to the different scenes, the number of near misses and the total duration of near misses are calculated, and then the average duration of near misses is also calculated. The results are shown in Table 7.
It can be seen from Table 7 that the longest average duration of near misses is 444.74 s, for near misses formed between two cargo ships. Because the number of cargo ships in the study waters is the largest, the number of near misses formed between cargo ships is also the largest, at 5419. In the cargo-cargo, cargo-tanker, and cargo-passenger scenes, where the own ship is a cargo ship, the average duration is longer than 350 s. In the passenger-cargo and passenger-tanker scenes, where the own ship is a passenger ship, the average duration is shorter than 300 s.
In the study waters, the two scenes with the smallest numbers of near misses are passenger-cargo and passenger-tanker, with 14 and 2 near misses respectively. Because of the high safety requirements of passenger ships, their operators must be very cautious and keep a sufficient safety distance. Therefore, the number of near misses is the smallest and the duration is the shortest when the own ship is a passenger ship. Table 8 gives the statistics of near misses for cargo-cargo, cargo-tanker, tanker-cargo, cargo-passenger, tanker-tanker, passenger-cargo, and passenger-tanker. In the cargo-cargo, cargo-tanker, and tanker-cargo scenes, long-duration near misses are fewer and short-duration near misses are more numerous. In the cargo-passenger, tanker-tanker, passenger-cargo, and passenger-tanker scenes, the numbers of near misses are all small and every near miss is short. No near miss formed between a passenger ship and a cargo ship, or between a passenger ship and an oil tanker, lasts longer than 1000 s, and no near miss formed between oil tankers lasts longer than 1500 s. On the whole, long-duration near misses are fewer and short-duration near misses are more numerous.
The advantage of the box plot is that it is not affected by outliers and describes the dispersion of data in a relatively stable way, so it is used to analyze the duration of near misses formed between different ship types. Fig. 7 is the box plot of the duration of near misses for cargo-cargo, tanker-cargo, cargo-tanker, cargo-passenger, tanker-tanker, passenger-cargo, and passenger-tanker, where "ship type1-ship type2" denotes a near miss formed between ship type1 and ship type2 (for example, "cargo-cargo" denotes a near miss formed between cargo ships), the outlier markers are the long-duration near misses, the green marker is the average duration of near misses, and the orange line is the median duration of near misses. It can be seen that the number of long-duration near misses differs significantly between ship types. Long-duration near misses are more numerous in cargo-cargo, tanker-cargo, and cargo-tanker than in cargo-passenger, tanker-tanker, passenger-cargo, and passenger-tanker.
Because the number of cargo ships in Bohai Sea is the largest, the chance of forming long-duration near misses is greater, so the number of long-duration near misses formed between cargo ships is the largest. The upper edge of each box is lower than 1500 s (25 min), indicating that most near misses are shorter than 1500 s and that short-duration near misses are the most numerous.
The above results indicate that cargo ships are involved in near misses more often than other ships; that passenger ships have higher safety requirements, so their near misses are few and short; that short-duration near misses occur with high frequency; and that the longer the duration of near misses, the smaller their number. The statistical results of the questionnaire show that, when the duration of a near miss is less than 30 min, it is generally considered that the longer the duration, the higher the risk. Since the duration of most near misses in Bohai Sea is less than 25 min, the results of the questionnaire are considered applicable in the study waters.

E. CHARACTERISTICS OF DURATION OF NEAR MISSES BASED ON SHIP LENGTH
Ship length is an important factor that affects the safety of ships. The lengths of all ships involved in near misses in the study waters are counted, and the results are shown in Fig. 8. Ships with lengths of 220-230 m are the most numerous, followed by 190-200 m and 290-300 m; each of these groups has more than 600 ships. Fig. 9 is the scatter diagram of the duration of near misses against own ship length and target ship length. Each point represents a single near miss. A red point indicates a duration greater than 2000 s, and a blue point indicates a duration less than or equal to 2000 s. The points representing durations shorter than 2000 s are very dense, while those representing durations longer than 2000 s are sparse. According to the statistics, 6132 near misses have a duration of less than 2000 s, which is 98.1% of all near misses. This indicates that long-duration near misses are few and short-duration near misses are many. Fig. 10 is the scatter diagram of own ship length against target ship length for each near miss, and the detailed data are presented in Table 9. It can be seen that the points are dense and the number of near misses is large, owing to the large number of ships with lengths of 220-230 m, 190-200 m, and 290-300 m in the study waters. When the target ship is shorter than 200 m, the points are dense, and there are 4621 of them. When the target ship is longer than 200 m, the points are sparse, and there are 1627 of them. In particular, there are only 10 points for which the own ship is shorter than 100 m and the target ship is longer than 200 m. This means that the own ship is more likely to be involved in a near miss if the target ship is shorter than 200 m.
Fig.11 is the scatter diagram of the own ship length and the duration of near misses and Fig.12 is the scatter diagram of the target ship length and the duration of near misses. It can be seen from Table 9, Fig.11, and Fig.12, the number of near misses is only 163, when the own ship length is shorter than 100m, which is smaller than that when the target ship length is short than 100m. However, the number of near misses is   950 when the own ship length longer than 300m, which is larger than that when the target ship length longer than 300m.
Eqs. (16) and (17) are used to calculate the similarity coefficient of ships' lengths, $L_{sim}$, between the own ship and the target ship, and the results are presented in Fig.13 (a). The number of near misses is 1651 when $L_{sim} < 0$ and 4597 when $L_{sim} \geq 0$. There are 28 points for which $L_{sim} \geq 0$ and the duration is longer than 3000s. However, there are only 5 points for which $L_{sim} < 0$ and the duration is longer than 3000s.
The k-means clustering algorithm is used to cluster the duration of near misses in Bohai Sea. The appropriate value of k is determined by the elbow method. Fig.14 and Table 10 show the value of SSE from k = 2 to k = 10. The value of SSE is 13789122 at k = 2 and 6544511 at k = 3, a difference of 7244612. The values at k = 4 to k = 10 are 4174451, 2872259, 1798883, 1154094, 849204, 617525, and 473775, and the differences between successive values gradually decrease, ranging from 2370059 to 143750. It can be seen that the change is most drastic at k = 3, where SSE is 6544511: the decrease of 7244612 reached at k = 3 is much higher than the decrease of 2370059 reached at k = 4. It can be concluded that, based on the elbow method, the optimal k for clustering the data in Bohai Sea is 3.
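The elbow reasoning can be reproduced from the SSE values listed above. The sketch below is illustrative only: the function names are hypothetical, and the rule used to locate the elbow (the k after which the decrease in SSE shrinks the most) is one common heuristic, assumed here because the text does not state a formal criterion. Since the SSE values are quoted as rounded integers, the computed decreases may differ by one in the last digit from the differences quoted in the text.

```python
# SSE for k = 2..10 as reported in the text (rounded to integers).
sse = {2: 13789122, 3: 6544511, 4: 4174451, 5: 2872259, 6: 1798883,
       7: 1154094, 8: 849204, 9: 617525, 10: 473775}

def sse_drops(sse_by_k):
    """Decrease in SSE obtained when moving from k-1 to k clusters."""
    ks = sorted(sse_by_k)
    return {k: sse_by_k[prev] - sse_by_k[k] for prev, k in zip(ks, ks[1:])}

def elbow(sse_by_k):
    """Return the k after which the decrease in SSE shrinks the most.

    This is an assumed heuristic, not a rule taken from the article.
    """
    drops = sse_drops(sse_by_k)
    ks = sorted(drops)
    # Compare the decrease reached at k with the decrease reached at k+1.
    bend = {k: drops[k] - drops[nxt] for k, nxt in zip(ks, ks[1:])}
    return max(bend, key=bend.get)

drops = sse_drops(sse)
best_k = elbow(sse)
```

Under this heuristic the selected value is k = 3, in line with the conclusion drawn from Fig.14 and Table 10.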
To analyze the overall variation of the data, different values of the parameter k are selected to compare the clustering results. Because the differences in SSE are large when k = 3, 4, 5, and 6, k is set to 3, 4, 5, and 6 respectively. According to the clustering results, the duration range of each cluster, the range of $L_{sim}$, the number of data, and the ratio of each cluster are counted, and then the average duration, the span of duration, and the span of $L_{sim}$ are also calculated. The results are presented in Fig.13 (b), (c), (d), and (e), and the detailed data are given in Tables 11, 12, 13, and 14 respectively.
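The per-cluster statistics described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-dimensional Lloyd iteration, its quantile-based initialization, the function names, and the sample durations and similarity coefficients are all hypothetical. Clusters are ordered from the longest to the shortest average duration, matching the way cluster 1 and the last cluster are discussed in the text.

```python
from statistics import mean

def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm on scalars; returns one cluster label per value."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start from evenly spaced order statistics (an assumption;
    # the article does not state how the centroids were initialized).
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: (v - centroids[c]) ** 2)
                      for v in values]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean(members)
    return labels

def summarize(durations, l_sims, labels, k):
    """Per-cluster count, ratio, average duration, ranges, and spans."""
    rows = []
    for c in range(k):
        d = [x for x, lab in zip(durations, labels) if lab == c]
        s = [x for x, lab in zip(l_sims, labels) if lab == c]
        if not d:
            continue
        rows.append({
            "count": len(d),
            "ratio": len(d) / len(durations),
            "avg_duration": mean(d),
            "duration_range": (min(d), max(d)),
            "duration_span": max(d) - min(d),
            "l_sim_range": (min(s), max(s)),
            "l_sim_span": max(s) - min(s),
        })
    # Cluster 1 holds the longest durations, the last cluster the shortest.
    rows.sort(key=lambda r: r["avg_duration"], reverse=True)
    return rows

# Hypothetical durations (s) and length similarity coefficients.
durations = [60, 90, 120, 150, 200, 240, 900, 1000, 1100, 3000, 3200]
l_sims = [-2.0, 0.5, -1.0, 0.3, 0.8, -0.4, 0.1, 0.6, -0.2, 0.5, 0.7]

labels = kmeans_1d(durations, k=3)
rows = summarize(durations, l_sims, labels, k=3)
```

Because only the duration is clustered, the range and span of the similarity coefficient are descriptive statistics computed afterwards for the members of each cluster; they play no part in the cluster assignment.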
Observing the last cluster of each table: in Tables 11 and 12, the ranges of $L_{sim}$ are both [−2.81, 0.87] and the spans of $L_{sim}$ are both 3.68; in Tables 13 and 14, the ranges of $L_{sim}$ are both [−2.81, 0.85] and the spans of $L_{sim}$ are both 3.66. In each table, the average duration of the last cluster is shorter than 200s, the range of $L_{sim}$ is the largest, the span of $L_{sim}$ is larger than 2900s, the number is larger than 3000, and the ratio is higher than 50%. Observing cluster 1 of each table, the average duration is longer than 2300s, the range of $L_{sim}$ is the smallest, the span of $L_{sim}$ is smaller than 2, the number is smaller than 250, and the ratio is lower than 4%. This means that the number of short-duration near misses is the largest, the frequencies of long-duration near misses are low, and the frequencies of short-duration near misses are high regardless of $L_{sim}$. Observing the data from the first cluster to the last cluster, it can be seen that the shorter the average duration is, the larger the span of $L_{sim}$ is. However, in Table 14, the span of $L_{sim}$ of cluster 3 is 2.3, which is larger than that of cluster 4. In Fig.13 (d), there is only one point in cluster 3 with an $L_{sim}$ of about -1.5. This point is an outlier that should be ignored, so the trend identified above still holds.
Comparing the spans of $L_{sim}$, the minimum of $L_{sim}$ changes considerably from cluster to cluster, with the largest difference reaching 2.31 (cluster 1 and cluster 6 in Table 14), whereas the maximum of $L_{sim}$ changes little, staying larger than 0.72 and smaller than 0.82. It can be seen that the longer the average duration is, the larger the value of $L_{sim}$ is, which means that long-duration near misses generally have a large $L_{sim}$.
The above-mentioned results indicate that the frequencies of short-duration near misses are high regardless of $L_{sim}$, the frequencies of long-duration near misses are low, and long-duration near misses generally have a large $L_{sim}$.

F. CHARACTERISTICS OF DURATION OF NEAR MISSES BASED ON SHIP SPEED
In this article, the average speed is defined as the average value of the ship speed during the whole process of a single near miss. The average speed of own ship and that of target ship are calculated respectively. The results are shown in Fig.15, and the detailed data are given in Table 15. When the average speed of own ship and the average speed of target ship are both lower than 15kn, the number of near misses is 5905, and the ratio is 94.5%. This means that ships are more likely to be involved in a near miss when the average speed of own ship and the average speed of target ship are both lower than 15kn. Fig.16 is the scatter diagram of the duration of near misses, the average speed of own ship, and the average speed of target ship. A red point indicates that the duration of a single near miss is greater than 2000s, and a blue point indicates that it is less than or equal to 2000s. It can be seen that long-duration near misses are few and short-duration near misses are many. Few ships are involved in long-duration near misses when the average speed of own ship and the average speed of target ship are both high. Fig.17 is the scatter diagram of the average speed of own ship and the duration of near misses, and Fig.18 is the scatter diagram of the average speed of target ship and the duration of near misses. It can be seen from Table 15, Fig.17, and Fig.18 that, when the average speed of own ship is higher than 15kn, the number of near misses is only 134, which is smaller than the number when the average speed of target ship is higher than 15kn. However, when the average speed of own ship is higher than 5kn, the number of near misses is 2384, which is larger than the number when the average speed of target ship is less than 5kn.
The average relative speed is defined as the average value of the relative speed between own ship and target ship during the whole process of a single near miss, which shows the overall situation of the relative speed of the two ships in this scene. The average relative speed in each near miss is calculated, and the results are presented in Fig.19 (a). It can be seen that there are a few cases in which the duration of a near miss is longer than 4000s when the average relative speed is less than 5kn. However, when the average relative speed is more than 15kn, there is almost no case in which the duration of a near miss is longer than 1000s. As the average relative speed increases, the number of near misses decreases.
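The two averages defined in this subsection can be sketched as follows. This is an illustration under explicit assumptions rather than the authors' procedure: the relative speed at each instant is taken here as the magnitude of the difference between the two velocity vectors, the samples of the two ships are assumed to be synchronized in time, and all function names and track values are hypothetical.

```python
import math

def velocity(speed_kn, course_deg):
    """Velocity components (east, north) in knots from speed and course."""
    rad = math.radians(course_deg)
    return speed_kn * math.sin(rad), speed_kn * math.cos(rad)

def relative_speed(own, target):
    """Magnitude of the velocity difference of two (speed, course) samples.

    Treating relative speed as a vector difference is an assumption.
    """
    ox, oy = velocity(*own)
    tx, ty = velocity(*target)
    return math.hypot(tx - ox, ty - oy)

def average(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)

# Hypothetical synchronized samples during one near miss:
# (speed in kn, course in degrees).
own_track = [(10.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
target_track = [(10.0, 180.0), (8.0, 180.0), (12.0, 0.0)]

avg_own_speed = average([s for s, _ in own_track])
avg_target_speed = average([s for s, _ in target_track])
avg_relative_speed = average(
    [relative_speed(o, t) for o, t in zip(own_track, target_track)]
)
```

In this example the two ships first meet head-on and then sail on the same course at the same speed, so the instantaneous relative speed falls from about 20kn to 0kn, while the average speed of each ship stays close to 10kn. Two ships with similar average speeds can therefore still have a small or a large average relative speed, depending on their courses.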
According to the k-means clustering results in Section III, the range of average relative speed, the number of data, and the ratio of each cluster are counted, and then the average duration, the span of duration, and the span of average relative speed are calculated. The results are presented in the corresponding tables. Observing the last cluster of each table, the range of average relative speed and the span of average relative speed are both the largest, being [0.11, 29.05] and 28.94 respectively; the average duration is shorter than 200s, the number is larger than 3000, and the ratio is higher than 50%. Observing cluster 1 of each table, the average duration is longer than 2300s, the range of average relative speed is the smallest, the span of average relative speed is smaller than 2, the number is smaller than 250, and the ratio is lower than 4%. This means that the number of short-duration near misses is the largest, the frequencies of long-duration near misses are low, and the frequencies of short-duration near misses are high regardless of relative speed.
Observing the data from the first to the last cluster, it can be seen that the shorter the average duration is, the larger the span of average relative speed is. Comparing the spans of average relative speed, the maximum of average relative speed changes considerably from cluster to cluster, with the largest difference reaching 22.76kn (cluster 1 and cluster 6 in Table 19), whereas the minimum of average relative speed changes little, staying larger than 0.1kn and smaller than 0.35kn. It can be seen that the longer the average duration is, the smaller the maximum of average relative speed is, which means that long-duration near misses generally happen at a small relative speed.
The above-mentioned results indicate that the frequencies of short-duration near misses are high regardless of relative speed, the frequencies of long-duration near misses are low, and long-duration near misses generally have a small relative speed.
The duration, $L_{sim}$, and the average relative speed of each near miss are presented in Fig.20. A red point indicates that the duration of a single near miss is greater than 2000s, and a blue point indicates that it is less than or equal to 2000s. It can be seen that the longer the duration is, the sparser the data points are, which means that long-duration near misses are fewer; long-duration near misses generally have a large $L_{sim}$ and a small average relative speed; and short-duration near misses exist regardless of the values of $L_{sim}$ and average relative speed.

IV. CONCLUSION
The duration of near misses reflects the maritime traffic risk: a long-duration near miss means a high maritime traffic risk. The navigation safety domain is used to detect near misses accurately from the AIS database. The influences of ship maneuverability, size, speed, and COLREGS on ship safety are considered, and a model for detecting near misses based on the navigation safety domain is proposed. The duration of a single near miss is defined as the time from the target ship sailing into the ship domain of own ship to its leaving, and a model for calculating the duration of near misses is proposed. Because the absolute difference between two ships' lengths cannot fully show their similarity, the similarity coefficient model of ships' lengths is proposed. The duration of each near miss in the study waters is calculated, and the k-means clustering results for the duration of all near misses are analyzed. Finally, some connections between the duration of near misses and ship type, length, and speed are found in this article.
Passenger ships have high safety requirements, so their number of near misses is small and the duration is short. The frequencies of short-duration near misses are high regardless of ship type, the value of $L_{sim}$, and the value of average relative speed. The connections between long-duration near misses, ship length, and speed are strong: the frequencies of long-duration near misses are low, and long-duration near misses generally have a large $L_{sim}$ and a small average relative speed. Nonetheless, the model can be further refined to account for factors such as hydrometeorological conditions. The connections between the duration of near misses and the traffic environment are also worth studying. This article provides a method for studying maritime traffic risk from the temporal perspective. The connections found in this article can help officers to automatically detect scenes of high potential risk, and can help researchers and maritime traffic management agencies to better understand maritime traffic risk from the temporal perspective.
YANGYU ZHOU received the B.E. degree in navigation technology from Jimei University, Xiamen, China, in 2018. He is currently pursuing the master's degree in navigation science and technology with the Navigation College, Dalian Maritime University, Dalian, China. His current research interests include ocean big data analysis, AIS data mining, and marine traffic engineering.