Deep Neural Network Architectures for Momentary Forecasting in Dry Bulk Markets: Robustness to the Impact of COVID-19

The COVID-19 pandemic has severely affected various global markets, increasing the need for new forecasting models for the dry bulk market. Therefore, this study proposes deep neural network (DNN) architectures to build a momentary forecasting model whose accuracy is not affected by economic shocks (e.g., COVID-19) and elucidates the strategy for obtaining such DNNs. First, since momentary and short-term forecasting are fundamentally different, they should use independent methods; as such, DNNs for time series classification are applied to momentary forecasting. Second, the proposed architectures are constructed by considering sparsity, because designing DNN architectures robust to any impact is a form of overfitting prevention for deep neural networks. Finally, this study proposes indices for quantitatively evaluating DNN architectures that represent the realized forecasting performance of various deep neural networks. Using these indices, it is demonstrated that optimal architectures may need model sparsity in the DNN (i.e., sparsity independent of the input data); the importance of this issue is demonstrated experimentally. As a result, the architectures achieved target performances of 88%, 91%, and 79% accuracy, with stability, for Panamax, Supramax, and Capesize vessels, respectively, from February 2016 to September 2021 (i.e., five years and eight months). No correlation between model performance and volatility could be identified. Furthermore, before and after the COVID-19 shock, the performance of the models from the optimal architectures exceeds that of four other recent models, namely "Facebook Prophet," "DARTS," "SKTIME," and "AutoTS."


I. INTRODUCTION
The global market experienced a major setback in 2020 due to the outbreak of the COVID-19 pandemic, and maritime trade came within the sphere of its impact [1], [2], [3], [4]. As various pieces of evidence regarding the impact of COVID-19 on the supply chain have been identified in many studies [5], [6], [7], analyzing and forecasting the indicators of maritime logistics that are affected by the supply chain has become a significant research area. Therefore, this study focuses on dry bulk vessels, which account for the largest share of world seaborne trade [8] and are responsible for transporting dry bulk commodities such as iron ore, coal, and grains [9], [10]. The main purpose of this study is to provide reliable deep neural algorithms for forecasting the time series of freight rates in dry bulk markets, which directly affect the profits of the agents in the market: shipowners, charterers, and shippers.
Of course, several studies present relevant results on forecasting freight rates for dry bulk carriers, as discussed in the literature review section below. However, most of these studies have focused on short-term forecasting (that is, 18 months more or less) and did not consider momentary forecasting (i.e., a week) as an independent topic. Momentary forecasts are the simplest short-term forecasts. In many fields of time series forecasting, the algorithms depend on timescales [11], [12], [13]. In his maritime economics textbook [9], Martin Stopford not only explained that momentary forecasting is a different problem from short-term forecasting, but also remarked that solving this problem is worth investigating: "Momentary forecasts are concerned with days or even hours. This is the time scale of charterers, shipbrokers and traders who decide whether to fix a ship or cargo. ... This is forecasting at the sharp end, at the frontier of information availability, with no time for thick reports. A risky profession, but very rewarding for those who are good at it." Depending on whether a contract is decided this week or the next, a loss or profit could occur; that is, large differences can often be observed between the average rates of two subsequent weeks. Consider a global mining company that plans to move iron ore from Brazil to China in the last week of March or the first week of April 2020. Using time charter, the company would fix (borrow) Capesize carriers for 90 days. According to Clarkson data [14], the average rate for the last week of March 2020 was 3274 USD/day and the average rate for the first week of April 2020 was 6337 USD/day. Since the ships were borrowed for three months, if the mining company concluded the contract in the last week of March 2020, it could save around 276,000 USD/ship.
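For reference, the roughly 276,000 USD/ship figure follows from a one-line calculation based on the two quoted Clarkson weekly averages and the 90-day charter:

$$(6337 - 3274)\ \text{USD/day} \times 90\ \text{days} = 275{,}670\ \text{USD} \approx 276{,}000\ \text{USD per ship}.$$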
Considering the above properties of momentary forecasting, namely that (i) agents in the shipping market are eager to know information about the near future and (ii) it could be difficult to perform momentary forecasting with good accuracy when dealing with time series with high volatility, the author attempted to find an alternative approach and to justify its use. The specific strategies are as follows. First, this study simplifies the problem compared to previous studies. To predict future values in the dry bulk markets, regardless of the type of forecasting method, such as econometric methods ([30], [31], [32]) or neural networks ([33], [34], [35]), the problem can be written mathematically as follows. With M and L positive integers and M ≫ L, the input data are a time series from t = −M (a specific point in the past) to t = 0 (the present), and the output data are a time series from t = 1 (a close point in the future) to t = L (a specific point in the future).
The author denotes the input data as $X = \big(x^{(2)}_{-M}, \cdots, x^{(2)}_{-1}, x^{(2)}_0,\; x^{(1)}_{-M}, \cdots, x^{(1)}_{-1}, x^{(1)}_0\big)$, where the superscripts index the available input series, and defines the output data as $Y = (x_1, x_2, \cdots, x_L)$. Using the neural network, the author wants to find the function $f: X \to Y$, that is,

$$(x_1, x_2, \cdots, x_L) = f(X). \quad (2)$$

To achieve the objective of building a forecasting model that is robust to major impacts, the author studies a simpler problem than Eq. (2) and proposes the following approach:

$$F(x_{-N}, \cdots, x_{-1}, x_0) = \begin{cases} 1, & \text{if } \frac{1}{L}\sum_{i=1}^{L} x_i \ge \frac{1}{N+1}\sum_{i=-N}^{0} x_i, \\ 0, & \text{otherwise,} \end{cases} \quad (3)$$

where N and L are positive integers. In other words, neural networks for time series classification are applied to this forecasting problem. For example, as the author aims to conduct weekly forecasting using past daily data, he considers the case where the $x_i$ denote daily data and the integers N and L are four and five, respectively. Second, the author uses data that are "accessible". The input data comprise the freight rates of dry bulk carriers, and their mathematical form is a small vector; if the method achieves high accuracy despite this, it is an interesting approach. However, because the input data are so small, naive generalization through neural networks tends to fail. Hence, to design a neural network with successful generalization, the author considers a "sparse structure" in the deep neural network (DNN) based on a mathematical concept. In addition, to evaluate whether a neural network is appropriate, he formulates indices that indicate reproducible accuracy. To help the agents in the shipping industry, who may not be able to assign much time to analyzing data by computing, or who may perform such analyses on mobile phones, the author must ensure that the data are "accessible" in the sense of Wilkinson et al.'s FAIR Guiding Principles for scientific data management and stewardship [36]. To be accessible, data must be publicly available to users on digital platforms and be offered at a low cost or for free. Specifically, some proposed models ([37], [38], [39], [40]) take forward freight agreements (FFA) as input in addition to freight rates; FFA data help improve forecasting accuracy, but they are not considered accessible, because it is more difficult to obtain FFA data from typical platforms than freight rates, and FFA data are typically not free.
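As a concrete illustration of Eq. (3), the following minimal Python sketch builds classification samples from weekly rate vectors. It assumes the daily rates are already arranged in Monday-to-Friday rows, and the "greater than or equal" tie-breaking rule and the toy numbers are assumptions, not the paper's data.

```python
import numpy as np

def build_classification_samples(weekly_rates: np.ndarray):
    """Build (X, y) pairs per Eq. (3): X holds one week of daily rates,
    y is 1 if the next week's average rises above this week's average, else 0.

    weekly_rates: array of shape (num_weeks, 5), Monday..Friday daily rates.
    """
    X, y = [], []
    for t in range(len(weekly_rates) - 1):
        this_week, next_week = weekly_rates[t], weekly_rates[t + 1]
        X.append(this_week)  # (x_-4, ..., x_0), i.e., N = 4
        y.append(int(next_week.mean() >= this_week.mean()))  # L = 5 future rates
    return np.asarray(X, dtype=float), np.asarray(y, dtype=int)

# Toy usage with two synthetic weeks of Panamax-like rates (USD/day):
rates = np.array([[11000, 11200, 11150, 11300, 11250],
                  [11400, 11500, 11450, 11600, 11700]])
X, y = build_classification_samples(rates)
print(X.shape, y)  # (1, 5) [1] -> the second week's average rose
```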
However, accessibility is not limited to data; the author extends this concept to the algorithms and tools used to analyze the data. For the data's high accessibility to matter, the proposed algorithm and the related software must likewise be low-priced or free, usable as a simple data-mining tool by actual users, and easily accessible. For example, if a model demands time series freight rates represented as daily data over several months to obtain a forecast for the following week, it is not easy to use such a model on a mobile phone. Hence, the author plans to build a model that satisfies the following conditions. First, the input files must be simple and inexpensive. Second, the model must be readily accessible and easy to adopt, modify, or enhance using open-source software.
Third, the author intends to search for a proper DNN structure. The simplification from Eq. (2) to Eq. (3) not only makes the problem easier but also changes its mathematical nature. In other words, the problem in Eq. (3) is not to forecast specific values in a future period but to classify images derived from the time series. DNNs are useful tools for image classification ([41], [42], [43]), and employing them is thus reasonable for this problem. As previously mentioned, the model must be capable of forecasting future data with an accuracy similar to that achieved on past data. From the perspective of computer science, the problem reduces to designing DNN architectures that prevent overfitting. Since preventing overfitting relates closely to sparsity in DNNs, the author focuses on this mathematical concept. Hence, a considerable portion of this study deals with the mathematical analysis of sparsity and experiments on various DNN structures depending on sparsity.
Finally, to evaluate DNN architectures, the author introduces indices that measure the reproducibility of the forecasting models extracted from the architectures. This study experiments with architectures and selects the optimal one among them for the best performance; therefore, quantitative selection criteria are required. As this study aims to propose a forecasting model that agents can use with confidence, the degree of stability of the DNN is closely related to these criteria. Specifically, because a DNN can be unstable with respect to accuracy, the realizable accuracy, or its lower bound, of the model extracted from the DNN is used as the value for the selection criteria; the evaluation indices built on it are used to determine the best DNN architecture.
Two points are worth noting regarding the models extracted from the DNN algorithms proposed in this study. First, the proposed models surpass the performance of existing algorithms. To demonstrate this, experiments in momentary forecasting are conducted with data ranging from September 2020 to September 2021, a period directly affected by COVID-19. In this experiment, the existing algorithms are recently developed libraries for the short-term forecasting of time series and, in terms of the number of false forecasts, the proposed DNN architectures produce the lowest values. This indicates that neural network algorithms need to depend on the forecasting timescale.
Second, even when the period is extended, the models from the author's architectures remain valid. For a considerably long period, from March 2016 to September 2021, the author experimented with momentary forecasting and showed that the models maintain high accuracy. Considering that the market experienced various economic shocks during this long period, including COVID-19, the accuracy of the models from these algorithms appears to be independent of economic impacts. Furthermore, it is difficult to find a correlation between the volatility of the time series and the forecasting performance of the models. Since volatility is traditionally interpreted as a measure of uncertainty about the future movement of asset prices [44], this result provides a new perspective that goes beyond the traditional view of time series analysis and helps market participants use the forecasting model without worrying about whether an impact will occur.
The remainder of this paper is organized as follows. Section II reviews previous studies on forecasting freight rates based on artificial neural networks (ANNs). Section III describes the data and the problem. Section IV introduces the theoretical background and proposes a specific DNN methodology for time series classification. Section V presents the results of momentary forecasting, together with their interpretation and a demonstration of their superiority. Section VI draws conclusions and presents the study's implications.

II. LITERATURE REVIEW
There are three methods for forecasting shipping freight rates: shipping market models, econometric linear models, and models from machine learning. The first two approaches have theoretical limitations. In the first case, the relationship between supply and demand cannot be adequately applied to momentary forecasting. Moreover, despite the non-linearity of the time series, the econometric approach is based on the premise that a time series is linear. Hence, this study reviews only the literature on ANNs.
For monthly forecasting, Li and Parsons [45] used multilayer perceptron models to forecast monthly, 5-month, and yearly tanker freight rates. Lyridis et al. [46] also proposed models for forecasting tanker spot rates using a training set from February 1981 to September 2009 and a test set from December 2000 to December 2003; their neural network has two hidden layers with logistic sigmoid activations, and its 9-month and yearly forecasts perform better than its 1-month, 3-month, and 6-month forecasts. Santos et al. [47] forecast the period charter rates of VLCC tankers using neural network methods, namely the multilayer perceptron (MLP) and radial basis function (RBF) networks, and an econometric method, the autoregressive integrated moving average (ARIMA); they showed that the neural networks outperformed the statistical method.
Zeng et al. [48] used an alternative neural network to forecast daily data from the Baltic Dry Index (BDI). In their method, the BDI from 1999 to 2013 was decomposed into four parts depending on the characteristics of each period, and a model was built on the data corresponding to each part. They performed daily and weekly forecasts, outperforming both ANN and vector autoregression (VAR) models. Yang and Mehmed [40] employed two ANN models, the non-linear autoregressive network (NARNET) and the non-linear autoregressive network with external input (NARXNET), and compared their performance on daily data for 1-, 2-, 3-, and 6-month-ahead predictions of the Baltic Panamax Index (BPI). The input data were historical BPI and FFA data with 825 observations. The study showed that NARXNET outperformed NARNET for forecasts up to six months ahead and that FFA data could improve BPI prediction.

III. PROBLEM AND DATA
In the Introduction, the author formulated the simplified problem in Eq. (3); this section specifies it.
Here, the problem is not described in full generality, as the "no free lunch theorem" [49] implies that it is difficult to find the best algorithm applicable to any problem. The mathematical expression of the input is given in Eq. (4):

$$X = (x_{-4}, x_{-3}, x_{-2}, x_{-1}, x_0), \quad (4)$$

where the input $x_i$ with integer $i$ represents daily prices in the dry bulk market. When dealing with weekly forecasting, the input data are daily prices and the size is N = 4. The author denotes the number of observations for a specific period as l; for weekly forecasting, l = 5. The input data are denoted in vector form such that $x_i$ is the daily freight rate and the indices 0, −1, −2, −3, and −4 represent Friday, Thursday, Wednesday, Tuesday, and Monday, respectively. To analyze the dry bulk market, the author selects Baltic Exchange indices from January 2, 2006, to September 9, 2021. Specifically, the author uses a neural network and three indices, the Baltic Capesize Index 5TC (BCI), Baltic Panamax Index 5TC (BPI), and Baltic Supramax Index 10TC (BSI), derived from Clarkson Research [14], the largest shipping database.
As with any other kind of business data, analysis of the Baltic Indices requires data imputation, because missing data lead to a distorted model and incorrect forecasting.
For example, a week containing May Day has only four freight rates, a 20% information loss; therefore, imputation is necessary. Apart from the gaps at the end and the beginning of the year, the days on which freight rates are not traded on the Baltic Exchange fall on Fridays or Mondays; therefore, extrapolation from the nearest traded day is used [50], [51], [52]. In other words, a missing Monday rate $x_{-4}$ is replaced by $x_{-3}$ and a missing Friday rate $x_0$ by $x_{-1}$. All input data corresponding to the three types of dry bulk carriers are divided into training and test sets. The former ranges from January 2, 2006, to February 12, 2016, and the latter from February 15, 2016, to September 9, 2021. As the author builds the model using daily data over a week (i.e., five rates), there are 499 observations in the training set and 280 in the test set.
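To make the preprocessing concrete, the following pandas sketch reproduces the imputation and the chronological split. The file name "bpi_daily.csv" and the nearest-value fill direction are assumptions; the paper's exact extrapolation rule may differ.

```python
import pandas as pd

# Hypothetical CSV with columns "date" and "rate" (e.g., daily BPI fixtures).
df = pd.read_csv("bpi_daily.csv", parse_dates=["date"]).set_index("date")

# Reindex onto all business days so holidays (e.g., May Day) appear as NaN,
# then fill each gap from the nearest traded rate (forward, then backward).
bdays = pd.bdate_range(df.index.min(), df.index.max())
rates = df["rate"].reindex(bdays).ffill().bfill()

# Chronological split mirroring the paper's training and test periods.
train = rates.loc["2006-01-02":"2016-02-12"]
test = rates.loc["2016-02-15":"2021-09-09"]
```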

IV. THEORETICAL BACKGROUND AND METHODOLOGY
As described in Section III (Problem and Data), the author focuses on the future direction of the dry bulk markets, namely whether the average freight rate in a future period rises or falls compared with a past period (see Eq. (3)), and does not predict the exact values of future freight rates as in Eq. (2). Methodologically, momentary forecasting is treated as time series classification (TSC). To reduce data bias, an end-to-end approach is required for TSC; that is, the author should not conduct any data mining except for a few imputations. As such, DNNs are more suitable than other machine learning algorithms.

A. DEEP NEURAL NETWORKS FOR TIME SERIES CLASSIFICATION AND MOTIVATION
Fawaz et al. [53] surveyed nine DNN approaches for TSC: the multilayer perceptron (MLP), fully convolutional network (FCN), residual network (ResNet), Encoder, multi-scale convolutional neural network (MCNN), time Le-Net (t-LeNet), multi-channel deep convolutional neural network (MCDCNN), time convolutional neural network (Time-CNN), and time warping invariant echo state network (TWIESN). Of these, six neural networks have local pooling layers, which map input images to features and generate a loss of spatial information [54]. For example, in the case of 2 × 2 max pooling, 75% of the information is lost, as shown in Figure 1. The inputs in this problem are vectors with only five elements and do not contain much information; the six methods with pooling layers are thus not suitable for the applications in this study. Hence, the author considers the three algorithms without a local pooling layer: MLP, FCN, and ResNet.
Wang et al. [20] proposed these three neural network architectures, which are publicly available at https://github.com/cauchyturing/UCR_Time_Series_Classification_Deep_Learning_Baseline. Since they are baselines for TSC, the structures of the three architectures are simple and have the potential for modification or improvement. Before examining an appropriate algorithm, the author runs an experiment using the prototype algorithms of Wang et al. [55] and describes the resulting forecast performance in Figure 2.
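For orientation, the following Keras sketch is in the spirit of the published FCN baseline; the 128/256/128 filter counts and 8/5/3 kernel sizes follow the repository above, but readers should verify the details there. Note that the final pooling is global, not local, which is consistent with the text's restriction.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_fcn(input_len: int = 5, n_classes: int = 2) -> tf.keras.Model:
    """FCN baseline for TSC: three Conv1D blocks (filters 128/256/128,
    kernels 8/5/3), each followed by batch normalization and ReLU,
    then global average pooling (no local pooling layer) and softmax."""
    inputs = tf.keras.Input(shape=(input_len, 1))
    x = inputs
    for filters, kernel in [(128, 8), (256, 5), (128, 3)]:
        x = layers.Conv1D(filters, kernel, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
    x = layers.GlobalAveragePooling1D()(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_fcn()
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```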
Accuracy is measured to evaluate the performance of the models based on these three architectures. Denoting the number of correct forecasts by T and the number of incorrect forecasts by F, accuracy can be defined as

$$\text{accuracy} = \frac{T}{T + F}.$$

Figures 2(a)-(c) present the accuracy curves against the epoch for the three neural network architectures. The blue lines show the accuracy measured on the training data, and the red lines the accuracy measured on the testing data. The accuracy curves in these graphs exhibit two properties. First, each model overfits or underfits; that is, there exists a gap between the two curves representing the accuracy measured on the training and testing data. This implies that the original architectures did not explain the data appropriately. Second, the accuracy curves fluctuate at every epoch, a property that sometimes occurs when neural networks are applied to classification problems [56], [57]. To obtain better neural networks than these prototype DNNs, the author needs to analyze DNNs mathematically.

B. MATHEMATICAL REPRESENTATION OF DNNs
A DNN corresponds to a type of composite function [58] that comprises three or more vector-valued functions (i.e., two or more hidden layers and one output layer) [59]. As general DNNs are nonlinear mappings between tensors, the author limits the scope of the discussion. In this study, the DNN defines a function $F_{DNN}$ from an input vector $X \in \mathbb{R}^n$ to a one-hot-encoded vector $Z$, $F_{DNN}: \mathbb{R}^n \to Z$. If the DNN has (L − 1) hidden layers, $F_{DNN}$ is represented as

$$F_{DNN} = F_L \circ F_{L-1} \circ \cdots \circ F_1, \quad (6)$$

where the layers are vector-valued functions $F_i : \mathbb{R}^{n_{i-1}} \to \mathbb{R}^{n_i}$, $W_i$ and $b_i$ are the weight matrix and the bias for the affine transformation in layer $i$, respectively, and the output layer $F_L : \mathbb{R}^{n_{L-1}} \to Z$ labels the data [53]. Each layer consists of neurons, the units required to calculate a vector from the previous layer, and can be expressed as

$$F_i(x) = \big(f_{i1}(x), f_{i2}(x), \cdots, f_{i n_i}(x)\big), \quad (7)$$

where $f_{ij} : \mathbb{R}^{n_{i-1}} \to \mathbb{R}$ for $j = 1, 2, \cdots, n_i$ are the functions represented as neurons, the smallest units for calculating data in the neural network. Since the neural network uses supervised learning, the author builds the model by following the three steps given below.
Step 1) Consider input data X, output data Y, and the ordered sample set $(X, Y) = \{(x_j, y_j)\}_j$.

Step 2) (Optimization) The categorical cross entropy $f_{CCE}(\cdot\,; W_1, \cdots, W_L, b_1, \cdots, b_L)$ based on the softmax function is defined as

$$f_{CCE}(x_j; W_1, \cdots, W_L, b_1, \cdots, b_L) = -\sum_{k} y_{jk} \log\!\big(\mathrm{softmax}(F_{DNN}(x_j))_k\big),$$

and the author extracts the vector of parameters $W'_1, \cdots, W'_L, b'_1, \cdots, b'_L$ of a neural network with L layers that minimizes the mean cost function generated from the entropy $f_{CCE}(x_j)$:

$$cost(X, Y, F_{DNN}) = \frac{1}{m} \sum_{j=1}^{m} f_{CCE}(x_j; W_1, \cdots, W_L, b_1, \cdots, b_L).$$

Step 3) (Regularization) Using this parameter vector, the author compares the value of the mean cost function on the training set with the one on the test set,

$$\alpha = \frac{cost(X_{tr}, Y_{tr}, F_{DNN})}{cost(X_{te}, Y_{te}, F_{DNN})}, \quad (12)$$

and the closer the value of α in Eq. (12) is to 1, the more successful the neural network.
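The three steps can be mirrored in a few lines of NumPy. The softmax outputs below are placeholders, and the direction of the ratio in Eq. (12) follows the convention written above; both are sketches rather than the paper's implementation.

```python
import numpy as np

def mean_cce(probs: np.ndarray, onehot: np.ndarray) -> float:
    """Mean categorical cross entropy over a sample set (Step 2)."""
    eps = 1e-12  # numerical guard against log(0)
    return float(-np.mean(np.sum(onehot * np.log(probs + eps), axis=1)))

# Hypothetical softmax outputs of F_DNN on training and test sets:
p_train = np.array([[0.9, 0.1], [0.2, 0.8]])
y_train = np.array([[1, 0], [0, 1]])
p_test = np.array([[0.7, 0.3], [0.4, 0.6]])
y_test = np.array([[1, 0], [0, 1]])

# Step 3: alpha = train cost / test cost (Eq. (12));
# values near 1 indicate a small generalization gap.
alpha = mean_cce(p_train, y_train) / mean_cce(p_test, y_test)
print(round(alpha, 3))
```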
Hence, the process implies that, when the author extracts a model from the neural network, he should keep the training error small while keeping the gap between the training and test errors small. With respect to optimization, hidden layers are not indispensable for a neural network, by the "universal approximation theorem" [60], [61]. However, with respect to regularization, employing DNNs is necessary for building a model. In other words, having hidden layers is a necessary condition for successful generalization, because DNNs can then exhibit a useful structural feature called sparsity.

C. WHAT IS SPARSITY?
Mathematically, a vector $x \in \mathbb{R}^n$ is sparse if most elements of x are zero [62], and sparsity can be applied to a neural network: a layer or input is sparse when most or many of its neurons rarely activate. If the author makes a DNN sparse, the DNN can reduce overfitting [63], [64], [65], [66], [67], [68]. In other words, "creating sparsity," or "sparsification," helps the DNN achieve good regularization.
According to Hoefler et al. [69], sparsification can be classified into ephemeral sparsification and model sparsification: "Model sparsification changes the model but does not change the sparsity pattern across multiple inferences or forward passes. ... Ephemeral sparsification ... is applied during the calculation of each example individually and is only relevant for this example." In this study, the author applies this classification of sparsification to analyze the layers in neural networks. However, Hoefler et al. [69] did not provide mathematical definitions of model and ephemeral sparsification, only their properties. Hence, using Eqs. (6) and (7), each type of sparsification can be defined as follows.

Definition 1 (ephemeral & model sparsity): Let two samples in the input set or from the (i−1)-th layer be arbitrary vectors $x$ and $x'$. When the i-th layer $F_i : \mathbb{R}^{n_{i-1}} \to \mathbb{R}^{n_i}$ in Eq. (6) is applied to both vectors, the author assumes that the inactive (zero) components of $F_i(x)$ lie at the indices $j_{ab}$ and those of $F_i(x')$ at the indices $j'_{ab}$, where $j_{ab}$ and $j'_{ab}$, for $a = 1, 2, \cdots, i$ and $b = 1, 2, \cdots, k$, are positive integers smaller than n.

(i) If the layer F generates ephemeral sparsity, no pair $(j_{ab}, j'_{ab})$ with $j_{ab} = j'_{ab}$ is guaranteed for all inputs; the author denotes the set of such layers by ES.

(ii) If the layer F generates model sparsity, there exists at least one pair $(j_{ab}, j'_{ab})$ with $j_{ab} = j'_{ab}$ for any two inputs; the author denotes the set of such layers by MS.

D. LAYERS IN THE NEURAL NETWORKS
By the "no free lunch theorem" [49], the author cannot provide the best method for obtaining optimal neural networks consisting of proper layers. Therefore, his strategy for finding optimal neural networks is to evaluate various combinations of layers, namely ReLU, dropout, the convolutional operator, the block function, and batch normalization. The definitions and properties of each layer are given below.

1) ReLU AND DROPOUT
The author introduces two nonlinear activation functions that "decide if a neuron can be fired or not" [70]. First, the rectified linear unit proposed by Nair and Hinton [71], the so-called ReLU function, is denoted $\mathrm{Relu}(\cdot)$. If $x \in \mathbb{R}^n$ is a vector of input data or data from the previous layer, the ReLU function is defined as

$$\mathrm{Relu}(x) = \big(\max\{0, x_1\}, \max\{0, x_2\}, \cdots, \max\{0, x_n\}\big).$$

Second, the dropout function proposed by Srivastava et al. [72] is denoted $f_{dropout}(\cdot\,; W, r, b)$ and defined as

$$f_{dropout}(x; W, r, b) = W\big(r(p) * x\big) + b, \quad (20)$$

where $*$ is the element-wise product and $r(p)$ is an independent Bernoulli random vector whose elements are 1 or 0 with probability p or (1 − p), respectively. If some layers include the ReLU or dropout function, the neural network has a sparse structure that depends on each vector coming from the previous layer, which implies that the two activation functions lead to a neural network with ephemeral sparsification. Both functions are illustrated, with respect to sparsity, in Figures 3 and 4.

Proposition 1: $\mathrm{Relu}, f_{dropout} \in ES$; that is, the ReLU and dropout functions generate ephemeral sparsity.
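Proposition 1 is easy to see numerically: the zero pattern produced by ReLU follows the sign pattern of each individual input, and dropout's pattern is re-sampled per example. The following is a toy NumPy sketch, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Zeros appear exactly where the input is negative.
    return np.maximum(0.0, x)

def dropout(x, p=0.5):
    # r(p): independent Bernoulli vector; zeroed positions change per sample.
    r = rng.binomial(1, p, size=x.shape)
    return r * x

x1 = np.array([0.5, -1.2, 2.0, -0.3, 0.7])
x2 = np.array([-0.5, 1.2, -2.0, 0.3, -0.7])

# The zero pattern of ReLU depends on each input's sign pattern:
print(np.where(relu(x1) == 0)[0])  # indices of the negative entries of x1
print(np.where(relu(x2) == 0)[0])  # different indices for x2 -> ephemeral
print(np.where(dropout(x1) == 0)[0])  # random pattern, re-drawn per example
```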

2) CONVOLUTIONAL OPERATOR
In a convolutional neural network (CNN), as proposed by LeCun et al. [73] for image recognition, there is a mathematical foundation in which the convolutional operator $f_{con}$ is a type of integral transform. In other words, Y is called the convolution of the data X in an input layer or previous layer and the kernel K, the so-called Toeplitz matrix [74]. If the input data X is a two-dimensional image, the convolution layer can be represented as

$$f_{con}(X; K)_{ij} = \sum_{n_1} \sum_{n_2} x_{(n_1+i)(n_2+j)}\, k_{n_1 n_2}.$$

According to this definition, the convolution layer leads to a neural network with model sparsification. To visualize the relationship between the convolution layer and sparsity, consider a specific example, where X is a 4 × 4 matrix $[x_{n_1 n_2}]$ for $0 \le n_1, n_2 \le 3$ and K is a 2 × 2 matrix $[k_{m'n'}]$ for $0 \le m', n' \le 1$. In this case, the convolution Y is a 3 × 3 matrix $[y_{ij}]$ for $0 \le i, j \le 2$, whose elements are

$$y_{ij} = \sum_{n_1=0}^{1} \sum_{n_2=0}^{1} x_{(n_1+i)(n_2+j)}\, k_{n_1 n_2},$$

and, as shown in Figure 5, the convolution layer reduces the number of connections between neurons.
Proposition 2: $f_{con} \in MS$; that is, the convolution layer generates model sparsity.

Proof: In Eq. (34), $f_{con}(X; K)_{ij}$ does not depend on $x_{ab}$ when a and b are integers with $0 \le a < i$, $0 \le b < j$, $N_1 + i < a \le M_1$, or $N_2 + j < b \le M_2$. In other words, if $X'$ agrees with X on all the remaining components, then

$$f_{con}(X; K)_{ij} = \sum_{n_1} \sum_{n_2} x_{(n_1+i)(n_2+j)}\, k_{n_1 n_2} = f_{con}(X'; K)_{ij}.$$

Hence, by Definition 1, it is concluded that $f_{con} \in MS$.
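Proposition 2 can also be visualized numerically: the convolution's connection pattern, written as a banded (Toeplitz-like) matrix, has zeros in positions fixed by the architecture alone. The following is a toy NumPy sketch, not the paper's code.

```python
import numpy as np

def conv_as_matrix(kernel_len: int, input_len: int) -> np.ndarray:
    """Banded (Toeplitz-like) matrix implementing a 1-D valid convolution."""
    out_len = input_len - kernel_len + 1
    k = np.arange(1.0, kernel_len + 1)  # an arbitrary nonzero kernel
    W = np.zeros((out_len, input_len))
    for i in range(out_len):
        W[i, i:i + kernel_len] = k
    return W

W = conv_as_matrix(kernel_len=2, input_len=5)
print(W != 0)
# The zero entries of W (missing neuron connections) are fixed by the
# architecture itself, independent of any input vector -> model sparsity.
```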

E. EVALUATION METHOD
The performance of a DNN architecture depends on the accuracy of its models. Since the author is dealing with classification problems, he can evaluate performance using the receiver operating characteristic (ROC) curve [77], [78], [79], [80], [81]. This implies that if a model from the architecture shows good performance, the architecture is considered a good neural network. When the models extracted from an architecture converge to a specific one, that is, when the architecture is stable, this evaluation methodology is reasonable. However, most architectures in this study appear unstable.
According to the previous experiment on the prototypes of the three neural networks shown in Figure 2, the accuracy graphs fluctuate and do not converge to a specific value, which implies that it is difficult to obtain a highly accurate model. In this study, the author attempts to reduce the fluctuations in the accuracy curve rather than remove them altogether, and thus proposes fixing a sufficiently long interval of epochs over which to obtain models with high accuracy and stability.
Specifically, he proposes two indices, a weak index $J_{NN}(\cdot)$ and a strong index $K_{NN}(\cdot)$. To construct the indices, he introduces the following functions and operators. Let $E_t[\,\cdot \mid t \in I\,]$ and $\sigma_t[\,\cdot\,]$, respectively, represent an expectation operator and a standard deviation operator with respect to the epoch t over an interval I, and let $\mathrm{Acc}(t|S)$ denote the accuracy at epoch t measured on a set S. The indices over an interval $I_k$ are then

$$J_{NN}(F_{DNN}, I_k) = E_t\big[\mathrm{Acc}(F_{DNN}(t)|S) \mid t \in I_k\big], \quad (56)$$

$$K_{NN}(F_{DNN}, I_k) = E_t\big[\mathrm{Acc}(F_{DNN}(t)|S) \mid t \in I_k\big] - \sigma_t\big[\mathrm{Acc}(F_{DNN}(t)|S) \mid t \in I_k\big], \quad (57)$$

where $S_{tr}$ and $S_{te}$ represent the training set $\{(x_1, y_1), (x_2, y_2), \cdots, (x_m, y_m)\}$ and the test set $\{(x_{m+1}, y_{m+1}), (x_{m+2}, y_{m+2}), \cdots, (x_{m+n}, y_{m+n})\}$, respectively, and $I_k$ denotes the interval $[a_k, b_k]$, where $a_k, b_k$ are positive integers and $a_k < b_k \le \infty$. These indices imply that the performance of the neural network improves as the average accuracy $E_t[\mathrm{Acc}(F_{DNN}(t)|S_{tr}) \mid t \in I_j]$ over the interval $I_j$ increases or the standard deviation $\sigma_t[\mathrm{Acc}(F_{DNN}(t)|S_{tr}) \mid t \in I_j]$ decreases.
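Under this reading of Eqs. (56)-(57), both indices reduce to a mean and a mean-minus-standard-deviation over a window of the recorded accuracy curve. The following is a minimal sketch of one plausible implementation; the synthetic accuracy curve is only a placeholder.

```python
import numpy as np

def weak_index(acc: np.ndarray, a: int, b: int) -> float:
    """J_NN: mean accuracy over the epoch interval [a, b)."""
    return float(np.mean(acc[a:b]))

def strong_index(acc: np.ndarray, a: int, b: int) -> float:
    """K_NN: mean minus standard deviation over [a, b), read as a
    realizable lower bound on accuracy under fluctuation."""
    window = acc[a:b]
    return float(np.mean(window) - np.std(window))

# acc[t] = test-set accuracy recorded at epoch t during training (synthetic):
acc = 0.85 + 0.02 * np.sin(np.arange(2000) / 10.0)
print(weak_index(acc, 1500, 2000), strong_index(acc, 1500, 2000))
```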

V. RESULTS
As stated in Section III, the author considers weekly forecasting using a time series of five daily rates from the previous week. In Subsection V-A, using the strong index $K_{NN}(\cdot)$ proposed in Section IV, the author determines the optimal neural network architecture for the BPI and estimates the actual accuracy of the various architectures by calculating the weak index $J_{NN}(\cdot)$. The process of identifying the best accuracy is summarized as follows: (i) The author designs architectures $F_i(\cdot)$ for $i = 1, 2, \cdots$.
(ii) The author extracts models for weekly forecasting from each architecture. (iii) Employing the strong index $K_{NN}(\cdot)$, the author determines the best architecture $F^{(K)}_{DNN}(I)$ for an interval I and calculates the weak index $J_{NN}(\cdot)$ corresponding to the best one, which yields a realizable or reproducible accuracy. In Subsection V-B, using identical sample sets as for Panamax, the author determines the best architectures for Capesize and Supramax and establishes their reproducible accuracy. In Subsection V-C, he selects the hyperparameters of the best architectures, and in Subsection V-D he studies the structural features of the best architectures and the economic implications of the forecasting results produced by the models.

A. DETERMINATION OF THE BEST ARCHITECTURE FOR PANAMAX
Through experiments, the author determines the best neural network architecture. Consider four sets: MLP, FCN, ResN, and Mix. Each set includes 16 architectures, which are variants of the original architectures proposed by Wang et al. [55]. Using the indices $J_{NN}(\cdot)$ and $K_{NN}(\cdot)$, the author quantitatively evaluates the performance of these architectures. This process yields the best architecture and reveals whether particular layers or activation functions are appropriate.
First, the author introduces the MLP set, which includes 16 architectures whose layers are dropout, ReLU, and identity functions. Table 1 illustrates the layers of each architecture and their combination. Table 5 shows the values of the indices $J_{NN}(\cdot)$ and $K_{NN}(\cdot)$ corresponding to each architecture in the MLP set. Let j be in I–IV or IX–XII; both indices $J_{NN}(\cdot)$ and $K_{NN}(\cdot)$ corresponding to MLP j are higher than those for MLP j+IV. In other words, in view of both indices, the performance of the architectures having dropout layers is worse than that of the architectures without them; thus, the dropout layer is not suitable for designing the neural network architectures in this study. Meanwhile, when the performance of MLP I–MLP IV is compared with that of MLP IX–MLP XII using the indices $J_{NN}(\cdot)$ and $K_{NN}(\cdot)$, the indices of the architectures with the identity function are higher than those of the architectures having ReLU; however, there is no significant difference between the two groups. Hence, the author includes the ReLU layer when designing neural network architectures but excludes the dropout layer.
Second, the author considers the architectures in the FCN set, modified by adding layers based on the convolutional operator in Subsection IV-D2 to MLP I–MLP IV and MLP IX–MLP XII in the MLP set. The FCN set has 16 architectures, where the layers are the convolution operator, batch normalization, the rectified linear unit (ReLU), and the identity function. Table 2 illustrates their design. According to Table 6, when j is in I–IV or IX–XII, both indices for FCN j are higher than those for FCN j+IV. This indicates that batch normalization is not an appropriate layer in this study. Furthermore, for $K_{NN}(F_{DNN}, I = [1500, 2000))$, FCN XII has the highest value in the FCN set, 0.844. Therefore, after selecting FCN III, FCN IV, FCN XI, and FCN XII and adding layers with different types of sparsity, the author enhances the performance of the architectures in the FCN set; the modified set is called the ResN set.
Specifically, the ResN set is constructed in the following two steps. STEP 1) The block operator is applied to the four architectures FCN III, FCN IV, FCN XI, and FCN XII; following Subsection IV-D3, the block operator is given as

$$f_{block}(x) = F(x) + g(x),$$

and the author substitutes the mathematical representations of FCN III, FCN IV, FCN XI, and FCN XII into F(x) in the block operator. Since the author proposes another model sparsity for the block operator, F(x) should be much more complex than g(x), which has a layer or a few layers consisting of the convolution operator, batch normalization, or the identity function. STEP 2) The author combines the architectures from STEP 1) with the identity or the ReLU function.
The ResN set constructed by these two steps includes the 16 architectures listed in Table 3. After the simulations of these architectures have been performed, the architectures combined with the identity function in the ResN set can be underfitted. Hence, considering the architectures with high accuracy among ResN IX–ResN XVI, the author selects ResN XII and ResN XV. Using these architectures, the author proposes a different type of sample set. Third, the author simulates the 16 architectures in the Mix set, which are obtained by replacing the last layer in ResN XII and ResN XV, comprising one ReLU function or one identity function, with the weighted average of the ReLU and identity functions, expressed as

$$F(x) = \alpha \cdot \mathrm{Identity}(Wx + b) + (1 - \alpha) \cdot \mathrm{Relu}(Wx + b),$$

where W is a weight matrix, b is a bias, and α is a real number between 0 and 1. In this study, the architectures for α = 1, 0.95, 0.9, 0.85, 0.8, and 0.75 are considered and presented in Table 4.
Hence, analyzing Table 8 quantitatively, the author finds that both indices $J_{NN}(F_{DNN}, I = [1500, 2000))$ and $K_{NN}(F_{DNN}, I = [1500, 2000))$ of Mix XIII have the highest values, 0.871 and 0.878, respectively, among all the architectures in the four sets, that is, the MLP, FCN, ResN, and Mix sets. Thus, using this architecture, the author can predict the market's future direction regarding Panamax freight rates with 87.8% realizable accuracy.
According to Table 4, the optimal architecture for the Panamax vessel, Mix XIII, consists of three convolutional layers with residual connections, the weighted average of the ReLU and identity functions, and a global max pooling function for classification, which depend on eight parameters: four weights $W_{i1}, \cdots, W_{i4}$ and four biases $b_{i1}, \cdots, b_{i4}$.
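The last-layer replacement that defines the Mix set can be prototyped as a small custom Keras layer. This is a sketch: the layer width and the α value below are placeholders drawn from the scanned set {1, 0.95, 0.9, 0.85, 0.8, 0.75}, not the exact configuration of Mix XIII.

```python
import tensorflow as tf
from tensorflow.keras import layers

class WeightedIdentityReLU(layers.Layer):
    """Mix-set last layer as described in the text:
    alpha * Identity(Wx + b) + (1 - alpha) * ReLU(Wx + b)."""

    def __init__(self, units: int, alpha: float = 0.85, **kwargs):
        super().__init__(**kwargs)
        self.dense = layers.Dense(units)  # provides the affine map Wx + b
        self.alpha = alpha

    def call(self, x):
        z = self.dense(x)
        return self.alpha * z + (1.0 - self.alpha) * tf.nn.relu(z)

# Toy usage: a width-128 layer with one of the scanned alpha values.
layer = WeightedIdentityReLU(units=128, alpha=0.85)
print(layer(tf.zeros((1, 64))).shape)  # (1, 128)
```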

B. APPLICATION TO OTHER BULK CARRIERS
In Subsection V-A, the author identified the optimal architecture with highly reproducible accuracy using data on daily Panamax freight rates. He now estimates the future freight rates of the other bulk carriers, Capesize and Supramax. For Capesize, using the architectures in the Mix set and the indices $J_{NN}$ and $K_{NN}$, the author determines the optimal one. For Supramax, the author considers the architectures in the ResN set and selects the best one. The results are summarized as follows. First, the author assesses the performance of the architectures in the Mix set by analyzing both indices $J_{NN}(I)$ and $K_{NN}(I)$. According to Table 9, during the interval I = [1500, 2000), the strong index $K_{NN}(I)$ of Mix XII attains the highest value in the set, 79.3%, and $J_{NN}(I)$ is 79.3%. This implies that, by employing architecture Mix XII, the author can forecast the market direction of Capesize carriers with at least 79.3% accuracy, and usually around 79.3%. The author also built models from all architectures in the ResN and FCN sets and evaluated them using both indices. However, as presented in Table 11 and Table 12, from the perspective of $K_{NN}(I = [1500, 2000))$, the author is unable to find a sample with better performance than Mix XII.

According to Table 4, the best architecture for the Capesize vessel, Mix XII, is similar to the best architecture for the Panamax vessel. If the last layer in the former were 0.85 · Identity(·) + 0.15 · ReLU(·) instead of 0.8 · Identity(·) + 0.2 · ReLU(·), architecture Mix XII would coincide with the Panamax one. The architecture also has four weights $W_{i1}, \cdots, W_{i4}$ and four biases $b_{i1}, \cdots, b_{i4}$.
Second, the author determines how accurate the best architecture is for Supramax. Using all architectures in the ResN set, the author estimates both $J_{NN}(I)$ and $K_{NN}(I)$. Given the estimated values of both indices, ResN III attains the highest values, and it is not necessary to simulate the architectures in the Mix set. In fact, for both indices, the performance of ResN III is better than that of all architectures in the FCN and Mix sets, as shown in Table 13 and Table 14, which report the calculation of both indices for the FCN and Mix sets.
According to Table 3, the optimal architecture for the Supramax vessel, ResN III, consists of three convolutional layers with a residual connection, which has two layers (a convolutional layer and a batch normalization layer), and a global max pooling function. The architecture has 14 parameters: five weights $W_{i1}, \cdots, W_{i5}$, five biases $b_{i1}, \cdots, b_{i5}$, and the batch normalization parameters $\gamma_i, \beta_i, \mu_\beta, \sigma_\beta$.
C. SELECTION OF HYPERPARAMETERS
In this subsection, using the evaluation indices J and K given in Eqs. (55)–(59), the author selects the best hyperparameters for the DNN architectures corresponding to the three vessels. Since the three architectures Mix XIII, Mix XII, and ResN III all have three convolutional layers, the author evaluates the performance of the architectures with respect to the hyperparameters most closely related to convolutional neural networks: "learning rate," "batch size," "number of convolutional kernels," "size of convolutional kernels," and "epoch" [86]. The performance of the DNN architectures depending on each hyperparameter is summarized as follows (a sketch of such a scan appears after this list):
• The author performs experiments with learning rates of 0.9, 0.7, 0.5, 0.3, and 0.1.
• The author conducts experiments with several sizes of convolutional kernels, including (8 × 1)|(5 × 1)|(3 × 1).
• According to Table 18 and Table 19, when the number of epochs is larger than 1500, the performance of the three architectures stabilizes. Specifically, as shown in Table 18, the reproducible accuracies for Panamax, Capesize, and Supramax are approximately 88%, 79%, and 91%, respectively.
The best architectures are summarized in the algorithm listings below.
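As a complement to those listings, the hyperparameter scan itself can be sketched as follows. The helpers train_and_track() and strong_index(), the batch-size candidates other than 32, and the second kernel configuration are assumptions introduced only for illustration.

```python
import itertools

def select_hyperparameters(build_model, train_and_track, strong_index):
    """Grid scan over CNN-related hyperparameters, ranked by K_NN.

    build_model(kernels): returns a compiled model (hypothetical helper).
    train_and_track(model, lr, batch, epochs): trains and returns the
        per-epoch test-accuracy curve (hypothetical helper).
    strong_index(acc, a, b): the K_NN estimator sketched in Section IV-E.
    """
    grid = itertools.product(
        [0.9, 0.7, 0.5, 0.3, 0.1],   # learning rates scanned in the paper
        [16, 32, 64],                # batch sizes (32 appears in Algorithm 2)
        [(8, 5, 3), (10, 7, 5)],     # kernel sizes per conv layer (assumed)
    )
    best, best_k = None, -1.0
    for lr, batch, kernels in grid:
        acc = train_and_track(build_model(kernels), lr, batch, epochs=2000)
        k = strong_index(acc, 1500, 2000)  # stabilized interval per the text
        if k > best_k:
            best, best_k = (lr, batch, kernels), k
    return best, best_k
```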

D. INTERPRETATION OF THE RESULTS
Here, the author interprets the design of the neural network architectures and the various implications of the forecasting results.

1) REGULARIZATION VERSUS GOODNESS
Since the author selects the best architecture among all the samples based on the strong index $K_{NN}(F_{DNN}, I)$ in Subsection V-B, the architectures for Panamax and Capesize belong to the Mix set and that for Supramax to the ResN set. If the author instead evaluates the architectures using the weak index $J_{NN}(F_{DNN}, I)$, the best ones for Panamax and Supramax still belong to the Mix and ResN sets, respectively; however, the best one for Capesize is in the FCN set.

Algorithm 2 Best Architecture for Capesize
Input: x ∈ X, y ∈ Y
Parameters: $W_{i1}, \cdots, W_{i4}$, $b_{i1}, \cdots, b_{i4}$
Hyperparameters: learning rate = 0.7, batch size = 32, size of convolutional kernels = (8 × 1)|(5 × 1)|(3 × 1), and epochs = 1500
Output: 1 or 0
1: Initialize $W_{i1}, \cdots, W_{i4}$, $b_{i1}, \cdots, b_{i4}$ and set α = 0.85 and β = 0.15.
2: For i < N, . . .

In other words, the best architectures corresponding to the three types of bulk carriers require different types of sparsity to obtain high accuracy. Generating sparsity in the architecture leads to successful regularization; that is, the more layers the author adds with different types of sparsity, the smaller the gap between the accuracy on the training set and that on the test set.

3) COMPARISON WITH OTHER FORECASTING MODELS
The author compares the architectures proposed in this study with four machine learning tools, namely "Facebook Prophet," "DARTS," "SKTIME," and "AutoTS." First, "Facebook Prophet" (abbreviated "fProphet") is "open source software released by Facebook's Core Data Science team," which is available at https://facebook.github.io/prophet/. Its main algorithm for time series forecasting decomposes the series as

$$y(t) = g(t) + s(t) + h(t) + \epsilon_t, \quad (64)$$

where g(t) is the trend, s(t) the seasonality, h(t) the holiday effect, and $\epsilon_t$ the error term. As can be seen from Eq. (64), the main purpose of "fProphet" is to analyze the features of business time series. Therefore, "fProphet" has been applied to various types of time series (e.g., air quality, COVID-19 infection, cloud resource management, stock movement, bitcoin) [89], [90], [91], [92], [93]. "DARTS," "SKTIME," and "AutoTS" are open-source packages for Python. According to [94], [95], and [96], their frameworks contain not only "Facebook Prophet" but also various statistical methods and DNNs. To evaluate the proposed algorithm and the other four, the author performs an experiment in momentary forecasting for the three vessels, Panamax, Capesize, and Supramax, during a first period from September 13, 2019, to March 13, 2020, and a second period from March 16, 2020, to September 11, 2020. In this experiment, 25 weekly forecasts were performed during the first period and 26 during the second. The forecasting models corresponding to the three vessels were extracted from Mix XIII, Mix XII, and ResN III. The evaluation criterion is the number of false forecasts; the smaller the number, the better the performance of the algorithm. In this momentary forecasting experiment, Tables 20, 21, and 22 show that the performance of the models from the best architectures suggested in this study is superior to that of the other four machine learning algorithms. Not only do the three models have higher accuracy than the other models, but it is also noteworthy that, unlike the other four models, the accuracy of the proposed models does not change significantly between the two periods.
Above all, these results imply that the theoretical basis premised on short-term forecasting such as in Eq. (64) must be modified for momentary forecasting.
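For concreteness, the following sketch shows how one of the four baselines, "fProphet," can be driven to produce a comparable weekly up/down call. It is a minimal sketch: the paper does not specify the exact configuration used, so Prophet's defaults and the direction rule below are assumptions.

```python
import pandas as pd
from prophet import Prophet  # pip install prophet

def prophet_weekly_direction(history: pd.DataFrame) -> int:
    """Forecast the next five business days with Prophet and classify the
    weekly direction, mirroring the comparison protocol described above.

    history: DataFrame with columns 'ds' (dates) and 'y' (daily rates).
    """
    m = Prophet()
    m.fit(history)
    future = m.make_future_dataframe(periods=5, freq="B")  # next 5 business days
    forecast = m.predict(future)
    next_week_mean = forecast["yhat"].tail(5).mean()
    this_week_mean = history["y"].tail(5).mean()
    return int(next_week_mean >= this_week_mean)  # 1 = rise, 0 = fall

# A forecast counts as false when this call disagrees with the realized
# direction of the following week's average rate.
```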

VI. CONCLUSION
In this study, the author examined the use of DNNs for the momentary forecasting of freight rates in the dry bulk market. Since momentary forecasting does not predict specific values corresponding to future dates but only the direction of average values during future periods, the author selected neural networks for TSC based on MLP, FCN, and ResNet and proposed two evaluation indices, $J_{NN}$ and $K_{NN}$, which refer to the actual and the minimal accuracies, respectively. By simulating various architectures and using both indices, the author obtained optimal neural network architectures for the three types of dry bulk carriers: Panamax, Capesize, and Supramax. The optimal architectures are summarized in Table 23.
From these neural network architectures, the author extracted models for momentary forecasting with high accuracy. The forecasting models for Panamax, Capesize, and Supramax yield forecasts with 88%, 79%, and 91% reproducible accuracy, respectively. Interesting properties were also observed. First, during the pandemic period from September 2020 to September 2021, the forecasting models from the best DNN architectures are superior to the four recent libraries, namely "Facebook Prophet," "DARTS," "SKTIME," and "AutoTS." Second, the accuracy trends of the forecasting models over five years and eight months remain almost constant.
These findings have the following implications. Anyone analyzing asset dynamics usually presupposes formulas (1) and (64), which consist of deterministic and uncertain parts, the uncertain part denoting quantities based on white noise. The forecasting results from the four libraries are consistent with this premise, with forecasting accuracy of approximately 50%; as such, it makes sense that the unpredictable part is regarded as a Wiener process. However, the results from the DNN architectures have an accuracy much higher than 50%. Hence, the author conjectures that the asset's dynamics could have memory, or that the uncertain part of the freight rate time series might be considered not "white noise" but "colored noise." It remains to be verified whether this feature exists for other types of time series.
In determining the neural network architectures, the author considered how to generate sparsity in neural networks to obtain successful generalization. Sparsity is classified into ephemeral and model sparsity based on whether it depends on the input data. The author mathematically defined both types of sparsity and showed that the layers containing the ReLU or dropout functions generate ephemeral sparsity, whereas the layers having the convolutional or block operators guarantee model sparsity. The layers with model sparsity either enhance or do not worsen forecasting performance, whereas the layers with ephemeral sparsity reduce it or do not significantly improve it. Regardless of the vessel type, the neural networks with a convolution layer have higher forecasting accuracy than those without one. Furthermore, the dropout layer worsens the performance of the neural network (i.e., it leads to unsuccessful generalization of the model extracted from the neural network). This deviates from the common notion that neural networks with dropout can prevent overfitting.
The purpose of the DNN models is to provide scholars and practitioners with a better understanding of the bulk shipping market. The indices for measuring the performance of DNNs and the simple input variables, namely five freight rates, meet this purpose. Furthermore, if the input data are changed, the economic properties of the market can be investigated, and the author has shown that there is little correlation between the 6-month volatility and weekly forecasting performance. Therefore, these results can help various agents in the market and improve their overall results.

SUPPLEMENT
The author provides the best codes corresponding to each of the three bulk carriers. The three codes are written in Python.