Detection Scheme With Joint Intertrack Interference and Media Noise Mitigations for Heat Assisted Interlaced Magnetic Recording

Heat assisted interlaced magnetic recording (HIMR) is a promising candidate for the next generation of magnetic recording technology, aiming to push the areal density beyond 1 Tb/in². In HIMR, the high temperature and low temperature tracks are written in an interlaced order to improve the recording performance. However, the inter-track interference (ITI), inter-symbol interference (ISI) and thermal jitters brought by the increased recording density and Curie temperature variations are severe in HIMR and noticeably degrade the bit error rate (BER) performance. In this study, we propose a multitrack detection scheme with joint ITI and media noise mitigation. A multi-task neural network (MTNN) is designed to simultaneously predict the ITI pattern and the residual media noise. The 2D variable equalizers corresponding to different ITI patterns are then applied, and the predicted residual media noise is embedded into the branch metrics of a modified Bahl-Cocke-Jelinek-Raviv (BCJR) detector, in order to mitigate the ITI and whiten the media noise. Simulations demonstrate that the proposed MTNN with variable equalizer and modified BCJR detector (MTNN+VE+MB) algorithm mitigates the ITI and media noise effectively. At the channel bit density of 3.10 Tb/in², it provides a 2.6 dB signal-to-noise ratio (SNR) gain over the conventional 2D fixed equalizer with pattern dependent noise prediction detector (FE+PDNP) for the low temperature (LT) tracks with 4% Curie temperature variance.


I. INTRODUCTION
Heat assisted interlaced magnetic recording (HIMR) is a next-generation magnetic recording architecture that raises the areal density significantly compared to conventional heat assisted magnetic recording (HAMR) [1]. In an HIMR system, the high temperature (HT) tracks are recorded first, and the low temperature (LT) tracks are then written in an interlaced manner so that they trim the HT tracks [2]. HIMR enhances the areal density by combining the interlaced trimming and HAMR technologies [3], [4], and it also reduces the update latency significantly compared to shingled HAMR technologies.
The associate editor coordinating the review of this manuscript and approving it for publication was Marco Martalo.
However, the increased track and bit densities of HIMR system result in severe inter-symbol interference (ISI), intertrack interference (ITI) and media noise. Meanwhile the thermal jitters in the HIMR system are severe since it is difficult to fabricate the media grains with uniform Curie temperatures. And the low writing temperature gradient of LT tracks further deteriorates the thermal jitters compared to the HT tracks. Such ISI, ITI, and thermal jitters bring severe challenges for the precise signal recovery of HIMR.
Previously, researchers have reported various signal processing algorithms to mitigate the ISI, ITI, and media noise in high density magnetic recording systems [5], [6], [7]. Nabavi investigated a conventional 2D equalizer that forces the ITI to zero and leaves the controllable ISI to be dealt with by the following Viterbi detector [8]. Later, Wang proposed a multitrack detection scheme with a 2D equalizer and a joint Bahl-Cocke-Jelinek-Raviv (BCJR) detector, which utilizes the adjacent tracks' information to further improve the detection performance [9]. Meanwhile, to mitigate the signal dependent media noise, the pattern-dependent noise prediction (PDNP) detector using linear predictions was proposed to whiten the media noise and further improve the performance of Viterbi and BCJR detectors [9], [10], [11].
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Researchers also studied a bidirectional PDNP detector using both forward and backward linear predictions of noise for HAMR with high jitters [12]. Recently, Barry investigated a PDNP based multitrack detector [13] whose parameters are adapted with the aim of minimizing the bit error rate (BER). Researchers have also explored various machine learning based signal processing and channel coding/decoding technologies for magnetic recording systems. For example, the 2D iterative soft estimate aided neural network equalizer [15], the deep neural network based a posteriori probability detector [17], and the joint convolutional neural network equalizer and deep neural network decoder [18] were studied to combat the 2D ISI and noise of the magnetic recording channel.
However, the HIMR system [2] is a recently proposed magnetic recording architecture that combines conventional interlaced magnetic recording (IMR) [18] with HAMR technology. The unique characteristics of HIMR relative to IMR are as follows: the circular thermal profile of HIMR causes severe transition curvature [14] in the low temperature track, where the transition noise caused by the fluctuating Curie temperatures of the grains outweighs that caused by the anisotropy field fluctuations. Such effects bring serious interference and colored noise to the readback signals, which degrade the BER performance severely. Nevertheless, the corresponding signal processing techniques have not received the attention they deserve, even though there have been several reports on multitrack detection for other high density magnetic recording systems. For example, the symbol based BCJR detectors [9] and the joint two-track equalization technology [19] were proposed to mitigate the ITI in high density magnetic recording systems. Meanwhile, various PDNP detectors based on the linear autoregressive (AR) model [13] and the minimum BER criterion [20] were also studied to whiten the media noise in recording systems.
However, the ITI brought by increased track density, thermal jitters caused by Curie temperature variation and transition curvature brought by the circular thermal profile are severe in the high density HIMR system. The corresponding multitrack detection algorithm with joint ITI and media noise mitigation ability has been rarely studied for HIMR, which hinders the further enhancement of recording density.
Accordingly, the novelties and contributions of this study are as follows. A multitrack detection algorithm capable of jointly mitigating the ITI and media noise in HIMR is proposed for the first time. First, a multi-task neural network (MTNN), fed with the readback signals and decoded log likelihood ratios (LLRs) of multiple tracks, is designed to predict the ITI pattern and residual media noise simultaneously, which has not been reported before. Owing to the strong nonlinear characterization ability of the MTNN and its ability to eavesdrop across tasks, the ITI pattern and residual media noise can be predicted accurately with lower computation complexity than two separate neural networks. Then, on one hand, 2D variable equalizers with distinct coefficients are implemented to mitigate the effects of the various ITI patterns, which provides a clear BER improvement over conventional equalizers trained with a pseudo random bit sequence (PRBS) [8]. This is because some ITI patterns are destructive while others are constructive, so they should be treated differently during equalization. On the other hand, the media noise predicted by the MTNN is embedded into the branch metrics of a modified BCJR detector to further whiten the signal dependent noise, since the MTNN can characterize the complex nonlinear behavior of thermal jitters more accurately than the AR model based PDNP detector. Notably, when the thermal jitters and recording density increase, the proposed multitrack detection algorithm with MTNN aided ITI and media noise mitigation provides increasing signal to noise ratio (SNR) gains over the conventional 2D equalizer with PDNP detector.
The rest of the article is organized as follows. In Section II, the recording system modeling and the signal processing details, including the 2D variable equalizer, MTNN and detector, are described. In Section III, the proposed algorithm is evaluated numerically and discussed under various recording conditions. Finally, the conclusion is drawn in Section IV.

II. RECORDING SYSTEM MODELING AND SIGNAL PROCESSING
A. RECORDING SYSTEM MODELING
As micromagnetic simulations are computationally expensive, a fast and simplified write model [14] is used to simulate the writing process of the HIMR system, where the FePt medium consists of Voronoi grains with an average radius of 6 nm and a grain boundary of 0.5 nm. During the writing process, the laser spot and the magnetic field jointly determine the magnetization flipping of the recording medium. The write model can characterize the thermal profile of the near field transducer, the temperature dependence of the grain's anisotropy field $H_k$, and the effects of varied Curie temperature ($T_c$) and anisotropy field ($H_k$) on the recording radius. More details can be found in reference [14]. Specifically, the medium is assumed to possess a mean Curie temperature $T_c$ of 775 K with 2% fluctuations (i.e., standard deviation $\sigma_{T_c}$ = 2%) and a mean room-temperature anisotropy field $H_k$ of 94.5 kOe with 5% fluctuations (i.e., standard deviation $\sigma_{H_k}$ = 5%). A writing field with a magnitude of 10.5 kOe and an angle of 45° relative to the easy axis of the grain is assumed during the writing process. It is noted that the peak writing temperatures differ between the high and low temperature tracks, while the full-width half-maximum (FWHM) of the thermal profile is set to 78 nm.
Then the PRBS is recorded at the high temperature (HT) and low temperature (LT) tracks in an interlaced manner (Fig. 1(a)). Specifically, the bottom tracks (e.g., HT1 and HT2) are written first with high temperature, which brings smaller transition noise and curvature (Fig. 1(c)) due to the higher thermal gradient and track width. Then the bottom tracks are trimmed by the adjacent top tracks on both sides, where the top tracks (e.g., LT0, LT1, and LT2) are written with low temperature in an interlaced order to avoid the overwriting effect of neighboring tracks. For example, if bits with a track width of 26 nm need to be recorded, the peak writing temperature is set to 1109 K for the HT track, which results in a track width of 52 nm ($TW_{HT}$). Then the LT tracks are written with a peak temperature of 809 K, trimming both sides of the HT tracks, which results in a track width of 26 nm ($TW_{LT}$). Correspondingly, the final track width ($TW_{HIMR}$) of the HIMR system is 26 nm. In the following simulations, the peak writing temperatures of the HT tracks are assumed to be 1038 K, 1109 K, and 1192 K for $TW_{HT}$ of 48 nm, 52 nm, and 56 nm, respectively, while the peak writing temperatures of the LT tracks are assumed to be 798 K, 809 K, and 822 K for $TW_{LT}$ of 24 nm, 26 nm, and 28 nm, respectively.
Then the read head array (RHA) consisting of three read heads is implemented to detect three tracks simultaneously during the readback process, which has the free layer size of 36 nm × 36 nm × 5 nm, shield to shield spacing of 18 nm and magnetic fly height of 4 nm. Here the side heads of RHA are slightly shifted towards the middle LT track to optimize the BER performance of multitrack detection. The readback signals are obtained by convolving the media's magnetization with the read head potential using a finite difference code. Then the oversampled signals go through the low pass filter and are down-sampled with the baud rate for the following signal processing process.

B. EQUALIZATION AND DETECTION SCHEMES
1) NN PREDICTING ITI TYPE
First, a multitrack detection scheme with a neural network (NN) aided 2D variable equalizer (Fig. 2(a)) is proposed to predict the ITI pattern and apply the corresponding equalizer coefficients to mitigate the ITI of HIMR. The four-layer neural network predicting the ITI type consists of an input layer, two hidden layers and an output layer, as shown in Fig. 2(a). The numbers of neurons $N_1$, $N_2$, $N_3$, and $N_4$ in each layer are 18, 25, 10, and 1, respectively. Specifically, the neural network has 18 inputs: the readback signals $\mathbf{r}$ of three tracks (i.e., the $(i-1)$th, $i$th and $(i+1)$th tracks) and the decoded LLRs $\mathbf{d} = [d_{i-1,k}, d_{i,k}, d_{i+1,k}]$ for the $k$th bits of the three tracks. Here $\mathbf{r}$ collects the readback signals from timing $k-L$ to timing $k+L$ at the $(i-1)$th, $i$th and $(i+1)$th tracks, which are obtained simultaneously during the scan of the RHA:

$$\mathbf{r} = [r_{i-1,k-L}, \ldots, r_{i-1,k+L},\; r_{i,k-L}, \ldots, r_{i,k+L},\; r_{i+1,k-L}, \ldots, r_{i+1,k+L}] \tag{1}$$

where $2L+1$ represents the length of the equalizer. The decoded LLRs of the reliable neighboring HT tracks are used to aid the recovery of the recorded information at the less reliable LT track.
Then the output $\mathbf{y}^{(2)}$ of hidden layer 1 is represented as

$$\mathbf{y}^{(2)} = \mathrm{ReLU}\left(\mathbf{W}^{(1)} \mathbf{y}^{(1)} + \mathbf{b}^{(1)}\right)$$

where $\mathbf{W}^{(1)}$ is the synaptic weight matrix and $\mathbf{b}^{(1)}$ is the bias vector of hidden layer 1. $\mathbf{y}^{(1)} = [\mathbf{r}; \mathbf{d}]$ is the input vector with size $N_1 \times 1$, where $\mathbf{r}$ refers to the readback signals of the three tracks and $\mathbf{d}$ represents the decoded LLRs. It is noted that bold upper-case, bold lower-case and normal font symbols in the equations represent matrices, vectors, and scalar variables, respectively. The sizes of the synaptic weight matrix $\mathbf{W}^{(1)}$ and the output vector $\mathbf{y}^{(2)}$ are $N_2 \times N_1$ and $N_2 \times 1$, respectively. $\mathbf{y}^{(2)}$ is also the input vector of hidden layer 2, whose output can be expressed as

$$\mathbf{y}^{(3)} = \mathrm{ReLU}\left(\mathbf{W}^{(2)} \mathbf{y}^{(2)} + \mathbf{b}^{(2)}\right)$$

where $\mathbf{W}^{(2)}$ and $\mathbf{b}^{(2)}$ refer to the synaptic weight matrix and bias vector of hidden layer 2. The sizes of $\mathbf{W}^{(2)}$, $\mathbf{b}^{(2)}$ and $\mathbf{y}^{(3)}$ are $N_3 \times N_2$, $N_3 \times 1$, and $N_3 \times 1$, respectively. The activation functions of both hidden layers are chosen as ReLU functions. Then the vector of the output layer can be expressed as

$$\mathbf{y}^{(4)} = \mathrm{SoftMax}\left(\mathbf{W}^{(3)} \mathbf{y}^{(3)} + \mathbf{b}^{(3)}\right)$$

where $\mathbf{W}^{(3)}$ and $\mathbf{b}^{(3)}$ are the synaptic weight matrix and bias vector of the output layer. Here $\mathbf{y}^{(4)} = [y^{(4)}_1, y^{(4)}_2, y^{(4)}_3, y^{(4)}_4]$, and $y^{(4)}_i$ denotes the prediction probability for the $i$th type of ITI pattern. The ITI type with the highest probability is chosen as the predicted ITI pattern. Since the ITI of the current track is mainly contributed by the recorded bits at the two adjacent tracks, there are four types of possible ITI patterns; the details can be found in Section B.3). It is noted that the activation function of the output layer is chosen as the SoftMax function, which is widely applied in multi-class classification.
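The layer-by-layer description above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the weights are random placeholders rather than trained values, the helper names (`affine`, `random_layer`, `predict_iti_type`) are hypothetical, and the output layer is assumed to have four units (one per ITI type) so that the SoftMax yields one probability per class.

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(v):
    m = max(v)                                  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]

def affine(W, b, x):
    """Return W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def random_layer(n_out, n_in, rng):
    """Placeholder (untrained) weights and zero biases."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

def predict_iti_type(r, d, layers):
    """r: 15 readback samples (3 tracks x (2L+1), L = 2); d: 3 decoded LLRs."""
    (W1, b1), (W2, b2), (W3, b3) = layers
    y1 = list(r) + list(d)                      # y1 = [r; d], 18 inputs
    y2 = relu(affine(W1, b1, y1))               # hidden layer 1, 25 neurons
    y3 = relu(affine(W2, b2, y2))               # hidden layer 2, 10 neurons
    y4 = softmax(affine(W3, b3, y3))            # assumed: 4 class probabilities
    return y4.index(max(y4)), y4

rng = random.Random(0)
layers = [random_layer(25, 18, rng), random_layer(10, 25, rng), random_layer(4, 10, rng)]
r = [rng.gauss(0.0, 1.0) for _ in range(15)]
d = [rng.gauss(0.0, 4.0) for _ in range(3)]
iti_type, probs = predict_iti_type(r, d, layers)
```

With trained weights, `iti_type` would select which set of variable equalizer coefficients to apply.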

2) MTNN PREDICTING BOTH ITI TYPE AND MEDIA NOISE
Even though the ITI can be handled effectively with the variable equalizer, residual signal dependent media noise (e.g., thermal jitter and transition noise) still needs to be whitened to further improve the performance of the maximum likelihood detector. The statistics of thermal jitters in HIMR are quite complex and cannot be modeled adequately by the linear prediction of the previous PDNP detector [11]. By comparison, a neural network can predict the thermal jitters more accurately due to its strong nonlinear characterizing ability. Accordingly, a multitask neural network is designed to predict the ITI type and media noise jointly, which decreases the computation complexity and enhances the learning performance compared to two separate neural networks. On one hand, the MTNN, fed with the readback signals and decoded LLRs of three tracks, predicts the ITI type for the recorded bit at time $k$; the 2D variable equalizers with different coefficients and targets are then applied for the distinct ITI types to equalize the multitrack signals. On the other hand, the same MTNN predicts the residual media noise of the variable equalizer, which is embedded into the following modified BCJR detector to whiten the noise.
Here the MTNN is a four-layer neural network with a joint loss function that simultaneously realizes the classification task (i.e., ITI type prediction) and the regression task (i.e., media noise prediction). The MTNN consists of one input layer, one shared hidden layer, separated hidden layers and separated output layers, as shown in Fig. 2(b). The numbers of neurons $N_1$, $N_2$, $N_{3\_1}$, $N_{3\_2}$, $N_{4\_1}$, $N_{4\_2}$ in the different layers are 18, 25, 10, 15, 1, and 3, respectively. It is found that the ITI type and media noise prediction accuracy of the MTNN improves only slightly when the width and depth of the neural network are further increased; hence moderate numbers of neurons and a moderate depth are chosen as a tradeoff between prediction performance and computation complexity.
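The shared-trunk, two-head structure described above can be illustrated with the following minimal Python sketch. The weights are untrained placeholders, the function and dictionary key names are hypothetical, and, as an assumption made only for illustration, the classification head is given four SoftMax outputs (one per ITI type) while the regression head returns three noise values (one per track).

```python
import math
import random

def dense(x, W, b, act=None):
    """One fully connected layer: act(W x + b)."""
    z = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    if act == "relu":
        return [max(0.0, v) for v in z]
    if act == "softmax":
        m = max(z)
        e = [math.exp(v - m) for v in z]
        s = sum(e)
        return [v / s for v in e]
    return z                                    # linear activation

def init(n_out, n_in, rng):
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

def mtnn_forward(x, p):
    shared = dense(x, *p["shared"], act="relu")      # shared hidden layer (25)
    h_cls = dense(shared, *p["hid_cls"], act="relu")  # task-1 hidden layer (10)
    h_reg = dense(shared, *p["hid_reg"], act="relu")  # task-2 hidden layer (15)
    iti_probs = dense(h_cls, *p["out_cls"], act="softmax")  # assumed 4 classes
    noise = dense(h_reg, *p["out_reg"])              # 3 predicted noise values
    return iti_probs, noise

rng = random.Random(1)
params = {
    "shared": init(25, 18, rng),
    "hid_cls": init(10, 25, rng),
    "hid_reg": init(15, 25, rng),
    "out_cls": init(4, 10, rng),
    "out_reg": init(3, 15, rng),
}
x = [rng.gauss(0.0, 1.0) for _ in range(18)]     # 15 readback samples + 3 LLRs
iti_probs, noise = mtnn_forward(x, params)
```

Because both heads read the same `shared` activations, the first layer is computed once per bit, which is where the complexity saving over two separate networks comes from.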
Additionally, the MTNN is found to slightly improve the ITI type prediction accuracy and decrease the mean squared error (MSE) of the predicted noise compared to separate neural networks (Table 1), since the shared hidden layer of the MTNN can share information and learn the common features between the different tasks, while the computation complexity of the MTNN is clearly lower than that of the separate neural networks. For example, at the recording density of 3.10 Tb/in² with 2% $\sigma_{T_c}$ media noise and 5 dB SNR, the ITI type prediction accuracy of the MTNN is 0.14% higher than that of ITI Net, and the MSE of the predicted media noise is decreased from 0.182 to 0.172 relative to the separate network.
Then the output of the separated output layer for task 1 is

$$\mathbf{y}^{(4)} = \mathrm{SoftMax}\left(\mathbf{W}^{(3)} \mathbf{y}^{(3)} + \mathbf{b}^{(3)}\right) \tag{6}$$

where $\mathbf{W}^{(3)}$ (size $N_{4\_1} \times N_{3\_1}$) and $\mathbf{b}^{(3)}$ (size $N_{4\_1} \times 1$) are the synaptic weight matrix and bias vector of the output layer for task 1, respectively, and $\mathbf{y}^{(3)}$ is the output of the separated hidden layer of task 1. Even though the outputs $\mathbf{y}^{(4)}$ for task 1 of the MTNN and of the NN both denote the probabilities of the predicted ITI pattern, the accuracy of the former is improved by sharing information and learning the common features between the different tasks. The activation function for the output layer of task 1 (i.e., the classification task for ITI type prediction) is the SoftMax function (6), whereas the activation function of the output layer for task 2 (i.e., the regression task for noise prediction) is the linear activation function. Correspondingly, the predicted noise $\mathbf{n}^P$ for task 2 is obtained by applying the synaptic weight matrix $\mathbf{W}^{(5)}$ (size $N_{4\_2} \times N_{3\_2}$) and bias vector $\mathbf{b}^{(5)}$ (size $N_{4\_2} \times 1$) of the output layer for task 2 to the output of the separated hidden layer of task 2. Accordingly, the loss function of the MTNN combines the weighted cross-entropy loss of task 1 (i.e., classification of ITI types) and the mean squared error (MSE) loss of task 2 (i.e., noise prediction), which can be expressed as

$$L = \delta_1 L_{\mathrm{CE}}\left(\boldsymbol{\theta}, \mathbf{y}^{(4)}\right) + (1 - \delta_1)\, L_{\mathrm{MSE}}\left(n^P_i(k), n_i(k)\right)$$

where $L_{\mathrm{CE}}$ and $L_{\mathrm{MSE}}$ denote the cross-entropy and MSE terms, $\boldsymbol{\theta} = [\theta_1, \theta_2, \theta_3, \theta_4]$ denotes the ground truth class label for the ITI patterns, and $\delta_1$ and $1-\delta_1$ represent the weight coefficients of the cross-entropy and MSE functions, respectively. Here the weight coefficient $\delta_1 = 0.4$ is optimized to obtain the minimum joint loss (Fig. 3) of the multitask learning. It is difficult to conclude whether the ITI type prediction or the medium noise prediction is more important in the general case. However, when the track density of HIMR increases, the SNR gain provided by the variable equalizer using the predicted ITI type becomes larger than that provided by the medium noise prediction, since the ITI effect brought by the increased track density overwhelms the medium noise along the down track direction.
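As a rough illustration of the joint loss, the sketch below combines a cross-entropy term and an MSE term with weights $\delta_1$ and $1-\delta_1$. The function name `joint_loss`, the one-hot form of the label, the plain (class-unweighted) cross-entropy, the per-sample averaging of the MSE over tracks, and the numeric inputs are all assumptions made for this example; only the weighting by $\delta_1 = 0.4$ is taken from the text.

```python
import math

def joint_loss(theta, probs, n_pred, n_target, delta1=0.4, eps=1e-12):
    """delta1 * cross-entropy(theta, probs) + (1 - delta1) * MSE(n_pred, n_target)."""
    # Cross-entropy between the one-hot ITI label and the SoftMax output.
    ce = -sum(t * math.log(p + eps) for t, p in zip(theta, probs))
    # Mean squared error between predicted and target media noise.
    mse = sum((a - b) ** 2 for a, b in zip(n_pred, n_target)) / len(n_target)
    return delta1 * ce + (1.0 - delta1) * mse

# Hypothetical single training sample: true ITI type is the second class.
theta = [0, 1, 0, 0]
probs = [0.1, 0.7, 0.1, 0.1]
n_pred = [0.05, -0.02, 0.10]
n_target = [0.00, 0.00, 0.10]
loss = joint_loss(theta, probs, n_pred, n_target)
```

Sweeping `delta1` over (0, 1) on a validation set and picking the minimum is one way to read the optimization of $\delta_1$ mentioned above.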
Here $n^P_i(k)$ and $n_i(k)$ are the predicted and target media noises for the $k$th bit of the $i$th track, respectively. $n_i(k)$ actually denotes the residual equalization noise, which is obtained by subtracting the ideal equalizer output from the actual equalizer output. Since the medium noise is dominant over the electronic noise at high recording density, the residual equalization noise can be approximated as medium noise and is used as the target noise during the training of the MTNN, shown as

$$n_i(k) = y_i(k) - \sum_{j=0}^{I} g_i(j)\, a_i(k-j) \tag{9}$$

where $a_i(k)$ and $y_i(k)$ are the recorded bit and equalized signal for the $k$th bit at the $i$th track. Here $\mathbf{g}_i = [g_i(0), g_i(1), \ldots, g_i(I)]$ is the 1D GPR target of the $i$th track obtained with the monic constraint, and $I = 2$ is the ISI length. It is noted that the 2D variable equalizer coefficients and [...]
[...] [21] to obtain the equalized signals of three tracks and soft bit estimations; the equalized signal of the current track and the estimations of the side tracks were then incorporated into the BCJR detector for data detection. The previous neural-network-based equalizer was designed to replace the conventional linear equalizer and mitigate the nonlinear distortion caused by the circular curvature of the thermal profile. In contrast, this work focuses on the joint ITI and media noise mitigation of the HIMR system: an additional multitask neural network is designed to predict the ITI type and media noise of multiple tracks simultaneously, the 2D variable equalizers corresponding to the different ITI types are implemented to mitigate the ITI, and the predicted medium noise is embedded into the BCJR detector to further whiten the data dependent noise.
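The target noise used for training, i.e., the equalized signal minus the ideal equalizer output formed from the recorded bits and the 1D GPR target, can be sketched as follows. The target taps, bit sequence and perturbation are made-up values, the bits are assumed to take values in {-1, +1}, and bits before the start of the sequence are treated as zero purely for simplicity.

```python
def ideal_output(a, g):
    """Ideal equalizer output: sum_j g(j) * a(k - j), zero-padded at the start."""
    return [
        sum(g[j] * a[k - j] for j in range(len(g)) if k - j >= 0)
        for k in range(len(a))
    ]

def target_noise(y, a, g):
    """Residual equalization noise n(k) = y(k) - ideal(k)."""
    return [yk - ik for yk, ik in zip(y, ideal_output(a, g))]

g = [1.0, 0.5, 0.2]                      # hypothetical monic GPR target, I = 2
a = [1, -1, -1, 1, 1]                    # hypothetical recorded bits (+/-1)
e = [0.03, -0.05, 0.00, 0.02, -0.01]     # hypothetical residual noise
y = [ik + ek for ik, ek in zip(ideal_output(a, g), e)]  # synthetic equalized signal
n = target_noise(y, a, g)
```

In this synthetic case `n` recovers the injected perturbation `e`, which is the quantity the regression head of the MTNN would be trained to predict.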
Here the numbers of neural network parameters and computational operations are shown in Table 2. It's noted that the operations include both multiplications and additions. Although the complexity of 2D linear equalizer is lower, its BER performance is still far worse than that of MTNN. The study indicates that the further increased terms of 2D equalizer coefficients could not improve the BER performance noticeably since the effects of 2D ISI exist in a limited range.

3) EQUALIZER AND DETECTOR DESIGN
The conventional 2D equalizer coefficients and 1D GPR target are trained with the PRBS of three tracks [8] and are the same for distinct ITI patterns (Table 3). This equalizer is denoted as the 2D fixed equalizer (FE) in the following. A PRBS is a binary sequence generated with a deterministic algorithm (e.g., linear-feedback shift registers [22]) that is difficult to predict and exhibits statistical behavior similar to a truly random sequence. Here the ITI is forced to zero and the residual ISI along the down track direction is handled by the 1D PDNP detector. This algorithm is denoted as the FE+PDNP algorithm, as shown in Fig. 5(a).
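For readers unfamiliar with PRBS generation, the sketch below shows a 7-bit linear-feedback shift register using the common polynomial x^7 + x^6 + 1 (PRBS7). The choice of this particular polynomial, the seed, and the function name `prbs7` are assumptions for illustration; the text does not state which generator was used for the recorded sequences.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of a PRBS7 sequence (x^7 + x^6 + 1) from a nonzero seed."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be nonzero, otherwise the LFSR stays at zero")
    out = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1   # XOR of taps 7 and 6
        state = ((state << 1) | new_bit) & 0x7F       # shift in the feedback bit
        out.append(new_bit)
    return out

bits = prbs7(254)
symbols = [2 * b - 1 for b in bits]   # map {0, 1} to {-1, +1} magnetization
```

The sequence repeats every 127 bits and is nearly balanced between ones and zeros, which is what makes it a convenient stand-in for random user data when training a fixed equalizer.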
However, taking the central target bit as an example, some ITI patterns are constructive to the recovery of the target bit (e.g., the magnetization polarities of the adjacent tracks' bits are the same as that of the target bit at the central track), since such ITI strengthens the readback signal of the central bit, while other ITI patterns are destructive (e.g., the magnetization polarities of the adjacent tracks' bits are opposite to that of the central target bit), since such ITI severely weakens the readback signal of the central bit. Accordingly, different equalizer coefficients should be adopted to reconstruct the signals for distinct ITI patterns. Here the respective equalizer coefficients are obtained by training on the different ITI patterns, and the resulting equalizers are named variable equalizers (VE) in this study.
Taking the target bit of the central track as an example, the ITI effect is mainly contributed by the recorded bits of the adjacent tracks. The study indicates that the ITI caused by the corner bits (i.e., $a_{i-1}(k-1)$, $a_{i-1}(k+1)$, $a_{i+1}(k-1)$, $a_{i+1}(k+1)$) is significantly smaller than that caused by the upper and lower bits (as shown in Fig. 4(a)); hence the ITI pattern is simplified as $\{a_{i-1}(k), a_i(k), a_{i+1}(k)\}$, which has 8 possible ITI types. Here $a_i(k)$ denotes the bit recorded at the $i$th track and $k$th timing. However, considering the equivalent polarity relationships between the neighboring bits $a_{i-1}(k)$, $a_{i+1}(k)$ and the target bit $a_i(k)$, the possible ITI patterns can be divided into 4 categories (Fig. 4). As an example of the training process of the VE, the PRBS is recorded at the middle track (e.g., LT1 in Fig. 5), then the same sequence and the sequence with opposite polarity are recorded at the neighboring tracks (e.g., HT1 and HT2 in Fig. 5), respectively. It is found that the equalizer coefficients corresponding to the upper bit with the same polarity as the central target bit and the equalizer coefficients corresponding to the lower bit with the opposite polarity are positive and negative (i.e., the 2nd column of Table 3), respectively. Such equalizer coefficients enlarge the magnitude of the readback signal relative to the noise, which improves the BER performance. It is noted that the readback signals of three tracks are first equalized by the 2D FE at the 1st iteration, since the ITI type is unknown. Then the equalized signals are further processed with the PDNP detector and low-density parity check (LDPC) decoder. The LLRs output by the LDPC decoder, together with the readback signals of three tracks (i.e., Fig. 5(b)), are fed into the NN to classify the ITI pattern in the 2nd iteration, and the corresponding 2D variable equalizer, PDNP detector and LDPC decoder are implemented. This algorithm is denoted as the NN+VE+PDNP algorithm.
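The reduction from 8 bit triples to 4 ITI categories can be made concrete with a short sketch. The category indices used here (0 for both neighbors matching the target bit, 3 for both opposing it) are an arbitrary labeling chosen for illustration and are not claimed to match the numbering of Fig. 4 or Table 3; bits are assumed to be in {-1, +1}.

```python
from itertools import product

def iti_category(upper, center, lower):
    """Classify {a_{i-1}(k), a_i(k), a_{i+1}(k)} by polarity relative to the target bit.

    0: both neighbors same as target (constructive)
    1: upper same, lower opposite
    2: upper opposite, lower same
    3: both neighbors opposite to target (destructive)
    """
    return 2 * int(upper != center) + int(lower != center)

# Group all 8 possible bit triples into the 4 polarity categories.
groups = {}
for pattern in product((-1, 1), repeat=3):
    groups.setdefault(iti_category(*pattern), []).append(pattern)
```

Each category contains a triple and its sign-flipped counterpart, which is why one set of equalizer coefficients per category suffices.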
In the simulations, it is assumed that no random track mis-registration (TMR) or phase offset occurs during the readback process, to facilitate the signal processing. However, the simulation indicates that the proposed detection algorithm can still provide clear SNR gains over the conventional algorithm when TMR (i.e., TMR/$TW_{HIMR}$ ∈ [−11.5%, 11.5%]) exists, because training the neural network and variable equalizer with a data set containing various TMR effects ensures the generalization ability of the proposed algorithm. Additionally, the TMR can be predicted during the detection process, since researchers have studied various prediction methods, such as utilizing the relationship between the energy ratio of the readback signal and the head offset [23] or designing the 2D asymmetric target that best fits the estimated TMR level [24]. Such TMR prediction methods can be utilized to correct the read head position before implementing the proposed multitrack detection algorithm. Meanwhile, if there are timing errors caused by random phase and frequency drift of multiple tracks, the 2D timing recovery algorithm based on a 2D data-aided phase locked loop [25] can be implemented to adjust the sampling instant and obtain the correctly sampled signal of each track. The correctly sampled signals are then input into the neural network to predict the ITI type, which avoids prediction errors caused by timing errors of the adjacent tracks.
Here the 2D variable equalizer can decrease the residual equalization noise (Fig. 6) significantly compared to the conventional 2D fixed equalizer, since the former can capture the distinct effects of the various ITI types more effectively with multiple sets of equalizer coefficients. However, the thermal jitters of the HIMR system still result in severe transition noise during the writing process, which degrades the BER performance noticeably because the noise is colored. Hence the media noise caused by thermal jitters is also predicted by the MTNN, which can capture the characteristics of the noise more accurately than the conventional PDNP detector using linear predictions.
To utilize both the ITI type and the media noise predicted by the MTNN, the branch metric of the 1D BCJR detector is modified by embedding both the signal $y_i(k)$ equalized by the 2D variable equalizer and the media noise $n^P_i(k)$ predicted by the MTNN. Taking the $i$th track as an example, the modified branch metric $\gamma_i(S_{k-1}, S_k)$ from time $k-1$ to time $k$ is evaluated on the residual noise of the 2D variable equalizer after the predicted media noise $n^P_i(k)$ has been subtracted from it. Since the conventional BCJR detector is suboptimal for a channel with colored noise, this subtraction whitens the media noise and improves the detection performance of the BCJR detector. During the 1st iteration, the readback signals of three tracks are equalized by the 2D FE and further processed with the BCJR detector and LDPC decoder to obtain the decoded LLRs. Then the readback signals and LLRs are fed into the MTNN to predict the ITI patterns and media noise during the next iterations, which are then sent to the following modified BCJR detector and LDPC decoder. Here the training process of the neural network with multiple epochs is carried out only after the LDPC decoder finishes the iterative decoding, and the LDPC decoder does not perform decoding during the training process of the neural network. Hence keeping the LDPC decoding process and the neural network training process separate guarantees the convergence of both processes. It is also noted that the detector is named the modified BCJR detector in the MTNN+VE+MB algorithm because the predicted media noise is subtracted from the residual equalizer noise in the branch metrics, a term that is ignored by the conventional BCJR detector.
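One plausible way to realize such a modified branch metric is shown below in the log domain. This is a sketch under explicit assumptions, not the exact expression used in this work: the whitened residual is modeled as Gaussian with variance `sigma2`, the prior enters as an additive log term, the function name `log_branch_metric` and all numeric values are hypothetical, and `bits` holds the bits $a_i(k), a_i(k-1), \ldots, a_i(k-I)$ implied by the trellis transition.

```python
import math

def log_branch_metric(y_k, bits, g, n_pred_k, sigma2, log_prior=0.0):
    """Log-domain branch metric with the predicted media noise subtracted.

    Assumed Gaussian form: log p = log_prior - 0.5*log(2*pi*sigma2) - e^2/(2*sigma2),
    where e = y(k) - sum_j g(j) a(k-j) - n_pred(k).
    """
    ideal = sum(gj * aj for gj, aj in zip(g, bits))
    e = y_k - ideal - n_pred_k            # residual after noise prediction
    return log_prior - 0.5 * math.log(2.0 * math.pi * sigma2) - e * e / (2.0 * sigma2)

g = [1.0, 0.5, 0.2]                       # hypothetical GPR target
bits = (1, -1, 1)                         # bits implied by one trellis branch
y_k = 1.0                                 # equalized sample: ideal 0.7 + noise 0.3
modified = log_branch_metric(y_k, bits, g, n_pred_k=0.3, sigma2=0.01)
plain = log_branch_metric(y_k, bits, g, n_pred_k=0.0, sigma2=0.01)
```

Setting `n_pred_k=0.0` recovers a conventional metric, so the difference between `modified` and `plain` shows how an accurate noise prediction raises the likelihood of the correct branch.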

III. NUMERICAL SIMULATION
The proposed algorithm is evaluated numerically and discussed under various recording conditions. During the simulation, an LDPC code constructed with the progressive-edge growth algorithm is used to encode the recorded bits of each track independently, where the codeword length is kept at 4096 and the code rate is varied for different recording conditions. The sum-product algorithm is used for the decoding process. Here the RHA is utilized to detect three tracks (e.g., HT1, LT1 and HT2 in Fig. 5) of HIMR, where the positions of the read heads are optimized to give the lowest SNR at the target raw BER of $10^{-2}$. Specifically, the optimum position of the central read head is at the center of the middle track (i.e., LT1) to pick up the strongest signal of the target track, while the optimum positions of the side read heads are shifted (by 4 nm) from the centers of the side tracks (i.e., HT1 and HT2) toward the central track. On one hand, the side read heads should be shifted toward the middle track to avoid the interference from the outermost unknown tracks (i.e., LT0 and LT2 in Fig. 1); on the other hand, such shifts cannot be too large, since a large deviation from the centers of the side tracks (i.e., HT1 and HT2) noticeably degrades the picked-up signal strength of the side tracks. Additionally, since both tracks adjacent to a low temperature track are high temperature tracks in HIMR, the read head array is assumed to move across two tracks in the next revolution after it finishes detecting three tracks in one revolution. Accordingly, the neural network, equalizer and detector only need to be trained once and no extra training is needed in subsequent revolutions, since the track layout covered by the RHA is the same in every revolution. Both the 2D FE and VE have a size of 3 × 5, and the size of the 1D GPR target is 1 × 3.
The number of turbo iterations between the BCJR detector and the LDPC decoder is 3, and the number of decoding iterations within the LDPC decoder is 10. During the simulation, $\mathrm{SNR} = 10\log_{10}(V_p^2/\sigma^2)$ [dB], where $V_p$ refers to the peak value of the readback signal from an isolated bit and $\sigma$ is the standard deviation of the additive white Gaussian noise. It is noted that the SNR gains are evaluated at the decoded BER of $10^{-6}$. The BER of the HT tracks (Figs. 7-10) is obtained by averaging the BERs of the upper and lower HT tracks (i.e., HT1 and HT2 of Fig. 5) during the scan of the read head array, while the BER of the LT track refers to that of the middle LT track.
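The SNR definition above translates directly into the noise level to inject in a simulation. The following sketch, with hypothetical helper names and a normalized peak amplitude chosen only as an example, converts between SNR in dB and the AWGN standard deviation.

```python
import math

def snr_db(v_peak, sigma):
    """SNR = 10 * log10(Vp^2 / sigma^2) in dB."""
    return 10.0 * math.log10(v_peak ** 2 / sigma ** 2)

def noise_sigma(v_peak, snr_in_db):
    """AWGN standard deviation that yields the requested SNR for a given peak."""
    return v_peak / (10.0 ** (snr_in_db / 20.0))

# Example with a normalized isolated-bit peak of 1.0 (assumed value).
sigma_10db = noise_sigma(1.0, 10.0)
sigma_20db = noise_sigma(1.0, 20.0)
```

Under this definition, every additional 20 dB of SNR corresponds to a tenfold reduction of the noise standard deviation relative to the isolated-bit peak.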
First, the mean squared errors of the equalization noise for the 2D FE, VE (G), NN+VE and MTNN+VE+NP algorithms are compared, as shown in Fig. 6. VE (G) stands for the variable equalizer selected according to the ground truth of the ITI patterns, while NN+VE represents the variable equalizer selected according to the ITI pattern predicted by the neural network. Here the MSE is defined as the mean squared error between the equalized signal $y_i(k)$ and the ideal equalizer output $y_{ideal}$, i.e., $\mathrm{MSE} = E[(y_i(k) - y_{ideal})^2]$. It is found that NN+VE can effectively decrease the MSE by 37% (i.e., from 0.12 to 0.076) compared to the FE at the SNR of 10 dB. Meanwhile, the MSE gap between the VE (G) and NN+VE algorithms becomes negligible with increasing SNR, since the prediction accuracy of the ITI pattern improves with the enhanced decoding performance at higher SNR. Notably, the MTNN+VE+NP algorithm can further decrease the MSE by 71% (i.e., from 0.12 to 0.034) compared to the conventional 2D FE, since the former can mitigate both ITI and media noise effectively.
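The quoted MSE reductions follow from simple arithmetic on the values read from Fig. 6, as the sketch below shows. The helper names are hypothetical, and the short sample vectors are made up only to exercise the MSE function.

```python
def mse(equalized, ideal):
    """Mean squared error between equalized samples and ideal equalizer outputs."""
    return sum((y - t) ** 2 for y, t in zip(equalized, ideal)) / len(ideal)

def relative_reduction(before, after):
    """Fractional decrease of a quantity, e.g. 0.37 for a 37% reduction."""
    return (before - after) / before

nn_ve_gain = relative_reduction(0.12, 0.076)   # FE -> NN+VE at 10 dB SNR
mtnn_gain = relative_reduction(0.12, 0.034)    # FE -> MTNN+VE+NP
example_mse = mse([1.1, -0.9, 0.2], [1.0, -1.0, 0.0])
```

The two ratios evaluate to roughly 0.37 and 0.72, in line with the reductions of 37% and about 71% stated above.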

A. BER PERFORMANCE OF HIMR WITH DIFFERENT MEDIA NOISES
The dominant media noise of the HIMR system is attributed to the Curie temperature variation (i.e., $\sigma_{T_c}$) of the medium. Hence, with the anisotropy constant variation (i.e., $\sigma_{H_k}$) assumed to be 5%, $\sigma_{T_c}$ is set to 2%, 3% and 4% during the recording process to simulate the performance of the multitrack detection algorithm with increasing media noise. The track width and bit length of the HIMR system are kept at 26 nm and 8 nm (i.e., a linear density of 3101 kilobits per inch (KBPI)), respectively, which corresponds to a channel bit density of 3.10 Tb/in$^2$. For the 2% $\sigma_{T_c}$ case, both the HT and LT tracks are encoded with an LDPC code of rate R = 0.93. However, this code rate is not strong enough to correct the additional errors when $\sigma_{T_c}$ increases further, hence LDPC codes with rates of 0.84 and 0.75 are used at $\sigma_{T_c}$ of 3% and 4%, respectively.
For the LT track, which has a smaller writing thermal gradient and more severe thermal jitter than the HT track, the NN+VE+PDNP algorithm provides SNR gains of 0.8 dB, 0.9 dB and 1.0 dB over the FE+PDNP algorithm when $\sigma_{T_c}$ is 2%, 3% and 4%, respectively, as shown in Fig. 7(a). This is because the media noise along the cross-track (CT) direction is data dependent, which results in more pronounced pattern-dependent interference as the thermal jitter increases. The 2D VE mitigates such pattern-dependent media noise along the CT direction more effectively than the conventional 2D FE, since it uses multiple sets of equalizer coefficients for distinct patterns.
Furthermore, the MTNN+VE+MB algorithm provides SNR gains of 1.4 dB, 2.0 dB and 2.6 dB over the FE+PDNP algorithm when $\sigma_{T_c}$ is 2%, 3% and 4%, respectively. These improvements are attributed to two factors: on one hand, the ITI pattern and the media noise along the CT direction are handled more effectively by the 2D VE than by the 2D FE; on the other hand, the media noise predicted by the MTNN, with its strong nonlinear modeling ability, captures the thermal jitter more accurately than the conventional PDNP detector, which uses linear prediction. The MTNN+VE+MB algorithm also provides SNR gains of 1.1 dB, 1.6 dB and 2.0 dB over the MTNN+VE+B algorithm when $\sigma_{T_c}$ is 2%, 3% and 4%, respectively. This is because the former captures the effect of media noise accurately with the MTNN and whitens the residual equalization noise, which improves the BER performance relative to the BCJR detector. The proposed multitrack detection algorithm thus remains effective at mitigating media noise as the media noise deteriorates with increasing thermal jitter.
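The difference between the MB and B detectors can be pictured with a single branch metric. The Python sketch below shows one plausible Gaussian form in which the predicted residual media noise is subtracted from the observation before the squared error is taken. It assumes one predicted noise value per sample, bits in {-1, +1}, and a hypothetical 1 × 3 target; the exact metric used by the modified BCJR detector is the one defined in the detector description of this work, not this sketch.

```python
from typing import Sequence

def ideal_output(bits: Sequence[int], target: Sequence[float]) -> float:
    """Noiseless GPR output for the bits attached to one trellis branch."""
    return sum(g * b for g, b in zip(target, bits))

def log_branch_metric(y: float, bits: Sequence[int], target: Sequence[float],
                      sigma: float, n_pred: float = 0.0) -> float:
    """Gaussian log-domain branch metric.

    With n_pred = 0 this is the usual BCJR metric; a nonzero n_pred removes the
    predicted residual media noise from the observation first (assumed form).
    """
    residual = y - ideal_output(bits, target) - n_pred
    return -(residual ** 2) / (2.0 * sigma ** 2)

# Hypothetical numbers for illustration only.
target = [1.0, 2.0, 1.0]                 # assumed target, not the paper's coefficients
bits = [1, -1, 1]                        # bits on the correct branch
y = ideal_output(bits, target) + 0.3     # observation carrying 0.3 of media noise
plain = log_branch_metric(y, bits, target, sigma=0.5)
modified = log_branch_metric(y, bits, target, sigma=0.5, n_pred=0.3)
```

In this toy case a perfect noise prediction restores the metric of the correct branch to its maximum, which is the intuition behind the reported gain of MB over B.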
Additionally, for the HT track, the MTNN+VE+MB algorithm still provides SNR gains of 1.0 dB, 1.1 dB and 1.2 dB (Fig. 7(b)) over FE+PDNP when $\sigma_{T_c}$ is 2%, 3% and 4%, respectively. The SNR gains of the MTNN+VE+MB algorithm over FE+PDNP are smaller for the HT track than for the LT track. This is because the HT track, with its high thermal gradient, is less sensitive to Curie temperature variations than the LT track with its lower gradient.
Meanwhile, the decoded BER performance of the LT track with the MTNN+VE+MB algorithm is better than that of the HT track at 2% $\sigma_{T_c}$, but becomes worse than that of the HT track when $\sigma_{T_c}$ increases to 3% and 4%. The reasons are as follows. When the RHA reads back three tracks (HT$_1$, LT$_1$ and HT$_2$ in Fig. 5), only the readback signals from the current track and the adjacent track on one side are available for equalizing an HT track, whereas the readback signals from the LT track and both adjacent tracks are available for equalizing the central LT track. This degrades the equalization performance of the HT track more severely than that of the central LT track. Accordingly, when $\sigma_{T_c}$ is only 2%, the detrimental effect of ITI on the equalization of the HT track outweighs that of media noise on the LT track, so the BER performance of the HT track deteriorates more than that of the LT track. In contrast, as the thermal jitter increases further, the increased media noise becomes more detrimental to the LT track with its lower thermal gradient, and the BER performance of the LT track deteriorates more than that of the HT track.
Meanwhile, as the media noise in HIMR increases, the SNR gap between the HT and LT tracks is smaller with the MTNN+VE+MB algorithm than with the FE+PDNP algorithm. For example, at 4% $\sigma_{T_c}$, the SNR gaps between the HT and LT tracks at a decoded BER of $10^{-6}$ are 2.9 dB and 1.4 dB for the FE+PDNP and MTNN+VE+MB algorithms, respectively. This is because, in the MTNN+VE+MB algorithm, the decoded LLRs of both adjacent HT tracks, which carry more reliable information, are fed into the MTNN to aid the detection of the less reliable LT track.

B. BER PERFORMANCE OF HIMR WITH DIFFERENT TRACK DENSITIES
In Section A, the bits are assumed to be recorded with a track width of 26 nm and a bit length of 8 nm, which corresponds to a channel bit density of 3.10 Tb/in$^2$. To evaluate the performance of the detection algorithms at different track densities, the track width is varied to 28 nm and 24 nm while the bit length is kept at 8 nm (i.e., a linear density of 3101 KBPI). This corresponds to channel bit densities of 2.88 and 3.36 Tb/in$^2$, respectively. Media noise consisting of 2% $\sigma_{T_c}$ and 5% $\sigma_{H_k}$ is assumed in the simulation, and the LDPC code rate is 0.93.
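The channel bit densities quoted above can be reproduced from the bit cell geometry. The following Python sketch assumes that the density is simply one bit per track-width by bit-length cell and uses 1 inch = 25.4 mm; the function name is hypothetical.

```python
NM_PER_INCH = 25.4e6  # 1 inch = 25.4 mm = 25.4e6 nm

def areal_density_tb_per_in2(track_width_nm: float, bit_length_nm: float) -> float:
    """Channel bit density in Tb/in^2 for a rectangular bit cell."""
    bits_per_in2 = NM_PER_INCH ** 2 / (track_width_nm * bit_length_nm)
    return bits_per_in2 / 1e12

# Track widths of 28, 26 and 24 nm at a fixed bit length of 8 nm.
densities = {tw: round(areal_density_tb_per_in2(tw, 8.0), 2) for tw in (28, 26, 24)}
# densities == {28: 2.88, 26: 3.1, 24: 3.36}
```

Narrowing the track from 28 nm to 24 nm therefore raises the density by about 17% at a fixed bit length.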
For the LT track, the NN+VE+PDNP algorithm provides SNR gains of 0.5 dB, 0.8 dB and 1.0 dB over the FE+PDNP algorithm at track widths of 28 nm, 26 nm and 24 nm, respectively. The reason is that the neural network aided 2D variable equalizer predicts the ITI pattern accurately and mitigates the increasing ITI more effectively with different equalizer coefficients. Furthermore, the proposed MTNN+VE+MB algorithm provides SNR gains of 1.0 dB, 1.4 dB and 1.7 dB over the FE+PDNP algorithm at track widths of 28 nm, 26 nm and 24 nm, respectively. Since both the ITI and the media noise along the CT direction increase as the track width decreases, the MTNN+VE+MB algorithm mitigates both detrimental effects through the joint learning of the neural network. For the HT tracks, the MTNN+VE+MB algorithm also provides SNR gains of 0.9 dB, 1.0 dB and 1.3 dB over the FE+PDNP algorithm at track widths of 28 nm, 26 nm and 24 nm, respectively, as shown in Fig. 8(b). The SNR gain of the MTNN+VE+MB algorithm is smaller for the HT tracks than for the LT tracks because the writing performance of the HT tracks, with their higher thermal gradient, is superior to that of the LT track.
Additionally, the SNR gap between the HT track and the LT track is smaller with the proposed MTNN+VE+MB algorithm than with the FE+PDNP algorithm. For example, when the track width $TW_{\mathrm{HIMR}}$ is 26 nm, the SNR gaps between the HT track and the LT track are 0.8 dB and 0.3 dB for the FE+PDNP and MTNN+VE+MB algorithms, respectively. This is because, in the MTNN+VE+MB algorithm, the decoded LLRs of the more reliable HT tracks are fed into the MTNN to aid the detection of the less reliable LT tracks.

C. BER PERFORMANCE OF HIMR WITH DIFFERENT LINEAR DENSITIES
Increasing the linear density of the HIMR system causes more transition noise and stronger overwriting effects, which significantly degrades the BER performance. Hence the proposed multitrack detection algorithm is also investigated for HIMR at different linear densities, where the linear density is increased from 2753 KBPI (i.e., a bit length of 9 nm) to 3101 KBPI (i.e., a bit length of 8 nm) and 3541 KBPI (i.e., a bit length of 7 nm). The track width $TW_{\mathrm{HIMR}}$ is kept at 26 nm. Media noise of 2% $\sigma_{T_c}$ and 5% $\sigma_{H_k}$ is assumed in the simulation, and an LDPC code with a rate of 0.93 is used.
For the LT track (Fig. 9(a)), the NN+VE+PDNP algorithm provides SNR gains of 0.5 dB, 0.8 dB and 0.9 dB over the FE+PDNP algorithm at bit lengths of 9 nm, 8 nm and 7 nm, respectively. This is because the transition noise and overwriting effects of the neighboring tracks become more severe as the linear density increases, which also worsens the ITI on the current track. The variable equalizer handles this increased ITI more effectively than the fixed equalizer and thus improves the BER performance. Compared to the FE+PDNP algorithm, the MTNN+VE+MB algorithm provides SNR gains of 1.0 dB, 1.4 dB and 1.9 dB at bit lengths of 9 nm, 8 nm and 7 nm, respectively. This further improvement is achieved because the MTNN, with its strong nonlinear modeling ability, predicts the media noise more accurately than the PDNP, which uses linear prediction. These SNR gains grow as the media noise deteriorates with increasing linear density. Additionally, for the HT tracks, the MTNN+VE+MB algorithm obtains SNR gains of 0.8 dB, 1.0 dB and 1.2 dB over the FE+PDNP algorithm at bit lengths of 9 nm, 8 nm and 7 nm, respectively. The smaller SNR gains arise because the higher thermal gradient of the HT track leads to less transition noise and weaker overwriting effects than for the LT track.
Although the linear densities of adjacent tracks are assumed to be the same above, the proposed detection algorithm can be adapted to HIMR with adjacent tracks of different linear densities. First, if the linear density of the high temperature (HT) track is about β (β ≥ 1) times that of the low temperature (LT) track, the HT track should be sampled at its baud rate and the LT track should be sampled at β times its own baud rate, so that the number of samples per unit time is the same for both tracks at the input of the neural network. Then, for the 2D equalizer design of the HT and LT tracks, the sampled signals are resampled to the baud rate of the corresponding target track; the detailed sampling methods can be found in reference [26]. The MTNN+VE+MB algorithm can be adapted in this way when the linear density difference between adjacent tracks is within a tolerable range (i.e., β ≤ 1.5), and it still provides clear SNR gains over the conventional FE+PDNP algorithm. For example, if the track widths of the HT and LT tracks are both 26 nm, while the bit lengths of the HT and LT tracks are 8 nm and 12 nm (i.e., β = 1.5), the MTNN+VE+MB algorithm provides an SNR gain of 1.5 dB (Fig. 10) over the FE+PDNP algorithm. Meanwhile, for the HT track, the MTNN+VE+MB algorithm also provides an SNR gain of 0.6 dB over FE+PDNP. However, when the linear density difference between adjacent tracks increases further, the number of possible ITI types predicted by the MTNN should be increased accordingly, since one bit of the LT track then aligns with multiple bits of the HT track.
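The sampling rule for unequal linear densities can be checked with a short calculation. The Python sketch below assumes that both tracks pass under the reader at the same linear velocity, so that samples per unit time are proportional to samples per unit length; the function name is hypothetical, and the actual resampling method is the one described in reference [26].

```python
def samples_per_nm(bit_length_nm: float, oversampling: float) -> float:
    """Samples per nanometre of track for a given bit length and oversampling factor."""
    return oversampling / bit_length_nm

ht_bit_length_nm = 8.0
lt_bit_length_nm = 12.0
beta = lt_bit_length_nm / ht_bit_length_nm  # ratio of HT to LT linear density, 1.5 here

ht_rate = samples_per_nm(ht_bit_length_nm, 1.0)    # HT sampled at its baud rate
lt_rate = samples_per_nm(lt_bit_length_nm, beta)   # LT sampled at beta times its baud rate
# Both rates equal 0.125 samples per nm, so the network sees aligned inputs.
```

With β = 1.5, two LT bits span the same length as three HT bits, which illustrates why larger density ratios call for more ITI types.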

IV. CONCLUSION
In summary, this study proposes a multitrack detection scheme based on a multi-task neural network to jointly mitigate the ITI and media noise in HIMR with 2D ISI and thermal jitter. It predicts both the ITI type and the media noise with lower computational complexity than separate neural networks. On one hand, the 2D variable equalizer with distinct coefficients is designed according to the predicted ITI types, which decreases the equalization noise significantly compared to the conventional equalizer. On the other hand, the predicted residual media noise of the variable equalizer is embedded into the branch metrics of the modified BCJR detector to whiten the data dependent noise. The simulations indicate that the proposed MTNN+VE+MB algorithm provides noticeable SNR gains over the FE+PDNP algorithm for both the HT and LT tracks of HIMR. More importantly, the SNR improvements provided by the proposed multitrack detection grow with increasing media noise, track density and linear density, which demonstrates its effectiveness in high density HIMR systems.