A Hybrid Approach for Approximating the Ideal Observer for Joint Signal Detection and Estimation Tasks by Use of Supervised Learning and Markov-Chain Monte Carlo Methods

The ideal observer (IO) sets an upper performance limit among all observers and has been advocated for assessing and optimizing imaging systems. For general joint detection and estimation (detection-estimation) tasks, estimation receiver operating characteristic (EROC) analysis has been established for evaluating the performance of observers. However, it is generally difficult to accurately approximate the IO that maximizes the area under the EROC curve. In this study, a hybrid method that combines a multi-task convolutional neural network (CNN) and a Markov-Chain Monte Carlo (MCMC) method is proposed to approximate the IO for detection-estimation tasks. Unlike traditional MCMC methods, the hybrid method is not limited to specific utility functions. In addition, a purely supervised learning-based sub-ideal observer is proposed. Computer-simulation studies, which include signal-known-statistically/background-known-exactly and signal-known-statistically/background-known-statistically tasks, are conducted to validate the proposed methods. The EROC curves produced by the proposed methods are compared to those produced by the MCMC approach or by analytical computation when feasible. The proposed method provides a new approach for approximating the IO and may advance the application of EROC analysis for optimizing imaging systems.


I. INTRODUCTION
Objective, or task-based, measures of image quality (IQ) are advocated for use in the assessment and optimization of medical imaging systems [1]- [5]. Unlike traditional physical measures of IQ, task-based measures of IQ quantify the ability of an observer to perform a specific task such as detection or estimation of a signal. To compute such measures of IQ when developing and refining new imaging technologies, numerical observers (NOs) have been widely employed [2], [4]. This enables the exploration of large parameter spaces when optimizing system performance. Such NOs can be designed to estimate an upper bound on the possible performance of any observer for a given task and collection of image data. Such models are referred to as ideal observers (IOs). Alternatively, anthropomorphic NOs can be designed to mimic the performance of a human observer, which is generally suboptimal. The focus of this work will be on the computation of IOs.
The Bayesian IO is an NO that achieves the performance of the optimal decision maker acting on given measured data [3], [6], [7]. IO performance is a task-based image quality measure that depends on the data and the task (e.g., lesion detection) and not on the capabilities of a human observer, the quality of feature extraction, or a particular classification scheme. As such, it is a metric of fundamental importance in the objective assessment of imaging hardware and data-acquisition designs [1]. Knowledge of IO performance is also important because it can reveal how much of the available task-related information is extracted by a human observer or another sub-optimal NO. This can permit the identification of opportunities for improved image processing or other methodological changes that lead to improved task performance.
Much of the literature on IO approximation has focused on binary signal detection tasks. The IO test statistic in this case is generally a non-linear function of the image data and, except in some special cases, cannot be determined analytically. Because of this, sampling-based methods that employ Markov-chain Monte Carlo (MCMC) techniques and supervised learning have been developed to approximate the IO test statistic for medical image applications [8]- [10].
Detection-estimation tasks, which involve the detection of a signal and the subsequent estimation of a set of signal parameters for the signal-present decisions, are relevant to many medical imaging applications [6], [11]- [15]. The estimation receiver operating characteristic (EROC) curve [6] can be employed to assess the performance of an observer on detection-estimation tasks, and the area under the EROC curve (AEROC) can be utilized as a figure-of-merit (FOM). Similar to the case of binary detection tasks, the IO decision strategy for detection-estimation tasks is analytically intractable except in special cases [16]. However, unlike the case of binary detection tasks, for which MCMC methods or supervised learning methods can be employed to establish NOs [9], [17], there is a lack of available NOs for approximating the IO for detection-estimation tasks.
To address this need, a hybrid method is developed in this study for approximating the IO for a wide class of detection-estimation tasks. The proposed method combines deep learning (DL) and an MCMC method in order to implement the known IO decision strategy [6]. Unlike traditional MCMC methods for approximating the IO, the hybrid method is not limited to specific utility functions. In addition, a purely supervised learning-based sub-ideal observer is designed, which avoids the need to involve the MCMC method. Computer-simulation studies are conducted to validate the performance of the proposed methods, which address signal-known-statistically/background-known-exactly (SKS/BKE) and signal-known-statistically/background-known-statistically (SKS/BKS) tasks. The proposed methods provide a new capability for approximating the IO for detection-estimation tasks and may advance the application of EROC analysis for optimizing imaging systems.
The remainder of the paper is organized as follows. Section II provides salient information regarding signal detection-estimation theory. The proposed hybrid IO approximation method and the supervised learning-based sub-ideal observer are described in Section III. The numerical studies and the results are provided in Sections IV and V, respectively. Finally, the article concludes with a discussion in Section VI.

II. BACKGROUND
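A linear digital imaging system can be described as a continuous-to-discrete (C-D) mapping process [3]:
$$\mathbf{g} = \mathcal{H} f(\mathbf{r}) + \mathbf{n}, \tag{1}$$
where $\mathbf{g} \in \mathbb{R}^{N \times 1}$ is the measured image vector, $f(\mathbf{r})$ denotes the object function that depends on the coordinate $\mathbf{r} \in \mathbb{R}^{k \times 1}$, $k \ge 2$, $\mathcal{H}$ denotes a linear imaging operator that maps $\mathbb{L}_2(\mathbb{R}^k)$ to $\mathbb{R}^{N \times 1}$, and $\mathbf{n} \in \mathbb{R}^{N \times 1}$ denotes the measurement noise. The imaging process described in Eqn. (1) can be expressed as [3]:
$$[\mathbf{g}]_m = \int_V d\mathbf{r}\, h_m(\mathbf{r})\, f(\mathbf{r}) + [\mathbf{n}]_m, \quad m = 1, \ldots, N, \tag{2}$$
where $[\mathbf{g}]_m$ and $[\mathbf{n}]_m$ denote the $m$th components of $\mathbf{g}$ and $\mathbf{n}$, respectively, $V$ denotes the support of $f(\mathbf{r})$, and $h_m(\mathbf{r})$ is the point response function (PRF) of the imaging system [3].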

A. General detection-estimation tasks
In a binary detection-estimation task, each image is either signal-absent or contains a signal that is specified by a parameter vector $\theta$. The imaging processes under these hypotheses can be expressed as:
$$H_0: \mathbf{g} = \mathbf{b} + \mathbf{n}, \qquad H_1: \mathbf{g} = \mathbf{s}(\theta) + \mathbf{b} + \mathbf{n}, \tag{3}$$
where $f_{s(\theta)}$ and $f_b$ denote the signal and background, respectively, and $\mathbf{s} \equiv \mathcal{H} f_{s(\theta)}$ and $\mathbf{b} \equiv \mathcal{H} f_b$ denote the signal and background images. To perform this task, a deterministic observer first computes a test statistic $T(\mathbf{g})$ that maps the measured image $\mathbf{g}$ to a real-valued scalar. The value of $T(\mathbf{g})$ is then compared to a predetermined threshold $\tau$ to decide between $H_0$ and $H_1$. Subsequently, an estimate $\hat{\theta}(\mathbf{g})$ is produced if the observer decides that the signal is present [6], [16]. An EROC curve is generated by plotting the expected utility of $\hat{\theta}(\mathbf{g})$ for true-positive (TP) decisions versus the false-positive fraction (FPF) as $\tau$ is varied. This expected utility is defined as [6]
$$U_{TP}(\tau) = E\left[u(\hat{\theta}(\mathbf{g}), \theta)\, \mathrm{step}[T(\mathbf{g}) - \tau] \,\middle|\, H_1\right], \tag{4}$$
where $u(\hat{\theta}(\mathbf{g}), \theta)$ is a utility function for the parameter estimate, $E[\cdot]$ is the mathematical expectation operator (here taken over $\mathbf{g}$ and $\theta$ under $H_1$), and $\mathrm{step}[\cdot]$ is the Heaviside step function.
In general, the utility function should be designed to return a high value when $\hat{\theta}(\mathbf{g})$ is close to $\theta$ and a low value otherwise. Each point on the EROC curve gives the expected utility of the estimated parameter for the TP decisions at a given FPF. Because the utility function may be negative, the EROC curve, unlike the traditional ROC curve, is not necessarily a monotonically increasing function of the FPF.
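As a minimal sketch of how an empirical EROC curve follows from Eqn. (4), the Python function below sweeps the threshold τ over all observed test-statistic values and records, at each τ, the empirical FPF and the sample mean of u(θ̂(g), θ)·step[T(g) − τ] over the signal-present images. The function name, the array inputs (t0, t1, u1), and the convention that a statistic equal to τ counts as a positive decision are illustrative assumptions.

```python
import numpy as np

def empirical_eroc(t0, t1, u1):
    """Empirical EROC points (FPF, expected TP utility) as the threshold tau is swept.

    t0: test statistics T(g) for signal-absent images
    t1: test statistics T(g) for signal-present images
    u1: utilities u(theta_hat(g), theta) for the same signal-present images
    """
    t0, t1, u1 = (np.asarray(v, dtype=float) for v in (t0, t1, u1))
    # Start above every statistic, then step down through each observed value
    taus = np.concatenate(([np.inf], np.unique(np.concatenate((t0, t1)))[::-1]))
    fpf, eu = [], []
    for tau in taus:
        fpf.append(np.mean(t0 >= tau))          # fraction of false positives
        eu.append(np.mean(u1 * (t1 >= tau)))    # sample version of Eqn. (4)
    return np.array(fpf), np.array(eu)
```

For the toy inputs t0 = [0.1, 0.4], t1 = [0.3, 0.8], and u1 = [0.5, −0.2], the expected utility first falls to −0.1 and then rises to 0.15 as the FPF grows, which illustrates why an EROC curve need not be monotonic.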

B. Ideal observer for detection-estimation tasks
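The IO test statistic $T_I(\mathbf{g})$ and estimator $\hat{\theta}_I(\mathbf{g})$ can be computed as [6]:
$$T_I(\mathbf{g}) = \max_{\hat{\theta}} \int d\theta\, u(\hat{\theta}, \theta)\, \Lambda(\mathbf{g}|\theta)\, p(\theta), \tag{5}$$
and
$$\hat{\theta}_I(\mathbf{g}) = \arg\max_{\hat{\theta}} \int d\theta\, u(\hat{\theta}, \theta)\, \Lambda(\mathbf{g}|\theta)\, p(\theta), \tag{6}$$
where the quantity $\Lambda(\mathbf{g}|\theta)$ is the $\theta$-conditional likelihood ratio:
$$\Lambda(\mathbf{g}|\theta) = \frac{p(\mathbf{g}|\theta, H_1)}{p(\mathbf{g}|H_0)}. \tag{7}$$
Equation (5) implies that the IO test statistic can be written as [6]:
$$T_I(\mathbf{g}) = \int d\theta\, u(\hat{\theta}_I(\mathbf{g}), \theta)\, \Lambda(\mathbf{g}|\theta)\, p(\theta), \tag{8}$$
where $\hat{\theta}_I(\mathbf{g})$ is the ideal estimator defined in Eqn. (6).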
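When θ is a scalar, Eqns. (5)-(8) can be evaluated by brute force on a grid. The sketch below does this for a hypothetical toy SKS/BKE amplitude task: s(A) = A·s0, b = 0, white Gaussian noise, a Gaussian prior on A, and a Gaussian utility. These modeling choices, the function name, and the Riemann-sum quadrature are assumptions for illustration. Because the likelihood ratio can overflow, the function returns ln T_I(g), which is a monotonic transformation of T_I(g) and therefore yields the same EROC curve.

```python
import numpy as np

def log_io_statistic(g, s0, sigma_n, mu_a, sd_a, sigma_u, grid):
    """Return (ln T_I(g), theta_hat_I(g)) for a toy SKS/BKE amplitude task.

    Toy model (assumed): s(A) = A * s0, b = 0, i.i.d. Gaussian noise with std sigma_n,
    prior A ~ N(mu_a, sd_a^2), and utility u(A_hat, A) = exp(-(A_hat - A)^2 / (2 sigma_u^2)).
    The uniformly spaced `grid` discretizes both A and the candidate estimates A_hat.
    """
    dA = grid[1] - grid[0]
    # ln Lambda(g|A), Eqn. (7): for white Gaussian noise the ||g||^2 terms cancel
    log_lam = (grid * (s0 @ g) - 0.5 * grid**2 * (s0 @ s0)) / sigma_n**2
    log_prior = -0.5 * ((grid - mu_a) / sd_a) ** 2 - np.log(np.sqrt(2.0 * np.pi) * sd_a)
    # log_u[k, j] = ln u(A_hat_k, A_j)
    log_u = -0.5 * ((grid[:, None] - grid[None, :]) / sigma_u) ** 2
    x = log_u + (log_lam + log_prior)[None, :]
    # Log-sum-exp over A gives ln of the integral in Eqns. (5)-(6) for each A_hat
    m = x.max(axis=1)
    log_obj = m + np.log(np.exp(x - m[:, None]).sum(axis=1)) + np.log(dA)
    k = int(np.argmax(log_obj))     # Eqn. (6): the maximizing estimate
    return log_obj[k], grid[k]      # Eqns. (5)/(8): objective at theta_hat_I
```

The same grid serves as both the integration variable and the set of candidate estimates, which keeps the sketch short; a finer grid for θ̂ would reduce the discretization error of the estimate.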

C. Scanning linear observer
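The scanning linear observer (SLO) [12] is a sub-optimal linear observer that can be employed for detection-estimation tasks when the IO is intractable [13], [14]. The SLO is designed to approximate the mode of the posterior density and performs pseudo maximum a posteriori (MAP) estimation [12], [14] as:
$$\hat{\theta}_{SLO}(\mathbf{g}) = \arg\max_{\theta}\left[\bar{\mathbf{g}}(\theta)^T \bar{\mathbf{K}}_g^{-1} \mathbf{g} - \frac{1}{2}\bar{\mathbf{g}}(\theta)^T \bar{\mathbf{K}}_g^{-1} \bar{\mathbf{g}}(\theta) + \ln p(\theta)\right], \tag{9}$$
where $\bar{\mathbf{g}}(\theta)$ is the mean image for a given $\theta$, averaged over the remaining sources of randomness, and $\bar{\mathbf{K}}_g^{-1}$ is the inverse of the covariance matrix of $\mathbf{g} - \bar{\mathbf{g}}(\theta)$, computed under the approximation that this covariance varies slowly with the parameters. The corresponding SLO test statistic is given by
$$T_{SLO}(\mathbf{g}) = \bar{\mathbf{g}}(\hat{\theta}_{SLO}(\mathbf{g}))^T \bar{\mathbf{K}}_g^{-1} \mathbf{g} - \frac{1}{2}\bar{\mathbf{g}}(\hat{\theta}_{SLO}(\mathbf{g}))^T \bar{\mathbf{K}}_g^{-1} \bar{\mathbf{g}}(\hat{\theta}_{SLO}(\mathbf{g})) + \ln p(\hat{\theta}_{SLO}(\mathbf{g})). \tag{10}$$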
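The "scanning" in the SLO refers to evaluating the bracketed objective of Eqn. (9) over candidate parameter values and keeping the maximizer; the maximum itself is the statistic of Eqn. (10). The sketch below assumes a discrete scan grid and caller-supplied callables for ḡ(θ) and ln p(θ); the function and argument names are hypothetical.

```python
import numpy as np

def scanning_linear_observer(g, candidates, mean_image, K_inv, log_prior):
    """Return (T_SLO(g), theta_hat_SLO(g)) by exhaustively scanning candidate parameters.

    candidates: candidate parameter values (a discrete scan grid, an assumption here)
    mean_image: callable, theta -> g_bar(theta)
    K_inv:      symmetric inverse covariance matrix K_g^{-1}, shared by all theta
    log_prior:  callable, theta -> ln p(theta)
    """
    best_val, best_theta = -np.inf, None
    for theta in candidates:
        gbar = mean_image(theta)
        template = K_inv @ gbar                    # K_g^{-1} g_bar(theta)
        # Objective in Eqn. (9); its maximum over theta is the statistic in Eqn. (10)
        val = template @ g - 0.5 * gbar @ template + log_prior(theta)
        if val > best_val:
            best_val, best_theta = val, theta
    return best_val, best_theta
```

Because every candidate reuses the same K_g^{-1}, each evaluation is only a pair of inner products once the templates are formed, which is what makes the SLO inexpensive relative to the IO.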

III. DETECTION-ESTIMATION TASKS
A. Approximating the IO using a hybrid supervised learning-MCMC method
The supervised learning-based methods employed in previous studies [9], [10], [17] cannot be directly utilized to approximate the IO test statistic for detection-estimation tasks. One reason is that the test statistic $T_I(\mathbf{g})$ in Eqn. (8) involves an integral that depends on the ideal estimator $\hat{\theta}_I(\mathbf{g})$. To circumvent this, a hybrid supervised learning-MCMC method for IO approximation is described below.
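Because the signal parameter vector $\theta$ is estimated only when the signal is determined to be present, $p(\theta) = p(\theta|H_1)$ holds [6], [16]. Thus, the test statistic in Eqn. (8) can be expressed as
$$T_I(\mathbf{g}) = \int d\theta\, u(\hat{\theta}_I(\mathbf{g}), \theta)\, \frac{p(\mathbf{g}|\theta, H_1)}{p(\mathbf{g}|H_0)}\, p(\theta|H_1). \tag{11}$$
By Bayes' rule, $p(\mathbf{g}|\theta, H_1)\, p(\theta|H_1) = p(\theta|\mathbf{g}, H_1)\, p(\mathbf{g}|H_1)$, so $T_I(\mathbf{g})$ can be decomposed as [7]
$$T_I(\mathbf{g}) = \frac{p(\mathbf{g}|H_1)}{p(\mathbf{g}|H_0)} \int d\theta\, p(\theta|\mathbf{g}, H_1)\, u(\hat{\theta}_I(\mathbf{g}), \theta) = \Lambda(\mathbf{g})\, U(\mathbf{g}), \tag{12}$$
where $\Lambda(\mathbf{g}) \equiv p(\mathbf{g}|H_1)/p(\mathbf{g}|H_0)$ is the likelihood ratio and
$$U(\mathbf{g}) \equiv \int d\theta\, p(\theta|\mathbf{g}, H_1)\, u(\hat{\theta}_I(\mathbf{g}), \theta) \tag{13}$$
is the utility-weighted posterior mean. Equation (12) is central to the methodology described below. The proposed hybrid supervised learning-MCMC strategy is summarized in Fig. 1. First, a multi-task CNN is employed to approximate the likelihood ratio $\Lambda(\mathbf{g})$ and the ideal estimate $\hat{\theta}_I(\mathbf{g})$. Second, $U(\mathbf{g})$ is approximated by use of an MCMC technique. Finally, the IO test statistic is obtained by multiplying $\Lambda(\mathbf{g})$ and $U(\mathbf{g})$ according to Eqn. (12). The corresponding details are described below.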
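To make the factorization in Eqn. (12) concrete, the sketch below evaluates Eqn. (8) directly and as the product Λ(g)U(g) for a hypothetical toy problem with a discrete amplitude prior, s(θ) = θ·s0, b = 0, and white Gaussian noise; the toy model and function name are assumptions. It also makes explicit that Λ(g) is the prior-weighted average of the θ-conditional likelihood ratios and that the posterior p(θ|g, H1) follows from them by Bayes' rule.

```python
import numpy as np

def io_decomposition(g, s0, sigma_n, thetas, prior, utility):
    """Return (Eqn. (8) evaluated directly, Lambda(g) * U(g), theta_hat_I(g)).

    Toy model (assumed): discrete prior `prior` on amplitudes `thetas`, s(theta) = theta * s0,
    b = 0, i.i.d. Gaussian noise with std sigma_n. Candidate estimates are restricted to `thetas`.
    """
    # Lambda(g|theta) for white Gaussian noise, Eqn. (7)
    lam_cond = np.exp((thetas * (s0 @ g) - 0.5 * thetas**2 * (s0 @ s0)) / sigma_n**2)
    lam = np.sum(lam_cond * prior)              # Lambda(g) = sum_theta Lambda(g|theta) p(theta)
    post = lam_cond * prior / lam               # p(theta | g, H1) by Bayes' rule
    U_mat = utility(thetas[:, None], thetas[None, :])   # U_mat[k, j] = u(theta_hat_k, theta_j)
    k = int(np.argmax(U_mat @ post))            # ideal estimate: maximal posterior expected utility
    t_direct = np.sum(U_mat[k] * lam_cond * prior)      # Eqn. (8)
    U_g = np.sum(post * U_mat[k])               # Eqn. (13)
    return t_direct, lam * U_g, thetas[k]
```

The two returned statistics agree up to rounding, and the utility is allowed to be negative, as with the quadratic utilities used later.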
1) Approximating the likelihood ratio and ideal estimate using multi-task CNNs: The likelihood ratio $\Lambda(\mathbf{g})$ and the ideal estimate $\hat{\theta}_I(\mathbf{g})$ can be approximated by use of CNNs. Specifically, as shown in Fig. 1, a multi-task CNN architecture is employed in which several convolutional layers in the Shared Conv block are shared by the detection and estimation sub-networks (i.e., the bottom and top network branches in Fig. 1). Several additional convolutional layers in the Estimation Conv block are employed by the estimation sub-network only. For use in training the detection sub-network, the loss function is defined as the binary cross-entropy, so that the network output approximates the posterior probability $p(H_1|\mathbf{g})$, which is a monotonic transformation of $\Lambda(\mathbf{g})$ [9]. For use in training the estimation sub-network, the loss function is defined as the negative of the utility function [6], [18]. The derivation of the loss function for estimation tasks is provided in Appendix A.
Fig. 1: A schematic of the hybrid method that combines a multi-task CNN and the MCMC method. The Shared Conv and Estimation Conv are two blocks that comprise several convolutional layers. The convolutional layers in the Shared Conv are shared by the detection and estimation sub-networks (i.e., the bottom and top network branches). The layers in the Estimation Conv are employed by the estimation sub-network only.
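Suppose that the multi-task CNN is trained on a dataset $\{\mathbf{g}_i\}_{i=1}^{2J}$ of $2J$ independent measured images, where the first $J$ images satisfy the $H_1$ hypothesis and the rest satisfy the $H_0$ hypothesis. Let $y \in \{0, 1\}$ denote the class label, where $y = 0$ and $y = 1$ correspond to the hypotheses $H_0$ and $H_1$, respectively. The corresponding class labels and the target parameter vectors for the signal-present cases are denoted as $\{y_i\}_{i=1}^{2J}$ and $\{\theta_i\}_{i=1}^{J}$, respectively. Let the vector $\mathbf{w} = [\mathbf{w}_1, \mathbf{w}_2]$ denote the trainable parameters of the multi-task CNN, where $\mathbf{w}_1$ and $\mathbf{w}_2$ denote the weights of the detection and estimation sub-networks, respectively. In terms of these quantities, the loss functions employed for training the detection and estimation sub-networks can be expressed as:
$$\mathcal{L}_D(\mathbf{w}_1) = -\frac{1}{2J}\sum_{i=1}^{2J}\left[y_i \ln \hat{p}(H_1|\mathbf{g}_i; \mathbf{w}_1) + (1 - y_i)\ln\left(1 - \hat{p}(H_1|\mathbf{g}_i; \mathbf{w}_1)\right)\right], \tag{14a}$$
$$\mathcal{L}_E(\mathbf{w}_2) = -\frac{1}{J}\sum_{j=1}^{J} u\left(\hat{\theta}(\mathbf{g}_j; \mathbf{w}_2), \theta_j\right), \tag{14b}$$
where $\hat{p}(H_1|\mathbf{g}; \mathbf{w}_1)$ and $\hat{\theta}(\mathbf{g}; \mathbf{w}_2)$ denote the outputs of the detection and estimation sub-networks, respectively. The multi-task CNN can be trained by minimizing the loss functions described in Eqn. (14) in an alternating fashion. At each iteration of the training process, Eqn. (14b) is minimized first and Eqn. (14a) is minimized subsequently. Additional details relevant to a specific implementation of this procedure are described in Sec. IV-D.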
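The two training objectives can be computed directly from network outputs. The NumPy sketch below evaluates the binary cross-entropy and the negative mean utility for given outputs; it does not implement the CNN or the alternating optimizer. The helper posterior_to_lr (a hypothetical name) shows the monotonic map from p(H1|g) to Λ(g) implied by Bayes' rule, assuming class priors equal to the training proportions (0.5 for the balanced 2J-image dataset).

```python
import numpy as np

def detection_loss(p_hat, y, eps=1e-12):
    """Binary cross-entropy, Eqn. (14a); p_hat approximates p(H1|g_i) and y_i is 0 or 1."""
    p_hat = np.clip(p_hat, eps, 1.0 - eps)       # guard against log(0)
    return -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1.0 - p_hat))

def estimation_loss(theta_hat, theta, utility):
    """Negative mean utility over the signal-present images, Eqn. (14b)."""
    return -np.mean(utility(theta_hat, theta))

def posterior_to_lr(p_hat, prior_h1=0.5):
    """Lambda(g) = [p(H1|g) / p(H0|g)] * [p(H0) / p(H1)], a monotonic map of p(H1|g)."""
    return p_hat / (1.0 - p_hat) * (1.0 - prior_h1) / prior_h1
```

For example, with balanced classes an output of 0.8 corresponds to Λ(g) = 4, and an uninformative output of 0.5 on both classes yields a cross-entropy of ln 2.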
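2) Approximating the utility-weighted posterior mean using MCMC techniques: According to Eqn. (12), the utility-weighted posterior mean $U(\mathbf{g})$ in Eqn. (13) is required when computing $T_I(\mathbf{g})$. For an SKS/BKE task, $U(\mathbf{g})$ can be approximated via Monte Carlo integration as:
$$U(\mathbf{g}) \approx \frac{1}{M}\sum_{j=1}^{M} u(\hat{\theta}_I(\mathbf{g}), \theta^{(j)}), \tag{15}$$
where $\theta^{(j)}$ is sampled from the posterior distribution $p(\theta|\mathbf{g}, H_1)$ and $M$ is the number of samples. A Markov chain with an initial parameter vector $\theta^{(0)}$ and a proposal density $q(\theta|\theta^{(i)})$ can be constructed to generate the samples, as described below. Given $\theta^{(i)}$, a candidate parameter vector $\theta^*$ is drawn from the proposal density and is accepted into the Markov chain with the acceptance probability
$$P_{\mathrm{acc}}(\theta^*|\theta^{(i)}) = \min\left[1, \frac{p(\mathbf{g}|\mathbf{s}(\theta^*), H_1)\, p(\theta^*)\, q(\theta^{(i)}|\theta^*)}{p(\mathbf{g}|\mathbf{s}(\theta^{(i)}), H_1)\, p(\theta^{(i)})\, q(\theta^*|\theta^{(i)})}\right],$$
where $p(\theta)$ is the distribution of the parameter vector, and the signal images $\mathbf{s}(\theta^{(i)})$ and $\mathbf{s}(\theta^*)$ are specified by $\theta^{(i)}$ and $\theta^*$, respectively. If the candidate is accepted, $\theta^{(i+1)} \equiv \theta^*$; otherwise, $\theta^{(i+1)} \equiv \theta^{(i)}$. When a random walk Metropolis-Hastings (RWMH) algorithm [19] is employed, the proposal density is a simple Gaussian density centered on the current state, $q(\theta^*|\theta^{(i)}) = \mathcal{N}(\theta^*; \theta^{(i)}, \sigma_q^2 \mathbf{I})$, where $\sigma_q$ is the proposal standard deviation; because this density is symmetric, the proposal terms cancel in the acceptance probability.
When background variability is considered, $U(\mathbf{g})$ can still be approximated according to Eqn. (15) if the background can be described by a stochastic object model (SOM) that is parameterized by a parameter vector $\alpha$, i.e., $\mathbf{b} \equiv \mathbf{b}(\alpha)$. The corresponding acceptance probability is:
$$P_{\mathrm{acc}} = \min\left[1, \frac{p(\mathbf{g}|\mathbf{s}(\theta^*), \mathbf{b}(\alpha^*), H_1)\, p(\theta^*)\, p(\alpha^*)\, q_1(\theta^{(i)}|\theta^*)\, q_2(\alpha^{(i)}|\alpha^*)}{p(\mathbf{g}|\mathbf{s}(\theta^{(i)}), \mathbf{b}(\alpha^{(i)}), H_1)\, p(\theta^{(i)})\, p(\alpha^{(i)})\, q_1(\theta^*|\theta^{(i)})\, q_2(\alpha^*|\alpha^{(i)})}\right].$$
Here, $\mathbf{b}(\alpha^{(i)})$ and $\mathbf{b}(\alpha^*)$ are the background images determined by the parameter vectors $\alpha^{(i)}$ and $\alpha^*$, respectively. The quantity $p(\alpha)$ is the corresponding probability density, and $q_1(\cdot)$ and $q_2(\cdot)$ are the proposal densities for the signal and background parameters, respectively. The pairs $(\theta^{(i)}, \alpha^{(i)})$ are sampled from the distribution $p(\theta, \mathbf{b}(\alpha)|\mathbf{g}, H_1)$, and $(\theta^*, \alpha^*)$ are candidate parameter vectors drawn from the proposal densities. A Markov chain for generating $(\theta^{(i)}, \alpha^{(i)})$ can then be established in the same way as described above, and only the $\theta^{(i)}$ components are used in Eqn. (15).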
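The SKS/BKE procedure can be sketched in a few lines for a scalar amplitude. The example below is a hypothetical toy (s(A) = A·s0, b = 0, white Gaussian noise, Gaussian prior) that runs RWMH with a symmetric Gaussian proposal, discards a burn-in period (an added, common practice), and averages the utility as in Eqn. (15). The argument theta_hat stands in for the CNN's estimate of θ̂_I(g); all names are illustrative assumptions.

```python
import numpy as np

def mcmc_utility_weighted_mean(g, s0, sigma_n, mu_a, sd_a, theta_hat, utility,
                               n_samples=20000, burn_in=2000, prop_sd=3.0, seed=0):
    """Estimate U(g) in Eqn. (13) via random-walk Metropolis-Hastings and Eqn. (15).

    Hypothetical toy model: s(A) = A * s0, b = 0, i.i.d. Gaussian noise (std sigma_n),
    prior A ~ N(mu_a, sd_a^2). theta_hat stands in for the CNN estimate of theta_hat_I(g).
    """
    rng = np.random.default_rng(seed)

    def log_target(a):
        # ln p(g | s(a), H1) + ln p(a), up to an additive constant
        r = g - a * s0
        return -0.5 * (r @ r) / sigma_n**2 - 0.5 * ((a - mu_a) / sd_a) ** 2

    a = mu_a                       # initial state theta^(0)
    lp = log_target(a)
    total, kept = 0.0, 0
    for i in range(burn_in + n_samples):
        a_star = a + prop_sd * rng.standard_normal()   # symmetric Gaussian proposal
        lp_star = log_target(a_star)
        # Accept with probability min(1, ratio); the q terms cancel for this proposal
        if np.log(rng.uniform()) < lp_star - lp:
            a, lp = a_star, lp_star
        if i >= burn_in:           # discard burn-in samples
            total += utility(theta_hat, a)
            kept += 1
    return total / kept
```

For this Gaussian toy the posterior is Gaussian, so the estimate can be checked against the closed-form value of the Gaussian-utility expectation; in the BKS setting the state would also include the background parameters α.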
B. Supervised learning-based sub-ideal NO
As described above, the proposed hybrid method can approximate the IO for detection-estimation tasks. However, the MCMC technique that is a component of the hybrid method requires knowledge of the background density $p(\alpha)$ when background variability is considered. This currently limits the application of the hybrid method to certain object models [9]. To circumvent this, a purely supervised learning-based sub-optimal NO can be employed.
A sub-optimal NO that employs only supervised learning can be readily obtained by eliminating the MCMC method from the hybrid method, as depicted in Fig. 2. This modification removes the influence of the estimation result on the detection decision. The corresponding test statistic and estimator are given by:
$$T_{Sub}(\mathbf{g}) = \Lambda(\mathbf{g}), \qquad \hat{\theta}_{Sub}(\mathbf{g}) = \hat{\theta}_I(\mathbf{g}),$$
where $T_{Sub}(\mathbf{g})$ is the IO test statistic for the binary signal detection task and $\hat{\theta}_{Sub}(\mathbf{g})$ is the IO estimator for the detection-estimation task. In practice, both quantities are approximated by the multi-task CNN.

IV. NUMERICAL STUDIES
Computer-simulation studies were performed to investigate the proposed NOs for detection-estimation tasks. The considered signal detection-estimation tasks included both BKE and BKS tasks. A lumpy background (LB) model [8] and a clustered lumpy background (CLB) model [20] were employed in the BKS tasks.
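The imaging system considered was an idealized parallel-hole collimator system that was specified by a linear C-D mapping with Gaussian point response functions (PRFs) [8]:
$$h_m(\mathbf{r}) = \frac{h}{2\pi w_m^2}\exp\left(-\frac{\|\mathbf{r} - \mathbf{r}_m\|^2}{2 w_m^2}\right), \tag{20}$$
where $h$ and $w_m$ are the height and width of the PRFs, respectively, and $\mathbf{r}_m$ denotes the center of the $m$th PRF. The signal to be detected and estimated was modeled by a 2D Gaussian function:
$$f_s(\mathbf{r}) = A_s \exp\left(-\frac{\|\mathbf{r} - \mathbf{r}_s\|^2}{2 w_s^2}\right), \tag{21}$$
where $A_s$ is the signal amplitude, $w_s$ is the signal width, and $\mathbf{r}_s$ is the center of the signal. These signal parameters can be random when a detection-estimation task is specified. Considering the imaging system specified in Eqn. (20), the $m$th element $[\mathbf{s}]_m$ of the signal image $\mathbf{s}$ is given by:
$$[\mathbf{s}]_m = \int d\mathbf{r}\, h_m(\mathbf{r})\, f_s(\mathbf{r}) = \frac{A_s h\, w_s^2}{w_s^2 + w_m^2}\exp\left(-\frac{\|\mathbf{r}_m - \mathbf{r}_s\|^2}{2(w_s^2 + w_m^2)}\right). \tag{22}$$
In the studies described below, the AEROC was employed to quantify observer performance. The AEROC was estimated by use of a nonparametric estimator [11], and the uncertainty in the estimates was conveyed by use of a 90% confidence interval.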
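The area under an empirical EROC curve equals the average, over all signal-absent/signal-present pairs, of the signal-present utility weighted by whether that image's statistic exceeds the signal-absent one. The sketch below assumes this Mann-Whitney-type form for the nonparametric estimator (it is not necessarily identical to the estimator of [11]) and pairs it with a percentile bootstrap for a 90% interval, which is a swapped-in illustrative choice rather than the interval method used here. Function names are hypothetical.

```python
import numpy as np

def aeroc_estimate(t0, t1, u1):
    """Empirical AEROC: mean of u(theta_hat_j, theta_j) * psi(t1_j - t0_i) over all pairs.

    psi = 1, 1/2, 0 for a positive, zero, or negative difference (Mann-Whitney-type form).
    """
    t0, t1, u1 = (np.asarray(v, dtype=float) for v in (t0, t1, u1))
    diff = t1[None, :] - t0[:, None]            # diff[i, j] = t1_j - t0_i
    psi = (diff > 0) + 0.5 * (diff == 0)
    return np.mean(psi * u1[None, :])

def bootstrap_ci(t0, t1, u1, level=0.90, n_boot=2000, seed=0):
    """Percentile bootstrap interval, resampling each class independently (illustrative)."""
    rng = np.random.default_rng(seed)
    t0, t1, u1 = (np.asarray(v, dtype=float) for v in (t0, t1, u1))
    stats = []
    for _ in range(n_boot):
        i0 = rng.integers(0, t0.size, t0.size)
        i1 = rng.integers(0, t1.size, t1.size)  # keep each statistic paired with its utility
        stats.append(aeroc_estimate(t0[i0], t1[i1], u1[i1]))
    lo, hi = np.quantile(stats, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi
```

For perfectly separated statistics and a utility of 1 on every image, the estimate reduces to 1, the maximum AUC of a conventional ROC curve.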

A. SKS/BKE signal detection-estimation task
In the SKS/BKE case considered, the task was to detect a random signal and estimate its amplitude. This task can be viewed as a surrogate for tumor detection in positron emission tomography (PET) [21]. In this task, the dimensions of g, b, and n were 64 × 64 pixels. Without loss of generality, b = 0. The signal defined in Eqn. (21) was employed with $w_s = 1$ and $\mathbf{r}_s = [32; 32]^T$. The to-be-estimated signal amplitude was sampled from a Gaussian distribution with mean $\mu_A = 9$ and standard deviation $\sigma_A = 4$, i.e., $A_s \sim \mathcal{N}(9, 4^2)$. The parameters of the imaging system defined in Eqn. (20) were $h = 16$ and $w_m = 3.87$. The standard deviation $\sigma_n$ of the Gaussian noise was set to 40. To define an EROC curve, a Gaussian utility function was employed. Wunderlich et al. [16] provided the optimal decision rule for this case, which is expressed as: Here, $\mathbf{s}_{ref}$ is a reference signal whose components are defined as This analytic IO decision strategy was implemented as a reference method against which the proposed hybrid method and the sub-ideal NO were compared. The Gaussian proposal density in the MCMC method that was employed to estimate $U(\mathbf{g})$ had a standard deviation of 3.
B. SKS/BKS signal detection-estimation tasks with a lumpy background model
In the first BKS task considered, the Gaussian signal defined in Eqn. (21) was employed with $A_s = 6$ and $w_s = 3$. The signal location $\mathbf{r}_s$ was a two-dimensional random vector whose components were independently sampled from a uniform distribution on the interval (16, 48). A quadratic utility function $u_1(\hat{\mathbf{r}}_s, \mathbf{r}_s) = 1 - \frac{1}{\epsilon_1}\|\hat{\mathbf{r}}_s - \mathbf{r}_s\|_2^2$ and an $\ell_1$-norm-based utility function $u_2(\hat{\mathbf{r}}_s, \mathbf{r}_s) = 1 - \frac{1}{\epsilon_2}\|\hat{\mathbf{r}}_s - \mathbf{r}_s\|_1$ were employed, where $\epsilon_1 = 100$ or $200$ and $\epsilon_2 = 20$. The quadratic utility function penalizes large errors most heavily, whereas the $\ell_1$-norm-based utility function places relatively more weight on small errors. The two values of $\epsilon_1$ were employed to investigate the influence of the estimation task on the test statistic according to Eqn. (12). Compared with previously investigated detection-localization tasks [10], [22], the task in this study can be considered a straightforward generalization.
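The different weighting of small and large errors can be checked numerically. The sketch below implements the two localization utilities, with eps1 and eps2 standing for the normalization constants of this section; the function names are illustrative.

```python
import numpy as np

def u_quadratic(r_hat, r, eps1):
    """Quadratic utility 1 - ||r_hat - r||_2^2 / eps1 (negative for large errors)."""
    d = np.asarray(r_hat, dtype=float) - np.asarray(r, dtype=float)
    return 1.0 - np.sum(d**2, axis=-1) / eps1

def u_l1(r_hat, r, eps2):
    """l1-norm-based utility 1 - ||r_hat - r||_1 / eps2."""
    d = np.asarray(r_hat, dtype=float) - np.asarray(r, dtype=float)
    return 1.0 - np.sum(np.abs(d), axis=-1) / eps2
```

With eps1 = 100 and eps2 = 20, a localization error of (1, 1) pixels gives 0.98 under the quadratic utility but 0.9 under the l1 utility, whereas an error of (6, 8) pixels gives 0.0 and 0.3, respectively: the l1 utility is relatively more sensitive to small errors and the quadratic one to large errors.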
To emulate background variability, a lumpy object model [23] was utilized:
$$f_b(\mathbf{r}) = \sum_{n=1}^{N_b} l(\mathbf{r} - \mathbf{r}_n | a, w_b),$$
where $N_b \sim \mathcal{P}(\bar{N})$ denotes the number of lumps, with $\mathcal{P}(\bar{N})$ denoting a Poisson distribution with mean $\bar{N} = 5$. The lump function $l(\mathbf{r} - \mathbf{r}_n|a, w_b)$ was modeled by a 2D Gaussian function with lump amplitude $a = 10$ and lump width $w_b = 7$:
$$l(\mathbf{r} - \mathbf{r}_n|a, w_b) = a \exp\left(-\frac{\|\mathbf{r} - \mathbf{r}_n\|^2}{2 w_b^2}\right).$$
Here, $\mathbf{r}_n$ denotes the center location of the $n$th lump, which was sampled from a uniform distribution over the spatial support of the image. The dimensions of s, b, n, and g in Eqn. (3) were 64 × 64. The imaging system was specified by $h = 40$ and $w_m = 0.5$. Given the assumed imaging system, the $m$th element $[\mathbf{b}]_m$ of the background image $\mathbf{b}$ is given by:
$$[\mathbf{b}]_m = \sum_{n=1}^{N_b} \frac{a h\, w_b^2}{w_b^2 + w_m^2}\exp\left(-\frac{\|\mathbf{r}_m - \mathbf{r}_n\|^2}{2(w_b^2 + w_m^2)}\right). \tag{27}$$
The measurement noise was described by i.i.d. Gaussian random variables with a mean of 0 and a standard deviation of 320. One realization of the signal image s, the background image b, and the corresponding signal-present noisy measurement g are shown in Figure 3.
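Because both the lumps and the PRFs are Gaussian, each blurred lump is again a Gaussian, so noiseless background images can be generated without numerical quadrature. The sketch below assumes PRFs normalized as h/(2π w_m²)·exp(−‖r − r_m‖²/(2 w_m²)), pixel centers on the integer grid, and integration over the whole plane; these, the function name, and the (x, y) ordering of the lump centers are assumptions.

```python
import numpy as np

def lumpy_background(rng, size=64, n_mean=5.0, a=10.0, w_b=7.0, h=40.0, w_m=0.5):
    """One noiseless lumpy-background image with [b]_m = integral of h_m(r) f_b(r) dr.

    Assumptions: PRF h_m(r) = h / (2 pi w_m^2) * exp(-||r - r_m||^2 / (2 w_m^2)),
    pixel centers r_m on the integer grid 0..size-1, integration over the whole plane.
    """
    n_lumps = rng.poisson(n_mean)                          # N_b ~ Poisson(N_bar)
    centers = rng.uniform(0, size, size=(n_lumps, 2))      # lump centers r_n as (x, y)
    yy, xx = np.mgrid[0:size, 0:size]
    s2 = w_b**2 + w_m**2                                   # variances add under convolution
    amp = a * h * w_b**2 / s2
    b = np.zeros((size, size))
    for cx, cy in centers:
        b += amp * np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2.0 * s2))
    return b
```

A signal-present measurement would then be formed by adding the signal image and i.i.d. Gaussian noise (standard deviation 320 in this task) to such a background.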
Because the IO decision rule in this case cannot be analytically computed, an MCMC-based IO approximation method was established as a reference method. Unlike the proposed hybrid method, this MCMC-based method imposes a strong constraint on the utility function. The details of the MCMC-approximated IO (MCMC-IO) are provided in Appendix B. The performances of the proposed hybrid method and sub-ideal NO were compared to that of the MCMC-IO when the quadratic utility function was employed. When the $\ell_1$-norm-based utility function was considered, the MCMC-IO could not be employed, and the SLO described in Sec. II-C was utilized as the reference observer. To implement the SLO, the covariance matrix was estimated by use of the covariance matrix decomposition method [3], [13] with 4,000 signal-present and 4,000 signal-absent noiseless images. The Gaussian proposal density in the MCMC method that was employed to estimate $U(\mathbf{g})$ had a standard deviation of 4 for each location coordinate.

C. SKS/BKS signal detection-estimation tasks with a clustered lumpy background model
In the second BKS task considered, the signal had an amplitude of $A_s = 0.05$ and a location of $\mathbf{r}_s = [32; 32]^T$. The signal width $w_s$ was a random variable sampled from a uniform distribution on the interval (1, 6). A Gaussian utility function, $u(\hat{w}_s, w_s) = \exp[-(\hat{w}_s - w_s)^2/(2\sigma_u^2)]$, was employed with $\sigma_u = 3$.
A clustered lumpy background (CLB) model [20] was employed to emulate background variability. The $m$th element $[\mathbf{b}]_m$ of the background image $\mathbf{b}$ was computed by summing a blob function over all blobs in all clusters [20]. Here, $K \sim \mathcal{P}(\bar{K})$ denotes the number of clusters, with $\mathcal{P}(\bar{K})$ denoting a Poisson distribution with mean $\bar{K}$; $N_k \sim \mathcal{P}(\bar{N})$ denotes the number of blobs in the $k$th cluster, sampled from a Poisson distribution with mean $\bar{N}$; $\mathbf{r}_k$ denotes the center location of the $k$th cluster, sampled from a uniform distribution over the spatial support of the image; and $\mathbf{r}_{kn}$ denotes the center location of the $n$th blob in the $k$th cluster, sampled from a Gaussian distribution with center $\mathbf{r}_k$ and standard deviation $\sigma$. In the blob function $l(\mathbf{r}|R_{\theta_{kn}})$, $L(\mathbf{r})$ is computed as the "radius" of the ellipse with half-axes $L_x$ and $L_y$, and $R_{\theta_{kn}}$ is the rotation matrix corresponding to the angle $\theta_{kn}$, which was sampled uniformly between 0 and $2\pi$. The generated background images were normalized to the range between 0 and 1. The parameters that specify the CLB model employed in this study are summarized in Table I. The measurement noise was described by i.i.d. Gaussian random variables with a mean of 0 and a standard deviation of 0.33. One realization of the signal image s, the background image b, and the corresponding signal-present noisy measurement g are shown in Figure 4. The SLO described in Sec. II-C was utilized as the reference observer; its implementation followed that described in Sec. IV-B.

D. Multi-task CNN training details
This section provides details regarding the implementation of the multi-task CNN described in Sec. III-A1. Each convolutional layer in the Shared Conv block comprised 64 filters with 5 × 5 spatial support, followed by a Leaky ReLU activation function [24]. In the detection sub-network, a max-pooling layer [25] was employed to subsample the feature maps, and the last layer was a fully connected (FC) layer with a sigmoid activation function for estimation of the posterior probability. In the estimation sub-network, the Estimation Conv block included additional convolutional layers to account for the fact that estimation can be a more complicated task. The convolutional layers in both blocks had the same configuration. A max-pooling layer and an FC layer were added to compute the estimate.
The train-validation-test scheme [26] was employed to train and evaluate the multi-task CNNs. The initial training dataset included 150,000 signal-present images and 150,000 signal-absent images for the BKS detection-estimation task with the lumpy background. For the task that involved the clustered lumpy background, 200,000 signal-present images and 200,000 signal-absent images were employed. To mitigate overfitting, a "semi-online learning" method [9], [10] was employed, in which measurement noise was generated on the fly and added to noiseless images drawn from the finite initial training dataset. The validation dataset included 1000 signal-present images and 1000 signal-absent images. Finally, the test dataset comprised 1000 signal-present images and 1000 signal-absent images.
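The semi-online learning idea can be summarized as a batch generator that reuses a finite set of noiseless images but draws fresh noise each time. The sketch below is a hypothetical helper (name, arguments, and sampling with replacement are assumptions), not the implementation used in this study.

```python
import numpy as np

def semi_online_batches(clean_sp, clean_sa, sigma_n, batch_per_class, rng):
    """Hypothetical generator illustrating semi-online learning.

    clean_sp / clean_sa: fixed arrays (n, H, W) of noiseless signal-present and
    signal-absent images. Fresh i.i.d. Gaussian noise (std sigma_n) is drawn for every
    mini-batch, so a finite set of noiseless images yields an unlimited stream of
    distinct noisy training images.
    """
    labels = np.concatenate([np.ones(batch_per_class), np.zeros(batch_per_class)])
    while True:
        i1 = rng.integers(0, len(clean_sp), batch_per_class)   # random signal-present picks
        i0 = rng.integers(0, len(clean_sa), batch_per_class)   # random signal-absent picks
        clean = np.concatenate([clean_sp[i1], clean_sa[i0]])
        noisy = clean + rng.normal(0.0, sigma_n, clean.shape)  # noise generated on the fly
        # i1 lets the caller look up the target parameters theta_i of the signal-present images
        yield noisy, labels, i1
```

With batch_per_class = 200, each yielded batch would match the 200 signal-present plus 200 signal-absent composition of the mini-batches described below.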
The multi-task CNN was trained by minimizing the loss functions described in Eqn. (14) on mini-batches in an alternating fashion. At each iteration of the training process, Eqn. (14b) was minimized first and Eqn. (14a) was minimized subsequently. Multi-task CNNs comprising different numbers of layers were trained for 200,000 mini-batches. Each mini-batch contained 200 signal-absent images and 200 signal-present images randomly selected from the training dataset. The Adam optimizer [27] with a learning rate of 0.00001 was employed for model training.
In order to accurately approximate the IO, the CNN architecture needs to possess sufficient capacity [9]. To determine the optimal architecture of the multi-task CNN, the training process started from an architecture with one convolutional layer in both the Shared Conv and Estimation Conv blocks, which formed the baseline detection and estimation sub-networks. More layers were then added gradually as follows. The optimal number of convolutional layers for the detection sub-network was determined by adding layers to the Shared Conv block until the cross-entropy on the validation dataset no longer decreased significantly. A decrease in the loss function was considered significant if it was at least 1.0% of the loss produced by the multi-task CNN with one fewer convolutional layer. After the number of layers for the detection sub-network was determined, the optimal number of layers for the estimation sub-network was determined by increasing the number of convolutional layers in the Estimation Conv block. The training and implementation of the multi-task CNN were performed using TensorFlow [28].
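The stopping rule for growing a block can be written as a small function. The sketch below takes a hypothetical list of validation losses indexed by depth and returns the depth at which one more layer no longer reduces the loss by at least 1.0% of the previous value; using the magnitude of the previous loss, so the rule also works for the negative-utility estimation loss, is an added assumption.

```python
def select_num_layers(val_losses, rel_tol=0.01):
    """Return the number of layers at which adding one more stops helping significantly.

    val_losses[k] is a (hypothetical) validation loss for a network with k + 1 layers in
    the block being grown. A decrease counts as significant if it is at least rel_tol
    (1.0%) of the magnitude of the loss obtained with one fewer layer.
    """
    n = 1
    while n < len(val_losses):
        prev, cur = val_losses[n - 1], val_losses[n]
        if prev - cur < rel_tol * abs(prev):
            break                  # the (n + 1)th layer did not help enough; keep n layers
        n += 1
    return n
```

For validation losses [0.60, 0.50, 0.45, 0.448, 0.447], the third layer still gives a significant decrease but the fourth does not, so three layers are selected.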

V. RESULTS
A. SKS/BKE signal detection-estimation tasks
The optimal network architecture was determined to contain three convolutional layers in the Shared Conv block and two layers in the Estimation Conv block. In Figure 5, the EROC curves produced by the approximated IO (black solid curve) and the sub-ideal NO (yellow dashed curve) are compared with that produced by the analytical computation (red dashed curve). The AEROC values were 0.565 ± 0.010, 0.565 ± 0.010, and 0.570 ± 0.010 corresponding to the approximated IO, the sub-ideal NO, and the analytical computation, respectively. The corresponding EROC curves were statistically equivalent in this SKS/BKE case.
Fig. 5: The EROC curves produced by the approximated IO (black), the analytical computation (red), and the sub-ideal NO (yellow) for the BKE task were statistically equivalent.

B. SKS/BKS signal detection-estimation tasks with a lumpy background model
The optimal network architecture was determined to contain seven convolutional layers in the Shared Conv block and three layers in the Estimation Conv block. To compare the proposed methods with the IO approximated by the MCMC method, the quadratic utility function described in Sec. IV-B with $\epsilon_1 = 200$ was employed. For this case, the EROC curves generated by use of the approximated IO (black solid curve) and the sub-ideal NO (yellow dashed curve) are compared with that produced by the MCMC method (red dashed curve) in Figure 6. The AEROC values were 0.697 ± 0.021, 0.686 ± 0.024, and 0.708 ± 0.027 corresponding to the approximated IO, the sub-ideal NO, and the MCMC-IO, respectively. The corresponding EROC curves were statistically equivalent in this BKS signal detection-estimation task.
Next, the quadratic utility function described in Sec. IV-B with $\epsilon_1 = 100$ was employed. Compared with the case addressed in Figure 6, the variance of $U(\mathbf{g})$ increased from 0.1708 to 0.7295, indicating that the estimation performance has a larger influence on the test statistic according to Eqn. (12). Because of this increase in variance, the hybrid approximated IO outperformed the sub-ideal NO. In Figure 7, the EROC curves generated by use of the approximated IO (black solid curve) and the sub-ideal NO (yellow dashed curve) are compared with that produced by the MCMC method (red dashed curve). The AEROC values were 0.545 ± 0.036, 0.486 ± 0.042, and 0.553 ± 0.031 corresponding to the approximated IO, the sub-ideal NO, and the MCMC-IO, respectively. The EROC curves corresponding to the approximated IO and the MCMC-IO are in close agreement in this SKS/BKS signal detection-estimation task, while the difference between the approximated IO and the sub-ideal NO is statistically significant. Intuition for this result can be gained by noting that, when the variance of $U(\mathbf{g})$ is small, Eqn. (12) can be approximated as $T_I(\mathbf{g}) \approx \bar{U}\, \Lambda(\mathbf{g})$, where $\bar{U}$ is the mean of $U(\mathbf{g})$. In this case, provided that $\bar{U} > 0$, the two test statistics differ (approximately) by a constant positive factor, so they rank the images in the same way and the hybrid approximated IO and the sub-ideal NO will perform similarly. This is not to be expected when the variance of $U(\mathbf{g})$ is large, because $T_I(\mathbf{g})$ and $\Lambda(\mathbf{g})$ then do not differ by a constant factor and are not, in general, related by a monotonic transformation.
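The ranking argument above can be illustrated with synthetic numbers. In the sketch below, lognormal draws stand in for Λ(g) (a purely illustrative assumption, not data from this study); multiplying them by a constant U(g) = 0.5 leaves the ordering unchanged, whereas multiplying by a widely spread U(g) that includes negative utilities reorders the images. Since both observers share the same estimator, identical orderings imply identical empirical EROC curves.

```python
import numpy as np

def same_ranking(a, b):
    """True if the two test statistics order the images identically.

    Observers that share the same estimator and rank images identically produce the
    same empirical EROC curve, because every threshold selects the same set of images.
    """
    return bool(np.array_equal(np.argsort(a, kind="stable"), np.argsort(b, kind="stable")))

rng = np.random.default_rng(0)
lam = rng.lognormal(0.0, 1.0, 1000)            # synthetic stand-ins for Lambda(g)
u_flat = np.full(lam.size, 0.5)                # U(g) with zero variance
u_spread = rng.uniform(-0.5, 1.0, lam.size)    # U(g) with large variance
flat_same = same_ranking(lam, lam * u_flat)
spread_same = same_ranking(lam, lam * u_spread)
```

Scaling by 0.5 (a power of two) is exact in floating point, so flat_same holds exactly; spread_same is false, mirroring the gap between the hybrid approximated IO and the sub-ideal NO when the variance of U(g) is large.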
To demonstrate a case in which the MCMC-IO cannot be applied but the hybrid method can, the $\ell_1$-norm-based utility function described in Sec. IV-B was employed. In this case, the SLO was employed as the reference observer. Relative to the quadratic utility function, the $\ell_1$-norm-based utility function assigns relatively more weight to small errors. As shown in Figure 8, the EROC curves generated by use of the approximated IO (black solid curve) and the sub-ideal NO (red dashed curve) are compared with that produced by the SLO (yellow dashed curve). The AEROC values corresponding to the approximated IO, the sub-ideal NO, and the SLO were 0.643 ± 0.018, 0.633 ± 0.018, and 0.294 ± 0.023, respectively. The EROC curves corresponding to the approximated IO and the sub-ideal NO were in close agreement in this task. As expected, the AEROC value produced by the SLO was significantly smaller; in particular, the sub-ideal NO significantly outperformed the SLO.

C. BKS signal detection-estimation tasks with a CLB model
To investigate a case in which only the purely supervised sub-ideal NO could be applied, the task involving the CLB model was employed. The optimal network architecture was determined to contain eight convolutional layers in the Shared Conv block and three layers in the Estimation Conv block. Because current MCMC methods have not been applied to the CLB object model, the hybrid method was not employed, and the SLO was employed as the reference method. The EROC curve corresponding to the sub-ideal NO (black dashed curve) is compared with that generated by the SLO (red dashed curve) in Figure 9. The AEROC value corresponding to the purely supervised sub-ideal NO was 0.601 ± 0.012, which was larger than the value of 0.538 ± 0.013 produced by the SLO.
Fig. 9: The EROC curves corresponding to the sub-ideal NO (black) are compared with that generated by SLO (red) for the BKS task with the clustered lumpy background model. The AEROC value produced by the sub-ideal NO was larger than that produced by the SLO.

VI. SUMMARY
General signal detection-estimation tasks are frequently considered in medical imaging. For detection-estimation tasks, the EROC curve has been proposed for evaluating the performance of observers. However, in practice, it is difficult to accurately approximate the IO that maximizes the AEROC for a general detection-estimation task. In this work, a hybrid approach was developed that combines a multi-task CNN and an MCMC method in order to approximate the IO for detection-estimation tasks. Compared with the MCMC-IO, the hybrid method is not limited to specific utility functions. Additionally, a supervised learning-based sub-ideal NO was designed for signal detection-estimation tasks. Both SKS/BKE and SKS/BKS tasks were considered, and computer-simulation studies were conducted to validate the proposed methods. The proposed hybrid method provides a new approach for approximating the IO and may advance the application of EROC analysis for optimizing imaging systems.
The proposed methods have certain limitations. Because the hybrid framework employs MCMC methods for IO approximation, it inherits their limitations. Numerous practical issues, such as the design of proposal densities from which the Markov chain can be efficiently generated, need to be addressed. Because of this, the hybrid method has been limited to relatively simple object models. An advanced method called MCMC-GAN [29] can potentially solve this problem: by replacing the classic MCMC methods in the hybrid method with MCMC-GAN, the IO approximation method could be extended to more complicated background models. In addition, the supervised learning-based method may require a large amount of training data to accurately train the multi-task CNNs. To address this limitation, one may establish a stochastic object model (SOM) from experimental data by training an AmbientGAN [30], [31]. Given a well-established SOM, one can produce a large number of training samples for the multi-task CNNs.
There remain several other topics for future investigation. The proposed methods should be further validated by use of a variety of image data and detection-estimation tasks that address real-world problems. Additionally, it will be important to quantify the effect of insufficient training data on the proposed methods. Finally, it will be interesting to employ the proposed methods to perform task-based performance evaluation of deep learning-based image restoration [32]-[35] and image reconstruction [36]-[38] techniques.

APPENDIX A APPROXIMATING THE IDEAL ESTIMATE USING MULTI-TASK CNNS
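According to Eqn. (6), the ideal estimate can be defined as:

$$\hat{\boldsymbol{\theta}}_I(\mathbf{g}) = \arg\max_{\hat{\boldsymbol{\theta}}} \, \Lambda(\mathbf{g}) \int u(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta}) \, p(\boldsymbol{\theta} \mid \mathbf{g}, H_1) \, d\boldsymbol{\theta}, \tag{30}$$

where $\Lambda(\mathbf{g})$ is the likelihood ratio. Because $\Lambda(\mathbf{g})$ does not depend on $\hat{\boldsymbol{\theta}}$, Eqn. (30) can be written as:

$$\hat{\boldsymbol{\theta}}_I(\mathbf{g}) = \arg\max_{\hat{\boldsymbol{\theta}}} \int u(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta}) \, p(\boldsymbol{\theta} \mid \mathbf{g}, H_1) \, d\boldsymbol{\theta}. \tag{31}$$

A supervised learning-based method can be employed to approximate $\hat{\boldsymbol{\theta}}_I(\mathbf{g})$ by maximizing the expectation $E[U(\mathbf{g})]$ over an ensemble of training data, where the expectation is taken over $\mathbf{g}$. Assuming that $\mathbf{g}$ is signal-present, i.e., $p(\mathbf{g}) = p(\mathbf{g} \mid H_1)$, $E[U(\mathbf{g})]$ can be written as:

$$E[U(\mathbf{g})] = \int p(\mathbf{g} \mid H_1) \int u(\hat{\boldsymbol{\theta}}(\mathbf{g}), \boldsymbol{\theta}) \, p(\boldsymbol{\theta} \mid \mathbf{g}, H_1) \, d\boldsymbol{\theta} \, d\mathbf{g} = E\left[u(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta})\right],$$

where the last expectation is taken over the random variables $\mathbf{g}$, $\hat{\boldsymbol{\theta}}$, and $\boldsymbol{\theta}$. Because $p(\mathbf{g} \mid H_1) \geq 0$, this expectation is maximized over all estimators by maximizing the inner integral for each $\mathbf{g}$, i.e., by the ideal estimate in Eqn. (31). Consider a training dataset $\{(\mathbf{g}_j, \boldsymbol{\theta}_j)\}_{j=1}^{J}$ that contains $J$ independent signal-present images and their corresponding parameter vectors. Similar to Wunderlich et al. [11], $E[u(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta})]$ can be estimated with the unbiased estimator:

$$\widehat{E}\left[u(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta})\right] = \frac{1}{J} \sum_{j=1}^{J} u(\hat{\boldsymbol{\theta}}_j, \boldsymbol{\theta}_j),$$

where $\hat{\boldsymbol{\theta}}_j$ is the observer's estimate of $\boldsymbol{\theta}_j$. Let $\mathbf{w}_2$ denote the weight vector that parameterizes the estimation sub-network, and let $\hat{\boldsymbol{\theta}}_j(\mathbf{w}_2)$ denote the signal parameter vector estimated by the CNN from $\mathbf{g}_j$. The desired weight vector $\mathbf{w}_I$ for the estimation sub-network can be determined by minimizing the following loss function:

$$\mathbf{w}_I = \arg\min_{\mathbf{w}_2} \left[ -\frac{1}{J} \sum_{j=1}^{J} u\big(\hat{\boldsymbol{\theta}}_j(\mathbf{w}_2), \boldsymbol{\theta}_j\big) \right],$$

which leads to the loss function introduced in Eqn. (14b).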
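The argument above reduces training of the estimation sub-network to minimizing the negative sample-mean utility. The sketch below illustrates that idea on toy data under loudly stated assumptions: a linear map θ̂ = W g stands in for the CNN estimation sub-network, the utility is a hypothetical Gaussian-shaped function (chosen because it is differentiable; a step-shaped utility would give zero gradient almost everywhere), and the "images" are random vectors. All function names are illustrative and do not come from this work.

```python
import numpy as np


def gaussian_utility(theta_hat, theta, sigma=1.0):
    """Hypothetical smooth utility in (0, 1]; equals 1 only for a perfect estimate."""
    sq_err = np.sum((theta_hat - theta) ** 2, axis=-1)
    return np.exp(-sq_err / (2.0 * sigma ** 2))


def utility_loss(W, g, theta, sigma=1.0):
    """Negative sample-mean utility -(1/J) sum_j u(theta_hat_j, theta_j) with theta_hat = W g."""
    theta_hat = g @ W.T                               # (J, P) estimates
    return -float(np.mean(gaussian_utility(theta_hat, theta, sigma)))


def utility_loss_grad(W, g, theta, sigma=1.0):
    """Analytic gradient of utility_loss with respect to W (shape (P, N))."""
    theta_hat = g @ W.T
    u = gaussian_utility(theta_hat, theta, sigma)     # (J,)
    resid = theta_hat - theta                         # (J, P)
    # d(-u_j)/dW = u_j * resid_j g_j^T / sigma^2, averaged over the J samples.
    return (u[:, None] * resid).T @ g / (sigma ** 2 * g.shape[0])


def train_linear_estimator(g, theta, sigma=1.0, lr=0.5, n_steps=500):
    """Plain gradient descent on the utility loss, starting from W = 0."""
    W = np.zeros((theta.shape[1], g.shape[1]))
    for _ in range(n_steps):
        W -= lr * utility_loss_grad(W, g, theta, sigma)
    return W


# Toy data (hypothetical): J "images" of N pixels, each with a P-dimensional parameter vector.
rng = np.random.default_rng(0)
J, N, P = 500, 16, 2
W_true = rng.normal(size=(P, N)) / np.sqrt(N)
g = rng.normal(size=(J, N))
theta = g @ W_true.T + 0.05 * rng.normal(size=(J, P))

W_fit = train_linear_estimator(g, theta)
print(utility_loss(np.zeros((P, N)), g, theta), utility_loss(W_fit, g, theta))
```

The printed pair compares the loss of the untrained estimator (W = 0) with that of the trained one, which should be markedly lower. In the approach described in this appendix, the multi-task CNN's estimation sub-network, rather than a linear map, would be trained with this kind of objective.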

APPENDIX B APPROXIMATING THE IO BY USE OF MCMC METHODS
In this section, a reference method based on MCMC techniques is described. This method can approximate the IO for a general signal detection-estimation task. The difference between the two methods is that the MCMC-based method employs MCMC techniques to approximate both the likelihood ratio $\Lambda(\mathbf{g})$ and the ideal estimate $\hat{\boldsymbol{\theta}}_I(\mathbf{g})$, whereas the hybrid method employs a multi-task CNN instead. The steps for approximating $\Lambda(\mathbf{g})$ and $\hat{\boldsymbol{\theta}}_I(\mathbf{g})$ are described below.
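As a rough illustration of the MCMC side of this comparison, the sketch below approximates an ideal estimate, i.e., the value of θ̂ that maximizes the posterior expected utility ∫ u(θ̂, θ) p(θ | g, H₁) dθ, by drawing samples from p(θ | g, H₁) with a random-walk Metropolis-Hastings sampler and then selecting the candidate with the largest Monte Carlo expected utility. It is a toy 1-D signal-location problem with a known zero background and white Gaussian noise, not necessarily the procedure in the steps below: the signal model, uniform prior, step utility, proposal width, burn-in length, and initialization at the brightest pixel are all hypothetical choices, and the sketch does not approximate Λ(g).

```python
import numpy as np


def signal(theta, x):
    """Hypothetical 1-D Gaussian blob (width 2 pixels, amplitude 1) centred at theta."""
    return np.exp(-0.5 * ((x - theta) / 2.0) ** 2)


def log_posterior(theta, g, x, noise_sd, lo, hi):
    """log p(theta | g, H1) up to a constant: uniform prior on [lo, hi], white Gaussian noise."""
    if not lo <= theta <= hi:
        return -np.inf
    resid = g - signal(theta, x)
    return -0.5 * np.sum(resid ** 2) / noise_sd ** 2


def metropolis_hastings(g, x, noise_sd, lo, hi, theta0, n_samples, step, rng):
    """Random-walk Metropolis-Hastings chain targeting p(theta | g, H1)."""
    theta = theta0
    lp = log_posterior(theta, g, x, noise_sd, lo, hi)
    chain = np.empty(n_samples)
    for k in range(n_samples):
        proposal = theta + step * rng.normal()        # symmetric Gaussian proposal
        lp_prop = log_posterior(proposal, g, x, noise_sd, lo, hi)
        if np.log(rng.uniform()) < lp_prop - lp:      # Metropolis acceptance test
            theta, lp = proposal, lp_prop
        chain[k] = theta
    return chain


def step_utility(theta_hat, theta, eps=0.5):
    """Hypothetical utility: 1 if the estimate lies within eps of the true value, else 0."""
    return (np.abs(theta_hat - theta) <= eps).astype(float)


def mcmc_ideal_estimate(samples, candidates, eps=0.5):
    """Candidate maximizing the Monte Carlo posterior expected utility (1/K) sum_k u(c, theta_k)."""
    expected_u = np.array([step_utility(c, samples, eps).mean() for c in candidates])
    return float(candidates[np.argmax(expected_u)])


rng = np.random.default_rng(1)
x = np.arange(64.0)
true_theta, noise_sd = 40.0, 0.2
g = signal(true_theta, x) + noise_sd * rng.normal(size=x.size)

theta0 = float(x[np.argmax(g)])                        # start the chain at the brightest pixel
chain = metropolis_hastings(g, x, noise_sd, 0.0, 63.0, theta0, 6000, 0.5, rng)
posterior_samples = chain[1000:]                       # discard burn-in
theta_hat_I = mcmc_ideal_estimate(posterior_samples, np.linspace(0.0, 63.0, 631))
print(theta_hat_I)
```

Swapping in a different utility only requires replacing `step_utility`, which mirrors the flexibility in the choice of utility discussed in this work; the cost is that each image requires its own Markov chain.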