We formulate edge detection as statistical inference. This statistical edge detection is data driven, unlike standard methods for edge detection which are model based. For any set of edge detection filters (implementing local edge cues), we use presegmented images to learn the probability distributions of filter responses conditioned on whether they are evaluated on or off an edge. Edge detection is formulated as a discrimination task specified by a likelihood ratio test on the filter responses. This approach emphasizes the necessity of modeling the image background (the off-edges). We represent the conditional probability distributions nonparametrically and illustrate them on two different data sets of 100 (Sowerby) and 50 (South Florida) images. Multiple edges cues, including chrominance and multiple-scale, are combined by using their joint distributions. Hence, this cue combination is optimal in the statistical sense. We evaluate the effectiveness of different visual cues using the Chernoff information and Receiver Operator Characteristic (ROC) curves. This shows that our approach gives quantitatively better results than the Canny edge detector when the image background contains significant clutter. In addition, it enables us to determine the effectiveness of different edge cues and gives quantitative measures for the advantages of multilevel processing, for the use of chrominance, and for the relative effectiveness of different detectors. Furthermore, we show that we can learn these conditional distributions on one data set and adapt them to the other with only slight degradation of performance without knowing the ground truth on the second data set. This shows that our results are not purely domain specific. We apply the same approach to the spatial grouping of edge cues and obtain analogies to nonmaximal suppression and hysteresis.