We address the difficult problem of estimating the reliability of multiple-version software. The central issue is the degree of statistical dependence between failures of diverse versions. Previously published models of failure dependence described what behavior could be expected "on average" from a pair of "independently generated" versions. We focus instead on predictions that use specific information about a given pair of versions. The concept of "variation of difficulty" among the situations to which software may be subject is central to the previous models cited, and it turns out to be central to our question as well. We provide new understanding of various alternative imprecise estimates of system reliability and some results of practical use, especially for diverse systems assembled from pre-existing (e.g., "off-the-shelf") subsystems. System designers, users, and regulators need useful bounds on the probability of system failure. We discuss how to use reliability data about the individual diverse versions to obtain upper bounds and other useful information for decision making. These bounds are greatly affected by how the versions' probabilities of failure vary between subdomains of the demand space or between operating regimes, and by the level of detail with which these variations are documented in the data. In some cases it is even possible to demonstrate, before operation, upper bounds that are very close to the true probability of failure of the system.
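To illustrate how documented variation between subdomains can tighten an upper bound, the following minimal sketch assumes a two-version system that fails only when both versions fail on the same demand (a 1-out-of-2 arrangement), and a demand space partitioned into subdomains with known probabilities. Under these assumptions, the probability that both versions fail within a subdomain cannot exceed the smaller of the two versions' failure probabilities there, so weighting these minima by the subdomain probabilities gives a bound that is never worse than the minimum of the two overall failure probabilities. The function name, parameter names, and all numbers are hypothetical and chosen only for illustration.

```python
def system_pfd_upper_bounds(profile, pfd_a, pfd_b):
    """Return (coarse_bound, subdomain_bound) for an assumed 1-out-of-2 system.

    profile[i] : probability that a demand falls in subdomain i (sums to 1)
    pfd_a[i]   : version A's probability of failure on demand in subdomain i
    pfd_b[i]   : version B's probability of failure on demand in subdomain i
    """
    if not (len(profile) == len(pfd_a) == len(pfd_b)):
        raise ValueError("all inputs must have one entry per subdomain")
    if abs(sum(profile) - 1.0) > 1e-9:
        raise ValueError("subdomain probabilities must sum to 1")

    # Overall (marginal) failure probability of each version.
    marginal_a = sum(p * a for p, a in zip(profile, pfd_a))
    marginal_b = sum(p * b for p, b in zip(profile, pfd_b))

    # Bound using only the overall figures: the system cannot fail
    # more often than its more reliable version.
    coarse_bound = min(marginal_a, marginal_b)

    # Bound using per-subdomain figures: apply the same argument
    # inside each subdomain, then weight by the demand profile.
    subdomain_bound = sum(
        p * min(a, b) for p, a, b in zip(profile, pfd_a, pfd_b)
    )
    return coarse_bound, subdomain_bound


# Hypothetical data: each version is weak in a different subdomain.
profile = [0.7, 0.3]
pfd_a = [1e-4, 1e-2]
pfd_b = [1e-2, 1e-4]

coarse, fine = system_pfd_upper_bounds(profile, pfd_a, pfd_b)
print(f"bound from overall data:   {coarse:.2e}")
print(f"bound from subdomain data: {fine:.2e}")
```

With these made-up figures the bound from overall data is about 3.07e-3, while the bound from subdomain data is about 1e-4, because the versions' difficulties lie in different subdomains. If both versions were weak in the same subdomain, the two bounds would be close to each other and the finer data would add little.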