The retrieval of digital evidence responsive to discovery requests in civil litigation, known in the United States as "e-discovery," presents several important and understudied conditions and challenges. Among the most important of these are (i) that the definition of responsiveness that governs the search effort can be learned and made explicit through effective interaction with the responding party, (ii) that the governing definition of responsiveness is generally complex, deriving both from considerations of subject-matter relevance and from considerations of litigation strategy, and (iii) that the result of the search effort is a set (rather than a ranked list) of documents, sometimes quite a large one, that is turned over to the requesting party and that the responding party certifies to be an accurate and complete response to the request. This paper describes the design of an "interactive task" for the Text REtrieval Conference (TREC) Legal Track whose goal was to evaluate the effectiveness of e-discovery applications at the "responsive review" task. Notable features of the 2008 interactive task were high-fidelity human-system task modeling, authority control for the definition of "responsiveness," and relatively deep sampling for the estimation of Type I and Type II errors (expressed as "precision" and "recall"). The paper presents a critical assessment of the strengths and weaknesses of the evaluation design from the perspectives of reliability, reusability, and cost-benefit tradeoffs.
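As a minimal sketch of how sampling can yield set-based effectiveness estimates of the kind the abstract mentions: the snippet below estimates precision and recall from relevance judgments drawn separately from the retrieved and unretrieved strata. The two-stratum design, function name, and numbers are illustrative assumptions for exposition, not the track's actual sampling protocol.

```python
def estimate_set_metrics(n_retrieved, n_unretrieved,
                         sampled_retrieved_relevant, sampled_retrieved_total,
                         sampled_unretrieved_relevant, sampled_unretrieved_total):
    """Estimate precision and recall of a retrieved *set* (not a ranked list)
    from simple random samples judged within each stratum."""
    # Scale the judged fraction of each sample up to its stratum size
    # to estimate how many responsive documents each stratum contains.
    rel_retrieved = n_retrieved * sampled_retrieved_relevant / sampled_retrieved_total
    rel_unretrieved = n_unretrieved * sampled_unretrieved_relevant / sampled_unretrieved_total

    # Precision: estimated responsive fraction of the produced set.
    precision = rel_retrieved / n_retrieved
    # Recall: estimated responsive documents produced, over all estimated
    # responsive documents in the collection.
    recall = rel_retrieved / (rel_retrieved + rel_unretrieved)
    return precision, recall

# Hypothetical example: a 100,000-document production from a
# 1,000,000-document collection, with 100 documents judged per stratum.
p, r = estimate_set_metrics(100_000, 900_000, 80, 100, 2, 100)
```

With these illustrative numbers, 80 of 100 sampled produced documents judged responsive gives an estimated precision of 0.8, while the 2-in-100 responsive rate in the much larger unretrieved stratum pulls the estimated recall down to roughly 0.82. Deeper sampling narrows the confidence intervals around both estimates, which is why the depth of sampling matters for the evaluation design.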