In this paper, we propose a novel path-preserving trellis-search (PPTS) algorithm and its high-speed VLSI architecture for soft-output multiple-input-multiple-output (MIMO) detection. We represent the search space of the MIMO signal with an unconstrained trellis, where each node in stage k of the trellis maps to a possible complex-valued symbol transmitted by antenna k. Based on the trellis model, we convert the soft-output MIMO detection problem into a multiple shortest paths problem subject to the constraint that every trellis node must be covered in this set of paths. The PPTS detector is guaranteed to have soft information for every possible symbol transmitted on every antenna so that the log-likelihood ratio (LLR) for each transmitted data bit can be more accurately formed. Simulation results show that the PPTS algorithm can achieve near-optimal error performance with a low search complexity. The PPTS algorithm is a hardware-friendly data-parallel algorithm because the search operations are evenly distributed among multiple trellis nodes for parallel processing. As a case study, we have designed and synthesized a fully-parallel systolic-array detector and two folded detectors for a 4 × 4 16-QAM system using a 1.08 V TSMC 65-nm CMOS technology. With a 1.18 mm2 core area, the folded detector can achieve a throughput of 2.1 Gbps. With a 3.19 mm2 core area, the fully-parallel systolic-array detector can achieve a throughput of 6.4 Gbps.