Given a set of $n$ $d$-dimensional Boolean vectors with the promise that the vectors are chosen uniformly at random with the exception of two vectors that have Pearson correlation $\rho$ (Hamming distance $d \cdot \frac{1-\rho}{2}$), how quickly can one find the two correlated vectors? We present an algorithm which, for any constants $\epsilon, \rho > 0$ and $d \gg \frac{\log n}{\rho^2}$, finds the correlated pair with high probability and runs in time $O(n^{\frac{3\omega}{4}+\epsilon}) < O(n^{1.8})$, where $\omega < 2.38$ is the exponent of matrix multiplication. Provided that $d$ is sufficiently large, this runtime can be further reduced. These are the first subquadratic-time algorithms for this problem for which $\rho$ does not appear in the exponent of $n$, and they improve upon the $O(n^{2-O(\rho)})$ runtimes given by Paturi et al., Locality Sensitive Hashing (LSH), and the Bucketing Codes approach. Applications and extensions of this basic algorithm yield improved algorithms for several other problems:

Approximate Closest Pair: For any sufficiently small constant $\epsilon > 0$, given $n$ vectors in $\mathbb{R}^d$, our algorithm returns a pair of vectors whose Euclidean distance differs from that of the closest pair by a factor of at most $1+\epsilon$, and runs in time $O(n^{2-\Theta(\sqrt{\epsilon})})$. The best previous algorithms (including LSH) have runtime $O(n^{2-O(\epsilon)})$.

Learning Sparse Parity with Noise: Given samples from an instance of the learning parity with noise problem where each example has length $n$, the true parity set has size at most $k \ll n$, and the noise rate is $\eta$, our algorithm identifies the set of $k$ indices in time $n^{\frac{\omega+\epsilon}{3}k}\,\mathrm{poly}(\frac{1}{1-2\eta}) < n^{0.8k}\,\mathrm{poly}(\frac{1}{1-2\eta})$. This is the first algorithm with no dependence on $\eta$ in the exponent of $n$, aside from the trivial brute-force algorithm.

Learning $k$-Juntas with Noise: Given uniformly random length-$n$ Boolean vectors, each together with a label that is some function of just $k \ll n$ of the bits, perturbed by noise rate $\eta$, return the set of relevant indices. Leveraging the reduction of Feldman et al., our result for learning $k$-parities implies an algorithm for this problem with runtime $n^{\frac{\omega+\epsilon}{3}k}\,\mathrm{poly}(\frac{1}{1-2\eta}) < n^{0.8k}\,\mathrm{poly}(\frac{1}{1-2\eta})$, which improves on the previous best of $> n^{k(1-\frac{2}{2^k})}\,\mathrm{poly}(\frac{1}{1-2\eta})$.

Learning $k$-Juntas without Noise: Our results for learning sparse parities with noise imply an algorithm for learning juntas without noise with runtime $n^{\frac{\omega+\epsilon}{4}k}\,\mathrm{poly}(n) < n^{0.6k}\,\mathrm{poly}(n)$, which improves on the runtime $n^{\frac{\omega}{\omega+1}k}\,\mathrm{poly}(n) \approx n^{0.7k}\,\mathrm{poly}(n)$ of Mossel et al.
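
To make the problem statement concrete, the following is a minimal sketch of the planted-pair setup and of the trivial quadratic-time baseline that compares all pairs. It is not the subquadratic algorithm described above. The function names (`planted_instance`, `brute_force_pair`, `hamming`), the ±1 encoding of Boolean vectors, and the parameter values are assumptions chosen for illustration only.

```python
import random
from itertools import combinations


def planted_instance(n, d, rho, seed=0):
    """Return (vectors, planted_pair) for a hypothetical test instance.

    All n vectors are uniform over {-1, +1}^d, except that one vector is a
    noisy copy of another, so that the pair has correlation about rho.
    """
    rng = random.Random(seed)
    vectors = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(n)]
    i, j = rng.sample(range(n), 2)
    # Flip each coordinate independently with probability (1 - rho) / 2,
    # so the expected Hamming distance between the pair is d * (1 - rho) / 2.
    flip = (1 - rho) / 2
    vectors[j] = [-x if rng.random() < flip else x for x in vectors[i]]
    return vectors, (min(i, j), max(i, j))


def brute_force_pair(vectors):
    """Quadratic baseline: return the index pair with the largest inner product."""
    def inner(pair):
        a, b = vectors[pair[0]], vectors[pair[1]]
        return sum(x * y for x, y in zip(a, b))
    return max(combinations(range(len(vectors)), 2), key=inner)


def hamming(a, b):
    """Number of coordinates in which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


# Small illustrative run (assumed parameters, with d well above log(n) / rho^2).
vectors, planted = planted_instance(n=40, d=400, rho=0.6)
found = brute_force_pair(vectors)
distance = hamming(vectors[planted[0]], vectors[planted[1]])
```

With these parameters the planted pair's inner product concentrates near $\rho d = 240$, while the inner products of the unrelated pairs have standard deviation $\sqrt{d} = 20$, so the baseline should recover the planted pair. It does so at a cost of $\Theta(n^2 d)$ operations, which is the quadratic dependence on $n$ that the result above improves on.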