Genotype data provide crucial information for understanding the effects of genetic variation on human health. Current microarray technologies can generate raw genotype data from thousands of samples across millions of SNP sites. These raw data are processed by computational methods, called genotype callers, to obtain genotypes. Genotype calls from different callers may be inconsistent because of noise from bad samples or bad SNPs. This requires a manual quality control step, conducted by experts, to remove bad samples or bad SNP sites. In this paper, we propose a maximum likelihood method for detecting bad samples to improve the reliability of the results. Experiments with real data demonstrate the usefulness of our method in the quality control process. Our method can therefore reduce the number of samples that experts must check manually.