This paper proposes an algorithm based on rough set theory for missing data estimation and, for the first time, applies a rough set technique for missing data estimation to a large, real database. The premise of this work is that in large databases, missing values are likely to be correlated with other variables observed elsewhere in the same data. Instead of approximating the missing data, it may be cheaper to identify indiscernibility relations between the observed data instances and those that contain missing attributes. Results obtained on the HIV database are acceptable, with accuracies ranging from 74.7% to 100%. One drawback of the method is that it performs no extrapolation or interpolation and, as a result, can only be used if the case with missing data is similar or related to another case with more observations.
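The idea of filling a missing value from indiscernible cases can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes categorical records stored as dictionaries, with `None` marking a missing value, and it treats two records as indiscernible when they agree on every attribute the incomplete record has observed. The function name `impute_by_indiscernibility`, the attribute names, the toy records, and the majority-vote rule for conflicting donors are all hypothetical choices made for illustration.

```python
from collections import Counter


def impute_by_indiscernibility(records):
    """Fill None values using records that match on all observed attributes."""
    imputed = []
    for rec in records:
        filled = dict(rec)
        # Attributes this record actually has values for.
        observed = {k: v for k, v in rec.items() if v is not None}
        for attr, value in rec.items():
            if value is not None:
                continue
            # Donors: other records that have a value for `attr` and are
            # indiscernible from `rec` on every attribute `rec` has observed.
            candidates = [
                other[attr]
                for other in records
                if other is not rec
                and other.get(attr) is not None
                and all(other.get(k) == v for k, v in observed.items())
            ]
            if candidates:
                # Assumption: conflicting donors are resolved by majority vote.
                filled[attr] = Counter(candidates).most_common(1)[0][0]
            # With no donor the value stays None: nothing is interpolated
            # or extrapolated.
        imputed.append(filled)
    return imputed


# Hypothetical toy data, not drawn from the HIV database.
records = [
    {"age_group": "20-29", "education": "secondary", "status": "negative"},
    {"age_group": "20-29", "education": "secondary", "status": "negative"},
    {"age_group": "30-39", "education": "primary", "status": "positive"},
    {"age_group": "20-29", "education": "secondary", "status": None},
    {"age_group": "40-49", "education": "tertiary", "status": None},
]
result = impute_by_indiscernibility(records)
```

In this toy run the fourth record matches the first two on its observed attributes, so its `status` is filled from them. The fifth record has no indiscernible counterpart and stays missing, which mirrors the drawback noted above.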