"Learning methods" play a key role in the fields of statistics, data mining, and artificial intelligence, intersecting with areas of engineering and other disciplines. These methods for analyzing and modeling data come in two flavors: supervised and unsupervised learning. Regression analysis and classification are two well-known supervised learning techniques. To obtain an effective model from regression analysis, it is necessary to check and preprocess the data set, especially when the data sets are large and high dimensional, as in astronomy, bioinformatics, image analysis, and computer vision. In these fields, large or fat data naturally contain unusual observations (outliers). Checking raw data for outliers in regression is known as regression diagnostics. Most of the popular diagnostic methods are not adequate for large, high-dimensional data. The aim of this paper is to provide a new measure for identifying influential observations in linear regression for large, high-dimensional data.