Before answering this question, it is worth explaining what outliers are and how they occur.

Outliers are data points that differ extremely from the rest of the data. They can arise simply by chance, or from measurement or sampling error. Whether to remove outliers is generally left to the researcher. This is because an outlier might be caused by the instrumentation, e.g. a low battery in a scale when measuring weight. Such an outlier is the product of an error and is not a true score: re-measuring with a working instrument would not give the same result. However, an outlier might also be a real data point, produced by an extremely intelligent (or relatively unintelligent) individual. Removing such an outlier would be dishonest, because it means removing a real data point, which is manipulating the data. A normal distribution contains extreme scores in both tails. This means that extreme scores are not necessarily an effect of poor concentration during the study; they may represent the real score for that particular individual (1).

Detecting outliers is difficult and time-consuming, especially when you have a data set containing, e.g., 80,000 scores from 150 participants. Rousseeuw and Leroy (1996) described ways of detecting outliers (2).
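To give a feel for what automated detection can look like, here is a minimal sketch that flags candidate outliers with the interquartile-range rule of thumb (Tukey's fences). This is a common convention and not necessarily one of the methods described by Rousseeuw and Leroy; the function name `flag_outliers`, the multiplier `k = 1.5`, and the sample weights are assumptions made up for illustration.

```python
import statistics

def flag_outliers(scores, k=1.5):
    """Return the scores lying outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in scores if x < lower or x > upper]

# Hypothetical weights in kg; 12.0 imitates a reading from a scale with a low battery.
weights = [61.2, 63.5, 64.0, 64.8, 65.1, 66.3, 67.0, 68.4, 12.0]
print(flag_outliers(weights))
```

A flagged value is only a candidate for inspection. Whether it is an error or a real score is still for the researcher to judge.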

There are ways of dealing with outliers other than simply getting rid of them. Removing them can sometimes cause problems, because they might be real scores. Instead, we can use robust statistics such as the median in place of the mean. The median is not as sensitive to outliers as the mean, because it is the middle point of the ordered data and a single outlier shifts it only slightly. The mean, by contrast, takes into account the value of every number, so a single outlier can strongly affect it (3). Another way of dealing with outliers is to use nonparametric tests (4). These do not require the assumptions of normality or homogeneity of variance, and they typically work with medians or ranks rather than means.
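The difference between the two statistics is easy to see with a small example. The sketch below uses invented IQ-style scores (the numbers and the helper name `summarise` are assumptions for illustration) and compares the mean and median before and after one extreme score is added.

```python
import statistics

def summarise(scores):
    """Return (mean, median) for a list of scores."""
    return statistics.mean(scores), statistics.median(scores)

# Hypothetical scores clustered around 100.
scores = [98, 100, 101, 102, 104]
# The same scores plus one extreme (but possibly real) value.
with_outlier = scores + [160]

print(summarise(scores))        # mean and median agree: both are 101
print(summarise(with_outlier))  # mean rises to about 110.8, median only to 101.5
```

One extra score moves the mean by almost ten points, while the median moves by half a point.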

When carrying out research it is very important to get valid results, and these need to be backed up by accurate data. It is the researcher’s responsibility to judge each outlier and to decide whether to get rid of it or ‘work around’ it. It is not dishonest to remove an outlier as long as the researcher has some evidence to suspect that it is not a real data point.

Further reading:

(1) http://stattrek.com/help/glossary.aspx?target=normal_distribution

(2) Rousseeuw, P., & Leroy, A. (1996). *Robust Regression and Outlier Detection* (3rd edition). John Wiley & Sons.

(3) http://www.ltcconline.net/greenl/courses/201/descstat/mean.htm

(4) http://www.une.edu.au/WebStat/unit_materials/c6_common_statistical_tests/nonparametric_test.html

(5) http://statistics.laerd.com/statistical-guides/pearson-correlation-coefficient-statistical-guide-2.php
