
Archive for October, 2011

Homework for my TA

Dear Kat, here are links to the blogs I have left comments on (Weeks 4/5):

http://statsbloggy.wordpress.com/2011/10/14/is-it-dishonest-to-remove-outliers-andor-transform-data/

http://statsblog2011.wordpress.com/2011/10/14/just-because-its-significant-doesnt-mean-its-significant-discuss/

http://skakov87.wordpress.com/2011/10/14/is-it-possible-to-prove-research-hypothesis/

http://roisin07m.wordpress.com/2011/10/12/week-3-wildcard-blog-is-it-dishonest-to-remove-outliers-from-data/

http://psychrno.wordpress.com/2011/10/14/is-it-dishonest-to-remove-outliers/

http://dsm1lp.wordpress.com/2011/10/14/should-the-taxpayer-support-basic-research-or-should-the-research-funding-emphasis-be-placed-on-applied-research-in-which-the-beneficiaries-are-clear/


Is it dishonest to remove outliers from your data?

Before answering this question it is worth explaining what outliers are and how they occur.

Outliers are data points that differ extremely from the rest of the data. They can arise by chance, or from measurement or sampling error. The decision to remove an outlier is generally left to the researcher. This is because an outlier might be caused by the instrumentation, e.g. a low battery in a scale when measuring weight. Such an outlier is the product of an error, not a true score: measuring again would not give the same result. However, an outlier might also be a real data point, produced for example by an extremely intelligent (or relatively unintelligent) individual. Removing such an outlier would be dishonest, because it means removing a real data point, which is manipulating the data. A normal distribution contains extreme scores in both tails. This means that extreme scores are not necessarily an effect of poor concentration during the study; they may represent the real score for that particular individual (1).

 

It is very difficult and time-consuming to detect outliers, especially when you have a data set containing e.g. 80 000 scores from 150 participants. Rousseeuw and Leroy (1996) described ways of detecting outliers (2).
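To give a feel for what automatic detection can look like, here is a minimal sketch in Python. It uses a common rule of thumb (the "modified z-score" based on the median and the median absolute deviation, with a cut-off of 3.5). This rule is my assumption for illustration, not a method taken from this post or from reference (2), and the weights are invented.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation. The 3.5 cut-off is a rule of thumb,
    not a law, and is assumed here only for illustration.
    """
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        # More than half the scores are identical; the rule cannot be applied.
        return []
    return [v for v in values if 0.6745 * abs(v - centre) / mad > threshold]

# Hypothetical weights in kg; 6.1 looks like a slipped decimal point.
weights = [61.2, 58.9, 63.4, 60.1, 59.7, 62.0, 6.1]
flagged = mad_outliers(weights)
print(flagged)
```

A rule like this only flags candidates. Whether a flagged score is an error or a real (if unusual) person is still the researcher's call.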

 

There are different ways of dealing with outliers other than simply getting rid of them. Removing them can sometimes cause problems, because they might be real scores. What we can do instead is use robust statistics such as the median rather than the mean. The median is not as sensitive to outliers as the mean, because it is the middle point of the ordered scores and a single outlier only pushes it slightly, by about one place. The mean, on the other hand, takes into account the value of every number, so a single outlier can strongly affect it (3). Another way of dealing with outliers is to use nonparametric tests (4). The reason for this is that they do not require the assumptions of normality or homogeneity of variance, and many of them work with ranks or medians instead of means.
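The difference between the mean and the median is easy to see with a few numbers. The sketch below uses made-up test scores (not data from any study) and Python's built-in statistics module to compare both measures before and after one extreme score is added.

```python
from statistics import mean, median

# Hypothetical test scores, invented purely for illustration.
scores = [98, 100, 101, 102, 104]
with_outlier = scores + [190]  # the same scores plus one extreme value

def summarise(values):
    """Return (mean, median) of a list of numbers."""
    return mean(values), median(values)

mean_before, median_before = summarise(scores)
mean_after, median_after = summarise(with_outlier)

print(f"without outlier: mean={mean_before}, median={median_before}")
print(f"with outlier:    mean={mean_after:.2f}, median={median_after}")
```

In this example the single extreme score moves the mean by almost 15 points, while the median moves by only half a point.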

 

When carrying out research it is very important to get valid results, so accurate data is needed to back them up. It is the researcher’s responsibility to judge every outlier and to decide whether to get rid of it or ‘work around’ it. It is not dishonest to remove an outlier as long as the researcher has some evidence to suspect that the outlier is not a real data point.

 

Further reading:


(1)          http://stattrek.com/help/glossary.aspx?target=normal_distribution


(2)          Rousseeuw, P., & Leroy, A. (1996). Robust Regression and Outlier Detection (3rd ed.). John Wiley & Sons.

(3)          http://www.ltcconline.net/greenl/courses/201/descstat/mean.htm

(4)          http://www.une.edu.au/WebStat/unit_materials/c6_common_statistical_tests/nonparametric_test.html

(5)          http://statistics.laerd.com/statistical-guides/pearson-correlation-coefficient-statistical-guide-2.php

Dear TA :o)

I have commented on the blogs of the following people:

Statsbloggy

Bartlettdan

lon03

roisin07m

roisin07m’s comment on kmusial’s blog

Abbydoesmath

Is it possible to prove research hypothesis?

Scientific theories cannot be proven. However, a hypothesis or a theory is accepted as truth when it provides the best explanation of what we can observe. We can never prove a research hypothesis true, but by using parsimonious theories in our research we can certainly approach the truth.

According to Karl Popper, a falsifiable theory is scientific, and a theory that cannot be falsified is not scientific (1). The principle of falsifiability was not meant to determine whether a theory is true or false, or whether it is acceptable. Simply saying that something is falsifiable doesn’t mean that it is false. Karl Popper believed that “no theory is completely correct, but if not falsified, it can be accepted as truth.”

Theories are very often strongly backed up by experiments. But no matter how much experimental evidence we have for a theory, we can never prove it. Let’s consider the famous black swan problem. One could report white swans for ten years and still not be able to claim that all swans are white, simply because a single black (or even pink) swan would disprove the theory.

As we carry out new experiments, we gather more information about a field. In some cases we can prove a hypothesis or a theory false. If this happens, the entire field might suddenly change direction, very often by 180°. This is what Thomas Kuhn called a paradigm shift (2). One example is that cancer was first thought to arise from ‘germ cells’ (3) (Wicha M.S., 2005); now we know that cancer arises from cell mutation.

So, to conclude, it is not possible to prove a research hypothesis, even if a theory is backed up by an enormous amount of evidence in the form of experiments. That’s because a single experiment can change the entire view of the subject.

 

                 

 

(1)                 http://www.experiment-resources.com/falsifiability.html

(2)                Thomas Kuhn, 1962, The Structure of Scientific Revolutions

(3)                http://cancerres.aacrjournals.org/content/66/4/1883.short

(4)           http://www.youtube.com/watch?v=1_DjHkn0Kp8


     

Do you need statistics to understand your data?

In science and psychology, understanding the data is crucial. This is because, unlike many companies in the media, scientists value true results, whereas media researchers often interpret their data to get the results they want rather than true, accurate and unbiased ones. According to Mark Suster (1), almost 75% of all media statistics are manipulated to some extent or even made up. The other downside is that people believe statistics, and there is very little we can do about it. Almost 79% agreed with the statement “Statistics can be trusted to give an accurate description of the facts” (2). This is why we need statistics to understand the data. Otherwise we will find it very difficult to tell the ‘fake’ results from the accurate and reliable ones.

 

Unfortunately, the data on its own is not always clear enough to conclude whether a treatment has an effect or not. This is why researchers need to back themselves up with statistics. Scientists collect data using quantitative research, which produces data in the form of numbers, and qualitative research, which produces non-numerical data. They use tools like SPSS to help them decide whether or not there is an effect. They use software like Excel to plot graphs, histograms and bar charts that help them visualise the data. Before they draw any conclusion, they need statistical knowledge to interpret and analyse their results. If something does not go as planned, researchers need their statistical knowledge to solve the statistical problems that arise (3). Therefore, for someone who has never used statistics before, carrying out research can be really hard.

 

On the other hand, people could say that you don’t need statistics to understand the data. They would back themselves up by saying that researchers often make predictions about the data, but are those reliable enough? And how accurate can those predictions possibly be if made by someone who does not understand statistics? “When you make a prediction, would you rather be correct or incorrect?” (4). Clearly, guessing is not science.

 

(1)    http://www.businessinsider.com/736-of-all-statistics-are-made-up-2010-2

(2)    http://chamblee54.wordpress.com/2011/08/05/i-personally-believe-statistics/

(3)    http://www.amstat.org/publications/jse/v2n1/garfield.html

(4)    http://www.apa.org/pubs/books/4316000c.pdf
