Question for statisticians or others well versed in statistics

My son is in 6th grade at a private school and is doing a final project in his science class where he is supposed to collect data and then analyze it and present it. Part of the analysis is to perform a t-test to determine whether there is a significant difference between data obtained under one condition vs another.


(As an aside, I think it is idiotic to have 6th graders doing a t-test, but it is what it is.)


My understanding of t-tests is that both sets of data need to be normally distributed for the t-test to be valid.


Here are the data:


Condition 1: 0, 3, 3, 4, 4, 4, 5, 5, 7, 8


Condition 2: 0, 0, 1, 3, 3, 4, 5, 5, 6, 8


My feeling is that the data for condition 1 may be normally distributed and the data for condition 2 may not be normally distributed. Would you agree? Is there a way to know for sure?


If the data for either condition are not normally distributed, does that mean that a t-test is not a valid statistical test to compare these particular sets of data?






Certain types of data can be assumed to be normally distributed: if it is a type of data that generally produces a normal bell curve, then it is most likely normal. Small sample sizes make it hard to tell whether the data are truly normally distributed. It has been a while (since before children) since I have worked with stats, though...


I will say that I agree with you; even college students should not be randomly running t-tests uninformed. My husband was doing an experiment for one of his college classes and couldn't figure out why his data were not showing a significant difference when they should have. He was running the standard t-test when his data were more appropriate for a z-test. He has since studied stats and now knows more about the subject than I do, since he took his stats classes more recently.


"If the data for either condition are not normally distributed, does that mean that a t-test is not a valid statistical test to compare these particular sets of data?"


I'll let Ruth (lewelma) answer in terms of high school statistics since my stats knowledge is very rusty.

However, for a science fair project with a sample size of 10, your son can apparently still run a t-test without first testing whether the data are normally distributed.


Page 2, b iii http://cusef.byu.edu/documents/teachers/CUSEF_statistics_for_science_fair_students.pdf

Single page http://www.scitechfestival.org/pdf/statisticsdocument.pdf

Single page http://sciencefair.math.iit.edu/analysis/ttest/


I would consider the goal of this assignment. He is in 6th grade -- I assume he has not taken much in the way of statistics. So your goal is to discuss how a statistical test works. How do you compare the locations of 2 populations when you have sampled? What exactly do the probabilities mean? How can you accept or reject your hypothesis? etc.


Given those would be my goals, I would skip any formal test of normality and just have him graph the 2 data sets. He should be able to recognize that the 2nd one is not normal (you can definitely see it with a graph), and tell him that technically the conditions for the test are not met. But because he is supposed to do a t-test, I would have him just do the test anyway, and have him mention in the discussion that the test will be less reliable because the data are not normal.
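For anyone who wants to do the graphing step without graph paper, here is a minimal sketch in Python that prints a text dot plot for each condition. The data are copied from the original post; the function name `dot_plot` is just an illustrative choice, and the plot assumes the measurements are whole numbers, as they are here.

```python
from collections import Counter

# Data copied from the original post.
condition_1 = [0, 3, 3, 4, 4, 4, 5, 5, 7, 8]
condition_2 = [0, 0, 1, 3, 3, 4, 5, 5, 6, 8]


def dot_plot(values):
    """Return one text row per integer value, with one '*' per observation."""
    counts = Counter(values)
    rows = []
    for v in range(min(values), max(values) + 1):
        rows.append(f"{v:>2} | " + "*" * counts[v])
    return "\n".join(rows)


print("Condition 1")
print(dot_plot(condition_1))
print()
print("Condition 2")
print(dot_plot(condition_2))
```

Condition 1 piles up around 3 to 5, while condition 2 is flatter, with two observations at 0 and no single peak. With ten points each, the picture is only suggestive.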


If he really wants to do it right, he should do a Mann-Whitney test comparing medians. This test can be easily done by hand and does not require the data to be normal.
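As a rough illustration of how the Mann-Whitney calculation goes by hand, the sketch below counts, for every pair of one value from each condition, which one is larger, with ties counting as one half. The function name `mann_whitney_u` is made up for this example, and the critical value of 23 (two-sided, 0.05 level, ten observations per group) is an assumption taken from commonly printed tables, not from this thread.

```python
# Data copied from the original post.
condition_1 = [0, 3, 3, 4, 4, 4, 5, 5, 7, 8]
condition_2 = [0, 0, 1, 3, 3, 4, 5, 5, 6, 8]


def mann_whitney_u(a, b):
    """Return (U_a, U_b) by direct pair counting; each tie counts as 0.5."""
    u_a = 0.0
    for x in a:
        for y in b:
            if x > y:
                u_a += 1.0
            elif x == y:
                u_a += 0.5
    # Every pair contributes exactly 1 in total, so the two U values sum to len(a) * len(b).
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


u1, u2 = mann_whitney_u(condition_1, condition_2)
u = min(u1, u2)

# Assumed table value for n1 = n2 = 10, two-sided test at the 0.05 level.
U_CRITICAL = 23

print(f"U1 = {u1}, U2 = {u2}, U = {u}")
print("significant at 0.05" if u <= U_CRITICAL else "not significant at 0.05")
```

For these data the counts come out to U1 = 58 and U2 = 42, so U = 42, which is well above the assumed table value and therefore not significant.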




Ruth in NZ


If I remember right, the t-test has a little history. It was developed at a beer company (Guinness) to compare beers. I always thought that part was interesting.

Anyhow, I do not think there are even enough data points to claim that the data are or are not normally distributed. If I remember right, the rule of thumb was that you need at least 30 sample points. I would just treat it as a normal distribution and calculate the mean and sigma as is.


To test normality, in industry we normally pick Anderson-Darling. (But again, I don't think you have enough samples.) Essentially, I think, it assumes normality and calculates how far the data differ from it. Anyhow, I agree with Ruth; I wouldn't bother going that far. Just assume the data are normally distributed, since you can't say they are not based on this sample size.
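To make the "assume normality and measure the difference" idea a bit more concrete, here is a hedged sketch of the Anderson-Darling A² statistic using only the Python standard library. It fits a normal distribution using the sample mean and standard deviation, then measures how far the sorted data sit from that fitted curve. The function name is hypothetical, and no critical values are included, so this shows only the mechanics. The test also assumes continuous data, which these tied whole-number measurements are not.

```python
import math
from statistics import NormalDist, mean, stdev

# Data copied from the original post.
condition_1 = [0, 3, 3, 4, 4, 4, 5, 5, 7, 8]
condition_2 = [0, 0, 1, 3, 3, 4, 5, 5, 6, 8]


def anderson_darling_normal(values):
    """Return the A^2 statistic against a normal fitted to the sample itself."""
    ys = sorted(values)
    n = len(ys)
    fitted = NormalDist(mean(ys), stdev(ys))
    cdf = [fitted.cdf(y) for y in ys]
    total = 0.0
    for i in range(n):
        # The textbook weight (2i - 1) for 1-based i is (2i + 1) for 0-based i.
        total += (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
    return -n - total / n


a2_1 = anderson_darling_normal(condition_1)
a2_2 = anderson_darling_normal(condition_2)
print(f"Condition 1: A^2 = {a2_1:.3f}")
print(f"Condition 2: A^2 = {a2_2:.3f}")
```

A larger A² means a larger departure from the fitted normal curve. Deciding whether a value is "too large" needs a table of critical values for the chosen significance level, and with ten points the test has little power either way.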


I agree with the posts above. Data should always be graphed and eyeballed. You can do a very simple check of whether the data are normally distributed: calculate both the mean and the median. If they are different, the distribution is skewed, i.e. not normally distributed, and a t-test should not be performed; the Mann-Whitney U test should be used instead. Skewed data are common with small sample sizes. And yes, this is well beyond 6th grade (and thus rather silly). I'm not sure that it teaches much beyond bad habits.
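The mean-versus-median check is quick to run on the posted data. This is only a sketch with Python's standard `statistics` module; how big a gap counts as "different" is not defined anywhere in this thread, so the code just reports the gap.

```python
from statistics import mean, median

# Data copied from the original post.
condition_1 = [0, 3, 3, 4, 4, 4, 5, 5, 7, 8]
condition_2 = [0, 0, 1, 3, 3, 4, 5, 5, 6, 8]

for name, data in (("Condition 1", condition_1), ("Condition 2", condition_2)):
    m = mean(data)
    med = median(data)
    print(f"{name}: mean = {m:.2f}, median = {med:.2f}, difference = {m - med:+.2f}")
```

Condition 1 comes out with mean 4.30 and median 4.00, while condition 2 has mean and median both at 3.50. So by this check alone condition 2 looks symmetric, which is a reminder that the check is coarse with ten points and works best alongside a graph.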



If it is required, it certainly won't hurt to do it. I studied statistics in a social science department. The classes I took were taken by folks getting their PhD's in social science; I had an amazing professor who was both on the cutting edge of his field and was a great teacher even of struggling grad students who didn't really want to take statistics.


I remember little of the actual math we studied, but this guy's lectures on how to use statistics have stuck with me.


I remember being astounded that my dad who taught microbiology and did research at a med school only used t-tests while we had a million other tests, but I learned that for most basic science t-tests were THE test at the time (some of this may have had to do with the inability to even calculate other tests).


At any rate one of the areas we covered was how tests responded when the samples weren't big enough or some of the other parameters were violated. What we learned is that most statistical tests like t-tests were surprisingly robust.


He should not be surprised to find that he didn't find anything out because his test was not statistically significant. If anything he can learn now why folks "massage" their data.
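For reference, here is a minimal sketch of the t-test on the posted data, written with the Python standard library so each step is visible. It assumes the two conditions are independent (unpaired) groups with roughly equal variances, which the original post does not confirm. The critical value 2.101 is the standard two-sided 0.05 table value for 18 degrees of freedom, and `pooled_t` is just an illustrative name.

```python
import math
from statistics import mean, variance

# Data copied from the original post.
condition_1 = [0, 3, 3, 4, 4, 4, 5, 5, 7, 8]
condition_2 = [0, 0, 1, 3, 3, 4, 5, 5, 6, 8]


def pooled_t(a, b):
    """Return (t, degrees of freedom) for the equal-variance two-sample t-test."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / std_err, df


t, df = pooled_t(condition_1, condition_2)

# Two-sided 0.05 critical value from a t table for 18 degrees of freedom.
T_CRITICAL_DF18 = 2.101

print(f"t = {t:.3f}, df = {df}")
print("significant at 0.05" if abs(t) > T_CRITICAL_DF18 else "not significant at 0.05")
```

The means are 4.3 and 3.5, and t works out to about 0.735 on 18 degrees of freedom, far below 2.101, so under these assumptions the difference is not statistically significant.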


Is this data over time, or did you sort it from smallest to largest already? If you are measuring the same thing and getting 0 sometimes and 8 at other times, then the variation within each group is going to be very high, and the result of comparing the two groups is not going to be statistically significant. The t-test should tell you that.


On the other hand, if you are doing a measurement over time, or there is some other continuous process going on, then a t-test is probably not the right statistical test. I'd want to know more about your experiment.
