One-Sample t-test


1. Definition of a One-Sample t-test
2. Test Applications
3. Critical Assumptions
4. Test strengths and weaknesses
5. Statistical formula
6. Examples
7. Additional resources

1. Definition

What is it?

Because the t procedures for inference about means are among the most widely used statistical methods, a solid understanding of the t-test is essential for every budding researcher. A one-sample t-test allows the researcher to compare a sample mean to a known value, usually the population mean, in order to determine the probability that the sample mean truly represents the population rather than misrepresenting it. The basic idea of the test is to compare the sample average with the population average, with an adjustment for the number of cases in the sample and the standard deviation of the sample (together these form the standard error of the mean).

In other words, the test helps you to determine: "Does your group come from a different population than the one you're trying to study and make inferences about?"

If you calculate a sample mean and it is different from the one hypothesized, there are two potential reasons for the difference:

  • Your sample comes from a different population.
  • Your sample comes from the same population, but its mean differs from the population mean by chance.

The density curves of t distributions are similar in shape to the standard normal curve, although the spread is slightly larger than that of the standard normal distribution. T distributions are also symmetric around 0 and bell-shaped.
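
To see this extra spread concretely, here is a minimal Python sketch (the degrees-of-freedom values are just illustrative) comparing the two-tailed 95% critical values of several t distributions with the standard normal value of about 1.96:

from scipy import stats

# Two-tailed 95% critical value of the standard normal (z) distribution
z_crit = stats.norm.ppf(0.975)
print(f"z critical value: {z_crit:.3f}")  # about 1.960

# t critical values are larger for small df and approach z as df grows,
# reflecting the heavier tails (larger spread) of the t distribution
for df in (4, 9, 29, 99):
    t_crit = stats.t.ppf(0.975, df)
    print(f"t critical value, df={df:>3}: {t_crit:.3f}")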

You may also see the one-sample t-test referred to by a few other names: sample t-test, Student's t, Student's t-test, or single-sample t-test.

2. Test Applications

When do I use it?

We use the t-statistic when σ (population standard deviation) is unknown and/or the sample size is small (n < 30). In these cases, a z test should no longer be used.
  • Because of its wide applicability, some programs, including SPSS, exclusively use t (and no longer z) to determine significance.


3. Critical Assumptions

What are the rules for using it and not using it?

It's pretty straightforward. Calculating the one sample t-test requires four basic components:
  1. The mean of the sample
  2. The mean of the population (or another known comparison value)
  3. The standard deviation of the sample
  4. The sample size (n)

But besides having those key pieces of information from your data, the distribution should be approximately normal (which can be checked with a Q-Q plot), the sample should be drawn randomly, and the observations must be independent. If you have outliers, skewness, or nonnormality in your data, using a nonparametric test or applying a transformation may result in a more accurate and powerful test.
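
As a rough sketch of how you might check the normality assumption before running the test (the sample values below are made up purely for illustration), a Q-Q plot and a Shapiro-Wilk test can be produced in Python with scipy and matplotlib:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical sample (illustrative values only)
sample = np.array([101, 98, 110, 95, 104, 99, 107, 103, 96, 108])

# Q-Q plot: points falling near the reference line suggest approximate normality
stats.probplot(sample, dist="norm", plot=plt)
plt.title("Q-Q plot of the sample")
plt.show()

# Shapiro-Wilk test: a small p-value casts doubt on the normality assumption
w_stat, p_value = stats.shapiro(sample)
print(f"Shapiro-Wilk W = {w_stat:.3f}, p = {p_value:.3f}")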

You should also know that there is a different t distribution for each sample size; a t distribution is therefore specified by giving its degrees of freedom (df). We use t(k) to stand for the t distribution with k degrees of freedom.

4. Test strengths and weaknesses

What can and can't the test do for me?

Thanks to W. S. Gosset, a statistician who worked for the Guinness brewing company in the early 1900s, we can test hypotheses even when σ is unknown and our sample size is small, through his development of the "Student's t" distribution. Gosset's employer did not allow him to publish discoveries under his own name, so he wrote under the pen name "Student", hence our present-day Student's t distribution. With his formula, we run less risk of Type I error in situations where σ is unknown and our sample size is small. This is especially helpful in situations where we cannot sample a large number of the population for ethical or practical reasons.

One potential problem with the t-test is that the researcher chooses the level of significance and on that basis retains or rejects the null hypothesis. Though not appropriate, a researcher could therefore manipulate the significance level (and the corresponding confidence interval) in order to obtain a desired outcome.

Another issue with the t-test is that the results are exactly correct only for normal populations, and real populations are never exactly normal. Fortunately, t procedures are quite robust against nonnormality in the population except in cases of outliers or strong skewness. Larger samples greatly improve the accuracy of the critical values in t distributions when the population is not normal. Recall the Central Limit Theorem: the sampling distribution of the sample mean from a large sample is close to normal. Also, as the sample size grows, the sample standard deviation approaches the population standard deviation. Therefore, a few practical guidelines help us here (a small simulation sketch follows these guidelines):

1) When the sample size is less than 15: use t procedures only if the data are close to normal.
2) When the sample size is at least 15: use t procedures except in the presence of outliers or strong skewness.
3) When the sample size is large (preferably 30 or above): t procedures can be used even for clearly skewed distributions.
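
The benefit of a larger sample can be illustrated with a small simulation (a sketch only; the exponential population and the number of replications are arbitrary choices). Samples are drawn from a clearly skewed population whose true mean equals the hypothesized value, so every rejection is a Type I error; the observed error rate of a nominal .05-level t-test moves toward .05 as the sample size grows:

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_mean = 1.0  # mean of an Exponential(scale=1) population, which is strongly skewed

# For each sample size, count how often the t-test (nominal alpha = .05)
# rejects a null hypothesis that is actually true
n_reps = 20_000
for n in (5, 15, 30, 100):
    false_rejections = 0
    for _ in range(n_reps):
        sample = rng.exponential(scale=1.0, size=n)
        t_stat, p_value = stats.ttest_1samp(sample, popmean=true_mean)
        if p_value < 0.05:
            false_rejections += 1
    print(f"n = {n:>3}: observed Type I error rate ≈ {false_rejections / n_reps:.3f}")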

5. Statistical formula

How do I actually do the test?

1. First establish the hypothesis:

Ho: Null Hypothesis - There is no significant difference between the sample mean and the population mean
Ha: Alternative Hypothesis - There is a significant difference between the sample mean and the population mean

Three different hypotheses can be looked at:

Case 1: Ho: µ ≤ µo vs. Ha: µ > µo (right-tailed test) - (i.e. this could look at potential improvement or increase)

Case 2: Ho: µ ≥ µo vs. Ha: µ < µo (left-tailed test) - (i.e. this could look at potential decrease)

Case 3: Ho: µ = µo vs. Ha: µ ≠ µo (two-tailed test) - (i.e. this could look at whether something is exactly the same or whether there is a difference)


*You must also establish the desired level of significance (e.g., .05 or .01).

2. With the four pieces of information noted in section 3, we can calculate t:

t = \frac{\overline{x} - \mu_0}{s / \sqrt{n}}
df = n - 1 (degrees of freedom)
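
A minimal Python sketch of this calculation from the four summary pieces (the numbers below are placeholders you would replace with your own data):

import math

# The four pieces from section 3 (placeholder values for illustration)
sample_mean = 52.0   # x-bar
pop_mean = 50.0      # mu_0, the known or hypothesized population mean
sample_sd = 6.0      # s, the sample standard deviation
n = 16               # sample size

standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - pop_mean) / standard_error
df = n - 1

print(f"t = {t_stat:.3f}, df = {df}")  # t = 1.333, df = 15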

3. Once you have calculated the t-statistic, you can determine whether it reaches the threshold of statistical significance by comparing the calculated t value with a standard table of t values, based upon the level of significance you identified in step 1.

4. You can then use the calculated t and the df (degrees of freedom) to calculate a p value and compare to your desired level of significance. Generally, a p value of .05 or less is needed to reject the null hypothesis.
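
Steps 3 and 4 can also be sketched in Python with the t distribution in scipy (continuing with the illustrative t = 1.333 and df = 15 from the sketch above):

from scipy import stats

t_stat, df, alpha = 1.333, 15, 0.05  # illustrative values

# Step 3: compare to the critical value (two-tailed here)
t_crit = stats.t.ppf(1 - alpha / 2, df)
print(f"critical value = {t_crit:.3f}, reject H0: {abs(t_stat) > t_crit}")

# Step 4: p-values for the three cases listed in step 1
p_right = stats.t.sf(t_stat, df)         # Case 1, Ha: mu > mu_0
p_left = stats.t.cdf(t_stat, df)         # Case 2, Ha: mu < mu_0
p_two = 2 * stats.t.sf(abs(t_stat), df)  # Case 3, Ha: mu not equal to mu_0
print(f"right-tailed p = {p_right:.4f}, left-tailed p = {p_left:.4f}, two-tailed p = {p_two:.4f}")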

*Keep in mind, big t and small p = significance!


6. Example

How is the test used in real-life examples?

Let's say that we hypothesize that MSU Social Work doctoral graduates score higher on licensing exams than graduates of social work doctoral programs across the country. From our sample of 25 MSU students, we calculate a mean licensing exam score of 105 and a standard deviation of 9. The mean of the population of national social work graduates is 95. From this information we can state our null and alternative hypotheses:
Ho: µ ≤ µo - The mean licensing scores of MSU doctoral social work grads are no higher than the mean licensing scores of the population of doctoral social work grads in the nation
Ha: µ > µo - The mean licensing scores of MSU doctoral social work grads are higher than the mean licensing scores of the population of doctoral social work grads in the nation


We'd like to test at the .05 level of significance.


t = \frac{\overline{x} - \mu_0}{s / \sqrt{n}} = \frac{105 - 95}{9 / \sqrt{25}} = 5.556
df = n - 1 = 24 (degrees of freedom)
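
Here is a small Python check of the same arithmetic using only the summary statistics above (the one-tailed p-value is the one that matches our right-tailed alternative):

import math
from scipy import stats

sample_mean, pop_mean, sample_sd, n = 105.0, 95.0, 9.0, 25

se = sample_sd / math.sqrt(n)           # 1.800
t_stat = (sample_mean - pop_mean) / se  # 5.556
df = n - 1                              # 24

p_one_tailed = stats.t.sf(t_stat, df)   # Ha: mu > 95
p_two_tailed = 2 * p_one_tailed

# 95% confidence interval for the difference between the two means
margin = stats.t.ppf(0.975, df) * se
low = (sample_mean - pop_mean) - margin
high = (sample_mean - pop_mean) + margin

print(f"t = {t_stat:.4f}, df = {df}")
print(f"one-tailed p = {p_one_tailed:.6f}, two-tailed p = {p_two_tailed:.6f}")
print(f"95% CI of difference: {low:.2f} to {high:.2f}")
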
You can use an online calculator as well:
http://www.graphpad.com/quickcalcs/OneSampleT2.cfm

A one sample t test compares the mean with a hypothetical value. In most cases, the hypothetical value comes from theory. For example, if you express your data as 'percent of control', you can test whether the average differs significantly from 100. The hypothetical value can also come from previous data. For example, compare whether the mean systolic blood pressure differs from 135, a value determined in a previous study.

Entering the sample mean (105), SD (9), and N (25), with 95 as the hypothetical mean, gives the following output:
One sample t test results

P value and statistical significance:
The two-tailed P value is less than 0.0001
By conventional criteria, this difference is considered to be extremely statistically significant.

Confidence interval:
The hypothetical mean is 95.00
The actual mean is 105.00
The difference between these two values is 10.00
The 95% confidence interval of this difference:
From 6.28 to 13.72

Intermediate values used in calculations:
t = 5.5556
df = 24
standard error of difference = 1.800
As you can see from the results, we have a big t (5.556) and a small p (two-tailed p < .0001; the one-tailed p for our directional hypothesis is smaller still), which equals significance!
Therefore we reject the null hypothesis and conclude that our data show that the mean MSU social work licensing scores are indeed higher than the mean licensing scores of the national population of social work graduates.

You can also use the calculator we looked at in class to find p for the example above at:
http://www.stat.tamu.edu/~west/applets/tdemo.html

Here is another example to think about:
  • You hear that the average person exercises 3 times a week. You think that middle school students exercise less frequently. You ask 10 middle school students how often they exercise each week. Here's what they say (2,1,0,2,3,5,1,2,4,2)
  • Your data shows the average amount of exercise to be 2.2 times per week.
  • Did you happen to pick a group of inactive middle schoolers by chance? Or do middle school students actually exercise less than the average person?
  • You compare the sample mean with the population mean by using a one sample t-test to get an estimate of the probability that the sample mean is different by chance.
  • You calculate the standard deviation to be 1.476
  • You can then calculate t = -1.714 and p = .1207 using the formula above or a website/chart (a Python sketch using these data follows this list)
  • http://www.graphpad.com/quickcalcs/OneSampleT2.cfm
  • One sample t test results
  • P value and statistical significance:
    The two-tailed P value equals 0.1207
    By conventional criteria, this difference is considered to be not statistically significant.

    Confidence interval:
    The hypothetical mean is 3.00000
    The actual mean is 2.20000
    The difference between these two values is -0.80000
    The 95% confidence interval of this difference:
    From -1.85587 to 0.25587

    Intermediate values used in calculations:
    t = 1.7140
    df = 9
    standard error of difference = 0.467


  • Review your data:
    Mean: 2.20000
    SD: 1.47600
    SEM: 0.46675
    N: 10
  • Your results are not statistically significant, so you continue to retain the notion that the average person (middle schoolers included) exercises 3 times a week, as you do not have enough evidence/data to indicate otherwise.
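
The exercise example can be reproduced directly from the raw data in Python (the alternative="less" option, available in newer versions of SciPy, gives the one-tailed p-value for the "middle schoolers exercise less" hypothesis; the default test is two-tailed):

import numpy as np
from scipy import stats

exercise = np.array([2, 1, 0, 2, 3, 5, 1, 2, 4, 2])  # times per week, from the 10 students
pop_mean = 3  # the claimed average for the general population

print(f"sample mean = {exercise.mean():.3f}, sample SD = {exercise.std(ddof=1):.3f}")

# Two-tailed test (matches the calculator output: t = -1.714, p = 0.1207)
t_stat, p_two = stats.ttest_1samp(exercise, popmean=pop_mean)
print(f"t = {t_stat:.4f}, two-tailed p = {p_two:.4f}")

# One-tailed test of Ha: mu < 3 (the alternative argument requires SciPy 1.6 or later)
t_stat, p_less = stats.ttest_1samp(exercise, popmean=pop_mean, alternative="less")
print(f"one-tailed p (Ha: mu < 3) = {p_less:.4f}")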

Here is another really good example online looking at prenatal care and birthweight!
http://ccnmtl.columbia.edu/projects/qmss/the_ttest/onesample_ttest.html

7. Resources

Where do I look if I'm still confused?

Here are a couple of useful links for calculating t and p:

http://www.graphpad.com/quickcalcs/OneSampleT2.cfm (gives you standard error, t, p, and 95% CI of the difference)
http://www.danielsoper.com/statcalc/calc08.aspx (gives you one and two tailed results)
http://www.stat.tamu.edu/~west/applets/tdemo.html (gives you t or p and a graphic display)

And here are a couple of helpful websites:

http://www.statisticssolutions.com/one-sample-t-test
http://www.quality-control-plan.com/StatGuide/ttest_one_ass_viol.htm


You can also calculate a one-sample t-test in SPSS by following the steps below:

Analyze > Compare Means > One-Sample T Test > move the variable of interest into the Test Variable(s) box > enter the population mean in the Test Value box
Click Options > enter the desired confidence interval percentage > click Continue, then OK.
This will give you a table with t, df, the two-tailed significance, the mean difference, and the confidence interval of the difference.
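
If you are working outside SPSS, roughly the same numbers (t, df, two-tailed significance, mean difference, and the confidence interval of the difference) can be sketched in Python; note that the confidence_interval() method on the result assumes a recent SciPy release (1.10 or later):

import numpy as np
from scipy import stats

data = np.array([2, 1, 0, 2, 3, 5, 1, 2, 4, 2])  # any sample; the exercise data are reused here
test_value = 3                                    # the "Test Value" from the SPSS dialog

result = stats.ttest_1samp(data, popmean=test_value)
df = len(data) - 1
ci = result.confidence_interval(confidence_level=0.95)  # CI for the population mean

print(f"t = {result.statistic:.3f}, df = {df}, two-tailed sig = {result.pvalue:.4f}")
print(f"mean difference = {data.mean() - test_value:.3f}")
print(f"95% CI of the difference: {ci.low - test_value:.3f} to {ci.high - test_value:.3f}")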

References:


http://www.wadsworth.com/psychology_d/special_features/ext/workshops/t_testsample1
http://www.statisticssolutions.com/one-sample-t-test
http://ccnmtl.columbia.edu/projects/qmss/the_ttest/onesample_ttest.html