2.2 Nature of reliability


What Is Reliability?

As you are aware by now, psychological tests are measurement instruments. In this sense, they are no different from yardsticks, speedometers, or thermometers. A psychological test measures how much the test taker has of whatever quality the test measures. For instance, a driving test measures how well the test taker drives a car, and a self-esteem test measures whether the test taker's self-esteem is high, low, or average when compared with the self-esteem of similar others.

Three Types of Reliability

Test-Retest Reliability
Alternate Forms
Internal Consistency

The Spearman-Brown Formula

The Spearman-Brown formula provides the appropriate adjustment:
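For a split-half estimate, the adjustment is commonly written as r_full = n * r_half / (1 + (n - 1) * r_half), with n = 2 because the full test is twice as long as each half. The sketch below illustrates that arithmetic; the function name, the parameter names, and the example value .60 are hypothetical, and restricting the input to correlations between 0 and 1 is a simplifying assumption.

```python
def spearman_brown(r_half: float, length_factor: float = 2.0) -> float:
    """Estimate full-length reliability from a correlation between test parts.

    r_half        -- correlation between the two halves (assumed 0 <= r <= 1)
    length_factor -- how many times longer the full test is than each part
                     (2 for the usual split-half case)
    """
    if not 0.0 <= r_half <= 1.0:
        raise ValueError("r_half is assumed to lie between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


if __name__ == "__main__":
    # Hypothetical split-half correlation of .60
    print(round(spearman_brown(0.60), 2))
```

With a hypothetical split-half correlation of .60, the adjusted estimate is about .75, which shows how the correction raises the estimate to reflect the full test length.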


Adjusting Split-Half Reliability Estimates

coefficient, should not be adjusted.



Standard Error of Measurement

Interpreting the Standard Error of Measurement


To understand what the standard error of measurement (SEM) means, we must apply it to an individual's test score. Because the SEM acts as a measure of variation, we can assume that if the individual took the test an infinite number of times, the following would result:

Approximately 68% of the observed test scores (X) would occur within ±1 SEM of the true score (T).
Approximately 95% of the observed test scores (X) would occur within ±2 SEM of the true score (T).
Approximately 99.7% of the observed test scores (X) would occur within ±3 SEM of the true score (T).
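These percentages match the familiar coverage of a normal distribution. As a minimal sketch, assuming measurement errors are normally distributed around the true score, the share of observed scores falling within ±k SEM can be computed from the error function. The function name `normal_coverage` is hypothetical.

```python
import math


def normal_coverage(k: float) -> float:
    """Proportion of a normal distribution lying within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2.0))


if __name__ == "__main__":
    for k in (1, 2, 3):
        # Prints the coverage for 1, 2, and 3 SEM as a percentage
        print(f"within ±{k} SEM: {normal_coverage(k):.1%}")
```

Under that normality assumption the values come out at roughly 68%, 95%, and 99.7%, in line with the three statements above.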

The formula for calculating the standard error of measurement is:

$SEM = \sigma \sqrt{1 - r_{xx}}$

Where: SEM represents the standard error of measurement

$\sigma$ represents the standard deviation of one administration of the test scores
$r_{xx}$ represents the reliability coefficient
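A short sketch of the calculation, assuming the standard form SEM = standard deviation × sqrt(1 − reliability). The function names and the example figures (a standard deviation of 15, a reliability of .91, an observed score of 100) are hypothetical and chosen only to keep the arithmetic simple. Placing the band around an observed score, rather than the unknown true score, is the usual practical approximation.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = sd * sqrt(1 - reliability)."""
    if sd < 0:
        raise ValueError("standard deviation cannot be negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability is assumed to lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, sem: float, k: float = 1.0) -> tuple[float, float]:
    """Range of +/- k SEM around an observed score."""
    return (observed - k * sem, observed + k * sem)


if __name__ == "__main__":
    # Hypothetical test: SD = 15, reliability = .91
    sem = standard_error_of_measurement(15.0, 0.91)
    low, high = score_band(100.0, sem, k=2)
    print(f"SEM = {sem:.2f}; ±2 SEM band = {low:.1f} to {high:.1f}")
```

With these hypothetical figures the SEM is about 4.5 points, so the ±2 SEM band around a score of 100 runs from about 91 to 109. A less reliable test with the same standard deviation would produce a wider band.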

Factors That Influence Reliability









Reliability of Criterion-Referenced Tests

The structure of criterion-referenced tests is such that the variability of scores among examinees is typically quite minimal. In fact, if test results are used for training purposes and everyone continues training until all test skills are mastered, variability in test scores becomes nonexistent. Under these conditions, traditional approaches to the assessment of reliability are simply inappropriate. With many criterion-referenced tests, results must be almost perfectly accurate to be useful.

How High Should the Reliability Be?

There is some consensus that a test intended as a very accurate measure of individual differences in a characteristic should have a reliability above .90. In practice, however, many standard tests with reliabilities as low as .70 prove to be very useful, and tests with reliabilities lower than that can be useful in research.