Looking for an English passage on reliability analysis
Reliability Analysis
Measures of Reliability
Reliability: the extent to which a scale consistently reflects the construct it is measuring.
One way to think of reliability is that, other things being equal, a person should get the same
score on a questionnaire if they complete it at two different points in time (test-retest
reliability). Another way to look at reliability is to say that two people who are the same in
terms of the construct being measured should get the same score. In statistical terms, the
usual way to look at reliability is based on the idea that individual items (or sets of items)
should produce results consistent with the overall questionnaire.
The simplest way to do this in practice is to use split-half reliability. This method randomly
splits the items on the scale into two halves. A score for each participant is then calculated
based on each half of the scale. If a scale is very reliable, a person’s score on one half of the
scale should be the same as (or similar to) their score on the other half; therefore, across
several participants, scores from the two halves of the questionnaire should correlate perfectly
(well, very highly). The correlation between the two halves is the statistic computed in the
split-half method, with large correlations being a sign of reliability. The problem with this
method is that there are several ways in which a set of data can be split into two, so the
results could be a product of the way in which the data were split. To overcome this problem,
Cronbach (1951) came up with a measure that is loosely equivalent to splitting the data in two
in every possible way and computing the correlation coefficient for each split. The average of
these values is equivalent to Cronbach’s alpha, α, which is the most common measure of scale
reliability (this is a convenient way to think of Cronbach’s alpha, but see Field, 2005, for a
more technically correct explanation).
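To make these two ideas concrete, here is a minimal Python sketch (the handout itself works in SPSS, not Python); the data are fabricated for illustration and the variable names are my own. It computes a split-half correlation and Cronbach’s alpha from the usual variance formula, α = (k/(k−1))(1 − Σs²ᵢ/s²total).

```python
# Minimal sketch, assuming a fabricated (participants x items) array;
# the handout does all of this in SPSS.
import numpy as np

rng = np.random.default_rng(0)
trait = rng.normal(size=(100, 1))                     # one latent construct
items = trait + rng.normal(scale=0.8, size=(100, 6))  # 6 items, 100 people

# Split-half: correlate participants' totals on the two halves of the scale.
half1 = items[:, ::2].sum(axis=1)   # one half of the items
half2 = items[:, 1::2].sum(axis=1)  # the other half
split_half_r = np.corrcoef(half1, half2)[0, 1]

# Cronbach's alpha: k/(k - 1) * (1 - sum of item variances / variance of total).
k = items.shape[1]
alpha = (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum()
                         / items.sum(axis=1).var(ddof=1))

print(f"split-half r = {split_half_r:.3f}, alpha = {alpha:.3f}")
```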
There are two versions of alpha: the normal and the standardized versions. The normal alpha
is appropriate when items on a scale are summed to produce a single score for that scale (the
standardized α is not appropriate in these cases). The standardized alpha is useful, though,
when items on a scale are standardized before being summed.
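As a hedged illustration (again in Python rather than SPSS), the standardized alpha can be computed from the mean inter-item correlation, r̄, as kr̄/(1 + (k − 1)r̄); the function below is a sketch under that assumption and could be applied to the fabricated `items` array from the previous example.

```python
# Sketch of the standardized alpha, computed from the inter-item
# correlation matrix rather than from raw variances and covariances.
import numpy as np

def standardized_alpha(items: np.ndarray) -> float:
    """k * r_bar / (1 + (k - 1) * r_bar), r_bar = mean inter-item correlation."""
    k = items.shape[1]
    corr = np.corrcoef(items, rowvar=False)      # k x k correlation matrix
    r_bar = corr[~np.eye(k, dtype=bool)].mean()  # mean of off-diagonal entries
    return k * r_bar / (1 + (k - 1) * r_bar)

# e.g. standardized_alpha(items) for the fabricated array above
```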
Interpreting Cronbach’s α (some cautionary tales …)
You’ll often read in books and journal articles, or be told by people, that a value of 0.7–0.8 is
an acceptable value for Cronbach’s alpha; values substantially lower indicate an unreliable
scale. Kline (1999) notes that although the generally accepted value of 0.8 is appropriate for
cognitive tests such as intelligence tests, for ability tests a cut-off point of 0.7 is more
suitable. He goes on to say that when dealing with psychological constructs, values even below
0.7 can, realistically, be expected because of the diversity of the constructs being measured.
However, Cortina (1993) notes that such general guidelines need to be used with caution
because the value of alpha depends on the number of items on the scale (see Field, 2005 for
details).
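To see Cortina’s point concretely, here is a small back-of-the-envelope calculation (my own illustration, not from the handout): holding the mean inter-item correlation fixed, alpha rises purely because the scale gets longer.

```python
# With the mean inter-item correlation held at r_bar = 0.3, the
# standardized alpha climbs as the number of items k grows, so a long
# scale can look "reliable" even though its items are only weakly related.
r_bar = 0.3
for k in (3, 6, 10, 20):
    alpha = k * r_bar / (1 + (k - 1) * r_bar)
    print(f"k = {k:2d}  alpha = {alpha:.2f}")
# prints alphas of 0.56, 0.72, 0.81 and 0.90 respectively
```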
Alpha is also affected by reverse scored items. For example, in our SAQ from last week we had
one item (question 3) that was phrased the opposite way around to all other items. The item
was ‘standard deviations excite me’. Compare this to any other item and you’ll see it requires
the opposite response. For example, item 1 is ‘statistics make me cry’. Now, if you don’t like
statistics then you’ll strongly agree with this statement and so will get a score of 5 on our
scale. For item 3, if you hate statistics then standard deviations are unlikely to excite you so
you’ll strongly disagree and get a score of 1 on the scale. These reverse-phrased items are
important for reducing response bias: participants actually have to read the items in case
some are phrased the other way around. In reliability analysis these reverse-scored items
make a difference: in the extreme they can lead to a negative Cronbach’s alpha! (See Field,
2005, for more detail.)
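This effect is easy to reproduce with fabricated data (a sketch of the general phenomenon, not of the SAQ itself): one reverse-phrased item left un-recoded introduces negative inter-item covariances, which drag alpha towards (and potentially below) zero.

```python
# Fabricated demonstration: three items keyed with the trait plus one
# reverse-phrased item. Leaving the reversed item un-recoded collapses
# alpha; flipping its direction restores it.
import numpy as np

def cronbach_alpha(items):
    k = items.shape[1]
    return (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum()
                            / items.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(1)
trait = rng.normal(size=(200, 1))
pos = trait + rng.normal(scale=0.5, size=(200, 3))   # normally keyed items
rev = -trait + rng.normal(scale=0.5, size=(200, 1))  # reverse-phrased item

print(cronbach_alpha(np.hstack([pos, rev])))   # near zero, can go negative
print(cronbach_alpha(np.hstack([pos, -rev])))  # recoded: high alpha
```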
Therefore, if you have reverse-phrased items, you also have to reverse the way in which
they’re scored before you conduct a reliability analysis. This is quite easy. To take our SAQ data,
we have one item which is currently scored as 1 = strongly disagree, 2 = disagree, 3 =
neither, 4 = agree, and 5 = strongly agree. This is fine for items phrased in such a way that
agreement indicates statistics anxiety, but for item 3 (standard deviations excite me),
disagreement indicates statistics anxiety. To reflect this numerically, we need to reverse the
scale such that 1 = strongly agree, 2 = agree, 3 = neither, 4 = disagree, and 5 = strongly
disagree. This way, an anxious person still gets 5 on this item (because they’d strongly
disagree with it).
To reverse the scoring, find the maximum value of your response scale (in this case 5) and add
one to it (giving 6 in this case). Then, for each person, you take this value and subtract from it
the score they actually got. Therefore, someone who scored 5 originally now scores 6 − 5 = 1,
and someone who scored 1 originally now gets 6 − 1 = 5. Someone in the middle of the scale
with a score of 3 will still get 6 − 3 = 3! Obviously it would take a long time to do this for each
person, but we can get SPSS to do it for us using Transform→Compute… (see your handout on
Exploring data).
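For completeness, the same reversal takes a couple of lines of Python with pandas (the handout does it through SPSS’s Transform→Compute dialog); the column name q3 and the 1–5 scale are assumptions for illustration.

```python
# Sketch of reverse-scoring: (max of scale + 1) minus the original score.
import pandas as pd

df = pd.DataFrame({"q3": [5, 1, 3, 4, 2]})  # fabricated responses
df["q3_rev"] = (5 + 1) - df["q3"]           # 5 -> 1, 1 -> 5, 3 -> 3
print(df[["q3", "q3_rev"]])
```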