Looking for an English passage on reliability analysis
Reliability Analysis
Measures of Reliability
Reliability: the extent to which a scale consistently reflects the construct it is measuring.
One way to think of reliability is that, other things being equal, a person should get the same
score on a questionnaire if they complete it at two different points in time (test-retest
reliability). Another way to look at reliability is to say that two people who are the same in
terms of the construct being measured should get the same score. In statistical terms, the
usual way to look at reliability is based on the idea that individual items (or sets of items)
should produce results consistent with the overall questionnaire.
The simplest way to do this in practice is to use split-half reliability. This method randomly
splits the items on the scale into two halves. A score for each participant is then calculated
based on each half of the scale. If a scale is very reliable, a person’s score on one half of the
scale should be the same as (or similar to) their score on the other half; therefore, across
several participants, scores from the two halves of the questionnaire should correlate perfectly
(well, very highly). The correlation between the two halves is the statistic computed in the
split-half method, with large correlations being a sign of reliability. The problem with this
method is that there are several ways in which a set of data can be split into two, so the
results could be a product of the way in which the data were split. To overcome this problem,
Cronbach (1951) came up with a measure that is loosely equivalent to splitting the data in two
in every possible way and computing the correlation coefficient for each split. The average of
these values is equivalent to Cronbach’s alpha, α, which is the most common measure of scale
reliability (this is a convenient way to think of Cronbach’s alpha, but see Field, 2005, for a
more technically correct explanation).
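To make these two ideas concrete, here is a minimal Python sketch (the handout itself works in SPSS, not Python); the data are fabricated for illustration and the variable names are my own. It computes a split-half correlation and Cronbach’s alpha from the usual variance formula, α = (k/(k−1))(1 − Σs²ᵢ/s²total).

```python
# Minimal sketch, assuming a fabricated (participants x items) array;
# the handout does all of this in SPSS.
import numpy as np

rng = np.random.default_rng(0)
trait = rng.normal(size=(100, 1))                     # one latent construct
items = trait + rng.normal(scale=0.8, size=(100, 6))  # 6 items, 100 people

# Split-half: correlate participants' totals on the two halves of the scale.
half1 = items[:, ::2].sum(axis=1)   # one half of the items
half2 = items[:, 1::2].sum(axis=1)  # the other half
split_half_r = np.corrcoef(half1, half2)[0, 1]

# Cronbach's alpha: k/(k - 1) * (1 - sum of item variances / variance of total).
k = items.shape[1]
alpha = (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum()
                         / items.sum(axis=1).var(ddof=1))

print(f"split-half r = {split_half_r:.3f}, alpha = {alpha:.3f}")
```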
There are two versions of alpha: the normal and the standardized versions. The normal alpha
is appropriate when items on a scale are summed to produce a single score for that scale (the
standardized α is not appropriate in these cases). The standardized alpha is useful, though,
when items on a scale are standardized before being summed.
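As a hedged illustration (again in Python rather than SPSS), the standardized alpha can be computed from the mean inter-item correlation, r̄, as kr̄/(1 + (k − 1)r̄); the function below is a sketch under that assumption and could be applied to the fabricated `items` array from the previous example.

```python
# Sketch of the standardized alpha, computed from the inter-item
# correlation matrix rather than from raw variances and covariances.
import numpy as np

def standardized_alpha(items: np.ndarray) -> float:
    """k * r_bar / (1 + (k - 1) * r_bar), r_bar = mean inter-item correlation."""
    k = items.shape[1]
    corr = np.corrcoef(items, rowvar=False)      # k x k correlation matrix
    r_bar = corr[~np.eye(k, dtype=bool)].mean()  # mean of off-diagonal entries
    return k * r_bar / (1 + (k - 1) * r_bar)

# e.g. standardized_alpha(items) for the fabricated array above
```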
Interpreting Cronbach’s α (some cautionary tales …)
You’ll often read in books and journal articles, or be told by people, that a value of 0.7–0.8 is
an acceptable value for Cronbach’s alpha; values substantially lower indicate an unreliable
scale. Kline (1999) notes that although the generally accepted value of 0.8 is appropriate for
cognitive tests such as intelligence tests, for ability tests a cut-off point of 0.7 is more
suitable. He goes on to say that when dealing with psychological constructs, values even below
0.7 can, realistically, be expected because of the diversity of the constructs being measured.
However, Cortina (1993) notes that such general guidelines need to be used with caution
because the value of alpha depends on the number of items on the scale (see Field, 2005 for
details).
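To see Cortina’s point concretely, here is a small back-of-the-envelope calculation (my own illustration, not from the handout): holding the mean inter-item correlation fixed, alpha rises purely because the scale gets longer.

```python
# With the mean inter-item correlation held at r_bar = 0.3, the
# standardized alpha climbs as the number of items k grows, so a long
# scale can look "reliable" even though its items are only weakly related.
r_bar = 0.3
for k in (3, 6, 10, 20):
    alpha = k * r_bar / (1 + (k - 1) * r_bar)
    print(f"k = {k:2d}  alpha = {alpha:.2f}")
# prints alphas of 0.56, 0.72, 0.81 and 0.90 respectively
```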
Alpha is also affected by reverse scored items. For example, in our SAQ from last week we had
one item (question 3) that was phrased the opposite way around to all other items. The item
was ‘standard deviations excite me’. Compare this to any other item and you’ll see it requires
the opposite response. For example, item 1 is ‘statistics make me cry’. Now, if you don’t like
statistics then you’ll strongly agree with this statement and so will get a score of 5 on our
scale. For item 3, if you hate statistics then standard deviations are unlikely to excite you so
you’ll strongly disagree and get a score of 1 on the scale. These reverse-phrased items are
important for reducing response bias: participants actually have to read the items in case
some are phrased the other way around. In reliability analysis these reverse-scored items
make a difference: in the extreme they can lead to a negative Cronbach’s alpha! (See Field,
2005, for more detail.)
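This effect is easy to reproduce with fabricated data (a sketch of the general phenomenon, not of the SAQ itself): one reverse-phrased item left un-recoded introduces negative inter-item covariances, which drag alpha towards (and potentially below) zero.

```python
# Fabricated demonstration: three items keyed with the trait plus one
# reverse-phrased item. Leaving the reversed item un-recoded collapses
# alpha; flipping its direction restores it.
import numpy as np

def cronbach_alpha(items):
    k = items.shape[1]
    return (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum()
                            / items.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(1)
trait = rng.normal(size=(200, 1))
pos = trait + rng.normal(scale=0.5, size=(200, 3))   # normally keyed items
rev = -trait + rng.normal(scale=0.5, size=(200, 1))  # reverse-phrased item

print(cronbach_alpha(np.hstack([pos, rev])))   # near zero, can go negative
print(cronbach_alpha(np.hstack([pos, -rev])))  # recoded: high alpha
```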
Therefore, if you have reverse-phrased items, you also have to reverse the way in which
they’re scored before you conduct a reliability analysis. This is quite easy. To take our SAQ data,
we have one item which is currently scored as 1 = strongly disagree, 2 = disagree, 3 =
neither, 4 = agree, and 5 = strongly agree. This is fine for items phrased in such a way that
agreement indicates statistics anxiety, but for item 3 (standard deviations excite me),
disagreement indicates statistics anxiety. To reflect this numerically, we need to reverse the
scale such that 1 = strongly agree, 2 = agree, 3 = neither, 4 = disagree, and 5 = strongly
disagree. This way, an anxious person still gets 5 on this item (because they’d strongly
disagree with it).
To reverse the scoring, find the maximum value of your response scale (in this case 5) and add
one to it (giving 6 in this case). Then, for each person, you take this value and subtract from it
the score they actually got. Therefore, someone who scored 5 originally now scores 6 − 5 = 1,
and someone who scored 1 originally now gets 6 − 1 = 5. Someone in the middle of the scale
with a score of 3 will still get 6 − 3 = 3! Obviously it would take a long time to do this for each
person, but we can get SPSS to do it for us using Transform→Compute… (see your handout on
Exploring data).
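For completeness, the same reversal takes a couple of lines of Python with pandas (the handout does it through SPSS’s Transform→Compute dialog); the column name q3 and the 1–5 scale are assumptions for illustration.

```python
# Sketch of reverse-scoring: (max of scale + 1) minus the original score.
import pandas as pd

df = pd.DataFrame({"q3": [5, 1, 3, 4, 2]})  # fabricated responses
df["q3_rev"] = (5 + 1) - df["q3"]           # 5 -> 1, 1 -> 5, 3 -> 3
print(df[["q3", "q3_rev"]])
```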