H809 – Validity and reliability (A11.8 + 11.9)
My two good ‘old friends’ validity and reliability 😉 And I thought I would never hear from them again after first being confronted with them in 2007 while doing U205 Health and Disease. We had a whole book on hypotheses, gold standards, empirical testing, p-values, confidence levels, confounding factors, and how best to adjust for them to safeguard the validity of results. Yet it seems I cannot get rid of either of them 😉
But what, exactly, is validity?
In U205 I learned something about content and criterion validity.
- An indicator is valid when it corresponds closely with the concept – concept–indicator links
- Does the indicator adequately cover the content of the concept? – content validity
- A second measure serves as the criterion for judging the first – criterion validity
U205 gave the following example to explain criterion validity.
“An example would be where a researcher has constructed a series of questions designed to produce a score that measures depression. The researcher would apply the questions to a group of people, and compare the results of this exercise with ratings of depression made by clinical psychologists in interviews with the same people. If the two ways of measuring depression gave the same results, the questionnaire would be said to have criterion validity.”
Applying this to Ardalan et al.’s (2007) reading, one could argue that the second part of the survey, the open-ended question, might serve as a second way to measure faculty teaching – but only if the students address the same issues covered in the quantitative first part of the questionnaire.
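The U205 depression example above can be turned into a toy computation. This is only a sketch with invented numbers: the two lists of scores, and the threshold for what counts as a “high” correlation, are assumptions, not anything from the course or from Ardalan et al.

```python
# Hypothetical illustration of criterion validity: compare questionnaire
# depression scores with clinicians' interview ratings for the same people.
# All data below are invented for the sketch.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

questionnaire_scores = [12, 25, 7, 31, 18, 22, 9, 28]   # the new instrument
clinician_ratings    = [10, 27, 6, 30, 20, 21, 11, 26]  # the criterion measure

r = pearson_r(questionnaire_scores, clinician_ratings)
print(f"correlation with criterion: {r:.2f}")
# A high correlation would support the questionnaire's criterion validity;
# a low one would suggest the two ways of measuring disagree.
```

In practice researchers would use a proper statistics package and worry about sample size and significance, but the core idea is just this comparison of the new measure against an accepted criterion.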
U205 also cautions: “There are other ways of assessing validity that are beyond the scope of this book. It is important always to be aware that validity is a potential problem in measurement, and thus to assess whether researchers have done enough to show their measures are valid tests of the concept under investigation.”
In contrast, the H809 course material refers to Campbell and Stanley (1963), who subdivide validity into, for example, ‘external and internal validity’ and ‘construct validity’. They describe validity as the degree to which a study supports its conclusions. That is all pretty confusing and does not really help one get a grasp of what is meant by validity. Additionally, the term can be used in very different ways depending on the discourse: surveys, experiments, ethnography, action research, grounded theory, and ideologically committed research each use validity differently, highlighting that validity is a contested concept.
Wikipedia provides a definition of validity that is far more understandable.
“In science and statistics, validity has no single agreed definition but generally refers to the extent to which a concept, conclusion or measurement is well-founded and corresponds accurately to the real world. The word “valid” is derived from the Latin validus, meaning strong. Validity of a measurement tool (i.e. test in education) is considered to be the degree to which the tool measures what it claims to measure.”
Validity is often assessed along with reliability – the extent to which a measurement gives consistent results. Reliability means that the same result must be obtained if the measure is repeated again and again. Yet reliability does not imply validity: a reliable measure is measuring something consistently, but it may not be measuring what you want it to measure (Wikipedia, 2011).
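The “reliable but not valid” case can be sketched with a classic toy example – a bathroom scale that always reads the same but is always wrong. The true weight, the readings, and the crude spread/bias checks are all invented for illustration.

```python
# Hypothetical sketch: a bathroom scale that is reliable but NOT valid.
# It gives almost the same reading every time (consistent = reliable),
# yet every reading is about 5 kg off the true weight (= not valid).

true_weight = 70.0
readings = [75.1, 74.9, 75.0, 75.2, 74.8]  # invented repeated measurements

mean = sum(readings) / len(readings)
spread = max(readings) - min(readings)  # crude check of consistency (reliability)
bias = mean - true_weight               # systematic error against the truth (validity)

print(f"spread across repeats: {spread:.1f} kg  (small -> reliable)")
print(f"bias from true value:  {bias:.1f} kg  (large -> not valid)")
```

The readings cluster tightly (small spread), so the scale is reliable, yet they cluster around the wrong value (large bias), so it is not valid – the same point the dartboard image makes.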
U205 provided a good image to show the differences.
It is important to keep in mind that validity is a contested concept. This awareness will help not only to better understand published critiques of research methods but also to ask critical questions of research studies myself.
H807 provides a list of questions that could be asked:
- To what extent does the study demonstrate that its findings generalise to other participants, places or times?
- To what extent are causal relationships, rather than just correlations, demonstrated?
- Are the instruments used in the study actually measuring what the researchers claim they measure?
- How strong is the evidence for the claims?
- Are alternative explanations possible?
- How could claims be tested more strongly?