Understand how validity is measured and why it’s important
There are multiple types of validity:
convergent validity: compare the test results with those of other personality tests that measure similar traits
discriminant validity: compare the test results with those of tests that measure dissimilar traits
criterion validity: compare the results of the BLIRT test to real-world outcomes
predictive validity: see if the results work to predict people’s behavior in certain situations
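One common way to check convergent and discriminant validity is to compute correlations between scores. The sketch below is only an illustration: all scores are invented, "talkativeness" stands in for a similar trait, and the "unrelated" measure is a hypothetical test of a dissimilar trait. None of these numbers come from the BLIRT research.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented scores for eight hypothetical participants (illustration only).
blirt = [12, 18, 25, 30, 22, 15, 28, 20]
talkativeness = [10, 17, 27, 29, 20, 14, 30, 19]   # similar trait
unrelated = [50, 53, 48, 52, 55, 51, 49, 50]       # dissimilar trait

r_similar = pearson_r(blirt, talkativeness)
r_dissimilar = pearson_r(blirt, unrelated)

# Convergent validity: expect a strong correlation with the similar trait.
# Discriminant validity: expect a correlation near zero with the dissimilar trait.
print(f"similar trait:    r = {r_similar:.2f}")
print(f"dissimilar trait: r = {r_dissimilar:.2f}")
```

With these invented numbers, the correlation with the similar trait is strong and the correlation with the dissimilar trait is close to zero, which is the pattern a valid test would be expected to show.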
You learned about the challenges in designing a personality test and how to evaluate that test to ensure that it is valid, meaning that it measures what it is supposed to measure. In our example, we wanted to ensure that the BLIRT test actually measured blirtatiousness. Let’s review some of the types of validity here.
The researchers chose to use some generalized cultural norms to test the validity of the BLIRT test. They hypothesized that, due to cultural differences, an American with an Asian cultural background would be less likely to blirt than an American with a European cultural background. Although we shouldn’t overstate the difference, Asian cultures tend to emphasize restraint of emotional expression, while European cultures are more likely to encourage direct and rapid expression.
The researchers obtained BLIRT scores from 2,800 students from European-American cultures and 698 students from Asian-American cultures. Before reading on, predict which group you think will be more blirtatious.
The results were consistent with the researchers’ expectations. The difference between the groups was small (just two points) but statistically significant. Because the difference is small, we shouldn’t turn it into a cultural stereotype, but its statistical significance suggests that cultural experiences may have a real, if modest, effect on people’s blirtatiousness.
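How can a two-point difference be both small and statistically significant? Large samples make even modest differences detectable. The sketch below uses the group sizes from the study (2,800 and 698) and the two-point difference, but the group means and the standard deviation of 10 points are assumptions made up for illustration, since the reading does not report them. It computes a Welch t statistic, an approximate p-value, and Cohen's d (a measure of effect size).

```python
import math

def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, approximate two-sided p, Cohen's d) from summary statistics."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    t = (mean1 - mean2) / se
    # With thousands of participants the t distribution is very close to
    # normal, so a normal approximation is used for the p-value.
    p = math.erfc(abs(t) / math.sqrt(2))
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    )
    d = (mean1 - mean2) / pooled_sd
    return t, p, d

# Group sizes and the 2-point gap come from the reading.
# The means (21 vs. 19) and SD (10) are hypothetical values for illustration.
t, p, d = welch_t_from_summary(
    mean1=21.0, sd1=10.0, n1=2800,
    mean2=19.0, sd2=10.0, n2=698,
)
print(f"t = {t:.2f}, p = {p:.2g}, Cohen's d = {d:.2f}")
```

Under these assumed values, the p-value falls well below the conventional .05 cutoff while Cohen's d stays around 0.2, which is usually described as a small effect. A different assumed standard deviation would change the exact numbers, but the general point holds: statistical significance says a difference is unlikely to be due to chance, not that it is large.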
What type of validity was measured by comparing the test results of European-American and Asian-American students?
This is another example of criterion validity, because the researchers are comparing the results of their test against a real-world outcome. Often, researchers assess criterion validity by comparing their test to an established benchmark or “gold standard” (for example, comparing a new college entrance exam to the SAT). In this instance, there was no comparable test for the BLIRT, so the researchers compared it to other criteria, such as jobs (salespeople or librarians) and cultural background.
Let’s revisit the example of measuring the validity of the test by asking students to chat with strangers on the phone and decide how likable they are. The researchers hoped to see whether high blirters would be perceived as more likable. Make your prediction for each question, and then check out the results.
Who was rated as more likable?
high blirter
low blirter
no difference
Who was rated as someone “I’d like to be friends with”?
high blirter
low blirter
no difference
Who was rated as more intelligent?
high blirter
low blirter
no difference
What type of validity was measured when the researchers used the results of the BLIRT test to see who was found to be more likable?
This is predictive validity. Recall that one way to assess the validity of the BLIRT scale is to see if it predicts people’s behavior in specific situations. In this example, researchers wanted to see whether those who are more blirtatious are also more outgoing and more likely to make better first impressions than those who are not blirtatious.
You don’t need to worry much about discriminating between the types of validity here! You could also argue that if the researchers had instead designed this as a test of “outgoingness” or “talkativeness,” it would be an example of convergent validity, in which the researcher looks at other traits that are similar to (but not identical to) the trait being measured. If the goal weren’t to predict behavior, you could also argue that this is an example of criterion validity, as the researchers were comparing the results of the BLIRT test to a real-world outcome (in this case, initial likability).
Can you think of another way, not mentioned in the reading, that experimenters could test the validity of the BLIRT test? What type of validity would you be testing? What would you expect the results of your validity test to be?