Scale Construction: What Questions Should We Use?
The first step in constructing any test or scale is getting clear on what you’re measuring.
Swann and colleagues defined blirtatiousness as the tendency to respond to friends and partners quickly and effusively. Effusive means openly expressive—showing emotion strongly and with little restraint. Notice that this definition focuses on behavior more than inner feelings. That’s intentional: in real relationships, what affects others most is what we say and do, even if our private intentions are complicated. That’s what the BLIRT scale aims to measure.
Once you have a clear definition, the next step is deciding how people will respond.
Open-ended questions (e.g., “How open-minded are you? ____”) usually aren’t ideal for personality scales because they’re hard to score consistently. Instead, most inventories use forced-choice formats, where people choose from set options.
Two common forced-choice formats are:
- Either/or choices, such as the paired statements used in the Narcissistic Personality Inventory.

- Likert scales, where a person rates agreement with a statement on a numbered scale (often 5 or 7 points). Side note: the scale's creator pronounced his name “LICK-ert,” though many people say “LIKE-ert.” Both are common.

Figure 2. Morris Rosenberg’s questions on the self-esteem inventory utilize the Likert scale.
Dr. Swann and his team chose a 7-point Likert format to measure blirtatiousness. To do this, they needed to write clear, simple statements that people could agree or disagree with, where different levels of agreement were possible.
Selecting Strong Items
We aren’t going to ask you to write any questions, but you can imagine that you have joined the test-development team by looking at the eight statements below. Choose four that you think would be the best items to include in the BLIRT scale.
When they were developing the scale, Dr. Swann and his team wrote dozens of questions and then pared them down to 20. Then they got 237 undergraduates to rate the 20 questions for how well they fit the qualities that the BLIRT scale was trying to measure.[1]
Reverse-Worded Items and Reverse Scoring
Questionnaire writers have strategies to encourage people to read the statements carefully. For example, they often write “reverse-worded” items, which are later reverse scored. To show what this means, just below is the 7-point Likert scale used with the Blirtatiousness questionnaire. Below that, you will see two statements. Look at how the statements and the Likert scale fit together.

- I speak my mind as soon as a thought enters my head.
- For this question, 1 means not blirtatious and 7 means very blirtatious.
- I don’t speak my mind as soon as a thought enters my head.
- For this question, 1 means very blirtatious and 7 means not blirtatious.
Dr. Swann and his team chose 8 items for the BLIRT scale. Half were worded so that higher numbers mean more blirtatious, and half so that higher numbers mean less blirtatious. After the test, a process called “reverse scoring” puts all the items back on the same scale, so that higher numbers always mean more blirtatious.[2]
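The reverse-scoring step can be sketched in a few lines of Python. This is only an illustration, not the researchers' own scoring procedure: the function names, the sample responses, the choice of which four items are reverse-worded, and the decision to sum the items into a total are all assumptions made for the example.

```python
SCALE_MIN = 1  # lowest point on the 7-point Likert scale
SCALE_MAX = 7  # highest point on the 7-point Likert scale


def reverse_score(response, scale_min=SCALE_MIN, scale_max=SCALE_MAX):
    """Flip one Likert response: on a 1-7 scale, 7 -> 1, 6 -> 2, ..., 4 -> 4."""
    if not scale_min <= response <= scale_max:
        raise ValueError(f"response {response} is outside the scale")
    return scale_max + scale_min - response


def score_scale(responses, reversed_items):
    """Total score after flipping the items whose positions are in reversed_items.

    Summing the items is an assumption; a mean would work the same way.
    """
    return sum(
        reverse_score(r) if i in reversed_items else r
        for i, r in enumerate(responses)
    )


# Hypothetical respondent. Positions 1, 3, 5, and 7 (counting from 0) are
# treated as the reverse-worded items purely for illustration.
responses = [6, 2, 7, 1, 5, 3, 6, 2]
reversed_items = {1, 3, 5, 7}

print(score_scale(responses, reversed_items))  # 48
```

Here the low ratings on the reverse-worded items (2, 1, 3, 2) become high ratings (6, 7, 5, 6) after flipping, so this made-up respondent comes out as consistently blirtatious across all eight items.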
Measuring Personality
Before you go on, now is a good time to measure your blirtatiousness. Follow the link below to find out if you are a blirter or a brooder.
Checking the Test
At this point in the test-creation process, Dr. Swann and his team had settled on eight statements that seemed to measure blirtatiousness. They were ready to administer the test, but before they could praise the test and its effectiveness, they needed to be sure of a few things: the questions must work together as a set, the test must be reliable, and the test must be valid.
- The questions must work together as a set. In other words, we want to be sure that the 8 items are all giving us responses about the same quality (blirtatiousness) and that the responses people are giving are consistent with one another.
- You might think that a single question would be enough to measure blirtatiousness. Why ask 8 questions when one would do? But research has shown that asking variations on the same question 8 or 10 different times gives a more stable measure. The questions must be slightly different (enough to make people think carefully), but not too different (so they don’t measure different things).
- The researchers administered the BLIRT to 1,137 students and used statistical procedures[3] to be sure that the 8 items in the scale worked together. The results indicated that the 8 items on the scale were consistent with each other in measuring the same psychological quality.
- The test must be reliable. The word “reliability” means “consistency.” We should be able to give you a test of some quality (e.g., how extraverted you are) and then give you that same test again two months later, and your scores should be pretty similar. This is important for what are called “stable traits.” Obviously, some psychological qualities, like moods, change all the time and we would not expect consistency. But blirtatiousness should be a stable trait.
- One common way to measure reliability of a test is a process called “test-retest reliability.” It is as simple as it sounds: you give the test, wait some period of time, and give it again to the same people.
- The test must be valid. Believe it or not, after all this work, we still don’t know if the BLIRT scale is VALID. Validity is a question of whether or not we are measuring the thing we are trying to measure. Reliability doesn’t tell us if a scale is valid; reliability simply means that we get consistent answers. So how can we figure out if our test is valid or not?
- There is no one way to determine the validity of a scale. Test developers like Dr. Swann usually gather multiple types of validity evidence, such as:
- Convergent validity: BLIRT scores relate to similar traits or measures
- Discriminant validity: BLIRT scores do not strongly relate to unrelated traits
- Criterion validity: BLIRT scores relate to real-world outcomes
- Predictive validity: BLIRT scores predict behavior in relevant situations
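Two of the checks described above can be made concrete with a short Python sketch: whether the items work together as a set, and whether scores stay similar on a retest. The footnote names Cronbach's alpha as one of the statistics used for the first check; correlating the two sets of scores is a common way to summarize the second. The function names and all of the numbers below are invented for illustration and are not data from the BLIRT studies.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a table of responses.

    rows: one list per respondent, one (already reverse-scored) rating per item.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # regroup the ratings item by item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    covariation = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = sum((a - mx) ** 2 for a in x) ** 0.5
    spread_y = sum((b - my) ** 2 for b in y) ** 0.5
    return covariation / (spread_x * spread_y)


# Made-up ratings: 5 respondents, 4 items each (a real analysis would use
# all 8 items and far more people).
ratings = [
    [6, 7, 6, 5],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
    [7, 6, 7, 7],
    [3, 2, 3, 4],
]
print("alpha:", round(cronbach_alpha(ratings), 2))

# Made-up total scores for the same 5 people, tested twice.
first_test = [24, 9, 17, 27, 12]
second_test = [22, 11, 18, 26, 13]
print("test-retest r:", round(pearson_r(first_test, second_test), 2))
```

Values of alpha near 1 suggest that the items rise and fall together, and a test-retest correlation near 1 suggests that people keep roughly the same rank order across the two sessions. Neither number says anything about validity, which is why the separate kinds of validity evidence listed above are still needed.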
- [1] Note that the four items from the BLIRT are about what you DO. They aren’t about your beliefs (option 1), how you think other people see you (option 3), opinions about yourself (option 4), or what you think about other people (option 6).
- [2] Reverse scoring is simple: 7 becomes 1, 6 becomes 2, 5 becomes 3, 4 stays 4, 3 becomes 5, 2 becomes 6, and 1 becomes 7. Only the 4 items with the reverse wording are rescored this way. The goal is to make it so that higher numbers mean more blirtatious for all the items.
- [3] Cronbach’s alpha and factor analysis