3

Understanding Reading Assessment

What Is Learner Assessment?

Learner assessment is an ongoing process in which teachers and learners gather and analyze data and use it to make educational decisions. Tests, interviews, questionnaires, and work samples provide information about learners' educational histories, background experiences and knowledge, as well as specifics about reading skills, goals, and interests.

Why Do We Assess Reading Skills of Adult Learners?

In general, we have at least three purposes for assessment:

To identify individual goals, strengths, and needs--for initial planning
To check on learning and spot problems--for ongoing progress monitoring
To assess learning over time--for outcomes measurement

Assessment is especially important in working with adult readers because the learners in any classroom vary greatly in their reading skills. There is no effective, one-size-fits-all program for teaching reading. In order to help each adult to begin at the appropriate level and make progress, teachers must know exactly what needs to be taught and learned.

Learner profiles

Teachers have long observed that classes of adult learners often include a wide range of skill levels and that adults also tend to vary more in age, interests, and experience than groups of children do. What we have only recently learned is that this variety has yet another aspect: even adults who earn similar scores on tests of silent reading comprehension may have very different needs and abilities. Research suggests that if you assess learners' skills in the components of reading, you can use these individual profiles to target instruction appropriately (Kruidenier, 2002; Sabatini, 2002; Snow & Strucker, 2000).

The Adult Reading Components Study (Strucker & Davidson, 2003) assessed 955 adult learners in eight states in order to describe the types of readers enrolled in Adult Basic Education programs. Each of the learners was tested in phonological awareness, rapid naming, word recognition, oral reading, spelling, vocabulary, and background knowledge. The researchers also interviewed the adults to learn about their past educational experience and reading habits. They identified ten clusters, or similar reading profiles, among the ABE group and two more clusters of ESOL learners. The study confirms that readers may achieve similar scores on a silent reading comprehension test but still vary greatly in fluency, decoding skills, and vocabulary.²

² For details, see the Assessment Strategies and Reading Profiles website--part of the National Institute for Literacy's website--at www.nifl.gov/readingprofiles/. You can also take a mini-course on reading and match a learner you know with one of the profiles on the site.

This information is important because most teachers in adult classrooms administer only a silent reading comprehension test, most often the reading subtest of the Tests of Adult Basic Education (TABE) or the Comprehensive Adult Student Assessment System (CASAS). Of course, in your classroom, you cannot administer all the tests that researchers use, but you can be aware of the need to pay attention to the component skills as you identify tests and other measures for initial, ongoing, and outcomes assessment. (A suggested process for initial assessment is outlined in this chapter and detailed in Chapter 8.)

How Do We Assess Adults' Reading Skills?

Your plan for reading assessment should include measures that address all three purposes and provide useful information for instruction. For example, for some learners you will need initial phonics assessments to identify which letter-sound relationships they know and can use and what they need to learn. Perhaps a learner knows the common short-vowel patterns and remembers that an e at the end of a word usually means the preceding vowel has its "long" sound. But this adult might need to learn the vowel sounds represented by oo, oy, and au--to name a few--and strategies for decoding multi-syllabic words. Oral word analysis assessments are useful in identifying these needs.

You will also want an early measure of fluency, to see if word identification is slow or if work on phrasing and expression might improve comprehension. An oral, individually administered assessment may involve single-word reading tasks and/or passage reading.

Throughout the teaching-learning process you'll need ways to monitor skill development and identify problem areas on a daily basis, so you can suggest more practice, re-teach, or adjust your instruction. And finally, you'll want to document growth in those skill areas that have been the focus of instruction. Outcomes measurement is reinforcing for you and the learners and is vital for reporting to administrators and funders.

Types of measures

One way to understand the options in choosing achievement measures is to think about three general categories:

Standardized tests
Classroom- or curriculum-based tests
Supplemental/alternative assessments

Standardized tests. Standardized tests are suitable for some of your assessment purposes. In adult education, when standardized tests are mentioned we often think of the TABE or CASAS, but of course there are many others that assess different kinds of skills in various ways. Here's what they all have in common, according to Holt and Van Duzer (2000):

Standardized tests are created according to explicit specifications with test items selected for difficulty and discrimination power³. They are administered and scored following standard procedures, so that variations in scores may be assumed to represent real differences in learners' abilities, not different administrators or testing conditions.

A norm-referenced assessment compares an individual's current achievement to the average performance (norms) of selected participants (the norming group).
A criterion-referenced assessment compares an individual's achievement to an absolute standard or criterion of performance (Holt & Van Duzer, 2000).
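The difference between the two kinds of comparison can be illustrated with a little arithmetic. The sketch below is only an illustration under stated assumptions: the function names, the norming-group scores, and the cutoff of 60 are all hypothetical, and published tests use their own norming tables and scoring rules rather than this simplified calculation.

```python
from statistics import mean, pstdev

def norm_referenced_z(score, norming_scores):
    """Express a score as its distance from the norming group's mean,
    measured in standard deviations (a z-score)."""
    mu = mean(norming_scores)
    sigma = pstdev(norming_scores)  # assumption: population standard deviation
    return (score - mu) / sigma

def criterion_referenced_pass(score, criterion):
    """Compare a score to a fixed standard, ignoring how others performed."""
    return score >= criterion

# Hypothetical raw scores, invented for illustration only.
norming_group = [40, 45, 50, 55, 60]
learner_score = 55

print(norm_referenced_z(learner_score, norming_group))         # above the group average
print(criterion_referenced_pass(learner_score, criterion=60))  # below the fixed standard
```

The same score of 55 looks good against this norming group but falls short of the fixed criterion, which is why it matters which kind of comparison a test reports.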

³ During standardized test development and validation, test items are evaluated on their ability to discriminate between students of high and low ability. For example, an acceptable item is answered correctly by most students who earn high scores on the test (or some other measure of proficiency) and incorrectly by most low-scoring students. If everyone gets the right answer to an item, it is a poor "discriminator."
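One simple way to put a number on the idea in this footnote is an upper-lower discrimination index: the proportion of high scorers who answer an item correctly minus the proportion of low scorers who do. The sketch below assumes that approach and a 27 percent group size, a common convention; the function name and data are hypothetical, and test publishers may use different statistics.

```python
def discrimination_index(item_correct, total_scores, group_fraction=0.27):
    """Upper-lower discrimination index for a single test item.

    item_correct: 0/1 flags, one per test taker, for this item.
    total_scores: total test scores in the same order.
    Returns (proportion correct in top group) - (proportion correct in bottom group).
    """
    if len(item_correct) != len(total_scores):
        raise ValueError("inputs must have the same length")
    n = len(total_scores)
    k = max(1, round(n * group_fraction))  # size of each comparison group
    # Rank test takers from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower

# Hypothetical data: ten test takers, sorted here only for readability.
totals = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
good_item = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]   # mostly high scorers get it right
easy_item = [1] * 10                          # everyone gets it right

print(discrimination_index(good_item, totals))
print(discrimination_index(easy_item, totals))
```

An item that everyone answers correctly yields an index of zero, which matches the footnote's description of a poor "discriminator."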

Classroom- or curriculum-based tests. This type of test is closely related to instruction. Teacher-made tests and tests in workbooks and computer-assisted instructional programs fall into this category.

Supplemental/alternative measures. Alternative measures include any methods used to find out what a learner knows or can do, that are intended to show growth and inform instruction, and are not standardized or traditional tests. Valdez-Pierce & O'Malley (1992) suggest the following definitions:

Performance-based assessment

is designed specifically to assess performance on one or more instructional tasks,
requires students to accomplish specific skills and competencies, and
is rated on a pre-determined scale of achievement or proficiency.

Portfolio assessment

is a systematic collection of student work that is analyzed to show progress over time, and
may represent progress and achievement in more than one area.

Self-assessment

Students monitor their own performance and evaluate their progress and accomplishments.
Students select learning tasks and plan use of time to accomplish tasks.

Following are examples of tools for and documentation of supplemental or alternative assessment:

Products of group or individual study, like stories, class newsletters, or project reports
Records of growth in reading rate
Portfolios and other collections of work samples
Journals
Teachers' anecdotal notes of observations
Learner self-report measures, like checklists, interviews, and surveys

Other informal strategies are also useful for day-to-day monitoring and decision making. You should continue using your judgment to assess learners' progress and identify problems. For instance, in addition to checking learners' work, you may use questioning and observations to get a sense of who is doing well with what and who needs more--or a different kind of--instruction. Speed of task completion, enthusiasm and engagement (or lack of same), and frequency of participation in discussions are obvious indicators of confidence or confusion. Although these measures don't produce reportable data, they do provide good information for teachers.

Measures to address different purposes

Many types of instruments and activities may be useful for the first two assessment purposes: identifying strengths and needs and monitoring progress. Interviews, tests, and samples of oral reading are useful for initial assessment and planning. Standardized tests are typically more reliable than less formal measures and may therefore provide more accurate results for developing reader profiles. For classroom- and curriculum-specific learning, you may develop your own tests and performance-based measures that will be sensitive to the content you have taught. These measures provide good information for teachers and learners.

Outcomes measurement, however, is of interest to others as well. Program funders and other external stakeholders are interested in the outcomes of instruction over a period of time, so the data collected for this purpose must "speak to" those who aren't familiar with the learners and don't know many particulars about the instruction. For this reason, outcomes measurement usually includes objective, often standardized instruments.

What Do We Need To Know About Valid Measurement?

Reliability and validity in assessment

You want to be confident that the assessments you use truly reflect your adult learners' abilities. If you don't have a true measure, the decisions you base on the data may not lead to good results. Two important features of assessment practice relate to this need: reliability and validity. You should consider reliability and validity at every point in the process: choosing or developing assessments, administering, scoring, and interpreting results.

Reliability concerns consistency or stability of scores. If scoring is reliable, different administrators evaluating the same test or performance should arrive at similar scores or ratings. An instrument should have clear guidelines for administration and scoring, and teachers should be trained to ensure that they are using consistent procedures. This "inter-rater reliability" is especially important when subjective scoring judgments are required, as in performance-based assessment. If different teachers rate performance differently, how can you know what a given score means?
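If two teachers rate the same set of performances, you can check their consistency with simple arithmetic. The sketch below shows two common statistics that are not named in this chapter and are offered only as an illustration: percent agreement, and Cohen's kappa, which adjusts agreement for what would be expected by chance. The ratings are invented, and the function names are hypothetical.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Fraction of performances on which two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and the same length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same category at random.
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1:
        return 1.0  # both raters used one identical category throughout
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical ratings of eight work samples on a 1-4 scale.
teacher_1 = [1, 2, 3, 3, 2, 1, 4, 4]
teacher_2 = [1, 2, 3, 2, 2, 1, 4, 3]

print(percent_agreement(teacher_1, teacher_2))
print(cohens_kappa(teacher_1, teacher_2))
```

Low agreement on a shared set of samples is a signal that raters need clearer scoring guidelines or more training before their scores are compared.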

A reliable instrument is also consistent over time. If a learner takes a test at two different times with no intervening instruction, his scores should be the same or very similar, because one assumes that abilities don't change much without specific intervention. If the scores are different, they may reflect some feature of the instrument, not the individual's skills and knowledge. If the scores on a test vary without instruction, how can you be sure that post-test scores reflect learning that has occurred?

Of course, no measure is 100% reliable. Test developers and measurement experts use statistical methods to assess the types of reliability discussed above and assign ratings--reliability coefficients ranging from 0 (low) to 1.0 (high). These ratings are based on qualities and features of an instrument. But these are not the only factors that contribute to reliability.

Administrators' and teachers' assessment practices make assessment more or less reliable. If teachers all follow the directions and adhere to standard procedures, objective tests should be reliably scored. Alternative assessments that require teacher judgment are another matter. If you use such assessments, be sure you read and follow all the guidelines and take advantage of any training that is available.
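To make the idea of a reliability coefficient more concrete, the sketch below estimates consistency over time by correlating scores from two administrations of the same test with no instruction in between. This is an assumption about one common method (a Pearson correlation), not a description of how any particular publisher computes its ratings, and the scores are invented. Note that a correlation can in principle be negative, although published reliability coefficients are reported on the 0 to 1.0 scale described above.

```python
from math import sqrt

def pearson_r(first_scores, second_scores):
    """Pearson correlation between two sets of paired scores."""
    if len(first_scores) != len(second_scores) or len(first_scores) < 2:
        raise ValueError("need at least two paired scores")
    n = len(first_scores)
    mean_x = sum(first_scores) / n
    mean_y = sum(second_scores) / n
    # Sum of cross-products and sums of squared deviations from each mean.
    cross = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(first_scores, second_scores))
    ss_x = sum((x - mean_x) ** 2 for x in first_scores)
    ss_y = sum((y - mean_y) ** 2 for y in second_scores)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical scores for six learners tested twice, two weeks apart.
first_testing = [42, 55, 61, 48, 70, 66]
second_testing = [44, 53, 63, 47, 72, 64]

print(round(pearson_r(first_testing, second_testing), 2))
```

A value close to 1.0 suggests that learners kept roughly the same standing from one testing to the next, which is what you would expect from a stable instrument.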

Validity refers to the interpretation and use of test scores (American Educational Research Association, 1999). Validity is extremely important because we make decisions on the basis of these scores, and we need to be confident that they accurately represent the abilities--both strengths and weaknesses--that we intend to measure and that our use of the scores for various purposes is appropriate.

Would a teacher examining the score(s) on a particular test make appropriate inferences about students' abilities?
To what extent do the test scores mean what the developer says they mean?
How accurate is the test as a measure of a student's abilities in a particular domain or content area?
What evidence exists to support the use of a score for different purposes (such as placement in a course or program, eligibility or referral for specific services, and identification of instructional needs)?

Questions like these are addressed during the validation process. Formal instruments have usually been subjected to this kind of evaluation and assigned a "validity coefficient." Programs should choose those with validity at acceptable levels. (Reliability is necessary, but not sufficient, to ensure validity.)

You should also be aware of any factors that might affect a score's validity. For instance, an English language learner who takes a math test in this country may be at a disadvantage if he doesn't read English well. If he doesn't know key vocabulary in the problem-solving section, he will not be able to demonstrate his true math abilities. This test is not a valid measure of his math skill because it requires reading as well as math.

Of course, the issue of language proficiency is important in a broader context as well. English language learners may require assessments designed specifically for them if they are not sufficiently proficient in the English language to understand directions and read the test items. If you give the TABE to someone who doesn't understand the language, you can't get a valid score.

A similar situation arises when we attempt to measure vocabulary with a written test. Decoding ability is required to read and respond to written test items, so unless we are sure a learner can read the test items accurately, we cannot be sure whether we are actually measuring knowledge of word meanings or merely decoding ability. When a learner's decoding skills are limited, we need to administer an oral vocabulary test to get a valid measure of vocabulary.

A learner's experience with tests and test-taking skills also may affect the validity of a score. If any of the descriptions in the following list apply, the test score may not accurately reflect the learner's skills and abilities.

The adult:

has been out of school for many years,
is anxious about taking a test,
is not familiar with machine-scored answer sheets or some other part of the testing process, and/or
doesn't know a strategy for approaching difficult multiple-choice questions (for instance, eliminate the obvious wrong answers and make a good guess).

The score in such a situation in part reflects features of the test and/or aspects of the learner's knowledge and experience not related to the content being assessed. You may be familiar with the idea that some people are better at taking tests than others. If we want the score to be a fair measure of reading-related skills, we need to take steps to minimize the effect of test-taking skills.

To improve the validity of scores, you should avoid giving tests at enrollment. Instead, take time to prepare the learners:

Explain what the test measures and how it will help you to help them.
Explain that scores will be kept confidential and not released to others (except for reporting purposes).
Reassure them that you know the test will not show everything they know and can do, and that they will have plenty of other opportunities in class to demonstrate their abilities.
Administer the publisher's practice test, if one exists. If not, at least create a few practice items and give the learners a chance to get comfortable with the answer sheet. As they work on this practice, be sure they are recording their answers correctly (so that the answer to item #5 is marked on the line for item #5, for instance!).

As suggested above, your understanding of factors affecting validity helps ensure that you will make reasoned and fair judgments about a learner's performance. In other words, validity (like reliability) isn't determined only by qualities of a test or other measure. Your interpretation and use of the results are also important. For instance, a reading achievement test like the TABE or CASAS is not intended to be used as a screening tool⁴ for learning disabilities.

⁴ The term "screening" is often misunderstood. A screening tool is used to identify those learners who may have a problem (such as a learning disability). The purpose of screening is to identify those who should be referred for further assessment to make a determination about the problem in question. Although a screening process may include such test scores among other data collected, to use a TABE or CASAS test alone as a screening tool is not a valid use. No matter how valid the score may be for its intended purpose, it may be invalid when used in other ways.

Administering the wrong level of a test also results in invalid scores. If the test is too difficult, there are too few items at a low level to identify a learner's strengths and needs. (The testing experience may also be frustrating and discouraging.) If a test is too easy, the learner cannot demonstrate skills at higher levels, and problems that might show up with more difficult tasks are not revealed. Both situations result in scores that do not reflect true abilities. Although it may be less expensive and may seem more efficient to give the same level of test to everyone, it is not best in the long run because you don't get good information about learners. For valid use of a standardized test that has more than one level, it is vital to begin with a placement or locator test to identify the appropriate level of test to administer.

You (or decision-makers in your program) may consult the Mental Measurements Yearbooks⁵ and/or test publishers' manuals to find reliability and validity data on instruments you are considering, so you can be sure to choose instruments with acceptable ratings. You also should be thoughtful, careful, and systematic in your use of the tool(s) you choose in order to get as true a measure as possible.

⁵ The Mental Measurements Yearbooks are published by the Buros Institute of Mental Measurements. You may find them in university libraries or online at www.unl.edu/buros/.

But no matter how good it is and how professionally you use it, every assessment has limitations. No single measure can provide a complete picture of adults' abilities in reading or any of the reading components. A test is just a sample of performance. And of course, you are only human, and your interpretations are not 100% accurate. More than one kind of problem may result in poor test performance. You may not be fully aware of all the factors involved in performance on a test. You may miss something important and under-interpret the results, or, in contrast, you may make a broad generalization that isn't fully supported by the data.

Multiple measures

One solution to these universal limitations is to use multiple measures, so you can look at each individual in different ways at different times. No single test provides a full picture of what a learner can do. You and the learners should know that although you will make instructional decisions based on one or two early measures, these decisions are tentative, and there will be plenty of other opportunities for them to demonstrate their abilities. If a learner is not good at taking tests, or if you are not sure about your interpretation of the scores, the consequences are less serious when you also consider other classroom performances in assessing reading abilities.

Of course, multiple measures are necessary to assess the reading components, so you will want to think about the types of measures that will meet your planning needs and your other assessment purposes as well. As you consider the components, think about what it will take to get reliable and valid assessment data on the learners in your program.

How Can We Assess the Reading Component Skills?

In the ideal world, we would all have the time and other resources to assess each of the component skills as needed and to use the information acquired to provide individualized instruction. Of course, that isn't the world we live in, but you may still find it helpful to consider some of the options in case you have an opportunity to influence decision making about your program, to acquire special funding, or to create partnerships to access professional resources from your school or other agencies.

A combination of tests and other measures may be used to address your three broad purposes: initial planning, progress monitoring, and outcomes measurement. Each of the next four chapters includes descriptions of the kinds of tests and tasks used to assess the reading components. Chapter 8 looks at all the component assessments as part of an initial assessment system and suggests next steps in developing individual learning plans. Reviewing this general information may help you to better understand the concepts, issues, and options, so you can make decisions that make the most of your resources.

Of course, you can't be expected to make all the necessary changes overnight. We encourage a thoughtful, planned, and deliberate approach. If your goal is to be able to administer and properly use component assessments in order to provide more individually appropriate and effective instruction, you should take one step at a time. We hope you will use what you learn in this book to make a start. When you are comfortable with these changes, you can evaluate them, make any necessary adjustments, and then investigate the possibilities for expanding your assessment system. With this approach in mind, we suggest the simple plan below. Although it does not provide comprehensive assessment, it is a possible first step. Read Chapter 8 for details on the plan and a checklist to document learner assessment.

A start-up plan for initial assessment

The first three steps are intended for all learners. They provide information about each learner's background and abilities, and also act as screening tools to determine who needs further assessment.

Step 1 (For all learners)
Conduct an interview with each learner at enrollment to set individual reading goals and to learn about specific reading difficulties, past educational experiences (including any special reading help), job skills, and other abilities and interests.
Step 2 (For all learners)
Administer a standardized reading comprehension test (you're probably already doing this) to get a measure of silent reading comprehension and establish a baseline for progress and outcomes measurement (including accountability).
Step 3 (For all learners)
Administer a quick measure of fluency. This is for screening purposes, to assess speed only. Ask each learner to read a short passage aloud as rapidly as possible--with accuracy--and count the number of words read in one minute. (The difficulty level of the passage depends on the learner's reading ability. See Chapter 8.)

Decision point:

Those who score at least 8 GE on the reading test and read at least 125 words per minute may need no further testing right away. You may proceed with planning and teaching.

For those who score below 8 GE or read more slowly than 125 words per minute, you should get more information to identify the specific causes for the comprehension and/or fluency problem.

Step 4 (For those who need further assessment)
Administer a decoding and/or a word-identification test (to assess print skills).
Step 5 (For those who need further assessment)
Administer an oral vocabulary test (to assess meaning skills).

For details on this initial assessment plan, see Chapter 8.
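The decision point in the plan above can be written out as a short sketch. The cutoffs of 8 GE and 125 words per minute come from the plan itself; the function names are hypothetical, and the `seconds` parameter is an added assumption that lets you convert a timing other than exactly one minute. The result is a screening flag only, not a diagnosis.

```python
def words_per_minute(words_read, seconds=60):
    """Convert a timed oral reading into words per minute."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return words_read * 60 / seconds

def needs_further_assessment(grade_equivalent, wpm, ge_cutoff=8.0, wpm_cutoff=125):
    """Apply the start-up plan's decision point.

    Returns True when the learner scores below the GE cutoff or reads
    more slowly than the fluency cutoff, meaning Steps 4 and 5 apply.
    """
    return grade_equivalent < ge_cutoff or wpm < wpm_cutoff

# Hypothetical learners: (reading test GE, words read, seconds timed).
learners = {
    "Learner A": (9.2, 130, 60),
    "Learner B": (9.2, 110, 60),
    "Learner C": (6.5, 140, 60),
}

for name, (ge, words, secs) in learners.items():
    rate = words_per_minute(words, secs)
    if needs_further_assessment(ge, rate):
        print(name, "-> continue with Steps 4 and 5")
    else:
        print(name, "-> proceed with planning and teaching")
```

In this example only Learner A meets both cutoffs; Learner B is flagged for reading rate and Learner C for the comprehension score.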

Ensuring confidentiality

All assessment information should be entirely confidential. Of course, you will report scores to funders as required, but names are not attached to this information. Learners should feel comfortable that anything they reveal is used only by those who need to know in order to provide appropriate instruction. You may not reveal any information to others without the written consent of the learner. You should be certain that test scores, interview information, and other assessment data are stored in secure cabinets and are never left open (on desks, for example) for others to see.