Full Discussion - Focus on Tests and Testing - Strategies for Innovation in Community College ESL, February 4-8, 2008

Focus on Tests and Testing

The discussion covers the pros and cons of different standardized tests, including BEST Plus, CASAS, and several tests used by the DLIELC, as well as the use of formative and 'homegrown' assessments in the classroom.


Hi, Ted.

I definitely agree that smaller classes are more likely to bring learner gains, especially in language, since students will have more opportunity to speak when there are fewer students. I didn't know about the DLI policy, but it is a good one (see Transcript: "Class Size; DLIELC"). The problem with adult ESL classes, I fear, is that there is such limited money available that programs would find it difficult to keep class size to these numbers.

Has anyone tried to reduce class size and chart the learning gains? If so, please share with us all. It might be that this would be a good investment. If students in smaller classes make faster gains, then there would be more spaces available as these students transitioned to other classes or the workforce.

Jodi Crandall


As long as the BEST Test is used to show gains, I'm afraid that there isn't much chance that REAL gains could be charted and compared. It is used too often to measure progress and is not as good for that purpose as one would hope. Really valid proficiency tests are the only way to prove the point. Achievement tests don't tell much in terms of overall progress. I would also guess that it would take a minimum of 120 hours of solid instruction before any test would show a measurable gain one could put numbers on. Other factors, including attendance, would have to be factored in. Adults have families and jobs that get in the way of total dedication to attendance. Also, one would have to have a test in which human judgment plays a small role. The BEST is fine for its original purpose, which was to place persons with others at their same levels of communication. Good measurement is the only way to prove anything. I often rely on plain old gut instinct to figure out how things are really going. I know that's not very scientific.

Ted Klein


Ted,

Did you use an oral proficiency interview at the Defense Language Institute? If so, how long did it take to administer to each student? I think time is a real barrier to most adult ESL/ESOL programs. Thus, they use tests that can be administered to students as a group and easily evaluated.

For some college programs for international students, a whole range of tests are used for placement, including a writing sample, a reading test, and an oral interview. Clearly that's more possible with smaller numbers of students who are also paying for the classes.

What do you think is needed in the way of tests? In Passing the Torch, Forrest and I point to assessment as a major issue facing all the colleges we studied. We recommend the development of a test of all four skills that, if possible, could be administered and scored in a reasonable amount of time.

Have you tried the BEST Plus test? What has been your experience with that test?

I agree that for placement at literacy levels, a simple test developed by a program might be sufficient.

I also think that for progress through the levels, student achievement in the previous class, as judged by the instructor ("professional wisdom"), is still the best determiner of whether a student is ready to go to the next level.

Jodi Crandall


Jodi,

My wife, Mary Margaret, was a certified OPI (Oral Proficiency Interview; see http://www.dlielc.org/testing/opi_test.html) interviewer at DLI. She says that the average interview was 20 minutes, depending on the interviewee's level. Two people interview, and the higher the level, the longer the test. It's a really good test. The bad news is that there is/was something like a 40% failure rate for certification of interviewers among instructors and staff. Some of these people were pretty sharp, but doing it is an art and a science. There is no question the OPI is the state of the art. Even experienced interviewers get frequent tune-ups. I believe that the OPI scoring system, with the + indicating 60% of the next higher level, COULD be simplified and adapted to adult education. However, if progress has to be shown as often as now, real proficiency doesn't change that fast, and it could cost programs $ when they submit their paperwork. IF it were done after maybe 120 class hours, there might be significant changes.
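
A minimal sketch of how such a simplified plus-scale might be computed, assuming a continuous 0-5 proficiency estimate and the 60% threshold described above (illustrative only; this is not DLIELC's actual scoring procedure):

# Hypothetical helper: map a continuous proficiency estimate to an
# OPI-style rating, where "+" means the learner is at least 60% of the
# way toward the next level. The scale and cutoff are assumptions.
def opi_style_rating(score, plus_cutoff=0.60):
    base = int(score)        # whole-number base level
    fraction = score - base  # progress toward the next level
    return f"{base}+" if fraction >= plus_cutoff else str(base)

for s in (1.3, 1.6, 2.75):
    print(s, "->", opi_style_rating(s))  # 1.3 -> 1, 1.6 -> 1+, 2.75 -> 2+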

Although I got the training, I bribe other people to do the BEST Plus on my students, including Mary Margaret! I think that it's fine for placement, but it really doesn't work for "progress" for a number of reasons. Even when people are trained to use it properly, there are enormous differences in how interviewers express themselves orally and assign scores. If we have to continue using the BEST tests, another thing that I would like to see is small teams of really good and consistent interviewers who do the tests on all of the students. A weak measurement system is the Achilles' heel of Adult Education.

I would prefer the use of the American Language Course Placement Test (ALCPT; http://www.dlielc.org/testing/ALCPT.html). If you will give me an e-mail address that is convenient, I will send you more information on it from my old workshops. It's cheaper, quicker, and quite accurate. I totally agree with you that teacher wisdom is also a good way to determine where and when the student should move on. Unfortunately, many managers don't trust the peasants! There are also several good ways to place people that don't depend on the reading ability the ALCPT requires.

I put out an Adult Education/ESL Newsletter once in a while. I recently did an article on the OPI, which may answer more of the questions: "The Federal Oral Proficiency Interview," Adult Education/ESL Newsletter, Jan. 8, 2008, covering the OPI rating factors and the OPI performance profile.

Ted Klein


Ted and all,

I imagine others would like to have access to the ALCPT information as well.

I have also given the OPI, not as a certified interviewer but with a lot of training in its use. I use it as part of information gathering in reviewing programs overseas, to see what various levels of instruction correspond to in proficiency terms. I haven't used it with an adult ESL population. Like Mary Margaret, I have found that it takes about that amount of time.

Do others have the same concerns about the BEST Plus?

Jodi Crandall


Jodi,

At Raritan Valley, since my arrival five years ago, we have put in place homemade standardized exit exams for grammar and reading/writing classes at each level so that we have some way of ensuring that students with different instructors enter the next level with approximately the same skill level. We did that because many instructors engaged in social promotion (see http://en.wikipedia.org/wiki/Social_promotion). The exit exams have improved the consistency of the program and helped even out the skill level within classes.

We still have the social promotion problem with the listening/speaking classes because there is no standardized assessment, and instructors sometimes pass students who really do not have the requisite level. In fact, some students have complained that the listening/speaking courses are a waste of time because they have classmates in an upper level class who have great difficulty speaking.

Even though we have course outlines with clearly defined learning outcomes, my experience has taught me to be wary of the open-ended concept of professional wisdom. That sometimes morphs into a misguided belief that passing unprepared students is somehow constructive.

Kevin Hinkle


Kevin,

Unfortunately, social promotion is always a danger.

Since your standardized exams for reading/writing and grammar classes have proved so helpful in placing students in classes at the right level, are you thinking of developing a listening/speaking test as well?

I wonder whether any other programs have developed or identified a listening/speaking test that they could describe or recommend?

Jodi Crandall


I don't know if it's still available, but the basic paper BEST or the scale-scored BEST Plus could help avoid social promotion. (For information on scale scores, see http://www.sabes.org/assessment/scalescores.htm) You need people who are trained in either test and who have not worked with the particular student being interviewed. The training for the old-fashioned BEST is not too complicated, especially if you don't need scale scores.

A similar test can be made in-house: set standards for a point system, with a few points for grammar or fluency on each of a number of questions using pictures, as in the sketch below. Don't give the students feedback on the test. Use some "gold standard" students, ones commonly agreed to be at certain levels in the two skill areas, to help set the level cutoffs by the numbers.
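
A hypothetical sketch of such a homemade point system (the point ranges and level cutoffs here are invented for illustration and would need to be calibrated against the "gold standard" students):

# Score an interview: a few points each for grammar and fluency on
# several picture-based questions, then map the total to a level.
def score_interview(responses):
    """responses: list of (grammar_pts, fluency_pts), each 0-3 per question."""
    return sum(g + f for g, f in responses)

# Cutoffs set by scoring students whose levels are already agreed upon
# and noting where their totals cluster (assumed values).
LEVEL_CUTOFFS = [(0, "Beginning"), (12, "Intermediate"), (20, "Advanced")]

def place(total):
    level = LEVEL_CUTOFFS[0][1]
    for cutoff, name in LEVEL_CUTOFFS:
        if total >= cutoff:
            level = name
    return level

total = score_interview([(2, 1), (3, 2), (2, 2), (3, 3)])
print(total, place(total))  # 18 Intermediate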

Mary Jane Jerde


Mary Jane,

Great post. I think you're exemplifying how programs can make creative use of tests, rather than be the captive of them.

Forrest Chisman


Mary Jane,

I used to be at CAL, and if my memory serves me, the basic or paper BEST did not discriminate at very high levels. Was that your experience?

Jodi Crandall


Dear Assessment Listserv Members,

It is with great interest that we have been following this discussion regarding one of our assessments, BEST Plus, and its use for measuring oral proficiency at initial placement and for tracking learner progress.

As other assessments have been brought into the discussion, we would like to provide the following clarification, taken verbatim from the DLIELC Website (http://www.dlielc.org/testing/ALCPT.html): "DLIELC conducts English language proficiency testing using the Oral Proficiency Interview (OPI), a face-to-face or telephonic interview, and the English Comprehension Level (ECL) test, a multiple choice test of listening and reading comprehension. (The ECL test is provided to authorized US Government users only.) DLIELC also makes available the American Language Course Placement Test (ALCPT) for English language programs conducted outside of DLIELC. Achievement testing of American Language Course (ALC) objectives is conducted using book quizzes and performance tests, which can be obtained with the course materials."

According to the April 2007 edition of the Handbook for the American Language Course Placement Test (ALCPT): "The most important consideration under the general topic of test security is test compromise. One of the easiest ways to compromise any test is through its inappropriate use." Given the definitions above, there are many reasons to question the appropriateness of using any of the above-mentioned tests with the adult ESL population in the U.S.

Currently, BEST Plus is the only assessment developed specifically for use with adult English language learners in educational programs that simultaneously measures adults' listening and speaking skills. As the BEST Plus Level Gain study shows (http://www.cal.org/resources/digest/levelgain.html), more hours equal more gains, whether during one academic year (intensity of instruction) or over an adult's lifetime (persistence as a learner). Furthermore, this study also shows that not only is BEST Plus appropriate for initial placement, but it is also a good measure of student progress. In many ways, BEST Plus brings the knowledge and design of other oral proficiency instruments to the field of adult ESL in a highly accessible manner. Test administrations are relatively inexpensive when compared to the high cost of other oral proficiency assessments, the results are accurate and immediate, and for the first time, instructors are able to confirm many of their "gut instinct" and "handshake" evaluations, based on their expertise and familiarity with the adult ESL population, in a standardized manner that is scored by a common rubric.

In regard to test administrator training, another advantage of BEST Plus is that ESL instructors are able to engage in professional development in the area of proficiency assessment. Across the country, responses to BEST Plus training are overwhelmingly positive. Our corps of trainers does fantastic work to educate practitioners about the differences between being a good teacher and being a good standardized test administrator, in addition to training test administrators to administer and score BEST Plus. Furthermore, there are several opportunities for previously trained test administrators to recalibrate their scoring accuracy through the use of the Scoring Refresher Toolkit within their local programs and states. In this way, programs can confirm their inter-rater reliability, as CAL has also done during the development and design of BEST Plus. We would encourage anyone interested in finding out more about the design of BEST Plus and its high rate of inter-rater reliability to read the BEST Plus Technical Manual.

Finally, we hope that programs using BEST Plus continue to see the benefits of providing adult ESL learners with data regarding their oral proficiency level and progress. Adults with many commitments, in particular, need this type of information offered locally by their service providers, at their convenience. By knowing learners' levels, program directors and instructors can continue to make instructional and programmatic decisions based on learner needs and assist the learners in their own educational goal setting.

Michelle M. Ueland

Adult ESL Assessments

Center for Applied Linguistics


Good reply.

Jonathan Cohen


Michelle and others,

Thanks so much. We will all benefit from the clear explanation of the BEST Plus.

It's good to hear from some of you who are using the test as well.

Michelle, can you tell us if there are any plans for developing a comparable reading and writing test for adult ESL learners?

Jodi Crandall


Dear Listserv Members,

Greetings!

I'm so glad that there are so many people and programs interested in the Center for Applied Linguistics and our adult ESL assessments.

Regarding the nomenclature, the original BEST tests (oral interview and literacy skills sections) have changed since their original development in the 1980s, yet many of us still continue to refer to them affectionately as the BEST tests. The BEST oral interview has been replaced by BEST Plus, which is much more comprehensive and covers SPL levels 0-10 and NRS levels 1-6. Furthermore, the BEST literacy skills section has been updated and renamed BEST Literacy. It is currently available in three forms (B, C, and D), and there is a new Test Manual to accompany the updated tests. Although we are cognizant of the need for a higher level reading and writing test, currently there are no initiatives that we are aware of for funding this type of test development within the Department of Education Office of Vocational and Adult Education.

I would encourage anyone on this discussion list who is interested in assessment issues to begin by reviewing their state assessment policy to determine which assessments - of those approved for use by NRS at federally and state-funded adult ESL programs - are on their state's list. If BEST Plus, BEST Literacy, or the BEST oral or literacy skills sections are listed, or if anyone would like more information about CAL's assessments, check out CAL's website at www.cal.org or the BEST Plus website at www.cal.org/bestplus/

In many states there have been and continue to be numerous training opportunities. For example, in New Jersey the New Jersey Department of Labor regularly schedules BEST Plus training in different areas of the state. If anyone would like further information about the updates to BEST Literacy or about the two forms of BEST Plus (computer adaptive and print-based), the adult ESL assessments team at CAL is happy to provide information and continue to clarify any questions you may have off line. Please contact us directly at: 1-866-845-2378 (toll free) or at aea@cal.org

Sincerely,

Michelle Ueland


Michelle,

Thanks so much for this explanation. You have clarified the various BEST tests and their purposes. You're also right: if we are to develop additional tests, money needs to be made available for that.

Jodi Crandall


As an ESL instructor who has used BEST Plus for pre-test and post-test scoring in grant-funded classes, I have found the test to be an accurate gauge of student placement and growth in English.

One advantage is that I have been able to test new students myself and have post-testing done by various other ABE or ESL instructors, all of whom were trained by BEST Plus staff. Before a test session, the examiner and I go through a review. I have sat with the examiners for initial post-test sessions to help them gain confidence in their abilities. But I had faith that a "subjective" test could be accurate because I had used BEST for seven years in a refugee program and knew its value for placement.

The program does a fine job of adjusting the level of difficulty based on the student's responses.

Is one test ever enough? No. But I have enjoyed the benefit of being able to use both CASAS and BEST Plus in smaller community programs.

Mary Jane Jerde


Mary Jane,

Thanks for sharing your experiences.

Do you also use a reading and/or writing test at any of the adult ESOL levels?

Jodi Crandall


Hi,

The focus of the class is speaking, so I don't give the students any scale-scored reading or writing tests. Giving two oral post-tests takes enough time.

For class I do test the various skills.

Mary Jane Jerde


Mary Jane,

Why are you using two oral post-tests?

Jodi Crandall


Practically, because I am able to. We have the trained staff, and it is good for all of them to keep using the skills.

Theoretically, because one test is not a sufficient assessment. Also, CASAS is a listening test; BEST is a speaking test.

Financially, because I never know which test will produce the better score for reporting.

Mary Jane Jerde


Mary Jane,

Thanks for this clear answer. It's great that you are able to use both -- and I can see the benefits of doing so.

It would be interesting to hear from others who use more than one test, not only why they do so, but also how they are able to.

Jodi Crandall


Mary Jane,

That is very interesting. How often do you administer the BEST Plus to students as a post-test, and for what purposes? For example, is it the post-test used for promotion decisions, and how is it used to guide teachers in their classroom instruction of particular students/classes? Is it used for NRS reporting? Finally, I think a lot of people would like to know how you would compare the usefulness of this test to CASAS. When and for what purposes would you recommend either test?

Forrest Chisman


All the discussion on this list (the first I've ever joined) has been fascinating. I have a question I haven't seen addressed. Someone said that BEST Plus is the only valid kind of test for ESL learners. What does anyone think of the CASAS? That's what we use with our adult students, both ELLs and native speakers. I know it wasn't designed for ESL use, but it does use real-life material (road maps, pay stubs, store signs, etc.), and I think that's good. What has always bothered me is that it's taken silently. That doesn't seem right for our lower level students, who rarely make many gains. I wonder if it's considered valid for them or not. I know that other ESL programs do use it.

Thanks.

Gail Burnett


Gail,

Good question! I'd be interested in the replies, because CASAS is MANDATED in many states, and most programs don't want to use more than one assessment (because of limited time for testing and fear of over-testing). What do you all think of CASAS for ESL?

Forrest Chisman


CASAS is the assessment of choice (mandated) in Iowa. If teachers and learners buy into competency-based instruction, the CASAS assessments have some utility. The shortcomings and artificiality of standardized tests seem to be amplified with the ESL population.

The reading and listening tests fall short of providing a comprehensive, useful assessment of the student's abilities at orientation. We offer three levels in the morning (beginning, intermediate, and advanced), with placement done using their writing (the application), speaking and listening (initial interview questions), and reading (based on the literacy demonstrated in the application and interview).

However, due to the mix of skills, the beginning and intermediate classes are really blended multi-level classes, with some beginners who can read but struggle with speech and writing, some intermediate students who can speak very clearly but struggle with reading and writing, and virtually every other possible combination.

Similarly, teachers and students have been frustrated with the lack of progress in post-testing as well as the inconsistency of the scores. Those students who bring high literacy skills to the class tend to have fairly consistent scores. Those who are not as literate (a growing population in ESL) can swing up and down the scaled score throughout the year with little rhyme or reason.

Bottom line - CASAS works well for the standardization required by NRS, but is not as powerful or useful in regard to teaching and learning as one would like.

Jim Schneider


Jim,

Very interesting, and thanks. What puzzles me is that a lot of people seem to say, as you do, that CASAS and other standardized tests work well for NRS, but not for teaching/learning. But if the purpose of NRS is to determine how much students are learning in particular programs, how can this be so? Maybe I'm just dumb about this, but I'd rather confess my ignorance than live with it. Shouldn't NRS be measuring what your students are actually learning? And if CASAS and other standardized tests don't do that, don't we need either a better assessment system or a better way of reporting to NRS by other means?

As I say, maybe I'm just dim.

Forrest Chisman


Forrest,

Having read everything I can find that you have written in the past 17 years, "dim" is the last term I'd use to describe you. (I'm working on a Ph.D. proposal dealing with the marginality of community college ABE programs.)

The issue with standardized tests, NRS, NCLB, etc. is that they are trying to impose Frederick Taylor's principles of scientific management (see http://en.wikipedia.org/wiki/The_Principles_of_Scientific_Management) on people. The variables involved with forging steel are certainly complex, but finite in comparison to the variables involved with nearly any aspect of working with people.

An incredible instructor can make a poorly conceived curriculum work, whereas the best curriculum in the world will languish in the hands of a mediocre to poor teacher.

Similarly, standardized assessments are a weak, artificial means of assessing skills.

I do like CASAS for the life & work orientation over a more academic assessment such as TABE. However, even the life and work orientation is highly dependent on the experiences of the learner. This morning I have a Buddhist monk from Cambodia enrolling in our program. He has completed 16 years of education in Cambodia, and his listening, speaking, and writing are all exemplary relative to our typical ESL student. However, he is struggling with the last section of a CASAS C level reading assessment because of the context of the material being presented. I suspect that he will do well, but his struggle exemplifies the weakness of using standardized assessment and expecting standardized results. Educating people is significantly more complex than forging steel. We are learning how to become more effective in our approaches, but I cannot fathom teaching and learning ever being "standardized" to the extent imagined by those who seem to believe that NRS, NCLB, etc. are the answer to the question.

Jim Schneider


Please pardon this lengthy post.

Standardized achievement assessments that measure overall literacy ability (e.g. CASAS, TABE, NAAL, NAEP, GED, etc.) (for NAAL: National Assessment of Adult Literacy see http://nces.ed.gov/naal/; for NAEP: National Assessment for Education Progress see http://nces.ed.gov/nationsreportcard/), especially if they are group-administered and involve selected response items, may not provide the instructor with an exhaustive and in-depth analysis of every facet of the basic skill for every learner in the classroom. Good teachers and programs routinely supplement with formative assessments. That said, results from standardized achievement assessments can inform the teacher of the ability level for each student. An item analysis can also tell the teacher about the general areas of strengths and weaknesses for individual students and the class as a whole -- information that can inform curriculum/lesson planning.
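
As a rough illustration of the kind of item analysis described above, here is a short sketch; the item-to-competency mapping and the student data are hypothetical, not actual CASAS content:

from collections import defaultdict

# Hypothetical mapping of test items to competency areas.
ITEM_COMPETENCY = {1: "reading signs", 2: "reading signs",
                   3: "pay stubs", 4: "pay stubs", 5: "maps"}

# Each student's responses: item number -> correct (True/False).
class_results = [
    {1: True, 2: True, 3: False, 4: False, 5: True},
    {1: True, 2: False, 3: False, 4: True, 5: True},
    {1: True, 2: True, 3: False, 4: False, 5: False},
]

totals = defaultdict(lambda: [0, 0])  # competency -> [correct, attempted]
for student in class_results:
    for item, correct in student.items():
        comp = ITEM_COMPETENCY[item]
        totals[comp][1] += 1
        totals[comp][0] += int(correct)

# Flag weak areas to fold into curriculum/lesson planning.
for comp, (right, attempted) in totals.items():
    pct = 100 * right / attempted
    flag = "  <- review in class" if pct < 60 else ""
    print(f"{comp}: {pct:.0f}% correct{flag}")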

In the CASAS system (which I am most familiar with), the CASAS competencies form the curricular backbone. These competencies were developed and are periodically revalidated with a broad stakeholder constituency. In recent years, content standards have also been developed that identify the basic skills that underlie these competencies. Priority competencies identified by these stakeholders form the basis for assessment development.

These priority competencies act like "power standards". However, when developing curriculum and lesson plans, we recommend that the instructional content go beyond the competencies measured on the pre-test and incorporate other competencies from the master list that may also be important and relevant to the learners. Because we are talking about measuring basic skill ability and not subject matter knowledge like social studies or biology, instruction in a broader set of competencies and the basic skills that underlie them can facilitate transferable learning that ultimately contributes to greater performance on the post-assessment.

When it comes to measuring progress, I feel that a CASAS post-test assessment does measure what students are learning, because it is assessing learners on the same standards (i.e. CASAS competencies and basic skill content standards) and performance levels that were used for (i) the baseline pre-test assessment, (ii) the development of curriculum, and (iii) the delivery of instruction.

Because the NRS expects learners to substantially improve their literacy ability from pre-to-post test by demonstrating movement from one literacy level to the next, progress in one or two sub-components of a basic skill (e.g. converting percents to fractions or learning a new grammar rule) may in and of itself be insufficient to achieve an NRS level completion. This can be frustrating for a teacher who may have noticed student progress in class work but not see it reflected in the standardized assessment results.

Sometimes, even if the curriculum and instruction are aligned to the standards, learners may get tripped up in the assessment by a certain vocabulary word, by the context of a test item (Jim gave the example of his student who was a Buddhist monk from Cambodia), by one of the distractors, by a writing prompt, etc. Additionally, though standardization brings validity and reliability to the process of measuring progress (which I believe has been vital to maintaining the credibility and funding of adult education at all levels), it brings with it some "standard measurement error," wherein some fluctuation in scores is not out of the realm of accepted possibility. Computer adaptive testing can minimize this measurement error even further, but it may lessen the diagnostic information provided because different students are administered different test items. All these issues are not unique to assessments used for the NRS but are inherent to the process of standardization.
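
A small sketch of the measurement-error point, using the textbook formula SEM = SD * sqrt(1 - reliability); the SD and reliability values below are placeholders, not published figures for any particular test:

import math

def sem(sd, reliability):
    # Standard error of measurement from score SD and test reliability.
    return sd * math.sqrt(1 - reliability)

def gain_exceeds_error(pre, post, sd, reliability, z=1.64):
    """True if the pre-to-post gain exceeds a one-sided confidence band
    on the difference score (SEM of a difference is SEM * sqrt(2))."""
    sem_diff = sem(sd, reliability) * math.sqrt(2)
    return (post - pre) > z * sem_diff

# With these assumed values, a 4-point gain just clears the error band.
print(gain_exceeds_error(pre=215, post=219, sd=5, reliability=0.9))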

Results from assessments that require students to "construct" their response instead of "selecting" from a set of choices can help this matter somewhat by providing more information that informs instruction. Over the past 10 years, CASAS has developed standardized constructed response assessments such as the functional writing assessment and the workplace speaking assessment. Here, the student responses provide much richer information to inform instruction than might be available from a multiple choice test. With extensive training, ongoing recertification, and stringent inter-rater reliability practices, these tests can also serve as reliable standardized assessments for NRS accountability purposes. The challenges here are related to cost and scalability. These assessments are administered and/or scored one-on-one and the adult education system is just not resourced currently to implement this level of assessment for all learners in the system.
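
One common way to check the inter-rater reliability mentioned above is Cohen's kappa for two raters scoring the same set of constructed responses; the ratings below are hypothetical:

from collections import Counter

def cohens_kappa(rater_a, rater_b):
    # Observed agreement minus chance agreement, scaled to [−1, 1].
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

a = [2, 3, 3, 1, 2, 2, 3, 1]  # rater A's scores
b = [2, 3, 2, 1, 2, 3, 3, 1]  # rater B's scores
print(f"kappa = {cohens_kappa(a, b):.2f}")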

Despite all these issues, in my experience, programs that reflect good outcomes on standardized assessments (both constructed and selected response) encompass more of the characteristics of high quality programs, and are more effective at helping learners to achieve their goals. They have strong leaders, aligned curriculum and instructional practices throughout the program, instruction of reasonable intensity and duration, counseling and other support services, and higher rates of learner persistence and attendance. This has been reaffirmed for me through extensive analyses of Connecticut's performance and funding data, reviews of local curriculum, observations of classroom instruction, and conversations with practitioners in CT and in other states.

We are also beginning to see in our data that achieving higher levels of basic skill proficiency as evidenced on CASAS assessments translates to measurable success outside the classroom, e.g., greater probabilities of passing the GED test, greater employment rates, greater average earnings, etc.

One thing I will say about the NRS reporting framework is that instead of using a level completion approach, a growth model may more accurately capture the progress a student is making, regardless of his/her starting point in a functioning level. For example, in one of our high performing programs, only 18% of learners at the advanced ESL level (which spans 15 CASAS scale score points) completed that level, while 58% of learners demonstrated a significant gain of 4 scaled-score points from pre- to post-test. For this reason, in CT, we evaluate scaled-score point growth as a separate measure within our accountability framework, distinct from level completion. The sketch below illustrates the difference between the two metrics.
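
A minimal sketch contrasting the two metrics on made-up pre/post scaled scores (the 235 level boundary and 4-point gain threshold are assumed for illustration, not actual CASAS cutoffs):

# Each pair is (pre_test, post_test) scaled scores for one learner.
pairs = [(222, 225), (228, 236), (230, 233), (224, 229), (233, 234)]
LEVEL_EXIT = 235  # assumed upper boundary of the level
MIN_GAIN = 4      # growth-model threshold

completed = sum(post >= LEVEL_EXIT for pre, post in pairs)
grew = sum(post - pre >= MIN_GAIN for pre, post in pairs)

n = len(pairs)
print(f"level completion: {completed}/{n} = {100*completed/n:.0f}%")
print(f">= {MIN_GAIN}-point gain: {grew}/{n} = {100*grew/n:.0f}%")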

In conclusion, the big challenge from my perspective is not that the assessments do not reliably capture the progress made by learners or measure what is taught in the classroom. The real challenge is that a majority of ESL students just don't stay long enough to make significant progress, regardless of the assessment used. As I stated earlier, two CASAS studies with CA and CT data, and now the recent BEST study with MA and IL data, confirm that higher success rates are attainable if students stay for at least 100 hours in a fiscal year, but the reality is that a majority of ESL students don't do that. One just needs to compare the results from NRS Table 4 (all learners) with those of Table 4B (those with matched pre- and post-tests), for any state that has sound assessment practices and a reliable data system, to see this persistence issue manifested over and over again. Strategies to improve persistence - now that's another discussion.

Ajit Gopalakrishnan

