Monday, May 16, 2005

Possible Vote Today on HB3162

The bill to abolish CIM/CAM and replace Oregon's tests with new ones that are actually valid has its "third reading" on the house floor today. They may or may not debate it and vote, depending on the calendar and time constraints.

There is a "minority report" which the Democrats on the Education Committee submitted as an alternative bill for consideration, which will be voted on first. The alternative basically says the Department of Education will conduct a study to decide what to do about CIM/CAM.

Oh, great idea. Let's have more talk. Let's do a fact-finding mission. Let's let the bureaucrats who created this mess conduct an analysis and figure out how to get out of it.

We don't need any more fact-finding missions. We need a FACT-FACING mission!

Face it: the assessments are not valid and cannot be trusted. CIM and CAM are a statewide joke, but the joke is not funny, because it is doing so much harm to our schools.

Unfortunately, this has become a partisan issue in the legislature, as the Democrats have come to the defense of the education establishment lobby, as usual.

At least today, for the first time in more than a decade, legislators will have to cast an up or down vote on CIM and CAM. No more hiding behind studies and committees and analysis.

The people of Oregon will know who is listening to them, and who is listening to the bureaucrats.


Anonymous said...

"We need a FACT-FACING mission!"
Great line, Rob! Keep up the good work!

The Manly Ferry said...

Hey Rob,

I just read the post over on Blue Oregon pointing to Oregon's conservative blogs and requesting that conservatives move their half of the dialogue to these spaces so that progressives can "keep building their community." Even as a contributor, I can't help but see that statement as unfortunate.

For the record, as encouraged as I am by your first comment on Blue Oregon's site, I have to say I'm less excited to see the talk radio position on the resume; I'm a print geek and consider radio a bad forum for dialogue. Still, the radio is the radio and here is here, so I'll check out your site - regularly, time and good Lord willing - to keep up with how the other half thinks.

Richard Meinhard said...

Rob, I agree with you that the state did a poor job of implementing the standards and testing mandate it was given by the legislature. It was to develop criteria for what students were to know and then build tests matched to those criteria: criterion-referenced tests. Off-the-shelf commercial tests wouldn't fulfill the intent of the law for criteria-based standards and testing.

But your point is that the tests aren't valid, and I would agree. There have been no validity studies as far as I know. Washington state did have the NWEL do a validity study on their tests so they could withstand a lawsuit. If Oregon ever mandates the CIM by law, it would then legally have to prove validity if challenged in court.

Technically, the tests are developed using the Rasch model, a statistical method now widely accepted as a replacement for the old item-analysis methods of test construction. The Rasch model generates a scale that is independent of both items and students, so the scale can be used to measure new items and determine their difficulty. Because it is independent of any particular set of items or students, the scale remains fixed, and new versions of the test are anchored to the same scale. Every item in the state's item bank (which started with the bank brought over years ago from the NWEA) has its own set of statistics showing its point on the scale and its error.

Most importantly, the Rasch method produces a measure of the "fit" of each item to its place on the scale. If an item moves around the scale relative to other items, it shows up as a misfitting item; a poor fit score means something is wrong with it. The item can then be reworked and retested to see whether it will fit with the other items in its proper place on the scale.
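A minimal sketch of the one-parameter Rasch model described above may make the "scale independent of items and students" claim concrete. All abilities and difficulties here are invented for illustration:

```python
import math

def rasch_p(theta, b):
    """Rasch (one-parameter logistic) model: probability that a student
    of ability theta answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

# Two hypothetical items, one easier (b = -1.0) and one harder (b = 1.0).
# Whatever the student's ability, the gap between the items' log-odds is
# always b2 - b1 = 2.0. This invariance is what lets item difficulties
# sit on a fixed scale regardless of which students take the test.
for theta in (-2.0, 0.0, 2.0):
    gap = logit(rasch_p(theta, -1.0)) - logit(rasch_p(theta, 1.0))
    print(round(gap, 6))   # 2.0 each time
```

An item whose observed log-odds gap drifts across student groups is, in the terms above, a "misfitting" item.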

The Rasch model is a psychometric method of statistical analysis; it tells us nothing about validity. You are on good ground to challenge the validity of the state tests. And it is precisely because Cathy Brown didn't know how to use Rasch analysis that the performance tests produced unreliable assessments. When I pointed this out to her, she dismissed Rasch analysis as a way to develop the performance tests.

The conclusion we should reach: Oregon's single-answer tests are technically very well done according to test experts around the country, because they use the most advanced Rasch methods of item analysis. However, their validity is highly questionable, and you are on good ground to question them, particularly the math performance tests. As one expert put it, Oregon's assessments are "statistically brilliant and conceptually bankrupt."

Some if not all of the commercial test companies now use Rasch analysis in developing test items. Some are creating permanent, stable scales on which K-12 students can be measured. Once a test company has produced a stable bank of test items, it is statistically an easy matter to "link" the tests, since each has an independent scale that exists apart from whatever cohort of students or set of test items is used.
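A toy illustration of why linking is "statistically an easy matter" once items sit on a Rasch scale: two hypothetical test forms sharing anchor items can differ only by a constant shift, so linking reduces to a subtraction. All item names and numbers are invented:

```python
# Hypothetical difficulty estimates for three anchor items that appear
# on two separately calibrated test forms. On a Rasch scale the two
# calibrations differ only by a constant, estimated from the anchors.
form_a = {"item1": -0.5, "item2": 0.3, "item3": 1.1}
form_b = {"item1": -0.2, "item2": 0.6, "item3": 1.4}

# Average difference over the anchor items gives the scale shift.
shift = sum(form_b[k] - form_a[k] for k in form_a) / len(form_a)

# Subtracting the shift puts Form B's items on Form A's scale.
form_b_linked = {k: v - shift for k, v in form_b.items()}
# Here shift is about 0.3, so form_b_linked matches form_a.
```

Real linking procedures also check how well each anchor item fits before trusting the shift, but the core arithmetic is this simple.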

But if we want to measure a specific set of criteria, then we need test items derived from those criteria. That is why ODE had to generate the criterion-based tests it now thinks it has. But who is to say an item actually measures its criterion? We are back to validity issues, and it is here, in my evaluation, that Oregon's work fails.

In the end, we have the same process of item development. Even if a commercial company is used, it will have to select items based on Oregon's criteria as listed in the state's grade-level outcomes. Or we can ignore what we say we intend to teach as outcomes and simply select a generic test, but that would mean losing the specific relationship to Oregon's learning objectives.

I think it really doesn't matter what test we use, since the validity issue is so huge. The good students score well on whatever test we use. The other issues you raise, of dollar costs and lost learning opportunity, are important, and I agree with you regarding those concerns. I would prefer far simpler, far less costly measures. Then give schools and teachers assessment options that would give them real power to diagnose and help students learn important, universally valued forms of knowledge.

Your point that the state has imposed a huge number of outcomes and tests, now extending into the CAM, identifies a recurring problem. Perhaps the legislature should limit the number of outcomes, the testing time, and the cost by law, and then require that the work be put out for bid. With its large staff, the ODE is quickly creating a much too elaborate system. It should not be. The really important work, as you pointed out, is helping to develop better instruction in the classroom. Unfortunately, the ODE has evolved into a centralized command-and-control operation that is micromanaging local schools.

gus miller said...

Good luck Rob. We do not need more "bad policy implemented poorly" from ODE.

Anonymous said...

This comment has been removed by a blog administrator.

Thor the Avenger said...

I do not get the Rasch/RIT scale. I have read somewhere that the RIT scale is alleged to be an interval scale, which I understand to mean that a 10-point difference should be equal whether it is between 190 and 200 or between 240 and 250, just as a one-foot difference in the shot put is the same anywhere. I can see how you create equal intervals when measuring how far someone can put the shot, but I have a harder time accepting that when it comes to test results.

The Oregon Department of Education provided, for the 2001-02 fifth-grade sample math test, a table converting "number correct" to a RIT score. There are 25 questions in total. One correct equals 169.0. Two correct jumps 9.8 points to 178.8. After that, for each additional correct answer, the RIT gains trend down from over six points to barely over two points in the middle range of 12-13 correct answers, then trend back up [24 correct gets a RIT of 250.8; 25 correct gets 260.6, an increase of 9.8 points].

So how can every single one-RIT-point interval be deemed equal?

If a kid gets 24 correct and scores 250.8, there is no way to know which of the 25 questions he got wrong; but the kid who gets 25 correct gets 9.8 points more. That treats each of the 25 questions as if it were worth 9.8 points, which makes no sense. It might sort of make sense if total correct answers were tied to a normal curve, but then the relative difficulty of individual questions would play no role.

Can you explain?
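For what it's worth, the shrinking-then-growing gaps in that conversion table are exactly what maximum-likelihood scoring under the Rasch model produces: the ability estimate for each raw score is the theta where the expected number correct equals that score, and those estimates are spaced widest at the extremes. A sketch with invented item difficulties (not Oregon's actual items) reproduces the pattern:

```python
import math

def rasch_p(theta, b):
    """Probability of a correct answer under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def ml_theta(raw_score, difficulties):
    """Ability estimate for a raw score: find theta where the expected
    number correct equals raw_score, by bisection. Defined only for
    raw scores 1..n-1 (a perfect or zero score has no finite estimate,
    which is why the endpoints of a conversion table need extrapolation)."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if sum(rasch_p(mid, b) for b in difficulties) < raw_score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# 25 invented item difficulties, spread evenly like a mixed test form.
bs = [-2.4 + 0.2 * i for i in range(25)]

# Ability gap between adjacent raw scores: largest near the extremes,
# smallest in the middle -- the same shape as the ODE table, where one
# more correct answer is "worth" more near 24/25 than near 12/13.
gaps = {r: ml_theta(r + 1, bs) - ml_theta(r, bs) for r in range(1, 24)}
```

On this view the scale is (claimed to be) interval in ability units, not in raw-score units: each additional correct answer buys a different amount of ability depending on where you are on the curve, because the test has few items easy or hard enough to discriminate at the extremes.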