Which test is better for you?
Marks Education is re-evaluating how we advise students on the SAT vs. ACT decision, in light of inconsistencies in the College Board's curves for the June and October 2018 SATs and for the October 24, 2018 administration of the PSAT.
Check out the score conversion tables on the Math section of the following four SATs. The first two reflect the older, consistent tests, and the last two are recent tests.
Predictability Is Key
These anomalous curves first appeared on the June 2018 SAT and then again on the October test. Until June 2018, SAT curves were relatively consistent.[1] This predictability is important on standardized tests, as it enables students to know how many questions they need to get correct in order to target a desired score. A student who has limited time to study SAT Reading and can’t handle a difficult 18th/19th-century passage could simply bubble in random answers for that passage, focus all his or her attention on the rest of the test, and still earn a score at or above the 90th percentile.
Similar strategies apply to the Math part of the test, where, for instance, a student who hasn’t learned difficult statistical concepts at school (most math courses don’t cover the statistical knowledge tested on 2-3 SAT problems) could guess on those questions and still earn a Math SAT score in the 700s.
But predictability is also important for college admissions offices that want a reliable benchmark to distinguish between, say, Anthony, who earns a 750 on the SAT, and Sameer, who earns a 680. Over the last 20 years, several studies have raised questions about the validity of the SAT. A common question validity studies seek to answer is whether the test helps predict college performance beyond what is predicted by grades. But through all the controversy about validity, the SAT has maintained its reputation for predictability. Predictability (measured by a statistical term known as reliability) is a major reason standardized tests still have a place in decision-making. Where colleges suspect grade inflation, admissions officers could say, ‘We can at least trust that SAT scores will be stable’.
But first in June 2018, and then again in October 2018, unpredictable SAT curves seriously affected many students’ scores.
A raw score of 54/58 on the Math section of the October 2017 or March 2018 tests (and on most others) earned a 780. On the June 2018 test, it earned a 700! After the June test, our response was to assume that the curve was a once-in-a-lifetime irregularity, but the same thing happened again in October. Seniors taking the SAT thus had to deal with two very unusual curves, an especially difficult situation for students facing early college application deadlines.[2]
Choose the least stressful path.
While we detail these changes and our own theory of why this has happened below, the key takeaway is this: All other things being equal (accommodations, reading speed, and comfort with geometry are some other factors in the SAT-ACT decision), when scores are tied, or even when baseline PSAT or SAT scores are a little higher than those on the ACT, we recommend that students lean toward the ACT.
While the ACT has many issues of its own (including the lower reliability of the Reading and Science sections), it is the more predictable test right now. Marks Education is entrusted with the great responsibility of charting the least stressful path for students under great pressure, with many commitments and little time. Until we see evidence of consistent curves on the SAT for at least seven tests in a row (one year of data), or updated reliability data from the College Board, we will maintain this recommendation.
MARKS EDUCATION’S ANALYSIS:
What else was affected?
Compared to the June and October SATs, the October 24 PSAT was actually the most unexpected. The PSAT is given three times each October, and in the past, the scales on the three PSATs have conformed closely – i.e., the number of correct answers on one administration has earned students roughly the same scaled score that it would on another administration. This year, however, was different.
Here is the top of the scale on the Math section of the October 10 and 24 PSATs from 2018:
The score conversion on the left, for the October 10 test, is typical, consistent with the scores seen on other recent PSATs since the redesign of the SAT. The score conversion on the right, which affected about 10% of the 1.8 million PSAT test takers, is very different: a student who got 43/48 correct would earn a 700 on the first test but a 610 on the second.
For the full scale, look here.
The College Board attempted to address these and other concerns on its blog; however, we feel the response does not solve the problem, given that the October 24 scores should not have resulted in 100-point differences between curves.[3] Beyond this, only the top 0.5% (or so) of test takers in each state are eligible for National Merit, so the National Merit Scholarship Corporation (NMSC) would have to selectively lower the award cutoff for those who took the October 24 test, which is extremely unlikely.
But does it really matter? After all, doesn’t a less forgiving scale imply that the test was easier, which means that a student who got fewer questions correct on the rigorous scale would likely have gotten more correct on the forgiving one? Yes and no. Theoretically, on an easier test, students should get more questions correct. However, an inconsistent scale creates more variability in outcomes. On each section, as long as a student bubbles in an answer for every question, the new SAT has a minimum possible score of around 330 and a maximum possible score of 800. When, as on the June 2018 SAT, three wrong answers drop a student 80 points on that roughly 470-point scale, more of the outcome rests on chance and less on ability. Students are disproportionately penalized for one or two small errors, and luck, or guessing correctly (remember that the new SAT has only four answer choices per question, down from five), can become very important. And these swings can reduce the reliability of a test.
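To see concretely how a steep conversion curve magnifies small mistakes, here is a minimal sketch in Python. The two conversion tables are illustrative only: they mimic the shape of a typical (“forgiving”) curve and a June-2018-style (“harsh”) curve, anchored to the 780 and 700 figures cited above, but the intermediate values and the function name are our own inventions, not the actual College Board tables.

```python
# Hypothetical Math conversion tables near the top of the raw-score range (58 questions).
# Illustrative numbers only; NOT the real College Board scales.
forgiving = {58: 800, 57: 790, 56: 780, 55: 780, 54: 780}  # typical curve: 54/58 -> 780
harsh     = {58: 800, 57: 760, 56: 740, 55: 720, 54: 700}  # June-2018-style: 54/58 -> 700

def points_lost(curve, wrong, max_raw=58):
    """Scaled points lost when a student misses `wrong` questions, per this curve."""
    return curve[max_raw] - curve[max_raw - wrong]

for wrong in (1, 2, 3, 4):
    print(f"{wrong} wrong: forgiving curve -{points_lost(forgiving, wrong)} pts, "
          f"harsh curve -{points_lost(harsh, wrong)} pts")
```

On the forgiving curve, three careless errors cost only about 20 points; on the harsh curve, the same three errors cost 80 points, which is why a couple of slips (or lucky guesses) matter so much more when the scale is steep.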
The reliability of a test is measured primarily by its correlation coefficient, or the consistency of scores across administrations. Because the College Board used to be transparent with its data, we knew that reliability on the old SAT was high – the correlation coefficient for each part of the three-part test was at or above 0.9. But after three years of the new SAT being in place, the College Board has still not released a reliability study for the current test.
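For readers curious about what that 0.9 figure measures, here is a minimal sketch of a test-retest (Pearson) correlation computed on invented scores for a handful of students. Real reliability studies use far larger samples, and none of the numbers below come from the College Board.

```python
import numpy as np

# Simulated scaled scores for the same eight students on two administrations.
# Purely illustrative data.
first_sitting  = np.array([780, 700, 650, 720, 600, 560, 740, 680])
second_sitting = np.array([770, 710, 660, 700, 610, 570, 750, 670])

# Pearson correlation coefficient: values near 1.0 mean one sitting strongly
# predicts the other; values near 0 mean scores are essentially unrelated.
r = np.corrcoef(first_sitting, second_sitting)[0, 1]
print(f"Test-retest correlation: {r:.2f}")  # the old SAT reported r >= 0.9 per section
```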
In the College Board’s 2018 annual report, one standard deviation on an SAT section was a little over 100 points: 102 points on the Reading-Writing section and 114 on Math (page 4 of the report shows the details). Thus, at some raw scores, scores on the March and June 2018 administrations were almost one standard deviation apart. On the PSATs, the difference between the October 10 and 24 administrations was even larger. One standard deviation on the Math section of the 2018 PSATs was 93 points, and at the same raw score near the top of the scale, the October 10 conversion gave a 720 while the October 24 conversion gave a 620 – a 100-point difference.
These differences in outcomes can affect college acceptances and how students plan their testing schedules. For seniors hoping to apply early, October can be the last test they submit. Many of these students took the June SAT and then, after more preparation, the October test. While most of our students did well despite the new scales, a few unfortunate ones suffered.
But why has this happened? Why would the creators of the SAT risk frustrating college counselors and students at a time when the ACT is steadily gaining popularity and market share? (Since 2016, the number of students who choose to take the SAT, excluding those required to take it on school-day administrations, has been consistently lower than the number who take the ACT.)
We think the answer lies in test-design changes that came with the new SAT. Back in 2015, when the College Board shifted to the new SAT, it dropped the undisclosed experimental section. The questions on this section, one of the six 25-minute sections on each SAT, were responsible for creating the almost-perfect bell curves on all administrations of the test. From data gleaned from hundreds of thousands of test takers, the College Board knew exactly how difficult each question was and could create tests that were similar in difficulty and therefore in their curves. Before the College Board rolled out the new SAT, we know that it tested thousands of new SAT questions on experimental sections of the old test. We know from contacts within the College Board that these questions were good for many tests’ worth of SATs. We suspect that this good data, the gold accumulated from old administrations, is now used up.
While the old SAT had three undisclosed experimental sections on each test administration (different students saw either a Math, Reading, or Writing experimental section), the current version has only a very short, disclosed experimental section of 15-18 questions. Students are told before they test that they are taking an experimental section. Although proctors in recent administrations have been told to read instructions indicating that experimental questions could be used for scoring purposes, this is not the case, and most students are aware of it. The data gleaned from these very short, disclosed experimental sections are likely unreliable.
We fear that unless the College Board reintroduces an undisclosed experimental section (which can’t happen in the current format of the SAT), we will continue to see above-average variation and abnormal curves on future SATs and PSATs.
[1] Look here for the College Panda’s record of recent SAT curves and to see how, across different administrations, a certain number of questions correct would earn approximately the same score.
[2] On the June and October 2018 SATs, even the Writing curves were similarly unpredictable. Here’s a comparison of the Writing curves of four recent tests:
[3] Q: If I got more questions right on this PSAT/NMSQT than previous tests or compared to other students, why did I receive a lower score?
Questions on tests administered on different dates are unique. Because the tests are different, you shouldn’t directly compare the number of questions answered correctly. You need to consider the difference in difficulty between the test questions. This is what equating does, and it is why it is best to compare scaled scores rather than raw scores.
Q: Will this affect my chances for National Merit Scholarship Corporation programs?
We are in contact with National Merit Scholarship Corporation about gaps at the high end of the score scale in the math section of an alternate test form so that they have all the information they will need when determining qualifying scores for the 2020 National Merit Scholarship Program.