The Effectiveness and Accuracy of Standardized Tests

The following sample Education research paper is 2514 words long, in MLA format, and written at the undergraduate level.

One thing that nearly every student fears is the stress of standardized tests determining their future in education and employment. The SATs, state assessments, and even classroom-administered unit tests exist to measure the progress that the pupils make throughout the year, providing feedback to the individual students, their teachers, and the larger school board to further their abilities to provide for the students. Though there are flaws with the system in how the tests are applied that must be addressed, standardized testing in education, particularly the formative method, is very effective at tracking a student’s progress in learning.

Formative assessments are characterized by their focus on measuring the process of learning as it is ongoing. One popular instance of a formative assessment is a timed reading exam, administered to young students, to measure their growth in reading competency (Shinn 62). Kaminsky and Good touch on the importance of this testing, called Dynamic Indicators of Basic Early Literacy Skills (DIBELS), saying the measures “were developed to make educational decisions in a Problem-Solving model regarding (1) which children require early literacy skills interventions beyond the general ‘curriculum’,” as well as which methods work on a case-by-case basis, and when the methods have effectively prevented the problem (117). DIBELS testing tracks the progress of an individual student versus the expected progress of all other testers in the same age group and grade and aims to recognize the issues a student has before they become unmanageable.

Formative testing can ideally allow teachers to more accurately help their students on an individual basis. “For children in kindergarten and first grade, the purpose of Problem Identification with DIBELS is to determine which children differ substantially from their peers in the acquisition of early literacy skills and thus are potentially at risk for difficulty in learning to read,” say Kaminsky and Good (127). They continue, saying that with DIBELS, “it is possible to identify…children who are at risk for development of reading difficulties and to evaluate the effectiveness of interventions implemented within the general education classroom on a case-by-case basis” (138). Thus, not only is DIBELS critical for helping the students, but also for preparing teachers by providing them with the background and the material for assisting students through their struggles.

A similar formative testing method is Curriculum-Based Measures (CBM), which exist to assist with and correct problems that were not detected (Kaminsky and Good 119). Shinn explains that CBM models are “curriculum referenced,” and that “a student’s performance on a test should indicate the student’s level of competence in the local school curriculum” (62). Shinn’s early article discusses the vital role of these types of formative tests as ways to enhance both the role of the students and of the educators in assessing, recording, and monitoring the students’ progress. The CBM methods cover both sides of the process, meaning that even as the students’ progress is monitored and the methods of teaching them adjust, so can those very methods be refined on a broader scale. Stecker, Fuchs, and Fuchs agree: “The hope [of CBM] was that by responding instructionally to students’ poor patterns of performance, teachers should be able to enhance student achievement” (795). Even in the span of time between the 1988 article and the 2005 article, the goal of perfecting the methods, and the emphasis on doing so, persisted.

Formative testing is not the only type of assessment; summative testing exists on the opposite end of the spectrum. While formative testing pays attention to the beginning of learning, and the process of it, summative testing measures how much has been learned. The SATs and state standardized testing, in particular, are considered summative tests, and they are well suited to showing what has already been learned. Black, Harrison, Lee, Marshall, and Wiliam (2004) say, “Summative tests should become a positive part of the learning process. Through active involvement in the testing process, students can see that they can be the beneficiaries rather than the victims of testing because tests can help them improve their learning” (16). To accomplish this student involvement, Black et al. suggest having students create questions for the classroom exams (16). National and state testing can be included in this plan as well, as students can learn to recognize the ways in which they struggle.

Both formative and summative methods have important roles in measuring learning. The formative method is critical for guiding the student in his or her journey, refining both the methods by which and the material from which he or she is learning. Summative exams instead show which schools excel, which curriculums best serve the student body as a whole, and how to allocate resources to improve public schools with lower performances (McNeil 6; Shinn 62). Using formative and summative methods in conjunction, to discover both where resources need to go and which students to focus on and in what manner, could therefore create an ideal situation and improve the already useful and accurate methods of testing learning.

Black and Wiliam explain in more detail in their 1998 article how standardized testing can improve learning as well as measure it. They condemn the heavy focus on letter grades and classroom rankings, saying “the giving of useful advice and the learning function are underemphasized” (Black and Wiliam 85). They continue to explain the consequences of this mindset: “When they have any choice, pupils avoid difficult tasks. They also spend time and energy looking for clues to the ‘right answer’” (Black and Wiliam 87). Black and Wiliam’s findings show that standardized testing can gauge the learning habits and teaching methods in the classroom so accurately that it is possible to pinpoint the ways in which the tests, particularly formative testing in this instance, have improved a student’s ability to learn.

In a similar vein, Black and Wiliam’s article covers the benefits of formative assessments and the importance of improving them. They say, “improved formative assessment helps low achievers more than other students and so reduces the range of achievement while raising achievement overall” (84). Standardized tests measure as they improve; they improve because they measure well.

Despite the positives of standardized tests, which multiple sources credit with capturing the truth of a student’s learning progress, the numerous variables associated with test-taking mean that the system is not without its flaws. Wiliam says in his 2011 article that it is impossible for tests to completely show the progress of every accomplishment students achieve, that “some [students] will be advantaged by seeing the questions they have been expecting,” while others will be at a disadvantage because the questions with which they are most familiar do not appear, and that “the fault does not lie with the student, it lies with the inadequacy of the assessment” (118). This information is unsurprising, as the tests would have to be individually customized to adequately measure each student’s interests and strengths, defeating the point of the standardization. So the averages of the exams, while useful for assessing curriculums and for helping students reach the average standards, are unlikely ever to serve as a useful summative measure of a single pupil’s abilities. Only formative classroom exams can begin to capture that kind of result, but they are dedicated not to capturing the essence of a student’s passions, only their ability to learn.

The results of standardized tests, particularly open-ended portions, are often open to interpretation. “We want variations in students’ scores to be caused by differences that are relevant to the construct of interest, rather than to irrelevant factors, such as who did the scoring, the particular selection of items used for the test, and whether the student was having a ‘good’ or ‘bad’ day” (Wiliam 244). The matter of tester bias, the amount of sleep or lack thereof the student received, or familiarity with the particular topics included in the exam itself all fall under the scope of factors impossible for a single test to account for. Like the matter of standardized tests addressing only small facets of a student’s pool of knowledge, the standardized test may only tap into a small amount of a student’s potential in any given test session. A given student may do well with open-ended questions but freeze at a multiple-choice series, while the student beside them may have the opposite reaction.

The role of the teacher is very influential in a student’s performance. Black and Wiliam (1998) say that “pupils who see themselves as unable to learn usually cease to take school seriously. Many become disruptive; others resort to truancy” (84). Thus, a teacher’s role is to encourage and foster his or her students, so that the formative testing can work in tandem; the exams show what a student struggles with, and where they excel, and the teacher makes sure that student has the esteem to apply the necessary changes. Without that encouragement, despite the ability of the formative tests to accurately measure the student’s ability, the student could not follow along the track of projected growth; they would fall into a trap of self-doubt and their talents would wither, un-nurtured.

This encouragement, however, may be difficult for teachers to give, if they are unaware of exactly how to teach and encourage effectively. Kaminsky and Good (1998) say, “For kindergarten and preschool teachers, uncertainty about what to teach is especially problematic, because the outcome of instruction (reading competence) will not be evident until later” (113-4). This is a critical stage in a young student’s life: these early educators have a chance to instill their pupils with the tools necessary for learning further in their academic careers. Formative assessment is a crucial part of solving this issue: formative assessments have the capabilities of mapping what methods lead students to the greatest rate of learning, allowing teachers to gain confidence in their abilities to serve their purpose on the pathway to education. The teachers have constant feedback built into these exams, backed up by summative exams, which help to maximize effectiveness on the broader scale, as discussed above.

The pressure on schools to do well on summative exams, the type of test that helps the department of education to allocate necessary resources, can at times drive schools to put the emphasis not on learning well for the sake of learning, but on individual students serving the school: good test scores for more money and acclaim. “The assumptions that drive national and state policies for assessment have to be called into question. The promotion of testing as an important component for establishing a competitive market in education can be very harmful,” say Black and Wiliam (1998, 90). The treatment of education as a business venture, with the students as valuable resources, devalues the nature of learning. Using a sample of data to see where a student’s progress reasonably should be is a different situation from the strict competitiveness involved in college admissions; one helps a student to learn at the best rate they possibly can, while the other creates an environment that pushes students to adhere to a standard set by an average in data.

Some who study the efficacy of standardized testing are less optimistic about its results and procedures, however. McNeil introduces her book (2000) with the claim that “what will be clear from a close-up analysis of the effects of standardization is that, in fact, standardization undermines academic standards and seriously limits opportunities for children to learn to a ‘high standard’” (6). This is in line with Black and Wiliam’s (1998) findings, taken to a harsher extreme. She elaborates, calling standardized tests too ‘high-stakes’ in their current form and arguing that “high-stakes decisions, such as grade placement and promotion (or retention), placement in highly stratified academic tracks, and even graduation are increasingly determined by students’ scores on centrally imposed, commercialized standardized tests” (McNeil 6). The problem is the continuing bottom-line mentality in the educational system: students are rewarded for being complacent in this system by getting into good colleges and getting good jobs, and schools are rewarded with better resources.

The problem with this rewards system is, of course, that the information used to build it comes entirely from summative assessments. College educations are given to students who end up in a prime position, not necessarily to ones who have learned the most. “In manipulating the dialogue [of a class discussion]…the teacher seals off any unusual, often thoughtful but unorthodox attempts by pupils to work out their own answers. Over time the pupils get the message: they are not required to think out their own answers” (Black and Wiliam 88-9). By learning how to parrot back information for the sake of grades on a test, or for a place in the college of their choice, students are robbed of a complete learning experience. They learn to memorize, but cannot learn how to learn. Summative testing, ironically, works against the very people it aims to assist when it is put to use only in this fashion, without any formative testing to keep the education process honest.

The consequences of the summative rewards system affect some groups of students more profoundly than they do others. Good, Aronson, and Inzlicht (2003) say, “because standardization test scores are the preferred standard for college admissions, it is not surprising that Black students make up less than 10% of admissions to 4-year colleges” (646). They contend that Hispanic students fare a little better, but still fall far behind white and Asian students; similarly, women fall behind men by “as much as 35 points” in math and science on the SATs (646). The article blames the internalization of stereotypes by students and testing officials alike (647). Because of this “stereotype threat,” as the authors call it, which is related to the variables mentioned in Wiliam’s (1998) article, there can be no true objectivity in interpreting the results.

Regardless of the difficulties facing standardized tests (issues of subjective grading, lack of confidence from teachers, and the struggle to balance summative exams with providing a full education), it has been shown that these tests do reliably show how a student has learned by collecting appropriate data, both to mark how individuals have grown from the beginning of their education and to show how they retained that knowledge until the end (formative and summative assessments, respectively). It has also been shown that working with both of these methods of assessment together, to monitor both the students’ progress and the schools’ competence at teaching them, could be a very powerful tool. Standardized tests are an open door of opportunity: they provide valuable feedback for educators and people in positions of administrative influence, as well as for the students taking them. With this knowledge, testers (be they teachers or national SAT makers) can know an individual student’s progress as he or she learns, and can better guide that student through difficult times in their education; that student, in turn, helps to refine the curriculum for all students nationwide, to boost test scores and encourage learning to prepare for the future confidently.