What Research Shows About the Functional Relevance of Standardized Language Tests

As an SLP who routinely conducts speech and language assessments in several settings (e.g., school and private practice), I understand the utility of and the need for standardized speech, language, and literacy tests. However, as an SLP who works with children with dramatically varying degrees of cognition, abilities, and skill sets, I also highly value supplementing these standardized tests with functional and dynamic assessments, interactions, and observations.

Since schools and insurance companies both place significant value on standardized testing for the purposes of service provision and reimbursement, I wanted to summarize in today’s post the findings of recent articles on this topic. Because my primary interest lies in assessing and treating school-age children, all of the articles reviewed in today’s post came directly from the journal Language, Speech, and Hearing Services in Schools (LSHSS).

We’ve all been there. We’ve all had situations in which students scored on the low end of normal, or had a few subtest scores in the below-average range that still added up to an average total score. We’ve all pored over eligibility requirements, trying to figure out whether the student should receive therapy services given the stringent standardized testing criteria in some states/districts.

Of course, as it turns out, the answer is never simple. In 2006, Spaulding, Plante & Farinella set out to examine the assumption “that children with language impairment will receive low scores on standardized tests, and therefore [those] low scores will accurately identify these children” (61). So they analyzed the data from 43 commercially available child language tests to determine whether evidence exists to support their use in identifying language impairment in children.

It turns out it did not! Because the psychometric properties of the various tests differ so much (see the article for specific details), many children with language impairment are overlooked by standardized tests: they either score within the average range or do not score low enough to qualify for services. Thus, “the clinical consequence is that a child who truly has a language impairment has a roughly equal chance of being correctly or incorrectly identified, depending on the test that he or she is given.” Furthermore, “even if a child is diagnosed accurately as language impaired at one point in time, future diagnoses may lead to the false perception that the child has recovered, depending on the test(s) that he or she has been given (69).”

Consequently, they created a decision tree (see below) with recommendations for clinicians using standardized testing. They recommend that clinicians use alternate sources of data, namely sensitivity and specificity rates (available for only a small subset of tests), to support accurate identification.

The idea behind it is: “if sensitivity and specificity data are strong, and these data were derived from subjects who are comparable to the child tested, then the clinician can be relatively confident in relying on the test score data to aid his or her diagnostic decision. However, if the data are weak, then more caution is warranted and other sources of information on the child’s status might have primacy in making a diagnosis (70).”
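For readers who want to see the arithmetic behind those two terms, here is a minimal sketch in Python. It is not the authors’ decision tree. The class and function names, the validation counts, and the 0.80 cutoff are all assumptions chosen for illustration only; check the test manual and the research literature for the figures that apply to a given test.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticCounts:
    """Counts from a hypothetical validation study of a single test."""
    true_positives: int   # children with language impairment whom the test flagged
    false_negatives: int  # children with language impairment whom the test missed
    true_negatives: int   # typically developing children whom the test cleared
    false_positives: int  # typically developing children whom the test flagged


def sensitivity(c: DiagnosticCounts) -> float:
    """Proportion of children WITH impairment correctly identified."""
    return c.true_positives / (c.true_positives + c.false_negatives)


def specificity(c: DiagnosticCounts) -> float:
    """Proportion of children WITHOUT impairment correctly identified."""
    return c.true_negatives / (c.true_negatives + c.false_positives)


def can_lean_on_score(c: DiagnosticCounts,
                      sample_matches_child: bool,
                      cutoff: float = 0.80) -> bool:
    """Rough stand-in for the reasoning quoted above.

    The cutoff is an illustrative assumption, not a value from the post.
    Returns True only if both rates meet the cutoff AND the validation
    sample is comparable to the child being tested.
    """
    strong = sensitivity(c) >= cutoff and specificity(c) >= cutoff
    return strong and sample_matches_child


# Made-up numbers: 50 children with impairment, 50 without.
example = DiagnosticCounts(true_positives=45, false_negatives=5,
                           true_negatives=40, false_positives=10)
print(f"sensitivity = {sensitivity(example):.2f}")
print(f"specificity = {specificity(example):.2f}")
print("rely on score:", can_lean_on_score(example, sample_matches_child=True))
```

Even when the numbers look strong, the function returns False if the validation sample does not resemble the child being tested. In that case, as the authors suggest, other sources of information about the child should carry more weight.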

Fast forward six years and a number of newly revised tests later: in 2012, Spaulding and colleagues set out to “identify various U.S. state education departments’ criteria for determining the severity of language impairment in children, with particular focus on the use of norm-referenced tests” as well as to “determine if norm-referenced tests of child language were developed for the purpose of identifying the severity of children’s language impairment” (176).

They obtained published procedures for severity determinations from available U.S. state education departments, which specified the use of norm-referenced tests, and reviewed the manuals for 45 norm-referenced tests of child language to determine if each test was designed to identify the degree of a child’s language impairment.

What they found out was “the degree of use and cutoff-point criteria for severity determination varied across states. No cutoff-point criteria aligned with the severity cutoff points described within the test manuals. Furthermore, tests that included severity information lacked empirical data on how the severity categories were derived (176).”

Thus they urged SLPs to exercise caution in determining the severity of children’s language impairment via norm-referenced test performance “given the inconsistency in guidelines and lack of empirical data within test manuals to support this use (176)”.

Following the publication of this article, Ireland, Hall-Mills & Millikin issued a response to Spaulding and colleagues. They pointed out that the “severity of language impairment is only one piece of information considered by a team for the determination of eligibility for special education and related services”. They noted that Spaulding and colleagues left out a host of federal and state guideline requirements and “did not provide an analysis of the regulations governing special education evaluation and criteria for determining eligibility (320).” They also pointed out that “IDEA prohibits the use of ‘any single measure or assessment as the sole criterion’ for determination of disability and requires that IEP teams ‘draw upon information from a variety of sources.’”

They listed a variety of examples from several different state departments of education (FL, NC, VA, etc.), which mandate the use of functional assessments, dynamic assessments, criterion-referenced assessments, etc. for the determination of language therapy eligibility.

But are SLPs across the country appropriately using the federal and state guidelines to determine eligibility? While one should certainly hope so, it does not always seem to be the case. To illustrate, in a study published in 2013, Betz & colleagues asked 364 SLPs to complete a survey “regarding how frequently they used specific standardized tests when diagnosing suspected specific language impairment (SLI) (133).”

Their purpose was to determine “whether the quality of standardized tests, as measured by the test’s psychometric properties, is related to how frequently the tests are used in clinical practice” (133).

What they found was that the most frequently used tests were the comprehensive assessments, including the Clinical Evaluation of Language Fundamentals and the Preschool Language Scale, as well as one-word vocabulary tests such as the Peabody Picture Vocabulary Test. Furthermore, the date of publication seemed to be the only factor that affected the frequency of test selection.

They also found that SLPs frequently did not follow up the comprehensive standardized testing with domain-specific assessments (critical thinking, social communication, etc.) but instead used vocabulary testing as a second measure. They were understandably puzzled by that finding: “The emphasis placed on vocabulary measures is intriguing because although vocabulary is often a weakness in children with SLI (e.g., Stothard et al., 1998), the research to date does not show vocabulary to be more impaired than other language domains in children with SLI (140).”

According to the authors, “perhaps the most discouraging finding of this study was the lack of a correlation between frequency of test use and test accuracy, measured both in terms of sensitivity/specificity and mean difference scores (141).”

If SLPs have not significantly changed their practices since that survey, the above is certainly disheartening, as it implies that rather than being true diagnosticians, SLPs are using whatever is at hand that has been purchased by their department to indiscriminately assess students with suspected speech-language disorders. If that is truly the case, it certainly calls into question Ireland, Hall-Mills & Millikin’s response to Spaulding and colleagues. In other words, though SLPs are aware that they need to comply with state and federal regulations when it comes to unbiased and targeted assessments of children with suspected language disorders, they may not actually be using appropriate standardized testing, much less supplementary informal assessments (e.g., dynamic, narrative, language sampling), in order to administer well-rounded assessments.

So where do we go from here? Well, it’s quite simple really!   We already know what the problem is. Based on the above articles we know that:

  1. Standardized tests possess significant limitations
  2. They are not used with optimal effectiveness by many SLPs
  3.  They may not be frequently supplemented by relevant and targeted informal assessment measures in order to improve the accuracy of disorder determination and subsequent therapy eligibility

Now that we have identified a problem, we need to develop and consistently implement effective practices to ameliorate it. These include researching the psychometric properties of tests (sample size, sensitivity and specificity, etc.), using domain-specific assessments to supplement the administration of comprehensive testing, and supplementing standardized testing with a plethora of functional assessments.

SLPs can review testing manuals and consult with colleagues when they feel that the standardized testing is underidentifying students with language impairments (e.g., HERE and HERE).  They can utilize referral checklists (e.g., HERE) in order to pinpoint the students’ most significant difficulties. Finally, they can develop and consistently implement informal assessment practices (e.g., HERE and HERE) during testing in order to gain a better grasp on their students’ TRUE linguistic functioning.

Stay tuned for the second portion of this post entitled: “What Research Shows About the Functional Relevance of Standardized Speech Tests?” to find out the best practices in the assessment of speech sound disorders in children.


  1. Spaulding, Plante & Farinella (2006) Eligibility Criteria for Language Impairment: Is the Low End of Normal Always Appropriate?
  2. Spaulding, Szulga, & Figueroa (2012) Using Norm-Referenced Tests to Determine Severity of Language Impairment in Children: Disconnect Between U.S. Policy Makers and Test Developers
  3. Ireland, Hall-Mills & Millikin (2012) Appropriate Implementation of Severity Ratings, Regulations, and State Guidance: A Response to “Using Norm-Referenced Tests to Determine Severity of Language Impairment in Children: Disconnect Between U.S. Policy Makers and Test Developers” by Spaulding, Szulga, & Figueroa (2012)
  4. Betz et al. (2013) Factors Influencing the Selection of Standardized Tests for the Diagnosis of Specific Language Impairment

