Several years ago I began blogging on the subject of independent assessments in speech pathology. First, I wrote a post entitled “Special Education Disputes and Comprehensive Language Testing: What Parents, Attorneys, and Advocates Need to Know”, in which I used 4 different scenarios to illustrate the importance of comprehensive language evaluations for children with subtle language and learning needs. Then I wrote “What Makes an Independent Speech-Language-Literacy Evaluation a GOOD Evaluation?” in order to elucidate what actually constitutes a good independent comprehensive assessment. Continue reading Neuropsychological or Language/Literacy: Which Assessment is Right for My Child?
Over the years this blog has amassed many posts on a variety of topics pertaining to assessment and treatment in speech-language pathology. With over 300 posts and over 130 search categories, it’s no wonder that some of you have reached out to ask about effective ways of finding relevant information quickly. As such, in addition to the existing categories pertaining to specific topics (e.g., writing, social communication, etc.) I have created two specific categories which were asked about by numerous blog subscribers in recent emails. Continue reading Helpful Smart Speech Therapy Site Searching Tips
Today due to popular demand I am reviewing the Clinical Assessment of Pragmatics (CAPs) for children and young adults ages 7 – 18, developed by the Lavi Institute and sold by WPS Publishing. Readers of this blog are familiar with the fact that I specialize in working with children diagnosed with psychiatric impairments and behavioral and emotional difficulties. They are also aware that I am constantly on the lookout for good quality social communication assessments due to a notorious dearth of good quality instruments in this area of language. Continue reading Test Review: Clinical Assessment of Pragmatics (CAPs)
Those of you who read my blog on a semi-regular basis know that I spend a considerable amount of time in both of my work settings (an outpatient school located in a psychiatric hospital as well as private practice) conducting language and literacy evaluations of preschool and school-aged children 3-18 years of age. During that process, I spend a significant amount of time reviewing outside speech and language evaluations. Interestingly, what I have been seeing is that no matter what the child’s age is (7 or 17), some form of receptive and/or expressive vocabulary testing is invariably mentioned in their language report. Continue reading On the Limitations of Using Vocabulary Tests with School-Aged Students
Here’s a familiar scenario to many SLPs. You’ve administered several standardized language tests to your student (e.g., CELF-5 & TILLS). You expected to see roughly similar scores across tests. Much to your surprise, you find that while your student attained somewhat average scores on one assessment, s/he had completely bombed the second assessment, and you have no idea why that happened.
So you go on social media and start crowdsourcing for information from a variety of SLPs located in a variety of states and countries in order to figure out what has happened and what you should do about this. Of course, the problem in such situations is that while some responses will be spot on, many will be utterly inappropriate. Luckily, the answer lies much closer than you think, in the actual technical manual of the administered tests.
So what is responsible for such a drastic discrepancy? A few things actually. For starters, unless both tests were co-normed (used the same sample of test takers) be prepared to see disparate scores due to the differing ability levels of children in the normative groups of each test. Another important factor involved in the score discrepancy is how accurately each test differentiates disordered children from typically functioning ones.
Let’s compare two actual language tests to learn more. For the purpose of this exercise let us select the Clinical Evaluation of Language Fundamentals-5 (CELF-5) and the Test of Integrated Language and Literacy Skills (TILLS). The former is a very familiar entity to numerous SLPs, while the latter is just coming into its own, having been released on the market only several years ago.
Both tests share a number of similarities. Both were created to assess the language abilities of children and adolescents with suspected language disorders. Both assess aspects of language and literacy (albeit not to the same degree nor with the same level of thoroughness). Both can be used for language disorder classification purposes, or can they?
Actually, my last statement is rather debatable. A careful perusal of the CELF-5 reveals that its normative sample of 3000 children included a whopping 23% of children with language-related disabilities. In fact, the folks from the Leaders Project did such an excellent and thorough job reviewing its psychometric properties that, rather than repeating that information, the readers can simply click here to review the limitations of the CELF-5 straight on the Leaders Project website. Furthermore, even the CELF-5 developers themselves have stated that: “Based on CELF-5 sensitivity and specificity values, the optimal cut score to achieve the best balance is -1.33 (standard score of 80). Using a standard score of 80 as a cut score yields sensitivity and specificity values of .97.”
In other words, obtaining a standard score of 80 or below on the CELF-5 indicates that a child presents with a language disorder. Of course, as many SLPs already know, the eligibility criteria in the schools require language scores far below that in order for the student to qualify to receive language therapy services.
In fact, the test’s authors are fully aware of that and acknowledge that in the same document. “Keep in mind that students who have language deficits may not obtain scores that qualify him or her for placement based on the program’s criteria for eligibility. You’ll need to plan how to address the student’s needs within the framework established by your program.”
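For readers who want to check the cut score arithmetic quoted above, here is a minimal Python sketch. It assumes the usual standard score scale with a mean of 100 and a standard deviation of 15, which is consistent with the quoted pairing of -1.33 and a standard score of 80; the function names are my own and do not come from any test manual.

```python
# Assumed standard-score scale (mean 100, SD 15); hypothetical helper names.
SCALE_MEAN = 100.0
SCALE_SD = 15.0


def z_to_standard_score(z: float) -> float:
    """Convert a z-score (SD units from the mean) to a standard score."""
    return SCALE_MEAN + SCALE_SD * z


def standard_score_to_z(score: float) -> float:
    """Convert a standard score back to a z-score."""
    return (score - SCALE_MEAN) / SCALE_SD


if __name__ == "__main__":
    cut = z_to_standard_score(-1.33)
    print(f"Cut score at -1.33 SD: {cut:.2f}")
    print(f"z for a standard score of 80: {standard_score_to_z(80):.2f}")
```

The same two helpers can be used to translate any program eligibility criterion stated in SD units into the standard score your report will show, and vice versa.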
But here is another issue – the CELF-5 sensitivity group included only a very small sample of “67 children ranging from 5;0 to 15;11”, whose only requirement was to score 1.5 SDs below the mean “on any standardized language test”. As the Leaders Project reviewers point out: “This means that the 67 children in the sensitivity group could all have had severe disabilities. They might have multiple disabilities in addition to severe language disorders including severe intellectual disabilities or Autism Spectrum Disorder making it easy for a language disorder test to identify this group as having language disorders with extremely high accuracy.” (pgs. 7-8)
Of course, this raises the question: why would anyone continue to administer any test to students if its administration (a) does not guarantee disorder identification and (b) will not make the student eligible for language therapy despite demonstrated need?
The problem is that even though SLPs are mandated to use a variety of quantitative clinical observations and procedures in order to reliably qualify students for services, standardized tests still carry more value than they should. Consequently, it is important for SLPs to select the right test to make their job easier.
The TILLS is a far less known assessment than the CELF-5, yet in the few years it has been out on the market it has really made its presence felt as a solid assessment tool due to its valid and reliable psychometric properties. Again, the venerable Dr. Carol Westby has already done such an excellent job reviewing its psychometric properties that I will refer the readers to her review here, rather than repeating this information, as it will not add anything new on this topic. The upshot of her review is as follows: “The TILLS does not include children and adolescents with language/literacy impairments (LLIs) in the norming sample. Since the 1990s, nearly all language assessments have included children with LLIs in the norming sample. Doing so lowers overall scores, making it more difficult to use the assessment to identify students with LLIs.” (pg. 11)
Now, here many proponents of inclusion of children with language disorders in the normative sample will make a variation of the following claim: “You CANNOT diagnose a language impairment if children with language impairment were not included in the normative sample of that assessment!” Here’s a major problem with such an assertion. When a child is referred for a language assessment, we really have no way of knowing if this child has a language impairment until we actually finish testing them. We are in fact attempting to confirm or refute this fact, hopefully via the use of reliable and valid testing. However, if the normative sample includes many children with language and learning difficulties, this significantly affects the accuracy of our identification, since we are interested in comparing this child’s results to typically developing children and not the disordered ones, in order to learn if the child has a disorder in the first place. As per Peña, Spaulding and Plante (2006), “the inclusion of children with disabilities may be at odds with the goal of classification, typically the primary function of the speech pathologist’s assessment. In fact, by including such children in the normative sample, we may be ‘shooting ourselves in the foot’ in terms of testing for the purpose of identifying disorders.” (p. 248)
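To make the effect of a mixed normative sample more concrete, here is a rough Python sketch. Every number in it is hypothetical and chosen only for illustration (raw-score means and SDs for the two groups, and a 23% share of children with disabilities, borrowed from the CELF-5 description earlier in this post). It uses the standard formulas for the mean and variance of a two-group mixture, and it is not how any publisher actually builds norms.

```python
import math


def mixture_mean_sd(p_dis, mean_typ, sd_typ, mean_dis, sd_dis):
    """Mean and SD of a normative sample that mixes two groups of children."""
    p_typ = 1.0 - p_dis
    mean_mix = p_typ * mean_typ + p_dis * mean_dis
    # Mixture variance = weighted within-group variance plus between-group spread.
    var_mix = (p_typ * (sd_typ ** 2 + (mean_typ - mean_mix) ** 2)
               + p_dis * (sd_dis ** 2 + (mean_dis - mean_mix) ** 2))
    return mean_mix, math.sqrt(var_mix)


def standard_score(raw, norm_mean, norm_sd):
    """Standard score (assumed mean 100, SD 15) of a raw score against given norms."""
    return 100.0 + 15.0 * (raw - norm_mean) / norm_sd


# Hypothetical raw-score distributions, for illustration only.
TYP_MEAN, TYP_SD = 50.0, 10.0   # typically developing children
DIS_MEAN, DIS_SD = 37.0, 10.0   # children with language disorders
P_DIS = 0.23                    # share of disordered children in the mixed sample

child_raw = 37.0  # a child scoring at the (hypothetical) disordered-group mean

score_typical_norms = standard_score(child_raw, TYP_MEAN, TYP_SD)
mixed_mean, mixed_sd = mixture_mean_sd(P_DIS, TYP_MEAN, TYP_SD, DIS_MEAN, DIS_SD)
score_mixed_norms = standard_score(child_raw, mixed_mean, mixed_sd)

print(f"Against typical-only norms: {score_typical_norms:.1f}")
print(f"Against mixed norms:        {score_mixed_norms:.1f}")
```

With these made-up numbers, the same raw score lands at 80.5 against typical-only norms and at roughly 87 against the mixed norms. The mixed sample has a lower mean and a wider spread, so the very same performance looks closer to average.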
Then there’s a variation of this assertion, which I have seen in several Facebook groups: “Children with language disorders score at the low end of normal distribution”. Once again, such an assertion is incorrect, since Spaulding, Plante & Farinella (2006) have actually shown that on average, these kids will score at least 1.28 SDs below the mean, which is not the low average range of normal distribution by any means. As per the authors: “Specific data supporting the application of “low score” criteria for the identification of language impairment is not supported by the majority of current commercially available tests. However, alternate sources of data (sensitivity and specificity rates) that support accurate identification are available for a subset of the available tests.” (p. 61)
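If it helps to see where 1.28 SDs below the mean falls, the short Python sketch below converts a z-score into a percentile rank. It assumes test scores follow a normal distribution, which is an idealization, and the helper name is my own.

```python
from statistics import NormalDist


def percentile_from_z(z: float) -> float:
    """Percentile rank of a z-score, assuming a normal distribution of scores."""
    return 100.0 * NormalDist().cdf(z)


if __name__ == "__main__":
    print(f"z = -1.28 is at about the {percentile_from_z(-1.28):.0f}th percentile")
```

Under that assumption, a score 1.28 SDs below the mean sits at about the 10th percentile.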
Now, let us get back to your child in question, who performed so differently on the two administered tests. Given his clinically observed difficulties, you fully expected your testing to confirm them. But you are now more confused than before. Don’t be! Search the technical manual for information on the particular test’s sensitivity and specificity to look up the numbers. Vance and Plante (1994) put forth the following criteria for accurate identification of a disorder (discriminant accuracy): “90% should be considered good discriminant accuracy; 80% to 89% should be considered fair. Below 80%, misidentifications occur at unacceptably high rates”, leading to “serious social consequences” for misidentified children (p. 21).
Review the sensitivity and specificity of your test/s, take a look at the normative samples, and see if anything unusual jumps out at you that leads you to believe that the administered test may have some issues with assessing what it purports to assess. Then, after supplementing your standardized testing results with good quality clinical data (e.g., narrative samples, dynamic assessment tasks, etc.), consider creating a solidly referenced purchasing pitch to your administration to invest in more valid and reliable standardized tests.
Hope you find this information helpful in your quest to better serve the clients on your caseload. If you are interested in learning more regarding evidence-based assessment practices as well as psychometric properties of various standardized speech-language tests, visit the SLPs for Evidence-Based Practice group on Facebook to learn more.
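As a quick aid for that review, here is a small Python sketch that computes sensitivity and specificity from raw counts and then labels each value using the Vance and Plante (1994) bands quoted above. The function names and the example counts are hypothetical, and treating exactly 90% as "good" and exactly 80% as "fair" is my reading of the quoted criteria.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of children with a disorder whom the test correctly identifies."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of typically developing children whom the test correctly clears."""
    return true_neg / (true_neg + false_pos)


def rate_discriminant_accuracy(value: float) -> str:
    """Label a sensitivity or specificity value (0 to 1) per Vance and Plante (1994)."""
    if value >= 0.90:
        return "good"
    if value >= 0.80:
        return "fair"
    return "unacceptable"


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not taken from any manual).
    sens = sensitivity(true_pos=65, false_neg=2)
    spec = specificity(true_neg=58, false_pos=12)
    print(f"Sensitivity {sens:.2f}: {rate_discriminant_accuracy(sens)}")
    print(f"Specificity {spec:.2f}: {rate_discriminant_accuracy(spec)}")
```

Keep in mind that the numbers reported in a manual are only as trustworthy as the groups used to compute them, which is exactly why the makeup of the sensitivity group matters.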
- Peña, E. D., Spaulding, T. J., & Plante, E. (2006). The composition of normative groups and diagnostic decision-making: Shooting ourselves in the foot. American Journal of Speech-Language Pathology, 15, 247–254.
- Spaulding, T. J., Plante, E., & Farinella, K. A. (2006). Eligibility criteria for language impairment: Is the low end of normal always appropriate? Language, Speech, and Hearing Services in Schools, 37, 61-72.
- Vance, R., & Plante, E. (1994). Selection of preschool language tests: A data-based approach. Language, Speech, and Hearing Services in Schools, 25, 15-24.
Today I am reviewing a new receptive vocabulary measure for students 7-17 years of age, entitled the Test of Semantic Reasoning (TOSR) created by Beth Lawrence, MA, CCC-SLP and Deena Seifert, MS, CCC-SLP, available via Academic Therapy Publications.
The TOSR assesses the student’s semantic reasoning skills or the ability to nonverbally identify vocabulary via image analysis and retrieve it from one’s lexicon.
According to the authors, the TOSR assesses “breadth (the number of lexical entries one has) and depth (the extent of semantic representation for each known word) of vocabulary knowledge without taxing expressive language skills”.
The test was normed on 1117 students ranging from 7 through 17 years of age with the norming sample including such diagnoses as learning disabilities, language impairments, ADHD, and autism. This fact is important because the manual did indicate how the above students were identified. According to Peña, Spaulding and Plante (2006), the inclusion of children with disabilities in the normative sample can negatively affect the test’s discriminant accuracy (its ability to separate typically developing from disordered children) by lowering the mean score, which may limit the test’s ability to diagnose children with mild disabilities.
TOSR administration takes approximately 20 minutes, although it can take a little longer or shorter depending on the child’s level of knowledge. It is relatively straightforward. You start at the age-based point and then establish a basal and a ceiling. For the basal rule, if the child misses any of the first 3 items, the examiner must go backward until the child obtains 3 correct responses in a row. To attain a ceiling, test administration can be discontinued after the student makes 6 out of 8 incorrect responses.
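For those who like to see such rules spelled out, below is a small Python sketch of the basal and ceiling logic as described above. It is only my interpretation: I am assuming that "6 out of 8" refers to 8 consecutive items, and the function names are hypothetical. Always follow the test manual for actual scoring.

```python
def basal_established(responses, run=3):
    """True if the first `run` administered items were all answered correctly."""
    return len(responses) >= run and all(responses[:run])


def ceiling_index(responses, window=8, max_errors=6):
    """Return the 0-based index of the item at which the ceiling is reached.

    The ceiling is assumed to be reached once `max_errors` of the most recent
    `window` consecutive items are incorrect. Returns None if it is never reached.
    """
    for end in range(1, len(responses) + 1):
        recent = responses[max(0, end - window):end]
        if recent.count(False) >= max_errors:
            return end - 1
    return None


if __name__ == "__main__":
    # True = correct, False = incorrect, in order of administration (made-up data).
    responses = [True] * 5 + [False, False, True, False, False, True, False, False]
    print("Basal established:", basal_established(responses))
    print("Stop after item index:", ceiling_index(responses))
```

If the basal check fails, the examiner would go backward from the starting point, as described above, before counting any items toward the ceiling.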
Test administration is as follows. Students are presented with 4 images and told 4 words which accompany the images. The examiner asks the question: “Which word goes with all four pictures? The words are…“
According to the authors, this assessment can provide “information on children and adolescents basic receptive vocabulary knowledge, as well as their higher order thinking and reasoning in the semantic domain.”
During the time I had this test I administered it to 6 students on my caseload with a documented history of language disorders and learning disabilities. Interestingly, all students with the exception of one passed it with flying colors. Four out of 6 received standard scores solidly in the average range of functioning, including a student recently added to the caseload with significant word-finding deficits. Another student with moderate intellectual disability scored in the low average range (18th percentile). Finally, my last student scored very poorly (1st percentile); however, in addition to being a multicultural speaker he also had a significant language disorder. He was actually tested for the purpose of comparison with the others, to see what it takes not to pass the test, if you will.
I was surprised to see several children with documented vocabulary knowledge deficits pass this test. Furthermore, when I informally used the test and asked them to identify select vocabulary words expressively or in sentences, very few of the children could actually accomplish these tasks successfully. As such, it is important for clinicians to be aware of the above finding, since receptive knowledge given multiple choices of responses does not constitute spontaneous word retrieval.
Consequently, I caution SLPs against using the TOSR as an isolated vocabulary measure to qualify/disqualify children for services, and encourage them to add an informal expressive administration of this measure, in single words or in sentences, to get further informal information regarding their students’ expressive knowledge base.
I also advise caution when administering the test to Culturally and Linguistically Diverse (CLD) students (who are being tested for the first time vs. retesting of CLD students with confirmed language disorders) due to increased potential for linguistic and cultural bias, which may result in test answers being marked incorrect due to a lack of relevant receptive vocabulary knowledge (in the absence of an actual disorder).
I think that SLPs can use this test as a replacement for the Receptive One-Word Picture Vocabulary Test-4 (ROWPVT-4) effectively, as it does provide them with more information regarding the student’s reasoning and receptive vocabulary abilities. I think this test may be helpful to use with children with word-finding deficits in order to tease out a lack of knowledge vs. a retrieval issue.
You can find this assessment for purchase on the ATP website HERE. Finally, due to the generosity of one of its creators, Deena Seifert, MS, CCC-SLP, you can enter my Rafflecopter giveaway below for a chance to win your own copy!
Disclaimer: I did receive a complimentary copy of this assessment for review from the publisher. Furthermore, the test creators will be mailing a copy of the test to one Rafflecopter winner. However, all the opinions expressed in this post are my own and are not influenced by the publisher or test developers.
Peña, E. D., Spaulding, T. J., & Plante, E. (2006). The composition of normative groups and diagnostic decision-making: Shooting ourselves in the foot. American Journal of Speech-Language Pathology, 15, 247–254.
When many of us think of such labels as “language disorder” or “learning disability”, very infrequently do adolescents (students 13-18 years of age) come to mind. Even today, much of the research in the field of pediatric speech pathology involves preschool and school-aged children under 12 years of age.
The prevalence and incidence of language disorders in adolescents are very difficult to estimate, which is why some authors have even referred to them as a Neglected Group with Significant Problems having an “invisible disability”.
Far fewer speech language therapists work with middle-schoolers vs. preschoolers and elementary aged kids, and the number of SLPs working with high-school aged students is frequently in the single digits in some districts, while in others such SLPs are absent altogether. In fact, I am frequently told (and often see it firsthand) that some administrators try to cut costs by attempting to dictate a discontinuation of speech-language services on the grounds that adolescents “are far too old for services” or can “no longer benefit from services”.
But of course the above is blatantly false. Undetected language deficits don’t resolve with age! They simply worsen and turn into learning disabilities. Similarly, lack of necessary and appropriate service provision to children with diagnosed language impairments at the middle-school and high-school levels will strongly affect their academic functioning and hinder their future vocational outcomes.
A cursory look at the Speech Pathology Related Facebook Groups as well as ASHA forums reveals numerous SLPs in a continual search for best methods of assessment and treatment of older students (~12-18 years of age).
Consequently, today I wanted to dedicate this post to a review of standardized assessment options available for students 12-18 years of age with suspected language and literacy deficits.
Most comprehensive standardized assessments “typically focus on semantics, syntax, morphology, and phonology, as these are the performance areas in which specific skill development can be most objectively measured” (Hill & Coufal, 2005, p. 35). Very few of them actually incorporate aspects of literacy into their subtests in a meaningful way. Yet by the time students reach adolescence, literacy begins to play an incredibly critical role not just in all aspects of academics but also in social communication.
So when it comes to comprehensive general language testing I highly recommend that SLPs select standardized measures with a focus on not only language but also literacy. Presently, of all the comprehensive assessment tools I highly prefer the Test of Integrated Language and Literacy Skills (TILLS) for students up to 18 years of age (see a comprehensive review HERE), which covers such literacy areas as phonological awareness, reading fluency, reading comprehension, writing and spelling in addition to traditional language areas such as vocabulary awareness, following directions, story recall, etc. However, while comprehensive tests have numerous uses, their sole administration will not constitute an adequate assessment.
So what areas should be assessed during language and literacy testing? Below are a few suggestions of standardized testing measures (and informal procedures) aimed at exploring the student’s abilities in particular areas pertaining to language and literacy.
TESTS OF LANGUAGE
- Listening Comprehension (for stories not just sentences)
- Comprehension of Ambiguous and Figurative Language (e.g., idioms, ambiguous expressions, etc.)
- Clinical Evaluation of Language Fundamentals -5 Metalinguistics (up to 22 years of age)
- Semantic Flexibility (e.g., generation of definitions, synonyms, antonyms, multiple meaning words, etc.)
- WORD Test 2 Adolescent (up to 18 years of age)
- Can be supplemented with informal narrative assessment to determine if the student can coherently and cohesively summarize expository or narrative texts
- Critical Thinking and Problem Solving
- TOPS-2 Adolescent Test of Problem Solving-2 (up to 18 years of age)
- Social Communication
- Social Language Development Test Adolescent (up to 18 years of age)
- Can be supplemented with informal assessment of social communication (Informal Social Thinking Dynamic Assessment Protocol®)
- Executive Function
- Informal use of Situational Awareness STOP Observation Tool (Ward & Jacobsen, 2014)
- Phonological Awareness
- Comprehensive Test of Phonological Processing-2 (CTOPP-2) (up to 25 years of age)
- Word Fluency
- Rapid Automatized Naming/Rapid Alternating Stimulus RAN/RAS (up to 18 years of age)
- Reading Fluency
- Reading Comprehension
- TOWL-4: Test of Written Language–Fourth Edition (up to 18 years of age)
- Can be informally supplemented with the use of Grade Rubrics addressing Persuasive/Expository Texts
It is understandable that, given the sheer number of assessment choices, some clinicians may feel overwhelmed and be unsure regarding the starting point of an adolescent evaluation. Consequently, the use of a checklist prior to the initiation of assessment may be highly useful in order to identify potential language weaknesses/deficits the students might experience. It will also allow clinicians to prioritize the hierarchy of testing instruments to use during the assessment.
While clinicians are encouraged to develop such checklists for their personal use, those who lack time and opportunity can locate a number of already available checklists on the market.
For example, the comprehensive 6-page Speech Language Assessment Checklist for Adolescents (below) can be given to caregivers, classroom teachers, and even older students in order to check off the most pressing difficulties the student is experiencing in an academic setting.
It is important for several individuals to fill out this checklist to ensure consistency of deficits, prior to determining whether an assessment is warranted in the first place and if so, which assessment areas need to be targeted.
- Receptive Language
- Memory, Attention and Cognition
- Expressive Language
- Problem Solving
- Pragmatic Language Skills
- Social Emotional Development
- Executive Functioning
Based on the checklist administration SLPs can reliably pinpoint the student’s areas of deficits without needless administration of unrelated/unnecessary testing instruments. For example, if a student presents with deficits in the areas of problem solving and social pragmatic functioning the administration of a general language test such as the Clinical Evaluation of Language Fundamentals® – Fifth Edition (CELF-5) would NOT be functional (especially if the previous administration of educational testing did not reveal any red flags). In contrast, the administration of such tests as Test Of Problem Solving 2 Adolescent and Social Language Development Test Adolescent would be better reflective of the student’s deficits in the above areas. (Checklist HERE; checklist sample HERE).
It is very important to understand that students presenting with language and literacy deficits will not outgrow these deficits on their own. While there may be “a time period when the students with early language disorders seem to catch up with their typically developing peers” (i.e., illusory recovery) by undergoing a “spurt” in language learning (Sun & Wallach, 2014), these spurts are typically followed by a “post-spurt plateau”. This is because, due to the ongoing challenges and an increase in academic demands, “many children with early language disorders fail to “outgrow” these difficulties or catch up with their typically developing peers” (Sun & Wallach, 2014). As such, many adolescents “may not show academic or language-related learning difficulties until linguistic and cognitive demands of the task increase and exceed their limited abilities” (Sun & Wallach, 2014). Consequently, SLPs must consider the “underlying deficits that may be masked by early oral language development” and “evaluate a child’s language abilities in all modalities, including pre-literacy, literacy, and metalinguistic skills” (Sun & Wallach, 2014).
- Hill, J. W., & Coufal, K. L. (2005). Emotional/behavioral disorders: A retrospective examination of social skills, linguistics, and student outcomes. Communication Disorders Quarterly, 27(1), 33–46.
- Sun, L., & Wallach, G. (2014). Language disorders are learning disabilities: Challenges on the divergent and diverse paths to language learning disability. Topics in Language Disorders, 34(1), 25–38.
Helpful Smart Speech Therapy Resources
- Assessment of Adolescents with Language and Literacy Impairments in Speech Language Pathology
- Assessment and Treatment Bundles
- Social Communication Materials
- Multicultural Materials
The Test of Integrated Language & Literacy Skills (TILLS) is an assessment of oral and written language abilities in students 6–18 years of age. Published in the fall of 2015, it is unique in that it aims to thoroughly assess skills such as reading fluency, reading comprehension, phonological awareness, spelling, as well as writing in school-age children. As I have been using this test since the time it was published, I wanted to take an opportunity today to share just a few of my impressions of this assessment.
First, a little background on why I chose to purchase this test so shortly after I had purchased the Clinical Evaluation of Language Fundamentals – 5 (CELF-5). Soon after I started using the CELF-5 I noticed that it tended to considerably overinflate my students’ scores on a variety of its subtests. In fact, I noticed that unless a student had a fairly severe degree of impairment, the majority of his/her scores came out either low average or slightly below average (click for more info on why this was happening HERE, HERE, or HERE). Consequently, I was excited to hear about the development of the TILLS, almost simultaneously through ASHA as well as the SPELL-Links ListServe. I was particularly happy because I knew that some of this test’s developers (e.g., Dr. Elena Plante, Dr. Nickola Nelson) had published solid research in the areas of psychometrics and literacy respectively.
According to the TILLS developers it has been standardized for 3 purposes:
- to identify language and literacy disorders
- to document patterns of relative strengths and weaknesses
- to track changes in language and literacy skills over time
The subtests can be administered in isolation (with the exception of a few) or the test can be administered in its entirety. The administration of all 15 subtests may take approximately an hour and a half, while the administration of the core subtests typically takes ~45 mins.
Please note that there are 5 subtests that should not be administered to students 6;0-6;5 years of age because many typically developing students are still mastering the required skills.
- Subtest 5 – Nonword Spelling
- Subtest 7 – Reading Comprehension
- Subtest 10 – Nonword Reading
- Subtest 11 – Reading Fluency
- Subtest 12 – Written Expression
However, if needed, there are several tests of early reading and writing abilities which are available for assessment of children under 6;5 years of age with suspected literacy deficits (e.g., TERA-3: Test of Early Reading Ability–Third Edition; Test of Early Written Language, Third Edition-TEWL-3, etc.).
Let’s move on to take a deeper look at its subtests. Please note that for the purposes of this review all images came directly from and are the property of Brookes Publishing Co (clicking on each of the below images will take you directly to their source).
1. Vocabulary Awareness (VA) (description above) requires students to display considerable linguistic and cognitive flexibility in order to earn an average score. It works great in teasing out students with weak vocabulary knowledge and use, as well as students who are unable to quickly and effectively analyze words for deeper meaning and come up with effective definitions of all possible word associations. Be mindful of the fact that even though the words are presented to the students in written format in the stimulus book, the examiner is still expected to read all the words to the students. Consequently, students with good vocabulary knowledge and strong oral language abilities can still pass this subtest despite the presence of significant reading weaknesses. Recommendation: I suggest informally checking the student’s word reading abilities by asking them to read all of the words, before reading all the word choices to them. This way you can informally document any word misreadings made by the student even in the presence of an average subtest score.
2. The Phonemic Awareness (PA) subtest (description above) requires students to isolate and delete initial sounds in words of increasing complexity. While this subtest does not require sound isolation and deletion in various word positions, as do tests such as the CTOPP-2: Comprehensive Test of Phonological Processing–Second Edition or the Phonological Awareness Test 2 (PAT 2), it is still a highly useful and reliable measure of phonemic awareness (as one of many precursors to reading fluency success). This is especially so because after the initial directions are given, the student is expected to remember to isolate the initial sounds in words without any prompting from the examiner. Thus, this task also indirectly tests the students’ executive function abilities in addition to their phonemic awareness skills.
3. The Story Retelling (SR) subtest (description above) requires students to do just that: retell a story. Be mindful of the fact that the presented stories have reduced complexity. Thus, unless the students possess significant retelling deficits, the above subtest may not capture their true retelling abilities. Recommendation: Consider supplementing this subtest with informal narrative measures. For younger children (kindergarten and first grade) I recommend using wordless picture books to perform a dynamic assessment of their retelling abilities following a clinician’s narrative model (e.g., HERE). For early elementary aged children (grades 2 and up), I recommend using picture books, which are first read to and then retold by the students with the benefit of pictorial but not written support. Finally, for upper elementary aged children (grades 4 and up), it may be helpful for the students to retell a book or a movie seen recently (or liked significantly) by them without the benefit of visual support altogether (e.g., HERE).
4. The Nonword Repetition (NR) subtest (description above) requires students to repeat nonsense words of increasing length and complexity. Weaknesses in the area of nonword repetition have consistently been associated with language impairments and learning disabilities due to the task’s heavy reliance on phonological segmentation as well as phonological and lexical knowledge (Leclercq, Maillart, Majerus, 2013). Thus, both monolingual and simultaneously bilingual children with language and literacy impairments will be observed to present with patterns of segment substitutions (subtle substitutions of sounds and syllables in presented nonsense words) as well as segment deletions of nonword sequences more than 2-3 or 3-4 syllables in length (depending on the child’s age).
5. The Nonword Spelling (NS) subtest (description above) requires the students to spell nonwords from the Nonword Repetition (NR) subtest. Consequently, the Nonword Repetition (NR) subtest needs to be administered prior to the administration of this subtest in the same assessment session. In contrast to the real-word spelling tasks, students cannot memorize the spelling of the presented words, which are still bound by orthographic and phonotactic constraints of the English language. While this is a highly useful subtest, it is important to note that simultaneously bilingual children may present with decreased scores due to vowel errors. Consequently, it is important to analyze subtest results in order to determine whether dialectal differences, rather than the presence of an actual disorder, are responsible for the error patterns.
6. The Listening Comprehension (LC) subtest (description above) requires the students to listen to short stories and then definitively answer story questions via available answer choices, which include: “Yes”, “No”, and “Maybe”. This subtest also indirectly measures the students’ metalinguistic awareness skills as they are needed to detect when the text does not provide sufficient information to answer a particular question definitively (e.g., “Maybe” response may be called for). Be mindful of the fact that because the students are not expected to provide sentential responses to questions it may be important to supplement subtest administration with another listening comprehension assessment. Tests such as the Listening Comprehension Test-2 (LCT-2), the Listening Comprehension Test-Adolescent (LCT-A), or the Executive Function Test-Elementary (EFT-E) may be useful if language processing and listening comprehension deficits are suspected or reported by parents or teachers. This is particularly important to do with students who may be ‘good guessers’ but who are also reported to present with word-finding difficulties at sentence and discourse levels.
7. The Reading Comprehension (RC) subtest (description above) requires the students to read a short story and answer story questions in “Yes”, “No”, and “Maybe” format. This subtest is not a stand-alone measure and must be administered immediately following the administration of the Listening Comprehension subtest. The student is asked to read the first story out loud in order to determine whether s/he can proceed with taking this subtest or discontinue due to being an emergent reader. The criterion for discontinuing the subtest is making 7 errors during the reading of the first story and its accompanying questions. Unfortunately, in my clinical experience this subtest is not always accurate at identifying children with reading-based deficits.
While I find it terrific for students with severe-profound reading deficits and/or below average IQ, a number of my students with average IQ and moderately impaired reading skills managed to pass it via a combination of guessing and luck despite being observed to misread aloud 40-60% of the presented words. Be mindful of the fact that typically such students may have up to 5-6 errors during the reading of the first story. Thus, according to administration guidelines these students will be allowed to proceed and take this subtest. They will then continue to make text misreadings during each story presentation (you will know that by asking them to read each story aloud vs. silently). However, because the response mode is a definitive (“Yes”, “No”, and “Maybe”) vs. open-ended question format, a number of these students will earn average scores by being successful guessers. Recommendation: I highly recommend supplementing the administration of this subtest with grade level (or below grade level) texts (see HERE and/or HERE), to assess the student’s reading comprehension informally.
I present a full one page text to the students and ask them to read it to me in its entirety. I audio/video record the student’s reading for further analysis (see Reading Fluency section below). After the completion of the story I ask the student questions with a focus on main idea comprehension and vocabulary definitions. I also ask questions pertaining to story details. Depending on the student’s age I may ask them abstract/ factual text questions with and without text access. Overall, I find that informal administration of grade level (or even below grade-level) texts coupled with the administration of standardized reading tests provides me with a significantly better understanding of the student’s reading comprehension abilities rather than administration of standardized reading tests alone.
8. The Following Directions (FD) subtest (description above) measures the student’s ability to execute directions of increasing length and complexity. It measures the student’s short-term, immediate and working memory, as well as their language comprehension. What is interesting about the administration of this subtest is that the graphic symbols (e.g., objects, shapes, letters and numbers etc.) the student is asked to modify remain covered as the instructions are given (to prevent visual rehearsal). After being presented with the oral instruction the students are expected to move the card covering the stimuli and then to execute the visual-spatial, directional, sequential, and logical if–then instructions by marking them on the response form. The fact that the visual stimuli remain covered until the last moment increases the demands on the student’s memory and comprehension. The subtest was created to simulate teachers’ use of procedural language (giving directions) in a classroom setting (as per developers).
9. The Delayed Story Retelling (DSR) subtest (description above) needs to be administered to the students during the same session as the Story Retelling (SR) subtest, approximately 20 minutes after the SR subtest administration. Despite the relatively short passage of time between both subtests, it is considered to be a measure of long-term memory as related to narrative retelling of reduced complexity. Here, the examiner can compare the student’s performance on both subtests to determine whether the student did better or worse on either of these measures (e.g., recalled more information after a period of time passed vs. immediately after being read the story). However, as mentioned previously, some students may recall this previously presented story fairly accurately and as a result may obtain an average score despite a history of teacher/parent reported long-term memory limitations. Consequently, it may be important for the examiner to supplement the administration of this subtest with a recall of a movie/book recently seen/read by the student (a few days ago) in order to compare both performances and note any weaknesses/limitations.
10. The Nonword Reading (NR) subtest (description above) requires students to decode nonsense words of increasing length and complexity. What I love about this subtest is that the students are unable to effectively guess words (as many tend to routinely do when presented with real words). Consequently, the presentation of this subtest will tease out which students have good letter/sound correspondence abilities as well as solid orthographic, morphological and phonological awareness skills and which ones only memorized sight words and are now having difficulty decoding unfamiliar words as a result.
11. The Reading Fluency (RF) subtest (description above) requires students to read facts which make up simple stories fluently and correctly. Here, the key to attaining an average score is accuracy and automaticity. In contrast to the previous subtest, the words are now presented in meaningful simple syntactic contexts.
It is important to note that the Reading Fluency subtest of the TILLS has a negatively skewed distribution. As per authors, “a large number of typically developing students do extremely well on this subtest and a much smaller number of students do quite poorly.”
Thus, “the mean is to the left of the mode” (see publisher’s image below). This is why a student could earn an average standard score (near the mean) and a low percentile rank when true percentiles are used rather than NCE percentiles (Normal Curve Equivalent).
Consequently, under certain conditions (see HERE) the percentile rank (vs. the NCE percentile) will be a more accurate representation of the student’s ability on this subtest.
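To make the point about skew more concrete, here is a minimal sketch with a made-up set of raw scores (these are hypothetical numbers chosen for illustration, not TILLS norms, and the `percentile_rank` helper is my own). It shows how a score sitting right at the mean of a negatively skewed distribution can still have a true percentile rank well below 50.

```python
from statistics import mean


def percentile_rank(scores, value):
    """Percent of scores below `value`, counting half of any ties."""
    below = sum(s < value for s in scores)
    equal = sum(s == value for s in scores)
    return 100.0 * (below + 0.5 * equal) / len(scores)


# Hypothetical raw scores (NOT test norms): most students score high,
# a few score very low, so the distribution is negatively skewed.
scores = [2, 4, 6, 14, 15, 15, 16, 16, 16, 16]

avg = mean(scores)                           # the few low scores pull the mean down
rank_at_mean = percentile_rank(scores, avg)  # true percentile rank of a score at the mean

print(f"mean = {avg}, percentile rank at the mean = {rank_at_mean}")
```

In this toy sample only 3 of the 10 scores fall below the mean, so a student scoring exactly at the mean outperforms far fewer than half of the peers.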
Indeed, due to the reduced complexity of the presented words some students (especially younger elementary aged) may obtain average scores and still present with serious reading fluency deficits.
I frequently see that in students with average IQ and good long-term memory, who by second and third grades have managed to memorize an admirable number of sight words, due to which their deficits in the areas of reading appeared to be minimized. Recommendation: If you suspect that your student belongs to the above category I highly recommend supplementing this subtest with an informal measure of reading fluency. This can be done by presenting to the student a grade level text (I find science and social studies texts particularly useful for this purpose) and asking them to read several paragraphs from it (see HERE and/or HERE).
As the students are reading I calculate their reading fluency by counting the number of words they read per minute. I find it very useful as it allows me to better understand their reading profile (e.g., fast/inaccurate reader, slow/inaccurate reader, slow/accurate reader, fast/accurate reader). As the student is reading I note their pauses, misreadings, word-attack skills and the like. Then, I write a summary comparing the student’s reading fluency on both standardized and informal assessment measures in order to document the student’s strengths and limitations.
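For those who like to see the arithmetic spelled out, below is a minimal sketch of the informal calculation. The function names, the sample numbers, and especially the cutoff values are my own assumptions for illustration; they are not published benchmarks, so substitute whatever grade-level expectations you normally use.

```python
def reading_fluency(words_read, errors, seconds):
    """Return words per minute, words correct per minute, and accuracy (0-1)."""
    minutes = seconds / 60
    correct = words_read - errors
    return words_read / minutes, correct / minutes, correct / words_read


def reading_profile(wpm, accuracy, wpm_cutoff, accuracy_cutoff):
    """Label the reader as fast or slow and accurate or inaccurate.

    Both cutoffs are supplied by the examiner (hypothetical values below).
    """
    speed = "fast" if wpm >= wpm_cutoff else "slow"
    precision = "accurate" if accuracy >= accuracy_cutoff else "inaccurate"
    return f"{speed}/{precision}"


# Hypothetical sample: 180 words read in 90 seconds with 18 misreadings.
wpm, wcpm, accuracy = reading_fluency(words_read=180, errors=18, seconds=90)

# Hypothetical cutoffs, for illustration only.
label = reading_profile(wpm, accuracy, wpm_cutoff=100, accuracy_cutoff=0.95)

print(f"{wpm:.0f} WPM, {wcpm:.0f} WCPM, {accuracy:.0%} accuracy: {label}")
```

Tracking words correct per minute alongside raw words per minute keeps a fast but error-prone reader from looking fluent on paper.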
12. The Written Expression (WE) subtest (description above) needs to be administered to the students immediately after the administration of the Reading Fluency (RF) subtest because the student is expected to integrate a series of facts presented in the RF subtest into their writing sample. There are 4 stories in total for the 4 different age groups.
The examiner needs to show the student a different story which integrates simple facts into a coherent narrative. After the examiner reads that simple story to the students s/he is expected to tell the students that the story is okay, but “sounds kind of choppy.” They then need to show the student an example of how they could put the facts together in a way that sounds more interesting and less choppy by combining sentences (see below). Finally, the examiner will ask the students to rewrite the story presented to them in a similar manner (e.g., “less choppy and more interesting.”)
After the student finishes his/her story, the examiner will analyze it and generate the following scores: a discourse score, a sentence score, and a word score. Detailed instructions as well as the Examiner’s Practice Workbook are provided to assist with scoring as it takes a bit of training as well as trial and error to complete it, especially if the examiners are not familiar with certain procedures (e.g., calculating T-units).
Full disclosure: Because the above subtest is still essentially sentence combining, I have only used this subtest a handful of times with my students. Typically when I’ve used it in the past, most of my students fell into two categories: those who failed it completely by either copying text word for word, failing to generate any written output etc. or those who passed it with flying colors but still presented with notable written output deficits. Consequently, I’ve replaced Written Expression subtest administration with the administration of written standardized tests, which I supplement with informal grade-level expository, persuasive, or narrative writing samples.
Having said that, many clinicians may not have access to other standardized written assessments, or lack the time to administer entire standardized written measures (which may frequently take between 60 and 90 minutes of administration time). Consequently, in the absence of other standardized writing assessments, this subtest can be effectively used to gauge the student’s basic writing abilities, and if needed effectively supplemented by informal writing measures (mentioned above).
13. The Social Communication (SC) subtest (description above) assesses the students’ ability to understand vocabulary associated with communicative intentions in social situations. It requires students to comprehend how people with certain characteristics might respond in social situations by formulating responses which fit the social contexts of those situations. Essentially students become actors who need to act out particular scenes while viewing select words presented to them.
Full disclosure: Similar to my infrequent administration of the Written Expression subtest, I have also administered this subtest very infrequently to students. Here is why.
I am an SLP who works full-time in a psychiatric hospital with children diagnosed with significant psychiatric impairments and concomitant language and literacy deficits. As a result, a significant portion of my job involves comprehensive social communication assessments to catalog my students’ significant deficits in this area. Yet, past administration of this subtest showed me that a number of my students can pass this subtest quite easily despite presenting with notable and easily evidenced social communication deficits. Consequently, I prefer the administration of comprehensive social communication testing when working with children in my hospital based program or in my private practice, where I perform independent comprehensive evaluations of language and literacy (IEEs).
Again, as I’ve previously mentioned, many clinicians may not have access to other standardized social communication assessments, or lack the time to administer entire standardized social communication measures. Consequently, in the absence of other social communication assessments, this subtest can be used to get a baseline of the student’s basic social communication abilities, and then be supplemented with informal social communication measures such as the Informal Social Thinking Dynamic Assessment Protocol (ISTDAP) or observational social pragmatic checklists.
14. The Digit Span Forward (DSF) subtest (description above) is a relatively isolated measure of short term and verbal working memory (it minimizes demands on other aspects of language such as syntax or vocabulary).
15. The Digit Span Backward (DSB) subtest (description above) assesses the student’s working memory and requires the student to mentally manipulate the presented stimuli in reverse order. It allows the examiner to observe the strategies (e.g., verbal rehearsal, visual imagery, etc.) the students are using to aid themselves in the process. Please note that the Digit Span Forward subtest must be administered immediately before the administration of this subtest.
SLPs who have used tests such as the Clinical Evaluation of Language Fundamentals – 5 (CELF-5) or the Test of Auditory Processing Skills – Third Edition (TAPS-3) should be highly familiar with both subtests as they are fairly standard measures of certain aspects of memory across the board.
To continue, in addition to the presence of subtests which assess the students’ literacy abilities, the TILLS also possesses a number of interesting features.
For starters, there is the TILLS Easy-Score, which allows examiners to do their scoring online. It is incredibly easy and effective. After clicking on the link and filling out the preliminary demographic information, all the examiner needs to do is plug in the subtest raw scores, and the system does the rest. After the raw scores are plugged in, the system will generate a PDF document with all the data which includes (but is not limited to) standard scores, percentile ranks, as well as a variety of composite and core scores. The examiner can then save the PDF on their device (laptop, PC, tablet etc.) for further analysis.
Then there is the quadrant model. According to the TILLS sampler (HERE) “it allows the examiners to assess and compare students’ language-literacy skills at the sound/word level and the sentence/ discourse level across the four oral and written modalities—listening, speaking, reading, and writing” and then create “meaningful profiles of oral and written language skills that will help you understand the strengths and needs of individual students and communicate about them in a meaningful way with teachers, parents, and students. (pg. 21)”
Then there is the Student Language Scale (SLS), which is a one page checklist parents, teachers (and even students) can fill out to informally identify language and literacy based strengths and weaknesses. It allows for meaningful input from multiple sources regarding the student’s performance (as per IDEA 2004) and can be used not just with TILLS but with other tests or even in isolation (as per developers).
Furthermore according to the developers, because the normative sample included several special needs populations, the TILLS can be used with students diagnosed with ASD, deaf or hard of hearing (see caveat), as well as intellectual disabilities (as long as they are functioning age 6 and above developmentally).
According to the developers the TILLS is aligned with Common Core Standards and can be administered as frequently as two times a year for progress monitoring (min of 6 mos post 1st administration).
With respect to bilingualism, examiners can use it with caution with simultaneous English learners but not with sequential English learners (see further explanations HERE). Translations of TILLS are definitely not allowed as they will undermine test validity and reliability.
So there you have it. These are just some of my impressions regarding this test. Now, some of you may notice that I spent a significant amount of time pointing out some of the test’s limitations. However, it is very important to note that we have research that indicates that there is no such thing as a “perfect standardized test” (see HERE for more information). All standardized tests have their limitations.
Having said that, I think that TILLS is a PHENOMENAL addition to the standardized testing market, as it TRULY appears to assess not just language but also literacy abilities of the students on our caseloads.
That’s all from me; however, before signing off I’d like to provide you with more resources and information, which can be reviewed in reference to TILLS. For starters, take a look at Brookes Publishing TILLS resources. These include (but are not limited to) TILLS FAQ, TILLS Easy-Score, TILLS Correction Document, as well as 3 FREE TILLS Webinars. There’s also a Facebook Page dedicated exclusively to TILLS updates (HERE).
But that’s not all. Dr. Nelson and her colleagues have been tirelessly lecturing about the TILLS for a number of years, and many of their past lectures and presentations are available on the ASHA website as well as on the web (e.g., HERE, HERE, HERE, etc). Take a look at them as they contain far more in-depth information regarding the development and implementation of this groundbreaking assessment.
To access TILLS fully-editable template, click HERE
Disclaimer: I did not receive a complimentary copy of this assessment for review nor have I received any encouragement or compensation from either Brookes Publishing or any of the TILLS developers to write it. All images of this test are direct property of Brookes Publishing (when clicked on all the images direct the user to the Brookes Publishing website) and were used in this post for illustrative purposes only.
Leclercq A, Maillart C, Majerus S. (2013) Nonword repetition problems in children with SLI: A deficit in accessing long-term linguistic representations? Topics in Language Disorders. 33 (3) 238-254.
- Components of Comprehensive Dyslexia Testing: Part I- Introduction and Language Testing
- Part II: Components of Comprehensive Dyslexia Testing – Phonological Awareness and Word Fluency Assessment
- Part III: Components of Comprehensive Dyslexia Testing – Reading Fluency and Reading Comprehension
- Part IV: Components of Comprehensive Dyslexia Testing – Writing and Spelling
- Special Education Disputes and Comprehensive Language Testing: What Parents, Attorneys, and Advocates Need to Know
- Why (C) APD Diagnosis is NOT Valid!
- What Are Speech Pathologists To Do If the (C)APD Diagnosis is NOT Valid?
- What do Auditory Memory Deficits Indicate in the Presence of Average General Language Scores?
- Why Are My Child’s Test Scores Dropping?
- Comprehensive Assessment of Adolescents with Suspected Language and Literacy Disorders
As an SLP who routinely conducts speech and language assessments in several settings (e.g., school and private practice), I understand the utility of and the need for standardized speech, language, and literacy tests. However, as an SLP who works with children with dramatically varying degrees of cognition, abilities, and skill-sets, I also highly value supplementing these standardized tests with functional and dynamic assessments, interactions, and observations.
Since a significant value is placed on standardized testing by both schools and insurance companies for the purposes of service provision and reimbursement, I wanted to summarize in today’s post the findings of recent articles on this topic. Since my primary interest lies in assessing and treating school-age children, for the purposes of today’s post all of the reviewed articles came directly from the Language Speech and Hearing Services in Schools (LSHSS) journal.
We’ve all been there. We’ve all had situations in which students scored on the low end of normal, or had a few subtest scores in the below average range, which equaled an average total score. We’ve all pored over eligibility requirements trying to figure out whether the student should receive therapy services given the stringent standardized testing criteria in some states/districts.
Of course, as it turns out, the answer is never simple. In 2006, Spaulding, Plante & Farinella set out to examine the assumption: “that children with language impairment will receive low scores on standardized tests, and therefore [those] low scores will accurately identify these children” (61). So they analyzed the data from 43 commercially available child language tests to identify whether evidence exists to support their use in identifying language impairment in children.
Turns out it did not! Turns out due to the variation in psychometric properties of various tests (see article for specific details), many children with language impairment are overlooked by standardized tests by receiving scores within the average range or not receiving low enough scores to qualify for services. Thus, “the clinical consequence is that a child who truly has a language impairment has a roughly equal chance of being correctly or incorrectly identified, depending on the test that he or she is given.” Furthermore, “even if a child is diagnosed accurately as language impaired at one point in time, future diagnoses may lead to the false perception that the child has recovered, depending on the test(s) that he or she has been given (69).”
Consequently, they created a decision tree (see below) with recommendations for clinicians using standardized testing. They recommend using alternate sources of data (sensitivity and specificity rates) to support accurate identification (available for a small subset of select tests).
The idea behind it is: “if sensitivity and specificity data are strong, and these data were derived from subjects who are comparable to the child tested, then the clinician can be relatively confident in relying on the test score data to aid his or her diagnostic decision. However, if the data are weak, then more caution is warranted and other sources of information on the child’s status might have primacy in making a diagnosis (70).”
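As a quick refresher on what those two numbers actually mean, here is a small sketch of how sensitivity and specificity are computed from a validation sample. The counts below are hypothetical and invented purely for illustration; they do not come from any test manual or from the article.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of children WITH language impairment whom the test flags."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Proportion of typically developing children whom the test clears."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical validation sample: 50 children with a confirmed language
# impairment and 50 typically developing children.
sens = sensitivity(true_positives=40, false_negatives=10)
spec = specificity(true_negatives=45, false_positives=5)

print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

In this made-up sample the test would miss 10 of the 50 children who truly have an impairment, which is exactly the kind of information a clinician needs when deciding how much weight to give a score.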
Fast forward 6 years, and a number of newly revised tests later, in 2012, Spaulding and colleagues set out to “identify various U.S. state education departments’ criteria for determining the severity of language impairment in children, with particular focus on the use of norm-referenced tests” as well as to “determine if norm-referenced tests of child language were developed for the purpose of identifying the severity of children’s language impairment” (176).
They obtained published procedures for severity determinations from available U.S. state education departments, which specified the use of norm-referenced tests, and reviewed the manuals for 45 norm-referenced tests of child language to determine if each test was designed to identify the degree of a child’s language impairment.
What they found out was “the degree of use and cutoff-point criteria for severity determination varied across states. No cutoff-point criteria aligned with the severity cutoff points described within the test manuals. Furthermore, tests that included severity information lacked empirical data on how the severity categories were derived (176).”
Thus they urged SLPs to exercise caution in determining the severity of children’s language impairment via norm-referenced test performance “given the inconsistency in guidelines and lack of empirical data within test manuals to support this use (176)”.
Following the publication of this article, Ireland, Hall-Mills & Millikin issued a response to the Spaulding and colleagues article. They pointed out that the “severity of language impairment is only one piece of information considered by a team for the determination of eligibility for special education and related services”. They noted that Spaulding and colleagues left out a host of federal and state guideline requirements and “did not provide an analysis of the regulations governing special education evaluation and criteria for determining eligibility (320).” They pointed out that “IDEA prohibits the use of ‘any single measure or assessment as the sole criterion’ for determination of disability and requires that IEP teams ‘draw upon information from a variety of sources.’”
They listed a variety of examples from several different state departments of education (FL, NC, VA, etc.), which mandate the use of functional assessments, dynamic assessments, criterion-referenced assessments, etc. for their determination of language therapy eligibility.
But are the SLPs from across the country appropriately using the federal and state guidelines in order to determine eligibility? While one should certainly hope so, it does not always seem to be the case. To illustrate, in 2013, Betz & colleagues asked 364 SLPs to complete a survey “regarding how frequently they used specific standardized tests when diagnosing suspected specific language impairment (SLI) (133).”
Their purpose was to determine “whether the quality of standardized tests, as measured by the test’s psychometric properties, is related to how frequently the tests are used in clinical practice” (133).
What they found out was that the most frequently used tests were the comprehensive assessments including the Clinical Evaluation of Language Fundamentals and the Preschool Language Scale as well as one word vocabulary tests such as the Peabody Picture Vocabulary Test. Furthermore, the date of publication seemed to be the only factor which affected the frequency of test selection.
They also found out that frequently SLPs did not follow up the comprehensive standardized testing with domain specific assessments (critical thinking, social communication, etc.) but instead used the vocabulary testing as a second measure. They were understandably puzzled by that finding. “The emphasis placed on vocabulary measures is intriguing because although vocabulary is often a weakness in children with SLI (e.g., Stothard et al., 1998), the research to date does not show vocabulary to be more impaired than other language domains in children with SLI (140).“
According to the authors, “perhaps the most discouraging finding of this study was the lack of a correlation between frequency of test use and test accuracy, measured both in terms of sensitivity/specificity and mean difference scores (141).”
If SLPs have not significantly changed their practices since that time (2013), the above is certainly disheartening, as it implies that rather than being true diagnosticians, SLPs are using whatever is at hand that has been purchased by their department to indiscriminately assess students with suspected speech language disorders. If that is truly the case, it certainly places into question Ireland, Hall-Mills & Millikin’s response to Spaulding and colleagues. In other words, though SLPs are aware that they need to comply with state and federal regulations when it comes to unbiased and targeted assessments of children with suspected language disorders, they may not actually be using appropriate standardized testing, much less supplementary informal assessments (e.g., dynamic, narrative, language sampling), in order to administer well-rounded assessments.
So where do we go from here? Well, it’s quite simple really! We already know what the problem is. Based on the above articles we know that:
- Standardized tests possess significant limitations
- They are not used with optimal effectiveness by many SLPs
- They may not be frequently supplemented by relevant and targeted informal assessment measures in order to improve the accuracy of disorder determination and subsequent therapy eligibility
Now that we have identified a problem, we need to develop and consistently implement effective practices to ameliorate it. These include researching psychometric properties of tests to review sample size, sensitivity and specificity, etc., using domain specific assessments to supplement administration of comprehensive testing, as well as supplementing standardized testing with a plethora of functional assessments.
SLPs can review testing manuals and consult with colleagues when they feel that the standardized testing is underidentifying students with language impairments (e.g., HERE and HERE). They can utilize referral checklists (e.g., HERE) in order to pinpoint the students’ most significant difficulties. Finally, they can develop and consistently implement informal assessment practices (e.g., HERE and HERE) during testing in order to gain a better grasp on their students’ TRUE linguistic functioning.
Stay tuned for the second portion of this post entitled: “What Research Shows About the Functional Relevance of Standardized Speech Tests?” to find out the best practices in the assessment of speech sound disorders in children.
- Spaulding, Plante & Farinella (2006) Eligibility Criteria for Language Impairment: Is the Low End of Normal Always Appropriate?
- Spaulding, Szulga, & Figueroa (2012) Using Norm-Referenced Tests to Determine Severity of Language Impairment in Children: Disconnect Between U.S. Policy Makers and Test Developers
- Ireland, Hall-Mills & Millikin (2012) Appropriate Implementation of Severity Ratings, Regulations, and State Guidance: A Response to “Using Norm-Referenced Tests to Determine Severity of Language Impairment in Children: Disconnect Between U.S. Policy Makers and Test Developers” by Spaulding, Szulga, & Figueroa (2012)
- Betz et al. (2013) Factors Influencing the Selection of Standardized Tests for the Diagnosis of Specific Language Impairment
Today, due to popular demand, I am reviewing the Test of Written Language-4, or TOWL-4. The TOWL-4 assesses the basic writing readiness skills of students 9:00-17:11 years of age. The test consists of two forms, A and B, which contain different subtest content.
According to the manual, the entire test takes approximately 60-90 minutes to administer and examines 7 skill areas. Only the “Story Composition” subtest is officially timed (the student is given 15 minutes to write the story and, before that, 5 minutes to draft it). However, in my experience each subtest administration, even with students presenting with mild-moderately impaired writing abilities, takes approximately 10 minutes to complete with average results (can you see where I am going with this yet?).
For detailed information regarding the TOWL-4 development and standardization, validity and reliability, please see HERE.
Below are my impressions (to date) of using this assessment with students between 11-14 years of age with (known) mild-moderate writing impairments.
1. Vocabulary – The student is asked to write a sentence that incorporates a stimulus word. E.g.: For ‘ran’, a student may write, “I ran up the hill.” The student is not allowed to change the word in any way, such as writing ‘running’ instead of ‘ran’. If this occurs, an automatic loss of points takes place. Ceiling is reached when the student makes 3 errors in a row.
While some of the subtest vocabulary words are perfectly appropriate for younger children (~9), the majority are too simplistic to assess the written vocabulary of middle and high schoolers. For example, the words in the ‘Vocabulary’ subtest include:
- Form A (#1-20): eat, tree, house, circus, walk, bird, edge, laugh, donate, faithful, aboard, humble, though, confusion, lethal, deny, pulp, verge, revive, intact, etc.
- Form B (#1-20): see, help, prize, sky, stove, cry, enormous, chimney, avoid, nonsense, snout, wept, exotic, cycle, deb, specify, debatable, pastel, rugged, studious, etc.
These words may work well to test the knowledge of younger children, but they do not take into account the challenging academic standards set forth for older students. As a result, students 11+ years of age may pass this subtest with flying colors but still present with a fair amount of difficulty using sophisticated vocabulary words in written compositions.
2/3. Spelling and Punctuation (subtests 2 and 3). These two subtests are administered jointly but scored separately. Here, the student is asked to write sentences dictated by the examiner using appropriate rules for spelling, punctuation, and capitalization. Ceiling for each subtest is reached separately. It occurs when the student makes 3 errors in a row in each of the subtests. In other words, if a student uses correct punctuation but incorrect spelling, his/her ceiling on the ‘Spelling’ subtest will be reached sooner than on the ‘Punctuation’ subtest, and vice versa.
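The separate ceilings can be illustrated with a short Python sketch. It is only an illustration of the “3 errors in a row” rule as described above, not the publisher's scoring procedure: the function name and the response data are hypothetical, and each response is simply marked correct (True) or incorrect (False).

```python
from typing import Optional, Sequence


def ceiling_item(correct: Sequence[bool], run_length: int = 3) -> Optional[int]:
    """Return the 1-based item number at which the ceiling is reached.

    The ceiling is reached on the item that completes `run_length`
    consecutive errors. Returns None if that never happens.
    """
    consecutive_errors = 0
    for item_number, is_correct in enumerate(correct, start=1):
        if is_correct:
            consecutive_errors = 0  # a correct response resets the run
        else:
            consecutive_errors += 1
            if consecutive_errors == run_length:
                return item_number
    return None


# Hypothetical scoring of the same six dictated sentences,
# judged once for spelling and once for punctuation.
spelling = [True, False, False, False, True, True]
punctuation = [True, True, False, True, True, True]

print(ceiling_item(spelling))     # 4
print(ceiling_item(punctuation))  # None
```

In this made-up example the student reaches the ‘Spelling’ ceiling on item 4, while the ‘Punctuation’ subtest would continue because no run of 3 consecutive punctuation errors has occurred.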
Similar to the ‘Vocabulary’ subtest, I feel that the sentences the students are asked to write are far too simplistic to showcase their “true” grade level abilities. Below are some examples of sentences from both forms:
- Form A: (2) Run away.; (3) Birds fly.; (9) Who ate the food?; (17) The electricity failed in Dallas, Texas.; (22) Because of the confusion, she sought legal help.
- Form B: (3) Am I going?; (18) Bring back three items: milk, crackers, and butter.; (23) After the door was closed, the sound was barely audible.
As you can see from the above, the requirements of these subtests are also not too stringent. The spelling words are simple and the punctuation requirements are very basic: a question mark here, an exclamation mark there, with a few commas in between. But I was particularly disappointed with the ‘Spelling’ subtest.
Here’s why. I have a 6th-grade client on my caseload with significant, well-documented spelling difficulties. When this subtest was administered to him, he scored within the average range (Scaled Score of 8 and Percentile Rank of 25). However, an administration of the Spelling Performance Evaluation for Language and Literacy (SPELL-2) yielded 3 assessment pages of spelling errors, as well as 7 pages of recommendations on how to remediate those errors. Had he received the TOWL-4 as part of an independent evaluation from a different examiner, nothing more would have been done regarding his spelling difficulties, since the TOWL-4 revealed average spelling performance due to its focus on overly simplistic vocabulary.
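As a rough check on how a Scaled Score of 8 lines up with a Percentile Rank of 25, here is a minimal Python sketch. It assumes that subtest scaled scores follow the common convention of a mean of 10 and a standard deviation of 3 on a normal curve. The function name is hypothetical, and the norm tables in the test manual remain the authoritative source for actual percentile ranks.

```python
from statistics import NormalDist

# Assumption: subtest scaled scores are normally distributed
# with a mean of 10 and a standard deviation of 3.
SCALED_MEAN = 10
SCALED_SD = 3


def scaled_to_percentile(scaled_score: float) -> float:
    """Approximate percentile rank for a scaled score under the normal curve."""
    distribution = NormalDist(mu=SCALED_MEAN, sigma=SCALED_SD)
    return distribution.cdf(scaled_score) * 100


print(round(scaled_to_percentile(8)))   # 25
print(round(scaled_to_percentile(10)))  # 50
```

Under this assumption a Scaled Score of 8 sits about two-thirds of a standard deviation below the mean, which is why it still falls within the average range on paper.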
4. Logical Sentences – The student is asked to edit an illogical sentence so that it makes better sense. E.g.: “John blinked his nose” is changed to “John blinked his eye.” Ceiling is reached when the student makes 3 errors in a row. Again, I’m not too thrilled with this subtest. Rather than truly attempting to ascertain the student’s grammatical and syntactic knowledge at the sentence level, a large portion of this subtest deals with easily recognizable semantic incongruities such as the one above.
5. Sentence Combining – The student integrates the meaning of several short sentences into one grammatically correct written sentence. E.g.: “John drives fast” is combined with “John has a red car,” making “John drives his red car fast.” Ceiling is reached when the student makes 3 errors in a row. The first few items contain only two sentences, which can be combined by adding the conjunction “and”.
The remaining items are a bit more difficult due to (a) the addition of more sentences and (b) the increase in the complexity of language needed to efficiently combine them. This is a nice subtest to administer to students who present with difficulty effectively and efficiently expressing their thoughts on paper. It is particularly useful with students who write down a lot of extraneous information in their compositions/essays and frequently overuse run-on sentences.
6. Contextual Conventions – The student is asked to write a story in response to a stimulus picture. S/he earns points for satisfying specific requirements (identified below) relative to combined orthographic (e.g., punctuation, spelling) and grammatical conventions (e.g., sentence construction, noun-verb agreement). The student’s written composition needs to contain more than 40 words in order for an effective analysis to take place.
The scoring criteria range from no credit, a score of 0 (based on 3 or more mistakes), to partial credit, a score of 1 (based on 1-2 mistakes), to full credit, a score of 3 (no mistakes).
- Sentences begin with a capital letter
- Use of quotation marks
- Use of comma to set off direct quotes
- Correct use of apostrophe
- Use of a question mark
- Use of exclamation point
- Capitalization of proper nouns (including story title)
- Number of non-duplicated misspelled words
- Other use of punctuation (hyphen, parentheses, etc.)
- Use of fragments
- Use of run-on/rambling sentences
- Use of compound sentences
- Use of specific coordinating conjunction
- Use of introductory phrases/clauses
- Noun-verb disagreement
- Sentences in paragraphs
- Sentence composition
- Number of correctly spelled words with 7 or more letters
- Number of correctly spelled words with 3 syllables or more
- Appropriate use of articles
While the above criteria are highly useful for younger elementary-aged students who may exhibit significant difficulties in the domain of writing, older middle school and high school students, as well as elementary-aged students with moderate writing difficulties, may attain average scores on this subtest but still present with significant difficulties in this area as compared to typically developing grade level peers. As a result, in addition to this assessment, it is recommended that a functional assessment of grade level writing also be performed in order to accurately identify the student’s writing needs.
7. Story Composition – The student’s story is evaluated relative to the quality of its composition (e.g., vocabulary, plot, prose, development of characters, and interest to the reader).
The examiner first provides the student with an example of a good story by reading one written by another student. Then, the examiner provides the student with an appropriate picture card and tells them that they need to take time to plan their story and make an outline on the (also provided) scratch paper. The student has 5 minutes to plan before writing the actual story. After the 5 minutes elapse, they have 15 minutes to write the story. It is important to note that story composition is the very first subtest administered to the student. Once they complete it, they are ready to move on to the Vocabulary subtest.
- Story beginning
- Reference to a specific event (occurring before or after the picture)
- Story sequence
- Characters show emotions
- Story action
- Story ending
- Writing style
- Story (overall)
- Specific (listed) story vocabulary
- Overall vocabulary
I found this subtest significantly more useful with younger students and significantly impaired students than with older students or students with mild-moderate writing difficulties. Again, if your aim is to get an accurate picture of an older student’s writing abilities, I definitely recommend using informal writing assessment rubrics based on the student’s grade level.
- Thorough assessment of basic writing areas
- Flexible subtest administration (can be done on multiple occasions with students who fatigue easily)
- Untimed testing administration (with the exception of the story composition subtest) may not be very functional with students who present with significant processing difficulties. One 12-year-old student actually took ~40 minutes to complete each subtest
- Primarily useful for students with severe deficits in the area of written expression
- Lack of computer scoring
- Lack of remediation suggestions based on subtest deficits
Overall, I do find the TOWL-4 a very useful testing measure to have in my toolbox, as it is terrific for ruling out weaknesses in the student’s basic writing abilities with respect to simple vocabulary, sentence construction, writing mechanics, punctuation, etc. If I identify previously unidentified gaps in basic writing skills, I can then readily intervene where needed.
However, it is important to understand that the TOWL-4 is only a starting point for most of our students with complex literacy needs whose writing abilities are above a severe level of functioning. Most students with mild-moderate writing difficulties will pass this test with flying colors but still present with significant writing needs. As a result, I highly recommend a functional grade level writing assessment as a supplement to the above standardized testing.
Hammill, D. D., & Larsen, S. C. (2009). Test of Written Language–Fourth Edition (TOWL-4). Austin, TX: PRO-ED.
Disclaimer: The views expressed in this post are the personal impressions of the author. This author is not affiliated with PRO-ED in any way and was NOT provided by them with any complimentary products or compensation for the review of this product.