What Does an ‘R’ Sound Look Like?

Has someone ever corrected the way you said a word? You reply with another attempt, only to be corrected again. Frustration mounts as you struggle to hear nuanced differences between the speaker’s pronunciation and your own.

A similar experience is common for children in speech therapy who are trying to imitate words modeled for them. In some cases, a child can get stuck trying to perfect an “r” sound for months or even years.

Display from the staRt app showing a user's pronunciation of the "r" sound in "romp."

To address these challenges, Tara McAllister, Associate Professor and Director of the Doctoral Program in Communicative Sciences and Disorders, and her team at BITS Lab created the staRt app to help users learn speech skills through biofeedback. The app was developed in partnership with Associate Professor of Music Composition and Technology Tae Hong Park and NYU School of Medicine Professor Mario Svirsky. Biofeedback uses instrumentation to create a visual display of some aspect of body function, such as heart rate or speech acoustics. As the Lab’s director, McAllister has conducted multiple studies on biofeedback for speech, including studies in which an ultrasound probe held beneath the chin allows users to see how they position their tongues during speech.

With the staRt app, a cheerful starfish prompts users to speak, and their sounds appear as ocean waves that move as they talk. Since its launch on the App Store in 2020, staRt has been downloaded more than 9,000 times.

NYU News spoke to McAllister about the creation of staRt, its influence on the practice of speech therapy, and the other ways speech visualization can benefit language learning.

How has visualizing words changed the ways that children correct their speech?

One of the biggest challenges in speech therapy is that many children who have trouble producing a speech sound also have difficulty hearing the difference between what they produce and what other speakers are producing. So, in a traditional therapeutic interaction, the clinician models the sound, and when the child imitates it, the clinician might say "No, don't say euhh, say errr." But if the child doesn't clearly perceive the difference between the two, that can be a really frustrating interaction for both parties. With visual biofeedback, the clinician can show the child the difference between their current output and the target, and the child can adjust their articulation and see if the speech display is getting closer to the target.

What feedback have clinicians given you about their use of the app?

It's really exciting to see how many clinicians are using the staRt app in their everyday practice. The "r" sound is a huge pain point for speech pathologists in the school setting – there are so many children who make progress on all their other speech targets but just get stuck on "r." The feedback I've gotten from clinicians is that the different perspective that the app provides – visual instead of auditory – can help some of these kids get un-stuck.

Can parents use it at home too?

Lots of parents are interested in downloading the app. We do ask that everyone who uses it at least start out with the guidance of a speech-language pathologist, just to make sure the learner is on the right track, but once a level of familiarity has been established, families are welcome to download the app to practice at home.

Has the app influenced the way you approach speech therapy?

My experience with biofeedback has definitely influenced how I would clinically approach a child who is struggling with a particular sound. I mentioned that kids can get stuck on a sound like "r" for months or even years, or they may be dismissed from the caseload without making progress. I think that we're seeing a shift in the clinical management of those children, where instead of trying the same thing for months or years or just giving up, clinicians are reaching for technology-enhanced treatment methods to try to make a breakthrough.

What other groups might benefit from the app?

There is a lot of potential for speech visualization technology to be used in second-language learning. We haven't used the staRt app in this specific context yet, but several students in my lab are investigating whether visual feedback more broadly can help English speakers produce Mandarin sounds more intelligibly. And adult learners of English who are working on the "r" sound are certainly welcome to try out the app as part of their learning process.

I am also excited about the potential for speech visualization to be used in gender-affirming communication training. Some trans people can be negatively impacted if their voice is perceived as incongruous with their gender identity, and they may choose to work with a speech pathologist to achieve a vocal presentation that is comfortable for them. When we talk about speech and gender, people mostly think about the pitch of the voice, but male and female vocal tracts also differ in their resonating characteristics. Resonance is harder to understand than pitch, which makes it harder to target in therapy. But the staRt app allows learners to visualize the resonant frequencies of the vocal tract, which could make it easier to adjust them to match a target such as the average frequency for cis women.

I do want to be clear that we are not trying to impose a specific accent or speech pattern or gender presentation on anyone. People have all different kinds of ways of speaking, and that is a wonderful thing; we don't want to change that. But for people who do want to make a change in their speech patterns, whatever the reason may be, we would like to help them make that change effectively and efficiently.

What do you see as the next innovation for your work, and in the field more broadly?

I am excited about two new directions we're going with our work. The first is delivery of speech therapy services via telepractice, such as over Zoom. COVID-19 saw an absolute explosion in the number of speech pathologists who provide services remotely, and as the world returns to normal, many of those clinicians will continue to opt for telepractice due to its convenience and efficiency. During COVID, we conducted a pilot study demonstrating that biofeedback treatment can be effective when delivered via telepractice. We are now following up on that work with larger-scale studies, and we are adapting the staRt software to work in a browser so it can be easily shared in a video call.

The second direction that will have a huge impact on speech pathology is artificial intelligence. One thing that science has shown very clearly is that a new speech skill needs to be practiced many times before it becomes the new default for a speaker – one study suggested as many as 5,000 repetitions were needed. Clinicians have jam-packed caseloads, which makes it very difficult to achieve those kinds of numbers with any given child. One promising solution is to use AI to extend the services provided by the speech pathologist. For example, a child could see their clinician once a week and complete AI-guided home practice on two other days.

However, the performance of speech recognition technology is still quite poor for the speech of children and individuals with speech disorders. Last year, our team received NIH funding to develop a speech database that can train AI systems to classify children's "r" sounds as on-target or off-target; this project is led by Nina Benway, a doctoral student at Syracuse University. We are currently working to integrate this classifier with the staRt app, which would allow us to provide clinicians with a complete package of tools to help children achieve and maintain new speech targets.