April Tan

Pitch Noodle:

A User-Centered, Computer-Assisted Pronunciation Training (CAPT) Application for Pitch Learning

Learning a language is hard enough…using the application doesn’t have to be! Pitch Noodle puts language learners first through intuitive navigation, easy-to-interpret audio-visual graphs, and a no-nonsense UI.

Date and Type

End-to-end Independent Project
May – July 2021

UX Methods

A/B Testing, Direct Observations, Semi-Structured Interviews, Thematic Analysis, Cognitive Walkthrough, Qualtrics Survey, Wireframing, Prototyping

Role

User Researcher, Python Developer, Acoustical Analyst, Data Analyst, UI Designer

TL;DR

Overview

After observing 15 users struggle with the interface of an existing pitch-learning program, I developed a user-centered application (Pitch Noodle) based on their feedback. A/B testing showed that Pitch Noodle outperformed the existing software in effectiveness, efficiency, and user satisfaction.

Highlights

  • Pitch Noodle was 9.7% and 12.4% more effective at perception and production training, respectively
  • Pitch Noodle was 27 seconds faster than Praat in terms of task completion time
  • User satisfaction for Pitch Noodle was 52% higher than that of Praat

This was the process used in this study:

* Process diagram

Introducing Pitch Noodle!

At its core, Pitch Noodle is an application that lets users compare their pitch production to a model speaker’s, fast. Based on user feedback, the ultimate goal of Pitch Noodle was to provide just the right amount of technical information for learners to understand how to manipulate the movement of their pitch.
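
To make the core idea concrete, below is a minimal illustrative sketch, not the production code, of how two pitch contours might be extracted and overlaid on one simple graph. It assumes the praat-parselmouth and matplotlib packages, and the file names learner.wav and model.wav are placeholders.

```python
# Minimal sketch: extract and overlay two pitch contours on one graph.
# Assumes praat-parselmouth and matplotlib; file names are placeholders.
import parselmouth
import matplotlib.pyplot as plt
import numpy as np

def pitch_contour(path):
    """Return (times, f0) for a recording, with unvoiced frames masked out."""
    snd = parselmouth.Sound(path)
    pitch = snd.to_pitch()                      # Praat's pitch tracker
    f0 = pitch.selected_array["frequency"]      # Hz; 0 where no pitch was detected
    f0[f0 == 0] = np.nan                        # hide unvoiced frames on the plot
    return pitch.xs(), f0

learner_t, learner_f0 = pitch_contour("learner.wav")   # placeholder files
model_t, model_f0 = pitch_contour("model.wav")

plt.plot(model_t, model_f0, label="Model speaker")
plt.plot(learner_t, learner_f0, label="Learner")
plt.xlabel("Time (s)")
plt.ylabel("Pitch (Hz)")
plt.legend()
plt.title("Pitch movement comparison")
plt.show()
```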

Discovery: Conducted direct observations and interviews with 15 users focusing on behavior, needs, and frustrations

I observed 15 student actors across a 3-week training session as they used Praat’s computerized pitch displays to learn the British accent. The goal was to understand how learners would use an intonation learning app. During the direct observation, I acted as a passive observer and took detailed notes on how they used the interface. When needed, I also followed up with questions about specific behaviors during the observation. At the end of the observation, I conducted a semi-structured interview in which I asked open-ended questions about their app use behavior, needs, and frustrations.

* Screenshot of the interface used

Direct Observation

Things I took detailed notes of:
  • Hotkeys used
  • Errors made
  • Navigation paths
  • Indicators of frustration and confusion + associated task
  • Learning ‘aha’ moments + associated task

Semi-Structured Interviews

I asked 10 open-ended questions about their experience:
  • 3 questions about values and motivation
  • 2 questions about their tasks and needs
  • 2 questions about general app use behavior
  • 3 questions about challenges in the current app

Insights: Users responded to pitch visualizations but were cognitively overwhelmed by the technical information and complex interface

The student actors’ goal was to sound as native-like and as natural as possible. Thematic analysis revealed that users found the pitch visualizations helpful, but almost everything else served as a distraction. 7 out of 12 participants reported feeling frustrated enough that they wanted to end the session sooner. The self-reports were corroborated by notes taken during the direct observation.

“Those jumps are really tricky…I don’t think I’d be able to tell [if pitch movement is large enough] without looking at the screen.”

“I didn’t realize how flat my voice was!”

“I keep thinking I’m making my intonation go up when it’s actually going down.”

“These guys [spectrogram] confuse me. I think it affects my pitch but I’m not sure, so I try random things to try and move the pitch.” 

“The contours are too hard to read. I don’t understand how it’s supposed to line up with the rest of the stuff like the blue lines.”

When asked why they only recorded themselves once per utterance:

“I didn’t want to start all over again [loading and setting up files] so I wanted to make sure the one I recorded was a good one.”

When asked why they exited the entire program and restarted it:

“Because I’m so done with all of these windows*.”

Solution: A barebones interface focusing on pitch movement comparisons, charted on a simple graph for interpretability

How might we translate acoustical analysis, a complex concept, into something that the average learner can understand? How might we create an interface where students can quickly record and compare pitches without the interface getting in the way? I started by sketching various ways to visualize and compare pitch. Then, I sketched several different concepts before conducting a streamlined cognitive walkthrough (Spencer, 2000) with 2 users. The last step was converting all user feedback into the final Python prototype.

Step 1: Wireframe sketches
(2 users)

Step 2: Cognitive Walkthrough (2 users)

Step 3: Python prototype
(2 users)

Solution: Secondary research that also informed my design decisions

To ensure that there was theory and inquiry behind the development, I borrowed pedagogical principles from the field of Applied Linguistics and usability principles from the field of Human-Computer Interaction (HCI) to develop Pitch Noodle. 

CAPT Pedagogical Guidelines (Neri et al., 2002)

Input

Defined as a learner’s access to diverse and accurate samples of the target language. The development of new phonetic categories must be derived from available exemplars, hence the need for authentic input.

Output

Defined as a space for learners to produce speech in order to test their hypotheses about new phonetic sounds. This way, learners are able to receive proprioceptive feedback on their own performance and make adjustments as necessary.

Feedback

The feedback that helps learners notice the discrepancies between their productions and the target language. According to Schmidt’s (1990) “noticing hypothesis”, it is only through this awareness that a new sound can be acquired.

Usability Design Guidelines (ISO 9241-11)

Effectiveness

Defined as “the extent to which the intended goals of use of the overall system are achieved.” In other words, how well do users achieve their goals using the system?

Efficiency

Defined as “the resources that have to be expended to achieve the intended goals”. In other words, what resources (i.e., time, costs, material resources) are consumed to achieve a specific task (e.g., task completion time)?

Satisfaction

Defined as “freedom from discomfort, and positive attitudes towards the use of the product.” In other words, how do users feel about their use of the system?

Test: Pitch Noodle outperformed Praat in effectiveness, efficiency, and user satisfaction scores

Effectiveness

A/B testing was used with 15 Mandarin tone learners to test perception and production training. Perception training involved identifying the correct tone from a list of tones; production training involved accurately producing a list of tones as rated by 2 native speakers. Pitch Noodle was found to be 9.7% and 12.4% more effective, respectively.

Efficiency

A/B testing was used with 2 participants, who completed three trials of a task on each interface. Button clicks and task completion time were used as metrics. Pitch Noodle required 11 fewer button clicks and was a whopping 27 seconds faster.

User Satisfaction

A survey was sent out to all users asking about the helpfulness of the pitch movements and comparisons on both interfaces. Pitch Noodle scored 90% vs. 21% for Praat.

Reflections: What I learned

  • Simplify! It was tempting to include additional features and have the app “do more things”. However, this would not only have involved extra development and testing time; learners also repeatedly said in their feedback that they wanted something quick and dirty. Additionally, their main frustration with Praat was that the features were overwhelming. I decided to listen to user feedback and create something bare-bones, which paid off.
 
  • Asking for help. Going into this, I had very little background in Python or acoustical graphing. As a team of one, I quickly learned when it was most strategic to keep watching that YouTube video on Dynamic Time Warping (DTW) and when to stop and ask someone for help (a minimal DTW sketch appears after this list). I also learned that Discord servers are where brilliant and helpful developers live, and that these saints will volunteer their time to look at your code and talk you through the issues.
 
  • Importance of pedagogical theories. Corrective feedback in pronunciation was more challenging than I anticipated. The comparative feedback provided in Pitch Noodle is a form of corrective feedback, but it does not tell you how to change your pronunciation or why it is wrong. Additionally, telling the user whether they are “right” or “wrong” raises even more questions: How large should the Hertz difference be before something counts as correct or incorrect? What about individual differences? How should the program account for monotonous vs. expressive speakers? Given another go, I would devote more research to this particular topic in hopes of developing something more viable.
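
For readers curious about the DTW mentioned above, here is a minimal, purely illustrative NumPy sketch of how dynamic time warping can align two pitch contours of different lengths so their shapes can be compared. It uses plain dynamic programming with no optimizations, and the example contour values are made up for illustration.

```python
# Minimal illustrative sketch of Dynamic Time Warping (DTW) between two
# pitch contours of different lengths. Plain dynamic programming; the
# example contours below are made-up values in Hz.
import numpy as np

def dtw_distance(a, b):
    """Return the DTW alignment cost between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])              # local distance between frames
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# Made-up example: a short rising learner contour vs. a longer rising model contour.
learner = np.array([180.0, 185.0, 190.0, 210.0])
model = np.array([175.0, 180.0, 188.0, 200.0, 215.0, 220.0])
print(dtw_distance(learner, model))
```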

Technology Demo

I had the opportunity to present Pitch Noodle at the TSLL/MWALT 2021 conference! Here is an edited video of the presentation showcasing a full breakdown of the application.