April Tan

Pitch Noodle:

A User-Centered, Computer-Assisted Pronunciation Training (CAPT) Application for Pitch Learning

Learning a language is hard enough…using the application doesn’t have to be! Pitch Noodle puts language learners first through intuitive navigation, easy-to-interpret audio-visual graphs, and a no-nonsense UI.

Date and Type

End-to-end Independent Project
May – July 2021

UX Methods

A/B Testing, Direct Observations, Semi-Structured Interviews, Thematic Analysis, Cognitive Walkthrough, Qualtrics Survey, Wireframing, Prototyping

Role

User Researcher, Python Developer, Acoustical Analyst, Data Analyst, UI Designer

TL;DR

Overview

After observing 15 users struggle with the interface of an existing pitch-learning program, I developed a user-centered application (Pitch Noodle) based on their feedback. A/B testing showed that Pitch Noodle outperformed the existing software in effectiveness, efficiency, and user satisfaction.

Highlights

  • Pitch Noodle was 9.7% and 12.4% more effective at perception and production training, respectively
  • Pitch Noodle was 27 seconds faster than Praat in terms of task completion time
  • User satisfaction for Pitch Noodle was 52% higher than that of Praat

This was the process used in this study:

* Process diagram

Introducing Pitch Noodle!

At its core, Pitch Noodle is an application that lets users compare their pitch production to a model speaker’s, fast. Based on user feedback, the ultimate goal of Pitch Noodle was to provide just the right amount of technical information for learners to understand how to manipulate the movement of their pitch.
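
To make the core idea concrete, below is a minimal illustrative sketch, not the production code, of how two pitch contours might be extracted and overlaid on one simple graph. It assumes the praat-parselmouth and matplotlib packages, and the file names learner.wav and model.wav are placeholders.

```python
# Minimal sketch: extract and overlay two pitch contours on one graph.
# Assumes praat-parselmouth and matplotlib; file names are placeholders.
import parselmouth
import matplotlib.pyplot as plt
import numpy as np

def pitch_contour(path):
    """Return (times, f0) for a recording, with unvoiced frames masked out."""
    snd = parselmouth.Sound(path)
    pitch = snd.to_pitch()                      # Praat's pitch tracker
    f0 = pitch.selected_array["frequency"]      # Hz; 0 where no pitch was detected
    f0[f0 == 0] = np.nan                        # hide unvoiced frames on the plot
    return pitch.xs(), f0

learner_t, learner_f0 = pitch_contour("learner.wav")   # placeholder files
model_t, model_f0 = pitch_contour("model.wav")

plt.plot(model_t, model_f0, label="Model speaker")
plt.plot(learner_t, learner_f0, label="Learner")
plt.xlabel("Time (s)")
plt.ylabel("Pitch (Hz)")
plt.legend()
plt.title("Pitch movement comparison")
plt.show()
```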

Discovery: Conducted direct observations and interviews with 15 users focusing on behavior, needs, and frustrations

I observed 15 student actors across a 3-week training session as they used Praat’s computerized pitch displays to learn the British accent. The goal was to understand how learners would use an intonation learning app. During the direct observation, I acted as a passive observer and took detailed notes on how they used the interface. When needed, I also followed up with questions about specific behaviors during the observation. At the end of the observation, I conducted a semi-structured interview in which I asked open-ended questions about their app use behavior, needs, and frustrations.

* Screenshot of the interface used

Direct Observation

Things I took detailed notes of:
  • Hotkeys used
  • Errors made
  • Navigation paths
  • Indicators of frustration and confusion + associated task
  • Learning ‘aha’ moments + associated task

Semi-Structured Interviews

I asked 10 open-ended questions about their experience:
  • 3 questions about values and motivation
  • 2 questions about their tasks and needs
  • 2 questions about general app use behavior
  • 3 questions about challenges in the current app

Insights: Users responded to pitch visualizations but were cognitively overwhelmed by the technical information and complex interface

The student actors’ goal was to sound as native-like and as natural as possible. Thematic analysis revealed that users found the pitch visualizations helpful, but almost everything else served as a distraction. 7 out of 12 participants reported feeling frustrated enough that they wanted to end the session sooner. The self-reports were corroborated by notes taken during the direct observation.

“Those jumps are really tricky…I don’t think I’d be able to tell [if pitch movement is large enough] without looking at the screen.”

“I didn’t realize how flat my voice was!”

“I keep thinking I’m making my intonation go up when it’s actually going down.”

“These guys [spectrogram] confuse me. I think it affects my pitch but I’m not sure, so I try random things to try and move the pitch.” 

“The contours are too hard to read. I don’t understand how it’s supposed to line up with the rest of the stuff like the blue lines.”

When asked why they only recorded themselves once per utterance:

“I didn’t want to start all over again [loading and setting up files] so I wanted to make sure the one I recorded was a good one.”

When asked why they exited the entire program and restarted it:

“Because I’m so done with all of these windows*.”

Solution: A barebones interface focusing on pitch movement comparisons, charted on a simple graph for interpretability

How might we translate acoustical analysis, a complex concept, into something that the average learner can understand? How might we create an interface where students can quickly record and compare pitches without the interface getting in the way? I started by sketching various ways to visualize and compare pitch. Then, I sketched several different concepts before conducting a streamlined cognitive walkthrough (Spencer, 2000) with 2 users. The last step was converting all user feedback into the final Python prototype.

Step 1: Wireframe sketches
(2 users)

Step 2: Cognitive Walkthrough (2 users)

Step 3: Python prototype
(2 users)

Solution: Secondary research that also informed my design decisions

To ensure that there was theory and inquiry behind the development, I borrowed pedagogical principles from the field of Applied Linguistics and usability principles from the field of Human-Computer Interaction (HCI) to develop Pitch Noodle. 

CAPT Pedagogical Guidelines (Neri et al., 2002)

Input

Defined as a learner’s access to diverse and accurate samples of the target language. The development of new phonetic categories must be derived from available exemplars, hence the need for authentic input.

Output

Defined as a space for learners to produce speech in order to test their hypotheses about new phonetic sounds. This way, learners are able to receive proprioceptive feedback on their own performance and make adjustments as necessary.

Feedback

The feedback that helps learners notice the discrepancies between their productions and the target language. According to Schmidt’s (1990) “noticing hypothesis”, it is only through this awareness that a new sound can be acquired.

Usability Design Guidelines (ISO 9241-11)

Effectiveness

Defined as “the extent to which the intended goals of use of the overall system are achieved.” In other words, how well do users achieve their goals using the system?

Efficiency

Defined as “the resources that have to be expended to achieve the intended goals”. In other words, what resources (i.e., time, costs, material resources) are consumed to achieve a specific task (e.g., task completion time)?

Satisfaction

Defined as “freedom from discomfort, and positive attitudes towards the use of the product.” In other words, how do users feel about their use of the system?

Test: Pitch Noodle outperformed Praat in effectiveness, efficiency, and user satisfaction scores

Effectiveness

A/B testing was used with 15 Mandarin tone learners to test perception and production training. Perception training involved identifying the correct tone from a list of tones; production training involved accurately producing a list of tones as rated by 2 native speakers. Pitch Noodle was found to be 9.7% and 12.4% more effective, respectively.

Efficiency

A/B testing was used with 2 participants, who completed three trials of a task on each interface. Button clicks and task completion time were used as metrics. Pitch Noodle required 11 fewer button clicks and was a whopping 27 seconds faster.

User Satisfaction

A survey was sent out to all users asking about the helpfulness of the pitch movements and comparisons on both interfaces. Pitch Noodle scored 90% vs. 21% for Praat.

Reflections: What I learned

  • Simplify! It was tempting to include additional features and have the app “do more things”. However, this would not only have involved extra development and testing time; learners also repeatedly said in their feedback that they wanted something quick and dirty. Additionally, their main frustration with Praat was that the features were overwhelming. I decided to listen to user feedback and create something bare-bones, which paid off.
 
  • Asking for help. Going into this, I had very little background in Python or acoustical graphing. As a team of one, I quickly learned when it was most strategic to keep watching that YouTube video on Dynamic Time Warping (DTW) and when to stop and ask someone for help (a minimal DTW sketch appears after this list). I also learned that Discord servers are where brilliant and helpful developers live, and that these saints will volunteer their time to look at your code and talk you through the issues.
 
  • Importance of pedagogical theories. Corrective feedback in pronunciation was more challenging than I anticipated. The comparative feedback provided in Pitch Noodle is a form of corrective feedback, but it does not tell you how to change your pronunciation or why it is wrong. Additionally, telling the user whether they are “right” or “wrong” raises even more questions: How large should the Hertz difference be before something counts as correct or incorrect? What about individual differences? How should the program account for monotonous vs. expressive speakers? Given another go, I would devote more research to this particular topic in hopes of developing something more viable.
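
For readers curious about the DTW mentioned above, here is a minimal, purely illustrative NumPy sketch of how dynamic time warping can align two pitch contours of different lengths so their shapes can be compared. It uses plain dynamic programming with no optimizations, and the example contour values are made up for illustration.

```python
# Minimal illustrative sketch of Dynamic Time Warping (DTW) between two
# pitch contours of different lengths. Plain dynamic programming; the
# example contours below are made-up values in Hz.
import numpy as np

def dtw_distance(a, b):
    """Return the DTW alignment cost between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])              # local distance between frames
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# Made-up example: a short rising learner contour vs. a longer rising model contour.
learner = np.array([180.0, 185.0, 190.0, 210.0])
model = np.array([175.0, 180.0, 188.0, 200.0, 215.0, 220.0])
print(dtw_distance(learner, model))
```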

Technology Demo

I had the opportunity to present Pitch Noodle at the TSLL/MWALT 2021 conference! Here is an edited video of the presentation showcasing a full breakdown of the application.