Harvard Uncertainty Speech Corpus

The Harvard Uncertainty Speech Corpus is a collection of speech recordings, elicitation materials, level-of-certainty annotations, and acoustic-prosodic data. The utterances were recorded in a laboratory, in a question-answering setting. In total, the corpus contains 1700 utterances from 42 speakers of American English, amounting to 148.79 minutes of speech.

Corpus properties

  • Utterances range in level of certainty {uncertain, neutral, certain}, a range elicited by controlling the difficulty of the questions
  • Repeated instances of specific words and phrases (e.g., sycophantic, Red Line, nine)
  • Level of certainty labels from a panel of listeners as well as from the speaker
  • Crowdsourced item difficulty scores (digit domain only)

How to get the corpus

While this page is being updated, the materials below are available upon request.
  • Elicitation materials
    • vocabulary
    • transportation
    • digits
  • Digital recordings
    • available upon request for research purposes
  • Level of certainty annotations
    • self-reports
    • listener annotations
    • difficulty scores
  • Acoustic-prosodic data

Related Publications

  1. Heather Pon-Barry, Stuart Shieber and Nicholas Longenbaugh. Eliciting and Annotating Uncertainty in Spoken Language. In Proceedings of the Language Resources and Evaluation Conference, May 2014.
  2. Heather Roberta Pon-Barry. Inferring Speaker Affect in Spoken Natural Language Communication. Ph.D. Dissertation, Harvard University, November 2012.
  3. Heather Pon-Barry and Stuart M. Shieber. Recognizing Uncertainty in Speech. EURASIP Journal on Advances in Signal Processing, 2011(251753), 2011. Special Issue on Emotion and Mental State Recognition from Speech.
  4. Heather Pon-Barry. Prosodic Manifestations of Confidence and Uncertainty in Spoken Language. In Proceedings of Interspeech, pp. 74-77, September 2008.

For questions, please contact Heather Pon-Barry.