TY - JOUR
T1 - What's in a face? Visual contributions to speech segmentation
AU - Mitchel, Aaron D.
AU - Weiss, Daniel J.
N1 - Funding Information:
Correspondence should be addressed to Aaron D. Mitchel, 608 Moore Building, University Park, PA 16802, USA (E-mail: [email protected]) or to Daniel J. Weiss (E-mail: [email protected]). We thank Beth Buerger, Molly Jamison, and Troy Gury for conducting the experiments. We also thank Marissa Weyer and Chip Gerfen for help in assembling the visual stimuli. We are grateful to Rich Carlson and Chip Gerfen for helpful comments, and to NIH R03 grant HD048996-01 for support of this research.
PY - 2010/5
Y1 - 2010/5
AB - Recent research has demonstrated that adults successfully segment two interleaved artificial speech streams with incongruent statistics (i.e., streams whose combined statistics are noisier than the encapsulated statistics) only when provided with an indexical cue of speaker voice. In a series of five experiments, our study explores whether learners can utilise visual information to encapsulate the statistics for each speech stream. We initially presented learners with incongruent artificial speech streams produced by the same female voice along with an accompanying visual display. Learners successfully segmented both streams when the audio streams were presented with an indexical cue of talking faces (Experiment 1). This learning cannot be attributed to the presence of the talking face display alone, as a single face paired with a single input stream did not improve segmentation (Experiment 2). Additionally, participants failed to segment the two streams when both were paired with a single synchronised talking face display (Experiment 3). Likewise, learners failed to segment both streams when the visual indexical cue lacked audio-visual synchrony, such as changes in background screen colour (Experiment 4) or a static face display (Experiment 5). We end by discussing the possible relevance of the speaker's face in speech segmentation and bilingual language acquisition.
UR - http://www.scopus.com/inward/record.url?scp=77951664180&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77951664180&partnerID=8YFLogxK
U2 - 10.1080/01690960903209888
DO - 10.1080/01690960903209888
M3 - Article
AN - SCOPUS:77951664180
SN - 0169-0965
VL - 25
SP - 456
EP - 482
JO - Language and Cognitive Processes
JF - Language and Cognitive Processes
IS - 4
ER -