
The Oxford Handbook of Cognitive Psychology

26 Speech Perception

Sven L. Mattys, Department of Psychology, University of York, York, UK

Published: 03 June 2013

Speech perception is conventionally defined as the perceptual and cognitive processes leading to the discrimination, identification, and interpretation of speech sounds. However, to gain a broader understanding of the concept, such processes must be investigated relative to their interaction with long-term knowledge—lexical information in particular. This chapter starts with a review of some of the fundamental characteristics of the speech signal and an evaluation of the constraints that these characteristics impose on modeling speech perception. Long-standing questions are then discussed in the context of classic and more recent theories. Recurrent themes include the following: (1) the involvement of articulatory knowledge in speech perception, (2) the existence of a speech-specific mode of auditory processing, (3) the multimodal nature of speech perception, (4) the relative contribution of bottom-up and top-down flows of information to sound categorization, (5) the impact of the auditory environment on speech perception in infancy, and (6) the flexibility of the speech system in the face of novel or atypical input.

The complexity, variability, and fine temporal properties of the acoustic signal of speech have puzzled psycholinguists and speech engineers for decades. How can a signal seemingly devoid of regularity be decoded and recognized almost instantly, without any formal training, and despite often being experienced in suboptimal conditions? Without any real effort, we identify over a dozen speech sounds (phonemes) per second, recognize the words they constitute, almost immediately understand the message generated by the sentences they form, and often elaborate appropriate verbal and nonverbal responses before the utterance ends.

Unlike theories of letter perception and written-word recognition, theories of speech perception and spoken-word recognition have devoted a great deal of attention to describing the signal itself, most of this work carried out within the field of phonetics. In particular, the fact that speech is conveyed in the auditory modality has dramatic implications for the perceptual and cognitive operations underpinning its recognition. Research in speech perception has focused on the constraining effects of three main properties of the auditory signal: sequentiality, variability, and continuity.

Nature of the Speech Signal

Sequentiality.

One of the most obvious disadvantages of the auditory system compared to its visual counterpart is that the distribution of the auditory information is time bound, transient, and solely under the speaker’s control. Moreover, the auditory signal conveys its acoustic content in a relatively serial fashion, one bit of information at a time. The extreme spreading of information over time in the speech domain has important consequences for the mechanisms involved in perceiving and interpreting the input.

Figure 26.1 Illustration of the sequential nature of speech processing. (A) Waveform of a complete sentence, that is, air pressure changes (Y axis) over time (X axis). (B–D) Illustration of a listener's progressive processing of the sentence at three successive points in time. The visible waveform represents the portion of signal that is available for processing at time t1 (B), t2 (C), and t3 (D).

In particular, given that relatively little information is conveyed per unit of time, the extraction of meaning can only be achieved over a stretch of signal that far exceeds what can be held in echoic memory (Huggins, 1975; Nooteboom, 1979). Likewise, given that there are no such things as "auditory saccades," in which listeners would be able to skip ahead of the signal or replay the words or sentences they just heard, speech perception and lexical-sentential integration must take place sequentially, in real time (Fig. 26.1).

For the most part, listeners are extremely good at keeping up with the rapid flow of speech sounds. Marslen-Wilson (1987) showed that many words in sentences are recognized well before their offset, sometimes as early as 200 ms after their onset, the average duration of one or two syllables. Other words, however, can only be disentangled from competitors later on, especially when they are short and phonetically reduced, for example, "you are" pronounced as "you're" (Bard, Shillcock, & Altmann, 1988). Yet, in general, there is a consensus that speech perception and lexical access closely shadow the unfolding of the signal (e.g., the Cohort Model; Marslen-Wilson, 1987), even though "right-to-left" effects can sometimes be observed as well (Dahan, 2010).

Given the inevitable sequentiality of speech perception and the limited amount of information that humans can hold in their auditory short-term memory, an obvious question is whether fast speech, which allows more information to be packed into the same amount of time, helps listeners handle the transient nature of speech and, specifically, whether it affects the mechanisms leading to speech recognition. A problem, however, is that fast speech tends to be less clearly articulated (hypoarticulated) and, hence, less intelligible. Thus, any processing gain due to denser information packing might be offset by diminished intelligibility. This confound can be avoided experimentally: speech rate can be accelerated with minimal loss of intrinsic intelligibility via computer-assisted signal compression (e.g., Foulke & Sticht, 1969; van Buuren, Festen, & Houtgast, 1999). Time-compression experiments have led to mixed results. Dupoux and Mehler (1990), for instance, found no effect of speech rate on how phonemes are perceived in monosyllabic versus disyllabic words. They started from the observation that the initial consonant of a monosyllabic word is detected faster if the word is high frequency than if it is low frequency, whereas frequency has no effect in multisyllabic words. This difference can be attributed to the use of a lexical route with short words and of a phonemic route with longer words. That is, short words are mapped directly onto lexical representations, whereas longer words first undergo a process of decomposition into phonemes. Critically, Dupoux and Mehler reported that the frequency effect did not appear when the duration of the disyllabic words was compressed to that of the monosyllabic words, suggesting that whether listeners use a lexical or phonemic route to identify phonemes depends on structural factors (number of phonemes or syllables) rather than time. Thus, on this account, the transient nature of speech has only a limited effect on the mechanisms underlying speech recognition.

In contrast, others have found significant effects of speech rate on lexical access. For example, both Pitt and Samuel (1995) and Radeau, Morais, Mousty, and Bertelson (2000) observed that the uniqueness point of a word, that is, the sequential point at which it can be uniquely specified (e.g., "spag" for "spaghetti"), could be dramatically altered when speech rate was manipulated. However, most changes were observed at slower rates, not at faster rates. Thus, changes in speech rate can affect recognition mechanisms, but these effects are observed mainly with time expansion, not with time compression. In sum, although the studies by Dupoux and Mehler (1990), Pitt and Samuel (1995), and Radeau et al. (2000) highlight different effects of time manipulation on speech processing, they all agree that packing more information per unit of time by accelerating speech rate does not compensate for the transient nature of the speech signal and for memory limitations. This is probably due to intrinsic perceptual and mnemonic limits on how fast information can be processed by the speech system—at any rate.
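The notion of a uniqueness point is easy to make concrete. The sketch below is illustrative only (the function name and toy lexicon are not from the chapter, and letters stand in for phonemes): it returns the length of the shortest word-initial fragment that no longer matches any lexical competitor.

```python
def uniqueness_point(word, lexicon):
    """Length of the shortest prefix of `word` shared with no other lexicon entry."""
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        # A competitor is any other word that is still consistent with the prefix.
        if not any(w != word and w.startswith(prefix) for w in lexicon):
            return i
    return None  # word never diverges (it is a prefix of another entry)

lexicon = {"spaghetti", "spatula", "spasm", "spa"}
# "spag" is where "spaghetti" diverges from all competitors:
print(uniqueness_point("spaghetti", lexicon))  # 4
```

Note that a short word embedded in longer ones (e.g., "spa" here) has no uniqueness point before its offset, which mirrors the observation above that short words can only be disentangled from competitors later on.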

In general, the sequential nature of speech processing is a feature that many models have struggled to implement, not only because it requires taking into account echoic and short-term memory mechanisms (Mattys, 1997) but also because the sequentiality problem is compounded by a lack of clear boundaries between phonemes and between words, as described later.

Continuity.

The inspection of a speech waveform does not reveal clear acoustic correlates of what the human ear perceives as phoneme and word boundaries. This lack of boundaries is due to coarticulation (the blending of articulatory gestures between adjacent phonemes) within and across words. Even though the degree of coarticulation between phonemes is somewhat less pronounced across than within words (Fougeron & Keating, 1997), the lack of clear and reliable gaps between words, along with the sequential nature of speech delivery, makes speech continuity one of the most challenging obstacles for both psycholinguistic theory and automatic speech recognition applications. Yet the absence of phoneme and word boundary markers hardly seems to pose a problem for everyday listening, as the subjective experience of speech is not one of continuity but, rather, of discreteness—that is, a string of sounds making up a string of words.

A great deal of the segmentation problem can be solved, at least in theory, based on lexical knowledge and contextual information. Key notions here are lexical competition and segmentation by lexical subtraction. In this view, lexical candidates are activated at multiple locations in the speech signal—that is, multiple alignment—and they compete for a segmentation solution that does not leave any fragments lexically unaccounted for (e.g., "great wall" is favored over "gray twall," because "twall" is not an English word). Importantly, this knowledge-driven approach does not assign a specific computational status to segmentation, other than being the mere consequence of mechanisms associated with lexical competition (e.g., McClelland & Elman, 1986; Norris, 1994).
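Segmentation by lexical subtraction can be illustrated with a toy exhaustive parser. This is a sketch only, not any model's actual implementation: the function name and lexicon are invented, and orthographic strings stand in for phoneme sequences.

```python
def segmentations(signal, lexicon):
    """Return every parse of `signal` into lexicon words that leaves
    no fragment lexically unaccounted for."""
    if not signal:
        return [[]]  # one valid parse: nothing left to explain
    parses = []
    for i in range(1, len(signal) + 1):
        candidate = signal[:i]
        if candidate in lexicon:  # a word aligned at this position
            for rest in segmentations(signal[i:], lexicon):
                parses.append([candidate] + rest)
    return parses

lexicon = {"gray", "great", "wall", "a"}
# Any alignment that strands a nonword fragment (e.g., "twall") is rejected,
# so the only surviving parse is "great" + "wall":
print(segmentations("greatwall", lexicon))  # [['great', 'wall']]
```

When several parses account for the whole input (e.g., "catalog" as "cat a log" or "catalog"), the parser returns all of them, which is where the competition mechanisms mentioned above would have to adjudicate.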

Another source of information for word segmentation draws upon broad prosodic and segmental regularities in the signal, which listeners use as heuristics for locating word boundaries. For example, languages whose words have a predominant rhythmic pattern (e.g., word-initial stress is predominant in English; word-final lengthening is predominant in French) provide a relatively straightforward—though probabilistic—segmentation strategy to their listeners (Cutler, 1994). The heuristic for English would go as follows: every time a strong syllable is encountered, posit a boundary before that syllable. For French, it would be: every time a lengthened syllable is encountered, posit a boundary after that syllable. Another documented heuristic is based on phonotactic probability, that is, the likelihood that specific phonemes follow each other in the words of a language (McQueen, 1998). Specifically, phonemes that are rarely found next to each other within words (e.g., very few English words contain the /fh/ diphone) are probabilistically interpreted as having occurred across a word boundary (e.g., "tough hero"). Finally, a wide array of acoustic-phonetic cues can also give away the position of a word boundary (Umeda & Coker, 1974). Indeed, phonemes tend to be realized differently depending on their position relative to a word or syllable boundary. For example, in English, word-initial vowels are frequently glottalized (a brief closure of the glottis, e.g., /e/ in "isle end," compared with no closure in "I lend"), and word-initial stop consonants are often aspirated (a burst of air accompanying the release of the consonant, e.g., /t/ in "gray tanker," compared with no aspiration in "great anchor").
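One way the phonotactic heuristic could be operationalized, purely as an illustrative sketch (invented function names, a toy lexicon, and letters standing in for phonemes), is to count word-internal diphones in a lexicon and treat rare diphones in the input as probable boundary sites:

```python
from collections import Counter

def diphone_counts(lexicon):
    """Count word-internal adjacent-symbol pairs (diphones) across a lexicon."""
    counts = Counter()
    for word in lexicon:
        counts.update(a + b for a, b in zip(word, word[1:]))
    return counts

def likely_boundaries(utterance, counts, threshold=1):
    """Positions where the local diphone is rare word-internally,
    and hence likely to straddle a word boundary."""
    return [i + 1
            for i, (a, b) in enumerate(zip(utterance, utterance[1:]))
            if counts[a + b] < threshold]

counts = diphone_counts(["tough", "hero", "tool", "zero"])
# "hh" never occurs inside a word in this toy lexicon, so a boundary
# is posited between "tough" and "hero":
print(likely_boundaries("toughhero", counts))  # [5]
```

As the chapter stresses, such a cue is probabilistic: it flags likely boundary sites rather than guaranteeing them, and it would normally operate alongside lexical and prosodic cues.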

It is important to note that, in everyday speech, lexically and sublexically driven segmentation cues usually coincide and reinforce each other. However, in suboptimal listening conditions (e.g., noise) or in rare cases where a conflict arises between those two sources of information, listeners have been shown to downplay sublexical discrepancies and pay more heed to lexical plausibility (Mattys, White, & Melhorn, 2005; Fig. 26.2).

Variability

Perhaps the most defining challenge for the field of speech perception is the enormous variability of

Sketch of Mattys, White, and Melhorn’s ( 2005 ) hierarchical approach to speech segmentation. The relative weights of speech segmentation cues are illustrated by the width of the gray triangle. In optimal listening conditions, the cues in Tier I dominate. When lexical access is compromised or ambiguous, the cues in Tier II take over. Cues from Tier III are recruited when both lexical and segmental cues are compromised (e.g., background of severe noise). (Reprinted from Mattys, S. L., White, L., & Melhorn, J. F [2005]. Integration of multiple speech segmentation cues: A hierarchical framework. Journal of Experimental Psychology: General , 134 , 477–500 [Figure 7], by permission of the American Psychological Association.)

the signal relative to the stored representations onto which it must be matched. Variability can be found at the word level, where there are infinite ways a given word can be pronounced depending on accents, voice quality, speech rate, and so on, leading to a multitude of surface realizations for a unique target representation. But this many-to-one mapping problem is not different from the one encountered with written words in different handwritings or object recognition in general. In those cases, signal normalization can be effectively achieved by defining a set of core features unique to each word or object stored in memory and by reducing the mapping process to those features only.

The real issue with speech variability happens at a lower level, namely, phoneme categorization. Unlike letters whose realizations have at least some commonality from one instance to another, phonemes can vary widely in their acoustic manifestation—even within the same speaker. For example, as shown in Figure 26.3A , the realization of the phoneme /d/ has no immediately apparent acoustic commonality in /di/ and /du/ (Delattre, Liberman, & Cooper, 1955 ). This lack of acoustic invariance is the consequence of coarticulation: The articulation of /d/ in /di/ is partly determined by the articulatory preparation for /i/, and likewise for /d/ in /du/. The power of coarticulation is easily demonstrated by observing a speaker’s mouth prior to saying /di/ compared to /du/. The mode of articulation of /i/ (unrounded) versus /u/ (rounded) is visible on the speaker’s lips even before /d/ has been uttered. The resulting acoustics of /d/ preceding each vowel have therefore little in common.

The success of the search for acoustic cues, or invariants, capable of uniquely identifying phonemes or phonetic categories has been highly feature specific. For example, as illustrated in Figure 26.3A, the place of articulation of phonemes (i.e., the place in the vocal tract where the airstream is most constricted, which distinguishes, e.g., /b/, /d/, /g/) has been difficult to map onto specific acoustic cues. However, the difference between voiced and voiceless stop consonants (/b/, /d/, /g/ vs. /p/, /t/, /k/) can be traced back fairly reliably to the duration between the release of the consonant and the moment when the vocal folds start vibrating, that is, the voice onset time (VOT; Liberman, Delattre, & Cooper, 1958). In English, the VOT of voiced stop consonants is typically around 0 ms (or at least shorter than 20 ms), whereas it is generally over 25 ms for voiceless consonants. Although this contrast has been shown to be somewhat influenced by consonant type and vocalic context (e.g., Lisker & Abramson, 1970), VOT is a fairly robust cue for the voiced-voiceless distinction.
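As a decision rule, the VOT cue amounts to a single threshold. The sketch below uses an illustrative 25 ms boundary; as the text notes, the real boundary shifts with consonant type, vocalic context, and speech rate, so this is a caricature rather than a classifier anyone has proposed in this exact form.

```python
def classify_voicing(vot_ms, boundary_ms=25.0):
    """Toy VOT decision rule for English stop consonants:
    short VOT -> voiced (/b d g/); long VOT -> voiceless (/p t k/).
    The 25 ms boundary is illustrative only."""
    return "voiced" if vot_ms < boundary_ms else "voiceless"

print(classify_voicing(5))    # 'voiced'   (VOT near 0 ms)
print(classify_voicing(60))   # 'voiceless'
```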

(A) Stylized spectrograms of /di/ and /du/. The dark bars, or formants, represent areas of peak energy on the frequency scale (Y axis), which correlate with zones of high resonance in the vocal tract. The curved leads into the formants are formant transitions. They show coarticulation between the consonant and the following vowel. Note the dissimilarity between the second formant transitions in /di/ (rising) and /du/ (falling). However, as shown in (B), the extrapolation back in time of the two second-formant transitions points to a common frequency locus.

Vowels are subject to coarticulatory influences, too, but the spectral structure of their middle portion is usually relatively stable, and hence, a taxonomy of vowels based on their unique distribution of energy bands along the frequency spectrum, or formants, can be attempted. However, such distribution is influenced by speaking rate, with fast speech typically leading to the target frequency of the formants being missed or leading to an asymmetric shortening of stressed versus unstressed vowels (Lindblom, 1963 ; Port, 1977 ). In general, speech rate variation is particularly problematic for acoustic cues involving time. Even stable cues such as VOT can lose their discriminability power when speech rate is altered. For example, at fast speech rates, the VOT difference between voiced and voiceless stop consonants decreases, making the two types of phonemes more difficult to distinguish (Summerfield, 1981 ). The same problem has been noted for the difference between /b/ and /w/, with /b/ having rapid formant transitions into the vowel and /w/ less rapid ones. This difference is less pronounced at fast speech rates (Miller & Liberman, 1979 ).

Yet, except for those conditions in which subtle differences are manipulated in the laboratory, listeners are surprisingly good at compensating for the acoustic distortions introduced by coarticulation and changes in speech rate. Thus, input variability, phonetic-context effects, and the lack of invariance do not appear to pose a serious problem for everyday speech perception. As reviewed later, however, theoretical accounts aiming to reconcile the complexity of the signal with the effortlessness of perception vary greatly.

Basic Phenomena and Questions in Speech Perception

Following are some of the observations that have shaped theoretical thinking in speech perception over the past 60 years. Most of them concern, in one way or another, the extent to which speech perception is carried out by a part of the auditory system dedicated to speech and involving speech-specific mechanisms not recruited for nonspeech sounds.

Categorical Perception

Categorical perception is a sensory phenomenon whereby a physically continuous dimension is perceived as discrete categories, with abrupt perceptual boundaries between categories and poor discrimination within categories (e.g., perception of the visible electromagnetic radiation spectrum as discrete colors). Early on, categorical perception was found to apply to phonemes—or at least some of them. For example, Liberman, Harris, Hoffman, and Griffith (1957) showed that synthesized syllables ranging from /ba/ to /da/ to /ga/, created by gradually adjusting the transitions between the consonant and the vowel's formants (i.e., the formant transitions), were perceived as falling into coarse /b/, /d/, and /g/ categories, with poor discrimination between syllables belonging to the same perceptual category and high discrimination between syllables straddling a perceptual boundary (Fig. 26.4). Importantly, categorical perception was not observed for matched auditory stimuli devoid of phonemic significance (Liberman, Harris, Eimas, Lisker, & Bastian, 1961). Moreover, since categorical perception meant that easy-to-identify syllables (continuum endpoints) were also easy to pronounce, whereas less-easy-to-identify syllables (continuum midpoints) were generally less easy to pronounce, categorical perception was seen as a highly adaptive property of the speech system and, hence, evidence for a dedicated speech mode of the auditory system. This claim was later weakened by reports of categorical perception for nonspeech sounds (e.g., Miller, Wier, Pastore, Kelly, & Dooling, 1976) and for speech sounds by nonhuman species (e.g., Kluender, Diehl, & Killeen, 1987; Kuhl, 1981).

Idealized identification pattern (solid line, left Y axis) and discrimination pattern (dashed line, right Y axis) for categorical perception. Illustration with a /ba/ to /da/ continuum. Identification shows a sharp perceptual boundary between categories. Discrimination is finer around the boundary than inside the categories.
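Idealized curves of this kind can be generated from a single labeling function: given a sharp (logistic) identification boundary, one common formulation predicts discrimination of adjacent continuum steps from labeling probabilities alone (0.5 + 0.5 times the squared labeling difference), so that discrimination peaks at the boundary and falls to chance inside the categories. The boundary and slope values below are made up for illustration.

```python
import math

def p_da(step, boundary=3.5, slope=2.5):
    """Idealized identification: probability of labeling a continuum
    step as /da/ (logistic; boundary and slope are invented)."""
    return 1.0 / (1.0 + math.exp(-slope * (step - boundary)))

def predict_abx(p1, p2):
    """Labeling-based prediction: two steps are discriminable only
    insofar as they are labeled differently; chance is 0.5."""
    return 0.5 + 0.5 * (p1 - p2) ** 2

ident = [p_da(step) for step in range(8)]
disc = [predict_abx(ident[i], ident[i + 1]) for i in range(7)]
# discrimination is near chance (0.5) inside categories and peaks
# for the pair straddling the boundary (steps 3-4)
```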

Effects of Phonetic Context

The effect of adjacent phonemes on the acoustic realization of a target phoneme (e.g., /d/ in /di/ vs. /du/) was mentioned earlier as a core element of the variability challenge. This challenge, that is, achieving perceptual constancy despite input variability, is perhaps most directly illustrated by the converse phenomenon, namely, the varying perception of a constant acoustic input as a function of its changing phonetic environment. Mann (1980) showed that the perception of a /da/-/ga/ continuum was shifted in the direction of reporting more /ga/ when it was preceded by /al/ and more /da/ when it was preceded by /ar/. Since these shifts are in the opposite direction of coarticulation between adjacent phonemes, listeners appear to compensate for the expected consequences of coarticulation. Whether compensation for coarticulation is evidence for a highly sophisticated mechanism whereby listeners use their implicit knowledge of how phonemes are produced—that is, coarticulated—to guide perception (e.g., Fowler, 2006) or simply a consequence of long-term association between the signal and the percept (e.g., Diehl, Lotto, & Holt, 2004; Lotto & Holt, 2006) has been a question of fundamental importance for theories of speech perception, as discussed later.

Integration of Acoustic and Optic Cues

The chief outcome of speech production is the emission of an acoustic signal. However, visual correlates, such as facial and lip movements, are often available to the listener as well. The effect of visual information on speech perception has been extensively studied, especially in the context of the benefit provided by visual cues for listeners with hearing impairments (e.g., Lachs, Pisoni, & Kirk, 2001) and for speech perception in noise (e.g., Sumby & Pollack, 1954). Visual-based enhancement is also observed for undegraded speech with semantically complicated content or for foreign-accented speech (Reisberg, McLean, & Goldfield, 1987). In the laboratory, audiovisual integration is strikingly illustrated by the well-known McGurk effect. McGurk and MacDonald (1976) showed that listeners presented with an acoustic /ba/ dubbed over a face saying /ga/ tended to report hearing /da/, a syllable whose place of articulation is intermediate between /ba/ and /ga/. The robustness and automaticity of the effect suggest that the acoustic and (visual) articulatory cues of speech are integrated at an early stage of processing. Whether early integration indicates that the primitives of speech perception are articulatory in nature or whether it simply highlights a learned association between acoustic and optic information has been a theoretically divisive debate (see Rosenblum, 2005, for a review).

Lexical and Sentential Effects on Speech Perception

Although traditional approaches to speech perception often stop where word recognition begins (in the same way that approaches to word recognition often stop where sentence comprehension begins), speech perception has been profoundly influenced by the debate on how higher order knowledge affects the identification and categorization of phonemes and phonetic features. A key observation is that lexical knowledge and sentential context can aid phoneme identification, especially when the signal is ambiguous or degraded. For example, Warren and Obusek ( 1971 ) showed that a word can be heard as intact even when a component phoneme is missing and replaced with noise, for example, “legi*lature,” where the asterisk denotes the replaced phoneme. In this case, lexical knowledge dictates what the listener should have heard rather than what was actually there, a phenomenon referred to as phoneme restoration. Likewise, Warren and Warren ( 1970 ) showed that a word whose initial phoneme is degraded, for example, “*eel,” tends to be heard as “wheel” in “It was found that the *eel was on the axle” and as “peel” in “It was found that the *eel was on the orange.” Thus, phoneme identification can be strongly influenced by lexical and sentential knowledge even when the disambiguating context appears later than the degraded phoneme.

But is this truly of interest for speech perception? In other words, could phoneme restoration (and other similar speech illusions) simply result from postperceptual, strategic biases? In this case, “*eel” would be interpreted as “wheel” simply because it makes pragmatic sense to do so in a particular sentential context, not because our perceptual system is genuinely tricked by high-level expectations. If so, contextual effects are of interest to speech-perception scientists only insofar as they suggest that speech perception happens in a system that is impenetrable by higher order knowledge—an unfortunately convenient way of indirectly perpetuating the confinement of speech perception to the study of phoneme identification. The evidence for a postperceptual explanation is mixed. While Norris, McQueen, and Cutler (2000), Massaro (1989), and Oden and Massaro (1978), among others, found no evidence for online top-down feedback to the perceptual system and no logical reason why such feedback should exist, Samuel (1981, 1997, 2001), Connine and Clifton (1987), and Magnuson, McMurray, Tanenhaus, and Aslin (2003), among others, have reported lexical effects on perception that challenge feedforward models—for example, evidence that lexical information truly alters low-level perceptual discrimination (Samuel, 1981). This debate has fostered extreme empirical ingenuity over the past decades but comparatively little change to theory. One exception, however, is that the debate has now spread to the long-term effects of higher order knowledge on speech perception. For example, while Norris, McQueen, and Cutler (2000) argue against online top-down feedback, the same group (2003) recognizes that perceptual (re-)tuning can happen over time, in the context of repeated exposure and learning.
Placing the feedforward/feedback debate in the time domain provides an opportunity to examine the speech system at the interface with cognition, and memory functions in particular. It also allows more applied considerations to be introduced, such as the role of perceptual recalibration for second-language learning and speech perception in difficult listening conditions (Samuel & Kraljic, 2009 ), as discussed later.

Theories of Speech Perception (Narrowly and Broadly Construed)

Motor and articulatory-gesture theories.

The Motor Theory of speech perception, reported in a series of articles in the early 1950s by Liberman, Delattre, Cooper, and other researchers from the Haskins Laboratories, was the first to offer a conceptual solution to the lack-of-invariance problem. As mentioned earlier, the main stumbling block for speech-perception theories was the observation that many phonemes cannot uniquely be identified by a set of stable and reliable acoustic cues. For example, the formant transitions of /d/, especially the second formant, differ as a function of the following vowel. However, Delattre et al. ( 1955 ) found commonality between different /d/s by extrapolating the formant transitions back in time to their convergence point, or locus (or hub ; Potter, Kopp, & Green, 1947 ), as shown in Figure 26.3B . Thus, what is common to the formants of all /d/s is the frequency at their origin, that is, the frequency that would best reflect the position of the articulators prior to the release of the consonant. This led to one of the key arguments in support of the motor theory, namely that a one-to-one relationship between acoustics and phonemes can be established if the speech system includes a mechanism that allows the listener to work backward through the rules of production in order to identify the speaker’s intended phonemes. In other words, the lack-of-invariance problem can be solved if it can be demonstrated that listeners perceive speech by identifying the speaker’s intended speech gestures rather than (or in addition to) relying solely on the acoustic manifestation of such gestures. The McGurk effect, whereby auditory perception is dramatically altered by seeing the speaker’s moving lips (articulatory gestures), was an important contributor to the view that the perceptual primitives of speech are gestural in nature.
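Delattre et al.'s locus idea amounts to linear extrapolation of the second-formant transition back in time. The sketch below uses invented time/frequency values chosen so that a rising /di/-like transition and a falling /du/-like transition converge on the same origin frequency; the numbers are not the actual measured /d/ locus.

```python
def extrapolate_f2(t0, f0, t1, f1, t_origin=0.0):
    """Linearly extrapolate a formant transition, given two
    (time in ms, frequency in Hz) points, back to an origin time."""
    slope = (f1 - f0) / (t1 - t0)
    return f0 + slope * (t_origin - t0)

# hypothetical F2 transitions: rising for /di/, falling for /du/
locus_di = extrapolate_f2(10, 1925, 50, 2425)   # rising toward a high F2
locus_du = extrapolate_f2(10, 1700, 50, 1300)   # falling toward a low F2
print(locus_di, locus_du)   # both point back to 1800.0 Hz
```

The two visibly different transitions thus share an invariant only at their (unheard) origin, which is the sense in which the invariant was taken to be articulatory rather than acoustic.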

In addition to claiming that the motor system is recruited for perceiving speech (and partly because of this claim), the Motor Theory also posits that speech perception takes place in a highly specialized and speech-specific module that is neurally isolated and is most likely a unique and innate human endowment (Liberman, 1996 ; Liberman & Mattingly, 1985 ). However, even among supporters of a motor basis for speech perception, agreeing upon an operational definition of intended speech gestures and providing empirical evidence for the contribution of such intended gestures to perception proved difficult. This led Fowler and her colleagues to propose that the objects of speech perception are not intended articulatory gestures but real gestures, that is, actual vocal tract movements that are inferable from the acoustic signal itself (e.g., Fowler, 1986 , 1996 ). Thus, although Fowler’s Direct Realism approach aligns with the Motor Theory in that it claims that perceiving speech is perceiving gestures, it asserts that the acoustic signal itself is rich enough in articulatory information to provide a stable (i.e., invariant) signal-to-phoneme mapping algorithm. In doing so, Direct Realism can do away with claims about specialized and/or innate structures for speech perception.

Although the popularity of the original tenets of the Motor Theory—and, to some extent, associated gesture theories—has waned over the years, the theory has brought forward essential questions about the specificity of speech, the specialization of speech perception, and, more recently, the neuroanatomical substrate of a possible motor component of the speech apparatus (e.g., Gow & Segawa, 2009 ; Pulvermüller et al., 2006 ; Sussman, 1989 ; Whalen et al., 2006 ), a topic that regained interest following the discovery of mirror neurons in the premotor cortex (e.g., Rizzolatti & Craighero, 2004 ; but see Lotto, Hickok, & Holt, 2009 ). The debate has also shifted to a discussion of the extent to which the involvement of articulation during speech perception might in fact be under the listener’s control and its manifestation partly task specific (Yuen, Davis, Brysbaert, & Rastle, 2010 , Fig. 26.5 ; see comments by McGettigan, Agnew, & Scott, 2010 ; Rastle, Davis, & Brysbaert, 2010 ). The Motor Theory has also been extensively reviewed—and revisited—in an attempt to address problems highlighted by auditory-based models, as described later (e.g., Fowler, 2006 , 2008 ; Galantucci, Fowler, & Turvey, 2006 ; Lotto & Holt, 2006 ; Massaro & Chen, 2008 ).

Electropalatographic data showing the proportion of tongue contact on alveolar electrodes during the initial and final portions of /k/-initial (e.g., kib) or /s/-initial (e.g., sib) syllables (collapsed) while a congruent or incongruent distractor is presented (Yuen et al., 2010). The distractor was presented auditorily in conditions A and B and visually in condition C. With the target kib as an example, the congruent distractor in the A condition was kib and the incongruent distractor started with a phoneme involving a different place of articulation (e.g., tib). In condition B, the incongruent distractor started with a phoneme that differed from the target only by its voicing status, not by its place of articulation (e.g., gib). Condition C was the same as condition A, except that the distractor was presented visually. The results show “traces” of the incongruent distractors in target production when the distractor is in articulatory competition with the target, particularly in the early portion of the phoneme (condition A), but not when it involves the same place of articulation (condition B), or when it is presented visually (condition C). The results suggest a close relationship between speech perception and speech production. (Reprinted from Yuen, I., Davis, M. H., Brysbaert, M., & Rastle, K. [2010]. Activation of articulatory information in speech perception. Proceedings of the National Academy of Sciences USA, 107, 592–597 [Figure 2], by permission of the National Academy of Sciences.)

Auditory Theory(ies)

The role of articulatory gestures in perceiving speech and the special status of the speech-perception system progressively came under attack largely because of insufficient hard evidence and lack of computational parsimony. Recall that recourse to articulatory gestures was originally posited as a way to solve the lack-of-invariance problem and turn a many(acoustic traces)-to-one(phoneme) mapping problem into a one(gesture)-to-one(phoneme) mapping solution. However, the lack of invariance problem turned out to be less prevalent and, at the same time, more complicated than originally claimed. Indeed, as mentioned earlier, many phonemes were found to preserve distinctive features across contexts (e.g., Blumstein & Stevens, 1981 ; Stevens & Blumstein, 1981 ). At the same time, lack of invariance was found in domains for which a gestural explanation was only of limited use, for example, voice quality, loudness, and speech rate.

Perhaps most problematic for gesture-based accounts was the finding by Kluender, Diehl, and Killeen (1987) that phonemic categorization, which was viewed by such accounts as necessitating access to gestural primitives, could be observed in species lacking the anatomical prerequisites for articulatory knowledge and practice (Japanese quail; Fig. 26.6). This result was seen by many as undermining both the motor component of speech perception and its human-specific nature. Parsimony became the new driving force. As Kluender et al. put it, “A theory of human phonetic categorization may need to be no more (and no less) complex than that required to explain the behavior of these quail” (p. 1197). The gestural explanation for compensation for coarticulation effects (Mann, 1980) was challenged by a general auditory mechanism as well. In Mann's experiment, the perceptual shift on the /da/-/ga/ continuum induced by the preceding /al/ versus /ar/ context was explained by reference to articulatory gestures. However, Lotto and Kluender (1998) found a similar shift when the preceding context consisted of nonspeech sounds mimicking the spectral characteristics of the actual syllables (e.g., tone glides). Thus, the acoustic composition of the context, and in particular its spectral contrast with the following syllable, rather than an underlying reference to abstract articulatory gestures, was able to account for Mann's context effect (but see Fowler, Brown, & Mann, 2000, for a subsequent multimodal challenge to the auditory account).

However, auditory theories have been criticized for lacking in theoretical content. Auditory accounts are indeed largely based on counterarguments (and counterevidence) to the motor and gestural theories, rather than resting on a clear set of falsifiable principles (Diehl et al., 2004 ). While it is clear that a great deal of phenomena previously believed to require a gestural account can be explained within an arguably simpler auditory framework, it remains to be seen whether auditory theories can provide a satisfactory explanation for the entire class of phenomena in which the many-to-one puzzle has been observed (e.g., Pardo & Remez, 2006 ).

Pecking rates at test for positive stimuli (/dVs/) and negative stimuli (all others) for one of the quail in Kluender et al.’s ( 1987 ) study in eight vowel contexts. The test session was preceded by a learning phase in which the quail learned to discriminate /dVs/ syllables (i.e., syllables starting with /d/ and ending with /s/, with a varying intervocalic vowel) from /bVs/ and /gVs/ syllables, with four different intervocalic vowels not used in the test phase. During learning, the quail was rewarded for pecking in response to /d/-initial syllables (positive trials) but not to /b/- and /g/-initial syllables (negative trials). The figure shows that, at test, the quail pecked substantially more to positive than negative syllables, even though these syllables contained entirely new vowels, that is, vowels leading to different formant transitions with the initial consonant than those experienced during the learning phase. (Reprinted from Kluender, K. R., Diehl, R. L., & Killeen, P. R. [1987]. Japanese Quail can form phonetic categories. Science , 237 , 1195–1197 [Figure 1], by permission of the National Academy of Sciences.)

Top-Down Theories

This rubric and the following one (bottom-up theories) review theories of speech perception broadly construed. They are broadly construed in that they consider phonemic categorization, the scope of the narrowly construed theories, in the context of its interface with lexical knowledge. Although the traditional separation between narrowly and broadly construed theories originates from the respective historical goals of speech perception and spoken-word recognition research (Pisoni & Luce, 1987), an understanding of speech perception cannot be complete without an analysis of the impact of long-term knowledge on early sensory processes (see useful reviews in Goldinger, Pisoni, & Luce, 1996; Jusczyk & Luce, 2002).

The hallmark of top-down approaches to speech perception is that phonetic analysis and categorization can be influenced by knowledge stored in long-term memory, lexical knowledge in particular. As mentioned earlier, phoneme restoration studies (e.g., Warren & Obusek, 1971 ; Warren & Warren, 1970 ) showed that word knowledge could affect listeners’ interpretation of what they heard, but they did not provide direct evidence that phonetic categorization per se (i.e., perception , as it was referred to in that literature) was modified by lexical expectations. However, Samuel ( 1981 ) demonstrated that auditory acuity was indeed altered when lexical information was available (e.g., “pr*gress” [from “progress”], with * indicating the portion on which auditory acuity was measured) compared to when it was not (e.g., “cr*gress” [from the nonword “crogress”]).

This kind of result (see also, e.g., Ganong, 1980 ; Marslen-Wilson & Tyler, 1980 ; and, more recently, Gow, Segawa, Ahlfors, & Lin, 2008 ) led to conceptualizing the speech system as being deeply interactive, with information flowing not only from bottom to top but also from top down. For example, the TRACE model (more specifically, TRACE II; McClelland & Elman, 1986 ) is an interactive-activation model made of a large number of units organized into three levels: features, phonemes, and words (Fig. 26.7 A). The model includes bottom-up excitatory connections (from features to phonemes and from phonemes to words), inhibitory lateral connections (within each level), and, critically, top-down excitatory connections (from words to phonemes and from phonemes to features). Thus, the activation levels of features, for example, voicing, nasality, and burst, are partly determined by the activation levels of phonemes, and these are partly determined by the activation levels of words. In essence, this architecture places speech perception within a system that allows a given sensory input to yield a different perceptual experience (as opposed to interpretive experience) when it occurs in a word versus a nonword or next to phoneme x versus phoneme y, and so on. TRACE has been shown to simulate a large range of perceptual and psycholinguistic phenomena, for example, categorical perception, cue trading relations, phonetic context effects, compensation for coarticulation, lexical effects on phoneme detection/categorization, segmentation of embedded words, and so on. All this takes place within an architecture that is neither domain nor species specific. Later instantiations of TRACE have been proposed by McClelland ( 1991 ) and Movellan and McClelland ( 2001 ), but all of them preserve the core interactive architecture described in the original model.
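The consequence of TRACE's top-down connections can be caricatured in a few lines: a phoneme unit with weak bottom-up support ends up more active when it participates in an activated word. This is a loose sketch with invented weights and update rules, not the published TRACE equations.

```python
def settle(bottom_up, lexical_match, cycles=10, up=0.5, down=0.3):
    """One phoneme unit and one word unit, updated cyclically.
    Bottom-up excitation drives the word unit; top-down excitation
    from the word feeds back to the phoneme (invented parameters)."""
    phoneme = word = 0.0
    for _ in range(cycles):
        word = min(1.0, word + up * phoneme * lexical_match)
        phoneme = min(1.0, phoneme + bottom_up + down * word)
    return phoneme

weak_input = 0.05                                    # degraded phoneme
in_word = settle(weak_input, lexical_match=1.0)      # e.g., "pr*gress"
in_nonword = settle(weak_input, lexical_match=0.0)   # e.g., "cr*gress"
print(in_word > in_nonword)   # True: lexical feedback boosts the phoneme
```

The same sensory input (`weak_input`) thus yields different final activations depending on lexical context, which is the architectural point at issue in the top-down/bottom-up debate.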

Like TRACE, Grossberg’s Adaptive Resonance Theory (ART; e.g., Grossberg, 1986 ; Grossberg & Myers, 1999 ) suggests that perception emerges from a compromise, or stable state, between sensory information and stored lexical knowledge (Fig. 26.7B ). ART includes items (akin to subphonemic features or feature clusters) and list chunks (combinations of items whose composition is the result of prior learning; e.g., phonemes, syllables, or words). In ART, a sensory input activates items that, in turn, activate list chunks. List chunks feed back to component items, and items back to list chunks again in a bottom-up/top-down cyclic manner that extends over time, ultimately creating stable resonance between a set of items and a list chunk. Both TRACE and ART posit that connections between levels are only excitatory and connections within levels are only inhibitory. In ART, in typical circumstances, attention is directed to large chunks (e.g., words), and hence the content of smaller chunks is generally less readily available. Small mismatches between large chunks and small chunks do not prevent resonance, but large mismatches do. In other words, unlike TRACE, ART does not allow the speech system to “hallucinate” information that is not already there (however, for circumstances in which it could, see Grossberg, 2000a ). Large mismatches lead to the establishment of new chunks, and these gain resonance via subsequent exposure. In doing so, ART provides a solution to the stability-plasticity dilemma, that is, the unwanted erasure of prior learning by more recent learning (Grossberg, 1987 ), also referred to as catastrophic interference (e.g., McCloskey & Cohen, 1989 ).

Thus, like TRACE, ART posits that speech perception results from an online interaction between prelexical and lexical processes. However, ART is more deeply grounded in, and motivated by, biologically plausible neural dynamics, where reciprocal connectivity and resonance states have been observed (e.g., Felleman & Van Essen, 1991). Likewise, ART replaces the hierarchical structure of TRACE with a more flexible one, in which tiers self-organize over time through competitive dynamics—as opposed to being predefined. Although sometimes accused of placing too few constraints on empirical expectations (Norris et al., 2000), the functional architecture of ART is thought to be more computationally economical than that of TRACE and more amenable to modeling both real-time and long-term temporal aspects of speech processing (Grossberg, Boardman, & Cohen, 1997).

Bottom-Up Theories

Bottom-up theories describe effects of lexical and sentential knowledge on phoneme categorization as a consequence of postperceptual biases. In this conceptualization, reporting “progress” when presented with “pr*gress” simply reflects a strategic decision to do so and the functionality of a system that is geared toward meaningful communication—we generally hear words rather than nonwords. Here, phonetic analysis itself is incorruptible by lexical or sentential knowledge. It takes place within an autonomous module that receives no feedback from lexical and postlexical layers. In Cutler and Norris’s ( 1979 ) Race model, phoneme identification is the result of a time race between a sublexical route and a lexical route activated in parallel in an entirely bottom-up fashion (Fig. 26.7C ). In normal circumstances, the lexical route is faster, which means that a sensory input that has a match in the lexicon (e.g., “progress”) is usually read out from that route. A nonlexical sensory input (e.g., “crogress”) is read out from the prelexical route. In this model, “pr*gress” is reported as containing the phoneme /o/ because the lexical route receives enough evidence to activate the word “progress” and, being faster, this route determines the response. In contrast, “cr*gress” does not lead to an acceptable match in the lexicon, and hence, readout is performed from the sublexical route, with the degraded phoneme being faithfully reported as degraded.
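The Race model's readout logic is easy to caricature: both routes run on the same input, and the lexical route's output wins whenever it finds a match. In the sketch below, "*" stands for the degraded phoneme; the wildcard-matching scheme is invented for illustration and is not how the model is formally specified.

```python
import re

def race_readout(input_form, lexicon):
    """Toy Race-model readout: '*' marks a degraded phoneme that the
    lexical route can match against any segment. If a word matches,
    the (faster) lexical route wins; otherwise the prelexical route
    reports the input faithfully, degradation included."""
    pattern = re.compile("^" + re.escape(input_form).replace(r"\*", ".") + "$")
    for word in lexicon:
        if pattern.match(word):
            return word              # lexical readout: phoneme "restored"
    return input_form                # prelexical readout

lexicon = {"progress"}
print(race_readout("pr*gress", lexicon))   # 'progress'
print(race_readout("cr*gress", lexicon))   # 'cr*gress'
```

Crucially, nothing in this scheme alters the prelexical representation itself: restoration is a matter of which route's output is reported, which is why the model predicts unchanged auditory acuity at the degraded position.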

Simplified architecture of ( A ) TRACE, ( B ) ART, ( C ) Race, ( D ) FLMP, and ( E ) Merge. Layers are labeled consistently across models for comparability. Excitatory connections are denoted by arrows. Inhibitory connections are denoted by closed black circles.

Massaro’s Fuzzy Logical Model of Perception (FLMP; Massaro, 1987 , 1996 ; Oden & Massaro, 1978 ) also exhibits a bottom-up architecture, in which various sources of sensory input—for example, auditory, visual—contribute to speech perception without any feedback from the lexical level (Fig. 26.7D ). In FLMP, acoustic-phonetic features are activated multimodally and each feature accumulates a certain level of activation (on a continuous 0-to-1 scale), reflecting the degree of certainty that the feature has appeared in the signal. The profile of features’ activation levels is then compared against a prototypical profile of activation for phonemes stored in memory. Phoneme identification occurs when the match between the actual and prototypical profiles reaches a level determined by goodness-of-fit algorithms. Critically, the match does not need to be perfect to lead to identification; thus, there is no need for lexical top-down feedback. Prelexical and lexical sources of information are then integrated into a conscious percept. Although the extent to which the integration stage can be considered a true instantiation of bottom-up processes is a matter for debate (Massaro, 1996 ), FLMP also predicts that auditory acuity of * is fundamentally comparable in “pr*gress” and “cr*gress”—like the Race model and unlike top-down theories.
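FLMP's evaluation-and-integration scheme can be made concrete with its standard multiplicative combination followed by the relative goodness rule. The feature values below are invented for illustration, not Massaro's fitted parameters.

```python
from math import prod

def flmp(supports: dict[str, list[float]]) -> dict[str, float]:
    """supports maps each candidate (e.g., a phoneme) to its fuzzy
    truth values (0-1), one per information source (auditory, visual).
    Goodness of fit is the product of the truth values; identification
    probability is each candidate's goodness relative to the total."""
    goodness = {c: prod(vals) for c, vals in supports.items()}
    total = sum(goodness.values())
    return {c: g / total for c, g in goodness.items()}

# Ambiguous audio slightly favoring /ba/, clear visual favoring /ba/:
p = flmp({"ba": [0.6, 0.9], "da": [0.4, 0.1]})
```

Note that no lexical feedback is involved: each source contributes its own continuous support, and a decision emerges even when no candidate matches its prototype perfectly.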

From an architectural point of view, integration between sublexical and lexical information is handled differently by Norris et al.’s (2000) Merge model. In Merge, the phoneme layer is duplicated into an input layer and a decision layer (Fig. 26.7E). The phoneme input layer feeds forward to the lexical layer (with no top-down connections), and the phoneme decision layer receives input from both the phoneme input layer and the lexical layer. The phoneme decision layer is where phonemic and lexical inputs are integrated and where standard lexical phenomena arise (e.g., Ganong, 1980; Samuel, 1981). While both FLMP and Merge produce a decision by integrating unaltered lexical and sublexical information, the input received from the lexical level differs in the two models. In FLMP, lexical activation is relatively independent of the degree of activation of its component phonemes, whereas, in Merge, lexical activation is directly influenced by the pattern of activation sent upward by the phoneme input layer. While Merge has been criticized for exhibiting a contorted architecture (Gow, 2000; Samuel, 2000), for being ecologically improbable (e.g., Grossberg, 2000b; Montant, 2000; Stevens, 2000), and for being simply a late instantiation of FLMP (Massaro, 2000; Oden, 2000), it has focused the attention of both speech-perception and spoken-word-recognition scientists on a question that remains unanswered: whether lexical knowledge ever feeds back into prelexical processing.
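Merge's feedforward integration can be sketched as a single weighted sum at the decision layer. The weights and activation values below are invented for illustration; the essential architectural point is that the phoneme input layer itself is never modified by lexical evidence.

```python
def merge_decision(input_act: dict[str, float],
                   lexical_act: dict[str, float],
                   w_input: float = 1.0, w_lex: float = 0.5) -> str:
    """Phoneme decision = weighted sum of bottom-up evidence (phoneme
    input layer) and lexical evidence; no feedback to the input layer."""
    candidates = set(input_act) | set(lexical_act)
    scores = {ph: w_input * input_act.get(ph, 0.0)
                  + w_lex * lexical_act.get(ph, 0.0)
              for ph in candidates}
    return max(scores, key=scores.get)

# Ganong-style case: acoustically ambiguous /g/-/k/ onset, with
# lexical support for "gift" over the nonword "kift":
decision = merge_decision({"g": 0.5, "k": 0.5}, {"g": 0.6})
```

The decision comes out as /g/, reproducing the lexical bias at the decision stage while leaving the bottom-up phoneme activations untouched.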

Bayesian Theories

Despite important differences in functional architecture between top-down and bottom-up models, both classes of models agree that speech perception involves distinct levels of representations (e.g., features, phonemes, words), multiple lexical activation, lexical competition, integration (of some sort) between actual sensory input and lexical expectations, and corrective mechanisms (of some sort) to handle incompleteness or uncertainty in the input. A radically different class of models based on optimal Bayesian inference has recently emerged as an alternative to the ones mentioned earlier—recently in psycholinguistics at least. These models eschew the concept of lexical activation altogether, sometimes doing away with the bottom-up/top-down debate itself—or at a minimum blurring the boundaries between the two mechanisms. For instance, in their Shortlist B model, Norris and McQueen ( 2008 ) have replaced activation with the concepts of likelihood and probability, which are seen as better approximations of actual (i.e., imperfect) human performance in the face of actual (i.e., complex and variable) speech input. The appeal of Bayesian computations is substantial because output (or posterior) probabilities, for example, probability that a word will be recognized, are estimated by tabulating both confirmatory and disconfirmatory evidence accumulated over past instances, as opposed to being tied to fixed activation thresholds (Fig. 26.8 ). In particular, Shortlist B has replaced discrete input categories such as features and phonemes with phoneme likelihoods calculated from actual speech data. Because they are derived from real speech, the phoneme likelihoods vary from instance to instance and as a function of the quality of the input and the phonetic context. Thus, while noisier, these probabilities are a better reflection of the type of challenge faced by the speech system in everyday conditions. 
They also allow the model to provide a single account for speech phenomena that usually require distinct ad-hoc mechanisms in other models. A general criticism leveled against Bayesian models, however, concerns the legitimacy of their priors , that is, the set of assumptions used to determine initial probabilities before any evidence has been gathered (e.g., how expected is a word or a phoneme a priori). Because priors can be difficult to establish, their arbitrariness or the modeler’s own biases can have substantial effects on the model’s outcome. Likewise, compared to the models reviewed earlier, models based on Bayesian inference often lead to less straightforward hypotheses, which makes their testability somewhat limited—even though their performance level in terms of replicating known patterns of data is usually high.

Main Bayesian equation in Shortlist B (Norris & McQueen, 2008 ). P(word i |evidence) is the conditional probability of a specific word ( word i ) having been heard given the available (intact or degraded) input ( evidence ). P(word i ) represents the listener’s prior belief, before any perceptual evidence has been accumulated, that word i will be present in the input. P(word i ) can be approximated from lexical frequencies and contextual variables. The critical term of the equation is P(evidence|word i ) , which is the likelihood of the evidence given word i , that is, the product of the probabilities of the sublexical units (e.g., phonemes) making up word i . This term is important because it acknowledges and takes into account the variability of the input (noise, ambiguity, idiosyncratic realization, etc.) in the input-to-representation mapping process. The probability of word i so calculated is then compared to that of all other words in the lexicon ( n ). Thus, Bayesian inference provides an index of word recognition that considers both lexical and sublexical factors as well as the complexity of a real and variable input.
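Written out, the equation described in the caption takes the standard Bayesian form (the notation follows the caption; the product over sublexical units is the likelihood decomposition the caption describes):

```latex
P(\mathrm{word}_i \mid \mathrm{evidence})
  = \frac{P(\mathrm{evidence} \mid \mathrm{word}_i)\, P(\mathrm{word}_i)}
         {\sum_{j=1}^{n} P(\mathrm{evidence} \mid \mathrm{word}_j)\, P(\mathrm{word}_j)},
\qquad
P(\mathrm{evidence} \mid \mathrm{word}_i)
  = \prod_{k} P(\mathrm{evidence}_k \mid \mathrm{unit}_{ik})
```

where $\mathrm{unit}_{ik}$ is the $k$th sublexical unit (e.g., phoneme) of $\mathrm{word}_i$ and the sum in the denominator runs over the $n$ words of the lexicon.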

Tailoring Speech Perception: Learning and Relearning

The literature reviewed so far suggests that perceiving speech involves a set of highly sophisticated processing skills and structures. To what extent are these skills and structures in place at birth? Of particular interest in the context of early theories of speech perception is the way in which speech perception and speech production develop relative to each other and the degree to which perceptual capacities responsible for subtle phonetic discrimination (e.g., voicing distinction) are present in prelinguistic infants. Eimas, Siqueland, Jusczyk, and Vigorito ( 1971 ) showed that 1-month-old infants perceive a voicing-based /ba/-/pa/ continuum categorically, just as adults do. Similarly, like adults (Mattingly, Liberman, Syrdal, & Halwes, 1971 ), young infants show a dissociation between categorical perception with speech and continuous perception with matched nonspeech (Eimas, 1974 ). Infants also seem to start off with an open-ended perceptual system, allowing them to discriminate a wide range of subtle phonetic contrasts—far more contrasts than they will be able to discriminate in adulthood (e.g., Aslin, Werker, & Morgan, 2002 ; Trehub, 1976 ). There is therefore strong evidence that fine speech-perception skills are in place early in life—at least well before the onset of speech production—and operational with minimal, if any, exposure to ambient speech. These findings have led to the idea that speech-specific mechanisms are part of the human biological endowment and have been taken as evidence for the innateness of language, or at least some of its perceptual aspects (Eimas et al., 1971 ). In that sense, an infant has very little to learn about speech perception. If anything, attuning to one’s native language is rather a matter of losing sensitivity to (or unlearning ) phonetic contrasts that have little communicative value for that particular language, for example, the /r/-/l/ distinction for Japanese listeners.

However, the idea that infants are born with a universal discrimination device operating according to a use-it-or-lose-it principle has not gone unchallenged. For instance, on closer examination, discrimination capacities at the end of the first year appear far less acute and far less universal than expected (e.g., Lacerda & Sundberg, 2001). Likewise, discrimination of irrelevant contrasts does not wane as systematically and as fully as the theory would have it (e.g., Polka, Colantonio, & Sundara, 2001). For example, Bowers, Mattys, and Gage (2009) showed that language-specific phonemes learned in early childhood but never heard or produced subsequently, as would be the case for young children of temporarily expatriate parents, can be relearned relatively easily even decades later (Fig. 26.9A). Thus, discriminatory attrition is not as widespread and severe as previously believed, suggesting that the representations of phonemes from “forgotten” languages, that is, those we stop practicing early in life, may be more deeply engraved in long-term memory than we think.

By and large, however, the literature on early speech perception indicates that infants possess fine language-oriented auditory skills from birth as well as impressive capacities to learn from the ambient auditory scene during the first year of life (Fig. 26.10 ). Auditory deprivation during that period (e.g., otitis media; delay prior to cochlear implantation) can have severe consequences on speech perception and later language development (e.g., Clarkson, Eimas, & Marean, 1989 ; Mody, Schwartz, Gravel, & Ruben, 1999 ), possibly due to a general decrease of attention to sounds (e.g., Houston, Pisoni, Kirk, Ying, & Miyamoto, 2003 ). However, even in such circumstances, partial sensory information is often available through the visual channel (facial and lip information), which might explain the relative resilience of basic speech perception skills to auditory deprivation. Indeed, Kuhl and Meltzoff ( 1982 ) showed that, as early as 4 months of age, infants show a preference for matched audiovisual inputs (e.g., audio /a/ with visual /a/) over mismatched inputs (e.g., audio /a/ with visual /i/). Even more striking, infants around that age seem to integrate discrepant audiovisual information following the typical McGurk pattern observed in adults (Rosenblum, Schmuckler, & Johnson, 1997 ). These results suggest that the multimodal (or amodal) nature of speech perception, a central tenet of Massaro’s Fuzzy Logical Model of Perception (FLMP; cf. Massaro, 1987 ), is present early in life and operates without much prior experience with sound-gesture association. Although the strength of the McGurk effect is lower in infants than adults (e.g., Massaro, Thompson, Barron, & Laren, 1986 ; McGurk & MacDonald, 1976 ), early cross-modal integration is often taken as evidence for gestural theories of speech perception and as a challenge to auditory theories.

A question of growing interest concerns the flexibility of the speech-perception system when it is faced with an unstable or changing input. Can the perceptual categories learned during early infancy be undone or retuned to reflect a new environment? The issue of perceptual (re)learning is central to research on second-language (L2) perception and speech perception in degraded conditions. Evidence for a speech-perception-sensitive period during the first year of life (Trehub, 1976 ) suggests that attuning to new perceptual categories later on should be difficult and perhaps not as complete as it is for categories learned earlier. Late learning of L2 phonetic contrasts (e.g., /r/-/l/ distinction for Japanese L1 speakers) has indeed been shown to be slow, effortful, and imperfect (e.g., Logan, Lively, & Pisoni, 1991 ). However, even in those conditions, learning appears to transfer to tokens produced by new talkers (Logan et al., 1991 ) and, to some degree, to production (Bradlow, Pisoni, Akahane-Yamada, & Tohkura, 1997 ). Successful learning of L2 contrasts is not systematically observed, however. For example, Bowers et al. ( 2009 ) found no evidence that English L1 speakers could learn to discriminate Zulu contrasts (e.g., /b/-//) or Hindi contrasts (e.g., /t/ vs. /˛/) even after 30 days of daily training (Fig. 26.9 B ). Thus, although possible, perceptual learning of L2 contrasts is greatly constrained by the age of L2 exposure, the nature and duration of training, and the phonetic overlap between the L1 and L2 phonetic inventories (e.g., Best, 1994 ; Kuhl, 2000 ).

Perceptual learning of accented L1 and noncanonical speech follows the same general patterns as L2 learning, but it usually leads to faster and more complete retuning (e.g., Bradlow & Bent, 2008; Clarke & Garrett, 2004). A reason for this difference is that, while L2 contrast learning involves the formation of new perceptual categories, whose boundaries are sometimes in direct conflict with L1 categories, accented L1 learning “simply” involves retuning existing perceptual categories, often by broadening their mapping range. This latter feature makes perceptual learning of accented speech a special instance of the more general debate on the episodic versus abstract nature of phonemic and lexical representations. At issue here is whether phonemic and lexical representations consist of a collection of episodic instances in which surface details are preserved (voice, accent, speech rate) or, alternatively, of single, abstract representations (i.e., one for each phoneme and one for each word). That at least some surface details of words are preserved in long-term memory is undeniable (e.g., Goldinger, 1998). The current debate focuses on (1) whether lexical representations include both indexical (e.g., voice quality) and allophonic (e.g., phonological variants) details (Luce, McLennan, & Charles-Luce, 2003); (2) whether such details are of a lexical nature (i.e., stored within the lexicon), rather than sublexical (i.e., stored at the subphonemic, phonemic, or syllabic level; McQueen, Cutler, & Norris, 2006); (3) the online time course of episodic trace activation (e.g., Luce et al., 2003; McLennan, Luce, & Charles-Luce, 2005); (4) the mechanisms responsible for consolidating newly learned instances or new perceptual categories (e.g., Fenn, Nusbaum, & Margoliash, 2003); and (5) the possible generalization to other types of noncanonical speech, such as disordered speech (e.g., Lee, Whitehall, & Coccia, 2009; Mattys & Liss, 2008).

( A ) AX discrimination scores over 30 consecutive days (50% chance level; feedback provided) for Zulu contrasts (e.g., /b/-//) and Hindi contrasts (e.g., /t/ vs. /˛/) by DM, a 20-year-old, male, native English speaker who was exposed to Zulu from 4 to 8 years of age but never heard Zulu subsequently. Note DM’s improvement with the Zulu contrasts over the 30 days, in sharp contrast with his inability to learn the Hindi contrasts. ( B ) Performance on the same task by native English speakers with no prior exposure to Zulu or Hindi. (Adapted with permission from Bowers, J. S., Mattys, S. L., & Gage, S. H., [2009]. Preserved implicit knowledge of a forgotten childhood language. Psychological Science , 20 , 1064–1069 [partial Figure 1].)

Summary of key developmental landmarks for speech perception and speech production in the first year of life. (Reprinted from Kuhl, P. K. [2004]. Early language acquisition: Cracking the speech code. Nature Reviews Neuroscience , 5 , 831–843 [Figure 1], by permission of the Nature Publishing Group.)

According to Samuel and Kraljic (2009), the aforementioned literature should be distinguished from a more recent strand of research that focuses on the specific variables affecting perceptual learning and the mechanisms linking such variables to perception. In particular, Norris, McQueen, and Cutler (2003) found that lexical information is a powerful source of perceptual recalibration. For example, Dutch listeners repeatedly exposed to a word containing a sound halfway between two existing phonemes (e.g., witlo*, where * is ambiguous between /f/ and /s/, with witlof a Dutch word—chicory—and witlos a nonword) subsequently perceived a /f/-/s/ continuum as biased in the direction of the lexically induced percept (more /f/ than /s/ in the witlo* case). Likewise, Bertelson, Vroomen, and de Gelder (2003) found that repeated exposure to McGurk audiovisual stimuli (e.g., audio /a*a/ and visual /aba/ leading to the auditory perception of /aba/) biased the subsequent perception of an audio-only /aba/-/ada/ continuum in the direction of the visually induced percept. Although visually induced perceptual learning seems to be less long-lasting than its lexically induced counterpart (Vroomen, van Linden, Keetels, de Gelder, & Bertelson, 2004), the Norris et al. and Bertelson et al. studies demonstrate that even the mature perceptual system can show a certain degree of flexibility when it is faced with a changing auditory environment.

Comparison of speech recognition error rate by machines (ASR) and humans. The logarithmic scale on the Y axis shows that ASR performance is approximately one order of magnitude behind human performance across various speech materials (ASR error rate for telephone conversation: 43%). The data were collated by Lippmann ( 1997 ). (Reprinted from Moore, R. K. [ 2007 ]. Spoken language processing by machine. In G. Gaskell [Ed.], Oxford handbook of psycholinguistics (pp. 723–738). Oxford, UK: Oxford University Press [Figure 44.6], by permission of Oxford University Press.)

Speech Recognition by Machines

This chapter was mainly concerned with human speech recognition (HSR), but technological advances in the past decades have turned speech perception and recognition into an economically significant challenge for engineers and applied computer scientists. A complete review of the historical background, issues, and state of the art of automatic speech recognition (ASR) is beyond the scope of this chapter. However, a brief analysis of ASR in the context of the key topics in HSR reviewed earlier reveals interesting commonalities as well as divergences between the preoccupations and goals of the two fields.

Perhaps the most notable difference between HSR and ASR is their ultimate aim. Whereas HSR aims to provide a description of how the speech system works (processes, representations, functional architecture, biological plausibility), ASR aims to deliver speech transcriptions as error-free as possible, regardless of the biological and cognitive validity of the underlying algorithms. The success of ASR is typically measured by the percentage of words correctly identified from speech samples varying in their acoustic and lexical complexity. While increasing computer capacity and speed have allowed ASR performance to improve substantially since the early systems of the 1970s (e.g., Jelinek, 1976 ; Klatt, 1977 ), ASR accuracy is still about an order of magnitude behind its HSR counterpart (Moore, 2007 ; see Fig. 26.11 ).

What is the cause of the enduring performance gap between ASR and HSR? Given that the basic constraints imposed by the signal (sequentiality, continuity, variability) are the same for humans and machines, it is tempting to conclude that the gap between ASR and HSR will not be bridged until the algorithms of the former resemble those of the latter. And currently, they do not. The architecture of most ASR systems is almost entirely data driven: Its structure is expressed in terms of a network of sequence probabilities calculated over large corpora of natural speech (and their supervised transcription). The ultimate goal of the corpora, or training data, is to provide a database of acoustic-phonetic information sufficiently large that an appropriate match can be found for any input sound sequence. The larger the corpora, the tighter the fit between the input and the acoustic model (e.g., triphones instantiated in hidden Markov models, HMM, cf. Rabiner & Juang, 1993 ), and the lower the ASR system’s error rate (Lamel, Gauvain, & Adda, 2000 ). By that logic, hours of training corpora, not human-machine avatars, are the solution for increased accuracy, giving support to the controversial assertion that human models have so far hindered rather than promoted ASR progress (Jelinek, 1985 ). However, Moore and Cutler ( 2001 ) estimated that increasing corpus sizes from their current average capacity (1,000 hours or less, which is the equivalent of the average hearing time of a 2-year-old) to 10,000 hours (average hearing time of a 10-year-old) would only drop the ASR error rate to 12%.

Thus, a data-driven approach to speech recognition is constrained by more than just the size of the training data set. For example, the lexical and syntactic content of the training data often determines the application for which the ASR system is likely to perform best. Domain-specific systems (e.g., banking transactions by phone) generally reach high recognition accuracy levels even when they are fed continuous speech produced by various speakers, whereas domain-general systems (e.g., speech-recognition packages on personal computers) often have to compromise on the number of speakers they can recognize and/or training time in order to be effective (Everman et al., 2005). Therefore, one of the current stumbling blocks of ASR systems is language modeling (as opposed to acoustic modeling), that is, the extent to which the systems include higher order knowledge—syntax, semantics, pragmatics—from which inferences can be made to refine the mapping between the signal and the acoustic model. Existing ASR language models are fairly simple, drawing upon the distributional methods of acoustic models in that they simply provide the probability of all possible word sequences based on their occurrences in the training corpora. In that sense, an ASR system can predict that “necklace” is a possible completion of “The burglar stole the…” because of its relatively high transitional probability in the corpora, not because of the semantic knowledge that burglars tend to steal valuable items, and not because of the syntactic knowledge that a noun phrase typically follows a transitive verb. Likewise, ASR systems rarely include the kind of lexical feedback hypothesized in HSR models like TRACE (McClelland & Elman, 1986) and ART (Grossberg, 1986).
Like Merge (Norris et al., 2000), ASR systems only allow lexical information and the language model to influence the relative weights of activated candidates, but not the fit between the signal and the acoustic model (Scharenborg, Norris, ten Bosch, & McQueen, 2005).
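The purely distributional character of such language models can be made concrete with a toy bigram model. The corpus and function names below are invented for the example; real systems use far larger corpora and smoothed n-gram (or neural) estimates, but the principle is the same.

```python
from collections import Counter, defaultdict

def train_bigram(sentences: list[str]) -> dict[str, Counter]:
    """Count word-to-word transitions in a (toy) training corpus."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for s in sentences:
        tokens = s.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def p_next(counts: dict[str, Counter], prev: str, word: str) -> float:
    """P(word | prev) estimated purely from corpus frequencies."""
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0

corpus = [
    "the burglar stole the necklace",
    "the burglar stole the necklace",
    "the burglar stole the painting",
]
model = train_bigram(corpus)
# "necklace" outranks "painting" after "the" only because of corpus
# frequency, not because of any knowledge about burglars or syntax.
```

Here `p_next(model, "the", "necklace")` exceeds `p_next(model, "the", "painting")` solely because of the transition counts, which is exactly the sense in which these language models lack semantic and syntactic knowledge.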

While the remaining performance gap between ASR and HSR is widely recognized in the ASR literature, there seems to be no clear consensus on the direction to take in order to reduce it (Moore, 2007 ). Given today’s ever-expanding computer power, increasing the size of training corpora is probably the easiest way of gaining a few percentage points in accuracy, at least in the short term. More radical solutions are also being envisaged, however. For example, attempts are being made to build more linguistically plausible acoustic models by using phonemes (as opposed to di/triphone HMMs) as basic segmentation units (Ostendorf, Digilakis, & Kimball, 1996 ; Russell, 1993 ) or by preserving and exploiting fine acoustic detail in the signal instead of treating it as noise (Carlson & Hawkins, 2007 ; Moore & Maier, 2007 ).

The scientific study of speech perception started in the early 1950s under the impetus of research carried out at the Haskins Laboratories, following the development of the Pattern Playback device. This machine allowed Franklin S. Cooper and his colleagues to visualize speech in the form of a decomposable spectrogram and, reciprocally, to create artificial speech by sounding out the spectrogram. Contemporary speech perception research is both a continuation of its earlier preoccupations with the building blocks of speech perception and a departure from them. On the one hand, the quest for universal units of speech perception and attempts to crack the many-to-one mapping code are still going strong. Still alive, too, is the debate about the involvement of gestural knowledge in speech perception, reignited recently by neuroimaging techniques and the discovery of mirror neurons. On the decline are the ideas that speech is special with respect to audition and that infants are born with speech- and species-specific perceptual capacities. On the other hand, questions have necessarily spread beyond the sublexical level, following the assumption that decoding the sensory input must be investigated in the context of the entirety of the language system—or, at the very least, some of its phonologically related components. Indeed, lexical feedback, online or learning related, has been shown to modulate the perceptual experience of an otherwise unchanged input. Likewise, what used to be treated as speech surface details (e.g., indexical variations), and commonly filtered out for the sake of modeling simplicity, are now more fully acknowledged as being preserved during encoding, embedded in long-term representations, and used during retrieval. 
Speech-perception research in the coming decades is likely to expand its interest not only to the rest of the language system but also to domain-general cognitive functions such as attention and memory as well as practical applications (e.g., ASR) in the field of artificial intelligence. At the same time, researchers have become increasingly concerned with the external validity of their models. Attempts to enhance the ecological contribution of speech research are manifest in a sharp increase in studies using natural speech (conversational, accented, disordered) as the front end of their models.

Aslin, R. N. , Werker, J. F. , & Morgan, J. L. ( 2002 ). Innate phonetic boundaries revisited. Journal of the Acoustical Society of America , 112, 1257–1260.

Bard, E. G. , Shillcock, R. C. , & Altmann, G. T. M. ( 1988 ). The recognition of words after their acoustic offsets in spontaneous speech: Effects of subsequent context. Perception and Psychophysics , 44 , 395–408.

Bertelson, P. , Vroomen, J. , & de Gelder, B. ( 2003 ). Visual recalibration of auditory speech identification: A McGurk aftereffect. Psychological Science , 14, 592–597.

Best, C. T. ( 1994 ). The emergence of native-language phonological influences in infants: A perceptual assimilation model. In H. Nusbaum & J. Goodman (Eds.), The transition from speech sounds to spoken words: The development of speech perception (pp. 167–224). Cambridge, MA: MIT Press.

Blumstein, S. E. , & Stevens, K. N. ( 1981 ). Phonetic features and acoustic invariance in speech.   Cognition , 10, 25–32.

Bowers, J. S. , Mattys, S. L. , & Gage, S. H. ( 2009 ). Preserved implicit knowledge of a forgotten childhood language. Psychological Science , 20, 1064–1069.

Bradlow, A. R. , & Bent, T. ( 2008 ). Perceptual adaptation to non-native speech.   Cognition , 106, 707–729.

Bradlow, A. R. , Pisoni, D. B. , Akahane-Yamada, R. , & Tohkura, Y. ( 1997 ). Training Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning on speech production. Journal of the Acoustical Society of America , 101, 2299–2310.

Carlson, R. , & Hawkins, S. (2007). When is fine phonetic detail a detail? In Proceedings of the 16th ICPhS Meeting (pp. 211–214). Saarbrücken, Germany.

Clarke, C. M. , & Garrett, M. F. ( 2004 ). Rapid adaptation to foreign-accented English. Journal of the Acoustical Society of America , 116, 3647–3658.

Clarkson, R. L. , Eimas, P. D. , & Marean, G. C. ( 1989 ). Speech perception in children with histories of recurrent otitis media. Journal of the Acoustical Society of America , 85, 926–933.

Connine, C. M. , & Clifton, C. ( 1987 ) Interactive use of lexical information in speech perception. Journal of Experimental Psychology: Human Perception and Performance , 13, 291–299.

Cutler, A. ( 1994 ). Segmentation problems, rhythmic solutions.   Lingua , 92, 81–104.

Cutler, A. , & Norris, D. ( 1979 ). Monitoring sentence comprehension. In W. E. Cooper & E. C. T. Walker (Eds.), Sentence processing: Psycholinguistic studies presented to Merrill Garrett (pp. 113–134). Hillsdale, NJ: Erlbaum.

Dahan, D . ( 2010 ). The time course of interpretation in speech comprehension. Current Directions in Psychological Science, 19, 121–126.

Delattre, P. C. , Liberman, A. M. , & Cooper, F. S. ( 1955 ). Acoustic loci and transitional cues for consonants.   Journal of the Acoustical Society of America , 27, 769–773.

Diehl, R. L. , Lotto, A. J. , & Holt, L. L. ( 2004 ). Speech perception. Annual Review of Psychology , 55, 149–179.

Dupoux, E. , & Mehler, J. ( 1990 ). Monitoring the lexicon with normal and compressed speech: Frequency effects and the prelexical code. Journal of Memory and Language , 29, 316–335.

Eimas, P. D. ( 1974 ). Auditory and linguistic processing of cues for place of articulation by infants. Perception and Psychophysics , 16, 513–521.

Eimas, P. D. , Siqueland, E. R. , Jusczyk, P. , & Vigorito, J. ( 1971 ). Speech perception in infants. Science , 171, 303–306.

Everman, G. , Chan, H. Y. , Gales, M. J. F , Jia, B. , Mrva, D. , & Woodland, P. C. ( 2005 ). Training LVCSR systems on thousands of hours of data. In Proceedings of the IEEE ICASSP (pp. 209–212).

Felleman, D. , & Van Essen, D. ( 1991 ). Distributed hierarchical processing in primate cerebral cortex. Cerebral Cortex , 1, 1–47.

Fenn, K. M. , Nusbaum, H. C. , & Margoliash, D. ( 2003 ). Consolidation during sleep of perceptual learning of spoken language. Nature , 425, 614–616.

Fougeron, C. , & Keating, P. A. ( 1997 ). Articulatory strengthening at edges of prosodic domains. Journal of the Acoustical Society of America , 101, 3728–3740.

Foulke, E. , & Sticht, T. G. ( 1969 ). Review of research on the intelligibility and comprehension of accelerated speech. Psychological Bulletin , 72, 50–62.

Fowler, C. A. ( 1986 ). An event approach to the study of speech perception from a direct-realist perspective. Journal of Phonetics , 14, 3–28.

Fowler, C. A. ( 1996 ). Listeners do hear sounds not tongues.   Journal of the Acoustical Society of America , 99, 1730–1741.

Fowler, C. A. ( 2006 ). Compensation for coarticulation reflects gesture perception, not spectral contrast. Perception and Psychophysics , 68, 161–177.

Fowler, C. A. ( 2008 ). The FLMP STMPed.   Psychonomic Bulletin and Review , 15, 458–462.

Fowler, C. A. , Brown, J. M. , & Mann, V. A. ( 2000 ). Contrast effects do not underlie effects of preceding liquids on stop-consonant identification by humans. Journal of Experimental Psychology: Human Perception and Performance , 26, 877–888.

Galantucci, B. , Fowler, C. A. , & Turvey, M. T. ( 2006 ). The motor theory of speech perception reviewed. Psychonomic Bulletin and Review , 13, 361–377.

Ganong, W. F. ( 1980 ). Phonetic categorization in auditory word perception. Journal of Experimental Psychology: Human Perception and Performance , 6, 110–125.

Goldinger, S. D. ( 1998 ). Echoes of echoes? An episodic theory of lexical access. Psychological Review , 105, 251–279.

Goldinger, S. D. , Pisoni, D. B. , & Luce, P. A. ( 1996 ). Speech perception and spoken word recognition: Research and theory. In N. J. Lass (Ed.), Principles of experimental phonetics (pp. 277–327). St. Louis, MO: Mosby.

Google Preview

Gow, D. W. ( 2000 ). One phonemic representation should suffice.   Behavioral and Brain Science , 23, 331.

Gow, D. W. , & Segawa, J. A. ( 2009 ). Articulatory mediation of speech perception: A causal analysis of multi-modal imaging data. Cognition , 110, 222–236.

Gow, D. W. , Segawa, J. A. , Ahlfors, S. P. , & Lin, F. H. ( 2008 ). Lexical influences on speech perception: A Granger causality analysis of MEG and EEG source estimates. Neuroimage , 43, 614–23.

Grossberg, S. ( 1986 ). The adaptive self-organization of serial order in behavior: Speech, language, and motor control. In E. C. Schwab & H. C. Nusbaum (Eds.), Pattern recognition by humans and machines, Vol 1. Speech perception (pp. 187–294). New York: Academic Press.

Grossberg, S. ( 1987 ). Competitive learning: From interactive activations to adaptive resonance. Cognitive Science , 11, 23–63

Grossberg, S. ( 2000 a). How hallucinations may arise from brain mechanisms of learning, attention, and volition. Journal of the International Neuropsychological Society , 6, 579–588.

Grossberg, S. ( 2000 b). Brain feedback and adaptive resonance in speech perception. Behavioral and Brain Science , 23, 332–333.

Grossberg, S. , Boardman, I. , & Cohen, M. A. ( 1997 ). Neural dynamics of variable-rate speech categorization. Journal of Experimental Psychology: Human Perception and Performance , 23, 481–503.

Grossberg, S. , & Myers, C. ( 1999 ). The resonant dynamics of conscious speech: Interword integration and duration-dependent backward effects. Psychological Review , 107, 735–767.

Houston, D. M. , Pisoni, D. B. , Kirk, K. I. , Ying, E. A. , & Miyamoto, R. T. ( 2003 ). Speech perception skills of infants following cochlear implantation: A first report. International Journal of Pediatric Otorhinolaryngology , 67, 479–495.

Huggins, A.W. F. ( 1975 ). Temporally segmented speech and “echoic” storage. In A. Cohen & S. G. Nooteboom (Eds.), Structure and process in speech perception (pp. 209–225). New York: Springer-Verlag.

Jelinek, F. ( 1976 ). Continuous speech recognition by statistical methods. Proceedings of the IEEE , 64, 532–556.

Jelinek, F. ( 1985 ). Every time I fire a linguist, the performance of my system goes up . Public statement at the IEEE ASSPS Workshop on Frontiers of Speech Recognition, Tampa, Florida.

Jusczyk, P. W. , & Luce, P. A. ( 2002 ). Speech perception and spoken word recognition: Past and present. Ear and Hearing , 23, 2–40.

Klatt, D. H. ( 1977 ). Review of the ARPA speech understanding project.   Journal of the Acoustical Society of America , 62, 1345–1366.

Kluender, K. R. , Diehl, R. L. , & Killeen, P. R. ( 1987 ). Japanese quail can form phonetic categories.   Science , 237, 1195–1197.

Kuhl, P. K. ( 1981 ). Discrimination of speech by non-human animals: Basic auditory sensitivities conductive to the perception of speech-sound categories. Journal of the Acoustical Society of America , 95, 340–349.

Kuhl, P. K. ( 2000 ). A new view of language acquisition.   Proceedings of the National Academy of Sciences USA , 97, 11850–11857.

Kuhl, P. K. ( 2004 ). Early language acquisition: Cracking the speech code. Nature Reviews Neuroscience , 5, 831–843.

Kuhl, P. K. , & Meltzoff, A. N. ( 1982 ). The bimodal development of speech in infancy. Science , 218, 1138–1141.

Lacerda, F. , & Sundberg, U. ( 2001 ). Auditory and articulatory biases influence the initial stages of the language acquisition process. In F. Lacerda , C. von Hofsten , & M. Heimann (Eds.), Emerging cognitive abilities in early infancy (pp. 91–110). Mahwah, NJ: Erlbaum.

Lachs, L. , Pisoni, D. B. , & Kirk, K. I. ( 2001 ). Use of audio-visual information in speech perception by pre-lingually deaf children with cochlear implants: A first report. Ear and Hearing , 22, 236–251.

Lamel, L. , Gauvain, J-L. , & Adda, G. ( 2000 ). Lightly supervised acoustic model training. In Proceeding of the ISCA Workshop on Automatic Speech Recognition (pp. 150–154).

Lee, A. , Whitehall, T. L. , & Coccia, V. ( 2009 ). Effect of listener training on perceptual judgement of hypernasality. Clinical Linguistics and Phonetics , 23, 319–334.

Liberman, A. M. ( 1996 ). Speech: A special code . Cambridge, MA: MIT Press.

Liberman, A. M. , Delattre, P. C. , & Cooper, F. S. ( 1958 ). Some cues for the distinction between voiced and voiceless stops in initial position. Language and Speech , 1, 153–167.

Liberman, A. M. , Harris, K. S. , Eimas, P. , Lisker, L. , & Bastian, J. ( 1961 ). An effect of learning on speech perception: The discrimination of durations of silence with and without phonemic significance. Language and Speech , 4, 175–195.

Liberman, A. M. , Harris, K. S. , Hoffman, H. S. , & Griffith, B. C. ( 1957 ). The discrimination of speech sounds within and across phoneme boundaries. Journal of Experimental Psychology , 54, 358–368.

Liberman, A. M. , & Mattingly, I. G. ( 1985 ). The motor theory of speech perception revised. Cognition , 21, 1–36.

Lindblom, B. ( 1963 ). Spectrographic study of vowel reduction.   Journal of the Acoustical Society of America , 35, 1773–1781.

Lippmann, R. ( 1997 ). Speech recognition by machines and humans.   Speech Communication , 22, 1–16.

Lisker, L. , & Abramson, A. S. ( 1970 ). The voicing dimensions: Some experiments in comparative phonetics. In Proceedings of the Sixth International Congress of Phonetic Sciences (pp. 563–567) . Prague, Czechoslovakia: Academia.

Logan, J. S. , Lively, S. E. , & Pisoni, D. B. ( 1991 ). Training Japanese listeners to identify English /r/ and /l/: A first report. Journal of the Acoustical Society of America , 89, 874–886.

Lotto, A. J. , Hickok, G. S. , & Holt, L. L. ( 2009 ). Reflections on mirror neurons and speech perception.   Trends in Cognitive Science , 13, 110–114.

Lotto, A. J. , & Holt, L. L. ( 2006 ). Putting phonetic context effects into context: A commentary on Fowler (2006). Perception and Psychophysics , 68, 178–183.

Lotto, A. J. , & Kluender, K. R. ( 1998 ). General contrast effects in speech perception: Effect of preceding liquid on stop consonant identification. Perception and Psychophysics , 60, 602–619.

Luce, P. A. , Mc Lennan, C. T. , & Charles-Luce, J. ( 2003 ). Abstractness and specificity in spoken word recognition: Indexical and allophonic variability in long-term repetition priming. In J. Bowers & C. Marsolek (Eds.), Rethinking implicit memory (pp. 197–214). New York: Oxford University Press.

Magnuson, J. S. , McMurray, B. , Tanenhaus, M. K. , & Aslin, R. N. ( 2003 ). Lexical effects on compensation for coarticulation: The ghost of Christmas past. Cognitive Science , 27, 285–298.

Mann, V. A. ( 1980 ). Influence of preceding liquid on stop-consonant perception. Perception and Psychophysics , 28, 407–412.

Marslen-Wilson, W. D. ( 1987 ). Functional parallelism in spoken word recognition. Cognition , 25, 71–102.

Marslen-Wilson, W. D. , & Tyler, L. K. ( 1980 ). The temporal structure of spoken language understanding. Cognition , 8, 1–71.

Massaro, D. W. ( 1987 ). Speech perception by ear and eye: A paradigm for psychological inquiry . Hillsdale, NJ: Erlbaum.

Massaro, D. W. ( 1989 ). Testing between the TRACE model and the Fuzzy Logical Model of speech perception. Cognitive Psychology , 21, 398–421.

Massaro, D. W. ( 1996 ). Integration of multiple sources of information in language processing. In T. Inui & J. L. McClelland (Eds.), Attention and performance XVI: Information integration in perception and communication (pp. 397–432). Cambridge, MA: MIT Press.

Massaro, D. W. ( 2000 ). The horse race to language understanding: FLMP was first out of the gate and has yet to be overtaken. Behavioral and Brain Science , 23, 338–339.

Massaro, D. W. , & Chen, T. H. ( 2008 ). The motor theory of speech perception revisited. Psychonomic Bulletin and Review , 15, 453–457.

Massaro, D. W. , Thompson, L. A. , Barron, B. , & Laren, E. ( 1986 ). Developmental changes in visual and auditory contributions to speech perception. Journal of Experimental Child Psychology , 41, 93–113.

Mattingly, I. G. , Liberman, A. M. , Syrdal A. K. , & Halwes T. ( 1971 ). Discrimination in speech and nonspeech modes.   Cognitive Psychology , 2, 131–157.

Mattys, S. L. ( 1997 ). The use of time during lexical processing and segmentation: A review. Psychonomic Bulletin and Review , 4, 310–329.

Mattys, S. L. , & Liss, J. M. ( 2008 ). On building models of spoken-word recognition: When there is as much to learn from natural “oddities” as from artificial normality. Perception and Psychophysics , 70, 1235–1242.

Mattys, S. L. , White, L. , & Melhorn, J. F ( 2005 ). Integration of multiple speech segmentation cues: A hierarchical framework. Journal of Experimental Psychology: General , 134, 477–500.

McClelland, J. L. ( 1991 ). Stochastic interactive processes and the effect of context on perception. Cognitive Psychology , 23, 1–44.

McClelland, J. L. , & Elman, J. L. ( 1986 ). The TRACE model of speech perception. Cognitive Psychology , 18, 1–86.

McCloskey, M. , & Cohen, N. J. ( 1989 ). Catastrophic interference in connectionist networks: The sequential learning problem. The Psychology of Learning and Motivation , 24, 109–165.

McGettigan, C. , Agnew, Z. K. , & Scott, S. K. ( 2010 ). Are articulatory commands automatically and involuntarily activated during speech perception? Proceedings of the National Academy of Sciences USA , 107, E42.

McGurk, H. , & MacDonald, J. W. ( 1976 ). Hearing lips and seeing voices.   Nature , 264, 746–748.

McLennan, C. T. , Luce, P. A. , & Charles-Luce, J. ( 2005 ). Examining the time course of indexical specificity effects in spoken word recognition. Journal of Experimental Psychology: Learning Memory and Cognition , 31, 306–321.

McQueen, J. M. ( 1998 ). Segmentation of continuous speech using phonotactics. Journal of Memory and Language , 39, 21–46.

McQueen, J. M. , Cutler, A. , & Norris, D. ( 2006 ). Phonological abstraction in the mental lexicon. Cognitive Science , 30, 1113–1126.

Miller, J. D.   Wier, C. C. , Pastore, R. , Kelly, W. J. , & Dooling, R. J. ( 1976 ). Discrimination and labeling of noise-buzz sequences with varying noise lead times: An example of categorical perception. Journal of the Acoustical Society of America , 60, 410–417.

Miller, J. L. , & Liberman, A. M. ( 1979 ). Some effects of later-occurring information on the perception of stop consonant and semivowel. Perception and Psychophysics , 25, 457–465.

Mody, M. , Schwartz, R. G. , Gravel, R. S. , & Ruben, R. J. ( 1999 ). Speech perception and verbal memory in children with and without histories of otitis media. Journal of Speech, Language and Hearing Research , 42, 1069–1079.

Montant, M. ( 2000 ). Feedback: A general mechanism in the brain.   Behavioral and Brain Science , 23, 340–341.

Moore, R. K. ( 2007 ). Spoken language processing by machine. In G. Gaskell (Ed.), Oxford handbook of psycholinguistics (pp. 723–738). Oxford, England: Oxford University Press.

Moore, R. K. , & Cutler, A. (2001, July 11-13). Constraints on theories of human vs. machine recognition of speech . Paper presented at the SPRAAC Workshop on Human Speech Recognition as Pattern Classification, Max-Planck-Institute for Psycholinguistics, Nijmegen, The Netherlands.

Moore, R. K. , & Maier, V . (2007). Preserving fine phonetic detail using episodic memory: Automatic speech recognition using MINERVA2. In Proceedings of the 16th ICPhS Meeting (pp. 197–203). Saarbrücken, Germany.

Movellan, J. R. , & McClelland, J. L. ( 2001 ). The Morton-Massaro law of information integration: Implications for models of perception. Psychological Review , 108, 113–148.

Nooteboom, S. G. ( 1979 ). The time course of speech perception. In W. J. Barry & K. J. Kohler (Eds.), “Time” in the production and perception of speech (Arbeitsberichte 12). Kiel, Germany: Institut für Phonetik, University of Kiel.

Norris, D. ( 1994 ). Shortlist: A connectionist model of continuous speech recognition. Cognition , 52, 189–234.

Norris, D. , & McQueen, J. M. ( 2008 ). Shortlist B: A Bayesian model of continuous speech recognition. Psychological Review , 115 , 357–395.

Norris, D. , McQueen. J. M. , & Cutler, A. ( 2000 ). Merging information in speech recognition: Feedback is never necessary. Behavioral and Brain Sciences , 23, 299–370.

Norris, D. , McQueen, J. M. , & Cutler, A. ( 2003 ). Perceptual learning in speech. Cognitive Psychology , 47, 204–238.

Oden, G. C. ( 2000 ). Implausibility versus misinterpretation of the FLMP.   Behavioral and Brain Science , 23, 344.

Oden, G. C. , & Massaro, D. W. ( 1978 ). Integration of featural information in speech perception. Psychological Review , 85, 172–191.

Ostendorf, M. , Digilakis, V. , & Kimball, O. A. ( 1996 ). From HMMs to segment models: A unified view of stochastic modelling for speech recognition. IEEE Transactions, Speech and Audio Processing , 4, 360–378.

Pardo, J. S. , & Remez, R. E. ( 2006 ). The perception of speech. In M. Traxler & M. A. Gernsbacher (Eds.), Handbook of psycholinguistics (2nd ed., pp. 201–248). New York: Academic Press.

Pisoni, D. B. , & Luce, P. A. ( 1987 ). Acoustic-phonetic representations in word recognition. Cognition , 25, 21–52.

Pitt, M. A. , & Samuel, A. G. ( 1995 ). Lexical and sublexical feedback in auditory word recognition. Cognitive Psychology , 29 , 149–188.

Polka, L. , Colantonio, C. , & Sundara, M. ( 2001 ). A cross-language comparison of /d/–/Δ/ perception: Evidence for a new developmental pattern. Journal of the Acoustical Society of America , 109, 2190–2201.

Port, R. F. (1977). The influence of speaking tempo on the duration of stressed vowel and medial stop in English Trochee words . Unpublished Ph.D. dissertation, Indiana University, Bloomington.

Potter, R. K. , Kopp, G. A. , & Green, H. C. ( 1947 ). Visible speech . New York: D. Van Nostrand.

Pulvermüller, F. , Huss, M. , Kherif, F. , Moscoso Del Prado Martin, F. , Hauk, O. , & Shtyrof, Y. ( 2006 ). Motor cortex maps articulatory features of speech sounds. Proceedings of the National Academy of Sciences USA , 103, 7865–7870.

Rabiner, L. , & Juang, B. H. ( 1993 ). Fundamentals of speech recognition. Englewood Cliffs, NJ: Prentice Hall.

Radeau, M. , Morais, J. , Mousty, P. , & Bertelson, P. ( 2000 ). The effect of speaking rate on the role of the uniqueness point in spoken word recognition. Journal of Memory and Language , 42, 406–422.

Rastle, K. , Davis, M. H. , & Brysbaert, M. , ( 2010 ). Response to McGettigan et al.: Task-based accounts are not sufficiently coherent to explain articulatory effects in speech perception. Proceedings Proceedings of the National Academy of Sciences USA , 107, E43.

Reisberg, D. , Mc Lean, J. , & Goldfield, A. ( 1987 ). Easy to hear but hard to understand: A lip-reading advantage with intact auditory stimuli. In R. Campbell & B. Dodd (Eds.), Hearing by eye: The psychology of lip-reading (pp. 97–114). Hillsdale, NJ: Erlbaum.

Rizzolatti, G. , & Craighero, L. ( 2004 ). The mirror-neuron system.   Annual Review of Neuroscience , 27, 169–192,

Rosenblum, L. D. ( 2005 ). Primacy of multimodal speech perception. In D. B. Pisoni & R. E. Remez (Eds.), The handbook of speech perception (pp. 51–78). Oxford, England: Blackwell.

Rosenblum, L. D. , Schmuckler, M. A. , & Johnson, J. A. ( 1997 ). The McGurk effect in infants.   Perception and Psychophysics , 59, 347–357.

Russell, M. J. (1993). A segmental HMM for speech pattern modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 640–643)

Samuel, A. G. ( 1981 ). Phonemic restoration: Insights from a new methodology. Journal of Experimental Psychology: General , 110, 474–494.

Samuel, A. G. ( 1997 ). Lexical activation produces potent phonemic percepts. Cognitive Psychology , 32, 97–127.

Samuel, A. G. ( 2000 ). Merge: Contorted architecture, distorted facts, and purported autonomy. Behavioral and Brain Science , 23, 345–346.

Samuel, A. G. ( 2001 ). Knowing a word affects the fundamental perception of the sounds within it. Psychological Science , 12, 348–351.

Samuel, A. G. , & Kraljic, T. ( 2009 ). Perceptual learning for speech.   Attention, Perception, and Psychophysics , 71, 1207–1218.

Scharenborg, O. , Norris, D. , ten Bosch, L. , & Mc Queen, J. M. ( 2005 ). How should a speech recognizer work ? Cognitive Science , 29, 867–918.

Stevens, K. N. ( 2000 ). Recognition of continuous speech requires top-down processing. Behavioral and Brain Science , 23, 348.

Stevens, K. N. , & Blumstein, S. E. ( 1981 ). The search for invariant acoustic correlates of phonetic features. In P. Eimas & J. Miller (Eds.), Perspectives on the study of speech (pp. 1–38). Hillsdale, NJ: Erlbaum.

Sumby, W. H. , & Pollack, I. ( 1954 ). Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America , 26, 212–215.

Sussman, H. M. ( 1989 ). Neural coding of relational invariance in speech: Human language analogs to the barn owl. Psychological Review , 96, 631–642.

Summerfield, A. Q. ( 1981 ). Articulatory rate and perceptual constancy in phonetic perception. Journal of Experimental Psychology: Human Perception and Performance , 7, 1074–1095.

Trehub, S. E. ( 1976 ). The discrimination of foreign speech contrasts by infants and adults. Child Development , 47, 466–472.

Umeda, N. , & Coker, C. H. ( 1974 ). Allophonic variation in American English.   Journal of Phonetics , 2, 1–5.

van Buuren, R. A. , Festen, J. , & Houtgast, T . ( 1999 ). Compression and expansion of the temporal envelope: Evaluation of speech intelligibility and sound quality. Journal of the Acoustical Society of America , 105, 2903–2913.

Vroomen, J. , Van Linden, B. , Keetels, M. , de Gelder, B. , & Bertelson, P. ( 2004 ). Selective adaptation and recalibration of auditory speech by lipread information: Dissipation. Speech Communication , 44, 55–61.

Warren, R. M. , & Obusek, C. J. ( 1971 ). Speech perception phonemic restorations. Perception & Psychophysics , 9 , 358–362.

Warren, R. M. , & Warren, R. P. ( 1970 ). Auditory illusions and confusions.   Scientific American , 223 , 30–36.

Whalen, D. H. , Benson, R. R. , Richardson, M. , Swainson, B. , Clark, V. P. , Lai, S. ,… Liberman, A. M. ( 2006 ). Differentiation of speech and nonspeech processing within primary auditory cortex. Journal of the Acoustical Society of America , 119, 575–581.

Yuen, I. , Davis, M. H. , Brysbaert, M. , & Rastle, K. ( 2010 ). Activation of articulatory information in speech perception. Proceedings of the National Academy of Sciences USA , 107, 592–597.

Inner Speech: Development, Cognitive Functions, Phenomenology, and Neurobiology

Ben Alderson-Day

Department of Psychology, Durham University

Charles Fernyhough

The authors would like to thank David Smailes, Peter Moseley, Sam Wilkinson, and Elizabeth Meins for their comments on drafts of the manuscript.

This work was supported by the Wellcome Trust (WT098455).

Inner speech—also known as covert speech or verbal thinking—has been implicated in theories of cognitive development, speech monitoring, executive function, and psychopathology. Despite a growing body of knowledge on its phenomenology, development, and function, approaches to the scientific study of inner speech have remained diffuse and largely unintegrated. This review examines prominent theoretical approaches to inner speech and methodological challenges in its study, before reviewing current evidence on inner speech in children and adults from both typical and atypical populations. We conclude by considering prospects for an integrated cognitive science of inner speech, and present a multicomponent model of the phenomenon informed by developmental, cognitive, and psycholinguistic considerations. Despite its variability among individuals and across the life span, inner speech appears to perform significant functions in human cognition, which in some cases reflect its developmental origins and its sharing of resources with other cognitive processes.

When people reflect upon their own inner experience, they often report that it has a verbal quality (Baars, 2003). Also referred to as verbal thinking, inner speaking, covert self-talk, internal monologue, and internal dialogue, inner speech has been proposed to have an important role in the self-regulation of cognition and behavior in both childhood and adulthood, with implications for inner speech dysfunction in psychiatric conditions and developmental disorders involving atypical language skills or deficits in self-regulation (Diaz & Berk, 1992; Fernyhough, 1996; Vygotsky, 1934/1987). Despite its apparent importance for human cognition, inner speech has received relatively little attention from psychologists and cognitive neuroscientists, partly due to methodological problems involved in its study. Nevertheless, a large body of empirical work has arisen relating to inner speech, albeit in rather disparate research areas, and it plays an increasingly prominent role in psychological theorizing (Dolcos & Albarracín, 2014; Fernyhough & McCarthy-Jones, 2013; Hurlburt, Heavey, & Kelsey, 2013; Oppenheim & Dell, 2010; Williams, Bowler, & Jarrold, 2012).

The aim of the present article is to review the existing empirical work on inner speech and provide a theoretical integration of well-established and more recent research findings. First, we summarize the key theoretical positions that have been advanced relating to the development, cognitive functions, and phenomenology of inner speech. We then consider methodological issues that attend the study of inner speech. Next, we consider how inner speech emerges in childhood. In the fourth section, we consider the phenomenology of inner speech in adulthood along with its cognitive functions. We then review what is known about inner speech in atypical populations before considering neuropsychological evidence relevant to theorizing about its functional significance. Finally, we consider prospects for an integrated cognitive science of inner speech, combining developmental, cognitive, psycholinguistic, and neuropsychological evidence to provide a multicomponent model of the phenomenon.

Inner speech can be defined as the subjective experience of language in the absence of overt and audible articulation. This definition is necessarily simplistic: as the following will demonstrate, experiences of this kind vary widely in their phenomenology, their addressivity to others, their relation to the self, and their similarity to external speech. Inner speech, on these terms, incorporates but does not reduce to phenomena such as subvocal rehearsal (the use of phonological codes for the maintenance of information in working memory). The concept is also sometimes used interchangeably with thinking, to the extent that a close focus on the phenomenological, developmental, and cognitive features of inner speech necessitates a certain amount of redefinition of that term. In what follows, we will avoid talking about thinking in favor of mental processes that can be more tightly specified.

Given this diversity in terminology, our literature search covered a broad range of research areas and depended considerably on secondary sources and citation lists of key articles. Web of Knowledge, PsycINFO, and Google Scholar were searched for articles published from 1980 to 2014 containing the following keywords: inner speech, private speech, self-talk, covert speech, silent speech, verbal thinking, verbal mediation, inner monologue, inner dialogue, inner voice, articulatory imagery, voice imagery, speech imagery, and auditory verbal imagery. Both empirical and theoretical articles were permitted. Studies that only covered externalized forms of self-talk were generally not included, unless they referred to a relevant effect or population where inner speech data were not available; for instance, to our knowledge there have been no studies specifically studying inner speech in attention deficit hyperactivity disorder (ADHD), but there is research on private speech (e.g., Corkum, Humphries, Mullane, & Theriault, 2008). Where a recent review on a topic had been published (such as Hubbard, 2010, on auditory imagery; or Winsler et al., 2009, on private speech) we chose to selectively discuss studies in that area, and refer the reader to relevant summaries.
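The inclusion criteria just described can be sketched as a simple filter. This is purely illustrative: the record format (a dictionary with `year` and `text` fields) and the sample records are hypothetical, not drawn from the actual search.

```python
# Hypothetical sketch of the review's inclusion criteria: keep records
# published 1980-2014 whose text contains one of the target keywords.

KEYWORDS = {
    "inner speech", "private speech", "self-talk", "covert speech",
    "silent speech", "verbal thinking", "verbal mediation",
    "inner monologue", "inner dialogue", "inner voice",
    "articulatory imagery", "voice imagery", "speech imagery",
    "auditory verbal imagery",
}

def include(record):
    """Return True if a record (an assumed dict with 'year' and 'text'
    keys) falls within the date range and mentions a target keyword."""
    if not (1980 <= record["year"] <= 2014):
        return False
    text = record["text"].lower()
    return any(kw in text for kw in KEYWORDS)

records = [
    {"year": 2005, "text": "Dialogic thinking and inner speech in children"},
    {"year": 1975, "text": "Inner speech and rehearsal"},          # too early
    {"year": 2010, "text": "Gesture production in conversation"},  # no keyword
]
kept = [r for r in records if include(r)]
```

A real search would of course run against database interfaces rather than an in-memory list, and the exclusion of purely externalized self-talk studies described above was a judgment applied by the authors, not a mechanical filter.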

Theories of Inner Speech

Noting a possible reason for the relative neglect of the phenomenology of inner speech, Riley (2004) observes that “the fact of its insistent indwelling can blind us to its peculiarities” (p. 8). And yet inner speech has long had an important role to play in psychological theorizing. Plato (n.d./1987) noted that a dialogic conversation with the self is a familiar aspect of human experience. Although inner speech figures in a variety of psychological, neuroscientific, and philosophical discourses (Fernyhough, 2013), its nature, development, phenomenology, and functional significance have received little theoretical or empirical attention. One reason for this is that inner speech by definition cannot be directly observed, limiting the scope for its empirical study and requiring the development of methodologies for studying it indirectly (see Methodological Issues). While there exists a range of theoretical perspectives on inner speech (e.g., Larrain & Haye, 2012; Morin, 2005; Oppenheim & Dell, 2010), two in particular have proved influential for theorizing about its cognitive functions. One relates to the development of verbal mediation of cognition and behavior, and one relates to rehearsal and working memory.

Vygotsky’s Theory

In Vygotsky’s (1934/1987) theory of cognitive development, inner speech is the outcome of a developmental process. Vygotsky assumed that understanding how such a phenomenon emerges over the life span is necessary for full comprehension of its subjective qualities and functional characteristics. Via a mechanism of internalization, linguistically mediated social exchanges (such as those between the child and a caregiver) are transformed, in Vygotsky’s model, into an internalized “conversation” with the self. The development of verbal mediation is envisaged as the process through which children become able to use language and other sign systems to regulate their own behavior. Prelinguistic intelligence is thus reshaped by language to create what Vygotsky and his student Luria termed a “functional system,” a key concept in their antimodularist view of functional localization in the brain (Fernyhough, 2010; Luria, 1965; Vygotsky, 1934/1987).

Vygotsky formulated his view of inner speech in contrast to the theory of John B. Watson. Best known as a founder of behaviorism, Watson saw inner speech (which he identified with “thinking”) as resulting from a process of the gradual reduction of self-directed speech: in other words, a purely mechanical process in which speech becomes quieter and quieter until it is first merely a whisper, and then silent thought (Watson, 1913). This view of inner speech as subvocalized language was, Vygotsky believed, mistaken (Berk, 1992). Rather, he contended, inner speech is profoundly transformed in the process of internalization, and its development involves processes more complex than the mere attenuation of the behavioral components of speaking.

Vygotsky saw support for his theory in the phenomenon now known as private speech (previously egocentric speech), in which children talk to themselves while engaged in a cognitive task. In Vygotsky’s (1934/1987) theory, private speech represents a transitional stage in the process of internalization in which interpersonal dialogues are not yet fully transformed into intrapersonal ones. Vygotsky saw private speech as having a primary role in the self-regulation of cognition and behavior, with the child gradually taking on greater strategic responsibility for activities that previously required the input of an expert other (such as a caregiver). Empirical research since Vygotsky’s time has challenged this unifunctional view of private speech, with self-directed talk now proposed to have multiple functions including pretense, practice for social encounters, language practice, and so on (Berk, 1992). Most studies point to private speech being an almost universal feature of development (Winsler, De León, Wallace, Carlton, & Willson-Quayle, 2003), although there are important individual differences in frequency and quality of self-talk (Lidstone, Meins, & Fernyhough, 2011). It is also now acknowledged that private speech does not atrophy after the completion of internalization, but can persist into adulthood as a valuable self-regulatory and motivational tool.

As noted, the developmental transition envisaged by Vygotsky (from social to private to inner speech) was proposed to be accompanied by both syntactic and semantic transformations (see Fernyhough & McCarthy-Jones, 2013). Internalization involves the abbreviation of the syntax of internalized language, which results in inner speech having a “note-form” quality (in which the “psychological subject” or topic of the utterance is already known to the thinker) compared with external speech. Vygotsky identified three main semantic transformations accompanying internalization: the predominance of sense over meaning (in which personal, private meanings achieve a greater prominence than conventional, public ones); the process of agglutination (the development of hybrid words signifying complex concepts); and the infusion of sense (in which specific elements of inner language become infused with more semantic associations than are present in their conventional meanings). For example, a word like “interview” might have a clear referent (an upcoming appointment), but its sense could mean much more when uttered in inner speech: worry, performance anxiety, hopes for the future, or the need to prepare.

Vygotsky’s ideas about inner speech have been extended in recent theoretical and empirical research. Fernyhough (2004) proposed that inner speech can take two distinct forms: expanded inner speech, in which internal dialogue retains many of the phonological properties and turn-taking qualities of external dialogue, and condensed inner speech, in which the semantic and syntactic transformations that accompany internalization are taken to their conclusion, and inner speech approaches the state of “thinking in pure meanings” described by Vygotsky (1934/1987). In this latter form of inner speech, the phonological qualities of the internalized speech are attenuated and the multiple perspectives (Fernyhough, 1996, 2009a) that constitute the dialogue are manifested simultaneously. In Fernyhough’s model, the default setting for inner speech is condensed, with the transition to expanded inner speech resulting from stress and cognitive challenge.

Recent empirical research has been largely supportive of Vygotskian claims about the functional significance of private speech, particularly its relations to task difficulty and task performance (Al-Namlah, Fernyhough, & Meins, 2006; Fernyhough & Fradley, 2005; Winsler, Fernyhough, & Montero, 2009), and its developmental trajectory (Winsler & Naglieri, 2003). Vygotsky’s ideas about the role of such mediation in self-regulation have begun to be integrated into modern research on the executive functions, the heterogeneous set of cognitive capacities responsible for the planning, inhibition, and control of behavior (e.g., Cragg & Nation, 2010; Williams, Bowler, & Jarrold, 2012). One implication of Vygotsky’s theory, that inner speech is dialogic in nature, has been proposed to be important in domains such as social understanding (Davis, Meins, & Fernyhough, 2013) and creativity (Fernyhough, 2008, 2009a). Inner speech has also been proposed to have an important role in metacognition, self-awareness, and self-understanding (Morin, 2005).

Inner Speech in Working Memory

A second important theoretical perspective concerns the role of inner speech in working memory. Working memory refers to the retention of information “online” during a complex task, such as keeping a set of directions in mind while navigating around a new building, or rehearsing a shopping list.

Models of working memory vary in terms of whether it is considered a single or multicomponent process, its relation to attention, and the importance of individual differences ( Miyake & Shah, 1999 ). The theory most pertinent to discussing inner speech—and still the most influential approach—is that derived from Baddeley and Hitch’s (1974) multicomponent model. Baddeley and Hitch proposed that working memory comprised three components: a central executive, responsible for the allocation and management of attentional resources; the phonological (sometimes known as the articulatory) loop, a slave system responsible for the representation of acoustic, verbal, or phonological information; and a visuospatial scratchpad, a slave system that serves visual and spatial aspects of task-based short-term memory (STM). Baddeley (2000) also added a fourth component, the episodic buffer, a multimodal temporary store that can bind concurrent stimuli and draw on information from long-term memory.

The distinction between slave systems in Baddeley and Hitch’s model has produced a large body of research on the operations of verbal working memory. In this model, the phonological loop is made up of two subcomponents: a passive phonological store, with a decay time of 1–2 s, and an active rehearsal mechanism that uses offline speech planning processes—in other words, inner speech, or something very similar (Baddeley, 1992).
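This two-component arrangement can be given a toy quantitative reading. The sketch below is not Baddeley and Hitch’s model itself, only a minimal formalization of its logic; the decay constant, articulation times, and retention interval are all illustrative assumptions rather than fitted values:

```python
import math

DECAY_TAU = 1.5  # assumed decay constant in seconds, standing in for the 1-2 s window


def trace_strength(seconds_since_refresh):
    """Exponential decay of a passive phonological trace."""
    return math.exp(-seconds_since_refresh / DECAY_TAU)


def strength_at_test(n_items, articulation_time, retention=6.0, rehearse=True):
    """Strength of the oldest trace at test.

    With covert rehearsal, the inner voice cycles through the list, so the
    oldest trace is at most one full cycle (n_items * articulation_time) old.
    Under articulatory suppression (rehearse=False), traces decay untouched
    for the whole retention interval.
    """
    age = n_items * articulation_time if rehearse else retention
    return trace_strength(age)
```

Even this toy version reproduces two signatures of the loop: items that take longer to articulate lengthen the rehearsal cycle and so weaken the oldest trace (`strength_at_test(5, 0.6) < strength_at_test(5, 0.3)`), and blocking rehearsal leaves traces to decay across the full retention interval.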

Support for the independence of a phonological loop from other working memory processes has largely come from evidence of interference effects in dual-task studies. In such paradigms, participants are asked to encode a set of target stimuli—such as learning a list of words—while engaging in a secondary task involving either verbal or visuospatial processing. A typical verbal distractor method is articulatory suppression: engaging the articulators in a separate task (such as repeating days of the week) has been shown to disrupt memory for verbal material in numerous studies (e.g., Baddeley, Lewis, & Vallar, 1984). In contrast, tapping out particular spatial patterns selectively affects visuospatial working memory skills, leading to impaired recall in that domain only (Logie, Zucco, & Baddeley, 1990).

Evidence of verbal representations in the memory trace comes from common memory effects related to specific verbal and phonological properties. For instance, lists of words that take longer to say overtly are recalled less well, a “word length effect” on the memory trace (Baddeley, Thomson, & Buchanan, 1975). Words that sound the same are also prone to confusion, leading to poorer recall for the whole list of items: this “phonological similarity effect” influences maintenance of verbal material, but also of visual material that has been verbally rehearsed (Conrad & Hull, 1964).

Developmentally, there is evidence that the different components of working memory follow different trajectories of maturation, and that this divergence of developmental pathways begins relatively early (Alloway, Gathercole, & Pickering, 2006). Although the evidence is not unequivocal, it is generally agreed that children begin to use verbal mediation of STM from around 7 years of age (Gathercole, 1998), at which point they begin to be susceptible to the phonological similarity effect (Conrad, 1971; Gathercole & Hitch, 1993) and word length effect (Baddeley, Thomson, & Buchanan, 1975). The ability to hold phonological representations in mind, however, appears to come online much earlier, possibly from as young as 18 months (e.g., Mani & Plunkett, 2010). One way of interpreting this evidence is that the phonological loop primarily functions as a language-learning tool, as evidenced by its use in the first phases of language acquisition in infancy (Baddeley, Gathercole, & Papagno, 1998).

Comparing Vygotskian and Working Memory Approaches to Inner Speech

To date, there have been few attempts to integrate the Vygotskian and working memory approaches to inner speech (although see Al-Namlah et al., 2006 ). One objection that is occasionally raised regarding integration of Vygotskian and working memory accounts is that, because an operational phonological loop predates the emergence of private speech, inner speech development cannot be driven by private speech internalization ( Hitch, Halliday, Schaafstal, & Heffernan, 1991 ; Perrone-Bertolotti, Rapin, Lachaux, Baciu, & Lœvenbruck, 2014 ). The presence of a phonological loop indeed rules out the suggestion that an earlier stage of private speech is necessary for the development of verbal mentation. However, as Al-Namlah, Fernyhough, and Meins (2006) point out, this objection misunderstands the Vygotskian position, which prioritizes the question of how language is employed in internal self-regulation above the neural or cognitive substrates that make language use possible. Put another way, the working memory approach largely confines itself to questions of what inner speech is necessary for (i.e., verbal rehearsal and recoding), whereas a Vygotskian approach describes the contingent use of inner speech as a tool for enhancing and transforming other developing cognitive functions.

Methodological Issues

As a psychological process with no overt behavioral manifestation, inner speech has traditionally been considered difficult or impossible to study empirically. However, recent methodological advances have meant that a range of direct and indirect methods exist for studying inner speech. Some methods have been designed to encourage inner speech and examine its effects; some have sought to block or inhibit inner speech and observe which other processes are also impacted. Finally, some techniques have sought to “capture” inner speech processes spontaneously, during the course of everyday life.

Questionnaires

The simplest approach to investigating inner speech is to ask people to report directly on its occurrence. Such methods are particularly valuable for investigating inner speech frequency, context dependence, and phenomenological properties, although their veridicality has often been questioned (for a recent example see Hurlburt et al., 2013 ).

Questionnaire approaches to inner speech tend to follow typical steps for scale development. For example, McCarthy-Jones and Fernyhough (2011) generated statements about the quality and structure of inner speech and submitted them to exploratory and confirmatory factor analysis in two undergraduate samples, resulting in an 18-item Varieties of Inner Speech Questionnaire (VISQ). Other self-report scales assess features such as inner speech frequency, content, and context (e.g., Duncan & Cheyne, 1999 ). Although such scales often report acceptable psychometric reliability, correlations among scales can be weak ( Uttl, Morin, & Hamper, 2011 ), indicating limited validity, or that scales are measuring different aspects of a complex, multifaceted construct.

Experience Sampling

While questionnaires are typically used to ask about inner speech in general or across a particular time period, experience sampling methods ( Csikszentmihalyi & Larson, 1987 ) aim for momentary assessments of inner speech, selected at random. The virtue of such approaches is that they avoid the need for participants to make a general judgment about the extent and nature of their inner speech, usually asking only about the contents of experience at the moment of a random alert (such as a beep).
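The scheduling logic behind such random alerts is simple to sketch. The probe count, waking-day window, and minimum inter-probe gap below are illustrative choices, not a prescribed protocol:

```python
import random


def sampling_schedule(n_probes=6, day_start=9 * 60, day_end=21 * 60,
                      min_gap=30, rng=None):
    """Draw random alert times (in minutes since midnight) across the waking
    day, at least min_gap minutes apart, so that participants cannot
    anticipate the next beep. Uses simple rejection sampling: redraw the
    whole set until the spacing constraint is satisfied."""
    rng = rng or random.Random()
    while True:
        times = sorted(rng.uniform(day_start, day_end) for _ in range(n_probes))
        if all(later - earlier >= min_gap
               for earlier, later in zip(times, times[1:])):
            return times
```

At each returned time the participant is beeped and asked only about the experience of that moment, which is what distinguishes this design from retrospective questionnaires.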

Some experience sampling techniques use the same or similar items as questionnaires that ask about inner speech; others have used diary or thought-listing techniques to prompt participants to report on their experience in a more open-ended way (e.g., D’Argembeau et al., 2011). Other researchers prefer to use detailed introspective interviews as part of their experience sampling approach. Considered methodologically problematic for a long time due to the impossibility of objective verification, introspective methods have undergone a resurgence of interest in recent years (Hurlburt & Schwitzgebel, 2007). One highly developed method, Descriptive Experience Sampling (DES), involves training participants to report on their own inner experience in the moment before a random alert, first through making brief notes for themselves and then through a detailed expositional interview. As will be discussed, using DES to assess inner speech reveals striking phenomenological richness and diversity, which in some cases appears to contradict findings from self-report questionnaires (Hurlburt et al., 2013). However, the extensive and iterative interview processes involved in DES have also been questioned for the extent to which they may shape and change the experiences that participants report (see Hurlburt & Schwitzgebel, 2007).

Private Speech as an Indicator of Inner Speech

One indirect approach to researching inner speech is through the study of what Vygotsky held to be its observable counterpart, private speech. For example, Al-Namlah et al. (2006) investigated whether Vygotsky’s ideas about the development of verbal mediation in childhood would be evidenced in a domain-general transition to verbal self-regulation. They found that use of self-regulatory private speech on a “Tower of London” task (a commonly used measure of planning, where participants must move rings on a set of poles to match a particular arrangement) correlated with the size of the phonological similarity effect, an index of inner speech use in working memory. Such a finding suggests close links between private speech and covert verbal encoding.

There are difficulties, however, with taking private speech as a direct proxy for inner speech: for instance, extensive private speech use in some children could reflect a lack of internalized inner speech, while an outwardly silent child could be using inner speech all the time. Subtle signs of inner speech can also be coded alongside private speech. For example, Fernyhough and Fradley (2005) used a coding frame (based on Berk, 1986 ) that distinguished between social speech (vocalizations during a task that were clearly addressed to someone), private speech (nonaddressed overt vocalizations), and task-relevant external manifestations of inner speech (indecipherable lip and tongue movements or silent articulatory behavior during a task).

Dual-Task Methods

Another indirect methodology that escapes some of these concerns is the use of dual-task designs. The rationale here is that interfering with or blocking inner speech, through a secondary task that prevents subvocal articulation, can be investigated in relation to deficits on a primary task (similarly to how such methods are used in working memory studies). Articulatory suppression to interfere with inner speech on cognitive tasks has been widely used in children and adults (Baldo et al., 2005; Hermer-Vazquez, Spelke, & Katsnelson, 1999; Lidstone, Meins, & Fernyhough, 2010). Ideally, articulatory suppression is deployed alongside an additional condition involving a nonverbal task, such as spatial tapping, as this allows investigators to control for general effects of dual-tasking and to identify effects specific to inner speech processes. In working memory studies, a further control task is sometimes included to interfere with the central executive: random number generation, for example, is thought both to block articulatory/phonological slave processes and to require direct attention from the central executive in order to avoid generating nonrandom number sequences (Baddeley, 1996).

Phonological Judgments

An alternative method of studying inner speech, which overlaps with methods used in auditory imagery research, is to ask participants to make judgments based on the contents of their inner speech. For example, participants may be required to judge whether given words or sentences rhyme, or count the syllables in a given word ( Filik & Barber, 2011 ; Geva, Bennett, Warburton, & Patterson, 2011 ). Such methods have been argued to provide a more objective test of inner speech use than self-report methods ( Hubbard, 2010 ). However, it should be noted that judgment tasks of this kind often assume that phonological properties of inner speech are in some way being consulted, rather than the decision being based on other available stimulus information (rhyming judgments, for instance, could also be based on orthographic features of word stimuli).

Neuroimaging and Neuropsychology

Finally, a number of studies have used either functional neuroimaging techniques or neuropsychological case studies to examine the neural substrates of inner speech. Such studies have been conducted since the earliest days of neuroimaging (McGuire et al., 1995), and have been driven primarily by an interest in the possible role of inner speech in the experience of auditory verbal hallucinations (see Adult Psychopathology), although neuroimaging research on verbal working memory (e.g., Marvel & Desmond, 2010) and imagery for speech (Tian & Poeppel, 2010) has also made an important contribution.

Typical inner speech elicitation methods include subvocal articulation of words and sentences or imagining speech with varying characteristics (e.g., first- vs. third-person speech, or fast vs. slow speech; Shergill et al., 2002). Such studies have been criticized for their lack of ecological validity in eliciting inner speech, and for their failure to recognize the possibility of inner speech continuing during baseline assessments (Jones & Fernyhough, 2007). Furthermore, some elicitation paradigms for inner speech have not adopted behavioral controls to check whether inner speech is actually produced during scanning experiments, relying instead on participants’ self-reported compliance with the task (this problem is also faced in auditory imagery research; see Zatorre & Halpern, 2005, for a discussion). Approaches for counteracting this include the administration of behavioral tasks that require internal phonological judgments: asking participants to judge the metric stress of simple words, for example, is thought to require internal inspection of speech (Aleman et al., 2005). Neuroimaging findings relating to inner speech are considered in Inner Speech in the Brain.

Development of Inner Speech

Studying the development of inner speech can give us important information about its phenomenological qualities and psychological functions. Researching inner speech in childhood presents specific methodological challenges, including participants’ compliance with dual-task demands (e.g., articulatory suppression), limitations on the richness of child participants’ experience sampling reports, and age-related restrictions on neuroimaging.

Private Speech as a Precursor of Inner Speech

The methodological challenges that attend the study of inner speech have led to a focus on its observable developmental precursor, private speech, as a window onto its development. Key questions that have been examined include the emergence and apparent extinction of private speech, the social context within which self-directed speech is observed, and the role of verbal mediation in supporting specific activities. Much of the prior literature on private speech was outlined in an extensive review by Winsler (2009) ; accordingly, this section provides a brief overview of private speech findings in children, with reference to some more recent studies.

As noted above, private speech is an almost universal feature of young children’s development. It was first described in the 1920s by Piaget, who considered it evidence of young children’s inability to adapt their communications to a listener (hence his term egocentric speech). Private speech has subsequently been shown to have a significant functional role in the self-regulation of cognition and behavior. Typically emerging with the development of expressive language skills around age 2–3, private speech frequently takes the form of an accompaniment to or commentary on an ongoing activity. A regular occurrence between the ages of 3 and 8, private speech appears to follow a trajectory from overt task-irrelevant speech, to overt task-relevant speech (e.g., self-guiding comments spoken out loud), to external manifestations of inner speech (e.g., whispering, inaudible muttering; Berk, 1986; Winsler, Diaz, & Montero, 1997).

In line with Vygotsky’s theory, the occurrence of self-regulatory private speech is associated in some studies with task performance and task difficulty (e.g., Fernyhough & Fradley, 2005 ), and demonstrates some of the structural changes, such as abbreviation, hypothesized to attend internalization ( Goudena, 1992 ). There is evidence to support Vygotsky’s claim that self-regulatory speech “goes underground” in middle childhood to form inner speech, with private speech peaking in incidence around age 5 ( Kohlberg, Yaeger, & Hjertholm, 1968 ) and then declining in parallel with a growth in inner speech use ( Damianova, Lucas, & Sullivan, 2012 ) as defined by Fernyhough and Fradley’s (2005) criteria. However, there is also evidence for continuing high levels of private speech use well into the elementary school years ( Berk & Garvin, 1984 ; Berk & Potts, 1991 ) and indeed into adulthood ( Duncan & Cheyne, 2001 ; Duncan & Tarulli, 2009 ). Examples of continued use of private speech, however, do not necessarily indicate similar functions or benefits for performance: comparing verbal strategy use on cognitive tasks in children aged 5–17, Winsler and Naglieri (2003) showed that 5-year-olds but not older children performed better on tasks when they used more overt speech, even though private speech persisted well beyond this age.

Despite the proposed origins of private speech in social interaction (Furrow, 1992; Goudena, 1987), social influences on its use have not been studied extensively in recent years. In one recent exception, McGonigle-Chalmers, Slater, and Smith (2014) studied the extent to which private speech use is moderated by the presence of another person in the room when 3- to 4-year-old children attempted a novel sorting task. Out-loud commentaries—which typically narrated or explained what was happening during the task—were significantly more prevalent when another person was in the room, suggesting a social, declarative function of private speech. Ratings were also made of incomplete or mumbled speech commentaries, which were suggestive of inner speech being used during the task, but notably these did not change significantly with the presence or absence of another person. Thus, the production of overt private speech may be socially sensitive, while inner speech and more covert processes retain a private, self-directed role.

These findings are in line with Vygotsky’s original observation that private speech depends on children’s understanding that they are in the presence of an interlocutor who can understand them. They are also consistent with his view that private speech emerges through a differentiation of the regulatory function of social speech, with speech that was previously used to regulate the behavior of others gradually becoming directed back at the self. Finally, they are congruent with Piaget’s (1959) interpretation of private speech as representing a failed attempt to communicate, and with Kohlberg, Yaeger, and Hjertholm’s (1968) characterization of private speech as a “parasocial” phenomenon.

The social relevance of private speech is also supported by recent research on imaginary companions in childhood. Davis, Meins, and Fernyhough (2013) studied private speech during free play and imaginary companion (IC) status in a large sample of 5-year-olds ( n = 148). Children with an IC used significantly more covert private speech during free play than those without an IC, a relation that was evident even when controlling for effects of socioeconomic status, receptive language skill, and total number of utterances. Although a causal direction cannot be specified, these findings suggest that individual differences in creative and imaginative capacities are important to consider in gauging the developmental role of private speech.

Thus, while Vygotsky’s model of the developmental significance of self-directed speech has been well supported by empirical research, private speech may have functions that go beyond self-regulation of cognition and behavior. Private speech appears to have a role in emotional expression and regulation ( Atencio & Montero, 2009 ; Day & Smith, 2013 ), planning for communicative interaction ( San Martin, Montero, Navarro, & Biglia, 2014 ), theory of mind ( Fernyhough & Meins, 2009 ), self-discrimination ( Fernyhough & Russell, 1997 ), fantasy ( Olszewski, 1987 ), and creativity ( White & Daugherty, 2009 ). Engaging in private speech has also recently been proposed to have a role in the mediation of children’s autobiographical memory ( Al-Namlah, Meins, & Fernyhough, 2012 ). It seems likely that private speech is a multifunctional phenomenon; comparisons with the functionality of its putative counterpart, inner speech, are considered below.

The Cognitive Functions of Inner Speech in Childhood

Children’s adoption of inner speech is evidenced relatively early in development in the apparent emergence of the phonological similarity effect around age 7 ( Gathercole, 1998 ). The effect is typically evidenced when visually presented items that are phonologically similar prove harder to recall than phonologically dissimilar items, due to interference between item words that sound the same. When children are asked to learn a set of pictures, those aged 7 and over tend to exhibit a phonological similarity effect, suggesting that visual material is being recoded into a verbal form via subvocal rehearsal (i.e., inner speech). Children younger than 7, in contrast, tend not to demonstrate this effect, suggesting an absence of verbal rehearsal strategies ( Henry, Messer, Luger-Klein, & Crane, 2012 ).

This conclusion has recently been questioned by Jarrold and Citroen (2013), who argue that the apparent emergence of the phonological similarity effect at age 7 does not necessarily reflect a qualitative change in strategy. In a study of 5- to 9-year-old children, they tested recall for verbally versus visually presented items, while also varying the mode of response (verbal or visual reporting), to examine whether verbal recoding of visually presented items specifically showed a change with age. While visual encoding plus verbal reporting demonstrated the most prominent phonological similarity effect, interactions between age and similarity were evident in each condition; that is, even when verbally recoded rehearsal was not specifically required. In addition, a simulation model indicated that the lack of an effect in younger children could be explained by floor effects in recall for other, dissimilar items to be remembered. Thus, evidence of phonological similarity effects may emerge around age 7 not because of an adoption of rehearsal strategies at this time, but as a result of gradual changes in overall recall skill.
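The floor-effect argument can be made concrete with a toy model (this is not Jarrold and Citroen’s own simulation; the logistic form and every parameter value here are assumptions chosen only to illustrate the logic). Even if phonological similarity imposes the same recall penalty at every age, the measurable effect shrinks toward zero whenever overall recall is near floor:

```python
import math


def p_recall(ability, difficulty):
    """Logistic probability of recalling one item."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def similarity_effect(ability, base_difficulty=2.0, similarity_penalty=1.5):
    """Observed phonological similarity effect: the gap in expected recall
    between dissimilar and similar lists. The penalty is held constant
    across ability levels, mimicking a strategy that does not change
    with age."""
    return (p_recall(ability, base_difficulty)
            - p_recall(ability, base_difficulty + similarity_penalty))
```

With a low ability value (a younger child, recall near floor) the gap between list types is tiny; with higher ability it opens up, so the effect appears to “emerge” with age without any qualitative change in strategy.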

Jarrold and Citroen’s finding does not undermine the idea that children may generally tend to utilize verbal rehearsal more with age, but suggests that the presence or absence of a phonological similarity effect should not be taken to indicate a specific, qualitative shift in children’s inner speech strategies (see also Jarrold & Tam, 2010 ). Moreover, it highlights the need (also considered by Al-Namlah et al., 2006 ) to evaluate children’s use of verbal strategies in the context of their other skills, such as STM capacity.

Whether or not children’s use of inner speech undergoes a qualitative change in early to middle childhood, there is good evidence to suggest that it plays an increasingly prominent role in supporting cognitive operations in this developmental period. Most of the work in this area has concerned the role of verbal strategies in supporting complex executive functions such as cognitive flexibility and planning. Concerning the former, the ability to represent linguistic rules to guide and support flexible behavior has been proposed as a core part of executive functioning development during childhood ( Zelazo, Craik, & Booth, 2004 ; Zelazo et al., 2003 ). In general, younger children (3- to 5-year-olds) will struggle with tasks requiring a switch between two different response rules, whereas older children will not. Evidence to suggest that this involves verbal processes is provided by reductions in performance on such tasks under articulatory suppression (e.g., Fatzer & Roebers, 2012 ) and improvements in performance when participants are encouraged to use verbal cues ( Kray, Gaspard, Karbach, & Blaye, 2013 ). Younger but not older children appear to benefit from the prompt to use verbal labels, both on switching tasks ( Kray, Eber, & Karbach, 2008 ) and in other contexts (see Müller, Jacques, Brocki, & Zelazo, 2009 , for a review), suggesting a lack of spontaneous inner speech use at younger ages.

What exactly inner speech is doing to support performance in this way is not always clear: in a review of child and adult switching studies, Cragg and Nation (2010) noted that verbalized strategies speed up performance on switch and nonswitch trials but do not necessarily facilitate the act of switching itself. If so, this would suggest that inner speech is helping to maintain a specific response set, or is acting as a reminder of task and response order, rather than being involved in flexible responding per se. In any case, use of inner speech appears to become a key strategy in switching tasks during childhood, and there is evidence of this strategic use continuing into adulthood (e.g., Emerson & Miyake, 2003; see Cognitive Functions of Inner Speech in Adulthood).

Research on planning and verbal strategies in childhood has almost exclusively been conducted using tower tasks, such as the Tower of London task ( Shallice, 1982 ) or the very similar Tower of Hanoi puzzle. As noted previously, tower tasks require participants to move a set of rings or disks from one arrangement to another across three columns. Although fundamentally a visuospatial problem, the number of possible moves to a solution creates a problem-space bigger than visuospatial working memory capacity will typically allow, meaning that verbal strategies often come into play.
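The size of that problem space is easy to make concrete. A short enumeration (assuming the standard three-ball version with peg capacities of 3, 2, and 1, as in Shallice’s apparatus) counts the legal configurations:

```python
from itertools import product
from math import factorial


def count_tol_states(capacities=(3, 2, 1), n_balls=3):
    """Count legal Tower of London configurations: assign each distinct ball
    to a peg without exceeding that peg's capacity, then count the possible
    stacking orders on each peg."""
    total = 0
    for assignment in product(range(len(capacities)), repeat=n_balls):
        counts = [assignment.count(peg) for peg in range(len(capacities))]
        if all(c <= cap for c, cap in zip(counts, capacities)):
            orders = 1
            for c in counts:
                orders *= factorial(c)  # distinct stacking orders on this peg
            total += orders
    return total
```

This yields 36 distinct states; since solving a problem means holding candidate sequences of moves through this space in mind, the load quickly outstrips visuospatial working memory alone, which is where verbal strategies come in.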

Private speech on such tasks has been observed to increase in relation to task difficulty in children ( Fernyhough & Fradley, 2005 ) and correlates with other indicators of verbal strategy use, such as susceptibility to the phonological similarity effect on STM tasks ( Al-Namlah et al., 2006 ). Concerning inner speech specifically, Lidstone, Meins, and Fernyhough (2010) compared Tower of London performance in children under articulatory suppression, foot-tapping, and normal conditions. Performance (as indicated by percentage of correct trials) was selectively impaired during articulatory suppression, and the size of the performance decrement correlated with private speech use in the control condition, although this was only evident when participants were specifically instructed to plan ahead. Effects of articulatory suppression on Tower of London performance have also been reported in the control groups of typically developing children in studies on autism (e.g., Wallace, Silvers, Martin, & Kenworthy, 2009 ), but these effects have not always been clearly separable from other dual-task demands ( Holland & Low, 2010 ).

The apparent use of verbal strategies in recall, switching, and planning tasks, and correlations among them (e.g., Al-Namlah et al., 2006), are suggestive of a domain-general shift to verbal mediation in early childhood, affecting processes as different as STM and problem-solving. However, it seems likely that inner speech use across domains may still follow separable trajectories and be guided by the specific demands of each task. The data from studies of cognitive flexibility and other executive domains suggest that, even within a given task, inner speech will only be a useful strategy in particular conditions: naming stimuli, for example, appears to speed up response execution, but naming the response required (e.g., stop or go) does not (Kray, Kipp, & Karbach, 2009). There is also still a relative lack of research comparing strategy use across multiple domains. In one recent exception, Fatzer and Roebers (2012) observed strong effects of articulatory suppression on complex memory span (i.e., working memory), medium effects on a measure of cognitive flexibility, and little effect on a test of selective attention. If these processes are taken to follow separate rates of maturation, it seems likely that inner speech offers a domain-general tool that is only selectively deployed when it is most relevant and beneficial to the executive functioning process at hand.

How do Children Experience Inner Speech?

Asking people to reflect on the subjective qualities of their inner experience is fraught with difficulties, and the challenges are arguably more acute when working with children. Some attempts have been made to use experience sampling methods with children, although they have not to date focused on inner speech. For example, Hurlburt ( Hurlburt & Schwitzgebel, 2007 , p. 111, Box 5.8) used DES with a 9-year-old boy, who noted that the construction of a mental image (of a hole in his backyard containing some toys) took a considerable amount of time to complete. Complex or multipart images are known to take longer to generate than simple images ( Hubbard & Stoeckig, 1988 ; Kosslyn et al., 1983 ), and this may particularly be the case for visual imagery in children. If this were to apply also to inner speech, it suggests that the phenomenology of verbal thinking in children may lack a certain richness and complexity. In a series of studies, Flavell and colleagues (e.g., Flavell, Flavell, & Green, 2001 ; Flavell, Green, & Flavell, 1993 ) also found limited understanding of inner experience (such as of the ongoing stream of consciousness assumed to characterize many people’s experience) in preschool children. This can be interpreted either in terms of young children’s weak introspective abilities ( Flavell et al., 1993 ) or in terms of young children lacking adult-like inner speech, as a result of the time it takes to become internalized ( Fernyhough, 2009b ).

Children’s reluctance to report on inner speech, coupled with their apparent lack of awareness of it, should not necessarily be taken as indicating that they do not experience it in any form. The suggestion of links between private speech and various imaginative and creative activities, such as engaging with an imaginary companion ( Davis et al., 2013 ), also raises the interesting question of whether inner speech plays a similar role in the inner experience of young children. The development of better methods to investigate inner speech phenomenology in children is needed to begin to answer this and related questions.

Inner Speech in Adult Cognition

Cognitive functions of inner speech in adulthood.

Inner speech in adulthood has largely been studied as a cognitive tool supporting memory and other complex cognitive processes (see Sokolov, 1975 , for an early review). Although inner speech has frequently been claimed to be important for problem-solving across different contexts, a precise account of its cognitive functions requires examination of its deployment in different task domains.

As in research with children, studies on inner speech function in adulthood have largely focused on its role in verbal STM and executive function. The use of inner speech as a rehearsal tool in working memory is perhaps its most well-known function: verbal rehearsal can refresh the memory trace continuously, provided articulation is not suppressed, and this will reliably lead to better recall ( Baddeley, 1992 ). Even if articulation is blocked, there is evidence that the phonological store—or “inner ear”—can still maintain some phonological information, albeit in a state where it is more liable to interference and decay ( Baddeley & Larsen, 2007 ; Smith, Reisberg, & Wilson, 1992 ). Articulatory suppression also removes the word-length and phonological similarity effects typically observed for verbal rehearsal ( Baddeley, 2012 ). Contemporary research on verbal working memory in adults is extensive and will not be discussed here (see, e.g., Camos, Mora, & Barrouillet, 2013 ; Macnamara, Moore, & Conway, 2011 ). Regarding executive functions, most research has again focused on cognitive flexibility (via sorting/switching tasks) and planning (via tower tasks).

In adults, inner speech continues to be implicated in tasks that require switching between different responses and rules ( Baddeley, Chincotta, & Adlam, 2001 ), as it does in children ( Cragg & Nation, 2010 ). For example, Emerson and Miyake (2003) compared switching performance across a range of experiments using articulatory suppression and a foot-tapping control. The deployment of articulatory suppression consistently disrupted performance by increasing the “switch cost” between trials requiring different arithmetic rules, suggesting that inner speech acted as a tool to prepare and smooth transitions between trials. In addition, this effect was specifically moderated by the types of task cues deployed: task conditions with explicit cues reduced the effect of articulatory suppression, suggesting that inner speech was not required when the task materials sufficiently supported the required mode of response. Task difficulty, in contrast, made no difference to the articulatory suppression effect. These results suggest that inner speech facilitated performance by specifically acting as a mnemonic cue for how to respond, when such cues were lacking in the task itself.

Supporting evidence for the relevance of cue types to inner speech was provided in a follow-up experiment by Miyake, Emerson, Padilla, and Ahn (2004) . When switch costs were compared for judgment tasks with full word (e.g., SHAPE) or single letter (S . . .) cues, articulatory suppression increased the switch costs for the latter but not the former. This was interpreted by Miyake et al. as evidence that inner speech was required for switching where it played a non-negligible role in the retrieval of relevant information. That is, blocking inner speech only really mattered when inner speech was needed to “fill out” the cues in the task; more explicit cues such as a full word did not recruit inner speech to the same degree, and thus no additional switch cost was induced. This filling out of a response is in some ways analogous to effects that have been observed for auditory imagery, where participants can have vivid and sometimes involuntary auditory experiences in the gaps of familiar songs or other sounds (e.g., Kraemer, Macrae, Green, & Kelley, 2005 ). Taken together, these studies suggest that inner speech has a beneficial effect on performance (by minimizing costs associated with switching), but only in conditions where verbalization seems to somehow complete the information set needed for an efficient and consistent response.

For planning, in contrast, there is perhaps less evidence for inner speech having a central role in adult task performance. While Williams et al. (2012) reported an increase in the number of moves used by adults attempting a tower task under articulatory suppression, Phillips, Wynn, Gilhooly, Della Sala, and Logie (1999) previously found no effect of interfering with inner speech on planning skills. An individual differences analysis by the latter group indicated that tower performance was closely related to visuospatial rather than verbal working memory skills ( Gilhooly, Wynn, Phillips, Logie, & Della Sala, 2002 ; see also Cheetham, Rahm, Kaller, & Unterrainer, 2012 ). Similarly, in a virtual-reality study of multitasking that included the requirement to adjust complex plans midway through a task, Law et al. (2013) reported no effect of articulatory suppression on adult performance, but effects of random number generation (posited to block general executive resources) and concurrent auditory localization (requiring spatial working memory).

Inconsistencies in the planning literature suggest that, while children may deploy private and inner speech during common planning tasks, adults appear to rely less on these strategies. What is important to bear in mind with such tasks, though, is that they largely require planning within a visuospatial domain. Tower tasks can be planned verbally, but task execution and the representation of possible states are still fundamentally visuospatial activities. That is, it is not clear that the creation and implementation of verbal plans would be the optimal strategy on such tasks, even if children and adults spontaneously self-talk when they attempt them. Similarly, standard multitasking tasks (e.g., Law et al., 2013 ) often require navigation around a spatial array or environment: verbal processes may help to set up a plan, but are arguably unlikely to take priority over visuospatial skills during the execution of a plan. The contrast between child and adult deployment of self-directed speech could reflect the relative weakness of visuospatial working memory in the former ( Gathercole, Pickering, Ambridge, & Wearing, 2004 ; Pickering, 2001 ), leading to compensatory use of verbal strategies to “bootstrap” performance.

Another skill closely related to planning is logical or propositional reasoning. A prima facie assumption may be that, if inner speech plays an integral role in certain higher cognitive processes, it would be most likely to support explicitly verbal forms of inference, such as reasoning about verbal propositions or syllogisms. Evidence to support this proposition, however, is mixed. Verbal working memory appears to be important for maintaining information about logical premises, particularly when information is encountered sequentially, but generally verbal interference does not impair this kind of reasoning any more than visuospatial forms of interference, such as spatial tapping ( Gilhooly, 2005 ). There may be individual differences in strategy use during reasoning, with participants varying in the extent to which they report predominantly verbal or visual strategies. These individual differences appear to relate to variation in verbal and spatial working memory skills, but do not necessarily translate into differences in reasoning skill ( Bacon, Handley, & Newstead, 2005 ). Similarly, matrix reasoning tasks, which predominantly consist of visuospatial stimuli but which can be solved using various visual or verbal strategies, do not appear to be specifically affected by articulatory suppression: for instance, Rao and Baddeley (2013) compared effects of number repetition (articulatory suppression) and backward counting (central executive interference) on matrix reasoning, and found that only the latter negatively affected the time it took to reach a solution. Thus, inner speech does not appear necessary for tasks involving logical reasoning, even for verbal material.

Beyond its putative roles in task-switching, planning, and logical reasoning, inner speech has been hypothesized to be involved in a range of other processes, including reasoning about others, spatial orientation, categorization, cognitive control, and reading. Two studies have used verbal shadowing (the immediate repetition of verbal material, postulated to block subvocal articulation) to investigate the role of language in false-belief reasoning. Newton and de Villiers (2007) compared verbal shadowing and rhythmic tapping effects on nonverbal reasoning in a sample of adults. Success rates were significantly lower for false-belief reasoning during verbal interference, but not spatial interference. In contrast, judgments about true belief were accurate across all conditions, demonstrating the specificity of the verbal effect to false-belief attribution. A more recent study by Forgeot d’Arc and Ramus (2011) also observed an interference effect of verbal shadowing, but this was not specific to false-belief reasoning; shadowing also affected reasoning about other mental states (such as agents’ goals) and mechanistic reasoning.

Employing similar techniques, Hermer-Vazquez et al. (1999) showed that verbal shadowing interfered with performance on a task requiring integration of geometric and color information, suggesting a role for inner speech in the labeling and binding of information across modalities. Using a verbal distractor task (number repetition), Lupyan (2009) reported specific effects of verbal interference on categorization skills in adults when they were asked to classify pictures based on a single perceptual dimension (e.g., color) while ignoring other relevant dimensions (such as shape). Finally, Tullett and Inzlicht (2010) compared adult response inhibition skills on a Go-NoGo task under articulatory suppression, spatial tapping, and control (single-task) conditions. Compared with spatial tapping, articulatory suppression was associated with a greater number of commission errors, an effect that was particularly exacerbated when a switching component was added to the inhibition task.

Inner speech also appears to be an important part of silent reading (see Perrone-Bertolotti et al., 2014 , for a recent review). Many people appear to evoke auditory imagery for speech while they read, and there is evidence that it retains some of the properties of external, heard speech. For instance, Alexander and Nygaard (2008) played a conversation involving two voices with different speaking rates (one fast, one slow), and then asked participants to read passages apparently written by the people whose voices they had heard. For easy texts read out loud, passages “written” by the slow voice tended to be read more slowly than those associated with the fast voice; reading silently showed no effect of voice. But for more difficult texts, both out-loud and silent reading showed evidence of being read according to the speed of speech that was previously heard. This effect also showed evidence of individual differences: those who self-reported low imagery skills only showed a voice effect on their silent reading for difficult texts, but those with high imagery skills showed the effect for easy and difficult passages of text. Thus, more complex or challenging conditions appear to prompt inner-speech-like experiences as a complementary tool during reading, but for some people this experience will persist even during easy reading.

Elsewhere, self-talk (both overt and covert) has been proposed to play a significant role in behavioral control and motivation during competition and high-performance sport (see Hardy, 2006 , for a review). For instance, Hatzigeorgiadis, Zourbanos, Mpoumpaki, and Theodorakis (2009) compared the effect of self-talk training on tennis players’ performance, confidence, and anxiety. Participants were randomly assigned to either three training sessions that emphasized use of motivational and instructional self-talk (e.g., “go, I can do it,” or “shoulder, low”) or control sessions that included a tactical lecture on use of particular shots. Players trained to use self-talk showed improvements in task performance (a forehand drive test), and also reported increased self-confidence and decreased anxiety, whereas no such changes were observed for the control group.

Effects of self-talk and “verbal self-guidance” are also extensive in organizational and educational psychology studies (e.g., Brown & Latham, 2006 ; Oliver, Markland, & Hardy, 2010 ), and the use of self-talk to instruct and motivate in sport and other performance-related fields is largely consistent with the view that inner speech has a primary role in self-awareness and self-evaluation ( Morin, 2005 , 2009a ). However, research in this area has not typically distinguished between overt and covert forms of speech, making it hard to draw strong conclusions about the role that specifically internal representation of speech might play.

Nevertheless, a few recent studies have asked participants to specifically engage in imagined self-talk, and then examined the impact on motivation and behavioral control. For instance, Senay, Albarracín, and Noguchi (2010) compared the impact of interrogative and declarative self-talk on participants’ anagram performance and intention to exercise. Imagining questions in inner speech prior to starting the task (e.g., questions such as “Will I . . .?”) was associated with better anagram performance and stronger intention to exercise compared with imagining declarative statements (e.g., “I will . . .”), with the latter effect being mediated by changes in the intrinsic motivation to exercise. Similar effects were found in a second study by Dolcos and Albarracín (2014) that compared inner speech in the first and second person, with prompts to imagine giving advice in the form of “You . . .” leading to better performance and motivation than imagined speech in the first person: “I can do this.” Such protocols have their limitations: they do not include a check that participants were actually engaging in the kinds of self-talk they were instructed to use, nor whether participants also deployed self-talk during the subsequent performance tasks (i.e., anagrams). But they are notable for highlighting how even small changes in the grammar and reference of self-talk could impact upon task motivation, and for their consistency with dialogic approaches to everyday inner speech. Indeed, Dolcos and Albarracín (2014) explicitly note that the use of second-person inner speech could reflect the putative social origins of regulatory inner speech, suggesting that “initial external encouragements expressed using You may become internalized and later may develop into self-encouragements” (p. 641).

Finally, the adoption of inner speech or other verbal strategies can, in some instances, be counterproductive to particular cognitive processes. The capacity for verbal labels or narratives to reshape memories and other cognitive representations has long been noted: for example, Loftus and Palmer (1974) demonstrated that the use of words like smashed instead of hit led to greater estimates of car collision speed for eyewitnesses of an accident. Verbal redescription of prior events has been most extensively studied via the phenomenon of “verbal overshadowing,” a term coined by Schooler and Engstler-Schooler (1990) following evidence that verbal description of the perpetrator of a crime was associated with a 25% reduction in recognition of the perpetrator’s face. Subsequent studies using a range of tasks have reported evidence of verbal labels appearing to reduce or distort accurate recall ( Meissner & Brigham, 2001 ). Candidate explanations for verbal overshadowing have included interference effects from verbal content, a shift in processing focus in the translation to verbal information (from global and holistic to local and specific), and changes in decision criteria that result from verbal recoding ( Chin & Schooler, 2008 ).

However, overshadowing effects have also proved hard to replicate, with recent studies reporting much lower effect sizes than those in Schooler and Engstler-Schooler’s original study. A recent “registered replication” attempt ( Alogna et al., 2014 ), conducted across 31 labs, found that verbal overshadowing reduced recognition accuracy by 4%–16%, depending on how close to the original event a verbal description was made. Although it is unclear how frequently and with what strength such effects occur, their existence highlights the fact that a verbal strategy will not always be a complementary tool, and may even obscure the original representation (phonemic similarity effects following verbal recoding of visual material could also be considered an example of a maladaptive verbal strategy).

How Do Adults Experience Inner Speech?

The phenomenology of inner speech in adulthood has been investigated using two main methods: questionnaires and experience sampling. Questionnaires have the advantage of allowing data gathering from large samples in a single testing session; experience sampling, in contrast, is typically conducted with smaller numbers but can provide rich and idiographically detailed information ( Alderson-Day & Fernyhough, 2014 ).

A variety of self-report questionnaires and listing methods have been used to assess adults’ inner speech, including the Scale for Inner Speech ( Siegrist, 1995 ), the Self-Verbalization Questionnaire ( Duncan & Cheyne, 1999 ), the Self-Talk Use Questionnaire ( Hardy, Hall, & Hardy, 2005 ), and the Self-Talk Scale ( Brinthaupt, Hein, & Kramer, 2009 ). The focus of these instruments has, however, been on the context and functions of self-talk, rather than its phenomenological properties, and they have not clearly discriminated between overt self-talk and inner speech (see Hurlburt & Heavey, 2015 , for a critique).

Nevertheless, such scales shed some light on intuitive or everyday views on the functions of self-directed speech. For example, Morin, Uttl, and Hamper (2011) surveyed 380 undergraduates’ views on inner speech in an open-format procedure where participants were asked to list “as many verbalisations as they typically address to themselves” (p. 1715). Common contents of inner speech were self-addressed evaluations and emotional states, while the most common functions listed were mnemonic functions (reminders to do things) and planning. This was interpreted by the authors as supporting a primarily self-reflective role of inner speech in everyday cognition, along with its importance as a tool for thinking about the future ( Morin et al., 2011 ). Their findings echoed earlier studies of self-verbalization, which also highlighted frequent reports of evaluative and mnemonic experiences in inner speech ( Duncan & Cheyne, 1999 ).

Some studies have sought to explore how frequently positive and negative content occurs in self-talk, and what effect this has on other factors, such as mood. For instance, Calvete et al. (2005) developed scales of Negative and Positive Self-Talk and explored their correlates for psychopathology traits in a large sample of Spanish students ( n = 982). The negative scale included items reflecting anxious, depressive, and angry self-talk, while the positive scale included items on coping, minimization of worries, and positive orientation. As might be expected, many of the positive and negative subscales were significantly associated with trait measures of psychopathology: for instance, trait depression was strongly predicted by depressive self-talk and trait anxiety by anxious and depressive self-talk. Associations for the positive subscales were more varied: minimizing inner speech was negatively associated with anxiety and anger but not depression, while positively oriented self-talk was linked to lower depression but higher levels of anger. Such results reflect the intuitive idea that inner speech is involved in the representation of everyday worries and low mood, but they also highlight a problem of construct validity: if depressive self-talk strongly predicts depressive traits, how clear is it that two separate phenomena are being measured? That is, to what extent do relations between valenced self-talk and mood reflect content overlaps in self-report measures?

The only self-report scale directly focused on the experience of inner speech is the Varieties of Inner Speech Questionnaire (VISQ; McCarthy-Jones & Fernyhough, 2011 ). Development of the VISQ was motivated by a recognition that existing operationalisations of inner speech had been based on relatively impoverished conceptions of the phenomenon, along with an ambition to investigate aspects of inner speech, such as dialogicality and condensation, that are important in Vygotsky’s theory. Using data from separate exploratory and confirmatory samples of university students, factor analysis of the scale highlighted four underlying factors: dialogic inner speech , or the tendency to engage in inner speech with a back-and-forth, conversational quality; condensed inner speech , the experience of inner speech in an abbreviated or fragmentary form; other people in inner speech (i.e., representation of others’ voices, or inner speech saying something that someone else would usually say); and evaluative/motivational inner speech , where inner speech serves to judge or assess one’s own behavior. Of these, evaluative/motivational inner speech was the most commonly endorsed: 82.5% of responses indicated at least some experience of those characteristics. Dialogic inner speech was almost as prevalent (77.2%), while condensed inner speech (36.1%) and the presence of other people in inner speech (25.8%) were less common, though still reported by a substantial minority.

Although they did not specifically ask about emotional content of inner speech, the VISQ factors also picked out tendencies toward negative emotional states: evaluative inner speech and the presence of other people in inner speech were both positively associated with trait anxiety and, to a lesser extent, depression. In a separate study ( Alderson-Day et al., 2014 ), frequencies for the VISQ factors were closely replicated in a third student sample, and showed a further link to emotional functioning: evaluative inner speech, but not other kinds of inner speech, negatively predicted levels of global self-esteem. In addition to being specific to inner speech (rather than an unspecified mixture of overt and covert self-talk), studies with the VISQ contrast with Calvete et al. (2005) by not referring to positive or negative inner speech content directly, and yet still demonstrating links between inner speech and mood, thus avoiding concerns about content overlap.

In contrast to questionnaires, which largely focus on trait-like qualities of inner speech, experience sampling methods seek to capture moments of spontaneous experience. In one of the first studies to apply such methods to inner speech ( Klinger & Cox, 1987 ), college students were asked to complete a short questionnaire on their inner experience following a series of random beeper alerts. Thoughts containing “interior monologue” were reported in roughly three quarters of samples, alongside regular experience of visual imagery. Experience sampling studies of inner speech since then have largely been restricted to Hurlburt’s Descriptive Experience Sampling method, which is predicated on the bracketing of presuppositions about the frequency and form of inner experience ( Hurlburt & Heavey, 2006 ). In DES, participants only report on moments of experience that occurred immediately prior to random beep alerts (normally 1–2 s), and are encouraged to avoid generalizations about how they usually think or “what they always do.” Hurlburt, Heavey, and Kelsey (2013) argue that one result is that DES provides a more accurate indication of the frequency of inner speech, and that generally this is much lower than other estimates, occurring in around 20%–25% of random samples (although see Alderson-Day and Fernyhough, 2014 ).

In addition, DES has provided an exceptionally rich body of data on the many forms that inner speech can take ( Hurlburt et al., 2013 ). Preferring the term inner speaking to inner speech (in order to emphasize its active nature), Hurlburt et al. note several key features of the phenomenon: individuals typically apprehend themselves to be speaking meaningfully in the absence of vocalizations; these experiences are generally in the person’s own voice, with its characteristic rhythm, pacing, tone, and so forth; the utterances are similar in form to external speaking, and bear the same potential emotional weight; inner speaking is generally in complete sentences, uses the same kinds of words as external speech, and can be addressed either to the self or to another; and the phenomenon is apprehended as being actively produced rather than passively heard.

A distinct form of inner experience, not reducible to inner speaking, is inner hearing , which Hurlburt defines as “the experience of hearing something that does not exist” in the individual’s immediate surroundings or external environment (p. 1485). Other categories of inner experience that are not equated to inner speaking are unsymbolized thinking , or “the experience of an explicit, differentiated thinking that does not include words, images, or any other symbols” (p. 1486), sensory awareness , and thinking (defined as a purely cognitive process without any phenomenological qualities).

Finally, in a study that could be seen as occupying a middle ground between questionnaire-based studies and experience sampling, D’Argembeau, Renaud, and Van der Linden (2011) conducted a thought diary experiment, in which participants were asked to keep track of any future-directed thoughts they had over the course of a day. Recorded thoughts (written in a notebook) were rated by participants for a variety of characteristics, such as modality (e.g., inner speech, visual), affective content, and personal importance, and coded by experimenters for function, time specificity, and valence. Experiences of inner speech were particularly associated with action-planning and decision-making, in contrast to more visual forms of future-oriented cognition. The everyday phenomenology of inner speech thus appears to parallel its role in specific cognitive tasks, where it is deployed as a planning or deliberative tool.

What Is the Relation Between Inner Speech and Overt Speech?

Examining the relation between inner speech and its overt counterpart can enable the testing of models of inner speech production. Recall that one model, often associated with Watson (1913) , holds that inner speech is identical to external speech except that its articulatory commands are highly attenuated. A contemporary version of this model, the motor simulation hypothesis, is an example of a wider group of “embodied simulation” theories (e.g., Bergen, 2012 ), which hold that processes such as word understanding and mental imagery have similar content and structure to actions or perceptions, but with attenuated characteristics. On such a model, inner speech and overt speech should share a number of linguistic and structural features. In contrast, views of inner speech (such as Vygotsky’s) that see it as a transformed version of external speech would predict that inner speech lacks the featural richness of overt speech, and may vary in form depending on context (thus avoiding the processing costs of vividly representing speech on each occasion). For instance, Fernyhough (2004) proposed that inner speech varies with cognitive and emotional conditions between abstracted ( condensed ) and concrete ( expanded ) forms.

Researchers have addressed the question of the phenomenological richness of inner speech by studying errors and delays in its production. In a silent reading study, Filik and Barber (2011) compared eye movements in participants with northern or southern English accents when reading limericks. The poems were designed to either rhyme or clash in the participants’ normal accent (e.g., mass and glass rhyme in a northern English accent, but not a southern accent). Compared with congruent poems, limericks that did not rhyme in the participant’s accent led to disruption in eye-tracking patterns, suggesting that participants’ inner speech retained the surface-level auditory properties of their external speaking voice.

A contrasting view is provided by Oppenheim and Dell (2008) , who have argued that inner speech differs from overt speech in many of its psycholinguistic properties. Specifically, they argue that inner speech retains deep features, such as lexical and semantic information, but typically does not represent surface-level information such as phonological detail. Their evidence comes from a comparison of tongue-twister errors in overt and inner speech, in which participants report on the internal errors that they make. While overt speech errors reflected both lexical bias (the tendency to produce a real word rather than a nonword) and phonemic similarity effects (such as substituting reef for leaf ), in inner speech only the former were reported. Oppenheim and Dell interpreted this as evidence that inner speech is impoverished at the featural level.

In contrast, two studies by Corley and colleagues reported similar phoneme substitution errors in inner and overt speech, for both fluent speakers ( Corley, Brocklehurst, & Moat, 2011 ) and adults who stutter ( Brocklehurst & Corley, 2011 ). Making phoneme substitutions in inner speech would suggest that specific phonological features are encoded in inner speech and available to internal inspection. Such findings support a common view in psycholinguistic research that inner speech largely serves to support error monitoring in speech production, whereby utterances can be inspected and corrected via an “internal loop” (e.g., Wheeldon & Levelt, 1995 ).

One way to reconcile these varied findings on the phenomenological richness of inner speech is to consider how it might be affected by articulation. In follow-up work, Oppenheim and Dell (2010) showed that phonemic similarity errors do appear if participants perform the tongue-twister task with the addition of silent mouthing, but not if participants are instructed to imagine saying phrases “without moving their mouth, lips, throat, or tongue” ( Oppenheim & Dell, 2010 , p. 1552). These findings led Oppenheim and Dell to propose the flexible abstraction hypothesis , according to which there is only one kind of inner speech, represented at the level of phonemic selection, but whose representation can be modulated by articulation to include more explicit features. Thus, in cases where inner speech appears to have specific phonological features (as in Corley et al., 2011 ), this may have been due to participants deploying a form of inner speech involving a greater degree of articulation (such as silently mouthing words as they are represented in inner speech).

The reliance on participant self-report for errors in inner speech is an important limitation when interpreting these studies. As Hubbard (2010) has argued, apparent differences in phonological features between overt and covert speech may simply reflect participants’ ability to introspectively monitor and report specific features of their inner speech. However, this would not explain the presence of similar features in overt and covert speech in Corley, Brocklehurst, and Moat’s (2011) study, or when a greater level of articulation is deployed ( Oppenheim & Dell, 2010 ).

Moving beyond inner speech production to the processes involved in generating external speech, there is a large body of psycholinguistic research on the role of inner speech as a potential error monitor for external speech (e.g., Hartsuiker & Kolk, 2001 ; Nooteboom, 2005 ), a full discussion of which is beyond the scope of this article (see Hickok, 2012, for a review ). Key to most such models is that inner speech is posited as part of a speech production system involving predictive simulations or “forward models” of linguistic representations. Such forward models prepare perceptual systems for self-generated inputs: for example, producing overt speech is thought to involve the sending of an “efference copy” of the speech motor plan to speech perception areas, forming the basis for a predictive model of what the utterance will sound like, and inhibiting the ensuing auditory response ( Grush, 2004 ).

Where inner speech fits into such models is not always clear, not least because there appears to be no external percept or motor consequence to be attenuated if no sound is created. Producing inner speech can influence speech perception in ways similar to overt speech, such as priming the perception of external sounds (e.g., Scott, Yeung, Gick, & Werker, 2013 ), suggesting that it too involves the sending of efference copies to receptive areas. One possibility is that inner speech is a minimal form of overt speech that has been attenuated because it is recognised as being self-produced (for a discussion of this possibility, see Langland-Hassan, 2008 ). Alternatively, it has been suggested that inner speech in some way constitutes a featurally abstract forward model ( Pickering & Garrod, 2013 ), or that we experience phonological features in inner speech because of the sensory prediction created by a forward model ( Scott, 2013 ). As will be discussed in the final section, this has implications for models of auditory verbal hallucinations in which inner speech is proposed to be misattributed to an external source.

Inner Speech in the Brain

The similarities and differences between inner speech and external speech have also been examined in relation to underlying neural processes. Research in this area has come from studies of speech-motor processing in the brain, which have largely treated inner speech as a covert articulatory planning process (for a review, see Price, 2012 ); from researchers interested in inner speech dysfunction as a basis for psychopathology ( McGuire, Murray, & Shah, 1993 ; Shergill, Brammer, Williams, Murray, & McGuire, 2000 ; see Adult Psychopathology); and from work on the rehearsal and maintenance of verbal working memory (e.g., Marvel & Desmond, 2010 ).

A prima facie assumption might be that the neural correlates of inner speech would simply reflect an attenuated or inhibited version of neural states associated with overt speech. In support of this, activation of Broca’s area or left inferior frontal gyrus has been observed during both overt and silent articulation of words, specifically in the ventral portion of the pars opercularis ( Price, 2012 ). Alongside this, supplementary motor area (SMA) and parts of premotor cortex are often implicated, in addition to the anterior portion of the insula, although it has been claimed that the latter is more specifically tied to muscular processes required for overt speech production ( Ackermann & Riecker, 2004 ).

Based on evidence from neuropsychological studies, it has been argued that verbal working memory processes rely on a separate neural network to speech production ( Baddeley & Logie, 1999 ; see The Neuropsychology of Inner Speech). However, most recent studies have implicated similar and overlapping networks for verbal working memory maintenance and overt speech in fronto-temporal regions, along with recruitment of the cerebellum ( Marvel & Desmond, 2012 ) and posterior temporoparietal structures such as the planum temporale and inferior parietal lobule ( Andreatta, Stemple, Joshi, & Jiang, 2010 ). While the cerebellum is thought to support motor processes involved in verbal rehearsal ( Marvel & Desmond, 2010 ), the involvement of temporoparietal cortex has been proposed to reflect recruitment from long-term memory of phonological representations to support working memory maintenance ( Price, 2012 ). Activation of inferior frontal gyrus (IFG), premotor cortex, and the Sylvian parietal-temporal area (SPT) shows both load and rehearsal-rate effects during verbal working memory maintenance ( Fegen, Buchsbaum, & D’Esposito, 2015 ), while disruption to posterior superior temporal gyrus using repetitive transcranial magnetic stimulation interferes with both speech production and verbal working memory maintenance ( Acheson, Hamidi, Binder, & Postle, 2011 ).

The concurrent recruitment of inferior frontal and posterior temporal regions during inner speech is supported by earlier studies of covert speech, auditory imagery, and verbal self-monitoring. McGuire et al. (1996) asked participants either to articulate sentences silently from cue words, or to imagine them in another’s voice. (In order to distinguish it from inner speech, the latter was referred to as auditory verbal imagery .) Contrasts using PET scanning indicated that inner speech was associated with left IFG activation, while imagining another’s speech involved SMA, premotor cortex, and left superior and middle temporal gyri. As these temporal areas in particular are typically associated with speech perception, the authors suggested that this reflects a greater “internal inspection” during the generation of representations of others’ speech, driven by the need to pay particular attention to representing the phonological characteristics of another’s voice. Subsequent research has also implicated similar regions of temporal cortex in the monitoring of inner speech: Shergill et al. (2002) , for instance, reported greater activation of superior temporal gyrus, left IFG, and the pre- and postcentral gyri when participants were asked to vary the speed of their inner speech.

One problem with such studies is the lack of a behavioral control when asking participants to generate inner speech in the scanner. As noted in the auditory imagery literature ( Hubbard, 2010 ; Zatorre & Halpern, 2005 ), it is risky to rely on participants’ own reports of inner speech, even if the areas identified in such studies appear to coincide with speech production networks. One way of avoiding this is to use inner speech tasks that rely on phonological judgments. For instance, Aleman et al. (2005) used fMRI to scan participants while they either listened to or imagined hearing words that were pronounced with the stress on the first or second syllable. For both heard and imagined speech, inferior frontal gyrus, insula, and superior temporal gyrus were activated, although for the latter region only a posterior portion was active for imagined words. As this pattern of activity was not seen for a comparable task where participants had to make a semantic judgment about the words, Aleman and colleagues argued that posterior superior temporal gyrus (STG) was required for representation of metric stress in the phonological loop. This, when combined with evidence from studies of verbal working memory, would seem to support the general fronto-temporal network of areas highlighted in inner speech elicitation studies (e.g., McGuire et al., 1996 ), notwithstanding their lack of behavioral controls.

Another concern about standard neuroimaging approaches to inner speech is that they are limited by the temporal resolution of fMRI, meaning that the dynamic interplay between areas responsible for speech production and perception may be overlooked. Neurophysiological techniques, such as electroencephalography (EEG) and magnetoencephalography (MEG), offer millisecond-scale resolution, albeit usually at the expense of spatial precision within the brain.

Preliminary evidence from MEG research has highlighted potential differences in the timecourse of different kinds of inner speech, and how its production affects speech perception areas in temporal cortex. Tian and Poeppel (2010) compared MEG responses for (a) overt articulation, (b) imagining saying something in one’s own voice, (c) imagining hearing something in someone else’s voice, and (d) actually hearing another’s voice. Imagined speaking and hearing both localized to bilateral temporal cortex (which was interpreted as indicating the auditory simulation process), but imagery for speaking localized first to left parietal cortex ( Tian & Poeppel, 2010 ). In a subsequent experiment, imagery for speaking and hearing appeared to have different repetition priming effects on auditory cortical responses: the former increasing activity, and the latter inhibiting it ( Tian & Poeppel, 2013 ). Tian and Poeppel argue that these differences exist because the additional motor elements of imagined speech involve the deployment of a somatosensory forward model (i.e., not just a sensory simulation), and serve to prime auditory areas to recognise a given response (a top-down effect), rather than habituate them to an old response (a bottom-up effect; Tian & Poeppel, 2012 ). The involvement of parietal cortex is also consistent with findings from studies of mental imagery in other modalities, which often involve recruitment of the superior and inferior parietal lobules ( McNorgan, 2012 ).

One caveat in interpreting Tian and Poeppel’s findings is that they compare imagined speaking in one’s own voice with imagined hearing of another’s voice, making it hard to disentangle additional demands involved in generating one’s own voice versus another’s (cf. McGuire et al., 1996 ). In addition, they explicitly refer to their stimuli as prompting mental imagery, rather than inner speech, leaving open the extent to which their task taps processes similar to those involved in, for example, verbal rehearsal. Nevertheless, the suggested separation of “spoken” and “heard” representations in their results would be consistent with separable articulatory rehearsal and phonological store components in the phonological loop ( Baddeley & Logie, 1999 ). The results are also in line with the separate behavioral effects of the “inner voice” and “inner ear” that have been reported in auditory imagery experiments ( Hubbard, 2010 ), and with Hurlburt’s distinction between inner speaking and inner hearing ( Hurlburt et al., 2013 ).

While the above findings are informative about the neural components of inner speech, one final concern about such studies is their ecological validity. Many have largely relied on relatively simple word- or sentence-repetition paradigms, meaning that they may miss a degree of complexity and variety inherent in everyday inner speech ( Jones & Fernyhough, 2007 ). Some recent studies have reported on the neural basis of more naturalistic forms of inner speech, such as those involved in silent reading, or spontaneous cognition during verbal mind-wandering (also known as stimulus-independent thought ). For instance, Yao, Belin, and Scheepers (2011) compared brain activation during reading for direct speech ( The man said “Get in the car” ) and indirect speech ( The man said to get in the car ), on the rationale that the former likely involved specific representation of a character’s voice. Compared with passages of indirect speech, direct speech was associated with greater activation in right auditory cortex (posterior and middle superior temporal sulcus), alongside recruitment of the superior parietal lobules, precuneus, and occipital regions bilaterally. The authors argued that this reflects a more vivid perceptual simulation of the “inner voice” during reading, in a way that might be more spontaneous and ecologically valid than methods that require the top-down elicitation of specific voices in inner speech (cf. Shergill et al., 2001 ).

A second example is provided by Doucet et al. (2012) , who studied self-reported inner speech during a resting-state MRI session. A large sample of participants ( n = 307) completed a custom-designed questionnaire ( Delamillieure et al., 2010 ) about their resting cognition immediately after an 8-min scan. Participants’ reports for proportion of time spent in either inner speech or visual imagery were then assessed for their effect on connectivity within five resting brain networks selected using independent components analysis (ICA). Greater time spent using either inner speech or visual imagery was linked to reduced connectivity between two networks: the default mode network, which is usually associated with introspection and self-referential thinking ( Raichle et al., 2001 ), and a fronto-parieto-temporal network, including the inferior frontal gyrus, middle and inferior temporal gyri, angular gyrus, and precuneus. Fronto-parietal networks are often thought to support attentional focus and engagement, and in a prior study Doucet and colleagues had linked this fronto-parieto-temporal network to the maintenance of internally generated representations ( Doucet et al., 2011 ). Thus, the use of either inner speech or visual imagery in this case appeared to involve some sort of decoupling between introspective and attentional processes. Although these data are only preliminary and not specific to inner speech, they point toward the possibility of identifying separable resting networks involved in the generation and maintenance of spontaneous verbal thoughts.

Inner Speech and Variations in Linguistic Experience

One further difference between the developmental and working memory approaches to inner speech is in their relative emphasis on the influence of linguistic experience. Specifically, the Vygotskian developmental account would hold that variations in language experience should be reflected in the subsequent nature of self-directed speech. In private speech research, this idea has been tested by examining the effect of, for example, culture-specific patterns of child–adult interaction ( Berk & Garvin, 1984 ) and contrasts between collectivistic and individualistic cultures ( Al-Namlah et al., 2006 ). Perhaps reflecting the greater methodological challenges in studying speech that has been fully internalized, there are very few studies that speak to this topic in relation to inner speech.

Research on bilingualism has provided some preliminary information on the relation between inner speech and prior linguistic experience, largely via studies of second language (L2) learning. In general, use of L2 inner speech appears to increase with proficiency in the second language, but also evidences a change in function, with less fluent learners reporting use of L2 specifically for rehearsal and planning of speech, but more able speakers using it for less voluntary and more abstract modes of thinking ( de Guerrero, 2005 ). L2 learners have also been known to report a growing “din” of sounds and words from the second language in their mental experience as they become more proficient ( Krashen, 1983 ), an experience that is suggestive of the internalization or developing automaticity of thought in L2.

There is also evidence that L2 learning may have a differing impact upon inner speech and related processes depending on when it is encountered. For instance, Larsen, Schrauf, Fromholt, and Rubin (2002) studied inner speech and autobiographical memory in relation to second-language learning among Polish immigrants living in Denmark. Half of the participants were “early” immigrants, moving at the average age of 24, while the other half had moved at a later age ( M = 34 years). Despite both groups having lived in Denmark for at least 30 years, early compared with late immigrants reported greater use of Danish inner speech, while both groups tended to report autobiographical memories in Polish when the recalled events occurred prior to moving, and in Danish when the events occurred after moving. Two implications can be drawn from this study: first, that the language of inner speech is affected by the age of acquisition of a second language, and second that any such effect may be independent of a language-specificity effect linking recall of autobiographical memories to the language used at encoding.

Another approach to this question is to consider the experience of inner speech, or analogous processes such as imagery for sign language, in people who are deaf. Historically, a large body of psychological research was conducted under the mistaken assumption that people who are deaf would have no inner language facility, and would thus lack certain capacities for abstract thought (see, e.g., Oléron, 1953 ). This not only assumed an identity between language and complex thought, but also failed to recognise deaf people’s ability to use non-speech-based languages, such as signing. This impoverished view of deaf individuals’ cognition only began to be overturned in the 1960s, with the emergence of studies reporting abstract, nonverbal reasoning in deaf individuals (e.g., Furth, 1964 ) and a rise in awareness that sign-based languages are rich and complex languages in their own right ( Stokoe, 2005 ).

Since then, a broad range of studies have examined verbal and nonverbal cognitive skills in deafness (see Marschark, 2006, for a review ), although still very little is known about the use and prevalence of inner speech or sign by deaf people. From a developmental perspective, it may be expected that deaf individuals would report qualitatively different experiences of their inner speech, or be less likely to engage in certain kinds of inner speech, if their opportunities to engage communicatively with others in early childhood are constrained (over 90% of deaf children have hearing parents; Vaccari & Marschark, 1997 ). However, recent data from a questionnaire study on private sign and inner speech by Zimmermann and Brugger (2013) suggest the opposite. In a sample of 28 hearing and 28 deaf adults (of whom 20 were congenitally deaf), Zimmermann and Brugger reported regular use of “signed soliloquy”—overt signing for a private purpose—in deaf signers, which occurred with a greater frequency than did private speech in hearing participants. In addition, deaf participants reported greater use of positive/motivational “inner speech” compared with hearing participants, although the questionnaire used to measure this, the Inventory on Self-Communication for Adults, did not ask participants to distinguish whether this was a specifically verbal or signed experience. The authors interpreted both of these findings as reflecting possible use of coping strategies to counteract feelings of isolation associated with the experience of hearing impairment.

These findings point to the importance of conducting more research with larger samples of deaf individuals, and particularly the necessity of examining the influence of differing linguistic backgrounds, which in the deaf population can be very heterogeneous. That said, existing findings are at least suggestive of the possibility that inner speech or other forms of self-directed language can form part of positive compensatory strategies rather than merely being shaped by prior social interactions.

A final group of interest in this regard is adults who for various reasons have poor language skills. Alarcón-Rubio, Sánchez-Medina, and Winsler (2013) studied private speech use in illiterate adults, in comparison with those with high literacy, when engaging with a categorization task. Compared with high-literacy participants, participants with low literacy displayed much more externalized private speech, particularly on more difficult forms of the task. Such findings support the Vygotskian prediction that linguistic skills are associated with a general internalization of verbal strategies.

Inner Speech in Atypical Populations

The foregoing review has demonstrated that inner speech plays a prominent role in everyday experience and cognitive function for healthy children and adults. Important information on the psychological significance of inner speech is also provided by studies of how typical processes of inner speech development and production are perturbed in atypical populations, including developmental disorders and psychiatric illnesses in adulthood.

Developmental Disorders

One area that has seen an increased attention to inner speech is the study of autism spectrum disorders (ASDs). ASDs are characterized by difficulties in social interaction and communication alongside the presence of restricted interests and repetitive behaviors ( WHO, 1993 ). Many children with an ASD show delays in early language development, and even those with good structural language skills—such as children with Asperger syndrome (AS)—typically have enduring difficulties in communicating with others. Given the proposed grounding of inner speech in external communication and interaction, it follows that the development of inner speech is likely to be disrupted and/or delayed in individuals with an ASD ( Fernyhough, 1996 ). This in turn could have implications for the understanding of cognitive strengths and weaknesses seen in ASD, such as problems with theory of mind and executive functioning skills ( Baron-Cohen, Leslie, & Frith, 1985 ; Russell, 1997 ).

Anecdotal support for this idea comes from the descriptions of inner experience made by people with ASDs. Most notably, Temple Grandin (1995) has described her experience of thought as “thinking in pictures” rather than inner speech (for an elaboration of this idea, see Kunda & Goel, 2010 ). In a study using DES, Hurlburt, Happé, and Frith (1994) interviewed three adults with AS about their inner experience. As the authors noted, there are questions about the communicative and introspective demands of this technique for individuals with ASD. Nevertheless, two of the three participants reported uniformly visual experiences, and none of the three described experiences of inner speech or internal dialogue ( Hurlburt et al., 1994 ).

A number of subsequent experimental studies have examined inner speech in autism, primarily using paradigms from executive functioning research. Inner speech was indirectly probed by Russell, Jarrold, and Hood (1999) in a study of executive skills in children with autism and typically developing matched controls. Two tasks were deployed which either (a) did not require the maintenance of an explicit rule, or (b) demanded an overt verbal response that would conflict with maintenance of inner speech. ASD participants showed intact performance on both tasks, in contrast to evidence of executive difficulties in other studies on autism ( Hughes, Russell, & Robbins, 1994 ; Ozonoff, Pennington, & Rogers, 1991 ), which the authors argued could reflect differences in the deployment of inner speech in ASD. On the first task, no specific rule needed to be maintained in inner speech, leading to equal performance in both groups. On the second task, the requirement to respond verbally could have produced a conflict for typically developing participants if they were also maintaining a rule in inner speech, thus nullifying any advantage they might have had over participants with autism ( Russell et al., 1999 ).

Whitehouse, Maybery, and Durkin (2006) conducted the first study directly examining inner speech in ASD. On a verbal recall task, children with ASD showed a reduced picture superiority effect compared with controls—an effect which is thought to rely on dual coding of pictures visually and verbally via inner speech ( Paivio, 1991 ). In follow-up experiments, the same group of participants showed a diminished word-length effect on their verbal recall, and no effect of articulatory suppression, both of which suggested a diminished use of inner speech to support memory processes ( Whitehouse et al., 2006 ).

Further evidence of irregularities in inner speech was provided in studies by Holland and Low (2010) , Russell-Smith, Comerford, Maybery, and Whitehouse (2014) , and Wallace et al. (2009) . Wallace et al. (2009) compared problem-solving performance on the Tower of London with and without articulatory suppression in adolescents with autism and typically developing controls. Pairwise comparisons of performance in each group indicated that typically developing participants, but not ASD participants, were adversely affected by articulatory suppression, suggesting an interference effect with inner speech. It should be noted, however, that the initial group-by-condition interaction effect was not significant in this case, and the main effect of group only approached significance ( Wallace et al., 2009 ). Holland and Low (2010) also compared children with autism and typically developing controls on a towers task (the Tower of Hanoi) along with an arithmetic-based switching task. On both tasks, children with autism were affected less by articulatory suppression than were control children, and on the arithmetic task children with autism also showed proportionately greater interference from a visuospatial distractor activity. Finally, Russell-Smith et al. (2014) compared children with ASD and typically developing children on a card-sorting task under normal conditions, articulatory suppression, explicit strategy verbalization, and concurrent mouthing (included to control for nonspecific motor demands). Articulatory suppression impaired performance in typically developing children, but not ASD children. Moreover, explicit verbalization—which may have been expected to benefit the ASD group if they were not already using inner speech—only showed benefits for control participants. 
Thus, across tasks drawing on capacities for memory, planning, and cognitive flexibility, there is evidence that inner speech is less likely to be used by children with ASD than by their typically developing counterparts.

However, evidence of typical verbal strategy use in ASD children has also been reported in some cases ( Williams, Happé, & Jarrold, 2008 ; Winsler, Abar, Feder, Schunn, & Rubio, 2007 ). In a study contrasting children with autism, children with attention deficit hyperactivity disorder (ADHD), and typically developing children, Winsler, Abar, Feder, Schunn, and Rubio (2007) coded overt private speech use on the Wisconsin Card Sorting Task ( Heaton, 1993 ) and a physical problem-solving task. Contrary to expectations, no consistent group differences were observed in private speech use, with around 70% of ASD participants spontaneously using private speech to support their performance. As no interference tasks were used, the findings do not show that internalized verbal strategies (i.e., inner speech) were being used in the same way, but they are suggestive of similarities in inner speech use between ASD, ADHD, and typically developing children.

Supporting this idea, Williams, Happé, and Jarrold (2008) reported intact use of inner speech during verbal recall in children with autism. Using a task that included pictures that were either phonologically similar, visuospatially similar, or dissimilar in both respects, both ASD and control children showed evidence of the phonological similarity effect, proposed to occur when inner speech is used to recode pictures into words to assist recall.

Williams and colleagues argued that these results reflect intact inner speech as a mechanism to support recall in ASD, but did not rule out potential qualitative differences in inner speech. One way in which inner speech in autism could differ qualitatively from inner speech in typical development is in the resources drawn on to support it. Lidstone, Fernyhough, Meins, and Whitehouse (2009) conducted a reanalysis of the data from Whitehouse et al. (2006) comparing relations between cognitive profile and inner speech in children with autism and in typically developing controls. Because inner speech is proposed to have a basis in early communicative interaction, Lidstone and colleagues hypothesized that children with autism with greater nonverbal than verbal skills (a cognitive profile common in ASD) would also be less likely to use inner speech during task performance. This prediction was confirmed: only ASD participants showed a significant effect of cognitive profile, with NV > V participants showing the least interference from articulatory suppression on an arithmetic switching task. The authors also suggested that this may explain some of the previous null findings of inner speech differences in autism reported by Williams et al. (2008) . This, however, was not supported in a reanalysis of the Williams et al. (2008) data by Williams and Jarrold (2010) , who found verbal ability to be the strongest predictor of inner speech use, rather than the relative levels of verbal and nonverbal skills.

Qualitative differences in inner speech in ASD might also be evidenced in the formal properties of inner speech. Williams, Bowler, and Jarrold (2012) studied inner speech use in adults with ASD on a verbal recall task and a Tower of London planning task. On the former, the phonological similarity effect and articulatory suppression effect were used as indices of inner speech use; on the latter, the index was the size of the articulatory suppression effect. On the memory task, both ASD and typically developing adults showed evidence of inner speech use, but on the planning task only controls were affected by articulatory suppression.

Williams et al. argued that sense could be made of these results by drawing on Fernyhough’s (1996) distinction between monologic and dialogic inner speech. The memory task only requires verbal material to be rehearsed, via repetition, in a way that does not require the coordination of multiple perspectives: in other words, a monologic strategy. In contrast, the planning task required a dialogic consideration of multiple alternatives and routes, and the weighing-up of different strategies. If dialogic thinking ( Fernyhough, 1996 , 2009a ) has its roots in external communication and interaction with others, then it is dialogic but not monologic inner speech that would be expected to be either atypical or absent in ASD. Thus, it could be that ASD individuals deploy monologic inner speech to support their cognitive performance, but either do not possess or do not use dialogic inner speech in the same way ( Williams et al., 2012 ). Supporting this idea, communication scores for ASD participants on the Autism Diagnostic Observation Schedule (ADOS; Lord et al., 2000 ) and Autism Quotient ( Baron-Cohen, Wheelwright, Skinner, Martin, & Clubley, 2001 ) were observed to predict articulatory suppression effects during planning, suggesting a link between communicative ability and a lack of inner speech specifically to support problem-solving. Further work is needed to test this hypothesis: the same distinction has not yet been tested in children with ASD, nor on other problem-solving tasks that would in theory require dialogic inner speech. Nevertheless, it represents a promising route for understanding inner speech in a unique population.

Similar benefits of studying atypical populations emerge in the example of specific language impairment (SLI). Lidstone et al. (2012) proposed that, if inner speech use is related to earlier communicative development, children with an SLI may be expected to demonstrate delay or deviation in their inner speech skills. In line with the evidence of private speech use in adults with literacy problems ( Alarcón-Rubio, Sánchez-Medina, & Winsler, 2013 ), children with an SLI showed normal effects of articulatory suppression on a towers task, but evidenced less internalized forms of private speech while attempting the task. Lidstone and colleagues interpreted their results as an example of delayed inner speech internalization, rather than a qualitative difference in verbal strategy use. Evidence of general delays or disruptions to self-directed speech as a result of developmental disorder is also provided by research on ADHD ( Corkum et al., 2008 ; Kopecky, Chang, Klorman, Thatcher, & Borgstedt, 2005 ), although thus far these studies have only reported on private speech, rather than inner speech.

The identification of differences in self-directed speech in developmental disorders raises the prospect of developing training and instruction methods that could benefit those with cognitive or behavioral difficulties. In the case of autism, it may be that encouragement to engage in dialogic speech processes such as asking questions or taking different perspectives could benefit individuals’ performance on specific tasks or in certain scenarios. However, it is important to recognize that the use of differing cognitive strategies in this group is a mark of variation, not deficiency: the adult participants with ASD in Williams et al.’s (2012) study could complete tower tasks apparently without recourse to verbal strategies, so intervention in this case would be inappropriate. Where training with verbal protocols may be more warranted is in situations that demand specific use of verbal strategies: for instance, use of written cues improves problem-solving efficiency on the Twenty Questions task in children with ASD (Alderson-Day, 2011). In the cases of SLI and ADHD, instructional training in private speech at earlier ages may serve to counteract apparent delays in verbal strategy use. “Think Aloud” methods have been used for some time with children with specific educational needs (e.g., Montague & Applegate, 1993; Rosenzweig, Krawec, & Montague, 2011), although such methods have been criticized in the past for being overly instructional and failing to recognize that children’s own strategies need to be facilitated, rather than being prescribed by another (Diaz & Berk, 1995). As with ASD, the exact kind of training required will likely depend on the specific skills of the child and their ability to engage in a social and scaffolded process.
One promising avenue of research here is the use of microanalytic methods to study exactly when within tasks different kinds of self-talk are deployed (see Kuvalja, Verma, & Whitebread, 2014 , for a recent example of such research in an SLI sample).

Adult Psychopathology

Atypical processing of inner speech has been implicated in psychotic disorders, mood disorders, and anxiety disorders. In relation to psychotic disorders, inner speech has been particularly strongly associated with the phenomenon of auditory verbal hallucinations (AVHs), or the experience of hearing a voice in the absence of any speaker. AVHs—also sometimes referred to as “voice-hearing” experiences—are typically associated with the diagnosis of schizophrenia, but are by no means limited to that group of disorders and occur in a significant minority of the general population as well ( Johns et al., 2014 ). A prominent theory of AVHs holds that they stem from misattribution of inner speech to an external source ( Bentall, 1990 ; Feinberg, 1978 ; Frith, 1992 ). This model has received some support from cognitive studies demonstrating self- and source-monitoring deficits in individuals who experience AVHs ( Brookwell, Bentall, & Varese, 2013 ; Waters, Woodward, Allen, Aleman, & Sommer, 2012 ).

The inner speech model of AVHs also gains support from neuroimaging studies showing activation of language networks during AVHs (Allen et al., 2012). Findings from “symptom-capture” studies (investigating neural correlates of the occurrence of AVHs in the scanner) show activation of the inferior frontal gyrus (IFG) bilaterally (Kühn & Gallinat, 2012), while speech-processing atypicalities in schizophrenia patients who experience AVHs are also consistent with a model in which self-generated speech is likely to be misattributed (Ford & Mathalon, 2004; Whitford et al., 2011). Finally, results from neurostimulation studies point to activation of language-relevant areas in AVHs but also highlight inconsistencies requiring refinement of the standard inner speech model of voice-hearing (Moseley, Fernyhough, & Ellison, 2013).

Despite this support, several outstanding difficulties remain in accounting for voice-hearing in terms of inner speech. One is the methodological challenge of capturing the AVH state during scanning in symptom-capture studies (Ford et al., 2012). Recent meta-analyses of symptom-capture studies have come to only partially overlapping conclusions: while Jardri et al. (2011) found AVHs to be associated with activation in left IFG, anterior insula, superior temporal, and hippocampal areas, Kühn and Gallinat (2012) could only find consistent results for bilateral IFG, postcentral gyrus, and parietal areas. The involvement of left IFG in both analyses appears to implicate Broca’s area in the AVH state, but the lack of overlap in other areas precludes inferences about how exactly inner speech may come to be experienced as having an external source. Furthermore, there is evidence that the observed Broca’s area activation could be an artifact of the target detection demands involved in many symptom-capture designs: other stimulus detection tasks involving the monitoring of a particular target and a button press to indicate its presence also often activate this brain region, and it is possible that more naturalistic or retrospective forms of symptom capture would reveal more consistent results for alternative regions (van Lutterveld, Diederen, Koops, Begemann, & Sommer, 2013; although see Shergill et al., 2000). As such, evidence from neuroimaging research is suggestive of inner speech being involved in the occurrence of AVHs, but problems in interpreting the evidence remain.
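The partial agreement between the two meta-analyses can be made concrete by treating each reported list of regions as a set and intersecting them. This is only a toy illustration: the region labels below are simplified shorthand for the findings summarized above, not formal anatomical coordinates.

```python
# Simplified region labels summarizing the two meta-analyses discussed above.
jardri_2011 = {"left IFG", "anterior insula", "superior temporal", "hippocampus"}
kuhn_gallinat_2012 = {"left IFG", "right IFG", "postcentral gyrus", "parietal"}

# Only regions reported consistently by both analyses survive the intersection.
replicated = jardri_2011 & kuhn_gallinat_2012
print(replicated)  # {'left IFG'}: Broca's area is the one region common to both
```

The single-element intersection is the formal counterpart of the point made in the text: left IFG replicates across analyses, while the remaining regions do not.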

Attempts to evaluate the inner speech model of AVHs have also been limited by impoverished conceptions of inner speech and inappropriate methods for eliciting it ( Jones & Fernyhough, 2007 ; Moseley & Wilkinson, 2014 ). An emerging competitor account conceptualizes AVHs as intrusions from memory, a view arguably supported by evidence of aberrant hippocampal activations in AVHs ( van Lutterveld et al., 2012 ). A further development has been the recognition that AVHs likely take multiple forms ( Jones, 2010 ), with only some forms of the phenomenon being explicable in terms of misattributed inner speech: others may be better understood in terms of a “hypervigilant” attention to external threat ( Dodgson & Gordon, 2009 ), or intrusions from memory ( Michie, Badcock, Waters, & Maybery, 2005 ). Finally, the inner speech model is arguably only applicable to AVHs, not to hallucinations in other modalities, such as visions.

Despite these concerns, the inner speech account of AVHs remains a powerful explanatory tool for at least some voice-hearing experiences, and one that is worthy of further investigation. Phenomenologically, AVHs bear many important resemblances to the experience of typical inner speech ( Larøi et al., 2012 ), such as their frequent dialogicality and self-regulatory quality. Areas of active research interest include understanding the relation between clinical and nonclinical AVHs ( Johns et al., 2014 ), including findings that AVHs in nonpatients are associated with more typical neural organization of language processes than in clinical groups ( Diederen, De Weijer, et al., 2010 ). One possibility is that the distinction between clinical and nonclinical AVHs relates to differing roles of stress and cognitive challenge in triggering anomalous attributions of inner speech ( Fernyhough, 2004 ). Subclinical hallucinatory experiences may also relate to specific characteristics of inner speech in the nonclinical population: dialogic and evaluative characteristics of inner speech, along with the presence of other people in inner speech, have all been related to auditory hallucination proneness in undergraduate samples ( Alderson-Day et al., 2014 ; McCarthy-Jones & Fernyhough, 2011 ).

The role of inner speech in hallucinatory experiences is further illuminated by the example of AVHs in deafness ( Atkinson, Gleeson, Cromwell, & O’Rourke, 2007 ; Pedersen & Nielsen, 2013 ). Some people who are deaf either prelingually or from birth have reportedly had the experience of “hearing” voices in the absence of a speaker (e.g., du Feu & McKenna, 1999 ). Close examination of the phenomenology of such experiences, however, suggests that they rarely incorporate explicit auditory properties. Rather, prior reports may reflect misinterpretation of patients’ descriptions by (predominantly hearing) practitioners and researchers, differing usages of terms such as “loudness” across spoken and signed languages, or deployment of hallucination scales and interviews that do not translate well into use with the deaf population ( Atkinson, 2006 ). When custom-made materials that are specifically designed for deaf participants are used to enquire about unusual experiences, a wide variety of primarily communicative, but not necessarily auditory, phenomena are reported, including experiences of fingerspelling, subvocal sensations, and visual experiences of signing and lipreading. Furthermore, they appear to broadly map on to the prior linguistic experience of the individual: those who had experience of spoken language prior to hearing loss reported more auditory hallucinatory phenomena, while those with little or no access to spoken or signed languages in early childhood reported nonverbal communicative sensations that appeared to lack a specific auditory, visual, or tactile modality ( Atkinson et al., 2007 ).

The range of experiences described in such reports, and their implications for self-monitoring accounts of inner speech and AVHs, make it tempting to draw a range of conclusions. First, it could be argued that the evidence for the existence of AVHs in deaf individuals implies that a misattribution of inner speech is less important to explaining the phenomenon than a more general misattribution of a communicative or articulatory code: that is, it would appear to force a generalization of the self-monitoring account of AVHs, beyond the specifics of speech and into a more general conception of communication. Second, if the deaf and hearing experience of AVH were considered to be comparable, and AVH in deaf participants reflected their prior linguistic experience, then the same might also be expected of AVH and inner speech in the hearing population. Inner speech on this reading would be an internalized reflection of prior communicative experience, susceptible to individual differences in linguistic skills and developmental history.

A degree of caution is appropriate here, though, for two reasons. One is that the amount of data available on deaf individuals with hallucinations is still very meager: Atkinson et al. (2007), for example, reported on a total sample of 27 individuals, and included some subgroups containing only two people. The other reason, referred to in Inner Speech and Variations in Linguistic Experience, is that very little is known about everyday use of inner speech, inner sign, or any other equivalent in the deaf population. As such, it would be unwise to draw any strong conclusions about “typical” inner speech or AVHs in deafness without knowing more about what is typical in the inner experience of deaf people.

While the greater proportion of research interest has concerned AVHs in people with schizophrenia, inner speech has also been implicated in other forms of adult psychopathology. Rumination is known to be an important feature of depression, and the repetitive concentration on negative thoughts is often described in primarily verbal terms ( Nolen-Hoeksema, 2004 ). In this literature, specific engagement in inner speech has not always been tested, meaning that the verbal nature of rumination has perhaps been more assumed than demonstrated. Nevertheless, some recent studies have highlighted strong verbal and auditory features of depressive states ( Moritz et al., 2014 ; Newby & Moulds, 2012 ) and drawn specific links between inner speech and depression ( Holmes, Lang, & Shah, 2009 ; Holmes et al., 2006 ).

For example, Moritz et al. (2014) asked people with mild and moderate depression to report on the sensory phenomenology of their depressive thoughts and ruminations by completing a web-based measure, the Sensory Properties of Depressive Thoughts Questionnaire, which asked about bodily/tactile experiences, visual sensations, and auditory properties, such as experiencing an “inner critic” (p. 1050). Distinct auditory properties were reported by 31% of the sample, a rate higher than for visual experiences (27%) but lower than for bodily experiences (40%). The presence of sensory experiences in depressive thoughts was consistent with a prior, smaller study by Newby and Moulds (2012), although in that case visual experiences were more common than auditory experiences, both for ruminations and intrusive memories. Such studies have been interpreted as showing that verbal depressive thoughts either have their own sensory qualities or are accompanied by concurrent imagery, although investigators do not always ask about the verbality of these cognitions: Newby and Moulds (2012) specifically asked their participants about frequency of verbal thoughts, but Moritz et al. (2014) did not.

More specific evidence of a role for inner speech or verbal thinking in depression has come from studies by Holmes and colleagues. Relative to visual mental imagery, instructions to think verbally about hypothetical scenarios can lead to reductions in mood ( Holmes et al., 2006 ) and susceptibility to a subsequent negative mood induction, even when the imagined scenarios are positive ( Holmes et al., 2009 ). Holmes et al. (2009) have argued that this apparently paradoxical feature of verbal thinking reflects the less immersive qualities of inner speech compared to mental imagery, and the capacity to make unfavorable comparisons when thinking from a more abstract position. Within nonclinical populations, research with elementary schoolchildren has reported associations among self-reported rates of positive self-talk, self-esteem, and depression ( Burnett, 1994 ), while overly evaluative forms of inner speech appear to relate to low self-esteem in university students ( Alderson-Day et al., 2014 ).

Stronger and more specific links between psychopathology and inner speech are evident in research on worry and anxiety. Worry is an example of repetitive thinking that is typically defined as being negative, uncontrollable, and aimed at some form of ill-defined problem-solving, such as a problem with no clear solution (see Watkins, 2008, for a review ). Worrying often seems to take a verbal form, and this can have an exacerbating effect in contrast to negative thought in other modalities (such as visual imagery). For instance, Stokes and Hirsch (2010) encouraged a group of self-reported high worriers to engage in either visual imagery or verbal thinking about a worrying topic. Verbal worrying was associated with an increase in negative intrusive thoughts, while visual imagery was associated with a decrease in intrusions (see McCarthy-Jones, Knowles, & Rowse, 2012 , for a contrasting example involving hypomania). The tendency for worrying to be linked specifically to verbal processes is consistent with prior research in generalized anxiety disorder (GAD): Behar, Zuellig, and Borkovec (2005) asked a general participant sample and a sample of people with GAD and posttraumatic stress disorder traits to report on their verbal thoughts (“words you are saying to yourself”) and mental imagery (“pictures in your mind”) during recall inductions for worry and trauma. Worry experiences were predominantly verbal in form, while trauma recall was largely imagery-based. Moreover, worrying was more generally associated with a rise in anxious affect during the experiment, while trauma recall showed a closer link to depressive thinking.

As noted in How do Adults Experience Inner Speech?, more evaluative forms of inner speech correlate with higher levels of nonclinical trait anxiety in university students (McCarthy-Jones & Fernyhough, 2011). Research has also linked anxious self-talk with greater anxiety symptoms in children and adolescents (e.g., Kendall & Treadwell, 2007; Sood & Kendall, 2007), although such studies arguably suffer from concerns about content overlap between different self-report measures. As in the example of positively and negatively valenced self-talk (Calvete et al., 2005), separating the linguistic phenomenon from the psychopathological state is problematic when both self-report measures ask about similar internal states. For this reason, mood-manipulation studies such as that of Stokes and Hirsch (2010) provide more reliable indicators of the relations between inner speech and anxiety.

The Neuropsychology of Inner Speech

Before considering how developmental, cognitive, and neuroscientific findings on inner speech might be integrated into a comprehensive model, we briefly consider what neuropsychological research has contributed to the understanding of inner speech. Evidence from this area has largely supported the idea that inner speech plays an important role in adult cognition, while also shedding light on the relationship between overt and covert speech.

Prior to fMRI research on the topic, neuropsychological cases played an important role in establishing the neural basis of verbal working memory. Baddeley and colleagues have argued that the phonological loop system does not require the same neural systems as overt speech production, based in part on evidence that working memory impairment was more closely associated with damage to the supramarginal gyrus in the parietal cortex, and in part on double dissociations between samples with speech planning and speech production difficulties (see Baddeley & Logie, 1999, for a review of these arguments). However, subsequent neuroimaging studies on verbal working memory have not always supported this distinction (e.g., Marvel & Desmond, 2010), and neuropsychological studies show examples of both overlap and separation in overt and covert speech processes.

For instance, two studies by Geva and colleagues reported on inner speech and language skills in aphasia ( Geva et al., 2011 , 2010 ). In a behavioral study, Geva et al. (2011) tested 27 patients with poststroke aphasia and 27 healthy controls on a range of language tasks, including rhyming tests of inner speech. Compared with controls, patients were impaired on both inner and overt speech, and performance in both was closely correlated. However, there were also individual cases of intact inner speech but not overt speech, or vice versa, suggesting a possible dissociation between the two domains. Geva et al. (2011) used voxel-based lesion mapping to examine the neural correlates of aphasic impairments in 17 of this sample. Impaired performance on inner speech tasks (rhyming and homophone judgment) was observed to correlate with lesions to left pars opercularis (inferior frontal gyrus) and the supramarginal gyrus, relations that remained even when overt speech performance and working memory skills were taken into account. Such data do not prove a dissociation between inner and overt speech, but they do support the notion that inner speech is not simply identical to overt speech processes at a neural level.
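The logic of the voxel-based lesion-mapping approach used by Geva and colleagues can be sketched in a few lines: for each voxel, patients whose lesions cover that voxel are compared on a behavioral score against patients whose lesions spare it. The sketch below is a minimal, hypothetical illustration of that logic only; the array shapes, group-size threshold, and data are invented for the example, not taken from the study.

```python
import numpy as np
from scipy import stats

def vlsm_t_map(lesion_masks, scores, min_group=2):
    """Voxel-based lesion-symptom mapping, in miniature.

    lesion_masks: (n_patients, n_voxels) binary array (1 = voxel lesioned).
    scores: (n_patients,) behavioral scores (e.g., inner speech rhyme accuracy).
    Returns one t statistic per voxel (lesioned minus intact group); voxels
    with too few patients in either group are left as NaN.
    """
    n_voxels = lesion_masks.shape[1]
    t_map = np.full(n_voxels, np.nan)
    for v in range(n_voxels):
        lesioned = scores[lesion_masks[:, v] == 1]
        intact = scores[lesion_masks[:, v] == 0]
        if len(lesioned) >= min_group and len(intact) >= min_group:
            # Welch's t-test: negative t = worse scores when this voxel is lesioned
            t_map[v] = stats.ttest_ind(lesioned, intact, equal_var=False).statistic
    return t_map

# Hypothetical data: lesions at voxel 0 co-occur with low scores, lesions at
# voxel 1 do not, and voxel 2 is lesioned in only one patient (so it is skipped).
masks = np.array([[1, 1, 1], [1, 0, 0], [1, 1, 0], [1, 0, 0],
                  [0, 1, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0]])
scores = np.array([1.0, 2.0, 1.0, 2.0, 8.0, 9.0, 8.0, 9.0])
t_map = vlsm_t_map(masks, scores)  # strongly negative at voxel 0, NaN at voxel 2
```

In the real analyses, the resulting statistical map is thresholded with corrections for the many voxelwise comparisons; controlling for covariates such as overt speech performance, as Geva et al. did, corresponds to replacing the simple group comparison with a regression at each voxel.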

Vercueil and Perrone-Bertolotti (2013) reported on a case study of a woman with epilepsy who experienced jargon-like inner speech during seizures. The experience of jargon in overt aphasia is well-documented, but very few accounts exist of jargon in inner speech, most likely due to difficulties in comprehension and reporting of such an experience by patients. In this case study, the patient was able to report on her experience of jargon-like inner speech during seizures:

Her written report mentioned the fact that during her seizures, even inner speech became incomprehensible, with the perception of an inner jargon which remained self-sustained throughout the seizure even though it sounded strange (she literally wrote: “Incomprehension of inner language (thought is unintelligible), and if I try to repeat inner language out loud, incomprehensible words come out (at any rate I don’t understand them!)”) (Vercueil & Perrone-Bertolotti, 2013, p. 308).

The authors argue that this provides evidence of shared mechanisms in overt and inner speech, in contrast to the findings of Geva et al. (2011) . It is not clear, however, that the two studies are mutually inconsistent, given that areas required for producing overt and inner speech could largely overlap and yet also draw on unique resources. Irrespective of that issue, though, such descriptions highlight the possible separation of monitoring and comprehension skills from production in inner speech.

Levine, Calvanio, and Popovics (1982) described the case of a 54-year-old man who lost his ability to produce language after a mild right hemiparesis, and consequently was unable to generate inner speech, although he did retain an ability to read (with some difficulty). Levine et al. proposed that the patient’s preserved language skills were based on highly developed visual imagery, supported by his general competence on spatial tasks (such as copying complex figures). Another case study of aphasia, in this case following a stroke in language-relevant areas of left temporal lobe, was documented in the autobiography of Dr. Jill Bolte Taylor ( Taylor, 2006 ). Taylor referred to “the dramatic silence that had taken residency inside my head” (pp. 75–76) in describing her loss of inner speech and a range of associated difficulties such as memory retrieval. Morin (2009b) interpreted Taylor’s loss of inner speech as causing an impairment of a sense of individuality and capacity to reflect on the self, consistent with his proposal that inner speech is involved in self-awareness and the creation of a sense of self ( Morin, 2005 ).

Finally, two studies by Baldo and colleagues examined the impact of damage to language regions on problem-solving and reasoning (Baldo, Bunge, Wilson, & Dronkers, 2010; Baldo et al., 2005). Baldo et al. (2005) tested the role of language in supporting task performance on the Wisconsin Card Sorting Test (WCST) in (a) stroke patients with impaired language abilities, and (b) neurologically intact adults under articulatory suppression conditions. In the clinical group, performance on the WCST was positively correlated with language skill (naming and comprehension), as were matrix reasoning skills. In the nonclinical group, who completed the WCST with and without articulatory suppression, performance was consistently worse when inner speech was blocked, although similar effects were also seen for a visuospatial distractor condition.

In a second study, Baldo, Bunge, Wilson, and Dronkers (2010) examined problem-solving performance on Raven’s Color Progressive Matrices in a sample of 107 patients with left hemisphere stroke lesions and varying levels of language impairment. Stroke patients with aphasia were significantly worse in their problem-solving than were patients without aphasia, particularly for puzzles requiring relational reasoning rather than visuospatial matching. Furthermore, impaired performance on relational reasoning puzzles was related to lesions to the left middle and superior temporal areas of the cortex. Taken together, these studies suggest that damage to typical language areas can impede performance on certain kinds of problem-solving, even when the task does not obviously require language.

Toward an Integrated Cognitive Science of Inner Speech

As the foregoing review has demonstrated, a growth of research interest in inner speech has coincided with methodological progress in techniques for eliciting and manipulating it experimentally and imaging its neural substrates (Fernyhough, 2013). At the same time, empirical advances have not always been tightly linked to theoretical issues concerning the development, phenomenology, and possible cognitive functions of inner speech. In this section, we consider the challenges that remain for an integrated cognitive science of inner speech, beginning with the question of whether inner speech represents a unitary process that can be adapted to the demands of different tasks and contexts.

Toward a Unifying Account of Inner Speech

We begin by considering whether the findings reviewed above fit with what might be termed a “minimal” account of inner speech. A number of studies still primarily associate inner speech with a unitary process equivalent to covert articulation (Figure 1a), with specific functions in the maintenance of verbal information and the covert planning of speech acts (Geva et al., 2011; Marvel & Desmond, 2010; Scott, 2013). This view of inner speech is reflected in the selection of tasks in neuroimaging studies, in which participants are typically asked to repeat words or sentences, or judge the stress of specific syllables. The research reviewed here, however, has implicated inner speech in a variety of cognitive processes including social cognition, executive function, and imagination, with functional properties of inner speech changing considerably with age and linguistic experience. There is also evidence, from psycholinguistic and phenomenological studies, to suggest that inner speech can vary in its phonological, semantic, and syntactic properties, from abstract to concrete, from condensed to expanded, and from inner speaking to inner hearing. A minimal view of a single form of inner speech deployed for such varied functions in such different contexts, and with such differing phenomenology, would at the very least require specification of how a unitary process could operate so flexibly.

Figure 1. Inner speech (a) as covert articulation, (b) as a flexible abstraction, and (c) in condensed/expanded forms.

Oppenheim and Dell’s (2010) flexible abstraction hypothesis is an example of an account in which a single underlying process can be adapted for differing task demands. In their model, inner speech is primarily an abstract verbal representation at the level of phonemic selection, whose degree of featural specification can be adjusted depending on the degree of articulation deployed (see Figure 1b ). The contrast between condensed and expanded inner speech in Fernyhough’s (2004) model could be viewed in a similar way, although in that case what varies between condensed and expanded forms is the semantic and syntactic complexity of the inner speech representation, as well as its phonological detail (see Figure 1c ).

If the core of inner speech is considered as an abstract code containing a combination of semantic, syntactic, and phonological information, one way to account for its apparent varieties is to think about that “kernel” or abstract code being unpacked in different ways, depending on the recruitment of additional cognitive resources. An “utterance” in inner speech could be articulated to a greater or lesser degree, depending on the relative deployment of speech motor processes. In the case of greater articulatory involvement, inner speech would resemble something akin to an “inner voice,” which would usually correspond to the speaker’s own. According to working memory models, continued rehearsal of the inner speech utterance via the phonological loop would keep the trace maintained in an “inner ear” (see Figure 2 ).

Figure 2. A multicomponent model of inner speech, incorporating developmental, working memory, and psycholinguistic features.

For much of the time this may be all there is to inner speech: a relatively abstract speech code that can be more or less featurally specified, reverberating between articulatory and phonological store components of a verbal working memory system. However, if reports of “inner hearing” are also considered variations in inner speech experience, then articulation may not be the only way of unpacking such representations. Recruitment of phonological associations from memory, without articulation, could give rise to inner hearing experiences, or inner speech that involves the experience of other people’s voices. Specifically trying to produce another’s voice in inner speech (or, as some would term it, auditory imagery) would draw even more upon memory for phonological information to fill out the auditory detail of the trace. Based on the neuroimaging findings discussed in Inner Speech in the Brain, the relative involvement of articulatory and phonological information in this process will correspond to the use of inferior frontal areas (Broca’s area, insula) and posterior temporal structures (superior and middle temporal gyri, temporoparietal junction), respectively.

As an offline, abstract code, inner speech can act as a representational tool, for internal planning, rehearsal, or rumination, or for filling in the gaps in the absence of relevant information (e.g., Miyake, Emerson, Padilla, & Ahn, 2004 ). The presence or absence of a task could determine the structure of inner speech deployed in a particular situation. Conditions where a given response needs to be maintained or regulated (as in set-shifting) may be more likely to require an expanded and task-specific form of inner speech (and in some cases private speech), while more exploratory, open-ended forms of verbal thinking could remain at a more abstracted, condensed level.

This view of inner speech as a multicomponent system points to the value of taking a developmental perspective on this complex and varied experience. Fernyhough (2010) has proposed that inner speech can be considered as resulting from the development of a functional system (Luria, 1965; Vygotsky, 1934/1987). Luria construed the executive functions as a functional system involving the interaction of hierarchically organized subsystems with diverse neurological foci (Luria, 1965). Rather than seeking the cause of executive functioning development solely in brain maturation, Luria held that social interaction shapes emerging cortical organization in the preschool years: “Social history ties those knots which form definite cortical zones in new relations with each other, and if the use of language . . . evokes new functional relations . . ., then this is a product of historical development, depending on ‘extracerebral ties’ and new ‘functional organs’ formed in the cortex” (Luria, 1965, p. 391). In Luria’s view, a new form of executive functioning emerges when prelinguistic capacities for monitoring, planning, and inhibition of behavior enter into interfunctional relations with language abilities (Fernyhough, 2010). This corresponds to Vygotsky’s “revolution” in development, when preintellectual language and prelinguistic cognition become fused (Vygotsky, 1930–1935/1978) in the emergence of self-regulatory private speech and then inner speech.

In this view, inner speech represents a functional system whereby initially independent neural systems are “wired together” in new ways by social experience. The basic tools necessary for this developmental progression—such as a phonological loop and the capacity for verbal rehearsal—may already be in place relatively early in childhood, serving core functions of speech production and language learning. Subsequently, the effects of social interaction and communication shape how those tools are put to use in supporting cognition from middle childhood onward. By adolescence and adulthood, changing patterns of deployment of the components of the functional system link nominally separate systems of executive skill, but not necessarily in the same way as before.

Another example of the development of a functional system is the emergence of theory of mind (ToM). ToM capacities have been proposed to result from early forms of intentional-agent understanding becoming modulated by language ( Fernyhough, 2008 , 2010 )—accounting for, inter alia , evidence for very strong relations between ToM reasoning and language in childhood, and effects of inner speech disruption on ToM performance in adulthood ( Newton & de Villiers, 2007 ). One implication of this view is that the functional system(s) of ToM will evidence shifting patterns of relation across age of the component neural systems, consistent with evidence that ToM networks in the brain “crystallize” from more diffuse agglomerations of neural foci in the course of childhood ( Saxe, Carey, & Kanwisher, 2004 ).

A functional systems approach thus entails shifting relations among constituent neural systems over the course of development, which will not necessarily represent their eventual pattern of interaction in adulthood ( Fernyhough, 2010 ). In the case of inner speech, early interrelations among language and other systems will likely change as the child develops, perhaps incorporating ToM-relevant regions that are different from those identified in adulthood. Charting these emerging dynamic relations is a challenge for future research. One possibility is that the generation of overt self-regulatory private speech gradually captures the emerging ToM system, or indeed other neural systems, so that the child is able to represent a perspective on her own self-generated speech. In the case of ASD, a delay or deviation in the emergence of ToM, perhaps at the level of very early intentional-agent understanding, may prevent the yoking of language and ToM systems necessary for internal dialogue.

A functional systems approach also has relevance for understanding data from neuropsychological studies of inner speech. Particularly interesting in this respect is the case of acquired aphasia, whose influence on ToM and executive functioning abilities has been the subject of several recent studies (e.g., Willems, Benn, Hagoort, Toni, & Varley, 2011 ). If language is essential for the development of dialogic inner speech, then individuals with acquired aphasia might be expected to be at a disadvantage in tasks requiring inner dialogue. However, typical language development prior to the onset of aphasia may allow the development of dialogic inner speech in childhood and adolescence, creating the cognitive structures necessary for dialogic thinking even if one of the component systems is subsequently damaged and another neural system has to be recruited to replace it. Central to Luria’s reasoning about functional systems was the idea that brain lesions will have differing significances depending on where they occur within an emerging functional system and at what point in development ( Luria, 1965 ; Vygotsky, 1934/1987 ).

Adopting a developmental approach thus points to further refinements in how inner speech can be conceptualized and modeled. On this view, inner speech will be shaped by the individual’s linguistic and social experiences, possessing the qualities of being evaluative, discursive, or addressed to others, because it retains some of the pragmatic characteristics of external communication. We have also noted that developmental considerations motivate the drawing of distinctions between monologic and dialogic inner speech ( Fernyhough, 1996 ), a distinction that has been supported by data on self-reported experiences of inner speech ( Alderson-Day et al., 2014 ; McCarthy-Jones & Fernyhough, 2011 ). The dialogic quality of some forms of inner speech is plausibly supported by the recruitment of ToM systems as described above. Figure 3 represents a model incorporating the inner speech model depicted in Figure 2 , with the addition of the social–cognitive processes that may underlie inner dialogue. Fernyhough has proposed that the dialogicality of inner speech can be interpreted as the cognitive provision of an “open slot” ( Fernyhough, 1996 , 2009a ) within which a linguistically manifested perspective generated in the inner speech network is represented while an answering perspective is generated. Alongside this, monologic or dialogic forms of inner speech can be deployed to support nonverbal executive processes where this is required (as in the examples of switch tasks, or cognitive control). Representation of voices and situations will also require retrieval of autobiographical information from long-term memory, as in the case of replaying a particular conversation in the mind.

Figure 3. The inner speech system and its interaction with executive functions, theory of mind, and long-term memory. (CA = covert articulation; PS = phonological store)

Implications of a Multicomponent Account of Inner Speech

One implication of a multicomponent view of inner speech is that everyday instances of the phenomenon are likely to be richer and more complex than conceptualizations of inner speech in typical laboratory studies, which have mostly drawn on a Watsonian view of verbal thinking as overt speech without articulation. Two recent neuroimaging studies have begun the attempt to address this issue: the first by using experience sampling ( Kühn, Hurlburt, Alderson-Day, & Fernyhough, 2014 ) and the second by evoking dialogic inner speech ( Alderson-Day et al., 2015 ).

A perennial problem for neuroimaging research has been how to tie data on neural activations in the scanner to subjective assessments of experience ( Fell, 2013 ). This problem is particularly acute in the study of inner speech, where experimental manipulations intended to elicit inner speech may result in experiences quite dissimilar to ordinary spontaneous inner speech (cf. Jones & Fernyhough, 2007 , and Hubbard, 2010 ). Silent reading has been used as a paradigm for studying featural properties of inner speech ( Yao, Belin, & Scheepers, 2011 ), but even this is by no means certain to tap spontaneous examples of verbal thinking ( Fernyhough, 2013 ). In an attempt to bridge this gap, Kühn, Hurlburt, Alderson-Day, and Fernyhough (2014) combined the DES method with fMRI to examine randomly beeped moments of inner experience while participants took part in resting-state scans. Participants were trained in the DES method for 1 week before completing a week of MRI scans that contained random DES beeps. Reporting on a case study of one participant who self-described as regularly experiencing inner speech, Kühn et al. observed consistent activation of left inferior frontal gyrus for beeps associated with verbal thinking in general, and inner speech in particular. There was also some preliminary evidence for localized distinctions between experiences of inner speaking and inner hearing. Conclusions are necessarily limited by the single-case design, but such findings at least act as a proof of principle that spontaneous inner speech can be studied in depth both qualitatively and neurally.
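A core sampling constraint in such designs is that beep onsets must be unpredictable to the participant while remaining spaced far enough apart for a full report of the sampled moment. The logic can be sketched in a few lines; the window length, beep count, and minimum gap below are illustrative assumptions, not the parameters of the Kühn et al. protocol:

```python
import random

def schedule_beeps(scan_secs, n_beeps, min_gap, rng=None):
    """Draw pseudo-random beep onsets within one scan window,
    rejection-sampling until all onsets are at least min_gap apart."""
    rng = rng or random.Random()
    while True:
        onsets = sorted(rng.uniform(0, scan_secs) for _ in range(n_beeps))
        if all(b - a >= min_gap for a, b in zip(onsets, onsets[1:])):
            return onsets

# e.g., five beeps across a 25-minute scan, at least a minute apart
beeps = schedule_beeps(scan_secs=1500, n_beeps=5, min_gap=60,
                       rng=random.Random(1))
```

Rejection sampling keeps the onsets uniformly random conditional on the spacing constraint, so participants cannot anticipate a beep from the timing of earlier ones.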

Alderson-Day et al. (2015) , in contrast, examined the neural basis of dialogue-like verbal thinking. When participants were asked to generate dialogue in their heads, in contrast to matched monologic scenarios (for instance, telephoning a relative and having a conversation, as compared with leaving a voicemail), a widespread bilateral neural network was implicated, including medial frontal regions, precuneus, posterior cingulate, and right posterior superior temporal gyrus. Activation in the latter region also significantly overlapped with activation linked to ToM reasoning ( Alderson-Day et al., 2015 ). These findings were interpreted in terms of dialogic inner speech involving an interaction between language and social cognition networks. The findings suggest a neural instantiation of this interaction between a system for generating an element of inner speech and a system for responding to it from the perspective of another person—in other words, for the provision of an “open slot” within which an utterance generated in the inner speech network is represented while a dialogic response is being generated.

In addition to providing theoretical detail on the cognitive and neural instantiations of dialogic inner speech, Alderson-Day et al.’s study responds to the challenge set by Jones and Fernyhough (2007) to develop more ecologically valid methods for eliciting inner speech. A further phenomenological feature that is worthy of continued empirical study is the distinction, derived from Vygotsky’s theory, between condensed and expanded inner speech ( Fernyhough, 2004 ). As noted, this distinction bears strongly on the debate about how much inner speech retains phenomenological features of overt speech, such as tone, accent, and timbre (see What is the Relation Between Inner Speech and Overt Speech?). Rather than specifying levels of featural richness for all inner speech, Fernyhough’s (2004) model proposes flexible movement between condensed and expanded forms in typical spontaneous inner speech, an idea that receives some support from studies involving the VISQ self-report instrument ( Alderson-Day et al., 2014 ; McCarthy-Jones & Fernyhough, 2011 ).

Although it has not yet been the focus of neuroimaging studies (not least because of the difficulty, noted above, of capturing heterogeneous forms of inner speech in the scanner), it is possible to speculate on the neural substrates of condensed and expanded inner speech. Because it is not phenomenologically full-blown, condensed inner speech could be predicted not to activate areas involved in detailed phonological representation, such as the STG. The “pure meanings” of condensed inner speech, instead, may be expected to be based in areas associated with semantic representations and abstract knowledge about semantic categories. The ventral posterior middle temporal gyrus has been proposed to provide a “lexical interface” bringing together semantic and phonological representations ( Hickok & Poeppel, 2007 , p. 395), while the anterior temporal pole has been linked to modality-invariant, abstract representations of semantic categories ( Visser, Jefferies, & Lambon Ralph, 2010 ). It is possible that the move from condensed to expanded inner speech will involve translation of such representations into something more fully voiced, via articulation in the left IFG and phonological representation in STG structures.

What are the Relations Among Inner Speech, Inner Hearing, and Auditory Imagery?

Another phenomenological distinction that has emerged from studies of the subjective experience of inner speech is the distinction between inner speaking and inner hearing. This distinction stems from Hurlburt’s (e.g., Hurlburt et al., 2013 ) work showing that some DES participants distinguish episodes in which they feel themselves to be the producers of the speech from those in which inner speech is more passively received (as in listening to one’s own voice on a recording). Such a distinction is absent from many areas of inner speech research, including work on child development and studies of adult executive function. In contrast, evidence for separable mechanisms for inner speaking and hearing is provided in the literature on verbal working memory and auditory imagery: as noted above, the “inner voice” and “inner ear” can both be disrupted under conditions of articulatory suppression and purely auditory interference, but can show separable interference effects depending on the kind of task deployed (see Smith, Reisberg, & Wilson, 1992, for a review ). Indeed, the “articulatory loop” in working memory was renamed the “phonological loop” by Baddeley and colleagues precisely because of evidence that phonological information could be retained in working memory even when articulation is blocked ( Baddeley & Larsen, 2007 ).

What Hurlburt’s observations add is the suggestion that, at least for some people, the everyday experience of “inner speech” may not always involve an experience of actively speaking. Inner speech may be generated in one’s own voice or in that of another, but the experience of it—in the sense of an internal representation of verbalized language—will not necessarily feel as though one is involved in its production. If correct, this raises important questions for developmental accounts of inner speech, such as what components underlie inner speaking and inner hearing, and when they are in place. There may also be implications for theories of psychopathology: would a person who reports more frequent experiences of inner hearing than of inner speaking be more prone to unusual experiences, such as hallucinations or passivity phenomena?

It has also been suggested that inner speech is a special case of a more general phenomenon of auditory imagery. For example, Levine et al. (1982) defined inner speech as the “subjective phenomenon of talking to oneself, of developing an auditory-articulatory image of speech without uttering a sound” (p. 391). More recently, Hubbard (2010) reviewed empirical research on auditory imagery and treated inner speech as a form of imagery. On such a reading, inner speech refers to a subset of auditory imagery experiences; namely, just those that happen to include the representation of speech.

In support of such an idea, inner speech and auditory imagery appear to share many similar properties; indeed, some studies using inner speech paradigms refer to it as articulatory imagery or speech imagery. Both inner speech and auditory imagery show evidence of interference under articulatory suppression, for example. Both are also associated with activation in a set of common regions, including inferior frontal gyrus, insula, SMA, and posterior superior temporal gyri (among other regions) in neuroimaging studies ( Hubbard, 2010 ; Price, 2012 ). Considering inner speech as an example of auditory imagery offers one way of subsuming inner speech and related phenomena into a single class of cognitive processes. One reason for not doing so would be if inner speech appeared to rely on underlying mechanisms or have effects that made it function in a different way to imagined sound.

We argue that there are good reasons to retain the label of inner speech as a related but broadly separable process to auditory imagery. First, although motor processes can affect certain kinds of auditory imagery ( Hubbard, 2013 ), subsuming inner speech within imagery would appear to underestimate its articulatory component, in which words are usually actively voiced and expressed rather than simply being “sounded out.” It is not at all clear—and would seem counterintuitive to suggest—that inner speech is “imagined” in the same way that one can imagine the sound of a siren, or even imagine hearing one’s own voice on a recording, notwithstanding the fact that some individuals may experience inner speech more as a “hearing” than as a “speaking” phenomenon.

In neuroimaging studies, this articulatory involvement is reflected in the general pattern of regions associated with inner speech and auditory imagery. Despite some overlap in activations, inner speech paradigms are commonly associated with left inferior frontal gyrus, left insula, and left STG activation ( Fegen et al., 2015 ; McGuire et al., 1996 ); in contrast, auditory imagery for speech (whether imagining hearing one’s own voice or another’s) and auditory imagery for other sounds are associated with activation of SMA, posterior parietal cortex, and STG/MTG bilaterally ( Zatorre & Halpern, 2005 ). Contemporary models of speech processing suggest at least two cortical streams affecting speech perception: a left lateralized dorsal stream, connecting speech motor processing (left inferior frontal gyrus and insula) with posterior temporal regions, and a bilateral ventral stream linking hippocampal structures and the inferior and middle temporal gyri ( Hickok & Poeppel, 2007 ). Evidence from Tian and Poeppel (2010) , for example, suggests that these separate streams produce differential and contrasting repetition priming effects on speech perception. As such, it seems useful to consider articulated language representations as related but importantly different entities from auditory images more generally.

Second, considering inner speech as a kind of imagery would not seem to fit comfortably with the range of evidence reviewed above. Inner speech is used as planner, regulator, reminder, and commentator across many different contexts, and in some cases would appear to have effects distinct from those of engagement in mental imagery (e.g., Stokes & Hirsch, 2010 ). Speech representations are arguably unique in their capacity to generate and maintain propositional content while ordinary perceptual processes are still ongoing. Of other modalities, only visual imagery has similar propositional capacity—I can say “the cat is on the mat” or I can create an image depicting that scenario—but images of situations or states of affairs are difficult to generate while visual processing of the outside world is ongoing (e.g., Borst, Niven, & Logie, 2012 ). In this way, inner speech offers an abstract and flexible code to support ongoing cognitive operations. Perhaps for this reason, inner speech is used much more often as a synonym for thinking than it is for imagery, although usages of both of the latter terms are so broad that their explanatory value is easily questioned. In some cases a distinction between inner speech and imagery has also been framed in terms of the opposition between speaking in one’s own voice and imagining someone else’s voice ( Shergill et al., 2001 ). This, however, would appear to confuse two separable dimensions: first, the extent to which an inner verbal representation is experienced as being articulated rather than being heard, and second, the extent to which a verbal representation has an identity belonging to self or other.

Instead, we advocate an alternative approach utilizing the model depicted in Figure 2 and incorporated into Figure 3 . On this view, inner speech and auditory imagery systems overlap in their use of phonological information from long-term memory, but at its core inner speech is an abstract linguistic code that shares more resources with overt speech production than does auditory imagery. Often, this will involve concurrent deployment of articulatory processes and phonological representations via the phonological loop, such that inner speech has a sensory-motor and auditory phenomenology of its own. In some circumstances condensed or abstracted inner speech may even be unpacked as an inner hearing experience, if no articulation is involved in its expansion.

Although this work has not yet been conducted, the cognitive and neural dimensions of the distinction between speaking and hearing could be assessed by incorporating items into self-report instruments such as the VISQ, and by attempting to capture such experiences spontaneously during neuroimaging ( Kühn et al., 2014 ). As with the suggestion above concerning experience-capture of dialogic inner speech in the scanner, use of a method such as DES to report on spontaneous occurrences of inner hearing could be correlated with ongoing brain activations in a way that would reveal the neural bases of the distinction.

What are the Relations Between Inner Speech and Mind-Wandering?

Experientially, much of everyday or spontaneous inner speech may also be thought to be similar to verbally based mind-wandering. The growth of interest in cognition in the resting state ( Andrews-Hanna, Reidler, Huang, & Buckner, 2010 ; Buckner, Andrews-Hanna, & Schacter, 2008 ) has recently been accompanied by a more specific interest in the particular modalities present in mind-wandering or stimulus-independent thought ( Delamillieure et al., 2010 ; Doucet et al., 2012 ; Gorgolewski et al., 2014 ). From the results of a semistructured questionnaire assessing subjective experience during fMRI, Delamillieure et al. (2010) reported that 17% of resting-state experiences described by their participants were language-based. It has been suggested by Perrone-Bertolotti et al. (2014) that verbal mind-wandering may involve an abstract form of inner speech while voluntary verbal thought may have a more concrete form, and that this distinction might map on to the anticorrelation between default mode network activation and task-positive activation of language networks. Although there is some preliminary evidence in support of such an idea ( Doucet et al., 2012 ), no studies to date have captured specifically verbal mind-wandering in action, and much of mind-wandering may also involve internal representation of other kinds, such as visual imagery. As such, the incidence of inner speech in the resting state remains largely unclear.

Nevertheless, the idea of concrete and abstract inner speech mapping on to voluntary inner speech and involuntary mind-wandering is an intriguing one, with potential overlaps with some other concepts described above, such as the distinction between condensed and expanded forms of inner speech. Fernyhough’s (2004) model would predict that resting-state inner speech would be predominantly condensed, as the theory holds that reexpansion happens when cognitive challenge increases. If there is no task, there is by definition no cognitive challenge, and thus the default mode of condensed inner speech would predominate.

Challenges for future research include developing improved methods of assessing subjective experience in the scanner that will allow a closer integration of mind-wandering phenomenology with information on neural activations. The methodology described by Kühn et al. (2014) suggests one possible approach to studying verbal mind-wandering using an experience sampling design. Given the proposed role for inner speech in resting-state cognition, there is also a need for functional connectivity studies focusing on how the inner speech network modulates the activities of the default mode network and various task-positive networks, in both healthy participants and patients with disorders such as schizophrenia.

Inner Speech and the Forward Model in Auditory Verbal Hallucinations

As previously noted, perhaps the most prominent use of inner speech as an explanatory concept is in the domain of auditory verbal hallucinations (AVHs). Fernyhough ( Fernyhough, 2004 ; Fernyhough & McCarthy-Jones, 2013 ) has argued that attention to the multifaceted nature of inner speech, particularly the distinction between its condensed and expanded forms, can be instructive in accounting for the paradoxical “alien yet self” quality of such experiences ( Leudar & Thomas, 2000 ). What the foregoing review highlights, in demonstrating the heterogeneity and complexity of these processes, is that disruptions to inner speech (resulting from or leading to psychopathology) are likely to have equally varied effects.

As reviewed in Adult Psychopathology, prominent models of AVH posit that voices arise from a failure of self-monitoring, whereby internal speech productions are misattributed to external sources. Hallucinations are posited to arise from a disruption to signals sent between areas responsible for speech production and perception (e.g., Broca’s area and Wernicke’s area). One criticism of such an explanation is that the voices heard in AVH do not resemble the person hearing them: they both feel alien and resemble the voices of other people, and say things that the hearer may not normally say. However, if inner speech is taken to be the “raw material” of AVH, then this will potentially involve many different kinds of speech representation, varying in phonological detail and identity depending on the personal experience of that individual. And if inner speech is a multicomponent phenomenon, then multiple resources will be recruited to represent more or less featurally rich inner speech, or inner speech in the voices of other people, meaning that many more pathways than a “typical” Broca–Wernicke network may be involved. Potential pathways suggested by MRI studies of hallucinations include right hemisphere homologues of language areas ( Sommer et al., 2008 ), hippocampal cortex ( Diederen, Neggers et al., 2010 ), and subcortical structures ( Hoffman, Fernandez, Pittman, & Hampson, 2011 ).

Evidence of social-cognitive involvement in dialogic inner speech raises intriguing questions as to the role of ToM in representing inner voices. If ToM is drawn upon to shape representations in inner speech, it could be that mental states, rather than verbal representations per se, are being misattributed in the case of AVH ( Bell, 2013 ; Wilkinson & Bell, in press ); that is, the input of ToM processes into internal monologue or dialogue could in themselves be disrupted or atypical (e.g., Koster-Hale & Saxe, 2013 ). Further research on the interrelation of involuntary inner speech and verbal mind-wandering may also shed light on this question, as both phenomena have implications for the sense of agency and ownership conferred on one’s own inner speech. Where this is disrupted, inner speech may feel like it is coming from another agent or entity, as is the case in examples of thought insertion ( Langland-Hassan, 2008 ).

A remaining conceptual question for such self-monitoring accounts of AVH is what part of this system should be equated to the experience of inner speech. Due to their basis in motor theory, such accounts typically posit that hallucinations arise from a mismatch in the comparison between an action and a forward model of its predicted sensory consequences: In the case of AVH, a mismatch between an episode of inner speech and its predicted state gives rise to an anomalous internal representation of speech. However, in contrast to overt speech, inner speech has no sensory consequences of its own by definition, leaving its position in self-monitoring accounts unclear.

One solution is to posit that, because inner speech uses many of the same speech-motor processes as overt speech, it is still accompanied by the issuing of a forward model. That is, if someone engages in inner speech (or, effectively, subvocal speech) but does not realize it, prediction signals will still be sent to sensory areas to create an experience of a voice. In contrast, several authors have recently proposed that the normal experience of inner speech in some way equates to either the forward model itself, or to the sensory outcomes it predicts. Scott (2013) presented evidence that generation of inner speech attenuates the perception of external sounds, consistent with the view that a sensory prediction of the utterance in question interferes with the perception of an external sound. In two experiments, Scott, Yeung, Gick, and Werker (2013) showed that generating inner speech “captured” the perception of ambiguous auditory sounds, suggesting the functioning of a forward model in shaping the perception of an incoming sensation, both when the vocalization was generated in inner speech and when it was silently mouthed (see Hubbard & Stoeckig, 1988 , for a similar example of priming effects during auditory imagery).
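The comparator logic at the heart of such self-monitoring accounts can be caricatured in a few lines. This is a deliberately toy sketch, not a model anyone has proposed: the character-matching "prediction error" and the fixed threshold are illustrative assumptions standing in for the graded sensory predictions these accounts actually posit.

```python
from dataclasses import dataclass

@dataclass
class Comparator:
    """Toy efference-copy comparator: a (sub)vocal motor command is
    mapped to a predicted percept; small prediction error yields a
    self-attributed (attenuated) experience, large error an 'alien' one."""
    threshold: float  # largest prediction error still attributed to self

    def predict(self, motor_command: str) -> str:
        # forward model: here simply an identity mapping from the
        # command to its expected sensory consequence
        return motor_command

    def attribute(self, motor_command: str, percept: str) -> str:
        predicted = self.predict(motor_command)
        n = max(len(predicted), len(percept), 1)
        # toy prediction error: proportion of mismatching characters
        error = sum(a != b for a, b in
                    zip(predicted.ljust(n), percept.ljust(n))) / n
        return "self (attenuated)" if error <= self.threshold else "external (alien)"

comparator = Comparator(threshold=0.2)
```

On this scheme, a hallucination corresponds to the "external" branch firing for speech the system itself generated, whether because the prediction is degraded, the comparison is noisy, or the threshold is miscalibrated.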

Elsewhere, Pickering and Garrod (2013) have proposed that inner speech might be a stripped-down product of forward models that enables the detection of errors in overt speech before they occur, while Oppenheim (2013) has suggested that inner speech may constitute an internal loop consisting entirely of forward model predictions, bypassing the need for recruitment of standard production and comprehension systems. When one considers Oppenheim and Dell’s (2010) findings of greater phonological detail in inner speech being associated with greater articulatory involvement, this would seem to fit with a conception of inner speech reflecting a predicted state with a level of featural detail that varies according to the degree of articulatory motor involvement.

Specifying the role of the predicted state in inner speech production is an important challenge for future research on the relations between everyday inner speech and atypical experiences such as AVH. In one respect, an account of inner speech as attenuated action is congruent with the Vygotskian view that it represents an internalized (and thus truncated) version of external social exchanges. Particular challenges include accounting for the varied phenomenology of inner speech (particularly the processes such as abbreviation and condensation that are proposed to accompany internalization), and explaining how inner speech has cognitive efficacy in domains such as the self-regulation of cognition and behavior, if indeed it is considered to have its basis in internal speech predictions. Finally, a further problem is that self-monitoring theories in general have been criticized for failing to account for the evidence from psychopathology, including the high variability of positive symptoms among patients ( Frith, 2012 ).

Do We Really Need Inner Speech?

A final question, again prompted by phenomenological investigation of inner speech, is whether we overestimate its presence and relevance. As Hurlburt et al. (2013) note, presuppositions about the ubiquity of inner speech may limit the accuracy of efforts to report on its incidence. Introspective methods such as DES tend to result in lower incidence ratings than self-report measures. Alderson-Day and Fernyhough (2014) argue that DES may underestimate the incidence of inner speech for various reasons, including that the DES method may not be sensitive to transformations such as condensation (although see Hurlburt & Heavey, 2015 ). There may be further, more profound reasons why differing assessments of inner experience can lead to such divergent characterizations of the phenomena. Hurlburt and Heavey (2015) argue that instruments such as the VISQ offer at best a self-theoretical description of any one participant’s inner experience. Based on their observations of participants’ first DES sessions, they propose that (at least until participants become appropriately skilled through engagement in an iterative process like DES) people are frequently misguided about their own experience ( Hurlburt et al., 2013 ). Although it seems counterintuitive to suggest that individuals can be wrong about their own experience (cf. Jack, 2004 ), the question of how training in reporting on one’s own inner experience might increase the accuracy of self-reports of inner speech remains an intriguing one for future research.

Whether or not Hurlburt is correct, inner speech would certainly appear important to many people’s subjective views of their own experience. Evidence from bilingualism points to inner speech in first and second languages being associated strongly with personal identity and history ( de Guerrero, 2005 ). Correspondingly, loss of inner speech following brain injury, perhaps through its influence on the self-narration that typically accompanies everyday experience, may lead to the diminution of a sense of self ( Morin, 2009b ). Evidence from cognitive studies also points to a prominent role for inner speech in a diverse range of functions, particularly in childhood. In adulthood, the cognitive benefit of verbalized strategies may wane or be superseded but, for many individuals, the importance of inner speech as a private activity at the core of experience would seem to remain.

Further Conceptual Issues

Researchers who have approached inner speech from a Vygotskian perspective have observed that such an approach can be valuable in accounting for the phenomenological richness and diversity of inner speech, along with its multifunctional properties. Several conceptual issues need to be resolved, however, before the value of the Vygotskian position can be fully assessed. One requirement is a more detailed specification of the important concept of internalization ( Fernyhough, 2008 ), where further progress is needed in characterizing the transition of socially configured functions from the interpsychological to the intrapsychological planes ( Fernyhough, 2009a ), along with the cortical reorganizations proposed by Vygotsky and Luria to accompany that process ( Fernyhough, 2010 ). In the case of inner speech, this problem translates into an issue of specifying the cognitive and neural processes underlying the transformations such as abbreviation proposed by Vygotsky.

Finally, it needs to be considered whether a richer account of inner speech, as outlined here, entails a claim for the constitutive involvement of language in thinking (e.g., Carruthers, 2002 ). Such a claim does not necessarily follow. The present characterization of inner speech may be more appropriately conceived as a model of how typically developing humans perform some forms of high-level cognition, without meaning that such processes necessarily require inner speech. Given the progress that remains to be made in studying this form of speech scientifically, any claim that this involvement of language is constitutive would be premature. In addition, we would hold that claims about the role of inner speech as a “language of thought” are fraught with difficulty ( Machery, 2005 ), through being largely untestable and often conceptually muddled.

Also remaining is the question of what, at root, inner speech is for. Adopting an evolutionary perspective, Agnati et al. (2012) have considered inner speech as an exaptation, in Gould and Vrba’s (1982) sense of a feature that has been diverted from its initial evolutionary “purpose” (see also Oppenheim, 2013). On Agnati et al.’s view, inner speech initially developed as a positive tool for planning and internal dialogue. AVHs and other putative pathologies of inner speech, such as rumination, could then be considered “mis-exaptations”: adaptations that have reached a degree of specialization that is deleterious to the organism. Although Agnati et al.’s account focuses on some of the psychopathological consequences of exaptation, the idea may be useful for thinking about inner speech more generally. To go further than Agnati et al., the initial “purpose” of inner speech may have related less to general cognitive planning than to supporting overt speech processing, by enabling internal phonological representation and planning of speech acts. Its exaptation, however, could have come in the application of inner speech to the range of other cognitive domains reviewed above, sometimes in clearly beneficial ways (such as thinking about the future, or regulating behavior) and sometimes in ways that are deleterious to other cognitive functions (such as during pathological worrying). In this sense, much of what we know of inner speech could illustrate its significance as an exaptation: a motor-based linguistic tool that has, by chance, created an inner life.

Conclusions

Inner speech is a paradoxical phenomenon. It is an experience that is central to many people’s everyday lives, and yet it presents considerable challenges to any effort to study it scientifically. Nevertheless, a wide range of methodologies and approaches have combined to shed light on the subjective experience of inner speech and its cognitive and neural underpinnings. In childhood, there is evidence for a central role for inner speech in regulating behavior and supporting complex cognitive functions. In adulthood, inner speech is implicated in many cognitive processes, but there appears to be wide interindividual variation in how inner speech is put to use, both cognitively and experientially. Furthering our knowledge of the range of ways in which inner speech can operate is a research priority, not just for its implications for understanding development, cognition, and psychopathology, but for drawing us toward a richer understanding of human beings’ inner lives.

References

  • Acheson D. J., Hamidi M., Binder J. R., & Postle B. R. (2011). A common neural substrate for language production and verbal working memory. Journal of Cognitive Neuroscience, 23, 1358–1367. 10.1162/jocn.2010.21519
  • Ackermann H., & Riecker A. (2004). The contribution of the insula to motor aspects of speech production: A review and a hypothesis. Brain and Language, 89, 320–328. 10.1016/S0093-934X(03)00347-X
  • Agnati L. F., Barlow P., Ghidoni R., Borroto-Escuela D. O., Guidolin D., & Fuxe K. (2012). Possible genetic and epigenetic links between human inner speech, schizophrenia and altruism. Brain Research, 1476, 38–57. 10.1016/j.brainres.2012.02.074
  • Alarcón-Rubio D., Sánchez-Medina J. A., & Winsler A. (2013). Private speech in illiterate adults: Cognitive functions, task difficulty, and literacy. Journal of Adult Development, 20, 100–111. 10.1007/s10804-013-9161-y
  • Alderson-Day B. (2011). Verbal problem-solving in autism spectrum disorders: A problem of plan construction? Autism Research, 4, 401–411. 10.1002/aur.222
  • Alderson-Day B., & Fernyhough C. (2014). More than one voice: Investigating the phenomenological properties of inner speech requires a variety of methods. Consciousness and Cognition, 24, 113–114. 10.1016/j.concog.2013.12.012
  • Alderson-Day B., McCarthy-Jones S., Bedford S., Collins H., Dunne H., Rooke C., & Fernyhough C. (2014). Shot through with voices: Dissociation mediates the relationship between varieties of inner speech and auditory hallucination proneness. Consciousness and Cognition, 27, 288–296. 10.1016/j.concog.2014.05.010
  • Alderson-Day B., Weis S., McCarthy-Jones S., Moseley P., Smailes D., & Fernyhough C. (2015). The brain’s conversation with itself: Neural substrates of dialogic inner speech. Manuscript submitted for publication.
  • Aleman A., Formisano E., Koppenhagen H., Hagoort P., de Haan E. H. F., & Kahn R. S. (2005). The functional neuroanatomy of metrical stress evaluation of perceived and imagined spoken words. Cerebral Cortex, 15, 221–228. 10.1093/cercor/bhh124
  • Alexander J. D., & Nygaard L. C. (2008). Reading voices and hearing text: Talker-specific auditory imagery in reading. Journal of Experimental Psychology: Human Perception and Performance, 34, 446–459. 10.1037/0096-1523.34.2.446
  • Allen P., Modinos G., Hubl D., Shields G., Cachia A., Jardri R., … Hoffman R. (2012). Neuroimaging auditory hallucinations in schizophrenia: From neuroanatomy to neurochemistry and beyond. Schizophrenia Bulletin, 38, 695–703. 10.1093/schbul/sbs066
  • Alloway T. P., Gathercole S. E., & Pickering S. J. (2006). Verbal and visuospatial short-term and working memory in children: Are they separable? Child Development, 77, 1698–1716. 10.1111/j.1467-8624.2006.00968.x
  • Al-Namlah A. S., Fernyhough C., & Meins E. (2006). Sociocultural influences on the development of verbal mediation: Private speech and phonological recoding in Saudi Arabian and British samples. Developmental Psychology, 42, 117–131. 10.1037/0012-1649.42.1.117
  • Al-Namlah A. S., Meins E., & Fernyhough C. (2012). Self-regulatory private speech relates to children’s recall and organization of autobiographical memories. Early Childhood Research Quarterly, 27, 441–446. 10.1016/j.ecresq.2012.02.005
  • Alogna V. K., Attaya M. K., Aucoin P., Bahnik S., Birch S., Birt A. R., … Zwaan R. A. (2014). Registered replication report: Schooler & Engstler-Schooler (1990). Perspectives on Psychological Science, 9, 556–578. 10.1177/1745691614545653
  • Andreatta R. D., Stemple J. C., Joshi A., & Jiang Y. (2010). Task-related differences in temporo-parietal cortical activation during human phonatory behaviors. Neuroscience Letters, 484, 51–55. 10.1016/j.neulet.2010.08.017
  • Andrews-Hanna J. R., Reidler J. S., Huang C., & Buckner R. L. (2010). Evidence for the default network’s role in spontaneous cognition. Journal of Neurophysiology, 104, 322–335. 10.1152/jn.00830.2009
  • Atencio D. J., & Montero I. (2009). Private speech and motivation: The role of language in a sociocultural account of motivational processes. In Winsler A., Fernyhough C., & Montero I. (Eds.), Private speech, executive functioning, and the development of verbal self-regulation (pp. 201–223). Cambridge, UK: Cambridge University Press. 10.1017/CBO9780511581533.017
  • Atkinson J. R. (2006). The perceptual characteristics of voice-hallucinations in deaf people: Insights into the nature of subvocal thought and sensory feedback loops. Schizophrenia Bulletin, 32, 701–708. 10.1093/schbul/sbj063
  • Atkinson J. R., Gleeson K., Cromwell J., & O’Rourke S. (2007). Exploring the perceptual characteristics of voice-hallucinations in deaf people. Cognitive Neuropsychiatry, 12, 339–361. 10.1080/13546800701238229
  • Baars B. (2003). How brain reveals mind: Neural studies support the fundamental role of conscious experience. Journal of Consciousness Studies, 10(9–10).
  • Bacon A. M., Handley S. J., & Newstead S. E. (2005). Verbal and spatial strategies in reasoning. In Newton E. & Roberts M. (Eds.), Methods of thought: Individual differences in reasoning strategies (pp. 80–105). Hove, UK: Psychology Press.
  • Baddeley A. (1992). Working memory. Science, 255, 556–559. 10.1126/science.1736359
  • Baddeley A. (1996). Exploring the central executive. The Quarterly Journal of Experimental Psychology, 49, 5–28. 10.1080/713755608
  • Baddeley A. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4, 417–423. 10.1016/S1364-6613(00)01538-2
  • Baddeley A. (2012). Working memory: Theories, models, and controversies. Annual Review of Psychology, 63, 1–29. 10.1146/annurev-psych-120710-100422
  • Baddeley A., Chincotta D., & Adlam A. (2001). Working memory and the control of action: Evidence from task switching. Journal of Experimental Psychology: General, 130, 641–657. 10.1037/0096-3445.130.4.641
  • Baddeley A., Gathercole S., & Papagno C. (1998). The phonological loop as a language learning device. Psychological Review, 105, 158–173. 10.1037/0033-295X.105.1.158
  • Baddeley A., & Hitch G. (1974). Working memory. In Bower G. H. (Ed.), Psychology of learning and motivation (pp. 47–89). New York, NY: Academic Press.
  • Baddeley A., & Larsen J. D. (2007). The phonological loop: Some answers and some questions. The Quarterly Journal of Experimental Psychology, 60, 512–518. 10.1080/17470210601147663
  • Baddeley A., Lewis V., & Vallar G. (1984). Exploring the articulatory loop. The Quarterly Journal of Experimental Psychology, 36, 233–252. 10.1080/14640748408402157
  • Baddeley A., & Logie R. H. (1999). Working memory: The multiple component model. In Miyake A. & Shah P. (Eds.), Models of working memory: Mechanisms of active maintenance and executive control (pp. 28–61). Cambridge, UK: Cambridge University Press. 10.1017/CBO9781139174909.005
  • Baddeley A., Thomson N., & Buchanan M. (1975). Word length and the structure of short-term memory. Journal of Verbal Learning and Verbal Behavior, 14, 575–589. 10.1016/S0022-5371(75)80045-4
  • Baldo J. V., Bunge S. A., Wilson S. M., & Dronkers N. F. (2010). Is relational reasoning dependent on language? A voxel-based lesion symptom mapping study. Brain and Language, 113, 59–64. 10.1016/j.bandl.2010.01.004
  • Baldo J. V., Dronkers N. F., Wilkins D., Ludy C., Raskin P., & Kim J. (2005). Is problem solving dependent on language? Brain and Language, 92, 240–250. 10.1016/j.bandl.2004.06.103
  • Baron-Cohen S., Leslie A. M., & Frith U. (1985). Does the autistic child have a “theory of mind”? Cognition, 21, 37–46. 10.1016/0010-0277(85)90022-8
  • Baron-Cohen S., Wheelwright S., Skinner R., Martin J., & Clubley E. (2001). The autism-spectrum quotient (AQ): Evidence from Asperger syndrome/high-functioning autism, males and females, scientists and mathematicians. Journal of Autism and Developmental Disorders, 31, 5–17. 10.1023/A:1005653411471
  • Behar E., Zuellig A. R., & Borkovec T. D. (2005). Thought and imaginal activity during worry and trauma recall. Behavior Therapy, 36, 157–168. 10.1016/S0005-7894(05)80064-4
  • Bell V. (2013). A community of one: Social cognition and auditory verbal hallucinations. PLoS Biology, 11, e1001723. 10.1371/journal.pbio.1001723
  • Bentall R. P. (1990). The illusion of reality: A review and integration of psychological research on hallucinations. Psychological Bulletin, 107, 82–95. 10.1037/0033-2909.107.1.82
  • Bergen B. K. (2012). Louder than words: The new science of how the mind makes meaning. New York, NY: Basic Civitas Books.
  • Berk L. (1986). Relationship of elementary school children’s private speech to behavioral accompaniment to task, attention, and task performance. Developmental Psychology, 22, 671–680. 10.1037/0012-1649.22.5.671
  • Berk L. (1992). Children’s private speech: An overview of theory and the status of research. In Diaz R. M. & Berk L. E. (Eds.), Private speech: From social interaction to self-regulation (pp. 17–53). Hillsdale, NJ: Erlbaum.
  • Berk L., & Garvin R. A. (1984). Development of private speech among low-income Appalachian children. Developmental Psychology, 20, 271–286. 10.1037/0012-1649.20.2.271
  • Berk L. E., & Potts M. K. (1991). Development and functional significance of private speech among attention-deficit hyperactivity disordered and normal boys. Journal of Abnormal Child Psychology, 19, 357–377. 10.1007/BF00911237
  • Borst G., Niven E., & Logie R. H. (2012). Visual mental image generation does not overlap with visual short-term memory: A dual-task interference study. Memory & Cognition, 40, 360–372. 10.3758/s13421-011-0151-7
  • Brinthaupt T. M., Hein M. B., & Kramer T. E. (2009). The self-talk scale: Development, factor analysis, and validation. Journal of Personality Assessment, 91, 82–92. 10.1080/00223890802484498
  • Brocklehurst P. H., & Corley M. (2011). Investigating the inner speech of people who stutter: Evidence for (and against) the covert repair hypothesis. Journal of Communication Disorders, 44, 246–260. 10.1016/j.jcomdis.2010.11.004
  • Brookwell M. L., Bentall R. P., & Varese F. (2013). Externalizing biases and hallucinations in source-monitoring, self-monitoring and signal detection studies: A meta-analytic review. Psychological Medicine, 43, 2465–2475. 10.1017/S0033291712002760
  • Brown T. C., & Latham G. P. (2006). The effect of training in verbal self-guidance on performance effectiveness in a MBA program. Canadian Journal of Behavioural Science, 38, 1–11. 10.1037/h0087266
  • Buckner R. L., Andrews-Hanna J. R., & Schacter D. L. (2008). The brain’s default network: Anatomy, function, and relevance to disease. Annals of the New York Academy of Sciences, 1124, 1–38. 10.1196/annals.1440.011
  • Burnett P. C. (1994). Self-talk in upper elementary school children: Its relationship with irrational beliefs, self-esteem, and depression. Journal of Rational-Emotive & Cognitive-Behavior Therapy, 12, 181–188. 10.1007/BF02354595
  • Calvete E., Estévez A., Landín C., Martínez Y., Cardeñoso O., Villardón L., & Villa A. (2005). Self-talk and affective problems in college students: Valence of thinking and cognitive content specificity. The Spanish Journal of Psychology, 8, 56–67. 10.1017/S1138741600004960
  • Camos V., Mora G., & Barrouillet P. (2013). Phonological similarity effect in complex span task. The Quarterly Journal of Experimental Psychology, 66, 1927–1950. 10.1080/17470218.2013.768275
  • Carruthers P. (2002). The cognitive functions of language. Behavioral and Brain Sciences, 25, 657–674. 10.1017/S0140525X02000122
  • Cheetham J. M., Rahm B., Kaller C. P., & Unterrainer J. M. (2012). Visuospatial over verbal demands in predicting Tower of London planning tasks. British Journal of Psychology, 103, 98–116. 10.1111/j.2044-8295.2011.02049.x
  • Chin J. M., & Schooler J. W. (2008). Why do words hurt? Content, process, and criterion shift accounts of verbal overshadowing. European Journal of Cognitive Psychology, 20, 396–413. 10.1080/09541440701728623
  • Conrad R. (1971). The chronology of the development of covert speech in children. Developmental Psychology, 5, 398–405. 10.1037/h0031595
  • Conrad R., & Hull A. J. (1964). Information, acoustic confusion and memory span. British Journal of Psychology, 55, 429–432. 10.1111/j.2044-8295.1964.tb00928.x
  • Corkum P., Humphries K., Mullane J. C., & Theriault F. (2008). Private speech in children with ADHD and their typically developing peers during problem-solving and inhibition tasks. Contemporary Educational Psychology, 33, 97–115. 10.1016/j.cedpsych.2006.12.003
  • Corley M., Brocklehurst P. H., & Moat H. S. (2011). Error biases in inner and overt speech: Evidence from tongue twisters. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37, 162–175. 10.1037/a0021321
  • Cragg L., & Nation K. (2010). Language and the development of cognitive control. Topics in Cognitive Science, 2, 631–642. 10.1111/j.1756-8765.2009.01080.x
  • Csikszentmihalyi M., & Larson R. (1987). Validity and reliability of the experience-sampling method. Journal of Nervous and Mental Disease, 175, 526–536. 10.1097/00005053-198709000-00004
  • Damianova M. K., Lucas M., & Sullivan G. B. (2012). Verbal mediation of problem solving in pre-primary and junior primary school children. South African Journal of Psychology, 42, 445–455. 10.1177/008124631204200316
  • D’Argembeau A., Renaud O., & Van der Linden M. (2011). Frequency, characteristics and functions of future-oriented thoughts in daily life. Applied Cognitive Psychology, 25, 96–103. 10.1002/acp.1647
  • Davis P. E., Meins E., & Fernyhough C. (2013). Individual differences in children’s private speech: The role of imaginary companions. Journal of Experimental Child Psychology, 116, 561–571. 10.1016/j.jecp.2013.06.010
  • Day K. L., & Smith C. L. (2013). Understanding the role of private speech in children’s emotion regulation. Early Childhood Research Quarterly, 28, 405–414. 10.1016/j.ecresq.2012.10.003
  • de Guerrero M. C. M. (2005). Inner speech—L2: Thinking words in a second language. New York, NY: Springer. 10.1007/b106255
  • Delamillieure P., Doucet G., Mazoyer B., Turbelin M.-R., Delcroix N., Mellet E., … Joliot M. (2010). The resting state questionnaire: An introspective questionnaire for evaluation of inner experience during the conscious resting state. Brain Research Bulletin, 81, 565–573. 10.1016/j.brainresbull.2009.11.014
  • Diaz R. M., & Berk L. E. (1992). Private speech: From social interaction to self-regulation. Hillsdale, NJ: Erlbaum.
  • Diaz R. M., & Berk L. (1995). A Vygotskian critique of self-instructional training. Development and Psychopathology, 7, 369–392. 10.1017/S0954579400006568
  • Diederen K. M. J., De Weijer A. D., Daalman K., Blom J. D., Neggers S. F. W., Kahn R. S., & Sommer I. E. C. (2010). Decreased language lateralization is characteristic of psychosis, not auditory hallucinations. Brain: A Journal of Neurology, 133, 3734–3744. 10.1093/brain/awq313
  • Diederen K. M. J., Neggers S. F. W., Daalman K., Blom J. D., Goekoop R., Kahn R. S., & Sommer I. E. C. (2010). Deactivation of the parahippocampal gyrus preceding auditory hallucinations in schizophrenia. The American Journal of Psychiatry, 167, 427–435. 10.1176/appi.ajp.2009.09040456
  • Dodgson G., & Gordon S. (2009). Avoiding false negatives: Are some auditory hallucinations an evolved design flaw? Behavioural and Cognitive Psychotherapy, 37, 325–334. 10.1017/S1352465809005244
  • Dolcos S., & Albarracín D. (2014). The inner speech of behavioral regulation: Intentions and task performance strengthen when you talk to yourself as a you. European Journal of Social Psychology, 44, 636–642. 10.1002/ejsp.2048
  • Doucet G., Naveau M., Petit L., Delcroix N., Zago L., Crivello F., … Joliot M. (2011). Brain activity at rest: A multiscale hierarchical functional organization. Journal of Neurophysiology, 105, 2753–2763. 10.1152/jn.00895.2010
  • Doucet G., Naveau M., Petit L., Zago L., Crivello F., Jobard G., … Joliot M. (2012). Patterns of hemodynamic low-frequency oscillations in the brain are modulated by the nature of free thought during rest. NeuroImage, 59, 3194–3200. 10.1016/j.neuroimage.2011.11.059
  • du Feu M., & McKenna P. J. (1999). Prelingually profoundly deaf schizophrenic patients who hear voices: A phenomenological analysis. Acta Psychiatrica Scandinavica, 99, 453–459. 10.1111/j.1600-0447.1999.tb00992.x
  • Duncan R. M., & Cheyne J. A. (1999). Incidence and functions of self-reported private speech in young adults: A self-verbalization questionnaire. Canadian Journal of Behavioural Science, 31, 133–136. 10.1037/h0087081
  • Duncan R. M., & Cheyne J. A. (2001). Private speech in young adults: Task difficulty, self-regulation, and psychological predication. Cognitive Development, 16, 889–906. 10.1016/S0885-2014(01)00069-7
  • Duncan R. M., & Tarulli D. (2009). On the persistence of private speech: Empirical and theoretical considerations. In Winsler A., Fernyhough C., & Montero I. (Eds.), Private speech, executive functioning, and the development of verbal self-regulation (pp. 176–187). Cambridge, UK: Cambridge University Press. 10.1017/CBO9780511581533.015
  • Emerson M. J., & Miyake A. (2003). The role of inner speech in task switching: A dual-task investigation. Journal of Memory and Language, 48, 148–168. 10.1016/S0749-596X(02)00511-9
  • Fatzer S. T., & Roebers C. M. (2012). Language and executive functions: The effect of articulatory suppression on executive functioning in children. Journal of Cognition and Development, 13, 454–472. 10.1080/15248372.2011.608322
  • Fegen D., Buchsbaum B. R., & D’Esposito M. (2015). The effect of rehearsal rate and memory load on verbal working memory. NeuroImage, 105, 120–131. 10.1016/j.neuroimage.2014.10.034
  • Feinberg I. (1978). Efference copy and corollary discharge: Implications for thinking and its disorders. Schizophrenia Bulletin, 4, 636–640. 10.1093/schbul/4.4.636
  • Fell J. (2013). Unraveling inner experiences during resting state. Frontiers in Human Neuroscience, 7, 409. 10.3389/fnhum.2013.00409
  • Fernyhough C. (1996). The dialogic mind: A dialogic approach to the higher mental functions. New Ideas in Psychology, 14, 47–62. 10.1016/0732-118X(95)00024-B
  • Fernyhough C. (2004). Alien voices and inner dialogue: Towards a developmental account of auditory verbal hallucinations. New Ideas in Psychology, 22, 49–68. 10.1016/j.newideapsych.2004.09.001
  • Fernyhough C. (2008). Getting Vygotskian about theory of mind: Mediation, dialogue, and the development of social understanding. Developmental Review, 28, 225–262. 10.1016/j.dr.2007.03.001
  • Fernyhough C. (2009a). Dialogic thinking. In Winsler A., Fernyhough C., & Montero I. (Eds.), Private speech, executive functioning, and the development of verbal self-regulation (pp. 42–52). Cambridge, UK: Cambridge University Press.
  • Fernyhough C. (2009b). What can we say about the inner experience of the young child? (Commentary on Carruthers). Behavioral and Brain Sciences, 32, 143–144. 10.1017/S0140525X09000612
  • Fernyhough C. (2010). Vygotsky, Luria, and the social brain. In Sokol B., Müller U., Carpendale J., Young A., & Iarocci G. (Eds.), Self- and social-regulation: Exploring the relations between social interaction, social cognition, and the development of executive functions (pp. 56–79). Oxford, UK: Oxford University Press. 10.1093/acprof:oso/9780195327694.003.0003
  • Fernyhough C. (2013). Inner speech. In Pashler H. (Ed.), The encyclopedia of the mind (Vol. 9, pp. 418–420). Thousand Oaks, CA: Sage Publications. 10.4135/9781452257044.n155
  • Fernyhough C., & Fradley E. (2005). Private speech on an executive task: Relations with task difficulty and task performance. Cognitive Development, 20, 103–120. 10.1016/j.cogdev.2004.11.002
  • Fernyhough C., & McCarthy-Jones S. (2013). Thinking aloud about mental voices. In Macpherson F. & Platchias D. (Eds.), Hallucination: Philosophy and psychology (pp. 87–104). Cambridge, MA: MIT Press. 10.7551/mitpress/9780262019200.003.0005
  • Fernyhough C., & Meins E. (2009). Private speech and theory of mind: Evidence for developing interfunctional relations. In Winsler A., Fernyhough C., & Montero I. (Eds.), Private speech, executive functioning, and the development of verbal self-regulation (pp. 95–104). Cambridge, UK: Cambridge University Press. 10.1017/CBO9780511581533.008
  • Fernyhough C., & Russell J. (1997). Distinguishing one’s own voice from those of others: A function for private speech? International Journal of Behavioral Development, 20, 651–665. 10.1080/016502597385108
  • Filik R., & Barber E. (2011). Inner speech during silent reading reflects the reader’s regional accent. PLoS ONE, 6, e25782. 10.1371/journal.pone.0025782
  • Flavell J. H., Flavell E. R., & Green F. L. (2001). Development of children’s understanding of connections between thinking and feeling. Psychological Science, 12, 430–432. 10.1111/1467-9280.00379
  • Flavell J. H., Green F. L., & Flavell E. R. (1993). Children’s understanding of the stream of consciousness. Child Development, 64, 387–398. 10.2307/1131257
  • Ford J. M., Dierks T., Fisher D. J., Herrmann C. S., Hubl D., Kindler J., … van Lutterveld R. (2012). Neurophysiological studies of auditory verbal hallucinations. Schizophrenia Bulletin, 38, 715–723. 10.1093/schbul/sbs009
  • Ford J. M., & Mathalon D. H. (2004). Electrophysiological evidence of corollary discharge dysfunction in schizophrenia during talking and thinking. Journal of Psychiatric Research, 38, 37–46. 10.1016/S0022-3956(03)00095-5
  • Forgeot d’Arc B., & Ramus F. (2011). Belief attribution despite verbal interference. The Quarterly Journal of Experimental Psychology, 64, 975–990. 10.1080/17470218.2010.524413
  • Frith C. (1992). The cognitive neuropsychology of schizophrenia. Hove, UK: Psychology Press.
  • Frith C. (2012). Explaining delusions of control: The comparator model 20 years on. Consciousness and Cognition, 21, 52–54. 10.1016/j.concog.2011.06.010
  • Furrow D. (1992). Developmental trends in the differentiation of social and private speech. In Diaz R. M. & Berk L. E. (Eds.), Private speech: From social interaction to self-regulation (pp. 143–158). Hove, UK: Erlbaum.
  • Furth H. G. (1964). Research with the deaf: Implications for language and cognition. Psychological Bulletin, 62, 145–164. 10.1037/h0046080
  • Gathercole S. E. (1998). The development of memory. Journal of Child Psychology and Psychiatry, 39, 3–27. 10.1017/S0021963097001753
  • Gathercole S. E., & Hitch G. J. (1993). Developmental changes in short-term memory: A revised working memory perspective. In Collins A. F., Gathercole S. E., Conway M. A., & Morris P. E. (Eds.), Theories of memory (pp. 189–209). Hillsdale, NJ: Erlbaum.
  • Gathercole S. E., Pickering S. J., Ambridge B., & Wearing H. (2004). The structure of working memory from 4 to 15 years of age. Developmental Psychology, 40, 177–190. 10.1037/0012-1649.40.2.177
  • Geva S., Bennett S., Warburton E. A., & Patterson K. (2011). Discrepancy between inner and overt speech: Implications for post-stroke aphasia and normal language processing. Aphasiology, 25, 323–343. 10.1080/02687038.2010.511236
  • Geva S., Jones P. S., Crinion J. T., Price C. J., Baron J.-C., & Warburton E. A. (2011). The neural correlates of inner speech defined by voxel-based lesion-symptom mapping. Brain: A Journal of Neurology, 134, 3071–3082. 10.1093/brain/awr232
  • Gilhooly K. J. (2005). Working memory and strategies in reasoning. In Newton E. & Roberts M. (Eds.), Methods of thought: Individual differences in reasoning strategies (pp. 57–80). Hove, UK: Psychology Press.
  • Gilhooly K. J., Wynn V., Phillips L. H., Logie R., & Della Sala S. (2002). Visuo-spatial and verbal working memory in the five-disc Tower of London task: An individual differences approach. Thinking & Reasoning, 8, 165–178. 10.1080/13546780244000006
  • Gorgolewski K. J., Lurie D., Urchs S., Kipping J. A., Craddock R. C., Milham M. P., … Smallwood J. (2014). A correspondence between individual differences in the brain’s intrinsic functional architecture and the content and form of self-generated thoughts. PLoS ONE, 9, e97176. 10.1371/journal.pone.0097176
  • Goudena P. P. (1987). The social nature of private speech of preschoolers during problem solving. International Journal of Behavioral Development, 10, 187–206. 10.1177/016502548701000204
  • Goudena P. P. (1992). The problem of abbreviation and internalization of private speech. In Diaz R. M. & Berk L. (Eds.), Private speech: From social interaction to self-regulation (pp. 215–224). Hillsdale, NJ: Erlbaum.
  • Gould S., & Vrba E. (1982). Exaptation: A missing term in the science of form. Paleobiology, 8, 4–15.
  • Grandin T. (1995). How people with autism think. In Schopler E. & Mesibov G. B. (Eds.), Learning and cognition in autism (pp. 137–156). New York, NY: Springer. 10.1007/978-1-4899-1286-2_8
  • Grush R. (2004). The emulation theory of representation: Motor control, imagery, and perception. Behavioral and Brain Sciences, 27, 377–396. 10.1017/S0140525X04000093
  • Hardy J. (2006). Speaking clearly: A critical review of the self-talk literature. Psychology of Sport and Exercise, 7, 81–97. 10.1016/j.psychsport.2005.04.002
  • Hardy J., Hall C. R., & Hardy L. (2005). Quantifying athlete self-talk. Journal of Sports Sciences, 23, 905–917. 10.1080/02640410500130706
  • Hartsuiker R. J., & Kolk H. H. J. (2001). Error monitoring in speech production: A computational test of the perceptual loop theory. Cognitive Psychology, 42, 113–157. 10.1006/cogp.2000.0744
  • Hatzigeorgiadis A., Zourbanos N., Mpoumpaki S., & Theodorakis Y. (2009). Mechanisms underlying the self-talk–performance relationship: The effects of motivational self-talk on self-confidence and anxiety. Psychology of Sport and Exercise, 10, 186–192. 10.1016/j.psychsport.2008.07.009
  • Heaton R. K. (1993). Wisconsin Card Sorting Test manual. Odessa, FL: Psychological Assessment Resources.
  • Henry L. A., Messer D., Luger-Klein S., & Crane L. (2012). Phonological, visual, and semantic coding strategies and children’s short-term picture memory span. The Quarterly Journal of Experimental Psychology, 65, 2033–2053. 10.1080/17470218.2012.672997
  • Hermer-Vazquez L., Spelke E. S., & Katsnelson A. S. (1999). Sources of flexibility in human cognition: Dual-task studies of space and language. Cognitive Psychology, 39, 3–36. 10.1006/cogp.1998.0713
  • Hickok G. (2012). Computational neuroanatomy of speech production . Nature Reviews Neuroscience , 13 , 135–145. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Hickok G., & Poeppel D. (2007). The cortical organization of speech processing . Nature Reviews Neuroscience , 8 , 393–402. 10.1038/nrn2113 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hitch G. J., Halliday M. S., Schaafstal A. M., & Heffernan T. M. (1991). Speech, “inner speech,” and the development of short-term memory: Effects of picture labeling on recall . Journal of Experimental Child Psychology , 51 , 220–234. 10.1016/0022-0965(91)90033-O [ PubMed ] [ CrossRef ] [ Google Scholar ]
• Hoffman R. E., Fernandez T., Pittman B., & Hampson M. (2011). Elevated functional connectivity along a corticostriatal loop and the mechanism of auditory/verbal hallucinations in patients with schizophrenia. Biological Psychiatry, 69, 407–414. 10.1016/j.biopsych.2010.09.050
• Holland L., & Low J. (2010). Do children with autism use inner speech and visuospatial resources for the service of executive control? Evidence from suppression in dual tasks. The British Journal of Developmental Psychology, 28, 369–391. 10.1348/026151009X424088
• Holmes E. A., Lang T. J., & Shah D. M. (2009). Developing interpretation bias modification as a “cognitive vaccine” for depressed mood: Imagining positive events makes you feel better than thinking about them verbally. Journal of Abnormal Psychology, 118, 76–88. 10.1037/a0012590
• Holmes E. A., Mathews A., Dalgleish T., & Mackintosh B. (2006). Positive interpretation training: Effects of mental imagery versus verbal training on positive mood. Behavior Therapy, 37, 237–247. 10.1016/j.beth.2006.02.002
• Hubbard T. L. (2010). Auditory imagery: Empirical findings. Psychological Bulletin, 136, 302–329. 10.1037/a0018436
• Hubbard T. L. (2013). Auditory imagery contains more than audition. In Lacey S. & Lawson R. (Eds.), Multisensory imagery (pp. 221–248). New York, NY: Springer. 10.1007/978-1-4614-5879-1_12
• Hubbard T. L., & Stoeckig K. (1988). Musical imagery: Generation of tones and chords. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 656–667. 10.1037/0278-7393.14.4.656
• Hughes C., Russell J., & Robbins T. W. (1994). Evidence for executive dysfunction in autism. Neuropsychologia, 32, 477–492. 10.1016/0028-3932(94)90092-2
• Hurlburt R. T., Happé F., & Frith U. (1994). Sampling the form of inner experience in three adults with Asperger syndrome. Psychological Medicine, 24, 385–395. 10.1017/S0033291700027367
• Hurlburt R. T., & Heavey C. L. (2006). Exploring inner experience: The descriptive experience sampling method. Amsterdam, The Netherlands: John Benjamins. 10.1075/aicr.64
• Hurlburt R. T., & Heavey C. L. (2015). Investigating pristine inner experience: Implications for experience sampling and questionnaires. Consciousness and Cognition, 31, 148–159. 10.1016/j.concog.2014.11.002
• Hurlburt R. T., Heavey C. L., & Kelsey J. M. (2013). Toward a phenomenology of inner speaking. Consciousness and Cognition, 22, 1477–1494. 10.1016/j.concog.2013.10.003
• Hurlburt R. T., & Schwitzgebel E. (2007). Describing inner experience? Proponent meets skeptic. Cambridge, MA: MIT Press.
• Jack A. I. (2004). Trusting the subject? The use of introspective evidence in cognitive science. Thorverton, UK: Imprint Academic.
• Jardri R., Pouchet A., Pins D., & Thomas P. (2011). Cortical activations during auditory verbal hallucinations in schizophrenia: A coordinate-based meta-analysis. The American Journal of Psychiatry, 168, 73–81. 10.1176/appi.ajp.2010.09101522
• Jarrold C., & Citroën R. (2013). Reevaluating key evidence for the development of rehearsal: Phonological similarity effects in children are subject to proportional scaling artifacts. Developmental Psychology, 49, 837–847. 10.1037/a0028771
• Jarrold C., & Tam H. (2010). Rehearsal and the development of working memory. In Barrouillet P. & Gaillard V. (Eds.), Cognitive development and working memory: A dialogue between neo-Piagetian and cognitive approaches (pp. 177–199). Hove, UK: Psychology Press.
• Johns L. C., Kompus K., Connell M., Humpston C., Lincoln T. M., Longden E., … Larøi F. (2014). Auditory verbal hallucinations in persons with and without a need for care. Schizophrenia Bulletin, 40, S255–S264. 10.1093/schbul/sbu005
• Jones S. R. (2010). Do we need multiple models of auditory verbal hallucinations? Examining the phenomenological fit of cognitive and neurological models. Schizophrenia Bulletin, 36, 566–575. 10.1093/schbul/sbn129
• Jones S. R., & Fernyhough C. (2007). Neural correlates of inner speech and auditory verbal hallucinations: A critical review and theoretical integration. Clinical Psychology Review, 27, 140–154. 10.1016/j.cpr.2006.10.001
• Kendall P. C., & Treadwell K. R. H. (2007). The role of self-statements as a mediator in treatment for youth with anxiety disorders. Journal of Consulting and Clinical Psychology, 75, 380–389. 10.1037/0022-006X.75.3.380
• Klinger E., & Cox W. M. (1987). Dimensions of thought flow in everyday life. Imagination, Cognition and Personality, 7, 105–128. 10.2190/7K24-G343-MTQW-115V
• Kohlberg L., Yaeger J., & Hjertholm E. (1968). Private speech: Four studies and a review of theories. Child Development, 39, 691–736. 10.2307/1126979
• Kopecky H., Chang H. T., Klorman R., Thatcher J. E., & Borgstedt A. D. (2005). Performance and private speech of children with attention-deficit/hyperactivity disorder while taking the Tower of Hanoi test: Effects of depth of search, diagnostic subtype, and methylphenidate. Journal of Abnormal Child Psychology, 33, 625–638. 10.1007/s10802-005-6742-7
• Kosslyn S. M., Reiser B. J., Farah M. J., & Fliegel S. L. (1983). Generating visual images: Units and relations. Journal of Experimental Psychology: General, 112, 278–303. 10.1037/0096-3445.112.2.278
• Koster-Hale J., & Saxe R. (2013). Theory of mind: A neural prediction problem. Neuron, 79, 836–848. 10.1016/j.neuron.2013.08.020
• Kraemer D. J. M., Macrae C. N., Green A. E., & Kelley W. M. (2005). Musical imagery: Sound of silence activates auditory cortex. Nature, 434, 158. 10.1038/434158a
• Krashen S. D. (1983). The din in the head, input, and the language acquisition device. Foreign Language Annals, 16, 41–44. 10.1111/j.1944-9720.1983.tb01422.x
• Kray J., Eber J., & Karbach J. (2008). Verbal self-instructions in task switching: A compensatory tool for action-control deficits in childhood and old age? Developmental Science, 11, 223–236. 10.1111/j.1467-7687.2008.00673.x
• Kray J., Gaspard H., Karbach J., & Blaye A. (2013). Developmental changes in using verbal self-cueing in task-switching situations: The impact of task practice and task-sequencing demands. Frontiers in Psychology, 4, 940. 10.3389/fpsyg.2013.00940
• Kray J., Kipp K. H., & Karbach J. (2009). The development of selective inhibitory control: The influence of verbal labeling. Acta Psychologica, 130, 48–57. 10.1016/j.actpsy.2008.10.006
• Kühn S., & Gallinat J. (2012). Quantitative meta-analysis on state and trait aspects of auditory verbal hallucinations in schizophrenia. Schizophrenia Bulletin, 38, 779–786. 10.1093/schbul/sbq152
• Kühn S., Fernyhough C., Alderson-Day B., & Hurlburt R. T. (2014). Inner experience in the scanner: Can high fidelity apprehensions of inner experience be integrated with fMRI? Frontiers in Psychology, 5, 1393. 10.3389/fpsyg.2014.01393
• Kunda M., & Goel A. K. (2010). Thinking in pictures as a cognitive account of autism. Journal of Autism and Developmental Disorders, 41, 1157–1177. 10.1007/s10803-010-1137-1
• Kuvalja M., Verma M., & Whitebread D. (2014). Patterns of co-occurring non-verbal behaviour and self-directed speech: A comparison of three methodological approaches. Metacognition and Learning, 9, 87–111. 10.1007/s11409-013-9106-7
• Langland-Hassan P. (2008). Fractured phenomenologies: Thought insertion, inner speech, and the puzzle of extraneity. Mind & Language, 23, 369–401. 10.1111/j.1468-0017.2008.00348.x
• Larøi F., Sommer I. E., Blom J. D., Fernyhough C., ffytche D. H., Hugdahl K., … Waters F. (2012). The characteristic features of auditory verbal hallucinations in clinical and nonclinical groups: State-of-the-art overview and future directions. Schizophrenia Bulletin, 38, 724–733. 10.1093/schbul/sbs061
• Larrain A., & Haye A. (2012). The discursive nature of inner speech. Theory & Psychology, 22, 3–22. 10.1177/0959354311423864
• Larsen S. F., Schrauf R. W., Fromholt P., & Rubin D. C. (2002). Inner speech and bilingual autobiographical memory: A Polish-Danish cross-cultural study. Memory, 10, 45–54. 10.1080/09658210143000218
• Law A. S., Trawley S. L., Brown L. A., Stephens A. N., & Logie R. H. (2013). The impact of working memory load on task execution and online plan adjustment during multitasking in a virtual environment. The Quarterly Journal of Experimental Psychology, 66, 1241–1258. 10.1080/17470218.2012.748813
• Leudar I., & Thomas P. (2000). Voices of reason, voices of insanity: Studies of verbal hallucinations. Hove, UK: Psychology Press.
• Levine D. N., Calvanio R., & Popovics A. (1982). Language in the absence of inner speech. Neuropsychologia, 20, 391–409. 10.1016/0028-3932(82)90039-2
• Lidstone J. S., Fernyhough C., Meins E., & Whitehouse A. J. O. (2009). Brief report: Inner speech impairment in children with autism is associated with greater nonverbal than verbal skills. Journal of Autism and Developmental Disorders, 39, 1222–1225. 10.1007/s10803-009-0731-6
• Lidstone J. S., Meins E., & Fernyhough C. (2010). The roles of private speech and inner speech in planning during middle childhood: Evidence from a dual task paradigm. Journal of Experimental Child Psychology, 107, 438–451. 10.1016/j.jecp.2010.06.002
• Lidstone J., Meins E., & Fernyhough C. (2011). Individual differences in children’s private speech: Consistency across tasks, timepoints, and contexts. Cognitive Development, 26, 203–213. 10.1016/j.cogdev.2011.02.002
• Lidstone J. S., Meins E., & Fernyhough C. (2012). Verbal mediation of cognition in children with specific language impairment. Development and Psychopathology, 24, 651–660. 10.1017/S0954579412000223
• Loftus E. F., & Palmer J. C. (1974). Reconstruction of automobile destruction: An example of the interaction between language and memory. Journal of Verbal Learning and Verbal Behavior, 13, 585–589. 10.1016/S0022-5371(74)80011-3
• Logie R. H., Zucco G. M., & Baddeley A. D. (1990). Interference with visual short-term memory. Acta Psychologica, 75, 55–74. 10.1016/0001-6918(90)90066-O
• Lord C., Risi S., Lambrecht L., Cook E. H. Jr., Leventhal B. L., DiLavore P. C., … Rutter M. (2000). The Autism Diagnostic Observation Schedule–Generic: A standard measure of social and communication deficits associated with the spectrum of autism. Journal of Autism and Developmental Disorders, 30, 205–223. 10.1023/A:1005592401947
• Lupyan G. (2009). Extracommunicative functions of language: Verbal interference causes selective categorization impairments. Psychonomic Bulletin & Review, 16, 711–718. 10.3758/PBR.16.4.711
• Luria A. R. (1965). L. S. Vygotsky and the problem of localization of functions. Neuropsychologia, 3, 387–392. 10.1016/0028-3932(65)90012-6
• Machery E. (2005). You don’t know how you think: Introspection and language of thought. The British Journal for the Philosophy of Science, 56, 469–485. 10.1093/bjps/axi130
• Macnamara B. N., Moore A. B., & Conway A. R. A. (2011). Phonological similarity effects in simple and complex span tasks. Memory & Cognition, 39, 1174–1186. 10.3758/s13421-011-0100-5
• Mani N., & Plunkett K. (2010). In the infant’s mind’s ear: Evidence for implicit naming in 18-month-olds. Psychological Science, 21, 908–913. 10.1177/0956797610373371
• Marschark M. (2006). Intellectual functioning of deaf adults and children: Answers and questions. European Journal of Cognitive Psychology, 18, 70–89. 10.1080/09541440500216028
• Marvel C. L., & Desmond J. E. (2010). Functional topography of the cerebellum in verbal working memory. Neuropsychology Review, 20, 271–279. 10.1007/s11065-010-9137-7
• Marvel C. L., & Desmond J. E. (2012). From storage to manipulation: How the neural correlates of verbal working memory reflect varying demands on inner speech. Brain and Language, 120, 42–51. 10.1016/j.bandl.2011.08.005
• McCarthy-Jones S., & Fernyhough C. (2011). The varieties of inner speech: Links between quality of inner speech and psychopathological variables in a sample of young adults. Consciousness and Cognition, 20, 1586–1593. 10.1016/j.concog.2011.08.005
• McCarthy-Jones S., Knowles R., & Rowse G. (2012). More than words? Hypomanic personality traits, visual imagery and verbal thought in young adults. Consciousness and Cognition, 21, 1375–1381. 10.1016/j.concog.2012.07.004
• McGonigle-Chalmers M., Slater H., & Smith A. (2014). Rethinking private speech in preschoolers: The effects of social presence. Developmental Psychology, 50, 829–836. 10.1037/a0033909
• McGuire P. K., Murray R. M., & Shah G. M. (1993). Increased blood flow in Broca’s area during auditory hallucinations in schizophrenia. Lancet, 342, 703–706. 10.1016/0140-6736(93)91707-S
• McGuire P. K., Silbersweig D. A., Murray R. M., David A. S., Frackowiak R. S., & Frith C. D. (1996). Functional anatomy of inner speech and auditory verbal imagery. Psychological Medicine, 26, 29–38. 10.1017/S0033291700033699
• McGuire P. K., Silbersweig D. A., Wright I., Murray R. M., David A. S., Frackowiak R. S., & Frith C. D. (1995). Abnormal monitoring of inner speech: A physiological basis for auditory hallucinations. Lancet, 346, 596–600. 10.1016/S0140-6736(95)91435-8
• McNorgan C. (2012). A meta-analytic review of multisensory imagery identifies the neural correlates of modality-specific and modality-general imagery. Frontiers in Human Neuroscience, 6, 285. 10.3389/fnhum.2012.00285
• Meissner C. A., & Brigham J. C. (2001). A meta-analysis of the verbal overshadowing effect in face identification. Applied Cognitive Psychology, 15, 603–616. 10.1002/acp.728
• Michie P. T., Badcock J. C., Waters F. A. V., & Maybery M. T. (2005). Auditory hallucinations: Failure to inhibit irrelevant memories. Cognitive Neuropsychiatry, 10, 125–136. 10.1080/13546800344000363
• Miyake A., Emerson M. J., Padilla F., & Ahn J. C. (2004). Inner speech as a retrieval aid for task goals: The effects of cue type and articulatory suppression in the random task cuing paradigm. Acta Psychologica, 115, 123–142. 10.1016/j.actpsy.2003.12.004
• Miyake A., & Shah P. (Eds.). (1999). Models of working memory: Mechanisms of active maintenance and executive control. Cambridge, UK: Cambridge University Press. 10.1017/CBO9781139174909
• Montague M., & Applegate B. (1993). Middle school students’ mathematical problem solving: An analysis of think-aloud protocols. Learning Disability Quarterly, 16, 19–32. 10.2307/1511157
• Morin A. (2005). Possible links between self-awareness and inner speech: Theoretical background, underlying mechanisms, and empirical evidence. Journal of Consciousness Studies, 12, 115–134.
• Morin A. (2009a). Inner speech. In Bayne T., Cleeremans A., & Wilken P. (Eds.), Oxford companion to consciousness (pp. 380–382). Oxford, UK: Oxford University Press.
• Morin A. (2009b). Self-awareness deficits following loss of inner speech: Dr. Jill Bolte Taylor’s case study. Consciousness and Cognition, 18, 524–529. 10.1016/j.concog.2008.09.008
• Morin A., Uttl B., & Hamper B. (2011). Self-reported frequency, content, and functions of inner speech. Procedia: Social and Behavioral Sciences, 30, 1714–1718. 10.1016/j.sbspro.2011.10.331
• Moritz S., Hörmann C. C., Schröder J., Berger T., Jacob G. A., Meyer B., … Klein J. P. (2014). Beyond words: Sensory properties of depressive thoughts. Cognition and Emotion, 28, 1047–1056. 10.1080/02699931.2013.868342
• Moseley P., Fernyhough C., & Ellison A. (2013). Auditory verbal hallucinations as atypical inner speech monitoring, and the potential of neurostimulation as a treatment option. Neuroscience and Biobehavioral Reviews, 37, 2794–2805. 10.1016/j.neubiorev.2013.10.001
• Moseley P., & Wilkinson S. (2014). Inner speech is not so simple: A commentary on Cho and Wu (2013). Frontiers in Psychiatry, 5, 42. 10.3389/fpsyt.2014.00042
• Müller U., Jacques S., Brocki K., & Zelazo P. D. (2009). The executive functions of language in preschool children. In Winsler A., Fernyhough C., & Montero I. (Eds.), Private speech, executive functioning, and the development of verbal self-regulation (pp. 53–68). Cambridge, UK: Cambridge University Press. 10.1017/CBO9780511581533.005
• Newby J. M., & Moulds M. L. (2012). A comparison of the content, themes, and features of intrusive memories and rumination in major depressive disorder. British Journal of Clinical Psychology, 51, 197–205. 10.1111/j.2044-8260.2011.02020.x
• Newton A. M., & de Villiers J. G. (2007). Thinking while talking: Adults fail nonverbal false-belief reasoning. Psychological Science, 18, 574–579. 10.1111/j.1467-9280.2007.01942.x
• Nolen-Hoeksema S. (2004). The response styles theory. In Papageorgiou C. & Wells A. (Eds.), Depressive rumination: Nature, theory and treatment (pp. 107–124). Chichester, UK: Wiley.
• Nooteboom S. G. (2005). Lexical bias revisited: Detecting, rejecting and repairing speech errors in inner speech. Speech Communication, 47, 43–58. 10.1016/j.specom.2005.02.003
• Oléron P. (1953). Conceptual thinking of the deaf. American Annals of the Deaf, 98, 304–310.
• Oliver E. J., Markland D., & Hardy J. (2010). Interpretation of self-talk and post-lecture affective states of higher education students: A self-determination theory perspective. British Journal of Educational Psychology, 80, 307–323. 10.1348/000709909X477215
• Olszewski P. (1987). Individual differences in preschool children’s production of verbal fantasy play. Merrill-Palmer Quarterly, 33, 69–86.
• Oppenheim G. M. (2013). Inner speech as a forward model? Behavioral and Brain Sciences, 36, 369–370. 10.1017/S0140525X12002798
• Oppenheim G. M., & Dell G. S. (2008). Inner speech slips exhibit lexical bias, but not the phonemic similarity effect. Cognition, 106, 528–537. 10.1016/j.cognition.2007.02.006
• Oppenheim G. M., & Dell G. S. (2010). Motor movement matters: The flexible abstractness of inner speech. Memory & Cognition, 38, 1147–1160. 10.3758/MC.38.8.1147
• Ozonoff S., Pennington B. F., & Rogers S. J. (1991). Executive function deficits in high-functioning autistic individuals: Relationship to theory of mind. Journal of Child Psychology and Psychiatry, and Allied Disciplines, 32, 1081–1105. 10.1111/j.1469-7610.1991.tb00351.x
• Paivio A. (1991). Dual coding theory: Retrospect and current status. Canadian Journal of Psychology, 45, 255–287. 10.1037/h0084295
• Pedersen N., & Nielsen R. (2013). Auditory hallucinations in a deaf patient: A case report. Case Reports in Psychiatry, 2013, Article 659698. 10.1155/2013/659698
• Perrone-Bertolotti M., Rapin L., Lachaux J.-P., Baciu M., & Lœvenbruck H. (2014). What is that little voice inside my head? Inner speech phenomenology, its role in cognitive performance, and its relation to self-monitoring. Behavioural Brain Research, 261, 220–239. 10.1016/j.bbr.2013.12.034
• Phillips L. H., Wynn V., Gilhooly K. J., Della Sala S., & Logie R. H. (1999). The role of memory in the Tower of London task. Memory, 7, 209–231. 10.1080/741944066
• Piaget J. (1959). The language and thought of the child. Hove, UK: Psychology Press.
• Pickering M. J., & Garrod S. (2013). An integrated theory of language production and comprehension. Behavioral and Brain Sciences, 36, 329–347. 10.1017/S0140525X12001495
• Pickering S. J. (2001). The development of visuo-spatial working memory. Memory, 9, 423–432. 10.1080/09658210143000182
• Plato (undated/1987). Theaetetus (Waterfield R., Trans.). London, UK: Penguin.
• Price C. J. (2012). A review and synthesis of the first 20 years of PET and fMRI studies of heard speech, spoken language and reading. NeuroImage, 62, 816–847. 10.1016/j.neuroimage.2012.04.062
• Raichle M. E., MacLeod A. M., Snyder A. Z., Powers W. J., Gusnard D. A., & Shulman G. L. (2001). A default mode of brain function. Proceedings of the National Academy of Sciences of the United States of America, 98, 676–682. 10.1073/pnas.98.2.676
• Rao K. V., & Baddeley A. (2013). Raven’s matrices and working memory: A dual-task approach. The Quarterly Journal of Experimental Psychology, 66, 1881–1887. 10.1080/17470218.2013.828314
• Rosenzweig C., Krawec J., & Montague M. (2011). Metacognitive strategy use of eighth-grade students with and without learning disabilities during mathematical problem solving: A think-aloud analysis. Journal of Learning Disabilities, 44, 508–520. 10.1177/0022219410378445
• Russell J. (1997). Autism as an executive disorder. Oxford, UK: Oxford University Press.
• Russell J., Jarrold C., & Hood B. (1999). Two intact executive capacities in children with autism: Implications for the core executive dysfunctions in the disorder. Journal of Autism and Developmental Disorders, 29, 103–112. 10.1023/A:1023084425406
• Russell-Smith S. N., Comerford B. J. E., Maybery M. T., & Whitehouse A. J. O. (2014). Further evidence for a link between inner speech limitations and executive function in high-functioning children with autism spectrum disorders. Journal of Autism and Developmental Disorders, 44, 1236–1243. 10.1007/s10803-013-1975-8
• San Martin C., Montero I., Navarro M. I., & Biglia B. (2014). The development of referential communication: Improving message accuracy by coordinating private speech with peer questioning. Early Childhood Research Quarterly, 29, 76–84. 10.1016/j.ecresq.2013.10.001
• Saxe R., Carey S., & Kanwisher N. (2004). Understanding other minds: Linking developmental psychology and functional neuroimaging. Annual Review of Psychology, 55, 87–124. 10.1146/annurev.psych.55.090902.142044
• Schooler J. W., & Engstler-Schooler T. Y. (1990). Verbal overshadowing of visual memories: Some things are better left unsaid. Cognitive Psychology, 22, 36–71. 10.1016/0010-0285(90)90003-M
• Scott M. (2013). Corollary discharge provides the sensory content of inner speech. Psychological Science, 24, 1824–1830. 10.1177/0956797613478614
• Scott M., Yeung H. H., Gick B., & Werker J. F. (2013). Inner speech captures the perception of external speech. The Journal of the Acoustical Society of America, 133, EL286–EL292. 10.1121/1.4794932
• Senay I., Albarracín D., & Noguchi K. (2010). Motivating goal-directed behavior through introspective self-talk: The role of the interrogative form of simple future tense. Psychological Science, 21, 499–504. 10.1177/0956797610364751
• Shallice T. (1982). Specific impairments of planning. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 298, 199–209. 10.1098/rstb.1982.0082
• Shergill S. S., Brammer M. J., Fukuda R., Bullmore E., Amaro E. Jr., Murray R. M., & McGuire P. K. (2002). Modulation of activity in temporal cortex during generation of inner speech. Human Brain Mapping, 16, 219–227. 10.1002/hbm.10046
• Shergill S. S., Brammer M. J., Williams S. C. R., Murray R. M., & McGuire P. K. (2000). Mapping auditory hallucinations in schizophrenia using functional magnetic resonance imaging. Archives of General Psychiatry, 57, 1033–1038. 10.1001/archpsyc.57.11.1033
• Shergill S. S., Bullmore E. T., Brammer M. J., Williams S. C., Murray R. M., & McGuire P. K. (2001). A functional study of auditory verbal imagery. Psychological Medicine, 31, 241–253. 10.1017/S003329170100335X
• Siegrist M. (1995). Inner speech as a cognitive process mediating self-consciousness and inhibiting self-deception. Psychological Reports, 76, 259–265. 10.2466/pr0.1995.76.1.259
• Smith J. D., Reisberg D., & Wilson M. (1992). Subvocalization and auditory imagery: Interactions between the inner ear and inner voice. In Reisberg D. (Ed.), Auditory imagery (pp. 95–119). Hillsdale, NJ: Erlbaum.
• Sokolov A. N. (1975). Inner speech and thought. New York, NY: Plenum Press. 10.1007/978-1-4684-1701-2
• Sommer I. E., Diederen K. M. J., Blom J.-D., Willems A., Kushan L., Slotema K., … Kahn R. S. (2008). Auditory verbal hallucinations predominantly activate the right inferior frontal area. Brain: A Journal of Neurology, 131, 3169–3177. 10.1093/brain/awn251
• Sood E. D., & Kendall P. C. (2007). Assessing anxious self-talk in youth: The Negative Affectivity Self-Statement Questionnaire–Anxiety Scale. Cognitive Therapy and Research, 31, 603–618. 10.1007/s10608-006-9043-8
• Stokes C., & Hirsch C. R. (2010). Engaging in imagery versus verbal processing of worry: Impact on negative intrusions in high worriers. Behaviour Research and Therapy, 48, 418–423. 10.1016/j.brat.2009.12.011
• Stokoe W. C., Jr. (2005). Sign language structure: An outline of the visual communication systems of the American deaf. Journal of Deaf Studies and Deaf Education, 10, 3–37. 10.1093/deafed/eni001
• Taylor J. B. (2006). My stroke of insight: A brain scientist’s personal journey. New York, NY: Viking.
• Tian X., & Poeppel D. (2010). Mental imagery of speech and movement implicates the dynamics of internal forward models. Frontiers in Psychology, 1, 166. 10.3389/fpsyg.2010.00166
• Tian X., & Poeppel D. (2012). Mental imagery of speech: Linking motor and perceptual systems through internal simulation and estimation. Frontiers in Human Neuroscience, 6, 314. 10.3389/fnhum.2012.00314
• Tian X., & Poeppel D. (2013). The effect of imagination on stimulation: The functional specificity of efference copies in speech processing. Journal of Cognitive Neuroscience, 25, 1020–1036. 10.1162/jocn_a_00381
• Tullett A. M., & Inzlicht M. (2010). The voice of self-control: Blocking the inner voice increases impulsive responding. Acta Psychologica, 135, 252–256. 10.1016/j.actpsy.2010.07.008
• Uttl B., Morin A., & Hamper B. (2011). Are inner speech self-report questionnaires reliable and valid? Procedia: Social and Behavioral Sciences, 30, 1719–1723. 10.1016/j.sbspro.2011.10.332
  • Vaccari C., & Marschark M. (1997). Communication between parents and deaf children: Implications for social-emotional development . Child Psychology and Psychiatry and Allied Disciplines , 38 , 793–801. 10.1111/j.1469-7610.1997.tb01597.x [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • van Lutterveld R., Diederen K. M. J., Koops S., Begemann M. J. H., & Sommer I. E. C. (2013). The influence of stimulus detection on activation patterns during auditory hallucinations . Schizophrenia Research , 145 , 27–32. 10.1016/j.schres.2013.01.004 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • van Lutterveld R., Hillebrand A., Diederen K. M. J., Daalman K., Kahn R. S., Stam C. J., & Sommer I. E. C. (2012). Oscillatory cortical network involved in auditory verbal hallucinations in schizophrenia . PLoS ONE , 7 , 10.1371/journal.pone.0041149 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Vercueil L., & Perronne-Bertolotti M. (2013). Ictal inner speech jargon . Epilepsy & Behavior , 27 , 307–309. 10.1016/j.yebeh.2013.02.007 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Visser M., Jefferies E., & Lambon Ralph M. A. (2010). Semantic processing in the anterior temporal lobes: A meta-analysis of the functional neuroimaging literature . Journal of Cognitive Neuroscience , 22 , 1083–1094. 10.1162/jocn.2009.21309 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Vygotsky L. S. (1930–1935/1978). Mind in society: Development of higher psychological processes . Cambridge, MA: Harvard University Press. [ Google Scholar ]
  • Vygotsky L. S. (1934/1987). Thinking and speech. The collected works of Lev Vygotsky (Vol. 1 ). New York, NY: Plenum Press. [ Google Scholar ]
  • Wallace G. L., Silvers J. A., Martin A., & Kenworthy L. E. (2009). Brief report: Further evidence for inner speech deficits in autism spectrum disorders . Journal of Autism and Developmental Disorders , 39 , 1735–1739. 10.1007/s10803-009-0802-8 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Waters F., Woodward T., Allen P., Aleman A., & Sommer I. (2012). Self-recognition deficits in schizophrenia patients with auditory hallucinations: A meta-analysis of the literature . Schizophrenia Bulletin , 38 , 741–750. 10.1093/schbul/sbq144 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Watkins E. R. (2008). Constructive and unconstructive repetitive thought . Psychological Bulletin , 134 , 163–206. 10.1037/0033-2909.134.2.163 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Watson J. B. (1913). Psychology as the behaviorist views it . Psychological Review , 20 , 158–177. 10.1037/h0074428 [ CrossRef ] [ Google Scholar ]
  • Wheeldon L. R., & Levelt W. J. M. (1995). Monitoring the time course of phonological encoding . Journal of Memory and Language , 34 , 311–334. 10.1006/jmla.1995.1014 [ CrossRef ] [ Google Scholar ]
  • White C. S., & Daugherty M. (2009). Creativity and private speech in young children In Winsler A., Fernyhough C., & Montero I. (Eds.), Private speech, executive functioning, and the development of verbal self-regulation (pp. 224–235). Cambridge, UK: Cambridge University Press; 10.1017/CBO9780511581533.018 [ CrossRef ] [ Google Scholar ]
  • Whitehouse A. J., Maybery M. T., & Durkin K. (2006). Inner speech impairments in autism . Journal of Child Psychology and Psychiatry , 47 , 857–865. 10.1111/j.1469-7610.2006.01624.x [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Whitford T. J., Mathalon D. H., Shenton M. E., Roach B. J., Bammer R., Adcock R. A., et al.Ford J. M. (2011). Electrophysiological and diffusion tensor imaging evidence of delayed corollary discharges in patients with schizophrenia . Psychological Medicine , 41 , 959–969. 10.1017/S0033291710001376 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • WHO (1993). The ICD-10 classification of mental and behavioural disorders: Diagnostic criteria for research . Geneva, Switzerland: World Health Organization. [ Google Scholar ]
  • Wilkinson S., & Bell V. (in press). The representation of agents in auditory verbal hallucinations . Mind & Language . [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Willems R. M., Benn Y., Hagoort P., Toni I., & Varley R. (2011). Communicating without a functioning language system: Implications for the role of language in mentalizing . Neuropsychologia , 49 , 3130–3135. 10.1016/j.neuropsychologia.2011.07.023 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Williams D. M., Bowler D. M., & Jarrold C. (2012). Inner speech is used to mediate short-term memory, but not planning, among intellectually high-functioning adults with autism spectrum disorder . Development and Psychopathology , 24 , 225–239. 10.1017/S0954579411000794 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Williams D., Happé F., & Jarrold C. (2008). Intact inner speech use in autism spectrum disorder: Evidence from a short-term memory task . Journal of Child Psychology and Psychiatry , 49 , 51–58. 10.1111/j.1469-7610.2007.01836.x [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Williams D. M., & Jarrold C. (2010). Brief report: Predicting inner speech use amongst children with autism spectrum disorder (ASD): the roles of verbal ability and cognitive profile . Journal of Autism and Developmental Disorders , 40 , 907–913. 10.1007/s10803-010-0936-8 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Winsler A. (2009). Still talking to ourselves after all these years: A review of current research on private speech In Winsler A., Fernyhough C., & Montero I. (Eds.), Private speech, executive functioning, and the development of verbal self-regulation (pp. 3–41). Cambridge, UK: Cambridge University Press; 10.1017/CBO9780511581533.003 [ CrossRef ] [ Google Scholar ]
  • Winsler A., Abar B., Feder M. A., Schunn C. D., & Rubio D. A. (2007). Private speech and executive functioning among high-functioning children with autistic spectrum disorders . Journal of Autism and Developmental Disorders , 37 , 1617–1635. 10.1007/s10803-006-0294-8 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Winsler A., De León J. R., Wallace B. A., Carlton M. P., & Willson-Quayle A. (2003). Private speech in preschool children: Developmental stability and change, across-task consistency, and relations with classroom behaviour . Journal of Child Language , 30 , 583–608. 10.1017/S0305000903005671 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Winsler A., Diaz R. M., & Montero I. (1997). The role of private speech in the transition from collaborative to independent task performance in young children . Early Childhood Research Quarterly , 12 , 59–79. 10.1016/S0885-2006(97)90043-0 [ CrossRef ] [ Google Scholar ]
  • Winsler A., Fernyhough C., & Montero I. (2009). Private speech, executive functioning, and the development of verbal self-regulation . Cambridge, UK: Cambridge University Press; 10.1017/CBO9780511581533 [ CrossRef ] [ Google Scholar ]
  • Winsler A., & Naglieri J. (2003). Overt and covert verbal problem-solving strategies: Developmental trends in use, awareness, and relations with task performance in children aged 5 to 17 . Child Development , 74 , 659–678. 10.1111/1467-8624.00561 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Yao B., Belin P., & Scheepers C. (2011). Silent reading of direct versus indirect speech activates voice-selective areas in the auditory cortex . Journal of Cognitive Neuroscience , 23 , 3146–3152. 10.1162/jocn_a_00022 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Zatorre R. J., & Halpern A. R. (2005). Mental concerts: Musical imagery and auditory cortex . Neuron , 47 , 9–12. 10.1016/j.neuron.2005.06.013 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Zelazo P. D., Craik F. I. M., & Booth L. (2004). Executive function across the life span . Acta Psychologica , 115 , 167–183. 10.1016/j.actpsy.2003.12.005 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Zelazo P. D., Müller U., Frye D., Marcovitch S., Argitis G., Boseovski J., et al.Sutherland A. (2003). The development of executive function in early childhood . Monographs of the Society for Research in Child Development , 68 , vii–137. 10.1111/j.0037-976X.2003.00269.x [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Zimmermann K., & Brugger P. (2013). Signed soliloquy: Visible private speech . Journal of Deaf Studies and Deaf Education , 18 , 261–270. 10.1093/deafed/ens072 [ PubMed ] [ CrossRef ] [ Google Scholar ]


Course info

  • Instructor: Prof. John D. E. Gabrieli
  • Department: Brain and Cognitive Sciences
  • Topic: Cognitive Science
  • Course: Introduction to Psychology


Session Overview

Session activities.

Read the following before watching the lecture video.

  • [ Sacks ] Chapter 9, “The President’s Speech” (pp. 80-86)
  • Study outline for K&R Chapter 6 (PDF)
  • [Stangor] Chapter 9, “Intelligence and Language”

Lecture Videos

Lecture 12: Language. Chapters:

  • Language Basics: Sounds We Hear and Distinguish
  • From Sound to Meaning: Syntax, Semantics, and Comprehension
  • Problems with Language: Aphasia and the Neural Basis of Speech
  • Language Acquisition: Infants, Bilingualism, and the Case of Genie

Lecture Slides (PDF - 1.7MB)

Language is just incredible – think about how easy it is for us, as babies, to learn our native language effortlessly, and yet how hard it is, once we’ve already learned a language, to learn another…

Check Yourself

Short answer questions.

  • Language is a system of communication and representation that is governed by systematic rules. These rules can be studied at multiple levels. For each of the following terms, identify the features of language they describe:

Answers

  • Syntax: The description of how words are organized into phrases and sentences. Often called the “grammar” of a language.
  • Semantics: The description of the meaning of words, phrases, and sentences.
  • Phonology: The description of the sounds of a language and how they are put together to make words.
  • Language is a powerful system because of the principle of discrete infinity, which means that a small number of basic units can be combined in an unlimited number of ways to represent and communicate ideas.
  • What is the smallest unit of meaning in language called? Give an example of how these can be combined.
  • What is the smallest contrastive unit of sound in language called? Give an example of one of these from English. Give an example of one of these from a foreign language you might have heard, but which is not present in English.
  • Morpheme. For example: Un + believe + able = unbelievable
  • Phoneme. For example: /k/ as in “cat” in English. In other languages but not English, e.g., the trilled “r” in Spanish, the clicks in Bantu, the trilled “r” in French, the hard “h” in Hebrew.
  • It is remarkable that babies learn language so fast and so effectively, even though no one ever explicitly teaches them the rules of their language. Describe two facts you have learned about how babies learn language.

Sample Answers

  • Girls learn more words earlier than boys.
  • Babies begin to lose the ability to discriminate foreign language speech sounds by 9-12 months.
  • Children might use words incorrectly, as in overextensions (e.g., calling all animals “doggie”) or underextensions (e.g., refusing to call any dog besides the family pet a “doggie”).

Further Study

These optional resources are provided for students who wish to explore this topic more fully.


Reviewed by Psychology Today Staff

Aphasia is a communication disorder that results from injury or damage to the areas of the brain that process language. People with aphasia have difficulty understanding and expressing language. Aphasia can affect both spoken and written language: a person may have a hard time speaking and understanding spoken words, and may also have difficulty reading and writing. It can appear after a head injury, stroke, or infection, or with conditions such as a brain tumor or neurodegenerative diseases such as Alzheimer’s disease and other dementias.

  • Types of Aphasia
  • How to Treat Aphasia

Aphasia varies widely and depends on the severity of the damage and the area of the brain affected. It is not an uncommon neurocognitive problem: some 25 to 40 percent of people who survive a stroke experience aphasia. A stroke is a common precursor for aphasia because blood flow to the brain is interrupted, which can damage the brain areas that process language. This condition can profoundly affect a person’s quality of life.

Aphasia happens across age brackets. But it is more likely in older adults because of problems like stroke and neurodegenerative diseases such as Alzheimer’s, Parkinson’s, and others.

• Global aphasia: This is a severe type of aphasia in which multiple language areas of the brain have been injured, perhaps by a stroke or brain injury. People with global aphasia have impaired comprehension of single words, full sentences, and whole conversations. They may understand very little of what is said to them.

• Broca’s aphasia: Also called non-fluent or expressive aphasia. A person with Broca’s aphasia has a diminished ability to speak spontaneously and often cannot retrieve function words such as conjunctions and prepositions ("for," "and," "nor," "but," to name a few). They may still understand what is said to them, and they may be able to understand what they read.

• Wernicke’s aphasia: Also called fluent or receptive aphasia. People with Wernicke’s aphasia can produce connected speech, but they do not understand the meaning of words; they may speak nonsense words, use the wrong words, and have difficulty with written words. They may be unaware of what they are saying.

• Anomic aphasia: This is a milder form of expressive aphasia. People with anomic aphasia cannot find the specific words, especially nouns and verbs, they are thinking of when speaking and writing; they may forget the word "apple" when looking at an apple.

• Mixed non-fluent aphasia: This type is similar to Broca’s aphasia in that speech is sparse, effortful, and not spontaneous, and similar to Wernicke’s aphasia in that comprehension of speech is limited.

• Perseverative aphasia: This unintentional repetition of words is also called recurrent perseveration. The conversation has moved on, but the person is still on the previous topic.

• Conductive aphasia: A person with this type of aphasia has halting speech, pausing to find the right word and sometimes settling on a tangential one.

• Paraphasia: In this expressive or receptive error pattern, the person replaces intended words with incorrect or unrelated words.

While aphasia can sometimes improve without treatment, speech and language therapy may be recommended. A speech and language therapist models correct speech and articulation and helps to build language skills.

Constraint-induced language therapy: This approach restricts the use of non-verbal communication methods and encourages verbal communication; it can promote language recovery.

Technology-based therapy: Various software applications and digital tools are available to aid in language therapy and practice. These can be used under the guidance of a speech therapist or independently.

Communication partner training: This therapeutic approach teaches people affected by aphasia and their loved ones effective communication strategies, both verbal and nonverbal. These strategies encompass methods such as drawing, gesturing, using cues, confirming information, and summarizing discussions to facilitate improved interaction.

A study in Finland found that singing can be an effective treatment. In the study, 50 people with aphasia, with impairments ranging from mild to severe, joined a choir for a four-month period. The intervention improved everyday communication, and the benefits persisted at a follow-up assessment.


"As the song goes…" How can singing help with finding lost words and improving psychosocial functioning after a stroke?


As Bruce Willis's condition reveals, aphasia is an isolating condition that stifles the ability to express or understand language.


Aphasia may impair a person’s ability to speak and understand others but does not affect their intelligence. It is a language disorder, not a cognitive disorder.

RELATED DEFINITIONS AND RESOURCES

  1. Assessing Speech

    Speech rate and its interpretation:
    • Slow speech (bradylalia): depression, Parkinson's disease, cognitive impairment.
    • Normal speech rate: normal.
    • Rapid speech (tachylalia), fast but able to be redirected: normal, mania, anxiety, stimulant use.
    • Pressured speech, fast and without taking breaks, talking over other people and unable to be redirected: mania, anxiety.

  2. APA Dictionary of Psychology

    the speech of children roughly between the ages of 18 and 30 months. This is usually in the form of two-word expressions up to the age of about 24 months (see two-word stage) and thereafter is characterized by short but multiword expressions (e.g., dog eat bone ). Also called telegraphic stage.

  3. The Psychology of Speech

    The psychology of speech offers valuable insights into how communication patterns, language use, and speech behaviors influence the dynamics and quality of interpersonal relationships. One crucial aspect explored in the psychology of speech is the role of effective communication in building and maintaining healthy relationships.

  4. APA Dictionary of Psychology

    A trusted reference in the field of psychology, offering more than 25,000 clear and authoritative entries. ... speech. Updated on 04/19/2018. n. the product of oral-motor movement resulting in articulation of language: the utterance of sounds and words.

  5. PDF The Psychology of Verbal Communication

    In human communication the "information processing devices" are people, the "representations" are mental representations or ideas, and the "modifications of the physical environment" are the uniquely human disturbances of the acoustic surround called speech.

  6. The 8 Key Elements of Highly Effective Speech

    So before you utter another word to another person, memorize this list of the 8 key elements of highly effective speech: Gentle eye contact. Kind facial expression. Warm tone of voice. Expressive ...

  7. Speech Perception

    Abstract. Speech perception is conventionally defined as the perceptual and cognitive processes leading to the discrimination, identification, and interpretation of speech sounds. However, to gain a broader understanding of the concept, such processes must be investigated relative to their interaction with long-term knowledge—lexical ...

  8. Inner Speech: Development, Cognitive Functions, Phenomenology, and

    Inner speech—also known as covert speech or verbal thinking—has been implicated in theories of cognitive development, speech monitoring, executive function, and psychopathology. Despite a growing body of knowledge on its phenomenology, development, and function, approaches to the scientific study of inner speech have remained diffuse and ...

  9. What is TELEGRAPHIC SPEECH? definition of ...

    2. the speech of kids roughly between the ages of eighteen and thirty months, that is generally in the shape of two-word expressions. This speech is telegraphic because it utilizes just the most germane and significant aspects of language, passing over prepositions, articles, and other ancillary terms 3. the speech of kids about twenty-four to ...

  10. Language

    This session explores the brain basis of language perception and comprehension, how language contributes to our understanding of our environment, and how we learn languages. Keywords: phoneme, speech, comprehension, hearing, writing, reading, phonology, syntax, evoked response potential (ERP), meaning, pragmatics, aphasia, language acquisition.

  11. Telegraphic Speech

    Telegraphic speech is a stage in language development where young children use simplified, concise phrases to express their thoughts. It marks an important step towards language fluency and paves the way for further linguistic and cognitive growth. Definition of Telegraphic Speech: Telegraphic Speech refers to the simplified form of language ...

  12. Telegraphic speech

    Telegraphic speech, according to linguistics and psychology, is speech during the two-word stage of language acquisition in ... In the field of psychology, telegraphic speech is defined as a form of communication consisting of simple two-word long sentences often composed of a noun and a verb that adhere to the grammatical standards of the ...

  13. Speech disorders: Types, symptoms, causes, and treatment

    A speech-language pathologist (SLP) is a healthcare professional who specializes in speech and language disorders. An SLP will evaluate a person for groups of symptoms that indicate one type of ...

  14. Speech

    Speech is a human vocal communication using language. Each language uses phonetic combinations of vowel and consonant sounds that form the sound of its words (that is, all English words sound different from all French words, even if they are the same word, e.g., "role" or "hotel"), and using those words in their semantic character as words in ...

  15. PDF Understanding the Mental Status Examination

    Speech Characteristics
    • Speech can be described in terms of its quantity, rate of production, and quality.
    • Descriptors include: talkative, garrulous, voluble, taciturn, unspontaneous, or normally responsive to cues from the interviewer. There is alogia, or poverty of speech, in schizophrenia.
    • Speech can be rapid or slow, pressured (hard to interrupt…

  16. APA Dictionary of Psychology

    n. a system for expressing or communicating thoughts and feelings through speech sounds or written symbols. See natural language. the specific communicative system used by a particular group of speakers, with its distinctive vocabulary, grammar, and phonological system. any comparable nonverbal means of communication, such as sign language or ...

  17. Aphasia

    Aphasia. Aphasia, a communication disorder, is a result of injury or damage to the area of the brain that processes language and communication. People with aphasia have difficulty understanding ...

  18. Vygotsky's Theory of Cognitive Development

    Vygotsky's theory comprises concepts such as culture-specific tools, private speech, and the zone of proximal development. Vygotsky believed cognitive development is influenced by cultural and social factors. He emphasized the role of social interaction in the development of mental abilities e.g., speech and reasoning in children.

  19. The Signs and Causes of Disorganized Speech

    Displacement: citing a similar idea but not the correct one. Contamination: fusing ideas into one another. Accelerated thinking: rapid flow and increased volume of speech. Flight of ideas: losing ...

  20. Poverty of Speech: What Is Alogia a Sign of?

    Let's recap. Alogia is a symptom related to a number of underlying mental health conditions. It's also referred to as poverty of speech or poverty of content. When you live with alogia, you ...

  21. APA Dictionary of Psychology

    Updated on 04/19/2018. incoherent speech. This may be speech in which ideas shift from one subject to another seemingly unrelated subject, sometimes described as loosening of associations. Other types of disorganized speech include responding to questions in an irrelevant way, reaching illogical conclusions, and making up words. See neologism ...

  22. Tangential speech

    Tangential speech or tangentiality is a communication disorder in which the train of thought of the speaker wanders and shows a lack of focus, never returning to the initial topic of the conversation. It tends to occur in situations where a person is experiencing high anxiety, as a manifestation of the psychosis known as schizophrenia, in dementia or in states of delirium.