The human central auditory system exhibits distinct lateralization effects (speech, space) and encompasses different processing pathways (where, what, who). Using spatialized pseudoword utterances, attentional modulation of the networks bound to sound source localization ('where'), voice recognition ('who'), and the encoding of phonetic-linguistic information ('what') was evaluated by silent functional magnetic resonance imaging. The 'where' pathway was found to be restricted to posterior parts of the left superior temporal gyrus, speaker ('auditory face') identification exclusively activated temporal lobe structures, and the representation of the sound structure of the utterances was associated with hemodynamic activation of Broca's area. Speech perception in space therefore engages at least three distinct neural networks. Furthermore, the findings indicate that voice recognition may depend upon template matching within auditory association cortex, whereas the sequencing of phonetic-linguistic information extends to frontal areas.