The Psychometrics of Automatic Speech Recognition

Weerts L, Rosen S, Clopath C, Goodman DFM


Deep neural networks have had considerable success in neuroscience as models of the visual system, and recent work has suggested this may also extend to the auditory system. We tested the behaviour of a range of state of the art deep learning-based automatic speech recognition systems on a wide collection of manipulated sounds used in standard human psychometric experiments. While some systems showed qualitative agreement with humans in certain tests, in others all tested systems diverged markedly from humans. In particular, all systems used spectral invariance, temporal fine structure and speech periodicity differently from humans. We conclude that despite some promising results, none of the tested automatic speech recognition systems can yet act as a strong proxy for human speech recognition. However, we note that the more recent systems with better performance also tend to better match human results, suggesting that continued cross-fertilisation of ideas between human and automatic speech recognition may be fruitful. Our open source toolbox allows researchers to assess future automatic speech recognition systems or add additional psychoacoustic measures.


Related software


Python package for psychophysical tests of automatic speech recognition systems.