The paper investigates the problem of recognizing human emotions from voice using deep learning methods. Deep convolutional neural networks and recurrent neural networks with bidirectional LSTM cells are used as the deep neural network models, and an ensemble of these networks is proposed. We carried out computational experiments applying the constructed neural networks and popular machine learning algorithms to the recognition of emotions in human speech from the RAVDESS audio database. The results show that the neural network models outperform the classical machine learning algorithms; accuracy estimates for individual emotions obtained with the neural networks were 80%. Directions for further research on the recognition of human emotions are outlined.
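A minimal sketch of the kind of ensemble described above: a 1D convolutional network and a bidirectional LSTM over per-frame acoustic features, with their class probabilities averaged. This is an illustrative assumption, not the authors' exact architecture; the MFCC input representation, feature dimensions, and layer sizes are placeholders, while the 8 emotion classes correspond to the RAVDESS label set.

```python
# Illustrative CNN + BiLSTM ensemble for speech emotion recognition (not the paper's exact models).
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 8      # RAVDESS emotion labels
N_FEATS = 40         # assumed number of MFCC coefficients per frame
N_FRAMES = 200       # assumed number of frames per utterance


class ConvEmotionNet(nn.Module):
    """1D CNN over the time axis of an MFCC feature matrix."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(N_FEATS, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),       # global average pooling over time
        )
        self.fc = nn.Linear(128, NUM_CLASSES)

    def forward(self, x):                  # x: (batch, N_FEATS, N_FRAMES)
        h = self.conv(x).squeeze(-1)       # (batch, 128)
        return self.fc(h)                  # class logits


class BiLSTMEmotionNet(nn.Module):
    """Bidirectional LSTM over the frame sequence."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(N_FEATS, 128, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * 128, NUM_CLASSES)

    def forward(self, x):                  # x: (batch, N_FEATS, N_FRAMES)
        seq = x.transpose(1, 2)            # (batch, N_FRAMES, N_FEATS)
        out, _ = self.lstm(seq)
        return self.fc(out[:, -1, :])      # logits from the last time step


def ensemble_predict(models, features):
    """Average the softmax probabilities of the individual models."""
    with torch.no_grad():
        probs = torch.stack([F.softmax(m(features), dim=1) for m in models])
    return probs.mean(dim=0).argmax(dim=1)


if __name__ == "__main__":
    cnn, rnn = ConvEmotionNet(), BiLSTMEmotionNet()
    dummy = torch.randn(4, N_FEATS, N_FRAMES)   # a batch of 4 utterances
    print(ensemble_predict([cnn, rnn], dummy))  # predicted emotion indices
```

Averaging softmax outputs is only one common way to combine such models; the paper's ensemble may weight or combine its member networks differently.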