In this article, samples of object recognition on video and selection of unique scenes are considered. We used a new algorithm of capsule networks as a tool for video analysis. The algorithm is a continuation of the development of convolutional neural networks. Convolutional networks use a scalar as the base element to be processed. In turn, capsule networks are processing vectors, and use a special routing algorithm. These fundamental differences allow capsule networks to be more invariant to the rotations and changes in illumination of the recognized object. This fact has become the key to choosing this type of networks for analysis of dynamic video. In this article, we propose a method for video segmentation. The essence of this method is as follows. First, you need to determine the main acting objects on adjacent frames. Then, it is necessary to determine whether these objects coincide, if not, then the second frame is considered the moment of transition to another scene. The proposed method was tested on the custom-collected dataset based on videos from YouTube. There were two classes of objects in dataset. The results presented in the article show that we were not able to achieve a high level of accuracy of video segmentation. Also worth noting that the learning process took quite a long time. Copyright © 2018 for the individual papers by the papers’ authors.