Where Do the Monsters Live? A Corpus Method for Detecting Anglicisms and Their Derivatives on the Russian-Language Internet

The article describes a method for the automated detection of English loanwords and their derivatives using the Sketch Engine corpus manager and its Keyword tool, which operates on the TF-IDF principle. A pilot study was conducted on a small number of fashion blog texts (174,213 words; 218,091 tokens) from LiveJournal; applying the Keyword function to them identified 84 fashion loanwords (4,506 occurrences) and 32 derivatives (1,194 occurrences). The author declares no conflicts of interest.
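The corpus size above is given in both words and tokens. Roughly, in Sketch Engine's terminology, a token is any unit produced by tokenization (word forms, numbers, punctuation), while a word is a token that begins with a letter, which is why the token count is the larger of the two. The sketch below illustrates the distinction with a deliberately simplified, hypothetical regex tokenizer; it is not Sketch Engine's tokenizer and will not reproduce its counts.

```python
import re

# Hypothetical, simplified tokenizer (not Sketch Engine's): a run of word
# characters, optionally joined by hyphens or apostrophes, is one token;
# every other non-space character (punctuation) is a separate token.
TOKEN_RE = re.compile(r"\w+(?:[-'’]\w+)*|[^\w\s]")


def tokenize(text):
    return TOKEN_RE.findall(text)


def count_words_and_tokens(text):
    """Return (words, tokens), where a 'word' is a token starting with a letter."""
    tokens = tokenize(text)
    words = [t for t in tokens if t[0].isalpha()]
    return len(words), len(tokens)


sample = "Купила слипоны и бини, очень кэжуал!"
print(count_words_and_tokens(sample))  # (6, 8): the comma and "!" are tokens, not words
```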

Many articles report the results of research on anglicisms, and it is safe to assume that the topic will remain relevant for as long as languages keep borrowing from English. Increasing attention is now being paid to the automated detection of English loanwords and their derivatives in various languages. This article describes a corpus method for detecting English loanwords and their derivatives in Russian fashion blogs using the Sketch Engine corpus manager and its Keyword tool, which operates on the TF-IDF principle. The relevance of the study stems from the following objectives: to detect the newest anglicisms, which are not yet recorded in dictionaries, and to determine their number and frequency; to optimize the search for anglicisms and their derivatives; and to reduce the influence of the human factor on that search. The article is structured as follows: first, it explains the terms anglicism and derivative and the ways in which anglicisms are adapted to Russian; second, it describes existing software methods for detecting anglicisms on the Internet (based on training neural networks and on the AntConc corpus manager); third, it describes a corpus method for detecting anglicisms with Sketch Engine, which has not previously been used to search for anglicisms on the Russian Internet, and explains the key terms needed to understand how the method works. A pilot study was conducted on a small set of fashion blog posts (174,213 words; 218,091 tokens) from LiveJournal, in which the Keyword function detected 84 fashion loanwords (4,506 occurrences) and 32 derivatives (1,194 occurrences), for example the loanwords bini, bodi, nyud, skini, slipon and the derivatives zamiksovat’, kezhual’shchik, nyudovyy. The pilot study showed that Sketch Engine contributes to automating the search for anglicisms and their derivatives on the Russian Internet.
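The abstract characterizes the Keyword tool as TF-IDF-based. For readers unfamiliar with the principle, here is a generic textbook TF-IDF sketch, not Sketch Engine's implementation. It assumes, for illustration only, that the focus corpus is treated as a single document and compared with a collection of reference documents, so terms that are frequent in the focus corpus but occur in few reference documents score highest. The function name `tf_idf` and the toy data are hypothetical.

```python
import math
from collections import Counter


def tf_idf(focus_tokens, reference_docs):
    """Score every term of the focus document by tf * idf.

    tf  = relative frequency of the term in the focus document;
    idf = ln(N / df), where N is the number of documents (the focus document
          plus the reference documents) and df is how many of them contain the term.
    """
    docs = [set(focus_tokens)] + [set(doc) for doc in reference_docs]
    n_docs = len(docs)
    counts = Counter(focus_tokens)
    total = len(focus_tokens)
    scores = {}
    for term, count in counts.items():
        df = sum(1 for doc in docs if term in doc)  # at least 1: the focus document
        scores[term] = (count / total) * math.log(n_docs / df)
    return scores


focus = ["бини", "бини", "и", "шапка", "и"]
reference = [["и", "шапка", "дом"], ["и", "кот"], ["и", "шапка"]]
scores = tf_idf(focus, reference)
for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term}\t{score:.3f}")
```

The conjunction и occurs in every document, so its idf, and therefore its score, is zero, while бини, absent from all reference documents, ranks first despite being no more frequent in the focus document than и.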
Implementing the proposed method requires preliminary preparation of a focus corpus followed by keyword analysis. Preliminary preparation involves: (1) selecting texts united by a common topic; (2) manually removing hidden hyperlinks from the texts if the corpus is compiled not by crawling web pages but by uploading texts copied from them by hand; (3) selecting a suitable reference corpus that reflects colloquial language. The subsequent keyword analysis involves: (1) excluding irrelevant lexical units from the keyword list; (2) lemmatising anglicisms and their derivatives, reducing individual word forms to their lemmas where necessary. The method can be applied not only to searching for English loanwords on the Russian Internet but also to texts in other languages supported by Sketch Engine. Further research could examine how the method performs when searching for anglicisms and their derivatives in other languages, in other thematic areas, and on larger text collections. The author declares no conflicts of interest.
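The preparation and analysis steps listed above can be approximated outside Sketch Engine. The following is a minimal sketch under several assumptions: all function names are hypothetical; link removal is reduced to a regex over URL-like strings; keyness is computed as a smoothed ratio of per-million frequencies in the focus and reference corpora (one common keyness measure, with a hypothetical smoothing constant `n`), not as Sketch Engine's exact computation; irrelevant units are excluded with a hand-made stop list; and lemmatisation is a hand-made form-to-lemma dictionary, with occurrences summed per lemma.

```python
import re
from collections import Counter

# Hypothetical helpers illustrating the steps; not Sketch Engine's implementation.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
WORD_RE = re.compile(r"\w+(?:-\w+)*")


def prepare(text):
    """Preparation: strip leftover links, then lowercase and tokenize."""
    return WORD_RE.findall(URL_RE.sub(" ", text).lower())


def keyness(focus, reference, n=1.0):
    """Smoothed ratio of per-million frequencies: focus corpus vs reference corpus."""
    f_total = sum(focus.values())
    r_total = sum(reference.values())
    return {
        w: (c * 1e6 / f_total + n) / (reference[w] * 1e6 / r_total + n)
        for w, c in focus.items()
    }


def candidates(scores, focus, stoplist, lemma_map, min_score=1.0):
    """Analysis: (1) drop weak and irrelevant units, (2) group forms under lemmas.

    Returns {lemma: total occurrences of its forms in the focus corpus}.
    """
    result = Counter()
    for form, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if score <= min_score or form in stoplist:
            continue
        result[lemma_map.get(form, form)] += focus[form]
    return dict(result)


focus = Counter(prepare("Новые слипоны и бини https://example.com/shop слипон и бини"))
reference = Counter(prepare("и дом и кот и шапка и новые"))
scores = keyness(focus, reference)
print(candidates(scores, focus, stoplist={"новые"}, lemma_map={"слипоны": "слипон"}))
# {'бини': 2, 'слипон': 2}
```

Here новые passes the threshold but is removed by the stop list, и is relatively more frequent in the reference corpus and falls below `min_score`, and the forms слипоны and слипон are merged under one lemma. A fixed cut-off is only a convenience for the example; a ranked list inspected by hand is closer to the manual filtering the method describes.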

Publisher
Tomsk State University
Issue number
80
Language
Russian
Pages
5-29
Status
Published
Year
2022
Organizations
  • 1 Peoples' Friendship University of Russia (RUDN University)
Keywords
anglicisms; loanwords; automatic detection; corpus linguistics; search for anglicisms; corpus analysis methods