Similar Terms Grouping Yields Faster Terminological Saturation

Kosa, V.; Chaves-Fraga, D.; Keberle, N.; Birukou, A.

This paper reports on a refinement of the algorithm for measuring the terminological difference between text datasets (THD). The baseline THD algorithm, developed in the OntoElect project, compared terms by exact string match. In this work, it is refined by using appropriately selected string similarity measures (SSMs) to group terms that look similar as text strings and presumably have similar meanings. To determine rational term similarity thresholds for several chosen SSMs, the measures were implemented as software functions and evaluated on a purpose-built test set of English term pairs. The refined algorithm implementation was then evaluated against the baseline THD algorithm, using bags of terms extracted from three document collections of scientific papers belonging to different subject domains. The experiment revealed that the refined THD algorithm, compared to the baseline, reached terminological saturation more quickly and on more compact sets of source documents, though at the expense of a noticeably higher computation time. © 2019, Springer Nature Switzerland AG.
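The grouping idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity function (Python's standard `difflib.SequenceMatcher` ratio) is only a stand-in for the SSMs chosen in the paper, the threshold of 0.75 is a hypothetical value, and the greedy "compare with the first member of each group" strategy and the name `group_similar_terms` are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in string similarity measure in [0, 1] (difflib ratio).

    Assumption: the paper evaluates its own selection of SSMs; this
    function merely plays that role in the sketch.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_similar_terms(terms, threshold=0.75):
    """Greedily place each term into the first group whose representative
    (the group's first term) is at least `threshold` similar to it;
    otherwise start a new group. The threshold value is hypothetical."""
    groups = []
    for term in terms:
        for group in groups:
            if similarity(term, group[0]) >= threshold:
                group.append(term)
                break
        else:
            # No existing group was similar enough.
            groups.append([term])
    return groups


terms = [
    "ontology",
    "ontologies",
    "term extraction",
    "terms extraction",
    "saturation",
]
groups = group_similar_terms(terms)
for group in groups:
    print(group)
```

With exact string matching, the five terms above would count as five distinct terms; under the assumed measure and threshold, spelling variants collapse into shared groups. The pairwise comparisons are also where the extra computation time mentioned in the abstract would plausibly come from.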

Authors

Kosa V. ¹, Chaves-Fraga D. ², Keberle N. ¹, Birukou A. ³, ⁴

Journal

Communications in Computer and Information Science

Publisher

Springer Verlag

Language

English

Pages

43-70

State

Published

DOI

10.1007/978-3-030-13929-2_3

Volume

1007

Year

2019

Organizations

¹ Department of Computer Science, Zaporizhzhia National University, Zaporizhzhia, Ukraine
² Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
³ Springer-Verlag GmbH, Heidelberg, Germany
⁴ Peoples’ Friendship University of Russia (RUDN University), Moscow, Russian Federation

Keywords

Automated term extraction; Bag of terms; OntoElect; String similarity measure; Terminological difference; Terminological saturation
