Publication:
The Two-Stage Algorithm for Extraction of the Significant Pharmaceutical Named Entities and Their Relations in the Russian-Language Reviews on Medications on Base of the XLM-RoBERTa Language Model

Date
2022
Authors
Moloshnikov, I.
Selivanov, A.
Rylkov, G.
Rybka, R.
Sboev, A.
Organizational units
Organizational unit
Institute of Nuclear Physics and Technologies (Институт ядерной физики и технологий, ИЯФиТ)
The goal and development strategy of ИЯФиТ is to create and develop a world-class research and education center in nuclear physics and technologies, radiation materials science, elementary particle physics, astrophysics, and cosmophysics.
Abstract
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
The Internet contains a large amount of heterogeneous information, and extracting and structuring it is a relevant task today. This is especially true for socially important tasks, in particular analyzing people's experience of using pharmaceutical products. In this paper, we propose a two-step sequential algorithm for extracting named entities and the relationships between them. Its creation was made possible by the availability of an annotated corpus of Internet users' reviews of medicines (Russian Drug Review Corpus). The algorithm is based on the XLM-RoBERTa-sag language model, which is pre-trained on a large corpus of unlabeled review texts. The developed algorithm achieves an accuracy of 71.6 in identifying related entities and 80.5 in identifying relations, which is the first accuracy estimate for this problem on Russian-language drug review texts.
Citation
The Two-Stage Algorithm for Extraction of the Significant Pharmaceutical Named Entities and Their Relations in the Russian-Language Reviews on Medications on Base of the XLM-RoBERTa Language Model / Moloshnikov, I. [et al.] // Studies in Computational Intelligence. - 2022. - 1032 SCI. - P. 463-471. - DOI: 10.1007/978-3-030-96993-6_51