Publication: Development of Text Data Processing Pipeline for Scientific Systems
Date
2020
Authors
Guseva, A. I.
Kuznetsov, I. A.
Bochkaryov, P. V.
Filippov, S. A.
Kireev, V. S.
Abstract
© 2020, Springer Nature Switzerland AG. The aim of this work was to develop a pipeline for processing scientific texts, including articles and abstracts, in order to categorize them, identify patterns, and build recommendations for users of scientific systems. The authors propose several text pre-processing methods and a method for cluster and classification analysis of texts, and they developed a software system that makes recommendations to users of scientific publications. To solve the data preprocessing problem, a parametric approach is proposed for extracting a new semantic feature from textual publications: the type of scientific result. Extraction of the scientific result type is based on the user's need for content with a specific property. To solve the problem of clustering user profiles, an ensemble method with a change of distance metric is proposed. For classification, an entropy-based ensemble method is used. The efficiency of the proposed methods and algorithms was evaluated on the search module of the information system of the "Technologies in Education" International Congress of Conferences. The authors acknowledge support from the MEPhI Academic Excellence Project (Contract No. 02.a03.21.0005).
Citation
Development of Text Data Processing Pipeline for Scientific Systems / Guseva, A.I. [et al.] // Advances in Intelligent Systems and Computing. - 2020. - Vol. 948. - P. 124-136. - DOI: 10.1007/978-3-030-25719-4_17