Publication:
Development of Text Data Processing Pipeline for Scientific Systems

dc.contributor.authorGuseva, A. I.
dc.contributor.authorKuznetsov, I. A.
dc.contributor.authorBochkaryov, P. V.
dc.contributor.authorFilippov, S. A.
dc.contributor.authorKireev, V. S.
dc.contributor.authorГусева, Анна Ивановна
dc.contributor.authorКузнецов, Игорь Александрович
dc.contributor.authorБочкарёв, Пётр Владимирович
dc.contributor.authorКиреев, Василий Сергеевич
dc.contributor.otherFaculty of Business Informatics and Complex Systems Management
dc.date.accessioned2024-11-25T14:44:38Z
dc.date.available2024-11-25T14:44:38Z
dc.date.issued2020
dc.description.abstract© 2020, Springer Nature Switzerland AG. The aim of this work was to develop a pipeline for processing scientific texts, including articles and abstracts, in order to categorize them, identify patterns, and build recommendations for users of scientific systems. The authors propose several text preprocessing methods and a method for cluster and classification analysis of texts, and they developed a recommendation software system for users of scientific publications. For data preprocessing, a parametric approach is proposed that extracts a new semantic feature from textual publications: the type of scientific result. Extraction of the scientific result type is driven by the user's need for content with a specific property. For clustering user profiles, an ensemble method with a change of distance metric is proposed. For classification, an entropy-based ensemble method is used. The efficiency of the proposed methods and algorithms was evaluated on the search module of the information system of the “Technologies in Education” International Congress of Conferences. Author acknowledges support from the MEPhI Academic Excellence Project (Contract No. 02.a03.21.0005).
dc.format.extentP. 124-136
dc.identifier.citationDevelopment of Text Data Processing Pipeline for Scientific Systems / Guseva, A.I. [et al.] // Advances in Intelligent Systems and Computing. - 2020. - 948. - P. 124-136. - 10.1007/978-3-030-25719-4_17
dc.identifier.doi10.1007/978-3-030-25719-4_17
dc.identifier.urihttps://www.doi.org/10.1007/978-3-030-25719-4_17
dc.identifier.urihttps://www.scopus.com/record/display.uri?eid=2-s2.0-85070224759&origin=resultslist
dc.identifier.urihttps://openrepository.mephi.ru/handle/123456789/19991
dc.relation.ispartofAdvances in Intelligent Systems and Computing
dc.titleDevelopment of Text Data Processing Pipeline for Scientific Systems
dc.typeConference Paper
dspace.entity.typePublication
oaire.citation.volume948
relation.isAuthorOfPublication23b35aa9-9178-4bd2-8518-a31c29111cb9
relation.isAuthorOfPublication28706dc3-84c9-4ae4-beea-159bf1ee06a4
relation.isAuthorOfPublicationa8b305dd-c229-4f45-845f-0ac9ac5c210e
relation.isAuthorOfPublication1809cf60-3d01-4505-a5ad-b694b85f0d19
relation.isAuthorOfPublication.latestForDiscovery23b35aa9-9178-4bd2-8518-a31c29111cb9
relation.isOrgUnitOfPublication764cf4b3-672d-44a7-847c-c3d0b3e0e552
relation.isOrgUnitOfPublication010157d0-1f75-46b2-ab5b-712e3424b4f5
relation.isOrgUnitOfPublication.latestForDiscovery764cf4b3-672d-44a7-847c-c3d0b3e0e552