Publication: ANOMALY DETECTION IN STREAM DATA PROCESSING IN REAL TIME
Date
2022
Authors
Publisher
Abstract
© 2022 Little Lion Scientific. The purpose of the present work is to study and improve methods for anomaly detection and time series prediction when processing streaming data in real time in a network environment. For anomaly detection, the authors propose a Real-Time K-Means modification with preliminary labeling. Its effectiveness is confirmed by comparison with the frequently used Streaming K-Means from the Apache Spark MLlib library. For time series prediction on streaming data in real time, the authors propose a modification of the autoregression (AR) model of a given order that adds a function inheriting the previous values of the time series. Comparative experiments of the proposed Real-Time AR modification against classical AR confirmed the effectiveness of the modification, which is especially evident when the time series behaves anomalously. The proposed modifications allow not only parallelizing calculations using the deferred (lazy) computation paradigm but also configuring the model on the fly in the Apache Spark ecosystem. For the experiments, a dataset was built: a slice of 1,000 measurements from the metrics log of an Apache Kafka server with one topic, two producers, and one consumer. Anomalous fragments were artificially added to the dataset, differing by a large number of messages per second and/or by message size. The values of the original dataset were normalized and shifted to the average value of the training sample. Moreover, static and highly correlated metrics were eliminated. The results of applying the developed algorithms to anomaly detection and time series prediction have shown that even the presence of behavior anomalies does not distort predictions significantly.
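The preprocessing described in the abstract (normalization relative to the training sample, and elimination of static and highly correlated metrics) could look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the normalization formula or the correlation cutoff, so z-score scaling and the `corr_threshold=0.95` default are assumptions, and the metric names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def select_metrics(train, corr_threshold=0.95):
    """Return metric names to keep; train maps name -> list of values.

    The threshold is an assumed value, not one taken from the paper.
    """
    kept = []
    for name, values in train.items():
        if pstdev(values) == 0:
            continue  # static metric: carries no information
        if any(abs(pearson(values, train[k])) >= corr_threshold for k in kept):
            continue  # nearly duplicates a metric that is already kept
        kept.append(name)
    return kept


def fit_scaler(train, names):
    """Compute (mean, standard deviation) per metric on the training sample only."""
    return {n: (mean(train[n]), pstdev(train[n])) for n in names}


def transform(row, scaler):
    """Center and scale one incoming measurement with the training statistics."""
    return {n: (row[n] - mu) / sigma for n, (mu, sigma) in scaler.items()}


# Hypothetical Kafka-style metrics, for illustration only.
train = {
    "msgs_per_sec": [10, 12, 11, 13],
    "bytes_per_sec": [100, 120, 110, 130],  # exactly 10 x msgs_per_sec
    "partitions": [1, 1, 1, 1],             # static
    "latency_ms": [5, 3, 6, 4],
}
kept = select_metrics(train)
scaler = fit_scaler(train, kept)
scaled = transform({"msgs_per_sec": 11.5, "latency_ms": 4.5}, scaler)
```

Fitting the statistics on the training sample alone and reusing them for every later measurement matches the streaming setting, where the full series is never available at once.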
Description
Keywords
Citation
ANOMALY DETECTION IN STREAM DATA PROCESSING IN REAL TIME / Dunaev, M. [et al.] // Journal of Theoretical and Applied Information Technology. - 2022. - 100. - № 10. - P. 3467-3477