Publication:
Efficient Exact Algorithm for Count Distinct Problem

Дата
2019
Авторы
Golov, N.
Bruskin, S.
Filatov, A.
Journal Title
Journal ISSN
Volume Title
Издатель
Научные группы
Организационные подразделения
Выпуск журнала
Аннотация
© 2019, Springer Nature Switzerland AG.This paper describes and analyses optimization approaches, which make possible the exact calculation of millions of hierarchical count distinct measures over hundreds of billions data rows. Described approach evolved for several years, in parallel with the growth of tasks from a fast growing internet company, and was finally implemented as a PEAPM (Pipelined Exact Accumulation for Paralleled Measures) algorithm. Current version of an algorithm outputs exact values (not estimates), works in a single thread, in minutes using a general commodity hardware, and requires volume of RAM equal to the doubled size of required measures.
Описание
Ключевые слова
Цитирование
Golov, N. Efficient Exact Algorithm for Count Distinct Problem / Golov, N., Bruskin, S., Filatov, A. // Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). - 2019. - 11661 LNCS. - P. 67-77. - 10.1007/978-3-030-26831-2_5
Коллекции