Publication: Efficient Exact Algorithm for Count Distinct Problem
Дата
2019
Авторы
Golov, N.
Bruskin, S.
Filatov, A.
Journal Title
Journal ISSN
Volume Title
Издатель
Аннотация
© 2019, Springer Nature Switzerland AG.This paper describes and analyses optimization approaches, which make possible the exact calculation of millions of hierarchical count distinct measures over hundreds of billions data rows. Described approach evolved for several years, in parallel with the growth of tasks from a fast growing internet company, and was finally implemented as a PEAPM (Pipelined Exact Accumulation for Paralleled Measures) algorithm. Current version of an algorithm outputs exact values (not estimates), works in a single thread, in minutes using a general commodity hardware, and requires volume of RAM equal to the doubled size of required measures.
Описание
Ключевые слова
Цитирование
Golov, N. Efficient Exact Algorithm for Count Distinct Problem / Golov, N., Bruskin, S., Filatov, A. // Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). - 2019. - 11661 LNCS. - P. 67-77. - 10.1007/978-3-030-26831-2_5
URI
https://www.doi.org/10.1007/978-3-030-26831-2_5
https://www.scopus.com/record/display.uri?eid=2-s2.0-85071414798&origin=resultslist
http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=Alerting&SrcApp=Alerting&DestApp=WOS_CPL&DestLinkType=FullRecord&UT=WOS:000555272600005
https://openrepository.mephi.ru/handle/123456789/18634
https://www.scopus.com/record/display.uri?eid=2-s2.0-85071414798&origin=resultslist
http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=Alerting&SrcApp=Alerting&DestApp=WOS_CPL&DestLinkType=FullRecord&UT=WOS:000555272600005
https://openrepository.mephi.ru/handle/123456789/18634