Кратко про HyperLogLog (подробности и визуализаторы в ссылках). \(F_k\)-момент (\(F_k = \sum_i f_i^k\)).
Алгоритм AMS для \(F_2\)-момента. Выбор случайного элемента из потока. Оценка для \(\sum_i g(f_i)\) при \(g(0) = 0\). Алгоритм для \(F_k\) момента.
Источники:
- The space complexity of approximating the frequency moments, Noga Alon, Yossi Matias, Mario Szegedy, 2002
- Loglog Counting of Large Cardinalities, Marianne Durand and Philippe Flajolet, 2003
- HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm, Philippe Flajolet, Eric Fusy, Olivier Gandouet, Frederic Meunier, 2007
- http://research.neustar.biz/2012/10/25/sketch-of-the-day-hyperloglog-cornerstone-of-a-big-data-infrastructure/
- http://research.neustar.biz/2013/04/02/sketch-of-the-day-probabilistic-counting-with-stochastic-averaging-pcsa/
- http://research.neustar.biz/2012/07/09/sketch-of-the-day-k-minimum-values/
- Как HyperLogLog реализован в Redis: http://antirez.com/news/75