Analysing Random Walk Insertions in Cuckoo Hash Tables and Practical Filter Data Structures
Final Report Abstract
This project had two parts: a theoretical part on hash tables and a practical part on filters. A hash table is a fundamental data structure used to store a set of key-value pairs, called elements. The challenge is to compactly place all elements in memory so that each element can be quickly retrieved using its key. The cuckoo hashing strategy assigns each key multiple positions in memory through hash functions, where an associated element may be placed and where it is searched when the key is requested. This way, whenever two keys compete for the same position, one can move to one of its alternative positions. The natural random-walk algorithm dictates that, in the event of a collision, a key chooses one of its alternative positions randomly and moves there, which can trigger a cascade of further movements. This project deepened the understanding of this random process, particularly regarding the expected number of necessary moves when inserting an element. This number remains constant under certain assumptions, i.e. it not grow with the size of the hash table. A filter represents a set of objects and is meant to decide whether a given object is in the set, answering “YES” or “NO”. The filter is allowed a small error probability: it may occasionally answer “YES” when the correct answer is “NO”, but never “NO” when the correct answer is “YES”. This relaxation allows filters to be significantly smaller and faster than exact set data structures. If the filter gives a “NO” response, it is reliable, and access to an exact data structure is unnecessary. Filters play a crucial role in efficient database systems. The project developed and fine-tuned a promising approach to designing filter data structures. The resulting ribbon filters have good theoretical properties and are considerably faster and more compact in practice than their predecessors.
Publications
-
“Ribbon filter: practically smaller than Bloom and Xor”. In: CoRR abs/2103.02515 (2021).
Peter C. Dillinger & Stefan Walzer
-
“Fast Succinct Retrieval and Approximate Membership Using Ribbon”. In: 20th SEA. 2022.
Peter C. Dillinger, Lorenz Hübschle-Schneider, Peter Sanders & Stefan Walzer
-
“Insertion Time of Random Walk Cuckoo Hashing below the Peeling Threshold”. In: 30th ESA. 2022.
Stefan Walzer
-
Lightweight Acquisition and Ranging of Flows in the Data Plane. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 7(3), 1-24.
Monterubbiano, Andrea; Langlet, Jonatan; Walzer, Stefan; Antichi, Gianni; Reviriego, Pedro & Pontarelli, Salvatore
-
On the Privacy of Counting Bloom Filters Under a Black-Box Attacker. IEEE Transactions on Dependable and Secure Computing, 20(5), 4434-4440.
Galán, Sergio; Reviriego, Pedro; Walzer, Stefan; Sánchez-Macian, Alfonso; Liu, Shanshan & Lombardi, Fabrizio
-
On the Privacy of Counting Bloom Filters. IEEE Transactions on Dependable and Secure Computing, 20(2), 1488-1499.
Reviriego, Pedro; Sánchez-Macian, Alfonso; Walzer, Stefan; Merino-Gómez, Elena; Liu, Shanshan & Lombardi, Fabrizio
-
SicHash - Small Irregular Cuckoo Tables for Perfect Hashing. 2023 Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX), 176-189. Society for Industrial and Applied Mathematics.
Lehmann, Hans-Peter; Sanders, Peter & Walzer, Stefan
-
Simple Set Sketching. Symposium on Simplicity in Algorithms (SOSA), 228-241. Society for Industrial and Applied Mathematics.
Bæk, Tejs Houen Jakob; Pagh, Rasmus & Walzer, Stefan
-
“What if we tried Less Power? Lessons from studying the power of choices in hashing-based data structures”. In: Bull. EATCS 140 (2023).
Stefan Walzer
-
ShockHash: Towards Optimal-Space Minimal Perfect Hashing Beyond Brute-Force. 2024 Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX), 194-206. Society for Industrial and Applied Mathematics.
Lehmann, Hans-Peter; Sanders, Peter & Walzer, Stefan
