math in bloom filter
Bloom filter is building block for various applications.
What is Bloom filter (BF)?
Bloom filter is a data structure for probabilistic set membership detection.
Here is a definition.
Bloom Filter BF could be an array of length \(m\) 01 bit, which can represent a set \(A\) with \(n\) elements. \[ A = {a_1, a_2, \cdots, a_n} \] Reader may ask how we gonna put those elements into such an array. The answer is hashing. Along with this array, a BF usually has \(k\) independent hash functions. The basic idea of BF is that to allocate an element \(a_i\) in \(A\), we will calculate \(h_1(a_i),h_2(a_i),\cdots,h_k(a_i)\) and then set \(BF_{A}[h(a)]\) to 1.
How to check set membership?
To check if an element \( a_x \) is in the set \(A\), \(a_x\) is hashed by \(k\) hash functions, if for every hash result, \(BF [h(a_x)]\) is 1, then this element has a great probability in the set. On the contrary, if any \(BF [h(a_x)]\) is 0, then we have probability as 1 for sure that this element is not in the set.
What is the probability that a bit in BF is 1?
To figure out this, first, to figure out the game rule, it is like pour water in bottle. Every time you pick up a bottle by the hash, you need pour some water into this bottle. For a particular bottle, the chance it is non-empty for one hash is \(1-\frac{1}{m}\). After we finish one element, it is in total \(k\) round. so the probability that one particular bit is not set as 1 during one element is \((1-\frac{1}{m})^{k}\). Since we have \(n\) elements in the set, after we process all those elements, this probability that one particular bit is set to 1 should be \[1-(1-\frac{1}{m})^{kn}\]
What is the upper bound of false positive probability?
As mentioned, BF is a probabilistic membership checking method. False positive probability is the probability that you get a resulting indicating that one element is in the set but the actual result is that this element is not in the set. So it is a false alarm. To calculate this probability, we will refer to a paper and learn from it. (To Be Continued.)