Tuesday, September 7, 2010

Naive Bloomier Filter

I've been playing with the Naive implementation of the Bloomier Filter recently. The Bloomier Filter is just a stack of several Bloom Filters on each level. (Well the Naive version described in the first few pages of the paper is any ways) Is it me or is it that I just realized that there is a huge savings that can be had by allocating the second layer to have size FPR*N where FPR is the guaranteed False Positive Rate and N is the total number of items we inserted. So, but how small is the set of items that may make it to the next layer? Well, the expected value can be calculated as FPR*N. I guess if I am smarter, I would go ahead and calculate the variance of this RV, and compute how much variance to the overall expected false positives I add by making this second layer just a fraction of the first layer....

I guess he probably says this in the paper some where, but alas, took me a day to figure out.

Actually in reality it is very rare for me to break out of the first layer unless I construct a really really obnoxious case where I tell the BloomFilter that I plan to put in 100 items, and that I want 0.00001 FPR, and then put in 100k items, then it explodes and generates extra false positives.


This is such an interesting data structure. Have you had success with this algorithm?


1 comment:

  1. He does indeed derive the number of bits subsequent filters need to implement naive bloomier filter, however, the bits were not expressed in closed form expressions written in Wikipedia. Indeed, I think the approximation used in the page makes the solution a lower bound to the fpr, though as the number of bits approaches infinity, the fpr approaches that lower bound.

    I should do some experiments to verify that estimate is good.

    Also, I wish I spent more time learning inequalities in high school... It's been 5 sleepless nights and I still could not derive that solution.

    ReplyDelete