Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C++

Bloom Filters

0.00/5 (No votes)
4 Apr 2019MIT1 min read 4.2K  
From Wikipedia: A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set.

From Wikipedia:

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set.

In other words, given a set of elements, bloom filter can tell you that:

  1. given element is definitely not in the set, or
  2. given element is maybe in the set.

It can give you a false positive: it can say that an element is in the set when it in fact is not. But it will never give you a false negative: it will never say that an element is not in the set when in fact it is.

In my past life, I used bloom filters to check whether or not I should perform an expensive database index search. 🙂

In the following example, I construct a bloom filter given a set of strings:

set1

I then verify that each element of:

set1

is in the set according to the bloom filter. Finally, I try a different set of elements:

set2

and test what the bloom filter says about those elements. Given big enough bloom filter, I get 100% correct answers (that none of the elements in...

set2

...are present). Here’s the code (bloom.cpp):

#include <iostream>
#include <string>
#include "bloom.hpp"

using namespace std;

int main()
{
	string set1[] = {"Martin", "Vorbrodt", "C++", "Blog"};
	string set2[] = {"Not", "In", "The", "Set"};

	bloom_filter<string> bloom(128);

	for(auto& s : set1)
		bloom.add(s);

	for(auto& s : set1)
		cout << "Contains \"" << s << "\" : " << bloom.contains(s) << endl;

	for(auto& s : set2)
		cout << "Contains \"" << s << "\" : " << bloom.contains(s) << endl;
}

Program output:

Contains "Martin" : 1
Contains "Vorbrodt" : 1
Contains "C++" : 1
Contains "Blog" : 1
Contains "Not" : 0
Contains "In" : 0
Contains "The" : 0
Contains "Set" : 0

My implementation of the bloom filter is primitive: it uses only one hashing function which increases false positives, but it is very short and clean and can be built upon; maybe at some point, I’ll write a full blown implementation; though there’s plenty of examples online; I want this post to be an introduction to the topic.

In any case, here’s my implementation of the bloom filter (bloom.hpp):

#pragma once

#include <vector>
#include <functional>
#include <cstddef>

template<typename key, typename hash = std::hash<key>>
class bloom_filter
{
public:
	bloom_filter(std::size_t size) : m_bits(size) {}

	void add(const key& data)
	{
		m_bits[hash{}(data) % m_bits.size()] = true;
	}

	bool contains(const key& data)
	{
		return m_bits[hash{}(data) % m_bits.size()];
	}

private:
	std::vector<bool> m_bits;
};

License

This article, along with any associated source code and files, is licensed under The MIT License