Class BloomSHA1

java.lang.Object
org.xlattice.crypto.filters.BloomSHA1

public class BloomSHA1 extends Object
A Bloom filter for sets of SHA1 digests. A Bloom filter uses a set of k hash functions to determine set membership. Each hash function produces a value in the range 0..M-1. The filter is of size M. To add a member to the set, apply each function to the new member and set the corresponding bit in the filter. For M very large relative to k, this will normally set k bits in the filter. To check whether x is a member of the set, apply each of the k hash functions to x and check whether the corresponding bits are set in the filter. If any are not set, x is definitely not a member. If all are set, x may be a member. The probability of error (the false positive rate) is f = (1 - e^(-kN/M))^k, where N is the number of set members. This class takes advantage of the fact that SHA1 digests are good- quality pseudo-random numbers. The k hash functions are the values of distinct sets of bits taken from the 20-byte SHA1 hash. The number of bits in the filter, M, is constrained to be a power of 2; M == 2^m. The number of bits in each hash function may not exceed floor(m/k). This class is designed to be thread-safe, but this has not been exhaustively tested.
Author:
Jim Dixon BloomSHA1.java and KeySelector.java are BSD licensed from the xlattice app - http://xlattice.sourceforge.net/ minor tweaks by jrandom, exposing unsynchronized access and allowing larger M and K. changes released into the public domain. Note that this is used only by DecayingBloomFilter, which uses only the unsynchronized locked_foo() methods. Deprecated for use outside of the router; to be moved to router.jar. As of 0.8.11, the locked_foo() methods are thread-safe, in that they work, but there is a minor risk of false-negatives if two threads are accessing the same bloom filter integer.
  • Constructor Details

    • BloomSHA1

      public BloomSHA1(int m, int k)
      Creates a filter with 2^m bits and k 'hash functions', where each hash function is portion of the 160-bit SHA1 hash.
      Parameters:
      m - determines number of bits in filter
      k - number of hash functionsx See KeySelector for important restriction on max m and k
    • BloomSHA1

      public BloomSHA1(int m)
      Creates a filter of 2^m bits, with the number of 'hash functions" k defaulting to 8.
      Parameters:
      m - determines size of filter
    • BloomSHA1

      public BloomSHA1()
      Creates a filter of 2^20 bits with k defaulting to 8.
  • Method Details

    • clear

      public void clear()
      Synchronized version
    • size

      public final int size()
      Returns the number of keys which have been inserted. This class (BloomSHA1) does not guarantee uniqueness in any sense; if the same key is added N times, the number of set members reported will increase by N.
      Returns:
      number of set members
    • capacity

      public final int capacity()
      Returns:
      number of bits in filter
    • insert

      public void insert(byte[] b)
      Add a key to the set represented by the filter. XXX This version does not maintain 4-bit counters, it is not a counting Bloom filter.
      Parameters:
      b - byte array representing a key (SHA1 digest)
    • insert

      public void insert(byte[] b, int offset, int len)
    • locked_insert

      public final void locked_insert(byte[] b)
    • locked_insert

      public final void locked_insert(byte[] b, int offset, int len)
    • locked_member

      public final boolean locked_member(byte[] b)
    • locked_member

      public final boolean locked_member(byte[] b, int offset, int len)
    • member

      public final boolean member(byte[] b)
      Is a key in the filter. External interface, internally synchronized.
      Parameters:
      b - byte array representing a key (SHA1 digest)
      Returns:
      true if b is in the filter
    • member

      public final boolean member(byte[] b, int offset, int len)
    • getFilterKey

      public BloomSHA1.FilterKey getFilterKey(byte[] b, int offset, int len)
      Get the bloom filter offsets for reuse. Caller should call release(rv) when done with it.
      Since:
      0.8.11
    • locked_insert

      public void locked_insert(BloomSHA1.FilterKey fk)
      Add the key to the filter.
      Since:
      0.8.11
    • locked_member

      public boolean locked_member(BloomSHA1.FilterKey fk)
      Is the key in the filter.
      Since:
      0.8.11
    • release

      public void release(BloomSHA1.FilterKey fk)
      Since:
      0.8.11
    • falsePositives

      public final double falsePositives(int n)
      Parameters:
      n - number of set members
      Returns:
      approximate false positive rate
    • falsePositives

      public final double falsePositives()