HashSet Internal Implementation

In the previous article, we learned what a HashSet is and its different methods. In this article, we are going to learn about the internal implementations of a HashSet in Java. In order to learn any implementation in Java, it is essential to go through the Java source code in your IDE.

Before diving into its implementation, we ought to remember that:

A HashSet does not accept duplicate keys.
It permits at most only one null element.

HashMap as a Backing Data Strcture to a HashSet

Ideally, a HashSet's internal implementation uses a HashMap with a generic key of type E and Object as the value (Remember: To implement a HashMap you need a key and its associated value). Let's have a look at this snippet:

public class HashSet<E>
    extends AbstractSet<E>
    implements Set<E>, Cloneable, java.io.Serializable
{
    @java.io.Serial
    static final long serialVersionUID = -5024744406713321676L;

    private transient HashMap<E,Object> map;

    // Dummy value to associate with an Object in the backing Map
    private static final Object PRESENT = new Object();

    /**
     * Constructs a new, empty set; the backing {@code HashMap} instance has
     * default initial capacity (16) and load factor (0.75).
     */
    public HashSet() {
        map = new HashMap<>();
    }
}

If we recall, a HashMap permits duplicate objects as values to keys. This attribute is efficiently applied in the HashSet implementation.

A close review of the above snippet, we can see that whenever a key is added, that element is put in a HashMap with the static field PRESENT as its value. What this means is that for every element added, we expect its value to be of type Object and a constant PRESENT. Consider the following implementation of add () method:

public class SetandHashSet {
    public static void main(String[]args){

        HashSet<String > names = new HashSet<>();
        names.add("Kamau");
        names.add("John");
        names.add("Kiraitu");
        names.add("Mugeni");
        names.add("Thoithi");
    }

}

In the above example, I have created a HashSet of type String and added a few names. Internally, this implementation will create a HashMap of type String with mappings as shown below:

We can see that the value of each key is stored as PRESENT which is defined by the code private static final Object PRESENT = new Object();

Now, let us look at a few methods and how they apply.

add(E e) Method

The add() method adds the specified element in the HashSet if it does not exist. If it exists, the HashSet remains unchanged and it returns false. consider the following example:

public class SetandHashSet {
    public static void main(String[]args){

        HashSet<String > names = new HashSet<>();

        boolean name1 = names.add("Kamau");
        boolean name2 = names.add("Kamau");

        System.out.println(name1);
        System.out.println(name2);
    }

}

The above code will return

true
false

This is because name2 is a duplicate of name1, thus the addition returns false. The internal implementation of the add () method uses put() from HashMap.

public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}

In the above implementation, the put() method returns the previous value associated with key, or null if there was no mapping for key. This means that if the element is not present in the HashSet, the put () method returns null then the add() method returns null (i.e null == null)as well and the element is added into the HashSet.

However, if the element is present, put() executes to true since a value is PRESENT which is not null. Thus, the add() execution resolves to false (i.e PRESENT == null).

remove() Method

If the HashSet contained the specified element, the remove() method resolves to true.

public boolean remove(Object o) {
    return map.remove(o)==PRESENT;
}

This method uses the remove() method of the HashMap. The method returns the value of the element in the HashSet (i.e PRESENT). If no value is found, the method then returns null;

Other Methods Implementation

iterator(): This method returns an iterator over the HashSet. The returned elements are not ordered.

public Iterator<E> iterator() {
    return map.keySet().iterator();
}

size(): Returns the size of the HashSet based on the numbers present.

public int size() {
    return map.size();
}

isEmpty(): Returns true if the HashSet contains no elements and false if it contains elements

public boolean isEmpty() {
    return map.isEmpty();
}

contains(): Returns true if the HashSet contains the specified element

public boolean contains(Object o) {
    return map.containsKey(o);
}

clear(): Removes all elements in the HashSet

public void clear() {
    map.clear();
}

clone(): Returns a shallow copy of the HashSet instance.

public Object clone() {
    try {
        HashSet<E> newSet = (HashSet<E>) super.clone();
        newSet.map = (HashMap<E, Object>) map.clone();
        return newSet;
    } catch (CloneNotSupportedException e) {
        throw new InternalError(e);
    }
}

Internal Implementation of HashSet in Java

HashSet Internals

HashMap as a Backing Data Strcture to a HashSet

add(E e) Method

remove() Method

Other Methods Implementation