Tuesday, December 4, 2012

hashcode() and equals() method

The methods hashCode() and equals() play a distinct role in the objects you insert into Java collections. The specific contract rules of these two methods are best described in the JavaDoc. Here I will just tell you what role they play. What they are used for, so you know why their implementations are important.

equals()

equals() is used in most collections to determine if a collection contains a given element. For instance:
List list = new ArrayList();
list.add("123");

boolean contains123 = list.contains("123");
The ArrayList iterates all its elements and execute "123".equals(element) to determine if the element is equal to the parameter object "123". It is the String.equals() implementation that determines if two strings are equal.
The equals() method is also used when removing elements. For instance:
List list = new ArrayList();
list.add("123");

boolean removed = list.remove("123");
The ArrayList again iterates all its elements and execute "123".equals(element) to determine if the element is equal to the parameter object "123". The first element it finds that is equal to the given parameter "123" is removed.
As you can see, a proper implementation of .equals() is essential for your own classes to work well with the Java Collection classes. So how do you implement equals() "properly"?
So, when are two objects equal? That depends on your application, the classes, and what you are trying to do. For instance, let's say you are loading and processing Employee objects stored in a database. Here is a simple example of such an Employee class:
public class Employee {
    protected long   employeeId;
    protected String firstName;
    protected String lastName;
}
You could decide that two Employee objects are equal to each other if just their employeeId's are equal. Or, you could decide that all fields must be equal - both employeeId, firstName and lastName. Here are two example implementation of equals() matching these criterias:
public class Employee {
  ...
  public boolean equals(Object o){
    if(o == null)                return false;
    if(!(o instanceof) Employee) return false;

    Employee other = (Employee) o;
    return this.employeeId == other.employeeId;
  }
}
public class Employee {
  ...
  public boolean equals(Object o){
    if(o == null)                return false;
    if(!(o instanceof) Employee) return false;

    Employee other = (Employee) o;
    if(this.employeeId != other.employeeId)      return false;
    if(! this.firstName.equals(other.firstName)) return false;
    if(! this.lastName.equals(other.lastName))   return false;

    return true;
  }
}
Which of these two implementations is "proper" depends on what you need to do. Sometimes you need to lookup an Employee object from a cache. In that case perhaps all you need is for the employeeId to be equal. In other cases you may need more than that - for instance to determine if a copy of an Employee object has changed from the original.

hashCode()

The hashCode() method of objects is used when you insert them into a HashTable, HashMap or HashSet. If you do not know the theory of how a hashtable works internally, you can read about hastables on Wikipedia.org.
When inserting an object into a hastable you use a key. The hash code of this key is calculated, and used to determine where to store the object internally. When you need to lookup an object in a hashtable you also use a key. The hash code of this key is calculated and used to determine where to search for the object.
The hash code only points to a certain "area" (or list, bucket etc) internally. Since different key objects could potentially have the same hash code, the hash code itself is no guarantee that the right key is found. The hashtable then iterates this area (all keys with the same hash code) and uses the key's equals() method to find the right key. Once the right key is found, the object stored for that key is returned.
So, as you can see, a combination of the hashCode() and equals() methods are used when storing and when looking up objects in a hashtable.
Here are two rules that are good to know about implementing the hashCode() method in your own classes, if the hashtables in the Java Collections API are to work correctly:
  1. If object1 and object2 are equal according to their equals() method, they must also have the same hash code.
  2. If object1 and object2 have the same hash code, they do NOT have to be equal too.
In shorter words:
  1. If equal, then same hash codes too.
  2. Same hash codes no guarantee of being equal.
Here are two example implementation of the hashCode() method matching the equals() methods shown earlier:
public class Employee {
  protected long   employeeId;
  protected String firstName;
  protected String lastName;

  public int hashCode(){
    return (int) this.employeeId;
  }
}
public class Employee {
    protected long   employeeId;
    protected String firstName;
    protected String lastName;

  public int hashCode(Object o){
    return (int)this.employeeId *
                firstName.hashCode() *
                lastName.hashCode();
  }
}
Notice, that if two Employee objects are equal, they will also have the same hash code. But, as is especially easy to see in the first example, two Employee objects can be not equal, and still have the same hash code.
In both examples the hash code is the employeeId is rounded down to an int. That means that many employee id's could result in the same hash code, but these Employee objects would still not be equal, since they don't have the same employee id.

More Detail in the JavaDoc

For a 100% precise description of how to implement equals() and hashCode() you should check out the official JavaDoc's. The purpose of this text was mostly to explain how they are used by the Java Collection classes. Understanding this makes it easier to implement them to suit your purposes.

Sorting in Collections

You can sort List collections using the java.util.Collections.sort() method. You can sort these two types of List's.
  1. List
  2. LinkedList

Sorting Objects by their Natural Order

To sort a List you do this:
List list = new ArrayList();

//add elements to the list

Collections.sort(list);
When sorting a list like this the elements are ordered according to their "natural order". For objects to have a natural order they must implement the interface java.lang.Comparable. In other words, the objects must be comparable to determine their order. Here is how the Comparable interface looks:
public interface Comparable<T> {
  int compareTo(T o);
}
The compareTo() method should compare this object to another object, return an int value. Here are the rules for that int value:
  • Return a negative value if this object is smaller than the other object
  • Return 0 (zero) if this object is equal to the other object.
  • Return a positive value if this object is larger than the other object.
There are a few more specific rules to obey in the implementation, but the above is the primary requirements. Check out the JavaDoc for the details.
Let's say you are sorting a List of String elements. To sort them, each string is compared to the others according to some sorting algorithm (not interesting here). Each string compares itself to another string by alphabetic comparison. So, if a string is less than another string by alphabetic comparison it will return a negative number from the compareTo() method.
When you implement the compareTo() method in your own classes you will have to decide how these objects should be compared to each other. For instance, Employee objects can be compared by their first name, last name, salary, start year or whatever else you think makes sense.

Sorting Objects Using a Comparator

Sometimes you may want to sort a list according to another order than their natural order. Perhaps the objects you are sorting do not even have a natural order. In that case you can use a Comparator instead. Here is how you sort a list using a Comparator:
List list = new ArrayList();

//add elements to the list

Comparator comparator = new SomeComparator();

Collections.sort(list, comparator);
Notice how the Collections.sort() method now takes a java.util.Comparator as parameter in addition to the List. This Comparator compares the elements in the list two by two. Here is how the Comparator interface looks:
public interface Comparator<T> {
    int compare(T object1, T object2);
}
The compare() method compares two objects to each other and should:
  • Return a negative value if object1 is smaller than object2
  • Return 0 (zero) if objec1 is equal to object2.
  • Return a positive value if object1 is larger than object2.
There are a few more requirements to the implementation of the compare() method, but these are the primary requirements. Check out the JavaDoc for more specific details.
Here is an example Comparator that compares two fictive Employee objects:
public class MyComparator<Employee> implements Comparator<Employee> {

    public int compare(Employee emp1, Employee emp2){
       if(emp1.getSalary() <  emp2.getSalary()) return -1;
       if(emp1.getSalary() == emp2.getSalary()) return 0;
       return 1;
    }
}
A shorter way to write the comparison would be like this:
public class MyComparator<Employee> implements Comparator<Employee> {

    public int compare(Employee emp1, Employee emp2){
       return emp1.getSalary() - emp2.getSalary();
    }
}
By subtracting one salary from the other, the resulting value is automatically either negative, 0 or positive. Smart, right?
If you want to compare objects by more than one factor, start by comparing by the first factor (e.g first name). Then, if the first factors are equal, compare by the second factor (e.g. last name, or salary) etc