Sacha Labourey asked a good question in response to my recent post on Improvements to Clustered Caching of Frequently Inserted Entity Types. So rather than responding in detail in the comments there, I’ve decided to take the opportunity to put up a separate post and get a little bit deeper into how the Hibernate Second Level Cache works.
The most common use case for the Second Level Cache is to cache entities. However, the second level cache also allows users to cache entity relationship information. Hibernate provides a “collection cache”, where it caches the primary keys of entities that are members of a collection field in another entity type. Say, for example, we have two entity types, Group and Member, where a Member participates in a many-to-one relationship with a Group:
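import java.util.HashSet;
import java.util.Set;
public class Group {
    private Integer id;
    private String name;
    // Initialized so that addMember() works on a newly constructed Group
    private Set<Member> members = new HashSet<Member>();
    public void addMember(Member member) {
        members.add(member);
    }
    public void removeMember(Member member) {
        members.remove(member);
    }
    // ..... getters and setters omitted
}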
public class Member {
    private Integer id;
    private String name;
    private Group group;
    public Group getGroup() {
        return this.group;
    }
    public void setGroup(Group group) {
        this.group = group;
    }
    // ..... other getters and setters omitted
}
If you tell Hibernate to cache Group entities in the second level cache, for each cached Group it will store the values for the “id” and “name” fields. However, it doesn’t by default store the contents of the “members” field. If a Group is read from the second level cache and the application needs to access the members field, Hibernate will go to the database to determine the current members of the collection.
If you want Hibernate to cache the contents of the members field, you need to tell it to do so by adding a “cache” element to the “members” declaration:
<hibernate-mapping package="org.example">
    <class name="Group" table="Groups">
        <cache usage="transactional"/>
        <id name="id"><generator class="increment"/></id>
        <property name="name" not-null="true"/>
        <set name="members" cascade="all" lazy="false">
            <!-- Cache the ids of the entities that are members of this collection -->
            <cache usage="transactional"/>
            <one-to-many class="org.example.Member"/>
        </set>
    </class>
</hibernate-mapping>
In a JPA application, the same thing can be accomplished with the @org.hibernate.annotations.Cache annotation on the “members” field:
import java.util.Set;
import javax.persistence.*;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;
@Entity
@Cache(usage = CacheConcurrencyStrategy.TRANSACTIONAL)
public class Group {
    @Id
    private Integer id;
    private String name;
    // Cache the ids of the entities that are members of this collection
    @Cache(usage = CacheConcurrencyStrategy.TRANSACTIONAL)
    @OneToMany(mappedBy = "group", fetch = FetchType.EAGER, cascade = CascadeType.ALL)
    private Set<Member> members;
    // .....
}
What happens behind the scenes when this collection is cached?
Well, first off, what is cached? Hibernate caches only the primary keys of the entities that make up the collection, not the entities themselves; i.e. there is no Set of Member objects stored somewhere in the second level cache.
Next, where is it cached? Exactly where is an implementation detail of the second level cache provider. The key thing is that collections are stored separately from the rest of the data associated with an entity. So, for a Group with an id of “1”, the values of the “id” and “name” fields will be stored in one area of the cache under key “#1”, while the members collection will be stored in a different area under key “members#1”. (Those keys are just examples; the actual keys are not strings.)
What are the caching semantics? Well, the key one is that collections are never updated in the cache; they are only invalidated out of the cache and then potentially cached again later as the result of another database read. So, if an application calls Group.addMember(), Hibernate will remove that group’s membership collection from the cache. If JBoss Cache is the second level cache implementation, that removal will be propagated around the cluster; the collection will be removed from the cache on all nodes in the cluster.
If later the application needs to access the members from that group, another database read will occur and the current set of primary keys for the members will be put into the cache.
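To make the "invalidate, then re-read" behavior concrete, here is a minimal toy model of a collection cache. It is only a sketch of the semantics described above: ToyCollectionCache, its fields, and the string keys are all made up for illustration and are not Hibernate or JBoss Cache APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical class, not a Hibernate or JBoss Cache API.
class ToyCollectionCache {
    // Stand-in for the database: group id -> primary keys of its members.
    final Map<Integer, List<Integer>> database = new HashMap<>();
    // Collection region: holds member primary keys only, never Member objects.
    final Map<String, List<Integer>> collectionRegion = new HashMap<>();
    // Counts how often the "database" had to be consulted.
    int databaseReads = 0;

    // Example key only; real cache keys are not strings.
    static String key(int groupId) {
        return "members#" + groupId;
    }

    // Read path: serve from the cache if present, otherwise read the
    // database and cache the current set of member ids.
    List<Integer> memberIds(int groupId) {
        List<Integer> cached = collectionRegion.get(key(groupId));
        if (cached != null) {
            return cached;
        }
        databaseReads++;
        List<Integer> fresh = Collections.unmodifiableList(
                new ArrayList<>(database.getOrDefault(groupId, new ArrayList<>())));
        collectionRegion.put(key(groupId), fresh);
        return fresh;
    }

    // Write path: change the database and remove the cached collection.
    // The cached entry is never updated in place.
    void addMember(int groupId, int memberId) {
        database.computeIfAbsent(groupId, k -> new ArrayList<>()).add(memberId);
        collectionRegion.remove(key(groupId));
    }
}

class ToyCollectionCacheDemo {
    public static void main(String[] args) {
        ToyCollectionCache cache = new ToyCollectionCache();
        cache.addMember(1, 10);
        System.out.println(cache.memberIds(1)); // database read, then cached
        System.out.println(cache.memberIds(1)); // served from the cache
        cache.addMember(1, 11);                 // invalidates "members#1"
        System.out.println(cache.memberIds(1)); // second database read
        System.out.println("database reads: " + cache.databaseReads);
    }
}
```

The point of the model is the addMember() method: it removes the cached entry rather than patching it, so the next read pays for one database round trip and then repopulates the cache.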
Ok, now how about Sacha’s question from my other post? What happens when a new Member is created and associated with a Group whose members collection is cached? As I stated above, Hibernate doesn’t update the collection in the cache, it just removes it. So, we’d expect the collection to be removed. And it should be, but there is an important subtlety that application developers need to be aware of:
Collections are only invalidated from the cache as a result of an operation on the Java object that represents the collection! Making some other change that alters the database, such that a fresh read of the database would now include the new member in the collection, isn’t sufficient. To be very specific, this code will not result in the now out-of-date cached collection being removed from the cache:
public Member addNewMember(Integer groupId, String memberName, EntityManager em) {
    Group group = em.find(Group.class, groupId);
    Member member = new Member();
    member.setName(memberName);
    // Only the Member side is updated; the group's members collection is never touched
    member.setGroup(group);
    em.persist(member);
    return member;
}
The above method can leave out-of-date data in the cache. The correct implementation is:
public Member addNewMember(Integer groupId, String memberName, EntityManager em) {
    Group group = em.find(Group.class, groupId);
    Member member = new Member();
    member.setName(memberName);
    member.setGroup(group);
    // We need to apprise the group of its new member
    group.addMember(member);
    em.persist(member);
    return member;
}
The “group.addMember(member)” is what modifies the collection, and that’s what triggers Hibernate to remove the old, outdated set of member PKs from the second level cache.
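Why does touching the collection matter? One way to picture it is a collection object that remembers whether it has been modified. The sketch below is a deliberately simplified, hypothetical model of that idea, not Hibernate's actual implementation; TrackedSet, TrackedGroup and TrackedMember are names invented for this example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a collection wrapper that tracks modifications.
class TrackedSet<T> {
    private final Set<T> items = new HashSet<>();
    private boolean dirty = false;

    // Adding an element the set did not already contain marks it dirty.
    boolean add(T item) {
        boolean changed = items.add(item);
        dirty |= changed;
        return changed;
    }

    // Removing an element that was present marks it dirty.
    boolean remove(T item) {
        boolean changed = items.remove(item);
        dirty |= changed;
        return changed;
    }

    // In this toy model, a dirty collection is one whose cached copy
    // would have to be invalidated.
    boolean isDirty() {
        return dirty;
    }
}

class TrackedGroup {
    final TrackedSet<TrackedMember> members = new TrackedSet<>();
}

class TrackedMember {
    TrackedGroup group;

    // Sets only this side of the relationship; group.members is not touched.
    void setGroup(TrackedGroup group) {
        this.group = group;
    }
}

class TrackedSetDemo {
    public static void main(String[] args) {
        TrackedGroup group = new TrackedGroup();
        TrackedMember member = new TrackedMember();

        member.setGroup(group);
        System.out.println("dirty after setGroup: " + group.members.isDirty());

        group.members.add(member);
        System.out.println("dirty after add: " + group.members.isDirty());
    }
}
```

In this model, setGroup() leaves the collection's dirty flag untouched, so nothing would signal that the cached copy is stale; only the call on the collection itself flips the flag.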
Of course, not updating both ends of the relationship in the Java code is bad programming practice anyway. But if you are caching collections, be sure to update those collections when you make changes to their membership at the database level.
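One way to make it harder to forget either end is to let the collection-owning side maintain both ends of the relationship in a single call. The following is a minimal sketch of that pattern in plain Java, without any persistence annotations; SyncedGroup and SyncedMember are hypothetical names used here so as not to be confused with the Group and Member classes above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical classes illustrating a helper that keeps both ends in sync.
class SyncedGroup {
    final Set<SyncedMember> members = new HashSet<>();

    // Adds the member to this group's collection and points the member
    // back at this group, detaching it from any previous group first.
    void addMember(SyncedMember member) {
        if (member.group != null && member.group != this) {
            member.group.members.remove(member);
        }
        member.group = this;
        members.add(member);
    }

    // Removes the member from the collection and clears its back-reference.
    void removeMember(SyncedMember member) {
        if (members.remove(member)) {
            member.group = null;
        }
    }
}

class SyncedMember {
    SyncedGroup group;
    String name;
}

class SyncedGroupDemo {
    public static void main(String[] args) {
        SyncedGroup first = new SyncedGroup();
        SyncedGroup second = new SyncedGroup();
        SyncedMember member = new SyncedMember();

        first.addMember(member);
        second.addMember(member); // moves the member; both collections change

        System.out.println("in first: " + first.members.contains(member));
        System.out.println("in second: " + second.members.contains(member));
        System.out.println("points at second: " + (member.group == second));
    }
}
```

With a helper like this, application code calls addMember() or removeMember() and never sets the back-reference directly, so every membership change goes through the collection.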