Brian Stansberry's Blog

October 14, 2009

Docs, docs, docs

Filed under: Hibernate,JBoss — Brian Stansberry @ 9:49 am

For the past few weeks, a lot of my time has been spent on documentation work, and a fair bit of new stuff is now available. So, without further ado:

JBoss Application Server 5.1 Clustering Guide

The AS 5.1 Clustering Guide is complete and is available at jboss.org. It covers what you need to know to develop, deploy and run clustered applications on JBoss Application Server 5.1. The content in the guide is also correct for AS 5.0.0 and 5.0.1.

Many thanks to Paul Ferraro and Galder Zamarreno for their many contributions to the clustering guide.

Using JBoss Cache as a Hibernate Second Level Cache reference manual

This is the definitive guide on how to use Hibernate’s Second Level Cache feature in a clustered environment. It describes in detail how to use JBoss Cache as your second level cache provider.

There are currently two versions of this document:

  1. Hibernate 3.5 — Covers the integration of JBoss Cache 3 with the upcoming version of Hibernate.
  2. Hibernate 3.3 — Covers the integration of JBoss Cache 2 or 3 with Hibernate 3.3. The Hibernate 3.3 / JBoss Cache 3 combination is what is used in JBoss AS 5.x and JBoss Enterprise Application Platform 5.

Enjoy! And as always, comments, suggestions and edits are most definitely appreciated.

October 9, 2009

Collection Caching in the Hibernate Second Level Cache

Filed under: Hibernate,JBoss — Brian Stansberry @ 5:44 pm

Sacha Labourey asked a good question in response to my recent post on Improvements to Clustered Caching of Frequently Inserted Entity Types. So rather than responding in detail in the comments there, I’ve decided to take the opportunity to put up a separate post and get a little bit deeper into how the Hibernate Second Level Cache works.

The most common use case for the Second Level Cache is to cache entities. However, the second level cache also allows users to cache entity relationship information. Hibernate provides a “collection cache”, where it caches the primary keys of entities that are members of a collection field in another entity type. Say, for example, we have two entity types, Group and Member, where a Member participates in a many-to-one relationship with a Group:

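import java.util.HashSet;
import java.util.Set;

public class Group {
  private Integer id;
  private String name;
  // Initialized so addMember() never hits a null collection
  private Set<Member> members = new HashSet<Member>();

  public void addMember(Member member) {
    members.add(member);
  }

  public void removeMember(Member member) {
    members.remove(member);
  }

  // ... getters and setters omitted
}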

public class Member {
  private Integer id;
  private String name;
  private Group group;

  public Group getGroup() {
    return this.group;
  }

  public void setGroup(Group group) {
    this.group = group;
  }

  // ... other getters and setters omitted
}

If you tell Hibernate to cache Group entities in the second level cache, for each cached Group it will store the values for the “id” and “name” fields. However, it doesn’t by default store the contents of the “members” field. If a Group is read from the second level cache and the application needs to access the members field, Hibernate will go to the database to determine the current members of the collection.

If you want Hibernate to cache the contents of the members field, you need to tell it to do so by adding a “cache” element to the “members” declaration:

<hibernate-mapping package="org.example">
  <class name="Group" table="Groups">
    <cache usage="transactional"/>

    <id name="id"><generator class="increment"/></id>

    <property name="name" not-null="true"/>

    <set name="members" cascade="all" lazy="false">
      <!-- Cache the ids of the entities that are members of this collection -->
      <cache usage="transactional"/>
      <one-to-many class="org.example.Member"/>
    </set>

  </class>
</hibernate-mapping>

In a JPA application, the same thing can be accomplished with the @org.hibernate.annotations.Cache annotation on the “members” field:

import java.util.Set;
import javax.persistence.*;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

@Entity
@Cache(usage=CacheConcurrencyStrategy.TRANSACTIONAL)
public class Group {

  @Id
  private Integer id;
  private String name;

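  // mappedBy names the owning field on the other side, i.e. Member.group
  @Cache(usage=CacheConcurrencyStrategy.TRANSACTIONAL)
  @OneToMany(mappedBy="group", fetch=FetchType.EAGER, cascade=CascadeType.ALL)
  private Set<Member> members;

  // ... getters and setters omitted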
}

What happens behind the scenes when this collection is cached?

Well, first off, what is cached? Hibernate caches the primary keys of the entities that make up the collection. It does not cache the entities themselves as a collection; i.e. there is no Set of Member objects stored somewhere in the second level cache.

Next, where is it cached? Exactly where is an implementation detail of the second level cache provider, but the key point is that collections are stored separately from the rest of the data associated with an entity. So, for a Group with an id of “1”, the values of the “id” and “name” fields will be stored in one area of the cache under key “#1”, while the members collection will be stored in a different area under key “members#1”. (Those keys are just examples; the actual keys are not strings.)
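To make the separation concrete, here is a minimal sketch in plain Java. It is not Hibernate's or JBoss Cache's actual data structure; the class TwoRegionCacheSketch and its method names are hypothetical, and plain maps stand in for the two cache areas. It only illustrates that the entity's simple fields and the collection's member ids live under separate entries, and that only primary keys are kept for the collection.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a second level cache; not a real Hibernate class.
class TwoRegionCacheSketch {
    // Entity area: Group primary key -> simple field values (just "name" here).
    private final Map<Integer, String> groupRegion = new HashMap<>();
    // Collection area: Group primary key -> primary keys of its Members.
    private final Map<Integer, Set<Integer>> membersRegion = new HashMap<>();

    void cacheGroup(int groupId, String name) {
        groupRegion.put(groupId, name);
    }

    void cacheMembers(int groupId, Set<Integer> memberIds) {
        // Copy the ids so later changes by the caller do not leak into the cache.
        membersRegion.put(groupId, new LinkedHashSet<>(memberIds));
    }

    boolean hasGroup(int groupId) {
        return groupRegion.containsKey(groupId);
    }

    boolean hasMembers(int groupId) {
        return membersRegion.containsKey(groupId);
    }

    // Returns the cached member ids, or null if the collection is not cached.
    Set<Integer> memberIds(int groupId) {
        return membersRegion.get(groupId);
    }
}
```

Caching a Group through cacheGroup() leaves hasMembers() false for that id, which mirrors the default behaviour described earlier: the entity can be cached while its collection is not.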

What are the caching semantics? Well, the key one is that collections are never updated in the cache; they are only invalidated out of the cache and then potentially cached again later as the result of another database read. So, if an application calls addMember() on a Group, Hibernate will remove that group’s membership collection from the cache. If JBoss Cache is the second level cache implementation, that removal will be propagated around the cluster; the collection will be removed from the cache on all nodes in the cluster.

If later the application needs to access the members from that group, another database read will occur and the current set of primary keys for the members will be put into the cache.
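The invalidate-then-reload cycle can be sketched in a few lines of plain Java. This is a toy model under stated assumptions, not Hibernate code: InvalidatingCollectionCacheSketch is a hypothetical class, one map plays the database and another plays the collection area of the cache, and a counter records how often the "database" is read.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model of the collection cache semantics; not a real Hibernate class.
class InvalidatingCollectionCacheSketch {
    // Stand-in for the database: group id -> member primary keys.
    private final Map<Integer, Set<Integer>> database = new HashMap<>();
    // Stand-in for the collection area of the second level cache.
    private final Map<Integer, Set<Integer>> cache = new HashMap<>();
    int databaseReads = 0;

    // Read path: serve from the cache, or read the database and cache the result.
    Set<Integer> memberIds(int groupId) {
        Set<Integer> cached = cache.get(groupId);
        if (cached == null) {
            databaseReads++;
            Set<Integer> stored = database.get(groupId);
            cached = (stored == null) ? new TreeSet<>() : new TreeSet<>(stored);
            cache.put(groupId, cached);
        }
        return new TreeSet<>(cached);
    }

    // Write path: change the database, then evict (never update) the cached collection.
    void addMember(int groupId, int memberId) {
        Set<Integer> stored = database.get(groupId);
        if (stored == null) {
            stored = new TreeSet<>();
            database.put(groupId, stored);
        }
        stored.add(memberId);
        cache.remove(groupId);
    }

    boolean isCached(int groupId) {
        return cache.containsKey(groupId);
    }
}
```

Calling memberIds() twice in a row costs one database read; calling addMember() in between costs two, because the cached set is thrown away rather than patched.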

Ok, now how about Sacha’s question from my other post? What happens when a new Member is created and associated with a Group whose members collection is cached? As I stated above, Hibernate doesn’t update the collection in the cache, it just removes it. So, we’d expect the collection to be removed. And it should be, but there is an important subtlety that application developers need to be aware of:

Collections are only invalidated from the cache as a result of an operation on the Java object that represents the collection! Performing some Java operation that results in a change in the database whereby a fresh read of the database would add the member to the collection isn’t sufficient. To be very specific, this code will not result in the now out-of-date cached collection being removed from the cache:

  public Member addNewMember(Integer groupId, String memberName, EntityManager em) {
    Group group = em.find(Group.class, groupId);
    Member member = new Member();
    member.setName(memberName);
    member.setGroup(group);
    em.persist(member);

    return member;
  }

The above method can leave out-of-date data in the cache. The correct implementation is:

  public Member addNewMember(Integer groupId, String memberName, EntityManager em) {
    Group group = em.find(Group.class, groupId);
    Member member = new Member();
    member.setName(memberName);
    member.setGroup(group);

    // We need to apprise the group of its new member
    group.addMember(member);

    em.persist(member);

    return member;
  }

The “group.addMember(member)” is what modifies the collection, and that’s what triggers Hibernate to remove the old, outdated set of member PKs from the second level cache.

Of course, not updating both ends of the relationship in the Java code is bad programming practice anyway. But if you are caching collections, be sure to update those collections when you make changes to their membership at the database level.
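One way to make that mistake harder is to have the setter on the "many" side maintain the collection on the "one" side, so callers cannot update one end without the other. The following is a plain Java sketch only; SyncedGroup and SyncedMember are hypothetical names, there are no Hibernate mappings here, and I have not verified how this pattern interacts with any particular access strategy. In a mapped entity the same idea would route the change through the mapped collection, which is the operation described above as triggering the eviction.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical "one" side of the relationship; plain Java, no Hibernate mappings.
class SyncedGroup {
    private final Set<SyncedMember> members = new HashSet<>();

    // Package-private on purpose: only SyncedMember.setGroup() should call these.
    void internalAdd(SyncedMember member) {
        members.add(member);
    }

    void internalRemove(SyncedMember member) {
        members.remove(member);
    }

    boolean contains(SyncedMember member) {
        return members.contains(member);
    }
}

// Hypothetical "many" side; its setter keeps both ends of the relationship in step.
class SyncedMember {
    private SyncedGroup group;

    SyncedGroup getGroup() {
        return group;
    }

    void setGroup(SyncedGroup newGroup) {
        if (group == newGroup) {
            return; // nothing to change
        }
        if (group != null) {
            group.internalRemove(this); // leave the old group's collection
        }
        group = newGroup;
        if (newGroup != null) {
            newGroup.internalAdd(this); // join the new group's collection
        }
    }
}
```

With this shape, a single setGroup() call always touches the collection as well, so there is no separate addMember() call to forget.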

October 8, 2009

Improvements to Clustered Caching of Frequently Inserted Entity Types

Filed under: Hibernate,JBoss — Brian Stansberry @ 10:07 pm

I made a very simple change today to Hibernate’s integration with JBoss Cache that should have big benefits. Hibernate integrates with JBoss Cache to allow second-level caching of entities, collections and query results in a clustered environment. I often advise people to be cautious about what types of entities they cache in a cluster. A clustered cache differs from a single node cache in that it needs to maintain consistency around the cluster. This means sending messages around the cluster when cache contents change. For entity types with a relatively high percentage of cache writes, the cost of these messages can outweigh the benefits of caching.

For entity caches, by default JBoss Cache is configured to send invalidation messages around the cluster when its contents change. Well, I realized that sending an invalidation message around the cluster when Hibernate has just inserted a newly created entity into the cache is just silly. The entity is brand new; there’s no way another node in the cluster could have a stale version of the entity that needs to be invalidated out of its local cache. Fortunately, the excellent RegionFactory SPI Steve Ebersole introduced in Hibernate 3.3 gives me all the contextual information I need to know that what’s being cached is a newly created entity. And JBoss Cache’s Option.setCacheModeLocal(true) API gives me the power to disable sending out the invalidation message when I put those newly created entities into JBC. Result: with the addition of a few lines of code I can remove these unnecessary messages.
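The idea can be illustrated with a toy model in plain Java. This is not the actual change, which lives in the Hibernate/JBoss Cache integration and uses the RegionFactory SPI together with Option.setCacheModeLocal(true); InsertAwareCacheSketch and its methods are hypothetical, and a simple counter stands in for invalidation messages sent around the cluster.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of one cluster node's entity cache; not real integration code.
class InsertAwareCacheSketch {
    private final Map<Integer, String> localCache = new HashMap<>();
    // Counts the invalidation messages this node would send to the rest of the cluster.
    int invalidationMessagesSent = 0;

    // Newly inserted entity: no other node can hold a stale copy, so stay local.
    void putAfterInsert(int id, String state) {
        localCache.put(id, state);
    }

    // Updated entity: other nodes may hold the old state, so tell them to evict it.
    void putAfterUpdate(int id, String state) {
        localCache.put(id, state);
        invalidationMessagesSent++;
    }

    String get(int id) {
        return localCache.get(id);
    }
}
```

In this model three inserts followed by one update produce a single message instead of four, which is the saving described above for insert-heavy entity types.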

What’s the benefit? Basically, a whole new category of entity types can now benefit from caching in a cluster: types that may have a fairly high percentage of cache writes relative to reads, but where those writes represent a database INSERT rather than an UPDATE. Imagine, for example, a purchasing application where user activity generates lots of Order and OrderLineItem entity inserts. Once those entities are created, they are unlikely to be changed, but in the course of a user’s interaction with the application there is a high enough likelihood that they will look at the order details again to make caching the entities worthwhile. Prior to today’s change, caching Order and OrderLineItem may not have been performant. Now, if reads of those entities are frequent enough to make caching them worthwhile in a non-clustered environment, it’s likely to be worthwhile in a cluster as well.

As always, load test your application with realistic usage scenarios before and after turning on caching of any entity type, collection or query result.

This change is tracked in Hibernate’s JIRA as HHH-4484. The improved behavior will be available in the Hibernate Core 3.5 release.
