The Bag

Here is my usual explanation of the JPA notion of a persistence context.  I have used this analogy with great success before, so I finally decided to write it down.  For total accuracy in all edge cases, you’ll want to read the JPA specification.  Of course, if you want to do that, you’re not the target audience here anyway, so off you go.

I will restrict myself to talking about the most common usage of JPA—inside a stateless session bean with transactional semantics.

So then, here goes.

First, let’s make a distinction between a normal object and an entity.

A normal object is just a Java object in your hand.  It might have JPA annotations on it or be governed by a JPA deployment descriptor, but the point is from holding it you can’t tell whether it is somehow hooked up to the JPA machinery or not.  (In JPA parlance a normal object is either a “new entity instance” or a “detached entity”.)

An entity on the other hand (a “managed entity” in JPA parlance) is a normal object that is managed by an EntityManager.  “Managed” means “snooped on by”, or “known to”, or “watched”, or “surrounded by”, or “tracked” or any of a whole host of other metaphors that might work for you.

(The JPA specification calls normal objects “new entity instances” and “detached entities”, where “detached” means, simply, not snooped on by an EntityManager.  For this little post I’ll just call them normal objects and will try not to use the word entity in order to emphasize the fact that when they aren’t being snooped on by an EntityManager they are just like any other Java object.)

So how does an EntityManager snoop on or manage a normal object?  It doesn’t, actually.  Instead, the EntityManager keeps track of a bag of entities called the persistence context.  If a normal object gets into the bag, it thereby becomes a managed entity.

So by definition a persistence context is a notional bag of JPA entities.  At the end of a (successful, about-to-be-committed) transaction, whatever is in this bag will be flushed to the backing database, assuming its total state is not the same as that of its database representation, regardless of how it got in the bag.
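To make the bag idea concrete, here is a minimal sketch in plain Java.  It is a toy model, not JPA: the Person and ToyBag classes are hypothetical names made up for illustration, and the only point is that "managed" is a fact about the bag rather than about the object you are holding.  (The real EntityManager does have a contains() method that answers the same question.)

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: Person and ToyBag are hypothetical, not part of JPA.
class Person {
    final long id;   // stands in for the persistent identifier
    String name;

    Person(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

class ToyBag {
    // One managed instance per identity (here simply the id).
    private final Map<Long, Person> managed = new HashMap<>();

    void put(Person p) {
        managed.put(p.id, p);
    }

    // Being managed is a fact about the bag, not about the object,
    // so membership is checked by reference, not by comparing state.
    boolean contains(Person p) {
        return managed.get(p.id) == p;
    }
}

class ToyBagDemo {
    public static void main(String[] args) {
        ToyBag bag = new ToyBag();
        Person inBag = new Person(1L, "Ada");
        Person inHand = new Person(1L, "Ada"); // same id and state, different object

        bag.put(inBag);

        System.out.println(bag.contains(inBag));  // prints true
        System.out.println(bag.contains(inHand)); // prints false
    }
}
```

Two objects with identical fields behave differently here only because one of them is in the bag, which is exactly the distinction between an entity and a normal object.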

So how does something get into the bag?  You can get an entity into the bag in these ways:

  • You call some EntityManager retrieval method, like find.  The object that is returned is silently put into the bag for you and is therefore an entity.  As an exercise, see if you can predict what will happen at transaction commit time if you call a setter on that entity after you get it back from find.
  • You create a query of some kind and run it, and it returns an object or a List of objects.  These objects will be silently placed into the bag and are therefore managed entities.
  • You call persist(), in which case the object you supply must not already have an analog in the bag.  If indeed there is no matching entity in the bag (we’ll talk about matching in a moment), then your object becomes known to the EntityManager as a managed entity.
  • You call merge(), passing in a normal object or an entity, in which case various complicated things will happen—more on that in a moment—and an entity will be returned by the method, which you must use in place of your original object.  (Note that it follows that calling merge() on a brand new normal object will accomplish almost the same thing as persist(): the object it hands you back will be an entity that is now in the bag that wasn’t there before, so at commit time it will be flushed to the database.)
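The first and third routes into the bag can be sketched with another toy model.  Everything here is an assumption made for illustration: the Book class, the FindPersistBag class, and the in-memory map standing in for a database table are all hypothetical, and the exception type is a placeholder (the real API reports duplicates with its own exception type, possibly only at flush time).

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; the real calls are EntityManager.find and EntityManager.persist.
class Book {
    final long id;
    String title;

    Book(long id, String title) {
        this.id = id;
        this.title = title;
    }
}

class FindPersistBag {
    final Map<Long, String> database = new HashMap<>(); // stand-in table: id -> title
    private final Map<Long, Book> managed = new HashMap<>();

    // Like find: return the managed instance, loading it into the bag on first access.
    Book find(long id) {
        Book inBag = managed.get(id);
        if (inBag != null) {
            return inBag;
        }
        String title = database.get(id);
        if (title == null) {
            return null; // no such row
        }
        Book loaded = new Book(id, title);
        managed.put(id, loaded); // silently placed in the bag
        return loaded;
    }

    // Like persist: the very object you pass in becomes the managed one.
    void persist(Book b) {
        Book existing = managed.get(b.id);
        if (existing != null && existing != b) {
            // Placeholder exception type for this sketch.
            throw new IllegalStateException("identity already in the bag: " + b.id);
        }
        managed.put(b.id, b);
    }

    boolean contains(Book b) {
        return managed.get(b.id) == b;
    }
}

class FindPersistDemo {
    public static void main(String[] args) {
        FindPersistBag bag = new FindPersistBag();
        bag.database.put(1L, "Dune");

        Book first = bag.find(1L);
        Book second = bag.find(1L);
        System.out.println(first == second);     // prints true

        Book fresh = new Book(2L, "Emma");
        System.out.println(bag.contains(fresh)); // prints false
        bag.persist(fresh);
        System.out.println(bag.contains(fresh)); // prints true
    }
}
```

Note the difference in what you hold afterwards: find hands you an object the bag created, while persist adopts the object you already had.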

Merging is the most complicated thing here so we’ll spend some time on it.

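The most important takeaway, first, is that merge() does not cause anything at all to be written to the database.  (The provider may issue a SELECT to load the current state of the entity into the bag, but that is a read, not a write.)  It is not a save operation or an update operation.  It is not like a Save menu item.  At all.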

When you merge an object into the bag that the bag has never seen before, a new instance of that object is created in the bag, and the whole state of the object you supplied is copied onto it, replacing any default state it might have had as a result of its default constructor being called.  This newly created and populated entity is returned to you.  You throw away the object you passed.  (“Merge” is a horrible, horrible word to describe what is happening here, as nothing is actually being merged in any normal sense of the word.  The method should have been called something akin to becomeAwareOf or track or monitor or manage.  How interesting that the EntityManager object does not have a manage method!  For that matter, there should be a PersistenceContext object, which would make things a lot simpler, but I digress.)

If you merge an object into the bag, and the bag already has an entity in it with the same persistent identifier and type as your object (but different state), then the state of your incoming object replaces the state of the entity in the bag.  Once again, “merge” is the wrong word: nothing is being merged, but something is being overwritten or replaced!  Note in particular that “merge” here does not mean that any kind of partial operation happens—in all cases, the full state of your object always overwrites whatever state was present on a managed entity in the bag.  Once again, at the termination of all this, you need to effectively discard the object you passed in and use the entity that was returned to you.

If you somehow get your hands on an entity and then merge it again (i.e. it’s in the bag already and has never left the bag and you were just working with it and then you called merge() for no good reason) then nothing happens—it was already in the bag and therefore being tracked so there’s nothing more to be done.
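The three merge cases above can be sketched as a toy model.  This is not the real implementation: Customer and MergeBag are hypothetical names, there is no database behind the bag, and the single name field stands in for "the full state of your object".

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: a sketch of the merge cases described above.
class Customer {
    final long id;
    String name;

    Customer(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

class MergeBag {
    private final Map<Long, Customer> managed = new HashMap<>();

    Customer merge(Customer incoming) {
        Customer inBag = managed.get(incoming.id);
        if (inBag == incoming) {
            return incoming;               // already in the bag: nothing to do
        }
        if (inBag == null) {
            // Never seen before: a new managed instance is created in the bag.
            inBag = new Customer(incoming.id, null);
            managed.put(inBag.id, inBag);
        }
        inBag.name = incoming.name;        // full overwrite, no partial merging
        return inBag;                      // use this, not the object you passed
    }

    boolean contains(Customer c) {
        return managed.get(c.id) == c;
    }
}

class MergeDemo {
    public static void main(String[] args) {
        MergeBag bag = new MergeBag();

        Customer inHand = new Customer(7L, "Grace");
        Customer managed = bag.merge(inHand);
        System.out.println(managed == inHand);          // prints false
        System.out.println(bag.contains(inHand));       // prints false

        Customer edited = new Customer(7L, "Grace Hopper");
        System.out.println(bag.merge(edited) == managed); // prints true
        System.out.println(managed.name);               // prints Grace Hopper

        System.out.println(bag.merge(managed) == managed); // prints true
    }
}
```

In every case the object to keep working with is the return value; the argument is only a source of state.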

If your normal object refers to other normal objects, and you call merge(), then your normal object is merged as I’ve described above.  Apparently, whether you have cascade settings set or not, the normal objects it refers to via @ManyToOne, @OneToOne, @OneToMany and @ManyToMany annotations end up in the bag as well: everything behaves as though each of these normal objects has been cloned into the bag, and as though all references to these normal objects have been replaced with references to their newly minted entity clones.  (Strictly speaking, the specification only promises the state copy for associations marked cascade=MERGE or cascade=ALL; for the rest it promises only that the returned graph refers to managed entities with the same persistent identity, so don’t rely on state being copied across non-cascaded associations.)  This is sort of like a poor man’s cascade, and can’t be turned off; nor would you want it to be turned off: if your object graph, in other words, consists solely of normal objects governed by JPA annotations, then a merge on the root of the graph will conveniently shove them all in the bag, and the returned graph, which you must use in place of the graph you passed in, will consist entirely of entities.

Think hard about that: if you have a parent entity and a reference in your code to one of its children, then after merging the parent, the reference you have to the child is probably not what you want to work with.  You’ll be looking at the (now stale, untracked, un-snooped-on, never-flushed-to-disk) old reference, and any changes you make to it won’t get shoved to the database at transaction commit time.  You’ll want to reacquire a reference to that child, or, better yet, do your merge() early and then get a reference to the child after the merge has returned you a managed entity graph.
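The stale-reference trap is easy to reproduce in a toy model.  The sketch below assumes the parent-to-child association cascades the merge; Parent, Child and GraphBag are hypothetical names, and the real provider does considerably more than this.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: merging a parent clones its children into the bag too.
class Child {
    final long id;
    String name;

    Child(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

class Parent {
    final long id;
    final List<Child> children = new ArrayList<>();

    Parent(long id) {
        this.id = id;
    }
}

class GraphBag {
    private final Map<Long, Parent> parents = new HashMap<>();
    private final Map<Long, Child> kids = new HashMap<>();

    Parent merge(Parent incoming) {
        Parent inBag = parents.get(incoming.id);
        if (inBag == incoming) {
            return incoming;
        }
        if (inBag == null) {
            inBag = new Parent(incoming.id);
            parents.put(inBag.id, inBag);
        }
        inBag.children.clear();
        for (Child c : incoming.children) {
            inBag.children.add(merge(c)); // references now point at managed copies
        }
        return inBag;
    }

    Child merge(Child incoming) {
        Child inBag = kids.get(incoming.id);
        if (inBag == incoming) {
            return incoming;
        }
        if (inBag == null) {
            inBag = new Child(incoming.id, null);
            kids.put(inBag.id, inBag);
        }
        inBag.name = incoming.name;
        return inBag;
    }

    boolean contains(Child c) {
        return kids.get(c.id) == c;
    }
}

class StaleChildDemo {
    public static void main(String[] args) {
        GraphBag bag = new GraphBag();
        Parent inHand = new Parent(1L);
        inHand.children.add(new Child(10L, "Tim"));
        Child stale = inHand.children.get(0);   // reference taken before the merge

        Parent managed = bag.merge(inHand);
        Child tracked = managed.children.get(0); // reference taken after the merge

        stale.name = "Tom";                      // edits the untracked object
        System.out.println(tracked.name);        // prints Tim
        System.out.println(bag.contains(stale)); // prints false
    }
}
```

The edit to the stale reference never reaches the managed copy, so nothing the bag knows about has changed.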

If your normal object also refers to entities that are, by definition, already in the bag, and you call merge(), and assuming you haven’t done anything with cascade settings, then the bare minimum is done to ensure that everything referenced by JPA annotations or deployment descriptor bits turns into an entity.  So any combination of normal objects and entities linked together into a graph will become a graph of managed entities when the root of the graph is merged.  But no state copying goes on for those references.  This is probably the one part of the specification that says things about as clearly as is possible, and as you’ll see, it isn’t very clear.  Remember that the spec uses “entity” to mean both “managed entity” and “detached entity” (normal object):

If X is an entity merged to X’, with a reference to another entity Y, where cascade=MERGE or cascade=ALL is not specified, then navigation of the same association from X’ yields a reference to a managed object Y’ with the same persistent identity as Y.

The upshot is that if you put a graph into the bag, all the objects it references of any kind get put into the bag as well.

So now you’ve got entities in the bag.

If you make any change to an entity that is in the bag—even one that got in there as the result of a query or a find operation—then at transaction commit time it will be flushed back to the database.

Commit time in most cases happens automatically in the scenarios I’m talking about—your EJB’s transactional method will cause a flush() to happen, followed by a transaction commit() at the end of the method.

flush() takes all the entities in the bag and compares their state with their state as it exists in their database representation.  (In practice, providers typically compare against a snapshot taken when the entity entered the bag rather than re-reading the row.)  Any differences are encoded in either INSERT or UPDATE statements, plus DELETE statements for entities you have removed.  So you can see that if you do a query, and then call a setter on one of the entities you get back, your harmless-looking query operation will end up resulting in an UPDATE statement behind your back.
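Here is a minimal sketch of that comparison, assuming a snapshot-based dirty check.  It is a toy: Account and FlushingBag are hypothetical, the "SQL" is just a descriptive string, and real providers differ in how they detect changes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Toy model only: flush compares each managed object with its last known state.
class Account {
    final long id;
    String owner;

    Account(long id, String owner) {
        this.id = id;
        this.owner = owner;
    }
}

class FlushingBag {
    private final Map<Long, Account> managed = new LinkedHashMap<>();
    // Last state known to be in the database; no entry means no row yet.
    private final Map<Long, String> snapshot = new LinkedHashMap<>();

    // As if the object came back from a query: it has a row already.
    void load(Account a) {
        managed.put(a.id, a);
        snapshot.put(a.id, a.owner);
    }

    // As if persist() was called on a brand new object: no row yet.
    void persist(Account a) {
        managed.put(a.id, a);
    }

    // Returns the statements a flush would issue, then refreshes the snapshot.
    List<String> flush() {
        List<String> statements = new ArrayList<>();
        for (Account a : managed.values()) {
            if (!snapshot.containsKey(a.id)) {
                statements.add("INSERT account " + a.id);
            } else if (!Objects.equals(snapshot.get(a.id), a.owner)) {
                statements.add("UPDATE account " + a.id);
            }
            snapshot.put(a.id, a.owner);
        }
        return statements;
    }
}

class FlushDemo {
    public static void main(String[] args) {
        FlushingBag bag = new FlushingBag();
        Account queried = new Account(1L, "Ada");
        bag.load(queried);
        System.out.println(bag.flush()); // prints []

        queried.owner = "Ada L.";        // a harmless-looking setter call
        System.out.println(bag.flush()); // prints [UPDATE account 1]

        bag.persist(new Account(2L, "Alan"));
        System.out.println(bag.flush()); // prints [INSERT account 2]
    }
}
```

Nobody called anything named save or update here; the setter alone was enough to produce the UPDATE.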

Sometimes you need to take entities out of the bag.  This is known as detaching them, turning them into normal objects (“detached entities”).  In my opinion the notion of a persistence context should have been made more explicit in the API.  Unfortunately, as the API exists, detach() is a method on EntityManager, and takes an entity you wish to detach.  This is unfortunate because the specification does not indicate that an entity is somehow attached to an EntityManager—it constantly talks about entities being “in” a persistence context which in turn is managed by an EntityManager. One of the frustrations of the JPA API is the mixing together of all of these concepts (merging, attaching, detaching, adding to the persistence context, removing from the persistence context, managing and unmanaging entities). Anyhow, whenever you see detach(), know that this means “remove the supplied entity from the bag”.

You can also empty the whole bag by calling EntityManager#clear().
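In toy form, detaching and clearing are just removals from the bag.  As before, this is a sketch with hypothetical names (Note, DetachBag, manage); only detach() and clear() correspond to methods on the real EntityManager.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: detach removes one object from the bag, clear empties it.
class Note {
    final long id;
    String text;

    Note(long id, String text) {
        this.id = id;
        this.text = text;
    }
}

class DetachBag {
    private final Map<Long, Note> managed = new HashMap<>();

    void manage(Note n) {
        managed.put(n.id, n);
    }

    // Remove the supplied entity from the bag; it is a normal object again.
    void detach(Note n) {
        if (managed.get(n.id) == n) {
            managed.remove(n.id);
        }
    }

    // Empty the whole bag in one go.
    void clear() {
        managed.clear();
    }

    boolean contains(Note n) {
        return managed.get(n.id) == n;
    }

    int size() {
        return managed.size();
    }
}

class DetachDemo {
    public static void main(String[] args) {
        DetachBag bag = new DetachBag();
        Note a = new Note(1L, "first");
        Note b = new Note(2L, "second");
        Note c = new Note(3L, "third");
        bag.manage(a);
        bag.manage(b);
        bag.manage(c);

        bag.detach(a);
        System.out.println(bag.contains(a)); // prints false
        System.out.println(bag.size());      // prints 2

        bag.clear();
        System.out.println(bag.size());      // prints 0
    }
}
```

Once an object is out of the bag, changes to it are invisible to any later flush, which is the whole point of detaching.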

Taking entities out of the bag is not some sort of advanced operation best left to experts. It’s often exactly what you want to do. Managing entities takes a lot of work. Keeping the number of things in the bag to a bare minimum at all times is a good thing.

I hope this article helps you out in your work with JPA. Please feel free to leave comments and tell me where this analogy could be improved.