
The OpenJDK Revised Java Memory Model


The existing Java Memory Model covers a great deal of the Java language’s semantic guarantees. In this article we will highlight a few of those semantics and provide a deeper understanding of them. We will also explain the motivation for an update to the existing Java Memory Model (JMM) with respect to the semantics described here. That future update will be referred to in this article as JMM9.

Java Memory Model

The existing Java Memory Model, as defined in JSR 133 (henceforth referred to as JMM-JSR133), specifies a consistency model for shared memory and provides definitions that developers can rely on when reasoning about concurrent programs. The goal of the JMM-JSR133 specification was to refine the semantics of threads interacting through memory so as to permit optimizations while still providing a clear programming model. JMM-JSR133 aimed at providing definitions and semantics such that multithreaded programs would be not only correct but also performant, with minimal impact on existing code bases.

With this in mind, I would like to walk through certain semantic guarantees that were either overspecified or underspecified in JMM-JSR133, while highlighting the community-wide discussion related to how we can improve them in JMM9.

JMM9 - The Sequential Consistency - Data-Race-Free (SC for DRF) Problem

JMM-JSR133 talks about the execution of a program in terms of actions. Such an execution combines actions with orders that describe the relationships among those actions. In this article I would like to expand on a few of those orders and relationships and then discuss what constitutes a sequentially consistent execution.

Let’s start with “program order”: the program order for each thread is a total order that indicates the order in which all actions are performed by that thread. Not all actions need to be totally ordered across threads, however, so some relationships are only partial orders. “Happens-before” and “synchronized-with” are two such partially ordered relationships. When one action happens-before another action, the first action is not only visible to the second action, it is also ordered before it. This relationship between the two actions is called a happens-before relationship.

Sometimes there are special actions that need to be ordered; these are called “synchronization actions”. Volatile reads and writes, monitor locks and unlocks, and so on are all examples of synchronization actions. A synchronization action induces a “synchronized-with” relationship with other actions. The synchronized-with relationship is a partial order, which means that not all pairs of synchronization actions are included. The total order that spans all the synchronization actions is called the “synchronization order”, and every execution has a synchronization order.
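As a concrete illustration, a happens-before edge created by a monitor unlock/lock pair can be sketched as follows (the class and field names here are illustrative, not taken from the specification):

```java
public class MonitorHB {
    private static final Object lock = new Object();
    private static int shared = 0; // plain field, guarded by 'lock'

    public static int demo() {
        Thread writer = new Thread(() -> {
            synchronized (lock) { // monitor lock: a synchronization action
                shared = 42;      // ordered before the unlock in program order
            }                     // monitor unlock: synchronizes-with a later lock
        });
        writer.start();
        try {
            writer.join();        // join() also establishes a happens-before edge
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        synchronized (lock) {     // this lock synchronizes-with the earlier unlock
            return shared;        // guaranteed to read 42
        }
    }
}
```

The unlock in the writer thread and the lock in the reading thread form a synchronized-with pair; combined with each thread’s program order, this yields the happens-before edge that makes the write of 42 visible.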

Let’s now talk about sequentially consistent execution. An execution that appears to occur in a total order over all of its read and write actions is said to be sequentially consistent (SC). In an SC execution, a read always sees the value written by the last write to that particular variable. A “data race” occurs when a program has two accesses to the same variable that are not ordered by the happens-before relationship and at least one of them is a write. When the SC executions of a program exhibit no data races, the program is said to be data-race free (DRF), and “SC for DRF” means that DRF programs behave as if they were SC. But strictly enforcing SC comes at a performance cost: most systems reorder memory operations to improve execution speed while “hiding” the latency of expensive operations, and compilers may likewise reorder code to optimize execution. To guarantee strict sequential consistency, all such reordering of memory operations and code optimization would have to be forbidden, and performance would suffer. JMM-JSR133 therefore incorporates relaxed ordering restrictions; any reordering by the underlying compiler, the cache-memory interactions, or the JIT itself is not observable to a correctly synchronized program.
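To make the data-race definition concrete, here is a sketch (class and field names are mine, not from the specification) of two threads performing conflicting, unordered accesses to one variable, alongside a race-free alternative:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DrfCounter {
    // 'racy++' from two threads is a data race: two conflicting accesses,
    // unordered by happens-before, at least one of which is a write.
    static int racy = 0;
    // Atomic increments are synchronization actions, so there is no race here.
    static final AtomicInteger raceFree = new AtomicInteger();

    public static int demo() {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                racy++;                     // increments may be lost
                raceFree.incrementAndGet(); // never loses an increment
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start(); t2.start();
        try {
            t1.join(); t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // 'racy' may end up anywhere up to 200_000; 'raceFree' is always 200_000.
        return raceFree.get();
    }
}
```

The racy counter illustrates why DRF matters: only the race-free counter has a result the model is obliged to guarantee.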

Note: Expensive operations are the ones that take a lot of CPU cycles to complete and/or block an execution pipeline.

Performance is an important consideration for JMM9; moreover, any programming-language memory model should ideally allow developers to take advantage of weakly ordered architectural memory models. There are successful implementations and examples that relax strict ordering, especially on weakly ordered architectures.

Note: Weak ordering refers to architectures that can reorder reads and writes and would need explicit memory barriers to curb such reordering.

JMM9 - Out-of-Thin-Air (OoTA) Problem

Another major JMM-JSR133 semantic is the prohibition of “out-of-thin-air” (OoTA) values. The “happens-before” model by itself can allow variable values to be created and read “out of thin air”, since it doesn’t include causality requirements. An important point to note is that causality by itself doesn’t employ the notions of data and control dependencies, as we will see in the following correctly synchronized code example, where the illegal writes are caused by the writes themselves.

(Note: x and y are initialized to ‘0’)

Thread a                Thread b
r1 = x;                 r2 = y;
if (r1 != 0)            if (r2 != 0)
    y = 42;                 x = 42;

This code is happens-before consistent but not sequentially consistent. For example, if r1 sees the write of x = 42 and r2 sees the write of y = 42, then both x and y can end up with the value 42, which is the result of a data race. Now consider the same two threads with the writes committed unconditionally:

Thread a                Thread b
r1 = x;                 r2 = y;
y = 42;                 x = 42;

Here, both writes were committed before the reads of the variables they assign; each read then sees the other thread’s write, which leads to the OoTA result.

Note: The data race may happen as a result of speculation that eventually turns into a self-fulfilling prophecy. OoTA guarantees are about adhering to causality rules, and the current thinking is that causality can break with speculative writes. JMM9 aims at pinpointing the causes of OoTA values and refining the ways to avoid them.

In order to prohibit OoTA values, some writes need to wait for their reads so as to avoid data races. Hence, the JMM-JSR133 definition of OoTA prohibition formalized the disallowance of OoTA reads. This formal definition consists of the “executions and causality requirements” of the memory model: basically, a well-formed execution satisfies the causality requirements if all of the program’s actions can be committed.

Note: A well-formed execution is one that is intra-thread consistent, “happens-before” consistent, and “synchronization-order” consistent, and in which every read sees a write to the same variable.

As you can probably tell by now, the JMM-JSR133 definitions were tightened so as not to let OoTA values creep in. JMM9 aims to identify and correct the formal definition so that it allows for some common optimizations.

JMM9 - Volatile Actions On Non-Volatile Variables

First, what is the ‘volatile’ keyword? Java’s ‘volatile’ guarantees an interaction between threads such that when one thread writes to a volatile variable, not only is that write visible to other threads, but the other threads also see all the writes that were visible to the writing thread at the time of the volatile write.

Now, what happens with non-volatile variables? Non-volatile variables don’t benefit from this interaction guarantee. Hence, the compiler can use a cached value of a non-volatile variable, whereas a ‘volatile’ variable is guaranteed to always be read from memory. The happens-before model can, however, be used to tie synchronized access to non-volatile variables.
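The cached-value problem can be sketched with the classic stop-flag pattern (the class and field names below are hypothetical): without ‘volatile’ on the flag, the JIT may legally hoist the read out of the loop and spin forever; with ‘volatile’, the writer’s update is guaranteed to become visible.

```java
public class StopFlag {
    // Without 'volatile', the worker loop below could legally spin forever
    // on a cached value of this field.
    private volatile boolean running = true;
    private int iterations = 0; // non-volatile; safely read only after join()

    public int runUntilStopped() {
        Thread worker = new Thread(() -> {
            while (running) {   // volatile read: never cached across iterations
                iterations++;
            }
        });
        worker.start();
        try {
            Thread.sleep(10);
            running = false;    // volatile write: visible to the worker's next read
            worker.join();      // happens-before edge covering 'iterations'
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return iterations;
    }
}
```

Note that the non-volatile ‘iterations’ field is read safely here only because join() establishes the necessary happens-before edge.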

Note: Declaring a field ‘volatile’ doesn’t mean that locking is involved; hence volatile accesses are less expensive than synchronization that uses locks. But it’s important to note that having multiple volatile fields inside your methods could make them more expensive than simply locking those methods.

JMM9 - The Read And Write Atomicity Problem And The Word-Tearing Problem

JMM-JSR133 also guaranteed (with exceptions) read and write atomicity for shared-memory concurrent algorithms. The exceptions were non-volatile long and double values, where a write to either could be treated as two separate writes. Thus a single 64-bit value could be written by two separate 32-bit writes, and a thread performing a read while one of those writes is still outstanding may see only half of the correct value, losing atomicity. This is an example of how the atomicity guarantee depends on the underlying hardware and the memory subsystem. For instance, the underlying assembly instructions must be able to handle the size of the operands in order to guarantee atomicity; if a read or write has to be split into more than one operation, atomicity is broken (as is the case for non-volatile long and double values). Similarly, if the implementation causes more than one memory-subsystem transaction, atomicity is broken.

Note: Volatile long and double fields, as well as references, are always guaranteed read and write atomicity.
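The difference can be sketched with a stress test (the class name is hypothetical): a writer alternates a 64-bit field between two bit patterns, and a reader checks that it never observes a “torn” mix of the two. With ‘volatile’ on the field, the check can never fire; drop the keyword and, on a 32-bit JVM, it legally could.

```java
public class LongTearing {
    // volatile: 64-bit reads and writes are atomic; a plain long could
    // legally be split into two 32-bit halves on some architectures.
    static volatile long value;

    public static boolean demo() {
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                value = (i & 1) == 0 ? 0L : -1L; // -1L has all 64 bits set
            }
        });
        writer.start();
        boolean torn = false;
        while (writer.isAlive()) {
            long r = value;            // atomic read of the volatile field
            if (r != 0L && r != -1L) { // a torn read would show a half-set mix
                torn = true;
            }
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return torn;                   // always false for a volatile long
    }
}
```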

Favoring one bit-ness over the other isn’t an ideal solution: if the exception for 64-bit values is removed, then 32-bit architectures suffer; if 64-bit architectures are penalized instead, then you have to declare longs and doubles ‘volatile’ whenever atomicity is desired, even though the underlying hardware may guarantee atomic operations anyway. For example, volatiles should not be needed for double fields when the underlying architecture, ISA, or floating-point unit already takes care of the atomicity needs of the 64-bit-wide field. JMM9 aims at identifying the atomicity guarantees provided by the hardware.

JMM-JSR133 was written more than a decade ago; processor bit-ness has since evolved, and 64-bit has become the mainstream processing bit-ness. This highlights the compromise that JMM-JSR133 made with respect to 64-bit reads and writes: although 64-bit values can be made atomic on any architecture, on some architectures this still requires acquiring locks, which makes 64-bit reads and writes expensive there. If a reasonable implementation of atomic 64-bit operations on 32-bit x86 architectures can’t be found, the atomicity exception will not be changed.

Note: There is an underlying issue in language design here, in that the ‘volatile’ keyword is overloaded with meanings. It is difficult for the runtime to figure out whether the user used ‘volatile’ to regain atomicity (in which case it could be stripped out on 64-bit platforms) or for memory-ordering purposes.

When talking about access atomicity, the independence of read and write operations is an important consideration. A write to a particular field should not interact with a read from, or a write to, any other field. This JMM-JSR133 guarantee means that two threads updating adjacent, distinct fields should not need synchronization to avoid interfering with each other. The guarantee hence prohibits a problem known as “word-tearing”: word-tearing occurs when an update wants to operate at a lower granularity than the underlying architecture makes available for its operands. An important point to remember is that the word-tearing problem is one of the reasons that non-volatile 64-bit longs and doubles were not given an atomicity guarantee. Word-tearing is forbidden in JMM-JSR133 and will continue to be forbidden in JMM9.
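The word-tearing prohibition can be illustrated with adjacent byte-array elements (a sketch with names of my own choosing): two threads hammer neighbouring elements, and neither write may clobber the other, even if the hardware reads and writes a whole word at a time.

```java
public class WordTearing {
    // Adjacent byte elements very likely share one machine word.
    static final byte[] bytes = new byte[2];

    public static boolean demo() {
        Thread t0 = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) bytes[0] = 1;
        });
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) bytes[1] = 2;
        });
        t0.start(); t1.start();
        try {
            t0.join(); t1.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // The JMM forbids word-tearing: each element holds exactly what its
        // own thread wrote, with no interference from its neighbour.
        return bytes[0] == 1 && bytes[1] == 2;
    }
}
```

An implementation that updated bytes[0] by read-modify-writing the whole word could lose one thread’s write; the JMM obliges the JVM to prevent that.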

JMM9 - The Final Field Problem

Final fields are different from other fields. For example, a thread reading a “completely initialized” object with a final field ‘x’ is guaranteed, once the object is “completely initialized”, to read the initialized value of ‘x’. The same can’t be guaranteed of a “normal” non-final field, ‘nonX’.

Note: “completely initialized” means that the object’s constructor has finished.
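The guarantee can be sketched as follows (‘Holder’ and its fields are hypothetical names): under racy publication, a reader that sees a non-null reference must see the final field’s constructed value, while the normal field may still appear as its default.

```java
public class FinalFieldDemo {
    static class Holder {
        final int x; // final: guaranteed visible as 42 once the constructor finishes
        int nonX;    // "normal" field: a racing reader may legally see 0
        Holder() {
            x = 42;
            nonX = 42;
        }
    }

    static Holder shared; // published without synchronization (racy)

    public static void main(String[] args) {
        new Thread(() -> shared = new Holder()).start();
        Holder h = shared;  // racy read from another thread
        if (h != null) {
            int a = h.x;    // JMM-JSR133 guarantees a == 42
            int b = h.nonX; // may legally be observed as 0 on a weakly ordered machine
        }
    }
}
```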

In light of the above, there are some simple things that can be fixed in JMM9. For example, volatile fields: a volatile field initialized in a constructor is not guaranteed to be visible even if the instance itself is visible. Hence a question arises: should the final-field guarantees be extended to all field initializations, including volatiles? Also, if the value of a “normal” non-final field of a completely initialized object never changes, can we extend the final-field guarantees to this “normal” field?

Bibliography

I have learned a lot from these websites, and they also provide great example code sequences. My article should be considered introductory; the following are better suited for a deeper grasp of the Java Memory Model.

  1. JSR 133: Java™ Memory Model and Thread Specification Revision
  2. The Java Memory Model 
  3. JAVA CONCURRENCY (&C) 
  4. The jmm-dev Archives 
  5. Threads and Locks 
  6. Synchronization and the Java Memory Model
  7. All Accesses Are Atomic
  8. Java Memory Model Pragmatics (transcript)
  9. Memory Barriers: a Hardware View for Software Hackers

Special Thanks

I would like to thank Jeremy Manson for helping me correct my various misunderstandings and providing cleaner definitions for terms that were new to me. I would also like to thank Aleksey Shipilev for helping with reducing the conceptual complexities that were present in the draft version of this article. Aleksey also guided me/us to his JMM-pragmatics article for deeper understanding, clarifications and examples.

About the Author

Monica Beckwith is a Java performance consultant. Her past experience includes working at Oracle/Sun and AMD, optimizing the JVM for server-class systems. Monica was voted a Rock Star speaker at JavaOne 2013 and was the performance lead for the Garbage First garbage collector (G1 GC). You can follow Monica on Twitter @mon_beck.
