Using synchronized blocks, the Lock API, volatile fields and atomic CAS instructions, we make sure that certain logic executes exclusively and atomically. Whenever we read about such a synchronization action, we are told that it acts as a memory barrier and obeys the happens-before protocol. That way we can be sure that a value read from or written to a shared variable by the current thread is communicated to the rest of the world.
So what is a memory barrier?
As we know, a CPU has several cores. Each core has multiple levels of local cache plus shared caches, and there may be more than one processor. Any update made to a shared variable by a thread running on one core must somehow be made visible to the other cores and CPUs. The mechanism that does this is called cache coherence. To execute a program faster, the JIT compiler and the CPU cores can reorder statements in various ways, provided the end result as seen by the executing thread is unchanged. This reordering matters a lot for shared variables in a multiprocessor environment. Because multiple threads run in parallel and share a variable's value, it is very important that at some point in the program's execution you get a chance to force these execution units to maintain some kind of ordering.
A memory barrier makes sure that, at the barrier point and depending on the barrier type, the operations queued in the core's store buffers and load queues are completed: pending stores are drained (conceptually, flushed to main memory) and copies of that data in other processors' caches are invalidated, so that they fetch the new value the next time.
In simple words, memory barriers make sure that certain reads and writes of shared variables happen now, keeping all the smart reordering aside. There are several such barriers: StoreStore, StoreLoad, LoadStore and LoadLoad. For a barrier of type XY, operations of type X issued before the barrier will not be reordered with operations of type Y issued after the barrier point.
There are better documents for learning about memory barriers and their different types; see the References at the end. We will see how reading about these barrier types helped me to appreciate and understand (to some extent) how the lazySet operation works.
So what is the difference between Atomic*.lazySet and the normal way of setting a volatile? By the way, in case there is any confusion between Atomic* and volatile: Atomic* objects hold their state in volatile variables. So any set/lazySet/get on an Atomic* object actually uses the underlying volatile variable to provide those values.
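As a minimal sketch of the three calls just mentioned, the snippet below uses java.util.concurrent.atomic.AtomicLong. The class and method names (LazySetBasics, demo) are made up for illustration. It only shows the shape of the API: a single thread always sees its own writes in program order, so this says nothing yet about visibility across threads.

```java
import java.util.concurrent.atomic.AtomicLong;

class LazySetBasics {
    // Hypothetical helper: performs both kinds of write, then reads the value back.
    static long demo() {
        AtomicLong counter = new AtomicLong(); // state is held in a volatile long internally
        counter.set(1L);       // normal volatile write
        counter.lazySet(2L);   // ordered write with the cheaper barrier
        return counter.get();  // volatile read
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```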
We know that we use volatile to make sure a thread never caches the value in a local register or relies on a stale copy local to its core. Every time we read or write a volatile variable, we work with its globally visible value across all CPUs. This is achieved by a StoreLoad barrier around writes and Load barriers around reads of the volatile variable.
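Let's say we have these non-local variables:
volatile int v;
int i = 0;
Write:
<< StoreStore >>
v = 5;
<< StoreLoad >>
Read:
i = v;
<< LoadLoad and LoadStore >>
So, in simple words, when we write to a non-local volatile field there is a StoreStore barrier before the store, so that earlier writes become visible first, and a StoreLoad barrier after it, so that the update is visible to all other CPUs before this thread performs any further load. And when we read a volatile variable, the load barriers that come with the read keep later reads and writes from being moved ahead of it, so we work with the latest value of the volatile variable (and of the data published before it) instead of a stale copy held in a register or a core-local buffer. The placement shown here follows the conservative recipe in Doug Lea's cookbook.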
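To see the effect of these barriers from Java code, here is a small sketch of publishing a plain field through a volatile flag. The names (VolatileVisibility, payload, ready, publishAndRead) are hypothetical. The writer stores the payload and then sets the volatile flag; the reader spins on the flag and only then reads the payload. Under the Java memory model the reader is guaranteed to see the payload written before the flag. Thread.onSpinWait() assumes Java 9 or later.

```java
class VolatileVisibility {
    static int payload;             // plain (non-volatile) shared field
    static volatile boolean ready;  // volatile flag that guards the payload

    // Publishes 42 through the volatile flag and returns what the reader thread saw.
    static int publishAndRead() {
        payload = 0;
        ready = false;
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) {          // volatile read on every iteration
                Thread.onSpinWait();  // spin hint only, not a barrier
            }
            seen[0] = payload;        // ordered after the volatile read of ready
        });
        reader.start();

        payload = 42;  // plain write
        ready = true;  // volatile write: publishes the payload as well

        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead());
    }
}
```

If ready were not volatile, the JIT would be free to hoist the read out of the loop, and the reader might spin forever or see a stale payload.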
So what exactly does StoreLoad do?
StoreLoad is the memory barrier that makes sure a volatile update is followed by a flush of the store buffers, so that the update is visible to other CPUs and is guaranteed to have been performed before any further load instruction. As per Doug Lea's cookbook: "StoreLoad barriers are needed on nearly all recent multiprocessors, and are usually the most expensive kind. Part of the reason they are expensive is that they must disable mechanisms that ordinarily bypass cache to satisfy loads from write-buffers."
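A classic way to see why a store has to be ordered before a later load is the two-thread experiment sketched below. The class StoreLoadDemo and its fields are hypothetical names. Each thread writes its own volatile variable and then reads the other one. Because volatile accesses are sequentially consistent under the Java memory model, at least one thread must observe the other's write, so the method is expected to return false every time.

```java
class StoreLoadDemo {
    static volatile int x, y;  // each thread writes one and then reads the other
    static int r1, r2;         // results, read by the caller only after join()

    // Returns true if some run observed r1 == 0 and r2 == 0,
    // which volatile semantics forbid.
    static boolean bothZeroObserved(int iterations) {
        for (int i = 0; i < iterations; i++) {
            x = 0;
            y = 0;
            Thread a = new Thread(() -> { x = 1; r1 = y; }); // store x, then load y
            Thread b = new Thread(() -> { y = 1; r2 = x; }); // store y, then load x
            a.start();
            b.start();
            try {
                a.join();
                b.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (r1 == 0 && r2 == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(bothZeroObserved(500));
    }
}
```

If the two writes were done with lazySet on Atomic* fields instead, the (0, 0) outcome would no longer be ruled out, because only store-to-store ordering is requested and a later load may complete before the store becomes visible.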
StoreLoad's main purpose is to let a thread read the latest value of a variable after its own write to it. The latest value is needed so that this processor sees updates possibly made by other processors, instead of just using the value it last set itself. That sounds essential in the multiple-writer case. But let's assume there is only one writer thread for this variable. As explained very well in the Single Writer Principle article (see References), if we guarantee that a volatile variable is updated from only one thread, then the expensive StoreLoad memory barrier can be avoided.
So why lazySet?
With the single-writer style of sharing a volatile variable, we can make use of lazySet. A lazySet write is paired with a StoreStore memory barrier, placed just before the store. The StoreStore barrier makes sure that every store issued before the barrier is guaranteed to have happened before the lazySet store, so no two lazySet (or other) writes can be reordered with each other. With a single-writer implementation that is what we want to happen anyway. Given that definition of StoreStore, the main reason we use it in the single-writer case instead of a normal volatile set is that it is much cheaper than a StoreLoad barrier. All those expensive steps that StoreLoad performs to keep later loads from being satisfied locally are not performed for StoreStore. Further load operations can rely on the local copy of the data, as no one else is changing the value. This gives the compiler and the CPUs more freedom to optimize.
So the same code, if we use the lazySet API to write the volatile, will look like this:
Write:
<< StoreStore >>
v = 5;
So why is lazySet faster? Because it does not add an expensive barrier when we write to the variable. A normal volatile write uses a relatively expensive barrier, which forces a read that follows the write to fetch the current latest value from memory. That makes sense if there are multiple writers, but otherwise lazySet is the smarter choice.
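A minimal sketch of the single-writer pattern described above, assuming a counter that exactly one thread updates. All names (SingleWriterCounter, increment, read, run) are hypothetical and not taken from the referenced articles. The writer publishes each new value with lazySet; a reader thread polls with a normal volatile get and checks that the values it sees never go backwards.

```java
import java.util.concurrent.atomic.AtomicLong;

class SingleWriterCounter {
    private final AtomicLong value = new AtomicLong();

    // Must only be called from the single writer thread: the read-modify-write
    // below is not atomic, which is safe only because nobody else writes.
    void increment() {
        value.lazySet(value.get() + 1);
    }

    // May be called from any thread.
    long read() {
        return value.get();
    }

    // Runs one writer and one reader. Returns true if the reader never saw the
    // counter go backwards and the final value equals n.
    static boolean run(long n) {
        SingleWriterCounter counter = new SingleWriterCounter();
        boolean[] monotonic = { true };
        Thread writer = new Thread(() -> {
            for (long i = 0; i < n; i++) {
                counter.increment();
            }
        });
        Thread reader = new Thread(() -> {
            long last = 0;
            while (last < n) {
                long current = counter.read();
                if (current < last) {
                    monotonic[0] = false;
                }
                last = current;
            }
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return monotonic[0] && counter.read() == n;
    }

    public static void main(String[] args) {
        System.out.println(run(1_000_000L));
    }
}
```

With a second writer thread this counter would lose updates, which is exactly the case where a normal volatile write combined with CAS is still needed.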
References:
http://mechanical-sympathy.blogspot.sg/2011/07/memory-barriersfences.html
http://mechanical-sympathy.blogspot.sg/2011/09/single-writer-principle.html
Doug Lea Cookbook - http://g.oswego.edu/dl/jmm/cookbook.html
http://psy-lob-saw.blogspot.sg/2012/12/atomiclazyset-is-performance-win-for.html