JAX: Does parallel equal performant?
to performant implementations. Except in the really naive case there is always
going to be some overhead due to scheduling work, managing memory sharing and
network communication overhead. Essentially that knowledge is reflected in
Amdahl's law (the amount of serial work limits the benefit from running parts
of your implementation in parallel, http://en.wikipedia.org/wiki/Amdahl's_law),
and Little's law (http://en.wikipedia.org/wiki/Little's_law) in case of queuing
When looking at current Java optimisations there is quite a bit going on to
support better parallelisation: Work is being done to provide for improving
lock contention situations, the GC adaptive sizing policy has been improved to
a usable state, there is added support for parallel arrays and lampbda's
When it comes to better locking optimisations what is most notable is work
towards coarsening locks at compile and JIT time (essentially moving locks from
the inside of a loop to the outside); eliminating locks if objects are being
used in a local, non-threaded context anyway; and support for biased locking
(that is forcing locks only when a second thread is trying to access an
object). All three taken together can lead to performance improvements that
will almost render StringBuffer and StringBuilder to exhibit equal performance
in a single threaded context.
For pieces of code that suffer from false sharing (two variables used in
separate threads independently that end up in the same CPU cacheline and as a
result are both flushed on update) there is a new annotation: Adding the
"@contended" annotation can help the compiler for which pieces of code to add
cacheline padding (or re-arrange entirely) to avoid that false sharing from
happening. One other way to avoid false sharing seems to be to look for class
cohesion - coherent classes where methods and variables are closely related
tend to suffer less from false sharing. If you would like to view the resulting
layout use the "-XX:PrintFieldLayout" option.
Java 8 will bring a few more notable improvements including changes to the
adaptive sizing GC policy, the introduction of parallel arrays that allow for
parallel execution of predicates on array entries, changes to the concurrency
libraries, internalised iterators.