Archive

Posts Tagged ‘Java’

Devoxx – Day one – Java, Performance and Devops

December 15th, 2010 at 9:22pm

In his keynote Mark Reinhold provided some information on the very interesting features to be included in the Java 7 release. Generics will be easier to declare with the diamond operator. Nested try-finally constructs that are nowadays needed to safely close resources will no longer be necessary – there will be the option of implementing a Closeable interface supporting a method close() that gets called whenever objects of that class’s type go out of scope. That way resources can be freed automatically. Though different in concept, it still reminds me a lot of the functionality typically provided by destructors in C++.
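As a rough sketch of how these two features look in Java 7 as it was eventually released: the class and method names below are made up for illustration. In the final release the interface is java.lang.AutoCloseable (which java.io.Closeable extends), and close() is called when the try block exits.

```java
import java.util.ArrayList;
import java.util.List;

final class TryWithResourcesDemo {
    // Hypothetical resource that records what happens to it.
    static final class TrackedResource implements AutoCloseable {
        private final String name;
        private final List<String> log;

        TrackedResource(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("open " + name);
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    static List<String> run() {
        // Diamond operator: the type argument is inferred from the declaration.
        List<String> log = new ArrayList<>();
        // Both resources are closed automatically, in reverse order of
        // declaration, when the try block exits (normally or via an exception).
        try (TrackedResource a = new TrackedResource("a", log);
             TrackedResource b = new TrackedResource("b", log)) {
            log.add("work");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Compared to the nested try-finally version, no explicit close() call appears anywhere in the calling code.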

The support for lambda expressions and direct method references, which will greatly help reduce the clutter caused by nested inner classes, has been postponed to later Java releases. Though it took four years to come up with the Java 7 release, its new features are pretty limited. However, the current roadmap looks very much release-date driven. The intention seems to be to get developers focussed on a limited set of reachable features to finally get the release out into the hands of users.

The speaker claimed that Oracle remains committed to Java development – first and foremost because it is a heavy Java user itself, but also in order to generate revenue indirectly (through selling support and consulting for Java related products) and directly (through Java support), and to reduce internal development cost and Java friction.

Though Oracle had a JVM implementation of its own (JRockit), development of HotSpot will be continued – mostly because a larger number of developers are familiar with HotSpot. However, the monitoring and diagnosis tooling, which was superior in JRockit, is supposed to be ported to HotSpot.

In the core Java session I also went to the talk on Java performance analysis by Joshua Bloch. He did a good job bringing the topic of performance analysis on complex systems to software developers. In ancient times it was quite easy to estimate a piece of code’s performance by static code analysis. Looking at the expression if (condition && secondCondition), it is still commonly considered faster to use “&&” rather than “&”. However, on current CPU architectures that make heavy use of instruction pipelines, whether this is still true depends heavily on their branch prediction heuristics. Dirtying the pipeline by using && may well be more expensive than doing the extra evaluation. General message: The performance of your code in a real world system depends on the hardware it runs on, the operating system as well as the exact VM version used. Estimating performance based on static analysis only is no longer possible.
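For readers less familiar with the two operators, here is a small sketch of their semantic difference (all names are made up for illustration). It deliberately makes no claim about which variant is faster; according to the talk that can only be measured on the actual hardware and VM.

```java
final class ShortCircuitDemo {
    // Counts how often the second operand is actually evaluated.
    static int evaluations = 0;

    // Hypothetical second condition with an observable side effect.
    static boolean secondCondition() {
        evaluations++;
        return true;
    }

    // "&&" short-circuits: the right operand is skipped when the left one is
    // false, which implies a decision (typically a branch) between the two.
    static boolean shortCircuit(boolean condition) {
        return condition && secondCondition();
    }

    // "&" always evaluates both operands and combines the results afterwards.
    static boolean eager(boolean condition) {
        return condition & secondCondition();
    }

    public static void main(String[] args) {
        shortCircuit(false);
        System.out.println("evaluations after &&: " + evaluations);
        eager(false);
        System.out.println("evaluations after &:  " + evaluations);
    }
}
```

Note that the two forms are only interchangeable when the second operand is cheap and free of side effects.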

However, even benchmarks can lead to false conclusions. It is common knowledge that a benchmark on a VM has to be run multiple times – VM warm-up phases are well known to developers, and so is the typical performance pattern they cause for one specific function.

However, even when repeating the test on the same machine multiple times, the values seen after warm-up may be skewed substantially. The only remedy against reaching false conclusions is to do several VM runs, average the runs (and report the median and similar statistics that are less susceptible to outliers) and provide error bars for each averaged run. When comparing two different implementations, the only way to reliably tell which one is better is a statistical significance test. Consider a diagram comparing two implementations. Without error bars, the left implementation seems clearly better than the right. However, when taking into account how widely the performance numbers are spread and adding error bars to the entries, this is no longer the case: the two runs are not statistically significantly different.
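The statistics mentioned above can be sketched in a few lines. This is only an illustration under my own assumptions: the timings are made up, the standard error of the mean is used as the error bar, and Welch's t statistic stands in for "a significance test" (the talk did not name a specific one). The resulting t value still has to be compared against a critical value of the t distribution, which this sketch does not do.

```java
import java.util.Arrays;

final class BenchmarkStats {
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // The median is less susceptible to outliers than the mean.
    static double median(double[] xs) {
        double[] sorted = xs.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    // Unbiased sample variance (divides by n - 1); needs at least two runs.
    static double variance(double[] xs) {
        double m = mean(xs);
        double squares = 0;
        for (double x : xs) squares += (x - m) * (x - m);
        return squares / (xs.length - 1);
    }

    // Standard error of the mean: one common choice for the error bar.
    static double standardError(double[] xs) {
        return Math.sqrt(variance(xs) / xs.length);
    }

    // Welch's t statistic for two samples with possibly unequal variances.
    static double welchT(double[] a, double[] b) {
        return (mean(a) - mean(b))
                / Math.sqrt(variance(a) / a.length + variance(b) / b.length);
    }

    public static void main(String[] args) {
        // Made-up timings in milliseconds, one value per separate VM run.
        double[] left = {102, 98, 131, 95, 140, 99};
        double[] right = {118, 109, 125, 97, 133, 121};
        System.out.printf("left:  mean=%.1f median=%.1f se=%.1f%n",
                mean(left), median(left), standardError(left));
        System.out.printf("right: mean=%.1f median=%.1f se=%.1f%n",
                mean(right), median(right), standardError(right));
        System.out.printf("Welch t = %.2f%n", welchT(left, right));
    }
}
```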

General

Devoxx – Day three

December 10th, 2010 at 9:28pm

The panel discussion on the future of Java was driven by visitor submitted and voted questions on the current state and future of Java. The general take-aways for me included the clear statement that the TCK will never be made available to the ASF, and the promise of Oracle to continue supporting the Java community and to remain active in the JCP.

There was some discussion on whether coming Java versions should be backwards-incompatible. One advantage would be the removal of several Java puzzlers, making it easier for Joe Java to write code in Java without knowing too much about potential inconsistencies. According to Joshua Bloch the language is no longer well suited to the average programmer who simply wants to get his tasks done in a consistent and easy to use language: It has become too complicated over the course of the years and is in dire need of simplification.

Having seen his presentation in Berlin at Buzzwords and silently following the project’s progress online, I skipped parts of the Elasticsearch presentation. Instead I went to the presentation on the Ghost-^wBoilerplate Busters from project Lombok. It always struck me as odd that in a typical Java project there is so much code that can be generated automatically by Eclipse – such as getters/setters, equals/hashCode, delegation of methods and more. I never really understood why it is possible to generate all that code from Eclipse but not at compile time. Project Lombok comes to the rescue here. As a compile time dependency it provides several annotations that are automatically converted to the correct code on the fly. It includes support for getter/setter generation, handling of closeable resources (even with the current stable version of Java), generation of thread safe lazy initialisation of member variables, automatic implementation of the composition over inheritance pattern and much more.

The library can be used from within Eclipse, with Maven, Ant and Ivy, and on Google App Engine. One of the developers in charge of IntelliJ who was in the audience announced that the library will be supported by the next version of IntelliJ as well.

General

Devoxx – Day two – Caching

December 7th, 2010 at 9:22pm

Day two started with a really good talk on caching architectures by Greg Luck. He first motivated why caching works: Even with SSDs being available now there is still a huge performance gap between RAM access times and having to go to disk. The issue is even worse in systems that are architected in a distributed way, making frequent calls to remote systems.

When sizing systems for typical load, what is oftentimes forgotten is that there is no such thing as typical load: Usually the load distribution observed over one day for a service used mainly in one time zone has the shape of an elephant – most queries are issued during lunch time (head of the elephant) with another, smaller peak during the afternoon. This pattern repeats when looking at the weekly distribution, and repeats again when looking at the yearly distribution. At the peak time of the peak day during the peak season of the year, your load may be several orders of magnitude higher than the average load.

Although query volume may be high in most applications that reach out for caching, these queries usually exhibit a power law distribution. This means that just a few queries are issued very frequently, while many queries are issued only rarely. This pattern allows for high cache hit rates, thus reducing load substantially even during very busy times.
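To get a feeling for the effect, here is a back-of-the-envelope sketch. It assumes a Zipf distribution with exponent 1, i.e. the i-th most popular query is issued with probability proportional to 1/i; that concrete distribution and the numbers used are my assumption, the talk only spoke of a power law. Under this assumption, keeping just the 1,000 most popular out of one million distinct queries in the cache already covers roughly half of all requests.

```java
final class ZipfCoverage {
    // Harmonic number H_n = 1 + 1/2 + ... + 1/n (summed from the small terms up).
    static double harmonic(int n) {
        double sum = 0;
        for (int i = n; i >= 1; i--) sum += 1.0 / i;
        return sum;
    }

    // Share of all requests that go to the k most popular of n distinct queries,
    // assuming the i-th most popular query has probability proportional to 1/i.
    static double topKShare(int k, int n) {
        return harmonic(k) / harmonic(n);
    }

    public static void main(String[] args) {
        int distinctQueries = 1_000_000;
        for (int cached : new int[] {100, 1_000, 10_000, 100_000}) {
            System.out.printf("caching the top %,d of %,d queries covers %.0f%% of requests%n",
                    cached, distinctQueries, 100 * topKShare(cached, distinctQueries));
        }
    }
}
```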

The speaker went into some more detail concerning different architectures: Usually projects start with one cache located directly on the frontend server. When scaling horizontally and adding more and more frontends, this leads to an ever increasing load on the database, because each frontend fetches an item on its own once per lifetime of that cached item. The first idea employed to remedy this is to link the different caches to each other, increasing cache hit rates. The problem here is updates racing to the various caches when the same query is issued to the backend by more than one frontend. The usual next step is to go for a distributed remote cache such as memcached. Of course this has the drawback of now having to do a network call for each cache access, slowing down response times by several milliseconds. Another problem with distributed caching systems is a theorem well known to people building distributed NoSQL databases: CAP says that you can get only two of the three desired properties consistency, availability and partition-tolerance. Ehcache with a Terracotta back end lets you configure where your priority lies.
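The first stage described above, a cache living directly inside one frontend, can be sketched with the JDK alone. This is not Ehcache's API; the class name, the LRU eviction policy and the backend function are assumptions made purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class LocalQueryCache<K, V> {
    private final Map<K, V> entries;
    private final Function<K, V> backend; // stands in for the database call
    private int backendCalls = 0;

    LocalQueryCache(final int capacity, Function<K, V> backend) {
        this.backend = backend;
        // accessOrder = true keeps the least recently used entry first,
        // so removeEldestEntry gives simple LRU eviction.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    // Read-through lookup: answer from memory if possible, else ask the backend.
    // A null result from the backend is treated like a miss on the next lookup.
    synchronized V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = backend.apply(key);
            backendCalls++;
            entries.put(key, value);
        }
        return value;
    }

    synchronized int backendCalls() {
        return backendCalls;
    }

    public static void main(String[] args) {
        LocalQueryCache<String, Integer> cache = new LocalQueryCache<>(2, String::length);
        cache.get("devoxx");
        cache.get("devoxx");   // served from the cache
        cache.get("java");
        cache.get("antwerp");  // evicts "devoxx", the least recently used entry
        cache.get("devoxx");   // has to go to the backend again
        System.out.println("backend calls: " + cache.backendCalls());
    }
}
```

Every frontend holding its own instance of such a cache is exactly what multiplies the database load: with N frontends, each cached item is fetched up to N times per lifetime.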

Hacking

Devoxx Antwerp

December 3rd, 2010 at 9:16pm

With 3000 attendees Devoxx is the largest Java community conference world-wide. Each year in autumn it takes place in Antwerp, Belgium, in recent years in the Metropolis cinema. The conference tickets were sold out long before doors were opened this year.
The focus of the presentations is mainly on enterprise Java, featuring talks by the famous Joshua Bloch, Mark Reinhold and others on new features of the upcoming JDK release as well as intricacies of the Java programming language itself.
This year for the first time the scope was extended to include one whole track on NoSQL databases. The track was organised by Steven Noels. It featured fantastic presentations on HBase use cases as well as easily accessible introductions to the concepts and usage of Hadoop.
To me it was interesting to observe which talks people would go to. In contrast to many other conferences, here the NoSQL/cloud-computing presentations were less visited than I’d have expected. One reason might be the fact that, especially on conference day two, they had to compete with popular topics such as the Java puzzlers, the live Java Posse and others. However, when talking to other attendees there seemed to be a clear gap between the two communities, probably caused by a mixture of:

  • there being very different problems to be solved in the enterprise world vs. the free software, requirements and scalability driven NoSQL community. Although even comparably small companies (compared to the Googles and Yahoo!s of this world) in Germany are already facing scaling issues, these problems are not yet that pervasive in the Java community as a whole. To me this was rather important to learn: coming from a machine learning background, now working for a search provider and being involved with Mahout, Lucene and Hadoop, scalability and growth in data have always been among the major drivers of every project I have worked on so far.
  • developers in the regular enterprise world, even when faced with growing amounts of data, often not being able to freely select the technologies used to implement a project. In contrast to startups and lean software teams, there still seem to be quite a few teams that are told not only what to implement but also how to implement it, which unnecessarily restricts the tools available for solving a given problem.

One final factor that drives developers to adopt NoSQL and cloud computing technologies is the observed need to optimise the system as a whole – to think outside the box of fixed APIs and module development units. To that end the DevOps movement was especially interesting to me: only getting the knowledge largely hidden in operations teams into development and mixing it with the skills of software developers can lead to truly elastic and adaptable systems.

General, Mahout, Software Foundation

Books I found particularly helpful

March 12th, 2009 at 6:44pm

During the last few years I have read quite a few books that one could easily file under the category “Hacking books”. Some of them were particularly interesting to me and have influenced the way I write code. The following list is certainly not complete - but it is a nice starting point.

  • Effective C++ - I have comparably little experience with C++ but this book really helped me understand some of the particularities.
  • Effective Java - even though I have been developing in Java for a few years, reading and revisiting Effective Java helps in understanding and dealing with some of the quirks of the JVM.
  • Mythical Man Month - although it is classical literature for people dealing with software projects, very well known and easy to understand, it is scary to see that the exact same mistakes are still common in today’s software projects.
  • Concurrent programming in Java - quick start on concurrent programming patterns - primarily focussed on Java. Fortunately no collection of recipes but thorough background information.
  • Working effectively with legacy code - I really like to have a look into this book from time to time. Shows great ways of untangling bad code, refactoring it and making it testable.
  • XP books by Kent Beck - if you ever had any questions on what XP is and how you should implement it: These are the books to read. Don’t trust what people call XP in practice as long as they are not willing to refine and improve their “agile processes”. Keep on working on what stops you from delivering great code.
  • Why programs fail - a guide to systematic debugging - If you ever had to debug complex programs - and I bet you have - this is the book that explains how to do this systematically, and how to even have fun along the way.
  • Zen and the art of motorcycle maintenance - Not particularly about software development, but the techniques described apply stunningly well to it.
  • Release It! - just about to read that one. But already the first few pages are not only valuable and interesting but also entertaining.
  • Implementation Patterns - forgot that yesterday.
  • Presentation Zen - another one I forgot. Really helped me to make better presentations.

There are still quite a few good books on my list. If you have any recommendations - please leave them in the comments.

There are a few other book lists online in various blogs. Two examples are the ones below:
http://www.codinghorror.com/blog/archives/000020.html
http://www.joelonsoftware.com/navLinks/fog0000000262.html

Hacking

Erlang User Group - Scala

March 9th, 2009 at 12:25pm

What: Scala Presentation by Stefan Plantikow.
Where: Cockpit of the Box119 http://boxhagener119.de/ (Ring at UPSTREAM)
When: Wednesday, 11.03.2009, 8:00 p.m.

Yesterday the Erlounge, organised by Jan Lehnardt, took place in the Cockpit of Box119 in Berlin. Topic of the evening was an introduction to Scala.

Scala is a functional language that compiles to Java bytecode and runs on the JVM. It tries to combine the best from two worlds: Object oriented languages and functional programming. So every function is an object.

Some interesting bits of information:

  • Scala is a statically typed language - but you can omit the types most of the time as type inference in the compiler is pretty good.
  • Everything is an object - there is no difference between primitives and objects.
  • There are packages for distributed computing - spawning processes and sending messages is not as fast as in Erlang, so there is still room for improvement.
  • The developers are currently about to tidy up the syntax and take care of corner cases.
  • It is easy to start with Scala as you can start out with a subset of the language and extend your knowledge as you need.
  • Scala means Scalable language. Scalable in terms of projects and tasks you can accomplish with it.

If you want to see a second nice presentation that is slightly less focussed on comparing Scala to Erlang you might also find this year’s FOSDEM presentation interesting: http://www.slideshare.net/Odersky/fosdem-2009-1013261 (video should be up soon as well).

General