<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>Inductive Bias</title>
	<atom:link href="http://blog.isabel-drost.de/index.php/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.isabel-drost.de</link>
	<description>Yet another free software developer's blog.</description>
	<pubDate>Fri, 24 May 2013 20:41:00 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
	<language>en</language>
			<item>
		<title>JAX: Project Nashorn</title>
		<link>http://blog.isabel-drost.de/index.php/archives/487/jax-projec-nashorn</link>
		<comments>http://blog.isabel-drost.de/index.php/archives/487/jax-projec-nashorn#comments</comments>
		<pubDate>Fri, 24 May 2013 20:41:00 +0000</pubDate>
		<dc:creator>mainec</dc:creator>
		
		<category><![CDATA[Event]]></category>

		<category><![CDATA[JAX]]></category>

		<category><![CDATA[js]]></category>

		<category><![CDATA[jvm]]></category>

		<category><![CDATA[nashorn]]></category>

		<guid isPermaLink="false">http://blog.isabel-drost.de/?p=487</guid>
		<description><![CDATA[The last talk I went to was on project Nashorn - demonstrating the capability
to run dynamic languages on the JVM by writing a JavaScript implementation as a
proof of concept that is fully ECMA compliant and still performs better than
Mozilla&#8217;s project Rhino.

It was nice to see Lisp, created in 1962, referenced as being the first
language that [...]]]></description>
			<content:encoded><![CDATA[<p>The last talk I went to was on project Nashorn - demonstrating the capability<br />
to run dynamic languages on the JVM by writing a JavaScript implementation as a<br />
proof of concept that is fully ECMA compliant and still performs better than<br />
Mozilla&#8217;s project Rhino.</p>
<p><P><br />
It was nice to see Lisp, created in 1962, referenced as being the first<br />
language that featured a JIT compiler as well as garbage collection. It was<br />
also good to see Smalltalk referenced as pioneering class libraries, visual GUI<br />
driven IDEs and bytecode.</p>
<p><P><br />
As such Java essentially stands on the shoulders of giants. Now dynamic<br />
language writers can themselves use the JVM to boost their productivity by<br />
profiting from the VM&#8217;s memory management, JIT optimisations, native threading.<br />
The result could be a smaller code base and more time to concentrate on<br />
interesting language features (of course another result would be that the JVM<br />
becomes interesting not only for Java developers but also for people who want to<br />
use dynamic languages instead).</p>
<p><P><br />
Both invokedynamic and the Da Vinci Machine project are worth following for<br />
anyone interested in running dynamic languages on the JVM.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.isabel-drost.de/index.php/archives/487/jax-projec-nashorn/feed</wfw:commentRss>
		</item>
		<item>
		<title>JAX: Tales from production</title>
		<link>http://blog.isabel-drost.de/index.php/archives/486/jax-tales-from-production</link>
		<comments>http://blog.isabel-drost.de/index.php/archives/486/jax-tales-from-production#comments</comments>
		<pubDate>Thu, 23 May 2013 20:38:57 +0000</pubDate>
		<dc:creator>mainec</dc:creator>
		
		<category><![CDATA[Event]]></category>

		<category><![CDATA[Java]]></category>

		<category><![CDATA[JAX]]></category>

		<category><![CDATA[logging]]></category>

		<guid isPermaLink="false">http://blog.isabel-drost.de/?p=486</guid>
		<description><![CDATA[In a second presentation Peter Ro&#223;bach together with Andreas Schmidt provided
some more detail on what the topic logging entails in real world projects.
Development messages turn into valuable information needed to uncover issues
and downtime of systems, capacity planning, measuring the effect of software
changes, analysing resource usage under real world usage. In addition to these
technical use cases [...]]]></description>
			<content:encoded><![CDATA[<p>In a second presentation Peter Ro&#223;bach together with Andreas Schmidt provided<br />
some more detail on what the topic logging entails in real world projects.<br />
Development messages turn into valuable information needed to uncover issues<br />
and downtime of systems, capacity planning, measuring the effect of software<br />
changes, analysing resource usage under real world usage. In addition to these<br />
technical use cases there is a need to provide business metrics.</p>
<p><P><br />
When dealing with multiple systems you deal with correlating values across<br />
machines and systems, providing meaningful visualisations to draw the correct<br />
decisions.</p>
<p><P><br />
When thinking of your log architecture you might want to consider storing not<br />
only log messages. In addition facts like release numbers should be tracked<br />
somewhere - ready to join in when needed to correlate behaviour with release<br />
version. To do that also track events like rolling out a release to production.<br />
Launching in a new market, switching traffic to a new system could be other<br />
events. Introduce not only pure log messages but also provide aggregated<br />
metrics and counters. All of these pieces should be stored and tracked<br />
automatically to free operations for more important work.</p>
<p><P><br />
Have you ever thought about documenting not only your software, its interfaces<br />
and input/output format? What about documenting the logged information as well?<br />
What about the fields contained in each log message? Are they documented or do<br />
people have to infer their meaning from the content? What about valid ranges<br />
for values - are they noted down somewhere? Did you store whether a specific<br />
field can only contain integers or whether some day it also could contain<br />
letters? What about the number format - is it decimal, hexadecimal?</p>
<p><P><br />
For a nice write-up of such an architecture, check out</p>
<p><a href="http://www.guardian.co.uk/info/developer-blog/2012/oct/04/winning-the-metrics-battle">Winning the metrics battle</a> on the Guardian developer blog.</p>
<p><P><br />
There&#8217;s an abundance of tools out there to help you with all sorts of logging<br />
related topics:</p>
<p><UL><br />
<LI>For visualisation and transport: Datadog, kibana, logstash, statsd,<br />
graphite, syslog-ng<br />
</LI><br />
<LI>For providing the values: JMX, metrics, Jolokia<br />
</LI><br />
<LI>For collection: collectd, statsd, graphite, newrelic, datadog<br />
</LI><br />
<LI>For storage: typical RRD tools including RRD4j, MongoDB, OpenTSDB based<br />
on HBase, Hadoop<br />
</LI><br />
<LI>For charting: Munin, Cacti, Nagios, Graphite, Ganglia, New Relic, Datadog<br />
</LI><br />
<LI>For Profiling: Dynatrace, New Relic, Boundary<br />
</LI><br />
<LI>For events: Zabbix, Icinga, OMD, OpenNMS, HypericHQ, Nagios, JBoss RHQ<br />
</LI><br />
<LI>For logging: splunk, Graylog2, Kibana, logstash<br />
</LI><br />
</UL></p>
<p><P><br />
Make sure to provide metrics consistently and be able to add them with minimal<br />
effort. Self-adaptation and automation are useful for this. Make sure developers,<br />
operations and product owners are able to use the same system so there is no<br />
information gap on either side. Your logging pipeline should be tailored to<br />
provide easy and fast feedback on the implementation and features of the<br />
product.</p>
<p><P><br />
To reach a decent level of automation a set of tools is needed for:</p>
<p><UL><br />
<LI>Configuration management (where to store passwords, URLs or IPs, log<br />
levels etc.). Typical names here include ZooKeeper, but also CFEngine, Puppet<br />
and Chef.<br />
</LI><br />
<LI>Deployment management. Typical names here are UC4, udeploy, glu, etsy<br />
deployment.<br />
</LI><br />
<LI>Server orchestration (e.g. what is started when during boot). Typical<br />
names include UC4, Nolio, Marionette Collective, rundeck.<br />
</LI><br />
<LI>Automated provisioning (think &#8220;how long does it take from server failure<br />
to bringing that service back up online?&#8221;). Typical names include kickstart,<br />
vagrant, or typical cloud environments.<br />
</LI><br />
<LI>Test-driven/behaviour-driven environments (think about adjusting not<br />
only your application but also firewall configurations). Typical tools that<br />
come to mind here include Serverspec, rspec, cucumber, c-puppet, chef.<br />
</LI><br />
<LI>When it comes to defining the points of communication for the whole<br />
pipeline there is no tool better than traditional pen and paper, and<br />
getting both development and operations into one room.<br />
</LI><br />
</UL></p>
<p><P><br />
The tooling to support this process goes from simple self-written bash scripts<br />
in the startup model to frameworks that support the flow partially, up to<br />
process based suites that help you. No matter which path you choose the goal<br />
should always be to end up with a well documented, reproducible step into<br />
production. When introducing such systems problems in your organisation may<br />
become apparent. Sometimes it helps to just create facts: It&#8217;s easier to ask for<br />
forgiveness than permission.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.isabel-drost.de/index.php/archives/486/jax-tales-from-production/feed</wfw:commentRss>
		</item>
		<item>
		<title>JAX: Logging best practices</title>
		<link>http://blog.isabel-drost.de/index.php/archives/485/jax-logging-best-practices</link>
		<comments>http://blog.isabel-drost.de/index.php/archives/485/jax-logging-best-practices#comments</comments>
		<pubDate>Wed, 22 May 2013 20:37:59 +0000</pubDate>
		<dc:creator>mainec</dc:creator>
		
		<category><![CDATA[Event]]></category>

		<category><![CDATA[Java]]></category>

		<category><![CDATA[JAX]]></category>

		<category><![CDATA[logging]]></category>

		<guid isPermaLink="false">http://blog.isabel-drost.de/?p=485</guid>
		<description><![CDATA[The ideal outcome of Peter Ro&#223;bach&#8217;s talk on logging best practices was to have
attendees leave the room thinking &#8220;we know all this already and are applying
it successfully&#8221; - most likely though the majority left thinking about how to
implement even the most basic advice discussed.

From his consultancy and fire fighter background he has a good overview [...]]]></description>
			<content:encoded><![CDATA[<p>The ideal outcome of Peter Ro&#223;bach&#8217;s talk on logging best practices was to have<br />
attendees leave the room thinking &#8220;we know all this already and are applying<br />
it successfully&#8221; - most likely though the majority left thinking about how to<br />
implement even the most basic advice discussed.</p>
<p><P><br />
From his consultancy and fire fighter background he has a good overview of what<br />
logging in the average corporate environment looks like: No logging plan, no<br />
rules, dozens of logging frameworks in active use, output in many different<br />
languages, no structured log events but a myriad of different quoting,<br />
formatting and bracketing standards instead.</p>
<p><P><br />
So what should the ideal log line contain? First of all it should really be a<br />
log line instead of a multi line something that cannot be reconstructed when<br />
interleaved with other messages. The line should not only contain the class<br />
name that logged the information (actually that is the least important piece of<br />
information), it should contain the thread id, server name, a (standardised and<br />
always consistently formatted) timestamp in a decent resolution (hint: one new<br />
timestamp per second is not helpful when facing several hundred requests per<br />
second). Make sure to have timing aligned across machines if timestamps are<br />
needed for correlating logs. Ideally there should be context in the form of<br />
request id, flow id, session id.</p>
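<p>As a rough illustration of the fields listed above, here is a minimal Java sketch that renders one event as a single key=value line. The class name, the field names (ts, server, thread, request_id) and the escaping rules are assumptions made for illustration, not something shown in the talk.</p>

```java
import java.time.Instant;

public class LogLineSketch {
    // Builds one single-line, machine-parsable log event.
    // The key names used here are hypothetical, pick whatever your log plan defines.
    public static String format(Instant timestamp, String server, String thread,
                                String requestId, String level, String message) {
        // Escape backslashes, quotes and line breaks so the event never spans lines.
        String safe = message.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r");
        // Instant prints as ISO-8601 in UTC with sub-second resolution.
        return "ts=" + timestamp + " server=" + server + " thread=" + thread
                + " request_id=" + requestId + " level=" + level
                + " msg=\"" + safe + "\"";
    }

    public static void main(String[] args) {
        System.out.println(format(Instant.now(), "web-01",
                Thread.currentThread().getName(), "req-42", "INFO", "checkout completed"));
    }
}
```

<p>Keeping the timestamp in one standard format and escaping line breaks is what makes interleaved output from several threads recoverable later on.</p>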
<p><P><br />
When thinking about logs, do not think too much about human readability - think<br />
more in terms of machine readability and parsability. Treat your logging system<br />
as the db in your data center that has to deal with most traffic. It is what<br />
holds user interactions and system metrics that can be used as business<br />
metrics, for debugging performance problems, for digging up functional issues.<br />
Most likely you will want to turn free text that provides lots of flexibility<br />
for screwing up into a more structured format like json, or even some binary<br />
format that is storage efficient (think protocol buffers, thrift, avro).</p>
<p><P><br />
In terms of log levels, make sure to log development traces on trace, provide<br />
detailed problem analysis stuff on debug, put normal behaviour onto info. In<br />
case of degraded functionality, log to warn. Things you cannot easily recover<br />
from go on error. When it comes to logging hierarchies -<br />
do not only think in class hierarchies but also in terms of use cases: Just<br />
because your http connector is used in two modules doesn&#8217;t mean that there<br />
should be no way to turn logging on just for one of the modules alone.</p>
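<p>One way to get such use case oriented hierarchies, sketched here with java.util.logging, is to name loggers after the use case rather than the class. The logger names (shop.checkout, shop.search) are hypothetical, and the sketch assumes the default JDK logging configuration where the root logger sits at INFO.</p>

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class UseCaseLoggingSketch {
    // Keep strong references: java.util.logging only holds loggers weakly.
    public static final Logger CHECKOUT = Logger.getLogger("shop.checkout");
    public static final Logger CHECKOUT_HTTP = Logger.getLogger("shop.checkout.http");
    public static final Logger SEARCH_HTTP = Logger.getLogger("shop.search.http");

    // Turns on detailed logging for the checkout use case only.
    public static void enableCheckoutDebugging() {
        CHECKOUT.setLevel(Level.FINE);
    }

    public static void main(String[] args) {
        enableCheckoutDebugging();
        // The same http connector code logs under two different parents.
        System.out.println("checkout http at FINE: " + CHECKOUT_HTTP.isLoggable(Level.FINE));
        System.out.println("search http at FINE: " + SEARCH_HTTP.isLoggable(Level.FINE));
    }
}
```

<p>The connector would be handed its logger (or logger name) by the calling module, so the level can be raised for one module without flooding the logs of the other.</p>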
<p><P><br />
When designing your logging make sure to talk to all stakeholders to get clear<br />
requirements. Make sure you can find out how the system is being used in the<br />
wild, be able to quantify the number of exceptions; max, min and average<br />
duration of a request and similar metrics.</p>
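<p>The metrics mentioned above can be collected with very little code. The following is only a sketch: the class name, the method names and the output keys are made up for illustration, and a real system would more likely rely on a metrics library.</p>

```java
import java.util.LongSummaryStatistics;

public class RequestMetricsSketch {
    private final LongSummaryStatistics durationsMillis = new LongSummaryStatistics();
    private long exceptions;

    // Records one finished request; synchronized so several threads can report.
    public synchronized void record(long durationMillis, boolean failed) {
        durationsMillis.accept(durationMillis);
        if (failed) {
            exceptions++;
        }
    }

    // Renders the aggregated counters as one parsable line.
    public synchronized String snapshot() {
        return "requests=" + durationsMillis.getCount()
                + " exceptions=" + exceptions
                + " min_ms=" + durationsMillis.getMin()
                + " max_ms=" + durationsMillis.getMax()
                + " avg_ms=" + durationsMillis.getAverage();
    }

    public static void main(String[] args) {
        RequestMetricsSketch metrics = new RequestMetricsSketch();
        metrics.record(10, false);
        metrics.record(30, true);
        System.out.println(metrics.snapshot());
    }
}
```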
<p><P><br />
Tools you could look at for help include but are not limited to splunk, jmx,<br />
jconsole, syslog, logstash, statsd, redis for log collection and queuing.</p>
<p><P><br />
As a parting exercise: Look at all of your own logfiles and count the different<br />
formats used for storing time.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.isabel-drost.de/index.php/archives/485/jax-logging-best-practices/feed</wfw:commentRss>
		</item>
		<item>
		<title>JAX: Java performance myths</title>
		<link>http://blog.isabel-drost.de/index.php/archives/484/jax-java-performance-myths</link>
		<comments>http://blog.isabel-drost.de/index.php/archives/484/jax-java-performance-myths#comments</comments>
		<pubDate>Wed, 22 May 2013 20:37:07 +0000</pubDate>
		<dc:creator>mainec</dc:creator>
		
		<category><![CDATA[Event]]></category>

		<category><![CDATA[Java]]></category>

		<category><![CDATA[JAX]]></category>

		<category><![CDATA[myth]]></category>

		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://blog.isabel-drost.de/?p=484</guid>
		<description><![CDATA[This talk was one of the famous talks on Java performance myths by Arno Haase.
His main point - supported with dozens of illustrative examples - was for
software developers to stop trusting in word of mouth, cargo cult like myths
that are abundant among engineers. Again the goal should be to write readable
code above all - for one [...]]]></description>
			<content:encoded><![CDATA[<p>This talk was one of the famous talks on Java performance myths by Arno Haase.<br />
His main point - supported with dozens of illustrative examples - was for<br />
software developers to stop trusting in word of mouth, cargo cult like myths<br />
that are abundant among engineers. Again the goal should be to write readable<br />
code above all - for one, the Java compiler and JIT are great at optimising. In<br />
addition many of the myths being spread in the Java community that are claimed<br />
to lead to better performance are simply not true.</p>
<p><P><br />
It was interesting to learn how many different aspects of both software and<br />
hardware contribute to code performance. Micro benchmarks are considered<br />
dangerous for a reason: their results are influenced by things like just-in-time<br />
compilation and CPU throttling, so it is hard to create a well controlled<br />
environment that matches what the code will encounter in production.</p>
<p><P><br />
Some myths that Arno proved wrong include final making code faster (in case of<br />
method parameters it doesn&#8217;t make a difference up to bytecode being identical<br />
with and without), inheritance being always expensive (even with an abstract<br />
class between the interface and the implementation Java 6 and 7 can still<br />
inline the method in question). Another one was on often wrongly scoped Java<br />
vs. C comparisons. One myth revolved around the creation of temporary objects -<br />
since Java 6 and 7 in simple cases even these can be optimised away.</p>
<p><P><br />
When it comes to (un-)boxing and reflection there is a performance penalty. For<br />
the latter mostly for method lookup, not so much for calling the method. What we<br />
are talking about however are penalties in the range of about 1000 compute<br />
cycles, which is dwarfed by the cost of any remote call. Reflection on<br />
fields is even cheaper.</p>
<p><P><br />
One of the more widespread myths revolved around string concatenation being<br />
expensive - doing a &#8220;A&#8221; + &#8220;B&#8221; in code will be turned into &#8220;AB&#8221; in<br />
bytecode. Even doing the same with a variable will be turned into the use of<br />
StringBuilder ever since -XX:+OptimizeStringConcat was turned on by default.</p>
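<p>The constant case is easy to check yourself. In the sketch below (class and method names are just for illustration) the reference comparison is used on purpose: it only holds because the compiler folds the two constants into one interned literal.</p>

```java
public class ConcatSketch {
    // Both operands are compile-time constants, so javac emits the single literal "AB".
    public static String foldedAtCompileTime() {
        return "A" + "B";
    }

    // The parameter is not a constant, so this concatenation happens at run time.
    public static String concatenatedAtRuntime(String prefix) {
        return prefix + "B";
    }

    public static void main(String[] args) {
        // Reference comparison on purpose: constant strings are interned.
        System.out.println(foldedAtCompileTime() == "AB");
        // For the run time case only the content is compared.
        System.out.println(concatenatedAtRuntime("A").equals("AB"));
    }
}
```

<p>Running javap -c on the compiled class is another way to see what the compiler produced for each method.</p>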
<p><P><br />
The main message here is to stop trusting your intuition when reasoning about a<br />
system&#8217;s performance and performance bottlenecks. Instead the goal should be to<br />
go and measure what is really going on. Those are simple examples where your<br />
average Java intuition goes wrong. Make sure to stay on top of what the JVM<br />
turns your code into and how that is then executed on the hardware you have<br />
rolled out if you really want to get the last bit of speed out of your<br />
application.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.isabel-drost.de/index.php/archives/484/jax-java-performance-myths/feed</wfw:commentRss>
		</item>
		<item>
		<title>JAX: Does parallel equal performant?</title>
		<link>http://blog.isabel-drost.de/index.php/archives/483/jax-does-parallel-equal-performant</link>
		<comments>http://blog.isabel-drost.de/index.php/archives/483/jax-does-parallel-equal-performant#comments</comments>
		<pubDate>Tue, 21 May 2013 20:34:40 +0000</pubDate>
		<dc:creator>mainec</dc:creator>
		
		<category><![CDATA[Event]]></category>

		<category><![CDATA[Java]]></category>

		<category><![CDATA[JAX]]></category>

		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://blog.isabel-drost.de/?p=483</guid>
		<description><![CDATA[In general there is a tendency to equate parallel implementations with
performant implementations. Except in the really naive case there is always
going to be some overhead due to scheduling work, managing memory sharing and
network communication. Essentially that knowledge is reflected in
Amdahl&#8217;s law (the amount of serial work limits the benefit from running [...]]]></description>
			<content:encoded><![CDATA[<p>In general there is a tendency to equate parallel implementations with<br />
performant implementations. Except in the really naive case there is always<br />
going to be some overhead due to scheduling work, managing memory sharing and<br />
network communication. Essentially that knowledge is reflected in<br />
Amdahl&#8217;s law (the amount of serial work limits the benefit from running parts<br />
of your implementation in parallel, http://en.wikipedia.org/wiki/Amdahl&#8217;s_law),<br />
and Little&#8217;s law (http://en.wikipedia.org/wiki/Little&#8217;s_law) in case of queuing<br />
problems.</p>
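<p>To get a feeling for Amdahl&#8217;s law, the speedup for a parallel fraction p on n workers is 1 / ((1 - p) + p / n). A small sketch (class and method names are illustrative only) that prints the predicted speedup for a program that is 90% parallelisable:</p>

```java
public class AmdahlSketch {
    // Speedup predicted by Amdahl's law for a given parallel fraction and worker count.
    public static double speedup(double parallelFraction, int workers) {
        double serialFraction = 1.0 - parallelFraction;
        return 1.0 / (serialFraction + parallelFraction / workers);
    }

    public static void main(String[] args) {
        // With 10% serial work the speedup can never reach 10, however many workers are added.
        for (int workers : new int[] {1, 2, 8, 64, 1024}) {
            System.out.printf("%d workers: %.2fx%n", workers, speedup(0.9, workers));
        }
    }
}
```

<p>Note that the formula ignores the scheduling and communication overhead mentioned above, so real numbers tend to be worse.</p>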
<p><P><br />
When looking at current Java optimisations there is quite a bit going on to<br />
support better parallelisation: Work is being done to reduce lock contention,<br />
the GC adaptive sizing policy has been improved to a usable state, and there is<br />
added support for parallel arrays and lambda&#8217;s splittable interface.</p>
<p><P><br />
When it comes to better locking optimisations what is most notable is work<br />
towards coarsening locks at compile and JIT time (essentially moving locks from<br />
the inside of a loop to the outside); eliminating locks if objects are being<br />
used in a local, non-threaded context anyway; and support for biased locking<br />
(that is forcing locks only when a second thread is trying to access an<br />
object). All three taken together can lead to performance improvements that<br />
make StringBuffer and StringBuilder perform almost equally<br />
in a single threaded context.</p>
<p><P><br />
For pieces of code that suffer from false sharing (two variables used in<br />
separate threads independently that end up in the same CPU cacheline and as a<br />
result are both flushed on update) there is a new annotation: Adding the<br />
&#8220;@Contended&#8221; annotation tells the JVM which fields or classes need<br />
cacheline padding (or re-arranging entirely) to prevent false sharing from<br />
happening. One other way to avoid false sharing seems to be to look for class<br />
cohesion - coherent classes where methods and variables are closely related<br />
tend to suffer less from false sharing. If you would like to view the resulting<br />
layout use the &#8220;-XX:PrintFieldLayout&#8221; option.</p>
<p><P><br />
Java 8 will bring a few more notable improvements including changes to the<br />
adaptive sizing GC policy, the introduction of parallel arrays that allow for<br />
parallel execution of predicates on array entries, changes to the concurrency<br />
libraries, internalised iterators.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.isabel-drost.de/index.php/archives/483/jax-does-parallel-equal-performant/feed</wfw:commentRss>
		</item>
		<item>
		<title>JAX: Pigs, snakes and deaths by 1k cuts</title>
		<link>http://blog.isabel-drost.de/index.php/archives/482/jax-pigs-snakes-and-deaths-by-1k-cuts</link>
		<comments>http://blog.isabel-drost.de/index.php/archives/482/jax-pigs-snakes-and-deaths-by-1k-cuts#comments</comments>
		<pubDate>Mon, 20 May 2013 20:32:16 +0000</pubDate>
		<dc:creator>mainec</dc:creator>
		
		<category><![CDATA[Event]]></category>

		<category><![CDATA[Java]]></category>

		<category><![CDATA[JAX]]></category>

		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://blog.isabel-drost.de/?p=482</guid>
		<description><![CDATA[In his talk on performance problems Rainer Schuppe gave a great introduction to
which kinds of performance problems can be observed in production and how to
best root-cause them.

Simply put performance issues usually arise due to a difference in either data
volume, concurrency levels or resource usage between the dev, qa and production
environments. The tooling to uncover and [...]]]></description>
			<content:encoded><![CDATA[<p>In his talk on performance problems Rainer Schuppe gave a great introduction to<br />
which kinds of performance problems can be observed in production and how to<br />
best root-cause them.</p>
<p><P><br />
Simply put performance issues usually arise due to a difference in either data<br />
volume, concurrency levels or resource usage between the dev, qa and production<br />
environments. The tooling to uncover and explain them is pretty well known:<br />
Starting with looking at logfiles, ARM tools, using aspects, bytecode<br />
instrumentation, sampling, watching JMX statistics, and PMI tools.</p>
<p><P><br />
All of these tools have their own unique advantages and disadvantages. With<br />
logs you get the most freedom, however you have to know what to log at<br />
development time. In addition logging is I/O heavy, so doing too much can slow<br />
the application down itself. In a common distributed system logs need to be<br />
aggregated somehow. A simple example of what can go wrong is cascading<br />
exceptions spilled to disk that cause machines to run out of disk space one<br />
after the other. When relying on logging make sure to keep transaction<br />
contexts, in particular transaction ids across machines and services to<br />
correlate outages. In terms of tool support, look at scribe, splunk and flume.</p>
<p><P><br />
A tool often used for tracking down performance issues in development is the<br />
well known profiler. Usually it creates lots of very detailed data. However it<br />
is most valuable in development - in production profiling a complete server<br />
stack produces way too much load and data to be feasible. In addition there&#8217;s<br />
usually no transaction context available for correlation again.</p>
<p><P><br />
A third way of watching applications do their work is to watch via JMX. This<br />
capability is built in for any Java application, in particular for servlet<br />
containers. Again there is no transaction context. Unless you take care of it<br />
there won&#8217;t be any historic data.</p>
<p><P><br />
When it comes to diagnosing problems, you are essentially left with fixing<br />
either the &#8220;it does not work&#8221; case or the &#8220;it is slow&#8221; case.</p>
<p><P><br />
For the &#8220;it is slow&#8221; case there are a few incarnations:</p>
<p><UL><br />
<LI>It was always slow, we got used to it.<br />
</LI><br />
<LI>It gets slow over time.<br />
</LI><br />
<LI>It gets slower exponentially.<br />
</LI><br />
<LI>It suddenly gets slow.<br />
</LI><br />
<LI>There is a spontaneous crash.<br />
</LI><br />
</UL></p>
<p><P><br />
In the case of &#8220;it does not work&#8221; you are left with the following observations:</p>
<p><UL><br />
<LI>Sudden outages.<br />
</LI><br />
<LI>Always flaky.<br />
</LI><br />
<LI>Sporadic error messages.<br />
</LI><br />
<LI>Silent death.<br />
</LI><br />
<LI>Increasing error rates.<br />
</LI><br />
<LI>Misleading error messages.<br />
</LI><br />
</UL></p>
<p><P><br />
In the end you will always be spinning in a loop: Look at symptoms, eliminate<br />
non-causes, identify suspects, confirm, and eliminate by comparing to normal. If<br />
not done with that: lather, rinse, repeat. When it comes to causes for errors<br />
and slowness you will usually run into one of the following: bad coding<br />
practices, too much load, missing backends, resource conflicts, memory and<br />
resource leakage as well as hardware/networking issues.</p>
<p><P><br />
Some symptoms you may observe include foreseeable lock ups (it&#8217;s always slow<br />
after four hours, so we just reboot automatically before that), consistent<br />
slowness, sporadic errors (it always happens after a certain request came in),<br />
getting slower and slower (most likely leaking resources), sudden chaos (e.g.<br />
someone pulling the plug or someone removing a hard disk), and high utilisation<br />
of resources.</p>
<h2>Linear memory leak</h2>
<p>In case of a linear memory leak, the application usually runs into an OOM<br />
eventually, getting ever slower before that due to GC pressure. Reasons could<br />
be linear structures being filled but never emptied. What you observe are<br />
growing heap utilisation and growing GC times. In order to find such leakage<br />
make sure to turn on verbose GC logging, do heapdumps to find leaks. One<br />
challenge though: It may be hard to find the leakage if the problem is not one<br />
large object, but many, many small ones that lead to a death by 1000 cuts<br />
bleeding the application to death.</p>
<p><P><br />
In development and testing you will do heap comparisons. Keep in mind that<br />
taking a heap dump causes the JVM to stop. You can use common profilers to look<br />
at the heap dump. There are variants that help with automatic leak detection.</p>
<p><P><br />
A variant is the pig in a python issue where sudden unusually large objects<br />
cause the application to be overloaded.</p>
<h2><A NAME="SECTION00332000000000000000">Resource leaks and conflicts</A></h2>
<p>Another common problem is leaking resources other than memory - not closing<br />
file handles can be one incarnation. Those problems cause slowness over time,<br />
they may lead to having the heap grow over time - usually that is not the most<br />
visible problem though. If instance tracking does not help here, your last<br />
resort should be doing code audits.</p>
<p><P><br />
In case of conflicting resource usage you usually face code that was developed<br />
with overly cautious locking and data integrity constraints. The way to go is<br />
thread dumps to uncover threads in block and wait states.</p>
<h2><A NAME="SECTION00333000000000000000">Bad coding practices</A></h2>
<p>When it comes to bad coding practices what is usually seen is code in endless<br />
loops (easy to see in thread dumps), cpu bound computations where no result<br />
caching is done. Also layeritis with too much (de-)serialisation can be a<br />
problem. In addition there is a general &#8220;the ORM will save us all&#8221; problem that<br />
may lead to massive SQL statements, or to using the wrong data fetch strategy.<br />
When it comes to caching - if caches are too large, access times of course grow<br />
as well. There could be never ending retry loops, ever blocking networking<br />
calls. Also people tend to catch exceptions but not do anything about them<br />
other than adding a little #fixme annotation to the code.</p>
<p><P><br />
When it comes to locking you might run into dead-/live-lock problems. There<br />
could be chokepoints (resources that all threads need for each processing<br />
chain). In a thread dump you will typically see lots of wait instead of block<br />
time.</p>
<p><P><br />
In addition there could be internal and external bottlenecks. In particular<br />
keep those in mind when dealing with databases.</p>
<p><P><br />
The goal should be to find an optimum for your application between too many too<br />
small requests that waste resources getting dispatched, and one huge request<br />
that everyone else is waiting for.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.isabel-drost.de/index.php/archives/482/jax-pigs-snakes-and-deaths-by-1k-cuts/feed</wfw:commentRss>
		</item>
		<item>
		<title>JAX: Java HPC by Norman Maurer</title>
		<link>http://blog.isabel-drost.de/index.php/archives/481/jax-java-hpc-by-norman-maurer</link>
		<comments>http://blog.isabel-drost.de/index.php/archives/481/jax-java-hpc-by-norman-maurer#comments</comments>
		<pubDate>Sun, 19 May 2013 20:31:16 +0000</pubDate>
		<dc:creator>mainec</dc:creator>
		
		<category><![CDATA[Event]]></category>

		<category><![CDATA[JAX]]></category>

		<category><![CDATA[netty]]></category>

		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://blog.isabel-drost.de/?p=481</guid>
		<description><![CDATA[For slides see also: Speakerdeck: High performance networking on the JVM

Norman started his talk clarifying what he means by high scale: Anything above
1000 concurrent connections is considered high scale in his talk, anything
below 100 concurrent connections is fine to be handled with threads and blocking
IO. Before tuning anything, make sure to measure if you have [...]]]></description>
			<content:encoded><![CDATA[<p>For slides see also: <a href="https://speakerdeck.com/normanmaurer/high-performance-networking-on-the-jvm-lessons-learned">Speakerdeck: High performance networking on the JVM</a></p>
<p><P><br />
Norman started his talk clarifying what he means by high scale: Anything above<br />
1000 concurrent connections is considered high scale in his talk, anything<br />
below 100 concurrent connections is fine to be handled with threads and blocking<br />
IO. Before tuning anything, make sure to measure if you have any problem at<br />
all: Readability should always go before optimisation.</p>
<p><P><br />
He gave a few pointers as to where to look for optimisations: Get started by<br />
studying the socket options - TCP_NODELAY as well as the send and receive<br />
buffer sizes are the most interesting ones. When under GC pressure (check the<br />
GC logs to figure out if you are) make sure to minimise allocation and<br />
deallocation of objects. In order to do that consider making objects static and<br />
final where possible. Make sure to use CMS or G1 for garbage collection in order<br />
to maximise throughput. Size areas in the JVM heap according to your access<br />
patterns. The goal should always be to minimise the chance of running into a<br />
stop-the-world garbage collection.</p>
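<p><P><br />
As a rough sketch of what setting these socket options can look like with plain<br />
NIO: this is my own illustration and not taken from the slides. The class name<br />
and the 64 KB buffer size are assumptions picked for the example only.</p>

```java
import java.io.IOException;
import java.net.StandardSocketOptions;
import java.nio.channels.SocketChannel;

public class SocketOptionsSketch {

    // Hypothetical buffer size, chosen only for illustration; measure before settling on one.
    static final int BUFFER_BYTES = 64 * 1024;

    // Opens an unconnected channel and applies the three options mentioned in the talk.
    static SocketChannel openTuned() throws IOException {
        SocketChannel channel = SocketChannel.open();
        // Disable Nagle's algorithm so that small writes are not held back.
        channel.setOption(StandardSocketOptions.TCP_NODELAY, true);
        // The sizes are hints: the operating system may round or cap them.
        channel.setOption(StandardSocketOptions.SO_SNDBUF, BUFFER_BYTES);
        channel.setOption(StandardSocketOptions.SO_RCVBUF, BUFFER_BYTES);
        return channel;
    }

    public static void main(String[] args) throws IOException {
        try (SocketChannel channel = openTuned()) {
            // Print what the operating system actually applied.
            System.out.println("TCP_NODELAY: " + channel.getOption(StandardSocketOptions.TCP_NODELAY));
            System.out.println("SO_SNDBUF:   " + channel.getOption(StandardSocketOptions.SO_SNDBUF));
            System.out.println("SO_RCVBUF:   " + channel.getOption(StandardSocketOptions.SO_RCVBUF));
        }
    }
}
```

<p><P><br />
Reading the values back is worthwhile because the buffer sizes you ask for are<br />
not necessarily the ones you get.</p>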
<p><P><br />
When it comes to using buffers you have the choice of using direct or heap<br />
buffers. While the former are expensive to create, the latter come with the<br />
cost of being zeroed out. Often people start pooling buffers, potentially<br />
initialising the pool in a lazy manner. In order to avoid memory fragmentation<br />
in the Java heap, it can be a good idea to create the buffer at startup time<br />
and re-use it later on.</p>
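<p><P><br />
A minimal sketch of such a pool that allocates everything up front, assuming a<br />
fixed number of equally sized buffers. The class and method names are made up<br />
for this example; real frameworks ship far more elaborate pools.</p>

```java
import java.nio.ByteBuffer;

public class BufferPoolSketch {

    private final ByteBuffer[] free;   // pre-allocated buffers that are currently available
    private int available;             // number of usable entries in the free array

    // Allocates all buffers eagerly, for example once at startup.
    BufferPoolSketch(int buffers, int bytesPerBuffer, boolean direct) {
        free = new ByteBuffer[buffers];
        for (int i = 0; i < buffers; i++) {
            free[i] = direct
                    ? ByteBuffer.allocateDirect(bytesPerBuffer) // off-heap, costly to create
                    : ByteBuffer.allocate(bytesPerBuffer);      // on-heap, zeroed on allocation
        }
        available = buffers;
    }

    // Hands out a pooled buffer, or null when the pool is exhausted.
    synchronized ByteBuffer acquire() {
        return available == 0 ? null : free[--available];
    }

    // Returns a buffer to the pool. clear() resets position and limit, not the contents.
    // The sketch assumes that only buffers obtained from acquire() are released.
    synchronized void release(ByteBuffer buffer) {
        buffer.clear();
        free[available++] = buffer;
    }

    synchronized int available() {
        return available;
    }

    public static void main(String[] args) {
        BufferPoolSketch pool = new BufferPoolSketch(4, 8 * 1024, true);
        ByteBuffer buffer = pool.acquire();
        buffer.putInt(42);
        pool.release(buffer);
        System.out.println("buffers available: " + pool.available());
    }
}
```

<p><P><br />
Allocating eagerly trades a slower startup for predictable memory use later on,<br />
which is the opposite choice to the lazily initialised pool mentioned above.</p>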
<p><P><br />
In particular when parsing structured messages, as they are common in<br />
protocols, it usually makes sense to use gathering writes and scattering reads<br />
to minimise the number of system calls for reading and writing. Also try to<br />
buffer more if you want to minimise system calls. Use slice and duplicate to<br />
create views on your buffers to avoid memory copies. Use a file channel when<br />
copying files without modifications.</p>
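<p><P><br />
The following sketch strings these techniques together, using temporary files<br />
instead of sockets so that it runs on its own. The message layout (a 4 byte<br />
length header followed by the body), the class name and the 1 KB body limit<br />
are assumptions of mine and not part of the talk.</p>

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ScatterGatherSketch {

    // Hypothetical upper bound for the body size in this example.
    static final int MAX_BODY_BYTES = 1024;

    // Writes header and body, copies the file, reads it back and returns the decoded body.
    static String roundTrip(String text) throws IOException {
        Path file = Files.createTempFile("message", ".bin");
        Path copy = Files.createTempFile("message-copy", ".bin");
        try {
            byte[] body = text.getBytes(StandardCharsets.UTF_8);
            ByteBuffer header = ByteBuffer.allocate(4).putInt(0, body.length);
            ByteBuffer bodyBuffer = ByteBuffer.wrap(body);

            try (FileChannel out = FileChannel.open(file, StandardOpenOption.WRITE)) {
                // Gathering write: header and body are handed over in one call.
                ByteBuffer[] parts = { header, bodyBuffer };
                while (header.hasRemaining() || bodyBuffer.hasRemaining()) {
                    out.write(parts);
                }
            }

            try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ);
                 FileChannel out = FileChannel.open(copy, StandardOpenOption.WRITE)) {
                // Unmodified copy: let the file channel move the bytes.
                long position = 0;
                long size = in.size();
                while (position < size) {
                    position += in.transferTo(position, size - position, out);
                }
            }

            try (FileChannel in = FileChannel.open(copy, StandardOpenOption.READ)) {
                ByteBuffer lengthField = ByteBuffer.allocate(4);
                ByteBuffer payload = ByteBuffer.allocate(MAX_BODY_BYTES);
                ByteBuffer[] parts = { lengthField, payload };
                // Scattering read: each call fills the header buffer first, then the payload.
                while (in.read(parts) > 0) {
                    // keep reading until end of file or until both buffers are full
                }
                int length = lengthField.getInt(0);
                payload.flip();
                // duplicate() shares the bytes but keeps its own position and limit,
                // slice() then exposes just the body without copying it.
                ByteBuffer view = payload.duplicate();
                view.limit(length);
                return StandardCharsets.UTF_8.decode(view.slice()).toString();
            }
        } finally {
            Files.deleteIfExists(file);
            Files.deleteIfExists(copy);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("hello netty"));
    }
}
```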
<p><P><br />
Make sure you do not block - think of DNS servers being unavailable or slow as<br />
an example.</p>
<p><P><br />
As a parting note, make sure to define and document your threading model. It<br />
may ease development to know that some objects will only ever be used in a<br />
single threaded context. It usually helps to reduce context switches and to<br />
keep data in the same thread, which avoids the need for synchronisation and<br />
volatile.</p>
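<p><P><br />
One way to picture such a threading model is to confine an object to a single<br />
owner thread and hand all work to that thread. The sketch below is my own<br />
illustration, with made-up class and field names, and uses a single threaded<br />
executor from the JDK as the owner.</p>

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadConfinementSketch {

    // Documented threading model: instances are only ever touched by the owner thread,
    // so the field needs neither synchronized nor volatile.
    static final class ConnectionStats {
        long bytesSeen;
    }

    // Records the given number of events on the owner thread and returns the total.
    static long countOnOwnerThread(int events, int bytesPerEvent) throws Exception {
        ExecutorService owner = Executors.newSingleThreadExecutor();
        try {
            ConnectionStats stats = new ConnectionStats();
            for (int i = 0; i < events; i++) {
                // Other threads never mutate stats directly, they hand work to the owner.
                owner.execute(() -> stats.bytesSeen += bytesPerEvent);
            }
            // Reading goes through the owner as well. Tasks run in submission order,
            // and Future.get() makes the result visible to the calling thread.
            return owner.submit(() -> stats.bytesSeen).get();
        } finally {
            owner.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("bytes seen: " + countOnOwnerThread(1000, 3));
    }
}
```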
<p><P><br />
Also make a conscious decision about which protocol you would like to use for<br />
transport - in addition to TCP there&#8217;s also UDP, UDT and SCTP. Use pipelining in<br />
order to parallelise.</p>
<p><P></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.isabel-drost.de/index.php/archives/481/jax-java-hpc-by-norman-maurer/feed</wfw:commentRss>
		</item>
		<item>
		<title>JAX: Hadoop overview by Bernd Fondermann</title>
		<link>http://blog.isabel-drost.de/index.php/archives/480/jax-hadoop-overview-by-bernd-fondermann</link>
		<comments>http://blog.isabel-drost.de/index.php/archives/480/jax-hadoop-overview-by-bernd-fondermann#comments</comments>
		<pubDate>Sat, 18 May 2013 20:29:57 +0000</pubDate>
		<dc:creator>mainec</dc:creator>
		
		<category><![CDATA[Event]]></category>

		<category><![CDATA[BigDataCon]]></category>

		<category><![CDATA[Hadoop]]></category>

		<category><![CDATA[JAX]]></category>

		<guid isPermaLink="false">http://blog.isabel-drost.de/?p=480</guid>
		<description><![CDATA[
After breakfast was over the first day started with a talk by Bernd on the
Hadoop ecosystem. He did a good job selecting the most important and
interesting projects related to storing data in HDFS and processing it with Map
Reduce. After the usual &#8220;what is Hadoop&#8221;, &#8220;what does the general architecture
look like&#8221;, &#8220;what will change with YARN&#8221; [...]]]></description>
			<content:encoded><![CDATA[<p><P><br />
After breakfast was over the first day started with a talk by Bernd on the<br />
Hadoop ecosystem. He did a good job selecting the most important and<br />
interesting projects related to storing data in HDFS and processing it with Map<br />
Reduce. After the usual &#8220;what is Hadoop&#8221;, &#8220;what does the general architecture<br />
look like&#8221;, &#8220;what will change with YARN&#8221; Bernd gave a nice overview of which<br />
publications each of the relevant projects rely on:</p>
<p><P></p>
<p><UL><br />
<LI>HDFS is mainly based on the paper on GFS.<br />
</LI><br />
<LI>Map Reduce comes with its own publication.<br />
</LI><br />
<LI>The big table paper mainly inspired Cassandra (to some extent), HBase,<br />
Accumulo and Hypertable.<br />
</LI><br />
<LI>Protocol Buffers inspired Avro and Thrift, and is available as free<br />
software itself.<br />
</LI><br />
<LI>Dremel (the storage side of things) inspired Parquet.<br />
</LI><br />
<LI>The query language side of Dremel inspired Drill and Impala.<br />
</LI><br />
<LI>Power Drill might inspire Drill.<br />
</LI><br />
<LI>Pregel (a graph processing system) inspired Giraph.<br />
</LI><br />
<LI>Percolator provided some inspiration to HBase.<br />
</LI><br />
<LI>Dynamo by Amazon kicked off Cassandra and others.<br />
</LI><br />
<LI>Chubby inspired Zookeeper, both are based on Paxos.<br />
</LI><br />
<LI>On top of Map Reduce today there are tons of higher level languages:<br />
starting with Sawzall inside of Google and continuing with Pig and Hive at<br />
Apache, we are now left with additional languages like Cascading, Cascalog,<br />
Scalding and many more.<br />
</LI><br />
<LI>There are many other interesting publications (Megastore, Spanner, F1 to<br />
name just a few) for which there is no free implementation yet. In addition,<br />
with Storm, Hana and Haystack there are implementations that lack canonical<br />
publications.<br />
</LI><br />
</UL><br />
<P><br />
After this really broad clarification of names and terms used, Bernd went into<br />
some more detail on how Zookeeper is being used for defining the namenode in<br />
Hadoop 2, and how high availability and federation work for namenodes. In<br />
addition he gave a clear explanation of how block reports work on cluster<br />
bootup. The remainder of the talk was reserved for giving an intro to HBase,<br />
Giraph and Drill.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.isabel-drost.de/index.php/archives/480/jax-hadoop-overview-by-bernd-fondermann/feed</wfw:commentRss>
		</item>
		<item>
		<title>BigDataCon</title>
		<link>http://blog.isabel-drost.de/index.php/archives/479/bigdatacon</link>
		<comments>http://blog.isabel-drost.de/index.php/archives/479/bigdatacon#comments</comments>
		<pubDate>Fri, 17 May 2013 20:29:03 +0000</pubDate>
		<dc:creator>mainec</dc:creator>
		
		<category><![CDATA[Event]]></category>

		<category><![CDATA[BigDataCon]]></category>

		<category><![CDATA[JAX]]></category>

		<guid isPermaLink="false">http://blog.isabel-drost.de/?p=479</guid>
		<description><![CDATA[
Together with Uwe Schindler I had published a series of articles on Apache
Lucene at Software and Support Media&#8217;s Java Mag several years ago. Earlier this
year S&#38;S kindly invited me to their BigDataCon - co-located with JAX - to give a
talk of my choosing that at least touches upon Lucene.

Thinking back and forth about what topic to [...]]]></description>
			<content:encoded><![CDATA[<p><P><br />
Together with Uwe Schindler I had published a series of articles on Apache<br />
Lucene at Software and Support Media&#8217;s Java Mag several years ago. Earlier this<br />
year S&amp;S kindly invited me to their BigDataCon - co-located with JAX - to give a<br />
talk of my choosing that at least touches upon Lucene.</p>
<p><P><br />
Thinking back and forth about what topic to cover, what came to my mind was to<br />
give a talk on how easy it is to do text classification with Mahout when<br />
relying on Apache Lucene for text analysis, tokenisation and token filtering.<br />
All classes essentially are in place to integrate Lucene Analyzers with Mahout<br />
vector generation - needed e.g. as a pre-processing step for classification or<br />
text clustering.</p>
<p><P><br />
Feel free to check out some of my sandbox code over at <a href="http://github.org/MaineC/sofia">github</a>.</p>
<p><P><br />
After attending the conference I can only recommend everyone interested in Java<br />
programming and able to understand German to buy a ticket for the conference.<br />
It&#8217;s really well executed, with a great selection of talks (though the sponsored<br />
keynotes usually aren&#8217;t particularly interesting), tasty meals and interesting<br />
people to chat with.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.isabel-drost.de/index.php/archives/479/bigdatacon/feed</wfw:commentRss>
		</item>
		<item>
		<title>Hadoop Summit Amsterdam</title>
		<link>http://blog.isabel-drost.de/index.php/archives/478/hadoop-summit-amsterdam</link>
		<comments>http://blog.isabel-drost.de/index.php/archives/478/hadoop-summit-amsterdam#comments</comments>
		<pubDate>Thu, 16 May 2013 20:27:56 +0000</pubDate>
		<dc:creator>mainec</dc:creator>
		
		<category><![CDATA[Event]]></category>

		<category><![CDATA[amsterdam]]></category>

		<category><![CDATA[Hadoop]]></category>

		<category><![CDATA[hadoopsummit]]></category>

		<guid isPermaLink="false">http://blog.isabel-drost.de/?p=478</guid>
		<description><![CDATA[
About a month ago I attended the first European Hadoop Summit, organised by
Hortonworks in Amsterdam. The two day conference brought together both vendors
and users of Apache Hadoop for talks, exhibition and after conference beer
drinking.

Russel Jurney kindly asked me to chair the Hadoop applied track during
Apache Con EU. As a result I had a good excuse [...]]]></description>
			<content:encoded><![CDATA[<p><P><br />
About a month ago I attended the first European Hadoop Summit, organised by<br />
Hortonworks in Amsterdam. The two day conference brought together both vendors<br />
and users of Apache Hadoop for talks, exhibition and after conference beer<br />
drinking.</p>
<p><P><br />
Russel Jurney kindly asked me to chair the Hadoop applied track during<br />
Apache Con EU. As a result I had a good excuse to attend the event. Overall<br />
there were at least three times as many submissions as could reasonably be<br />
accepted. Accordingly, selecting proposals was pretty hard.</p>
<p><P><br />
Though some of the Apache community aspect was missing at Hadoop Summit, it was<br />
nevertheless interesting to see who is active in this space, both as users and<br />
as vendors.</p>
<p><P><br />
If you check out the talks on YouTube make sure not to miss the two sessions by<br />
Ted Dunning as well as the talk on handling logging data by Twitter.</p>
<p><P></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.isabel-drost.de/index.php/archives/478/hadoop-summit-amsterdam/feed</wfw:commentRss>
		</item>
	</channel>
</rss>
