ApacheConNA: Hadoop metrics #
Have you ever measured the general behaviour of your Hadoop jobs? Have you sized your cluster accordingly? Do you know whether your workload really is IO bound or CPU bound? Legend has it that no one except Allen Wittenauer over at LinkedIn, formerly Y!, ever did this analysis for his clusters.
Steve Watt gave a pitch for actually going out into your datacenter, measuring what is going on there and adjusting the deployment accordingly: in small clusters it may make sense to rely on RAIDed disks instead of additional storage nodes to guarantee "replication levels". When going out to vendors to buy hardware, don't rely on paper calculations only: standard servers in Hadoop clusters are 1U or 2U, which is quite unlike the beefy boxes being sold otherwise.
Figure out what reference architecture is being used by partners, run your standard workloads and adjust the configuration. If you want to benchmark your hardware and system configuration, run the 10TB TeraSort. Make sure to capture data during all your runs - have Ganglia or sar running, and watch out for interesting behaviour in IO rates, CPU utilisation and network traffic. The goal is to get the CPU busy, not waiting for network or disk.
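Not part of the talk, but to make the "CPU busy vs. waiting" check concrete: a minimal sketch that samples CPU counters with sysstat's sar while a benchmark run is in flight and reports whether the box looked CPU bound or iowait bound. The interval, sample count and the 20% thresholds are arbitrary assumptions.

```python
#!/usr/bin/env python3
"""Rough sketch: sample CPU counters with sysstat's `sar` during a benchmark
run and report whether the box looked CPU bound or iowait bound."""

import subprocess

INTERVAL_S = 5      # seconds between samples
SAMPLES = 60        # 5 minutes of data per invocation

def sample_cpu():
    """Run `sar -u INTERVAL SAMPLES`, return (busy%, iowait%, idle%) tuples."""
    out = subprocess.run(
        ["sar", "-u", str(INTERVAL_S), str(SAMPLES)],
        capture_output=True, text=True, check=True,
    ).stdout
    rows = []
    for line in out.splitlines():
        fields = line.split()
        # Data rows carry the keyword "all" followed by six numeric columns:
        # %user %nice %system %iowait %steal %idle
        if "all" in fields and "%user" not in fields and not line.startswith("Average"):
            user, _nice, system, iowait, _steal, idle = map(float, fields[-6:])
            rows.append((user + system, iowait, idle))
    return rows

if __name__ == "__main__":
    rows = sample_cpu()
    busy, iowait, idle = [sum(col) / len(rows) for col in zip(*rows)]
    print(f"avg cpu busy {busy:.1f}%  iowait {iowait:.1f}%  idle {idle:.1f}%")
    if iowait > 20:
        print("looks IO bound - the CPUs are mostly waiting on disk")
    elif idle < 20:
        print("looks CPU bound - that is where you want to be")
    else:
        print("neither pegged nor waiting - check network and job parallelism")
```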
After the instrumentation and trial run, look for over- and under-provisioning, adjust, lather, rinse, repeat.
Also make sure to talk to the datacenter people: there are floor space, power and cooling constraints to keep in mind. Don't let the whole datacenter go down because your CPU-intensive job is drawing more power than the DC was designed for. There are also power constraints per floor tile due to cooling issues - those should dictate the design.
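The power maths is worth sanity-checking before the racks arrive. A back-of-the-envelope sketch follows; every figure in it is a placeholder, not a number from the talk - plug in whatever your facilities team actually quotes you.

```python
# Back-of-the-envelope power check per rack / floor tile.
# All numbers are placeholders - ask your datacenter team for the real ones.

TILE_BUDGET_KW = 8.0      # power (and cooling) budget per floor tile
NODE_IDLE_W = 250.0       # per-node draw when mostly idle
NODE_PEAK_W = 450.0       # per-node draw with all cores and disks busy
NODES_PER_RACK = 20
SWITCH_W = 200.0

def rack_draw_kw(node_watts):
    return (NODES_PER_RACK * node_watts + SWITCH_W) / 1000.0

idle_kw = rack_draw_kw(NODE_IDLE_W)
peak_kw = rack_draw_kw(NODE_PEAK_W)

print(f"rack draw idle: {idle_kw:.1f} kW, peak: {peak_kw:.1f} kW, "
      f"tile budget: {TILE_BUDGET_KW:.1f} kW")
if peak_kw > TILE_BUDGET_KW:
    # Size for the CPU-intensive job, not for the idle cluster.
    print("over budget at peak - spread the rack over more tiles or drop node count")
```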
Take a close look at the disks you deploy: SATA vs. SAS can make a 40% performance difference at a 20% cost difference. Also, the number of cores per machine dictates the number of disks needed to spread out the load of random read access. As a rule of thumb, a 2U machine today should have at least twelve large form factor disks.
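To see how those two rules of thumb interact, here is a tiny sketch using the rough 40%/20% figures above; the baseline prices, the core count and the roughly 1:1 disks-per-core target are my own assumptions, not numbers from the talk.

```python
# Quick price/performance check using the rough figures from the talk
# (SAS ~40% faster at ~20% higher cost) plus made-up baseline prices.

SATA_PRICE, SATA_PERF = 1.00, 1.00   # normalised baseline
SAS_PRICE, SAS_PERF = 1.20, 1.40     # +20% cost, +40% throughput

print(f"SATA perf per unit cost: {SATA_PERF / SATA_PRICE:.2f}")
print(f"SAS  perf per unit cost: {SAS_PERF / SAS_PRICE:.2f}")

# Spindles per core: with roughly one task slot per core, every concurrent
# task does its own mostly-random reads, so you want enough disks that they
# are not all queueing on the same spindle.
CORES = 12   # e.g. a dual-socket hex-core 2U box (assumption)
DISKS = 12   # the rule of thumb above: at least twelve LFF disks in 2U
print(f"disks per core: {DISKS / CORES:.1f} (aim for roughly 1:1)")
```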
When it comes to controllers, the goal should be a dedicated lane to each disk; save one controller if price is an issue. Trade off compute power against power consumption.
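As a rough illustration of the dedicated-lane argument: the sketch below checks whether a single controller can keep up with the aggregate sequential throughput of a 2U disk shelf. All figures are assumptions - check the data sheets of your actual parts.

```python
# Does one controller give every disk a dedicated lane's worth of bandwidth?
# All figures are assumptions - check the data sheets of your actual parts.

DISKS = 12
DISK_SEQ_MB_S = 150.0               # sequential throughput of one 7.2k LFF disk
LANES = 8                           # lanes on the (single) controller
LANE_GBIT_S = 6.0                   # e.g. 6 Gbit/s SAS lanes
LANE_MB_S = LANE_GBIT_S * 1000 / 8  # ~750 MB/s per lane, ignoring encoding overhead

disk_demand = DISKS * DISK_SEQ_MB_S
controller_ceiling = LANES * LANE_MB_S

print(f"aggregate disk throughput: {disk_demand:.0f} MB/s")
print(f"controller ceiling:        {controller_ceiling:.0f} MB/s")
if disk_demand > controller_ceiling:
    print("controller is the bottleneck - add a second one despite the price")
else:
    print("one controller keeps up - saving the second one is defensible")
```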
When designing your network, keep in mind that one switch going down means that one rack will be gone. That may be a non-issue in a Y!-sized cluster; in your smaller-scale world it might be worth the money to invest in a second switch though: having 20 nodes go black isn't a lot of fun if you cannot farm out the work and re-replication to other nodes and racks. Also make sure to have enough ports in the rack switches for the machines you are planning to provision.
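To put a number on "20 nodes go black": a rough estimate of the re-replication work the surviving nodes have to absorb when a rack disappears. Node counts, stored data and per-node re-replication bandwidth are all assumptions.

```python
# How bad is losing a rack? Rough estimate of the re-replication traffic
# the rest of the cluster has to absorb. All numbers are assumptions.

NODES_LOST = 20
USED_TB_PER_NODE = 10.0      # HDFS data actually stored per lost node
SURVIVING_NODES = 80
NODE_REREPL_MB_S = 100.0     # bandwidth each survivor can spend on re-replication

# Every block that dropped below its replication target needs one new copy,
# so roughly the used capacity of the lost rack has to be copied once.
data_to_copy_tb = NODES_LOST * USED_TB_PER_NODE
aggregate_mb_s = SURVIVING_NODES * NODE_REREPL_MB_S
hours = data_to_copy_tb * 1e6 / aggregate_mb_s / 3600

print(f"{data_to_copy_tb:.0f} TB to re-replicate at {aggregate_mb_s / 1000:.0f} GB/s "
      f"aggregate: about {hours:.1f} hours of degraded redundancy")
```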
Avoid playing the ops whack-a-mole game by having one large cluster in the organisation rather than many different ones where possible. Multi-tenancy in Hadoop is still immature though.
If you want to play with future deployments, watch out for HP currently packing 270 servers into the space where today there are just two, via system-on-a-chip designs.