Reviewed: July 2015
With the advent of Hadoop 2.x we were promised YARN, a cluster resource manager that would secure Hadoop's position as a go-to big-data solution, taking it beyond MapReduce into the role of a multi-purpose platform. I was therefore really eager to pick up this title and learn how to utilise the capabilities of YARN. It seemed like a perfect fit: according to the authors' stated intention, Apache Hadoop™ YARN should allow the reader to master the details of Apache Hadoop YARN's design, its architecture and its place in the Hadoop ecosystem. Beyond that, the reader should learn how to install, configure and administer a cluster, and how to write both YARN applications and frameworks that run on top of YARN. Have they succeeded?
Sadly, I think the answer is "not quite", and I would not recommend this particular book to anyone other than a hardcore YARN fan. If you have already read the introductory YARN material on either Apache's or Hortonworks' web pages, you are probably after details, administration tips and insight into YARN architecture. You will find these here, and firsthand too; however, you might be in for a bit of disappointment.
First of all, by the time I received the book for review (October 2014), its content had become partially obsolete, which seems to be a common fate of publications that closely cover features of fast-moving open source products. Putting the inevitable aside, I have a few other things to point out. I was a bit surprised that the authors (Murthy, Vavilapalli, Markham, and Niemiec), who work for Hadoop's mothership, Hortonworks Inc, decided to cover Hadoop version 2.2.0 at a time when a subsequent version was almost ready. (Version 2.2.0 was the generally available release of Hadoop 2.x that introduced YARN in October 2013; version 2.3.0 shipped in February 2014, followed by 2.4.0 in April.) That decision might have led to my second issue with the book: to put it bluntly, it feels rushed. This might be a side effect of five authors co-operating on a text just over 300 pages long. It could explain why a few basic concepts are covered multiple times, whilst there is no outline of the direction in which YARN is heading. However, I would not expect authors of this calibre to release a book with scripts that do not work without minor fixes, and with a companion webpage (http://yarn-book.com/) that has not been updated for the last few months!
On the other hand, this is still the only book that covers YARN in detail. Holmes' 2nd edition of Hadoop in Practice focuses on YARN, and I would recommend that option for now (especially as the 2nd chapter is available free online). However, in its defence, Apache Hadoop™ YARN definitely gives us a better understanding of why things evolved into what we got in 2.x (if the reader is interested in such historical digressions).
Actually, I think it might be best to wait for Lam & Davis's 2nd edition of Hadoop in Action or for one of the best Hadoop books ever, Tom White's 4th edition of Hadoop: The Definitive Guide, in which the YARN chapter will hopefully be updated before it ships (the latest Hadoop releases, especially 2.6.0, have pushed YARN capabilities quite a bit further).
Of course, we could wish for a 2nd edition of Apache Hadoop™ YARN as well; however, I would rather see the authors focus on coding, as hardcore Hadoop users are definitely more interested in having the few remaining YARN issues ironed out, even at the cost of a concise, well-rounded title offering a bigger picture. After all, we have the source code, right?