Mark Watson
4.0 out of 5 stars
Ruby-centric tutorials on Semantic Web, Natural Language Processing, and Large-Scale Data Storage and Processing Technologies
July 4, 2009
This four-part book focuses on programming techniques and technologies that, in the author's opinion, can help next-generation web applications handle data more “intelligently”. The code samples are implemented in Ruby (with a little Java).
Part One (Chapters 1-3) introduces text and natural language processing. It samples tools and techniques for extracting raw text from various document types (e.g., PDF to plain text), classifying a document's subject matter (e.g., is this a document on “Health” or “Politics”?) or its overall sentiment (how positive or negative it is), and recognizing entities such as persons and places in text (e.g., does “Florida” in a given sentence refer to the U.S. state, which is a place entity, or to a person whose last name is Florida?).
Part Two (Chapters 4-7) provides tutorials on the Semantic Web, explaining what the RDF subject-predicate-object data format is and how a query language like SPARQL supports inferencing. The author also gives URLs for publicly available RDF data sets, along with tools and services useful for exploring them.
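As a rough illustration of the subject-predicate-object idea (this is not code from the book), the sketch below stores a few triples in plain Ruby and matches them against a pattern, loosely in the spirit of a single SPARQL triple pattern. The `ex:` identifiers, the `Triple` struct, and the `match` helper are all hypothetical names made up for this example; a real application would use an RDF library and a SPARQL endpoint instead.

```ruby
# Each RDF statement is a (subject, predicate, object) triple.
Triple = Struct.new(:subject, :predicate, :object)

# Hypothetical sample data; the "ex:" identifiers are invented for illustration.
TRIPLES = [
  Triple.new("ex:alice", "ex:knows", "ex:bob"),
  Triple.new("ex:bob",   "ex:knows", "ex:carol"),
  Triple.new("ex:alice", "ex:name",  "Alice")
]

# Return the triples that match a pattern. A nil field acts as a wildcard,
# loosely like a variable in a SPARQL triple pattern.
def match(triples, subject: nil, predicate: nil, object: nil)
  triples.select do |t|
    (subject.nil?   || t.subject == subject) &&
      (predicate.nil? || t.predicate == predicate) &&
      (object.nil?    || t.object == object)
  end
end

# Roughly analogous to: SELECT ?o WHERE { ex:alice ex:knows ?o }
known = match(TRIPLES, subject: "ex:alice", predicate: "ex:knows").map(&:object)
puts known.inspect
```

This only does exact pattern matching; it performs no inferencing, which in practice depends on schema languages and reasoners layered on top of the raw triples.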
Part Three (Chapters 8-12) covers object-relational mapping (e.g., ActiveRecord and DataMapper used in standalone mode), search technologies (e.g., Lucene and Sphinx), publishing relational data as RDF data, and strategies for large-scale data storage involving multiple servers, memcached, CouchDB, Amazon S3, or Amazon EC2.
Part Four (Chapters 13-15) includes a really good tutorial on using Hadoop-like MapReduce facilities for large-scale data processing, and ties things together by showing how the material from the previous chapters can be applied to developing more substantial web applications.
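For readers unfamiliar with the MapReduce model, here is a minimal single-process sketch in plain Ruby (again, not code from the book, and not how Hadoop itself is invoked). It counts words using separate map, shuffle, and reduce steps; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and chosen only for this example. It assumes Ruby 2.4 or later.

```ruby
# Map phase: emit a [word, 1] pair for every word in a line of text.
def map_phase(line)
  line.downcase.scan(/[a-z']+/).map { |word| [word, 1] }
end

# Shuffle phase: group the emitted values by key (the word).
def shuffle(pairs)
  pairs.group_by(&:first).transform_values { |kvs| kvs.map(&:last) }
end

# Reduce phase: collapse the list of counts for one word into a total.
def reduce_phase(word, counts)
  [word, counts.sum]
end

# Run all three phases in one process over an array of lines.
def word_count(lines)
  pairs = lines.flat_map { |line| map_phase(line) }
  shuffle(pairs).map { |word, counts| reduce_phase(word, counts) }.to_h
end

counts = word_count(["the cat sat", "the dog sat down"])
puts counts.inspect
```

In a real cluster the map and reduce steps run on different machines and the framework handles the shuffle; keeping the three steps as separate functions here is only meant to show that division of labor.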
The author uses many open-source gems (Ruby software libraries) and tools in this book, so in many cases you only need to write a limited amount of code to follow along. Most will work fine on Linux, Mac, or Windows, and with Ruby 1.8.x or 1.9; exceptions are reported clearly. If you don't want to download and install the gems, Appendix A explains how to apply for an Amazon Web Services account and access a ready-to-use Amazon Machine Image that the author put together for use on a rented Amazon EC2 server instance.
Because of the breadth of coverage, each technology can only be discussed to a limited depth. Some readers may find that adequate and some may not, depending on their interest in a particular topic. Most should still find the book a valuable resource, and the author explains things well and concisely.