Practical Technology

for practical people.

Solr: The Most Important Open Source Project You’ve Never Heard Of


When I did a story recently about in-demand open source jobs, I wasn’t surprised to hear from Dice that the job market was hot, hot, hot for OpenStack, for Big Data (Hadoop in particular), and for the LAMP stack (Linux, Apache, MySQL, PHP/Perl/Python). What did surprise me, indeed shocked me, was that another red-hot tech jobs area was Solr.

“Solr?” I wondered. “What the heck is Solr?” That was also the reaction of all of my developer friends. And, since my buddies and I, among the lot of us, have centuries in the tech business, we’ve seen a lot of programs. That “Oh, did you say money?” item sent me off to do my research. Now I’ll let you in on the secrets of Solr.

One reason why Solr may not have gained the attention it deserves is that it’s actually a part of a larger and much better-known open source project, Apache Lucene. This, as I’m sure you know, is a Java-based text search engine library.

Lucene is used by many companies and groups as the foundation for their search engines. These organizations include AOL, Disney, and Eclipse. Lucene’s chief selling point is that the indexing engine, with a footprint of a mere megabyte of RAM, can index up to 150GB of text per hour on commercial off-the-shelf hardware. That’s darn good!

Solr comes into the picture as the search platform front-end for Lucene. It provides full-text search, including the ability to handle such formats as Microsoft Word and PDF with Apache Tika; hit highlighting; and faceted search, which combines free text searching with topic taxonomy indexing.
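To make faceting and highlighting a bit more concrete, here is a minimal Python sketch of the query parameters involved. The parameter names (q, rows, facet, facet.field, hl, hl.fl) are Solr's standard ones; the field names "category" and "content" and the helper function are my own assumptions for illustration, not part of any real schema.

```python
from urllib.parse import urlencode


def build_search_params(text, facet_field, highlight_field, rows=10):
    """Return a URL-encoded Solr query string for a faceted, highlighted search.

    facet_field and highlight_field are hypothetical schema fields.
    """
    params = [
        ("q", text),                  # the free-text query itself
        ("rows", rows),               # how many hits to return
        ("facet", "true"),            # switch faceting on
        ("facet.field", facet_field), # count hits per value of this field
        ("hl", "true"),               # switch hit highlighting on
        ("hl.fl", highlight_field),   # field to pull highlighted snippets from
    ]
    return urlencode(params)


query_string = build_search_params("open source search", "category", "content")
print(query_string)
```

The resulting string is what you would append to a Solr select URL; the facet counts and highlighted snippets come back alongside the ordinary hits.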

Like Lucene, Solr is very popular (even if I didn’t know about it before now). It’s used by sites such as Reddit, Netflix, and Instagram. These are all websites whose users won’t stand for slow response time. Solr can deliver the kind of performance that cranky users demand.

Under the hood, Solr is written in Java and it relies on Lucene for its core functionality. It usually runs within a servlet container such as the Jetty HTTP server, which implements the javax.servlet API.

Solr has REST-like HTTP/XML and JavaScript Object Notation (JSON) APIs for ease of programming from almost any language. So, while you can work with Solr using its native Java, you also can use your language of choice. For example, query results can be returned in XML/XSLT, JSON, Python, Ruby, PHP, Velocity, CSV, or binary formats. You can use this data with whatever package strikes your fancy.
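Here is a rough sketch of what that looks like from Python, using nothing but the standard library. The host and port are Solr's usual local defaults, but the core name "articles", the helper functions, and the sample response body are assumptions of mine so the example runs without a live server. The wt parameter is how you pick the response format.

```python
import json
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed local Solr install
CORE = "articles"                         # hypothetical core name


def select_url(query, fmt="json", rows=10):
    """Build the URL for Solr's /select handler; wt chooses the response format."""
    params = urlencode({"q": query, "wt": fmt, "rows": rows})
    return f"{SOLR_BASE}/{CORE}/select?{params}"


def extract_docs(body):
    """Pull the hit count and documents out of a Solr JSON response body."""
    data = json.loads(body)
    response = data["response"]
    return response["numFound"], response["docs"]


# Made-up response in the shape Solr returns for wt=json.
SAMPLE_BODY = """
{"responseHeader": {"status": 0, "QTime": 1},
 "response": {"numFound": 1, "start": 0,
              "docs": [{"id": "1", "title": "Hello Solr"}]}}
"""

url = select_url("title:solr")
found, docs = extract_docs(SAMPLE_BODY)
print(url)
print(found, docs[0]["title"])
```

Against a running server you would fetch the URL (with urllib.request.urlopen, say) and hand the body to extract_docs. Swapping fmt for "xml" or "csv" changes the format Solr sends back, at which point you would parse it with something other than json.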

While Solr is built on Lucene, it also expands upon it. For instance, it supports sharded data, geospatial search, and user-extensible caching. The end result is a very fast and flexible back end for almost any Web search engine job.
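As a hedged illustration of the first two of those, this sketch builds the parameters for a radius search spread across two shards. The geofilt filter and the shards parameter are standard Solr features; the "location" field, the shard host names, and the helper function are hypothetical.

```python
from urllib.parse import urlencode, parse_qs


def geo_sharded_params(lat, lon, radius_km, location_field, shard_urls):
    """Build Solr parameters for a radius filter run across several shards.

    location_field and shard_urls are placeholders for your own setup.
    """
    # geofilt keeps only documents within radius_km of the given point.
    geofilter = f"{{!geofilt sfield={location_field} pt={lat},{lon} d={radius_km}}}"
    params = {
        "q": "*:*",                      # match everything, then filter
        "fq": geofilter,                 # the geospatial filter query
        "shards": ",".join(shard_urls),  # shards to fan the query out to
    }
    return urlencode(params)


shards = ["host1:8983/solr/articles", "host2:8983/solr/articles"]
geo_query = geo_sharded_params(45.15, -93.85, 5, "location", shards)
decoded = parse_qs(geo_query)
print(decoded["fq"][0])
print(decoded["shards"][0])
```

Spelling out shards by hand like this is the manual approach to distributed search; treat it as a sketch of the idea rather than a recommended deployment.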

With its exhaustive documentation, the program promises to make it easy to get up to speed. As for administrators, with an AJAX-based administration interface and comprehensive logging facilities, Solr is simple to manage.

While Solr is clearly useful and easy, there aren't enough Solr experts out there. According to Dice, on September 11th, 2013, there were no fewer than 318 Solr jobs listed. Many of these job listings had a phrase like, “Solr experience is a must.” If you’re interested in pursuing this in-demand open source job skill, big data experience is a real plus. In particular, Hadoop and its close relative, HBase, were frequently mentioned. And, of course, if you can do all this on a cloud architecture, so much the better.

So, in short, if you’d like a programming job sooner rather than later, don’t be like me and my buddies. Learn about Solr today so your resume will look better tomorrow.

Solr: The Most Important Open Source Project You’ve Never Heard Of. This story first appeared on the SmartBear Web site.
