Testing Lucene and Solr with various JVMs: Bugs, Bugs, Bugs

Track: search
Speaker(s): 

When Oracle released Java 7 GA in July 2011, it contained a serious loop optimization bug that affected Apache projects such as Lucene and Solr. Had no public warning been posted to users, many Lucene indexes might have been corrupted by the misbehaving optimization code in Oracle’s HotSpot compiler. Unfortunately, the bug was found too late for Oracle to fix or disable the optimizations before the release.

Since this problem was detected, the Lucene committers have been working on including JVMs from different vendors in their Jenkins builds, combined with various optimization settings, platforms, word sizes, and garbage collectors. Together with Lucene’s randomized testing framework, this helps to detect bugs early and to warn users with an up-to-date list of broken Java versions. The Lucene committers also found bugs in JVMs from other vendors that led to corrupt indexes. Running not only static tests but also tests that generate their data from repeatable random number generators helped to find further bugs in the Java class library (e.g., in regular expressions, BreakIterators, and localization).
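The idea of repeatable random test data can be illustrated with a minimal Java sketch. It is not Lucene's actual test framework: the class name, the `example.seed` system property, and the UTF-8 round-trip property are all assumptions chosen for illustration. The point is only that every random value derives from one master seed, so a failing run can be replayed exactly.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

// Illustrative sketch only; names and the tested property are hypothetical.
public class RepeatableRandomTest {

    // Build a random string of valid Unicode code points (surrogates are skipped).
    static String randomUnicodeString(Random random, int maxLength) {
        int length = random.nextInt(maxLength + 1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            int cp;
            do {
                cp = random.nextInt(Character.MAX_CODE_POINT + 1);
            } while (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE);
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }

    // Property under test: encoding to UTF-8 and decoding again returns the input.
    static boolean roundTrips(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8).equals(s);
    }

    // Run many iterations from one master seed.
    // Returns -1 on success, otherwise the index of the first failing iteration.
    static int run(long seed, int iterations) {
        Random random = new Random(seed);
        for (int i = 0; i < iterations; i++) {
            if (!roundTrips(randomUnicodeString(random, 50))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical property: pass -Dexample.seed=<number> to replay a run.
        Long fixed = Long.getLong("example.seed");
        long seed = (fixed != null) ? fixed : System.nanoTime();
        int failed = run(seed, 1000);
        if (failed >= 0) {
            throw new AssertionError("iteration " + failed
                + " failed; reproduce with -Dexample.seed=" + seed);
        }
        System.out.println("OK with seed " + seed);
    }
}
```

Because the seed is printed on failure, a bug that shows up only for one unusual input, or only on one JVM, can be reproduced later on the same JVM with the same data.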

This talk will present the techniques used to customize Jenkins so that it randomly selects JDKs and optimization options. Unfortunately, this is not yet done on Apache’s own Jenkins build farm. To widen the range of Java installations and operating systems covered, several sponsored external servers run the thorough, randomized Lucene/Solr test suite 24/7, selecting Java versions from a huge list of options. The talk will also present some of the bugs found (including the famous Java 7 bug) and risky optimization settings that users commonly apply.
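As a rough sketch of what randomly selecting a JDK and JVM options could look like, the following Java program picks one entry from each list using a seeded random number generator. It is not the Jenkins customization presented in the talk: the class name and the JDK install paths are hypothetical, and the lists are deliberately tiny. The flags shown are standard HotSpot options; the compressed-oops switches only apply to 64-bit JVMs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Illustrative sketch only; not the actual build server configuration.
public class RandomJvmPicker {

    // Hypothetical install locations; a real build server would list its own.
    static final List<String> JDK_HOMES = Arrays.asList(
        "/opt/jdk/vendor-a-7", "/opt/jdk/vendor-a-8", "/opt/jdk/vendor-b-7");

    // Garbage collector choices (HotSpot flags).
    static final List<String> GC_FLAGS = Arrays.asList(
        "-XX:+UseSerialGC", "-XX:+UseParallelGC", "-XX:+UseG1GC");

    // Compressed object pointers on or off (64-bit HotSpot only).
    static final List<String> OOPS_FLAGS = Arrays.asList(
        "-XX:+UseCompressedOops", "-XX:-UseCompressedOops");

    // Pick one element uniformly at random.
    static <T> T pick(Random random, List<T> options) {
        return options.get(random.nextInt(options.size()));
    }

    // Derive the java executable and its flags from a single seed,
    // so the same seed always yields the same combination.
    static List<String> chooseCommand(long seed) {
        Random random = new Random(seed);
        List<String> command = new ArrayList<>();
        command.add(pick(random, JDK_HOMES) + "/bin/java");
        command.add(pick(random, GC_FLAGS));
        command.add(pick(random, OOPS_FLAGS));
        return command;
    }

    public static void main(String[] args) {
        long seed = args.length > 0 ? Long.parseLong(args[0]) : System.nanoTime();
        System.out.println("seed=" + seed + " command="
            + String.join(" ", chooseCommand(seed)));
    }
}
```

Logging the seed next to the chosen combination means that a failing build can be rerun with exactly the same JVM and flags.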

About the speaker: 
Uwe is a committer and PMC member of Apache Lucene and Solr. His main focus is the development of Lucene Java. He implemented fast numerical search and maintains the new attribute-based text analysis API. He studied physics at the University of Erlangen-Nuremberg and works as managing director of SD DataSolutions GmbH in Bremen, Germany, a company that provides consulting and support for Apache Lucene, ElasticSearch, and Apache Solr. A primary customer of his company is “PANGAEA – Publishing Network for Geoscientific & Environmental Data”, for which he implemented the portal's geo-spatial retrieval functions with Lucene Java. Uwe has given talks about Lucene at various international conferences, including the previous Berlin Buzzwords, ApacheCon EU/US, Lucene Revolution, and Lucene Eurocon, as well as at various local meetups.

Schedule info

Time slot: 3 June 14:45 - 15:30
Room: Kesselhaus