Language support and linguistics in Lucene/Solr/elasticsearch and the open source and commercial eco-system


In search, language processing is often key to a good search experience. This talk gives an overview of the language handling and linguistics functionality in Lucene/Solr/elasticsearch and best practices for using it in Western, Asian and multi-language deployments. It also gives pointers and references to more advanced linguistics and their applications within the open source and commercial eco-systems.

The presentation is a mix of overview and hands-on best practices that the audience can benefit from immediately in their Lucene, Solr or elasticsearch deployments. The eco-system part is meant to inspire ideas for how more advanced functionality can be developed using available open source technologies, predominantly from within the Apache eco-system, while also highlighting some of the commercial options available.

Attendees of the session will:

  1. Get an overview of the linguistics functionality available in Lucene, Solr and elasticsearch

  2. Understand best practices for working with common languages, including European, Asian and multi-language deployments

  3. Get pointers to relevant open source software and commercial options for more advanced linguistics and their applications within search

About the speaker: 
Christian Moen is an Apache Lucene/Solr Committer and a software engineer at Atilika Inc., a Japanese corporation providing innovative products and services within search, natural language processing and big data to leading businesses worldwide. Christian has 13+ years of experience with search across a range of industries and holds an M.Sc. in computer science from the University of Oslo, Norway.

Schedule info

Time slot: 
3 June 16:50 - 17:35