Stratio's Cassandra Lucene index: Geospatial use cases

Thursday 17th

16:30 to 17:10

Stratio’s Cassandra Lucene Index, derived from Stratio Cassandra, is an open source plugin for Apache Cassandra that extends its index functionality to provide near real-time search similar to that of Elasticsearch or Solr, including full-text search capabilities and free multivariable, geospatial and bitemporal search. This is achieved through an Apache Lucene based implementation of Cassandra secondary indexes, where each node of the cluster indexes its own data. Stratio’s Cassandra indexes are one of the core modules on which Stratio’s Big Data platform is based.

During our talk, we will discuss the recently added geospatial search features in Stratio's Cassandra Lucene index using some Nephila Capital use cases. These new features include indexing complex polygons, nearest neighbour search, and the application of chained geometrical transformations such as bounding box, convex hull, centroid, union, intersection, exclusion and distance buffer.

Nephila Capital’s main business is to provide property reinsurance against natural catastrophes such as hurricanes or earthquakes. It relies heavily on geospatial tools to index and search properties, detect and fix anomalies, cluster risks, analyze and model peril footprints, and report risks.

We will start with a brief explanation of how Stratio’s Lucene-based index works, including state of the art, architecture, installation and usage examples. Spark integration and tools will also be discussed. Stratio’s Lucene index is a general purpose indexing tool, so this first part of the talk will not focus on geospatial search features.

The second part of the talk will start with a very quick review of the geospatial search features that were presented during the last edition of Big Data Spain. Then, we will show the new geospatial features that have been added during our collaboration with Nephila Capital. These new functionalities use the open source Java Topology Suite (JTS) library through its integration with Lucene’s Spatial4j to index complex geospatial shapes, including points, linestrings, polygons, multipoints, multilines, multipolygons and arbitrary shape collections. Finally, we will show the new transformations API, which allows geometrical transformations to be applied recursively to shapes. These transformations can be applied both at index time and at search time.
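To make the idea of chained transformations concrete, the following is a minimal Python sketch in which each transformation takes a shape and returns a new shape, so that the output of one step feeds the next. It is not the Stratio transformations API: the function names (`convex_hull`, `bounding_box`, `centroid`, `apply_chain`) and the representation of a shape as a plain list of (x, y) tuples are assumptions made purely for illustration, and the centroid shown is a simple vertex average rather than an area-weighted one.

```python
from functools import reduce


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of the cross product (a - o) x (b - o)
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each half: it repeats the start of the other half.
    return lower[:-1] + upper[:-1]


def bounding_box(points):
    """Axis-aligned rectangle enclosing all points, as four corners."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return [(min(xs), min(ys)), (max(xs), min(ys)),
            (max(xs), max(ys)), (min(xs), max(ys))]


def centroid(points):
    """Vertex average (hypothetical simplification), as a one-point shape."""
    n = len(points)
    return [(sum(x for x, _ in points) / n, sum(y for _, y in points) / n)]


def apply_chain(shape, transformations):
    """Apply each transformation to the result of the previous one."""
    return reduce(lambda current, transform: transform(current),
                  transformations, shape)


shape = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]
result = apply_chain(shape, [convex_hull, bounding_box, centroid])
```

Because every step has the same shape-in, shape-out signature, the chain can be of any length and in any order, which is the property that would let the same pipeline run either when a shape is indexed or when it is used as a search area.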

To discuss the application of the new Stratio's Cassandra Lucene index features, we will use a Cassandra cluster that stores and indexes several million geographical shapes taken from the US census database. These use cases will include searching for census blocks inside a geographical area, building heat maps using distances to fire and police stations, and searching for properties that lie in the trajectory of a hurricane.
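As a rough illustration of the geometry behind two of these use cases, the sketch below tests whether a point lies inside a polygonal search area and computes the distance from a location to its nearest station, which is the value a heat map cell would display. It runs in plain Python on in-memory data and does not query Cassandra or the Lucene index; the function names, the (lat, lon) and (x, y) tuple conventions and the sample coordinates are all assumptions chosen for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station_km(point, stations):
    """Distance from a (lat, lon) point to the closest station."""
    return min(haversine_km(point[0], point[1], s[0], s[1]) for s in stations)


def point_in_polygon(point, polygon):
    """Ray casting test on planar (x, y) coordinates."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical sample data: a square search area and two stations.
area = [(0, 0), (2, 0), (2, 2), (0, 2)]
stations = [(0.0, 0.0), (0.0, 3.0)]

# One heat map row: nearest-station distance for points along the equator.
heat_row = [nearest_station_km((0.0, lon), stations) for lon in (0.0, 1.0, 2.0)]
```

A brute-force scan like this costs one distance computation per station per cell, which is why a spatial index becomes necessary at the scale of millions of shapes.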

Andrés de la Peña García

Stratio Big Data Software Architect