SANSA 0.8.0 RC1 (Semantic Analytics Stack) Released

The Smart Data Analytics group [1] is happy to announce the release candidate (0.8.0 RC1) of SANSA, the Scalable Semantic Analytics Stack. SANSA uses distributed computing via Apache Spark to provide scalable machine learning, inference, and querying capabilities for large knowledge graphs.

You can find the FAQ and usage examples at

The following features are currently supported by SANSA:

  • Reading and writing RDF files in N-Triples, Turtle, RDF/XML, and N-Quads formats
  • Reading OWL files in various standard formats
  • SPARQL querying via Sparqlify, Ontop, and Tensors
  • RDFS, RDFS Simple and OWL-Horst forward chaining inference

Noteworthy changes and updates since the previous release are:

  • Support for an Ontop-based query engine over RDF.
  • Distributed TriG/Turtle record reader.
  • Support for writing RDDs of OWL axioms in a variety of formats.
  • Distributed Data Summaries with ABstraction and STATistics (ABSTAT).
  • Configurable mapping of RDDs of triples to DataFrames.
  • Initial support for RDDs of graphs and datasets, executing queries on each entry and aggregating the results.
  • SPARQL Transformer for ML pipelines.
  • Automatic SPARQL query generation for feature extraction.
  • Distributed feature-based semantic similarity estimation.
  • Added a common R2RML abstraction layer for Ontop, Sparqlify, and possible future query engines.
  • Consolidated the SANSA layers into a single Git repository.
  • Retired support for Apache Flink.

We look forward to your feedback on the new features before they are finalized in our upcoming 0.8 release.

Please note that the release candidate is not available on Maven Central; please follow the README instead.

We want to thank everyone who helped create this release, in particular the projects supporting us: PLATOON, BETTER, BOOST, SPECIAL, Simple-ML, LAMBDA, ML-win, CALLISTO, OpertusMundi, and Cleopatra.

Greetings from the SANSA Development Team