SANSA 0.4 (Semantic Analytics Stack) Released

We are happy to announce SANSA 0.4 – the fourth release of the Scalable Semantic Analytics Stack. SANSA uses distributed computing via Apache Spark and Flink to provide scalable machine learning, inference, and querying capabilities for large knowledge graphs.

You can find the FAQ and usage examples at http://sansa-stack.net/faq/.

The following features are currently supported by SANSA:

  • Reading and writing RDF files in the N-Triples, Turtle, RDF/XML, and N-Quads formats (see the code sketch after this list)
  • Reading OWL files in various standard formats
  • Support for multiple data partitioning techniques
  • SPARQL querying via Sparqlify
  • Graph-parallel querying of RDF using SPARQL (1.0) via GraphX traversals (experimental)
  • RDFS, RDFS Simple, OWL-Horst, EL (experimental) forward chaining inference
  • Automatic inference plan creation (experimental)
  • RDF graph clustering with different algorithms
  • Terminological decision trees (experimental)
  • Anomaly detection (beta)
  • Knowledge graph embedding approaches: TransE (beta), DistMult (beta)
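
To illustrate the RDF reading and Sparqlify-based querying features above, here is a minimal sketch in Scala for Apache Spark. The package names (net.sansa_stack.rdf.spark.io, net.sansa_stack.query.spark.query), the rdf and sparql methods, and the file path are assumptions drawn from the SANSA example projects and may differ slightly in the 0.4 API, so please consult the examples and the FAQ for the exact usage.

    import org.apache.jena.riot.Lang
    import org.apache.spark.sql.SparkSession

    // Implicit RDF I/O and SPARQL operations. Package and method names are
    // assumptions based on the SANSA example projects and may differ per release.
    import net.sansa_stack.rdf.spark.io._
    import net.sansa_stack.query.spark.query._

    object SansaQuickStart {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("SANSA quick start (sketch)")
          .master("local[*]") // replace with your cluster master in production
          .getOrCreate()

        // Read an N-Triples file into an RDD of Jena triples (assumed API).
        val triples = spark.rdf(Lang.NTRIPLES)("data/sample.nt") // hypothetical path

        println(s"Loaded ${triples.count()} triples")

        // SPARQL querying via Sparqlify (assumed API); the result is a DataFrame.
        val result = triples.sparql("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10")
        result.show(truncate = false)

        spark.stop()
      }
    }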

Noteworthy changes or updates since the previous release are:

  • Parser performance has been improved significantly; e.g., DBpedia 2016-10 can be loaded in under 100 seconds on a 7-node cluster
  • Support for a wider range of data partitioning strategies
  • A more unified API across data representations (RDD, DataFrame, Dataset, Graph) for triple operations
  • Improved unit test coverage
  • Improved distributed statistics calculation (see ISWC paper)
  • Initial scalability tests on 6 billion triples of Ethereum blockchain data on a 100-node cluster
  • New SPARQL-to-GraphX rewriter aimed at providing better performance for queries that exploit graph locality
  • Numeric outlier detection tested on DBpedia (en)
  • Improved clustering tested on 20 GB RDF data sets

Deployment and getting started:

  • Template projects for SBT and Maven are available for both Apache Spark and Apache Flink to help you get started.
  • The SANSA JAR files are on Maven Central, i.e. in most IDEs you can simply search for “sansa” to add the dependencies to your Maven projects (an illustrative snippet is shown below).
  • Example code is available for various tasks.
  • We provide interactive notebooks for running and testing code via Docker.
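
As a rough illustration of how to pull SANSA into an SBT build, the snippet below uses assumed Maven coordinates (group ID net.sansa-stack, artifact names sansa-rdf-spark and sansa-query-spark, version 0.4.0); please verify the exact coordinates on Maven Central before using them.

    // build.sbt (sketch) -- coordinates are illustrative, verify on Maven Central
    scalaVersion := "2.11.11"

    libraryDependencies ++= Seq(
      // SANSA RDF layer for Apache Spark (assumed coordinates)
      "net.sansa-stack" %% "sansa-rdf-spark"   % "0.4.0",
      // SANSA query layer (Sparqlify-based SPARQL engine), also assumed
      "net.sansa-stack" %% "sansa-query-spark" % "0.4.0"
    )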

We want to thank everyone who helped to create this release, in particular the projects Big Data Europe, HOBBIT, SAKE, Big Data Ocean, SLIPO, QROWD, BETTER, BOOST and SPECIAL.

Spread the word by retweeting our release announcement on Twitter. For more updates, please view our Twitter feed and consider following us.

Greetings from the SANSA Development Team

SANSA 0.3 (Semantic Analytics Stack) Released

We are happy to announce SANSA 0.3 – the third release of the Scalable Semantic Analytics Stack. SANSA uses distributed computing via Apache Spark and Flink to provide scalable machine learning, inference, and querying capabilities for large knowledge graphs.

You can find the FAQ and usage examples at http://sansa-stack.net/faq/.

The following features are currently supported by SANSA:

  • Reading and writing RDF files in the N-Triples, Turtle, RDF/XML, and N-Quads formats
  • Reading OWL files in various standard formats
  • Support for multiple data partitioning techniques
  • SPARQL querying via Sparqlify (with some known limitations until the next Spark 2.3.* release)
  • SPARQL querying via conversion to Gremlin path traversals (experimental)
  • RDFS, RDFS Simple, OWL-Horst (all in beta status), EL (experimental) forward chaining inference
  • Automatic inference plan creation (experimental)
  • RDF graph clustering with different algorithms
  • Rule mining from RDF graphs based on AMIE+
  • Terminological decision trees (experimental)
  • Anomaly detection (beta)
  • Distributed knowledge graph embedding approaches: TransE (beta), DistMult (beta), with several further algorithms planned

Deployment and getting started:

  • Template projects for SBT and Maven are available for both Apache Spark and Apache Flink to help you get started.
  • The SANSA JAR files are on Maven Central, i.e. in most IDEs you can simply search for “sansa” to add the dependencies to your Maven projects.
  • Example code is available for various tasks.
  • We provide interactive notebooks for running and testing code via Docker.

We want to thank everyone who helped to create this release, in particular the projects Big Data Europe, HOBBIT, SAKE, Big Data Ocean, SLIPO, QROWD and BETTER.

Greetings from the SANSA Development Team

SML-Bench 0.2 Released

Dear all,

we are happy to announce the 0.2 release of SML-Bench, our Structured Machine Learning benchmark framework. SML-Bench provides full benchmarking scenarios for inductive supervised machine learning, covering different knowledge representation languages such as OWL and Prolog. It already comes with adapters for prominent inductive learning systems like the DL-Learner, the General Inductive Logic Programming System (GILPS), and Aleph, as well as Inductive Logic Programming ‘classics’ like Golem and Progol. The framework is easily extensible, be it with new benchmarking scenarios or with support for new learning systems. SML-Bench allows you to define, run, and report on benchmarks that combine different scenarios and learning systems, giving insight into the performance characteristics of the respective inductive learning algorithms on a wide range of learning problems.

Website: http://sml-bench.aksw.org/
GitHub page: https://github.com/AKSW/SML-Bench/
Change log: https://github.com/AKSW/SML-Bench/releases/tag/0.2

In the current release we extended the options for configuring learning systems in the overall benchmarking configuration and added support for running multiple instances of a learning system, as well as for nesting instance-specific settings with settings that apply to all instances of a learning system. Besides internal refactoring to increase the overall software quality, we also extended the reporting capabilities for benchmark results. We added a new benchmark scenario and experimental support for the Statistical Relational Learning system TreeLiker.

We want to thank everyone who helped to create this release and appreciate any feedback.

Best regards,

Patrick Westphal, Simon Bin, Lorenz Bühmann and Jens Lehmann