We are very pleased to announce that our group has had a paper accepted for presentation at IEEE ICSC 2021. The 15th IEEE International Conference on Semantic Computing (ICSC 2021) addresses the derivation, description, integration, and use of semantics ("meaning", "context", "intention") for all types of resources, including data, documents, tools, devices, processes, and people. The scope of ICSC 2021 includes, but is not limited to, analytics, semantics description languages and integration (of data and services), interfaces, and applications.
Here are the abstract and the link to the paper (we also provide a preprint):
Scalable Distributed in-Memory Semantic Similarity Estimation for RDF Knowledge Graphs with DistSim
By Carsten Draschner, Jens Lehmann, and Hajira Jabeen
In this paper, we present DistSim, a Scalable Distributed in-Memory Semantic Similarity Estimation framework for Knowledge Graphs. DistSim provides a multitude of state-of-the-art similarity estimators. We have developed the Similarity Estimation Pipeline by combining generic software modules. For large-scale RDF data, DistSim proposes MinHash with locality-sensitive hashing to achieve better scalability than all-pair similarity estimation. The modules of DistSim can be configured through a multitude of (hyper)parameters, allowing users to adjust the trade-off between the information taken into account and the processing time. Furthermore, the output of the Similarity Estimation Pipeline is native RDF. DistSim is integrated into the SANSA stack, documented in Scaladoc, and covered by unit tests. Additionally, its variables and methods follow the Apache Spark MLlib namespace conventions. The performance of DistSim was evaluated on a distributed cluster by measuring processing time against data set size and against processing power, demonstrating DistSim's scalability with respect to increasing data set sizes and processing power. DistSim is already in use for several RDF data analytics use cases, and it is available as part of the open-source SANSA project on GitHub.
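The abstract names MinHash with locality-sensitive hashing (LSH) as the way DistSim avoids comparing every pair of entities. The sketch below illustrates only that general technique. It is a small, single-machine Python version, not DistSim's distributed SANSA implementation.

It assumes each entity is represented by a set of hypothetical "predicate=object" features. The function names, the toy entities, and the parameters num_perm, bands, and threshold are illustrative choices.

```python
import hashlib
from itertools import combinations


def jaccard(a, b):
    """Exact Jaccard similarity |A & B| / |A | B| (the all-pair baseline measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def _seeded_hash(feature, seed):
    # Deterministic 64-bit hash; each seed simulates one random permutation.
    digest = hashlib.blake2b(f"{seed}:{feature}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(features, num_perm=64):
    # Keep the minimum hash value per simulated permutation (features must be non-empty).
    return [min(_seeded_hash(f, seed) for f in features) for seed in range(num_perm)]


def estimate_jaccard(sig_a, sig_b):
    # The fraction of agreeing signature positions estimates the Jaccard similarity.
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidates(signatures, bands=32):
    # Banding: split each signature into `bands` chunks of `rows` values
    # (bands should divide num_perm); entities sharing any identical chunk
    # fall into the same bucket and become a candidate pair.
    num_perm = len(next(iter(signatures.values())))
    rows = num_perm // bands
    candidates = set()
    for b in range(bands):
        buckets = {}
        for entity, sig in signatures.items():
            key = tuple(sig[b * rows:(b + 1) * rows])
            buckets.setdefault(key, []).append(entity)
        for members in buckets.values():
            candidates.update(combinations(sorted(members), 2))
    return candidates


def similar_pairs(entities, threshold=0.5, num_perm=64, bands=32):
    # Only LSH candidates are compared, instead of all n*(n-1)/2 pairs;
    # each candidate is then verified with the exact Jaccard similarity.
    sigs = {e: minhash_signature(f, num_perm) for e, f in entities.items()}
    result = {}
    for a, b in lsh_candidates(sigs, bands):
        sim = jaccard(entities[a], entities[b])
        if sim >= threshold:
            result[(a, b)] = sim
    return result


# Hypothetical toy entities, each described by "predicate=object" features.
entities = {
    "ex:Alice": {"rdf:type=ex:Person", "ex:worksAt=ex:UniBonn", "ex:knows=ex:Bob"},
    "ex:AliceDup": {"rdf:type=ex:Person", "ex:worksAt=ex:UniBonn", "ex:knows=ex:Bob"},
    "ex:Carol": {"rdf:type=ex:Person", "ex:worksAt=ex:UniBonn", "ex:knows=ex:Dave"},
    "ex:Paper1": {"rdf:type=ex:Paper", "ex:venue=ex:ICSC2021"},
}

if __name__ == "__main__":
    for (a, b), sim in sorted(similar_pairs(entities).items()):
        print(a, b, round(sim, 2))
```

With bands=32 and two rows per band, a pair with Jaccard similarity s becomes a candidate with probability 1 - (1 - s^2)^32. Highly similar pairs are therefore almost always compared, while dissimilar pairs rarely are. Using more rows per band makes the filter stricter and cheaper, one concrete instance of the information-versus-processing-time trade-off the abstract mentions.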