Papers accepted at ISWC 2017 – Smart Data Analytics

We are very pleased to announce that our group got 6 papers accepted for presentation at ISWC 2017, which will be held on 21-24 October in Vienna, Austria.
The International Semantic Web Conference (ISWC) is the premier international forum where Semantic Web / Linked Data researchers, practitioners, and industry specialists come together to discuss, advance, and shape the future of semantic technologies on the web, within enterprises and in the context of the public institution.

Here is the list of the accepted paper with their abstract:

“Distributed Semantic Analytics using the SANSA Stack” by Jens Lehmann, Gezim Sejdiu, Lorenz Bühmann, Patrick Westphal, Claus Stadler, Ivan Ermilov, Simon Bin, Muhammad Saleem, Axel-Cyrille Ngonga Ngomo and Hajira Jabeen.

Abstract: A major research challenge is to perform scalable analysis of large-scale knowledge graphs to facilitate applications like link prediction, knowledge base completion and reasoning. Analytics methods which exploit expressive structures usually do not scale well to very large knowledge bases, and most analytics approaches which do scale horizontally (i.e., can be executed in a distributed environment) work on simple feature-vector-based input. This software framework paper describes the ongoing Semantic Analytics Stack (SANSA) project, which supports expressive and scalable semantic analytics by providing functionality for distributed computing on RDF data.

“A Corpus for Complex Question Answering over Knowledge Graphs” by Priyansh Trivedi, Gaurav Maheshwari, Mohnish Dubey and Jens Lehmann.

Abstract: Being able to access knowledge bases in an intuitive way has been an active area of research over the past years. In particular, several question answering (QA) approaches which allow to query RDF datasets in natural language have been developed as they allow end users to access knowledge without needing to learn the schema of a knowledge base and learn a formal query language. To foster this research area, several training datasets have been created, e.g.~in the QALD (Question Answering over Linked Data) initiative. However, existing datasets are insufficient in terms of size, variety or complexity to apply and evaluate a range of machine learning based QA approaches for learning complex SPARQL queries. With the provision of the Large-Scale Complex Question Answering Dataset (LC-QuAD), we close this gap by providing a dataset with 5000 questions and their corresponding SPARQL queries over the DBpedia dataset.In this article, we describe the dataset creation process and how we ensure a high variety of questions, which should enable to assess the robustness and accuracy of the next generation of QA systems for knowledge graphs.

“Iguana : A Generic Framework for Benchmarking the Read-Write Performance of Triple Stores” by Felix Conrads, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Muhammad Saleem and Mohamed Morsey.

Abstract : The performance of triples stores is crucial for applications which rely on RDF data. Several benchmarks have been proposed that assess the performance of triple stores. However, no integrated benchmark-independent execution framework for these benchmarks has been provided so far. We propose a novel SPARQL benchmark execution framework called IGUANA. Our framework complements benchmarks by providing an execution environment which can measure the performance of triple stores during data loading, data updates as well as under different loads. Moreover, it allows a uniform comparison of results on different benchmarks. We execute the FEASIBLE and DBPSB benchmarks using the IGUANA framework and measure the performance of popular triple stores under updates and parallel user requests. We compare our results with state-of-the-art benchmarking results and show that our benchmark execution framework can unveil new insights pertaining to the performance of triple stores.

“Sustainable Linked Data generation: the case of DBpedia” by Wouter Maroy, Anastasia Dimou, Dimitris Kontokostas, Ben De Meester, Jens Lehmann, Erik Mannens and Sebastian Hellmann.

Abstract : DBpedia EF, the generation framework behind one of the Linked Open Data cloud’s central interlinking hubs, has limitations regarding the quality, coverage and sustainability of the generated dataset. Hence, DBpedia can be further improved both on schema and data level. Errors and inconsistencies can be addressed by amending (i) the DBpediaEF; (ii) the DBpedia mapping rules; or (iii) Wikipedia itself. However, even though the DBpedia ef is continuously evolving and several changes were applied to both the DBpedia EF and mapping rules, there are no significant improvements on the DBpedia dataset since the identification of its limitations. To address these shortcomings, we propose adapting a different semantic-driven approach that decouples, in a declarative manner, the extraction, transformation and mapping rules execution. In this paper, we provide details regarding the new DBpedia EF, its architecture, technical implementation and extraction results. This way, we achieve an enhanced data generation process for DBpedia, which can be broadly adopted, that improves its quality, coverage and sustainability.

“Realizing an RDF-based Information Model for a Manufacturing Company – A Case Study” by Niklas Petersen, Lavdim Halilaj, Irlán Grangel-González, Steffen Lohmann, Christoph Lange and Sören Auer.

Abstract: The digitization of the industry requires information models describing assets and information sources of companies to enable the semantic integration and interoperable exchange of data. We report on a case study in which we realized such an information model for a global manufacturing company using semantic technologies. The information model is centered around machine data and describes all relevant assets, key terms and relations in a structured way, making use of existing as well as newly developed RDF vocabularies. In addition, it comprises numerous RML mappings that link different data sources required for integrated data access and querying via SPARQL. The technical infrastructure and methodology used to develop and maintain the information model is based on a Git repository and utilizes the development environment VoCol as well as the Ontop framework for Ontology Based Data Access. Two use cases demonstrate the benefits and opportunities provided by the information model. We evaluated the approach with stakeholders and report on lessons learned from the case study.

“Diefficiency Metrics: Measuring the Continuous Efficiency of Query Processing Approaches” by Maribel Acosta, Maria-Esther Vidal, York Sure-Vetter.

Abstract: During empirical evaluations of query processing techniques, metrics like execution time, time for the first answer, and throughput are usually reported. Albeit informative, these metrics are unable to quantify and evaluate the efficiency of a query engine over a certain time period -or diefficiency-, thus hampering the distinction of cutting-edge engines able to exhibit high-performance gradually. We tackle this issue and devise two experimental metrics named dief@t and dief@k, which allow for measuring the diefficiency during an elapsed time period or while k answers are produced, respectively. The dief@t and dief@k measurement methods rely on the computation of the area under the curve of answer traces and thus capturing the answer concentration over a time interval. We report experimental results of evaluating the behavior of a generic SPARQL query engine using both metrics. Observed results suggest that dief@t and dief@k are able to measure the performance of SPARQL query engines based on both the amount of answers produced by an engine and the time required to generate these answers.

Acknowledgments
These work were supported by the European Union’s H2020 research and innovation action HOBBIT (GA no. 688227), the European Union’s H2020 research and innovation program BigDataEurope (GA no.644564), German Ministry BMWI under the SAKE project (Grant No. 01MD15006E), WDAqua : Marie Skłodowska-Curie Innovative Training Network and Industrial Data Space.

Looking forward to seeing you at ISWC 2017.