SPARQL queries search for specified patterns in RDF data using triple patterns as building blocks. Although many different solutions for efficient querying have been proposed in the past, we want to explore novel tensor-based RDF representations and use them for fast and scalable querying on large-scale clusters. We will use an in-memory computational framework for distributed SPARQL query answering, based on the notion of the degree of freedom of a triple. This algorithm relies on a general model of RDF graphs based on the first principles of linear algebra, in particular on tensorial calculus.
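To make the idea concrete, here is a minimal single-machine sketch in Python. It stores an RDF graph as a sparse rank-3 boolean tensor (one coordinate per triple) and evaluates triple patterns in order of increasing degree of freedom. Two assumptions are made for illustration: the degree of freedom is taken to be simply the number of variables in a pattern, which may differ from the exact definition used by the algorithm, and the names `RdfTensor`, `degree_of_freedom` and `answer` are hypothetical. Nothing here is distributed.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables
Binding = Dict[str, str]


class RdfTensor:
    """Sparse boolean rank-3 tensor: one (s, p, o) coordinate per triple."""

    def __init__(self, triples: Iterable[Triple]) -> None:
        self.index: Dict[str, int] = {}  # term -> integer id
        self.terms: List[str] = []       # integer id -> term
        self.coords: Set[Tuple[int, int, int]] = set()
        for s, p, o in triples:
            self.coords.add((self._id(s), self._id(p), self._id(o)))

    def _id(self, term: str) -> int:
        if term not in self.index:
            self.index[term] = len(self.terms)
            self.terms.append(term)
        return self.index[term]

    def _unify(self, pattern: Pattern, coord: Tuple[int, int, int],
               binding: Binding) -> Optional[Binding]:
        """Extend `binding` so that `pattern` equals the triple at `coord`."""
        new = dict(binding)
        for term, idx in zip(pattern, coord):
            value = self.terms[idx]
            if term.startswith("?"):
                # A variable must keep any value it was bound to earlier.
                if new.setdefault(term, value) != value:
                    return None
            elif term != value:
                return None
        return new

    def match(self, pattern: Pattern, binding: Binding) -> Iterator[Binding]:
        """Yield every extension of `binding` that satisfies `pattern`."""
        for coord in self.coords:
            extended = self._unify(pattern, coord, binding)
            if extended is not None:
                yield extended


def degree_of_freedom(pattern: Pattern) -> int:
    # Assumption: degree of freedom = number of variables in the pattern.
    return sum(1 for term in pattern if term.startswith("?"))


def answer(tensor: RdfTensor, patterns: List[Pattern]) -> List[Binding]:
    """Evaluate the most constrained patterns first, joining as we go."""
    solutions: List[Binding] = [{}]
    for pattern in sorted(patterns, key=degree_of_freedom):
        solutions = [b for sol in solutions for b in tensor.match(pattern, sol)]
    return solutions
```

Evaluating low-freedom patterns first keeps the intermediate result small before the less constrained patterns are joined in.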

RDF compression techniques

As a starting point, a fresh survey of the state of the art in RDF compression techniques could be carried out. These techniques can mainly be divided into two families: those that compress datasets as much as possible in order to make transfers easier (see e.g. the study of Fernández et al.) and those that still allow the data to be queried (see e.g. the HDT structure). Secondly, a new compression model could be designed and then implemented. Obviously, some suggestions are already available that could help the student 😉, for instance compressing the RDF graphs according to patterns that could be used in parallel with SPARQL query shapes.
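As an illustration of the second family, the sketch below shows plain dictionary encoding in Python: every term is replaced by an integer ID and the ID triples are kept sorted, so a lookup can still be answered by binary search without decompressing anything. This is only the general idea behind queryable formats such as HDT, not the HDT format itself, and the function names `encode`, `term_id` and `objects_of` are hypothetical.

```python
from bisect import bisect_left
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
IdTriple = Tuple[int, int, int]


def encode(triples: Iterable[Triple]) -> Tuple[List[str], List[IdTriple]]:
    """Return a sorted term dictionary and the sorted integer-encoded triples."""
    triples = list(triples)
    terms = sorted({term for triple in triples for term in triple})
    ids = {term: i for i, term in enumerate(terms)}
    encoded = sorted((ids[s], ids[p], ids[o]) for s, p, o in triples)
    return terms, encoded


def term_id(terms: List[str], term: str) -> Optional[int]:
    """Binary search in the sorted dictionary; None if the term is unknown."""
    i = bisect_left(terms, term)
    return i if i < len(terms) and terms[i] == term else None


def objects_of(terms: List[str], encoded: List[IdTriple],
               subject: str, predicate: str) -> List[str]:
    """Answer the pattern (subject, predicate, ?o) on the encoded data."""
    s, p = term_id(terms, subject), term_id(terms, predicate)
    if s is None or p is None:
        return []
    # All triples sharing (s, p) are contiguous in the sorted list.
    start = bisect_left(encoded, (s, p, -1))
    result = []
    for s2, p2, o in encoded[start:]:
        if (s2, p2) != (s, p):
            break
        result.append(terms[o])
    return result
```

Long IRIs are stored once in the dictionary, which is where most of the space saving comes from. A pattern-based model as suggested above would go further by also sharing repeated graph structure.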

Query Decomposer and Optimizer for querying scientific datasets

The task of relation linking in question answering is the identification of the relation (predicate) in a given question and its linking to the corresponding entity in a knowledge base. It is an important step in question answering, which afterwards allows us to build formal queries against, e.g., a knowledge graph. Most existing question answering systems focus on the English language, and very few question answering components support other languages such as German. The goal of this thesis is to identify relation extraction tools from the literature, as well as to develop new ones, that can be adapted to work for German questions.

The number of scientific datasets has increased dramatically in recent years. The Copernicus data repository is a prominent example of a collection of datasets related to the climate, atmosphere, agriculture, and marine domains, publicly available on the Web. Until now, scientists have had to look for the appropriate datasets, download them, and query/analyze them using their own infrastructure. Querying/analyzing scientific data without knowing about the underlying datasets is currently not possible. The goal of this thesis will be to create a query engine that can query scientific datasets transparently, without the user having to be aware of the available datasets.

Recommendation system for RDF partitioners

In order to store and query big RDF datasets efficiently in distributed environments, different partitioning techniques need to be implemented. Several techniques have been proposed for splitting Big RDF Data, ranging from vertical, hash, and graph-based to semantic-based partitioners. However, the selection of the “best partitioner” depends highly on the structure of the dataset, and query efficiency and effectiveness are coupled to the query engine used. The goal of this thesis will be to develop a recommender system that suggests the “best partitioner” based on the structure of the data and specific requirements.

Data quality is considered a multidimensional concept that covers different aspects of quality such as accuracy, completeness, and timeliness. With the advent of Big Data, traditional quality assessment techniques are facing different challenges. Therefore, we should adapt the traditional techniques to big data technologies. The goal of this thesis is to re-implement the assessment techniques in the SANSA framework.
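As a small example of what one such assessment could look like, the following sketch computes a simple completeness score in plain Python. The metric is an assumption chosen for illustration (the fraction of subjects that have at least one value for a given predicate), and the code does not use the SANSA API. A re-implementation in SANSA would express the same computation over a distributed collection of triples.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def property_completeness(triples: Iterable[Triple], required_predicate: str) -> float:
    """Fraction of distinct subjects with at least one `required_predicate` value.

    Hypothetical metric for illustration; returns 0.0 for an empty dataset.
    """
    triples = list(triples)
    subjects = {s for s, _, _ in triples}
    if not subjects:
        return 0.0
    covered = {s for s, p, _ in triples if p == required_predicate}
    return len(covered) / len(subjects)
```

Both sets can be built in a single pass over the triples, which is what makes this kind of metric a natural fit for a distributed implementation.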

RDF2Résumé

Basically, this subject would offer the student the possibility of entering the Semantic Web world while creating a fancy and useful tool. In a nutshell, RDF2Résumé would involve (1) designing a résumé ontology; (2) providing a simple tool (a simple piece of software such as a script) able to generate, let’s say, LaTeX code from an RDF file compliant with the aforementioned ontology; (3) in parallel, proposing several final résumé templates; (4) and finally building a basic user interface; (+) offering the possibility of automatically changing languages.
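Step (2) could start from something as small as the Python sketch below. It assumes the RDF file has already been parsed into a list of (subject, predicate, object) triples, and the predicates `cv:name`, `cv:email` and `cv:hasSkill` are hypothetical placeholders for whatever the ontology designed in step (1) ends up defining.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Characters that have a special meaning in LaTeX and their escaped forms.
_LATEX_SPECIALS = {
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}", "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}", "\\": r"\textbackslash{}",
}


def latex_escape(text: str) -> str:
    """Escape LaTeX special characters one by one (avoids double escaping)."""
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


def values(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    return [o for s, p, o in triples if s == subject and p == predicate]


def resume_to_latex(triples: Iterable[Triple], person: str) -> str:
    """Render a very small résumé for `person` as a LaTeX document."""
    triples = list(triples)
    # Hypothetical ontology terms; replace with the real vocabulary.
    name = values(triples, person, "cv:name")
    email = values(triples, person, "cv:email")
    skills = values(triples, person, "cv:hasSkill")

    lines = [r"\documentclass{article}", r"\begin{document}"]
    if name:
        lines.append(r"\section*{" + latex_escape(name[0]) + "}")
    if email:
        lines.append(r"\texttt{" + latex_escape(email[0]) + "}")
    if skills:
        lines.append(r"\subsection*{Skills}")
        lines.append(r"\begin{itemize}")
        lines += [r"\item " + latex_escape(skill) for skill in skills]
        lines.append(r"\end{itemize}")
    lines.append(r"\end{document}")
    return "\n".join(lines)
```

Keeping the template logic separate from the data lookup would make it easier to support the several templates of step (3) and the language switch mentioned in (+).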

Big RDF datasets need to be stored and processed in distributed RDF data stores that are built on top of clusters of servers. Several partitioning schemes, such as horizontal, vertical, and hash partitioning, allow the datasets to be split across several nodes in order to achieve scalability and efficient query processing. The goal of this thesis is to study graph partitioning approaches for RDF data, compare the state of the art, and implement the corresponding algorithms so that they can be integrated into the SANSA framework.
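For reference, two of the simpler schemes mentioned above can be sketched in a few lines of plain Python. This is a single-machine illustration under stated assumptions, not SANSA code: hash partitioning is done on the subject using CRC32 as a stable hash, and vertical partitioning groups triples into one (subject, object) table per predicate. The function names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def hash_partition(triples: Iterable[Triple], num_nodes: int) -> List[List[Triple]]:
    """Assign each triple to a node by hashing its subject.

    CRC32 is used instead of the built-in hash(), which is randomized per
    process for strings and would give different placements on each run.
    """
    parts: List[List[Triple]] = [[] for _ in range(num_nodes)]
    for s, p, o in triples:
        node = zlib.crc32(s.encode("utf-8")) % num_nodes
        parts[node].append((s, p, o))
    return parts


def vertical_partition(triples: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Build one two-column (subject, object) table per predicate."""
    tables: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)
```

Hashing on the subject keeps all triples about one resource on the same node, which favors star-shaped queries. Graph partitioning approaches instead try to keep connected resources together, at the cost of a more expensive partitioning step.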