Papers accepted at IEEE-ICSC – Smart Data Analytics

We are very pleased to announce that our group got four papers accepted for presentation at IEEE-ICSC 2020.

The 14th IEEE International Conference on Semantic Computing (ICSC2020) addresses the derivation, description, integration, and use of semantics (“meaning”, “context”, “intention”) for all types of resources including data, document, tool, device, process and people. The scope of ICSC2020 includes, but is not limited to, analytics, semantics description languages and integration (of data and services), interfaces, and applications.

Here are the pre-prints of the accepted papers with their abstracts:

“DISE: A Distributed in-Memory SPARQL Processing Engine over Tensor Data” by Hajira Jabeen, Eskender Haziiev, Gezim Sejdiu, and Jens Lehmann.
Abstract: SPARQL is a W3C standard for querying data stored as Resource Description Framework (RDF). SPARQL queries are represented using triple patterns and are tailored to search for these patterns in RDF. Most of the existing SPARQL evaluators provide centralized, DBMS-inspired solutions that consume high resources and offer limited flexibility. In order to deal with the increasing volume of RDF data, it is important to develop scalable and efficient solutions for distributed SPARQL query evaluation. In this paper we present DISE, an open source implementation of a distributed in-memory SPARQL engine that can scale out to a cluster of machines. DISE represents an RDF graph as a three-way distributed tensor for querying large-scale RDF datasets. This distributed tensor representation offers opportunities for novel distributed applications. DISE relies on translating SPARQL queries into Spark tensor operations by exploiting the information about the query complexity and creating a dynamic execution plan. We have tested the scalability and efficiency of DISE on different datasets, and the results show that it is scalable and efficient while exploiting the relatively new representation format.
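To make the idea of a three-way RDF tensor more concrete, here is a minimal single-machine sketch. It is not taken from DISE and does not use Spark: the sample triples, the helper names `build_index` and `match`, and the sparse set-of-coordinates layout are all assumptions made for illustration. Each triple becomes one non-zero entry at a (subject, predicate, object) coordinate, and a triple pattern is evaluated by fixing some axes and leaving the others free.

```python
# Illustrative only: a tiny, single-machine stand-in for a three-way
# (subject x predicate x object) RDF tensor. This is NOT DISE's code.

# Hypothetical sample data.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
]


def build_index(values):
    """Map each distinct term to an integer position along one tensor axis."""
    return {term: i for i, term in enumerate(sorted(set(values)))}


subjects = build_index(s for s, _, _ in triples)
predicates = build_index(p for _, p, _ in triples)
objects = build_index(o for _, _, o in triples)
axes = (subjects, predicates, objects)

# Sparse tensor: the set of (i, j, k) coordinates whose entry is 1.
tensor = {(subjects[s], predicates[p], objects[o]) for s, p, o in triples}

# Reverse lookups, used to turn coordinates back into terms.
names = [{i: term for term, i in axis.items()} for axis in axes]


def match(s=None, p=None, o=None):
    """Evaluate one triple pattern; None plays the role of a variable."""
    wanted = []
    for term, axis in zip((s, p, o), axes):
        if term is None:
            wanted.append(None)        # free axis
        elif term in axis:
            wanted.append(axis[term])  # fixed axis position
        else:
            return []                  # unknown term: nothing can match
    hits = [
        coord
        for coord in tensor
        if all(w is None or w == c for w, c in zip(wanted, coord))
    ]
    return sorted(
        tuple(names[a][c] for a, c in enumerate(coord)) for coord in hits
    )


# Roughly the pattern "?x knows ?y".
print(match(p="knows"))
```

A distributed engine would partition such coordinates across machines and plan joins between several patterns; this sketch only covers the representation and a single pattern lookup.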

“Let’s build Bridges, not Walls – SPARQL Querying of TinkerPop Graph Databases with sparql-gremlin” by Harsh Thakkar, Renzo Angles, Marko Rodriguez, Stephen Mallette, and Jens Lehmann.
Abstract: This article presents sparql-gremlin, a tool to translate SPARQL queries to Gremlin pattern matching traversals. Currently, sparql-gremlin is a plugin of the Apache TinkerPop graph computing framework, so users can run queries expressed in the W3C SPARQL query language over a wide variety of graph data management systems, including both OLTP graph databases and OLAP graph processing frameworks. With sparql-gremlin, we take the first step to bridge the query interoperability gap between the Semantic Web and Graph database communities. The plugin has received adoption from both academia and industry research in its short timespan.

“VoColReg: A Registry for Supporting Distributed Ontology Development using Version Control Systems” by Abderrahmane Khiat, Lavdim Halilaj, Ahmad Hemid and Steffen Lohmann (ICSC Resource Track).
Abstract: The number of ontologies used for different purposes, such as data integration, information retrieval or search optimization, is constantly increasing. Therefore, it is crucial that ontologies can be developed and explored in an easy way by humans, and are accessible by intelligent agents. To this end, we created VoColReg on top of the VoCol platform. VoColReg provides an integrated registry that hosts VoCol instances, allowing the community to access, browse, reuse, and improve ontologies in a collaborative fashion. VoColReg integrates several improved features, such as RDF-Doctor, which is able to simultaneously identify a comprehensive list of syntax errors and automatically correct a subset of them. Currently, the VoColReg platform hosts more than 21 ontologies from various domains, where nine of them are publicly available. We analyzed those nine ontologies to discover different facts about them, such as hosting platforms used, expressivity of the ontologies, number of triples and modules.

“Learning a Lightweight Representation: First Step Towards Automatic Detection of Multidimensional Relationships between Ideas” by Abderrahmane Khiat (ICSC Research Track, Concise Paper).
Abstract: Moving ideation from a closed paradigm (companies) to an open one (crowd) yields several benefits: (1) the crowd allows the generation of a large number of ideas and (2) its heterogeneity increases the potential for obtaining creative ideas. In practice, however, the crowd often fails at generating innovative solutions, leading to duplicate ideas or ideas that reuse each other’s descriptions. Thus, it is practically and economically unfeasible to sift through this large number of ideas to select valuable ones. One promising solution to overcome this issue is finding relationships between idea texts, such as duplicate, generalize, disjoint, alternative solution, etc. Existing approaches rely either on human judgment, which is expensive and requires domain experts, or on automatic methods that compute similarity only, i.e. a single dimension, and do not consider other relations. The proposed solution is based on sequence-to-sequence learning, which allows the machine to learn a lightweight structural representation that is then used to establish complex relations between ideas. This lightweight structural representation is obtained based on our investigation. We found that ideas contain the following patterns: what the idea is about (e.g. window with heat-sensitive material), how it works (e.g. it lights up) and when it works (e.g. in case of fire). Those extracted patterns are then compared with the corresponding patterns of other ideas to establish relations. Our preliminary investigation shows promising results in learning and leveraging such a lightweight structural representation to identify complex relationships between ideas.