Papers accepted at IEEE-ICSC

We are very pleased to announce that our group got four papers accepted for presentation at IEEE-ICSC 2020.

The 14th IEEE International Conference on Semantic Computing (ICSC2020) addresses the derivation, description, integration, and use of semantics (“meaning”, “context”, “intention”) for all types of resources including data, document, tool, device, process and people. The scope of ICSC2020 includes, but is not limited to, analytics, semantics description languages and integration (of data and services), interfaces, and applications.

Here are the pre-prints of the accepted papers with their abstracts:

  • “DISE: A Distributed in-Memory SPARQL Processing Engine over Tensor Data” by Hajira Jabeen, Eskender Haziiev, Gezim Sejdiu, and Jens Lehmann.
    Abstract: SPARQL is a W3C standard for querying the data stored as Resource Description Framework (RDF). The SPARQL queries are represented using triple-patterns, and are tailored to search for these patterns in RDF. Most of the existing SPARQL evaluators provide centralized, DBMS inspired solutions consuming high resources and offering limited flexibility. In order to deal with the increasing RDF data, it is important to develop scalable and efficient solutions for distributed SPARQL query evaluators. In this paper we present DISE, an open source implementation of distributed in-memory SPARQL engine that can scale out to a cluster of machines. DISE represents an RDF graph as a three way distributed tensor for querying large-scale RDF datasets. This distributed tensor representation offers opportunities for novel distributed applications. DISE relies on translating SPARQL queries into Spark tensor operations by exploiting the information about the query complexity and creating a dynamic execution plan. We have tested the scalability and efficiency of DISE on different datasets and the results have been found scalable and efficient while exploiting the relatively new representation format.
  • “Let’s build Bridges, not Walls – SPARQL Querying of TinkerPop Graph Databases with sparql-gremlin” by Harsh Thakkar, Renzo Angles, Marko Rodriguez, Stephen Mallette, and Jens Lehmann.
    Abstract: This article presents sparql-gremlin, a tool to translate SPARQL queries to Gremlin pattern matching traversals. Currently, sparql-gremlin is a plugin of the Apache TinkerPop graph computing framework, thus the users can run queries expressed in the W3C SPARQL query language over a wide variety of graph data management systems, including both OLTP graph databases and OLAP graph processing frameworks. With sparql-gremlin, we perform the first step to bridge the query interoperability gap between the Semantic Web and Graph database communities. The plugin has received adoption from both academia and industry research in its short timespan.

  • “VoColReg: A Registry for Supporting Distributed Ontology Development using Version Control Systems” by Abderrahmane Khiat, Lavdim Halilaj, Ahmad Hemid and Steffen Lohmann (ICSC Resource Track).
    Abstract: The number of ontologies used for different purposes, such as data integration, information retrieval or search optimization, is constantly increasing. Therefore, it is crucial that ontologies can be developed and explored in an easy way by humans, and are accessible by intelligent agents. To this end, we created VoColReg on top of the VoCol platform. VoColReg provides an integrated registry that hosts VoCol instances, allowing the community to access, browse, reuse, and improve ontologies in a collaborative fashion. VoColReg integrates several improved features, such as RDF-Doctor which is able to simultaneously identify a comprehensive list of syntax errors and automatically correct a subset of them. Currently, the VoColReg platform hosts more than 21 ontologies from various domains, where nine of them are publicly available. We analyzed those nine ontologies to discover different facts about them such as hosting platforms used, expressivity of the ontologies, number of triples and modules.

  • “Learning a Lightweight Representation: First Step Towards Automatic Detection of Multidimensional Relationships between Ideas” by Abderrahmane Khiat (ICSC Research Track, Concise Paper).
    Abstract: Moving ideation from a closed paradigm (companies) to an open one (crowd) yields several benefits: (1) The crowd allows the generation of a large number of ideas and (2) Its heterogeneity increases the potential in obtaining creative ideas. In practice, however, the crowd often fails at generating innovative solutions, leading to duplicate or ideas that use each other’s description. Thus, it is practically and economically unfeasible to sift through this large number of ideas to select valuable ones. One promising solution to overcome this issue is finding relationships between idea texts such as duplicate, generalize, disjoint, alternative solution, etc. Existing approaches either rely on human judgment, which is expensive and requires domain experts or automatic approaches which compute similarity i.e. one dimension and do not consider other relations. The proposed solution is based on sequence-to-sequence learning, which allows the machine to learn a lightweight structural representation that is used next to establishing complex relations between ideas. This lightweight structural representation is obtained based on our investigation. We found that ideas contain the following patterns: what the idea is about (e.g. window with heat-sensitive material), how it works (e.g. it lights up) and when it works (e.g. in case of fire). Those extracted patterns are then compared with the corresponding patterns of other ideas to establish relations. Our preliminary investigation shows promising results to learn and leverage such lightweight structural representation in identifying the complex relationship between ideas.

Paper accepted at ESWA

We are very pleased to announce that our group got a paper accepted for publication in ESWA (International Journal for Expert Systems with Applications). With an Impact Factor of 4.3, the journal is one of the major venues for intelligent systems and information exchange. The focus of the journal is on exchanging information relating to expert and intelligent systems applied in industry, government, and universities worldwide.

Here is the pre-print of the accepted paper with its abstract:

Abstract: Open budget data are among the most frequently published datasets of the open data ecosystem, intended to improve public administrations and government transparency. Unfortunately, the prospects of analysis across different open budget data remain limited due to schematic and linguistic differences. Budget and spending datasets are published together with descriptive classifications. Various public administrations typically publish the classifications and concepts in their regional languages. These classifications can be exploited to perform a more in-depth analysis, such as comparing similar items across different, cross-lingual datasets. However, in order to enable such analysis, a mapping across the multilingual classifications of datasets is required. In this paper, we present the framework for Interlinking of Heterogeneous Multilingual Open Fiscal DaTA (IOTA). IOTA makes use of machine translation followed by string similarities to map concepts across different datasets. To the best of our knowledge, IOTA is the first framework to offer scalable implementation of string similarity using distributed computing. The results demonstrate the applicability of the proposed multilingual matching, the scalability of the proposed framework, and an in-depth comparison of string similarity measures.
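The matching step described in the abstract (string similarity over already translated labels) can be pictured with a small sketch. This is not the IOTA implementation: it runs on a single machine, assumes the machine translation step has already produced English labels, and uses Python's standard `difflib` ratio as a stand-in similarity measure. The function names, the example labels, and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two labels, ignoring case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_concepts(source_labels, target_labels, threshold=0.8):
    """Map each source label to its most similar target label.

    Labels whose best score falls below `threshold` stay unmatched.
    """
    mapping = {}
    for src in source_labels:
        # Pick the target label with the highest similarity to this source label.
        best = max(target_labels, key=lambda tgt: similarity(src, tgt), default=None)
        if best is not None and similarity(src, best) >= threshold:
            mapping[src] = best
    return mapping


# Hypothetical classification labels, assumed to be already translated to English.
city_a = ["Public health services", "Primary education", "Road maintenance"]
city_b = ["Public Health Service", "Education, primary", "Maintenance of roads", "Culture"]

matches = match_concepts(city_a, city_b, threshold=0.8)
print(matches)
```

With this character-based measure, near-identical labels are linked, while labels that only differ in word order (such as "Road maintenance" and "Maintenance of roads") fall below the threshold. That kind of difference is one reason a comparison of several string similarity measures, as the abstract mentions, is useful.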

Paper accepted at ICEGOV

We are very pleased to announce that our group got a paper accepted for presentation at ICEGOV (International Conference on Theory and Practice of Electronic Governance). Established in 2007, the conference runs annually and is coordinated by the United Nations University Operating Unit on Policy-Driven Electronic Governance (UNU-EGOV). Part of the United Nations University and headquartered in the city of Guimarães, in the north of Portugal, UNU-EGOV is a think tank dedicated to Electronic Governance; a core centre of research, advisory services and training; a bridge between research and public policies; an innovation enhancer; a solid partner within the UN system and its Member States with a particular focus on sustainable development, social inclusion and active citizenship.

Here is the pre-print of the accepted paper with its abstract:

Abstract: To improve governance accountability, public administrations are increasingly publishing their open data, which includes budget and spending data. Analyzing these datasets requires both domain and technical expertise. In civil communities, these technical and domain expertise are often not available. Hence, despite the increasing size of the open fiscal datasets being published, the level of analytics done on top of these datasets is still limited. Providentially, the developments in the computer science community enable further progress in data analysis in different domains, such as performing a comparative analysis of open budgets and spending data (open fiscal data). This is done by adopting and applying semantics on open fiscal data. In this paper, we demonstrate the feasibility of comparative analysis over linked open fiscal data and devise an approach to perform comparative analysis across different public administrations. Open fiscal data are cleaned, analyzed, transformed (i.e., semantically lifted), and have their related concept labels connected across different public administrations so budget/spending items from related concepts can be queried. Additionally, the growing information on linked open data (e.g., DBpedia) can also be used to provide additional context to the analysis and the query.

Update: The paper has received a best paper award nomination.