We are very pleased to announce that our group got 5 papers accepted for presentation at SEMANTiCS 2019, the 15th International Conference on Semantic Systems, which will be held on September 9–12, 2019 in Karlsruhe, Germany.
SEMANTiCS is an established knowledge hub where technology professionals, industry experts, researchers and decision-makers can learn about new technologies, innovations and enterprise implementations in the fields of Linked Data and Semantic AI. Since 2005, the conference series has focused on semantic technologies, which today, together with other methodologies such as NLP and machine learning, form the core of intelligent systems. The conference highlights the benefits of standards-based approaches.
Here is the list of the accepted papers with their abstracts:
Abstract: Over the last two decades, the amount of data created, published and managed using Semantic Web standards, and especially the Resource Description Framework (RDF), has been increasing. As a result, efficient processing of such big RDF datasets has become challenging. Indeed, these processes require both efficient storage strategies and query-processing engines to be able to scale in terms of data size. In this study, we propose a scalable approach to evaluate SPARQL queries over distributed RDF datasets using a semantic-based partitioning strategy, implemented inside the state-of-the-art RDF processing framework SANSA. An evaluation of the performance of our approach in processing large-scale RDF datasets is also presented. The preliminary results of the conducted experiments show that our approach can scale horizontally and performs well compared with the previous Hadoop-based system. It is also comparable with in-memory SPARQL query evaluators when less shuffling is involved.
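To give a flavor of what semantic-based partitioning means, here is a minimal Python sketch of the core idea: grouping all triples about the same entity into one partition, so that queries about that entity touch fewer partitions. This is an illustrative simplification with made-up example triples, not the SANSA implementation (which runs on top of a distributed engine).

```python
from collections import defaultdict

def semantic_partition(triples):
    """Group triples by subject so that all facts about one entity
    land in the same partition. Illustrative sketch only."""
    partitions = defaultdict(list)
    for s, p, o in triples:
        partitions[s].append((p, o))
    return dict(partitions)

# Hypothetical toy dataset:
triples = [
    ("ex:Alice", "foaf:knows", "ex:Bob"),
    ("ex:Alice", "foaf:name", '"Alice"'),
    ("ex:Bob",   "foaf:name", '"Bob"'),
]
parts = semantic_partition(triples)
# A star-shaped query about ex:Alice now only needs the ex:Alice partition.
```

Because a star-shaped SPARQL query (many triple patterns sharing one subject) can be answered within a single partition, this layout reduces the data shuffling mentioned in the abstract.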
Abstract: While the multilingual data on the Semantic Web grows rapidly, the building of multilingual ontologies from monolingual ones is still cumbersome and hampered by the lack of techniques for cross-lingual ontology enrichment. Cross-lingual ontology enrichment greatly facilitates the semantic interoperability between ontologies in different natural languages. Achieving such enrichment by human labor is very costly and error-prone. Thus, in this paper, we propose a fully automated ontology enrichment approach (OECM), which builds a multilingual ontology by enriching a monolingual ontology from another one in a different natural language, using a cross-lingual matching technique. OECM selects the best translation among all available translations of ontology concepts based on their semantic similarity with the target ontology concepts. We present a use case of our approach for enriching English Scholarly Communication Ontologies using German and Arabic ontologies from the MultiFarm benchmark. We have compared our results with the results from the Ontology Alignment Evaluation Initiative (OAEI 2018). Our approach has higher precision and recall in comparison to five state-of-the-art approaches. Additionally, we recommend some linguistic corrections in the Arabic ontologies in MultiFarm, which have enhanced our cross-lingual matching results.
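The translation-selection step can be sketched as follows: among several candidate translations of a source concept, pick the one most similar to the target ontology's labels. The sketch below uses plain string similarity as a stand-in for the semantic similarity measure OECM actually uses; the concept labels are hypothetical examples.

```python
from difflib import SequenceMatcher

def best_translation(candidates, target_labels):
    """Pick the candidate translation most similar to any label in the
    target ontology. String similarity stands in here for a proper
    semantic similarity measure."""
    def score(cand):
        return max(SequenceMatcher(None, cand.lower(), t.lower()).ratio()
                   for t in target_labels)
    return max(candidates, key=score)

# A German concept "Beitrag" with several possible English translations:
choice = best_translation(["contribution", "article", "post"],
                          ["Contribution", "Paper"])
```

Here "contribution" is selected because it matches a target label closely, while the other translations score lower.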
Abstract: The disruptive potential of the upcoming digital transformations for the industrial manufacturing domain has led to several reference frameworks and numerous standardization approaches. On the other hand, the Semantic Web community has produced a remarkable body of work, for instance on data and service description, integration of heterogeneous sources and devices, and AI techniques in distributed systems. These two work streams are, however, mostly unrelated and only briefly address each other's requirements, practices and terminology. We bridge this gap by providing the Semantic Asset Administration Shell, an RDF-based representation of the Industrie 4.0 Component. We provide an ontology for the latest data model specification, create an RML mapping, supply resources to validate the RDF entities, and introduce basic reasoning on the Asset Administration Shell data model. Furthermore, we discuss the different assumptions and presentation patterns, and analyze the implications of a semantic representation on the original data. We evaluate the resulting overheads and conclude that the semantic lifting is manageable even for restricted or embedded devices and therefore meets the conditions of Industrie 4.0 scenarios.
Abstract: Increasing digitization leads to a constantly growing amount of data in a wide variety of application domains. Data analytics, and in particular machine learning, plays a key role in gaining actionable insights from this data in real-world applications. However, the configuration of data analytics workflows that include heterogeneous data sources requires significant data science expertise, which hinders the wide adoption of existing data analytics frameworks by non-experts. In this paper we present the Simple-ML framework, which adopts semantic technologies, in particular domain-specific semantic data models and dataset profiles, to support the efficient configuration, robustness and reusability of data analytics workflows. We present the semantic data models that lay the foundation for the framework development and discuss the data analytics workflows based on these models. Furthermore, we present an example instantiation of the Simple-ML data models for a real-world use case in the mobility application domain and discuss the emerging challenges.
Abstract: In the Big Data era, the amount of digital data is increasing exponentially. Knowledge graphs are gaining attention as a way to handle the variety dimension of Big Data, allowing machines to understand the semantics present in data. For example, knowledge graphs such as STITCH, SIDER, and DrugBank have been developed in the biomedical domain. As the amount of data increases, it is critical to perform data analytics. Interaction network analysis is especially important in knowledge graphs, e.g., to detect drug-target interactions. Having a good target identification approach helps to accelerate and reduce the cost of discovering new medicines. In this work, we propose a machine learning-based approach that combines two inputs: (1) interactions and similarities among entities, and (2) a translation-based embedding technique. We focus on the problem of discovering missing links in the data, called link prediction. Our approach, named SimTransE, is able to analyze drug-target interactions and similarities. Based on this analysis, SimTransE is able to predict new drug-target interactions. We empirically evaluate SimTransE using existing benchmarks and evaluation protocols defined by state-of-the-art approaches. Our results demonstrate the good performance of SimTransE in the task of link prediction.
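The translation-based embedding idea underlying SimTransE can be illustrated with the classic TransE scoring function: a true triple (h, r, t) should satisfy h + r ≈ t in the embedding space, so a smaller distance means a more plausible link. The sketch below uses random toy vectors (the entity names are hypothetical), not the trained SimTransE model.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE plausibility score: distance between h + r and t.
    Lower score = more likely that the link (h, r, t) holds."""
    return np.linalg.norm(h + r - t)

rng = np.random.default_rng(0)
dim = 4
drug   = rng.normal(size=dim)                       # embedding of a drug
rel    = rng.normal(size=dim)                       # embedding of "targets"
target = drug + rel + 0.01 * rng.normal(size=dim)   # a near-perfect fit
other  = rng.normal(size=dim)                       # an unrelated entity

# Link prediction ranks candidate tails by this score: the entity that
# best satisfies h + r ≈ t is predicted as the missing interaction.
good = transe_score(drug, rel, target)
bad  = transe_score(drug, rel, other)
```

SimTransE additionally exploits similarity values among entities when learning the embeddings, which plain TransE does not.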
Furthermore, we got two demo/poster papers accepted at the Poster & Demo Track.
Here is the list of the accepted poster/demo papers with their abstracts:
Abstract: With the recent trend towards blockchain, many users want to know more about the important players of the chain. In this study, we investigate and analyze the Ethereum blockchain network in order to identify the major entities across the transaction network. By leveraging the rich data available through Alethio’s platform in the form of RDF triples, we learn about the Hubs and Authorities of the Ethereum transaction network. Alethio uses SANSA for efficient reading and processing of such large-scale RDF data (transactions on the Ethereum blockchain) in order to perform analytics, e.g., finding top accounts, typical behavior patterns of exchanges’ deposit wallets, and more.
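The Hubs-and-Authorities analysis mentioned above refers to the HITS algorithm: an account is a good hub if it sends to good authorities, and a good authority if it receives from good hubs. A minimal power-iteration sketch on a tiny made-up transaction graph (not Alethio's pipeline, which runs at blockchain scale) looks like this:

```python
import numpy as np

def hits(adj, iters=50):
    """Power iteration for HITS on an adjacency matrix:
    authorities = A^T @ hubs, hubs = A @ authorities,
    normalised each step."""
    n = adj.shape[0]
    hubs = np.ones(n)
    for _ in range(iters):
        auth = adj.T @ hubs
        auth /= np.linalg.norm(auth)
        hubs = adj @ auth
        hubs /= np.linalg.norm(hubs)
    return hubs, auth

# Toy transaction graph: account 0 sends to 1 and 2; account 3 sends to 2.
adj = np.array([[0, 1, 1, 0],
                [0, 0, 0, 0],
                [0, 0, 0, 0],
                [0, 0, 1, 0]], dtype=float)
hubs, auth = hits(adj)
```

On this toy graph, account 2 (receiving from two senders) comes out as the top authority and account 0 (sending to two receivers) as the top hub, matching the intuition of "major entities" in a transaction network.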
Abstract: Open Data portals often struggle to provide release features (i.e., stable versioning, up-to-date download links, rich metadata descriptions) for their datasets. As a result, wide adoption of publicly available data collections is hindered, since consuming applications cannot access fresh data sources or might break due to data quality issues. While there exists a variety of tools to efficiently control release processes in software development, the management of dataset releases is not as mature. This paper proposes a deployment pipeline for efficient dataset releases that is based on automated enrichment of DCAT/DataID metadata and is a first step towards efficient deployment pipelining for Open Data publishing.
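The kind of metadata enrichment the pipeline automates can be sketched in a few lines: given a bare dataset description, add the release fields (stable version, fresh download link, release date) that consumers depend on. The property names below follow DCAT/DCTERMS conventions, but the function and example values are hypothetical, not the paper's implementation.

```python
from datetime import date

def enrich_release_metadata(meta, version, download_url):
    """Return a copy of the dataset metadata enriched with the release
    fields a deployment pipeline would generate automatically.
    Illustrative sketch; field names follow DCAT/DCTERMS conventions."""
    enriched = dict(meta)  # leave the original description untouched
    enriched["dcterms:hasVersion"] = version
    enriched["dcat:downloadURL"] = download_url
    enriched["dcterms:issued"] = date.today().isoformat()
    return enriched

meta = {"dcterms:title": "Geo dataset"}
release = enrich_release_metadata(
    meta, "2019.08.30", "http://example.org/geo-2019.08.30.ttl")
```

Running this step on every release keeps download links and version stamps consistent without manual curation, which is exactly the gap the abstract identifies.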
This work was partially funded by the EU Horizon 2020 projects Boost4.0 (GA no. 780732), BigDataOcean (GA no. 732310), SLIPO (GA no. 731581) and QROWD (GA no. 723088), by the Federal Ministry of Transport and Digital Infrastructure (BMVI) for the LIMBO project (GA no. 19F2029A and 19F2029G), and by the Simple-ML project.
Looking forward to seeing you at SEMANTiCS 2019!