GezimSejdiu – Page 2 – Smart Data Analytics

Workshop paper accepted at ICAIL/AIAS 2019

2019-08-192019-08-19GezimSejdiu

We are very pleased to announce that our group got a paper accepted for presentation at the AIAS 2019: The First Workshop on Artificial Intelligence and the Administrative State, which was held in conjunction with the 17th International Conference on AI and Law (ICAIL 2019) on Monday, 17 June 2019 in Montréal, Québec, Canada.

Recent advances in AI, Machine Learning, Human Language Technology, Network Science, and Human Factors analysis offer promising new approaches to improving the ability of all stakeholders, including agencies themselves, to operate within this complex regulatory environment. The scale of administrative states means that the benefits of automation have very high potential impact, both in improvements to government processes and in the delivery of services and benefits to citizens. At the same time, the black-box nature of many automated decision-making systems, particularly sub-symbolic AI components such as those generated by machine learning algorithms, can create considerable tension with the norms of transparency, accountability, and reason-giving that typically govern administrative action. Explainable, responsible, and trustworthy AI is vital for addressing these factors.

Here is the pre-print of the accepted paper with its abstract:

“Towards Measuring Risk Factors in Privacy Policies” by Najmeh Mousavi Nejad, Damien Graux, and Diego Collarana.

Abstract: The ubiquitous availability of online services and mobile apps results in a rapid proliferation of contractual agreements in the form of privacy policies. Despite the importance of such consent forms, the majority of users tend to ignore them due to their content length and complexity. Thus, users might be consenting policies that are not aligned to regulations in laws such as the GDPR from the EU law. In this study, we propose a hybrid approach which measures a privacy policy’s risk factor applying both supervised deep learning and rule-based information extraction. Benefiting from an annotated dataset of 115 privacy policies, a deep learning component is first able to predict high-level categories for each paragraph. Then, a rule-based module extracts pre-defined attributes and their values, based on high-level classes. Finally, a privacy policy’s risk factor is computed based on these attribute values.

SDA at ESWC 2019 and a Best Demo Award

2019-07-182019-07-18GezimSejdiu

The Extended Semantic Web Conference (ESWC) is a major venue for discussing the latest scientific results and technology innovations around semantic technologies. Building on its past success, ESWC is seeking to broaden its focus to span other relevant related research areas in which Web semantics plays an important role. ESWC 2019 will present the latest results in research, technologies and applications in its field. Besides the technical program organized over twelve tracks, the conference will feature a workshop and tutorial program, a dedicated track on Semantic Web challenges, system descriptions and demos, a posters exhibition and a doctoral symposium.

We are very pleased to announce that we got 2 papers accepted at ESWC 2019 for presentation at the main conference. Additionally, we also had 1 Workshop, 2 Tutorials and 1 workshop paper and 3 Posters/Demo papers accepted. In addition, we participated in the Networking Session.

Furthermore, we are very happy to announce that we won the Best Demo Award for the “MantisTable: a Tool for Creating Semantic Annotations on Tabular Data” by Marco Cremaschi, Anisa Rula, Alessandra Siano, and Flavio De Paoli.

Congratulations to best demo award winners: @cremmarco, @AnisaRula , Alessandra Siano and @FlavioDePaoli. MantisTable: a Tool for Creating Semantic Annotations on Tabular Data #eswc2019

— ESWC Conferences (@eswc_conf) June 7, 2019

Here are some further pointers in case you want to know more about MantisTable tool:

Website: http://mantistable.disco.unimib.it/
GitHub: https://bitbucket.org/disco_unimib/mantistable-tool
Demo: http://mantistable.disco.unimib.it/tables

Among the other presentations, our colleagues presented the following presentations:

Incorporating Joint Embeddings into Goal-Oriented Dialogues with Multi-Task Learning by Firas Kassawat, Debanjan Chaudhuri, and Jens Lehmann

Firas Kassawat is presenting the work about “Incorporating Joint Embeddings into Goal-Oriented Dialogues with Multi-Task Learning” @eswc_conf #eswc2019 https://t.co/4bQxDCJR7H pic.twitter.com/liXxAbuPyj

— SDA Research (@SDA_Research) June 4, 2019
EVENTSKG: A 5-Star Dataset of Top-ranked Events in Eight Computer Science Communities by Said Fathalla, Christoph Lange, and Sören Auer (a best resource paper nominee).

@sm_fathalla is presenting the work about “EVENTSKG: A 5-Star Dataset of Top-ranked Events in Eight Computer Science Communities” @eswc_conf #eswc2019 https://t.co/bF88h1gNN1 pic.twitter.com/YJ40qepv0P

— SDA Research (@SDA_Research) June 4, 2019

Workshops & Tutorials:

1st Workshop on Large Scale RDF Analytics (LASCAR-19) by Hajira Jabeen, Damien Graux, Gezim Sejdiu, Muhammad Saleem and Jens Lehmann.
We organized a workshop about large-scale rdf analytics co-located with the ESWC’19 conference. More than 20 people attended the workshop containing a keynote about “Analytical processing and reasoning in RDF stores” and 3 paper presentations. At the end of the workshop, a panel discussion was held about the future of rdf analytics at scale and challenges.

#LASCAR workshop just started @eswc_conf . @oliviercure is giving a talk about “Analytical processing and reasoning in #RDF stores” #eswc2019 pic.twitter.com/F6qZbcRQzp

— lascar-workshop (@lascarworkshop) June 3, 2019
SANSA’s Leap of Faith: Scalable RDF and Heterogeneous Data Lakes by Hajira Jabeen, Mohamed Nadjib Mami, Damien Graux, Gezim Sejdiu, and Jens Lehmann.

Our tutorial on ” SANSA’s Leap of Faith: Scalable RDF and Heterogeneous Data Lakes” is happening @eswc_conf #eswc2019 .
Great audience and lot of good discussion about the SANSA API sand its features. pic.twitter.com/Xuxhb6tCqm

— SANSA (@SANSA_Stack) June 3, 2019
Build a Question Answering system overnight by Denis Lukovnikov, Gaurav Maheshwari, Jens Lehmann, Mohnish Dubey and Priyansh Trivedi

@AskNowQA tutorial is happening @eswc_conf #eswc2019 #AskNowQA . @shape_mismatch @saist007 and @JLehmann82 are telling us how to “Build a #QuestionAnswering System Overnight” https://t.co/bFhIbNfPZr pic.twitter.com/YBqwQ1QQmL

— SDA Research (@SDA_Research) June 2, 2019

ESWC’19 was a great venue to meet the community, create new connections, talk about current research challenges, share ideas and settle new collaborations. We look forward to the next ESWC conference.
Until then, meet us at SDA!

Papers accepted at ISWC 2019

2019-07-082019-07-17GezimSejdiu

We are very pleased to announce that our group got 14 papers accepted for presentation at ISWC 2019 – the 18th International Semantic Web Conference, which will be held on October 26 – 30 2019 in Auckland, New Zealand. ISWC is an A-ranked conference (CORE ranking) and currently 11th in Google Scholar in the category “Databases & Information Systems” with an h5-index of 41 as well as 4th in terms WWW related conferences in MS Academic Search.

The International Semantic Web Conference (ISWC) is the premier international forum where Semantic Web / Linked Data / Knowledge Graph researchers, practitioners, and industry specialists come together to discuss, advance, and shape the future of semantic technologies on the web, within enterprises and in the context of the public institution.

Here is the list of the accepted papers with their abstract:

“A Scalable Framework for Quality Assessment of RDF Datasets” by Gezim Sejdiu, Anisa Rula, Jens Lehmann, and Hajira Jabeen (Resources track).
Topic: Scalability, Data Quality

Abstract: Over the last years, Linked Data has grown continuously. Today, we count more than 10,000 datasets being available online following Linked Data standards. These standards allow data to be machine readable and inter-operable. Nevertheless, many applications, such as data integration, search, and interlinking, cannot take full advantage of Linked Data if it is of low quality. There exist a few approaches for the quality assessment of Linked Data, but their performance degrades with the increase in data size and quickly grows beyond the capabilities of a single machine. In this paper, we present DistQualityAssessment — an open source implementation of quality assessment of large RDF datasets that can scale out to a cluster of machines. This is the first distributed, in-memory approach for computing different quality metrics for large RDF datasets using Apache Spark. We also provide a quality assessment pattern that can be used to generate new scalable metrics that can be applied to big data. The work presented here is integrated with the SANSA framework and has been applied to at least three use cases beyond the SANSA community. The results show that our approach is more generic, efficient, and scalable as compared to previously proposed approaches.

“Sparklify: A Scalable Software Component for Efficient evaluation of SPARQL queries over distributed RDF datasets” by Claus Stadler, Gezim Sejdiu, Damien Graux, and Jens Lehmann (Resources track).
Topic: Scalability, KG Querying

Abstract: One of the key traits of Big Data is its complexity in terms of representation, structure, or formats. One existing way to deal with it is offered by Semantic Web standards. Among them, RDF –which proposes to model data with triples representing edges in a graph– has received a large success and the semantically annotated data has grown steadily towards a massive scale. Therefore, there is a need for scalable and efficient query engines capable of retrieving such information. In this paper, we propose \emph{Sparklify}: a scalable software component for efficient evaluation of SPARQL queries over distributed RDF datasets. It uses Sparqlify as a SPARQL-to-SQL rewriter for translating SPARQL queries into Spark executable code. Our preliminary results demonstrate that our approach is more extensible, efficient, and scalable as compared to state-of-the-art approaches. Sparklify is integrated into a larger SANSA framework and it serves as a default query engine and has been used by at least three external use scenarios.

“Squerall: Virtual Ontology-Based Access to Heterogeneous and Large Data Sources” by Mohamed Nadjib Mami, Damien Graux, Simon Scerri, Hajira Jabeen, Sören Auer, and Jens Lehmann (Resources track).
Topic: Scalability, Querying

Abstract: The last two decades witnessed a remarkable evolution in terms of data formats, modalities, and storage capabilities. Instead of having to adapt one’s application needs to the, earlier limited, available storage options, today there is a wide array of options to choose from to best meet an application’s needs. This has resulted in vast amounts of data available in a variety of forms and formats which, if interlinked and jointly queried, can generate valuable knowledge and insights. In this article, we describe Squerall: a framework that builds on the principles of Ontology-Based Data Access (OBDA) to enable the querying of disparate heterogeneous sources using a unique query language, SPARQL. In Squerall, original data is queried on-the-fly without prior data materialization or transformation. In particular, Squerall allows the aggregation and joining of large data in a distributed manner. Squerall supports out-of-the-box five data sources and moreover, it can be programmatically extended to cover more sources and incorporate new query engines. The framework provides user interfaces for the creation of necessary inputs, as well as guiding non-SPARQL experts to write SPARQL queries. Squerall is integrated into the popular SANSA stack and available as open-source software via GitHub and as a Docker image

“Entity Enabled Relation Linking” by Jeff J Pan, Mei Zhang, Kuldeep Singh, Frank Van Harmelen, Jinguang Gu, and Zhi Zhang (Research track).
Topic: QA/KG Querying

Abstract: Relation linking is an important problem for knowledge graph-based Question Answering. Given a natural language question and a knowledge graph, the task is to identify relevant relations from the given knowledge graph. Since existing techniques for entity extraction and linking are more stable compared to relation linking, our idea is to exploit entities extracted from the question to support relation linking. In this paper, we propose a novel approach, based on DBpedia entities, for computing relation candidates. We have empirically evaluated our approach on different standard benchmarks. Our evaluation shows that our approach significantly outperforms existing baseline systems in both recall, precision and runtime.

“QaldGen: Towards Microbenchmarking of Question Answering Systems over Knowledge Graph” by Kuldeep Singh, Muhammad Saleem, Abhishek Nadgeri, Felix Conrads, Jeff Pan, Axel-Cyrille Ngonga Ngomo, Jens Lehmann (Resources track).
Topic: QA

Abstract: Over the last years, a number of Linked Data-based Question Answering (QA) systems have been developed. Consequently, the series of Question Answering Over Linked Data (QALD1–QALD9) challenges and other datasets have been proposed to evaluate these systems. However, the QA datasets contain a fixed number of natural language questions and do not allow users to generate micro benchmarks tailored towards specific use-cases. We propose QaldGen, a natural language benchmark generation framework for Knowledge Graphs which is able to generate customised QA benchmark from existing QA repositories. The framework is flexible enough to generate benchmarks of varying sizes and according to the user-defined criteria on the most important features to be considered for QA benchmarking. This is achieved using different clustering algorithms. We compare state-of-the-art QA systems over knowledge graphs by using different QA benchmarks. The observed results show that specialised micro-benchmarking is important to pinpoint the limitations of the various components of QA systems.

“Incorporating Literals into Knowledge Graph Embeddings” by Agustinus Kristiadi, Mohammad Asif Khan, Denis Lukovnikov, Jens Lehmann and Asja Fischer (Research track).
Topic: KG Embeddings

Abstract: Knowledge graphs are composed of different elements: entity nodes, relation edges, and literal nodes. Each literal node contains an entity’s attribute value (e.g. the height of an entity of type person) and thereby encodes information which in general cannot be represented by relations between entities alone. However, most of the existing embedding or latent-feature-based methods for knowledge graph analysis only consider entity nodes and relation edges, and thus do not take the information provided by literals into account. In this paper, we extend existing latent feature methods for link prediction by a simple portable module for incorporating literals, which we name LiteralE. Unlike in concurrent methods where literals are incorporated by adding a literal-dependent term to the output of the scoring function and thus only indirectly affect the entity embeddings, LiteralE directly enriches these embeddings with information from literals via a learnable parameterized function. This function can be easily integrated into the scoring function of existing methods and learned along with the entity embeddings in an end-to-end manner. In an extensive empirical study over three datasets, we evaluate LiteralE-extended versions of various state-of-the-art latent feature methods for link prediction and demonstrate that LiteralE presents an effective way to improve their performance. For these experiments, we augmented standard datasets with their literals, which we publicly provide as testbeds for further research. Moreover, we show that LiteralE leads to a qualitative improvement of the embeddings and that it can be easily extended to handle literals from different modalities.

“SemanGit: A Linked Dataset from git” by Dennis Oliver Kubitza, Matthias Böckmann, and Damien Graux (Resources track).
Topic: Data Modelling

Abstract: The growing interest in free and open-source software which occurred over the last decades has accelerated the usage of versioning systems to help developers collaborating together in the same projects. As a consequence, specific tools such as git and specialized open-source on-line platforms gained importance. In this study, we introduce and share SemanGit which provides a resource at the crossroads of both Semantic Web and git web-based version control systems. SemanGit is actually the first collection of linked data extracted from GitHub based on a git ontology we designed and extended to include specific GitHub features. In this article, we present the dataset, describe the extraction process according to the ontology, show some promising analyses of the data and outline how SemanGit could be linked with external datasets or enriched with new sources to allow for more complex analyses.

“SEO: A Scientific Events Data Model” by Said Fathallah, Sahar Vahdati, Christoph Lange, and Sören Auer (Resources track).
Topic: Data Modelling

Abstract: Scientific events have become a key factor of scholarly com- munication for many scientific domains. They are considered as the focal point for establishing scientific relations between scholarly objects such as people (e.g., chairs, participants), places (e.g., location), actions (e.g., roles of participants), and artifacts (e.g., proceedings) in the scholarly communication domain. Metadata of scientific events have been made available in unstructured or semi-structured formats, which hides the interconnected and complex relationships between them and prevents transparency. To facilitate the management of such metadata, the repres- entation of event-related information in an interoperable form requires a uniform conceptual modeling. The Scientific Events Ontology (OR-SEO) has been engineered to represent metadata of scientific events. We describe a systematic redesign of the information model that is used as a schema for the event pages of the OpenResearch.org community wiki, reusing well-known vocabularies to make OR-SEO interoperable in different contexts. OR-SEO is now in use on thousands of Open- Research.org events pages, which enables users to represent structured knowledge about events without tackling technical implementation challenges and ontology development.

“The KEEN Universe: An Ecosystem for Knowledge Graph Embeddings with a Focus on Reproducibility and Transferability” by Mehdi Ali, Hajira Jabeen, Charles Tapley Hoyt and Jens Lehmann (Resources track).
Topic: KG Embeddings

Abstract: There is an emerging trend of embedding knowledge graphs (KGs) in continuous vector spaces in order to use those for machine learning tasks. Recently, many knowledge graph embedding (KGE) models have been proposed that learn low dimensional representations while trying to maintain the structural properties of the KGs such as the similarity of nodes depending on their edges to other nodes. KGEs can be used to address tasks within KGs such as the prediction of novel links and the disambiguation of entities. They can also be used for downstream tasks like question answering and fact-checking. Overall, these tasks are relevant for the semantic web community. Despite their popularity, the reproducibility of KGE experiments and the transferability of proposed KGE models to research fields outside the machine learning community can be a major challenge. Therefore, we present the KEEN Universe, an ecosystem for knowledge graph embeddings that we have developed with a strong focus on reproducibility and transferability. The KEEN Universe currently consists of the Python packages PyKEEN (Python KnowlEdge Graph EmbeddiNgs), BioKEEN (Biological KnowlEdge Graph EmbeddiNgs), and the KEEN Model Zoo for sharing trained KGE models with the community.

“DBpedia FlexiFusion – Best of Wikipedia > Wikidata > Your Data” by Johannes Frey, Marvin Hofer, Daniel Obraczka, Jens Lehmann and Sebastian Hellmann (Resources track).
Topic: Data Integration

Abstract: The data quality improvement of DBpedia has been in the focus of many publications in the past years with topics covering both knowledge enrichment techniques such as type learning, taxonomy generation, interlinking as well as error detection strategies such as property or value outlier detection, type checking, ontology constraints, or unit-tests, to name just a few. The concrete innovation of the DBpedia FlexiFusion workflow, leveraging the novel DBpedia PreFusion dataset, which we present in this paper, is to massively cut down the engineering workload to apply any of the vast methods available in shorter time and also make it easier to produce customized knowledge graphs or DBpedias. While FlexiFusion is flexible to accommodate other use cases, our main use case in this paper is the generation of richer, language-specific DBpedias for the 20+ DBpedia chapters, which we demonstrate on the Catalan DBpedia. In this paper, we define a set of quality metrics and evaluate them for Wikidata and DBpedia datasets of several language chapters. Moreover, we show that an implementation of FlexiFusion, performed on the proposed PreFusion dataset, increases data size, richness as well as quality in comparison to the source datasets.

“Pretrained Transformers for Simple Question Answering” by Denis Lukovnikov, Asja Fischer and Jens Lehmann (Research track).
Topic: QA

Abstract: Answering simple questions over knowledge graphs is a well-studied problem in question answering. Previous approaches for this task built on recurrent and convolutional neural networks (RNNs and CNNs) based architectures that use pretrained word embeddings. It was recently shown that a pretrained transformer network (BERT) can outperform RNN- and CNN based approaches on various natural language processing tasks. In this work, we investigate how well network BERT performs on the entity span prediction and relation prediction subtasks of simple QA. In addition, we provide an evaluation of both BERT and BiLSTM-based models in limited data scenarios.

“LC-QuAD 2.0: A large dataset for complex question answering over Wikidata and DBpedia” by Mohnish Dubey, Debayan Banerjee, Abdelrahman Abdelkawi and Jens Lehmann (Resources track).
Topic: QA

Abstract: Providing machines with the capability of exploring knowledge graphs and answering natural language questions has been an active area of research over the past decade. In this direction translating natural language questions to formal queries has been one of the key approaches. To advance the research area, several datasets like WebQuestions, QALD and LCQuAD have been published in the past. The biggest data set available for complex questions (LCQuAD) over knowledge graphs contains five thousand questions. We now provide LC-QuAD 2.0 (Large-Scale Complex Question Answering Dataset) with 30,000 questions, their paraphrases and their corresponding SPARQL queries. LC-QuAD 2.0 is compatible with both Wikidata and DBpedia 2018 knowledge graphs. In this article, we explain how the dataset was created and the variety of questions available with corresponding examples. We further provide a statistical analysis of the dataset.

“Non-Goal Oriented Dialogues using KG-Copy Networks” by Debanjan Chaudhuri, Md Rashad Al Hasan Rony, Simon Kwoczek and Jens Lehmann (Research track).
Topic: Dialogue Systems / QA

Abstract: Non-goal oriented, generative dialogue systems lack the ability to generate answers with grounded facts. A knowledge graph can be considered an abstraction of the real world consisting of well-grounded facts. This paper addresses the problem of generating well grounded responses by integrating knowledge graphs into the dialogue systems response generation process, in an end-to-end manner. A dataset for non-goal oriented dialogues is proposed in this paper in the domain of soccer, conversing on different clubs and national teams along with a knowledge graph for each of these teams. A novel neural network architecture is also proposed as a baseline on this dataset, which can integrate knowledge graphs into the response generation process, producing well articulated, knowledge grounded responses. Empirical evidence suggests that the proposed model performs better than other state-of-the-art models for knowledge graph integrated dialogue systems.

“Learning to Rank Query Graphs for Complex Question Answering over Knowledge Graphs” by Gaurav Maheshwari, Priyansh Trivedi, Denis Lukovnikov, Nilesh Chakraborty, Asja Fischer, and Jens Lehmann (Research track).
Topic: KGQA

Abstract: In this paper, we conduct an empirical investigation of neural query graph ranking approaches for the task of complex question answering over knowledge graphs. We propose a novel self-attention based slot matching model which exploits the inherent structure of query graphs, our logical form of choice. Our proposed model generally outperforms other ranking models on two QA datasets over the DBpedia knowledge graph, evaluated in different settings. We also show that domain adaption and pre-trained language model based transfer learning yield improvements, effectively offsetting the general lack of training data.

Acknowledgment

This work was partly supported by the EU Horizon2020 projects BigDataOcean (GA no. 732310), Boost4.0 (GA no. 780732), QROWD (GA no. 723088), SLIPO (GA no. 731581), BETTER (GA 776280), QualiChain (GA 822404), CLEOPATRA (GA no. 812997), LIMBO (Grant no. 19F2029I), OPAL (no. 19F2028A), KnowGraphs (no. 860801), SOLIDE (no. 13N14456), Bio2Vec (grant no. 3454), LAMBDA (#809965), FAIRplus (#802750), the ERC project ScienceGRAPH (#819536), “Industrial Data Space Plus” (GA 01IS17031), Fraunhofer Cluster of Excellence “Cognitive Internet Technologies” (CCIT), “InclusiveOCW” (grant no. 01PE17004D), the German national funded BmBF project MLwin, the National Natural Science Foundation of China (61673304) and the Key Projects of National Social Science Foundation of China(11&ZD189), EPSRC grant EP/M025268/1, WWTF grant VRG18-013, WMF-funded GlobalFactSync project, and by the ADAPT Centre for Digital Content Technology funded under the SFI Research Centres Programme (Grant 13/RC/2106) and co-funded under the European Regional Development Fund.

Looking forward to seeing you at The ISWC 2019.

Poster and Workshop papers accepted at ESWC 2019

2019-05-142019-05-14GezimSejdiu

We are very pleased to announce that our group got 2 poster papers accepted for presentation at the ESWC 2019 : The 16th edition of the The Extended Semantic Web Conference, which will be held on June 2-6, 2019 in Portorož, Slovenia.

The ESWC is a major venue for discussing the latest scientific results and technology innovations around semantic technologies. Building on its past success, ESWC is seeking to broaden its focus to span other relevant related research areas in which Web semantics plays an important role. ESWC 2019 will present the latest results in research, technologies and applications in its field. Besides the technical program organized over twelve tracks, the conference will feature a workshop and tutorial program, a dedicated track on Semantic Web challenges, system descriptions and demos, a posters exhibition and a doctoral symposium.

Here is the pre-print of the accepted papers with their abstract:

“Clustering Pipelines of large RDF POI Data” by Rajjat Dadwal, Damien Graux, Gezim Sejdiu, Hajira Jabeen, and Jens Lehmann

Abstract: Among the various domains involved in large RDF graphs, applications may rely on geographical information which are often carried and presented via Points Of Interests. In particular, one challenge aims at extracting patterns from POIs sets to discover Areas Of Interest. To tackle it, an usual method is to aggregate various points according to specific distances (e.g. geographical) via clustering algorithms. In this study, we present a flexible architecture to design pipelines able to aggregate POIs from contextual to geographical dimensions in a single run. This solution allows any kind of state-of-the-art clustering algorithm combinations to compute AOIs and is built on top of a Semantic Web stack which allows multiple-source querying and filtering through SPARQL.

“OECM: A Cross-lingual Approach for Ontology Enrichment” by Shimaa Ibrahim, Said Fathalla, Hamed Shariat Yazdi, Jens Lehmann, and Hajira Jabeen

Abstract: Due to the rapid expansion of multilingual data on the web, developing ontology enrichment approaches has become an interesting and active subject of research. In this paper, we propose a cross-lingual matching approach for ontology enrichment (OECM) in order to enrich an ontology using another one in a different natural language. A prototype for the proposed approach has been implemented and evaluated using the MultiFarm benchmark. Evaluation results are promising and show high precision and recall compared to state-of-the-art approaches.

Furthermore, we got a paper accepted at LASCAR Workshop: 1st Workshop on Large Scale RDF Analytics, co-located with the ESWC 2019.

LASCAR Workshop seeks original “articles and posters” describing theoretical and practical methods as well as techniques for performing scalable analytics on knowledge graphs.

Here is the pre-print of the accepted paper with its abstract:

“Towards Enforceable Usage Policies for Industry 4.0” by Sebastian Bader and Maria Maleshkova.

Abstract: Controlling the usage of business-critical data is essential for every company. While the upcoming age of Industry 4.0 propagates a seamless data exchange between all participating devices, facilities and companies along the production chain, the required data control mechanisms are lacking behind. We claim that for an effective protection, both access and usage control enforcement is a must-have for organizing Industry 4.0 collaboration networks. Formalized and machine-readable policies are one fundamental building block to achieve the needed trust level for real data-driven collaborations. We explain the current challenges of specifying access and usage control policies and outline respective approaches relying on Semantic Web of Things practices. We analyze the requirements and implications of existing technologies and discuss their shortcomings. Based on our experiences from the specification of the International Data Spaces Usage Control Language, the necessary next steps towards automatically monitored and enforced policies are outlined and research needs formulated.

Acknowledgment

This work was partially funded by the European H2020 SLIPO project (GA. 731581).

Looking forward to seeing you at The ESWC 2019.

Workshop proposal accepted at ECML 2019

2019-05-082019-05-08GezimSejdiu

We are pleased to announce a new workshop on “New Trends in Representation Learning with Knowledge Graphs” which will be included in the program of ECML 2019: The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Database. ECML 2019 will take place in Würzburg, Germany, from the 16th to the 20th of September 2019.

This conference is the premier European machine learning and data mining conference and builds upon over 17 years of successful events and conferences held across Europe.

Title: New Trends in Representation Learning with Knowledge Graphs

Organizers:

Volker Tresp (Ludwig-Maximilians University and Siemens, Germany)
Jens Lehmann (Bonn University and Fraunhofer IAIS, Germany)
Aditya Mogadala (Saarland University, Germany)
Achim Rettinger (Trier University, Germany)
Afshin Sadeghi (Fraunhofer IAIS and University of Bonn, Germany)
Mehdi Ali (University of Bonn and Fraunhofer IAIS, Germany)

Website: https://sites.google.com/view/kgrlfr-workshop/home

Abstract Knowledge Graphs are becoming the standard for storing, retrieving and querying structured data. In academia and in industry, they are increasingly used to provide background knowledge. Over the last years, several research contributions are made to show machine learning especially representation learning is successfully applied to knowledge graphs enabling inductive inference about facts with unknown truth values. In this workshop we intend to focus on the most exciting new developments in Knowledge Graph learning, bridging the gap to recent developments in different fields. Also, we want to bring together researchers from different disciplines but united by their adoption of earlier mentioned techniques from machine learning.

Invited speakers:

Maximilian Nickel (Facebook AI Research, USA)
Mathias Niepert (NEC Labs Europe, Germany)

Looking forward to seeing you at The ECML/PKDD 2019.

Paper accepted at ICWE 2019

2019-04-292019-04-29GezimSejdiu

We are very pleased to announce that our group got a paper accepted for presentation at the ICWE 2019: The 19th International Conference on Web Engineering, which will be held on June 11 – 14, 2019 / Daejeon Convention Center (DCC), Daejeon, Korea. The ICWE is the prime yearly international conference on the different aspects of designing, building, maintaining and using Web applications.

The conference will cover the different aspects of Web Engineering, including the design, creation, maintenance and usage of Web applications. ICWE aims to bring together researchers and practitioners from various disciplines in academia and industry to tackle the emerging challenges in the engineering of Web applications and in the problems of its associated technologies, as well as the impact of those technologies on society, media and culture.

Here is the accepted paper with its abstract:

“Jekyll RDF: Template-Based Linked Data Publication with Minimized Effort and Maximum Scalability” by Natanael Arndt, Sebastian Zänker, Gezim Sejdiu, and Sebastian Tramp

Abstract: ver the last decades the Web has evolved from a human–human communication network to a network of complex human–machine interactions. An increasing amount of data is available as Linked Data which allows machines to “understand” the data, but RDF is not meant to be understood by humans. With Jekyll RDF we present a method to close the gap between structured data and human accessible exploration interfaces by publishing RDF datasets as customizable static HTML sites. It consists of an RDF resource mapping system to serve the resources under their respective IRI, a template mapping based on schema classes, and a markup language to define templates to render customized resource pages. Using the template system, it is possible to create domain specific browsing interfaces for RDF data next to the Linked Data resources. This enables content management and knowledge management systems to serve datasets in a highly customizable, low effort, and scalable way to be consumed by machines as well as humans.

Looking forward to seeing you at The ICWE 2019.

Dr. Philipp Senger and Dr. Daniel Burkow from Bayer Crop Science visit SDA

2019-04-242019-04-24GezimSejdiu

Dr. Philipp Senger and Dr. Daniel Burkow from Bayer Crop Science visited the SDA group on April 17, 2019.

On Wednesday, SDA had visitors from Bayer: Philipp Senger is the Head of CLS (Computational Life Science) Translational R&D at Bayer Crop Science. His main research interests comprise data science, semantic modeling, knowledge graphs (KGs), natural language processing and machine learning. He together with his team apply their research results to develop digital products and services. Philipp earned a Ph.D. in computer science from the Eberhard-Karls-Universität Tübingen in cooperation with the Robert Bosch GmbH where he investigated data-based methods for automatic generation of electrical behavior models in the automotive industry.

Daniel Burkow is a Post-Doctoral researcher at CLS Translational R&D at Bayer Crop Science.
His main research interests comprise applied mathematics for life sciences, machine learning, data modeling and knowledge graphs. At Translational R&D at Bayer Crop Science, he works on the development of machine learning applications and data modeling approaches.
Daniel earned his Ph.D. from Arizona State University in the field of Applied Mathematics for Life Sciences in which he examined intramyocellular lipids and the progression of muscular insulin resistance.

Dr. Senger and Dr. Burkow were invited at the bi-weekly “SDA colloquium presentations” where they presented Bayer Crop Science and their main research topics. The goal of the visit was to exchange experience and ideas on combining machine learning approaches with knowledge graphs. In particular, they explained how field trials are modeled at Bayer Crop Science and presented a recent use case in which knowledge graph embeddings have been applied to get more insights about their experimental data.

During the meeting, SDA’s core research topics and main research projects have been presented and future collaborations between Bayer Crop Science R&D and SDA has been discussed. Especially, the application of reasoning and inference methods on large scale KGs describing field experiments, the development of KGE embedding models that incorporate literals (e.g., numerical values) and the application of question answering have been discussed.

We are looking forward to future collaborations with Bayer Crop Science.

Paper accepted at NAACL 2019

2019-04-152019-04-15GezimSejdiu

We are very pleased to announce that our group got a paper accepted for presentation at The 2019 edition of The NAACL conference, which will be held on June 2–7, 2019 Minneapolis, USA.

NAACL aims to bring together researchers interested in the design and study of natural language processing technology as well as its applications to new problem areas. With this goal in mind, the 2019 edition invites the submission of long and short papers on creative, substantial and unpublished research in all aspects of computational linguistics. It covers a diverse technical program–in addition to traditional research results, papers may present negative findings, survey an area, announce the creation of a new resource, argue a position, report novel linguistic insights derived using existing techniques, and reproduce, or fail to reproduce, previous results.

Here is the pre-print of the accepted paper with its abstract:

Old is Gold: Linguistic Driven Approach for Entity and Relation Linking of Short Text by Ahmad Sakor, Isaiah Onando Mulang’, Kuldeep Singh, Saeedeh Shekarpour, Maria Esther Vidal, Jens Lehmann, and Sören Auer.

Abstract: Short texts challenge NLP tasks such as named entity recognition, disambiguation, linking and relation inference because they do not provide sufficient context or are partially malformed (e.g. wrt. capitalization, long tail entities, implicit relations). In this work, we present the Falcon approach which effectively maps entities and relations within a short text to its mentions of a background knowledge graph. Falcon overcomes the challenges of short text using a light-weight linguistic approach relying on a background knowledge graph. Falcon performs joint entity and relation linking of a short text by leveraging several fundamental principles of English morphology (e.g. compounding, headword identification) and utilizes an extended knowledge graph created by merging entities and relations from various knowledge sources. It uses the context of entities for finding relations and does not require training data. Our empirical study using several standard benchmarks and datasets show that Falcon significantly outperforms state-of-the-art entity and relation linking for short text query inventories.

Acknowledgment

This work was partially funded by the Fraunhofer IAIS, and EU H2020 project IASIS.

Looking forward to seeing you at The NAACL 2019 conference.

Paper accepted at EvoStar 2019

2019-04-022019-04-02GezimSejdiu

We are very pleased to announce that our group got a paper accepted for presentation at the EvoStar 2019: The Leading European Event on Bio-Inspired Computation, which will be held on 24-26 April 2019, Leipzig, Germany.

EvoStar comprises of four co-located conferences run each spring at different locations throughout Europe. These events arose out of workshops originally developed by EvoNet, the Network of Excellence in Evolutionary Computing, established by the Information Societies Technology Programme of the European Commission, and they represent a continuity of research collaboration stretching back over 20 years.

Our paper got accepted at the EvoMUSART, the 8th International Conference (and 13th European event) on Evolutionary and Biologically Inspired Music, Sound, Art and Design.

The main goal of evoMUSART 2019 is to bring together researchers who are using Computational Intelligence techniques for artistic tasks, providing the opportunity to promote, present and discuss ongoing work in the area.

Here is the accepted paper with its abstract:

“EvoChef: Show me What to Cook! Artificial Evolution of Culinary Arts” by Hajira Jabeen, Nargis Tahara, Jens Lehmann

Abstract: Computational Intelligence (CI) has proven its artistry in creation of music, graphics, and drawings. EvoChef demonstrates the creativity of CI in artificial evolution of culinary arts. EvoChef takes input from well-rated recipes of different cuisines and evolves new recipes by recombining the instructions, spices, and ingredients. Each recipe is represented as a property graph containing ingredients, their status, spices, and cooking instructions. These recipes are evolved using recombination and mutation operators. The expert opinion (user ratings) has been used as the fitness function for the evolved recipes. It was observed that the overall fitness of the recipes improved with the number of generations and almost all the resulting recipes were found to be conceptually correct. We also conducted a blind-comparison of the original recipes with the EvoChef recipes and the EvoChef was rated to be more innovative. To the best of our knowledge, EvoChef is the first semi-automated, open source, and valid recipe generator that creates easily to follow, and novel recipes.

Acknowledgment

This work was partially funded by the EU H2020 project Big Data Ocean (Gr. No 732310).

Looking forward to seeing you at The EvoStar 2019.

Papers, workshop and tutorials accepted at ESWC 2019

2019-03-282019-03-28GezimSejdiu

We are very pleased to announce that our group got 2 papers accepted for presentation at the ESWC 2019: The 16th edition of The Extended Semantic Web Conference, which will be held on June 2-6, 2019 in Portorož, Slovenia.

Here are the pre-prints of the accepted papers with their abstract:

Incorporating Joint Embeddings into Goal-Oriented Dialogues with Multi-Task Learning by Firas Kassawat, Debanjan Chaudhuri, and Jens Lehmann

Abstract: Attention-based encoder-decoder neural network models have recently shown promising results in goal-oriented dialogue systems. However, these models struggle to reason over and incorporate state-full knowledge while preserving their end-to-end text generation functionality. Since such models can greatly benefit from user intent and knowledge graph integration, in this paper we propose an RNN-based end-to-end encoder-decoder architecture which is trained with joint embeddings of the knowledge graph and the corpus as input. The model provides an additional integration of user intent along with text generation, trained with multi-task learning paradigm along with an additional regularization technique to penalize generating the wrong entity as output. The model further incorporates a Knowledge graph entity lookup during inference to guarantee the generated output is state-full based on the local knowledge graph provided. We finally evaluated the model using the BLEU score, empirical evaluation depicts that our proposed architecture can aid in the betterment of task-oriented dialogue system‘s performance.

EVENTSKG: A 5-Star Dataset of Top-ranked Events in Eight Computer Science Communities by Said Fathalla, Christoph Lange, and Sören Auer

Abstract: Nowadays the organization of scientific events, as well as submission and publication of papers, has become considerably easier than before. Consequently, metadata of scientific events is increasingly available on the Web, albeit often as raw data in various formats, immolating its semantics and interlinking relations. This leads to restricting the usability of this data for, e.g., subsequent analyses and reasoning. Therefore, there is a pressing need to represent this data in a semantic representation, i.e., Linked Data. We present the new release of the EVENTSKG dataset, comprising comprehensive semantic descriptions of scientific events of eight computer science communities. Currently, EVENTSKG is a 5-star dataset containing metadata of 73 top-ranked event series (about 1,950 events in total) established over the last five decades. The new release is a Linked Open Dataset adhering to an updated version of the SEO Scientific Events Ontology, a reference ontology for event metadata representation, leading to richer and cleaner data. To facilitate the maintenance of EVENTSKG and to ensure its sustainability, EVENTSKG is coupled with a Java API that enables users to create/update events metadata without going into the details of the representation of the dataset. We shed light on events characteristics by demonstrating an analysis of the EVENTSKG data, which provides a flexible means for customization in order to better understand the characteristics of top-ranked CS events.

Acknowledgment

This work was partly supported by the European Union‘s Horizon 2020 funded projects WDAqua (grant no. 642795), ScienceGRAPH project (GA no.~819536), and Cleopatra (grant no. 812997), as well as the BmBF funded project Simple-ML.

Furthermore, we are pleased to inform that we got a workshop and two tutorials accepted, which will be co-located with the ESWC 2019.

Here is the accepted workshop and tutorials with their short description:

Workshops
- 1st Workshop on Large Scale RDF Analytics (LASCAR-19) by Hajira Jabeen, Damien Graux, Gezim Sejdiu, Muhammad Saleem and Jens Lehmann.
  Abstract: This workshop on Large Scale RDF Analytics (LASCAR) invites papers and posters related to the problems faced when dealing with the enormous growth of linked datasets, and by the advancement of semantic web technologies in the domain of large scale and distributed computing. LASCAR particularly welcomes research efforts exploring the use of generic big data frameworks like Apache Spark, Apache Flink, or specialized libraries like Giraph, Tinkerpop, SparkSQL etc. for Semantic Web technologies. The goal is to demonstrate the use of existing frameworks and libraries to exploit Knowledge Graph processing and to discuss the solutions to the challenges and issues arising therein. There will be a keynote by an expert speaker, and a panel discussion among experts and scientists working in the area of distributed semantic analytics. LASCAR targets a range of interesting research areas in large scale processing of Knowledge Graphs, like querying, inference, and analytics, therefore we expect a wider audience interested in attending the workshop.
Tutorials
- SANSA’s Leap of Faith: Scalable RDF and Heterogeneous Data Lakes by Hajira Jabeen, Mohamed Nadjib Mami, Damien Graux, Gezim Sejdiu, and Jens Lehmann.
  Abstract: Scalable processing of Knowledge Graphs (KG) is an important requirement for today’s KG engineers. Scalable Semantic Analytics Stack (SANSA) is a library built on top of Apache Spark and it offers several APIs tackling various facets of scalable KG processing. SANSA is organized into several layers: (1) RDF data handling e.g. filtering, computation of RDF statistics, and quality assessment (2) SPARQL querying (3) inference reasoning (4) analytics over KGs. In addition to processing native RDF, SANSA also allows users to query a wide range of heterogeneous data sources (e.g. files stored in Hadoop or other popular NoSQL stores) uniformly using SPARQL. This tutorial, aims to provide an overview, detailed discussion, and a hands-on session on SANSA, covering all the aforementioned layers using simple use-cases.
- Build a Question Answering system overnight by Denis Lukovnikov, Gaurav Maheshwari, Jens Lehmann, Mohnish Dubey and Priyansh Trivedi
  Abstract: With this tutorial, we aim to provide the participants with an overview of the field of Question Answering over Knowledge Graphs, insights into commonly faced problems, its recent trends, and developments. In doing so, we hope to provide a suitable entry point for the people new to this field and ease their process of making informed decisions while creating their own QA systems. At the end of the tutorial, the audience would have hands-on experience of developing a working deep learning based QA system.

Looking forward to seeing you at The ESWC 2019.