Paper Published in IEEE Access

We are happy to announce that our paper “Link Prediction of Weighted Triples for Knowledge Graph Completion Within the Scholarly Domain” has been published in IEEE Access. IEEE Access publishes articles that are of high interest to readers: original, technically correct, and clearly presented. The scope of this journal comprises all IEEE’s fields of interest, emphasizing applications-oriented and interdisciplinary articles.

Here is the abstract and the link to the paper:

Link Prediction of Weighted Triples for Knowledge Graph Completion Within the Scholarly Domain
By Mojtaba Nayyeri, Gökce Müge Cil, Sahar Vahdati, Francesco Osborne, Andrey Kravchenko, Simone Angioni, Angelo Salatino, Diego Reforgiato Recupero, Enrico Motta, and Jens Lehmann.
Abstract Knowledge graphs (KGs) are widely used for modeling scholarly communication, performing scientometric analyses, and supporting a variety of intelligent services to explore the literature and predict research dynamics. However, they often suffer from incompleteness (e.g., missing affiliations, references, research topics), leading to a reduced scope and quality of the resulting analyses. This issue is usually tackled by computing knowledge graph embeddings (KGEs) and applying link prediction techniques. However, only a few KGE models are capable of taking weights of facts in the knowledge graph into account. Such weights can have different meanings, e.g. describe the degree of association or the degree of truth of a certain triple. In this paper, we propose the Weighted Triple Loss, a new loss function for KGE models that takes full advantage of the additional numerical weights on facts and it is even tolerant to incorrect weights. We also extend the Rule Loss, a loss function that is able to exploit a set of logical rules, in order to work with weighted triples. The evaluation of our solutions on several knowledge graphs indicates significant performance improvements with respect to the state of the art. Our main use case is the large-scale AIDA knowledge graph, which describes 21 million research articles. Our approach enables to complete information about affiliation types, countries, and research topics, greatly improving the scope of the resulting scientometrics analyses and providing better support to systems for monitoring and predicting research dynamics.

Papers Accepted At EMNLP21

We are very pleased to announce that our group got three papers accepted for presentation at EMNLP21. Empirical Methods in Natural Language Processing (EMNLP) is a leading conference in the area of natural language processing and artificial intelligence. Along with the Association for Computational Linguistics (ACL), it is one of the two primary high impact conferences for natural language processing research.

Here are the abstracts and the links to the papers:

  • Time-aware Graph Neural Networks for Entity Alignment between Temporal Knowledge Graphs
    By Chengjin Xu, Fenglong Su, and Jens Lehmann.
    Abstract Entity alignment aims to identify equivalent entity pairs between different knowledge graphs (KGs). Recently, the availability of temporal KGs (TKGs) that contain time information created the need for reasoning over time in such TKGs. Existing embedding-based entity alignment approaches disregard time information that commonly exists in many large-scale KGs, leaving much room for improvement. The figure illustrates the limitation of the existing time-agnostic embedding-based entity alignment approaches. Given two entities, George H. W. Bush and George Walker Bush, existing in two TKGs respectively, time-agnostic embedding-based approaches are likely to ignore time information and wrongly recognize these two entities as the same person in the real world due to the homogeneity of their neighborhood information. In this paper, we focus on the task of aligning entity pairs between TKGs and propose a novel Time-aware Entity Alignment approach based on Graph Neural Networks (TEA-GNN). We embed entities, relations and timestamps of different KGs into a vector space and use GNNs to learn entity representations. To incorporate both relation and time information into the GNN structure of our model, we use a time-aware attention mechanism which assigns different weights to different nodes with orthogonal transformation matrices computed from embeddings of the relevant relations and timestamps in a neighborhood. Experimental results on multiple real-world TKG datasets show that our method significantly outperforms the state-of-the-art methods due to the inclusion of time information. Our datasets and source code are available at https://github.com/soledad921/TEA-GNN
  • Knowledge Graph Representation Learning using Ordinary Differential Equations
    By Mojtaba Nayyeri, Chengjin Xu, Franca Hoffmann, Mirza Mohtashim Alam, Jens Lehmann, and Sahar Vahdati.
    Abstract Knowledge Graph Embeddings (KGEs) have shown promising performance on link prediction tasks by mapping the entities and relations from a knowledge graph into a geometric space. The capability of KGEs in preserving graph characteristics including structural aspects and semantics, highly depends on the design of their score function, as well as the inherited abilities from the underlying geometry. Many KGEs use the Euclidean geometry which renders them incapable of preserving complex structures and consequently causes wrong inferences by the models. To address this problem, we propose a neuro differential KGE that embeds nodes of a KG on the trajectories of Ordinary Differential Equations (ODEs). To this end, we represent each relation (edge) in a KG as a vector field on several manifolds. We specifically parameterize ODEs by a neural network to represent complex manifolds and complex vector fields on the manifolds. Therefore, the underlying embedding space is capable of assuming the shape of various geometric forms to encode heterogeneous subgraphs. Experiments on synthetic and benchmark datasets using state-of-the-art KGE models justify the ODE trajectories as a means to enable structure preservation and consequently avoiding wrong inferences.
  • Proxy Indicators for the Quality of Open-domain Dialogues
    By Rostislav Nedelchev, Jens Lehmann, and Ricardo Usbeck.
    Abstract The automatic evaluation of open-domain dialogues remains a largely unsolved challenge. Despite the abundance of work done in the field, human judges have to evaluate dialogues’ quality. As a consequence, performing such evaluations at scale is usually expensive. This work investigates using a deep-learning model trained on the General Language Understanding Evaluation (GLUE) benchmark to serve as a quality indication of open-domain dialogues. The aim is to use the various GLUE tasks as different perspectives on judging the quality of conversation, thus reducing the need for additional training data or responses that serve as quality references. Due to this nature, the method can infer various quality metrics and can derive a component-based overall score. We achieve statistically significant correlation coefficients of up to 0.7.

Paper Accepted At NAACL21

We are very pleased to announce that our group got a paper accepted for presentation at NAACL21. The North American Chapter of the Association for Computational Linguistics (NAACL) provides a regional focus for members of the Association for Computational Linguistics (ACL) in North America as well as in Central and South America, organizes annual conferences, promotes cooperation and information exchange among related scientific and professional societies, encourages and facilitates ACL membership by people and institutions in the Americas, and provides a source of information on regional activities for the ACL Executive Committee.

Here is the abstract and the link to the paper:

Temporal Knowledge Graph Completion using a Linear Temporal Regularizer and Multivector Embeddings
By Chengjin Xu, Yung-Yu Chen, Mojtaba Nayyeri, and Jens Lehmann.
Abstract Representation learning approaches for knowledge graphs have been mostly designed for static data. However, many knowledge graphs involve evolving data, e.g., the fact (The President of the United States is Barack Obama) is valid only from 2009 to 2017. This introduces important challenges for knowledge representation learning since the knowledge graphs change over time. In this paper, we present a novel time-aware knowledge graph embedding approach, TeLM, which performs 4th-order tensor factorization of a Temporal knowledge graph using a Linear temporal regularizer and Multivector embeddings. Moreover, we investigate the effect of the temporal dataset’s time granularity on temporal knowledge graph completion. Experimental results demonstrate that our proposed models trained with the linear temporal regularizer achieve state-of-the-art performances on link prediction over four well-established temporal knowledge graph completion benchmarks.

Paper Accepted At KEOD21

We are very pleased to announce that our group got a paper accepted for presentation at KEOD21 (International Conference on Knowledge Engineering and Ontology Development). KEOD aims at becoming a major meeting point for researchers and practitioners interested in the study and development of methodologies and technologies for Knowledge Engineering and Ontology Development.

Here is the abstract and the link to the paper:

A Scalable Approach for Distributed Reasoning over Large-scale OWL Datasets
By Heba Mohamed, Said Fathalla, Jens Lehmann, and Hajira Jabeen.
Abstract With the tremendous increase in the volume of semantic data on the Web, reasoning over such an amount of data has become a challenging task. On the other hand, the traditional centralized approaches are no longer feasible for large-scale data due to the limitations of software and hardware resources. Therefore, horizontal scalability is desirable. We develop a scalable distributed approach for RDFS and OWL Horst Reasoning over large-scale OWL datasets. The eminent feature of our approach is that it combines an optimized execution strategy, pre-shuffling method, and duplication elimination strategy, thus achieving an efficient distributed reasoning mechanism. We implemented our approach as open-source in Apache Spark using Resilient Distributed Datasets (RDD) as a parallel programming model. As a use case, our approach is used by the SANSA framework for large-scale semantic reasoning over OWL datasets. The evaluation results have shown the strength of the proposed approach for both data and node scalability.

Paper Published in TPAMI

We are thrilled to announce that our paper “LogicENN: A Neural Based Knowledge Graphs Embedding Model with Logical Rules” has been published in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). TPAMI publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition and relevant specialized hardware and/or software architectures are also covered.

LogicENN: A Neural Based Knowledge Graphs Embedding Model with Logical Rules
By Mojtaba Nayyeri, Chengjin Xu, Mirza Mohtashim Alam, Jens Lehmann, and Hamed Shariat Yazdi.
Abstract Knowledge graph embedding models have gained significant attention in AI research. The aim of knowledge graph embedding is to embed the graphs into a vector space in which the structure of the graph is preserved. Recent works have shown that the inclusion of background knowledge, such as logical rules, can improve the performance of embeddings in downstream machine learning tasks. However, so far, most existing models do not allow the inclusion of rules. We address the challenge of including rules and present a new neural based embedding model (LogicENN). We prove that LogicENN can learn every ground truth of encoded rules in a knowledge graph. To the best of our knowledge, this has not been proved so far for the neural based family of embedding models. Moreover, we derive formulae for the inclusion of various rules, including (anti-)symmetric, inverse, irreflexive and transitive, implication, composition, equivalence, and negation. Our formulation allows avoiding grounding for implication and equivalence relations. Our experiments show that LogicENN outperforms the existing models in link prediction.

Paper Accepted At Semantics 2021

We are very pleased to announce that our group got a paper accepted for presentation at SEMANTICS21. The SEMANTiCS conference is the leading European conference on Semantic Technologies and AI. Researchers, industry experts and business leaders can develop a thorough understanding of trends and application scenarios in the fields of Machine Learning, Data Science, Linked Data and Natural Language Processing.

Here is the abstract and the link to the paper:

Literal2Feature: An Automatic Scalable RDF Graph Feature Extractor
By Farshad Bakhshandegan Moghaddam, Carsten Draschner, Jens Lehmann, and Hajira Jabeen.
Abstract The last decades have witnessed significant advancements in terms of data generation, management, and maintenance. This has resulted in vast amounts of data becoming available in a variety of forms and formats including RDF. As RDF data is represented as a graph structure, applying machine learning algorithms to extract valuable knowledge and insights from them is not straightforward, especially when the size of the data is enormous. Although Knowledge Graph Embedding models (KGEs) convert the RDF graphs to low-dimensional vector spaces, these vectors often lack the explainability. On the contrary, in this paper, we introduce a generic, distributed, and scalable software framework that is capable of transforming large RDF data into an explainable feature matrix. This matrix can be exploited in many standard machine learning algorithms. Our approach, by exploiting semantic web and big data technologies, is able to extract a variety of existing features by deep traversing a given large RDF graph. The proposed framework is open-source, well-documented, and fully integrated into the active community project Semantic Analytics Stack (SANSA). The experiments on real-world use cases disclose that the extracted features can be successfully used in machine learning tasks like classification and clustering.

Papers Accepted At ECML 21

We are very pleased to announce that our group got two papers accepted for presentation at ECML21. The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases is the premier European machine learning and data mining conference and builds upon over 19 years of successful events and conferences held across Europe.

Here are the abstracts and the links to the papers:

  • Embedding Knowledge Graphs Attentive to Positional and Centrality Qualities
    By Afshin Sadeghi, Diego Collarana, Damien Graux and Jens Lehmann.
    Abstract Knowledge graphs embeddings (KGE) are lately at the center of many artificial intelligence studies due to their applicability for solving downstream tasks, including link prediction and node classification. However, most Knowledge Graph embedding models encode, into the vector space, only the local graph structure of an entity, i.e., information of the 1-hop neighborhood. Capturing not only local graph structure but also global features of entities is crucial for prediction tasks on Knowledge Graphs. This work proposes a novel KGE method named Graph Feature Attentive Neural Network (GFA-NN) that computes graphical features of entities. As a consequence, the resulting embeddings are attentive to two types of global network features. First, nodes’ relative centrality is based on the observation that some of the entities are more “prominent” than the others. Second, the relative position of entities in the graph. GFA-NN computes several centrality values per entity, generates a random set of reference nodes’ entities, and computes a given entity’s shortest path to each entity in the reference set. It then learns this information through optimization of objectives specified on each of these features. We investigate GFA-NN on several link prediction benchmarks in the inductive and transductive setting and show that GFA-NN achieves on-par or better results than state-of-the-art KGE solutions.
  • VOGUE: Answer Verbalization through Multi-Task Learning
    By Endri Kacupaj, Shyamnath Premnadh, Kuldeep Singh, Jens Lehmann and Maria Maleshkova.
    Abstract In recent years, there have been significant developments in Question Answering over Knowledge Graphs (KGQA). Despite all the notable advancements, current KGQA systems only focus on answer generation techniques and not on answer verbalization. However, in real-world scenarios (e.g., voice assistants such as Alexa, Siri, etc.), users prefer verbalized answers instead of a generated response. This paper addresses the task of answer verbalization for (complex) question answering over knowledge graphs. In this context, we propose a multi-task-based answer verbalization framework: VOGUE (Verbalization thrOuGh mUlti-task lEarning). The VOGUE framework attempts to generate a verbalized answer using a hybrid approach through a multi-task learning paradigm. Our framework can generate results based on using questions and queries as inputs concurrently. VOGUE comprises four modules that are trained simultaneously through multi-task learning. We evaluate our framework on existing datasets for answer verbalization, and it outperforms all current baselines on both BLEU and METEOR scores.

Paper Published in Neurocomputing

We are happy to announce that our paper “Trans4E: Link Prediction on Scholarly Knowledge Graph” has been published in Neurocomputing. Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.

Here is the abstract and the link to the paper:

Trans4E: Link Prediction on Scholarly Knowledge Graph
By Mojtaba Nayyeri, Gokce Muge Cil, Sahar Vahdati, Francesco Osborne, Mahfuzur Rahman, Simone Angioni, Angelo Salatino, Diego Reforgiato Recupero, Nadezhda Vassilyeva, Enrico Motta and Jens Lehmann.
Abstract The incompleteness of Knowledge Graphs (KGs) is a crucial issue affecting the quality of AI-based services. In the scholarly domain, KGs describing research publications typically lack important information, hindering our ability to analyse and predict research dynamics. In recent years, link prediction approaches based on Knowledge Graph Embedding models became the first aid for this issue. In this work, we present Trans4E, a novel embedding model that is particularly fit for KGs which include N to M relations with N≫M. This is typical for KGs that categorize a large number of entities (e.g., research articles, patents, persons) according to a relatively small set of categories. Trans4E was applied on two large-scale knowledge graphs, the Academia/Industry DynAmics (AIDA) and Microsoft Academic Graph (MAG), for completing the information about Fields of Study (e.g., ’neural networks’, ’machine learning’, ’artificial intelligence’), and affiliation types (e.g., ’education’, ’company’, ’government’), improving the scope and accuracy of the resulting data. We evaluated our approach against alternative solutions on AIDA, MAG, and four other benchmarks (FB15k, FB15k-237, WN18, and WN18RR). Trans4E outperforms the other models when using low embedding dimensions and obtains competitive results in high dimensions.

Papers Accepted At ESWC 2021

We are very pleased to announce that our group got four papers accepted for presentation at ESWC2021. The ESWC is a major venue for discussing the latest scientific results and technology innovations around semantic technologies. Building on its past success, ESWC is seeking to broaden its focus to span other relevant related research areas in which Web semantics plays an important role. The goal of the Semantic Web is to create a Web of knowledge and services in which the semantics of content is made explicit and content is linked to both other content and services allowing novel applications to combine content from heterogeneous sites in unforeseen ways and support enhanced matching between users’ needs and content. This network of knowledge-based functionality will weave together a large network of human knowledge, and make this knowledge machine-processable to support intelligent behaviour by machines. Creating such an interlinked Web of knowledge which spans unstructured text, structured data (e.g. RDF) as well as multimedia content and services requires the collaboration of many disciplines, including but not limited to: Artificial Intelligence, Natural Language Processing, Databases and Information Systems, Information Retrieval, Machine Learning, Multimedia, Distributed Systems, Social Networks, Web Engineering, and Web Science.

Here are the abstracts and the links to the papers:

  • Grounding Dialogue Systems via Knowledge Graph Aware Decoding with Pre-trained Transformers
    By Debanjan Chaudhuri, Md Rashad Al Hasan Rony, and Jens Lehmann.
    Abstract Generating knowledge grounded responses in both goal and non-goal oriented dialogue systems is an important research challenge. Knowledge Graphs (KG) can be viewed as an abstraction of the real world, which can potentially facilitate a dialogue system to produce knowledge grounded responses. However, integrating KGs into the dialogue generation process in an end-to-end manner is a non-trivial task. This paper proposes a novel architecture for integrating KGs into the response generation process by training a BERT model that learns to answer using the elements of the KG (entities and relations) in a multi-task, end-to-end setting. The k-hop subgraph of the KG is incorporated into the model during training and inference using Graph Laplacian. Empirical evaluation suggests that the model achieves better knowledge groundedness (measured via Entity F1 score) compared to other state-of-the-art models for both goal and non-goal oriented dialogues.
  • Context Transformer with Stacked Pointer Networks for Conversational Question Answering over Knowledge Graphs
    By Joan Plepi, Endri Kacupaj, Kuldeep Singh, Harsh Thakkar, and Jens Lehmann.
    Abstract Neural semantic parsing approaches have been widely used for Question Answering (QA) systems over knowledge graphs. Such methods provide the flexibility to handle QA datasets with complex queries and a large number of entities. In this work, we propose a novel framework named CARTON, which performs multi-task semantic parsing for handling the problem of conversational question answering over a large-scale knowledge graph. Our framework consists of a stack of pointer networks as an extension of a context transformer model for parsing the input question and the dialog history. The framework generates a sequence of actions that can be executed on the knowledge graph. We evaluate CARTON on a standard dataset for complex sequential question answering on which CARTON outperforms all baselines. Specifically, we observe performance improvements in F1-score on eight out of ten question types compared to the previous state of the art. For logical reasoning questions, an improvement of 11 absolute points is reached.
  • ParaQA: A Question Answering Dataset with Paraphrase Responses for Single-Turn Conversation
    By Endri Kacupaj, Barshana Banerjee, Kuldeep Singh, and Jens Lehmann.
    Abstract This paper presents ParaQA, a question answering (QA) dataset with multiple paraphrased responses for single-turn conversation over knowledge graphs (KG). The dataset was created using a semi-automated framework for generating diverse paraphrasing of the answers using techniques such as back-translation. The existing datasets for conversational question answering over KGs (single-turn/multi-turn) focus on question paraphrasing and provide only up to one answer verbalization. However, ParaQA contains 5000 question-answer pairs with a minimum of two and a maximum of eight unique paraphrased responses for each question. We complement the dataset with baseline models and illustrate the advantage of having multiple paraphrased answers through commonly used metrics such as BLEU and METEOR. The ParaQA dataset is publicly available on a persistent URI for broader usage and adaptation in the research community.
  • A Virtual Knowledge Graph for Enabling Defect Traceability and Customer Service Analytics
    By Nico Wilhelm, Diego Collarana, and Jens Lehmann.
    Abstract In this paper, we showcase the implementation of a semantic information model and a virtual knowledge graph at ZF Friedrichshafen AG company, with two main goals in mind: 1) integration of heterogeneous data sources following a pay-as-you-go approach; and 2) combination of core domain concepts from ZF’s production line with meta-data of its internal data sources. We employ the developed semantic information model in two use cases, defect traceability and customer service, demonstrating and discussing the benefits and opportunities provided by following an agile semantic virtual integration approach.

Paper Accepted At ECIR 2021

We are very pleased to announce that our group got a paper accepted for presentation at ECIR 2021. The ECIR conference is the premier European forum for the presentation of new research results in the broadly conceived area of Information Retrieval (IR), and has a strong focus on the active participation of early-career researchers. The General Chairs of ECIR 2021 invite researchers, academics, students, and industry leaders working in the field to join us online for a rich program featuring full-paper and poster presentations, system demonstrations, tutorials, workshops, an industry-oriented event, and great social events.

Here is the abstract and the link to the paper:

Pattern-Aware and Noise-Resilient Embedding Models
By Mojtaba Nayyeri, Sahar Vahdati, Emanuel Sallinger, Mirza Mohtashim Alam, Hamed Shariat Yazdi and Jens Lehmann.
Abstract Knowledge Graph Embeddings (KGE) have become an important area of Information Retrieval (IR), in particular as they provide one of the state-of-the-art methods for Link Prediction. Recent work in the area of KGEs has shown the importance of relational patterns, i.e., logical formulas, to improve the learning process of KGE models significantly. In separate work, the role of noise in many knowledge discovery and IR settings has been studied, including the KGE setting. So far, very few papers have investigated the KGE setting considering both relational patterns and noise. Not considering both together can lead to problems in the performance of KGE models. We investigate the effect of noise in the presence of patterns. We show that by introducing a new loss function that is both pattern-aware and noise-resilient, significant performance issues can be solved. The proposed loss function is model-independent which could be applied in combination with different models. We provide an experimental evaluation both on synthetic and real-world cases.