Uncategorized – Smart Data Analytics

Paper Published in IEEE Access

2022-10-24 by Mehdi Ali

We are happy to announce that our paper “An Unsupervised Approach for Question Answering Over Knowledge Graphs” has been published in IEEE Access. IEEE Access publishes articles that are of high interest to readers: original, technically correct, and clearly presented. The scope of this journal comprises all IEEE fields of interest, emphasizing applications-oriented and interdisciplinary articles.

Here is the abstract and the link to the paper:

Tree-KGQA: An Unsupervised Approach for Question Answering Over Knowledge Graphs
By Md Rashad Al Hasan Rony, Debanjan Chaudhuri, Ricardo Usbeck, and Jens Lehmann.

Abstract

Most Knowledge Graph-based Question Answering (KGQA) systems rely on training data to reach their optimal performance. However, acquiring training data for supervised systems is both time-consuming and resource-intensive. To address this, we propose Tree-KGQA, an unsupervised KGQA system leveraging pre-trained language models and tree-based algorithms. Entity and relation linking are essential components of any KGQA system. We employ several pre-trained language models in the entity linking task to recognize the entities mentioned in the question and obtain the contextual representation for indexing. Furthermore, for relation linking, we incorporate a pre-trained language model previously trained on a language inference task. Finally, we introduce a novel algorithm for extracting the answer entities from a KG, where we construct a forest of interpretations and introduce tree-walking and tree disambiguation techniques. Our algorithm uses the linked relation and predicts the tree branches that eventually lead to the potential answer entities. The proposed method achieves 4.5% and 7.1% gains in F1 score in entity linking tasks on the LC-QuAD 2.0 and LC-QuAD 2.0 (KBpearl) datasets, respectively, and a 5.4% increase in the relation linking task on LC-QuAD 2.0 (KBpearl). The comprehensive evaluations demonstrate that our unsupervised KGQA approach outperforms other supervised state-of-the-art methods on the WebQSP-WD test set (1.4% increase in F1 score), without training on the target dataset.
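
The abstract does not spell out the tree-walking algorithm, so the sketch below only illustrates the general idea it builds on: starting from a linked entity and following linked relations hop by hop through a KG. The toy triples and the helpers `build_index` and `walk` are hypothetical, and Tree-KGQA's forest of interpretations and tree disambiguation are not reproduced.

```python
from collections import defaultdict

# Toy KG of (subject, relation, object) triples; all names are hypothetical.
TRIPLES = [
    ("Berlin", "capital_of", "Germany"),
    ("Germany", "member_of", "EU"),
    ("Paris", "capital_of", "France"),
    ("France", "member_of", "EU"),
]


def build_index(triples):
    """Map (subject, relation) to the set of objects reachable in one hop."""
    index = defaultdict(set)
    for s, r, o in triples:
        index[(s, r)].add(o)
    return index


def walk(index, start, relation_path):
    """Follow a sequence of linked relations outward from a linked entity.

    Each hop expands every entity on the current frontier along one relation,
    so the walk branches out from `start`; the final frontier holds the
    candidate answer entities.
    """
    frontier = {start}
    for relation in relation_path:
        # .get() on a defaultdict does not create missing keys
        frontier = {o for e in frontier for o in index.get((e, relation), ())}
    return frontier
```

For example, `walk(build_index(TRIPLES), "Berlin", ["capital_of", "member_of"])` returns `{"EU"}`. Keeping the frontier as a set means converging branches are merged rather than duplicated.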

Paper Accepted At NAACL22

2022-10-12 by Mehdi Ali

We are very pleased to announce that our group got a paper accepted for presentation at NAACL22. The North American Chapter of the Association for Computational Linguistics (NAACL) provides a regional focus for members of the Association for Computational Linguistics (ACL) in North America as well as in Central and South America, organizes annual conferences, promotes cooperation and information exchange among related scientific and professional societies, encourages and facilitates ACL membership by people and institutions in the Americas, and provides a source of information on regional activities for the ACL Executive Committee.

Here is the abstract and the link to the paper:

DialoKG: Knowledge-Structure Aware Task-Oriented Dialogue Generation
By Md Rashad Al Hasan Rony, Ricardo Usbeck, and Jens Lehmann.

Abstract

Task-oriented dialogue generation is challenging since the underlying knowledge is often dynamic and effectively incorporating knowledge into the learning process is hard. It is particularly challenging to generate both human-like and informative responses in this setting. Recent research has primarily focused on various knowledge distillation methods in which the underlying relationship between the facts in a knowledge base is not effectively captured. In this paper, we go one step further and demonstrate how the structural information of a knowledge graph can improve the system’s inference capabilities. Specifically, we propose DialoKG, a novel task-oriented dialogue system that effectively incorporates knowledge into a language model. Our proposed system views relational knowledge as a knowledge graph and introduces (1) a structure-aware knowledge embedding technique, and (2) a knowledge graph-weighted attention masking strategy to help the system select relevant information during dialogue generation. An empirical evaluation demonstrates the effectiveness of DialoKG over state-of-the-art methods on several standard benchmark datasets.
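
DialoKG's knowledge graph-weighted masking is not specified in the abstract. As a simplified, binary stand-in, the sketch below lets an entity attend only to itself and its direct KG neighbours and then applies a masked softmax; the entity names and both helpers are hypothetical, and a weighted variant would scale scores instead of zeroing them.

```python
import math


def graph_mask(query, keys, edges):
    """True where `query` may attend: itself or a direct neighbour in the KG."""
    return [k == query or (query, k) in edges or (k, query) in edges for k in keys]


def masked_softmax(scores, mask):
    """Softmax over `scores` in which masked-out positions get probability 0."""
    if not any(mask):
        raise ValueError("at least one position must be unmasked")
    peak = max(s for s, m in zip(scores, mask) if m)  # subtract max for stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]
```

With `edges = {("hotel_a", "north")}`, `graph_mask("hotel_a", ["hotel_a", "north", "hotel_b"], edges)` gives `[True, True, False]`, and applying `masked_softmax([1.0, 1.0, 5.0], ...)` with that mask gives `[0.5, 0.5, 0.0]`: the high-scoring but unrelated `hotel_b` receives no attention.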

Paper Accepted At ICEIS 2022

2022-06-10 by Mehdi Ali

We are very pleased to announce that our group got a paper accepted for presentation at ICEIS 2022. The purpose of the International Conference on Enterprise Information Systems (ICEIS) is to bring together researchers, engineers and practitioners interested in the advances and business applications of information systems. Six simultaneous tracks will be held, covering different aspects of Enterprise Information Systems Applications, including Enterprise Database Technology, Systems Integration, Artificial Intelligence, Decision Support Systems, Information Systems Analysis and Specification, Internet Computing, Electronic Commerce, Human Factors and Enterprise Architecture.

Here is the abstract and the link to the paper:

Efficient Computation of Comprehensive Statistical Information of Large-scale OWL Dataset: A Scalable Approach
By Heba Mohamed, Said Fathalla, Jens Lehmann, and Hajira Jabeen.

Abstract

Computing dataset statistics is crucial for exploring a dataset's structure; however, it becomes challenging for large-scale datasets. Such statistics have several key benefits, such as link target identification, vocabulary reuse, quality analysis, big data analytics, and coverage analysis. In this paper, we present the first attempt at developing a distributed approach (OWLStats) for collecting comprehensive statistics over large-scale OWL datasets. OWLStats is a distributed in-memory approach for computing 50 statistical criteria for OWL datasets utilizing Apache Spark. We have successfully integrated OWLStats into the SANSA framework. Experimental results demonstrate that OWLStats is linearly scalable in terms of both node and data scalability.
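
To make "statistical criteria" concrete, the sketch below computes a handful of simple statistics over a triple set in plain, single-machine Python. The chosen statistics are illustrative assumptions rather than OWLStats' 50 criteria; in a Spark implementation, each `Counter` would roughly correspond to a map step followed by a `reduceByKey`.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def dataset_statistics(triples):
    """A few simple statistics over (subject, predicate, object) triples."""
    triples = set(triples)  # duplicate triples count once, as in an RDF graph
    property_usage = Counter(p for _, p, _ in triples)
    class_usage = Counter(o for _, p, o in triples if p == RDF_TYPE)
    return {
        "triples": len(triples),
        "distinct_subjects": len({s for s, _, _ in triples}),
        "distinct_properties": len(property_usage),
        "property_usage": dict(property_usage),
        "class_usage": dict(class_usage),
    }
```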

Paper Accepted At ACL22

2022-05-09 by Mehdi Ali

We are very pleased to announce that our group got a paper accepted for presentation at ACL22.
The Association for Computational Linguistics (ACL) is the premier international scientific and professional society for people working on computational problems involving human language, a field often referred to as either computational linguistics or natural language processing (NLP). The association was founded in 1962, originally named the Association for Machine Translation and Computational Linguistics (AMTCL), and became the ACL in 1968. Activities of the ACL include the holding of an annual meeting each summer and the sponsoring of the journal Computational Linguistics, published by MIT Press; this conference and journal are the leading publications of the field.

Here is the abstract and the link to the paper:

RoMe: A Robust Metric for Evaluating Natural Language Generation
By Md Rashad Al Hasan Rony, Liubov Kovriguina, Debanjan Chaudhuri, Ricardo Usbeck, and Jens Lehmann.

Abstract

Evaluating Natural Language Generation (NLG) systems is a challenging task. Firstly, the metric should ensure that the generated hypothesis reflects the reference’s semantics. Secondly, it should consider the grammatical quality of the generated sentence. Thirdly, it should be robust enough to handle various surface forms of the generated sentence. Thus, an effective evaluation metric has to be multifaceted. In this paper, we propose an automatic evaluation metric incorporating several core aspects of natural language understanding (language competence, syntactic and semantic variation). Our proposed metric, RoMe, is trained on language features such as semantic similarity combined with tree edit distance and grammatical acceptability, using a self-supervised neural network to assess the overall quality of the generated sentence. Moreover, we perform an extensive robustness analysis of the state-of-the-art methods and RoMe. Empirical results suggest that RoMe correlates more strongly with human judgment than state-of-the-art metrics do in evaluating system-generated sentences across several NLG tasks.
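
RoMe uses tree edit distance as one of its features, but the abstract does not say which trees are compared or which edit costs apply. The sketch below is the textbook recursive definition of unit-cost ordered tree edit distance, memoized with `functools.lru_cache`; the tuple-based tree encoding is an assumption, and the Zhang-Shasha algorithm is the usual choice for larger trees.

```python
from functools import lru_cache

# A tree is (label, children) with children a tuple of trees; a forest is a
# tuple of trees. Tuples keep everything hashable for memoization.


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost ordered edit distance between forests f and g."""
    if not f and not g:
        return 0
    if not g:  # only deletions remain: delete the rightmost root of f
        _, children = f[-1]
        return forest_distance(f[:-1] + children, g) + 1
    if not f:  # only insertions remain: insert the rightmost root of g
        _, children = g[-1]
        return forest_distance(f, g[:-1] + children) + 1
    (lv, cv), (lw, cw) = f[-1], g[-1]
    return min(
        forest_distance(f[:-1] + cv, g) + 1,  # delete rightmost root of f
        forest_distance(f, g[:-1] + cw) + 1,  # insert rightmost root of g
        # map the two rightmost roots onto each other (relabel if they differ)
        forest_distance(cv, cw) + forest_distance(f[:-1], g[:-1]) + (lv != lw),
    )


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))
```

Here `a(b, c)` and `a(b)` are at distance 1 (delete `c`), and `a(b)` and `a(c)` are also at distance 1 (relabel `b` to `c`).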

Paper Accepted At NeurIPS21

2022-03-08 by Mehdi Ali

We are very pleased to announce that our group got a paper accepted for presentation at NeurIPS21.
The Conference on Neural Information Processing Systems (NeurIPS) was founded in 1987 and is now a multi-track interdisciplinary annual meeting that includes invited talks, demonstrations, symposia, and oral and poster presentations of refereed papers. Alongside the conference are a professional exposition focusing on machine learning in practice, a series of tutorials, and topical workshops that provide a less formal setting for the exchange of ideas.

Here is the abstract and the link to the paper:

Relational Pattern Benchmarking on the Knowledge Graph Link Prediction Task
By Afshin Sadeghi, Hirra Abdul Malik, Diego Collarana, and Jens Lehmann.

Abstract

Knowledge graphs (KGs) encode facts about the world in a graph data structure where entities, represented as nodes, connect via relationships, acting as edges. KGs are widely used in Machine Learning, e.g., to solve Natural Language Processing based tasks. Despite all the advancements in KGs, they fall short when it comes to completeness. Link Prediction based on KG embeddings targets the sparsity and incompleteness of KGs. Available datasets for Link Prediction do not consider different graph patterns, making it difficult to measure the performance of link prediction models in different KG settings. This paper presents a diverse set of pragmatic datasets to facilitate flexible and problem-tailored Link Prediction and Knowledge Graph Embeddings research. We define graph relational patterns ranging from entirely inductive in one set to transductive in the other. For each dataset, we provide uniform evaluation metrics. We analyze the models over our datasets to compare the models’ capabilities on a specific dataset type. Our analysis of datasets over state-of-the-art models provides better insight into the suitable parameters for each situation, optimizing the KG-embedding-based systems.
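
The abstract mentions uniform evaluation metrics without naming them. Mean reciprocal rank (MRR) and Hits@k are the metrics most commonly reported for link prediction, so, assuming they are among those used, the sketch below shows how they are computed from the rank of each true entity. Ties are resolved optimistically here, which is an assumption; filtered ranking (removing other known true triples from the candidates) is not shown.

```python
def rank_of(true_score, candidate_scores):
    """1-based rank of the true entity among the other (corrupted) candidates.

    Higher scores are better; candidates that tie with the true entity do
    not push it down (optimistic tie handling).
    """
    return 1 + sum(1 for s in candidate_scores if s > true_score)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of test queries whose true entity is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)
```

For ranks `[1, 2, 4]`, MRR is (1 + 0.5 + 0.25) / 3, roughly 0.583, and Hits@3 is 2/3.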

Paper Published in IEEE Access

2021-12-29 by Mehdi Ali

We are happy to announce that our paper “Link Prediction of Weighted Triples for Knowledge Graph Completion Within the Scholarly Domain” has been published in IEEE Access. IEEE Access publishes articles that are of high interest to readers: original, technically correct, and clearly presented. The scope of this journal comprises all IEEE’s fields of interest, emphasizing applications-oriented and interdisciplinary articles.

Here is the abstract and the link to the paper:

Link Prediction of Weighted Triples for Knowledge Graph Completion Within the Scholarly Domain
By Mojtaba Nayyeri, Gökce Müge Cil, Sahar Vahdati, Francesco Osborne, Andrey Kravchenko, Simone Angioni, Angelo Salatino, Diego Reforgiato Recupero, Enrico Motta, and Jens Lehmann.

Abstract

Knowledge graphs (KGs) are widely used for modeling scholarly communication, performing scientometric analyses, and supporting a variety of intelligent services to explore the literature and predict research dynamics. However, they often suffer from incompleteness (e.g., missing affiliations, references, research topics), leading to a reduced scope and quality of the resulting analyses. This issue is usually tackled by computing knowledge graph embeddings (KGEs) and applying link prediction techniques. However, only a few KGE models are capable of taking weights of facts in the knowledge graph into account. Such weights can have different meanings, e.g., describe the degree of association or the degree of truth of a certain triple. In this paper, we propose the Weighted Triple Loss, a new loss function for KGE models that takes full advantage of the additional numerical weights on facts and is even tolerant to incorrect weights. We also extend the Rule Loss, a loss function that is able to exploit a set of logical rules, in order to work with weighted triples. The evaluation of our solutions on several knowledge graphs indicates significant performance improvements with respect to the state of the art. Our main use case is the large-scale AIDA knowledge graph, which describes 21 million research articles. Our approach makes it possible to complete information about affiliation types, countries, and research topics, greatly improving the scope of the resulting scientometric analyses and providing better support to systems for monitoring and predicting research dynamics.
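
The abstract does not give the form of the Weighted Triple Loss, so the sketch below is not that loss. It shows a common, generic way to let fact weights in [0, 1] shape a KGE objective: binary cross-entropy with each weight used as a soft target. The function names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_label_bce(scores, weights, eps=1e-12):
    """Mean binary cross-entropy using each triple's weight in [0, 1] as its target.

    `scores` are raw model scores (logits); the loss for a triple is lowest
    when sigmoid(score) equals that triple's weight.
    """
    total = 0.0
    for s, w in zip(scores, weights):
        p = min(max(sigmoid(s), eps), 1.0 - eps)  # keep log() finite
        total += -(w * math.log(p) + (1.0 - w) * math.log(1.0 - p))
    return total / len(scores)
```

A triple with weight 0.5 is cheapest when its score is 0 (plausibility 0.5), whereas a hard-label loss would push every observed triple toward plausibility 1.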

Papers Accepted At EMNLP21

2021-12-27 by Mehdi Ali

We are very pleased to announce that our group got three papers accepted for presentation at EMNLP21. Empirical Methods in Natural Language Processing (EMNLP) is a leading conference in the area of natural language processing and artificial intelligence. Along with the Association for Computational Linguistics (ACL), it is one of the two primary high impact conferences for natural language processing research.

Here are the abstracts and the links to the papers:

Time-aware Graph Neural Networks for Entity Alignment between Temporal Knowledge Graphs
By Chengjin Xu, Fenglong Su, and Jens Lehmann.

Abstract

Entity alignment aims to identify equivalent entity pairs between different knowledge graphs (KGs). Recently, the availability of temporal KGs (TKGs) that contain time information has created the need for reasoning over time in such TKGs. Existing embedding-based entity alignment approaches disregard time information that commonly exists in many large-scale KGs, leaving much room for improvement. The figure illustrates the limitation of the existing time-agnostic embedding-based entity alignment approaches. Given two entities, George H. W. Bush and George Walker Bush, existing in two TKGs respectively, time-agnostic embedding-based approaches are likely to ignore time information and wrongly recognize these two entities as the same person in the real world due to the homogeneity of their neighborhood information. In this paper, we focus on the task of aligning entity pairs between TKGs and propose a novel Time-aware Entity Alignment approach based on Graph Neural Networks (TEA-GNN). We embed entities, relations and timestamps of different KGs into a vector space and use GNNs to learn entity representations. To incorporate both relation and time information into the GNN structure of our model, we use a time-aware attention mechanism which assigns different weights to different nodes with orthogonal transformation matrices computed from embeddings of the relevant relations and timestamps in a neighborhood. Experimental results on multiple real-world TKG datasets show that our method significantly outperforms the state-of-the-art methods due to the inclusion of time information. Our datasets and source code are available at https://github.com/soledad921/TEA-GNN

Knowledge Graph Representation Learning using Ordinary Differential Equations
By Mojtaba Nayyeri, Chengjin Xu, Franca Hoffmann, Mirza Mohtashim Alam, Jens Lehmann, and Sahar Vahdati.

Abstract

Knowledge Graph Embeddings (KGEs) have shown promising performance on link prediction tasks by mapping the entities and relations from a knowledge graph into a geometric space. The capability of KGEs to preserve graph characteristics, including structural aspects and semantics, depends heavily on the design of their score function, as well as the inherited abilities from the underlying geometry. Many KGEs use the Euclidean geometry, which renders them incapable of preserving complex structures and consequently causes wrong inferences by the models. To address this problem, we propose a neuro differential KGE that embeds nodes of a KG on the trajectories of Ordinary Differential Equations (ODEs). To this end, we represent each relation (edge) in a KG as a vector field on several manifolds. We specifically parameterize ODEs by a neural network to represent complex manifolds and complex vector fields on the manifolds. Therefore, the underlying embedding space is capable of assuming the shape of various geometric forms to encode heterogeneous subgraphs. Experiments on synthetic and benchmark datasets using state-of-the-art KGE models justify the ODE trajectories as a means to enable structure preservation and consequently avoid wrong inferences.

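
In the paper, vector fields are parameterized by a neural network; in the sketch below, a hand-written rotation field and a fixed-step explicit Euler solver stand in for both, purely to illustrate what it means for a head and tail entity to lie on a relation's ODE trajectory. The scoring rule, the helper names, and the rotation relation are hypothetical.

```python
import math


def euler_flow(vector_field, x0, t_end, steps):
    """Integrate dx/dt = vector_field(x) from x0 to t_end with explicit Euler steps."""
    dt = t_end / steps
    x = list(x0)
    for _ in range(steps):
        dx = vector_field(x)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x


def trajectory_score(head, tail, vector_field, t_end=1.0, steps=1000):
    """Distance between the head flowed along the relation's field and the tail.

    Lower is better: 0 means the tail lies exactly where the trajectory ends.
    """
    end = euler_flow(vector_field, head, t_end, steps)
    return math.dist(end, tail)


def rotation(x):
    """Hypothetical relation: a rotation field on the plane."""
    return [-x[1], x[0]]
```

Flowing `[1.0, 0.0]` along `rotation` for time pi/2 lands close to `[0.0, 1.0]`, so that pair scores near 0, while a tail at `[1.0, 0.0]` scores about 1.41.
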
Proxy Indicators for the Quality of Open-domain Dialogues
By Rostislav Nedelchev, Jens Lehmann, and Ricardo Usbeck.

Abstract

The automatic evaluation of open-domain dialogues remains a largely unsolved challenge. Despite the abundance of work done in the field, human judges are still needed to evaluate dialogue quality. As a consequence, performing such evaluations at scale is usually expensive. This work investigates using a deep-learning model trained on the General Language Understanding Evaluation (GLUE) benchmark to serve as a quality indication of open-domain dialogues. The aim is to use the various GLUE tasks as different perspectives on judging the quality of conversation, thus reducing the need for additional training data or responses that serve as quality references. Because of this design, the method can infer various quality metrics and derive a component-based overall score. We achieve statistically significant correlation coefficients of up to 0.7.
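
The abstract reports correlation coefficients of up to 0.7 without naming the coefficient or the way component scores are combined. Assuming Pearson's r as an example, the sketch below computes the correlation between a proxy metric and human ratings; the unweighted mean in `overall_score` is a hypothetical combination rule.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def overall_score(component_scores):
    """Hypothetical overall score: unweighted mean of per-task component scores."""
    return sum(component_scores) / len(component_scores)
```

In use, `xs` would hold the proxy metric's score for each dialogue and `ys` the human rating of the same dialogue.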

Paper Accepted At NAACL21

2021-12-22 by Mehdi Ali

We are very pleased to announce that our group got a paper accepted for presentation at NAACL21. The North American Chapter of the Association for Computational Linguistics (NAACL) provides a regional focus for members of the Association for Computational Linguistics (ACL) in North America as well as in Central and South America, organizes annual conferences, promotes cooperation and information exchange among related scientific and professional societies, encourages and facilitates ACL membership by people and institutions in the Americas, and provides a source of information on regional activities for the ACL Executive Committee.

Here is the abstract and the link to the paper:

Temporal Knowledge Graph Completion using a Linear Temporal Regularizer and Multivector Embeddings
By Chengjin Xu, Yung-Yu Chen, Mojtaba Nayyeri, and Jens Lehmann.

Abstract

Representation learning approaches for knowledge graphs have been mostly designed for static data. However, many knowledge graphs involve evolving data, e.g., the fact (The President of the United States is Barack Obama) is valid only from 2009 to 2017. This introduces important challenges for knowledge representation learning since the knowledge graphs change over time. In this paper, we present a novel time-aware knowledge graph embedding approach, TeLM, which performs 4th-order tensor factorization of a Temporal knowledge graph using a Linear temporal regularizer and Multivector embeddings. Moreover, we investigate the effect of the temporal dataset’s time granularity on temporal knowledge graph completion. Experimental results demonstrate that our proposed models trained with the linear temporal regularizer achieve state-of-the-art performance on link prediction over four well-established temporal knowledge graph completion benchmarks.
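
The abstract names a linear temporal regularizer without giving its formula. One natural reading, assumed in the sketch below, is a penalty that pushes consecutive timestamp embeddings to differ by a shared, learnable offset `bias`; the exponent `p` and all names are assumptions, and real embeddings would be tensors rather than lists.

```python
def linear_temporal_regularizer(time_embeddings, bias, p=2):
    """Mean over i of ||T[i+1] - T[i] - bias||_p ** p for time-ordered embeddings.

    Smaller values mean consecutive timestamps differ by roughly the same
    offset `bias`, i.e. the embeddings lie close to a straight line in time.
    """
    pairs = list(zip(time_embeddings, time_embeddings[1:]))
    penalties = [
        sum(abs(n - c - b) ** p for n, c, b in zip(nxt, cur, bias))
        for cur, nxt in pairs
    ]
    return sum(penalties) / len(penalties)
```

Embeddings `[[0, 0], [1, 1], [2, 2]]` give a penalty of 0 with `bias = [1, 1]` and 2.0 with `bias = [0, 0]`.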

Paper Accepted At KEOD21

2021-12-17 by Mehdi Ali

We are very pleased to announce that our group got a paper accepted for presentation at KEOD21 (International Conference on Knowledge Engineering and Ontology Development). KEOD aims at becoming a major meeting point for researchers and practitioners interested in the study and development of methodologies and technologies for Knowledge Engineering and Ontology Development.

Here is the abstract and the link to the paper:

A Scalable Approach for Distributed Reasoning over Large-scale OWL Datasets
By Heba Mohamed, Said Fathalla, Jens Lehmann, and Hajira Jabeen.

Abstract

With the tremendous increase in the volume of semantic data on the Web, reasoning over such an amount of data has become a challenging task. Moreover, traditional centralized approaches are no longer feasible for large-scale data due to the limitations of software and hardware resources. Therefore, horizontal scalability is desirable. We develop a scalable distributed approach for RDFS and OWL Horst reasoning over large-scale OWL datasets. The key feature of our approach is that it combines an optimized execution strategy, a pre-shuffling method, and a duplicate elimination strategy, thus achieving an efficient distributed reasoning mechanism. We implemented our approach as open source in Apache Spark, using Resilient Distributed Datasets (RDDs) as the parallel programming model. As a use case, our approach is used by the SANSA framework for large-scale semantic reasoning over OWL datasets. The evaluation results demonstrate the strength of the proposed approach in terms of both data and node scalability.
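
The paper's execution strategy, pre-shuffling, and Spark RDD implementation are not reproduced here. The sketch below only shows the fixpoint idea behind rule-based reasoning, for two RDFS rules on a single machine, with duplicate elimination done by set difference. Prefixed names such as `rdf:type` stand in for full IRIs, and OWL Horst adds many more rules.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Naive forward chaining for two RDFS rules until no new triple appears.

    rdfs11: (A subClassOf B), (B subClassOf C) -> (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       -> (x type B)
    """
    inferred = set(triples)
    while True:
        sub = {(s, o) for s, p, o in inferred if p == SUBCLASS}
        new = {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, RDF_TYPE, b) for x, p, a in inferred if p == RDF_TYPE
                for a2, b in sub if a == a2}
        new -= inferred  # duplicate elimination: keep only unseen triples
        if not new:  # fixpoint reached
            return inferred
        inferred |= new
```

From `cat subClassOf mammal`, `mammal subClassOf animal`, and `tom type cat`, the closure adds `cat subClassOf animal`, `tom type mammal`, and `tom type animal`, then stops because a further round derives nothing new.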

Paper Published in TPAMI

2021-12-14 by Mehdi Ali

We are thrilled to announce that our paper “LogicENN: A Neural Based Knowledge Graphs Embedding Model with Logical Rules” has been published in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). TPAMI publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition and relevant specialized hardware and/or software architectures are also covered.

LogicENN: A Neural Based Knowledge Graphs Embedding Model with Logical Rules
By Mojtaba Nayyeri, Chengjin Xu, Mirza Mohtashim Alam, Jens Lehmann, and Hamed Shariat Yazdi.

Abstract

Knowledge graph embedding models have gained significant attention in AI research. The aim of knowledge graph embedding is to embed the graphs into a vector space in which the structure of the graph is preserved. Recent works have shown that the inclusion of background knowledge, such as logical rules, can improve the performance of embeddings in downstream machine learning tasks. However, so far, most existing models do not allow the inclusion of rules. We address the challenge of including rules and present a new neural based embedding model (LogicENN). We prove that LogicENN can learn every ground truth of encoded rules in a knowledge graph. To the best of our knowledge, this has not been proved so far for the neural based family of embedding models. Moreover, we derive formulae for the inclusion of various rules, including (anti-)symmetric, inverse, irreflexive and transitive, implication, composition, equivalence, and negation. Our formulation allows avoiding grounding for implication and equivalence relations. Our experiments show that LogicENN outperforms the existing models in link prediction.
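
To make an implication rule concrete, the sketch below computes a grounded hinge penalty for a rule `premise(h, t) => conclusion(h, t)`. This grounded form is exactly what LogicENN's formulation avoids for implication and equivalence, so it only illustrates what the rule constrains, not LogicENN's method; the score function and relation names are hypothetical.

```python
def implication_penalty(score, premise, conclusion, pairs, margin=0.0):
    """Hinge penalty for the rule premise(h, t) => conclusion(h, t).

    `score(rel, h, t)` returns a plausibility (higher = more plausible); a
    grounding (h, t) is penalized when its conclusion scores lower than its
    premise (plus an optional margin).
    """
    return sum(
        max(0.0, score(premise, h, t) - score(conclusion, h, t) + margin)
        for h, t in pairs
    )
```

If `capital_of(berlin, germany)` scores 0.9 but `located_in(berlin, germany)` scores only 0.4, the rule `capital_of => located_in` contributes a penalty of 0.5; once the conclusion scores at least as high as the premise, the penalty drops to 0.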