Papers accepted at K-CAP 2017 – Smart Data Analytics

We are very pleased to announce that our group had 5 papers accepted for presentation at K-CAP 2017, which will be held on December 4th-6th, 2017, in Austin, Texas, United States.
The Ninth International Conference on Knowledge Capture attracts researchers from diverse areas of Artificial Intelligence, including knowledge representation, knowledge acquisition, intelligent user interfaces, problem-solving and reasoning, planning, agents, text extraction, machine learning, and information enrichment and visualization, as well as researchers interested in cyber-infrastructures to foster the publication, retrieval, reuse, and integration of data.

Here is the list of accepted papers with their abstracts:

“Capturing Knowledge in Semantically-typed Relational Patterns to Enhance Relation Linking” by Kuldeep Singh, Isaiah Onando Mulang, Ioanna Lytra, Mohamad Yaser Jaradeh, Ahmad Sakor, Maria-Esther Vidal, Christoph Lange and Sören Auer.

Abstract: Transforming natural language questions into formal queries is an integral task in Question Answering (QA) systems. QA systems built on knowledge graphs like DBpedia require an extra step after Natural Language Processing (NLP) for linking words, specifically including named entities and relations, to their corresponding entities in a knowledge graph. To achieve this task, several approaches rely on background knowledge bases containing semantically-typed relations, e.g., PATTY, for an extra disambiguation step. Two major factors may affect the performance of relation linking approaches whenever background knowledge bases are accessed: a) limited availability of such semantic knowledge sources, and b) lack of a systematic approach on how to maximize the benefits of the collected knowledge. We tackle this problem and devise SIBKB, a semantic-based index able to capture knowledge encoded on background knowledge bases like PATTY. SIBKB represents a background knowledge base as a bi-partite and a dynamic index over the relation patterns included in the knowledge base. Moreover, we develop a relation linking component able to exploit SIBKB features. The benefits of SIBKB are empirically studied on existing QA benchmarks. Observed results suggest that SIBKB is able to enhance the accuracy of relation linking by up to three times.

“SimDoc: Topic Sequence Alignment based Document Similarity Framework” by Gaurav Maheshwari, Priyansh Trivedi, Harshita Sahijwani, Kunal Jha, Sourish Dasgupta and Jens Lehmann.

Abstract: Document similarity is the problem of estimating the degree to which a given pair of documents has similar semantic content. An accurate document similarity measure can improve several enterprise-relevant tasks such as document clustering, text mining, and question-answering. In this paper, we show that a document’s thematic flow, which is often disregarded by bag-of-words techniques, is pivotal in estimating document similarity. To this end, we propose a novel semantic document similarity framework, called SimDoc. We model documents as topic-sequences, where topics represent latent generative clusters of related words. Then, we use a sequence alignment algorithm to estimate their semantic similarity. We further conceptualize a novel mechanism to compute topic-topic similarity to fine-tune our system. In our experiments, we show that SimDoc outperforms many contemporary bag-of-words techniques in accurately computing document similarity, and on practical applications such as document clustering.

“SQCFramework: SPARQL Query Containment Benchmark Generation Framework” by Muhammad Saleem, Claus Stadler, Qaiser Mehmood, Jens Lehmann and Axel-Cyrille Ngonga Ngomo.

Abstract: Query containment is a fundamental problem in data management. Its main application is in global query optimization. A number of SPARQL query containment solvers have been developed recently. To the best of our knowledge, the Query Containment Benchmark (QC-Bench) is the only benchmark for evaluating these containment solvers. However, this benchmark contains a fixed number of synthetic queries, which were handcrafted by its creators. We propose SQCFramework, a SPARQL query containment benchmark generation framework which is able to generate customized SPARQL containment benchmarks from real SPARQL query logs. The framework is flexible enough to generate benchmarks of varying sizes and according to user-defined criteria on the most important SPARQL features to be considered for query containment benchmarking. The generation of benchmarks is achieved using different clustering algorithms. We compare state-of-the-art SPARQL query containment solvers by using different query containment benchmarks generated from DBpedia and Semantic Web Dog Food query logs.

“Semantic Zooming for Ontology Graph Visualizations” by Vitalis Wiens, Steffen Lohmann and Sören Auer.

Abstract: Visualizations of ontologies, in particular graph visualizations in the form of node-link diagrams, are often used to support ontology development, exploration, verification, and sensemaking. With growing size and complexity of ontology graph visualizations, their represented information tends to become hard to comprehend due to visual clutter and information overload. We present an approach that abstracts and simplifies the underlying graph structure of ontologies. The new approach of semantic zooming for ontology graph visualizations separates the comprised information of an ontology into three layers with discrete levels of detail. The visual appearance layer is defined with the support of expert interviews. The approach is applied to a force-directed layout using the VOWL notation. The mental map is preserved using smart expanding and ordering of elements in the layout. Navigation and sensemaking are supported by local and global exploration methods, halo visualization, and smooth zooming. The results of a user study confirm an increase in readability, visual clarity, and information clarity of ontology graph visualizations enhanced with our semantic zooming approach.

“Bidirectional LSTM with a Context Input Window for Named Entity Recognition in Tweets” by Rafael Peres, Diego Esteves and Gaurav Maheshwari.

Abstract: Lately, with the increasing popularity of social media technologies, applying natural language processing for mining information in tweets has posed itself as a challenging task and has attracted significant research efforts. In contrast with news text and other formal content, tweets pose a number of new challenges, due to their short and noisy nature. Thus, over the past decade, different Named Entity Recognition (NER) architectures have been proposed to solve this problem. However, most of them are based on handcrafted features and restricted to a particular domain, which imposes a natural barrier to generalizing over different contexts. In this sense, despite the long line of work in NER on formal domains, there are no studies in NER for tweets in Portuguese (despite 17.97 million monthly active users). To bridge this gap, we present a new gold-standard corpus of tweets annotated for Person, Location, and Organization (PLO). Additionally, we also perform multiple NER experiments using a variety of Long Short-Term Memory (LSTM) based models without resorting to any handcrafted rules. Our approach with a centered context input window of word embeddings yields 52.78 F1 score, 38.68% higher compared to a state-of-the-art baseline system.

Acknowledgments
This work was supported by the European Union’s H2020 research and innovation program BigDataEurope (GA no. 644564), WDAqua: Marie Skłodowska-Curie Innovative Training Network (GA no. 642795), and the European Union’s Horizon 2020 research and innovation programme GRACeFUL (GA no. 640954).

Looking forward to seeing you at K-CAP 2017.