New Year at SDA - Looking back at 2018🗓 2019-01-03 ✍ Prof. Dr. Jens Lehmann
2019 has just started and we want to take a moment to look back at a very busy and successful year 2018, full of new members, inspirational discussions, exciting conferences, accepted research papers, new software releases and a lot of highlights we had throughout the year.
Below is a short summary of the main cornerstones for 2018:
An interesting future for AI and knowledge graphs
Artificial intelligence/machine learning and semantic technologies/knowledge graphs are central topics for SDA. Throughout the year, we accomplished a range of interesting research achievements. One particularly active area was question answering and dialogue systems (with and without knowledge graphs). We acquired new projects worth more than a million euros this year and were able to transfer our expertise to industry via successful projects at Fraunhofer. External interest in our results has been remarkably high. Furthermore, we extended our already established position in scalable distributed querying, inference, and analysis of large RDF datasets. Amid the race for ever-improving achievements in AI, which has gone far beyond what many could have imagined 10 years ago, our researchers delivered important contributions and continued to shape different sub-areas of the growing AI research landscape.
We had 41 papers accepted at well-known conferences, workshops and journals (e.g., the AAAI 2019 workshops, ISWC 2018, ESWC 2018, Nature Scientific Data Journal, Journal of Web Semantics, Semantic Web Journal, WWW 2018 workshops, EMNLP 2018 workshops, ECML 2018 workshops, CoNLL 2018, SIGMOD 2018 workshops, SIGIR 2018, ICLR 2018, EKAW 2018, SEMANTiCS 2018, ICWE 2018, ICSC 2018, TPDL 2018, JURIX 2018 and more). We estimate that SDA members received more than 2,500 citations per year (based on Google Scholar profiles).
SANSA - An open source data flow processing engine for performing distributed computation over large-scale RDF datasets had two successful releases during 2018 (SANSA 0.4 and SANSA 0.5).
Among the funded projects, we were happy to launch the first major release of the Big Data Ocean platform - a platform for Exploiting Oceans of Data for Maritime Applications.
There were several other releases:
- SML-Bench - A Structured Machine Learning benchmark framework 0.2 has been released.
- WebVOWL - A web-based visualization for ontologies had several releases in 2018. A major new feature is the integration of the WebVOWL Editor for device-independent visual ontology modeling.
- AskNowQA - A Suite of Natural Language interaction technologies that behave intelligently through domain knowledge. The 0.1 version has been released.
Further highlights of 2018:
- Move to the brand new Computer Science Campus: After many delays, we finally moved into our new campus where we have modern rooms and equipment.
- A Best Demo Award at ISWC 2018
- Two PhD defenses: Mikhail Galkin and Lavdim Halilaj both successfully defended their PhD theses. Congratulations to them again! Four more theses have been submitted, with defenses scheduled for January and February.
- Many invited speakers (Prof. Dr. John Domingue, Prof. Dr. Khalid Saeed, Dr. Anastasia Dimou, Svitlana Vakulenko and Dr. Katherine Thornton).
- We held an off-site meeting together with the EIS department of Fraunhofer IAIS at their premises.
Likewise, SDA deeply values team bonding activities. We often try to introduce fun activities that involve teamwork and team building. At our X-mas party, we enjoyed a very international and lovely dinner together, exchanged “Secret Santa” gifts and played some ad-hoc games.
Long-term team building through deeper discussions, genuine connections and healthy communication helps us to connect within the group!
Many thanks to all who have accompanied and supported us along the way! From all of us at SDA, we wish you a wonderful new year!
Jens Lehmann on behalf of The SDA Research Team
SANSA Collaboration with Alethio🗓 2018-07-13 ✍ Gezim Sejdiu
The SANSA team is excited to announce our collaboration with Alethio (a ConsenSys formation). SANSA is the major distributed, open source solution for RDF querying, reasoning and machine learning. Alethio is building an Ethereum analytics platform that strives to provide transparency over what’s happening on the Ethereum p2p network, the transaction pool and the blockchain, and to provide “blockchain archeology”. Their 5 billion triple data set contains large-scale blockchain transaction data modelled as RDF according to the structure of the Ethereum ontology.
EthOn - The Ethereum Ontology - is a formalization of concepts/entities and relations of the Ethereum ecosystem represented in RDF and OWL format. It describes all Ethereum terms including blocks, transactions, contracts, nonces etc. as well as their relationships. Its main goal is to serve as a data model and learning resource for understanding Ethereum.
Alethio is interested in using SANSA as a scalable processing engine for their large-scale batch and stream processing tasks, such as querying the data in real time via SPARQL and performing related analytics on a wide range of subjects (e.g. asset turnover for sets of accounts, attack pattern detection or Opcode usage statistics). At the same time, SANSA is interested in further industrial pilot applications to test its scalability on larger datasets, mature its code base and gain experience in running the stack on production clusters.
Specifically, the initial goal of Alethio was to load a 2TB EthOn dataset containing more than 5 billion triples and then perform several analytic queries on it with up to three inner joins. The queries are used to characterize movement between groups of Ethereum accounts (e.g. exchanges or investors in ICOs) and aggregate their in and out value flow over the history of the Ethereum blockchain.
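To make the shape of such a query concrete, here is a minimal in-memory sketch in plain Python. It is not SANSA's API and not Alethio's actual query: the predicate names (`from`, `to`, `value`, `memberOf`), the toy triples and the helper functions are hypothetical stand-ins, not EthOn terms. It only illustrates how transactions can be inner-joined with account groups and aggregated into in and out value flow per group.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object).
# Predicate names are illustrative only and are NOT EthOn vocabulary.
TRIPLES = [
    ("tx1", "from", "acctA"), ("tx1", "to", "acctB"), ("tx1", "value", 5),
    ("tx2", "from", "acctB"), ("tx2", "to", "acctC"), ("tx2", "value", 2),
    ("tx3", "from", "acctC"), ("tx3", "to", "acctA"), ("tx3", "value", 7),
    ("acctA", "memberOf", "exchanges"),
    ("acctB", "memberOf", "investors"),
    ("acctC", "memberOf", "investors"),
]


def index(triples, predicate):
    """Map subject -> object for one predicate (assumes one object per subject)."""
    return {s: o for s, p, o in triples if p == predicate}


def value_flow(triples):
    """Aggregate incoming and outgoing value per account group."""
    senders = index(triples, "from")
    receivers = index(triples, "to")
    values = index(triples, "value")
    groups = index(triples, "memberOf")

    flow = defaultdict(lambda: {"in": 0, "out": 0})
    for tx, sender in senders.items():
        # Inner-join semantics: skip a transaction if any joined part is missing.
        if tx not in receivers or tx not in values:
            continue
        receiver = receivers[tx]
        if sender not in groups or receiver not in groups:
            continue
        flow[groups[sender]]["out"] += values[tx]
        flow[groups[receiver]]["in"] += values[tx]
    return dict(flow)


if __name__ == "__main__":
    for group, totals in sorted(value_flow(TRIPLES).items()):
        print(group, totals)
```

On a dataset of billions of triples the same joins and aggregation would be expressed as a SPARQL query and evaluated in a distributed fashion; the dictionary lookups here merely stand in for those joins.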
The experiments were successfully run by Alethio on a cluster with up to 100 worker nodes and 400 cores that have a total of over 3TB of memory available. “I am excited to see that SANSA works and scales well to our data. Now, we want to experiment with more complex queries and tune the Spark parameters to gain the optimal performance for our dataset,” said Johannes Pfeffer, co-founder of Alethio. “I am glad that Alethio managed to run their workload and to see how well our methods scale to a 5 billion triple dataset,” added Gezim Sejdiu, PhD student at the Smart Data Analytics Group and SANSA core developer.
Following these successful experiments, members of the SANSA team, including its leader Prof. Jens Lehmann as well as Dr. Hajira Jabeen, Dr. Damien Graux and Gezim Sejdiu, will continue the collaboration with the data science team of Alethio. Beyond the above initial tests, we are jointly discussing possibilities for efficient stream processing in SANSA, further tuning of aggregate queries as well as suitable Apache Spark parameters for efficient processing of the data. In the future, we want to join hands to optimize the performance of loading the data (e.g. reducing the disk footprint of datasets using compression techniques, which then allows more efficient SPARQL evaluation), handling the streaming data, querying, and analytics in real time.
The SANSA team is happily looking forward to further interesting scientific research as well as industrial adaptation.
Core model of the fork history of the Ethereum Blockchain modeled in EthOn
SOLIDE at the BMBF Innovation Forum “Civil Security” 2018🗓 2018-07-13 ✍ Gezim Sejdiu
As part of the SOLIDE project, SDA participated in the BMBF Innovation Forum “Civil Security” 2018, which took place on 19 and 20 June 2018 at the invitation of the Federal Ministry of Education and Research (BMBF). The two-day conference on the framework program “Research for Civil Security” was held in the Café Moskau conference center in Berlin.
SOLIDE, one of the projects funded by the BMBF, was presented during the event in the session “Mission Support – Better Situation Management through Intelligent Information Acquisition”.
The SOLIDE project aims to examine a new approach for efficient access to operational data using the command mission management software TecBos Command. The focus is on making information accessible through natural language dialogue. For this purpose, we research subject-specific algorithms for filtering relevant knowledge as well as suitable data integration procedures to make the available data usable and retrievable via dialogues.
SOLIDE is a joint project of PRO DV (Dortmund), Aristech GmbH (Heidelberg) together with the research group Smart Data Analytics (SDA) of the University of Bonn and the Data Science Chair (DICE) of the University of Paderborn.
SDA contributes to the project by providing a cutting-edge dialogue system for information support in emergency situations.