Patrick Westphal
PhD Student
Agile Knowledge Engineering and Semantic Web (AKSW)
University of Leipzig

Profiles: LinkedIn, DBLP

Hainstraße 11, 04109 Leipzig
patrick.westphal@informatik.uni-leipzig.de
Phone: +49-341-9732305

 

Short CV


Patrick Westphal is a PhD student at the University of Leipzig. His research interests lie in Structured Machine Learning and its application to Big Data problems.

Research Interests


  • Ontology Learning
  • Reasoning
  • Question Answering
  • Big Structured Machine Learning

Publications


2017

  • J. Lehmann, G. Sejdiu, L. Bühmann, P. Westphal, C. Stadler, I. Ermilov, S. Bin, N. Chakraborty, M. Saleem, A.-C. Ngonga Ngomo, and H. Jabeen, “Distributed Semantic Analytics using the SANSA Stack,” in Proceedings of 16th International Semantic Web Conference – Resources Track (ISWC’2017), 2017.
    Over the past decade, vast amounts of machine-readable structured information have become available through the automation of research processes as well as the increasing popularity of knowledge graphs and semantic technologies. A major research challenge today is to perform scalable analysis of large-scale knowledge graphs to facilitate applications like link prediction, knowledge base completion and question answering. Most analytics approaches, which scale horizontally (i.e., can be executed in a distributed environment) work on simple feature-vector-based input rather than more expressive knowledge structures. On the other hand, analytics methods which exploit expressive structures usually do not scale well to very large knowledge bases. This software framework paper describes the ongoing project Semantic Analytics Stack (SANSA) which supports expressive and scalable semantic analytics by providing functionality for distributed in-memory computing for RDF data. The library provides APIs for RDF storage, querying using SPARQL and forward chaining inference. It includes several machine learning algorithms for RDF knowledge graphs. The article describes the vision, architecture and use cases of SANSA.

    @InProceedings{lehmann-2017-sansa-iswc,
    Title = {Distributed {S}emantic {A}nalytics using the {SANSA} {S}tack},
    Author = {Lehmann, Jens and Sejdiu, Gezim and B\"uhmann, Lorenz and Westphal, Patrick and Stadler, Claus and Ermilov, Ivan and Bin, Simon and Chakraborty, Nilesh and Saleem, Muhammad and Ngonga Ngomo, Axel-Cyrille and Jabeen, Hajira},
    Booktitle = {Proceedings of 16th International Semantic Web Conference - Resources Track (ISWC'2017)},
    Year = {2017},
    Abstract = {Over the past decade, vast amounts of machine-readable structured information have become available through the automation of research processes as well as the increasing popularity of knowledge graphs and semantic technologies. A major research challenge today is to perform scalable analysis of large-scale knowledge graphs to facilitate applications like link prediction, knowledge base completion and question answering. Most analytics approaches, which scale horizontally (i.e., can be executed in a distributed environment) work on simple feature-vector-based input rather than more expressive knowledge structures. On the other hand, analytics methods which exploit expressive structures usually do not scale well to very large knowledge bases. This software framework paper describes the ongoing project Semantic Analytics Stack (SANSA) which supports expressive and scalable semantic analytics by providing functionality for distributed in-memory computing for RDF data. The library provides APIs for RDF storage, querying using SPARQL and forward chaining inference. It includes several machine learning algorithms for RDF knowledge graphs. The article describes the vision, architecture and use cases of SANSA.},
    Added-at = {2017-07-17T14:46:26.000+0200},
    Biburl = {https://www.bibsonomy.org/bibtex/21ae18ac13750f9cf74227fe0a7c50104/aksw},
    Interhash = {eb99dff0ce6a9cdbce2c4cbea115fbee},
    Intrahash = {1ae18ac13750f9cf74227fe0a7c50104},
    Keywords = {2017 bde buehmann chakraborty group_aksw iermilov lehmann ngonga saleem sbin sejdiu stadler westphal},
    Owner = {iermilov},
    Timestamp = {2017-07-17T14:46:26.000+0200},
    Url = {http://svn.aksw.org/papers/2017/ISWC_SANSA_SoftwareFramework/public.pdf}
    }

  • I. Ermilov, J. Lehmann, G. Sejdiu, L. Bühmann, P. Westphal, C. Stadler, S. Bin, N. Chakraborty, H. Petzka, M. Saleem, A.-C. Ngonga Ngomo, and H. Jabeen, “The Tale of Sansa Spark,” in Proceedings of 16th International Semantic Web Conference, Poster & Demos, 2017.
    @InProceedings{iermilov-2017-sansa-iswc-demo,
    Title = {The {T}ale of {S}ansa {S}park},
    Author = {Ermilov, Ivan and Lehmann, Jens and Sejdiu, Gezim and B\"uhmann, Lorenz and Westphal, Patrick and Stadler, Claus and Bin, Simon and Chakraborty, Nilesh and Petzka, Henning and Saleem, Muhammad and Ngonga Ngomo, Axel-Cyrille and Jabeen, Hajira},
    Booktitle = {Proceedings of 16th International Semantic Web Conference, Poster \& Demos},
    Year = {2017},
    Added-at = {2017-08-31T16:24:45.000+0200},
    Biburl = {https://www.bibsonomy.org/bibtex/2f9b5a69afa4755944984ae63f59ad146/aksw},
    Interhash = {ebabfe08f697304b399c9b6b89f2829e},
    Intrahash = {f9b5a69afa4755944984ae63f59ad146},
    Keywords = {2017 bde buehmann chakraborty group_aksw iermilov lehmann mole ngonga saleem sbin sejdiu stadler westphal},
    Owner = {iermilov},
    Timestamp = {2017-08-31T16:24:45.000+0200},
    Url = {http://jens-lehmann.org/files/2017/iswc_pd_sansa.pdf}
    }

  • S. Bin, P. Westphal, J. Lehmann, and A.-C. Ngonga Ngomo, “Implementing Scalable Structured Machine Learning for Big Data in the SAKE Project,” in IEEE Big Data Conference 2017, 2017.
    @inproceedings{bin-2017-sake,
    added-at = {2017-11-17T14:26:26.000+0100},
    author = {Bin, Simon and Westphal, Patrick and Lehmann, Jens and Ngonga Ngomo, Axel-Cyrille},
    biburl = {https://www.bibsonomy.org/bibtex/224f107297aa2a27c82b875e63c9b9055/aksw},
    booktitle = {IEEE Big Data Conference 2017},
    interhash = {8ff7e69474050557c9f872c41433cc04},
    intrahash = {24f107297aa2a27c82b875e63c9b9055},
    keywords = {2017 bin group_aksw lehmann mole ngonga sake westphal},
    timestamp = {2017-11-17T14:26:26.000+0100},
    title = {Implementing Scalable Structured Machine Learning for Big Data in the SAKE Project},
    url = {http://jens-lehmann.org/files/2017/ieee_bigdata_sake.pdf},
    year = 2017
    }

2016

  • L. Bühmann, J. Lehmann, and P. Westphal, “DL-Learner – A framework for inductive learning on the Semantic Web,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 39, pp. 15-24, 2016. doi:10.1016/j.websem.2016.06.001
    In this system paper, we describe the DL-Learner framework, which supports supervised machine learning using OWL and RDF for background knowledge representation. It can be beneficial in various data and schema analysis tasks with applications in different standard machine learning scenarios, e.g. in the life sciences, as well as Semantic Web specific applications such as ontology learning and enrichment. Since its creation in 2007, it has become the main OWL and RDF-based software framework for supervised structured machine learning and includes several algorithm implementations, usage examples and has applications building on top of the framework. The article gives an overview of the framework with a focus on algorithms and use cases.

    @Article{Buehmann2016,
    Title = {DL-Learner - A framework for inductive learning on the Semantic Web},
    Author = {Lorenz B{\"u}hmann and Jens Lehmann and Patrick Westphal},
    Journal = {Web Semantics: Science, Services and Agents on the World Wide Web},
    Year = {2016},
    Pages = {15--24},
    Volume = {39},
    Abstract = {In this system paper, we describe the DL-Learner framework, which supports supervised machine learning using {OWL} and {RDF} for background knowledge representation. It can be beneficial in various data and schema analysis tasks with applications in different standard machine learning scenarios, e.g. in the life sciences, as well as Semantic Web specific applications such as ontology learning and enrichment. Since its creation in 2007, it has become the main {OWL} and RDF-based software framework for supervised structured machine learning and includes several algorithm implementations, usage examples and has applications building on top of the framework. The article gives an overview of the framework with a focus on algorithms and use cases.},
    Doi = {10.1016/j.websem.2016.06.001},
    ISSN = {1570-8268},
    Keywords = {dllearner group_aksw group_mole mole buehmann lehmann westphal dllearner sys:relevantFor:infai sys:relevantFor:bis sys:relevantFor:lmol MOLE},
    Owner = {me},
    Timestamp = {2016.10.13},
    Url = {http://www.sciencedirect.com/science/article/pii/S157082681630018X}
    }

2015

  • C. Stadler, J. Unbehauen, P. Westphal, M. A. Sherif, and J. Lehmann, “Simplified RDB2RDF Mapping,” in Proceedings of the 8th Workshop on Linked Data on the Web (LDOW2015), Florence, Italy, 2015.
    The combination of the advantages of widely used relational databases and semantic technologies has attracted significant research over the past decade. In particular, mapping languages for the conversion of databases to RDF knowledge bases have been developed and standardized in the form of R2RML. In this article, we first review those mapping languages and then devise work towards a unified formal model for them. Based on this, we present the Sparqlification Mapping Language (SML), which provides an intuitive way to declare mappings based on SQL VIEWS and SPARQL construct queries. We show that SML has the same expressivity as R2RML by enumerating the language features and show the correspondences, and we outline how one syntax can be converted into the other. A conducted user study for this paper juxtaposing SML and R2RML provides evidence that SML is a more compact syntax which is easier to understand and read and thus lowers the barrier to offer SPARQL access to relational databases.

    @InProceedings{sml,
    Title = {Simplified {RDB2RDF} Mapping},
    Author = {Claus Stadler and Joerg Unbehauen and Patrick Westphal and Mohamed Ahmed Sherif and Jens Lehmann},
    Booktitle = {Proceedings of the 8th Workshop on Linked Data on the Web (LDOW2015), Florence, Italy},
    Year = {2015},
    Abstract = {The combination of the advantages of widely used relational databases and semantic technologies has attracted significant research over the past decade. In particular, mapping languages for the conversion of databases to RDF knowledge bases have been developed and standardized in the form of R2RML. In this article, we first review those mapping languages and then devise work towards a unified formal model for them. Based on this, we present the Sparqlification Mapping Language (SML), which provides an intuitive way to declare mappings based on SQL VIEWS and SPARQL construct queries. We show that SML has the same expressivity as R2RML by enumerating the language features and show the correspondences, and we outline how one syntax can be converted into the other. A conducted user study for this paper juxtaposing SML and R2RML provides evidence that SML is a more compact syntax which is easier to understand and read and thus lowers the barrier to offer SPARQL access to relational databases.},
    Bdsk-url-1 = {svn.aksw.org/papers/2015/LDOW_SML/paper-camery-ready_public.pdf},
    Keywords = {2015 group_aksw group_mole mole stadler lehmann sherif sys:relevantFor:geoknow geoknow peer-reviewed MOLE westphal},
    Url = {svn.aksw.org/papers/2015/LDOW_SML/paper-camery-ready_public.pdf}
    }

  • J. Lehmann, S. Athanasiou, A. Both, A. Garcia-Rojas, G. Giannopoulos, D. Hladky, K. Hoeffner, J. J. Le Grange, A.-C. Ngonga Ngomo, M. A. Sherif, C. Stadler, M. Wauer, P. Westphal, and V. Zaslawski, “Managing Geospatial Linked Data in the GeoKnow Project,” 2015, pp. 51-78.
    @InBook{ios_geoknow_chapter,
    Title = {Managing Geospatial Linked Data in the GeoKnow Project},
    Author = {Jens Lehmann and Spiros Athanasiou and Andreas Both and Alejandra Garcia-Rojas and Giorgos Giannopoulos and Daniel Hladky and Konrad Hoeffner and Le Grange, Jon Jay and Ngonga Ngomo, Axel-Cyrille and Mohamed Ahmed Sherif and Claus Stadler and Matthias Wauer and Patrick Westphal and Vadim Zaslawski},
    Pages = {51--78},
    Year = {2015},
    Series = {Studies on the Semantic Web},
    Keywords = {2015 group_aksw sys:relevantFor:infai sys:relevantFor:bis sys:relevantFor:geoknow lehmann ngonga MOLE sherif hoeffner geoknow wauer westphal},
    Url = {http://jens-lehmann.org/files/2015/ios_geoknow_chapter.pdf}
    }

  • J. Lehmann, S. Athanasiou, A. Both, L. Buehmann, A. Garcia-Rojas, G. Giannopoulos, D. Hladky, K. Hoeffner, J. J. Le Grange, A.-C. Ngonga Ngomo, R. Pietzsch, R. Isele, M. A. Sherif, C. Stadler, M. Wauer, and P. Westphal, “The GeoKnow Handbook,” 2015.
    @TechReport{geoknow_handbook,
    Title = {The {G}eo{K}now Handbook},
    Author = {Jens Lehmann and Spiros Athanasiou and Andreas Both and Lorenz Buehmann and Alejandra Garcia-Rojas and Giorgos Giannopoulos and Daniel Hladky and Konrad Hoeffner and Le Grange, Jon Jay and Ngonga Ngomo, Axel-Cyrille and Rene Pietzsch and Robert Isele and Mohamed Ahmed Sherif and Claus Stadler and Matthias Wauer and Patrick Westphal},
    Year = {2015},
    Keywords = {2015 group_aksw sys:relevantFor:infai sys:relevantFor:bis sys:relevantFor:geoknow lehmann ngonga MOLE sherif hoeffner geoknow westphal buehmann},
    Url = {http://jens-lehmann.org/files/2015/geoknow_handbook.pdf}
    }

2014

  • P. Westphal, C. Stadler, and J. Lehmann, “Quality Assurance of RDB2RDF Mappings,” 2014.
    @TechReport{rdb2rdf_qa,
    Title = {Quality Assurance of RDB2RDF Mappings},
    Author = {Patrick Westphal and Claus Stadler and Jens Lehmann},
    Year = {2014},
    Bdsk-url-1 = {http://svn.aksw.org/papers/2014/report_QA_RDB2RDF/public.pdf},
    Institute = {University of Leipzig},
    Keywords = {2014 group_aksw MOLE sys:relevantFor:infai sys:relevantFor:bis lehmann westphal stadler},
    Url = {http://svn.aksw.org/papers/2014/report_QA_RDB2RDF/public.pdf}
    }

  • D. Kontokostas, P. Westphal, S. Auer, S. Hellmann, J. Lehmann, R. Cornelissen, and A. Zaveri, “Test-driven Evaluation of Linked Data Quality,” in Proceedings of the 23rd International Conference on World Wide Web, 2014, pp. 747-758. doi:10.1145/2566486.2568002
    Linked Open Data (LOD) comprises of an unprecedented volume of structured data on the Web. However, these datasets are of varying quality ranging from extensively curated datasets to crowd-sourced or extracted data of often relatively low quality. We present a methodology for test-driven quality assessment of Linked Data, which is inspired by test-driven software development. We argue, that vocabularies, ontologies and knowledge bases should be accompanied by a number of test cases, which help to ensure a basic level of quality. We present a methodology for assessing the quality of linked data resources, based on a formalization of bad smells and data quality problems. Our formalization employs SPARQL query templates, which are instantiated into concrete quality test case queries. Based on an extensive survey, we compile a comprehensive library of data quality test case patterns. We perform automatic test case instantiation based on schema constraints or semi-automatically enriched schemata and allow the user to generate specific test case instantiations that are applicable to a schema or dataset. We provide an extensive evaluation of five LOD datasets, manual test case instantiation for five schemas and automatic test case instantiations for all available schemata registered with LOV. One of the main advantages of our approach is that domain specific semantics can be encoded in the data quality test cases, thus being able to discover data quality problems beyond conventional quality heuristics.

    @InProceedings{kontokostasDatabugger,
    Title = {Test-driven Evaluation of Linked Data Quality},
    Author = {Kontokostas, Dimitris and Westphal, Patrick and Auer, S\"{o}ren and Hellmann, Sebastian and Lehmann, Jens and Cornelissen, Roland and Zaveri, Amrapali},
    Booktitle = {Proceedings of the 23rd International Conference on World Wide Web},
    Year = {2014},
    Pages = {747--758},
    Publisher = {International World Wide Web Conferences Steering Committee},
    Series = {WWW '14},
    Abstract = {Linked Open Data (LOD) comprises of an unprecedented volume of structured data on the Web. However, these datasets are of varying quality ranging from extensively curated datasets to crowd-sourced or extracted data of often relatively low quality. We present a methodology for test-driven quality assessment of Linked Data, which is inspired by test-driven software development. We argue, that vocabularies, ontologies and knowledge bases should be accompanied by a number of test cases, which help to ensure a basic level of quality. We present a methodology for assessing the quality of linked data resources, based on a formalization of bad smells and data quality problems. Our formalization employs SPARQL query templates, which are instantiated into concrete quality test case queries. Based on an extensive survey, we compile a comprehensive library of data quality test case patterns. We perform automatic test case instantiation based on schema constraints or semi-automatically enriched schemata and allow the user to generate specific test case instantiations that are applicable to a schema or dataset. We provide an extensive evaluation of five LOD datasets, manual test case instantiation for five schemas and automatic test case instantiations for all available schemata registered with LOV. One of the main advantages of our approach is that domain specific semantics can be encoded in the data quality test cases, thus being able to discover data quality problems beyond conventional quality heuristics.},
    Acmid = {2568002},
    Bdsk-url-1 = {http://svn.aksw.org/papers/2014/WWW_Databugger/public.pdf},
    Bdsk-url-2 = {http://dx.doi.org/10.1145/2566486.2568002},
    Date-modified = {2015-02-06 06:56:57 +0000},
    Doi = {10.1145/2566486.2568002},
    ISBN = {978-1-4503-2744-2},
    Keywords = {2014 group_aksw dllearner MOLE sys:relevantFor:infai sys:relevantFor:bis sys:relevantFor:lod2 sys:relevantFor:geoknow topic_QualityAnalysis lod2page lehmann kontokostas rdfunit dataquality westphal},
    Location = {Seoul, Korea},
    Numpages = {12},
    Timestamp = {2014.01.23},
    Url = {http://svn.aksw.org/papers/2014/WWW_Databugger/public.pdf}
    }

  • D. Kontokostas, P. Westphal, S. Auer, S. Hellmann, J. Lehmann, and R. Cornelissen, “Databugger: A Test-driven Framework for Debugging the Web of Data,” in Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion, 2014, pp. 115-118. doi:10.1145/2567948.2577017
    Linked Open Data (LOD) comprises of an unprecedented volume of structured data on the Web. However, these datasets are of varying quality ranging from extensively curated datasets to crowd-sourced or extracted data of often relatively low quality. We present Databugger, a framework for test-driven quality assessment of Linked Data, which is inspired by test-driven software development. Databugger ensures a basic level of quality by accompanying vocabularies, ontologies and knowledge bases with a number of test cases. The formalization behind the tool employs SPARQL query templates, which are instantiated into concrete quality test queries. The test queries can be instantiated automatically based on a vocabulary or manually based on the data semantics. One of the main advantages of our approach is that domain specific semantics can be encoded in the data quality test cases, thus being able to discover data quality problems beyond conventional quality heuristics.

    @InProceedings{databugger_demo,
    Title = {Databugger: A Test-driven Framework for Debugging the Web of Data},
    Author = {Kontokostas, Dimitris and Westphal, Patrick and Auer, S\"{o}ren and Hellmann, Sebastian and Lehmann, Jens and Cornelissen, Roland},
    Booktitle = {Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion},
    Year = {2014},
    Pages = {115--118},
    Publisher = {International World Wide Web Conferences Steering Committee},
    Series = {WWW Companion '14},
    Abstract = {Linked Open Data (LOD) comprises of an unprecedented volume of structured data on the Web. However, these datasets are of varying quality ranging from extensively curated datasets to crowd-sourced or extracted data of often relatively low quality. We present Databugger, a framework for test-driven quality assessment of Linked Data, which is inspired by test-driven software development. Databugger ensures a basic level of quality by accompanying vocabularies, ontologies and knowledge bases with a number of test cases. The formalization behind the tool employs SPARQL query templates, which are instantiated into concrete quality test queries. The test queries can be instantiated automatically based on a vocabulary or manually based on the data semantics. One of the main advantages of our approach is that domain specific semantics can be encoded in the data quality test cases, thus being able to discover data quality problems beyond conventional quality heuristics.},
    Acmid = {2577017},
    Bdsk-url-1 = {http://jens-lehmann.org/files/2014/www_demo_databugger.pdf},
    Bdsk-url-2 = {http://dx.doi.org/10.1145/2567948.2577017},
    Doi = {10.1145/2567948.2577017},
    ISBN = {978-1-4503-2745-9},
    Keywords = {2014 group_aksw dllearner MOLE sys:relevantFor:infai sys:relevantFor:bis sys:relevantFor:lod2 sys:relevantFor:geoknow topic_EvolutionRepair lod2page lehmann kontokostas rdfunit westphal},
    Location = {Seoul, Korea},
    Numpages = {4},
    Url = {http://jens-lehmann.org/files/2014/www_demo_databugger.pdf}
    }

  • C. Stadler, P. Westphal, and J. Lehmann, “Jassa – A JavaScript suite for SPARQL-based faceted search,” in Proceedings of the ISWC Developers Workshop 2014, co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 19, 2014, pp. 31-36.
    @InProceedings{jassa,
    Title = {Jassa - {A} JavaScript suite for SPARQL-based faceted search},
    Author = {Claus Stadler and Patrick Westphal and Jens Lehmann},
    Booktitle = {Proceedings of the {ISWC} Developers Workshop 2014, co-located with the 13th International Semantic Web Conference {(ISWC} 2014), Riva del Garda, Italy, October 19, 2014.},
    Year = {2014},
    Pages = {31--36},
    Bdsk-url-1 = {http://ceur-ws.org/Vol-1268/paper6.pdf},
    Bibsource = {dblp computer science bibliography, http://dblp.org},
    Biburl = {http://dblp.uni-trier.de/rec/bib/conf/semweb/StadlerWL14},
    Crossref = {DBLP:conf/semweb/2014dev},
    Keywords = {2014 group_aksw geoknow topic_geospatial MOLE sys:relevantFor:infai sys:relevantFor:bis sys:relevantFor:geoknow lehmann westphal stadler},
    Timestamp = {Mon, 27 Oct 2014 20:39:35 +0100},
    Url = {http://ceur-ws.org/Vol-1268/paper6.pdf}
    }