We are very pleased to announce that our group had a paper accepted for presentation at IDA2021. Advancing Intelligent Data Analysis requires novel, potentially game-changing ideas. IDA’s mission is to promote ideas over performance: a solid motivation can be as convincing as exhaustive empirical evaluation.
Here is the abstract and the link to the paper:
HORUS-NER: A Multimodal Named Entity Recognition Framework for Noisy Data
By Diego Esteves, José Marcelino, Piyush Chawla, Asja Fischer, and Jens Lehmann.
Abstract
Recent work based on deep learning achieves state-of-the-art (SOTA) performance in the named entity recognition (NER) task. However, the performance of such models still drops drastically on noisy data (e.g., social media, search engines) compared to the formal domain (e.g., newswire). Thus, designing and exploring new methods and architectures is necessary to overcome current challenges. In this paper, we shift the focus of existing solutions to an entirely different perspective. We investigate the potential of embedding word-level features extracted from images and news. We performed a comprehensive study to validate the hypothesis that images and news (obtained from an external source) may boost the task on noisy data, with the following findings when our proposed architecture is used: (1) we beat SOTA in precision with simple CRF models; (2) the overall performance of decision tree-based models can be drastically improved; (3) our approach outperforms off-the-shelf models for this task; (4) images and text consistently increased recall over different datasets for SOTA, but at the cost of precision. All experiment configurations, data, and models are publicly available to the research community at horus-ner.org.