Logical Formalisms for Information Extraction
September 2nd, 2019
EDBT Summer School 2019, Lyon, Saint Germain au Mont D'Or, France
The abundance and availability of valuable textual resources position text analytics as a standard component in data-driven workflows. To facilitate the incorporation of such resources, a core operation is the extraction of structured data from text, a classic task known as Information Extraction (IE). The lecture will begin with a short overview of the algorithmic concepts and techniques used for performing IE tasks, including declarative frameworks that provide abstractions and infrastructures for programming IE. The lecture will then focus on the concept of a "document spanner" that models an IE program as a function that takes as input a text document and produces a relation of spans (intervals in the document) over a predefined schema. For example, a well-studied language for expressing spanners is that of the "regular" spanners: relational algebra over regular expressions with capture variables. The lecture will cover recent advances in the theory of document spanners, including their expressive power and computational complexity, aspects of incompleteness and inconsistency, integration with structured databases, and compilation into parallel executions over document fragments. Finally, the lecture will list relevant open problems and future directions, including aspects of uncertainty and explainability.
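To make the notion of a spanner concrete, here is a minimal sketch in Python, not taken from the lecture. It assumes that Python's named groups can stand in for capture variables and that a span can be written as a 0-based, half-open pair of offsets. The helper names `regex_spanner` and `natural_join`, the sample document, and both patterns are hypothetical. One caveat: `re.finditer` returns only non-overlapping leftmost matches, whereas the formal semantics of regular spanners considers every possible match, so this only approximates the model.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    """A half-open interval [start, end) of 0-based character offsets."""
    start: int
    end: int


def regex_spanner(pattern, document):
    """Evaluate a regex with named groups as a (simplified) spanner.

    Returns a relation: a list of rows, each mapping a capture variable
    to the span it matched. Only non-overlapping matches are reported,
    which is an assumption of this sketch, not of the formal model.
    """
    regex = re.compile(pattern)
    variables = list(regex.groupindex)  # the schema of the relation
    return [
        {var: Span(*match.span(var)) for var in variables}
        for match in regex.finditer(document)
    ]


def natural_join(left, right):
    """Relational natural join: rows must agree on all shared variables."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[var] == r_row[var] for var in shared):
                joined.append({**l_row, **r_row})
    return joined


document = "Ann ann@x.org, Bob bob@y.com"

# Spanner 1: a capitalized name followed by an e-mail address.
people = regex_spanner(
    r"(?P<name>[A-Z][a-z]+) (?P<email>[a-z]+@[a-z]+\.[a-z]+)", document
)

# Spanner 2: e-mail addresses in the .org domain.
org_emails = regex_spanner(r"(?P<email>[a-z]+@[a-z]+\.org)", document)

# Algebra over spanners: keep people whose e-mail span is also an .org span.
joined = natural_join(people, org_emails)

for row in joined:
    print({var: document[span.start:span.end] for var, span in row.items()})
```

Joining on spans rather than on the matched strings is the point of the sketch: two rows agree on `email` only when they refer to the same interval of the document, not merely to equal text.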
Some Clique Enumerations in Database Management
November 5, 2018
Database Uncertainty for Computational Social Choice
September 15, 2018
Probabilistic Database Repairing
February 14, 2018
This talk was given at the MoDaS Workshop in Eilat, Israel, and discusses probabilistic notions of database repairs.