Facets of Probabilistic Databases
March 30th, 2020
EDBT/ICDT 2020 Joint Conference
Probabilistic databases are commonly known in the form of the tuple-independent model, where the validity of every tuple is an independent random event. Conceptually, the notion is more general, as a probabilistic database refers to any probability distribution over ordinary databases. A central computational problem is that of marginal inference for database queries: what is the probability that a given tuple is a query answer? In this talk, I will discuss recent developments in several research directions that, collectively, position probabilistic databases as the common and natural foundation of various challenges at the core of data analytics. Examples include reasoning about uncertain preferences from conventional distributions such as the Mallows model, data cleaning and repairing in probabilistic paradigms such as the HoloClean system, and the explanation of query answers through concepts from cooperative game theory such as the Shapley value and the Banzhaf Power Index. While these challenges manifest different facets of probabilistic databases, I will show how they interrelate and, moreover, how they relate to the basic theory of inference over tuple-independent databases. (video)
Logical Formalisms for Information Extraction
September 2nd, 2019
EDBT Summer School 2019, Lyon, Saint Germain au Mont D'Or, France
The abundance and availability of valuable textual resources position text analytics as a standard component in data-driven workflows. To facilitate the incorporation of such resources, a core operation is the extraction of structured data from text, a classic task known as Information Extraction (IE). The lecture will begin with a short overview of the algorithmic concepts and techniques used for performing IE tasks, including declarative frameworks that provide abstractions and infrastructures for programming IE. The lecture will then focus on the concept of a "document spanner" that models an IE program as a function that takes as input a text document and produces a relation of spans (intervals in the document) over a predefined schema. For example, a well-studied language for expressing spanners is that of the "regular" spanners: relational algebra over regular expressions with capture variables. The lecture will cover recent advances in the theory of document spanners, including their expressive power and computational complexity, aspects of incompleteness and inconsistency, integration with structured databases, and compilation into parallel executions over document fragments. Finally, the lecture will list relevant open problems and future directions, including aspects of uncertainty and explainability. (video)  
Some Clique Enumerations in Database Management
Nov 5, 2018
Pisa, Italy
Talk given at WEPA 2018, an international forum for researchers in the area of design, analysis, experimental evaluation and engineering of algorithms for enumeration problems.  
Database Uncertainty for Computational Social Choice
Sep 15, 2018
Bayreuth, Germany
Talk given at Colloquium Logicum 2018 in Bayreuth University.  
Probabilistic Database Repairing
February 14, 2018
Eilat, Israel
This talk has been given in the MoDaS Workshop in Eilat, Israel, and discusses probabilistic notions of database repairs.