Talks
Enumeration Theory through the Lens of Database Challenges
Computational tasks that require producing many answers arise throughout data management systems: SQL engines return all qualifying tuples; search engines retrieve all relevant documents, albeit ranked; graph databases are tasked with producing all paths connecting two nodes; query optimizers routinely explore a vast space of candidate plans; and so on. Abstractly, these tasks are formalized as enumeration problems, solved by enumeration algorithms, and studied through the lens of enumeration complexity. This talk reflects on the evolution of enumeration in database research. It will begin with a brief survey of classic complexity-theoretic notions and algorithmic techniques that play a central role in enumeration across many domains. It will then review past results on enumeration problems arising in various areas of database systems. After that, it will delve deeper into the evolution of ideas and results on answer enumeration in relational queries. It will conclude with challenges for future directions.
Facets of Probabilistic Databases
Probabilistic databases are commonly known in the form of the tuple-independent model, where the validity of every tuple is an independent random event. Conceptually, the notion is more general, as a probabilistic database refers to any probability distribution over ordinary databases. A central computational problem is that of marginal inference for database queries: what is the probability that a given tuple is a query answer? In this talk, I will discuss recent developments in several research directions that, collectively, position probabilistic databases as the common and natural foundation of various challenges at the core of data analytics. Examples include reasoning about uncertain preferences from conventional distributions such as the Mallows model, data cleaning and repairing in probabilistic paradigms such as the HoloClean system, and the explanation of query answers through concepts from cooperative game theory such as the Shapley value and the Banzhaf Power Index. While these challenges manifest different facets of probabilistic databases, I will show how they interrelate and, moreover, how they relate to the basic theory of inference over tuple-independent databases. (video)
Logical Formalisms for Information Extraction
The abundance and availability of valuable textual resources position text analytics as a standard component in data-driven workflows. To facilitate the incorporation of such resources, a core operation is the extraction of structured data from text, a classic task known as Information Extraction (IE). The lecture will begin with a short overview of the algorithmic concepts and techniques used for performing IE tasks, including declarative frameworks that provide abstractions and infrastructures for programming IE. The lecture will then focus on the concept of a "document spanner" that models an IE program as a function that takes as input a text document and produces a relation of spans (intervals in the document) over a predefined schema. For example, a well-studied language for expressing spanners is that of the "regular" spanners: relational algebra over regular expressions with capture variables. The lecture will cover recent advances in the theory of document spanners, including their expressive power and computational complexity, aspects of incompleteness and inconsistency, integration with structured databases, and compilation into parallel executions over document fragments. Finally, the lecture will list relevant open problems and future directions, including aspects of uncertainty and explainability. (video)
Some Clique Enumerations in Database Management
Talk given at WEPA 2018, an international forum for researchers in the area of design, analysis, experimental evaluation and engineering of algorithms for enumeration problems.
Database Uncertainty for Computational Social Choice
Talk given at Colloquium Logicum 2018 in Bayreuth University.
Probabilistic Database Repairing
This talk has been given in the MoDaS Workshop in Eilat, Israel, and discusses probabilistic notions of database repairs.



