Publicaciones

Leopoldo Bertossi

RL2, Publisher: arXiv, Link>

AUTHORS

Leopoldo Bertossi, Marcelo Arenas, Mikaël Monet, Pablo Barceló

ABSTRACT

In Machine Learning, the SHAP-score is a version of the Shapley value that is used to explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is an intractable problem, we prove a strong positive result stating that the SHAP-score can be computed in polynomial time over deterministic and decomposable Boolean circuits. Such circuits are studied in the field of Knowledge Compilation and generalize a wide range of Boolean circuits and binary decision diagrams classes, including binary decision trees and Ordered Binary Decision Diagrams (OBDDs). We also establish the computational limits of the SHAP-score by observing that computing it over a class of Boolean models is always polynomially as hard as the model counting problem for that class. This implies that both determinism and decomposability are essential properties for the circuits that we consider. It also implies that computing SHAP-scores is intractable as well over the class of propositional formulas in DNF. Based on this negative result, we look for the existence of fully-polynomial randomized approximation schemes (FPRAS) for computing SHAP-scores over such class. In contrast to the model counting problem for DNF formulas, which admits an FPRAS, we prove that no such FPRAS exists for the computation of SHAP-scores. Surprisingly, this negative result holds even for the class of monotone formulas in DNF. These techniques can be further extended to prove another strong negative result: Under widely believed complexity assumptions, there is no polynomial-time algorithm that checks, given a monotone DNF formula φ and features x,y, whether the SHAP-score of x in φ is smaller than the SHAP-score of y in φ.


288 visualizaciones Ir a la publicación

RL2, Publisher: Proceedings of the AAAI Conference on Artificial Intelligence, Link>

AUTHORS

Marcelo Arenas, Leopoldo Bertossi, Mikaël Monet, Pablo Barceló

ABSTRACT

Scores based on Shapley values are widely used for providing explanations to classification results over machine learning models. A prime example of this is the influential SHAP-score, a version of the Shapley value that can help explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is a computationally intractable problem, it has recently been claimed that the SHAP-score can be computed in polynomial time over the class of decision trees. In this paper, we provide a proof of a stronger result over Boolean models: the SHAP-score can be computed in polynomial time over deterministic and decomposable Boolean circuits. Such circuits, also known as tractable Boolean circuits, generalize a wide range of Boolean circuits and binary decision diagrams classes, including binary decision trees, Ordered Binary Decision Diagrams (OBDDs) and Free Binary Decision Diagrams (FBDDs). We also establish the computational limits of the notion of SHAP-score by observing that, under a mild condition, computing it over a class of Boolean models is always polynomially as hard as the model counting problem for that class. This implies that both determinism and decomposability are essential properties for the circuits that we consider, as removing one or the other renders the problem of computing the SHAP-score intractable (namely, #P

AUTHORS

Marcelo Arenas, Leopoldo Bertossi, Mikaël Monet, Pablo Barceló

-hard).


281 visualizaciones Ir a la publicación

RL2, Publisher: Carleton University School of Computer Science Ottawa, Link>

AUTHORS

Leopoldo Bertossi

ABSTRACT

Virtual Data Integration Page 1 Virtual Data Integration Leopoldo Bertossi Carleton University School of Computer Science Ottawa, Canada www.scs.carleton.ca/ ∼ bertossi [email protected] Page 2 Chapter 1: Introduction and Issues Page 3 3 Data in Different Forms There is a large and increasing number of data sources People and companies need to integrate data and the systems that handle that data Data in DBMSs: relational, OO, XML, ... Legacy data in different formats and systems Text files repositories Spread sheets Page 4 4 Data on the Web • Non- and semi-structured data in the WWW: Plain text files, HTML text files, native XML • Organizational databases • Libraries, catalogues, etc. • Data in research repositories: genome databases, scientific databases, environmental databases, etc. • Web services • Semantic Web • Knowledge-Based Systems • Ontologies: Structurally and semantically reach do…

 

110 visualizaciones Ir a la publicación

RL2, Publisher: Schloss Dagstuhl-Leibniz-Zentrum für Informatik, Link>

AUTHORS

Leopoldo Bertossi, Gyula OH Katona, Klaus-Dieter Schewe, Bernhard Thalheim

ABSTRACT

In the early days of database research, issues related to database semantics played a prominent role, and papers discussing database models, conceptual design, integrity constraints, normalization often dominated major database conferences. This began to change more than a decade ago, and nowadays those issues do not appear to be part of the mainstream database research. This situation is caused by several reasons: the problem began to be too difficult, the community was hoping on solutions based on better database models, the variety of buzzwords for new models and approaches required foundation and clarification, the problems raised by application required a lot of research, the pending hope on a universal, simple and genius solutions. Nevertheless the semantical foundations are left open for most of the modern database models or are not existing for models such as UML or XML. At the same time, the community was forgetting the achievements of the early database research. The seminar" Semantics in databases" will be a forum for researchers still interested in database semantics and which can contribute on the basis of research on database semantics and modern approaches of logics, algebra and combinatorics to the solution of very difficult problems raised in application. The first workshop" Semantics in Databases" has been organized in Rez in January 1995. The results of the discussions of the workshop have been summarized in the post workshop proceedings published in LNCS 1358. Semantics of databases and information systems can be based on approaches which have been developed and successfully used …


129 visualizaciones Ir a la publicación

RL2, Publisher: Schloss Dagstuhl-Leibniz-Zentrum für Informatik, Link>

AUTHORS

Torsten Schaub, Philippe Besnard, Leopoldo Bertossi, Anthony Hunter

ABSTRACT

Database, Knowledgebase and Software systems, or their logical specifications, may become inconsistent in the sense of containing contradictory pieces of information. Since these types of technology are at some level based on classical logic, there is the major problem that in classical logic, any formula is implied by a contradiction. This therefore raises the need to circumvent this fundamental property of classical logic whilst supporting as much as possible of classical logic for these technologies. To address this, several new logics, with new formalisms, semantics and/or deductive systems, that can accommodate classical inconsistencies without becoming trivial, have been proposed. These logics are starting to be used in databases, knowledgebases and software specifications. In addition, we need strategies for analysing inconsistent information. This need has in part driven the approach of argumentation systems which compare pros and cons for potential conclusions from conflicting information. Also important are strategies for isolating inconsistency and for taking appropriate actions, including resolution actions. This calls for uncertainty reasoning and meta-level reasoning. Furthermore, the cognitive activities involved in reasoning with inconsistent information need to be directly related to the kind of inconsistency. So, in general, we see the need for inconsistency tolerance giving rise to a range of technologies for inconsistency management. We are now at an exciting stage in this direction. Rich foundations are being established, and a number of interesting and complementary application areas are being explored in decision-support …


118 visualizaciones Ir a la publicación

RL2, Publisher: Knowledge and Information Systems, Link>

AUTHORS

Leopoldo Bertossi

ABSTRACT

There is a recently established correspondence between database tuples as causes for query answers in databases and tuple-based repairs of inconsistent databases with respect to denial constraints. In this work, answer-set programs that specify database repairs are used as a basis for solving computational and reasoning problems around causality in databases, including causal responsibility. Furthermore, causes are introduced also at the attribute level by appealing to an attribute-based repair semantics that uses null values. Corresponding repair-programs are introduced, and used as a basis for computation and reasoning about attribute-level causes. The answer-set programs are extended in order to capture causality under integrity constraints.


87 visualizaciones Ir a la publicación

RL2, Publisher: Proceedings of the AAAI Conference on Artificial Intelligence, Link>

AUTHORS

Leopoldo Bertossi, Marcelo Arenas, Mikaël Monet, Pablo Barceló

ABSTRACT

Scores based on Shapley values are widely used for providing explanations to classification results over machine learning models. A prime example of this is the influential SHAP-score, a version of the Shapley value that can help explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is a computationally intractable problem, it has recently been claimed that the SHAP-score can be computed in polynomial time over the class of decision trees. In this paper, we provide a proof of a stronger result over Boolean models: the SHAP-score can be computed in polynomial time over deterministic and decomposable Boolean circuits. Such circuits, also known as tractable Boolean circuits, generalize a wide range of Boolean circuits and binary decision diagrams classes, including binary decision trees, Ordered Binary Decision Diagrams (OBDDs) and Free Binary Decision Diagrams (FBDDs). We also establish the computational limits of the notion of SHAP-score by observing that, under a mild condition, computing it over a class of Boolean models is always polynomially as hard as the model counting problem for that class. This implies that both determinism and decomposability are essential properties for the circuits that we consider, as removing one or the other renders the problem of computing the SHAP-score intractable (namely, #P

AUTHORS

Leopoldo Bertossi, Marcelo Arenas, Mikaël Monet, Pablo Barceló

-hard).


92 visualizaciones Ir a la publicación

RL2, Publisher: ACM SIGMOD Record, Link>

AUTHORS

Leopoldo Bertossi, Moshe Sebag, Ester Livshits, Benny Kimelfeld

ABSTRACT

Database tuples can be seen as players in the game of jointly realizing the answer to a query. Some tuples may contribute more than others to the outcome, which can be a binary value in the case of a Boolean query, a number for a numerical aggregate query, and so on. To quantify the contributions of tuples, we use the Shapley value that was introduced in cooperative game theory and has found applications in a plethora of domains. Specifically, the Shapley value of an individual tuple quantifies its contribution to the query. We investigate the applicability of the Shapley value in this setting, as well as the computational aspects of its calculation in terms of complexity, algorithms, and approximation.


99 visualizaciones Ir a la publicación

RL2, Publisher: arXiv, Link>

AUTHORS

Leopoldo Bertossi

ABSTRACT

We describe some recent approaches to score-based explanations for query answers in databases and outcomes from classification models in machine learning. The focus is on work done by the author and collaborators. Special emphasis is placed on declarative approaches based on answer-set programming to the use of counterfactual reasoning for score specification and computation. Several examples that illustrate the flexibility of these methods are shown.


132 visualizaciones Ir a la publicación

RL2, Publisher: arXiv, Link>

AUTHORS

Leopoldo Bertossi, Gabriela Reyes

ABSTRACT

We describe how answer-set programs can be used to declaratively specify counterfactual interventions on entities under classification, and reason about them. In particular, they can be used to define and compute responsibility scores as attribution-based explanations for outcomes from classification models. The approach allows for the inclusion of domain knowledge and supports query answering. A detailed example with a naive-Bayes classifier is presented.


100 visualizaciones Ir a la publicación