Publicaciones

Pablo Barceló

RL2, Publisher: Journal of Machine Learning Research, Link>

AUTHORS

Pablo Barceló, Javier Marinkovic, Jorge Pérez

ABSTRACT

Alternatives to recurrent neural networks, in particular, architectures based on self-attention, are gaining momentum for processing input sequences. In spite of their relevance, the computational properties of such networks have not yet been fully explored. We study the computational power of the Transformer, one of the most paradigmatic architectures exemplifying self-attention. We show that the Transformer with hard-attention is Turing complete exclusively based on their capacity to compute and access internal dense representations of the data. Our study also reveals some minimal sets of elements needed to obtain this completeness result.


247 visualizaciones Ir a la publicación

RL2, Publisher: arXiv, Link>

AUTHORS

Leopoldo Bertossi, Marcelo Arenas, Mikaël Monet, Pablo Barceló

ABSTRACT

In Machine Learning, the SHAP-score is a version of the Shapley value that is used to explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is an intractable problem, we prove a strong positive result stating that the SHAP-score can be computed in polynomial time over deterministic and decomposable Boolean circuits. Such circuits are studied in the field of Knowledge Compilation and generalize a wide range of Boolean circuits and binary decision diagrams classes, including binary decision trees and Ordered Binary Decision Diagrams (OBDDs). We also establish the computational limits of the SHAP-score by observing that computing it over a class of Boolean models is always polynomially as hard as the model counting problem for that class. This implies that both determinism and decomposability are essential properties for the circuits that we consider. It also implies that computing SHAP-scores is intractable as well over the class of propositional formulas in DNF. Based on this negative result, we look for the existence of fully-polynomial randomized approximation schemes (FPRAS) for computing SHAP-scores over such class. In contrast to the model counting problem for DNF formulas, which admits an FPRAS, we prove that no such FPRAS exists for the computation of SHAP-scores. Surprisingly, this negative result holds even for the class of monotone formulas in DNF. These techniques can be further extended to prove another strong negative result: Under widely believed complexity assumptions, there is no polynomial-time algorithm that checks, given a monotone DNF formula φ and features x,y, whether the SHAP-score of x in φ is smaller than the SHAP-score of y in φ.


288 visualizaciones Ir a la publicación

RL2, Publisher: https://github.com/pdm-book/community

AUTHORS

Wim Martens, Marcelo Arenas, Leonid Libkin, Pablo Barceló, Andreas Pieris

Link>

AUTHORS

Wim Martens, Marcelo Arenas, Leonid Libkin, Pablo Barceló, Andreas Pieris

ABSTRACT

This is a release of parts 1, 2, and 4 of the upcoming book “Principles of Databases”, which will be about the foundational and mathematical principles of databases in its various forms. The first two parts focus on an overview of the relational model, and on processing some of the most commonly occurring relational queries. Forthcoming parts will focus on additional aspects of the relational model and will cover tree-structured and graph-structured data as well.


219 visualizaciones Ir a la publicación

RL2, Publisher: Proceedings of the AAAI Conference on Artificial Intelligence, Link>

AUTHORS

Marcelo Arenas, Mikaël Monet, Pablo Barceló, Leopoldo Bertossi

ABSTRACT

Scores based on Shapley values are widely used for providing explanations to classification results over machine learning models. A prime example of this is the influential SHAP-score, a version of the Shapley value that can help explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is a computationally intractable problem, it has recently been claimed that the SHAP-score can be computed in polynomial time over the class of decision trees. In this paper, we provide a proof of a stronger result over Boolean models: the SHAP-score can be computed in polynomial time over deterministic and decomposable Boolean circuits. Such circuits, also known as tractable Boolean circuits, generalize a wide range of Boolean circuits and binary decision diagrams classes, including binary decision trees, Ordered Binary Decision Diagrams (OBDDs) and Free Binary Decision Diagrams (FBDDs). We also establish the computational limits of the notion of SHAP-score by observing that, under a mild condition, computing it over a class of Boolean models is always polynomially as hard as the model counting problem for that class. This implies that both determinism and decomposability are essential properties for the circuits that we consider, as removing one or the other renders the problem of computing the SHAP-score intractable (namely, #P

AUTHORS

Marcelo Arenas, Mikaël Monet, Pablo Barceló, Leopoldo Bertossi

-hard).


281 visualizaciones Ir a la publicación

RL2, Publisher: Journal of Computer and System Sciences, Link>

AUTHORS

Victor Dalmau, Pablo Barceló, Alexander Baumgartner, Benny Kimelfeld

ABSTRACT

We consider the feature-generation task wherein we are given a database with entities A1:K111 as positive and negative examples, and we want to find feature queries that linearly separate the two sets of examples. We focus on conjunctive feature queries, and explore two problems: (a) deciding if separating feature queries exist (separability), and (b) generating such queries when they exist. To restrict the complexity of the generated classifiers, we explore various ways of regularizing them by limiting their dimension, the number of joins in feature queries, and their generalized hypertreewidth (ghw). We show that the separability problem is tractable for bounded ghw; yet, the generation problem is not because feature queries might be too large. So, we explore a third problem: classifying new entities without necessarily generating the feature queries. Interestingly, in the case of bounded ghw we can efficiently classify without explicitly generating such queries.


163 visualizaciones Ir a la publicación

RL2, Publisher: Advances in Neural Information Processing Systems, Link>

AUTHORS

Maksimilian Ryschkov, Pablo Barceló, Juan Reutter, Floris Geerts

ABSTRACT

Various recent proposals increase the distinguishing power of Graph Neural Networks (GNNs) by propagating features between k-tuples of vertices. The distinguishing power of these “higher-order” GNNs is known to be bounded by the k-dimensional Weisfeiler-Leman (WL) test, yet their O(n^k) memory requirements limit their applicability. Other proposals infuse GNNs with local higher-order graph structural information from the start, hereby inheriting the desirable O(n) memory requirement from GNNs at the cost of a one-time, possibly non-linear, preprocessing step. We propose local graph parameter enabled GNNs as a framework for studying the latter kind of approaches and precisely characterize their distinguishing power, in terms of a variant of the WL test, and in terms of the graph structural properties that they can take into account. Local graph parameters can be added to any GNN architecture, and are cheap to compute. In terms of expressive power, our proposal lies in the middle of GNNs and their higher-order counterparts. Further, we propose several techniques to aide in choosing the right local graph parameters. Our results connect GNNs with deep results in finite model theory and finite variable logics. Our experimental evaluation shows that adding local graph parameters often has a positive effect for a variety of GNNs, datasets and graph learning tasks.


127 visualizaciones Ir a la publicación

RL2, Publisher: Advances in Neural Information Processing Systems, Link>

AUTHORS

Marcelo Arenas, Pablo Barceló, Jorge Pérez, Daniel Baez, Bernardo Subercaseaux

ABSTRACT

Several queries and scores have recently been proposed to explain individual predictions over ML models. Examples include queries based on “anchors”, which are parts of an instance that are sufficient to justify its classification, and “feature-perturbation” scores such as SHAP. Given the need for flexible, reliable, and easy-to-apply interpretability methods for ML models, we foresee the need for developing declarative languages to naturally specify different explainability queries. We do this in a principled way by rooting such a language in a logic called FOIL, which allows for expressing many simple but important explainability queries, and might serve as a core for more expressive interpretability languages. We study the computational complexity of FOIL queries over two classes of ML models often deemed to be easily interpretable: decision trees and more general decision diagrams. Since the number of possible inputs for an ML model is exponential in its dimension, tractability of the FOIL evaluation problem is delicate but can be achieved by either restricting the structure of the models, or the fragment of FOIL being evaluated. We also present a prototype implementation of FOIL wrapped in a high-level declarative language and perform experiments showing that such a language can be used in practice.


134 visualizaciones Ir a la publicación

RL2, Publisher: Iscience, Link>

AUTHORS

Marcelo Arenas, Martín Ugarte, Nicolás Salas, Susana Eyheramendy, Sandra Solari, Pedro Saa, Pedro Bahamondes, Pablo Barceló, Luis Méndez, Andrés Finkelstein-Kulka, Eduardo A Undurraga, avier Pizarro-Berdichevsky, Eduardo Agosin, Carolina López, Carlos Valencia

ABSTRACT

The sudden loss of smell is among the earliest and most prevalent symptoms of COVID-19 when measured with a clinical psychophysical test. Research has shown the potential impact of frequent screening for olfactory dysfunction, but existing tests are expensive and time consuming. We developed a low-cost ($0.50/test) rapid psychophysical olfactory test (KOR) for frequent testing and a model-based COVID-19 screening framework using a Bayes Network symptoms model. We trained and validated the model on two samples: suspected COVID-19 cases in five healthcare centers (n = 926; 33% prevalence, 309 RT-PCR confirmed) and healthy miners (n = 1,365; 1.1% prevalence, 15 RT-PCR confirmed). The model predicted COVID-19 status with 76% and 96% accuracy in the healthcare and miners samples, respectively (healthcare: AUC = 0.79 [0.75–0.82], sensitivity: 59%, specificity: 87%; miners: AUC = 0.71 [0.63–0.79], sensitivity: 40%, specificity: 97%, at 0.50 infection probability threshold). Our results highlight the potential for low-cost, frequent, accessible, routine COVID-19 testing to support society's reopening.


119 visualizaciones Ir a la publicación

RL2, Publisher: Proceedings of the AAAI Conference on Artificial Intelligence, Link>

AUTHORS

Mikaël Monet, Leopoldo Bertossi, Marcelo Arenas, Pablo Barceló

ABSTRACT

Scores based on Shapley values are widely used for providing explanations to classification results over machine learning models. A prime example of this is the influential SHAP-score, a version of the Shapley value that can help explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is a computationally intractable problem, it has recently been claimed that the SHAP-score can be computed in polynomial time over the class of decision trees. In this paper, we provide a proof of a stronger result over Boolean models: the SHAP-score can be computed in polynomial time over deterministic and decomposable Boolean circuits. Such circuits, also known as tractable Boolean circuits, generalize a wide range of Boolean circuits and binary decision diagrams classes, including binary decision trees, Ordered Binary Decision Diagrams (OBDDs) and Free Binary Decision Diagrams (FBDDs). We also establish the computational limits of the notion of SHAP-score by observing that, under a mild condition, computing it over a class of Boolean models is always polynomially as hard as the model counting problem for that class. This implies that both determinism and decomposability are essential properties for the circuits that we consider, as removing one or the other renders the problem of computing the SHAP-score intractable (namely, #P

AUTHORS

Mikaël Monet, Leopoldo Bertossi, Marcelo Arenas, Pablo Barceló

-hard).


92 visualizaciones Ir a la publicación