Pablo Barceló

Pablo Barceló

Especialidad: Graph neural networks, explainable AI, foundations of machine learning
Pablo Barceló es PhD in Computer Science por la University of Toronto, título que obtuvo en 2006. En el proyecto Expresividad de redes neuronales de grafos, caracterizó su expresividad en términos de lógicas, extendió la arquitectura base para detectar patrones complejos y diseñó e implementó arquitecturas para tratar grafos e hipergrafos con información semántica, caracterizando su capacidad de generalización. En Interpretabilidad de modelos de machine learning, conectó el nivel de explicabilidad con el costo computacional, analizó el costo de computar explicaciones en contextos deterministas y probabilísticos, y diseñó lenguajes para especificar, evaluar y computar explicaciones de forma flexible, robusta y eficiente. En Poder computacional de los Transformers, demostró que la arquitectura puede decidir funciones computables, encontró lenguajes basados en autómatas y lógicas temporales que capturan lo que ciertos componentes pueden hacer, y analizó el número de iteraciones necesarias para la composición de funciones.

PUBLICACIONES

The areas of safety and robustness are key areas where communities from verification, neuro-symbolic AI, and machine learning come together. Safety and robustness are often formalized in terms of point-wise metrics: given an input point, we identify a circle or a region where certain properties hold in terms of the consistency of prediction. However, the broader goal of neuro-symbolic AI applied to machine learning correctness would ideally integrate safety and robustness conditions with explanations. Nonetheless, there is no paper that discusses these properties in a unified manner. What we consider in this paper is a new simple framework for formalizing a variety of such properties. We are able to characterize the robustness condition, safety conditions, hyper-safety conditions, counterfactual explanations, and fairness, among others. We can express these properties using simple notation for an abstract model based on a binary classifier. We hope these definitions would lead to neuro-symbolic frameworks that contribute to all of these areas jointly.

The formal XAI community has studied a plethora of interpretability queries aiming to understand the classifcations made by decision trees. However, a more uniform understanding of what questions we can hope to answer about these models, traditionally deemed to be easily interpretable, has remained elusive. In an initial attempt to understand uniform languages for interpretability, Arenas et al. (2021) proposed FOIL, a logic for explaining black-box ML models, and showed that it can express a variety of interpretability queries. However, we show that FOIL is limited in two important senses: (i) it is not expressive enough to capture some crucial queries, and (ii) its model-agnostic nature results in a high computational complexity for decision trees. In this paper, we carefully craft two fragments of frst-order logic that allow for effciently interpreting decision trees: Q-DT-FOIL and its optimization variant OPT-DT-FOIL. We show that our proposed logics can express not only a variety of interpretability queries considered by previous literature but also elegantly allows users to specify different objectives the sought explanations should optimize for. Using fnite model-theoretic techniques, we show that the different ingredients of Q-DT-FOIL are necessary for its expressiveness, and yet that queries in QDT-FOIL can be evaluated with a polynomial number of queries to a SAT solver, as well as their optimization versions in OPT-DT-FOIL. Besides our theoretical results, we provide a SAT-based implementation of the evaluation for OPT-DTFOIL that is performant on industry-size decision trees.

Publisher: Journal of Machine Learning Research, Link>

ABSTRACT

Alternatives to recurrent neural networks, in particular, architectures based on self-attention, are gaining momentum for processing input sequences. In spite of their relevance, the computational properties of such networks have not yet been fully explored. We study the computational power of the Transformer, one of the most paradigmatic architectures exemplifying self-attention. We show that the Transformer with hard-attention is Turing complete exclusively based on their capacity to compute and access internal dense representations of the data. Our study also reveals some minimal sets of elements needed to obtain this completeness result.


[:en]Publisher: ACM Transactions on Computational Logic (TOCL), Link>

ABSTRACT

We study the complexity of various fundamental counting problems that arise in the context of incomplete databases, i.e., relational databases that can contain unknown values in the form of labeled nulls. Specifically, we assume that the domains of these unknown values are finite and, for a Boolean query q, we consider the following two problems: Given as input an incomplete database D, (a) return the number of completions of D that satisfy q; or (b) return the number of valuations of the nulls of D yielding a completion that satisfies q. We obtain dichotomies between #P-hardness and polynomial-time computability for these problems when q is a self-join–free conjunctive query and study the impact on the complexity of the following two restrictions: (1) every null occurs at most once in D (what is called Codd tables); and (2) the domain of each null is the same. Roughly speaking, we show that counting completions is much harder than counting valuations: For instance, while the latter is always in #P, we prove that the former is not in #P under some widely believed theoretical complexity assumption. Moreover, we find that both (1) and (2) can reduce the complexity of our problems. We also study the approximability of these problems and show that, while counting valuations always has a fully polynomial-time randomized approximation scheme (FPRAS), in most cases counting completions does not. Finally, we consider more expressive query languages and situate our problems with respect to known complexity classes.

[:]

The notion of rank of a Boolean function has been a cornerstone in PAC learning, enabling quasipolynomial-time learning algorithms for polynomial-size decision trees. We present a novel characterization of rank, grounded in the well-known Transformer architecture. We show that the rank of a function corresponds to the minimum number of Chain of Thought (CoT) steps required by a single-layer Transformer with hard attention to compute . Based on this characterization we establish tight bounds on the number of CoT steps required for specific problems, showing that-fold function composition necessitates exactly CoT steps. Furthermore, we analyze the problem of identifying the position of the-th occurrence of 1 in a Boolean sequence, proving that it requires CoT steps.

The notion of \emph{rank} of a Boolean function has been a cornerstone in PAC learning theory, enabling quasipolynomial-time learning algorithms for polynomial-size decision trees. We present a novel characterization of rank, grounded in the well-known Transformer architecture. We show that the rank of a function f corresponds to the minimum number of \emph{Chain of Thought} (CoT) steps required by a single-layer Transformer with hard attention to compute f. Based on this characterization we establish tight bounds on the number of CoT steps required for specific problems, showing that \(\ell\)-fold function composition necessitates exactly \(\ell\) CoT steps. Furthermore, we analyze the problem of identifying the position of the \(k\)-th occurrence of 1 in a Boolean sequence, proving that it requires \(k\) CoT steps. Finally, we introduce the notion of the multi-head rank that captures multi-head single-layer transformers, and perform the analysis of PAC-learnability of the classes of functions with bounded multi-head rank.

Despite the wide use of k-Nearest Neighbors as classification models, their explainability properties remain poorly understood from a theoretical perspective. While nearest neighbors classifiers offer interpretability from a ''data perspective'', in which the classification of an input vector x is explained by identifying the vectors v1, ..., vk in the training set that determine the classification of x, we argue that such explanations can be impractical in high-dimensional applications, where each vector has hundreds or thousands of features and it is not clear what their relative importance is. Hence, we focus on understanding nearest neighbor classifications through a ''feature perspective'', in which the goal is to identify how the values of the features in x affect its classification. Concretely, we study abductive explanations such as ''minimum sufficient reasons'', which correspond to sets of features in x that are enough to guarantee its classification, and counterfactual explanations based on the minimum distance feature changes one would have to perform in x to change its classification. We present a detailed landscape of positive and negative complexity results for counterfactual and abductive explanations, distinguishing between discrete and continuous feature spaces, and considering the impact of the choice of distance function involved. Finally, we show that despite some negative complexity results, Integer Quadratic Programming and SAT solving allow for computing explanations in practice.

Publisher: Advances in Neural Information Processing Systems, Link>

ABSTRACT

Several queries and scores have recently been proposed to explain individual predictions over ML models. Examples include queries based on “anchors”, which are parts of an instance that are sufficient to justify its classification, and “feature-perturbation” scores such as SHAP. Given the need for flexible, reliable, and easy-to-apply interpretability methods for ML models, we foresee the need for developing declarative languages to naturally specify different explainability queries. We do this in a principled way by rooting such a language in a logic called FOIL, which allows for expressing many simple but important explainability queries, and might serve as a core for more expressive interpretability languages. We study the computational complexity of FOIL queries over two classes of ML models often deemed to be easily interpretable: decision trees and more general decision diagrams. Since the number of possible inputs for an ML model is exponential in its dimension, tractability of the FOIL evaluation problem is delicate but can be achieved by either restricting the structure of the models, or the fragment of FOIL being evaluated. We also present a prototype implementation of FOIL wrapped in a high-level declarative language and perform experiments showing that such a language can be used in practice.


Publisher: Advances in Neural Information Processing Systems, Link>

ABSTRACT

Various recent proposals increase the distinguishing power of Graph Neural Networks (GNNs) by propagating features between k-tuples of vertices. The distinguishing power of these “higher-order” GNNs is known to be bounded by the k-dimensional Weisfeiler-Leman (WL) test, yet their O(n^k) memory requirements limit their applicability. Other proposals infuse GNNs with local higher-order graph structural information from the start, hereby inheriting the desirable O(n) memory requirement from GNNs at the cost of a one-time, possibly non-linear, preprocessing step. We propose local graph parameter enabled GNNs as a framework for studying the latter kind of approaches and precisely characterize their distinguishing power, in terms of a variant of the WL test, and in terms of the graph structural properties that they can take into account. Local graph parameters can be added to any GNN architecture, and are cheap to compute. In terms of expressive power, our proposal lies in the middle of GNNs and their higher-order counterparts. Further, we propose several techniques to aide in choosing the right local graph parameters. Our results connect GNNs with deep results in finite model theory and finite variable logics. Our experimental evaluation shows that adding local graph parameters often has a positive effect for a variety of GNNs, datasets and graph learning tasks.


We explore different graph construction strategies for relational deep learning, revealing new ways to optimize representation efficiency and expressivity. Our work extends existing methods and highlights new research directions in relational learning

agencia nacional de investigación y desarrollo
Edificio de Innovación UC, Piso 2
Vicuña Mackenna 4860
Macul, Chile