Publicaciones

RL2, Publisher: https://github.com/pdm-book/community

AUTHORS

Wim Martens, Marcelo Arenas, Leonid Libkin, Pablo Barceló, Andreas Pieris

Link>

AUTHORS

Wim Martens, Marcelo Arenas, Leonid Libkin, Pablo Barceló, Andreas Pieris

ABSTRACT

This is a release of parts 1, 2, and 4 of the upcoming book “Principles of Databases”, which will be about the foundational and mathematical principles of databases in its various forms. The first two parts focus on an overview of the relational model, and on processing some of the most commonly occurring relational queries. Forthcoming parts will focus on additional aspects of the relational model and will cover tree-structured and graph-structured data as well.


219 visualizaciones Ir a la publicación

RL2, Publisher: Proceedings of the AAAI Conference on Artificial Intelligence, Link>

AUTHORS

Marcelo Arenas, Leopoldo Bertossi, Mikaël Monet, Pablo Barceló

ABSTRACT

Scores based on Shapley values are widely used for providing explanations to classification results over machine learning models. A prime example of this is the influential SHAP-score, a version of the Shapley value that can help explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is a computationally intractable problem, it has recently been claimed that the SHAP-score can be computed in polynomial time over the class of decision trees. In this paper, we provide a proof of a stronger result over Boolean models: the SHAP-score can be computed in polynomial time over deterministic and decomposable Boolean circuits. Such circuits, also known as tractable Boolean circuits, generalize a wide range of Boolean circuits and binary decision diagrams classes, including binary decision trees, Ordered Binary Decision Diagrams (OBDDs) and Free Binary Decision Diagrams (FBDDs). We also establish the computational limits of the notion of SHAP-score by observing that, under a mild condition, computing it over a class of Boolean models is always polynomially as hard as the model counting problem for that class. This implies that both determinism and decomposability are essential properties for the circuits that we consider, as removing one or the other renders the problem of computing the SHAP-score intractable (namely, #P

AUTHORS

Marcelo Arenas, Leopoldo Bertossi, Mikaël Monet, Pablo Barceló

-hard).


281 visualizaciones Ir a la publicación

RL2, Publisher: Journal of Computer and System Sciences, Link>

AUTHORS

Victor Dalmau, Pablo Barceló, Alexander Baumgartner, Benny Kimelfeld

ABSTRACT

We consider the feature-generation task wherein we are given a database with entities A1:K111 as positive and negative examples, and we want to find feature queries that linearly separate the two sets of examples. We focus on conjunctive feature queries, and explore two problems: (a) deciding if separating feature queries exist (separability), and (b) generating such queries when they exist. To restrict the complexity of the generated classifiers, we explore various ways of regularizing them by limiting their dimension, the number of joins in feature queries, and their generalized hypertreewidth (ghw). We show that the separability problem is tractable for bounded ghw; yet, the generation problem is not because feature queries might be too large. So, we explore a third problem: classifying new entities without necessarily generating the feature queries. Interestingly, in the case of bounded ghw we can efficiently classify without explicitly generating such queries.


163 visualizaciones Ir a la publicación

RL2, Publisher: Advances in Neural Information Processing Systems, Link>

AUTHORS

Maksimilian Ryschkov, Pablo Barceló, Juan Reutter, Floris Geerts

ABSTRACT

Various recent proposals increase the distinguishing power of Graph Neural Networks (GNNs) by propagating features between k-tuples of vertices. The distinguishing power of these “higher-order” GNNs is known to be bounded by the k-dimensional Weisfeiler-Leman (WL) test, yet their O(n^k) memory requirements limit their applicability. Other proposals infuse GNNs with local higher-order graph structural information from the start, hereby inheriting the desirable O(n) memory requirement from GNNs at the cost of a one-time, possibly non-linear, preprocessing step. We propose local graph parameter enabled GNNs as a framework for studying the latter kind of approaches and precisely characterize their distinguishing power, in terms of a variant of the WL test, and in terms of the graph structural properties that they can take into account. Local graph parameters can be added to any GNN architecture, and are cheap to compute. In terms of expressive power, our proposal lies in the middle of GNNs and their higher-order counterparts. Further, we propose several techniques to aide in choosing the right local graph parameters. Our results connect GNNs with deep results in finite model theory and finite variable logics. Our experimental evaluation shows that adding local graph parameters often has a positive effect for a variety of GNNs, datasets and graph learning tasks.


127 visualizaciones Ir a la publicación

RL2, Publisher: Advances in Neural Information Processing Systems, Link>

AUTHORS

Marcelo Arenas, Jorge Pérez, Pablo Barceló, Bernardo Subercaseaux, Daniel Baez

ABSTRACT

Several queries and scores have recently been proposed to explain individual predictions over ML models. Examples include queries based on “anchors”, which are parts of an instance that are sufficient to justify its classification, and “feature-perturbation” scores such as SHAP. Given the need for flexible, reliable, and easy-to-apply interpretability methods for ML models, we foresee the need for developing declarative languages to naturally specify different explainability queries. We do this in a principled way by rooting such a language in a logic called FOIL, which allows for expressing many simple but important explainability queries, and might serve as a core for more expressive interpretability languages. We study the computational complexity of FOIL queries over two classes of ML models often deemed to be easily interpretable: decision trees and more general decision diagrams. Since the number of possible inputs for an ML model is exponential in its dimension, tractability of the FOIL evaluation problem is delicate but can be achieved by either restricting the structure of the models, or the fragment of FOIL being evaluated. We also present a prototype implementation of FOIL wrapped in a high-level declarative language and perform experiments showing that such a language can be used in practice.


134 visualizaciones Ir a la publicación

RL2, Publisher: Iscience, Link>

AUTHORS

Marcelo Arenas, Martín Ugarte, Luis Méndez, Susana Eyheramendy, Sandra Solari, Pedro Saa, Pedro Bahamondes, Pablo Barceló, Nicolás Salas, Andrés Finkelstein-Kulka, Carlos Valencia, Carolina López, Eduardo A Undurraga, Eduardo Agosin, avier Pizarro-Berdichevsky

ABSTRACT

The sudden loss of smell is among the earliest and most prevalent symptoms of COVID-19 when measured with a clinical psychophysical test. Research has shown the potential impact of frequent screening for olfactory dysfunction, but existing tests are expensive and time consuming. We developed a low-cost ($0.50/test) rapid psychophysical olfactory test (KOR) for frequent testing and a model-based COVID-19 screening framework using a Bayes Network symptoms model. We trained and validated the model on two samples: suspected COVID-19 cases in five healthcare centers (n = 926; 33% prevalence, 309 RT-PCR confirmed) and healthy miners (n = 1,365; 1.1% prevalence, 15 RT-PCR confirmed). The model predicted COVID-19 status with 76% and 96% accuracy in the healthcare and miners samples, respectively (healthcare: AUC = 0.79 [0.75–0.82], sensitivity: 59%, specificity: 87%; miners: AUC = 0.71 [0.63–0.79], sensitivity: 40%, specificity: 97%, at 0.50 infection probability threshold). Our results highlight the potential for low-cost, frequent, accessible, routine COVID-19 testing to support society's reopening.


119 visualizaciones Ir a la publicación

RL5, Publisher: IET, Link>

AUTHORS

Marcelo Mendoza, Fernanda Weiss, Evangelos Milios

ABSTRACT

The automatic detection of rumors in social networks has gained considerable attention from researchers and practitioners during the last decade, due to the consequences of the spread of disinformation in public opinion. Most of the existing methods make use of features extracted from conversational threads, user profiles, and structural information of the network. These features are difficult to capture in practice and are often only partially available during the spread of rumors. In this paper, we study an unexplored approach in rumor detection: time series classification (TSC). By modeling the problem using time series, we avoid using lexical or structural characteristics of the network. Instead, we use information that is simpler to capture, such as the volume of tweets and the number of followers and followees of the users involved in a story. In this way, the characterization of the story is not related to specific users, but to variables aggregated at the event level.We introduce a TSC-based model for detecting rumors based on hypergraph partitioning, aligning time series prototypes with rumor classes. Our approach uses a Siamese network to train a rumor detection model in a supervised way, minimizing the distance between the time series of the training examples and the prototypes of their class. Results on benchmark data show that our approach surpasses other TSC-based methods in detecting rumors. Also, we compare our methods performance with methods that make use of lexical and structural characteristics. Our experiments show that our method has advantages in time-sensitive contexts, outperforming the state of the art in early detection scenarios with incomplete information.


131 visualizaciones Ir a la publicación

RL5, Publisher: PloS one, Link>

AUTHORS

Marcelo Mendoza, Naim Bro

ABSTRACT

Based on a geocoded registry of more than four million residents of Santiago, Chile, we build two surname-based networks that reveal the city’s population structure. The first network is formed from paternal and maternal surname pairs. The second network is formed from the isonymic distances between the city’s neighborhoods. These networks uncover the city’s main ethnic groups and their spatial distribution. We match the networks to a socioeconomic index, and find that surnames of high socioeconomic status tend to cluster, be more diverse, and occupy a well-defined quarter of the city. The results are suggestive of a high degree of urban segregation in Santiago.


117 visualizaciones Ir a la publicación

RL5, Publisher:, Link>

AUTHORS

Marcelo Mendoza, Mauricio Solar, Mauricio Araya, Gabriel Molina, Víctor Castañeda, Ignacio Loayza, Camilo Núñez

ABSTRACT

Medical images are an essential input for the timely diagnosis of pathologies. Despite its wide use in the area, searching for images that can reveal valuable information to support decision-making is difficult and expensive. However, the possibilities that open when making large repositories of images available for search by content are unsuspected. We designed a content-based image retrieval system for medical imaging, which reduces the gap between access to information and the availability of useful repositories to meet these needs. The system operates on the principle of query-by-example, in which users provide medical images, and the system displays a set of related images. Unlike metadata match-driven searches, our system drives content-based search. This allows the system to conduct searches on repositories of medical images that do not necessarily have complete and curated metadata. We explore our system’s feasibility in computational tomography (CT) slices for SARS-CoV-2 infection (COVID-19), showing that our proposal obtains promising results, advantageously comparing it with other search methods.


135 visualizaciones Ir a la publicación