Sebastián Ferrada


Specialty: graph databases, multimedia databases, information retrieval, similarity queries on the Web
Sebastián holds a PhD in computer science from the Universidad de Chile and is currently an assistant professor at the Iniciativa de Datos e Inteligencia Artificial of the same university. He also completed a postdoctoral fellowship at Linköping University in Sweden. His research focuses on graph databases, multimedia databases, information retrieval, and similarity queries on the Web. He has received the Best Paper Award at the International Conference on Cooperative Information Systems 2023, the Best Student Resource Paper Award at the International Semantic Web Conference 2017, and first prize in the XXV Concurso Latinoamericano de Tesis de Magíster (Latin American Master's Thesis Contest) of CLEI. He also has a strong teaching record in databases and information retrieval.

PUBLICATIONS

One of the main challenges in working with RDF data is its verbosity, as repeated IRIs and IRI prefixes lead to large files that are costly to store and process. HDT, a binary RDF format, addresses this by compressing data while supporting efficient triple pattern evaluation without decompression. However, its performance is highly dependent on index alignment with query patterns. In this paper, we introduce COTTAS, a storage model that encodes RDF graphs directly into the open-source Apache Parquet columnar format. COTTAS represents RDF as a triple table and leverages block range indexes (zone maps) to achieve high compression ratios and fast query execution over compressed data. We also provide pycottas, an open-source Python library that enables compression of RDF data into COTTAS format and supports efficient querying by translating triple patterns into SQL queries over COTTAS files. This implementation facilitates the adoption of COTTAS for managing RDF graphs. Experiments on the WDBench and DBpedia benchmarks show that COTTAS reduces storage requirements by around 50% with respect to HDT and exhibits competitive triple pattern evaluation, with less performance volatility across pattern types.
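
The idea of translating a triple pattern into SQL over a triple table can be illustrated with a minimal sketch. It is not the pycottas API: the function name `triple_pattern_to_sql`, the column names `s`, `p`, `o`, and the table name `triples` are assumptions made for illustration, and an in-memory SQLite table stands in for a COTTAS Parquet file.

```python
import sqlite3


def triple_pattern_to_sql(s=None, p=None, o=None, table="triples"):
    """Build a parameterized SQL query for one triple pattern.

    A term set to None is treated as a variable (unbound position).
    Hypothetical helper: column and table names are assumptions.
    """
    conditions, params = [], []
    for column, term in (("s", s), ("p", p), ("o", o)):
        if term is not None:
            conditions.append(f"{column} = ?")
            params.append(term)
    sql = f"SELECT s, p, o FROM {table}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


def demo():
    """Run the pattern (?s, ex:knows, ?o) over a tiny triple table."""
    conn = sqlite3.connect(":memory:")  # stand-in for a columnar file
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.executemany(
        "INSERT INTO triples VALUES (?, ?, ?)",
        [
            ("ex:alice", "ex:knows", "ex:bob"),
            ("ex:alice", "ex:age", "30"),
            ("ex:bob", "ex:knows", "ex:carol"),
        ],
    )
    sql, params = triple_pattern_to_sql(p="ex:knows")
    rows = conn.execute(sql + " ORDER BY s", params).fetchall()
    conn.close()
    return rows
```

In a columnar store with zone maps, the equality conditions in the `WHERE` clause are what allow whole row groups to be skipped when their min/max range for a column cannot contain the bound term.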

Extracting information from knowledge graphs is a significant algorithmic challenge, especially when dealing with multimodal knowledge graphs that integrate images, text, and/or videos. While current graph management systems can efficiently evaluate graph queries, they struggle with multimedia data. To address this, systems rely on metadata, such as vector embeddings, for similarity search. While both graph pattern evaluation and similarity search work well independently, real-world applications often require their combination to retrieve media based on both the graph structure and specific similarity criteria. This paper studies the problem of querying multimodal knowledge graphs by combining graph patterns with similarity constraints. We formalize this as an extraction task where some nodes in the graph pattern are filtered by similarity, and then the results must be ordered by a similarity score. While a straightforward approach is to evaluate the graph pattern first and then sort by similarity, we introduce alternative algorithms that evaluate both tasks jointly, leveraging indices for efficient similarity computation. Our implementation employs an approximate version of these indices, and our experiments show that graph database systems can efficiently integrate semantic similarity constraints into their queries.
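
The straightforward baseline mentioned in the abstract (evaluate the graph pattern first, then filter and sort by similarity) can be sketched roughly as follows. This is only an illustration under stated assumptions: the single-triple pattern, the cosine similarity measure, the `min_sim` threshold, and all function names are hypothetical and are not taken from the paper, whose joint algorithms and indices are not reproduced here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_subjects(triples, predicate, obj):
    """Evaluate the pattern (?x, predicate, obj) and return bindings of ?x."""
    return [s for (s, p, o) in triples if p == predicate and o == obj]


def pattern_then_similarity(triples, embeddings, predicate, obj,
                            query_vec, min_sim=0.0):
    """Baseline: graph pattern first, then similarity filter and ordering.

    embeddings maps a node to its vector (assumed metadata); nodes
    without an embedding are skipped.
    """
    scored = []
    for node in match_subjects(triples, predicate, obj):
        vec = embeddings.get(node)
        if vec is None:
            continue
        sim = cosine(vec, query_vec)
        if sim >= min_sim:  # similarity constraint on the matched node
            scored.append((node, sim))
    # Order results by similarity score, most similar first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored
```

The cost of this baseline grows with the number of pattern matches, since every match is scored; this is the motivation for evaluating both tasks jointly with a similarity index.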
