Felipe Del Río

Position: Strategic Projects Advisor

Specialty: Research in deep learning and model generalization.
Industrial Civil Engineer from the Pontificia Universidad Católica de Chile. His research focuses on generalization in deep learning models, with an emphasis on transformers and other advanced models. A significant part of his work addresses how training data shapes model capabilities and the generalization biases these models exhibit.

PUBLICATIONS

Vision Language Models (VLMs) are designed to extend Large Language Models (LLMs) with visual capabilities, yet in this work we observe a surprising phenomenon: VLMs can outperform their underlying LLMs on purely text-only tasks, particularly in long-context information retrieval. To investigate this effect, we build a controlled synthetic retrieval task and find that a transformer trained only on text achieves perfect in-distribution accuracy but fails to generalize out of distribution, while subsequent training on an image-tokenized version of the same task nearly doubles text-only OOD performance. Mechanistic interpretability reveals that visual training changes the model's internal binding strategy: text-only training encourages positional shortcuts, whereas image-based training disrupts them through spatial translation invariance, forcing the model to adopt a more robust symbolic binding mechanism that persists even after text-only examples are reintroduced. We further characterize how binding strategies vary across training regimes, visual encoders, and initializations, and show that analogous shifts occur during pretrained LLM-to-VLM transitions. Our findings suggest that cross-modal training can enhance reasoning and generalization even for tasks grounded in a single modality.

Agencia Nacional de Investigación y Desarrollo
Edificio de Innovación UC, Piso 2
Vicuña Mackenna 4860
Macul, Chile