Valentin Barriere

Valentin Barriere

Especialidad: Procesamiento de lenguaje natural
Valentín Barriere obtuvo un PhD in Computer Science de Télécom Paris en 2019. Es Investigador Principal (PI) en varios proyectos, incluyendo fAIrefighter (detección temprana y predicción de incendios forestales), Wwf – wildfire watch and fight (plataforma de IA para detección temprana de incendios), Proyecto IdeA Tecnología Avanzada, Multimodal Argumentation Mining in Groups Assisted by an Embodied Conversational Agent, y Multicultural Bias Recognition to Detect and Mitigate Racism, Xenophobia and Geographic Inequalities in Multilingual Large Language Models. También es Co-investigador en el proyecto CopernicusLAC.  

PUBLICACIONES

In this paper, we apply a method to quantify biases associated with named entities from various countries. We create counterfactual examples with small perturbations on target-domain data instead of relying on templates or specific datasets for bias detection. On widely used classifiers for subjectivity analysis, including sentiment, emotion, hate speech, and offensive text using Twitter data, our results demonstrate positive biases related to the language spoken in a country across all classifiers studied. Notably, the presence of certain country names in a sentence can strongly influence predictions, up to a 23% change in hate speech detection and up to a 60% change in the prediction of negative emotions such as anger. We hypothesize that these biases stem from the training data of pre-trained language models (PLMs) and find correlations between affect predictions and PLMs likelihood in English and unknown languages like Basque and Maori, revealing distinct patterns with exacerbate correlations. Further, we followed these correlations in-between counterfactual examples from a same sentence to remove the syntactical component, uncovering interesting results suggesting the impact of the pre-training data was more important for English-speaking-country names. Our anonymized code is available here.

Numerous datasets have been proposed to evaluate social bias in Natural Language Processing (NLP) systems. However, assessing bias within specific application domains remains challenging, as existing approaches often face limitations in scalability and fidelity across domains. In this work, we introduce a domain-adaptive framework that utilizes prompting with Large Language Models (LLMs) to automatically transform template-based bias datasets into domain-specific variants. We apply our method to two widely used benchmarks—Equity Evaluation Corpus (EEC) and Identity Phrase Templates Test Set (IPTTS)—adapting them to the Twitter and Wikipedia Talk data. Our results show that the adapted datasets yield bias estimates more closely aligned with real-world data. These findings highlight the potential of LLM-based prompting to enhance the realism and contextual relevance of bias evaluation in NLP systems.

Publisher:  Remote Sensing of Environment Link>

ABSTRACT

Accurate early-season crop type classification is crucial for the crop production estimation and monitoring of agricultural parcels. However, the complexity of the plant growth patterns and their spatio-temporal variability present significant challenges. While current deep learning-based methods show promise in crop type classification from single- and multi-modal time series, most existing methods rely on a single modality, such as satellite optical remote sensing data or crop rotation patterns. We propose a novel approach to fuse multimodal information into a model for improved accuracy and robustness across multiple crop seasons and countries. The approach relies on three modalities used: remote sensing time series from Sentinel-2 and Landsat 8 observations, parcel crop rotation and local crop distribution. To evaluate our approach, we release a new annotated dataset of 7.4 million agricultural parcels in France (FR) and the Netherlands (NL). We associate each parcel with time-series of surface reflectance (Red and NIR) and biophysical variables (LAI, FAPAR). Additionally, we propose a new approach to automatically aggregate crop types into a hierarchical class structure for meaningful model evaluation and a novel data-augmentation technique for early-season classification. Performance of the multimodal approach was assessed at different aggregation levels in the semantic domain, yielding to various ranges of the number of classes spanning from 151 to 8 crop types or groups. It resulted in accuracy ranging from 91% to 95% for the NL dataset and from 85% to 89% for the FR dataset. Pre-training on a dataset improves transferability between countries, allowing for cross- domain and label prediction, and robustness of the performances in a few-shot setting from FR to NL, i.e., when the domain changes as per with significantly new labels. Our proposed approach outperforms comparable methods by enabling deep learning methods to use the often overlooked spatio-temporal context of parcels, resulting in increased precision and generalization capacity.

This paper presents the results of the WASSA 2024 shared task on predicting empathy, emotion, and personality in conversations and reactions to news articles. Participating teams were given access to a new, unpublished extension of the WASSA 2023 shared task dataset. This task is both multi-level and multi-modal: data is available at the person, essay, dialog, and dialog-turn levels and includes formal (news articles) and informal text (essays and dialogs), self-report data (personality and distress), and third-party annotations (empathy and emotion). The shared task included a new focus on conversations between humans and LLM-based virtual agents which occur immediately after reading and reacting to the news articles. Participants were encouraged to explore the multi-level and multi-modal nature of this data. Participation was encouraged in four tracks: (i) predicting the perceived empathy at the dialog level, (ii) predicting turn-level empathy, emotion polarity, and emotion intensity in conversations, (iii) predicting state empathy and distress scores, and (iv) predicting personality. In total, 14 teams participated in the shared task. We summarize the methods and resources used by the participating teams.

Fine-tuning foundation models is a key step in adapting them to a particular task. In the case of Geospatial Foundation Models (GFMs), fine-tuning can be particularly challenging given data scarcity both in terms of the amount of labeled data and, in the case of Satellite Image Time Series (SITS), temporal context. Under these circumstances, the optimal GFM fine-tuning strategy across different labeled data regimes remains poorly understood. In this paper, we thoroughly assess and study the performances of two different GFMs given several combinations of two data scarcity factors: the number of labeled samples and the sequence length. Specifically, we analyze the performances on a crop classification task, particularly, semantic segmentation of the Sentinel-2 images contained in the PASTIS-HD dataset. We compare GFMs to U-TAE, as a fully supervised baseline, across varying amounts of labeled data (1%, 10%, 50%, 100%) and temporal input lengths (1, 6, 15, 25 and 35). Among these explorations, we find that using a smaller learning rate for the pre-trained encoders improves performance in moderate and high data regimes (50%-100%). In contrast, full fine-tuning outperforms partial fine-tuning in very low-label settings (1%-10%). This behavior suggests a nuanced trade-off between feature reuse and adaptation that defies the intuition of standard transfer learning.

Automatic Short Answer Grading (ASAG) refers to automated scoring of open-ended textual responses to specific questions, both in natural language form. In this paper, we propose a method to tackle this task in a setting where annotated data is unavailable. Crucially, our method is competitive with the state-of-theart while being lighter and interpretable. We crafted a unique dataset containing a highly diverse set of questions and a small amount of answers to these questions; making it more challenging compared to previous tasks. Our method uses weak labels generated from other methods proven to be effective in this task, which are then used to train a white-box (linear) regression based on a few interpretable features. The latter are extracted expert features and learned representations that are interpretable per se and aligned with manual labeling. We show the potential of our method by evaluating it on a small annotated portion of the dataset, and demonstrate that its ability compares with that of strong baselines and state-of-the-art methods, comprising an LLM that in contrast to our method comes with a high computational price and an opaque reasoning process. We further validate our model on a public Automatic Essay Scoring dataset in English, and obtained competitive results compared to other unsupervised baselines, outperforming the LLM. To gain further insights of our method, we conducted an interpretability analysis revealing sparse weights in our linear regression model, and alignment between our features and human ratings.

agencia nacional de investigación y desarrollo
Edificio de Innovación UC, Piso 2
Vicuña Mackenna 4860
Macul, Chile