Publications

Barbara Poblete

RL5

AUTHORS

Hernan Sarmiento, Barbara Poblete

ABSTRACT

Valuable and timely information about crisis situations, such as natural disasters, can be rapidly obtained from user-generated content in social media. This has created an emergent research field that has focused mostly on the problem of filtering and classifying potentially relevant messages during emergency situations. However, we believe important insight can be gained from studying online communications during disasters at a more comprehensive level. In this sense, a higher-level analysis could allow us to understand whether there are collective patterns associated with certain characteristics of events. Following this motivation, we present a novel comparative analysis of 41 real-world crisis events. This analysis is based on textual and linguistic features of social media messages shared during these crises. For our comparison we considered hazard categories (i.e., human-induced and natural crises) as well as subcategories (e.g., intentional or accidental). Among other things, our results show that, using only a small set of textual features, we can differentiate among types of events with 75% accuracy. This indicates that there are clear patterns in how people react to different extreme situations, depending on, for example, whether the event was triggered by natural causes or by human action. These findings have implications from a crisis response perspective, as they will allow experts to foresee patterns in emerging situations, even if there is no prior experience with an event of such characteristics.



RL5, Publisher: arXiv

AUTHORS

Aymé Arango, Jorge Pérez, Barbara Poblete

ABSTRACT

Automatic hate speech detection in online social networks is an important open problem in Natural Language Processing (NLP). Hate speech is a multidimensional issue, strongly dependent on language and cultural factors. Despite its relevance, research on this topic has been almost exclusively devoted to English. Most supervised learning resources, such as labeled datasets and NLP tools, have been created for this same language. Considering that a large portion of users worldwide speak languages other than English, there is an important need for efficient approaches to multilingual hate speech detection. In this work we propose to address the problem of multilingual hate speech detection from the perspective of transfer learning. Our goal is to determine whether knowledge from one particular language can be used to classify another language, and to determine effective ways to achieve this. We propose a hate-specific data representation and evaluate its effectiveness against general-purpose universal representations, most of which, unlike our proposed model, have been trained on massive amounts of data. We focus on a cross-lingual setting, in which one needs to classify hate speech in one language without having access to any labeled data for that language. We show that the use of our simple yet specific multilingual hate representations improves classification results. We explain this with a qualitative analysis showing that our specific representation is able to capture some common patterns in how hate speech presents itself in different languages. Our proposal constitutes, to the best of our knowledge, the first attempt at constructing multilingual task-specific representations. Despite its simplicity, our model outperformed the previous approaches in most of the experimental setups. Our findings can orient future solutions toward the use of domain-specific representations.



AUTHORS

Juglar Diaz, Felipe Bravo, Barbara Poblete

ABSTRACT

The popularity of mobile devices with GPS capabilities, along with the worldwide adoption of social media, has created a rich source of text data combined with spatio-temporal information. Text data collected from location-based social networks can be used to gain space–time insights into human behavior and provide a view of time and space through the social media lens. From a data modeling perspective, text, time, and space have different scales and representation approaches; hence, it is not trivial to jointly represent them in a unified model. Existing approaches do not capture the sequential structure present in texts or the patterns that drive how text is generated considering the spatio-temporal context at different levels of granularity. In this work, we present a neural language model architecture that allows us to represent time and space as context for text generation at different granularities. We define the task of modeling text, timestamps, and geo-coordinates as a spatio-temporal conditioned language model task. This task definition allows us to employ the same evaluation methodology used in language modeling, which is a traditional natural language processing task that considers the sequential structure of texts. We conduct experiments over two datasets collected from location-based social networks, Twitter and Foursquare. Our experimental results show that each dataset has particular patterns for language generation under spatio-temporal conditions at different granularities. In addition, we present qualitative analyses to show how the proposed model can be used to characterize urban places.
