Publications

AUTHORS

Susana Figueroa, Domingo Mery, Laurence Golborne, Daniel Saavedra, Alejandro Kaminetzky

In X-ray testing, the aim is to inspect the inner parts of an object that cannot be seen with the naked eye. Typical applications are the detection of targets such as blow holes in casting inspection, cracks in welding inspection, and prohibited objects in baggage inspection. A straightforward solution today is to use object detection methods based on deep learning models. Nevertheless, this strategy is not effective when the number of available X-ray images for training is low, and the databases in X-ray testing are unfortunately rather limited. To overcome this problem, we propose a strategy for deep learning training that uses a small number of target-free X-ray images on which many simulated targets are superimposed. The simulation is based on the Beer–Lambert law, which allows different layers to be superimposed. With this method, generating training data is very simple. The proposed method was used to train well-known object detection models (e.g., YOLO, RetinaNet, EfficientDet, and SSD) for casting inspection, welding inspection, and baggage inspection. The learned models were tested on real X-ray images. Our experiments show that the proposed solution is simple (the training can be implemented with a few lines of code using open-source libraries), effective (average precision was 0.91, 0.60, and 0.88 for casting, welding, and baggage inspection, respectively), and fast (training was done in a couple of hours, and testing can be performed in 11 ms per image). We believe that this strategy contributes to the implementation of practical solutions to the problem of target detection in X-ray testing.
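The superimposition step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the target-free image stores transmitted intensity normalized to (0, 1], and the function names, the spherical target shape, and the values chosen for mu and pixel_size_cm are hypothetical.

```python
import numpy as np


def sphere_thickness(shape, center, radius_px, pixel_size_cm):
    """Thickness map (cm) of a sphere along parallel rays (hypothetical target)."""
    yy, xx = np.mgrid[: shape[0], : shape[1]]
    d2 = (yy - center[0]) ** 2 + (xx - center[1]) ** 2
    # Chord length through the sphere at each pixel; zero outside its outline.
    chord_px = 2.0 * np.sqrt(np.clip(radius_px ** 2 - d2, 0.0, None))
    return chord_px * pixel_size_cm


def superimpose_target(background, thickness, mu):
    """Beer-Lambert law: I = I0 * exp(-mu * d).

    Attenuations of stacked layers multiply, so a simulated target is added
    by multiplying the target-free image by exp(-mu * thickness).
    """
    return background * np.exp(-mu * thickness)


def bounding_box(thickness):
    """Box (x_min, y_min, x_max, y_max) of the target, usable as a training label."""
    ys, xs = np.nonzero(thickness > 0)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Assumed toy data: a uniform target-free image and one simulated target.
background = np.full((64, 64), 0.8)
thickness = sphere_thickness(background.shape, (32, 32), radius_px=10, pixel_size_cm=0.05)
simulated = superimpose_target(background, thickness, mu=0.5)
label = bounding_box(thickness)
```

Because each simulated target comes with its own thickness map, the bounding-box label is obtained for free, which is what makes generating training data this way inexpensive.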


RL1

AUTHORS

Rodrygo LT Santos, Leandro Balby Marinho, Júlio Barreto Guedes da Costa, Denis Parra

Embeddings are core components of modern model-based Collaborative Filtering (CF) methods, such as Matrix Factorization (MF) and Deep Learning variations. In essence, embeddings are mappings of the original sparse representation of categorical features (e.g., users and items) to dense low-dimensional representations. A well-known limitation of such methods is that the learned embeddings are opaque and hard to explain to users. On the other hand, a key feature of simpler KNN-based CF models (a.k.a. user/item-based CF) is that they naturally yield similarity-based explanations, i.e., similar users/items as evidence to support model recommendations. Unlike related works that try to attribute explicit meaning (via metadata) to the learned embeddings, in this paper we propose to equip the learned embeddings of MF with meaningful similarity-based explanations. First, we show that the learned user/item …
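To make the idea of a similarity-based explanation concrete, the sketch below retrieves the nearest neighbours of an item in an embedding space. It is a generic illustration only: the abstract is truncated before the paper's method is described, so this is not that method, and the function name, the use of cosine similarity, and the toy embedding matrix are all assumptions.

```python
import numpy as np


def top_k_similar(embeddings, index, k):
    """Return the k rows most similar to row `index` by cosine similarity."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed[index]
    sims[index] = -np.inf  # never return the query item itself
    order = np.argsort(-sims)[:k]
    return [(int(j), float(sims[j])) for j in order]


# Hypothetical 2-dimensional item embeddings, standing in for learned MF factors.
item_embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
neighbours = top_k_similar(item_embeddings, index=0, k=2)
```

An explanation of the form "recommended because it is similar to items 1 and 2" could then be read off the returned neighbours.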



AUTHORS

Juglar Diaz, Felipe Bravo

Abstract
The popularity of mobile devices with GPS capabilities, along with the worldwide adoption of social media, has created a rich source of text data combined with spatio-temporal information. Text data collected from location-based social networks can be used to gain space–time insights into human behavior and provide a view of time and space through the social media lens. From a data modeling perspective, text, time, and space have different scales and representation approaches; hence, it is not trivial to represent them jointly in a unified model. Existing approaches do not capture the sequential structure present in texts or the patterns that drive how text is generated given the spatio-temporal context at different levels of granularity. In this work, we present a neural language model architecture that allows us to represent time and space as context for text generation at different granularities. We define the task of modeling text, timestamps, and geo-coordinates as a spatio-temporal conditioned language model task. This task definition allows us to employ the same evaluation methodology used in language modeling, which is a traditional natural language processing task that considers the sequential structure of texts. We conduct experiments over two datasets collected from location-based social networks, Twitter and Foursquare. Our experimental results show that each dataset has particular patterns for language generation under spatio-temporal conditions at different granularities. In addition, we present qualitative analyses to show how the proposed model can be used to characterize urban places.


RL2, Publisher: Dagstuhl Reports

AUTHORS

Leopoldo Bertossi

Abstract

The presentation revolves around the subject of explainable AI. More specifically, we deal with numerical attribution scores that are assigned to the feature values of an entity under classification, in order to identify and rank their importance for the obtained classification label. We concentrate on the popular SHAP score [2], which can be applied to both black-box and open models. We show that, in contrast to its general #P-hardness, it can be computed in polynomial time for classifiers that are based on decomposable and deterministic Boolean decision circuits. This class of classifiers includes decision trees and ordered binary decision diagrams. This result was established in [1]. The presentation illustrates how the proof heavily relies on the connection to SAT-related computational problems.
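For intuition about what is being computed, the sketch below evaluates SHAP scores by brute force for a tiny Boolean classifier. It is not the polynomial-time algorithm from [1]: it enumerates every feature subset and every completion, so it is exponential in the number of features. It assumes binary features under the uniform distribution, and the function names and the example classifier are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product
from math import factorial


def expected_value(f, entity, fixed):
    """E[f(x)] when features in `fixed` take the entity's values, rest uniform."""
    n = len(entity)
    free = [i for i in range(n) if i not in fixed]
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(free)):
        x = list(entity)
        for i, b in zip(free, bits):
            x[i] = b
        total += f(tuple(x))
    return total / 2 ** len(free)


def shap_scores(f, entity):
    """Shapley value of each feature for the game v(S) = expected_value(f, entity, S)."""
    n = len(entity)
    scores = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        score = Fraction(0)
        for k in range(n):
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for subset in combinations(others, k):
                s = set(subset)
                gain = expected_value(f, entity, s | {i}) - expected_value(f, entity, s)
                score += weight * gain
        scores.append(score)
    return scores


def classifier(x):
    """Hypothetical classifier: x0 AND (x1 OR x2)."""
    return int(x[0] and (x[1] or x[2]))


scores = shap_scores(classifier, (1, 1, 0))
```

The scores sum to the classifier's output on the entity minus its expected output under the assumed distribution, which is a convenient sanity check for any implementation.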



RL1, Publisher: Computer Vision for X-Ray Testing

AUTHORS

Domingo Mery, Bernardita Morris

Abstract

Given a facial matcher, the task in explainable face verification is to answer the question: how relevant are the parts of a probe image for establishing the match with an enrolled image? In many cases, however, the trained models cannot be manipulated and must be treated as "black boxes". In this paper, we present six different saliency maps that can be used to explain any face verification algorithm without any manipulation inside the face recognition model. The key idea of the methods is based on how the matching score of the two face images changes when the probe is perturbed. The proposed methods remove and aggregate different parts of the face, and measure the contributions of these parts both individually and in collaboration. We test and compare our proposed methods in three different scenarios: synthetic images with different qualities and occlusions; real face images with different facial expressions, poses, and occlusions; and faces from different demographic groups. In our experiments, five different face verification algorithms are used: ArcFace, Dlib, FaceNet (trained on VGGface2 and Casia-WebFace), and LBP. We conclude that one of the proposed methods achieves saliency maps that are stable and interpretable to humans. In addition, our method, in combination with a new visualization of saliency maps based on contours, shows promising results in comparison with other state-of-the-art methods. This paper offers insights into any face verification algorithm by showing clearly which face areas are most relevant to the algorithm when it carries out the recognition process.
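The perturbation idea can be sketched with the simplest removal-based variant: occlude one patch of the probe at a time and record how much the matching score drops. This is a generic illustration and not one of the six methods from the paper. The function names, the patch size, and the fill value are assumptions, and cosine_score is only a stand-in for a real black-box face matcher.

```python
import numpy as np


def cosine_score(a, b):
    """Stand-in matcher (assumption): cosine similarity of the raw pixels."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def removal_saliency(probe, enrolled, score_fn, patch=8, fill=0.0):
    """Saliency of each patch = drop in matching score when it is removed."""
    base = score_fn(probe, enrolled)
    h, w = probe.shape[:2]
    saliency = np.zeros((h, w), dtype=float)
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            perturbed = probe.copy()
            perturbed[y : y + patch, x : x + patch] = fill
            # Only score_fn is called; the matcher's internals are never touched.
            saliency[y : y + patch, x : x + patch] = base - score_fn(perturbed, enrolled)
    return saliency


# Toy images (assumption): a dim background with one bright, distinctive region.
enrolled = np.full((16, 16), 0.1)
enrolled[:8, :8] = 1.0
probe = enrolled.copy()
saliency = removal_saliency(probe, enrolled, cosine_score)
```

Because the matcher is only queried through its score, the same loop applies to any verification algorithm that returns a score for a pair of images.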

