Publisher: Journal of Research in Science Teaching

ABSTRACT

Artificial intelligence (AI) technologies generate increasingly sophisticated non-human cognition; however, foundational learning theories contemplate only human cognition, and current research conceptualizes AI as a pedagogical tool. We argue that the incipient abilities of AI for mutual engagement with people could allow AI to participate as a legitimate member of social constructivist learning environments, and we suggest potential structures and activities for exploring AI's capabilities for full participation.

"Participation is an active process, but I will reserve the term for actors who are members of social communities. For instance, I will not say that a computer 'participates' in a community of practice…" (Wenger, 1998, p. 56)

Twenty-five years ago, Etienne Wenger published his influential book Communities of Practice: Learning, Meaning, and Identity (Wenger, 1998), in which he specifically discounted computers as potential members of a community of practice (CoP). Recently, however, the abilities of computational systems such as generative artificial intelligence (AI) oblige us to reconsider the roles non-human cognition could play in communities of practice centered on learning. The editorial article "Artificial Intelligence and the Journal of Research in Science Teaching" (Sadler et al., 2024) describes the potential of AI technology to transform science education but notes that "the science education research community is not as far along as it needs to be in terms of understanding, theorizing, and studying the intersections of AI and science education" (p. 742). In response, this commentary presents our theorization and conceptualization of AI in science education. We apply the lens of social constructivism (Wenger, 1998) to this question and argue that the nature of generative AI allows it to transcend an instrumental role and achieve full participation in a CoP.
We are convinced that socio-constructivist theory in general, and CoP specifically, can provide conceptual tools and theoretical underpinnings to guide the use of AI in education. In this commentary, we synthesize ideas from current literature to construct a theoretical framework and offer suggestions for the transformative use of generative AI.

Publisher: Journal of Engineering Education

ABSTRACT

Background

We examine the effects of an online collaborative problem-solving (CPS) teaching approach on academic performance and on connections with peers among first-year engineering calculus students at a Latin American university. Our research uses the communities of practice (CoP) framework to emphasize the social nature of learning and the importance of participation and interaction within a community.

Methods

The work applies a quasi-experimental design and social network analysis (SNA). Over one semester of the first calculus course for engineers, 202 students were instructed using the CPS methodology (experimental group), while 380 students received traditional online instruction (control group).

Results

Results show no significant difference in grades between the experimental and control groups. However, students exposed to CPS had a statistically significantly higher passing rate, as well as larger and stronger academic and social connections. Additionally, SNA results suggest that CPS facilitated stronger peer connections and promoted a more equitable distribution of participation among students, particularly women, compared with traditional online teaching methods.

Conclusions

The study underscores the importance of fostering collaborative learning environments and highlights CPS as a strategy to enhance student performance and network formation. Findings suggest that CPS can improve academic outcomes and promote more equitable learning practices, potentially reducing dropout rates among women engineering students. These findings contribute to the ongoing efforts to address systematic biases and enhance learning experiences in engineering education.

Deep neural networks (DNNs) struggle with systematic generalization (SG). Several studies have evaluated the possibility of promoting SG by proposing novel architectures, loss functions, or training methodologies. Few studies, however, have focused on the role of training data properties in promoting SG. In this work, we investigate the impact of certain distributional properties of the data, as inductive biases, on the SG ability of a multi-modal language model. To this end, we study three properties. First, data diversity, instantiated as an increase in the number of values a latent property may take in the training distribution. Second, burstiness, where we probabilistically restrict the number of possible values of latent factors in particular inputs during training. Third, latent intervention, where a particular latent factor is altered randomly during training. We find that all three factors significantly enhance SG, with diversity contributing an 89% absolute increase in accuracy for the most affected property. Through a series of experiments, we test various hypotheses to understand why these properties promote SG. Finally, we find that the Normalized Mutual Information (NMI) between latent attributes in the training distribution is strongly predictive of out-of-distribution generalization. One mechanism by which lower NMI induces SG lies in the geometry of representations: in particular, lower NMI induces more parallelism in the model's neural representations (i.e., input features coded along parallel neural vectors), a property related to the capacity for reasoning by analogy.
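For readers unfamiliar with the NMI measure referenced above, the following is a minimal, self-contained sketch (not the authors' code) of estimating normalized mutual information between two discrete latent attributes from paired samples, using the common normalization I(X;Y) / sqrt(H(X) H(Y)):

```python
from collections import Counter
from math import log, sqrt

def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_information(xs, ys):
    """I(X;Y) estimated from paired samples of two discrete attributes."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x,y) * log( p(x,y) / (p(x) p(y)) )
        mi += p_xy * log(p_xy * n * n / (px[x] * py[y]))
    return mi

def nmi(xs, ys):
    """Normalized mutual information: I(X;Y) / sqrt(H(X) H(Y))."""
    hx, hy = entropy(xs), entropy(ys)
    if hx == 0 or hy == 0:
        return 0.0
    return mutual_information(xs, ys) / sqrt(hx * hy)
```

Perfectly correlated attributes yield NMI = 1, while independent attributes yield NMI = 0; in the abstract's terms, lower NMI between latent attributes corresponds to weaker statistical coupling in the training distribution.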

Delle Rose et al. (COLT'23) introduced an effective version of the Vapnik-Chervonenkis dimension and showed that it characterizes improper PAC learning with total computable learners. In this paper, we introduce and study a similar effectivization of the notion of Littlestone dimension. Finite effective Littlestone dimension is a necessary condition for computable online learning but not a sufficient one, which we establish already for classes of effective Littlestone dimension 2. However, the effective Littlestone dimension equals the optimal mistake bound for computable learners in two special cases: (a) for classes of Littlestone dimension 1, and (b) when the learner receives as additional information a bound on the numbers to be guessed. Interestingly, finite effective Littlestone dimension also guarantees that the class consists only of computable functions.

Keywords: online learning, Littlestone dimension, computable machine learning

Despite the wide use of k-nearest neighbors as classification models, their explainability properties remain poorly understood from a theoretical perspective. While nearest neighbor classifiers offer interpretability from a "data perspective," in which the classification of an input vector x is explained by identifying the vectors v1, ..., vk in the training set that determine the classification of x, we argue that such explanations can be impractical in high-dimensional applications, where each vector has hundreds or thousands of features and their relative importance is unclear. Hence, we focus on understanding nearest neighbor classifications through a "feature perspective," in which the goal is to identify how the values of the features in x affect its classification. Concretely, we study abductive explanations such as "minimum sufficient reasons," which correspond to sets of features in x that are enough to guarantee its classification, and counterfactual explanations based on the minimum-distance feature changes one would have to perform in x to change its classification. We present a detailed landscape of positive and negative complexity results for counterfactual and abductive explanations, distinguishing between discrete and continuous feature spaces, and considering the impact of the choice of distance function involved. Finally, we show that despite some negative complexity results, Integer Quadratic Programming and SAT solving allow explanations to be computed in practice.
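To make the counterfactual notion concrete, here is a hypothetical brute-force sketch (exponential in the number of features and intended for illustration only, not the paper's actual algorithm) that finds a minimum Hamming-distance counterfactual for a k-NN classifier over binary features:

```python
from itertools import combinations

def knn_predict(x, train, labels, k=1):
    """k-NN prediction under Hamming distance, with a simple majority vote."""
    order = sorted(range(len(train)),
                   key=lambda i: sum(a != b for a, b in zip(x, train[i])))
    top = [labels[i] for i in order[:k]]
    return max(set(top), key=top.count)

def min_counterfactual(x, train, labels, k=1):
    """Smallest set of feature flips in binary vector x that changes the prediction.

    Enumerates flip sets by increasing size, so the first hit is minimum-distance.
    """
    base = knn_predict(x, train, labels, k)
    for size in range(1, len(x) + 1):
        for idxs in combinations(range(len(x)), size):
            y = list(x)
            for i in idxs:
                y[i] = 1 - y[i]
            if knn_predict(tuple(y), train, labels, k) != base:
                return idxs  # indices of the features to flip
    return None  # prediction cannot be changed
```

The paper's complexity results explain why such naive enumeration is the wrong tool at scale, and why Integer Quadratic Programming and SAT encodings are used instead.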

Hate speech detection is vital for creating safe online environments, as harmful content can drive social polarization. This study explores the impact of enriching text with intent and group tags on machine performance and human moderation workflows. For machine performance, we enriched text with intent and group tags to train hate speech classifiers. Intent tags were the most effective, achieving state-of-the-art F1-score improvements on the IHC, SBIC, and DH datasets. Cross-dataset evaluations further demonstrated the superior generalization of intent-tagged models compared to other pre-trained approaches. Through a user study (N = 100), we evaluated seven moderation settings, including intent tags, group tags, model probabilities, and randomized counterparts. Intent annotations significantly improved moderator accuracy, allowing moderators to outperform machine classifiers by 12.9%. Moderators also rated intent tags as the most useful explanation tool, with a 41% increase in perceived helpfulness over the control group. Our findings demonstrate that intent-based annotations enhance both machine classification performance and human moderation workflows.

Fine-tuning foundation models is a key step in adapting them to a particular task. For Geospatial Foundation Models (GFMs), fine-tuning can be particularly challenging given data scarcity, both in the amount of labeled data and, for Satellite Image Time Series (SITS), in temporal context. Under these circumstances, the optimal GFM fine-tuning strategy across different labeled-data regimes remains poorly understood. In this paper, we thoroughly assess the performance of two different GFMs under several combinations of two data-scarcity factors: the number of labeled samples and the sequence length. Specifically, we analyze performance on a crop classification task: semantic segmentation of the Sentinel-2 images contained in the PASTIS-HD dataset. We compare the GFMs to U-TAE, a fully supervised baseline, across varying amounts of labeled data (1%, 10%, 50%, 100%) and temporal input lengths (1, 6, 15, 25, and 35). We find that using a smaller learning rate for the pre-trained encoder improves performance in moderate and high data regimes (50%-100%), whereas full fine-tuning outperforms partial fine-tuning in very low-label settings (1%-10%). This behavior suggests a nuanced trade-off between feature reuse and adaptation that defies the intuition of standard transfer learning.

Automatic Short Answer Grading (ASAG) refers to the automated scoring of open-ended textual responses to specific questions, both in natural language form. In this paper, we propose a method to tackle this task in a setting where annotated data is unavailable. Crucially, our method is competitive with the state of the art while being lighter and interpretable. We crafted a unique dataset containing a highly diverse set of questions and a small number of answers to each, making it more challenging than previous tasks. Our method uses weak labels generated by other methods proven effective on this task, which are then used to train a white-box (linear) regression over a few interpretable features. These features combine expert-crafted features and learned representations that are interpretable per se and aligned with manual labeling. We show the potential of our method by evaluating it on a small annotated portion of the dataset, demonstrating that it compares with strong baselines and state-of-the-art methods, including an LLM that, in contrast to our method, comes with a high computational price and an opaque reasoning process. We further validated our model on a public English Automatic Essay Scoring dataset, obtaining competitive results against other unsupervised baselines and outperforming the LLM. To gain further insight into our method, we conducted an interpretability analysis revealing sparse weights in our linear regression model and alignment between our features and human ratings.
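As a minimal, hypothetical sketch of the kind of pipeline the abstract describes (all grader and feature names below are illustrative assumptions, not the paper's actual components): heuristic graders produce weak labels, an interpretable feature is extracted, and a white-box linear model is fit by ordinary least squares:

```python
def token_overlap(answer, reference):
    """Interpretable feature: fraction of reference tokens present in the answer."""
    ref, ans = set(reference.lower().split()), set(answer.lower().split())
    return len(ref & ans) / len(ref) if ref else 0.0

def weak_label(answer, graders):
    """Weak label: average score assigned by heuristic (unsupervised) graders."""
    return sum(g(answer) for g in graders) / len(graders)

def fit_ols(xs, ys):
    """Closed-form simple linear regression y ~ a*x + b (a white-box model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Hypothetical weak graders: a length heuristic and a keyword-overlap heuristic.
reference = "photosynthesis converts light energy into chemical energy"
graders = [
    lambda ans: min(len(ans.split()) / 8.0, 1.0),   # length-based score
    lambda ans: token_overlap(ans, reference),      # keyword-based score
]
answers = [
    "photosynthesis converts light into chemical energy",
    "plants grow",
    "light energy becomes chemical energy via photosynthesis",
]
xs = [token_overlap(a, reference) for a in answers]
ys = [weak_label(a, graders) for a in answers]
a, b = fit_ols(xs, ys)
score = a * token_overlap("chemical energy from light", reference) + b
```

Because the final model is a linear function of a handful of named features, its weights can be inspected directly, which is the interpretability property the abstract emphasizes.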

Agencia Nacional de Investigación y Desarrollo
Edificio de Innovación UC, Piso 2
Vicuña Mackenna 4860
Macul, Chile