A key question about large language models (LLMs) is whether they solve reasoning tasks by learning transferable algorithms or simply by memorizing training data. This distinction matters: while memorization might handle ...
The machine learning community has made considerable progress in evaluating LLMs for their mathematical reasoning abilities, especially their handling of complex arithmetic and deductive reasoning ...
These subtests encompass general science (GS), arithmetic reasoning (AR), word knowledge (WK), paragraph comprehension (PC), mathematics knowledge (MK), electronics information (EI), auto and shop ...
To this end, we focus in this paper on two popular reasoning tasks: arithmetic reasoning and code generation. In particular, we introduce (i) a general ontology of perturbations for math and coding ...
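To make the notion of a perturbation concrete, the following is a minimal Python sketch of one possible operator: substituting the operands of a GSM8K-style word problem while leaving its surface form, and hence the required reasoning steps, unchanged. The problem text, the operator, and the helper name are illustrative assumptions, not the ontology defined in this paper.

import re
import random

def perturb_operands(problem: str, seed: int = 0) -> str:
    """Replace each integer in an arithmetic word problem with a nearby value,
    keeping the wording (and thus the reasoning structure) intact.
    This is an illustrative operand-substitution perturbation, not the
    paper's actual operator set."""
    rng = random.Random(seed)

    def swap(match: re.Match) -> str:
        value = int(match.group())
        # Shift the operand by a small random offset so the gold answer changes.
        return str(value + rng.randint(1, 9))

    return re.sub(r"\d+", swap, problem)

# Example: a model that memorized the original instance cannot simply
# recall its answer for the perturbed variant.
original = "Sam has 12 apples and buys 7 more. How many apples does Sam have?"
print(perturb_operands(original))

An analogous operator for code-generation prompts could, under the same hedged assumptions, rename variables or swap arithmetic operators in the problem statement, probing whether the model's solution tracks the changed specification rather than a memorized template.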