Article

Gender Similarities in the Mathematical Performance of Early School-Age Children

1 Department of Research and Psychology in Education, Faculty of Education, Universidad Complutense de Madrid, 28040 Madrid, Spain
2 Department of Research and Psychology in Education, Faculty of Psychology, Universidad Complutense de Madrid, 28223 Madrid, Spain
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(17), 3094; https://doi.org/10.3390/math10173094
Submission received: 15 July 2022 / Revised: 19 August 2022 / Accepted: 25 August 2022 / Published: 28 August 2022

Abstract

The role of gender in mathematical abilities has interested researchers for several decades; however, the findings remain inconclusive. Recently, the need to explore its influence on the development of foundational mathematical skills has been highlighted. Thus, the current study examined whether gender differentially affects young children's performance in several basic numeracy skills, using a complex, developmentally appropriate assessment that included not only standard curriculum-based measures but also a non-routine task requiring abstract thinking. A total of 136 children (68 girls) aged 6 to 8 years completed: (a) the third edition of the standardized Test of Early Mathematical Ability (TEMA-3) to measure their mathematical knowledge; (b) the Kaufman Brief Intelligence Test (K-BIT); and (c) a non-routine counting detection task in which children watched several characters performing different counts, judged their correctness, and justified their answers. Frequentist and Bayesian analyses were combined to quantify the evidence for the null (gender similarities) and the alternative (gender differences) hypotheses. Overall, the results indicated that gender differences were negligible or absent in most of the measures used, including children's performance in the non-routine counting task. These findings support the gender similarity hypothesis for the basic numerical skills assessed.

1. Introduction

Gender differences in mathematical performance have attracted the attention of evolutionary, educational, and social scientists. Over many decades of research, a great deal of information has been collected on the existence, non-existence, or triviality of such differences between males and females.
These conflicting conclusions have been drawn from a wide variety of results. Given the widespread interest in gender differences and the volume of research on the topic, we will refer only briefly to some findings. For example, there is evidence of gender differences favoring males in mathematical performance on standardized tests, as well as evidence that these differences have narrowed over successive measurements, e.g., [1,2,3]. Equally frequent are findings of better performance by boys on tasks involving spatial reasoning, and of boys' preference for mental strategies, both of which have been related to higher achievement in mathematics, e.g., [4,5]. However, recent empirical evidence also suggests that spatial and mathematical performance are not associated: in some studies, boys scored higher than girls on spatial measures, but no gender differences in mathematical performance were evident, e.g., [5,6]; in others, girls scored higher than boys on spatial measures, but boys had higher mathematical performance [7]; and in still others, no gender differences were found, except for a small difference favoring boys that was restricted to kindergarten children [2].
Studies on the strategies that children use to solve addition and subtraction problems have also yielded mixed results. In contrast to Fennema et al. [8], Ginsburg and Pappas [9] reported no gender differences in strategy use among 4- and 5-year-old children. More recently, some authors have reported substantial gender differences in the use of addition strategies and in their development. They maintain that, in general, girls use counting strategies more frequently than boys, and that boys show an earlier and stronger preference for derived fact strategies, e.g., [1,10,11]. However, Dowker [12] found no gender differences when she administered her test of derived facts in addition, which covers several principles (i.e., Identity, Commutativity, Addition+1, Addition-1, and Inverse Addition/Subtraction), to 6- and 7-year-old children. Furthermore, unlike most previous work, Dowker adjusted the difficulty of the problems to each child's arithmetic level. This avoided problems so simple that children would opt to calculate or retrieve the answer without applying the principles underlying derived facts or, conversely, problems so difficult that children would make wild guesses or simply not attempt them, see [13]. Based on Dowker's findings, it is possible to infer that gender differences could stem from differences in the way girls and boys make adaptive choices between strategies [14], which in turn may be strongly influenced by gender differences in children's tendency to take risks, which is more pronounced in boys than in girls, particularly in social situations [1,10]. Finally, Shen et al. [15] found gender differences among 6- to 7-year-old children from the United States and Russia, but not among children from Taiwan. Their results thus illustrate the importance of educational and cultural factors.
The role of emotional and sociocultural factors in explaining gender differences in mathematics achievement is widely accepted. Numerous investigations have shown the ways in which these factors hinder mathematics-related performance, attitudes, and preferences of girls and women, e.g., [16,17,18,19,20,21]. Given the volume of research on this topic, we will refer only to some works on the link between math anxiety and math performance in young children.
According to Levine and Pantoja [19], studies of gender differences in the relation between math attitudes and math achievement in young children are less common than those with adolescents and adults, and they also provide conflicting evidence. Van Mier et al. [22] found that the relationship between math performance and math anxiety was stronger in girls than in boys in second and fourth grades, with math anxiety being negatively linked to math performance in girls. In contrast, the study by Cvencek et al. [23] with first- and fifth-grade students indicated that, despite no gender differences in math grades, implicit math attitudes predicted math grades only for boys. The authors attributed the absence of this relationship in girls to an implicit negativity toward math that, in their view, originates in math anxiety and gender stereotypes. In addition, Erturan and Jansen [24] found no gender differences in the strength of the relation between math anxiety and math achievement in grades three to six. Dowker et al. [25] also found no consistent relationships between math anxiety and math performance in third- and fifth-grade students. These latter findings agree with the results of the meta-analysis by Zhang et al. [26], which covered research with students from elementary school to adulthood.
Even though these three factors (spatial skills, solution strategy choices, and math attitudes) have been considered critical areas in the study of the role of gender in mathematics achievement [1], the lack of agreement in the empirical evidence has recently led some researchers to call for investigating core skills, or basic numerical processing, in young children, e.g., [27,28,29,30]. Exploring the developmental origins of the effect of gender on math performance during preschool and elementary school may refine the current understanding of the phenomenon. This topic is discussed further in the next section.

1.1. Gender and Basic Numerical Skills in Childhood

Basic numerical skills constitute the foundation for more complex mathematical abilities and higher-level mathematics learning [31,32,33,34]. The term covers a wide range of crucial abilities related to non-symbolic and symbolic numerical magnitude representation, counting skills, understanding of mathematical relations, and arithmetical principles [35].
The systematic study of gender differences in core numerical skills during early childhood may be relevant in several ways [34]. First, it could clarify whether gender differences exist in these foundational competences and, if so, how they might influence not only their own development but also the acquisition of more advanced mathematical concepts across primary school in different educational systems. Second, it could provide empirical evidence against widespread false beliefs about a "male advantage in math performance", which have proven very detrimental to girls' and women's learning and career aspirations, e.g., [36,37].
Among the studies that have explored gender differences in numerical tasks at young ages, Jordan et al. [38] reported small gender effects in a group of kindergarteners, in which boys achieved higher total scores than girls on a battery of number sense tasks. Johnson et al. [2] also observed better performance by boys than girls on a battery of mathematics tasks administered to kindergarteners and third and sixth graders. However, this effect was not consistent across the measures taken in kindergarten in the first study, and it disappeared in older children in the second study. Moreover, other studies of basic numeracy skills obtained different results, showing no consistent gender differences. For instance, in the longitudinal study by Lubienski et al. [39], there were no gender differences at the beginning of kindergarten, but boys outperformed girls at the end. These gender differences were most pronounced in third and fifth grades but became less pronounced between grades five and eight. Similarly, Gibbs [40], using subset measures taken at six time points from the end of kindergarten to fifth grade, found no consistent gender differences: girls outperformed boys in basic math skills, whereas boys outperformed girls in fundamental math skills.
In any case, inconsistency is the general trend in these findings, which run counter to the belief in a broad male advantage in these skills. Rather, it seems that gender differences in children's basic numerical skills emerge in very specific abilities (favoring either boys or girls depending on the ability evaluated), in line with the gender similarities hypothesis [18,41]. Based on this approach, some recent studies have examined cross-sectional gender differences across a wide range of tasks. For example, Hutchison et al. [28] analyzed the performance of young children on two arithmetic and eight basic numerical tasks, whereas Bakker et al. [27] examined the responses of the kindergarteners they interviewed to eight numerical tasks appropriate for their level. The tasks presented by Bakker et al. [27] overlapped mainly with the numbering and number comparison components of informal mathematics in the TEMA-3, and with some conventions of formal mathematics. Finally, Kersey et al. [29] analyzed several milestones of numerical development that were also addressed by the two previous studies, such as counting skills and the elementary mathematical concepts (informal and formal) covered by the TEMA-3. The analytic approaches of these three investigations were also congruent: the authors supplemented frequentist comparisons of girls' and boys' performance with either Bayesian analyses, to quantify the evidence for gender disparities versus the evidence for gender equality [27,28], or with equivalence analyses [29].
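To make the idea of weighing evidence for gender equality against gender disparity concrete, the sketch below computes an approximate Bayes factor BF01 (evidence for equal group means relative to a difference) from an independent-samples t statistic. It assumes the BIC approximation BF01 ≈ √N (1 + t²/df)^(−N/2), with N = n1 + n2 and df = N − 2; this is an illustrative stand-in, not necessarily the default-prior Bayes factor used in the cited studies. The function names and the scores are hypothetical.

```python
import math
import statistics


def pooled_t(a: list[float], b: list[float]) -> float:
    """Independent-samples t statistic with pooled variance."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / n1 + 1 / n2))


def bf01_bic(t: float, n1: int, n2: int) -> float:
    """BIC approximation to BF01 (null: equal means) for a two-group t test.

    Values above 1 favor the null (similarity); values below 1 favor a difference.
    """
    n = n1 + n2
    df = n - 2
    return math.sqrt(n) * (1 + t * t / df) ** (-n / 2)


# Hypothetical scores for illustration only (not data from any study).
girls = [98, 102, 105, 110, 95, 101]
boys = [100, 99, 104, 108, 97, 103]
t = pooled_t(girls, boys)
print(round(t, 3), round(bf01_bic(t, len(girls), len(boys)), 2))
```

With t = 0 and 68 children per group, the approximation gives BF01 = √136 ≈ 11.7, i.e., moderate-to-strong evidence for similarity; the value shrinks rapidly as |t| grows.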
On the whole, all these studies agree with the gender similarity hypothesis [17,18,41], suggesting that gender differences in basic numerical skills are, in the words of Hutchison and colleagues [28], more the exception than the general rule. Although these studies employed a wide range of measures extensively used in previous research [27], most of these measures are curriculum-based. In fact, to the authors' knowledge, the role of gender in young children's performance on unfamiliar tasks or tasks with non-routine content has scarcely been explored so far. Non-routine tasks require thinking in novel and abstract ways. To solve them correctly, children need to go beyond "routine expertise" [42], that is, beyond simply replicating rote-learned strategies, patterns, or procedures, and instead apply their prior knowledge flexibly. Therefore, studying gender differences in basic numerical skills with non-routine measures that demand complex reasoning may enrich the existing evidence about the influence of gender on children's math performance.
In that sense, a task that has proven well suited to studying counting skills (one of the core numerical abilities) with non-routine content is the detection task involving erroneous counts and unconventional but correct counts [43,44,45,46].

1.2. Children’s Detection of Non-Routine Counts

Counting is one of the first numerical skills that children engage in, and, after decades of research, it is broadly assumed to provide a foundation for arithmetical development as well as for later mathematical achievement at school [47,48,49]. Empirical evidence supports the view that meaningful counting goes well beyond the enumeration of collections [50]. Indeed, it is a complex cognitive process in which conceptual advances depend on understanding the distinct traits of logical and conventional counting rules [51].
On the one hand, logical counting rules are essential in nature. They comprise the how-to-count principles defined by Gelman and Gallistel [52]: (a) one-to-one correspondence (assigning a unique tag to each element); (b) stable order (using a stable list of ordered and unique tags); and (c) the cardinal principle (the last tag used represents not only the last item counted but also the cardinality of the whole set). Because these rules are mandatory, violating them inevitably leads to incorrect counts.
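The three how-to-count principles can be expressed as checks on a recorded count. The sketch below is a hypothetical encoding, not part of the original task: a count is a list of (object_index, tag) pairs, objects are numbered 0 to n − 1, and tags are integers standing for number words. It simplifies the stable-order principle by requiring the conventional list 1, 2, 3, ..., whereas the principle itself only requires a stable, ordered list.

```python
def is_correct_count(count, n_objects, stated_cardinal=None):
    """Check a count against the three how-to-count principles.

    count: list of (object_index, tag) pairs in the order they were produced
           (hypothetical encoding; tags are integers standing for number words).
    stated_cardinal: the number given as the answer to "how many?", if any.
    """
    objects = [obj for obj, _ in count]
    tags = [tag for _, tag in count]
    # (a) One-to-one correspondence: every object is tagged exactly once.
    one_to_one = sorted(objects) == list(range(n_objects))
    # (b) Stable order, simplified here to the conventional list 1, 2, 3, ...
    stable_order = tags == list(range(1, len(tags) + 1))
    # (c) Cardinal principle: the stated answer must equal the last tag.
    last_tag = tags[-1] if tags else 0
    cardinal = stated_cardinal is None or stated_cardinal == last_tag
    return one_to_one and stable_order and cardinal


print(is_correct_count([(0, 1), (1, 2), (2, 3)], 3))          # conventional count
print(is_correct_count([(0, 1), (2, 2)], 3))                  # skips object 1
print(is_correct_count([(2, 1), (0, 2), (1, 3)], 3, 3))       # unusual order, still correct
```

The last call illustrates why object order does not enter these checks: the logical rules constrain which objects are tagged and how tags are used, not the sequence in which objects are visited.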
On the other hand, conventional rules are optional and modifiable, as they are recommendations based on social customs. The direction of counting is an example, since it tends to vary with the direction of reading and writing [53]. Besides left-to-right counting, the following conventional counting rules have been posited: pointing to the elements, starting from one end of the row, spatial adjacency (consecutive counting of adjacent objects) [43], and, more recently, temporal adjacency (saying all the number words aloud consecutively), as well as the combination of spatial and temporal adjacency [44]. Unlike logical rules, conventional rules can be broken without affecting the correctness of the count.
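The spatial conventional rules can be checked from the order in which the objects of a row are visited. In the sketch below, which is a hypothetical illustration rather than the authors' coding scheme, `order` lists object positions (0 = leftmost) in the order they were tagged. Pointing and temporal adjacency are not modeled, because positions alone do not capture them. A non-empty result for a count that still tags every object once is what makes it unconventional without making it wrong.

```python
def conventional_violations(order, n_objects):
    """List the spatial conventional rules broken by a counting order.

    order: object positions (0 = leftmost) in the order they were tagged.
    Pointing and temporal adjacency are not modeled in this sketch.
    """
    violations = []
    if order and order[0] not in (0, n_objects - 1):
        violations.append("does not start at an end of the row")
    steps = list(zip(order, order[1:]))
    if any(b < a for a, b in steps):
        violations.append("not left-to-right")
    if any(abs(b - a) != 1 for a, b in steps):
        violations.append("breaks spatial adjacency")
    return violations


print(conventional_violations([0, 1, 2, 3], 4))   # conventional count
print(conventional_violations([3, 2, 1, 0], 4))   # right-to-left
print(conventional_violations([1, 2, 3, 0], 4))   # starts inside the row
```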
Although conventional counting rules facilitate the acquisition of the procedure in its early stages [43], children need to recognize the differences between conventional and logical rules (i.e., the arbitrary nature of the former in contrast to the essential nature of the latter) to fully understand counting. According to empirical evidence, this distinction is difficult even for 8- to 9-year-old children, who still fail to recognize the arbitrariness of conventional rules [44,45,46,51]. All of these studies used the detection paradigm, in which children watch a character perform different kinds of counts and are asked to evaluate their correctness, to investigate children's knowledge of logical and conventional counting rules.
Two kinds of counting trials are usually presented in the detection task: familiar (routine) and unfamiliar (non-routine). Trials with routine content resemble young children's typical counting performance. Specifically, they consist either of correct conventional counts (counting all items consecutively from left to right) or of erroneous counts that break logical rules (e.g., the one-to-one correspondence principle, by skipping an element). Empirical findings report high success rates in detecting routine counts even in preschool and kindergarten (in general, accuracy was above 70% for violations of the how-to-count principles [43,45,46,51]).
The situation differs for non-routine trials (pseudoerrors and compensation errors), because children have seldom seen or tried these counting strategies. Furthermore, these trials pose extra cognitive demands, since correct answers depend on children's ability to go beyond the superficial aspects of counting performance, e.g., [51]. Children need to focus on the conceptual or abstract aspects of counting rules; in other words, they must distinguish between the essential nature of logical rules and the optional nature of conventional ones.
The first type of non-conventional counting trial, the pseudoerror, has been widely used in the study of counting comprehension. Pseudoerrors are unconventional but correct counts, since they comply with logical rules but not with conventional ones. As stated above, prior research indicates that children have serious difficulties in detecting these unconventional counts. When they do not recognize the arbitrariness of conventional counting rules, primary school children wrongly judge pseudoerrors to be invalid counting procedures [44,46,51]. Despite this general misconception about the nature of conventional counting rules during elementary school, some characteristics of the detection task may help children better understand their modifiable role, such as explicitly stating the cardinal value of the set after the unusual count [44] or hearing a unanimous majority of teachers justify the correctness of pseudoerrors [51].
Lastly, compensation errors place performance demands similar to those of pseudoerrors. In a compensation error, the logical rules are violated twice, for example, by skipping one item and then double counting another [45]. Even though the second violation cancels out the effect of the first, so that the final tag coincides with the cardinal value of the set, these counts are conceptually incorrect because the logical rules have been broken twice. Unlike pseudoerrors, compensation errors have scarcely been used in the systematic study of children's understanding of counting rules, but Lago et al. [51] recently reported that children aged 5 to 8 years accurately detected 72.4% of compensation errors as incorrect counts.
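The four trial types described above differ in whether logical rules, conventional rules, or neither are broken, and in whether the final tag matches the set size. The classifier below is a hypothetical sketch of that taxonomy, not the scoring procedure of the study. It assumes each object position (0 = leftmost) in `order` receives the next number word, so the final tag equals the number of tags produced, and it treats only a complete left-to-right pass as conventional.

```python
from collections import Counter


def classify_trial(order, n_objects):
    """Classify a counting trial from the order in which objects were tagged.

    order: object positions (0 = leftmost), one entry per number word said;
           number words are assumed to be produced as 1, 2, 3, ... (hypothetical).
    """
    tally = Counter(order)
    skipped = [i for i in range(n_objects) if tally[i] == 0]
    doubled = [i for i in range(n_objects) if tally[i] > 1]
    final_tag = len(order)  # the last number word said
    if not skipped and not doubled:
        # Logical rules respected: conventional count or pseudoerror.
        conventional = order == list(range(n_objects))
        return "correct conventional" if conventional else "pseudoerror"
    if final_tag == n_objects:
        # Logical rules broken, but the violations cancel out in the final tag.
        return "compensation error"
    return "error"


print(classify_trial([0, 1, 2, 3, 4], 5))   # correct conventional
print(classify_trial([4, 3, 2, 1, 0], 5))   # pseudoerror (right-to-left)
print(classify_trial([0, 1, 1, 3, 4], 5))   # skip item 2, double count item 1
```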
In accordance with the empirical evidence from the counting detection paradigm mentioned above, children find it easier to identify the essential nature of logical rules than the optional role of conventional ones. Furthermore, children's comprehension of conventional counting rules develops slowly during primary school. However, gender differences in the development of the comprehension of counting rules have received little attention, although there are some exceptions. For example, Lachance and Mazzocco [54] analyzed the effect of gender on a detection task with routine counting trials (erroneous and conventional correct counts) and found no differences between girls' and boys' performance in grades one and two. Along the same lines, LeFevre et al. [46] also explored the role of gender in children's detection of erroneous counts and pseudoerrors (in the terminology adopted in the current research, counting trials with routine and non-routine content). They found a small gender by trial type interaction: girls were better than boys at correctly identifying some pseudoerrors as valid counts, whereas boys were better than girls at detecting erroneous counts as incorrect. As the effect size of the interaction was small and logistic regression analyses indicated that gender was not a consistent predictor of children's performance on pseudoerrors, the authors concluded that other factors, such as children's age and mathematical skill, are much more relevant than gender to the development of children's understanding of logical and conventional counting rules. In fact, they did not consider gender again when exploring children's knowledge of counting rules in later studies that employed a similar detection task. Taken together, these results may lead researchers to infer that gender does not affect primary school children's ability to understand both logical and conventional counting rules.
Nevertheless, this assumption needs further evaluation.

1.3. Goals of the Current Study

The aim of the present study is to examine whether gender differentially affects young children's performance in several basic numeracy skills, using a complex assessment adapted to the participants' developmental level. We used the third edition of the standardized Test of Early Mathematical Ability (TEMA-3, [55]) and a non-routine detection task devised to evaluate children's understanding of counting. The TEMA-3 assesses children's early mathematical knowledge, covering both informal and formal content. However, there are several concerns about the use of this kind of instrument in gender differences research. One is the risk that the instrument is gender-biased; the version of the TEMA-3 employed here has been shown not to be [56]. Another is the risk of the so-called gender × content type interaction [40], whereby focusing exclusively on overall standardized scores may overlook aspects that are important for studying gender differences in specific math skills [27,40]. Past research has documented this effect in other standardized math ability tests, finding no gender differences during the primary years but an advantage of males over females at the start of high school [57]. At younger ages, however, this interaction effect has been underexplored [40]. In the present investigation, this risk was addressed by performing the statistical analyses on both the aggregated global standard score and a disaggregated score for each of the eight categories of informal and formal mathematical knowledge. In line with the recent research on basic numerical skills reviewed above, which supported the gender similarities hypothesis, we also expect to find gender similarities across the different measures, regardless of the children's age.
Another novel aspect of the current study is the evaluation of the influence of gender on young children's performance in a challenging task designed to measure their understanding of the basic numerical ability of counting. This task does not overlap with the standardized test, even though the latter also includes counting items, because children have rarely seen counting strategies like those presented in the detection task. Based on previous studies showing children's difficulty in focusing on the conceptual and abstract aspects of counting rules rather than on superficial aspects of counting performance, e.g., [51], we hypothesize that there will be no gender differences in performance on this non-routine task, regardless of children's age. We expect to find similar levels of performance for girls and boys, in accordance with the gender similarities hypothesis, despite evidence of an achievement gap favoring males on more novel and challenging tasks in high school [57] and of better performance by boys than girls when the content assessed is not typically taught in classrooms (see [58] for more information). Likewise, based on the results of previous studies, we expect that pseudoerrors with explicit cardinal values (i.e., the condition with cardinal) will be easier to identify as correct counts than pseudoerrors without explicit cardinal values (i.e., the condition without cardinal), regardless of children's gender.
Finally, the current study also seeks to follow the American Statistical Association (ASA) statement on the interpretation of statistical evidence, according to which interpretation should not be based solely on statistical significance and p-values [59]. For that reason, the expression "statistically significant" has been avoided in this manuscript, and traditional frequentist analyses have been complemented with a measure of the evidence in favor of the alternative hypothesis relative to the null (i.e., the Bayes Factor Bound). This frequentist analysis has been combined with a Bayesian approach, which has proven highly informative for the topic under consideration [27,28].
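The Bayes Factor Bound mentioned above is commonly computed as BFB = 1/(−e·p·ln p) for p < 1/e, and 1 otherwise; it is the largest Bayes factor in favor of the alternative over the null that a given p-value can support. The sketch below assumes this standard formula and is not taken from the authors' analysis scripts.

```python
import math


def bayes_factor_bound(p: float) -> float:
    """Upper bound on the Bayes factor for H1 over H0 implied by a p-value.

    BFB = 1 / (-e * p * ln p) for p < 1/e, and 1 otherwise.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if p >= 1 / math.e:
        return 1.0  # such a p-value cannot favor H1 at all
    return 1.0 / (-math.e * p * math.log(p))


for p in (0.05, 0.01, 0.005):
    print(p, round(bayes_factor_bound(p), 2))
```

For example, p = 0.05 corresponds to a bound of about 2.46, meaning that the data are at most roughly two and a half times more likely under the alternative than under the null. This is why a p-value just below 0.05 is weaker evidence than the conventional threshold suggests.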

2. Method

2.1. Participants

Participants were 136 primary school students from four semi-private first-grade classes (n = 68) and four semi-private second-grade classes (n = 68), all located in the western area of the region of Madrid (Spain). These classes were selected from the 18 available classes that agreed to participate in the study. All students spoke Spanish as their first language, were predominantly Caucasian, and came from upper-middle-class families. None of the students had learning difficulties. Parents provided written informed consent for their children to take part in the study, and all children participated voluntarily. This research was approved by the Deontological Commission of Psychology, Complutense University of Madrid.
In line with the objectives of the current study, participants were distributed into eight groups according to their grade level, their gender, and the condition of the non-routine detection task to which they were randomly assigned (without cardinal value vs. with cardinal value), as follows:
  • First Grade girls—without cardinal value: n = 17; mean age (Mage) of 6 years and 9 months [and a standard deviation (SD) of 3.44 months], age range = 6 years and 2 months to 7 years and 3 months.
  • First Grade girls—with cardinal value: n = 17; Mage = 6 years and 8 months (SD = 3.87 months), age range = 6 years and 2 months to 7 years and 3 months.
  • First Grade boys—without cardinal value: n = 16; Mage = 6 years and 6 months (SD = 3.67 months), age range = 6 years and 3 months to 7 years and 2 months.
  • First Grade boys—with cardinal value: n = 18; Mage = 6 years and 8 months (SD = 3.19 months), age range = 6 years and 4 months to 7 years and 2 months.
  • Second Grade girls—without cardinal value: n = 19; Mage = 7 years and 8 months (SD = 3.80 months), age range = 7 years and 3 months to 8 years and 3 months.
  • Second Grade girls—with cardinal value: n = 15; Mage = 7 years and 7 months (SD = 2.97 months), age range = 7 years and 4 months to 8 years and 2 months.
  • Second Grade boys—without cardinal value: n = 15; Mage = 7 years and 8 months (SD = 3.41 months), age range = 7 years and 3 months to 8 years and 3 months.
  • Second Grade boys—with cardinal value: n = 19; Mage = 7 years and 9 months (SD = 2.99 months), age range = 7 years and 3 months to 8 years and 3 months.

2.2. Materials

2.2.1. Kaufman Brief Intelligence Test (K-BIT)

General intelligence was not a primary variable of interest in the current study; however, it was assessed and included in the analysis as a control or reference measure. The Spanish version of the K-BIT [60,61] was used to assess general intelligence and was administered individually following the standardized procedure. As the starting point was determined by the child's age and administration ended after an established number of failures (set by the examiner's manual [61]), not all participants solved the same number of items. According to the manual, administration takes around 20 min.
The test consisted of two subscales: verbal and nonverbal. The verbal subscale was made up of a total of 82 items and evaluated children's word knowledge, verbal abilities, and deductive reasoning (i.e., crystallized reasoning). It had, in turn, two subsections: vocabulary, with 45 items in which children had to name a picture, and definitions, with the remaining 37 items, in which participants were asked to solve riddles. The nonverbal subscale contained 48 items assessing fluid reasoning. These were matrix items in which children had to recognize relationships, complete visual analogies, and solve problems.
Following the manual guidelines, each item was scored dichotomously (0 for incorrect answers and 1 for correct ones), and three measures were computed: verbal reasoning (from the vocabulary and riddles subsections), nonverbal reasoning (from the matrices subscale), and the composite intelligence (IQ) score. This overall composite score (IQ) was used as the performance measure in the statistical analyses. Its internal consistency has been shown to be high for individuals aged 6 to 8 (Cronbach's alpha values between 0.82 and 0.88 [61]).
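The internal-consistency values cited above are Cronbach's alpha coefficients, α = k/(k − 1) · (1 − Σσ²ᵢ/σ²ₓ), where k is the number of items, σ²ᵢ the variance of item i, and σ²ₓ the variance of total scores. The sketch below applies this formula to a complete matrix of dichotomous (0/1) item scores. This is an illustrative assumption: under the K-BIT's age-based start points and discontinue rules, not every child answers every item, and the reported values come from the manual [61], not from this computation.

```python
import statistics


def cronbach_alpha(scores):
    """Cronbach's alpha for a complete item-score matrix.

    scores: one row per child, one 0/1 column per item (no missing items).
    """
    k = len(scores[0])
    item_vars = [statistics.variance([row[j] for row in scores]) for j in range(k)]
    total_var = statistics.variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 scores for four children on three items.
print(cronbach_alpha([[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]))  # perfectly consistent items
```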

2.2.2. Test of Early Mathematical Ability Third Edition (TEMA-3)

The Spanish adaptation of the TEMA-3 [56] was employed. It is an individually administered norm-referenced test (testing time: 40 min) for children between 3 years and 8 years and 11 months of age. Its internal consistency is high, with Cronbach’s alpha values between 0.91 and 0.95 for the age range examined here [56].
The test consists of 72 items (each of which, in turn, contains several problems). According to the classification in the examiner's manual [56], 41 items assessed four categories of informal mathematical knowledge (i.e., knowledge not acquired through direct instruction), and the remaining 31 assessed four categories of formal mathematical knowledge (i.e., knowledge acquired through school instruction).
On the one hand, the informal mathematical categories were: numbering, with 23 items (e.g., producing the numeral sequence, backwards or forwards, from or up to a given number); number comparisons, with six items (e.g., stating which of two numbers is closer to a target number); calculation, with eight items (e.g., solving word problems with the aid of tokens); and concepts, with four items (e.g., part-whole problems). On the other hand, the four categories of formal mathematical knowledge were: numeral literacy, with eight items (e.g., writing numerals); number facts, with nine items (e.g., retrieving answers to simple addition, subtraction, and multiplication facts); calculation, with nine items (e.g., solving written algorithms); and concepts, with five items (e.g., stating how many hundreds are in a thousand).
Administration followed the standardized procedure: the child's age determined the starting point, and the child's performance determined the stopping point (the assessment stopped when the child failed five consecutive items). Only when children did not give five consecutive correct responses between their starting and stopping points were items before the starting point also administered (until five consecutive correct answers were obtained). Thus, the number and type of items administered varied among participants.
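The start, stop, and backward rules described above can be sketched as a small procedure. This is a simplified illustration under stated assumptions: `answers` is a hypothetical oracle saying whether the child would solve each item, `start` stands for the age-based starting item (which in practice comes from the manual's tables, not reproduced here), and the run length of five follows the text.

```python
def administer(answers, start, run=5):
    """Return the sorted indices of items that would be administered.

    answers: hypothetical oracle; answers[i] == 1 if the child solves item i.
    start: age-based starting item (taken from the manual in practice).
    run: length of the failure run that stops testing and of the success run
         required below the stopping point.
    """
    given = {}
    # Forward from the starting item until `run` consecutive failures.
    misses, i = 0, start
    while i < len(answers) and misses < run:
        given[i] = answers[i]
        misses = misses + 1 if answers[i] == 0 else 0
        i += 1

    def success_run_at(j):
        # True if items j .. j+run-1 were all administered and solved.
        return all(given.get(j + k) == 1 for k in range(run))

    # If no run of `run` consecutive successes exists yet, go back below
    # the starting point until one is obtained or the first item is reached.
    if not any(success_run_at(j) for j in given):
        j = start - 1
        while j >= 0:
            given[j] = answers[j]
            if success_run_at(j):
                break
            j -= 1
    return sorted(given)


print(administer([1] * 10 + [0] * 10, start=5))  # no need to go below the start
```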
Each administered item was scored as 0 (incorrect) or 1 (correct) following the examiner’s manual guidelines [56]. From these raw scores, several performance measures were derived. First, the aggregated age-referenced standard score, the Math Ability Score (MAS), was obtained following the manual’s rules. Second, eight disaggregated scores, one for each category of knowledge, were computed. Because Ginsburg and Baroody [55] did not validate these subdomains, each of these scores was calculated as the percentage of administered items in that category that were solved correctly. Unlike the MAS, the disaggregated scores took into account only the items actually administered (see also [29,62]).
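A minimal sketch of the disaggregated scoring, assuming each child's administered items are stored as an item-to-score mapping. The function `category_percentages` and the item-to-category mapping in the example are hypothetical and do not reproduce the manual's actual item assignment.

```python
from collections import defaultdict
from typing import Dict, Mapping


def category_percentages(scores: Mapping[int, int],
                         category_of: Mapping[int, str]) -> Dict[str, float]:
    """Percentage of administered items solved correctly, per category.

    scores      -- item index -> 0/1 score, for administered items only
    category_of -- item index -> category label (hypothetical mapping)
    Categories with no administered items are omitted, since their
    percentage is undefined.
    """
    administered: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for item, score in scores.items():
        category = category_of[item]
        administered[category] += 1
        correct[category] += score
    return {c: 100.0 * correct[c] / administered[c] for c in administered}


# Hypothetical example: four administered items in two categories.
scores = {1: 1, 2: 0, 3: 1, 4: 1}
category_of = {1: "numbering", 2: "numbering",
               3: "number facts", 4: "number facts"}
print(category_percentages(scores, category_of))
# {'numbering': 50.0, 'number facts': 100.0}
```

Because the denominator is the number of items actually administered, two children who solved the same number of items in a category can obtain different percentages.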

2.2.3. Counting Detection Task

A modified version of the computerized counting detection task “The Little House of Numbers” (Registered Intellectual Property Number: M-002197/2012), which has also been used in previous studies [44,51], was employed. In it, four cartoon characters counted different collections of objects. All the characters were girls, to avoid any spurious effect of the characters’ gender on children’s responses. The characters always counted at the same steady pace, all objects were presented in a row, and set sizes ranged from 7 to 12 elements to prevent subitizing.
The task began with a brief introduction by another character, Rosa: “We are going to play counting things with some girls. I am going to put some things on the table for them to count. You must pay attention and tell me if they have done it right or wrong.” The counting trials were then presented. All of them followed the same structure: a curtain dropped and, as soon as it rose again, one of the girls and a row of objects on a table appeared on the screen. The girl counted by moving her arm downwards to touch each object; each object moved slightly when the character put her finger on it, signaling that it had been counted. In the “without cardinal” condition (participants were randomly assigned to one of the two conditions), the trial ended at this point, whereas in the “with cardinal” condition the character explicitly stated “There are …” after finishing her count. In all cases, the participant was then asked: “Has she done it right or has she done it wrong?” The experimenter also asked the child to justify his or her answer (e.g., “Why?” or “Why do you think that?”) before moving on to the next trial.
Each participant was presented with twelve counting trials of three types: (i) one routine trial, a conventional correct count in which seven elements were counted consecutively from left to right, included as a control; (ii) three non-routine compensation errors (see Table 1 for a full description of each); and (iii) eight non-routine pseudoerrors, which transgressed conventional rules in different ways (see Table 1). The trials were selected on the basis of previous research with samples of similar ages [44,51,54].
Children’s responses were coded as correct (score = 1) when they both judged the count accurately and justified that judgment appropriately; that is, when they “rejected” compensation errors (judging the count as wrong and alluding to the two violations of logical rules) or “accepted” pseudoerrors (judging the count as right and giving an appropriate justification). Any other response was considered incorrect and scored 0. Justifications were categorized following the criteria established in previous research [51]. Two performance measures were calculated for each participant: (a) the percentage of compensation errors correctly detected (i.e., rejected and duly justified), and (b) the percentage of pseudoerrors correctly detected (i.e., accepted and duly justified).
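The scoring rule for the detection task can be expressed as a small function. In this sketch the names `Response`, `score`, and `percent_detected` are hypothetical, and the adequacy of a justification is represented as a boolean already decided by a human coder, since the categorization criteria of [51] are not reproduced here.

```python
from dataclasses import dataclass
from typing import List

# Judgment that counts as accurate for each non-routine trial type.
EXPECTED_JUDGMENT = {"compensation": "wrong", "pseudoerror": "right"}


@dataclass
class Response:
    trial_type: str          # "compensation" or "pseudoerror"
    judgment: str            # the child's verdict: "right" or "wrong"
    justification_ok: bool   # coder's decision (hypothetical stand-in for [51])


def score(response: Response) -> int:
    """1 only if the judgment is accurate AND duly justified, else 0."""
    accurate = response.judgment == EXPECTED_JUDGMENT[response.trial_type]
    return int(accurate and response.justification_ok)


def percent_detected(responses: List[Response], trial_type: str) -> float:
    """Percentage of trials of one type that were correctly detected."""
    trials = [r for r in responses if r.trial_type == trial_type]
    return 100.0 * sum(score(r) for r in trials) / len(trials)


# Hypothetical child: 3 pseudoerrors accepted and duly justified,
# 1 accepted with an inadequate justification, 4 rejected.
child = ([Response("pseudoerror", "right", True)] * 3
         + [Response("pseudoerror", "right", False)]
         + [Response("pseudoerror", "wrong", True)] * 4)
print(percent_detected(child, "pseudoerror"))  # 37.5
```

Requiring both conditions means that an accepted pseudoerror with an inadequate justification scores 0, exactly like a rejection.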
Children’s level of success on the correct conventional count, included as control, was 100% in both grades. On the compensation errors, it reached 81.37% and 86.28% in first grade and second grade, respectively. Due to this ceiling effect, compensation errors were excluded from data analysis.

2.3. Procedure

Participants were assessed individually during school hours in a quiet room near their classrooms. The assessment took place over two sessions separated by an interval of 3 to 5 weeks. The order of the tests was the same for all participants: the TEMA-3 in the first session, and the counting detection task followed by the K-BIT in the second.
In the first session, a female experimenter administered TEMA-3 individually to each child. As stated before, the standard application rules were followed, and it lasted about 40 min.
In the second session, the female experimenter individually interviewed each child with the computerized detection task for about 15 min. A single pseudo-random trial order was used for all participants in both conditions (without and with cardinal value): Pseudoerror 1, Compensation error 1, Pseudoerror 2, Compensation error 2, Pseudoerror 3, Pseudoerror 4, Conventional correct, Pseudoerror 5, Compensation error 3, Pseudoerror 6, Pseudoerror 7, Pseudoerror 8. Participants received no feedback on their performance, to avoid biasing their responses to subsequent trials. All responses were registered manually, and justifications were also recorded for later transcription and categorization. Agreement between two independent coders in categorizing justifications was 93.38% (calculated on 16.66% of all answers), and disagreements were discussed until consensus was reached. Finally, the K-BIT was administered following the manual guidelines and took approximately 20 min per participant.
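The agreement figure above is a simple percentage of matching categorizations among double-coded answers. A minimal sketch follows; the function name and the justification labels are hypothetical, and, unlike Cohen's kappa, this index is not corrected for chance agreement.

```python
from typing import Sequence


def percent_agreement(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Share (%) of double-coded answers given the same category by both coders."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must categorize the same, non-empty set of answers")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)


# Hypothetical labels for four double-coded justifications.
a = ["conventional", "logical", "conventional", "other"]
b = ["conventional", "logical", "logical", "other"]
print(percent_agreement(a, b))  # 75.0
```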

2.4. Data Analysis

Both traditional frequentist and Bayesian analyses were conducted using JASP software [63]. For the frequentist analyses, parametric statistics were employed, since the sample size (n > 30) allowed us to rely on the central limit theorem [64] and median-based Levene’s tests suggested homogeneity of variances. We ran univariate ANOVAs on each performance measure with Grade Level and Gender, as well as Condition (the latter only for the counting detection task), entered as fixed factors. To control for errors arising from multiple comparisons, we used the Dunn-Šidák [65] corrected p-value threshold (see also [27,28,66]). The corrected threshold for the univariate analyses of variance was p = 0.0047, obtained from the formula 1 − (1 − α)^(1/k), where α = 0.05 and k is the number of independent tests run [28,66]. Furthermore, following the ASA statement [59], p-values were supplemented by the Bayes Factor Bound (BFB). The BFB was calculated as 1/(−e·p·ln p), which applies for p < 1/e, and indicates the maximum possible odds in favor of the alternative hypothesis [67,68]. For example, a BFB of 13.9 means that the odds in favor of the alternative hypothesis (gender differences) relative to the null (no gender differences) are at most 13.9 to 1.
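Both quantities in this paragraph are closed-form and can be checked directly. In the sketch below, k = 11 is an assumption: it reproduces the reported threshold of 0.0047 and matches the eleven measures analyzed in Section 3 (IQ, MAS, the eight TEMA-3 categories, and pseudoerror detection). The BFB for p = 0.09 comes out at about 1.70, close to the 1.69 reported in Section 3.1; the small gap is plausibly due to rounding of p.

```python
import math


def sidak_threshold(alpha: float, k: int) -> float:
    """Dunn-Sidak per-test threshold for k independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / k)


def bayes_factor_bound(p: float) -> float:
    """Upper bound on the odds for H1 over H0 implied by a p-value: 1/(-e p ln p)."""
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("the bound is defined for 0 < p < 1/e")
    return 1.0 / (-math.e * p * math.log(p))


print(round(sidak_threshold(0.05, 11), 4))   # 0.0047
print(round(bayes_factor_bound(0.09), 2))    # 1.7
```

The restriction p < 1/e matters because −e·p·ln p reaches 1 at p = 1/e, where the bound would no longer favor the alternative.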
The Bayesian analysis comprised Bayesian ANOVAs with the same dependent variables and fixed factors as the frequentist univariate ANOVAs. The resulting Bayes Factor (BF) is an index of the relative evidence for one hypothesis over another: BF10 quantifies the evidence for H1 (the existence of gender differences) over H0, whereas BF01 = 1/BF10 quantifies the evidence for H0 (the absence of gender differences) over H1. Following recommendations in the literature [69], the JASP default priors were used for the alternative hypothesis [a Cauchy (0, 0.707) distribution] and for the null hypothesis (an effect size of 0). Given the main aim of the current research, the BFs for the main effect of Gender are reported.

3. Results

3.1. Frequentist Analysis

To explore children’s general intelligence, we conducted a 2 (Grade Level: first grade, or second grade) × 2 (Gender: boys, or girls) univariate ANOVA on the K-BIT composite intelligence score (IQ). The univariate analyses of variance were also performed with each K-BIT subscale (vocabulary and matrices) as the performance measure. This did not alter the results: neither the main effects nor the interaction influenced children’s achievement on either subscale. For this reason, we focused only on the IQ score. As can be seen in Table 2 and Table 3, neither grade level, gender, nor their interaction had an effect on the IQ of the children tested, suggesting gender similarities.
Regarding TEMA-3, a series of 2 (Grade Level: first grade, or second grade) × 2 (Gender: boys, or girls) univariate ANOVAs were run. The following achievement indicators were considered as dependent variables: (i) the Math Ability Score (MAS, with M = 100 and SD = 15); (ii) the four categories of items to assess informal mathematics knowledge: (a) numbering, (b) number comparisons, (c) calculation, and (d) concepts, and (iii) the four categories of items to assess formal mathematics knowledge: (a) numeral literacy, (b) number facts, (c) calculation, and (d) concepts. Here we scored the different categories of informal and formal mathematical knowledge in terms of the percentage of items administered and solved correctly for each child, because Ginsburg and Baroody [55] did not validate the subconstructs. Table 2 and Table 3 present children’s mean performance and results from the univariate analyses, respectively.
As expected, because the TEMA-3 is a standardized, age-normed mathematics test, the frequentist analyses showed no generalized differences between the two grade levels. Nevertheless, as shown in Table 3, there was strong evidence that older children correctly solved a larger percentage of items than younger children in the formal knowledge categories of number facts (first graders: M = 29.66, SD = 26.70; second graders: M = 58.96, SD = 22.81) and calculation (first graders: M = 22.03, SD = 22.60; second graders: M = 67.01, SD = 16.83). These grade differences simply show that older children had attained better formal knowledge of mathematics, since these categories cover content commonly taught in the classroom (e.g., subtraction with borrowing, multiplication, the commutative law).
Likewise, there was strong evidence of gender differences, limited to the MAS and to the formal number facts category, with boys outperforming girls. It is important to highlight that, unlike the other categories, the scoring of the number facts items combined accuracy and response time. To explore this gender difference further, we conducted a supplementary analysis of the number facts category in which only accuracy was counted: we calculated the percentage of number facts problems (rather than items) answered correctly. Because the items contained different numbers of problems, the number of problems presented to each child ranged from 4 to 24. Most children completed at least 10 problems (84% of first graders and all second graders).
An independent t-test on the percentage of number facts problems solved correctly, by gender, pointed to gender similarities (t(134) = 1.70, 95% CI [−0.99, 13.34], p = 0.09, BFB = 1.69, d = 0.29, 95% CI for d [−0.047, 0.629]). The mean percentages were M = 59.94 (SD = 20.78) for girls and M = 66.12 (SD = 21.49) for boys. These data suggest that girls scored lower on the number facts items because they tended to be less interested than boys in responding quickly; given the all-or-nothing criterion for scoring these items, girls’ performance may therefore have been underestimated. Binary logistic regressions were then run to ascertain whether gender differences in solving the single-digit addition, subtraction, and multiplication problems of the number facts items were systematic or isolated. Each problem was modeled separately, with gender as the categorical predictor and girls as the reference category. The findings, reported in Appendix A, suggested no systematic gender differences in solving number facts: children’s gender influenced the likelihood of correctly retrieving only three of the twenty-four problems. Specifically:
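The reported t statistic and effect size can be recovered from the group summaries alone. This sketch assumes a pooled-variance (Student's) t-test and groups of 68 boys and 68 girls (the sample described in the abstract, consistent with 134 degrees of freedom); it does not compute the p-value or the confidence intervals, which require the t distribution.

```python
import math


def pooled_t_and_d(m1, sd1, n1, m2, sd2, n2):
    """Student's t (equal variances), Cohen's d and df from group summaries."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    pooled_sd = math.sqrt(pooled_var)
    se_diff = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    diff = m1 - m2
    return diff / se_diff, diff / pooled_sd, n1 + n2 - 2


# Boys vs. girls, percentage of number facts problems solved (means and SDs
# from the text; equal group sizes of 68 are an assumption).
t, d, df = pooled_t_and_d(66.12, 21.49, 68, 59.94, 20.78, 68)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")  # t(134) = 1.70, d = 0.29
```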
- Problem “3 + 4” (first problem of item 47 in the numbering of the Spanish TEMA-3): χ²(123, N = 125) = 5.63, p = 0.02, Nagelkerke’s R² = 0.06; Wald = 5.47, p = 0.02, BFB = 4.82. The model correctly classified 60% of the cases. The odds ratio (OR) showed that the odds of correctly retrieving this number fact were 41.2 higher for boys than for girls (95% CI for OR [19.9, 86.7]) (see Figure 1).
- Problem “8 − 4” (first problem of item 50 in the numbering of the Spanish TEMA-3): χ²(111, N = 113) = 6.62, p = 0.01, Nagelkerke’s R² = 0.09; Wald = 6.08, p = 0.01, BFB = 6.27. The model correctly classified 78.8% of the cases. The OR indicated that the odds of correctly retrieving this number fact were 29.3 higher for boys than for girls (95% CI for OR [11, 77.7]) (see Figure 1).
- Problem “6 + 4” (first problem of item 51 in the numbering of the Spanish TEMA-3): χ²(106, N = 108) = 8, p = 0.01, Nagelkerke’s R² = 0.10; Wald = 7.64, p = 0.01, BFB = 12.5. The model correctly classified 63% of the cases, and the OR revealed that the odds of correctly retrieving this number fact were 46.4 higher for boys than for girls (95% CI for OR [14.6, 72.1]) (see Figure 1).
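When gender is the only (binary) predictor, as in these models, the odds ratio is the exponentiated logistic coefficient and equals the cross-product ratio of the 2 × 2 table of gender by correct retrieval; the Wald statistic is the squared ratio of the log odds ratio to its standard error. The sketch below uses made-up counts purely to show the computation; it does not reproduce the estimates above, which come from the models reported in Appendix A, and `odds_ratio_2x2` is a hypothetical helper.

```python
import math


def odds_ratio_2x2(a, b, c, d, z=1.96):
    """Odds ratio, Wald chi-square and 95% CI for a 2x2 table.

    a, b -- correct / incorrect counts in the focal group (e.g. boys)
    c, d -- correct / incorrect counts in the reference group (e.g. girls)
    With a single binary predictor, exp(B) from a logistic regression equals
    this cross-product ratio, and SE(B) equals the Woolf standard error below.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    wald = (log_or / se) ** 2
    ci = (math.exp(log_or - z * se), math.exp(log_or + z * se))
    return math.exp(log_or), wald, ci


# Hypothetical counts, NOT the study's data: boys 45 correct / 20 incorrect,
# girls 30 correct / 30 incorrect.
or_, wald, (lo, hi) = odds_ratio_2x2(45, 20, 30, 30)
print(f"OR = {or_:.2f}, Wald = {wald:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

An OR above 1 indicates higher odds of correct retrieval for the focal group (boys here), with girls as the reference category.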
As for the performance on the non-routine counting detection task, preliminary analyses indicated that children’s success rate when detecting pseudoerrors was below chance level (M = 35.57%, SD = 27.93): t(135) = −6.03, 95% CI [−19.17, −9.69], p < 0.001, d = 1.27, 95% CI for d [1.08, 1.46]. Then, a 2 (Condition: without cardinal, or with cardinal) × 2 (Grade Level: first grade, or second grade) × 2 (Gender: boys, or girls) univariate ANOVA was run on the percentage of pseudoerrors correctly detected.
The ANOVA on pseudoerror detection suggested gender similarities: only Condition had a strong effect on children’s performance (see Table 4). As expected, children were more likely to consider pseudoerrors valid counts when the cardinal value was explicitly stated after the count (M = 43.66%, SD = 28.17) than when it was absent (M = 27.24%, SD = 25.28). Despite this improvement when the cardinal value was present, children’s overall performance was low, which suggests that their understanding of the optional nature of conventional rules still needed further development.
The analysis of children’s justifications in the detection task supported this view (see Table 5). It revealed that the main reason our participants rejected pseudoerrors (i.e., judged them as incorrect counts) was the violation of conventional rules. This type of explanation, the most frequent overall, emphasized the need to comply with conventional rules when executing the counting procedure. For example, a first-grade boy stated the necessity of proceeding from left to right: “Eli has done it wrong, because she has started in the other side. Like that, you count the wrong way round”. Likewise, another first-grade boy explained that “you can’t begin at the middle. You must start either at this end (the most-left item) or at this other (the most-right item)”. Similarly, participants emphasized different types of adjacency, such as this second-grade girl: “Tina is wrong because she has not counted the ships in order. She has skipped one ship and has counted it at the end” (spatial adjacency); another second-grade girl: “Eva has counted wrong, because she has pointed the moons and she hasn’t said anything. You can’t do that because, like that, you can’t know how many there are” (temporal adjacency); or this first-grade boy: “You can’t repeat the same bird three times. You must say each number only once and then continue with the next bird” (temporal-spatial adjacency). By contrast, when children were accurate and accepted pseudoerrors as valid counts (i.e., judged them as correct), they explained that the violation of conventional rules did not alter the logic of the count, like this first-grade girl referring to the temporal adjacency pseudoerror of counting by twos: “She is right. She has said every other balloon, but she has counted them all in her head”.
Finally, the analysis of justifications also showed that children were able to recall and reproduce the characters’ counts; therefore, their failures cannot be attributed to memory problems.

3.2. Bayesian Analysis

BFs should be viewed as a continuous measure of evidence in support of either the null or the alternative hypothesis [69]. To simplify their interpretation, Jeffreys’ [70] recommendations for BF values were followed here: BF < 3, anecdotal (insufficient) evidence; BF > 3, moderate evidence; BF > 10, strong evidence; BF > 30, very strong evidence; and BF > 100, extreme evidence.
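The classification above can be applied mechanically, using BF01 = 1/BF10 to read the strength of evidence for whichever hypothesis is favored. In this sketch `evidence_label` is a hypothetical helper, and a BF exactly equal to a cutoff is assigned to the lower band, since the text does not specify how boundary values are handled.

```python
def evidence_label(bf10: float):
    """Favored hypothesis and Jeffreys-style strength label for a Bayes factor."""
    if bf10 <= 0:
        raise ValueError("Bayes factors are positive")
    # BF01 = 1 / BF10, so strength is read from whichever ratio is >= 1.
    hypothesis, bf = ("H1", bf10) if bf10 >= 1 else ("H0", 1.0 / bf10)
    for cutoff, label in ((100, "extreme"), (30, "very strong"),
                          (10, "strong"), (3, "moderate")):
        if bf > cutoff:
            return hypothesis, label
    return hypothesis, "anecdotal"


print(evidence_label(52.7))  # ('H1', 'very strong')
print(evidence_label(0.68))  # ('H0', 'anecdotal')
```

With the values reported in this section, BF10 = 52.7 for Condition falls in the very strong band for H1, while BF10 = 0.68 for the number facts problems is only anecdotal evidence for H0.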
Table 6 includes the BFs values for the main effect of Gender. There was anecdotal evidence for both the alternative (gender differences) and the null hypothesis (absence of gender differences) on TEMA-3 categories of informal mathematics calculation, formal mathematics calculation and formal mathematics concepts.
The Bayesian analysis indicated moderate evidence in favor of the null hypothesis and anecdotal evidence supporting the alternative on the K-BIT, on four of the TEMA-3 categories (informal mathematics numbering, informal mathematics number comparison, informal mathematics concepts, and formal mathematics numeral literacy), and on the counting pseudoerror detection task. This pattern of results is in line with the existence of gender similarities on those measures.
Lastly, evidence for gender differences was found on only two measures: the TEMA-3 MAS and the TEMA-3 number facts category. In both cases, the BF indicated very strong evidence supporting the alternative hypothesis and only anecdotal evidence for the null. These results are congruent with the frequentist analysis described above. Accordingly, following the same analytical approach and taking into account that the performance measure in the number facts category combined accuracy and response time, a Bayesian independent t-test was run on the percentage of problems answered correctly, by gender. When the percentage of correct responses was computed by problem (instead of by item), the results seemed to indicate no gender differences. However, the evidence was only anecdotal (BF01 = 1.46; BF10 = 0.68), which did not allow a conclusion to be drawn about the effect of gender on this measure.
The Bayesian ANOVAs also included Grade Level as a main effect. When two main factors are included, JASP builds different models representing the possible additive combinations of those factors. Where the findings suggested that the additive model Grade Level + Gender was the most probable one (as in the TEMA-3 categories of informal mathematics calculation and formal mathematics number facts), the predictive adequacy of the interaction model Grade Level × Gender was also compared against the additive model. The additive model was favored over the interaction model by a factor of 4.36 for informal calculation and 3.71 for formal number facts. As these model comparisons revealed that the Grade Level × Gender interaction was not informative, its BF is not reported in this section.
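One way to obtain such an additive-versus-interaction comparison relies on the transitivity of Bayes factors: if each model's BF against the null model is known, their ratio gives the BF of one model over the other. The function name and the values below are hypothetical, chosen only so that the ratio equals the 4.36 reported for informal calculation; they are not JASP output from this study.

```python
def model_vs_model(bf_m1_vs_null: float, bf_m2_vs_null: float) -> float:
    """BF for model 1 over model 2, given each model's BF against the null model."""
    return bf_m1_vs_null / bf_m2_vs_null


# Hypothetical BFs against the null model (not the study's values).
bf_additive, bf_interaction = 8.72, 2.0
print(model_vs_model(bf_additive, bf_interaction))  # 4.36
```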
Finally, the Bayesian ANOVA on the pseudoerror detection task also included Condition as a main effect. The model containing only Condition was the best-fitting one. The very strong evidence in favor of the alternative hypothesis (BF10 = 52.7; equivalently, BF01 = 0.02) indicated that stating the cardinal value after the count improved children’s performance in detecting counting pseudoerrors.

4. Discussion

It is currently well established that, with age and experience, children build and use increasingly complex problem-solving strategies. The conceptual and procedural importance of early numerical development goes far beyond the first years, as it forms the basis of mathematical knowledge (e.g., [10,13,26,51,71,72]). Moreover, in recent years early numerical development has also provided a suitable framework for studying the emergence of gender differences, thus expanding this field of research. This approach has the advantage of limiting the complex influence of socio-emotional factors, and the disadvantage that numerous heterogeneous tasks and materials are used to measure children’s achievement.
Even though inconsistent findings are a general trend in studies of the explanatory factors for gender differences in mathematical achievement, the overall results of the current research are consistent with recent work [27,28,29] supporting the gender similarities hypothesis. That is, they indicate the irrelevance or non-existence of gender differences in most of the measures used. Specifically, congruent with the frequentist analysis, the Bayesian analysis yielded moderate evidence in favor of the null hypothesis (absence of gender differences) in 7 out of 11 measures (6 out of 10 mathematics measures), strong evidence in favor of the alternative hypothesis (gender differences) in 2 out of 11 measures, and inconclusive evidence in the remaining 2 measures. Moreover, the strong evidence for gender differences appeared in two related measures: the standardized MAS and number facts (one of the categories that make up the TEMA-3).
To examine this finding further, an additional analysis was conducted on the percentage of number facts problems (not items) answered correctly by each child, which revealed no systematic gender differences in the use of strategies across the number facts problems. Our findings only partially agree with those of Shen et al. and Sunde et al. [11,15] and are closer to the results of Bailey et al. [10]. These new data revealed that boys and girls use a variety of strategies to solve number fact problems, in agreement with the results of several studies [10,11,14,73]. This occurred for all three operations, with girls’ and boys’ choices being similar for most of the 24 problems, a result that resembles the absence of gender differences in the use of retrieval reported by Bailey and collaborators [10] for their two younger groups of first and second graders. Girls resorted to counting-based strategies more often than boys, but gender had an influence in only three out of twenty-four problems, with no underlying pattern that could explain this change in girls’ choices. All three problems appeared to be equally or even less demanding than other, larger problems in the same items. Moreover, neither boys’ nor girls’ performance was homogeneous across problems; on these three problems, boys as well as girls tended to perform worse than on the paired problem in the same item. Another similarity between girls and boys was that they had no more difficulty solving subtraction problems correctly than addition problems, suggesting that they had developed efficient strategies. These results differ from the strong and widespread gender differences argued for by Shen et al. and Sunde et al. [11,15].
A plausible explanation for the discrepancy between the different analyses of the number facts problems with regard to the influence of gender is that girls’ performance may have been underestimated by the all-or-nothing criterion used to score the items. A complementary explanation could be that, for girls, an effective strategy has to generate precise, not necessarily quick, responses, as frequently suggested in the literature. In line with the results above, girls put accuracy over speed when choosing the most effective strategy for a given problem because they simply take fewer risks than boys (e.g., [1,10]). However, girls also make increasingly adaptive strategy choices, although their greater reliance on safe choices is penalized in the scoring of number facts problems. For example, they count down to solve single-digit subtraction problems such as 9 − 1, or use derived facts to solve other problems. An example of the latter was given by a first-grade girl who correctly calculated 3 + 4 as (4 + 4) − 1, and 7 + 7 by taking the result of the previous problem, 8 + 8, and subtracting 2. It should be noted that these faster strategies are indeed effective and, if children do not self-report them, they would go unnoticed on most occasions. This leaves open the possibility that they are sometimes classified as direct retrieval. In this sense, Ginsburg and Baroody [55,56] explicitly acknowledge the possible subtle use of elaborate counting strategies, such as counting down or up to subtract, depending on the size of the subtrahend. Similarly, Shen et al. [15] claimed, in relation to complex addition problems, that responses categorized as retrieval could be the result of calculations executed very quickly.
The same could be expected, to an even greater extent, for single-digit addition and subtraction problems. Finally, given the crucial importance of speed, how do children know whether the interviewer considers their responses fast enough? They receive no feedback, and some studies do not even ask them to respond quickly. For instance, Shen and colleagues [15] gave children all the time they needed to solve the problems, and in Sunde et al.’s [11] study the children were not asked to respond quickly either. However, as argued by Bailey et al. [10], the pattern of relations between skill and preference may change depending on whether the emphasis is on speed or accuracy.
As for the multiplication problems, the children showed some understanding of the one rule (n × 1 = n) and the zero rule (n × 0 = 0) targeted by the problems in item 48. The more complex multiplication problems in item 68 were presented almost exclusively to second graders (26 boys and 15 girls). No gender differences were found, and children performed better on the 3 × 2 problem than on the 8 × 2 problem. Their justifications for the latter suggest that they relied on the multiplication table of two rather than on retrieval, a strategy that took more time than allowed. Thus, this pattern of findings suggests that there were no gender differences in children’s use of strategies to solve less familiar problems.
Besides exploring the gender similarities hypothesis on curriculum-based tasks, one of the novel contributions of this study has been to extend the analysis to unfamiliar tasks that are developmentally appropriate for the age range considered, specifically a non-routine detection task that evaluated children’s understanding of counting. Children’s success rate in detecting counting pseudoerrors was low and did not depend on either grade level or gender. Overall, the mean percentage of correct responses (pseudoerrors considered valid counts) was 35.6%, very close to the attainment of first and second graders reported by, for example, Lago et al. [51] (30.05% of pseudoerrors correctly detected). This poor performance on pseudoerrors contrasted with the high level of success on compensation errors (83.82% correct responses judging them as erroneous counts, owing to the transgression of logical rules, even when the cardinal was correct). As previously stated, compensation errors share superficial features with pseudoerrors (such as involving several elements of the set) but are conceptually opposite, since compensation errors break logical rules and are conceptually incorrect. Even though compensation errors were not analyzed in the current research, the differential pattern of children’s responses to these two types of non-routine counting trials is relevant for several reasons. Among others, it indicates that children could keep track of the characters’ counts and that the non-routine detection task does not misrepresent children’s understanding of conventional counting rules (see also [51]). In this sense, these data illustrate that children correctly understand the essential nature of logical rules but hold erroneous knowledge about the optional nature of conventional counting rules, since they view these rules as essential for correct counting and penalize their breaches.
Furthermore, the absence of developmental differences in first and second graders’ performance when detecting pseudoerrors also exemplifies the slow pace of development to distinguish logical and conventional counting rules [43,44,46,74].
Both the Bayesian and the frequentist analyses showed that explicitly stating the cardinal value facilitates the acceptance of pseudoerrors as valid counts. Our participants’ performance and justifications support the view proposed in previous work [44]: the presence of the cardinal value helped them focus on the numerosity of the collection and downplayed the role of the non-conventional procedure. In other words, it highlights the purpose, or functional aspect, of counting as a quantification process beyond mere enumeration.
Turning to the role of gender in performance on non-routine tasks, the results of the current research also side with the gender similarities hypothesis. In line with our expectations, no “boy” advantage was found in the non-routine detection task, which requires a sophisticated understanding of counting rules and cannot be solved by simply applying rote-learned procedures. By contrast, the widespread belief emerging from the literature is that gender differences favoring males appear during the high school years when students solve complex, unfamiliar, or novel tasks that are not curriculum-based (e.g., [1]), although some authors warn that these results should be considered with caution [57]. In fact, other studies have failed to observe differential effects of gender on performance in other non-routine tasks at primary school. For instance, Palm and Nyström [58] found no gender differences in a sample of fifth graders, either in the activation or in the use of the real-world knowledge needed to solve unfamiliar word problems accurately. Although Palm and Nyström’s [58] study analyzed an older sample and their word problems went beyond the basic numerical skills considered in the current research, their non-routine task differed from those typically presented in school and, like the one used here, demanded a thorough interpretation of the situation and could not be solved using mechanical procedures or superficial strategies.

5. Implications for Practice

Although the findings presented here are only a starting point, they have the potential to inform teaching and educational practice that de-emphasizes gender differences, helping adults to see that boys and girls can participate without distinction in learning mathematics. The gender similarities hypothesis may lessen teachers’ concerns about whether to focus on fostering children’s positive dispositions and feelings or on the teaching of mathematics concepts and skills.
Additionally, children performed poorly in the non-routine detection task, even though it taps a basic numeracy skill that is often taken for granted. This may reflect a tendency of typical primary school mathematics instruction to encourage routines and repetitive procedures, not only in counting but also in other mathematical areas [75]. Such instruction may lead children to follow these routines blindly and to fail on non-routine or unfamiliar tasks that require explicit reflection on the rules or concepts involved. This result also illustrates the role that metacognition could play in mathematics learning. Children’s awareness and verbalization of their mathematical strategies may be crucial if children’s mathematical reasoning is to be linked to curriculum design and to the teaching of mathematics. This applies to formal as well as informal knowledge. For example, children’s justifications when solving an 8 × 2 problem showed that they used the 2 times table to answer it, which compensated for their lack of knowledge of the 8 times table. Instruction would therefore need to focus not only on correct answers but also on students’ failures, treating them as a source of learning that helps children build new knowledge. In line with Siegler et al. [76], we believe that educators should seek to promote conceptual understanding, an educational practice that fosters mathematical learning. In the specific case of counting, reflecting on the differences between logical and conventional counting rules could help children build a sound and reflective understanding of the concepts involved.

6. Limitations and Future Directions

One limitation of the present work is that, although it focuses on gender similarities between boys and girls, it did not analyze how the stereotype against girls’ mathematics achievement develops or how much it matters. Stereotype threat is one of the mechanisms through which this stereotype may affect mathematical competence. Its explanatory power has been questioned in recent years, e.g., [77], and this line of research has even been claimed to distract researchers from more relevant factors and to suffer from publication bias, e.g., [37]. Nevertheless, the effect of the stereotype itself has been widely acknowledged in the literature. For example, Bian et al. [36] found that children do not endorse the “brilliance = males” stereotype at age 5 but do at age 6, the age of our youngest participants. Moreover, according to these authors, the stereotype begins to influence children’s interests as soon as it is acquired, leading girls to avoid activities that require brilliance. Stereotype threat is also only one of the mechanisms through which the stereotype that girls perform worse than boys exerts its influence. Together, these points suggest that future work on early gender similarities should include an analysis of this factor.
A further limitation of this study may be its conceptualization of gender as discrete and binary. Future work could incorporate a broader, gender-diverse view of gender to gain a more comprehensive understanding of children’s perceptions of mathematics. For instance, in relation to the limitation mentioned above, Olson and Enright [78] found that transgender children (6–8 years) showed little endorsement of gender stereotypes.
Another limitation is that participants were recruited from available schools, and the sample was skewed towards families of medium-high socioeconomic status. Future studies should therefore use a randomly selected, nationally representative sample to avoid possible bias and to protect generalizability. Moreover, the results of this study may vary with the children’s country of origin. However, the current findings are consistent with those obtained in other Western countries, where gender differences were not evident at younger ages [27,28,29].
Finally, the study did not include a fine-grained analysis of key aspects of girls’ and boys’ cognitive development, such as metacognitive skills or executive functions. Such an analysis could help clarify the relationship between gender similarities and mathematical performance. The crucial role of metacognitive abilities in learning has been demonstrated, but further studies are needed to disentangle how these mechanisms emerge, improve, and interact, e.g., [79,80]. As for executive functions, research indicates that they are linked to primary school children’s performance on tests of general mathematical ability. However, more insight is needed into how each component of executive functions (i.e., working memory, inhibitory control, and shifting) contributes to specific math skills [81], such as those considered in the current study.

7. Conclusions

Most research on the influence of gender on mathematical achievement has focused on older children or adolescents and has pointed to an achievement gap that favors males. Recently, however, several studies have highlighted the importance of analyzing gender differences at younger ages as well. The present study combined frequentist and Bayesian analyses, and its results are in line with these more recent studies. Gender differences in young children’s performance across several mathematical skills were either absent or trivial, which illustrates the importance of adopting a gender similarities perspective. These results were obtained with both standard and non-routine tasks, which covered a wide range of concepts and skills in children’s early mathematical knowledge, suggesting that the findings are robust.

Author Contributions

A.E., M.O.L. and C.D. contributed equally to all phases: conceptualization, methodology, software, validation, data curation, formal analysis, and writing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by a project grant (PSI2017-82339-P) from the Spanish Ministerio de Ciencia e Innovación.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Deontological Commission of Psychology, Complutense University of Madrid (ref. 2019/20-047 on 25 July 2020).

Informed Consent Statement

Informed consent was obtained from parents of all participants involved in the study.

Data Availability Statement

The raw data collected for this study will be made available by the authors upon request, without undue reservation.

Acknowledgments

We thank all children, their families and school staff for their cooperation. We are also very grateful to Sonia Caballero for her valuable help on several phases of the study. Finally, we thank the reviewers and the editors for their helpful comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Statistics for Gender (as a categorical predictor) in binary logistic regression models. Each “Number Facts” problem was modeled separately.
Number-facts problem | χ² | df | N | p | Nagelkerke’s R² | Wald (Wald’s p) | BFB
Item 36 (subtraction)
n − n | 0.74 | 115 | 117 | 0.39 | 0.01 | 0.73 (p = 0.39) | 1
n − 1 | 0.62 | 115 | 117 | 0.43 | 0.01 | 0.61 (p = 0.44) | 1
n − n | 0.17 | 115 | 117 | 0.69 | 0.00 | 0.16 (p = 0.69) | 1
n − 1 | 2.32 | 115 | 117 | 0.13 | 0.04 | 2.09 (p = 0.15) | 1.30
Item 47 (addition up to 9)
n + y = 7 | 5.63 | 123 | 125 | 0.02 | 0.06 | 5.47 (p = 0.02) | 4.82
n + y = 9 | 1.65 | 123 | 125 | 0.20 | 0.02 | 1.67 (p = 0.20) | 1.14
Item 48 (multiplication)
n × 0 | 0.67 | 123 | 125 | 0.41 | 0.01 | 0.67 (p = 0.42) | 1
n × 1 | 0.13 | 123 | 125 | 0.72 | 0.00 | 0.13 (p = 0.72) | 1
n × 0 | 0.09 | 123 | 125 | 0.76 | 0.00 | 0.09 (p = 0.76) | 1
n × 1 | 0.29 | 123 | 125 | 0.59 | 0.00 | 0.29 (p = 0.59) | 1
Item 50 (subtraction)
2n − n | 2.77 | 111 | 113 | 0.10 | 0.04 | 2.67 (p = 0.10) | 1.60
2n − n | 6.62 | 111 | 113 | 0.01 | 0.09 | 6.08 (p = 0.01) | 6.27
Item 51 (addition up to 10)
n + y = 10 | 8 | 106 | 108 | 0.01 | 0.10 | 7.64 (p = 0.01) | 12.5
n + n = 6 | 0.08 | 106 | 108 | 0.77 | 0.00 | 0.08 (p = 0.77) | 1
n + y = 10 | 0.26 | 106 | 108 | 0.11 | 0.03 | 2.58 (p = 0.11) | 1.53
n + n = 8 | 0.52 | 106 | 108 | 0.47 | 0.02 | 0.49 (p = 0.49) | 1
Item 52 (addition of doubles, > 10)
n + n = 16 | 2.14 | 99 | 101 | 0.14 | 0.03 | 2.08 (p = 0.15) | 1.3
n + n = 14 | 2.14 | 99 | 101 | 0.14 | 0.03 | 2.08 (p = 0.15) | 1.3
Item 61 (subtraction)
10 − n | 3.78 | 78 | 80 | 0.05 | 0.06 | 3.7 (p = 0.06) | 2.32
10 − n | 1.95 | 78 | 80 | 0.16 | 0.03 | 1.91 (p = 0.17) | 1.23
Item 67 (addition between 11 and 19)
n + y | 1.17 | 50 | 52 | 0.28 | 0.03 | 1.16 (p = 0.28) | 1.03
n + y | 2.21 | 50 | 52 | 0.14 | 0.06 | 2.14 (p = 0.14) | 1.32
Item 68 (multiplication of doubles)
n × 2 | 0.02 | 39 | 41 | 0.90 | 0.00 | 0.02 (p = 0.90) | 1
n × 2 | 0.50 | 39 | 41 | 0.48 | 0.02 | 0.51 (p = 0.48) | 1
Note. Item numbers correspond to the Spanish version of TEMA-3 [56]. Only 8 first-grade boys and 1 first-grade girl reached item 68, so only second graders were included in its analysis. Including these nine first graders in the analysis of item 68 did not alter the results of the binary logistic regression.
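The BFB column appears to report the Bayes factor bound recommended by Benjamin and Berger [67]. That bound is 1/(−e·p·ln p) for p < 1/e and 1 otherwise, computed here from Wald’s p. For instance, Wald = 7.64 gives p ≈ 0.0057 and a bound of about 12.5, matching the first n + y = 10 row. This reading is our own back-calculation and is not stated in this section. Under that assumption, the Python sketch below shows how each row’s statistics could be obtained for a logistic regression whose only predictor is a binary gender dummy. The function names are hypothetical, and the counts in the usage line are invented for illustration rather than taken from the study.

```python
import math


def bayes_factor_bound(p: float) -> float:
    """Upper bound on the Bayes factor for H1 implied by a p-value.

    Uses the -e * p * ln(p) calibration: 1 / (-e * p * ln p) for p < 1/e,
    and 1 otherwise (the p-value then carries no evidence against H0).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if p >= 1.0 / math.e:
        return 1.0
    return 1.0 / (-math.e * p * math.log(p))


def wald_p_value(wald: float) -> float:
    """Upper-tail p-value of a 1-df chi-square statistic: erfc(sqrt(W / 2))."""
    return math.erfc(math.sqrt(wald / 2.0))


def logistic_binary_predictor(correct_a: int, total_a: int,
                              correct_b: int, total_b: int) -> dict:
    """Logistic regression of correct/incorrect on one binary predictor (group a vs b).

    With an intercept and a single dummy the model is saturated, so the fitted
    probabilities are the observed group proportions and every statistic has a
    closed form. Empty cells make the estimate diverge and are rejected.
    """
    cells = [correct_a, total_a - correct_a, correct_b, total_b - correct_b]
    if min(cells) <= 0:
        raise ValueError("every cell of the 2x2 table must be non-empty")

    def loglik(k: int, n: int, prob: float) -> float:
        return k * math.log(prob) + (n - k) * math.log(1.0 - prob)

    n_total = total_a + total_b
    ll_null = loglik(correct_a + correct_b, n_total, (correct_a + correct_b) / n_total)
    ll_model = (loglik(correct_a, total_a, correct_a / total_a)
                + loglik(correct_b, total_b, correct_b / total_b))

    chi2 = 2.0 * (ll_model - ll_null)  # likelihood-ratio chi-square, 1 df
    cox_snell = 1.0 - math.exp(-chi2 / n_total)
    # Nagelkerke rescales Cox-Snell by its maximum attainable value.
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n_total))

    # Wald test on the dummy's coefficient, i.e. the log odds ratio.
    beta = math.log((cells[0] * cells[3]) / (cells[1] * cells[2]))
    se = math.sqrt(sum(1.0 / c for c in cells))
    wald = (beta / se) ** 2
    p_wald = wald_p_value(wald)
    return {"chi2": chi2, "nagelkerke_r2": nagelkerke, "wald": wald,
            "wald_p": p_wald, "bfb": bayes_factor_bound(p_wald)}


# Invented counts for illustration only: 20 of 60 (group a) and 35 of 60 (group b) correct.
print({name: round(value, 3) for name, value in logistic_binary_predictor(20, 60, 35, 60).items()})
```

With a single binary predictor, no iterative fitting is needed. Models with additional covariates would require a general logistic regression routine.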

References

  1. Casey, B.M.; Ganley, C.M. An examination of gender differences in spatial skills and math attitudes in relation to mathematics success: A bio-psycho-social model. Dev. Rev. 2021, 60, 100963.
  2. Johnson, T.; Burgoyne, A.P.; Mix, K.S.; Young, C.J.; Levine, S.C. Spatial and mathematics skills: Similarities and differences related to age, SES, and gender. Cognition 2022, 218, 104918.
  3. Cantley, I.; McAllister, J. The Gender Similarities Hypothesis: Insights from a Multilevel Analysis of High-Stakes Examination Results in Mathematics. Sex Roles 2021, 85, 481–496.
  4. Harris, D.; Lowrie, T.; Logan, T.; Hegarty, M. Spatial reasoning, mathematics, and gender: Do spatial constructs differ in their contribution to performance? Br. J. Educ. Psychol. 2021, 91, 409–441.
  5. Carr, M.; Alexeev, N.; Wang, L.; Barned, N.; Horan, E.; Reed, A. The development of spatial skills in elementary school students. Child Dev. 2018, 89, 446–460.
  6. Moè, A. Mental rotation and mathematics: Gender-stereotyped beliefs and relationships in primary school children. Learn. Individ. Differ. 2018, 61, 172–180.
  7. Gilligan, K.A.; Flouri, E.; Farran, E.K. The contribution of spatial ability to mathematics achievement in middle childhood. J. Exp. Child Psychol. 2017, 163, 107–125.
  8. Fennema, E.; Carpenter, T.P.; Jacobs, V.R.; Franke, M.L.; Levi, L.W. A longitudinal study of gender differences in young children’s mathematical thinking. Educ. Res. 1998, 27, 6–11.
  9. Ginsburg, H.P.; Pappas, S. SES, ethnic, and gender differences in young children’s informal addition and subtraction: A clinical interview investigation. J. Appl. Dev. Psychol. 2004, 25, 171–192.
  10. Bailey, D.H.; Littlefield, A.; Geary, D.C. The co-development of skill at and preference for use of retrieval-based processes for solving addition problems: Individual and sex differences from first to sixth grade. J. Exp. Child Psychol. 2012, 113, 78–92.
  11. Sunde, P.B.; Sunde, P.; Sayers, J. Sex differences in mental strategies for single-digit addition in the first years of school. Educ. Psychol. 2020, 40, 82–102.
  12. Dowker, A. Use of derived fact strategies by children with mathematical difficulties. Cogn. Dev. 2009, 24, 401–410.
  13. Dowker, A. Young children’s use of derived fact strategies for addition and subtraction. Front. Hum. Neurosci. 2014, 7, 924.
  14. Siegler, R.S.; Braithwaite, D.W. Numerical development. Annu. Rev. Psychol. 2017, 68, 187–213.
  15. Shen, C.; Vasilyeva, M.; Laski, E.V. Here but not there: Cross-national variability of gender effects in arithmetic. J. Exp. Child Psychol. 2016, 146, 50–65.
  16. Herts, J.B.; Beilock, S.L. From Janet T. Spence’s Manifest Anxiety Scale to the present day: Exploring math anxiety and its relation to math achievement. Sex Roles 2017, 77, 718–724.
  17. Hyde, J.S. Gender similarities and differences. Annu. Rev. Psychol. 2014, 65, 373–398.
  18. Hyde, J.S.; Linn, M. Gender Similarities in mathematics and science. Science 2006, 314, 599–600.
  19. Levine, S.C.; Pantoja, N. Development of children’s math attitudes: Gender differences, key socializers, and intervention approaches. Dev. Rev. 2021, 62, 100997.
  20. Reilly, D.; Neumann, D.L.; Andrews, G. Investigating gender differences in mathematics and science: Results from the 2011 Trends in Mathematics and Science Survey. Res. Sci. Educ. 2019, 49, 25–50.
  21. Stoet, G.; Bailey, D.; Moore, A.; Geary, D.C. Countries with Higher Levels of Gender Equality Show Larger National Sex Differences in Mathematics Anxiety and Relatively Lower Parental Mathematics Valuation for Girls. PLoS ONE 2016, 11, e0153857.
  22. Van Mier, H.I.; Schleepen, T.M.J.; Van den Berg, F.C.G. Gender differences regarding the impact of math anxiety on arithmetic performance in second and fourth graders. Front. Psychol. 2019, 9, 2690.
  23. Cvencek, D.; Brecic, R.; Gacesa, D.; Meltzoff, A.N. Development of math attitudes and math self-concepts: Gender differences, implicit–explicit dissociations, and relations to math achievement. Child Dev. 2021, 92, e940–e956.
  24. Erturan, S.; Jansen, B. An investigation of boys’ and girls’ emotional experience of math, their math performance, and the relation between these variables. Eur. J. Psychol. Educ. 2015, 30, 421–435.
  25. Dowker, A.; Bennett, K.; Smith, L. Attitudes to Mathematics in Primary School Children. Child Dev. Res. 2012, 2012, 124939.
  26. Zhang, X.; Koponen, T.; Räsänen, P.; Aunola, K.; Lerkkanen, M.K.; Nurmi, J.E. Linguistic and spatial skills predict early arithmetic development via counting sequence knowledge. Child Dev. 2014, 85, 1091–1107.
  27. Bakker, M.; Torbeyns, J.; Wijns, N.; Verschaffel, L.; De Smedt, B. Gender equality in 4- to 5-year-old preschoolers’ early numerical competencies. Dev. Sci. 2019, 22, e12718.
  28. Hutchison, J.E.; Lyons, I.M.; Ansari, D. More similar than different: Gender differences in children’s basic numerical skills are the exception not the rule. Child Dev. 2019, 90, e66–e79.
  29. Kersey, A.J.; Braham, E.J.; Csumitta, K.D.; Libertus, M.E.; Cantlon, J.F. No intrinsic gender differences in children’s earliest numerical abilities. npj Sci. Learn. 2018, 3, 12.
  30. Kersey, A.J.; Csumitta, K.D.; Cantlon, J.F. Gender similarities in the brain during mathematics development. npj Sci. Learn. 2019, 4, 19.
  31. De Smedt, B.; Noël, M.P.; Gilmore, C.; Ansari, D. How do symbolic and non-symbolic numerical magnitude processing skills relate to individual differences in children’s mathematical skills? A review of evidence from brain and behavior. Trends Neurosci. Educ. 2013, 2, 48–55.
  32. Feigenson, L.; Libertus, M.E.; Halberda, J. Links between the intuitive sense of number and formal mathematics ability. Child Dev. Perspect. 2013, 7, 74–79.
  33. Lyons, I.M.; Price, G.R.; Vaessen, A.; Blomert, L.; Ansari, D. Numerical predictors of arithmetic success in grades 1–6. Dev. Sci. 2014, 17, 714–726.
  34. Schneider, M.; Beeres, K.; Coban, L.; Merz, S.; Schmidt, S.; Stricker, J.; De Smedt, B. Associations of non-symbolic and symbolic numerical magnitude processing with mathematical competence: A meta-analysis. Dev. Sci. 2017, 20, e12372.
  35. Aunio, P.; Räsänen, P. Core numerical skills for learning mathematics in children aged five to eight years—A working model for educators. Eur. Early Child. Educ. Res. J. 2016, 24, 684–704.
  36. Bian, L.; Leslie, S.J.; Cimpian, A. Gender stereotypes about intellectual ability emerge early and influence children’s interests. Science 2017, 355, 389–391.
  37. Ganley, C.M.; Mingle, L.A.; Ryan, A.M.; Ryan, K.; Vasilyev, M.; Perry, M. An examination of stereotype threat effects on girls’ mathematics performance. Dev. Psychol. 2013, 49, 1886–1897.
  38. Jordan, N.C.; Kaplan, D.; Olah, L.N.; Locuniak, M.N. Number sense growth in kindergarten: A longitudinal investigation of children at risk for mathematics difficulties. Child Dev. 2006, 77, 153–177.
  39. Lubienski, S.T.; Robinson, J.P.; Crane, C.C.; Ganley, C.M. Girls’ and boys’ mathematics achievement, affect, and experiences: Findings from ECLS-K. Math. Educ. Res. J. 2013, 44, 634–645.
  40. Gibbs, B.G. Reversing fortunes or content change? Gender gaps in math-related skill throughout childhood. Soc. Sci. Res. 2010, 39, 540–569.
  41. Hyde, J.S. Sex and cognition: Gender and cognitive functions. Curr. Opin. Neurobiol. 2016, 38, 53–56.
  42. De Corte, E. Constructive, self-regulated, situated, and collaborative learning: An approach for the acquisition of adaptive competence. J. Educ. 2012, 192, 33–47.
  43. Briars, D.; Siegler, R.S. A featural analysis of preschoolers’ counting knowledge. Dev. Psychol. 1984, 20, 607–618.
  44. Escudero, A.; Rodríguez, P.; Lago, M.O.; Enesco, I. A 3-year longitudinal study of children’s comprehension of counting: Do they recognize the optional nature of nonessential counting features? Cogn. Dev. 2015, 33, 73–83.
  45. Gelman, R.; Meck, E. The notion of principle: The case of counting. In Conceptual and Procedural Knowledge: The Case of Mathematics; Hiebert, J., Ed.; Lawrence Erlbaum Associates: Hillsdale, NJ, USA, 1986; pp. 29–57.
  46. LeFevre, J.; Smith-Chant, B.; Fast, L.; Skwarchuk, S.; Sargla, E.; Arnup, J.; Penner-Wilger, M.; Bisanz, J.; Kamawar, D. What counts as knowing? The development of conceptual and procedural knowledge of counting from kindergarten through Grade 2. J. Exp. Child Psychol. 2006, 93, 285–303.
  47. Chu, F.; vanMarle, K.; Rouder, J.; Geary, D. Children’s early understanding of number predicts their later problem solving sophistication in addition. J. Exp. Child Psychol. 2018, 169, 73–92.
  48. Davis-Kean, P.E.; Domina, T.; Kuhfeld, M.; Ellis, A.; Gershoff, E.T. It matters how you start: Early numeracy mastery predicts high school math course-taking and college attendance. Infant Child Dev. 2022, 31, e2281.
  49. Geary, D.C.; vanMarle, K. Growth of symbolic number knowledge accelerates after children understand cardinality. Cognition 2018, 177, 69–78.
  50. Paliwal, V.; Baroody, A.J. Fostering the learning of subtraction concepts and the subtraction-as-addition reasoning strategy. Early Child. Res. Q. 2020, 51, 403–415.
  51. Lago, M.O.; Escudero, A.; Dopico, C. The relationship between confidence and conformity in a non-routine counting task with young children: Dedicated to the Memory of Purificación Rodríguez. Front. Psychol. 2021, 12, 593509.
  52. Gelman, R.; Gallistel, C.R. The Child’s Understanding of Number; Harvard University Press: Cambridge, MA, USA, 1978.
  53. Göbel, S.M.; McCrink, K.; Fischer, M.H.; Shaki, S. Observation of directional storybook reading influences young children’s counting direction. J. Exp. Child Psychol. 2018, 166, 49–66.
  54. Lachance, J.A.; Mazzocco, M.M.M. A longitudinal analysis of sex differences in math and spatial skills in primary school age children. Learn. Individ. Differ. 2006, 16, 195–216.
  55. Ginsburg, H.P.; Baroody, A.J. Test of Early Mathematics Ability, 3rd ed.; Pro-Ed.: Austin, TX, USA, 2003.
  56. Ginsburg, H.P.; Baroody, A.J. TEMA-3. Test de Competencia Matemática Básica; TEA Ediciones: Madrid, Spain, 2007.
  57. Lindberg, S.M.; Hyde, J.S.; Petersen, J.; Linn, M.C. New trends in gender and mathematics performance: A meta-analysis. Psychol. Bull. 2010, 136, 1123–1135.
  58. Palm, T.; Nyström, P. Gender aspects of sense making in word problem solving. Appl. Math. Model. 2009, 1, 59–76.
  59. Wasserstein, R.L.; Lazar, N.A. The ASA’s statement on p-values: Context, process, and purpose. Am. Stat. 2016, 70, 129–133.
  60. Kaufman, A.S.; Kaufman, N.L. Kaufman Brief Intelligence Test: KBIT; AGS, American Guidance Service: Circle Pines, MN, USA, 1990.
  61. Kaufman, A.S.; Kaufman, N.L. K-BIT, Test Breve de Inteligencia de Kaufman, Manual; TEA Ediciones: Madrid, Spain, 2011.
  62. Libertus, M.E.; Feigenson, L.; Halberda, J. Numerical approximation abilities correlate with and predict informal but not formal mathematics abilities. J. Exp. Child Psychol. 2013, 116, 829–838.
  63. JASP Team. JASP (Version 0.16.3) [Computer Software]. 2022. Available online: https://jasp-stats.org/ (accessed on 10 July 2022).
  64. Little, T.D. The Oxford Handbook of Quantitative Methods: Volume 1 Foundations; Oxford University Press: Oxford, UK, 2013.
  65. Sidák, Z. Rectangular Confidence Regions for the Means of Multivariate Normal Distributions. J. Am. Stat. Assoc. 1967, 62, 626–633.
  66. Abdi, H. Partial Least Square Regression: PLS Regression. In Encyclopedia of Measurement and Statistics; Neil, S., Ed.; Sage Publications: New York, NY, USA, 2007; pp. 1–13.
  67. Benjamin, D.J.; Berger, J.O. Three recommendations for improving the use of p-values. Am. Stat. 2019, 73, 186–191.
  68. Wasserstein, R.L.; Schirm, A.L.; Lazar, N.A. Moving to a world beyond “p < 0.05”. Am. Stat. 2019, 73, 1–19.
  69. Faulkenberry, T.J.; Ly, A.; Wagenmakers, E.J. Bayesian inference in numerical cognition: A tutorial using JASP. J. Numer. Cogn. 2020, 6, 231–259.
  70. Jeffreys, H. Theory of Probability, 3rd ed.; Oxford University Press: New York, NY, USA, 1961.
  71. Chan, Y.C.; Mazzocco, M.M. Competing features influence children’s attention to number. J. Exp. Child Psychol. 2017, 156, 62–81.
  72. Geary, D.C.; Hoard, M.K.; Nugent, L. Independent contributions of the central executive, intelligence, and in-class attentive behavior to developmental change in the strategies used to solve addition problems. J. Exp. Child Psychol. 2012, 113, 49–65.
  73. Ashcraft, M.H.; Guillaume, M.M. Mathematical cognition and the problem size effect. In The Psychology of Learning and Motivation; Ross, B.H., Ed.; Academic Press: Burlington, MA, USA, 2009; Volume 51, pp. 121–151.
  74. Enesco, I.; Rodríguez, P.; Lago, M.O.; Dopico, C.; Escudero, A. Do teachers’ conflicting testimonies influence children’s decisions about unconventional rules of counting? Eur. J. Psychol. Educ. 2017, 32, 483–500.
  75. McNeil, N.M. U-Shaped development in math: 7-year-olds outperform 9-year-olds on equivalence problems. Dev. Psychol. 2007, 43, 687–695.
  76. Siegler, R.S.; Im, S.H.; Schiller, L.K.; Tian, J.; Braithwaite, D.W. The sleep of reason produces monsters: How and when biased input shapes mathematics learning. Annu. Rev. Psychol. 2020, 2, 413–435.
  77. Stoet, G.; Geary, D.C. Can stereotype threat explain the gender gap in mathematics performance and achievement? Rev. Gen. Psychol. 2012, 16, 93–102.
  78. Olson, K.R.; Enright, E.A. Do transgender children (gender) stereotype less than their peers and siblings? Dev. Sci. 2018, 21, e12606.
  79. Chytrý, V.; Říčan, J.; Eisenmann, P.; Medová, J. Metacognitive Knowledge and Mathematical Intelligence—Two Significant Factors Influencing School Performance. Mathematics 2020, 8, 969.
  80. Destan, N.; Hembacher, E.; Ghetti, S.; Roebers, C.M. Early metacognitive abilities: The interplay of monitoring and control processes in 5- to 7-year-old children. J. Exp. Child Psychol. 2014, 126, 213–228.
  81. Van der Ven, S.H.G.; Kroesbergen, E.H.; Boom, J.; Leseman, P.P.M. The development of executive functions and early mathematics. A dynamic relationship. Br. J. Educ. Psychol. 2012, 82, 100–119.
Figure 1. Strategies employed to solve a selection of number-facts problems as a function of gender (item numbers correspond to the Spanish version of TEMA-3 [56]). Note. Definition of categories. Correct and fast: correct, fast responses assumed to be obtained by retrieval; coded as valid according to the test scoring rules. Correct and fast by counting: correct, fast responses obtained by self-reported, explicit, or observed counting; coded as incorrect according to the test scoring rules. Correct and slow: correct responses whose execution time exceeded the time limit, including answers calculated by derived facts, the commutative principle, or counting; coded as incorrect according to the test scoring rules. Incorrect answers: erroneous responses, regardless of the strategy employed; coded as incorrect according to the test scoring rules.
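The category definitions in the note to Figure 1 amount to a small decision rule, sketched below in Python. The time limit is passed in as a parameter because its value is not given in this section, and the class and function names are hypothetical. The order of the checks matters. An incorrect answer is classified as incorrect regardless of strategy. A correct answer that exceeds the time limit counts as “correct and slow” even if counting was observed.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    CORRECT_FAST = "Correct and fast"
    CORRECT_FAST_BY_COUNTING = "Correct and fast by counting"
    CORRECT_SLOW = "Correct and slow"
    INCORRECT = "Incorrect answer"


@dataclass
class Response:
    correct: bool   # was the answer right?
    seconds: float  # response latency
    counted: bool   # counting was self-reported, explicit or observed


def categorize(response: Response, time_limit: float) -> Category:
    """Assign a number-facts response to one of the Figure 1 categories."""
    if not response.correct:
        return Category.INCORRECT  # regardless of the strategy employed
    if response.seconds > time_limit:
        return Category.CORRECT_SLOW  # derived facts, commutativity or counting
    if response.counted:
        return Category.CORRECT_FAST_BY_COUNTING
    return Category.CORRECT_FAST  # assumed to reflect retrieval


def credited_by_test_rules(category: Category) -> bool:
    """Only correct, fast answers without counting are scored as valid."""
    return category is Category.CORRECT_FAST
```

For example, `categorize(Response(True, 1.2, True), time_limit=3.0)` falls into the “correct and fast by counting” category, which `credited_by_test_rules` does not credit.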
Table 1. Description of the non-routine counting trials.
Trial | Rules transgressed | Set size | Trial description
Compensation errors
Compensation error 1 | Logical rules and temporal-spatial adjacency | 8 | Skipping the number word “3”, tagging the third element as “4” (temporal violation), and skipping the sixth item (spatial violation)
Compensation error 2 | Logical rules and temporal adjacency | 9 | Tagging the second element “3” (instead of “2”) and repeating the tag “7” for both the sixth and the seventh elements
Compensation error 3 | Logical rules and spatial adjacency | 8 | Skipping the third element and then double-counting the sixth element with the tags “5” and “6”
Pseudoerrors
Pseudoerror 1 | Spatial adjacency | 9 | Skipping the seventh element and counting it at the end (with the number “9”)
Pseudoerror 2 | Left-to-right direction | 10 | Counting all elements consecutively from right to left
Pseudoerror 3 | Spatial adjacency | 12 | Counting first the six orange elements, followed by the other six white and black elements that were interspersed along the row
Pseudoerror 4 | Temporal-spatial adjacency | 9 | Pretending to have forgotten the number 6 (saying “1, 2, 3, 4, 5, hmmm”) and proceeding to the number 7; then reversing the direction of counting to say “6!” and finishing with the remaining items (“8, 9”)
Pseudoerror 5 | Temporal adjacency | 10 | Pointing to all elements, counting the first six aloud, the seventh and eighth silently, and the remaining two aloud again
Pseudoerror 6 | Starting from one end | 7 | Starting the count from the fourth element and continuing with the rest from left to right
Pseudoerror 7 | Temporal-spatial adjacency | 9 | Pointing to the sixth element and tagging it three consecutive times with the same tag (“6, 6, 6”)
Pseudoerror 8 | Temporal adjacency | 8 | Counting by twos, saying aloud only the even numbers (“2–4–6–8”)
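The contrast in Table 1 between compensation errors and pseudoerrors can be illustrated with the simplified Python check below. It assumes that a count respects the logical rules when two conditions hold after immediate repeats of the same number word on the same item are merged (as in pseudoerror 7): every item receives exactly one number word, and the words used are exactly 1 to n. The order in which items are visited, which is what most pseudoerrors vary, is deliberately ignored. This criterion and the function name are hypothetical illustrations, not the coding scheme used in the study. Silent tags (pseudoerrors 5 and 8) would need to be recorded explicitly as pairs for the check to apply.

```python
from __future__ import annotations

from collections import defaultdict


def respects_logical_rules(n_items: int, count: list[tuple[int, int]]) -> bool:
    """Check a count given as (item position, number word) pairs in the order produced.

    Item positions are numbered 1..n_items from left to right. Simplified,
    hypothetical criterion: every item gets exactly one number word and the
    words used are exactly 1..n_items; the visiting order is ignored.
    """
    # Merge immediate repeats such as pointing at item 6 and saying "6, 6, 6".
    merged: list[tuple[int, int]] = []
    for pair in count:
        if not merged or merged[-1] != pair:
            merged.append(pair)

    words_by_item: dict[int, list[int]] = defaultdict(list)
    for item, word in merged:
        words_by_item[item].append(word)

    expected = list(range(1, n_items + 1))
    every_item_once = sorted(words_by_item) == expected and all(
        len(words) == 1 for words in words_by_item.values()
    )
    words_used = sorted(word for words in words_by_item.values() for word in words)
    return every_item_once and words_used == expected


# Compensation error 2 (9 items): item 2 tagged "3", items 6 and 7 both tagged "7".
ce2 = [(1, 1), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 7), (8, 8), (9, 9)]
# Pseudoerror 1 (9 items): item 7 skipped and counted last as "9".
pe1 = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (8, 7), (9, 8), (7, 9)]
print(respects_logical_rules(9, ce2), respects_logical_rules(9, pe1))  # False True
```

Both trials end with the correct cardinal, which is what makes compensation errors hard to detect: only the mapping between items and number words reveals the violation.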
Table 2. Descriptive statistics of the dependent variables in univariate ANOVAs.
First GradeSecond Grade
MeasuresGirls (N = 34)Boys (N = 34)Girls (N = 34)Boys (N = 34)
KBIT—IQ (theoretical maximum: 160)
M (SD)106.24 (8.46)107.79 (11.17)106.71 (10.13)107.26 (10.64)
Range [95% CI]92–123 [102.79, 109.68]82–134 [104.35, 111.24]87–131 [103.26, 110.15]81–124 [103.82, 110.71]
TEMA-3 Math Ability Score (theoretical maximum: 159)
M (SD)96.68 (9.93)105.97 (16.73)99.06 (15.67)107.41 (15.99)
Range [95% CI]78–121 [91.65, 101.71]79–147 [100.94, 111]66–136 [94.03, 104.09]66–135 [102.38, 112.44]
Informal mathematics NUMBERING (theoretical maximum: 100) 1 second grade girl was not included *
M (SD)80.15 (15.97)82.68 (13.06)87.97 (13.08)79.79 (25.30)
Range [95% CI]33–100 [74.17, 86.12]50–100 [76.7, 88.65]50–100 [81.9, 94.04]0–100 [73.82, 85.77]
Informal mathematics NUMBER COMPARISONS (theoretical maximum: 100) 1 second grade girl was not included *
M (SD)90.71 (22.52)80.44 (33.68)72.73 (33.29)73.53 (35.32)
Range [95% CI]0–100 [79.99, 101.43]0–100 [69.72, 91.16]0–100 [61.85, 83.61]0–100 [62.81, 84.25]
Informal mathematics CALCULATION (theoretical maximum: 100) 1 second grade girl was not included *
M (SD)62.88 (36.91)74.79 (28.14)50 (36.36)61.77 (30.20)
Range [95% CI]0–100 [51.65, 74.11]0–100 [63.57, 86.02]0–100 [38.60, 61.4]0–100 [50.54, 72.99]
Informal mathematics CONCEPTS (theoretical maximum: 100) 5 girls and 8 boys from second grade were not included *
M (SD)48.53 (41.72)42.65 (44.61)60.35 (38.68)65.39 (36.80)
Range [95% CI]0–100 [34.65, 62.41]0–100 [28.77, 56.53]0–100 [45.32, 75.36]0–100 [49.51, 81.26]
Formal mathematics NUMERAL LITERACY (theoretical maximum: 100)
M (SD)71.94 (19.09)78.82 (19.77)76.47 (25.64)76.06 (33.62)
Range [95% CI]33–100 [63.39, 80.5]33–100 [70.27, 87.38]0–100 [67.92, 85.02]0–100 [67.51, 84.61]
Formal mathematics NUMBER FACTS (theoretical maximum: 100)
M (SD)22.29 (23.50)37.03 (27.99)50.06 (19.90)67.85 (22.31)
Range [95% CI]0–89 [14.29, 30.3]0–100 [29.02, 45.04]0–100 [42.05, 58.07]14–100 [59.84, 75.86]
Formal mathematics CALCULATION (theoretical maximum: 100) 2 girls and 1 boy from first grade were not included *
M (SD)17.03 (21.52)26.88 (22.89)65.91 (18.57)68.12 (15.09)
Range [95% CI]0–67 [10.15, 23.92]0–89 [21, 33.66]0–100 [59.23, 72.59]29–100 [61.44, 74.8]
Formal mathematics CONCEPTS (theoretical maximum: 100) not included: 21 girls, 13 boys from first grade and 2 girls from second *
M (SD)18.62 (26)33.71 (35.53)34.59 (33.81)43.88 (30.88)
Range [95% CI]0–67 [0.82, 36.41]0–100 [19.72, 47.71]0–100 [23.25, 45.93]0–100 [32.88, 54.88]
Detection task: PSEUDOERRORS (theoretical maximum: 100); between-subject measure, sample size n indicated in parentheses
Pseudoerrors without cardinal value
M (SD) | 30.88 (26.19) | 24.22 (24.78) | 24.34 (18.85) | 30 (32.66)
Range (n) [95% CI] | 0–75 (n = 17) [17.9, 43.8] | 0–75 (n = 16) [10.92, 37.52] | 0–62.5 (n = 19) [12.14, 36.55] | 0–100 (n = 15) [16.24, 43.74]
Pseudoerrors with cardinal value
M (SD) | 49.27 (27.41) | 34.72 (25.57) | 40 (28.82) | 50 (30.05)
Range (n) [95% CI] | 0–100 (n = 17) [36.4, 62.2] | 0–75 (n = 18) [22.18, 47.21] | 12.5–100 (n = 15) [26.3, 53.74] | 0–100 (n = 19) [37.8, 62.21]
* Reason for exclusion: the child gave no responses to any item in that category, as a consequence of the standard administration rules of the TEMA-3.
Table 3. Statistics for fixed effects (Grade Level and Gender) in univariate ANOVAs conducted on K-BIT and TEMA-3.
| Measure | Statistic | Grade Level | Gender | Grade × Gender Interaction |
|---|---|---|---|---|
| K-BIT IQ | F | 0.00 | 0.37 | 0.08 |
| | p | 0.99 | 0.54 | 0.77 |
| | BFB | 1.00 | 1.00 | 1.00 |
| | ηp² [90% CI] | 0.00 [0, 0] | 0.00 [0, 0.04] | 0.00 [0, 0.02] |
| TEMA-3 Math ability score (MAS) | F | 0.57 | 12.03 | 0.03 |
| | p | 0.45 | <0.001 | 0.85 |
| | BFB | 1.00 | 71.80 | 1.00 |
| | ηp² [90% CI] | 0.00 [0, 0.04] | 0.08 [0.02, 0.17] | 0.00 [0, 0.01] |
| TEMA-3 Informal mathematics: numbering | F | 0.66 | 0.87 | 3.12 |
| | p | 0.42 | 0.35 | 0.08 |
| | BFB | 1.00 | 1.00 | 1.82 |
| | ηp² [90% CI] | 0.01 [0, 0.04] | 0.01 [0, 0.05] | 0.02 [0, 0.08] |
| TEMA-3 Informal mathematics: number comparisons | F | 5.23 | 0.76 | 1.04 |
| | p | 0.02 | 0.39 | 0.31 |
| | BFB | 4.14 | 1.00 | 1.01 |
| | ηp² [90% CI] | 0.04 [0, 0.11] | 0.01 [0, 0.05] | 0.01 [0, 0.05] |
| TEMA-3 Informal mathematics: calculation | F | 5.17 | 4.32 | 0.00 |
| | p | 0.03 | 0.04 | 0.99 |
| | BFB | 4.04 | 2.87 | 1.00 |
| | ηp² [90% CI] | 0.04 [0, 0.1] | 0.03 [0, 0.1] | 0.00 [0, 0] |
| TEMA-3 Informal mathematics: concepts | F | 5.42 | 0.00 | 0.54 |
| | p | 0.02 | 0.96 | 0.46 |
| | BFB | 4.45 | 1.00 | 1.00 |
| | ηp² [90% CI] | 0.04 [0, 0.12] | 0.00 [0, 0] | 0.01 [0, 0.04] |
| TEMA-3 Formal mathematics: numeral literacy | F | 0.04 | 0.56 | 0.71 |
| | p | 0.84 | 0.46 | 0.40 |
| | BFB | 1.00 | 1.00 | 1.00 |
| | ηp² [90% CI] | 0.00 [0, 0.01] | 0.00 [0, 0.04] | 0.01 [0, 0.04] |
| TEMA-3 Formal mathematics: number facts | F | 52.35 | 16.14 | 0.14 |
| | p | <0.001 | <0.001 | 0.71 |
| | BFB | 4.485 × 10^8 | 405.36 | 1.00 |
| | ηp² [90% CI] | 0.28 [0.18, 0.38] | 0.11 [0.04, 0.2] | 0.00 [0, 0.03] |
| TEMA-3 Formal mathematics: calculation | F | 174.02 | 3.11 | 1.25 |
| | p | <0.001 | 0.08 | 0.27 |
| | BFB | 5.814 × 10^22 | 1.82 | 1.05 |
| | ηp² [90% CI] | 0.57 [0.48, 0.64] | 0.02 [0, 0.08] | 0.01 [0, 0.06] |
| TEMA-3 Formal mathematics: concepts | F | 3.53 | 3.08 | 0.18 |
| | p | 0.06 | 0.08 | 0.68 |
| | BFB | 2.11 | 1.79 | 1.00 |
| | ηp² [90% CI] | 0.03 [0, 0.11] | 0.03 [0, 0.11] | 0.00 [0, 0.04] |
Note. The corrected threshold was p < 0.0047.
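Table 3 lists a BFB value beside each p value without defining it in this section. The printed values are consistent with the Bayes factor bound of Sellke, Bayarri and Berger, the largest Bayes factor in favour of the alternative hypothesis that a given p value can support: for example, p = 0.08 maps to 1.82, and any p of at least 1/e maps to 1.00. The sketch below assumes that definition and is offered only as a way to check the table, not as the authors' analysis code. Because the p values are printed rounded, recomputed bounds are approximate, and rows reported as p < 0.001 cannot be reproduced from the table alone.

```python
import math


def bayes_factor_bound(p: float) -> float:
    """Upper bound on the Bayes factor in favour of H1 implied by a p-value.

    Sellke-Bayarri-Berger bound: BFB = 1 / (-e * p * ln(p)) for p < 1/e;
    for p >= 1/e the bound is 1 (the p-value cannot favour H1).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if p >= 1.0 / math.e:
        return 1.0
    return 1.0 / (-math.e * p * math.log(p))


# Rounded p-values as printed in Table 3, so recomputed bounds are approximate.
examples = [
    ("K-BIT IQ, grade level", 0.99),                     # table: 1.00
    ("Informal number comparisons, interaction", 0.31),  # table: 1.01
    ("Formal calculation, gender", 0.08),                # table: 1.82
]
for label, p in examples:
    print(f"{label}: p = {p:.2f} -> BFB = {bayes_factor_bound(p):.2f}")
```

Under this reading, the bound only states how strongly a p value could at most favour a gender difference. It is not itself the Bayes factor reported in Table 6.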
Table 4. Statistics for fixed effects (Condition, Grade Level and Gender) in univariate ANOVAs conducted on pseudoerror detection task.
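| Statistic | Condition | Grade Level | Gender | Condition × Grade | Condition × Gender | Grade × Gender | Condition × Grade × Gender |
|---|---|---|---|---|---|---|---|
| F | 12.15 | 0.08 | 0.90 | 0.13 | 0.04 | 3.96 | 0.44 |
| p | <0.001 | 0.78 | 0.77 | 0.72 | 0.85 | 0.05 | 0.51 |
| BFB | 74.87 | 1 | 1 | 1 | 1 | 2.50 | 1 |
| ηp² [90% CI] | 0.09 [0.02, 0.17] | 0.00 [0, 0.02] | 0.00 [0, 0.05] | 0.00 [0, 0.03] | 0.00 [0, 0.01] | 0.03 [0, 0.09] | 0.00 [0, 0.04] |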
Note. The corrected threshold was p < 0.004.
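The ηp² values in Tables 3 and 4 can be recovered from the F statistics, which gives a quick way to check the flattened values. This is a minimal sketch under two assumptions not stated in this section: every factor has two levels (so the effect has 1 degree of freedom), and the error degrees of freedom equal N minus the number of design cells.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic:
    eta_p^2 = F * df_effect / (F * df_effect + df_error).
    """
    num = f * df_effect
    return num / (num + df_error)


# Assumption: all 136 children enter a fully between-subjects
# Condition x Grade x Gender design (8 cells), so df_error = 136 - 8 = 128.
DF_ERROR = 136 - 8

for effect, f in [("Condition", 12.15), ("Grade x Gender", 3.96)]:
    eta = partial_eta_squared(f, 1, DF_ERROR)
    print(f"{effect}: partial eta squared = {eta:.2f}")
```

Both printed values (0.09 and 0.03) match Table 4. For the two-factor analyses in Table 3, assuming df_error = 136 - 4 = 132, F = 52.35 likewise gives 0.28 for the grade effect on number facts.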
Table 5. Percentages of children’s justifications when detecting counting pseudoerrors.
| Pseudoerror Detection Task | First Graders (%) | Second Graders (%) |
|---|---|---|
| Without cardinal value | | |
| Accepted and correct justification | 27.65 | 26.84 |
| Rejected by violation of conventional rules | 66.29 | 68.01 |
| Others | 6.07 | 5.15 |
| With cardinal value | | |
| Accepted and correct justification | 41.79 | 45.59 |
| Rejected by violation of conventional rules | 53.93 | 48.16 |
| Others | 4.29 | 6.25 |
Table 6. Bayes factor for the main effect of Gender.
| Measure | Bayes Factor | Gender | Strength of Evidence |
|---|---|---|---|
| K-BIT IQ | B10 | 0.22 | Anecdotal |
| | B01 | 4.57 | Moderate |
| TEMA-3 Math ability score (MAS) | B10 | 40.41 | Very strong |
| | B01 | 0.03 | Anecdotal |
| TEMA-3 Informal mathematics: numbering | B10 | 0.27 | Anecdotal |
| | B01 | 3.73 | Moderate |
| TEMA-3 Informal mathematics: number comparisons | B10 | 0.26 | Anecdotal |
| | B01 | 3.81 | Moderate |
| TEMA-3 Informal mathematics: calculation | B10 | 1.20 | Anecdotal |
| | B01 | 0.38 | Anecdotal |
| TEMA-3 Informal mathematics: concepts | B10 | 0.20 | Anecdotal |
| | B01 | 5.11 | Moderate |
| TEMA-3 Formal mathematics: numeral literacy | B10 | 0.24 | Anecdotal |
| | B01 | 4.21 | Moderate |
| TEMA-3 Formal mathematics: number facts | B10 | 33.45 | Very strong |
| | B01 | 0.03 | Anecdotal |
| TEMA-3 Formal mathematics: calculation | B10 | 0.31 | Anecdotal |
| | B01 | 3.19 | Moderate |
| TEMA-3 Formal mathematics: concepts | B10 | 0.60 | Anecdotal |
| | B01 | 1.68 | Anecdotal |
| Pseudoerror detection task | B10 | 0.18 | Anecdotal |
| | B01 | 5.44 | Moderate |
Note. B10: Bayes factor for the alternative hypothesis (the existence of gender differences); B01: Bayes factor for the null hypothesis (absence of gender differences).
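Two relationships in Table 6 can be made explicit. B01 is the reciprocal of B10, and the verbal labels follow fixed cut-offs. The sketch below assumes the common Jeffreys-style cut-offs of 3, 10, 30 and 100. The labels "Strong" and "Extreme" do not occur in Table 6 and are included only to complete that assumed scale. As in the table, any value below 3, including values below 1, is labelled "Anecdotal". Because the printed Bayes factors are rounded, a recomputed reciprocal can differ from the printed one in the second decimal.

```python
def b01_from_b10(b10: float) -> float:
    """Bayes factor for H0 over H1 is the reciprocal of the factor for H1 over H0."""
    if b10 <= 0:
        raise ValueError("Bayes factors must be positive")
    return 1.0 / b10


def evidence_label(bf: float) -> str:
    """Verbal label using the assumed cut-offs 3, 10, 30 and 100.

    As in Table 6, any value below 3 (including values below 1) is
    labelled "Anecdotal".
    """
    if bf < 3:
        return "Anecdotal"
    if bf < 10:
        return "Moderate"
    if bf < 30:
        return "Strong"
    if bf < 100:
        return "Very strong"
    return "Extreme"


# K-BIT IQ row of Table 6: B10 = 0.22 (rounded), so B01 is about 4.5 (table: 4.57).
b10 = 0.22
b01 = b01_from_b10(b10)
print(f"B10 = {b10:.2f} ({evidence_label(b10)}), B01 = {b01:.2f} ({evidence_label(b01)})")
```

Reading a row as a pair in this way shows which hypothesis the data favour. For the K-BIT, B01 of about 4.5 counts as moderate evidence for the absence of gender differences.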


MDPI and ACS Style

Escudero, A.; Lago, M.O.; Dopico, C. Gender Similarities in the Mathematical Performance of Early School-Age Children. Mathematics 2022, 10, 3094. https://doi.org/10.3390/math10173094

AMA Style

Escudero A, Lago MO, Dopico C. Gender Similarities in the Mathematical Performance of Early School-Age Children. Mathematics. 2022; 10(17):3094. https://doi.org/10.3390/math10173094

Chicago/Turabian Style

Escudero, Ana, Mᵃ Oliva Lago, and Cristina Dopico. 2022. "Gender Similarities in the Mathematical Performance of Early School-Age Children" Mathematics 10, no. 17: 3094. https://doi.org/10.3390/math10173094

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
