Evaluating the quality of children's utterances in adult-child dialogue remains challenging because context-sensitive metrics are lacking. Common proxies such as Mean Length of Utterance (MLU), lexical diversity (vocd-D), and readability indices (Flesch-Kincaid Grade Level, Gunning Fog Index) are dominated by length and ignore conversational context, missing aspects of response quality such as reasoning depth, topic maintenance, and discourse planning. We introduce an LLM-as-a-judge framework that first classifies the Previous Adult Utterance Type and then scores the child's response along two axes: Expansion (contextual elaboration and inferential depth) and Independence (the child's contribution to advancing the discourse). These axes reflect fundamental dimensions of child language development: Expansion captures elaboration, clause combining, and the use of causal and contrastive connectives, while Independence captures initiative, topic control, decreasing reliance on adult scaffolding through growing self-regulation, and audience design. We establish developmental validity by showing age-related patterns and demonstrate predictive value by improving age estimation over common baselines.
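The two-stage judging procedure can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `judge` callable stands in for any LLM call, and the prompt wording, the single-label reply for the adult utterance type, the `expansion=<int> independence=<int>` reply format, and the integer scores are all hypothetical. The point is the ordering: the child's reply is scored only after, and conditioned on, the classified adult utterance type.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical judge interface: takes a prompt string, returns the model's text reply.
Judge = Callable[[str], str]


@dataclass
class TurnScore:
    adult_type: str    # label assigned to the previous adult utterance
    expansion: int     # contextual elaboration and inferential depth
    independence: int  # child's contribution to advancing the discourse


def score_turn(adult_utt: str, child_utt: str, judge: Judge) -> TurnScore:
    # Stage 1: classify the previous adult utterance type.
    type_prompt = (
        "Classify the type of this adult utterance. Reply with a single label.\n"
        f"Adult: {adult_utt}"
    )
    adult_type = judge(type_prompt).strip()

    # Stage 2: score the child's reply, conditioned on the adult utterance type.
    score_prompt = (
        f"The adult utterance is of type '{adult_type}'.\n"
        f"Adult: {adult_utt}\nChild: {child_utt}\n"
        "Reply as 'expansion=<int> independence=<int>'."
    )
    reply = judge(score_prompt)
    # Parse "key=value" pairs from the hypothetical reply format.
    fields = dict(part.split("=") for part in reply.split())
    return TurnScore(adult_type, int(fields["expansion"]), int(fields["independence"]))


def fake_judge(prompt: str) -> str:
    # Stand-in for an LLM call, for illustration only.
    if prompt.startswith("Classify"):
        return "open question"
    return "expansion=3 independence=2"


print(score_turn("Why did the ice melt?", "Because the sun was hot and it got warm.", fake_judge))
# TurnScore(adult_type='open question', expansion=3, independence=2)
```

Swapping `fake_judge` for a real model client leaves the two-stage structure unchanged; only the prompt and parsing conventions would need to match that model's behavior.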
Social robots, owing to their embodied physical presence in human spaces and their ability to interact directly with users and their environment, have great potential to support children in various activities in education, healthcare and daily life. Child-Robot Interaction (CRI), like any domain involving children, inevitably faces the major challenge of designing generalized strategies to work with unique, turbulent and highly diverse individuals. Addressing this challenge requires combining the robot-centered perspective, i.e. what robots technically can and are best positioned to do, with the child-centered perspective, i.e. what children may gain from the robot and how the robot should act to best support them in reaching the goals of the interaction. This article aims to help researchers bridge the two perspectives and proposes informing the development of CRI scenarios with insights from child psychology and child development theories. To that end, we review the outcomes of CRI studies, outline common trends and challenges, and identify two key factors from child psychology that impact child-robot interactions, especially in a long-t
This report explores the potential implications of rapidly integrating Artificial Intelligence (AI) applications into children's environments. The introduction of AI into our daily lives warrants scrutiny, given the significant role of the environment in shaping cognition, socio-emotional skills, and behaviors, especially during the first 25 years of cerebral development. As AI becomes prevalent in educational and leisure activities, it will significantly modify the experiences of children and adolescents, presenting both challenges and opportunities for their developmental trajectories. This analysis was informed by consultations with 15 experts from pertinent disciplines (AI, product development, child development, and neurosciences), along with a comprehensive review of the scientific literature on child development and child-technology interactions. Overall, AI experts anticipate that AI will transform leisure activities, revolutionize education, and redefine human-machine interactions. While AI offers substantial benefits in fostering interactive engagement, it also poses risks that require careful consideration, especially during sensitive developmental periods. The repor
The rapid dissemination of misinformation on the internet complicates decision-making for individuals seeking reliable information, particularly parents researching child development topics. This misinformation can lead to adverse consequences, such as inappropriate treatment of children based on myths. While previous research has used text-mining techniques to predict child abuse cases, child development myths and facts have not yet been analysed in this way. This study addresses this gap by applying text mining techniques and classification models to distinguish between myths and facts about child development, using newly gathered data from publicly available websites. The research methodology involved several stages. First, text mining techniques were employed to pre-process the data and improve accuracy. Subsequently, the structured data was analysed using six robust Machine Learning (ML) classifiers and one Deep Learning (DL) model, with two feature extraction techniques, and their performance was assessed across three different training-testing splits. To ensure the reliability of the results, cross-validation was performed using both k-fold and l
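As a rough illustration of the evaluation protocol, the sketch below implements plain k-fold cross-validation in standard-library Python. The abstract does not name its classifiers or fold count, so `majority_baseline` is a hypothetical stand-in (it always predicts the most frequent training label), not one of the six ML classifiers, and the statements are toy placeholders rather than data from the study. It assumes `k` does not exceed the number of examples, so every fold is non-empty.

```python
import random
from collections import Counter
from typing import Callable, Sequence


def k_fold_indices(n: int, k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Fold i takes every k-th shuffled index starting at position i.
    return [idx[i::k] for i in range(k)]


def cross_validate(X: Sequence, y: Sequence, fit_predict: Callable, k: int = 5) -> list[float]:
    """Return per-fold accuracy; fit_predict(X_train, y_train, X_test) -> predictions."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        accuracies.append(correct / len(test_idx))
    return accuracies


def majority_baseline(X_train, y_train, X_test):
    # Trivial stand-in classifier: always predict the most frequent training label.
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)


# Toy placeholder data, not from the study.
statements = ["toy statement %d" % i for i in range(10)]
labels = ["myth", "fact"] * 5
print(cross_validate(statements, labels, majority_baseline, k=5))
```

Each example lands in exactly one test fold, so every statement is evaluated once; a real classifier would replace `majority_baseline` while keeping the same `fit_predict` signature.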
Children increasingly use applications that rely on Artificial Intelligence / Machine Learning (AI/ML). Given the propensity of such applications to propagate existing social, gender, and racial biases, it is imperative to design and develop child-centered AI applications. Furthermore, children should have the opportunities and skills to critically reflect on current applications and to envision and design better AI/ML applications that are ethical, specifically, inclusive and fair. In our work, we focus on child-centered AI and inclusion. In our research with schools in India and Finland, we used a two-pronged approach to inclusion together with design futuring, through which children critically considered future technology design for all. In this paper, we present three cases of this work: a study with students at a school in New Delhi and two studies with students at schools in Oulu. Our work shows how to design for inclusion, by designing for all, and how to design inclusively, by inviting children to envision the future through design futuring approaches.
Child speech differs from adult speech in acoustics, prosody, and language development, and disfluencies (repetitions, prolongations, blocks) further challenge Automatic Speech Recognition (ASR) and downstream Natural Language Processing (NLP). Recent large audio-language models (LALMs) demonstrate strong cross-modal audio understanding; however, their behavior on disfluent child speech remains underexplored. We evaluate several state-of-the-art LALMs in two settings: an interview (mixed speakers) and a reading task (single child). The tasks are (i) single-channel source separation to isolate the child and (ii) child-only summarization that preserves clinically relevant disfluencies and avoids adult-speech leakage. Evaluation combines Large Language Model (LLM)-as-a-judge scoring, human expert ratings, and BERTScore (F1), and we report inter-model and model-human agreement to assess reliability. Our findings delineate the conditions under which LALMs produce faithful child-only summaries from mixed audio and where they fail, offering practical guidance for clinical and educational deployments. We provide prompts and evaluation scripts to support replication.
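BERTScore compares a candidate and a reference text by greedily matching token embeddings with cosine similarity. The sketch below shows only that matching step on pre-computed vectors; it assumes toy embeddings in place of contextual embeddings from a pretrained model, and omits the optional idf weighting and baseline rescaling, so its numbers are not comparable to scores reported by a BERTScore implementation.

```python
import math

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def greedy_match_f1(cand: list[Vector], ref: list[Vector]) -> float:
    """BERTScore-style F1 from pre-computed token embeddings (no idf weighting, no rescaling)."""
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(cosine(r, c) for c in cand) for r in ref) / len(ref)
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(cosine(c, r) for r in ref) for c in cand) / len(cand)
    if precision + recall <= 0:
        return 0.0  # guard against division by zero for fully dissimilar toy vectors
    return 2 * precision * recall / (precision + recall)


# Toy 2-d "embeddings": the candidate covers only the first reference token.
print(round(greedy_match_f1([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]), 4))  # 0.6667
```

In the example, precision is 1.0 (the single candidate token matches perfectly) but recall is 0.5 (half the reference is uncovered), which is why F1 penalizes summaries that omit reference content.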
Understanding non-human primate behavior is crucial for improving animal welfare, modeling social behavior, and gaining insights into both distinctly human and shared behaviors. Despite recent advances in computer vision, automated analysis of primate behavior remains challenging due to the complexity of their social interactions and the lack of specialized algorithms. Existing methods often struggle with the nuanced behaviors and frequent occlusions characteristic of primate social dynamics. This study aims to develop an effective method for automated detection, tracking, and recognition of chimpanzee behaviors in video footage. Here we show that our proposed method, AlphaChimp, an end-to-end approach that simultaneously detects chimpanzee positions and estimates behavior categories from videos, significantly outperforms existing methods in behavior recognition. AlphaChimp achieves approximately 10% higher tracking accuracy and a 20% improvement in behavior recognition compared to state-of-the-art methods, particularly excelling in the recognition of social behaviors. This superior performance stems from AlphaChimp's innovative architecture, which integrates temporal feature fusio
Language models (LMs) have demonstrated remarkable proficiency in generating linguistically coherent text, sparking discussions about their relevance to understanding human language learnability. However, a significant gap exists between the training data for these models and the linguistic input a child receives. LMs are typically trained on data that is orders of magnitude larger and fundamentally different from child-directed speech (Warstadt and Bowman, 2022; Warstadt et al., 2023; Frank, 2023a). Addressing this discrepancy, our research focuses on training LMs on subsets of a single child's linguistic input. Previously, Wang, Vong, Kim, and Lake (2023) found that LMs trained in this setting can form syntactic and semantic word clusters and develop sensitivity to certain linguistic phenomena, but they only considered LSTMs and simpler neural networks trained on just one single-child dataset. Here, to examine the robustness of learnability from single-child input, we systematically train six different model architectures on five datasets (three single-child datasets and two baselines). We find that the models trained on single-child datasets showed consistent results that matched with previo
Automated recognition of autistic behaviors in children is essential for early intervention and objective clinical assessment. However, the development of robust models is severely hindered by strict privacy regulations (e.g., HIPAA) and the sensitive nature of pediatric data, which prevents the centralized aggregation of clinical datasets. Furthermore, individual clinical sites often suffer from data scarcity, making it difficult to learn generalized behavior patterns or tailor models to site-specific patient distributions. To address these challenges, we turn to Federated Learning (FL), which decouples model training from raw data access, enabling multi-site collaboration while maintaining strict data residency. In this paper, we present the first study exploring Federated Learning for pose-based child autism behavior recognition. Our framework employs a two-layer privacy protection mechanism: human skeletal abstraction removes identifiable visual information from the raw RGB videos, and FL ensures that sensitive pose data remains within each clinic. This approach leverages distributed clinical data to learn generalized representations while providing the flexibility for
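The abstract does not state which aggregation rule the framework uses; the sketch below assumes the widely used FedAvg rule, in which the server averages client model parameters weighted by each client's number of local training samples. Parameters are shown as hypothetical flat lists of floats, one per clinic. Only these parameter vectors are exchanged, which is what keeps pose data inside each site.

```python
from typing import Sequence

Weights = list[float]


def fed_avg(client_weights: Sequence[Weights], client_sizes: Sequence[int]) -> Weights:
    """FedAvg aggregation: average client parameters weighted by local dataset size.

    Only parameter vectors reach the server; raw pose data never leaves a clinic.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(n_params)
    ]


# Two hypothetical clinics: A with 100 samples, B with 300.
global_weights = fed_avg([[1.0, 2.0], [3.0, 4.0]], [100, 300])
print(global_weights)  # [2.5, 3.5]
```

Weighting by sample count lets larger clinics contribute proportionally more, which matters when some sites suffer from the data scarcity described above; in a full training loop each round would broadcast `global_weights` back to the clinics for further local training.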
We explore ideas and inclusive practices for designing and testing child-centered artificially intelligent technologies for neurodivergent children. AI is promising for supporting the social communication, self-regulation, and sensory processing challenges common among neurodivergent children. The authors, both neurodivergent individuals and related to neurodivergent people, draw from their professional and personal experiences to offer insights on creating AI technologies that are accessible and include input from neurodivergent children. We offer ideas for designing AI technologies for neurodivergent children and considerations for including them in the design process while accounting for their sensory sensitivities. We conclude by emphasizing the importance of adaptable and supportive AI technologies and design processes and call for further conversation to refine child-centered AI design and testing methods.
Around ten percent of children may present with a disorder in which language does not develop as expected. This often affects vocabulary skills, i.e., finding the words to express wants, needs and ideas, which can influence behaviours linked to wellbeing and daily functioning, such as concentration, independence, social interactions and managing emotions. Without specialist support, needs can increase in severity and continue into adulthood. Among such support, known as interventions, those showing the strongest evidence for improving vocabulary, with some signs of improved behaviour and wellbeing, use word webs. These are diagrams consisting of lines that connect sound and meaning information about a word to strengthen the child's word knowledge and use. The diagrams resemble what are commonly known as mind-maps and are widely used by Speech and Language Therapists in partnership with school educators to help children with language difficulties. In addition, interventions delivered through mobile devices have in some cases led to increased vocabulary gains, with positive effects on wellbeing and academic attainment. With advances in technology and availability of user-friendly mobile
Large language models (LLMs) often exhibit undesirable behaviors, such as hallucinations and sequence repetitions. We propose to view these behaviors as fallbacks that models exhibit under epistemic uncertainty, and investigate the connection between them. We categorize fallback behaviors (sequence repetitions, degenerate text, and hallucinations) and extensively analyze them in models from the same family that differ in the number of pretraining tokens, parameter count, or the inclusion of instruction-following training. Our experiments reveal a clear and consistent ordering of fallback behaviors across all these axes: the more advanced an LLM is (i.e., trained on more tokens, has more parameters, or is instruction-tuned), the further its fallback behavior shifts from sequence repetitions to degenerate text and then to hallucinations. Moreover, the same ordering is observed during the generation of a single sequence, even for the best-performing models; as uncertainty increases, models shift from generating hallucinations to producing degenerate text and finally sequence repetitions. Lastly, we demonstrate that while common decoding techniques, such as random sampling, alleviate unwanted be
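To make the decoding contrast concrete, the sketch below compares greedy decoding, which always emits the highest-scoring token and is the deterministic setting in which repetition loops are commonly observed, with random sampling from a temperature-scaled softmax. The logits and temperature values are illustrative assumptions; the sketch shows a single decoding step only, not the paper's experimental setup.

```python
import math
import random


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(logits: list[float]) -> int:
    # Deterministic: always pick the highest-scoring token index.
    return max(range(len(logits)), key=logits.__getitem__)


def sample(logits: list[float], temperature: float, rng: random.Random) -> int:
    # Stochastic: draw a token index from the temperature-scaled distribution.
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5]  # illustrative scores for three candidate tokens
rng = random.Random(0)
print(greedy(logits))  # always 0
print([sample(logits, 1.0, rng) for _ in range(10)])  # varies across draws
```

Lower temperatures sharpen the distribution toward the greedy choice, while higher temperatures spread probability mass over less likely tokens.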
Despite advancements in ASR, child speech recognition remains challenging due to acoustic variability and limited annotated data. While fine-tuning adult ASR models on child speech is common, comparisons with flat-start training remain underexplored. We compare flat-start training across multiple datasets, SSL representations (WavLM, XEUS), and decoder architectures. Our results show that SSL representations are biased toward adult speech, with flat-start training on child speech mitigating these biases. We also analyze model scaling, finding consistent improvements up to 1B parameters, beyond which performance plateaus. Additionally, age-related ASR and speaker verification analyses highlight the limitations of models trained on undisclosed data, such as Whisper, emphasizing the need for open-data models for reliable child speech research. All investigations are conducted using ESPnet, and our publicly available benchmark provides insights into training strategies for robust child speech processing.
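The abstract does not name its evaluation metric; word error rate (WER) is the standard ASR metric, so the sketch below assumes it. WER is the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. This standalone function is for illustration and is not ESPnet's scoring tool; it assumes a non-empty reference and whitespace-tokenized, already-normalized text.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # Before processing ref word i, prev[j] is the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substitution (sat -> sit) and one deletion ("the") over six reference words.
print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))  # 0.333
```

Because insertions are counted, WER can exceed 1.0 when a hypothesis is much longer than its reference, for example when adult speech leaks into a child-only transcript.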
Joint reading is a key activity for early learners, with caregiver-child interactions such as questioning and feedback playing an essential role in children's cognitive and linguistic development. However, for some parents, actively engaging children in storytelling can be challenging. To address this, we introduce TaleMate, a platform designed to enhance shared reading by leveraging conversational agents, which have been shown to support children's engagement and learning. TaleMate enables a dynamic, participatory reading experience where parents and children can choose which characters they wish to embody. Moreover, the system addresses the challenges posed by digital reading tools, such as decreased parent-child interaction, and builds on the benefits of both traditional and digital reading techniques. TaleMate offers an innovative approach to fostering early reading habits, bridging the gap between traditional joint reading practices and the digital reading landscape.
This paper proposes a novel metaheuristic, the Child Drawing Development Optimization (CDDO) algorithm, inspired by children's learning behaviour and cognitive development and using the golden ratio to optimize the beauty behind their art. The golden ratio is closely tied to the Fibonacci sequence, named after the mathematician Fibonacci: the ratio of consecutive numbers in the sequence approaches the golden ratio (about 1.618), which is prevalent in nature, art, architecture, and design. CDDO uses the golden ratio and mimics cognitive learning and the stages of children's drawing development, from the scribbling stage to the advanced pattern-based stage. The hand pressure width, length and golden ratio of the child's drawing are tuned to attain better results. This helps children evolve, improve their intelligence, and collectively achieve shared goals. CDDO shows superior performance in finding the global optimum on 19 benchmark functions. Its results are evaluated against several state-of-the-art algorithms, including PSO, DE, WOA, GSA, and FEP. The performance of the CDDO is assessed, and the test result shows that CDDO is relatively competitive th
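The relationship between the Fibonacci sequence and the golden ratio can be checked numerically: the ratio of consecutive terms alternates above and below φ = (1 + √5)/2 and converges to it. This sketch only illustrates that property; it does not reproduce how CDDO uses the ratio in its update rules.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, about 1.6180339887


def fibonacci_ratios(n: int) -> list[float]:
    """Ratios F(k+1)/F(k) for the first n consecutive pairs, starting from F(1) = F(2) = 1."""
    a, b = 1, 1
    ratios = []
    for _ in range(n):
        ratios.append(b / a)
        a, b = b, a + b  # advance to the next consecutive pair
    return ratios


print(fibonacci_ratios(6))  # [1.0, 2.0, 1.5, 1.6666666666666667, 1.6, 1.625]
print(abs(fibonacci_ratios(30)[-1] - PHI) < 1e-10)  # True
```

The error shrinks roughly by a factor of φ² per step, so a few dozen terms already agree with φ to about ten decimal places.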
This article provides a comprehensive overview of recent research in the area of Child-Computer Interaction (CCI). The main contributions of the present article are two-fold. First, we present a novel longitudinal CCI database named ChildCIdbLong, which comprises data from over 600 children aged 18 months to 8 years, acquired continuously over 4 academic years (2019-2023). As a result, ChildCIdbLong comprises over 12K test acquisitions on a tablet device. ChildCIdbLong includes different tests requiring different touch and stylus gestures, enabling the evaluation of praxical and cognitive skills such as attentional, visuo-spatial, and executive skills, among others. In addition to the ChildCIdbLong database, we propose a novel quantitative metric called Test Quality (Q), designed to measure the motor and cognitive development of children through their interaction with a tablet device. To make the proposed Q metric easier to interpret, popular percentile-based growth representations are introduced for each test, providing a two-dimensional space to compare children's development with respect to the typical age skills of the population. The results achieved in the pr
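The definition of Q is not given here, so the sketch below only illustrates the percentile idea behind growth representations: placing one child's score within a reference distribution of scores from the same age band. The age-band key and the score values are hypothetical toy data, not values from ChildCIdbLong, and counting ties as half is one common percentile-rank convention among several.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score: float, reference: list[float]) -> float:
    """Percentage of reference scores below `score`, counting ties as half."""
    ordered = sorted(reference)
    below = bisect_left(ordered, score)           # strictly lower scores
    ties = bisect_right(ordered, score) - below   # scores equal to `score`
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical reference Q scores per age band (toy values, not from ChildCIdbLong).
reference_q = {"3-4 years": [0.2, 0.4, 0.6, 0.8]}
print(percentile_rank(0.7, reference_q["3-4 years"]))  # 75.0
```

Plotting such percentiles against age for each test yields the kind of two-dimensional growth chart described above, where a child's position can be compared with the typical skills of their age group.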
Researchers and policy-makers have started creating frameworks and guidelines for building machine-learning (ML) pipelines with a human-centered lens. ML pipelines comprise all the steps necessary to develop ML systems (e.g., developing a predictive keyboard). Meanwhile, a child-centered focus in developing ML systems has recently been gaining interest as children become users of these products. These efforts predominantly focus on children's interaction with ML-based systems. However, in our experience, ML pipelines have yet to be adapted using a child-centered lens. In this paper, we list the questions we ask ourselves in adapting human-centered ML pipelines to child-centered ones. We also summarize two case studies of building end-to-end ML pipelines for children's products.
This study explores the marriage matching of only-child individuals and its outcome. Specifically, we analyze two aspects. First, we investigate how marital status (i.e., marriage with an only child, marriage with a non-only child, and remaining single) differs between only children and non-only children. This analysis allows us to determine whether people choose mates in a positive or negative assortative manner with respect to only-child status, and to predict whether only-child individuals benefit from marriage matching premiums or are subject to penalties regarding partner attractiveness. Second, we measure the premium/penalty by the size of the gap in partner's socioeconomic status (SES, here, years of schooling) between only-child and non-only-child individuals. Conventional economic theory and the observed pattern of positive assortative mating on only-child status predict that only-child individuals are subject to a matching penalty in the marriage market, especially when their partner is also an only child. Furthermore, our estimation confirms that, especially among women marrying an only-child husband, only children are penalized in terms of 0.57-years-lower educational
This document proposes an algorithm for a mobile application designed to monitor multidimensional child growth through digital phenotyping. Digital phenotyping offers a unique opportunity to collect and analyze high-frequency data in real time, capturing behavioral, psychological, and physiological states of children in naturalistic settings. Traditional models of child growth primarily focus on physical metrics, often overlooking multidimensional aspects such as emotional, social, and cognitive development. In this paper, we introduce a Bayesian artificial intelligence (AI) algorithm that leverages digital phenotyping to create a Multidimensional Index of Child Growth (MICG). This index integrates data from various dimensions of child development, including physical, emotional, cognitive, and environmental factors. Through probabilistic modeling, the proposed algorithm dynamically updates its estimates based on data collected by the mobile app used by mothers and children. The app also infers uncertainty from response times, adjusting the importance of each dimension of child growth accordingly. Our contribution applies state-of-the-art technology to track multidimensional
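The abstract does not specify the Bayesian model, so the following is a minimal sketch under stated assumptions: each growth dimension holds a Gaussian belief updated by the normal-normal conjugate rule, a hypothetical mapping `obs_variance_from_rt` treats slower responses as noisier observations, and the composite index weights each dimension by its posterior precision (inverse variance). All names and constants are illustrative, not the MICG algorithm itself.

```python
from dataclasses import dataclass


@dataclass
class Belief:
    mean: float
    var: float  # posterior variance (uncertainty) for one growth dimension


def update(prior: Belief, obs: float, obs_var: float) -> Belief:
    """Normal-normal conjugate update: precisions add, means are precision-weighted."""
    precision = 1 / prior.var + 1 / obs_var
    mean = (prior.mean / prior.var + obs / obs_var) / precision
    return Belief(mean, 1 / precision)


def obs_variance_from_rt(response_time_s: float, base_var: float = 1.0,
                         ref_rt_s: float = 5.0) -> float:
    # Hypothetical mapping: answers slower than ref_rt_s are treated as noisier observations.
    return base_var * max(response_time_s / ref_rt_s, 1.0)


def index(beliefs: dict[str, Belief]) -> float:
    """Hypothetical composite index: dimensions with lower posterior variance get more weight."""
    weights = {k: 1 / b.var for k, b in beliefs.items()}
    return sum(weights[k] * b.mean for k, b in beliefs.items()) / sum(weights.values())


# Toy example: one slow (10 s) answer updates the "cognitive" dimension.
beliefs = {"physical": Belief(1.0, 0.5), "cognitive": Belief(0.0, 1.0)}
beliefs["cognitive"] = update(beliefs["cognitive"], obs=2.0, obs_var=obs_variance_from_rt(10.0))
print(beliefs["cognitive"], index(beliefs))
```

Under this design, an uncertain answer moves the belief less and, through the larger posterior variance, also reduces that dimension's weight in the index.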
Recently developed methods for video analysis, especially models for pose estimation and behavior classification, are transforming behavioral quantification to be more precise, scalable, and reproducible in fields such as neuroscience and ethology. These tools overcome long-standing limitations of manual scoring of video frames and traditional "center of mass" tracking algorithms to enable video analysis at scale. The expansion of open-source tools for video acquisition and analysis has led to new experimental approaches to understand behavior. Here, we review currently available open-source tools for video analysis and discuss how to set up these methods for labs new to video recording. We also discuss best practices for developing and using video analysis methods, including community-wide standards and critical needs for the open sharing of datasets and code, more widespread comparisons of video analysis methods, and better documentation for these methods, especially for new users. We encourage broader adoption and continued development of these tools, which have tremendous potential for accelerating scientific progress in understanding the brain and behavior.
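For contrast with pose estimation, the traditional "center of mass" approach can be sketched in a few lines: subtract a static background, threshold the difference into a binary mask, and reduce the animal to the centroid of its foreground pixels. Frames here are hypothetical grayscale images given as nested lists, and the threshold is an illustrative parameter. Because each frame collapses to a single point, posture and limb movements are lost, which is the limitation that pose-estimation tools address.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def foreground_mask(frame: Frame, background: Frame, threshold: int) -> Frame:
    """Binary mask: 1 where the pixel differs from the background by more than threshold."""
    return [[1 if abs(p - b) > threshold else 0 for p, b in zip(fr, br)]
            for fr, br in zip(frame, background)]


def center_of_mass(mask: Frame) -> Optional[Tuple[float, float]]:
    """Centroid (row, col) of foreground pixels, or None if the frame is empty."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pixels:
        return None
    n = len(pixels)
    return (sum(r for r, _ in pixels) / n, sum(c for _, c in pixels) / n)


# Toy 3x3 frame with a vertical bright "animal" in the middle column.
background = [[0, 0, 0] for _ in range(3)]
frame = [[0, 9, 0], [0, 9, 0], [0, 9, 0]]
print(center_of_mass(foreground_mask(frame, background, threshold=5)))  # (1.0, 1.0)
```

Applying `center_of_mass` to each frame in turn gives a trajectory of single points, which supports measures such as distance travelled but not the behavior classification that pose-based tools enable.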