Search - ResearchTracker

In this paper, we present TituLLMs, the first large pretrained Bangla LLMs, available in 1B and 3B parameter sizes. Due to computational constraints during both training and inference, we focused on smaller models. To train TituLLMs, we collected a pretraining dataset of approximately 37 billion tokens. We extended the Llama-3.2 tokenizer to incorporate language- and culture-specific knowledge, which also enables faster training and inference. Because benchmarking datasets for evaluating LLMs on Bangla were lacking, we developed five benchmarking datasets. We benchmarked various LLMs, including TituLLMs, and demonstrated that TituLLMs outperform their initial multilingual versions. However, this is not always the case, highlighting the complexities of language adaptation. Our work lays the groundwork for adapting existing multilingual open models to other low-resource languages. To facilitate broader adoption and further research, we have made the TituLLMs models and benchmarking datasets publicly available (https://huggingface.co/collections/hishab/titulm-llama-family-6718d31fc1b83529276f490a).

Can LLMs Produce Original Astronomy Research in a Semester? A Graduate Class Experiment

arXiv, 2026-03-27. Authors: Ann Zabludoff, Chen-Yu Chuang, Parker Thomas Johnson

We discuss the results of using large language models (LLMs) to conduct original scientific research in an unfamiliar subject area during the Fall 2025 semester. Students in a graduate astronomy and astrophysics course were asked to test whether LLMs could help them complete research tasks faster and at the level of detail and accuracy required for scientific publication. Most students employed LLMs for a total of 5-10 hours. While all students completed a draft paper on an unsolved problem related to galaxies by semester's end, their impressions of the models' value varied. About half thought that the models saved them time. Many noted that LLMs failed to provide appropriately detailed insights or steps toward addressing open, niche questions over a several-month timeframe. The LLMs also frequently (about 20% of the time) returned false citations, links, or summaries of papers. The models struggled with generating complex functional code, accessing online packages or Application Programming Interfaces (APIs), and retrieving astronomical datasets from existing archives. In writing code and in chats, the LLMs made implicit, overly simplifying assumptions and often doubled down even after

Search results: LLMs

TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking

Can LLMs Produce Original Astronomy Research in a Semester? A Graduate Class Experiment

F-BFQ: Flexible Block Floating-Point Quantization Accelerator for LLMs

Understanding Syllogistic Reasoning in LLMs from Formal and Natural Language Perspectives

When Cow Urine Cures Constipation on YouTube: Limits of LLMs in Detecting Culture-specific Health Misinformation

LLMs Prompted for Graphs: Hallucinations and Generative Capabilities

The Unlikely Duel: Evaluating Creative Writing in LLMs through a Unique Scenario

Can LLMs Replace Human Evaluators? An Empirical Study of LLM-as-a-Judge in Software Engineering

Are LLMs More Skeptical of Entertainment News?

SAFT: Structure-Aware Fine-Tuning of LLMs for AMR-to-Text Generation

WASIL: In-the-Wild Arabic Spoken Interactions with LLMs

Evaluation of LLMs for Process Model Analysis and Optimization

LLMs Are Not Scorers: Rethinking MT Evaluation with Generation-Based Methods

Towards a General Framework for HTN Modeling with LLMs

A Roadmap to Guide the Integration of LLMs in Hierarchical Planning

Number Representations in LLMs: A Computational Parallel to Human Perception

Combating Misinformation in the Age of LLMs: Opportunities and Challenges

SpokenNativQA: Multilingual Everyday Spoken Queries for LLMs

Emergent Response Planning in LLMs

Reasoning Effort and Problem Complexity: A Scaling Analysis in LLMs