The intersection of visualization and the humanities (VIS*H) is marked by a tension between chasing analytical "insight" and interpretive "meaning." The effectiveness of visualization techniques hinges on established evaluation frameworks that assess both analytical utility and communicative efficacy, creating a potential mismatch with the non-positivist, interpretive aims of humanities scholarship. To examine how this tension manifests in practice, we systematically surveyed 171 VIS*H design studies to analyze their evaluation workflows and rigor according to standard practice. Our findings reveal recurring flaws, such as an over-reliance on monomethod approaches, and show that higher-quality evaluations emerge from workflows that effectively triangulate diverse evidence. From these findings, we derive recommendations to refine quality and validation criteria for humanities visualizations, and juxtapose them with ongoing critical debates in the field, ultimately arguing for a paradigm shift that can reconcile the advantages of established validation techniques with the interpretive depth required for humanistic inquiry.
The accelerated evolution of digital infrastructures and algorithmic systems is reshaping how the humanities engage with knowledge and culture. Rooted in the traditions of Digital Humanities and Digital Humanism, the concept of "Cyber Humanities" proposes a critical reconfiguration of humanistic inquiry for the post-digital era. This Manifesto introduces a flexible framework that integrates ethical design, sustainable digital practices, and participatory knowledge systems grounded in human-centered approaches. By means of a Decalogue of foundational principles, the Manifesto invites the scientific community to critically examine and reimagine the algorithmic infrastructures that influence culture, creativity, and collective memory. Rather than being a simple extension of existing practices, "Cyber Humanities" should be understood as a foundational paradigm for humanistic inquiry in a computationally mediated world. Keywords: Cyber Humanities, Digital Humanities, Transdisciplinary Epistemology, Algorithmic Reflexivity, Human-centered AI, Ethics-by-Design, Knowledge Ecosystems, Digital Sovereignty, Cognitive Infrastructures
Optical Character Recognition (OCR) is a critical but error-prone stage in digital humanities text pipelines. While OCR correction improves usability for downstream NLP tasks, common workflows often overwrite intermediate decisions, obscuring how textual transformations affect scholarly interpretation. We present a provenance-aware framework for OCR-corrected humanities corpora that records correction lineage at the span level, including edit type, correction source, confidence, and revision status. Using a pilot corpus of historical texts, we compare downstream named entity extraction across raw OCR, fully corrected text, and provenance-filtered corrections. Our results show that correction pathways can substantially alter extracted entities and document-level interpretations, while provenance signals help identify unstable outputs and prioritize human review. We argue that provenance should be treated as a first-class analytical layer in NLP for digital humanities, supporting reproducibility, source criticism, and uncertainty-aware interpretation.
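As an illustration of what span-level correction lineage might look like in practice, the following is a minimal Python sketch, not the framework described in the abstract. The `Correction` record, its field names, the status values, and the `apply_corrections` helper are all hypothetical; the only elements taken from the abstract are the four recorded attributes (edit type, correction source, confidence, revision status) and the idea of comparing fully corrected text with provenance-filtered text. Spans are assumed not to overlap.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Correction:
    """One span-level correction with its provenance (hypothetical schema)."""
    start: int         # inclusive character offset into the raw OCR text
    end: int           # exclusive character offset into the raw OCR text
    replacement: str   # corrected text for the span
    edit_type: str     # e.g. "substitution", "insertion", "deletion"
    source: str        # e.g. "rule", "model", "human"
    confidence: float  # 0.0 to 1.0, as reported by the correction source
    status: str        # e.g. "accepted", "pending", "rejected"


def apply_corrections(raw: str, corrections: List[Correction],
                      min_confidence: float = 0.0) -> str:
    """Return a corrected copy of `raw`, leaving `raw` itself untouched.

    Corrections that are rejected or fall below `min_confidence` are skipped.
    Spans are assumed not to overlap.
    """
    kept = [c for c in corrections
            if c.status != "rejected" and c.confidence >= min_confidence]
    text = raw
    # Apply from right to left so earlier offsets stay valid.
    for c in sorted(kept, key=lambda c: c.start, reverse=True):
        text = text[:c.start] + c.replacement + text[c.end:]
    return text


raw_ocr = "Tbe city of Vicnna in 1788"
lineage = [
    Correction(0, 3, "The", "substitution", "rule", 0.99, "accepted"),
    Correction(12, 18, "Vienna", "substitution", "model", 0.60, "pending"),
]

fully_corrected = apply_corrections(raw_ocr, lineage)
provenance_filtered = apply_corrections(raw_ocr, lineage, min_confidence=0.9)
```

Because the raw text and the correction records are kept separately, any downstream step (for example, named entity extraction) can be rerun on each variant, and entities that appear only in the low-confidence variant can be flagged for human review.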
Queer students often encounter discrimination and a lack of belonging in their academic environments. This may be especially true in heteronormative male-dominated fields like software engineering, which already faces a diversity crisis. In contrast, disciplines like the humanities have a higher proportion of queer students, suggesting a more diverse academic culture. While prior research has explored queer students' challenges in STEM fields, limited attention has been given to how experiences differ between the sociotechnical, yet highly heteronormative, field of software engineering and the socioculturally inclusive humanities. This study addresses that gap by comparing the experiences of 165 queer software engineering students and 119 queer humanities students. Our findings reveal that queer students in software engineering are less likely to be open about their sexuality, report a significantly lower sense of belonging, and encounter more academic challenges compared to their peers in the humanities. Despite these challenges, queer software engineering students show greater determination to continue their studies. These insights suggest that software engineering could enhance inclusivity by adopt
In recent years, Large Language Models (LLMs) have achieved significant breakthroughs in natural language processing and have gradually been applied to humanities and social sciences research. LLMs are widely applicable in these fields because of their strong text understanding, generation, and reasoning capabilities: they can analyze large-scale text data and draw inferences from it. This article analyzes the large language model DeepSeek-R1 from seven aspects: low-resource language translation, educational question-answering, student writing improvement in higher education, logical reasoning, educational measurement and psychometrics, public health policy analysis, and art education. We then compare the answers given by DeepSeek-R1 in these seven aspects with those given by o1-preview. DeepSeek-R1 performs well in the humanities and social sciences, answering most questions correctly and logically, and can give reasonable analysis processes and explanations. Compared with o1-preview, it can automatically generate reasoning processes and provide more detaile
The development of digital humanities requires scholars to adopt more data-intensive methods and engage in multidisciplinary collaborations. Understanding their collaborative data behaviors is therefore essential for providing more curated data, tailored tools, and a collaborative research environment. This study explores how interdisciplinary researchers collaborate on data activities by conducting focus group interviews with 19 digital humanities research groups. Through inductive coding, the study identified seven primary and supportive data activities and found that different collaborative modes are adopted in different data activities. The collaborative modes include humanities-driven, technically-driven, and balanced, depending on how team members naturally adjusted their responsibilities based on their expertise. These findings establish a preliminary framework for examining collaborative data behavior and interdisciplinary collaboration in digital humanities.
Access to humanities research databases is often hindered by the limitations of traditional interaction formats, particularly in the methods of searching and response generation. This study introduces an LLM-based smart assistant designed to facilitate natural language communication with digital humanities data. The assistant, developed in a chatbot format, leverages the RAG approach and integrates state-of-the-art technologies such as hybrid search, automatic query generation, text-to-SQL filtering, semantic database search, and hyperlink insertion. To evaluate the effectiveness of the system, experiments were conducted to assess the response quality of various language models. The testing was based on the Prozhito digital archive, which contains diary entries from predominantly Russian-speaking individuals who lived in the 20th century. The chatbot is tailored to support anthropology and history researchers, as well as non-specialist users with an interest in the field, without requiring prior technical training. By enabling researchers to query complex databases with natural language, this tool aims to enhance accessibility and efficiency in humanities research. The study highli
Research in Digital Humanities (DH) has been boosted by investment in technology for developing access and interaction tools for handling Humanities and Heritage data. The availability of these tools narrows the gap between DH scholars and data generators, and students at various levels, not only by facilitating access to information but also through the dissemination technologies these tools use, which are designed to improve user experience. Most of the disciplines associated with the humanities involve geographical and temporal references, often integrated. These references have been handled scientifically and pedagogically for centuries, conventionally through maps and timelines. Both of these media have been implemented and used digitally, and their potential has been increased through their innovative integration with narratives, storytelling, and story maps, enabling historical events to be told as narratives superimposed on maps. These can be enhanced when supported by rich data, such as images, videos, sound, and their possible combinations in virtual and augmented reality. In this paper, we describe an initial set of tools which use a
The effects of generative AI are experienced by a broad range of constituencies, but the disciplinary inputs to its development have been surprisingly narrow. Here we present a set of provocations from humanities researchers -- currently underrepresented in AI development -- intended to inform its future applications and enrich ongoing conversations about its uses, impact, and harms. Drawing from relevant humanities scholarship, along with foundational work in critical data studies, we elaborate eight claims with broad applicability to generative AI research: 1) Models make words, but people make meaning; 2) Generative AI requires an expanded definition of culture; 3) Generative AI can never be representative; 4) Bigger models are not always better models; 5) Not all training data is equivalent; 6) Openness is not an easy fix; 7) Limited access to compute enables corporate capture; and 8) AI universalism creates narrow human subjects. We also provide a working definition of humanities research, summarize some of its most salient theories and methods, and apply these theories and methods to the current landscape of AI. We conclude with a discussion of the importance of resisting the
The connection between texts is referred to as intertextuality in literary theory, and it has served as an important theoretical basis for many digital humanities studies. Over the past decade, advancements in natural language processing have ushered intertextuality studies into the quantitative age, and large-scale intertextuality research based on cutting-edge methods has continued to emerge. This paper provides a roadmap for quantitative intertextuality studies, summarizing their data, methods, and applications. Drawing on data from multiple languages and topics, this survey reviews methods from statistics to deep learning. It also summarizes their applications in humanities and social sciences research and the associated platform tools. Driven by advances in computer technology, more precise, diverse, and large-scale intertext studies can be anticipated. Intertextuality holds promise for broader application in interdisciplinary research bridging AI and the humanities.
Probably Not. Journal Citation Indicator (JCI) was introduced to address the limitations of traditional metrics like the Journal Impact Factor (JIF), particularly its inability to normalize citation impact across different disciplines. This study reveals that JCI faces significant challenges in field normalization for Art & Humanities journals, as evidenced by much lower correlations with a more granular, paper-level metric, CNCI-CT. A detailed analysis of Architecture journals highlights how journal-level misclassification and the interdisciplinary nature of content exacerbate these issues, leading to less reliable evaluations. We recommend improving journal classification systems or adopting paper-level normalization methods, potentially supported by advanced AI techniques, to enhance the accuracy and effectiveness of JCI for Art & Humanities disciplines.
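To make the normalization issue concrete, here is a small Python sketch under simplifying assumptions. It treats normalized impact as citations divided by an expected citation baseline, and compares a baseline assigned at the journal level with one assigned per paper. The papers, topic names, and baseline values are invented for illustration only, and the calculation is a simplification, not the actual computation behind JCI or CNCI-CT.

```python
from statistics import mean

# Hypothetical expected citations per paper (made-up values, not real data).
JOURNAL_CATEGORY_BASELINE = {"Architecture": 2.0}
PAPER_TOPIC_BASELINE = {"Architectural History": 1.5, "Building Energy": 12.0}

# Three invented papers from one journal classified under "Architecture".
papers = [
    {"citations": 3, "category": "Architecture", "topic": "Architectural History"},
    {"citations": 12, "category": "Architecture", "topic": "Building Energy"},
    {"citations": 36, "category": "Architecture", "topic": "Building Energy"},
]


def normalized_impact(citations: int, baseline: float) -> float:
    """Citations relative to the expected count for the chosen reference set."""
    return citations / baseline


# Journal-level normalization: every paper inherits the journal's category.
journal_level = mean(
    normalized_impact(p["citations"], JOURNAL_CATEGORY_BASELINE[p["category"]])
    for p in papers
)

# Paper-level normalization: each paper is compared with its own topic.
paper_level = mean(
    normalized_impact(p["citations"], PAPER_TOPIC_BASELINE[p["topic"]])
    for p in papers
)

print(f"journal-level: {journal_level:.2f}, paper-level: {paper_level:.2f}")
```

In this toy case the interdisciplinary papers are measured against a low-citation baseline under the journal-level scheme, which inflates the journal's score relative to the paper-level scheme. This is the kind of divergence the abstract attributes to journal-level misclassification.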
Digital humanities scholars increasingly use Large Language Models for historical document digitization, yet lack appropriate evaluation frameworks for LLM-based OCR. Traditional metrics fail to capture temporal biases and period-specific errors crucial for historical corpus creation. We present an evaluation methodology for LLM-based historical OCR, addressing contamination risks and systematic biases in diplomatic transcription. Using 18th-century Russian Civil font texts, we introduce novel metrics including Historical Character Preservation Rate (HCPR) and Archaic Insertion Rate (AIR), alongside protocols for contamination control and stability testing. We evaluate 12 multimodal LLMs, finding that Gemini and Qwen models outperform traditional OCR while exhibiting over-historicization: inserting archaic characters from incorrect historical periods. Post-OCR correction degrades rather than improves performance. Our methodology provides digital humanities practitioners with guidelines for model selection and quality assessment in historical corpus digitization.
Digital Humanities (DH) is an interdisciplinary field that integrates computational methods with humanities scholarship to investigate innovative topics. Each academic discipline follows a unique developmental path shaped by the topics researchers investigate and the methods they employ. Using bibliometric analysis, most previous studies have examined DH across multiple dimensions such as research hotspots, co-author networks, and institutional rankings. However, these studies have often been limited in their ability to provide deep insights into the current state of technological advancements and topic development in DH. As a result, their conclusions tend to remain superficial or lack interpretability in understanding how methods and topics interrelate in the field. To address this gap, this study introduces a new concept, Topic-Method Composition (TMC), which refers to a hybrid knowledge structure generated by the co-occurrence of specific research topics and the corresponding methods. In particular, by analyzing the interaction between TMCs, we can see more clearly the intersection and integration of digital technology and humanistic subjects in DH. Moreover, this st
This extended abstract describes the challenges in implementing recommender systems for digital archives in the humanities, focusing on Monasterium.net, a platform for historical legal documents. We discuss three key aspects: (i) the unique characteristics of so-called charters as items for recommendation, (ii) the complex multi-stakeholder environment, and (iii) the distinct information-seeking behavior of scholars in the humanities. By examining these factors, we aim to contribute to the development of more effective and tailored recommender systems for (digital) humanities research.
Quantum computing is a new form of computing that is based on the principles of quantum mechanics. It has the potential to revolutionize many fields, including the humanities and social sciences. The idea behind quantum humanities is to explore the potential of quantum computing to answer new questions in these fields, as well as to consider the potential societal impacts of this technology. This paper proposes a research program for quantum humanities, which includes the application of quantum algorithms to humanities and social science research, the reflection on the methods and techniques of quantum computing, and the evaluation of its potential societal implications. This research program aims to define the field of quantum humanities and to establish it as a meaningful part of the humanities and social sciences.
The complexity of cultures in the modern world is now beyond human comprehension. Cognitive sciences cast doubts on the traditional explanations based on mental models. The core subjects in humanities may lose their importance. Humanities have to adapt to the digital age. New, interdisciplinary branches of humanities emerge. Instant access to information will be replaced by instant access to knowledge. Understanding the cognitive limitations of humans and the opportunities opened by the development of artificial intelligence and interdisciplinary research necessary to address global challenges is the key to the revitalization of humanities. Artificial intelligence will radically change humanities, from art to political sciences and philosophy, making these disciplines attractive to students and enabling them to go beyond current limitations.
Artificial Intelligence (AI) and Medical Humanities have become two of the most crucial and rapidly growing fields in the current world. AI has made substantial advancements in recent years, enabling the development of algorithms and systems that can perform tasks traditionally done by humans. Medical Humanities, on the other hand, is the intersection of medical sciences, humanities, and the social sciences, and deals with the cultural, historical, philosophical, ethical, and social aspects of health, illness, and medicine. The integration of AI and Medical Humanities can offer innovative solutions to some of the pressing issues in the medical field.
Numerous digital humanities projects maintain their data collections in the form of text, images, and metadata. While data may be stored in many formats, from plain text to XML to relational databases, the use of the Resource Description Framework (RDF) as a standardized representation has gained considerable traction during the last five years. Almost every digital humanities meeting has at least one session concerned with the topic of digital humanities, RDF, and linked data. While most existing work in linked data has focused on improving algorithms for entity matching, the aim of the LinkedHumanities project is to build digital humanities tools that work "out of the box," enabling their use by humanities scholars, computer scientists, librarians, and information scientists alike. In this paper, we report on the Linked Open Data Enhancer (LODE) framework developed as part of the LinkedHumanities project. With LODE we help non-technical users enrich a local RDF repository with high-quality data from the Linked Open Data cloud. LODE links and enhances the local RDF repository without compromising the quality of the data. In particular, LODE supports the user in the enhance
We introduce a graph-aware autoencoder ensemble framework, with associated formalisms and tooling, designed to facilitate deep learning for scholarship in the humanities. By composing sub-architectures to produce a model isomorphic to a humanistic domain we maintain interpretability while providing function signatures for each sub-architectural choice, allowing both traditional and computational researchers to collaborate without disrupting established practices. We illustrate a practical application of our approach to a historical study of the American post-Atlantic slave trade, and make several specific technical contributions: a novel hybrid graph-convolutional autoencoder mechanism, batching policies for common graph topologies, and masking techniques for particular use-cases. The effectiveness of the framework for broadening participation of diverse domains is demonstrated by a growing suite of two dozen studies, both collaborations with humanists and established tasks from machine learning literature, spanning a variety of fields and data modalities. We make performance comparisons of several different architectural choices and conclude with an ambitious list of imminent next
This paper discusses problems of visualizing humanities data of various forms, such as video data, archival data, and numeric-oriented social science data, through three distinct case studies. By describing the visualization practices and the issues that emerged from the process, this paper uses each of the three cases to identify a pertinent question for reflection. More specifically, I reflect on the difficulties and considerations involved in choosing the most effective and sufficient forms of visualization to enhance the expression of specific cultural and humanities data in the projects. The discussion addresses questions such as: how does the multimodality of humanities and cultural data challenge the understanding, roles, and functions of visualizations, and more broadly, visual representations in humanities research? What do we lose of the original data by visualizing them in those projects? How to balance the benefits and disadvantages of visual technologies to display complex, unique, and often culturally saturated humanities datasets