Potato functional genomics lags due to unsystematic gene information curation, gene identifier inconsistencies across reference genome versions, and the increasing volume of research publications. To address these limitations, we developed the Potato Knowledge Hub (http://www.potato-ai.top), leveraging Large Language Models (LLMs) and a systematically curated collection of over 3,200 high-quality potato research papers spanning over 120 years. This platform integrates two key modules: a functional gene database containing 2,571 literature-reported genes, meticulously mapped to the latest DMv8.1 reference genome with resolved nomenclature discrepancies and links to original publications; and a potato knowledge base. The knowledge base, built using a Retrieval-Augmented Generation (RAG) architecture, accurately answers research queries with literature citations, mitigating LLM "hallucination." Users can interact with the hub via a natural language AI agent, "Potato Research Assistant," for querying specialized knowledge, retrieving gene information, and extracting sequences. The continuously updated Potato Knowledge Hub aims to be a comprehensive resource, fostering advancements in p
Software is at the core of most scientific discoveries today. Therefore, the quality of research results highly depends on the quality of the research software. Rigorous testing, as we know it from software engineering in the industry, could ensure the quality of the research software but it also requires a substantial effort that is often not rewarded in academia. Therefore, this research explores the effects of research software testing integrated into teaching on research software. In an in-vivo experiment, we integrated the engineering of a test suite for a large-scale network simulation as group projects into a course on software testing at the Blekinge Institute of Technology, Sweden, and qualitatively measured the effects of this integration on the research software. We found that the research software benefited from the integration through substantially improved documentation and fewer hardware and software dependencies. However, this integration was effortful and although the student teams developed elegant and thoughtful test suites, no code by students went directly into the research software since we were not able to make the integration back into the research software
Potatoes are an economically important crop, and their quality is closely related to the starch content, which is typically inferred from specific gravity (SG). Although microwave sensing technologies have been increasingly developed for underground potato detection and quality assessment in recent years, no accurate model has yet been established to link the dielectric properties of potatoes with their key agronomic traits. To address this gap, we developed a model for estimating potato tubers' SG based on their dielectric constant. To construct and validate the model, we conducted SG measurements and dielectric spectroscopy measurements in the frequency range of 0.3 GHz to 3.0 GHz on 250 potatoes of five different types (red, russet, yellow, purple, and chipping potatoes, with 50 samples per type). Out of the 250 data sets, 200 data sets were used for model development, and 50 data sets were used for model validation. A linear regression model was used to summarize the relationship between SG and dielectric constant, where the regression coefficients are expressed as fourth-order polynomial functions of frequency. Experimental results on the 50 validation data sets show that the
Demographic data collection is essential in education research, as demographic data allows researchers to better describe the participant population they study and to contextualize findings. However, current research practices for neurodiversity demographics often rely on prescriptive methods (e.g., requiring participants to report official diagnoses) rather than allowing participants to self-identify. This approach can: a) not allow participants to express their intersecting identities in ways that are authentic; and b) limit trustworthiness and reliability of the data and interpretation. In addition, inconsistent dissemination and representation of demographic data across studies hinder the accessibility and usability of this work. Through a literature review of neurodivergent student experiences with learning and performing STEM, we identified widespread discrepancies in how demographic information is collected and reported. This paper explores how neurodivergent identities can be more accurately and inclusively represented in education research. We present findings of a thematic analysis on the ways neurodivergent demographic data collection is done in the literature using data
This paper presents multi- and interdisciplinary approaches for finding the appropriate AI technologies for research information. Professional research information management (RIM) is becoming increasingly important as an expressly data-driven tool for researchers. It is not only the basis of scientific knowledge processes, but also related to other data. A concept and a process model of the elementary phases from the start of the project to the ongoing operation of the AI methods in the RIM is presented, portraying the implementation of an AI project, meant to enable universities and research institutions to support their researchers in dealing with incorrect and incomplete research information, while it is being stored in their RIMs. Our aim is to show how research information harmonizes with the challenges of data literacy and data quality issues, related to AI, also wanting to underline that any project can be successful if the research institutions and various departments of universities, involved work together and appropriate support is offered to improve research information and data management.
Modern research heavily relies on software. A significant challenge researchers face is understanding the complex software used in specific research fields. We target two scenarios in this context, namely long onboarding times for newcomers and conference reviewers evaluating replication packages. We hypothesize that both scenarios can be significantly improved when there is a clear link between the paper's ideas and the code that implements them. As a time- and staff-saving approach, we propose an LLM-based automation tool that takes in a paper and the software implementing the paper, and generates a trace mapping between research ideas and their locations in code. Initial experiments have shown that the tool can generate quite useful mappings.
This paper presents a scientometric analysis of research output from the University of Lagos, focusing on the two decades spanning 2004 to 2023. Using bibliometric data retrieved from the Web of Science, we examine trends in publication volume, collaboration patterns, citation impact, and the most prolific authors, departments, and research domains at the university. The study reveals a consistent increase in research productivity, with the highest publication output recorded in 2023. Health Sciences, Engineering, and Social Sciences are identified as dominant fields, reflecting the university's interdisciplinary research strengths. Collaborative efforts, both locally and internationally, show a positive correlation with higher citation impact, with the United States and the United Kingdom being the leading international collaborators. Notably, open-access publications account for a significant portion of the university's research output, enhancing visibility and citation rates. The findings offer valuable insights into the university's research performance over the past two decades, providing a foundation for strategic planning and policy formulation to foster research excellence
Potato plants are plants that are beneficial to humans. Like other plants in general, potato plants also have diseases; if this disease is not treated immediately, there will be a significant decrease in food production. Therefore, it is necessary to detect diseases quickly and precisely so that disease control can be carried out effectively and efficiently. Classification of potato leaf disease can be done directly. Still, the symptoms cannot always explain the type of disease that attacks potato leaves because there are many types of diseases with symptoms that look the same. Humans also have deficiencies in determining the results of identification of potato leaf disease, so sometimes the results of identification between individuals can be different. Therefore, the use of Deep Learning for the classification process of potato leaf disease is expected to shorten the time and have a high classification accuracy. This study uses a deep learning method with the DenseNet201 architecture. The choice to use the DenseNet201 algorithm in this study is because the model can identify important features of potato leaves and recognize early signs of emerging diseases. This study aimed to ev
Early and precise identification of plant diseases, especially in potato crops is important to ensure the health of the crops and ensure the maximum yield . Potato leaf diseases, such as Early Blight and Late Blight, pose significant challenges to farmers, often resulting in yield losses and increased pesticide use. Traditional methods of detection are not only time-consuming, but are also subject to human error, which is why automated and efficient methods are required. The paper introduces a new method of potato leaf disease classification Tiny-ViT model, which is a small and effective Vision Transformer (ViT) developed to be used in resource-limited systems. The model is tested on a dataset of three classes, namely Early Blight, Late Blight, and Healthy leaves, and the preprocessing procedures include resizing, CLAHE, and Gaussian blur to improve the quality of the image. Tiny-ViT model has an impressive test accuracy of 99.85% and a mean CV accuracy of 99.82% which is better than baseline models such as DEIT Small, SWIN Tiny, and MobileViT XS. In addition to this, the model has a Matthews Correlation Coefficient (MCC) of 0.9990 and narrow confidence intervals (CI) of [0.9980, 0
The potato is a widely grown crop in many regions of the world. In recent decades, potato farming has gained incredible traction in the world. Potatoes are susceptible to several illnesses that stunt their development. This plant seems to have significant leaf disease. Early Blight and Late Blight are two prevalent leaf diseases that affect potato plants. The early detection of these diseases would be beneficial for enhancing the yield of this crop. The ideal solution is to use image processing to identify and analyze these disorders. Here, we present an autonomous method based on image processing and machine learning to detect late blight disease affecting potato leaves. The proposed method comprises four different phases: (1) Histogram Equalization is used to improve the quality of the input image; (2) feature extraction is performed using a Deep CNN model, then these extracted features are concatenated; (3) feature selection is performed using wrapper-based feature selection; (4) classification is performed using an SVM classifier and its variants. This proposed method achieves the highest accuracy of 99% using SVM by selecting 550 features.
This scientometric study analyzes Avian Influenza research from 2014 to 2023 using bibliographic data from the Web of Science database. We examined publication trends, sources, authorship, collaborative networks, document types, and geographical distribution to gain insights into the global research landscape. Results reveal a steady increase in publications, with high contributions from Chinese and American institutions. Journals such as PLoS One and the Journal of Virology published the highest number of studies, indicating their influence in this field. The most prolific institutions include the Chinese Academy of Sciences and the University of Hong Kong, while the College of Veterinary Medicine at South China Agricultural University emerged as the most productive department. China and the USA lead in publication volume, though developed nations like the United Kingdom and Germany exhibit a higher rate of international collaboration. "Articles" are the most common document type, constituting 84.6% of the total, while "Reviews" account for 7.6%. This study provides a comprehensive view of global trends in Avian Influenza research, emphasizing the need for collaborative efforts ac
The minimum convex cover problem seeks to cover a polygon $P$ with the fewest convex polygons that lie within $P$. This problem is $\exists\mathbb R$-complete, and the best previously known algorithm, due to Eidenbenz and Widmayer (2001), achieves an $O(\log n)$-approximation in $O(n^{29} \log n)$ time, where $n$ is the complexity of $P$. In this work we present a novel approach that preserves the $O(\log n)$ approximation guarantee while significantly reducing the running time. By discretizing the problem and formulating it as a set cover problem, we focus on efficiently finding a convex polygon that covers the largest number of uncovered regions, in each iteration of the greedy algorithm. This core subproblem, which we call the rotten potato peeling problem, is a variant of the classic potato peeling problem. We solve it by finding maximum weighted paths in Directed Acyclic Graphs (DAGs) that correspond to visibility polygons, with the DAG construction carefully constrained to manage complexity. Our approach yields a substantial improvement in the overall running time and introduces techniques that may be of independent interest for other geometric covering problems.
Numerous applications have resulted from the automation of agricultural disease segmentation using deep learning techniques. However, when applied to new conditions, these applications frequently face the difficulty of overfitting, resulting in lower segmentation performance. In the context of potato farming, where diseases have a large influence on yields, it is critical for the agricultural economy to quickly and properly identify these diseases. Traditional data augmentation approaches, such as rotation, flip, and translation, have limitations and frequently fail to provide strong generalization results. To address these issues, our research employs a novel approach termed as PotatoGANs. In this novel data augmentation approach, two types of Generative Adversarial Networks (GANs) are utilized to generate synthetic potato disease images from healthy potato images. This approach not only expands the dataset but also adds variety, which helps to enhance model generalization. Using the Inception score as a measure, our experiments show the better quality and realisticness of the images created by PotatoGANs, emphasizing their capacity to resemble real disease images closely. The Cyc
Image-based deep learning provides a non-invasive, scalable solution for monitoring potato quality during storage, addressing key challenges such as sprout detection, weight loss estimation, and shelf-life prediction. In this study, images and corresponding weight data were collected over a 200-day period under controlled temperature and humidity conditions. Leveraging powerful pre-trained architectures of ResNet, VGG, DenseNet, and Vision Transformer (ViT), we designed two specialized models: (1) a high-precision binary classifier for sprout detection, and (2) an advanced multi-class predictor to estimate weight loss and forecast remaining shelf-life with remarkable accuracy. DenseNet achieved exceptional performance, with 98.03% accuracy in sprout detection. Shelf-life prediction models performed best with coarse class divisions (2-5 classes), achieving over 89.83% accuracy, while accuracy declined for finer divisions (6-8 classes) due to subtle visual differences and limited data per class. These findings demonstrate the feasibility of integrating image-based models into automated sorting and inventory systems, enabling early identification of sprouted potatoes and dynamic categ
Potato yield is an important metric for farmers to further optimize their cultivation practices. Potato yield can be estimated on a harvester using an RGB-D camera that can estimate the three-dimensional (3D) volume of individual potato tubers. A challenge, however, is that the 3D shape derived from RGB-D images is only partially completed, underestimating the actual volume. To address this issue, we developed a 3D shape completion network, called CoRe++, which can complete the 3D shape from RGB-D images. CoRe++ is a deep learning network that consists of a convolutional encoder and a decoder. The encoder compresses RGB-D images into latent vectors that are used by the decoder to complete the 3D shape using the deep signed distance field network (DeepSDF). To evaluate our CoRe++ network, we collected partial and complete 3D point clouds of 339 potato tubers on an operational harvester in Japan. On the 1425 RGB-D images in the test set (representing 51 unique potato tubers), our network achieved a completion accuracy of 2.8 mm on average. For volumetric estimation, the root mean squared error (RMSE) was 22.6 ml, and this was better than the RMSE of the linear regression (31.1 ml) an
Potato yield is a key indicator for optimizing cultivation practices in agriculture. Potato yield can be estimated on harvesters using RGB-D cameras, which capture three-dimensional (3D) information of individual tubers moving along the conveyor belt. However, point clouds reconstructed from RGB-D images are incomplete due to self-occlusion, leading to systematic underestimation of tuber weight. To address this, we introduce PointRAFT, a high-throughput point cloud regression network that directly predicts continuous 3D shape properties, such as tuber weight, from partial point clouds. Rather than reconstructing full 3D geometry, PointRAFT infers target values directly from raw 3D data. Its key architectural novelty is an object height embedding that incorporates tuber height as an additional geometric cue, improving weight prediction under practical harvesting conditions. PointRAFT was trained and evaluated on 26,688 partial point clouds collected from 859 potato tubers across four cultivars and three growing seasons on an operational harvester in Japan. On a test set of 5,254 point clouds from 172 tubers, PointRAFT achieved a mean absolute error of 12.0 g and a root mean squared
Drawing on 1,178 safety and reliability papers from 9,439 generative AI papers (January 2020 - March 2025), we compare research outputs of leading AI companies (Anthropic, Google DeepMind, Meta, Microsoft, and OpenAI) and AI universities (CMU, MIT, NYU, Stanford, UC Berkeley, and University of Washington). We find that corporate AI research increasingly concentrates on pre-deployment areas -- model alignment and testing & evaluation -- while attention to deployment-stage issues such as model bias has waned. Significant research gaps exist in high-risk deployment domains, including healthcare, finance, misinformation, persuasive and addictive features, hallucinations, and copyright. Without improved observability into deployed AI, growing corporate concentration could deepen knowledge deficits. We recommend expanding external researcher access to deployment data and systematic observability of in-market AI behaviors.
Electroencephalography (EEG) signal cleaning has long been a critical challenge in the research community. The presence of artifacts can significantly degrade EEG data quality, complicating analysis and potentially leading to erroneous interpretations. While various artifact rejection methods have been proposed, the gold standard remains manual visual inspection by human experts-a process that is time-consuming, subjective, and impractical for large-scale EEG studies. Existing techniques are often hindered by a strong reliance on manual hyperparameter tuning, sensitivity to outliers, and high computational costs. In this paper, we introduce the improved Riemannian Potato Field (iRPF), a fast and fully automated method for EEG artifact rejection that addresses key limitations of current approaches. We evaluate iRPF against several state-of-the-art artifact rejection methods, using two publicly available EEG databases, labeled for various artifact types, comprising 226 EEG recordings. Our results demonstrate that iRPF outperforms all competitors across multiple metrics, with gains of up to 22% in recall, 102% in specificity, 54% in precision, and 24% in F1-score, compared to Isolatio
Potatoes are the third-largest food crop globally, but their production frequently encounters difficulties because of aggressive pest infestations. The aim of this study is to investigate the various types and characteristics of these pests and propose an efficient PotatoPestNet AI-based automatic potato pest identification system. To accomplish this, we curated a reliable dataset consisting of eight types of potato pests. We leveraged the power of transfer learning by employing five customized, pre-trained transfer learning models: CMobileNetV2, CNASLargeNet, CXception, CDenseNet201, and CInceptionV3, in proposing a robust PotatoPestNet model to accurately classify potato pests. To improve the models' performance, we applied various augmentation techniques, incorporated a global average pooling layer, and implemented proper regularization methods. To further enhance the performance of the models, we utilized random search (RS) optimization for hyperparameter tuning. This optimization technique played a significant role in fine-tuning the models and achieving improved performance. We evaluated the models both visually and quantitatively, utilizing different evaluation metrics. The
Plastic waste in aquatic environments poses severe risks to marine life and human health. Autonomous robots can be utilized to collect floating waste, but they require accurate object identification capability. While deep learning has been widely used as a powerful tool for this task, its performance is significantly limited by outdoor light conditions and water surface reflection. Light polarization, abundant in such environments yet invisible to the human eye, can be captured by modern sensors to significantly improve litter detection accuracy on water surfaces. With this goal in mind, we introduce PoTATO, a dataset containing 12,380 labeled plastic bottles and rich polarimetric information. We demonstrate under which conditions polarization can enhance object detection and, by providing raw image data, we offer an opportunity for the research community to explore novel approaches and push the boundaries of state-of-the-art object detection algorithms even further. Code and data are publicly available at https://github.com/luisfelipewb/ PoTATO/tree/eccv2024.