Found 20 results
This study independently reproduces the malware detection methodology presented by Fellicious et al. [7], which employs order-invariant API call frequency analysis using Random Forest classification. We utilized the original public dataset (250,533 training samples, 83,511 test samples) and replicated four model variants: Unigram, Bigram, Trigram, and Combined n-gram approaches. Our reproduction successfully validated all key findings, achieving F1-scores that exceeded the original results by 0.99% to 2.57% across all models at the optimal API call length of 2,500. The Unigram model achieved F1=0.8717 (original: 0.8631), confirming its effectiveness as a lightweight malware detector. Across three independent experimental runs with different random seeds, we observed remarkably consistent results with standard deviations below 0.5%, demonstrating high reproducibility. This study validates the robustness and scientific rigor of the original methodology while confirming the practical viability of frequency-based API call analysis for malware detection.
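To make the feature idea concrete, the following is a minimal sketch of n-gram frequency counting over an API call trace. It is not the original authors' code: it reads "order-invariant" as discarding where in the trace an n-gram occurs, which is an assumption, and the function names `ngram_counts` and `to_vector` and the API names in the sample trace are illustrative placeholders rather than values taken from the dataset. Only the 2,500-call length comes from the abstract.

```python
from collections import Counter


def ngram_counts(calls, n, max_len=2500):
    """Count contiguous n-grams in the first max_len API calls of a trace.

    The position of each n-gram in the trace is discarded; only how often
    it occurs is kept (an assumed reading of "frequency analysis").
    """
    window = calls[:max_len]
    return Counter(tuple(window[i:i + n]) for i in range(len(window) - n + 1))


def to_vector(counts, vocabulary):
    """Map n-gram counts onto a fixed vocabulary to get a feature vector."""
    return [counts.get(gram, 0) for gram in vocabulary]


# Hypothetical trace; the API names are placeholders for illustration only.
trace = ["NtOpenFile", "NtReadFile", "NtClose", "NtOpenFile", "NtReadFile"]
unigrams = ngram_counts(trace, 1)
bigrams = ngram_counts(trace, 2)
```

Vectors built this way for unigrams, bigrams, and trigrams could be passed separately or concatenated to a classifier such as a Random Forest; how the original study combined them is not stated in the abstract.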
Container startup latency is a critical performance metric for CI/CD pipelines, serverless computing, and auto-scaling systems, yet practitioners lack empirical guidance on how infrastructure choices affect this latency. We present a systematic measurement study that decomposes Docker container startup into constituent operations across three heterogeneous infrastructure tiers: Azure Premium SSD (cloud SSD), Azure Standard HDD (cloud HDD), and macOS Docker Desktop (developer workstation with hypervisor-based virtualization). Using a reproducible benchmark suite that executes 50 iterations per test across 10 performance dimensions, we quantify previously under-characterized relationships between infrastructure configuration and container runtime behavior. Our key findings include: (1) container startup is dominated by runtime overhead rather than image size, with only 2.5% startup variation across images ranging from 5 MB to 155 MB on SSD; (2) storage tier selection imposes a 2.04x startup penalty (HDD 1157 ms vs. SSD 568 ms); (3) Docker Desktop's hypervisor layer introduces a 2.69x startup penalty and 9.5x higher CPU throttling variance compared to native Linux; (4) OverlayFS write
AI coding assistants are now used to generate production code in security-sensitive domains, yet the exploitability of their outputs remains unquantified. We address this gap with Broken by Default: a formal verification study of 3,500 code artifacts generated by seven widely-deployed LLMs across 500 security-critical prompts (five CWE categories, 100 prompts each). Each artifact is subjected to the Z3 SMT solver via the COBALT analysis pipeline, producing mathematical satisfiability witnesses rather than pattern-based heuristics. Across all models, 55.8% of artifacts contain at least one COBALT-identified vulnerability; of these, 1,055 are formally proven via Z3 satisfiability witnesses. GPT-4o leads at 62.4% (grade F); Gemini 2.5 Flash performs best at 48.4% (grade D). No model achieves a grade better than D. Six of seven representative findings are confirmed with runtime crashes under GCC AddressSanitizer. Three auxiliary experiments show: (1) explicit security instructions reduce the mean rate by only 4 points; (2) six industry tools combined miss 97.8% of Z3-proven findings; and (3) models identify their own vulnerable outputs 78.7% of the time in review mode yet generate them
Maintaining reliable UI test suites in large-scale enterprise applications is a persistent and costly challenge. We present an industrial case study of a multi-agent autonomous testing system evaluated using anonymized execution data from a production-like enterprise UI testing prototype. The application features several hundred dynamic UI elements per screen. Built on a large language model with LangGraph orchestration, Playwright execution, and a RAG knowledge base, the system evolves from human-directed testing toward high-autonomy feature discovery and test execution: given no explicit test targets, it discovers over 100 testable features across 10 UI screens, dynamically expands coverage by an additional 15--30 features through runtime DOM analysis, and iteratively repairs failing tests without human intervention. We analyzed 300 consecutive autonomous execution reports encompassing 636 individual test-case executions across 10 distinct scenario families. The system achieved a 70% repair convergence rate at the scenario-family level, with a mean of 3.4 repair iterations to convergence. However, only 10% of scenario families succeeded on first attempt, 38% of reports failed to
This report relates to a study group hosted by the EPSRC-funded network, Integrating data-driven BIOphysical models into REspiratory MEdicine (BIOREME), and supported by The Insigneo Institute and The Knowledge Transfer Network. The BIOREME network hosts events, including this study group, to bring together multi-disciplinary researchers, clinicians, companies and charities to catalyse research in the applications of mathematical modelling for respiratory medicine. The goal of this study group was to provide an interface between companies, clinicians, and mathematicians to develop mathematical tools for the problems presented. The study group was held at The University of Sheffield on 17-20 April 2023 and was attended by 24 researchers from 13 different institutions. This report relates to a challenge presented by Arete Medical Technologies relating to impulse oscillometry (IOS), whereby a short pressure oscillation is imposed at a person's mouth during normal breathing, usually by a loudspeaker. The resulting pressure and flow rate changes can be used to estimate the impedance of the airways, which in turn can provide proxy measurements for (patho)physiological changes in the small ai
This work presents an independent reproducibility study of a lossy image compression technique that integrates singular value decomposition (SVD) and wavelet difference reduction (WDR). The original paper claims that combining SVD and WDR yields better visual quality and higher compression ratios than JPEG2000 and standalone WDR. I re-implemented the proposed method, carefully examined missing implementation details, and replicated the original experiments as closely as possible. I then conducted additional experiments on new images and evaluated performance using PSNR and SSIM. In contrast to the original claims, my results indicate that the SVD+WDR technique generally does not surpass JPEG2000 or WDR in terms of PSNR, and only partially improves SSIM relative to JPEG2000. The study highlights ambiguities in the original description (e.g., quantization and threshold initialization) and illustrates how such gaps can significantly impact reproducibility and reported performance.
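For readers unfamiliar with the PSNR metric named above, here is a minimal sketch using the standard definition, PSNR = 10 log10(MAX^2 / MSE). It is not taken from the study's implementation; the function name `psnr`, the flat-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists.

    Assumes flat sequences of pixel intensities and, by default, 8-bit
    images (peak value 255).
    """
    if not original or len(original) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the two images.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher PSNR means lower pixel-wise error. SSIM, the other metric used in the study, compares local structure instead, so the two can rank methods differently, as the abstract reports.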
We study the impact of teenage sports participation on early-adulthood health using longitudinal data from the National Study of Youth and Religion. We focus on two primary outcomes measured at ages 23--28 -- self-rated health and total score on the PHQ9 Patient Depression Questionnaire -- and control for several potential confounders related to demographics and family socioeconomic status. To probe the possibility that certain types of sports participation may have larger effects on health than others, we conduct a matched observational study at each level within a hierarchy of exposures. Our hierarchy ranges from broadly defined exposures (e.g., participation in any organized after-school activity) to narrow (e.g., participation in collision sports). We deployed an ordered testing approach that exploits the hierarchical relationships between our exposure definitions to perform our analyses while maintaining a fixed family-wise error rate. Compared to teenagers who did not participate in any after-school activities, those who participated in sports had statistically significantly better self-rated and mental health outcomes in early adulthood.
This study set out to examine the relationship between expressed social emotions (i.e., what people say they are feeling) and physical sensations, that is, the connection between emotion and bodily experience. It additionally provided the opportunity to investigate how the neurological findings of gender differences can be observed in practice: what difference does it make in behaviour and judgment that we have varying levels of mirror neuron activity? The following report documents the study, procedure, results and findings.
This article serves as a study guide for the $\ell^2$ decoupling theorem for the paraboloid originally proved by Bourgain and Demeter. Given its popularity and importance, many expositions about the $\ell^2$ decoupling theorem already exist. Our study guide is intended to complement and combine these existing resources in order to provide a more gentle introduction to the subject.
Artificial intelligence (AI) systems have substantially improved dermatologists' diagnostic accuracy for melanoma, with explainable AI (XAI) systems further enhancing clinicians' confidence and trust in AI-driven decisions. Despite these advancements, there remains a critical need for objective evaluation of how dermatologists engage with both AI and XAI tools. In this study, 76 dermatologists participated in a reader study, diagnosing 16 dermoscopic images of melanomas and nevi using an XAI system that provides detailed, domain-specific explanations. Eye-tracking technology was employed to assess their interactions. Diagnostic performance was compared with that of a standard AI system lacking explanatory features. Our findings reveal that XAI systems improved balanced diagnostic accuracy by 2.8 percentage points relative to standard AI. Moreover, diagnostic disagreements with AI/XAI systems and complex lesions were associated with elevated cognitive load, as evidenced by increased ocular fixations. These insights have significant implications for clinical practice, the design of AI tools for visual tasks, and the broader development of XAI in medical diagnostics.
This article is a study guide for ``On the Hausdorff dimension of Furstenberg sets and orthogonal projections in the plane'' by Orponen and Shmerkin. We begin by introducing the Furstenberg set problem and the exceptional set of projections, and then provide a summary of the proof with the core ideas.
Objective: We aimed to determine the relationship between day-to-day sleep efficiency variability and cognitive function among older adults using accelerometer data and three cognitive tests. Methods: Older adults aged 65+ with 5 days of accelerometer data from the National Health and Nutrition Examination Survey (NHANES) who completed the Digit Symbol Substitution Test (DSST), the Consortium to Establish a Registry for Alzheimer's Disease Word-Learning subtest (CERAD-WL), and the Animal Fluency Test (AFT) were included in this study. Associations between sleep efficiency variability and each cognitive test were examined, adjusting for age, sex, education, household income, marital status, depressive symptoms, diabetes, smoking habits, alcohol consumption, arthritis, heart disease, prior heart attack, prior stroke, activities of daily living, and instrumental activities of daily living. Results: A total of 1074 older adults were included in this study. Greater sleep efficiency variability was univariably associated with worse cognitive function based on the DSST (per 10% increase, Beta -3.34, 95% CI -5.33 to -1.34), CERAD-WL (per 10% increase, Beta -1.00, 95% CI -1.79 to -0.21), and AFT (
This report is a reproducibility study of the paper "CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification" (Abdelfattah et al., ICCV 2023). Our report makes the following contributions: (1) We provide a reproducible, well-commented and open-sourced code implementation for the entire method specified in the original paper. (2) We try to verify the effectiveness of the novel aggregation strategy which uses the CLIP model to initialize the pseudo labels for the subsequent unsupervised multi-label image classification task. (3) We try to verify the effectiveness of the gradient-alignment training method specified in the original paper, which is used to update the network parameters and pseudo labels. The code can be found at https://github.com/cs-mshah/CDUL
Determining the internal structure of Uranus is a key objective for planetary science. Knowledge of Uranus's bulk composition and the distribution of elements is crucial to understanding its origin and evolutionary path. In addition, Uranus represents a poorly understood class of intermediate-mass planets (intermediate in size between the relatively well studied terrestrial and gas giant planets), which appear to be very common in the Galaxy. As a result, a better characterization of Uranus will also help us to better understand exoplanets in this mass and size regime. Recognizing the importance of Uranus, a Keck Institute for Space Studies (KISS) workshop was held in September 2023 to investigate how we can improve our knowledge of Uranus's internal structure in the context of a future Uranus mission that includes an orbiter and a probe. The scientific goals and objectives of the recently released Planetary Science and Astrobiology Decadal Survey were taken as our starting point. We reviewed our current knowledge of Uranus's interior and identified measurement and other mission requirements for a future Uranus spacecraft, providing more detail than was possible in the Decadal Surv
Before any software maintenance can occur, developers must read the identifier names found in the code to be maintained. Thus, high-quality identifier names are essential for productive program comprehension and maintenance activities. With developers free to construct identifier names to their liking, it can be difficult to automatically reason about the quality and semantics behind an identifier name. Studying the structure of identifier names can help alleviate this problem. Existing research focuses on studying words within identifiers, but there are other symbols that appear in identifier names -- such as digits. This paper explores the presence and purpose of digits in identifier names through an empirical study of 800 open-source Java systems. We study how digits contribute to the semantics of identifier names and how identifier names that contain digits evolve over time through renaming. We envision our findings improving the efficiency of name appraisal and recommendation tools and techniques.
In the realm of mobile security, where OS-based protections have proven insufficient against robust attackers, Trusted Execution Environments (TEEs) have emerged as a hardware-based security technology. Despite the industry's persistence in advancing TEE technology, the impact on end users and developers remains largely unexplored. This study addresses this gap by conducting a large-scale analysis of TEE utilization in Android applications, focusing on the key areas of cryptography, digital rights management, biometric authentication, and secure dialogs. To facilitate our extensive analysis, we introduce Mobsec Analytika, a framework tailored for large-scale app examinations, which we make available to the research community. Through the analysis of 170,550 popular Android apps, we illuminate the implementation of TEE-related features and their contextual usage. Our findings reveal that TEE features are predominantly utilized indirectly through third-party libraries, with only 6.7% of apps directly invoking the APIs. Moreover, the study reveals the underutilization of the recent TEE-based UI feature Protected Confirmation.
Technology has increasingly become an integral part of the Bible translation process. Over time, both the translation process and relevant technology have evolved greatly. More recently, the field of Natural Language Processing (NLP) has made great progress in solving some problems previously thought impenetrable. Through this study we endeavor to better understand and communicate about a segment of the current landscape of the Bible translation process as it relates to technology and identify pertinent issues. We conduct several interviews with individuals working in different levels of the Bible translation process from multiple organizations to identify gaps and bottlenecks where technology (including recent advances in AI) could potentially play a pivotal role in reducing translation time and improving overall quality.
This article provides recommendations for implementing quantitative susceptibility mapping (QSM) for clinical brain research. It is a consensus of the ISMRM Electro-Magnetic Tissue Properties Study Group. While QSM technical development continues to advance rapidly, the current QSM methods have been demonstrated to be repeatable and reproducible for generating quantitative tissue magnetic susceptibility maps in the brain. However, the many QSM approaches available give rise to the need in the neuroimaging community for guidelines on implementation. This article describes relevant considerations and provides specific implementation recommendations for all steps in QSM data acquisition, processing, analysis, and presentation in scientific publications. We recommend that data be acquired using a monopolar 3D multi-echo GRE sequence, that phase images be saved and exported in DICOM format and unwrapped using an exact unwrapping approach. Multi-echo images should be combined before background removal, and a brain mask created using a brain extraction tool with the incorporation of phase-quality-based masking. Background fields should be removed within the brain mask using a technique ba
The present study is concerned with large-eddy simulations (LES) of supersonic jet flows. The work addresses, in particular, the simulation of a perfectly expanded free jet flow with an exit Mach number of 1.4 and an exit temperature equal to the ambient temperature. Calculations are performed using a nodal discontinuous Galerkin method. The present effort studies the effects of mesh and polynomial refinement on the solution. The present calculations consider computational meshes and polynomial orders such that the number of degrees of freedom (DOFs) in the solution ranges from 50 to 410 million. Mean velocity results and root mean square (RMS) values of velocity fluctuations indicate a better agreement with experimental data as the resolution is increased. The generated data provide a good understanding of the effects of increasing the discretization refinement for LES calculations of jet flows. The present results can guide future simulations of similar flow configurations.
This systematic mapping study tracked the scientific literature that addresses analogies as a didactic strategy in science teaching. An analogy can be understood as a comparison between existing knowledge and new knowledge that uses their similarities to achieve a better understanding of the new knowledge; in other words, it uses students' own concepts to introduce new concepts through comparisons between the two. The purpose of this study was to identify, analyze, synthesize and evaluate research works on this topic in order to learn about the models of use of analogies, the most used didactic strategies, the research methodologies in this field, and how to evaluate the learning effectiveness of working with analogies. The methodology used is the systematic mapping study: five questions were posed that guided the information tracking process. Next, electronic documents in English from the last twenty years were traced in five databases related to the educational field. Finally, it is concluded by responding to the purpose of the study where it is evident that, broadly speaking, the research methodologies in this fie