Search - ResearchTracker

The Apache Software Foundation (ASF) ecosystem underpins a vast portion of modern software infrastructure, powering widely used components such as Log4j, Tomcat, and Struts. However, the ubiquity of these libraries has made them prime targets for high-impact security vulnerabilities, as illustrated by incidents like Log4Shell. Despite their widespread adoption, Apache projects are not immune to recurring and severe security weaknesses. We conduct a historical analysis of the Apache ecosystem to follow the "breadcrumb trail of vulnerabilities" by compiling a comprehensive dataset of Common Vulnerabilities and Exposures (CVEs) and Common Weakness Enumerations (CWEs). We examine trends in exploit recurrence, disclosure timelines, and remediation practices. Our analysis is guided by four key research questions: (1) What are the most persistent and repeated CWEs in Apache libraries? (2) How long do CVEs persist before being addressed? (3) What is the delay between CVE introduction and official disclosure? and (4) How long after disclosure are CVEs remediated? We present a detailed timeline of vulnerability lifecycle stages across Apache libraries and offer insights to improve secure cod

Comparative analysis of large data processing in Apache Spark using Java, Python and Scala

arXiv, 2025-10-21. Authors: Ivan Borodii, Illia Fedorovych, Halyna Osukhivska

This study presents a comparative analysis of large-dataset processing on the Apache Spark platform using the Java, Python, and Scala programming languages. Although prior works have focused on individual stages, comprehensive comparisons of full ETL workflows across programming languages using Apache Iceberg remain limited. The analysis was performed by executing several operations, including downloading data from CSV files, transforming it, and loading it into an Apache Iceberg analytical table. The performance of the Spark algorithm was found to vary significantly depending on the amount of data and the programming language used. When processing a 5-megabyte CSV file, Python achieved the best result at 6.71 seconds, ahead of Scala at 9.13 seconds and Java at 9.62 seconds. When processing a large 1.6-gigabyte CSV file, all three languages produced similar results: Python was fastest at 46.34 seconds, while Scala and Java took 47.72 and 50.56 seconds, respectively. When performing a more complex operation that involved combining two CSV files into a single data

Search results: Apache

CVE Breadcrumbs: Tracking Vulnerabilities Through Versioned Apache Libraries

Comparative analysis of large data processing in Apache Spark using Java, Python and Scala

Leveraging Apache Arrow for Zero-copy, Zero-serialization Cluster Shared Memory

Palomar and Apache Point Spectrophotometry of Interstellar Comet 3I/ATLAS

Bug Priority Change: An Empirical Study on Apache Projects

Flood Data Analysis on SpaceNet 8 Using Apache Sedona

A GPU-accelerated Molecular Docking Workflow with Kubernetes and Apache Airflow

Distributed Record Linkage in Healthcare Data with Apache Spark

Large-Scale Learning from Data Streams with Apache SAMOA

Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM

Performance Evaluation of Query Plan Recommendation with Apache Hadoop and Apache Spark

Technical Report: On the Usability of Hadoop MapReduce, Apache Spark & Apache Flink for Data Science

On Efficiently Partitioning a Topic in Apache Kafka

Two-sample KS test with approxQuantile in Apache Spark

Photometric rotation periods for 107 M dwarfs from the APACHE survey

An accurate IoT Intrusion Detection Framework using Apache Spark

Apache VXQuery: A Scalable XQuery Implementation

Nowcasting the Financial Time Series with Streaming Data Analytics under Apache Spark

Identifying the potential of Near Data Computing for Apache Spark

MultiCloud Resource Management using Apache Mesos for Planned Integration with Apache Airavata
