Found 20 results
The performance of a voice anonymization system is typically measured according to its ability to hide the speaker's identity and keep the data's utility for downstream tasks. This means that the requirements the anonymization should fulfill depend on the context in which it is used and may differ greatly between use cases. However, these use cases are rarely specified in research papers. In this paper, we study the implications of use case-specific requirements on the design of voice anonymization methods. We perform an extensive literature analysis and user study to collect possible use cases and to understand the expectations of the general public towards such tools. Based on these studies, we propose the first taxonomy of use cases for voice anonymization, and derive a set of requirements and design criteria for method development and evaluation. Using this scheme, we propose to focus more on use case-oriented research and development of voice anonymization systems.
This paper describes a method for creating compelling safety cases. The method seeks to help improve safety case practice in order to address the weaknesses identified in current practice, in particular confirmation bias, after-the-fact assurance and safety cases as a paperwork exercise. Rather than creating new notations and tools to address these issues, we contend that it is improvements in the safety case process that will make the most significant improvement to safety case practice. Our method builds upon established approaches and best practice to create an approach that will ensure safety cases are risk-focused, seek to identify ways in which the system may not be safe (rather than just assuming it is), drive safe design and operation of the system (influencing the system itself rather than just documenting what's there), are used to support decisions made throughout the life of the system, including system operation and change, and encourage developers and operators to think about and understand why their system is safe (and when it isn't). A simple example of an infusion pump system is used to illustrate how the new method is applied in practice.
With the wide adoption of automated speech recognition (ASR) systems, it is increasingly important to test and improve ASR systems. However, collecting and executing speech test cases is usually expensive and time-consuming, motivating us to strategically prioritize speech test cases. A key question is: how to determine the ideal order of collecting and executing speech test cases to uncover more errors as early as possible? Each speech test case consists of a piece of audio and the corresponding reference text. In this work, we propose PROPHET (PRiOritizing sPeecH tEsT), a tool that predicts potential error-uncovering speech test cases only based on their reference texts. Thus, PROPHET analyzes test cases and prioritizes them without running the ASR system, which can analyze speech test cases at a large scale. We evaluate 6 different prioritization methods on 3 ASR systems and 12 datasets. Given the same testing budget, we find that our approach uncovers 12.63% more wrongly recognized words than the state-of-the-art method. We select test cases from the prioritized list to fine-tune ASR systems and analyze how our approach can improve the ASR system performance. Statistical tests
Context: It is an enigma that agile projects can succeed 'without requirements' when weak requirements engineering is a known cause for project failures. While agile development projects often manage well without extensive requirements, test cases are commonly viewed as requirements, and detailed requirements are documented as test cases. Objective: We have investigated this agile practice of using test cases as requirements to understand how test cases can support the main requirements activities, and how this practice varies. Method: We performed an iterative case study at three companies and collected data through 14 interviews and two focus groups. Results: The use of test cases as requirements poses both benefits and challenges when eliciting, validating, verifying, and managing requirements, and when used as a documented agreement. We have identified five variants of the test-cases-as-requirements practice, namely de facto, behaviour-driven, story-test driven, stand-alone strict and stand-alone manual for which the application of the practice varies concerning the time frame of requirements documentation, the requirements format, the extent to which the test cases are a machine
As frontier artificial intelligence (AI) systems become more capable, it becomes more important that developers can explain why their systems are sufficiently safe. One way to do so is via safety cases: reports that make a structured argument, supported by evidence, that a system is safe enough in a given operational context. Safety cases are already common in other safety-critical industries such as aviation and nuclear power. In this paper, we explain why they may also be a useful tool in frontier AI governance, both in industry self-regulation and government regulation. We then discuss the practicalities of safety cases, outlining how to produce a frontier AI safety case and discussing what still needs to happen before safety cases can substantially inform decisions.
It is a conundrum that agile projects can succeed 'without requirements' when weak requirements engineering is a known cause for project failures. While Agile development projects often manage well without extensive requirements documentation, test cases are commonly used as requirements. We have investigated this agile practice at three companies in order to understand how test cases can fill the role of requirements. We performed a case study based on twelve interviews performed in a previous study. The findings include a range of benefits and challenges in using test cases for eliciting, validating, verifying, tracing and managing requirements. In addition, we identified three scenarios for applying the practice, namely as a mature practice, as a de facto practice and as part of an agile transition. The findings provide insights into how the role of requirements may be met in agile development including challenges to consider.
The mission of resilience of Ukrainian cities calls for international collaboration with the scientific community to increase the quality of information by identifying and integrating information from various news and social media sources. Linked Data technology can be used to unify, enrich, and integrate data from multiple sources. In our work, we focus on datasets about damaging events in Ukraine due to Russia's invasion between February 2022 and the end of April 2023. We convert two selected datasets to Linked Data and enrich them with additional geospatial information. Following that, we present an algorithm for the detection of identical events from different datasets. Our pipeline makes it easy to convert and enrich datasets to integrated Linked Data. The resulting dataset consists of 10K reported events covering damage to hospitals, schools, roads, residential buildings, etc. Finally, we demonstrate in use cases how our dataset can be applied to different scenarios for resilience purposes.
Background: One of the main challenges in understanding the evolution of COVID-19 is measuring the mild cases that are never tested because their symptoms are few, mild, and/or fade quickly. The problem is not only that they are difficult to identify and test, but also that it is believed that they may constitute the bulk of the cases and could be crucial in the pandemic equation. Methods: We present a novel algorithm to extract the number of these mild cases by correlating COVID-line calls with reported cases in given districts. The key assumption is that, because the disease is highly contagious, the number of calls made by mild cases should be proportional to the number of reported cases, whereas a background of calls not related to infected people should be proportional to the district population. Results: We find that for Buenos Aires Province, in addition to the background, the signal amounts to 6.6 +/- 0.4 calls per reported COVID-19 case. Using this, we estimate 20 +/- 2 symptomatic COVID-19 cases in Buenos Aires Province for each one reported. Conclusions: A very simple algorithm that models the COVID-line calls as the sum of signal plus background allows to estimat
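The signal-plus-background idea in this abstract can be sketched as a two-term linear model per district: calls ~= a * reported_cases + b * population. The snippet below is only an illustration under stated assumptions. The abstract does not say how the authors fit their model, so ordinary least squares is our choice, the function name `fit_signal_background` is hypothetical, and the district numbers are synthetic values made up for the example.

```python
import numpy as np


def fit_signal_background(calls, reported_cases, population):
    """Least-squares fit of calls ~= a * reported_cases + b * population.

    a: calls per reported case (the "signal" term).
    b: calls per inhabitant unrelated to infection (the "background" term).
    Ordinary least squares is an assumption of this sketch, not the
    paper's stated procedure.
    """
    # One row per district, one column per model term.
    X = np.column_stack([reported_cases, population]).astype(float)
    y = np.asarray(calls, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[0]), float(coef[1])


# Synthetic, noise-free districts (invented numbers, not real data).
reported_cases = np.array([10, 40, 25, 5, 60])
population = np.array([50_000, 120_000, 80_000, 30_000, 200_000])
calls = 6.0 * reported_cases + 0.001 * population

a, b = fit_signal_background(calls, reported_cases, population)
print(f"calls per reported case: {a:.3f}, background calls per inhabitant: {b:.5f}")
```

With real data the fit would also need uncertainty estimates for a and b, which this sketch omits.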
The generalised Fermat equation (GFE) is an equation of the form $ax^p+by^q=cz^r$, where $a,b,c,p,q,r$ are positive integers. If $1/p+1/q+1/r<1$, GFE is known to have at most finitely many primitive integer solutions $(x,y,z)$. A large body of literature is devoted to finding such solutions explicitly for various six-tuples $(a,b,c,p,q,r)$, as well as for infinite families of such six-tuples. This paper surveys the families of parameters for which GFE has been solved. Although the proofs are not discussed here, collecting these references in one place will make it easier for readers to find the relevant proof techniques in the original papers. This survey will also help readers avoid duplicating work on already solved cases.
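To make the objects in this abstract concrete, here is a minimal brute-force sketch that lists small primitive solutions of $ax^p+by^q=cz^r$. It is not taken from the survey. The function names are hypothetical, "primitive" is taken to mean $\gcd(x,y,z)=1$, and the search is restricted to positive $x, y, z$ below a chosen bound for simplicity. A search like this can only exhibit solutions, never prove that the list is complete.

```python
from math import gcd


def integer_root(n, r):
    """Return k >= 0 with k**r == n, or None if n is not a perfect r-th power."""
    if n < 0:
        return None
    lo, hi = 0, 1
    while hi ** r < n:          # grow an upper bound for the root
        hi *= 2
    while lo < hi:              # binary search for the smallest k with k**r >= n
        mid = (lo + hi) // 2
        if mid ** r < n:
            lo = mid + 1
        else:
            hi = mid
    return lo if lo ** r == n else None


def primitive_solutions(a, b, c, p, q, r, bound):
    """Positive (x, y, z) with a*x**p + b*y**q == c*z**r, gcd(x, y, z) == 1, x, y <= bound."""
    found = []
    for x in range(1, bound + 1):
        for y in range(1, bound + 1):
            lhs = a * x ** p + b * y ** q
            if lhs % c:
                continue        # c*z**r must be divisible by c
            z = integer_root(lhs // c, r)
            if z is not None and z > 0 and gcd(gcd(x, y), z) == 1:
                found.append((x, y, z))
    return found


# Exponents (5, 2, 4) satisfy 1/5 + 1/2 + 1/4 < 1.
print(primitive_solutions(1, 1, 1, 5, 2, 4, bound=10))
```

The integer binary search in `integer_root` avoids floating-point rounding, which matters once the powers exceed what a float represents exactly.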
We propose and study a fully efficient method to estimate associations of an exposure with disease incidence when both incident cases and prevalent cases (i.e. individuals who were diagnosed with the disease at some prior time point and are alive at the time of sampling) are included in a case-control study. We extend the exponential tilting model for the relationship between exposure and case status to accommodate two case groups, and correct for the survival bias in the prevalent cases through a tilting term that depends on the parametric distribution of the backward time, i.e. the time from disease diagnosis to study enrollment. We construct an empirical likelihood that also incorporates the observed backward times for prevalent cases, obtain efficient estimates of odds ratio parameters that relate exposure to disease incidence, and propose a likelihood ratio test for model parameters that has a standard chi-squared distribution. We quantify the changes in efficiency of association parameters when incident cases are supplemented with, or replaced by, prevalent cases in simulations. We illustrate our methods by estimating associations of single nucleotide polymorphisms (SNPs) wit
Typically, case-control studies to estimate odds-ratios associating risk factors with disease incidence from logistic regression only include cases with newly diagnosed disease. Recently proposed methods allow incorporating information on prevalent cases, individuals who survived from disease diagnosis to sampling, into cross-sectionally sampled case-control studies under parametric assumptions for the survival time after diagnosis. Here we propose and study methods to additionally use prospectively observed survival times from prevalent and incident cases to adjust logistic models for the time between disease diagnosis and sampling, the backward time, for prevalent cases. This adjustment yields unbiased odds-ratio estimates from case-control studies that include prevalent cases. We propose a computationally simple two-step generalized method-of-moments estimation procedure. First, we estimate the survival distribution based on a semi-parametric Cox model using an expectation-maximization algorithm that yields fully efficient estimates and accommodates left truncation for the prevalent cases and right censoring. Then, we use the estimated survival distribution in an extension of th
Recently, assurance cases have received much attention in the field of software-based computer systems and IT services. However, software changes very often and there are no strong regulations for software. These facts are the two main challenges to be addressed in software assurance cases. We propose a development method for assurance cases by means of continuous revision at every stage of the system lifecycle, including in-operation and service recovery in failure cases. The quality of dependability arguments is improved by multiple stakeholders who check with each other. This paper reports our experience with the proposed method in the case of the ASPEN education service. The case study demonstrates that the continuous updates create a significant amount of active risk communication between stakeholders. This gives us a promising perspective for the long-term improvement of service dependability with lifecycle assurance cases.
One of the key indicators used in tracking the evolution of an infectious disease is the reproduction number. This quantity is usually computed using the reported number of cases, but ignoring that many more individuals may be infected (e.g. asymptomatics). We propose a statistical procedure to quantify the impact of undetected infectious cases on the determination of the effective reproduction number. Our approach is stochastic, data-driven and not relying on any compartmental model. It is applied to the COVID-19 case in eight different countries and all Italian regions, showing that the effect of undetected cases leads to estimates of the effective reproduction numbers larger than those obtained only with the reported cases by factors ranging from two to ten. Our findings urge caution about deciding when and how to relax containment measures based on the value of the reproduction number.
Acceptance testing is a validation activity performed to ensure the conformance of software systems with respect to their functional requirements. In safety critical systems, it plays a crucial role since it is enforced by software standards. Test engineers need to identify all the representative test execution scenarios from requirements, determine the runtime conditions that trigger these scenarios, and finally provide the input data that satisfy these conditions. Given that requirements specifications are typically large and often provided in natural language, the generation of acceptance test cases tends to be expensive and error-prone. In this paper, we present UMTG, an approach that supports the generation of executable, system-level, acceptance test cases from requirements specifications in natural language, with the goal of reducing the manual effort required to generate test cases and ensuring requirements coverage. More specifically, UMTG automates the generation of acceptance test cases based on use case specifications and a domain model for the system under test, which are commonly produced in many development environments. Unlike existing approaches, it does not impose
A long-standing open question asks for the minimum number of vectors needed to form an unextendible product basis in a given bipartite or multipartite Hilbert space. A partial solution was found by Alon and Lovasz in 2001, but since then only a few other cases have been solved. We solve all remaining bipartite cases, as well as a large family of multipartite cases.
The analysis of case-control studies with several subtypes of cases is increasingly common, e.g. in cancer epidemiology. For matched designs, we show that a natural strategy is based on a stratified conditional logistic regression model. Then, to account for the potential homogeneity among the subtypes of cases, we adapt the ideas of data shared lasso, which has been recently proposed for the estimation of regression models in a stratified setting. For unmatched designs, we compare two standard methods based on L1-norm penalized multinomial logistic regression. We describe formal connections between these two approaches, from which practical guidance can be derived. We show that one of these approaches, which is based on a symmetric formulation of the multinomial logistic regression model, actually reduces to a data shared lasso version of the other. Consequently, the relative performance of the two approaches critically depends on the level of homogeneity that exists among the subtypes of cases: more precisely, when homogeneity is moderate to high, the non-symmetric formulation with controls as the reference is not recommended. Empirical results obtained from synthetic data are pr
Machine learning shows promise in predicting the outcome of legal cases, but most research has concentrated on civil law cases rather than case law systems. We identified two unique challenges in making legal case outcome predictions with case law. First, it is crucial to identify relevant precedent cases that serve as fundamental evidence for judges during decision-making. Second, it is necessary to consider the evolution of legal principles over time, as early cases may adhere to different legal contexts. In this paper, we proposed a new framework named PILOT (PredictIng Legal case OuTcome) for case outcome prediction. It comprises two modules for relevant case retrieval and temporal pattern handling, respectively. To benchmark the performance of existing legal case outcome prediction models, we curated a dataset from a large-scale case law database. We demonstrate the importance of accurately identifying precedent cases and mitigating the temporal shift when making predictions for case law, as our method shows a significant improvement over the prior methods that focus on civil law case outcome predictions.
Case studies commonly form the pedagogical backbone in law, ethics, and many other domains that face complex and ambiguous societal questions informed by human values. Similar complexities and ambiguities arise when we consider how AI should be aligned in practice: when faced with vast quantities of diverse (and sometimes conflicting) values from different individuals and communities, with whose values is AI to align, and how should AI do so? We propose a complementary approach to constitutional AI alignment, grounded in ideas from case-based reasoning (CBR), that focuses on the construction of policies through judgments on a set of cases. We present a process to assemble such a case repository by: 1) gathering a set of ``seed'' cases -- questions one may ask an AI system -- in a particular domain, 2) eliciting domain-specific key dimensions for cases through workshops with domain experts, 3) using LLMs to generate variations of cases not seen in the wild, and 4) engaging with the public to judge and improve cases. We then discuss how such a case repository could assist in AI alignment, both through directly acting as precedents to ground acceptable behaviors, and as a medium for i
Legal case retrieval for sourcing similar cases is critical in upholding judicial fairness. Different from general web search, legal case retrieval involves processing lengthy, complex, and highly specialized legal documents. Existing methods in this domain often overlook the incorporation of legal expert knowledge, which is crucial for accurately understanding and modeling legal cases, leading to unsatisfactory retrieval performance. This paper introduces KELLER, a legal knowledge-guided case reformulation approach based on large language models (LLMs) for effective and interpretable legal case retrieval. By incorporating professional legal knowledge about crimes and law articles, we enable large language models to accurately reformulate the original legal case into concise sub-facts of crimes, which contain the essential information of the case. Extensive experiments on two legal case retrieval benchmarks demonstrate superior retrieval performance and robustness on complex legal case queries of KELLER over existing methods.
Multimodal large language models (MLLMs) have rapidly advanced from perception tasks to complex multi-step reasoning, yet reinforcement learning with verifiable rewards (RLVR) often leads to spurious reasoning since only the final-answer correctness is rewarded. To address this limitation, we propose AutoRubric, a framework that integrates RLVR with process-level supervision through automatically collected rubric-based generative rewards. Our key innovation lies in a scalable self-aggregation method that distills consistent reasoning checkpoints from successful trajectories, enabling problem-specific rubric construction without human annotation or stronger teacher models. By jointly leveraging rubric-based and outcome rewards, AutoRubric achieves state-of-the-art performance on six multimodal reasoning benchmarks and substantially improves reasoning faithfulness in dedicated evaluations.