Healthcare costs are rising rapidly worldwide, and there is significant interest in controlling them. An important aspect is price transparency: preliminary efforts have demonstrated that patients will shop for lower costs, which drives efficiency. This requires that cost data be made available, along with models that can predict healthcare costs for a wide range of patient demographics and conditions. We present an approach to this problem by developing a predictive model using machine-learning techniques. We analyzed de-identified patient data from the New York State SPARCS (Statewide Planning and Research Cooperative System), consisting of 2.3 million records from 2016. We built models to predict costs from patient diagnoses and demographics. We investigated two model classes: sparse regression and decision trees. The best performance came from a decision tree of depth 10, which achieved an R-squared value of 0.76, better than the values reported in the literature for similar problems.
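To make the depth-limited decision tree and the R-squared score concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline: the greedy variance-reducing split rule, the two numeric features (age, severity), and the toy cost values are all assumptions chosen for illustration, whereas the SPARCS records contain many categorical fields.

```python
from statistics import mean

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(rows, targets, max_depth):
    """Greedy regression tree; a leaf is a number, a node is a 4-tuple."""
    if max_depth == 0 or len(set(targets)) <= 1:
        return mean(targets)
    best = None
    for j in range(len(rows[0])):
        # Skip the largest value so both sides of the split are non-empty.
        for threshold in sorted({r[j] for r in rows})[:-1]:
            left = [i for i, r in enumerate(rows) if r[j] <= threshold]
            right = [i for i, r in enumerate(rows) if r[j] > threshold]
            cost = sse([targets[i] for i in left]) + sse([targets[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, threshold, left, right)
    if best is None:  # all rows identical, nothing to split on
        return mean(targets)
    _, j, threshold, left, right = best
    return (
        j,
        threshold,
        fit_tree([rows[i] for i in left], [targets[i] for i in left], max_depth - 1),
        fit_tree([rows[i] for i in right], [targets[i] for i in right], max_depth - 1),
    )

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        j, threshold, left, right = tree
        tree = left if row[j] <= threshold else right
    return tree

# Hypothetical toy data: (age, severity) -> cost.
rows = [(age, sev) for age in (30, 45, 60, 75) for sev in (1, 2, 3)]
costs = [1000 * sev + (500 if age > 50 else 0) for age, sev in rows]

tree = fit_tree(rows, costs, max_depth=10)
predictions = [predict(tree, r) for r in rows]
train_r2 = r_squared(costs, predictions)
```

The score here is computed on the training rows only, so it overstates real performance; a held-out split would be needed to compare against a reported value such as 0.76.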
Healthcare is one of the most promising areas for machine learning models to make a positive impact. However, successful adoption of AI-based systems in healthcare depends on engaging and educating stakeholders from diverse backgrounds about the development process of AI models. We present a broadly accessible overview of the development life cycle of clinical AI models that is general enough to be adapted to most machine learning projects, and then give an in-depth case study of the development process of a deep learning based system to detect aortic aneurysms in Computed Tomography (CT) exams. We hope that other healthcare institutions and clinical practitioners find the insights we share about the development process useful in informing their own model development efforts, and that these insights increase the likelihood of successful deployment and integration of AI in healthcare.
Machine learning (ML) is transforming healthcare by enabling predictive analytics, personalized treatments, and improved patient outcomes. However, traditional ML workflows often require specialized skills, infrastructure, and resources, limiting accessibility for many healthcare professionals. This paper explores how the BigQuery ML cloud service helps healthcare researchers and data analysts build and deploy models using SQL, without the need for advanced ML knowledge. BigQuery ML integrates predictive analytics directly into their workflows to inform decision-making and support patient care. We demonstrate this capability through a case study on diabetes prediction using the Diabetes Health Indicators Dataset. Our results show that the Boosted Tree model achieved the highest performance among the three models evaluated, making it highly effective for diabetes prediction. Our study underscores BigQuery ML's role in democratizing machine learning, enabling faster, scalable, and efficient predictive analytics that can directly enhance healthcare decision-making processes. This study aims to bridge the gap between advanced machine learning and practical healthcare analytics by providing detailed insig
How should we evaluate the effect of a policy on the likelihood of an undesirable event, such as conflict? The significance test has three limitations. First, relying on statistical significance misses the fact that uncertainty is a continuous scale. Second, focusing on a standard point estimate overlooks the variation in plausible effect sizes. Third, the criterion of substantive significance is rarely explained or justified. A new Bayesian decision-theoretic model, "causal binary loss function model," overcomes these issues. It compares the expected loss under a policy intervention with the one under no intervention. These losses are computed based on a particular range of the effect sizes of a policy, the probability mass of this effect size range, the cost of the policy, and the cost of the undesirable event the policy intends to address. The model is more applicable than common statistical decision-theoretic models using the standard loss functions or capturing costs in terms of false positives and false negatives. I exemplify the model's use through three applications and provide an R package.
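The comparison of expected losses with and without an intervention can be sketched generically as follows. This is not the "causal binary loss function model" itself, whose exact form is not given here. It is a plain expected-loss comparison, and the posterior draws, baseline risk, and cost values are hypothetical numbers chosen for illustration.

```python
def probability_mass(draws, lower, upper):
    """Share of posterior draws whose effect size falls in [lower, upper]."""
    return sum(lower <= d <= upper for d in draws) / len(draws)

def expected_losses(reduction_draws, baseline_risk, policy_cost, event_cost):
    """Return (loss with intervention, loss without intervention).

    Assumption: each draw is an absolute reduction in the probability
    of the undesirable event, and losses are linear in that probability.
    """
    mean_reduction = sum(reduction_draws) / len(reduction_draws)
    risk_with_policy = min(1.0, max(0.0, baseline_risk - mean_reduction))
    loss_without = event_cost * baseline_risk
    loss_with = policy_cost + event_cost * risk_with_policy
    return loss_with, loss_without

# Hypothetical posterior draws of the policy's risk reduction.
draws = [0.02, 0.05, 0.08, 0.10, 0.00]

mass_at_least_5_points = probability_mass(draws, 0.05, 1.0)
loss_with, loss_without = expected_losses(
    draws, baseline_risk=0.30, policy_cost=1.0, event_cost=50.0
)
intervene = loss_with < loss_without
```

Under these assumed numbers the intervention is preferred because its cost is smaller than the expected reduction in event cost; changing either cost can reverse the decision without any change in the posterior.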
In this work, deep learning techniques for brain age prediction from magnetic resonance images are investigated, aiming to assist in the identification of biomarkers of the natural aging process. The identification of biomarkers is useful for detecting an early-stage neurodegenerative process, as well as for predicting age-related or non-age-related cognitive decline. Two techniques are implemented and compared in this work: a 3D Convolutional Neural Network applied to the volumetric image and a 2D Convolutional Neural Network applied to slices from the axial plane, with subsequent fusion of individual predictions. The best result was obtained by the 2D model, which achieved a mean absolute error of 3.83 years.
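The fusion of per-slice predictions and the mean absolute error used to score the models can be sketched as below. The fusion rule is an assumption: the abstract does not say how individual slice predictions are combined, so a simple average is used here, and the subject identifiers and ages are made up.

```python
from statistics import mean

def fuse_slice_predictions(slice_ages):
    """Combine per-slice age predictions into one subject-level estimate.

    Assumption: plain averaging; the original work may use another rule.
    """
    return mean(slice_ages)

def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages, in years."""
    return mean(abs(t - p) for t, p in zip(true_ages, predicted_ages))

# Hypothetical per-slice outputs of a 2D model for two subjects.
slice_predictions = {
    "subject_a": [60.0, 62.0, 64.0],
    "subject_b": [30.0, 34.0],
}
true_age = {"subject_a": 65.0, "subject_b": 30.0}

subjects = sorted(slice_predictions)
fused = [fuse_slice_predictions(slice_predictions[s]) for s in subjects]
mae = mean_absolute_error([true_age[s] for s in subjects], fused)
```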
Reinforcement learning (RL) can be used to learn treatment policies and aid decision making in healthcare. However, given the need for generalization over complex state/action spaces, the incorporation of function approximators (e.g., deep neural networks) requires model selection to reduce overfitting and improve policy performance at deployment. Yet a standard validation pipeline for model selection requires running a learned policy in the actual environment, which is often infeasible in a healthcare setting. In this work, we investigate a model selection pipeline for offline RL that relies on off-policy evaluation (OPE) as a proxy for validation performance. We present an in-depth analysis of popular OPE methods, highlighting the additional hyperparameters and computational requirements (fitting/inference of auxiliary models) when used to rank a set of candidate policies. We compare the utility of different OPE methods as part of the model selection pipeline in the context of learning to treat patients with sepsis. Among all the OPE methods we considered, fitted Q evaluation (FQE) consistently leads to the best validation ranking, but at a high computational cost. To balance thi
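Ranking candidate policies by an OPE estimate can be sketched with ordinary per-trajectory importance sampling, one of the simpler OPE estimators. It is used here in place of FQE, which would require fitting an auxiliary Q-function model. The states, actions, rewards, and the uniform behavior policy are hypothetical.

```python
def importance_sampling_estimate(trajectories, policy, gamma=0.99):
    """Ordinary importance sampling estimate of a policy's expected return.

    Each trajectory is a list of (state, action, reward, behavior_prob),
    where behavior_prob is the logging policy's probability of the action.
    """
    total = 0.0
    for trajectory in trajectories:
        weight, discounted_return = 1.0, 0.0
        for t, (state, action, reward, behavior_prob) in enumerate(trajectory):
            weight *= policy[state].get(action, 0.0) / behavior_prob
            discounted_return += (gamma ** t) * reward
        total += weight * discounted_return
    return total / len(trajectories)

def rank_policies(trajectories, candidates, gamma=0.99):
    """Return candidate names sorted from best to worst OPE score, plus scores."""
    scores = {
        name: importance_sampling_estimate(trajectories, policy, gamma)
        for name, policy in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True), scores

# Hypothetical logged data collected under a uniform behavior policy.
logged = [
    [("high_risk", "treat", 1.0, 0.5)],
    [("high_risk", "wait", 0.0, 0.5)],
    [("low_risk", "wait", 1.0, 0.5)],
    [("low_risk", "treat", 0.0, 0.5)],
]
candidates = {
    "policy_a": {"high_risk": {"treat": 1.0}, "low_risk": {"wait": 1.0}},
    "policy_b": {"high_risk": {"wait": 1.0}, "low_risk": {"treat": 1.0}},
}

ranking, scores = rank_policies(logged, candidates)
```

Importance sampling needs no auxiliary model but can have high variance on long trajectories, which is one reason different OPE methods can produce different rankings.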
Recent advancements in computing systems and wireless communications have made healthcare systems more efficient than before. Modern healthcare devices can monitor and manage different health conditions of patients automatically, without any manual intervention from medical professionals. Additionally, the use of implantable medical devices (IMDs), body area networks (BANs), and Internet of Things (IoT) technologies in healthcare systems improves the overall patient monitoring and treatment process. However, these systems are complex in both software and hardware, and balancing security, privacy, and treatment is crucial, as any security or privacy violation can severely affect patients' treatments and overall health conditions. Indeed, the healthcare domain increasingly faces security challenges and threats due to numerous design flaws and the lack of proper security measures in healthcare devices and applications. In this paper, we explore various security and privacy threats to healthcare systems and discuss the consequences of these threats. We present a detailed survey of different potential attacks and discuss their impacts. Furth
Facial recognition powered by Artificial Intelligence has achieved high accuracy in specific scenarios and applications. Nevertheless, it faces significant challenges regarding privacy and identity management, particularly when unknown individuals appear in the operational context. This paper presents the design, implementation, and evaluation of a facial recognition system within a federated learning framework tailored to open-set scenarios. The proposed approach integrates the OpenMax algorithm into federated learning, leveraging the exchange of mean activation vectors and local distance measures to reliably distinguish between known and unknown subjects. Experimental results validate the effectiveness of the proposed solution, demonstrating its potential for enhancing privacy-aware and robust facial recognition in distributed environments.
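The role of mean activation vectors and distance measures in separating known from unknown subjects can be sketched as follows. This is a deliberately simplified stand-in, not OpenMax itself: OpenMax calibrates distances with fitted Weibull models, whereas this sketch applies a fixed Euclidean distance threshold. The identities, activations, and threshold are hypothetical.

```python
import math

def mean_activation_vector(vectors):
    """Element-wise mean of a class's activation vectors."""
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(len(vectors[0]))]

def classify_open_set(activation, class_means, max_distance):
    """Return the nearest known class, or "unknown" if it is too far away."""
    label, distance = min(
        ((name, math.dist(activation, mav)) for name, mav in class_means.items()),
        key=lambda pair: pair[1],
    )
    return label if distance <= max_distance else "unknown"

# Hypothetical activations for two enrolled identities.
training_activations = {
    "alice": [[1.0, 0.0], [0.8, 0.2]],
    "bob": [[0.0, 1.0], [0.2, 0.8]],
}
class_means = {
    name: mean_activation_vector(vectors)
    for name, vectors in training_activations.items()
}

known = classify_open_set([0.85, 0.15], class_means, max_distance=0.3)
stranger = classify_open_set([5.0, 5.0], class_means, max_distance=0.3)
```

In a federated setting, the per-class mean vectors are small summaries that clients could exchange instead of raw face images, which is the property the approach above relies on.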
The COVID-19 pandemic and other ongoing health crises have underscored the need for prompt healthcare services worldwide. The traditional healthcare system, centered around hospitals and clinics, has proven inadequate in the face of such challenges. Intelligent wearable devices, a key part of modern healthcare, leverage Internet of Things technology to collect extensive data related to the environment as well as psychological, behavioral, and physical health. However, managing the substantial data generated by these wearables and other IoT devices in healthcare poses a significant challenge, potentially impeding decision-making processes. Recent interest has grown in applying data analytics for extracting information, gaining insights, and making predictions. Additionally, machine learning, known for addressing various big data and networking challenges, has seen increased implementation to enhance IoT systems in healthcare. This chapter focuses exclusively on exploring the hurdles encountered when integrating ML methods into the IoT healthcare sector. It offers a comprehensive summary of current research challenges and potential opportunities, categorized into three scenarios: IoT
In this paper we discuss a new experimental demonstration that free fall is independent of the properties of bodies (mass, chemical composition, shape, etc.). This is one of the simplest experiments in mechanics, though one of the most important ones, having been repeatedly carried out and rethought by several scientists such as Galileo and Newton. Our version of this famous experiment uses a closed, transparent bottle into which we introduce a feather and a stone, and we observe the two bodies fall simultaneously. Because no vacuum needs to be produced, this version can be repeated by any high-school or university student or teacher in any environment, which demonstrates its feasibility and applicability in the classroom.
Reliance on scanned documents and fax communication for healthcare referrals leads to high administrative costs and errors that may affect patient care. In this work we propose a hybrid model leveraging LayoutLMv3 along with domain-specific rules to identify key patient, physician, and exam-related entities in faxed referral documents. We explore some of the challenges in applying a document understanding model to referrals, which have formats varying by medical practice, and evaluate model performance using MUC-5 metrics to obtain appropriate metrics for the practical use case. Our analysis shows the addition of domain-specific rules to the transformer model yields greatly increased precision and F1 scores, suggesting a hybrid model trained on a curated dataset can increase efficiency in referral management.
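MUC-style scoring assigns each predicted entity to one of five outcomes (correct, partial, incorrect, missing, spurious) and gives half credit for partial matches. The sketch below follows the commonly stated form of these formulas. Whether the evaluation described above uses exactly this weighting is an assumption, and the counts are invented.

```python
def muc_scores(correct, partial, incorrect, missing, spurious):
    """Precision, recall, and F1 with half credit for partial matches."""
    possible = correct + partial + incorrect + missing   # entities in the gold data
    actual = correct + partial + incorrect + spurious    # entities the system produced
    credit = correct + 0.5 * partial
    precision = credit / actual if actual else 0.0
    recall = credit / possible if possible else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for one entity type, e.g. the referring physician's name.
precision, recall, f1 = muc_scores(
    correct=8, partial=2, incorrect=1, missing=1, spurious=1
)
```

Partial credit matters for faxed referrals because OCR noise often yields a nearly correct span, which an exact-match metric would count as a full error.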
The granting process of all credit institutions rejects applicants who seem risky regarding the repayment of their debt. A credit score is calculated and compared with a cut-off value below which an applicant is rejected. Developing a new score requires a learning dataset in which the response variable (good/bad borrower) is known, so rejected applicants are de facto excluded from the learning process. We first introduce the context and some useful notation. Then we formalize whether this particular sampling has consequences for the score's relevance. Finally, we elaborate on methods that use the characteristics of not-financed clients and conclude, using data from Crédit Agricole Consumer Finance, that none of these methods is satisfactory in practice. ----- A credit-granting system may refuse loan applications deemed too risky. Within this system, the credit score provides a value measuring the risk of default, which is compared with an acceptability threshold. This score is built exclusively on data from financed clients, which include in particular the 'good or bad payer' information, whereas it is subsequently applied to all applications. Such a sc
Federated learning is the process of developing machine learning models over datasets distributed across data centers such as hospitals, clinical research labs, and mobile devices while preventing data leakage. This survey examines previous research on federated learning in the healthcare sector across a range of use cases and applications. It identifies the challenges, methods, and applications that a practitioner working on federated learning should be aware of. This paper aims to lay out existing research and list the possibilities of federated learning for the healthcare industry.
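One widely used aggregation rule in federated learning is federated averaging, in which a server combines model parameters from each site, weighted by that site's sample count, so raw records never leave the site. The sketch below shows only the aggregation step; the hospital names, parameter values, and sample counts are hypothetical.

```python
def federated_average(client_updates):
    """Sample-weighted average of client parameter vectors.

    client_updates is a list of (parameters, num_samples) pairs.
    """
    total_samples = sum(n for _, n in client_updates)
    dimension = len(client_updates[0][0])
    return [
        sum(params[i] * n for params, n in client_updates) / total_samples
        for i in range(dimension)
    ]

# Hypothetical locally trained parameters from two hospitals.
hospital_a = ([1.0, 2.0], 100)
hospital_b = ([3.0, 4.0], 300)

global_parameters = federated_average([hospital_a, hospital_b])
```

Sharing parameters instead of records reduces exposure but does not by itself guarantee privacy, which is why federated learning is often paired with additional protections.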
From medical charts to national censuses, healthcare has traditionally operated under a paper-based paradigm. However, the past decade has marked a long and arduous transformation bringing healthcare into the digital age. From electronic health records, to digitized imaging and laboratory reports, to public health datasets, healthcare today generates an incredible amount of digital information. Such a wealth of data presents an exciting opportunity for integrated machine learning solutions to address problems across multiple facets of healthcare practice and administration. Unfortunately, the ability to derive accurate and informative insights requires more than the ability to execute machine learning models. Rather, a deeper understanding of the data on which the models are run is imperative for their success. While a significant effort has been undertaken to develop models able to process the volume of data obtained during the analysis of millions of digitized patient records, it is important to remember that volume represents only one aspect of the data. In fact, drawing on data from an increasingly diverse set of sources, healthcare data presents an incredibly comp
This work explores the use of Role Playing Games (RPG) as an active methodology in teaching Modern Physics, focusing on a game called Newton's Revenge. The game was developed with the aim of engaging students in collaborative and investigative learning processes, using gamification elements to increase motivation and involvement. Based on the constructivist theories of Piaget and Vygotsky, the RPG stimulates cognitive and social development by placing students in the roles of historical science figures. Through contextualized physical challenges, such as understanding the Photoelectric Effect, participants actively construct knowledge. This study presents preliminary learning data obtained through pre- and post-tests, and evaluates students' perceptions of using educational games in science education. The results indicate that the use of RPG can be an effective tool for teaching Modern Physics, promoting greater engagement and understanding of scientific concepts.
In this work, we initiate the study of light-ray operators in four-dimensional de Sitter space focusing on null integrals of the stress tensor. In Minkowski space, the null integral of the stress tensor unifies several ostensibly different constructions, functioning simultaneously as the energy flux operator, the angular contribution to a conserved charge, the averaged null energy operator, and the light transform of the stress tensor. However, we show that the de Sitter analogs of these various interpretations do not necessarily coincide but rather lead to distinct, observer-dependent light-ray operators. We construct four such de Sitter analogs and analyze their matrix elements in a free, conformally coupled scalar theory, showing that they exhibit the expected symmetry and positivity properties.
Machine Learning (ML) has recently shown tremendous success in modeling various healthcare prediction tasks, ranging from disease diagnosis and prognosis to patient treatment. Due to the sensitive nature of medical data, privacy must be considered along the entire ML pipeline, from model training to inference. In this paper, we conduct a review of recent literature concerning Privacy-Preserving Machine Learning (PPML) for healthcare. We primarily focus on privacy-preserving training and inference-as-a-service, and perform a comprehensive review of existing trends, identify challenges, and discuss opportunities for future research directions. The aim of this review is to guide the development of private and efficient ML models in healthcare, with the prospects of translating research efforts into real-world settings.
The LHCb upgrade represents a major change of the experiment. The detectors have been almost completely renewed to allow running at an instantaneous luminosity five times larger than that of the previous running periods. Readout of all detectors into an all-software trigger is central to the new design, facilitating the reconstruction of events at the maximum LHC interaction rate, and their selection in real time. The experiment's tracking system has been completely upgraded with a new pixel vertex detector, a silicon tracker upstream of the dipole magnet and three scintillating fibre tracking stations downstream of the magnet. The whole photon detection system of the RICH detectors has been renewed and the readout electronics of the calorimeter and muon systems have been fully overhauled. The first stage of the all-software trigger is implemented on a GPU farm. The output of the trigger provides a combination of totally reconstructed physics objects, such as tracks and vertices, ready for final analysis, and of entire events which need further offline reprocessing. This scheme required a complete revision of the computing model and rewriting of the experiment's software.
Galaxy clusters are important cosmological probes since their abundance and spatial distribution are directly linked to structure formation on large scales. The principal uncertainty source on the cosmological parameter constraints concerns the cluster mass estimation from mass proxies. In addition, future surveys will provide a large amount of data, requiring an improvement in the accuracy of other elements used in the construction of cluster likelihoods. Therefore, accurate modeling of the mass-observable relations and reducing the effect of different systematic errors are fundamental steps for the success of cluster cosmology. In this work, we briefly review the abundance of galaxy clusters and discuss many sources of uncertainty.
There is strong interest among payers in identifying emerging healthcare cost drivers to support early intervention. However, many challenges arise in analyzing large, high-dimensional, and noisy healthcare data. In this paper, we propose a systematic approach that uses hierarchical and multi-resolution search strategies with enhanced statistical process control (SPC) algorithms to surface high-impact cost drivers. Our approach aims to provide interpretable, detailed, and actionable insights into detected change patterns attributable to multiple demographic and clinical factors. We also propose an algorithm to identify comparable treatment offsets at the population level and quantify the cost impact of their utilization changes.
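The basic idea behind statistical process control can be sketched with a plain Shewhart-style control chart: limits are set from a baseline period, and later observations outside those limits are flagged as changes. This is the textbook version, not the enhanced hierarchical algorithms described above, and the monthly cost figures are hypothetical.

```python
from statistics import mean, stdev

def control_limits(baseline, k=3.0):
    """Lower and upper limits at k sample standard deviations from the mean."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - k * spread, center + k * spread

def out_of_control(series, baseline, k=3.0):
    """Indices of observations that fall outside the baseline control limits."""
    lower, upper = control_limits(baseline, k)
    return [i for i, value in enumerate(series) if value < lower or value > upper]

# Hypothetical monthly cost per member for one service category.
baseline_costs = [100, 102, 98, 101, 99, 100]
monitored_costs = [101, 99, 103, 112, 100]

flagged = out_of_control(monitored_costs, baseline_costs)
```

A hierarchical search would run a check like this at several levels, for example by service category and then by demographic group, to localize where a detected change originates.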