Qualitative data is often subjective, rich, and consists of in-depth information normally presented in the form of words. Analysing qualitative data entails reading a large number of transcripts, looking for similarities or differences, and subsequently finding themes and developing categories. Traditionally, researchers ‘cut and paste’ and use coloured pens to categorise data. More recently, software specifically designed for qualitative data management has reduced the technical burden and eased this laborious task, making the process considerably easier. A number of computer software packages have been developed to mechanise this ‘coding’ process as well as to search and retrieve data. This paper illustrates the ways in which NVivo can be used in the qualitative data analysis process. The basic features and primary tools of NVivo which assist qualitative researchers in managing and analysing their data are described.
On synthetic data: a brief introduction for data protection law dummies. Synthetic data is attracting increasing attention from technicians and legal scholars in recent years. This is especially noticeable among entities and people working on data-driven technologies, particularly in the artificial intelligence application development and testing sector, where sheer volumes of data are needed. In these circles, synthetic data has become a growing trend under the "fake it till you make it" concept, promising to alleviate existing data access and analytics challenges while respecting data protection rules. Given the rising prospects and acceptance of data synthesis, there is a need to assess the legal implications of its generation and use, the starting point being the legal qualification of synthetic data. Synthetic data is a broad concept encompassing both personally and non-personally identifiable information. In this blog entry, we focus on the intersection between synthetic data and personal data. The reason for doing so is that generating synthetic data from personal data (including hybrid data) provides a more straightforward assessment and is more suitable for the introductory purposes of this blog entry. We acknowledge that issues surrounding the qualification as personal data of existing models and background knowledge used in data synthesis may be particularly relevant to this topic. However, these issues will not be dealt with in this entry. The present blog post hence narrows down to delimiting the notion of synthetic data generated from personal data and studying its legal qualification within the European data protection framework. Three main
In the dynamic field of modern artificial intelligence, GPT-4 emerges as a key participant, addressing challenges similar to Big Data's 5Vs—Volume, Velocity, Variety, Veracity, and Value. This study explores the convergence of GPT-4's operational framework with the core aspects of Big Data, highlighting the model's flexibility and efficacy in handling intricate datasets. GPT-4 excels in managing extensive textual data, aligning with Big Data's voluminous nature, and demonstrates real-time processing capabilities to match the rapid evolution of Big Data. While initially text-oriented, GPT-4 expands into image recognition, enhancing versatility and aligning with Big Data's Variety aspect. The model's evolving proficiency in non-textual domains broadens its utility. Addressing Veracity, GPT-4 critically evaluates diverse training data, mirroring Big Data's challenges in ensuring accuracy. Its outputs, offering context and insights, contribute to actionable knowledge and align with Big Data's objectives. Despite differences, GPT-4 serves as a microcosm, providing scalable and accessible data processing capabilities, establishing itself as a crucial tool in the AI domain. This paper emphasizes the parallels and underscores GPT-4's adaptability in handling complex datasets.
Big data is a collection of massive and complex data sets, encompassing huge quantities of data, data management capabilities, social media analytics, and real-time data. Big data analytics is the process of examining these large amounts of heterogeneous digital data. Big data concerns data volume, with large data sets measured in terms of terabytes or petabytes. In this paper, we present the 5Vs characteristics of big data and the techniques and technologies used to handle it. The challenges include capturing, analysis, storage, searching, sharing, visualization, transferring, and privacy violations. Big data can neither be worked upon using traditional SQL queries nor stored in a relational database management system (RDBMS). However, a wide variety of scalable database tools and techniques have evolved. Hadoop, an open-source distributed data processing framework, is one of the most prominent and well-known solutions. NoSQL systems provide non-relational databases, with MongoDB among the best-known examples.
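As a rough illustration of the processing model that Hadoop popularized (not taken from the paper; the data and function names are made up), the sketch below runs a MapReduce-style word count in plain Python on a single machine. Hadoop distributes the same map and reduce phases across a cluster and shuffles intermediate pairs by key between them.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    """Reduce: sum counts per key (Hadoop shuffles by key before this step)."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

# Toy input standing in for blocks of a large distributed file
records = ["big data needs new tools", "big data tools scale out"]
print(reduce_phase(map_phase(records)))
# {'big': 2, 'data': 2, 'needs': 1, 'new': 1, 'tools': 2, 'scale': 1, 'out': 1}
```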
Data mining technology can search for potentially valuable knowledge in large amounts of data; it is mainly divided into data preparation, data mining proper, and the expression and analysis of results. It is a mature information-processing technology that builds on database technology. Database technology is the software science that researches, manages, and applies databases: the data in a database are processed and analyzed by studying the underlying theory and implementation of database structure, storage, design, management, and application. We introduce several databases and data mining techniques to help a wide range of clinical researchers better understand and apply database technology.
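To make the database side concrete, here is a minimal, hypothetical example (not from the paper; the table and column names are invented) of storing and querying a toy clinical table with Python's built-in sqlite3 module:

```python
import sqlite3

# In-memory database standing in for a hypothetical clinical research table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER, age INTEGER, hba1c REAL)")
conn.executemany("INSERT INTO patients VALUES (?, ?, ?)",
                 [(1, 54, 6.1), (2, 67, 7.8), (3, 49, 8.2)])

# A simple analysis query: mean HbA1c for patients over 50
row = conn.execute(
    "SELECT COUNT(*), AVG(hba1c) FROM patients WHERE age > 50"
).fetchone()
print(f"n={row[0]}, mean HbA1c={row[1]:.2f}")
conn.close()
```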
The damage identification process provides relevant information about the current state of a structure under inspection, and it can be approached from two different points of view. The first approach uses data-driven algorithms, which are usually associated with the collection of data using sensors; data are subsequently processed and analyzed. The second approach uses models to analyze information about the structure. In the latter case, the overall performance of the approach is associated with the accuracy of the model and the information that is used to define it. Although both approaches are widely used, data-driven algorithms are preferred in most cases because they afford the ability to analyze data acquired from sensors and to provide a real-time solution for decision making; however, these approaches require high-performance processors because of their high computational cost. As a contribution to the researchers working with data-driven algorithms and applications, this work presents a brief review of data-driven algorithms for damage identification in structural health-monitoring applications. This review covers damage detection, localization, classification, extension, and prognosis, as well as the development of smart structures. The literature is systematically reviewed according to the natural steps of a structural health-monitoring system. This review also includes information on the types of sensors used as well as on the development of data-driven algorithms for damage identification.
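As an illustrative sketch of the data-driven approach (the features, values, and model choice below are hypothetical, not taken from the review), a classifier can be trained on simulated vibration-derived features to flag damage:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Simulated sensor features: modal frequencies (Hz) that shift slightly
# downward when stiffness is lost due to damage
healthy = rng.normal(loc=[12.0, 34.0, 55.0], scale=0.3, size=(200, 3))
damaged = rng.normal(loc=[11.4, 33.1, 54.2], scale=0.3, size=(200, 3))
X = np.vstack([healthy, damaged])
y = np.array([0] * 200 + [1] * 200)   # 0 = healthy, 1 = damaged

# Train a detector on labeled examples, evaluate on a held-out split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC().fit(X_tr, y_tr)
print("damage-detection accuracy:", clf.score(X_te, y_te))
```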
BACKGROUND: An array of environmental compounds is known to possess endocrine disruption (ED) potentials. Bisphenol A (BPA) and bisphenol A dimethacrylate (BPA-DM) are monomers used to a high extent in the plastic industry and as dental sealants. Alkylphenols such as 4-n-nonylphenol (nNP) and 4-n-octylphenol (nOP) are widely used as surfactants. OBJECTIVES: We investigated the in vitro effect of these four compounds on four key cell mechanisms: transactivation of a) the human estrogen receptor (ER), b) the human androgen receptor (AR), and c) the aryl hydrocarbon receptor (AhR), and d) aromatase activity. RESULTS: All four compounds inhibited aromatase activity and were agonists and antagonists of ER and AR, respectively. nNP increased AhR activity concentration-dependently and further enhanced the AhR response to 2,3,7,8-tetrachlorodibenzo-p-dioxin. nOP caused a dual response, with weakly increased AhR activity at lower concentrations (10⁻⁸ M) and decreased activity at higher concentrations (10⁻⁵ to 10⁻⁴ M). AhR activity was inhibited by BPA (10⁻⁵ to 10⁻⁴ M) and weakly increased by BPA-DM (10⁻⁵ M). nNP showed the highest relative potency (REP) compared with the respective controls in the ER, AhR, and aromatase assays, whereas similar REPs were observed for the four chemicals in the AR assay. CONCLUSION: Our in vitro data clearly indicate that the four industrial compounds have ED potentials and that their effects can be mediated via several cellular pathways, including the two sex steroid hormone receptors (ER and AR), aromatase activity converting testosterone to estrogen, and AhR, which is involved in the synthesis of steroids and the metabolism of steroids and xenobiotic compounds.
While magnetic resonance imaging (MRI) data is itself 3D, it is often difficult to adequately present the results in papers and slides in 3D. As a result, findings of MRI studies are often presented in 2D instead. A solution is to create figures that include perspective and can convey 3D information; such figures can sometimes be produced by standard functional magnetic resonance imaging (fMRI) analysis packages and related specialty programs. However, many options cannot provide functionality such as visualizing activation clusters that are both cortical and subcortical (i.e., a 3D glass brain), producing several statistical maps with an identical perspective in the 3D rendering, or creating animated renderings. Here I detail an approach for creating 3D visualizations of MRI data that satisfies all of these criteria. Though 3D 'glass brain' renderings can sometimes be difficult to interpret, they are useful in showing a more holistic representation of the results, whereas the traditional slices show a more local view. Combined, presenting both 2D and 3D representations of MR images can provide a more comprehensive view of a study's findings.
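One generic way to produce both views described above is with nilearn's plotting functions (this is a sketch, not necessarily the author's pipeline; "zstat.nii.gz" is a placeholder path for a statistical map assumed to be in MNI space):

```python
# pip install nilearn
from nilearn import plotting

stat_img = "zstat.nii.gz"  # hypothetical thresholded statistical map

# Traditional 2D slice view: detailed but local
plotting.plot_stat_map(stat_img, threshold=3.1, title="Slices")

# 3D 'glass brain' projection: cortical and subcortical clusters at once
plotting.plot_glass_brain(stat_img, threshold=3.1, title="Glass brain")
plotting.show()
```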
As a proposed Internet architecture, Named Data Networking (NDN) is designed to network the world of computing devices by naming data instead of naming data containers as IP does today. With this change, NDN brings a number of benefits to network communication, including built-in multicast, in-network caching, multipath forwarding, and securing data directly. NDN also enables resilient communication in intermittently connected and mobile ad hoc environments, which is difficult to achieve by today's TCP/IP architecture. This paper offers a brief introduction to NDN's basic concepts and operations, together with an extensive reference list for the design and development of NDN for readers interested in further exploration of the subject.
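To convey the flavor of naming data and in-network caching, here is a toy, purely conceptual Python model (not a real NDN stack; see implementations such as NFD or python-ndn for the actual protocol). A consumer expresses an Interest for a data name; any router holding a cached copy can satisfy it without reaching the producer:

```python
# Conceptual toy model of NDN Interest/Data exchange with in-network caching.

class Router:
    def __init__(self, name, upstream=None):
        self.name = name
        self.upstream = upstream     # next hop toward the data producer
        self.content_store = {}     # in-network cache: data name -> data

    def express_interest(self, data_name):
        # Satisfy from the local content store if possible (built-in caching)
        if data_name in self.content_store:
            print(f"{self.name}: cache hit for {data_name}")
            return self.content_store[data_name]
        # Otherwise forward the Interest upstream and cache the returned Data
        data = self.upstream.express_interest(data_name)
        self.content_store[data_name] = data
        return data

class Producer:
    def express_interest(self, data_name):
        print(f"producer: serving {data_name}")
        return f"<data for {data_name}>"

edge = Router("edge", upstream=Router("core", upstream=Producer()))
edge.express_interest("/videos/intro/seg0")  # travels to the producer
edge.express_interest("/videos/intro/seg0")  # served from the edge cache
```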
Forest management inventories (FMIs) provide critical information, usually at the stand level, for forest management planning. A typical FMI includes (i) the delineation of the inventory area to stands by applying auxiliary information; (ii) the classification of the stands according to categorical attributes such as age, site fertility, main tree species, and stand development; and (iii) measurement, modelling, and prediction of stand attributes of interest. The emergence of wall-to-wall remote-sensing data has enabled a paradigm change in FMIs from highly subjective, visual assessments to objective, model-based inferences. Previously, optical remote-sensing data were used to complement visual assessments, especially in stand delineation and height measurements. The evolution of airborne laser scanning (ALS) has made objective estimation of forest characteristics with known accuracy possible. New optical and Lidar-based sensors and platforms will allow further improvements of accuracy. However, there are still bottlenecks related to species-specific stand attribute information in mixed stands and assessments of tree quality. Here, we concentrate on approaches and methods that have been applied in the Nordic countries in particular.
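A minimal sketch of the model-based idea behind ALS-assisted inventories (all metrics, coefficients, and values below are simulated, not drawn from any Nordic FMI): field-measured plots link ALS canopy metrics to a stand attribute, and the fitted model then predicts that attribute wall-to-wall.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Hypothetical plot-level ALS metrics: mean canopy height (m), canopy cover (0-1)
h_mean = rng.uniform(8, 28, size=100)
cover = rng.uniform(0.3, 0.95, size=100)
# Simulated field-measured stand volume (m^3/ha) loosely driven by both metrics
volume = 12 * h_mean + 150 * cover + rng.normal(0, 20, size=100)

X = np.column_stack([h_mean, cover])
model = LinearRegression().fit(X, volume)

# Predict volume for a new stand covered only by wall-to-wall ALS data
print(model.predict([[21.5, 0.8]]))
```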
Nowadays, partly driven by many Web 2.0 applications, more and more social network data has been made publicly available and analyzed in one way or another. Privacy preserving publishing of social network data is becoming an increasingly important concern. In this paper, we present a brief yet systematic review of the existing anonymization techniques for privacy preserving publishing of social network data. We identify the new challenges in privacy preserving publishing of social network data compared with the extensively studied relational case, and examine possible problem formulations along three important dimensions: privacy, background knowledge, and data utility. We survey the existing anonymization methods for privacy preservation in two categories: clustering-based approaches and graph modification approaches.
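As a small illustration of the graph-modification side (the example graph and threshold are arbitrary), the following checks k-degree anonymity, one property such methods aim to enforce so that no node is re-identifiable by its degree alone:

```python
# Toy k-degree anonymity check using networkx (pip install networkx).
from collections import Counter
import networkx as nx

def is_k_degree_anonymous(graph, k):
    """Every degree value must be shared by at least k nodes."""
    degree_counts = Counter(d for _, d in graph.degree())
    return all(count >= k for count in degree_counts.values())

g = nx.karate_club_graph()
print(is_k_degree_anonymous(g, 2))  # False: some nodes have unique degrees

# Graph-modification methods add or remove edges to merge rare degree
# classes until the property holds, trading utility for privacy.
g.add_edge(11, 12)
print(is_k_degree_anonymous(g, 2))
```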
In image and video data, a visual pattern is a recurring composition of visual primitives. Such visual patterns capture the essence of the image and video data and convey rich information. However, unlike frequent patterns in transaction data, there are considerable visual content variations and complex spatial structures among visual primitives, which make effective exploration of visual patterns a challenging task. Many methods have been proposed to address the problem of visual pattern discovery during the past decade. In this article, we provide a review of the major progress in visual pattern discovery. We categorize the existing methods into two groups: bottom-up pattern discovery and top-down pattern modeling. Bottom-up pattern discovery starts with unordered visual primitives and merges them until larger visual patterns are found. In contrast, the top-down method starts with a model of visual primitive compositions and then infers the pattern discovery result. A summary of related applications is also presented, and we close by identifying open issues for future research. WIREs Data Mining Knowl Discov 2014, 4:24–37. doi: 10.1002/widm.1110
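A minimal sketch in the bottom-up spirit (the descriptors here are random stand-ins for real local features such as SIFT or ORB): local primitives are quantized into a "visual vocabulary", and recurring word compositions then hint at larger patterns.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical 128-d local descriptors pooled from many images
descriptors = rng.normal(size=(500, 128))

# Quantize the primitives into a small vocabulary of "visual words"
kmeans = KMeans(n_clusters=20, n_init=10, random_state=0).fit(descriptors)

# Bag-of-visual-words histogram for one image's 40 descriptors;
# frequently co-occurring words are candidates for merging into patterns
image_desc = descriptors[:40]
words = kmeans.predict(image_desc)
print(np.bincount(words, minlength=20))
```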
Today, in data science, large, complex, structured or unstructured, and heterogeneous data has gained significant attention. Such data is being generated at a very rapid pace from disparate sources: sensors, scientific instruments, and the internet, especially social media, to name a few. The velocity at which this volume of data expands poses serious challenges for existing data processing systems. In this paper, we review some of the modern data models that claim to process extremely large data, such as Big Data of petabyte size, in a reliable and efficient way, and that are the leading contributors in the NoSQL (largely translated as 'not only SQL') era.
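To show why document-oriented NoSQL models suit such heterogeneous data, here is a short pymongo sketch (it assumes a MongoDB server running on localhost:27017; the database and field names are hypothetical): the two records share no schema, which a relational table would not tolerate without redesign.

```python
# pip install pymongo; assumes a local MongoDB server.
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
events = client["demo"]["events"]

# Schema-flexible documents: records need not share fields
events.insert_one({"type": "click", "user": "u1", "page": "/home"})
events.insert_one({"type": "sensor", "device": 42, "reading": 21.7})

print(events.count_documents({"type": "click"}))
client.close()
```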
In this Section: 1. Brief Table of Contents 2. Full Table of Contents

1. BRIEF TABLE OF CONTENTS
Chapter 1: Introduction
Chapter 2: A Guide to Statistical Techniques: Using the Book
Chapter 3: Review of Univariate and Bivariate Statistics
Chapter 4: Cleaning Up Your Act: Screening Data Prior to Analysis
Chapter 5: Multiple Regression
Chapter 6: Analysis of Covariance
Chapter 7: Multivariate Analysis of Variance and Covariance
Chapter 8: Profile Analysis: The Multivariate Approach to Repeated Measures
Chapter 9: Discriminant Analysis
Chapter 10: Logistic Regression
Chapter 11: Survival/Failure Analysis
Chapter 12: Canonical Correlation
Chapter 13: Principal Components and Factor Analysis
Chapter 14: Structural Equation Modeling
Chapter 15: Multilevel Linear Modeling
Chapter 16: Multiway Frequency Analysis

2. FULL TABLE OF CONTENTS
Chapter 1: Introduction
  Multivariate Statistics: Why?; Some Useful Definitions; Linear Combinations of Variables; Number and Nature of Variables to Include; Statistical Power; Data Appropriate for Multivariate Statistics; Organization of the Book
Chapter 2: A Guide to Statistical Techniques: Using the Book
  Research Questions and Associated Techniques; Some Further Comparisons; A Decision Tree; Technique Chapters; Preliminary Check of the Data
Chapter 3: Review of Univariate and Bivariate Statistics
  Hypothesis Testing; Analysis of Variance; Parameter Estimation; Effect Size; Bivariate Statistics: Correlation and Regression; Chi-Square Analysis
Chapter 4: Cleaning Up Your Act: Screening Data Prior to Analysis
  Important Issues in Data Screening; Complete Examples of Data Screening
Chapter 5: Multiple Regression
  General Purpose and Description; Kinds of Research Questions; Limitations to Regression Analyses; Fundamental Equations for Multiple Regression; Major Types of Multiple Regression; Some Important Issues; Complete Examples of Regression Analysis; Comparison of Programs
Chapter 6: Analysis of Covariance
  General Purpose and Description; Kinds of Research Questions; Limitations to Analysis of Covariance; Fundamental Equations for Analysis of Covariance; Some Important Issues; Complete Example of Analysis of Covariance; Comparison of Programs
Chapter 7: Multivariate Analysis of Variance and Covariance
  General Purpose and Description; Kinds of Research Questions; Limitations to Multivariate Analysis of Variance and Covariance; Fundamental Equations for Multivariate Analysis of Variance and Covariance; Some Important Issues; Complete Examples of Multivariate Analysis of Variance and Covariance; Comparison of Programs
Chapter 8: Profile Analysis: The Multivariate Approach to Repeated Measures
  General Purpose and Description; Kinds of Research Questions; Limitations to Profile Analysis; Fundamental Equations for Profile Analysis; Some Important Issues; Complete Examples of Profile Analysis; Comparison of Programs
Chapter 9: Discriminant Analysis
  General Purpose and Description; Kinds of Research Questions; Limitations to Discriminant Analysis; Fundamental Equations for Discriminant Analysis; Types of Discriminant Analysis; Some Important Issues; Comparison of Programs
Chapter 10: Logistic Regression
  General Purpose and Description; Kinds of Research Questions; Limitations to Logistic Regression Analysis; Fundamental Equations for Logistic Regression; Types of Logistic Regression; Some Important Issues; Complete Examples of Logistic Regression; Comparison of Programs
Chapter 11: Survival/Failure Analysis
  General Purpose and Description; Kinds of Research Questions; Limitations to Survival Analysis; Fundamental Equations for Survival Analysis; Types of Survival Analysis; Some Important Issues; Complete Example of Survival Analysis; Comparison of Programs
Chapter 12: Canonical Correlation
  General Purpose and Description; Kinds of Research Questions; Limitations; Fundamental Equations for Canonical Correlation; Some Important Issues; Complete Example of Canonical Correlation; Comparison of Programs
Chapter 13: Principal Components and Factor Analysis
  General Purpose and Description; Kinds of Research Questions; Limitations; Fundamental Equations for Factor Analysis; Major Types of Factor Analysis; Some Important Issues; Complete Example of FA; Comparison of Programs
Chapter 14: Structural Equation Modeling
  General Purpose and Description; Kinds of Research Questions; Limitations to Structural Equation Modeling; Fundamental Equations for Structural Equations Modeling; Some Important Issues; Complete Examples of Structural Equation Modeling Analysis; Comparison of Programs
Chapter 15: Multilevel Linear Modeling
  General Purpose and Description; Kinds of Research Questions; Limitations to Multilevel Linear Modeling; Fundamental Equations; Types of MLM; Some Important Issues; Complete Example of MLM; Comparison of Programs
Chapter 16: Multiway Frequency Analysis
  General Purpose and Description; Kinds of Research Questions; Limitations to Multiway Frequency Analysis; Fundamental Equations for Multiway Frequency Analysis; Some Important Issues; Complete Example of Multiway Frequency Analysis; Comparison of Programs
Deep convolutional neural networks have performed remarkably well on many Computer Vision tasks. However, these networks are heavily reliant on big data to avoid overfitting. Overfitting refers to the phenomenon in which a network learns a function with very high variance so as to perfectly model the training data. Unfortunately, many application domains, such as medical image analysis, do not have access to big data. This survey focuses on Data Augmentation, a data-space solution to the problem of limited data. Data Augmentation encompasses a suite of techniques that enhance the size and quality of training datasets such that better Deep Learning models can be built using them. The image augmentation algorithms discussed in this survey include geometric transformations, color space augmentations, kernel filters, mixing images, random erasing, feature space augmentation, adversarial training, generative adversarial networks, neural style transfer, and meta-learning. The application of augmentation methods based on GANs is heavily covered in this survey. In addition to augmentation techniques, this paper briefly discusses other characteristics of Data Augmentation such as test-time augmentation, resolution impact, final dataset size, and curriculum learning. This survey presents existing methods for Data Augmentation, promising developments, and meta-level decisions for implementing Data Augmentation. Readers will understand how Data Augmentation can improve the performance of their models and expand limited datasets to take advantage of the capabilities of big data.
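A few of the surveyed augmentation families can be composed in a handful of lines; the sketch below uses torchvision (pip install torch torchvision), with "cat.jpg" as a placeholder image path and arbitrary parameter values:

```python
from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                 # geometric
    transforms.RandomRotation(degrees=15),                  # geometric
    transforms.ColorJitter(brightness=0.4, contrast=0.4),   # color space
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.5),                        # random erasing
])

img = Image.open("cat.jpg")   # hypothetical training image
augmented = augment(img)      # yields a new randomized tensor on each call
print(augmented.shape)
```

Applying such a pipeline during training effectively multiplies the dataset, since each epoch sees a different random variant of every image.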
BACKGROUND: The increased adoption of the internet, social media, wearable devices, e-health services, and other technology-driven services in medicine and healthcare has led to the rapid generation of various types of digital data, providing a valuable data source beyond the confines of traditional clinical trials, epidemiological studies, and lab-based experiments. METHODS: We provide a brief overview of the types and sources of real-world data (RWD) and the common models and approaches used to utilize and analyze them. We discuss the challenges and opportunities of using real-world data for evidence-based decision making. This review does not aim to be comprehensive or to cover all aspects of this intriguing topic (from either the research or the practical perspective) but serves as a primer and provides useful sources for readers interested in the subject. RESULTS AND CONCLUSIONS: Real-world data hold great potential for generating real-world evidence for designing and conducting confirmatory trials and for answering questions that may not be addressed otherwise. The voluminosity and complexity of real-world data also call for the development of more appropriate, sophisticated, and innovative data processing and analysis techniques, while maintaining scientific rigor in research findings and attending to data ethics, to harness the power of real-world data.
A general inductive approach for analysis of qualitative evaluation data is described. The purposes for using an inductive approach are to (a) condense raw textual data into a brief, summary format; (b) establish clear links between the evaluation or research objectives and the summary findings derived from the raw data; and (c) develop a framework of the underlying structure of experiences or processes that are evident in the raw data. The general inductive approach provides an easily used and systematic set of procedures for analyzing qualitative data that can produce reliable and valid findings. Although the general inductive approach is not as strong as some other analytic strategies for theory or model development, it does provide a simple, straightforward approach for deriving findings in the context of focused evaluation questions. Many evaluators are likely to find using a general inductive approach less complicated than using other approaches to qualitative data analysis.
Recently there has been a lot of interest in “ensemble learning” — methods that generate many classifiers and aggregate their results. Two well-known methods are boosting (see, e.g., Schapire et al., 1998) and bagging (Breiman, 1996) of classification trees. In boosting, successive trees give extra weight to points incorrectly predicted by earlier predictors, and in the end a weighted vote is taken for prediction. In bagging, successive trees do not depend on earlier trees; each is independently constructed using a bootstrap sample of the data set, and in the end a simple majority vote is taken for prediction. Breiman (2001) proposed random forests, which add an additional layer of randomness to bagging. In addition to constructing each tree using a different bootstrap sample of the data, random forests change how the classification or regression trees are constructed. In standard trees, each node is split using the best split among all variables. In a random forest, each node is split using the best among a subset of predictors randomly chosen at that node. This somewhat counterintuitive strategy turns out to perform very well compared to many other classifiers, including discriminant analysis, support vector machines, and neural networks, and is robust against overfitting (Breiman, 2001). In addition, it is very user-friendly in the sense that it has only two parameters (the number of variables in the random subset at each node and the number of trees in the forest) and is usually not very sensitive to their values. The randomForest package provides an R interface to the Fortran programs by Breiman and Cutler (available at http://www.stat.berkeley.edu/users/breiman/). This article provides a brief introduction to the usage and features of these R functions.
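The article itself covers the R randomForest functions; as a rough cross-language analogue (not the package described), scikit-learn's implementation exposes the same two key parameters, the number of trees and the number of variables tried at each split:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(
    n_estimators=500,   # number of trees in the forest (R: ntree)
    max_features=2,     # variables randomly tried per split (R: mtry)
    random_state=0,
)
print(cross_val_score(clf, X, y, cv=5).mean())
```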
Recent developments in sensing and communication technologies have made the collection of large amounts of traffic data easy, and transportation engineering has entered the big data era. This massive traffic data provides good opportunities for Intelligent Transportation Systems (ITS), while also posing great challenges because of its characteristic value, variety, velocity, veracity, and volume. In recent years, tensor decomposition has played an important role in traffic data analytics and has attracted great interest from both academia and industry. In this paper, the preliminary background and implementation of tensor decomposition are presented. Then, recent studies of tensor decomposition for traffic data imputation, traffic state prediction, and travel pattern analysis are reviewed, and their advantages and disadvantages are discussed. Finally, remaining challenges in applying tensor decomposition to transportation engineering are pointed out.
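As a minimal illustration of CP (CANDECOMP/PARAFAC) decomposition on a traffic-style tensor, the sketch below uses the tensorly library (pip install tensorly) on random toy data shaped like a sensor-by-day-by-hour array; real studies use far larger tensors with missing entries:

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

rng = np.random.default_rng(0)
traffic = tl.tensor(rng.random((30, 7, 24)))   # sensors x weekdays x hours

# Rank-3 CP model: each factor matrix captures one mode's latent patterns
weights, factors = parafac(traffic, rank=3)
print([f.shape for f in factors])   # [(30, 3), (7, 3), (24, 3)]

# Low-rank reconstruction, the basis for tasks such as imputing missing readings
approx = tl.cp_to_tensor((weights, factors))
print(float(tl.norm(traffic - approx) / tl.norm(traffic)))
```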
In recent years, there has been a growing interest in utilizing symptom-network models to study psychopathology and relevant risk factors, such as cognitive and physical health. Various methodological approaches can be employed by researchers analyzing cross-sectional and panel data (i.e., several time points over an extended period). This paper provides an overview of some commonly used analytical tools, including moderated network models, network comparison tests, cross-lagged network analysis, and panel graphical vector-autoregression (VAR) models. Using an easily accessible dataset (easySHARE), this study demonstrates the use of different analytical approaches when investigating (a) the association between mental health and cognitive functioning, and (b) the role of chronic disease in mediating or moderating this association. This multiverse analysis showcases both converging and diverging evidence from different analytical avenues. These findings underscore the importance of multiverse investigations to increase transparency and communicate the extent to which conclusions depend on analytical choices.
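As a sketch of the kind of network model underlying these tools, the following estimates an unregularized partial-correlation (Gaussian graphical model) network from simulated symptom data; applied work, including the analyses this paper describes, typically adds regularization such as the graphical lasso and uses dedicated packages:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 4))     # 500 respondents, 4 symptom variables
data[:, 1] += 0.6 * data[:, 0]       # induce one true symptom-symptom edge

# Partial correlations come from the standardized inverse covariance matrix
prec = np.linalg.inv(np.cov(data, rowvar=False))
d = np.sqrt(np.diag(prec))
partial_corr = -prec / np.outer(d, d)
np.fill_diagonal(partial_corr, 0.0)

print(np.round(partial_corr, 2))     # edge weights of the symptom network
```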