Big Data Clinical Study and Its Implementation with R
Editors: Zhongheng Zhang, Fionn Murtagh, Sven Van Poucke
Publisher: AME Publishing Company; 1st edition (2018)
ISBN-13: 978-9887784081
Hardcover: 233 pages
Language: English
With the increasing availability of big data, there is an urgent need for more studies of best practices for handling such data. This book has six chapters. Chapter 1 provides an overview of big data clinical research, including perspectives on the field, the general workflow for accessing data, a brief review of machine learning methods, and data acquisition and management. Chapter 2 discusses exploratory data analysis and data management. It focuses on the missing data problem, which is frequently encountered in clinical studies, by introducing a number of methods and their applications. It first covers missing data exploration and data reshaping and aggregating, then introduces several imputation methods, including single imputation, multiple imputation, and multivariate imputation. Chapter 3 discusses variable selection methods for both parametric and non-parametric models that are commonly used in clinical studies. It also discusses diagnostic methods and introduces a useful R package for drawing nomograms. Chapter 4 discusses the analysis of survival data. This chapter illustrates the application of both parametric and semi-parametric models, as well as the competing risk model. Chapter 5 discusses several commonly used unsupervised and supervised machine learning methods, including k-nearest neighbors, naïve Bayes classification, decision trees, and neural networks. Chapter 6 addresses a number of other important statistical areas that have applications in clinical studies, for example hierarchical cluster analysis and its visualization with R, causal mediation analysis, structural equation modeling, and the case-crossover design.
Honorary Editors
Michael W. Kattan | Department of Quantitative Health Sciences, Cleveland Clinic Foundation, Cleveland, Ohio, USA |
Cheng Zheng | Joseph J. Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, USA |
Editors
Zhongheng Zhang | Department of Emergency Medicine, Sir Run-Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou 310016, China |
Fionn Murtagh | Professor of Data Science, University of Huddersfield, Huddersfield, UK |
Sven Van Poucke | Department of Anesthesia, Critical Care, Emergency Medicine and Pain Therapy, Ziekenhuis Oost-Limburg, Genk 3600, Belgium |
Table of Contents
Preface
Preface
Perspectives on Big Data Clinical Research
1 Big data and clinical research: perspective from a clinician
7 Big data and clinical research: focusing on the area of critical care medicine in mainland China
11 Accessing critical care big data: a step by step approach
16 When doctors meet with AlphaGo: potential application of machine learning to clinical medicine
18 Release of the national healthcare big data in China: a historic leap in clinical research
20 Data management by using R: big data clinical research series
Data Management
26 Missing values in big data research: some basic skills
31 Missing data exploration: highlighting graphical presentation of missing patterns
38 Reshaping and aggregating data: an introduction to reshape package
43 Missing data imputation: focusing on single imputation
50 Multiple imputation with multivariate imputation by chained equation (MICE) package
55 Multiple imputation for time series data with Amelia package
64 Univariate description and bivariate statistical inference: the first step delving into data
Model Building Strategy
71 Model building strategy for logistic regression: purposeful selection
78 Variable selection with stepwise and best subset approaches
83 Multivariable fractional polynomial method for regression model
89 Residuals and regression diagnostics: focusing on logistic regression
96 Propensity score method: a non-parametric technique to reduce model dependence
104 Drawing Nomograms with R: applications to categorical outcome and survival data
Survival Analysis
113 Statistical description for survival data
120 Parametric regression model for survival data: Weibull regression model as an example
128 Semi-parametric regression model for survival data: graphical visualization with R
136 Survival analysis in the presence of competing risks
Machine Learning
145 Introduction to machine learning: k-nearest neighbors
152 Naïve Bayes classification in R
157 Decision tree modeling using R
165 A gentle introduction to artificial neural networks
170 Neural networks: further insights into error function, generalized weights and others
Others
176 Hierarchical cluster analysis in clinical research with heterogeneous study population: highlighting its visualization with R
187 Causal mediation analysis in the context of clinical research
197 Structural equation modeling in the context of clinical research
208 Case-crossover design and its implementation in R