Data Wrangling with R
Preface
Who should read this
What You Need For this Book
Conventions used in this book
Feedback
Acknowledgments
Software information
I Introduction
1
The Role of Data Wrangling
2
Introduction to the R Language
2.1
Open Source
2.2
Flexibility
2.3
Community
3
Getting Started with R & RStudio
3.1
Installing R and RStudio
3.2
Understanding the Console
3.2.1
Script Editor
3.2.2
Workspace Environment
3.2.3
Console
3.2.4
Misc. Displays
3.2.5
Workspace Options & Shortcuts
3.2.6
Exercises
3.3
Getting Help
3.3.1
General Help
3.3.2
Getting Help on Functions
3.3.3
Getting Help From the Web
3.4
Working with Packages
3.4.1
Installing Packages
3.4.2
Loading Packages
3.4.3
Getting Help on Packages
3.4.4
Useful Packages
3.4.5
Exercises
3.5
Assignment & Evaluation
3.5.1
Assignment
3.5.2
Evaluation
3.5.3
Case Sensitivity
3.5.4
Exercises
3.6
R as a Calculator
3.6.1
Basic Arithmetic
3.6.2
Miscellaneous Mathematical Functions
3.6.3
Infinite and NaN Numbers
3.6.4
Exercises
3.7
Vectorization
3.7.1
Looping versus Vectorization
3.7.2
Recycling
3.7.3
Exercises
3.8
Style Guide
3.8.1
Notation and Naming
3.8.2
Organization
3.8.3
Syntax
3.8.4
Exercises
II Working with Different Types of Data in R
4
Dealing with Numbers
4.1
Numeric Types (integer vs. double)
4.1.1
Creating Integer and Double Vectors
4.1.2
Checking for Numeric Type
4.1.3
Converting Between Integer and Double Values
4.2
Generating Non-random Numbers
4.2.1
Specifying Numbers within a Sequence
4.2.2
Generating Regular Sequences
4.2.3
Generating Repeated Sequences
4.3
Generating Random Numbers
4.3.1
Uniform numbers
4.3.2
Normal Distribution Numbers
4.3.3
Binomial Distribution Numbers
4.3.4
Poisson Distribution Numbers
4.3.5
Exponential Distribution Numbers
4.3.6
Gamma Distribution Numbers
4.4
Setting Seed Values
4.5
Comparing Numeric Values
4.5.1
Comparison Operators
4.5.2
Exact Equality
4.5.3
Floating Point Comparison
4.6
Rounding Numeric Values
4.7
Exercises
5
Dealing with Character Strings
5.1
Character string basics
5.1.1
Creating Strings
5.1.2
Converting to Strings
5.1.3
Printing Strings
5.1.4
Counting string elements and characters
5.2
String manipulation with base R
5.2.1
Case conversion
5.2.2
Simple Character Replacement
5.2.3
String Abbreviations
5.2.4
Extract/Replace Substrings
5.3
String manipulation with stringr
5.3.1
Basic Operations
5.3.2
Duplicate Characters within a String
5.3.3
Remove Leading and Trailing Whitespace
5.3.4
Pad a String with Whitespace
5.4
Set operations for character strings
5.4.1
Set Union
5.4.2
Set Intersection
5.4.3
Identifying Different Elements
5.4.4
Testing for Element Equality
5.4.5
Testing for Exact Equality
5.4.6
Identifying if Elements are Contained in a String
5.4.7
Sorting a String
5.5
Exercises
6
Dealing with Regular Expressions
6.1
Regex Syntax
6.1.1
Metacharacters
6.1.2
Sequences
6.1.3
Character classes
6.1.4
POSIX character classes
6.1.5
Quantifiers
6.2
Regex Functions in Base R
6.2.1
Pattern Finding Functions
6.2.2
Pattern Replacement Functions
6.2.3
Splitting Character Vectors
6.3
Regex Functions with stringr
6.3.1
Detecting Patterns
6.3.2
Locating Patterns
6.3.3
Extracting Patterns
6.3.4
Replacing Patterns
6.3.5
String Splitting
6.4
Additional Resources
6.5
Exercises
7
Dealing with Factors
8
Dealing with Dates
III Managing Data Structures in R
9
Data Structure Basics
10
Managing Vectors
11
Managing Lists
12
Managing Matrices
13
Managing Arrays
14
Managing Data Frames
15
Dealing with Missing Values
IV Importing, Scraping, and Exporting Data in R
16
Basic I/O
17
Scraping Data
V Control Flow in R
18
Choice Flow
19
Iteration with Loops
20
Iteration with Functional Programming
VI Efficient Programming in R
21
Writing Functions
22
Parallel Computing
23
Rcpp
VII Efficient Workflow in R
24
Running R scripts
25
Project-oriented Workflow
26
Efficient & Reproducible Deliverables
27
Version Control
VIII Shaping and Transforming Your Data with R
28
Simplify Your Code with the Pipe (%>%) Operator
29
Reshaping Your Data with tidyr
30
Transform Your Data with dplyr
31
Optimizing Speed with dtplyr
32
Exploratory Data Analysis
IX Data Wrangling Beyond Your Local CPU
33
Communicating with Local Databases
34
Communicating with Distributed Cluster-computing Frameworks
X Case Studies
35
Case Study 1
36
Case Study 2
37
Case Study 3
References
Written with bookdown
Data Wrangling with R
27
Version Control