4 Dealing with Numbers

In this chapter you will learn the basics of working with numbers in R. This includes understanding how to manage the numeric type (integer vs. double), the different ways of generating non-random and random numbers, how to set seed values for reproducible random number generation, and the different ways to compare and round numeric values.

4.1 Numeric Types (integer vs. double)

The two most common numeric types used in R are integer and double (for double precision floating point numbers). R automatically converts between these two types when needed for mathematical operations. As a result, it’s possible to use R and perform analyses for years without ever having to distinguish between them.

4.1.1 Creating Integer and Double Vectors

By default, when you create a numeric vector using the c() function it will produce a vector of double precision numeric values. To create a vector of integers using c() you must specify this explicitly by placing an L directly after each number.
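A minimal sketch of both forms. The object names dbl_var and int_var are only illustrative.

```r
# plain numbers inside c() produce a double vector
dbl_var <- c(1, 2.5, 4.5)
dbl_var
## [1] 1.0 2.5 4.5

# an L after each number produces an integer vector
int_var <- c(1L, 6L, 10L)
int_var
## [1]  1  6 10
```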

4.1.2 Checking for Numeric Type

To check whether a vector is made up of integer or double values:
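One way to do this in base R is with typeof(), which reports the underlying storage type, or with the is.integer() and is.double() predicates. The vectors below are illustrative.

```r
dbl_var <- c(1, 2.5, 4.5)
int_var <- c(1L, 6L, 10L)

# typeof() returns the storage type as a string
typeof(dbl_var)
## [1] "double"
typeof(int_var)
## [1] "integer"

# the is.*() predicates return a single TRUE or FALSE
is.integer(dbl_var)
## [1] FALSE
is.double(dbl_var)
## [1] TRUE

# is.numeric() is TRUE for both integer and double vectors
is.numeric(int_var)
## [1] TRUE
```

Note that class(dbl_var) reports "numeric" rather than "double", which is why typeof() is the more precise check here.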

4.1.3 Converting Between Integer and Double Values

By default, if you read in data that has no decimal points or you create numeric values using the x <- 1:10 method, the numeric values will be coded as integers. If you want to change an integer to a double you can use as.double() (or its equivalent, as.numeric()), and to change a double to an integer you can use as.integer().
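A short sketch of the three conversion functions. The values are illustrative; note that as.integer() truncates the decimal part rather than rounding it.

```r
int_var <- 1:3                 # the colon operator produces integers
typeof(int_var)
## [1] "integer"

# converts integers to double precision values
as.double(int_var)
## [1] 1 2 3

# as.numeric() is identical to as.double()
as.numeric(int_var)
## [1] 1 2 3

# converts doubles to integers by dropping the decimal part (1.7 becomes 1, not 2)
dbl_var <- c(1.7, 2.2, 3.9)
as.integer(dbl_var)
## [1] 1 2 3
```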

Although values converted with as.double() or as.numeric() do not print out a decimal point, if you checked the type of the object with typeof(as.double(int_var)) you would in fact see that it is a double floating point.

4.2 Generating Non-random Numbers

There are a few R operators and functions that are especially useful for creating vectors of non-random numbers. These functions provide multiple ways for generating sequences of numbers.

4.2.1 Specifying Numbers within a Sequence

To explicitly specify numbers in a sequence, you can use the colon : operator to generate all integers between two specified numbers, or the combine c() function to explicitly list every number in the sequence.
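A quick illustration of both approaches; the particular numbers are arbitrary.

```r
# every integer from 1 to 10, inclusive
x <- 1:10
x
##  [1]  1  2  3  4  5  6  7  8  9 10

# the colon operator also counts down
10:7
## [1] 10  9  8  7

# c() lists the exact numbers you want, in any spacing
y <- c(1, 3, 7, 12)
y
## [1]  1  3  7 12
```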

4.2.3 Generating Repeated Sequences

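The rep() function allows you to conveniently repeat specified constants into long vectors. It supports both collated repetitions, where the whole vector is repeated in order (the times argument), and non-collated repetitions, where each element is repeated in place (the each argument).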
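A sketch of the main rep() arguments, using small illustrative inputs.

```r
# repeat a single constant
rep(5, times = 3)
## [1] 5 5 5

# collated: repeat the whole vector
rep(1:3, times = 2)
## [1] 1 2 3 1 2 3

# non-collated: repeat each element in place
rep(1:3, each = 2)
## [1] 1 1 2 2 3 3

# length.out recycles the vector up to an exact length
rep(1:3, length.out = 7)
## [1] 1 2 3 1 2 3 1
```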

4.3 Generating Random Numbers

Simulation is a common practice in data analysis. Sometimes your analysis calls for a statistical procedure that relies on random number generation or sampling (e.g. Monte Carlo simulation, bootstrap sampling, etc.). R comes with a set of pseudo-random number generators that allow you to simulate the most common probability distributions, such as the uniform (runif()), normal (rnorm()), binomial (rbinom()), Poisson (rpois()), exponential (rexp()), and gamma (rgamma()) distributions.

4.3.1 Uniform Numbers

To generate random numbers from a uniform distribution you can use the runif() function. Alternatively, you can use sample() to take a random sample with or without replacement.

For example, to generate 25 random numbers between the values 0 and 10:
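A minimal sketch of both options. The exact values will differ on every run unless a seed is set (covered in the Setting Seed Values section), so no output is shown.

```r
# 25 continuous values drawn uniformly between 0 and 10
u <- runif(25, min = 0, max = 10)

# 25 whole numbers drawn from 0:10 with replacement (values can repeat)
s <- sample(0:10, size = 25, replace = TRUE)

# without replacement, each value can be drawn at most once
w <- sample(1:10, size = 5, replace = FALSE)
```

runif() returns doubles with decimal parts, whereas sample() only returns values from the vector you give it, so choose sample() when you need whole numbers.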

For each probability distribution there are four primary functions available: random number generation, density (aka probability mass function for discrete distributions), cumulative distribution, and quantiles. The prefixes for these functions are:

  • r: random number generation
  • d: density or probability mass function
  • p: cumulative distribution
  • q: quantiles
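The same four prefixes apply to the uniform distribution, which makes it a convenient place to check them because the answers are easy to verify by hand. A short sketch for a uniform distribution between 0 and 10:

```r
# r: five random draws
runif(5, min = 0, max = 10)

# d: the density is constant at 1 / (10 - 0) inside the interval
dunif(5, min = 0, max = 10)
## [1] 0.1

# p: probability that a draw is less than or equal to 5
punif(5, min = 0, max = 10)
## [1] 0.5

# q: the inverse of p; the value below which 50% of draws fall
qunif(0.5, min = 0, max = 10)
## [1] 5
```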

4.3.2 Normal Distribution Numbers

The normal (or Gaussian) distribution is the most common and well-known distribution. Within R, the normal distribution functions share the norm suffix: rnorm(), dnorm(), pnorm(), and qnorm().

For example, to generate 25 random numbers from a normal distribution with mean = 100 and standard deviation = 15:
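A minimal sketch using rnorm(). Because no seed is set, the values change on each run, so no output is shown; with only 25 draws the sample mean and standard deviation will only be roughly 100 and 15.

```r
# 25 draws from a normal distribution with mean 100 and standard deviation 15
x <- rnorm(25, mean = 100, sd = 15)

# summarize the draws
summary(x)
sd(x)
```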

You can also pass a vector of values. For instance, say you want to know the CDF probabilities for each value in the vector x created above:
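A self-contained sketch that regenerates an illustrative x (the seed value 42 is arbitrary) and passes the whole vector to pnorm(). Because the function is vectorized, it returns one probability per element.

```r
set.seed(42)                        # arbitrary seed so the draws are reproducible
x <- rnorm(25, mean = 100, sd = 15)

# P(X <= value) for each element of x, where X ~ Normal(100, 15)
p <- pnorm(x, mean = 100, sd = 15)
length(p)
## [1] 25

# reference points: the mean sits at the 50th percentile,
# and qnorm() is the inverse of pnorm()
pnorm(100, mean = 100, sd = 15)
## [1] 0.5
qnorm(0.5, mean = 100, sd = 15)
## [1] 100
```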

4.4 Setting Seed Values

If you want to generate a sequence of random numbers and then be able to reproduce that same sequence later, you can set the seed of the random number generator with set.seed(). This is a critical aspect of reproducible research.

For example, we can reproduce a random generation of 10 values from a normal distribution:
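A sketch of the idea; the seed value 197 is arbitrary. Resetting the seed reproduces the draws exactly, while drawing again without resetting continues the stream and gives new values. Reproducibility assumes the same R version and random number generator settings (see RNGkind()).

```r
set.seed(197)
first <- rnorm(10)

set.seed(197)              # reset to the same seed
second <- rnorm(10)

identical(first, second)   # same seed, same draws
## [1] TRUE

third <- rnorm(10)         # no reset: the generator continues from where it left off
identical(first, third)
## [1] FALSE
```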

4.5 Comparing Numeric Values

There are multiple ways to compare numeric values and vectors. These include comparison operators as well as tests for exact equality and near equality.

4.5.1 Comparison Operators

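The standard binary comparison operators allow you to compare numeric values and provide the answer in logical form:

  • x < y: less than
  • x > y: greater than
  • x <= y: less than or equal to
  • x >= y: greater than or equal to
  • x == y: exactly equal to
  • x != y: not equal to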

These operations can be used for single number comparison:
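For example, with two illustrative single values:

```r
x <- 9
y <- 10

x == y
## [1] FALSE
x != y
## [1] TRUE
x < y
## [1] TRUE
x >= 9
## [1] TRUE
```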

and also for comparison of numbers within vectors:
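With vectors of equal length the comparison is made element by element; comparing against a single number checks every element against that number. The vectors are illustrative.

```r
x <- c(1, 4, 9, 12)
y <- c(4, 4, 9, 13)

x == y
## [1] FALSE  TRUE  TRUE FALSE
x < y
## [1]  TRUE FALSE FALSE  TRUE

# a single number is compared with every element
x > 5
## [1] FALSE FALSE  TRUE  TRUE
```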

Note that in arithmetic the logical values TRUE and FALSE are treated as 1 and 0, respectively. So if you want to identify the number of equal values in two vectors you can wrap the operation in the sum() function:
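A sketch using the same illustrative vectors; mean() applies the same idea to give the proportion of matches instead of the count.

```r
x <- c(1, 4, 9, 12)
y <- c(4, 4, 9, 13)

# number of positions where x and y are equal
sum(x == y)
## [1] 2

# proportion of positions where x and y are equal
mean(x == y)
## [1] 0.5
```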

If you need to identify the location of pairwise equalities in two vectors you can wrap the operation in the which() function:
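For the same illustrative vectors, which() returns the indices of the TRUE elements:

```r
x <- c(1, 4, 9, 12)
y <- c(4, 4, 9, 13)

# positions where x and y are equal
which(x == y)
## [1] 2 3

# the indices can then be used to pull out the matching values
x[which(x == y)]
## [1] 4 9
```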

4.5.3 Floating Point Comparison

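Sometimes you wish to test for ‘near equality’, because many decimal values cannot be stored exactly as floating point numbers. The all.equal() function allows you to test for equality with a default tolerance of 1.5e-8 (the value of sqrt(.Machine$double.eps)).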
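A classic illustration: 0.1 + 0.2 is not stored as exactly 0.3, so == fails while all.equal() succeeds.

```r
# exact comparison fails because of floating point representation error
0.1 + 0.2 == 0.3
## [1] FALSE

# all.equal() ignores differences smaller than the tolerance
all.equal(0.1 + 0.2, 0.3)
## [1] TRUE

# the default tolerance
sqrt(.Machine$double.eps)
## [1] 1.490116e-08
```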

If the difference is greater than the tolerance level, the function will return the mean relative difference as a character string rather than FALSE:
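A sketch of this behavior. Because the result is not always a logical value, wrapping all.equal() in isTRUE() is a common way to use it safely inside if() statements.

```r
# 1 and 1.1 differ by far more than the tolerance
all.equal(1, 1.1)
## [1] "Mean relative difference: 0.1"

# isTRUE() turns the result into a plain TRUE/FALSE
isTRUE(all.equal(1, 1.1))
## [1] FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))
## [1] TRUE
```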

4.7 Exercises

  1. Generate a sequence of non-random numbers from 1 to 100 by increments of 2. Save the output to an object x.
  2. Generate 50 random numbers between 0 and 100 with a uniform distribution. Set the seed to 123 so you can reproduce the same numbers. Save the output to an object y.
  3. Round y to the nearest integer.
  4. Compare x to y element-wise to find out how many of the x values are less than the corresponding y elements.