R basics part 1

A. Data types

The basic data types in R are logical, integer, double, complex and character. By default, any number is considered a double; to specify an integer explicitly, append the letter L:

a <- 26
typeof(a)

## [1] "double"

b <- 26L
typeof(b)

## [1] "integer"
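As a quick check of the five basic types listed above, the following sketch calls typeof() on one literal of each. The list and its element names (values, flag, count, ratio, z, label) are illustrative only.

```r
# One literal of each basic type; the names are illustrative
values <- list(flag = TRUE, count = 26L, ratio = 26, z = 3+2i, label = "abc")

# typeof() applied to each element returns the type names as strings
types <- sapply(values, typeof)
print(types)
```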

Logical values are written TRUE and FALSE. For numerical data types, the usual arithmetic operators and precedence rules apply as in most languages. Note that / always performs a floating-point division, even between two integers; the integer operators are %/% for the integer division and %% for its remainder.

23/3

## [1] 7.666667

23L/3L

## [1] 7.666667

23L %/% 3L

## [1] 7

23L %% 3L

## [1] 2
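The quotient and remainder are linked by the usual identity: quotient times divisor plus remainder gives back the dividend. A minimal sketch, with illustrative variable names, that also shows how R treats a negative dividend (the quotient is rounded towards minus infinity, so the remainder takes the sign of the divisor):

```r
a <- 23L
b <- 3L
q <- a %/% b   # integer quotient: 7
r <- a %% b    # remainder: 2
q * b + r == a # TRUE

# Negative dividend: the quotient is rounded down, not towards zero
(-23L) %/% 3L  # -8
(-23L) %% 3L   # 1
```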

The power can be written with two different notations:

5**4

## [1] 625

5^4

## [1] 625

Character literals are delimited by single or double quotes:

"message"

## [1] "message"

'message'

## [1] "message"

Comparison operators enable the computation of logical values:

6 < 45

## [1] TRUE

23 >= 45

## [1] FALSE

Logical operators work as usual:

34 < 5 && 6 > 7

## [1] FALSE

34 < 23 || 7 > 5

## [1] TRUE

!(23 < 4)

## [1] TRUE

Variable identifiers are made of letters, dots (.), underscores (_), and digits. The first character must be a letter or a dot, and a leading dot must not be followed by a digit. The assignment operator can be written <- (preferred) or =.

my.age <- 27
code <- 34+my.age*3
name <- "Robert"

One central data structure in R is the atomic vector. An atomic vector is a sequence of values that are all of the same basic type. Vectors can be generated by the function c() (short for combine):

ages <- c(23,24,45,21,34,65,43,77,12,14,24)
ages[3]

## [1] 45

ages[6]

## [1] 65
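Because all elements of an atomic vector share one type, c() converts its arguments to a common type when they differ. The sketch below illustrates this, together with two other common ways of indexing; the names mixed_num and mixed_chr are illustrative.

```r
mixed_num <- c(1L, 2.5)        # integer combined with double gives double
typeof(mixed_num)              # "double"

mixed_chr <- c(1, "a", TRUE)   # anything combined with a character gives character
mixed_chr                      # "1" "a" "TRUE"

ages <- c(23,24,45,21,34,65,43,77,12,14,24)
ages[c(1, 3)]                  # several positions at once: 23 45
ages[-1]                       # every element except the first
length(ages)                   # 11
```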

Vectors are pervasive in R: many functions accept either a single value or a vector as input. Given a vector, they usually apply to each element and return a vector of the results (functions apply component-wise):

sqrt(64)

## [1] 8

sqrt(c(1,4,9,16))

## [1] 1 2 3 4

sin(ages)

##  [1] -0.8462204 -0.9055784  0.8509035  0.8366556  0.5290827  0.8268287
##  [7] -0.8317747  0.9995202 -0.5365729  0.9906074 -0.9055784

The arithmetic and logical operators also apply component-wise to vectors:

v <- c(2,45,3,4,1,90,233)
v/2

## [1]   1.0  22.5   1.5   2.0   0.5  45.0 116.5

v+5

## [1]   7  50   8   9   6  95 238

c(1,2,3,4)*c(3,2,3,2)

## [1] 3 4 9 8

c(1,2,3,4)*c(3,2,3) # lengths differ: the shorter vector is recycled, with a warning

## Warning in c(1, 2, 3, 4) * c(3, 2, 3): longer object length is not a multiple of
## shorter object length

## [1]  3  4  9 12

ages < 27

##  [1]  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
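The product of vectors of lengths 4 and 3 above still returns a result because R recycles the shorter operand, reusing its elements from the start. A minimal sketch of the same rule in the cases where no warning is issued (the name long_vec is illustrative):

```r
long_vec <- c(1, 2, 3, 4)

# Length 2 divides length 4: the shorter vector is reused twice, silently
long_vec * c(10, 100)   # 10 200 30 400

# A single value is a vector of length 1, recycled over every element
long_vec + 1            # 2 3 4 5
```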

Note that real linear algebra operators are written in a special way to avoid any ambiguity. For instance, the matrix product is %*%.
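To make the distinction concrete, here is a small sketch comparing * and %*% on a 2 x 2 matrix; the matrix m is an arbitrary example.

```r
m <- matrix(c(1, 2, 3, 4), nrow = 2)  # filled column by column: rows are (1, 3) and (2, 4)

elementwise <- m * m     # each entry squared: rows (1, 9) and (4, 16)
product     <- m %*% m   # matrix product: rows (7, 15) and (10, 22)

elementwise
product
```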

Component-wise logical operators between logical vectors are written with a single symbol. Compare:

ages<33 & ages>20

##  [1]  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

ages<33 && ages>20 # only uses the first element of each vector (R 4.2 warns; R >= 4.3.0 raises an error)

## Warning in ages < 33 && ages > 20: 'length(x) = 11 > 1' in coercion to
## 'logical(1)'

## Warning in ages < 33 && ages > 20: 'length(x) = 11 > 1' in coercion to
## 'logical(1)'

## [1] TRUE
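When a single TRUE or FALSE is needed from a whole vector, one possible approach is to reduce the component-wise result explicitly with all() or any(). A short sketch, reusing the ages vector from above (the name in_range is illustrative):

```r
ages <- c(23,24,45,21,34,65,43,77,12,14,24)
in_range <- ages < 33 & ages > 20   # component-wise, one value per element

all(in_range)    # FALSE: not every age lies in the range
any(in_range)    # TRUE: at least one does
sum(in_range)    # 4: TRUE counts as 1 when summed
ages[in_range]   # the matching ages: 23 24 21 24
```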

Why bother with && and || for scalars, which R treats as vectors of length 1 anyway? We could always use & and |.

There is indeed a difference: && and || short-circuit, that is, they evaluate the right-hand operand only when the left-hand one does not already determine the result. For instance, in an expression like a<10 && c>23, evaluating c>23 is unnecessary as soon as a<10 is FALSE. With & and |, both operands are always evaluated.
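The difference can be observed with a function that records each time it is called. In this sketch, noisy() and the counter calls are hypothetical helpers introduced only for the demonstration.

```r
calls <- 0

# Hypothetical helper: counts how many times it is evaluated, then returns its argument
noisy <- function(value) {
  calls <<- calls + 1
  value
}

res_scalar <- FALSE && noisy(TRUE)  # right-hand side is skipped
calls_after_scalar <- calls         # still 0

res_vector <- FALSE & noisy(TRUE)   # both sides are evaluated
calls_after_vector <- calls         # now 1
```

Both expressions return FALSE; only the number of evaluations differs, which matters when the right-hand side is expensive or would fail (for example, a test on an object that may not exist).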

B. Elementary plotting and summarizing

R has been designed to facilitate statistical computations and graphical display of statistical data. It comes with a powerful graphical package and all kinds of standard statistical functions.

# generate 1000 pseudo-random numbers following a normal distribution with mean 2 and sd 1.5
rand.num <- rnorm(1000, mean=2, sd=1.5)
hist(rand.num)

summary(rand.num)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -3.2378  0.9297  1.9049  1.9125  2.9887  6.3051

mean(rand.num)

## [1] 1.912537

sd(rand.num)

## [1] 1.52041

median(rand.num)

## [1] 1.904866

quantile(rand.num,probs=0.25)

##      25% 
## 0.929745

Check the help pages of the seven functions above (for instance with ?quantile) to learn about their many options and parameters.
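Since rnorm() draws pseudo-random numbers, the figures above change on every run. A minimal sketch of how the draw could be made reproducible with set.seed(), and of quantile() computing several quantiles at once; the seed value 42 and the names sample1, sample2 and quartiles are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed, chosen only for illustration
sample1 <- rnorm(1000, mean = 2, sd = 1.5)

set.seed(42)  # resetting the same seed reproduces the same draw
sample2 <- rnorm(1000, mean = 2, sd = 1.5)
identical(sample1, sample2)  # TRUE

# probs accepts a vector: first quartile, median and third quartile in one call
quartiles <- quantile(sample1, probs = c(0.25, 0.5, 0.75))
quartiles
```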

Jacques Colinge

11/29/2021
