A. Data types

The basic data types in R are logical, integer, double, complex and character (a sixth type, raw, holds bytes and is rarely needed). By default, a numeric literal is a double; to specify an integer explicitly, append the letter L:

a <- 26
typeof(a)
## [1] "double"
b <- 26L
typeof(b)
## [1] "integer"
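
As a quick check of the other basic types, the following sketch calls typeof() on one literal of each kind and converts a double to an integer. The variable names (flag, z, word, n) are illustrative only.

```r
# One literal per remaining basic type (names are illustrative)
flag <- TRUE
typeof(flag)
## [1] "logical"
z <- 2+3i
typeof(z)
## [1] "complex"
word <- "R"
typeof(word)
## [1] "character"

# as.integer() converts a double to an integer, truncating toward zero
n <- as.integer(26.9)
n
## [1] 26
typeof(n)
## [1] "integer"
```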

Logical values are written TRUE and FALSE. Numerical data types follow the usual arithmetic operators and rules found in most languages, except that / always performs a real division, even between two integers. Integer division is written %/% and its remainder %%.

23/3
## [1] 7.666667
23L/3L
## [1] 7.666667
23L %/% 3L
## [1] 7
23L %% 3L
## [1] 2
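
The two operators also accept negative and non-integer operands. The sketch below illustrates the convention R follows: the quotient is rounded toward minus infinity, so the remainder takes the sign of the divisor. The names q and r are illustrative.

```r
# Integer division rounds toward minus infinity
q <- (-7) %/% 2
q
## [1] -4
# The remainder has the sign of the divisor
r <- (-7) %% 2
r
## [1] 1
# Quotient and remainder together reconstruct the dividend
2 * q + r
## [1] -7
# Both operators also work on non-integer doubles
5.5 %/% 2
## [1] 2
5.5 %% 2
## [1] 1.5
```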

Exponentiation can be written with two notations (^ is the usual one; the parser translates ** to ^):

5**4
## [1] 625
5^4
## [1] 625

Character literals are delimited by single or double quotes:

"message"
## [1] "message"
'message'
## [1] "message"

Comparison operators enable the computation of logical values:

6 < 45
## [1] TRUE
23 >= 45
## [1] FALSE

Logical operators work as usual:

34 < 5 && 6 > 7
## [1] FALSE
34 < 23 || 7 > 5
## [1] TRUE
!(23 < 4)
## [1] TRUE

Variable identifiers are made of letters, digits, dots (.) and underscores (_). The first character must be a letter or a dot, and a leading dot must not be followed by a digit. The assignment operator can be written <- (preferred) or =.

my.age <- 27
code <- 34+my.age*3
name <- "Robert"

One central data structure in R is the atomic vector. An atomic vector is a sequence of values that are all of the same basic type. Vectors can be generated by the function c() (short for combine):

ages <- c(23,24,45,21,34,65,43,77,12,14,24)
ages[3]
## [1] 45
ages[6]
## [1] 65
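
Because all elements of an atomic vector share one type, c() coerces mixed inputs to the most general type among them. The sketch below shows this, along with length() and indexing by a vector of positions (indices start at 1). The name mixed is illustrative, and ages is redefined only so that the block runs on its own.

```r
# Mixed inputs are coerced: logical -> integer -> double -> character
typeof(c(TRUE, 2L))
## [1] "integer"
typeof(c(2L, 3.5))
## [1] "double"
mixed <- c(1, "a", TRUE)
mixed
## [1] "1"    "a"    "TRUE"

# Same vector as above, repeated so the block is self-contained
ages <- c(23,24,45,21,34,65,43,77,12,14,24)
length(ages)
## [1] 11
# Several elements can be extracted at once with a vector of positions
ages[c(1, 3)]
## [1] 23 45
```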

R is vectorized: many functions accept either a single value or a vector as input. Given a vector, they usually apply to each element and return a vector of the results (functions apply component-wise):

sqrt(64)
## [1] 8
sqrt(c(1,4,9,16))
## [1] 1 2 3 4
sin(ages)
##  [1] -0.8462204 -0.9055784  0.8509035  0.8366556  0.5290827  0.8268287
##  [7] -0.8317747  0.9995202 -0.5365729  0.9906074 -0.9055784

The arithmetic and logical operators also apply component-wise to vectors:

v <- c(2,45,3,4,1,90,233)
v/2
## [1]   1.0  22.5   1.5   2.0   0.5  45.0 116.5
v+5
## [1]   7  50   8   9   6  95 238
c(1,2,3,4)*c(3,2,3,2)
## [1] 3 4 9 8
c(1,2,3,4)*c(3,2,3) # lengths differ: the shorter vector is recycled, with a warning
## Warning in c(1, 2, 3, 4) * c(3, 2, 3): longer object length is not a multiple
## of shorter object length
## [1]  3  4  9 12
ages < 27
##  [1]  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
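
When the two operands differ in length, R recycles the shorter one, repeating it until it matches the longer one. This is also what happens in v+5, where 5 is a vector of length 1. A minimal sketch with illustrative names (long, short): no warning is issued when the longer length is a multiple of the shorter one.

```r
long <- c(1, 2, 3, 4)
short <- c(10, 100)
# short is recycled as 10, 100, 10, 100
product <- long * short
product
## [1]  10 200  30 400
```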

Note that true linear algebra operators have their own notation to avoid any ambiguity with the component-wise operators. For instance, the matrix product is written %*%.
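
To make the distinction concrete, the sketch below applies both operators to the same small matrix. The name m is illustrative; matrix() fills the matrix column by column.

```r
m <- matrix(1:4, nrow = 2)
m
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
# Component-wise product: each entry is multiplied by itself
elementwise <- m * m
elementwise
##      [,1] [,2]
## [1,]    1    9
## [2,]    4   16
# Matrix product: rows of the left operand times columns of the right one
matprod <- m %*% m
matprod
##      [,1] [,2]
## [1,]    7   15
## [2,]   10   22
```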

Component-wise logical operators between logical vectors are written with a single symbol (& and |). Compare:

ages<33 & ages>20
##  [1]  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
ages<33 && ages>20 # uses only the first element of each vector (warning in R 4.2, error since R 4.3)
## Warning in ages < 33 && ages > 20: 'length(x) = 11 > 1' in coercion to
## 'logical(1)'

## Warning in ages < 33 && ages > 20: 'length(x) = 11 > 1' in coercion to
## 'logical(1)'
## [1] TRUE
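
When a single logical value is needed from a whole vector, a safer route than && is to build the component-wise result with & and then reduce it explicitly. The sketch below uses the base functions any(), all() and sum(), and also uses the logical vector as an index. The name in.range is illustrative, and ages is redefined only so that the block runs on its own.

```r
ages <- c(23,24,45,21,34,65,43,77,12,14,24)
in.range <- ages < 33 & ages > 20
# Is at least one age in the range?
any(in.range)
## [1] TRUE
# Are all ages in the range?
all(in.range)
## [1] FALSE
# TRUE counts as 1, so sum() gives the number of matches
sum(in.range)
## [1] 4
# A logical vector used as an index keeps the elements marked TRUE
ages[in.range]
## [1] 23 24 21 24
```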

Why bother using && and || for scalars, which R regards as vectors of length 1 anyway? We could always use & and |.

There is indeed a difference: && and || short-circuit, that is, they evaluate their right operand only when the result is not already determined by the left one, which avoids unnecessary computations. For instance, in an expression like a<10 && c>23, evaluating c>23 is useless as soon as a<10 is FALSE. This optimization does not happen with & and |, which always evaluate both operands.
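
The difference can be observed with a right operand that leaves a trace when it is evaluated. In the sketch below, noisy() and the counter calls are hypothetical helpers written for this illustration.

```r
calls <- 0
noisy <- function() {
  calls <<- calls + 1  # record that the function was evaluated
  TRUE
}
# && stops after the left operand: noisy() is never called
FALSE && noisy()
## [1] FALSE
calls
## [1] 0
# & evaluates both operands: noisy() is called once
FALSE & noisy()
## [1] FALSE
calls
## [1] 1
```

Short-circuiting is also what makes a guard such as length(x) > 0 && x[1] > 0 safe to write: the second test runs only when the first one holds.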


B. Elementary plotting and summarizing

R has been designed to facilitate statistical computations and graphical display of statistical data. It comes with a powerful graphical package and all kinds of standard statistical functions.

rand.num <- rnorm(1000, mean=2, sd=1.5) # draws 1000 pseudo-random numbers from a normal distribution with mean 2 and standard deviation 1.5
hist(rand.num)

summary(rand.num)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -3.2378  0.9297  1.9049  1.9125  2.9887  6.3051
mean(rand.num)
## [1] 1.912537
sd(rand.num)
## [1] 1.52041
median(rand.num)
## [1] 1.904866
quantile(rand.num,probs=0.25)
##      25% 
## 0.929745

Check the help pages of the seven functions above (for instance with ?rnorm or help(quantile)) to learn about their many options and parameters.
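
Since rnorm() draws a new sample at every call, the numbers printed above will differ from one run to the next. A minimal sketch of how to make such a sample reproducible, assuming an arbitrary seed value of 42 and illustrative names (x1, x2, quartiles); it also shows that quantile() accepts several probabilities at once.

```r
# Fixing the seed makes the pseudo-random sequence repeatable
set.seed(42)
x1 <- rnorm(1000, mean = 2, sd = 1.5)
set.seed(42)
x2 <- rnorm(1000, mean = 2, sd = 1.5)
identical(x1, x2)
## [1] TRUE

# probs can be a vector: here the three quartiles in one call
quartiles <- quantile(x1, probs = c(0.25, 0.5, 0.75))
names(quartiles)
## [1] "25%" "50%" "75%"
```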