1 Prerequisites

Before starting this assignment, you should have completed the following prerequisites:

2 Objective: Create a notebook to analyse the biometric data of students

I recommend the following workflow for all your assignments in this course:

  1. Create a dedicated RStudio project for each assignment, so that you can keep all files organized in one place.
  2. Use RMarkdown (https://rmarkdown.rstudio.com/) to create a notebook that combines text, code, and graphics. This allows you to document your analysis and results in a clear and reproducible way.

2.1 Download the data

Go to the GitHub repository https://github.com/DenisMot/RStudio-for-HMS-Template

Download the repository as a zip file and unzip it into a directory named Series00-RStudio.

You should obtain the following directory structure:

Series00-RStudio
  ├── main.Rmd                 # the example RMarkdown file 
  ├── main.Rproj               # the project file 
  ├── results/                 # folder for results of the analysis 
  └── data/                    # folder with data used as input 
      └── test_data/           # folder with public test data 
         └── students.csv      # the data file

What you need is the students.csv file, which contains the data to analyse.

NB: You can safely delete the main.Rmd and main.Rproj files and the results folder, as none of them is needed for this assignment.

2.2 Organize your files on your computer

You now have the data file, but you still lack:

  • the RStudio project, so that you can work in a dedicated environment for this job
  • the RMarkdown file to run the analysis.

In the RStudio interface you will create a new project and a new RMarkdown file:

  • Create a new project
    • named Series00-RStudio
    • in the Series00-RStudio directory.
    • open the project in a new RStudio window
      • VERIFY: Series00-RStudio.Rproj is now visible in the Files tab.
  • Create an RMarkdown file
    • file named Series00-RStudio.rmd.
    • in the Series00-RStudio directory.
      • VERIFY: Series00-RStudio.rmd is now visible in the Files tab.

You now have the following directory structure:

Series00-RStudio
  ├── Series00-RStudio.Rproj   # the project file       
  ├── Series00-RStudio.rmd     # the notebook file 
  └── data                     # folder with data used as input 
      └── test_data            # folder with public test data 
         └── students.csv      # the data file
       

You are ready to analyze the data.
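As an optional check, the layout above can be verified from the R console. The following is a minimal sketch in base R, assuming the console's working directory is the project root (which is the case when the project is open in RStudio); the helper name find_missing is hypothetical and used only for this illustration.

```r
# Return the expected paths that do not exist under `root`.
# `find_missing` is a hypothetical helper name used only in this sketch.
find_missing <- function(root, expected) {
  expected[!file.exists(file.path(root, expected))]
}

# Files listed in the directory structure above
expected_files <- c(
  "Series00-RStudio.Rproj",
  "Series00-RStudio.rmd",
  file.path("data", "test_data", "students.csv")
)

# Run from the project root; an empty result means nothing is missing
print(find_missing(".", expected_files))
```

If a path is printed, compare it with the Files tab: the most likely causes are a file saved in the wrong folder or a project opened from a different directory.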

2.3 Analyse the students’ data

To analyze the data, you will now add explanatory text and R code to the file Series00-RStudio.rmd, in order to produce the requested boxplot of student weights.

IMPORTANT: You must activate Copilot in RStudio to get code suggestions. Copilot will help you learn data analysis with R, because it makes good suggestions after each comment line that describes what you would like to do. In other words, write the comment first, so that Copilot can understand what you want to do.

2.3.1 Read the data file

# read the students data file and show the first lines
library(tidyverse)
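
Copilot's suggestion may differ, but one way the chunk above might be completed is sketched here. It assumes that students.csv is comma-separated and that the working directory is the project root; the names students_path and students are our own choices, not part of the assignment. If the file turns out to use semicolons as separators, readr's read_csv2() is the usual alternative.

```r
library(readr)  # part of the tidyverse; loaded on its own so the sketch stands alone

# Path relative to the project root (the folder that holds the .Rproj file)
students_path <- file.path("data", "test_data", "students.csv")

# Read the file only if it is found; otherwise report where R looked
if (file.exists(students_path)) {
  # ASSUMPTION: comma-separated values with a header row
  students <- read_csv(students_path, show_col_types = FALSE)
  print(head(students))  # head() returns the first six rows by default
} else {
  message("File not found: ", students_path,
          " (working directory: ", getwd(), ")")
}
```

Building the path with file.path() keeps it relative to the project, so the notebook still runs when the Series00-RStudio folder is moved or shared.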

2.3.2 Make a boxplot of the weight of the students

# make a boxplot of the weight of the students
library(ggplot2)
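
Again as a hedged sketch rather than the expected answer, the boxplot could be drawn along these lines. The column name weight is an assumption, since the actual column names of students.csv are not given here, and plot_weight_boxplot is a hypothetical helper name; run names() on your data frame to find the real column.

```r
library(ggplot2)

# Draw a single boxplot of one numeric column of a data frame.
# ASSUMPTION: `weight_col` defaults to "weight", which may not match the
# real column name in students.csv.
plot_weight_boxplot <- function(data, weight_col = "weight") {
  # x = "" puts all observations in one unlabeled group
  ggplot(data, aes(x = "", y = .data[[weight_col]])) +
    geom_boxplot() +
    labs(title = "Weight of the students", x = NULL, y = weight_col)
}
```

Given a data frame called students read from students.csv (our naming assumption), calling plot_weight_boxplot(students) in a notebook chunk displays the plot below that chunk.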