Netflix
The Netflix Database (https://github.com/lerocha/netflixdb) is a
publicly available dataset that is convenient for demonstrating SQL
queries in R. It contains real Netflix data stored in an SQLite
database, which makes it well suited to educational purposes.
1. Download and Set Up the Database
Download the Netflix database from GitHub:
- Go to https://github.com/lerocha/netflixdb
- Download the netflixdb.sqlite file and save it in your working directory
2. Install and Load Required Packages
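install.packages("DBI")
install.packages("RSQLite")
install.packages("dplyr")
install.packages("dbplyr")   # needed by dplyr to work with database tables
install.packages("ggplot2")  # used for the plot in step 6
library(DBI)
library(RSQLite)
library(dplyr)
library(ggplot2)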
3. Connect to the Netflix Database
con <- dbConnect(RSQLite::SQLite(), "netflixdb.sqlite")
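One thing to watch for: with the SQLite driver, dbConnect() does not fail when the file is missing; it opens a new, empty database instead, so a misspelled filename only shows up later as missing tables. A minimal sketch of a guard, assuming a hypothetical helper named connect_existing() that is not part of DBI or RSQLite:

```r
library(DBI)

# Hypothetical helper (not part of DBI/RSQLite): refuse to connect unless
# the database file is already present on disk.
connect_existing <- function(path) {
  if (!file.exists(path)) {
    stop("Database file not found: ", path)
  }
  dbConnect(RSQLite::SQLite(), path)
}
```

It could be used in place of the direct call, for example con <- connect_existing("netflixdb.sqlite").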
4. List Available Tables
dbListTables(con)
5. Querying the Database
test <- dbGetQuery(con, "SELECT * FROM view_summary;")
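SELECT * loads every row and column of the table into R. A narrower, parameterised query is often enough. The sketch below runs against a small in-memory stand-in so that it works without the download; only the table name view_summary and the column cumulative_weeks_in_top10 come from this tutorial, while the title column, the threshold and all values are assumptions made up for illustration.

```r
library(DBI)

# In-memory stand-in so the sketch runs without the download. The title
# column and every value are invented for illustration.
demo_con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(demo_con, "view_summary", data.frame(
  title = c("Show A", "Show B", "Show C"),
  cumulative_weeks_in_top10 = c(1L, 4L, 6L)
))

# Inspect the column names before writing a query.
print(dbListFields(demo_con, "view_summary"))

# The ? placeholder is filled from params, which avoids pasting values
# into the SQL string.
long_runs <- dbGetQuery(
  demo_con,
  "SELECT title, cumulative_weeks_in_top10
     FROM view_summary
    WHERE cumulative_weeks_in_top10 >= ?
    ORDER BY cumulative_weeks_in_top10 DESC;",
  params = list(4)
)
print(long_runs)

dbDisconnect(demo_con)
```

Against the real database, the same calls would take con instead of demo_con, with column names taken from the dbListFields() output.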
6. Using dplyr
tbl(con, "view_summary") |>
  select(cumulative_weeks_in_top10) |>
  collect() |>  # pull the selected column into R before plotting
  ggplot() +
  aes(x = cumulative_weeks_in_top10) +
  geom_bar(na.rm = TRUE)
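A table referenced with tbl() is lazy: dplyr verbs are translated to SQL (this relies on the dbplyr package being installed) and nothing is fetched until the result is collected. The sketch below shows this on an in-memory stand-in, counting rows per value inside the database rather than in R. The stand-in values are invented; only the table and column names come from this tutorial.

```r
library(DBI)
library(dplyr)  # database backends also need the dbplyr package installed

# In-memory stand-in with made-up values, including one missing value.
demo_con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(demo_con, "view_summary", data.frame(
  cumulative_weeks_in_top10 = c(1L, 4L, 4L, NA)
))

# Nothing is executed yet; this only builds up a query.
weeks_lazy <- tbl(demo_con, "view_summary") |>
  filter(!is.na(cumulative_weeks_in_top10)) |>
  count(cumulative_weeks_in_top10)

show_query(weeks_lazy)  # prints the SQL that dplyr generated

# collect() runs the query and returns an ordinary tibble.
weeks_counts <- collect(weeks_lazy) |>
  arrange(cumulative_weeks_in_top10)
print(weeks_counts)

dbDisconnect(demo_con)
```

Aggregating in the database keeps the transferred result small, which matters more as the table grows.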
7. Writing a New Table to the Database
You can add your own data to the database as a new table:
new_data <- data.frame(title = c("Test Show 1", "Test Show 2"),
release_year = c(2023, 2024))
dbWriteTable(con, "test_shows", new_data, overwrite = TRUE)
dbListTables(con)
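To confirm that a write succeeded, and to tidy up afterwards, the table can be read back and then dropped. A minimal sketch on a scratch in-memory database (an assumption chosen so that the downloaded file is left untouched):

```r
library(DBI)

demo_con <- dbConnect(RSQLite::SQLite(), ":memory:")  # scratch database
new_data <- data.frame(title = c("Test Show 1", "Test Show 2"),
                       release_year = c(2023, 2024))
dbWriteTable(demo_con, "test_shows", new_data, overwrite = TRUE)

round_trip <- dbReadTable(demo_con, "test_shows")  # read the table back
print(round_trip)

dbRemoveTable(demo_con, "test_shows")              # drop the test table
table_left <- dbExistsTable(demo_con, "test_shows")
print(table_left)

dbDisconnect(demo_con)
```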
8. Closing the Connection
Always disconnect when you are done:
dbDisconnect(con)
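Inside a function, one way to make sure the connection is closed even when a query fails is on.exit(). A minimal sketch, assuming a hypothetical helper named query_once() that is not part of DBI:

```r
library(DBI)

# Hypothetical helper: runs one query and closes the connection on exit,
# whether the query succeeds or throws an error.
query_once <- function(path, sql) {
  con <- dbConnect(RSQLite::SQLite(), path)
  on.exit(dbDisconnect(con), add = TRUE)
  dbGetQuery(con, sql)
}

# ":memory:" is used here so the example runs without the download.
answer <- query_once(":memory:", "SELECT 1 + 1 AS two;")
print(answer)
```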