My Top 5 Spotify Artists of 2020, Image by Author

Based on my #2020Wrapped, I am interested in finding out whether my Top 5 Spotify Artists (Lauv, The Chainsmokers, Gryffin, Kygo, and Martin Garrix) have something in common.

In this exercise, Python is used for data scraping, while R is used for data analytics, data modelling, and data visualisation.

Spotify allows every listener or data enthusiast to retrieve data through its Spotify Developer Platform. From there, one can access the Spotify Web API Console and pull the data out with Python and its package Spotipy. The code for the data scraping is available in my GitHub repository.

From the data available, I have selected only the variables related to the audio features. One of these is tempo: in musical terminology, the tempo is the speed or pace of a given piece and derives directly from the average beat duration.

After retrieving the dataset from the Spotify API, it is time to switch programming languages from Python to R. Before importing the dataset, let's load the packages that will be used throughout this exercise.

    library(tidyverse)        # set of R packages (including ggplot)
    library(readr)            # read rectangular data
    library(dplyr)            # data manipulation
    library(data.table)       # data manipulation
    library(fmsb)             # radar chart
    library(corrplot)         # correlation plot
    library(FactoMineR)       # for Principal Component Analysis
    library(factoextra)       # for K-Means Clustering
    library(qdap)             # for text cleaning
    library(qdapDictionaries) # for text cleaning
    library(tm)               # for text cleaning
    library(wordcloud2)       # for lyrics word cloud
    library(syuzhet)          # for sentiment analysis

The datasets below were scraped with Python's Spotipy, which calls the Spotify API for each of the Top 5 Spotify Artists from my #2020Wrapped.

    lauv <- ... %>% select(-X1, -track_number, -uri)

Before moving on to the next stage, Exploratory Data Analysis, I need to check whether there are any null values in the dataset.

    map(top_5_artists_df, ~sum(is.na(.)))

Fortunately, the results below show no null values in either dataset; otherwise, the rows with null values would need to be dropped.

    > map(top_5_artists_df, ~sum(is.na(.)))
    $album 0
    $id 0
    $name 0
    $acousticness 0
    $danceability 0
    $energy 0
    $instrumentalness 0
    $liveness 0
    $loudness 0
    $speechiness 0
    $tempo 0
    $valence 0
    $popularity 0
    $artist 0

The new dataset that retains only the numerical variables is created for a later stage of this exercise, Model Building.
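The null-value check and the numeric-only dataset described in this section can be sketched as follows. This is a minimal sketch under stated assumptions, not the code used in the exercise: `toy_df` and its values are hypothetical stand-ins for `top_5_artists_df`, and `drop_na()` is just one possible way to drop rows with null values if any were found.

```r
library(tidyverse)  # loads dplyr, purrr, tidyr and tibble

# Hypothetical stand-in for top_5_artists_df; the values are made up for illustration.
toy_df <- tibble(
  name         = c("Track A", "Track B", "Track C"),
  artist       = c("Lauv", "Kygo", "Gryffin"),
  danceability = c(0.61, NA, 0.70),
  energy       = c(0.55, 0.80, 0.66),
  tempo        = c(120, 98, 128)
)

# Count missing values per column. Same idea as map(df, ~sum(is.na(.))),
# but map_int() returns a named integer vector instead of a list.
na_counts <- map_int(toy_df, ~ sum(is.na(.x)))

# If any column has missing values, drop the affected rows.
clean_df <- drop_na(toy_df)

# Keep only the numerical variables for the later modelling stage.
numeric_df <- select(clean_df, where(is.numeric))
```

Printing `na_counts` gives one count per column, which is easier to scan than the list returned by `map()` when the data frame has many columns.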