YouTube Viz: A Longitudinal Data Visualization

Top 50 all categories videos for all time
Duration of all categories videos for all time

Welcome to the world of YouTube videos

In this project, we explore the metadata of 96 million YouTube videos. They span from the very beginning of YouTube to late 2019, when the data was scraped. Our source is a sample of English channels with more than 10 thousand subscribers. For your convenience, we present only data from January 2008 to June 2019, so you can appreciate how YouTube grew over the years. The visualization is split into two parts: the main visualization at the top, showing the weekly category score, and, below it, the aggregated results for the selected time interval and categories.

How to navigate through time and categories

The main way to change the time is to pan and select a time interval, either directly on the main graph or on the timeline below it. You can drag and modify your selection on the timeline. If you want weekly data instead of an aggregation over the selected time interval, toggle the button below the table. In this mode, you can press f while hovering over the plots to freeze the time and learn more about the distributions of the main metrics. To select a given category, simply click on the category name or on the category in the main plot.

Ranking categories over time

For a given category \(C_i\) and a time period \(T\), its score \(score(C_i, T)\) is defined as \[score(C_i, T) = \sum_{c_j = C_i \wedge t_j \in T} v_j \cdot w_j \] where, for a given video \(j\), \(c_j\) is its category, \(t_j\) its publication date, \(v_j\) its number of views and \(w_j\) the weight associated with the channel that published the video. This weight is inversely proportional to the probability of the channel being sampled during the scraping phase. Rarer channels have higher weights because they are underrepresented in our sample. The score is then divided by 1 million to keep the Y axis readable.
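To make the formula concrete, here is a minimal Python sketch of the score computation. It is an illustration under stated assumptions, not the code behind the visualization: the function name `category_scores`, the record fields (`category`, `published`, `views`, `weight`), the half-open date interval and the sample values are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def category_scores(videos, start, end, scale=1_000_000):
    """Return {category: score} for videos published in [start, end).

    Each video contributes views * weight (v_j * w_j) to its category,
    and the total is divided by `scale` (1 million by default).
    The half-open interval is an assumption made for this sketch.
    """
    totals = defaultdict(float)
    for video in videos:
        if start <= video["published"] < end:
            totals[video["category"]] += video["views"] * video["weight"]
    return {category: total / scale for category, total in totals.items()}


# Hypothetical records; the weights stand in for inverse sampling probabilities.
videos = [
    {"category": "Music", "published": date(2009, 11, 2), "views": 4_000_000, "weight": 1.5},
    {"category": "Music", "published": date(2009, 11, 5), "views": 1_000_000, "weight": 2.0},
    {"category": "Gaming", "published": date(2009, 11, 3), "views": 500_000, "weight": 1.0},
    # Published outside the selected week, so it is ignored.
    {"category": "Music", "published": date(2010, 1, 1), "views": 9_000_000, "weight": 1.0},
]

scores = category_scores(videos, date(2009, 11, 2), date(2009, 11, 9))
print(scores)
```

With these made-up records, the Music score for the selected week is (4,000,000 × 1.5 + 1,000,000 × 2.0) / 1,000,000 = 8.0 and the Gaming score is 0.5.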

An interesting spike to look at

While you enjoy some of the old music that sits at the top of our charts in our embedded video player, you may wonder where it comes from. You may have spotted the huge spike in late 2009 in the music category. It is especially striking if you view the chart in interleaving mode (click the cogwheel to change the layout). Since we see the data as it was in November 2019, the chart shows the views accumulated up to that date by the videos published in this short period. This is thanks to VEVO, which released most of last century's music videos for our enjoyment in less than a month.

We hope you enjoy spending some time listening to or watching relics of the past while uncovering more about the evolution of YouTube.