The goals of the project are to study which kinds of content and formats prospered on YouTube over the years, and how creation practices changed. In our opinion, the required steps are:
Additionally, we want to make the experience fun, so that people enjoy exploring this dataset. To that end, it would be interesting for someone to be able to see the top videos per category over a given time period, so that they can click on them and enjoy seeing what was popular at that time.
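A minimal sketch of the "top videos per category over a time period" lookup is shown below. It assumes each video record carries an ID, a category, a publish date and a view count. The `Video` type and the `top_videos` function are hypothetical names, and ranking by raw view count is an assumption, not a decision stated in the proposal.

```python
import heapq
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class Video:
    # Hypothetical record layout for a scraped video.
    video_id: str
    category: str
    published: date
    views: int


def top_videos(videos: Iterable[Video], category: str,
               start: date, end: date, k: int = 10) -> List[Video]:
    # Keep only videos of the selected category published inside [start, end].
    candidates = (v for v in videos
                  if v.category == category and start <= v.published <= end)
    # nlargest returns the k most-viewed videos, most viewed first.
    return heapq.nlargest(k, candidates, key=lambda v: v.views)
```

`heapq.nlargest` runs in O(n log k), so it avoids sorting every candidate when only a handful of videos are shown.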
Several scoring approaches are being considered. Currently, for a category \(C_i\) and a time period \(T\), the score \(score(C_i, T)\) is defined as \[score(C_i, T) = \sum_{c_j = C_i \wedge t_j \in T} v_j \cdot w_j \] where, for a given video \(j\), \(c_j\) is its category, \(t_j\) its publish date, \(v_j\) its number of views and \(w_j\) the weight associated with the channel that published it. This weight is inversely proportional to the probability of the channel being sampled during the scraping phase: rarer channels get higher weights because they are underrepresented in our sample.
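The Python below is a minimal sketch of the formula for \(score(C_i, T)\), computed from a list of video records. It assumes the weight is exactly \(w_j = 1/p\), where \(p\) is the sampling probability of the video's channel. The text only says the weight is inversely proportional to that probability. Any positive constant factor scales every category's score equally, so rankings are unaffected. The `VideoRecord` type and both function names are hypothetical.

```python
from datetime import date
from typing import Dict, Iterable, NamedTuple


class VideoRecord(NamedTuple):
    # Hypothetical record layout: one row per scraped video.
    channel: str
    category: str    # c_j
    published: date  # t_j
    views: int       # v_j


def channel_weights(sampling_prob: Dict[str, float]) -> Dict[str, float]:
    # Assumption: w = 1 / p, so rarely sampled channels get larger weights.
    return {channel: 1.0 / p for channel, p in sampling_prob.items()}


def category_score(videos: Iterable[VideoRecord], weights: Dict[str, float],
                   category: str, start: date, end: date) -> float:
    # score(C_i, T): sum of v_j * w_j over videos with c_j == C_i and t_j in [start, end].
    return sum(
        (v.views * weights[v.channel]
         for v in videos
         if v.category == category and start <= v.published <= end),
        0.0,
    )
```

Take `channel_weights({"big": 0.5, "small": 0.25})` as an example. A video with 1,000 views from `big` contributes 2,000 to its category's score, and one with 100 views from `small` contributes 400.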
Two schemes are currently being considered for computing the popularity score of a given video:
Fig. 1 presents the core visualizations of our project. The main idea is to let the user toggle between two views: a non-stacked and a stacked version of the category scores over time.
These two views should be interactive. Hovering over the data should reveal the labels, and brushing over time should let the user study a timeframe of interest. An early prototype is available here.
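The sketch below shows the difference between the two views. It assumes per-category scores have already been aggregated per time bucket. The stacked view places each category's band on top of the previous ones, which is the layout D3's `d3.stack` produces in the browser. The non-stacked view keeps every band anchored at zero. Brushing then reduces to keeping the buckets inside the selected range. The function names and the `{bucket: {category: score}}` input shape are hypothetical. Buckets are assumed to be sortable string keys such as `"2010-01"`.

```python
from typing import Dict, List, Sequence, Tuple

Band = Tuple[str, float, float]  # (bucket, y0, y1)


def layout(scores: Dict[str, Dict[str, float]], categories: Sequence[str],
           stacked: bool) -> Dict[str, List[Band]]:
    # Missing (bucket, category) pairs count as a score of 0.
    bands: Dict[str, List[Band]] = {c: [] for c in categories}
    for bucket in sorted(scores):
        baseline = 0.0
        for c in categories:
            value = scores[bucket].get(c, 0.0)
            # Stacked: start where the previous category ended; otherwise start at 0.
            y0 = baseline if stacked else 0.0
            bands[c].append((bucket, y0, y0 + value))
            if stacked:
                baseline += value
    return bands


def brush(series: List[Band], start: str, end: str) -> List[Band]:
    # Keep only the buckets that fall inside the brushed [start, end] range.
    return [point for point in series if start <= point[0] <= end]
```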
Fig. 2 presents the visualizations that accompany our core visualizations and aim at tackling goal 2. The data populating these visualizations depends on the state of the core visualizations, namely the selected category and timeframe. These helpers comprise two visualizations:
The tools and lectures we plan to use are similar across all visualizations, so we summarize them here:
D3.js
: For all the visualization parts

crossfilter
: To handle the large transformed data at different scales (daily, weekly, monthly)
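One option is to map publish dates to daily, weekly and monthly buckets before the data is loaded into crossfilter. The Python below is a minimal sketch of that pre-aggregation and does not use crossfilter's API. The bucket-key formats (ISO date, ISO week, year-month) and the `(date, category, weighted_views)` record shape are assumptions, and `bucket_key` and `aggregate` are hypothetical names.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def bucket_key(d: date, scale: str) -> str:
    # Zero-padded keys so that buckets sort chronologically as plain strings.
    if scale == "daily":
        return d.isoformat()                  # e.g. "2010-01-04"
    if scale == "weekly":
        year, week, _ = d.isocalendar()       # ISO year and ISO week number
        return f"{year}-W{week:02d}"          # e.g. "2010-W01"
    if scale == "monthly":
        return f"{d.year}-{d.month:02d}"      # e.g. "2010-01"
    raise ValueError(f"unknown scale: {scale}")


def aggregate(records: Iterable[Tuple[date, str, float]],
              scale: str) -> Dict[Tuple[str, str], float]:
    # Sum the weighted views per (bucket, category) at the requested scale.
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for d, category, value in records:
        totals[(bucket_key(d, scale), category)] += value
    return dict(totals)
```

Computing the three scales once up front keeps the browser-side work to filtering and grouping already-small tables.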