The Pudding has a page devoted to pitches they would like to see at some point, and I noticed there was one related to the complexity of headlines. Wanting a project to work on, I began to tackle the issue by using BuzzFeed's open API to download nearly 50,000 BuzzFeed News articles.
At this point the trouble is less the data collection than the visualization. With R, using purrr and the rest of the Tidyverse, I have found it easy to pull news articles from various APIs and store them in SQLite databases.
Consider this sample of 25 BuzzFeed articles a first stab at figuring this out.
#2 A Legend!
I recently talked to a proper graphics editor, and while they liked the work, I realized a legend would be helpful. Beyond the color scale, the Flesch-Kincaid (FK) grade determines the radius, or the height and width, of the marks in each of these charts.
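A legend for the size encoding could be generated from the same function that sizes the marks. The following is only a sketch: the grade range (0 to 20), the pixel range, the linear mapping, and the names `gradeToRadius` and `legendEntries` are all assumptions for illustration, not what the charts here actually use.

```javascript
// Hypothetical ranges: FK grades clamped to 0..20, radii from 2 to 30 pixels.
const MAX_GRADE = 20;
const MIN_RADIUS = 2;
const MAX_RADIUS = 30;

// Linear mapping from an FK grade to a circle radius (an assumed rule).
function gradeToRadius(grade) {
  const clamped = Math.min(Math.max(grade, 0), MAX_GRADE);
  return MIN_RADIUS + (clamped / MAX_GRADE) * (MAX_RADIUS - MIN_RADIUS);
}

// One legend entry per reference grade, sized by the same rule as the chart.
function legendEntries(grades) {
  return grades.map(grade => ({
    grade,
    radius: gradeToRadius(grade),
    label: `Grade ${grade}`,
  }));
}

const legend = legendEntries([0, 5, 10, 15, 20]);
```

Because the legend reuses the chart's own sizing function, the two cannot drift apart if the scale changes later.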
As a Grid
Reading the website, I noticed The Pudding had a tutorial on using D3 to manipulate parts of the DOM that are not SVG objects. I ran into this problem a few months ago, and it turns out that plain CSS combined with a sprinkling of D3 can go a long way in creating visualizations.
The circles are ordered by time, and for many of the categories they have a custom fill as well. The height and width of each circle are also set with D3, through the element's style.
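As a rough sketch of that approach, the styles for each circle can be computed in a plain function and then applied to `div` elements with D3. The field names (`fkGrade`, `category`, `published`), the colors, and the sizing rule are hypothetical, and `renderGrid` takes the `d3` object as a parameter so the snippet carries no imports of its own.

```javascript
// Hypothetical category colors; anything unlisted falls back to grey.
const CATEGORY_FILL = { politics: "#e15759", tech: "#4e79a7", world: "#59a14f" };
const DEFAULT_FILL = "#bbbbbb";

// Pure helper: inline styles for one article's circle.
function circleStyle(article) {
  const size = `${Math.round(4 + article.fkGrade * 2)}px`; // assumed sizing rule
  return {
    width: size,
    height: size,
    "border-radius": "50%", // CSS turns the square div into a circle
    "background-color": CATEGORY_FILL[article.category] || DEFAULT_FILL,
  };
}

// Oldest first, so a CSS grid or flex container lays the circles out in time order.
function byDate(articles) {
  return [...articles].sort((a, b) => new Date(a.published) - new Date(b.published));
}

// Appends one div per article and copies the computed styles onto it.
function renderGrid(d3, containerSelector, articles) {
  const circles = d3.select(containerSelector)
    .selectAll("div.circle")
    .data(byDate(articles))
    .enter()
    .append("div")
    .attr("class", "circle");

  circles.each(function (d) {
    const style = circleStyle(d);
    const node = d3.select(this);
    Object.keys(style).forEach(key => node.style(key, style[key]));
  });
  return circles;
}
```

In a page that already loads D3, this would be called as something like `renderGrid(d3, "#grid", articles)`, with the grid layout itself left to CSS on the container.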
As a Bee Swarm
D3 v5's API seemed to make working with beeswarm charts a bit more tedious, but this is a solvable problem in the long run.
As before, the chart is, in theory, ordered by year, colored by category, and sized by score (the radius).
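To show the idea behind a beeswarm without depending on any particular D3 version, here is a small hand-rolled "dodge" layout. It is a stand-in for illustration, not the D3-based code behind the chart: each point is assumed to arrive with an already-scaled `x` (from the year) and a radius `r` (from the score), and circles are pushed up or down until they stop overlapping.

```javascript
// Place circles along x, nudging each one vertically until it no longer
// overlaps any circle that has already been placed.
function beeswarm(points, padding = 1) {
  const placed = [];
  const sorted = [...points].sort((a, b) => a.x - b.x);
  for (const p of sorted) {
    let y = 0;
    let step = 0;
    // Try y = 0, +1, -1, +2, -2, ... until the circle fits.
    while (placed.some(q => Math.hypot(q.x - p.x, q.y - y) < q.r + p.r + padding)) {
      step += 1;
      y = step % 2 === 1 ? Math.ceil(step / 2) : -step / 2;
    }
    // Keep every other field (category, title, ...) so color can still be applied.
    placed.push({ ...p, y });
  }
  return placed;
}
```

The search compares every new circle against all placed ones, which is fine for a 25-article sample but would likely need a smarter approach for the full dataset.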
Grouping by Category
This was one of the first charts I made: a classic use of rollup to group the sample dataset by category.
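For readers unfamiliar with the pattern, a rollup groups rows by a key and reduces each group to a summary value. The sketch below reimplements that idea with a plain `Map` rather than calling D3; `rollupBy`, the field names, and the three sample rows are made up for illustration.

```javascript
// Group items by key, then reduce each group to a single summary value.
function rollupBy(items, keyFn, reduceFn) {
  const groups = new Map();
  for (const item of items) {
    const key = keyFn(item);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(item);
  }
  const result = new Map();
  for (const [key, values] of groups) result.set(key, reduceFn(values));
  return result;
}

// Made-up rows standing in for the real sample.
const sampleRows = [
  { category: "politics", fkGrade: 8 },
  { category: "politics", fkGrade: 10 },
  { category: "tech", fkGrade: 6 },
];

// Article count and mean FK grade per category.
const categorySummary = rollupBy(
  sampleRows,
  d => d.category,
  values => ({
    count: values.length,
    meanGrade: values.reduce((sum, d) => sum + d.fkGrade, 0) / values.length,
  })
);
```

The resulting `Map` keeps categories in first-seen order, which is convenient when the bars or groups should follow the data's order.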
Across the Years
#2 Circle Packing
This is a new option. I realized I liked the style of the beeswarm charts but needed a way to separate the different years from each other. The grey circle for the level 1 hierarchy also provides a nice way to separate the elements.
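Circle packing needs the flat article list reshaped into a tree, with years at level 1 and articles as leaves. A minimal sketch of that reshaping follows, assuming hypothetical fields `year`, `title`, `fkGrade`, and `category`; `packLayout` again takes `d3` as a parameter and uses `d3.hierarchy` and `d3.pack`.

```javascript
// Reshape flat articles into root -> year -> article.
function toYearHierarchy(articles) {
  const byYear = new Map();
  for (const a of articles) {
    if (!byYear.has(a.year)) byYear.set(a.year, []);
    byYear.get(a.year).push({ name: a.title, value: a.fkGrade, category: a.category });
  }
  const children = [...byYear.entries()]
    .sort(([a], [b]) => a - b) // years in ascending order
    .map(([year, leaves]) => ({ name: String(year), children: leaves }));
  return { name: "articles", children };
}

// Returns every node with x, y, r set; nodes at depth 1 are the year circles.
function packLayout(d3, articles, size) {
  const root = d3.hierarchy(toYearHierarchy(articles)).sum(d => d.value || 0);
  return d3.pack().size([size, size]).padding(3)(root).descendants();
}
```

When drawing, nodes with `depth === 1` could get the grey fill and leaves the category color, which matches the separation described above.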
#2 Stream Graph
Quite honestly, this is more of a test bed for a few ideas. For starters, I need to visualize the whole dataset of around 48,000 articles, and that's just for BuzzFeed News, not the nearly 200,000 Guardian articles I haven't touched yet.
Time to experiment with more chart types in future iterations, then!