Wouldn't it be nice to know why your competitor is performing better than you on Facebook, or what kind of content makes people feel the love? Or, like me, to know what is happening on Donald Trump's and Hillary Clinton's Facebook pages? All of this and more is possible with this handy little script and a bit of data analysis. For a quick preview and to learn how it is done, try to hang on until the end of this post!

Cut the BS and show me what Donald and Hillary are up to!

Prerequisites

  • Ability to breathe and a heartbeat
  • Functional copy-paste fingers
  • A Facebook account
## 1-2-3 Ready!
1. Create a Facebook App under your personal FB account  
2. Download the script  
3. Add your App ID, App Secret and Page ID to `get_fb_posts_fb_page.py`  
4. Run the script  
5. Perform the analysis you desire  

1. Ready, Set, Scrape!

To scrape the post data, the script uses the Facebook Graph API, which lets other computers access Facebook's computers programmatically and transfer data between them. The Graph API only accepts authenticated calls, so you need an App ID and an App Secret, which you pass to the API when scraping the data.
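For the curious, here is a rough sketch of what such an authenticated call looks like. It only builds the URL and sends nothing. The credentials are placeholders, the helper name `build_posts_url` is made up for this example, and the `v2.6` version segment is an assumption, since the Graph API changes over time.

```python
from urllib.parse import urlencode

# Placeholder credentials: swap in the values from your own App Dashboard.
APP_ID = "123456789"
APP_SECRET = "not-a-real-secret"


def build_posts_url(page_id, app_id, app_secret, limit=100):
    """Return a Graph API URL for a page's posts (no request is sent)."""
    # An app access token is the App ID and App Secret joined by a pipe.
    access_token = f"{app_id}|{app_secret}"
    query = urlencode({"limit": limit, "access_token": access_token})
    # The version segment (v2.6) is an assumption; use whatever is current.
    return f"https://graph.facebook.com/v2.6/{page_id}/posts?{query}"


print(build_posts_url("hillaryclinton", APP_ID, APP_SECRET))
```

Note that `urlencode` escapes the pipe character, so the token shows up as `123456789%7Cnot-a-real-secret` in the printed URL.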

If I lost you at API, all you need to know is that these two magical components are yours once you create a Facebook App for yourself (don't worry, it's basically just filling out a form, no actual app involved!).

To create your own app:

  1. Go to https://developers.facebook.com/apps/ and click "Add a new app"
  2. Select "Basic setup"
  3. Give your app a name under "Display name" (it can be anything) and give your email address
  4. Click "Create App ID"


You'll be redirected to the App Dashboard, where you'll find your App ID and App Secret. Leave them be for a while.


Now let's download our own copy of The Scraper - navigate to https://github.com/minimaxir/facebook-page-post-scraper, click "Clone or download" and "Download ZIP". Extract the ZIP file and open the folder.


The contents of the folder are:

  • get_fb_posts_fb_page.py = Facebook Post Scraper, used to scrape all posts of a public Facebook page (this is what we'll use)
  • get_fb_comments_from_fb.py = Scrape all comments from a public Facebook page
  • get_fb_posts_fb_group.py = Scrape all discussions, likes etc. from a public Facebook Group

Open up get_fb_posts_fb_page.py with a code editor (not Microsoft Word), and you should see something like this:

[Screenshot: the App ID, App Secret and page_id settings in get_fb_posts_fb_page.py]

Pretty self-explanatory. Fill in the App ID and App Secret from Facebook, but wait... page_id? That's the ID of the page you want to scrape. Luckily you'll find it at the end of the page's URL: www.facebook.com/page_id (e.g. for facebook.com/hillaryclinton the page ID is hillaryclinton). Save the file.
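If you'd rather not squint at URLs, the same lookup can be sketched in a few lines of Python. The helper name `page_id_from_url` is made up for this example, and it assumes the simple `facebook.com/page_id` URL shape described above.

```python
from urllib.parse import urlparse


def page_id_from_url(url):
    """Return the first path segment of a Facebook page URL."""
    # urlparse only recognises the host when a scheme is present.
    if "://" not in url:
        url = "https://" + url
    path = urlparse(url).path
    return path.strip("/").split("/")[0]


print(page_id_from_url("facebook.com/hillaryclinton"))
```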

Note: If you don't know your way around Python (like me), don't touch anything else (I didn't!).

Now it's time to run the script! If you're on The One True Computer™, open up Terminal (cmd + Space + Terminal + Enter) to take you back to the DOS era. If you're on a PC, I've heard there's something called Command Prompt.

Terminal usually opens at the root of your user folder, so we need to navigate to the folder where the scraper file is located. This is done with a command called `cd`. In my case, the scraper is located at Documents/code/reaction-analysis, so I type `cd Documents/code/reaction-analysis` into the Terminal window. Once in the right folder, we can start the scraper with the following command: `python get_fb_posts_fb_page.py`. If everything goes as expected, you'll see something like this:

[Screenshot: the scraper's progress output in Terminal]

The scraper takes a few minutes to run (progress is updated every 100 statuses), and after it's finished you'll have a file called page_id_facebook_statuses.csv in the same folder as the scraper.
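A quick way to check that the file is really there is to load it and count the rows. This is only a sketch: the helper names `statuses_path` and `load_statuses` are made up here, and UTF-8 encoding is an assumption about how the file was written.

```python
import csv
import os


def statuses_path(page_id, folder="."):
    """Build the expected path of the scraper's output file."""
    return os.path.join(folder, f"{page_id}_facebook_statuses.csv")


def load_statuses(page_id, folder="."):
    """Read the CSV into a list of dicts, one per post."""
    # UTF-8 is an assumption; change the encoding if the file says otherwise.
    with open(statuses_path(page_id, folder), newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Only runs if the scraper output is actually in the current folder.
if os.path.exists(statuses_path("hillaryclinton")):
    print(len(load_statuses("hillaryclinton")), "posts loaded")
```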

Congrats, now you know everything about the page you want to know everything about!

2. How to analyze the data?

As the output of the scraper is a CSV file, you can do your analysis in basically any software you prefer. The datasets here are relatively small, so if you're not going to build any models around the data, I'd probably go with Excel. And if models are what you're after, you know better...

As my end goal is to learn Python and some machine learning along the way, I use a framework called GraphLab Create. If you're at all into programming or machine learning, I urge you to try it out, as it is pretty powerful and easy to use. Coursera's Machine Learning Foundations is a good starting point to get your hands dirty.

In the dataset, each row represents a single post with the following attributes:

status_id: access the post at fb.com/page_id/posts/<numbers-after-underscore-here>  
status_message: the actual post  
link_name: title of the link  
status_type: photo, status, link, video, event, note  
status_link: the URL of the link  
status_published: time of publication  
num_reactions: total count of all likes and reactions  
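num_comments: number of comments  
num_shares: number of shares  
num_likes: number of Like reactions  
num_loves: number of Love reactions  
num_wows: number of Wow reactions  
num_hahas: number of Haha reactions  
num_sads: number of Sad reactions  
num_angrys: number of Angry reactions  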
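Everything in a CSV is read in as text, so the count columns need converting before you can do any maths with them. Here is a minimal sketch of that, plus the status_id-to-URL trick from the list above. The helper names and the sample row are made up for illustration.

```python
COUNT_COLUMNS = [
    "num_reactions", "num_comments", "num_shares", "num_likes",
    "num_loves", "num_wows", "num_hahas", "num_sads", "num_angrys",
]


def clean_row(row):
    """Return a copy of a CSV row with the count columns as integers."""
    cleaned = dict(row)
    for col in COUNT_COLUMNS:
        # Missing or empty cells are treated as zero.
        cleaned[col] = int(row.get(col) or 0)
    return cleaned


def post_url(page_id, status_id):
    """Build the post URL from the numbers after the underscore."""
    post_number = status_id.split("_", 1)[1]
    return f"https://fb.com/{page_id}/posts/{post_number}"


# A made-up row, not real data.
sample = {"status_id": "111_222", "num_likes": "7", "num_loves": ""}
print(clean_row(sample)["num_likes"])
print(post_url("hillaryclinton", sample["status_id"]))
```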

Note: Reactions became globally available on 2016-02-24, so keep this in mind when doing your analysis. It is probably a good idea to make a separate dataset for the "reaction era".
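Splitting off a reaction-era dataset could look something like this. The function name `reaction_era` is made up, and the timestamp format is an assumption about how status_published is written in the CSV, so check a few rows of your own file first.

```python
from datetime import datetime

# Reactions became globally available on this date.
REACTIONS_LAUNCH = datetime(2016, 2, 24)


def reaction_era(rows, fmt="%Y-%m-%d %H:%M:%S"):
    """Keep only the posts published on or after the reactions launch."""
    # The default fmt is an assumption about the status_published column.
    return [
        row for row in rows
        if datetime.strptime(row["status_published"], fmt) >= REACTIONS_LAUNCH
    ]


# Made-up rows, not real data.
rows = [
    {"status_id": "1_1", "status_published": "2016-02-23 23:59:59"},
    {"status_id": "1_2", "status_published": "2016-06-01 12:00:00"},
]
print([row["status_id"] for row in reaction_era(rows)])
```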

3. Yeah, so about that Trump fella??

I.e., what can the data tell us (case: Donald and Hillary)

Trump posts pictures, Clinton videos

[Chart: Trump Status Types]

Trump keeps things simple: he mostly posts photos or plain-text statuses.

[Chart: Clinton Status Types]

Clinton, on the other hand, is heavy on videos. Also, I didn't know that somebody actually uses notes anymore.
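A breakdown like this boils down to counting the status_type column. Here is a minimal sketch in plain Python (not how I did it in GraphLab Create); the function name and the sample rows are made up.

```python
from collections import Counter


def status_type_shares(rows):
    """Return each status type's share of all posts, most common first."""
    counts = Counter(row["status_type"] for row in rows)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.most_common()}


# Made-up rows, not real data.
rows = [
    {"status_type": "photo"},
    {"status_type": "photo"},
    {"status_type": "photo"},
    {"status_type": "video"},
]
print(status_type_shares(rows))
```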

Trump loves Hispanics!

[Image: the Trump status update discussed below]

You can't go wrong with authenticity and good tacos. This is nothing but the most liked, shared and hated of Trump's status updates.

Grandchildren are golden

I'm not going so far as to suggest this was planned, but both candidates just recently got a grandson - and their followers LIKED it! For Hillary, the official announcement is her most liked status, and for Donald, only Mexican food trumps his family in terms of likability.

[Images: the two grandchild posts]

People like to share dirt

Sharing is caring, they say. Many of the most shared posts on both sides feature the cheating or crooked opposing side.

[Images: some of the most shared posts]

Trumpsters don't like being asked for money

The worst performing post of Donald Trump? The one where he asks people to open their wallets and head to the campaign store.

[Image: the campaign store post]
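Finding the best and worst performers like the ones in this section is just a matter of sorting by one of the count columns. A minimal sketch, with a made-up function name and made-up sample rows:

```python
def rank_posts(rows, column="num_reactions", top=3):
    """Return (best, worst): the `top` posts with the highest and lowest counts."""
    ordered = sorted(rows, key=lambda r: int(r.get(column) or 0), reverse=True)
    best = ordered[:top]
    # Reverse the tail so the very lowest count comes first.
    worst = ordered[-top:][::-1]
    return best, worst


# Made-up rows, not real data.
rows = [
    {"status_id": "1_1", "num_shares": "5"},
    {"status_id": "1_2", "num_shares": "50"},
    {"status_id": "1_3", "num_shares": "1"},
    {"status_id": "1_4", "num_shares": "20"},
]
best, worst = rank_posts(rows, column="num_shares", top=2)
print([r["status_id"] for r in best])
print([r["status_id"] for r in worst])
```

Swap `column` for num_likes, num_angrys or any other count to rank by a different reaction.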

Hillary shows the world that women rock

And people LOVE LOVE LOVE it.

[Images: the posts in question]

Closing words

With the dataset the scraper provides, you can make a quick analysis of any Facebook page you desire. Here are a few things you could do to get more out of the data:

  • analyze the language used in posts and how it correlates to different reactions
  • combine data from various pages to get a more neutral view of people's behavior
  • also scrape the comments and analyze them as well
  • create models to predict how certain posts would perform in real life
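As a tiny first step towards the language idea in the list above, you could compare the average number of a given reaction on posts that mention a keyword against posts that don't. This is a rough sketch, not a proper analysis; the function name and sample rows are made up.

```python
from statistics import mean


def mean_by_keyword(rows, keyword, column="num_loves"):
    """Return (mean with keyword, mean without keyword) for one count column."""
    with_kw, without_kw = [], []
    for row in rows:
        text = (row.get("status_message") or "").lower()
        bucket = with_kw if keyword.lower() in text else without_kw
        bucket.append(int(row.get(column) or 0))
    # None signals that a group had no posts at all.
    return (
        mean(with_kw) if with_kw else None,
        mean(without_kw) if without_kw else None,
    )


# Made-up rows, not real data.
rows = [
    {"status_message": "Meet my grandson!", "num_loves": "10"},
    {"status_message": "Grandson update", "num_loves": "20"},
    {"status_message": "Visit the store", "num_loves": "2"},
]
print(mean_by_keyword(rows, "grandson"))
```

A difference between the two averages is only a hint, of course, since a keyword tends to travel together with topic, timing and post type.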

+ Bonus

If you ever wondered, the answer is yes: Jeijjo & Nupi's most viral status update involves police cars. Take a look:

[Image: the status update]


Massive credit goes to Max Woolf. This post wouldn't have been possible without his work on the scraper. Thank you!