Congress Says, a Twitterbot using Python and Google App Engine
October 29, 2014
I came across an interesting API offered by the Sunlight Foundation called Capitol Words. This API allows us to access the text from the Congressional record and look at who said what and when. As I was also interested in learning how to use the Twitter API, I decided to make a Twitterbot. So I figured I needed to do three things:
- Figure out how to make a Twitterbot.
- Figure out how to access the Capitol Words API.
- Figure out how to add the Capitol Words API into the Twitterbot.
The end result of all of this is a bot that tweets the top words (more on that later) from yesterday’s Congressional session, assuming they were in session. It looks like this:
The top words from Congress yesterday were "isil", "isis", and "syrian."
— Congress Says (@CongressSays) September 18, 2014
##1. How to make a Twitterbot
I mostly followed the directions provided by Bill the Lizard, whose tutorial uses Google App Engine and Python to create a Twitterbot. It uses the Python library Tweepy for handling the interface with Twitter. His directions were pretty straightforward, although he used the Stack Overflow API for his bot.
I copied this format closely, so I’ll refrain from repeating the early steps and point you in his direction. Remember to get your own Twitter authentication credentials, and that you will also probably want to create a new profile for your Twitterbot, unless you want it tweeting from your personal account.
##2. Capitol Words API
The Capitol Words API allows for some interesting analysis (which will be the topic of other posts). It also requires signing up for a unique key to access the API.
I chose to have my Twitterbot tweet out the most distinctive terms from yesterday’s Congressional session, assuming Congress met. This is done using a metric called tf-idf, which is short for term frequency-inverse document frequency. It compares how frequently a word is used in one document of a corpus with how widely it is used across the entire corpus. A word scores highly when it shows up often in one document but rarely in the others, so common words like "the" score close to zero. The Capitol Words API returns the tf-idf for phrases, and we can specify the length of the phrase in words.
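To make the tf-idf idea concrete, here is a minimal sketch. It is not how the Capitol Words API computes its scores; it assumes one common variant (term count divided by document length, times the log of the number of documents over the number of documents containing the term), and the function name `tf_idf` and the toy corpus are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document.

    documents is a list of token lists. This uses
    tf = count / len(doc) and idf = log(N / df), which is only
    one of several common tf-idf variants (an assumption here).
    """
    n_docs = len(documents)
    # Document frequency: how many documents contain each term.
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Toy corpus (made-up data): each "day" of the record is a list of words.
days = [
    ["the", "senate", "debated", "isil"],
    ["the", "house", "debated", "budget"],
    ["the", "senate", "passed", "budget"],
]
scores = tf_idf(days)
# The highest-scoring term for the first day.
top_first_day = max(scores[0], key=scores[0].get)
```

In this toy corpus "the" appears in every document, so its score is zero, while a word that appears in only one day's text floats to the top for that day.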
The code for accessing the Capitol Words API in Python, utilizing the urllib2, json, and datetime libraries, is:
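What follows is only a rough sketch of the request-building step, written for Python 3 with `urllib.parse` rather than `urllib2`. The endpoint in `BASE_URL` is a placeholder, and the parameter names (`date`, `n`, `sort`, `apikey`) are assumptions made for illustration, so check them against the Capitol Words documentation before relying on them. The helper names `yesterday` and `build_query_url` are hypothetical as well.

```python
import datetime
from urllib.parse import urlencode

# Placeholder endpoint: an assumption, not the real Capitol Words URL.
BASE_URL = "https://example.invalid/phrases.json"


def yesterday(today=None):
    """Return the date one day before `today` (defaults to the current date)."""
    today = today or datetime.date.today()
    return today - datetime.timedelta(days=1)


def build_query_url(day, api_key, phrase_length=1, base_url=BASE_URL):
    """Build a query URL asking for the top tf-idf phrases on `day`.

    The parameter names below are hypothetical stand-ins.
    """
    params = {
        "date": day.isoformat(),   # e.g. 2014-09-17
        "n": phrase_length,        # phrase length in words
        "sort": "tfidf",
        "apikey": api_key,
    }
    return base_url + "?" + urlencode(params)


url = build_query_url(yesterday(datetime.date(2014, 9, 18)), "YOUR_API_KEY")
```

Fetching the result would then be a matter of opening `url` with `urllib.request.urlopen` and decoding the body with `json.loads`, which is left out here because the shape of the response is not shown in this post.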
The API can also be used to query specific parties, states, and representatives over a specified range of time, as well as the full text.
##3. Putting it all together
We find that the year is almost always printed out, which is boring, so we are instead going to look at the top three non-2014 words. The code isn’t pretty, but looks like this:
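As a minimal sketch of the filtering and formatting step, assuming the API results have already been reduced to a list of terms ordered by tf-idf: the helper names `top_words` and `format_tweet` are hypothetical, and the ranked list is made-up sample data apart from the three words in the tweet shown earlier.

```python
def top_words(ranked_terms, year, limit=3):
    """Return the first `limit` terms that are not the year itself."""
    skip = str(year)
    return [term for term in ranked_terms if term != skip][:limit]


def format_tweet(words):
    """Format the words in the same style as the example tweet."""
    if len(words) < 2:
        raise ValueError("need at least two words to build the sentence")
    # All but the last word are quoted and comma-separated.
    head = ", ".join('"%s"' % w for w in words[:-1])
    # The closing period sits inside the final quotation mark.
    tail = ', and "%s."' % words[-1]
    return "The top words from Congress yesterday were " + head + tail


# Illustrative input: terms already sorted by tf-idf, highest first.
ranked = ["2014", "isil", "isis", "syrian", "airstrikes"]
tweet = format_tweet(top_words(ranked, 2014))
```

Passing the year in as an argument, rather than hard-coding 2014, keeps the filter working when the calendar rolls over. Posting the resulting string is then left to Tweepy, as in the tutorial.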
This goes in the equivalent of the bounty_bot.py file, which is the finished bot in the Bill the Lizard tutorial. I renamed mine congresssays.py. Once you get it working, add some pictures and a bio (preferably referencing you). My finished product can be seen here, and, again, looks like this:
The top words from Congress yesterday were "isil", "isis", and "syrian."
— Congress Says (@CongressSays) September 18, 2014