Creating a simple command line streaming twitter search engine using node.js

About two weeks ago I published an article on Texas fan sentiment analysis, based on over 50,000 tweets I collected the day of the Valero Alamo Bowl.

This was fairly straightforward, as I utilized the code my colleague Taylor Smith created and modified it for my purposes. My biggest changes came with how I analyzed the data.

The problem I had was that the process of obtaining the tweets tied up my R console. This was problematic because I could neither use R, nor start looking at the data. Another problem was I had to determine up front how long I wanted to run the search. I could kill the process, but if the game ran past the time I had set, I would have to rush and restart it again.

I prefer using the command line for things like this. I don’t use the command line for much, so I knew it would at least free up my software. Last summer, before Twitter changed their OAuth requirements, you could log in very easily via the command line.

Obviously that has changed. With the latest OAuth, logging in directly from the command line is not really an option. You need a registered application to make this happen.

This fall, Elizabeth Winkler at Mass Relevance mentioned an open source package called ‘t’ that runs using Ruby. I set this up, but don’t really know enough Ruby to get much further than simple tweeting and a basic search from the command line.

What I really wanted was a way to run command line searches that could utilize the streaming API and parse them to a nice JSON file. I am fairly certain t does that, but like I said, my Ruby skills aren’t the best.

Talking this over with my brother-in-law at lunch on Sunday, he figured there had to be a good way to do this with node.js. I have a little experience in node.js and JavaScript is fairly forgiving, so I started looking into it.

I found several node.js modules, such as ‘twit’ and ‘twitter’. Both are open source and easily found on GitHub. I experimented with both and came up with a simple method for running a search using twit (basically modifying a bit of their sample code).

Getting Started
First, make sure you have installed node.js on your computer.

Then go to the twit page on GitHub and follow the instructions on installing twit.

Next you need to register for a twitter account. I read one article this week where a guy suggests registering a separate twitter account just for twitter searches and experiments. That way you don’t get your personal account banned by accident. I didn’t do this.

Once you have an account, you need to then go and sign in as a developer to register an application.

After logging in, go to the drop-down menu at the top-right (where your profile icon is) and go to ‘My Applications’.

Create a new application. This is where you get your Consumer Key and Consumer Secret. (Make sure you save these for easy reference in a minute.)

Now that you have those, you need to click the ‘Create My Access Token’ button at the bottom of the page. Here’s where you get the Access Token and Access Token Secret. (Again, make sure you save these for easy reference in a minute.)

I am going to assume you are logged into your terminal. Node and node_modules should be installed in your main directory. If so, you can change directories:

cd node_modules/twit/examples (honestly, you can put this anywhere, but I figured why not keep it with the other twit code?)

create a file mysearch.js (vi mysearch.js and paste the code below)

var Twit = require('twit')
var fs = require('fs')

var T = new Twit({
    consumer_key: 'put yours here'
  , consumer_secret: 'put yours here'
  , access_token: 'put yours here'
  , access_token_secret: 'put yours here'
})

// filter the public stream for tweets containing any term in myList
var myList = ['texas', 'longhorns']
var stream = T.stream('statuses/filter', { track: myList })

stream.on('tweet', function (tweet) {
  var jsonTweet = JSON.stringify(tweet)

  // append each tweet to the file as one line of JSON
  fs.appendFile('myStream.json', jsonTweet + "\n", function (err) {
    if (err) return console.log(err)
  })
})


Be sure to fill in your own consumer key, consumer secret, access token and access token secret.

That’s it. Now just run the file.

node mysearch

A nice file (myStream.json) will start to accumulate in the folder you set it to.

You can modify your search by just changing the items in myList. The twit documentation also gives solid advice on how to modify the searches by location, language, etc.
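As a rough sketch of what changing the search might look like, the helper below builds the options object that gets handed to the statuses/filter stream. The function name buildFilterParams is made up for this example and is not part of twit. The track, language and locations names are filter parameters from Twitter's streaming API, but check the twit documentation before relying on any of them.

```javascript
// Hypothetical helper (not part of twit): builds the options object
// for T.stream('statuses/filter', options).
function buildFilterParams(opts) {
  var params = {}
  if (opts.track && opts.track.length) params.track = opts.track
  if (opts.language) params.language = opts.language
  if (opts.locations) params.locations = opts.locations
  return params
}

var params = buildFilterParams({
  track: ['texas', 'longhorns'],
  language: 'en'
})

// With twit installed and T set up as in mysearch.js (assumption):
// var stream = T.stream('statuses/filter', params)
console.log(JSON.stringify(params))
```

Keeping the options in one place like this makes it easy to swap search terms without touching the code that writes the file.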

When you want to kill it, just hit ctrl-C, but this thing can run for days, as long as you leave it on.

What to do with that file is up to you. I will post next week with updated R code to get you some usable data. Taylor Smith has some good starter code and we are working on tightening that up a bit. (There are some kinks in the date/time portion.)
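If you would rather stay in node.js for a first look, here is a minimal sketch of reading the file back in. It assumes the file holds one JSON tweet per line, which is how the script above writes it. The function name parseTweetLines is made up for this example.

```javascript
var fs = require('fs')

// Parse newline-delimited JSON text into an array of tweet objects.
// Blank lines and lines that fail to parse (for example a line cut
// short when the collector was killed) are skipped.
function parseTweetLines(text) {
  var tweets = []
  text.split('\n').forEach(function (line) {
    if (line.trim() === '') return
    try {
      tweets.push(JSON.parse(line))
    } catch (err) {
      // ignore partial or corrupt lines
    }
  })
  return tweets
}

// Usage, assuming myStream.json exists in the current folder:
if (fs.existsSync('myStream.json')) {
  var tweets = parseTweetLines(fs.readFileSync('myStream.json', 'utf8'))
  console.log('tweets collected: ' + tweets.length)
}
```

Reading the whole file at once is fine for a few thousand tweets. For a file that has been growing for days, reading it line by line as a stream would probably be the safer choice.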

The eventual extension will be to set the searches up on a server and let them run continuously on their own. I am currently installing node.js on my server and will be able to run these as cron jobs.

Another extension is to evaluate the tweets in real-time and send some sort of data to a browser. Node.js was created to allow you to run JavaScript code on the backend very efficiently.

Least Squares Predictions 3-0-1 During NFL Wild Card Round

With the first weekend of the NFL Playoffs completed, it seemed like a good time to catch up on how well the Least Squares Optimization predictions did the past two weeks.

If this is your first time reading about this, please refer to my initial article here.

First, let’s recap the final week of the regular season. Using only the games where the percentage difference between the expected line and the actual line (from by when published) was greater than 100%, the predictor went 3-1. In games where the absolute value of the raw difference between the expected and actual lines was greater than 2.5 points, the predictor went 5-3. Overall the predictor went 12-3 (the Bears-Packers game did not have a line due to the uncertain status of Aaron Rodgers).

Regular Season Totals
For the final five weeks of the season, using least squares predictions vs. actual Vegas lines, the predictor went 15-4 in games where the percentage difference was greater than 100%, 22-13 in games where the absolute raw difference was greater than 2.5 points and 47-27-1 overall.

Wild Card Games
That brings us to the playoffs, where only four wild card games were on the slate this past weekend.

Visitor               Home                  Vegas   Exp      Raw Diff   % Diff
Kansas City Chiefs    Indianapolis Colts    -1.5    1.1      2.6        -173%
New Orleans Saints    Philadelphia Eagles   3       -3.64    -6.64      -221%
San Diego Chargers    Cincinnati Bengals    7       5.83     -1.17      -17%
San Francisco 49ers   Green Bay Packers     -3      -10.11   -7.11      237%

Reading the table, the predictions were Indianapolis (+1.5), New Orleans (+3), San Diego (+7) and San Francisco (-3). Those selections went 3-0-1 overall, 2-0-1 where the percentage difference was greater than 100% and also 2-0-1 where the raw difference was greater than 2.5 points.

Of course, about halfway through that first game, Kansas City was 28 points ahead of Indianapolis.
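For anyone who wants to check the Raw Diff and % Diff columns, here is a small sketch that reproduces them from the Vegas and expected lines. The formulas are my reading of the four rows in the table (raw difference is expected minus Vegas, percentage difference is raw difference divided by Vegas), not something spelled out in the text, and the function name compareLines is made up for this example.

```javascript
// Each row: visitor, home, Vegas line and expected line (from the table above).
var games = [
  { visitor: 'Kansas City Chiefs', home: 'Indianapolis Colts', vegas: -1.5, expected: 1.1 },
  { visitor: 'New Orleans Saints', home: 'Philadelphia Eagles', vegas: 3, expected: -3.64 },
  { visitor: 'San Diego Chargers', home: 'Cincinnati Bengals', vegas: 7, expected: 5.83 },
  { visitor: 'San Francisco 49ers', home: 'Green Bay Packers', vegas: -3, expected: -10.11 }
]

// Assumed formulas, inferred from the table rather than stated in the article:
// raw diff = expected - Vegas, % diff = raw diff / Vegas.
// A Vegas line of 0 (a pick'em) would divide by zero and is not handled here.
function compareLines(game) {
  var raw = game.expected - game.vegas
  var pct = raw / game.vegas
  return {
    rawDiff: Math.round(raw * 100) / 100, // rounded to two decimals
    pctDiff: Math.round(pct * 100),       // whole percent
    bigPct: Math.abs(pct) > 1,            // percentage difference over 100%
    bigRaw: Math.abs(raw) > 2.5           // raw difference over 2.5 points
  }
}

games.forEach(function (game) {
  var r = compareLines(game)
  console.log(game.visitor + ' at ' + game.home + ': raw ' + r.rawDiff +
    ', pct ' + r.pctDiff + '%, over 100%: ' + r.bigPct + ', over 2.5: ' + r.bigRaw)
})
```

Three of the four games clear both thresholds, which lines up with the 2-0-1 records reported for each filter.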

Up-to-date Totals
Percentage difference Vegas v Expected: 17-4-1
Raw Difference Greater than 2.5: 24-13-1
Overall: 50-27-2

Why isn’t Purdue in the Sugar Bowl? A study in graph theory

Why isn’t Purdue in the Sugar Bowl?

Yes, 1-11 Purdue, with their big-time win over Indiana State. It sounds absurd, doesn’t it? But like 118 other teams in NCAA Division I FBS, they have an indirect win over Alabama (and Auburn for that matter).

This is one of the reasons I love college football.

You hear all of the talk about how on any given day, TEAM A can beat TEAM B. But we don’t believe it, until some Saturday in the fall, Georgia Southern beats Florida or Appalachian State beats Michigan.

This is a story, not so much that a team like Purdue has a win over the mighty SEC teams like Alabama, but the amazing journey of how we get there.

A little background: I have been working on a project that connects College Football and graph theory – actually graph databases. I am going to skip through this, but if you are interested at all, just go here or download the free book located here. I use the Neo4j software.

The central concept is to connect teams together as nodes in a huge network. The data presented here was not compiled using this exact software, but by other means, as I attempted to verify parts of the database I am building. I will write more on that at a later date.
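To make the idea concrete, here is a minimal sketch in plain JavaScript (not Neo4j, and not the code I used). Each team is a node, each result is a directed edge from winner to loser, and a breadth-first search looks for the shortest chain of wins between two teams. The results listed are a handful of games from the chain described later in this article. The function names buildGraph and findWinChain are made up for this example.

```javascript
// A few results taken from the chain in this article: [winner, loser].
var results = [
  ['Northern Iowa', 'Iowa State'],
  ['Iowa State', 'West Virginia'],
  ['West Virginia', 'Oklahoma State'],
  ['Oklahoma State', 'Mississippi State'],
  ['Mississippi State', 'Mississippi'],
  ['Mississippi', 'LSU'],
  ['LSU', 'Auburn'],
  ['Auburn', 'Alabama']
]

// Build a directed graph: each team points to the teams it beat.
function buildGraph(results) {
  var beat = {}
  results.forEach(function (game) {
    var winner = game[0]
    var loser = game[1]
    if (!beat[winner]) beat[winner] = []
    beat[winner].push(loser)
  })
  return beat
}

// Breadth-first search: returns the shortest chain of wins from
// `from` to `to` as an array of team names, or null if none exists.
function findWinChain(beat, from, to) {
  var queue = [[from]]
  var seen = {}
  seen[from] = true
  while (queue.length > 0) {
    var path = queue.shift()
    var team = path[path.length - 1]
    if (team === to) return path
    var losers = beat[team] || []
    losers.forEach(function (loser) {
      if (!seen[loser]) {
        seen[loser] = true
        queue.push(path.concat([loser]))
      }
    })
  }
  return null
}

var chain = findWinChain(buildGraph(results), 'Northern Iowa', 'Alabama')
console.log(chain.join(' -> '))
console.log('games in chain: ' + (chain.length - 1))
```

The edges only run one way, so searching from Alabama back to Auburn with this data returns null. With a full season of results loaded, the same search would report the shortest chain for any pair of teams.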

Back to Purdue, a horrible team (sorry Jerry and Drew) that went 1-11 and only beat NCAA Division I FCS Indiana State. How can they have an indirect win over Alabama? The short answer: with a little luck and some help from the NCAA and NAIA playoffs in other divisions.

It is a long and winding trip, which starts deep in the heart of the Midwest, travels through the backwoods of collegiate football not known to the casual reader and ends at Jordan-Hare Stadium… and it takes an incredible 40 games to connect the dots. Yes, 40 games.

The 40-game path is by far the longest this season that starts and ends at an NCAA Division I FBS school. The next longest paths are Kentucky, Tulsa and UTEP’s 15-game paths to Alabama.

That is quite a jump and it is a long journey, so let’s get started.

September 7, 2013 (West Lafayette, IN)
Purdue 20 Indiana State 14
It began in September when the Boilermakers, full of hope and promise, took it to the Sycamores. Unfortunately, they never won again.

September 14 (Terre Haute, IN)
Indiana State 70 Quincy 7
Not many realize this, but Indiana State went 1-11 as well. Their lone victory came the next weekend in a 70-7 rout of NCAA Division II Quincy University.

Quincy is a small, Franciscan school of about 1300 students located on the banks of the Mississippi River in West Central Illinois. It was the college of Father Augustine Tolton, the first African-American Catholic priest.

September 28 (Quincy, IL)
Quincy 36 Lindenwood University-Belleville 7
The Quincy Hawks of Quincy, Illinois did not soar this season, going a paltry 2-9.

Lindenwood-Belleville was opened in 2003 as an extension of Lindenwood University, but is now a stand-alone college with almost 2000 students. The football team is known for its barbershop-striped field, which alternates grey and red stripes every five yards.

LU-Belleville is an NAIA Independent and the path towards Alabama stayed within the NAIA for 15 games.

September 5 (Fayette, MO)
Lindenwood-Belleville 42, Central Methodist 16
Central Methodist, located in Fayette, Missouri, was founded in 1854 (or 1853 or 1855, depending on where you look). Former Missouri Governor Roger Wilson, who served the remainder of Governor Mel Carnahan’s term after Carnahan was killed in an airplane crash in October of 2000, is an alum of Central Methodist.

September 14 (North Newton, KS)
Central Methodist 38, Bethel KS 31
Bethel College struggled through a 2-9 season. The school is the oldest Mennonite college in North America, founded in 1887 by Russian Mennonites, who had flocked to the area in the 1870s.

November 16 (North Newton, KS)
Bethel KS 26 Bethany KS 14
Bethany College is located in Lindsborg, Kansas and is home to a mere 600 students. The college was founded in 1881 by Swedish Lutherans.
According to Wikipedia, “since 1903, when the ‘Terrible Swedes’ were feared and respected by all opponents, students and alumni have rallied Bethany athletic competition with the ‘Rockar! Stockar!’ cheer.” This apparently means “Rocks! Sticks!”.

November 2 (Hillsboro, KS)
Bethany KS 24 Tabor 17
Tabor College is another Mennonite-based college. Founded in 1908, it has fewer than 700 students. NFL Pro Bowler Rolland “Bay” Lawrence, who played eight seasons at cornerback for the Atlanta Falcons during the 1970s, attended Tabor.

September 7 (Lincoln, NE)
Tabor 10 Nebraska Wesleyan 9
Nebraska Wesleyan was founded by Methodists in 1887. Its teams are known as the Prairie Wolves, a name adopted only in 2000. They were previously known as the Sunflowers, Coyotes and Plainsmen.

Glenn and Grace Hefner, parents of Hugh Hefner, are listed among their notable alums.

September 28 (Orange City, IA)
Nebraska Wesleyan 22 Northwestern IA 8
Northwestern College was founded in 1882 as an academy and became a four-year institution in 1961. It is affiliated with the Reformed Church in America.

The Red Raiders reached the NAIA playoffs this season and went 8-3 before falling to Missouri Valley.

Former NAIA Women’s Player of the Year (2006 and 2008) and record holder of most in-game consecutive free throws (133 straight), Deb Remmerde attended Northwestern.

November 9 (Orange City, IA)
Northwestern IA 38 Morningside 28
Morningside, located in Sioux City, Iowa, is affiliated with the Methodist Church. Pro Football Hall of Famer George Allen coached the football team from 1948-1950.

The Morningside Mustangs rebounded from this loss and actually reached the NAIA national semifinals, finishing with an 11-2 record.

November 23 (Sioux City, IA)
Morningside 40 Rocky Mountain 21
Arlo Guthrie attended Rocky Mountain College, which is located in Billings, Montana, but never graduated. Former Kansas City Chief, Chris Horn, also played football for the Battlin’ Bears.

October 19 (Billings, MT)
Rocky Mountain 45 Eastern Oregon 13
Eastern Oregon University is part of the Oregon University System and located between Portland, Oregon and Boise, Idaho. In 2011, 99-year-old Leo Plass received his diploma from the university. He had dropped out less than one semester away from graduation during the Great Depression in 1932 to get a job as a teacher.

September 28 (La Grande, OR)
Eastern Oregon 35 Carroll MT 31
Carroll has a long football history. Their 1931 team went undefeated and the more current version won six NAIA national championships between 2002 and 2010. Carroll is also where Hall of Fame coach John Gagliardi graduated and began his coaching career. Interestingly enough, Bobby Petrino also graduated from Carroll.

November 23 (Helena, MT)
Carroll MT 38 Georgetown KY 28
Georgetown College was founded in 1829, when the Kentucky General Assembly chartered the Kentucky Baptist Education Society to form a Baptist college within the state. Just recently the college of about 1300 students severed ties with the Kentucky Baptist Convention and will operate as an independent university. The school has also considered becoming an NCAA DII affiliate, but their application was denied in 2012.

November 16 (Georgetown, KY)
Georgetown KY 20 Lindsey Wilson 10
Located in Columbia, Kentucky, Lindsey Wilson taught only grades one through 12 from 1903 to 1922, mostly to train students to become teachers, many of whom continued their schooling at Vanderbilt. They also have a mascot named Blue Raider Bob. Seriously.

September 28 (Columbia, KY)
Lindsey Wilson 37 Faulkner 30
Faulkner University was founded in 1942 as Montgomery Bible School. It was eventually renamed to Alabama Christian College then Faulkner University. The Eagles did not even begin football until the 2007 season, when Jim Nichols, who had been a graduate assistant for Tommy Tuberville at Auburn, became their head coach.

September 7 (Ave Maria, FL)
Faulkner 47 Ave Maria 7
Ave Maria University was founded in 2003 as the dream of Thomas Monaghan, the founder of Domino’s Pizza, in his mission to found a Catholic university. He is the Chancellor of the school. The name Gyrenes refers to the Marines, of which Monaghan was a member. BTW, the Gyrenes went 7-2 this season, despite having fewer than 900 students.

October 12 (Ave Maria, FL)
Ave Maria 45 Florida Tech 41
Ave Maria defeated NCAA Division II Florida Tech, who started football this season. Yes children, you can start football this season and still have an indirect win over Alabama. Good work, Panthers!

The school has been around for a while and is the alma mater of at least three astronauts and the original Survivor winner, Richard Hatch.

October 19 (Melbourne, FL)
Florida Tech 28 Shorter University 24
Shorter University of Rome, Georgia, was originally founded in 1873 as Cherokee Baptist Female College. It unsuccessfully attempted to break away from the Georgia Baptist Convention in 2005 and created a public relations storm when it required all faculty and staff to sign a public lifestyle statement in 2011.

November 2 (Atlanta, GA)
Shorter 58 Clark Atlanta 14
Clark Atlanta is a historically black university founded in 1988 with the consolidation of Clark College (founded in 1869) and Atlanta University (1865). The school boasts many prominent alumni, including Henry Flipper, who after his freshman year at Atlanta University during the Reconstruction, was given a West Point appointment and later became the first African American to graduate from the United States Military Academy.

October 5 (Atlanta, GA)
Clark Atlanta 21 Morehouse 17
All-male Morehouse College is the alma mater of Dr. Martin Luther King, Jr., Spike Lee, Edwin Moses, Samuel L. Jackson and Herman Cain, to mention only a few. It was founded in 1867 as the Augusta Institute. Along with Wabash and Hampden-Sydney, it is one of only three remaining traditional liberal arts male colleges in the United States.

September 21 (Soldier Field, Chicago, IL)
Morehouse 42 Central St OH 20
The Central State Marauders were NCAA Division II national runner-up in 1983 and won NAIA titles in 1990, 1992 and 1995. Then, due to financial difficulties, the school dropped football in 1997, only reinstating the sport in 2005.

The school counts quite a few dignitaries as alumni, including former President of Malawi, Hastings Kamuzu Banda. Pro Bowl lineman Erik Williams, baseball player Eddie Milner and ‘actress’ Omarosa (from The Apprentice) also attended Central State.

October 12 (Wilberforce, OH)
Central St OH 25 Miles 21
Located in Fairfield, Alabama, Miles is another historically black university founded in 1898. The football team was part of history this year when their game against Lane was officiated by a crew that included four women, marking the first time in history a predominantly female crew had officiated at any NCAA level.

The school is also the alma mater of Autherine Lucy, who graduated in 1952 then applied to graduate school at Alabama, eventually becoming the first African American student in the school’s history. She was expelled three days into school, as the university felt it could not provide a safe environment for her. The University of Alabama overturned her expulsion in 1980 and she earned a Master’s degree in 1982.

ADDITIONAL FUN FACT – Four of those teams ONLY won two games each. Shorter, Clark Atlanta, Morehouse and Central St OH won a combined eight games.

November 9 (Tuskegee, AL)
Miles 41 Tuskegee 36
Of course everyone knows that Tuskegee University is world-renowned as one of the first historically black universities, the home of the Tuskegee Airmen, the school Booker T. Washington built and the place where George Washington Carver did his research. But it also happens to be the school on this path that is geographically closest to Jordan-Hare Stadium, where this journey will end. It is a mere 20.5 miles from Abbott Stadium.

September 7 (Huntsville, AL)
Tuskegee 23 Alabama A&M 7
With Alabama A&M, we are back in NCAA Division I, in the FCS, where the SWAC will carry this chain for a few games. Famous Alabama A&M alumni include American Idol Season 2 winner Ruben Studdard and Pro Football Hall of Famer John Stallworth.

November 2 (Lorman, MS)
Alabama A&M 19 Alcorn State 18
The first black land-grant institution in the United States, Alcorn State was founded in 1871. Again, a school chock full of famous alumni, including Medgar Evers and Alex Haley. A few football Braves include Donald Driver and the late Steve McNair.

November 7 (Lorman, MS)
Alcorn State 50 Prairie View A&M 35
Prairie View is part of the Texas A&M University System and was the organizing body of interscholastic sports and academic contests for black high schools in Texas prior to integration.

September 28 (Nacogdoches, TX)
Prairie View A&M 56 Stephen F. Austin 48
SFA is one of four public universities in Texas that are not part of one of the six university systems. The Lumberjacks of the Southland Conference had their only bowl appearance in the 1973 Poultry Bowl, where they defeated Gardner-Webb.

September 21 (Nacogdoches, TX)
Stephen F. Austin 52 Montana State 38
Montana State earned a share of the 1956 NAIA title when they played St. Joseph’s of Indiana to a 0-0 tie in the Aluminum Bowl. They won the 1976 NCAA Division II title and the 1984 NCAA Division I-AA title. They are the only team to win national titles in three different divisions. It also happens to be the alma mater of former ESPN anchor Craig Kilborn and NFL kicker Jan Stenerud.

October 5 (Bozeman, MT)
Montana State 36 Northern Arizona 7
Northern Arizona’s initial graduating class consisted of four women who received teaching credentials for the then Arizona Territory. It is a far cry from the 26,000+ who now attend.

September 21 (Flagstaff, AZ)
Northern Arizona 22 South Dakota 16
Located in Vermillion, South Dakota, the school was founded in 1862 as the University of Vermillion and is the oldest postsecondary institution in the Dakotas. The Yotes (formally Coyotes) now play in the Summit League.

October 19 (Cedar Falls, IA)
South Dakota 38 Northern Iowa 31
UNI may be best known for upsetting top-seeded Kansas in the 2010 NCAA Basketball Tournament when Ali Farokhmanesh hit a crucial three. NFL Pro Bowler Bryce Paup attended UNI.

August 31 (Ames, IA)
Northern Iowa 28 Iowa State 20
After 32 games outside of Division I FBS, the trail returns.

November 30 (Morgantown, WV)
Iowa State 52 West Virginia 44
These two teams were by far the strangest in the Big 12 this season. West Virginia lost to both Kansas and Iowa State, but also defeated Oklahoma State to extend our chain.

September 28 (Morgantown, WV)
West Virginia 30 Oklahoma State 21
I still don’t know how this happened, other than this was OSU’s usual letdown loss.

August 31 (Reliant Stadium, Houston, TX)
Oklahoma State 21 Mississippi State 3
The rest falls into the hands of the mighty SEC.

November 28 (Starkville, MS)
Mississippi St 17 Mississippi 10
Jamerson Love’s fumble recovery in the end zone of what looked like it was going to be an easy Ole Miss touchdown shockingly ended the Egg Bowl.

October 19 (Oxford, MS)
Mississippi 27 LSU 24
Ole Miss gets the biggest win of Hugh Freeze’s tenure.

September 21 (Baton Rouge, LA)
LSU 35 Auburn 21
When LSU defeated Auburn, the Tigers were in the middle of the SEC and national title hunts. Little did we know how the fates of the two schools would reverse several months later.

November 30 (Auburn, AL)
Auburn 34 Alabama 28
Finally, in the Iron Bowl, Auburn shocks Alabama on the ridiculous field goal return by Chris Davis on the final play of the game. The chain is complete.

So there it was: 40 games. Purdue -> Alabama.

Texas Football Fan Sentiment Analysis During Valero Alamo Bowl

With Monday night’s Alamo Bowl being Coach Mack Brown’s final game as coach of the Texas Longhorns, it seemed like a good opportunity to test fan sentiment on the occasion via Twitter. I captured tweets containing certain words in an attempt to follow sentiment towards Mack Brown and Texas over time, leading up to the game, during the game and afterwards for a brief period.

I began collecting data around 2:25 PM CST and stopped just after 10:00 PM. The search terms I used were: Mack Brown, mackbrown, Texas Football, Texas Longhorn, hookem and hook em. During that time period, over 51,000 tweets were collected using these search terms. Please note that these terms could appear as regular words, as parts of words or as hashtags.

Alamo Bowl Pre-Game
Mack Brown is a good man and I have had the opportunity to meet him several times and have always found him amazing. I won’t go into the details, but he’s the type of guy that makes you feel important, despite the fact he’s probably the most important guy in whatever room he is in.

But things didn’t really end well on the 40 Acres, so I thought it might be interesting to see what transpired over the day.

To find sentiment, I utilized the R package, qdap (Quantitative Discourse Analysis Package), created by Tyler Rinker. The polarity function is based on Jeffrey Breen’s work.

Tweets are evaluated based on the words within them, utilizing the polarity function, which assigns each tweet a score based on the sentiment dictionary (Hu and Liu, 2004). Approximately 50% of the tweets are neutral, earning a score of 0.
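To give a feel for how a dictionary-based score works, here is a heavily simplified sketch in JavaScript. It is not qdap's polarity function, which does more than count words (it also accounts for things like negators and amplifiers). The word lists are tiny made-up samples rather than the Hu and Liu dictionary, and the function name simplePolarity and the scaling by the square root of the word count are assumptions for illustration.

```javascript
// Tiny made-up word lists for illustration only; the real Hu and Liu
// dictionary has thousands of entries.
var positiveWords = ['good', 'great', 'win', 'thanks', 'love']
var negativeWords = ['bad', 'awful', 'terrible', 'lose', 'hate']

// Simplified polarity: (positive count - negative count) / sqrt(word count).
// Returns 0 for tweets with no dictionary words (or no words at all).
function simplePolarity(text) {
  var words = text.toLowerCase().match(/[a-z']+/g) || []
  if (words.length === 0) return 0
  var score = 0
  words.forEach(function (word) {
    if (positiveWords.indexOf(word) !== -1) score += 1
    if (negativeWords.indexOf(word) !== -1) score -= 1
  })
  return score / Math.sqrt(words.length)
}

console.log(simplePolarity('Thanks Coach Brown, great career')) // positive score
console.log(simplePolarity('Kickoff is soon'))                  // 0, a neutral tweet
```

A tweet with no dictionary words scores exactly 0, which is why such a large share of tweets end up neutral.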

As I stated above, there were over 51,000 tweets, so plotting them over time would basically give you a huge cluster centered around zero.

Instead, I decided to collect the tweets in 5-minute intervals and find the average polarity of each interval.
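The bucketing step can be sketched roughly like this. I did the real work in R, so treat this as an illustration in JavaScript and not the code behind the chart. The input shape (objects with a time and a polarity), the function name averageByInterval and the sample timestamps are all made up for the example.

```javascript
var FIVE_MINUTES = 5 * 60 * 1000

// tweets: array of { time: Date or timestamp string, polarity: number }.
// Returns one row per five-minute interval, sorted by start time.
function averageByInterval(tweets) {
  var buckets = {}
  tweets.forEach(function (t) {
    var ms = new Date(t.time).getTime()
    // round down to the start of the five-minute interval
    var start = Math.floor(ms / FIVE_MINUTES) * FIVE_MINUTES
    if (!buckets[start]) buckets[start] = { count: 0, total: 0 }
    buckets[start].count += 1
    buckets[start].total += t.polarity
  })
  return Object.keys(buckets)
    .map(Number)
    .sort(function (a, b) { return a - b })
    .map(function (start) {
      var b = buckets[start]
      return {
        start: new Date(start).toISOString(),
        count: b.count,
        average: b.total / b.count
      }
    })
}

// Made-up sample data: two tweets in one interval, one in the next.
var rows = averageByInterval([
  { time: '2014-01-01T00:01:00Z', polarity: 0.5 },
  { time: '2014-01-01T00:04:59Z', polarity: -0.5 },
  { time: '2014-01-01T00:05:00Z', polarity: 1 }
])
console.log(rows)
```

Keeping the count alongside the average matters, because an interval with a handful of tweets can swing much further than one with two thousand.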

Texas Fan Sentiment

As you can see, the average polarity was always positive during the pregame. Reading through the early tweets, you can see a bunch of congratulatory and well-wishing tweets directed at Coach Brown. As we near kickoff, the ‘positive’ flavor of the tweets drops dramatically, but the volume of tweets begins to take off.

The First Half
There’s a quick peak as kickoff approaches (almost 2k tweets in the five-minute period around kickoff), then it tapers off a bit (300-500 tweets every five minutes).

The first big change comes when Case McCoy threw the pick six on the first possession of the game. Not only does sentiment fall, but the volume of tweets begins to rise following the interception.

Sentiment fluctuates back and forth throughout the first half. I am guessing these are directly related to the Texas defense (positive) and offense (not-so positive) being on the field.

Again, when Oregon makes the big drive and scores at the end of the half, Texas fans react with expected disdain.

Second Half
Sentiment stabilizes during halftime and doesn’t really dip too badly until the final pick six by McCoy at the end of the game.

The most interesting discoveries are the peaks in volume and in what I call ‘Plus-Minus’. Plus-Minus is the number of positive tweets minus the number of negative tweets during the five-minute interval.
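As a quick illustration of the Plus-Minus count, here is a small JavaScript sketch. The function name plusMinus and the sample scores are made up, and it assumes each tweet in the interval already has a polarity score, with anything above 0 counted as positive and anything below 0 as negative.

```javascript
// polarities: the polarity scores of the tweets in one five-minute interval.
function plusMinus(polarities) {
  var positive = 0
  var negative = 0
  polarities.forEach(function (p) {
    if (p > 0) positive += 1
    else if (p < 0) negative += 1
    // a score of exactly 0 is neutral and counts toward volume only
  })
  return {
    volume: polarities.length,
    positive: positive,
    negative: negative,
    plusMinus: positive - negative
  }
}

// Made-up scores: three positive, one negative, two neutral.
console.log(plusMinus([0.5, 0, 0.3, -0.4, 0, 1.2]))
```

Plus-Minus counts tweets and ignores how strong each score is, so an interval can have a high Plus-Minus even when the average polarity is modest.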

The first major peak comes in the second half when Tyrone Swoopes enters the game. Swoopes is a very popular freshman quarterback many fans believe should have been given more playing time when Case McCoy struggled during the season.

Interestingly enough, the language was not as strongly positive as other times during the game, despite a preponderance of positive tweets.

The final big negative dip of the night came on Case McCoy’s second interception that went for a touchdown. The sentiment was so strong, it was the only time all evening where the average dipped below neutral.

Post Game
Texas fans appear to rebound fairly quickly after the game. It appears that fans were more interested in wishing Coach Brown well than in bashing the team’s performance. Positive polarity reached its second-highest peak of the night, while tweet volume and Plus-Minus reached night-high peaks.

Final Thoughts
Obviously, when you limit yourself to search terms specifically tied to a person, team and school, you get skewed results. I did not include the term ‘Alamo Bowl’ specifically for this reason. I did not want Oregon fan or ‘bowl watcher’ sentiment included.

Since this past summer when I posted a twitter cloud of terms used in tweets at the 2013 NCAA Division I Tennis Championships, I have wanted to use R to do some ‘sentiment analysis’ on tweets during an event. I was initially stymied by Twitter’s move to OAuth, then by my fall class load.

Luckily, my cohort and friend, Taylor Smith, was thinking along the same lines and created an awesome pathway to try this out. Taylor details his methods here.

I had to make some modifications to his code and also altered my methods a bit, but the main collection, cleaning and initial evaluation of the data was the same. I will detail my specific changes in a later post.