If you frequented BRB over the summer, you probably read my two posts about the statistical model I created, even if you had no idea what I was talking about. What I did was build a logistic regression model that estimated the probability of Houston winning a football game based on certain variables. Going into it, I was trying to find out which facets of the game are most detrimental to Houston's success. However, it slowly morphed into something more valuable--the probability Houston would win or lose based on its performance in that week's game.
It was time-consuming and taxing, but I learned a tremendous amount from it, and it actually worked while providing interesting data. If you missed the posts when they were first published, the links to Part I and Part II are below. Even if you did read them back then, I highly recommend re-reading them as a refresher to understand how I built the model, what variables are used, and how the probabilities were computed. If you don't feel like slogging through the two posts, the comments alone are worth reading to remember how optimistic we all were in August.
The fun part of going through the effort to create the model was supposed to be updating it every week to find out what the probability was for Houston to win that week's game. Before the season started, I had planned to add it to the weekly review posts, but because I was wrapped up in the allure of a new season and AFC South previews, I completely forgot about the little logistic regression I had created.
Now that it's the bye week, I finally had enough time to input the data from the first seven weeks this season into the computer. There's a model predicting the probability to win based on offense, another based on defense, and a total model that combines offense, defense, and special teams. It's worth mentioning that the total model is not the offense and defense models merged together; it's a separate model that stands on its own.
The best way to look at these figures is that the offense and defense models show the performance of that side of the ball. The total model estimates the probability Houston had to win.
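For anyone curious about the mechanics, here is a rough Python sketch of how a logistic regression turns a stat line into a win probability. This is not the actual model: the coefficients, the intercept, and the three stat names are made up purely for illustration, and the real variables are the ones described in Part I and Part II.

```python
import math

def win_probability(intercept, coefs, stats):
    """Squash a weighted sum of game stats into a probability between 0 and 1."""
    z = intercept + sum(coefs[name] * stats[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values, NOT the fitted coefficients from the real model.
TOTAL_INTERCEPT = -2.0
TOTAL_COEFS = {"rush_yards": 0.01, "pass_yards": 0.005, "turnovers": -0.9}

# A made-up stat line for one game.
game = {"rush_yards": 120, "pass_yards": 250, "turnovers": 1}
sloppy_game = {"rush_yards": 120, "pass_yards": 250, "turnovers": 4}

print(win_probability(TOTAL_INTERCEPT, TOTAL_COEFS, game))
print(win_probability(TOTAL_INTERCEPT, TOTAL_COEFS, sloppy_game))
```

With a negative coefficient on turnovers, the second stat line comes out with a lower probability than the first, which is the kind of relationship a model like this is meant to capture.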
It's important to note that the total model has done very well this season. If we count a probability above 50% as a predicted win and a probability below 50% as a predicted loss, the model has gone 7-0 this year. If you're hesitant about the model being accurate, you shouldn't be. It has retroactively predicted 112 out of 123 games correctly, good for a success rate of 91.06%. The last time it conflicted with reality was 28 games ago when Houston lost to the Raiders in the Al Davis Memorial Game. Consequently, it suggests that Houston has either not played good football this year or has not played like it has in previous seasons.
The plan from here on out is to update the model after every game and publish the probabilities in the weekly review or on Twitter. I still haven't been able to find a site that has predictions for rushing yards, passing yards, turnovers, etc., so the model isn't helpful until after the conclusion of the game. After the bye week, I will make a guess and plug my predictions into the model to come up with a probability for Houston to win that week's game. If you have any questions about what variables were used or anything else, let me know in the comments.
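To make the scoring rule concrete, here is a small Python sketch of the 50% cutoff and the hit-rate arithmetic. The four probabilities and results in the example are invented for illustration, not pulled from the spreadsheets; only the 112-of-123 figure comes from the model's record.

```python
def predicted_win(probability, threshold=0.5):
    """Call the game a win when the model's probability is above the cutoff."""
    return probability > threshold

def accuracy(probabilities, results):
    """Share of games where the predicted outcome matched the actual outcome."""
    correct = sum(predicted_win(p) == won for p, won in zip(probabilities, results))
    return correct / len(results)

# Made-up example: four games, the model misses on the third one.
probs = [0.81, 0.35, 0.62, 0.12]
results = [True, False, False, False]  # True means Houston actually won
print(accuracy(probs, results))  # 0.75

# The model's record: 112 correct calls out of 123 games.
print(round(112 / 123 * 100, 2))  # 91.06
```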
Below are two Google Docs that have the stats and probabilities for every Texans game where Gary Kubiak has been the head coach. Have fun with the data and let me know if you find anything interesting.