BRB University: Winning Football, Gary Kubiak Style (Part II--The Defense)

Back to Gary Kubiak and the mighty Logistic Regression. Take a look inside the Texans' defensive numbers and see exactly what kind of impact Wade Phillips has had on the Texans and Kubes.

[Photo: "Show me some numbers, dammit!" Credit: Rob Carr]

If you missed Part I of this series, click here.

These past couple of weeks I have been MIA. I was lucky enough to visit one of the natural wonders of the world, the Grand Canyon, and go on a road trip throughout the Midwest of the United States. The last time I was at the Grand Canyon was for Spring Break two years ago; I hiked down to Skeleton Point on one day and down to Plateau Point on the next, but I ached to return and camp down by the murky Colorado River. So my crappy 14-year-old brother, who is as worthless as every other 14-year-old on the planet, and I loaded up my car with backpacks and other camping paraphernalia to fulfill this goal. After I drove all 1,200 miles of highway along I-10, U.S. 385, and I-40 myself, we were back at the place I was longing for.

About five million people visit the park every year, but only about 1% of the visiting population actually travels down into the canyon to camp. If you have not been, I highly suggest going (or going back) and joining the 1%. When you see the canyon from the rim, you are able to see the colors and overwhelming immensity of the canyon. However, you have no idea how colossal it really is until you look at it from a mile below and see the billion-year-old rocks climbing toward the rays of light emitted by the sun. You also learn how important water is to the area, as every creek and waterfall is covered in lush green. The hike out is 7-9 hours, but it is not too strenuous if you take plenty of breaks and stop to drink water and snarf down trail mix. If you are not up for hiking down and back out, you can hop on a mule to take you there or go on a river trip that takes you out by helicopter. Don't be perturbed. Find some vacation time and go down into the canyon as soon as you get the chance.

The week after that, I drove all around the Midwest with friends and explored Memphis, the Smoky Mountains, Nashville, Chicago, Kansas City, and Dallas until I arrived back in the Hill Country.

Since I have been back home, I have been recuperating from the miles of hiking, driving, eating, drinking and sleeping in grimy Howard Johnson Motels and America's Best Value Inns by scrolling through Kubiak's games and tediously adding the defensive numbers to the model. I know all of you spent the last couple of weeks refreshing BRB and waiting for the entrance of defensive data. This time I won't go into too much depth about the process/hypothesis tests/coefficients and will mainly discuss the results. However, if you would like to read more about the process, check out Part One here. As a heads up, there is a slight difference in the offensive model's overall record in Part I compared to Part II due to some small mistakes I found.

Defense Only

If you'll recall, the first step is to find the correlations and add the variables with the strongest correlations to your model first. Here are the defensive correlations and a refresher look at the offensive correlations.

[Figure: Gary Kubiak offensive correlations]
[Figure: Defensive correlations]

After tinkering with the variables, the model ended up using opponent rushing attempts, opponent turnovers, opponent rush touchdowns, opponent first downs, opponent pass touchdowns, and opponent passing attempts as the independent variables and win or lose as the dependent variable.

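Remember the basic outline is Y=β0+β1*X1+β2*X2...+βn*Xn, where Y is the log-odds of a Texans win.

Y=3.604-.108*(Opp Rush Att)+.719*(Opp Turnovers)-.648*(Opp Rush TD)-.094*(Opp 1st Downs)-.568*(Opp Pass TD)+.049*(Opp Pass Att)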
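To make the equation concrete, here is a minimal Python sketch (the post's own work was done in R) that plugs one game's defensive numbers into the fitted coefficients from the R output and converts the resulting log-odds into a win probability with the logistic function. The stat line in `sample_game` is made up for illustration and is not a real Texans game; the function and variable names are likewise assumptions.

```python
import math

# Coefficients copied from the defensive model's R output.
DEFENSE_COEFS = {
    "Intercept": 3.60414,
    "Opp.Rush.Att": -0.10843,
    "Opp.Turnovers": 0.71906,
    "Opp.Rush.TD": -0.64787,
    "Opp.1st.Down": -0.09442,
    "Opp.Pass.TD": -0.56829,
    "Opp.Pass.Att": 0.04933,
}


def win_probability(game, coefs=DEFENSE_COEFS):
    """Return P(win) = 1 / (1 + exp(-Y)), where Y is the log-odds."""
    log_odds = coefs["Intercept"]
    for name, value in game.items():
        log_odds += coefs[name] * value
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical stat line (an assumption for illustration only).
sample_game = {
    "Opp.Rush.Att": 25,
    "Opp.Turnovers": 2,
    "Opp.Rush.TD": 0,
    "Opp.1st.Down": 18,
    "Opp.Pass.TD": 1,
    "Opp.Pass.Att": 35,
}

print(round(win_probability(sample_game), 3))
```

Forcing two turnovers and keeping the opponent out of the end zone on the ground pushes the probability well above 50% in this made-up example, which matches the signs of the coefficients.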

The R Output is below for those who are interested.

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.6362  -0.6670   0.1137   0.5875   2.7645

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)    3.60414    1.81849   1.982   0.0475 *
Opp.Rush.Att  -0.10843    0.04413  -2.457   0.0140 *
Opp.Turnovers  0.71906    0.29068   2.474   0.0134 *
Opp.Rush.TD   -0.64787    0.37744  -1.716   0.0861 .
Opp.1st.Down  -0.09442    0.08987  -1.051   0.2934
Opp.Pass.TD   -0.56829    0.28546  -1.991   0.0465 *
Opp.Pass.Att   0.04933    0.04585   1.076   0.2821
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 160.500 on 115 degrees of freedom
Residual deviance: 93.296 on 109 degrees of freedom
AIC: 107.3

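The R squared measures how well the independent variables explain the dependent variable. Because this is a logistic regression, it is a pseudo R squared calculated from the deviances in the output above.

R squared=1-(residual deviance/null deviance)

=1-(93.296/160.5)=.418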
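The arithmetic can be checked with a few lines of Python. This is only a sketch: the function names are made up, and the AIC relationship used here (residual deviance plus two times the number of estimated coefficients, intercept included) is an assumption that happens to be consistent with the numbers R printed for this model.

```python
def pseudo_r_squared(null_deviance, residual_deviance):
    """Deviance-based pseudo R squared: 1 - residual / null."""
    return 1.0 - residual_deviance / null_deviance


def aic_from_deviance(residual_deviance, n_coefficients):
    """AIC for a 0/1 logistic model, assuming deviance = -2 * log-likelihood."""
    return residual_deviance + 2 * n_coefficients


# Values from the defensive model's R output; 7 = intercept + 6 predictors.
defense_r2 = pseudo_r_squared(160.5, 93.296)
defense_aic = aic_from_deviance(93.296, 7)

print(round(defense_r2, 4), round(defense_aic, 1))  # 0.4187 107.3
```

Carried to four decimals the pseudo R squared is 0.4187, which the post reports as .418.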

Click this link for the probabilities of Houston winning with Gary Kubiak as the head coach based on defensive performance, offensive performance, and total performance. It would probably be best to keep a tab open with this document as you read the rest of the post.

The defensive aspect of the model is messier and more vague than the offensive component. The R Squared, which measures how well the independent variables explain the dependent variable, is not very high at .418. The drop in deviance (null deviance minus residual deviance), which measures the difference between the model without any independent variables and the model that has independent variables, is 20 points lower than the offensive one discussed in Part I.

To sum it up, the offensive model is a better statistical model. This is mostly due to the fluctuations and lack of consistency the defense has had during Kubiak's reign of terror. Richard Smith, Frank Bush, Wade Phillips, revolving doors in the secondary, soft defenses, draft busts, one of the worst pass defenses of all time and All-Pro players have all made homes in a defense that has changed considerably for the better these past two seasons. On the flip side, the offense has been fairly stable, or as stagnant as an open, neglected cooler filled with rain water that has turned into a mosquito breeding ground. It all depends on your outlook.

Based on the R outputs of the models, the offensive model should be better at predicting the outcome of a Houston Texans game, but the results are a little different. After creating the probabilities, the defensive model actually performed one game better than the offensive one. The defensive model went 100-16 when predicting the outcomes of Texans games and the offensive model went 99-17. After looking at the two models, I assumed the offensive one would have performed better, since it is a better model based on the R output; it was surprising to see the opposite hold true.
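A record like 100-16 comes from turning each game's probability into a pick and comparing it with what actually happened. The post does not say which cutoff was used, so the sketch below assumes the usual 50% threshold; the probabilities and results are invented for illustration and are not from the Google Doc.

```python
def prediction_record(probabilities, outcomes, threshold=0.5):
    """Return (correct, incorrect) picks, calling a win when P(win) >= threshold."""
    correct = sum(
        (p >= threshold) == bool(won)
        for p, won in zip(probabilities, outcomes)
    )
    return correct, len(outcomes) - correct


# Made-up probabilities and results (1 = win, 0 = loss), for illustration only.
probs = [0.91, 0.35, 0.62, 0.48, 0.77]
results = [1, 0, 0, 1, 1]

print(prediction_record(probs, results))  # (3, 2)
```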

The only reason I can think of is derived from common sense. Let's assume you have no idea what the offense did for a Texans game, and I only gave you the defensive numbers. If the Texans gave up 28 points, you would probably give them a low chance of winning. If they held the other team to 17 points, the win probability you would come up with would probably triple. However, the offense consistently scores 17-30 points a game, and if I told you they scored 17 or 28 points, you would be in a sort of purgatory when making a prediction. You would expect them to win, but would not come up with as high a probability as if I told you Houston's defense only gave up 14 points.

The same example can be seen in the sport of baseball. You would give a higher probability of winning to a team that only gave up two runs compared to a team that scored five runs in a different game that same afternoon.

After coming up with this idea and seeing the defensive model outperform the offensive one, I think defensive numbers might actually be better at predicting outcomes than offensive numbers. I don't know that for sure, and more work and research would be needed, probably by looking at teams other than the Houston Texans.

Also in the Google Doc, I compared the offensive and defensive probabilities. I took the difference between the two and made a note when one predicted a different outcome than the other. What is fairly amazing is how well Houston has played on both sides of the ball since Wade Phillips came to town. Game 81 was the first game of his first season leading the Texans' D. Since then, only four times have the offensive and defensive sides of the ball had a large difference in performance. It only happened once last year; it was the Jacksonville game in Houston. This can be troublesome at times, because whenever they have been beaten these past two years, it's because of an all-around awful performance. There are some games, like the ones against Minnesota and Green Bay, where they came out flat and unprepared, and the idea of winning left everyone's consciousness after the first quarter.

Here's a nifty graph showing the team's balance since Wade put on the headset.

[Figure: Houston Texans balance]

Whenever there is a large difference between the two, Houston is 13-9, which is a decent record. However, most of the games where there were large differences in probabilities came at the beginning of Kubiak's head coaching career. Since their turnaround season in 2011, they have only played four games where there was a large discrepancy between offensive and defensive performance. The games include wins against Cincinnati (Offense 76.5%/Defense 35.1%) and, last year, Jacksonville (Offense 98.83%/Defense 27.42%), and 2011 losses to New Orleans (Offense 75%/Defense 33%) and Oakland in the Al Davis Wake game (Offense 82%/Defense 44%).
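The comparison described here can be sketched in Python using the four games quoted above. The 50% cutoff for deciding which outcome each model "predicted" is an assumption, as are the function and variable names; the post does not say exactly how large a gap has to be to count as a large difference.

```python
# (opponent, offensive model P(win), defensive model P(win)), as quoted in the post.
games = [
    ("Cincinnati", 0.765, 0.351),
    ("Jacksonville", 0.9883, 0.2742),
    ("New Orleans", 0.75, 0.33),
    ("Oakland", 0.82, 0.44),
]


def compare_models(offense_p, defense_p, cutoff=0.5):
    """Return the gap between the two probabilities and whether the picks differ."""
    gap = offense_p - defense_p
    disagree = (offense_p >= cutoff) != (defense_p >= cutoff)
    return gap, disagree


for opponent, off_p, def_p in games:
    gap, disagree = compare_models(off_p, def_p)
    print(f"{opponent}: gap {gap:+.1%}, models disagree: {disagree}")
```

In all four games the offensive model calls a win and the defensive model calls a loss, so these are games where the offense carried a struggling defense.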

Additionally, over these past two years Houston has played 11 games in which both the offense and the defense posted win probabilities above 90%. Some of you may be wondering why the two probabilities add up to more than 100%. It's because I am looking at two separate models with their own variables; I'll dive into the total probabilities in a moment. It really is a beautiful thing to look at the probabilities chart and see 90% and greater performances on both sides of the ball.

In the last article, some of the comments made points regarding Wade Phillips as Gary Kubiak's savior. So I looked at the change in per game averages the past two years. The table below outlines the differences B.W.P. (Before Wade Phillips) and A.W.P. (After Wade Phillips).

Stat             B.W.P.   A.W.P.   Difference
Opp Pass Yards   245.35   225.19   20.16
Opp Pass TD      1.575    1.44     .13
Opp Rush Yards   116.63   96.14    20.5
Opp Y/C          4.27     4.0      .27
Opp Rush TD      1.05     .44      .61
Opp Punts        4.16     3.39     -.77
Opp 1st Downs    20.16    17.5     2.66
Sacks            1.86     2.78     .92
Turnovers        1.42     1.67     .24
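The signs in the Difference column appear to follow an "improvement for the defense" convention: a drop in opponent yardage is positive, and so is a rise in sacks. The Python sketch below recomputes the column under that reading. The `higher_is_better` flags are an assumption inferred from the table's signs, not something the post states, and small rounding differences (such as Turnovers) are to be expected since the table was built from unrounded averages.

```python
# (stat, B.W.P. average, A.W.P. average, True if a higher number helps the defense)
rows = [
    ("Opp Pass Yards", 245.35, 225.19, False),
    ("Opp Pass TD", 1.575, 1.44, False),
    ("Opp Rush Yards", 116.63, 96.14, False),
    ("Opp Y/C", 4.27, 4.0, False),
    ("Opp Rush TD", 1.05, 0.44, False),
    ("Opp Punts", 4.16, 3.39, True),
    ("Opp 1st Downs", 20.16, 17.5, False),
    ("Sacks", 1.86, 2.78, True),
    ("Turnovers", 1.42, 1.67, True),
]


def improvement(before, after, higher_is_better):
    """Positive when the defense got better, negative when it got worse."""
    return after - before if higher_is_better else before - after


for stat, before, after, higher_is_better in rows:
    change = improvement(before, after, higher_is_better)
    pct = 100 * change / before
    print(f"{stat:<15} {change:+6.2f} ({pct:+.1f}%)")
```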

Obviously, Phillips is not the sole reason the Houston D has seen an increased performance. The team added Johnathan Joseph, Danieal Manning, and J.J. Watt, and they benefited from the general increase in performance by younger players as well. But the job Wade has done is quite remarkable when you look at his body of work here. The scary part is that this defense should be even better this year than it was last year, depending on how healthy the team is.

Model 4: Final Model

This is the best one I have made. It is not just all three of the models combined into a one-eyed, two-headed, three-titted monstrosity. One of the goals when creating a model is to make it as simple as possible while maximizing its ability to explain the dependent variable. Football is a complicated game, so consequently there are more variables than I anticipated adding, but I added the minimum needed based on the results. The variables included are rush attempts, rush yards, rush touchdowns, opponent rush yards, opponent turnovers (turnovers forced), opponent first downs, field goal attempts, pass touchdowns, opponent return yards, and pass completions, and the model is as follows.

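Y=-5.32+.205*(Rush Att)-.0032*(Rush Yards)+1.15*(Rush TD)-.0073*(Opp Rush Yds)+.579*(Opp Turnovers)-.117*(Opp 1st Downs)+.393*(FG Att)+.984*(Pass TD)+.0074*(Opp Return Yds)-.0757*(Pass Comp)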
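One way to read these coefficients: each one is the change in the log-odds of a win for a one-unit increase in that stat with everything else held fixed, so exponentiating a coefficient gives the multiplier on the odds of winning. The Python sketch below does that for the final model, with coefficients copied from the R output; the dictionary and variable names are just illustrative.

```python
import math

# Coefficients from the final model's R output (intercept left out on purpose).
FINAL_COEFS = {
    "Rush.Att": 0.204971,
    "Rush.Yards": -0.003212,
    "Rush.TD": 1.151438,
    "Opp.Rush.Yds": -0.007325,
    "Opp.Turnovers": 0.579029,
    "Opp.1st.Down": -0.117178,
    "FG.Att": 0.393340,
    "Pass.TD": 0.983851,
    "Opp.Return.Yds": 0.007444,
    "Pass.Comp": -0.075698,
}

# exp(beta) = factor by which the odds of winning change per one-unit increase.
odds_ratios = {name: math.exp(beta) for name, beta in FINAL_COEFS.items()}

for name, ratio in sorted(odds_ratios.items(), key=lambda item: -item[1]):
    print(f"{name:<15} {ratio:.3f}")
```

By this reading, each additional rushing touchdown roughly triples the odds of a Texans win, while one more opponent first down trims them by about a tenth.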

The R Output is:

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)    -5.319391   3.868691  -1.375   0.1691
Rush.Att        0.204971   0.093534   2.191   0.0284 *
Rush.Yards     -0.003212   0.010198  -0.315   0.7527
Rush.TD         1.151438   0.540153   2.132   0.0330 *
Opp.Rush.Yds   -0.007325   0.007614  -0.962   0.3360
Opp.Turnovers   0.579029   0.357878   1.618   0.1057
Opp.1st.Down   -0.117178   0.077529  -1.511   0.1307
FG.Att          0.393340   0.283214   1.389   0.1649
Pass.TD         0.983851   0.433152   2.271   0.0231 *
Opp.Return.Yds  0.007444   0.007080   1.051   0.2931
Pass.Comp      -0.075698   0.073581  -1.029   0.3036
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 160.500 on 115 degrees of freedom
Residual deviance: 69.101 on 105 degrees of freedom
AIC: 91.101
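For readers wondering where the z value and Pr(>|z|) columns come from: the z value is the estimate divided by its standard error, and the p-value is the two-sided tail probability of a standard normal distribution at that z. A minimal Python sketch using only the standard library is below; the function name is made up, and the Rush.Att row is used as the example.

```python
import math


def wald_test(estimate, std_error):
    """Return (z, two-sided p-value) for a single coefficient."""
    z = estimate / std_error
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Rush.Att row from the final model's R output.
z, p = wald_test(0.204971, 0.093534)
print(round(z, 3), round(p, 4))
```

The result lines up with the 2.191 and 0.0284 that R printed for Rush.Att, which is why that row earns a star at the .05 level.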

R Squared=1-(69.101/160.5)=.569

I know I said there would be no talk about hypothesis tests, but I need to in this case. Earlier I mentioned the surprising number of variables I needed to add and Hypothesis Test #2 deals with the complexity of the model.

Ho: Complicating the model does not increase the explainability of the dependent variable.

Ha: Complicating the model does increase the explainability of the dependent variable.

This is measured by the formula: 1-pchisq(null deviance-residual deviance, null degrees of freedom-residual degrees of freedom)

1-pchisq(160.5-69.101, 115-105)≈0
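The same test can be reproduced without R. The Python sketch below compares the drop in deviance with a chi-square distribution whose degrees of freedom equal the difference in the two models' degrees of freedom (115 - 105 = 10, one per added variable), which is the usual choice when comparing nested models. To stay within the standard library it uses a closed form for the chi-square upper tail that only works for even degrees of freedom; the function name is an assumption, and `scipy.stats.chi2.sf` would be the general-purpose alternative.

```python
import math


def chi_square_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!,
    which only holds when df is an even positive integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs an even, positive df")
    half = x / 2.0
    total = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * total


# Deviances and degrees of freedom from the final model's R output.
null_dev, null_df = 160.5, 115
resid_dev, resid_df = 69.101, 105

p_value = chi_square_sf_even_df(null_dev - resid_dev, null_df - resid_df)
print(p_value, p_value < 0.10)
```

The p-value is not literally zero, just far too small to show up at the precision R prints, so the conclusion at an alpha of .10 is unchanged.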

So at an alpha level of .10, one would reject the null hypothesis because the p value of 0 is less than the alpha of .10, and adding more variables to the model would still be warranted depending on the residual deviance. Really, I believe I can add every variable in the world that involves an NFL game and still the hypothesis test will turn out the same. Hundreds of different measurable and unmeasurable variables, 50 different players working together and against each other, and components like chemistry all have an immense impact on the game. This complexity and ambiguity is the main reason why advanced stats are often lacking in the NFL compared to MLB and the NBA. Someday, when I have gained enough of my soul back after having it split into different Horcruxes during this process, I will add other variables like yards per punt, time of possession, and 3rd down numbers. Even then, it still probably will not be enough data to cover everything.

The model that combines the offensive, defensive, and a few special teams stats performed better than the models that look at just the offensive or defensive side of the ball. The model went 102-14 when predicting past games, which is the same as the defensive model. However, the probabilities are more accurate and better depict what happened in the game because it uses a combination of the three sides of the ball. Put your glasses on because this chart is not the easiest on the eyes; it depicts how the total probability of winning has changed over time. The best way to read the chart is to pay attention to the longevity of the peaks and troughs instead of just seeing how the line zig-zags. The majority of the pertinent data will be found in the Google document provided in the earlier link.

[Figure: Texans total win probability over time]

From this work, there are multiple aspects to learn about how the Houston Texans have operated under Gary Kubiak. The consistency of the offense, the transformation of the defense, and the realization of how complex the game of football really is are all there. The real fun will be once the season starts and we can see if this project has any merit whatsoever. What's the point of attempting to mimic reality and make projections if there is nothing to test it against?

Here's a link to the stats I charted for Houston under Gary Kubiak. Holler at me if you find anything interesting.