In a recent post on motorsport.com we get an interesting quote from Kyle Busch after his third consecutive win this season.
When asked if he had ever had as much momentum in his career, Busch said: “No, and you know, I certainly would love to be doing this if this was week 10 of the playoffs we’d be talking about something pretty cool, but I hope it’s not peaking too early.
“Obviously this is way early in the season. We’ve got a long way to go.”
Tom Errington for Motorsport.com
After reading this, I became curious about “peaking” and how it might manifest itself in NASCAR data. I decided to perform some exploratory data analysis on “streaks” or consecutive NASCAR wins, and what they’re associated with. For this purpose, I consider a streak to be any consecutive number of wins. If Kyle is concerned with peaking after a 3-win streak, we might expect to see a reversal in performance after 3-win streaks historically.
I begin by scraping RacingReference for lots of data. The two datasets I gather are race results back to 1990, and cup series standings at the end of the season. This is easy with
How common are streaks?
I answer this question by looking at consecutive wins through time. It seems that 2-win streaks aren’t entirely uncommon. 3-win and 4-win streaks are rare, but we’ve been lucky enough to see two 3-win streaks this year, from Kyle Busch and Kevin Harvick.
What is the relationship between streak length and end-of-season position?
First, I decide to not double-count by only looking at the “maximum” streak length. For instance, a 3-win streak was at one point in time a 2-win streak, and a 1-win streak. For the purposes of this exercise, I will only be considering the maximum streak, the 3-win, and not its components.
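To make the "maximum streak" idea concrete, here is a minimal Python sketch of one way to collapse a season's winners into maximal streaks and then take each driver's longest one. The function names and the toy season are made up for illustration, and the input format (a list of winners in race order) is an assumption rather than the shape of the scraped data.

```python
from itertools import groupby

def maximal_streaks(winners):
    """Collapse race winners (in race order) into maximal streaks.

    Returns (driver, streak_length) tuples. A 3-win streak is reported
    once, not also as a 2-win and a 1-win streak.
    """
    return [(driver, sum(1 for _ in run)) for driver, run in groupby(winners)]

def longest_streak_by_driver(winners):
    """Map each winning driver to the length of their longest streak."""
    longest = {}
    for driver, length in maximal_streaks(winners):
        longest[driver] = max(length, longest.get(driver, 0))
    return longest

# Hypothetical season, not real results.
season = ["A", "B", "B", "B", "C", "A", "A", "B"]
print(maximal_streaks(season))
print(longest_streak_by_driver(season))
```

Driver B's three straight wins show up as a single 3-win streak, and B's later lone win does not change that driver's maximum.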
We can create some box plots grouped by the maximum streak length. As we would expect, the trend shown is a higher (lower numerically) end of season position as the maximum streak length increases. I’ll note here that since 1990, we have not seen a racer obtain a 3-win or 4-win streak and place outside the top 10. Good news for Kyle Busch!
I also examine the streaks for the cup champions over the past 28 years.
- All champions won at least one race
- For 14 out of 28 champions, a 1-win streak was their longest
- For 11 out of 28 champions, a 2-win streak was their longest
- For 1 out of 28 champions, a 3-win streak was his longest
- For 2 out of 28 champions, a 4-win streak was their longest
Coincidentally, the one champion with a 3-win streak was Kyle Busch. He had a 3-win streak and won two other races that year to take the cup in 2015.
Do streaks and non-consecutive wins have the same relationship with a driver’s end of season position?
To look at this question, I’ve taken the boxplot above and added some more to it. The teal(?) boxplots show the end of season position distribution at different amounts of non-consecutive wins while the blue is the graph above that groups by the maximum streak. If streaks were unimportant, we wouldn’t expect to see a difference between the two colors. However, we see that the medians and boxes don’t overlap between groups two through four, so it is likely that drivers that have an X-win streak place higher than drivers with X non-consecutive wins.
We can go further by taking a look at the kernel density estimates for drivers with 2+ non-consecutive wins and for drivers with 2+ win streaks. Running a two-sample Kolmogorov–Smirnov test in R confirms a significant difference between these two distributions.
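For readers unfamiliar with the test, the two-sample Kolmogorov–Smirnov statistic is the largest vertical gap between the two empirical cumulative distribution functions. The sketch below computes only that statistic in plain Python. It is not the R code used for this post, the function name is hypothetical, and the inputs are toy numbers. R's `ks.test` also reports a p-value, which this sketch leaves out.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical
    gap between the empirical CDFs of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # The gap between two step functions can only peak at an observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of a at or below x
        cdf_b = bisect_right(b, x) / len(b)  # share of b at or below x
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

# Toy end-of-season positions, invented for illustration.
streak_group = [1, 2, 3, 4]
non_streak_group = [3, 4, 5, 6]
print(ks_statistic(streak_group, non_streak_group))
```

A value near 0 means the two groups have nearly identical distributions, while a value near 1 means they barely overlap.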
What does the data tell us about historical performance after streaks?
A little. I’ve aggregated driver performance by gathering finishing positions over the next five races after a 3-win streak, and don’t see any exceptional reversal that would indicate the “peak” that Kyle Busch mentioned. Drivers with a 3-win streak continue to perform superbly, with the median observation around 6th place. It seems that only 25% of observations weren’t top-half finishes.
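As a rough illustration of this aggregation, the sketch below collects a driver's finishing positions in the five races following each 3-win streak. It rests on several assumptions: the input is one driver's finishing positions in race order, a win is position 1, and only maximal runs of exactly three wins count. The function name and the sample finishes are invented.

```python
from statistics import median

def finishes_after_streaks(finishes, streak_len=3, window=5):
    """Collect finishing positions in the `window` races that follow each
    maximal run of exactly `streak_len` wins (position 1) in `finishes`."""
    collected = []
    run = 0
    for i, pos in enumerate(finishes):
        if pos == 1:
            run += 1
            continue
        # A non-win ends the current run; i is the first race after it.
        if run == streak_len:
            collected.extend(finishes[i:i + window])
        run = 0
    return collected

# Hypothetical finishing positions for one driver, not real results.
finishes = [12, 1, 1, 1, 7, 3, 20, 1, 5, 9, 2]
after = finishes_after_streaks(finishes)
print(after, median(after))
```

Pooling these windows across all drivers and seasons would give the kind of distribution summarized above. A streak that ends on the last race of the list contributes nothing, since no races follow it.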
Will Dega be an important race for Kyle Busch?
An interesting post on Building Speed explores the correlation between Talladega finishing position and season outcomes. I’m not sure if she is looking at a different dataset than I am, but I do see some sort of relationship. See the plot below.
I define a spring Talladega race as one that occurs in April or May. The plot above has a few components. The underlying data is the mean end of season position for every spring Talladega finishing position. Each black point is the mean, and the gray bars are standard error estimates for each mean. The blue curve overlaid is a second order polynomial that attempts to fit the means. I see two general segments here. It looks like the first 15 finishing positions have a defined trend, and all positions after the 15th seem noisy. However, there is quite a large spread in end of season position within the first 15 finishers. We can compare this to the fall Talladega race, where we see a much more consistent trend. It seems the spring race is more “forgiving”, or perhaps, more prone to unexpected outcomes.
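The points and error bars described above can be sketched as a group-by: for every Talladega finishing position, take the mean end-of-season position and its standard error. The version below is a minimal Python illustration with invented rows. It assumes the standard error is the sample standard deviation divided by the square root of the group size, which may differ from how the plot's bars were computed, and it omits the polynomial fit.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

def mean_and_se_by_finish(rows):
    """rows: iterable of (talladega_finish, end_of_season_position) pairs.

    Returns {talladega_finish: (mean, standard_error)}. The standard
    error is None when a finishing position has only one observation.
    """
    groups = defaultdict(list)
    for finish, season_pos in rows:
        groups[finish].append(season_pos)
    summary = {}
    for finish, values in sorted(groups.items()):
        se = stdev(values) / sqrt(len(values)) if len(values) > 1 else None
        summary[finish] = (mean(values), se)
    return summary

# Hypothetical (Talladega finish, end-of-season position) pairs.
rows = [(1, 2), (1, 4), (1, 6), (2, 10)]
for finish, (avg, se) in mean_and_se_by_finish(rows).items():
    print(finish, avg, se)
```

With only about 28 seasons of data there are few observations per finishing position, so wide error bars are to be expected.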
What does this all mean?
In the data I examined, I wasn’t able to find anything interesting about “peaks” or adverse performance after a win streak. Take a look at one last plot below.
Even when isolating streaks that occurred in the first 15 races of a season, there doesn’t seem to be any indication that consecutive wins hamper future outcomes. Kyle Busch shouldn’t be worried; he’s just off to a dominant start! In addition, a poor showing this Sunday doesn’t look like it will spell the end of his cup dreams. While I haven’t done enough to create a formal prediction of his cup chances, I can say that, from observing the data and conducting my general analysis, it looks like he shouldn’t have any trouble staying in the top 5 until the end of the season. Of course, now he has to keep driving the car!
Please comment or reach out if you have suggestions to improve this analysis or topics for the future!