Category Archives: Data Scraping

2016 college baseball season preview

Last year, I introduced my open-source college baseball database (which I’ve recently updated), and showed a few example applications. I looked at win probabilities, how the new flatter seams helped increase offense, the stolen base breakeven point, and the value of bunting (honest).

But this time, I want to use someone else’s data. Chris Long (now with the Detroit Tigers) has his own collection of useful college baseball tools on his GitHub. Let’s use them to generate a season preview.

Read the rest on Beyond the Box Score


The Machine that Goes “Ping”: The bunt stops here

In our last article, we saw that the new flat-seamed ball has led to an increase in scoring in college baseball. If you’re used to Major League Baseball, you might be relieved, since more scoring means runs are easier to come by, which in turn means teams should start moving away from small ball tactics such as stealing too often and bunting. Especially bunting. Man, do sabermetric people hate bunting.

Part of The Machine that Goes “Ping”, my occasional college baseball column. Read the rest at Beyond the Box Score.

Flattened seams and raised offense in college baseball

One month into the season, the effects of the new ball are already visible. The NCAA was quick to trumpet the apparent power surge: despite a cold February, home runs per game jumped from 0.33 in the first month of 2014 to 0.47 so far in 2015. But observers claimed a number of other effects as well….

We pulled the data from the first month of the 2015 season; let’s see which of those claims hold water.

Part of The Machine that Goes “Ping”, my occasional college baseball column.

Read the rest on Beyond the Box Score.

Database available for download on GitHub. Win expectancy table available on Tableau.