Sachin Tendulkar and Brian Lara at the top, followed by Jacques Kallis and Ricky Ponting in whichever order: that seems to be the conventional wisdom among cricket fans about who ranks as the modern game’s best batsmen. While former Sri Lankan captain Kumar Sangakkara’s phenomenal record has forced some to take note lately, many discount it because of a few sticking points. His many appearances against Bangladesh, possibly the weakest bowling attack among the Test-playing nations, are the main point of contention. The nature of Sri Lankan wickets, probably more batting-friendly than those in some other parts of the world, is usually the next argument.

To begin with, these arguments aren’t completely unfounded. Yes, Sanga’s overall record is helped a bit by his record against Bangladesh as well as by the placid nature of some Sri Lankan batting surfaces. Yet does that mean his entire record cannot be compared with those of others?
This brings us to a more general question, one that is frequently encountered in business decision-making too: how do we compare alternatives when the circumstances in which they were tested are not identical?
As far as Sanga’s record is concerned, the objective is to adjust the number of runs scored by a batsman for the other factors that make run making easier or harder. This process has the potential to isolate batsman quality from everything else that may influence run making. Once the same adjustment is made for every batsman, we are in a position to compare their relative merits.
A naïve approach, which has been used in the past, is to do the comparison after eliminating Bangladesh from everyone’s record. However, one needs to keep in mind that Bangladesh is only a case in point. What about the mediocre bowling attacks that other teams have had from time to time? Should they be eliminated too? On the other hand, what about those who have scored heavily against the best bowling line-ups? How are they going to be compensated?
A scientific and non-arbitrary approach is to build a statistical model that predicts the expected number of runs a batsman would score, given a set of circumstances. Once the expectation is known, it can be used as a benchmark against which the actual number of runs the batsman has scored is compared. The point that needs to be emphasized is that the expected number of runs (the benchmark) depends on the playing circumstances; hence it varies from batsman to batsman.
How do we build a statistical model that predicts the expected number of runs a batsman would score, given a set of circumstances? The first step is to generate a set of input variables that quantify those circumstances.
What makes run making easy or difficult for a batsman? Obviously the quality of the opposition’s bowling is the main influencing factor. The nature of the surface and the quality of the opposition’s fielding may be the other key factors. We undertook the painstaking process of classifying each bowling attack, in every Test match included in the analysis, into four categories. Very good: three or more bowlers with 150-plus Test wickets. Good: two bowlers with 150-plus wickets. Average: one bowler with 150-plus wickets. Weak: no bowler with 150-plus wickets. Then we created a set of variables of the form “Number of innings against Very good/Good/Average/Weak” for every batsman in the analysis. Similarly, we classified Test grounds as Difficult, Moderate or Easy for batting, by the average number of runs per wicket. Then we created a set of variables of the form “Number of innings in Difficult/Moderate/Easy conditions” for every batsman in the analysis. A similar set of variables was created for fielding.
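The bowling-attack rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, "150 plus" is assumed to mean at least 150, and the article does not say whether a bowler's wickets are counted as at the time of the match or over his whole career. The wicket tallies in the example are invented.

```python
from collections import Counter

WICKET_THRESHOLD = 150  # from the article; ">= 150" is our assumption
LABELS = ("Very good", "Good", "Average", "Weak")


def classify_attack(wicket_tallies):
    """Label an attack by how many of its bowlers have 150+ Test wickets."""
    established = sum(1 for w in wicket_tallies if w >= WICKET_THRESHOLD)
    if established >= 3:
        return "Very good"
    if established == 2:
        return "Good"
    if established == 1:
        return "Average"
    return "Weak"


def innings_by_attack(innings):
    """Count a batsman's innings against each category of attack.

    `innings` holds one list of bowler wicket tallies per innings played.
    """
    counts = Counter(classify_attack(attack) for attack in innings)
    return {label: counts.get(label, 0) for label in LABELS}


# Invented tallies for three innings, for illustration only
example_innings = [[400, 250, 160, 30], [200, 20, 10], [40, 12, 5, 0]]
print(innings_by_attack(example_innings))
```

The same pattern would apply to the ground and fielding variables, with runs per wicket (or a fielding measure) in place of wicket tallies; the article does not give the cut-offs used for those.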
A model from a family of statistical models called Generalized Linear Models (GLMs) was used to predict the expected number of runs, with the above variables and the total number of innings as the input variables (predictors). Since the expected number of runs depends on the inputs, which in turn capture the circumstances, the expected runs are adjusted for varying circumstances. Hence the expectation can be used as a dynamic benchmark. The difference between the actual runs and the expected runs is almost entirely due to batsman quality. Hence we have isolated the batsman effect from the other influencing factors!
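To make the idea of a benchmark concrete, here is a minimal sketch under stated assumptions. The article does not say which GLM family or link was used, so the sketch takes the simplest member, a Gaussian model with identity link, which is ordinary least squares. It uses only the four bowling-attack counts as predictors, has no intercept, and runs on invented numbers; none of this is the article's data or its fitted model.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def expected_runs(beta, x):
    """Benchmark: runs a typical batsman in the sample would score."""
    return sum(b * xi for b, xi in zip(beta, x))


# Invented data: innings against Very good / Good / Average / Weak attacks
X = [[80, 20, 20, 20], [20, 80, 20, 20], [20, 20, 80, 20],
     [20, 20, 20, 80], [35, 35, 35, 35]]
y = [5200, 5600, 6100, 6650, 5850]  # invented career runs

beta = fit(X, y)
for row, actual in zip(X, y):
    # Actual minus expected is the part attributed to the batsman
    print(round(actual - expected_runs(beta, row)))
```

In practice one would reach for a statistics package rather than hand-rolled linear algebra, and a different family or link would change the expected values, as the article itself notes.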
To build the model, we have only used batsmen who have scored more than 7000 runs. That gives us a data set of about 50 observations. Hence the expected number of runs should be carefully interpreted as the number of runs expected of a top batsman given a set of bowling, fielding and ground conditions during the course of his career.

Note: Some of the distributional assumptions can be changed when fitting the model. This changes the expected values slightly. The percentage difference can be considered instead of the absolute difference. The percentage difference alters the rankings slightly.
Brian Lara stands head and shoulders above the rest. Given the bowling attacks he faced and the grounds he batted on, he was expected to score only 9707 runs. The 2246 he has scored on top of that is mainly due to his genius!
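As a quick worked check of Lara's figures, the snippet below computes both the absolute difference and the percentage difference from the benchmark. The two function names are hypothetical; the only inputs are the numbers quoted above, with actual runs taken as the expected 9707 plus the 2246 excess.

```python
def excess_runs(actual, expected):
    """Absolute difference between actual runs and the benchmark."""
    return actual - expected


def percent_excess(actual, expected):
    """The same difference as a percentage of the benchmark."""
    return 100.0 * (actual - expected) / expected


lara_expected = 9707               # benchmark quoted in the article
lara_actual = lara_expected + 2246  # benchmark plus the quoted excess

print(excess_runs(lara_actual, lara_expected))
print(round(percent_excess(lara_actual, lara_expected), 1))
```

Ranking by the percentage rather than the absolute figure favours batsmen with lower benchmarks, which is one way the two orderings can differ slightly.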
Despite having quite a few knocks against Bangladesh (which pushes the expected number of runs, the benchmark, up), Sanga has scored 1368 runs more than expected. Because of sampling error, Sanga and Kallis should both be regarded as sharing second place.
Mahela Jayawardene, the other Sri Lankan great, comes in only at 25th. He has scored 125 runs fewer than expected. However, it’s important to recognize that the expectation is with respect to the very best of the modern batsmen, not all international batsmen.
On that note, it is time to say adieu to Sanga, probably the second most prolific batsman the modern game has seen.
Dr. Sankha Muthu Poruthotage earned his Ph.D. in Statistics and his Master’s in Actuarial Mathematics from the University of Connecticut, USA. Upon relocating to Sri Lanka, he co-founded Linear2, a data analytics consultancy. Email: [email protected]
Danushka Liyanage and Rajith Munasinghe contributed to this article.