Wednesday, October 8, 2008

New Metrics for Measuring Batsman Characteristics

At present, batsmen in all formats of cricket are evaluated mainly on their batting average (the mean score of a batsman). In this article I plan to define three more fundamental measures for analysing batsmen:

1. Median score of a batsman (the middle score when the batsman's innings scores are put in order)

2. Mode score of a batsman (the score the batsman makes most often)

3. Standard deviation of a batsman's scores (describes the batsman's variability)

In due course we will see how these statistics can be applied to evaluate batsmen.

Let's define each of these formally, using the scores of two batsmen, A and B, shown below:

T-20 scores | Player A | Player B
Match 1 | 0 | 15
Match 2 | 11 | 24
Match 3 | 20 | 25
Match 4 | 35 | 12
Match 5 | 24 | 17
Match 6 | 45 | 25
Match 7 | 9 | 26

Mean -- "a value that is computed by dividing the sum of a set of terms by the number of terms" (from the Merriam-Webster Dictionary)

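The mean of a batsman's scores is essentially the batting average, the most commonly used metric for measuring a batsman. (Strictly, cricket's batting average divides total runs by the number of dismissals, so it equals the mean score only when the batsman was out in every innings, which we assume here.) In the above case it is computed as follows: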

Batting average/mean score of batsman A = (0+11+20+35+24+45+9)/7 = 144/7 = 20.57

Batting average/mean score of batsman B = (15+24+25+12+17+25+26)/7 = 144/7 = 20.57

Both batsmen have exactly the same average, so the mean alone cannot tell them apart.
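As a quick check of the arithmetic, here is a minimal Python sketch using the standard library's statistics.mean. The score lists are copied from the table; treating every innings as completed (no not-outs) is an assumption, as noted above.

```python
from statistics import mean

# T-20 scores from the table above, one entry per innings.
# Assumption: no not-outs, so the mean score equals the batting average.
player_a = [0, 11, 20, 35, 24, 45, 9]
player_b = [15, 24, 25, 12, 17, 25, 26]

mean_a = mean(player_a)  # 144 / 7
mean_b = mean(player_b)  # 144 / 7

print(f"Player A mean: {mean_a:.2f}")  # Player A mean: 20.57
print(f"Player B mean: {mean_b:.2f}")  # Player B mean: 20.57
```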

Median -- "a value in an ordered set of values below and above which there is an equal number of values or which is the arithmetic mean of the two middle values if there is no one middle number" (from the Merriam-Webster Dictionary).

In the example above, the median of Player A is 20 whereas the median of Player B is 24. The method is simple: arrange each player's scores in ascending order, as below, and take the middle value.

Player A | Player B
0 | 12
9 | 15
11 | 17
20 | 24
24 | 25
35 | 25
45 | 26

Sometimes there is no single middle value: with an even number of innings, two scores sit in the middle. In that case the median is the average of those two scores.
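The median rule (sort the scores, take the middle value, or average the two middle values when the count is even) can be sketched in Python as follows. median_score is a hypothetical helper name chosen for this example; in practice the standard library's statistics.median applies the same rule.

```python
def median_score(scores):
    """Sort the scores and return the middle value, or the average of the
    two middle values when there is an even number of innings."""
    if not scores:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(scores)  # ascending order, as in the table above
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of the two middle values


player_a = [0, 11, 20, 35, 24, 45, 9]
player_b = [15, 24, 25, 12, 17, 25, 26]

print(median_score(player_a))  # 20
print(median_score(player_b))  # 24
# Hypothetical eighth innings of 30 for Player A: the middle pair is 20 and 24.
print(median_score(player_a + [30]))  # 22.0
```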

Mode -- "the most frequent value of a set of data" (from the Merriam-Webster Dictionary).
Let's consider the mode calculation for Player A and Player B. Instead of calculating the mode as per the strict statistical definition, I have introduced a range-based mode concept, which counts how many innings fall into each 5-run range, as shown below:

Score range (mode category) | Innings by Player A | Innings by Player B
0-4 | 1 | 0
5-9 | 1 | 0
10-14 | 1 | 1
15-19 | 0 | 2
20-24 | 2 | 1
25-29 | 0 | 3
30-34 | 0 | 0
35-39 | 1 | 0
40-44 | 0 | 0
45-49 | 1 | 0

In the above example, the mode range for Player A is 20-24, whereas for Player B it is 25-29.
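The range-based mode can be sketched in Python by bucketing each score into 5-run ranges (the width used in the table) and counting innings per range. score_range, range_modes and BUCKET_WIDTH are hypothetical names for this example, and returning every tied range is a design choice of this sketch, since the article does not say how ties should be handled.

```python
from collections import Counter

BUCKET_WIDTH = 5  # ranges 0-4, 5-9, 10-14, ... as in the table above


def score_range(score, width=BUCKET_WIDTH):
    """Map a score to its range label, e.g. 22 -> '20-24'."""
    low = (score // width) * width
    return f"{low}-{low + width - 1}"


def range_modes(scores, width=BUCKET_WIDTH):
    """Return the most frequent score range(s); all tied ranges are returned."""
    counts = Counter(score_range(s, width) for s in scores)
    top = max(counts.values())
    return [label for label, c in counts.items() if c == top]


player_a = [0, 11, 20, 35, 24, 45, 9]
player_b = [15, 24, 25, 12, 17, 25, 26]

print(range_modes(player_a))  # ['20-24']
print(range_modes(player_b))  # ['25-29']
```

For comparison, the strict statistical mode (Python's statistics.multimode) returns all seven scores for Player A, since each occurs only once, and [25] for Player B, which illustrates why ranges can be more informative for a small number of innings.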

Standard Deviation -- a measure of the dispersion of a frequency distribution that is the square root of the arithmetic mean of the squares of the deviation of each of the class frequencies from the arithmetic mean of the frequency distribution (from the Merriam-Webster Dictionary).

The standard deviation can be computed using the formula below, where X is each innings score and N is the number of innings:

SD = square root of [(sum of X² - (sum of X)²/N) / N]
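This shortcut formula is equivalent to averaging the squared deviations of the scores from the mean µ and then taking the square root; note that it is the population standard deviation (dividing by N, not N - 1). A short derivation, with X_i the score in innings i, followed by the numbers for the two players (from the table, ΣX = 144 for both, ΣX² = 4428 for A and 3160 for B):

```latex
\begin{aligned}
\mathrm{SD}^2 &= \frac{1}{N}\sum_{i=1}^{N}\left(X_i-\mu\right)^2,
  \qquad \mu = \frac{1}{N}\sum_{i=1}^{N} X_i \\
&= \frac{1}{N}\left(\sum_{i=1}^{N} X_i^2 - 2\mu\sum_{i=1}^{N} X_i + N\mu^2\right) \\
&= \frac{1}{N}\left(\sum_{i=1}^{N} X_i^2 - \frac{2\left(\sum_{i=1}^{N} X_i\right)^2}{N} + \frac{\left(\sum_{i=1}^{N} X_i\right)^2}{N}\right) \\
&= \frac{1}{N}\left(\sum_{i=1}^{N} X_i^2 - \frac{\left(\sum_{i=1}^{N} X_i\right)^2}{N}\right)
\end{aligned}

\mathrm{SD}_A = \sqrt{\frac{4428 - 144^2/7}{7}} \approx \sqrt{209.39} \approx 14.47,
\qquad
\mathrm{SD}_B = \sqrt{\frac{3160 - 144^2/7}{7}} \approx \sqrt{28.24} \approx 5.31
```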

This formula is a rearrangement of the average squared difference between each score and the mean (µ) of the batsman's scores, so it measures how widely the batsman's scores vary around his average. Substituting the scores of Player A and Player B, we get:

Player A – standard deviation is 14.47

Player B – standard deviation is 5.31
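A minimal Python sketch of the shortcut formula, cross-checked against the standard library's statistics.pstdev (the population standard deviation, which also divides by N). batsman_sd is a hypothetical helper name for this example.

```python
from math import sqrt
from statistics import pstdev


def batsman_sd(scores):
    """Population standard deviation via the shortcut formula:
    SD = sqrt[(sum of X squared - (sum of X)^2 / N) / N]."""
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one innings")
    sum_x = sum(scores)
    sum_x_sq = sum(x * x for x in scores)
    return sqrt((sum_x_sq - sum_x * sum_x / n) / n)


player_a = [0, 11, 20, 35, 24, 45, 9]
player_b = [15, 24, 25, 12, 17, 25, 26]

for name, scores in (("Player A", player_a), ("Player B", player_b)):
    sd = batsman_sd(scores)
    # Cross-check against the standard library's population SD.
    assert abs(sd - pstdev(scores)) < 1e-9
    print(f"{name}: SD = {sd:.2f}")
# Player A: SD = 14.47
# Player B: SD = 5.31
```

The shortcut formula can lose precision through cancellation when the values are large and close together; statistics.pstdev computes the result with exact arithmetic internally, so it is the more robust choice in practice.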

What is Cricmetrics and why do we need it?

I have been an analytics professional for over 5 years now, and I have seen how analytics and metrics have redefined businesses. Thomas Davenport writes about this exciting area in his new book, Competing on Analytics: http://www.amazon.com/Competing-Analytics-New-Science-Winning/dp/1422103323

Cricket is a favorite sport in India and many other Asian countries, and of course countries such as Australia, New Zealand, England, the West Indies and South Africa have been playing it for a long time.

Statistics have long been used in cricket. For example, metrics such as batting average and strike rate each describe a batsman's ability or behaviour in some sense. The question is whether we can take this analysis to the next level and answer questions such as:

1. What can increase the probability of a Team winning?

2. What is the optimum team mix, given the opposition and pitch conditions?

3. Are there better metrics to measure batsmen and bowlers?

Understanding the game and its players through better metrics can not only help teams win games but also help leagues such as the IPL and ICL evaluate players against the price they pay for them and measure the RoI (Return on Investment).

I expect that over the next 10 years analytics and statistics will be used extensively in cricket to measure player, team and league performance, helping clubs reduce costs and increase their win rates. Very soon every league may have an experienced analytical consultant (someone who understands both statistics and cricket) to shape team strategy and affect the outcome of games.

Welcome to the world of cricmetrics. To me, cricmetrics is the science of applying mathematical and statistical analysis to cricket records. It can help us understand players, teams and conditions: pitches, the effect of day vs night games, spinners vs pacers, which players are suited to which roles, how to bowl to a batsman's weaknesses, and so on.

Today the on-field captain has to make many decisions, and very often small mistakes make a big difference. For instance, analysing which bowler to bring on against a particular batsman, given the match conditions (pitch, over being played, etc.), can give captains valuable insight for making the right decision.

So welcome to the world of cricmetrics, where I plan to discuss a variety of analytical applications in cricket. I invite all readers to add their thoughts so that I can refine my ideas in this exciting field!

Keep checking this space for more exciting metrics and analytics on cricket :). Feedback is what can take this science to the next level :)