Wednesday, October 8, 2008

New Metrics for Measuring Batsman Characteristics

Currently, batsmen in all formats of cricket are mostly evaluated on their batting average (the mean score of a batsman). In this article I define three more fundamental statistics for analysing a batsman:

1. Median score of a batsman (the middle score when the batsman's innings scores are arranged in order)

2. Mode score of a batsman (the score the batsman makes most often)

3. Standard deviation of a batsman (a measure of the variability of the batsman's scores)

We will in due course see the application of these statistics to evaluate batsmen.

Let’s define these formally, using the scores of two batsmen, A and B, shown below:

T-20 scores    Player A    Player B
Match 1        0           15
Match 2        11          24
Match 3        20          25
Match 4        35          12
Match 5        24          17
Match 6        45          25
Match 7        9           26

Mean -- "a value that is computed by dividing the sum of a set of terms by the number of terms" (from the Merriam-Webster Dictionary)

The mean of a batsman is simply the batting average, the most commonly used metric for measuring a batsman's performance. For the scores above, it is computed as follows:

Batting average (mean score) of Player A = (0+11+20+35+24+45+9)/7 = 20.57

Batting average (mean score) of Player B = (15+24+25+12+17+25+26)/7 = 20.57
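The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the calculation above; the function name `mean_score` and the variable names are my own choices, not part of any cricket statistics library.

```python
# Scores copied from the T-20 table above.
player_a = [0, 11, 20, 35, 24, 45, 9]
player_b = [15, 24, 25, 12, 17, 25, 26]


def mean_score(scores):
    """Sum of the scores divided by the number of innings."""
    return sum(scores) / len(scores)


mean_a = mean_score(player_a)
mean_b = mean_score(player_b)

# Both players total 144 runs over 7 innings, so the means are identical.
print(round(mean_a, 2), round(mean_b, 2))  # 20.57 20.57
```

Both players come out with the same mean, which is why the mean alone cannot tell them apart.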

Median -- "a value in an ordered set of values below and above which there is an equal number of values or which is the arithmetic mean of the two middle values if there is no one middle number" (from the Merriam-Webster Dictionary).

In the example above, the median of Player A is 20, whereas the median of Player B is 24. The method is simple: arrange the scores in ascending order and take the middle value.

Player A    Player B
0           12
9           15
11          17
20          24
24          25
35          25
45          26

Sometimes there is no single middle value: with an even number of innings, two scores share the middle. In that case, the median is the average of those two scores.
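A minimal Python sketch of the median rule described above, covering both the odd and the even case. The function name `median_score` is my own, and the six-innings list at the end is a made-up illustration (Player A's first six matches only), not an extra data set.

```python
# Scores copied from the T-20 table above.
player_a = [0, 11, 20, 35, 24, 45, 9]
player_b = [15, 24, 25, 12, 17, 25, 26]


def median_score(scores):
    """Middle value of the ordered scores; average of the two
    middle values when the number of innings is even."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


median_a = median_score(player_a)
median_b = median_score(player_b)
print(median_a, median_b)  # 20 24

# Even number of innings (illustration only): Player A's first six matches.
# Ordered: 0, 11, 20, 24, 35, 45, so the median is (20 + 24) / 2.
median_even = median_score(player_a[:6])
print(median_even)  # 22.0
```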

Mode -- "the most frequent value of a set of data" (from the Merriam-Webster Dictionary).

Let’s consider the mode calculation for Player A and Player B. Instead of calculating the mode as per the statistical definition, I have introduced a range-based mode, as shown below:

Mode Categories    Player A    Player B
0--4               1           0
5--9               1           0
10--14             1           1
15--19             0           2
20--24             2           1
25--29             0           3
30--34             0           0
35--39             1           0
40--44             0           0
45--49             1           0

In the above example, the mode score for Player A is 20-24, whereas for Player B it is 25-29.
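The range-based mode can be sketched in Python as follows. This assumes ranges of width 5 starting at 0, as in the table above; the name `range_mode` and the choice to return every tied range are my own assumptions, since the article does not say how ties should be handled.

```python
from collections import Counter

# Scores copied from the T-20 table above.
player_a = [0, 11, 20, 35, 24, 45, 9]
player_b = [15, 24, 25, 12, 17, 25, 26]

BIN_WIDTH = 5  # each range covers 5 runs: 0-4, 5-9, 10-14, ...


def range_mode(scores, width=BIN_WIDTH):
    """Return the (low, high) range(s) that contain the most scores.

    A list is returned because two ranges can tie for the highest count.
    """
    # Integer division maps each score to the index of its range.
    counts = Counter(score // width for score in scores)
    top = max(counts.values())
    return [
        (index * width, index * width + width - 1)
        for index in sorted(counts)
        if counts[index] == top
    ]


mode_a = range_mode(player_a)
mode_b = range_mode(player_b)
print(mode_a)  # [(20, 24)]
print(mode_b)  # [(25, 29)]
```

The range width is a judgment call: a wider range groups more scores together, while a narrower one gets closer to the strict statistical mode.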

Standard Deviation -- "a measure of the dispersion of a frequency distribution that is the square root of the arithmetic mean of the squares of the deviation of each of the class frequencies from the arithmetic mean of the frequency distribution" (from the Merriam-Webster Dictionary).

The standard deviation can be computed using the formula below:

SD = square root of [(sum of X squared - ((sum of X) * (sum of X) / N)) / N]

Here X is an individual score and N is the number of innings. This formula is an algebraic shortcut for the square root of the average squared difference between each score and the batsman's mean score (μ). It represents how much the batsman's scores vary from innings to innings. If we substitute the values for Player A and Player B, we get:

Player A – standard deviation is 14.47

Player B – standard deviation is 5.31
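The formula above can be checked with a short Python sketch. The function name `std_dev` is my own; the comparison with `statistics.pstdev` from the Python standard library is there only as a sanity check, since the formula divides by N and is therefore the population standard deviation.

```python
import math
import statistics

# Scores copied from the T-20 table above.
player_a = [0, 11, 20, 35, 24, 45, 9]
player_b = [15, 24, 25, 12, 17, 25, 26]


def std_dev(scores):
    """SD = sqrt((sum of X squared - (sum of X)^2 / N) / N)."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x_sq = sum(x * x for x in scores)
    return math.sqrt((sum_x_sq - sum_x * sum_x / n) / n)


sd_a = std_dev(player_a)
sd_b = std_dev(player_b)
print(round(sd_a, 2), round(sd_b, 2))  # 14.47 5.31

# Sanity check against the standard library's population standard deviation.
assert math.isclose(sd_a, statistics.pstdev(player_a))
assert math.isclose(sd_b, statistics.pstdev(player_b))
```

So although both players have the same mean of 20.57, Player A's scores are spread far more widely around it than Player B's.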
