TL;DR — Quick Summary
Section 1: Measure of Dispersion
Dispersion (or variability) refers to the extent to which the data values are spread out or scattered.
Why is Dispersion Important?
- It tells us about the reliability of the mean.
- It helps us compare the consistency of two or more datasets.
- It helps in quality control and risk management.
B. Standard Deviation ($ \sigma $)
The Standard Deviation is the most important and widely used measure of dispersion. It is the square root of the mean of the squared deviations from the mean, so it indicates how far a typical data point lies from the mean.
1. Standard Deviation for Raw Data (Ungrouped)
If $x_1, x_2, x_3, ..., x_n$ are $n$ observations with mean $\bar{x}$, then:
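Formula (Direct Method):
$ \sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} $
Formula (Shortcut / Alternative Method):
$ \sigma = \sqrt{\frac{\sum x_i^2}{n} - \left( \frac{\sum x_i}{n} \right)^2} $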
Note: For a sample (instead of a population), we use $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$, but for Class 11, we usually consider the whole data as the population.
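The difference between the population and sample versions can be checked numerically. The sketch below is illustrative only: the observations are hypothetical, the two helper functions are names introduced here, and the shortcut helper assumes the standard identity $\sigma^2 = \frac{\sum x_i^2}{n} - \bar{x}^2$. Python's `statistics` module provides `pstdev` (divides by $n$) and `stdev` (divides by $n-1$) for comparison.

```python
import math
import statistics

def population_sd_direct(data):
    """Direct method: root of the mean squared deviation from the mean."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / n)

def population_sd_shortcut(data):
    """Shortcut method: sqrt(mean of squares - square of the mean)."""
    n = len(data)
    return math.sqrt(sum(x * x for x in data) / n - (sum(data) / n) ** 2)

# Hypothetical observations (not from the notes).
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_sd_direct(data))    # population SD, divides by n
print(population_sd_shortcut(data))  # same value via the shortcut identity
print(statistics.pstdev(data))       # standard library, population SD
print(statistics.stdev(data))        # standard library, sample SD (n - 1)
```

The first three values agree, while the sample value is slightly larger because it divides by $n-1$ rather than $n$.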
2. Standard Deviation for Grouped Data (Frequency Distribution)
If $x_1, x_2, ..., x_n$ are mid-values of classes with frequencies $f_1, f_2, ..., f_n$, $N = \sum f_i$, and $\bar{x} = \frac{\sum f_i x_i}{N}$:
Formula (Direct Method):
$ \sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{N}} $
Formula (Shortcut / Step-Deviation Method):
Let $d_i = x_i - A$ (where $A$ is an assumed mean) and $u_i = \frac{d_i}{h} = \frac{x_i - A}{h}$ (where $h$ is the common class width).
$ \sigma = h \times \sqrt{ \frac{\sum f_i u_i^2}{N} - \left( \frac{\sum f_i u_i}{N} \right)^2 } $
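To see that the direct and step-deviation formulas agree, here is a minimal sketch. The frequency table, the assumed mean `A = 15`, and the function names are hypothetical choices made for illustration; any mid-value could serve as the assumed mean.

```python
import math

def grouped_sd_direct(mid_values, freqs):
    """Direct method: sqrt( sum f_i (x_i - mean)^2 / N )."""
    total = sum(freqs)
    mean = sum(f * x for x, f in zip(mid_values, freqs)) / total
    squared = sum(f * (x - mean) ** 2 for x, f in zip(mid_values, freqs))
    return math.sqrt(squared / total)

def grouped_sd_step_deviation(mid_values, freqs, assumed_mean, width):
    """Step-deviation method: h * sqrt( sum f u^2 / N - (sum f u / N)^2 )."""
    total = sum(freqs)
    u = [(x - assumed_mean) / width for x in mid_values]
    mean_u = sum(f * ui for ui, f in zip(u, freqs)) / total
    mean_u_sq = sum(f * ui ** 2 for ui, f in zip(u, freqs)) / total
    return width * math.sqrt(mean_u_sq - mean_u ** 2)

# Hypothetical classes 0-10, 10-20, ..., 40-50 (class width h = 10).
mid_values = [5, 15, 25, 35, 45]
freqs = [2, 3, 5, 3, 2]

sd_direct = grouped_sd_direct(mid_values, freqs)
sd_step = grouped_sd_step_deviation(mid_values, freqs, assumed_mean=15, width=10)
print(round(sd_direct, 2), round(sd_step, 2))
```

Both methods give the same result whichever assumed mean is chosen; the step-deviation form only makes the hand arithmetic smaller.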
C. Variance ($ \sigma^2 $)
Variance is simply the square of the standard deviation. It is also a measure of dispersion, but its unit is the square of the data's unit.
Formula:
$ \sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} $
Relation: Standard Deviation is the positive square root of Variance.
$ \sigma = \sqrt{\text{Variance}} $
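The point about units can be illustrated with a small sketch. The heights below are hypothetical; the same five values are written once in metres and once in centimetres, and `statistics.pvariance` and `statistics.pstdev` are the standard-library population versions.

```python
import statistics

# Hypothetical heights: the same five people, in metres and in centimetres.
heights_m = [1.5, 1.6, 1.7, 1.8, 1.9]
heights_cm = [150, 160, 170, 180, 190]

var_m = statistics.pvariance(heights_m)    # units: square metres
var_cm = statistics.pvariance(heights_cm)  # units: square centimetres
sd_m = statistics.pstdev(heights_m)        # units: metres
sd_cm = statistics.pstdev(heights_cm)      # units: centimetres

print(sd_cm / sd_m)    # SD scales by the conversion factor (100)
print(var_cm / var_m)  # variance scales by its square (10,000)
```

Because the standard deviation stays in the data's own unit, it is usually the easier of the two to interpret.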
D. Coefficient of Variation (C.V.)
The Coefficient of Variation is a relative measure of dispersion. It is used to compare the variability of two or more datasets, even if they have different units or means.
Formula:
$ \text{C.V.} = \frac{\sigma}{\bar{x}} \times 100\% $
Interpretation:
- A higher C.V. means the data is more variable (less consistent).
- A lower C.V. means the data is less variable (more consistent).
Important Rule of Thumb: If we compare two groups, the group with the smaller C.V. is more consistent.
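As an illustration of this rule, the sketch below compares two hypothetical sets of scores that share the same mean. The data and the helper name `coefficient_of_variation` are assumptions made for the example, and the population standard deviation is used, as elsewhere in these notes.

```python
import statistics

def coefficient_of_variation(data):
    """C.V. = (population SD / mean) * 100, expressed as a percentage."""
    return statistics.pstdev(data) / statistics.mean(data) * 100

# Hypothetical scores of two players; both have a mean of 50.
player_a = [50, 52, 48, 51, 49]
player_b = [10, 90, 50, 30, 70]

cv_a = coefficient_of_variation(player_a)
cv_b = coefficient_of_variation(player_b)

print(f"C.V. of A: {cv_a:.2f}%")
print(f"C.V. of B: {cv_b:.2f}%")
print("More consistent:", "A" if cv_a < cv_b else "B")
```

Player A has the smaller C.V., so A is the more consistent of the two even though the means are identical.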
Worked Example (Section 1)
Question: The marks of 5 students in Mathematics are: 80, 70, 90, 60, 85. Find the Standard Deviation and Coefficient of Variation.
Solution: Step 1: Find the Mean ($\bar{x}$). $ \bar{x} = \frac{80 + 70 + 90 + 60 + 85}{5} = \frac{385}{5} = 77 $
Step 2: Find the deviations from mean and their squares.
| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|---|---|---|
| 80 | 3 | 9 |
| 70 | -7 | 49 |
| 90 | 13 | 169 |
| 60 | -17 | 289 |
| 85 | 8 | 64 |
| Sum | 0 | 580 |
Step 3: Calculate Standard Deviation.
$ \sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} = \sqrt{\frac{580}{5}} = \sqrt{116} \approx 10.77 $
Step 4: Calculate Coefficient of Variation.
$ \text{C.V.} = \frac{\sigma}{\bar{x}} \times 100 = \frac{10.77}{77} \times 100 \approx 13.99\% $
Answer: The Standard Deviation is 10.77 marks, and the C.V. is 13.99%.
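The arithmetic can be cross-checked with a short script. This is only a verification sketch using Python's standard `statistics` module; the variable names are chosen here for illustration.

```python
import statistics

marks = [80, 70, 90, 60, 85]  # the five marks from the question

mean = statistics.mean(marks)
squared_deviations = [(x - mean) ** 2 for x in marks]
sigma = statistics.pstdev(marks)  # population SD (divides by n)
cv = sigma / mean * 100

print("Mean:", mean)
print("Sum of squared deviations:", sum(squared_deviations))
print("Standard deviation:", round(sigma, 2))
print("C.V. (%):", round(cv, 2))
```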
Section 2: Skewness
Introduction to Skewness
Skewness measures the asymmetry of the probability distribution of a real-valued random variable about its mean.
In a perfectly symmetrical distribution with a single peak (like a Normal Distribution), the Mean, Median, and Mode are all equal.
Types of Skewness
Positive Skewness (Right-Skewed):
- The tail on the right side is longer.
- Relation (typical): Mean > Median > Mode
Negative Skewness (Left-Skewed):
- The tail on the left side is longer.
- Relation (typical): Mean < Median < Mode
Zero Skewness (Symmetrical):
- The distribution is perfectly balanced.
- Relation: Mean = Median = Mode
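The ordering of the three averages can be seen on small datasets. The values below are hypothetical and were chosen to make the pattern obvious; the left-skewed set is a mirror image of the right-skewed one.

```python
import statistics

def averages(data):
    """Return (mean, median, mode) of a dataset."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)

# Hypothetical right-skewed data: a few large values stretch the right tail.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

# Mirror image (20 - x): the long tail now lies on the left.
left_skewed = [20 - x for x in right_skewed]

# Hypothetical symmetrical data.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    mean, median, mode = averages(data)
    print(f"{name:>9}: mean={mean}, median={median}, mode={mode}")
```

For these three datasets the mean is pulled furthest towards the long tail, the mode stays at the peak, and the median lies between them.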
Pearson’s Coefficient of Skewness
Karl Pearson developed two coefficients to measure the degree of skewness.
1. Pearson’s First Coefficient of Skewness (Based on Mode)
This coefficient uses the difference between the mean and the mode.
Formula:
$ \text{Sk}_p = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}} $
Properties:
- If $\text{Sk}_p > 0$, the distribution is positively skewed.
- If $\text{Sk}_p < 0$, the distribution is negatively skewed.
- If $\text{Sk}_p = 0$, the distribution is symmetrical.
- The value of $\text{Sk}_p$ typically lies between -1 and +1 (though theoretically, it can exceed these limits).
2. Pearson’s Second Coefficient of Skewness (Based on Median)
Sometimes, the mode is not well-defined. In such cases, we use the median.
Formula:
$ \text{Sk}_p = \frac{3 (\text{Mean} - \text{Median})}{\text{Standard Deviation}} $
Properties:
- The sign rule is the same as for the first coefficient.
- Its value lies between -3 and +3.
- This formula is based on the empirical relationship: Mean - Mode ≈ 3(Mean - Median).
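Both coefficients can be computed directly from raw data. The following is a minimal sketch on a hypothetical dataset; the function names are introduced here, and the population standard deviation is assumed, as in Section 1.

```python
import statistics

def pearson_first(data):
    """(Mean - Mode) / SD, using the population standard deviation."""
    return (statistics.mean(data) - statistics.mode(data)) / statistics.pstdev(data)

def pearson_second(data):
    """3 * (Mean - Median) / SD, using the population standard deviation."""
    return 3 * (statistics.mean(data) - statistics.median(data)) / statistics.pstdev(data)

# Hypothetical right-skewed data: mean 5, median 3, mode 2.
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

sk1 = pearson_first(data)
sk2 = pearson_second(data)
print(round(sk1, 3), round(sk2, 3))
```

Both values are positive, so they agree on the direction of the skew. They differ in size because the relationship Mean - Mode ≈ 3(Mean - Median) is only approximate.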
Important Points to Remember
- Skewness tells us the direction of the tail. It doesn't tell us about the spread.
- Pearson's Coefficient gives us a pure number, allowing comparison between different distributions.
- A positive skew means most of the data is concentrated on the left (low values), with a long tail on the right (high values).
- A negative skew means most of the data is concentrated on the right (high values), with a long tail on the left (low values).
Worked Example (Section 2)
Question: For a dataset, the Mean is 50, the Median is 45, and the Standard Deviation is 10. Find Pearson's Coefficient of Skewness and interpret the result.
Solution: We will use the Second Coefficient of Skewness since we don't have the mode.
Step 1: Apply the formula.
$ \text{Sk}_p = \frac{3 (\text{Mean} - \text{Median})}{\text{Standard Deviation}} $
Step 2: Substitute the values.
$ \text{Sk}_p = \frac{3 (50 - 45)}{10} = \frac{3 \times 5}{10} = \frac{15}{10} = 1.5 $
Interpretation: Since $\text{Sk}_p = 1.5$ (which is > 0), the distribution is positively skewed. This means the tail of the distribution extends towards the right side (higher values), and the Mean is greater than the Median.
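The same calculation can be scripted from the summary values alone. This is a small sketch; the function names are hypothetical, and the interpretation helper simply applies the sign rule stated earlier.

```python
def pearson_second_skewness(mean, median, sd):
    """Pearson's second coefficient: 3 * (Mean - Median) / SD."""
    return 3 * (mean - median) / sd

def describe_skew(sk):
    """Apply the sign rule to a skewness coefficient."""
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetrical"

# Values from the question: Mean = 50, Median = 45, SD = 10.
sk = pearson_second_skewness(mean=50, median=45, sd=10)
print(sk, describe_skew(sk))
```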
Quick Summary Chart for Revision
| Concept | Formula | Key Insight |
|---|---|---|
| Range | $ \text{Max} - \text{Min} $ | Simplest but unstable. |
| Standard Deviation | $ \sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} $ | Root of the mean squared deviation from the mean. |
| Variance | $ \sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n} $ | Square of Standard Deviation. |
| Coeff. of Variation | $ \text{C.V.} = \frac{\sigma}{\bar{x}} \times 100 $ | Compares consistency of datasets. |
| Pearson's Skewness (Mode) | $ \text{Sk}_p = \frac{\text{Mean} - \text{Mode}}{\sigma} $ | Measures asymmetry. |
| Pearson’s Skewness (Median) | $ \text{Sk}_p = \frac{3(\text{Mean} - \text{Median})}{\sigma} $ | Used when mode is not defined. |
Practice Questions (Try Yourself!)
- For the data: 4, 8, 6, 10, 12, find the Variance and Standard Deviation.
- The mean of a dataset is 30, and the coefficient of variation is 20%. Find the standard deviation.
- If the Mean = 100, Median = 110, and Standard Deviation = 20, calculate Pearson's coefficient of skewness. Is the data left-skewed or right-skewed?