NDA Maths · Statistics

Foundations + Measures of Central Tendency

A single value that summarises where a dataset is centred — mean, median, or mode.

Why this matters

75 PYQs across 2017–2026, the biggest subtopic in NDA Statistics. Most questions test linear-transformation effects on the mean, grouped-data calculations, replacement / wrong-value corrections, special-case mean shortcuts, the combined mean of two groups, or the sum-of-deviations identity. Master the seventeen concepts below and you cover the EASY and MODERATE range reliably.

Concept 1 of 17

What is data, and why summarise it?

Intuition

A list of marks like 47, 52, 68, 71, 49, 55 is hard to compare against another class's list. A single representative number — the average, the middle value, the most common — compresses the list into something we can actually reason about.

Definition

Data is the set of observed values of a variable measured on a collection of items. The full collection is the POPULATION; a subset actually observed is a SAMPLE. Statistics builds summary measures from samples to draw conclusions about the population.

Worked example

Class A scored 60, 60, 60 on three tests. Class B scored 40, 60, 80. Both have mean 60. What does the mean MISS about Class B?
  1. Both datasets sum to 180, so the arithmetic mean is identical: 60.
  2. But Class A is constant; Class B varies between 40 and 80.
  3. The mean alone hides the SPREAD. We need a second summary (a dispersion measure) to capture it — that's why this chapter has two parts: tendency and dispersion.
Answer: The mean compresses the data but loses the spread information.
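
The comparison above can be checked numerically. The following is a minimal sketch, not part of the original notes; it uses the range (max minus min) as the simplest possible spread measure, which is an assumption made only for illustration.

```python
from statistics import mean

class_a = [60, 60, 60]
class_b = [40, 60, 80]

# Both classes share the same mean.
mean_a, mean_b = mean(class_a), mean(class_b)

# The range (max - min) is used here as a crude spread measure.
range_a = max(class_a) - min(class_a)
range_b = max(class_b) - min(class_b)

print(mean_a, mean_b)    # 60 60
print(range_a, range_b)  # 0 40
```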

Concept 2 of 17

Types of data — qualitative vs quantitative, discrete vs continuous

Intuition

Some data labels things (blood group, district, brand) — that's QUALITATIVE. Other data measures things (height, marks, count of children) — that's QUANTITATIVE. Quantitative further splits into DISCRETE (whole-number counts) and CONTINUOUS (any value in a range, like 167.4 cm).

Definition

  • Qualitative (categorical): takes labels, not numbers — operations like mean don't apply.
  • Quantitative — discrete: numerical, jumping in integer-sized steps (kids per family, integer marks).
  • Quantitative — continuous: numerical, filling an interval smoothly (height, weight, time).

Worked example

Classify each variable: (i) eye colour, (ii) number of siblings, (iii) running time for 100 m, (iv) shirt size {S, M, L, XL}.
  1. (i) Eye colour — labels (blue, brown, green) — QUALITATIVE.
  2. (ii) Number of siblings — whole-number counts — QUANTITATIVE, DISCRETE.
  3. (iii) Running time — any real number such as 12.47 s — QUANTITATIVE, CONTINUOUS.
  4. (iv) Shirt size — ordered labels (S < M < L < XL) — still QUALITATIVE (ordinal).
Answer: (i) qualitative, (ii) discrete, (iii) continuous, (iv) qualitative (ordinal).

Concept 3 of 17

Frequency and tabulation

Intuition

When values repeat, instead of writing the raw list, we tabulate each unique value with how many times it occurred. That count is the FREQUENCY, written $f$. The total number of observations is $N = \sum f$.

Definition

A frequency distribution lists each distinct value (or class interval) alongside its frequency. The TOTAL FREQUENCY equals the number of observations: $N = \sum f_i$. Every chapter formula that involves grouped or repeated data uses this $N$, not the count of distinct values.

Total frequency

$N = \sum_{i=1}^{k} f_i$
  • $k$: number of distinct values or class intervals
  • $f_i$: frequency of the $i$-th value/class
  • $N$: total number of observations

Worked example

Marks of 12 students: 4, 5, 4, 6, 5, 7, 5, 4, 7, 6, 5, 4. Build a frequency table and verify $N$.
  1. Distinct values in ascending order: 4, 5, 6, 7.
  2. Tally: $f(4)=4,\ f(5)=4,\ f(6)=2,\ f(7)=2$.
  3. Check $N = \sum f = 4 + 4 + 2 + 2 = 12$, which matches the original count.
Answer: $f(4)=4,\ f(5)=4,\ f(6)=2,\ f(7)=2;\ N = 12$.
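
The tally in the worked example can be reproduced in a few lines. This is a small sketch using Python's standard `collections.Counter`; the variable names are illustrative only.

```python
from collections import Counter

marks = [4, 5, 4, 6, 5, 7, 5, 4, 7, 6, 5, 4]

# Count how often each distinct value occurs, sorted by value.
freq = dict(sorted(Counter(marks).items()))

# The total frequency N must equal the number of raw observations.
N = sum(freq.values())

print(freq)  # {4: 4, 5: 4, 6: 2, 7: 2}
print(N)     # 12
```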

Practice — Level 1 (4 reps)

Quick reps to lock in the method. Try each, then check.

  1. In 5, 5, 7, 8, 8, 8, what is the frequency of 8?
  2. A frequency table has $f = 2, 3, 5, 4$. Find $N$.
  3. In 6, 6, 9, 6, 9, what is $f(6)$?
  4. Frequencies 10, 12, 8: total observations $N$?

Concept 4 of 17

Class marks and class width (grouped data)

Intuition

Continuous data — like heights in cm — gets grouped into INTERVALS such as 150–160, 160–170. We no longer know each exact value, so we treat every observation in an interval as if it sat at the MID-POINT. That mid-point is the CLASS MARK. The interval's width is the CLASS WIDTH.

Definition

For a class interval with lower bound $L$ and upper bound $U$: the CLASS MARK is $x = (L + U)/2$ and the CLASS WIDTH is $h = U - L$. The grouped-data mean and variance use the class mark as the representative value of the interval; the grouped median and mode formulas work from the lower bound $L$ and the width $h$ instead.

Class mark and class width

$x_{\text{mark}} = \dfrac{L + U}{2} \qquad h = U - L$

Worked example

For the class interval 30–40, find the class mark and class width.
  1. Lower bound $L = 30$, upper bound $U = 40$.
  2. Class mark: $(30 + 40)/2 = 35$.
  3. Class width: $40 - 30 = 10$.
Answer: Class mark $= 35$, class width $h = 10$.
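
As a quick check of the two formulas, here is a minimal sketch; the helper names `class_mark` and `class_width` are hypothetical and chosen only for this illustration.

```python
def class_mark(lower, upper):
    """Mid-point of the interval: (L + U) / 2."""
    return (lower + upper) / 2


def class_width(lower, upper):
    """Width of the interval: U - L."""
    return upper - lower


print(class_mark(30, 40), class_width(30, 40))  # 35.0 10
```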

Practice — Level 1 (4 reps)

Quick reps to lock in the method. Try each, then check.

  1. Class mark of the interval 20–30?
  2. Class width of 40–55?
  3. Class mark of 0–10?
  4. Class width of 100–120?

Concept 5 of 17

Summation notation Σ

Intuition

$\Sigma$ (capital sigma) is a compact way to write "add up many things". The expression $\sum_{i=1}^{n} x_i$ means: start with $i = 1$, plug into $x_i$, keep going until $i = n$, and add everything together.

Definition

For a sequence $x_1, x_2, \ldots, x_n$, $\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$. Two identities are load-bearing throughout this chapter: $\sum_{i=1}^{n} c = nc$ (a constant summed $n$ times) and $\sum (a x_i + b) = a \sum x_i + nb$ (linearity).

Definition + two identities

$\sum_{i=1}^{n} x_i = x_1 + \cdots + x_n,\quad \sum_{i=1}^{n} c = nc,\quad \sum (a x_i + b) = a\sum x_i + nb$

Worked example

If $\sum_{i=1}^{10} x_i = 50$, compute $\sum_{i=1}^{10} (2 x_i + 3)$.
  1. Apply linearity: $\sum (2x_i + 3) = 2 \sum x_i + \sum 3$.
  2. Substitute: $2 \cdot 50 + 10 \cdot 3 = 100 + 30 = 130$.
Answer: $\sum (2 x_i + 3) = 130$.
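
The linearity identity can be sanity-checked on concrete numbers. The worked example gives only the sum, so the list `xs` below is a hypothetical set of ten values chosen to add up to 50.

```python
# Hypothetical data: any ten values with sum 50 would do.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 5]
n = len(xs)

# Term-by-term sum of (2*x + 3).
direct = sum(2 * x + 3 for x in xs)

# Linearity shortcut: a * sum(x) + n * b.
via_linearity = 2 * sum(xs) + 3 * n

print(direct, via_linearity)  # 130 130
```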

Practice — Level 1 (4 reps)

Quick reps to lock in the method. Try each, then check.

  1. $\sum_{i=1}^{4} 3 = ?$
  2. If $\sum x_i = 20$, find $\sum 2x_i$.
  3. If $\sum_{i=1}^{5} x_i = 15$, find $\sum (x_i + 2)$.
  4. $\sum_{i=1}^{3} i = ?$

Concept 6 of 17

Weighted vs unweighted counting

Intuition

If the value 7 occurs 4 times in the data, its CONTRIBUTION to the total is $7+7+7+7 = 28$, not just 7. That's the difference between unweighted ($\sum x_i$) and weighted ($\sum f_i x_i$) summation. This idea underlies every grouped-data formula in this chapter.

Definition

For raw data, the total is $\sum x_i$ and the count is $n$. For frequency-tabulated data with distinct values $x_i$ of frequency $f_i$, the total is $\sum f_i x_i$ and the count is $N = \sum f_i$. Every measure has a "raw" form (unweighted) and a "grouped" form (weighted): they're the same idea with frequencies multiplied in.

Worked example

Values 2, 4, 6 occur with frequencies 3, 2, 5 respectively. Find the weighted total $\sum f_i x_i$ and the count $N$.
  1. Weighted contributions: $2 \cdot 3 = 6,\ 4 \cdot 2 = 8,\ 6 \cdot 5 = 30$.
  2. Weighted total: $\sum f_i x_i = 6 + 8 + 30 = 44$.
  3. Total count: $N = \sum f_i = 3 + 2 + 5 = 10$.
Answer: $\sum f_i x_i = 44,\ N = 10$.

Practice — Level 1 (4 reps)

Quick reps to lock in the method. Try each, then check.

  1. Value 4 with frequency 3: its contribution to $\sum f_i x_i$?
  2. $x = 2, 5$ with $f = 3, 2$: $\sum f_i x_i$?
  3. Value 10 with frequency 5: contribution?
  4. $x = 1, 2, 3$ with $f = 4, 1, 2$: $\sum f_i x_i$?

Concept 7 of 17

Arithmetic Mean (raw data)

Intuition

The average. Add up every value, then split the total equally among all the observations. It is the single most-used measure when the data is fairly symmetric and free of extreme outliers.

Definition

For $n$ observations $x_1, x_2, \ldots, x_n$, the arithmetic mean is the total sum divided by the number of observations.

Arithmetic Mean

$\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$
  • $\bar{x}$: the arithmetic mean
  • $x_i$: the $i$-th observation
  • $n$: the total number of observations

Diagram · mean = the balance point

[Figure: number line from 0 to 12 with the values 2, 4, 5, 9 marked; mean = 5]

Treat each value as equal weight on a beam; the mean is the point where it balances. The pulls on the left (deviations −3, −1) exactly cancel those on the right (0, +4), which is the identity Σ(xᵢ − x̄) = 0. One extreme value drags the balance point toward it — why the mean is sensitive to outliers.

Worked example

Find the arithmetic mean of 4, 6, 8, 10, 12.
  1. Add up all the values: $4 + 6 + 8 + 10 + 12 = 40$.
  2. Count the observations: $n = 5$.
  3. Apply the formula: $\bar{x} = \dfrac{40}{5} = 8$.
Answer: $\bar{x} = 8$
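
The worked example, together with the mean's sensitivity to extreme values, can be sketched with Python's standard `statistics` module. The list `with_outlier` is a hypothetical variation (the last value replaced by 112) added purely for illustration.

```python
from statistics import mean, median

data = [4, 6, 8, 10, 12]
with_outlier = [4, 6, 8, 10, 112]  # hypothetical: 12 replaced by an extreme value

# For the original data the mean and the median agree.
print(mean(data), median(data))                  # 8 8

# One extreme value drags the mean but leaves the median alone.
print(mean(with_outlier), median(with_outlier))  # 28 8
```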

Try it yourself

Find the arithmetic mean of 3, 6, 9, 12, 15, 18.

Practice — Level 1 (4 reps)

Quick reps to lock in the method. Try each, then check.

  1. Mean of 3, 5, 7?
  2. Mean of 10, 20, 30, 40?
  3. Mean of 2, 4, 4, 6, 9?
  4. Mean of 7, 7, 7, 7?

From the bank · past-year question

Example 7 · Statistics · MODERATE
The observations 4, 1, 4, 3, 6, 2, 1, 3, 4, 5, 1, 6 are outputs of 12 dices thrown simultaneously. If $m$ and $M$ are means of lowest 8 observations and highest 4 observations respectively, then what is $(2m+M)$ equal to?

[Q108 · Apr · 2023]

Outliers move the mean — sometimes a lot

A single very large or very small value shifts the mean noticeably. If you suspect skew, ask whether the mean or the median is the better choice.

Concept 8 of 17

Arithmetic Mean (frequency / grouped data)

Intuition

When values are repeated or grouped into classes, each value has a weight equal to its frequency. The denominator is the total frequency, not the number of distinct classes.

Definition

If the value $x_i$ occurs with frequency $f_i$, the mean is the frequency-weighted sum divided by the total frequency.

Frequency-weighted Mean

$\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i}$
  • $x_i$: value (or class mark for grouped data)
  • $f_i$: frequency of $x_i$
  • $\sum f_i$: total frequency = total observations

Worked example

Find the mean of the frequency distribution: $x = 2, 4, 6, 8$ with frequencies $f = 3, 5, 7, 5$.
  1. Compute $\sum f_i = 3 + 5 + 7 + 5 = 20$.
  2. Compute $\sum f_i x_i = 2{\cdot}3 + 4{\cdot}5 + 6{\cdot}7 + 8{\cdot}5 = 6 + 20 + 42 + 40 = 108$.
  3. Apply the formula: $\bar{x} = \dfrac{108}{20} = 5.4$.
Answer: $\bar{x} = 5.4$
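
The same calculation as a short sketch. The helper name `weighted_mean` is hypothetical; the second computation deliberately divides by the number of classes to show how far off that common slip lands.

```python
def weighted_mean(values, freqs):
    """Frequency-weighted mean: sum(f * x) / sum(f)."""
    total = sum(f * x for x, f in zip(values, freqs))
    return total / sum(freqs)


values = [2, 4, 6, 8]
freqs = [3, 5, 7, 5]

print(weighted_mean(values, freqs))  # 5.4

# Common slip: dividing by the number of classes instead of the total frequency.
wrong = sum(f * x for x, f in zip(values, freqs)) / len(values)
print(wrong)  # 27.0
```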

Try it yourself

Find the mean for $x = 10, 20, 30, 40$ with frequencies $f = 1, 2, 3, 4$.

Practice — Level 1 (4 reps)

Quick reps to lock in the method. Try each, then check.

  1. $x = 2, 4$ with $f = 1, 3$: mean?
  2. $x = 1, 2, 3$ with $f = 2, 2, 2$: mean?
  3. $x = 5, 10$ with $f = 3, 1$: mean?
  4. $x = 0, 10$ with $f = 1, 1$: mean?

From the bank · past-year question

Example 8 · Statistics · EASY
The frequency distribution of the marks obtained by students in a Science examination is given below: Marks: 5–15, 15–25, 25–35, 35–45; Number of students: 20, 30, 30, 20. What is the arithmetic mean?

[Q117 · Sep · 2025]

Divide by $\sum f_i$, not by the number of classes

If the frequencies are 20, 30, 30, 20 students across four classes, the divisor is 100, not 4. This is the single most common arithmetic error on grouped-mean PYQs.

Concept 9 of 17

Linear Transformation of the Mean

Intuition

If you scale every value by $a$ and shift by $b$, the mean scales and shifts in exactly the same way. The mean is a linear operator: constants pass straight through.

Definition

If a new variable $y_i = a\,x_i + b$ is formed from each observation, the new mean is $\bar{y} = a\,\bar{x} + b$.

Linear transformation rule

$y_i = a\,x_i + b \quad\Longrightarrow\quad \bar{y} = a\,\bar{x} + b$
  • $a$: scale factor (multiplied)
  • $b$: shift (added)

Worked example

The mean of 20 observations is 12. If each observation is multiplied by 3 and then 5 is added, find the new mean.
  1. Identify the transformation: $y_i = 3x_i + 5$, so $a = 3,\ b = 5$.
  2. Apply the rule: $\bar{y} = a\,\bar{x} + b = 3{\cdot}12 + 5$.
  3. Compute: $\bar{y} = 36 + 5 = 41$.
Answer: $\bar{y} = 41$
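
To see the rule act on actual data, the sketch below builds a hypothetical list of 20 observations whose mean is 12 (the worked example gives only the mean, not the values) and compares the transformed mean with the shortcut.

```python
from statistics import mean

xs = [10, 14] * 10            # hypothetical 20 observations with mean 12
a, b = 3, 5                   # multiply by 3, then add 5

ys = [a * x + b for x in xs]  # transform every observation

print(mean(xs))               # 12
print(mean(ys))               # 41
print(a * mean(xs) + b)       # 41, same value without touching the data
```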

Try it yourself

If $\bar{x} = 10$ and $y_i = 2 x_i - 5$ for every $i$, find $\bar{y}$.

Practice — Level 1 (4 reps)

Quick reps to lock in the method. Try each, then check.

  1. Mean is 10. New mean if each value is multiplied by 2?
  2. Mean is 8. New mean if 5 is added to each?
  3. Mean is 6. New mean for $y = 3x - 1$?
  4. Mean is 12. New mean for $y = x/2$?

From the bank · past-year question

Example 9 · Statistics · EASY
The arithmetic mean of 100 observations is 50. If 5 is subtracted from each observation and then divided by 20, then what is the new arithmetic mean?

[Q113 · Apr · 2025]

Shift moves the mean, but not the SD

Adding a constant $b$ shifts $\bar{x}$ by $b$ but leaves the standard deviation unchanged. Multiplying by $a$ scales the mean by $a$ and the standard deviation by $|a|$. Don't apply the mean rule to dispersion questions.

Concept 10 of 17

Replacement and Wrong-Value Correction of the Mean

Intuition

When one observation is swapped, either deliberately or because a wrong value was later corrected, the mean shifts by exactly the change in that value divided by $n$. No need to recompute from scratch. The same idea handles "k observations are discarded": work with the totals $nM$ before and after; the difference is what changed.

Definition

If the mean of $n$ observations is $M$ and a single value $x$ is replaced by $y$, the new mean is $M_{\text{new}} = M + \dfrac{y - x}{n}$. For a wrong-value correction, $x$ is what was recorded and $y$ is the correct value. For discards or additions, $n$ itself changes, so reason about the new total $(n \pm k)\,M_{\text{new}}$ directly.

Replacement rule (single observation, n unchanged)

$M_{\text{new}} = M + \dfrac{y - x}{n}$
  • $M$: original mean
  • $n$: number of observations (unchanged in pure replacement)
  • $x$: the value being removed (or wrongly recorded)
  • $y$: the value taking its place (or the correct one)

Worked example

The mean of 20 observations is 15. One observation was recorded as 8 but the correct value is 28. Find the corrected mean.
  1. Identify the swap: wrong value $x = 8$, correct value $y = 28$, $n = 20$.
  2. Apply the rule: $M_{\text{new}} = 15 + \dfrac{28 - 8}{20}$.
  3. Compute the correction: $\dfrac{20}{20} = 1$.
  4. Therefore $M_{\text{new}} = 15 + 1 = 16$.
Answer: $M_{\text{new}} = 16$
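
Both routes to the corrected mean, the shortcut and the totals method, can be compared in a short sketch. The function names are hypothetical, introduced only for this illustration.

```python
def corrected_mean(old_mean, n, wrong, correct):
    """Shortcut: shift the mean by (correct - wrong) / n."""
    return old_mean + (correct - wrong) / n


def corrected_mean_via_totals(old_mean, n, wrong, correct):
    """Totals method: rebuild the sum, fix it, divide by n again."""
    return (n * old_mean - wrong + correct) / n


print(corrected_mean(15, 20, 8, 28))             # 16.0
print(corrected_mean_via_totals(15, 20, 8, 28))  # 16.0
```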

Try it yourself

The mean of 25 observations is 30. A value recorded as 18 was actually 43. Find the corrected mean.

Practice — Level 1 (4 reps)

Quick reps to lock in the method. Try each, then check.

  1. Mean of 10 obs is 5. A value 3 is corrected to 13. New mean?
  2. Mean of 5 obs is 20. A value 10 is corrected to 15. New mean?
  3. Mean of 20 obs is 8. A value 30 is corrected to 10. New mean?
  4. Mean of 4 obs is 9. A value 5 is corrected to 9. New mean?

From the bank · past-year question

Example 10 · Statistics · EASY
The average of a set of 15 observations is recorded, but later it is found that for one observation, the digit in the tens place was wrongly recorded as 8 instead of 3. After correcting the observation, the average is

[Q116 · Apr · 2021]

Divide by $n$, not by 1

Students often add the full change $y - x$ directly to $M$. The mistake: only ONE of the $n$ terms changed, so the shift in the average is the change in that one term divided by $n$, not the full change.

Discards: work with totals $nM$, not the rule directly

When $k$ observations are discarded, $n$ itself changes. Don't try to force the single-replacement formula. Instead: original total $= nM$, new total $= (n-k)M_{\text{new}}$; the difference is the sum of the discarded values.

Concept 11 of 17

Special-Case Means — Consecutive Integers, Squares, AP, Binomial

Intuition

NDA loves to ask the mean of a structured sequence — natural numbers in an interval, perfect squares, an AP, values weighted by binomial coefficients. Rather than summing by hand, recognise the structure and use a closed-form shortcut. Saves 60–90 seconds per question.

Definition

Three shortcuts are load-bearing: (a) Mean of consecutive integers from $a$ to $b$ is $(a+b)/2$. (b) Mean of squares $1^2, 2^2, \ldots, n^2$ is $\dfrac{(n+1)(2n+1)}{6}$. (c) For an AP, the mean equals the average of the first and last terms (which is also the middle term when the number of terms is odd). For binomial-weighted means, the denominator is $\sum \binom{n}{k} = 2^n$, not the number of terms.

Closed-form means for common sequences

$\bar{x}_{a..b} = \dfrac{a+b}{2} \qquad \overline{k^2}\big|_{1}^{n} = \dfrac{(n+1)(2n+1)}{6} \qquad \bar{x}_{\text{AP}} = \dfrac{a_1 + a_n}{2}$
  • $a, b$: first and last integer of an arithmetic run
  • $n$: number of terms (for the squares formula, the upper index)
  • $a_1, a_n$: first and last term of an AP

Worked example

Find the arithmetic mean of $1^2, 2^2, 3^2, \ldots, 13^2$.
  1. These are the first $n = 13$ perfect squares, so use the closed form for the mean of squares.
  2. Mean of $1^2$ to $n^2$ is $\dfrac{(n+1)(2n+1)}{6}$.
  3. Substitute $n = 13$: $\dfrac{(14)(27)}{6} = \dfrac{378}{6} = 63$.
Answer: Mean $= 63$
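
A brute-force check of the closed forms is a useful way to build trust in them. The sketch below uses exact fractions so that no rounding enters; the function names are hypothetical.

```python
from fractions import Fraction


def mean_of_squares_closed(n):
    """Closed form for the mean of 1^2, 2^2, ..., n^2."""
    return Fraction((n + 1) * (2 * n + 1), 6)


def mean_of_squares_brute(n):
    """Direct computation: sum the squares and divide by n."""
    return Fraction(sum(k * k for k in range(1, n + 1)), n)


def mean_consecutive(a, b):
    """Direct mean of the integers a, a+1, ..., b."""
    return Fraction(sum(range(a, b + 1)), b - a + 1)


print(mean_of_squares_closed(13))  # 63
print(mean_of_squares_brute(13))   # 63
print(mean_consecutive(3, 8))      # 11/2, which equals (3 + 8) / 2
```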

Try it yourself

Find the arithmetic mean of the first 10 natural numbers.

Practice — Level 1 (4 reps)

Quick reps to lock in the method. Try each, then check.

  1. Mean of $1, 2, 3, \ldots, 9$?
  2. Mean of the first five even numbers 2, 4, 6, 8, 10?
  3. Mean of $10, 11, \ldots, 20$?
  4. Mean of an AP with first term 4 and last term 16?

From the bank · past-year question

Example 11 · Statistics · MODERATE
What is the arithmetic mean of $8^2, 9^2, 10^2, \ldots, 15^2$?

[Q120 · Apr · 2025]

AP shortcut fails for GPs and other non-uniform spacings

The mean $(a_1 + a_n)/2$ works only because in an AP the terms are spaced symmetrically about the centre. For $1, 2, 4, 8, \ldots$ (GP) the shortcut gives the wrong answer, so you must sum properly or use the GP sum formula.

Binomial-weighted means use $\sum \binom{n}{k} = 2^n$

When asked the mean of $1, 2, \ldots, n+1$ with frequencies $\binom{n}{0}, \binom{n}{1}, \ldots, \binom{n}{n}$, the denominator is $2^n$ (sum of one row of Pascal's triangle), not the number of distinct values. For the numerator, the value $k+1$ carries frequency $\binom{n}{k}$, so use $\sum k \binom{n}{k} = n \cdot 2^{n-1}$ to get $\sum (k+1)\binom{n}{k} = n \cdot 2^{n-1} + 2^n$; the mean is therefore $\dfrac{n}{2} + 1$.
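
The binomial-weighted case can be verified directly for small $n$. This is a sketch using `math.comb` from the standard library; the function name is hypothetical, and the comparison value $n/2 + 1$ comes from dividing $n \cdot 2^{n-1} + 2^n$ by $2^n$.

```python
from fractions import Fraction
from math import comb


def binomial_weighted_mean(n):
    """Mean of the values 1..n+1 with frequencies C(n,0), ..., C(n,n)."""
    values = range(1, n + 2)
    weights = [comb(n, k) for k in range(n + 1)]
    numerator = sum(v * w for v, w in zip(values, weights))
    return Fraction(numerator, sum(weights))  # sum(weights) equals 2**n


print(binomial_weighted_mean(4))  # 3, i.e. 4/2 + 1
```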

Concept 12 of 17

Combined Mean of Two Groups

Intuition

When two datasets with KNOWN sizes and means are pooled, the combined mean is the frequency-weighted average — the sum of the two totals divided by the sum of the two sizes. Plain averaging of the two means works ONLY when both groups have the same size. PYQs love the reverse direction: give you the combined mean and both group means, ask for the size split.

Definition

For group 1 of size $n_1$ with mean $M_1$ and group 2 of size $n_2$ with mean $M_2$, the combined mean of the pooled dataset is the frequency-weighted average. This generalises to $k$ groups as a weighted average of the group means, with each weight equal to the group's size.

Combined mean of two groups

$M_{12} = \dfrac{n_1 M_1 + n_2 M_2}{n_1 + n_2}$
  • $n_1, n_2$: sizes of the two groups
  • $M_1, M_2$: means of the two groups
  • $M_{12}$: combined mean of the pooled dataset

Worked example

The mean age of 30 men is 40 years and the mean age of 20 women is 35 years. Find the mean age of the combined group.
  1. Identify the groups: $n_1 = 30,\ M_1 = 40,\ n_2 = 20,\ M_2 = 35$.
  2. Compute group totals: $n_1 M_1 = 30 \times 40 = 1200$; $n_2 M_2 = 20 \times 35 = 700$.
  3. Apply the formula: $M_{12} = \dfrac{1200 + 700}{30 + 20} = \dfrac{1900}{50}$.
  4. Compute: $M_{12} = 38$ years.
Answer: $M_{12} = 38$ years
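
A minimal sketch of the combined-mean formula, with the plain average of the two means printed alongside to show that it differs when the group sizes are unequal. The function name `combined_mean` is hypothetical.

```python
def combined_mean(n1, m1, n2, m2):
    """Pooled mean: total of both groups divided by the pooled size."""
    return (n1 * m1 + n2 * m2) / (n1 + n2)


print(combined_mean(30, 40, 20, 35))  # 38.0
print((40 + 35) / 2)                  # 37.5, the plain average, wrong here
```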

Try it yourself

Section A (25 students) has mean marks 72; Section B (35 students) has mean 66. Find the combined mean.

Practice — Level 1 (4 reps)

Quick reps to lock in the method. Try each, then check.

  1. 20 boys with mean 60, 30 girls with mean 50. Combined mean?
  2. Two equal-size groups, means 40 and 60. Combined mean?
  3. 10 obs with mean 5, 40 obs with mean 10. Combined?
  4. Group of 3 with mean 8, group of 1 with mean 4. Combined?

From the bank · past-year question

Example 12 · Statistics · EASY
A data set of $n$ observations has mean $2M$, while another data set of $2n$ observations has mean $M$. What is the mean of the combined data sets?

[Q105 · Apr · 2020]

Plain average of the two means is wrong unless $n_1 = n_2$

Students average $M_1$ and $M_2$ directly. That gives the correct combined mean ONLY when both groups are the same size. For unequal sizes the larger group pulls the combined mean toward its own mean, which is exactly what the weighted formula encodes.

Reverse-solve: combined + group means give the size ratio

If $M_{12},\ M_1,\ M_2$ are given and you need $n_1 : n_2$, rearrange the formula to $\dfrac{n_1}{n_2} = \dfrac{M_2 - M_{12}}{M_{12} - M_1}$. PYQs use this shape with concrete totals (150 students, combined 60 kg, boys 70, girls 55) to test whether you recognise it as one equation in one unknown.
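
The reverse direction can be sketched with the numbers quoted above, read here as 150 students with combined mean 60 kg, boys' mean 70 kg and girls' mean 55 kg (that reading of the parenthetical is an assumption). The function name `size_ratio` is hypothetical.

```python
from fractions import Fraction


def size_ratio(m12, m1, m2):
    """n1 / n2 from the combined mean and the two group means."""
    return Fraction(m2 - m12, m12 - m1)


ratio = size_ratio(60, 70, 55)      # boys : girls
total = 150
boys = total * ratio / (1 + ratio)  # split the total in the ratio n1 : n2
girls = total - boys

print(ratio, boys, girls)  # 1/2 50 100
```

Plugging the result back in, $(50 \cdot 70 + 100 \cdot 55)/150 = 60$, which matches the given combined mean.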

Concept 13 of 17

Median — Middle Value

Intuition

Sort the data and pick the middle. Half the values lie below the median, half lie above. Because it only cares about position, the median ignores extreme values — preferred for skewed data like income or marks.

Definition

For raw data with $n$ sorted observations, the median is the middle value if $n$ is odd, and the average of the two middle values if $n$ is even. For grouped data, use the class-interval formula below.

Median (raw and grouped)

$\text{Raw: } M = \begin{cases} x_{(n+1)/2} & n \text{ odd} \\[4pt] \dfrac{x_{n/2} + x_{n/2+1}}{2} & n \text{ even} \end{cases} \qquad \text{Grouped: } M = L + \dfrac{\tfrac{n}{2} - F}{f}\,h$
  • $n$: total number of observations (the total frequency for grouped data)
  • $L$: lower bound of the median class
  • $F$: cumulative frequency before the median class
  • $f$: frequency of the median class
  • $h$: class width
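
The grouped formula is easiest to absorb on a small table. The sketch below assumes equal-width classes and takes the median class to be the first class whose cumulative frequency reaches $n/2$; the table (classes 0–10, 10–20, 20–30, 30–40 with frequencies 5, 8, 12, 5) is hypothetical.

```python
from fractions import Fraction


def grouped_median(lower_bounds, freqs, width):
    """Median = L + ((n/2 - F) / f) * h for equal-width classes."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty frequency table")
    half = Fraction(n, 2)
    cumulative = 0  # F: cumulative frequency before the current class
    for lower, f in zip(lower_bounds, freqs):
        if cumulative + f >= half:  # first class that reaches n/2
            return lower + (half - cumulative) / f * width
        cumulative += f


# Hypothetical table: n = 30, median class is 20-30 with F = 13, f = 12.
result = grouped_median([0, 10, 20, 30], [5, 8, 12, 5], 10)
print(result)         # 65/3
print(float(result))  # about 21.67
```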

Diagram · median = the middle of sorted data

[Figure: two sorted rows. Odd n = 7: 3, 5, 8, 11, 14, 18, 22, with the single middle value 11 as the median. Even n = 6: 4, 7, 9, 13, 16, 20, with median (9 + 13)/2 = 11.]

Sort first, then locate the middle position. With an odd count there is one middle value; with an even count the median is the average of the two middle values. It ignores how far the extremes lie — which is why it resists outliers better than the mean.

Worked example

Find the median of 7, 3, 9, 5, 11, 4, 8.
  1. Sort ascending: 3, 4, 5, 7, 8, 9, 11.
  2. Count the observations: $n = 7$ (odd).
  3. Median is the $\tfrac{n+1}{2} = 4$-th value, which is 7.
Answer: $M = 7$
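
The sort-then-pick-the-middle procedure, written out as a small sketch covering both the odd and the even case. The function name `raw_median` is hypothetical; Python's `statistics.median` does the same job.

```python
def raw_median(data):
    """Median of raw data: sort first, then take the middle."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # mean of the two middles


print(raw_median([7, 3, 9, 5, 11, 4, 8]))   # 7
print(raw_median([4, 7, 9, 13, 16, 20]))    # 11.0
```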

Try it yourself

Find the median of 2, 9, 4, 11, 6, 15, 7.

Practice — Level 1 (4 reps)

Quick reps to lock in the method. Try each, then check.

  1. Median of 3, 1, 2?
  2. Median of 4, 8, 6, 2?
  3. Median of 7, 3, 9, 5, 11?
  4. Median of 10, 20, 30, 40?

From the bank · past-year question

Example 13 · Statistics · MODERATE
The following table gives the frequency distribution of number of peas per pea pod of 198 pods: Number of peas: 1,2,3,4,5,6,7; Frequency: 4,33,76,50,26,8,1. What is the median of this distribution?

[Q105 · Apr · 2021]

Always sort before reading off the middle

The median of an unsorted list is not the middle of the original order. PYQs sometimes hand you data in random order to catch this.

Concept 14 of 17

Mode — Most Frequent Value

Intuition

The value that occurs most often. The mode is the only measure of central tendency that makes sense for purely categorical data (colours, blood groups) and the natural answer when the question is "which is the most common?".

Definition

For raw data, the mode is the value with the highest frequency. If multiple values tie for highest, the dataset is multimodal. For grouped data, use the class-interval formula below.

Mode (grouped data)

$M_0 = L + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2}\,h$
  • $L$: lower bound of the modal class
  • $f_1$: frequency of the modal class
  • $f_0$: frequency of the class before
  • $f_2$: frequency of the class after
  • $h$: class width
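
As a rough illustration of the grouped-mode formula, the sketch below plugs in a hypothetical modal class 20–30 with $f_1 = 12$, preceded by a class of frequency $f_0 = 8$ and followed by one of frequency $f_2 = 5$. The function name and the numbers are assumptions made for this example.

```python
from fractions import Fraction


def grouped_mode(lower, f0, f1, f2, width):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * h."""
    return lower + Fraction(f1 - f0, 2 * f1 - f0 - f2) * width


result = grouped_mode(20, 8, 12, 5, 10)
print(result)         # 260/11
print(float(result))  # about 23.64
```

The result sits inside the modal class and is pulled toward the neighbour with the larger frequency (here the class before), which is the behaviour the formula is designed to give.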

Diagram · mode = the tallest bar

[Figure: bar chart of frequencies by category: A = 3, B = 5, C = 2, D = 7, E = 4; D is the tallest bar.]

The mode is the value with the highest frequency — category D here. Data can have two modes (bimodal) or none (all equal); the mode is the only average that also works for non-numeric categories.

Worked example

Find the mode of 2, 5, 3, 7, 5, 8, 5, 9.
  1. Tally each value's frequency: 5 appears 3 times, every other value appears once.
  2. The highest frequency is 3, achieved only by the value 5.
  3. Therefore the mode is 5.
Answer: $M_0 = 5$
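
A small sketch of the tallying procedure that also reports ties. The function name `modes` is hypothetical, and returning an empty list when every value occurs once follows this chapter's convention that such data has no mode (other references treat that case differently).

```python
from collections import Counter


def modes(data):
    """Return all values tied for the highest frequency, sorted."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []  # every value occurs once: no mode (chapter's convention)
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 5, 3, 7, 5, 8, 5, 9]))  # [5]
print(modes([5, 5, 6, 6, 9]))           # [5, 6]
print(modes([1, 2, 3]))                 # []
```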

Try it yourself

Find the mode of 3, 5, 7, 5, 9, 3, 5, 11, 3, 3.

Practice — Level 1 (4 reps)

Quick reps to lock in the method. Try each, then check.

  1. Mode of 2, 3, 3, 5?
  2. Mode of 7, 7, 8, 9, 7?
  3. Mode of 1, 2, 2, 3, 3, 3?
  4. Mode of 5, 5, 6, 6, 9?

From the bank · past-year question

Example 14 · Statistics · EASY
If the mode of the scores 10, 12, 13, 15, 15, 13, 12, 10, $x$ is 15, then what is the value of $x$?

[Q119 · Apr · 2021]

Mode can be undefined or multimodal — don't force one answer

If every value occurs exactly once, there is no mode. If two values tie for highest frequency, the data is bimodal and the answer is both values. PYQs use this to test understanding.

Concept 15 of 17

Geometric Mean (GM)

Intuition

Geometric mean is for things that multiply, not add: growth rates, ratios, compound interest. You multiply all the values and take the $n$-th root. This is equivalent to averaging on a log scale and then converting back.

Definition

For $n$ positive observations $x_1, x_2, \ldots, x_n$, the geometric mean is the $n$-th root of their product.

Geometric Mean

$\text{GM} = \sqrt[n]{x_1 \, x_2 \, \cdots \, x_n} = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$
  • $n$: number of observations (all positive)

Worked example

Find the geometric mean of 4 and 9.
  1. Multiply the values: $4 \times 9 = 36$.
  2. Take the $n$-th root with $n = 2$: $\sqrt{36}$.
  3. Compute: $\sqrt{36} = 6$.
Answer: $\text{GM} = 6$
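
Two equivalent ways to compute the geometric mean are sketched below: the product-and-root definition, and the log-scale average. The function names are hypothetical; floating-point results are compared with `math.isclose` rather than exact equality.

```python
import math


def geometric_mean(xs):
    """n-th root of the product; defined for positive values only."""
    if any(x <= 0 for x in xs):
        raise ValueError("geometric mean needs positive values")
    return math.prod(xs) ** (1 / len(xs))


def geometric_mean_log(xs):
    """Same quantity via the average of logarithms."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


print(geometric_mean([4, 9]))  # 6.0
print(math.isclose(geometric_mean([4, 9]), geometric_mean_log([4, 9])))  # True
```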

Try it yourself

Find the geometric mean of 4, 6, 9.

Practice — Level 1 (4 reps)

Quick reps to lock in the method. Try each, then check.

  1. GM of 2 and 8?
  2. GM of 3 and 12?
  3. GM of 1, 3, 9?
  4. GM of 5 and 5?

From the bank · past-year question

Example 15 · Statistics · MODERATE
The geometric mean of a set of observations is computed as 10. The geometric mean obtained when each observation $x_i$ is replaced by $3x_i^4$ is

[Q114 · Apr · 2021]

GM is only defined for positive numbers

Zero or negative observations break the geometric mean — the product vanishes or the root becomes imaginary. If a PYQ throws a zero or negative into the set, GM is not the right measure.

Concept 16 of 17

Harmonic Mean (HM)

Intuition

Harmonic mean is the right average when the quantity you care about is a rate — like speed when distances are equal, or unit price when money spent each year is the same. It's the reciprocal of the average reciprocal.

Definition

For $n$ positive observations $x_1, x_2, \ldots, x_n$, the harmonic mean is $n$ divided by the sum of the reciprocals.

Harmonic Mean

$\text{HM} = \dfrac{n}{\displaystyle\sum_{i=1}^{n} \dfrac{1}{x_i}} = \dfrac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_n}}$
  • $n$: number of observations (all positive)

Worked example

Find the harmonic mean of 4 and 6.
  1. Sum of reciprocals: $\dfrac{1}{4} + \dfrac{1}{6} = \dfrac{3}{12} + \dfrac{2}{12} = \dfrac{5}{12}$.
  2. Number of observations: $n = 2$.
  3. Apply the formula: $\text{HM} = \dfrac{2}{5/12} = 2 \times \dfrac{12}{5} = \dfrac{24}{5} = 4.8$.
Answer: $\text{HM} = 4.8$
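
The worked example in exact arithmetic, plus a hypothetical speed illustration: covering two equal distances at 40 and 60 km/h gives an average speed equal to the harmonic mean of the two speeds. The function name is an assumption, and it expects integer inputs because it builds exact fractions.

```python
from fractions import Fraction


def harmonic_mean(xs):
    """n divided by the sum of reciprocals (positive integers expected)."""
    return len(xs) / sum(Fraction(1, x) for x in xs)


print(harmonic_mean([4, 6]))         # 24/5
print(float(harmonic_mean([4, 6])))  # 4.8
print(harmonic_mean([40, 60]))       # 48, average speed over equal distances
```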

Try it yourself

Find the harmonic mean of 2 and 8.

Practice — Level 1 (4 reps)

Quick reps to lock in the method. Try each, then check.

  1. HM of 2 and 6?
  2. HM of 3 and 6?
  3. HM of 4 and 4?
  4. Which is largest for distinct positives: AM, GM, or HM?

From the bank · past-year question

Example 16 · Statistics · EASY
If the harmonic mean of 60 and $x$ is 48, then what is the value of $x$?

[Q114 · Sep · 2021]

Order is always $\text{AM} \geq \text{GM} \geq \text{HM}$

For any set of positive numbers, this inequality is strict unless every observation is equal. If your computed HM exceeds GM or AM, you made an arithmetic error.

$\text{GM}^2 = \text{AM} \times \text{HM}$ for two numbers

For exactly two positive numbers, the geometric mean is the geometric mean of the arithmetic and harmonic means: $\text{GM}^2 = \text{AM} \cdot \text{HM}$. When a PYQ gives you two of $\{\text{AM}, \text{GM}, \text{HM}\}$ for a pair, or a relation between them (e.g. $5\,\text{HM} = 4\,\text{GM}$), use this identity to recover the third without solving for the original numbers, which is much faster than setting up two equations in $m, n$.
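
Both facts are easy to confirm on a concrete pair. The values 4 and 9 below are a hypothetical choice; any two positive numbers would serve.

```python
import math

a, b = 4, 9  # hypothetical pair of positive numbers

am = (a + b) / 2          # arithmetic mean
gm = math.sqrt(a * b)     # geometric mean
hm = 2 * a * b / (a + b)  # harmonic mean of two numbers

print(am, gm)                          # 6.5 6.0
print(am >= gm >= hm)                  # True
print(math.isclose(gm ** 2, am * hm))  # True
```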

Concept 17 of 17

Sum of Deviations & the Empirical Relation

Intuition

Two identities every NDA aspirant should reflexively know. First: the deviations of all observations from their mean always sum to zero. Second: for moderately skewed unimodal data, mode, median and mean are linked by an approximate empirical relation.

Definition

$\sum (x_i - \bar{x}) = 0$ for any dataset; this is a defining property of the mean. The empirical relation $\text{Mode} \approx 3\,\text{Median} - 2\,\text{Mean}$ holds approximately for moderately skewed unimodal distributions and is used to recover the third measure when two are known.

Two identities to memorise

$\sum_{i=1}^{n}(x_i - \bar{x}) = 0 \qquad \text{and} \qquad \text{Mode} \approx 3\,\text{Median} - 2\,\text{Mean}$

Diagram · mean, median & mode under skew

[Figure: positively skewed distribution with Mode, Median and Mean marked from left to right, and a long tail to the right]

With a long right tail (positive skew) the mean is dragged toward it, giving Mode < Median < Mean (the order reverses for a left tail). This is the basis of the empirical relation Mode ≈ 3·Median − 2·Mean. For any data, Σ(xᵢ − x̄) = 0 — deviations above and below the mean always cancel.

Worked example

If the mean of 5 numbers is 10, what is the sum of deviations of the numbers from their mean?
  1. Use the identity $\sum(x_i - \bar{x}) = 0$, which holds for any dataset.
  2. Verify by expansion: $\sum(x_i - \bar{x}) = \sum x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0$.
  3. Plugging in $n = 5,\ \bar{x} = 10$ gives $50 - 50 = 0$.
Answer: Sum of deviations $= 0$
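The worked example can also be checked numerically. The sketch below uses a hypothetical set of five numbers chosen so that their mean is 10; the helper name `sum_of_deviations` is introduced here for illustration only.

```python
from statistics import fmean

def sum_of_deviations(values):
    """Return the sum of (x_i - mean) over all values."""
    mean = fmean(values)
    return sum(x - mean for x in values)

# Hypothetical dataset: 5 numbers chosen so that the mean is 10.
numbers = [4, 7, 10, 13, 16]
assert fmean(numbers) == 10

# Deviations are -6, -3, 0, 3, 6, which cancel exactly.
assert sum_of_deviations(numbers) == 0
print(sum_of_deviations(numbers))
```

For data whose mean is not exactly representable in floating point, the computed sum may differ from zero by a tiny rounding error, so compare against a small tolerance rather than testing for exact equality.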
Practice this concept · self-check · 4 quick reps

Try it yourself

For a moderately skewed unimodal distribution, the mean is $30$ and the median is $28$. Use the empirical relation to find the mode.

Practice — Level 1 (4 reps)

Quick reps to lock in the method. Try each, then check.

  1. Sum of deviations of any dataset about its own mean?
  2. Mean $30$, median $27$. Mode by the empirical relation?
  3. Mean of $7$ numbers is $4$. Find $\sum (x_i - 4)$.
  4. Mode $12$, mean $18$. Median by the empirical relation?

From the bank · past-year question

Example 17 · Statistics · EASY
What is the sum of deviations of the variate values 73, 85, 92, 105, 120 from their mean?

[Q107 · Apr · 2021]

Sum of deviations is zero only about the mean

About any other reference point $c$, the sum equals $\sum x_i - nc = n(\bar{x} - c)$, which is non-zero unless $c = \bar{x}$. PYQs often plant a non-mean reference point to test exactly this.
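A small sketch of the shortcut $n(\bar{x} - c)$ against the direct sum. The data and the reference points are made up for illustration, and `sum_about` is a hypothetical helper name.

```python
def sum_about(values, c):
    """Sum of deviations of the values from an arbitrary reference point c."""
    return sum(x - c for x in values)

values = [2, 4, 6, 8]          # illustrative data, mean = 5
n = len(values)
mean = sum(values) / n

for c in (5, 3, 10):
    direct = sum_about(values, c)   # add up each (x_i - c)
    shortcut = n * (mean - c)       # n * (mean - c), no loop over the data
    assert direct == shortcut
    print(c, direct)
```

Only the first reference point, the mean itself, gives zero; the other two give a positive and a negative total respectively, matching the sign of $\bar{x} - c$.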

Empirical relation is approximate, not exact

It works for moderately skewed unimodal data. For symmetric data (mean = median = mode) it is trivially true. For multimodal or heavily skewed data it can be misleading.
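To see how approximate the relation is, the sketch below compares the empirical estimate with the actual mode on two made-up datasets. The helper names `empirical_mode` and `empirical_median` are introduced here for illustration; they are simple rearrangements of Mode ≈ 3·Median − 2·Mean.

```python
from statistics import fmean, median, mode

def empirical_mode(mean, med):
    """Mode estimated from Mode ≈ 3·Median − 2·Mean."""
    return 3 * med - 2 * mean

def empirical_median(mean, mod):
    """Median estimated by rearranging the same relation."""
    return (mod + 2 * mean) / 3

# Symmetric data: mean = median = mode, so the relation holds exactly.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
assert fmean(symmetric) == median(symmetric) == mode(symmetric) == 3
assert empirical_mode(fmean(symmetric), median(symmetric)) == 3

# Heavily right-skewed data (made up for illustration): the estimate drifts.
skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]
m, med, mo = fmean(skewed), median(skewed), mode(skewed)
assert mo < med < m                   # Mode < Median < Mean under positive skew
print(mo, empirical_mode(m, med))     # actual mode vs. empirical estimate
```

On the skewed list the estimate lands well away from the actual mode, which is the sense in which the relation "can be misleading" outside the moderately skewed case.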

Summary — formulas & gotchas at a glance

A revision cheat-sheet for the formulas and gotchas above. Click any concept name to jump back to its full explanation.

Formulas (14)

  • Frequency and tabulation

    Total frequency

    $$N = \sum_{i=1}^{k} f_i$$
  • Class marks and class width (grouped data)

    Class mark and class width

    $$x_{\text{mark}} = \dfrac{L + U}{2} \qquad h = U - L$$
  • Summation notation Σ

    Definition + two identities

    $$\sum_{i=1}^{n} x_i = x_1 + \cdots + x_n,\quad \sum_{i=1}^{n} c = nc,\quad \sum (a x_i + b) = a\sum x_i + nb$$
  • Arithmetic Mean (raw data)

    Arithmetic Mean

    $$\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$$
  • Arithmetic Mean (frequency / grouped data)

    Frequency-weighted Mean

    $$\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i}$$
  • Linear Transformation of the Mean

    Linear transformation rule

    $$y_i = a\,x_i + b \quad\Longrightarrow\quad \bar{y} = a\,\bar{x} + b$$
  • Replacement and Wrong-Value Correction of the Mean

    Replacement rule (single observation, n unchanged)

    $$M_{\text{new}} = M + \dfrac{y - x}{n}$$
  • Special-Case Means — Consecutive Integers, Squares, AP, Binomial

    Closed-form means for common sequences

    $$\bar{x}_{a..b} = \dfrac{a+b}{2} \qquad \overline{k^2}\big|_{1}^{n} = \dfrac{(n+1)(2n+1)}{6} \qquad \bar{x}_{\text{AP}} = \dfrac{a_1 + a_n}{2}$$
  • Combined Mean of Two Groups

    Combined mean of two groups

    $$M_{12} = \dfrac{n_1 M_1 + n_2 M_2}{n_1 + n_2}$$
  • Median — Middle Value

    Median (raw and grouped)

    $$\text{Raw: } M = \begin{cases} x_{(n+1)/2} & n \text{ odd} \\[4pt] \dfrac{x_{n/2} + x_{n/2+1}}{2} & n \text{ even} \end{cases} \qquad \text{Grouped: } M = L + \dfrac{\tfrac{n}{2} - F}{f}\,h$$
  • Mode — Most Frequent Value

    Mode (grouped data)

    $$M_0 = L + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2}\,h$$
  • Geometric Mean (GM)

    Geometric Mean

    $$\text{GM} = \sqrt[n]{x_1 \, x_2 \, \cdots \, x_n} = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$$
  • Harmonic Mean (HM)

    Harmonic Mean

    $$\text{HM} = \dfrac{n}{\displaystyle\sum_{i=1}^{n} \dfrac{1}{x_i}} = \dfrac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_n}}$$
  • Sum of Deviations & the Empirical Relation

    Two identities to memorise

    $$\sum_{i=1}^{n}(x_i - \bar{x}) = 0 \qquad \text{and} \qquad \text{Mode} \approx 3\,\text{Median} - 2\,\text{Mean}$$
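Several of the shortcut formulas in this list can be sanity-checked against a brute-force recomputation. The following is a minimal sketch on made-up data, not a substitute for the derivations in each concept; all variable names are illustrative.

```python
from statistics import fmean

x = [3, 5, 8, 12]              # illustrative data
n = len(x)
M = fmean(x)                   # plain arithmetic mean of x

# Linear transformation: y_i = a*x_i + b  =>  mean(y) = a*mean(x) + b
a, b = 2, -1
assert fmean([a * xi + b for xi in x]) == a * M + b

# Replacement rule: swap one observation `old` for `new`, n unchanged.
old, new = 12, 4
replaced = [3, 5, 8, new]
assert fmean(replaced) == M + (new - old) / n

# Combined mean of two groups, weighted by group sizes.
g1, g2 = [3, 5, 8, 12], [10, 20]
n1, n2 = len(g1), len(g2)
combined = (n1 * fmean(g1) + n2 * fmean(g2)) / (n1 + n2)
assert combined == fmean(g1 + g2)

# Mean of the squares of the first k natural numbers: (k+1)(2k+1)/6
k = 5
assert fmean([i * i for i in range(1, k + 1)]) == (k + 1) * (2 * k + 1) / 6
```

Each assertion compares the closed-form shortcut with the mean recomputed from scratch, which is the same check you can do by hand when an exam answer looks suspicious.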

Watch out for (16)

Mastery check — 5 interleaved questions

Try each one before clicking. Questions are interleaved across the concepts above, not grouped — interleaving sharpens transfer.

Example 1 · Statistics · EASY
Given that the arithmetic mean and standard deviation of a sample of 15 observations are 24 and 0 respectively. Then which one of the following is the arithmetic mean of the smallest five observations in the data?

[Q106 · Sep · 2017]

Example 2 · Statistics · EASY
Consider the following frequency distribution for the items that follow: Class: 0-20, 20-40, 40-60, 60-80, 80-100; Frequency: 17, $p+q$, 32, $p-3q$, 19. The total frequency is 120. The mean is 50.
If the frequency of each class is doubled, then what would be the mean?

[Q120 · Apr · 2022]

Example 3 · Statistics · MODERATE
The mean of 100 observations is 50 and the standard deviation is 10. If 5 is subtracted from each observation and then it is divided by 4, then what will be the new mean and the new standard deviation respectively?

[Q109 · Apr · 2019]

Example 4 · Statistics · EASY
The mean of the series $x_1, x_2, \ldots, x_n$ is $\bar{x}$. If $x_n$ is replaced by $k$, then what is the new mean?

[Q112 · Sep · 2024]

Example 5 · Statistics · MODERATE
Let x be the mean of squares of first n natural numbers and y be the square of mean of first n natural numbers. If $\frac{x}{y} = \frac{55}{42}$, then what is the value of n?

[Q101 · Sep · 2022]

Drill every past-year question on this subtopic

75 questions from the bank — paginated, with cart and Word-export support.

Related notes