TOPIC 1: APPLICATION OF STATISTICS IN GEOGRAPHY | GEOGRAPHY FORM 5

APPLICATION OF STATISTICS IN GEOGRAPHY

Statistics refers to the scientific and systematic methods of collecting, recording, summarizing, analyzing and presenting numerical data in a precise manner.

It can also be defined as the study of methods of collecting, recording, summarizing, analyzing and presenting data in a precise manner by using numbers.

Alternatively, it is the science of observing, collecting, recording, summarizing, analyzing and presenting data in a precise manner by using numbers.

Numerical data are a body of information given in numbers, or exact numerical facts or figures collected systematically and arranged for a certain purpose.

NATURE OF DATA

Statistical data according to their varied nature

According to their nature, statistical data include the following:

Discrete data

It is a form of statistical data for variables whose values are expressed in whole numbers, i.e. the data are for cases which do not exist in fractions.

For instance, the number of people can be given as 102 people, who cannot be divided into decimals or fractions.

Continuous data

These are data for variables whose values can be expressed in fractions or decimals. In this type of data, any value within the range can be given.

For instance, the data for temperature, rainfall, pressure, distance, growth rate and similar cases. They are presented on a continuous scale of fractions or decimals.

Individual data

The set of data which provides a specific value for every item in a given sample. For instance, Juma has a weight of 47 kg. Every item is considered an important entity and is presented singly.

Grouped data

It is a form of data which gives values in ranges or classes. This type of data is not precise, as exact figures are not quoted; instead, values are given in groups.

A classic example of grouped data is population distribution by age and sex, which may appear as follows:

AGE	FEMALES	MALES
0 – 9	14,897	14,567
10 – 19	15,432	14,329
20 – 29	17,987	13,098
30 – 39	16,876	17,654

Statistical data according to scale of measurements

This classification considers how the values of statistical data are given.

The scales of measurement include the following:

Nominal data

The type of data in which values are given according to the names of the items in a given sample, e.g. 10 apples, 5 oranges, 7 mangoes, 5 bananas and 2 cherries.

Ordinal data

The data in which values are given in order of magnitude, in such a way that the numbers indicate the rank order among objects, i.e. the values are commonly given in either ascending or descending order, e.g. 91, 82, 79, 74, 68, 67, 58, 54 and 49.

Interval data

The data in which values are given in ranges at regular intervals by being grouped, e.g. the data for population distribution by age and sex expressed on an interval scale.

Ratio data

The data in which the values show how many times one item is relative to another, e.g. 1:3, 2:5, 3:7, etc.

VARIABLES

A variable is an attribute whose values fluctuate under given conditions. For instance, production is a variable whose values change under conditions such as policies, climate, technology, marketability and others.

Variables are classified into dependent and independent variables.

Dependent variable

A dependent variable is one whose values fluctuate due to the force of another variable, i.e. the variable whose values change irregularly as controlled by another variable. For instance, production is one of the most pronounced dependent variables, as it changes due to the force of other variables like climate, the level of technology applied, the demand for the products and others.

CLASSIFICATION OF STATISTICS

Statistics, being the scientific and systematic methods dealing with numerical facts, is broadly categorized into two depending on how data are handled. The two broad categories are descriptive and inferential statistics.

Descriptive Statistics

Descriptive statistics deals with recording, summarizing, analyzing and presenting numerical facts that have actually been collected. An example of actual collection of data is a population census.

Inferential statistics

Inferential statistics deals with recording, summarizing, analyzing and presenting numerical facts that are handled by quantifying uncertainties through prediction, e.g. the likely harvest output in the next year or season.

STATISTICAL DATA

As already pointed out, statistical data are the exact numerical facts or figures collected systematically and arranged for a certain purpose, or a body of information which is usually treated in numerical values.

Statistical data are extremely varied and are thus recognized to be of different types. The categories of statistical data are recognized with regard to their sources, nature and scale of measurement.

Statistical data according to their varied sources

By source, data are classified into two: primary and secondary data.

Primary data

These are the numerical facts collected from the field or handled for the first time, i.e. they are first-hand or original information. The data are not available in existing sources like books. Primary statistical data are gathered by the techniques of interview, questionnaires, observation, counting, measurement and other methods.

Secondary data

These are the numerical facts derived from stored sources. The data were compiled by other people who carried out research. The sources of this type of data include textbooks, reference books, magazines, maps, video tapes, audio tapes and other similar sources.

Independent variable

An independent variable is one whose values change on their own without being influenced by another variable, i.e. the variable whose values change steadily and regularly, e.g. distance.

SOURCES OF STATISTICAL DATA

The sources of statistical data are simply the techniques employed to gather the numerical facts. These are broadly two: primary and secondary sources.

The techniques (sources) providing statistical data include the following. The first four are primary techniques, while literature review is a secondary one:

Interview method
Questionnaire
Scheduling
Field observation method
Literature review

Interview method

The technique of interview involves the collection of data through questions asked verbally by a researcher to a respondent.

It is a verbal interaction between an interviewer and an interviewee designed to elicit the information, news, opinions and feelings that the interviewee has. Generally, an interview is an oral set of questions asked to respondents by a researcher.

Questionnaire method

A questionnaire is a set of research questions printed on paper and presented to respondents, who reply to the questions in writing. Thus, the questionnaire method is a means of gathering statistical details by giving questionnaires to respondents to answer.

Field observation method

It is a method of gathering primary research data in which a researcher observes the phenomena directly. It is of two types: participant and non-participant observation.

Scheduling method

This method of data collection is very similar to the questionnaire, with one small difference: a schedule involves a prepared set of questions which are filled in by enumerators who are specially appointed for the purpose, carefully selected and trained to perform their job well. This method of data collection is very useful for carrying out a population census.

The secondary sources providing statistical data include the following:

Literature review method

It is a systematic survey of past documentary sources, prepared by other researchers, that are related to the study. The documentary sources include textbooks, statistical abstracts, census reports, research articles, journals, newspapers and official reports.

Other methods of data collection include measurement, counting and carrying out experiments.

Strengths of statistics application in Geography

The application of statistics in geography has the following significance:

It summarizes massive information by making it simpler, thus enabling geographers to handle large sets of data.

It makes data computation techniques possible in geography.

It makes data comparison easy, since it is impossible to compare variables without statistics on them.

It facilitates the process of drawing relationships between geographical variables such as climate and production, population and time, rainfall and temperature, etc.

It makes data storage easy in the form of numbers, tables, graphs, diagrams and maps.

It makes geographical data clearly understood and easy to analyze and interpret.

It enhances validity testing of geographical models, theories and concepts against real-world situations.

STATISTICAL MEASURES

The numerical values which make up statistics are analyzed or examined to judge their implications (results) by means of statistical measures.

Thus, statistical measures refer to the computed numerical values used to analyze data in relation to other values in a given data set.

Statistical measures are numerous but, with regard to their nature and roles, are broadly divided into the following categories:

Measures of central tendency
Measures of variability

MEASURES OF CENTRAL TENDENCY

These are the measures which show the central values and include the arithmetic mean, mode and median.

ARITHMETIC MEAN

The arithmetic mean is the average of all values in a distribution. It is determined by adding up all the values and dividing the sum by the number of observations. The arithmetic mean is used to assess whether the values of a distribution are generally high or low.

Computation of the arithmetic mean

Computation of the arithmetic mean depends on the nature of the data given, whether ungrouped or grouped.

For an ungrouped data set, the arithmetic mean is computed by applying the following formula:

$\bar{X} = \frac{\sum X}{N}$

Where by:

$\sum X$ = The sum of all values in the data set

N = The total number of observations added.

Example:

Find the arithmetic mean for the following set of data.

5, 7, 10, 12, 13, 14, 15, 7 and 2.

Solution

The arithmetic mean for the given set of data is calculated as follows:

$\sum X$ = 5 + 7 + 10 + 12 + 13 + 14 + 15 + 7 + 2 = 85

N = 9

$\bar{X} = \frac{85}{9} \approx 9.4$

Thus, the arithmetic mean = 9.4
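The calculation above can be checked with a short Python sketch. The function name arithmetic_mean is only an illustrative choice, not part of the original notes.

```python
def arithmetic_mean(values):
    """Sum of all values divided by the number of observations (N)."""
    if not values:
        raise ValueError("at least one observation is required")
    return sum(values) / len(values)


data = [5, 7, 10, 12, 13, 14, 15, 7, 2]
print(sum(data), len(data), round(arithmetic_mean(data), 2))  # 85 9 9.44
```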

For a grouped data set, the arithmetic mean is calculated by the following formula:

$\bar{X} = \frac{\sum fX}{\sum f}$

Where by:

X = Class mark (the midpoint of a class interval)

f = Frequency

Example:

Find the arithmetic mean for the following scores of marks:

Class interval	f	X	fX
91-95	0	93	0
86-90	1	88	88
81-85	6	83	498
76-80	10	78	780
71-75	15	73	1095
66-70	34	68	2312
61-65	22	63	1386
56-60	10	58	580
51-55	2	53	106

Solution:

According to the given data:

$\sum fX$ = 6845

$\sum f$ = 100

$\bar{X} = \frac{6845}{100} = 68.45$

Thus, the arithmetic mean = 68.45
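As a rough check of the grouped calculation, the sketch below derives each class mark as the midpoint of its class and then divides the sum of f × X by the sum of f. The helper name grouped_mean and the (lower, upper, frequency) layout are assumptions made for this illustration.

```python
def grouped_mean(classes):
    """classes: list of (lower_limit, upper_limit, frequency) tuples."""
    total_f = sum(f for _, _, f in classes)
    # The class mark X is the midpoint of the class limits.
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f


scores = [
    (91, 95, 0), (86, 90, 1), (81, 85, 6), (76, 80, 10), (71, 75, 15),
    (66, 70, 34), (61, 65, 22), (56, 60, 10), (51, 55, 2),
]
print(grouped_mean(scores))  # 68.45
```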

Advantages of the Arithmetic mean

It is easy to calculate and is understood by the majority of people.

It is used to check whether the values are high or low.

It can be used for further calculation. For instance, the arithmetic mean is used to calculate the standard deviation.

Disadvantages of the arithmetic mean

The arithmetic mean has the big weakness of being pulled towards outliers (extreme scores).

Calculating the arithmetic mean for a grouped data set requires more mathematical knowledge.

MODE

Mode is the value which occurs most frequently in a given data set.

It is the most commonly attained measurement value in a data set.

It is the measurement value that appears most often for a particular variable among a sample of subjects.

The mode helps us to know where values are concentrated, which can stimulate scientific investigation.

Calculation of a mode

Determination of the mode depends on the nature of the data set, whether ungrouped or grouped.

For an ungrouped data set, the mode is obtained by taking the value that appears most frequently, i.e. the one that has a higher frequency than the rest.

Example;

Determine the mode for the following data set.

2, 4, 2, 2, 5, 6, 4

Value	Concentration
2	3
4	2
5	1
6	1

Thus; the mode for the data set given = 2

Note

Sometimes a given data set may have more than one mode or no mode at all. A distribution with one mode is known as unimodal or monomodal. If two modes are obtained from a data set, it is described as bimodal.

Example:

(1) 2, 5, 4, 3, 5, 6, 6, 8, 5, 6.

The modes for the data set are 5 and 6

(2) 4, 9, 8, 5, 6, 7

The given data set has no mode.
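The three cases above (one mode, two modes, no mode) can be sketched in Python as follows. The function name modes is illustrative, and the rule that a data set has no mode when every value occurs only once is an assumption taken from the last example.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Assumption: if no value repeats, the data set has no mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([2, 4, 2, 2, 5, 6, 4]))           # [2]
print(modes([2, 5, 4, 3, 5, 6, 6, 8, 5, 6]))  # [5, 6]
print(modes([4, 9, 8, 5, 6, 7]))              # []
```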

For grouped data, the mode is calculated by the following formula:

$\text{Mode} = L + \left(\frac{t_1}{t_1 + t_2}\right) \times i$

Whereby:

· L = The lower limit (lower class boundary) of the modal class

· t₁ = The excess of the modal frequency over the frequency of the next lower class

· t₂ = The excess of the modal frequency over the frequency of the next higher class

· i = The class interval

Example;-

The table below shows the scores of Form V students in a geography test.

Class interval	Frequency
40 – 44	7
45 – 49	8
50 – 54	11
55 – 59	10
60 – 64	4

Solution

The mode for the given data set is calculated as follows:

The modal class is 50 – 54, which has the highest frequency (11). According to the given data set:

L = 49.5

t₁ = 11 – 8 = 3

t₂ = 11 – 10 = 1

i = 5

Then:

$\text{Mode} = 49.5 + \left(\frac{3}{3 + 1}\right) \times 5$

= 49.5 + (0.75 x 5)

= 49.5 + 3.75 = 53.25

Thus, the mode = 53.25
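A minimal Python sketch of the same steps is given below. It assumes classes with whole-number limits listed in ascending order (so the lower boundary is the lower limit minus 0.5 and the class interval is upper − lower + 1) and a single modal class whose frequency exceeds at least one neighbour. The name grouped_mode is illustrative only.

```python
def grouped_mode(classes):
    """classes: ascending list of (lower_limit, upper_limit, frequency)."""
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))           # position of the modal class
    lower, upper, f_modal = classes[m]
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0
    L = lower - 0.5                       # lower class boundary
    i = upper - lower + 1                 # class interval
    t1 = f_modal - f_prev                 # excess over next lower class
    t2 = f_modal - f_next                 # excess over next higher class
    return L + t1 / (t1 + t2) * i


scores = [(40, 44, 7), (45, 49, 8), (50, 54, 11), (55, 59, 10), (60, 64, 4)]
print(grouped_mode(scores))  # 53.25
```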

Advantages of a mode

It helps to determine the predominance of a certain geographical feature in a place.

It helps to know the number of occurrences of the values in a data set.

Disadvantages of a mode

Calculating the mode for a grouped data set requires more mathematical knowledge.

It is an unreliable measure of central tendency, as a data set may have more than one mode or no mode at all.

MEDIAN

Median refers to the value that divides a distribution into two equal parts after the values have been arranged in ascending or descending order.

Computation of the median

The computation of the median chiefly depends on the nature of the data set given, whether ungrouped or grouped.

For an ungrouped data set, the calculation of the median should further take into account whether the number of values is odd or even.

If the number of values is odd, the median is simply the middle value, obtained after the values have been arranged in ascending or descending order.

E.g.

1, 2, 1, 4, 6, 5, 3

Solution

The ascending order of the values is as follow:-

1, 1, 2, 3, 4, 5, 6

Thus; the median = 3.

If the number of values is even, the median is the average of the two middle values, obtained after the values have been arranged in ascending or descending order.

E.g.

1, 4, 5, 2, 7, 8, 3, 2

The ascending order of the values is as follows:

1, 2, 2, 3, 4, 5, 7, 8

The two middle values are 3 and 4.

Thus, the median = (3 + 4) / 2 = 3.5
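Both the odd and the even case can be expressed in one short Python sketch. The function name median is chosen for illustration.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if N is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 2, 1, 4, 6, 5, 3]))     # 3
print(median([1, 4, 5, 2, 7, 8, 3, 2]))  # 3.5
```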

Median determination for the grouped data

For grouped data, the median is determined by applying the following formula:

$\text{Median} = L + \left(\frac{\frac{N}{2} - n_b}{n_w}\right) \times i$

Where by:

L = The lower limit (lower class boundary) of the median class

N = Total number of observations

n_b = the number of elements in the classes below the median class

n_w = the number of elements in the median class

i = class interval

Example:-

The table below shows the scores of Form V students in geography.

Class interval	Frequency
40 – 44	7
45 – 49	8
50 – 54	11
55 – 59	10
60 – 64	4

Solution

The median class is 50 – 54, because the cumulative frequency first reaches N/2 = 20 in that class. According to the given data:

L = 49.5

N = 40

n_b = 7 + 8 = 15

n_w = 11

i = 5

$\text{Median} = 49.5 + \left(\frac{20 - 15}{11}\right) \times 5$

≈ 49.5 + (0.45 x 5)

= 49.5 + 2.25 = 51.75

Thus the median ≈ 51.75 (or 51.77 if 5/11 is not rounded to 0.45)
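The grouped median can be sketched in Python as below, assuming whole-number class limits in ascending order (lower boundary = lower limit − 0.5, class interval = upper − lower + 1). The name grouped_median is illustrative. Because the sketch does not round 5/11 to 0.45, it returns about 51.77 rather than the hand-worked 51.75.

```python
def grouped_median(classes):
    """classes: ascending list of (lower_limit, upper_limit, frequency)."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    below = 0                              # n_b: observations below the current class
    for lower, upper, f in classes:
        if below + f >= half:              # this is the median class
            L = lower - 0.5                # lower class boundary
            i = upper - lower + 1          # class interval
            return L + (half - below) / f * i
        below += f


scores = [(40, 44, 7), (45, 49, 8), (50, 54, 11), (55, 59, 10), (60, 64, 4)]
print(round(grouped_median(scores), 2))  # 51.77
```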

Advantages of median

It helps to identify the middle value among the numerous values in a data set.

It is easy to determine, particularly for a simple data set.

Disadvantages of the median

If the values are numerous, it becomes cumbersome to arrange them in ascending or descending order to get the median.

Determining the median for a grouped data set requires more skill.

MEASURES OF VARIABILITY

These are the measures which assess the variation of values in a data set. The common measures of variability include the following:

Range
Standard deviation
Variance
Mean deviation

RANGE

Range is the difference between the highest and lowest values in a given distribution. It is used to assess the variation between the highest score and the lowest score.

Calculation of the range

Calculation of the range also depends on the nature of the data set given, whether ungrouped or grouped.

For an ungrouped data set, the range is calculated by subtracting the lowest value from the highest value in the data set.

Example:-

Determine the range for the following data set: 4, 2, 3, 5, 6, 4, 8

Solution

The range for the given data set is computed as follows:

Range = Highest value – lowest value

According to the given data set:-

· Highest value = 8

· Lowest value = 2

· 8 – 2 = 6

· Thus; The range = 6

Interpreting the result: a high range implies greater variation, while a small range implies small variation.
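For an ungrouped data set the computation is a single subtraction, as this small Python sketch shows. The function name value_range is illustrative.

```python
def value_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)


data = [4, 2, 3, 5, 6, 4, 8]
print(max(data), min(data), value_range(data))  # 8 2 6
```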

For grouped data, the range is calculated by subtracting the lowest class mark from the highest class mark, by subtracting the lowest lower boundary from the highest lower boundary, or by subtracting the lowest upper boundary from the highest upper boundary.

Example:-

Determine the range for the following data set.

Class interval
10 – 14
15 – 19
20 – 24
25 – 29
30 – 34
35 – 39

Solution

The range for the given data set is calculated as follows:

Range = Highest class mark – Lowest class mark

Determination of the class marks

Class interval	Class mark
10 – 14	12
15 – 19	17
20 – 24	22
25 – 29	27
30 – 34	32
35 – 39	37

According to the computed class marks

· Highest class mark = 37

· Lowest class mark = 12

37 – 12 = 25,

Thus, the range = 25
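The class-mark version of the range can be sketched as follows, taking each class mark as the midpoint of its class limits. The helper names class_marks and grouped_range are illustrative.

```python
def class_marks(classes):
    """Midpoint of each (lower_limit, upper_limit) class."""
    return [(lower + upper) / 2 for lower, upper in classes]


def grouped_range(classes):
    """Highest class mark minus lowest class mark."""
    marks = class_marks(classes)
    return max(marks) - min(marks)


classes = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
print(class_marks(classes))    # [12.0, 17.0, 22.0, 27.0, 32.0, 37.0]
print(grouped_range(classes))  # 25.0
```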

Advantages of a range

Range gives a quick rough estimate of variability

It is simple to calculate and most people are familiar with it.

Disadvantages of a range

It considers only two values, the highest and the lowest, and is thus not sensitive to the total distribution.

It is affected by extreme values.

STANDARD DEVIATION

Deviation is the difference between a value and the mean. It is computed by subtracting the mean from the value:

$\text{Deviation} = X - \bar{X}$

Whereby:

X = a value given in a distribution

$\bar{X}$ = the average (mean) of all values

Standard deviation

Standard deviation refers to the typical difference of all values from the mean. It is the root mean square deviation from the mean. It is the measure which determines how far the values are scattered from the mean.

Standard deviation is represented by the sigma symbol, $\sigma$.

Computation of a standard deviation

Calculation of the standard deviation also depends on the nature of the data set given, whether ungrouped or grouped.

For ungrouped data, the standard deviation is calculated by the following formula:

$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$

Where by:

X = a value in the distribution

$\bar{X}$ = the mean of the values

N = The total number of observations

Example:-

Calculate the standard deviation for the following data set.

3, 2, 1, 4, 6

Solution

Mean determination

$\bar{X} = \frac{3 + 2 + 1 + 4 + 6}{5} = \frac{16}{5} = 3.2$

X	3	2	1	4	6
$X - \bar{X}$	-0.2	-1.2	-2.2	0.8	2.8
$(X - \bar{X})^2$	0.04	1.44	4.84	0.64	7.84

· $\sum (X - \bar{X})^2$ = 0.04 + 1.44 + 4.84 + 0.64 + 7.84 = 14.8

Then:

$\sigma = \sqrt{\frac{14.8}{5}} = \sqrt{2.96} \approx 1.720$

Hence, the SD = 1.720
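The root mean square definition can be evaluated directly with the Python sketch below, which divides by N (the total number of observations) as in the definition given for ungrouped data. The function name standard_deviation is illustrative. For the data set 3, 2, 1, 4, 6 the sum of squared deviations is 14.8, so the sketch returns the square root of 14.8 / 5 = 2.96.

```python
import math


def standard_deviation(values):
    """Square root of the mean of squared deviations from the mean (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_sq / n)


data = [3, 2, 1, 4, 6]
print(round(standard_deviation(data), 3))
```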

For a grouped data set, the standard deviation is computed by the following formula:

$\sigma = \sqrt{\frac{\sum f(X - \bar{X})^2}{\sum f}}$

Example:-

Calculate the SD for the following set of grouped data.

Class interval	Frequency
40 – 44	7
45 – 49	8
50 – 54	11
55 – 59	10
60 – 64	4

Procedure:

· Determination of the mean

$\bar{X} = \frac{\sum fX}{\sum f}$

Class interval	f	X	fX
40 – 44	7	42	294
45 – 49	8	47	376
50 – 54	11	52	572
55 – 59	10	57	570
60 – 64	4	62	248

$\bar{X} = \frac{2060}{40}$

Hence, the mean = 51.5

Then:

X	42	47	52	57	62
$X - \bar{X}$	-9.5	-4.5	0.5	5.5	10.5
$(X - \bar{X})^2$	90.25	20.25	0.25	30.25	110.25
$f(X - \bar{X})^2$	631.75	162	2.75	302.5	441

$\sum f(X - \bar{X})^2$ = 1540

$\sum f$ = 40

$\sigma = \sqrt{\frac{1540}{40}} = \sqrt{38.5} \approx 6.205$

Thus, the SD = 6.205
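The grouped procedure (class marks, mean, squared deviations weighted by frequency) can be sketched in Python as below. The name grouped_standard_deviation and the (lower, upper, frequency) layout are assumptions for this illustration.

```python
import math


def grouped_standard_deviation(classes):
    """classes: list of (lower_limit, upper_limit, frequency) tuples."""
    marks = [((lower + upper) / 2, f) for lower, upper, f in classes]
    total_f = sum(f for _, f in marks)
    mean = sum(f * x for x, f in marks) / total_f
    sum_sq = sum(f * (x - mean) ** 2 for x, f in marks)
    return math.sqrt(sum_sq / total_f)


scores = [(40, 44, 7), (45, 49, 8), (50, 54, 11), (55, 59, 10), (60, 64, 4)]
print(round(grouped_standard_deviation(scores), 2))  # 6.2
```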

Note:

The square of the SD is known as variance. Its computation also considers the nature of the data set, whether ungrouped or grouped.

For ungrouped data, variance is computed by the following formula:

$\sigma^2 = \frac{\sum (X - \bar{X})^2}{N}$

For grouped data, each squared deviation is weighted by its frequency: $\sigma^2 = \frac{\sum f(X - \bar{X})^2}{\sum f}$
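Variance is the square of the standard deviation, so the standard deviation is the square root of the variance. A short Python sketch for ungrouped data, using the data set 3, 2, 1, 4, 6 purely as an illustration (the function name variance is an assumption):

```python
def variance(values):
    """Mean of the squared deviations from the mean (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


data = [3, 2, 1, 4, 6]
var = variance(data)
sd = var ** 0.5                      # standard deviation = square root of variance
print(round(var, 2), round(sd, 3))   # 2.96 1.72
```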

MEAN DEVIATION

Mean deviation is the average of all deviation values, i.e. the amount by which individual values deviate from the mean irrespective of sign. It is computed by dividing the sum of all deviations, irrespective of sign, by the number of observations.

Calculation of mean deviation

Calculation of the mean deviation also depends on the nature of the data set given, whether ungrouped or grouped.

For an ungrouped data set, the mean deviation is calculated by the following formula:

$\text{Mean deviation} = \frac{\sum |X - \bar{X}|}{N}$

Example:-

Determine the mean deviation for the following data set: 4, 7, 8, 2, 9, 6

Solution

Mean determination

$\bar{X} = \frac{\sum X}{N}$

$\sum X$ = 4 + 7 + 8 + 2 + 9 + 6 = 36

$\bar{X} = \frac{36}{6}$

Hence, the mean = 6

Deviations determination

Here D is the deviation irrespective of sign, $D = |X - \bar{X}|$.

X	$X - \bar{X}$	D
4	4 – 6	2
7	7 – 6	1
8	8 – 6	2
2	2 – 6	4
9	9 – 6	3
6	6 – 6	0

The sum of deviations determination.

· $\sum D$ = 2 + 1 + 2 + 4 + 3 + 0 = 12

Then: $\text{Mean deviation} = \frac{12}{6}$

Thus, the mean deviation = 2
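The same steps (mean, absolute deviations, their average) can be written as a short Python sketch. The function name mean_deviation is illustrative.

```python
def mean_deviation(values):
    """Average of the absolute deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


print(mean_deviation([4, 7, 8, 2, 9, 6]))  # 2.0
```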

For a grouped data set, the mean deviation is computed by the following formula:

$\text{Mean deviation} = \frac{\sum f|X - \bar{X}|}{\sum f}$

Example:-

Class interval	Frequency
40 – 44	7
45 – 49	8
50 – 54	11
55 – 59	10
60 – 64	4

Determination of the mean

$\bar{X} = \frac{\sum fX}{\sum f}$

Class interval	f	X	fX
40 – 44	7	42	294
45 – 49	8	47	376
50 – 54	11	52	572
55 – 59	10	57	570
60 – 64	4	62	248

$\bar{X} = \frac{2060}{40}$

Hence, the mean = 51.5

Determination of the deviations.

$D = |X - \bar{X}|$

Where by:

X = Class mark

$\bar{X}$ = The mean (51.5)

X	$X - \bar{X}$	D	f	fD
42	42 – 51.5	9.5	7	66.5
47	47 – 51.5	4.5	8	36
52	52 – 51.5	0.5	11	5.5
57	57 – 51.5	5.5	10	55
62	62 – 51.5	10.5	4	42
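As a hedged check of this grouped example, the Python sketch below recomputes f × D for each class from the frequencies 7, 8, 11, 10 and 4 and then divides the total by the sum of the frequencies. The name grouped_mean_deviation and the (lower, upper, frequency) layout are assumptions for this illustration.

```python
def grouped_mean_deviation(classes):
    """classes: list of (lower_limit, upper_limit, frequency) tuples."""
    marks = [((lower + upper) / 2, f) for lower, upper, f in classes]
    total_f = sum(f for _, f in marks)
    mean = sum(f * x for x, f in marks) / total_f
    # Sum of frequency x absolute deviation of each class mark from the mean.
    total_fd = sum(f * abs(x - mean) for x, f in marks)
    return total_fd / total_f


scores = [(40, 44, 7), (45, 49, 8), (50, 54, 11), (55, 59, 10), (60, 64, 4)]
print(grouped_mean_deviation(scores))  # 5.125
```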