Why Standard Deviation is important to businesses, and what it means
Standard Deviation (SD) is often mentioned in the financial press with regard to stock returns and valuations, and also internally at companies when talking about risk and the quality of products. It is therefore an important concept for executives to understand; but what is it?
In essence, SD is a measure of how spread out a group of values is - it gives a standard way of judging what counts as a small spread and what counts as a large one. Why does this matter? If you want to know how typical the current valuation of a company is, how risky a new product or business line might be, or how accurate your manufacturing machines are, then all of these can be measured with SD. Conversely, it also gives you a yardstick to use when looking to reduce the risk of your business or increase the accuracy of your machines. For example, if something is normally distributed (many things are - eg most machine errors, people's heights, often stock returns, etc…), then about 68% of values fall within 1 SD of the mean and about 95% fall within 2 SDs, so you can say how likely a given value (eg a measurement, a business return, etc…) is to occur. See more detail below.
More detail:
How is Standard Deviation calculated? SD is the square root of the variance, and the variance is simply the average of the squared differences from the mean. Why do we square the differences? If we didn't square them, the negative differences would offset the positive ones and the sum would always be zero. Using absolute values instead would give the mean absolute deviation - a legitimate but different measure of spread that is less convenient to work with mathematically, which is why the squared version (the variance, and hence the SD) is the standard.
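To make the arithmetic concrete, here is a minimal Python sketch; the measurements and variable names are purely illustrative.

# Hypothetical measurements, purely for illustration.
values = [4.0, 8.0, 6.0, 5.0, 3.0, 7.0]
mean = sum(values) / len(values)

# Raw differences from the mean cancel out - this sums to zero.
raw_sum = sum(v - mean for v in values)

# Squaring keeps every difference positive: the average of the squares
# is the variance, and its square root is the standard deviation.
variance = sum((v - mean) ** 2 for v in values) / len(values)
std_dev = variance ** 0.5

# Absolute differences give the mean absolute deviation instead -
# a different measure of spread from the variance/SD.
mad = sum(abs(v - mean) for v in values) / len(values)

print(f"mean={mean:.2f}, variance={variance:.2f}, SD={std_dev:.2f}, MAD={mad:.2f}")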
The formula changes if you have a sample rather than a population. If you have data for the entire population, you divide by the total number of values (N), while if you only have a sample you divide by the number of values less one (N - 1). This correction is needed because a sample tends to understate the variability of the full population - the sample mean sits closer to the sample's own values than the true population mean does - so dividing by N - 1 nudges the estimate back up. Note that the closer the sample size is to the population size (ie the more data you use), the closer the sample SD gets to the population SD - which is intuitive: the more data you have, the better. In most circumstances you have a limited set of measurements and are interested in the standard deviation of the underlying population, so you use the sample standard deviation. Note that in statistics a value calculated from the whole population is called a parameter (ie definite), while one calculated from a sample is called a statistic (ie an estimate).
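As a quick illustration (the weights below are invented), Python's built-in statistics module provides both versions: pstdev divides by N and stdev divides by N - 1.

import statistics

# Hypothetical weights (kg) for six manufactured units.
data = [2.01, 1.98, 2.03, 2.00, 1.97, 2.02]

pop_sd = statistics.pstdev(data)    # population SD: divide by N
sample_sd = statistics.stdev(data)  # sample SD: divide by N - 1

print(f"population SD = {pop_sd:.4f}, sample SD = {sample_sd:.4f}")
# The sample SD comes out slightly larger; the gap shrinks as N grows.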
What is a Normal Distribution, and why does it matter?
A Normal Distribution (also called a Bell Curve) is one where: 1) mean = median = mode (ie most values cluster in the middle), 2) the distribution is symmetric around the middle, and 3) 50% of values are less than the mean and 50% are greater than the mean. This is an important concept because many things closely follow a normal distribution - eg heights of people, IQ scores, errors in measurements, stock returns (although these often have fatter tails than a true normal distribution), etc… Assuming that data is normally distributed, you can expect 68% of values to fall within plus or minus 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations. This gives you confidence levels for how likely a value is to fall within different distances of the mean - eg how likely are we to see stock valuations at this level, or a measurement error of a certain size (very important for manufacturers).
Image source: Dan Kernler - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=36506025
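For a quick check of the 68/95/99.7 figures quoted above, here is a short sketch using only Python's standard library; the probability of a normal value falling within k standard deviations of the mean is erf(k / sqrt(2)).

import math

# P(|Z| <= k) for a standard normal variable Z.
for k in (1, 2, 3):
    prob = math.erf(k / math.sqrt(2))
    print(f"within {k} SD of the mean: {prob:.1%}")
# Prints roughly 68.3%, 95.4% and 99.7% - the rule of thumb above.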
What are Z-scores? Z-scores, or standard scores, are another name for the number of standard deviations a value sits away from the mean. To convert a value to a Z-score you just subtract the mean and then divide by the standard deviation - this is often called standardizing. Note that a Z-score of zero indicates that the value equals the mean. As an example of why this is useful: if you are manufacturing a product that needs a certain level of precision in size or weight, you can fail any output that falls outside a specified Z-score (eg more than 1 SD from the mean). Using the same example, if your product needs to be at least a certain weight (eg 2kg), you can adjust the machine so that producing below this weight becomes very unlikely - eg set the mean 3 SDs above the minimum, which leaves only about a 0.15% chance of an underweight unit, since you only care about going under the weight and so the relevant tail is (100% - 99.7%)/2. If you want to lower the chance of going below this weight further, you have two options: 1) increase the average weight, or 2) make the machine more accurate (ie reduce the standard deviation).
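A small sketch of the underweight-product example above; the 2kg minimum, the assumed SD of 10g, and the helper names are all hypothetical.

import math

def z_score(value, mean, sd):
    # How many standard deviations the value sits from the mean.
    return (value - mean) / sd

def prob_below(value, mean, sd):
    # Probability that a normally distributed output falls below `value`.
    return 0.5 * (1 + math.erf(z_score(value, mean, sd) / math.sqrt(2)))

min_weight = 2.0                   # contractual minimum weight (kg)
sd = 0.01                          # assumed machine accuracy: 10 g
mean_weight = min_weight + 3 * sd  # set the mean 3 SDs above the minimum

print(f"z-score of the minimum weight: {z_score(min_weight, mean_weight, sd):.1f}")
print(f"chance of an underweight unit: {prob_below(min_weight, mean_weight, sd):.2%}")
# Roughly 0.13-0.15%; raising mean_weight or reducing sd lowers it further.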
What is statistical significance - Z-scores vs P-values? Statistical significance measures how unlikely it is that an observed relationship between variables is down to chance alone - it can be a fairly complex area, and will be covered in more detail in a later post. To understand it, you use a null hypothesis - the assumption that there is no relationship - which you'll want to reject. The Z-score is a test statistic that helps you decide whether to reject the null hypothesis, while the P-value is the probability of seeing a result at least as extreme as yours if the null hypothesis were true - loosely, the risk that you reject the null hypothesis and conclude there is a relationship when there actually isn't one. Consequently, a large (positive or negative) Z-score is associated with a low P-value. There is always a chance that the null hypothesis is right and that there is no relationship (ie any relationship you've found is just chance), so you must decide what significance/confidence level you need - or put another way, the degree of risk you're willing to accept. A common confidence level is 95%, which corresponds to a Z-score of +/- 1.96 and a P-value of 0.05. If the Z-score falls between -1.96 and +1.96, the P-value is above 0.05 and you cannot reject the null hypothesis; if the Z-score falls outside that range, the null hypothesis can be rejected at this confidence level. This is useful in many areas, including quality control at manufacturers and sampling errors in marketing surveys.
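To tie the two together, here is a short sketch converting a Z-score into a two-tailed P-value and applying the 95% rule of thumb; the example Z-scores are arbitrary.

import math

def two_tailed_p_value(z):
    # Probability of a result at least this extreme if the null hypothesis is true.
    return 1 - math.erf(abs(z) / math.sqrt(2))

for z in (1.5, 2.5):
    p = two_tailed_p_value(z)
    verdict = "reject" if p < 0.05 else "cannot reject"
    print(f"z = {z}: p = {p:.3f} -> {verdict} the null hypothesis at 95% confidence")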