Why Standard Deviation is important to businesses, and what it means

Standard Deviation (SD) is often mentioned in the financial press with regard to stock returns and valuations, and also internally at companies when talking about risk and the quality of products. It is therefore an important concept for executives to understand; but what is it? In essence, SD is a measure of how spread out a group of values is - it gives a standard way of judging what is a small spread and what is a large spread. Why is this important? If you want to know how typical the current valuation of a company is, how risky a new product or business line can be, or how accurate your manufacturing machines are, then these can all be measured with SD. It also gives you a measure to track when looking to reduce the risk of your business or increase the accuracy of your machines. For example, if something is normally distributed (many things approximately are - eg most machine errors, people's heights, often stock returns, etc…), then a value (eg a measurement, a business return, etc…) has about a 68% chance of falling within 1 SD of the mean, and about a 95% chance of falling within 2 SDs. See more detail below.

More detail:
How is Standard Deviation calculated? SD is the square root of the variance, with the variance just being the average of the squared differences from the mean. Why do we square the differences? If we didn't square them, then the negatives would offset the positives and the sum would always be zero. If you used absolute values instead of squaring, you'd end up with the mean absolute deviation (not the variance). That is also a valid measure of spread, but squaring gives more weight to values far from the mean and is easier to work with mathematically, which is why the variance and SD are the standard choice.
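
As a minimal sketch of the calculation described above, the Python below works through each step by hand. The five weights are hypothetical values chosen purely for illustration, and the variance here divides by N (ie it treats the data as the whole population).

```python
import math

# Hypothetical data: weights (in kg) of five units from a production line.
weights = [2.1, 1.9, 2.0, 2.2, 1.8]

mean = sum(weights) / len(weights)

# Raw differences from the mean cancel out, so their sum is (effectively) zero.
differences = [w - mean for w in weights]
sum_of_differences = sum(differences)

# Squaring removes the sign, so positives and negatives no longer offset.
squared_differences = [d ** 2 for d in differences]
variance = sum(squared_differences) / len(weights)  # divide by N (population)
standard_deviation = math.sqrt(variance)

print(f"mean               = {mean:.3f}")
print(f"variance           = {variance:.4f}")
print(f"standard deviation = {standard_deviation:.4f}")
```

With these numbers the mean is 2.0 kg, the variance is 0.02 and the SD is roughly 0.14 kg, so a typical unit is about 0.14 kg away from the average.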


The formula changes if you have a sample, rather than a population. If you have data for the entire population, then you divide by the total number (N), while if you have a sample then you divide by the total number less one (N - 1). This is a correction for when your data is only a sample: the differences are measured from the sample's own mean, which sits closer to the sample values than the true population mean does, so dividing by N would tend to understate the population's variability. Note that the larger the sample, the smaller the difference between dividing by N and by N - 1, and the closer the sample SD is to the population SD - which is intuitive, the more data you have the better. In most circumstances you have a sample of data and are interested in the standard deviation of the population it came from, so you use the sample standard deviation. Note that in statistics, the population SD is a parameter (ie a fixed, usually unknown, value) while the sample SD is a statistic (ie an estimate calculated from the data).
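
To make the N versus N - 1 difference concrete, here is a short sketch using Python's built-in `statistics` module, where `pstdev` divides by N and `stdev` divides by N - 1. The weights are hypothetical illustration data, not figures from a real process.

```python
import math
import statistics

# Hypothetical data: five weights in kg.
weights = [2.1, 1.9, 2.0, 2.2, 1.8]

# If these five values ARE the whole population, divide by N.
population_sd = statistics.pstdev(weights)

# If they are a sample from a larger population, divide by N - 1.
sample_sd = statistics.stdev(weights)

print(f"population SD (divide by N):     {population_sd:.4f}")
print(f"sample SD     (divide by N - 1): {sample_sd:.4f}")

# For the same data, the two results differ by a factor of sqrt(N / (N - 1)),
# which shrinks towards 1 as the number of values grows.
for n in (5, 50, 500):
    ratio = math.sqrt(n / (n - 1))
    print(f"N = {n:>3}: sample SD is {ratio:.4f} x the population-formula SD")
```

The sample figure is always the larger of the two, but with a few hundred values the gap becomes negligible.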



What is a Normal Distribution, and why does it matter? A Normal Distribution (also called a Bell Curve) is one where: 1) mean = median = mode (ie most values cluster in the middle), 2) there is symmetry around the middle, and 3) 50% of values are less than the mean and 50% are greater than the mean. This is an important concept as many things closely follow a normal distribution - eg heights of people, IQ scores, errors in measurements, stock returns (although they often have fatter tails than a normal distribution), etc… Assuming that data is normally distributed, you can expect about 68% of the values to be within plus or minus 1 standard deviation of the mean, about 95% within 2 standard deviations, and about 99.7% within 3 standard deviations. This then gives confidence levels for how likely a value is to be within different distances from the mean - eg how likely are we to see stock valuations at this level, or a measurement error of a certain size (very important for manufacturers).

Source: By Dan Kernler - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=36506025
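
The 68% / 95% / 99.7% figures can be checked with a few lines of Python. This is a sketch using `NormalDist` from the standard `statistics` module (Python 3.8 or later); the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

# A normal distribution with mean 0 and SD 1, so distances are already in SDs.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
```

The results are roughly 68.3%, 95.4% and 99.7%, which is where the commonly quoted rounded figures come from.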

What are Z-scores? A Z-score, or standard score, is the number of standard deviations a value is away from the mean. To convert a value to a Z-score you just subtract the mean, and then divide by the standard deviation - this is often called standardizing. Note that a Z-score of zero indicates that the value is the same as the mean. As an example of why this is useful, if you are manufacturing a product that needs a certain level of precision in terms of size or weight, you can fail any output that is not within a specified Z-score (eg more than 1 SD from the mean). Using the same example, if your product needs to be at least a certain weight (eg 2kg, etc…) you can adjust the machine so that producing below this is very unlikely (eg set the average so that the minimum weight is 3 SDs below the mean, which leaves a chance of about 0.15%, as you only care about going under the weight so (100% - 99.7%)/2). If you want to lower the chance of going below this weight further, you then have the option to either: 1) increase the average weight, or 2) make the machine more accurate (ie reduce the standard deviation).
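
The weight example can be sketched in Python as follows. The minimum weight of 2kg comes from the example above, but the machine SD of 0.01kg and the helper `z_score` are assumptions made up for illustration, and the sketch assumes the machine's output is normally distributed.

```python
from statistics import NormalDist

# Assumed machine settings, in kg (illustrative values only).
minimum_weight = 2.0
machine_sd = 0.01
target_mean = minimum_weight + 3 * machine_sd  # put the minimum 3 SDs below the mean

def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

z_at_minimum = z_score(minimum_weight, target_mean, machine_sd)

# Chance of a unit coming out under the minimum weight (one tail only).
chance_underweight = NormalDist(target_mean, machine_sd).cdf(minimum_weight)

# Option 2 from the text: keep the same average but halve the machine's SD.
better_sd = machine_sd / 2
z_with_better_machine = z_score(minimum_weight, target_mean, better_sd)
chance_with_better_machine = NormalDist(target_mean, better_sd).cdf(minimum_weight)

print(f"Z-score of the minimum weight: {z_at_minimum:.1f}")
print(f"chance of an underweight unit: {chance_underweight:.3%}")
print(f"with half the SD, Z-score:     {z_with_better_machine:.1f}")
print(f"with half the SD, chance:      {chance_with_better_machine:.7%}")
```

The exact one-tail chance at 3 SDs is about 0.13%, slightly below the 0.15% obtained from the rounded 99.7% figure. Halving the SD moves the minimum weight to 6 SDs below the mean, which makes an underweight unit far rarer without using any extra material.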



What is statistical significance - Z-scores vs P-values? Statistical significance is about judging whether a relationship between variables is likely to be more than just chance - it can be a fairly complex area, and will be covered in more detail in a later post. To understand it, you use a null hypothesis - the assumption that there is no relationship - which you'll want to reject. The Z-score is a test statistic which helps to decide whether you should reject the null hypothesis, while the P-value is the probability of seeing a result at least as extreme as yours if the null hypothesis were true (ie purely by chance). Consequently, a large (positive or negative) Z-score is associated with a low P-value. There is always a chance that the null hypothesis is right and that there is no relationship (ie any relationship you've found is just chance), and consequently you must decide what significance/confidence level you need - or put another way, the risk of a false positive you're willing to accept. A common confidence level used is 95%, which corresponds to a Z-score of +/- 1.96 and a P-value of 0.05 (for a two-sided test). If the Z-score is between -1.96 and +1.96 then the P-value is above 0.05 and you cannot reject the null hypothesis. If the Z-score is beyond +/- 1.96 (ie above +1.96 or below -1.96) then the null hypothesis can be rejected at this confidence level. This is useful in many areas, including quality control at manufacturers and sampling errors in marketing surveys.
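
The link between a Z-score of 1.96 and a P-value of 0.05 can be reproduced with a short sketch. It assumes a two-sided test (ie extreme results in either direction count) and a normally distributed test statistic; the function name `two_sided_p_value` is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

def two_sided_p_value(z):
    """Chance of a Z-score at least this far from zero, in either direction,
    if the null hypothesis is true."""
    return 2 * (1 - standard_normal.cdf(abs(z)))

# Z-score that leaves 2.5% in each tail, ie 95% in the middle.
critical_z = standard_normal.inv_cdf(0.975)
print(f"critical Z-score for 95% confidence: {critical_z:.2f}")

for z in (1.0, 1.96, 3.0):
    p = two_sided_p_value(z)
    decision = "reject" if p < 0.05 else "cannot reject"
    print(f"Z = {z:>4}: P-value = {p:.4f} -> {decision} the null hypothesis")
```

A Z-score of 1.0 gives a P-value of about 0.32, well above 0.05, while a Z-score of 3.0 gives about 0.003, comfortably below it.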
