Synthenomics: Inaccurate Estimation in a Gaussian World

Thursday, August 2, 2012

Inaccurate Estimation in a Gaussian World

How accurate is estimation in a Gaussian world?

Critics of finance lavish lots of attention on "fat-tailed distributions" and how they call into question the way finance deals with low-probability events. Nassim Nicholas Taleb is particularly angered by the way financiers use the Gaussian bell curve, affectionately known as the Great Intellectual Fraud, to "predict" and optimally hedge bets. While sympathetic to this argument, I still find the Gaussian bell curve a useful tool for demonstrating how fragile and unpredictable low-probability events are even in a normal world. Working on the problem is also a convenient opportunity to orient myself in R, a statistical package that I will likely be using in my undergraduate research at the University of Michigan this fall.

So here's the problem I want to look at:

Given a sample from a normally distributed population:
1) How accurate an estimate of the population standard deviation is the sample standard deviation?
2) As a result of (1), by how much do you over- or underestimate the probability of tail events?
3) How does the over- or underestimation change as the event you're trying to estimate becomes more extreme (i.e., a higher-sigma event)?

I've previously written about (2), but the point of this post is to run the full simulation and look at the results.

To start, I generate a 500-element population, which has both a standard deviation and a mean of 1.

From here, I take 1000 samples of 50 elements each, and from each sample I calculate the standard deviation. Below is the distribution of the percent errors of those standard deviation estimates.

As you can see, in spite of the fact that the standard deviation was measured 1000 times, there's still a substantial amount of spread in the distribution of standard deviations. While they average a 3% underestimate, they range from an underestimation of around 40% to an overestimation of over 20%.
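
For readers who want to try this themselves, here is a minimal Python sketch of the sampling procedure described above. The original simulation was run in R and its code isn't shown, so the seed, the choice to sample without replacement, and the use of the n-1 standard deviation are all assumptions, and the exact numbers will differ from the figures quoted in this post.

```python
import random
import statistics


def simulate_sd_errors(pop_size=500, n_samples=1000, sample_size=50, seed=0):
    """Return the population SD and the percent error of each sample SD."""
    rng = random.Random(seed)
    # Population drawn from a normal distribution with mean 1 and SD 1.
    population = [rng.gauss(1.0, 1.0) for _ in range(pop_size)]
    # SD of the realized 500-element population (divides by N).
    pop_sd = statistics.pstdev(population)
    errors = []
    for _ in range(n_samples):
        # Assumption: each sample is drawn without replacement.
        sample = rng.sample(population, sample_size)
        # Assumption: sample SD uses the n-1 denominator.
        s = statistics.stdev(sample)
        errors.append(100.0 * (s - pop_sd) / pop_sd)
    return pop_sd, errors


pop_sd, errors = simulate_sd_errors()
print(f"population SD: {pop_sd:.3f}")
print(f"mean error: {statistics.mean(errors):+.1f}%")
print(f"min / max error: {min(errors):+.1f}% / {max(errors):+.1f}%")
```

Changing the seed regenerates the population, which is an easy way to see how much the spread depends on the particular population drawn.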

This spread in the standard deviation estimate is particularly worrisome when one starts estimating the probability of low probability tail events. Below is the ratio of the actual left tail probability of a 3 sigma event divided by the estimated probability based on the standard deviation estimates. A big number implies an underestimation of the tail risk, while a small number implies an overestimation of the tail risk. Note the skew.

So when we're talking about a 3 sigma event, there's a very sizable risk of a dramatic underestimate. While most of the ratios are concentrated around 1.14, which corresponds to an underestimate of roughly 12%, the ratio can go up to around 15, suggesting that risks can be massively underestimated even in a Gaussian world. Astute readers may note that the estimation error distribution looks log-normal, which at least has finite variance. But if a normal population can lead to a log-normal distribution of estimation error, then a fatter-tailed population leaves even more potential for underestimation. As pictured below, with a log-normal population the average underestimate is even higher, at around 5 times, while the tail is even fatter.
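
The ratio discussed above can be sketched in Python as follows. This is only one plausible reading of the calculation: it assumes the mean is known, that the "3 sigma event" is a draw more than 3 population standard deviations below the mean, and that the estimated probability simply swaps in the sample standard deviation. The function name `tail_ratio` and the seed are illustrative, not taken from the original R code.

```python
import math
import random
import statistics


def left_tail(z):
    """Standard normal probability of a value below -z (accurate far in the tail)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tail_ratio(true_sd, est_sd, k=3.0):
    """Actual / estimated probability of a k-sigma left-tail event."""
    actual = left_tail(k)
    # The event sits k true SDs below the mean; measured in estimated SDs
    # that distance becomes k * true_sd / est_sd.
    estimated = left_tail(k * true_sd / est_sd)
    return actual / estimated


rng = random.Random(0)
population = [rng.gauss(1.0, 1.0) for _ in range(500)]
pop_sd = statistics.pstdev(population)

ratios = []
for _ in range(1000):
    s = statistics.stdev(rng.sample(population, 50))
    ratios.append(tail_ratio(pop_sd, s))

print(f"median ratio: {statistics.median(ratios):.2f}")
print(f"largest ratio: {max(ratios):.1f}")
print(f"share underestimated: {sum(r > 1 for r in ratios) / len(ratios):.2f}")
```

For instance, `tail_ratio(1.0, 0.8)` comes out at roughly 15, so under these assumptions a sample standard deviation that is 20% too low is already enough to understate the 3 sigma tail probability about fifteenfold.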

Now that we've explored part 2, we can look at part 3: how does this change at different levels of standard deviation? I generate a population with a given standard deviation and collect data on it as per the sampling procedure above. I do this over and over again for different populations at different standard deviation levels. Surprisingly, I don't get a robust result for the estimation error, as it's highly dependent on the population generated. However, most of the time I do get the general trend for the mean of the log of the error ratio, pictured below. Note that it becomes more and more negative, suggesting that the mean underestimation goes down as the severity of the tail event increases. This is likely because of the way the skewness works out with the over- and underestimation of the standard deviation.

But is this necessarily good news? Not really. While the mean might be tending towards less underestimation, the proportion of underestimated risk stays relatively constant.

Meanwhile, the mean value of the underestimation, given that the risk is underestimated, shoots up.

From these graphs, we can see that the average magnitude of the underestimation is increasing very quickly, while the proportion of underestimation stays relatively constant. The magnitude is actually increasing faster than it looks because the y-axis is the log of the ratio. Combined with the fact that the proportion of underestimation is staying constant, this implies the distribution is getting flatter and more dispersed, opening the possibility of catastrophic loss. The mean underestimation likely understates the actual damage that would result, as a single extremely bad underestimation can bankrupt the firm, along with possibly the rest of the industry.
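
The three summaries discussed above (the mean log ratio, the share of underestimates, and the mean log ratio given an underestimate) can be computed across sigma levels with a sketch like the one below. It reuses one set of standard deviation estimates for every sigma level, which is an assumption; the post instead regenerates populations and notes that its results depend heavily on the population drawn, so the numbers here need not match the figures.

```python
import math
import random
import statistics


def left_tail(z):
    """Standard normal probability of a value below -z (accurate far in the tail)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def log_ratios(sd_estimates, true_sd, k):
    """log(actual / estimated) left-tail probability for a k-sigma event."""
    return [
        math.log(left_tail(k)) - math.log(left_tail(k * true_sd / s))
        for s in sd_estimates
    ]


rng = random.Random(0)
population = [rng.gauss(1.0, 1.0) for _ in range(500)]
pop_sd = statistics.pstdev(population)
sd_estimates = [
    statistics.stdev(rng.sample(population, 50)) for _ in range(1000)
]

summary = {}
for k in range(1, 6):
    logs = log_ratios(sd_estimates, pop_sd, k)
    # A positive log ratio means the tail risk was underestimated.
    under = [x for x in logs if x > 0]
    summary[k] = (
        statistics.mean(logs),      # mean log ratio
        len(under) / len(logs),     # share of underestimates
        statistics.mean(under),     # mean log ratio given an underestimate
    )
    print(k, [round(v, 3) for v in summary[k]])
```

In this setup the ratio exceeds 1 exactly when the sample standard deviation falls below the population standard deviation, so the share of underestimates is the same at every sigma level by construction, while the conditional mean of the log ratio grows with the sigma level.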

These graphs show in great detail how insufficient risk measurement techniques like VaR or even ES are. Small probabilities are hard to estimate, even in a normal, Black-Swan-free world. But add some grey swans, fragile balance sheets, and large banks, and it's a time bomb that's just waiting to explode.

6 comments:

Blue Aurora, August 2, 2012 at 10:26 AM
Yi-chuan, have you ever heard of the Cauchy distribution? It's the only probability distribution that fits the time series data for financial markets. It's also very difficult to work with mathematically, given that it has infinite variance.

The late mathematician Benoit Mandelbrot discusses his fractal view of markets in his book for a popular audience, The (Mis)behavior of Markets. I think that you ought to read it, as it would suit your interests. The book also has a lot of explanatory power with regard to the financial markets.

http://www.amazon.com/The-behavior-Markets-Benoit-Mandelbrot/dp/0465043550

Adam, August 3, 2012 at 9:54 AM
What estimator are you using for your standard deviation? It looks like it might be a biased estimator to me. (Remember, divide by N-1, not N.)

mc7447a, August 6, 2012 at 10:27 AM
In the Gaussian case, the sample variance has a Chi-square distribution. The sample stdev also has a known distribution: http://mathworld.wolfram.com/StandardDeviationDistribution.html

In the general case you can get an asymptotic distribution of the sample variance or stdev via the Delta method (or direct Taylor expansion.)