MangoJ
  • Threads: 10
  • Posts: 905
Joined: Mar 12, 2011
April 28th, 2013 at 4:45:46 PM permalink
If one knows the sample mean and sample variance, the best approximation to the real probability distribution would be a Gaussian with that mean as its center and that variance as its squared width.

But what if the sample skewness and sample kurtosis are also known? What would be the best approximating probability distribution?
dwheatley
  • Threads: 25
  • Posts: 1246
Joined: Nov 16, 2009
April 28th, 2013 at 6:16:31 PM permalink
Both sample skewness and sample kurtosis are biased estimators, so I don't think there is a good answer. However, you can use a maximum likelihood method to fit a generalized normal distribution.
Wisdom is the quality that keeps you out of situations where you would otherwise need it
kubikulann
  • Threads: 27
  • Posts: 905
Joined: Jun 28, 2011
April 29th, 2013 at 7:42:56 AM permalink
Define "best".

Your first sentence is not true: the Gaussian bell shape is not the "best" approximation for many samples of data.
Depending on the criterion of "goodness", there are infinitely many answers.

Now, if you expected your data to be Gaussian-distributed, but your sample skewness and kurtosis do not fit the estimated curve, then it is your initial assumption that is to be rejected: the data are not Gaussian.
You need to ask yourself why you made that assumption to begin with, then what better assumption you can make.
Reperiet qui quaesiverit
MangoJ
  • Threads: 10
  • Posts: 905
Joined: Mar 12, 2011
April 29th, 2013 at 12:57:04 PM permalink
Thanks for the input, guys. Actually I have unbiased estimates of the third and fourth cumulants. Originally I thought these were called "skewness" and "kurtosis", but those two are the cumulants divided by the corresponding power of the variance, and hence no unbiased estimator exists for them. My bad.

So unbiased estimates of the first four cumulants are known.
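For readers who want to reproduce such estimates: the unbiased estimators of the first four cumulants are usually called k-statistics. Below is a minimal sketch in plain Python, assuming the data are i.i.d. samples in a list; the function name `k_statistics` is made up for illustration and is not from this thread.

```python
from typing import List, Tuple


def k_statistics(samples: List[float]) -> Tuple[float, float, float, float]:
    """Return k1..k4, the unbiased estimators of the first four cumulants."""
    n = len(samples)
    if n < 4:
        raise ValueError("need at least 4 samples to compute k4")

    mean = sum(samples) / n
    # Central sample moments m_r = (1/n) * sum((x - mean)^r)
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n

    k1 = mean
    k2 = n * m2 / (n - 1)  # the usual unbiased sample variance
    k3 = n**2 * m3 / ((n - 1) * (n - 2))
    k4 = (
        n**2
        * ((n + 1) * m4 - 3 * (n - 1) * m2**2)
        / ((n - 1) * (n - 2) * (n - 3))
    )
    return k1, k2, k3, k4
```

With only a few hundred samples, k3 and especially k4 are likely to be noisy for long-tailed data, so the values are best treated as rough.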

What is meant by "best" approximation: in the limit of large numbers N -> infinity (for mean-free variables), the scaling properties of cumulants make all but the second cumulant vanish (unless they diverge in the first place). Hence the central limit theorem always gives the Gaussian distribution as the limit. Now the higher cumulants vanish faster with N than the lower ones, so one could ask: what (limiting) probability distribution has all cumulants beyond a certain order exactly zero? If only the first few cumulants are known, that distribution, or at least an expansion in cumulants, is what I would call "best".
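The scaling referred to here can be written out as follows, assuming i.i.d. mean-free variables X_1, ..., X_N with finite cumulants and the normalized sum S_N (the symbols S_N and \kappa_n are notation introduced for this sketch):

```latex
S_N = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} X_i ,
\qquad
\kappa_n(S_N) = N \cdot N^{-n/2}\, \kappa_n(X) = N^{\,1 - n/2}\, \kappa_n(X) .
```

So \kappa_2 is unchanged, \kappa_3 decays like N^{-1/2}, and \kappa_4 decays like N^{-1}: each higher cumulant vanishes faster than the one before it.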

As I understand it, cumulants are directly related to the probability distribution: they are the Taylor series coefficients of the logarithm of the distribution's Fourier transform. With only the first four cumulants non-zero, the logarithm would be a polynomial of degree 4, hence the Fourier transform would be of the form exp(polynomial of degree 4). One would simply need the inverse Fourier transform as the "best" probability distribution, although it might not be positive everywhere. What would be the general solution?
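Written out, with \varphi(t) for the Fourier transform (characteristic function) and \kappa_n for the cumulants, both symbols being notation assumed for this sketch, the truncation described above reads:

```latex
\log \varphi(t) = \sum_{n \ge 1} \kappa_n \frac{(it)^n}{n!}
\quad\Longrightarrow\quad
\varphi_4(t) = \exp\!\left( i\kappa_1 t - \frac{\kappa_2 t^2}{2}
  - \frac{i\kappa_3 t^3}{6} + \frac{\kappa_4 t^4}{24} \right),
\qquad
\left|\varphi_4(t)\right| = \exp\!\left( -\frac{\kappa_2 t^2}{2} + \frac{\kappa_4 t^4}{24} \right).
```

One caveat that follows directly from the last expression: for \kappa_4 > 0, which is the typical sign for long-tailed data, the modulus grows without bound as |t| increases, so the inverse Fourier integral does not converge in that case.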
kubikulann
  • Threads: 27
  • Posts: 905
Joined: Jun 28, 2011
April 30th, 2013 at 2:16:19 AM permalink
Hmm... Either I'm outwitted, or your problem is not clear.
What are you trying to estimate? There must be a random variable somewhere. Are you looking for estimates of its parameters, or for the whole distribution shape?

It is the SUM (or the average) of i.i.d. random variables that tends to a Gaussian, not the original distribution.
Hence, the statistics "sum of X's" and "sum of X²'s" are sufficient only for estimating the first and second moments, not the distribution.
And nothing like it is available for higher moments. You must resort to maximum-likelihood methods, or Bayesian methods.

Thus you must have some sort of a priori idea of what the shape of the original (individual) distribution is.

Sorry. Can't be more specific without the particulars.
Reperiet qui quaesiverit
MangoJ
  • Threads: 10
  • Posts: 905
Joined: Mar 12, 2011
April 30th, 2013 at 10:15:07 AM permalink
It's about sports: I'm looking for a heuristic model for a certain type of event. The bets are competitive against other players, so some rough model will be better than no model at all (i.e. individual guessing).

There are a few hundred samples available, indicating a rather long-tailed distribution, i.e. there are events many times the sample standard deviation away from the mean. Obviously I can't produce many more samples, but I can calculate the sample cumulants. For the model I would like to use a probability distribution which reproduces the first 4 cumulants as estimated from the data. (If I restricted myself to only the first 2 cumulants I would choose a Gaussian for this model, but a Gaussian cannot reproduce the observed long-tail events.) Higher-order cumulants are especially sensitive to long tails, so I want to look more in this direction.

If that's not possible, I will probably stick to a Hermite-Gaussian basis, and match the first 3 vectors to reproduce the cumulants.
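One common form of such a Hermite-Gaussian expansion is the Gram-Charlier A series truncated after the fourth cumulant. Whether this is exactly the basis meant above is an assumption. The sketch below uses a hypothetical function name `gram_charlier_pdf` and takes the four cumulant estimates as inputs; as already suspected earlier in the thread, the result can go negative for some inputs, so it is not guaranteed to be a valid density.

```python
import math


def gram_charlier_pdf(x: float, k1: float, k2: float, k3: float, k4: float) -> float:
    """Gram-Charlier A series truncated after the fourth cumulant.

    k1..k4 are the first four cumulants (mean, variance, third, fourth).
    The returned value may be negative for some x when k3 or k4 is large.
    """
    sigma = math.sqrt(k2)
    z = (x - k1) / sigma

    skew = k3 / sigma**3         # standardized third cumulant
    excess_kurt = k4 / sigma**4  # standardized fourth cumulant

    # Probabilists' Hermite polynomials He3 and He4
    he3 = z**3 - 3 * z
    he4 = z**4 - 6 * z**2 + 3

    gauss = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
    return gauss * (1 + skew / 6 * he3 + excess_kurt / 24 * he4)
```

By orthogonality of the Hermite polynomials, this function integrates to 1 and has mean k1, variance k2, third central moment k3 and fourth central moment 3*k2**2 + k4, so it does reproduce the four input cumulants even where it dips below zero.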