Poll: ten options, of which three received 1 vote each (33.33%) and the rest none.
3 members have voted
My question concerns the overall quality of an average if the lowest and highest data points are removed from the sample. I believe this is how judging works at the Olympics in events like figure skating.
As a concrete example, I created 1,000 sets of 10 random variables drawn from the standard normal distribution and compared the average of all 10 with the average of the middle 8. The results are inconclusive; I think I need a sample size in the millions.
What are your thoughts? The question for the poll is: with standard normal variables, do you prefer the straight mean or a trimmed mean? Multiple votes allowed.
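A minimal sketch of that experiment, assuming NumPy and an arbitrary seed (the exact setup isn't specified beyond the post above):

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility

n_sets, n = 1_000, 10
samples = rng.standard_normal((n_sets, n))

full_means = samples.mean(axis=1)
# Sort each set and drop its lowest and highest value, leaving the middle 8.
trimmed_means = np.sort(samples, axis=1)[:, 1:-1].mean(axis=1)

# The true mean is 0, so |estimate| is each estimator's error.
print("mean |error|, all 10  :", np.abs(full_means).mean())
print("mean |error|, middle 8:", np.abs(trimmed_means).mean())
```

With only 1,000 sets the two estimators come out nearly indistinguishable, which matches the "inconclusive" result above; pushing n_sets into the millions is a one-line change.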
For example, the super niche statistic of "typical amount of brisket for lunch eaten by Americans in April 2023" would be skewed by both Jews celebrating Passover and Muslims observing Ramadan, justifying a trimmed mean.
To give another example, arm length among Americans should follow a normal distribution, but amputees would change the data sample a lot. However, additionally trimming off people with exceptionally long arms might not be fair in this situation.
Whereas when measuring the "average time from sunrise to sunset" at a given location, a trimmed mean would never be justified.
I was doing an experiment in a lab, measuring the responses of some devices to a pulse of radiation in order to characterize them. For one of them, the data (the ratio of the output signal to the input signal) was all over the place. So I did exactly that: tossed the really weird outliers and took the mean of the rest. No big deal, right?
Then I got cornered by some of my elders and betters, who seemed to take delight in educating this rookie the hard way! "What happened here? Why do you have these strange numbers?"
"Oh, just outliers. I didn't include them in the results."
"What do you mean, outliers? Isn't it data you took?"
"Sure."
"And you designed this experiment, right?"
"Yes."
"Didn't you choose the equipment, calculate the expected precision and accuracy of the results, and take all the data yourself?"
"Yes."
"Then why do you have outliers?"
"No idea."
"In that case, you don't understand your experiment. This setup is doing something you did not intend for it to do, therefore nothing about it can be trusted. You have to figure out what's going on and fix it."
And I thought about that for a while. Makes sense. I know what the precision of this equipment is supposed to be, and if I'm getting errors outside of that then something is wrong. It could be malfunctioning, I could be using it wrong, I could have done my calculations wrong... or I have just discovered something new! Whatever it is, I've got to track it down.
It turned out to be just an alignment problem with some mounts; the setup had to be assembled more precisely than I had realized for the results to be reliable, and that really is no big deal to remedy. But it was very educational: if you've done all the preparation and you're still getting "outliers," there's something going on that you don't understand or didn't know about when you started, and your experiment is not going to answer the exact questions you think it is. So you have to fix it if you want a reliable measurement.
In the case of the house purchase, I would have recalculated the average to exclude houses that are dilapidated, because the criterion was "houses in similar condition" and I assume the house you bought was not in a dilapidated condition. I would also exclude houses that were not sold through a realtor on the open market, because that could include sales between family members at a discount that does not reflect actual market prices. On the other side I would exclude houses that have some special history that will increase their value to some, like having been owned by a celebrity.
How does one factor in the fact that the price can never be less than zero, but there's basically no limit to the top price?
Quote: Wizard
About ten years ago I purchased a house from a friend. In negotiating the value we looked at comparable houses in the neighborhood. He wanted to remove the lowest recent sale because he said the house was in terrible shape and shouldn't depress the average. I said that was fine if I got to remove the highest recent sale from the average. This is known as a "trimmed mean."
My question concerns the overall quality of an average if the lowest and highest data points are removed from the sample. I believe this is how judging works at the Olympics in events like figure skating.
As a concrete example, I created 1,000 sets of 10 random variables drawn from the standard normal distribution and compared the average of all 10 with the average of the middle 8. The results are inconclusive; I think I need a sample size in the millions.
What are your thoughts? The question for the poll is: with standard normal variables, do you prefer the straight mean or a trimmed mean? Multiple votes allowed.
link to original post
And how does something like this affect the football over-under bet, where the score can't go under zero, but it's possible to go extremely high?
I collected some more data. One million sets of ten random standard normal variables. Here are the mean distances from the true mean.
Average of all 10 = 0.003450
Average of trimmed 8 = 0.003445
Difference = 0.000004
So, the trimmed 8 is very slightly better. However, I think this is within the margin of error.
When dealing with something that doesn't require an exact figure, I found this gives a better "average" than using 100% of the data.
It's a rule of thumb I was taught as a boy.
Quote: Wizard
I collected some more data. One million sets of ten random standard normal variables. Here are the mean distances from the true mean.
Average of all 10 = 0.003450
Average of trimmed 8 = 0.003445
Difference = 0.000004
So, the trimmed 8 is very slightly better. However, I think this is within the margin of error.
link to original post
I think it would be instructive to report the standard deviation of the all-10 vs. trimmed-8 estimates over the trials.
I’ve always thought the point of trimming was to reduce the variance of outcomes given a small sample size. Of course over time the variance should balance out as reflected in the averages you report above.
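A hedged sketch of that comparison, mirroring the million-set simulation above (the seed and setup are assumptions): report the spread of each estimator across trials, not just its average distance from the true mean.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
samples = rng.standard_normal((1_000_000, 10))

full = samples.mean(axis=1)
trimmed = np.sort(samples, axis=1)[:, 1:-1].mean(axis=1)

# For the straight mean of 10 standard normals, theory gives 1/sqrt(10) ~ 0.316.
print("std of all-10 means :", full.std())
print("std of trimmed means:", trimmed.std())
```

For normal data the trimmed estimator should show the slightly larger spread, since the sample mean is the minimum-variance estimator of a normal mean.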
Non-competitive homes should be eliminated as best as possible, or their values weighted to reflect how non-competitive they actually are.
Quote: Wizard
I collected some more data. One million sets of ten random standard normal variables. Here are the mean distances from the true mean.
Average of all 10 = 0.003450
Average of trimmed 8 = 0.003445
Difference = 0.000004
So, the trimmed 8 is very slightly better. However, I think this is within the margin of error.
link to original post
Your results from simulating a normal distribution sound about right. Now try some other distributions…
One example that I thought of (that won’t balance as nicely) is wait times for a bus. Yet another example like Axel’s where you can’t go below zero, but at the other end, the sky’s the limit.
Let’s say you have a bus that is scheduled to arrive every 15 minutes. You would need 3 buses arriving 10 minutes early to “balance” the one bus that’s 30 minutes late. Let’s say that these happen to be the most extreme data points in a random sample; do you throw out 2 data points (one high and one low)?
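To see how a skewed, bounded-below distribution like this breaks the symmetry, here is a hedged sketch; the exponential lateness model is an assumption for illustration, not anything specified in the post above:

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
# Assumed model: lateness is exponential with a 5-minute mean, shifted
# so a bus can be at most 5 minutes early but arbitrarily late.
lateness = rng.exponential(scale=5.0, size=(1_000_000, 10)) - 5.0

full = lateness.mean(axis=1)
trimmed = np.sort(lateness, axis=1)[:, 1:-1].mean(axis=1)

print("true mean lateness    :", 0.0)             # 5 - 5 by construction
print("avg of straight means :", full.mean())     # ~0: unbiased
print("avg of trimmed means  :", trimmed.mean())  # below 0
```

Because the right tail is much heavier than the left, the removed maximum sits, on average, much farther from the mean than the removed minimum, so the trimmed mean comes out biased low.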
Quote: camapl
Quote: Wizard
I collected some more data. One million sets of ten random standard normal variables. Here are the mean distances from the true mean.
Average of all 10 = 0.003450
Average of trimmed 8 = 0.003445
Difference = 0.000004
So, the trimmed 8 is very slightly better. However, I think this is within the margin of error.
link to original post
Your results from simulating a normal distribution sound about right. Now try some other distributions…
One example that I thought of (that won’t balance as nicely) is wait times for a bus. Yet another example like Axel’s where you can’t go below zero, but at the other end, the sky’s the limit.
Let’s say you have a bus that is scheduled to arrive every 15 minutes. You would need 3 buses arriving 10 minutes early to “balance” the one bus that’s 30 minutes late. Let’s say that these happen to be the most extreme data points in a random sample; do you throw out 2 data points (one high and one low)?
link to original post
The thing about real-world measurements is they aren't always independent. The same road conditions affecting one bus will be affecting many of them, and buses on a route are also interchangeable. When a bus on a 15-minute schedule is 30 minutes late, the problem can't be with that bus alone: if it were, the next bus would have gone around it, and no arrival would be more than 15 minutes late. So the problem has to be on the road rather than with one particular bus, and thus you can't treat the other buses as independent of the one you know is late.
How this applies to house prices is that the same problems affecting one house in a neighborhood could be present in others. So let's say a house is discounted because it's loaded with termites and has structural damage. You can be sure that property is not the only place in town with termites. It could be argued that it's fair to include that discounted price in the mean, because it represents the chance of you, too, having or getting termites and damage. On the other hand, termites can be inspected for, and regular inspection and extermination can prevent a serious infestation. That expense is already factored into the prices of the provably termite-free homes, so if you can prove what you are buying is also termite-free, it is fair to compare it only to the other termite-free homes.
The more interesting question is the effect on small samples. My instinct tells me that trimming is better, and trimming more is better, all the way to the point where the median is a better measure than the mean.
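That instinct is testable with a quick sketch (assumed setup: sets of 10 standard normals, with progressively heavier trimming down to the median):

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed
samples = np.sort(rng.standard_normal((1_000_000, 10)), axis=1)

estimators = {
    "mean of all 10": samples.mean(axis=1),
    "trimmed to 8":   samples[:, 1:-1].mean(axis=1),
    "trimmed to 4":   samples[:, 3:-3].mean(axis=1),
    "median (mid 2)": samples[:, 4:6].mean(axis=1),
}
for name, est in estimators.items():
    # True mean is 0, so the RMS of the estimates is the RMS error.
    print(f"{name}: RMS error = {np.sqrt((est ** 2).mean()):.4f}")
```

For the normal specifically, the untrimmed mean should come out best, since the sample mean is the efficient estimator there; the instinct has more force for heavy-tailed distributions, where the median can beat the mean.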
In the house example, there is certainly some nuance. While I still think that median is the best measure, you also have to take the particular house into consideration, because there is nothing that says that the particular house you are looking at is in average condition. A recently re-done roof and a new AC (I live in AZ; I assume that Nevada is similar) could easily be worth $20k or so in the first couple of years of ownership.
RE: judging in the Olympics, I think that is (at least partially) to mitigate the effects of bias and corruption, rather than the belief that one measure is statistically better than the other. In other words, it's trying to get around the fact that the scores may not be normally distributed around the "fair" score, rather than trying to come up with a good estimate of the middle of a normal distribution. Though the small sample size also leans things towards the median being the best score IMO, and some trimming is better than none.
When I used to work at (large tech company), they would regress each interviewer's scores to the mean when deciding whether to hire someone. (There were a lot of interviewers, but some are tougher than others, so if you are interviewed by 4-5 people you could easily get lucky or unlucky and draw particularly tough or easy interviewers.)
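The post doesn't say how that company actually did the adjustment; one plausible reading is z-scoring within each interviewer, sketched here with made-up names and data:

```python
from collections import defaultdict
from statistics import mean, stdev

# (interviewer, candidate, score) triples -- illustrative data only
ratings = [
    ("alice", "c1", 9), ("alice", "c2", 8), ("alice", "c3", 7),
    ("bob",   "c1", 5), ("bob",   "c2", 4), ("bob",   "c3", 3),
]

# Each interviewer's own mean and spread...
scores_by_interviewer = defaultdict(list)
for who, _, score in ratings:
    scores_by_interviewer[who].append(score)
stats = {who: (mean(s), stdev(s)) for who, s in scores_by_interviewer.items()}

# ...then express every score relative to its interviewer before averaging.
adjusted = defaultdict(list)
for who, candidate, score in ratings:
    mu, sigma = stats[who]
    adjusted[candidate].append((score - mu) / sigma)

for candidate, zs in sorted(adjusted.items()):
    print(candidate, round(mean(zs), 3))
```

With this made-up data, alice (an easy grader) and bob (a tough one) rank all three candidates identically once rescaled, so neither drags a candidate's average up or down.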

