dwheatley
dwheatley
  • Threads: 25
  • Posts: 1246
Joined: Nov 16, 2009
November 3rd, 2010 at 8:41:39 AM permalink
Suppose the true population supporting Angle is 45% (theta = .45). Before the election day, you poll over and over again. Every poll will give you a different result, and each poll can be used to create a CI. In our case, we used one poll to create the .48 - .56 CI. This poll did not happen to contain theta, but that's ok, since only 95% of polls will create a CI that does contain theta.

The key is that you can't use one poll to create an accurate prediction of the probability of winning. Even many polls would only give you an idea of what theta is.
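
The "95% of polls" claim can be checked directly rather than taken on faith. The sketch below is not from the thread: it assumes a sample size of 625 (the figure mentioned later in the discussion), the usual normal-approximation interval, and theta = .45 as in the example above. It sums the exact binomial probability of every poll outcome whose interval happens to contain theta.

```python
import math
from statistics import NormalDist


def binom_pmf(k, n, p):
    """Exact binomial probability of k successes in n trials (computed via logs)."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)


def wald_interval(k, n, level=0.95):
    """Normal-approximation CI for a proportion, given k successes in n trials."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = k / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def coverage(theta, n, level=0.95):
    """Probability that a poll of size n yields a CI that contains theta."""
    covered = 0.0
    for k in range(n + 1):
        low, high = wald_interval(k, n, level)
        if low <= theta <= high:
            covered += binom_pmf(k, n, theta)
    return covered


THETA = 0.45  # true support, as in the example above
N = 625       # assumed sample size, not stated in this post

cov = coverage(THETA, N)
print(f"Share of possible polls whose CI contains theta: {cov:.4f}")
```

The coverage is a property of the polling procedure as a whole. Any single interval, such as .48 - .56, either contains theta or it does not.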
Wisdom is the quality that keeps you out of situations where you would otherwise need it
DorothyGale
DorothyGale
  • Threads: 40
  • Posts: 639
Joined: Nov 23, 2009
November 3rd, 2010 at 8:47:09 AM permalink
Quote: dwheatley

The key is that you can't use one poll to create an accurate prediction of the probability of winning. Even many polls would only give you an idea of what theta is.

The whole point of polls (even one poll) is to predict the probability of winning. Otherwise, Nate Silver and Mr. Rasmussen would both be out of a job.

In the case of Nevada, Sharron Angle was a wholesale idiot, but she was also a populist idiot. I think it was the way the LVRJ question was asked that skewed the result: "Would you vote for someone who thinks murdering babies is okay, like Harry Reid, or would you vote for someone who has never murdered a baby, like Sharron Angle?"

--Ms. D.
"Who would have thought a good little girl like you could destroy my beautiful wickedness!"
Wizard
Administrator
Wizard
  • Threads: 1521
  • Posts: 27185
Joined: Oct 14, 2009
November 3rd, 2010 at 9:32:04 AM permalink
Quote: dwheatley

Suppose the true population supporting Angle is 45% (theta = .45). Before the election day, you poll over and over again. Every poll will give you a different result, and each poll can be used to create a CI. In our case, we used one poll to create the .48 - .56 CI. This poll did not happen to contain theta, but that's ok, since only 95% of polls will create a CI that does contain theta.

The key is that you can't use one poll to create an accurate prediction of the probability of winning. Even many polls would only give you an idea of what theta is.



I can agree that if the polling is done correctly and equally, then 95% of polls will contain the true a (where a is the true ratio of Angle votes, per my last post). So if two separate polls were conducted, they would likely have different ranges. However, I maintain that both could still have a 95% chance of being right, if considered independently (one at a time).

Consider this example. There is a bag with three marbles. One is red, one is blue, and the third is either red or blue with 50/50 chance. John draws one marble, notes it is red, puts it back, and leaves. Then Jane comes in, draws one, and notes it is blue. Before drawing, they both know the bag is equally likely to be RRB or RBB. John would say the probability the bag is RRB is 2/3. Jane would say the probability of RRB is 1/3. As long as they don't compare results, are not both correct?
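
The 2/3 and 1/3 figures follow from Bayes' rule, and a short sketch makes the arithmetic explicit. The function and variable names are illustrative only. The last line adds a case the example leaves open: what the answer would be if John and Jane did pool their two draws.

```python
from fractions import Fraction

# Prior: the bag is RRB or RBB with equal probability.
PRIOR = {"RRB": Fraction(1, 2), "RBB": Fraction(1, 2)}
# Chance of drawing red from each kind of bag.
P_RED = {"RRB": Fraction(2, 3), "RBB": Fraction(1, 3)}


def posterior_rrb(draws):
    """Posterior probability of RRB after independent draws with replacement."""
    weights = {}
    for bag, prior in PRIOR.items():
        likelihood = Fraction(1)
        for colour in draws:
            likelihood *= P_RED[bag] if colour == "red" else 1 - P_RED[bag]
        weights[bag] = prior * likelihood
    return weights["RRB"] / sum(weights.values())


john = posterior_rrb(["red"])             # John saw only a red marble
jane = posterior_rrb(["blue"])            # Jane saw only a blue marble
pooled = posterior_rrb(["red", "blue"])   # if they compared notes

print(john, jane, pooled)  # prints: 2/3 1/3 1/2
```

Each answer is correct given what that person has seen; the pooled answer returns to 1/2 because the two observations cancel.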
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
Doc
Doc
  • Threads: 46
  • Posts: 7287
Joined: Feb 27, 2010
November 3rd, 2010 at 10:24:50 AM permalink
Opening confession: this post is a distraction from the primary theme of this thread.

Wizard, your question about marbles in a bag reminded me about something from back about 1994. I actually taught a statistics course for one term -- the primary excuse for such behavior was that no one else who was available was willing to take it on. I did not know the material well enough to be teaching it, and most of what I knew back then I have forgotten now.

One class session I conducted an experiment with a bag containing several hundred marbles, some portion black and some white. I had each student reach in the bag, stir the marbles, and draw unseen a sample of (I think 5 or maybe 10) marbles from the bag. We recorded the number of black/white in the sample and returned the marbles to the bag. After we collected quite a few of these samples, we did some calculations about the likely distribution of black and white in the bag. We then considered the probable characteristics of other samples that we might draw: "If we draw a sample of ten marbles, what is the probability that there will be more than seven white ones?" We also considered confidence intervals in light of our knowledge of the distribution.

Finally, I asked a simple, three-part question: If I draw a single marble from the bag, what are the probabilities that it will be (a) black, (b) white, or (c) red? I think that the equations that we had been using indicated that the probability of "red" was zero, with a calculated uncertainty of zero. After we discussed that, I rummaged through the bag and pulled out the sole red marble that was in the bag. It had never been seen in any sample, but there was clearly a small but non-zero probability of selecting it in a sample of size 1. So the question for discussion became: "Is something wrong with our statistical equations or methods that led us to believe that there was no chance at all of finding a red marble?"
Wizard
Administrator
November 3rd, 2010 at 10:32:31 AM permalink
Quote: Doc

If I draw a single marble from the bag, what are the probabilities that it will be (a) black, (b) white, or (c) red? I think that the equations that we had been using indicated that the probability of "red" was zero, with a calculated uncertainty of zero. After we discussed that, I rummaged through the bag and pulled out the sole red marble that was in the bag. It had never been seen in any sample, but there was clearly a small but non-zero probability of selecting it in a sample of size 1. So the question for discussion became: "Is something wrong with our statistical equations or methods that led us to believe that there was no chance at all of finding a red marble?"



Wouldn't the maximum likelihood estimator for the number of red marbles be zero? I would have also said the probability of a red marble is zero, but would not have felt producing the red marble made me wrong. My answer of zero would have been the best I could do with the information available. Much as I don't think Angle's defeat yesterday made my 85% prediction for Angle incorrect.
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
Doc
November 3rd, 2010 at 11:41:07 AM permalink
Quote: Wizard

Wouldn't the maximum likelihood estimator for the number of red marbles be zero? I would have also said the probability of a red marble is zero, but would not have felt producing the red marble made me wrong. My answer of zero would have been the best I could do with the information available. Much as I don't think Angle's defeat yesterday made my 85% prediction for Angle incorrect.

I recall what bothered me most back when I ran that experiment in class was that not only did we estimate a zero probability for red, but we also had an equation (from the text, but I don't remember it now) that suggested there was zero uncertainty in that estimate. Clearly wrong.

I think that essentially the equations said that, based on our fairly large number of samples, the probability distribution for the number of "red" marbles had to be a mean of exactly zero with a variance of exactly zero. All this because we had never seen a red marble ever in the experiment. The conclusion would have been correct, if the question were about a yellow marble, because there truly were none of those in the population.

Of course, my little demonstration would have dropped like a lead balloon if one of the students had found the red marble in their initial sample. I did a number of other experiments intended to make students think more deeply about their statistical conclusions and assumptions rather than just take the answers casually, as we often do with poll results. I might have posted here previously about a dice experiment we conducted, or maybe about my deck of cards demonstration on day 1 of the class. I would ramble on about those here, but I have already distracted the thread enough, I suppose.
Wizard
Administrator
November 3rd, 2010 at 11:24:16 PM permalink
With no observations, wouldn't the variance of reds be undefined?
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
dwheatley
November 4th, 2010 at 6:58:09 AM permalink
I still haven't found a great explanation as to WHY you can't use a CI to predict the actual value of the parameter you are estimating, but this section I'm quoting from onlinestatbook.com takes a good shot:

"It is natural to interpret a 95% confidence interval as an interval with a 0.95 probability of containing the population mean. However, the proper interpretation is not that simple. One problem is that the computation of a confidence interval does not take into account any other information you might have about the value of the population mean. For example, if numerous prior studies had all found sample means above 110, it would not make sense to conclude that there is a 0.95 probability that the population mean is between 72.85 and 107.15. What about situations in which there is no prior information about the value of the population mean? Even here the interpretation is complex. The problem is that there can be more than one procedure that produces intervals that contain the population parameter 95% of the time. Which procedure produces the "true" 95% confidence interval? Although the various methods are equal from a purely mathematical point of view, the standard method of computing confidence intervals has two desirable properties: each interval is symmetric about the point estimate and each interval is contiguous. Recall from the introductory section in the chapter on probability that for some purposes, probability is best thought of as subjective. It is reasonable, although not required by the laws of probability, that one adopt a subjective probability of 0.95 that a 95% confidence interval as typically computed contains the parameter in question."

There's another section that shows how to calculate a CI from a poll (which I think you followed properly for your CI answer)

http://onlinestatbook.com/chapter8/proportion_ci.html
Wisdom is the quality that keeps you out of situations where you would otherwise need it
dwheatley
November 4th, 2010 at 7:18:46 AM permalink
Quote: Wizard

With no observations, wouldn't the variance of reds be undefined?



I think the temptation is to use the standard error estimate calculation, which uses the point estimate p = 0 as a parameter. The result is an estimated standard error of... 0. You can adjust the CI by .5/N, where N is the sample size. That might allow the CI to incorporate the possibility of 1.
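
A quick sketch of why the plug-in formula collapses, and of one standard alternative. Everything numeric here is an assumption: the thread never gives the total number of marbles drawn, so 200 is a made-up figure, and the draws are treated as independent. The upper bound shown is the exact one-sided limit for zero observed successes, which is a swapped-in technique and not the .5/N adjustment itself; for large N it is close to the familiar "rule of three", 3/N.

```python
import math


def wald_se(successes, n):
    """Plug-in standard error of a sample proportion."""
    p_hat = successes / n
    return math.sqrt(p_hat * (1 - p_hat) / n)


def upper_bound_zero_successes(n, alpha=0.05):
    """Largest p still consistent with seeing 0 successes in n draws.

    Solves (1 - p)**n = alpha for p, a one-sided (1 - alpha) upper limit.
    """
    return 1 - alpha ** (1 / n)


N_DRAWS = 200  # hypothetical total number of marbles drawn across all samples

se = wald_se(0, N_DRAWS)                      # collapses to exactly 0.0
continuity = 0.5 / N_DRAWS                    # the .5/N adjustment mentioned above
exact_upper = upper_bound_zero_successes(N_DRAWS)
rule_of_three = 3 / N_DRAWS

print(f"plug-in standard error : {se}")
print(f".5/N adjustment        : {continuity:.4f}")
print(f"exact 95% upper bound  : {exact_upper:.4f}")
print(f"rule-of-three bound    : {rule_of_three:.4f}")
```

Under these assumptions, zero red marbles in the samples is compatible with a small but positive share of red in the bag, which matches what happened in class.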
Wisdom is the quality that keeps you out of situations where you would otherwise need it
Doc
November 4th, 2010 at 7:53:08 AM permalink
Quote: Wizard

With no observations, wouldn't the variance of reds be undefined?

That certainly would make sense. And my point even back in 1994 was that the conclusions that we were reaching didn't make sense. If I could remember the specific equations from way back then or still had the textbook, I could likely be clearer about this. I don't now remember whether I decided that the text equation was flat out wrong in this situation or was just presented in a manner that led the reader to reach incorrect conclusions.

For some reason, when teaching that course just that one time, I placed a fair amount of emphasis on the risks of misapplication of statistics -- perhaps bogus statistics -- and how often that occurs. My other, perhaps-non-standard emphasis was on the risks of making unjustified, and usually unstated and unrecognized, assumptions. I found that those frequently were about the randomness of a sample.


Later edit:
Quote: dwheatley

I think the temptation is to use the standard error estimate calculation, which uses the point estimate p = 0 as a parameter. The result is an estimated standard error of... 0. You can adjust the CI by .5/N, where N is the sample size. That might allow the CI to incorporate the possibility of 1.

I did not see your post until after writing my comments above. This may indeed have been the error I/we made back then. I just cannot remember the details after such a long time -- only the concept.

Even later edit:
Comment deleted -- brain glitch.
Wizard
Administrator
November 4th, 2010 at 7:43:35 PM permalink
Following is an amended part of my upcoming "Ask the Wizard" answer. Any complaints?

I'm told it would be mathematically incorrect to phrase that as "Angle's share of all Angle/Reid votes has a 95% chance of falling between 48.08% and 56.17%." That was how I originally phrased my answer, but two statisticians recoiled in horror at my wording. They said I had to use the passive voice, and say that "48.08% and 56.17% will surround Angle's share with 95% probability." To be honest with you, it sounds the same to me. However, they stressed that the confidence interval is random and Angle's share is immutable, and that my original wording implied the opposite. Anyway, I hope the Bayesian statisticians out there will be satisfied with the second wording.
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
matilda
matilda
  • Threads: 3
  • Posts: 317
Joined: Feb 4, 2010
November 4th, 2010 at 10:01:51 PM permalink
Quote: Wizard

Following is an amended part of my upcoming "Ask the Wizard" answer. Any complaints?

I'm told it would be mathematically incorrect to phrase that as "Angle's share of all Angle/Reid votes has a 95% chance of falling between 48.08% and 56.17%." That was how I originally phrased my answer, but two statisticians recoiled in horror at my wording. They said I had to use the passive voice, and say that "48.08% and 56.17% will surround Angle's share with 95% probability." To be honest with you, it sounds the same to me. However, they stressed that the confidence interval is random and Angle's share is immutable, and that my original wording implied the opposite. Anyway, I hope the Bayesian statisticians out there will be satisfied with the second wording.



Please hold off on posting on "ask the wizard". I will write something tomorrow.
matilda
November 5th, 2010 at 7:37:59 AM permalink
The quote that you attribute to the statisticians is not what was said. You can either cite an actual quote or state that you are paraphrasing. Either way you should include the concept that the 95% probability refers to the probability of selecting the interval and not the probability of selecting Angle's share.

Where did the Bayesian comment come from? It doesn't make any sense. The construction of a confidence interval is part of "classical" statistics. I speculate that no true Bayesian would accept a classical interval no matter how it was described. But I might be wrong. Perhaps you can find out at

http://www.bayesian.org/

also read http://en.wikipedia.org/wiki/Credible_interval for a good statement of the meaning of a confidence interval in the second section.

Matilda
Wizard
Administrator
November 5th, 2010 at 8:45:39 AM permalink
Thank you for your comments. Maybe I had the Bayesians and classical statisticians reversed. Should I be calling them Bayesians and Frequentists? From my reading of your links, I tend to favor the Bayesian side. I never knew about the distinction before this thread. I had an Email exchange with an FSA actuary on this topic. Here were some of his remarks, which I drew from in my answer:

Quote: J.F.


Your statement is fine. What you have to keep in mind, however, and what usually crosses people up, is that it's the confidence interval that's random, not the mean. The mean is a parameter. The confidence interval is random because it depends on the random sample that is drawn. Thus, it's not that the mean has a 95 percent chance of falling into the confidence interval -- it's that the confidence interval has a 95 percent chance of bracketing the mean. I realize the distinction is hard to see in this case, but it's really important in other cases. To a Bayesian, by contrast, the mean is a random variable, and you're letting data tell you what it can about the location and uncertainty of the mean.

Your last question is sort of a famous one in statistics. To the classical statistician (as artificial as this may sound) before the (fair) die is tossed it has a 1/2 chance of being odd. After it is tossed, it has a 100 percent chance of being whatever it is. I realize that sounds crazy, but it's the kind of example Bayesians have been using to make classical statisticians nervous for many years now. A fascinating book which explores a number of these questions is Richard Royall's Statistical Evidence: A Likelihood Paradigm, which I like a lot.



And another one, in response to this question I asked, "Suppose you sit down at a blackjack table with a freshly shuffled 52-card deck. You get two nines and the dealer has an ace up, and the other card is face down. He offers insurance. What would the Bayesian say are the odds the face down card is a 10?"

Quote: J.F.


On pure probability problems like this example, the Bayesian and the classical statistician will have exactly the same answer, which I believe is 16/49. These aren't the problems that cause the answers to diverge. But card games like blackjack lie in a highly restricted universe of probability. These are events which are in theory replicable an infinite number of times and we can actually answer questions about the cases where there is a ten as a fraction of all infinite cases, which is the classical definition of probability and happens to align with the Bayesian's understanding of the deck. (A quick question: suppose I told you that Ricky Jay had dealt the cards. Do you still agree on 16/49? The Bayesian doesn't have to, though he might (at his great peril). And the classicist is required to imagine an infinite number of Ricky Jays dealing cards, which just seems stupid. But paradoxes can arise if he doesn't.)

By the way, an example Royall uses throughout is the following. Someone turns over the top card of a deck. It's the Ace of Spades. Which is more likely: a normal deck or a deck consisting of 52 Aces of Spades? And by how much?
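
Both card questions in the quoted remarks reduce to counting, so a small sketch can confirm them. It assumes an honestly shuffled single deck for the blackjack hand (no Ricky Jay). For the Ace of Spades it computes only the likelihood ratio between the two hypothetical decks, which says nothing about how plausible a trick deck was to begin with.

```python
from fractions import Fraction

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
TEN_VALUED = {"10", "J", "Q", "K"}

# Blackjack question: remove the three visible cards, then count the tens.
deck = [rank for rank in RANKS for _ in range(4)]   # suits do not matter here
for seen in ["9", "9", "A"]:
    deck.remove(seen)

tens_left = sum(1 for rank in deck if rank in TEN_VALUED)
p_hole_card_is_ten = Fraction(tens_left, len(deck))

# Ace of Spades question: how likely is that top card under each deck?
p_ace_normal_deck = Fraction(1, 52)   # one Ace of Spades among 52 cards
p_ace_trick_deck = Fraction(1)        # every card is the Ace of Spades
likelihood_ratio = p_ace_trick_deck / p_ace_normal_deck

print(p_hole_card_is_ten)   # prints: 16/49
print(likelihood_ratio)     # prints: 52
```

So the single observation favours the all-aces deck by a factor of 52. Whether anyone should believe in the trick deck afterwards depends on the prior, which is exactly where the two schools part ways.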

"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
Wizard
Administrator
November 5th, 2010 at 6:41:17 PM permalink
I just moved the discussion about college athletics to a thread titled Do colleges spend too much money on athletics?. It was largely my fault for getting off track, so please forgive me for any inconvenience.
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
Doc
November 5th, 2010 at 8:57:32 PM permalink
Thanks, Wizard, but it was indeed my fault. I would say that I will try not to let it happen again, but I know that I would fail to comply. Now back to your regularly scheduled programming.
dwheatley
November 6th, 2010 at 8:17:44 AM permalink
Math-content alert!

I've been idly thinking about our CI problem here. I think I have a better explanation for why we can't make a prediction with it:

We are trying to estimate a, the proportion of the population that supports Candidate A. There was some debate as to whether it's a random variable or an unknown constant. There is a true value of a that we could learn if we had the resources to poll everyone, so for now let's call it an unknown constant.

Remember, we are interested in the value of a given that we created a certain CI.

We can use conditional probability on CIs to try and calculate a. Let's see... Imagine we are in some multiverse, where a is actually a random variable alpha (that is, alpha has a mean and variance, whose value depends on which universe we are in). We conduct a poll. For a fixed sample size (say 1000 people), we will get a certain proportion favouring candidate A. This creates a certain CI, call it c. The probability of creating this particular CI in a universe where the true value of a is known is Pr( alpha = a ) * Pr ( poll creates CI c | alpha = a ).

In English, we can imagine we are in a certain universe where we know a, and then we create a poll. This creates the CI c with a certain probability. We can calculate the probability of creating that CI if we know a. Now, if we add up these probabilities across all possible values of a, we will get the total probability of creating our CI. This is important, because we are trying to find the probability we are in the universe where alpha = a, given that we created our CI. We could calculate this probability, if we knew how to calculate Pr( alpha = a ) .

There's the problem: if we pretend that a is the result of a random variable, we need to know its distribution to calculate these probabilities. But alpha could be uniform, or normal around .5, or some wonky distribution with no name. For each of these potential distributions, the probability alpha takes on certain values of a is different, affecting the probability we construct a certain CI. This in turn affects our calculation of a, given we created a certain CI.

In a roundabout way, I've argued we can't predict where a will fall without knowing its distribution. Since it isn't a random variable, it doesn't actually have a distribution, so we're stuck.
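
The dependence on Pr( alpha = a ) can be made concrete with a small grid calculation. This is only a sketch: the poll result (326 of 625) and both candidate distributions for alpha are invented for illustration, and none of the names come from the thread. The same poll is combined with two different distributions, and the probability that a exceeds .5 is computed for each.

```python
import math


def prob_a_above_half(prior_weight, k, n, grid_size=999):
    """P(a > 0.5 | k of n respondents favour A), for a given prior over a.

    prior_weight(a) returns an unnormalised weight for each grid value of a.
    """
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    log_like = [k * math.log(a) + (n - k) * math.log(1 - a) for a in grid]
    peak = max(log_like)  # rescale so the exponentials stay in range
    weights = [prior_weight(a) * math.exp(ll - peak)
               for a, ll in zip(grid, log_like)]
    total = sum(weights)
    return sum(w for a, w in zip(grid, weights) if a > 0.5) / total


def uniform_prior(a):
    """Every value of a equally plausible before the poll."""
    return 1.0


def sceptical_prior(a):
    """Normal-shaped weights centred on .45 with spread .02 (made up)."""
    return math.exp(-0.5 * ((a - 0.45) / 0.02) ** 2)


N = 625   # hypothetical sample size
K = 326   # hypothetical number favouring Candidate A

p_uniform = prob_a_above_half(uniform_prior, K, N)
p_sceptical = prob_a_above_half(sceptical_prior, K, N)

print(f"P(a > .5) with a uniform alpha  : {p_uniform:.3f}")
print(f"P(a > .5) with a sceptical alpha: {p_sceptical:.3f}")
```

The two answers differ widely even though the poll, and therefore the CI, is identical, which is the point of the argument above.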
Wisdom is the quality that keeps you out of situations where you would otherwise need it
Wizard
Administrator
November 6th, 2010 at 8:45:54 AM permalink
Quote: dwheatley

In a roundabout way, I've argued we can't predict where a will fall without knowing its distribution. Since it isn't a random variable, it doesn't actually have a distribution, so we're stuck.



I disagree. My college statistics book, which is now as worn as a missionary's bible, deals with exactly this kind of problem, where the true a is unknown. With small sample sizes you should use the T distribution instead of the normal, but otherwise set up the confidence interval as if the true mean were known. You do have to subtract one from the denominator in calculating the variance.

Maybe in my efforts to simplify the solution for the layman I've explained it badly, but I firmly believe there is a proper answer out there.
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
Doc
November 6th, 2010 at 9:09:15 AM permalink
So here is a comment from the guy who has already admitted that he only took one statistics course in his whole life:

Without reviewing all of the posts, I think, Wizard, that you did the analysis properly except for the issue with the wording of your answer (which matilda objected to). With a small sample, yes, the T distribution would be the correct one, but I think that the "professional" poll that you were basing your analysis upon had a large enough sample size that the difference between the normal and T distributions (and the -1 in the denominator) is typically ignored.

The question arises, of course, as to why the final voting results did not conform to your analysis. I think the likely possible explanations are the following:

(1) the sample used by the polling organization was not sufficiently random to adequately represent the entire population of Nevada,
(2) the group that actually showed up at the polling booths on election day was not representative of the entire population of Nevada or was at least not properly represented by the sample used by the poll,
(3) people sampled by the poll did not accurately express their opinions to the pollsters, either deliberately or unknowingly,
(4) people changed their minds after the poll and before election day
(5) in the end, people didn't actually vote the way they felt, e.g., "I guess I'll vote my usual party even though I don't like the guy."
(6) the final vote represented an example of the 5% that fell outside of the confidence limits of your analysis (perhaps a 2.5% tail?)

Can anyone identify one of the "usual suspects" that I have overlooked? Which one(s) of these sounds the most guilty?
SOOPOO
SOOPOO
  • Threads: 123
  • Posts: 11604
Joined: Aug 8, 2010
November 6th, 2010 at 10:31:33 AM permalink
Add weather: sometimes bad weather in a certain area (say, a Democratic stronghold) will decrease the turnout there compared to the sunny weather in a Republican stronghold, or vice versa.
matilda
November 6th, 2010 at 11:46:48 AM permalink
Quote: Doc

Without reviewing all of the posts, I think, Wizard, that you did the analysis properly except for the issue with the wording of your answer (which matilda objected to).



You are correct

Matilda
matilda
November 6th, 2010 at 12:42:02 PM permalink
Quote: Wizard

I disagree. My college statistics book, which is now as worn as a missionary's bible, deals with exactly this kind of problem, where the true a is unknown. With small sample sizes you should use the T distribution instead of the normal, but otherwise set up the confidence interval as if the true mean were known. You do have to subtract one from the denominator in calculating the variance.

Maybe in my efforts to simplify the solution for the layman I've explained it badly, but I firmly believe there is a proper answer out there.



Oh my---Here we go again. Please do not take offence--I do mean well.

With small samples, you may use the t distribution if and only if the population you are sampling from is normal or can reasonably be assumed to be normal. The sampling distribution called the t only exists theoretically if the population is normal. You subtract 1 from the sample size in calculating the variance in order to have an unbiased estimate of the population variance. Dividing by the sample size would return an estimate of the variance which has systematic bias. Bias means that the expected value of the estimated variance does not equal the population variance being estimated.
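
The bias can be shown exactly on a toy case. The sketch below uses a made-up four-value population and enumerates every possible sample of size 3 drawn with replacement, then averages the two variance estimates over all of them. The population and the names are illustrative only.

```python
from fractions import Fraction
from itertools import product


def population_variance(population):
    """True variance: mean squared deviation from the population mean."""
    mu = Fraction(sum(population), len(population))
    return sum((x - mu) ** 2 for x in population) / len(population)


def expected_variance_estimate(population, n, divisor):
    """Average of (sum of squared deviations / divisor) over all samples of size n."""
    samples = list(product(population, repeat=n))  # sampling with replacement
    total = Fraction(0)
    for sample in samples:
        mean = Fraction(sum(sample), n)
        squared_dev = sum((x - mean) ** 2 for x in sample)
        total += squared_dev / divisor
    return total / len(samples)


POPULATION = [0, 1, 1, 4]   # made-up values
N = 3                       # sample size

true_var = population_variance(POPULATION)
divide_by_n = expected_variance_estimate(POPULATION, N, N)
divide_by_n_minus_1 = expected_variance_estimate(POPULATION, N, N - 1)

print(true_var, divide_by_n, divide_by_n_minus_1)  # prints: 9/4 3/2 9/4
```

Dividing by the sample size undershoots by the factor (n - 1)/n on average, while dividing by n - 1 hits the population variance exactly. At n = 625 that factor is 624/625, which is why the correction is usually ignored for large polls.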

Now to the case at hand--the Angle-Reid poll confidence interval. The poll conducted is described by the multinomial distribution since three answers were allowed. Dorothy and the Wizard allocated the undecideds to Angle and Reid in order to apply the binomial distribution with only two possible answers and calculated the now famous CI we have discussed forever. In the analysis, the normal distribution was used to approximate the binomial distribution, which was the correct thing to do since the sample size of 625 was a large sample.

If the sample size was not sufficiently large, the normal could not be used. This means also that you could not use the t distribution because the population has to be normal to apply the t and we know that the population is binomial and not approximately normal with a small sample.
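
How much the sample size matters can be gauged by comparing the exact binomial distribution with its normal approximation. The sketch below assumes a true share of 0.5 and uses a continuity correction; both choices are mine, as is the small comparison sample of 10. It reports the largest gap between the two cumulative distributions.

```python
import math
from statistics import NormalDist


def max_normal_error(n, p):
    """Largest gap between the binomial CDF and its normal approximation."""
    approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    worst = 0.0
    running = 0.0  # exact binomial CDF, accumulated term by term
    for k in range(n + 1):
        running += math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        gap = abs(running - approx.cdf(k + 0.5))  # continuity-corrected
        worst = max(worst, gap)
    return worst


P = 0.5  # assumed true share, for illustration only

err_small = max_normal_error(10, P)
err_poll = max_normal_error(625, P)

print(f"largest CDF gap at n = 10 : {err_small:.5f}")
print(f"largest CDF gap at n = 625: {err_poll:.5f}")
```

With a share near one half the approximation is already decent for small samples and very close at 625. It degrades faster when the true share is near 0 or 1, which this sketch does not explore.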


The upshot of this is that if the sample size is so small that you cannot reasonably approximate the binomial with the normal, you cannot create a CI with the t or any of the usual parametric methods but have to rely on techniques in non-parametric statistics such as Chebyshev's Theorem.
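
As a rough illustration of the Chebyshev route, here is one way such an interval could be built. This is my sketch of the general idea, not a method taken from the thread: it uses the fact that a proportion's standard deviation can never exceed 0.5/sqrt(n), and the sample figures (13 of 25) are invented.

```python
import math


def chebyshev_interval(p_hat, n, level=0.95):
    """Conservative interval for a proportion from Chebyshev's inequality.

    Chebyshev: P(|p_hat - p| >= k * sd) <= 1 / k**2, with sd <= 0.5 / sqrt(n).
    """
    k = 1 / math.sqrt(1 - level)              # about 4.47 for a 95% level
    half_width = k * 0.5 / math.sqrt(n)       # worst-case standard deviation
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


N_SMALL = 25          # hypothetical small sample
P_HAT = 13 / N_SMALL  # hypothetical observed share

low, high = chebyshev_interval(P_HAT, N_SMALL)
print(f"Chebyshev 95% interval: {low:.3f} to {high:.3f}")
```

The price of assuming nothing about the distribution is width: the multiplier is roughly 4.47 standard deviations instead of 1.96, so with a sample this small the interval covers most of the range from 0 to 1.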


Matilda
Wizard
Administrator
November 6th, 2010 at 2:14:45 PM permalink
I fully admit I cut some corners in my answer, in the name of simplicity. In this case, I adhered to the KISS philosophy. I don't disagree with Matilda's comments. However, I think a perfectly done confidence interval would come out very close to mine. If anyone cares to dispute that, don't correct my math, show me specifically how YOU would do it. I'd like to see actual numbers. Otherwise, I propose we hammer a wooden stake in this thread.
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
Doc
November 6th, 2010 at 2:43:45 PM permalink
Quote: SOOPOO

Add weather: sometimes bad weather in a certain area (say, a Democratic stronghold) will decrease the turnout there compared to the sunny weather in a Republican stronghold, or vice versa.

As a closing (as suggested by the Wizard) comment, I will just note that I view SOOPOO's weather factor as a specific case of my category #2: non-representative group actually showing up at the voting booths.
DorothyGale
November 6th, 2010 at 3:09:43 PM permalink
Quote: Wizard

If anyone cares to dispute that, don't correct my math, show me specifically how YOU would do it. I'd like to see actual numbers.

Yep...

All the brown geese are gone from Coon's pond about a mile from here, fly'd south I suppose. The last time I surveyed, I only saw the more common gray geese, but who's to say... maybe someone did some clever goose shuffling ... after all, the number of brown geese is fixed, not variable, doesn't matter what my survey says ...

--Ms. D.
"Who would have thought a good little girl like you could destroy my beautiful wickedness!"
matilda
November 6th, 2010 at 3:42:11 PM permalink
Quote: Wizard

I don't disagree with Matilda's comments. However, I think a perfectly done confidence interval would come out very close to mine.



I think we all agree with that statement. In fact, it could be exactly the same as yours.

Quote: Wizard

I propose we hammer a wooden stake in this thread.



I agree with this also.

Bye-bye thread.

Matilda
Wizard
Administrator
November 6th, 2010 at 4:12:02 PM permalink
Doc, I liked your "six likely possible explanations" post. So much, I couldn't think of any good cross talk. It really is a valid topic of why the LVRJ poll was so off. The Sun, I think, should investigate. My best guess is it wasn't a representative sample, but that is getting off topic. For purposes of the thread, I assumed it was a good sample. I just might write a letter to the RJ about it.

Update, I just did. Here is what I wrote:

"Forgive me if I missed it, but I think a word of explanation is owed to the readers of the LVRJ of why their Mason-Dixon poll of the Senate race, published on Oct. 29, was so far off. As a reminder, the poll showed Angle leading Reid by 4 points, and a margin of error of 4 percent. In the end, Reid won by 6 points. According to my math, the probability of such a reversal is 1 in 150. Of course, it could have just been chance. However, I submit for your consideration that it was more likely a biased poll, capturing more likely Angle voters. Perhaps it was no coincidence that the RJ also endorsed Angle? Something smells fishy to me."


Quote: DorothyGale

All the brown geese are gone from Coon's pond about a mile from here, fly'd south I suppose. The last time I surveyed, I only saw the more common gray geese, but who's to say... maybe someone did some clever goose shuffling ... after all, the number of brown geese is fixed, not variable, doesn't matter what my survey says ...



I think Miss Gulch shot them.
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
Doc
Doc
  • Threads: 46
  • Posts: 7287
Joined: Feb 27, 2010
November 6th, 2010 at 4:58:59 PM permalink
Quote: Wizard

Doc, I liked your "six likely possible explanations" post. So much, I couldn't think of any good cross talk.

Thanks.
Quote: Wizard

It really is a valid topic of why the LVRJ poll was so off. ... I just might write a letter to the RJ about it.

Update, I just did.

There is an old saying that suggests you shouldn't get into an argument with someone who buys their ink by the barrel. How does that advice work these days in the case of arguments of print media vs. internet?

Quote: Wizard

Quote: DorothyGale

All the brown geese are gone from Coon's pond about a mile from here, fly'd south I suppose. The last time I surveyed, I only saw the more common gray geese, but who's to say... maybe someone did some clever goose shuffling ... after all, the number of brown geese is fixed, not variable, doesn't matter what my survey says ...


I think Miss Gulch shot them.

Aren't geese common dietary items for flying monkeys?
rxwine
rxwine
  • Threads: 220
  • Posts: 12788
Joined: Feb 28, 2010
November 6th, 2010 at 9:17:44 PM permalink
Quote: Doc

(1) the sample used by the polling organization was not sufficiently random to adequately represent the entire population of Nevada,
(2) the group that actually showed up at the polling booths on election day was not representative of the entire population of Nevada or was at least not properly represented by the sample used by the poll,
(3) people sampled by the poll did not accurately express their opinions to the pollsters, either deliberately or unknowingly,
(4) people changed their minds after the poll and before election day
(5) in the end, people didn't actually vote the way they felt, e.g., "I guess I'll vote my usual party even though I don't like the guy."
(6) the final vote represented an example of the 5% that fell outside of the confidence limits of your analysis (perhaps a 2.5% tail?)

Can anyone identify one of the "usual suspects" that I have overlooked? Which one(s) of these sounds the most guilty?



I don't think this was mentioned in the thread, but a rep from Mason-Dixon responded that the reason their polling was off was that it did not account for Harry Reid's ground-game effort. So, maybe that would fit #4 still?

The news guy asked if it was because they only contacted people with landlines, but the rep said studies have never shown any significant effect from that limitation.

(Don't know a thing about polling, but I would be hiring a different company next time around)
Sanitized for Your Protection
Doc
Doc
  • Threads: 46
  • Posts: 7287
Joined: Feb 27, 2010
November 7th, 2010 at 5:17:14 AM permalink
Quote: rxwine

... their polling was off as it did not account for Harry Reid's ground-game effort. So, maybe that would fit #4 still?

... Don't know a thing about polling, but ...

Just to show even further how little I know about this stuff, I don't even know what the expression "ground-game effort" refers to. Is that some kind of football analogy or maybe something akin to "grass roots" movements?
Wizard
Administrator
Wizard
  • Threads: 1521
  • Posts: 27185
Joined: Oct 14, 2009
November 7th, 2010 at 5:23:07 AM permalink
Quote: Doc

Just to show even further how little I know about this stuff, I don't even know what the expression "ground-game effort" refers to. Is that some kind of football analogy or maybe something akin to "grass roots" movements?



I think that means he had volunteers, or paid staff, to knock on the doors of Democrats to remind them to vote. Perhaps offering rides. Stuff like that. However, Angle had plenty of money behind her too. That excuse seems dubious to me. I'd have more respect for them if they admitted it was a bad poll. It cost me $1,500, as I laid 3 to 1 on Angle after that article.
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
Doc
Doc
  • Threads: 46
  • Posts: 7287
Joined: Feb 27, 2010
November 7th, 2010 at 6:41:56 AM permalink
Quote: Wizard

... It cost me $1,500, as I laid 3 to 1 on Angle after that article.

Ouch! Sounds a bit as if the Variance Gods may have been jerking your chain a little in consideration of your wins laying such high odds on the Oscars. Don't you know that in the long run they have to "get back even" with you?
rxwine
rxwine
  • Threads: 220
  • Posts: 12788
Joined: Feb 28, 2010
November 7th, 2010 at 10:56:04 AM permalink
It was referring to organizational efforts on getting out the vote for Harry Reid. But since no details were specified in the statement, I can't really tell you more than that about what was meant.
Sanitized for Your Protection
Wizard
Administrator
Wizard
  • Threads: 1521
  • Posts: 27185
Joined: Oct 14, 2009
November 7th, 2010 at 1:30:43 PM permalink
Quote: Doc

Ouch! Sounds a bit as if the Variance Gods may have been jerking your chain a little in consideration of your wins laying such high odds on the Oscars. Don't you know that in the long run they have to "get back even" with you?



I can take losing a bet, especially due to chance, but I think I lost this one due to a dubious poll. That is hard to take.
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
Doc
Doc
  • Threads: 46
  • Posts: 7287
Joined: Feb 27, 2010
November 7th, 2010 at 2:07:35 PM permalink
Quote: Wizard

I can take losing a bet, especially due to chance, but I think I lost this one due to a dubious poll. That is hard to take.

It's hell when your tout is just trying to scam you.
Wizard
Administrator
Wizard
  • Threads: 1521
  • Posts: 27185
Joined: Oct 14, 2009
November 7th, 2010 at 4:28:49 PM permalink
Quote: Doc

It's hell when your tout is just trying to scam you.



You're darn tootin'!
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
mkl654321
mkl654321
  • Threads: 65
  • Posts: 3412
Joined: Aug 8, 2010
November 7th, 2010 at 4:41:40 PM permalink
Quote: Wizard

I think that means he had volunteers, or paid staff, to knock on the doors of Democrats to remind them to vote. Perhaps offering rides. Stuff like that. However, Angle had plenty of money behind her too. That excuse seems dubious to me. I'd have more respect for them if they admitted it was a bad poll. It cost me $1,500, as I laid 3 to 1 on Angle after that article.



As I may or may not have posted, I would have laid 3 to 1 (or more) on REID, poll or no poll. It was just so obvious that Reid, if necessary, would marshal HUGE resources, from outside as well as within the state, to make sure he bought enough votes to win, because his defeat would have been something that the Democrats could not countenance---not as far as Nevada is concerned, because that state has to be a lost cause for the Democrats anyway--but as far as national Demo party prestige went. Reid was "too big to fail", and the Obama campaign showed that despite the depiction of Republicans all being fat, greedy millionaires, it is the Democrats who can really scare up the swag when they need to.

I'm still unclear as to why the Republicans would figure out the only way to lose the race, by nominating a back-to-the-1950's reactionary twit who had a penchant for saying things that sounded like they had come out of the mouth of a sixth grader. Maybe they figured that Reid was going down pretty much no matter what, so they put on the podium a neophyte who if elected would have been, out of myriad obligations and soul-selling during the campaign, attached firmly by strings to the puppet masters.

We had a similar situation here in Oregon, with most polls saying Dudley would knock over Kitzhaber. But Kitzhaber marshaled a last-minute charge, fueled by bank bags full of hundred-dollar bills. He bought THREE HOURS of prime TV time on every major network. I watched his commercials (hard to avoid), and I, too, concluded that he was not a mere mortal, but a reincarnated Olympian. I am so glad that we elected him and saved Civilization As We Know It.
The fact that a believer is happier than a skeptic is no more to the point than the fact that a drunken man is happier than a sober one. The happiness of credulity is a cheap and dangerous quality.---George Bernard Shaw
rxwine
rxwine
  • Threads: 220
  • Posts: 12788
Joined: Feb 28, 2010
November 7th, 2010 at 5:38:29 PM permalink
I don't think what would be called the mainstream of the Republican Party really wanted Angle. But after she got the nomination, they threw their lot in with her to try to take out Reid. It was basically the only bet they had at that point.
Sanitized for Your Protection
Wizard
Administrator
Wizard
  • Threads: 1521
  • Posts: 27185
Joined: Oct 14, 2009
November 7th, 2010 at 6:20:15 PM permalink
The Republicans, especially the Tea Party, wanted that seat badly too. In the end I think it came down to enough moderate Republicans voting for Reid, because Angle was too far to the right. In the last week, Reid ran a "Republicans for Reid" commercial, which I thought was very effective. As a local, I can say they both bought tons of television commercials, and they seemed to be split 50/50.
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
BioProf
BioProf
  • Threads: 1
  • Posts: 4
Joined: Nov 14, 2010
November 14th, 2010 at 12:11:04 PM permalink
Just saw this thread on Wizard of Odds. For those of you interested in election polls and predictions, there is no better analyst than Nate Silver at fivethirtyeight.com. For all election/political geeks, fivethirtyeight.com is the gold standard. Nate is to political polling what Michael "The Wiz" is to gambling and all things Vegas.
ItsCalledSoccer
ItsCalledSoccer
  • Threads: 42
  • Posts: 735
Joined: Aug 30, 2010
November 14th, 2010 at 5:37:13 PM permalink
Quote: BioProf

Just saw this thread on Wizard of Odds. For those of you interested in election polls and predictions, there is no better analyst than Nate Silver at fivethirtyeight.com. For all election/political geeks, fivethirtyeight.com is the gold standard. Nate is to political polling what Michael "The Wiz" is to gambling and all things Vegas.



Maybe not so much. Silver is a pretty committed liberal, having blogged for Daily Kos and the like before he started 538, and while he didn't miss that many calls in 2008, a) 2008 wasn't a difficult year for anyone to call, and b) all of his mistakes ... ALL of them ... were of one type: called for Democrats but won by Republicans. He never really went that much into US 2010 elections after laying a pretty big egg in predicting the UK 2010 elections last spring. Interestingly, the NYT essentially bought his blog after that debacle ... and we all know how unbiased the NYT is.

So, he's pretty much the gold standard, but only if "gold standard" = "predicts big wins for the left, and is admired for making those predictions even though he may or may not be right." I always thought "gold standard" = "pretty damn accurate," but I could be wrong about that. Predictions can be tested, and that's where Silver has fallen short. Oh, those pesky results ...

Looking at the larger body of his work, he's also done some things in baseball and soccer, none of which were particularly successful in terms of accuracy of results. The career development predictor for MLB thing was used by some publications but is not in wide practical use, and was constantly under revision until BP bought it and changed much of its core. And IIRC, he also wrote the soccer power index algorithm for ESPN, which didn't do as well as FIFA's rankings or Paul the Octopus (that is, a coin flip) in predicting World Cup performance.

So, yeah, he is pretty much the Flavor of the Month as far as statisticians go. I doubt he would be as well-received in the media if he had blogged for, oh, Weekly Standard rather than Daily Kos.
Jufo81
Jufo81
  • Threads: 6
  • Posts: 344
Joined: May 23, 2010
November 17th, 2010 at 10:00:13 AM permalink
As for the differences between Confidence intervals (frequentist approach) and Credible intervals (Bayesian approach) the following short summary explains it pretty well:

http://www.statisticalengineering.com/frequentists_and_bayesians.htm

So it seems that Wizard's approach related more to the Bayesian approach and Credible interval than Confidence interval.

-----

I noticed there was also a discussion about picking marbles from a bag and making predictions about the ratio of certain color marbles. I would like to present the following question which highlights the different results between frequentist and Bayesian probability theory.

"Suppose there is a bag with a large number of marbles. The marbles are either red or white and the ratio of red marbles is some unknown value between [0,1]. Ten picks with replacement are made from the bag and every time the marble is red. What is the estimated ratio of red marbles a) with frequentist reasoning b) with Bayesian reasoning?"
Wizard
Administrator
Wizard
  • Threads: 1521
  • Posts: 27185
Joined: Oct 14, 2009
November 18th, 2010 at 10:16:53 AM permalink
Thanks for the link and the problem. Based on the explanation in the link, where do I go to sign up for the Bayesian side?

I still don't understand the difference well enough to answer your question. A similar question was asked a while back in the thread, and I remarked that the standard deviation is zero, so you can't make a confidence interval. I stand by that. If forced to make a comment on the question, I would say that the maximum likelihood estimator of the percentage of red is 100%. However, that seems obvious. Let's call the actual ratio of reds r. I'd like to know pr(r>=0.95), but to be honest with you, I don't know the answer.

With that, I beg you for the answers.
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
Doc
Doc
  • Threads: 46
  • Posts: 7287
Joined: Feb 27, 2010
November 18th, 2010 at 11:02:29 AM permalink
Quote: Wizard

... With that, I beg you for the answers.

I concur. This is quite similar to the experiment that I conducted with my class, except that in our case the samples contained both black and white marbles, and we had (falsely) concluded that together they represented 100% of the population. In class, I just raised the question that seemed to be a paradox when we later discovered a red marble, and I would like to know the "real" answer myself.
Jufo81
Jufo81
  • Threads: 6
  • Posts: 344
Joined: May 23, 2010
November 20th, 2010 at 5:06:14 AM permalink
Quote: Wizard

Thanks for the link and the problem. Based on the explanation in the link, where do I go to sign up for the Bayesian side?

I still don't understand the difference well enough to answer your question. A similar question was asked a while back in the thread, and I remarked that the standard deviation is zero, so you can't make a confidence interval. I stand by that. If forced to make a comment on the question, I would say that the maximum likelihood estimator of the percentage of red is 100%. However, that seems obvious. Let's call the actual ratio of reds r. I'd like to know pr(r>=0.95), but to be honest with you, I don't know the answer.

With that, I beg you for the answers.



No need to beg :) The funny thing is that I had a very long discussion/debate about this exact same topic last summer on another forum, so I thought that I could write here what I learned from that discussion, and there might be some knowledgeable posters here who have something to add that I didn't know yet. I also have to mention that I am not exactly an expert in this topic, so don't take everything I write for granted.

As for the question about the estimated ratio r of red marbles, like you suggested, a frequentist would apply the maximum likelihood method to estimate the ratio of red marbles. If ten picks were made and all were red, then the probability of observing this would be:

p(r) = r^10

and trivially r = 100% maximizes this probability and is the maximum likelihood estimation for the ratio.

However, a Bayesian would not agree with this result. If we assume that before any picks are made, each ratio r is equally likely (in other words, it is uniformly distributed between [0,1]), then the Bayesian estimator for the ratio of red marbles is actually 11/12. More generally, for k successes in n trials, the estimator is:

k + 1
-------
n + 2

For reference and derivation of this result, see page: http://en.wikipedia.org/wiki/Rule_of_succession

In the next post I will address the construction of confidence interval and credible interval for the problem.
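
For anyone who wants to check the two point estimates above, here is a minimal Python sketch. The function names are made up for illustration, and the Bayesian one only holds under the uniform prior assumed above.

```python
from fractions import Fraction

def mle_ratio(k, n):
    """Maximum likelihood estimate of the ratio: the observed fraction k/n."""
    return Fraction(k, n)

def rule_of_succession(k, n):
    """Posterior mean of the ratio, assuming a uniform prior on [0, 1]."""
    return Fraction(k + 1, n + 2)

k, n = 10, 10  # ten red marbles in ten picks
print("MLE:", mle_ratio(k, n))                          # 1
print("Bayesian estimate:", rule_of_succession(k, n))   # 11/12
```

Exact fractions are used so that 11/12 comes out as a fraction rather than a rounded decimal.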
Wizard
Administrator
Wizard
  • Threads: 1521
  • Posts: 27185
Joined: Oct 14, 2009
November 20th, 2010 at 5:48:12 AM permalink
Thanks, but I guess I'll have to wait until your next post. This one leaves me with a burning desire to turn to the next page.

As you said, it seems obvious that if forced to an answer, the best guess is that all the marbles are red. You never stated any information about the probability distribution of reds, so that Bayesian analysis doesn't hold water for me. I can see how the Bayesian would need some kind of assumption to sink his teeth into to construct an answer. However, if you don't have one, I think it isn't kosher to just make one up. It would seem to me the Bayesian should just throw his hands up in frustration.
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
Jufo81
Jufo81
  • Threads: 6
  • Posts: 344
Joined: May 23, 2010
November 20th, 2010 at 6:21:15 AM permalink
As for constructing confidence & credible intervals for the ratio r based on observing 10 red marbles in 10 trials, I first present another but very similar example where the construction of confidence interval was attempted:

See
http://stattrek.com/Lesson4/ProportionSmall.aspx?Tutorial=Stat

Carefully read through Example 1 in that page (middle of the page). In that example 2 out of 5 picks were red, and they got a result that there is a 83.52% confidence interval that the real ratio is between 20% and 60%. Do you agree with this example? I don't. The problem I see is that they assume the ratio to be 40% and then calculate the confidence interval based on that. So they are saying: "If we fix the ratio of red marbles to 40% then there is 83.52% chance that the ratio is between 20% and 60%" and to me this makes no sense. So I think constructing a confidence interval this way is wrong. I may be wrong with this and if so please somebody correct me.
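
As far as I can tell, the 83.52% figure can be reproduced as a plain binomial probability. The sketch below assumes (my reading of the example, not a quote from the page) that the ratio is fixed at 40%, that 5 picks are made, and that "between 20% and 60%" means 1, 2, or 3 reds out of 5. The helper name binom_pmf is just for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 5, 0.4  # 5 picks, ratio of reds assumed fixed at 40%
# Sample proportion between 20% and 60% means 1, 2, or 3 reds out of 5.
prob = sum(binom_pmf(k, n, p) for k in range(1, 4))
print(f"{prob:.4f}")  # 0.8352
```

So the number describes how the sample proportion behaves when the true ratio is 40%, which is the assumption I am questioning above.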

I am not sure if there even exists a correct way of constructing a confidence interval based on the information given. However, we can always construct Bayesian credible interval. I calculated the Bayesian solution to the above StatTrek problem (calculation: http://www.beatingbonuses.com/forums/showpost.php?p=69111&postcount=80) and I got a probability 76.42% that the ratio is between 20% and 60%, so it seems that the probability was overestimated in the StatTrek example.

-----

Back to the original problem of 10 reds out of 10 picks. To construct the Bayesian credible interval we need to obtain the probability density for the ratio, taking the observations into account. If we assume that initially we know absolutely nothing about how the marbles were selected into the bag, then it is reasonable to assume that before any picks are made, r has a uniform distribution across r = [0,1], i.e. each ratio is equally likely. If we somehow knew that some ratios were initially more likely than others (for example, the ratio r is normally distributed with a peak at r = 0.5), then the results would obviously be different.

With uniform "a priori" distribution, the "a posteriori" probability density follows Beta-distribution (reference: http://en.wikipedia.org/wiki/Beta_distribution) with parameters alpha = 11 and beta = 1. So the probability density simply becomes:

p(r) = 11*r^10

Now the probability that ratio r lies between A and B is simply obtained by integrating this density from A to B. Wizard asked what is the probability that r > 0.95 so by integrating from A = 0.95 to B = 1 we get

P(r > 0.95) = 43.12%

So we estimate a 43.12% chance for the ratio being at least 95%.

The expected value of the ratio of red marbles is the center of mass of the above probability density, so

E(r) = ∫ r * 11*r^10 dr, from r = 0 to r = 1

The result is E(r) = 11/12 which is the same result I mentioned before.

We can summarize that: the (frequentist) maximum likelihood estimator is the mode (maximum value) of the corresponding Bayesian probability density, while the Bayesian estimator is the center of mass of the probability density, as shown in the image below (the image shows the probability density for the case of 1 red marble in 1 pick).
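
The numbers in this post can be checked with a short Python sketch. It assumes the Beta(11, 1) posterior derived above, whose density 11*r^10 integrates to r^11, so no numerical integration is needed. The function names are only for illustration.

```python
from fractions import Fraction

ALPHA = 11  # posterior is Beta(11, 1), with density 11 * r**10

def posterior_cdf(x):
    """P(r <= x): the integral of 11*r^10 from 0 to x, which is x^11."""
    return x ** ALPHA

def prob_between(a, b):
    """Posterior probability that the ratio lies between a and b."""
    return posterior_cdf(b) - posterior_cdf(a)

p = prob_between(0.95, 1.0)
print(f"P(r > 0.95) = {p:.4f}")  # 0.4312

# Integral of r * 11*r^10 from 0 to 1 is 11/12, the center of mass.
mean = Fraction(ALPHA, ALPHA + 1)
print("E(r) =", mean)  # 11/12
```

The same prob_between function gives a credible interval for any other pair of endpoints A and B.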

Jufo81
Jufo81
  • Threads: 6
  • Posts: 344
Joined: May 23, 2010
November 20th, 2010 at 6:54:22 AM permalink
Quote: Wizard


As you said, it seems obvious that if forced to an answer, the best guess is that all the marbles are red. You never stated any information about the probability distribution of reds, so that Bayesian analysis doesn't hold water for me. I can see how the Bayesian would need some kind of assumption to sink his teeth into to construct an answer. However, if you don't have one, I think it isn't kosher to just make one up. It would seem to me the Bayesian should just throw his hands up in frustration.



This reminds me of a case of a wannabe poker player who was confident that in the long run he plays poker profitably. However, he had lost his whole bankroll 10 times in a row without ever making a profit. His father said to him: isn't there enough evidence that you are not a winning player if you lost 10 out of 10 times already? The player said: "But Bayes' theorem (rule of succession) says that there is still a 1/12 chance that I am actually a winning player but it just didn't materialize yet!"

Yes, it can seem that Bayesians pull out probabilities from thin air. The question is whether you accept the initial uniform probability distribution for the ratio. In other words, do you accept that if we know nothing about the probability of something, then we can assign an equal probability to every possible outcome? This is the part that some may not agree with. But if it really were so that Bayesians should just throw their hands up in frustration, I don't think the theory would have such popularity and success within applications. So, while controversial, the theory does seem to work.
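
To show how much the answer depends on that initial assumption, here is a rough Python sketch. It assumes the prior is some Beta(a, b) distribution, in which case the posterior mean after k reds in n picks is (a + k) / (a + b + n). The three priors listed are hypothetical choices picked only for comparison, not anything from the original problem.

```python
def posterior_mean(k, n, a, b):
    """Posterior mean of the ratio for a Beta(a, b) prior after k reds in n picks."""
    return (a + k) / (a + b + n)

k, n = 10, 10  # ten red marbles in ten picks

# Hypothetical priors, chosen only to illustrate the sensitivity.
priors = {
    "uniform Beta(1, 1)": (1, 1),
    "Beta(0.5, 0.5)": (0.5, 0.5),
    "Beta(5, 5), peaked at 0.5": (5, 5),
}
for name, (a, b) in priors.items():
    print(f"{name}: {posterior_mean(k, n, a, b):.4f}")
```

The uniform prior gives back 11/12, while a prior concentrated around 0.5 pulls the estimate down noticeably, so the choice of prior really is where the disagreement lives.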
Wizard
Administrator
Wizard
  • Threads: 1521
  • Posts: 27185
Joined: Oct 14, 2009
November 20th, 2010 at 9:19:48 AM permalink
I'm not saying you're wrong, but this leaves me unsatisfied. The answer either way comes down to assumptions pulled out of thin air. However, I grant you that in this case there was no alternative. When I worked as a Social Security actuary we often had to just make educated guesses. For example, if the law were changed somehow, what would be the effect on behavior? I suppose we can file this under "any answer is better than no answer."

Nice graph, by the way.
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)