The Las Vegas Review Journal (http://www.lvrj.com/news/angle-poll-data-improve-106287803.html) just published the latest poll on the race. Here is what they have:
Reid: 45%
Angle: 49%
Margin of error: 4%
Sample size: 625
The questions I would like to bring up are: how is the "margin of error" determined, and what are each candidate's odds of winning?
If we multiply the percentage of the vote based on the sample size of 625 we get:
Reid: 281.25
Angle: 306.25
Neither: 37.5
Let's get rid of those pesky indecisive voters and round the others. That would leave us with:
Reid: 281
Angle: 306
Total: 587
In terms of percentages:
Reid: 47.87%
Angle: 52.13%
To determine the probability of Reid winning, I think the question that should be asked is, what is the probability that Angle would get 306 or more in the survey if more than half of actual voters will favor Reid?
I show the variance of the mean in this poll is (.4787)*(.5213)/(587-1) = 0.000426. The standard deviation is the square root of that, or 0.020636.
The LVRJ said the margin of error was 4%. So, is that the standard deviation of the difference between Reid's and Angle's percentages? In other words, two times the standard deviation of the mean (2*2.06%)?
Regardless of how the margin of error is defined and calculated, we see that Angle is 2.13% over 50%. That is 0.0213/.020636 = 1.031917 standard deviations. The probability of Angle falling to the left of the 1.031917 point on the bell curve can be found in any Gaussian curve statistics table, or put normsdist(1.031917) in Excel, and you get 0.848945.
So, if my math is right, and I'm far from sure it is, I say that Angle has an 85% chance of winning. Based on the LVRJ poll only, is my math correct?
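For readers who want to check the arithmetic above, here is a minimal Python sketch of the same calculation. It assumes, as the post does, that the undecided respondents are dropped, that the 587 decided respondents are a simple random sample, and that the normal approximation is good enough. The function name `win_probability` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def win_probability(leader: int, trailer: int) -> float:
    """Normal-approximation chance that the poll leader is truly above 50%."""
    n = leader + trailer                      # decided respondents only
    share = leader / n                        # leader's share of decided respondents
    sd = sqrt(share * (1 - share) / (n - 1))  # standard deviation of the mean
    z = (share - 0.5) / sd                    # standard deviations above 50%
    return NormalDist().cdf(z)                # same as normsdist(z) in Excel

print(round(win_probability(306, 281), 4))
```

With 306 for Angle and 281 for Reid this comes out at roughly 0.85, matching the figure in the post.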
Quote: Wizard
So, if my math is right, and I'm far from sure it is, I say that Angle has an 85% chance of winning. Based on the LVRJ poll only, is my math correct?
When these polls say "4% margin of error," I always take that to be a 95% confidence interval. Knowing the confidence interval is the key to this, the rest of the calculation is easy from there. Without knowing the confidence interval that the 4% represents, I'm not sure what to do with these results.
From Wikipedia:
Quote:Like confidence intervals, the margin of error can be defined for any desired confidence level, but usually a level of 90%, 95% or 99% is chosen (typically 95%).
--Ms. D.
Quote: DorothyGaleWhen these polls say "4% margin of error," I always take that to be a 95% confidence interval. Knowing the confidence interval is the key to this, the rest of the calculation is easy from there. Without knowing the confidence interval that the 4% represents, I'm not sure what to do with these results.
Thanks. So if one standard deviation is 2.06%, then there is a 95% chance that Angle's actual percentage in the election will fall within 2.06%*1.96 = 4.04% of that. So on election day, there is a 95% chance her actual share will be within 48.08% and 56.17%, or 52.13% +/- 1.96*2.06%. For other readers who may be wondering where the 1.96 comes from, there is a 95% chance of falling within 1.96 standard deviations of expectations in any random sampling.
It would be nice if the papers said "The 95% margin of error is 4%," rather than just "The margin of error is 4%." How are we supposed to know they are referring to a 95% confidence interval? Why not 90%, 98%, 99%, or something else?
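One way to guess which confidence level a paper means is to compute the margin of error at several levels and see which one matches. The sketch below assumes the common convention of using the full sample (625) and the worst-case share of 50%. That convention is an assumption here, not something the LVRJ article is known to state, and `margin_of_error` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(n: int, confidence: float = 0.95, p: float = 0.5) -> float:
    """Half-width of a normal-approximation interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * sqrt(p * (1 - p) / n)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: +/- {margin_of_error(625, level):.2%}")
```

At 95% this gives about 3.92%, which rounds to the reported 4%. The 90% and 99% levels give roughly 3.3% and 5.2%, so the 95% reading looks like the best fit.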
http://elections.nytimes.com/2010/forecasts/senate/nevada
Quote: JerryLoganIsn't the LVRJ involved in some kind of legal issue with LVA for the unauthorized posting of one of their articles on a site without written permission?
Yes. Please visit the copyrighted material thread.
Also, everybody, please don't quote entire articles in this forum, especially from the LVRJ. Just small quotes, and properly attribute them.
Quote: crazyiamFivethirtyeight might be the best place for election predictions. It uses poll averaging and weighting metrics to come up with predictions. I believe the methodology uses more undecided people to increase the variance of possible results.
That is why I described those other 38 people in the poll as "pesky." It would be one thing if they were wasting their votes on a third party candidate. However, it does add more variance if they are still undecided. I wish the LVRJ would have made that clear.
Nice to see my election odds are close to those of the New York Times (77.2% Angle, 22.8% Reid). I would not expect them to match exactly, since they used a different survey.
Quote: WizardThanks. So if one standard deviation is 2.06%, then there is a 95% chance that Angle's actual percentage in the election will fall within 2.06%*1.96 = 4.04% of that. So on election day, there is a 95% chance her actual share will be within 48.08% and 56.17%, or 52.13% +/- 1.96*2.06%. For other readers who may be wondering where the 1.96 comes from, there is a 95% chance of falling within 1.96 standard deviations of expectations in any random sampling.
It would be nice if the papers said "The 95% margin of error is 4%," rather than just "The margin of error is 4%." How are we supposed to know they are referring to a 95% confidence interval? Why not 90%, 98%, 99%, or something else?
Your interpretation of the probability of a confidence interval is incorrect. The reason is that the population parameter that the sample is used to estimate is a constant. It is the sample estimate that varies according to a distribution such as the normal. Thus the correct statement is that if a large number of samples were taken, then 95% of the intervals constructed would contain the parameter being estimated. As for the probability of the parameter being contained in a single interval, it is zero or 1 depending on whether it is in the interval or not.
Quote: matildaYour interpretation of the probability of a confidence interval is incorrect. The reason is that the population parameter that the sample is used to estimate is a constant. It is the sample estimate that varies according to a distribution such as the normal. Thus the correct statement is that if a large number of samples were taken, then 95% of the intervals constructed would contain the parameter being estimated. As for the probability of the parameter being contained in a single interval, it is zero or 1 depending on whether it is in the interval or not.
I'm not following you. What is the 4% margin of error telling us in this poll?
Here are the crosstabs for this poll.
Quote: rdw4potus
Thanks. Here is what that link says, in part:
Quote: LVRJThe margin for error, according to standards customarily used by statisticians, is no more than ±4 percentage points. This means that there is a 95 percent probability that the "true" figure would fall within that range if the entire population were sampled. The margin for error is higher for any subgroup, such as a gender or regional grouping.
That is what I was trying to say in response to Dorothy's post, which Matilda has said is incorrect.
Quote: WizardSo if one standard deviation is 2.06%, then there is a 95% chance that Angle's actual percentage in the election will fall within 2.06%*1.96 = 4.04% of that. So on election day, there is a 95% chance her actual share will be within 48.08% and 56.17%, or 52.13% +/- 1.96*2.06%.
Reid: 45%
Angle: 49%
Margin of error: 4%
Sample size: 625
Assuming the undecided vote broke evenly (no reason to assume this, in reality she should get slightly more of the undecided because she is slightly ahead in the poll), then Angle is at 52%. If the standard deviation is 2.06%, then a result of 50% or higher corresponds to the rhs of z = -0.97, or about an 83.4% chance that Angle will win.
I'm not happy with giving Angle a true mean of 52% (assuming the undecideds break 50/50). So, assuming that the undecideds break along the same percent as the ratios in the poll, then you need to sum an infinite series to get Angle's true mean final result.
Angle's true mean is:
49% + 49%*6% + 49%*(6%)^2 + 49%*(6%)^3 + ... = 52.2%.
In this case, a result of 50% or higher corresponds to the rhs of z = -1.07, or about an 85.8% chance that Angle will win.
In your original post you said:
Quote: Wizard
So, if my math is right, and I'm far from sure it is, I say that Angle has an 85% chance of winning. Based on the LVRJ poll only, is my math correct?
I'm not so sure about your math, but I sure like your conclusion.
--Ms. D.
Quote: WizardI'm not following you. What is the 4% margin of error telling us in this poll?
It is saying the interval is 8% wide. If it in fact is a 95% confidence interval, then it is telling us that if a large number of samples were taken and the sample proportion was calculated for each and such an interval was calculated for each sample, then 95% of the intervals would contain the true, but unknown population proportion. Or put another way, if 1,000,000 polls were taken at the same time, then approximately 95% of the polls would be correct because the interval would contain the true proportion of voters. 5% of the polls would be wrong because the true proportion would fall outside of the interval.
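The "many polls" reading above can be tried out with a small simulation. This is only a sketch: the true share of 0.52 is a made-up value chosen so the simulation has something to aim at, the sample size of 587 is borrowed from earlier in the thread, and `coverage` is an illustrative name.

```python
import random
from math import sqrt

def coverage(true_p: float, n: int, polls: int, seed: int = 1) -> float:
    """Fraction of simulated polls whose 95% interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(polls):
        # Each respondent independently favors the candidate with chance true_p.
        votes = sum(rng.random() < true_p for _ in range(n))
        p_hat = votes / n
        half_width = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - half_width <= true_p <= p_hat + half_width:
            hits += 1
    return hits / polls

print(coverage(true_p=0.52, n=587, polls=2000))
```

The printed fraction should land near 0.95. The true share never moves in the simulation; it is the interval that changes from poll to poll.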
Quote: matildaIt is saying the interval is 8% wide. If it in fact is a 95% confidence interval, then it is telling us that if a large number of samples were taken and the sample proportion was calculated for each and such an interval was calculated for each sample, then 95% of the intervals would contain the true, but unknown population proportion. Or put another way, if 1,000,000 polls were taken at the same time, then approximately 95% of the polls would be correct because the interval would contain the true proportion of voters. 5% of the polls would be wrong because the true proportion would fall outside of the interval.
First, what do you mean by "wide"? The true Gaussian curve should be infinitely wide.
Second, I don't see how what you said is different from what I wrote. I'm claiming that the true proportion of Angle voters will fall between 48.08% and 56.17% with a 95% chance. Where do you put the range?
Quote: matildaThe LVRJ quote is also incorrect for the same reason.
At least I have company. Care to quote any source that takes your side?
Quote: Wizard
It would be nice if the papers said "The 95% margin of error is 4%,"
That would confuse people even more. The immediate reaction would be, "What about the last 1%?"
I never knew how they calculate the margin of error, or what it really means. I never before heard the phrase 'confidence interval'.
Does a 4% margin mean that they think that, based upon the demographic analysis, that their poll sample is only a 96% accurate representation of the population?
Or does the 4% margin mean that they think 4% of the people polled might change their mind?
Or does the 4% margin mean that they think 4% of the people polled are wise-asses who intentionally gave the wrong answer?
Quote: DJTeddyBearThat would confuse people even more. The immediate reaction would be, "What about the last 1%?"
I never knew how they calculate the margin of error, or what it really means. I never before heard the phrase 'confidence interval'.
Does a 4% margin mean that they think that, based upon the demographic analysis, that their poll sample is only a 96% accurate representation of the population?
Or does the 4% margin mean that they think 4% of the people polled might change their mind?
Or does the 4% margin mean that they think 4% of the people polled are wise-asses who intentionally gave the wrong answer?
Matilda will give a different answer, but I claim it means that the actual results will be within the stated margin of error of the poll results 95% of the time. For example, the LVRJ said Angle got 49% in their poll, and Reid 45%. So I claim that on election day Angle will get 45% to 53% with 95% chance. This factors in third party candidates and "none of the above," which I filtered out in my previous analysis. I'm still not sure what matilda would say, if put in layman's terms.
What I think the papers should do is just come right out and say what the probability of each candidate winning is. That is the important thing.
Quote: Wizard
... Second, I don't see how what you said is different from what I wrote.
Just to butt my nose in where it wasn't invited, I think the difference in interpretations may go something like this, taking it away from political polls.
Suppose our experiment were to flip a coin 100 times and determine how often it came up heads. Suppose we were to repeat that experiment a number of times. We can study that topic and learn what the true mean is and what the variance is. We can determine these numbers without ever actually flipping a coin at all. This is a population distribution.
Knowing the population mean and standard deviation, we can use this info to predict with a degree of confidence the result of an individual experiment, which would be a sample of size 1 from the population. We can also predict the variability that we would encounter in this sample mean if we conducted the experiment a number of times.
On the other hand, suppose we don't know the population distribution (as we don't know the true opinions of potential voters). If we take a sample, perhaps a fairly large one, we can use its mean to estimate the mean of the population. If we took lots of samples, we could make lots of estimates of the mean of the population, each with some degree of confidence.
I think that matilda is pointing out that it is a different thing to use one or many samples to estimate a population mean than it is to use the true population mean and standard deviation to estimate what will be found in a sample.
But I am very rusty at statistics and never really knew it all that well in the first place.
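The coin-flip case above, where the population is known in advance, can be worked out without flipping anything. A minimal sketch, assuming a fair coin and using the same 1.96 multiplier mentioned earlier in the thread:

```python
from math import sqrt

flips, p_heads = 100, 0.5
mean_heads = flips * p_heads                      # expected number of heads
sd_heads = sqrt(flips * p_heads * (1 - p_heads))  # standard deviation of the count
# Range that should hold about 95% of 100-flip experiments (normal approximation).
low = mean_heads - 1.96 * sd_heads
high = mean_heads + 1.96 * sd_heads
print(mean_heads, sd_heads, (low, high))
```

Here the mean is 50 heads and the standard deviation is 5, so about 95% of experiments should give between roughly 40 and 60 heads. In a poll the situation is reversed: the sample is known and the population value is what has to be estimated.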
Quote: WizardMatilda will give a different answer, but I claim it means that the actual results will be within the stated margin of error of the poll results 95% of the time. For example, the LVRJ said Angle got 49% in their poll, and Reid 45%. So I claim that on election day Angle will get 45% to 53% with 95% chance. This factors in third party candidates and "none of the above," which I filtered out in my previous analysis. I'm still not sure what matilda would say, if put in layman's terms.
What I think the papers should do is just come right out and say what the probability of each candidate winning is. That is the important thing.
I think any such rigorous calculations are GIGO, because poll results are representative samples consisting of, not voters, but those who choose to answer/are solicited to answer pre-election polls. I doubt very much that the two sets--"voters" and "poll respondents"--are mutually congruent enough to make polls any more than a highly inaccurate barometer of the eventual results.
Quote: Wizard
So I claim that on election day Angle will get 45% to 53% with 95% chance.
False. Your interval does not include undecided voters. The actual election day interval is that with probability 0.95, the election day final total for Angle will be between 48.2% and 56.2% (per my computation of a 52.2% mean for Angle, above).
Any final predictions must include all undecided voters. A reasonable way to cast their votes is to say they will break consistent with the polls.
This assumes that the poll question was not "If you knew that Reid murdered babies, would you vote for him, or would you vote for his opponent, Angle, who has never murdered a baby?"
--Dorothy
Quote: WizardMatilda will give a different answer, but I claim it means that the actual results will be within the stated margin of error of the poll results 95% of the time. For example, the LVRJ said Angle got 49% in their poll, and Reid 45%. So I claim that on election day Angle will get 45% to 53% with 95% chance. This factors in third party candidates and "none of the above," which I filtered out in my previous analysis. I'm still not sure what matilda would say, if put in layman's terms.
What I think the papers should do is just come right out and say what the probability of each candidate winning is. That is the important thing.
If you have time to kill, check out fivethirtyeight.com. There is nobody in the industry better at what you're looking at doing than Nate Silver. I think you would especially like the regression-based approach that Silver uses.
I can tell you with a high degree of confidence that the paper will never be willing to post the win probability based on a single poll. They'd be taking a HUGE flyer that the demographic weightings in the poll were correct. They'd also be staking their name to Mason-Dixon's work. Some pollsters in recent years (Strategic Vision, Research 2000) have come into significant legal troubles for basically fabricating polling results. NOTE: The R2000 case is ongoing. I am NOT making a statement about whether or not they actually did fabricate polls, I'm simply stating that LVRJ would take significant risk by staking their reputation to the win % suggested by a single poll. *edit*: I also do not mean to imply that Mason-Dixon is anything less than an accurate and reputable institution.
Quote: rdw4potus
There is nobody in the industry better at what you're looking at doing than Nate Silver.
Nate has a decidedly Libertarian/Right lean to his presentation. He's smart, but he's also personally biased. This bias shows in both his results, by means of how he weights the different polls, and his analysis of those results. He's not Fox biased, but he's out there.
But if you want to see REAL bias, just read a few copies of the LVRJ. The fact that the poll is presented in the LVRJ is almost enough to discredit it, prima facie.
--Dorothy
Quote: DorothyGaleFalse. Your interval does not include undecided voters. The actual election day interval is that with probability 0.95, the election day final total for Angle will be between 48.2% to 56.2% (per my computation of a 52.2% mean for Angle, above).
Any final predictions must include all undecided voters. A reasonable way to cast their votes is to say they will break consistent with the polls.
--Dorothy
Are you assuming that 100% of the "undecided" "likely" voters will show up to the polls? I'd caution against that. Anyone who says they're likely to vote but doesn't know who they're voting for 4 days before the election is at risk for not voting, leaving this question blank, or voting for "none of these." Speaking of which...is it too late for me to start campaigning for "none of these?" I think it'd have a real chance in this election...
I think, though I'm not sure, that the history also favors the incumbent with respect to undecided voters this late in the cycle. Of course, this election could easily break with that trend. But I would be willing to wager that more of the "undecideds" at this point are trying to justify voting against Reid instead of trying to justify voting for Angle.
Quote: mkl654321
I think any such rigorous calculations are GIGO, because poll results are representative samples consisting of, not voters, but those who choose to answer/are solicited to answer pre-election polls. I doubt very much that the two sets--"voters" and "poll respondents"--are mutually congruent enough to make polls any more than a highly inaccurate barometer of the eventual results.
I think a good poll will factor in such biases. rdw4potus already made a good post on that issue. If you don't think the LVRJ poll is accurate, I'll give you 3 to 1 on Reid.
Quote: DorothyGaleNate has a decidedly Libertarian/Right lean to his presentation. He's smart, but he's also personally biased. This bias shows in both his results, by means of how he weights the different polls, and his analysis of those results. He's not Fox biased, but he's out there.
But if you want to see REAL bias, just read a few copies of the LVRJ. The fact that the poll is presented in the LVRJ is almost enough to discredit it, prima facie.
--Dorothy
It's funny you say that about Nate. His personal politics are pretty far left, and he tries to take that out of his analysis. Based on your comments, it sounds like he's over-correcting.
Quote: rdw4potus
Are you assuming that 100% of the "undecided" "likely" voters will show up to the polls?
No, I'm saying the final results Reid+Angle = 100%, and Mr. W overlooked this when he gave his interval. It doesn't matter who shows up -- if no undecided voters showed up, then the final result would be:
Reid = 45/(45+49) = 47.87%.
Angle = 49/(45+49) = 52.13%.
So, the Angle interval is centered around 52%, not around 49%.
--Dorothy
Quote: Wizard
I'll give you 3 to 1 on Reid.
I'll take 9-to-2 -- in units of "honor," of course. By my records, you are currently up 3 units of honor on me.
--Dorothy
Quote: WizardI think a good poll will factor in such biases. rdw4potus already made a good post on that issue. If you don't think the LVRJ poll is accurate, I'll give you 3 to 1 on Reid.
I will take your action at 3:1. Payable in your choice of honor units or beverages (to be claimed at WOVcon I).
I would also book bets on other close races. I'll move that to another thread.
Quote: WizardI think a good poll will factor in such biases. rdw4potus already made a good post on that issue. If you don't think the LVRJ poll is accurate, I'll give you 3 to 1 on Reid.
If a poll did collate all such potential biasing information (demographic and otherwise), then use past results to weight the collected data accordingly, then the poll would indeed be much more reliable. I think that such a realization, that all such respondents do not have equal weight, is absolutely necessary for the poll to mean anything.
Here in my neck of the woods, in 2008, an Obama win was a foregone conclusion, but the polls showed the race to be closer than it was, because students don't respond to polls, and 99.99% of students voted for Obama (a couple were stoned and punched the wrong button). In that case, the polls were so unrepresentative of what turned out to be the actual electorate, that they were utterly meaningless.
Reid is almost certainly going to win, because he has entrenched Party machinery behind him, he has massive funding available that his opponent does not, and he doubtless has a list of favors to call in longer than Santa's Christmas list. There's no way the Demos will let Reid lose--he's "too big to fail". It will also help Reid that his opponent is a raving loon. So I think you're not offering high enough odds--I would lay 5 to 1, or higher.
(By the way, I wonder if Harrah's is lobbying the state legislature to allow betting on elections, like in Great Britain. Think of the revenue...)
Quote: mkl654321
If a poll did collate all such potential biasing information (demographic and otherwise), then use past results to weight the collected data accordingly, then the poll would indeed be much more reliable. I think that such a realization, that all such respondents do not have equal weight, is absolutely necessary for the poll to mean anything.
That is what they do, using a combination of past results and demography. But everyone does it differently, and it's more art than science. How do you scientifically adjust your biased poll to reflect the electorate, when the only clues you have about what this year's electorate will look like come from your admittedly biased poll (and other similarly flawed polls)?
Also, you imply that students simply do not reply to polls. I agree that students are both more likely to be unavailable in the evening and less likely to want to participate than the general population. There are a couple other factors to consider: 1. Many pollsters do not call cell phones. They completely miss households that do not include landline phones. This demographic is dominated by young voters. 2. Most pollsters only call residential landline phones. This can miss some key groups, like networks that are associated with one main "business" number. For example, the residence halls at my alma mater all roll back up to the University's main switchboard. Thus, no on-campus student landline phones can be sampled.
Quote: DorothyGaleFalse. Your interval does not include undecided voters. The actual election day interval is that with probability 0.95, the election day final total for Angle will be between 48.2% to 56.2% (per my computation of a 52.2% mean for Angle, above).
I was trying to make things as simple as possible for DJTeddyBear. So I assumed that there were 0 undecided voters, and the other 6% were for third party candidates, or "none of the above," which is on the ballot in Nevada. As stated in other posts, I put the Angle mean at 52.13% and the 95% range at 48.08% to 56.17%.
Also, I think your 52.2% is in error.
49% + 49%*6% + 49%*(6%)^2 + 49%*(6%)^3 + ... = 52.13%, not 52.2%.
--Ms. D.
Quote: DorothyGaleAgree ... the sum of the series is the same as (49/(49+45)) ... 52.13% ... given that the probability of a Reid win is about 15%, that makes true odds about 17-to-3. So, will you take 9-to-2? I'll put up 2 units of Honor on Reid.
My gut tells me that Reid's odds are better than 15%. I think the phone polling will favor Angle, and the last two major polls were done by right-leaning media. No, 3 to 1 is the best I can do.
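Both points in this exchange, the series sum and the odds, are easy to verify. The sketch below assumes the undecided 6% split in the same ratio as the decided respondents, as the series does; the helper names `series_share` and `odds_against` are invented for illustration.

```python
def series_share(share: float, undecided: float, terms: int = 50) -> float:
    """Partial sum of share + share*undecided + share*undecided**2 + ..."""
    return sum(share * undecided**k for k in range(terms))

def odds_against(p_win: float) -> float:
    """Fair odds against an event with probability p_win, expressed as x-to-1."""
    return (1 - p_win) / p_win

angle_series = series_share(0.49, 0.06)
angle_direct = 49 / (49 + 45)          # drop the undecideds and renormalize
print(round(angle_series, 4), round(angle_direct, 4))

print(round(odds_against(0.15), 3))    # fair odds on Reid at a 15% chance
print(1 / (3 + 1), 2 / (9 + 2))        # win chance implied by 3-to-1 and by 9-to-2
```

The two shares agree at about 52.13%. A 15% chance corresponds to about 5.67-to-1, which is 17-to-3, while 3-to-1 is fair only at a 25% chance and 9-to-2 at about 18%.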
Quote: WizardAt least I have company. Care to quote any source that takes your side?
Sorry for the delay in answering. Any beginning statistics textbook will give the interpretation of a confidence interval.
• The random interval [X̄ − 1.96σ/√n, X̄ + 1.96σ/√n] contains the true parameter θ with 95% probability. It is wrong to say that θ lies in the interval with 95% probability...θ is not a RV!
http://ocw.mit.edu/courses/economics/14-30-introduction-to-statistical-method-in-economics-spring-2006/lecture-notes/l9.pdf
(the quote didn't translate completely)
Quote: matildaSorry for the delay in answering. Any beginning statistics textbook will give the interpretation of a confidence interval.
• The random interval [X̄ − 1.96σ/√n, X̄ + 1.96σ/√n] contains the true parameter θ with 95% probability. It is wrong to say that θ lies in the interval with 95% probability...θ is not a RV!
I don't see the difference. Until we know what θ is, it is a random variable. If we knew what it was, we wouldn't need to bother having the election.
Quote: WizardI don't see the difference. Until we know what θ is, it is a random variable. If we knew what it was, we wouldn't need to bother having the election.
This couldn't be a more semantic argument, but I'm going to try to make it. θ is an unknown, but not random, number. It does not move. It is more correct to say that a given poll's interval surrounds the stationary θ with 95% confidence.
Quote: Wizard
The Las Vegas Review Journal (http://www.lvrj.com/news/angle-poll-data-improve-106287803.html) just published the latest poll on the race. Here is what they have:
Reid: 45%
Angle: 49%
Margin of error: 4%
Sample size: 625
I'm not sure how the exact math would work, but here is how I have always understood "Margin of error" in layman's terms:
The margin of error represents how the results would move given a normal distribution within 3 Sigmas (standard deviations) on each side of the curve. Of course, anything is possible, but from a practical standpoint a 3 Sigma event is considered outside of the "Margin of error"
So, on one end of the spectrum, you can add 4% to Reid, and subtract 4% from Angle, and this would be considered a 3 Sigma event:
Reid: 49%
Angle: 45%
At the other end of the spectrum, you can add 4% to Angle and subtract 4% from Reid, and this would be considered a 3 Sigma event:
Reid: 41%
Angle: 53%
The normal distribution between these +/- 3 Sigma events represents the probabilities of the election outcome.
I'm not saying this is correct, just that that is what I have always been told. I have no source for this.
Quote: scotty81The margin of error represents how the results would move given a normal distribution within 3 Sigmas (standard deviations) on each side of the curve. Of course, anything is possible, but from a practical standpoint a 3 Sigma event is considered outside of the "Margin of error".
Maybe that is how it is used in some other venue, but for elections, if my understanding is correct you only move 1.96 standard deviations in either direction.
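The difference between the 1.96 and 3 sigma conventions is easy to quantify under a normal distribution. A short sketch using Python's standard library (`two_sided_coverage` is an illustrative name):

```python
from statistics import NormalDist

def two_sided_coverage(k: float) -> float:
    """Chance that a normal variable falls within k standard deviations of its mean."""
    std_normal = NormalDist()
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1.0, 1.96, 3.0):
    print(k, round(two_sided_coverage(k), 4))
```

A range of 1.96 standard deviations covers about 95% of outcomes, while 3 standard deviations covers about 99.7%. If the 4% were a 3 sigma figure, one standard deviation would be only about 1.3%, well below the 2.06% computed earlier in the thread.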
Quote: Wizard
I don't see the difference. Until we know what θ is, it is a random variable. If we knew what it was, we wouldn't need to bother having the election.
No, it is not a variable; it is fixed in value. Let me try this: What is the probability that the dealer has a ten down given he shows an ace up? We all can calculate this. Why? Because we know the population distribution: 52 cards, 16 tens, 4 aces. If we did not know the composition of a deck of cards, we could not calculate the probability. Suppose you see ten cards on the table, but you have no idea of the deck used by this casino. You cannot calculate the probability of a ten down.
In the case of a poll, unlike the deck of cards, we do not know the population distribution because the binomial (actually it's a multinomial with undecideds) is defined by the probability of success and that parameter is unknown. If it were known, why take the poll? Therefore you cannot calculate a probability such as a 95% range because you don't know what the population distribution is.
But when you interpret the interval as you do, you are actually saying that you know what the population distribution is, for there is no way to calculate the probability without this knowledge. What you are actually doing is saying that the sampling distribution is the same as the population distribution. But it is not. If it was, then how do you account for the different sampling distributions coming from samples of different sizes?
This is of course the key. The interval is the variable, dependent on which sample happens to be collected. For each possible sample an interval can be calculated. While we cannot calculate a probability about the population since the distribution is not known, we can calculate probabilities about the intervals themselves because the sampling distribution is known. We can calculate the probability that a particular interval contains a certain value. And more important for this discussion, we can calculate the percentage of intervals which contain a certain value. Therefore the correct interpretation of a 95% CI is that if all samples possible were taken and a 95% CI was constructed for each sample, then 95% of these intervals would contain the unknown, but constant, population parameter being estimated. The interval is the random variable, not the population parameter, which is a constant whether you know its value or not.
Again, my original limited knowledge of stat has become rusty.
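The dealer example above can be made concrete. This sketch assumes a single 52-card deck and that the dealer's ace is the only card seen so far; both are simplifying assumptions added here.

```python
from fractions import Fraction

tens, deck_size = 16, 52   # 10, J, Q, K in each of four suits
# The dealer's ace is face up, so 51 cards remain unseen and all 16 tens are among them.
p_ten_down = Fraction(tens, deck_size - 1)
print(p_ten_down, round(float(p_ten_down), 4))
```

The answer, 16/51 or about 31.4%, is available only because the makeup of the deck is known. With a poll, the makeup of the electorate is exactly the unknown being estimated.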
Quote: matilda
No, it is not a variable; it is fixed in value. Let me try this: What is the probability that the dealer has a ten down given he shows an ace up? We all can calculate this. Why? Because we know the population distribution: 52 cards, 16 tens, 4 aces. If we did not know the composition of a deck of cards, we could not calculate the probability. Suppose you see ten cards on the table, but you have no idea of the deck used by this casino. You cannot calculate the probability of a ten down.
I don't see the point of any of that. Would you say that the total points scored in the next Monday Night Football game is a random variable?
Quote: matilda
In the case of a poll, unlike the deck of cards, we do not know the population distribution because the binomial (actually it's a multinomial with undecideds) is defined by the probability of success and that parameter is unknown. If it were known, why take the poll? Therefore you cannot calculate a probability such as a 95% range because you don't know what the population distribution is.
But even you quoted (and I hate the syntax) "The random interval [X̄ − 1.96σ/√n, X̄ + 1.96σ/√n] contains the true parameter θ with 95% probability." That is all I am saying. The actual Angle share will fall between 48.08% and 56.17% with 95% probability.
Quote: matildaBut when you interpret the interval as you do, you are actually saying that you know what the population distribution is, for there is no way to calculate the probability without this knowledge. What you are actually doing is saying that the sampling distribution is the same as the population distribution. But it is not. If it were, then how do you account for different sampling distributions coming from samples of different sizes?
My confidence interval was based on the assumption that the 625 people sampled represented a fair random sampling of all Nevada voters. I think the random phone method favors Angle, but didn't want to muddy my math with that, so I accepted the assumption. Of course if the poll was repeated the results would likely not be exactly the same, due to it being random who was called.
Quote: matildaThis is of course the key. The interval is the variable, dependent on which sample happens to be collected. For each possible sample an interval can be calculated. While we cannot calculate a probability about the population since the distribution is not known, we can calculate probabilities about the intervals themselves because the sampling distribution is known. We can calculate the probability that a particular interval contains a certain value. And more important for this discussion, we can calculate the percentage of intervals which contain a certain value. Therefore the correct interpretation of a 95% CI is that if all possible samples were taken and a 95% CI was constructed for each sample, then 95% of these intervals would contain the unknown, but constant, population parameter being estimated. The interval is the random variable, not the population parameter, which is a constant whether you know its value or not.
I don't disagree with any of that. But I still don't retract my statement that the results on election day will fall between 48.08% and 56.17% with 95% probability. Again, even that garbled quote from MIT seems to support that. I am still having a hard time identifying our point of departure.
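For anyone who wants to retrace where 48.08% and 56.17% come from, here is a minimal sketch. It assumes the figures from the start of the thread (306 Angle responses out of 587 decided respondents, variance divided by n-1) and a two-sided normal quantile; the function name is just a placeholder.

```python
import math
from statistics import NormalDist

def normal_approx_ci(successes, n, confidence=0.95):
    """Interval for a share using the normal approximation."""
    p_hat = successes / n
    # Same variance formula as the original post: p(1-p)/(n-1).
    se = math.sqrt(p_hat * (1 - p_hat) / (n - 1))
    # Two-sided quantile, about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return p_hat - z * se, p_hat + z * se

low, high = normal_approx_ci(306, 587)
print(f"{low:.2%} to {high:.2%}")
```

This only reproduces the arithmetic. How the resulting interval may be interpreted is the part under dispute.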
Quote: WizardBut I still don't retract my statement that the results on election day will fall between 48.08% and 56.17% with 95% probability.
The wording in the polling question is "if the election were held today...," so I think technically your CI tells you that if the election were held from 10/25-10/27, Angle's voteshare would have fallen between 48.08% and 56.17% with 95% probability. And of course, 2 days ago is very different from today - giant security threat and all...
Quote: WizardI am still having a hard time identifying our point of departure.
The parameter to be estimated is a constant not a variable. This is a requirement of the methodology of a CI. It is analogous to setting a value in the null hypothesis in a statistical test.
Your statement is incorrect for several reasons:
You are calculating the limits of a symmetrical 95% range of the binomial probability function, with P unknown and N known.
1. You used the result of the sample to estimate the unknown P. This means you have error because the probability, using the normal, a continuous function, that this estimate is exactly equal to P is zero. Yes, you have a 95% range, but it is of a different distribution from the one you say you are measuring.
2. Since the variance of the binomial is a function of P, you cannot calculate it. You estimated the variance using the sample result and calculated your interval with the estimate. This procedure adds more error to your interval range.
3. You used the normal distribution to approximate the binomial. This is standard procedure, nothing wrong with it, but it is an approximation, a good one with a large sample, but one that still introduces error.
Because of these three errors introduced by the methodology, you cannot say that "the results on election day will fall between 48.08% and 56.17% with 95% probability." The probability is not 95% because the parameters you used, the sample results, are estimates.
However you could say that you are 95% confident in the interval that you have constructed.
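To get a feel for how much the plug-in variance and the normal approximation matter at this sample size, one rough check is to set the interval used in this thread beside the Wilson score interval. The Wilson interval is not something anyone in the thread used; it is swapped in here only as a comparison, because it uses the variance at each candidate value of P rather than the variance at the sample estimate. Function names are hypothetical.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile

def wald(successes, n, z=Z95):
    """Interval as computed in this thread (plug-in variance, n-1)."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / (n - 1))
    return p - half, p + half

def wilson(successes, n, z=Z95):
    """Wilson score interval (comparison only, not from the thread)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

for name, (low, high) in (("wald", wald(306, 587)),
                          ("wilson", wilson(306, 587))):
    print(f"{name}: {low:.4f} to {high:.4f}")
```

With 587 respondents the two intervals differ by well under a tenth of a percentage point, so the approximation error is small here. That does not settle the separate question of how the interval should be read.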
A more serious error is the implicit assumption in your statement that the parameters of the population will not change between the day of the sample and election day. But that is not a problem of statistical inference and is beyond the bounds of the current discussion.
This is the last post I will make on this subject.
Matilda
Quote: matildaThis is the last post I will make on this subject.
Then I won't bother to reply in depth. It would have been nice to see the correct way to use the LVRJ poll to construct a confidence interval. At least I tried.
In short, you can't use confidence intervals to predict the outcome of an election. You can only say that 95% of confidence intervals you create through proper polling will contain the true proportion of people who support a candidate. This is not a prediction, nor can it be properly used to make an accurate prediction.
Instead, you can use this paper by Medler & Tull to answer this exact question:
POLL POSITIONS AND WIN PROBABILITIES: A STOCHASTIC MODEL OF THE ELECTORAL PROCESS
The table on page 7 says that if Candidate X received 49% of the poll vote and Candidate Y received 45% of the poll vote, then Candidate X has a 90% chance of winning.
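I wrote several times yesterday to an FSA actuary about the confidence interval issue. He said that if we trust the sampling, and nobody changes his mind, then it is fine to say that:
pr(48.08% <= a <= 56.17%) = 95%, where a = Angle's actual vote share of Reid/Angle votes.
However he said I can't phrase that as "a has a 95% chance of falling between 48.08% and 56.17%." That seems just ridiculous to me.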
If the bathtub surrounds the baby, that means the baby is in the bathtub.
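One way to make a direct probability statement about a itself is to put a prior on it, which nobody in this thread has done. The sketch below assumes a flat Beta(1, 1) prior, which is purely an illustrative choice, and treats the 306 to 281 split as binomial data, so the posterior for a is Beta(307, 282). It then estimates by Monte Carlo how much posterior probability lies between 48.08% and 56.17%. The function name is a placeholder.

```python
import random

def posterior_prob_in_range(successes, failures, low, high,
                            draws=20000, seed=1):
    """Monte Carlo estimate of P(low <= a <= high | data).

    Assumes a flat Beta(1, 1) prior on a (an assumption, not
    something taken from the poll).
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(draws):
        # Posterior under a flat prior: Beta(successes+1, failures+1).
        a = rng.betavariate(successes + 1, failures + 1)
        if low <= a <= high:
            inside += 1
    return inside / draws

print(posterior_prob_in_range(306, 281, 0.4808, 0.5617))
```

Under that assumed prior the estimate comes out close to 0.95. A different prior would give a different number, which is one way to see why the two phrasings are not automatically interchangeable.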
Quote: WizardI wrote several times yesterday to an FSA actuary about the confidence interval issue. He said that if we trust the sampling, and nobody changes his mind, then it is fine to say that:
pr(48.08%<= a <=56.17%)=95%, where a = Angle's actual vote share of Reid/Angle votes.
However he said I can't phrase that as "a has a 95% chance of falling between 48.08% and 56.17%." That seems just ridiculous to me.
If the bathtub surrounds the baby, that means the baby is in the bathtub.
I guess I will write again.
To complete the analogy, what if the baby is in a different tub that you mistake for the first tub? This is what you did. You created a baby and then placed it in a different tub.
Now we know Angle got 45% of the vote. Your baby was created in a 52% tub. You placed it in another tub that we did not know the parameter of at that time. The election tells us that this tub is in fact a 45% tub and you placed your 52% tub baby into it.