Written by a data scientist and sometimes writer who sometimes applies his skills to politics.
Wednesday, March 7, 2012
Ron Paul and Cold Weather
Cold climates focus their populations on more basic issues: the less money you have, the more a harsh, less people-friendly climate affects you.
So perhaps it is no surprise that Ron Paul's brand of fiscal conservatism appeals most to people living in northern states.
A particularly interesting difference shows up along the north-to-south line of Minnesota, Iowa and Missouri. None of these states has a non-trivial Mormon population, so that can't skew the results, and none is the "home" state of a candidate. As you go south, Ron Paul's vote share dwindles.
See the map below:
Ron Paul's results in Michigan would have been much better if his chief rival, Mitt Romney, had not been born there and did not have a father who was governor there. The same is true in Massachusetts, where Romney lives and was himself governor. Paul's results in the nearby states of Vermont, New Hampshire and Maine would likely have been even better had Romney not lived and governed next door.
Sunday, February 12, 2012
Stealing back an election
Update: my Twitter attempts to goad Nate Silver and some others into rethinking Maine seem to have worked. And as I suspected, Ron Paul enthusiasts are already at work making this happen. I suspect similar efforts are under way on other Ron Paul websites as well as through the campaign's internal operations.
We need 195 more votes!
According to this, as of Nov. 8 there were 6,907 registered Republicans and 8,247 eligible independents who could vote in the Washington County, Maine GOP caucus.
If turnout at this Saturday's caucus were the same as in the Iowa caucus (5.4%), you'd have 373 Republicans and as many as 445 additional independents at the caucus. In 2008 only 113 showed up, but what was their incentive?
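The turnout projection is just the Iowa rate applied to each pool of eligible voters. A minimal Python sketch, assuming Washington County turns out at exactly the Iowa caucus rate of 5.4% for both Republicans and independents (an assumption for illustration, not a forecast):

```python
# Eligible voters in Washington County, Maine, as of Nov. 8 (figures from the post).
registered_gop = 6907
eligible_independents = 8247

# Assumption: the county turns out at the Iowa caucus rate.
iowa_turnout_rate = 0.054

def projected_attendance(pool, rate):
    """Expected caucus attendees from a pool of eligible voters, rounded to whole people."""
    return round(pool * rate)

gop_attendees = projected_attendance(registered_gop, iowa_turnout_rate)
indy_attendees = projected_attendance(eligible_independents, iowa_turnout_rate)
print(gop_attendees, indy_attendees)  # 373 445
```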
Surely the Ron Paul campaign's get-out-the-vote operation can bring 300-500 attendees out this Saturday to WIN Maine (in the eyes of the MSM).
Do not let the Maine GOP chairman declare a count with 84% of precincts reporting and Romney in the lead a FINAL VOTE.
The Iowa GOP chairman tried to give Romney a tie; now he is no longer the Iowa GOP chairman, and the media finally reported that Santorum won.
(Of course no one won any final delegates, but that's another story).
Ron Paul can win those 195 votes, and the media will have to pay attention and call him the winner of Maine.
In the 2008 Maine caucus, Washington County gave Mitt 32 more votes than Paul. I have no idea whether Ron Paul's campaign was active in Washington County in 2008, but to be on the safe side, add those 32 votes to the 195 needed above, for 227. In 2008, 113 voters showed up; add 227 to that and you get 340.
So if turnout at next Saturday's Washington County caucus more than triples, that should be enough to give Ron Paul the win in Maine.
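The vote-target arithmetic above, as a quick Python check. Like the post, it assumes every attendee beyond 2008's 113 is a Ron Paul voter; the variable names are just for illustration:

```python
# Figures from the post; the 2008 county margin is added as a safety cushion.
statewide_deficit = 195          # votes needed to overtake Romney in Maine
county_2008_romney_margin = 32   # Romney's 2008 edge over Paul in Washington County
turnout_2008 = 113               # Washington County caucus attendance in 2008

votes_needed = statewide_deficit + county_2008_romney_margin
target_turnout = turnout_2008 + votes_needed
growth_factor = target_turnout / turnout_2008

print(votes_needed, target_turnout, round(growth_factor, 2))  # 227 340 3.01
```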
But let's be safe, and shoot for finding at least 300 Ron Paul voters to show up in Washington County.
As someone from far away, I'm not legally allowed to vote, but give me a phone list and I'll make those calls.
Let's make all the votes count.
Tuesday, January 24, 2012
Solid Demographics + Most Recent Poll = Best Prediction
Reviewing the SC primary results, the poll closest to the actual voting date that had good demographics was the most accurate. That poll was PPP.
In virtually every primary, at least 10% of voters decide on election day. Voters who decide on election day or within a few days before it make up at least 20%, and in SC it was a whopping 55%!
So predicting a vote within a couple of percentage points, which matters when you have 4 to 7 viable contestants, seems to require some sort of mysticism.
But if you simply poll in those last few days and model your demographics on past demographics, adjusting for any obvious reason for a significant change from the most recent election or two (e.g. voters can fairly easily switch from one party's primary/caucus to another, even in a so-called "closed" primary, or a prior contest was moot because the party's nominee was already clear by the time it was held), you can get close without the mysticism.
Looking at the just-released PPP poll for Florida and the exit poll data, I see no reason to expect any major change in demographics from the 2008 exit poll, and for the most part the PPP poll deviates very little from it.
But there is one outlier: PPP says 40% of voters describe their political philosophy as very conservative. That is WAY off the mark. In 2008 it was 27%. In 2000 and 1996 the Florida primary was preceded by two dozen contests, so voter interest was low; how that affects political philosophy is not yet clear to me. However, the very conservative share in those two contests was even lower, at 20% and 21%. Even though Florida is growing faster than most other states, a statewide demographic shift that large in four years seems virtually impossible.
So how would this affect the results?
Let us assume that the conditional probabilities in PPP's crosstab are not significantly affected by any other demographic anomalies (one I noticed was the much smaller Hispanic sample: 7% vs. 12% in the '08 primary).
If one assumes the 2008 Florida exit polling reflects the 2012 electorate, then voters break down as 2% very liberal, 8% somewhat liberal, 28% moderate, 34% somewhat conservative, and 27% very conservative.
The conditional probability of choosing each candidate given political philosophy, from the PPP crosstab, is:
Political Philosophy | Newt | Romney | Santorum | Paul | Other |
---|---|---|---|---|---|
Very Liberal | 42% | 34% | 9% | 5% | 10% |
Somewhat Liberal | 44% | 23% | 9% | 20% | 5% |
Moderate | 30% | 39% | 6% | 17% | 9% |
Somewhat Conservative | 35% | 42% | 9% | 9% | 6% |
Very Conservative | 44% | 23% | 20% | 8% | 5% |
For Newt Gingrich, 2% x 42% + 8% x 44% + 28% x 30% + 34% x 35% + 27% x 44% comes to 36.5%.
The numbers for Mitt Romney come to 33.9%, which cuts Newt's 5-point lead in half.
The numbers for Santorum and Ron Paul reverse their positions, from a 3-point lead for Santorum to a 0.7-point lead for Ron Paul.
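The reweighting above is the law of total probability: each candidate's topline is the sum, over ideology groups, of the group's share of the electorate times the candidate's support within that group. A minimal Python sketch using the 2008 exit-poll shares and the PPP crosstab from the table. The "Other" column is left out, and the shares are not renormalized even though they sum to 99%, matching the hand calculation:

```python
# Share of 2008 Florida GOP primary voters by self-described ideology (exit poll).
electorate_2008 = {
    "very liberal": 0.02,
    "somewhat liberal": 0.08,
    "moderate": 0.28,
    "somewhat conservative": 0.34,
    "very conservative": 0.27,
}

# PPP crosstab: support (%) for each candidate within each ideology group.
crosstab = {
    "very liberal":          {"Newt": 42, "Romney": 34, "Santorum": 9,  "Paul": 5},
    "somewhat liberal":      {"Newt": 44, "Romney": 23, "Santorum": 9,  "Paul": 20},
    "moderate":              {"Newt": 30, "Romney": 39, "Santorum": 6,  "Paul": 17},
    "somewhat conservative": {"Newt": 35, "Romney": 42, "Santorum": 9,  "Paul": 9},
    "very conservative":     {"Newt": 44, "Romney": 23, "Santorum": 20, "Paul": 8},
}

def reweighted_topline(weights, conditional):
    """Sum over groups of P(group) * P(candidate | group), in percent."""
    candidates = next(iter(conditional.values())).keys()
    return {
        c: sum(weights[g] * conditional[g][c] for g in weights)
        for c in candidates
    }

topline = reweighted_topline(electorate_2008, crosstab)
for name, pct in topline.items():
    print(f"{name}: {pct:.1f}%")
```

Rounded, this gives Newt 36.5%, Romney 33.9%, Santorum 11.0% and Paul 11.7%; the 0.7-point Paul margin comes from those rounded values.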
Now consider how the media narrative changes when pollsters say someone leads by 2.5% instead of the 8% lead reported by Insider Advantage (which gives ZERO demographics and ZERO polling methodology).
The best scientific research is open and relatively easily reviewable by anyone with the time and inclination to do so. Mistakes are found, and corrected. This is in stark contrast with the majority of media polls put out every week during election season.
Saturday, January 21, 2012
SC GOP Voter Demographics
UPDATE: Actual Vote % and Exit Poll % added
It seems that very few polls try to hit more than one demographic target (if any) in their samples. Monmouth is a notable exception. Unfortunately, its latest poll was taken a week ago, before two debates, one candidate withdrawal, one candidate reentry (Cain/Colbert), a potential scandal or two, etc. Given the number of news events just before the election and the questionable demographics of their respondents, I suspect many pollsters will be lucky to come close to the actual vote percentage.
The 2012 SC GOP primary (an OPEN primary in which Democrats and independents can vote) should be a cross between the 1996 primary, when there was no Democratic primary because President Clinton ran unopposed, and the 2008 SC GOP primary, when the competition was very close (although record Democratic primary turnout drew voters away from the GOP primary). The SC population is getting older, and GOP primary voters are getting more conservative. One countertrend is that Ron Paul is running a competitive campaign this time, unlike in '08, so his dedicated following, dominated by younger voters (and the 50ish age group) and independents, will somewhat offset the other prevailing demographic trends. The 2000 primary is another kind of anomaly: the Democratic race was essentially uncontested, and one GOP candidate (McCain) was very popular with independents while the other (Bush) was also popular. So turnout was VERY high for the 2000 GOP primary.
When looking at polls before the election, it can be VERY important to look at the demographics of the people being polled. Many polls do not provide much information.
As a general rule, the less detail you see about the methods, that is, about how one could reproduce the same result, the more likely the results contain errors. In this blog entry I have omitted some steps, but I will keep revising it to add more details as time permits.
Many polls report detailed demographics ONLY in the form of crosstabs, so you never see a simple statistic saying how many people over age 65 were polled or how many people who do not consider themselves Republicans were polled.
Methodological note: When only crosstabs are provided, I estimated the range of possible values by solving many sets of simultaneous algebraic equations. A whole-number percentage such as 25% actually represents a value between 24.5% and 25.4999...%, so I tried both endpoints to find the maximum and minimum of the final result. I then narrowed the range by taking the maximum of ALL the minimum values from each permutation of simultaneous equations, and likewise the minimum of all the maximum values. Had I solved EVERY combination of simultaneous equations I might have narrowed the range a little more, but in most cases the range is reasonably narrow. The midpoint is probably very close to the true value, but I offer no math at this point to support this notion.
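Here is a stripped-down Python sketch of that bounding idea for the simplest case: a sample split into two groups (say 65+ and under 65), where a candidate's overall share is a mix of the two group shares. It is an illustration, not the exact system of equations behind the tables below. The crosstab numbers are hypothetical, and checking only the corners assumes the three rounding intervals don't overlap, which keeps the solution monotone in each input:

```python
from itertools import product

def rounding_interval(reported):
    """A whole-number percentage like 25 stands for a true value in about [24.5, 25.5)."""
    return (reported - 0.5, reported + 0.5)

def group_share_range(overall, in_group, out_group):
    """Bounds on a subgroup's share s of the sample, from one candidate's crosstab.

    Solves overall = s * in_group + (1 - s) * out_group at every corner of the
    rounding intervals. Assumes the three intervals don't overlap, so s is
    monotone in each input and its extremes sit at the corners.
    """
    shares = [
        (o - b) / (a - b)
        for o, a, b in product(*(rounding_interval(x) for x in (overall, in_group, out_group)))
    ]
    return min(shares), max(shares)

def intersect(ranges):
    """Narrow the estimate: maximum of all minimums, minimum of all maximums."""
    return max(lo for lo, _ in ranges), min(hi for _, hi in ranges)

# HYPOTHETICAL crosstab (not from any real poll): support overall, among 65+, among under-65.
romney = group_share_range(overall=28, in_group=35, out_group=26)
gingrich = group_share_range(overall=40, in_group=45, out_group=38)
low, high = intersect([romney, gingrich])
print(f"65+ share of sample: {low:.1%} to {high:.1%}")
```

With these made-up numbers, each candidate alone leaves a wide range (about 11% to 33% and 14% to 43%), and intersecting them narrows the estimate to about 14% to 33%.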
The numbers for each year are either from exit polls, census data, or GOP election results. 2012 data is my best guess based on trends and my comparisons of the different mix of candidates each election.
* Before weighting for voting history and other demographics (likely age, sex, race and perhaps party ID)
Category | Politico/GWU | Clemson | You Gov | 2012 Actual | 2012 Estimate | 2008 | 2000 | 1996 |
---|---|---|---|---|---|---|---|---|
Poll: Days Before Election | 3-4 | 2-3 | 1-3 | 0 | 0 | 0 | 0 | 0 |
Percent Undecided | 8% | 20% | 2% | 0 | 0 | 0 | 0 | 0 |
Percent May Change Choice | 5-34% | 0 | 0 | 0 | 0 | 0 | ||
Decided Day of Election | 17% | 19% | 18% | 9% | 17% | |||
Decided Day Before or 2 Days Before Election | 38% | 17% | 16% | 10% | 14% | |||
Decided within Last Week | 50% | 47% | 38% | 55% | ||||
Decided in January | 76% | 75% | 73% | |||||
Decided prior year | 24% | 25% | 27% | |||||
Voters or Polled | 600 | 429 | 759 | 601,166 | 480,000 | 445,499 | 565,704 | 278,183 |
Voters/Adult Pop Voters | 16.5% | 13.1% | 13.0% | 18.8% | 11.1% | |||
GOP Primary Winner/% | Romney 37% | Gingrich 32% | Gingrich 33% | Gingrich 40% | ?? 30% | McCain 33% | Bush 53% | Dole 45% |
GOP 2nd Place/% | Gingrich 30% | Romney 26% | Romney 29% | Romney 28% | ?? 27% | Huckabee 30% | McCain 41% | Buchanan 29% |
Dem Primary Winner/% or GOP 3rd Place | Paul 11% | Paul 11% | Paul 18% | Unopposed | Obama 55% | Gore 92% | Unopposed | |
Dem 2nd Place/% or GOP 4th Place | Santorum 10% | Santorum 9% | Santorum 16% | No One | Clinton 27% Edwards 18% | Bradley 2% | No One | |
GOP % | 71% | 71% | 65% | 78% | 61% | 69% | ||
Indy % | 27% | 25% | 28% | 18% | 30% | 26% | ||
Dem % | 2% | 4% | 7% | 5% | 9% | 5% | ||
Very Conservative % | 45% | 36% | 40% | 34% | 24% | 25% | ||
Somewhat Conservative % | 44% | 32% | 28% | 34% | 37% | 41% | ||
Moderate or Liberal% | 11% | 32% | 32% | 31% | 40% | 33% | ||
Moderate % | 23% | 22% | 24% | 29% | 25% | |||
Somewhat Liberal % | 7% | 7% | 5% | 8% | 6% | |||
Very Liberal % | 3% | 2% | 2% | 3% | 2% | |||
Male (census data) | 51% | 49% | 53% | 51% | 52% (48%) | 51% (48%) | 50% (48%) | 53% |
Age 65+ (census data) | 50% | 29% | 27% | 26% (18%) | ?? (17%) | 25% (16%) | 23% (16%) | |
Age 60+ (census data) | 64% | 36% (26%) | 35% (25%) | 35% (22%) | 32% (23%) | |||
Age 45-64 (census data) | 40% | 46% | 45% | 43% (35%) | ??% (35%) | 40% (31%) | 37% (28%) | |
Age 45-59 (census data) | 27% | 33% (27%) | 32% (27%) | 30% (25%) | 28% (21%) | |||
Age 30-44 (census data) | 8% | 16% | 19% | 20% (25%) | 23% (26%) | 25% (30%) | 31% (32%) | |
Age 18-29 (census data) | 1% | 9% | 9% | 11% (22%) | 10% (22%) | 10% (23%) | 8% (24%) |
Category | 20/20 Insight | Monmouth | ARG | NBC/Marist | PPP | 2012 Actual | 2012 Estimate |
---|---|---|---|---|---|---|---|
Poll: Days Before Election | 4-6 | 6-9 | 1-2 | 4-5 | 1-3 | 0 | 0 |
Percent Undecided | 4% | 7% | 2% | 8% | 5% | 0 | 0 |
Percent May Change Choice | 16-45% | 22% | 0 | 0 | |||
Decided Day of Election | 17% | 19% | |||||
Decided Day Before or 2 Days Before Election | 38% | 17% | |||||
Decided within Last Week | 50% | ||||||
Decided in January | 76% | 75% | |||||
Decided prior year | 24% | 25% | |||||
Voters or Polled | 512 | 963 | 600 | 684 | 1540 | 601,166 | 480,000 |
Voters/Adult Pop Voters | 16.5% | 13.1% | |||||
GOP Primary Winner/% | Romney 34% | Romney 33% | Gingrich 40% | Romney 34% | Gingrich 37% | Gingrich 40% | ?? 30% |
GOP 2nd Place/% | Gingrich 23% | Gingrich 22% | Romney 26% | Gingrich 24% | Romney 28% | Romney 28% | ?? 27% |
GOP 3rd Place | Santorum 15% | Santorum 14% | Paul 18% | Paul 11% | Santorum 16% | Santorum 17% | Paul 20% |
GOP 4th Place | Paul 11% | Paul 12% | Santorum 13% | Santorum 14% | Paul 14% | Paul 13% | Santorum 10% |
GOP % | 69% | 76% | 62.5-68.7% | 75% | 71% | 65% | |
Indy % | 30% | 24% | 31.3-37.5% | 21% | 25% | 28% | |
Dem % | 1% | 4% | 4% | 7% | |||
Very Conservative % | 41%* | 10.5-32.9% | 41% | 36% | 40% | ||
Somewhat Conservative % | 38%* | 31.7-72.9% | 35% | 32% | 28% | ||
Moderate or Liberal% | 21%* | 17.7-35.4% | 25% | 23% | 32% | ||
Moderate % | 23% | 22% | |||||
Somewhat Liberal % | 7% | 7% | |||||
Very Liberal % | 2% | 3% | |||||
Male (census data) | 52% | 52% | 40-50% | 53% | 51% | 52% (48%) | |
Age 66+ (census data) | 32% | 25% (17%) | |||||
Age 65+ (census data) | 26% | 27% | 26% (18%) | ||||
Age 45-64 (census data) | 41% | 45% | 43% (35%) | ||||
Age 46-65 (census data) | 38% | 44% (35%) | |||||
Age 45+ (census data) | 67% | 72.7-77.8% | 69% (53%) | ||||
Age 18-44 (census data) | 33% | 22.2-27.3% | 31% (47%) | ||||
Age 30-45 (census data) | 22% | 21% (26%) | |||||
Age 30-44 (census data) | 19% | 20% (25%) | |||||
Age 18-29 (census data) | 8% | 9% | 11% (22%) |
Category | We Ask America | Insider Advantage | Ipsos/Reuter | CNN/Time | 2012 Actual | 2012 Estimate |
---|---|---|---|---|---|---|
Poll: Days Before Election | 2 | 3 | 8-11 | 4-8 | 0 | 0 |
Percent Undecided | 14% | 2% | 10% | 8% | 0 | 0 |
Percent May Change Choice | 43% | 0 | 0 | |||
Decided Day of Election | 17% | 19% | ||||
Decided Day Before or 2 Days Before Election | 38% | 17% | ||||
Decided within Last Week | 50% | |||||
Decided in January | 76% | 75% | ||||
Decided prior year | 24% | 25% | ||||
Voters or Polled | 988 | 718 | 398 | 505 | 601,166 | 480,000 |
Voters/Adult Pop Voters | 16.5% | 13.1% | ||||
GOP Primary Winner/% | Gingrich 32% | Gingrich 32% | Romney 37% | Romney 33% | Gingrich 40% | ?? 30% |
GOP 2nd Place/% | Romney 28% | Romney 29% | Paul 16% | Gingrich 23% | Romney 28% | ?? 27% |
GOP 3rd Place | Paul 14% | Paul 15% | Santorum 16% | Santorum 16% | Santorum 17% | Paul 20% |
GOP 4th Place | Santorum 9% | Santorum 11% | Gingrich 12% | Paul 13% | Paul 13% | Santorum 10% |
GOP % | 100% | 63-83% | 71% | 65% | ||
Indy % | 17-37% | 25% | 28% | |||
Dem % | 4% | 7% | ||||
Very Conservative % | 36% | 40% | ||||
Somewhat Conservative % | 32% | 28% | ||||
Moderate or Liberal% | 32% | 32% | ||||
Moderate % | 23% | 22% | ||||
Somewhat Liberal % | 7% | 7% | ||||
Very Liberal % | 2% | 3% | ||||
Male (census data) | 51% | 52% (48%) | ||||
Age 65+ (census data) | 27% | 26% (18%) | ||||
Age 45-64 (census data) | 45% | 43% (35%) | ||||
Age 30-44 (census data) | 19% | 20% (25%) | ||||
Age 18-29 (census data) | 9% | 11% (22%) |
IPSOS claims to have weighted results to "South Carolina current population registered voter data by gender, age, education, ethnicity and an eight item political values scale".
Time/CNN claims to have weighted to "reflect statewide Census figures for gender, race, age, education and region of the state." Since older people are much more likely to vote, and virtually no non-whites vote in the GOP primary, it is unclear how badly this distorts the actual voting population for this primary.
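To see why the weighting target matters, here is a minimal Python sketch of post-stratification by age, using the 2012 actual voter age mix and the census adult mix from the SC tables above. The candidate support figures by age are HYPOTHETICAL, chosen only to show a candidate who does best with younger voters:

```python
# Age mix from the SC tables: 2012 actual primary voters vs. census adult population.
voter_age_mix = {"18-29": 0.11, "30-44": 0.20, "45-64": 0.43, "65+": 0.26}
census_age_mix = {"18-29": 0.22, "30-44": 0.25, "45-64": 0.35, "65+": 0.18}

# HYPOTHETICAL support for one candidate within each age group (illustration only).
support_by_age = {"18-29": 0.30, "30-44": 0.18, "45-64": 0.12, "65+": 0.08}

def weighted_topline(age_mix, support):
    """Post-stratified estimate: sum of (age group's share) * (support within group)."""
    return sum(age_mix[g] * support[g] for g in age_mix)

census_weighted = weighted_topline(census_age_mix, support_by_age)
voter_weighted = weighted_topline(voter_age_mix, support_by_age)
print(f"weighted to census: {census_weighted:.1%}")
print(f"weighted to voters: {voter_weighted:.1%}")
```

With these illustrative numbers, weighting to the census gives the candidate about 16.7%, versus about 14.1% when weighting to the people who actually vote.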
Sunday, January 15, 2012
GIGO - Garbage In Garbage Out
This is not my usual tone for this blog, but I grow weary of bad polls and bad explanations of the GOP race.
From Wikipedia:
Garbage in garbage out "was coined as a teaching mantra by George Fuechsel,[1] an IBM 305 RAMAC technician/instructor in New York. Early programmers were required to test virtually each program step and cautioned not to expect that the resulting program would 'do the right thing' when given imperfect input. The underlying principle was noted by the inventor of the first programmable computing device design:
On two occasions I have been asked,—'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."
In my years of working with health insurance data, we spent far more time cleaning up garbage data, redefining vague information into something more useful and building models that rationally explain how healthcare services were used than we ever spent on anything beyond basic arithmetic. We rarely used t-tests, R-squared values or more sophisticated statistics the way actuaries or less successful competitors did. You can't put lipstick on a pig.
As I look at the internals of various primary polls for Iowa, New Hampshire and now South Carolina, I am struck by how often a poll with seriously flawed input data is touted by the media and used to build a false narrative. The worst case has to be the late-December CNN Iowa poll, which surveyed ZERO voters who weren't registered Republicans, despite the undisputed, significant participation of non-Republicans in Iowa GOP caucuses. The right share should have been about 25%: the share seen during the last non-competitive Iowa Democratic caucus (1996), and the same number the 2012 exit polls would later validate. Because the media either didn't know this or gave so little warning, and there were no other polls for two days, the 24-hour media trumpeted a nonexistent Santorum surge based on this seriously flawed poll. So today I read an article about how evangelical leaders have coalesced around Santorum... someone who would likely have stayed in 6th place, or maybe reached 5th, in Iowa and then gone home if not for a single flawed, heavily promoted poll.
Today I see an even worse poll, from an organization I've never heard of, that somehow believes fewer than 5% of the voters in South Carolina's primary will be under 40 and 55% will be at least 65. For reference, according to the exit polls, 35% of 2008 voters were older than 60; the 2000 exit poll showed 25% older than 60.
This poll is inexplicably given a weight rating of 4 bars out of 5 in Nate Silver's otherwise credible forecasting model. Aggregators favor combining as many polls as possible, regardless of quality, hoping that with enough garbage the various garbage factors will cancel each other out. Some skilled analysts, like Nate Silver, attempt to quantify and partly discount the garbage by using theoretical formulas for sampling error on a bell curve, applying some likely useful heuristics like the age of the poll, and employing the somewhat controversial, though still likely useful, strategy of rating a pollster by how close its polls come to the actual result.
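For context, the textbook sampling-error formula assumes a simple random sample. A minimal Python sketch, using the sample sizes of three SC polls from the table above; note that nothing in it can detect a skewed sample, such as one that reaches ZERO non-Republicans:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Textbook 95% sampling margin for a proportion p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Sample sizes of three SC polls in the table; p = 0.5 gives the widest margin.
for n in (429, 600, 759):
    print(n, f"+/-{margin_of_error(0.5, n):.1%}")
```

That works out to roughly plus or minus 4.7, 4.0 and 3.6 points, and that is the only uncertainty the formula knows about.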
I guess this is better than nothing, but it is not something we did in the healthcare data analysis industry, nor did any of our competitors. Admittedly, we were spending millions of dollars on these tasks, while the polling aggregators work with budgets one or two zeros smaller.
Still, I can't believe that a little effort to adjust the data, or at least to drop a poll with problems like today's horrendously bad SC poll, isn't low-hanging fruit for these small organizations.
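A sketch of one such low-hanging-fruit check, in Python: flag any poll whose 65+ share is far from the last exit poll. The 25% benchmark is the 2008 SC exit-poll figure from the SC table above; the 10-point tolerance is an arbitrary assumption for illustration, not an industry standard:

```python
# Benchmark: share of 2008 SC GOP primary voters aged 65+ (exit poll, per the SC table).
BENCHMARK_65_PLUS = 0.25
# HYPOTHETICAL tolerance, chosen only for illustration.
MAX_DEVIATION = 0.10

def is_suspect(share, benchmark=BENCHMARK_65_PLUS, tolerance=MAX_DEVIATION):
    """True if a poll's 65+ share is implausibly far from the benchmark."""
    return abs(share - benchmark) > tolerance

polls = {
    "poll described above (55% aged 65+)": 0.55,
    "hypothetical typical poll": 0.27,
}
for name, share in polls.items():
    status = "drop or reweight" if is_suspect(share) else "ok"
    print(f"{name}: 65+ share {share:.0%} -> {status}")
```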
Nate Silver's latest article also repeats a widely spread meme that the South Carolina GOP electorate is 60% evangelical/born-again, and as evidence cites a 2008 exit poll that doesn't even ask this question. Worse yet, the 2000 GOP exit poll shows 34% belong to the religious right (which is not the same as born-again/evangelical, but it is the closest I could find). Perhaps some non-exit poll has this 60%, but before I cite it, I want to take a serious look at its internals to see whether other anomalies exist.
Another false notion:
Too often I hear that South Carolina isn't like New Hampshire, where independents are such a factor. Wrong. It is an open primary. In 2000, when Gore still had some modest competition from Senator Bradley, 39% of GOP primary voters identified themselves as Independent or Democratic. Compare that with Iowa's 2012 caucus, where there was no competitive Democratic race and 25% of GOP caucus-goers were non-Republicans.
Polls with bad age distributions and bad independent-voter percentages can heavily penalize Ron Paul, someone the establishment wants to discredit, but I'll leave that explanation for another post.
As I look at suspect polling internals, I wonder whether the accuracy of the aggregations in Iowa and New Hampshire owed a certain amount to luck, and whether a repeat of the 2008 NH Democratic primary is waiting.
So for the next week I'm going dumpster diving into SC and FL voting/polling data, sifting through the garbage, hoping to bring out some good.
Or for my dyslexic friends: IGOG. Into the Garbage, Out comes Good.
From Wikipedia:
Garbage in garbage out "was coined as a teaching mantra by George Fuechsel,[1] an IBM 305 RAMAC technician/instructor in New York. Early programmers were required to test virtually each program step and cautioned not to expect that the resulting program would 'do the right thing' when given imperfect input. The underlying principle was noted by the inventor of the first programmable computing device design:
On two occasions I have been asked,—'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."
In my years of working with health insurance data, we spent far more time cleaning up garbage data, redefining vague information into something more useful and coming up with models that rationally explain how healthcare services were used before we got to using much more than basic arithmetic. Rarely were T tests, R squared values or more sophisticated statistics used like actuaries or less successful competitors did. You can't put lipstick on a pig.
As I look at the internals of various primary polls for Iowa, New Hampshire and now South Carolina, I am struck by the number of times a poll with seriously flawed input data is touted by the media and used to build a false narrative. The worst case has to be the late December CNN Iowa poll, which surveyed ZERO voters who weren't registered Republicans, despite the undisputed significant voting by non-Republicans in Iowa GOP caucuses. The right % should have been 25%, the same % was demonstrated during the last non-competitive Iowa Democratic caucus (1996) - the same number that would be validated in the 2012 exit polls. Since the media either was ignorant of it, or gave so little warning, and there were no other polls for two days, the 24 hour media trumpted a nonexistent surge by Santorum due to this seriously flawed poll. So today, I read an article about how the evangelical leaders have coalesced around supporting Santorum... someone who would have likely stayed in 6th or maybe reached 5th place in Iowa and then returned home if not for a single flawed, heavily promoted poll.
Today I see an even worse poll, from an organization I've never heard of, that somehow assumes fewer than 5% of South Carolina primary voters will be younger than 40 and 55% will be 65 or older. For reference, according to the exit polls, 35% of 2008 voters were older than 60, and in 2000 only 25% were.
This poll is inexplicably given a weight of 4 bars out of 5 in Nate Silver's otherwise credible forecasting model. Aggregators favor combining as many polls as possible, regardless of quality, hoping that with enough garbage the various garbage factors will cancel each other out. Some skilled analysts, like Nate Silver, attempt to quantify and partly discount the garbage by using theoretical formulas for sampling error on a bell curve, applying some likely useful heuristics like the age of the poll, and employing the somewhat controversial, though still likely useful, strategy of rating a pollster by how close its polls have come to the actual results.
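The three ingredients described above (sampling error, poll age and pollster rating) can be combined into a weighted average. The sketch below is a hypothetical scheme for illustration only, not Nate Silver's actual model; the 7-day half-life and the `rating` multiplier are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    share: float     # candidate's share as a fraction (0.20 == 20%)
    n: int           # sample size
    age_days: float  # days since the poll left the field
    rating: float    # hypothetical pollster-quality multiplier in [0, 1]


def poll_weight(poll: Poll, half_life_days: float = 7.0) -> float:
    """Hypothetical weighting scheme, not any published aggregator's model."""
    # Sampling error: a proportion's variance is p(1-p)/n, so for similar p
    # the inverse-variance weight is proportional to the sample size n.
    size = poll.n
    # Recency: the weight halves every `half_life_days`.
    recency = 0.5 ** (poll.age_days / half_life_days)
    # Pollster track record, e.g. derived from past accuracy.
    return size * recency * poll.rating


def weighted_average(polls):
    weights = [poll_weight(p) for p in polls]
    total = sum(weights)
    if total == 0:
        raise ValueError("every poll has zero weight")
    return sum(w * p.share for w, p in zip(weights, polls)) / total


polls = [
    Poll(share=0.20, n=600, age_days=0, rating=1.0),
    Poll(share=0.26, n=400, age_days=7, rating=0.8),
]
print(round(weighted_average(polls), 3))  # 0.213
```

A rating of 0 drops a poll entirely, which is the blunt option argued for below; anything between 0 and 1 only discounts it.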
I guess this is better than nothing, but it is not something we did in the healthcare data analysis industry, nor did any of our competitors. Admittedly, we were spending millions of dollars on these tasks, while the polling aggregators work with budgets at least one or two zeros smaller.
Still, I can't believe that a little effort to adjust the data, or at least to drop a poll with problems like today's horrendously bad SC poll, isn't low-hanging fruit for these small organizations.
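One cheap version of "dump the poll" is to compare a poll's age mix against a benchmark and drop it when the gap is too large. This is a minimal sketch: the benchmark shares and the 0.10 cutoff are placeholders, not actual South Carolina exit poll figures, and total variation distance is just one reasonable way to measure the gap.

```python
# Hypothetical benchmark age mix for a primary electorate. These shares are
# placeholders, NOT actual South Carolina exit poll figures.
BENCHMARK = {"under 40": 0.20, "40-64": 0.50, "65+": 0.30}


def total_variation(a, b):
    """Half the sum of absolute differences: the share of respondents that
    would have to move between age groups to turn one mix into the other."""
    groups = set(a) | set(b)
    return 0.5 * sum(abs(a.get(g, 0.0) - b.get(g, 0.0)) for g in groups)


def screen_poll(age_mix, benchmark=BENCHMARK, threshold=0.10):
    """Return (keep?, distance). The 0.10 threshold is an arbitrary choice."""
    distance = total_variation(age_mix, benchmark)
    return distance <= threshold, distance


# Roughly the age mix of the SC poll described above: under 5% younger than
# 40 and 55% aged 65+, with the 40-64 share assumed to be the remaining 40%.
suspect = {"under 40": 0.05, "40-64": 0.40, "65+": 0.55}
keep, distance = screen_poll(suspect)
print(keep, round(distance, 2))  # False 0.25
```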
Nate Silver's latest article also repeats a widely spread meme that the South Carolina GOP electorate is 60% evangelical/born-again, and cites as evidence a 2008 exit poll that doesn't even ask this question. Worse yet, the 2000 GOP exit poll shows 34% belonging to the religious right (which is not the same as born-again/evangelical, but is the closest I could find). Perhaps some non-exit poll has this 60%, but before I cite it, I want to take a serious look at its internals to see whether other anomalies exist.
Another false notion:
Too often I hear that South Carolina isn't like New Hampshire, where independents are such a factor. Wrong. It is an open primary. In 2000, when Gore still had some modest competition from Senator Bradley, 39% of GOP primary voters identified themselves as Independents or Democrats. Compare that with Iowa's 2012 caucus, where the Democrats had no competitive race and 25% of GOP voters were non-Republicans.
Polls that get the age distribution and the independent-voter share this wrong can heavily penalize Ron Paul, someone the establishment wants to discredit, but I'll leave that explanation for another post.
As I look at suspect polling internals, I wonder whether the aggregators' accuracy in Iowa and New Hampshire owed something to luck, and whether a repeat of the 2008 NH Democratic primary is waiting to happen.
So for the next week I'm going dumpster diving into SC and FL voting/polling data, sifting through the garbage, hoping to bring out some good.
Or for my dyslexic friends: IGOG. Into the Garbage, Out comes Good.
Friday, January 13, 2012
New Hampshire Primary Election Trivia
Measured against the last 60 years, Romney's 2012 vote total is a relatively high percentage of all votes cast in New Hampshire's combined (both parties') primaries, per official vote totals from the New Hampshire Secretary of State.
Of the leading NH vote getters, only Henry Cabot Lodge in 1964 and Hillary Clinton in 2008 failed to win their party's nomination. The candidate who went on to be elected President has finished in the top 3 every time except 1992, when Bill Clinton came in 4th with 12.4%.
Year | Leader's % of All Votes Cast | Leading Candidate | 2nd Place (% of All Votes) | 3rd Place (% of All Votes) |
---|---|---|---|---|
1952 | 36.1% | Eisenhower | 27.8% - Taft | 15.3% - Kefauver |
1956 | 68.3% | Eisenhower | 26.2% - Kefauver | 4.6% - Stevenson |
1960 | 52.7% | Nixon | 36.8% - Kennedy | 7.5% - Fisher |
1964 | 26.9% | Lodge | 23.7% - Johnson | 16.9% - Goldwater |
1968 | 52.2% | Nixon | 18.4% - Johnson | 18.1% - McCarthy |
1972 | 38.9% | Nixon | 20.3% - Muskie | 16.3% - McGovern |
1976 | 28.9% | Ford | 28.3% - Reagan | 12.5% - Carter |
1980 | 29.0% | Reagan | 20.8% - Carter | 16.4% - Kennedy |
1984 | 39.8% | Reagan | 23.6% - Hart | 16.6% - Mondale |
1988 | 21.1% | Bush (Sr) | 15.9% - Dole | 15.7% - Dukakis |
1992 | 26.9% | Bush (Sr) | 19.1% - Buchanan | 17.0% - Tsongas |
1996 | 26.2% | Clinton (Bill) | 19.8% - Buchanan | 18.4% - Dole |
2000 | 29.4% | McCain | 19.9% - Gore | 18.6% - Bush (Jr) |
2004 | 30.3% | Kerry | 20.7% - Dean | 18.9% - Bush (Jr) |
2008 | 21.7% | Clinton (Hillary) | 20.3% - Obama | 16.9% - McCain |
2012 | 32.2% | Romney | 19.1% - Paul | 16.1% - Obama |
Comparing Obama's result with past incumbent Presidents running for reelection: all of those who received only about half the votes of their own party were not reelected, and all of those above that mark were reelected.
Year | % of Own Party's Vote | Candidate | % of All Votes |
---|---|---|---|
1956 | 98.9 | Eisenhower | 68.3 |
1964 | 95.3 | Johnson | 23.7 |
1984 | 86.4 | Reagan | 39.8 |
1996 | 84.3 | Clinton | 26.2 |
2012 | 81.1 | Obama | 16.1 |
2004 | 79.8 | Bush | 18.9 |
1972 | 67.6 | Nixon | 38.9 |
1992 | 53.2 | Bush | 26.9 |
1976 | 50.1 | Ford | 28.9 |
1968 | 49.6 | Johnson | 18.4 |
1980 | 47.1 | Carter | 20.8 |
1952 | 43.9 | Truman | 12.3 |
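The "about half" cutoff can be checked directly against the table. In this sketch the vote shares come from the table above; the reelection outcomes are added from the historical record (Truman in 1952 and Johnson in 1968 withdrew, so neither was reelected). Obama is kept separate because his 2012 result was still in the future when this was written.

```python
# (year, incumbent, % of own party's NH primary vote, won in November?)
# Shares are from the table; outcomes are historical.
INCUMBENTS = [
    (1956, "Eisenhower", 98.9, True),
    (1964, "Johnson", 95.3, True),
    (1984, "Reagan", 86.4, True),
    (1996, "Clinton", 84.3, True),
    (2004, "Bush", 79.8, True),
    (1972, "Nixon", 67.6, True),
    (1992, "Bush", 53.2, False),
    (1976, "Ford", 50.1, False),
    (1968, "Johnson", 49.6, False),  # withdrew
    (1980, "Carter", 47.1, False),
    (1952, "Truman", 43.9, False),   # withdrew
]

won = [share for _, _, share, reelected in INCUMBENTS if reelected]
lost = [share for _, _, share, reelected in INCUMBENTS if not reelected]

# The groups do not overlap: every winner's share beats every loser's share.
assert max(lost) < min(won)
print(f"lowest share that won: {min(won)}%, highest share that lost: {max(lost)}%")

obama_2012 = 81.1  # outcome unknown at the time of writing
print("Obama 2012 above the gap:", obama_2012 > min(won))
```

The gap between 53.2% and 67.6% is empty, so "about half" is where the losers cluster rather than a sharp threshold.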
Labels:
Buchanan,
Bush,
Clinton,
Dole,
Kerry,
McCain,
Mitt Romney,
New Hampshire,
Obama,
Primary,
Ron Paul,
Tsongas
Tuesday, January 10, 2012
Varying Age Demographics in NH Polls
Updated to include today's Suffolk poll release: depending on which age distribution you apply to the Suffolk poll, Romney's lead over Ron Paul ranges from 20.4 points (averaged Suffolk) down to just 18.2 points (Marist). Huntsman also drops: 0.1 points with the averaged Suffolk distribution, 1.6 with Marist's and 1.8 with Suffolk's, because Huntsman's support skews toward older voters, while Paul's skews toward young and middle-aged voters.
There is another article that also looks at the age distribution issue, claiming it favors Romney. What it apparently doesn't look at is the more detailed breakdowns: although the general claims are true, various candidates have other strengths (for example, Ron Paul does well in the 45-54 age group). But the article is from National Journal, so name credibility also helps.
I hope to locate exit poll distributions before the polls close, but time is tight: I still need to look at undecideds and others who may switch votes, particularly those switching to Huntsman, before I get back to exit poll analysis. With a 6-hour time zone lag vs. Eastern Standard Time, I guess I won't be getting much sleep tonight. ;-)
***Earlier Post****
According to many folks, pollsters adjust their raw polling data to reflect expected age, sex and geographic demographics. It is not clear how this is done, as I have yet to find a poll that provides this detail (even among polls publishing 300 pages of data). If that is true, I cannot explain why the share of likely voters in each age group changes daily in the highly regarded Suffolk poll, as shown below:
I also cannot explain why several other major polls differ from each other. The following data is taken from each pollster's most recent poll (or, for Suffolk, an average of its 9 daily tracking polls).
Nearly every candidate's appeal varies with the age of the voter. For some candidates (e.g. Ron Paul), sensitivity to the age distribution is quite high: younger voters overwhelmingly prefer him and retirees are somewhat reluctant to choose him. There is even some indication that middle-aged voters (45-54, at least within NH) strongly prefer Ron Paul. Another example is retirees' strong preference for Newt Gingrich.
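Here is a minimal sketch of why the assumed age mix matters so much: the same support-by-age numbers produce very different leads under two different likely-voter mixes. Every number here is hypothetical, chosen only to mimic the pattern described above (young voters favoring Paul, older voters favoring Romney); none of it comes from Suffolk, Marist or any other poll.

```python
# Hypothetical support (%) by age group; NOT actual NH poll numbers.
SUPPORT_BY_AGE = {
    "18-34": {"Romney": 30.0, "Paul": 40.0},
    "35-54": {"Romney": 35.0, "Paul": 25.0},
    "55+": {"Romney": 45.0, "Paul": 10.0},
}

# Two hypothetical likely-voter age mixes (each sums to 1).
AGE_MIXES = {
    "younger mix": {"18-34": 0.25, "35-54": 0.40, "55+": 0.35},
    "older mix": {"18-34": 0.10, "35-54": 0.35, "55+": 0.55},
}


def weighted_support(candidate, mix):
    """Candidate's topline when each age group counts in proportion to `mix`."""
    return sum(share * SUPPORT_BY_AGE[group][candidate] for group, share in mix.items())


for name, mix in AGE_MIXES.items():
    lead = weighted_support("Romney", mix) - weighted_support("Paul", mix)
    print(f"{name}: Romney leads Paul by {lead:.2f} points")
# younger mix: Romney leads Paul by 13.75 points
# older mix: Romney leads Paul by 21.75 points
```

Nothing about any voter's preference changed between the two lines; only the assumed age mix did.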
Many people cite debate performances, new commercials or the undefined term "momentum" as reasons for daily changes in the polls. Sometimes one hears or reads (usually in a few words or a second or two) about normal random sampling error (inaccurately described as the margin of error). But the effect of other deliberate sampling biases, like age or voter registration, is rarely mentioned. Perhaps this should change.
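For comparison, this is roughly what the usual "margin of error" covers. The sketch assumes a simple random sample and the normal approximation. It captures only random sampling error, which is exactly the point: deliberate choices like the age or registration mix shift results in ways these formulas ignore. It also shows that the margin on a lead between two candidates is noticeably larger than the margin on either candidate's share.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Half-width of a normal-approximation confidence interval for a single
    proportion p from a simple random sample of size n (z=1.96 -> ~95%)."""
    return z * math.sqrt(p * (1 - p) / n)


def lead_margin_of_error(p1, p2, n, z=1.96):
    """Same idea for the gap p1 - p2 between two candidates in one poll.
    Both shares come from the same respondents, so the variance is
    (p1 + p2 - (p1 - p2)**2) / n."""
    return z * math.sqrt((p1 + p2 - (p1 - p2) ** 2) / n)


print(round(margin_of_error(0.5, 400), 3))               # 0.049
print(round(lead_margin_of_error(0.37, 0.18, 500), 3))   # 0.063
```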
Monday, January 2, 2012
Iowa Forecast
PPP published a poll late last night with enough detail to divine movement among the candidates before the caucuses on Tuesday. Assuming everyone who isn't firmly committed moves, and moves to the second-choice candidate they named, here is how it will turn out.
Candidate | PPP Poll % | PPP Poll Likely Voters | Switch Away | Switch To | Final % |
---|---|---|---|---|---|
Paul Sunday | 21 % | 281 | -43 | 29 | 20.0 % |
Paul Weekend | 20 % | 268 | -41 | 35 | 19.6 % |
Romney Weekend | 19 % | 255 | -56 | 49 | 18.4 % |
Santorum | 18 % | 241 | -63 | 62 | 17.9 % |
Romney Sunday | 18 % | 241 | -54 | 52 | 17.9 % |
Gingrich | 14 % | 188 | -58 | 38 | 12.5 % |
Perry | 10 % | 134 | -44 | 65 | 11.6 % |
Bachmann | 8 % | 107 | -27 | 44 | 9.2 % |
Huntsman | 4 % | 54 | -14 | 16 | 4.1 % |
Roemer | 2 % | 27 | -6 | 12 | 2.5 % |
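The table's arithmetic appears to be: each candidate keeps their likely voters, loses the uncommitted ones and gains those who named them as a second choice, all divided by the total sample. This sketch reproduces that under one assumption: a total of about 1,340 likely voters, inferred from 268 respondents being 20%. It uses only the weekend rows to avoid double-counting Paul and Romney. A few rows come out a tenth of a point off the table, presumably because the published counts are rounded.

```python
# Assumption: PPP's sample had about 1,340 likely voters, inferred from the
# table (268 respondents = 20%). Undecided/other respondents are part of the
# denominator but are not listed as candidates.
TOTAL_LIKELY_VOTERS = 1340

# candidate: (likely voters, switching away, switching to), weekend rows only
PPP_ROWS = {
    "Paul": (268, 41, 35),
    "Romney": (255, 56, 49),
    "Santorum": (241, 63, 62),
    "Gingrich": (188, 58, 38),
    "Perry": (134, 44, 65),
    "Bachmann": (107, 27, 44),
    "Huntsman": (54, 14, 16),
    "Roemer": (27, 6, 12),
}


def final_shares(rows, total=TOTAL_LIKELY_VOTERS):
    """Each candidate keeps their voters, loses the uncommitted ones, and
    gains those who named them as second choice."""
    return {
        name: 100.0 * (voters - away + to) / total
        for name, (voters, away, to) in rows.items()
    }


shares = final_shares(PPP_ROWS)
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {pct:5.1f} %")
```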
Friday, December 30, 2011
The Cure for Racism
A number of people with hardened hearts are spreading disinformation about the most honest presidential candidate. They ignore a fundamental truth: actions speak louder than words.
Ron Paul, like everyone, probably has a mild bit of the human tendency to overclassify people at a mostly subconscious level. If you want to believe the worst about Ron Paul, here is the only credible evidence, and it paints him no worse than virtually any human being.
Friday, December 16, 2011
The Times They Are Accelerating
Is it just me, or are the frontrunners' boom-and-bust cycles getting shorter?
All Iowa polls from polling firms rated by New York Times polling analyst Nate Silver (except polls from ARG and Insider Advantage, which are rated near the bottom). Trends use a 3-poll moving average.
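A 3-poll moving average smooths out single-poll noise by averaging each poll with the two before it. This sketch assumes a trailing window (the chart may have used a different alignment), and the input shares are hypothetical.

```python
def moving_average(values, window=3):
    """Trailing moving average: point i averages values[i-window+1 .. i].
    The first window-1 polls are skipped because no full window exists yet."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical candidate shares from five consecutive polls, oldest first.
shares = [10, 12, 20, 26, 18]
print([round(x, 1) for x in moving_average(shares)])  # [14.0, 19.3, 21.3]
```

A shorter window reacts faster to a real surge but also to a single bad poll, which matters when booms and busts are only a few polls long.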
Tuesday, September 27, 2011
Jon Stewart - Liberal or Libertarian?
Jon Stewart gave Ron Paul 10 minutes of network time. I think that is the longest on-air interview the Daily Show has ever done (including the web-only parts, it runs 17 1/2 minutes).
It is fitting that Jon Stewart is the one who gave him so much time, since he single-handedly got the media to stop (blatantly) ignoring him.
Now the ignoring is more subtle.
Sunday, September 25, 2011
Solving the Paul/Cain Arizona Straw Poll Winner Riddle
In a memorable movie Kathleen Turner says to her teacher "I happen to know that in the future I will not have the slightest use for algebra, and I speak from experience".
But to solve the riddle of major news organizations (CNN, Reuters, Politico, etc) offering conflicting stories about who won the Arizona Tea Party Straw Poll, Ron Paul or Herman Cain, a little high school algebra does wonders.
Ron Paul did win 49% of one part of an Arizona Tea Party straw poll and 15% of another part of an Arizona Tea Party straw poll.
Herman Cain did win 12% of one part of an Arizona Tea Party straw poll and 22% of another part of an Arizona Tea Party straw poll.
Ron Paul did get 581 straw poll votes.
Herman Cain did get 256 straw poll votes.
About 1600 people did take part in the Arizona Tea Party straw polls.
It is probably true that 1600 people paid up to $125 to attend a one-day onsite event.
It is probably true that 2300 (maybe even 3000) people paid up to $10 to view this event online.
The other claims in the various articles, that 2300 people took part in an online poll and 1600 people took part in a live/onsite poll, are FALSE.
High school algebra is how you sort truth from sloppy reporting, because the number of people taking each version of this poll was never reported correctly by the news media or the straw poll organizers. Why is for someone else to figure out. I just do math.
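The algebra is two equations in two unknowns: the sizes of the two parts of the straw poll. This sketch assumes Paul's 49% and Cain's 12% came from the same part (called part A here, since the reports don't say which part was online). The other pairing produces a negative number of voters, so it can't be right. The resulting total of about 1,616 matches the roughly 1600 total participants.

```python
def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("no unique solution")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return x, y


# Paul: 49% of part A + 15% of part B = 581 votes
# Cain: 12% of part A + 22% of part B = 256 votes
part_a, part_b = solve_2x2(0.49, 0.15, 581, 0.12, 0.22, 256)
print(round(part_a), round(part_b), round(part_a + part_b))  # 996 620 1616

# The other pairing (Paul's 49% with Cain's 22%) gives a negative part size.
_, impossible_b = solve_2x2(0.49, 0.15, 581, 0.22, 0.12, 256)
print(impossible_b < 0)  # True
```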
Saturday, September 24, 2011
Ron Paul says he'd consider putting Dennis Kucinich in his Cabinet
Not your typical politicians...
"You've got to give credit to people who think," he said.
"Being pragmatic is about forming coalitions," Paul said Wednesday at a breakfast sponsored by the Christian Science Monitor. "I probably work with coalitions better than the other candidates. I don't think I've said anything negative here about the president."
Mainstream Media Takes Paul Seriously
The mainstream media is finally starting to treat Ron Paul's candidacy seriously
Huffington Post likes Paul's chances
Michele Bachmann is no longer considered credible, and the author believes Paul has a shot at supplanting Perry as the best non-Romney. I am surprised and pleased.
Monday, September 19, 2011
Green Libertarian
Is it so weird to vote for both Ron Paul and Ralph Nader?
Apparently not. There is a name for such behavior.
Green Libertarianism
Read more on Wikipedia