Tuesday, January 24, 2012

Immigration - A Santorum Personal History

It strikes me as odd how strident Rick Santorum is against illegal immigration, given that he is the first generation of his family born in America. It seems possible that his grandfather got to America just before (or, perhaps not so legally, just after) the draconian Immigration Act of 1924, and that his father had to wait until he was 7 years old before he could join his dad. I'm not sure why Rick Santorum talks about 5 years. This immigration act was targeted at Italians and Jews, cutting Italian immigration from 200,000 per year to 4,000 per year (while increasing immigration quotas for "favored" nations).

I wonder whether, if someone did some real investigating, they would find some detail, perhaps embarrassing or perhaps some deep psychological wound, that would explain why Rick Santorum is so unforgiving of people wanting to move to America despite the roadblocks put up by this generation's bigots. When his father was born, Italians and Jews were the target of bigots in America. Today it is Mexicans. The target changes, and the "legal" means of enforcing the bigotry change, but the fundamental bigotry is still there.

Solid Demographics + Most Recent Poll = Best Prediction

Reviewing the SC primary results, the poll closest to the actual voting date that had good demographics was the most accurate. That poll was PPP.

Virtually every primary has at least 10% of the vote being decided on the day of the election. Voters who decided on the day of the election or within a few days make up at least 20% and in SC it was a whopping 55%!

So trying to predict a vote within a couple percentage points, which is important when you have 4 to 7 viable contestants, seems to require some sort of mysticism to be accurate.

But if you simply poll in those last few days and model your demographics on past demographics, taking into account any obvious reasons for a significant change from the most recent election or two (e.g. voters can reasonably easily switch from one primary/caucus to another, even in a so-called "closed" primary, or a prior contest was moot because the party's nominee was clear by the time that prior year's contest was held, etc.), you get the best prediction available.

Anyway, looking at the just released PPP poll for Florida and reviewing the exit poll data, I see no reason to consider any major change in demographics from the 2008 exit poll and for the most part, the PPP poll deviates very little from that.

But here is one outlier: it says 40% of voters describe their political philosophy as very conservative. That is WAY off the mark. In 2008 it was 27%. In 2000 and 1996 the Florida primary was preceded by two dozen contests, so voter interest was low. How that affects political philosophy is not yet clear to me. However, the very conservative % in those two contests was even lower, at 20% and 21%. Even though Florida is a state with higher growth than most other states, a demographic shift of this size statewide seems virtually impossible.

So how would this affect the results?
Let us assume that the conditional probabilities in the crosstab reported by PPP are not significantly affected by any other demographic anomalies (one I noted was the much lower Hispanic sample of 7% vs. 12% in the '08 primary).

So the conditional probability of choosing a candidate based on your political philosophy is:
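| Political Philosophy | Newt | Romney | Santorum | Paul | Other |
|---|---|---|---|---|---|
| Very Liberal | 42% | 34% | 9% | 5% | 10% |
| Somewhat Liberal | 44% | 23% | 9% | 20% | 5% |
| Moderate | 30% | 39% | 6% | 17% | 9% |
| Somewhat Conservative | 35% | 42% | 9% | 9% | 6% |
| Very Conservative | 44% | 23% | 20% | 8% | 5% |
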
If one assumes the exit polling in the 2008 primary is reflective of the 2012 exit polling, then the % of voters for each political philosophy is 2% very liberal, 8% somewhat liberal, 28% moderate, 34% somewhat conservative, and 27% very conservative.

For Newt Gingrich, 2%x42% + 8%x44% + 28%x30% etc... you get 36.5%.
The numbers for Mitt Romney calculate to 33.9%, which cuts Newt's 5 pt lead roughly in half.
The numbers for Santorum and Ron Paul reverse their positions from a 3 pt lead for Santorum to a .7 pt lead for Ron Paul.
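
The reweighting above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic described in this post: the crosstab and the 2008 exit-poll shares are the figures quoted above, while the function name `reweight` and its `normalize` option are my own illustrative choices. Note that the 2008 shares sum to 99% because of rounding; the default here leaves that as is, which is what reproduces the 36.5% and 33.9% figures.

```python
# Conditional vote shares (%) by political philosophy, from the PPP crosstab above.
# Column order: Gingrich, Romney, Santorum, Paul, Other.
CANDIDATES = ["Gingrich", "Romney", "Santorum", "Paul", "Other"]
CROSSTAB = {
    "Very Liberal":          [42, 34, 9, 5, 10],
    "Somewhat Liberal":      [44, 23, 9, 20, 5],
    "Moderate":              [30, 39, 6, 17, 9],
    "Somewhat Conservative": [35, 42, 9, 9, 6],
    "Very Conservative":     [44, 23, 20, 8, 5],
}

# 2008 exit-poll share of the electorate (%) for each philosophy group.
# These sum to 99, not 100, because the published figures are rounded.
WEIGHTS_2008 = {
    "Very Liberal": 2,
    "Somewhat Liberal": 8,
    "Moderate": 28,
    "Somewhat Conservative": 34,
    "Very Conservative": 27,
}


def reweight(crosstab, weights, normalize=False):
    """Weighted average of each candidate's conditional share.

    With normalize=False the weights are treated as percentages of 100,
    matching the hand calculation in the post. With normalize=True they
    are rescaled to sum to exactly 100% (a hypothetical refinement).
    """
    total = sum(weights.values()) if normalize else 100.0
    return {
        name: sum(weights[group] * crosstab[group][i] for group in crosstab) / total
        for i, name in enumerate(CANDIDATES)
    }


shares = reweight(CROSSTAB, WEIGHTS_2008)
for name, value in shares.items():
    print(f"{name}: {value:.1f}%")
```

Swapping in a different set of weights (for example the 2000 or 1996 exit-poll shares) is a one-line change, which makes it easy to see how sensitive the topline is to the assumed electorate.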

Now consider what the media narrative is when the pollsters say someone is leading by 2.5 % vs. the 8% lead reported by Insider Advantage (who gives ZERO demographics, ZERO polling methodology).

The best scientific research is open and relatively easily reviewable by anyone with the time and inclination to do so. Mistakes are found, and corrected. This is in stark contrast with the majority of media polls put out every week during election season.

Sunday, January 22, 2012

The Myth of SC's Importance in choosing Presidents

1) Somehow, the SC GOP has gotten a lot of pundits to believe that the choice of SC voters is what determines the nominee of the GOP.

Related myths of political significance are
2) No one has won the first two contests (and until last week, when the IA result was changed, Romney was believed to be the first)
3) There has never been a different winner in the first three contests (which presumes that SC is always in the first three contests)

Myth 3)
2012 is the first year that SC was in the first three contests. In 2008, besides IA and NH, the Michigan primary and the Wyoming caucus preceded SC. Huckabee, Romney and McCain won the first three events.
In 2000, the Delaware primary and Alaska caucus also preceded SC. In 1996 the first three events were the Alaska, Louisiana and Iowa caucuses, and Buchanan's win in Louisiana was a major factor in his performance in Iowa.
In 1988, Bush, Robertson and Dole won the first three events, with Robertson's Hawaii caucus win being a major factor in his performance in the 4th event (the Iowa caucus).

Myth 1 and 3)
In 1996, it was the NINTH contest of the year.
In 1988, it was the THIRTEENTH contest of the year.
After 8 or 12 contests, the winner of the nomination is clear. Whatever SC has to say about it is irrelevant.
In 1980 it was the 5th primary of the year (6th election including caucuses). Even then, it would be 2 months before the eventual nominee stopped losing primary/caucuses.
Prior to 1980 they had no vote (caucus or primary) so it had zero relevance.
No event had relevance in 1984 or 2004 when incumbent presidents were reelected president. Possibly in 1992 it was of minor significance though Bush Sr's competition was effectively over after the New Hampshire primary.

Myth 2)
In 2000 Bush won both the Alaska and subsequent Iowa caucuses (NH was the 3rd election that year). In 1996, Alaska was joined by Louisiana as pre-Iowa caucuses, and both were won by Buchanan.
In 1980, Bush Sr won the first two contests, the Iowa caucus followed by another primary that was NOT New Hampshire. It was Puerto Rico.
In 1968, Nixon won the first two contests (Wisconsin being the 2nd primary after New Hampshire). Iowa did not begin its caucuses until 1972.

Saturday, January 21, 2012

SC GOP Voter Demographics

UPDATE: Actual Vote % and Exit Poll % added
It seems that very few polls actually try to reach more than one (if any) demographic targets within their surveys. Monmouth is a notable exception. Unfortunately their latest poll was a week ago, before two debates, one candidate withdrawal, one candidate reentry (Cain/Colbert), a potential scandal or two, etc. Given the number of news events just before the election and the questionable demographics of their pollees, I suspect many pollsters will be lucky if they come close to the actual vote percentage.

The 2012 SC GOP Primary (which is an OPEN primary in which Democrats or Independents can vote) should be a cross between the 1996 election where there was no Democratic primary as President Clinton ran unopposed and the 2008 SC GOP primary election where the competition was very close (however the record Democratic primary turnout drew away voters from the GOP primary). The SC population is getting older, and the GOP primary voters are getting more conservative. One counter trend is that Ron Paul is running a competitive campaign this time vs. '08 so his dedicated following dominated by younger voters (and the 50ish age group) and independent voters will somewhat offset other prevailing demographic trends. The 2000 election is another kind of anomaly in that the Democratic election was essentially unopposed and one popular candidate was very popular with independents (McCain) while the other candidate was also popular (Bush). So turnout was VERY high for the 2000 GOP primary.

When looking at polls before the election, it can be VERY important to look at the demographics of the people being polled. Many polls do not provide much information.

As a general rule, the less detail you see, or the more minimal the methods you see described to explain how one could reproduce the same result, the more likely there are to be errors in the results. In the case of this blog entry, I have omitted steps, but I will be continually revising this, to add more details, as time permits.

Many polls reported detailed demographics ONLY in the form of crosstabs so that you never see a simple statistic saying how many people over the age 65 were polled or how many people who do not consider themselves Republican were polled.

Methodological note: When only crosstabs are provided, I was able to estimate the range of possible values by solving many simultaneous algebraic equations, taking into account that a % without a decimal point is rounded (say, 25% actually represents a number between 24.5% and 25.4999999...%), and trying both extreme values to determine the maximum and minimum of the final result. I then reduced the range by using the maximum of ALL minimum values from each permutation of simultaneous equations. Likewise, I used the minimum of all maximum values. Perhaps if I had solved EVERY combination of simultaneous equations I could have narrowed the range a little more, but the range in most cases is reasonably narrow. Probably the midpoint is very close to the true value, but I offer no math at this point to support this notion.
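
To make the methodological note concrete, here is a minimal sketch of the idea for the simplest case: one demographic group and its complement. The crosstab numbers are hypothetical (not taken from any of the polls discussed here), and the function names are my own. The sketch assumes each reported whole-number percentage could be off by up to half a point, and that the in-group and out-of-group percentages differ by more than one point so the equation never divides by zero.

```python
from itertools import product


def rounding_interval(reported_pct, half_width=0.5):
    """A whole-number percentage such as 25% stands for roughly 24.5 to 25.5."""
    return (reported_pct - half_width, reported_pct + half_width)


def group_share_range(overall, in_group, out_group):
    """Range of the group share s that solves
        overall = s * in_group + (1 - s) * out_group
    when each reported percentage is only known to within rounding.
    Every combination of low and high endpoints is tried."""
    shares = []
    for o, a, b in product(rounding_interval(overall),
                           rounding_interval(in_group),
                           rounding_interval(out_group)):
        shares.append((o - b) / (a - b))
    return min(shares), max(shares)


def intersect(ranges):
    """Narrow the estimate: largest of the minimums, smallest of the maximums."""
    lows, highs = zip(*ranges)
    return max(lows), min(highs)


# Hypothetical crosstab rows, one per candidate:
# (overall %, % among voters 65+, % among voters under 65)
hypothetical_rows = [(40, 50, 35), (28, 20, 32)]

ranges = [group_share_range(*row) for row in hypothetical_rows]
low, high = intersect(ranges)
print(f"Share of respondents aged 65+: between {low:.1%} and {high:.1%}")
```

Each candidate's row gives its own range for the hidden share of respondents in the group, and intersecting those ranges is the "maximum of all minimums, minimum of all maximums" step. More rows generally narrow the range further.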

The numbers for each year are either from exit polls, census data, or GOP election results. 2012 data is my best guess based on trends and my comparisons of the different mix of candidates each election.
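| Category | Politico/GWU | Clemson | YouGov | 2012 Actual | 2012 Estimate | 2008 | 2000 | 1996 |
|---|---|---|---|---|---|---|---|---|
| Poll: Days Before Election | 3-4 | 2-3 | 1-3 | 0 | 0 | 0 | 0 | 0 |
| Percent Undecided | 8% | 20% | 2% | 0 | 0 | 0 | 0 | 0 |
| Percent May Change Choice | 5-34% | | | 0 | 0 | 0 | 0 | 0 |
| Decided Day of Election | | | | 17% | 19% | 18% | 9% | 17% |
| Decided Day Before or 2 Days Before Election | | | | 38% | 17% | 16% | 10% | 14% |
| Decided within Last Week | | | | | 50% | 47% | 38% | 55% |
| Decided in January | | | | 76% | 75% | 73% | | |
| Decided prior year | | | | 24% | 25% | 27% | | |
| Voters or Polled | 600 | 429 | 759 | 601,166 | 480,000 | 445,499 | 565,704 | 278,183 |
| Voters/Adult Pop Voters | | | | 16.5% | 13.1% | 13.0% | 18.8% | 11.1% |
| GOP Primary Winner/% | Romney 37% | Gingrich 32% | Gingrich 33% | Gingrich 40% | ?? 30% | McCain 33% | Bush 53% | Dole 45% |
| GOP 2nd Place/% | Romney 30% | Romney 26% | Romney 29% | Romney 28% | ?? 27% | Huckabee 30% | McCain 41% | Buchanan 29% |
| Dem Primary Winner/% or GOP 3rd Place | Paul 11% | Paul 11% | Paul 18% | Unopposed | Unopposed | Obama 55% | Gore 92% | Unopposed |
| Dem 2nd Place/% or GOP 4th Place | Santorum 10% | Santorum 9% | Santorum 16% | No One | No One | Clinton 27%, Edwards 18% | Bradley 2% | No One |
| GOP % | | | 71% | 71% | 65% | 78% | 61% | 69% |
| Indy % | | | 27% | 25% | 28% | 18% | 30% | 26% |
| Dem % | | | 2% | 4% | 7% | 5% | 9% | 5% |
| Very Conservative % | | | 45% | 36% | 40% | 34% | 24% | 25% |
| Somewhat Conservative % | | | 44% | 32% | 28% | 34% | 37% | 41% |
| Moderate or Liberal % | | | 11% | 32% | 32% | 31% | 40% | 33% |
| Moderate % | | | | 23% | 22% | 24% | 29% | 25% |
| Somewhat Liberal % | | | | 7% | 7% | 5% | 8% | 6% |
| Very Liberal % | | | | 3% | 2% | 2% | 3% | 2% |
| Male (census data) | 51% | 49% | 53% | 51% | 52% (48%) | 51% (48%) | 50% (48%) | 53% |
| Age 65+ (census data) | 50% | | 29% | 27% | 26% (18%) | ?? (17%) | 25% (16%) | 23% (16%) |
| Age 60+ (census data) | 64% | | | | 36% (26%) | 35% (25%) | 35% (22%) | 32% (23%) |
| Age 45-64 (census data) | 40% | | 46% | 45% | 43% (35%) | ??% (35%) | 40% (31%) | 37% (28%) |
| Age 45-59 (census data) | 27% | | | | 33% (27%) | 32% (27%) | 30% (25%) | 28% (21%) |
| Age 30-44 (census data) | 8% | | 16% | 19% | 20% (25%) | 23% (26%) | 25% (30%) | 31% (32%) |
| Age 18-29 (census data) | 1% | | 9% | 9% | 11% (22%) | 10% (22%) | 10% (23%) | 8% (24%) |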


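| Category | 20/20 Insight | Monmouth | ARG | NBC/Marist | PPP | 2012 Actual | 2012 Estimate |
|---|---|---|---|---|---|---|---|
| Poll: Days Before Election | 4-6 | 6-9 | 1-2 | 4-5 | 1-3 | 0 | 0 |
| Percent Undecided | 4% | 7% | 2% | 8% | 5% | 0 | 0 |
| Percent May Change Choice | | | | 16-45% | 22% | 0 | 0 |
| Decided Day of Election | | | | | | 17% | 19% |
| Decided Day Before or 2 Days Before Election | | | | | | 38% | 17% |
| Decided within Last Week | | | | | | | 50% |
| Decided in January | | | | | | 76% | 75% |
| Decided prior year | | | | | | 24% | 25% |
| Voters or Polled | 512 | 963 | 600 | 684 | 1540 | 601,166 | 480,000 |
| Voters/Adult Pop Voters | | | | | | 16.5% | 13.1% |
| GOP Primary Winner/% | Romney 34% | Romney 33% | Gingrich 40% | Romney 34% | Gingrich 37% | Gingrich 40% | ?? 30% |
| GOP 2nd Place/% | Gingrich 23% | Gingrich 22% | Romney 26% | Gingrich 24% | Romney 28% | Romney 28% | ?? 27% |
| GOP 3rd Place | Santorum 15% | Santorum 14% | Paul 18% | Paul 11% | Santorum 16% | Santorum 17% | Paul 20% |
| GOP 4th Place | Paul 11% | Paul 12% | Santorum 13% | Santorum 14% | Paul 14% | Paul 13% | Santorum 10% |
| GOP % | | 69% | 76% | 62.5-68.7% | 75% | 71% | 65% |
| Indy % | | 30% | 24% | 31.3-37.5% | 21% | 25% | 28% |
| Dem % | | 1% | | | 4% | 4% | 7% |
| Very Conservative % | | 41%* | | 10.5-32.9% | 41% | 36% | 40% |
| Somewhat Conservative % | | 38%* | | 31.7-72.9% | 35% | 32% | 28% |
| Moderate or Liberal % | | 21%* | | 17.7-35.4% | 25% | 23% | 32% |
| Moderate % | | | | | | 23% | 22% |
| Somewhat Liberal % | | | | | | 7% | 7% |
| Very Liberal % | | | | | | 2% | 3% |
| Male (census data) | | 52% | 52% | 40-50% | 53% | 51% | 52% (48%) |
| Age 66+ (census data) | | | | | 32% | | 25% (17%) |
| Age 65+ (census data) | | 26% | | | | 27% | 26% (18%) |
| Age 45-64 (census data) | | 41% | | | | 45% | 43% (35%) |
| Age 46-65 (census data) | | | | | 38% | | 44% (35%) |
| Age 45+ (census data) | | 67% | | 72.7-77.8% | | | 69% (53%) |
| Age 18-44 (census data) | | 33% | | 22.2-27.3% | | | 31% (47%) |
| Age 30-45 (census data) | | | | | 22% | | 21% (26%) |
| Age 30-44 (census data) | | | | | | 19% | 20% (25%) |
| Age 18-29 (census data) | | | | | 8% | 9% | 11% (22%) |
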
* Before weighting due to voting history and other demographics (likely age, sex, race and perhaps party id)


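| Category | We Ask America | Insider Advantage | Ipsos/Reuters | CNN/Time | 2012 Actual | 2012 Estimate |
|---|---|---|---|---|---|---|
| Poll: Days Before Election | 2 | 3 | 8-11 | 4-8 | 0 | 0 |
| Percent Undecided | 14% | 2% | 10% | 8% | 0 | 0 |
| Percent May Change Choice | | | | 43% | 0 | 0 |
| Decided Day of Election | | | | | 17% | 19% |
| Decided Day Before or 2 Days Before Election | | | | | 38% | 17% |
| Decided within Last Week | | | | | | 50% |
| Decided in January | | | | | 76% | 75% |
| Decided prior year | | | | | 24% | 25% |
| Voters or Polled | 988 | 718 | 398 | 505 | 601,166 | 480,000 |
| Voters/Adult Pop Voters | | | | | 16.5% | 13.1% |
| GOP Primary Winner/% | Gingrich 32% | Gingrich 32% | Romney 37% | Romney 33% | Gingrich 40% | ?? 30% |
| GOP 2nd Place/% | Romney 28% | Romney 29% | Paul 16% | Gingrich 23% | Romney 28% | ?? 27% |
| GOP 3rd Place | Paul 14% | Paul 15% | Santorum 16% | Santorum 16% | Santorum 17% | Paul 20% |
| GOP 4th Place | Santorum 9% | Santorum 11% | Gingrich 12% | Paul 13% | Paul 13% | Santorum 10% |
| GOP % | | | 100% | 63-83% | 71% | 65% |
| Indy % | | | | 17-37% | 25% | 28% |
| Dem % | | | | | 4% | 7% |
| Very Conservative % | | | | | 36% | 40% |
| Somewhat Conservative % | | | | | 32% | 28% |
| Moderate or Liberal % | | | | | 32% | 32% |
| Moderate % | | | | | 23% | 22% |
| Somewhat Liberal % | | | | | 7% | 7% |
| Very Liberal % | | | | | 2% | 3% |
| Male (census data) | | | | | 51% | 52% (48%) |
| Age 65+ (census data) | | | | | 27% | 26% (18%) |
| Age 45-64 (census data) | | | | | 45% | 43% (35%) |
| Age 30-44 (census data) | | | | | 19% | 20% (25%) |
| Age 18-29 (census data) | | | | | 9% | 11% (22%) |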

IPSOS claims to have weighted results to "South Carolina current population registered voter data by gender, age, education, ethnicity and an eight item political values scale".

Time/CNN claims to have weighted to "reflect statewide Census figures for gender, race, age, education and region of the state." Since younger people are much less likely to vote, and virtually no non-whites vote in the GOP primary, it is unclear how much this distorts the actual voting population for this primary.

Friday, January 20, 2012

Tuesday, January 17, 2012

The Unexamined Assumption

I've been recently reading some blogs from very good statisticians and modelers, who've made me question things EVEN MORE than I normally do.

So when I today read two columns here and here about how voters would consolidate if other candidates were gone, I let my skepticism run wild.

These columns (and some of my own forecasts) read data from a poll that asks 'Who is your second choice candidate' and assume that the voter reading this question was thinking "If for ANY REASON WHATSOEVER I decide not to vote for the guy I listed a few moments ago as my first choice candidate, then when the date and time comes to actually vote, I will vote for the person I am about to write down as my second choice."

In reality, there are many possible scenarios for why and how someone decides to change their mind. It is HIGHLY unlikely that the authors of these two articles, myself when I was making some forecasts, or any randomly chosen pundit will have considered all of the following and assigned conditional probabilities to each one (more likely they said there was a 100% chance of scenario Z, Z being whichever of these, or of others I didn't write down, appeals to them most).

So, what are some scenarios:

1) Candidate 1st choice may withdraw from the race (or, scenario 1B, appear unlikely to do well on election day) before I actually vote, and I think candidate 2nd choice has similar enough values to candidate 1st choice

2) By the time I vote, candidate 1st choice may look highly unlikely to do well in the vote, in which case I will go with candidate 2nd choice because he will likely do well and he is tolerable (and I *may* even tell acquaintances that I voted for him)

Both scenarios are plausible. What percentage of scenario 1 people are in a given cross tab of 1st choice vs. 2nd choice in some poll will vary widely depending on what other characteristics the actual candidate 1st choice and candidate 2nd choice have and the probability that the person will vote for the second choice.

No polls provide us with this information. So each reader makes their own (likely unconscious) decision.

Just to make it less abstract, consider that the election is this Saturday's SC primary, Santorum is candidate 1st choice, and in scenario 1 candidate 2nd choice is Gingrich while in scenario 2 candidate 2nd choice is Romney.

Going back to the abstraction, there are other plausible scenarios the pollee had in mind when they made their answer.

3) I don't know anything about any of the other candidates except that I've heard candidate 2nd choice's name a lot, and I feel obliged to vote for someone. I'll choose candidate 2nd choice if at the last minute candidate 1st choice withdraws (or scenario 3B - if I hear bad news at the last minute from TV/my friend/a false rumor in my email or mailbox)

4) I'm tired of waiting for this automated poll to end, and I don't really care about this 2nd choice question, so I'll just give them this answer for 2nd choice quickly so I can finish the poll.

5) Candidate 1st choice is my first choice today, though candidate 2nd choice was my first choice last week, and I might go back to him

6) Candidate 1st choice is actually my significant other's REALLY favorite first choice and he/she may not be able to vote, so I'll be nice and cast my vote for that candidate on their behalf, but if they are able to vote, then I will vote for candidate 2nd choice because he seems pretty good.

I am sure creative minds can think up a dozen more examples that at least some pollees had in mind when they cast their 2nd choice 'vote' in the poll that has no actual binding requirement/lasting effect on their actual vote.

The point is, don't be so sure you know how voters will change their vote. At least not if you are going to believe this is going to cause a double digit swing between today's polls and Saturday's actual voting.
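
To put some arithmetic behind that caution, here is a small sketch of what "assigning conditional probabilities to each scenario" would look like. Every number in it is a made-up illustration, not something measured by any poll: the share of respondents in each scenario, and the chance that a respondent in that scenario really ends up voting for the second choice they named.

```python
# Hypothetical mix of the scenarios listed above for one
# first-choice/second-choice pair in a poll crosstab.
# Each entry: (share of respondents in that scenario,
#              chance they actually vote for the named second choice).
# All numbers are illustrative assumptions.
SCENARIOS = {
    "1: first choice withdraws, second has similar values": (0.25, 0.90),
    "2: first choice looks unlikely to do well":             (0.25, 0.60),
    "3: only knows the second choice by name":               (0.20, 0.30),
    "4: rushed answer to finish the poll":                   (0.15, 0.10),
    "5: wavering between the two candidates":                (0.10, 0.50),
    "6: voting on behalf of a significant other":            (0.05, 0.50),
}


def expected_transfer(scenarios):
    """Overall chance that a stated second choice becomes an actual vote."""
    total_share = sum(share for share, _ in scenarios.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("scenario shares must sum to 1")
    return sum(share * chance for share, chance in scenarios.values())


rate = expected_transfer(SCENARIOS)
print(f"Stated second choices that become votes: {rate:.1%}")
```

Under these invented inputs only about half of the stated second choices would turn into votes, versus the 100% implicitly assumed when a crosstab is read at face value. Different inputs give different answers, which is exactly the problem: the poll does not tell us which inputs are right.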

Sunday, January 15, 2012

GIGO - Garbage In Garbage Out

This is not my usual tone for this blog, but I grow weary of bad polls and bad explanations of the GOP race.

From Wikipedia:
Garbage in garbage out "was coined as a teaching mantra by George Fuechsel, an IBM 305 RAMAC technician/instructor in New York. Early programmers were required to test virtually each program step and cautioned not to expect that the resulting program would 'do the right thing' when given imperfect input. The underlying principle was noted by the inventor of the first programmable computing device design:

On two occasions I have been asked,—'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."


In my years of working with health insurance data, we spent far more time cleaning up garbage data, redefining vague information into something more useful and coming up with models that rationally explain how healthcare services were used before we got to using much more than basic arithmetic. Unlike actuaries or our less successful competitors, we rarely used t-tests, R-squared values or more sophisticated statistics. You can't put lipstick on a pig.

As I look at the internals of various primary polls for Iowa, New Hampshire and now South Carolina, I am struck by the number of times a poll with seriously flawed input data is touted by the media and used to build a false narrative. The worst case has to be the late December CNN Iowa poll, which surveyed ZERO voters who weren't registered Republicans, despite the undisputed significant voting by non-Republicans in Iowa GOP caucuses. The right % should have been 25%, the same % that was demonstrated during the last non-competitive Iowa Democratic caucus (1996) and the same number that would be validated in the 2012 exit polls. Since the media either was ignorant of it or gave so little warning, and there were no other polls for two days, the 24 hour media trumpeted a nonexistent surge by Santorum due to this seriously flawed poll. So today, I read an article about how the evangelical leaders have coalesced around supporting Santorum... someone who would have likely stayed in 6th or maybe reached 5th place in Iowa and then returned home if not for a single flawed, heavily promoted poll.

Today I see an even worse poll, from some organization I've never heard of, that somehow believes that less than 5% of the voters in South Carolina's primary will be less than 40 years old and 55% will be at least 65 years old. For reference, according to the exit polls, 35% were older than 60 in 2008, the same as in 2000. The 2000 exit poll showed that 25% were older than 65.

This poll is inexplicably given a weight rating of 4 bars out of 5 on Nate Silver's otherwise credible forecasting model. Aggregators favor combining as many polls as possible, no matter their quality, hoping that with enough garbage, the various garbage factors will cancel each other out. Some skilled analysts, like Nate Silver, attempt to quantify and somewhat discount the garbage by using theoretical formulas about sampling error on a bell curve, applying some likely useful heuristics like the age of the poll, and employing the somewhat controversial, though still likely useful, strategy of rating a pollster by how close its polls come to predicting the actual result.

I guess this is better than nothing, but it is not something we did in the healthcare data analysis industry, nor did any of our competitors. Admittedly, we were spending millions of dollars on these tasks, while the polling aggregators do this with at least one or two fewer zeros in their budget.

Still, I can't believe that spending a little effort on trying to adjust the data or at least dump a poll that has such problems as today's horrendously bad SC poll isn't low hanging fruit for these small organizations.

Nate Silver's latest article also repeats a widely spread meme that the South Carolina GOP is home to 60% evangelicals/born agains and as evidence, cites a 2008 exit poll that doesn't even ask this question. Worse yet, the 2000 GOP exit poll shows 34% belong to the religious right (which is not the same as born again/evangelical, but it is the closest I could find). Perhaps some non-exit poll has this 60%, but before I cite that, I want to do a serious look at the internals to figure out if other anomalies exist.

Another false notion:

Too often I hear that South Carolina isn't like New Hampshire, where independents are such a factor. Wrong. It is an open primary. In 2000, when Gore still had some modest competition from Senator Bradley, 39% of the voters in the GOP primary identified themselves as Independent or Democratic. Compare this to Iowa's caucus in 2012, where there was no competitive race for the Democrats and 25% of the GOP voters were non-Republicans.

These bad polls due to age distribution and independent voter % can heavily penalize Ron Paul - someone who the establishment wants to discredit, but I'll leave that explanation for another post.

As I look at suspect polling internals, I wonder if the accuracy of the aggregations in Iowa and New Hampshire wasn't the beneficiary of a certain amount of luck, and whether a repeat of the NH Dem '08 primary is waiting.

So for the next week I'm going dumpster diving into SC and FL voting/polling data, sifting through the garbage, hoping to bring out some good.

Or for my dyslexic friends: IGOG. Into the Garbage, Out comes Good.

Friday, January 13, 2012

New Hampshire Primary Election Trivia

Over the last 60 years, Romney's 2012 vote total is a relatively high percent of all votes cast in the combined primaries in New Hampshire, per official vote totals from the New Hampshire secretary of state.

Of the leading NH vote getters, only Henry Cabot Lodge in 1964 and Hillary Clinton in 2008 failed to be nominated by their party. The candidate who was elected President has been in the top 3 every time except in 1992 when Bill Clinton came in 4th with 12.4%.

| Year | % of ALL Votes Cast for Leader | Leading Candidate | 2nd Place Candidate | 3rd Place Candidate |
|---|---|---|---|---|
| 1952 | 36.1% | Eisenhower | 27.8% - Taft | 15.3% - Kefauver |
| 1956 | 68.3% | Eisenhower | 26.2% - Kefauver | 4.6% - Stevenson |
| 1960 | 52.7% | Nixon | 36.8% - Kennedy | 7.5% - Fisher |
| 1964 | 26.9% | Lodge | 23.7% - Johnson | 16.9% - Goldwater |
| 1968 | 52.2% | Nixon | 18.4% - Johnson | 18.1% - McCarthy |
| 1972 | 38.9% | Nixon | 20.3% - Muskie | 16.3% - McGovern |
| 1976 | 28.9% | Ford | 28.3% - Reagan | 12.5% - Carter |
| 1980 | 29.0% | Reagan | 20.8% - Carter | 16.4% - Kennedy |
| 1984 | 39.8% | Reagan | 23.6% - Hart | 16.6% - Mondale |
| 1988 | 21.1% | Bush (Sr) | 15.9% - Dole | 15.7% - Dukakis |
| 1992 | 26.9% | Bush (Sr) | 19.1% - Buchanan | 17.0% - Tsongas |
| 1996 | 26.2% | Clinton (Bill) | 19.8% - Buchanan | 18.4% - Dole |
| 2000 | 29.4% | McCain | 19.9% - Gore | 18.6% - Bush (Jr) |
| 2004 | 30.3% | Kerry | 20.7% - Dean | 18.9% - Bush (Jr) |
| 2008 | 21.7% | Clinton (Hillary) | 20.3% - Obama | 16.9% - McCain |
| 2012 | 32.2% | Romney | 19.1% - Paul | 16.1% - Obama |

Comparing Obama's result to past incumbent Presidents running for reelection, all of those who only received about 1/2 the votes of their own party were not reelected, all of those above this mark were reelected.

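| Year | % of Party | Candidate | % of All Votes |
|---|---|---|---|
| 1956 | 98.9 | Eisenhower | 68.3 |
| 1964 | 95.3 | Johnson | 23.7 |
| 1984 | 86.4 | Reagan | 39.8 |
| 1996 | 84.3 | Clinton | 26.2 |
| 2012 | 81.1 | Obama | 16.1 |
| 2004 | 79.8 | Bush | 18.9 |
| 1972 | 67.6 | Nixon | 38.9 |
| 1992 | 53.2 | Bush | 26.9 |
| 1976 | 50.1 | Ford | 28.9 |
| 1968 | 49.6 | Johnson | 18.4 |
| 1980 | 47.1 | Carter | 20.8 |
| 1952 | 43.9 | Truman | 12.3 |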

Tuesday, January 10, 2012

My NH Predictions

My Forecast
34.5 Romney
22.6 Paul
21.7 Huntsman

My age distribution post was updated around 2 PM EST as was my Indy vs. Republican Breakdown post.

Live Blog for the election night and an analysis of the quality of my predictions after the fact are here

At this point, I am using the last Suffolk poll as the base for my predictions. It correlates with Nate Silver's poll, and Suffolk is the most recent poll (which matters most in volatile primaries), although I am VERY suspicious about how Suffolk's 2 day rolling average could change from 39.0 to 34.6 to 33.2 and then back up to 37.4 over the past two days (250 voters surveyed per day have a VERY large sampling error). (I exclude ARG's polls as their quality reputation is not good.) The news has been bad for Romney in the past few days, so I can't believe that a poll implying at least an 8 point increase for Romney on Monday vs. Sunday truly represents the electorate. However, Suffolk has an excellent polling reputation and it provides enough detail to let me make adjustments where I feel they are needed (vs. polling averages and aggregation models, which are so complex I have no idea how to analyze the fundamentals).
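
For a rough sense of how large the sampling error on 250 interviews is, here is a sketch using the textbook formula for a simple random sample. That is an assumption on my part: Suffolk's actual design and weighting are not described here, and weighting generally makes the real error larger, not smaller. The function name is just for illustration.

```python
import math


def margin_of_error(share, sample_size, z=1.96):
    """Approximate 95% margin of error, in percentage points, for one
    candidate's share under simple random sampling (an assumption)."""
    return 100 * z * math.sqrt(share * (1 - share) / sample_size)


# Romney's 37.4% in the last Suffolk release, on one day's 250 interviews
# and on a two-day rolling sample of 500.
one_day = margin_of_error(0.374, 250)
two_day = margin_of_error(0.374, 500)
print(f"One day (n=250):  +/- {one_day:.1f} points")
print(f"Two days (n=500): +/- {two_day:.1f} points")
```

On those assumptions a single day's sample carries roughly a six point margin on Romney's number, so day-to-day swings of several points can come from sampling noise alone.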

For reasons detailed here, I also assume that 56% of the electorate will be "Independent" which in New Hampshire is legally called "Undeclared".

BTW, despite what I frequently hear in the media, that Indys are not a factor in South Carolina, they too have an open primary, the Dems do not have an interesting primary, and moreover, I believe it is held at a later date. So Indys should be an underrated factor in SC. But more on that after tonight.

I am also adjusting due to age distribution. As detailed here I believe the Suffolk poll is too heavily weighted towards older voters. I have decided to choose the well regarded Marist poll for age distribution in my forecast, though by reputation, I am also enticed by the even more well regarded Selzer poll. However since Selzer only polled NH once, over a month ago, I am not sure if the same quality holds as with their famous Iowa polls.

When the exit polls come out, I will compare this to see whether I guessed right. While I am not satisfied that I fully understand the dependency function (given age, what is the chance you will vote for candidate X), due to time constraints, I will assume that the most recent Suffolk poll captures this function well enough. For South Carolina I will have more time to analyze this.

Finally, I realize that these variables aren't really additive. There are co-dependencies. Given the larger uncertainties in these polling adjustments, I feel that this mathematical sloppiness is smaller than the inaccuracy of the underlying data and underlying model. I also don't claim accuracy to 3 significant digits. I use a single decimal point mainly because I think the difference between 2nd and 3rd place could be VERY close.

For the 7.4% undecideds, I will make a gut feel call and give 1% to all the candidates who will get 3% of the vote or less, and 1/3 of the remainder to the candidates who may appeal to the values of the voter but who are generally perceived by the voter as having no chance to win (e.g. Paul, Gingrich, Santorum), allocated linearly according to their base polling % (so for Paul, 1/3 * 6.4 * 17.6/(17.6+10.6+9) = 1.0). Of the remaining 2/3 of that 6.4%, I give 2/3 to Huntsman, who has momentum, and 1/3 to Romney, who has the perception of being the odds-on winner.
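
The allocation rule described above can be written out as a short calculation. This is a sketch of the gut-feel rule, not a model: the 17.6 base for Paul comes from the last Suffolk poll, while matching 10.6 to Gingrich and 9 to Santorum is my reading of the order in which the names and numbers appear, so treat those two labels as assumptions.

```python
UNDECIDED = 7.4        # % undecided in the last Suffolk poll
MINOR_CANDIDATES = 1.0 # % handed to candidates polling at 3% or less

remainder = UNDECIDED - MINOR_CANDIDATES   # 6.4 points left to allocate
values_pool = remainder / 3                # for the "values" candidates
front_runner_pool = remainder * 2 / 3      # for Huntsman and Romney

# Base polling % used to split the values pool proportionally.
# Paul's 17.6 is from the poll; the Gingrich/Santorum labels are assumed.
values_base = {"Paul": 17.6, "Gingrich": 10.6, "Santorum": 9.0}
base_total = sum(values_base.values())

allocation = {
    name: values_pool * base / base_total
    for name, base in values_base.items()
}
allocation["Huntsman"] = front_runner_pool * 2 / 3  # momentum
allocation["Romney"] = front_runner_pool * 1 / 3    # perceived odds-on winner
allocation["Minor candidates"] = MINOR_CANDIDATES

for name, points in allocation.items():
    print(f"{name}: +{points:.1f}")
```

The rounded results for Paul, Huntsman and Romney line up with the Undecided Allocation figures used in my forecast, and the pieces add back up to the full 7.4%.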

I wish I had more time to develop a reasonable method for allocating people who change their preference. Suffolk doesn't provide 2nd choice votes and I didn't have time (like I did for Iowa) to look at the PPP poll which does provide that info. We'll try harder for SC. So I went with my gut, which is informed by watching too much cable news.

So how did I get at the numbers at the top?

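| Last Suffolk Poll | Candidate | Indy Split Change | Age Distribution Change | Undecided Allocation | Last Minute Candidate Switch | Total |
|---|---|---|---|---|---|---|
| 37.4 | Romney | -1.9 | +0.2 | +1.4 | -2.6 | 34.5 |
| 17.6 | Paul | +1.2 | +1.8 | +1.0 | +1.0 | 22.6 |
| 15.6 | Huntsman | +2.7 | -1.6 | +2.8 | +2.2 | 21.7 |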

Varying Age Demographics in NH Polls

Updated to include today's Suffolk poll release: Depending on which age distribution you use for the Suffolk poll, Romney's lead over Ron Paul goes from 20.4% (averaged Suffolk) to just 18.2% (Marist). Also, Huntsman drops 0.1 (Avg Suffolk), drops 1.6 with Marist and 1.8 with Suffolk, because Huntsman's support is disproportionately with older voters, unlike Paul, whose support is disproportionately with young and middle-aged voters.

There is another article that also looks at the age distribution issue, claiming it favors Romney. What they apparently don't look at is the more detailed breakdowns: although the general claims are true, various candidates have other strengths (for example, Ron Paul does well in the 45-54 age group). But the article is from National Journal, so name credibility also helps.

I hope to locate exit poll distributions before the polls close, but there are competing demands for my time: I still need to look at undecideds and others who may switch votes (particularly those who switch to Huntsman) before I get back to exit poll analysis. With a 6 hour time zone lag vs. Eastern Standard Time, I guess I won't be getting much sleep tonight. ;-)

***Earlier Post****

According to many folks, pollsters adjust their raw polling data to reflect expected age, sex and geographic demographics. It is not clear how this is done, as I have yet to find a poll that provided this detail (even with polls providing 300 pages of data). If one assumes this is true, I cannot explain why the % of likely voters in each age group changes on a daily basis in the highly regarded Suffolk poll, as shown below:


I also cannot explain why several other major polls vary between each other. The following data is taken from their most recent poll (or in the case of Suffolk, an average of their 9 daily tracking polls).


Nearly all candidates have a different appeal depending on the age of the voter. For some candidates (e.g. Ron Paul) the sensitivity to the age distribution is quite high (younger voters overwhelmingly prefer him and retirees have some reluctance to choose him). There is even some indication that middle-aged voters (45-54, at least within NH) have a strong preference for Ron Paul. Another example is the strong preference of retirees for Newt Gingrich.

Many people talk about debate performance or new commercials as the reason for daily changes in polls, as well as the undefined term "momentum". Sometimes one hears or reads (usually for a few words or a second or two) about normal random sampling error (inaccurately described as margin of error). But the effect of other deliberate sampling biases like age or voter registration is rarely mentioned. Perhaps this should change.

Monday, January 9, 2012

Are Corporations People?

Public Policy Polling stands out (among other reasons) because they're not afraid to ask unusual questions like What is God's Approval Rating?

So 4 days ago they asked for suggestions for their South Carolina poll, and several people asked them to include the Colbert Referendum, which I'm happy to see they took us up on.

So we can now answer the existential question:

Do you think that corporations are people or that only people are people?

2012:
67% of South Carolinians think only people are people
33% are so enslaved to corporations, they think corporations are people too.

While 150 years ago:
60% of slaves were counted as people in the Constitution.
That's progress!

80% Age 30-45: Only people are people
38% Age 65+: Corporations are people too
The young may have a heart, but by the time they reach retirement many have to accept that some people aren't human.

47% Age 18-29 - Corporations are people
With youth unemployment around 30%, perhaps they misheard the question as "Corporations are the man."

82% of Ron Paul supporters feel that only people are people, but 38% of Newt Gingrich supporters are quite content to let corporations be people as long as the paychecks keep coming.

The poll twice asked people who their favorite candidate was. 15 questions after the first time it was asked, they slipped in Stephen Colbert's name just to make trouble.

21% of Santorum's supporters dropped him by the time they reached the repeated question. That's Santorum momentum.

Colbert finished above two real candidates, Huntsman and Roemer, the only GOP candidates to appear on his show. This gives new meaning to the Colbert Bump!

Friday, January 6, 2012

Indies Vital in New Hampshire

Ron Paul and Jon Huntsman attract many more Independent voters in New Hampshire than Mitt Romney, Rick Santorum and Newt Gingrich. So getting the right mix of Independent voters is very important to accurate polling.

To date, only the last Selzer poll over a month ago and a couple of the Suffolk tracking polls have been near the 50% Indy split which should occur on election day. The daily Suffolk tracking polls have been particularly confusing as they have varied from 37% to 48.5% Indy split.

Why is virtually every poll missing the right mix of Independent voters?

In New Hampshire, undeclared voters (Independents) may vote in the GOP primary Tuesday. Currently they make up 42% of all registered voters in the state. Traditionally, the percent of independent voters in a NH primary when both Democratic and Republican primaries are contested is very similar to the percent of registered independent voters. However, when one party has an unchallenged incumbent (like 1996 with President Clinton or 2004 with President Bush), the percent of independents in the opposing party's primary goes way up. See stats below. The % of registered voters comes from state-controlled lists (often reported via news services). The % of actual voters in a primary comes from exit polls.

New Hampshire Independent Voter Percentage
Year | % of All Voters | % of GOP Primary | % of Dem Primary
1996 | 28% | 35% | No Contest
2000 | 38% | 39% | 36%
2004 | 37.7% | No Contest | 52%
2008 | 44.9% | 37% | 44%
2012 | 40.7% | 51-56%?? | No Contest

The range of independent turnout for the 2012 GOP primary is based on linear extrapolation from the 1996 and 2004 primaries. The low number assumes the same ratio of % in the GOP primary to % of registered voters as in 1996, while the high number assumes the same ratio of % in the Dem primary to % of registered voters as in 2004.
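
For anyone who wants to check the extrapolation, here is a minimal Python sketch using the table's figures. The variable names are mine; the only inputs are the 1996, 2004 and 2012 rows of the table.

```python
# Percent of independents, taken from the table above.
registered_1996, gop_primary_1996 = 28.0, 35.0   # 1996: Democrats uncontested
registered_2004, dem_primary_2004 = 37.7, 52.0   # 2004: Republicans uncontested
registered_2012 = 40.7

# Scale the 2012 registration share by each earlier year's
# primary-to-registration ratio.
low = registered_2012 * gop_primary_1996 / registered_1996
high = registered_2012 * dem_primary_2004 / registered_2004

print(f"Projected Indy share of 2012 GOP primary: {low:.1f}% to {high:.1f}%")
```

Rounded to whole points, this gives the 51-56% range shown in the 2012 row.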

Now, what are pollsters using as their ratio of Independents for next Tuesday's contest? What would the difference be if they had a different overall Indy ratio (based on their own candidate specific results by party breakdown)? See below:

Pollster | % of Independents | Poll's Romney Lead | Romney Lead w/56% Indies
NBC/Marist | 38% | 20% | 16%
Magellan | 39% | 20% | ??
PPP | 42% | 17% | 13%
Suffolk | 41.1% | 19.8% | 17.7%
UNH/Boston Globe | 43% | 22% | 19%
UNH/WMUR | 43% | 24% | 22%
ARG | 44% | 19% | 14%
Selzer/Bloomberg | 53% | 23% | ??
Rasmussen | UNKNOWN | 24% | ??
CNN/Time | UNKNOWN | 27% | ??
Zogby/Washington Times | UNKNOWN | 14% | ??

Although some pollsters provided Indy vs. GOP breakdowns at the poll level, some did not also break down each candidate's support by Indy vs. GOP, so they have a ?? in the right column.

Those with UNKNOWN may have even excluded ALL non-GOP registered voters as CNN has done for every Iowa poll. Take those polls with a HUGE grain of salt.
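
The "w/56% Indies" column comes from re-blending each poll's GOP and Indy results at a different Indy share. Below is a rough Python sketch of that kind of re-weighting. It assumes the electorate splits into just two groups (GOP and Indy), and the crosstab numbers are made up for illustration; they are not from any of the polls above.

```python
def reweighted_lead(support_a, support_b, indy_share):
    """Lead of candidate A over candidate B, in points, at a given Indy share.

    support_a / support_b: percent support among "gop" and "indy" respondents.
    indy_share: fraction of the electorate that is Independent (0 to 1).
    Assumption: everyone who is not Indy is counted as GOP.
    """
    def blend(support):
        return support["gop"] * (1 - indy_share) + support["indy"] * indy_share

    return blend(support_a) - blend(support_b)

# Hypothetical crosstabs, invented for this example only.
romney = {"gop": 45.0, "indy": 35.0}
paul = {"gop": 15.0, "indy": 25.0}

lead_as_polled = reweighted_lead(romney, paul, 0.40)  # poll's own Indy share
lead_at_56 = reweighted_lead(romney, paul, 0.56)      # high-end 2012 estimate

print(f"Lead at 40% Indies: {lead_as_polled:.1f}")
print(f"Lead at 56% Indies: {lead_at_56:.1f}")
```

With a candidate who does better among Indies than among Republicans, raising the Indy share shrinks the front-runner's lead, which is the pattern in the table.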

Monday, January 2, 2012

Iowa Forecast

PPP published a poll late last night with lots of details to divine movement amongst the candidates before election night on Tuesday. Assuming everyone who isn't committed moves, and that they move to the 2nd choice candidate they indicated, here is how it will turn out.

Candidate | PPP Poll % | PPP Poll Likely Voters | Switch Away | Switch To | Final %
Paul Sunday | 21 % | 281 | -43 | 29 | 20.0 %
Paul Weekend | 20 % | 268 | -41 | 35 | 19.6 %
Romney Weekend | 19 % | 255 | -56 | 49 | 18.4 %
Santorum | 18 % | 241 | -63 | 62 | 17.9 %
Romney Sunday | 18 % | 241 | -54 | 52 | 17.9 %
Gingrich | 14 % | 188 | -58 | 38 | 12.5 %
Perry | 10 % | 134 | -44 | 65 | 11.6 %
Bachmann | 8 % | 107 | -27 | 44 | 9.2 %
Huntsman | 4 % | 54 | -14 | 16 | 4.1 %
Roemer | 2 % | 27 | -6 | 12 | 2.5 %
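
The Final % column can be roughly reproduced from the other columns: take the likely voters, subtract those switching away, add those switching to, and divide by the total sample. Here is a minimal Python sketch. The base of 1340 is an assumption on my part, inferred from 281 likely voters being about 21%; it is not stated in the table.

```python
# Rows from the table:
# (label, likely voters, switching away, switching to, published final %).
rows = [
    ("Paul Sunday",    281, 43, 29, 20.0),
    ("Paul Weekend",   268, 41, 35, 19.6),
    ("Romney Weekend", 255, 56, 49, 18.4),
    ("Santorum",       241, 63, 62, 17.9),
    ("Romney Sunday",  241, 54, 52, 17.9),
    ("Gingrich",       188, 58, 38, 12.5),
    ("Perry",          134, 44, 65, 11.6),
    ("Bachmann",       107, 27, 44,  9.2),
    ("Huntsman",        54, 14, 16,  4.1),
    ("Roemer",          27,  6, 12,  2.5),
]

# Assumption: percentage base inferred from 281 voters being about 21%.
ASSUMED_BASE = 1340

def final_percent(voters, away, to, base=ASSUMED_BASE):
    """Percent of the base left after switchers leave and new switchers arrive."""
    return 100 * (voters - away + to) / base

computed = {label: final_percent(v, away, to) for label, v, away, to, _ in rows}

for label, _, _, _, published in rows:
    print(f"{label}: computed {computed[label]:.1f}, published {published}")
```

The computed values land close to the published column; small differences are expected because the voter counts in the table are themselves rounded.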

The overinfluence of the media

From the details in yesterday's PPP Iowa poll:

Saturday vs Sunday, Who do you think will win Iowa?

The only changes were
Romney from 23% to 26%
Santorum from 8% to 11%
Not sure from 31% to 25%

Saturday vs Sunday, Who runs the strongest campaigns?
The only changes were
Santorum from 11% to 16%
Not sure from 32% to 27%

I don't see how the typical voter can form such an opinion about these two questions without being heavily influenced by the media.