to do list

http://stats.stackexchange.com/questions/15784/how-can-i-simulate-census-microdata-for-small-areas-using-a-1-microdata-sample

we use supercomputers to forecast the weather. why not public policy?

if they can do it with the universe, surely we can do it with our country: http://www.nature.com/nature/journal/v509/n7499/full/nature13316.html

make 100 difficult predictions. e-mail them to reporters? no action now: i will contact you in a year and remind you of this e-mail.

use PCA in the synthetic model? has to be survey-weighted.

the world needs small area estimation of climate change. “this is what will happen in my backyard”

training has always been in healthcare, so applying this broader model to answer health-related questions might be a safe way to start. here are five simulations: http://www.academyhealth.org/Training/ResourceDetail.cfm?ItemNumber=14458

does the first derivative matter? your healthcare costs might not depend on your healthcare costs last year as strongly as they depend on the rate of change over the past five years

are the records of publicly-listed companies available? if so, you need to incorporate them in your scraping & synthetic firm calculations as well

how should the results be presented? a locality-based lookup? https://github.com/impunation/presentation

http://hagutierrezro.blogspot.mx/2017/04/small-area-estimation-101.html

the compression and extraction of survey and census data

acs 2010 has population by everything for 1%

what if you instead condensed the 2010 census down to three margins

margin 1: population by linear age margin 2: population by sex margin 3: population by state

[ linear ages 0-85 ] x [ male + female ] x [ 50 states + DC ]

can be stored in ( 86 x 2 x 51 ) = 8,772 records and four columns: age, sex, state, value

and you also stored the correlation coefficients and regression coefficients containing the relationships between these three variables

if you then expanded those 8,772 records into a synthetic population of 309.3 million total americans, how bad would it be?

how many margins do you need AND what type of regression coefficients need to be stored for the synthetic population to have a negligible amount of information loss?