M2M – February – Update #1

My February goal is to make an interactive R visualization using political data. This covers technical skill growth as well as a personal interest of mine.

Here is how I am building my dataset:

  • Obtain publicly-available Ohio voter file data to get party affiliation, voting record, congressional district, and zip code for each voter.
  • Obtain publicly-available zip code level income data. Get the average income for each zip code and assume that each voter in that zip code makes that income.
  • Obtain publicly-available Congressional and state senator representation data. Identify if each voter is represented by a Democrat or Republican.
  • Calculate the number of times each voter has cast a ballot in the last five elections.
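
The steps above could be sketched in base R roughly as follows. This is only an illustration on toy data: every column name (voter_id, e1 to e5, avg_income, rep_party, and so on) is a placeholder assumption, not the layout of the actual Ohio voter file, and only the congressional lookup is shown. A state senate lookup would be a second merge of the same shape.

```r
# Toy stand-ins for the three sources; every column name here is hypothetical.
voters <- data.frame(
  voter_id = c("A1", "A2", "A3"),
  party    = c("D", "R", "D"),
  district = c(1, 1, 2),
  zip      = c("43004", "43004", "44101"),
  e1 = c("X", "", "X"),   # one column per election; "" means no ballot cast
  e2 = c("X", "", ""),
  e3 = c("", "X", "X"),
  e4 = c("X", "", "X"),
  e5 = c("X", "X", "X"),
  stringsAsFactors = FALSE
)
zip_income <- data.frame(
  zip        = c("43004", "44101"),
  avg_income = c(61000, 38000),
  stringsAsFactors = FALSE
)
reps <- data.frame(
  district  = c(1, 2),
  rep_party = c("R", "D"),
  stringsAsFactors = FALSE
)

# Count how many of the five election columns are non-empty for each voter.
election_cols <- paste0("e", 1:5)
voters$ballots_cast <- unname(rowSums(voters[, election_cols] != ""))

# Attach the zip-level average income and the representative's party.
# all.x = TRUE keeps voters even when a lookup has no matching row.
dataset <- merge(voters, zip_income, by = "zip", all.x = TRUE)
dataset <- merge(dataset, reps, by = "district", all.x = TRUE)

# Flag voters whose representative belongs to a different party.
dataset$opposite_party_rep <- dataset$party != dataset$rep_party

# merge() reorders rows by key, so restore a stable order by voter.
dataset <- dataset[order(dataset$voter_id), ]
```

Keeping zip codes as character strings rather than numbers avoids losing leading zeros and keeps the join keys consistent across files.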

I have not yet decided which visualization to build. Using this dataset, I could look at how party affiliation, income, past voting behavior, and representation by the opposite party affect turnout.

Furthermore, there are “big data” implications for this project. The raw CSV files total more than 3 GB, which may cripple or even crash RStudio on my local machine. I will need to look into efficient ways to load the data and reduce its dimensions to save memory.
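
One possible way to cut memory use at load time is to skip unneeded columns while reading, rather than loading everything and dropping columns afterward. The sketch below uses only base R and a tiny temporary file as a stand-in for a raw voter file. The file contents and column names are made up for illustration.

```r
# Write a small stand-in for one raw voter file; the column names are hypothetical.
raw_path <- tempfile(fileext = ".csv")
writeLines(c(
  "voter_id,last_name,first_name,party,zip,e1,e2",
  "A1,Doe,Jane,D,43004,X,",
  "A2,Roe,John,R,44101,,X"
), raw_path)

# Read only the header row to learn the column names without loading any data.
header <- names(read.csv(raw_path, nrows = 0))

# Columns to keep, with explicit types. Zip codes are read as character
# so that leading zeros are preserved.
keep <- c(voter_id = "character", party = "character", zip = "character",
          e1 = "character", e2 = "character")

# A colClasses entry of "NULL" tells read.csv to skip that column entirely.
col_classes <- ifelse(header %in% names(keep), keep[header], "NULL")
voters <- read.csv(raw_path, colClasses = col_classes)
```

Here the name columns are never loaded into memory. On files of several gigabytes, a package such as data.table (its fread function has a select argument for the same purpose) is generally faster than read.csv, though I have not benchmarked it on this data.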
