This month, I am learning machine learning in R through this DataCamp track. In the “Supervised Learning for Classification” course, the second topic is Bayesian Methods, my focus for this week. This week’s learnings: Bayesian statistics estimates probabilities from past information. When one event is predictive of another, the two are considered dependent. For… Read More
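The idea of dependent events can be illustrated with a small R sketch. The probabilities and the spam example below are made-up numbers for illustration, not taken from the course; the only formula used is the standard definition of conditional probability.

```r
# Hypothetical probabilities, chosen only for illustration.
p_a       <- 0.10  # P(A): e.g. a message is spam
p_b       <- 0.20  # P(B): e.g. a message contains a given word
p_a_and_b <- 0.05  # P(A and B): both happen together

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_a_given_b <- p_a_and_b / p_b

# If knowing B changes the probability of A, the events are dependent.
dependent <- !isTRUE(all.equal(p_a_given_b, p_a))
```

Here observing B changes the probability of A from `p_a` to `p_a_given_b`, so under these assumed numbers the two events are dependent.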
Month 2 Master
M2M – May – Update #1
For this month, I am starting a DataCamp track to learn about machine learning fundamentals in R. I have started a course called “Supervised Learning for Classification” with the first topic being KNN models. I originally set out to do one course (also called a module) per week, but instead I’ll take quality over quantity here.… Read More
M2M – May Introduction
For this month, I originally set out to do a machine learning course in R. And I plan to do exactly that. I have identified a DataCamp course to take. Each module of the course is 4 hours. A stretch goal is to complete one module a week, but this may be difficult due to the… Read More
M2M – April Recap
In April, I set out to do weekly analysis on different data sets. I successfully did so. These are my four analyses: First, I looked at the poll trends for the Democratic primary. Second, I looked at “endorsement points” for the Democratic candidates. Third, I looked at Trump’s approval rating over time. Fourth, I looked… Read More
M2M – April – Update #4
This month, I set out to complete a separate ad-hoc analysis each week. For this week, I decided to analyze the effectiveness of FiveThirtyEight’s 2018 House elections predictions. However, due to data quality issues, the results are not accurate. Therefore, I will not dwell on the exact numbers but rather on the framework through which I approached the… Read More
M2M – April – Update #3
This month, I set out to complete a separate ad-hoc analysis each week. This week, I chose to dive into President Trump’s approval rating, with inspiration from FiveThirtyEight. First, I grouped the polls by week, to reduce noise in the averages and make for easier plotting. Then I got the weekly average across three dimensions: all… Read More
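Grouping polls by week before averaging can be sketched in base R as follows. The data frame, its column names (`end_date`, `approve`), and the values are hypothetical; the original post's data and exact method are not shown in this excerpt.

```r
# Hypothetical toy polls; column names and values are assumptions.
polls <- data.frame(
  end_date = as.Date(c("2019-04-01", "2019-04-03",
                       "2019-04-09", "2019-04-11")),
  approve  = c(41, 43, 42, 44)
)

# Label each poll with the start date of its week
# (cut.Date starts weeks on Monday by default).
polls$week <- cut(polls$end_date, breaks = "week")

# Average approval within each week to smooth out poll-to-poll noise.
weekly <- aggregate(approve ~ week, data = polls, FUN = mean)
```

The resulting `weekly` data frame has one row per week, which is easier to plot than the raw poll-level series.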
M2M – April – Update #2
This month, I set out to do weekly ad-hoc analysis from different data sets. This week, I chose to use FiveThirtyEight’s Democratic primary endorsements data. This dataset assigns “endorsement points” based on the position/stature of the person giving the endorsement. I used the “ave” function to get each candidate’s cumulative points over time. endorse_agg$csum = ave(endorse_agg$totalpoints,… Read More
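A per-candidate running total with `ave` might look like the sketch below. The call in the excerpt is truncated, so the grouping column (`candidate`), the `date` column, and the toy values are assumptions rather than the post's actual code.

```r
# Hypothetical toy data; only totalpoints and csum appear in the excerpt.
endorse_agg <- data.frame(
  candidate   = c("A", "A", "B", "A", "B"),
  date        = as.Date(c("2019-01-01", "2019-01-08", "2019-01-08",
                          "2019-01-15", "2019-01-15")),
  totalpoints = c(3, 5, 2, 1, 4)
)

# Sort by date so the running total accumulates chronologically.
endorse_agg <- endorse_agg[order(endorse_agg$date), ]

# ave() applies FUN within each group and returns a vector
# aligned with the original rows (assumed grouping: candidate).
endorse_agg$csum <- ave(endorse_agg$totalpoints,
                        endorse_agg$candidate,
                        FUN = cumsum)
```

Because `ave` returns a vector of the same length as its input, the result can be assigned straight back as a new column without a merge step.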
M2M – April – Update #1
This month, I set out to do weekly analysis based on different datasets. For this week, I chose presidential primary polling data from FiveThirtyEight. First, I filtered for Democratic polls in the last month. dems = polls[polls$party == 'DEM' & as.Date(polls$start_date, "%m/%d/%y") >= as.Date('2019/03/11') & polls$state == "" & polls$candidate_name %in% c("Bernard Sanders", "Elizabeth Warren", "Joseph R. Biden Jr.", "Pete… Read More
M2M – April Introduction
This month, I originally set out to build on the past three months and build an R visualization from scratch. However, after working with a single data source the last couple of months, I am going to take a different approach: I will be doing a weekly analysis with a different data source each time. An inspiration… Read More
M2M – March Recap
This month, I set out to dive deeper into my dataset and Tableau dashboard from last month. Ultimately, I ended up going with this option from the list of “potential analysis” topics: Compare voting behavior in 2016 versus 2018 (segment by youth vote, senior vote, high income vote, etc) First I defined age (18-29, 30-44,… Read More