Predictive Analysis

This data was extracted from the 1994 Census Bureau database by Ronny Kohavi and Barry Becker (Data Mining and Visualization, Silicon Graphics). The prediction task is to determine whether a person makes over $50K a year. In this project, we predict income using two methods: logistic regression and a decision tree.

50K.JPG
50K_1.JPG

50K_2.JPG

Logistic Regression

We split the data 70% / 30% into training and test sets.
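A 70/30 split like this one can be sketched in a few lines of plain Python. This is only an illustration, not the code used for the project: the function name `split_rows`, the fixed seed, and the stand-in list of row indices are all assumptions made for the example.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test).

    `train_fraction` and `seed` are illustrative defaults, not values
    taken from the original analysis.
    """
    shuffled = list(rows)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Stand-in for the census records: 1000 row indices.
rows = list(range(1000))
train, test = split_rows(rows)
print(len(train), len(test))
```

Shuffling before cutting matters when the file is sorted in some way, since an unshuffled cut could put very different people in the two sets.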

Using all predictors, the model reaches an accuracy of 82.7%, which is fairly good. There are a lot of confounding variables in this dataset. After removing them, only relationship and years of education remain as predictors. A model using only these two predictors reaches an accuracy of 81.4%, which is very close to the accuracy of the full model. We also tried a model using only sex and years of education, but its accuracy is only 76.6%.

Below are the conclusions from the model using only relationship and years of education as predictors:

People with more than 10 years of education are 21 times more likely to make more than 50K a year than people with 5 or fewer years of education.

People in the husband relationship status are 11 times more likely to make more than 50K a year than people who are unmarried.

People in the wife relationship status are 13 times more likely to make more than 50K a year than people who are unmarried.

People in the Not-in-family relationship status are 30% more likely to make more than 50K a year than people who are unmarried.

People in the own-child relationship status are 79% less likely to make more than 50K a year than people who are unmarried.
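Figures such as "21 times" and "79% less" are typically read off a logistic regression by exponentiating its coefficients, which gives odds ratios. The sketch below assumes that is how the numbers above were obtained, which the write-up does not state. The helper names and the 5% baseline probability are hypothetical and chosen only for illustration.

```python
import math


def odds_ratio(coefficient):
    """Convert a logistic regression coefficient into an odds ratio."""
    return math.exp(coefficient)


def percent_change_in_odds(ratio):
    """Express an odds ratio as a percent increase (+) or decrease (-)."""
    return (ratio - 1.0) * 100.0


def probability_after(baseline_probability, ratio):
    """Apply an odds ratio to a baseline probability."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * ratio
    return new_odds / (1.0 + new_odds)


# Odds ratios corresponding to the figures quoted in the text,
# under the assumption that they are exponentiated coefficients.
not_in_family = percent_change_in_odds(1.30)  # "30% more likely"
own_child = percent_change_in_odds(0.21)      # "79% less likely"
print(round(not_in_family), round(own_child))

# A coefficient of ln(21) corresponds to the "21 times" figure.
education = odds_ratio(math.log(21))

# Hypothetical 5% baseline: 21 times the odds is not 21 times the probability.
print(probability_after(0.05, education))
```

Under this reading, the multipliers apply to the odds of earning more than 50K rather than to the probability itself, so the gap in probabilities depends on the reference group's baseline rate.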