Emil Fine

Personal Website

Python Machine Learning

In this article I provide various examples of applying machine learning in Python, with a focus on the scikit-learn library.

Components:

– Principal Component Analysis & Dimension Reduction: Predicting Wine Quality

– Linear, Lasso & Ridge Regressions: Predicting Car Prices

– Mean Shift Clustering: Determining the Survival Chances of Titanic Passengers

Principal Component Analysis & Dimension Reduction

This program (github, data) predicts wine quality across 7 categories using Support Vector Machines, with and without Principal Component Analysis & Dimension Reduction, and compares the classification accuracy of the two approaches.
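The comparison can be sketched roughly as follows. This is a minimal illustration, not the article's actual program: it uses synthetic data in place of the wine dataset, and the 95% variance threshold for PCA is an assumed choice.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the wine data: 11 numeric features, 7 quality classes
X, y = make_classification(n_samples=1000, n_features=11, n_informative=8,
                           n_classes=7, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline: SVM on all 11 features
svm_full = make_pipeline(StandardScaler(), SVC()).fit(X_train, y_train)

# With dimension reduction: keep enough components for ~95% of the variance
svm_pca = make_pipeline(StandardScaler(), PCA(n_components=0.95), SVC())
svm_pca.fit(X_train, y_train)

print("accuracy without PCA:", svm_full.score(X_test, y_test))
print("accuracy with PCA:   ", svm_pca.score(X_test, y_test))
```

Fitting PCA inside the pipeline keeps the component projection learned only from the training split, so the test accuracy remains an honest comparison.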

Linear, Lasso, Ridge & Support Vector Machine Models

In this program (github) I apply several models to car metadata to find the one that predicts price with the lowest Root Mean Square Error (RMSE). The models used were Linear Regression, Lasso Regression, Ridge Regression, Support Vector Machine (SVM), and Gradient Boosting (GB). I tuned the hyperparameters of the Lasso, Ridge, GB, and SVM models. The Gradient Boosting model produced the smallest RMSE, with a 97% probability of predicting a price within +/-$1,466. Note: this does not necessarily make it the best model overall, only a starting point, since different models lend themselves to different interpretations and conclusions.
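The model comparison described above can be sketched like this. It is a simplified outline, not the actual program: the data is synthetic, and the hyperparameter grids shown are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVR

# Synthetic stand-in for the car metadata
X, y = make_regression(n_samples=500, n_features=8, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Candidate models; the tunable ones get a small hyperparameter grid
models = {
    "Linear": LinearRegression(),
    "Lasso": GridSearchCV(Lasso(), {"alpha": [0.01, 0.1, 1.0]}),
    "Ridge": GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0]}),
    "SVM": GridSearchCV(SVR(), {"C": [1, 10, 100]}),
    "GB": GridSearchCV(GradientBoostingRegressor(random_state=0),
                       {"n_estimators": [100, 200]}),
}

# Fit each model and report its test-set RMSE
for name, model in models.items():
    model.fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    print(f"{name}: RMSE = {rmse:.1f}")
```

Ranking models by RMSE on a held-out split, as here, is what allows the winner (Gradient Boosting, in the article's run) to be stated with a dollar-denominated error bound.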

Price Predictions by Algorithms
Mean Shift Clustering

This program (github) explores the Mean Shift Clustering method. I take passenger data from the Titanic and let the algorithm determine how to 'best' cluster the data, which it does using kernel density estimation.
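In outline, the method works like this. The sketch below uses synthetic blob data rather than the Titanic passengers, and the `quantile` value is an assumed setting:

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Synthetic stand-in for the numeric passenger features (class, sex, fare, ...)
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=0)

# estimate_bandwidth picks the kernel width for the underlying density
# estimate; MeanShift then shifts every point uphill toward the nearest
# density peak and assigns points that converge to the same peak to
# the same cluster
bandwidth = estimate_bandwidth(X, quantile=0.2)
labels = MeanShift(bandwidth=bandwidth).fit_predict(X)

print("clusters found:", len(np.unique(labels)))
```

Unlike k-means, the number of clusters is not chosen in advance; it falls out of the density peaks, which is why the Titanic run below ends up with 5 groups rather than a preset count.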

Here you can read in more detail how the MSC method works. Below are the results.

You can see that the program clustered the passengers into 5 groups, with the first 3 containing meaningful insights.
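The per-group figures discussed below come from averaging each column within a cluster, which can be sketched with pandas. The tiny frame here is illustrative, not the actual Titanic data:

```python
import pandas as pd

# Toy frame mimicking the Titanic columns plus the cluster label
df = pd.DataFrame({
    "cluster":  [0, 0, 0, 1, 1, 2, 2, 2],
    "survived": [0, 0, 1, 1, 1, 1, 0, 1],
    "pclass":   [3, 3, 2, 1, 2, 1, 1, 1],
    "sex":      [1, 1, 0, 1, 0, 0, 0, 1],   # 1 = male, 0 = female
    "fare":     [7, 8, 13, 26, 30, 60, 55, 70],
})

# Averaging within each cluster yields the survival rate, mean class,
# share of men, and mean fare per group
summary = df.groupby("cluster").mean()
print(summary)
```

Because `survived` and `sex` are 0/1 columns, their group means read directly as the survival rate and the proportion of men in each cluster.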

From Group 0, we see the lowest survival rate, 34%; its passengers were mostly in the lower classes (between 2nd and 3rd), mostly male (1 = male, 0 = female), and paid the lowest fares.

From Group 1, we see a much higher survival rate than Group 0. Its passengers were closer to 1st class and roughly evenly split between men and women, but paid almost double the fare of Group 0.

From Groups 2 & 3, we see the highest meaningful survival rates, at 73%. Those passengers were mostly women (65.3%), and again paid double the fare of the previous groups.

Group 4 contains only 3 passengers, too few to be significant. Still, all three (2 men and 1 woman) survived, having paid an average fare of $512.