Portfolio

Predicting Operating Condition
of Water Pumps in Tanzania

Primary tools: Python, Pandas, scikit-learn

Tanzania is a developing country with a large demand for water. To meet this demand, waterpoints have been established all across the country, but many of their pumps are non-functional or in need of repair. Being able to predict when a pump needs repair or replacement is crucial to keeping the people who rely on these waterpoints supplied with safe water. The goal of this project was to use real data and ensemble methods to build a model that predicts, as accurately as possible, the functional status of water pumps in Tanzania from a selected set of features.
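The write-up does not say which ensemble method was used. As a minimal sketch, assuming a random forest (one common ensemble choice in scikit-learn), the classification step might look like the following. The synthetic data from `make_classification` and the `STATUSES` labels are hypothetical stand-ins for the real waterpoint features and status values.

```python
# Sketch: an ensemble classifier for a three-class pump-status problem.
# Assumption: synthetic data stands in for the real Tanzania waterpoint features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical status labels mirroring the three conditions described above.
STATUSES = np.array(["functional", "functional needs repair", "non functional"])

# Synthetic stand-in for the cleaned feature table: 1,000 pumps, 8 numeric features.
X, y_idx = make_classification(
    n_samples=1000,
    n_features=8,
    n_informative=5,
    n_classes=3,
    random_state=42,
)
y = STATUSES[y_idx]

# Hold out 25% of pumps for evaluation, keeping class proportions equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# A random forest trains many decision trees on bootstrap samples of the data;
# scikit-learn combines them by averaging each tree's class-probability estimates.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.3f}")
```

Other ensembles (gradient boosting, bagged trees) slot into the same fit/predict pattern, so the evaluation code would not need to change when comparing them.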

Predicting Real Estate Prices

in King County, WA

Primary tools: Python, Pandas

The goal of this project was to use multiple linear regression to predict the prices of houses in King County. Starting from a King County housing dataset, the data was cleaned and explored before a model was optimized and its results interpreted.
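As a hedged illustration of the modeling step, here is a minimal multiple-linear-regression sketch using Pandas and NumPy's least-squares solver. The column names (`sqft_living`, `bedrooms`, `age_years`) and the synthetic prices are assumptions for illustration, not the project's actual features or results.

```python
# Sketch: multiple linear regression fit by ordinary least squares.
# Assumption: synthetic data and hypothetical column names stand in for the
# real King County housing dataset.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Hypothetical predictors for each house.
houses = pd.DataFrame({
    "sqft_living": rng.uniform(600, 4000, n),
    "bedrooms": rng.integers(1, 6, n),
    "age_years": rng.uniform(0, 100, n),
})

# Synthetic prices generated from known coefficients plus noise, so the fit can be checked.
true_coefs = {"intercept": 50_000.0, "sqft_living": 250.0, "bedrooms": 10_000.0, "age_years": -500.0}
houses["price"] = (
    true_coefs["intercept"]
    + true_coefs["sqft_living"] * houses["sqft_living"]
    + true_coefs["bedrooms"] * houses["bedrooms"]
    + true_coefs["age_years"] * houses["age_years"]
    + rng.normal(0, 20_000, n)
)

features = ["sqft_living", "bedrooms", "age_years"]
# Design matrix with a leading column of ones for the intercept term.
X = np.column_stack([np.ones(n), houses[features].to_numpy()])
y = houses["price"].to_numpy()

# Solve min ||X b - y||^2; lstsq avoids explicitly inverting X^T X.
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = pd.Series(coefs, index=["intercept"] + features)

# R^2: share of price variance explained by the model.
residuals = y - X @ coefs
r_squared = 1 - residuals.var() / y.var()

print(fitted.round(1))
print(f"R^2: {r_squared:.3f}")
```

Each fitted coefficient reads as the expected change in price for a one-unit change in that feature with the others held fixed, which is the kind of interpretation the project's final step refers to.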

Building a Recommender System

using MovieLens

Primary tools: PySpark

Anyone who spends much time online is all too familiar with recommender systems. They are all around us, whether as those pesky ads that seem to be listening in on our conversations or as a holy grail for discovering new information and products. Love them or hate them, recommender systems are widespread for a reason: they have played an integral part in the success of large online businesses such as Amazon, Google, and Spotify, reportedly accounting for up to 30% of some of these companies' total revenue.

The goal of this project is to build a basic recommender model as a way to become more familiar with machine learning. To do so, the popular MovieLens dataset is used to recommend movies to users based on how they have rated other movies. The MovieLens dataset contains over 100,000 real-world ratings spanning many films and many users, collected by having volunteers rate movies on a scale from 1 to 5 according to how much they enjoyed them.
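The section does not name the algorithm, but PySpark's built-in collaborative-filtering model is alternating least squares (`pyspark.ml.recommendation.ALS`). Assuming that approach, the sketch below reimplements the core idea in plain NumPy so it runs without a Spark cluster: each user and each movie gets a small latent-factor vector, and the two sets of vectors are refit in turn by ridge regression on the observed ratings. The ratings matrix and the helper names `als` and `recommend` are hypothetical.

```python
# Minimal ALS sketch in NumPy (assumption: stands in for PySpark's ALS).
import numpy as np

# Hypothetical ratings: rows are users, columns are movies, 0 means "not rated".
ratings = np.array(
    [
        [5, 4, 0, 1, 0],
        [4, 0, 4, 1, 1],
        [1, 1, 0, 5, 4],
        [0, 1, 5, 4, 0],
    ],
    dtype=float,
)
observed = ratings > 0


def als(R, mask, rank=2, reg=0.1, iterations=20, seed=0):
    """Factor R ~= U @ V.T using only observed entries, alternating between U and V."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    ridge = reg * np.eye(rank)
    for _ in range(iterations):
        # Hold movie factors fixed; solve a small ridge regression per user.
        for u in range(n_users):
            Vu = V[mask[u]]
            U[u] = np.linalg.solve(Vu.T @ Vu + ridge, Vu.T @ R[u, mask[u]])
        # Hold user factors fixed; solve a small ridge regression per movie.
        for i in range(n_items):
            Ui = U[mask[:, i]]
            V[i] = np.linalg.solve(Ui.T @ Ui + ridge, Ui.T @ R[mask[:, i], i])
    return U, V


def recommend(mask, U, V, user, k=2):
    """Return the indices of the k unrated movies with the highest predicted rating."""
    scores = U[user] @ V.T
    scores[mask[user]] = -np.inf  # skip movies the user has already rated
    return np.argsort(scores)[::-1][:k]


U, V = als(ratings, observed)
predicted = U @ V.T
rmse = np.sqrt(np.mean((predicted[observed] - ratings[observed]) ** 2))
print(f"Training RMSE on observed ratings: {rmse:.3f}")
print("Top picks for user 0:", recommend(observed, U, V, user=0))
```

In PySpark, the corresponding estimator takes `userCol`, `itemCol`, and `ratingCol` column names along with `rank`, `maxIter`, and `regParam`, and a fitted model offers `recommendForAllUsers(n)`; Spark distributes the same per-user and per-movie solves across a cluster, which is what makes the approach practical at MovieLens scale.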
