Welcome to my blog, where I share some of my work. I’ve been working on data science projects, and you can take a look at some of them below.
Recent Projects
Build A Model That Can Predict Whether a Company Will Go Bankrupt or Not
In this project, I explored data collected by a team of Polish economists studying bankruptcy. The goal is to build a model that can predict whether a company will go bankrupt or not.
Navigate a file system from the Linux command line. Extract data that’s been stored in a JSON file. Address imbalanced data using resampling techniques. Expand a decision tree model into an entire forest (ensemble model). Use a grid search to tune hyperparameters. Create a function that loads data and a pre-trained model, and uses that model to generate a Series of predictions.
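As a rough illustration of the forest and grid search steps listed above, the sketch below tunes a random forest with scikit-learn. The synthetic data, the `tune_forest` helper name, and the parameter grid are assumptions made for illustration; they are not the project's actual data or settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def tune_forest(X, y, random_state=42):
    """Grid-search a small, hypothetical hyperparameter grid for a random forest."""
    param_grid = {
        "n_estimators": [25, 50],  # assumed values, kept small so the search is fast
        "max_depth": [2, 4],
    }
    search = GridSearchCV(
        RandomForestClassifier(random_state=random_state),
        param_grid=param_grid,
        cv=3,  # 3-fold cross-validation for every parameter combination
    )
    search.fit(X, y)
    return search


# Synthetic, imbalanced stand-in for the bankruptcy data (roughly 90% / 10%).
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.9, 0.1], random_state=42
)
search = tune_forest(X, y)
print(search.best_params_)
```

After fitting, `search.best_estimator_` holds the forest refit on all of the data with the best parameter combination found.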
read more
Build a Deep Learning Model to Predict Whether a Borrower Will Pay Back a Loan or Not
LendingClub is a US peer-to-peer lending company, headquartered in San Francisco, California. It was the first peer-to-peer lender to register its offerings as securities with the Securities and Exchange Commission (SEC), and to offer loan trading on a secondary market. This project uses historical data on loans issued, including whether or not the borrower defaulted (charge-off). The goal is to build a model that can predict whether or not a borrower will pay back their loan.
read more
SpaceX Falcon 9 Rocket Launch, Rocket’s First Stage Landing Outcome and Cost Prediction
Collected data from the SpaceX public API and the SpaceX wiki page through API requests and web scraping. Performed exploratory data analysis (EDA) with data wrangling and data visualization. Built interactive visual analytics with Folium and interactive dashboards with Plotly Dash. Used Logistic Regression, Support Vector Machine, Decision Tree Classifier, and K Nearest Neighbors machine learning models to predict the outcome. Link to GitHub Repository
read more
A Deep Learning Model using Keras TensorFlow to Predict Breast Cancer
A deep learning model was built with Keras and TensorFlow to predict whether a breast tumor is malignant or benign. The UCI ML Breast Cancer Wisconsin (Diagnostic) dataset is used.
Exploratory Data Analysis: Get an understanding of which variables are important, view summary statistics, and visualize the data. Created a binary classification model using Keras and TensorFlow. The Adam optimizer was used in this model. Model performance was evaluated. Link to GitHub Repository
read more
Earthquake Damage Prediction in Gorkha, Nepal
In this project, I worked with data from Open Data Nepal to build classification models that predict building damage from the 2015 Nepal earthquake, primarily using data from the Gorkha district.
Collected data by querying a SQL database. Wrangled data with SQL. Built a logistic regression model for classification. Built a decision tree model for classification. Created a horizontal bar chart to show the Gini importance. Incorporated ethical considerations into model building.
read more
Housing in Mexico
Worked with a dataset of 21,000 properties for sale in Mexico, listed on the real estate website Properati.com. The goal is to determine whether sale prices are influenced more by property size or location.
Organized information using basic Python data structures. Imported data from CSV files and cleaned it using the pandas library. Created data visualizations like scatter and box plots. Examined the relationship between two variables using correlation. Link to GitHub Repository
read more
Housing Prices Prediction in Buenos Aires, Argentina
In this project, I developed data wrangling and visualization skills and moved from descriptive to predictive data science. The focus is real estate: I created a machine learning model that predicts apartment prices in Buenos Aires, Argentina.
Created a linear regression model using the scikit-learn library. Built a data pipeline for imputing missing values and encoding categorical features. Improved model performance by reducing overfitting. Created a dynamic dashboard for interacting with the completed model.
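A pipeline of this kind could look roughly like the sketch below. The `build_price_model` helper, the column names passed to it, and the choice of `Ridge` as the regularized linear model are all assumptions for illustration; the project's own pipeline may differ.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder


def build_price_model(numeric_cols, categorical_cols, alpha=1.0):
    """Impute numeric gaps, one-hot encode categories, then fit a linear model."""
    preprocess = ColumnTransformer(
        [
            # Fill missing numeric values with the column mean.
            ("num", SimpleImputer(strategy="mean"), numeric_cols),
            # Categories unseen during training are encoded as all zeros.
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ]
    )
    # Ridge adds an L2 penalty, one common way to reduce overfitting
    # (an assumption here, not necessarily what the project used).
    return make_pipeline(preprocess, Ridge(alpha=alpha))
```

Keeping the imputer and encoder inside the pipeline means they are fit only on the training data, so the same transformations are applied consistently at prediction time.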
read more
Predicting Air Quality in Dar es Salaam, Africa
Collected data by querying a MongoDB database with data from openAfrica, one of Africa’s largest open data platforms, and built a time series model to predict PM2.5 readings.
Created a wrangle function that extracts the PM2.5 readings from the site that has the most total readings in the Dar es Salaam collection, localizes reading timestamps to the “Africa/Dar_es_Salaam” timezone, and removes all outlier PM2.5 readings that are above 100.
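The wrangle steps described above can be sketched as follows. To keep the example self-contained, it takes a pandas DataFrame instead of a MongoDB collection, and the column names `site`, `timestamp`, and `P2` as well as the assumption that timestamps are stored in UTC are hypothetical.

```python
import pandas as pd


def wrangle(readings: pd.DataFrame) -> pd.Series:
    """Return cleaned PM2.5 readings for the site with the most readings."""
    # Pick the site that has the most total readings.
    top_site = readings["site"].value_counts().idxmax()
    y = readings.loc[readings["site"] == top_site].set_index("timestamp")["P2"]

    # Assumed: timestamps are naive UTC; convert them to local time.
    y.index = (
        pd.to_datetime(y.index)
        .tz_localize("UTC")
        .tz_convert("Africa/Dar_es_Salaam")
    )

    # Treat readings above 100 as outliers and drop them.
    y = y[y <= 100]
    return y.sort_index()
```

The result is a time-indexed Series, which is a convenient input for resampling and for fitting a time series model.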
read more
Predicting Air Quality in Nairobi
In this project, I worked with data from openAfrica, one of Africa’s largest open data platforms. I looked at air quality data from Nairobi and built a time series model to predict PM2.5 readings throughout the day.
Retrieved data by querying a MongoDB database. Prepared time series data for analysis. Created ACF and PACF plots for the data. Built an autoregression model and an ARMA model. Improved the model by tuning its hyperparameters. Link to GitHub Repository
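To show what an autoregression model does, here is a minimal sketch that fits an AR(p) model by ordinary least squares in plain NumPy. This is a swapped-in, hand-rolled version for illustration, not the library or code used in the project, and the helper names `fit_ar` and `predict_next` are hypothetical.

```python
import numpy as np


def fit_ar(y, p):
    """Fit y[t] = c + phi_1*y[t-1] + ... + phi_p*y[t-p] by least squares.

    Returns the array [c, phi_1, ..., phi_p].
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Column k holds the lag-k value y[t-k] for every target y[t], t = p..n-1.
    lags = [y[p - k : n - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n - p), *lags])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef


def predict_next(y, coef):
    """One-step-ahead forecast from the most recent p observations."""
    p = len(coef) - 1
    recent = np.asarray(y, dtype=float)[-1 : -p - 1 : -1]  # y[t-1], ..., y[t-p]
    return coef[0] + recent @ coef[1:]
```

The number of lags `p` is the hyperparameter to tune here; PACF plots are a common guide for choosing it.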
read more
Predicting Apartment Prices in Mexico City
This project has three parts: a wrangle function that takes the name of a CSV file as input and returns a DataFrame, a machine learning model that predicts apartment prices in Mexico City, and a summary of the 10 most influential coefficients for the model.
The wrangle function subsets the data in the CSV file to return only apartments in Mexico City (“Distrito Federal”) that cost less than $100,000, and removes outliers by trimming the bottom and top 10% of properties in terms of “surface_covered_in_m2”.
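A wrangle function along these lines might look like the sketch below. Only `surface_covered_in_m2` and “Distrito Federal” come from the description above; the column names `property_type`, `place_with_parent_names`, and `price_aprox_usd` are assumptions about the CSV layout.

```python
import pandas as pd


def wrangle(filepath):
    """Load a CSV and keep cheap Mexico City apartments without size outliers."""
    df = pd.read_csv(filepath)

    # Assumed column names; adjust them to match the actual CSV.
    mask_apt = df["property_type"] == "apartment"
    mask_city = df["place_with_parent_names"].str.contains(
        "Distrito Federal", na=False
    )
    mask_price = df["price_aprox_usd"] < 100_000
    df = df[mask_apt & mask_city & mask_price]

    # Trim the bottom and top 10% of properties by covered surface area.
    low, high = df["surface_covered_in_m2"].quantile([0.1, 0.9])
    df = df[df["surface_covered_in_m2"].between(low, high)]

    return df
```

Note that the quantiles are computed after the subset, so the trimming reflects only the apartments that remain in scope.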
read more
Predicting Building Damage in Kavrepalanchok, Nepal
In this project, data was collected by querying a SQL database, then explored and cleaned to build a classification model that predicts building damage for the district of Kavrepalanchok.
Wrote a wrangle function that creates a “severe_damage” column, where all buildings with a damage grade greater than 3 are encoded as 1 and all other buildings are encoded as 0, and that drops any columns that could cause issues with leakage or multicollinearity in the model.
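The target encoding described above can be sketched as follows. The helper name `add_severe_damage`, the numeric `damage_grade` column, and the `drop_cols` parameter are assumptions for illustration; the real table may store the grade differently.

```python
import pandas as pd


def add_severe_damage(df, grade_col="damage_grade", drop_cols=()):
    """Add a binary `severe_damage` target and drop leaky or redundant columns."""
    out = df.copy()

    # Grade greater than 3 -> 1 (severe), everything else -> 0.
    out["severe_damage"] = (out[grade_col] > 3).astype(int)

    # The raw grade would leak the target, so it is dropped along with
    # any other columns the caller flags (hypothetical list).
    out = out.drop(columns=[grade_col, *drop_cols])
    return out
```

Which extra columns to drop depends on the dataset: typical candidates are post-earthquake measurements (leakage) and features that are highly correlated with one another (multicollinearity).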
read more
Predictive Model to Predict Bankruptcy of a Company in Taiwan
In this project, I explored data collected by a team of Taiwanese economists studying bankruptcy. The goal is to build a model that can predict whether a company will go bankrupt or not.
Extracted data from a JSON file. Loaded and saved files using Python. Explored some of the features of the dataset, used visualizations to understand those features, and developed a model that addresses the problem of imbalanced data through under- and over-sampling.
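Random under- and over-sampling can be sketched in plain pandas as shown below. The helper names and the idea of passing the target column by name are assumptions for illustration; the project may well have used a dedicated resampling library instead.

```python
import pandas as pd


def under_sample(df, target, random_state=42):
    """Shrink every class to the size of the smallest class."""
    n_min = df[target].value_counts().min()
    parts = [
        group.sample(n=n_min, random_state=random_state)
        for _, group in df.groupby(target)
    ]
    return pd.concat(parts).reset_index(drop=True)


def over_sample(df, target, random_state=42):
    """Grow every class to the size of the largest class by duplicating rows."""
    n_max = df[target].value_counts().max()
    parts = []
    for _, group in df.groupby(target):
        parts.append(group)  # keep all original rows
        extra = n_max - len(group)
        if extra > 0:
            # Draw the missing rows with replacement from the same class.
            parts.append(
                group.sample(n=extra, replace=True, random_state=random_state)
            )
    return pd.concat(parts).reset_index(drop=True)
```

Resampling should be applied to the training split only, so that the test set keeps the original class balance.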
read more
Price of Housing in Brazil
Worked with a dataset of homes for sale in Brazil. The goal is to determine if there are regional differences in the real estate market, and to look at southern Brazil to see if there is a relationship between home size and price.
Combined CSV files of real estate data from Brazil and cleaned the data for exploration. Used data visualization to explore the regional differences in the Brazilian real estate market.
read more
House Sales Prediction in King County, USA, Using TensorFlow
The dataset contains house sale prices for King County, which includes Seattle. A deep learning regression model is used to predict house prices.
Exploratory Data Analysis: Get an understanding of which variables are important, view summary statistics, and visualize the data. Created a model using TensorFlow. The Adam optimizer was used in this model. Evaluated the model performance. Link to GitHub Repository
read more