A real-world client-facing project with real loan data

· 1. Introduction
· 2. Data Cleaning and Exploratory Analysis
· 3. Modeling
3.1 Preprocessing
3.2 Model Selection
3.3 Model Optimization
· 4. Conclusions
· References
· About Me

1. Introduction

This project is part of my freelance data science work for a client. No non-disclosure agreement was required, and the project does not contain any sensitive information, so I decided to showcase the data analysis and modeling sections as part of my personal data science portfolio. The client’s information has been anonymized.

The goal of this project is to build a machine learning model…

An End-to-end Machine Learning Project with Real Bank Data

Table of Contents

· Introduction
· About the Dataset
· Import Dataset into the Database
· Connect Python to MySQL Database
· Feature Extraction
· Feature Transformation
· Modeling
· Conclusion and Future Directions
· About Me

Note: If you are interested in the details beyond this post, the Berka Dataset, all the code, and the notebooks can be found on my GitHub page.


For banks, predicting how likely a client is to default on a loan is always an interesting and challenging problem, especially when only a handful of information is available. In the modern era, the data science teams in the…

Zhou (Joe) Xu

Data Scientist at Chapeau AI. I am an aspiring technologist, a quick learner, and a problem solver with an engineering background.

