About Me

I am a Data Science graduate student at Stevens Institute of Technology, graduating in May 2022. I studied Applied Statistics and Data Science at the Pennsylvania State University. I am currently looking for full-time data science and data analyst roles.

I am proficient in machine learning, data analysis, data visualization, and data cleaning. I am passionate about learning new techniques.

Education

Stevens Institute of Technology

MS. Data Science August 2020 - May 2022

GPA: 3.54

Pennsylvania State University

BS. Statistical Modeling Data Sciences September 2015 - May 2020

BS. Statistics September 2015 - May 2020

Work Experience

Russell Reynolds Associates

Data Scientist Intern January 2022 - Present

Performed exploratory data analysis, tested the significance of new features, applied LDA for topic modeling, and implemented hyperparameter tuning for LSTM models using Keras
Applied Random Forest and Gradient Boosting Classifier models to the LDA-transformed data for classification
Analyzed Client Feedback test data and applied machine learning models to classify comments

Python LDA Gradient Boosting Classifier Random Forest Classifier Keras Tuner

Russell Reynolds Associates

Data Scientist Intern June 2021 - August 2021

Built classification models with TF-IDF, Random Forest, LDA, and LSTM using the sklearn and Keras libraries to predict CEO turnover; the TF-IDF model achieved 80% accuracy with a 0.54 F1 score
Performed exploratory data analysis on 70K resume records and generated visualizations using the Python pandas, matplotlib, and seaborn packages in both Jupyter and Zeppelin notebooks
Constructed SQL-to-Python data transformation pipelines using SQLAlchemy and sklearn to extract CapIQ data from Microsoft SQL Server

Python LSTM TF-IDF Random Forest Classifier Gridsearch SQL (SQL Server Management Studio)

Agricultural Bank of China

Data Scientist Intern May 2019 - August 2019

Discussed and planned a recommendation system for wealth management products.
Employed a LightGBM model and used grid search to tune its parameters in Python for an internal machine learning competition.
Placed 6th in the competition, which involved predicting how likely customers were to purchase products.

Python Gridsearch LightGBM

Academic Projects

Netflix Movie Recommendation

BIA 678 Big Data Technologies Seminar Final Project: Fall 2021

Built an ALS recommendation model using PySpark with the pyspark.ml and mlflow packages on Databricks, achieving an RMSE of 1.24
Performed data cleaning and data aggregation using Python with the Pandas and NumPy packages

Github Link

Python Pyspark ALS Databricks

PSU Capstone Project: Estimate the Upper Bound of US innovation

DS 440 Data Sciences Capstone Course: Spring 2020

Collected and cleaned data from multiple datasets and performed exploratory data analysis on region, income, education level, and number of patents
Identified flaws in the provided data and convinced the sponsors to let the team find new datasets

Python EDA Pandas

PUBG Finish Placement Prediction

DS 320 Data Integration: Fall 2019

Applied XGBoost, LightGBM, and CatBoost algorithms in Python to predict final placement and implemented an ensemble learning method combining these algorithms to improve accuracy
Developed and published a website for the project using HTML and CSS

Github Link

Python XGBoost LightGBM CatBoost

PUBG Matches Analysis & Visualization

DS 330 Visual Analytics for Data Sciences: Fall 2019

Built interactive plots using the Plotly package in Python and embedded the plots in a webpage
Analyzed the relationship between match data and death data through different visualizations

Github Link

Python Plotly Data Visualization Data Analysis

Home Credit Default Risk

Kaggle Competition: Summer 2018

Ranked in the top 11% of the competition, which had approximately 7,100 teams
Performed EDA and feature selection on the datasets using Python with the Pandas and XGBoost packages
Developed an XGBoost model and tuned its parameters in Python to improve accuracy

Python XGBoost Feature Selection

Activities

PSU ASA DataFest

ASA DataFest at Penn State: Spring 2018

Formed team J4Fun and won a finalist award in the ASA DataFest PSU 2018 competition
Used Tableau to create data visualizations and analyze 2.59 GB of data provided by Indeed

2018 Award Link

Tableau Data Visualization

Hai Huang

Master's Student at Stevens Institute of Technology
