Hai Huang

Master's Student at Stevens Institute of Technology

Resume

About Me


I am a Data Science graduate student at Stevens Institute of Technology and will graduate in May 2022. I studied Applied Statistics and Data Science at the Pennsylvania State University. I am currently looking for full-time data science and data analyst roles.

I am proficient in machine learning, data analysis, data visualization, and data cleaning. I am passionate about learning new techniques.

Education


Stevens Institute of Technology

MS. Data Science

GPA: 3.54

Pennsylvania State University

BS. Statistical Modeling Data Sciences

BS. Statistics

Work Experience


Russell Reynolds Associates

Data Scientist Intern

  • Performed exploratory data analysis, tested the significance of new features, applied LDA for topic modeling, and implemented hyperparameter tuning for LSTM models using Keras
  • Applied Random Forest and Gradient Boosting Classifier models to LDA-transformed data for classification
  • Worked on Client Feedback test data, performing data analysis and applying machine learning models to classify comments
Python LDA Gradient Boosting Classifier Random Forest Classifier Keras Tuner

Russell Reynolds Associates

Data Scientist Intern

  • Built classification models with TF-IDF, Random Forest, LDA, and LSTM using the sklearn and Keras libraries to predict CEO turnover; the TF-IDF model achieved 80% accuracy with a 0.54 F1 score
  • Performed exploratory data analysis on 70K resume data and generated visualizations using the Python pandas, matplotlib, and seaborn packages in both Jupyter and Zeppelin notebooks
  • Constructed SQL-to-Python data transformation pipelines using SQLAlchemy and sklearn to extract CapIQ data from Microsoft SQL Server
Python LSTM TF-IDF Random Forest Classifier Gridsearch SQL (SQL Server Management Studio)

Agricultural Bank of China

Data Scientist Intern

  • Discussed and planned a recommendation system for wealth management products
  • Employed a LightGBM model and used grid search to tune parameters in Python for an internal machine learning competition
  • Placed 6th in the competition, which involved predicting how likely customers were to purchase products
Python Gridsearch LightGBM

Academic Projects


Netflix Movie Recommendation

BIA 678 Big Data Technologies Seminar Final Project:    

  • Built an ALS recommendation model using PySpark with the pyspark.ml and mlflow packages on Databricks, achieving an RMSE of 1.24
  • Performed data cleaning and data aggregation using Python with the pandas and NumPy packages
  • Github Link
Python Pyspark ALS Databricks

PSU Capstone Project: Estimate the Upper Bound of US Innovation

DS 440   Data Sciences Capstone Course:    

  • Collected and cleaned data from multiple datasets and performed exploratory data analysis on region, income, education level, and number of patents
  • Identified flaws in the provided data and convinced the sponsors to let the team switch to new datasets
Python EDA Pandas

PUBG Finish Placement Prediction

DS 320   Data Integration:    

  • Applied the XGBoost, LightGBM, and CatBoost algorithms in Python to predict final placement and implemented an ensemble of these algorithms to improve accuracy
  • Developed and published a website for the project using HTML and CSS
  • Github Link
Python XGBoost LightGBM CatBoost

PUBG Matches Analysis & Visualization

DS 330   Visual Analytics for Data Sciences:    

  • Built interactive plots using the Plotly package in Python and embedded them in a webpage
  • Analyzed the relationship between match data and death data through different visualizations
  • Github Link
Python Plotly Data Visualization Data Analysis

Home Credit Default Risk

Kaggle Competition:    

  • Ranked in the top 11% of approximately 7,100 teams in the competition
  • Performed EDA and feature selection on datasets using Python with the pandas and XGBoost packages
  • Developed an XGBoost model and tuned its parameters in Python to improve accuracy
Python XGBoost Feature Selection

Activities


PSU ASA DataFest

ASA DataFest at Penn State:    

  • Formed team J4Fun and won a finalist award in the ASA DataFest PSU 2018 competition
  • Used Tableau to create data visualizations and analyze 2.59 GB of data provided by Indeed
  • 2018 Award Link
Tableau Data Visualization