This portfolio is a compilation of projects I created spanning data cleaning/structuring, data analysis, machine learning, and dashboarding.
I developed a chatbot for a used car dealership to automate after-hours customer support. The virtual assistant engages customers in real time, answering inquiries, streamlining the car-buying experience, and capturing data for the sales team. I built the workflow in conversational design software; from that workflow I called the OpenAI API to query a custom knowledge base, captured user information, and sent a confirmation email using webhooks and automations. This project exhibited my capabilities not only in integrating cutting-edge AI with business processes but also in creating solutions that drive tangible business value. Through this initiative, I further solidified my expertise in conversational AI, automation, and system integration within a dynamic retail environment.
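To give a sense of the backend call pattern (the production workflow itself lives in the conversational design tool), here is a minimal Python sketch; the knowledge-base context and webhook URL are hypothetical placeholders.

```python
# Minimal sketch of the chatbot backend flow: answer a question with OpenAI using
# retrieved knowledge-base context, then fire a webhook that triggers the
# confirmation email / lead capture. URLs and field names are placeholders.
import requests
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer_customer(question: str, kb_context: str) -> str:
    """Ask the model to answer using only the dealership knowledge base."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": f"Answer using only this dealership info:\n{kb_context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

def capture_lead(name: str, email: str, vehicle: str) -> None:
    """Post captured customer details to an automation webhook."""
    requests.post(
        "https://hooks.example.com/lead-capture",  # placeholder webhook URL
        json={"name": name, "email": email, "vehicle": vehicle},
        timeout=10,
    )
```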
Skills Highlighted:

I developed a real estate price prediction and analysis app for homes of interest in my immediate area, leveraging various data science techniques to enhance functionality and insights. The core of the app is an ElasticNet machine learning model, which I selected using an algorithm harness that compared models including Linear Regression, ElasticNet, and XGBoost Regressor with a variety of scalers (Standard, MinMax, and Robust) to identify the best-performing pipeline (see the sketch below). A significant feature of this project was the use of SHAP (SHapley Additive exPlanations) not only to remove outliers but also to understand feature importances, providing deeper insight into the factors driving housing prices. There were a few instances where I identified data that needed to be cleaned. One real estate nuance I learned: if a house has a pond in the back, the square footage of the pond is included in the lot size! One house with a pond and some woods had a gigantic lot size that was skewing my predictions, so I imputed the mean lot size of houses that had sold in a similar range to get a sense of usable backyard space. Through SHAP partial dependence plots, I identified an interaction between living area and latitude, implying that as houses sit farther north (closer to Tampa Bay), living area has a larger impact on house price.
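A simplified version of the model-selection harness, assuming the cleaned listing data is already split into a feature matrix and target:

```python
# Sketch of the algorithm harness: cross-validate every scaler/model pipeline
# and keep the best performer by RMSE. Exact model settings are illustrative.
from sklearn.linear_model import LinearRegression, ElasticNet
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler
from xgboost import XGBRegressor

scalers = {"standard": StandardScaler(), "minmax": MinMaxScaler(), "robust": RobustScaler()}
models = {"linear": LinearRegression(), "elasticnet": ElasticNet(), "xgb": XGBRegressor()}

def best_pipeline(X, y):
    """Return the (scaler, model) combination with the lowest cross-validated RMSE."""
    results = {}
    for s_name, scaler in scalers.items():
        for m_name, model in models.items():
            pipe = Pipeline([("scale", scaler), ("model", model)])
            rmse = -cross_val_score(
                pipe, X, y, cv=5, scoring="neg_root_mean_squared_error"
            ).mean()
            results[(s_name, m_name)] = rmse
    return min(results, key=results.get), results
```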
Skills Highlighted:

My interest in kiteboarding has grown with almost the same velocity as my interest in data science. Kiteboarding depends on a good weather forecast, and I always found myself checking the wind. I took the opportunity to write code that pulls the relevant information from a weather API for my location and deployed it to AWS, so I now receive an email every week with the forecast for the upcoming weekend.
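A condensed sketch of that job, assuming a generic forecast endpoint and an SES-verified sender address (both placeholders):

```python
# Sketch of the weekend wind-forecast emailer. The forecast endpoint, field names,
# and email addresses are placeholders; the real job runs on AWS on a weekly schedule.
import requests
import boto3

def fetch_weekend_wind(lat: float, lon: float) -> list[dict]:
    resp = requests.get(
        "https://api.example-weather.com/forecast",  # placeholder endpoint
        params={"lat": lat, "lon": lon, "daily": "windspeed"},
        timeout=10,
    )
    days = resp.json()["daily"]
    return [d for d in days if d["weekday"] in ("Saturday", "Sunday")]

def email_forecast(days: list[dict]) -> None:
    body = "\n".join(f"{d['weekday']}: {d['windspeed']} kts" for d in days)
    boto3.client("ses").send_email(
        Source="me@example.com",                       # placeholder sender
        Destination={"ToAddresses": ["me@example.com"]},
        Message={
            "Subject": {"Data": "Weekend kiteboarding forecast"},
            "Body": {"Text": {"Data": body}},
        },
    )
```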
Skills Highlighted:
The goal of this project was to analyze and optimize a client's customer routing capabilities, specifically, taking key factors into account by DMA (designated market area) to assess whether any DMAs were converting better at a secondary or tertiary location than at the primary one. I filtered my results using a variety of factors, including average monthly flight count and average flight time. I used Selenium to automatically download a file from the US Bureau of Transportation Statistics and incorporate it into the analysis. The analysis reports back the opportunity DMAs based on the customer's requirements.
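A stripped-down sketch of the automated download step; the download directory and the page locator are illustrative placeholders, not the selectors used against the BTS site.

```python
# Sketch of automating a file download from the US Bureau of Transportation
# Statistics with Selenium. The element locator below is a placeholder.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_experimental_option("prefs", {"download.default_directory": "/tmp/bts"})

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://www.transtats.bts.gov/")              # BTS landing page
    driver.find_element(By.LINK_TEXT, "Download").click()     # placeholder locator
finally:
    driver.quit()
```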
Skills Highlighted:
I enrolled in a Kaggle course on automating data pipelines. I chose the live gun violence dataset, set it to update weekly, and ran validation checks on data types, presence of key inputs, and missing data. I then used an ETL step to pare the national data down to the number of FL residents injured or killed, by month and year.
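The pare-down step amounts to a small pandas transform; a sketch with illustrative column names:

```python
# Sketch of the ETL step: filter the national incident data to Florida and
# aggregate injuries/deaths by month and year. Column names are assumptions.
import pandas as pd

def florida_monthly_totals(df: pd.DataFrame) -> pd.DataFrame:
    fl = df[df["state"] == "Florida"].copy()
    fl["date"] = pd.to_datetime(fl["date"])
    fl["year"] = fl["date"].dt.year
    fl["month"] = fl["date"].dt.month
    return (
        fl.groupby(["year", "month"])[["n_injured", "n_killed"]]
          .sum()
          .reset_index()
    )
```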
Skills Highlighted:
The goal of this project was to analyze and identify a client's spend opportunities. We combined corporate productivity metrics with time studies gathered at the sites to validate the initial data set. The initial data set was 500k+ rows and covered 24 departments across 5 locations. A follow-up, deeper-analysis data set added another 500k+ rows and focused on discrepancies in 7 departments. Information was exported from QlikView as a txt file and uploaded for analysis, and I identified the root causes of the differences between locations.
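A rough sketch of the comparison step; the tab delimiter and column names are assumptions about the export format, not the client's actual schema.

```python
# Sketch of loading the QlikView txt export and surfacing the location/department
# combinations where reported productivity diverges most from the time studies.
import pandas as pd

df = pd.read_csv("qlikview_export.txt", sep="\t")  # placeholder file name

gaps = (
    df.groupby(["location", "department"])
      .agg(reported_hours=("reported_hours", "sum"),
           observed_hours=("observed_hours", "sum"))
      .assign(gap=lambda d: d["reported_hours"] - d["observed_hours"])
      .sort_values("gap", ascending=False)
)
print(gaps.head(10))
```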
Skills Highlighted:
I was interested in learning best practices for linear regression and wanted to put some of my skills to use. I joined the Kaggle Housing Prices competition, and my main focus was simply to learn about and use common linear regression algorithms. Specifically, I focused on Linear Regression (as a baseline), Ridge Regression, Lasso Regression, and Elastic Net. I was able to explain 91.2% of the variance with an RMSE (the competition scoring metric) of 0.12117, which put me around the top 28%.
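A minimal sketch of that comparison, scoring each model with cross-validated RMSE; the regularization strengths shown are illustrative, and preprocessing is omitted.

```python
# Sketch of comparing the regression baselines on a preprocessed feature matrix X
# and target y. Alpha values are illustrative, not the tuned competition settings.
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.model_selection import cross_val_score

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.001),
    "elastic_net": ElasticNet(alpha=0.001, l1_ratio=0.5),
}

def compare_baselines(X, y):
    """Print cross-validated RMSE for each regression baseline."""
    for name, model in models.items():
        rmse = -cross_val_score(model, X, y, cv=5,
                                scoring="neg_root_mean_squared_error").mean()
        print(f"{name}: CV RMSE = {rmse:.5f}")
```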
Key Learnings:
Similarly, I was interested in learning best practices for logistic regression. I joined the Kaggle Titanic competition with a focus on optimizing predictions using a variety of algorithms. I cycled through various classification algorithms, including Logistic Regression, K-Nearest Neighbors, Random Forest, Gaussian Naive Bayes, Decision Trees, Support Vector Machines, and Gradient Boosting with XGBoost, to test performance against the data set. I then tuned hyperparameters to maximize performance and ultimately achieved an accuracy of 81%, which landed me in the top 6% of participants.
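A compact sketch of that loop, plus one way to run the tuning pass with a grid search; the grid values are illustrative, and X, y stand for the engineered features and labels.

```python
# Sketch of the classifier shoot-out and a GridSearchCV tuning pass.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

classifiers = {
    "logreg": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
    "random_forest": RandomForestClassifier(),
    "naive_bayes": GaussianNB(),
    "tree": DecisionTreeClassifier(),
    "svm": SVC(),
    "xgboost": XGBClassifier(),
}

def shoot_out(X, y):
    """Report cross-validated accuracy for each candidate classifier."""
    for name, clf in classifiers.items():
        acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
        print(f"{name}: CV accuracy = {acc:.3f}")

def tune_random_forest(X, y):
    """Tune one of the stronger models over an illustrative grid."""
    grid = {"n_estimators": [200, 500], "max_depth": [4, 6, 8]}
    search = GridSearchCV(RandomForestClassifier(), grid, cv=5, scoring="accuracy")
    return search.fit(X, y).best_estimator_
```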
Skills Highlighted:
The goal of this project was to create a dashboard tracking and displaying the global economic impacts of the COVID-19 pandemic. I owned project management, analytics, and automation for this interactive daily dashboard, which is hosted on ESRI. I identified 50 different publicly available sources covering indicators including leading stock price indices, the consumer price index, the consumer confidence index, daily commodity prices, yield curves, and interest rates. I fully automated the data pipeline using AWS: a multi-layered Lambda function on a CloudWatch event timer saves the data to an S3 bucket and uploads it through ESRI's Python API daily. This automation saves 2.5 hours of manual effort each day. The dashboard was pinned to the top of my organization's COVID-19 webpage and received praise from the Chief Marketing Officer.
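A trimmed sketch of one layer of that pipeline: a Lambda handler on a daily CloudWatch schedule that pulls a public source and stages it in S3. The source URL and bucket name are placeholders, and the ESRI upload layer is omitted.

```python
# Sketch of one Lambda layer in the pipeline: fetch a public indicator feed on a
# daily CloudWatch schedule and stage it in S3 for the downstream ESRI upload.
import datetime
import urllib.request
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    today = datetime.date.today().isoformat()
    data = urllib.request.urlopen(
        "https://example.com/indicators/daily.csv"   # placeholder source URL
    ).read()
    s3.put_object(
        Bucket="covid-econ-dashboard-staging",        # placeholder bucket name
        Key=f"raw/daily-indicators-{today}.csv",
        Body=data,
    )
    return {"status": "ok", "date": today}
```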
Skills Highlighted:

I enrolled in a Kaggle course on dashboarding with notebooks. I chose the live L.A. parking ticket dataset to visualize. I built data validation, visualized the last week's parking citations using Folium, and trended the citation count by day, by hour, and by reason using Plotly. I then pushed refreshed code through Google Cloud so that the notebook updates on the Kaggle server nightly.
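A condensed sketch of the two visual layers; the file name and column names are assumptions about the dataset schema, not the exact fields in the L.A. data.

```python
# Sketch of the notebook's visuals: a Folium map of last week's citations and a
# Plotly count-by-day trend. Column names are illustrative assumptions.
import pandas as pd
import folium
import plotly.express as px

df = pd.read_csv("parking_citations.csv", parse_dates=["issue_date"])
last_week = df[df["issue_date"] >= df["issue_date"].max() - pd.Timedelta(days=7)]

# Map each citation from the past week
m = folium.Map(location=[34.05, -118.24], zoom_start=11)
for _, row in last_week.dropna(subset=["latitude", "longitude"]).iterrows():
    folium.CircleMarker([row["latitude"], row["longitude"]], radius=2).add_to(m)

# Trend the citation count by day
daily = last_week.groupby(last_week["issue_date"].dt.date).size().reset_index(name="count")
fig = px.bar(daily, x="issue_date", y="count", title="Citations by day (last week)")
fig.show()
```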
Skills Highlighted: