During my time at Capital One, I was on the Data Infrastructure team, where I built an internal tool for company analysts. I was also lucky enough to work on a second project with the machine learning team.
Query Parser (Data Infrastructure Team)
- Built an internal tool that lets Capital One employees query across company databases using Apache Drill and Facebook Presto
- Developed plugin scripts to connect MySQL, PostgreSQL, Snowflake, Redshift, S3, and CSV files to the Drill and Presto code bases so that a single SQL query can span all of these sources and return a unified result set to the user
- Wrote shell scripts to automate the entire Drill setup on an AWS Elastic Compute Cloud (EC2) instance
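The idea of one SQL statement spanning separate data sources can be sketched in a few lines. This is only an illustration, not the tool itself: SQLite's ATTACH DATABASE stands in for the Drill and Presto connectors, and the `customers` and `invoices` tables, the `billing` schema name, and the `cross_query` function are all hypothetical.

```python
import sqlite3


def cross_query():
    """Join tables that live in two separate databases with one SQL query."""
    # Primary in-memory database (addressed as schema "main").
    conn = sqlite3.connect(":memory:")
    # Attach a second, independent in-memory database as schema "billing".
    # Assumption: this mimics registering another data source with the engine.
    conn.execute("ATTACH DATABASE ':memory:' AS billing")

    conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO main.customers VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    conn.executemany(
        "INSERT INTO billing.invoices VALUES (?, ?)",
        [(1, 120.0), (1, 30.0), (2, 75.5)],
    )

    # A single query reads from both databases and returns one result set.
    rows = conn.execute(
        """
        SELECT c.name, SUM(i.amount) AS total
        FROM main.customers AS c
        JOIN billing.invoices AS i ON i.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()
    conn.close()
    return rows
```

Calling `cross_query()` returns one row per customer with the invoice total drawn from the second database, which is the same shape of result a federated engine hands back to the user.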
Jira Bot (ML Team Project)
- Built an internal Slack bot that connects Capital One employees based on their skill sets and project similarities
- Used tf-idf, an unsupervised text-weighting technique, to group related projects so developers could reach out to one another for advice
- Compared results to the Paragraph Vector neural network model and determined that tf-idf was better suited for Jira use cases
- Tools: Python, Gensim, tf-idf, Paragraph Vector model (Doc2Vec)
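The project-matching step can be sketched roughly as follows. This is a minimal, standard-library illustration of tf-idf weighting with cosine similarity, not the bot's actual code (which used Gensim); the sample project descriptions and the function names `tfidf_vectors`, `cosine`, and `most_similar` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} tf-idf vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Weight = relative term frequency * log inverse document frequency.
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(docs, index):
    """Index of the document most similar to docs[index], excluding itself."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[index], vec), i)
        for i, vec in enumerate(vectors)
        if i != index
    ]
    return max(scores)[1]


# Hypothetical project descriptions, standing in for Jira ticket text.
PROJECTS = [
    "fraud detection model for credit card transactions",
    "dashboard for customer support tickets",
    "credit card fraud alerts pipeline",
]
```

With these sample descriptions, `most_similar(PROJECTS, 0)` picks the other fraud-related project, since the two share several low-frequency terms while the dashboard project shares only "for".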