Joseph Helbing
Machine Learning Engineer + Data Scientist
About Me
I am the son of a sculptor and a painter. I was raised with a maker's mindset: an approach to building, making, and, yes, coding things that is thoroughly rooted in experimentation and tinkering. We build a thing because we desire it to exist.
I am currently an AI engineer at Imprivata. Previously, I was a fellow with the U.S. Digital Corps at the Equal Employment Opportunity Commission, in the division supporting investigations of machine learning and algorithm-based employment discrimination. Before that, I worked as a data analyst at the Library of Congress Federal Research Division, and before that I had a ten-year career spanning Chinese political economy studies, research, and US manufacturing.
I am married to my partner Kara, and we live in Chicago with our son James-Michael and prissy cat Freya. I am an avid board gamer and tabletop game master, a Prohibition-era cocktail aficionado, and a 1970s-era vintage motorcycle enthusiast.
Education
M.A. Computational Social Science
University of Chicago, 2023
M.A. East Asian Languages and Literatures (Mandarin)
Ohio State University, 2013
B.A. Political Science and China Studies
University of Illinois at Urbana-Champaign, 2011
Skills
Experience
AI Engineer
Imprivata, 2025 - Present
- Rewrote a legacy EMR data processing pipeline using Polars lazy frames, reducing processing time from 5 hours to under 2 minutes per ~30GB file across 1000+ daily runs
- Developing an agentic analysis system using Pydantic AI that automates security dataset analysis by running SQL queries against ephemeral DuckDB instances to triage authentication failures and accelerate incident response
- Built the core analysis platform as a FastAPI service, deployed on Kubernetes via Helm charts and ArgoCD
- Built a TensorZero-based LLM gateway for API load balancing and per-customer usage and cost tracking
- Containerized an MCP server wrapping AWS Bedrock Knowledge Bases, providing internal company documentation as context for agentic tabular dataset analysis
Data Scientist
US Digital Corps at the Equal Employment Opportunity Commission (EEOC), 2024 - 2025
- Assisted EEOC investigators on AI- and ML-related investigations with technical consulting, forensic source code review, and case-related testing
- Modernized analytics infrastructure with cloud computing and VM systems for data science tasks
- Assisted EEOC data analysts and statisticians on case investigations by training language-model-based classification systems, applying network analysis, and using other modern data science techniques
- Took ownership of existing codebases for internal analytics tools, upgrading and refactoring them with modern best practices to achieve large speedups, limit concurrency issues, and minimize resource footprints while adapting them to lower-cost platforms
- Co-led US Digital Corps NLP working group, managing group administration, facilitating inter-agency knowledge sharing, and training on NLP methods
Highlighted Project: Language-Model-Based Case Analysis Support
- Fine-tuned bidirectional encoder models using the BERT and ModernBERT architectures to classify free-form resume text in support of applicant flow analysis for case investigations
- Predicted job titles for unhired applicants from unstructured application text, training on hired applicants' job titles with train/test splits and post-training accuracy exploration, to help statisticians identify group differences in discrimination investigations
Data Analyst
Library of Congress Federal Research Division, 2023 - 2024
- Long-form research reporting for military and federal clients, including literature reviews, statistical data analysis, and visualization
- Technical lead on a large-scale Natural Language Processing (NLP) project combining AWS GovCloud with open-weights foundation models run on local compute in a pipeline for extracting data from legal document images
- Contributor to dual-use technology research reports based on Chinese source material for a US government client
Highlighted Project: Large-Scale Document Processing System
- Architected and implemented an end-to-end pipeline for extracting structured data from 500,000+ military court martial documents across all U.S. military branches
- Integrated AWS Textract, custom bounding box algorithms, and LLM refinement into the document processing pipeline to extract 60+ variables from heterogeneous military forms, with built-in quality assurance through a purpose-built GUI for sampling-based human verification
- Designed and implemented SQL database architecture for efficient storage and retrieval of extracted information
- Led technical training sessions for the US Digital Corps NLP working group on the LoC's standardized form extraction methodology
Research Assistant
University of Chicago Data Science Institute, 2022
- Developed a web scraper for the Securities and Exchange Commission (SEC) EDGAR API to access corporate reports
- Used statistical text-matching techniques, XBRL scraping, and open-source pretrained machine learning models to build an information extraction pipeline
Marketing Coordinator / Industrial Sales Manager / China Regional Manager
Paratech Inc, 2015 - 2021
Marketing Coordinator
(2019 - 2021)
- Oversaw the update of Paratech's corporate website, managed marketing materials, and fostered dealer partnerships, focusing on technology integration and staff training
- Launched a new WordPress website, handling design, content, and CMS
- Developed a YouTube webinar series, trained the sales team, and ran live field training events broadcast to customers
Industrial Sales Manager
(2017 - 2019)
- Led the development of a new industrial sales market, establishing partnerships with distributors and manufacturers' representatives
- Produced industrial, military, and maritime marketing materials, including brochures and videos, using tools like InDesign, Photoshop, and DaVinci Resolve
- Worked on the production floor during peak periods, assembling products from engineering drawings to meet manufacturing targets
China Regional Sales Manager
(2015 - 2017)
- Overhauled Chinese operations strategically and organizationally, strengthening the company's position and moving away from problematic relationships without harming client networks
- Transitioned local contacts to direct company relationships, retaining all dealers and clients
- Handled contract design and translation between Chinese and English, and directly negotiated partnerships through multiple visits to the country
Founder
SiMple International Inc, 2014 - 2015
- Founded a company selling international telecommunications services to exchange students through partner distribution channels
US Recruitment Manager, Beijing and North China
INTO University Partnerships, 2013 - 2014
- Worked with an education consultancy network to recruit students for degree programs in the United States, representing 6 US universities
- Covered the Northern China recruitment territory from a base in Dalian, Liaoning, China; independently planned and executed travel and events, and carried out all position responsibilities exclusively in Mandarin Chinese
Featured Projects
- RL-ABM Experiments -- Reinforcement Learning Enhanced Schelling Segregation Model
- Cascade -- Data flow and processing framework