Featured Projects
Solutions I've built to solve real-world problems
An AI-powered grocery assistant that bridges personal health data with real-time food decisions. Users upload lab reports and scan product labels to receive personalized verdicts — Safe, Caution, or Avoid — powered by Claude's vision and reasoning.
Key Achievements
Built at the Claude Builder Club Hackathon by Indiana University students
Lab report analysis extracting health markers like cholesterol and allergies
Real-time product scanning via camera with instant Safe/Caution/Avoid verdicts
Voice feedback via ElevenLabs for hands-free shopping and smart alternative suggestions
Technologies
An AI-powered platform that transforms raw CSV/Excel data into rich, multimodal stories using Google Gemini. Features a real-time voice agent for hands-free data exploration, RAG-powered conversations, and parallel image generation woven into narratives.
Key Achievements
Live deployment on Google Cloud Run with a fully async FastAPI backend
Triple-perspective storytelling (ELI5, Architecture, Analyst) streamed in real time
Gemini Live Voice Agent via bidirectional WebSocket for conversational data exploration
RAG pipeline with Vertex AI Vector Search (text-embedding-004, 768 dims)
Technologies
A Chrome extension that analyzes legal documents instantly using Google Gemini 2.5 Flash and lets users ask questions by voice, powered by ElevenLabs Conversational AI.
Key Achievements
Built for the AI Partner Catalyst Hackathon 2025
Integrated Google Gemini 2.5 Flash for analysis
Real-time voice interaction with ElevenLabs Conversational AI
Seamless browser integration via Chrome Extension
Technologies
A real-time campus event engagement platform built for a 24-hour hackathon. Students earn points by entering event check-in codes, with live leaderboard updates via Server-Sent Events and a JWT-protected admin dashboard for full event management.
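A minimal sketch of how a leaderboard update could be framed as a Server-Sent Events message. The function names, the `leaderboard` event name, and the snapshot shape are illustrative assumptions, not taken from the project's code.

```python
import json
from typing import Iterable, Iterator, Optional


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Serialize one Server-Sent Events message.

    In the SSE wire format each field is a "name: value" line,
    and a blank line terminates the message.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps emits a single line by default, so one data field is enough.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def leaderboard_stream(snapshots: Iterable[dict]) -> Iterator[str]:
    """Yield one SSE message per leaderboard snapshot (hypothetical shape)."""
    for snapshot in snapshots:
        yield format_sse(snapshot, event="leaderboard")
```

A web framework would send these strings over a response with the `text/event-stream` content type, and the browser's `EventSource` would pick up each `leaderboard` event as it arrives.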
Key Achievements
Real-time leaderboard via Server-Sent Events with live connection status indicator
Achievement badges (Top, Rising, Dedicated, Consistent) with week-over-week rank trends
JWT-protected admin dashboard with full event CRUD and QR code generation
Cryptographically secure check-in codes with timestamped attendance logging and CSV export
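A minimal sketch of the check-in code idea using Python's standard library. The alphabet, the 8-character length, and the log field names are assumptions made for illustration; the project's actual format may differ.

```python
import csv
import io
import secrets
import string
from datetime import datetime, timezone

# Assumed alphabet: uppercase letters and digits.
ALPHABET = string.ascii_uppercase + string.digits
FIELDS = ["student_id", "code", "checked_in_at"]


def generate_checkin_code(length: int = 8) -> str:
    """Build a code from the OS-backed CSPRNG exposed by `secrets`."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def log_attendance(log: list, student_id: str, code: str) -> dict:
    """Append a UTC-timestamped attendance entry and return it."""
    entry = {
        "student_id": student_id,
        "code": code,
        "checked_in_at": datetime.now(timezone.utc).isoformat(),
    }
    log.append(entry)
    return entry


def export_csv(log: list) -> str:
    """Render the attendance log as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(log)
    return buffer.getvalue()
```

Using `secrets` rather than `random` matters here because check-in codes are worth points, so they should not be predictable from earlier codes.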
Technologies
A multi-agent AI orchestration platform that transforms clinical workflows through intelligent document processing. Uses 20+ specialized AI agents working in parallel to analyze medical records, generate SOAP notes, and automate care coordination.
Key Achievements
20+ specialized AI agents, each handling a distinct clinical task
Multi-source data fusion that processes 5+ documents in parallel with temporal analysis
Generates complete SOAP notes automatically from brief clinical notes
Automated referral letters, follow-ups, and patient education materials
Technologies
Architected an end-to-end idea evaluation engine using Flask, Docker, and Qwen LLM via Ollama, implementing the ReAct (Reasoning + Action) framework to simulate iterative reasoning steps for multi-criteria scoring.
Key Achievements
2nd Prize among 50+ hackathon teams
Handled 100+ idea inputs with sub-180ms latency
Hybrid pipeline combining vector similarity with LLM evaluations
Production-ready REST API with input validation and concurrency
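One way a hybrid score could blend vector similarity with an LLM evaluation is sketched below. The weighted average, the `weight` parameter, and the assumption that the LLM score is already normalized to the range 0 to 1 are illustrative choices, not a description of the engine's actual scoring.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as having no similarity.
    return dot / (norm_a * norm_b)


def hybrid_score(
    idea_vec: Sequence[float],
    reference_vec: Sequence[float],
    llm_score: float,
    weight: float = 0.5,
) -> float:
    """Blend embedding similarity with an LLM-assigned score.

    `llm_score` is assumed to be in [0, 1]; `weight` is the share
    given to the similarity term (a hypothetical tuning knob).
    """
    similarity = cosine_similarity(idea_vec, reference_vec)
    return weight * similarity + (1 - weight) * llm_score
```

The similarity term is cheap and deterministic, while the LLM term captures judgment that embeddings miss, so the weight trades speed and stability against nuance.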
Technologies
End-to-End Health Data Processing System
A scalable big data pipeline handling 100,000+ health records using distributed PySpark on AWS EC2. Features automated ML training via SageMaker Autopilot and serverless event-driven processing.
Key Achievements
Processed 100k+ records with distributed PySpark on EC2
Automated ML model training using AWS SageMaker Autopilot
Serverless data pipeline with AWS Lambda & S3 triggers
Discovered 72% population risk via Power BI analytics
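A minimal sketch of the S3-triggered Lambda entry point in such a pipeline. The handler only extracts the bucket and key from the standard S3 event notification; what happens to each object afterwards, and the return shape, are left as assumptions.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Collect the bucket and key of every object in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append({"bucket": bucket, "key": key})
    # Hypothetical return shape; a real handler would start processing here.
    return {"processed": objects}
```

Decoding the key first avoids "object not found" errors when uploaded file names contain spaces or special characters.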
Technologies
AWS-Based Automated Data Pipeline
Developed a secure, fully automated web scraping pipeline leveraging Scrapy for data extraction, containerized with Docker, and deployed on AWS ECS Fargate for scalability and reliability.
Key Achievements
Fully automated web scraping pipeline
Real-time monitoring with CloudWatch and SNS
Interactive analytics dashboards with QuickSight
Daily automated workflows with EventBridge
Technologies
Airflow and Docker Implementation
Developed an automated ETL pipeline to extract real-time weather data from the OpenWeather API, transform it using Python, and load it into a PostgreSQL database, processing data for 10+ locations daily.
Key Achievements
Automated ETL pipeline for 10+ locations daily
Containerized deployment with Docker
Modular DAGs for workflow automation
Seamless analysis and visualization capabilities
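A minimal sketch of what the transform step could look like. It assumes the OpenWeather current-weather response in its default units (temperature in Kelvin); the output column names are illustrative and not taken from the project's schema.

```python
from datetime import datetime, timezone


def transform_weather(payload: dict) -> dict:
    """Flatten one OpenWeather current-weather response into a row."""
    return {
        "city": payload["name"],
        # Default API units report temperature in Kelvin.
        "temperature_c": round(payload["main"]["temp"] - 273.15, 2),
        "humidity_pct": payload["main"]["humidity"],
        "conditions": payload["weather"][0]["description"],
        # "dt" is the observation time as a Unix timestamp in UTC.
        "observed_at": datetime.fromtimestamp(
            payload["dt"], tz=timezone.utc
        ).isoformat(),
    }
```

Keeping the transform a pure function of the API payload makes it easy to unit test and to call from a DAG task before the load step writes the row to PostgreSQL.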
Technologies
Built a data visualization platform with interactive charts and real-time analytics. Features dynamic dashboards, advanced filtering, and multi-source data integration for business intelligence.
Key Achievements
Interactive dashboards with real-time data updates
Multi-source data integration and processing
Advanced filtering and drill-down capabilities
Responsive design with mobile optimization
Technologies
More Projects
Explore More Projects
Check out my GitHub profile for more projects, contributions, and open-source work.
View GitHub Profile
Experience
My professional journey and key achievements
Key Achievements
Supported instruction for two graduate and undergraduate data science courses with a combined enrollment of 150+ students.
Facilitated weekly office hours and graded assignments for DSCI-D595 (Data Science On-Ramp), a graduate-level course covering Scala, Apache Spark, Tableau, NLP, and web scraping for ~30 students.
Technologies Used
Key Achievements
Achieved real-time, cross-system data syncing, as measured by a 100% reduction in manual handoffs and faster reporting cycles, by building data pipelines from PostgreSQL to both PostgreSQL and BigQuery, and implementing CDC using Google Cloud Datastream.
Improved query maintainability and execution reliability, as measured by a 40% drop in SQL runtime errors and smoother dev handoffs, by refactoring legacy SQL into modular SQLx files and integrating them into Python-based BigQuery pipelines.
Technologies Used
Key Achievements
Increased backend reliability and reduced recurring service failures across 5+ microservices by implementing unit tests and integrating CI/CD pipelines.
Delivered a high-impact user-facing feature by building and integrating a React-based frontend with REST APIs, improving usability.
Technologies Used
Key Achievements
Implemented structured project management for 4 hackathons and 8+ coding competitions, enhancing teamwork while showing attention to detail in requirements gathering across technical challenges.
Coordinated 10+ member teams for 5 industry speaker sessions and 7 technical workshops, defining clear project scope and success metrics while demonstrating communication, organization, and problem-solving abilities.
Technologies Used
Key Achievements
Led the project 'Predicting Term Deposit Subscription,' showcasing proficiency in data science through extensive research and advanced analytical techniques.
Employed web scraping for comprehensive data acquisition and implemented advanced data cleaning techniques to ensure high-quality datasets.
Technologies Used
Key Achievements
Specialized in Ethical Hacking and Cyber Security, gaining expertise in identifying system vulnerabilities through intrusion evasion, firewall management, and honeypot analysis.
Honed skills in proposing effective mitigation strategies through comprehensive ethical hacking exercises and security assessments.
Technologies Used
Technical Skills
Technologies and tools I use to build scalable solutions
About Me
My journey from curiosity to scalable engineering
I started with a B.E. in Information Technology from Savitribai Phule Pune University, graduating with honors and a specialization in Data Science. Now I’m finishing my Master’s in Computer Science at Indiana University Bloomington (GPA 3.86/4.0), with coursework in Cloud Computing, Advanced Databases, and Applied Machine Learning.
My work sits at the intersection of data engineering and cloud architecture. I build pipelines that move millions of records, warehouses on Medallion Architecture, and AI systems that actually ship. From my first SQL query to deploying RAG pipelines on Vertex AI, the through line has been the same: turn a hard problem into something that scales.
Outside coursework, I’ve competed in and won hackathons, built full-stack data products with real teams, and picked up AWS and Azure certifications along the way. I’m at my best when the problem is messy and the solution has to be clean.
Current Focus
Publications
Implementation Paper: Forecasting Stock Price using Machine Learning
IJARSCT • May 16, 2023
The Review: Forecasting Stock Price using Machine Learning
IJARSCT • May 4, 2023
Education
Master of Science in Computer Science
Indiana University, Bloomington
Aug 2024 - May 2026 • GPA: 3.86/4.0
Bachelor of Engineering in Information Technology
Savitribai Phule Pune University
Aug 2019 - May 2023 • GPA: 8.90/10.00
Certifications
AWS Certified Developer – Associate
Valid until Aug 4, 2027
Azure AI Fundamentals
Issued Jul 22, 2022
Get In Touch
Let's discuss opportunities and ideas
Contact Information
© 2026 Varun Sonawane. All rights reserved.