Hello, World.

I'm Shrumi Dedhia.

Data Analyst | Agentic AI Builder | Cloud Architect

ABOUT

Let me introduce myself.

Hi, I'm Shrumi. I hold a Master's degree in Computer Science from George Washington University and currently work as a Data Analyst at Zane Networks, LLC, where I design and scale cloud-native data platforms across healthcare and enterprise environments. My work spans the full data engineering stack: building real-time streaming pipelines with Spark and Databricks, architecting modern data warehouses on GCP and AWS, and engineering agentic AI systems using LangChain, MCP servers, and LLMs. I'm passionate about bridging production-grade data engineering with the emerging world of GenAI to build intelligent, scalable, and reliable data ecosystems.

Profile

Through diverse experience spanning Data Analysis, Data Engineering, Cloud Architecture, and Agentic AI across healthcare and enterprise environments, I have built production-grade data platforms on GCP and AWS. With a Master's in Computer Science from George Washington University, I bring deep expertise in Python, Spark, Databricks, dbt, and LLM-powered systems, delivering scalable and intelligent data ecosystems.

  • Full name: Shrumi Hasmukh Dedhia
  • Hobbies: Global Travel & Cultural Exploration, Community Volunteering, Dance & Creative Arts, Fitness, Culinary Experimentation
  • Aspiring Job Roles: Data Engineer, AI Engineer, Sr. Data Analyst
  • Location: Washington, District of Columbia, USA
  • Email: shrumi.dedhia@gmail.com

Skills

I have strong expertise in Python, SQL, and R, with hands-on experience in cloud data engineering across GCP and AWS, distributed computing with Spark and Databricks, and agentic AI development using LangChain and MCP servers.

  • Python: 90%
  • SQL: 95%
  • GCP: 85%
  • Apache Spark: 75%
  • Agentic AI: 65%
  • Databricks: 80%
  • R: 75%

RESUME

More of my credentials.

Work Experience

Data Analyst

July 2024 - Present

Zane Networks LLC

Engineered production-grade data warehouse pipelines on BigQuery processing 5M+ daily records, reducing query execution time by 40% through partitioning, clustering, and slot reservation. Architected a GCP-native ingestion platform integrating MySQL with Pub/Sub and Cloud Functions for CDC-based incremental loading, improving dashboard latency by 55%. Built HIPAA-compliant patient pipelines on Databricks with PHI masking and automated schema validation, reducing manual effort by 75%. Developed distributed Spark workflows leveraging Delta Lake for ACID-compliant versioned storage across the enterprise lakehouse. Implemented dbt on Snowflake modularizing SQL logic across 40+ models with medallion architecture. Deployed containerized workloads on GKE, designing modular ELT pipelines that reduced compute costs by 30%. Built a Gemini-powered text-to-SQL agent integrated with BigQuery and Looker enabling audit report generation via natural language, reducing manual SQL requests by 45%.
Tech stack: Python, GCP, BigQuery, Pub/Sub, Databricks, Spark, Delta Lake, dbt, Snowflake, Vertex AI, GKE, Airflow, FHIR API, Gemini, Looker, Jenkins

Sr. Data Analyst

March 2025 - September 2025

George Washington University

Engineered a production R package using data.table for high-performance in-memory transformations, published via pkgdown and deployed with automated CI through GitHub Actions. Designed and maintained a PostgreSQL database with indexed schemas and query optimization, integrating synthetic dataset generation for pipeline validation. Engineered an MCP server in Python exposing PostgreSQL, statistical outputs, and R Markdown generation as standardized tools, enabling a LangChain-orchestrated Claude agent to autonomously generate publication-ready narrative reports, reducing manual report writing time by 70%.
Tech stack: Python, R, PostgreSQL, data.table, pkgdown, GitHub Actions, Shiny, ggplot2, LangChain, MCP Servers, LLM

Data Analyst Intern

Jan 2024 - June 2024

Zane Networks LLC

Automated data ingestion workflows using Python to handle cleaning, validation, and transformation across healthcare datasets in BigQuery, reducing manual preparation effort and improving data quality for downstream analytics. Developed tailored SQL queries and supported Tableau and Looker dashboard builds for client compliance reporting. Assisted in building ETL pipelines on GCP and contributed to statistical analysis using Python (SciPy, NumPy), supporting a 20% improvement in operational efficiency.
Tech stack: Python, SQL, BigQuery, Power BI, Tableau, Looker, Excel

Graduate Teaching Assistant

August 2023 - May 2024

Computer Science Department, George Washington University

Supervised classes in Cloud Computing, Database Systems and Projects with more than 60 students: conducted lab sessions on Oracle, graded assignments, and held office hours. Assisted with creating new course material and evaluated AWS lab assignments and quizzes.
Tech stack: AWS, Google Cloud Platform, Oracle, Relational Database

Graduate Research Assistant

May 2023 - August 2023

Prof. Gina Adam - Adam Group, George Washington University

Developed a Python-based mental health support chatbot using NLP techniques such as sentiment analysis. Independently built the conversational AI with TensorFlow, focusing on high-quality real-time assistance for over 1,000 users monthly with improved problem-solving accuracy.
Tech stack: NLP, Deep Learning, Artificial Intelligence, Python, Machine Learning

Data Scientist

September 2021 - July 2022

CDP India Pvt. Ltd.

Implemented ARIMA for time-series forecasting of railway traffic, reducing data processing time to 84%. Led the development of a data pipeline for Mumbai’s railway sector, maintaining a system that managed over 10 TB of data daily for 1M+ commuters using AWS and Kafka and improving real-time data analysis. Leveraged Power BI for dashboard creation and conducted exploratory data analysis to uncover insights. Applied agile methodologies to facilitate collaboration, prioritize tasks, and deliver data-driven solutions.
Tech stack: AWS, Python, Spark, Kafka, Machine Learning, Power BI, Time-Series Data

Machine Learning Intern

August 2021 - October 2021

Kubix Square

Built and fine-tuned a machine learning model, implementing six machine learning algorithms (Gaussian Naïve Bayes, SVM, CNN, KNN, Linear Regression, and Random Forest) to test and improve the accuracy of sales predictions.
Tech stack: Python, Machine Learning, TensorFlow, Keras, PyTorch, Pandas, NumPy, Data Visualization, Feature Extraction

Data Scientist & Business Analyst Intern

February 2021 - April 2021

Sparks Foundation

Implemented predictive analytics on real-world banking data and applied computational methods, increasing prediction accuracy to 84.5%.
Tech stack: Python, Google Cloud Platform, SQL, RDBMS, Machine Learning

Data Scientist Intern

January 2020 - December 2020

Dalal Finserve Pvt Ltd

Engineered a database for 750+ customers, enabling loan credibility analysis and predictive modelling with 88% accuracy through feature engineering and hyperparameter optimization. Conducted statistical analysis and cleaned, preprocessed, and analyzed the data, resulting in improved data accuracy and model performance. Worked with cross-functional teams to develop dashboards and visualizations using Tableau and Power BI to communicate data insights and key performance metrics.
Tech stack: MySQL, SQL, Python, Microsoft Azure, Tableau, Machine Learning

Education

Master's Degree

August 2022 - May 2024

George Washington University, Washington DC

Completed Master’s in Computer Science (May 2024) at George Washington University with a focus on Data Analysis and Machine Learning; GPA: 3.91. Coursework includes Big Data & Analytics, Data Mining, Advanced Software Paradigms, Algorithms, Applied Machine Learning, Information Retrieval Systems.

Bachelor's Degree

August 2018 - May 2022

University of Mumbai, India

Completed Bachelor of Engineering in Information Technology (May 2022) at St. Francis Institute of Technology, University of Mumbai; GPA: 3.6. Coursework included Software Engineering, Cloud Computing, Artificial Intelligence, Business Intelligence, Python, SQL, Java, and UI/UX, which helped me develop the skills required to work in industry.

Portfolio

Check Out Some of My Work.

As an aspiring Data Scientist and Machine Learning enthusiast, I am driven to leverage the power of data to extract valuable insights and develop intelligent solutions. With a solid foundation in data analytics and machine learning, I have spent two years in the industry honing my skills in Python and utilizing popular libraries and frameworks such as NumPy, pandas, scikit-learn, and TensorFlow. From exploratory data analysis to building predictive models, I am passionate about uncovering patterns and trends to drive data-informed decision-making. Keeping pace with advancements in the field, I am eager to explore new techniques and architectures in data science and contribute to innovative projects. With a keen eye for detail and a collaborative mindset, I am committed to delivering impactful results and am excited to bring my expertise to new opportunities in the field of data science and machine learning.

Computer Science Projects

Taxi Fare Prediction

Big Data Analytics -- GitHub Link

Deployed a pipeline on AWS to train a model on 75 million rows of data. Employed CodeBuild for dataset operations, SageMaker for model construction, and CloudWatch for continuous performance monitoring, all orchestrated through an S3 bucket for efficient data flow.

Clinical Dashboard in R

R Programming

Developed an R Shiny application to visualize patient health trends and clinical study metrics, enabling real-time monitoring for researchers. Built custom R functions and packages to automate data cleaning, transformation, and statistical analysis for structured and unstructured datasets.

Email Phishing Attack Detection

Machine Learning

Developed deep neural network models (LSTM, RNN, CNN) for email phishing detection, surpassing baseline performance and demonstrating the superiority of the CNN model through comparative analysis on the Enron dataset. The resulting research paper was accepted at HCII 2024.

Email Sender System

Software Engineering -- GitHub Link

Built a web application with React, Node.js, MySQL, and REST API, allowing users to compose and send emails conveniently while storing message data in a MySQL database.

Microprocessor System

Computer System Architecture -- GitHub Link

Developed a microprocessor system design using OpenJDK 18 and Maven. Used Swing for the GUI and tested with Maven.

GW Meals

Database Management System

Developed an application in Oracle APEX for the GW Meal Delivery System. Created Chen-style ER diagrams, relational schemas, SQL scripts, and the frontend of the application.

Personality Predictors

Machine Learning -- GitHub Link

Built an ML-based project to predict a person's personality using the Myers-Briggs Type Indicator and published a paper on it at the IEEE 2nd Asian Conference on Innovation in Technology. Classified 8,000 rows of data into 16 personality types and built a BERT base model using Python. Implemented real-world use cases, applied cross-entropy techniques to refine the models’ accuracy, and predicted personalities using actual tweets from Twitter.

Published Paper Link

Skin Care Website

Web Development -- GitHub Link

Independently developed a full-fledged skincare e-commerce website from scratch, encompassing both frontend and backend components. Using HTML, CSS, PHP, JavaScript, JSON, and XML, crafted a visually appealing interface and connected it to a MySQL database via phpMyAdmin for efficient data management.

Smart Water Level Detection System using IEEE 802.15

Internet of Everything

Built a water level detector using an Arduino UNO, IEEE 802.15, sensors, and actuators. Configured the system to cut off the water supply automatically once the tank is full.

  • 25 Projects Completed
  • 90 Sleepless Nights
  • 4 Internships Completed
  • 120 Crazy Ideas
  • 1500 Coffee Cups
  • 2000 Hours

Contact

I'd Love To Hear From You.

Call Me At

Phone: (+1) 202 569 9548