{ status: "open_to_hire", roles: ["DS", "AI Eng", "MLE"] }

Rohit Ananthan

Data Scientist & AI Engineer

Model trained on 4+ years of real-world data — converging on AI/ML, product analytics & GenAI problems with GCP, AWS, LLMs & real-time inference pipelines. val_loss → 0.0000 ✓

4+ epochs (yrs)
GPT-4o backbone LLM
60%↓ ETL latency
40%↓ review time
01.

About Me

I am a Data Scientist with 4+ years of experience building production ML systems across e-commerce, non-profit, and enterprise domains. I specialize in LLM-powered applications, product analytics, real-time anomaly detection, and scalable data engineering pipelines using Python, PySpark, GCP, AWS, and Snowflake.

My background spans Mechanical Engineering (B.Tech, IIITDM Kancheepuram) and Information Systems (MS, University of Maryland, Robert H. Smith School of Business), so I bring both engineering rigor and business context to every data problem.

Python · GCP Vertex AI · LLMs / RAG · MLOps · SQL · Tableau
📍
United States
Open to Remote & Hybrid Roles
Available
02.

Experience

Data Scientist Consultant

Current

Invision Global Tech Inc

Full-time · United States

Feb 2026 – Present
2 mos
ML · Python · Data Science · Consulting

Data Scientist

Community Dreams Foundation

Full-time · Remote

Feb 2025 – Feb 2026
1 yr
  • Built an AI-powered Legal & Compliance Assistant using GPT-4o with a RAG pipeline on LangChain, Pinecone, and ANN search — cutting manual review time by 40%
  • Deployed end-to-end ML pipelines on GCP Vertex AI and Dataproc with automated hyperparameter tuning via MLflow — improving demand forecasting accuracy by 15%
  • Implemented MLOps workflows using Vertex AI, Cloud Build, and GitHub Actions for automated model versioning, drift detection, and CI/CD across environments
  • Built a real-time fraud and anomaly detection system using Pub/Sub, Dataflow (Apache Beam), and XGBoost — reducing undetected fraud by 20%
  • Engineered scalable data pipelines with Dataflow, Composer (Airflow), and BigQuery — cutting ETL latency by 60%
  • Developed LLM-based apps for document summarization and Q&A using OpenAI APIs and Vertex Matching Engine for semantic search
  • Created predictive donor churn models with TensorFlow and Scikit-learn, deployed on Vertex AI with Looker Studio dashboards
GPT-4o · LangChain · GCP · Vertex AI · XGBoost · TensorFlow · BigQuery · Airflow · MLflow
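To make the drift-detection bullet concrete, here is a minimal sketch of one common drift check, the Population Stability Index (PSI), in plain Python. It is an illustration under stated assumptions, not the production Vertex AI setup: the `psi` helper, the equal-width binning, the sample data, and the 0.25 alert threshold (a frequently quoted rule of thumb) are all assumptions introduced here.

```python
import math

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a baseline sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp values outside the baseline range into the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

# Hypothetical feature values: a training-time baseline and a shifted live sample.
baseline = [i / 100 for i in range(100)]
shifted = [x + 0.5 for x in baseline]

no_drift = psi(baseline, baseline)
drifted = psi(baseline, shifted) > 0.25  # assumed alert threshold
```

A check like this would typically run per feature on a schedule, with a breach triggering review or retraining rather than an automatic rollback.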

Financial Analyst

The Premiere Group

Full-time · Columbia, MO · On-site

Sep 2025 – Oct 2025
2 mos
Financial Analysis · Data Analytics

Technical Consultant – Course Renewal Automation

University of Maryland – Extended Studies

Internship · College Park, MD · Remote

Jan 2024 – Dec 2024
1 yr
Salesforce · Automation · Microsoft Teams Planner

Graduate Assistant

University of Maryland

Part-time · College Park, MD · On-site

Jan 2024 – Dec 2024
1 yr
Data Digitization · CMS · Research

Data Scientist

Kameleon Technologies

Full-time · Chennai, India

May 2021 – Jul 2023
2 yrs 2 mos
  • Engineered a real-time fraud detection pipeline using Neo4j graph database and XGBoost — processing 5M+ daily transactions with an 18% reduction in false positives
  • Built scalable ETL/ELT pipelines on PySpark and AWS (S3, Glue, EMR) to process terabytes of financial data — reducing pipeline processing time by 35%
  • Developed customer segmentation and churn prediction models using ensemble methods — driving a 12–15% improvement in customer retention across key segments
  • Established MLOps practices with MLflow experiment tracking, SageMaker model registry, and automated retraining workflows for production model governance
  • Delivered executive-facing Tableau dashboards for transaction monitoring, KPI tracking, and fraud trend analysis — adopted across operations and risk teams
Neo4j · XGBoost · PySpark · AWS · SageMaker · MLflow · Tableau · Fraud Detection
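The graph-plus-classifier pattern in the first bullet can be sketched as follows: relationships between entities are turned into per-account features that a tabular model could consume. This is a simplified illustration only. The shared-device relationship, the `shared_device_degree` helper, and the sample records are hypothetical, and neither Neo4j nor XGBoost is used here.

```python
from collections import defaultdict

def shared_device_degree(transactions):
    """For each account, count the distinct other accounts seen on the same device."""
    accounts_by_device = defaultdict(set)
    for account, device in transactions:
        accounts_by_device[device].add(account)

    neighbours = defaultdict(set)
    for accounts in accounts_by_device.values():
        for account in accounts:
            # Every other account on this device is a graph neighbour.
            neighbours[account] |= accounts - {account}
    return {account: len(linked) for account, linked in neighbours.items()}

# Hypothetical (account_id, device_id) pairs, invented for illustration.
transactions = [
    ("acct_a", "dev_1"),
    ("acct_b", "dev_1"),
    ("acct_c", "dev_1"),
    ("acct_c", "dev_2"),
    ("acct_e", "dev_2"),
    ("acct_d", "dev_3"),
]
features = shared_device_degree(transactions)
```

An account linked to many others through shared devices gets a high degree, which is one example of a graph signal that could sit next to transaction-level columns in a classifier's feature matrix.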
03.

Skills

Languages & Libraries

Python · SQL · R · PySpark · MATLAB · Pandas · NumPy · PyTorch · TensorFlow · Scikit-learn
🧠

ML & AI

Machine Learning · Deep Learning · NLP · LLMs · GPT-4o · RAG · XGBoost · GenAI · A/B Testing · Causal Inference
🔧

Data Engineering

ETL / ELT · Apache Airflow · Apache Beam · Dataflow · Pub/Sub · DBT · MLflow · CI/CD · GitHub Actions
☁️

Cloud & Infra

GCP Vertex AI · AWS SageMaker · Azure · BigQuery · Snowflake · Dataproc · Cloud Build · Docker
📊

BI & Visualization

Tableau · Power BI · Looker Studio · Data Storytelling
🗄️

Databases & Search

Neo4j · Pinecone · LangChain · LlamaIndex · ANN Search · Vector DBs
04.

Projects

🎙️

AI Voice FAQ Assistant

Conversational FAQ system powered by Google Gemini Pro with a full RAG pipeline — enabling semantic search over a product knowledge base with real-time voice interaction.

RAG pipeline · Semantic search · Voice interface
Gemini Pro · RAG · LlamaIndex · Pinecone · FastAPI · NLP
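The retrieval half of a RAG pipeline like this one can be sketched in a few lines. The version below is a toy under stated assumptions: word-count vectors stand in for a real embedding model, an exact cosine ranking stands in for a vector database such as Pinecone, and no LLM is called. The helper names and the FAQ entries are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=2):
    """Rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """Assemble the retrieved passages and the question into one prompt string."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical knowledge-base entries, invented for illustration.
faq = [
    "Refunds are issued within 5 business days of receiving the return.",
    "Standard shipping takes 3 to 7 business days.",
    "Passwords can be reset from the account settings page.",
]
question = "How long does shipping take?"
top = retrieve(question, faq, k=1)
prompt = build_prompt(question, top)
```

In a full system the resulting `prompt` would be sent to the language model, and a voice front end would handle speech-to-text before retrieval and text-to-speech after generation.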
🛡️

Real-time Fraud Detection System

Production-grade fraud detection engine using Neo4j graph relationships and XGBoost — processing 5M+ daily transactions with 18% reduction in false positives.

5M+ daily txns · 18% fewer false positives · Graph-based detection
Neo4j · XGBoost · PySpark · AWS · Graph ML · Streaming
💊

Gym Aesthetic Trap

NLP research project using LDA topic modeling to analyze online discourse around SARMs and steroid usage — uncovering themes, risk perception patterns, and community sentiment from bodybuilding forums.

LDA topic modeling · Community sentiment · Forum analysis
NLP · LDA · Python · Topic Modeling · Scikit-learn · Reddit
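For readers unfamiliar with LDA, the sketch below shows the core idea with a from-scratch collapsed Gibbs sampler in plain Python. It is not the project's implementation (the tags above point to Scikit-learn). The `lda_gibbs` function, its hyperparameter defaults, and the tokenised sample posts are assumptions made for illustration.

```python
import random

def lda_gibbs(docs, num_topics, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenised documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]       # tokens per topic, per document
    topic_word = [dict() for _ in range(num_topics)]   # word counts per topic
    topic_total = [0] * num_topics                     # tokens per topic overall
    assignments = []

    def bump(d, w, z, delta):
        doc_topic[d][z] += delta
        topic_word[z][w] = topic_word[z].get(w, 0) + delta
        topic_total[z] += delta

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(num_topics) for _ in doc]
        for w, z in zip(doc, zs):
            bump(d, w, z, +1)
        assignments.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                bump(d, w, assignments[d][i], -1)  # remove the token's current assignment
                # Weight each topic by how much the document and the word favour it.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k].get(w, 0) + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                bump(d, w, z, +1)
    return doc_topic, topic_word

# Hypothetical tokenised forum posts, invented for illustration.
posts = [
    ["dose", "cycle", "dose", "stack"],
    ["risk", "liver", "bloodwork", "risk"],
    ["cycle", "stack", "dose"],
    ["liver", "risk", "doctor"],
]
doc_topic, topic_word = lda_gibbs(posts, num_topics=2)
```

Each row of `doc_topic` gives a post's token counts per topic, and sorting a topic's entries in `topic_word` by count yields its most characteristic words. On a real corpus the topic count and priors would need tuning, for example against a coherence score.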
📐

Thermal Error ML Modeling

Published Springer research on machine learning compensation strategies for thermal deformation in precision machine tools — achieving state-of-the-art accuracy in error prediction.

Springer publication · Oct 2022 · Precision manufacturing
Machine Learning · MATLAB · Regression · Manufacturing · Springer
05.

Education

UMD

Master of Science – Information Systems

University of Maryland

Robert H. Smith School of Business

📍 College Park, MD

GCP · Requirements Gathering · ML · Business Intelligence
IIITDM

Bachelor of Technology – Mechanical Engineering (Smart Manufacturing)

IIITDM Kancheepuram

Indian Institute of Information Technology Design & Manufacturing

📍 Kancheepuram, India

Python · MATLAB · Machine Learning · Smart Manufacturing · Research
06.

Certifications

🔷

Neo4j Graph Data Science Certification

Neo4j

Issued Feb 2025
☁️

AWS Certified AI Practitioner (AIF-C01)

Amazon Web Services

Issued 2024
📊

BCG Data Science Job Simulation

Boston Consulting Group × Forage

Issued Feb 2025
07.

Publications

Mathematical Modeling of Thermal Error Using Machine Learning

Springer · Oct 6, 2022

Research on thermal error modeling in machine tools using machine learning algorithms to identify the most effective compensation strategies for linear expansion and deformation caused by heat inputs from internal and external sources.

Machine Learning · Thermal Modeling · Manufacturing · Springer
08.

Contact

$model.deploy(env="production", candidate="rohit")

Ready to deploy?
Let's ship something intelligent.

Training complete — now seeking inference in the real world. Open to Data Scientist, AI Engineer, and ML Engineer roles. Whether you have a full-time opportunity or just want to talk about model architectures — my context window is open.
