Data Scientist at Prodapt Solutions, 2022 - 2026
• Designed, developed, and optimized enterprise-scale ETL pipelines using Informatica PowerCenter on AWS EC2 to support Business Intelligence (BI) and data warehouse analytics, processing large-scale Fact and Dimension datasets with robust validation, transformation logic, and data quality checks.
• Managed AWS data infrastructure (EC2, S3, Glue, Redshift) to support scalable data ingestion, transformation, and storage for analytics and machine learning workflows.
• Wrote advanced analytical queries in PostgreSQL (joins, aggregations, window functions) to validate and reconcile data and resolve discrepancies across datasets.
• Engineered ML-ready datasets through data preparation, exploratory data analysis (EDA), feature engineering, and statistical analysis to support predictive analytics and revenue forecasting use cases.
• Developed predictive models using Python (Random Forest) to identify missing files and implemented anomaly detection models using Isolation Forest and SQL to detect data spikes, deviations, and inconsistencies across ODS, Fact, and Dimension layers.
• Built an LLM-powered Retrieval-Augmented Generation (RAG) chatbot for enterprise document Q&A using document ingestion pipelines, chunking strategies, embeddings, a FAISS vector index, and prompt engineering for semantic retrieval.
• Applied Python libraries (NumPy, Pandas, SciPy) for data manipulation, preprocessing, and exploratory analysis across large datasets.
• Evaluated classification model performance using confusion matrices, ROC-AUC, and precision/recall.
• Designed and published 10+ interactive dashboards using Tableau and Excel, leveraging dynamic parameters and user-driven filters to visualize trends, predictive insights, and anomalies for business stakeholders.
• Applied business rules and statistical logic to generate KPI-driven analytical outputs, supporting operational monitoring and strategic decision-making.
• Collaborated with data engineers, analysts, and stakeholders to deliver scalable, production-ready data solutions and communicate insights to both technical and non-technical audiences.