Sanjay Sekar
From India 07:13 PM (GMT+05:30)
$24/hr or $30,000/yr

Active over a week ago


Member since May 2026

Data Scientist

Available for hire
Years of experience
4+ years
Experience level
Mid-level
Available for
Full-time
Available from
23 Jun 2026

Data professional with close to 4 years of experience in data engineering and machine learning, building scalable ETL pipelines, analytics platforms, and ML-ready datasets in enterprise environments. Proficient in Python, SQL, Informatica PowerCenter, AWS, and Tableau for developing data pipelines and analytics solutions. Hands-on experience in regression, predictive modeling, ensemble tree models, anomaly detection, and LLM-powered Retrieval-Augmented Generation (RAG) systems using embeddings and FAISS vector databases. Recognized as Value Champion of the Quarter for delivering scalable, high-quality data solutions.

Employment History

Data Scientist at Prodapt Solutions 2022 - 2026
• Designed, developed, and optimized enterprise-scale ETL pipelines using Informatica PowerCenter on AWS EC2 to support Business Intelligence (BI) and data warehouse analytics, processing large-scale Fact and Dimension datasets with robust validation, transformation logic, and data quality checks.
• Managed AWS data infrastructure (EC2, S3, Glue, Redshift) to support scalable data ingestion, transformation, and storage for analytics and machine learning workflows.
• Performed advanced analytical querying using PostgreSQL (joins, aggregations, window functions) for data validation, reconciliation, and resolving data discrepancies to ensure data accuracy across datasets.
• Engineered ML-ready datasets through data preparation, exploratory data analysis (EDA), feature engineering, and statistical analysis to support predictive analytics and revenue forecasting use cases.
• Developed predictive models using Python (Random Forest) to identify missing files and implemented anomaly detection models using Isolation Forest and SQL to detect data spikes, deviations, and inconsistencies across ODS, Fact, and Dimension layers.
• Built an LLM-powered Retrieval-Augmented Generation (RAG) chatbot for enterprise document Q&A using document ingestion pipelines, chunking strategies, embeddings, FAISS vector database, and prompt engineering for semantic retrieval.
• Applied Python libraries (NumPy, Pandas, SciPy) for data manipulation, preprocessing, and exploratory analysis across large datasets.
• Evaluated classification model performance using metrics such as confusion matrix, ROC-AUC, and precision/recall.
• Designed and published 10+ interactive dashboards using Tableau and Excel, leveraging dynamic parameters and user-driven filters to visualize trends, predictive insights, and anomalies for business stakeholders.
• Applied business rules and statistical logic to generate KPI-driven analytical outputs, supporting operational monitoring and strategic decision-making.
• Collaborated with data engineers, analysts, and stakeholders to deliver scalable, production-ready data solutions and communicate insights to both technical and non-technical audiences.

Education

Bachelor of Vocational Information Technology at Bishop Heber College 2019 - 2022