Shristi Shrestha
Passionate about the entire data lifecycle, from engineering robust software to turning data into impactful insights that tackle real-world challenges.
About
I'm a data enthusiast with a passion for solving real-world problems through data and software. With hands-on experience across data science, analytics, and visualization, I love translating complex datasets into actionable insights.
Data Enthusiast | Passionate About Turning Data into Impactful Solutions
- Languages: Python, R, SQL, Bash, SAS, Java, HTML
- Databases: MSSQL, MySQL, Oracle, Snowflake
- Methodologies/Concepts: Agile, Statistical Analysis, Hypothesis Testing, Regression Analysis, Machine and Deep Learning
- Tools & Technologies: RStudio, VS Code, PyCharm, Jupyter, Toad, Kibana, Grafana, Tableau, Excel, Matplotlib, JIRA, Git, Bitbucket, Maven, Gradle, Jenkins, NumPy, Pandas, scikit-learn
Experience
- Worked on the IBM Cloud Object Storage (COS) System Integration team to develop Python automation test scripts for Cloud Storage solutions.
- Collaborated with cross-functional teams to investigate software defects across the IBM COS application.
- Proactively adapted to shifting priorities and effectively managed multiple competing projects, consistently delivering high-quality results on time.
- Participated in peer code reviews of GitHub PRs and documented application features in the Confluence wiki.
- Actively engaged in Agile practices such as stand-ups, sprint planning, backlog grooming, and sprint retrospectives to ensure high-quality, on-time sprint deliverables.
- Worked in an evolving power market domain with complex business logic, delivering robust, high-quality energy data to customers in real time.
- As a member of the Big Data team, designed and executed Oracle PL/SQL and Snowflake scripts to verify complex backend business logic and ensure data accuracy in the Data Cloud and the AWS Data Lake, respectively.
- Designed and developed Python automation scripts using the multiprocessing library to simulate multiple concurrent requests for testing system performance and subscription-tier throttling limits (a simplified sketch follows this list).
- Collaborated across the full SDLC, from gathering business requirements and analyzing user stories to designing and documenting test cases and results for use-case scenarios.
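A minimal Python sketch of the concurrent-request simulation described above; the endpoint, request count, and throttling check are hypothetical placeholders rather than the actual test code:

```python
from multiprocessing import Pool
import time

import requests  # assumes the system under test exposes an HTTP API

ENDPOINT = "https://api.example.com/v1/objects"  # hypothetical endpoint
CONCURRENT_REQUESTS = 50                         # hypothetical load level

def send_request(i):
    """Issue one request and report its HTTP status and latency."""
    start = time.time()
    resp = requests.get(ENDPOINT, params={"request_id": i}, timeout=30)
    return resp.status_code, time.time() - start

if __name__ == "__main__":
    # Fire the requests concurrently from a pool of worker processes.
    with Pool(processes=CONCURRENT_REQUESTS) as pool:
        results = pool.map(send_request, range(CONCURRENT_REQUESTS))
    throttled = sum(1 for status, _ in results if status == 429)
    print(f"{throttled}/{len(results)} requests throttled (HTTP 429)")
```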
Projects
Investigating Wildfire Severity using Data Mining and Machine Learning Classification Models
- Technologies and tools used: Python, Pandas, Google Colab, RapidMiner, scikit-learn, imbalanced-learn, NumPy, Leaflet, Jupyter Notebook.
- Followed the KDD process and performed correlation analysis to identify features relevant to wildfire spread and to classify wildfire size.
- Addressed multi-class classification with imbalanced class distributions.
- Designed several machine learning classification models and compared them using performance metrics such as precision, recall, and the confusion matrix.
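A minimal sketch of the model-comparison step; the dataset path, the resampler, and the two example models are illustrative rather than the project's exact pipeline:

```python
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

df = pd.read_csv("wildfires.csv")  # hypothetical prepared dataset
X, y = df.drop(columns=["fire_size_class"]), df["fire_size_class"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Oversample the minority fire-size classes before training.
X_train, y_train = SMOTE(random_state=42).fit_resample(X_train, y_train)

for model in (LogisticRegression(max_iter=1000), RandomForestClassifier()):
    preds = model.fit(X_train, y_train).predict(X_test)
    print(type(model).__name__)
    print(confusion_matrix(y_test, preds))
    print(classification_report(y_test, preds))  # per-class precision and recall
```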
A smart data analytics pipeline and visualization framework for IoT sensor data.
- Software-based technologies and tools used: AWS (IoT Core, IAM, DynamoDB, EC2, Lambda, EMR, CloudWatch, S3, SNS), MySQL, HTML, CSS, JavaScript, AJAX, Leaflet, Linux, MATLAB.
- Hardware-based technologies and tools used: Waspmote IDE, Waspmotes, Meshlium, C++, MQTT.
- Established AWS cloud infrastructure to analyze the incoming sensor data and visualized it in a web dashboard.
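A hypothetical sketch of the ingestion step, assuming an AWS IoT Core rule invokes a Lambda function that persists each sensor payload to DynamoDB; the table name and payload fields are placeholders:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("SensorReadings")  # hypothetical table name

def lambda_handler(event, context):
    """Triggered by an IoT Core rule with one sensor payload per invocation."""
    table.put_item(Item={
        "sensor_id": event["sensor_id"],            # assumed partition key
        "timestamp": event["timestamp"],            # assumed sort key
        "temperature": str(event["temperature"]),   # stored as strings to avoid
        "humidity": str(event["humidity"]),         # DynamoDB's Decimal handling
    })
    return {"statusCode": 200}
```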
A model comparison for named entity recognition (NER) on unstructured text.
Using data science to derive insights from MPD Public Safety data for strategic resource deployment.
- Technologies and tools used: Python, Pandas, scikit-learn, NumPy, Matplotlib, Jupyter Notebook.
- Executed the full data science lifecycle, from data cleaning and EDA to model development.
- Cleaned and preprocessed a large-scale dataset of over 640,000 crime incident records.
- Developed a multinomial logistic regression model to predict crime severity.
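A minimal sketch of the severity model, with the cleaned dataset path, feature columns, and label name assumed for illustration:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

incidents = pd.read_csv("mpd_incidents_clean.csv")  # hypothetical cleaned dataset
X, y = incidents.drop(columns=["severity"]), incidents["severity"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

scaler = StandardScaler().fit(X_train)
# The default lbfgs solver fits a multinomial model across the severity classes.
model = LogisticRegression(max_iter=1000)
model.fit(scaler.transform(X_train), y_train)

print(classification_report(y_test, model.predict(scaler.transform(X_test))))
```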
Predictive modeling of house prices using linear regression and regularization.
- Technologies and tools used: R, RStudio.
- Performed data preprocessing and exploratory data analysis (EDA) to understand variable relationships.
- Implemented various linear regression models, including subset selection, polynomial regression, and regularization techniques (Ridge, Lasso, Elastic Net).
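The project itself was implemented in R; the sketch below shows an equivalent regularization comparison in Python (kept consistent with the other examples on this page), with the dataset and columns assumed:

```python
import pandas as pd
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

houses = pd.read_csv("houses.csv")  # hypothetical dataset
X, y = houses.drop(columns=["price"]), houses["price"]

for model in (Ridge(alpha=1.0), Lasso(alpha=0.1), ElasticNet(alpha=0.1, l1_ratio=0.5)):
    pipe = make_pipeline(StandardScaler(), model)  # scale features, then fit
    rmse = -cross_val_score(pipe, X, y, cv=5,
                            scoring="neg_root_mean_squared_error").mean()
    print(f"{type(model).__name__}: 5-fold CV RMSE = {rmse:,.0f}")
```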
A Tableau-based analysis exploring key drivers of churn and retention strategies.
Skills
Languages and Databases
Python
SQL
R
Java
HTML5
Shell Scripting
Snowflake
Libraries
Pandas
NumPy
Matplotlib
scikit-learn
OpenCV
Other
Tableau
Jira
Education
Memphis, Tennessee
Degree: Master of Science in Data Science
CGPA: 4.0/4.0
Relevant Coursework:
- Advanced Statistical Learning I & II
- Fundamentals of Data Science
- Machine Learning
- Advanced Database Systems
- Biostatistical Methods I & II
- Image Processing
Ruston, Louisiana
Degree: Master of Science in Computer Science
CGPA: 4.0/4.0
Relevant Coursework:
- Database Management Systems
- Data Mining and Knowledge Discovery
- Distributed and Cloud Computing
- Advanced Software Engineering
- Advanced Analysis of Algorithms and Complexity
- Advanced Computer Architectures
- Computer Networks

