Shristi Shrestha
Passionate about the entire data lifecycle, from engineering robust software solutions to deploying machine learning models that tackle real-world challenges.
About
I am a results-driven Data Scientist and Software Engineer with a passion for transforming complex data into actionable insights. I thrive at the intersection of machine learning, cloud computing, and robust software development. My professional experience at industry leaders like IBM has equipped me with a strong background in developing and testing scalable, data-intensive applications.
Currently, I'm pursuing my Master's in Data Science at the University of Memphis, where I'm diving deep into advanced statistical learning and deep learning. I'm on a mission to leverage technology to solve real-world challenges and build intelligent, data-driven applications.
- Languages: Python, R, SQL, Bash, SAS, Java, HTML
- Databases: MSSQL, MySQL, Oracle, Snowflake
- Methodologies/Concepts: Agile, Statistical Analysis, Hypothesis Testing, Regression Analysis, Machine Learning, Deep Learning
- Tools & Technologies: RStudio, VS Code, PyCharm, Jupyter, Toad, Kibana, Grafana, Tableau, Excel, Matplotlib, JIRA, Git, Bitbucket, Maven, Gradle, Jenkins
I am looking for a challenging position that combines my skills in Data Science and Software Engineering and offers professional development, interesting experiences, and personal growth.
Experience
- Worked on the IBM Cloud Object Storage (COS) System Integration team, developing Python automation test scripts for cloud storage solutions.
- Collaborated with cross-functional teams to investigate software defects across the IBM COS application.
- Proactively adapted to shifting priorities and effectively managed multiple competing projects, consistently delivering high-quality results on time.
- Participated in peer code reviews of GitHub PRs and documented application features in the Confluence wiki.
- Actively engaged in Agile practices such as stand-ups, sprint planning, backlog grooming, and sprint retrospectives to ensure quality and timely Sprint deliverables.
- Worked in an evolving power market domain with complex business logic, delivering robust, high-quality energy data to customers in real time.
- As a member of the Big Data team, designed and executed Oracle PL/SQL and Snowflake scripts to verify complex backend business logic and ensure data accuracy in the Data Cloud and the AWS Data Lake, respectively.
- Designed and developed Python automation scripts using a multiprocessing library to simulate multiple concurrent requests for testing system performance and subscription-tier throttling limits.
- Collaborated across the full SDLC, from collecting business requirements and analyzing user stories to designing and documenting test cases for use-case scenarios and recording test results.
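
A minimal sketch of the kind of concurrent-request throttling test mentioned above. It is illustrative only, not the original test code. It assumes Python's standard `multiprocessing` package and uses its thread-backed `ThreadPool` so the example runs anywhere; `FakeThrottledService`, `run_burst`, and the limit of 5 are hypothetical stand-ins for a real rate-limited endpoint and its subscription tier.

```python
import threading
from multiprocessing.pool import ThreadPool


class FakeThrottledService:
    """Hypothetical in-process stand-in for an API that accepts at most `limit` requests."""

    def __init__(self, limit):
        self.limit = limit
        self._accepted = 0
        self._lock = threading.Lock()

    def handle(self, request_id):
        # Return HTTP-style status codes: 200 while under the limit, 429 once throttled.
        with self._lock:
            if self._accepted < self.limit:
                self._accepted += 1
                return 200
            return 429


def run_burst(service, n_requests, workers=8):
    """Send n_requests concurrently and count accepted vs. throttled responses."""
    with ThreadPool(workers) as pool:
        statuses = pool.map(service.handle, range(n_requests))
    return statuses.count(200), statuses.count(429)


accepted, throttled = run_burst(FakeThrottledService(limit=5), n_requests=20)
print(f"accepted={accepted} throttled={throttled}")
```

A real test would replace `service.handle` with an HTTP call and assert that the accepted count matches the limit configured for the tier under test.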
Projects

Investigating Wildfire Severity using Data Mining and Machine Learning Classification Models
- Technologies and tools used: Python, Pandas, Google Colab, RapidMiner, sklearn, imblearn, numpy, Leaflet, Jupyter Notebook.
- Followed the KDD process and performed correlation analysis to identify features associated with wildfire spread and to classify wildfire sizes.
- Handled multi-class classification with imbalanced classes.
- Designed various machine learning classification models and compared them using performance metrics such as precision, recall, and the confusion matrix.
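
As a small illustration of why per-class precision and recall matter on imbalanced multi-class data, the sketch below derives both from confusion-matrix counts in plain Python. The labels and predictions are made-up toy values, not project data, and `per_class_metrics` is a hypothetical helper rather than a function from sklearn or imblearn.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall)} computed from confusion-matrix counts."""
    counts = Counter(zip(y_true, y_pred))  # (true label, predicted label) -> count
    labels = sorted(set(y_true) | set(y_pred))
    metrics = {}
    for label in labels:
        tp = counts[(label, label)]
        predicted = sum(counts[(t, label)] for t in labels)  # column total
        actual = sum(counts[(label, p)] for p in labels)     # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        metrics[label] = (precision, recall)
    return metrics


# Toy, deliberately imbalanced data: six "small", three "medium", one "large".
y_true = ["small"] * 6 + ["medium"] * 3 + ["large"]
y_pred = ["small"] * 5 + ["medium"] + ["medium", "medium", "small"] + ["small"]

metrics = per_class_metrics(y_true, y_pred)
for label, (precision, recall) in metrics.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
```

In this toy case overall accuracy is 70%, yet the rare "large" class has zero recall, which is the kind of gap that accuracy alone hides.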

A Smart IoT Data Analytics pipeline and Visualization framework for IoT sensor data.
- Software-based technologies and tools used: AWS (IoT Core, IAM, DynamoDB, EC2, Lambda, EMR, CloudWatch, S3, SNS), MySQL, HTML, CSS, JavaScript, AJAX, Leaflet, Linux, MATLAB.
- Hardware-based technologies and tools used: Waspmote IDE, Waspmotes, Meshlium, C++, MQTT.
- Established AWS cloud infrastructure to analyze the IoT sensor data and visualized it in a web dashboard.

A comparison of named entity recognition (NER) models for classifying named entities in unstructured text.

Using data science to derive insights from MPD Public Safety data for strategic resource deployment.
- Technologies and tools used: Python, Pandas, sklearn, NumPy, Matplotlib, Jupyter Notebook.
- Executed the full data science lifecycle, from data cleaning and EDA to model development.
- Cleaned and preprocessed a large-scale dataset of over 640,000 crime incident records.
- Developed Decision Tree and Random Forest models to predict crime severity.

Predictive models using linear regression and regularization for predicting house prices.
- Technologies and tools used: R, RStudio.
- Performed data preprocessing and exploratory data analysis (EDA) to understand variable relationships.
- Implemented various linear regression models, including subset selection, polynomial regression, and regularization techniques (Ridge, Lasso, Elastic Net).
Education
University of Memphis, Memphis, Tennessee
Degree: Master of Science in Data Science
CGPA: 4.0/4.0
Relevant Coursework:
- Advanced Statistical Learning I & II
- Fundamentals of Data Science
- Machine Learning
- Advanced Database Systems
- Biostatistical Methods I & II
- Image Processing
Ruston, Louisiana
Degree: Master of Science in Computer Science
CGPA: 4.0/4.0
Relevant Coursework:
- Database Management Systems
- Data Mining and Knowledge Discovery
- Distributed and Cloud Computing