Research | Peter Shamoun

Data Science Alliance Research

Fall 2024 - Present

Machine Learning Spatial Analysis Modeling Python

Overview

I am currently working with the Data Science Alliance and a group of undergraduate HDSI students to build a model to optimize the placement of food banks in San Diego County.

Research Goals

Create an interactive dashboard that helps stakeholders optimize food bank placement through data-driven decision-making
Develop a model incorporating demographics, income, and location data
Build a scalable, open-source solution for wider adoption
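The methodology for this project has not been published yet, so the following is not the team's model. It is a minimal sketch of one common way to frame a placement problem: greedy maximum coverage, where each step picks the candidate site that serves the most not-yet-covered need within a fixed radius. The `Tract` type, the `need` weight, straight-line distances in kilometres, the coverage radius, and the `greedy_placement` helper are all hypothetical; in practice the demographic, income, and location data listed above would feed the need weights.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tract:
    """Hypothetical census tract: a point location plus a need weight."""
    name: str
    x_km: float
    y_km: float
    need: float  # assumed metric, e.g. residents below an income threshold


def distance_km(a, b):
    """Straight-line distance between two (x, y) points in kilometres."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def greedy_placement(tracts, candidates, k, radius_km):
    """Choose up to k candidate sites, each time taking the site that covers
    the most not-yet-covered need within radius_km (greedy max coverage)."""
    chosen = []
    covered = set()
    for _ in range(k):
        best_site, best_gain = None, 0.0
        for site in candidates:
            if site in chosen:
                continue
            # Need that this site would newly cover.
            gain = sum(
                t.need
                for t in tracts
                if t.name not in covered
                and distance_km((t.x_km, t.y_km), site) <= radius_km
            )
            if gain > best_gain:
                best_site, best_gain = site, gain
        if best_site is None:  # no remaining site adds any coverage
            break
        chosen.append(best_site)
        covered.update(
            t.name
            for t in tracts
            if distance_km((t.x_km, t.y_km), best_site) <= radius_km
        )
    return chosen


if __name__ == "__main__":
    tracts = [
        Tract("A", 0.0, 0.0, 120.0),
        Tract("B", 1.0, 0.0, 80.0),
        Tract("C", 10.0, 10.0, 200.0),
        Tract("D", 10.5, 9.5, 50.0),
    ]
    sites = [(0.5, 0.0), (10.0, 10.0), (5.0, 5.0)]
    print(greedy_placement(tracts, sites, k=2, radius_km=2.0))
    # [(10.0, 10.0), (0.5, 0.0)]
```

Greedy selection is a standard heuristic for coverage problems because it is simple and scales well; an exact integer-programming formulation is the usual alternative when the number of candidate sites is small.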

Methodology

Coming Soon...

Key Findings

Coming Soon...

Impact & Applications

Coming Soon...


San Diego Supercomputer Center Research

2024

High Performance Computing Data Analysis Machine Learning

Overview

At SDSC, my research focused on developing a machine learning model to predict and compress matrix values derived from plasma physics simulations. The main objective was to create a model capable of estimating matrix elements from input parameter tuples while reducing storage requirements. The project aimed to strike a balance between prediction accuracy and compression efficiency, with the goal of more efficient data management for large-scale computational simulations.
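Whether a predictive model actually saves storage depends on how its size compares with the raw matrices it replaces. The overview does not state the architecture, the length of the parameter tuple, or the dataset size, so this sketch uses illustrative assumptions: a 3-value parameter tuple, one 64-unit hidden layer, a 65,536-unit output layer (one unit per element of a 256x256 matrix), and float32 storage. The helper names are hypothetical.

```python
def raw_bytes(num_matrices, n=256, bytes_per_value=4):
    """Bytes needed to store num_matrices dense n x n matrices (float32 by default)."""
    return num_matrices * n * n * bytes_per_value


def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected network with these layer widths."""
    return sum(
        fan_in * fan_out + fan_out
        for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:])
    )


def compression_ratio(num_matrices, layer_sizes, n=256, bytes_per_value=4):
    """Raw storage divided by model storage; a value above 1 means the model is smaller."""
    model_bytes = mlp_param_count(layer_sizes) * bytes_per_value
    return raw_bytes(num_matrices, n, bytes_per_value) / model_bytes


if __name__ == "__main__":
    layers = [3, 64, 256 * 256]  # assumed: tuple length 3, one hidden layer of 64
    print(mlp_param_count(layers))          # 4260096
    print(compression_ratio(1000, layers))  # roughly 15.4
```

Under these assumptions the network has 4,260,096 parameters (about 17 MB), so it only beats raw storage once it stands in for more than 65 matrices. Almost all of those parameters sit in the output layer, which is where a compression-oriented design has the most room to economize.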

Research Goals

Develop a neural network model that accurately predicts the elements of a 256×256 matrix from given input parameters.
Optimize the model to minimize data storage needs without significantly sacrificing prediction accuracy.
Improve the model's generalization to unseen data so that it performs reliably across a variety of scenarios.

Methodology

The research involved extensive data preparation and normalization to ensure high-quality inputs for training. Neural network models were designed and trained to predict matrix elements from parameter tuples, and the architecture was adjusted iteratively to improve performance. The process included selecting appropriate training data, tuning hyperparameters, and using validation techniques to assess and refine the model. Throughout, the central concern was the trade-off between prediction precision and data compression.
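The paragraph above does not name the normalization scheme or validation setup, so the following is a minimal sketch under assumed choices: z-score normalization and a shuffled hold-out split. The point it illustrates is ordering: split first, then compute the normalization statistics on the training portion only, so no information from the validation data leaks into training. All function names are hypothetical.

```python
import random
import statistics


def train_val_split(samples, val_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and hold out val_fraction for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def fit_zscore(values):
    """Mean and population standard deviation, computed on training data only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against constant data
    return mean, std


def apply_zscore(values, mean, std):
    """Scale values with statistics fitted elsewhere (e.g. on the training split)."""
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    data = [float(v) for v in range(10)]
    train, val = train_val_split(data)
    mean, std = fit_zscore(train)          # fitted on train only
    train_z = apply_zscore(train, mean, std)
    val_z = apply_zscore(val, mean, std)   # reuses the training statistics
    print(len(train), len(val))            # 8 2
```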

Key Findings

The model achieved approximately 85% accuracy on unseen data, a promising first result for predictive reliability.
Over 90% of predicted matrix values fell within ±1 of the actual values, which is reasonably accurate for practical applications.
Generalizing predictions to new, unseen data proved challenging, pointing to the need for further work on model robustness.
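The findings above do not define how the 85% accuracy figure was measured, so this sketch covers only the ±1 tolerance metric from the second finding: flatten the predicted and actual matrices and report the share of elements whose absolute error is at most 1. `flatten` and `fraction_within_tolerance` are hypothetical helper names.

```python
def flatten(matrix):
    """Turn a list of rows into a single list of values."""
    return [value for row in matrix for value in row]


def fraction_within_tolerance(predicted, actual, tol=1.0):
    """Share of element-wise predictions whose absolute error is at most tol."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("need at least one value")
    hits = sum(abs(p - a) <= tol for p, a in zip(predicted, actual))
    return hits / len(predicted)


if __name__ == "__main__":
    pred = [[1.0, 2.5], [3.0, 10.0]]
    true = [[1.5, 2.0], [5.0, 9.2]]
    print(fraction_within_tolerance(flatten(pred), flatten(true)))  # 0.75
```

A tolerance-based score like this is easy to interpret, but it ignores how large the misses are, so it is usually reported alongside an error measure such as mean absolute error.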

Impact & Applications

This research contributes to data compression for computational simulations. Estimating matrix values with a compact model, rather than storing every matrix, can reduce storage demands and support better data management and resource allocation for large-scale scientific simulations, particularly in plasma physics. The findings lay groundwork for future work on predictive modeling of high-dimensional data, which could benefit other fields that rely on large, complex simulations.