Experience 💼
Harvard Medical School & Massachusetts General Hospital
Data Scientist (Boston, MA)
April 2024 – Present
- Engineered scalable bioinformatics pipelines to process TB-scale multimodal data from 1,500+ patient biopsies, supporting mutation discovery and ML modeling by leveraging cloud-based solutions (Google Cloud Batch, WDL, Bash)
- Optimized storage workflows by automating ad hoc usage reports, coordinating data migration with external collaborators on HPC systems, and leveraging tiered Google Cloud Storage classes to cut long-term costs by 20%
- Containerized Linux-based mutation detection tools using Docker on SLURM clusters and GCP to ensure cross-platform reproducibility of genomics analyses, enabling portable workflows with standardized results
- Developed real-time interactive dashboards with Plotly Dash to unify clinical findings with public genomics databases, enhancing data-driven decision making and advancing cancer research
Centrova
Data Engineer (Boston, MA)
February 2025 – July 2025
- Constructed an ETL pipeline to automate patient-to-trial matching, boosting screening efficiency by 42% through NLP-based parsing of EHR data, creation of embeddings using ClinicalBERT, and retrieval-augmented generation (RAG)
- Reduced backend infrastructure costs by 30% and enabled frontend integration by deploying REST APIs for patient and trial data access on serverless cloud platforms (AWS Lambda, API Gateway, Firebase Auth, Pulumi)
- Designed relational schemas in PostgreSQL to manage patient data, trial evaluations, and audit logs, reducing query time by 22% through strategic data modeling, indexing, and caching
- Implemented CI/CD workflows with GitHub Actions to automate unit testing, infrastructure provisioning, and codebase deployments, accelerating the software development life cycle (SDLC)
UC San Diego Altman Clinical and Translational Research Institute
Clinical Data Analyst (San Diego, CA)
April 2024 – September 2024
- Migrated and transformed OMOP-standardized EHR data for 10,000+ patients across 35 hospitals into analytics-ready schemas using SQL and BigQuery, facilitating the training of clinical AI models
- Engineered clinical and social determinants of health features (MELD score, ICD-10 codes) to predict 30-day readmission of cirrhosis patients using a random forest classifier, achieving an AUC of 0.87
UC San Diego Health
Undergraduate Researcher (San Diego, CA)
April 2023 – April 2024
- Developed a GenePattern Notebook submodule to compute p-values and FDRs across 70+ cell types and 50+ gene sets from scGSEA data by simulating null distributions through subsampling
- Implemented parallel processing in Python and R scripts to speed up statistical testing of scGSEA data, achieving a 92% reduction in runtime, validated across multiple AWS EC2 configurations
La Jolla Labs Inc.
Bioinformatics Intern (San Diego, CA)
June 2023 – September 2023
- Integrated Azure Batch and Blob Storage into Nextflow DSL2 pipelines for scalable and automated FASTQ alignment, BAM quantification, and bigWig generation of transcriptomic data
- Containerized Conda environments to enable reproducible benchmarking of bioinformatics tools for intron retention detection, aiding the design of antisense oligonucleotides to treat haploinsufficiency diseases
Salk Institute for Biological Studies
Undergraduate Researcher (San Diego, CA)
June 2022 – March 2023
- Used DESeq2 for RNA-seq analysis, identified the top 200 differentially expressed genes, and elucidated their functional roles in CD40-treated vs. untreated TNBC mice through GO enrichment analysis
- Leveraged Seurat and inferCNV for scRNA-seq analysis, immunophenotyping 18+ cell groups from aPD1/aCTLA4-treated vs. untreated TNBC mice by examining gene profiles and copy number variations
- Performed pseudotime analysis using Monocle3 and Slingshot on liver metastases in TNBC mice, effectively modeling differentiation patterns of immune cells
Triton Software Engineering
Frontend Engineer (San Diego, CA)
November 2021 – June 2022
- Used React-Bootstrap to build an application form for future TSE applicants and to manage data from past applicants
- Coordinated with a team of 5 web developers and 3 UI/UX designers to build components of the form using HTML, CSS, and JavaScript
