My Projects

FX Project Image
FX Trend Strategy using Exponential Smoothing

A fully reproducible quantitative research project analyzing USD/CAD trend persistence using dual exponential smoothing filters. I built a complete forecasting and trading pipeline: signal engineering, α–β parameter tuning, long/short asymmetry testing, buffer and deceleration exit experiments, backtesting, Sharpe evaluation, and trade-level accuracy modeling. This project demonstrates my capabilities in quantitative analysis, data science workflow design, mathematical modeling, statistical reasoning, and technical communication.

Problem: Most simple FX trend-following rules look good in toy backtests but fall apart once you change parameters, markets, or exit rules. I wanted to design a robust, parameterized exponential smoothing system and understand exactly when a “dual-ES crossover” strategy actually adds value versus noise.

📌 Project Summary

Objective: Build a trend-following FX trading system for USD/CAD using dual exponential smoothing filters (ESα, ESβ) and evaluate long/short symmetry, parameter sensitivity, and exit logic.

Methodology: Designed full pipeline including parameter grid search, regime-specific performance evaluation, trade-level accuracy analysis, and experiments on buffer thresholds and deceleration-based exits.

  • Dual ES crossover signals with α < β
  • Trade-level performance aggregation (not just daily returns)
  • Long-only vs short-only optimization
  • Buffer & deceleration exit experiments
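To make the core mechanism concrete, here is a minimal sketch of the dual-ES crossover step in Python, run on synthetic prices. The smoothing recursion and the α = 0.20, β = 0.60 parameters come from the project; the exact long/short convention and the one-bar signal lag are illustrative assumptions, not the production rule:

```python
import numpy as np

def exp_smooth(prices, alpha):
    """Exponential smoothing: s_t = alpha * p_t + (1 - alpha) * s_{t-1}."""
    s = np.empty(len(prices), dtype=float)
    s[0] = prices[0]
    for t in range(1, len(prices)):
        s[t] = alpha * prices[t] + (1 - alpha) * s[t - 1]
    return s

def crossover_positions(prices, alpha=0.20, beta=0.60):
    """+1 when the fast filter (beta) sits above the slow one (alpha), else -1.
    Positions are lagged one bar so today's signal trades tomorrow (no look-ahead)."""
    slow = exp_smooth(prices, alpha)
    fast = exp_smooth(prices, beta)
    raw = np.where(fast > slow, 1, -1)
    return np.concatenate(([0], raw[:-1]))

# Toy usage on a synthetic USD/CAD-like random walk
prices = np.cumsum(np.random.default_rng(0).normal(0, 0.003, 500)) + 1.30
pos = crossover_positions(prices)
daily_pnl = pos[1:] * np.diff(prices)
```

Because β > α, the β-filter tracks price more closely and the α-filter acts as the slow trend anchor; the crossover of the two defines the trend state that the buffer and deceleration exits then modify.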

Key Results:

  • Optimal parameters: α = 0.20, β = 0.60
  • Sharpe ratio improves 0.26 → 0.45
  • Trade-level accuracy ≈ 73%
  • Distance buffer & deceleration exit reduce performance

See full technical report below.

MultiDocRAG Cover Image
MultiDocRAG

A full-stack retrieval-augmented generation (RAG) system designed to perform multi-document reasoning across uploaded PDFs. The system supports scalable document ingestion, semantic chunking, vector search retrieval, transparent evidence inspection, and automated evaluation. This project demonstrates my ability to integrate LLM engineering, applied machine learning, data pipeline design, evaluation methodology, and end-to-end product prototyping.

Problem: Traditional RAG pipelines work well for single-document lookup, but real-world analysis often requires synthesizing information across multiple sources. MultiDocRAG addresses this challenge by building a retrieval and reasoning pipeline capable of cross-document evidence comparison, grounded generation, and systematic evaluation.

📌 Project Summary

Objective: Build an AI assistant that can perform cross-document synthesis and answer questions using grounded, evidence-retrieved context from multiple PDFs.

System Design: Implemented an end-to-end pipeline including:

  • Multi-PDF ingestion and cleaning
  • Sliding-window chunking with semantic overlap
  • Embedding generation via Sentence-Transformers
  • FAISS vector search retrieval with score transparency
  • LLM reasoning layer with contextual grounding + controlled refusals
  • Automated evaluation framework across correctness, groundedness, and refusal safety
  • Streamlit demo UI with prompt inspection and retrieval visibility
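The chunking and retrieval stages above can be sketched as follows. This is a self-contained toy: a hashing embedder stands in for Sentence-Transformers (so the sketch runs without model downloads) and brute-force cosine search stands in for the FAISS index; window/overlap sizes are illustrative, not the deployed settings:

```python
import numpy as np

def chunk_text(text, window=200, overlap=50):
    """Sliding-window chunking: fixed-size windows with overlap, so content
    cut at a boundary still appears intact in a neighbouring chunk."""
    step = window - overlap
    return [text[i:i + window] for i in range(0, max(len(text) - overlap, 1), step)]

def toy_embed(texts, dim=64):
    """Stand-in for Sentence-Transformers: hash token counts into a fixed
    dimension, then L2-normalize so dot product equals cosine similarity."""
    vecs = np.zeros((len(texts), dim))
    for i, t in enumerate(texts):
        for tok in t.lower().split():
            vecs[i, hash(tok) % dim] += 1.0
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.clip(norms, 1e-9, None)

def retrieve(query, chunks, k=3):
    """Brute-force cosine search; a FAISS inner-product index plays this
    role at scale. Scores are returned alongside text for transparency."""
    emb = toy_embed(chunks + [query])
    scores = emb[:-1] @ emb[-1]
    top = np.argsort(scores)[::-1][:k]
    return [(chunks[i], float(scores[i])) for i in top]
```

Returning (chunk, score) pairs rather than bare text is what enables the score transparency and evidence inspection described above: the UI can show exactly which passages, at what similarity, grounded each answer.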

Applications:

  • Cross-document analytics for research & reporting
  • Policy / business intelligence synthesis across multiple PDFs
  • Technical documentation QA and comparison
  • Automated literature review

What this shows about my skillset:

  • Ability to design end-to-end ML/LLM systems
  • Strength in data engineering workflow (cleaning → chunking → indexing → retrieval)
  • Evaluation methodology formulation and metric-driven iteration
  • Full-stack prototyping (backend + model + frontend UI)
  • Clear communication of system design and reasoning behavior

Current Progress:

  • Core ingestion, chunking, and vector retrieval implemented
  • LLM reasoning module integrated with memory + grounded prompting
  • Automated evaluation pipeline complete (27-question benchmark)
  • Live demo deployed via HuggingFace Spaces
  • Full report available

This project is actively evolving as I benchmark, refine prompts, evaluate failure modes, and introduce reranking & improved LLM backends.

Iris Recognition System Cover Image
Iris Recognition System

A full computer vision and pattern recognition pipeline for iris-based biometric identification, implemented as a Columbia University course project based on Ma et al. (2003). I built and refined an end-to-end system including iris localization, normalization, image enhancement, handcrafted feature extraction, PCA + Fisher Linear Discriminant matching, and verification/identification evaluation. This project demonstrates my ability in computer vision, machine learning system design, mathematical modeling, experimental debugging, evaluation methodology, and technical implementation.

Problem: Iris recognition requires much more than just classification. Raw eye images must first be localized, geometrically normalized, enhanced, converted into discriminative texture features, and then matched under rotation and illumination variation. I wanted to implement a full pipeline based on a classic paper and understand which design choices actually drive recognition performance.

📌 Project Summary

Objective: Reproduce and refine a complete iris recognition system based on Ma et al. (2003), using the CASIA-IrisV1 dataset under a fixed training/testing protocol.

System Design: Implemented an end-to-end modular pipeline including:

  • Iris localization using projection minima, thresholding, contour analysis, and Hough circle detection
  • Non-concentric rubber-sheet normalization into a fixed-size rectangular iris representation
  • Image enhancement through background illumination correction and local histogram equalization
  • Handcrafted texture feature extraction using two circularly symmetric spatial filters
  • Block-wise statistical encoding (Mean + Average Absolute Deviation) into a 1536-dimensional feature vector
  • PCA + Fisher Linear Discriminant (FLD) for dimensionality reduction
  • Nearest-center / multi-template matching with L1, L2, and cosine distance metrics
  • Performance evaluation through CRR and verification ROC curves
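The block-wise statistical encoding step can be sketched as below. The block size and ROI shape here are illustrative assumptions, but they show how the 1536-dimensional vector arises: with a 48×512 normalized ROI, 8×8 blocks, and two statistics per block, each filtered image contributes 2 × (6 × 64) = 768 values, and two filters give 1536:

```python
import numpy as np

def block_features(filtered, block=8):
    """Block-wise encoding in the style of Ma et al.: each block of the
    filtered iris image contributes its mean absolute value and the
    average absolute deviation (AAD) around that mean."""
    h, w = filtered.shape
    feats = []
    for i in range(0, h - h % block, block):
        for j in range(0, w - w % block, block):
            patch = np.abs(filtered[i:i + block, j:j + block])
            m = patch.mean()
            feats.extend([m, np.abs(patch - m).mean()])
    return np.array(feats)

# Two filter responses (stand-ins for the two circularly symmetric filters)
roi_a = np.random.default_rng(1).normal(size=(48, 512))
roi_b = -roi_a
feature_vector = np.concatenate([block_features(roi_a), block_features(roi_b)])
```

This vector is what PCA + FLD then compress before nearest-center matching, which is why the reduced-space results above are reported separately from the original 1536-dimensional space.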

What problem I solved:

  • Turned raw grayscale eye images into a reproducible recognition pipeline rather than a single classifier
  • Handled geometric variation through normalization and rotation-aware template matching
  • Reduced sensitivity to illumination and local noise through enhancement and block-level feature design
  • Improved performance through iterative debugging of ROI selection, matching strategy, and evaluation protocol alignment

Key Results:

  • Original Space CRR: L1 = 73.38%, L2 = 71.99%, Cosine = 73.38%
  • Reduced Space CRR: L1 = 80.79%, L2 = 81.25%, Cosine = 86.11%
  • Verification ROC AUC: L1 = 0.9476, L2 = 0.9555, Cosine = 0.9912
  • Reduced-space matching substantially outperformed original-space matching
  • Cosine distance produced the strongest final identification and verification performance

What this shows about my skillset:

  • Ability to implement a full ML / CV pipeline from raw data to final evaluation
  • Strong debugging and iteration skills guided by metrics rather than guesswork
  • Experience translating research-paper methodology into working code
  • Comfort with classical machine learning, feature engineering, and experimental analysis
  • Ability to structure technical projects in a modular, reproducible way

This project was completed as a Columbia University course project and reflects both technical implementation and iterative performance improvement under a fixed experimental protocol.

Housing Project Image
Housing Price Prediction: An Exploratory Analysis

Built a housing price prediction pipeline using exploratory data analysis, feature engineering, and regression/ML models including Ridge, LASSO, Random Forest, and Group LASSO. The models achieved strong predictive accuracy while consistently identifying space, quality, and utility as the key drivers of value. Beyond forecasting, the project emphasized interpretability and stakeholder communication — turning high-dimensional data into actionable insights for decisions.

Problem: House price models often chase leaderboard metrics but fail to answer a practical question: what exactly is driving value? A buyer, developer, or bank needs an interpretable decomposition of space, quality, and neighborhood effects rather than a pure black-box forecast.

📌 Project Summary

Objective: Build an interpretable housing analytics pipeline that identifies economic drivers of value — not just produce a black-box prediction model.

Methodology: Starting from the full Ames dataset (80+ variables), we:

  • Separated numeric vs categorical features & re-classified ordinal variables (OverallQual, MoSold)
  • Used correlation + effect size (η²) to evaluate predictor strength
  • Applied adjusted GVIF to control multicollinearity
  • Built interactive visualizations: heatmaps, neighborhood maps, STL trend decomposition
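As a concrete example of the effect-size step, η² for a categorical predictor is the share of sale-price variance explained by group membership (SS_between / SS_total). A minimal sketch, with column names purely illustrative:

```python
import numpy as np

def eta_squared(values, groups):
    """Effect size eta^2 = SS_between / SS_total: the fraction of variance
    in `values` (e.g. SalePrice) explained by a categorical predictor
    (e.g. Neighborhood). Ranges from 0 (no effect) to 1 (groups fully
    determine the value)."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    grand = values.mean()
    ss_total = ((values - grand) ** 2).sum()
    ss_between = sum(
        len(g := values[groups == lvl]) * (g.mean() - grand) ** 2
        for lvl in np.unique(groups)
    )
    return ss_between / ss_total
```

Using η² alongside plain correlation lets numeric and categorical predictors be ranked on a comparable variance-explained scale, which is what supports the space/quality/neighborhood decomposition in the insights below.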

Key Insights:

  • Space & construction quality are the dominant drivers (GrLivArea, TotalBsmtSF, OverallQual)
  • Neighborhood effects persist even after controlling for features
  • Garage & exterior finishing add second-tier but significant value
  • Time-series structure aligns with macro events (e.g., subprime crisis, tax credits)

What this demonstrates:

  • Ability to turn raw municipal data into decision-oriented insights
  • Bridging EDA → feature engineering → modeling → communication
  • Transferable to pricing, risk modeling, and applied analytics pipelines

Full interactive analysis available below.

Project Image
Socioeconomic Drivers of Crime in San Francisco

Built a large-scale spatial econometrics pipeline linking 900k+ SF police incident records with ACS socioeconomic panel data. Applied fixed-effects logistic models, Poisson/NegBin count models, and time-series forecasting to quantify how inequality, unemployment, and mobility patterns shape crime trends. The project demonstrates skills in causal inference, longitudinal modeling, data integration, and policy analytics—transferable to business forecasting & systems design.

Problem: City agencies and planners see crime as an “economic problem”, but it’s unclear whether inequality, unemployment, or mobility actually explain crime patterns once we control for where people live and move. This project builds a tract–year panel to test whether the data supports that narrative.

📌 Project Summary

Objective: Quantify whether crime patterns are driven by economic factors such as inequality, unemployment, transit patterns, and demographic changes.

Pipeline: Merged 913,732 incident-level crime records with census-tract ACS data (2017–2022) using spatial joins and longitudinal panel construction.

  • Panel structure: tract × year
  • Models: Fixed-effects logistic (individual), Poisson/Negative Binomial (aggregate)
  • Time-series forecasting using ARIMAX/SARIMAX
  • Feature engineering for economic deltas + mobility metrics
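The panel-construction step, and the over-dispersion check that motivates Negative Binomial over Poisson, can be sketched as follows. Field names and the ACS join keys here are illustrative, not the real schema:

```python
import statistics
from collections import Counter

def build_panel(incidents, acs):
    """Aggregate incident-level records into a tract x year panel and left-join
    ACS covariates. `incidents` is a list of dicts with 'tract' and 'year';
    `acs` maps (tract, year) -> covariate dict."""
    counts = Counter((r["tract"], r["year"]) for r in incidents)
    panel = []
    for (tract, year), n in sorted(counts.items()):
        row = {"tract": tract, "year": year, "crime_count": n}
        row.update(acs.get((tract, year), {}))
        panel.append(row)
    return panel

def overdispersion_ratio(counts):
    """Variance-to-mean ratio of the count outcome. Poisson regression
    assumes this is ~1; values well above 1 (typical for crime counts)
    favour the Negative Binomial model."""
    return statistics.pvariance(counts) / statistics.mean(counts)
```

Checking the variance-to-mean ratio before model selection is the same logic cited in the methodological insights below: when counts are over-dispersed, Poisson standard errors are too optimistic and Negative Binomial is the safer aggregate model.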

Key Findings:

  • Higher transit usage (public transit, cycling) → consistent increases in crime rates across categories
  • Income inequality + unemployment negatively associated with crime at the tract level (counter-intuitive; likely reflects urban confounding)
  • Bachelor’s degree rate reduces violent/public order crime but increases property crime
  • COVID years: fewer public order crimes, more property crimes

Methodological Insights (Transferable):

  • Importance of panel vs individual-level inference: aggregate models outperform individual classifiers
  • Negative Binomial superior under over-dispersion → similar logic applies to ops forecasting
  • Mobility + density better predictors than pure economic indicators

Full methodology and regression tables available in report below.

Ikebana Site Image
Ikebana Portfolio — Immersive Front-End Microsite

A handcrafted, single-page microsite that turns my Ikebana course portfolio into an immersive digital experience. Built from scratch (no frameworks) with responsive layout, CSS animations, JavaScript-driven interactions, and background audio integration, this project reflects my attention to detail in UX, visual hierarchy, and front-end systems thinking rather than just static pages.

Problem: Most “portfolio sites” for creative work are either static grids of images or generic templates. I wanted to see if I could turn an Ikebana course portfolio into a small, product-like web experience with deliberate motion, sound, and layout — without relying on heavy frameworks.

📌 Project Summary

Objective: Design and implement a small, self-contained web experience that presents Ikebana work in a way that feels more like a product than a static gallery — with smooth transitions, responsive layout, and ambient audio.

What I built:

  • A fully responsive single-page site that adapts to different screen sizes and dark/light environments
  • Custom CSS animation system (entrance transitions, hover states, text reveals) without external libraries
  • JavaScript controllers for navigation, scroll-based effects, and HTML5 audio playback
  • A layout that balances photography, text, and whitespace so the site reads like a curated story rather than a code demo

Why it matters for my broader work:

  • Shows I can go from concept → UX structure → visual design → implementation on my own
  • Reinforces skills that are directly reusable for analytics dashboards, internal tools, and stakeholder-facing UIs
  • Demonstrates that I care about the last mile of data/insights — how people actually experience what we build

Below are selected screenshots from the live site.