Cheng Wu | Data Science at Columbia University

About Me

I’m Cheng Wu, a data and business analyst who cares about how analysis translates into real-world decisions.

With a background in econometrics and quantitative economics from the University of Illinois Urbana-Champaign and current graduate studies in Data Science at Columbia University, I work at the intersection of quantitative rigor and practical outcomes.

I’ve applied analytics across markets, finance, and social impact. My work has helped managers improve pricing and demand planning, speed up financial reporting, and identify inequities in education systems. What motivates me most is questions without ready-made answers: problems where careful analysis can bring clarity and drive meaningful change.

I focus on three things in my work: clarity in communication, rigor in method, and measurable impact in results.

Focus areas: market analytics, financial analysis, process improvement, education equity

📍 Tools: SQL, Python, BI dashboards (Power BI, Tableau, Plotly), Statistics / Econometrics

Education

Columbia University

M.S. in Data Science | In Progress

University of Illinois Urbana-Champaign

Graduated: 2025

B.S. in Econometrics & Quantitative Economics
Minor: Data Science and Statistics
GPA: 3.9/4.0 | Dean's List

Professional Experience

Data Analyst — Supply Chain Compliance & Operations

Tarte Cosmetics | October 2025 – Present | New York, NY

At Tarte, I work across compliance and supply-chain analytics to streamline import/export auditing and improve traceability for global product launches. My work focuses on Harmonized Tariff Schedule (HTS) classification, NAV data integration, and CBP Form 7501 entry-summary audits, ensuring that product metadata aligns with customs regulations and internal SKU catalogs.

I build workflows that reconcile TOR lists, country-of-origin (COO) records, and duty valuations against shipment entries, reducing manual review time and improving consistency across regulatory filings. I'm also designing an automated pipeline that flags discrepancies in valuation, commodity codes, and shipment attributes before clearance, helping to reduce compliance risk and reporting delays.

Data Scientist Intern — Marketing Analytics

Donglai Natural BioTech Co. Ltd. | June 2024 – September 2024 | Remote

At Donglai, I joined the marketing strategy team at a time when the company was struggling to forecast demand across dozens of product lines. My role started with customer segmentation: I applied clustering models such as K-Means and DBSCAN to uncover groups of buyers with distinct purchasing patterns. These clusters fed directly into SQL-based ETL pipelines for inventory planning, which meant my work could be tested quickly against real stock levels.

One highlight of the summer was building demand forecasting and price elasticity models that managers used to run “what if” scenarios. For example, we simulated price adjustments on seasonal products, which translated into a measurable 12% increase in order completion rates. I also spent weeks refining workflows with Spark and Dask to handle millions of transaction records; shaving 40% off reporting latency meant the team could react faster to market shifts. To make all of this accessible, I built interactive dashboards in Plotly, Dash, and Power BI. Watching non-technical managers click through predictive insights and immediately adjust their plans was one of the most rewarding moments of my internship.

Data Analyst — Institutional Finance

ZheShang Securities Co. Ltd. | May 2023 – September 2023 | Hangzhou, China

At ZheShang Securities, I rotated into the institutional finance department, where private equity and M&A deals were evaluated at a rapid pace. I was tasked with supporting the screening of over 20 transactions, which meant diving into financial statements, market data, and regulatory filings. Using multi-factor regression and scenario analysis, I helped improve the accuracy of our investment evaluations by about 25% — a difference that directly influenced whether deals moved forward.

A big challenge was data fragmentation: portfolio data lived in one system, accounting in another, and managers often spent hours reconciling them. I worked on integrating these into a unified SQL/VBA pipeline, which cut reconciliation time by nearly a third and made reporting more reliable. Beyond the technical work, I designed scalable portfolio analytics frameworks that benchmarked performance against market indices, which gave our risk team a more consistent way to track exposure. The BI dashboards I built for real-time portfolio and risk metrics became a talking point in weekly meetings, giving senior managers timely visibility to adjust allocation strategies and communicate with stakeholders.

Data Analyst & Lead Coordinator — Inclusive Growth

Innovative Bloom Foundation | May 2019 – June 2024 | Shanghai, China

My work with the Innovative Bloom Foundation spanned five years, starting when I was still in high school. I first joined as a volunteer math teacher for left-behind children in rural communities, but as I observed the deeper structural challenges these families faced, I began to take on broader responsibilities. Over time, I grew from front-line teaching to coordinating programs and ultimately leading data initiatives that shaped how the foundation measured and scaled its work.

My contribution became twofold: data analysis and program coordination. I started by structuring survey data — often messy, handwritten responses — into SQL datasets that could be analyzed systematically. This opened the door to building dashboards that tracked education access, food security, and health indicators across different villages.

One initiative I’m proud of was helping launch an e-commerce platform for local artisans. I set up real-time monitoring to track sales and logistics, and within a year the platform was supporting 15 families and had raised artisan household income by about 30%. These numbers weren’t abstract; they meant that families could afford school fees or healthcare they had previously struggled to pay for.

I also applied regression analysis to benchmark program outcomes, which guided funding decisions for future projects. Over time, I learned how to translate complex outcomes into metrics that stakeholders could act on, ensuring that data wasn’t just collected, but used to design better interventions.

Research Experience

Researcher — Educational Inequality & Intergenerational Poverty

UIUC | January 2023 – February 2025

My academic interest in inequality research grew out of my five years with the Innovative Bloom Foundation, where I worked directly with left-behind children in rural China. Seeing how lack of funding and resources limited opportunities made me want to study these issues with greater rigor.

At Illinois, I focused on quantifying how differences in school resources shape intergenerational mobility. I applied regression analysis and propensity score matching (PSM) with socio-economic controls, and engineered inequality metrics such as funding disparities and teacher-student ratios by integrating national education and mobility datasets.

To make these insights accessible, I built dashboards and automated reports that highlighted geographic gaps, providing evidence that policymakers could use when designing support programs.

Research Assistant — Education Policy

UIUC | January 2024 – July 2024

As part of Professor Powers’ research group on childcare policy, I contributed to a project evaluating how childcare grants affect families’ economic outcomes. My role was to build the technical backbone: integrating national surveys and administrative data in R, standardizing variables, running automated validation checks, and preparing reliable analysis datasets.

Methodologically, I applied Difference-in-Differences (DiD) and Instrumental Variables (IV) to isolate the causal effects of childcare subsidies on employment, income distribution, and child welfare. I also developed feature-engineered regression models and created interactive R Shiny dashboards that simulated policy impacts in real time, enabling both policymakers and the public to see how grant structures influenced family well-being.

Researcher — Critical Medicines Access in Public Health Emergencies

UIUC | April 2023 – December 2023

This project was deeply personal: my grandmother passed away during COVID-19 after being unable to access medicine or a hospital bed. That loss motivated me to study why health systems fail during crises and how data-driven methods could help prevent similar tragedies.

I evaluated the impact of preventive health policies on disease incidence, mortality, and healthcare costs, applying quasi-experimental designs to isolate effects across demographic groups. To anticipate shortages, I engineered policy exposure scores and health metrics from WHO and CDC datasets, then used supervised models (XGBoost, LSTM) and clustering to identify high-risk populations and forecast long-term outcomes.

The results were delivered as decision-oriented dashboards and reports, providing resource allocation insights to improve resilience in future emergencies.

📩 Get in Touch

If you have any questions, feel free to reach out:

Email Me