Zach

Zezhou Huang

zh2408@columbia.edu

About Me

I'm Zezhou Huang — call me Zachary.

Experience

Microsoft Research AI Frontiers · Senior Researcher · Redmond, WA · May 2025 – Present
Works on LLM agents, systems, and reinforcement learning. Co-developed Magentic Marketplace and Red-Teaming a Network of Agents.
Microsoft Gray Systems Lab · Research Intern · Redmond, WA · May – Aug 2023
Prototyped a GPU-accelerated SQL engine on lightweight-compressed data; the work led to a VLDB 2026 paper and two granted US patents.
Databricks · Software Engineer Intern · San Francisco, CA · May – Aug 2022
Built data structures (Scala/JVM) for query optimization in Spark, shipped in Databricks Runtime 11.1. Designed incremental view maintenance over Delta Lake and contributed materialized-view (MV) strategies to Enzyme.
TuSimple · Software Engineer Intern · San Diego, CA · May – Aug 2021
Built ETL pipelines for self-driving sensor data using Python/Flask, Kafka, MongoDB, Docker/Kubernetes; applied scikit-learn for anomaly detection.

Education & Research

Columbia University · Ph.D. & M.S. in Computer Science (GPA 4.00) · Sep 2019 – May 2025
Advisor: Prof. Eugene Wu · Awards: Google PhD Fellowship (2023), Avanessian Fellowship (2023)
Built Reptile (SIGMOD'22), JoinBoost (VLDB'23), Treant (SIGMOD'24), Saibot (VLDB'23), Kitana, Cocoon (HILDA'24), and other systems: query layers over PostgreSQL/DuckDB and cloud OLAP engines for wide-table optimization, private data search, and LLM-driven data systems.
University of Wisconsin–Madison · B.S. in Computer Science (GPA 3.89) · May 2019
Advisors: Prof. AnHai Doan, Prof. Remzi Arpaci-Dusseau
Built WiscKeyHybrid, an LSM-tree balancing layer in C++/Go (+17.3% throughput on a 100 GB SSD database). Built cloudFAHES, a cloud data-cleaning system for disguised missing values (0.82 F1 on real-world datasets).

Publications

  1. GPU Acceleration of SQL Analytics on Compressed Data
    Zezhou Huang, Krystian Sakowski, Hans Lehnert, Wei Cui, Carlo Curino, et al.
    VLDB 2026.
  2. A Decade of Systems for Human Data Interaction
    Eugene Wu, Yiru Chen, Haneen Mohammed, Zezhou Huang.
    Information Systems, Vol. 138 (2026).
  3. Magentic Marketplace: An Open-Source Environment for Studying Agentic Markets
    Gagan Bansal, Wenyue Hua, Zezhou Huang, Adam Fourney, Amanda Swearngin, et al.
    arXiv 2025, Code.
  4. Data Cleaning Using Large Language Models
    Shuo Zhang, Zezhou Huang, Eugene Wu.
    DAIS@ICDE 2025.
  5. Data-Centric Text-to-SQL with Large Language Models
    Zezhou Huang, Shuo Zhang, Kechen Liu, Eugene Wu.
    TRL@NeurIPS 2024.
  6. Transform Table to Database Using Large Language Models
    Zezhou Huang, Jia Guo, Eugene Wu.
    TaDA@VLDB 2024.
  7. SET: Searching Effective Supervised Learning Augmentations in Large Tabular Data Repositories
    Jiaxiang Liu, Zezhou Huang, Eugene Wu.
    GUIDEAI@SIGMOD 2024.
  8. Disambiguate Entity Matching through Relation Discovery with Large Language Models
    Zezhou Huang.
    GUIDEAI@SIGMOD 2024.
  9. Cocoon: Semantic Table Profiling Using Large Language Models
    Zezhou Huang, Eugene Wu.
    HILDA@SIGMOD 2024.
  10. Relationalizing Tables with Large Language Models: The Promise and Challenges
    Zezhou Huang, Eugene Wu.
    DBML@ICDE 2024, Video.
  11. The Fast and the Private: Task-based Dataset Search
    Zezhou Huang, Jiaxiang Liu, Haonan Wang, Eugene Wu.
    CIDR 2024.
  12. Lightweight Materialization for Fast Dashboards Over Joins
    Zezhou Huang, Eugene Wu.
    SIGMOD 2024.
  13. Data Ambiguity Strikes Back: How Documentation Improves GPT's Text-to-SQL
    Zezhou Huang, Pavan Kalyan Damalapati, Eugene Wu.
    TRL@NeurIPS 2023, Video.
  14. Saibot: A Differentially Private Data Search Platform
    Zezhou Huang, Jiaxiang Liu, Daniel Gbenga Alabi, Raul Castro Fernandez, Eugene Wu.
    VLDB 2023.
  15. Kitana: Efficient Data Augmentation Search for AutoML
    Zezhou Huang, Pranav Subramaniam, Raul Castro Fernandez, Eugene Wu.
    arXiv.
  16. Random Forests over normalized data in CPU-GPU DBMSes
    Zezhou Huang, Pavan Kalyan Damalapati, Rathijit Sen, Eugene Wu.
    DaMoN@SIGMOD 2023, Slides.
  17. JoinBoost: Grow Trees Over Normalized Data Using Only SQL
    Zezhou Huang, Rathijit Sen, Jiaxiang Liu, Eugene Wu.
    VLDB 2023, Video 1, Video 2.
  18. Aggregation Consistency Errors in Semantic Layers and How to Avoid Them
    Zezhou Huang, Pavan Kalyan Damalapati, Eugene Wu.
    HILDA@SIGMOD 2023, Slides.
  19. Reptile: Aggregation-level Explanations for Hierarchical Data
    Zezhou Huang, Eugene Wu.
    SIGMOD 2022, Video, News, Interview.
  20. Calibration: A Simple Trick for Wide-table Delta Analytics
    Zezhou Huang, Eugene Wu.
    arXiv 2022.
  21. Spatial and hedonic analysis of housing prices in Shanghai
    Zezhou Huang, Ruishan Chen, Di Xu, Wei Zhou.
    Habitat International 2017.

Patents

  1. System and Method for Performing Query Operations on Run-Length-Encoded Data
    Rathijit Sen, Zezhou Huang, Matteo Interlandi, Marius Dumitru, Krystian Sakowski, et al.
    US Patent 12,277,123, granted Apr 2025.
  2. System and Method for Accelerating Query Execution
    Rathijit Sen, Zezhou Huang, Matteo Interlandi, Marius Dumitru, Carlo Aldo Curino, et al.
    US Patent 12,277,122, granted Apr 2025.

Service

SIGMOD 2027 PC · TaDA@VLDB 2026 PC · DEEM@SIGMOD 2026 PC · ICLR 2026 Reviewer · TRL@ACL 2025 PC · TaDA@VLDB 2025 PC · DEEM@SIGMOD 2025 PC · ICML 2025 Reviewer · TRL@NeurIPS 2024 PC · TaDA@VLDB 2024 PC · GUIDEAI@SIGMOD 2024 PC · DEEM@SIGMOD 2024 PC · DataPlat@ICDE 2024 PC · DBML@ICDE 2024 PC · TRL@NeurIPS 2023 PC · DBML@ICDE 2023 PC.