Zezhou Huang

zh2408@columbia.edu

About Me

I'm Zezhou Huang. You can call me Zachary.

I'm a PhD student at Columbia University advised by Professor Eugene Wu. My work centers on developing a semantic layer for large join graphs in cloud data warehouses. The image below shows what a join graph from a dataset like IMDB can look like. It's a bit messy, right? My job is to clean up that mess.

My previous projects built interactive dashboards, ML systems, and data discovery tools on top of join graphs. Now I'm also exploring how to solve data problems with large language models (LLMs) and how to speed up query processing with GPU acceleration.

The Google PhD Fellowship and the Avanessian Fellowship provide generous funding for my research.

Publications

  1. Data Cleaning Using Large Language Models
    Shuo Zhang, Zezhou Huang, Eugene Wu.
    Under Review.
  2. Data-Centric Text-to-SQL with Large Language Models
    Zezhou Huang, Shuo Zhang, Kechen Liu, Eugene Wu.
    TRL@NeurIPS 2024.
  3. Transform Table to Database Using Large Language Models
    Zezhou Huang, Jia Guo, Eugene Wu.
    TaDA@VLDB 2024.
  4. SET: Searching Effective Supervised Learning Augmentations in Large Tabular Data Repositories
    Jiaxiang Liu, Zezhou Huang, Eugene Wu.
    GUIDEAI@SIGMOD 2024.
  5. Disambiguate Entity Matching through Relation Discovery with Large Language Models
    Zezhou Huang.
    GUIDEAI@SIGMOD 2024.
  6. Cocoon: Semantic Table Profiling Using Large Language Models
    Zezhou Huang, Eugene Wu.
    HILDA@SIGMOD 2024.
  7. Relationalizing Tables with Large Language Models: The Promise and Challenges
    Zezhou Huang, Eugene Wu.
    DBML@ICDE 2024, Video.
  8. The Fast and the Private: Task-based Dataset Search
    Zezhou Huang, Jiaxiang Liu, Haonan Wang, Eugene Wu.
    CIDR 2024.
  9. Lightweight Materialization for Fast Dashboards Over Joins
    Zezhou Huang, Eugene Wu.
    SIGMOD 2024.
  10. Data Ambiguity Strikes Back: How Documentation Improves GPT's Text-to-SQL
    Zezhou Huang, Pavan Kalyan Damalapati, Eugene Wu.
    TRL@NeurIPS 2023, Video.
  11. Saibot: A Differentially Private Data Search Platform
    Zezhou Huang, Jiaxiang Liu, Daniel Gbenga Alabi, Raul Castro Fernandez, Eugene Wu.
    VLDB 2023.
  12. Kitana: Efficient Data Augmentation Search for AutoML
    Zezhou Huang, Pranav Subramaniam, Raul Castro Fernandez, Eugene Wu.
    arXiv.
  13. Random Forests over normalized data in CPU-GPU DBMSes
    Zezhou Huang, Pavan Kalyan Damalapati, Rathijit Sen, Eugene Wu.
    DaMoN@SIGMOD 2023, Slides.
  14. JoinBoost: Grow Trees Over Normalized Data Using Only SQL
    Zezhou Huang, Rathijit Sen, Jiaxiang Liu, Eugene Wu.
    VLDB 2023, Video 1, Video 2.
  15. Aggregation Consistency Errors in Semantic Layers and How to Avoid Them
    Zezhou Huang, Pavan Kalyan Damalapati, Eugene Wu.
    HILDA@SIGMOD 2023, Slides.
  16. Reptile: Aggregation-level Explanations for Hierarchical Data
    Zezhou Huang, Eugene Wu.
    SIGMOD 2022, Video, News, Interview.
  17. Calibration: A Simple Trick for Wide-table Delta Analytics
    Zezhou Huang, Eugene Wu.
    arXiv.
  18. Spatial and hedonic analysis of housing prices in Shanghai
    Zezhou Huang, Ruishan Chen, Di Xu, Wei Zhou.
    Habitat International 2017.

Service

Random

I developed a game that was once live on the App Store; it is now offline because of the Apple Developer Program fee. Someone made a gameplay video about it. If there's interest, I can bring it back.