Kitana Agentic Wrapper
March – April 2025
TLDR
- Built an agentic wrapper on top of the Kitana AutoML system to intelligently select and clean joinable tables under a resource budget.
- Designed and tested multiple agentic architectures that combine embedding-based pruning with LLM-powered reasoning for table ranking and filtering.
- Achieved up to 12.5% higher R² than embedding-only baselines on synthetic table benchmarks, with support for both no-hop and multi-hop joins.
- Implemented the full pipeline in Python, using OpenAI and Gemini API calls for LLM reasoning and embedding generation.
Project Summary
As data lakes continue to grow, ML engineers face a painful bottleneck: identifying the right tables to clean and join to improve model performance. This project extends Columbia’s Kitana AutoML system, which searches for useful joins to improve a target column’s prediction accuracy.
We proposed an agentic system that decides which tables to clean under a budget, using past Kitana queries, accuracy improvements, and semantic cues. Our system beats a naive embedding-only baseline by up to 12.5% in R² and supports both no-hop and multi-hop table selection.
How can we leverage agentic planning to identify the most impactful tables — including multi-hop joins — while staying within a limited data cleaning budget?
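To make the budget constraint concrete, here is a minimal sketch of one way to pick tables under a cleaning budget: a greedy pass over candidates ranked by estimated gain per unit of cleaning cost. This is an illustration only, not the selection logic the project used. The `Candidate` fields, the table names, and all gain and cost numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    est_gain: float    # hypothetical: estimated R² improvement from joining this table
    clean_cost: float  # hypothetical: cost of cleaning it, in budget units (must be > 0)


def select_under_budget(candidates, budget):
    """Greedily pick tables by estimated gain per unit cost until the budget runs out."""
    ranked = sorted(candidates, key=lambda c: c.est_gain / c.clean_cost, reverse=True)
    chosen, remaining = [], budget
    for c in ranked:
        # Skip any table that no longer fits; cheaper ones later in the ranking may still fit.
        if c.clean_cost <= remaining:
            chosen.append(c.name)
            remaining -= c.clean_cost
    return chosen


# Toy candidates with made-up numbers, purely for illustration.
candidates = [
    Candidate("weather", est_gain=0.08, clean_cost=4.0),
    Candidate("census", est_gain=0.05, clean_cost=1.0),
    Candidate("transit", est_gain=0.03, clean_cost=2.0),
    Candidate("zoning", est_gain=0.01, clean_cost=3.0),
]
selected = select_under_budget(candidates, budget=5.0)
print(selected)
```

A greedy ratio rule is a common heuristic for knapsack-style problems and is not guaranteed to be optimal; an agentic planner could replace the fixed `est_gain` values with estimates drawn from past queries.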

Architecture: Selector Agent

Selector Agent: Embedding-based pruning + LLM-based enrichment pipeline
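The two stages named in the caption can be sketched roughly as follows, assuming table descriptions have already been embedded as vectors. This is a simplified stand-in rather than the project's actual code: the vectors are toy values, `prune_by_embedding` and `rerank` are hypothetical helper names, and the LLM step is replaced by a lookup table of made-up scores so the example runs without any API calls.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def prune_by_embedding(query_vec, table_vecs, keep):
    """Stage 1: keep the `keep` tables whose embeddings are closest to the query."""
    scored = sorted(
        table_vecs.items(),
        key=lambda kv: cosine(query_vec, kv[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:keep]]


def rerank(shortlist, score_fn):
    """Stage 2: reorder the shortlist by a usefulness score (an LLM call in a real system)."""
    return sorted(shortlist, key=score_fn, reverse=True)


# Toy embeddings; a real pipeline would obtain these from an embedding model.
query_vec = [1.0, 0.0, 0.0]
table_vecs = {
    "sales": [0.9, 0.1, 0.0],
    "stores": [0.7, 0.7, 0.0],
    "weather": [0.0, 1.0, 0.0],
    "logs": [0.0, 0.0, 1.0],
}

# Hypothetical stand-in for LLM judgments of how useful each join would be.
fake_llm_scores = {"sales": 0.4, "stores": 0.9}

shortlist = prune_by_embedding(query_vec, table_vecs, keep=2)
ranking = rerank(shortlist, lambda name: fake_llm_scores.get(name, 0.0))
print(shortlist, ranking)
```

The point of the split is cost: the cheap similarity pass narrows the pool so that the more expensive reasoning step only sees a handful of candidates.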
Code & Report
What I Learned
- Large language models can extract and reason about joinability far beyond what static embeddings offer.
- Agentic systems let us think about data engineering as a dynamic planning problem, not a static one-shot task.
- Thinking in terms of multi-hop joins opened my eyes to how limited most current data search systems really are.
Contributions and Acknowledgements
This project was completed as part of COMS 6113: Agentic Systems Made Real, a graduate-level research seminar taught by Professor Eugene Wu at Columbia University. I worked alongside Mateo Juliani (msj2164@columbia.edu) and Kaushal Damani (akd2990@columbia.edu).