Building an AI agent is relatively easy; making it reliable enough to handle mission-critical customer support and million-dollar freight logistics is the challenge.
In this debut episode of our podcast, I sat down with Victor Sulaiman (Senior PM at Wayfair) and Aman Khan (Head of Product at Arize AI) to get into the weeds of how one of the world’s largest retailers moved beyond the “pilot phase”. Wayfair is one of the few retailers shipping Agentic AI at scale, and making it reliable enough to run mission-critical applications in customer support, freight logistics, and a globalized product catalog. If you’re trying to figure out where the ROI actually lives in Agentic AI, this one is for you.
Practical Insights from the Episode:
Agentic Commerce: The new SEO! The future of retail is not search, but an agent that finds, visualizes, and recommends products in entirely new ways [15:31]
The big shift in retail is that customers now ask chat interfaces what to buy. ChatGPT, Gemini, and Claude-based agents are becoming the new top of funnel. Wayfair has chosen to lean in, partnering with all three foundation labs to make sure its catalog is surfaced when shoppers ask AI agents about home decor. Victor draws an explicit parallel to the early SEO era: retailers who refused to optimize for Google because they had enough foot traffic eventually found themselves invisible!
“Imagine you’re at a party and you love the table. You can circle to search it and say ‘where did we buy this table?’ That gets surfaced from a catalog within Wayfair. Discovery is going to start changing beyond just SEO and keywords — into image catalog and image classification.” — Victor Sulaiman
Customer Service Win: Wayfair reduced ticket resolution times from 7 days to 2 days by automating low-lift tickets, which surprisingly boosted employee satisfaction by letting staff focus on high-impact work [03:08]
Agentic Logistics: A fascinating look at how Wayfair is using LLM reasoning—not just traditional algorithms—to optimize freight capacity and container volume [19:41]
Wayfair’s freight problem has historically been an optimization-algorithm problem. Containers were getting shipped at ~10% capacity because the system was rigidly optimizing for speed. The team turned to LLM reasoning to dynamically decide what to add to a container, where to route it, and how to maximize volume without slipping delivery promises. Early results show container-volume gains and meaningful savings.
LLM as a Jury: Wayfair uses multiple LLMs to “deliberate” on decisions like furniture translations to avoid hilarious (and costly) errors [31:10]
Wayfair sells globally, translating product catalogs into many languages at scale. They started with an LLM-as-a-judge approach: a single model deciding whether a translation was correct.
“‘Sofa’ in Polish would be ‘rug.’ The judge LLM said: ‘Yes, this is from English. Yes, this word is in Polish. Yes, this is home decor. Therefore, this translation makes sense.’” — Victor Sulaiman
The fix: replace the single judge with a jury of multiple LLMs that surface a set of recommendations, then escalate ambiguous cases to a human. The jury catches the 1% that breaks the customer experience.
Aman has seen Arize customers adopt LLM as a Jury across regulated and customer-facing use cases. It’s how you get reliability without bottlenecking on humans.
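To make the jury idea concrete, here is a minimal Python sketch. It is not Wayfair's or Arize's implementation: the function name `jury_verdict`, the `min_agreement` threshold, and the three stand-in jurors are all assumptions for illustration. In a real system each juror would wrap a call to a different LLM; here they are plain functions so the example runs on its own. The Polish words in the glossary are only sample data.

```python
from collections import Counter
from typing import Callable, Sequence

# A juror takes (source_term, translation) and returns "approve" or "reject".
# In practice each juror would call a different LLM; these are stand-ins.
Juror = Callable[[str, str], str]


def lenient_juror(source: str, translation: str) -> str:
    # Mimics the failure in the quote: approves anything that looks plausible.
    return "approve"


def glossary_juror(source: str, translation: str) -> str:
    # Hypothetical glossary check; the entries are illustrative sample data.
    glossary = {"sofa": {"sofa", "kanapa"}}
    allowed = glossary.get(source.lower(), set())
    return "approve" if translation.lower() in allowed else "reject"


def jury_verdict(
    source: str,
    translation: str,
    jurors: Sequence[Juror],
    min_agreement: float = 1.0,
) -> str:
    """Return the jury's verdict, or escalate when agreement is too low."""
    if not jurors:
        raise ValueError("at least one juror is required")
    votes = Counter(juror(source, translation) for juror in jurors)
    top_verdict, top_count = votes.most_common(1)[0]
    if top_count / len(jurors) >= min_agreement:
        return top_verdict
    # Jurors disagree, so the case is ambiguous: hand it to a person.
    return "escalate_to_human"


JURORS = [lenient_juror, lenient_juror, glossary_juror]
```

With these stand-ins, a sound translation gets a unanimous approval, while the sofa-to-rug mix-up splits the jury and is routed to a human instead of being waved through by a single lenient judge. Lowering `min_agreement` trades human review load for risk.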
Build vs. Buy is Becoming “Buy and Build” — Victor’s framework: plot every system on a spectrum of control needed versus iteration speed needed, and let that drive the choice.
“Right now it’s ‘I have a problem, do I build or buy?’ I think that’s going to change to ‘buy, and then build on top of it.’” — Victor Sulaiman
The Pilot Trap: Choosing the right agent for the right job is key. Sometimes a simple “reflex agent” beats a complex hierarchical one [47:20]
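For readers who have not met the term, a simple reflex agent is one that maps the current input straight to an action through condition-action rules, with no planning or memory. The Python sketch below is only an illustration of that pattern applied to support tickets; the rules, action names, and `reflex_agent` function are hypothetical and not taken from the episode.

```python
from typing import Callable, List, Tuple

# Each rule pairs a condition on the ticket text with an action name.
# The rules and action names below are hypothetical examples.
Rule = Tuple[Callable[[str], bool], str]

RULES: List[Rule] = [
    (lambda text: "where is my order" in text or "tracking" in text,
     "send_tracking_link"),
    (lambda text: "change address" in text,
     "open_address_form"),
]


def reflex_agent(ticket_text: str, rules: List[Rule] = RULES) -> str:
    """Return the action of the first matching rule, else escalate."""
    text = ticket_text.lower()
    for condition, action in rules:
        if condition(text):
            return action
    # No rule fired: leave the ticket for a human or a more capable agent.
    return "escalate_to_human"
```

For example, `reflex_agent("Where is my order?")` returns `"send_tracking_link"`, while a ticket about a scratched table matches no rule and is escalated. The appeal of this design is that it is cheap, predictable, and easy to audit, which is often enough for low-lift tickets.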
📖 Full episode with timestamps, key takeaways, and show notes
