Day 1 – Optimizing AI Conversations: A Case Study in Personalized Shopping Assistance Frameworks

— Hello everyone, glad to be here to talk about how to design AI for truly intelligent conversations
- How do we bridge the gap between what users expect of an AI assistant and the actual experience?
— Goal is to move AI from a reactive tool to a proactive guide that works through natural and intuitive interaction
- Reasoning is a model for structuring information into helpful responses
— Before we dive in, ask yourself whether you have had a bad experience with a chatbot or virtual assistant
- Will talk about how we approached the challenge, the significant impact it had on the user experience, and the why behind the design decisions

— Evolution of conversation design: a look at where we came from
- Early bots were built on rule-based scripts and clunky decision trees. The systems were brittle and broke easily
— Then came intent classifiers, which mapped user input to pre-defined labels and decision trees and recognized variations of a request
- Needed to code for every conceivable variation
— Now with LLMs we have a new era of contextual inference: inferring the meaning of the user's input and grasping the intent behind the words
- The work becomes designing how the conversation should reason, and orchestrating the LLM's cognitive process
— Think of GPS, which used to require the user to enter exact steps, but evolved to understand broader instructions and calculate routes dynamically

— Building on this understanding of conversation design, the focus moved to the specific challenges the team faced in creating a truly helpful AI shopping assistant
- Goal was personalized guidance from the assistant
— Assembled a team of diverse and dedicated members (conversation designers, data scientists, and researchers)
- Collaboration let us approach the problem from multiple angles, with data-driven insights
— Important to have design and data working together
— Initial research and analysis showed the key problem
- Users felt overwhelmed and misunderstood
- Needed to grasp why users were feeling this way

— There are two layers of user friction
- The catalog exploration paradox: a clash between the mental model of online shopping and the results the AI presents
- The expectation gap: where the assistant guesses instead of clarifying user intent
- The two layers are intertwined, so we needed to tackle both problems simultaneously

— Example of the catalog visualization paradox
- The online shopping experience has real value: the e-commerce layout is something users are accustomed to, with high visual bandwidth
- Images and descriptions are presented all at once
— Efficient browsing and comparison

— This approach has drawbacks
- Abundance of choices leads to information overload, which AI can help mitigate
— How can we keep the benefits of rich visual bandwidth without the overwhelm, using personalized guidance and compliments?
— Now let’s look at conversational AI interaction
- Conversational AI restricts the number of items that can be displayed
- User choices are restricted, so people wonder whether they are missing other options, which reduces confidence in the AI's ability to provide a full overview of available products
- When the AI fails to understand complex user input, it asks unnecessary questions, which frustrates users who could otherwise just browse
- The limited preview restricts their ability to do catalog exploration tasks like filtering and browsing

— To solidify this, see the example above
- The left interface lets the user scan and compare, but the AI has to be prompted every time to get a result

— Example of the expectation gap: how the AI should work versus how it actually answers questions
- User asks for something for Diwali: a classic example of the expectation gap, where the assistant misses the cultural context and the gender they were shopping for, which erodes trust
— If an AI fails to meet unspoken expectations, the user loses confidence in the system. It's equivalent to someone not answering the question you asked them

— Our team was faced with a mountain of evidence of broken UX, from rage-clicks to a drop-off in usage
- We shifted our perspective: we didn't need better NLP, we needed to define the core problem and stop treating conversations as linear flows, treating them instead as complex reasoning systems
- These would be systems powered by AI, but orchestrated by us
- This wasn't a technical challenge, but a design, communication, and psychological challenge

— We analyzed how users interact with the assistant, going through thousands of chat logs to see how they phrased requests
- We had not realized how much queries varied in the amount of information they contained
— Three patterns emerged in user queries
- Broad queries: little info for the assistant to work with
- Medium queries: somewhat more detail, which helped narrow down the search space
- Narrow queries: very specific requests with a lot of information upfront
— Traditional bots treated all these queries the same way and relied on keyword matching, regardless of the diversity of input
- An LLM can adapt its response to the nuance of a query, but only if we provide a robust structure to reason with, so we asked whether the model could break down a query and analyze it like a salesperson would

— A concrete example of this can be seen above
- A traditional system would grab the keyword "laptop", but an LLM can provide a much more sophisticated analysis. We taught our model to break down user input into three key structural facets, starting with the primary function, which sets the performance requirements
— Asking for a gaming laptop implies processor requirements; the other two facets are constraints, like the available budget, and preferences, which are desirable but flexible criteria
- Crucial to emphasize that this structure came not from fine-tuning or ML models, but from carefully crafted prompt design
- Prompts were structured to guide the LLM to output info in a format the system could easily use, like a salesperson grasping a customer's needs
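
A minimal sketch of what such a structured prompt and its parser might look like. The talk did not show the real prompts, so the prompt wording, the JSON keys, and the names `PROMPT_TEMPLATE`, `QueryFacets`, `build_prompt`, and `parse_facets` are all assumptions made for illustration. The call to the LLM itself is left out; only the prompt construction and the parsing of a reply are shown.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical prompt wording; the real prompts were not shown in the talk.
PROMPT_TEMPLATE = (
    "You are a shopping assistant. Read the customer's request and reply "
    "with ONLY a JSON object that has three keys:\n"
    '  "primary_function": what the product is mainly for, or null if unknown\n'
    '  "constraints": list of hard limits, such as budget\n'
    '  "preferences": list of desirable but flexible criteria\n'
    "Customer request: {query}\n"
)


@dataclass
class QueryFacets:
    """The three facets from the talk, in a form downstream code can use."""
    primary_function: Optional[str] = None
    constraints: List[str] = field(default_factory=list)
    preferences: List[str] = field(default_factory=list)


def build_prompt(query: str) -> str:
    """Fill the template with the user's (trimmed) request."""
    return PROMPT_TEMPLATE.format(query=query.strip())


def _string_list(value) -> List[str]:
    """Keep only string items; anything else is treated as missing."""
    if not isinstance(value, list):
        return []
    return [item for item in value if isinstance(item, str)]


def parse_facets(raw_reply: str) -> QueryFacets:
    """Parse the model's reply; fall back to empty facets on bad output."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        return QueryFacets()
    if not isinstance(data, dict):
        return QueryFacets()
    function = data.get("primary_function")
    return QueryFacets(
        primary_function=function if isinstance(function, str) else None,
        constraints=_string_list(data.get("constraints")),
        preferences=_string_list(data.get("preferences")),
    )
```

Falling back to empty facets on malformed output is one possible design choice: the rest of the system can then treat an unparseable reply like a query with no detail instead of failing.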

— Here's how the AI assistant applied the framework
- If the query is broad and lacks detail, ask follow-up questions to get the use case and other information, e.g. budget and what the laptop will be used for
- If the query is medium, skip clarifying questions, show results, and ask if more is needed
— The AI thinks through the conversation and considers what details still need to be provided; we crafted prompts with logic to guide the assistant through its decision-making process
— This architecture is what makes the assistant adaptive and allows it to handle a wide range of user inputs with flexibility and intelligence
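
The branching above can be sketched in plain code, with the caveat that the team expressed this logic inside prompts, not in application code. Everything here is an assumption for illustration: the `QueryFacets` shape, the rule that breadth is judged by how many facets are filled, the thresholds, the question wording, and the behavior for narrow queries, which the notes do not spell out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class QueryFacets:
    """Hypothetical container for the three facets extracted from a query."""
    primary_function: Optional[str] = None
    constraints: List[str] = field(default_factory=list)
    preferences: List[str] = field(default_factory=list)


def classify_breadth(facets: QueryFacets) -> str:
    """Label a query broad, medium, or narrow by how many facets are filled.

    Assumed thresholds: 0 filled = broad, 1 or 2 = medium, 3 = narrow.
    """
    filled = sum([
        facets.primary_function is not None,
        bool(facets.constraints),
        bool(facets.preferences),
    ])
    if filled == 0:
        return "broad"
    if filled < 3:
        return "medium"
    return "narrow"


def next_action(facets: QueryFacets) -> Tuple[str, List[str]]:
    """Return (action, messages to the user) for the next turn."""
    breadth = classify_breadth(facets)
    if breadth == "broad":
        # Too little to go on: ask about use case and budget first.
        return "clarify", [
            "What will you mainly use it for?",
            "Do you have a budget in mind?",
        ]
    if breadth == "medium":
        # Enough to search: show results, then offer to refine.
        return "show_results", ["Want me to narrow these down further?"]
    # Narrow (assumed): the request is specific enough to answer directly.
    return "show_results", []


# Usage: a bare "laptop" query versus "gaming laptop under $1500".
print(next_action(QueryFacets()))
print(next_action(QueryFacets("gaming", ["under $1500"])))
```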

— Here’s a side-by-side comparison of the current and improved experience
— In the status quo
- Results previewed immediately, which might be unhelpful
- Two or three predefined choices, which restricted the user's ability to specify their own requirements
- Mandatory checklist, so people feel interrogated, not helped
- Endless follow-up questions
— With the improved AI assistant
- Leads with reasoning, not results: a brief, intelligent conversation about needs, goals, and what matters to the user
- Ends gracefully once it has enough info to provide helpful recommendations, avoids repetitive questions, and expands to display multiple options
— This shows the power of structuring AI behavior and guiding its intelligence to shape the user experience

— Favorite part was the results: a 2.5x increase in monthly active users, with people using the assistant more frequently and spending more time interacting with it, higher levels of user satisfaction and engagement, a substantial last-touch result, and users taking desired actions with different AI assistants
— What excited us the most was qualitative feedback from users
- Our approach felt validated
- The LLM was no longer a tool for a specific task, but a trusted guide and advisor
— More natural language and fewer frustration markers demonstrate the power of designing for reasoning, not for a response

— Core takeaway: design for reasoning, as opposed to designing for a response
- Designing for the right response pushes the AI toward specific answers, putting efficiency over understanding
- Focus on designing prompts and frameworks that empower the AI to understand the intent behind the user’s query, extract the relevant info, and provide information on how to respond
- Like teaching a student how to think
- This can overcome the limitations of keyword matching and yield far more adaptable and helpful AI assistants

— Thank you
Q&A
- Can you explain U/V Last touch a bit more?
- A metric created to see how far chatbot interactions were responsible for making a purchase happen
- Did this require developing or refining taxonomy terms?
- Prompting helped us far more than taxonomy did, together with the orchestration layer and the relevant agents, and without extensive tagging
- Limited prompting
- Did you explore ways to help users ask better questions through the design?
- It was not explicit, but implicit. The prompts encouraged people to ask longer questions and made them feel comfortable typing what they wanted
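
On the last-touch question from the start of the Q&A: the notes do not say how the metric was computed, so the following is only a sketch of generic last-touch attribution, where a purchase is credited to whichever channel the user touched most recently before buying. The `Event` shape, the channel names, and `last_touch_share` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    """Hypothetical event record; real logging schemas will differ."""
    user_id: str
    timestamp: int
    kind: str          # "touch" or "purchase"
    channel: str = ""  # e.g. "assistant", "search"; unused for purchases


def last_touch_share(events: List[Event], channel: str = "assistant") -> float:
    """Fraction of purchases whose most recent prior touch was `channel`."""
    last_touch: Dict[str, str] = {}
    purchases = 0
    credited = 0
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.kind == "purchase":
            purchases += 1
            if last_touch.get(event.user_id) == channel:
                credited += 1
            # Start a fresh attribution window after each purchase.
            last_touch.pop(event.user_id, None)
        else:
            last_touch[event.user_id] = event.channel
    return credited / purchases if purchases else 0.0


# Usage: u1 buys right after chatting, u2 buys after a search.
events = [
    Event("u1", 1, "touch", "search"),
    Event("u1", 2, "touch", "assistant"),
    Event("u1", 3, "purchase"),
    Event("u2", 1, "touch", "assistant"),
    Event("u2", 2, "touch", "search"),
    Event("u2", 3, "purchase"),
]
print(last_touch_share(events))
```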