AR2021-Humanizing AI: Filling the Gaps with Multi-Faceted Research (Joel Branch, Lucd)

—> Good afternoon! I work at Lucd, a company that helps other enterprises get their AI roadmaps planned out

 

 

—> There is a lot to be excited about regarding AI
  • Examples include driver-assistance features in cars that help you stay in your lane, and automated driving
  • Deepfakes are an example as well
—> My observation is that AI development is uninformed and hurried, resulting in deployments that don’t operate well in the real world
  • Reasons for this include infrastructure limits, and a lack of talent
—> But a major reason is a lack of humanized AI
  • Our goal should be creating governance frameworks that democratize the development of AI solutions
    • Not creating interfaces like Google Home or Alexa

 

—> AI is not new; it is as old as computing itself, but a lot of breakthroughs in the past 25 years have driven the advancement and overall popularity of the discipline
—> Popular milestones include:
  • 1997, Deep Blue beating Garry Kasparov
  • 2000, ASIMO robot displayed human-like actions
  • 2011, IBM Watson beating former Jeopardy champions
  • 2016, DeepMind’s AlphaGo beating the world Go champion
  • 2020, GPT-3 program exhibited language capabilities that could mimic human writing
—> AI commoditization has increased over time
  • A lot of key AI technology is just an API call away, with minimal code required
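—> A rough sketch of what “an API call away” can look like; the endpoint, key, and payload below are hypothetical placeholders, not any specific vendor’s API:

```python
import requests

# Hypothetical hosted sentiment-analysis service -- a stand-in for any
# commercial AI API, not a real vendor URL or schema.
API_URL = "https://api.example-ai-vendor.com/v1/sentiment"
API_KEY = "YOUR_API_KEY"

def classify_sentiment(text: str) -> dict:
    """Send raw text to a hosted model and return its prediction."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"text": text},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()  # e.g. {"label": "positive", "score": 0.93}

if __name__ == "__main__":
    print(classify_sentiment("The new lane-keeping assist works great."))
```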

 

—> Next, I want to cover trends in AI value and applications
—> Trends for AI are quite attractive
  • Gartner expects AI augmentation will create trillions of dollars in business value
  • What do I mean by AI augmentation?
    • Not a chatbot walking through a customer service flow, but rather how augmentation helps the business work better, from analyzing data to writing reports
—> AI engineering is also a top tech strategy
  • Developing AI through low-code to no-code frameworks, and hiring people with the deepest AI backgrounds

 

 

—> Next I want to talk about the state of AI Deployments, from the enterprise perspective
  • Even with hype and capabilities, AI efforts are largely failing
—> Companies report little to no impact from AI, and close to 90% of data science projects don’t make it into production

 

—> So why is enterprise AI failing?
—> For many reasons, and not just infrastructure problems
  • The differentiating factors and problems lie elsewhere
—> The starting point is a lack of organizational support
  • AI is limited to certain projects
  • There is a limited supply of AI talent
  • AI development teams are composed almost exclusively of AI-related talent
    • In my prior work at ESPN, my own AI work was isolated from the rest of ESPN’s organization

 


 

—> There are also adoption challenges
  • Close to 50% of effort is spent on data access and cleaning
—> There is black-box decision-making
  • Lack of transparency and rationality in AI output produced
  • AI, for context, works as probabilistic learning, with no hard-and-fast decisions
    • Done with a network of nodes and layers
  • Why the AI made a particular decision is hard to understand
  • Information goes in, something comes out, but there is no clear idea why (a minimal sketch follows this list)
—> AI Bias
  • Examples of race bias in online ads, job postings, etc.
  • Fueled by overly technically driven solutions, with mostly engineering staff at the table
    • But this misses the non-technical roles that can help handle bias
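—> To make the black-box point concrete, a minimal sketch (NumPy, with made-up weights) of how information goes in and a probability comes out, with nothing human-readable in between:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer network with arbitrary "learned" weights.  A real model has
# millions of these numbers, and no single one explains the decision.
W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 2)), rng.normal(size=2)

def predict(x: np.ndarray) -> np.ndarray:
    """Forward pass: features in, class probabilities out."""
    hidden = np.maximum(0.0, x @ W1 + b1)   # ReLU layer
    logits = hidden @ W2 + b2
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                  # softmax probabilities

applicant = np.array([0.7, 0.1, 0.5, 0.9])  # four input features (hypothetical)
print(predict(applicant))                   # e.g. [0.83 0.17] -- but why?
```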

 

 

—> Example of the Gender Shades project, which analyzed the accuracy of facial recognition systems from companies like Microsoft, Face++, and IBM
  • Evaluated the accuracy of the algorithms, and the balance of gender and skin types in the data used to train and evaluate the AI
—> Exposure of biased performance resulted in internal reviews at Microsoft and IBM
—> The results?
  • Some facial detection algorithms do better on men than on women
  • The unfairness between lighter- and darker-skinned subjects is significant, with accuracy going from 97% down to 75%
  • The bias problem gets worse the more features are intersected

 

 

—> AI efforts are also failing through an immature development process
  • A short process that immature orgs/departments tend to follow
—> There’s straightforward data collection, but with biased or insufficient data
—> People then directly create a model, without explainability (on their laptop, with little expert oversight)
  • A lot of the work is untracked and unversioned
  • Little to no audit trail
—> The model goes right into a deployed system and begins causing a mess
  • The AI is inaccurate, unfair, or untrustworthy

 

 

—> To recap, we talked about why you should care about AI, even from just a financial point of view
—> Then discussed why AI is failing
  • Largely caused by technical talent being left to their own devices
—> So, what do we do?

 

—> First order of business: address the development workflow shown earlier
  • Overall, MLOps (borrowed from DevOps) offers governance, accountability, and clear stakeholder responsibilities to manage AI deployment
  • There is a distinct phase for data analysis
    • And a phase for validating data; if the data is invalid or inaccurate, you can go backwards (see the sketch after this list)
  • There is also work to prepare the data and make sure the data that goes into the model is good for production
—> Then you finally have ML system deployment
  • Still, this is only part of the solution, as the guidelines emphasize the roles of engineers
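—> A minimal sketch of what that data-validation gate might look like; the schema, thresholds, and file name are illustrative assumptions, not a specific MLOps framework:

```python
import pandas as pd

# Assumed schema for the training data (hypothetical column names and types).
EXPECTED_COLUMNS = {"age": "int64", "income": "float64", "label": "int64"}

def validate(df: pd.DataFrame) -> list[str]:
    """Return a list of problems; an empty list means the data can move on."""
    problems = []
    for col, dtype in EXPECTED_COLUMNS.items():
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            problems.append(f"{col}: expected {dtype}, got {df[col].dtype}")
    if not df.empty and df.isna().mean().max() > 0.05:  # >5% nulls in any column
        problems.append("too many missing values")
    return problems

df = pd.read_csv("training_data.csv")  # hypothetical file
issues = validate(df)
if issues:
    # "Go backwards": return to data analysis/collection instead of training.
    raise ValueError("Data validation failed: " + "; ".join(issues))
```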

 

 

—> We expand the number of people brought into the MLOps process, adding people to the mix, including a product researcher and an AI ethicist, to make sure it’s all developed responsibly with the target customer in mind
—> How do we get to include these folks?
  • Enter product researcher and HCI researcher

 

 

—> Relevant challenges
  • Exploding amount of data, and dimensions associated with data
    • Data is getting bigger and wider
  • Intersectional bias (hard to identify)
—> Two solutions
  • Latent Space Data Exploration: new ways to visualize multi-dimensional data and see the relationships within it, so people can catch bad data
  • Data Sub-Group Analysis: identifying sub-groups for comparative analysis of the AI
    • Need visualization experts, together with audit specialists, to essentially look at the different groups in the AI training data and compare the results across those data sets as closely as possible
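—> A minimal sketch of sub-group (and intersectional) accuracy analysis in the spirit of the Gender Shades audit; the file and column names below are hypothetical:

```python
import pandas as pd

# Hypothetical audit file: one row per test example with the model's prediction.
df = pd.read_csv("face_audit_results.csv")  # columns: gender, skin_type, label, prediction
df["correct"] = df["label"] == df["prediction"]

# Accuracy disaggregated by a single attribute...
print(df.groupby("gender")["correct"].mean())

# ...and by the intersection of attributes, where the gaps are usually widest
# (e.g. darker-skinned women vs. lighter-skinned men).
print(df.groupby(["gender", "skin_type"])["correct"].mean().sort_values())
```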

 

—> Example of latent space data: visualizing how handwritten digits are grouped together in a latent space, where the 12*12 features are compressed
  • It’s then a matter of figuring out whether the data can be separated
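—> A minimal sketch of that kind of latent-space exploration, using scikit-learn’s built-in digits data (8*8-pixel images here, standing in for the 12*12 example from the talk) and PCA as the compression step:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()  # 1,797 images, 64 features each (8*8 pixels)

# Compress 64 dimensions down to 2 so the structure can be plotted.
embedding = PCA(n_components=2).fit_transform(digits.data)

plt.scatter(embedding[:, 0], embedding[:, 1], c=digits.target, cmap="tab10", s=8)
plt.colorbar(label="digit")
plt.title("Digits projected into a 2-D latent space")
plt.show()
# Well-separated clusters suggest the classes can be told apart; overlapping or
# stray points are candidates for bad or ambiguous data.
```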

 

—> Challenges in HCI
  • Explainability: the lack of explainability of AI models limits the capacity for deployment and evaluation
  • MLOps requires a lot of automation, but this automation is still immature and requires people to have AI domain knowledge

 

 

—> Next, I’ll show a slide on AI explainability
  • The left-hand side shows features and values: a visual of an AI model that lists out the features key to making its decisions (see the sketch after this list)
—> We don’t just want explainability for ML engineers; we want to get at causality as well
  • HCI experts need an explainability framework to create something cognitively understandable for humans
    • Including metrics of explainability (i.e., goodness of explanations)
—> We can integrate AI explanations with knowledge graphs or ontologies, and translate the AI’s output into something humans can understand
  • Or augmenting training data used for AI models with features that can be expressed by a human
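—> As one hedged example of the “features and values” style of explanation, a sketch using scikit-learn’s permutation importance on a generic tabular model; this is a stand-in technique, not the specific framework shown on the slide:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure how much accuracy drops: the
# features whose shuffling hurts most were key to the model's decisions.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranking = sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1])
for name, score in ranking[:5]:
    print(f"{name:30s} {score:.3f}")
```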

 

—> Humanized AI is necessary and important; it is still maturing, and many open problems exist that require a cross-disciplinary focus
—> Thank you!

 

Q&A
  1. Is there a meta-pattern or algorithm for QA-ing AI systems for other features (other than biological features)?
A: There are explainability techniques specifically for the medical/biological arena, and for natural language processing
—> There might be other QA techniques, but he can’t name any right now

2. Is there a heuristic that you use to help you look for problems that only reveal themselves at scale?

A: What he’s seen is a discipline of monitoring models in production.
  • If you have a model in production, do a warm start of a new model
    • Don’t just start a new recommendation engine cold on January 1st
  • Also do synthetic data analysis (when you don’t have enough training data for a particular domain)
    • You can use as much data as you can get from the Internet as a proxy
    • Be willing to give models time, and a distribution of daily data
3. Not sure if I’m following this correctly, but is latent space visualization effectively just a visualization of the parameters identified by exploratory factor analysis?
—> No, latent space focuses on compression of data such as text data or image data
  • Looks at how you can compress multi-dimensional data into lower-dimensional data (i.e., analyzing three dimensions instead of thirty dimensions)
  • Look at PCA to see what it means to go from a high- to a low-dimensional space for a latent space
—> You are visualizing features, not parameters
  • The technique isn’t perfect, in that you have to work with the parameters in order to see whether the data can be clustered in any way
    • Techniques exist to help automate that
4. What’s our current knowledge about AI trust? What do we know, and what are the gaps?
A: It’s evolving, so it’s hard to say exactly what we know about AI trust
  • The tools we have now fall into a combination of explainability analysis and explainability visualization
—> The ultimate goal of AI trust is for a human to be able to ask the AI a question, and for the AI to convincingly explain why it gave that specific answer over other answers
  • There needs to be a back-and-forth between the person and the AI