AR2021-Humanizing AI: Filling the Gaps with Multi-Faceted Research (Joel Branch, Lucd)

—> Good afternoon! I work at Lucd, a company that helps other enterprises get their AI roadmaps planned out

 

 

—> There is a lot to be excited about regarding AI
  • Examples include driver-assistance features in cars that help you stay in your lane, and automated driving
  • Deepfakes are an example as well
—> My observation is that AI development is uninformed and hurried, resulting in deployments that don’t operate well in the real world
  • Reasons for this include infrastructure limits, and a lack of talent
—> But a major reason is a lack of humanized AI
  • Our goal should be creating governance frameworks that democratize the development of AI solutions
    • Not creating interfaces like Google Home or Alexa

 

—> AI is not new; it is as old as computing itself, but a lot of breakthroughs in the past 25 years have driven the advancement and overall popularity of the discipline
—> Popular milestones include:
  • 1997, Deep Blue beating Garry Kasparov
  • 2000, ASIMO robot displayed human-like actions
  • 2011, IBM Watson beating former Jeopardy champions
  • 2016, DeepMind’s AlphaGo beating the world Go champion
  • 2020, GPT-3 program exhibited language capabilities that could mimic human writing
—> AI commoditization has increased over time
  • A lot of key AI technology is just an API call away, with minimal code required
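—> A rough sketch of what “an API call away” can look like; the endpoint, key, and payload below are hypothetical placeholders, not any specific vendor’s API:

```python
import requests

# Hypothetical hosted sentiment-analysis service -- a stand-in for any
# commercial AI API, not a real vendor URL or schema.
API_URL = "https://api.example-ai-vendor.com/v1/sentiment"
API_KEY = "YOUR_API_KEY"

def classify_sentiment(text: str) -> dict:
    """Send raw text to a hosted model and return its prediction."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"text": text},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()  # e.g. {"label": "positive", "score": 0.93}

if __name__ == "__main__":
    print(classify_sentiment("The new lane-keeping assist works great."))
```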

 

—> Next, I want to cover trends in AI value and applications
—> Trends for AI are quite attractive
  • Gartner expects AI augmentation will create trillions of dollars in business value
  • What do I mean by AI augmentation?
    • Not a chatbot walking through a customer service flow, but rather how augmentation helps the business work better, from analyzing data to writing reports
—> AI engineering is also a top tech strategy
  • Developing AI through low-code to no-code frameworks, and hiring people with the deepest AI backgrounds

 

 

—> Next I want to talk about the state of AI Deployments, from the enterprise perspective
  • Even with hype and capabilities, AI efforts are largely failing
—> Companies report little to no impact from AI, and close to 90% of data science projects don’t make it into production

 

—> So why is enterprise AI failing?
—> For many reasons, and not just infrastructure problems
  • The differentiating factors and problems lie elsewhere
—> The starting point is a lack of organizational support
  • AI is limited to certain projects
  • There is a limited supply of AI talent
  • AI development teams are composed almost exclusively of AI-related talent
    • In my prior work at ESPN, my own AI work was isolated from the rest of ESPN’s organization

 


 

—> There are also adoption challenges
  • Close to 50% of effort is spent on data access and cleaning
—> There is black-box decision-making
  • Lack of transparency and rationality in AI output produced
  • AI, for context, works as probabilistic learning, with no hard-and-fast decisions
    • Done with a network of nodes and layers
  • Why the AI made a particular decision is hard to understand
  • Information goes in, something comes out, but there is no clear idea why (a minimal sketch follows this list)
—> AI Bias
  • Examples of race bias in online ads, job postings, etc.
  • Fueled by overly technically driven solutions, with mostly engineering staff at the table
    • But this misses the non-technical roles that can help handle bias
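—> To make the black-box point concrete, a minimal sketch (NumPy, with made-up weights) of how information goes in and a probability comes out, with nothing human-readable in between:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer network with arbitrary "learned" weights.  A real model has
# millions of these numbers, and no single one explains the decision.
W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 2)), rng.normal(size=2)

def predict(x: np.ndarray) -> np.ndarray:
    """Forward pass: features in, class probabilities out."""
    hidden = np.maximum(0.0, x @ W1 + b1)   # ReLU layer
    logits = hidden @ W2 + b2
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                  # softmax probabilities

applicant = np.array([0.7, 0.1, 0.5, 0.9])  # four input features (hypothetical)
print(predict(applicant))                   # e.g. [0.83 0.17] -- but why?
```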

 

 

—> Example of the Gender Shades project, which analyzed the accuracy of facial recognition systems from companies like Microsoft, Face++, and IBM
  • Evaluated the accuracy of the algorithms, and the balance of gender and skin types in the data used to train and evaluate the AI
—> Exposure of biased performance resulted in internal reviews at Microsoft and IBM
—> The results?
  • Some facial detection algorithms do better on men than on women
  • The unfairness between lighter- and darker-skinned subjects is significant, with accuracy going from 97% down to 75%
  • The bias problem gets worse the more features are intersected

 

 

—> AI efforts are also failing through an immature development process
  • A short process that immature orgs/departments tend to follow
—> There’s straightforward data collection, but with biased or insufficient data
—> People then directly create a model, without explainability (on their laptop, with little expert oversight)
  • A lot of the work is untracked and unversioned
  • Little to no audit trail
—> The model goes right into a deployed system and begins causing a mess
  • The AI is inaccurate, unfair, or untrustworthy

 

 

—> To recap, we talked about why you should care about AI, even from just a financial point of view
—> Then discussed why AI is failing
  • Largely caused by technical talent being left to their own devices
—> So, what do we do?

 

—> First order of business: address the development workflow shown earlier
  • Overall, MLOps (borrowed from DevOps) offers governance, accountability, and clear stakeholder responsibilities to manage AI deployment
  • There is a distinct phase for data analysis
    • And a phase for validating data; if the data is invalid or inaccurate, you can go backwards (see the sketch after this list)
  • There is also work to prepare the data and make sure the data that goes into the model is good for production
—> Then you finally have ML system deployment
  • Still, this is only part of the solution, as the guidelines emphasize the roles of engineers
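—> A minimal sketch of what that data-validation gate might look like; the schema, thresholds, and file name are illustrative assumptions, not a specific MLOps framework:

```python
import pandas as pd

# Assumed schema for the training data (hypothetical column names and types).
EXPECTED_COLUMNS = {"age": "int64", "income": "float64", "label": "int64"}

def validate(df: pd.DataFrame) -> list[str]:
    """Return a list of problems; an empty list means the data can move on."""
    problems = []
    for col, dtype in EXPECTED_COLUMNS.items():
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            problems.append(f"{col}: expected {dtype}, got {df[col].dtype}")
    if not df.empty and df.isna().mean().max() > 0.05:  # >5% nulls in any column
        problems.append("too many missing values")
    return problems

df = pd.read_csv("training_data.csv")  # hypothetical file
issues = validate(df)
if issues:
    # "Go backwards": return to data analysis/collection instead of training.
    raise ValueError("Data validation failed: " + "; ".join(issues))
```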

 

 

—> We expand the number of people brought into the MLOps process, adding people to the mix, including a product researcher and an AI ethicist, to make sure it’s all developed responsibly with the target customer in mind
—> How do we get to include these folks?
  • Enter product researcher and HCI researcher

 

 

—> Relevant challenges
  • Exploding amount of data, and dimensions associated with data
    • Data is getting bigger and wider
  • Intersectional bias (hard to identify)
—> Two solutions
  • Latent Space Data Exploration: new ways to visualize multi-dimensional data and see the relationships within it, so people can catch bad data
  • Data Sub-Group Analysis: identifying sub-groups for comparative analysis of the AI
    • Need visualization experts, together with audit specialists, to essentially look at the different groups in the AI training data and compare the results across those data sets as closely as possible
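—> A minimal sketch of sub-group (and intersectional) accuracy analysis in the spirit of the Gender Shades audit; the file and column names below are hypothetical:

```python
import pandas as pd

# Hypothetical audit file: one row per test example with the model's prediction.
df = pd.read_csv("face_audit_results.csv")  # columns: gender, skin_type, label, prediction
df["correct"] = df["label"] == df["prediction"]

# Accuracy disaggregated by a single attribute...
print(df.groupby("gender")["correct"].mean())

# ...and by the intersection of attributes, where the gaps are usually widest
# (e.g. darker-skinned women vs. lighter-skinned men).
print(df.groupby(["gender", "skin_type"])["correct"].mean().sort_values())
```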

 

—> Example of latent space data: visualizing how handwritten digits are grouped together in a latent space, where the 12*12 features are compressed
  • It’s then a matter of figuring out whether the data can be separated
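—> A minimal sketch of that kind of latent-space exploration, using scikit-learn’s built-in digits data (8*8-pixel images here, standing in for the 12*12 example from the talk) and PCA as the compression step:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()  # 1,797 images, 64 features each (8*8 pixels)

# Compress 64 dimensions down to 2 so the structure can be plotted.
embedding = PCA(n_components=2).fit_transform(digits.data)

plt.scatter(embedding[:, 0], embedding[:, 1], c=digits.target, cmap="tab10", s=8)
plt.colorbar(label="digit")
plt.title("Digits projected into a 2-D latent space")
plt.show()
# Well-separated clusters suggest the classes can be told apart; overlapping or
# stray points are candidates for bad or ambiguous data.
```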

 

—> Challenges in HCI
  • Explainability: the lack of explainability of AI models limits the capacity for deployment and evaluation
  • MLOps requires a lot of automation, but this automation is still immature and requires people to have AI domain knowledge

 

 

—> Next, I’ll show a slide on AI explainability
  • The left-hand side shows features and values: a visual of an AI model that lists out the features key to making its decisions (see the sketch after this list)
—> We don’t just want explainability for ML engineers; we want to get at causality as well
  • HCI experts need an explainability framework to create something cognitively understandable for humans
    • Including metrics of explainability (i.e., goodness of explanations)
—> We can integrate AI explanations with knowledge graphs or ontologies, and translate the AI’s output into something humans can understand
  • Or augmenting training data used for AI models with features that can be expressed by a human
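—> As one hedged example of the “features and values” style of explanation, a sketch using scikit-learn’s permutation importance on a generic tabular model; this is a stand-in technique, not the specific framework shown on the slide:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure how much accuracy drops: the
# features whose shuffling hurts most were key to the model's decisions.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranking = sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1])
for name, score in ranking[:5]:
    print(f"{name:30s} {score:.3f}")
```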

 

—> Humanized AI is necessary and important; it is still maturing, and many open problems exist that require a cross-disciplinary focus
—> Thank you!

 

Q&A
  1. Is there a meta-pattern or algorithm for QA-ing AI systems for other features (other than biological features)?
A: There are explainability techniques specifically for the medical/biological arena, and for natural language processing
—> There might be other QA techniques, but he can’t name any right now

2. Is there a heuristic that you use to help you look for problems that only reveal themselves at scale?

A: What he’s seen is a discipline of monitoring models in production.
  • If you have a model in production, do a warm start of a new model
    • Don’t just start a new recommendation engine cold on January 1st
  • Also do synthetic data analysis (when you don’t have enough training data for a particular domain)
    • You can use as much data as you can get from the Internet as a proxy
    • Be willing to give models time, and a distribution of daily data
3. Not sure if I’m following this correctly, but is latent space visualization effectively just a visualization of the parameters identified by exploratory factor analysis?
—> No, latent space focuses on compression of data such as text data or image data
  • Looks at how you can compress multi-dimensional data into lower-dimensional data (i.e., analyzing three dimensions instead of thirty dimensions)
  • Look at PCA to see what it means to go from a high- to a low-dimensional space for a latent space
—> You are visualizing features, not parameters
  • The technique isn’t perfect, in that you have to work with the parameters in order to see whether the data can be clustered in any way
    • Techniques exist to help automate that
4. What’s our current knowledge about AI trust? What do we know, and what are the gaps?
A: It’s evolving, so it’s hard to say exactly what we know about AI trust
  • The tools we have now fall into a combination of explainability analysis and explainability visualization
—> The ultimate goal of AI trust is for a human to be able to ask the AI a question, and for the AI to convincingly explain why it gave that specific answer over other answers
  • There needs to be a back-and-forth between the person and the AI