← Back to Blog

Cleo & You: A Debate in Four Voices

Cleo & You: A Debate in Four Voices

Cleo & You: A Debate in Four Voices

What happens when an Apple designer, a Tesla engineer, a Buffett-school investor, and a learning scientist tear apart the same startup pitch — raw, unfiltered, and without mercy


The Setup

This experiment was part of our Week 1 discovery process. We used AI agents to simulate expert perspectives and stress-test our product hypothesis before conducting real user interviews.

The pitch: “Cleo & You: a startup looking to build an AI tool to re-invent how companies coach their employees internally. It is meant to fully understand the business and tech features to then provide personalized learnings for different types of employees. Want to read? Done. Video? Also done. Audiobook, done. We could even simulate a sales call which involves soft skills, even as much as having a VR experience for you to practice.”

The panel:

  • Maya Chen — Principal Designer, Apple. 12 years shipping products used by hundreds of millions. Brutal about design theater.
  • Dmitri Volkov — Staff Software Engineer, Tesla. Distributed systems, ML infrastructure, zero patience for vaporware.
  • Richard Park — Investor, Buffett school. 40+ startup investments across SaaS and edtech. Allergic to pitches without moats.
  • Dr. Sarah Torres — PhD Educational Psychology. 15 years coaching C-suite executives and building Fortune 500 learning programs. Done being polite about edtech snake oil.

Two rounds. No hedging. No corporate speak. This is what they found.


Round 1: Opening Fire

Maya Chen | “A Swiss Army Knife Nobody Asked For — But the Blade Might Be Sharp”

Maya opened with structural skepticism and one genuine flicker.

The structural problem: the corporate learning space is already a graveyard of platforms that promise personalization and deliver glorified Netflix queues nobody watches. BetterUp, CoachHub, Cornerstone, Sana, Docebo — the list is endless. And BetterUp already has an AI coach. “The pitch says ‘fully understand the business and tech features.’ That is an enormous claim. At Apple, we spent years just getting internal tools to understand team-specific context, and we had the luxury of controlling the entire ecosystem.”

The feature list — read, video, audio, VR, simulation, business understanding — is not a product, it’s five products sharing a pitch deck. “Everything is the enemy of great design.”

The flicker: the simulated sales call. “Soft skills are embodied — you learn them by doing, by feeling the awkwardness, by recovering from a bad answer in real time. If Cleo can create a practice environment that feels psychologically safe enough to fail in, where the AI gives specific and actionable feedback, that’s a product I’d want to design.”

Her three make-or-break questions: Is this one coherent experience or five apps duct-taped together? What does personalization actually look like at 9:47am between meetings? And where’s the Duolingo-equivalent feedback loop — the thing that makes the user feel themselves growing?

Dmitri Volkov | “Five Hard Products in a Trenchcoat”

Dmitri performed a feature-by-feature technical autopsy. His ratings:

Feature Verdict
“Fully understand the business” Hard, borderline impossible at implied fidelity
Text-based personalized learning Moderate — solvable but incumbents already do it
Video generation Hard — components exist, integrated pipeline is bleeding edge
Audiobook Easy — it’s an API call
Sales call simulation Hard but tractable — prior art exists, validated market
VR experience Expensive as hell — separate engineering discipline, Series C feature

The hardest engineering problem: knowledge ingestion. Everything downstream — video, audio, simulation — depends on “fully understanding the business.” Get it wrong and every output layer teaches employees confidently wrong information. For a training product, the accuracy bar is not “low hallucination” — it’s near-zero.

Real v1: a Slack chatbot + text learning paths + basic roleplay. 6-9 months, 8-12 engineers. “The pitch as described is a 3-year, $50M+ roadmap being sold as a product.”

His prescription: kill VR entirely, focus on knowledge ingestion + sales roleplay.

Richard Park | “Hard Pass — A Feature List Wearing a Startup Costume”

Richard arrived with a market map and a verdict: the competitive landscape isn’t “room for one more,” it’s a knife fight in a phone booth.

LinkedIn Learning: AI coaching, conversational chatbots, personalized paths, roleplay — shipped 2024-2025. Yoodli: $60M raised, $300M+ valuation, doing exactly the sales simulation piece with Google and Snowflake as customers, 900% ARR growth. Hyperbound: 25,000+ users across 7,000+ companies. Docebo: full “AI-First” platform with scenario-based simulation and AI-generated video.

“What’s stopping LinkedIn from doing this tomorrow? Nothing. They’re already doing it today.”

His killer question — “Why not LinkedIn?” — and six demands before writing a check: paying enterprise pilots, a proprietary data flywheel, proof of switching cost creation, unit economics by cohort, and a specific answer to his question.

Dr. Sarah Torres | “A Feature List Masquerading as a Learning Science Strategy”

Sarah unloaded the scientific indictment. Three counts:

Count 1: Format personalization is built on a debunked theory. The multi-format choice (read/video/audio) reflects discredited learning styles research. Multiple meta-analyses confirm no meaningful benefit to matching instruction format to learner preference. What matters is matching modality to content type, not learner preference. “Cleo has built their pitch around a discredited theory. That’s a red flag the size of Texas.”

Count 2: Actual learning science is absent. No spaced repetition. No retrieval practice. No mastery gates. No adaptive sequencing. “The average LMS sees completion rates of 20-30%. Over half of registered e-learners never meaningfully engage. And here comes Cleo & You with… more content, in more formats, with AI and VR sprinkled on top.”

Count 3: They confuse content delivery with learning. “The problem with corporate learning is that it doesn’t produce learning. And nothing in this pitch tells me Cleo understands that.”

One concession: the simulation. “81% of users find AI-driven simulations realistic and valuable. Immersive training shows 75% better knowledge retention than classroom methods.” But she wants structured debriefing, spaced intervals, mastery progression — none of which appear in the pitch.


Round 2: The Debate Turns

Dmitri Corrects Richard: LinkedIn Is Not the Threat

Dmitri investigated LinkedIn Learning’s actual technical architecture and delivered a precision correction. LinkedIn Learning cannot ingest Company X’s product documentation, sales playbooks, or internal processes. Their roleplay is generic workplace scenarios, not company-specific sales calls. Their entire model is a centralized content library — recommending from a catalog, not generating from proprietary data.

“There IS a real technical gap. LinkedIn would need to fundamentally rebuild their pipeline to do what Cleo is describing. They can’t easily pivot — it’s architecturally incompatible with their business model.”

He reframed the threat: “Yoodli is the threat. Not LinkedIn.”

Richard accepted the correction. He updated the competitive question from “Why not LinkedIn?” to “Why not Yoodli plus 18 months of product iteration?” But he held on the core concern: killing the LinkedIn objection doesn’t create a moat — it narrows the competitive field to players who are already well-funded and well-positioned.

Maya vs. Sarah: The Friction Wars

Sarah’s fluency illusion argument threatened to invalidate Maya’s design instincts entirely. Robert Bjork’s research: when learning feels smooth, learners rate it as highly effective — but retention craters. The frictionless experience is often the pedagogically fraudulent one.

Maya didn’t retreat. She reframed: “Design’s job is to remove the wrong friction while preserving the right friction.”

Her gym analogy: a great gym removes every friction that isn’t the workout — equipment confusion, wayfinding, locker room chaos — while keeping the weights heavy. The design never makes the dumbbells lighter. “The moment of practice should be hard. Everything around it should be beautiful.”

Sarah responded with a precision counterpoint: gamification that rewards completion behaviors (finishing modules, maintaining streaks) triggers the fluency illusion. Gamification that rewards struggle behaviors (attempting harder scenarios, improving over baseline) produces real motivation tied to real growth. “Your streak counter should measure ‘days you attempted something harder than last time,’ not ‘days you opened the app.’”

Her verdict on Duolingo: half-brilliant, half-trap. The spaced repetition engine underneath is real science, properly implemented. But the gamification layer — tuned for daily active users, not fluent speakers — is the cautionary tale. “Build the science right. Let design wrap the struggle in momentum, not hide it.”

Richard Moves: The Datadog Thesis

Richard’s hardest challenge produced the debate’s clearest strategic framework. Dmitri had shown that the simulation moat evaporates as models commoditize — the bare simulation engine is replicable. But the organizational intelligence accumulated through simulation usage is not.

Month 1-3: 500 reps simulate sales calls. Every call generates structured data — which objections stumped them, where confidence dropped, which product features couldn’t be explained.

Month 12: the system knows which practice patterns predict deal closure at this company, which skills degrade over time and need reinforcement, the optimal training sequence for this company’s new hires. A competitor starting at Month 0 has none of this.

Richard processed this through the investor’s moat lens and surfaced the innovator’s dilemma: LinkedIn and Docebo optimize for engagement metrics (time on platform, seats utilized). Genuine outcome measurement might cannibalize their KPIs — if training becomes more efficient, customers need fewer seat-hours. “They’re structurally disincentivized from building this. Same reason Netflix didn’t build TikTok — the better product for the user was a worse product for their business model.”

He proposed the Datadog playbook: enter as the outcome measurement layer (“we’ll tell you whether your current training is working”), become indispensable, then expand into coaching module by module. His verdict moved from “hard pass, don’t call me” to “conditional pass — come back with the right pitch and I’ll listen for 30 minutes.”

Sarah Reframes the GTM: Wrong Door, Wrong Buyer

The debate’s most strategically significant insight came from Sarah identifying that the entire go-to-market was aimed at the wrong person.

“The CHRO buys for compliance, culture, and optics. Their success metric is ‘we have a program,’ not ‘the program works.’ They buy LinkedIn Learning the way a company buys a gym membership benefit — it checks a box, and nobody cares if employees actually use it.”

The right buyer: the CRO or VP of Sales. “84% of sales reps hit quota when their company uses best-in-class sales enablement. Top onboarding programs get new reps productive 3.4 months faster — that’s pipeline in the door.” The CRO measures in close rates and ramp time, not completion badges.

The structural implication: LinkedIn will never make this pivot. LinkedIn Learning sits in the HR/L&D budget. It is organizationally incapable of becoming a revenue-aligned tool owned by sales leadership. That’s not a strategic gap — it’s an architectural one, and it’s permanent.

Maya’s Product Vision: The Film Room

Maya synthesized the design answer to what would categorically differentiate Cleo from Yoodli, Second Nature, and Hyperbound.

All current simulation tools operate on the same paradigm: practice → score → repeat. The driving test model.

What none are doing: moment-level coaching with replay. After a 6-minute simulation, instead of a score, you get a timeline. Three decision points where the conversation could have gone differently. You tap minute 2:14 — where the prospect said “We’re happy with our current vendor” and you listed features. You see what you said (with audio), what the prospect was likely thinking, and 2-3 alternative responses with predicted outcome branches. A “replay just this moment” button. You practice that 30-second exchange four different ways.

“This is the film room model, not the driving test model. It’s how elite athletes actually improve — not by running the whole game again, but by rewatching specific plays, understanding the micro-decision, and drilling the alternative. No one in the market is doing this with the design sophistication it deserves.”

Layer Sarah’s spaced repetition: the system tracks which types of moments you struggle with across simulations. You keep folding under price pressure objections. Next Tuesday, without being asked, Cleo surfaces a 3-minute micro-practice: just a price pressure scenario, just the 30-second pivot point. Spaced repetition applied not to flashcards but to conversational decision points.

Sarah’s Three-Tier Model: The Architecture of a Real Company

Sarah evolved her “augment the coach” position into a three-tier model that resolves the hardest challenge the panel raised: most companies don’t have a coaching program to augment. They have a once-a-year performance review and maybe a LinkedIn Learning license.

  • Tier 1 — AI IS the coach: Structured skill practice (sales, objection handling, presentations, product knowledge). Full stack. No human needed. This is where 90% of companies start because they have nothing. This is the v1.
  • Tier 2 — AI CREATES the coach: Platform data identifies which employees need development and in which areas. A lightweight human coaching marketplace for the specific needs the AI has quantified and proven. The AI does the diagnostic; the human does the therapy.
  • Tier 3 — AI AUGMENTS the coach: For companies with existing coaching programs, simulation data makes expensive human sessions more targeted. The coach walks into the 1:1 already knowing where to focus.

“Most companies enter at Tier 1 because they have nothing. The platform generates data that reveals the need for Tier 2. That’s how you build a company, not just a product.”

Dmitri’s v1 Spec: The Salesforce Thesis

Dmitri produced the debate’s most actionable output: a concrete, scoped, technically feasible v1.

Instead of “fully understanding the business” through broad RAG over thousands of internal documents, start with one data source — Salesforce CRM. Salesforce data is structured. Win/loss patterns, competitor mentions in closed-lost reasons, product mix, rep performance by segment — these are database queries, not inference. The hallucination problem that plagues unstructured document RAG nearly disappears.

Supplement with 15-20 hand-curated, human-validated documents: sales playbook, competitive battle cards, pricing sheet, product matrix, methodology framework. A manageable corpus where accuracy can be validated manually.

The result: simulation powered by your actual deal history. “A competitor starting at Month 0 has zero of this context. Cleo would be coaching reps to handle the exact objections from your actual closed-lost deals. That’s categorically different from generic roleplay.”

Technical estimate: 6-8 engineers, 5-7 months. A real, demonstrable, fundable v1.

On psychological safety — Maya argued that a Slack bot creates zero psychological safety for practice. Dmitri conceded fully and added the engineering reason: “Psychological safety isn’t a design luxury. It’s a data throughput constraint.” Reps practicing in a space where they perceive surveillance adopt at 10-15%. A genuinely safe environment hits 40-60%. A 3-4x difference in adoption is a 3-4x difference in data flywheel velocity.


Closing Statements

Maya Chen:

“Cleo gets one thing right: the intuition that corporate learning should be something you do, not something you watch — and that’s a genuine design insight worth building on. What they get catastrophically wrong is pitching six products in a trench coat; ‘read, video, audio, VR, simulation, full business understanding’ isn’t a vision, it’s a panic attack masquerading as a pitch deck. If they want a real company, they must pick one blade — the simulation — build it as a film room with moment-level replay and a coach dashboard that makes the human 10x better, and ship it with enough design taste and learning science rigor that people practice because they want to, not because HR told them to.”

Dmitri Volkov:

“Cleo gets one thing exactly right: the insight that company-specific knowledge — not generic courses — is the future of employee coaching, and that AI can finally make that economically viable. What they get catastrophically wrong is confusing a 5-year vision with a product — bolting together five distinct engineering disciplines and calling it a startup is how you burn $20M teaching nobody anything. Pick one wedge — Salesforce-powered sales simulation with real learning science — ship it in 6 months, let the organizational data flywheel compound, and earn the right to expand; that’s a company, everything else is a pitch deck.”

Richard Park:

“Cleo & You correctly identifies that enterprise coaching is broken and that companies are desperate to prove their $445B training spend actually changes behavior — that instinct is worth something. What they get catastrophically wrong is pitching a feature list in a market where LinkedIn, Yoodli, Docebo, and a dozen funded startups already ship every feature they described — no moat, no wedge, no awareness of the battlefield they’re walking into. If they want a real company, they must kill the Swiss Army knife, build the only AI coaching platform that measures actual behavioral change with clinical-grade rigor, and use that outcome data as a wedge to land enterprise contracts that no incumbent’s completion-rate dashboard can defend against.”

Dr. Sarah Torres:

“Cleo gets one thing right: simulation-based practice is where learning science and AI capability genuinely converge, and the market is starving for it. What they get catastrophically wrong is confusing content delivery in multiple formats with personalization — that’s not learning science, it’s a debunked learning styles myth dressed in AI clothing, and building on that foundation will produce another beautifully designed platform that changes nothing. If they want a real company, they must kill the content jukebox, sell AI-powered deliberate practice with spaced repetition and mastery gates to revenue leaders who can measure the outcome in quota attainment, and stop pretending that choosing between a video and an audiobook is the revolution corporate learning needs.”


Key Findings

1. The pitch is a roadmap dressed as a product. Every panelist reached this diagnosis independently. “Five products in a trenchcoat” (Dmitri). “Everything is the enemy of great design” (Maya). “A feature list wearing a startup costume” (Richard). “A content jukebox” (Sarah). The convergence across four completely different professional lenses is damning.

2. The format “personalization” claim is intellectually dishonest. Read/video/audio choice is modality accessibility, not personalization. It reflects discredited learning styles theory when positioned as personalization. It should be a footnote, not a headline.

3. Sales simulation is the only legitimate v1 candidate — and everyone agrees. It’s experiential, measurable, validated by market (Yoodli’s $300M valuation), and categorically different from passive content delivery. But it must be done with learning science rigor — spaced practice, mastery gates, structured feedback.

4. VR is a Series C feature being sold as a product. The effectiveness data is real. The startup reality is also real: $50K-$200K+ per module to produce, hardware distribution nightmare, separate engineering discipline. ROI threshold for VR requires 375+ learners per module just to break even with classroom training.

5. “Fully understanding the business” is the most dangerous claim in the pitch. One wrong answer about pricing teaches a sales rep to quote wrong numbers to a real prospect. The Salesforce-narrow approach is the viable path to genuine company-specific intelligence without the hallucination exposure.

6. The real buyer is the CRO, not the CHRO. Revenue leaders demand proof of outcomes measured in pipeline and close rates. LinkedIn Learning cannot reach this buyer because it lives in the HR budget permanently. That’s structural, not strategic.

7. The moat is accumulated organizational intelligence, not features. The simulation engine commoditizes as models improve. The 12 months of company-specific learning data — which objections trip your reps, which practice patterns predict deal closure — does not commoditize. Every simulation strengthens the model.

8. Pedagogical superiority isn’t IP — but the infrastructure to prove it is. Learning science is published research. But building the assessment architecture and outcome attribution methodology that proves behavior change to a CRO is an operational moat incumbents won’t build because their business model disincentivizes it.


The Company Hiding Inside the Pitch

Four panelists, four professional languages, one converged product vision:

Build: A Salesforce-native AI sales coach. Deep CRM integration extracts company-specific intelligence — your win/loss patterns, your competitor objections, your rep performance data. A curated knowledge layer (15-20 human-validated documents) adds product context without hallucination exposure. AI-powered simulation surfaces scenarios built from your closed-lost deals, not generic role-play templates.

Make it science: Spaced repetition brings reps back to their specific weak points at calibrated intervals. Mastery gates prevent advancement without demonstrated competency. Moment-level feedback with replay — the film room model — lets reps analyze specific decision points, hear alternatives, and drill the 30-second exchange four different ways.

Make it safe: A dedicated web app, not a Slack integration. Row-level security at the database. Architectural data isolation so reps can fail without performance surveillance. A manager dashboard showing aggregate patterns that makes every coaching conversation more targeted.

Sell to: The VP of Sales. Pitch ramp time in months, not completion rates. Pitch close rate improvement, not engagement hours. Never lead with “learning platform.”

Expand: The Datadog playbook — enter as the outcome measurement layer, prove what’s working and what isn’t, then expand into coaching module by module. The measuring instrument becomes the platform. The platform becomes organizational learning infrastructure.


Where the Panel Landed

  Initial Verdict Final Verdict
Maya Skeptical with a flicker “If they build this, I’d want to design it”
Dmitri Cautiously interested if focused “CRM + simulation + adaptive learning = real product, real moat”
Richard Hard pass “Conditional pass — come back with the right pitch”
Dr. Sarah Critical of pedagogy “Three-tier model selling to revenue leaders — that’s a company”

The pitch as written would not survive a serious investor meeting. The company hiding inside the pitch might be one of the more interesting opportunities in enterprise software right now — if its founders have the discipline to kill what’s killing it.

The debate produced something unexpected: four people who started from radically different places converged on a coherent, specific, buildable product vision. That convergence is itself a signal. When an Apple designer, a Tesla engineer, a Buffett-school investor, and a learning scientist all point to the same thing — independently, from first principles — it’s worth listening.