Shipping a voice-native Brand Discovery product
I'm an engineering leader, but I'm also a builder. This is a short report on a product I'm taking from hypothesis to beta: BrandSoup.ai, a voice agent that interviews a founder about their brand and produces a strategist-grade report. I'm sharing it to show how good judgment enables development velocity, even with a tiny team.
Situation
My brother-in-law is a brand strategist in Australia. We were talking about how brand strategy is locked behind expensive consultants and long, intimidating questionnaires, so most founders never do the work. We want to deliver a natural, spoken, adaptive conversation that extracts the same signal a strategist would and turns it into a usable framework and a PDF that can feed directly into go-to-market motions.
Hypothesis and assumptions
- Hypothesis: founders will go deeper, faster in a spoken conversation than in a form, and an LLM can reliably infer structured answers from that messy dialogue.
- Assumptions: beta users are primarily on mobile, on speaker, with no headphones; latency and barge-in matter more than polish; and the underlying brand framework will change frequently as we dial in our target users and better understand their requirements — so nothing should be hardcoded.
Goals
- Validate the hypothesis quickly: get real founders talking to it.
- Technical feasibility: voice and multi-agent coordination are durable capabilities, not a one-off, so build them properly.
- Find product-market fit: instrument enough to learn what converts and what doesn't.
Initial user stories
- As a founder, I speak with an agent that asks one good question at a time and adapts to what I've already said, working through a set of framework criteria.
- As a founder, I get a polished brand report by email within 20 minutes of starting the conversation.
- As our brand expert, I edit the criteria framework (pillars, aspects, queries) in a spreadsheet — with no deploy required.
- As our PM, I see per-call cost and coverage so we can tune for unit economics.
- As a beta partner, I get something that just works on my Android phone.
The stakeholder group is deliberately small: the brand expert, the PM, and a handful of beta partners. A loud feedback loop with the wrong people would pull us toward scale before we've earned the right to it. I also sequenced the bets: prove the conversation works on the cheapest, hardest device first (Android, on speaker); make the framework editable by a non-engineer before tuning the model; and defer payments until there's something worth paying for. Each decision protects the team's attention for the one question that matters this quarter: does the hypothesis hold?
Technical challenges
I'll pick three.
1. Two agents, one conversation
The hard part of a voice agent is that thinking and talking compete for the same critical path. My answer is a dual-process blackboard. A Conversational Agent (CA) drives the dialogue. An Analytical Agent (AA) runs in the background after each turn — inferring answers, updating coverage, and writing a fresh StrategicBrief to a shared blackboard. The CA never waits on the AA; it just reads the latest brief at the top of its next turn.
The tradeoff I accepted: the AA is intentionally one turn behind. That's a small loss of immediacy for a large win in responsiveness, and it keeps each process independently testable.
%%{init: {'theme':'base', 'themeVariables': {'background':'transparent','primaryColor':'#ffffff','primaryTextColor':'#1a1a1a','primaryBorderColor':'#333333','lineColor':'#333333','fontSize':'14px','edgeLabelBackground':'#ffffff'}}}%%
sequenceDiagram
participant U as User (voice)
participant CA as Conversational Agent
participant BB as Blackboard
participant AA as Analytical Agent
U->>CA: speaks (turn N)
CA->>BB: read StrategicBrief
CA->>U: reply (turn N)
CA-->>AA: fire-and-forget evaluate(turn N)
AA->>AA: infer answers + coverage
AA->>BB: write coverage + new StrategicBrief
Note over CA,AA: AA stays one turn behind,<br/>off the critical path
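The hand-off in the diagram can be sketched in a few lines of asyncio. This is a minimal illustration of the pattern, not the production code: the `Blackboard` class, the fields on `StrategicBrief`, and the sleep that stands in for LLM inference are all assumptions made for the example.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class StrategicBrief:
    turn: int = 0                                  # last turn the AA has analysed
    covered: list = field(default_factory=list)    # hypothetical coverage record


class Blackboard:
    """Holds the latest brief; reading never waits on the analyst."""

    def __init__(self) -> None:
        self._brief = StrategicBrief()

    def read(self) -> StrategicBrief:
        return self._brief

    def write(self, brief: StrategicBrief) -> None:
        self._brief = brief  # swap in a whole new brief at once


async def analytical_agent(bb: Blackboard, turn: int, utterance: str) -> None:
    await asyncio.sleep(0.01)  # stands in for slow LLM inference
    prev = bb.read()
    bb.write(StrategicBrief(turn=turn, covered=prev.covered + [utterance]))


async def conversational_turn(bb: Blackboard, turn: int, utterance: str, tasks: set) -> str:
    brief = bb.read()  # whatever the AA finished last, possibly stale
    reply = f"turn {turn}: brief reflects turn {brief.turn}"
    # Fire-and-forget: schedule the analysis, keep a reference, do not await it.
    tasks.add(asyncio.create_task(analytical_agent(bb, turn, utterance)))
    return reply


async def run_demo():
    bb, tasks, replies = Blackboard(), set(), []
    for turn, utterance in enumerate(["we sell coffee", "to busy parents"], start=1):
        replies.append(await conversational_turn(bb, turn, utterance, tasks))
        await asyncio.sleep(0.05)  # the user is speaking; the AA catches up
    await asyncio.gather(*tasks)
    return replies, bb.read()
```

Running `asyncio.run(run_demo())` shows the one-turn lag directly: the reply for turn 2 is built from the brief written after turn 1, and the brief for turn 2 only lands once the reply has already gone out.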
2. Real-time audio
Full-duplex voice on a phone speaker is a feedback nightmare: the mic hears the agent. We went half-duplex while the agent speaks, then restored interruption with a client-side RMS VAD in an audio worklet that emits a barge-in signal. Along the way we killed a class of audio clicks (WAV-header artifacts, PCM chunk misalignment) and worked around iOS ignoring the AudioContext sample-rate hint by reporting the real mic rate and resampling on playback. Every threshold lives in config and a tuning log, so the PM and I can tweak endpointing and barge-in timing without a code change.
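To make the barge-in idea concrete, here is a rough sketch of an RMS-based detector. The real one runs client-side in an audio worklet; this version is written in Python purely to show the logic, and the threshold, the frame count, and the names `BargeInConfig` and `BargeInDetector` are illustrative assumptions rather than the tuned values.

```python
import math
from dataclasses import dataclass


@dataclass
class BargeInConfig:
    rms_threshold: float = 0.05  # hypothetical; samples assumed in [-1.0, 1.0]
    min_frames: int = 3          # consecutive loud frames required to fire


def frame_rms(samples) -> float:
    """Root-mean-square energy of one audio frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class BargeInDetector:
    """Signals a barge-in after enough consecutive loud frames."""

    def __init__(self, cfg: BargeInConfig) -> None:
        self.cfg = cfg
        self._loud = 0

    def process(self, samples) -> bool:
        if frame_rms(samples) >= self.cfg.rms_threshold:
            self._loud += 1
        else:
            self._loud = 0  # a quiet frame resets the run
        if self._loud >= self.cfg.min_frames:
            self._loud = 0  # re-arm after signalling
            return True
        return False
```

Requiring several consecutive loud frames is one simple way to keep a single click or pop from interrupting the agent. Both numbers are the kind of threshold that would live in config and the tuning log.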
3. Configurability as an architectural choice
The fastest team is the one that doesn't redeploy to learn something. So I treated configurability as a first-class requirement, not a nicety. The brand framework (pillars, questions, the strategist's prompts) lives in Google Sheets, cached locally, and is owned by our brand expert. Voice behavior (Deepgram model and voice, STT endpointing, barge-in and VAD thresholds) is env-driven and tracked in a tuning log. The web client's config is env-driven in the same way. The payoff is that the PM and the brand expert can run experiments without me in the loop: change a question, swap a voice, tighten endpointing, and watch the next call. That's the difference between a feedback loop measured in minutes and one measured in deploys.
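As an illustration of env-driven voice settings, the sketch below loads a small typed config from environment variables with defaults. The variable names, default values, and the `VoiceConfig` fields are hypothetical, chosen only to mirror the knobs mentioned above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VoiceConfig:
    stt_endpointing_ms: int
    bargein_rms_threshold: float
    bargein_min_frames: int
    tts_voice: str


def load_voice_config(env: Optional[Mapping[str, str]] = None) -> VoiceConfig:
    """Read tunables from the environment; all names here are assumptions."""
    env = os.environ if env is None else env
    cfg = VoiceConfig(
        stt_endpointing_ms=int(env.get("STT_ENDPOINTING_MS", "300")),
        bargein_rms_threshold=float(env.get("BARGEIN_RMS_THRESHOLD", "0.05")),
        bargein_min_frames=int(env.get("BARGEIN_MIN_FRAMES", "3")),
        tts_voice=env.get("TTS_VOICE", "placeholder-voice"),
    )
    # Fail fast on a bad value instead of discovering it mid-call.
    if not 0.0 < cfg.bargein_rms_threshold <= 1.0:
        raise ValueError("BARGEIN_RMS_THRESHOLD must be in (0, 1]")
    return cfg
```

Passing a plain dict, for example `load_voice_config({"STT_ENDPOINTING_MS": "450"})`, makes it easy to test a setting before it goes into the real environment.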
The end-to-end loop
Conversations hold open a WebSocket; everything else (auth, conversation state, criteria progress, report download) is plain HTTP. A completed conversation flushes its confirmed answers to the database. Once enough answers meet the threshold for quality completion, the user can choose to wrap up and generate their report. Report generation (Claude authoring, PDF, email) runs async on Redis + RQ workers, off the request path.
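The wrap-up gate could look something like the sketch below. It assumes each inferred answer carries a confidence score and that coverage is the share of framework queries answered above a cutoff. The `Answer` type and both thresholds are hypothetical, not the product's actual rule.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    query_id: str
    confidence: float  # hypothetical 0..1 score from the Analytical Agent


def coverage(query_ids, answers, min_confidence: float = 0.7) -> float:
    """Fraction of framework queries with a sufficiently confident answer."""
    wanted = set(query_ids)
    if not wanted:
        return 0.0
    confident = {a.query_id for a in answers if a.confidence >= min_confidence}
    return len(confident & wanted) / len(wanted)


def can_wrap_up(query_ids, answers, min_confidence: float = 0.7,
                min_coverage: float = 0.8) -> bool:
    """True once enough queries are covered to offer report generation."""
    return coverage(query_ids, answers, min_confidence) >= min_coverage
```

Keeping both thresholds as parameters fits the configurability goal: either can be tuned without touching the gating logic.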
%%{init: {'theme':'base', 'themeVariables': {'background':'transparent','primaryColor':'#ffffff','primaryTextColor':'#1a1a1a','primaryBorderColor':'#333333','lineColor':'#333333','fontSize':'14px','edgeLabelBackground':'#ffffff'}}}%%
sequenceDiagram
participant B as Browser
participant API as FastAPI
participant Q as Redis + RQ
participant W as Worker
B->>API: journey complete
API->>Q: enqueue report job
API-->>B: 202 accepted
W->>W: Claude → PDF → email
W-->>B: delivered (SparkPost)
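The enqueue-and-acknowledge flow in the diagram can be sketched with the standard library alone. Here an in-process queue and a worker thread stand in for Redis + RQ so the example runs without external services. `ReportQueue`, `handle_journey_complete`, and the placeholder report string are assumptions made for illustration.

```python
import queue
import threading
import uuid


class ReportQueue:
    """In-process stand-in for Redis + RQ: one queue, one worker thread."""

    def __init__(self) -> None:
        self._jobs = queue.Queue()
        self.delivered = {}  # job_id -> report, standing in for the sent email
        threading.Thread(target=self._run, daemon=True).start()

    def enqueue(self, conversation_id: str) -> str:
        job_id = uuid.uuid4().hex
        self._jobs.put((job_id, conversation_id))
        return job_id

    def _run(self) -> None:
        while True:
            job_id, conversation_id = self._jobs.get()
            # Placeholder for the real pipeline: Claude authoring, PDF, email.
            self.delivered[job_id] = f"report for {conversation_id}"
            self._jobs.task_done()

    def wait_idle(self) -> None:
        self._jobs.join()  # block until every queued job has been processed


def handle_journey_complete(reports: ReportQueue, conversation_id: str):
    """Request handler: enqueue the job and answer immediately with 202."""
    job_id = reports.enqueue(conversation_id)
    return 202, {"job_id": job_id}
```

The handler returns as soon as the job is queued, so the slow authoring and rendering steps never sit on the request path. With real RQ the worker would run as a separate process, not a thread.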
Sentry watches all three tiers — browser, API, and worker.
%%{init: {'theme':'base', 'themeVariables': {'background':'transparent','primaryColor':'#ffffff','primaryTextColor':'#1a1a1a','primaryBorderColor':'#333333','lineColor':'#333333','fontSize':'14px','edgeLabelBackground':'#ffffff'}}}%%
flowchart LR
Web[React + Vite SPA]
Web <-->|HTTP / WebSocket| API[FastAPI]
API --> DB[("SQLite:<br/>auth, journey, reports")]
API -->|enqueue| Q[(Redis + RQ)]
Q --> W[RQ Worker]
W --> LLM[Claude]
W --> PDF[PDF service]
W --> Mail[SparkPost]
API <-->|STT / TTS| DG[Deepgram]
API <-->|streaming| LLM
Sheets[("Google Sheets:<br/>framework data")] --> API
Web -.-> Sentry[(Sentry)]
API -.-> Sentry
W -.-> Sentry
Try it: brandsoup.ai