
Giving Claude Code a Voice with ElevenLabs

AI Claude Code Productivity Software Engineering

About the author: I'm Charles Sieg, a cloud architect and platform engineer who builds apps, services, and infrastructure for Fortune 1000 clients through Vantalect. If your organization is rethinking its software strategy in the age of AI-assisted engineering, let's talk.

Vintage radio emitting code-formed sound waves

I spend hours in Claude Code every day. Long sessions where I am reading, thinking, switching contexts, and occasionally glancing at the terminal to see if the agent finished a task. The problem: Claude Code is silent. It finishes a 10-minute build-and-deploy pipeline and just sits there, cursor blinking, waiting for me to notice. The whole concept here was inspired by J.A.R.V.I.S. from the Iron Man films, voiced by Paul Bettany. Tony Stark's AI assistant announces status, flags problems, and delivers dry commentary while Stark works on something else entirely. I wanted that. An AI assistant that speaks. That announces when it starts a task and summarizes what it accomplished when it finishes. Like a competent colleague who taps you on the shoulder and says "that deployment is done, here's what happened."

Thirty minutes of setup gave me exactly that. Claude Code now speaks through ElevenLabs text-to-speech, streaming audio through my speakers with ~300ms latency. A short bash script, an API key, and a prompt block in CLAUDE.md turned a silent terminal agent into one that announces its work. This article walks through the full implementation: the script, the prompt engineering, the voice selection, and the cost math. If you use Claude Code for extended sessions and want ambient awareness of what your agent is doing without watching the terminal, this is for you.

Why "Cooper"? Throughout this article you will see my agent address me as "Cooper" or "sir." That naming comes from a deliberate instruction in my CLAUDE.md file, and it is a nod to Interstellar. TARS addressing Cooper in that film captures exactly the dynamic I wanted: a dry, competent AI that treats you as the mission commander. The British butler tone and the name alternation create a surprisingly immersive collaboration feel after a few days of use.

Why Voice Output Changes the Workflow

The Attention Problem

Claude Code runs in a terminal. When it finishes a task, the only signal is that new text appears on screen. If you are in another window (reviewing a PR, reading documentation, responding to Slack), you miss it. You context-switch back to the terminal, realize the task finished three minutes ago, and lose those three minutes of idle time. Multiply that across a full workday of agent-assisted development and the accumulated dead time is significant.

What Voice Adds

Voice output solves the attention problem without requiring visual focus. I hear "That's done, Cooper. Five articles deployed to staging, all AI scores under threshold" from across the room, and I know the state of my work without looking at the terminal. Three specific benefits:

| Benefit | Without Voice | With Voice |
|---|---|---|
| Task completion awareness | Must watch terminal | Hear it from anywhere |
| Error notification | Discover on next glance | Hear immediately |
| Context retention | Re-read output to recall what happened | Spoken summary sticks in memory |
| Multi-task efficiency | Check terminal between tasks | Continue working, hear updates |

The psychological effect surprised me. Having the agent announce its work creates a sense of collaboration that a silent terminal lacks. It feels like pair programming with a colleague who happens to work at 100x speed.

The Architecture: Three Components

The entire implementation is three pieces: a bash script that calls the ElevenLabs streaming TTS API, an .env file with credentials, and a prompt block in CLAUDE.md that instructs Claude Code when and how to use the script.

Component Overview

[Diagram: Claude Code (agent) —Bash tool call→ speak.sh (script) —HTTPS POST→ ElevenLabs streaming API —audio stream→ mpv (player) —sound→ speakers]
Voice output architecture

Claude Code calls the script through its Bash tool, passing the text to speak as an argument. The script POSTs to the ElevenLabs streaming endpoint, which returns audio chunks progressively. Those chunks pipe directly into mpv, which starts playing before the full response arrives. End-to-end latency from Claude Code deciding to speak to audio hitting the speakers is roughly 300-400ms.

Dependencies

| Component | Purpose | Installation |
|---|---|---|
| curl | HTTP client for ElevenLabs API | Pre-installed on macOS/Linux |
| jq | JSON payload construction | brew install jq or apt install jq |
| mpv | Audio player with stdin streaming | brew install mpv or apt install mpv |
| ElevenLabs account | TTS API access | elevenlabs.io |

The script has no Python dependencies, no Node.js runtime, no Docker container. Three command-line tools and an API key. That simplicity matters because Claude Code invokes this script potentially dozens of times per session; startup overhead needs to be near zero.

The Script

Create the directory structure:

~/.claude/scripts/
├── .env          # API credentials
└── speak.sh      # TTS script

The .env File

ELEVENLABS_API_KEY=your_api_key_here
ELEVENLABS_VOICE_ID=your_voice_id_here

Store your ElevenLabs API key and voice ID here. The script sources this file at runtime. Keep it out of version control.

speak.sh

#!/bin/bash
# Claude Code TTS — streams ElevenLabs audio through mpv
# Falls back to macOS `say` if ElevenLabs is unreachable

# Without pipefail, the if-condition below would only see mpv's exit
# status; with it, a curl failure also triggers the fallback.
set -o pipefail

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
source "$SCRIPT_DIR/.env"

TEXT="$1"
[ -z "$TEXT" ] && exit 0  # nothing to say

if curl -sN --fail "https://api.elevenlabs.io/v1/text-to-speech/${ELEVENLABS_VOICE_ID}/stream" \
  -H "xi-api-key: ${ELEVENLABS_API_KEY}" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg text "$TEXT" '{
    text: $text,
    model_id: "eleven_turbo_v2",
    voice_settings: {
      stability: 0.5,
      similarity_boost: 0.75
    }
  }')" \
  | mpv --no-video --no-terminal --really-quiet - 2>/dev/null; then
  :
else
  say -v Daniel "ElevenLabs unavailable. Falling back to local voice."
  say -v Daniel "$TEXT"
fi

Make it executable:

chmod +x ~/.claude/scripts/speak.sh

How It Works

The script does five things, with steps 2-4 forming a single pipeline and step 5 as a fallback if the API call fails:

  1. Sources credentials from the .env file adjacent to the script.
  2. Constructs a JSON payload using jq with the text, model ID, and voice settings.
  3. POSTs to the ElevenLabs streaming endpoint with curl -sN --fail (silent mode, no-buffer for streaming, fail on HTTP errors).
  4. Pipes the audio stream to mpv which plays it in real time with no video window and no terminal output.
  5. Falls back to macOS say if any step fails (network issue, expired key, rate limit). The fallback announces that ElevenLabs is unavailable before speaking the original text through the local Daniel voice. You always hear the announcement; the only question is which voice delivers it.
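
Step 2 can be exercised on its own. This produces the same payload the script sends, using the settings from speak.sh (the sample TEXT is just an illustration):

```shell
# Build the TTS request payload exactly as speak.sh does.
TEXT='Deployment complete, sir.'
jq -n --arg text "$TEXT" '{
  text: $text,
  model_id: "eleven_turbo_v2",
  voice_settings: {stability: 0.5, similarity_boost: 0.75}
}'
```

The --arg flag escapes quotes and newlines in the announcement text, so the payload stays valid JSON whatever Claude Code decides to say.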

The eleven_turbo_v2 model delivers ~300ms time-to-first-byte. The voice settings control two parameters: stability (0.5 gives natural variation without wandering off-voice) and similarity_boost (0.75 keeps the output close to the selected voice's characteristics). I tuned these through experimentation; your preferences will vary.

Choosing a Voice

ElevenLabs offers three categories of voices:

| Voice Type | Description | Cost | Best For |
|---|---|---|---|
| Pre-made voices | Curated defaults optimized for reliability | Included in plan | Quick setup, consistent quality |
| Community voices | 10,000+ voices shared by users | Included in plan | Finding a specific character or accent |
| Cloned voices | Your own voice or a custom voice | Requires Pro plan+ | Brand consistency, personal preference |

I use a pre-made British male voice (the "butler" aesthetic fits the interaction model). Browse the ElevenLabs Voice Library to find one that suits your taste. Each voice has an ID string that goes in your .env file.

Voice Selection Tips

Pick a voice that is distinct from your own and from common notification sounds. The goal is instant recognition: when that voice speaks, you know it is your coding agent. I avoid voices that sound like podcast hosts or audiobook narrators because those blend into background audio. A slightly unusual accent or cadence cuts through ambient noise better.

Test your chosen voice with short, technical phrases. Some voices handle code terminology ("deployed to staging," "CI pipeline green," "three hundred millisecond latency") well. Others stumble on abbreviations, acronyms, or numbers. The turbo model handles technical language better than the older v1 models.

The CLAUDE.md Prompt

The script alone does nothing until Claude Code knows to call it. The prompt block in CLAUDE.md defines when to speak, what to say, and how to say it. Here is the exact prompt I use:

## Voice Announcements

Use the ElevenLabs TTS script for spoken announcements. Run in the background
so it doesn't block:

\`\`\`bash
~/.claude/scripts/speak.sh "Your message here" &
\`\`\`

### When starting a task

Speak a brief acknowledgement when beginning work. Address the user as "sir"
or "Cooper" (vary which one). The phrasing must vary every time but convey
"I'm on it." Never repeat the same wording twice in a session. Examples of
the *tone* (do NOT reuse these verbatim):
- "Right away, sir."
- "On it, Cooper."
- "Consider it done, sir."
- "Straightaway, Cooper."
- "I'll see to it at once, sir."

### When completing a task

Speak a brief 1-sentence summary of what was accomplished. Address the user
as "sir" or "Cooper" (vary which one). The phrasing must vary every time.
Keep it concise — what was done, key outcome. British butler tone. Examples
of the *tone* (do NOT reuse these verbatim):
- "All sorted, sir. The README has been updated and pushed."
- "That's done, Cooper. Terraform validates cleanly across all twelve files."
- "Taken care of, sir. Tests are green and the commit is pushed."

### General rules

- Always vary the phrasing — never use the same opening or structure
  consecutively
- Alternate between "sir" and "Cooper" naturally
- Skip only for: pure Q&A conversations with no code or file changes
- When a task has an exceptionally high leverage factor (50x+), occasionally
  mention it in the completion announcement. Keep it dry and understated —
  e.g. "That would have taken a human the better part of a week, sir." or
  "Roughly eighty hours of work in under ten minutes, Cooper." Don't do this
  every time — just when the leverage is genuinely striking.

Why This Prompt Structure Works

Several design decisions in the prompt are deliberate:

Background execution with &. The trailing ampersand runs the script without blocking Claude Code's execution. Without it, the agent waits for the audio to finish playing before continuing work. With it, the agent speaks and keeps working simultaneously.
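
The difference is easy to see with a stand-in for the TTS call — here sleep 1 substitutes for a few seconds of audio playback (an assumption for illustration; the real call is to speak.sh):

```shell
# A background launch returns immediately; the time is only paid at `wait`.
t0=$(date +%s)
( sleep 1 ) &          # stand-in for: ~/.claude/scripts/speak.sh "..." &
t1=$(date +%s)         # reached immediately — the agent keeps working here
wait                   # a foreground call would have blocked until now
t2=$(date +%s)
echo "launch: $((t1 - t0))s, total: $((t2 - t0))s"
```

Without the ampersand, every announcement would add its full playback duration to the agent's turnaround time.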

Forced variation. The instruction "never repeat the same wording twice in a session" prevents the robotic monotony of hearing the same phrase fifty times a day. Claude Code is good at varying phrasing when you explicitly ask for it. Without this instruction, it gravitates toward a small set of favorites.

Character consistency. The "British butler tone" instruction and the name/honorific alternation create a consistent personality. After a few days, the voice becomes a recognizable character rather than a generic TTS notification. This matters for the psychological benefit I mentioned earlier: collaboration feels more real when the collaborator has a consistent voice and manner.

Selective leverage mentions. The instruction to occasionally comment on high-leverage tasks adds a layer of awareness that reinforces the value of the AI-assisted workflow. Hearing "That would have been three weeks of work for a human team, sir" after watching a 12-minute task complete is a visceral reminder of what this tooling makes possible.

Prompt Placement

Put the voice announcement block in your global ~/.claude/CLAUDE.md if you want voice across all projects. Put it in a project-level CLAUDE.md if you only want voice for specific repositories. I use the global file because I want voice everywhere.

Cost Analysis

ElevenLabs bills per character. The turbo models cost 0.5 credits per character on self-serve plans.

Typical Usage

| Metric | Value |
|---|---|
| Average announcement length | 60 characters |
| Announcements per hour (active session) | 8-12 |
| Characters per hour | ~600 |
| Characters per 8-hour day | ~4,800 |
| Characters per month (22 working days) | ~105,600 |

The free tier provides 10,000 characters/month, which covers roughly two days of heavy use. The Starter plan ($5/month) provides 30,000 characters. The Creator plan ($22/month) provides 100,000 characters, which covers a typical month with room to spare.

| Plan | Monthly Characters | Monthly Cost | Coverage |
|---|---|---|---|
| Free | 10,000 | $0 | ~2 working days |
| Starter | 30,000 | $5 | ~6 working days |
| Creator | 100,000 | $22 | Full month with headroom |
| Pro | 500,000 | $99 | Heavy use across multiple projects |

For my usage pattern (6-10 hours of Claude Code per day, 5-6 days per week), the Creator plan covers it. The announcements are short. A typical completion announcement like "Taken care of, sir. Three articles deployed to production with all AI scores passing." is 78 characters. At 0.5 credits per character on turbo, that is 39 credits per announcement. The math works out to roughly $0.01-0.02 per announcement at Creator plan rates.
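
The per-announcement arithmetic, using the Creator plan numbers from the table above:

```shell
# Cost of one 78-character announcement on the Creator plan
# ($22 per 100,000 credits, turbo at 0.5 credits per character).
awk 'BEGIN {
  chars   = 78
  credits = chars * 0.5            # 39 credits
  cost    = credits * 22 / 100000  # dollars per announcement
  printf "%.0f credits, $%.4f per announcement\n", credits, cost
}'
```

That works out to 39 credits and a little under a cent per announcement; a dozen announcements an hour costs about a dime.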

Free Alternatives on macOS

If you want voice output without any recurring cost, macOS has built-in text-to-speech via the say command. No API key, no network dependency, zero latency to first audio. A minimal version of the script:

#!/bin/bash
say -v Daniel "$1"

The Daniel voice is a British English option that ships with macOS. Other voices are available in System Settings > Accessibility > Spoken Content > System Voice. You can download higher-quality voices there as well.

| Approach | Voice Quality | Latency | Cost | Offline Capable |
|---|---|---|---|---|
| ElevenLabs API | Excellent, near-human | ~300ms (network dependent) | $0-99/month | No |
| macOS say (default voices) | Functional, robotic | Instant | Free | Yes |
| macOS say (downloaded premium voices) | Good, natural cadence | Instant | Free | Yes |

I chose ElevenLabs because the voice quality makes a meaningful difference over hours of listening. The built-in voices work, but they sound like what they are: synthesized speech. After a full day of hearing announcements, the naturalness of ElevenLabs reduces fatigue. That said, say is a perfectly viable starting point, and you can always upgrade later.

Operational Notes

Latency Tuning

The eleven_turbo_v2 model targets ~300ms time-to-first-byte for streaming. In practice, I see 250-400ms depending on network conditions and text length. For the short announcements Claude Code produces, the entire audio clip typically finishes generating before the first sentence finishes playing. The perceived latency is the time between Claude Code's bash call and audible sound: roughly half a second.

If latency matters more than voice quality for your use case, ElevenLabs also offers eleven_flash_v2_5 which targets sub-200ms latency at slightly reduced quality. For short announcements, the quality difference is negligible. Swap the model_id in the script to try it.

Failure Handling

If the ElevenLabs API call fails (network issue, expired key, rate limit), the script falls back to the macOS say command. You hear a brief "ElevenLabs unavailable" notice followed by the original announcement in the local Daniel voice. No announcement is ever lost. The fallback adds ~1 second of overhead compared to ElevenLabs streaming, but the tradeoff is worth it: you always know what your agent just did. Claude Code continues working regardless because the script runs in the background with &.

Volume and Environment

I run this in a home office. The announcements play through my desk speakers at conversation volume. In a shared office, you would want headphones or a lower volume. The mpv player respects system volume, so adjusting macOS volume works without script changes. For per-script volume control, add --volume=50 to the mpv flags (50 = half volume).

Multiple Concurrent Agents

If you run multiple Claude Code sessions simultaneously (I sometimes do, using Task agents in parallel), the announcements overlap. Each agent invokes its own speak.sh call, and mpv instances play concurrently. The voices layer on top of each other, which is occasionally confusing. One solution: assign different voices to different project directories by using project-level .env files instead of a single global one.
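
A sketch of that routing — the pick_env helper and the project-level .claude/.env layout are assumptions for illustration, not part of the script above:

```shell
# Resolve which .env a session should use: a project-level
# .claude/.env (hypothetical layout) wins over the global one.
pick_env() {
  local project_dir="$1" global_env="$2"
  if [ -f "$project_dir/.claude/.env" ]; then
    echo "$project_dir/.claude/.env"
  else
    echo "$global_env"
  fi
}

# Usage inside speak.sh, replacing the fixed source line:
#   source "$(pick_env "$PWD" "$HOME/.claude/scripts/.env")"
```

Each project's .env can then carry its own ELEVENLABS_VOICE_ID, so overlapping announcements at least arrive in distinguishable voices.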

[Diagram: two Claude Code sessions each POST to the ElevenLabs API with their own voice (one British, one American) and pipe into separate mpv instances, both playing through the speakers]
Multi-session voice routing with per-project voices

Key Takeaways

  1. The setup is trivial. One bash script, one .env file, one prompt block in CLAUDE.md. Under thirty minutes from start to hearing your first announcement. No Python, no Node, no containers.
  2. The prompt engineering matters more than the script. The CLAUDE.md instructions that define when to speak, what tone to use, and how to vary phrasing turn a raw TTS call into a coherent interaction pattern. Invest time tuning the personality and the variation rules.
  3. Background execution is critical. Always append & to the speak command. Voice output should never block agent work. A silent, fast agent beats a vocal, slow one every time.
  4. Cost is negligible. Individual announcements cost roughly a penny each. Even heavy daily use runs $0.50-1.00 per day on the Creator plan. The free macOS say command works if you want zero cost.
  5. Voice creates presence. A silent terminal agent is easy to ignore. A speaking agent feels like a collaborator. That psychological shift changes how you structure your work: you delegate more freely, context-switch more confidently, and catch errors faster.

Let's Build Something!

I help teams ship cloud infrastructure that actually works at scale. Whether you're modernizing a legacy platform, designing a multi-region architecture from scratch, or figuring out how AI fits into your engineering workflow, I've seen your problem before. Let me help.

Currently taking on select consulting engagements through Vantalect.