Introduction

VoiceLayer is a push-to-talk voice transcription SDK that adds voice input to any web application in two lines of code. Focus an input, hold a key, and speak: your words appear as you say them.

What is VoiceLayer?

VoiceLayer wraps browser microphone access, streams audio over a WebSocket to Deepgram's Nova-3 model, and injects the live transcript directly into whatever input the user is focused on. It works on any <input>, <textarea>, or contenteditable element — no configuration needed.
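To make the injection step concrete, here is a minimal sketch of placing a transcript at the caret of a text field. It is not VoiceLayer's actual implementation: the `TextFieldState` interface and the `insertTranscript` helper are hypothetical names chosen for illustration, and the field is modelled as plain data so the logic can run outside a browser.

```typescript
// Hypothetical model of a text field; mirrors the value/selectionStart/selectionEnd
// properties that <input> and <textarea> elements expose in the browser.
interface TextFieldState {
  value: string;
  selectionStart: number;
  selectionEnd: number;
}

// Hypothetical helper (not part of the VoiceLayer API): replaces the current
// selection with the transcript and moves the caret to the end of the inserted text.
function insertTranscript(field: TextFieldState, transcript: string): TextFieldState {
  const before = field.value.slice(0, field.selectionStart);
  const after = field.value.slice(field.selectionEnd);
  const caret = before.length + transcript.length;
  return {
    value: before + transcript + after,
    selectionStart: caret,
    selectionEnd: caret,
  };
}

// Example: the user has selected "world" and speaks "there".
const updated = insertTranscript(
  { value: "Hello world", selectionStart: 6, selectionEnd: 11 },
  "there",
);
console.log(updated.value); // "Hello there"
```

In a real page the same idea would read and write these properties on the focused element. A contenteditable element has no `value` or `selectionStart`, so it would need the browser's Selection and Range APIs instead.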

On mobile, a tap-to-speak pill appears automatically. On desktop, users hold the spacebar (or click the pill). Transcription is streaming — words appear as they're spoken, not after a delay.
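The hold-to-speak behaviour on desktop can be pictured as a small state machine. The sketch below is an illustration under assumptions, not the SDK's code: `PttState`, `KeyLikeEvent`, and `nextPttState` are hypothetical names, and the event shape copies only the `type`, `code`, and `repeat` fields of a browser `KeyboardEvent`.

```typescript
type PttState = "idle" | "recording";

// Subset of the browser KeyboardEvent fields that the sketch needs.
interface KeyLikeEvent {
  type: "keydown" | "keyup";
  code: string;    // "Space" for the spacebar
  repeat: boolean; // true for auto-repeated keydown events while a key is held
}

// Hypothetical reducer: start recording on the first keydown of the talk key,
// ignore auto-repeat events while it is held, and stop on keyup.
function nextPttState(state: PttState, ev: KeyLikeEvent, talkKey: string = "Space"): PttState {
  if (ev.code !== talkKey) return state;
  if (ev.type === "keydown" && !ev.repeat && state === "idle") return "recording";
  if (ev.type === "keyup" && state === "recording") return "idle";
  return state;
}

// Example: press, hold (auto-repeat), release.
let pttState: PttState = "idle";
pttState = nextPttState(pttState, { type: "keydown", code: "Space", repeat: false }); // "recording"
pttState = nextPttState(pttState, { type: "keydown", code: "Space", repeat: true });  // still "recording"
pttState = nextPttState(pttState, { type: "keyup", code: "Space", repeat: false });   // "idle"
```

Keeping the transition logic separate from the microphone and network code makes it easy to drive the same states from a tap on the pill as well as from the keyboard.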

How it works

User holds key / taps pill → SDK captures microphone → WebSocket → Deepgram Nova-3 → Text injected into input
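The last stage of this pipeline, turning a stream of transcript messages into the text shown in the input, might look like the sketch below. It assumes the server sends interim results that a later final result supersedes, as streaming speech APIs commonly do. The `TranscriptMessage` shape and the `TranscriptBuffer` class are hypothetical and do not describe VoiceLayer's wire format.

```typescript
// Hypothetical message shape; the real VoiceLayer protocol may differ.
interface TranscriptMessage {
  text: string;
  isFinal: boolean; // false: provisional text that may still change
}

// Joins two fragments with a single space, skipping empty fragments.
function joinWords(a: string, b: string): string {
  if (a === "") return b;
  if (b === "") return a;
  return a + " " + b;
}

// Keeps finalized text separate from the latest interim guess, so each new
// interim result replaces the previous one instead of being appended to it.
class TranscriptBuffer {
  private committed = "";
  private interim = "";

  apply(msg: TranscriptMessage): string {
    if (msg.isFinal) {
      this.committed = joinWords(this.committed, msg.text);
      this.interim = "";
    } else {
      this.interim = msg.text;
    }
    return this.current();
  }

  current(): string {
    return joinWords(this.committed, this.interim);
  }
}

// Example: two interim guesses, then the final version of the same phrase.
const buffer = new TranscriptBuffer();
buffer.apply({ text: "hello", isFinal: false });
buffer.apply({ text: "hello world", isFinal: false });
console.log(buffer.apply({ text: "Hello world.", isFinal: true })); // "Hello world."
```

The text returned by `apply` is what would be written into the focused input after each message, so the user sees words appear and settle while still speaking.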

Key features

Works on any input — drop in the script tag and every input, textarea, and contenteditable element on the page gets voice input automatically
Streaming transcription — words appear as you speak using Deepgram Nova-3, not after you stop
Mobile + desktop — tap-to-speak on mobile, hold-to-speak on desktop, both work simultaneously
Framework agnostic — works with React, Next.js, Vue, plain HTML — anything that runs in a browser
Open source — MIT licensed, self-hostable, auditable
2 KB SDK — tiny footprint, no dependencies in the browser bundle

Open source

The SDK and server are MIT licensed and available on GitHub. You can self-host the entire stack with your own Deepgram key, or use our hosted backend for zero-config setup.

GitHub: github.com/klhenry/voicelayer-sdk — SDK + WebSocket server + demo

Next steps

Quickstart — add voice input to your app in 2 minutes
SDK Reference — full API documentation
Self-Hosting — run your own VoiceLayer backend