Introduction

VoiceLayer is a push-to-talk voice transcription SDK that adds voice input to any web application in two lines of code. Focus an input, hold a key, and speak: your words appear in the field as you say them.

What is VoiceLayer?

VoiceLayer wraps browser microphone access, streams audio over a WebSocket to Deepgram's Nova-3 model, and injects the live transcript directly into whatever input the user is focused on. It works on any <input>, <textarea>, or contenteditable element — no configuration needed.
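The injection step can be pictured as splicing the transcript into the field's value at the caret. This is a minimal sketch, not VoiceLayer's actual implementation: `FieldState` and `insertTranscript` are hypothetical names, and the sketch only models `<input>` and `<textarea>` fields, where the browser exposes `value`, `selectionStart`, and `selectionEnd`.

```typescript
// Hypothetical model of a text field; mirrors the DOM properties
// value / selectionStart / selectionEnd on <input> and <textarea>.
interface FieldState {
  value: string;
  selectionStart: number;
  selectionEnd: number;
}

// Replace the current selection (or insert at the caret when nothing is
// selected) with the transcript, then place the caret after the new text.
function insertTranscript(field: FieldState, transcript: string): FieldState {
  const before = field.value.slice(0, field.selectionStart);
  const after = field.value.slice(field.selectionEnd);
  const caret = before.length + transcript.length;
  return {
    value: before + transcript + after,
    selectionStart: caret,
    selectionEnd: caret,
  };
}
```

A contenteditable element has no `value` property, so a real implementation would need a separate path there, most likely built on the browser's Selection and Range APIs.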

On mobile, a tap-to-speak pill appears automatically. On desktop, users hold the spacebar (or click the pill). Transcription is streaming — words appear as they're spoken, not after a delay.
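The hold-to-talk behavior on desktop amounts to a small state machine: start recording on the first keydown, ignore auto-repeat while the key is held, and stop on keyup. The sketch below is illustrative only. `pushToTalk`, `PttState`, and `KeyInput` are hypothetical names rather than SDK exports, and the `"Space"` default assumes the hotkey is matched against `KeyboardEvent.code`.

```typescript
type PttState = "idle" | "recording";

// Subset of KeyboardEvent fields the state machine needs.
type KeyInput =
  | { type: "keydown"; code: string; repeat: boolean }
  | { type: "keyup"; code: string };

type PttAction = "start" | "stop" | null;

// Pure transition function: returns the next state plus the side effect
// ("start" or "stop" recording) the caller should perform, if any.
function pushToTalk(
  state: PttState,
  input: KeyInput,
  hotkey: string = "Space"
): { state: PttState; action: PttAction } {
  // Keys other than the hotkey never change the state.
  if (input.code !== hotkey) return { state, action: null };

  // First physical press: begin recording. Auto-repeat events are skipped.
  if (input.type === "keydown" && !input.repeat && state === "idle") {
    return { state: "recording", action: "start" };
  }
  // Release while recording: stop.
  if (input.type === "keyup" && state === "recording") {
    return { state: "idle", action: "stop" };
  }
  return { state, action: null };
}
```

In a browser, this would be driven from `keydown` and `keyup` listeners that pass along `event.code` and `event.repeat`. Keeping the transition logic pure makes it testable without a DOM.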

How it works

1. User holds key / taps pill
2. SDK captures microphone
3. WebSocket → Deepgram Nova-3
4. Text injected into input
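Streaming transcription typically delivers interim results that get revised until a final result replaces them. The following sketch shows one way a client could merge such updates into the text it displays. The `TranscriptUpdate` shape and the `LiveTranscript` class are assumptions made for illustration; the message format VoiceLayer's server actually sends is not described here.

```typescript
// Hypothetical update shape: interim results may be revised later,
// final results are committed and never change.
interface TranscriptUpdate {
  text: string;
  isFinal: boolean;
}

class LiveTranscript {
  private committed: string[] = [];
  private interim = "";

  // Apply one update and return the full text to display.
  apply(update: TranscriptUpdate): string {
    const text = update.text.trim();
    if (update.isFinal) {
      // A final result replaces whatever interim text was showing.
      if (text !== "") this.committed.push(text);
      this.interim = "";
    } else {
      // Each interim result overwrites the previous one.
      this.interim = text;
    }
    return this.text();
  }

  // Committed segments followed by the current interim segment, if any.
  text(): string {
    return this.committed
      .concat(this.interim)
      .filter((s) => s !== "")
      .join(" ");
  }
}
```

Each call to `apply` returns the string that would be written into the focused input, so the visible text grows as the user speaks and settles once a segment is finalized.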

Key features

Open source

The SDK and server are MIT-licensed and available on GitHub. You can self-host the entire stack with your own Deepgram key, or use our hosted backend for zero-config setup.

GitHub: github.com/klhenry/voicelayer-sdk — SDK + WebSocket server + demo

Next steps