How I Built a Browser-Only Screen Recorder (WebCodecs + ffmpeg.wasm)
Capturing, compositing, and rendering an MP4 entirely in the browser - no backend - and the three browser bugs that nearly beat me.
A while back I needed to send a teammate a 30-second screen recording. Every tool wanted me to install an app or create an account first - for half a minute of video. So I built the thing I actually wanted: open a tab, hit record, get an MP4. No install, no signup, and nothing uploaded unless you choose to share. It became YoRecord, and it runs entirely in the browser - capture, editing, and the final MP4 render all happen on your device.
Here is how it works, and the three browser bugs that taught me how the platform really behaves.
Capture is the easy part
Modern browsers hand you most of this for free:
// screen / window / tab
const screen = await navigator.mediaDevices.getDisplayMedia({ video: true, audio: true });
// webcam + mic
const cam = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
const chunks = []; // recorded Blob pieces, in arrival order
const rec = new MediaRecorder(screen, { mimeType: "video/webm;codecs=vp9" });
rec.ondataavailable = (e) => chunks.push(e.data);
rec.start();
That gets you a recording blob. The hard part is everything after you hit stop.
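One caveat with the snippet above: a hardcoded VP9 MIME type is not guaranteed to be accepted by every browser, and the MediaRecorder constructor throws on an unsupported type. A minimal sketch of probing first with MediaRecorder.isTypeSupported follows; the helper name, the candidate list, and its order are assumptions for illustration, not what YoRecord ships.

```javascript
// Hypothetical helper: the candidate list and its order are assumptions,
// not YoRecord's actual configuration.
const CANDIDATE_MIME_TYPES = [
  "video/webm;codecs=vp9,opus",
  "video/webm;codecs=vp8,opus",
  "video/webm",
];

// Returns the first type the predicate accepts, or undefined so the caller
// can omit mimeType and let the browser choose its default.
function pickMimeType(isSupported, candidates = CANDIDATE_MIME_TYPES) {
  return candidates.find((type) => isSupported(type));
}

// In a browser:
// const mimeType = pickMimeType((t) => MediaRecorder.isTypeSupported(t));
// const rec = new MediaRecorder(screen, mimeType ? { mimeType } : {});
```

Passing the predicate in, rather than calling MediaRecorder directly, keeps the helper testable outside a browser.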
The hard part: a real, edited MP4 - client-side
I wanted trimming, a draggable webcam overlay, zoom effects, and AI subtitles, then a clean MP4 export. The normal way to do that is a server running ffmpeg. I did not want a server - partly cost, mostly because I did not want to upload anyone's screen just to make the product work.
Two building blocks make it possible with no backend, one a library and one a browser API:
- ffmpeg.wasm (@ffmpeg/ffmpeg) - the full ffmpeg compiled to WebAssembly: muxing, audio filters, format conversion.
- WebCodecs (VideoEncoder/VideoDecoder) - low-level, hardware-accelerated frame encode and decode.
The exporter composites the screen frame and the webcam frame onto a canvas each tick, then encodes. I ended up with two paths: a simpler canvas.captureStream() + MediaRecorder path, and a WebCodecs path that demuxes, decodes, composites, and encodes entirely inside a Web Worker. That second path exists because of bug #1.
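The per-tick compositing step can be sketched roughly like this. It is an illustration under assumptions, not the actual exporter: the function names, the default scale and margin, and the corner names are all hypothetical.

```javascript
// Hypothetical layout helper; scale, margin, and corner names are
// illustrative assumptions.
function webcamOverlayRect(canvasW, canvasH, camW, camH, opts = {}) {
  const { scale = 0.25, margin = 16, corner = "bottom-right" } = opts;
  // Size the overlay as a fraction of the canvas width, keeping the
  // webcam's aspect ratio.
  const w = Math.round(canvasW * scale);
  const h = Math.round((w * camH) / camW);
  const x = corner.endsWith("right") ? canvasW - w - margin : margin;
  const y = corner.startsWith("bottom") ? canvasH - h - margin : margin;
  return { x, y, w, h };
}

// Draws one composited frame. ctx is a CanvasRenderingContext2D (or the
// OffscreenCanvas equivalent); the frames are anything drawImage accepts,
// such as a VideoFrame or an HTMLVideoElement.
function compositeFrame(ctx, screenFrame, webcamFrame, rect) {
  const { width, height } = ctx.canvas;
  ctx.drawImage(screenFrame, 0, 0, width, height); // screen fills the canvas
  ctx.drawImage(webcamFrame, rect.x, rect.y, rect.w, rect.h); // webcam on top
}
```

Keeping the layout math separate from the drawing call means the same rectangle can drive both the draggable preview and the export.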
Bug #1: the export dies when the tab loses focus
canvas.captureStream() + MediaRecorder runs on the main thread, and browsers throttle background tabs hard. Switch away mid-export and the render slows to a crawl or stalls - not acceptable for “record, then go do something else while it renders.”
Fix: move the whole encode into a Web Worker using WebCodecs. VideoEncoder in a worker keeps running at full speed even when the tab is backgrounded. If WebCodecs is not available, it falls back to the main-thread path.
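The fallback decision might look something like the sketch below. The function name, the path labels, and the encoder config are assumptions for illustration; VideoEncoder.isConfigSupported is the standard WebCodecs probe and resolves to an object with a supported flag.

```javascript
// Hypothetical path selection; the config values are illustrative
// assumptions, not the encoder settings YoRecord uses.
async function pickExportPath(scope, config) {
  // VideoEncoder is missing entirely in browsers without WebCodecs.
  if (typeof scope.VideoEncoder === "undefined") return "main-thread";
  try {
    // isConfigSupported resolves to { supported, config }.
    const { supported } = await scope.VideoEncoder.isConfigSupported(config);
    return supported ? "worker-webcodecs" : "main-thread";
  } catch {
    // An invalid config rejects; treat that as "not available" too.
    return "main-thread";
  }
}

// In a browser or worker:
// const path = await pickExportPath(globalThis, {
//   codec: "avc1.42001f", width: 1280, height: 720,
//   bitrate: 4_000_000, framerate: 30,
// });
```

Checking the specific config, and not only whether VideoEncoder exists, matters because a browser can expose WebCodecs without supporting the codec you want to encode.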
Bug #2: the webcam freezes for a second in the exported video
This is the one that cost the most sleep. canvas.captureStream(30) does not give you 30fps - it samples the canvas only when it changes. During a static stretch of screen (someone reading a slide), the screen source can drop to roughly one frame per second.
My export loop was “emit one output frame per source frame.” So when the screen went quiet, the loop stalled too - and the webcam, which was very much still moving, got frozen on whatever frame coincided with that slow tick. The exported MP4 had one-second webcam freezes.
Fix: drive the export at a fixed 30fps instead of following the source. Each tick, grab the latest available screen and webcam frames and emit a frame, even if the screen has not changed:
// fixed-rate loop, not "one output frame per source frame"
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
let t = 0; // output timestamp in ms
while (t < durationMs) {
  await encodeFrame(latestScreenFrame, latestWebcamFrame, t);
  t += 1000 / 30;
  await sleep(0); // yield so the async decoder feeds can advance
}
That sleep(0) is load-bearing: without it, fast test mocks (no encoder backpressure) starve the decode feeds and the loop never terminates.
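To make "grab the latest available frame" concrete, here is a small sketch of the bookkeeping behind a fixed-rate loop. Both helper names are hypothetical, and working in microseconds is an assumption on my part (it matches the unit WebCodecs uses for timestamps).

```javascript
// Hypothetical helpers; the names and the microsecond unit are assumptions.

// Output timestamps derived from the frame index, so rounding error does
// not accumulate the way repeated `t += 1000 / 30` can.
function outputTimestampsUs(durationMs, fps = 30) {
  const count = Math.ceil((durationMs * fps) / 1000);
  return Array.from({ length: count }, (_, i) => Math.round((i * 1e6) / fps));
}

// Index of the newest source frame at or before tUs, found by binary search
// over ascending timestamps. Returns -1 if no frame has arrived yet.
function latestFrameIndex(sourceTimestampsUs, tUs) {
  let lo = 0;
  let hi = sourceTimestampsUs.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (sourceTimestampsUs[mid] <= tUs) {
      found = mid; // candidate; look for a newer one to the right
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}
```

With a screen source that only delivered frames at 0s and 1s, every output tick in between reuses screen frame 0, while the same lookup against the webcam's timestamps keeps advancing. That is what removes the freeze.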
Bug #3: audio drifts about 100ms behind the video
To record the microphone and system audio together, I mix them through a WebAudio graph before handing them to the recorder. That graph adds about 100 milliseconds of processing and encode latency that the video track does not have - so the audio lands slightly late, and over a few minutes it is noticeable.
Fix: measure the latency (AudioContext.baseLatency + outputLatency plus a small empirical constant) and compensate. The fun part is that preview and export fix it in opposite directions for the same net effect - preview nudges the webcam later, export trims the audio's head earlier. (Do not reach for adelay here; it prepends silence and shifts the audio the wrong way. Ask me how I know.)
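As a rough sketch of the measure-then-trim idea on the export side: the helper names are hypothetical, the 20 ms default is a placeholder and not the constant I actually use, and expressing the trim as an atrim + asetpts filter chain is one way to do it that I am assuming here for illustration.

```javascript
// Hypothetical helpers. The 20 ms default is a placeholder; measure your own.
function estimateAudioLatencyMs(audioCtx, empiricalMs = 20) {
  // baseLatency and outputLatency are reported in seconds. outputLatency
  // is not implemented everywhere, so treat a missing value as 0.
  const base = Number.isFinite(audioCtx.baseLatency) ? audioCtx.baseLatency : 0;
  const output = Number.isFinite(audioCtx.outputLatency) ? audioCtx.outputLatency : 0;
  return (base + output) * 1000 + empiricalMs;
}

// ffmpeg audio filter that drops the first latencyMs of audio and re-bases
// timestamps so the remainder starts at zero, which moves the audio earlier.
// Assumes the export passes this string as an audio filter (-af).
function audioHeadTrimFilter(latencyMs) {
  const seconds = (Math.max(0, latencyMs) / 1000).toFixed(3);
  return `atrim=start=${seconds},asetpts=PTS-STARTPTS`;
}
```

Trimming the head moves every remaining sample earlier on the timeline, which is the opposite of what adelay does.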
What I deliberately did not do in the browser
Being honest about the edges:
- AI subtitles send the audio - just the audio - to a Whisper transcription API. That part is not local. Everything else (capture, edit, render, export) stays on your device, and nothing is uploaded unless you create a share link.
- It is Chrome, Edge, and Firefox on desktop today - no Safari, no mobile.
- Free exports carry a small “Made with YoRecord” watermark.
Takeaways
- getDisplayMedia + MediaRecorder gets you a recording in an afternoon. The render and export is 90% of the work.
- captureStream(fps) is change-driven, not clock-driven. Drive your own fixed-rate loop.
- Anything heavy (encoding) belongs in a Worker, or background tabs will throttle it.
- WebAudio mixing adds real latency - measure and compensate, do not eyeball it.
You can try the result at the recorder - no signup, and you can be recording within a few seconds. It is a free Loom alternative, if that is what brought you here.