11.9 Seconds: Building an Async Upload Framework with RabbitMQ at Capillary

11.9 seconds. That's how long a user stares at a spinner after uploading a WhatsApp video through Creatives.

Not a loading bar with progress feedback. A spinner — the universal symbol of "we have no idea when this will be done." No partial state. No acknowledgment. Just the full synchronous weight of S3 uploads, external API calls, and FFmpeg thumbnail generation dumped onto a single request thread, with the user on the other end waiting for all of it to finish before anything moves.

Context

Creatives is the asset management layer inside Capillary's Engage platform — where marketing teams upload images, videos, and PDFs that get delivered across WhatsApp, Viber, Email, SMS, and other channels. Every asset a brand sends through Engage passes through this pipeline. For WhatsApp specifically, uploaded media has to be registered with the BSP (Business Service Provider — Karix or Haptic in our case) before it can be sent. That BSP registration step is the problem.

The problem

The upload request was synchronous end to end. The user hit upload and waited for the API to finish every downstream operation before it returned anything: S3 upload, BSP registration, FFmpeg thumbnail generation, and MongoDB write, all in series, before a 201 Created arrived. For a WhatsApp video, that meant 11.9 seconds of spinner, with no indication of progress, no partial state, and no way to do anything else in the UI. The API Pod was sitting at 85% CPU, largely because threads were blocked waiting on a third-party API call we have no control over.

Where those 11.9 seconds actually went

Before we could fix anything, I mapped the sequence diagram. The Creatives upload pipeline had six actors in the synchronous flow: UI → API Pod → MongoDB, FileService (S3), WhatsApp BSP (Karix/Haptic), FFmpeg.

For a video upload on WhatsApp, the timeline looked like this:

T=0ms — POST /assets/video (multipart/form-data) received
T=700ms (+700ms) — Upload file to S3, get file_handle + version back
T=900ms (+200ms) — Fetch asset metadata from FileService
T=8400ms (+7500ms) — Upload media to WhatsApp BSP (Karix/Haptic)
T=11400ms (+3000ms) — FFmpeg generates video thumbnail → screenshot.png
T=11400ms — Thumbnail uploaded, thumbnail_url returned
T=11900ms (+500ms) — Asset entity saved to MongoDB → 201 Created returned to client

Seven and a half seconds on the WhatsApp BSP call alone. That's 63% of total request time spent on a single external API call we have no control over. The FFmpeg thumbnail takes another 3,000ms on top. The S3 upload, metadata fetch, and MongoDB write (the stuff we actually own) take 1,400ms combined.
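The breakdown is simple arithmetic, and it is worth being able to rerun when the timings change. A minimal sketch in JavaScript using the stage durations from the timeline; the variable names and the `owned` flag are illustrative, not taken from our codebase:

```javascript
// Stage durations in milliseconds, copied from the timeline above.
// `owned` marks the steps the text groups as "the stuff we actually own".
const stages = [
  { name: 'S3 upload', ms: 700, owned: true },
  { name: 'Metadata fetch', ms: 200, owned: true },
  { name: 'WhatsApp BSP upload', ms: 7500, owned: false },
  { name: 'FFmpeg thumbnail', ms: 3000, owned: false },
  { name: 'MongoDB write', ms: 500, owned: true },
];

const totalMs = stages.reduce((sum, stage) => sum + stage.ms, 0);

// Share of the total request time per stage, rounded to a whole percent.
const breakdown = stages.map((stage) => ({
  name: stage.name,
  ms: stage.ms,
  percent: Math.round((stage.ms / totalMs) * 100),
}));

// Combined time of the owned steps.
const ownedMs = stages
  .filter((stage) => stage.owned)
  .reduce((sum, stage) => sum + stage.ms, 0);

console.log({ totalMs, ownedMs, breakdown });
```

The total comes to 11,900ms, the BSP stage to 63% of it, and the owned steps to 1,400ms, matching the figures quoted above.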

The API Pod was running at 85% CPU. This isn't a code efficiency problem. This is an architectural problem.

The constraints

The external BSP call was not removable. Karix and Haptic are third-party integrations with fixed latency characteristics — that 7,500ms is not a code problem, it's a network and API problem. The 201 sync path had to keep working for channels not yet moved to async — we couldn't force a coordinated cutover across all channels at once. And the frontend state management had to be owned client-side: the backend could not maintain long-lived connections or push events. RabbitMQ infrastructure already existed in cap-creatives-api — that was already the tooling decision.

What we considered

For the backend: a job queue was the clear path; the only question was which one. RabbitMQ was already there, which made it the obvious choice over SQS or Kafka. Not because it was the best message broker for this use case in the abstract, but because adding infrastructure dependencies for an incremental improvement is a bad tradeoff.

For the frontend status tracking: Server-Sent Events and WebSockets were both on the table. SSE would have given us server-push without client polling, but it requires persistent server-side connections — every active upload holds a connection open. WebSockets have the same connection overhead. Polling on a bounded 90-second window is operationally simpler, eliminates persistent connections, and for a problem with a known maximum duration, adds no meaningful UX degradation. Simple wins when the constraints don't demand complex.

The decision

RabbitMQ + 202 Accepted + client-side polling. That's what we built. The BSP call moves off the request thread. The frontend gets an assetId and polls for completion. The 201 path stays unchanged. No forced coordination across channels.

The cost: the frontend now owns upload state. A synchronous endpoint either returns a result or fails — the caller's responsibility ends there. An async endpoint returns a tracking key and a promise. The caller has to manage the waiting, the timeout, the failure state, the concurrent-upload safety. That work has to live somewhere, and we put it in the client. Whether that's the right tradeoff depends on who has to maintain it. The frontend complexity is real; it's documented in the extension pattern so the next engineer can inherit rather than rebuild it.

The backend fix — RabbitMQ and the 202 contract

The fix on the backend is conceptually simple: decouple the slow stuff. The upload endpoint now accepts the request, drops the job into a RabbitMQ queue, and returns immediately. Processing happens downstream. The user gets control of the UI back in under two seconds instead of waiting twelve.

The mechanism: HTTP 202 Accepted.

POST /assets/video
→ 202 Accepted
→ { assetId, asset: { _id, type, ... }, processingStatus: 'processing' }

That's the entire contract. The frontend gets an assetId and a status. The backend handles S3, BSP, FFmpeg, and MongoDB in the background. RabbitMQ lives entirely in cap-creatives-api — queue topology, exchange names, routing keys — none of that is visible to the client. The 202 status code is the only API surface that changed.
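To make the shape of that contract concrete, here is a rough sketch of an accept-enqueue-return handler. It is written in JavaScript purely for illustration; this post does not show the backend's actual language, framework, or message format. `queue.publish` is a hypothetical stand-in for a RabbitMQ publish call and `newAssetId` for whatever ID generation the real service uses.

```javascript
// Hypothetical handler: accept the upload, enqueue the slow work, return 202.
// `queue` and `newAssetId` are injected so the sketch runs without a broker.
function handleVideoUpload(upload, { queue, newAssetId }) {
  const assetId = newAssetId();

  // Hand the slow steps (S3, BSP registration, FFmpeg, MongoDB) to a worker.
  // The message fields here are assumptions, not the real payload.
  queue.publish({ assetId, type: 'video', fileName: upload.fileName });

  // Respond immediately; the client tracks progress with assetId.
  return {
    statusCode: 202,
    body: {
      assetId,
      asset: { _id: assetId, type: 'video' },
      processingStatus: 'processing',
    },
  };
}

// Usage with an in-memory fake queue.
const published = [];
const fakeQueue = { publish: (message) => published.push(message) };
const response = handleVideoUpload(
  { fileName: 'promo.mp4' },
  { queue: fakeQueue, newAssetId: () => 'asset-123' },
);
console.log(response.statusCode, response.body.processingStatus);
```

The point of the sketch is the ordering: nothing slow happens between receiving the request and returning the 202.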

The 201 path still works unchanged. Channels not yet moved to async continue functioning — zero migration risk, zero forced coordination.

The frontend architecture — where the interesting engineering actually lives

Here's the thing about a 202 response: you haven't solved the UX problem, you've just moved it. The user still needs to know when their upload is done. You've traded a synchronous wait for an asynchronous one, and now you own the state management on the client side.

The backend being async made this easy on one end and interesting on the other. The frontend had to build a complete upload tracking framework — polling engine, state management, concurrency safety, cancellation — without knowing anything about RabbitMQ internals.

This is where most of the engineering work actually happened.

The 202 branch

The saga checks one thing when an upload completes:

statusCode === 202 || responseData?.processingStatus === 'processing'

If the check passes (a 202, or a body still marked processing): extract assetId, start polling. Otherwise, on a plain 201: existing sync flow, untouched.

The assetId becomes the tracking key for everything downstream.
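As a sketch, the branch can be isolated in a small pure function that is easy to test outside the saga. The function names and the returned `mode` field are hypothetical; only the condition itself comes from the check quoted above.

```javascript
// True when the backend accepted the upload for background processing.
function isAsyncUpload(statusCode, responseData) {
  return statusCode === 202 || responseData?.processingStatus === 'processing';
}

// Decide which flow the saga should follow for a given upload response.
function routeUploadResponse(statusCode, responseData) {
  if (isAsyncUpload(statusCode, responseData)) {
    // Async path: assetId is the tracking key for polling and state.
    return { mode: 'async', assetId: responseData?.assetId };
  }
  // Sync path: the response already carries the finished asset.
  return { mode: 'sync', responseData };
}

console.log(routeUploadResponse(202, { assetId: 'a1', processingStatus: 'processing' }));
console.log(routeUploadResponse(201, { _id: 'a2' }));
```

Keeping the check in one place means a channel that has not moved to async never touches the polling code at all.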

The polling engine

Centralized in app/sagas/assetPolling.js. Channel-agnostic — the same loop handles WhatsApp, Viber, and anything else wired to the framework.

The parameters:

1s initial delay — avoids an immediate poll before the backend has had time to process anything
2s fixed interval — polls GET /assets/{type}/{assetId}/status
90s hard ceiling — client-side safety net for worker crashes; about 44 max requests in the window

Fixed interval, not exponential backoff. Backoff makes sense when you're dealing with overloaded servers and want to reduce load. Here the polling window is bounded at 90 seconds — exponential backoff would just mean worse UX with no meaningful load reduction. Simple and predictable wins.

Status flow: processing → completed | failed | timeout.

timeout is a client-side sentinel. The backend never returns it. The UI generates it after 90 seconds to prevent a permanent spinner if a backend worker crashes without reporting failure.
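A stripped-down version of such a loop, written as a saga-style generator so it can be stepped through without timers or a network. Only the 1s, 2s, and 90s values and the status endpoint come from the description above; the function names and effect shapes are hypothetical, and elapsed time is counted from the delays alone, which ignores request latency.

```javascript
const INITIAL_DELAY_MS = 1000;
const POLL_INTERVAL_MS = 2000;
const MAX_WAIT_MS = 90000;

// Yields plain effect descriptions and receives the result of each
// 'fetchStatus' effect through next().
function* pollAssetStatus(assetType, assetId) {
  let elapsedMs = INITIAL_DELAY_MS;
  yield { type: 'delay', ms: INITIAL_DELAY_MS };

  while (elapsedMs < MAX_WAIT_MS) {
    const response = yield {
      type: 'fetchStatus',
      url: `/assets/${assetType}/${assetId}/status`,
    };
    if (response.status === 'completed' || response.status === 'failed') {
      return response.status; // terminal states reported by the backend
    }
    yield { type: 'delay', ms: POLL_INTERVAL_MS };
    elapsedMs += POLL_INTERVAL_MS;
  }
  return 'timeout'; // client-side sentinel, never sent by the backend
}

// Tiny synchronous driver: `respond(pollNumber)` fakes the status endpoint.
function runPolling(generator, respond) {
  let polls = 0;
  let step = generator.next();
  while (!step.done) {
    step = step.value.type === 'fetchStatus'
      ? generator.next(respond(polls++))
      : generator.next();
  }
  return { result: step.value, polls };
}

// A backend that finishes on the third poll, and one that never answers.
const finished = runPolling(pollAssetStatus('video', 'a1'), (n) => ({
  status: n < 2 ? 'processing' : 'completed',
}));
const stuck = runPolling(pollAssetStatus('video', 'a2'), () => ({
  status: 'processing',
}));
console.log(finished.result, stuck.result);
```

In a real Redux-Saga implementation the effect objects would be replaced by the library's own `delay` and `call` effects; the control flow stays the same.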

The video two-phase completion edge case

Video uploads have a subtlety. The backend marks the job completed when processing finishes, but the thumbnail may not be attached yet: in some cases metaInfo.video_file_path_preview is populated only after status: completed has already been reported.

Upload done ≠ thumbnail ready.

The saga handles this transparently inside the shared polling loop: even after the backend reports completed, polling continues until video_file_path_preview is set. No special-casing in the UI layer. The caller doesn't need to know this edge case exists.
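One way to express that rule is a single predicate that decides whether polling may stop. This is a sketch under assumptions: the function name is hypothetical, and it assumes the status response exposes the asset under `asset` with `metaInfo` nested inside it, which this post does not spell out.

```javascript
// Returns true when polling can stop for this asset.
function isUploadSettled(assetType, response) {
  if (response.status === 'failed') {
    return true; // failures are terminal regardless of asset type
  }
  if (response.status !== 'completed') {
    return false; // still processing
  }
  if (assetType !== 'video') {
    return true; // non-video assets are done as soon as they complete
  }
  // Video: completed only counts once the thumbnail path is present.
  return Boolean(response.asset?.metaInfo?.video_file_path_preview);
}

console.log(isUploadSettled('video', { status: 'completed', asset: { metaInfo: {} } }));
console.log(isUploadSettled('video', {
  status: 'completed',
  asset: { metaInfo: { video_file_path_preview: 'screenshot.png' } },
}));
```

Because the 90-second ceiling still applies, a thumbnail that never arrives ends in timeout rather than an endless loop.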

Concurrent upload safety — assetId-keyed state

Multiple uploads should be able to run concurrently without state collisions. The solution is structural, not procedural:

assetProcessing: {
  [assetId]: {
    status: 'processing' | 'completed' | 'failed' | 'timeout',
    startTime: <epoch ms>,
    asset: { ... },
    error: null | <string>,
  }
}

Each upload lives in its own keyed slot. No shared mutable state. No locking primitives. Concurrent uploads can't collide because they're never operating on the same key.

The assetId the backend assigns in the 202 response is the key. The backend does the work of making them unique. The frontend just uses them.
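A reducer over that shape could look roughly like the following. The action type strings and action fields are made up for the example; only the slot shape mirrors the state described above.

```javascript
const initialState = { assetProcessing: {} };

// Returns a new state with one slot replaced; every other slot is reused as-is.
function setSlot(state, assetId, slot) {
  return {
    ...state,
    assetProcessing: { ...state.assetProcessing, [assetId]: slot },
  };
}

function assetProcessingReducer(state = initialState, action) {
  const current = state.assetProcessing[action.assetId];
  switch (action.type) {
    case 'UPLOAD_PROCESSING':
      return setSlot(state, action.assetId, {
        status: 'processing',
        startTime: action.startTime,
        asset: action.asset,
        error: null,
      });
    case 'UPLOAD_COMPLETED':
      return setSlot(state, action.assetId, {
        ...current,
        status: 'completed',
        asset: action.asset,
      });
    case 'UPLOAD_FAILED':
      return setSlot(state, action.assetId, {
        ...current,
        status: 'failed',
        error: action.error,
      });
    case 'UPLOAD_TIMEOUT':
      return setSlot(state, action.assetId, { ...current, status: 'timeout' });
    default:
      return state;
  }
}

// Two uploads in flight; completing one leaves the other untouched.
let state = assetProcessingReducer(undefined, {
  type: 'UPLOAD_PROCESSING', assetId: 'a1', startTime: 1000, asset: { _id: 'a1' },
});
state = assetProcessingReducer(state, {
  type: 'UPLOAD_PROCESSING', assetId: 'a2', startTime: 1500, asset: { _id: 'a2' },
});
const next = assetProcessingReducer(state, {
  type: 'UPLOAD_COMPLETED', assetId: 'a1', asset: { _id: 'a1' },
});
console.log(next.assetProcessing.a1.status, next.assetProcessing.a2.status);
```

Every handler writes to exactly one key, which is what makes the no-locking claim hold.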

`takeLatest` cancellation

If the user uploads again before the previous job resolves, the stale polling saga needs to be torn down — otherwise you end up with two competing sagas writing to overlapping state.

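Redux-Saga's takeLatest handles this automatically. When a new action matching the watched pattern is dispatched, it cancels the task started for the previous one, if that task is still running, and starts a fresh one. No manual cleanup, no tracking which saga is "current." The framework handles it.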

On failure, the saga dispatches onFailed and terminates. No infinite retry — network errors surface to the UI immediately.
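The cancel-previous behaviour of takeLatest can be imitated in a few lines of plain JavaScript, which helps when reasoning about what happens to the stale worker. This is a stand-in for the semantics only, not Redux-Saga itself, and `createLatestOnly` and `pollWorker` are hypothetical names. In Redux-Saga the equivalent wiring is a watcher that calls `takeLatest(pattern, saga)`.

```javascript
// Keeps at most one worker generator alive; a new action cancels the old one.
function createLatestOnly(worker) {
  let running = null; // generator for the most recent action, if any

  return function onAction(action) {
    if (running) {
      // Generator.prototype.return() resumes the generator as if a return
      // statement ran at the paused yield, so its finally block executes.
      running.return();
    }
    running = worker(action);
    running.next(); // advance to the first yield so the task has started
    return running;
  };
}

const log = [];

// Hypothetical polling worker: loops until it is cancelled.
function* pollWorker(action) {
  log.push(`start ${action.assetId}`);
  try {
    while (true) {
      yield 'poll';
    }
  } finally {
    log.push(`stop ${action.assetId}`); // cleanup runs on cancellation
  }
}

const onUpload = createLatestOnly(pollWorker);
onUpload({ assetId: 'a1' });
onUpload({ assetId: 'a2' }); // cancels the a1 worker before starting a2

console.log(log); // [ 'start a1', 'stop a1', 'start a2' ]
```

The cancelled worker gets a chance to clean up in its finally block, which is also where a cancelled Redux-Saga task lands.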

The extension pattern

WhatsApp and Viber are live. MobilePush, Email, SMS, and RCS need no new API layer to adopt this — the generic utilities already exist. Wiring a new channel to the async framework takes three factory functions and about 15 lines:

// 1. Constants — generates 4 action types
createAsyncAssetUploadConstants('MOBILEPUSH', prefix)

// 2. Reducer — generates 4 handlers
createAsyncAssetUploadReducerCases({ PROCESSING, COMPLETED, FAILED, TIMEOUT })

// 3. Polling config — wires the saga
createPollingConfig(assetType, assetId, actions, templateType)

That's the entire wiring surface. The polling loop, state shape, cancellation, two-phase video handling — all inherited. Each new channel adopts the framework by describing itself to it, not by reimplementing it.
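For a feel of what factories like these do, here is a guess at the first two, deliberately given different names so they are not mistaken for the real ones. The action type format, the prefix value, and the handler bodies are all assumptions; the real implementations live in the cap-creatives-ui repo and may differ.

```javascript
const STATUSES = ['PROCESSING', 'COMPLETED', 'FAILED', 'TIMEOUT'];

// Builds the four action types for one channel.
function makeUploadConstants(channel, prefix) {
  const constants = {};
  for (const status of STATUSES) {
    constants[status] = `${prefix}/${channel}_ASSET_UPLOAD_${status}`;
  }
  return constants;
}

// Builds one handler per action type; each handler updates a single
// assetId-keyed slot and leaves the rest of the state alone.
function makeUploadReducerCases(types) {
  const toStatus = (status) => (state, action) => ({
    ...state,
    [action.assetId]: { ...state[action.assetId], status },
  });
  return {
    [types.PROCESSING]: toStatus('processing'),
    [types.COMPLETED]: toStatus('completed'),
    [types.FAILED]: toStatus('failed'),
    [types.TIMEOUT]: toStatus('timeout'),
  };
}

const types = makeUploadConstants('MOBILEPUSH', 'app/Creatives');
const cases = makeUploadReducerCases(types);
const updated = cases[types.COMPLETED](
  { a1: { status: 'processing' } },
  { assetId: 'a1' },
);
console.log(Object.keys(cases).length, updated.a1.status);
```

The channel only supplies its name; everything else is generated, which is why the wiring surface stays small.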

The full walkthrough lives in docs/Generic_Async_Asset_Upload_Guide.md — an internal document in the cap-creatives-ui repo.

What changed

The numbers, measured post-rollout Feb 3–7, 2026 (CAP-161024):

P99 response time: 6.69s → 1.82s (73% lower)
Trace duration: 4,000ms → 500ms (88% lower)
Error rate: 0% (no regression)

The P99 of 6.69s is across all asset types; images upload faster since there's no FFmpeg or BSP call. For video specifically, the synchronous path was 11.9s. Both numbers are real: the P99 is a tail-latency percentile over the whole mix of asset types, not an average, and the 11.9s is the worst case a user uploading a WhatsApp video actually experienced.

The API Pod CPU dropped off the 85% ceiling. The external BSP call no longer pins a request thread.

What this bought

The 202 contract is tiny. One status code change on the backend, one branch in a saga on the frontend. The backend can change its queue topology, add worker replicas, or swap brokers without the frontend knowing. The frontend can change its feedback model without touching the API. They evolve independently, and for a platform with more channels still waiting to go async, that matters.

The harder part wasn't making the async path work. It was making it safe — concurrent uploads, stale saga cancellation, the video thumbnail edge case — and making it extensible so the next engineer wiring MobilePush or Email doesn't have to think about any of this. Three factory functions and fifteen lines. That's the whole wiring surface.

If I were starting this again, I'd build the extension pattern before wiring the first channel. WhatsApp and Viber were built as one-offs before the framework existed — the framework emerged from the repetition, which meant a refactor. You can't always design abstractions in advance, but treating the second channel as a forcing function for the pattern would have saved a pass.

One 202 status code. The rest is just engineering.