GrowthStack — Building an End-to-End Experimentation Framework at BrowserStack

Changing who sees what required a full code change, a PR review, a deploy, and a wait. If you wanted to exclude a specific plan tier from an experiment mid-flight, you were looking at two days minimum. That's not experimentation. That's configuration theater.

The growth team had been running experiments that way for months. Every targeting change went through engineering. Every iteration cycled through the deploy pipeline. The constraint wasn't ideas — it was infrastructure.

The problem that forced the decision

BrowserStack has a sprawling pricing surface. Multiple product lines, plan tiers, user types, billing states — the combinations multiply fast. Growth experiments aren't just "show variant A to 50% of users." They're "show this upgrade nudge to free-trial users on the Automate product who haven't used the API in the last 30 days, excluding enterprise accounts." That kind of rule complexity doesn't fit into most off-the-shelf tools gracefully.

Beyond the targeting complexity, there was a throughput problem. The checkout surface and pricing pages carry serious load. Any experimentation layer running on top of them needs to be fast and silent — no flicker, no delayed rendering, no visible state changes after page load. Users in a pricing experiment can't see the wrong variant for half a second before the correct one snaps in. That's not a UX concern. That's a conversion concern.

We were running growth experiments manually before this. Hardcoded conditions, feature-flagged deployments, slow audience iteration. The product team wanted to accelerate experiment velocity. Engineering wanted a system that could actually scale to 200+ experiments without becoming a maintenance crisis.

Those two requirements pointed in the same direction: we needed infrastructure.

What we considered

The obvious alternatives were Wingify's VWO, GrowthBook, and LaunchDarkly. We evaluated them seriously. The problem wasn't capability: these are mature tools. The problem was fit.

GrowthBook is open-source and flexible, but it's built around the assumption that your targeting rules are relatively standard. Our rules needed to understand BrowserStack plan state, billing tier, product-specific usage history, and account type. Plugging that context in cleanly would have required building custom integrations on top of a tool that wasn't designed around our data model.

LaunchDarkly solves the targeting problem, but its pricing at our request volume made the conversation complicated. More importantly, it would have meant externalizing control over something that was becoming critical growth infrastructure.

Wingify's VWO is oriented toward marketing experiments — visual editors, JavaScript snippet injection, campaign-style management. That's a different model from what we needed, which was experiment logic embedded in application code paths, not layered on top of the DOM.

The in-house route wasn't a "we'll build it better" decision. It was a "the fit gap is large enough that building is faster than adapting" decision. That's a different claim, and it matters. We weren't building GrowthStack because it's more fun to build than to buy. We were building it because the BrowserStack-specific requirements — pricing plan awareness, deep monorepo integration, experiment logic tied to billing state — would have cost more to retrofit onto an existing tool than to build correctly from the start.

The architecture: three layers

GrowthStack is decomposed into three distinct responsibilities. This separation was intentional, and it's what makes the system composable across 10+ products without turning into a spaghetti of experiment conditions scattered across every codebase.

Layer 1: Allocate

This is where users enter experiments. Allocate handles trigger points — the conditions under which a user becomes eligible for an experiment — and runs rule-based evaluation against those conditions.

Rules can target by plan type, user attributes, product context, geography, and account size. When a user matches the conditions for an active experiment, Allocate assigns them to a variant: control or treatment. This assignment is stored and consistent, so a user doesn't flip variants between sessions.
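One way to get that stickiness is to compute the variant once and persist it. The sketch below is a minimal illustration under assumptions: AssignmentStore, fnv1a, and the in-memory Map are hypothetical stand-ins, not GrowthStack's actual storage or bucketing scheme.

```typescript
// Hypothetical sketch: names and storage are illustrative, not GrowthStack's real API.
type Variant = "control" | "treatment";

// FNV-1a 32-bit hash, used here only to turn a string key into a stable bucket.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

class AssignmentStore {
  // Assumption: an in-memory map stands in for whatever persistence the real system uses.
  private readonly assignments = new Map<string, Variant>();

  assign(userId: string, experimentId: string, treatmentShare: number = 0.5): Variant {
    const key = `${experimentId}:${userId}`;
    const existing = this.assignments.get(key);
    if (existing !== undefined) {
      return existing; // stored assignment wins, so the user never flips variants
    }
    const bucket = fnv1a(key) / 0x100000000; // value in [0, 1)
    const variant: Variant = bucket < treatmentShare ? "treatment" : "control";
    this.assignments.set(key, variant);
    return variant;
  }
}
```

Because the stored value is returned before any bucketing happens, later changes to the traffic split affect only users who have not been assigned yet.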

The rule creation engine is the core of this layer. It's the thing that took audience modification from a two-day deploy cycle to instant. A growth PM can update experiment targeting criteria through the UI, and it takes effect immediately. No code change, no PR, no deploy. The rules evaluate at runtime against the current user context.
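The reason targeting can change without a deploy is that rules are data, not code. A minimal sketch of that idea follows, using the upgrade-nudge example from earlier. The UserContext fields, the Condition shape, and the operator names are assumptions made for illustration, not the real rule schema.

```typescript
// Hypothetical shapes: field names and operators are illustrative assumptions.
interface UserContext {
  plan: string;
  product: string;
  accountType: string;
  daysSinceLastApiUse: number;
}

type Condition =
  | { field: "plan" | "product" | "accountType"; op: "eq" | "neq"; value: string }
  | { field: "daysSinceLastApiUse"; op: "gte"; value: number };

interface TargetingRule {
  experimentId: string;
  conditions: Condition[]; // every condition must hold for the user to be eligible
}

function holds(c: Condition, user: UserContext): boolean {
  if (c.op === "gte") {
    return user[c.field] >= c.value;
  }
  const actual = user[c.field];
  return c.op === "eq" ? actual === c.value : actual !== c.value;
}

function matches(rule: TargetingRule, user: UserContext): boolean {
  return rule.conditions.every((c) => holds(c, user));
}

// "Free-trial users on Automate who haven't used the API in 30 days, excluding enterprise."
const upgradeNudgeRule: TargetingRule = {
  experimentId: "upgrade-nudge-v2",
  conditions: [
    { field: "plan", op: "eq", value: "free-trial" },
    { field: "product", op: "eq", value: "automate" },
    { field: "daysSinceLastApiUse", op: "gte", value: 30 },
    { field: "accountType", op: "neq", value: "enterprise" },
  ],
};
```

Editing the audience then means editing the conditions array, which a UI can write and the runtime can re-read without any code shipping.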

Layer 2: Evaluate

Evaluate runs as an interceptor before any page or workflow loads. At initialization, it checks whether the current user is part of any active experiment and resolves which variant they should see.

This is the moment-of-truth layer. Allocate may have assigned a user to an experiment days ago. Evaluate is what determines, right now at page load, what that assignment means for this specific page state. It handles the case where a user was allocated to an experiment that has since been paused, or where the variant logic has changed.

The reason Evaluate runs before render is to prevent flicker. If experiment resolution happened after page load, users would see the control state briefly before snapping to their assigned variant. Given the conversion rates BrowserStack operates at, that flicker has a measurable cost. Evaluate makes that a non-issue by resolving before anything is rendered.
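The ordering can be sketched as "resolve first, then render with the resolved result." The code below is an illustration only: resolveVariants and renderPricingPage are hypothetical names, and treating a paused experiment as "fall back to the default experience" is an assumption, since the exact paused behavior isn't specified here.

```typescript
// Hypothetical sketch of resolve-before-render; not the actual Evaluate implementation.
type Variant = "control" | "treatment";
type ExperimentStatus = "active" | "paused";

interface Assignment {
  experimentId: string;
  variant: Variant;
}

// Keep only assignments whose experiment is active right now.
// Assumption: paused or unknown experiments resolve to nothing, i.e. the default experience.
function resolveVariants(
  assignments: Assignment[],
  statusById: ReadonlyMap<string, ExperimentStatus>,
): Map<string, Variant> {
  const resolved = new Map<string, Variant>();
  for (const a of assignments) {
    if (statusById.get(a.experimentId) === "active") {
      resolved.set(a.experimentId, a.variant);
    }
  }
  return resolved;
}

// Rendering takes the resolved map as input, so it cannot start from a guessed state.
function renderPricingPage(resolved: ReadonlyMap<string, Variant>): string {
  return resolved.get("upgrade-nudge-v2") === "treatment"
    ? "pricing-with-nudge"
    : "pricing-default";
}
```

Since the render function only ever receives an already-resolved map, there is no intermediate control state to flash before the variant appears.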

Layer 3: Expose API

Downstream teams don't interact with Allocate or Evaluate directly. They consume the Expose API — a set of utility methods that answer one question: "Is this user/group part of experiment X?"

Product teams call these methods to gate features, render experiment-specific components, or apply variant-specific logic on the backend. The complexity of allocation and evaluation is completely abstracted away. A consuming team doesn't need to understand how the rule engine works. They call isUserInExperiment('upgrade-nudge-v2') and get a boolean.
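From the consumer's side, the contract might look like the sketch below. Only isUserInExperiment is named in this write-up; createExposeApi, getVariant, and the idea of passing in an already-resolved map are assumptions added for illustration. The sketch also assumes "in the experiment" means the user has a resolved variant of either kind.

```typescript
// Hypothetical sketch of the consumer-facing surface; only isUserInExperiment comes from the text.
type Variant = "control" | "treatment";

interface ExposeApi {
  isUserInExperiment(experimentId: string): boolean;
  getVariant(experimentId: string): Variant | undefined; // assumed helper, not confirmed
}

// Assumption: the resolved assignments are handed in by the layers underneath.
function createExposeApi(resolved: ReadonlyMap<string, Variant>): ExposeApi {
  return {
    isUserInExperiment(experimentId: string): boolean {
      return resolved.has(experimentId);
    },
    getVariant(experimentId: string): Variant | undefined {
      return resolved.get(experimentId);
    },
  };
}

// A consuming team gates a component without knowing how rules or allocation work.
function pricingBanner(api: ExposeApi): string {
  return api.isUserInExperiment("upgrade-nudge-v2") && api.getVariant("upgrade-nudge-v2") === "treatment"
    ? "upgrade-nudge"
    : "default-banner";
}
```

The consuming code depends only on the ExposeApi interface, which is what lets the internals change without breaking callers.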

This design decision — treating Expose as a clean API surface rather than asking teams to integrate with the internals — is what made adoption across 10+ products viable. Each team got a stable contract. The internals could change without affecting them.

The Growth Package and service layer

GrowthStack by itself handles experimentation. But in a monorepo housing multiple products and shared packages, there's another problem: how do product teams actually consume it without recreating integration code in every application?

The Growth Package is the answer. It's a shared library that sits in the monorepo and gives product teams a consistent way to interact with GrowthStack — component-level experiment logic, state management, API communication.

Before the Growth Package service layer, product teams that wanted to integrate experiment logic into their UIs had to coordinate with the product engineering team for any change. The service layer removed that dependency. All the business logic for growth experiments lives inside the Growth Package. Product teams consume it. The engineering team owns it. No coordination overhead for change requests.

The state management side handles ~280K API calls weekly with an average response time of 50ms. Redux's observer pattern keeps re-rendering efficient: when experiment state changes, it doesn't trigger unnecessary component updates. These aren't vanity metrics. At our throughput, inefficient state management would show up as measurable latency on pricing pages.
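The observer idea can be shown without Redux itself. The sketch below is a hand-rolled store, not the Growth Package's code: createExperimentStore and watchExperiment are hypothetical names, and the point is only that a subscriber compares the slice it cares about and skips work when that slice is unchanged.

```typescript
// Hypothetical, library-free sketch of the observer pattern; not Redux and not the Growth Package.
type Variant = "control" | "treatment";
type ExperimentState = Record<string, Variant>;
type Listener = () => void;

function createExperimentStore(initial: ExperimentState) {
  let state: ExperimentState = initial;
  const listeners = new Set<Listener>();
  return {
    getState(): ExperimentState {
      return state;
    },
    setVariant(experimentId: string, variant: Variant): void {
      state = { ...state, [experimentId]: variant }; // immutable update
      listeners.forEach((listener) => listener());
    },
    subscribe(listener: Listener): () => void {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
  };
}

type ExperimentStore = ReturnType<typeof createExperimentStore>;

// Calls onChange only when the watched experiment's variant actually changes.
function watchExperiment(
  store: ExperimentStore,
  experimentId: string,
  onChange: (variant: Variant | undefined) => void,
): () => void {
  let last: Variant | undefined = store.getState()[experimentId];
  return store.subscribe(() => {
    const next: Variant | undefined = store.getState()[experimentId];
    if (next !== last) {
      last = next;
      onChange(next);
    }
  });
}
```

A component watching one experiment is notified of every store update but re-renders only when its own slice differs from the last value it saw.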

What we gave up

The stats engine is a gap. GrowthStack handles allocation, evaluation, and exposure. Metric tracking is built in. But the statistical analysis layer (significance calculation, p-values, confidence intervals) isn't deeply implemented. The platform doesn't run significance calculations natively, frequentist or Bayesian. Experiment results require manual analysis or external tooling to interpret rigorously.
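For a sense of what that manual analysis involves, here is a minimal sketch of a standard two-proportion z-test, one common frequentist check for conversion experiments. It is not part of GrowthStack. The function names are hypothetical, and the normal CDF uses the Abramowitz and Stegun approximation of erf (formula 7.1.26), accurate to roughly 1.5e-7.

```typescript
// Illustrative only: a textbook two-proportion z-test, not something GrowthStack ships.

// Abramowitz and Stegun 7.1.26 approximation of the error function.
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

function normalCdf(z: number): number {
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

interface ZTestResult {
  z: number;
  pValue: number; // two-sided
}

function twoProportionZTest(
  controlConversions: number,
  controlUsers: number,
  treatmentConversions: number,
  treatmentUsers: number,
): ZTestResult {
  const pControl = controlConversions / controlUsers;
  const pTreatment = treatmentConversions / treatmentUsers;
  const pooled = (controlConversions + treatmentConversions) / (controlUsers + treatmentUsers);
  const standardError = Math.sqrt(pooled * (1 - pooled) * (1 / controlUsers + 1 / treatmentUsers));
  const z = (pTreatment - pControl) / standardError;
  const pValue = 2 * (1 - normalCdf(Math.abs(z)));
  return { z, pValue };
}
```

With made-up inputs of 100 conversions out of 1,000 control users and 130 out of 1,000 treatment users, this gives z of about 2.10 and a two-sided p-value of about 0.035.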

This was a conscious scope decision. The highest-value parts of the system for the growth team were speed of targeting and experiment volume. Statistical rigor was a valid next phase, not a launch requirement. We shipped what unlocked velocity and deferred what required research.

The other tradeoff is that we now own this infrastructure. Third-party tools come with vendor support, ongoing development, and a community. GrowthStack is ours, which means the maintenance burden and the evolution cost are also ours. That's fine, since the system is relatively stable and the team understands it deeply, but it's worth being honest about.

What actually happened

The numbers from the first month: eight concurrent experiments running, 700K requests processed weekly, 98% targeting accuracy. Experiment launch timelines dropped by 40% compared to the manual process. The two-day cycle to modify audience targeting is gone.

The experiments we powered in the first phase covered the full growth funnel — purchase nudges for free trial users, upgrade CTA prominence, the uninterrupted testing offer, Paywall Removal, user addition workflows for paid groups, upsell and cross-sell flows. These are the experiments that move conversion rate and expansion revenue. Running eight of them simultaneously, with independent targeting rules and variant logic, is something the previous system couldn't have handled.

The platform was designed for 200+ experiments. We're at eight. The architectural headroom was intentional — we built for where the growth team wants to be, not just where they are now.

The teams building on top of GrowthStack confirmed the outcome directly. The FTU to Paid Groups initiative — converting free users to paid through a redesigned journey with multiple touchpoints and workflows — ran on this infrastructure. Bhavnoor Singh, who led the initiative, noted that the framework "led to high confidence in the project" through better execution and implementation. Pankaj Vadnal, a product manager who built experiments on GrowthStack himself, said it plainly: "It has certainly reduced development time and streamlined our processes" — and called it "a strong foundation for future work within the team."

What changed

The most significant shift isn't in the metrics. It's in who controls experiments.

Before GrowthStack, experiment velocity was limited by engineering bandwidth. Changing who saw what required an engineer to write code, get it reviewed, and deploy it. The growth team was bottlenecked on every iteration.

After GrowthStack, the growth team can modify targeting rules without touching engineering. They can pause an experiment that's underperforming, adjust the audience, and relaunch — without a single PR. Engineering built the infrastructure once. The business moves inside it now.

Darsan Tatineni, an engineering manager at BrowserStack, framed the delivery directly: "Successful delivery of the GrowthStack framework will simplify the development of the allocation and trigger logic of Growth owned experiments." That's the shift. The allocation and trigger logic — which previously required engineering to touch for every targeting change — now has infrastructure around it.

That's the design goal that matters. Not "we built an A/B testing platform." The goal was to shift the constraint from engineering bandwidth to product thinking. GrowthStack does that. The constraint is now on the quality of hypotheses, not on the speed of deployment.

Whether that's the right constraint to optimize for is a question worth sitting with.

Recognition

Pankaj Vadnal recognizing Harish for creating and driving the GrowthStack Framework from the ground up

Darsan Tatineni recognizing Harish for the successful delivery of the GrowthStack framework

Bhavnoor Singh recognizing Harish for the FTU to Paid Groups initiative

Akanksha recognizing Harish, Santhosh, and Vipul for the Paywall Removal rollout

Santhosh Marsaline recognizing Harish for the Paywall Removal rollout