Feature Flags: From Concept to Cultural Revolution
by Ben Nadel
#Table of Contents
- Caveat Emptor
- Of Outages And Incidents
- The Status Quo
- Feature Flags, An Introduction
- Key Terms and Concepts
- Going Deep on Feature Flag Targeting
- Types of Feature Flags
- Life-Cycle of a Feature Flag
- Coding Defensively
- Server-Side vs. Client-Side
- There's No Place Like Production
- Life Without Tests
- KISS: Keep It Super Simple
- Ownership Boundaries
- Logs, Metrics, and Feature Flags
- Bridging the Sophistication Gap
- The Cost of Feature Flags
- Not Everything Can Be Feature Flagged
- Build vs. Buy
- Overthinking Analytics
- Stop Reading Here If You Work Alone
- People Like Us Do Things Like This
- The Democratization of Product Design
- Co-creating the MVP (Minimum Viable Product)
- An Opinionated Guide To Pull-Requests (PRs)
- Dynamic Code, Dynamic Teams
- Removing the Cost of Context Switching
- Measuring Team Productivity
- The Goal Gradient
- I Eat, I Sleep, I Feature Flag
#Caveat Emptor
I have opinions. Often, these opinions are strong; and, in most cases, these opinions are strongly held. But, they are just my opinions. In this book, I will speak with an air of authority because I believe deeply in what I am saying based on what I have seen: a team transformed. But, what I am saying is based on my own experience, context, and organizational constraints. What I say may not always apply perfectly to you and your situation.
You are a discerning, creative person. You are here because you are enrolled in the work of building better products; of building more effective teams; and, of delivering more value to your customers. Do not let that creativity take a back seat as you read this book. Be critical, but open; question my assertions, but do not dismiss them out of hand.
Feature flags are a deceptively simple concept. It can be hard to understand the extent of the impact they have on your team because the implications aren't just technical. If all you learn from this book is how to use feature flags as a means to control flow within your software, this book will be worth reading. However, the true value of what I'm sharing here is in the holistic cultural change that feature flags can bring to every part of your product development life-cycle.
This book does not represent an all-or-nothing approach to product development. But, I do believe that the more you take from this book, the more you will get.
#Of Outages And Incidents
I used to tell my people: "You're not really a member of the team until you crash production".
In the early days of the company, crashing production—or, at least, significantly degrading it—was nothing short of inevitable. And so, instead of wielding each outage as a weapon, I treated it like a rite of passage: an attempt to create a safe space in which my people would learn about and become accustomed to our product.
Engineers need to ship code. This isn't a company mandate, it's a matter of self-actualization. Pushing code to production benefits us—and our mental wellbeing—just as much as it benefits our customers. But, when deployments become fraught, engineers become fearful. They begin to overthink their code and under-deliver on their commitments.
This wasn't good for the product. And, it certainly wasn't good for the team. Not to mention the fact that it created an unhealthy tension between our Executive Leadership Team (ELT) and—well—everyone else.
The more people we added to our engineering staff, the worse this problem became. Poorly architected code with no discernible boundaries soon led to tightly coupled code that no one wanted to touch, let alone maintain. A change made in one part of the application often had a surprising impact over in another part of the application.
The whole situation felt chaotic and unstoppable. The best we could do at the time was prepare people for the trauma. We implemented an "IC Certification" program. The "IC"—or, Incident Commander—was responsible for getting the right people in the (Slack) room; and then, liaising between the triage team and the rest of the organization.
To become IC certified, you had to be trained by an existing IC. You had to run through a mock incident and demonstrate that you understood:
- How to identify the relevant teams.
- How to page the relevant teams, waking them up if necessary.
- How to start the "war room" Zoom call.
- How to effectively communicate the status of the outage (and estimated time to remediation).
This IC training and certification program was mandatory for all engineers. The issues were very real and very urgent; and, we needed everyone to be ready.
The certified ICs were good at communicating status, but each IC had their own style—their own way of writing and presenting updates to the team. This led to a lot of fumbling and inconsistency, which ultimately distracted us from the goal at hand.
To address this, I built a small utility that brought order to the output: Incident Commander (www.incident-commander.com). This online tool provided ICs with a way to translate status updates into a pre-formatted Slack message which the IC could then copy-and-paste into the #incident channel.
As a team, we became quite adept at responding to each incident. And, in those early days, this coalescing around the chaos formed a camaraderie that bound us together. Even years later, I still look back on those Zoom calls with a fondness for how close I felt to those that were there fighting alongside me.
But, the kindness and compassion we offered each other meant nothing to our customers. The incidental joy we felt from a shared struggle was no comfort to those that were paying us to provide a stable product.
Our CTO (Chief Technical Officer) understood this. He never measured downtime in minutes; he measured it in lost opportunities. He painted the picture of customers, victimized by our negligence:
"People don't care about SLOs (Service Level Objective) and SLAs (Service Level Agreements). 30-minutes of downtime isn't much on balance. But, when it takes place during your pitch meeting and prevents you from securing a life-changing round of Venture Capital, 30-minutes means the difference between a path forward and a dead-end."
He put a Root Cause Analysis (RCA) process in place, and personally reviewed every write-up. Remediating an incident was only a minor victory; preventing the next incident was the real goal. Each RCA included a technical discussion about what happened, how we became aware of the problem, how we identified the root cause, and the steps we intended to take in order to prevent it from occurring again.
The RCA process—and the Quality Assurance Items (QAI) that they generated—did create a more stable platform. But, a platform is merely the foundational world that lives below the surface of the product. Most of the work that we were doing took place above the platform, in the ever-evolving user-facing feature-set. A stable platform is a necessity. But, as the platform stabilized, the RCA process began to see a diminishing return on investment (ROI). Even as the platform improved, the outages continued to happen.
In a last ditch effort to effect better outcomes, a Change Control Board (CCB) was put in place. A CCB is a group of subject matter experts (SME) that must review and approve all changes being made to the product.
A Change Control Board is the antithesis of worker autonomy; it is the antithesis of productivity. A Change Control Board says, "we don't pay you to think." A Change Control Board says, "we don't trust you to use your best judgment." If workers yearn to find fulfillment in self-direction, enhanced responsibility, and a sense of purpose, the Change Control Board seeks to strip responsibility and treat workers as nothing more than mindless resources.
And yet, with the choke-hold of the CCB in place, the incidents continued.
After working on and maintaining the same product for over a decade, I have the luxury of hindsight and experience. I can see what we did right and what we did wrong. The time and effort we put into improving the underlying platform was time well spent. Without a solid foundation on which to build, nothing much else matters.
Our mistake was in trying to create a predictable outcome for the product. We slowed down the design process in hopes that every single pixel was in the correct location. We slowed down the deployment pipeline in hopes that every single edge-case and feature had been fully-implemented and tweaked to perfection.
We thought that we could increase quality and productivity by slowing everything down. But, the opposite is true. Quality increases when you go faster. Productivity increases when you work smaller. Innovation and discovery happen at the edge, in that unpredictable, heady space where Product and Customer meet.
Eventually, we learned these lessons. Outages and incidents went from a daily occurrence to a weekly occurrence to a rarity. At the same time, productivity went up, team morale was boosted, and our paying customers finally started to see the value we promised them.
And, none of it would have been possible without feature flags.
Feature flags changed everything.
#The Status Quo
There's no "one way" for organizations to build and deploy a product. Even a single engineer will use different techniques in different contexts. When I'm at work, for example, I use a Slack-based chatbot to trigger new deployments; which, in turn, communicates with Kubernetes; which, in turn, executes an incremental rollout of new Docker containers. But, in my personal life—on side projects—I still use FTP (File Transfer Protocol) in order to manually sync files between my local development environment and my VPS (Virtual Private Server).
No given approach to web development is inherently "right" or "wrong". Some approaches do have advantages. But, everything is a matter of nuance; and, every approach is based on some set of constraints and trade-offs. At work, I get to use a relatively sophisticated deployment pipeline because I stand on the shoulders of giants who came before me. But, when I'm on my own, I don't have the ability to create that level of automation and orchestration.
Though many different approaches exist, most build and deployment strategies do have one thing in common: when code is deployed to a production server, users start to consume it. Add a new item to your navigation and—immediately—users start to click it. Change a database query and—immediately—users start to execute it. Refactor a workflow and—immediately—users start to explore it.
We're decades beyond the days of shipping floppy disks and CD-Roms; but, most of us still inhabit a state in which deploying code and releasing code are the same thing. Having users come to us—and our servers—gives us the ability to respond to issues more proactively; but, fundamentally, we're still delivering a "static product" to our customers.
To operate within this limitation, teams will oftentimes commit temporary logic to their control flow in order to negotiate application access. For example, a team may only allow certain parts of an application to be accessed if:
- The user is connecting through the VPN (Virtual Private Network).
- The request is coming from a set of allow-listed IP addresses.
- The authenticated user has an internal company email address.
- The incoming request contains a secret HTTP Cookie value.
- The incoming URL contains a special query-string parameter.
- The HTTP User-Agent header contains a magic string.
I've used many of these techniques myself. And, they all work; but, they are all subpar. Yes, they do allow internal users to inspect a feature prior to its release; but, they offer little else in terms of dynamic functionality. Plus, exposing the gated code to a wider audience requires changing the code and deploying it to production. Which, conversely, means that re-gating the code—such as in the case of a major bug or incident—requires the code to be updated or reverted and re-deployed.
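To make this concrete, here's a hedged sketch (in JavaScript) of what such hard-coded gating often looks like in practice. The cookie name, secret value, and company domain here are all illustrative, not from any real codebase:

```javascript
// Hypothetical request-gating check: the feature is only visible to
// requests that carry a secret cookie value OR that belong to a user
// with an internal company email address.
function canAccessFeature(request) {
	// Secret HTTP Cookie check (illustrative cookie name / value).
	if (request.cookies["x-preview-secret"] === "let-me-in") {
		return true;
	}
	// Internal email check (illustrative company domain).
	var email = (request.user && request.user.email) || "";
	return email.endsWith("@example.com");
}
```

Note that widening access beyond this gate means editing the function and re-deploying; and, re-gating the feature during an incident means reverting and re-deploying. The gate itself is frozen into the code.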
Going beyond their hard-coded nature, these techniques often treat deployment as an afterthought. Meaning, they are typically implemented only after a feature has been completed and is now ready for review. This implies that the feature has been articulated and developed in relative isolation.
This is referred to as the "Waterfall Methodology". Waterfall Methodology comprises a set of product development stages that get executed in linear sequence:
- Analysis and requirements gathering.
- Graphic design and prototyping.
- Implementation.
- Testing and quality assurance (QA).
- Deployment to production.
- Maintenance.
Each one of these stages is intended to be completed in turn, with the outputs from one stage acting as the inputs to the next stage.
At first blush, the Waterfall Methodology is attractive because it looks to create structure and predictability. But, this is mostly an illusion. In the best case scenario, the timeline for such a project is grossly underestimated. In the worst case scenario, the engineering team builds the wrong product.
Anecdote: Years ago, I worked at a company that applied the waterfall methodology. On one particular project for a data-management tool, the team carried out the requirements gathering, did some graphic design, and then entered the implementation phase as per usual. But, building the product took a lot longer than anticipated (which is standard even in the "best case" scenario). And, naturally, the client was very upset about the slipping release date.
Finally, after many delays and many heated phone calls and much triaging, the team performed their "big reveal" to the client. And, after walking through the product, the client remarked, "This isn't at all what I asked for."
It turns out, there was a large understanding gap in the requirements gathering phase. This understanding gap was then baked into the design process which was subsequently baked into the engineering process.
The client was furious about the loss of time and money (with nothing to show for it). The engineers were furious because they felt that the client hadn't been forthcoming during the analysis phase (classic victim blaming). And, of course, product management was furious because the project failed and reflected poorly on our firm.
This isn't a black swan event. Many of us in product development have similar stories. As an industry, we mostly agree that the Waterfall Methodology is doomed; and, that "Agile Methodology" is the preferred approach.
The Agile Methodology takes the Waterfall Methodology, shrinks it down in scope, and repeats it over and over again until the product is completed. The cornerstone of Agile is a strong emphasis on "People over processes and tools" using a continual feedback loop:
- Build small.
- Show your work.
- Refactor based on feedback.
- Repeat.
If our team had been using an Agile Methodology (in the earlier anecdote), success would have been achieved. The client would have seen early-on that the product was moving in the wrong direction and they would have told us. In turn, the design and engineering teams would have changed course and adapted to the emergent requirements. And, ultimately, all stakeholders would have been happy with the outcome.
The Agile Methodology is fantastic!
At least, it is within a "greenfield" project—one in which no prior art exists. But, as soon as you release a product to the customers, all subsequent changes are being made within a "brownfield" project. This is where things get tricky. Or, expensive. Oftentimes both.
Agile Methodology has advantages over Waterfall Methodology; but, eventually, both approaches run into the same problem: deploying code to production is still a dangerous proposition. And so, the fear creeps in; and soon, even "agile" teams with the best intentions start to fall back into old waterfall tendencies.
In hopes of leaving nothing to chance, the design process becomes endless. Lack of trust in the engineering team leads to a longer, more tedious QA period. Paranoia about outages means no more deploying on Fridays (or, perhaps, with even much less frequency). Test coverage percentage becomes a target. Expensive staging environments are created and immediately fall out-of-sync. A product manager creates a deployment checklist and arbitrarily makes "load testing" a blocking step.
The whole system—the whole process—starts to creak and moan under the pressure (of time, of cost, of expectation) until, at some point, someone makes the joke:
Work would be great if it weren't for all the customers.
In a business where customer empathy builds the foundation of all great products, wanting to work without customers becomes the sign of something truly toxic: cultural death.
This isn't leadership's fault. Or the fault of the engineers or of the managers or of the designers. This doesn't happen because the wrong technology stack was chosen or the wrong management methodology was applied. It has nothing to do with your people working in-office or being remote. This happens because a small seed of fear takes root. And then grows and grows and grows until it subsumes the entire organization.
Fear erodes trust. And, without trust we don't feel safe. And, if we don't feel safe, the only motivation we have left is that of self-preservation.
This sounds dire (and it is); but, it isn't without hope. All we have to do is address the underlying fear and everything else will eventually fall into place. This transformation may not come quick and it may not come easy; but, it can be done; and, it starts with feature flags.
#Feature Flags, An Introduction
I've been working in the web development industry since 1999; and, before 2015, I'd never heard the term, "feature flag" (or "feature toggle", or "feature switch"). When my Director of Product—Christopher Andersson—pulled me aside and suggested that feature flags might help us with our company's "downtime problem", I didn't know what he was talking about.
Cut to me in 2023—after 8 years of trial, error, and experimentation—and I can't imagine building another product or platform without feature flags. They have become a critical part of my success. I hold feature flags in the same light as I do Logs and Metrics: the essential services on which all product performance and stability are built.
But, it wasn't love at first sight. In fact, when Christopher Andersson described feature flags to me in 2015, I didn't see the value. After all, a feature flag is "just" an if statement:
if ( featureIsEnabled() ) {
// Execute NEW logic.
} else {
// Execute OLD logic.
}
I already had control-flow that looked like this in my applications (see The Status Quo); as such, I didn't understand why adding yet another dependency to our tech-stack would make any difference in our code (let alone have a positive impact on our downtime).
What I failed to see was the fundamental difference underlying the two techniques. In my approach, changing the behavior of the if statement meant updating the code and re-deploying it to production. But, in the case of feature flags, changing the behavior of the if statement meant flipping a switch.
That's it.
No code updates. No deployments. No latency. No waiting.
This is the magic of feature flags: having the ability to dynamically change the behavior of your application at runtime. This is what sets feature flags apart from environment variables, build flags, and any other type of deploy-time or dev-time setting.
To stress this point: if you can't dynamically change the behavior of your application without touching the code or the servers, you're not using "feature flags". The dynamic runtime nature isn't a nice-to-have, it's how feature flags bring both psychological safety and the democratization of power and creativity to your organization.
This dynamic nature means that in one moment, our feature flag settings can look like this:
[Screenshot: the feature flag settings, with the flag toggled off]
Which means that our application's control flow operates like this:
if ( featureIsEnabled() /* false */ ) {
// ... dormant code ...
} else {
// This code is executing!
}
The featureIsEnabled() function is currently returning false, directing all incoming traffic through the else block.
Then, if we flip the switch on in the next moment, our feature flag settings look like this:
[Screenshot: the feature flag settings, with the flag toggled on]
And, our application's control flow operates like this:
if ( featureIsEnabled() /* true */ ) {
// This code is executing!
} else {
// ... dormant code ...
}
Instantly—or thereabouts—the featureIsEnabled() function starts returning true; and, the incoming traffic is diverted away from the else block and into the if block, changing the behavior of our application in real-time.
But, turning a feature flag on is only half the story. It's equally important that—at any moment—a feature flag can be turned off. Which means that, should we need to (in case of emergency), we can instantly disable the feature flag:
[Screenshot: the feature flag settings, with the flag toggled back off]
Which will immediately revert the application's control flow back to its previous state:
if ( featureIsEnabled() /* false */ ) {
// ... dormant code ...
} else {
// This code is executing (again)!
}
Even with the illustration above, this is still a rather abstract concept. To convey the power of feature flags more concretely, let's dip-down into the real-world use-case that opened my eyes to the possibilities: refactoring a SQL database query.
The efficiency of a SQL query changes over the lifetime of a product. As the number of rows increases and the access patterns evolve, some SQL queries start to slow down. This is why database index design is just as much art as it is science.
Traditionally, a refactoring of this type might involve running an EXPLAIN query locally, looking at the query plan bottlenecks, and then rewriting or breaking the SQL apart in order to better leverage existing table indices. The query code, once updated, is then deployed to the production server. And, what you hope to see is a latency graph that looks like this:
[Graph: query latency dropping back down after the refactored query is deployed]
In this case, the SQL refactoring was effective in bringing the query latency times back down. But, this is the best case scenario. In the worst case scenario, deploying the refactored query leads to a latency graph that looks more like this:
[Graph: query latency spiking upward after the refactored query is deployed]
In this case, something went terribly wrong. For any number of reasons, the SQL query that performed well in your local development environment does not perform well in production. The query latency rockets upward, consuming most of the database's available CPU. This, in turn, slows down all queries executing against the database. Which, in turn, leads to a spike in concurrent queries. Which, in turn, crashes the database.
If you see this scenario unfolding in your metrics, you might try to roll-back the deployment; or perhaps, try to revert the code and redeploy it. But, in either case, it's a race against time. Pulling down images, spinning up new nodes, warming up containers, starting applications, running builds, executing unit tests: it all takes time—time that you don't have.
Now, imagine that, instead of completely refactoring your code and deploying it, you design an alternate SQL query and gate it behind a feature flag. Code in your data-access layer could look like this:
public array function generateReport( userID ) {
if ( featureIsEnabled() ) {
return( getData_withOptimization( userID ) );
}
return( getData( userID ) );
}
In this approach, both the existing SQL query and the alternate SQL query get "deployed" to production. However, the alternate SQL query won't be "released" to the users until the feature flag is enabled. And, at that point, the if statement will short-circuit the control flow and all new requests will use the optimized SQL query.
With this feature flag in place, the worst case scenario now looks rather tame:
[Graph: query latency unaffected by the deployment; rising when the flag is enabled; recovering when the flag is disabled]
The same unexpected SQL performance problem exists in this scenario. However, the outcome is very different. First, notice that the "deployment" itself had no effect on the latency of the query. That's because the alternate SQL query was deployed in a dormant state behind the feature flag. Then, the feature flag was enabled, causing traffic to route through the alternate SQL query. At this point, the latency starts to go up; but, instead of the database crashing, the feature flag is turned off, immediately re-gating the code and diverting traffic back to the original SQL query.
You just avoided an outage. The dynamic runtime capability of your feature flag gave you the power to react without delay, before the database—and your application—became overwhelmed and unresponsive.
Are you beginning to see the possibilities?
Knowing that you can disable a feature flag in case of emergency is empowering. This alone creates a huge amount of psychological safety. But, it's only the beginning. Even better is to completely avoid an emergency in the first place. And, to do that, we have to dive deeper into the robust runtime functionality of feature flags.
In the previous thought experiment, our feature flag was either entirely on or entirely off. This is a vast improvement over the status quo; but, this isn't really how feature flags get applied. Instead, a feature flag is normally "rolled-out" incrementally in order to minimize risk.
But, before we can think incrementally, we have to understand both "targeting" and "variants". Targeting is the act of identifying which users will receive a given variant. And, a variant is the value returned by evaluating a feature flag in the context of a given request.
To help elucidate these concepts, let's take our very first if statement and factor-out the featureIsEnabled() call. This will help separate the feature flag evaluation from the subsequent control flow:
var booleanVariant = featureIsEnabled();
if ( booleanVariant == true ) {
// Execute NEW logic.
} else {
// Execute OLD logic.
}
In this example, our feature flag represents a Boolean value, which implicitly has two possible variants: true and false. Targeting for this feature flag then means figuring out which requests receive the true variant and which requests receive the false variant.
[Diagram: incoming requests being targeted into either the true variant or the false variant]
Boolean feature flags are, by far, the most common. However, a feature flag can represent any kind of data type: Strings, Numbers, Dates, JSON (JavaScript Object Notation), etc. These non-Boolean data types may compose any number of variants and unlock all manner of compelling functionality. But, for the moment, let's stick to our Booleans.
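As a brief aside before returning to Booleans, here's a hedged sketch (in JavaScript, with made-up variant names and percentile boundaries) of what a string-valued flag with three variants might look like:

```javascript
// Sketch: a string-valued feature flag with three variants, bucketed
// by a stable per-user percentile. All names here are illustrative.
function getCheckoutVariant(userID) {
	var userPercentile = userID % 100; // 0..99, stable for a given user
	if (userPercentile < 10) {
		return "new-checkout"; // first 10% of users
	}
	if (userPercentile < 20) {
		return "new-checkout-with-upsell"; // next 10% of users
	}
	return "control"; // remaining 80% of users
}
```

Because the bucketing is deterministic, a given user always lands in the same variant from one request to the next.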
Targeting—the act of funneling requests into a specific variant—requires us to provide identifying information as part of the feature flag evaluation. The "right" identifying information is going to be context-specific; but, I find that "User ID" and "User Email" are a great place to start:
var booleanVariant = featureIsEnabled(
userID = request.user.id,
userEmail = request.user.email
);
if ( booleanVariant == true ) {
// Execute NEW logic.
} else {
// Execute OLD logic.
}
Once we incorporate this identifying information into our feature flag evaluation, we can begin to differentiate one request from another. And this is where things start to get exciting. Instead of our feature flag being entirely on for all users, perhaps we only want it to be on for an allow-listed set of User IDs. One implementation of such a featureIsEnabled() function might look like this:
public boolean function featureIsEnabled(
numeric userID = 0,
string userEmail = ""
) {
switch ( userID ) {
case 1:
case 2:
case 3:
case 4:
return( true );
break;
default:
return( false );
break;
}
}
Or, perhaps we only want the feature flag to be on for users with an internal company email address:
public boolean function featureIsEnabled(
numeric userID = 0,
string userEmail = ""
) {
if ( userEmail contains "@bennadel.com" ) {
return( true );
}
return( false );
}
Or, perhaps we only want the feature flag to be enabled for a small percentage of users:
public boolean function featureIsEnabled(
numeric userID = 0,
string userEmail = ""
) {
var userPercentile = ( userID % 100 );
if ( userPercentile < 5 ) {
return( true );
}
return( false );
}
In this case, we're using the Modulo operator to consistently translate the User ID into a numeric value. This numeric value gives us a way to consistently map users onto a percentile: each additional "remainder" represents an additional 1% of users. Here, we're enabling our feature flag for a consistently-segmented 5% of users.
We can even combine several of these targeting concepts at once in order to exert even more granular control. Imagine that we want to target internal company users only; and, of those targeted users, only enable the feature for 25% of them:
public boolean function featureIsEnabled(
numeric userID = 0,
string userEmail = ""
) {
if ( userEmail contains "@bennadel.com" ) {
var userPercentile = ( userID % 100 );
if ( userPercentile < 25 ) {
return( true );
}
}
return( false );
}
User targeting, combined with a %-based rollout, is an incredibly powerful part of the feature flag workflow. Now, instead of enabling a (potentially) risky feature for all users at one time, imagine a much more graduated rollout using feature flags:
- Deploy dormant code to production servers.
- Enable feature flag for your user ID.
- Test feature in production.
- Discover a bug.
- Fix bug and redeploy code (still only active for your user).
- Examine error logs.
- Enable feature flag for internal company users.
- Examine error logs and metrics.
- Discover bug(s).
- Fix bug(s) and redeploy code (still only active for internal company users).
- Enable feature flag for 10% of all users.
- Examine error logs and metrics.
- Enable feature flag for 25% of all users.
- Examine error logs and metrics.
- Enable feature flag for 50% of all users.
- Examine error logs and metrics.
- Enable feature flag for 75% of all users.
- Examine error logs and metrics.
- Enable feature flag for all users.
- Celebrate!
Few deployments will need this much rigor. But, when the risk level is high, the control is there; and, much of the risk associated with your deployment will be mitigated.
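The graduated rollout above could be sketched as a single rule that gets widened over time. Here's a hedged JavaScript illustration (the field names, user ID, and company domain are assumptions for the example):

```javascript
// Sketch: one rule shape that supports each stage of a graduated
// rollout -- an allow-list, an internal-users gate, and a %-rollout.
function featureIsEnabled(rule, userID, userEmail) {
	if (rule.allowUserIDs.includes(userID)) {
		return true;
	}
	if (rule.allowInternal && userEmail.endsWith("@example.com")) {
		return true;
	}
	return (userID % 100) < rule.rolloutPercent;
}

// Stage: enabled only for your own user ID (illustrative ID).
var stageSelf = { allowUserIDs: [42], allowInternal: false, rolloutPercent: 0 };
// Stage: enabled for all internal company users.
var stageInternal = { allowUserIDs: [42], allowInternal: true, rolloutPercent: 0 };
// Stage: enabled for 10% of all users.
var stageTenPercent = { allowUserIDs: [42], allowInternal: true, rolloutPercent: 10 };
```

Each widening of the rollout is just a change to the rule data, not to the control flow consuming it.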
Are you beginning to see the possibilities?
So far, for the sake of simplicity, I've been hard-coding the "dynamic" logic within our featureIsEnabled() function. But, in order to facilitate the graduated deployment outlined above, this encapsulated logic must also be dynamic. This is, perhaps, the most elusive part of the feature flags mental model.
The feature flag evaluation process is powered by a "rules engine". You provide inputs identifying the request context (e.g., "User ID" and "User Email"). And, the feature flag service then applies its rules to your inputs and returns a variant. There is nothing random about this process—it is "pure", deterministic, and repeatable. The same rules applied to the same inputs will always result in the same variant output. Therefore, when we talk about the "dynamic runtime nature" of feature flags, it is in fact the rules, within the rules engine, that are actually dynamic.
Consider an earlier version of our featureIsEnabled() function that ran against the userID:
public boolean function featureIsEnabled(
numeric userID = 0,
string userEmail = ""
) {
switch ( userID ) {
case 1:
case 2:
case 3:
case 4:
return( true );
break;
default:
return( false );
break;
}
}
Instead of a switch statement, let's refactor this function to use a data structure that reads a bit more like a "rule configuration":
public boolean function featureIsEnabled(
numeric userID = 0,
string userEmail = ""
) {
var rule = {
input: "userID",
operator: "IsOneOf",
values: [ 1, 2, 3, 4 ],
variant: true
};
if (
( rule.operator == "IsOneOf" ) &&
rule.values.contains( arguments[ rule.input ] )
) {
return( rule.variant );
}
return( false );
}
The outcome here is exactly the same, but the mechanics have changed. We're still taking the userID and we're still looking for it within a set of defined values; but, the static values and the resultant variant have been pulled out of the evaluation logic.
At this point, we can move the rule definition out of the featureIsEnabled() function and into its own function, getRuleDefinition():
public boolean function featureIsEnabled(
numeric userID = 0,
string userEmail = ""
) {
var rule = getRuleDefinition();
if (
( rule.operator == "IsOneOf" ) &&
rule.values.contains( arguments[ rule.input ] )
) {
return( rule.variant );
}
return( false );
}
public struct function getRuleDefinition() {
return({
input: "userID",
operator: "IsOneOf",
values: [ 1, 2, 3, 4 ],
variant: true
});
}
Here, we've completely decoupled the consumption of our feature flag rule from the definition of our feature flag rule. Which means, if we needed to change the outcome of the featureIsEnabled() call, we wouldn't change the logic within the featureIsEnabled() code at all. Instead, we would update the getRuleDefinition() function.
But, everything is still hard-coded. In order to make our feature flag system dynamic, we need to replace the hard-coded data-structure with something like:
- A database query.
- A Redis GET command.
- A reference to a shared in-memory cache (being updated in the background).
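To make the idea concrete, here is a minimal sketch of the third option: a shared in-memory cache that a background task refreshes. It's written in TypeScript rather than the book's CFML, and every name in it (fetchRulesFromAdmin(), refreshRules(), and so on) is a hypothetical stand-in, not a real API:

```typescript
// Sketch only: an in-memory rule cache refreshed in the background.
// All names here are hypothetical stand-ins for your chosen solution.

interface RuleDefinition {
	input: string;
	operator: "IsOneOf";
	values: Array<number | string>;
	variant: boolean;
}

// The in-memory cache. Reads are instant; only the background
// refresh ever touches the network (stubbed-out below).
let activeRule: RuleDefinition = {
	input: "userID",
	operator: "IsOneOf",
	values: [ 1, 2, 3, 4 ],
	variant: true
};

// Stand-in for a call out to the feature flag administration system.
function fetchRulesFromAdmin(): RuleDefinition {
	// Imagine an admin just added user 99 to the targeted values.
	return {
		input: "userID",
		operator: "IsOneOf",
		values: [ 1, 2, 3, 4, 99 ],
		variant: true
	};
}

// Background refresh: swaps the active rule without any code deployment.
function refreshRules(): void {
	activeRule = fetchRulesFromAdmin();
}

function getRuleDefinition(): RuleDefinition {
	return activeRule;
}

function featureIsEnabled( context: Record<string, number | string> ): boolean {
	const rule = getRuleDefinition();
	if (
		( rule.operator === "IsOneOf" ) &&
		rule.values.includes( context[ rule.input ] )
	) {
		return rule.variant;
	}
	return false;
}
```

The key observation is that featureIsEnabled() never changes; only the rule data behind getRuleDefinition() does. That is exactly what makes the runtime behavior dynamic without a deployment.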
Which creates an application architecture like this:
[Architecture diagram: the feature flag administration system pushes rule updates to the rules engine running within the application.]
The implementation details will depend on your chosen solution. But, each approach reduces to the same set of concepts: a feature flag administration system that can update the active rules used by the feature flag rules engine operating in a given environment. This is what makes the dynamic runtime behavior possible.
It may seem that integrating feature flags into your application logic includes a lot of low-level complexity. But, don't be put off by this—you don't actually have to know how the rules engine works in order to extract the value. I only step down into the weeds here because having even a cursory understanding of the low-level mechanics can make it much easier to understand how feature flags fit into your product development ecosystem.
The reality is, any feature flags implementation that you choose will abstract-away most of the complexity that we've discussed. All of the variants and the user-targeting and %-based rollout configuration will be moved out of your application into the feature flags administration, leaving you with relatively simple code that looks like this:
var useNewWorkflow = featureFlags.getBooleanVariant(
feature = "new-workflow-optimization",
context = {
userID: request.user.id,
userEmail: request.user.email
}
);
if ( useNewWorkflow ) {
// Execute NEW logic.
} else {
// Execute OLD logic.
}
This alone will have a meaningful impact on your product stability and uptime. But, it's only the beginning—the knock-on effects of a feature-flag-based development workflow will bleed into your entire organization. It will transform the way you think about product development; it will transform the way you interact with customers; and, it will transform the very nature of your company culture.
#Key Terms and Concepts
Feature flags enable a new way of thinking about product development. This introduces some new concepts; and, adds more nuance to some existing ideas. As such, it's important to define—and perhaps redefine—some key terms that we use in this book:
Deploying
Deploying is the act of pushing code up to a production server. Deployed code is not necessarily "live" code. Meaning, deployed code isn't necessarily being executed by your users - it may be sitting there, on the server, in a dormant state. "Deployed" refers only to the location of the code, not to its participation within the application control flow.
A helpful analogy might be that of "commented out" code. If you deploy code that is commented out, that code is living "in production"; but, no user is actually executing it. Similarly, if you deploy code behind a feature flag and no users are being targeted, then no user is actually executing that deployed code.
Releasing
Releasing is the act of exposing deployed code and functionality to your users. Before the advent of feature flags, "Deploying" code and "Releasing" code were generally the same thing. With feature flags, however, these two actions can now be decoupled and controlled independently.
Feature Flag
A feature flag is a named gating mechanism for some portion of your code. A feature flag is typically composed of an identifier (ex, new-checkout-workflow), a type (ex, Boolean), a set of variants (ex, true and false), a series of targeting rules, and a rollout strategy. Some of these details may vary depending on your feature flag implementation.
Variant
A variant is one of the distinct values returned when evaluating a feature flag in the context of a given request. All feature flags have at least two variants—with only a single variant, you can't create a dynamic runtime behavior.
Each variant value is an instance of the Type represented by the feature flag. For example, a Boolean-based feature flag can only return one of two finite values: true or false. On the other hand, a Number-based feature flag can return any number of variants between -Infinity and Infinity (depending on how numbers are represented in your programming language).
That said, at any given moment, the variants aggregated within a feature flag are finite, typically few, and predictable. Entropy has no place in a feature flag workflow.
Targeting
Targeting is the mechanism that determines which feature flag variant is served to a given user. Targeting rules include both assertions about the requesting user and a rollout strategy. Targeting rules may include positive assertions, such as "the user role is Admin"; and, they may include negative assertions, such as "the user is not on a Free plan". Compound rules can be created by ANDing and ORing multiple assertions together.
The conditions within the targeting rules can be changed over time; however, at any given moment, the evaluation of the decision tree is repeatable and deterministic. Meaning, the same user will always receive the same variant when applying the same inputs to the same rules.
Rollout
Rollout is an overloaded term in the context of feature flags. When we are discussing a feature flag's configuration, the rollout is the strategy that determines which variant is served to a set of targeted users. This is often expressed in terms of percentage. For example, with a Boolean-based feature flag, the rollout strategy may assign the true variant to 10% of targeted users and the false variant to 90% of targeted users.
When not discussing a feature flag's configuration, the term rollout is generally meant to describe the timeline over which a feature will be enabled within the product. There are two types of rollouts: Immediate and Gradual.
With an immediate rollout, the deployed code is released to all users at the same time. With a gradual rollout, the deployed code is released to an increasing number of users over time. So, for example, you may start by rolling a feature out to a small group of Beta-testers. Then, once the feature has preliminary success, you roll it out to 5% of the general audience; and then 20%; and 50%; and so on, until the deployed code has been released to all users.
Roll-Back
Just as with "rollout", roll-back is another overloaded term in the context of feature flags. When we are discussing a feature flag's configuration, rolling back means reverting a recent configuration change. For example, if a targeted set of users is configured to receive the true variant of a Boolean-based feature flag, rolling back the feature flag would mean updating the configuration to serve the false variant to the same set of users.
When not discussing a feature flag's configuration, the term rolling back is generally meant to mean removing code from a production server. Before the advent of feature flags, if newly-deployed code caused a production incident, the code was then "rolled back", meaning that the new code was removed and the previous version of the application code was put back into production.
User
In this book, I often refer to "users" as the receiving end of feature flags. But, this is only a helpful metaphor as we often think about our products in terms of customer access. In reality, a feature flag system doesn't know anything about "users" - it only knows about "inputs". Most of the time, those inputs will be based on the requesting user. But, they don't have to be.
We'll get into this more within our use-cases section, but feature flag inputs can be based on any meaningful identifier. For example, we can use a "server name" to affect platform-level features. Or, we can use a static value (such as app) to apply the feature flag state to all requests uniformly.
Progressive Delivery
This is the combination of two concepts: deploying a feature incrementally and releasing a feature incrementally. This is, eventually, the natural state for teams that lean into a feature-flag-based workflow. This becomes "the way" you develop products.
Environment
An environment is the application context in which a feature flag configuration is defined. A given feature flag can be consumed across multiple environments. But, each environment is configured independently such that a feature flag enabled in a "local development" environment has no bearing on the same feature flag in a "production" environment.
Feature Flag Administration
This is the application that your Developers, Product Managers, Designers, Data Scientists, etc. will use to create, configure, update, release, and roll-back feature flags. This application is generally separated out from your "product application"; but, it doesn't have to be. If you are buying a feature flag SaaS (Software as a Service) offering, your vendor will be building, hosting, and maintaining this administration module for you.
#Going Deep on Feature Flag Targeting
Note: If the targeting concepts that we discussed earlier make sense to you, feel free to skip this section. We're about to dip down into a more philosophical and technical view in an effort to better illustrate the mechanics of targeting. This is often the hardest part of feature flags to understand; so, I think it warrants a deeper probe.
As web application developers, we generally communicate with the database anytime we need to gather information about the current request. These external calls are fast; but, they do carry an overhead. And, the more calls that we make to the database while processing a request, the more latency we add to the response-time.
To avoid that overhead, feature flag implementations use an in-memory "rules engine" which allows feature flag state to be "queried" without having to communicate with an external storage provider. This keeps the processing time blazing fast! So fast, in fact, that you should be able to query for feature flag state as many times as you need to without ever having to worry about latency.
Aside: Obviously, all processing adds some degree of latency; but, the in-memory feature flag processing—when compared to a network call—should be considered negligible.
That said, shifting from a database-access mindset to a rules-engine mindset can present a stumbling block for the uninitiated. At first, it may be unclear as to why you need anything more than a "User ID" (for example) in order to do all the necessary user targeting. After all, in a traditional server-side context, said "User ID" opens the door to any number of database records that can then be fed into any number of control flow conditions.
But, when you can't go back to the database to get more information, all targeting must be done using the information at hand. Which means, in order to target based on a given data-point, said data-point must be provided to the feature flag state negotiation process.
I find that a "Pure Function" provides a helpful analogy. A pure function will always result in the same output when invoked using the same inputs. This is because the result of the pure function is based solely on the inputs (and whatever internal logic exists).
No external calls are made within a pure function. No side-effects are generated within a pure function. Unless there is a bug, a pure function will never error. It has no network calls that can fail. It has no file I/O calls that might lack the proper permissions. It has nothing that incurs a variable latency.
Consider this function:
function calc( a, b ) {
return( a * b + 1 );
}
If you invoke the calc() function with arguments (2,4), you will always get the result, 9. Even if you invoke this function 1,000 times in a row, you will get 9 on every single execution. This is because the result is based entirely on the inputs and the inputs aren't changing.
If you wanted the result of a (2,4) invocation to change, you'd have to change the underlying logic embedded within the function itself. For example, you might change the +1 to a +2. This would change the absolute result of the invocation; but, the relative result of all invocations going forward would be the same.
Feature flag targeting works the same way. If you want to target based on an email address (for example), you'd have to invoke the targeting mechanism using the email address as an input. And, the same email input will always result in the same targeting output; unless—and until—you change the logic embedded within the targeting system; and, that's where the "rules engine" comes into play.
The rules engine represents that embedded logic. Only, unlike a pure function, a feature flag system allows us to dynamically change the embedded logic at runtime (without the deployment of code).
A Simplistic Feature Flag Targeting Implementation
Sometimes, a picture is worth a thousand words. And, sometimes, an engineer won't truly understand how something works until they pull back the covers and look at the code. As such, I want us to step through the building of a completely lacking, overly simplistic, toy version of a feature flags system. What we look at here will not represent production-grade quality by any means; but, I hope that it contains enough detail to shed light on this confusing topic.
As in the previous section, we're going to define our feature flags using a dictionary of simple data structures. Each structure contains the following information:
- The variants that can be served by the feature flag targeting.
- The default rollout strategy.
- An optional, single rule that can target a subset of requests and apply a different rollout strategy.
A Boolean feature flag, with no optional rule, looks like this:
{
variants: [ false, true ],
distribution: [ 100, 0 ]
}
The first array (variants) defines the collection of valid variants that can be returned from the feature flag evaluation process. The second array (distribution) defines the weighted distribution of the corresponding variants. In this case, the 100 means that 100% of users will receive the false variant (and the 0 means that 0% of users will receive the true variant).
If we wanted to start rolling this feature out slowly, we could change the distribution property:
{
variants: [ false, true ],
distribution: [ 90, 10 ]
}
Here, we're saying that 90% of the users will receive the first variant (false) and 10% of the users will receive the second variant (true).
The number of items in the variants array must match the number of items in the distribution array (as the latter corresponds to the weighted distribution of the former). And, in the case of a Boolean feature flag, only 2 items make sense. But, if we were to serve up non-Boolean data, these arrays can be longer.
For example, if we wanted to implement dynamic logging in a production application, we could create a String-based feature flag that serves the "minimum log level" to be aggregated:
{
variants: [ "error", "warn", "info", "debug", "trace" ],
distribution: [ 100, 0, 0, 0, 0 ]
}
Here, the variants represent the possible log levels emitted by the application. And, in this case, 100% of all requests will only aggregate error level entries and higher.
If we were in the middle of an incident and wanted to turn on lower-level logging, we might move the 100 from the 1st index (error) to the 4th index (debug). Then, all requests coming into the production application would start aggregating error, warn, info, and debug level log entries:
{
variants: [ "error", "warn", "info", "debug", "trace" ],
distribution: [ 0, 0, 0, 100, 0 ]
}
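On the consuming side, the application only ever sees the variant that was served to it. As a rough sketch in TypeScript (not the book's CFML, and with hypothetical names like getStringVariant() and shouldLog()), a logger might honor the served minimum log level like this:

```typescript
// Sketch: filtering log entries against the "minimum log level" served
// by a String-based feature flag. All names here are hypothetical.

// Ordered from highest severity (error) to lowest (trace).
const LOG_LEVELS = [ "error", "warn", "info", "debug", "trace" ];

// Stand-in for a real feature flag client call; imagine the rules
// engine is currently serving "debug" (as in the incident scenario).
function getStringVariant( feature: string ): string {
	return "debug";
}

// An entry is aggregated when its level is at-or-above the served
// minimum (an earlier index in LOG_LEVELS means a higher severity).
function shouldLog( entryLevel: string ): boolean {
	const minLevel = getStringVariant( "log-level-override" );
	return ( LOG_LEVELS.indexOf( entryLevel ) <= LOG_LEVELS.indexOf( minLevel ) );
}
```

Because the flag evaluation is in-memory, a check like this is cheap enough to run on every single log statement.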
Of course, turning debug level logging on for an entire application at one time could both overwhelm the system and lead to a non-trivial cost increase. Instead, we can try to enable low-level logging for only 5% of users and hope that the slightly increased logging provides enough insight:
{
variants: [ "error", "warn", "info", "debug", "trace" ],
distribution: [ 95, 0, 0, 5, 0 ]
}
Or, maybe we only want to turn debug level logging on for specific users, such as our internal Product Engineers. That's where our "rule" property comes into play. The "rule" property allows us to target a subset of users and apply a different distribution of the same variants just to them.
A rule contains an operator, two operands, and the weighted distribution that will be used if the request passes the operator assertion. So, if we want to turn low-level logging on for our own developers, our rule could use the IsOneOf operator and test the incoming "User ID" against a set of known developer IDs:
{
variants: [ "error", "warn", "info", "debug", "trace" ],
distribution: [ 100, 0, 0, 0, 0 ],
rule: {
operator: "IsOneOf",
input: "UserID",
values: [ 1, 16, 34, 2009 ],
distribution: [ 0, 0, 0, 100, 0 ]
}
}
With this configuration, the feature flag serves the error log level to 100% of users by default. However, if the requesting UserID "is one of" the given set of values, [1, 16, 34, 2009], then 100% of that subset of users will receive the debug log level.
Using these simple constructs alone—variants, distributions, and operators—we have everything we need to create a powerful targeting system. For the sake of this exploration, I'm only allowing for a single rule to be applied; but, in a production-grade feature flags implementation, compound rules—those using AND/OR conditions—allow for even greater flexibility.
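Tying these structures together, the evaluation step itself can be sketched as well. The following TypeScript (again, not the book's CFML; getVariant() and bucketFor() are invented names) applies the optional rule and then maps the request onto the weighted distribution. Hashing the input into a stable bucket is one common way to make %-based assignment repeatable: the same input always lands in the same bucket, so the same user always receives the same variant.

```typescript
// Sketch: a toy evaluator for the { variants, distribution, rule }
// structures described above. All names are hypothetical.

interface Rule {
	operator: "IsOneOf";
	input: string;
	values: Array<number | string>;
	distribution: number[];
}

interface FlagConfig {
	variants: Array<string | number | boolean>;
	distribution: number[];
	rule?: Rule;
}

// Deterministic hash: maps an input to a stable bucket in [0, 100).
// Same input, same bucket - which is what keeps %-rollouts "sticky".
function bucketFor( input: number | string ): number {
	const text = String( input );
	let hash = 0;
	for ( let i = 0; i < text.length; i++ ) {
		hash = ( ( hash * 31 ) + text.charCodeAt( i ) ) >>> 0;
	}
	return ( hash % 100 );
}

function getVariant(
	flag: FlagConfig,
	context: Record<string, number | string>
) {
	// Use the rule's distribution when the request matches; else the default.
	let distribution = flag.distribution;
	const rule = flag.rule;
	if (
		rule &&
		( rule.operator === "IsOneOf" ) &&
		rule.values.includes( context[ rule.input ] )
	) {
		distribution = rule.distribution;
	}
	// For simplicity, this toy always buckets on UserID; a real system
	// would let each flag configure its bucketing key.
	const bucket = bucketFor( context[ "UserID" ] ?? "" );
	// Walk the cumulative distribution until the bucket falls in a band.
	let cumulative = 0;
	for ( let i = 0; i < distribution.length; i++ ) {
		cumulative += distribution[ i ];
		if ( bucket < cumulative ) {
			return flag.variants[ i ];
		}
	}
	return flag.variants[ 0 ];
}
```

With the log-level flag from above, a UserID of 16 falls into the rule and receives debug, while everyone else receives error (until the rules are changed).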
#Types of Feature Flags
- Development flags.
- Operational flags.
- Data-types.
- Use-cases (Ingress, Labs, T100, etc).
#Life-Cycle of a Feature Flag
- Demonstrate how a feature can be built iteratively.
- Graduated rollout.
- Ticket to clean-up flags.
- Keep environments synchronized.
#Coding Defensively
- Designing for failure.
- Designing with EOL in mind.
- Rolling-deploys.
- Rolling back code.
- Long-lived processes.
#Server-Side vs. Client-Side
- Coping with different rates of change.
- Keeping complexity low.
- No "prop drilling".
#There's No Place Like Production
- Reducing costs, complexity, and testing in production.
#Life Without Tests
- People who say it cannot be done should not interrupt those who are doing it.
- Rich Hickey on bugs passing tests.
#KISS: Keep It Super Simple
- Tips for lowering complexity.
- Targeting users.
#Ownership Boundaries
- Scope of impact.
- Enabling / disabling.
- Compliance issues.
- A culture of trust.
- Monitoring during deployment.
#Logs, Metrics, and Feature Flags
- Table-stakes for robust application development.
#Bridging the Sophistication Gap
- We can't all be Google / Netflix / etc.
#The Cost of Feature Flags
- Dollars and cents.
- Code rot.
- Increased complexity and ambiguous state.
- Differentiating work.
#Not Everything Can Be Feature Flagged
- You can be lulled by the power of feature flags. But, not everything can feasibly (time, cost) be safely placed behind a feature flag (ex, database upgrades, database transformations, framework upgrades, design system upgrades).
#Build vs. Buy
- Deceptively simple - you get what you pay for.
- LaunchDarkly and friends.
#Overthinking Analytics
- It can be tempting to want to track all this state.
#Stop Reading Here If You Work Alone
- Above is all the technical stuff.
- Below is all the inter-personal stuff.
#People Like Us Do Things Like This
- On the culture of shipping and serving customers.
- Culture is created from the bottom-up.
#The Democratization of Product Design
- Roles are helpful, but they can also be a burden.
- Our deepest fear.
- Change requires tension.
- Asking for forgiveness, not permission.
#Co-creating the MVP (Minimum Viable Product)
- By working incrementally, we bring everyone to the table.
- This includes customers.
- Fallacy of Henry Ford.
- Incremental fidelity, from paper sketching to production.
#An Opinionated Guide To Pull-Requests (PRs)
- A culture of shipping doesn't happen by accident.
- Shipping requires intention from the start of code to the deployment of code. People have to buy into the process.
- Code completed is more important than code being written.
#Dynamic Code, Dynamic Teams
- When value created means value delivered, we don't fall victim to sunk cost fallacy.
#Removing the Cost of Context Switching
- Small goals mean less cognitive load.
#Measuring Team Productivity
- Goodhart's Law: When a measure becomes a target, it ceases to be a good measure.
- When we create a culture of shipping, the act of shipping becomes a meaningful metric.
- Fuzzy math over time still shows trends.
#The Goal Gradient
- The psychological challenge of long journeys.
#I Eat, I Sleep, I Feature Flag
- Outro.