Best Practices

Advice from teams that run FlagPal well — what to do, what to avoid, and how to get the most out of feature flags and experiments.


Feature Flags

Name flags for what they control, not how they're used

A flag named show_new_checkout tells you what it does. A flag named experiment_q1_group_b tells you nothing useful six months later.

Good names: show_new_checkout, enable_dark_mode, max_upload_size_mb

Avoid: flag_1, test_feature, q1_experiment_toggle


Give every flag a description

Before you save a flag, write one or two sentences explaining:

  • What feature it controls
  • Why it exists
  • Any important context

Future you — and your teammates — will thank you.


Clean up flags regularly

Feature flags accumulate over time. Old flags for fully-shipped features with no plans to toggle them again are just clutter. Set a reminder to review your flags every quarter and archive or delete the ones you no longer need.


Choose the right type

  • Use Boolean for simple on/off features
  • Use String when you have named options (not just on/off)
  • Use Integer for numerical configurations
  • Use Date for time-based features
  • Don't try to encode complex logic into a single String flag — split it into multiple flags
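The last point is worth seeing concretely. Here's a small Python sketch (flag names are illustrative, not the FlagPal API) contrasting one overloaded String flag with the same settings split into typed flags:

```python
from datetime import date

# Anti-pattern: one String flag packing several settings together.
# Anyone reading "dark;50;2024-12-01" has to know the packing order.
packed = "dark;50;2024-12-01"

# Better: one typed flag per setting (names are examples only).
flags = {
    "enable_dark_mode": True,               # Boolean: simple on/off
    "checkout_theme": "dark",               # String: named options
    "max_upload_size_mb": 50,               # Integer: numeric config
    "promo_start_date": date(2024, 12, 1),  # Date: time-based feature
}

# Each value is directly usable — no string parsing, no packing order.
if flags["enable_dark_mode"] and flags["max_upload_size_mb"] >= 50:
    print("dark mode on, large uploads allowed")
```

Each typed flag can now be toggled, described, and cleaned up independently.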

Experiences

One Experience per purpose

Don't try to make one Experience do too many things. If you're rolling out Feature A to beta users and Feature B to premium users, create two separate Experiences. They're easier to understand, debug, and turn off independently.


Use descriptive names that include context

Include who it targets and when it's relevant:

  • "Beta Users — New Navigation v2 — Jan 2024"
  • "Premium Plan — Advanced Analytics Access"
  • "Holiday Campaign — Promo Banner — Dec 2024"

Test targeting rules before activating

Before activating an Experience in production, ask your developer to test a user that should match and one that shouldn't. This catches typos in property names before they affect real users.
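The check itself can be tiny. This is a hypothetical matcher, not the FlagPal SDK — the property name `plan` and the rule shape are assumptions — but it shows the kind of pre-launch test your developer would run, and how a typo fails:

```python
# Hypothetical targeting rule: property name and value are assumptions.
rule = {"plan": "premium"}

def matches(user: dict, rule: dict) -> bool:
    """True only if the user has every property the rule requires."""
    return all(user.get(key) == value for key, value in rule.items())

should_match = {"id": "u1", "plan": "premium"}
should_not_match = {"id": "u2", "plan": "free"}

assert matches(should_match, rule)
assert not matches(should_not_match, rule)

# A typo in the property name ("pln" instead of "plan") silently
# matches nobody — exactly the failure this pre-launch check catches.
typo_rule = {"pln": "premium"}
assert not matches(should_match, typo_rule)
```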


Deactivate rather than delete

When an Experience is no longer needed, deactivate it rather than deleting it immediately. Keep it around for a week or two in case you need to reactivate it. When you're certain it's no longer needed, then delete it.


Experiments

Define your hypothesis before you start

Before creating the experiment in FlagPal, write down:

"We believe that [change] will cause [outcome] because [reason]. We'll measure this with [metric]."

This keeps you honest. If you don't write it down first, you're more likely to fish for metrics after seeing results — which leads to false conclusions.


Resist stopping experiments early

It's tempting to stop an experiment the moment one variant looks like it's winning — but early results are often misleading. A variant that looks 30% better on day two might be completely flat by day ten once you have enough data.

Minimum guidance:

  • At least 1 full week (to account for day-of-week variation)
  • At least 100 users per variant (more is better; 500+ for reliable results)
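If you want more than a rule of thumb, a standard back-of-the-envelope formula (normal approximation for a two-proportion test — this is a generic statistics sketch, not something FlagPal computes for you) estimates the users needed per variant:

```python
from math import ceil

def required_sample_size(p_control: float, p_variant: float,
                         z_alpha: float = 1.96, z_power: float = 0.84) -> int:
    """Rough per-variant sample size for a two-proportion test
    (normal approximation, 95% confidence, 80% power)."""
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    delta = p_variant - p_control
    return ceil((z_alpha + z_power) ** 2 * variance / delta ** 2)

# Detecting a lift from 4.2% to 5.8% conversion:
n = required_sample_size(0.042, 0.058)
print(n)  # roughly 2,900 users per variant
```

Note how quickly the requirement grows for small effects — which is why "100 users per variant" is a floor, not a target.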

Only test one thing at a time (per experiment)

If you change the button colour and the button text in the same experiment, you won't know which change caused the result. Keep each experiment focused on one variable.


Run experiments on meaningful traffic

If your experiment is targeting 5% of users but you need 1,000 users per variant to reach significance, it'll take a very long time. Make sure your traffic percentage and targeting rules give you enough volume to reach conclusions in a reasonable timeframe.
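The arithmetic is simple enough to sanity-check before launch. A minimal sketch (numbers are illustrative), assuming traffic splits evenly between variants:

```python
from math import ceil

def days_to_significance(daily_users: int, traffic_pct: float,
                         variants: int, needed_per_variant: int) -> int:
    """Days until every variant reaches its target enrolment,
    assuming enrolled traffic is split evenly between variants."""
    per_variant_per_day = (daily_users * traffic_pct) / variants
    return ceil(needed_per_variant / per_variant_per_day)

# 2,000 daily users, 5% traffic, 2 variants, 1,000 users needed each:
print(days_to_significance(2000, 0.05, 2, 1000))  # 20 days
# Raising traffic to 50% cuts that to 2 days:
print(days_to_significance(2000, 0.50, 2, 1000))  # 2 days
```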


Add feature flags you're testing to the rules

If your Experiment sets a feature flag value, add that same flag to the targeting rules with an empty value. This deduplicates enrolment: users whose flag already has a value (for example, from an earlier enrolment) won't be enrolled a second time. Conversely, if your targeting rules match on a specific feature flag, set a value for that flag in every variant — even if it's just the default. That way users leave the experiment in a consistent state, and another experiment targeting the same flag can't overlap with this one.
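Here is the shape of that setup as a sketch. The field names are assumptions for illustration, not the real FlagPal configuration format:

```python
# Illustrative only — field names are assumptions, not FlagPal's format.
experiment = {
    "name": "Checkout CTA test",
    # The experiment sets "show_new_checkout", so the same flag appears
    # in the rules with an empty value: users who already have a value
    # for it are not enrolled a second time.
    "targeting_rules": {"show_new_checkout": ""},
    "variants": {
        # Every variant sets the flag explicitly — even the control's
        # default — so users end the experiment in a consistent state.
        "control": {"show_new_checkout": False},
        "treatment": {"show_new_checkout": True},
    },
}
```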

Document your results — wins and losses

When an experiment ends, write the result in the Description field:

"Result: Variant B won with 5.8% conversion vs. 4.2% for control (38% lift). Rolled out to 100% on 2024-03-15."

OR:

"Result: No significant difference after 3 weeks and 2,400 users per variant. Kept the control."

This builds institutional knowledge. Over time, you'll learn what types of changes work for your users.


Keep a "control" variant that matches the current behaviour

Always have a variant that represents the current state of your product. This is your baseline. Without it, you can't know if the new variant is better or worse — you can only know they're different.


Metrics

Define metrics before running experiments

Deciding what to measure after you've seen early results is a form of bias called "p-hacking" or "data dredging." Define your primary metric before you start.


Have one primary metric

Each experiment should have a single metric that determines the winner. Having too many metrics leads to ambiguity — what if Metric A favours Variant A but Metric B favours Variant B? Pick the one that matters most for this specific experiment.


Make sure metric events are firing correctly

Before launching an experiment, have your developer confirm that metric events are being sent correctly. A quick sanity check:

  • Trigger the event manually in a test environment
  • Confirm the event appears in FlagPal with the right metric and variant
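In a test environment, the sanity check can be automated with a stand-in recorder. The real call would go through your FlagPal SDK (whose API isn't shown here); this sketch only demonstrates the trigger-then-verify pattern:

```python
# Stand-in event recorder for a test environment — not the FlagPal SDK.
sent_events: list[dict] = []

def track(metric: str, variant: str) -> None:
    """Record a metric event so the test can inspect it."""
    sent_events.append({"metric": metric, "variant": variant})

# Step 1: trigger the event manually.
track("checkout_completed", "variant_b")

# Step 2: confirm it arrived with the right metric and variant.
assert sent_events == [{"metric": "checkout_completed",
                        "variant": "variant_b"}]
print("metric event fired correctly")
```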

Teams

Agree on naming conventions

As a team, decide upfront on naming conventions for flags, experiences, and experiments. Consistency makes the dashboard much easier to navigate as it grows.

Example conventions:

  • Flags: [area]_[description] — e.g. checkout_new_flow, nav_redesign
  • Experiences: [audience] — [feature] — [date]
  • Experiments: [description] — [quarter year]
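Conventions stick better when they're checked automatically. One way to enforce an [area]_[description] flag convention in code review or CI — the pattern below is an example, adapt it to your own rules:

```python
import re

# lower_snake_case with at least two segments, e.g. "checkout_new_flow"
FLAG_NAME = re.compile(r"^[a-z]+(_[a-z0-9]+)+$")

def is_valid_flag_name(name: str) -> bool:
    """True if the name follows the [area]_[description] convention."""
    return bool(FLAG_NAME.fullmatch(name))

assert is_valid_flag_name("checkout_new_flow")
assert is_valid_flag_name("nav_redesign")
assert not is_valid_flag_name("darkmode")     # missing the [area]_ prefix
assert not is_valid_flag_name("NewCheckout")  # not lower_snake_case
```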

Use the Description field like documentation

The Description field is your team's shared memory. Anyone looking at a flag, experience, or experiment should be able to understand its purpose without asking around.


Communicate before making live changes

Deactivating an Experience or changing an active Experiment's settings affects real users immediately. Let your team know before making significant changes in production.


Separate environments where possible

Use separate FlagPal projects for staging/testing and production. This lets you experiment safely without affecting real users.


The Golden Rule

Use the simplest setup that solves your problem.

Don't create five Experiences when one will do. Don't run an Experiment to decide something you could decide with common sense. Feature flags and experiments are tools to reduce risk and gain insight — not bureaucracy for its own sake.