Validate Assumptions Faster with Experimentation and Feature Flags
25 Oct 2024 | Josh Blockwell
As a product team, we often make assumptions about what will drive the best outcome for users and ensure our products’ success. We can and should dedicate time to interviewing, observing, and segmenting our users, but we must still validate any assumptions we have while delivering outcomes for them.
There is usually no one true way to approach a problem, so we find that assumptions (alongside continuous discovery) are a natural result of trying to solve the open-ended problems usually found in product management. To avoid analysis paralysis, product managers must embrace assumptions as part of the discovery process.
These assumptions can vary in size, importance, risk, and opaqueness. Some assumptions may have strong supporting evidence, while others are based on little to no data at all. Some assumptions may have little impact on the overall success of the product (“We assume that dashboard users want data presented as a bar chart”), while others are far more strongly linked (“We assume that users want to use a dashboard”).
Making assumptions is not a bad thing, but they should be explicit. It’s vital to validate assumptions – especially the riskiest – as early as possible. We need to make it “cheap to be wrong,” and assumption testing using experimentation and feature flags is a great way to achieve this essential goal.
LaunchDarkly, a gravity9 partner, is a leading software feature management platform that provides a streamlined yet feature-rich way to release, monitor, and optimize software in production. Below, we’ll explore how the platform accelerates low-risk experimentation and empowers data-driven decision making, ensuring you’re building functionality that your users will love.
Types of Assumptions in Product Development
Several common assumptions are found in product development, which we’ll examine here based on the following product goal: “I want to create an AI chatbot to guide shoppers through our sales journey and recommend products based on their conversations with it.”
- Desirability Assumptions: Do customers want this feature? Will they value it? Product teams can fall in love with their ideas and assume that users will, too, without necessarily having evidence to support the assumption.
- Viability Assumptions: Should we build it? It’s important to make sure the assumption aligns with business goals. In this example, would the chatbot meaningfully drive customer/sales conversion?
- Feasibility Assumptions: Can we build it? It’s important to consider whether it’s technically possible to integrate the necessary components (e.g., a camera API) with the product. These assumptions can also cover whether the feature complies with regulations and security policies.
- Ethical Assumptions: Is there any harm in building this feature? Ethics, such as data privacy and fairness, are essential to consider; regarding data privacy, the legalities can change from country to country.
Which Assumptions Should Be Validated or Tested?
Assumption mapping, an exercise designed by David J. Bland, allows product teams to quickly identify their riskiest assumptions.
This is most efficient when visualized, with assumptions documented on a virtual canvas, whiteboard, or wall, and a cross-functional team (especially product designers, developers, and testers) invited to discuss each item and weigh its placement.
Begin by assessing how much is known about each assumption and examining any evidence that can determine whether it is true or false. Next, assess how important the assumption is to the project’s overall success (all assumptions will be important, but it may be that the least important ones can be worked around if they turn out to be false).
The most important and likely riskiest assumptions will be those essential to test. A generic assumption map example can be seen in the image below.
Looking back at our example above – “introducing an AI chatbot to guide shoppers and recommend products” – the assumption mapping exercise might highlight several risky assumptions:
- That users will want to engage with the AI chatbot (Desirability assumption)
- That the AI chatbot is technically possible to develop (Feasibility assumption)
- That the AI chatbot will deliver the desired business outcomes (Viability assumption)
Busting Common Myths about Experimentation
“We can only test when fully developed.”
While investing a lot of work in a feature before releasing it is tempting, it’s more efficient (and therefore important) to minimize the work you put into developing an experiment while maximizing the learning you receive from the process. The goal should be to move assumptions from being unknowns to known and manageable quantities. As a team, agree on the smallest assumption test you can design to provide results you feel comfortable acting on. Make it cheap to be wrong.
“Testing requires a large sample size to give us a truly useful result.”
Large-scale testing can provide high-quality, trustworthy data, but it is slower to complete and to yield usable results. Remember that our goal is to make it cheap to be wrong. In product development, that means rapid iteration: validating assumptions and running multiple smaller experiments quickly at an early stage of development. It’s here that the power of feature flags can really shine, and tools like LaunchDarkly let you leverage this by configuring your sample size when designing experiments.
“Experimentation is risky and will confuse our users.”
Experimentation is, in fact, a risk-averse approach, as it’s far less risky than releasing features or functionality and simply hoping for the best! Again, this is where feature flags step in, allowing product teams to easily control which features are available to which users (if any) and for how long. This targeted approach lets assumptions and new features be tested in a live environment without impacting all users simultaneously. Furthermore, if a feature has bugs or other issues, it can be deactivated quickly and effectively, leaving the user experience, the product, and the business’s reputation unharmed.
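To make this concrete, here’s a minimal sketch in TypeScript using LaunchDarkly’s Node.js server SDK of how a feature can be gated behind a flag. The SDK key, the flag key (new-checkout-flow), and the helper functions are hypothetical placeholders, not a definitive implementation:

```typescript
import { init } from '@launchdarkly/node-server-sdk';

// Initialize the SDK once at application startup.
const ldClient = init('YOUR_SDK_KEY'); // hypothetical key

// Stand-ins for the two code paths being gated.
function currentCheckoutFlow(): string { return 'classic checkout'; }
function newCheckoutFlow(): string { return 'experimental checkout'; }

async function renderCheckout(userKey: string): Promise<string> {
  await ldClient.waitForInitialization();

  // Which users see the feature, and for how long, is controlled by
  // targeting rules in the LaunchDarkly dashboard, not by this code.
  const showNewFlow = await ldClient.variation(
    'new-checkout-flow',            // hypothetical flag key
    { kind: 'user', key: userKey }, // evaluation context
    false,                          // safe default if evaluation fails
  );

  return showNewFlow ? newCheckoutFlow() : currentCheckoutFlow();
}
```

Deactivating the feature is then a dashboard toggle, and the `false` default doubles as a safety net if the flag can’t be evaluated.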
How to Test with Feature Flags in Practice
By applying assumption mapping to our AI chatbot example above, we’ve determined that it’s most important to test whether the feature is desirable, viable, and feasible. There are several tests we can apply to collect the data we need:
Experiment 1: The Fake Door Test
We want to keep our first experiment lightweight, rapid, and presented to a small segment of our users. To that end, we could create a “ShopBot” button, presented as our AI chatbot and placed throughout our online store, which would lead to a “coming soon” placeholder. Usage of the button could be measured, allowing us to gauge interest in a real chatbot function without investing in developing one.
LaunchDarkly features an analytics dashboard that lets us see how users interacted with the feature, making it easy to decide if and how we should proceed.
The LaunchDarkly platform also lets us determine the audience to which we surface the “ShopBot” button, so we can select a small segment rather than conduct testing with our entire user base. The platform’s feature flag functionality ensures testing can be quick and easy to conduct, with a simple toggle to turn the feature on and off. Leveraging feature flags with this approach to testing lets us test quickly and makes it very cheap to be wrong.
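As an illustrative sketch under the same assumptions as above (the flag key shopbot-fake-door and event key shopbot-clicked are hypothetical), the fake-door button could be gated and measured like this:

```typescript
import { init } from '@launchdarkly/node-server-sdk';

const ldClient = init('YOUR_SDK_KEY'); // hypothetical key

// Decide whether to render the fake "ShopBot" button for this shopper.
// Only the small segment targeted in the LaunchDarkly dashboard
// receives `true`; everyone else sees the store unchanged.
async function shouldShowShopBotButton(userKey: string): Promise<boolean> {
  await ldClient.waitForInitialization();
  return ldClient.variation(
    'shopbot-fake-door',            // hypothetical flag key
    { kind: 'user', key: userKey },
    false,
  );
}

// Record a click on the "coming soon" placeholder so interest
// shows up as a custom event in the analytics dashboard.
function recordShopBotClick(userKey: string): void {
  ldClient.track('shopbot-clicked', { kind: 'user', key: userKey });
}
```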
Experiment 2: The Simple, Scripted Chatbot
The results of our first experiment tell us that people are interested in the AI chatbot, so we can move the “desirability” assumption towards the “known” portion of our assumption map and move on. It’s time to test our “feasibility” and “viability” assumptions.
We turn to LaunchDarkly again, using its Experimentation suite to push a simple version of the chatbot to users, testing whether the interface is feasible and drives outcomes in line with our business objectives (e.g., increased sales conversion rates). Although the chatbot will be basic, it should provide enough value to drive checkout views and completions if our assumptions are correct.
LaunchDarkly allows us to split our customer audience into two or more groups, including those who still see the fake “coming soon” button and those who can interact with the basic chatbot. This allows us to measure metrics across all groups from within the platform’s dashboard. Depending on the results we observe, we can easily toggle the functionality off entirely, roll it out across our entire user base, or target a specific group for further testing.
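A sketch of how that split might look in code, assuming a hypothetical multivariate flag chatbot-experiment with variations “coming-soon” and “scripted-bot”, and a hypothetical checkout-completed conversion event:

```typescript
import { init } from '@launchdarkly/node-server-sdk';

const ldClient = init('YOUR_SDK_KEY'); // hypothetical key

// LaunchDarkly assigns each user to a variation; the split between
// groups is configured when designing the experiment in the platform.
async function chatbotVariant(userKey: string): Promise<string> {
  await ldClient.waitForInitialization();
  return ldClient.variation(
    'chatbot-experiment',           // hypothetical multivariate flag
    { kind: 'user', key: userKey },
    'coming-soon',                  // control variation as the default
  );
}

// Conversion events are attributed to whichever variation the user
// was served, so results can be compared per group in the dashboard.
function recordCheckoutCompleted(userKey: string): void {
  ldClient.track('checkout-completed', { kind: 'user', key: userKey });
}
```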
Experiment 3: The Phased Rollout
The results of our experiments have told us that the AI chatbot is desirable, feasible, and viable! With further development to complete the feature, we can move on to a phased rollout for our users.
Pushing the feature out to everyone at once is possible, but it could reintroduce the risks we’ve been careful to avoid. Instead, we could first deploy it only to new or returning users, or segment the rollout by country or product category.
While keeping a watchful eye over conversion rates and reports of any issues, we can use LaunchDarkly’s percentage rollouts feature to push the feature to larger and larger user segments – or quickly roll back deployment in case of any issues – until the AI chatbot is fully deployed across our user base.
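Notably, the application code doesn’t change as the rollout widens: the percentage and any country or category rules are adjusted in the dashboard, and rolling back is a toggle. A sketch, assuming a hypothetical ai-chatbot flag and a context carrying a country attribute for targeting:

```typescript
import { init } from '@launchdarkly/node-server-sdk';

const ldClient = init('YOUR_SDK_KEY'); // hypothetical key

// The same evaluation call serves 5%, 50%, or 100% of users;
// only the rollout configuration in the dashboard changes.
async function chatbotEnabled(userKey: string, country: string): Promise<boolean> {
  await ldClient.waitForInitialization();
  return ldClient.variation(
    'ai-chatbot',                            // hypothetical flag key
    { kind: 'user', key: userKey, country }, // `country` is available to targeting rules
    false,                                   // safe fallback if rolled back
  );
}
```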
At this stage, we can also experiment with different iterations of the AI chatbot to explore what variations are best received by users and learn how to maximize customer satisfaction and business conversion rates through targeted testing and real-world data.
The Advantages of Feature Flags in Product Development
Feature flags, particularly through deployment platforms like LaunchDarkly, allow software teams to assess and validate assumptions early in development, minimizing risk and cost and maximizing learning. By framing experimentation as a vital part of the discovery process rather than a risky endeavor, product teams can create a culture of learning and agility, making it cheap to be wrong (or right)!
It’s worth noting that feature flag functionality can come with a significant development overhead if the proper platforms and libraries aren’t used. LaunchDarkly, however, mitigates that overhead via a simple SDK that requires minimal additional code from your developers; the rest of its feature flag functionality is embedded in the platform itself.
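For a sense of scale, an entire server-side integration can be as small as the sketch below (the key and flag name are placeholders):

```typescript
import { init } from '@launchdarkly/node-server-sdk';

async function main(): Promise<void> {
  // Initialize, evaluate a flag, and shut down cleanly.
  const ldClient = init('YOUR_SDK_KEY');      // hypothetical key
  await ldClient.waitForInitialization();

  const enabled = await ldClient.variation(
    'my-feature',                             // hypothetical flag key
    { kind: 'user', key: 'user-123' },
    false,
  );
  console.log(`Feature enabled: ${enabled}`);

  await ldClient.flush(); // make sure analytics events are delivered
  ldClient.close();
}

main();
```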
An excellent product team runs multiple experiments simultaneously and regularly, using real data to validate assumptions and test innovative approaches and features. As shown above, these experiments need not be overly complex or impact your entire user base; they can involve a minimum of work while maximizing learning. Feature flags and analytics dashboards like those found in LaunchDarkly help product teams create, administer, and learn from these experiments quickly and effectively, ensuring results are easily traceable and put to good use.
Adhering to this approach ensures you invest in features your users will love while safeguarding your product’s long-term success. Each validated assumption drives the development process and builds stakeholder confidence from an early stage.
Collecting real, user-generated feedback and data makes it easy to pivot and to adjust strategies, designs, and development plans. It fosters rapid innovation aligned with your users’ needs and, in turn, more successful products that drive engagement, usage, and sales conversion.
If you’re interested in working with feature flags or LaunchDarkly or have questions about how best to validate assumptions, gravity9 would be happy to help. Please visit www.gravity9.com for more information.