Data science tutorials: Product analysis: A/B testing and KPI analysis of products for a start-up

Background

Almost all businesses exist to solve problems, and their solutions are the products they offer to clients and customers. The success of a business, at least financially, thus depends on the product. That does not necessarily mean that every business that failed did so because its product was not great. A great product can fail to deliver for a firm because it was not monitored and positioned to be well received in the market, or because it was not improved over time in response to market and user dynamics. This is where product performance monitoring and analytics come into play. The pending failure of a product can only be predicted if proper records exist and the right indicators are set to measure its performance.

Key Performance Indicators (KPIs) are designed to monitor business goals and objectives, in this case product goals. The first step is therefore to set the business goals or objectives to be achieved; the KPIs are then designed to measure them comprehensively. The type of KPI that a business uses depends on the industry and the product it offers to end-users. Thus, selection of a good KPI is informed by a clear understanding of the product to be analyzed.

For this case study, the analysis is for an online accommodation booking firm. The product is an online booking platform where users can search for accommodation and select features such as location and number of booking days, among others. A number of KPIs can be proposed for the product team to monitor and analyze. These are as follows:

I. Conversion rate:

This should be defined to be purchase-oriented. That is, a conversion is defined to have occurred when a visitor successfully books an apartment. The conversion rate is the total number of bookings successfully requested divided by the total number of sessions made by online visitors, multiplied by 100 to express it as a percentage. The rationale for recommending this KPI is to enable the product team to assess the monetary contribution of their product to the company, as this is usually the primary means of achieving financial viability. More importantly, conversion rate is likely to have a positive relationship with sales; hence this KPI enables the monitoring of a goal such as increasing sales on the platform by X amount.

II. Customer acquisition:

Customer acquisition is one of the key indicators for measuring market outreach which is related to sales. For an online booking platform, customers can be conceptualized to be online visitors who register on the platform in hopes of using the platform to book an apartment. Therefore, customer acquisition can be monitored using the number of people who registered on the platform.

III. Total number of unique visitors:

The number of visitors who access the platform provides clues to its web traffic, which has the potential of translating into sales. This KPI is also a measure for monitoring marketing efforts and the popularity of the product.

IV. Average session per User:

Average sessions per user helps monitor the flow of web traffic. A high value indicates that many visitors return for multiple sessions and are therefore using the product repeatedly.

V. Bounce rate:

Bounce rate is the share of sessions with a single page view and no further interaction with the product. This KPI should be monitored and kept as low as possible. Bounce rate is likely to be negatively correlated with revenue and can indicate a poor user experience on the platform. A number of reasons are possible, including a poor user interface or a product that is not responsive, not cross-platform friendly or not mobile-friendly, among others.
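
The five KPIs above reduce to simple counts and ratios. As a minimal sketch of the arithmetic, the snippet below uses made-up totals; every number and variable name in it is hypothetical and is not taken from the case-study data.

```r
# Hypothetical totals for one reporting period (illustrative only)
sessions             <- 2000  # sessions started by online visitors
unique_visitors      <- 1250  # distinct visitors behind those sessions
successful_bookings  <- 8     # booking requests completed successfully
registrations        <- 40    # visitors who registered an account
single_page_sessions <- 300   # sessions with exactly one page view

kpis <- c(
  conversion_rate_pct  = successful_bookings / sessions * 100,
  customer_acquisition = registrations,
  unique_visitors      = unique_visitors,
  sessions_per_user    = sessions / unique_visitors,
  bounce_rate_pct      = single_page_sessions / sessions * 100
)
print(round(kpis, 2))
```

Keeping the denominators explicit (sessions for conversion and bounce rate, visitors for sessions per user) makes it easier to compare the figures across reporting periods.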

Analyzing KPIs suggested

Defining a KPI is not enough; what matters more is analyzing and monitoring it. The suggested KPIs are analyzed as follows:

Data analysis

The analysis is undertaken in R, using RStudio and several packages. The code for loading the required packages and reading the data is as follows:

library(readr)
library(tidyverse) # data manipulation and visualization
library(ggplot2)
library(GGally)
library(ggstatsplot)
library(plotly)
library(highcharter)
library(gridExtra) ## plotting multiple graphs
library(DT)
library(stringr)
library(stringi)

# read the dataset

data18 <- read_csv("bq-results-20210718.csv")

The dataset read above records user behaviour on the platform, including how users journeyed through it and used various features. The first 100 rows of the data are viewed in a table using the code below.

datatable(data18[1:100, ]) ## view the data

## features in the dataset
colnames(data18)

 [1] "datetime"              "event_type"           
 [3] "test_groups"           "session_id"           
 [5] "visitor_id"            "user_location_country"
 [7] "user_location_city"    "device_class"         
 [9] "device_family"         "device_browser"       
[11] "page_type"             "page_country"         
[13] "page_city"             "apartment_id"         
[15] "params"                "environemnt"          
[17] "is_internal_ip"

Estimating conversion rate – Method

In order to estimate conversion rate, there is the need to define what is regarded as conversion in an analytical way.

So, it is time to make sense of the dataset. Typically, the page_type variable provides a clue as to where a user is on the path to conversion. The values of this variable are checked using the code below to verify whether this is the case.

unique(data18$page_type)

[1] "search_page"              "apartment_view"          
[3] "request/initial"          "request/checkout"        
[5] "request/success"          "request/rental-agreement"

“request/success” is one of the values in the page_type variable and can be deduced to mean that a booking request by a user has been successful hence defined to be conversion. In other words, conversion is said to have occurred when ‘page_type == request/success’ and the number of such instances defines the number of conversions.

First, the total number of unique visitors and sessions is analyzed using the code below

#### Total number of unique visitors
unique_visitors_count<- data18 %>%
  dplyr::select(visitor_id)%>%
  na.omit()%>%
  count(visitor_id)

### Total number of unique sessions
unique_session_count <- data18 %>%
  dplyr::select(session_id)%>%
  na.omit() %>%
  count(session_id)
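
The two tables above hold one row per visitor and one row per session, so their row counts give the totals. As a minimal sketch, the same totals and the average sessions per user KPI (suggested earlier but not estimated in this post) can be obtained in one step with n_distinct(). The toy table and its values are hypothetical, and dropping rows where either identifier is missing is an assumption of this sketch.

```r
library(dplyr)

# Toy event log (hypothetical values); column names follow the dataset above
events <- tibble(
  visitor_id = c("v1", "v1", "v1", "v2", "v2", "v3", NA),
  session_id = c("s1", "s1", "s2", "s3", "s3", "s4", "s5")
)

traffic <- events %>%
  filter(!is.na(visitor_id), !is.na(session_id)) %>%
  summarise(
    unique_visitors   = n_distinct(visitor_id),
    unique_sessions   = n_distinct(session_id),
    sessions_per_user = unique_sessions / unique_visitors
  )
traffic
```

On the real data, data18 would take the place of the toy events table.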

Conversion rate is estimated by first identifying number of conversions, thus filtering page_type to be “request/success”. The result is divided by number of unique sessions and multiplied by 100.

## Conversion rate
## Conversion occurs when page_type == request/success

# find number of conversions made
request_success_data <- data18 %>%
  dplyr::select(page_type)%>%
  na.omit()%>%
  filter(page_type == "request/success")
## conversion rate
conversion_rate <- (count(request_success_data)/count(unique_session_count)) * 100
conversion_rate$n  ## conversion rate is 0.159 %

[1] 0.1589657

Therefore, the overall conversion rate is 0.159%
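
The estimate above counts every row with page_type "request/success", so a session that logs the success page more than once would contribute more than one conversion. A hedged alternative, sketched below on a hypothetical toy table, is to count a session as converted if it has at least one such row. Whether repeated success rows occur in the real data is not known from this post.

```r
library(dplyr)

# Toy page-view log (hypothetical values)
page_views <- tibble(
  session_id = c("s1", "s1", "s1", "s2", "s3", "s3", "s4"),
  page_type  = c("search_page", "request/success", "request/success",
                 "search_page", "apartment_view", "request/success",
                 "search_page")
)

session_conversion <- page_views %>%
  filter(!is.na(session_id)) %>%
  group_by(session_id) %>%
  # TRUE when the session has at least one successful booking request
  summarise(converted = any(page_type == "request/success", na.rm = TRUE)) %>%
  summarise(
    sessions            = n(),
    converting_sessions = sum(converted),
    conversion_rate_pct = mean(converted) * 100
  )
session_conversion
```

In the toy data, session s1 logs the success page twice but is counted once, which is the difference between the two definitions.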

Estimating Bounce rate

Bounce rate is estimated from session_id and page_type. A bounce is a session with a single page view. First, the rows are grouped by session_id and the number of rows with a page_type recorded for each session is counted. A session with exactly 1 such row is counted as a bounce; otherwise it is not a bounce. Note that the code counts logged rows rather than distinct pages, so a session that logged several events on the same page is not treated as a bounce. The code for executing the process is below.

## Bounce_rate
bounce <- data18%>%
  dplyr::select(session_id, page_type)%>%
  na.omit()%>%
  group_by(session_id)%>%
  count(page_type)%>%
  tally(wt = n)%>%
  mutate(num_pages = n)%>%
  mutate(bounce_status = case_when(num_pages == 1 ~ "bounce",
                                   num_pages > 1 ~ "not_bounce"))

bounce_group <- bounce%>%
  group_by(bounce_status)%>%
  count(bounce_status)

## bounce rate
bounce_rate <- (bounce_group[1,2]/sum(bounce_group$n)) * 100
bounce_rate$n  ## 13.54% bounce rate

[1] 13.53858

From the above analysis, it is concluded that bounce rate was 13.54%.
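
The calculation above picks the bounce count by position (row 1, column 2), which relies on "bounce" sorting before "not_bounce". A minimal sketch that avoids positional indexing is shown below on a hypothetical toy table; it applies the same bounce definition of one logged row per session.

```r
library(dplyr)

# Toy page-view log (hypothetical values)
page_views <- tibble(
  session_id = c("s1", "s2", "s2", "s3", "s4", "s4", "s4"),
  page_type  = c("search_page", "search_page", "apartment_view",
                 "apartment_view", "search_page", "search_page",
                 "request/initial")
)

bounce_summary <- page_views %>%
  filter(!is.na(session_id), !is.na(page_type)) %>%
  count(session_id, name = "num_pages") %>%   # logged page rows per session
  summarise(
    sessions        = n(),
    bounces         = sum(num_pages == 1),
    bounce_rate_pct = mean(num_pages == 1) * 100
  )
bounce_summary
```

Because the result keeps the number of sessions and the number of bounces alongside the rate, the counts are available for a later significance test.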

Estimating Customer acquisition as KPI

The recommended customer acquisition KPI can be estimated from the event_type variable. First, the values of the event_type variable are checked to choose the appropriate one to use. Given that there are as many as 173 values, they are not shown here. Among them is the value “user_register_success”, which shows that a new user has successfully registered on the platform, hence a customer acquired. These events are counted for the customer acquisition KPI. The code below can be used to estimate it.

## Customer acquisition as KPI 
customer_acquisition <- data18%>%
  dplyr::select(event_type) %>%
  na.omit()%>%
  filter(event_type == "user_register_success")
customer_acquisition_count<- count(customer_acquisition)   ## 76 new users register for our services
customer_acquisition_count$n

[1] 76

It is estimated that 76 new users registered on the platform. This represents customer acquisition.

Double down: Conversion rate for users searching in Berlin

The KPI analysis can be disaggregated further by location, among other dimensions. This disaggregation is sometimes important for providing localized analysis and identifying regions where efforts need to be doubled. A case is made for Berlin, where conversion rate is estimated. Here, "users searching in Berlin" is taken to mean users whose user_location_city is Berlin; the page_city variable, which describes the city of the page being viewed, is not used. The dataset is filtered to these users and the number of sessions and conversions estimated using the code below.

# conversion rate for users searching in Berlin

## select users searching in Berlin
berlin_users <- data18%>%
  filter(user_location_city == "Berlin")%>%
  dplyr::select(session_id, page_type) 
  
  
berlin_conversion <- berlin_users%>%
  filter(page_type == "request/success")

berlin_session <- berlin_users%>%
    count(session_id)

berlin_session_count <- count(berlin_session) 
berlin_session_count$n  # Total number of sessions in Berlin is 804

[1] 804

# multiply by 100 so the result is a percentage, as in the overall estimate
berlin_conversion_rate <- (count(berlin_conversion) / count(berlin_session)) * 100
berlin_conversion_rate$n  # conversion rate in Berlin is 0%

[1] 0

From the analysis above, it is clear that there were no conversions for users searching in Berlin, despite 804 sessions.
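
The same disaggregation can be done for every city at once rather than one filter at a time. The sketch below is a minimal illustration on a hypothetical toy table; the column names follow the dataset, while the values are invented. As in the analysis above, conversions are counted as rows with page_type "request/success".

```r
library(dplyr)

# Toy event log (hypothetical values)
events <- tibble(
  user_location_city = c("Berlin", "Berlin", "Berlin",
                         "Hamburg", "Hamburg", "Munich"),
  session_id = c("s1", "s1", "s2", "s3", "s3", "s4"),
  page_type  = c("search_page", "apartment_view", "search_page",
                 "search_page", "request/success", "search_page")
)

city_conversion <- events %>%
  filter(!is.na(user_location_city), !is.na(session_id)) %>%
  group_by(user_location_city) %>%
  summarise(
    sessions            = n_distinct(session_id),
    conversions         = sum(page_type == "request/success", na.rm = TRUE),
    conversion_rate_pct = conversions / sessions * 100,
    .groups = "drop"
  ) %>%
  arrange(desc(sessions))
city_conversion
```

Sorting by the number of sessions puts the cities with the most traffic first, which helps separate a genuinely low conversion rate from one based on very few sessions.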

Analyzing A/B test

A/B testing is usually preceded by identifying the KPIs to be tested in the experiment. This is probably the most important prerequisite for designing the test. The test aims to assess which of the solutions offered to the user groups (control vs test group) better helps achieve the business goals. The KPIs to be analyzed for the A/B test are identified as follows:

KPI analyzed for A/B test

Conversion rate

Bounce rate

User journey

Splitting of data into test and control group

From the “test_groups” column, “rcsp:ref” can be identified as the control group and “rcsp:show” as the test group (the code matches the literal substrings "rcsp":"ref" and "rcsp":"show"). “rcsp:show” is a feature implemented for users on the search page that results in a faster page loading time. This solution is offered to users selected as part of the test group. Conversion rate is analyzed for the test and control groups in the next section.
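
Before splitting the real data, the substring matching can be tried on a small example. The sketch below is a minimal illustration that labels each row with a single variant column instead of creating two subsets. The toy strings are hypothetical, and the exact format of test_groups in the real data is assumed from the patterns used in this post.

```r
library(dplyr)

# Toy rows (hypothetical); test_groups is assumed to be a JSON-like string
events <- tibble(
  session_id  = c("s1", "s2", "s3", "s4"),
  test_groups = c('{"rcsp":"show","other":"a"}',
                  '{"rcsp":"ref"}',
                  '{"other":"b"}',
                  NA)
)

labelled <- events %>%
  mutate(variant = case_when(
    grepl('"rcsp":"show"', test_groups, fixed = TRUE) ~ "test",
    grepl('"rcsp":"ref"',  test_groups, fixed = TRUE) ~ "control",
    TRUE ~ NA_character_   # not part of this experiment
  ))

count(labelled, variant)
```

Using fixed = TRUE matches the text literally rather than as a regular expression, and the count at the end shows how many rows fall outside the experiment.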

Conversion rate based on A/B testing

# divide data into test group and control group

# subset test group
rcsp_show <- subset(data18, grepl('"rcsp":"show"', test_groups))


# subset control group
rcsp_ref <- subset(data18, grepl(pattern = '"rcsp":"ref"', test_groups))

# conversion rate for control
rcsp_ref_conversion <- rcsp_ref%>%
  filter(page_type == "request/success") %>%
  na.omit()
rcsp_ref_conversion_count <- count(rcsp_ref_conversion) 
rcsp_ref_conversion_count$n   ## Conversions for control group is 19

[1] 19

# number of sessions made by rcsp_ref
rcsp_ref_session <- rcsp_ref%>%
  dplyr::select(session_id)%>%
  na.omit()%>%
  count(session_id)

rcsp_ref_session_count <- count(rcsp_ref_session) 
rcsp_ref_session_count$n #  9467 sessions for rcsp_ref

[1] 9467

rcsp_ref_conversion_rate <- (count(rcsp_ref_conversion)/count(rcsp_ref_session)) * 100
rcsp_ref_conversion_rate$n  ## 0.2% conversion rate for rcsp_ref

[1] 0.2006972

# conversion rate for test group rcsp:show

rcsp_show_conversion <- rcsp_show%>%
  dplyr::select(page_type, session_id)%>%
  na.omit()%>%
  filter(page_type == "request/success")
rcsp_show_conversion_count <- count(rcsp_show_conversion)  
rcsp_show_conversion_count$n  ## conversion for test group is 11

[1] 11

# number of sessions for rcsp_show

rcsp_show_session <- rcsp_show%>%
  dplyr::select(session_id, page_type)%>%
  na.omit() %>%
  count(session_id)
rcsp_show_session_count <- count(rcsp_show_session) 
rcsp_show_session_count$n  ## 9,462 sessions were made by the rcsp_show test group

[1] 9462

## conversion rate rcsp_show
rcsp_show_conversion_rate <- (count(rcsp_show_conversion)/count(rcsp_show_session)) * 100
rcsp_show_conversion_rate$n ## 0.116 % conversion rate for rcsp_show

[1] 0.1162545

# Thus the test group ( test_groups == rcsp:show) achieved a lower conversion rate compared to the
## control group (test_groups == rcsp:ref )

From the above analysis of A/B testing, it was estimated that the control group (rcsp:ref) had 19 conversions and 9467 user sessions which translates into a conversion rate of 0.2%.

For the test user group (rcsp:show), there were 11 conversions and 9,462 user sessions, hence a conversion rate of 0.116%.

Thus, the test user group had a lower conversion rate (0.116%) than the control group (0.2%). Therefore, based on conversion rate alone, the new feature shown to the test group is not recommended.

It should be noted that further inquiry can be made as to whether the difference in conversion between the control and test group is statistically significant.
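
One way to make that inquiry is a two-sample test of proportions on the counts reported above. The snippet below is a minimal sketch using base R's prop.test(). It treats each session as an independent trial and assumes each conversion belongs to a different session, which the analysis above does not verify.

```r
# Counts reported in the analysis above
conversions <- c(control = 19, test = 11)
sessions    <- c(control = 9467, test = 9462)

# Two-sided test of equal proportions
# (chi-squared test with continuity correction by default)
result <- prop.test(x = conversions, n = sessions)

result$estimate * 100   # conversion rates in percent
result$p.value          # compare against the chosen significance level
result$conf.int * 100   # interval for the difference, in percentage points
```

With so few conversions in each group, an exact test such as fisher.test() on the 2x2 table of converted and non-converted sessions is a reasonable cross-check.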

Bounce rate analysis for A/B test

A key concern for the product team is to ensure that they do not roll out a feature that has a detrimental effect on user engagement and experience, and hence on bounce rate. Thus, the A/B test can also be evaluated on bounce rate.

The steps used earlier to estimate bounce rate are applied to the control and test groups and the results compared. A more robust decision can be made when a test of significance for the difference in bounce rate is undertaken. The code for analyzing bounce rate for the test and control groups is as follows:

# bounce rate for A/B testing

## Bounce_rate for rcsp_ref
bounce_rcsp_ref <- rcsp_ref %>%
  dplyr::select(session_id, page_type)%>%
  na.omit()%>%
  group_by(session_id)%>%
  count(page_type)%>%
  tally(wt = n)%>%
  mutate(num_pages = n)%>%
  mutate(bounce_status = case_when(num_pages == 1 ~ "bounce",
                                   num_pages > 1 ~ "not_bounce"))

bounce_group_rcsp_ref <- bounce_rcsp_ref%>%
  group_by(bounce_status)%>%
  count(bounce_status)

## bounce rate for rcsp_ref
bounce_rate_rcsp_ref <- (bounce_group_rcsp_ref[1,2]/sum(bounce_group_rcsp_ref$n)) * 100
bounce_rate_rcsp_ref$n  ## 13.61572% bounce rate

[1] 13.61572

#  Bounce_rate for A/B testing (test group)
## Bounce rate rcsp_show
bounce_rcsp_show <- rcsp_show%>%
  dplyr::select(session_id, page_type)%>%
  na.omit()%>%
  group_by(session_id)%>%
  count(page_type)%>%
  tally(wt = n)%>%
  mutate(num_pages = n)%>%
  mutate(bounce_status = case_when(num_pages == 1 ~ "bounce",
                                   num_pages > 1 ~ "not_bounce"))

bounce_group_rcsp_show <- bounce_rcsp_show%>%
  group_by(bounce_status)%>%
  count(bounce_status)

## bounce rate for rcsp_show
bounce_rate_rcsp_show <- (bounce_group_rcsp_show[1,2]/sum(bounce_group_rcsp_show$n)) * 100
bounce_rate_rcsp_show$n  ## 13.59 % bounce rate for rcsp_show

[1] 13.59121

From the analysis, the control user group (rcsp:ref) had a bounce rate of 13.62% while the test user group (rcsp:show) had a bounce rate of 13.59%. Thus, in terms of bounce rate, the test group performed slightly better than the control group. The result is, however, not conclusive as to which variant is better, since further analysis is needed to determine whether the difference is statistically significant. Still, the fact that the test group produces the preferable result on this KPI reinforces the earlier assertion that it is important to decide on the main KPI to analyze in an A/B testing experiment. It is not uncommon to see conflicting results or recommendations for different KPIs. In this case, the control group produces the preferred result for conversion rate while the test group produces the preferred result for bounce rate. The results appear to follow expected logic: the faster loading time experienced by the test group could lead to a lower bounce rate, while getting past the first page does not necessarily mean that the customer will book an apartment.

Analysis of user journey for A/B test

Given that the difference in bounce rate between the two groups is a very small margin, further inquiry will be to analyze the user journey and display the result using a funnel chart. This will enable visualizing pages where users leave the platform and get a sense of which group makes it closer to final conversion.

The code for analyzing user journey is provided below;

# User journey for A/B testing

# user journey for control group -- rcsp_ref
rcsp_ref_user_jour <- rcsp_ref%>%
  dplyr::select(page_type)%>%
  na.omit()%>%
  group_by(page_type)%>%
  count()

# user journey for test group -- rcsp_show 
rcsp_show_user_jour <- rcsp_show%>%
  dplyr::select(page_type)%>%
  na.omit()%>%
  group_by(page_type)%>%
  count()

rcsp_ref_user_jour.desc<- dplyr::arrange(rcsp_ref_user_jour, desc(n) )
rcsp_show_user_jour.desc <- dplyr::arrange(rcsp_show_user_jour, desc(n))

rcsp_funl <- plot_ly(
  type = "funnel",
  name = 'rcsp:ref (control group)',
  y = as.vector(rcsp_ref_user_jour.desc$page_type),
  x = as.vector(rcsp_ref_user_jour.desc$n),
  textinfo = "value+percent initial")

rcsp_funl <- rcsp_funl %>%
  add_trace(
    type = "funnel",
    name = 'rcsp:show (test group)',
    orientation = "h",
    y = as.vector(rcsp_show_user_jour.desc$page_type),
    x = as.vector(rcsp_show_user_jour.desc$n),
    textposition = "inside",
    textinfo = "value+percent initial") 
rcsp_funl <- rcsp_funl %>%
  layout(yaxis = list(categoryarray = as.vector(rcsp_show_user_jour.desc$page_type)))%>%
  layout(hovermode = 'compare')

rcsp_funl
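
The funnel chart orders pages by their counts. To read drop-off between consecutive steps, the pages can instead be put in the logical order of the booking journey and each step compared with the one before it. The sketch below is a minimal illustration with hypothetical counts. The step order is an assumption based on the page_type values listed earlier, and "request/rental-agreement" is left out because its place in the journey is not clear from this post.

```r
library(dplyr)

# Assumed logical order of the booking funnel
funnel_order <- c("search_page", "apartment_view", "request/initial",
                  "request/checkout", "request/success")

# Hypothetical page-view counts for one group
journey <- tibble(
  page_type = c("apartment_view", "search_page", "request/success",
                "request/initial", "request/checkout"),
  n         = c(400, 1000, 10, 80, 40)
)

funnel_steps <- journey %>%
  mutate(page_type = factor(page_type, levels = funnel_order)) %>%
  arrange(page_type) %>%                  # sort by funnel step, not by count
  mutate(
    pct_of_first    = n / first(n) * 100, # share of the first step retained
    pct_of_previous = n / lag(n) * 100    # step-to-step retention
  )
funnel_steps
```

Running the same steps on the control and test counts side by side shows at which step the two groups diverge.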

Summary

This post demonstrated how to undertake a real-world analysis for a product team that aims to adopt a data-driven approach to growth. The business problem was the absence of a relevant product monitoring and decision support system that provides insights on the usefulness of features being implemented. The end-user and stakeholder of the analysis was identified as the product team of an online accommodation booking service provider. The solution provided was to define and conceptualize relevant Key Performance Indicators (KPIs) for monitoring the product. Further, the data provided was used to show how the KPIs can be estimated. Features are released mainly to drive the business forward, and whether they achieve this goal needs to be evaluated. This was undertaken using A/B testing. The data-driven approach provided insights on how the implemented solution affected bounce rate, conversion rate and the user journey funnel, while also highlighting further steps such as a statistical test of significance to verify whether the difference between the control and test groups is significant.
