Updated daily - Curated by Geoffrey Idun

The Idun Times


Facebook’s Safety Check feature gets its own dedicated button, can be accessed anytime

 Facebook is giving its “Safety Check” feature a permanent home in its app and on the desktop, the company announced today. The feature, which lets you check to see whether friends and family are safe following a crisis, will now have its own dedicated button in the app’s navigation menu and will be available via the Facebook website on the desktop. Read More
Social – TechCrunch


Statistical Design in Online A/B Testing

A/B testing is the field of digital marketing with the highest potential to apply scientific principles, as each A/B experiment is a randomized controlled trial, very similar to ones done in physics, medicine, biology, genetics, etc. However, common advice and much of the practice in A/B testing lag behind modern statistical approaches to experimentation by about half a century.

There are major issues with the common statistical approaches discussed in most A/B testing literature and applied daily by many practitioners. The three major ones are:

  1. Misuse of statistical significance tests
  2. Lack of consideration for statistical power
  3. Significant inefficiency of statistical methods

In this article I examine each of these three issues in some detail and propose a solution inspired by clinical randomized controlled trials, which I call the AGILE statistical approach to A/B testing.

1. Misuse of Statistical Significance Tests

Most A/B testing content that mentions statistical tests inevitably discusses statistical significance in some fashion. However, much of it fails to mention a major constraint of classical statistical significance tests such as Student’s t-test: you must fix the number of users you will observe in advance.

Before going deeper into the issue, let’s briefly discuss what a statistical significance test actually is. In most A/B tests it amounts to estimating the probability of observing a result equal to or more extreme than the one we observed purely due to the natural variance in the data, even if there is no true positive lift.

Below is an illustration of the natural variance, where 10,000 random samples are generated from a Bernoulli distribution with a true conversion rate at 0.50%.

Natural Variance
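A simulation like this is easy to reproduce. Below is a minimal, hypothetical C# sketch (the seed and the number of samples are my own choices, not the article’s) that draws repeated 10,000-user samples at a true 0.50% conversion rate and prints the observed rates, which scatter noticeably even though nothing changes between samples:

using System;

class NaturalVarianceDemo
{
    static void Main()
    {
        var rng = new Random(42);          // fixed seed for reproducibility
        const double trueRate = 0.005;     // true conversion rate of 0.50%
        const int usersPerSample = 10_000; // users per simulated sample
        const int samples = 20;            // number of simulated samples

        for (int s = 0; s < samples; s++)
        {
            int conversions = 0;
            for (int u = 0; u < usersPerSample; u++)
                if (rng.NextDouble() < trueRate) conversions++;

            // Observed rates scatter around 0.50% even though nothing changed.
            Console.WriteLine($"Sample {s + 1,2}: {100.0 * conversions / usersPerSample:F2}%");
        }
    }
}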

In an A/B test we randomly split users into two or more arms of the experiment, thus eliminating confounding variables, which allows us to establish a causal relationship between the observed effect and the changes we introduced in the tested variants. If after observing a number of users we register a conversion rate of 0.62% for the tested variant versus 0.50% for the control, it means we either observed a rare (5% probability) event, or there is in fact some positive difference (lift) between the variant and the control.

In general, the less likely we are to observe a particular result, the more likely it is that what we are observing is due to a genuine effect, but applying this logic requires knowledge external to the statistical design, so I won’t go into detail about it here.

The above statistical model comes with some assumptions, one of which is that you observe the data and act on it at a single point in time. For statistical significance to work as expected, you must adhere to a strict application of the method: declare in advance that you will test, say, 20,000 users per arm (40,000 in total), and then do a single evaluation of statistical significance. If you do it this way, there are no issues. Approaches like “wait till you have 100 conversions per arm” or “wait till you observe XX% confidence” are not statistically rigorous and will probably get you in trouble.

However, in practice, tests can take several weeks to complete, and multiple people look at the results weekly, if not daily. Naturally, when results look overly positive or overly negative, they want to take quick action. If the tested variant is doing poorly, there is pressure to stop the test early to prevent losses and to redirect resources to more promising variants. If the tested variant is doing great early on, there is pressure to suspend the test, call the winner and implement the change so the perceived lift can be converted to revenue sooner. I believe there is no A/B testing practitioner who will deny these realities.

These pressures lead to what is called data peeking or data-driven optional stopping. The classical significance test offers no error guarantees if it is misused in such a manner, resulting in illusory findings – both in the direction of the result (false positives) and in the magnitude of the achieved lift. The reason is that peeking adds a dimension to the test’s sample space: instead of estimating the probability of a false detection of a winner at a single point in time, the test would need to estimate the probability of at least one false detection across multiple points in time.

If the conversion rates were constant, that would not be an issue. But since they vary even without any intervention, the cumulative data varies as well, so adjustments to the classical test are required in order to calculate the error probability when multiple analyses are performed. Without those adjustments, the actual error rate will be inflated significantly compared to the nominal or reported error rate. To illustrate: peeking only 2 times results in an actual error more than twice the reported one; peeking 5 times results in an actual error 3.2 times larger than the nominal one; peeking 10 times results in an actual error probability 5 times larger than the nominal one. This has been known to statistical practitioners since as early as 1969 and has been verified time and again.
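You can verify the error inflation with a simple A/A simulation. The hypothetical C# sketch below (all parameters are my own choices) runs experiments in which both arms share an identical 0.5% conversion rate, peeking at a nominal 95% two-sided threshold after every 10,000 users per arm; any declared “winner” is by construction a false positive:

using System;

class PeekingDemo
{
    // Pooled two-proportion z statistic.
    static double ZStat(int c1, int n1, int c2, int n2)
    {
        double pooled = (double)(c1 + c2) / (n1 + n2);
        double se = Math.Sqrt(pooled * (1 - pooled) * (1.0 / n1 + 1.0 / n2));
        return se == 0 ? 0 : ((double)c2 / n2 - (double)c1 / n1) / se;
    }

    static void Main()
    {
        var rng = new Random(1);
        const double rate = 0.005;    // both arms identical: any "winner" is a false positive
        const int usersPerLook = 10_000;
        const int looks = 5;          // number of interim analyses ("peeks")
        const double zCrit = 1.96;    // nominal two-sided 5% threshold
        const int experiments = 1_000;

        int falsePositives = 0;
        for (int e = 0; e < experiments; e++)
        {
            int cA = 0, cB = 0, n = 0;
            bool rejected = false;
            for (int look = 1; look <= looks && !rejected; look++)
            {
                for (int u = 0; u < usersPerLook; u++)
                {
                    if (rng.NextDouble() < rate) cA++;
                    if (rng.NextDouble() < rate) cB++;
                }
                n += usersPerLook;
                if (Math.Abs(ZStat(cA, n, cB, n)) > zCrit) rejected = true;
            }
            if (rejected) falsePositives++;
        }

        // With 5 peeks the realized rate lands well above the nominal 5%.
        Console.WriteLine($"Actual false positive rate: {100.0 * falsePositives / experiments:F1}%");
    }
}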

If the sample size is not fixed in advance, or if multiple statistical significance tests are performed as the data accrues, then we have a case of GIGO: Garbage In, Garbage Out.

2. Lack of Consideration for Statistical Power

In a review of 7 influential books on A/B testing published between 2008 and 2014, we found only 1 book mentioning statistical power in a proper context, and even there the coverage was superficial. The remaining 6 books didn’t mention the notion at all. From my observations, the situation is similar for most articles and blog posts on the topic.

But what is statistical power and why is it important for A/B experiments? Statistical power is defined as the probability of detecting a true lift equal to or larger than a given minimum, at a specified statistical significance threshold. Hence the more powerful a test, the larger the probability that it will detect a true lift. I often use “test sensitivity” and “chance to detect effect” as synonyms, as I believe these terms are more accessible for non-statisticians while reflecting the true meaning of statistical power.
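To make the definition concrete, here is a minimal, hypothetical C# sketch using the standard normal-approximation formula for a one-sided two-proportion test (the formula choice and all parameter values are mine, not the article’s):

using System;

class PowerSketch
{
    // Standard normal CDF via the Abramowitz-Stegun erf approximation (error ~1.5e-7).
    static double Phi(double x)
    {
        double y = Math.Abs(x) / Math.Sqrt(2);
        double t = 1.0 / (1.0 + 0.3275911 * y);
        double poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                     - 0.284496736) * t + 0.254829592) * t;
        double erf = 1 - poly * Math.Exp(-y * y);
        return x >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
    }

    // Approximate one-sided power to detect a given relative lift with n users per arm.
    static double Power(double baseline, double relativeLift, int nPerArm, double zAlpha)
    {
        double p1 = baseline, p2 = baseline * (1 + relativeLift);
        double se = Math.Sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / nPerArm);
        return Phi((p2 - p1) / se - zAlpha);
    }

    static void Main()
    {
        // Chance of detecting a true 10% relative lift on a 2% baseline,
        // at one-sided 95% significance (z = 1.6449):
        Console.WriteLine(Power(0.02, 0.10, 44_000, 1.6449).ToString("P0")); // ~66%
        Console.WriteLine(Power(0.02, 0.10, 88_000, 1.6449).ToString("P0")); // ~90%
    }
}

Doubling the sample size from 44,000 to 88,000 users per arm lifts the chance of detecting that lift from roughly two-in-three to nine-in-ten.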

Running a test with inadequately low power means you won’t be giving your variant a real chance to prove itself, if it is in fact better. Running an under-powered test means you spend days, weeks and sometimes months planning and implementing a test, but then fail to get an adequate appraisal of its true potential, in effect wasting all the invested resources.

What’s worse, a false negative can be erroneously interpreted as a true negative, meaning you will think that a certain intervention doesn’t work while in fact it does, effectively barring further tests in a direction that would have yielded gains in conversion rate.

Power and Sample Size

Power and sample size are intimately tied: the larger the sample size, the more powerful (or sensitive) the test, in general. Let’s say you want to run a proper statistical significance test, acting on the results only once the test is completed. To determine the sample size, you need to specify four things: the historical baseline conversion rate (say 1%), the statistical significance threshold (say 95%), the power (say 90%), and the minimum effect size of interest.

Last time I checked, many of the free statistical calculators out there won’t even allow you to set the power and in fact silently operate at 50% power – a coin toss – which is abysmally low for most applications. If you use a proper sample size calculator for the first time, you will quickly discover that the required sample sizes are more prohibitive than you previously thought, and hence that you need to compromise either on the level of certainty, on the minimum effect size of interest, or on the power of the test. Good calculators are available online and in tools such as R packages and G*Power; a minimal calculation is also sketched below.
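For illustration, here is a minimal sample-size sketch based on the standard normal-approximation formula for two proportions. I assume a one-sided test (my assumption, which happens to reproduce the 88,000-users-per-arm figure used in the next section):

using System;

class SampleSizeSketch
{
    // Required users per arm for a one-sided two-proportion test;
    // zAlpha and zBeta are standard normal quantiles for significance and power.
    static double UsersPerArm(double baseline, double relativeLift,
                              double zAlpha, double zBeta)
    {
        double p1 = baseline;
        double p2 = baseline * (1 + relativeLift);
        double variance = p1 * (1 - p1) + p2 * (1 - p2);
        double delta = p2 - p1;
        return Math.Pow(zAlpha + zBeta, 2) * variance / (delta * delta);
    }

    static void Main()
    {
        const double z95 = 1.6449; // one-sided 95% significance
        const double z90 = 1.2816; // 90% power

        // 2% baseline, 10% relative minimum effect: about 88,000 users per arm.
        Console.WriteLine($"{UsersPerArm(0.02, 0.10, z95, z90):N0} users per arm");

        // If the true lift were 15%, about 40,000 per arm would have sufficed.
        Console.WriteLine($"{UsersPerArm(0.02, 0.15, z95, z90):N0} users per arm");
    }
}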

Making decisions about the 3 parameters you control – certainty, power and minimum effect size of interest – is not always easy. What makes it even harder is that you remain bound to that one look at the end of the test, so the choice of parameters is crucial to the inferences you will be able to make at the end. What if you chose too high a minimum effect, resulting in a quick test that was, however, unlikely to pick up on small improvements? Or too low an effect size, resulting in a test that dragged on for a long time, when the actual effect was much larger and could have been detected much sooner? The correct choice of those parameters becomes crucial to the efficiency of the test.

3. Inefficiency of Classical Statistical Tests in A/B Testing Scenarios


Classical tests work well in some areas of science, like physics and agriculture, but have been replaced by a newer generation of testing methods in areas like medical science and bio-statistics. The reason is two-fold. On one hand, since the hypotheses in those areas are generally less well defined, the parameters are not so easily set, and misconfigurations can easily lead to over- or under-powered experiments. On the other hand, ethical and financial incentives push for interim monitoring of data and for early stopping of trials when results are significantly better or worse than expected.

Sounds a lot like what we deal with in A/B testing, right? Imagine planning a test with a 95% confidence threshold and 90% power to detect a 10% relative lift from a baseline of 2%. That would require 88,000 users per test variant. If, however, the actual lift is 15%, you could have run the test with only 40,000 users per variant, or just 45% of the initially planned users. In that case, if you were monitoring the results, you’d want to stop early for efficacy. However, the classical statistical test is compromised if you do that.

On the other hand, if the true lift is in fact -10% – that is, whatever we did in the tested variant actually lowers the conversion rate – a person looking at the results would want to stop the test well before reaching the 88,000 users it was planned for, in order to cut the losses and perhaps start working on the next test iteration.

What if the test looked like it would convert at -20% initially, prompting the end of the test, but that was just a hiccup early on and the tested variant was actually going to deliver a 10% lift long-term?

The AGILE Statistical Method for A/B Testing


Questions and issues like these prompted me to seek better statistical practices and led me to the medical testing field where I identified a subset of approaches that seem very relevant for A/B testing. That combination of statistical practices is what I call the AGILE statistical approach to A/B testing.

I’ve written an extensive white-paper on it called “Efficient A/B Testing in Conversion Rate Optimization: The AGILE Statistical Method”. In it I outline current issues in conversion rate optimization, describe the statistical foundations for the AGILE method and describe the design and execution of a test under AGILE as an easy step-by-step process. Finally, the whole framework is validated through simulations.

The AGILE statistical method addresses misuses of statistical significance testing by providing a way to perform interim analyses of the data while keeping false positive errors controlled. It does so through the application of so-called error-spending functions, which allow a lot of flexibility to examine data and make decisions without having to wait for the pre-determined end of the test.
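To give a flavour of the technique, below is a sketch of one common choice, a Lan-DeMets O’Brien-Fleming-like spending function; this is my illustration of error spending in general, not necessarily the exact function used in the white paper. It reports how much of an overall two-sided 5% error budget may be “spent” by the time a given fraction of the planned sample has been observed:

using System;

class ErrorSpendingSketch
{
    // Standard normal CDF via the Abramowitz-Stegun erf approximation (error ~1.5e-7).
    static double Phi(double x)
    {
        double y = Math.Abs(x) / Math.Sqrt(2);
        double t = 1.0 / (1.0 + 0.3275911 * y);
        double poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                     - 0.284496736) * t + 0.254829592) * t;
        double erf = 1 - poly * Math.Exp(-y * y);
        return x >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
    }

    // Cumulative alpha spent at information fraction t (0 < t <= 1),
    // O'Brien-Fleming-like: alpha(t) = 2 * (1 - Phi(z_{alpha/2} / sqrt(t))).
    static double AlphaSpent(double t)
    {
        const double zHalf = 1.9600; // quantile for alpha/2 with overall alpha = 0.05
        return 2 * (1 - Phi(zHalf / Math.Sqrt(t)));
    }

    static void Main()
    {
        // Very little alpha is spent at early looks; most is saved for the final one.
        foreach (double t in new[] { 0.2, 0.4, 0.6, 0.8, 1.0 })
            Console.WriteLine($"t = {t:F1}: cumulative alpha spent = {AlphaSpent(t):F4}");
    }
}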

Statistical power is fundamental to the design of an AGILE A/B test, so there is no way around it: it must be taken into proper consideration.

AGILE also offers very significant efficiency gains, ranging from an average of 20% to 80%, depending on the magnitude of the true lift compared to the minimum effect of interest for which the test is planned. This speed improvement comes from the ability to perform interim analyses. It does come at a cost, since some tests might end up requiring more users than the maximum a classical fixed-sample test would require, but the simulation results described in my white paper show that such cases are rare. The significant added flexibility in performing analyses on accruing data and the average efficiency gains are well worth it.

Another significant improvement is the addition of a futility stopping rule, which allows one to fail fast while having a statistical guarantee for false negatives. A futility stopping rule means you can abandon tests that have little chance of being winners without waiting for the end of the study. It also means that claims about the lack of efficacy of a given treatment can be made to a level of certainty permitted by the test parameters.

Ultimately, I believe that with this approach the statistical methods can finally be aligned with the A/B testing practice and reality. Adopting it should contribute to a significant decrease in illusory results for those who were misusing statistical tests for one reason or another. The rest of you will appreciate the significant efficiency gains and the flexibility you can now enjoy without sacrifices in terms of error control.



Online Behavior – Marketing Measurement & Optimization


Marcofolio.net vNext

Last May, Marcofolio.net turned 10 years old. Although I’m not blogging as frequently as I did back when I started, I’m still dedicated to sharing my passion and inspiring my readers. That’s why I’m bringing you Marcofolio.net vNext, a complete overhaul and redesign of my blog. I decided to go for a minimal & clean theme to keep the focus on the most important thing: the content.

I decided to focus this blog on development, split into different categories like Xamarin, Cognitive Services and Web Development. You’ll find all the other categories at the top of this page.

The logo

The new logo is inspired by lines of code

I’m especially happy with the new logo, proudly showing off at the top of this site. The new logo is inspired by lines of code and was created with my brother Auke from Rocket Media. My previous logo looked a little bit like a fidget spinner, so I’m totally excited to share my vNext logo with you.

The old

The last redesign of Marcofolio.net dated from 2009 and could really use a 2017 update. My old blog ran on Joomla!, but moving forward I decided to switch to WordPress. I’ve moved over a couple of unique coding-related articles to clean up the content, but you’re still able to visit the old website: simply head over to old.marcofolio.net to take a glimpse at the past. If you had any bookmarks, everything should still work!

Your thoughts

Anything you want to see different? Stuff that’s not working? Let me know what you think in the comments or on Twitter! I’m now even more motivated to deliver high quality development articles, so expect them soon. Feel free to subscribe to the feed to make sure you won’t miss out!

The post Marcofolio.net vNext appeared first on Marcofolio.net.

Marcofolio.net


How to Repurpose Blog Posts Into Instagram Albums

Are you looking for Instagram content ideas? Have you considered repurposing your blog content into Instagram albums? Grouping multiple images from a blog post into an Instagram album can bring engaging content to Instagram. In this article, you’ll discover how to combine blog posts into Instagram albums. Why Use Instagram Albums to Repurpose Blog Content? […]

This post How to Repurpose Blog Posts Into Instagram Albums first appeared on .
– Your Guide to the Social Media Jungle


Layering in Additional Insights

In my previous post, we discussed the importance of creating a data strategy and modeling your data. Interestingly enough, in recent research completed with Econsultancy, 46% of ANZ respondents cited integrating data as their key challenge with marketing automation. Once you have your base data model, you may also decide to layer in additional insights based on analysis done either by an in-house team or an agency. This could include data like customized personas based on buying type, or which types of customers are detractors.

Following on from the last post, we will continue with our automotive example: The main data object is the buyer and the secondary data object is accessories or factory options. Beyond the base layer, which is the data that you can collect directly from the buyer on a form or from a third-party provider such as Dun and Bradstreet, we can also start tracking their digital body language or external web analytics data to provide insight into how we can better personalize their experience. Digital body language can be tracked through marketing automation platforms that allow us to monitor and view how a client or prospect is engaging across digital channels.

For example, for buyers who purchased in the last quarter, were there any common engagement actions a buyer completed that would signal an intent to buy?

Did they visit the site more frequently?

Did they complete a Sales enquiry form?

Did they schedule an appointment with a Sales Rep?

Can we look for similar activity within our prospect database and provide a more targeted communications strategy to drive conversions based on these additional insights?

Another layer of insight to utilise is mobile usage. Do you know if your customers use mobile versus desktop? Is there app data you can leverage to decide on the frequency or method of communication? If you notice a particular group of buyers searching using their mobile device, perhaps they can choose to receive push notifications versus email. Can you layer any push notifications via SMS or an app especially for time sensitive communications?

From a web analytics standpoint, you can take a similar approach to mobile: analyze patterns in web browsing behavior and see if there are any trends that predominate in a particular buyer group. For example, you may find that buyers within a certain age group go online during a particular time of day and use certain search terms on the site.

In our next post, we’ll discuss how to represent these objects in your marketing automation platform.

As a B2B Marketer, your days are spent trying to reach your customers with the right message at the right time. For ideas on using Account Based Marketing to make this easier, check out this free download.

Account Based Marketing


Oracle Blogs | Oracle Marketing Cloud


Political illustrator Ellie Foreman-Peck on her unfortunately abundant Trump back catalogue


“I didn’t believe I would still be drawing Trump’s face after the elections. Now, I’ve drawn him too many times to count,” says illustrator Ellie Foreman-Peck. Her knack for capturing expression and character has seen her visually satirise most of our political leaders for The Guardian, The Economist and Standpoint, but there’s one face that she, among many, wishes she didn’t have to examine so much.

Read more


It’s Nice That


Build your first bot with Bot Framework and LUIS

Chatbots (or bots) are hot. Retrieving data through natural language is getting more and more common. Many people already have apps like Slack or Messenger installed, and letting them use your application through these channels can provide you with more opportunities. In this article I’ll show you how to build your first chat bot with the Bot Framework and LUIS. Although LUIS is not required for the Bot Framework itself, we dive directly into the combination of both; check out my previous article on how to set up LUIS. Building a bot with the Bot Framework is extremely easy, yet very powerful. Let’s see how this can be done!

Build a great conversationalist.

Requirements

Before we continue, make sure you have the following installed:

  • Visual Studio (we’ll create the bot project with it)
  • The Bot Framework Emulator (we’ll use it later to talk to the bot)

The Project

Download the Bot Framework Visual Studio Template and install it as a project template. Once it’s installed correctly, create a new project in Visual Studio, select Bot Application and give it a name. Create your solution and you’re ready to build your first bot.


Let’s inspect the most important things the template created for us:

  • WebApiConfig: Since the Bot Framework runs on ASP.NET WebAPI, this class is needed to handle settings, config and routes.
  • MessagesController: This controller handles all incoming messages that will be consumed by the bot. Especially take a look at the following pieces of code:
    • HandleSystemMessage(Activity message): System messages can be handled as well, for example you can get notified when the user is typing a message.
    • new Dialogs.RootDialog(): When a user message is received, the RootDialog is created.
  • RootDialog: The starting point of our bot. Inspect the MessageReceivedAsync-method, where you’ll see that the bot will respond to any message by telling you how many characters you’ve sent; a sketch of this default handler follows below.
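For reference, the default handler generated by the v3 Bot Application template looks roughly like this (reproduced from memory as a sketch, so your generated file may differ slightly):

using System;
using System.Threading.Tasks;
using Microsoft.Bot.Builder.Dialogs;
using Microsoft.Bot.Connector;

[Serializable]
public class RootDialog : IDialog<object>
{
    public Task StartAsync(IDialogContext context)
    {
        // Wait for the first message from the user.
        context.Wait(MessageReceivedAsync);
        return Task.CompletedTask;
    }

    private async Task MessageReceivedAsync(IDialogContext context, IAwaitable<object> result)
    {
        var activity = await result as Activity;

        // Calculate something for us to return: the length of the user's message.
        int length = (activity.Text ?? string.Empty).Length;

        // Reply, then wait for the next message.
        await context.PostAsync($"You sent {activity.Text} which was {length} characters");
        context.Wait(MessageReceivedAsync);
    }
}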

Without doing anything further, we’re able to talk to this bot. Simply build and run the application, which will start your browser with a general message from the default.htm-file.

The Emulator

To be able to talk to this bot, we’ll first be using the emulator. Simply start this application and enter your endpoint URL. Since the application is running locally, enter the URL from the browser window that just opened and append /api/messages (this is the default route). Hit Enter and the Emulator will start connecting with the bot. Simply type a message and start chatting with your first bot!

Inspect the following parts of the emulator:

  • The URL on top, where you can see which bot you’re connected to.
  • The messages window on the left side to show you the conversation.
  • If you click on a message, you’ll get more (technical) details in the Details-pane.
  • The Log-pane displays the current activity and communication.

Adding LUIS

Now that we’ve got our first bot up and running, it’s time to make it a bit smarter by adding LUIS to the mix. We’ll make the RootDialog able to handle the Intents you’ve defined in LUIS. Change the class to look like the following:

[LuisModel(Constants.LUIS_MODEL_ID, Constants.LUIS_SUBSCRIPTION_KEY)]
[Serializable]
public class RootDialog : LuisDialog<object>
{
    #region None

    [LuisIntent("")]
    [LuisIntent("None")]
    public async Task None(IDialogContext context, LuisResult result)
    {
        await context.PostAsync("Sorry, I'm not able to handle that.");
    }

    #endregion

    #region OrderFood

    [LuisIntent("OrderFood")]
    public async Task OrderFood(IDialogContext context, LuisResult result)
    {
        await context.PostAsync("I understand that you want to order some food.");
    }

    #endregion
}

Let’s walk over this code step by step:

  • A LuisModel attribute has been added, requiring a LUIS_MODEL_ID and LUIS_SUBSCRIPTION_KEY from LUIS.
    • LUIS_MODEL_ID can be found on the Overview page for your app (it’s also called the App Id).
    • LUIS_SUBSCRIPTION_KEY can be found under My Keys.
  • The dialog now inherits from LuisDialog.
  • The LuisIntent attributes are used to handle the Intents that LUIS recognized. They should have the same name as configured in LUIS.

That’s all there is to it! Simply build and run, fire up your emulator again and start talking to your bot to see if LUIS is able to recognize the intents.

Entities

But LUIS isn’t only able to recognize Intents; it can also recognize Entities, which makes everything a whole lot more powerful. In LUIS we’ve defined food as an Entity, so let’s try to extract it in the OrderFood-method.

 [LuisIntent("OrderFood")] public async Task OrderFood(IDialogContext context, LuisResult result) {     // Check if LUIS has identified the entity that we should look for.     string food = null;     EntityRecommendation rec;     if (result.TryFindEntity("food", out rec)) food = rec.Entity;      if(string.IsNullOrEmpty(food))     {         await context.PostAsync("I understand that you want to order some food.");     }     else     {         await context.PostAsync($  "I understand that you want to order {food}.");     } } 

As you can see, we’re using TryFindEntity to search for the LUIS Entity we’re looking for. When we have that value, we display it to the user. It’s that easy! Now build and run again and see your first chat bot with LUIS come to life in your emulator.

Conclusion

With just a few lines of code we’re able to build our first smart bot using the Bot Framework and LUIS. As you can imagine, we’ll need more to actually make this a useful bot, but this is a pretty good start. Simply start adding more Intents to LUIS and let the Bot Framework handle them. Stay tuned for more articles about smart bots and the Bot Framework. Let me know what you think in the comments or on Twitter.

Want to learn more about this subject?
Join my “Weaving Cognitive and Azure Services” presentation at TechDaysNL 2017!

The post Build your first bot with Bot Framework and LUIS appeared first on Marcofolio.net.

Marcofolio.net


James Leman’s glorious palette – Part 2

This series of blog entries describes the scientific analysis of pigments and dyes used on the Leman Album designs.


In March 2017 the population of the V&A Science Section, normally amounting to a meagre 5 bodies, ballooned to a whopping 11 (Figure 1).

Figure 1: V&A Science Section population count – updated in March 2017.

Figure 1: V&A Science Section population count – updated in March 2017.

This was due to a very welcome invasion of visiting MOLAB scientists from Perugia, Italy, who spent 5 days at the V&A analysing the Leman album with us.

Figure 2: MOLAB 1A visiting scientists in front of a textile made to Leman’s design (T.156-2016). Photography by Eileen Budd © Victoria and Albert Museum.

MOLAB stands for MObile LABoratory; it is made up of a number of European laboratories and research centres that provide their portable scientific equipment and expertise to cultural heritage institutions. Museums and galleries in Europe bid for MOLAB time, and if they are successful they receive a visit by MOLAB, fully funded by the EU.

Figure 3: MOLAB van used to deliver portable scientific equipment to cultural heritage institutions. Photography by Costanza Miliani, MOLAB.

This is exactly what happened to us: we put in a bid in September 2016, competing with many heritage institutions from other countries. We were successful, and were awarded three successive MOLAB visits, each by a different European research group.

In March 2017 MOLAB 1A researchers from Perugia came to the V&A with five different sets of state-of-the-art scientific equipment to help us analyse the Leman Album using non-invasive methods to which we do not have access in-house (yet!). The analysis work was coordinated by Erasmus intern Rosarosa Manca, who spent a few months at the V&A helping with the Leman analyses.

Figure 4: Scientist Chiara Grazia analysing two of the Leman designs with her MOLAB equipment. Photography by Lucia Burgio © Victoria and Albert Museum.

Some of the MOLAB scientific methods are particularly well suited to the analysis of natural colourants derived from plants and insects.

Other MOLAB methods allowed us to map the pigments used on the designs. This was useful where we suspected degradation processes, for example where white pigments had darkened to red, brown or even black. Figure 5 shows two Leman designs where lead white has darkened, probably due to environmental pollution.

Figure 5: Discoloured white pigment on Leman designs on the left, and corresponding lead map on the right. Image prepared by Rosarosa Manca, Erasmus intern.

A technique called FTIR (which stands for Fourier-transform infra-red spectroscopy) helped us to confirm and support the identification of a number of pigments and dyes which we had already recognised with other methods.

Next time I will tell you more about the visit by the other two MOLAB groups, who came to work on the Leman album at the end of March and in June 2017.


Read other Leman blogs:

Blog

