The Wilson Score Interval

Background & Explanation

We are planning a field trial for our new Flashload Over the Air (FLOTA) feature, which, as you may already be aware, is both highly sensitive from a business standpoint and not without technical risk. So we want to be able to state our level of ‘confidence’ in the feature quantitatively and not just qualitatively.

We assume that the vehicles in the field trial will enter a customer’s yard periodically (e.g. nightly), where we will attempt to flashload them. We have defined the conditions under which an attempt ‘counts’, i.e. the vehicle is in Wi-Fi coverage, the unit can be turned on, etc. So the question boils down to:

How many successes do we require in order to have (e.g.) 80% confidence that the probability of a successful flashload is at least (e.g.) 90%?

Statistics

Some scrolling through Wikipedia eventually lands me on the Wilson Score Interval. If you have some background in statistics and probability, you may be familiar with the Central Limit Theorem, which states (loosely) that if you add up enough independent random quantities, the total behaves as if it came from the Normal Distribution. The normal distribution is closely associated with Gauss and is well understood. Or, as the Wikipedia page on the Central Limit Theorem quotes much more eloquently:

Sir Francis Galton described the Central Limit Theorem as:

I know of scarcely anything
so apt to impress the imagination
as the wonderful form of cosmic order
expressed by the Law of Frequency of Error.

The law would have been
personified by the Greeks
and deified,
if they had known of it.

It reigns with serenity
and in complete self-effacement,
amidst the wildest confusion.

The huger the mob,
and the greater the apparent anarchy,
the more perfect is its sway.

It is the supreme law of Unreason.

Whenever a large sample
of chaotic elements
are taken in hand
and marshaled
in the order
of their magnitude,
an unsuspected
and most beautiful
form of regularity
proves to have been
latent all along.

But the key question here is: how much data is enough to be able to assume the Normal Distribution? The same Wikipedia article gives a rule of thumb which states that in our case, where we wish to prove a 90% probability, we would need at least 50 trials. Can we do better?
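To see why the plain normal approximation is uncomfortable with small samples, here is a minimal sketch in Python. It assumes the simple normal-approximation (often called Wald) interval, p ± z·sqrt(p(1−p)/n); the function name and the choice of z = 1.28 (the 80% two-sided value) are mine, not taken from the original calculations.

```python
import math


def wald_interval(trials, failures, z=1.28):
    """Normal-approximation (Wald) interval for the success probability.

    Illustrative helper only; z = 1.28 is the 80% two-sided critical value.
    """
    p = (trials - failures) / trials
    half_width = z * math.sqrt(p * (1 - p) / trials)
    return p - half_width, p + half_width


# With 15 successes out of 15 the estimated variance is zero,
# so the interval collapses to a single point.
print(wald_interval(15, 0))  # (1.0, 1.0)
```

With no failures observed, this interval claims a success probability of exactly 100%, which is clearly too optimistic after only 15 trials.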

Here is the equation for the Wilson Score Interval from the Wikipedia article: Wilson Score Interval
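For reference, the interval in its commonly cited form can be written as below. Here p is the observed fraction of successes, n is the number of trials, and z is the normal critical value for the chosen confidence level; the notation is chosen to match the rest of this page and may differ slightly from the article's.

```latex
\[
\frac{1}{1 + \frac{z^2}{n}}
\left[\, p + \frac{z^2}{2n}
\;\pm\; z \sqrt{\frac{p(1-p)}{n} + \frac{z^2}{4n^2}} \,\right],
\qquad z = z_{1-\alpha/2}
\]
```

Taking the minus sign gives the lower end of the interval, which is the quantity of interest here.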

Here we are accepting an 80% confidence interval, so α = 0.2 and z_{1-α/2} = z_{0.90} = 1.28. This number is obtained by looking up the 0.90 quantile in a standard normal table, if you remember that from any statistics classes you may have had.
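If no normal table is at hand, the same number can be recovered programmatically. The following is a small sketch using Python's standard library; the function name is just an illustrative choice.

```python
from statistics import NormalDist


def z_for_confidence(confidence):
    """Two-sided critical value z_{1-alpha/2} for a given confidence level."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


print(round(z_for_confidence(0.80), 4))  # 1.2816
```

The table value 1.28 used throughout this page is this number rounded to two decimals.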

If we have m failures in n trials, then p = (n-m)/n. We want the lower end of the interval given above to be at least 0.9 (90% success).

Results of the Calculation

Plugging the numbers into the formula and calculating, I find that at 80% confidence for 90% success with m failures, the number of trials we need is as follows:

failures    trials
0           15
1           32
2           47
3           60

Alternatively, should we wish to prove a 99% success rate in flashload (again with 80% confidence), the number of trials that we would require for varying numbers of failures would be as follows.

failures    trials
0           163
1           333
2           479
3           617
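For readers who would like to check these tables themselves, here is a rough Python sketch of the search. It is not the original Haskell code; the function names are hypothetical, and it assumes z = 1.28 together with a simple linear search for the smallest n whose Wilson lower bound reaches the target.

```python
import math

Z_80 = 1.28  # table value for 80% two-sided confidence, as used in the text


def wilson_lower_bound(trials, failures, z=Z_80):
    """Lower end of the Wilson score interval for the success probability."""
    p = (trials - failures) / trials
    z2 = z * z
    centre = p + z2 / (2 * trials)
    spread = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials ** 2))
    return (centre - spread) / (1 + z2 / trials)


def min_trials(failures, target, z=Z_80):
    """Smallest number of trials whose lower bound is at least `target`."""
    n = max(failures, 1)
    while wilson_lower_bound(n, failures, z) < target:
        n += 1
    return n


# Compare against the tables above (targets of 90% and 99%).
for m in range(4):
    print(m, min_trials(m, 0.90), min_trials(m, 0.99))
```

A linear search is wasteful in general but perfectly adequate here, since the answers are at most a few hundred trials.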

Does this seem extreme? It makes sense to me under the following logic: it is approximately 10 times harder to prove 99% success than it is to prove 90% success, because you are proving that you have 10 times fewer failures.
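The zero-failure rows make this factor easy to check by hand. With no failures, p = 1 and the lower end of the Wilson interval simplifies considerably. In the sketch below, p_0 is my own symbol for the target success rate, and z = 1.28 so that z² = 1.6384.

```latex
\[
\frac{1 + \frac{z^2}{2n} - z\sqrt{\frac{z^2}{4n^2}}}{1 + \frac{z^2}{n}}
= \frac{1}{1 + \frac{z^2}{n}}
= \frac{n}{n + z^2} \;\ge\; p_0
\quad\Longleftrightarrow\quad
n \;\ge\; z^2 \, \frac{p_0}{1 - p_0}
\]
\[
p_0 = 0.90:\; n \ge 9 z^2 \approx 14.7 \;\Rightarrow\; n = 15
\qquad
p_0 = 0.99:\; n \ge 99 z^2 \approx 162.2 \;\Rightarrow\; n = 163
\]
```

So the ratio between the two zero-failure requirements is 99/9 = 11, in line with the "approximately 10 times harder" intuition.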

So to prove our 90% success rate (with 80% confidence) we would need to have 15 successes in a row. We can allow ourselves a single failure if we have 31 other successes.

Method of Derivation of the Results

Tools

The above numbers were created by solving the equations using some programming in Haskell, which is a programming language well suited to rigorous scientific and technical computations such as this. Haskell has many other virtues, but this is not the correct venue in which to discuss them. Another choice would have been R, which is designed specifically for probability and statistics. Both of these languages can be run from within Emacs, which is a well-known and powerful text editor. (Emacs is practically a religion amongst some, but again this is not the venue to discuss its myriad capabilities.) This HTML file that you are currently reading was also written from within Emacs.

Here is the source for the Haskell calculations: wilson.lhs. Haskell supports ‘literate Haskell’, where the code is written within a framework that allows comments in LaTeX, which is a way to create PDF and other documentation with mathematical equations and other precise formatting. (This is only scratching the surface of the abilities of LaTeX.) The idea is that writing about the code is primary; the code itself just kind of falls out as a consequence of its description. The PDF so created is here: wilson.pdf

The Challenge

So far, all well and good. The calculations have been made and we know what our target is. But then the question comes up: "Can we report, not just whether or not we have reached our goal, but what our progress is along our way to the goal?" And this should be stated not just in terms like "we want 15 trials and we currently have 10, so we are 2/3 of the way through", but in the following terms: we currently have 22 trials and 1 failure, so our current confidence level (that we have a 90% success rate) is only XX%; since our goal is (in this case) 80%, we have to conduct Q more trials before we reach the desired target.

Now suddenly this is not something that can be addressed with a one-time (static) calculation, but is rather something that requires one to be able to dynamically calculate the confidence level. We want someone to go in each day, punch in the current numbers relating to the field trial, and get the current confidence level to report out to the team.

Enter JavaScript

Enter JavaScript. And yes, part of the reason I did this was not because it was so vital to get this dynamic calculation out to the stakeholders, but just because I wanted an excuse to play with HTML, JS & CSS :-)

I therefore created the above HTML form, which allows one to enter the numbers and then reports the current level of confidence for the success rate that is being targeted (90%). I included both the Wilson Score Interval and the Normal Distribution because the calculation based on the Normal Distribution is much easier. I also included the converse calculation for the question: what is the current probability of success that we can demonstrate at the fixed 80% level of confidence, given the current number of trials and the current number of failures? Again, this is included because it is a much easier calculation.

Opportunities for Further Research

Eventually I had to give up on solving the Wilson Score Interval equation (above) explicitly for z and settled on just using a table to get the confidence level to within 1%, which is more than adequate for this particular purpose. I still feel like it is a bit of a failing on my part not to have gotten it and hope to return to the question.
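One possible route to an explicit solution, offered here as a sketch rather than as what the form actually does: the ends of the Wilson interval are the roots of (p − p0)² = z²·p0(1 − p0)/n, so setting the lower end equal to the target p0 gives z = (p − p0)·sqrt(n / (p0(1 − p0))) whenever p > p0. The two-sided confidence level is then erf(z/√2). The Python below uses hypothetical function names and assumes that all future trials succeed when counting how many more are needed.

```python
import math


def confidence_level(trials, failures, target):
    """Two-sided confidence at which the Wilson lower bound equals `target`."""
    p = (trials - failures) / trials
    if p <= target:
        return 0.0  # observed rate does not exceed the target
    z = (p - target) * math.sqrt(trials / (target * (1 - target)))
    return math.erf(z / math.sqrt(2))


def trials_still_needed(trials, failures, target, goal):
    """Extra trials needed to reach `goal`, assuming no further failures."""
    extra = 0
    while confidence_level(trials + extra, failures, target) < goal:
        extra += 1
    return extra


# The example from the text: 22 trials, 1 failure, 90% target, 80% goal.
print(confidence_level(22, 1, 0.90))
print(trials_still_needed(22, 1, 0.90, 0.80))
```

This uses the exact 80% threshold rather than the rounded z = 1.28, so borderline cases could in principle differ by one trial from a table built with the rounded value.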
