
Score Formula

This page will be polished some time in the future; for now, the formula described here is in production use as of May 2024.

General Goals & Ideas

There are two goals OyenCov needs to achieve with the score formula:

  1. Preserve the legacy test coverage goal: ensure every line of the codebase in use is tested at least once.
  2. Incorporate usage weights into scoring, to help engineers prioritize their efforts on the frequently used, and thus commercially important, parts of the codebase.

In the current formula, Methods are the first-class citizens whose usage we measure and weight. These include controller actions and background jobs' perform methods.

We currently don't look at conditional branches, but branch coverage is on the table and we will revisit it in future iterations.

Usage-weighting

The most used Methods in any given codebase running in production receive orders of magnitude more hits than most other Methods.

[Figure: usage weighting curve before normalization]

We turn raw usage into a tamer, more usable weighting by sorting the Methods by hit count and assigning them weights from 1.0 to 10.0. This ensures the most used Method is at most 10x as valuable as the least used one.

[Figure: usage weighting curve after normalization]

$$
\text{normalized\_weight} = \left\lceil 1 + \frac{\log\left(1 - \frac{\text{position}}{\text{total\_Methods\_count}}\right)}{\log\left(1 - \frac{1}{3}\right)} \right\rceil
$$
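To make the mechanics concrete, here is a minimal Ruby sketch of this weighting step. The helper name `normalized_weights` and its input are hypothetical, not OyenCov's actual internals, and we clamp the result to the stated 1.0 to 10.0 range, since the raw formula is unbounded for the very top positions.

```ruby
# Hypothetical sketch, not OyenCov's actual internals: prod_hits maps
# each Method identifier to its raw production hit count.
def normalized_weights(prod_hits)
  total = prod_hits.size.to_f
  base  = Math.log(1 - 1.0 / 3)

  # Sort ascending, so position 0 is the least used Method.
  sorted = prod_hits.sort_by { |_method, hits| hits }

  sorted.each_with_index.to_h do |(method, _hits), position|
    raw = 1 + Math.log(1 - position / total) / base
    # Assumption: clamp to the stated 1.0 to 10.0 range, since the raw
    # formula grows without bound for the most used positions.
    [method, raw.ceil.clamp(1, 10)]
  end
end

normalized_weights(
  "Admin::ToolsController#index" => 3,
  "ReportsJob#perform"           => 850,
  "UsersController#show"         => 120_000
)
# => {"Admin::ToolsController#index"=>1, "ReportsJob#perform"=>2, "UsersController#show"=>4}
```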

Test/prod deviation for each Method

  1. We calculate the mean and standard deviation of log(raw_test_hits / raw_prod_hits) across the Methods that are both actively in use and tested.
  2. For Methods whose log ratio falls below that mean, we calculate the z-score: how many standard deviations below the mean it sits.
  3. For Methods that are used in production but not tested at all, we set the z-score to -10.
$$
\text{deviation}_i = \begin{cases} \dfrac{\log\left(\frac{T_i}{P_i}\right) - \mu}{\sigma} & \text{if } T_i > 0 \text{ and } \log\left(\frac{T_i}{P_i}\right) < \mu \\[1.5ex] -10 & \text{if } T_i = 0 \text{ and } P_i > 0 \end{cases}
$$

where $T_i$ and $P_i$ are Method $i$'s raw test and production hits, and $\mu$ and $\sigma$ are the mean and standard deviation from step 1.
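As a rough sketch of the three steps above, assuming hypothetical test_hits and prod_hits hashes of raw hit counts per Method (again, illustrative names rather than OyenCov's real API):

```ruby
# Hypothetical sketch of the deviation steps; test_hits and prod_hits
# map Method identifiers to raw hit counts.
def deviations(test_hits, prod_hits)
  # Step 1: log ratios for Methods that are both in use and tested.
  ratios = prod_hits.filter_map do |method, p|
    t = test_hits.fetch(method, 0)
    [method, Math.log(t.to_f / p)] if p > 0 && t > 0
  end.to_h

  mu    = ratios.values.sum / ratios.size
  sigma = Math.sqrt(ratios.values.sum { |r| (r - mu)**2 } / ratios.size)

  prod_hits.filter_map do |method, p|
    next if p.zero? # not in production use, so no deviation

    t = test_hits.fetch(method, 0)
    if t.zero?
      [method, -10.0] # step 3: used but never tested
    elsif ratios[method] < mu
      [method, (ratios[method] - mu) / sigma] # step 2: z-score below mean
    end
    # Methods at or above the mean yield nil and are filtered out.
  end.to_h
end
```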

General Score = sum(Usage-weight * Deviation)

Going back to our general goals: we want your engineering and testing efforts to closely reflect how the codebase is used in real life, while not neglecting any part that is in use, however minor.

$$
\text{general\_score} = 10^2 \left( 10 + \frac{\sum_{i=1}^{n} w_i d_i}{\sum_{i=1}^{n} w_i} \right)
$$

where $w_i$ is the normalized weight and $d_i$ the deviation of Method $i$ (zero when no deviation applies).
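A minimal sketch of this final aggregation, reusing the hypothetical outputs of the two sketches above; Methods without a deviation entry contribute zero:

```ruby
# Hypothetical sketch reusing the outputs of the two sketches above.
def general_score(weights, devs)
  weighted_sum = weights.sum { |method, w| w * devs.fetch(method, 0.0) }
  100 * (10 + weighted_sum / weights.values.sum.to_f)
end
```

For instance, if the weighted average deviation comes out to -0.5, the score is 100 * (10 - 0.5) = 950, the target suggested below.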

The general_score will fall between 0 and 1000. Ideally, you should aim for 950; that should give your team sufficient confidence that most use cases of your application are covered by the test suite.