Testing Relationships

My stats guru colleague Dr Andrew Pratley and I are on the move to tackle Quantifornication, the plucking of numbers out of thin air. Here is the sixth in a series we are co-writing.

Relationships are the branch of statistics that describes how one thing relates to, and may influence, another. You may know them through regression analysis, x-y plots or scatter plots. The classic regression plot includes a line of best fit. We tend to think that the better the line fits the data, the stronger the statistical relationship. That’s true, but it’s not the entire story.

Just like the ones in our lives, (statistical) relationships are complex and often hard to interpret. We tend to fall back on rules of thumb rather than really understanding what we’re trying to do. The point of looking for relationships in statistics is the hope that we have found something relatively cheap and easy to measure that is a good predictor of something we’re interested in that is complex, difficult and time-consuming to measure. We need both of these to be true (cheap to measure and a good predictor) for a statistical relationship to be worth pursuing.

Take early childhood education, for example. Until the data was collected and analysed, many parents would have suspected that investing in their child’s early education was for the best but could not be sure, because life outcomes are complex. We now know that the time, effort and money put into a child before they start school has a profound impact on their education level and achievement (OECD). Yet ask parents about the money they’ll spend on their child’s education and some may respond that they’re saving for a private high school or possibly university. One of the challenges with relationships is appreciating the time over which they play out. It’s arguably one of the primary reasons for compulsory superannuation: we simply can’t look far enough ahead to understand the benefits of what we’re doing today.

In case you’re wondering why early childhood learning is so valuable, it’s a combination of the rate of brain development (which slows down over time) and the fact that learning early makes a child a better learner. In an optimal world, we’d spend every dollar and minute we have developing children before they enter kindergarten. This investment compounds in the same way that early contributions to superannuation do.
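The superannuation comparison rests on compounding arithmetic. The sketch below assumes a single contribution compounding once a year at a fixed rate; the 7% return and the 40-year versus 20-year horizons are hypothetical figures for illustration, not a forecast.

```python
def future_value(contribution, annual_rate, years):
    """Value of a single contribution after compounding once a year at a fixed rate."""
    return contribution * (1 + annual_rate) ** years


# Hypothetical: the same $1,000 at a hypothetical 7% annual return,
# contributed 40 years versus 20 years before it is needed.
early = future_value(1000, 0.07, 40)
late = future_value(1000, 0.07, 20)
print(f"early: ${early:,.0f}  late: ${late:,.0f}")
```

Doubling the time horizon more than doubles the outcome, because each year’s growth is itself grown in later years.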

In risk management, there are a number of statistical relationships that might be of interest, such as the relationship between control measures and likelihood. What we’d like to see is that implementing a control measure reduces the likelihood of something happening. For example, we should be interested in the relationship between cybersecurity training and the likelihood of a breach. However, we rarely measure the value of training objectively. If we did, what we’d want to see is two outcomes for each session staff attend.

First, we’d hope to see staff become more cautious about clicking on links; second, we’d hope to see this caution sustained beyond a short period. A strong relationship between the training and the intended outcome (a lower likelihood of a successful phishing attack) would let us be clear about the optimal investment in training. That optimal investment could be determined by asking: “How much training do I need to reduce the likelihood of a cyber breach from phishing attacks to x per annum per thousand employees?”

If we found that the amount of training had no relationship with a reduced likelihood, then it’s possible that we have an ineffective control measure.

Stay safe and adapt – with better measurement!