OPINION 20 January 2021

A kind of magic

The line from data integration to marketing brilliance is drawn with data fusion, says Ryan Howard.

Sustainable competitive advantage is found when data sources “talk to each other”, allowing us to “connect the dots” and “fill in the gaps”. Of all the throwaway marketing-speak we must endure, phrases like these rile me up the most, wilfully ignorant of the mechanics, peppered liberally into every data strategy webinar.

In truth, “bringing data together” lies somewhere between very difficult and impossible. Done well, however, it is nothing short of magic.

No more so than right now, when competitors share very similar data sources and platforms, and when digital data, while deep, is limited in scope. The good stuff is siloed away, walled off, shielded by GDPR opt-ins, too often leaving data integration at the mercies of PowerPoint. Here, the analyst must leap from one chart to the next, puzzle-piecing sources together under an overall narrative.

At best, it might mean plotting different sources on the same timeline to spot trends. For the lucky few, it is the hope that enough customers from their database will see their way through a survey, with no option but to quietly ignore the lapsing or cash-rich time-poor who don’t.

Falling so despairingly short of where we need to be, we leaf through to the very last page of our book of spells, where we find the greatest and most beloved trick of them all...

Fusion works by matching an individual from one database to a similar individual in another, using common information, then treats them as if they were the same person. The more relevant the information and the more matching alternatives available, the better and stronger each match will be.

Simply put, one dataset gets stuck onto the next. The resulting merged dataset performs surprisingly well at an overall level. There is also no need to touch personally identifiable information. Success is all but ensured by having proper options for everyone to match with, as is failure, if the relationships are even slightly tenuous.
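For the curious, the matching step described above can be sketched in a few lines of Python. This is a minimal illustration of an unconstrained nearest-neighbour match, not a production method and not any particular practitioner's recipe. The function name `fuse`, the field names and the toy records are all hypothetical, and the common variables are assumed to have been rescaled to a 0 to 1 range beforehand so that no single variable dominates the distance.

```python
import math


def fuse(recipients, donors, common_keys, donated_keys):
    """Attach to each recipient the donated fields of its nearest donor.

    Distance is plain Euclidean distance over the common variables,
    which are assumed (for this sketch) to be pre-scaled to 0..1.
    Donors may be reused, i.e. the match is unconstrained.
    """
    fused = []
    for recipient in recipients:
        r_point = [recipient[k] for k in common_keys]

        def distance(donor):
            return math.dist(r_point, [donor[k] for k in common_keys])

        best = min(donors, key=distance)
        row = dict(recipient)  # copy, so the original record is untouched
        for k in donated_keys:
            row[k] = best[k]
        row["match_distance"] = distance(best)  # kept for diagnostics
        fused.append(row)
    return fused


# Hypothetical toy data: a CRM extract (recipients) and a survey (donors).
crm = [
    {"id": "c1", "age": 0.2, "spend": 0.1},
    {"id": "c2", "age": 0.8, "spend": 0.9},
]
survey = [
    {"age": 0.25, "spend": 0.15, "brand_affinity": "low"},
    {"age": 0.75, "spend": 0.85, "brand_affinity": "high"},
    {"age": 0.50, "spend": 0.50, "brand_affinity": "medium"},
]

fused = fuse(crm, survey, ["age", "spend"], ["brand_affinity"])
for row in fused:
    print(row["id"], row["brand_affinity"], round(row["match_distance"], 3))
```

Note that no names, emails or other identifiers are involved: the match runs purely on the shared variables. Keeping `match_distance` on each fused row is a design choice for this sketch, since large distances are an early sign that the common information is too tenuous to support the match.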

Fusion, as we know it today, was nursed into commercial research in the late 80s by Roland Soong and Steve Wilcox. For a while at least, it was misconstrued, haphazard, requiring the tight guard of a likeminded cabal of stats wizards before it would find a foothold. Each would weave in their own flair. With only a few journal appearances around which it could rally, it remained a niche field, spreading as would a generational hand-me-down. It found the maestro Martin van Staveren, and later, me.

Martin would spend selfless hours at a time, month in and month out, over years, developing my intuition for its cruelties. It was not a recipe I could have copied and pasted off the internet nor lifted from a textbook. On the one hand, simple distance mathematics; on the other, an unwinnable game of push and pull. In the then dusty world of statistics, this was different; each foray felt like a rite of passage.

Underneath the industry’s prominent fusions, those involving audience currencies, are smaller single source datasets, used to monitor the health of the larger piece, guiding the design, and validating its estimates. Most fusions, however, don’t share this luxury – there is nothing to say: “Thumbs up! Good job.” It’s an absolute stinker when you think about it: it is impossible to prove that a fusion worked. If it were possible, perhaps there would be no need for the fusion in the first place. A conceited catch-22.

Frankly speaking, this is an uncomfortable position to be in, always open to fair cross examination. Considering too, its lack of a standard technique or a mathematically defined optimal solution, fusion is not without controversy, the kind that will still divide a room. In this sense, it is not unlike a segmentation study, which must begin with everyone on the same page, planning, nailing down its use case and managing expectations.

In this careful fashion, fusion evolved under the most intense scrutiny to become established and widely practiced; its miraculous promise did not allow an easy path. Yet because it has only gained in standing, despite never receiving the benefit of doubt, I know of no other technique more deserving of the faith it calls for. Moreover, it stood this test of time without even pretending to be near perfect.

Luckily, we’re not in the business of achieving perfection. Instead, we’re asked to conjure up competitive advantage through the clever use of data. If this isn't it, I don't know what is. Then there is that thick black line from rounded customer understanding to intelligent targeting, cheaper acquisition, and longer retention. Overlaying tagged web analytics and survey-style profiling onto a modern CRM is paydirt in anyone’s book. Hear silos tumble. Behold, the golden age of fusion.

So, the next time you are asked about data integration, and simply refuse to regurgitate the half-hearted platitudes of a webinar, know that you too have proper options. Be not afraid of magic but hold your wand up high. Stand on the shoulders of giants, for you have a mighty spell to cast.

Further reading:

Baker, K., Harris, P. & O’Brien, J. (1989). Data fusion: an appraisal and experimental evaluation, JMRS, 31 (2).

Santini, G. (1986). Méthodes de Fusion: Nouvelles réflexions, Nouvelles expériences, Nouveaux enseignements, Les médias, expériences et recherches, Séminaire de l’IREP.

Ryan Howard is a freelance data science consultant