OPINION 13 July 2017

Can we really trust open source?


Working with open source software may not always be easy or comfortable, but it is vital for innovation in analytics, says Ryan Howard.


Not that long ago, big brand software dominated marketing sciences. For the most part, your software licence determined your toolbox, and ideas were shoe-horned into it. Options were limited and innovation was superficial. That said, everything worked intuitively and at the first time of asking, with a few clicks. Training and recruitment revolved around software and it was easy to know what the competition was doing. Analytics was standardised. There was even a help line.

Then came Git. It allowed academics, mathematicians, statisticians and software developers to share code and ideas seamlessly, piggyback on what worked and discard the nonsense. Open source ceased to be ‘the free version that was not good enough to charge a licence for’. Rather, it became the showcase for new thinking and it wrapped up its cutting-edge algorithms with a big beautiful bow. Without the overhead of licences, we began delivering exciting analytics at a fraction of the cost.

A question lingers. Open source comes without assurances or guarantees; for some, this means that it cannot be trusted for commercial applications.

I find three problems with this argument:

  • Firstly, I am not convinced that the individuals giving time freely (some sponsored) are inferior to those working on enterprise equivalents. On the contrary, it is only the most talented and recognisable individuals who lead open source projects. This is where they have earned their wings and continue to contribute.
  • Secondly, the big five digital giants frequently release their languages, frameworks and libraries – not because they are not good enough, but so they can be trialled and trusted.
  • Finally, and perhaps cynically, I do not recall the small print of enterprise solutions offering to take liability for errors in their software – and of course, there are always bugs. By contrast, open source is not a black box, so we are free to delve into the code and design tests for it.

In highly regulated sectors such as pharmaceuticals, insurance and finance, open source needs more than ad hoc testing. Without help, open source software – our beloved R programming language in particular – still has serious trust issues. Here, IT needs to know that the specified R version and packages are completely safe to use.

It would be remiss not to mention the noted expert on the subject, Richard Pugh, chief data scientist at Mango Solutions. Richard and his team developed ValidR. It’s a single install bundle that produces a validated version of R to comply with regulatory guidelines, along with all the assurances and requisite evidence. In doing so, it works as the necessary bridge between the wild and fluid open source world and highly regulated industry. Dare I suggest that traversing regulatory guidelines is the final hurdle to universal acceptance?

Right now across the globe, open source communities are refining code collaboratively. Because the code is employed so widely and frequently, across many disciplines for so many different applications, nasty bugs are weeded out sooner. Working in informal but orderly hierarchies of contributors, hundreds – and sometimes thousands – of eyes trawl through, quality checking as the project grows. The code evolves and the fittest survive.

Working with open source code is often unpleasant. There are no GUIs and no three-finger-thick troubleshooting manual to fall back on. Sometimes it is a real pain, the kind that will rob you of your weekend. In exchange, we receive so much more. We need to embrace it: get more involved, continue to innovate, throw the full weight of our community behind it and never hark back to simpler times.

Ryan Howard is director of advanced analytics at Simpson Carpenter.