Experimental approaches to selecting a programmer’s tools

You can read this article on my own blog, if you prefer that to Medium.

Like many people, I have a sort of pet programming paradigm, which I enjoy adapting to the various codebases and languages I work with.

I wanted to standardize it a bit and share it with the world. It was going to be called something along the lines of “explicitness-focused minimal-shared-state programming” (I’m not good at names). I was thinking about how to prove its worth, both to the world and to myself, when the applied statistician in me kicked in and thought: “Hey, I could try creating a small experiment.”

So I did. I wrote two pieces of code doing the same thing, a rather simple task. One was functional; the other was written in this pet paradigm of mine. I made a website that would serve participants one piece of code or the other and ask them to explain (in a few paragraphs) what the code did.

The code was simple enough that anyone who can read code would have been able to figure out what it does after a bit of head scratching. Indeed, after manually checking about 20% of the answers (randomly selected), I found none that were wrong.

But the answer itself was not the data I actually cared about. What I wanted to monitor was how long people took to come up with the answer, and how many people gave up after reading about the task and wrote nothing.

My thinking went along the lines of: “If a programming style is truly easy to read, a programmer should have an easier time translating it into abstract concepts inside their brain and subsequently into words.” So you can partially assess the difficulty of a programming style by seeing how quickly someone can understand a “simple” piece of code, rather than seeing if they can understand a large codebase (which comes with a host of issues).

To my surprise and discontent, the study failed to support my hypothesis. On a sample of a few hundred answers, there was no significant difference in how quickly people explained the code written in my pet paradigm vs the code written in a purely functional style (after removing some questionable data and outliers with a z-score above 5).
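
For readers who want to try something similar, here is a minimal sketch of this kind of comparison. It is not a record of the analysis that was actually run: the function names, the choice of a permutation test on the difference of means, and the assumption that the timings are plain lists of seconds are all mine, picked for illustration.

```python
import random
import statistics


def drop_outliers(times, z_max=5.0):
    """Drop values whose z-score magnitude exceeds z_max (assumed cutoff: 5)."""
    mean = statistics.fmean(times)
    sd = statistics.pstdev(times)
    if sd == 0:
        return list(times)  # all values identical, nothing to drop
    return [t for t in times if abs(t - mean) / sd <= z_max]


def permutation_p_value(a, b, rounds=10_000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Shuffles the pooled timings and counts how often a random split
    produces a gap at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        if abs(statistics.fmean(left) - statistics.fmean(right)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (rounds + 1)
```

You would call `drop_outliers` on each group's timings and then pass both cleaned lists to `permutation_p_value`. A large p-value, as in my case, only means the data gave no evidence of a difference; it does not show the two styles are equally readable.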

So my plans for popularizing and fleshing out this new paradigm were foiled by my poor attempt at statistical psychology.

However, this got me thinking about how we “validate” the programming paradigms we use.

In searching for reasons why and where certain styles are preferable to others, one comes up with a nearly infinite number of blog posts, reddit arguments, whitepapers, talks, anecdotes, etc.

Yet there are no actual studies to back any of this up.

A lot of what we do to back up a certain paradigm is waxing philosophical about it.

Instead, what we should be doing is looking at its benefits through the critical lens of “science” (read: experiments).

A (thought) experiment

Front-end development is an in-demand field right now. Let’s assume the two “leading” paradigms in terms of tooling, style and syntax are Angular with code written in TypeScript and React + Redux with code written in ES6.

I’m not saying they are the two leading styles, since I’m not that up to date with browser technologies. But, for the sake of argument, please assume they are.

There’s no quantifiable difference between these two stacks of tools. Of course, I hear a choir of people screaming, there’s all the difference in the world:

  • Blah results in better isolation which can improve modularity.
  • Bleh leads to cleaner interfaces which makes the code easy to read.
  • Blax helps us get more uniform performance across browsers leading to a more predictable UX.
  • Blerg means that state is immutable from the perspective of X which makes modifications to Y easier to make.
  • Blif will help new people learn “our way” of doing things faster than they’d learn the “other way” of doing things.
  • … etc., etc., etc.

As I said before, you can wax lyrical to no end about why your paradigm is better and the opposite one is the root of all evil.

Let’s say a large company has to constantly build a lot of complex frontends. Most of the leads in the company agree they’d prefer to use mostly the same tools to accomplish this, but they strongly disagree as to whether that should be Angular written in TypeScript or React & Redux written in ES6.

Why not try the following?

  1. Select three experienced Angular & TS devs, with confirmation of said experience from peers and projects.
  2. Select three experienced React & Redux devs, same methodology.
  3. Have the same QA and ops team work with both these frontend teams, so that they have the same “track” to production.
  4. Monitor how fast the two frontends get to a prototype version, a useable beta version and a final release version.
  5. Run the two frontends in parallel, serving them randomly to users, a sort of A/B testing, but with the whole frontend rather than by components.
  6. Monitor the number of bugs reported by users (or logged automatically) on said frontends.
  7. Monitor the time it takes for the team to fix said bugs and deploy the fixes.
  8. Monitor the time it takes to add new features.
  9. Replace each team member with a junior who doesn’t know the tools being used, but is excited about learning them. Add a new junior every month and move one senior dev to another project every month after the junior is added. Within 3 months, the whole team will be changed (and the juniors will take full charge of the project).
  10. Monitor the same metrics as above: speed of feature addition, speed of debugging and number of bugs. Both in the transition period and with the new dev team.
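
As a rough illustration of step 5, here is one way the random serving could be done. This is a sketch under my own assumptions: the variant labels, the salt string and the `assign_frontend` helper are hypothetical, and hashing the user ID is just one common way to keep each user on the same frontend across visits, so that their bug reports can be attributed to a single stack.

```python
import hashlib

# Hypothetical labels for the two competing frontends.
VARIANTS = ("angular_ts", "react_redux")


def assign_frontend(user_id: str, salt: str = "frontend-experiment") -> str:
    """Deterministically map a user to one of the two frontends.

    The same user_id always gets the same variant, and the split
    across many users is close to 50/50.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return VARIANTS[digest[0] % 2]
```

Every bug report or logged error would then be tagged with the variant returned for that user, which is what makes the counts in steps 6 and 10 comparable between the two teams.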

By doing this experiment, over the span of less than a year, we’ll have gathered some data about our stack of tools with regard to:

  • How quickly those frameworks can be used to build a product.
  • How easy it is to debug and extend the final product.
  • How easy it is to teach new people a project that uses said tooling stack.

I’m not saying one experiment of this type, executed once, would be the end of the discussion. However, it would at least give us some data to debate over. If a few companies did this and pooled their data together, we might even start having some semi-conclusive data based on which we can say things such as:

  • {X} is better for quick prototyping (xy% of the time this was studied)
  • {Y} results in fewer serious bugs in production (according to x,y,z)
  • {X} makes the code easier to read and learn for new team members (on a sample of n programmers distributed amongst y teams)
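
To make the “xy% of the time this was studied” idea concrete, here is a small sketch of how pooled results could be summarized. The record layout (dicts with `study`, `stack` and a metric key) and the function name are assumptions of mine; no real dataset of this kind exists as far as I know.

```python
from collections import defaultdict


def share_of_studies_won(records, metric, lower_is_better=True):
    """Fraction of studies in which each stack did best on `metric`.

    records: iterable of dicts with 'study', 'stack' and `metric` keys.
    Studies with a single stack or a tie are skipped. Stacks that never
    win do not appear in the result.
    """
    by_study = defaultdict(dict)
    for record in records:
        by_study[record["study"]][record["stack"]] = record[metric]

    pick = min if lower_is_better else max
    wins = defaultdict(int)
    decided = 0
    for results in by_study.values():
        if len(results) < 2:
            continue  # nothing to compare against
        best = pick(results.values())
        winners = [stack for stack, value in results.items() if value == best]
        if len(winners) != 1:
            continue  # tie: no winner recorded
        wins[winners[0]] += 1
        decided += 1
    if decided == 0:
        return {}
    return {stack: count / decided for stack, count in wins.items()}
```

Called with, say, `metric="days_to_prototype"`, this gives the share of studies each stack “won”. A raw win rate like this ignores effect sizes and team differences, so it is a starting point for debate rather than a verdict.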

The hype (advertising) based approach to tooling

The problem with not using an experimental approach to choosing tools is that it leaves a void to be filled by more nefarious methods. Usually, these methods are what I’d sum up in the word “hype”.

Sometimes this hype-based approach to tooling is relatively organic, that is to say, it’s based around honest feedback from early adopters who spread the tool to their friends. This “organic” hype is what has brought technologies like Rust, Node and modern C++ into popularity among the young generation of programmers.

Then there’s the darker side of hype, the one that’s paid for by advertising budgets. This kind of hype is responsible for the virulent spread of technologies such as Java, MacBook hardware (and the pre-installed spyware called OS X), Oracle’s database or Google’s AMP.

Finally, we have a lot of murky middle ground in between, with technologies such as Scala, Angular, React, TensorFlow or MongoDB: tools that obviously have a lot of ad money behind them, but do seem to gain traction naturally in some communities.

An experimental approach to choosing programming technologies wouldn’t have to fully replace the hype-based approach, but it would be a nice counterweight to it. We could still enthuse to no end about this or that shiny new tool, but we’d at least be able to gather the experiences of other users in a more exact format. Or, in the worst case, we as early adopters could start doing these experiments to prove the usefulness of the tool, or to realize we were swept away by hype and should just go back to our old implements.

What the best front-end framework is might be a silly question to answer via experiments. However, there are much more serious problems where this kind of approach would be useful, from what language and methodology DARPA should use to write software for killer drones, to trying to pick a subset of C++ in which to write complex and heavily-used projects such as Windows NT.

There are many improvements in programming that we’ve taken on faith, and many of them might be objectively worse than, or no better than, what they replaced.

A culture that analyses tools based on experimental measurements of their usage would be superior to the current hype-based machinery of promoting new technologies.

The hype-based approach could lead us to a more closed-sourced world, where corporations with large advertising budgets have too much influence over the tools that we use.

The demographics of my articles are quite biased and many of us are lucky enough to work in startups and technologically progressive companies, but we shouldn’t forget Java is the most used programming language in the world by most metrics.

The reason Java managed to get so popular in the first place was sponsored hype. Java is and was technically inferior to its rivals and harder to learn, debug and read (much more so in its v1 iteration).

In the case of Java, there is a silver lining, in that Oracle’s scheme of making it popular in order to then patent-troll and license-gouge its users into paying didn’t work. But Java is still an example of how progress in programming has been (allegedly) held back by corporate-sponsored hype.

Would an experiment-based approach to choosing tools be able to dictate what tools to use in every situation? No.

Would it be immune to abuse by corporations or individuals that have the budget or skills to “hack” the system? No.

Would it be able to fully replace the hype-based approach to choosing tools? No.

It’s not a magic bullet, but I think it would be a welcome method of analysis in a world where the number of tools is growing exponentially, but our means of looking at said tools aren’t evolving.

The best part is that people are already doing it to some extent, but they are doing it using rather shabby methodologies and they aren’t sharing any objective data, just blog posts full of anecdotes.

So, maybe, next time you build or try some new tool, see if you can do a simple experiment to observe how good it is in some respects… and please share said data with the community.


