
Jan 23  Why bother creating new programming languages? by Chris Poirier • in Discussions/General

I know that at least a few of you are probably asking that question.  And I guess it’s a fair question — programming languages take a certain level of skill to design, and a certain level of expertise to implement.  Truth be told, there seems to be a common perception that language designers rate up there with brain surgeons and rocket scientists on the nerd scale.  And, to make matters worse, programming languages have a bad reputation as being hard to learn.

So, if only rare developers can create them, and using custom languages eliminates the “commodity” status of programmers, why would you even consider creating your own?

Well, even if we accept those two points as truth, the answer is still simple: with the right programming language, you can get a lot more done.

A contrived example

Consider this program describing a common activity:


   activate neurons 9388908 through 9388940
   wait 0.0002s
   activate neurons 9700938 through 9800838
   activate neurons 3992893 through 3992897
   wait 0.0001847s
   deactivate neurons 9388908 through 9388912
   deactivate neurons 9388908 through 9388912


   (40 or 50 *billion* instructions skipped for space reasons)


   wait 0.0827053s
   deactivate neurons 9388908 through 9388912

Now, consider this same example in a higher-level language:


   hold the bat <here>
   bring it up to your shoulder like <this>
   watch for the ball
      as it approaches, swing the bat down and forward into line with the ball
         follow through

Okay, okay: there is a lot more to swinging a baseball bat effectively than either of these two examples shows, but I think the comparison demonstrates the fundamental differences between low-level programming languages and high-level ones:

low-level languages                high-level languages
-------------------                --------------------
are very precise                   are very terse
make few assumptions               rely on assumptions
can be used for almost anything    are very specific to a single problem space

To me, the value of new programming languages is not in offering different syntax for the same thing.  That’s not what I’m talking about, here.  If the primary reason you want to write a new programming language is because you want to use { and } to delimit lists, instead of [ and ], or because you think % is a better address-of operator than &, you and I haven’t got much left to talk about.  ;-)

The value in building new programming languages is in offering better, richer, more expressive ways to describe a solution to a problem, ways that shift the focus away from the “trees” and onto the “forest”, instead.  So we can all get more done.

I don’t know about you, but if I had to teach someone to swing a baseball bat, I know which language I’d choose.

A less contrived example

My present paying job involves supporting a large networked information system.  The underlying software system — involving a half-dozen major components — is off-the-shelf, but has been heavily customized through data, configuration, and custom code.  There are two complete environments — one for testing, one for production — and each environment contains about a dozen machines.  All services within each environment have redundant copies on separate machines, to ensure maximum uptime.

When new customizations are to be deployed into the system, it generally involves copying files around the network, and running up to hundreds of commands on up to a dozen machines, all in a particular order determined by exactly what is being deployed.  Some changes require service restarts, and some service restarts must cascade to other services.

Needless to say, this kind of thing involves a lot of error-prone work, if it’s done by hand.  And getting it back out again — should the need arise — often proves even more difficult, as the undeployment process is generally less well tested (being complex, and not on the “critical path”; at least, not until it becomes critical).

So, to address this situation I built a deployment system that leverages several “little” languages to address different aspects of the problem, and brings them together to provide completely automated deployments and undeployments.  And, unlike the manual undeployments of the past, the automated ones are reliable.

I had three goals in mind when designing the control languages for this system:

  1. minimizing the work necessary to build a deployment script (the most common activity)
  2. ensuring the effects of every deployment script could be completely reversed without requiring planning or additional input from the script writer (so it couldn’t be screwed up)
  3. isolating the deployment script writer from the details of the system topology (which varied from environment to environment, and could be changed at any time)

These design goals are important for more than just usability reasons: if the scripting language requires a lot of thinking, or a lot of work to use, errors will occur more often.  And, in this instance, errors directly impact the reliability of production systems, and overall uptime.

So, the finished system provides one little language for writing deployment scripts, a second for describing the system topology and component relationships, and a third for controlling the deployment system itself.

The deployment scripting language provides primitives that do atomic units of deployment (things like: create a policy, delete a policy, update a filesystem directory, modify a particular configuration file in a particular way, etc.).  You write the deployment script as if all of the underlying systems exist on one machine, as if nothing ever goes wrong, and as if nothing ever needs to be undeployed.  You simply put the directives in the right order for their inter-dependencies, and provide them with whatever data they need.
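To make that a little more concrete, here is a minimal sketch in Python of how a script like that might be read in. The directive names, the line-oriented syntax, and the file names are all invented for illustration; they are not the actual language described here.

```python
# Hypothetical sketch: directive names and syntax are invented for illustration.
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Directive:
    name: str    # which primitive to run, e.g. "create-policy"
    args: tuple  # whatever data that primitive needs


# Assumed set of primitives, loosely modelled on the kinds listed above.
KNOWN_PRIMITIVES = {"create-policy", "delete-policy", "update-directory", "modify-config"}


def parse_script(text):
    """Turn a line-oriented script into an ordered list of directives."""
    directives = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        tokens = shlex.split(raw, comments=True)  # drops "#" comments
        if not tokens:
            continue  # blank or comment-only line
        name, *args = tokens
        if name not in KNOWN_PRIMITIVES:
            raise ValueError(f"line {lineno}: unknown primitive {name!r}")
        directives.append(Directive(name, tuple(args)))
    return directives


SCRIPT = """
# order matters: the directory must be in place before the config refers to it
update-directory templates ./build/templates
modify-config portal.conf template_dir=templates
create-policy guest-access ./policies/guest.xml
"""

plan = parse_script(SCRIPT)
```

Notice what the script does not say: no machine names, no backups, no restarts, no undo steps. The only things the writer supplies are the order and the data.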

When the time comes, the system reads in the deployment script, combines it with the topology and relationship information from the configuration file (the second language) and does all the work of getting files to the right place at the right time, taking backups of the data that’s about to change, generating the necessary shell commands, restarting services, and monitoring for errors.  And, should the deployment fail, the undeployment command takes the exact same deployment script, and uses the configuration information and the backups to figure out how to reverse the order in a way that respects dependencies, and undoes everything that has been done.
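The "back up first, undo in reverse" idea can be sketched in a few lines of Python. This is a simplification under stated assumptions: an in-memory dict stands in for the real systems, the class and function names are hypothetical, the undo is driven by a journal of backups rather than by re-reading the script, and the automatic rollback on failure is my choice for the sketch.

```python
# Hypothetical sketch: a dict stands in for the real system state.
_MISSING = object()  # marks "this key did not exist before the change"


class Deployment:
    def __init__(self, state):
        self.state = state
        self.journal = []  # (key, previous value), in the order changes were applied

    def set(self, key, value):
        """Primitive: create or update a key, backing up what was there first."""
        self.journal.append((key, self.state.get(key, _MISSING)))
        self.state[key] = value

    def delete(self, key):
        """Primitive: remove a key, backing up what was there first."""
        self.journal.append((key, self.state.get(key, _MISSING)))
        self.state.pop(key, None)

    def undeploy(self):
        """Restore backups newest-first, so later changes are undone before earlier ones."""
        while self.journal:
            key, previous = self.journal.pop()
            if previous is _MISSING:
                self.state.pop(key, None)
            else:
                self.state[key] = previous


def run(script, state):
    """Apply every step in order; on the first failure, roll back what was done."""
    deployment = Deployment(state)
    try:
        for step in script:
            step(deployment)
    except Exception:
        deployment.undeploy()
        raise
    return deployment


state = {"policy/guest": "v1", "config/timeout": "30"}
script = [
    lambda d: d.set("policy/guest", "v2"),
    lambda d: d.set("policy/partner", "v1"),
    lambda d: d.delete("config/timeout"),
]

deployment = run(script, state)
# state is now {"policy/guest": "v2", "policy/partner": "v1"}

deployment.undeploy()
# state is back to {"policy/guest": "v1", "config/timeout": "30"}
```

The script writer never writes an undo step. Each primitive records what it is about to overwrite, and that record is all the engine needs to walk the changes back.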

As a result of this language-based approach, a reliable deployment script can be built in a couple of minutes, and freely deployed and undeployed in any configured environment in the minimum time.  And by separating the concerns of system topology and primitive operation from the specific deployment script, a large pool of risk is removed from individual updates.  Once the configuration of a system is correct, any correct deployment script will run correctly on that environment.  And once a new primitive is known to work properly, it will work properly in any deployment script that uses it.
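Here is a rough Python sketch of that separation of script from topology. Every service name, host name, and command string below is made up for the example; the point is only that the same script expands differently depending on which environment description it is combined with.

```python
# Hypothetical sketch: service names, host names, and commands are invented.
TOPOLOGIES = {
    "test": {
        "policy-server": ["test-ps1", "test-ps2"],
        "web": ["test-web1", "test-web2"],
    },
    "production": {
        "policy-server": ["ps1", "ps2", "ps3"],
        "web": ["web1", "web2"],
    },
}

# The script names services only; it never mentions a machine.
SCRIPT = [
    ("policy-server", "load-policy guest.xml"),
    ("web", "reload-config"),
]


def expand(script, topology):
    """Turn service-level steps into per-host commands, preserving script order."""
    commands = []
    for service, action in script:
        for host in topology[service]:
            commands.append((host, action))
    return commands


for environment in ("test", "production"):
    print(environment)
    for host, action in expand(SCRIPT, TOPOLOGIES[environment]):
        print(f"  {host}: {action}")
```

If a machine is added to production, only the topology entry changes. The script stays exactly as it was.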

Could these benefits have been achieved without developing new programming languages?  No.  In the end, the solution described here is a language-based solution.  The details could have changed — the deployment script and system configuration could have been written in XML instead of a custom language — but, from a conceptual standpoint, you’d still have had the same two languages.  The only difference would have been in the amount of noise the programmer would have had to deal with when reading and writing those languages.

The truth about programming languages

The truth is that the points I started out with are both myths:  designing a programming language does not require an uber-nerd; and learning a new programming language doesn’t require one, either.

The truth is, completely normal, approachable, friendly programmers make new programming languages every day.  When somebody designs a new XML structure to communicate something from one application to another, that’s a programming language.  When somebody creates a user interface that allows a user to control an application, that’s a programming language.

And the value of these high-level and domain-specific programming languages is that they let you focus on a specific aspect of a problem, in terms that are natural to that domain.  Describe to me in English how you would deploy this set of updates into the system.  Don’t worry about how many instances of the service exist in this particular environment, or what machines they are on; don’t worry that any time you update a policy, you need to take a copy of the old one so you can undeploy later; don’t worry that in the test environment, restarts can take services completely down, but in production, service must be maintained.  Give me the high-level details, formalize them a tiny bit, and here’s your deployment script.  Assume the system can figure out the rest.  Please!

That’s the power of a language.

The cost of generality

Programming languages like Java and C — and all of the “commodity” general-purpose programming languages, in fact — are low-level languages.  They are generally useful specifically because they demand you provide so much detail: by shifting the burden onto the programmer, they can be used to solve just about any problem.  But by assuming so very little, such languages require programmers to write tons of (often repetitive) code, just to get anything done.

Low-level languages do, of course, generally offer a way to package runs of logic into re-usable chunks (Java’s class libraries, for instance).  These facilities allow you to leverage the work of other programmers, and can certainly be very powerful.  But their “shape” is ultimately constrained by the language they were designed for.  It takes new languages to allow new shapes, and to make them feel natural enough to use.  Ruby’s closures define a new shape, and they are ubiquitous in Ruby code.  The inversion of control completely transforms how you think about and write Ruby code.  Java’s anonymous classes, on the other hand, try to offer the same services within the existing shape of the language, and inflict such pain in doing so that they get only limited use.

New languages allow new ways to think, and that is often exactly what is needed to make leaps in productivity.

Conclusion

I could go on.  But I won’t.  ;-)

Here’s my point of view: computers are good at doing boring, predictable, repetitive stuff, and we aren’t.  And for anything but the simplest of problems, languages are a necessary tool for shifting that burden from people to computers.

It all comes back to Norman’s Law of Conservation of Complexity: give people better tools, and they’ll do more with them.

It’s time we had better tools.

Site copyright 2007-2008 Chris Poirier.