Monday, June 29, 2009

Content Editable

I've been playing with the HTML contentEditable mode in Firefox.

One word: awesome.

I quickly managed to put together the basis of a seamless editor. This is described in a seven-part article.

The source code is available from http://github.com/joearms/contentEditableDemo/tree/master

Sunday, February 15, 2009

JSON protocols (part 1)

For a long time I have been interested in describing protocols. In 2002 I published a contract system called UBF for defining protocols. This scheme was never widely adopted - perhaps it was just too strange...

I have revised UBF and recast it in a form which I call JSON Protocols. Since JSON is widely implemented, this method of describing protocols might be more acceptable.

What's the problem?

Client and server interaction should be regulated by some kind of contract that is independent of both the client and the server. If the client-server interaction fails, then examining the contract should make it evident which part of the system has failed. Is the problem in the client or the server?


To simplify our problem we will assume that the client and server interact by exchanging JSON messages, and we will add a form of contract that will allow us to check that the sequence of messages is correct.

The File Server Contract

We'll start with a simple example and build a formal description of a file server. I'll use the familiar notion of a finite state machine to describe the operations of the server.

The behaviour of the server is completely specified by a set of
4-tuples of the form:

State x RequestMessage -> ResponseMessage x StateOut

We'll start our specification of the file server somewhere in the middle of a session. We'll assume that users must be authenticated, but we'll show how they are authenticated later.

Let's assume the server is in the state ready - meaning it is ready to accept a request. We can start by defining two state transitions:

ready x getFile -> file x ready;
ready x getFile -> eNoFile x stop;

This means that if our machine is in the state ready and receives a getFile message, it will either respond with a file message and transition to the ready state, or respond with an eNoFile message and transition to the stop state.

Here getFile, file and eNoFile are messages, and ready and stop are states.

Each message can carry an accompanying data structure.
We can define these data structures as follows:

data[getFile] = {fileName:string};
data[file] = {fileName:string, fileData:string};

Having defined the data we turn to the wire protocol - what data is actually sent between the client and the server? To answer this we will give a JSON example.

Suppose we want to fetch a file called "index.txt", and assume also that the content of the file is "abc". Then our contract says that the following JSON terms must be exchanged:

Request =
{msg:"getFile", data:{fileName:"index.txt"}}


Response =
{msg:"file",
data:{fileName:"index.txt", fileData:"abc"},
state:"ready"}
Note that exactly this interchange must take place. If either of the messages is incorrectly typed, the contract checker can detect the error and determine whether the client or the server has violated the contract.

There is a simple relation between the format of the message that is actually exchanged on the wire and the algebraic specification of the messages.

[note - I have taken liberties with JSON notation here and omitted the quote marks around the keys of objects; strictly I should have written {"msg":"file", ...}, but I have written msg:"file"]

What happens if a file doesn't exist? We had a rule for this:

ready x getFile -> eNoFile x stop;

The reply message eNoFile has no associated data, so no data description is necessary.

As an example, suppose we request the file "badfile" which does not exist. This is what we would see "on the wire".

Request = {msg:"getFile", data:{fileName:"badfile"}}
Response = {msg:"eNoFile", state:"stop"}

Observe that the eNoFile message has no associated data.

Why do we send the state back in the response message?

This is to avoid the situation where the server performs a silent state change that cannot be observed by the client. Suppose we have two rules:

a x s1 -> c x s2
a x s1 -> c x s3

When we send an a message we always receive a c message, but we cannot tell if the server changed to state s2 or s3. To make things clearer we always include the new state in the reply.

Now that we've seen what happens in the middle of a session, we can include details of the login and authentication phase.

start x login -> challenge x wait;
wait x response -> ok x ready;
wait x response -> badpassword x stop;

data[login] = {name:string};
data[challenge] = {salt:string};
data[response] = {md5:string};

Once we are ready we might want to list files:

ready x listFiles -> files x ready;
ready x logout -> stop;

data[files] = [{filename:string}];

This completely (and formally) specifies the behaviour of a file server.

Adding time

We can easily add time to our specification:

ready x getFile -> file x ready within 2 seconds;

This means that we must respond within 2 seconds.

What else?

We need some meta-information (the version number and name of the protocol) and an introspection mechanism.

Notation

I've been a bit sloppy with notation here and used a notation that I hope is 'self-evident'. The state machine syntax is trivial:

StateIn x MessageIn -> MessageOut x StateOut;

The data notation is less obvious:

data[XXX] = {tag1: type1, tag2: type2 ...}

denotes a JSON message of the form:

{"msg":"XXX", "data":{"tag1":Data1, "tag2":Data2, ...}}

Where Data1 is of type type1 and Data2 is of type type2. Observe that I have only used the type "string" in my examples, but this is easily extended to JSON primitive types, enumerations and sequences of types.

I also used the notation [X] (in the definition data[files] = [{filename:string}]). [X] means an array (or sequence) of type X.
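To show that the data notation is easy to check mechanically, here is a minimal Erlang sketch. It assumes (this is not part of the notation itself) that the JSON has already been decoded into Erlang terms, with objects as maps with binary keys, strings as binaries and arrays as lists, and that types are written as Erlang terms: string, {array, T} for [T], and {object, [{Tag, Type}]} for {tag:type, ...}. It also assumes an object must contain exactly the listed tags. The module name contract_types is hypothetical.

```erlang
-module(contract_types).
-export([check/2]).

%% check(Type, Value) returns true if the decoded JSON Value has type Type.
%% Assumed type syntax (hypothetical, for illustration only):
%%   string                    a JSON string (decoded as a binary)
%%   {array, T}                [T] in the notation above
%%   {object, [{Tag, Type}]}   an object with exactly these tags
check(string, V) ->
    is_binary(V);
check({array, T}, V) when is_list(V) ->
    %% Every element of the array must have type T.
    lists:all(fun(E) -> check(T, E) end, V);
check({object, Fields}, V) when is_map(V) ->
    %% No extra tags, and every listed tag is present with the right type.
    maps:size(V) =:= length(Fields) andalso
        lists:all(fun({Tag, T}) ->
                          case maps:find(Tag, V) of
                              {ok, X} -> check(T, X);
                              error   -> false
                          end
                  end, Fields);
check(_, _) ->
    false.
```

For example, data[files] = [{filename:string}] would be written {array, {object, [{<<"filename">>, string}]}}.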

Contract Checking

Now that we have our state machines and messages, we can easily write a contract checker.

Given the current state of the finite state machine and the next message, we can easily check whether the client and server are responding to protocol messages as required by the specification. Each message has a data type specification that can easily be checked.
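Here is a minimal sketch of the state-machine half of such a checker, in Erlang, using the file-server rules above. Assumptions: messages are reduced to their msg tag (as atoms), the attached data is not checked, each response carries the state the server claims to have moved to (as described earlier), and the logout rule is left out because it has no reply message. The module and function names are hypothetical. If no rule accepts a request in the current state the client is blamed; if the request is legal but the reply or the claimed next state is not, the server is blamed.

```erlang
-module(contract_fsm).
-export([check_trace/2]).

%% The file-server rules, one {StateIn, Request, Response, StateOut} per rule.
rules() ->
    [{start, login,     challenge,   wait},
     {wait,  response,  ok,          ready},
     {wait,  response,  badpassword, stop},
     {ready, getFile,   file,        ready},
     {ready, getFile,   eNoFile,     stop},
     {ready, listFiles, files,       ready}].

%% Check a trace of {Request, Response, ClaimedState} triples, starting in State.
%% Returns {ok, FinalState} or {error, client | server, StepNumber, Details}.
check_trace(State, Trace) ->
    check_trace(State, Trace, 1).

check_trace(State, [], _Step) ->
    {ok, State};
check_trace(State, [{Req, Resp, Claimed} | Rest], Step) ->
    %% All rules that accept this request in the current state.
    Candidates = [R || {S, Q, _, _} = R <- rules(), S =:= State, Q =:= Req],
    case Candidates of
        [] ->
            %% The contract does not allow this request here: blame the client.
            {error, client, Step, {unexpected_request, State, Req}};
        _ ->
            %% The request was legal, so the reply and the claimed next state
            %% must match one of the candidate rules, otherwise blame the server.
            case lists:member({State, Req, Resp, Claimed}, Candidates) of
                true ->
                    check_trace(Claimed, Rest, Step + 1);
                false ->
                    {error, server, Step, {bad_response, State, Req, Resp, Claimed}}
            end
    end.
```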

Comments on this are welcomed.

In Part 2 I will post Erlang bindings for the protocol specification and code for a contract checker.

Other implementers might like to implement bindings and contract checkers for their favorite languages. Having done this, we could start writing multi-language applications based on formal and checkable contracts.




Wednesday, January 28, 2009

Micro Lightweight Unit Testing

I'm often asked the question "what unit testing framework do you use?" The answer is usually I don't, but I do use a form of micro testing that is built into Erlang.

In Erlang, every "assignment" of the form Lhs = Rhs (really a pattern match: Rhs is evaluated and its value must match the pattern Lhs) can be viewed as an assertion, or unit test, since it can possibly fail.

So when we write:

    {ok, S} = file:open("filename", [read])
We're writing an assertion to the effect that opening "filename" for reading will succeed.

So how do I write unit tests?

To answer this question, I'll walk you through how I'd write the code for an efficient Fibonacci function.

I'm actually following the three rules of TDD so Uncle Bob and the agile crowd should approve of this method ...

I'll show you the order in which I implement the code. Often we show the final version of some code, but not the order in which the code was written. This time I'm going to show the precise order in which I wrote the code, and show how and when I tested and ran the code.

I'll start by defining a module, with a unit test.

Step 1) First write a micro-unit test:

-module(fib).
-compile(export_all).

test() ->
0 = fib(0),
1 = fib(1),
1 = fib(2),
6765 = fib(20),
ok.
Where did I get these values from? I checked on Wikipedia - I was unsure if the Fibonacci series starts 0,1,1,2,... or 1,1,2,3,...

This code won't compile correctly, since the fib function is missing.

Step 2) Write the fib function:

fib(0) -> 0;
fib(1) -> 1;
fib(N) -> fib(N-1) + fib(N-2).
This version of the Fibonacci function is recursive and very inefficient. But I'll implement it first, because I have high confidence that the code is correct, and because I'll use it later to test the efficient version of the code.


Step 3) I compile and test the module


The module now looks like this:

-module(fib).
-compile(export_all).

test() ->
0 = fib(0),
1 = fib(1),
1 = fib(2),
6765 = fib(20),
ok.

fib(0) -> 0;
fib(1) -> 1;
fib(N) -> fib(N-1) + fib(N-2).

I compile and test it:

1> c(fib).
{ok,fib}
2> fib:test().
ok
So now I have something that works.

Step 4) Add unit tests for fastfib

test/0 looks like this:

test() ->
0 = fib(0),
1 = fib(1),
1 = fib(2),
6765 = fib(20),
0 = fastfib(0),
1 = fastfib(1),
1 = fastfib(2),
2 = fastfib(3),
K = fib(25),
K = fastfib(25),
ok.
Here I check that fastfib returns the same value as fib with the lines:

K = fib(25),
K = fastfib(25),

Step 5) Write the fastfib function.

The entire module looks like this:

-module(fib).
-compile(export_all).

test() ->
0 = fib(0),
1 = fib(1),
1 = fib(2),
6765 = fib(20),
0 = fastfib(0),
1 = fastfib(1),
1 = fastfib(2),
K = fib(25),
K = fastfib(25),
ok.

fib(0) -> 0;
fib(1) -> 1;
fib(N) -> fib(N-1) + fib(N-2).

fastfib(0) -> 0;
fastfib(N) -> fastfib(N, 1, 0).

fastfib(1, A, _) -> A;
fastfib(N, A, B) -> fastfib(N-1, A+B, A).

Step 6) Compile and test.

I compile and test the module, as in Step 3)

Step 7) I change the exports of the module, replacing

-compile(export_all).

with

-export([test/0, fib/1]).

I rename fastfib to fib and fib to slowfib.

Done

Step 8) QuickCheck

I don't have John Hughes' QuickCheck on my machine, but if I did, I could write a test case that said "for all integers N >= 0, fib(N) and fastfib(N) compute the same value" - QuickCheck would then generate zillions of tests that test this property.
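Without QuickCheck, a poor man's version of the same property test is a loop that picks random values of N and compares the two functions. This is only a sketch: unlike QuickCheck it does no shrinking of failing cases, and N is kept in the range 0..20 because the naive fib/1 is exponential. The module name fib_prop is hypothetical; fib/1 and fastfib/1 are copied from the module above.

```erlang
-module(fib_prop).
-export([check/1]).

%% Run Trials random tests of the property fib(N) =:= fastfib(N).
%% Returns ok, or {failed, N} for the first counterexample found.
check(0) ->
    ok;
check(Trials) ->
    N = rand:uniform(21) - 1,            % random integer in 0..20
    case fib(N) =:= fastfib(N) of
        true  -> check(Trials - 1);
        false -> {failed, N}
    end.

%% Slow but obviously correct reference version.
fib(0) -> 0;
fib(1) -> 1;
fib(N) -> fib(N-1) + fib(N-2).

%% Fast accumulator version under test.
fastfib(0) -> 0;
fastfib(N) -> fastfib(N, 1, 0).

fastfib(1, A, _) -> A;
fastfib(N, A, B) -> fastfib(N-1, A+B, A).
```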

Finally

I have used the convention of exporting a function test/0 in several modules. I also have a simple program which checks a large number of modules. It checks if the module exports the function test/0, and if so evaluates (catch Mod:test()). If this returns ok, then the module has passed its test; if not, I print an appropriate error.
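The runner itself is not shown, so here is a guess at what such a program might look like; the module name run_tests and its return value (a list of failing modules) are assumptions. It calls code:ensure_loaded/1 first, because erlang:function_exported/3 only reports on modules that are already loaded.

```erlang
-module(run_tests).
-export([run/1]).

%% For each module in Mods that exports test/0, evaluate (catch Mod:test()).
%% Anything other than ok is printed and collected as {Mod, Result}.
run(Mods) ->
    lists:foldl(
      fun(Mod, Failures) ->
              case exports_test(Mod) of
                  false ->
                      Failures;
                  true ->
                      case (catch Mod:test()) of
                          ok ->
                              Failures;
                          Other ->
                              io:format("~p: test failed: ~p~n", [Mod, Other]),
                              [{Mod, Other} | Failures]
                      end
              end
      end, [], Mods).

%% True if Mod can be loaded and exports test/0.
exports_test(Mod) ->
    case code:ensure_loaded(Mod) of
        {module, Mod} -> erlang:function_exported(Mod, test, 0);
        {error, _}    -> false
    end.
```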

Note how when I wrote the module I wrote a test case, then implemented the function it was testing, then wrote another test case, then more code, etc. This way I'm interleaving writing test cases with implementing the code. I write the code in a number of small steps, and if something goes wrong I can just reverse the last step.

When I'm finished with the code all the test cases are ready - so I don't write the code then the test cases, I interleave the two.



This is my micro-lightweight unit test framework.

This is what I use for my hobby-hacks. For paid work I use the OTP test server.

Thursday, July 10, 2008

UBF and VM opcode design

UBF is a data encoding that allows structured terms (rather like XML) to be sent over the network. It also includes a protocol checking scheme to automatically determine if sequences of typed messages follow a particular protocol.

This blog entry was stimulated by this posting on the Erlang mailing list.

One of the basic ideas of UBF was to send programs, not data structures. The programs were for a byte-coded stack machine. So instead of sending data structures between machines we send tiny programs which, when evaluated, create data structures.

Each byte is an opcode for a VM. The net-effect of executing a UBF program is to leave a value on the stack.

The trick in UBF was not to start allocating the opcodes in the VM from zero - but to allocate them with loving care.

A common mistake in making byte coded VMs is to allocate the byte codes from zero. If you think about it the byte code for a PLUS operation can only be 43 (why? - easy - this is the ASCII code for "+").
In fact the byte code for PLUS should be 43 in all byte coded VMs - there should be laws that make it a criminal offense for the opcode to be anything other than 43 - thus it is written - there will of course, be a problem with the opcode for TIMES - if you are familiar with your ASCII codes then you should understand why.
I have no idea where I learned this trick - it seems to be part of the folklore of VM design: choose the opcodes so that the binary code is readable (if you can). Unfortunately I didn't know this when I designed the first Erlang VM, but now I know better.

So this way the byte code for start-of-tuple is 123, end-of-tuple is 125 and element-separator is 44 - unsurprisingly "{", "}" and ",". Thus "{...,...,... }" is a program and NOT a bit of syntax.

With this choice of encoding programs become human readable strings which require zero parsing - you just execute the byte codes.
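To make the idea concrete, here is a toy stack machine in the spirit of this description. It is not UBF itself (real UBF handles many more kinds of term); it is a sketch that only understands non-negative integers, "{", "," and "}". A run of digits pushes an integer, "{" pushes a marker, "," and spaces do nothing, and "}" pops everything down to the marker and pushes a tuple. The module name toy_ubf and the marker atom are hypothetical. A malformed program simply crashes with a function_clause error.

```erlang
-module(toy_ubf).
-export([run/1]).

%% Execute a program (string or binary); return the single value left on the stack.
run(Bin) when is_binary(Bin) -> run(binary_to_list(Bin));
run(Program) -> exec(Program, []).

exec([], [Value]) -> Value;
exec([$\s | T], Stack) -> exec(T, Stack);               % space: no-op
exec([$, | T], Stack) -> exec(T, Stack);                % 44 ",": no-op separator
exec([${ | T], Stack) -> exec(T, ['$mark' | Stack]);    % 123 "{": push marker
exec([$} | T], Stack) ->                                % 125 "}": build tuple
    {Elems, Rest} = pop_to_mark(Stack, []),
    exec(T, [list_to_tuple(Elems) | Rest]);
exec([D | _] = Prog, Stack) when D >= $0, D =< $9 ->    % digits: push integer
    {N, T} = read_int(Prog, 0),
    exec(T, [N | Stack]).

%% Pop values until the marker; prepending restores the original order.
pop_to_mark(['$mark' | Rest], Acc) -> {Acc, Rest};
pop_to_mark([X | Rest], Acc) -> pop_to_mark(Rest, [X | Acc]).

read_int([D | T], Acc) when D >= $0, D =< $9 -> read_int(T, Acc * 10 + (D - $0));
read_int(T, Acc) -> {Acc, T}.
```

Note that no parser is involved: each byte is looked at once and acted on immediately.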

Contrast this with XML, where the data structures are human readable but require parsing. This is why constructing a term from UBF is far faster than from XML, and why UBF is far smaller while still being human readable.

Why didn't UBF spread?

If you have something that is almost ok - then lots of people can have great fun arguing over it and polishing it at the edges.

Things which are deeply flawed, and industry-standard things like XML, can lead to endless discussions - great fun - lots of hot air. Project management can happily preside over "the illusion of work" - wages get paid - everybody is happy. Projects get delayed - project management becomes very happy.

The optimal point is where projects get delayed as much as possible, budget overruns are as large as possible and the project manager is almost, but not quite, sacked. This idea is explored in Putt's Law and the Successful Technocrat - recommended to me by Gilad Bracha - and a great read.

Some things (like Scheme, Pascal, ...) are pretty nearly perfect - thus there is little to do. In fact Pascal was perfect (anybody got a UCSD Pascal emulator and image? - now that was really nice)

Fixing stuff that's broke

Programmers like to have something to do - so our lot in life is to fix flawed things. Most of my time is spent in fixing things that should work, but are in fact, broken.

ASN.1 (which got me started on this blog entry) is elegant - but how it has been used is not.

I am currently examining LDAP - LDAP schemas have to be seen to be believed (and yes LDAP schemas are written in ASN.1)

In LDAP schema speak a boolean is a 1.3.6.1.4.1.1466.115.121.1.7 (this is an OID, for those in the know) and 1.3.6.1.4.1.1466.115.121.1.40 is a string ...

I'm glad the LDAP schema designers didn't turn their hand at
programming language design. If they had, then

     boolean x,y,z;

Might have been

   type 1.3.6.1.4.1.1466.115.121.1.7 x,y,z;


The only thing that is good about LDAP schemas is that they are not XML schemas.

Saturday, June 28, 2008

Itching my programming nerve



I've just got back from the first ever commercial Erlang conference. Some 40 talks in two days all related in some way or other to Erlang. It was a chance to meet old friends, make new friends and connect people together in the hope that new synergy effects would arise.

The most exciting thing was the emergence of what I think might be the first killer application written in Erlang. I might be wrong, but my gut feeling is that what Alexander Reinefeld showed us will be the first killer application in Erlang.

Only a few language nerds are interested in programming languages in their own right. Most people are more interested in what you can do with a programming language than in the underlying language. Thus it was with Ruby. Ruby on Rails was the application that drew developers into Ruby. It made them want to learn Ruby so that they could easily build web applications.

Alexander Reinefeld told us about an Erlang implementation of the Wikipedia that not only has a stunningly beautiful architecture, but which outperforms the existing Wikipedia.

I'll talk about the Wikipedia implementation later in this posting, but it was not the only notable talk; there were many other great talks.

Claes Vikström (Klacke) gave a great lecture which was a mixture of battle stories, history and what he was doing today.

Klacke is the master of the one-line throw-away remark - "and then I implemented a DBMS ... and a web server" this was the technical stuff, and on the business side
...then we started a company and made a whole lot of money...
At the end of his lecture Klacke said something like:
... so how come we have this great technology and people are just doing boring things and not writing stock exchanges ... there aren't any killer applications ...
Just as an aside Joel Reymont did stir up the Erlang mailing list with his announcement that he wanted to write an open source stock exchange as a publicity stunt, but this ran out in the sand. Perhaps later we can resurrect this idea, it would be a bit of fun.

The Wiki

Now for the fun stuff. Alexander Reinefeld answered Klacke's call for action, and for a non-boring application, as he described how he had implemented the Wikipedia on a new p2p system now called Scalaris.

Here's my version of his story:
  1. They made a peer-to-peer system based on the Chord algorithm
  2. They added a replication layer using the Paxos algorithm
  3. They added a transaction layer
  4. They injected the Wikipedia
  5. It went faster than the existing Wikipedia
Applied to Wikipedia, Scalaris serves 2,500 transactions per second with just 16 CPUs, which is better than the public Wikipedia.
Alexander is a tall bespectacled academic who usually only turns up at academic conferences. He was very worried at the start of the talk when he was introduced as "Professor Reinefeld" I think he thought it would frighten people.

The system they described won an IEEE prize for scalable systems and was also presented at the Google conference on scalability. I asked Alexander why publicity about what they had done was so hard to find.
"I'm academic, we usually publish papers," he said.
He also said he'd started a company that "wasn't doing very well" (tip to VCs - check this one out and give the guy some help).

So my take on this is that this is one of the sexiest applications I've seen in many a year. I've been waiting for this to happen for a long while. The work is backed by a quadzillion Ph.D.s and is really good, believe me.

On second thoughts don't believe me but check out the video lecture. You can also download the code.

CouchDb

When Alexander had blown my mind, Jan Lehnardt popped up for the next session and blew it even further by presenting CouchDB. Am I going mad? Are we seeing the emergence of two killer applications? This cannot be.

Jan Lehnardt has a presentation technique that is a joy to watch - it reminded me of why I love programming.

Jan communicates on two levels simultaneously. His body language oozes enthusiasm - he waves his arms so fast and hops up and down so much that we think he was either a helicopter in a previous life or that he wants to get a job as a windmill. Words tumble out of his mouth so fast that his tongue often trips over the end of his sentences and falls flat into the middle of his next sentence, conveniently missing out the middle of the last sentence.

Check out the video and you'll see what I mean - you'll see Jan almost taking off as he impersonates a helicopter. For more information see the slides of the talk. You can download CouchDb from the Apache incubator site.

This style of lecturing is amazing. Jan communicates simultaneously on two entirely different levels. His enthusiasm is received by the amygdala in the limbic system, and his slides go via your eyes to the prefrontal cortex for analytic processing.

So what was this great stuff that Jan was so enthusiastic about?

CouchDb is an Erlang application that turns a key-value JSON store into a system with a RESTful interface that stores arbitrary data structures in a way that fits nicely with the Erlang system. When I got home I downloaded CouchDb and took a look, and there it was, nicely packaged with a Mochiweb server, the same server that is used by Facebook for their comet web chat system.

Like most good ideas the CouchDB is deceptively simple. Once you've seen it you think - yes that's how it should be, how simple, how beautiful. But designing simple things is not easy it requires many false starts and takes a long time to get right. Hats off to Damien Katz for the initial design and to his collaborators, and thanks Jan for telling us about it.

Synergy...

Now it just so happens that Jan Lehnardt and Alexander Reinefeld both live in Berlin, both are working with key-value stores (the details vary) which are programmed in Erlang, both of them are working on what might be the next killer Erlang application and ... they have never met.

I introduced them and then stood back. Wow.

After a moment of shyness they changed to speaking German and Jan started bouncing up and down and speaking even faster than in English - this was getting dangerous - this time Jan did turn into a helicopter and narrowly missed causing an accident as he flew out of the room.

Going home

Klacke and I took the same flight back to Sweden the next day.

"That itched my programming nerve," said Klacke

"Precisely ..."








Tuesday, June 24, 2008

Invasion of Privacy

On 18 June the Swedish Parliament passed a law giving sweeping new powers to the FRA (Swedish Defense Radio Establishment) allowing them to wiretap people in Sweden through phone conversations, email, text messages and more.

All people in Sweden using electronic communication can have their communication monitored despite the fact that they are not suspected of committing any crime.

In my view this is in direct contravention of article 12 of the UN declaration on human rights to which Sweden is a signatory.
Article 12.

No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation. Everyone has the right to the protection of the law against such interference or attacks.
This shameful law applies to me - if you send me email (I get a lot of emails from readers of this blog and of my books) then you should be aware of the fact that your mails are not private but will be read by FRA employees - so your rights will be violated.

The Swedish government is sensitive to foreign opinion, so if you wish to protest about the fact that your privacy would be violated if you mail me, then I suggest you send mail to one of the leaders of the four political parties (Maud Olofsson, Fredrik Reinfeldt, Jan Björklund, Göran Hägglund) that voted this law through. They can be contacted through The Prime Minister and Ministers.

I urge you to make your opinions known.

My daughter asked me:

Does this mean they will read my MSN chats?
I said "yes".
She said "that sucks"


I get pretty pissed off when they confiscate my water bottle when I have to travel by air, but I guess I can live with this. But reading my email, and the email of my wife and kids, is totally unacceptable. I would never read my wife's or kids' email - such behavior is totally unacceptable in a civilized society.

All this is done in the name of saving us from terrorism - after 9/11 western politicians promised that they would not allow acts of terrorism to change our way of life. Well, spying on 9 million people who have not broken the law is a funny way of "not changing our way of life."

This really pisses me off.

/Joe Armstrong

[In other areas Sweden is a pretty decent place to live, with decent human values, but this new legislation is totally unacceptable]

[Note also - all mail to the erlang mailing list will be monitored by FRA - if this upsets you then please mail the people responsible (see above). If this worries enough people we can move the list to a country that respects human rights]


Monday, May 26, 2008

The Road we didn't go down

I've been following an interesting discussion on the Erlang mailing list where Steve Vinoski and friends have been telling us what's wrong with RPC.

The discussion started on 22 May; the general topic of conversation was the announcement that Facebook had deployed a chat server written in Erlang.

In one of the posts Steve said:
"What all those years of CORBA taught me, BTW, is that RPC, for a
number of reasons, is generally A Really Bad Idea. Call it a hard-won lesson. The Erlang flavor of RPC is great because the entire Erlang system has distribution fundamentally designed and built into it, but for normal languages, RPC creates more problems than it solves."

-- Steve Vinoski
Subsequent posts asked Steve to elaborate on this.

Steve posted a long and brilliant summary of the problems with RPC to the Erlang mailing list:
"But if you don't have the time or energy, the fundamental problem is that RPC tries to make a distributed invocation look like a local one.
This can't work because the failure modes in distributed systems are
quite different from those in local systems, ..."

-- Steve Vinoski
Precisely - yes yes yes. As I read this my brain shouted YES YES YES - thank you Steve. Steve wrote more about this in RPC under fire ...

This is the road we didn't go down

Steve went down this road and saw what was there and saw that it stunk, but he came back alive and could tell us what he had seen.

The fundamental problem with taking a remote operation and wrapping it up so that it looks like a local operation is that the failure modes of local and remote operations are completely different.

If that's not bad enough, the performance aspects are also completely different. A local operation that takes a few microseconds, when performed through an RPC, can suddenly take milliseconds.

If programmers cannot tell the difference between local and remote calls then it will be impossible to write efficient code. Badly placed RPCs in the middle of some mess of software can (and does) destroy performance.
I have personally witnessed the failure of several large projects precisely because the distinction between local and remote procedure calls was unclear.
Note that this factor becomes even worse in large projects with dozens of programmers involved. If the team is small there is a chance that the participants know which calls are local and which calls are remote.

How do we do things in the Erlang world?

All Erlang programs are composed from sets of parallel processes, these processes can create other processes and send and receive messages. Doing so is easy and is a lightweight operation.

Processes can be linked together for the purposes of error handling. If A is linked to B and A fails, then B will be sent an error signal, and vice versa. The link mechanism is completely orthogonal to the message send/receive mechanism.
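Here is a small sketch of a link delivering a failure without the failing process sending any message. It assumes the watching process traps exits (process_flag(trap_exit, true)), which turns the incoming exit signal into an ordinary {'EXIT', Pid, Reason} message; without trapping, the watcher would itself be killed. The watcher runs in a fresh process so the caller's own trap_exit flag is left alone. The module name link_demo is hypothetical.

```erlang
-module(link_demo).
-export([demo/0]).

%% Returns the exit reason that the linked watcher observed.
demo() ->
    Parent = self(),
    spawn(fun() ->
                  %% Convert exit signals from linked processes into messages.
                  process_flag(trap_exit, true),
                  %% spawn_link creates the process and the link atomically.
                  Pid = spawn_link(fun() -> exit(boom) end),
                  receive
                      {'EXIT', Pid, Why} -> Parent ! {result, Why}
                  end
          end),
    receive
        {result, Reason} -> Reason
    after 1000 ->
        timeout
    end.
```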

When we are programming distributed systems, various forms of RPC are often extremely useful as programming abstractions, but the exact form of the RPC varies from problem to problem and varies with architecture.

Freezing the exact form of an RPC into a rigid framework and disregarding the error cases is a recipe for disaster.

With send, receive and links the Erlang programmer can easily "roll their own RPC" with custom error handling.

There is no "standard RPC stub generator" in Erlang nor would it be wise for there to be such a generator.

In a lot of applications the simplest possible form of RPC suffices. We can define this as follows:
rpc(Pid, Request) ->
    Pid ! {self(), Request},
    receive
        {Pid, Response} ->
            Response
    end.
Nothing complicated: this code just sends a message and waits for the reply (the server is expected to tag its reply with its own Pid).
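The rpc/2 above only works if the server follows the same convention: requests arrive as {From, Request} and replies go back as {self(), Response}. Here is a sketch of a matching server, with a made-up {double, X} request; the module name rpc_demo is hypothetical.

```erlang
-module(rpc_demo).
-export([start/0, rpc/2]).

%% Start a server process running loop/0.
start() ->
    spawn(fun loop/0).

loop() ->
    receive
        {From, {double, X}} ->
            From ! {self(), 2 * X},      % tag the reply with our own Pid
            loop();
        {From, stop} ->
            From ! {self(), stopped}     % reply, then fall off the end and exit
    end.

%% The simplest RPC, as defined above.
rpc(Pid, Request) ->
    Pid ! {self(), Request},
    receive
        {Pid, Response} ->
            Response
    end.
```

Usage: Pid = rpc_demo:start(), rpc_demo:rpc(Pid, {double, 21}) returns 42.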

There are many variations on this theme. The simplest RPC waits forever, so if a reply never comes the client hangs. We can fix this by adding a timeout:
rpc(Pid, Request, Time) ->
    Pid ! {self(), Request},
    receive
        {Pid, Response} ->
            {ok, Response}
    after Time ->
        {error, timeout}
    end.
Suppose we want the client to fail if the remote process dies in the middle of an RPC; then we define:
rpc(Pid, Request) ->
    link(Pid),
    Pid ! {self(), Request},
    receive
        {Pid, Response} ->
            Response
    end.
The addition of the link ensures that the client terminates if the server crashes during the RPC (unless the client is trapping exits, in which case it receives an exit message instead).

Suppose we want to "parallelize" two rpcs:
rpc(Pid1, Pid2, Request) ->
    Pid1 ! Pid2 ! {self(), Request},
    receive
        {Pid1, Response1} ->
            receive
                {Pid2, Response2} ->
                    {Response1, Response2}
            end
    end.
(Don't worry, this does work: the order in which the replies arrive is irrelevant, because receive selects messages by pattern.)

The point I am trying to make through a number of small examples is that the level of granularity in the RPC AND the error characteristics is under the precise control of the programmer.

If it turns out that these RPC abstractions do not do exactly what we want then we can easily code our solution with raw processes and messages.

So, for example, going from a message sequence diagram to some Erlang code is a trivial programming exercise.

"Standard" RPC also makes the following crazy assumption - "that the reply should go back to the client".

Interactions of the form tell X to do Y then send the result to Z are impossible to express in a standard RPC framework (like SOAP) but are simple in Erlang:
rpc(tell,X,toDo,Y,replyTo,Z) ->
X ! {Z, Y}.
(This assumes the convention I'd used earlier of always sending two-tuples as messages with the Id of the process that is expecting a reply as the first element of the tuple (using self(), in the earlier examples we forced the reply to come back to the originator)).
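Here is a runnable sketch of "tell X to do Y, then send the result to Z" with three processes. The worker X replies to whichever Pid is in the first element of the message, so the caller can name a third process Z as the recipient. The {square, N} request, the tell/3 helper and the module name tell_demo are all hypothetical; Z forwards what it receives to the caller only so that the result can be observed.

```erlang
-module(tell_demo).
-export([demo/0]).

%% Tell X to square 7 and send the result to Z; returns what Z received.
demo() ->
    Me = self(),
    X = spawn(fun worker/0),
    %% Z only accepts replies from X, then forwards the result to us.
    Z = spawn(fun() ->
                      receive {X, Result} -> Me ! {z_got, Result} end
              end),
    tell(X, {square, 7}, Z),
    receive
        {z_got, R} -> R
    after 1000 ->
        timeout
    end.

%% Same convention as above: the reply-to Pid goes first in the message.
tell(X, Y, Z) ->
    X ! {Z, Y}.

worker() ->
    receive
        {ReplyTo, {square, N}} -> ReplyTo ! {self(), N * N}
    end.
```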

Let's suppose we want to add versioning to our protocols, this is easy:

rpc(Pid, Request, Vsn) ->
    Pid ! {self(), vsn, Vsn, Request},
    receive
        ...
    end.

The point here is to show that things like versioning, error handling, parallelisation, etc. are easily added if we expose the interface between messaging and function calls and allow users to custom-build their own forms of interaction with remote code.

Of course, certain common patterns of interaction between components will emerge - these are what are baked into the OTP libraries.

What is OTP?

OTP is a set of battle tested ways of doing things like RPC in fairly common cases. The OTP methods do not cover all error cases but they do cover the common cases. Often we have to step outside the OTP framework and design our own specialised error and recovery strategies but doing so is easy, since OTP itself is a message driven framework and all we have to do is strip away the stub functions that send and receive the message and replace these with our own custom routines.

OTP should be re-branded as "OTP on Rails"; it's really just a framework for building fault-tolerant systems.

Does this method of building software without excessive reliance upon one particular flavour of RPC work?

I'd say the answer is Yes and Yes with a vengeance.

This is the way we have built real-time server software at Ericsson for decades. We have used PLEX, EriPascal, Erlang and C++ with Rose-RT for years. The common factor of all of these is the non-reliance on RPC. We specify protocols then we terminate them with a number of different technologies.

These protocols are way more complex than can be specified using RPCs but by exposing the protocols and the failure modes we can make systems that are highly reliable.

I'd always thought that if we did things with RPCs then we'd run into trouble.

Steve went there and did that and found the problems - we went down a different road.

What's really interesting is that Steve's world and our world are starting to collide - we have a lot to learn from each other.