Good Ideas in Programming

Mar 17, 2018   #FP 

Collecting a bag of what I think are good ideas in the history of programming, which lead to better thinking about system design, implementation and evolution.

Note: This post will continously be edited to include ideas as they come to my mind as worthy of being included here. Also, the ideas are in no particular order and this is just a brain dump.

Edits

Procedures and protocols

The fundamental unit of program organization is the “procedure” which captures “how to” information about doing something. It also serves as an abstraction mechanism where procedures can call on other procedures to accomplish parts of their tasks.

Protocols are collections of messages that can be passed between processes with an agreement about what those messages stand for and what they will result in. These are also known as “interfaces”. In the simplest synchronous languages, they tend to be implemented as structures containing pointers to procedures/functions. Languages may make protocols explicit or by convention.

The reason I’m clubbing “procedures and protocols” in one place is that these two alone are adequate to build fairly large systems.

Evolving interfaces to objects

Microsoft’s COM, with all its craziness, had one good idea worth stealing - which is how to version interfaces and how to get “objects” to support multiple interfaces in an adaptable manner.

For this, each interface was identified by a UUID. All interfaces derived from an IUnknown which provided a key method called QueryInterface.

In C++ parlance, this would read -

class IUnknown {
    public:
        virtual void *QueryInterface(UUID iid) = 0;
}

Every object would eventually implement IUnknown, meaning you could actually ask an object about which interfaces it implemented at runtime. This made some cool things possible -

  1. You could write version adaptive code where you assign different IDs to different protocol versions.
  2. The object is free to return whatever it wants and is not constrained to return a pointer to itself. This means it could return a proxy which adds constraints on top of a protocol.

Though this looks like a dynamic_cast, it is more flexible.

Auto-release pool

Introduced in Objective-C, the auto-release pool is a thread-local entity that traps all allocations into its scope, so that all allocated objects after the creation of a pool up to the pool’s release could be considered to be owned by the pool.

This permits a programming style in low level languages that feels very close to a GC’d language without memory problems … as long as reference cycles were avoided.

In particular, auto-release pools are useful within event loops to release memory at every turn of the loop. This is a great compromise between requiring code to manage memory explicitly everywhere using smart pointers, and admitting the uncertainties of running a GC.

Message passing as central to OOP

Object oriented programming got hijacked soon after its introduction and got shaped into an abominable collection of tools and concepts for designing programs using “patterns”. This lost the key insight of early OOP systems such as Smalltalk that reified messages sent between objects as first class entities in the system. Even Objective-C provided that.

Designing OOP systems which don’t focus on the messages leads to systems that trigger and accelerate hair loss.

Message passing concurrency

This is a very easy to understand concurrency model whose first major implementation was in Erlang. Today, many other languages including Golang and Haskell use this model.

The key idea is to have isolated processes that can only communicate by passing messages to each others by copy.

Though there are some subtle differences about when process switching happens, etc., for the most part this leads to maintainable and performant code for large distributed systems as well as complex interactive user interfaces.

Though Erlang made it available at language level, the value of this approach was recognized for GUI programming very early on, where Windows 3.1 and the early MacOS versions all provided message handling functions associated with windows - ex: WndProc - and facilities to post messages between windows. This carried over fairly well when pre-emptive multi tasking was introduced into the GUIs of these OSes.

The idea of threads/processes having their own memory was also partially adopted in some systems such as Symbian OS, where each thread allocated its own heap.

Tracking memory ownership using a type system

The only instance I know of where the type system is used to keep a check on whether a piece of code can legitimately access (read/modify) some portion of memory, across threads, is Rust.

While it is (I think necessarily) restrictive way to provide safety guarantees, the balance struck by Rust is excellent IMHO.

Programming with the AST

The LisP family - Common Lisp and Scheme in particular - use a syntax that is pretty much programming directly with the AST of the language. In other words, there is almost no “syntax” for them. The strength of this idea is that these programs could encompass procedures that consume and produce these ASTs, leading to a “program writing a program” kind of approach to abstraction, known as macros.

Functional notation in Haskell

The type system and function notation in Haskell (other ML languages too) are significant advances in “tools for thought”. The f x y notation for applying a function f to two arguments x and y is interpreted as f x yielding a function of y, which is then applied to y. Furthermore, it means f by itself is just a value, a constant, and there is no way in this notation to distinguish between “function applied to no arguments” and a “constant value”.

I’ve written more about this in Functional thinking for fun and profit.

“Descriptors” of B5000

The Burroughly B5000, afaik, introduced the idea of separate memories holding program and data, and data pointers always being accompanied by the size - called “descriptors”. This terminology also landed in Symbian OS, which needed high degree of reliability in applicaiton engineering. Clubbing data size value with the data pointer made validity checks easy and out of range memory accesses rarer due to the ease of testing for these.

Current systems permit a region of data to be interpreted and run as code. The B5000 didn’t permit that. It had the reputation of being a machine that never crashed. These concepts probably contributed to that reputation.

Disorderly programming

By that I’m referring to the collection of ideas in Daedalus, Bloom and Eve. The key idea here is that you give up control over orderly happenings and only talk about what relationships must hold over collections (“lattices”?) of records.

The word “disorderly” is taken from Peter Alvaro’s “Disorderly labs”. It is used in multiple senses, but the sense in which I’m referring to it here is w.r.t. distributed systems.