A hypothetical conversation with an 'imperative programmer'

Mar 7, 2019   #FP 

(This was originally written on May 25, 2015. Publishing it ‘cos a former colleague actually recalled it after all this time when even I had forgotten about it.)

In which an imperative programmer bugs a functional programmer to explain the whys and wherefores and hows of using a pure functional programming language in his normal work day.

Episode 1

In which the imperative programmer dips toes into a purely functional language.

  • IP: How do I assign values to variables?
    • FP: Hmmm … no, the concept of “variable” doesn’t exist. You don’t think in terms of “modifiable memory locations”. i.e. no reassignable memory locations. What might look like “assigning to a variable” in functional languages is just labelling a value so you can refer to it in other parts of the program. Indeed, it is not uncommon to see the order of declaration in a bunch of such “bindings” being irrelevant to the meaning of a program (ex: Haskell).
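A minimal Haskell sketch of that last point (names invented for illustration): each binding just labels a value, so the order of declaration doesn’t matter.

```haskell
-- The order of these top-level bindings is irrelevant to the program:
-- 'area' refers to 'radius' even though 'radius' is bound later.
area :: Double
area = pi * radius * radius

radius :: Double
radius = 2.0

main :: IO ()
main = print area
```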
  • IP: Can I print the value of this variable at this point?
    • FP: Hmmm .. no, that’s an impure action. Besides, the order of evaluation of things can be lazy, so what do you mean by “at this point” anyway? .. and I just told you there are no “variables”.
  • IP: How do I update this counter?
    • FP: Hmmm … were you even listening? I told you no “variables” exist. So you can’t “update” them.
  • IP: Ok I have this array and I want to increment the 5th item. How do I do that?
    • FP: Hmmm .. you can make a copy of the array with the new value as the 5th item.
  • IP: Wat? Are you insane?
    • FP: Hmmm … not really. There are ways to do that efficiently by sharing structure between the new array and the previous one.
    • FP: As for your earlier question about updating a variable: although you cannot express such an update, because mutation is not possible, you can still express the concept of “updating a variable” as an action to be performed.
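A hedged sketch of the “copy with the new 5th item” idea, using the standard Data.Array (a real persistent structure like Data.Sequence would make such updates cheaper; the values here are made up):

```haskell
import Data.Array

-- "Increment the 5th item" becomes: build a new array that differs
-- from the old one at index 4 (0-based); the original is untouched.
incrementFifth :: Array Int Int -> Array Int Int
incrementFifth arr = arr // [(4, arr ! 4 + 1)]

main :: IO ()
main = do
  let old = listArray (0, 5) [10, 20, 30, 40, 50, 60]
      new = incrementFifth old
  print (old ! 4)  -- the original still reads 50
  print (new ! 4)  -- the copy reads 51
```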
  • IP: Wat? How does that help? How do I get anything done with this thing?
    • FP: Hmmm … you’ve got to really change the way you think about programming.
  • IP: But … why?
    • FP: You stand to get a lot more opportunities to automatically parallelize your code and gain performance.
    • FP: Your code is consequently more testable.
    • FP: Heck you need to test less if you use a functional language with a rich static type system like Hindley Milner. You also get clear separation of actions with side effects from pure computation.
  • IP: So you’re telling me that if I give up everything I think I know programming a computer is about, my code gets better?
    • FP: Pretty much.
  • IP: Where’s the proof?
    • FP: (Shows some one-liners that run painfully slowly in Haskell. Especially the famously wrong one-liner “sieve” algorithm. Or maybe the quicksort that uses far more memory.)
  • IP: That’s short alright, but are you kidding me? Where is the “performance”? Look what I can do with just plain C!
    • FP: With FP, you can have efficient C code be auto generated for performance when you want it.
  • IP: But isn’t that a much harder task than just coding it up in C at the start?
    • FP: No. If you model your problem domain correctly, you gain a lot of flexibility and abstraction without giving up performance.
  • IP: Again, where is the proof of that?
    • FP: Check out Pan, Fran, graphics drivers for GPUs.
    • FP: (Provide some summary of these.)
  • IP: Now you’re asking me to write a compiler? Anyway why should I use FP to generate such C code rather than (insert-ruby-python-whatever)?
    • FP: Because FP is mathematics, you can reason far more about the code in a pure FP language than in a language where you can have side effects. Btw in a pure FP language, you can reason about side effects too.
  • IP: What do you mean by “side effect”? When I say printf("blue moon"); the very effect that I want to have is to have the string “blue moon” printed.
    • FP: Ok well, let’s just say “effects”. You can reason about effects too.
  • IP: Why should I bother with “reasoning about effects”?
    • FP: If you throw effects willy nilly into a multi-threaded or multi-process or distributed system, you will have a system that is hard to control, full of race conditions, and failure prone. With FP, you use the same techniques to reason about distributed systems as with “normal” programming. As I said, it is all mathematics .. this property is known as “referential transparency”.
  • IP: Referential what?
    • FP: The term “referential transparency” means that wherever you see a function being “called” in a particular way, you can replace that call with its evaluation result everywhere that same call occurs, without changing the meaning of the program. This is what you used to do with algebra in school.
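A small illustration in Haskell (names invented): since `square 3` always evaluates to 9, every occurrence of the call can be replaced by its result without changing the program’s meaning.

```haskell
square :: Int -> Int
square x = x * x

-- These two definitions are interchangeable: replacing the call
-- 'square 3' with its result 9 cannot change the program's meaning.
viaCall, viaResult :: Int
viaCall   = square 3 + square 3
viaResult = 9 + 9

main :: IO ()
main = print (viaCall == viaResult)
```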
  • IP: What about functions that can be called without any arguments?
    • FP: In a pure functional language, these are called “constants”. Even the syntax of Haskell enforces this notion, preventing you from even expressing the concept of a “function call with no arguments” as something distinguishable from a “constant”.
    • FP: In most languages, function calls are notated as “func(arg1,arg2,..)” or, in the lisp dialects, as “(func arg1 arg2 …)”. In Haskell, they are notated as “func arg1 arg2 ..” and usually they aren’t called “function calls”. This notation enforces purity automatically. In the conventional notation, 3, 2, 1 and 0 argument functions are expressed as “func(a1,a2,a3)”, “func(a1,a2)”, “func(a1)” and “func()” respectively, while a function as a value is notated simply as “func”. In Haskell, the same sequence would go as “func a1 a2 a3”, “func a1 a2”, “func a1” and “func”. Notice the last one: there is no distinction between “func as value” and “func called with no arguments”. That is, it is syntactically impossible to express the concept of “function called with no arguments” .. because if you do, it will simply be taken as a value.
    • FP: This concept extends to the notation of types in Haskell as well. The type of a function that takes 2 arguments and “returns” a value is expressed as “a -> b -> c”, where “a” and “b” are the argument types and “c” is the return type. Therefore a function that takes one argument would have a type like “a -> b” and so one with no arguments would be “a” - i.e. the type of a “function with no arguments” is indistinguishable from the type of a normal “value”.
    • FP: This property of the Haskell syntax shows why it is useful to have a notation that enforces a desired way of thinking. Programming, after all, is notation for communicating “how to” procedures (Abelson).
    • FP: There is another language that has this property - Factor (http://factorcode.org/) - which falls in the category of concatenative languages, influenced by Forth. You may not have heard of Forth, but you’d have heard of Postscript, which is very Forth-like. (Yes, Postscript is actually a programming language!)
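The notation point above can be sketched in a few lines of Haskell (names invented): applying fewer arguments than a function expects just yields another value, and a bare function name is itself a value.

```haskell
add3 :: Int -> Int -> Int -> Int
add3 a b c = a + b + c

-- Applying fewer arguments simply yields another value (a function
-- of type Int -> Int):
addToTen :: Int -> Int
addToTen = add3 4 6

-- And bare 'add3' is itself a value; there is no separate syntax
-- for "calling it with no arguments".
sameFunction :: Int -> Int -> Int -> Int
sameFunction = add3

main :: IO ()
main = print (addToTen 5)
```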
  • IP: Ok, to summarize, no variables, no assignments, only constants and functions that must always return the same value for the same arguments, also something about a static type system. Does that cover what FP is about?
    • FP: If you just remember that a) your functions must return values, and b) they must return the same value given the same arguments at any time, the rest are simply consequences of this. Well, except the type system, which usually concentrates on the part about “how to make data and constrain operations on data”.
  • IP: Doesn’t this mean that if I limit myself to that constraint, I can use pretty much any programming language to do FP?
    • FP: Yes.
  • IP: Then why bother making a new language like Haskell for it?
    • FP: Because it is useful to have a system enforce these constraints on you so you don’t shoot yourself in the foot.
    • FP: Also because it is useful for you to be able to express the constraints that you want your system design to meet and have the language enforce them for you so you don’t shoot yourself in the foot.
  • IP: I don’t know what I’d do without being able to mutate memory locations.
    • FP: Actually, you already do. You use git, don’t you? Before version control systems, we used to modify files, continuously losing history. We only had access to the latest version of a file. With git, you have access to all versions of your code. When you “commit” a new version, in principle you’re making a copy of your entire project with that one change you made. In practice, git only stores the change you made, so that it can reproduce the latest version from history and changes.
    • FP: Data structures in pure functional languages share this trait with git and are called “persistent data structures”. Such history doesn’t end up eating all your RAM over time because once live code stops referring to old versions, they get GCd away.
  • IP: Where do pure functions feature in this?
    • FP: “git commit” is like a pure function. Whenever you apply git commit on a repo in a given state, with a known set of changes in the working copy, you get a new repo that is always in the same state (represented by the hash). This is why you can do “git reset” followed by “git commit” to get back to the same repo as many times as you please.
    • FP: Immutable data structures and pure functions go together. So if you hang on to “pure functions all the way through”, you can’t escape FP.
  • IP: Give me something more than “just pure functions” to hang on to, will you?
    • FP: The key is to approach modeling your problem domain in terms of function composition - i.e. map things that compose in your problem domain on to functions that compose with a similar algebra.
  • IP: I don’t think I understand what you mean.
    • FP: If you have a problem domain that is amenable to computation, then it can be computed by pure functions. That much is guaranteed by theory.
    • FP: You have entities in your problem domain that combine using certain operations to form other entities. To reap the benefits of FP, you map these entities on to functions and the combination operations to function composition.
  • IP: That’s too abstract. A concrete example please?
    • FP: For example, your traditional database table has one column for ids and one or more columns for associated values. A database lookup operation can be thought of as a function which when given the row id, will produce the values for other columns on that row. A simple database “join” operation is simply the composition of two such lookup functions - i.e. you can compose two tables and pretend you have a third table that is in some sense a merger of the two tables.
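A minimal Haskell sketch of that table example (the table contents and names are invented): each table is a lookup function, and the “join” is just composition of the two lookups.

```haskell
import qualified Data.Map as Map

-- Two "tables" modelled as lookup functions (ids/values invented):
employeeDept :: Int -> Maybe String
employeeDept = flip Map.lookup (Map.fromList [(1, "eng"), (2, "sales")])

deptHead :: String -> Maybe String
deptHead = flip Map.lookup (Map.fromList [("eng", "Ada"), ("sales", "Bob")])

-- A simple "join" is the composition of the two lookups (Kleisli
-- composition here, since each lookup may fail to find a row):
employeeDeptHead :: Int -> Maybe String
employeeDeptHead emp = employeeDept emp >>= deptHead

main :: IO ()
main = print (employeeDeptHead 1)
```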
  • IP: You seem to suggest that databases are functional, but I treat them as permanent mutable storage. Isn’t that a contradiction?
    • FP: Yes it is. I described how reading databases can be expressed functionally, but writing is a bit more complicated, though not very much unlike the way git works.
    • FP: So traditional databases, if you include the ability to write data into them, are indeed not purely functional. In order for them to be functional, you have to treat the database as one gigantic constant value, which you can make copies of with incremental changes.
  • IP: Really? Is there an example of a database that works like that?
    • FP: I already told you about git. Check out datomic. Log structured storage also qualifies as functional. For a simple example, redis’s “aof” file format is one such database whose history you can revisit whenever you want to. Kafka is a more complex example. (With more subtleties, so maybe I shouldn’t mention it here?)
  • IP: I think I’m starting to get it. What about a web server? How do you “model” a web server in terms of function composition?
    • FP: That’s a complex topic. Shall we reserve it for another day?
  • IP: I don’t think it ought to be a complex topic.
    • FP: Well, it is, if you consider the slew of problems that naïve implementations of web servers throw up, especially when it comes to state management, concurrent requests, database locking and so on.
  • IP: Ok, perhaps later then.

Episode 2

In which the discussion dives deeper with more examples from the real world.

  • IP: I was trying out modeling database tables using functions and table join as function composition as you suggested. I think there is a fundamental inefficiency there that I don’t see how you can overcome. It looks like such a “joined” table based on function composition will have to make two round trips to the DB server to complete each join query. How do you explain that?
    • FP: To restate, did you model table join as something like join(table1, table2)?
  • IP: No. It was more like function joined(x) { return table2(table1(x)); }.
    • FP: Ok. Since that is essentially the composition of plain functions, I can rewrite that as a higher order composition function like I stated just now.
  • IP: Ok. So what? The performance issue I mentioned still remains, doesn’t it?
    • FP: Yes it does. However, the point is that the abstraction that you achieved now will faithfully model the domain of table joins no matter what underlying implementation you choose to have.
  • IP: It seems like no matter what implementation I provide for the functions, I can’t get around making two trips.
    • FP: You have to go one step further than just thinking of table1 and table2 as functions that actually perform the query and produce the results. For example, you can think of table1 and table2 as functions that produce instructions to a DB engine on what query to perform on what table and the “join” function as generating an instruction on how to join the two tables together to produce the final result.
    • FP: For example, you can have the join, table1 and table2 functions produce SQL instructions instead of actually performing the queries.
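A hedged sketch of that idea in Haskell, with an invented Query type: the “tables” and the “join” build a description of the query, and a separate step renders it to SQL so the DB engine can execute the whole thing in one round trip.

```haskell
-- A tiny invented query description (not a real SQL library):
data Query = Table String | Join Query Query String

table1, table2 :: Query
table1 = Table "employees"
table2 = Table "departments"

-- Combining tables produces a bigger description, not a DB call:
joinOn :: String -> Query -> Query -> Query
joinOn col a b = Join a b col

-- A separate interpreter renders the description to SQL text:
toSQL :: Query -> String
toSQL (Table t)      = t
toSQL (Join a b col) =
  "(" ++ toSQL a ++ " JOIN " ++ toSQL b ++ " ON " ++ col ++ ")"

main :: IO ()
main = putStrLn (toSQL (joinOn "dept_id" table1 table2))
```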
  • IP: Isn’t that cheating?
    • FP: No it is not. Modeling a domain using function composition from the ground up can help you understand your domain better, even if you start from an operational level.
    • FP: You retain the benefits of the abstraction and all reasoning you can achieve with the abstraction. This is a very common high performance architecture that you’ll find everywhere - where one layer of the architecture is computing instructions for the next layer beneath it, and so on until you hit the hardware. The whole point about this being to avoid deep round trips.
  • IP: Really? Where else does such a thing occur?
    • FP: You send instructions (“shaders”) and data (“textures”) to the GPU and the GPU performs the specified graphics computations and sends you back the results (“frame buffers”).
    • FP: You send machine code to the processor and the processors today convert these to microcodes before handing them over to their innards for execution.
    • FP: The SQL you send to your DB server is also part of this game.
    • FP: You send normal opcodes to a JIT compiler and it produces optimized machine code for your processor, which converts it into microcode and so on. Not to mention that this same processor is doing the JIT as well.
    • FP: You think you’re reading data from your disk? You’re actually just sending pipelined instructions to the disk controller … and some are high level enough to have a garbage collector in them.
    • FP: Conal Elliott’s “Pan” can (could) be used to generate Photoshop plugins because the Haskell DSL implementation generates vector optimized plain C code from a description that is as abstract as “an image is a function from (x,y) -> (r,g,b,a)”.
    • FP: An “Elm” program computes the HTML to render as a virtual DOM, which is then processed by the virtual-dom module to minimize interactions with the DOM, resulting in high speed UIs (even when compared with other imperative implementations) while maintaining high code abstraction.
    • FP: The RDDs of Spark achieve their performance gains by storing instructions on how to construct the RDDs, which can then be used to provide resilience as well as optimization of the combined execution DAG.
    • FP: I could go on, but does that help?
  • IP: Crap! My language supports “interfaces”. Can’t I just use that to write alternative implementations too?
    • FP: These “interfaces” in languages like Java/C++ do not lead to the kind of layered abstraction I mentioned if the thinking behind using them is not along the lines I’ve been talking about. You’ll mostly end up doing deep round trips with systems built exclusively with that architecture … if you don’t think in terms of function composition, that is.
  • IP: You said I can do FP in any language, right? How do I do FP in Java 7, which doesn’t support closures or even functions?
    • FP: A function is simply something you can call with some arguments and get a result computed for you. You can think of a function or closure, therefore, as an object that implements an interface like this -
      • public interface Function<Arg,Result> { Result call(Arg arg); }
    • FP: Your implementing object can have a bunch of “final” member objects that you access in your implementation of “call”. This gives you closures. You can’t get the short code you see with functional language examples, but conceptually they are equivalent to this “interface”. To avoid side effects, deny yourself the semicolon within function bodies, except as the terminator to “return” statements and constant declarations. So if you think using the principles of FP, you can implement in languages like Java too. Just remember that you can only work with constant values and methods that don’t modify objects. All the way through!
    • FP: Languages designed for FP, however, give you a number of other goodies such as algebraic data structures, pattern matching and list comprehensions that make life easier when thinking in FP.
    • FP: Actually, in an OOP language, if you resolve to work only with constant “objects”, then in effect you’ll be thinking functionally. Clojure’s functional data structures are built on top of the JVM, so this is entirely normal.
    • FP: All that said, if your language even affords a little bit of a sneak into destructive state modification, it is possible to throw away most of the benefits we discussed with a single line of code. So sticking to the pure functional approach in an otherwise imperative language requires some disciplined thinking on the part of the programmer.
    • FP: If you go down the path of using an imperative language functionally, as a programmer - i.e. a professional who strives to not repeat herself - you would soon find yourself writing programs to help you stay in the pure functional realm … in other words you would invent close-to-pure functional languages or sub-languages, implement them and live within them eventually, just so you can reason with them with fewer errors. Why not instead start with one that’s already had lots of that thinking done for you?
  • IP: You mean I can’t use printf(); even to debug?
    • FP: That’s right … but only in principle.
    • FP: A totally pure functional language is somewhat useless since much of what happens with a computer is to modify the state of something or the other. However, you can use a functional language to specify these changes, store these specifications in collections, reason about their influence (ex: concurrency) etc. Then you can have your system’s top level take these specifications and actually perform them, so your program becomes useful.
    • FP: This applies to debugging too. Some functional languages expose impure functions like “log :: String -> x -> x”, where “log string” behaves like the identity function except for printing the string and its argument on the console. This exposes a functional language’s evaluation model and works better for “eagerly evaluated” languages. It is less helpful in “lazily evaluated” languages like Haskell, in the sense that evaluations can appear multiplexed and not quite in the order you’d expect from looking at the source code of your program. In Haskell, a “log” like the above is “Debug.Trace.trace”.
    • FP: There have been some experimental systems that provide a better way to debug programs. A “declarative debugger” called “Buddha” was developed by a research team. For Elm, you have a “time traveling debugger” with which you can step backwards in time to a point of misbehaviour, change some code, hot-load your fix and continue from that point onwards. The entire history of your program becomes available to you for examination, unlike most systems where you have to rely on logs to reconstruct the error states.
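A sketch of the Debug.Trace.trace idea mentioned above (the function and message are illustrative):

```haskell
import Debug.Trace (trace)

-- 'trace' acts as the identity on its second argument while printing
-- the first to the console; under lazy evaluation the message appears
-- only when (and in whatever order) the value is actually demanded.
logged :: Int -> Int
logged x = trace ("doubling " ++ show x) (x * 2)

main :: IO ()
main = print (logged 21)
```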
  • IP: Do I have to think SO MUCH to program? Imperative programming feels a lot easier.
    • FP: Functional languages demand a much higher think/code ratio than imperative languages. The consequence is that you get much higher quality code, abstractions and domain models. A page of code in Haskell can do the equivalent of ten pages of C.
    • FP: The tragedy of computing education is that we spend literally a decade teaching kids about mathematical reasoning, and then throw it all away when they begin tinkering with computers. Treating code like mathematical statements about the behaviour of the system you want to build is a much more effective way to communicate your ideas to other people, let alone computers.
    • FP: We forget that working from operational knowledge to declarative knowledge about a system is a very powerful thing to do and something that can transform society fundamentally.
  • IP: You haven’t mentioned web servers yet.
    • FP: You haven’t asked me about them yet :)