If you know about my slowly ongoing CoinZdense project, you might know I've recently switched from Rust back to C++ for the native implementation of the core library, because of my struggles with language bindings in Rust. On my long-term roadmap were Monte and Elixir, two really interesting languages that I've worked with way too little, so adding them to the list of programming languages to support is going to be a challenge.

I'm a big fan of object-capability languages, of functional programming languages, and of the actor model of computation. Because CoinZdense is going to include a least-authority sub-key management API, what better languages to (eventually) write Web 3.0 bots and Web 3.0 Layer-2 nodes in than an ocap language or a functional actor model language?

For years I've been fantasising about what my ideal language would look like, with bits of E, Haskell, Erlang and C++. One of the main features I pondered was a curious little bit of C++, the deprecated smart pointer auto_ptr, a subject that I wrote a blog post about way back in 2011. Another thing I have been pondering is that while I love the concept of closures, their pervasive use leads to huge monolithic source files.

Mix in principles for safe defaults, and I had quite a stack of features that I felt my ideal language would have to have. While I don't have the time to create a complete general purpose language like Monte or Elixir, the thought came to me that that shouldn't be my goal. My goal is to have CoinZdense, and access to any chain that in the future may want to integrate CoinZdense, in a language suitable for writing Web 3.0 bots. A Domain Specific Language or DSL. A minimal language with strict design principles and usage patterns that we aren't going to deviate from, because deviation would blow up the specs and make a CoinZdense-centered DSL more work and effort than making CoinZdense available for Monte and Elixir.

For now it's a path I'm exploring. Maybe this mini language, this DSL, won't ever exist and I'll go back to the old plan, but for now it's an idea that I want to explore deeply enough to figure out if it's the right choice.

The description below is incomplete; it outlines some of the core principles that I hope are going to be the backbone of the DSL.

Merg-E

Merg-E (pronounced as "merge") aims to be a simple embeddable language that tries to combine a subset of functional programming with a subset of capability based languages.

Merg-E has no classes or objects. The name is a play on the word "merge", as merging multiple source files through scoped imports into a closure-centric application is an important aspect of this simple language. The E language is an inspiration for the OCap/POLA design of the language, though Merg-E tries neither to be a full-on functional language nor a complete ocap language.

Implicit constness and transitive mutability

Everything in Merg-E is implicitly const unless it is explicitly marked as mutable. The only exception to that rule is the ambient keyword that we will look at soon. A function is transitively mutable whenever it is given explicit access to mutable state in the outer scope of the closure where it lives; otherwise it is const. For safety purposes, the user needs to make the declared mutability status of a function match the mutability that it gets from this transitive rule. Failing to do so will result in an exception (interpreter) or a compilation error (compiler). Note that mutable function arguments don't make the function itself mutable; only the outer scope mask of the function fingerprint does. A mutable and a const can't be assigned to each other.
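
To make the transitive rule a bit more concrete, here is a minimal Python sketch of how a checker might apply it. This is not how any Merg-E implementation works; the names Binding, FunctionDecl, transitively_mutable and check_mutability are made up for illustration, and the model assumes that a captured mutable function counts as mutable state just like a captured mutable variable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    """A name taken from the outer scope (hypothetical model)."""
    name: str
    mutable: bool


@dataclass(frozen=True)
class FunctionDecl:
    """A function declaration as a checker might see it (hypothetical model)."""
    name: str
    declared_mutable: bool   # was the declaration prefixed with lang.mutable?
    captures: tuple = ()     # outer-scope bindings named in the ::{...} mapping
    arguments: tuple = ()    # call arguments; these never influence the result


def transitively_mutable(fn: FunctionDecl) -> bool:
    """A function is mutable if any captured outer-scope binding is mutable."""
    return any(binding.mutable for binding in fn.captures)


def check_mutability(fn: FunctionDecl) -> None:
    """Raise if the declared status differs from what the transitive rule gives."""
    actual = transitively_mutable(fn)
    if fn.declared_mutable != actual:
        declared = "mutable" if fn.declared_mutable else "const"
        derived = "mutable" if actual else "const"
        raise TypeError(
            f"{fn.name}: declared {declared}, but its closure mapping makes it {derived}"
        )
```

In this sketch a function that captures a mutable z must be declared mutable, while a function that only receives z as an argument stays const.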

main and ambient

A Merg-E program's root is a single nameless function (main is a language keyword, not a function name). Please note that Merg-E only has 6 top level keywords:

app
ambient
lang
merge
ns
state

Every program starts with the first three of these six keywords.

All other keywords are scoped within the immutable lang.

app ambient lang merge state {
   lang.mutable lang.main ()::{args: ambient.args, entropy: ambient.entropy[1024], workdir: ambient.os.filesystem.home[".MyApp/var"], confdir: lang.attenuate.read_only(ambient.os.filesystem.home[".MyApp/etc"])} {
      ...
   };
}

The above example may look a bit complicated, but once you understand the basics it becomes easy to read. Let's walk through what the first line means. The app keyword tells us this is the start of the application scope. The ambient modifier is a scary beast: it basically represents all ambient authority that your program had at startup, as granted to it by the operating system. For a least-authority app this is way too much authority; we will get to that later. The lang modifier adds the actual Merg-E language to the scope. Without ambient, we can't touch anything outside of our process, and our actions will be limited to pure computation and gobbling up memory, which isn't very useful. Without lang we won't be able to express anything in the Merg-E language.

The second line looks complicated. It is prefixed with lang.mutable to indicate that main is to be considered mutable, because it gets access to mutable state from ambient. Then follows the main definition. It starts off with lang.main. As we said before, all keywords except for the six top-level ones are defined in lang, the Merg-E language, and main is one of these keywords. It tells us that the nameless function that follows is the main of the program. The () is the argument list of that nameless function, which for main is not allowed to contain any arguments. After it, the ::{} adds a mapping for main's accessible scoped authority. Because this is main, there are no outer-scope variables that can be exposed except for ambient and lang.

The merge and state keywords are there so that these keywords are made available in the app. If you don't use actors, you can omit the state keyword. If you aren't using modules from your code, you can omit the merge keyword.

By default lang is carried into the inner scope, not because lang is a special keyword, but because lang is so-called deep-frozen. By default ambient is NOT carried into the inner scope, because ambient is unfrozen mutable state and/or authority. So in order to carry in some needed authority from ambient, but not too much, the mapping defines how a number of chunks from the ambient tree carry into the inner scope.

It is important to distinguish between read-only, immutable/frozen and deep-frozen. Something that is read-only is read-only to our scope, other scopes might have the ability to write it. Something that is immutable might still carry some authority, and is treated as such. Something that is deep frozen is assumed to carry no authority down into deeper scopes.

Let's look at the four mappings.

args: ambient.args : This makes the command-line arguments available as args. ambient.args is considered to be frozen or read-only.
entropy: ambient.entropy[1024] : This gives the app access to 1024 high-entropy chunks of (256 bits of) data.
workdir: ambient.os.filesystem.home[".MyApp/var"] : This gives the inner scope access to the user's work directory for this application. Note that it is full read/write access without any attenuation, but nothing outside of the directory is available. No .. is available.
confdir: lang.attenuate.read_only(ambient.os.filesystem.home[".MyApp/etc"]) : Like the last one, this gives the inner scope access to a directory, but in this case access to the directory is attenuated to read only access.
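
The workdir and confdir mappings can be mimicked in Python to show what attenuation means in practice. The sketch below is an assumption-laden illustration, not part of Merg-E: DirCap and its methods are hypothetical names, and Python itself cannot stop code from bypassing the wrapper the way a real ocap runtime would.

```python
from pathlib import Path


class DirCap:
    """Hypothetical directory capability: access confined to one directory tree."""

    def __init__(self, root, writable=True):
        self._root = Path(root).resolve()
        self._writable = writable

    def _resolve(self, relative):
        target = (self._root / relative).resolve()
        # Refuse anything that escapes the root, such as "../secret".
        if target != self._root and self._root not in target.parents:
            raise PermissionError(f"{relative!r} is outside this capability")
        return target

    def read(self, relative):
        return self._resolve(relative).read_text()

    def write(self, relative, text):
        if not self._writable:
            raise PermissionError("this capability is read-only")
        self._resolve(relative).write_text(text)

    def read_only(self):
        """Return an attenuated view; the original keeps its write access."""
        return DirCap(self._root, writable=False)
```

Here DirCap(path) plays the role of workdir, and DirCap(path).read_only() plays the role of confdir: the attenuated object can be handed to less trusted code without giving away write access.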

No returns, no guarantee of order

Before we look further into functions and closures, it is important to realize that Merg-E functions don't return anything. Often they simply can't, because they run in another execution context at the discretion of the execution environment. The easiest way to think about this is to consider each function a (usually) short-lived actor. It is possible to invoke a function as a long-lived actor; more about that later. Instead of returning a value, Merg-E allows giving the inner scope access to other named functions or actors defined in the outer scope. We will look into that later too.

Move as default

A big part of least-authority programming is about minimizing shared mutable state. If you are a long-term C++ user, you may remember the auto pointer that is now deprecated because of a property that C++ developers found non-intuitive. An auto pointer used to move ownership whenever it was copied or assigned. In C++ this was called reference stealing. But in Merg-E this isn't stealing. This is the default on invocation unless the mutable state is first explicitly marked as shared.

This isn't the whole story. Function calls can reference shared-mutable-state in two distinct ways. The state can be referenced as a function call argument, or it can be referenced from the closure fingerprint, and there is more than sharing and moving; there is also copying. Note that in Merg-E, a copy is always deep and can thus be expensive.

It is important to note that not all combinations are possible. If a variable is marked to be copied or shared, it can no longer move. For this reason, if only one of the two, the closure fingerprint or the function argument, is marked as shared, the other one implicitly needs copying.

If we define a variable like this:

   lang.mutable lang.type.uint64 foo = 17;

this variable will get moved on any usage.

If instead we make it const:

   lang.type.uint64 foo = 17;

this implicitly const integer will be shared regardless of how it is used.

If we want to not share a constant with inner scopes, we should either use a merge for each of these scopes, or we should mark the constant as anchored to this scope and unavailable in any other scope.

   lang.anchored lang.type.uint64 foo = 17;

Mutable state can be marked for copying or sharing or a combination of the two.

   lang.mutable lang.shared lang.type.uint64 foo = 17;

this will make any usage shared.

This, in contrast

   lang.mutable lang.copied lang.type.uint64 foo = 17;

will make any usage copied.

The following will make sharing through function arguments shared while implicitly making any closure fingerprint based sharing use a (deep) copy.

   lang.mutable lang.arg_shared lang.type.uint64 foo = 17;

And this one achieves the opposite

   lang.mutable lang.closure_shared lang.type.uint64 foo = 17;

A more complex but convenient alternative to a plain move is a move that snaps back to the previous holder at the end of the execution context lifetime.

   lang.mutable lang.borrow lang.type.uint64 foo = 17;

And in the same way as we define shared, we can distinguish between arg and closure based move scenarios:

   lang.mutable lang.arg_borrow lang.type.uint64 foo = 17;

   lang.mutable lang.closure_borrow lang.type.uint64 foo = 17;
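
To compare the four behaviours side by side, here is a small Python model of move, copy, share and borrow. It is only a sketch under my own assumptions: Holder and MovedError are invented names, the context manager stands in for the callee's execution context, and nothing here says how a real Merg-E runtime would track ownership.

```python
import copy
from contextlib import contextmanager


class MovedError(RuntimeError):
    """Raised when a holder is used after its value has moved away."""


class Holder:
    """Hypothetical owner of one piece of mutable state."""

    def __init__(self, value):
        self._value = value
        self._present = True

    def get(self):
        if not self._present:
            raise MovedError("value has moved to another scope")
        return self._value

    def move(self):
        """Default behaviour: hand the value over; this holder becomes unusable."""
        value = self.get()
        self._value, self._present = None, False
        return Holder(value)

    def copy(self):
        """Like lang.copied: the receiver gets an independent deep copy."""
        return Holder(copy.deepcopy(self.get()))

    def share(self):
        """Like lang.shared: both scopes keep using the same state."""
        self.get()
        return self

    @contextmanager
    def borrow(self):
        """Like lang.borrow: a move that snaps back when the context ends."""
        lent = self.move()
        try:
            yield lent
        finally:
            # Give the (possibly modified) value back and invalidate the loan.
            self._value, self._present = lent._value, True
            lent._value, lent._present = None, False
```

After a move the original holder raises MovedError on use, while after a borrow it becomes usable again as soon as the with block ends, including any changes the borrower made.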

So how about at deeper levels? We will dive deeper into this later in this document, but to get an idea, we need to declare it in our function line, or in this example our main:

app ambient lang merge state {
   lang.mutable lang.main ()::{args: ambient.args,
                  entropy: ambient.entropy[1024],
                  workdir: lang.shared[ambient.os.filesystem.home[".MyApp/var"]],
                  confdir: lang.attenuate.read_only(ambient.os.filesystem.home[".MyApp/etc"])
                 } {
      ...
   };
}

In this case workdir becomes shared and owned by main.

Named functions

If we want to create a named function/actor, it is similar to our main definition.

   lang.mutable lang.copied lang.types.int64 z = 42;
   lang.mutable lang.shared lang.function foo = (x lang.types.int64, y lang.types.int64)::{z: z} {
     ...
   }
   lang.mutable lang.function bar = (x lang.types.int64)::{res: foo} {
     res.invoke(x, x+7);
   }

Or if we want to make the function into a long lived actor:

   lang.mutable lang.type.map actor_state = {z: 42};
   lang.mutable lang.function _bar = (x lang.types.int64)::{res: foo} {
      res.invoke(x, x+7);
      state.z +=1;
   };
   lang.mutable lang.actor bar = _bar.spawn(actor_state, 4);

merge

When a Merg-E program gets bigger, we don't want to be stuck with one huge source file with a huge number of nestings. On top of that, some form of DRY is desirable. To accomplish both, the merge keyword allows us to do scoped and fingerprinted imports.

   lang.mutable lang.type.map actor_state = {z: 42};
   lang.mutable merge utils.bar as _bar(x lang.types.int64)::{res: foo}{};
   lang.mutable lang.actor bar = _bar.spawn(actor_state, 4);

The merge line replaces the function definition with a scoped fingerprint import. The utils package is looked up in the file system according to rules we will discuss later, and bar is imported from it if it exists with the right fingerprint matching that of the merge line. Please note the extra {} at the end that we haven't seen before. While in a single file const and deep frozen outer scope variables are implicitly available, in the context of a merge, they are part of the fingerprint and need to be specified.

In the utils package, the function definition looks something like this:

ns utils lang state {
   lang.mutable lang.function_def bar = (x lang.types.int64)::{res: lang.types.int64}{} {
      res.invoke(x, x+7);
      state.z +=1;
   };

}

Note the difference in the fingerprint dict for outer scope vars: instead of a reference to the outer scope variable, a type is used that should match the type of the variable referenced in the merge line.
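
As a rough illustration of the matching described above, the following Python sketch compares a merge line against a module definition. The exact matching rules are not specified here, so everything in it is an assumption: the function name, the idea that argument names and types must be identical, and the plain strings used as type names.

```python
def fingerprint_matches(merge_args, merge_captures, scope_types,
                        def_args, def_captures):
    """Return True if a merge line is compatible with a module definition.

    merge_args, def_args: lists of (name, type) pairs.
    merge_captures: capture name -> name of the outer-scope variable used.
    def_captures: capture name -> type the module requires.
    scope_types: outer-scope variable name -> its type.
    """
    # Assumption: the argument lists must be identical.
    if merge_args != def_args:
        return False
    # Assumption: both sides must name exactly the same captures.
    if merge_captures.keys() != def_captures.keys():
        return False
    # Each referenced outer-scope variable must have the required type.
    return all(
        scope_types.get(merge_captures[name]) == required
        for name, required in def_captures.items()
    )
```

With this model, a merge line that maps res to an outer-scope variable of the wrong type is rejected, just like one whose argument list differs from the definition.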

Assume nothing about parallelism

The Merg-E language is meant to eventually have multiple implementations: from a scripted version running on top of a single-threaded, single-event-loop Python app (the first proof of concept), to a BEAM-bytecode compiled version that leverages the parallelism of the BEAM VM, to an LLVM IR version. In later versions the LLVM IR one might even end up translating the simpler Merg-E functions and actors into NVPTX LLVM IR or into SYCL C++ code that is then linked with the LLVM IR from the main Merg-E code. There is a lot there to figure out, but the important takeaway is that when writing Merg-E code, you shouldn't assume anything about parallelism. Your code may run blocking in a single task on a single thread, or it may run in a massively parallel environment. You may give some suggestions about parallelism, but they are just hints and should be treated as such; the interpreter or compiler might decide otherwise.

When you invoke a function, there is no return value, but there is something returned nonetheless. The thing that is returned has a language-private type, so you can't directly assign it to anything. You can accumulate them though, which for many purposes is the same thing.

    lang.mutable lang.function.controller c;
    c += bar(17, 88);
    c += bar(42, 18);
    lang.await.any.abandon c;

This code creates a function controller, then calls bar and adds the call, which may be running asynchronously or may have blocked and already completed, to the controller for later actions. At this point, bar might or might not have completed. It then calls bar again with other arguments, again adding the call that might or might not be running to the controller. Then lang.await.any.abandon gets invoked on c. This is a weird little thing. Language features under lang.await are the only features that are guaranteed to be blocking. This specific one blocks until at least one of the functions has completed and then tries to abandon the other functions that haven't completed yet. We say tries because, as we will see later, a function may be declared atomic, and atomic functions, once started, cannot be abandoned.

Please note that this single await line is only guaranteed to block until one function has completed. It won't wait until all atomic functions are completed. If we want that, we can add one more line:

    lang.await.all c;

This line will wait till all functions are either completed or abandoned.
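
For readers who know Python's asyncio, the controller pattern maps roughly onto asyncio.wait with FIRST_COMPLETED followed by cancellation. The sketch below is my own analogy, not the Merg-E implementation: Controller, await_any_abandon and await_all are hypothetical names, atomic functions are not modelled, and the demo's bar simply sleeps for x milliseconds.

```python
import asyncio


class Controller:
    """Hypothetical stand-in for lang.function.controller."""

    def __init__(self):
        self._tasks = set()

    def __iadd__(self, coro):
        # Mirrors `c += bar(17, 88)`: start the call and keep a handle to it.
        self._tasks.add(asyncio.create_task(coro))
        return self

    async def await_any_abandon(self):
        """Block until one call completes, then try to abandon the rest."""
        done, pending = await asyncio.wait(
            self._tasks, return_when=asyncio.FIRST_COMPLETED
        )
        for task in pending:
            task.cancel()  # only a request; it takes effect when the task next runs
        return done, pending

    async def await_all(self):
        """Block until every call has completed or been abandoned."""
        await asyncio.gather(*self._tasks, return_exceptions=True)


async def bar(x, y, log):
    """Demo function: sleep x milliseconds, then record that it completed."""
    await asyncio.sleep(x / 1000)
    log.append((x, y))


async def demo():
    log = []
    c = Controller()
    c += bar(10, 88, log)
    c += bar(500, 18, log)
    await c.await_any_abandon()
    await c.await_all()
    return log
```

Running asyncio.run(demo()) lets the short call finish and abandons the long one, so only the first call ends up in the log.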

Actor spawning and parallelism

Remember this line?

    lang.mutable lang.actor bar = _bar.spawn(actor_state, 4);

We are spawning an actor, but what is the number 4 doing there? This number 4 is a parallelism hint. It tells the compiler or interpreter that it may make sense to make the actor consist of four workers sharing the same actor state between them if parallelism is supported. Alternatively we might have written:

    lang.mutable lang.actor_pool bar = _bar.spawn_pool(actor_state, 4);

In that case not a single actor consisting of four workers with shared actor state, but four actors each with its own version of the actor state will be spawned.
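
The difference between the two is easiest to see by looking only at the state. In the Python sketch below, which deliberately leaves out any real concurrency, spawn and spawn_pool are hypothetical helpers that return one state reference per worker or actor. That each pool member starts from a deep copy is my assumption about what "its own version" means.

```python
import copy


def spawn(actor_state, workers):
    """One actor: every worker refers to the same state object."""
    return [actor_state for _ in range(workers)]


def spawn_pool(actor_state, size):
    """A pool: every actor starts from its own deep copy of the state."""
    return [copy.deepcopy(actor_state) for _ in range(size)]
```

A change made through one worker of a spawned actor is visible to all of its workers, while a change made by one pool actor stays local to that actor.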

Exceptions

Each function, including main, has the option to define an error body for error handling.

   lang.mutable lang.function _bar = (x lang.types.int64)::{res: foo} {
      res.invoke(x, x+7);
      state.z +=1;
   }{
       lang.switch state.exception[-1] {
           lang.type.exception.range_error : {
              ...
           };
           lang.type.exception: {
              ...
           }
       }
   }

Note that the error body follows the happy-flow body. There is no try/catch at arbitrary nesting. This is a design decision that promotes small, low-authority units of code being their own function. It is important to note that in the current design, if a function has no error body and is never awaited, the exception will be fatal unless main has a third, catch-all error body for uncaught exceptions. If an exception is thrown, it is added to state.exception, which is defined as an array; hence the -1 index used above to get the last exception thrown, which matters when an exception is thrown from an error body. If the type of an exception thrown in an error body is not part of the state.exception array yet, the new exception will also get processed by the function's error body. But if the exception type is already part of the array, then to prevent eternal loops the new exception is handled as if there was no error body, which means it will either be handled by the calling function if an await is used, or end up in the catch-all body of main.

app ambient lang merge state {
   lang.mutable lang.main ()::{args: ambient.args,
                  entropy: ambient.entropy[1024],
                  workdir: lang.shared[ambient.os.filesystem.home[".MyApp/var"]],
                  confdir: lang.attenuate.read_only(ambient.os.filesystem.home[".MyApp/etc"])
                 } {
      ...
   }{
      ...
   }{
       lang.switch state.exception[-1] {
           lang.type.exception: {
              ...
           }
       }
   }
}
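
The loop-prevention rule for error bodies can be modelled in a few lines of Python. This is just my reading of the rule turned into a sketch: run, body and error_body are illustrative names, state is a plain dict standing in for the state keyword, and I assume the propagating exception is not appended to the array.

```python
def run(body, error_body, state):
    """Run body; on failure, run error_body under the loop-prevention rule.

    state["exception"] models state.exception. An exception raised by the
    error body is fed back to the error body only if its type has not been
    seen before; otherwise it propagates to the caller.
    """
    state.setdefault("exception", [])
    try:
        body(state)
        return
    except Exception as exc:
        state["exception"].append(exc)
    while True:
        try:
            error_body(state)
            return
        except Exception as exc:
            seen = {type(e) for e in state["exception"]}
            if type(exc) in seen:
                raise  # handled as if there were no error body
            state["exception"].append(exc)
```

Because every pass through the loop either returns, propagates, or adds a new exception type to the array, the error body can never be re-entered forever on the same kind of failure.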

Expanding the root namespace

While Merg-E only has 6 keywords to keep the App namespace clean and to allow clean upgrade paths, it is possible to expand the App namespace and slightly reduce verbosity by pulling lang and ambient entries down into the App namespace. This can be done in two ways:

In the app line of the program or ns line of the module.
Using lang.use

Often it is u

Ideas for Merg-E: A least-authority language for Web 3.0