The things you learn while trying to design your own computer language.

So here are two things I sort of uncovered on the path towards designing my own computer language.

(1) Do you have headers or not?

Meaning: does your computer language have headers and some sort of mechanism for loading them? If so, then a few things become true:

You implicitly need two ways to define something: as an include or forward declaration, and as a concrete definition.

That implies you need some way to say in your header that “Foo” is a function and “Bar” is a global variable, without defining “Foo” or “Bar” there. If your language is object-oriented, you then need a way to say (as Objective-C does) “@interface” and “@implementation”, and (if your object-oriented language has single inheritance with interfaces or protocols) “@protocol”.

Note that in Java, you only have ‘class’ and ‘interface’; ‘class’ also doubles as a way to say “somewhere this class is defined, but I’m not defining it here.” But then, Java doesn’t have include files.
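To make the declaration/definition split concrete, here is a minimal sketch of the bookkeeping a compiler might do. The `Symbol` and `SymbolTable` names and their methods are made up for illustration, not taken from any real compiler: a header contributes `declare` calls, a source file contributes `define` calls, and anything declared but never defined is what a linker would complain about.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str
    kind: str              # "function" or "variable"
    defined: bool = False  # stays False until a concrete definition is seen


class SymbolTable:
    """Hypothetical table tracking declared versus defined names."""

    def __init__(self):
        self.symbols = {}

    def declare(self, name, kind):
        """Record that `name` exists, as a header would, without defining it."""
        existing = self.symbols.get(name)
        if existing is None:
            self.symbols[name] = Symbol(name, kind)
        elif existing.kind != kind:
            raise TypeError(f"{name} redeclared as {kind}, was {existing.kind}")

    def define(self, name, kind):
        """Record the concrete definition; a second definition is an error."""
        self.declare(name, kind)
        if self.symbols[name].defined:
            raise ValueError(f"{name} defined twice")
        self.symbols[name].defined = True

    def undefined(self):
        """Names that were declared but never defined."""
        return sorted(s.name for s in self.symbols.values() if not s.defined)


table = SymbolTable()
table.declare("Foo", "function")   # from the header
table.declare("Bar", "variable")   # from the header
table.define("Foo", "function")    # from the source file
print(table.undefined())           # "Bar" was declared but never defined
```

Repeated declarations are harmless here, which mirrors how the same header can be included from many source files, while a repeated definition is rejected.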

How do your include files work? That is, are they loaded by a preprocessor into a single logical file, as in C or C++? Or are they separate “definition” files which are compiled and then referred to during the compilation of your source code?

This can also implicitly work with precompiled headers, but I’m thinking of a header language which is orthogonal to the source language. Meaning you could theoretically design a language where the headers can only legally include stuff like “extern int foo();”, but cannot legally contain:

int foo()
{
    return 5;
}

Now if you don’t have headers, you need to figure out how to handle precompiling enough information from all of your source files so you can refer to the signature of your objects from other files. That could be done, for example, through a two-pass compilation process: the first compiler pass only constructs the implicit header information (the names of globals, functions, classes, methods), then the second pass actually constructs the code, including other class declarations as required.
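The two-pass idea can be sketched in a few lines. Everything here is an assumption made for illustration: the toy input format (functions reduced to a name, a parameter count, and a list of calls), the file names, and the helper names `collect_signatures` and `check_calls`. A real compiler would parse actual source text and track full types rather than argument counts.

```python
# Hypothetical toy input: each file maps to a list of functions, and each
# function is (name, parameter_count, [(callee, argument_count), ...]).
sources = {
    "main.src": [("main", 0, [("add", 2), ("log", 1)])],
    "math.src": [("add", 2, [])],
    "util.src": [("log", 1, [("add", 3)])],
}


def collect_signatures(sources):
    """Pass 1: build the implicit 'header' for the whole program."""
    signatures = {}
    for filename, functions in sources.items():
        for name, param_count, _calls in functions:
            if name in signatures:
                raise ValueError(f"{name} defined twice (again in {filename})")
            signatures[name] = param_count
    return signatures


def check_calls(sources, signatures):
    """Pass 2: visit each body, resolving calls against pass-1 signatures."""
    errors = []
    for filename, functions in sources.items():
        for name, _param_count, calls in functions:
            for callee, arg_count in calls:
                if callee not in signatures:
                    errors.append(f"{filename}: {name} calls unknown {callee}")
                elif signatures[callee] != arg_count:
                    errors.append(
                        f"{filename}: {name} calls {callee} with {arg_count} "
                        f"arguments, expected {signatures[callee]}"
                    )
    return errors


signatures = collect_signatures(sources)
errors = check_calls(sources, signatures)
for message in errors:
    print(message)
```

Note that `main.src` calls functions that only appear in files processed later. That works because no body is examined until every signature has been collected, which is the whole point of the first pass.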

(2) Is your language strongly typed or weakly typed?

One consequence of a strongly-typed language (strictly speaking, a statically typed one, where types are known at compile time) is that for every expression in your language, you know the types of the inputs and the implicit type of the output. You know, for example, that if your variables ‘a’ and ‘b’ are integers, the result of adding them will be an integer. And if ‘a’ and ‘b’ have different types (say, ‘a’ is an integer and ‘b’ is a real number), then your compiler has to implicitly convert ‘a’ to a real number prior to addition, and the result is a real number.
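As a rough sketch of what that means inside a compiler, the function below computes the type of an addition and inserts the implicit conversion. The expression encoding (nested tuples), the `to_real` node, and the name `type_of` are all assumptions for this example; only two types are modelled.

```python
INT, REAL = "int", "real"


def type_of(expr, env):
    """Return (type, rewritten_expr) for a tiny expression tree.

    An expression is a variable name (str) or a tuple ("+", left, right).
    `env` maps variable names to their declared types.
    """
    if isinstance(expr, str):
        return env[expr], expr
    op, left, right = expr
    if op != "+":
        raise ValueError(f"unsupported operator {op}")
    left_type, left = type_of(left, env)
    right_type, right = type_of(right, env)
    if left_type == right_type:
        return left_type, ("+", left, right)
    # Mixed int/real: wrap the int side in an explicit conversion node.
    if left_type == INT:
        left = ("to_real", left)
    else:
        right = ("to_real", right)
    return REAL, ("+", left, right)


env = {"a": INT, "b": REAL, "c": INT}
print(type_of(("+", "a", "c"), env))   # both int, no conversion needed
print(type_of(("+", "a", "b"), env))   # 'a' gets wrapped in to_real
```

The rewritten tree is what later stages would generate code from, so the conversion the programmer never wrote becomes an explicit step.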

This puts a lot of work on your compiler. You need a type representation which can represent all the types (including compound types) of your language, so you can differentiate between int **a and int *a, and so you know the type that results from dereferencing a variable.
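One possible shape for such a type representation, sketched with made-up class names (`Base`, `Pointer`) and a made-up `deref` helper: types are small immutable trees, so int ** and int * compare as different, and dereferencing just peels off one layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Base:
    name: str  # e.g. "int"

    def __str__(self):
        return self.name


@dataclass(frozen=True)
class Pointer:
    target: object  # the pointed-to type, itself a Base or a Pointer

    def __str__(self):
        # Put a space after the base name only: "int *", "int **".
        if isinstance(self.target, Base):
            return f"{self.target} *"
        return f"{self.target}*"


def deref(t):
    """Type produced by dereferencing a value of type `t`."""
    if not isinstance(t, Pointer):
        raise TypeError(f"cannot dereference non-pointer type {t}")
    return t.target


int_t = Base("int")
int_ptr = Pointer(int_t)         # int *
int_ptr_ptr = Pointer(int_ptr)   # int **

print(int_ptr_ptr, "->", deref(int_ptr_ptr), "->", deref(int_ptr))
```

Because the dataclasses are frozen and compare by value, two independently built `Pointer(Base("int"))` objects are equal, which is what a type checker wants when comparing types from different declarations.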

On the other hand, if you have a weakly typed language (dynamically typed, in the stricter terminology), where any variable can hold pretty much any value, the design of the compiler becomes far simpler. But your run-time engine has to do a lot more work: because a can be anything and b can be anything, every value has to carry an implicit type with it, and your addition code has to know how to add any two objects together to get a reasonable result.
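Here is a minimal sketch of that run-time work, assuming a made-up `Value` record that pairs a type tag with a payload. The coercion rules in `add` (int plus real gives real, anything plus a string concatenates) are choices made for this example, not a description of any particular language.

```python
from dataclasses import dataclass


@dataclass
class Value:
    tag: str        # run-time type tag: "int", "real", or "string"
    payload: object


def add(a, b):
    """Add two tagged values, choosing behaviour from their run-time tags."""
    tags = {a.tag, b.tag}
    if tags == {"int"}:
        return Value("int", a.payload + b.payload)
    if tags <= {"int", "real"}:
        return Value("real", float(a.payload) + float(b.payload))
    if "string" in tags:
        # Assumed rule for this sketch: adding to a string concatenates.
        return Value("string", str(a.payload) + str(b.payload))
    raise TypeError(f"cannot add {a.tag} and {b.tag}")


print(add(Value("int", 1), Value("int", 2)))
print(add(Value("int", 1), Value("real", 2.5)))
print(add(Value("int", 1), Value("string", "2")))
```

Every addition now pays for a tag inspection at run time, which is the cost the compiler avoided by not tracking types itself.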

Of course weakly typed languages result in some weirdness; just see the various examples of adding integers and strings in JavaScript.

I’m sure I’ll learn more things along the way. But these are the two that have bitten me in the past few weeks.
