Izumi: Ralf - Hint Search
Index: Home     | What Is Izumi | Misc Links   | Random Thoughts | Too Much To Read | The Rant Vault | Quotes
Dev:   Projects | Ideas For Dev | Nerdkill | Rig | Hint

The Search for the Hint Programming Language: Specs | Research
Site License And Disclaimer as well as contact information are available here.
Preface:

Hint is a conceptual project vaguely centered around the hypothetical specification and implementation of some programming language.

This text documents this research project on a daily basis as I add new ideas and reconsider previous ones. Do not search here for any breakthrough or anything revolutionary.

The Hint project will not by itself produce any tangible result.

In this text, I am merely toying with ideas on a day to day basis, sometimes following some direction and then sometimes changing my mind or simply dropping everything and starting all other again. This means this document is time sensitive. Early entries (at the top) are generally obsolete. More recent entries (at the bottom) may or may not implicitly refer to earlier entries.

If you're trying to read this, you may want to jump directly at the end and read the last section, then backtrack and skim from the beginning to get an overall idea.

I do not expect anybody but me to find any interest in reading this. On the other hand, any comment is welcome.

For an overview of what this is all about, read the goal description.

20040911 Disclaimer: nothing here will ever lead to any kind of reasonably useful implementation.

20080108: Or maybe it will.


1. The Goal, Reloaded

[Added 20040508, I want to be more explicit on the goal here.]

It is obvious to me from the description list (now moved right below in 1.1.) what I'm trying to do here, but it may not be so for everyone.

So first the initial idea came when I was using my Palm Pilot Pro recently. I needed to code a small thing yet I had nothing at hand. What I mean is that I wanted an IDE on the PDA to code my script and execute it right away. On top of that I wanted to be able to define a small user interface: a couple of buttons, an edit field, etc. So basically I wanted a RAD (rapid application development) IDE and runtime on the PDA, specifically targeted to the PDA. Now I went googling there and there but couldn't find what I wanted (if it exists, please point it to me). There are plenty of IDEs for the desktop. There are cool projects like Waba, SuperWaba (welcome to the Open Source redundancy, great!), [Pocket Smalltalk IDE|http://www.pocketsmalltalk.com/index.html] and of course I have the new promising [PalmOS Developer Suite|http://www.palmos.com/dev/tools/dev_suite.html]. Although these are great tools, they work on the desktop to generate Palm binaries.

As I said, what I use is a Palm Pilot Pro, with gloriously 2 megs of RAM, a 16 MHz Dragonball processor and a 160x160 monochrome screen. By nowadays standard this is nothing -- a cheap phone may have better now. But from my standard this is twice more powerful than my Atari ST on which I used GFA Basic to code and develop "fast" applications. I'm not going to rant about nostalgia and crap like that. There's no reason a limited IDE for a limited language should not be doable.

Oh and bear in mind that I have a view of the kind of IDE and language I would want. The language should have a concise yet not cryptic syntax. For example C-like languages that heavily rely on commas, quotes, backslashes, and braces would be a pain to write on a PDA with the default Graffiti (not to mention that the new Graffiti 2 seems to suck even more when it comes to non-alphanum characters). The IDE itself should make it easy to find shortcuts to avoid typing. Auto-completion comes to mind. A side bar with the first letter of a command would insert the most common commands, etc. There are many things that can be done to make it easier -- it involves a bit of research or trial-and-error of course.

The last important detail is that recently I discovered the Io language. I'm a fan of Self too. Let's mention Squeak here too. I have used neither for anything real beside learning nor do I really intend to, but I like the simplicity of these languages and the elegance of their environment. I was most notably impressed by the compacity of Io and was trying to figure out the complexities that can emerge during the specification and implementation of such a seemingly simple language (simple as in elegant). Then it struck me that simple and elegant meant small yet powerful enough to run on a PDA. Performances would not need to be either dramatic, it would just need to run. Then if it can run on a PDA, it can run on a desktop, as an Apache module, you name it.

20040626 update: Recently I started exploring a different variation from the simple prototype-based language idea I had at first. Namely I'm trying to consider an approach where the source code is a "stream of tokens" and everything uses lazy evaluation. Some Lisp/Scheme ideas got mixed in too. More than ever this makes this project and this document a research project rather than the actual implementation of a known predetermined goal.

So that is the idea behind Hint: primarily a research project, a way to learn from the specification and implementation of such a language, with a specific purpose. There will be obvious design mistakes, misdirections, confusion, lost of time, ups and downs in the motivation, frustration at the debugger and maybe satisfaction to see it running. This is life. Always trying something different, yet always the same.


1.1. Description

20040330

Goals:

Development:

Ideas from other languages:

Note that semantically "function" generally means top-level function whereas "method" means a function within an object and thus the semantic is on the current visible scope (global vs object instance). Since everything is an object, it can be argued that the global scope is an object (actually it is!) so global functions are really methods added to the global object instance. "function" is an easier to understand concept than "method" (everyday life and math usage).

Is the notion of class vs. object instance an advanced concept or something easy to grab? (I tend to think the later...) For example I like PHP's notion of typeless variables. Yet internally all variables have a type (with default automatic conversions and casting as needed). It also has classes. Another strong notion (imho) is pure abstract interfaces (aka protocols in Obj-C), i.e. a set of methods/slots that an object is expected to have. This is generally seen as way for the compiler to enforce static typing, yet imho the greatest benefit is to express design in the implementation.

Const objects could be a good alternative: a constructed object that is read-only. It could serve as a "class" to be instanciated (cloned) or derived (single heritage => parent slot, multiple heritage => list?).

Io has: local scope + self in a message call, super to call obj's parent, scope from local to parent to global ("lobby"), "block" vs "method" (block creates its own local scope, method has the object scope), message forwarding, and most of workflow control implemented as messages, which is rather elegant (and probably means it can be be overriden). All libraries are implemented in C as mutable Object clones with internal hidden fields, and as such methods are overridable.

Quite frankly, Io has most of what I want; except of course my goal here is to understand the problem in creating the language and its implementation so the better way to do it is do it myself. Lua and Io have both good reference manuals and tutorials which make learning the language quite easy, and that should be part of the implementation.

Neither Lua or Io use a main(). Interpreted as being read. Eventually there are a bunch of function definitions and at the end one line that implements it. Similar to the way PHP does it, better for scripting. Also from a prototype-based point of view, an object needs to be cloned before slots can be added, so that's always interpreted.


1.2. Tentative syntax

disclaimer: this chapter is not a reference guide on the syntax. It's more an essay on what the syntax can be and what are the problem it generates. Sometimes I'll write something and then discover later the syntax can't work.

Let's start with something easy, comments should be flexible:

	// single ligne comment
	# single ligne comment
	/* multi line comment *
:preprocessor command

Now the trivial "hello world" program:

	print "hello world"

From a message passing point of view, that is "object msg params" syntax. Object would be the implied "self", here the global environment (VM, Lobby, whatever the name). print is the method and the string is the argument.

I see it as a flow of instructions + data. The print method needs one argument and so it will take whatever is next in the interpreter that can provide the data. So I should be able to write this:

	a = print
	a "hello world"
	print function - return "hello world" end

Also I want to be able to store incomplete functions, a la Haskell:

	function add - n m
		return n + m
	end
	a = add 1
	print a 2
	=> 3

The program really is a flow of instructions. That means a function call will trigger reading as many expressions as necessary. Here print reads one expression.

To keep the syntax light, let's get rid of ; when it can be implied. Using JS rules for that would be nice, except there's no } to terminate blocks (but end will do just fine?). JS says ; is automatically inserted when the next token is invalid in the given context and a new line is found instead, except in some weird cases like for(;;) or ++ and -- isolated on a line (could be either postfix or prefix). They have a documented list of caveats (such as return \n a+b \n, no kidding!!).

Ideally I can expect the IDE to provide wordwrap (in a PDA context for example), so asking for \n to be the "official" line terminator seems OK to me, with ; being an alternate one (especially for multiple inline instructions). Also it should be offered to have _ (VB) be the continuation character (instead of \ like in C/C++).

Eventually I expect preprocessor commands to allow to override all that (and do stupid illogical things).

To keep names small (pda context), I suggest "func" instead of "function". Ideally I want both:

	function foo(n)   return n+2 end
	func     bar(f n) return f n end
	print bar foo 5
	=> 7

Note that bar is declared as bar(f n) and not bar(f, n). The comma can be skipped since variables are typeless. An alternative syntax would be:

	func bar - f n - return f n end

That would be even easier to type. I was initially thinking of using = instead:

	func bar f n = return f n end
but then this would look odd:
	func bar f n = a = 2*n ; return f a end

Also I need something after the function name to distinguish between anonymous function with parameters and non anonymous. Example of possible confusion that we need to avoid:

	func bar n    - return 2*n end
	return func n - return 2*n end

So I will go for this notation in the end:

	function|func [name] [- args] -|\n body end

So the examples would be (all equivalents):

	func bar - f n      # - can be ommited if \n is used
		a = 2*n         # declare a with local scope
		return f a
	end

	func bar
		- f n
		return f a
	end

	func bar - f n - a = 2*n return f a end

	func bar - f n - a = 2*n ; return f a ; end

For functions that take no argument, - - can be shorted as -. This should be equivalents:

	func - - return 5 end
	func -   return 5 end
	func
		return 5
	end

With:

	:rename func - - => func ( )
	func bar(f n) a = 2*n; return f a; end

Note that the initial definition of func is really:

	:rename func - - => func - (- \n)

that is the second - is a list of either - or \n (the list syntax is not developed yet, ( ) may change)

Oh and of course planning ahead, I will want that too:

	:rename function => method

Neat-o ;-)


1.3. Objects

Let's continue with objects. I'm going towards a prototype-based environment where objects can be read-only.

Instances really are "slots" in another object. There's a "global" object, always available from everywhere:

	global.a = 5

When in the global scope, this is of course the same as:

	a = 5

To make it a method, simply affect a function:

	global.foo = function - n - add n+2 ; end

Now, wait a minute :-) In a pure prototype-based environment, I only need anonymous functions. That is everything is a method of some object, there are no standalone functions. In fact, writing this:

	function foo - n - add n+2; end

should be syntactic sugar for writing this:

self.foo = function - n - add n+2; end

In this context, I'm not sure I really want the two - separators, yet I'm not sure I want to skip them. Safeguards are nice to have sometimes, and if I remove them I'm not sure how to rename func - - into func ( ) ! That is not really a problem since I could easily rewrite this:

	:rename function - - => method \b -
	self.foo = method n - add n+2 end

where \b means word boundary (basically any whitespace that separate tokens).

Next I need to create an object:

	obj     = new Object
	obj.x   = 5
	obj.y   = 6
	obj.add = function - n - x += n; y += n; end

which should be equivalent to

	obj = new Object[x=5, y=6, add = func - n - x += n; y += n; end]

where [a,b,c] is a list of initializers, i.e. default slots that will be created or added to the instance. Per such, it is logical to have each initializer be in the form var = value. To be generic, any list can be placed here, so the following will be equivalent:

	obj = new Object[ 42, 10, x=2, y=3, 40, add = func - n - x += n; y += n; end, 60]
and
	obj     = new Object
	obj[0]  = 42
	obj[1]  = 10
	obj.x   = 2
	obj.y   = 3
	obj[4]  = 40
	obj.add = function - n - x += n; y += n; end
	obj[6]  = 60

and the following are actually the same array element selectors:

	obj.x    = 2
	obj["x"] = 2

Also note that function add above uses x/y from the object's instance scope.

To use this:

	obj.add 5
	print obj.x
	=> 10

From whithin an object method, the implied slot self should point to the instance. There should also be a slot proto which here will point on Object (the object was cloned from Object). Objects slots should be looked up in the order local > parent* > proto > global.

I see at least these slots:

For reflection, I can have a slot "slots", or use methods from the global scope that know how to examine an object. Global methods may allow for more flexibility.


1.4. Misc., Lists, Sugar and other oddities

I need tail-end recursion:

	function fact - n
		if n <= 1 then return 1 else return n * fact n-1
	end

How does an interpreter detects that? JS can make an anonymous recursive function by using the self.callee slot. Pretty neat :)

The function syntax seen above can be seen as a function call of the global slot function. It returns an object which knows how to apply its arguments to its inner statements.

Same for new, which has for argument an existing object to clone and eventually a list of initializers. That means the function new has either one or two arguments. See below for problems this will cause.

To simplify, any statement is actually a function call at some point, and every statement returns something. That is the following code:

	1+5
	=> 6
is actually syntactic sugar for:
	global.add 1 5 ;
	=> 6

Now that means that:

	print 1 + 5
could be expressed by:
	global.print global.add 1 5

Note that print actually returns the printed string. Think Java's toString().

Oh and I said I want lists and I want arrays too. Except a list and an array are the same thing, a collection of ordered elements of any type. Above I wanted to define a list using ( ) but that will conflict with a mathematic expression:

	1 + ( 4 * 6 )

So instead we'll use [ ] for lists and arrays and items need to be separated by comma. It sucks pda-wise I guess. And since lists and arrays are semantically equivalent, I'll go for array as internally it will probably be built this way and thus performances wise it will be closer to an array (middle insertions are expensive, iteration is cheap).

An array should be an ordered collection of key=>value pairs, by default the key being an index. Note that an object is a list: the slots are the list values and the slot names the keys.

There currently is an ambiguity to resolve:

	a = 5
	print a [ 6 ]
	=> ?

That could either mean to print 5 and the array composed of 6, or to print the 7th index of the array a. (Here I completly ignore the fact that print is supposed to take only one argument). Even worse, I could have this:

	a = [ new Object, new Object, new Object ]
	print a [ 2 ]

PHP's way is to use the keyword array() to declare an array, so we would have this:

	a = 6
	print a array [ 6 ]
versus
	print a [ 6 ]

Now it sucks for object declarations:

	a = new Object array [ x = 5, y = 6 ]

The keyword array could be optional here, but fear this:

	a = [ new Object, new Object, new Object ]
	b = new a [ 2 ]

Does that mean cloning the 3rd item of a or cloning and setting the slot 2 to nothing?

The best way to solve this issue is to remove it. It is actually linked to the problem of functions with variable arguments. In the context of a language that makes lists easy, a function that needs multiple elements will simply need to receive one list as argument. It also solves the problem of storing the definition of a partial function call.

At that point I can either have new always take 2 arguments, or have two keywords:

	b = new a [ 2 ]            => new takes one argument, the object a[2]
	c = new-init a array [ 2 ] => new-init takes 2 arguments, a and array [2]

Alternative names for new-init are new-with or simply init. Or clone versus clone-with...

Of course I could say that "new [ ... ]" is actually the constructor for an array/list but then this would look odd.

	c = new-init a new [ 2 ]

The only thing it does is simplify by removing one keyword. Yet, array, like function, is a keyword that is mostly there to add human semantic to the statement workflow rather than a strong requirement of the language.

So to resume for list and arrays, this is how one is constructed:

	a = array [ 42, "foo", 'bar', 30, array [ 1, 2, array [3, 4]], x = 5, y = 6, 6 = 'b', 7 = 8]

That's kind of a brain dead syntax, but it's actually easy:

To retrieve a value from the array, simply give its key. An integer number can be either a key or an index, in which case a key will be retrieved if there's one available. Using the array above:

	a[0]       => 42
	a[1]       => "foo"
	a[4]       => [1, 2, [3, 4]]
	a[4][1]    => 2
	a[4][2][0] => 3
	a.x        => 5
	a['y']     => 6
	a[6]       => b     # uses 6 as a key, not an index
	a[7]       => 8     # uses 7 as a key, not an index
	a[8]       => 8     # uses 8 as an index since there's no key '8'

Note that a['7'] is not the same as a[7] above.

Later I should define how + behaves for lists and other objects (list concatenation is the obvious default).


1.4. Summary of syntax

I've already got a bunch of stuff. Some is missing, specs are a bit fuzzy yet getting there. Here's the minimalistic initial syntax, as of today, expressed using a pseudo BNF:

	number   := [0-9]+.[0-9]* | [0-9]*.[0-9]+
	string   := "(\"|[^\"])*"  | '(\'|[^\'])*'
	list     := [ (?: <expr> (?: , <expr> )+ ) ]
	num-expr := ... the usual stuff with + - / * and ( )
	new      := new <id> list?
	var-id   := <id> ( [ <expr> ] )
	var      := <var-id> ( . <var-id> )*
	func-kw1 := -
	func-kw2 := -|\n
	func     := (func|function) <id>? <func-kw1> <func-args>? <func-kw2> <func-body> end
	func-args:= <id>+
	func-body:= <expr>

(to be completed... ?)


2. Ideas

«»  2004/04/01  «»

Let's continue the specifications by throwing away most of what has been previously said.

Ideally I want two things here:

By visual programming, I don't mean workflows and stuff. Scripts are the best representation of an algortihm, I think. On the other hand, the existing modules and/or classes can be represented graphically, yet interaction between different components will always be something abstract that can only be partially represented (think Doxygen's graphs of class hierarchies and class dependencies). So anyway, that's a background idea that won't be addressed right away :-)

The language syntax I started to examine previously is interesting, yet it already disgressed from the original goal in two ways:

There are some good things though. I realize that both goals are antagonist in some way. Eventually the only simple yet powerful syntax is Lisp's notation: (op arg1... argN). Unfortunately it is also one of the most boring ones to use in my opinion.

Everything introduces before was seen (at least by me, in case I didn't really express it) as "syntactic sugar", i.e. human-friendly notations for concept that could be expressed in a simplier, more rigid way. For example, these should be equivalents:

	var = value                   => (global.let var value)
	func foo - n - return var end => (global.func foo (n) (return var))

The second form is Lisp-like and inherently annoying to write (and read back). Internally I need to handle things like that since it's the direct expression of a tree, yet have a flexible parser than can input the friendly ones.

The idea is thus to have a parser that I can program to let it know what the syntax will be. That actually reduces the parser to some kind of pseudo flex/bison (lex/yacc in non OSS words) thingy, except I don't want to care for their barbaric syntax (nor do I actually expect anyone to read the Dragon Book before being able to modify the dynamic parser syntax).

(Etc., etc.).


«»  2004/04/03  «»

I actually wrote a bunch of stuff on paper, here they go. They may contradict or expand what has been said before.


The global namespace should be called hint, f.ex: hint.add, hint.set.

Generic syntax:

	stmt = func param1 .. paramN [;\n] (<- non optional: ; or \n)
	       :preproc-cmd arg1 .. argN [;\n]

Call scope: local > obj > hint-global

All other constructs are remaps (mappings? macros?)

	:fdef: func <id> - <fargs> - <fobdy> end => hint.func[id [fargs] [fbody]]
	                                                      id can be nil
Parse :fdef: as list of (token, <id>) or <map>

Syntax:

Maps:

	:fdef:   func <id>? - <fargs> - <fbody> end
	:fargs:  <id>*
	:fbody:  <stmt>*
	:array:  \[ <expr> <array2>* \]
	:array2: , <expr>
or
	:array: \[ ?( <expr> ?( , <expr> )) \]
with ?(..) -> optional group.

More maps:

	:set:    <id> = <expr>
	:newobj: new  <expr>
	         newi <expr> <list>

Note: still the terminology confusion on list vs. array.

Do I want classes and modules?

	:class:  class <id> <list> => new object
Or simply should have the same syntax than anonymous functions:
	class
	module id - stmts end
	func

Internal structure? list/tuples (token, options) with optional, group, set, optional+group.


To do:

Subset? Simple > advanced > complete


Lexer -> parser -> AST (abstract syntax tree) -> semantic...

Workflow -> hint needs to do lazy evaluation => how? Ex:

a = f(..) => hint.set(a, hint.call(f, ..))
Only affects reference to func f to a, does not use immediately. => when is evaluated?

eval f evaluates expression (a.k.a. computes) and return value.

So a = f(..) != a = eval f(..)?


Also: eval/call => consumes argument only as needed. => Nice but how to model internal tree?
[That means an AST] like:

                                         set         
                                          |          
                                       +--+--+       
         a = f(..)                     |     |       
            ||                         a    func     
            \/                                |       
 hint.set(a, hint.call(f, ..)) =>          +--+--+    
                                           |     |    
                                          p1 ... pN   

is not possible.

(Note to self: takes too much time to do ASCI art.)


Other things needed for hint implementation:

Lazy evaluation?

Ex:
	a = f 1 2 + g 3 4
Will do which one?
	f(1 2) + g(3 4)
	f(1 (2 + g(3 4))
	f(1 (2+g)(3 4))
	etc.?
If priority is to arguments, will do the first one.

Same for new a[2] vs newi a [2] or new a[2]? Uses keyword (to solve ambiguity). (Note: actualy doesn't solve anything).

Yes but what about: f 1 2 g 3? -> f 1 2 (g 3)?

Solution: using parenthesis ( ) induces grouping, not necessarily only for arith. eval. So ( ) can be used anywhere.

For example:

	new (a[2])

Map:

	:call:   id params | id (params)
	:params: *expr*
	:call:   expr *( , expr ) | nil

Grammar summary:

	:arith:   expr + expr | expr - expr | expr * expr | expr / expr | (expr) | -expr (Source: Dragon book)
	:expr:    arith | fcall | onew | fnew | var | num
	:var:     id [ list ] | id
	:array:   [ list ] | [ ]
	:list:    expr *(, expr)
	:fcall:   fname fparams
	:fname:   id
	:fparams: *(expr)
	:affect:  var = expr
	:fnew:    (func | function) ?var - farg - fbody end
	:farg:    *(id)
	:fbody:   *(stmt)
	:onew:    new var
	:onewi:   newi var [ list ]
	:stmt:    ( affect | expr ) [ - \n ]

(Note: This is of course not usable as-is).


20040404

(04/04/04... gotta love that number :-))

Anyway, there's one ambiguity I haven't solved yet: what is an affectation?

Let's consider a function and an object:

	newi Object [x=2, y=2]
	func foo - a b - return 2*a*b end

Now what happens when I write this?

	a = newi Object [x=2, y=2]
	b = foo
	c = foo 1 	
	d = foo 1 2

a should be set to "point" or "refer" to the newly constructed object. It's type is "Object". Obvious and easy. In classic jargon (C++/Java at least), I guess it is called a reference. It is of course garbage collected, that is if latter on someone writes a = nil then the cloned object will be released at some point (unless referenced by another slot somewhere else). OK simple.

Now what about b? An easy case too: b becomes a refenrece on the object function foo. It's type is "function". Note that the syntax:

	func foo - n - return n*2 end

is really a short hand for:

	foo = func - n - return n*2 end

That means both b and foo are references on the same object. Now that is good as it means that as long as foo references on the same function object, this object will not be deleted by the garbage collector.

What of c now? Well it points to what I call a "partial" function, that is an execution context that would invoke foo with at least some know argument. So it's type really is a function too. Easy.

The real problem is with d. I have two choices here:

The logical choice is to select the former choice. Now sometimes I realize I may want to get the later choice, yet that could be simulated by writing this:

	d = eval "foo 1 2"

except it would be damn slow! So instead, there could be an eval function specific to evaluating single function calls:

	d = evalf foo [1, 2]

So the rule would be:

This seems simple enough, yet I can see the catastrophic possible errors when referencing an external lib or someelse's code, or simple in a big program where one doesn't necessarily know all the possible arguments. Of course I would argue easily that for tigh control, a prototype-based language is not the adequate choice. There are plenty of languages availables with statically typing or at least capable of infering the type at compile time. Here there is no such capability. It is a typeless language after all.


4. Prototypal

«»  2004/04/10  «»

It bring an interesting light on the the notion of prototype-based languages in software. It is often said the language should be irrelevant to the task, yet it is obviously not. There's also the religious war about what is a script language versus a programming language, the proponents of the later using the former term as a pejorative term.

My personal idea is that even with a strongly typed language, any big project is a mess to develop and maintain and a clear, painful (boring?) planning is required. Where does a prototype-based language fits in there?

«»  2004/04/11  «»

So the question is "what do I want"? I don't foresee a long term use of a prototype-based language. It's fun to build but it's of no use if I won't use it as an embedded language later for anything specific.

I figure what I like is a clear concept of modularization: modules that contain classes that contain methods. I also noticed that when using something such as Visual Basic, I prefer the strict mode in order to require declaring variables and generating compiler errors as early as possible.

So although I like the concept of a prototype-based language where objects can be modified at any time, I prefer having the notion of classes and type verification builtin.

A class is a template that is used to instantiate objects. Ideally that's nothing different to a read-only object that is cloned, except whenever a class change, all the objects derived from it should change: method added, type of a member changed, etc. Changing the method list is easy since an instance will generally invoke it's parent class method (unless it has overriden the method) except that means when the read-only object was cloned its function slots were not cloned, yet one expect all slots to be cloned by default. Changing the type of a member is more difficult since by default these "value" slots should be cloned in all instances.

So let's see.

In a prototype-based language, all we need is the notion of an object with is a table of slots. A slot contains "something", more exactly a reference on an object. If the slot contains a reference on a function object, then this slot's name can be used in a function call. Generally an object will have some slots with reserved names, such as "type" for the object's type, "parent" for the class/prototype it derives from, etc.

On the other hand, another view is to have modules (or files) composed of classes with an inheritance tree. Classes contains members and methods. There's meta information associated to all of this: private versus public, const versus mutable, etc. On top of that, classes may or may not be real objects that can be manipulated at runtime (except maybe enumerated via reflection).

Here I really want the former approach, which is the "raw internal" form of the later one anyway. But I want the ability to compile stuff and detect errors at compilation time. I also want type checking, even it is infered like in Haskell (actually that's even better). And I want to declare types before I use them, yet allow to redefine them at runtime if necessary.

Now I have to choose what kind of syntax I want. The original idea was to get something as simple as possible, low on esoteric symbols, yet flexible enough and not too ambiguous (the more explicit the better to start, from an implementation point of view). Ideally I'd like to avoid any symbols at all, yet to keep it concise. Ideally get rid of { } and even [ ]. Keep only - and ( ).

Oh and I don't really want a compiler. Formal parameter checking would be enough, or better a fast pre-parsing that can detect as many errors as possible. There should be a way to evaluate a string at runtime, and run it has part of the actual program.

So now we start it all over again. Forget everything that was written before and start from scratch.


I should have some basic types:

	string
	int
	double
	object
	function
	class
	module

And a way to declare some new data type:

	type T string

	class S
		string a
		T b
	end

There's no difference between a class and a struct, conceptually. So object, class and module are pure synomyns.

Then some code:

	module m1

		class c
			func op1 f a b
				return f a b
			end

			func op2 a b
				return a + b
end
		end // a

		func _main
			a = new c
			print a.op1 a.op2 1 2
		end
	end // m1
	m1._main
	exit 0

The types involved here were the automatic types for 1 and 2, integer. The function inherits the type it's called with. There should be the idea said previously that parenthesis can be used to resolve ambiguities during function calls, but that can only be used where it makes sense:

	print  a.op1  a.op2  1 2
	print  a.op1  a.op2 (1 2)
	print  a.op1 (a.op2  1 2 )
	print (a.op1  a.op2  1 2  )

I need a way to provide a type at runtime, like in templates:


«»  2004/04/12  «»

This is the part where I switch sides again and ask why the heck declaring variables is that useful?

Oh, I know that one: as a safeguard from typos.

	func op1 f1 in out
		aut = fi in
		ret aut
	end

Which of course is not:

	func op1 f1 in out
		out = f1 in
		ret out
	end

Which is the same kind of logic behind C# asking one to confirm using the override or new keyword when masking an overloaded method.

Now the issue is what level of trust does one want? I mean Pascal and ADA have been done and redone. For example in one hand C# allows one to do crazy stuff, but on the other hand it requires one to say pretty please before. So does Intercal ;-)

So as some TV show would say the dude has a point.

On the other hand, the dude has no point; how do I interpret this?

	eval func - return self end

Infinite loop.
Interesting ambiguity of self: what is the scope of self?

This self has no obvious scope for a definition of useful.


«»  2004/04/16  «»

It's not the language that matters. It's the framework.
What makes C# cool? The extensive .Net Framework.
What makes Java cool? The extensive Java Foundation Classes.
It's no surprise that the .Net framework and the JFC are very similar for the basic system classes (java.lang. vs System., Collections vs vectors, etc.). These are what makes programming easy -- on top of the language being good of course and much more alike than their respective owners would want to admit.

Which means whatever language syntax/paradigm I come up with, it wouldn't be useful without a good framework to back it. Io and Lua are good examples.

Ideally I'm thinking if I implement a version on top of platform X, then the X framework should be available via some kind of binding. This is true for the Standard C Library, the .Net Framework or the JFC. The problem with this approach is that the language becomes a slave of its first implementation. A better approach is to have a specific framework that uses the implementation's framework for speed but does not offer a model too tighly linked to platform X or Y.


«»  2004/04/17  «»

I was wondering earlier what should happen when there's a function call. The problem is not only with function calls. Retrieving the value of an object, or a math expression for example, is the same as calling a function: evaluate or get a reference?

I'm thinking this:

	a = f 1 2 3
	a = 1 + (2 * 3)
	a = func - return 1 + (2 * 3) end
	func a - return 1 + (2 * 3) end

In this last example, a is affected an anonymous functon that returns an expression (note that affecting an anymous function to a slot name or creating a named function is the same). The point here is that a is not expected to be affected the result of the evaluation from the anonymous function, but rather it is supposed to become a reference to later call this function. So the concept is the same when a is affected an expressiong or a function call: it becomes a reference to this code, an alias for it, rather than reduce and evaluate this code.

Consequently, it is logical that a function call be always treated as the implicit creation of an object "function call with whatever given arguments". The argument list may not be complete and when the slot will be reduced, whatever missing arguments will be looked up in the program stream. Of course if none can be found, a runtime error occurs.

This, I believe, should allow for lazy evaluation: function definitions are composed of functions calls which themselves are composed of function calls, etc. Nothing is evaluated till it is necessary.

Now the question is when is something evaluated? Obviously when an actual value is needed. Consider this simple code:

	func square - a
		return a * a
		end
	func sum - op list
		var s = 0
		foreach i in list
			s = s + op i
			end
		end
	a = array[1, 2, 3]
	b = sum square a
	print b

Here we have:

Now the other important point is this. Remember that all this is syntactic sugar for what should really happen underneath. What really happens is on the right column.

	b = square 2		=> hint.set(b, hint.square(2))
	c = print b			=> hint.set(c, hint.print(b))
	square 2			=> hint.square(2)
	print b				=> hint.print(b)
	c					=> hint.c()

Which should be read as:

The rule here is that every thing in Hint is a function call statement that returns something. So affecting a function is an operation that does return the affected reference, but does not evaluate the affected value per se. Here the function call is to hint.set, not the affected value. On the other calling a function at the root level obviously needs to evaluate/reduce this function or value.

That seems more clear.


«»  2004/04/18  «»

Now let's discuss environment.

There are three kind of environments that interest me:

There isn't a big difference between the embedded VM and the standalone application. A standalone application can be though as a minimal bootstrap for the OS in order to run a VM. On the other hand, an embeded VM may or nay not be allowed to interact directly with the OS.

The idea is that per se a VM does nothing. It just executes instructions. On the other hand modules are loaded in the VM that do stuff. Depending on the implementation, the VM will let those modules access the underlying OS or implementation framework thru some kind of binding. So if the OS-interaction module is not loaded into the VM, the embedded VM is running in a sandbox and only relies on whatever module the host provided to interact with.

Obviously any VM will inherit OS priveledges and limitations from its host implementation. For example a .Net implementation will have as much trust as the .Net environment has been configured with. That means if the VM is executed in the context of an ASP.Net page, for example, the VM will obviously have the trust level of the ASP.Net application. Some goes for a JSP context.

Because of that, it means I do not feel like having to explicitly build security in the VM right away. Security is implementation specific.

Another similarity of the embedded VM and the standalone application is that both would require external edition of their Hint files, maybe using a customized IDE, whereas a Smaltalk-like self-hosted environment would typically have its embedded object browser and editor that can interact with the system in real time.

In the case of the VM embedded in an application with an UI, it could be expected that the application would provide a text-oriented console that allows runtime inspection and modification of the VM, and eventually even a debugger since debugging is one of the obvious weaknesses of embedded VM usage. This means there's a need for modules that can be loaded or unloaded at runtime (and not only at startup), as well as loading and unloading scripts or simply resetting the VM state.

Eventually there could be an advanced feature that allow a VM to auto load modules when a function is called, absent yet provided by a known module. If there is a well-specified "namespace" to access modules (such as mymodule.mymethod), it would then be easy to see if a module repository contains "mymodule" and load it automatically.

Modules should come in two kind:

From the VM point of view, they should both provide the same result: creation of a collection of objects with slots, functions and so forth.

Another commodity would be the support of "precompiled" (or at least pre-tokenised) files.


Now let's consider agents & mobility.

Each VM can be considered like a little world of its own. An host application may run more than one VM (say, one per thread in a multi threaded application, by the way threading is another issue that should be elaborated later). These VM will naturally interact with the host at some point, but on top of that they could interact between each other. Even more, they could interact with some other VM in another application on the same machine or another machine thru a network connection, if available. Yes, I mean RPC.

This would allow Hint to perform distributed computation tasks easily. There are many ways to implement that: with language support, via specific modules, or by help of the application. Modules are probably my first choice, which means the functionnality is part of the framework, not the VM per se.

RPC, aka Remote Procedure Calls, generally involve discovery of neighbourgh nodes and then requesting a function with its parameter and waiting for the result. A host of problems are generally associated with RPC, ranging from arguments-vs-references passing to byte ordering, size of integral types and incompatibilites of host implementations. Most of these are details that are both largely referenced in the literature yet nearly impossible to solve universally, and I do not expect to do any better here. Yet it is promising to examine them.

An agent is a task, or more exactly a collection of objects, methods, structured and workflows that describe a given task. What makes an agent different from a normal task is mobility, i.e. the ability of the agent to be transfered from one VM to another. In this way, it suffers from complexities which are similar to RPC, if not more since an agent may be a collection of various objects that must keep working together.

These issues should be addressed later, and they can be implemented as supplementary modules part of a framework rather than part of the VM. Ideally, the bottom line is that I'd love to have some kind of support that would allow the state of a collection of objects to be frozen and captured. This state could then be saved locally, allowing the VM to be persistent, or transfered and thawed remotely. Some VM support may be needed for that.


Summary:


Implementation, not the real one, just one big overview with misc details, as it trigger interesting questions for the language specification...

Parser:


«»  2004/04/21  «»

Virtual Machine:


6. Minimal

«»  2004/04/29  «»

and 20040430

The idea here is to provide the minimal specification for a prototype.
It can be seen as an XP iteration 0.
But also, it can be seen as finding the minimal core of the language. The language can then be grown from there, extensions can be added. Most of these extensions would be added to the implementation directly to help with performances, but that should not be necessarry. Ideally there should be an atomic core which could be used to implement those extensions as well.

For example, imagine the keyword new in C++. For most, this can be seen as a basic atomic brick provided by the language, something that is a bare requirement for a C++ implementation. Yet technically the new operator can be overriden and reimplemented by some C++ code. And in most advanced projects I know, this happens once, for example when one wants to added debugging and memory leak tracking to the allocation system, or to create an array class which manages its own pool of memory. That and the fact that most of the time there's something called the C runtime library which itself implements new by managing the pool of memory with the use of malloc.

So here it's the same. I want a minimalistic set of intructions that allows for more complex idioms to be built on top of them. That will help define the step in the implementation too.

To be more precise, the very first step of the implementation should result in the construction of a well defined and well behaved language that is mostly useless (typically not more useful than bc, and probably even that is already too much.)

So here are the necessary components I see:

The first component obviously parses source code, either from file or from a memory string. At first it should be splitted in these 3 small parts, that is a lexer with separated syntax and semantic phases. With time those will probably need to merge to interoperate better. (Note: it is obvious here that this a direct reference to [Simondon's thinking about concrete versus abstract objects|http://ralf.alfray.com/perso/ht01rapp.html] (French, [Barbaric English Translation here|http://translate.google.com/translate?u=http%3A%2F%2Fralf.alfray.com%2Fperso%2Fht01rapp.html&langpair=fr%7Cen&hl=en&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools])).

The output of the analysis should be an AST that is directly suitable for use by the VM. Ideally the VM work should be minimal.

The GC should also be minimal in this first iteration. Something like a non-efficient mark-sweep garbage collector will do just fine for a first iteration.

Let's start with a minimal language specification:

Then from generation+1 to +2, in increasing order:


«»  2004/04/30  «»

Note: I started writing that in the plan page, then I realized this is still prospective research rather than implementation detail. So it's back here on the dev page.

Affectations

In 20040403, I wrote:

	:affect:  var = expr

Setting a value should take an expression on both sides:

	:affect:  expr = expr

That allows for writing arbitrary left-hand expression in the grammar, for example:

a = 5
	a[i] = some-function
	hint.module_a.b = a[i]

Semantically, not all expressions can be valid left-hand values. They need to have a writable property. Here are some more advanced examples:

	mod.obj.func1 = a
	mod.obj.func1 2 3 = b

This last one may sound crazy. It is not. It all means in the end resolving the type of the left-hand expression. Consider this last example again:

	list = array [1; 2; 3]
	func f1 - lst a b - return lst[a+b] end
	func f2 - a b -     return     a+b  end
	f1 list 1 1 = 5
	f2      1 1 = 5
There the affectation using f1 is semantically correct. f1 list 1 1 is a reference to the second element of the array list. Of course the affectation using f2 is wrong, since it, is it? Well actually it is not. It is merely ambiguous. In an hypothetic world, f2 could be specialized, for example I could want to write:

	f2 a b => a+b
	f2 1 1 => 5

Think Haskell or ML here! That's a typical function definition for these languages. Yet I'm not trying to go there here.

The other semantic is that f2 1 1 actually returns an object meaning 1+1, in a lazy world, that has not actually been evaluated (since it's value wasn't needed yet). In this context, expressing 1+1 <= 5 is obviously semantically illogical and thus is an error.

Slots

All this brings the necessity of describing objects and slots again. These are an integral part of the language and its implementation will depend on it.

An object, or properly speaking the instance of an object, is composed of slots. A slot has a name and a value. The slot per itself does not have a type; Only the value affected to the slot as one. Since Hint is a purely object language, values affected to slots are always objects. There are no integral types in the system.

The basic types are:

For example when affecting an integer, what is actually affected is a full object instance which represent the integer. That means that even an integer object will have a set of slots.

Some slots are present in every instance. Some slots should be read only and defined by the implementation. Most other slots will be read-write.

Actually it could be argued that all slots should be read-write. Even those that could be reserved by the implementation should be modifiables. This would allow more flexibility to the system and allow hacking (read ''ability to create unpredictible mess'').

Here are the minimal slots that every object should have:

In the context of a class, there could be:

It has not been clearly defined how to access to these slots. Obviously, I'll use the dot as a selector:

	var a = 5
print a.type print a.read_only print a.value print a

The value slot should be implicit.

One shortcomming is that 5 in itself is an instance of an object but I will not allow right away to write 5.type, will I? Same goes for "string".type. Well see later. I see no reason for that except to please those who like obscure languages.

Now what is a type?

I could make it an intrisinc type, such as integer. I could use a string. It could also be something that cannot be accessed by itself.

Rather I see a collection of types:

	hint.types.integer
	hint.types.string
	hint.types.function

Each object is an instance and is read-only. The implementation will have references to them, or there could be a magic slot inside (hidden from hint, visible only in the implementation) that indicates their intrisinc type (i.e. an enum).

Now consider this for checking the type:

	var a = 5
	if (a == hint.type.integer) ...

And then how to obfuscate your code in 3 lines:

	var t = hint.types.integer
	hint.types.integer = hint.types.string
	hint.types.string = t

LOL :)

Variables

Finally, let's examine how to create variables.

On an auxiliary note, the :affect: action lets change a slot. By default it will automatically create a slot. Since slots are basically like variables, it should not be automatic to create them (for the same reason that var should be used to create local variables, that is to prevent obvious minor semantic bugs when writing programs because typos are common human errors so it would be foolish to expect someone to write without typos).

The IO language uses the operators = for affectation and := for creation (or the reverse). Here that is not really necessary, although it would be just fine. An alternate keyword such as var could be used.

By the way var is supposed to create a slot in the current scope. Yet there's no reason it be different to create a slot in the current scope than in another object's scope (it can be argued -- and I will later -- that a scope is an object).

So one way to solve this would be to write:

	var a 		=> local variable in the current scope
	var this.a	=> variable in the scope of this
	var obj.a	=> variable in the scope of another object

It's not entirely satisfactory. More later.

Scopes

I mentionned scopes before.

A scope is a portion of code that is embedded in another one, generally with boundaries established by a structuring verb, such as class, module, if-then-else, do-while, etc. It provides a local context for holding temporary variables and implicitely inherits variables from all parent scopes.

In this sense, a scope is not too different from an object with slots. An object also inherits from its parent.

Now if a scope is an object, can it be affected to something?
That means, could I write something like this:

	var a = if some-test then some-code else some-other-code end

Technically I guess I could. One of the ideas expressed (if I haven't forgotten to do so) earlier is that any construct such as this if-then-else is actually syntactic sugar for writing an expression, so technically this should be the same code:

	var a = hint.if-then-else(some-test, some-code, some-other-code);

But what does this do?

Well, it's simple. Like a function is actually an operator that creates a new reference on an object that will let one execute the function later, this is a reference on the if-then-else code for later use.

In a way, it's like an anonymous function without arguments. It's a macro :-)


7. Beyond Minimal

«»  2004/05/06  «»

Let's go even beyond the minimal specifications presented above:

So from a language point of view, what do I need to make this work?

There are some missing pieces here, of course mostly for simplication:


8. What is Hint?

«»  2004/05/29  «»

What is the purpose of Hint?

«»  2004/05/31  «»

Here's another approach:

«»  2004/06/07  «»

What is a lexer? Something that examine inputs and finds two kind of expressions:

Perl-like regular expressions would be just fine. To start I don't need all that complexity but soon enough it would grow similar. Note that in .Net System.Text.Regexp does not exists in the .Net Compact Framework. Anyway relying on one particular host implementation of regexp handling is dangerous, as I want Hint to be host-agnostic as much as possible.

Strategies for a regexp engine?

IL here means "intermediary language" and could be anything. It was just a fancy way of saying that the first option is to interpret a tree on the fly with a bunch of case statement whilst the alternative is to generate code on the fly and have it executed.

The obvious realization here is that Hint is a language that is compiled into some kind of pseudo code and interpreted by a VM. Thus the regexp engine could simply generate whatever IL code this VM is using and have the Hint VM (HVM) execute it.

Vocabulary: HIL is the Hint Intermediary Language and HVM is the Hint VM.

So we would have a workflow like that:

It doesn't mean the lexer or the compiler needs to be written in Hint (although it could, but that would be a completly different excercise).

It means the host implementation could be something like that:

One of the idea of Hint is that lexer syntax can be redefined when parsing the source. That means the bootstrap sequence above shouldn't be too much different from the compiler code that extracts the lexer defintions from the token stream.

Seems like a lot of work, don't ask me for implementation details yet.


9. Still Searching

«»  2004/06/21  «»

I'm still examining "alternative" languages that would influence the design of Hint.

Currently, I'm having a look at Lisp and Scheme.

Some may argue that Lisp is the ultimate high level language.

I wouldn't go that far. Although I agree that Lisp a very elegant design, IMHO it is somehow the assembler of high-level languages.

In the same way assembly language are bare-to-the-metal, as close to the CPU as one can be, Lisp is as close to the computer's view of a programming language as one can be. Imho programming by forming nested lists of instructions means actually coding in what is easy for a computer to interpret, not coding in something that is easy for a human to interpret.

Sure Lisp is powerful. But it is also a very arcane and mostly write-only language. Sure it has a "unified" list syntax, yet functions and predicates look like random keywords lost in a soup of confusing parenthesis.

So no but no thanks. Hint is not Lisp. There's actually no need to reinvent Lisp. Just leave it where it is used best, in dusty AI labs.

This also applied to Scheme.

I personally fail to see the difference between Lisp and Scheme. The syntax is the same as far as I can tell. Scheme's lambda function seem to me just like a fancy name for anonymous functions. Maybe they match some reality in the lambda calculus theory, but from my procedural programmer's point of view these are just... functions. I understand the need and the point of accepting functions as arguments and I understand the need to return functions yet these do not just blow my mind away. (On the other hand, Haskell lazy evaluation did blow me away completly.)

BTW I still don't understand the big deal with closures. Seems to me just like the ability to return a function to let the caller execute some code with a minimal effort.

I'm not saying that closures or lambda functions are lame; at the very contrary, I find them very useful. But maybe I just got used to it. I find anonymous functions and the ability to manipulate functions as objects simply a natural and necessary evolution of high-level languages, something I'm glad is finally becoming mainstream in a world so dominated by the ugly C++ language. Maybe it's just because I'm aware there's better than C++ out there (when most of the people around me still only talk Java or C++ and are just content with it.)

So what's the point here? Lisp is cool. Yet the syntax sucks big time.

Then I realized what I want to do with Hint is not so different from S-Expressions (the fancy name for saying lists contain processing instructions and data.)

Consider the idea that everything is a function call:

	hint.some-function(arg1, arg2)

This is the same as writing:

	(hint.some-function arg1 arg2)
or
	(hint.some-function (arg1 arg2))

Consider this:

	a = 1 + 2
	hint.set(a, hint.add(1, 2))
	(hint.set a (hint.add 1 2))

One cool thing is that I could almost write Hint as a meta-language that generates Lisp code. No kidding.

The lexer/remap in Hint should be eventually powerful enough that it would allow any of the syntax above to be specified using the remap feature. That means it's more than just defining the lexical stream, it's also about providing semantic and processing for it. Sort like writing a Lisp macro?

Here's a pot-pourri of ideas and issues:


«»  2004/06/26  «»

Some misc ideas related to lazy evaluation and program workflow.

Workflow and groups:

	func a - n - return 2 end 		=> 1 group
	func a - n - (return 2) end		=> 1 group containing one group "return 2"

Lazy evaluation:

Other points extracted from looking at List and Scheme recently:

	var n = 2
	func - n - return n+2 end

	func f - a - a = a*2 end
	var l = new list(1 2 3)
	l.foreach(f)

I said foreach takes a program as argument, not just a function, so I want to be able to write this:

	var l = new list(1 2 3)
	l.foreach( func - a - a = a*2 end )

This was obviously a lambda function. Obvious. Now consider this:

	var l = new list(1 2 3)
	l.foreach(a 2)

Obviously foreach gives one argument to the function. So the question: is this function a going to execute as a 2 x or a x 2? Logically the former, hmm?

Note that these little examples solve (or at least expand on) several issues I had in the past:

	var l = new list(1 2); a + 2;
	var l = new list 1 2 ; a + 2;
	var l = new list 1 2   a + 2;

Several questions here:

Of course I have no idea at all how to detect errors statically when both the syntax can be redefined and the notion of stream make the program free form.

Note: I realized that wasn't stated above although it was kind of implied in my mind that a function needs to access its arguments as a list (like JavaScript has a ".args" slot or something.) The difference is that it's not just the argument as much as it is an access to the caller's stream. So yes explicit grouping here makes a difference. That is if a function needs arguments and there's no group, it gets the current stream. On the other hand if it requires or read arguments and there's a group, it is limited to the group. That allows the programmer to have a bit of control on what happens. The arguments reader may need to be a specific object that remember how far the function has been reading.

From a security perspective, the fact that a function could access its parent scope will scare a lot of people.


«»  2004/06/28  «»

OK so although there's a lot of details left, I have the feeling the language semantics and behavior is going somewhere.

Now let's consider the other side.

As I said earlier the language is just a medium. What's important is the framework.

Here let's assume I want to implement this:

«»  2004/07/07 «» continued  «»

Now imagine I have a Win32+MFC implementation vs a .Net+Windows.Forms implementation. That is because of course I will want to be able to write UIs so I'll provide a .Net Windows.Forms binding. At some point I have something that creates a window and I need to set a callback...

How does the Hint code gets called from the host's implementation?

Assume Hint is multithreaded. Meaning the VM executes several Hint virtual or real threads in parallel. In this case if there's an asynchronous call in the host, it should call a C# callback that will defer treatment to the appropriate Hint routine.

If Hint's VM is not multithreaded, it is assumed the host would need to regain control before the callback calls Hint again. It's "just" a nested call.

Sort of. The details are fuzzy.


«»  2004/07/07  «»

I don't need to implement Hint on top of PHP if I want to use it in a web server. All web servers can call external CGI apps (or ISAPI Dlls for IIS) and get the request info from the environment. The only limitation of doing so is that PHP does a lot of things that would need to be reimplemented (session management, query & html-entities decoding, sending HTTP headers, etc.)

A first implementation should be in C/C++ with a rustic garbage collector. Eventually the Boehm garbage collector should be plugged in. So the ideal situation would be to have a layer that abstracts the garbage collector and let the implementation use either Boehm or nothing. Nothing meaning that stuff gets allocated but never freed -- this may seem stupid but in the context of a web server CGI call with a short lifetime it may make a lot of sense.

A first implementation should be both in C# and C++ actually.

One way to do this is to write some simplified C# or C++ code (take the intersection of both) and have a quick rewrite utility that will dump valid C# or C++. This can be done at a class level, leaving the environment/setup/framework to be handled by conditional code and/or an adaptation layer (for example mimic System.Collections.Array by writing a simple class using the STL.) Limitations are numerous when doing so (C# delegate, fancy pointers to C++ member functions, references vs pointers, etc.) but it means most of simple code of a class can be dealt with. The truth is that in most C++ projects of descent size I've worked with the underlying architecture was hidden by an adaptation layer (or system kit, as you may want to call it.) So overall it won't make a big difference.

Now maybe that common code could simply have a Hint syntax... It would just need to be non-prototype-based and non-functional.


«»  2004/07/19  «»

Looking at boo, there is a couple of amusing things there.

Most notably the fact that some of the things they do as built in such as regexp could actually be done by add in code.

Also they invert the symbols for list and arrays according to my taste. This means the idea of hint being able to define the syntax on the fly is really a good one.

There are two cases though.

First we should be able to simply change an existing lexem of an existing grammar rule. For example I like having brackets for arrays and someone else prefers parentheses.

Second a module could create language extensions by adding its own grammar rules. This case is more tricky to design. It would involve defining at grammar rule (for example a regexp) and an associated function that will actually generate the code, which is possible if the program can access its own code as a list.

Good. What is the difference with Lisp?


«»  2004/07/22  «»

Concepts:

The idea of the bootstrap is pretty simple: a minimal set of lexical, syntax and VM rules that define the core of the language; using these simple rules, a more complex language can be defined at runtime instead of in the host or the VM. To be useful the bootstrap should allow more rules to be defined and existing ones to be redefined or deleted.

The IRE is a mix between an IDE and a runtime environment; The editor should tokenize on the fly and internally store the program directly into its final form (for the VM to execute).

The editor should allow incomplete or incorrect programs to be stored; these should be exported / saved in a textual form that could be edited manually.

The IRE must be a runtime environment closely coupled with the VM and as such display realtime debug information and allow the equivalent of edit and continue. Another useful feature is preserving the state of the VM (for backup and for start/stop, => aka persistence at the VM level.)


«»  2004/07/23  «»

Trying to define the bootstrap... what is really the minimal of the minimal, without getting to the level of BF (brainf..k) nor the one-instruction virtual machine (substract and jump if negative)?

For example: are there types? To be minimal, they shouldn't be explicit. The VM can convert on the fly on what is needed and specific functions can be done to force convertions.

The single basic structure needed is a list (car + cdr) as mentionned earlier.

So where amd how are the modules, classes, functions and variables defined? Should the VM know about them? Or consider the program like a real CPU does, that is a soup of instructions located at a numeral address and with offset jumps?

Obviously the VM can have more structure than that. But on the other hand, it doesn't need to much about for example modules vs. classes nor functions. It just needs a notion of scope. A scope is block, it contains other inner blocks. The block may need specifiers, as are inner blocks by default visible to the outside, etc.

Blocks need a type. For example the whole VM is a block that will contain module blocks and these modules blocks will contain class blocks which will contain one variable block and several function blocks. From the VM point of view, it may not matter that the block is module or a class. It just matters that some specifiers tell it how to manage the accessor scope.

From that point of view, a block is a data type.

So the VM has types. It should define a "block" structure, which is actually just a list and a semantic for its interpretation.

Example:

So every slot would have these attributes. Are they visible by default? Is an attribute a slot that also has attributes, thus creating an infinite loop when automated reflexion is used? Ideally yes but in reality probably no as it wouldn't be very practical.

Another view:

module Something
	class Foo1
		var Var1
		function Fun1
		function Fun2
	class Foo2
		etc
module AnotherThing
	etc

But it you look more closely you actually get:

slot(name=Something, attr(type=module, readonly, public))
	slot(name=Foo1, attr(type=class, readonly, public))
		slot(name=Var1, attr(type=var, private))
		slot(name=Fun1, attr(type=function, public))
		slot(name=Fun2, attr(type=function, public))
	slot(name=Foo2, attr(type=class, readonly, public))
slot(name=AnotherThing, attr(type=module, readonly, public))

And at runtime in Fun1 you get:

slot(name=this, attr(type=list, readonly, public))
	slot(name=ParentFunction, attr(type=function, public))
	slot(name=ParentClass, attr(type=class, public))
	slot(name=ParentModule, attr(type=module, public))

Here there's an XML correspondance: a slot is an XML node and an attribute applies to this node. That means attributes are not sub-slots per se. So there's no longuer only one data type, there's at least two.


«»  2004/07/24  «»

What I called an "IRE" is just a fancy name for an IDE closely coupled with a runtime environment -- which I generally call a "VM" for short, although it's probably more than just a VM since by runtime I also mean the integration with the host (think access to the lib C or .Net's System.)

So what is exactly the VM? Several things:

Note that this notion is close to the self hosted environment which are best represented by Squeak and Self, with their own window manager, their own "multimedia" space (especially for Squeak) and their own object browser & editor as I mentionned before.

Now the IDE just sits on top of that. It's just a way of visualizing the environment and controlling it.

Ideally the IDE should really be on top of that, i.e. with a clear separation between the runtime and the IDE.

This separation would come handy to allow the IDE to control several VMs or to control remote VMs. Think socket connection on another machine.

Ideally the whole VM state should be able to be saved in two or three ways:

This last one would allow a current state to be transfered between two VMs. Imagine you have two VMs with the same source (some kind of hash can and should guarantee that), then you could "duplicate" the state from one VM and sent it to the other (which may be renote.) Brings back to the notion of agents ;-)

Ideally this notion have a smaller granularity, such as per module, or per instance of a class. It would assume the memory knows all the data that a class needs to run (either implicitely thru the garbage collector's data or explicitely thru an API used by the class instance.) Then the instance could archive and send itself to a remote VM. This is yet another way to express persistence too. Many issues would probably arise from trying to implement it naively just like that, but the idea is more than cool, it has the potential to be actually useful.


«»  2004/07/26  «»

Here I've been looking for something for a while yet I could not really pinpoint it out. I finally remember/find out what it is: a programming language is only a given represention of a given algorithm. So ideally the language should not matter, right?

One of my original goals was to create a language that could produce PHP, C# or Java output. I quickly found this is possible yet not optimal. For example PHP lacks all the framework available in C# and Java whilst those lack the simple stdlib-C like calls of PHP. The type system is different too. Using only the common denominator would be utterly annoying.

Yet one can dream that the algorithm described by the data processed by a VM is high level enough that it would be possible to reformat it to source code without loosing any expressive power from the original source.

Basically, if I write some source code and I have it "encoded" (compiled?) into a VM suitable representation, can I expect to get back the same source code expressed for any given language?

Obviously no, for two reasons:

Taken to the extreme, i.e. converting the assembly language, compilation means all human-like identifiers names are removed. Even in the context of a very high-level VM data source that could keep the identifiers names (and actually needs them, for example CLR), the original source code is more than the mere algorithm. It also contains comments. It also contains a layout, which has a human meaning (althouh something as indent could recreate the necessary layout.)

Encoding a source file into a binary representation is not a bijective function.

Unless the source file is stricly similar to the binary representation, but then it can not be language agnostic.


«»  2004/08/23  «»

Initially the main goal of Hint was just to think of a programming language which textual syntax would be friendly to a PDA -- that is could be easily typed with a stylus using as less esoteric symbols as possible.

This idea quickly got combined with my desire to understand how to implement a language similar to Io, that is a prototype-based language with its own VM.

I added to that goal my desire to find something that would let me write code that could work as both C# or Java.

There are several problems here:

Finally I tend to realize the problem is not in the language per se but in the environment. The language is just one component of a grander scheme.

So first let's examine my real goal: what I really wanted was a way to code on a Palm Pilot Pro with 1 MB of RAM. That goal is no more as in between that device got upgraded to a Pocket PC with 64 MB of RAM and a much more usable processor. Part of the problem here is "what for" and there's no actual answer to that besides "why not". That mean what I want is a truly generic language, with a powerful IDE that would let me do RAD -- Rapid Application Development -- with some kind of UI editor too. No less. The original goal came from the realization that most text languages are nearly impossible or at least extremely frustrating to write with a stylus due to the high number of symbols present, starting with braces and brackets.

Let's also examine one of the goal, or more precisely anti-goals: I do not want to reinvent a procedural language just for the sake of it. I already know too many of them and I do not want to have to learn one more syntax for my own projects. Nor do I want to create something that I would not use. And if I use it, it would need to last, as I wouldn't like to write projects in a soon-to-be obsolete language/VM system.

In between I also got very frustrated with PHP class syntax and IDEs, and when trying to understand why I also found out why most prototype-based languages are an issue for me: the lack of strong typing. This has two influences. First when reading some code, these language not having explicit typing mean that I do not know the type of a variable thus I can only infer its meaning from its sole name. There is some ambiguity here which is part of the language's power -- the type of a variable is determined at runtime and thus can be adapted as one see fits. Although that is nice, it does mean that this ambiguity is present in the source code when written. It also means it is not possible to create an IDE that will determine the type of a variable and offer adequate completion directly in the editor. Unfortunately I realized recently that this is a great feature that both speeds up development by help me, the programmer, and reduces errors but detecting them as early as one types some code.

So on one hand I like the generality and the suppleness of prototype-based languages and in the other hand I hate it for the same reason -- it offers to much power and it is difficult to harness this power. That is of course because I acknowledge I make errors and any mechanism that helps me eliminate these errors as soon as possible is good for me. Yet having to specify explicitly the type of each variable in a function's declaration is annoying. Too much typing.

An interesting intermediate level is offered by ML/Haskell and the concept of inferred type system: a variable's type is not explicitly listed in a declaration but inferred from the callers to that function. For a self-hosted program (i.e. one that does not use external libraries), that means an intelligent IDE could try to detect the various types a variable could get and offer completion and verification for those types. It would also promote a top-down approach of coding. Verification could be problematic as sometimes a variable's type may not be possible to infer. Especially if the system allows for generic libraries to be developed it would be hard to guess the type of the future calls made to the library. In this regard it may be useful to give hints about what types a variable should be able to get, either limited and fixed or just the ones most likely to happen. This in turn may result in a clumsy and verbose syntax.

Now my real goal here is to define a RAD environment. Something that allows me to toy with an idea by writing as less code as possible, automating most of it has necessary. The underlying language needs not be a custom one. The system needs not be based on its own VM. Having a specific VM or a custom language introduces a major workload that can only be justified if these component add a real edge to the whole system. Otherwise a RAD environment could be as well custom crafter to generate C# or Java code, relying on their associated frameworks and, why not, simply be written as extensions to their associated IDEs such as Visual or Eclipse, both highly extensible.

The thing is currently I feel like I spend more time writing code for the sake of the language's syntax than for my own algorithms. For each new project I need to rewrite the same base framework. For each class I need to rewrite the same empty class file with constructor and so forth. For each method I need to write the same few lines. Each time the code is similar yet a tiny bit different. This part would be easily solved using templates and wizards that create a new application, add a new class, add a new method, etc. Now on top of that the ideal IDE could list all these classes and methods and show their interdependencies in a textual or graphical way. This too can be easily automated by IDE extensions and static analysis of the given source code; the obviously difficult parts here are parsing the source code in a meaningful way, finding the relationships and especially finding an attractive and intuitive them to display them graphically. These are examples where having a custom language and a custom VM adds nothing.


«»  2004091  «»

This is temporarily (a.k.a. ever) on hold.


«»  2004/10/27  «»

Two different projects:

MiniHint is a first prototype based on a simplified syntax and feature set.

Typical source for MiniHint:

module alfray.hint.test
// IDE most have knowledge af class hierarchy and set module accordingly. 
// One instruction per line. Auto wrap in IDE. Auto tab.

class sample 
  var a = 5
  var b = 3.14 
  var c = (1,2,3) 
  var d = ( x=5, y=6, z=(1,2), f=fun x - return x+1  end, d=a+-b+c) 
  var o= new sample 
  var p= new sample ( x=-5, y=6)  
  var q= new sample (5,6)

// fun is anonymous, function is named. 
// all blocks start with a keyword and end with "end" 
    function sample - x y -
      a =x 
      b=y
    end 

    function f1 - x y z - 
      x= a+b+ y-12 
      // c.foreach fun t - x+=t end 
      c.foreach t - x+=t end 
      if x<y+z 
        x /= 2 
      end 
      // if x > 2 - x+=2  >>No! 
    end 
end // class 


«»  2004/11/19  «»

Ideas (for Hint in general, probably not for MiniHint):


«»  2005/05/06 «» Big No-Way  «»

Here's a summary of what I got so far in Hint -- I'm not quite sure it was explicited above, I'm too lazy to reread all this, but that's what I wanted to express at least:

Let's assume we do all this. What do we get? Something like this:

I'm not even trying to be exhaustive here. At least I know where I do not want to go today.


«»  2005/05/17 «» From Scratch Again  «»

This is a literal transcript of some scratch notes -- literally random notes on a piece of paper -- to see how I can rethink the whole idea of Hint from scratch.

Don't worry if it makes no sense to you. Scratch notes on a piece of paper translate poorly into well written notes. These notes are more conceptual that anything else, they synthesize many unspoken thoughts. And although this page is linear, thinking is not a linear process -- idea gets created, written down then rejected then finally transformed into the next idea. The intermediary processes are only partly relevant.

So here it starts:


[ Apache ] --> { PHP instances } -->(socket)--> { VM } --> [ DB ]

A VM containing several processes in parallel accesses one instance of a DB (will require locking). The VM doesn't plug directly in Apache. Instead it is accessed by a PHP script (typically over a socket). There's only one VM but many PHP instances (as many Apache requests) talk to several processes inside the VM.


[ IDE ] --> [ Hint VM      ] --> [ agent/entity group = 1 thread ]
            [ .Net/Java VM ]     [ { objects }                   ]

The Hint VM is embedded in the host VM (.Net or Java). An IDE is needed to access Hint (not just a text editor, a complete interactive IDE that allows one to create/edit stuff.) The VM manipulates "agents" or "entities", each with its own thread. These are composed of a collection of objects.

Terminology: The word "agents" or "entities" means nothing specific here. They should not be attached any prior semantic. The idea was to avoid using terms such as "class" or "module" which semantic is too strong.


[ agents ] <--+
    ^         | message dispatch
    |         |
    +---------+

Agents are entities that communicate solely by dispatching messages to other agents (or themselves). Messages can be dispatched to agents on the same host or on transparently distributed hosts (some kind of proxy may be needed.)

Note:

The important concept here was that groups of classes cooperating together are... grouped. This group is called either a module, or an agent or an entity. The name is irrelevant. What is relevant is that this agent is defined by its classes. When it is instantiated, the instance runs into its own thread (implied: with its own event loop) and can only communicate with other agents using asynchronous message dispatching.

Agents' instances reference each other.

The granularity of the agents is up to the designer but it should be pretty low. In this idea for example an implementation of the MVC (Model-View-Controller) pattern would use three different agents for each component. Each agent would communicate asynchronously with the other ones without knowing if they are located in the same host VM (or even on the same machine.)

The other part of the concept is that an agent's instance could migrate between hosts without breaking references to or from this agent. And it's really the instance that migrates. Persistence is implied.

Obviously this has a lot of implications related to speed and bandwidth, etc.


Hint/.Net ==> Produce CLR. Possible on CF/PPC?

Hint: PDA != desktop, i.e. limited vs. full ? Requires different libs.


A program (a.k.a. what is traditionally called "source code" and includes resources other than just pure source text files) should actually be considered and managed as a database: sources, classes definitions, instances of these classes and persistence. A live editor (IDE) would allow introspection and edition of these resources.


Purpose of Hint:


Test implementation:

Layout/scope:

Module = 1 thread

Program:

DB: Editor --> live queries

DB <--> Source: source = db dump. 2 Formats:

Example:

<module ...>
  <method>
    <name>foo</name>
    <args>
      <arg>
        <type>stream</type>
        <name>foo1</name>
      </arg>
      <body>
        <var>...
...
Or in the other syntax:
method foo(stream<int> foo1, int foo2): stream<blah>
  var a = foo1.get
  var b = blah.new(a)
  out.put b

Once this is imported/edited, types can be infered as follow:

method foo(stream<int> foo1, int foo2): stream<blah>
  int  a = foo1.get
  blah b = blah.new(a)
  out.put b

Notes on the syntax:

Notes on the language:

Notes on the streams:

One of the major ideas is to promote a stream workflow as a language concept.
Methods accept blocking streams as their input and get or put values to/from these streams when needed.
Also a backward-event model could be defined where a method which outputs a stream is notified when its output is consumed.
This is linked with the idea of asynchronous distributed agents and lazy/late evaluation.
A typical simplified example would be an function that emits a stream of random numbers. The method could say it wants to block till its output is used. The caller of the method tries to get a value from the stream, which unblocks the method, resulting in the computation of a value.

A pseudo code for this example could be:

method rand(void): stream<int>
  var f = fopen "/dev/random"
  while out.open
    try
      out.block
      int a = fread f
      out.put a
    catch ex
      pass
  fclose f

then this could be used as follows:

method do_something(void)
  stream r = rand()
  for i in [0..9]
    print r.get

In this pseudo-code example, one would assume the affectation of rand() associates the output stream to the variable r. When this variable gets out of scope the stream is closed, which makes the "block" method throw an exception.


Flow control could use semantics from the shell:

So we can have something like this:

job j = obj.method &
job j = obj.method.start(x, y)
j.abort
j.wait
j.wait(timeout)
j.sync(j2)

If method1 outputs a stream and method2 has a stream for its first input:

method1|method2 would link the first input of method2 to the output of
method1.


«»  2005/05/21 «» Bootstrap  «»

The language definition can be defined in terms of a bootstrap:

The bootstrap would include basic types and a minimal set of language constructs or objects/methods. An example would be the "if" construct.

The bootstrap methods should be defined in terms of the language itself basically to make sure they can be replaced if necessary.

Example of declarative syntax:

hint.if expr e1 [else expr e2]
hint.while expr e1

This is enough to be able to actually implement the next language construct:

hint.for expr init; expr test; expr end expr body
  init.eval
  while test
    body.eval
    end.eval

Post-note: Amusingly, that if definition is flawed. This one would work better:

hint.if expr e1 expr body1 [else expr body2]

Base types:


«»  2005/05/25  «»

Scope:

Attributes are internal value associated with a definition or an instance. The user cannot create new ones. Some may be read-only or read-write.

Defining a module or a class will append to an existing definition (thus allowing partial definitions and/or overriding) unless the module or class has an attribute ".sealed = true" (and only an instance of a module/class can change its own attribute.)

Two main ideas:

This means that affectation creates a reference unless explicitly noted otherwise.

The syntax should reflect that everything is a function and be uniform:

method [ parameters ] [ new line ]

The syntax should be minimal yet complete:

The other main idea: "expr" is a base type. It represents a piece of code. Any method definition can take an "expr" as argument. An expression is what is generally referred to as an expression in grammars, that is everything from a numeral to a function call, a complex expression or a block of statements. In other words it's one or more statements, i.e. a bunch of statements having the same scope and returning the result of the last evaluated item.

A special attribute ".syntax" will define the syntax of parsable items.

Users can create new constructs simply by declaring them and their syntax on the fly. Nodes with a ".syntax" attribute are added to a special list of parsable items with the following conditions:

Keywords or methods defined in the bootstrap can still be overridden. A user simply needs to save a reference to the original one and create a new definition. Imagine for example you want to change the keyword "if"; the following code may be an example of how to do so:

hint.if_old = hint.if
def hint.if expr expr_test, expr expr_body, expr expr_else = nil
  .syntax = "if /expr_test /expr_body [else /expr_else]"
  if_old expr_test
    expr_body
  else
    expr_else

The syntax should be able to express more complicated structures such as arrays:

.syntax = "\[ expr [ \: expr ] { \, expr [ \: expr ] } \]"
Where you can see that the desired syntax is PHP-like as an array can contain values or dictionary-like entries (this syntax needs some more work though.)

Finally expressions are not evaluated until absolute necessary. In the example of the redefinition of "if" above, the various expressions are not evaluated when received in the new method. Instead they are passed by reference to the old method, which may decide to evaluate them or not. This is why a definition of a "for" construct may look like this:

def hint.for expr init, expr test, expr end, expr body
  .syntax = "for /init \; /test \; /end /body"
  init.eval
  while test
    body.eval
    end.eval

If would be more elegant to make it a shortcut to use "()" instead of a direct call to the inner method "eval". It would only be a shortcut though:

def hint.for expr init, expr test, expr end, expr body
  .syntax = "for /init \; /test \; /end /body"
  init()
  while test
    body()
    end()

Note that "test" is not evaluated directly. It is transferred to the "while" construct which will be solely responsible for evaluating it as needed.

Finally it was said earlier that the syntax should have one instruction per line and no more. The definition of the attribute ".syntax" should probably contain explicit end-of-line markers. For example in the syntax for "if", there must be a line break before "else". Yet in the "for" syntax definition there must or may not be a line break before the semi-colons.

Actually it may be useful to differentiate single-statements from block-statements. This is enough to know where line breaks are needed. For example let's assume "/foo" means a single-statement is expected where "@foo" means a block-statement is expected. Thus the following definitions become less ambiguous:

def hint.if expr test, expr body, expr else = nil
  .syntax = "if /test @body [else @else] end"
  ...
def hint.for expr init, expr test, expr end, expr body
  .syntax = "for /init \; /test \; /end do @body end"
  ...

These would force the accepted syntax to be:

if something
  blah
else
  blah
end
and
for i=0; i<5; i++ do
  blah
end
The idea is that the original syntax is one thing and then the user can barbarize it has much as she wants for example by adding a required yet useless "end" keyword to end up a construct.


«»  2005/06/01 «» /proc  «»

Looking at /proc last time made me wished I could access a VM's content in pretty much the same manner -- either via a regular shell integration or via a custom shell. The idea needs to be developed but the basic points are pretty obvious:

	/hintvm/vm1/...
	/hintvm/vm2/...
	/hintvm/vm1/types/foo/bar
	/hintvm/vm1/mem/foo/bar/instance01
	/hintvm/vm1/mem/ptr/0x045451234

All the VM's runtime and static information can be presented this way. Obviously the exact hierarchy would depend on the semantic of the language. For example if all types are first-class citizens and have self-introspection (aka "mytype.get_class" etc.) there's no need to differentiate between classes, structs, enum or whatever, unless the language does make a point of separating them (i.e. C++ vs dynamic languages like hint.) / In the case of Hint, the global namespace (which is dynamic) is a unified representation of all types.

The runtime part, i.e. instances of classes and such, can also be presented, maybe organized by type and instance's id or by pointers if the language supports them.

For debugging purposes, one should be able to:

Permissions would be required for that.

There are important consequences:


«»  2005/07/20 «» IDE  «»

Ideally Hint should always be used from within an IDE that will provide direct tokenization, that is what is being edited is not a source file that is then compiled but directly the token stream.

Obviously, the IDE should provide a way to "export" to and "import" from a text file. The parser to import should be similar to the one used to edit text in the IDE. Parsing external source files into the IDE should be a common operation. For example it could be argued that the best way to create projects in such an IDE is to store the source in text files.

The IDE should also provide a direct access to a runtime instance (aka a VM state), and possibly allow one to have several such VM instances lying around. If a VM is a separate process (and I believe it should), the IDE should merely be able to "connect" to a given VM.

The IDE should have a notion of offline versus online projects. A better terminology would be disconnected versus connected.

In a connected project, the IDE is connected to a given VM. The "source" code is actually stored in the VM and is part of what is executed. Modifying the code of a given class should have a direct influence on the VM. In such a connected mode, modifications would not take place instantly. Instead after some kind of validation of the code all modifications could be committed into the VM -- think of it as a CVS commit operation (or more like a Perforce one: the operation should be atomic.) If we go this way, we might have well as change numbers and the possibility to go back to previous revisions. It should also be possible to leave current edits in a "pending" mode where they are saved in the VM but not used yet.

In a disconnected project, the IDE is a traditional IDE that edits source code simply stored in text files. Note that in this case the notion of commit exists to -- it's merely the act of saving a file.

Note that the connected mode will obviously have its limits. For example, changing the structure of data types may result in a loss of data. An obvious question is what will happen to existing instances of data types which have been modified -- for example a class may be deleted, or a member of a class may be added or deleted or its type changed. My point of view is that the connected mode is not suitable for such drastic operations. Doing so may render existing instances unusable. When code edits are synchronized back into the VM, an atomic merge should take place that will remove all objects of classes that no longer exist. The process is a "reduction" in the sense that existing instances are reduced to match the new definitions. If some class members are changed, deleted members will go away and new members will be either uninitialized or ideally initialized to default values and it is up to the existing or new code to be able to deal with it. New instances may also be created if some new static objects were declared.

So really this connected mode is merely intended to be used for two scenarios:


«»  2005/08/29 «» C/C++ vs .Net/Java  «»

Once again I swing back and forth. Since this is all just talk right now and there's no implementation, it doesn't really matter.

My latest point of view is that if Hint were to exist, the core engine would be better written in C/C++ rather than something higher level such as .Net and Java.

In fact that was never even a question: performance wise, it would be really really stupid to write a VM using an interpreted language that in itself runs in a VM. If performance is what I want, C/C++ is the way to go, period.

The reason for going towards Java or .Net/C# was merely for the fun of coding. Coding is C/C++ is not fun. That's what I do at work and at home I like to do something different.

Oh well, we'll see if and when it all actually happens anyway.


«»  2006/02/09 «» Syntax  «»

I just found out recently that Scheme as a powerful way of expanding its syntax.

The bootstrap needs at least var, set, if, def and while. The rest can be implemented using these basics.

Example:

 def hint.for expr init, expr test, expr end, expr body
   .keywords = (for in)
   .syntax = "for /init \; /test \; /end /body"
   init()
   while test
     body()
     end()


«»  2006/02/24 «» Top-down overview  «»

Let's start bottom-up:

Coco/R might just do the job.

We need a VM. It will be hand crafted and be mostly register based rather than stack based (actually both.) It should from the start incorporate persistance, introspection and a shell. Obviously the shell will have a Hint syntax. The shell is a command-line interpreter.

Later we'll want a JIT in the VM.

Then we need the environment. The shell can give enough introspection on the VM at first, yet eventually we'll want a GUI on top of that. Since no GUI is to be built into a VM, it makes sense for a running VM to allow outbound connections so we need a protocol for that. The GUI is actually a separate application that connects to the VM. Since the GUI is going to be written in Hint, it should be able to connect to itself. The protocol should be human readable (text, not XML.)

Now we can define the language, or more exactly the bootstrap language which will be embedded in the interpreter.

On top of that we can define the rest of language using itself.

We need some basic elements in the bootstrap language. Before that we need to define the complete set of the language and then reduce this to the minimum.

Let's start the language philosophy: what makes the language special?


«»  2006/04/03 «» Duck  «»

I was totally wrong in the previous entry: "duck" typing as nothing to do with static/strong/explicit typing or not. The "duck" philosophy is totally the reverse: types do not matter, what matters is what can an object do. Say you need to do some IO, you don't care about the type of the receiver as long as it as a "write" method with the proper arguments you want to use.

This means that "duck typing", or therefore lack of typing, can be emulated by the use of interfaces in strongly-typed languages such as C++ or Java. Just create a bunch of interfaces or pure abstract classes with the desired method and have your classes implement them.

Duck typing is more a philosophy than a language detail. Yet it has some obvious advantages.


«»  2006/04/03 «» VM or not VM?  «»

There are many implementation details that affect the whole scope of the project. One of them is the usage of a VM.

So far I assumed I had to use a VM. Do I really?

There are a couple alternatives:

 This has the obvious disaventage that you need many back-end generators for every
 processor or host to support.
 The intermediary compilation phase adds complexity (especially if the choice of
 compiler is free.) The added benefit is that we can rely on the existing
 optimization of existing compilers (a huge bonus).


«»  2006/04/15 «» Streams  «»

One of the concept explored here is the one of a stream.

Let's assume we have an input file. It is first parsed by a lexer, which will provide a stream of lexems (token, line, column, type.) In a naive compiler, the lexical phase is generally separated from the syntax/semantic phases. However in most compilers, the lexical phase is mostly a state machine that is called as a "co-routine" by the real parser, i.e. the syntax/grammar/semantic combined phase. The parser simply asks for the next lexem whenever it needs one.

So here let's just assume we have a stream of lexems. However the parser has one trick: it doesn't care about lexems, it cares about expressions. Lexems are reorganized in a stream of expressions.

Now we just have to say that the language has only 2 basic rules:

So what we have is a stream of expressions. Each expression is composed of a function name and its parameters. The parameters are expressions. The function name can be an expression too.

In fact we have a recursive nested set of expressions.

To make it more interesting, the parser is configured by a set of interpretation rules that actually define its grammar, and this can be modified dynamically by the language itself.

The grammar & syntax rules are defined by an object, accessible by the program itself. Its state can be saved and restored. This allows for a routine or module or any inner scope to reset the grammar and redefine it.

This means the program is truly a stream of lexems. Whenever something is going to be executed it is interpreted using the current grammar/syntax definition, which may vary over time.

If a given function redefines the syntax and calls another function, it is executed according to the redefined syntax. This can create powerful tricks and at the same time can be totally confusing -- meaning if you have code in a library and any caller can have redefined the syntax, the code may not behave as expected. For this reasons it should be possible for a scope to predefine a grammar/syntax set and have it automatically used whenever any of its inner code is executed. Yet this should be an option, not a requirement.

One of the interesting aspect of manipulating the program as a stream of lexems or expressions, respectively as a list and a tree, is that we can allow the program to also modify this list or tree at runtime.

Another interesting concept is that one can create some kind of closure by associating a context with not only an expression tree but its associated lexem list and thus re-evaluate the given lexems using a different syntax.


«»  2006/04/15 «» Duck  «»

As I said previously my idea of duck typing was all wrong. OK so what did I wanted in the first place?

I want some kind of strong yet flexible typing. Basically I want to be able to specify strong typing in some cases and let my options open some other time. Moreover, provided I use an IDE, I'd like the IDE to be intelligent and be able to tell me on the fly what type of objects will typically be used in the non-typed variables.

For example, image I have something like this:

	def foo - a, string b
		puts b
		dosomething a
	end
	def dosomething - Bar a
		...
	end
	foo Blah.new, "test"
	foo Bar.new, "right!"

In this case all I'm saying is that I will want b to act as a string. I don't really care of it's truly a string, I just expect it to have a compatible interface. Interfaces in strongly-typed languages do a good job at this.

Also I'm saying I don't really care what a is. Maybe because the function will call a generic top-level object method on it, or simply pass it on to another function. Yet here I would like the IDE to be kind enough to show me some meta-info on a, namely that it can be either a Blah or a Bar. However if the IDE should be able to detect that dosomething only wants a Bar type and thus the call foo Blah.new... may be innapriopriate if Blah can't behave as a Bar.

To come back to the notion of interface, it would be nice to also have some kind of automatic interface typing. Say for example above that class Blah has the same methods that Bar yet it is not strictly a Bar (i.e. it doesn't derive from it). Well in this case I can still substitude a Blah for a Bar. This comes back to the notion of duck typing: when specifying an input type, are you requesting a specific behavior or specific interface? Duck typing is all about specifying the inteface and letting the behavior free. Some people will argue that you really want is to specify the behavior, yet in most languages (read: C++, Java) inheritance on non-sealed classes means you can't lock the behavior of a public interface.

Of course most of this is nice on the paper, however good luck trying to indentify potentially called types in a totally dynamic runtime-based language.


«»  2006/04/18 «» Streams  «»

Let's revisit the notion of streams one more time. It's good to know what it would be possible to do, however at some point the reality is that we don't want to end up with something too absurd.

I'm not afraid of having a feature that is far more complex than everyone would ever need and way too dangerous. That's the difference between C and Java. You can shoot yourself in the foot, the language is not the one whICH should prevent you from doing it if that's really what you wanted to do. Nobody really complains about the power of Lisp macros to be able to express mini languages or the ability to hack a vtable at runtime in C++. As long as you do it because you want to and like that kind of stuff, not because you're left with no choice.

So anyway, let's summarise what we could do and see if it's really possible:

OK that last one is more of a problem for speed. It's kind of like calling eval() on every lexem.

It means we can't really have an AST or the like. However it would be stupid to assume that the program is going to redefine itself all the time. Instead we can define boundaries in between which we know the source has not changed and we can have an AST of that.

OTOH it should be clearly defined to what exactly happens when calling functions. When is the definition of a function compiled or at least frozen? Will we ever be able to generate C/asm code?


«»  2006/04/20 «» Streams  «»

Obviously there's a conflict between talking about streams of lexems and execution. Execution does not happen on the output of the lexical phase. There's a second phase right after that converts the lexem stream in a tree of expressions and these expressions are the one being executed.

So what's the purpose of all this?

First, yes a program can redefine parsing rules at any moment, i.e. add new lexical rules, new semantic rules, new syntax rules, new grammar. However not all of these apply at a given moment. Lexical parsing is performed only when a file is loaded. Grammar rules and syntax rules are generally more or less the same and involve parsing the lexem stream into an AST, something that will generally be done only when a file is loaded. Eventually we get an AST that represents the code and the syntax is irrelevant.

There are two intended purposes: dynamic loading of files and evaluation of random code can use a redefined syntax and can be used to sensibly extend the language. Examples are typically adding a foreign syntax such as inlined XML or SQL or defining an HTML templating system. In this case, things are pretty simple, we first need a normal piece of hint code that is going to expand the language with the new syntax and rules and then this bootstrapping code can dynamically load new that will be able to use the new syntax.

Note that the notion of dynamic loading may be subject to interpretation and has a direct impact on the intended purpose. In a totally interpreted language such as Ruby or Python, load/''require or import'' statements are truly dynamic and happen during the execution flow. This makes things simplier. However I'd still have the ability to generate code with hint rather than just have it be VM-based.

Code generation is a larger subject, but suffice to say that rather than generate machine code directly we want to generate C/C++ code and compile it automatically (this approach has its own share of issues, cf later below.) In this case, dynamic loading becomes an issue if the properties of the loader are modified at runtime, simply because there won't have been any running and thus scanning the original source for import statements is worthless. If the runtime was to generate code on the fly, we could still do something ugly such as cache the code and compile when actually encountering an import statement. However that screams for inneficiency and real code would probably end up trying to preload everything at startup to remove any unexpected silent compilation later.

More on this later.

Now let's assume we finally parsed our source code into an AST and we know that's final. Our AST represents expressions, recursively nested. Remember that everything is a function and everything returns a value. That's what we have in our tree here.


«»  2006/04/20 «» Code Generation  «»

Code generation is a larger subject, but suffice to say that rather than generate machine code directly we want to generate C/C++ code and compile it automatically (this approach has its own share of issues, cf later below.)


«»  2006/07/22 «» Haskell Notes  «»

Notes from reading the haskell tutorial:

There are some really good points in Haskell and many I don't care for. Most notably I don't quite like the syntax but that's not too big of an issue (it's not like C++ or Perl syntaxes are pretty either..., there's only so much you can do with ascii. However if you could use an HTML or Unicode display and replace the arrow -> by → and the backslash by a real &lamb; lambda it would be very pretty.)

However from a beginner's point of view, I find it weird that you have this whole pure functionnal thing and then suddently you have this parallel world of IO actions and the do keyword to do sequential actions. It's also weird that you're not supposed to have affectations, this being purely functionnal, yet the semantic of a <- readLine to extract the value of an action reads to me exactly as an affectation...


«»  2006/07/22 «» Re-Specification  «»

def fact - n
  if n < 1
    0
  else
    n * fact n - 1
  end
end
 value, in fact there's only returned expression, which is returned
 as-is, i.e. as a closure and is only evaluated when actually needed.

def Module.f2 - a b c
  2 + a + b + c
end
def Hint.List.sum - a
  foldr Int.plus a
end
def ...parent.foo - a
  [a, a]
end

def fact - Int n - Int ...
def module.f2 - Int a, List b, Foo c - Int ...

 [1, 2, 4, 5, "foo", [1, 2], 6]
 List.new Int 1 2 4
{"a": 42, "b": 3, 4: 4, 5: ("c", foo), 42: [1, 2, 4, 6]}
Hash.new Int String {1: "a", 2: "b"}
[1..3]
list[1:-1]
list[1:]
list[:-1]

List.new Object 1 2 4
[1, 2, 4]

 anywhere and are essentially ignored if they don't resolve any ambiguity.


«»  2006/07/24  «»

The previous specs were interesting, however there are still some questions left unanswered.

Let's make a list of all ideas collected above and go thru them:

Still believe in Santa Claus?


«»  2006/07/28  «»

Generic syntax:

def [Type.]*defname < Super
end
def [Type.]*defname - [vname [,] | Type vname [,]]* [> Type]
end
[Type.]*defname [expr1..N]


«»  2006/09/22  «»

module sne2
class server
  def init ; end
  def bind ; end
  def connect ; end
  def onConnected - connection cnx ; end
end # class
class connection
end
end # moduleA


«»  2006/10/03  «»

syntax default
module player
class stats
  attr_r  string race     # class is of type string, race here is no a keyword
  # attr_r  string class  # we don't want to use class as it would hide an intrinsic attribute
  attr_r  power           # type is not defined, it can be any object
  attr_rw health, energy
end stats
def hint.attr_r - name    # extend the base class
  self.caller.class.dict.add ("var" + name) object.new
  self.caller.class.dict.add (name) def - self.dict.get ("var" + name) end
end
def hint.attr_rw - name
  if self.args.len > 1    # although we declared the function with 1 arg, we expect more
    args.for arg
      hint.attr_rw arg
    end
  else
    hint.attr_r name
    self.caller.class.dict.add (name + "=") def - value - self.dict.set ("var" + name) value end
  end
end
Notes:


«»  2006/10/09  «»

OK this is majorly unsatisfactory. Plus it feels too much like Ruby. Let's see:


«»  2007/02/01  «»

Memory management: Either using a garbage collector (Boehm's) or owned/shared memory pointers. I.e. allocation B is owned by allocation A and if A is destroyed then B does to. Generalized weak pointers?

i.e.:

class A ... end
class B
  attr ptr A a
end
a = A.new
b = B.new
b.a = a  # default to own?
b.a = a.own
b.a = a.weak or a.share
a.free
use(b.a)  # throw weak-ref exception

However to make it useful, whoever owns weak references needs to have a way to re-establish them:

class B
  attr ptr A a
  def init
    _init_internal
    a.onWeakRef { _init_internal }
  end
  def _init_internal
    a = A.new.weak
  end
end
Where new, weak, free and onWeakRef are method of the base Object class. Or maybe the ptr data type (which may be implied and mixed in Object.)

Free owned pointers implies being able to free a cyclic graph.

Streams require a different approach to be more usable.


«»  2007/04/09 «» Reloaded x3?  «»

Some notes from re-re-reading Arc ILC03 and Arc LL1:

Of course most of the Arc stuff is just vaporware for yet-another-Lisp (feel free to prove me wrong.) Ok maybe it really exists somewhere behind closed doors garded by level 60 two-headed monsters and implemented as a dialect of Lisp over another dialect of Lisp so for the common guy out there it's just a lot of hype on top of nothing tangible.

One of the main themes that pops up in about every other paragraphs in the article about Arc is that the language should provide the minimum of guards and that ubber coders don't need guards because they are so good. This ignores experience -- mine is that automatic variable creation is the most annoying feature a language can have (heck, who doesn't make typos?). (let ...) is annoying because of the extra scope, but at least it makese things explicit.

There are a couple of good things in there, such as generalizing strings as lists (but please tell me it's all Unicode!) and the macro stuff. For the rest the LL1 "spec" is mostly a big "let me tell you what's wrong with Common Lisp and how to fix it". Most of the other stuff on the site is comparing Lisp with Java, which makes as much sense as comparing a Space Shuttle with a bicycle. Yes, bicycles are for the masses and they can't reach the stratosphere. We get it, now get over it!

Anyway, back to work.

Examples from the book:

a = 3
p = pi 32
p = pi decimals:32
hint.class = hint.hash
point = hint.class
	x = 0
	y = 0
	print = my-pp
	move = def - dx dy - x += dx y += dy end
	+ = def - p - point.new - x = self.x + p.x y = self.y + p.y end end
	+ = def - p
		point.new
			x = self.x + p.x
			y = self.y + p.y
		end
	end
end
p = point.new
p = point.new - x = 1 y = 2 end
p = point.new
	x = 1
	y = 2
end
p.move 5 10
(p + p).print

What about streams?

def produce! - i p!
	p!.append i
	p!.wait
	produce i + 1 p!
	# or if you really want to:
	produce (i + 1) p!
end
def consume - p
p.first.print
	consume p.rest
end
p! = stream.new
produce! 0 p!
consume p!

Note the lack of separator, sort of. Newline can be an implied separator replacement. Also method calls that involve a finite number of arguments do not need explicit separators between arguments. They should be optional though. Also one idea mentioned earlier is that everything can be protected by parentheses.

produce! is just a convention to indicate the method alters parameters. p! is the same convention applied to a variable we know will get modified. Note that consume doesn't modify it's parameter.

In an ideal world the interpreter would use tail call optimization on these functions.

However it may be more tricky for an interpreter to realize that once p.first is consumed, it can be garbage-collected. If each cell in the stream list is implemented using the "naive" approach of a singly-linked cell list, then it would become obvious there's no more reference to the beginning of the stream. However the example above is broken because the top scope p! keeps that reference, so in this case we should either make sure not to store such a reference or explicitely remove the first element from the list.

[Note 20080112: actually that's wrong, reading from a stream automatically removes from it, so elements removed can be garbage-collected.]

Coming back to earlier comments:

So let's try again. The syntax is mostly experimental btw.

def produce! <- int i, stream p -> stream
	generate p
	p.append i
	p.wait
	produce! i + 1 p
	# or if you really want to:
	produce! (i + 1) p
end
def consume <- stream p
p.first.print
	consume p.rest
end
consume produce! 0 stream.new

Note that produce! could be seen as a generator. It never returns, it just generates data thru one of the arguments. So in this way we should tag this method as such. Another possibility is for generators to be specialized functions like in Python except that it seems more sane to define generators as such instead of relying on the magical presence of a "yield" method call later inside:

gen produce! <- int i -> stream p
	p.append i
	p.wait
	produce! i + 1 p
	# or if you really want to:
	produce! (i + 1) p
end
OK maybe I don't really like <- and -> digraphs as operators. It doesn't really matter since the programmer will be able to change the grammar at runtime.

[Note 20080112: I suggest a shell-like syntax: gen produce! < input-params > output-param ]

Note that in the declaration of gen we can directly name the variable that will hold the output. It's implied that the generator will create it and it's available in the block. No other language that I know of really allows that, except Pascal that forces you to affect to the method's name (which always really confused me.)

Example:

def foo - int index, int offset - int
 index + offset
end
vs:
def foo - int index, int offset - int result
 result = index + offset
 do_something_else
end
In the first case, the result of the function call is implied to be the result of the last executed statement. In the second case the result is stored and is not the last statement's result.

There is some ambiguity on how named arguments are to be used. I shall explore this more later. For example consider the def foo above and say we want to use it with named arguments. A naive way to do this:

var a = foo index=5, offset=6
print a
> 11

Unfortunately this means we have a special case where = is not an affection but a paremeter passing. In C it's usual to write if (a = 5) ... even though it's really dangerous and most modern complier will treat that as a warning, forcing you to write if ((a = 5)) ... instead.

I'd rather have the equal operator mean only one thing so may I could use colon instead:

var a = foo index: 5, offset: 6
var a = foo index: 5 offset: 6
var a = foo (index: 5) offset: 6

Oh wait... smalltalk!


«»  2008/01/08 «» Reloaded x4  «»

Two, hmm no three, things to keep in mind:

With regard to the VM, it may force the implementation. Seems like LLVM primary API is in C or even C++, so that may be it. I need to investigate LLVM a bit more, especially wrt to introspection.

JVM opcodes can be generated from anything. Is the JVM even conceptually relevant? Yes, because of Android.

I am not totally convinced by Literate Programming. There's a part which is interesting, namely after the fact generating a readable & structured description of a program. On one hand, a concise algorithm is easier to read than a long paper describing it... but only rarely. On the other hand, encouraging documentation is always good. Unless the doc is obsolete. Keeping the documentation close is a way to solve this. It's always about balance.


«»  2008/01/12 «» Rob Pike's Newsqueak  «»

I saw this video sometime in November: Advanced Topics in Programming Languages: Concurrency/message passing Newsqueak by no less than http://en.wikipedia.org/wiki/Rob_Pike|Rob Pike.

So anyway when I saw that and heard his description of channels, it's exactly what I was trying to formalize here as streams. I do not claim that my approach was novel -- at the contrary it seems obvious to me and I wonder why no modern mainstream programming language has it.

It's time we get past C and Java. Well actually Java is almost a step back towards Visual Basic with a better syntax.

To come back to the video, the presentation is good and I like the examples. That's one thing I don't know how to do -- I don't know how to explain my ideas. I just clumsily expose the mechanism and expect its value will be obvious, but it doesn't qute work that way.


«»  2008/01/14 «» Coroutines  «»

One of the aspects I'm still fighting for is the difference between coroutines and threads. There are many cases where one could be used instead of the other.

A typical producer/consumer stream can be implemented either by using two coroutines as end-points or by using some threads. However in the case of the threads, we need extra synchronization (i.e. block till consumed). On the other hand, a thread can be computing the next value whilst the consumer is using the current one so even on modern non-SMP machines that can be a better usage of the processor.

The language won't decide whether to use a coroutine or a thread -- it's up to the programmer to do that.

Some related links:

Some coroutines examples:


(Continuing the previous rant on Literate Programming)

An example. Assume the whole thing is a source. There are no specific tags to delimit comments, instead comments are the defalut and program code is determine by paragraphs starting with a known keyword. Since we have a syntax where each block ends with end, it's easy to figure where the program ends. It does require a validating parser, although we could simply use come heuristics. Comments use an izumi wiki-like syntax and tags such as sections and paragraph numbers could be used to provide structure.

We'll handle streams which by default have stream.read and stream.write operations. That's a bit tedious to write, so let's define some aliases:

infix-def <- < stream a, o # o is not typed

 a.write o
end infix-def -> < stream a, o
 o = a.read
end Note that using this semantic makes it impossible to write "a.write(b.read())".

The purpose is to implement a trivial "prime sieve" algorithm. First we need a stream of integers, which is a generator with no upper bound:

gen numbers < int start > stream n

 var i = start
 while true
   n.write
   ++i
 end
end Then we want a way to filter multiples of a given integer from a stream of numbers. We can use the first number in the stream as the number to filter out. We still emit it once though.

thread filter < stream n, stream p

 var k = n.read
 p.write k
 var t = stream.new
 filter.run t p
 while !n.end?   # n never ends, however this would be good practice
   var i = n.read
   if i mod k != 0
     t.write i
   end
 end
end This is of course really sub-optimal, especially since we'd have a "mod" computed for each integer. However there is no guarantee the input stream is composed of monotonotically increasing integers, and by design it won't.

Now let's generate the primes.

gen primes > stream p

 var n = numbers 2
 filter.run n p
 p.wait
end

Note that here "filters" is declared using "thread", not "gen", because it did not return anything. A generator should always generate a stream and once started, the generator is forgotten, the control structure is the output stream.

One runtime strategy would be that calling a generator returns immediately -- or more exactly it instantiates the generator but doesn't run it. The generator starts to run when the stream is first read. If the stream is just a message passing semantic, the generator's stream doesn't even have to be more than one object container (i.e. a container with one slot). If the generator returns, it signals the end of the stream.

This means we could easily "pipe" a stream into another, so we could rewrite the thing like this:

gen filter < stream n > stream p

 var k = n.read
 p.write k
 var t = stream.new
 p.pipe filter t
 while !n.end?
   var i = n.read
   if i mod k != 0
     t.write i
   end
 end
end gen primes > stream p
 var n = numbers 2
 p.pipe filter n
 p.wait
end

BTW to use it, you would do something like:

def main

 primes.foreach print
end In this context, the generator is just a simplified form of a coroutine.


«»  2008/01/16 «» ...  «»

def network_server - block b

 socket.bind
 socket.listen - socket.cnx cnx
   a = cnx.read
   response = b.do_something a
   cnx.write response
 end
end gen network_server > stream b
 socket.bind
 socket.listen - socket.cnx cnx
   b.write cnx
 end
end

«»  2008/01/19 «» Scope  «»

We define a "scope" as a context (or environement) where things are defined. Scopes are inherently hierarchical. Scopes are actual objects.

There's an implicit top-level scope, although it is not named.

"hint.lang" is an explicit scope name. Structurally, it's equivalent to <top>.hint.lang. All implicit hint methods such as "def" are in hint.lang, i.e. hint.lang.def or hint.lang.object.

object foo

 def blah
   var a = ...
   a.for x
     ...
   end
 end
end Here we have the scope structure: even though the for block is an unnamed scope also.

Scopes should differ at definition time and runtime. In the example above, one should view the "foo.blah" as a static scope definition. However at runtime, executing the "blah.a.for" will generate a runtime scope.

A scope should have a list of children and a reference to its parent. So eventually in the "for" loop, the current scope exists and can be accessed:

object foo

 def blah
   var a = ...
   a.for x
     scope.current.parent.a ...
   end
 end
end Scope search works local first towards the parents. Once the top has has been reached, hint.lang should be examined. More specifically there should be a list of default scopes to be examined in the top one, of which hint.lang is the single default.


«»  2008/01/19 «» Syntax  «»

Let's assume "def" accept any white-space separated post argument:

object foo

 def blah ...
 def * < ... > ... end
 def + < ... > ... end
end i.e.: a = foo.new a.blah ... a.* ... => a * ? a.+ ... => a + ?

You have the right to shoot yourself in the foot:

object foo

 def . ...
end Now imagine you have a list and want to create a "for each" method from scratch. You need to declare its parsing rules.

Eventually you want to write:

 var a = list.new
 a.for x ...

object list

 def for - 


«»  2008/04/17 «» Types  «»

I've been given a demo of a TTS system yesterday. I like Python too, especially for its light syntax, but the brackets really don't play well with a generic TTS engine. Thus my idea to reduce the number of characters as much is not that bad, even if it imposes severe constraints.

Anyhow, I wanted to mention typing systems.

Static typing systems are pretty obnoxious: you have to declare every single data type returned in a functional "compiler-friendly" way. One of the things I like in C/C++ is that you can typedef a type and then later just use that name. At least you don't have to repeat the same obnoxious generics/template definition in every method declaration and variable. Those can get really long and annoying.

Totally dynamic typing systems can be a bit obnoxious too. You never know what a method can return unless you look at the code or sometimes simply run it. Sometimes this is a good thing, but most of the time I think it reduces visibility. Typing is part of a method signature, it's a sort of self-documentation.

Where I am getting at is that the best way to do this is to have some kind of type inference in a smart editor. Imagine a dynamic typing system where you can define you own not-so-static types, at least named types. So you create a "createFoo" method that returns a "FooType". Now just let the editor use some static analysis of your code to see if it can guess what a FooType is, or let the compiler do that and store in some kind of meta-data (like Eclipse w
Java). Later the editor can tell you if you're trying to do something stupid with that FooType. Maybe it's just too dynamic and it can't, or maybe it has a limited domain and it can. From your point of view, the meta-data is just there to help you, it is not necessary per se to use the language.


«»  2008/05/12  «»

Is there any difference between a class and a method?

(Nope, they are all scopes, see above.)


«»  2008/05/13 «» Alternative Syntax  «»

Alternative Syntax Summary:

	x = 2 = 3
	add = def - x y - x + y end

	def add - x y
		x + y
	end
or
	def hint.int.+ - y
		hint.math.add self y
	end

	"he said 'oh no' too fast"
	"""he said "oh no" too fast"""

# single line ### multiline ###


[20080718] Rebooted

One program = one editor = one "VM".

Kind of like SmallTalk system, a program is a set of initial data + instructions. It's an image. Running the program changes the data with a copy-of-write image so that it can be reverted.

The editor is an integral part of the language definition, it is linked to a specific image and displays information that changes at runtime. For a traditional editor, it would mean simply editing the static initial data.

There should be two kind of editors:

The latter should be a default, always available, and providing some simple levels of introspection.

In a first phase, the "VM" is really an interpreter. In a later version it should be something that generates bytecode for a more generic VM such as JVM or Dalvik.

Contrary to what was said before, there should be no direct access to the host environment (Java, .Net, whatever). Instead some kind of generic libraries should provide the expected basic functionalities. The language should be as host-agnostic as it should be. On the other hand it should be trivial for the host to provide methods/classes to be executed by the image interpreter, and this dynamically (such as for scripting.)

At first tokens are just parsed as bunch of strings. Later, at actual runtime they are replaced by references and added lexical information for syntax highlighting.

In the bootstrap we have the following:


«»  2008/07/23 «» Literate Programming  «»

Coming back to the subject of Literate Programming, I'm not convinced the approach can be generalized. In the context of the book, it kind of works: this is after all a collection of essays done by a scholar. Each example is a paper. In a paper one wants to explain a subject and focus on some interesting parts of the implementation. It is good if there's a parser that can collect these tidbits and re-create source files from them.

However most programs have two parts in their implementation: code to comply to a given framework for running the program and code that does the "business logic" part of the program. But even that logic can generally be cut again in two parts: code that is interesting from a programmer's perspective and needs to commenting and code that is mostly boilerplate and boring.

So really there's only a sub-sub-part of the code that can be interesting to mix in comments. So that's one issue with the basic idea of "describing" the code with literate comments -- most of the code doesn't need such commenting.

The other issue is the one of order. The order in which "interesting" code is going to be described in a paper is very different from the logical source file organization one would use to write a program. Both are artificial and semantic only to the program writer or reader. The compiler doesn't really care about the order as long as it re-interpret it correctly.

However splitting a program in several source files is mostly an organization tool that also adds semantic to program writers and readers. This is lost if the one source is the one present in the paper. I would argue that the semantic is different: in one case one gets an overall overview by looking at the directory or source files names, in the other case one gets the overall meaning by looking at the chapters.

The last issue is the one of refactoring. The hierarchical concept of files/classes/methods/statement is easy to use for refactoring tools. Moving such code around is pretty easy. If any algorithm is split amongst several pages of comments, this code cannot be moved around logically without disturbing the flow of the paper. I would argue that in this case the hierarchy is probably in the paragraph hierarchy (i.e. the table of content) and that's what a refactoring tool would have to work with.

Above, I was wondering whether we could have a syntax that does not require comment delimiters. Delimiters could simply be paragraphs (i.e. blank lines) and if a line starts with a valid identifier it would be a code line, otherwise it would a comment.

That approach doesn't seem sound. Comments delimiters are not a big issue when writing code. At the human level I think they even ease reading back code.


«»  2008/10/02 «» The driving points  «»

If you read that far on this page, you probably realized this is going nowhere. There are a couple of obvious reasons for that: one is that Hint is not trying to solve any specific problem urgent so I have no urgencey to actually implement anything. Another one is that it would actually be a burden to support. Writing a language is not enough. You also need a compiler/interpreter, tons of support libraries, tools and the way to use it in a meaningful way (e.g. a web server or an embedded runtime, depending on what you want to deploy to.)

All these can be overcome if the urgency is there.

One thing that cannot is the lack of focus. Hint lacks focus. I vaguely defined what it should solve, but that's not what's driving this "search".

The driving points are:

 Obviously that won't help with JIT, compilation and such. Doesn't imply "speed".

At some point I have to settle down, define something -- even lame -- that encompasses these 3 points and just implement it. Then I can rant about it some more.


«»  2008/10/04 «» Broken Example  «»

struct complex                               
  float x                                    
  float y                                    
end                                          

def complex.zero                             
  return complex.new 0, 0                    
end                                          

def complex.+ < a                            
  x += a.x                                   
  y += a.y                                   
  return self                                
end                                          

def complex.abssq                            
  return x*x + y*y                           
end                                          

def complex.sq < n                           
  var c = complex.new x*x - y*y, 2*x*y       
  return c                                   
end                                          

def mandel < complex c > int n               
  n = 0                                      
  var z = complex.zero                       
  while z.abssq < 4                          
    z = z.sq + c                             
    n++                                      
  end                                        
  return n                                   
end                                          

def line < y > stream z                      
  var c = complex.new 0, y                   
  for c.x = -2, c.x < 2, c.x += 1/128        
    n = mandel c                             
    z <- n                                   
  end                                        
end                                          

def screen                                   
  var pixels = array.new 256, 256            
  for y = -2, y += 1/128, n = 0, n++, n < 256
    start                                    
      z = line y                             
      array[n].fill <- z                     
    end                                      
  end                                        
end                                          

There are of course so many issues on many levels, I won't even start.


«»  2008/10/12 «» Partials  «»

Sample, pre-fix:

set some_math fun - x, y, z - t
  set t plus cos x tan y z
end
set p1 some_math 1 2
set p2 fun - x - t
  set t some_math x
end
set print fun - x
	hint.sys.out.println eval x
end
set main fun
  print some_math 1 2 3
  print p1    3
  print p2  2 3
end
set = fun - name value
	set .fix infix
end

«»  2008/10/18 «» Hint Partial  «»

expr := literal | (expr) | id | func
literal := string | number

Block is an "atomic" type, i.e. an exec scope:

Id found:

Function:

=> a function is a named block with a specific parsing rule

The interpreter:


«»  2008/10/29 «» Levels  «»

Given a certain syntax, a program could be parsed in different ways.

Let's take a Mandelbrot computation program. I can think of 3 levels.

At the core is the code computing one pixel in a Mandelbrot set. This could use fixed-point, float or arbitrary precision math (aka bignum). One would typically want to optimize that in assembly, generally with a higher-level version for more portability.

At a slightly higher level there would be the code to select which pixels to compute, for example all pixels linearly, boundary detection or tessellation. This needs to be optimized but it would be enough to write in C or something similar and it would need to use a number representation compatible with the lower level.

At a higher level we have the UI where the user selects parameters and watches the result. This is ideally done in any kind of higher level language, even a "slow" scripting language would be adequate (or sometimes preferable.)

The point is that these 3 levels could all use the same Hint syntax. A given "scope" or module could be tagged with which method to use. This would influence the libraries and semantics available.

Depending on the architecture, the lower level could very well get directly pre-compiled at load time in machine code; the middle level could be JITted whereas the higher level could remain purely interpreted for maximum flexibility.

Similarly the lower level could very well have limited access, if at all, to libraries, i.e. not being able to make any calls to methods in other levels and instead just be limited to work on pre-allocated buffers. In this sense it is purely limited to number crunching.

The syntax of the language could also be similar yet semantically different at each levels. For example at the lower level we could have very basic syntax almost assembler-like. At the higher level we could have partial method calls and lazy evaluation not available in the lower levels.

Some semantics like stream would be represented as generators or iterators at higher levels whereas they would be simply be buffers at the low level.

For debugging purposes, a level can be "upgraded" that is for instance code targeted for the lower level can be forced in an interpreted mode to be easier to debug and modify on the fly.


20081105 Hint specs

Lexer -> num -int/f, id, .id, keyword, str - unicode, symbol

Input > lines > tokens (ide)

Interp tok stream

block is an "atomic" type, i.e. an exec scope:

id found:

function:

the interpreter:

Impl:


20081225 Structs vs classes vs function

Structs, classes and function are more or less instances of the same thing:

 A function is like a struct allocated on the stack with some code attached.

We only need one logical structure, a block, that contains all of these.


20081226

Let's examine a reduced goal: scripting language for a game.

2 levels:

Goal: write a Hint parser and interpreter in Hint with the minimum Java bootstrap.

Assumptions:

Concepts:

 execute and input/output arguments. So it can be either a struct or a
 method call, depending on how it gets used.

Implementation follows in literate programming form.

def main < in out
	# main entry point.
	# in is the input stream. Each item is a token string (space separated)
	# out is the output stream
	parse in out | interp out
end
def parse < in out > ast
	# parses the input stream and produces an AST
end
def interp < ast out
end
At this point we need to define "def" using some more basic syntax:

set hint.def
	set .syntax
		symbol name
		[ < symbol in-params* ]
		[ > symbol out-param ]
		block
	end
end

20081230

Right now attemps fails because there are two strategies at hand.

One is a free lex stream and the interpreter is mostly lex based with no prior interpretation. That is all methods actually have a prototype "def <name> <lex-stream> <output>". In this pattern we don't have arguments: the "function" just processes the lex stream itself. This can be done via its syntax declaration at runtime. This means arguments are determined by the callee, not the caller, and all evaluation is lazy and done as late as possible.

The other strategy is to have the caller determine the arguments. The obvious issue is that the current syntax is non-deterministic since there are no kind of delimiters.

The obvious limitation with the first strategy is that the caller cannot validate the call in any way. As far as the caller is concerned this is more a "goto" than a typical function call.

From the callee perspective we need to ensure arguments are parsed first, yet not evaluated.

So we get:

def name in out
  parse (a:expr) (b:expr)
  code
end
Example:

def str.for
  parse (n) (b:block)
  var l = this.len
  var i = 0
  while i < l
    var c = this[i]
    b c
    ++i
  end
end
Note that "parse" could be an internal method that takes the implicit lex stream and creates variables on the fly.


20081231

Ideally Hint should run at several levels. The top level is an interpreted environment. In the original idea I wanted to have a "native" IDE that can offer at least one advanced feature, which is type analysis. The IDE should be something simple kind of like the old GFA Basic editor, interpreter and validating on a per-line basis.

Eventually I'd also want a similar editor for Android.

We still need a simple source-file based interperter, so we'll start there. Ideally the IDE will have to be written in Hint itself.

On the desktop side we could simply have a plugin for an existing editor such as Eclipse that offers syntax highlight, integrated debugging and analys of a running VM.

To get started, we're going to assume the following environment:

Hint does not need source code. The source is just there to feed the VM initially. Blocks contain their own source code.

For an Eclipse plugin, we can implement a file system that displays the source code from blocks for immediate modifications, add new blocks on the fly and debug them.

Blocks have two parts:

So far this is just like a prototype-based language. Sounds like "self".

An underdlying idea would be that everything is a block: a module, a function,

a structure, an if-else, etc.


20090101

Let's try again.

Define something and put it in a namespace:

def [namespace.]name [< out]

An id is defined:

The expr "def name" associates a block to that name. Thus it creates the id (or removes an existing one) and uses the new block as the definition.

At runtime, existing block instances that already referred to that block def continue to point to the old definition.

One should be able to write:

var old_str_append = hint.str.append
def hint.str.append ... end

We could probably differentiate blocks and scopes.

E.g.:

def some.method
  if a == 0
    this is scope 1
  else
    this is scope 2
  end
end
Here we could say we have one block definition: the method itself. Its code list contains an "if", which is a function call. Both sides of the if-else can be either just "code lists" or full "anonymous" blocks.

The parsing syntax could be like this:

def (name) (block) end
if (expr) (block) [else (block)] end

In this case each branch of the if-else are anonymous blocks, with their independant variables and stuff, which parent is the def block. However such inner-blocks may not be instantiated separately from the parent: there's a "master" block, which has the actual definition and is the one instantiated and then there are "slave" blocks which are inner scopes. They can have their own properties but only exist as part of a master block instance.

The actual storage of a block is basically a property list, i.e. a collection of key/value pairs. There are several possible implementations. One is to use a real dynamic list or hash table for each running instance. Another choice is to associate offsets in a byte buffer for each variable. This makes lookups more efficient, but makes it harder to dynamically add or remove properties.

That's where the notion of execution levels could have a drastic influence: at the higher interpreted level, a property list can be dynamic and it's possible to alter it at runtime (aka "reflection"). At a lower level that is compiled, such an operation would fail.


20090101 Hint VM

There are a few interesting variations on the design of the hint vm.

First, at runtime, a VM is essentially a bunch of block definitions and their running instances.

Changing a block definition at runtime should be possible and should influence all the running instances. Obviously there are non-conflicting changes and conflicting changes. When there's a conflicting change, we can either decide to trash all the conflicting running instances or we can decide to "clone" the old definition and let those running instances use the old one.

A non-conflicting change would be changing the code in a block instance which is not being currently executed (master or slave), or changing the property list definitions by changing initial values or adding new properties.

A conflicting change would be to try to alter a code list that's being executed or removing properties. Even that last one, removing properties, might behave differently depending on the execution level: at high level, each instance will likely have its own property list so removing them is not an issue (code trying to access it will fail at runtime) whereas at lower level the property list is fixed so it can't be changed anyway.


20090102 Files vs live

The VM contains "live" objects with their own lex stream. These objects don't have actual source code. They could be created dynamically at run time.

The VM needs to be "seeded" with object definitions and the best way to do that is to use actual source files. We'll need some conventions for these files and a structure that facilities parsing them as well as saving them (for "backup"). Such source could also be interpreted at runtime to create new objects.

There are two issues at hand, or rather three.

First it's useful to have "modules". Each file is a module. Modules don't need to have any semantic meaning from the VM/runtime perspective but they are useful to the programmer to organize the source code. The parser doesn't really have to care about modules -- they could just be random files. It is interesting however to be able to display a representation of a running VM using the same logical representation as used by the original designer.

We also need a facility to let the parser indicate the namespace scope of the objects being loaded. In Python this is done based on the directory structure being parsed. In Java either the directory structure or the "package" keyword is used. We could start by just using the directory structure, which is a fairly natural convention. The downside is that it prevents from just loading a random file and understanding where it fits -- one needs to load full directories and what matters is the directory root. So maybe having a "namespace" statement that indicates the namespace will work better. It can then be a convention to place "module" files in the right spot, and that's where a custom IDE or plug-in can do the job.

We want to be able to extend namespaces. That is if a module operates on a namespace x.y, every definition in it will extend or replace existing stuff that already exists in the VM when loading.

The main issue that remains is how is such a module file loaded. Is it just parsed for lexems or is it actually interpreted/executed? I'd say it is interpreted by the VM as if it were a direct live input.

Definitions contained in the files are created as the parser sees them.

This means that we need to explicitely dependencies since the file loading order matters especially if loading more than one file. In this case loading files means the loader needs to first identify all files to load and their target namespace, then resolve dependencies and then load depended files first.

An example would be to have "directives" (e.g. meta-commands) at the beginning of the files, with one keyword per line:

requires ../libs/gl
namespace hint.graphics.mylib
var x
def y .... end
def z .... end
namespace my.program.blah
def x .... end
requires ../libs/sockets
def main .... end

Where "requires" indicates the name of a file to have loaded first. Such a file would be loaded only once at the time where the directive is seen.

We also need a way to resolve runtime dependencies. For example imagine we have a module that implement OpenGL wrappers hint.graphics.gl and then we have a game with some UI that depends on this GL library. At load time we should be able to enforce the present of this namespace or its methods. This is not a meta directive per se, just a parser-time check.


20090103

Since the beginning, there's a clear mixup between source parsing, lexers and in-memory code.

In a traditional compiler or interpreter, there is generally an input stream of characters that is transformed into lexems, e.g. a list of typed words: symbols, numerals, keywords, processing instructions, etc. Then grammar is applied to that to enforce grammar rules and an abstract syntax tree (AST) is built in memory along with a symbol table. There are many variations on that but at least that's the basic model.

In hint the in-memory source would be something in between the AST and the lexems. The basic idea is to keep the lexem stream as intact as possible and use this as the base of the interpretation. However a lexem stream is a pure flat stream whereas it is far more interesting to have structure in whatever the interpreter sees.

For interpretation, all we really want is the AST. That's these "blocks" mentionned earlier.

However in a traditional AST, everything is resolved in advance, whereas here what I'm trying to achieve is that a called function is the only one which knows how to parse its parameters.

To keep the language dynamic, it's enough to be able to redefine parsing rules on the fly, but really that only concerns how a source is read. It is however interesting to be able to change parsing rules just for a block of input in order to create domain-specific micro languages.

Thus one idea is to have the notion of delimiters. For example there's a syntax to start and end a block definition so that the parser know that everything between two delimiters belongs to the same block definition, without really having to understand what exactly is in there.

I suggested above a "def ... end" syntax. That could work except the "end" keyword was also used internaly to denote inner blocks, so it could be confusing.

Other languages have already solved the problem:

 yet totally obnoxious to read and write. YMMV.

Any of these solutions would work. I personally lean more towards the Python syntax whereas most people lean more towards the C syntax.

Now let's use explicit { } brackets for markers. An example would be:

def if {

 parse (expr:e) (block:b-if) else (block:b-else)
 # -> we don't have to specify an "end" to the parse rule
 var cond = eval e
 var result = cond ? b-if : b-else
 return result
}

Once parsed, we would get an in-memory block with the following information:

And as always this is still very wrong on so many levels and totally unworkable.

The issue is that the "parse" definition tries to do two things at the same time, and they are not compatible: it tries to define how to parse a lexem stream and at the same time how to create local variables.

Now what happens when we get a source looking like this:

 def a { if c > 1 { cos c + 2 } else { cos 2 + sin 3 + c } }

Which would be the same as writting:

 def a
   if c > 1
     cos c + 2
   else
     cos 2 + sin 3 + c

As with more parsers, we need more information, most important we need priorities.

To take the simplest issue, "cos c + 2", we need to define that "cos x" is an unary operation whereas "+" is a binary operation. Then we need to define that unary operations have generally a more importan priority than binary so implicetely this is writing "(cos c) + 2". One can always use parentheses to write "cos (c+2)" but we're back to square 1: there is not enough information to parse that before hand -- and not even at runtime -- unless it is built into the language, which is what we're trying to avoid all along.

Actually above we're not even addressing the typical if-if-else issue.


namespace example.mandelbrot
var max_iter = 100
def point < xc, yc > z: int
   var c = 0, x = 0, y = 0
   for z = 0,
       z < max_iter && x*x+y*y<4 ,
       z++
     t = x*x - y*y + xc
     y = 2*x*y + yc
     x = t
   end
def set < x1, x2, nx, y1, y2, ny > stream out
   o start
   for j = j < ny, j++
     y= y1 + (y2 - y1) / ny * j
     for i = 0, i < nx, i++
       x = x1 + (x2 - x1) / nx * i
       point x, y | out
     end
   end
end

Site License

Creative Commons License
This work is licensed by Raphaël Moll under a Creative Commons License.

Options
Color Theme: Gray  | Blue  | Black | Sand  | Khaki  | Egg  | None

Web ralf.alfray.com Powered by Google

Display Izumi & PHP Credits

Stats
539 accesses, 1 access from 38.107.179.210
Visited 15 times by Google, last 2012/01/19 08:43
Visited 7 times by Yahoo!, last 2011/10/07 12:25

< Generated in 2.88 seconds the 02/06/2012, 09:47 AM by Izumi 1.1.4 >