$Id: MiniHint.izu,v 1.6 2005/11/05 21:55:57 ralf Exp $
20050122 Warning: This project has been put on hold indefinitely.
20041221
There is no goal yet. The project is not formally specified yet. This page exists only because I needed a place to write down some notes. So please read the notes below if you wish, but don't expect too much. The whole thing may never happen, or it may swing in a totally different direction.
20041221
Milestones are not available yet.
To be done (by decreasing priority):
2004MMDD [1.N] Section: Desc
Finished (by decreasing date):
20041106 [1.F] Test: Switching from CSTools to Coco/R C#
20041024 [1.F] Test: Compiling CSTools
20041024 [1.F] Test: Skeleton Project
(Notes are given in chronological order.)
The main objective is to write down a tentative grammar for Coco/R. The parser will generate an abstract tree directly suitable for execution by a stack machine.
The main lines of the language are:
Classic parsers built with compiler generators process one full syntactic unit, such as one C# file. I can start with this, but it is not the ultimate goal and will be annoying later. What I ultimately want is a line-oriented IDE for a PDA: edit one line and process it on the fly, generating its tree.
Handling that is actually easier than it seems. If the parser does not have to deal with a complete unit, it merely needs to process input as a list of statements. Once the abstract tree is built, a quick analysis can determine whether the unit is valid, the same check that would need to happen at runtime anyway.
It then becomes possible either to process full units or to have an IDE process a single-line statement and maintain the tree.
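A minimal sketch of that "quick analysis", assuming each line has already been parsed and reduced to a statement kind. It only checks block nesting: `if`, `while`, `for`, and a `fun` expression open a block (they end in `block` in the grammar draft), and `end` closes one. The function name and the kind strings are hypothetical.

```python
# Hypothetical sketch: the unit arrives as a flat list of already-parsed statements,
# each reduced to (line_number, kind). Only block nesting is checked here.
BLOCK_OPENERS = {"if", "while", "for", "fun"}  # constructs ending in "block" in the draft grammar


def check_blocks(statements):
    """Return a list of error messages; an empty list means every block is closed."""
    errors = []
    open_blocks = []  # stack of (line_number, kind) still waiting for their "end"
    for line_number, kind in statements:
        if kind in BLOCK_OPENERS:
            open_blocks.append((line_number, kind))
        elif kind == "end":
            if open_blocks:
                open_blocks.pop()
            else:
                errors.append(f"line {line_number}: 'end' without an open block")
    for line_number, kind in open_blocks:
        errors.append(f"line {line_number}: '{kind}' block is never closed")
    return errors


unit = [(1, "var"), (2, "if"), (3, "expression"), (4, "end"), (5, "while")]
print(check_blocks(unit))  # ["line 5: 'while' block is never closed"]
```

The same check works whether the statements come from a whole file or from lines the IDE has accumulated one at a time.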
One way to deal with this is to keep a tokenized version of each line in the IDE. Think of Basic editors that reformat the line on the fly and flag errors as it is typed. That can be annoying while editing if the editor reformats the line or pops up error messages, although it can also be done without disturbing the user. The tokenized version of a line may also need to preserve whitespace information.
A simpler approach is to store the line as plain text and only provide an auto-indentation facility. Alongside the text line, the corresponding abstract tree must be stored (or at least a pointer into the main tree), and if possible each identifier in the tree must retain token information: at least the column at which the token started, so that errors can be pointed out. Line information can be inferred and need not be stored.
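A minimal sketch of that storage, assuming each editor line keeps its raw text, its tokens with their start column, and a slot for its sub-tree, while the line number is simply the line's position in the editor's list. `Token`, `SourceLine`, `make_line`, `point_at`, and the token pattern are hypothetical; a real lexer would come from the Coco/R scanner.

```python
import re
from dataclasses import dataclass, field

# Illustrative token pattern: numbers, identifiers, strings, shift operators, single chars.
TOKEN_RE = re.compile(r'\d+(?:\.\d+)?|[A-Za-z_]\w*|"[^"]*"|<<|>>|\S')


@dataclass
class Token:
    text: str
    column: int  # 1-based start column; the line number is NOT stored


@dataclass
class SourceLine:
    text: str                                   # the line exactly as typed
    tokens: list = field(default_factory=list)  # Token objects with their columns
    tree: object = None                         # this line's sub-tree, or a reference into the main tree


def make_line(text):
    tokens = [Token(m.group(), m.start() + 1) for m in TOKEN_RE.finditer(text)]
    return SourceLine(text, tokens)


def point_at(lines, line_index, token_index, message):
    """Format an error message; the line number is inferred from the position in `lines`."""
    line = lines[line_index]
    column = line.tokens[token_index].column
    return f"{line_index + 1}:{column}: {message}\n{line.text}\n{' ' * (column - 1)}^"


editor = [make_line("var x = 1"), make_line("  if x << 2")]
print(point_at(editor, 1, 2, "unexpected operator"))
# 2:8: unexpected operator
#   if x << 2
#        ^
```

Because only the column is kept, moving or inserting lines in the editor never invalidates the stored token positions.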
So the parser definition may look something like this:
Program = { statement line } .
Statement = declaration | instruction .
Declaration = module | class | function | var .
Instruction = expression | if | while | for | return | end .
Module = "module" namespace .
Class = "class" ident .
Namespace = ident { "." ident } .
Function = "function" ident [ params ] .
Params = WEAK "-" { [type] name } .
Var = WEAK "var" namespace [ ident ] "=" expression .
// Semantic: ident is optional if namespace has no dot; the namespace then becomes the ident.
List = "(" [ expression | { "," } ] { "," expression { "," } } ")" .
Expression = arith | new | fun .
Arith = (same order and functionality as JavaScript) // Operators: + - * / % mod and or xor (logical) & | ^ (bits) << >> ! ~
New = "new" namespace [ list ] .
Fun = "fun" [ params ] "-" block .
// Semantic: this is an anonymous function. Note the confusion about line separators.
If = "if" expression line block .
Block = { statement line } "end" .
While = "while" expression line block .
For = "for" expression [ "-" expression [ "-" expression ] ] line block .
Return = "return" expression .
Literal = string | number | boolean .
Number = integer | double | hexa .
Boolean = "true" | "false" .
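To make the grammar a little more concrete, here is a hand-written recursive-descent sketch of two of its productions: Namespace/Var, including the semantic rule noted under Var, and Arith, using precedence climbing. It is not Coco/R output. The operator binding powers are an assumption loosely modeled on JavaScript's precedence table, operands are limited to numbers, identifiers, and parentheses, and the class and function names are hypothetical.

```python
import re

# Binding powers loosely modeled on JavaScript's precedence table (higher binds tighter).
# Assumptions: "or"/"and" sit where JavaScript has ||/&&, "mod" shares the level of %,
# and the logical "xor" is left out because JavaScript has no equivalent to copy.
BINARY = {
    "or": 1, "and": 2, "|": 3, "^": 4, "&": 5,
    "<<": 6, ">>": 6,
    "+": 7, "-": 7,
    "*": 8, "/": 8, "%": 8, "mod": 8,
}
UNARY = {"!", "~", "-"}
IDENT = re.compile(r"[A-Za-z_]\w*")
NUMBER = re.compile(r"\d+(?:\.\d+)?")
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|<<|>>|\S")


class Parser:
    def __init__(self, text):
        self.tokens = TOKEN_RE.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def ident(self):
        tok = self.take()
        if not IDENT.fullmatch(tok):
            raise SyntaxError(f"expected an identifier, got {tok!r}")
        return tok

    def namespace(self):
        # Namespace = ident { "." ident } .
        parts = [self.ident()]
        while self.peek() == ".":
            self.take(".")
            parts.append(self.ident())
        return parts

    def var(self):
        # Var = "var" namespace [ ident ] "=" expression .
        self.take("var")
        parts = self.namespace()
        if self.peek() == "=":
            # Semantic rule: ident may be omitted only when the namespace has no dot,
            # in which case the namespace itself becomes the ident.
            if len(parts) > 1:
                raise SyntaxError("a dotted namespace needs an explicit identifier")
            namespace, name = None, parts[0]
        else:
            namespace, name = ".".join(parts), self.ident()
        self.take("=")
        return ("var", namespace, name, self.expression())

    def expression(self, min_power=1):
        # Precedence climbing; every binary operator here is left-associative.
        left = self.unary()
        while self.peek() in BINARY and BINARY[self.peek()] >= min_power:
            op = self.take()
            left = (op, left, self.expression(BINARY[op] + 1))
        return left

    def unary(self):
        tok = self.take()
        if tok in UNARY:
            return (tok, self.unary())
        if tok == "(":
            inner = self.expression()
            self.take(")")
            return inner
        if NUMBER.fullmatch(tok):
            return float(tok) if "." in tok else int(tok)
        if IDENT.fullmatch(tok):
            return tok  # identifier reference
        raise SyntaxError(f"unexpected {tok!r}")


def parse_var(text):
    parser = Parser(text)
    node = parser.var()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected {parser.peek()!r}")
    return node


print(parse_var("var x = 1 + 2 * 3"))        # ('var', None, 'x', ('+', 1, ('*', 2, 3)))
print(parse_var("var int count = -a << 2"))  # ('var', 'int', 'count', ('<<', ('-', 'a'), 2))
```

The resulting tuples are already in a shape a stack machine can walk operand-first, which is the property the notes ask of the parser output.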
Now there are a couple of problems here:
The grammar of course does a poor job at explaining semantics and overall behavior.
Ideally, the ultimate goal is to have something prototype-based that allows supple redefinitions at runtime. The language should be weakly typed, that is, types can be omitted and inferred, or explicitly declared. Yet for a first milestone, that's too much.
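Prototype-based objects are only a long-term goal here, but a minimal sketch shows what runtime redefinition could look like. Assumptions: each object is a slot dictionary with a single prototype link, looked up in the style of Self or JavaScript; `ProtoObject`, `get`, and `set` are hypothetical names, and methods take their receiver explicitly.

```python
class ProtoObject:
    """Hypothetical prototype-based object: a dictionary of slots plus one prototype link."""

    def __init__(self, prototype=None, **slots):
        self.prototype = prototype
        self.slots = dict(slots)

    def get(self, name):
        # Look in this object first, then walk up the prototype chain.
        obj = self
        while obj is not None:
            if name in obj.slots:
                return obj.slots[name]
            obj = obj.prototype
        raise AttributeError(name)

    def set(self, name, value):
        # Assignment always writes to this object's own slots, never to its prototype.
        self.slots[name] = value


point = ProtoObject(describe=lambda self: f"({self.get('x')}, {self.get('y')})")
p = ProtoObject(point, x=1, y=2)
print(p.get("describe")(p))  # (1, 2)

# Redefine the behavior at runtime on the prototype; p picks it up immediately.
point.set("describe", lambda self: "a point")
print(p.get("describe")(p))  # a point
```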
So the basic behavior could be something like that:
Note that there are not many specifics at this point. The idea is to start with something simple, let's say common. Things that are easy to add, such as operator overloading, will come next.
Scoping in the first prototype will be standard: intrinsic functions, modules, classes, functions, and blocks, in that order. All intrinsic elements will be stored in the global Hint namespace.
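A minimal sketch of such a scope chain, assuming the list above is read from outermost to innermost, so that a lookup walks it in reverse and inner definitions shadow outer ones (the notes do not say in which direction the order applies). `Scope`, `define`, and `lookup` are hypothetical names.

```python
class Scope:
    """One level of a hypothetical scope chain (intrinsic, module, class, function or block)."""

    def __init__(self, kind, parent=None):
        self.kind = kind
        self.parent = parent
        self.names = {}

    def define(self, name, value):
        self.names[name] = value

    def lookup(self, name):
        # Search from the innermost scope outward, so inner names shadow outer ones.
        scope = self
        while scope is not None:
            if name in scope.names:
                return scope.names[name], scope.kind
            scope = scope.parent
        raise NameError(f"undefined name {name!r}")


# Outermost first, following the order listed in the notes.
hint = Scope("intrinsic")  # stands for the global Hint namespace
hint.define("print", print)
module = Scope("module", hint)
cls = Scope("class", module)
func = Scope("function", cls)
block = Scope("block", func)

module.define("x", 1)
block.define("x", 2)
print(block.lookup("x"))  # (2, 'block')
print(func.lookup("x"))   # (1, 'module')
```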
There should be a default mechanism to access the host's functionality. Since the prototype will be coded in C#, it makes sense to provide access to the .NET Framework, that is:
Note that I haven't introduced the notion of a "using" or "import" statement in the grammar yet. I'll see how it goes.
Execution will be done by a standard stack-based interpreter. Ideally some elements of lazy evaluation and closures could be used in the implementation of the interpreter.
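As a rough illustration, here is a sketch of a stack-based evaluator that walks the tree directly, keeping both its pending work and its operands on explicit stacks rather than relying on host recursion. Assumptions: expression nodes are tuples `(operator, operand, ...)`, identifiers are strings resolved in a plain dictionary, and the operator table borrows Python's `operator` module; `evaluate` and the table names are hypothetical.

```python
import operator

# Illustrative operator table; Hint's real operator semantics are not defined yet.
BINARY_OPS = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    "/": operator.truediv, "%": operator.mod,
    "<<": operator.lshift, ">>": operator.rshift,
    "&": operator.and_, "|": operator.or_, "^": operator.xor,
}
UNARY_OPS = {"-": operator.neg, "~": operator.invert, "!": operator.not_}


def evaluate(tree, env):
    """Evaluate an expression tree using explicit work and value stacks."""
    work = [("visit", tree)]  # pending actions, processed last-in first-out
    values = []               # the interpreter's operand stack
    while work:
        action, node = work.pop()
        if action == "apply":
            op, arity = node
            if arity == 1:
                values.append(UNARY_OPS[op](values.pop()))
            else:
                right = values.pop()
                left = values.pop()
                values.append(BINARY_OPS[op](left, right))
        elif isinstance(node, tuple):
            op, *operands = node
            # Push the operator first so it runs only after all its operands.
            work.append(("apply", (op, len(operands))))
            for operand in reversed(operands):
                work.append(("visit", operand))
        elif isinstance(node, str):
            values.append(env[node])  # identifier: look it up
        else:
            values.append(node)       # literal value
    return values.pop()


print(evaluate(("+", 1, ("*", 2, 3)), {}))        # 7
print(evaluate(("<<", ("-", "a"), 2), {"a": 3}))  # -12
```

Word operators such as `and`/`or` are left out on purpose: short-circuiting means the right operand must not be pushed eagerly, which is where the lazy-evaluation ideas above would come in.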
The interpreter will directly act on the tree generated by the parser. One possible data structure for this tree:
To be continued.
Defining the syntax and parsing it is only one part of defining the language. Since the language is going to be interpreted, the other obviously important part is the definition of the interpreter, or more exactly of the way it will work, which will influence how the language is used.
Things I roughly have in mind:
and _prototype.
I was struggling with how to implement the parser. Defining the grammar is not difficult (although some details are still sketchy, to say the least), yet I wanted to start implementing it and I couldn't find what I wanted:
Let me explain the latter: to test the language I want a minimal PocketPC editor. It's actually more of an IDE, in the sense that I want something similar to the old Basic editors where each line is tokenized on the fly. So rather than writing a "full parser" and later a tokenizer for the IDE, I'd rather write something that can parse a single line of the program. The problem when I do that is that I can't get a grammar that validates a full source file.
It just struck me I was wrong. I can get both with a trick.
A full source file must contain:
So all I need is one grammar whose first production can be either one statement or one module. Then it is a semantic error to get a single statement when parsing a full source file.
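A minimal sketch of that trick, assuming a full file is recognized by a leading `module` line (what a full source file must contain is not spelled out here). `statement_kind` is only a placeholder for the real Coco/R-generated parser; `parse_start` and `parse` are hypothetical names.

```python
# Placeholder for the real parser: reduce a line to its statement kind.
KEYWORDS = {"module", "class", "function", "var", "if", "while", "for", "return", "end"}


def statement_kind(line):
    words = line.split()
    return words[0] if words and words[0] in KEYWORDS else "expression"


def parse_start(lines):
    """Start = Module | Statement: one grammar shared by the file parser and the IDE."""
    kinds = [statement_kind(line) for line in lines]
    if kinds and kinds[0] == "module":
        return ("module", kinds)  # module header followed by its statements
    if len(kinds) == 1:
        return ("statement", kinds[0])
    raise SyntaxError("expected a module or a single statement")


def parse(lines, mode):
    """Apply the mode-specific rule as a semantic check after the shared parse."""
    result = parse_start(lines)
    if mode == "file" and result[0] != "module":
        raise ValueError("a full source file must start with a module declaration")
    if mode == "line" and len(lines) != 1:
        raise ValueError("the IDE parses exactly one line at a time")
    return result


print(parse(["module demo", "var x = 1"], "file"))  # ('module', ['module', 'var'])
print(parse(["var x = 1"], "line"))                 # ('statement', 'var')
```

The grammar itself never needs to know which mode it runs in; only the small check after parsing does.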
The output of the parsing will be the abstract execution tree.
This has the advantage that I can do "partial recompilation" on the fly in the IDE by just parsing the lines that have been edited and merging the resulting trees where needed. This implies the IDE must keep track of which source lines produced which sub-tree. When processing a full file, the parser will need to keep that information around for the IDE to use.
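A minimal sketch of that bookkeeping, assuming edits only replace existing lines (inserting or deleting lines would also have to shift the stored indices) and leaving out the merge into the enclosing block structure. `LineTreeMap` and its methods are hypothetical names, and the per-line parser is passed in as a plain function.

```python
class LineTreeMap:
    """Hypothetical IDE bookkeeping: which source line produced which sub-tree."""

    def __init__(self, parse_line):
        self.parse_line = parse_line  # callable: line text -> sub-tree
        self.lines = []               # source text, one entry per line
        self.trees = []               # sub-tree produced by the line at the same index
        self.dirty = set()            # indices of lines edited since the last recompile

    def load(self, lines):
        """Full parse of a file: every line gets its sub-tree."""
        self.lines = list(lines)
        self.trees = [self.parse_line(text) for text in self.lines]
        self.dirty.clear()

    def edit(self, index, text):
        self.lines[index] = text
        self.dirty.add(index)

    def recompile(self):
        """Re-parse only the edited lines and splice their sub-trees back in place."""
        reparsed = sorted(self.dirty)
        for index in reparsed:
            self.trees[index] = self.parse_line(self.lines[index])
        self.dirty.clear()
        return reparsed


calls = []


def fake_parse(text):
    calls.append(text)  # record each call to show which lines were actually parsed
    return ("line", text)


ide = LineTreeMap(fake_parse)
ide.load(["var x = 1", "if x", "end"])
calls.clear()
ide.edit(1, "if x > 0")
print(ide.recompile(), calls)  # [1] ['if x > 0']
```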
Here's a stupid idea: implement this in Python rather than C#. Let's see:
So overall it's a poor choice, although on another level it is an interesting one: from an academic point of view, let's say.
Speaking of Python, why not use its "indentation defines blocks" approach?
On another unrelated note, the syntax should be as flexible as possible:
This work is licensed by Raphaël Moll under a Creative Commons License.