Kent Dybvig also wrote Chez Scheme [1], which on most benchmarks is far and away the fastest Scheme implementation [2].
Racket was recently rewritten to be based on Chez [3], and I have it from the maintainer of Racket himself that it’s paid off well: better performance and a smaller, more maintainable codebase.
1: https://github.com/cisco/ChezScheme (also, pronounced “Shay Scheme”)
2: https://ecraven.github.io/r7rs-benchmarks/
3: https://m.youtube.com/watch?v=s3Q3M2wZ7rI&pp=0gcJCRsBo7VqN5t...
Reading and hacking on the Chez Scheme codebase is always a treat and rather inspiring, especially compared with more mainstream compilers and code generators. As well as Kent Dybvig, Andy Keep's contribution (nanopass) is super-impressive. The whole thing is so cleanly designed and beautifully expressed.
I believe MIT Scheme compiled directly to C by intermingling switch statements and gotos to emulate its call stack. Problem was, as programs grew, compile times weren't linear. I gave it a shot once, it was a somewhat spiritual experience:
https://github.com/glouw/switch
Thanks! Do you mean MIT Scheme's C backend? I've used MIT Scheme on and off for a long time and have never touched the C backend & have no idea how it works, so this is interesting.
(MIT Scheme also has a native code compiler for Intel CPUs, which seems to be what most users of MIT Scheme (an admittedly small community) actually use.)
If you structure the backend like this, compiling to C is the easy part; we're just repurposing C as a general-purpose assembler: https://glouw.com/2023/11/07/Switch.html
I believe MIT Scheme took it a step further with GNU extensions, which let you take the address of a label (as &&label) and use it like a function pointer: https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
I don't see anything in the manual about MIT Scheme compiling to C, just to its own native code object files and some portable bytecode...
https://www.gnu.org/software/mit-scheme/documentation/stable...
It's there, but the documentation is weak.
The release notes from long ago when that back end was released are here, and give some detail:
https://share.google/xmUnDhC7lndujD7TI
I think a very rudimentary implementation (which is easy to understand) is in the SICP book.
Even though MIT Scheme was used with SICP, the Scheme implementation in the book is different from MIT Scheme.
MIT Scheme was for a long period one of the leading Scheme implementations. It lacks support for Apple Silicon, so it is not as popular now as it once was.
https://www.gnu.org/software/mit-scheme/
Yes. On Apple Silicon, memory pages can be writable XOR executable (W^X). MIT Scheme needs to write to pages that will later execute, so it triggers the MMU protection.
Install Chicken Scheme, then put the needed definitions in ~/.csirc. Now you can do nearly all the exercises.
Shout out to Mr. Dybvig who taught our intro to CS course at IU Bloomington 20+ years ago! It was my first intro to recursion (of course the class was all Scheme based) and gave me much more confidence in software development coming from a self-taught background.
I had the great pleasure of studying compilers, and then optimizing compilers, with Dybvig at IU in the late 2000s. He was a phenomenal professor; obviously deeply knowledgeable (that's a given), but also uncommonly gifted at imparting that knowledge.
I had the good fortune to be in Dybvig’s compilers class in the mid 2000s. The way it was presented really brought the concepts to life and made it enjoyable.
How much faster or slower is this than the 400-line compiler? https://news.ycombinator.com/item?id=26191243
Since this seems to be turning into an opportunity to shout out other Scheme implementations...
I was a happy user of VSCM by Matthias Blume, doing a bunch of my university assignments with it on my Linux PC in the early 90s.
> note! repl implies there's a top-level but there isn't...
I'm a little confused. How could it not have a top-level environment? There are environments which are extended on function application by consing lists of variables and their values onto the old environment.
This implies a root environment whose cdr is nil.
Can someone give more info on the relevance of this particular code?
This is an implementation of an interpreter from his dissertation. https://www.cs.unc.edu/xcms/wpfiles/dissertations/dybvig.pdf
Amazing, and you could see how this could be translated into perhaps a couple of thousand lines of assembly code to bootstrap Scheme almost from the bare metal, similar to the early IBM 709 LISP.
A thought: I wonder if an LLM would be up to the job of writing the assembly code from this?
This is the path that the GNU Mes and Guix folks are taking: https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-...
(sans LLMs -- I believe they have a Scheme (GNU Mes) that can be compiled from their 357-byte bootloader, and that Scheme can run a C compiler that can compile tinycc, and I think there's a path from tinycc to compiling gcc. I'm not sure how far it can get after that -- this blog post[1] implies that you still need a binary of Guile to run Gash, which is a POSIX shell written in Scheme. I'm guessing the plan is to either simplify Gash or complexify Mes to be able to remove Guile as a dependency.)
[1] https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-...
You could just use a C compiler to write the assembly code from this, and it'd be far less buggy.
Before there were LLMs, there were about 65 years of other program-writing-programs to save labor.
Generally speaking, program-writing-programs (when they are not either buggy, or being updated to a new version) predictably write the same thing from the same specification according to rules.
LLMs do not replace program-writing-programs; they should be used to work with program-writing-programs, not as a replacement.
E.g. I wouldn't use an LLM to convert a grammar to LR parsing tables, but perhaps to get help with the grammar file.
As simple as
gcc -S heap-lisp.c
If your Scheme were a native compiler rather than an interpreter (i.e. like Chez) and written in (a constrained subset of) Scheme, then you could use an itty-bitty Scheme interpreter written like this to bootstrap the core of your native Scheme compiler so that the interpreted version of it could compile itself to native binary.
Given that this was the path that one of Dr. Dybvig's post-doc acolytes, Dr. Michael Ashley, taught us in his Scheme-based compiler class, I must guess that this was also Kent's path and that that was the intent for this little interpreter. I suppose I should (finally) take the time to read his dissertation.
A C compiler can output fairly readable code if you turn off optimizations, and it's definitely not going to take thousands of lines to do this in modern assembly. It may be only just barely a thousand lines to do this in aarch64, and the LLM can probably do it.
From what I've seen the LLM do, it can definitely enhance these programs if you know what to ask, and it can explain how any piece of this code works. It may even be able to add garbage collection to the evaluator, since the root registers are explicit and the evaluator only acts on static memory.
> I wonder if an LLM would be up to the job of writing the assembly code from this
Why ask a cluster of GPUs to do something any single CPU from the last 30 years can accomplish?
I ask this question every day, and I think the answer is the same as the answer to the question "why are you doing this with blockchain?"
> I wonder if an LLM would be up to the job of writing the assembly code from this?
I could see a compiler doing that.
I'm quite aware of the existence of compilers, having worked on bootstrapping a production LISP compiler in the past. My point being that this would be an interesting experiment to do this "naïvely", given how close C is to (for example) PDP-11 assembly code.
If it's speed you're after, much more analysis (and thus code) is needed. See V8.
Instead of representing atoms as string literals, you can represent them as global variables. Then you can use pointer comparison instead of strcmp().
You'll still probably need the `strcmp` because the pointers won't be the same unless you check for them and make them the same.
You may be thinking about how `eq?` (reference equality) works in Scheme. That's usually done by hashing the identifier string, which is the more general solution to this equality problem.
The atoms strcmp()ed by the interpreter are all created by the compiler so you can ensure the pointers are equal by construction.
You're right: `virtmach` only works on things that are output from `compile`, and maintaining the invariant that virtmach lisp uses those pointers isn't difficult with how the evaluator is presented.
It gives virtmach lisp and Scheme different ontologies, but I can't think of any practical reason why that would matter, other than making things a little more complicated. But then again, if I'm thinking practically, Scheme should be using hashed identifiers, and then there's no reason for them to have different ontologies; conceptually we're right back where we started, with virtmach lisp and Scheme using identifiers as objects.
Do Forth bootstrapping now
No matter what I try, I always get:
heap-lisp: heap-lisp.c:99: cons: Assertion `istext(textptr)' failed.
I wonder if the ptr comparison in istext is actually UB?
If textptr is not an element of the text array, then it's undefined.
Have you tested this? Does it work?
I couldn't get it to do much. Also noticed a subtle bug here:
    next = compile(read(buffer), cons("halt", NULL));
C doesn't guarantee the order in which arguments to a function are evaluated, and 'cons' relies on 'read' being called first, so the result depends on an unspecified evaluation order. (In fact I had to fix that before it would even "run" on my system.)
It works, but it can't do much other than demonstrate semantics. The evaluator won't even be able to add numbers, for example.
What an utterly delightful itty bitty scheme. <3