(This design document discusses the issues of implementing Java in Scheme. It predates the implementation, and does not necessarily match the current Kawa implementation.)
Our biggest outstanding deliverable for Guile is a byte-compiler and -interpreter. Using Java has been suggested, and there is wide-spread interest in using Java byte codes for other languages. For example, InterMetrics is porting their Ada95 compiler to emit Java byte-codes. See here for more languages that use Java.
There is already an implementation of Scheme on top of Java. This is "Kawa", written by R. Alexander Milowski. There are some major problems with Kawa 0.1, and it needs a lot of re-writing, but it may be a useful starting point.
There are some different possible targets for Scheme-in-Java:
Related design issues include:
call/cc
using
direct translation to portable bytecodes. It is possible if you re-write to
continuation-passing style.
Restricted call/cc that only returns to a caller is possible.
Object
class, like all Java objects.
One might add an intermediate class Scheme_Object
,
and have all Scheme types be derived from it. That is possible, but I
don't think it is a good idea. I think it is better to re-use existing
Java classes. For example the Scheme values #f
and
#t
should be the Java objects Boolean.FALSE
and Boolean.True
. This does imply
that one cannot use that standard Java print functions (since they
would print in Java syntax, rather than Scheme syntax).
Traditionally, Scheme (and Lisp) implementations represent objects using a short union, with special bit "tags" to distinguish heap objects and immediate values (such as fixnums). This allows some operations (including fixnum arithmetic) to be more efficient. Unfortunately, there is no way to do this with Java bytecodes. Olin Shivers proposes a Java extension to allow tagged types. I don't think this has much chance of being standardized, and I think pre-allocation of common values (characters and small ints) can give some of the benefits without changing the VM.
Here are my suggestions for representing the various Scheme types:
#f
and #t
Boolean
.
Character
.
Two minor problems:
eq?
, unless we provide a
system-wide "obarray" table for characters. Luckily, Scheme
does not require characters that are equal?
to be eq?
.
equal?
integers compare eq?
.
Pair
class.
String
class.
Use String.intern()
to make symbols unique.
StringBuffer
class.
Object
.
Java does not have function values.
They can be simulated with a new
Procedure
class with a (virtual) apply
method.
Each function gets implemented as a separate sub-class of
Procedure
, and the Scheme code of the function is
compiled into an apply
method for that
Procedure
sub-class.
In some cases, a number of functions can use the same Procedure
sub-class. For example, the c[ad]*r
functions could be
implemented with a single Procedure
sub-class, with
a field containing an encoding of the [ad]*
.
The apply
method uses that field to do the right thing.
A closure requires a Procedure
sub-class with a field that contains the static context.
The static context can be an array,
though it can also be an Object
of some suitable class.
Evaluating the nested lambda expression will allocate a context object,
and then "new" the Procedure
sub-class, passing it the
context object. The apply
method well get non-local
bindings using the context object.
Mutual tail-calls among multiple functions compiled together can be handled by compiling all the functions in one big procedure, and and starting it with a switch operation to jump to the code for the desired function. In that case, a tail-call to one of the other functions can easily be compiled as a GOTO. It is not clear whether this is worthwhile, given that it still does not get you general tail-call elimination.
General tail-call elimination can be implemented using Baker's trick (a function returns the address of the next function to call), but this constrains you calling convention strongly, and for various reasons is messy and undesirable in a Java context.
The ideal way to support general tail-call elimination is to just
do it in the Java interpreter (either for all tail-calls or using
a new operation). The more optimized and tuned the interpreter
is, the more likely is that that this will be tricky and
machine-dependent, since it may involve messy adjustments
to the C stack. It has been proposed that gcc be enhanced to
support a new (extra) calling convention such that the called
function be responsible for popping the arguments of the stack (rather
than the caller). C programs could specify an __attribute__
to
indicate that a function should use the new calling convention (thus
regular functions can call functions using the new convention, and
vice versa). If the Java interpreter is carefully written to use
the new calling convention where appropriate, then it should be fairly
easy to implement general tail-call elimination. If Scheme code is
compiled to native (C or assembler), it can also use the new calling
convention, thus getting full tail-call elimination. The FSF
(i.e. Stallman) is favorable to this approach, which would be
quite a bit of work, but would be generally useful to anyone who
needs general tail-call elimination.
call/cc
function) may be straight-forward in some environments. However, it
cannot be implemented in Java, nor can it be implemented in portable
C code.
One option is to not worry about fully implementation continuations.
Most uses for continuations are to implement exception handling,
threads, and back-tracking. Exception handling and threads are
already provided by Java. Continuations used to implement
(non-resuming) exception handling are a special case of continuations
that only return to stack frames in the current call chain; these
can be implemented using a combination of Continuation
objects and
exception handling. Guile already supports threads (not implemented
using continuations), as does Java; using continuations to switch
threads in such an environment is probably a bad idea. We do not
have a ready interface for back-tracking, except by using continuations,
but I don't think many Scheme applications use back-tracking.
If our Java-byte-code interpreter is embedded in a Scheme system that
already supports continuations, then it is trivial to implement
call/cc
.
An alternative is to have the compiler transform a Scheme program to a form which does not need explicit continuations. One option is to translate to continuation-passing-style. I don't understand this will enough to comment on the implications.
Jeffrey Mark Siskind is working on a highly optimizing compiler to translate Scheme to C. It could possibly be re-targeted to emit Java byte-codes. However, it is written for for a "closed-world assumption" with global analysis of the whole program. There is as yet no support for separate compilation of independent modules, though that is supposedly planned. In any case, if highly-optimized Scheme code is important, there are probably ideas (and code) we can use in Stalin. Also, William Clinger has expressed plans to re-target his Twobit optimizing compiler to emit Java byte-codes, and he believes it should be easy to do.
#f
are compiled into
references to static members of standard classes
(such as Boolean.FALSE
).
eval
, load
, and a
read
-eval
-print
loop
can be done in various ways.
If the bytecode interpreter is inside a Scheme interpreter, then
the latter presumably already has eval
. We can use that.
We would need gateways between the different interpreters, so
different kinds of functions can call each other. To do that,
the calling convention for Java methods should be based on the
already existing Scheme calling convention. Even if we do
a Scheme-in-Java system from scratch, it is not very difficult
to write a simple interpreter for eval
.
Alternatively, we can use bytecode for interpreted code too:
Code that needs to be evaluated is compiled to bytecodes,
and then immediately loaded and executed. Java provides a
methods to load a class from a bytearray in memory, so it is
not necessary to write out a temporary file. One problem is that
Sun's current Java release does not garbage collect classes,
so code that results from eval
is never freed.
This should seldom be a problem,
except possibly in a long-running interactive session,
since well-written Scheme code seldom uses eval
.
The problem can be reduced if we only byte-compile function
definitions, but immediately evaluate other top-level expressions.
And of course one can use a Java interpreter that does
garbage collect classes.
The bytecode-based
RScheme
system implements eval
by compiling and immediately executing
bytecodes. It has support for multiple bytecode interpreters, and has
immediate ("raw" or "unboxed") numbers, so it would probably be a strong
candidate for a combined Scheme/Java system.
Finally, it does not take much code to write a simple Scheme interpreter, This can be byte compiled, and linked in with the rest of the system. This makes most sense in a Scheme system that emphasises large applications, and uses a compiler that does global optimization, such as Stalin.