Note that Java now has its own API for this purpose.
https://openjdk.org/jeps/484
For those who might be clicking through thinking "since when??", the emphasis is on "now": this was released in JDK 24.
Byte Buddy predates it by at least a decade.
Hence the "now" in my comment. :)
Ideally, tools like Byte Buddy will adopt that API, since it covers the low-level concerns.
We are already living in an (almost) ideal world: https://github.com/raphw/byte-buddy/discussions/1798
Wasn’t there already an instrumentation API that ran via premain?
That is an even lower level API that lets you manipulate the bytecode as a byte array. You still need to parse it to do anything useful, hence libraries like ASM. And if you want to compile more code at runtime (or generate bytecode), you'll need some way to do that.
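To make the "byte array" point concrete, here is a minimal sketch of a `java.lang.instrument` agent. The class name `VersionLoggingAgent` and the `describe` helper are made up for illustration; `premain`, `Instrumentation.addTransformer` and `ClassFileTransformer.transform` are the real JDK entry points. It only reads the class file header, and even that means decoding bytes by hand.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

// Hypothetical agent class; the name is illustrative only.
final class VersionLoggingAgent {

    // The JVM calls premain before the application's main() when started with
    // -javaagent:agent.jar, where the jar manifest names this class in Premain-Class.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className,
                                    Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain,
                                    byte[] classfileBuffer) {
                System.err.println(describe(className, classfileBuffer));
                return null; // null tells the JVM to keep the class bytes unchanged
            }
        });
    }

    // All the API hands over is a raw byte[]; even reading the version means
    // decoding the class file header by hand (u4 magic, u2 minor, u2 major).
    static String describe(String className, byte[] classBytes) {
        if (classBytes == null || classBytes.length < 8) {
            return className + ": too short to be a class file";
        }
        int magic = ((classBytes[0] & 0xFF) << 24) | ((classBytes[1] & 0xFF) << 16)
                  | ((classBytes[2] & 0xFF) << 8) | (classBytes[3] & 0xFF);
        if (magic != 0xCAFEBABE) {
            return className + ": not a class file";
        }
        int major = ((classBytes[6] & 0xFF) << 8) | (classBytes[7] & 0xFF);
        return className + ": class file major version " + major;
    }
}
```

Anything beyond the header (constant pool, methods, attributes) needs a real parser, which is where ASM or the newer JDK API comes in.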
How does that compare in terms of usability and completeness?
It is complete, and I’ve found it extremely usable when writing code to trawl over a large number of class files. Looks like it should be good for code generation as well but I haven’t used that yet.
I have not yet used it, only raising awareness.
The JEP came about because Oracle noticed that everyone, including Oracle itself, was depending on ASM.
Yup, the ASM dependency is one that would constantly cause us headaches. A load of frameworks have a path to ASM for one reason or another and it requires an update every time you move up JVM runtimes.
It's usually not painful to update (just bump the version) but it's an annoyance.
In fact, Byte Buddy has a dependency on ASM.
It's complete but low level compared to Byte Buddy. A better comparison is to ASM (which is what it was meant to replace).
https://asm.ow2.io/
If you are into code generation, another project of interest is JavaPoet
https://github.com/square/javapoet
I've used it to do a mass refactoring of an annotation-based library. Worked pretty great.
Should probably link to https://github.com/palantir/javapoet instead, as the Square version has been deprecated since 2020.
Palantir does Java? Jikes !
It seems like Micronaut has been able to avoid runtime bytecode generation by doing everything at compile-time. I wonder if there are things that you can’t do the Micronaut way.
Sure:
- There are how many computer architectures? A compile-once-run-anywhere binary looks closer to shipping a fancy interpreter with your code than shipping a compiled project. Runtime bytecode generation is one technique for making that fast.
- More generally, anything you don't know till runtime generates a huge amount of bloat if you handle it at compile-time. Imagine, e.g., a UI for dragging and dropping ML components to create an architecture. For as much compute as you're about to pour into training, even for very simple problems, it's worth something that looks like a compilation pass to appropriately fuse everything together. You could probably get away with literally shipping a compiler, but bytecode generation is a reasonable solution too.
- Some things are literally impossible at compile-time without boxing and other overhead. E.g., once upon a time I made a zero-cost-abstraction library allowing you to specify an ML computational graph using the type system (most useful for problems where you're not just doing giant matmuls all day). It was in a language where mutually recursive generics are lazily generated, so you're able to express arbitrary nth derivatives still in the type system, still with zero overhead. What you can't do though is create a runtime program capable of creating arbitrary derivatives; there must be an upper bound for any finite-sized binary (for sufficiently complex starting functions) -- you could cap it at 2nd derivatives or 10th or whatever, but there would have to be a cap. If you move that to runtime though then you can have your cake and eat it too, less the cost of compiling (i.e., bytecode generation) at runtime.
Etc. It's a tradeoff between binary size (which might have to be infinite in the compiled case) and runtime overhead (having to "compile" for each new kind of input you find).
I haven't used Micronaut specifically, but I remember using Quarkus when it was rather new. It also does a lot at compile-time compared to, say, Spring. The one big disadvantage I noticed is that it's hard to eject if you need to defer something to runtime for some reason. Don't know if it's still an issue, but that's really the only disadvantage I remember.
I think that's noteworthy, but just not necessary. Still really cool if memory usage and startup times are your constraints.
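For anyone wondering what "runtime bytecode generation" looks like in its most basic form, the JDK's own `java.lang.reflect.Proxy` is a small example: the proxy class does not exist until the program runs. This is only a sketch of the general mechanism that compile-time frameworks try to avoid, not a claim about how any particular framework works internally. `RuntimeProxySketch`, `Greeter` and `withLogging` are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

final class RuntimeProxySketch {
    // Hypothetical service interface, standing in for an application bean.
    interface Greeter {
        String greet(String name);
    }

    // Wraps a Greeter in a proxy whose class the JDK generates at runtime.
    static Greeter withLogging(Greeter target, StringBuilder log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.append(method.getName()).append(';'); // the "interception" step
            return method.invoke(target, args);       // then delegate to the real object
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(), new Class<?>[] {Greeter.class}, handler);
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        Greeter greeter = withLogging(name -> "Hello, " + name, log);
        System.out.println(greeter.greet("world"));
        System.out.println(Proxy.isProxyClass(greeter.getClass()));
        System.out.println(log);
    }
}
```

A compile-time approach would instead emit the wrapping class as source or bytecode during the build, so nothing has to be generated when the application starts.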
Reminds me of a side project I did when first starting CS! The Java byte code specification is absolutely approachable and if you've never looked at it before I recommend it (although this project says you can still use it without that knowledge)
where to start?
https://docs.oracle.com/javase/specs/jvms/se8/html/
https://medium.com/@davethomas_9528/writing-hello-world-in-j...
The better question is why use Java for anything these days. If you really need to run something on the JVM, use Kotlin.
> The better question is why use Java for anything these days.
Java (the language) is pretty much "C for the JVM." By that, I mean frameworks/libraries intended for maximum potential use in languages running on the JVM (such as Kotlin, Scala, and of course Java) all support Java (the language) interoperability. Many written in alternate languages targeting the JVM, such as Akka[0], typically have some degree of Java (the language) support as well.
While I prefer to program in one of the alternate programming languages targeting the JVM, I understand why many OSS projects are implemented in Java (the language) for the reasons outlined above.
0 - https://github.com/akka/akka
The problem is, if you are trying to optimize for the JVM, you are already down the wrong path. The JVM is useful when you want something that is faster than Python/Node, but still want cross platform support and somewhat rapid development. The cases where this applies are very niche.
It may allow closer access to the JVM; however, the entire ecosystem is a colossal mess. Requiring a class that wraps main() is pretty dumb, standard stuff like Lombok hacks the AST (not to mention that annotation processors in general work by printing code strings to a file), and the dependency injection frameworks do so much in the background that they are far removed from the actual processing.
And then there is the whole Apache foundation with its software being used widely as standard. The same foundation where someone wrote the code that allows log statements to pull arbitrary code from the internet and execute it, and that change made its way past multiple eyes before being merged to production without a single person realizing how crazy it is.
If you want speed, write stuff in C/Rust/Clean C++ (without templates, no C style memory access, etc.). If you want to be efficient, write stuff in Python/Node.
I like Rust as much as the next guy, but Kotlin is the most ergonomic programming language I know. So my approach is to use Kotlin by default and, should it some day become clear that the service is a bottleneck (or if the cloud cost can be lowered substantially), to only then rewrite it in Rust. At that point the service has probably already gained most of the functionality it'll ever have, which should make the Rust conversion as straightforward as it can be.
Have you ever coded Groovy? If so, besides the fact that it is nigh a dead lang, what does Kotlin offer?
Kotlin is more type oriented, while offering a lot of niceness in terms of syntax (like not having to have a class that surrounds main).
> If you want speed
> If you want to be efficient
Funny that you assume the best position of the trade off continuum isn't somewhere in the middle for most people. Besides, for developer efficiency, I prefer a language where I don't have to constantly worry if the type system is defeated at runtime.
The best position in the middle is the combination of Python and C. I don't know why people are so aghast about writing small C programs, compiling them, and launching them with Python through an os call.
>I prefer a language where I don't have to constantly worry if the type system is defeated at runtime.
If you are doing this with Python, you are doing something very wrong, even without mypy. As for NodeJS, just use Typescript.
> The best position in the middle is the combination of Python and C.
This is an opinion with which many would disagree, for various legitimate reasons, yet it appears to be the polyglot approach you prefer. So let's briefly explore it.
> I don't know why people are so aghast about writing small C programs, compiling them, and launching them with Python through an os call.
There are significant limitations to using fork[0]/exec[1] as a general purpose component integration strategy, not the least of which is the inability to support fine-grained bidirectional interactions.
A better "Python and C" integration option is to employ SWIG[2] to incorporate C/C++ libraries directly into the Python execution environment.
0 - https://man.freebsd.org/cgi/man.cgi?query=fork&apropos=0&sek...
1 - https://man.freebsd.org/cgi/man.cgi?query=execve&sektion=2&a...
2 - https://swig.org/
You don’t really need any of the Apache Commons libraries with modern Java versions, if that’s what you were referring to. Also, I think that most people who are considering JVM development would consider Kotlin as an alternative language, or maybe C# and .NET as an alternative ecosystem. I believe Rust, C, or C++ are rarely going to be considerations for most people in that situation.
As a Kotlin enjoyer, I find these comments counterproductive. Maybe they like the lack of extension functions?
Kotlin is fatter, compiler is slower, code completion is slow as hell on large projects, but other than for building small applications, there's really no reason not to use Kotlin. The catch is that you need to actually learn the language, or else you're going to end up with a very, very slow codebase where opening a file and waiting for syntax highlighting takes 2-3 seconds and autocomplete is just painfully slow.
"fatter, compiler is slower, code completion is slow as hell" - if that's all you want out of your programming language, then Java is probably a good choice for you.
For others that value the things that Kotlin brings over Java (even modern Java), and for the ways in which it delivers a simpler experience than Scala - I think it's a pragmatic and sensible decision.
I do like the lack of extension functions. I find them confusing, especially when you can use them on things that are null.
I wonder if that confusion is due to the fact that you haven't yet wrapped your head around the fact that extension functions are "just" syntactic sugar for static functions. The implicit "this" becomes the first parameter of the static function, and function parameters can be null. Now you might ask, "why not use static (/first class) functions then?" Because those "feel" less idiomatic to use than extension functions or methods that are defined on the object (hierarchy) itself.
But understanding why the extension type can be nullable is not the same as using it on nullable types. I restrict my extension functions to non-nullable types most of the time as well. The best exception to this preference, just to see where it makes sense, is the built-in function [toString](https://kotlinlang.org/api/core/kotlin-stdlib/kotlin/to-stri...), since you want it to return "null" if you invoke it on null.
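Seen from the Java side, the "syntactic sugar" claim looks roughly like the sketch below. It assumes a Kotlin declaration along the lines of `fun Any?.describe(): String`; the name `describe` and the class `ExtensionSketch` are made up for illustration, and this is not the actual stdlib source.

```java
final class ExtensionSketch {
    // Roughly how a Kotlin extension such as `fun Any?.describe(): String`
    // appears to Java: a static method whose first parameter is the receiver.
    static String describe(Object receiver) {
        // The receiver is an ordinary parameter, so null is a legal argument.
        return receiver == null ? "null" : receiver.toString();
    }

    public static void main(String[] args) {
        Object missing = null;
        // No NullPointerException: nothing is dereferenced before the null check.
        System.out.println(describe(missing));
        System.out.println(describe(42));
    }
}
```

Calling `describe(missing)` is plainly safe when written as a static call; the confusion only arises because the Kotlin call site reads like a method call on a possibly null value.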
I have wrapped my head around it. I think it's confusing to the reader and creates awkward semantics.
Yeah, extension functions are one of those features that went from 'oh, this is nice' to "this is so overused it's counterproductive".
It makes reading a lot of Kotlin source quite terrible.
Lately they've been shoveling a lot of similar magical "code comes from somewhere" features into the language, slowly giving it a C++-style cluttered feel.
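What I mean by that is this:
val a: SomeType? = null
// I’m forced to null check here
if (a != null) {
    a.someMethodOnIt()
}
// But I don’t have to null check here
a.someExtensionFn()
It’s weird.
A recent good reason for using Java is that frontier LLMs are trained with very large amounts of high quality enterprise Java source code. Claude Code for example loves Java and its static type system.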
I constrain my LLM-generated Java code to only static methods of 20 LOC or less, and limit data types to those that are JSON compatible. Both of these lead to more reliable code and data that Claude Code fully understands and generates.
I am preparing to auto-generate an agent-based application that might reach 1.5 million Java LOC. Hard to imagine accomplishing that with Javascript or Python or C++.
Could you please expand on how you limit the generated code? I haven't dived deep into Claude code, mostly just familiar with OpenAI's offering.
I first generate a specification JSON object from a text design narrative. It lists fine-grained steps for each Java class, decomposed so that each step can be implemented as a static method in 20 lines of Java code or less. Helper methods are likewise scoped to 20 LOC or less.
I also have a markdown-formatted document `core-programming-guidelines.md` that I include in the Claude Code code-generation prompt.
For example:
## Core Programming Principles
### Defensive Programming & Safety
1. *Use 'final' keyword aggressively* for method parameters, local variables, and class fields
2. *Null Safety*: Include null checks with Validate.notNull() and assertions for external calls
3. *Input Validation*: Validate all method parameters with clear preconditions using org.apache.commons.lang3.Validate
### Performance Optimization
1. *Collection Sizing*: Always provide calculated initial capacity for collections
2. *String Processing*: Use StringBuilder with pre-calculated capacity, avoid regex where possible and avoid `java.util.Scanner` where possible.
3. *Memory Management*: Clear large collections when done, reuse objects where appropriate
### Code Clarity & Documentation
1. *Naming Conventions*: Use descriptive names for variables, methods, and constants
   - All StringBuilder variables should be suffixed `Builder`.
2. *Documentation*: Comprehensive JavaDoc for all public, protected, and private methods
3. *Inline Comments*: Explain complex logic, algorithms, and business rules
### Modern Java 23 Features
1. *Text Blocks*: Use for multi-line string literals
2. *Pattern Matching*: Use where appropriate for cleaner code
3. *Records*: Use for immutable data carriers
4. *Enhanced Switch*: Use new switch expressions
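As a rough illustration of what code following these guidelines might look like (this is not output from the workflow described above), here is a small sketch. `GuidelineSketch`, `LineItem`, `joinNames` and `sizeLabel` are hypothetical names, and `Objects.requireNonNull` stands in for `Validate.notNull()` only to keep the example free of external dependencies. It needs Java 16 or later for records.

```java
import java.util.List;
import java.util.Objects;

final class GuidelineSketch {
    /** Immutable data carrier (hypothetical type) with JSON-compatible fields. */
    record LineItem(String name, int quantity) {}

    /** Joins the item names with ", " using a pre-sized StringBuilder. */
    static String joinNames(final List<LineItem> items) {
        // Stand-in for Validate.notNull(), to avoid the commons-lang3 dependency here.
        Objects.requireNonNull(items, "items must not be null");
        int capacity = 0;
        for (final LineItem item : items) {
            capacity += item.name().length() + 2; // name plus ", " separator
        }
        final StringBuilder namesBuilder = new StringBuilder(capacity);
        for (final LineItem item : items) {
            if (namesBuilder.length() > 0) {
                namesBuilder.append(", ");
            }
            namesBuilder.append(item.name());
        }
        return namesBuilder.toString();
    }

    /** Classifies a quantity with a switch expression. */
    static String sizeLabel(final int quantity) {
        return switch (Integer.signum(quantity)) {
            case -1 -> "invalid";
            case 0 -> "empty";
            default -> quantity >= 100 ? "bulk" : "regular";
        };
    }
}
```

Both methods are static, well under 20 lines, and only touch strings, ints, lists and a record, which matches the "static methods, JSON-compatible data" constraint described earlier.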
> A recent good reason for using Java is that frontier LLMs are trained with very large amounts of high quality enterprise Java source code.
Where did it get it from?
GitHub public repositories mostly.
I don't think Github is necessarily full of high quality enterprise Java software, is it?
I've worked in both and I prefer Java