Can you please explain Dyna? I'm trying to understand it using what I know about Prolog. For example, what does this rule (?) do?
phrase(X,I,I+1) max= rule(X, word(I)).
What is the 'max=' operator (?)? Is phrase/3 an interface to Definite Clause Grammars, as in Prolog?
How about this one?
phrase(X,I,K) max= phrase(Y,I,K) * rule(X,Y).
Is the * operator (?) to be interpreted as a multiplication? As a disjunction or other logical operation? If a multiplication, what is it multiplying?
Thanks!
Edit: I'm struggling with the Fibonacci example. The text says that Dyna 2.0 uses unification "like Prolog" but I can't make heads or tails of how the first-order terms in the following rule would unify:
fib(N) := fib(N-1) + fib(N-2).
It seems that 'N-1' and 'N-2' are meant to be replaced by their values. In that case, when does unification happen? Are the terms evaluated, then their values unified? Obviously those values would (eventually) be constants, so where is the unification happening? Is this something to do with lazy evaluation? Or is unification here only used as a value-passing mechanism, to propagate the value of N through the expression?
Author here. Chapter 2 of my dissertation covers a lot about the syntax of Dyna: https://matthewfl.com/papers/mfl-dissertation.pdf

As refset said in the other comment, in Dyna, terms return values, like a function in a functional programming language. Hence, when you write
word(I)
this returns the value defined by the `word` function. E.g.
word(0) = "Hello".
This is different from typical logic programming languages like Prolog in that if you write
foo(X, bar(Y))
then you are "calling" the term `foo`, but `bar` ends up being a structured term that gets unified with the second argument of `foo`. Prolog provides this shortcut because "calling" `bar` doesn't make sense in this case: `bar` would only return the value `true`, which isn't particularly useful.
In Dyna, we provide square brackets to create structured terms, as both calling the term `bar` and creating the structured term `bar` can be useful. E.g. the Prolog expression `foo(X, bar(Y))` would be equivalent to the Dyna `foo(X, bar[Y])`.
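To make the call-versus-quote distinction concrete, here is a rough Python analogy (not Dyna itself; the `word` fact and the `Bar` structure are invented for illustration): evaluated terms return their values, while square-bracket terms just build data.

```python
from collections import namedtuple

# word(0) = "Hello".  -- an item with a value; a dict stands in for it here.
word = {0: "Hello"}

# bar[Y] builds a structured term rather than calling anything;
# a namedtuple plays that role in this sketch.
Bar = namedtuple("Bar", ["y"])

def foo(x, b):
    # `b` arrives as data (like Prolog's bar(Y)), not as a return value.
    return (x, b)

print(word[0])         # 'Hello' -- word(0) evaluates in place to its value
print(foo(1, Bar(2)))  # (1, Bar(y=2)) -- foo(X, bar[Y]) passes a structure
```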
For the aggregator `max=`, this is looping over the different possible values on the right-hand-side and selecting the value that is the max. Hence in
phrase(X,I,K) max= phrase(Y,I,K) * rule(X,Y).
this is looping over the variable `Y` and selecting the one that is maximal. Using `max=` on
phrase(X,I,I+1) max= rule(X, word(I)).
is done because we want all of the right-hand-sides to use the same aggregator so that we can aggregate between different rules that contribute to `phrase/3`.
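For intuition, here is a minimal, illustrative Python sketch of what those two rules compute together: Viterbi-style best scores for a one-word sentence under a unary grammar. The weights and the sentence are invented, and a real solver would iterate the second rule to a fixed point rather than make a single pass.

```python
# rule(X, Y) and word(I) as plain lookup tables (invented values).
rule = {("noun", "dog"): 0.9, ("np", "noun"): 0.8}
word = {0: "dog"}

phrase = {}  # phrase(X, I, K) -> best score found so far

def update(key, value):
    # `max=` aggregation: keep the maximum over all contributing bodies.
    phrase[key] = max(phrase.get(key, float("-inf")), value)

# phrase(X,I,I+1) max= rule(X, word(I)).
# word(I) evaluates in place to the word, which is then looked up in rule.
for i, w in word.items():
    for (x, y), weight in rule.items():
        if y == w:
            update((x, i, i + 1), weight)

# phrase(X,I,K) max= phrase(Y,I,K) * rule(X,Y).
# A single pass over the unary rules, for brevity.
for (y, i, k), score in list(phrase.items()):
    for (x, y2), weight in rule.items():
        if y2 == y:
            update((x, i, k), score * weight)

print(phrase)  # {('noun', 0, 1): 0.9, ('np', 0, 1): 0.72}
```

In this sketch, `*` is ordinary numeric multiplication, combining the score of the subphrase with the weight of the rule applied to it.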
Thanks! Honestly, I don't know if I have the time to read a dissertation (or two, or three, judging from the references on the article above). I will try to make some time because Dyna looks interesting.
Regarding 'max=' I guess then phrase/3 is calculating... the string of words I with maximum probability? Which IIUC is bound to Y? If so, that's cool. I've done that with DCGs in the past, but the functional syntax makes it ... look more like a function :)
>> E.g. the Prolog expression `foo(X, bar(Y))` would be equivalent to the dyna `foo(X, bar[Y])`.
I'm guessing that's for convenience? I think it's not rare for functional languages to have a "quoting" (?) mechanism like that?
Edit: Might be a good idea to add an example that also shows a program's output, alongside its source code.
The dissertation covers extensive details, but on your last point at least it describes:
> 2.2.1 Evaluation by Default. One of the major syntactic differences between Dyna and other logic programming languages is that Dyna evaluates an expression in place by default. The reason for this change is that most terms have a meaningful value, much like how a function returns a value in a functional programming language. Conversely, in logic programming languages such as Prolog or Datalog, terms only “return” the value of true.
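Under that reading, the Fibonacci rule from the question is close to a memoized function: `fib(N-1)` and `fib(N-2)` are evaluated in place, and unification mainly acts as a value-passing mechanism that binds `N` to the argument. A rough Python rendering (the base cases are assumed, as in the usual program; `lru_cache` stands in for Dyna's memoization of item values):

```python
from functools import lru_cache

# fib(N) := fib(N-1) + fib(N-2).
# N-1 and N-2 are evaluated before the nested fib items are looked up.
@lru_cache(maxsize=None)
def fib(n):
    if n <= 1:  # stands in for the base-case rules for fib(0) and fib(1)
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(10))  # 55
```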
Prolog (and Datalog) plays fast and loose with the original FOL terminology from which its own is derived, so e.g. a "term" in Prolog is both an "atomic formula" or "atom" in FOL, and a "first order term" (or just term) in FOL. In FOL the sets of function and predicate symbols are distinct and disjoint, but in Prolog they are not, and my guess has always been that this is the reason that everything gets called a "term" (and "atom" ends up replacing "constant").
And nothing gets evaluated, either in FOL or Prolog/Datalog, so I'd like to nitpick and say that terms don't '"return" the value of true', they are only transformed, by unification and -in the case of Prolog- by SLD-Resolution. The interpretation of a Prolog program eventually "returns" the result of a proof, which is usually 'true' or 'false', but sometimes 'yes' or 'no'; but nothing in a Prolog program can really be seen as "returning" anything. It's a peculiarity of logic programming languages that I think takes a while for most programmers to get their heads around.
So a logic programming language that returns values and replaces expressions by their values is a substantial departure from Prolog syntax and semantics. But you know that, judging from the excerpt above.

Pedantry!
It's very interesting work. People keep wondering about production use. What they don't realize (or maybe do) is that going from high-level descriptions to efficient execution on imperative GPUs means solving a series of NP-hard problems, like ASIC synthesis. A few tools, like Cray's Chapel, do a pretty good job when given more specific information, though.
My background was high-assurance systems that are provably correct. The field found that designing something for easy analysis and verification was the opposite of designing for efficiency. So the solution became to verify the high-level description (the formal specification) first, then verify its equivalence to a lower-level, efficient form, which was probably designed side by side with the other one to make that easier.
Having only read the abstract and example code, I see your work being most valuable as what we called an "executable specification language." Those were specs or models, close to the original mathematical description, that could actually run for exploration and testing. They were also, in theory, easy to modify by a researcher who could focus on intended behavior rather than low-level code.
You are already using transformations to produce a low-level implementation. My brainstorm would be a bottom-up approach: build common operations on GPUs, in libraries like JAX, plus rules about how to integrate them. Then synthesize combinations of them, with constraints on the combinations, with the system further specializing the components as it goes. (It will be interesting to see what you did.)
I think HLS for ASICs, Cray's Chapel, and high-level libraries in machine learning show that such an approach could make implementations fast enough to test theories on small models. I think someone should build on these concepts, trying to connect them to primitives in Chapel or JAX or something else already designed to synthesize or abstract away many low-level details. Then, implement pieces of common, useful models in it with pre-synthesized implementations.
Shorter version: I think it's neat work worth further exploration, which might accelerate development and validation of ML algorithms even with sub-optimal runtime performance.
(And I still haven't read the papers. This is just independent brainstorming I do first, to see if our ideas are converging at all, which has its own value.)
This language seems quite similar to Scallop [1], which was recently posted to HN [2]. Both are extensions of Datalog to arbitrary semirings, meaning that they generalise assigning merely true or false to relational statements, allowing probabilistic reasoning and more (almost arbitrary tagging of statements). Scallop is further focused on being differentiable and being able to integrate Scallop code into a PyTorch function. Both seem to have quite a bit of work put into them and have JIT compilers (Scallop also has a GPU implementation). I like the sound of "I have further modernized Dyna to support functional programming with lambda closures and embedded domain-specific languages." [3]

Going to try it out.

[1] https://www.scallop-lang.org/

[2] https://news.ycombinator.com/item?id=43443640

[3] https://matthewfl.com/research#phd
Same lineage (weighted/semiring logic programming for ML), but a different system. Francis-Landau's work (Dyna) is a term-rewriting implementation of a weighted logic language with bag-relational semantics, dynamic programming, and a tracing JIT. Scallop is a Datalog-style neurosymbolic language built on provenance semirings, with differentiable/relaxed semantics intended for end-to-end training with NNs. Consider Scallop a branch of the lineage optimized for differentiable neurosymbolic learning, whereas Dyna is a more general weighted logic programming framework with a different execution model.
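To see what generalising Datalog's booleans to a semiring buys, here is a small, illustrative Python sketch: the same transitive-closure computation run under two semirings. With (or, and) you recover plain Datalog reachability; with (max, *) you get Viterbi-style best-path scores. The graph and weights are made up.

```python
def closure(edges, plus, times, zero):
    # path(U,V) is seeded from the edges and extended by joining path
    # segments, aggregating with the semiring's `plus` and combining
    # segment values with `times`.
    path = dict(edges)
    for _ in range(len(edges)):  # enough passes for this tiny graph
        for (u, w), a in list(path.items()):
            for (w2, v), b in list(path.items()):
                if w == w2:
                    old = path.get((u, v), zero)
                    path[(u, v)] = plus(old, times(a, b))
    return path

weighted = {("a", "b"): 0.5, ("b", "c"): 0.4}
boolean = {e: True for e in weighted}

# (or, and) over {True, False}: plain Datalog reachability.
print(closure(boolean, lambda x, y: x or y, lambda x, y: x and y, False))
# (max, *) over non-negative reals: best-path (Viterbi) scores.
print(closure(weighted, max, lambda x, y: x * y, 0.0))
# ('a', 'c') is derived in both: reachable, with score 0.5 * 0.4 = 0.2.
```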
> Dyna3 — A new implementation of the Dyna programming language written in Clojure

There are some epic looking Clojure namespaces here, e.g. this JIT compiler https://github.com/argolab/dyna3/blob/master/src/clojure/dyn...

I'd be fascinated to hear about the author's experience using Clojure for something as complex as a compiler. Was the lack of types an issue? Or was the simplicity and flexibility of the language worth the tradeoff?
Author here.

In picking Clojure, there were two main things that I was looking for in a language: compile-time macros and a runtime eval function. The reason for wanting compile-time macros is that without them, you essentially create another DSL to generate code in your language of choice (e.g. tablegen and pdll in the LLVM project). As such, I was mainly considering LISP-like languages for the implementation of dyna3. Clojure's emphasis on immutable data structures also fit nicely with the design for R-exprs.
Lack of types in Clojure wasn't an issue in terms of getting things working. However, I did end up getting very annoyed with the runtime speed and quality of the generated code from the Clojure implementation. Features such as `^:dynamic` and `defprotocol` were too slow, so I rolled my own by wrapping a Java class and using macros. I also had to replace Clojure's builtin map with my own implementation of a map for performance-critical code. The R-exprs themselves are implemented using `deftype`, which generates a Java class rather than using maps to hold the relevant data.
Not the author, but I have had the experience of writing a compiler, linker, and assembler for a little programmable ASIC.
We did it in CL, because we had only one month. The big advantage of a Lisp for a compiler is that you don't need to write a parser: if you accept Lisp syntax, you can reuse the reader.
For the linker, it helped to have everything in structs and list structures. Each compiled piece of code was a list containing the actual binary code, with information prepended such as the symbol name, pointers to the places to change for relocation, etc.
The dynamic typing played in our favor (sorry if I'm being pedantic, but Lisp is typed, even strongly typed, just dynamically). As a matter of fact, after we had done 90% of the work, we decided to change the type of some parts of the structures, and it was almost no work. Had it been a statically typed language, it would have been a MAJOR rewrite.
The fact that CL can also be statically typed helped with performance, as for some things we forced long ints.
I believe Lisps are commonly thought of as being good for writing compilers, and Clojure has more features than your average Lisp.

In certain respects, but the lack of static types can become a burden very fast. And compilers are a compelling use case for statically typed languages, given the frequency with which constructors appear.
I just finished a bit of spelunking in the Clojure repo of Dyna3, and I got the distinct feeling that I had just stumbled upon an alien artifact from the future.

https://github.com/argolab/dyna3

And it has a Python wrapper, https://github.com/argolab/dyna3/tree/master/dyna_python_mod....

Would this be useful in a production system today?

The focus of this work was a research project. IMO, a mature system would require several more person-years of work. However, there is nothing stopping you from using it in a production system if you find it useful (there are Python, Clojure, and Java APIs).
It's nice to discover my PhD research trending on Hacker News.
https://matthewfl.com/phd
https://matthewfl.com/papers/mfl-dissertation.pdf
https://github.com/argolab/dyna3
https://www.youtube.com/watch?v=sXRvba2yjY0