Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add static typechecker #26

Merged
merged 145 commits into from
Feb 24, 2024
Merged

Add static typechecker #26

merged 145 commits into from
Feb 24, 2024

Conversation

ruuda
Copy link
Owner

@ruuda ruuda commented Jan 3, 2024

Description

This pull request grew waaaay too big. I should have broken it up earlier, but I failed to do so and it became hard to break up because everything is entangled.

Anyway, this adds a static typechecker. The final diff for this pull request is my third attempt at it. I started out pushing down expectations and inferring bottom-up, but it was messy, and I got stuck on functions. Then I switched to having different Rust types for expected types and inferred types, and that worked very well, I implemented it almost all the way — until I got to function arguments, where the expectation and actual type are reversed due to function types being contravariant in the arguments. So that entire approach was a dead end, back to having one Type type it was. But it helped to clarify the type lattice, and the handling of the subtype check (return a Defer if we want a runtime check). It also inspired tracking sources of types, which enables clarifications in error messages.

To do

  • Check goldens named runtime_ for whether they should be renamed to typecheck_.
  • Bring back context about why a particular type is expected (e.g. in conditions).
  • Verify that the fuzzer finds interesting code paths.
  • Ensure test coverage.
  • Include definition site and argument name data in function types, for better error messages.
  • Grep for introduced TODOs and see if they need to be resolved.
  • Update Pygments lexer to accept types. → c9fc556
  • Confirm that the Bison grammar is up to date. → c9fc556
  • Typecheck arguments of all function calls. → Follow-up PR instead.
  • Remove type checks from stdlib interior. → Follow-up PR instead.

@ruuda ruuda mentioned this pull request Jan 28, 2024
5 tasks
@ruuda ruuda force-pushed the types branch 2 times, most recently from e663d6e to df73db2 Compare January 30, 2024 20:35
Base automatically changed from types to master January 30, 2024 21:15
@ruuda ruuda force-pushed the types-static branch 6 times, most recently from b19c00a to 8c5d83e Compare February 5, 2024 18:19
@ruuda ruuda mentioned this pull request Feb 10, 2024
@ruuda ruuda force-pushed the types-static branch 3 times, most recently from dcfacbb to 327da58 Compare February 11, 2024 14:32
@ruuda
Copy link
Owner Author

ruuda commented Feb 11, 2024

Some learnings from the past two days: I wasn’t so happy with the messy and verbose ways the typechecker was implemented, having to check for Dynamic everywhere as a special case, etc. Then I added TypeReq to split requirements and inferred types, and also attach reasons for those requirements. It was a lot more elegant until I got to function types, the arguments are contravariant and suddenly everything breaks down. But one good thing that came out of this was the TypeDiff type as the result of checking. Now I'm back to having a single Type type, but with the subtype check returning TypeDiff, and the case where we have to defer a typecheck to runtime handled like with TypeReq, so it’s still a big step forward. Also I realized that the subtyping relation is not always conclusive. The cases where it's not are exactly the cases where we need to insert a runtime check.

@ruuda ruuda force-pushed the types-static branch 5 times, most recently from 5a6bd84 to 96e8952 Compare February 18, 2024 23:02
@ruuda ruuda force-pushed the types-static branch 3 times, most recently from 7144717 to 8854d06 Compare February 23, 2024 22:35
@ruuda ruuda marked this pull request as ready for review February 23, 2024 22:53
ruuda added 22 commits February 24, 2024 21:54
I thought of this case after writing it yesterday. Especiall an arity
error can occur, and by default we don't add a call frame, so then
replacing may be dangerous if there was a different frame on the stack
already. Though now that I'm writing this, I don't think that case can
happen.
This is only moving things around (and altering some doc comments), no
functional changes.
The previous cases that covered these are now caught statically by the
typechecker, but we can sidestep that with an Any annotation.
Now that the typechecker can determine whether something is a dict or
set (or list) literal, we can use that information in the evaluator.
The complexity moved to the typechecker already, so now we reap the
benefits and simplify the evaluator.

In fact there is another opportunity to simplify the evaluator further
by eliminating the SeqOut type entirely, but I'll do that in a follow-up
commit. The SeqOut served its purpose though, because the typechecker is
based on it.
Now that the evaluator knows what kind of collection to expect, because
of the typechecker, we no longer need to carry this runtime SeqOut thing
around and match on it all the time. We can work with closures instead!
This has more advantages:

 * We don't build up vecs of keys and values when we build a dict, we
   can build the dict directly.
 * Same for sets.
 * Maybe LLVM will even inline stuff?
 * If I want to do the unique dicts that deny duplicate keys, that is
   now much easier too. Just return an error from on_assoc.

But performance aside (which I really don't know if it matters, I
haven't measured), this mainly simplifies the logic, even though the
length of the code is about the same.
Some code in the evaluator where I previously had a to do because of
typechecks is now resolved!
We will likely have at least as many elements as syntactic elements.
(This is not always the case, you could have "if", but that would be
silly, I don't expect it to be common.) You can have many more with
loops, but for large (or even small) lists of static size, this could
help to avoid a few reallocations and not waste so much excess. (But
then the growth could create more excess ...)
Still not supported. Maybe just implementing it is less work ...
Some of these are now no longer runtime errors but static errors.
I think I accidentally moved it there when refactoring the type modules.
Also include a shorter version of the type name in messages now.
No, I don't want to admit "_" as a type name. When we still had
"Dynamic", then yes that was too verbose. But "Any" is fine, we don't
need "_".

There should be one -- and preferably only one -- obvious way to do it.
That way was not obvious to me at first though even though I'm Dutch.
@ruuda ruuda merged commit ba82244 into master Feb 24, 2024
15 checks passed
@ruuda ruuda deleted the types-static branch February 24, 2024 22:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant