Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consider making Grammar a hashmap #166

Open
Carlyle-Foster opened this issue Feb 5, 2025 · 6 comments
Open

consider making Grammar a hashmap #166

Carlyle-Foster opened this issue Feb 5, 2025 · 6 comments
Milestone

Comments

@Carlyle-Foster
Copy link
Contributor

Having Grammar be a simple list of productions is elegant enough, but it doesn't really model the structure of a BNF grammar properly since there's nothing in it's structure explicitly connecting the nonterminals on the right to the nonterminals on the left, making Grammar a hashmap<Term, Vec<Expression>> would allow users to traverse grammars out of the box and be a good middle-ground between the current flat grammars and a fully connected approach.
i'm still uncertain about this idea and it's relatively low priority for now, in the meantime any feedback would be appreciated.

@CrockAgile
Copy link
Collaborator

I think this is a fair point about Grammar. It will have to be a bit more complex than that, because there can be multiple Term -> [Expression]. But the need for this arises in a few places, including the EarleyParser. To still provide convenient iteration, BTreeMap will probably be desirable, and should be a non breaking change for users.

@CrockAgile CrockAgile added this to the 0.6 milestone Feb 9, 2025
@Carlyle-Foster
Copy link
Contributor Author

can't multiple Term -> [Expression]'s just be combined into one? it seems semantically equivalent

@CrockAgile
Copy link
Collaborator

I think yes it is semantically equivalent, but then in order to track which was used for parsing input, it is easier to use:

ProdId = 0, <start> ::= <a> <b>
ProdId = 1, <start> ::= <c> <d>

// record during parsing that ProdId(1) was used to match input text

compared to:

ProdId = 0, <start> ::= <a> <b> | <c> <d>

// record during parsing that ProdId(0) and Index(1) was used to match input text

maybe silly reason but it certainly helps me when reasoning about the parser.

also another maybe silly reason, but someday I hope to have an API that supports user provided constructors for parse nodes, which having only one Expression per Prod helps:

let my_parser = Grammer::new([
   ("<start> ::= 'a' 'b'", UserDefinedStart::from_a_and_b),
   ("<start> ::= 'c' 'd'", UserDefinedStart::from_c_and_d),
]);

// then use that parser
let parsed : UserDefinedStart = my_parser.parse("cd").next().unwrap();

@Carlyle-Foster
Copy link
Contributor Author

couldn't we just do what we're doing now, using the normalized version for parsing but not exposing it to the user? if they give us s = a / b and s = c we combine it into s = a / b / c but normalize it to s = a, s = b and s = c before parsing

@Carlyle-Foster
Copy link
Contributor Author

though know that i say that i realize it makes decomposing productions the same way u might compose them impossible

@Carlyle-Foster
Copy link
Contributor Author

not sure if that's a compelling enough use case though, u can still decompose productions easily enough so long as they're normalized in the first place

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants