consider making Grammar a hashmap #166

Carlyle-Foster · 2025-02-05T09:39:27Z

Having Grammar be a simple list of productions is elegant enough, but it doesn't really model the structure of a BNF grammar properly since there's nothing in it's structure explicitly connecting the nonterminals on the right to the nonterminals on the left, making Grammar a hashmap<Term, Vec<Expression>> would allow users to traverse grammars out of the box and be a good middle-ground between the current flat grammars and a fully connected approach.
i'm still uncertain about this idea and it's relatively low priority for now, in the meantime any feedback would be appreciated.

The text was updated successfully, but these errors were encountered:

CrockAgile · 2025-02-09T20:54:07Z

I think this is a fair point about Grammar. It will have to be a bit more complex than that, because there can be multiple Term -> [Expression]. But the need for this arises in a few places, including the EarleyParser. To still provide convenient iteration, BTreeMap will probably be desirable, and should be a non breaking change for users.

Carlyle-Foster · 2025-02-09T21:56:43Z

can't multiple Term -> [Expression]'s just be combined into one? it seems semantically equivalent

CrockAgile · 2025-02-09T22:20:06Z

I think yes it is semantically equivalent, but then in order to track which was used for parsing input, it is easier to use:

ProdId = 0, <start> ::= <a> <b>
ProdId = 1, <start> ::= <c> <d>

// record during parsing that ProdId(1) was used to match input text

compared to:

ProdId = 0, <start> ::= <a> <b> | <c> <d>

// record during parsing that ProdId(0) and Index(1) was used to match input text

maybe silly reason but it certainly helps me when reasoning about the parser.

also another maybe silly reason, but someday I hope to have an API that supports user provided constructors for parse nodes, which having only one Expression per Prod helps:

let my_parser = Grammer::new([
   ("<start> ::= 'a' 'b'", UserDefinedStart::from_a_and_b),
   ("<start> ::= 'c' 'd'", UserDefinedStart::from_c_and_d),
]);

// then use that parser
let parsed : UserDefinedStart = my_parser.parse("cd").next().unwrap();

Carlyle-Foster · 2025-02-09T22:31:15Z

couldn't we just do what we're doing now, using the normalized version for parsing but not exposing it to the user? if they give us s = a / b and s = c we combine it into s = a / b / c but normalize it to s = a, s = b and s = c before parsing

Carlyle-Foster · 2025-02-09T22:32:39Z

though know that i say that i realize it makes decomposing productions the same way u might compose them impossible

Carlyle-Foster · 2025-02-09T22:34:56Z

not sure if that's a compelling enough use case though, u can still decompose productions easily enough so long as they're normalized in the first place

CrockAgile added this to the 0.6 milestone Feb 9, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

consider making Grammar a hashmap #166

consider making Grammar a hashmap #166

Carlyle-Foster commented Feb 5, 2025

CrockAgile commented Feb 9, 2025

Carlyle-Foster commented Feb 9, 2025

CrockAgile commented Feb 9, 2025

Carlyle-Foster commented Feb 9, 2025

Carlyle-Foster commented Feb 9, 2025

Carlyle-Foster commented Feb 9, 2025

consider making Grammar a hashmap #166

consider making Grammar a hashmap #166

Comments

Carlyle-Foster commented Feb 5, 2025

CrockAgile commented Feb 9, 2025

Carlyle-Foster commented Feb 9, 2025

CrockAgile commented Feb 9, 2025

Carlyle-Foster commented Feb 9, 2025

Carlyle-Foster commented Feb 9, 2025

Carlyle-Foster commented Feb 9, 2025