Flex/Bison: lexing ambiguous tokens
I'm dealing with a tricky problem in my Flex/Bison lexer/parser.
Here are my Flex rules for Roman numerals and arbitrary identifiers:

"I"|"II"|"III"|"IV"|"V"|"VI"|"VII"|"i"|"ii"|"iii"|"iv"|"v"|"vi"|"vii"   { return NUMERAL; }
"foobar"                                                                  { return FOOBAR; }
[a-zA-Z0-9_]+                                                             { return IDENTIFIER; }

Now, consider this simple grammar:

%token <numeral> NUMERAL
%token <foobar> FOOBAR
%token <identifier> IDENTIFIER

%%

program : NUMERAL FOOBAR { }
        ;

Finally, here is an example input:

ivfoobar

I intend to lex this as the numeral "iv" followed by "foobar". However, how can I prevent it from being lexed as the numeral "i" followed by the identifier "vfoobar", or as the single identifier "ivfoobar", both of which are invalid?
If you want to handle this at the lexer level, you have to make sure that the identifier rule doesn't match strings starting with a Roman numeral (I, II, ..., VII, ...).
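That's because lex selects the rule that matches the longest input (and, among rules matching the same length, the one listed first). With the rules above, [a-zA-Z0-9_]+ matches all eight characters of "ivfoobar", so it wins over the two-character numeral "iv".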
Would excluding the Roman numeral letters from the first character of an identifier still leave you with a satisfying set of valid identifiers? Something like:

(?i:[a-z0-9_]{-}[ivxlcdm])(?i:[a-z0-9_])*   { return IDENTIFIER; }