parsing - Flex/Bison: lexing ambiguous tokens

March 15, 2015

I'm dealing with a tricky problem in my Flex/Bison lexer/parser.

Here are my flex rules for Roman numerals and arbitrary identifiers:

```
"i"|"ii"|"iii"|"iv"|"v"|"vi"|"vii"|"I"|"II"|"III"|"IV"|"V"|"VI"|"VII" { return numeral; }
"foobar"                                                          { return foobar; }
[a-zA-Z0-9_]+                                                     { return identifier; }
```

Now, consider this simple grammar:

```
%token <numeral> numeral
%token <foobar> foobar
%token <identifier> identifier

%%

program
    : numeral foobar { }
    ;
```

Finally, here is the example input:

ivfoobar

I intend to lex this as the numeral "iv" followed by "foobar". However, how can I prevent it from being lexed as the numeral "i" followed by the identifier "vfoobar", or as the identifier "ivfoobar", both of which are invalid?

If you want to handle this at the lexer level, you have to make sure the rule for identifiers doesn't match strings that start with a Roman numeral (i, ii, ..., vii, ...).

That's because lex selects the rule that matches the longest input (and, when two rules match the same length, the one listed first).

Maybe excluding the Roman numeral letters as the first character of an identifier would still leave you a satisfying set of valid identifiers?

```
[a-zA-Z0-9_]{-}[ivxlcdmIVXLCDM][a-zA-Z0-9_]* { return identifier; }
```
