parsing - Flex/Bison: lexing ambiguous tokens


I'm dealing with a tricky problem in a Flex/Bison lexer/parser.

Here are my flex rules, for Roman numerals and arbitrary identifiers:

"I"|"II"|"III"|"IV"|"V"|"VI"|"VII"|"i"|"ii"|"iii"|"iv"|"v"|"vi"|"vii" { return numeral; }
"foobar" { return foobar; }
[a-zA-Z0-9_]+ { return identifier; }

Now, consider this simple grammar:

%token <numeral> numeral
%token <foobar> foobar
%token <identifier> identifier

program
    : numeral foobar { }
    ;

Finally, here is an example input:

ivfoobar 

I intend this to lex as the numeral iv, followed by foobar. However, how can I prevent it from lexing as a numeral followed by the identifier "vfoobar", or as the single identifier "ivfoobar", both of which are invalid?

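If you want to handle this at the lexer level, you have to make sure the identifier rule doesn't match strings that start with a Roman numeral (i, ii, ... vii ...).

That's because lex selects the rule that matches the longest input; when two rules match the same length, the one listed first wins.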
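To see why `ivfoobar` comes out as one identifier, here is a minimal C sketch that imitates the longest-match rule for the three patterns in the question. It is not flex itself, only a hand-written stand-in, and the names (`next_token`, `TOK_NUMERAL`, and so on) are made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds mirroring the three rules, in file order. */
enum token { TOK_NONE, TOK_NUMERAL, TOK_FOOBAR, TOK_IDENTIFIER };

/* Length of the longest numeral alternative at the start of s, or 0. */
static size_t match_numeral(const char *s)
{
    static const char *const numerals[] = {
        "I", "II", "III", "IV", "V", "VI", "VII",
        "i", "ii", "iii", "iv", "v", "vi", "vii"
    };
    size_t best = 0;
    for (size_t k = 0; k < sizeof numerals / sizeof numerals[0]; k++) {
        size_t n = strlen(numerals[k]);
        if (strncmp(s, numerals[k], n) == 0 && n > best)
            best = n;
    }
    return best;
}

/* Length of the literal "foobar" at the start of s, or 0. */
static size_t match_foobar(const char *s)
{
    return strncmp(s, "foobar", 6) == 0 ? 6 : 0;
}

/* Length of the [a-zA-Z0-9_]+ match at the start of s, or 0. */
static size_t match_identifier(const char *s)
{
    size_t n = 0;
    while (isalnum((unsigned char)s[n]) || s[n] == '_')
        n++;
    return n;
}

/* Pick the rule with the longest match; on a tie the earlier rule wins. */
enum token next_token(const char *s, size_t *len)
{
    assert(s != NULL && len != NULL);
    size_t lens[3] = { match_numeral(s), match_foobar(s), match_identifier(s) };
    static const enum token kinds[3] = { TOK_NUMERAL, TOK_FOOBAR, TOK_IDENTIFIER };
    enum token best = TOK_NONE;
    *len = 0;
    for (size_t k = 0; k < 3; k++) {
        if (lens[k] > *len) {   /* strict '>' keeps the earlier rule on ties */
            *len = lens[k];
            best = kinds[k];
        }
    }
    return best;
}
```

With this, `next_token("ivfoobar", &n)` yields `TOK_IDENTIFIER` with `n == 8`: the identifier pattern consumes all eight characters, while the numeral pattern only consumes two. On the bare input `iv` both patterns match two characters, and the numeral rule wins because it is listed first.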

Maybe excluding the Roman numeral letters from the first character of an identifier would still leave a satisfactory set of valid identifiers?

(?i:[a-z0-9_]{-}[ivxlcdm])(?i:[a-z0-9_])* { return identifier; }
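A rough way to check the effect of that rule without running flex is to imitate it in C. The sketch below assumes the same numeral and foobar rules as in the question, plus an identifier rule whose first character may not be one of the letters i, v, x, l, c, d, m in either case. The helper names (`tokenize`, `TOK_*`) are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds mirroring the three rules, in file order. */
enum token { TOK_NONE, TOK_NUMERAL, TOK_FOOBAR, TOK_IDENTIFIER };

/* Length of the longest numeral alternative at the start of s, or 0. */
static size_t match_numeral(const char *s)
{
    static const char *const numerals[] = {
        "I", "II", "III", "IV", "V", "VI", "VII",
        "i", "ii", "iii", "iv", "v", "vi", "vii"
    };
    size_t best = 0;
    for (size_t k = 0; k < sizeof numerals / sizeof numerals[0]; k++) {
        size_t n = strlen(numerals[k]);
        if (strncmp(s, numerals[k], n) == 0 && n > best)
            best = n;
    }
    return best;
}

/* Length of the literal "foobar" at the start of s, or 0. */
static size_t match_foobar(const char *s)
{
    return strncmp(s, "foobar", 6) == 0 ? 6 : 0;
}

static int is_ident_char(unsigned char c)
{
    return isalnum(c) || c == '_';
}

static int is_roman_letter(unsigned char c)
{
    return c != '\0' && strchr("ivxlcdm", tolower(c)) != NULL;
}

/* Identifier whose first character is not a Roman numeral letter. */
static size_t match_identifier(const char *s)
{
    unsigned char first = (unsigned char)s[0];
    if (!is_ident_char(first) || is_roman_letter(first))
        return 0;
    size_t n = 1;
    while (is_ident_char((unsigned char)s[n]))
        n++;
    return n;
}

/* Split s into tokens by longest match (earlier rule wins ties).
   Stops at the first position no rule matches; returns the token count. */
size_t tokenize(const char *s, enum token *out, size_t max)
{
    assert(s != NULL && out != NULL);
    static const enum token kinds[3] = { TOK_NUMERAL, TOK_FOOBAR, TOK_IDENTIFIER };
    size_t count = 0;
    while (*s != '\0' && count < max) {
        size_t lens[3] = { match_numeral(s), match_foobar(s), match_identifier(s) };
        size_t best = 0;
        enum token kind = TOK_NONE;
        for (size_t k = 0; k < 3; k++) {
            if (lens[k] > best) {
                best = lens[k];
                kind = kinds[k];
            }
        }
        if (kind == TOK_NONE)
            break;
        out[count++] = kind;
        s += best;
    }
    return count;
}
```

Under these assumptions `ivfoobar` comes out as a numeral followed by foobar. The cost shows up as well: `index` splits into the numeral `i` followed by the identifier `ndex`, and `data` matches no rule at all, so whether the remaining set of identifiers is acceptable depends on the language being parsed.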
