Store unescaped string in AST node #65

Open: 2 commits to merge into master

wfdewith

This PR does the following things:

  • Change the s field in astnodes.String to bytes, since Lua strings are arbitrary byte sequences and can contain invalid UTF-8, while a Python 3 str is a sequence of Unicode code points rather than raw bytes.
  • Use a custom unescape function instead of ast.literal_eval, which gives more flexibility in interpreting Lua escape sequences correctly. For example, the \u{<hex>} and \<digit> escape codes have different semantics in Lua than in Python (\<digits> is decimal in Lua but octal in Python, and Python's \u takes exactly four hex digits without braces).
  • Update the lexer grammar so that only valid escape codes are accepted.
  • Add a raw field to astnodes.String that stores the original string literal. The Lua output visitor uses it to reproduce the literal exactly as it was written. This fixes some bugs in the existing output code; for example, "\"" became """ after a parse -> print round-trip, which is not valid Lua.
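
To illustrate why ast.literal_eval is not a good fit for Lua escapes, here is a minimal sketch of an unescape function that returns bytes. It is not the code from this PR: the name `lua_unescape`, the regex, and the error messages are assumptions made for illustration, and it only covers the escape sequences documented for Lua 5.3/5.4 quoted strings, with code points limited to what Python's `chr` accepts.

```python
import re

# Single-character escapes from the Lua 5.3/5.4 reference manual.
# A backslash followed by a literal line break yields a newline.
_SIMPLE = {
    "a": b"\a", "b": b"\b", "f": b"\f", "n": b"\n", "r": b"\r",
    "t": b"\t", "v": b"\v", "\\": b"\\", '"': b'"', "'": b"'",
    "\n": b"\n",
}

_ESCAPE = re.compile(
    r"\\(?:"
    r"(?P<dec>[0-9]{1,3})"             # \ddd   decimal byte value (not octal)
    r"|x(?P<hex>[0-9a-fA-F]{2})"       # \xXX   exactly two hex digits
    r"|u\{(?P<uni>[0-9a-fA-F]+)\}"     # \u{X}  code point, emitted as UTF-8
    r"|z(?P<skip>[ \t\n\r\f\v]*)"      # \z     skips the following whitespace
    r"|(?P<simple>.)"                  # any other single-character escape
    r")",
    re.DOTALL,
)


def lua_unescape(body: str) -> bytes:
    """Unescape the body of a quoted Lua string literal (quotes removed)."""
    out = bytearray()
    pos = 0
    for m in _ESCAPE.finditer(body):
        # Copy the unescaped text between the previous escape and this one.
        out += body[pos:m.start()].encode("utf-8")
        pos = m.end()
        if m.group("dec") is not None:
            value = int(m.group("dec"), 10)
            if value > 255:
                raise ValueError("decimal escape too large: " + m.group(0))
            out.append(value)
        elif m.group("hex") is not None:
            out.append(int(m.group("hex"), 16))
        elif m.group("uni") is not None:
            # chr() rejects values above 0x10FFFF; surrogates are passed through.
            out += chr(int(m.group("uni"), 16)).encode("utf-8", "surrogatepass")
        elif m.group("skip") is not None:
            pass  # \z and the whitespace after it produce no output
        else:
            ch = m.group("simple")
            if ch not in _SIMPLE:
                raise ValueError("invalid escape sequence: " + m.group(0))
            out += _SIMPLE[ch]
    out += body[pos:].encode("utf-8")
    return bytes(out)


print(lua_unescape(r"\65"))       # Lua reads \65 as decimal 65
print(lua_unescape(r"\u{20AC}"))  # UTF-8 encoding of U+20AC
print(lua_unescape(r"\xff"))      # a lone 0xFF byte, not valid UTF-8
```

The `\xff` case shows why the result has to be bytes: the value is a legal Lua string but cannot be decoded as UTF-8.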

Note that changing s to bytes is a breaking change.
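
For callers affected by the breaking change, the following sketch shows one way to get text back out of a bytes-valued s field. The `String` dataclass here is a hypothetical stand-in, not the real astnodes.String, and the helper name `text_of` and the type of `raw` are assumptions.

```python
from dataclasses import dataclass


@dataclass
class String:
    """Hypothetical stand-in for astnodes.String after this change."""
    s: bytes  # unescaped contents, possibly not valid UTF-8
    raw: str  # original literal as written in the Lua source (assumed to be str)


def text_of(node: String, errors: str = "replace") -> str:
    """Decode the node contents for display; invalid bytes are replaced by default."""
    return node.s.decode("utf-8", errors=errors)


node = String(s=b"caf\xc3\xa9 \xff", raw='"caf\\195\\169 \\255"')
print(text_of(node))  # the trailing 0xFF byte is shown as U+FFFD
```

Code that needs the exact contents should keep working with `node.s` as bytes and decode only at the point where text is required.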
