Add a parser to easily create dialogues from plain text files #10
Comments
I considered that, but there are also great open-source text editors out there and for the visual novel course we're working on, I'd prefer using one of them. As much as I like text languages like Ren'Py's, they're not ideal to keep track of story branches and variables. |
I gave it some more thought. My problem's mostly that I don't have time to work on this right now. Would you write the complete tokenizer + parser? And do you have availability to work on it these days? If you can do that without breaking compatibility with the ScenePlayer, I'd be willing to compensate you for tackling this. If so, you can email me at nathan [at] gdquest dot com |
If I'm compensated, sure. I have sent you an email. Glad to be working with you! |
Here's what a script may look like that will be exactly parsed into the
What do you think? I'll have some working code by Friday. |
I'm on the go so can't detail right now but I see some things you can simplify here. Definitely first only one command per line. No need for "with" on the background command. Think of lines as one command at the start with optional positional arguments. For variables, I'd use For blocks, it may be simpler to use no colon, and an "end" keyword at the end, both for the user and you. No need to track indents. I'll think more about it. But in general, try to think in terms of strict rules for how the language works. Like most lines being in the form The exception being characters, where the |
Here's how I'd write the script you suggested Notable differences:
background industrial_building fade_in
mark jump_location
sophia happy enter left "Hi there! My name's Sophia. How about you?"
dani neutral enter right "Hey, I'm Dani."
choice:
"Start over":
jump jump_location
"Continue":
pass
"Jump ahead":
set jumped_ahead true
jump next_jump_point
sophia "Well, let's continue."
mark next_jump_point
if jumped_ahead:
background steppes
dani "Did you jump ahead?"
scene next_scene
"And thus started Sophia and Dani's journey."
transition fade_out
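The "one command at the start with optional positional arguments" rule can be sketched as follows. This is a minimal illustration in Python rather than the project's GDScript. The COMMANDS set and the split_line function are hypothetical names, with the commands picked out of the example script above. Blocks (choice, if) and narration lines are not handled here.

```python
import shlex

# Hypothetical command list, taken from the example script in this thread.
COMMANDS = {"background", "mark", "jump", "set", "scene", "transition", "pass"}


def split_line(line):
    """Split one script line into a leading word and its positional arguments."""
    # shlex.split keeps a double-quoted string together as a single word.
    words = shlex.split(line)
    if not words:
        return None
    head, arguments = words[0], words[1:]
    # Assumption: any leading word that is not a known command names a character.
    kind = "command" if head in COMMANDS else "character"
    return {"type": kind, "name": head, "arguments": arguments}


print(split_line("background industrial_building fade_in"))
print(split_line('dani neutral enter right "Hey, I\'m Dani."'))
```

Treating every unknown leading word as a character keeps the rule strict, but it also means a typo in a command name would be read as a character, so a real implementation would probably check against a list of known characters too.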
As far as writing a language, there are typically two tools you need to code:
Here's an example of a tokenizer for a command-line written in GDScript: https://github.com/winston-yallow/gd-tokenizer/blob/master/cmd/CommandParser.gd You shouldn't necessarily copy it, I think you can go with something simpler perhaps. For instance, instead of objects, you may use dictionaries, having each token be like:
# For a command
{ type = "command", arguments = [argument1, argument2, argument3] }
# For a choice, the content is all the lines inside the choice block, that you'll break down recursively into other tokens
{ type = "choice", content = "..." }
But you'll need something like that and then to translate it into an abstract syntax tree. The class that does that is called a parser. Think of it as a big dictionary that gives you a hierarchy of instructions that's a bit like the current scenes' data. The good news here is that our ScenePlayer is already an interpreter and the data format of scenes is what you could already target with your parser. Note that unlike in existing scene files, you don't need to use numbers for the nodes' Instead, you can find a naming scheme for the keys. Renpy seems to try to give each key in the abstract syntax tree a name. Finally, the parser should be decoupled from the tokenizer (make two scripts without dependencies) and work exclusively from the tokenizer's output. This makes extending, modifying, and fixing the language easier as we can edit either the tokenizer or the parser without breaking the other.
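To illustrate a parser that works only from the tokenizer's output, here is a rough Python sketch (the project itself is GDScript). It assumes the tokenizer has already broken a choice block's content down into a nested list of token dictionaries. The parse function, the "children" key, and the key naming scheme are all hypothetical, and the output is not the ScenePlayer's actual data format.

```python
def parse(tokens, prefix="node"):
    """Turn a list of token dictionaries into a nested dictionary with named keys."""
    tree = {}
    for index, token in enumerate(tokens):
        # Hypothetical naming scheme: parent key, position, then token type.
        key = f"{prefix}_{index}_{token['type']}"
        if token["type"] == "choice":
            # A block's content is itself a token list, so recurse into it.
            tree[key] = {"type": "choice", "children": parse(token["content"], key)}
        else:
            tree[key] = {
                "type": token["type"],
                "arguments": list(token.get("arguments", [])),
            }
    return tree


tokens = [
    {"type": "command", "arguments": ["background", "steppes"]},
    {"type": "choice", "content": [{"type": "command", "arguments": ["jump", "start"]}]},
]
print(parse(tokens))
```

Because parse only reads plain dictionaries, the tokenizer can be rewritten (character loop, regular expressions, String helpers) without touching it, as long as the token shape stays the same.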
One last tip to tokenize lines. You'll see most tokenizers out there iterate over characters, which is one approach. Another would be to use regular expressions, which might work for this mini-language. Depends if you're comfortable with regexp or not. If not, looping over characters may be easier. Note you can also use String methods to help you perhaps. Its functions to remove whitespace and split words could come in handy. |
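As a rough idea of the regular-expression approach, the Python sketch below splits a line into bare words and double-quoted strings, allowing escaped quotes inside a string. TOKEN_PATTERN and tokenize_line are hypothetical names, and this is only one possible pattern, not the tokenizer that ended up in the project.

```python
import re

# Either a double-quoted string (with backslash escapes) or a run of non-space characters.
TOKEN_PATTERN = re.compile(r'"((?:[^"\\]|\\.)*)"|(\S+)')


def tokenize_line(line):
    """Return a list of (kind, text) pairs for one script line."""
    words = []
    for match in TOKEN_PATTERN.finditer(line):
        quoted, bare = match.groups()
        if quoted is not None:
            # Only the \" escape is unescaped in this sketch.
            words.append(("string", quoted.replace('\\"', '"')))
        else:
            words.append(("symbol", bare))
    return words


print(tokenize_line('dani neutral enter right "Hey, I\'m Dani."'))
```

For lines without quoted text, plain string splitting on whitespace would give the same symbols with less machinery.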
Here's the rough draft of the tokenizer I promised: Right now the tokenizer can handle basic symbols, functions with arguments, and escape characters in string literals. I have not yet gotten to properly handling The code above, with the updated script you provided:
... will produce this token list: (the The Looking forward to your feedback. |
Don't hesitate to directly open a pull request with your work-in-progress, it'll be easier to check and review changes. Please use full words for function and variable names, i.e. no abbreviations like If you open a pull request, I can directly rename variables etc. for you. Also, you don't need to put so many comments. In particular, avoid comments that paraphrase the code, like
The comments above are saying exactly what the code says. I suppose they're for you. If so, it's fine, but I'd recommend working without comments, so you always force your mind to read the code directly instead of reading and thinking in English, focusing your attention on comments. You'll improve your programming skills faster that way. Regarding indents, I don't think they should be individual tokens. You can instead add an indent key to all tokens that stores a number, the indentation level. It'll make it easier to detect errors: for example, after starting a block, you expect the indent level to go up by one. In that case, you could directly compare the previous and next tokens' indentation value. It'll also help to detect the end of a block: it's when the indentation goes down by one. That'll simplify parsing, I think.
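The indent-key idea could look something like this Python sketch (again, not the project's GDScript). It assumes four spaces per indentation level and that a line ending with a colon opens a block. INDENT_WIDTH, tokenize_with_indent, and check_indentation are hypothetical names, and tabs are not handled.

```python
INDENT_WIDTH = 4  # Assumption: one indentation level is four spaces.


def tokenize_with_indent(source):
    """Store each non-empty line with its indentation level instead of indent tokens."""
    tokens = []
    for line in source.splitlines():
        if not line.strip():
            continue
        stripped = line.lstrip(" ")
        spaces = len(line) - len(stripped)
        tokens.append({"text": stripped.rstrip(), "indent": spaces // INDENT_WIDTH})
    return tokens


def check_indentation(tokens):
    """Compare each token's indent with the previous one to detect block errors."""
    errors = []
    for previous, current in zip(tokens, tokens[1:]):
        opens_block = previous["text"].endswith(":")
        step = current["indent"] - previous["indent"]
        if opens_block and step != 1:
            errors.append(f"expected an indented block after '{previous['text']}'")
        elif not opens_block and step > 0:
            errors.append(f"unexpected indent before '{current['text']}'")
    return errors


source = 'choice:\n    "Continue":\n        pass\nsophia "Well, let\'s continue."\n'
print(check_indentation(tokenize_with_indent(source)))
```

A drop in the indent value between two consecutive tokens marks the end of one or more blocks, so the parser can close blocks by comparing numbers rather than matching separate indent and dedent tokens.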
I usually try to keep the comments low, the over-explaining bug got me there. I've trimmed the comments down and changed the variable names in the new PR #11 What's in the PR:
With this script:
the current lexer code will produce this token list: What do you think? If you think the lexer is ready, I will start working on the actual parser. You can expect the parser to be done by next Monday at the latest.
Great work so far, I left you a note on the PR for a change I'd like to the syntax. Then you're good to move forward. |
Sorry for not giving any updates for the past few days. The parser is in progress but it hasn't reached a state where I'm comfortable having code reviews for it just yet. The parser code right now is more my attempt at sketching out the overall logic for the parser than actual production code; it is extremely messy (300+ lines now with some excessive classes) and the logic still fails in some places (boolean operators aren't handled yet, In short, it's a bit too early for you to review right now. However, I'll start the clean-up process soon and you can expect some production-quality code tomorrow.
Thanks much for the update, much appreciated! Don't be afraid to push work-in-progress code: you can mark your PR as draft. That is, if and when you want a second pair of eyes on it. Note I'll be away from tomorrow afternoon to Tuesday night so I may not review the code until then. |
I've made the draft PR: #12 The most notable thing in here is: Instead of having the parser directly produce a Right now the parser can produce an almost complete syntax tree (nested choices and ifs seem to be parsed correctly too); only AND and OR operators can't be parsed properly just yet. The transpiler right now can only handle dialogue lines and commands. With this script:
... the parser and transpiler will produce: What do you think? I do have some doubts about the transpiler actually complicating things more and that the parser should just target the The code is a bit messy (token types are still retrieved directly from the SceneLexer, stringly typed values, style issues, etc.) but by Monday, you can expect the parser and the transpiler, and the task of keeping track of player choices (I'll add some code in the ScenePlayer interpreter to save any declared variables to the save file) to be finished. |
I'll be away from this afternoon to Wednesday just so you know. The idea is just to produce a dictionary the ScenePlayer can read. So you don't have to produce a text file that represents a dictionary only to be loaded and converted back to a dictionary. You can directly produce the final data and remove the parts that convert everything to strings. That should simplify the work a bit. As to doing that transpile step... The code looks more complex than it needs to be at a glance. Considering this language's scope, perhaps you can simplify the process. An abstract syntax tree allows you to do compilation, analysis, and optimization, but here the goal is just to output data: a big dictionary the ScenePlayer will read. If possible, I'd directly produce the final dictionary. But without spending time playing with the code, I can't say whether it's easy to do or not, or whether it's the best solution. I'll let you see what you can do and review the code in greater detail on Wednesday. But at least, you don't have to produce a text file: you can directly create a dictionary. With your parser in place, we'll remove the scene files entirely from the project as we won't need them anymore.
I'll re-think the transpiler and try to have another go at having the parser directly produce the dictionary for the ScenePlayer. I do hope that the transpiler will not be needed. |
I've decided to go with a transpiler that turns the syntax tree from the parser directly to a dictionary for the ScenePlayer. The parser's mostly finished. You can write a script like this:
and have the game actually do what the script tells it to do. Any declared variables are stored in the save file. That's handled in the ScenePlayer. The ScenePlayer is mostly unchanged, most of the tweaks I made are so the checks won't go crazy on commands with optional parameters. I had the most trouble with choice, if and their code blocks, specifically when they are nested. The behaviour right now for nested code blocks is that they work if you don't end a code block with a I'll send you a PayPal invoice once you deem the code to be finished. |
I've pushed simple fixes to these today. Nested conditionals and choices now work properly. The
...and still have it work as expected. |
Right now the demo uses the JSON-like .scene files for the dialogue and scripting, which would be cumbersome to write new scenes with. We should build a proper parser first before tackling issue #5. An example script could be like this:
With a simple line-by-line loop, we can easily parse this into .scene files for the demo's dialogue system to read.