In your paper, you mentioned that the action scorer module produces two outputs: one for the action word (e.g. "go", "eat") and one for the object (e.g. "east", "apple"). I wonder how your architecture deals with illegal actions, such as in the following case:
Given a state s, the possible (legal) actions are:
a1: eat apple
a2: go east
However, the action scorer scores every action word ("go", "eat") and every object ("east", "apple") independently, which results in 4 possible actions:
a1: eat apple (legal action) --> score: 0.9
a2: go east (legal action) --> score: 0.08
a3: eat east (illegal action) --> score: 0.01
a4: go apple (illegal action) --> score: 0.01
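To make the count concrete, here is a minimal sketch of how two independent output heads lead to every verb-object pair being a candidate. The per-word scores below are hypothetical illustrative values, and combining a pair by averaging its verb and object scores is an assumption made for this sketch, not something stated in this issue.

```python
from itertools import product

# Hypothetical per-word scores from the two output heads (illustrative values only)
verb_scores = {"go": 0.3, "eat": 0.7}
object_scores = {"east": 0.4, "apple": 0.6}


def combine(verb_scores, object_scores):
    """Score every verb-object pair by averaging its two head outputs (assumed rule)."""
    return {
        f"{verb} {obj}": (v_score + o_score) / 2
        for (verb, v_score), (obj, o_score) in product(
            verb_scores.items(), object_scores.items()
        )
    }


candidates = combine(verb_scores, object_scores)
print(len(candidates))  # 4
```

With 2 verbs and 2 objects the product yields 2 × 2 = 4 candidates, including the illegal "eat east" and "go apple", which is exactly the situation described above.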
In such a scenario, how does your architecture deal with illegal actions? Do you simply restrict the lookup to the legal actions only?
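For illustration, the "look up only the legal actions" option could be sketched as masking: take the argmax over the candidate scores, but only among actions in the legal set. This assumes the game engine supplies the set of legal actions for each state; `greedy_legal_action` is a hypothetical helper name, and the scores are the ones listed in the example above.

```python
# Joint scores from the example in this question
scores = {
    "eat apple": 0.9,
    "go east": 0.08,
    "eat east": 0.01,
    "go apple": 0.01,
}
# Legal actions in state s (assumed to be provided by the game engine)
legal_actions = {"eat apple", "go east"}


def greedy_legal_action(scores, legal_actions):
    """Return the highest-scoring action among the legal ones, ignoring all others."""
    legal_scores = {a: q for a, q in scores.items() if a in legal_actions}
    if not legal_scores:
        raise ValueError("no legal action has a score")
    return max(legal_scores, key=legal_scores.get)


print(greedy_legal_action(scores, legal_actions))  # eat apple

# Even if an illegal pair scored highest, masking would still pick a legal action
noisy = dict(scores, **{"eat east": 0.95})
print(greedy_legal_action(noisy, legal_actions))  # eat apple
```

The masking step only matters when an illegal pair outranks every legal one; in the original scores "eat apple" already wins, so the question is really about cases like `noisy`.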