Accept only non-batched tensor in TextTokenDecoder #118

Merged: 2 commits, Oct 31, 2023

cbalioglu
Contributor

This PR fixes an inconsistency between TextTokenEncoder and TextTokenDecoder: the encoder accepts only a single string, whereas the decoder accepts only a batched tensor. TextTokenDecoder now accepts only a one-dimensional tensor holding the token indices of a single sentence, and it returns a string instead of a list of strings. Batched decoding had a second problem as well: the decoder did not handle padding. Batching is typically handled by the code that performs the decoding (e.g. SequenceToTextGenerator). The PR also adds a small convenience method to CString, strip, which internally calls ltrim and rtrim.
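To illustrate the division of labor described above, here is a minimal sketch of caller-side batching. ToyTokenDecoder, decode_batch, PAD_IDX and the toy vocabulary are hypothetical names, not fairseq2 APIs, and plain Python lists stand in for one-dimensional tensors so the sketch runs without PyTorch. The decoder handles exactly one sentence; the caller iterates over the batch and drops pad indices before decoding, roughly the kind of work a component such as SequenceToTextGenerator takes on.

```python
from typing import Dict, List, Sequence

# Hypothetical pad index and vocabulary, for illustration only.
PAD_IDX = 0
VOCAB: Dict[int, str] = {1: "Hello", 2: "world", 3: "!"}


class ToyTokenDecoder:
    """Hypothetical stand-in for a non-batched TextTokenDecoder."""

    def __call__(self, token_indices: Sequence[int]) -> str:
        # One sentence in, one string out; there is no batch dimension.
        return " ".join(VOCAB[idx] for idx in token_indices)


def decode_batch(
    decoder: ToyTokenDecoder,
    batch: Sequence[Sequence[int]],
    pad_idx: int = PAD_IDX,
) -> List[str]:
    """Caller-side batching: decode each row separately after removing padding."""
    sentences: List[str] = []
    for row in batch:
        # Drop pad indices so they are never passed to the decoder.
        tokens = [idx for idx in row if idx != pad_idx]
        sentences.append(decoder(tokens))
    return sentences


decoder = ToyTokenDecoder()
batch = [
    [1, 2, 3],
    [2, 3, 0],  # padded to the length of the longest sentence
]
print(decode_batch(decoder, batch))  # → ['Hello world !', 'world !']
```

With real tensors, the filtering step would typically be a boolean mask such as row[row != pad_idx]. Keeping it in the caller lets the decoder stay agnostic of padding and batch layout.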

facebook-github-bot added the CLA Signed label Oct 31, 2023
cbalioglu merged commit f266a2b into main Oct 31, 2023
13 checks passed
cbalioglu deleted the spm branch October 31, 2023 14:42