Refactor TextGeneration class to support development use cases (quic#165)
* Refactor TextGeneration class to support development use cases

Development use cases require:
- Support for serving successive requests in the same session. In the current class structure, initializing a TextGeneration object is coupled to a specific combination of prompt and generation length. Every request therefore pays the overhead of creating a QAICInferenceSession, loading the QPC, and extracting its characteristics.
- Support for yielding tokens as they are generated. The current call flow uses TextStreamer to print generated tokens to the console. An API that yields tokens as they are decoded is a cleaner solution for development purposes. Adopting this approach for the high-level APIs within QEfficient would be overkill.

Changes:
- Move components of the TextGeneration class into a base class that primarily handles loading a QAICInferenceSession, along with low-level methods that fetch and use information from the Session object.
- Code maintenance to indicate the scope of variables and methods in the base class.
- Add a setup method to the TextGeneration class that resets storage variables for each new request.
- Add an API that yields decoded tokens as they are generated.

Signed-off-by: quic-suppugun <[email protected]>

* Revert reordering of methods

Signed-off-by: quic-suppugun <[email protected]>

* Update docstrings, clean up TextGeneration class

Signed-off-by: quic-suppugun <[email protected]>

* Format and lint

Signed-off-by: quic-suppugun <[email protected]>

* Added test module and minor fix

Signed-off-by: Rishin Raj <[email protected]>

* Format

Signed-off-by: Rishin Raj <[email protected]>

* Device ID fix

Signed-off-by: Rishin Raj <[email protected]>

---------

Signed-off-by: quic-suppugun <[email protected]>
Signed-off-by: Rishin Raj <[email protected]>
Co-authored-by: Rishin Raj <[email protected]>
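The commit message describes three ideas: a base class that loads the inference session once, a setup step that resets per-request state, and a generator API that yields tokens as they are decoded. The following is a minimal, self-contained sketch of that shape. It is not the actual QEfficient code. `FakeSession` is a hypothetical stand-in for `QAICInferenceSession` with a toy decode rule. The names `TextGenerationBase`, `setup`, and `generate_stream`, and all of their signatures, are assumptions made for illustration.

```python
from typing import Iterator, List


class FakeSession:
    """Hypothetical stand-in for QAICInferenceSession (loading it is the costly step)."""

    load_count = 0  # counts how many times a "QPC" was loaded

    def __init__(self, qpc_path: str) -> None:
        FakeSession.load_count += 1
        self.qpc_path = qpc_path
        self.vocab = ["<eos>", "hello", "world", "!"]

    def run(self, token_id: int) -> int:
        # Toy decode rule: next token is token_id + 1, wrapping to 0 (<eos>).
        return (token_id + 1) % len(self.vocab)


class TextGenerationBase:
    """Hypothetical base class: owns the session and low-level helpers."""

    def __init__(self, qpc_path: str) -> None:
        self._session = FakeSession(qpc_path)  # loaded once, reused per request

    def _decode_step(self, token_id: int) -> int:
        return self._session.run(token_id)

    def _detokenize(self, token_id: int) -> str:
        return self._session.vocab[token_id]


class TextGeneration(TextGenerationBase):
    """Serves successive requests on one session; per-request state is reset in setup()."""

    def __init__(self, qpc_path: str) -> None:
        super().__init__(qpc_path)
        self.generated_ids: List[int] = []

    def setup(self) -> None:
        # Reset storage variables so a new request does not see old results.
        self.generated_ids = []

    def generate_stream(self, prompt_id: int, generation_len: int) -> Iterator[str]:
        # Yield each decoded token as soon as it is produced.
        self.setup()
        token = prompt_id
        for _ in range(generation_len):
            token = self._decode_step(token)
            if token == 0:  # stop at <eos>
                break
            self.generated_ids.append(token)
            yield self._detokenize(token)


# Usage: two requests share one loaded session.
gen = TextGeneration("path/to/qpc")
first = list(gen.generate_stream(prompt_id=0, generation_len=5))
second = list(gen.generate_stream(prompt_id=1, generation_len=1))
```

In this sketch, the second request reuses the session loaded for the first one, and `setup()` clears `generated_ids` between requests. A caller can iterate `generate_stream` directly to consume tokens incrementally rather than printing them through a streamer.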