
Question about Project Status and Potential Contributions #1

Open
Hannibal046 opened this issue Jan 18, 2025 · 3 comments

@Hannibal046

Hi Team,

First, I want to express my appreciation for maintaining this repository and fla. I'm finding both projects very valuable.

I have several questions about the project:

  1. Project Status

    • Is this repository actively being developed?
    • Are you accepting external contributions?
  2. Development Direction

    • If external contributions are welcome, could you share your roadmap?
    • I'd be interested in contributing based on the project's goals.
  3. Technical Architecture
    From my understanding:

    • The project uses fla for model definition
    • Training is handled by torchtitan
    • Given fla's HuggingFace compatibility, it should work with lm-eval-harness for evaluation
      Could you confirm if this understanding is correct?
  4. Future Plans

    • Are there plans to extend into post-training scenarios?
    • If so, open-instruct could be a valuable reference point.

Looking forward to your response and potentially contributing to the project.

Best regards

@yzhangcs
Member

@Hannibal046 Hi, yes, your understanding is correct on all points.
This project is actively being developed, and we're continuously adding more features to flame. For example:

  1. While torchtitan only supports 4D parallelism for Llama, we aim to provide comprehensive support for all FLA models.
  2. We're implementing support for online data tokenization with shuffling, which is currently lacking in torchtitan.

Regarding post-training, I don't have extensive experience in this field yet. However, I'd be very glad if you could contribute in this area. I also plan to add support for post-training features in the future.
In short, flame is a framework tightly integrated with fla and transformers, with the ambition to scale to much larger training runs.

@Hannibal046
Author

Hannibal046 commented Jan 18, 2025

@yzhangcs
Hi, thanks for the quick reply!

If you're planning to implement support for online data tokenization with shuffling, I'd like to share an elegant implementation from Meta Lingua for your reference. Their approach:

  1. Pre-shuffles data;
  2. Accepts JSON Lines as input and performs online tokenization and reshuffling with a buffer;
  3. Easily controls the ratio of different data sources;
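Point 3 (controlling source ratios) can be sketched as weighted sampling over per-source iterators. This is a minimal illustration of the idea, not Lingua's actual implementation; the name `mixed_stream` and its signature are hypothetical:

```python
import random

def mixed_stream(sources, weights, seed=0):
    """Yield examples from several data sources in a target ratio.

    sources: dict mapping source name -> iterable of examples
    weights: dict mapping source name -> relative sampling weight
    Exhausted sources are dropped and sampling continues over the rest.
    """
    rng = random.Random(seed)
    names = list(sources)
    iters = {n: iter(sources[n]) for n in names}
    while names:
        # Pick a source in proportion to its weight.
        n = rng.choices(names, weights=[weights[x] for x in names])[0]
        try:
            yield next(iters[n])
        except StopIteration:
            names.remove(n)
```

For reproducibility across runs, the seed (and in a distributed setting, the rank) would need to be threaded through checkpointing as well.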

I'm not sure which specific features you need to implement, but relying solely on step 2 (online tokenization and reshuffling with a buffer) might not be sufficient for large-scale training: some Hugging Face datasets are chronologically ordered, so even with a large online buffer the sampled data would still be biased toward the stream's original order.
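To make the limitation concrete, here is a sketch of the standard fixed-size shuffle buffer (as used by e.g. `datasets`' streaming shuffle). Each yielded example is drawn only from a sliding window of the most recent `buffer_size` items, which is why pre-shuffling matters for chronologically ordered data; the function name is illustrative:

```python
import random

def shuffled_stream(examples, buffer_size=4096, seed=0):
    """Approximately shuffle a stream with a fixed-size buffer.

    Fills the buffer, then repeatedly yields a random buffered element
    and replaces it with the next incoming example. If the input stream
    is chronologically ordered, each output still comes from a window of
    at most `buffer_size` recent examples, so global order bias remains.
    """
    rng = random.Random(seed)
    buffer = []
    for ex in examples:
        if len(buffer) < buffer_size:
            buffer.append(ex)
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = ex
    # Flush what is left once the stream ends.
    rng.shuffle(buffer)
    yield from buffer
```

Combining a pre-shuffle pass over the raw files with this online buffer (Lingua's points 1 and 2) avoids the windowing bias while keeping tokenization fully streaming.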

I'm happy to help if you need any assistance!

@yzhangcs
Member

Thank you! I will be taking a look at it.
