Clean up categorization I #301

Open
2 of 5 tasks
Irratzo opened this issue Jun 13, 2024 · 1 comment
Labels
category Add or update a category to the best-of list

Comments

@Irratzo
Member

Irratzo commented Jun 13, 2024

  • Add category
  • Update category:

Category details:

Currently, the distribution of projects among categories is very uneven.

    Active learning 4 projects
    Biomolecules 2 projects
    Community resources 21 projects
    Datasets 35 projects
    Data Structures 4 projects
    Density functional theory (ML-DFT) 25 projects
    Educational Resources 24 projects
    Explainable Artificial intelligence (XAI) 4 projects
    Electronic structure methods (ML-ESM) 3 projects
    General Tools 22 projects
    Generative Models 11 projects
    Interatomic Potentials (ML-IAP) 65 projects
    Language Models 17 projects
    Materials Discovery 9 projects
    Mathematical tools 11 projects
    Molecular Dynamics 10 projects
    Reinforcement Learning 2 projects
    Representation Engineering 23 projects
    Representation Learning 55 projects
    Unsupervised Learning 7 projects
    Visualization 2 projects
    Wavefunction methods (ML-WFT) 4 projects
    Others 2 projects

The list of categories is still the same as at the repository's inception (2022-06). But the distribution now shows that it no longer adequately categorizes the atomistic ML (AML) projects it features. Two examples: 1) The two largest categories, ML-IAP and Rep-Learn, have 65 and 55 projects, respectively, and overlap thematically. 2) Some of the smallest categories, like Active learning and XAI, have fewer than five projects and, moreover, have shown no significant growth for at least one year.

  • Dissolve all categories with fewer than ~5 projects AND no significant growth for >1 year, and re-sort the projects of each one either a) into the next-best-fitting, still existing category, or b) if no existing category fits, into the new category "Miscellaneous". Add their former category as a label wherever it is missing.
    • Candidate categories: TODO
    • Selected categories: TODO
    • Reasoning: TODO
    • Done in COMMIT: TODO
  • Merge ML-DFT, ML-ESM, ML-WFT into the new category "Machine learning of first-principles observables" (ML-FPO). Add their former category as a label wherever it is missing.
    • Done in COMMIT: TODO
  • Come up with a solution for the largest category set ML-IAP, MD, Rep-Learn. How could this be broken up into more digestible categories, so that the project distribution is more even?
    • Suggested new categories: TODO
    • Reasoning: TODO
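
As a starting point for the TODOs above, here is a minimal sketch that works only from the project counts listed in this issue. It is not part of the repository's tooling. The helper names (`small_categories`, `merge_categories`) and the strict `< 5` cutoff are assumptions made for illustration, and the second criterion (no significant growth for >1 year) cannot be checked from these counts alone, so the output is a list of candidates, not a final selection.

```python
# Project counts per category, copied from the list in this issue.
COUNTS = {
    "Active learning": 4,
    "Biomolecules": 2,
    "Community resources": 21,
    "Datasets": 35,
    "Data Structures": 4,
    "Density functional theory (ML-DFT)": 25,
    "Educational Resources": 24,
    "Explainable Artificial intelligence (XAI)": 4,
    "Electronic structure methods (ML-ESM)": 3,
    "General Tools": 22,
    "Generative Models": 11,
    "Interatomic Potentials (ML-IAP)": 65,
    "Language Models": 17,
    "Materials Discovery": 9,
    "Mathematical tools": 11,
    "Molecular Dynamics": 10,
    "Reinforcement Learning": 2,
    "Representation Engineering": 23,
    "Representation Learning": 55,
    "Unsupervised Learning": 7,
    "Visualization": 2,
    "Wavefunction methods (ML-WFT)": 4,
    "Others": 2,
}

# Assumption: "less than ~5 projects" is read as a strict "< 5".
SMALL_THRESHOLD = 5


def small_categories(counts, threshold=SMALL_THRESHOLD):
    """Return the categories below the size threshold, sorted by name.

    Growth over time is NOT considered here; it would have to be
    checked separately, e.g. against the repository history.
    """
    return sorted(name for name, n in counts.items() if n < threshold)


def merge_categories(counts, sources, target):
    """Return a new mapping with the `sources` categories merged into `target`."""
    merged = {name: n for name, n in counts.items() if name not in sources}
    merged[target] = merged.get(target, 0) + sum(counts[s] for s in sources)
    return merged


FPO_SOURCES = [
    "Density functional theory (ML-DFT)",
    "Electronic structure methods (ML-ESM)",
    "Wavefunction methods (ML-WFT)",
]

for name in small_categories(COUNTS):
    print(f"candidate: {name} ({COUNTS[name]} projects)")

after_merge = merge_categories(COUNTS, FPO_SOURCES, "ML-FPO")
print("ML-FPO after merge:", after_merge["ML-FPO"])  # 25 + 3 + 4 = 32
```

Note that ML-ESM and ML-WFT show up as small-category candidates here, but the proposed ML-FPO merge would already absorb them, so they would not need to go into "Miscellaneous".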

Additional context:

@Irratzo Irratzo added the category Add or update a category to the best-of list label Jun 13, 2024
@Irratzo
Member Author

Irratzo commented Aug 11, 2024

Decided to split this issue from #256 and postpone it to a subsequent regular update.
