neo-chem/org
Organizational planning

This repository is a place to plan and solidify ideas about data management in chemistry and related fields, primarily following discussions that occurred at the Chemical Science Symposium 2020: "How can machine learning and autonomy accelerate chemistry?".

It was suggested that the community should try to centralise efforts on a common core schema for data that focusses on interoperability and openness. Mechanisms for other parties to extend, re-use and adapt these schemas can then be codified.

The aim of this repository (and the GitHub organisation more broadly) is to provide a version-controlled "scratchpad" for discussion, ideas and organisation. All content and names are placeholders until otherwise decided.

There is a Gitter chatroom associated with this repository that can be used for informal discussions (and potentially more in the future). There is also a Slack workspace for those interested in contributing; please request an invite on Gitter if you wish to join.

To-do

I'll try to summarise the to-do list that we discussed:

  • If this GitHub repository is not the right place, set up a Slack workspace or an alternative.
    • A Slack workspace and a Gitter chatroom have been created.
    • One option may be to request a forum on the new matsci.org platform.
  • Collect interested parties, via email and so on, and establish a focal point for discussion.
    • Discussions are now ongoing on Slack and Etherpad.
  • Find out who has the time to contribute, and in what ways.
  • Potentially prepare a perspective paper as a "call-to-arms" for improved data standards.

Contributing

All ideas and suggested changes are welcome; please submit a pull request!

Code of Conduct

This will need to be decided on by the community. As a placeholder, we adopt the Code of Conduct from the Contributor Covenant, found in CODE_OF_CONDUCT.md.

Related resources

This list is a scattershot of related projects that were mentioned in discussions. For a more complete list (under construction), please see neo-chem/awesome-chemical-data.

Software and existing projects

  • Comparison grid of many ELNs: produced by Harvard Biomedical Data Management.
  • ESCALATE: A fully-featured data platform for experiment specification, comprehension and data management.
  • RightField: Semantically-tagged spreadsheets, a potential data entry solution/approach with a low barrier to entry.
  • NMReData initiative: A FAIR data format for NMR experiments. CHEMeDATA tries to do the same for all of chemistry.
  • SpectroscopyHub: A standardisation initiative/data platform for XPS experiments.
  • OPTIMADE: An open API specification for materials databases (recently released).
  • The bluesky project: A collection of Python libraries for experiment and data control.
  • Blue Obelisk on GitHub.
  • cheminfo: a community on GitHub and their ELN (with an option to export data to Zenodo; the data follow a well-defined JSON structure). One instance is hosted on C6H6.
  • LabTrove: chemistry ELN/"Smart Research Framework" (potentially defunct)
  • Chemotion: ELN focussed on organic chemistry.
  • Chemical Analysis Metadata Platform: Defines metadata and an ontology for chemical analysis.
  • Open Reaction Database: Connor Coley, Abby Doyle, Pfizer, Merck and Google are working on a Protocol Buffers (protobuf) schema for a chemical reactions database.
  • Autoprotocol: language for specifying experimental protocols for scientific research.

Papers

A collection of papers to motivate discussion.

  • Kanza, S. et al. Too Many Tags Spoil the Metadata: Investigating the Knowledge Management of Scientific Research with Semantic Web Technologies. Journal of Cheminformatics 2019, 11, 23. https://doi.org/10.1186/s13321-019-0345-8.
  • Kanza, S. What Influence Would a Cloud Based Semantic Laboratory Notebook Have on the Digitisation and Management of Scientific Research? Doctoral Thesis, University of Southampton, 2018. https://doi.org/10.5258/SOTON/D0384.
    • An entire PhD on the merits of semantic lab notebooks, with an open source prototype Semanticat.
  • Tremouilhac, P.; Nguyen, A.; Huang, Y.-C.; Kotov, S.; Lütjohann, D. S.; Hübsch, F.; Jung, N.; Bräse, S. Chemotion ELN: An Open Source Electronic Lab Notebook for Chemists in Academia. Journal of Cheminformatics 2017, 9 (1), 54. https://doi.org/10.1186/s13321-017-0240-0.
  • Patiny, L.; Zasso, M.; Kostro, D.; Bernal, A.; Castillo, A. M.; Bolaños, A.; Asencio, M. A.; Pellet, N.; Todd, M.; Schloerer, N.; Kuhn, S.; Holmes, E.; Javor, S.; Wist, J. The C6H6 NMR Repository: An Integral Solution to Control the Flow of Your Data from the Magnet to the Public. Magnetic Resonance in Chemistry 2018, 56 (6), 520–528. https://doi.org/10.1002/mrc.4669.
  • Hastings, J.; Chepelev, L.; Willighagen, E.; Adams, N.; Steinbeck, C.; Dumontier, M. The Chemical Information Ontology: Provenance and Disambiguation for Chemical Data on the Biological Semantic Web. PLOS ONE 2011, 6 (10), e25513. https://doi.org/10.1371/journal.pone.0025513.
  • Chalk, S. J. SciData: A Data Model and Ontology for Semantic Representation of Scientific Data. Journal of Cheminformatics 2016, 8 (1), 54. https://doi.org/10.1186/s13321-016-0168-9.
  • Murray-Rust, P.; Rzepa, H. S.; Tyrrell, S. M.; Zhang, Y. Representation and Use of Chemistry in the Global Electronic Age. Org. Biomol. Chem. 2004, 2 (22), 3192–3203. https://doi.org/10.1039/B410732B.
  • O'Boyle, N. et al. Open Data, Open Source and Open Standards in Chemistry: The Blue Obelisk Five Years On. Journal of Cheminformatics 2011, 3, 37. https://doi.org/10.1186/1758-2946-3-37.
