OpenLAM | 2024 Q0 Report

The slogan for OpenLAM is "Conquer the Periodic Table!" By establishing an open-source ecosystem around large microscale models, we hope to provide new infrastructure for microscale scientific research and to drive the transformation of microscale industrial design in fields such as materials, energy, and biopharmaceuticals. Relevant models, data, and workflows will be consolidated on AIS Square; related software development will take place in the DeepModeling open-source community. We also welcome open interaction with other communities on model development, data sharing, evaluation, and testing.

See AIS Square for more details.

Model Structure

  • The DPA-2 model structure (PyTorch-based) has been released, showing a significant improvement in fitting ability and transferability over DPA-1 (arXiv:2312.15492).
  • A new capability for unsupervised denoise pretraining has been added (DOI:10.5281/zenodo.10483908).

Data

  • The DPA-2 paper includes pretraining data for 18 systems and downstream data for 10 systems, covering over ten million frames and 73 elements (for the detailed data inventory, see below; the data can also be downloaded directly from DOI:10.5281/zenodo.10483908).
  • Four new energy and force datasets have been added, covering electrolytes, solid-state electrolytes, chemical reactions, and methane combustion (for details, see the data inventory below).
  • Seven new equilibrium-state datasets for unsupervised denoising tasks have been added, including AFLOW, MC2D/3D, CALYPSO, etc. (for details, see the data inventory below).

Training Strategy

  • The DPA-2 paper includes a multi-task pretraining framework for energy and force, supporting the combined training of datasets with different DFT settings.
  • An unsupervised denoising task has been added and integrated into the multi-task pretraining framework (results are detailed below).
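As a rough illustration of how a multi-task framework of this kind can combine datasets computed with different DFT settings, the sketch below shares one descriptor across all datasets and gives each dataset its own fitting head, picking one head per training step. It is a toy NumPy example, not the DPA-2 implementation: the head names, the weights, the array sizes, and the idea of sampling heads in proportion to a per-dataset weight are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical heads and sampling weights (illustrative values only).
head_weights = {"Alloy": 2.0, "Cluster": 1.0, "Cu": 0.1}


def sample_head(weights, rng):
    """Pick one dataset head with probability proportional to its weight."""
    names = list(weights)
    probs = np.array([weights[n] for n in names], dtype=float)
    probs /= probs.sum()
    return names[rng.choice(len(names), p=probs)]


# One descriptor shared by every dataset, one fitting vector per dataset.
n_feat, n_desc = 4, 8
shared_descriptor = rng.normal(size=(n_feat, n_desc))
fitting_heads = {name: rng.normal(size=n_desc) for name in head_weights}


def predict_energy(atom_features, head):
    """Sum of per-atom energies from the shared descriptor and one head."""
    desc = np.tanh(atom_features @ shared_descriptor)  # (n_atoms, n_desc)
    atomic_energy = desc @ fitting_heads[head]         # (n_atoms,)
    return float(atomic_energy.sum())


# Count how often each head would be visited over many training steps.
counts = {name: 0 for name in head_weights}
for _ in range(3100):
    counts[sample_head(head_weights, rng)] += 1
print(counts)
```

Because each dataset keeps its own head, energies labeled with different DFT settings never have to agree with one another, while the shared descriptor still sees every dataset.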

Automation Process

  • The DPA-2 paper describes an automated workflow covering every stage: pretraining, fine-tuning, transferability testing, distillation, and compression (experience it at DP Combo and try it in the notebook).
  • The AIS Square website now includes an automated process for integrating user data, which automatically determines how well the pretrained model covers that data.

Competition

Coming in March...

Teaching

Coming in February...

Readers interested in the background of the project and details of the paper can also refer to the OpenLAM initiative and the DPA-2 paper for further information.

Conclusion

Since the release of DPA-2 less than a month ago, there have been numerous developments that can be summarized as follows:

  • The DPA-2 multi-task pretraining framework has gained a new unsupervised training task: data from different DFT calculations can be trained together, and equilibrium-state data without DFT labels can now be used through denoising, so the model learns a broader range of representations;
  • The OpenLAM initiative has incorporated more production-type data and integrated more publicly available equilibrium state crystal structure data, with the pre-training data pool continuing to expand rapidly;
  • After incorporating the unsupervised training task, the model's overall energy prediction accuracy improves under a fair comparison (weighted energy RMSE of 18.3 versus 18.6 meV/atom on the 18 original systems; see the performance table in the Appendix), suggesting that information from different systems and tasks is mutually reinforcing.

The OpenLAM initiative is iterating rapidly. As we move towards the era of large atomic models, open-source sharing becomes an inevitable theme. We welcome like-minded individuals to join us in opening up new opportunities for scientific discovery and industrial applications. On the journey to conquering the periodic table of elements, we look forward to creating a new era with you!
To join the "OpenLAM Initiative", visit AIS Square.

Appendix

  • Unsupervised Denoise Method
    • Data Structure
      • Equilibrium-state data consist only of configurations, with no DFT results. During preprocessing, noise is added separately to the coordinates and the element types (for example, Gaussian noise on atomic positions and masking of some element types).
    • Training Method
      • The noisy configurations are fed into the network, where DPA-2's unified descriptor and a denoise fitting network produce a denoise vector for each atom (the network's prediction of the displacement that restores the atom's position) together with predicted element types. The configuration and element types are restored from these outputs, and a loss is computed against the true, noise-free configuration and element types. The model is trained by minimizing this loss.
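The corruption and loss steps described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the DPA-2 code: the `MASK_TYPE` id, the `sigma` and `mask_prob` values, the choice of mean squared error plus cross-entropy with equal weight, and the hand-made arrays standing in for the network output are all hypothetical.

```python
import numpy as np

MASK_TYPE = -1  # hypothetical placeholder id for a masked element type


def corrupt(coords, types, rng, sigma=0.1, mask_prob=0.2):
    """Add Gaussian noise to positions and mask a random subset of types."""
    noise = rng.normal(scale=sigma, size=coords.shape)
    masked = rng.random(types.shape) < mask_prob
    noisy_types = np.where(masked, MASK_TYPE, types)
    return coords + noise, noisy_types, noise


def denoise_loss(noisy_coords, pred_vec, type_logits, true_coords, true_types):
    """MSE on restored positions plus cross-entropy on predicted types."""
    restored = noisy_coords + pred_vec  # apply the predicted displacement
    coord_loss = np.mean((restored - true_coords) ** 2)
    # Numerically stable log-softmax over the type axis.
    shifted = type_logits - type_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    type_loss = -np.mean(log_probs[np.arange(len(true_types)), true_types])
    return float(coord_loss + type_loss)


rng = np.random.default_rng(0)
n_atoms, n_types = 5, 3
coords = rng.random((n_atoms, 3))
types = rng.integers(0, n_types, size=n_atoms)
noisy_coords, noisy_types, noise = corrupt(coords, types, rng)

# Stand-in for a perfect network: undo the noise, favor the true type.
perfect_logits = 50.0 * np.eye(n_types)[types]
loss_perfect = denoise_loss(noisy_coords, -noise, perfect_logits, coords, types)

# Stand-in for an uninformed network: zero displacement, uniform logits.
loss_naive = denoise_loss(noisy_coords, np.zeros_like(coords),
                          np.zeros((n_atoms, n_types)), coords, types)
print(loss_perfect, loss_naive)
```

No DFT energies or forces appear anywhere in the loss, which is why equilibrium structures without labels can be used for this task.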
  • Data Inventory
    The datasets currently used for training the DPA-2 model cover a wide range of systems, including semiconductors, perovskites, alloys, surface catalysis, cathode materials, solid-state electrolytes, organic molecules, and more, along with the newly added equilibrium-state datasets for unsupervised denoising. All of these data have been uploaded to the AIS Square website, where users can find more detailed descriptions and download the datasets. The inventory is as follows:
    • Datasets included in the DPA-2 paper
      Index  Dataset name                       Contributors
      1      Alloy_DPA_v1_0                     Fuzhi Dai, Wanrun Jiang
      2      Cathode(Anode)_DPA_v1_0            Linshuang Zhang, Jianchuan Liu
      3      Cluster_DPA_v1_0                   Fuqiang Gong
      4      Drug(drug-like-molecule)_DPA_v1_0  Manyi Yang
      5      FerroEle_DPA_v1_0                  Jing Wu, Jiyuan Yang, YuanJinsheng Liu, Duo Zhang, Yudi Yang, Yuzhi Zhang, Linfeng Zhang, Shi Liu
      6      Open_Catalyst_2020(OC20_Dataset)   Duo Zhang
      7      SSE-PBE_DPA_v1_0                   Jianxing Huang
      8      SemiCond_DPA_v1_0                  Jianchuan Liu
      9      H2O-PD_DPA_v1_0                    Linfeng Zhang, Han Wang, Roberto Car, Weinan E
      10     AgAu-PBE(unitary)_DPA_v1_0         Yinan Wang, LinFeng Zhang, Ben Xu, Xiaoyang Wang, Han Wang
      11     AlMgCu_DPA_v1_0                    Wanrun Jiang, Yuzhi Zhang, Linfeng Zhang, Han Wang
      12     Cu_DPA_v1_0                        Yuzhi Zhang, Haidi Wang, Weijie Chen, Jinzhe Zeng, Linfeng Zhang
      13     Sn_DPA_v1_0                        Fengbo Yuan
      14     Ti_DPA_v1_0                        Tongqi Wen, Rui Wang, Lingyu Zhu, Linfeng Zhang, Han Wang, David J Srolovitz, Zhaoxuan Wu
      15     V_DPA_v1_0                         Rui Wang, Xiaoxiao Ma, Linfeng Zhang, Han Wang, David J Srolovitz, Tongqi Wen, Zhaoxuan Wu
      16     W_DPA_v1_0                         Xiaoyang Wang, Yinan Wang, Linfeng Zhang, Fuzhi Dai, Han Wang
      17     C12H26_DPA_v1_0                    Jinzhe Zeng, Linfeng Zhang, Han Wang, Tong Zhu
      18     HfO2_DPA_v1_0                      Jing Wu, Yuzhi Zhang, Linfeng Zhang, Shi Liu
    • Four new datasets for energy & force data
      Index  Dataset name               Contributors
      19     Electrolyte                Mengchao Shi, Yuzhi Zhang
      20     Solid_State_Electrolyte    Mengchao Shi, Yuzhi Zhang
      21     Organic_reactions_dataset  Tong Zhu, Bowen Li
      22     CHO-methane-combustion     Jinzhe Zeng, Liqun Cao, Mingyuan Xu, Tong Zhu, John ZH Zhang
    • Seven new datasets in equilibrium state for unsupervised denoising
      Index  Dataset name        Contributors/Link
      1      AFLOW_MP            AFLOW, MP
      2      MC2D                Davide Campi, Nicolas Mounet, Marco Gibertini, Giovanni Pizzi, Nicola Marzari, The Materials Cloud 2D database (MC2D), Materials Cloud Archive 2022.84 (2022), doi: 10.24435/materialscloud:36-nd.
      3      MC3D                Sebastiaan Huber, Marnik Bercx, Nicolas Hörmann, Martin Uhrin, Giovanni Pizzi, Nicola Marzari, Materials Cloud three-dimensional crystals database (MC3D), Materials Cloud Archive 2022.38 (2022), doi: 10.24435/materialscloud:rw-t0.
      4      ChemicalSimilarity  Hai-Chen Wang, Silvana Botti, Miguel A. L. Marques, Finding new crystalline compounds using chemical similarity, Materials Cloud Archive 2021.68 (2021), doi: 10.24435/materialscloud:96-09.
      5      ClusterIsomer       Giuseppe Fisicaro, Bastian Schaefer, Jonas A. Finkler, Stefan Goedecker, Principles of isomer stability in small clusters, Materials Cloud Archive 2023.36 (2023), doi: 10.24435/materialscloud:46-nr.
      6      MolecularCrystal    Rose Cersonsky, Maria Pakhnova, Edgar Engel, Michele Ceriotti, Lattice energies and relaxed geometries for 2'707 organic molecular crystals and their 3'242 molecular components, Materials Cloud Archive 2023.5 (2023), doi: 10.24435/materialscloud:71-21.
      7      CALYPSO_database    Zhenyu Wang, Xiaoshan Luo
    • Latest Performance (root mean squared error, RMSE) of the Multi-task Pretrained Model (22 energy force systems + 7 unsupervised denoise systems)
                                  DPA2 (multi-task 18 heads for 1m steps)   DPA2 (multi-task 29 heads for 1.84m steps)
      System              Weight  Energy (meV/atom)   Force (meV/Å)         Energy (meV/atom)   Force (meV/Å)
      Alloy               2.0     36.5                169.5                 32.2                160.5
      Cluster             1.0     34.4                162.5                 40.6                171.0
      Anode               1.0     3.3                 39.8                  2.5                 45.0
      FerroEle            1.0     4.4                 44.2                  1.7                 47.2
      AgAu-PBE            0.2     9.4                 28.2                  10.9                31.2
      Cu                  0.1     3.6                 18.2                  6.8                 21.2
      Sn                  0.1     24.8                69.7                  17.3                76.7
      Ti                  0.1     16.3                112.4                 26.8                133.7
      AlMgCu              0.3     4.9                 23.4                  10.6                28.6
      V                   0.1     13.9                110.2                 16.7                121.3
      W                   0.1     24.6                157.9                 45.8                174.0
      C12H26              0.1     62.5                710.6                 75.3                1486.7
      SSE-PBE             1.0     2.1                 64.0                  2.2                 75.7
      HfO2                0.1     3.9                 102.8                 5.0                 108.4
      SemiCond            1.0     6.5                 131.9                 7.2                 139.8
      Drug                2.0     20.6                128.9                 21.8                140.6
      OC2M                2.0     29.3                157.6                 26.7                138.7
      H2O-PD              1.0     3.2                 39.7                  1.0                 45.6
      Weighted sum                18.6                116.3                 18.3                123.6
      Electrolyte         1.0     /                   /                     2.9                 64.3
      SSE_new             1.0     /                   /                     3.2                 72.4
      Organic_reactions   1.0     /                   /                     15.1                97.7
      Methane-combustion  1.0     /                   /                     147.2               251.4
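The "Weighted sum" row can be reproduced from the per-system rows if it is read as a weight-normalized average over the 18 systems that both models share (the four new systems are left out because they have no 18-head result). That reading is inferred from the numbers rather than stated in the report; the short script below, with values copied from the table, shows that it matches all four columns.

```python
# (weight, energy_18, force_18, energy_29, force_29), copied from the table.
rows = {
    "Alloy":    (2.0, 36.5, 169.5, 32.2, 160.5),
    "Cluster":  (1.0, 34.4, 162.5, 40.6, 171.0),
    "Anode":    (1.0, 3.3, 39.8, 2.5, 45.0),
    "FerroEle": (1.0, 4.4, 44.2, 1.7, 47.2),
    "AgAu-PBE": (0.2, 9.4, 28.2, 10.9, 31.2),
    "Cu":       (0.1, 3.6, 18.2, 6.8, 21.2),
    "Sn":       (0.1, 24.8, 69.7, 17.3, 76.7),
    "Ti":       (0.1, 16.3, 112.4, 26.8, 133.7),
    "AlMgCu":   (0.3, 4.9, 23.4, 10.6, 28.6),
    "V":        (0.1, 13.9, 110.2, 16.7, 121.3),
    "W":        (0.1, 24.6, 157.9, 45.8, 174.0),
    "C12H26":   (0.1, 62.5, 710.6, 75.3, 1486.7),
    "SSE-PBE":  (1.0, 2.1, 64.0, 2.2, 75.7),
    "HfO2":     (0.1, 3.9, 102.8, 5.0, 108.4),
    "SemiCond": (1.0, 6.5, 131.9, 7.2, 139.8),
    "Drug":     (2.0, 20.6, 128.9, 21.8, 140.6),
    "OC2M":     (2.0, 29.3, 157.6, 26.7, 138.7),
    "H2O-PD":   (1.0, 3.2, 39.7, 1.0, 45.6),
}
COLUMNS = {"energy_18": 1, "force_18": 2, "energy_29": 3, "force_29": 4}


def weighted_average(rows, col):
    """Weight-normalized average of one RMSE column."""
    total_weight = sum(r[0] for r in rows.values())
    return sum(r[0] * r[col] for r in rows.values()) / total_weight


summary = {name: round(weighted_average(rows, col), 1)
           for name, col in COLUMNS.items()}
print(summary)
# {'energy_18': 18.6, 'force_18': 116.3, 'energy_29': 18.3, 'force_29': 123.6}
```

On this reading, the 29-head model is slightly better on weighted energy RMSE (18.3 versus 18.6 meV/atom) and slightly worse on weighted force RMSE (123.6 versus 116.3 meV/Å).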