The DeepModeling Manifesto

Posted on 2021-06-10

The integration of machine learning and physical modeling is changing the paradigm of scientific research. Those who hope to extend the frontier of science and solve challenging practical problems through computational modeling are coming together in ways never seen before. This calls for a new infrastructure: new platforms for collaboration, new coding frameworks, new data processing schemes, and new ways of using computing power. It also calls for a new culture: a culture of working together closely for the benefit of all, of free exchange and sharing of knowledge and tools, of respect and appreciation for each other's work, and of the pursuit of harmony amid diversity.

The DeepModeling community is a community of such people.

What is DeepModeling?

The two most important applications of computing are machine learning and physical modeling. The former is an effective tool for analyzing complex data; the latter is a scientific description of the physical world. The effective integration of the two is changing all aspects of scientific research. DeepModeling will ultimately be a set of methodologies and tools that combine machine learning, physical modeling, and cutting-edge computational platforms. People are drawn to the DeepModeling community by its open, inclusive environment, as well as its dedication to the cause of advancing scientific computing worldwide.

Why choose open source?

There are different interpretations of the term "open source". The consensus within the DeepModeling community is that open source is a collaborative model of software development based on the spirit of openness and sharing. Open source is a familiar concept for people in the fields of machine learning and computer science, but it is not yet popular in the field of scientific computing. What we advocate is that an algorithm or a piece of software should be judged not by the reputation of the journal in which it is published, but by its ability to solve real-world problems and its actual contribution to science. The sustainable development of software requires continuous investment in manpower. Software should undergo incremental improvement, and it should be put to the test of solving real-world problems in an open environment. This is often difficult for individuals or individual groups to achieve. An open-source community provides a better solution.

The history of the DeepModeling community

The "DeepModeling Community" started with the initiation of the "deepmd-kit" project. "deepmd-kit" is a software tool that combines machine learning and molecular dynamics. It helps to overcome a long-standing difficulty in the field of molecular dynamics, namely the dilemma of having to choose between efficiency and accuracy. The name "DeepModeling" was proposed by early developers of the deepmd-kit project, with the intention of using deep learning tools to overcome the curse of dimensionality in multi-scale modeling. DeepModeling therefore became the name of the GitHub organization (https://github.com/deepmodeling) which manages the original deepmd-kit project. Following deepmd-kit, the DeepModeling community successively initiated projects such as dpdata, dp-gen, and dpdispatcher, and extended the modeling scale to the electronic-structure level through projects such as deepks-kit and ABACUS. These projects have brought together people from all over the world working on molecular simulations.

The short-term plan and long-term vision of the DeepModeling community

In the short term, developers in the DeepModeling community will focus on atomic-scale simulation methods and tools. This includes solving the many-body Schrödinger equation, electronic structure calculation, molecular dynamics simulation, and coarse-grained molecular dynamics simulation. It also includes tasks such as data generation, model training, and high-performance optimization. In addition, it includes workflow and management tools, as well as tools for scheduling computing power, for different systems, different scenarios, and different purposes.

It should be pointed out that the combination of physical modeling and machine learning often fundamentally changes the implementation logic of a piece of software. Therefore, the new infrastructure will not be settled once and for all, but will be gradually improved through an iterative process and upgrades from time to time.

In the long run, the DeepModeling community is committed to combining physical models at all scales with machine learning methods, using the most cutting-edge computing platforms to solve the most challenging scientific and technological problems facing human society.

How can you contribute?

If you want to contribute to an existing project in the DeepModeling community, please just do so or contact the corresponding developer directly; if you want to open a new project in the DeepModeling community, or if you want the DeepModeling community to help develop your project, just contact contact@deepmodeling.org.

If you are a programmer who loves science and is attracted by the future scientific computing platform being built by the DeepModeling community, you can contribute not only through new algorithms, but also through code development specifications, documentation standards, community databases, task scheduling, workflow management, and other tools. In addition, you can contribute to code architecture design and high-performance optimization tasks in the DeepModeling community. People in the field of scientific computing will greatly appreciate your expertise and contribution.

If you are a hardcore developer familiar with topics such as electronic structure calculations, molecular dynamics, and finite element methods, the DeepModeling community will be your place to showcase your talents. The addition of machine learning components requires us to rethink architecture design, the specific implementation of each of the tasks mentioned above, and high-performance optimization. You will become an important bridge connecting other developers, contributors, and users in different areas.

If you have only used some basic scientific software and have worked on some post-processing scripts, the DeepModeling community also needs you. Try to ask questions and communicate on GitHub, Gitee, and other communication platforms, try to give opinions, and try to fork, commit, open a pull request, and so on. Little by little, your contributions will make the DeepModeling community better and better, and the DeepModeling community will be very grateful for them.

Even if you are just a bystander, if you support the ideas of the DeepModeling community, your recognition and your help in spreading the word will be a great encouragement and support for the DeepModeling community.

Final remarks

Despite the tremendous advances in AI and computing power, the scientific computing community is largely embedded in an old-fashioned culture. Many of the most important tasks rely on legacy codes. The core algorithms used in many commercial software packages are outdated. The self-sufficient style of work resembles that of the agricultural age and results in poor efficiency. It is only in recent years that some promising open-source communities have emerged. However, these communities are often aimed at specific tools for specific scales, and are often maintained by specific academic research groups. They face serious challenges in sustaining development and improving the user experience.

The DeepModeling project promises to change all that.

The combination of machine learning and physical modeling calls for a new paradigm, the open-source community paradigm. Such a paradigm has long been embraced in the computer and electronics industry, with Linux and Android being well-known examples. In this sense, the DeepModeling project borrows these ideas and applies them to scientific computing. For people in computational science and engineering, efficient and reusable modeling tools that can be continuously improved will free researchers from the plight of having no model or only ad hoc models. For those who work on machine learning, the world of physical models will provide a relatively new and surely vast playground. Working together as an open-source community will make our work more productive, up to date, reliable, and transparent. The spirit of close collaboration, of respecting and building on each other’s work, will surely inspire more and more people to join the cause of advancing computing for the benefit of human society. This is an exciting opportunity. This is the future of scientific computing!