J. Mater. Sci. Technol., 2022, Vol. 122: 77-83. DOI: 10.1016/j.jmst.2021.12.052

• Research Article

Symbolic regression in materials science via dimension-synchronous-computation

Changxin Wang (a, b), Yan Zhang (a, b), Cheng Wen (a, b), Mingli Yang (c), Turab Lookman (d), Yanjing Su (a, b, *), Tong-Yi Zhang (e, f)

  (a) Beijing Advanced Innovation Center for Materials Genome Engineering, University of Science and Technology Beijing, Beijing 100083, China
  (b) Corrosion and Protection Center, University of Science and Technology Beijing, Beijing 100083, China
  (c) Research Center for Materials Genome Engineering, Sichuan University, Chengdu 610065, China
  (d) AiMaterials Research LLC, Santa Fe, NM 87501, United States of America
  (e) School of Materials Science and Engineering, Harbin Institute of Technology, Shenzhen, China
  (f) Materials Genome Institute, Shanghai University, 333 Nanchen Road, Shanghai 200444, China
  • Received: 2021-09-17  Revised: 2021-11-27  Accepted: 2021-12-13  Published: 2022-09-20  Online: 2022-03-12
  • Contact: Yanjing Su
  • About author: * E-mail address: yjsu@ustb.edu.cn (Y. Su).

Abstract:

There is growing interest in applying machine learning techniques in materials science. However, interpreting machine learning models and extracting knowledge from them remain major concerns, particularly when the goal of learning is to formulate an explicit model that provides physical insight. In the present study, we propose a framework that combines the filtering ability of feature engineering with symbolic regression to extract explicit, quantitative expressions for the band gap energy from materials data. We enhance genetic programming with dimensional consistency and artificial constraints to improve the search efficiency of symbolic regression. We show how two descriptors, selected from 32 candidates and attributed to volumetric and electronic factors, explicitly express the band gap energy of NaCl-type compounds. Our approach provides a basis for capturing the underlying physical relationships between materials descriptors and target properties.

Key words: Symbolic regression, Band gap, Dimensional calculation
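
The dimensional-consistency idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Quantity` class, the `Dim` exponent tuple, and the `is_dimensionally_valid` helper are hypothetical names introduced here, and the choice of four base dimensions is an assumption. The sketch carries a vector of unit exponents alongside each numerical value, so that a candidate expression can be rejected as soon as it adds or subtracts quantities with different dimensions.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Assumed base dimensions for this sketch: (length, mass, time, charge).
Dim = Tuple[int, int, int, int]


@dataclass(frozen=True)
class Quantity:
    """A numerical value paired with its unit exponents (hypothetical helper)."""
    value: float
    dim: Dim

    def __add__(self, other: "Quantity") -> "Quantity":
        # Addition is only meaningful between quantities of equal dimension.
        if self.dim != other.dim:
            raise ValueError("dimension mismatch in addition")
        return Quantity(self.value + other.value, self.dim)

    def __sub__(self, other: "Quantity") -> "Quantity":
        if self.dim != other.dim:
            raise ValueError("dimension mismatch in subtraction")
        return Quantity(self.value - other.value, self.dim)

    def __mul__(self, other: "Quantity") -> "Quantity":
        # Multiplication adds the unit exponents.
        dim = tuple(a + b for a, b in zip(self.dim, other.dim))
        return Quantity(self.value * other.value, dim)

    def __truediv__(self, other: "Quantity") -> "Quantity":
        # Division subtracts the unit exponents.
        dim = tuple(a - b for a, b in zip(self.dim, other.dim))
        return Quantity(self.value / other.value, dim)


def is_dimensionally_valid(expr: Callable[..., Quantity], *args: Quantity) -> bool:
    """Return True if evaluating expr on args raises no dimension mismatch."""
    try:
        expr(*args)
        return True
    except ValueError:
        return False


# Example: a length and the volume built from it.
length = Quantity(2.0, (1, 0, 0, 0))
volume = length * length * length

print(is_dimensionally_valid(lambda a, b: a * b, length, volume))  # product is allowed
print(is_dimensionally_valid(lambda a, b: a + b, length, volume))  # sum is rejected
```

In a genetic-programming search, a check of this kind could be used to discard or penalize candidate expressions before fitting them, which is one plausible way such a constraint narrows the search space.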