Abstract: Machine learning, as an emerging data-driven technology, offers a promising pathway for the intelligent research and development of energetic materials. However, data scarcity and heterogeneity have become core bottlenecks that restrict modeling accuracy and practical application. This review examines state-of-the-art data acquisition methodologies and analyzes their advantages and limitations. Mainstream data optimization strategies are then discussed from two perspectives: quantity expansion and quality improvement. For data quantity, recent advances in SMILES enumeration, generative adversarial networks, and transfer learning are introduced as means of enhancing model generalization. For data quality, the roles of outlier detection, standardized preprocessing, and feature engineering in improving model robustness and interpretability are discussed. The review shows that effective data optimization can not only alleviate data limitations but also significantly enhance prediction stability and structural extrapolation capability under small-sample and structurally complex conditions. Finally, future directions are proposed, including the development of high-throughput experimental platforms, the unification of data standards, and the establishment of intelligent closed-loop systems. This review is expected to provide a feasible roadmap and methodological reference for advancing the data ecosystem and intelligent design of energetic materials.