A Bag of Tricks for Few-Shot Class-Incremental Learning
Large language models (LLMs) have shown great potential in complex reasoning
tasks, yet their performance is often hampered by the scarcity of high-quality,
reasoning-focused training data. To address this challenge, we propose
Key-Point-Driven Data Synthesis (KPDDS), a novel data synthesis framework that
synthesizes question-answer pairs by leveraging key points and exemplar
practices from authentic data sources. KPDDS is designed to generate novel
questions at substantial scale while applying rigorous quality control. Using
it, we construct KPMath, an extensive synthetic dataset tailored for
mathematical reasoning that comprises over 800K question-answer pairs.
Augmenting KPMath with additional reasoning-intensive corpora yields the
comprehensive KPMath-Plus dataset. DeepSeekMath fine-tuned on KPMath-Plus
achieves zero-shot PASS@1 accuracies of 83.9% on GSM8K and 48.8% on MATH,
shows promising performance on other math reasoning datasets, and outperforms
competing models in the 7B to 70B parameter range.
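The abstract does not spell out how KPDDS turns key points and exemplar practices into new questions, so the following is only a minimal Python sketch of the general key-point-driven idea under stated assumptions. It assumes each authentic seed problem is already labeled with key points, draws a target combination by merging the key points of two random seeds (a hypothetical heuristic), and selects exemplar practices by key-point overlap. The names SeedItem, sample_key_points, select_exemplars and build_synthesis_prompt are hypothetical; sending the prompt to an LLM and filtering its output for quality are not shown.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SeedItem:
    # Hypothetical record: an authentic problem plus its pre-extracted key points.
    question: str
    key_points: frozenset[str]


def sample_key_points(seeds: list[SeedItem], rng: random.Random) -> frozenset[str]:
    """Hypothetical heuristic: merge the key points of two randomly chosen seeds."""
    a, b = rng.sample(seeds, 2)
    return a.key_points | b.key_points


def select_exemplars(seeds: list[SeedItem], target: frozenset[str], k: int = 2) -> list[SeedItem]:
    """Pick up to k seeds sharing the most key points with the target (ties keep input order)."""
    scored = sorted(seeds, key=lambda s: len(s.key_points & target), reverse=True)
    # Drop candidates that share no key point with the target at all.
    return [s for s in scored[:k] if s.key_points & target]


def build_synthesis_prompt(seeds: list[SeedItem], target: frozenset[str], k: int = 2) -> str:
    """Assemble a generation prompt from target key points and exemplar practices."""
    exemplars = select_exemplars(seeds, target, k)
    lines = ["Write one new math problem that exercises these key points:"]
    lines += [f"- {kp}" for kp in sorted(target)]
    lines.append("Reference practice problems (use as inspiration, do not copy):")
    lines += [f"* {e.question}" for e in exemplars]
    lines.append("Then give a step-by-step solution and a final answer.")
    return "\n".join(lines)


if __name__ == "__main__":
    seeds = [
        SeedItem("A recipe uses 3/4 cup of sugar per batch...", frozenset({"fractions"})),
        SeedItem("Split 60 apples in the ratio 2:3...", frozenset({"fractions", "ratios"})),
        SeedItem("Find the area of a 5 by 8 rectangle...", frozenset({"area"})),
    ]
    target = sample_key_points(seeds, random.Random(0))
    print(build_synthesis_prompt(seeds, target))
```

Conditioning on a combination of key points, rather than paraphrasing a single seed, is one plausible way to obtain questions that are new yet grounded in real problems; the selection and sampling rules here are placeholders, not the method described in the paper.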
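The abstract reports zero-shot PASS@1 accuracy but does not state the decoding or answer-matching protocol. As a minimal sketch, assuming one sample per problem and exact match on a lightly normalized final answer, PASS@1 reduces to the fraction of problems answered correctly. The code below also includes the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k) for n samples with c correct (Chen et al., 2021), which equals c/n when k = 1. The helper names and the normalization rule are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem with n samples, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def normalize(answer: str) -> str:
    # Hypothetical normalization: trim whitespace, drop thousands separators
    # and a trailing period before exact-match comparison.
    return answer.strip().replace(",", "").rstrip(".")


def pass1_accuracy(predictions: list[str], references: list[str]) -> float:
    """Percentage of problems whose single (e.g. greedy) answer matches the reference."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need equal-length, non-empty lists")
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    print(pass1_accuracy(["18", "3", "70,000"], ["18", "2", "70000"]))
    print(pass_at_k(n=10, c=3, k=1))
```

In practice, extracting the final answer from a free-form solution (for example a boxed expression in MATH) is the harder part and is left out of this sketch.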