Citation

If INFTY or its included algorithms are useful in your work, please cite the corresponding software and method papers.

Software citation

For the toolkit, use a software citation similar to:

@software{infty_toolkit,
  title        = {INFTY: An Optimization Toolkit to Support Continual AI},
  author       = {INFTY contributors},
  year         = {2026},
  url          = {https://github.com/THUDM/INFTY}
}

If a formal release is created, update the year and repository URL in this entry, and add the release version and DOI.

Algorithm papers

INFTY currently exposes three optimizer families under infty.optim. The papers below are grouped by family, and each family is ordered by year from newest to oldest.

Geometry reshaping

| Method | Venue | Paper |
| --- | --- | --- |
| C-Flat++ | arXiv 2025 | C-Flat++: Towards a More Efficient and Powerful Framework for Continual Learning. |
| C-Flat | arXiv 2024 | Make Continual Learning Stronger via C-Flat. |
| GAM | CVPR 2023 | Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization. |
| LookSAM | CVPR 2022 | Towards Efficient and Scalable Sharpness-Aware Minimization. |
| GSAM | ICLR 2022 | Surrogate Gap Minimization Improves Sharpness-Aware Training. |
| SAM | ICLR 2021 | Sharpness-Aware Minimization for Efficiently Improving Generalization. |

Implementation mapping: C_Flat covers both methods. Use strategy="basic" for C-Flat and strategy="plus" for C-Flat++.

Zeroth-order updates

| Method | Venue | Paper |
| --- | --- | --- |
| ZeroFlow | arXiv 2025 | ZeroFlow: Overcoming Catastrophic Forgetting Is Easier Than You Think. |
| ZeroFlow (MeZO-style variants) | NeurIPS 2023 | Fine-Tuning Language Models with Just Forward Passes. |
| ZeroFlow (forward gradient) | arXiv 2022 | Gradients without Backpropagation. |

Implementation mapping: ZeroFlow uses inftyopt="forward_grad" for the forward-gradient path and zo_sgd, zo_adam, zo_sgd_sign, zo_adam_sign, zo_sgd_conserve, or zo_adam_conserve for the MeZO-style zeroth-order paths.

Gradient filtering

| Method | Venue | Paper |
| --- | --- | --- |
| UniGrad-FS | IEEE TII 2024 | UniGrad-FS: Unified Gradient Projection With Flatter Sharpness for Continual Learning. |
| CAGrad | NeurIPS 2021 | Conflict-Averse Gradient Descent for Multi-Task Learning. |
| GradVac | ICLR 2021 | Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models. |
| PCGrad | NeurIPS 2020 | Gradient Surgery for Multi-Task Learning. |
| OGD | AISTATS 2020 | Orthogonal Gradient Descent for Continual Learning. |

For publications, cite the specific paper corresponding to the optimizer class or execution path used in your experiments.
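
One way to keep this mapping straight in an experiment script is a small lookup from the method and its arguments to the paper titles listed above. The sketch below is illustrative only: papers_to_cite is a hypothetical helper and not part of INFTY, the keys reuse the method names from the tables rather than confirmed class names, and it assumes that the MeZO-style values are passed through the same inftyopt argument as "forward_grad" and that ZeroFlow paths cite the ZeroFlow paper together with the underlying estimator paper.

```python
# Hypothetical helper, not part of INFTY. It maps a method name and its
# args dictionary to the paper titles listed in the tables above.

ZEROFLOW = "ZeroFlow: Overcoming Catastrophic Forgetting Is Easier Than You Think."
MEZO = "Fine-Tuning Language Models with Just Forward Passes."
FORWARD_GRAD = "Gradients without Backpropagation."

# C_Flat: the paper depends on the strategy argument.
C_FLAT = {
    "basic": "Make Continual Learning Stronger via C-Flat.",
    "plus": "C-Flat++: Towards a More Efficient and Powerful Framework "
            "for Continual Learning.",
}

# Assumption: these names are all passed as the inftyopt value.
MEZO_STYLE = {
    "zo_sgd", "zo_adam", "zo_sgd_sign",
    "zo_adam_sign", "zo_sgd_conserve", "zo_adam_conserve",
}

# Methods with a single paper. Keys follow the "Method" column of the
# tables and may differ from the actual class names.
SINGLE_PATH = {
    "GAM": "Gradient Norm Aware Minimization Seeks First-Order Flatness "
           "and Improves Generalization.",
    "LookSAM": "Towards Efficient and Scalable Sharpness-Aware Minimization.",
    "GSAM": "Surrogate Gap Minimization Improves Sharpness-Aware Training.",
    "SAM": "Sharpness-Aware Minimization for Efficiently Improving Generalization.",
    "UniGrad-FS": "UniGrad-FS: Unified Gradient Projection With Flatter "
                  "Sharpness for Continual Learning.",
    "CAGrad": "Conflict-Averse Gradient Descent for Multi-Task Learning.",
    "GradVac": "Gradient Vaccine: Investigating and Improving Multi-task "
               "Optimization in Massively Multilingual Models.",
    "PCGrad": "Gradient Surgery for Multi-Task Learning.",
    "OGD": "Orthogonal Gradient Descent for Continual Learning.",
}


def papers_to_cite(method, args=None):
    """Return the paper titles to cite for a method and its args dictionary."""
    args = args or {}
    if method == "C_Flat":
        return [C_FLAT[args["strategy"]]]
    if method == "ZeroFlow":
        path = args["inftyopt"]
        if path == "forward_grad":
            return [ZEROFLOW, FORWARD_GRAD]
        if path in MEZO_STYLE:
            return [ZEROFLOW, MEZO]
        raise ValueError(f"unknown inftyopt value: {path!r}")
    return [SINGLE_PATH[method]]
```

For example, papers_to_cite("C_Flat", {"strategy": "plus"}) returns only the C-Flat++ title, while papers_to_cite("ZeroFlow", {"inftyopt": "zo_sgd"}) returns the ZeroFlow and MeZO titles.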

Reproducibility note

When reporting experiments that use INFTY, include:

  • INFTY version or commit hash;
  • optimizer name;
  • optimizer args dictionary;
  • base optimizer and hyperparameters;
  • PyTorch version;
  • benchmark and task order.
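
The items above can be collected into a single record saved next to the experiment results. The following is a minimal sketch using only the Python standard library. The function names, the record keys, and the installed package names "infty" and "torch" are assumptions for illustration, not part of INFTY; fields that cannot be determined are stored as None.

```python
import json
import subprocess
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or None if not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def git_commit(repo_dir="."):
    """Return the current commit hash of repo_dir, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing, repo_dir does not exist, or it is not a repository.
        return None
    return result.stdout.strip()


def reproducibility_record(optimizer_name, optimizer_args, base_optimizer,
                           base_hyperparameters, benchmark, task_order,
                           infty_repo="."):
    """Collect the fields from the reproducibility note into one dictionary."""
    return {
        "infty_version": package_version("infty"),  # assumed package name
        "infty_commit": git_commit(infty_repo),
        "optimizer_name": optimizer_name,
        "optimizer_args": optimizer_args,
        "base_optimizer": base_optimizer,
        "base_hyperparameters": base_hyperparameters,
        "pytorch_version": package_version("torch"),
        "benchmark": benchmark,
        "task_order": task_order,
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    record = reproducibility_record(
        optimizer_name="C_Flat",
        optimizer_args={"strategy": "plus"},
        base_optimizer="SGD",
        base_hyperparameters={"lr": 0.01, "momentum": 0.9},
        benchmark="example-benchmark",
        task_order=[0, 1, 2, 3, 4],
    )
    print(json.dumps(record, indent=2))
```

Writing the record as JSON keeps the optimizer args dictionary intact, so the reported configuration matches what was actually passed to the optimizer.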