Citation

If INFTY or its included algorithms are useful in your work, please cite the toolkit and the papers for the methods you use.

Software citation

For the toolkit, use a software citation similar to:

@software{infty_toolkit,
  title        = {INFTY: An Optimization Toolkit to Support Continual AI},
  author       = {INFTY contributors},
  year         = {2026},
  url          = {https://github.com/THUDM/INFTY}
}

If a formal release is created, update the year and repository URL, and add the release version and DOI to the entry.

Algorithm papers

INFTY currently exposes three optimizer families under `infty.optim`. The papers below are grouped by family, and each family is ordered by year from newest to oldest.

Geometry reshaping

Method	Venue	Paper
`C-Flat++`	arXiv 2025	C-Flat++: Towards a More Efficient and Powerful Framework for Continual Learning.
`C-Flat`	arXiv 2024	Make Continual Learning Stronger via C-Flat.
`GAM`	CVPR 2023	Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization.
`LookSAM`	CVPR 2022	Towards Efficient and Scalable Sharpness-Aware Minimization.
`GSAM`	ICLR 2022	Surrogate Gap Minimization Improves Sharpness-Aware Training.
`SAM`	ICLR 2021	Sharpness-Aware Minimization for Efficiently Improving Generalization.

Implementation mapping: `C_Flat` exposes both C-Flat with `strategy="basic"` and C-Flat++ with `strategy="plus"`.

Zeroth-order updates

Method	Venue	Paper
`ZeroFlow`	arXiv 2025	ZeroFlow: Overcoming Catastrophic Forgetting Is Easier Than You Think.
`ZeroFlow (MeZO-style variants)`	NeurIPS 2023	Fine-Tuning Language Models with Just Forward Passes.
`ZeroFlow (forward gradient)`	arXiv 2022	Gradients without Backpropagation.

Implementation mapping: `ZeroFlow` uses `inftyopt="forward_grad"` for the forward-gradient path, and uses `zo_sgd`, `zo_adam`, `zo_sgd_sign`, `zo_adam_sign`, `zo_sgd_conserve`, or `zo_adam_conserve` for the MeZO-style zeroth-order paths.

Gradient filtering

Method	Venue	Paper
`UniGrad-FS`	IEEE TII 2024	UniGrad-FS: Unified Gradient Projection With Flatter Sharpness for Continual Learning.
`CAGrad`	NeurIPS 2021	Conflict-Averse Gradient Descent for Multi-Task Learning.
`GradVac`	ICLR 2021	Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models.
`PCGrad`	NeurIPS 2020	Gradient Surgery for Multi-Task Learning.
`OGD`	AISTATS 2020	Orthogonal Gradient Descent for Continual Learning.

For publications, cite the specific paper corresponding to the optimizer class or execution path used in your experiments.
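The implementation mappings above can be turned into a small lookup, which may help when a paper uses several execution paths. The following is a minimal sketch, not part of INFTY: the function and constant names are hypothetical, and the titles and option values are copied from the tables and mapping notes in this section. Unknown values raise an error rather than guessing a default.

```python
# Hypothetical lookup helper (not part of INFTY). Titles and option values
# are copied from the tables and mapping notes in this section.

C_FLAT_PAPERS = {
    "basic": "Make Continual Learning Stronger via C-Flat.",
    "plus": "C-Flat++: Towards a More Efficient and Powerful Framework "
            "for Continual Learning.",
}

# inftyopt values that the mapping note assigns to the MeZO-style paths.
MEZO_STYLE_PATHS = frozenset({
    "zo_sgd", "zo_adam",
    "zo_sgd_sign", "zo_adam_sign",
    "zo_sgd_conserve", "zo_adam_conserve",
})

FORWARD_GRAD_PAPER = "Gradients without Backpropagation."
MEZO_PAPER = "Fine-Tuning Language Models with Just Forward Passes."


def c_flat_paper(strategy):
    """Return the paper title for a C_Flat strategy value."""
    if strategy not in C_FLAT_PAPERS:
        raise ValueError("unknown C_Flat strategy: {!r}".format(strategy))
    return C_FLAT_PAPERS[strategy]


def zeroflow_path_paper(inftyopt):
    """Return the paper title for the ZeroFlow execution path in use."""
    if inftyopt == "forward_grad":
        return FORWARD_GRAD_PAPER
    if inftyopt in MEZO_STYLE_PATHS:
        return MEZO_PAPER
    raise ValueError("unknown ZeroFlow path: {!r}".format(inftyopt))


print(c_flat_paper("plus"))
print(zeroflow_path_paper("zo_adam_sign"))
```

The helper returns only the path-specific paper; whether to cite the ZeroFlow paper alongside it is left to the author.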

Reproducibility note

When reporting experiments that use INFTY, include:

INFTY version or commit hash;
optimizer name;
optimizer args dictionary;
base optimizer and hyperparameters;
PyTorch version;
benchmark and task order.
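One way to capture these items is to write them to a JSON record next to the experiment results. The sketch below uses only the Python standard library and makes several assumptions: the helper names are hypothetical, the installed distribution is assumed to be named `infty`, the commit hash is read from a git checkout in the working directory, and the values in the usage example are placeholders rather than recommended settings.

```python
import json
import subprocess
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or 'unknown' if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "unknown"


def git_commit(repo_dir="."):
    """Return the HEAD commit hash of repo_dir, or 'unknown' if unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def build_record(optimizer_name, optimizer_args, base_optimizer,
                 base_hyperparameters, benchmark, task_order, repo_dir="."):
    """Collect the items from the reproducibility list into one dictionary."""
    return {
        # Assumption: the distribution is installed under the name "infty".
        "infty_version": package_version("infty"),
        "infty_commit": git_commit(repo_dir),
        "optimizer_name": optimizer_name,
        "optimizer_args": dict(optimizer_args),
        "base_optimizer": base_optimizer,
        "base_hyperparameters": dict(base_hyperparameters),
        "pytorch_version": package_version("torch"),
        "benchmark": benchmark,
        "task_order": list(task_order),
    }


# Usage example with placeholder values.
record = build_record(
    optimizer_name="C_Flat",
    optimizer_args={"strategy": "plus"},
    base_optimizer="SGD",
    base_hyperparameters={"lr": 0.01},
    benchmark="example-benchmark",
    task_order=[0, 1, 2],
)
print(json.dumps(record, indent=2))
```

Fields that cannot be determined are recorded as `"unknown"` instead of raising, so the record can still be saved on machines without git or PyTorch.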