Citation
If INFTY or its included algorithms are useful in your work, please cite the corresponding software and method papers.
Software citation
For the toolkit, use a software citation similar to:
@software{infty_toolkit,
title = {INFTY: An Optimization Toolkit to Support Continual AI},
author = {INFTY contributors},
year = {2026},
url = {https://github.com/THUDM/INFTY}
}
If a formal release is created, update the year and repository URL to match it, and add the release version and DOI.
Algorithm papers
INFTY currently exposes three optimizer families under infty.optim. The papers below are grouped by family, and each family is ordered by year from newest to oldest.
Geometry reshaping
| Method | Venue | Paper |
|---|---|---|
| C-Flat++ | arXiv 2025 | C-Flat++: Towards a More Efficient and Powerful Framework for Continual Learning. |
| C-Flat | arXiv 2024 | Make Continual Learning Stronger via C-Flat. |
| GAM | CVPR 2023 | Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization. |
| LookSAM | CVPR 2022 | Towards Efficient and Scalable Sharpness-Aware Minimization. |
| GSAM | ICLR 2022 | Surrogate Gap Minimization Improves Sharpness-Aware Training. |
| SAM | ICLR 2021 | Sharpness-Aware Minimization for Efficiently Improving Generalization. |
Implementation mapping: C_Flat exposes both C-Flat with strategy="basic" and C-Flat++ with strategy="plus".
Zeroth-order updates
| Method | Venue | Paper |
|---|---|---|
| ZeroFlow | arXiv 2025 | ZeroFlow: Overcoming Catastrophic Forgetting Is Easier Than You Think. |
| ZeroFlow (MeZO-style variants) | NeurIPS 2023 | Fine-Tuning Language Models with Just Forward Passes. |
| ZeroFlow (forward gradient) | arXiv 2022 | Gradients without Backpropagation. |
Implementation mapping: ZeroFlow uses inftyopt="forward_grad" for the forward-gradient path, and uses zo_sgd, zo_adam, zo_sgd_sign, zo_adam_sign, zo_sgd_conserve, or zo_adam_conserve for the MeZO-style zeroth-order paths.
Gradient filtering
| Method | Venue | Paper |
|---|---|---|
| UniGrad-FS | IEEE TII 2024 | UniGrad-FS: Unified Gradient Projection With Flatter Sharpness for Continual Learning. |
| CAGrad | NeurIPS 2021 | Conflict-Averse Gradient Descent for Multi-Task Learning. |
| GradVac | ICLR 2021 | Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models. |
| PCGrad | NeurIPS 2020 | Gradient Surgery for Multi-Task Learning. |
| OGD | AISTATS 2020 | Orthogonal Gradient Descent for Continual Learning. |
For publications, cite the specific paper corresponding to the optimizer class or execution path used in your experiments.
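For the two families with an implementation mapping, the choice of paper can be sketched as a small lookup. This is an illustration only: the function name `papers_to_cite`, the constant names, and the assumption that `strategy` and `inftyopt` are keys of the optimizer args dictionary are hypothetical and not part of INFTY. Only the option values and paper titles come from the tables and mapping notes on this page. Listing the ZeroFlow paper next to the path-specific paper is also a choice made here, not a stated requirement.

```python
# Hypothetical helper, not part of INFTY: resolves paper titles from a run config.

C_FLAT_PAPERS = {
    "basic": "Make Continual Learning Stronger via C-Flat.",
    "plus": "C-Flat++: Towards a More Efficient and Powerful Framework "
            "for Continual Learning.",
}

ZEROFLOW_PAPER = "ZeroFlow: Overcoming Catastrophic Forgetting Is Easier Than You Think."
MEZO_PAPER = "Fine-Tuning Language Models with Just Forward Passes."
FORWARD_GRAD_PAPER = "Gradients without Backpropagation."

# MeZO-style option values listed in the implementation mapping above.
MEZO_STYLE_PATHS = {
    "zo_sgd", "zo_adam",
    "zo_sgd_sign", "zo_adam_sign",
    "zo_sgd_conserve", "zo_adam_conserve",
}


def papers_to_cite(optimizer_name, args):
    """Return the paper titles matching an optimizer name and its args dict.

    Assumes (hypothetically) that `args` carries "strategy" for C_Flat
    and "inftyopt" for ZeroFlow.
    """
    if optimizer_name == "C_Flat":
        return [C_FLAT_PAPERS[args["strategy"]]]
    if optimizer_name == "ZeroFlow":
        path = args["inftyopt"]
        if path == "forward_grad":
            return [ZEROFLOW_PAPER, FORWARD_GRAD_PAPER]
        if path in MEZO_STYLE_PATHS:
            return [ZEROFLOW_PAPER, MEZO_PAPER]
        raise ValueError(f"unknown inftyopt value: {path!r}")
    raise KeyError(f"no mapping sketched for optimizer: {optimizer_name!r}")


for title in papers_to_cite("ZeroFlow", {"inftyopt": "zo_adam_sign"}):
    print(title)
```

Optimizers without variant-specific papers (for example SAM or PCGrad) map one-to-one to their table row, so the sketch leaves them out.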
Reproducibility note
When citing INFTY in experiments, report:
- INFTY version or commit hash;
- optimizer name;
- optimizer args dictionary;
- base optimizer and hyperparameters;
- PyTorch version;
- benchmark and task order.
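One way to capture the items above is to write them into a small JSON record alongside each run. The sketch below uses only the Python standard library. The function names, the record keys, and the distribution name `infty` are assumptions made for illustration (adjust the name to however INFTY is installed in your environment); `torch` is the PyTorch distribution name. The example values at the bottom are placeholders, not recommended settings.

```python
import json
import subprocess
from importlib import metadata


def package_version(dist_name):
    """Return the installed version of a distribution, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def git_commit(repo_dir="."):
    """Return the current commit hash in repo_dir, or None if it cannot be read."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def reproducibility_record(optimizer_name, optimizer_args, base_optimizer,
                           base_hyperparameters, benchmark, task_order,
                           infty_dist_name="infty", repo_dir="."):
    """Collect the reporting items from the list above into one dictionary."""
    return {
        "infty_version": package_version(infty_dist_name),  # assumed dist name
        "infty_commit": git_commit(repo_dir),
        "optimizer_name": optimizer_name,
        "optimizer_args": optimizer_args,
        "base_optimizer": base_optimizer,
        "base_hyperparameters": base_hyperparameters,
        "pytorch_version": package_version("torch"),
        "benchmark": benchmark,
        "task_order": task_order,
    }


# Placeholder values for illustration only.
record = reproducibility_record(
    optimizer_name="C_Flat",
    optimizer_args={"strategy": "plus"},
    base_optimizer="SGD",
    base_hyperparameters={"lr": 0.01, "momentum": 0.9},
    benchmark="example-benchmark",
    task_order=[0, 1, 2, 3],
)
print(json.dumps(record, indent=2))
```

Fields that cannot be determined (for example, the commit hash when the code is not run from a git checkout) are stored as `null` rather than guessed, so gaps in the record stay visible.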