Adversarial training (AT) is one of the most effective strategies for
promoting model robustness, yet even state-of-the-art adversarially
trained models struggle to exceed 65% robust test accuracy on CIFAR-10 without
additional data, which is far from practical. A natural way to improve beyond
this accuracy bottleneck is to introduce a rejection option, where confidence
is a commonly used certainty proxy. However, vanilla confidence can
overestimate model certainty when the input is misclassified. To address
this, we propose using true confidence (T-Con), i.e., the predicted probability
of the true class, as a certainty oracle, and learn to predict T-Con by
rectifying confidence. Intriguingly, we prove that under mild conditions, a rectified
confidence (R-Con) rejector and a confidence rejector can be coupled to
distinguish any wrongly classified input from correctly classified ones. We
also show quantitatively that training R-Con to align with T-Con could be an easier
task than learning robust classifiers. In our experiments, we evaluate our
rectified rejection (RR) module on CIFAR-10, CIFAR-10-C, and CIFAR-100 under
several attacks, and demonstrate that the RR module is compatible with
different AT frameworks in improving robustness, at little extra computational cost.
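
To make the distinction between confidence and T-Con concrete, the following is a minimal illustrative sketch in NumPy, not the implementation evaluated in this work. The logits, labels, and helper names (`softmax`, `confidence`, `true_confidence`) are hypothetical and chosen only for illustration. It relies on one elementary fact: for a misclassified input the true class is not the argmax, so its probability cannot exceed 1/2, whereas confidence can be arbitrarily close to 1.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def confidence(probs):
    # Vanilla confidence: probability of the predicted (argmax) class.
    return probs.max(axis=1)

def true_confidence(probs, labels):
    # T-Con: probability assigned to the true class (needs the label).
    return probs[np.arange(len(labels)), labels]

# Hypothetical logits for three inputs whose true label is class 0.
logits = np.array([
    [4.0, 0.0, 0.0],  # correct and confident
    [0.0, 3.0, 0.0],  # wrong but confident
    [1.0, 0.9, 0.0],  # correct but uncertain
])
labels = np.array([0, 0, 0])

probs = softmax(logits)
con = confidence(probs)
tcon = true_confidence(probs, labels)
correct = probs.argmax(axis=1) == labels

# Oracle rejector: accept only if T-Con exceeds 1/2.
# A misclassified input always has T-Con <= 1/2, so it is never accepted.
accept_by_tcon = tcon > 0.5

for i in range(len(labels)):
    print(f"input {i}: correct={correct[i]}, con={con[i]:.3f}, "
          f"tcon={tcon[i]:.3f}, accepted={accept_by_tcon[i]}")
```

In this toy setting the second input receives a confidence above 0.9 despite being wrong, while its T-Con is below 0.1. Because T-Con requires the true label, it is unavailable at test time, which is why it has to be predicted rather than computed directly.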

