Results & Evaluation

Presented at AAAI25, the 39th Annual AAAI Conference on Artificial Intelligence, Philadelphia

Experimental Setup

We evaluated DPPL methods along four dimensions:

Evaluation Dimensions

Privacy Budget ($\varepsilon$)

Tested across privacy budgets ranging from strict ($\varepsilon = 0.01$) to relaxed ($\varepsilon = 10$).
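To make "strict" versus "relaxed" concrete: a smaller $\varepsilon$ forces more noise into whatever is released. The sketch below uses the textbook Laplace mechanism (noise scale $b = \Delta / \varepsilon$ for sensitivity $\Delta$) purely as an illustration; it is not necessarily the mechanism DPPL uses, and DP-SGD itself adds Gaussian noise. The helper names and the counting query are hypothetical.

```python
import numpy as np

# Illustrative only: the textbook Laplace mechanism, not the DPPL or DP-SGD mechanism.
def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Noise scale b = sensitivity / epsilon of the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon

def laplace_release(value: float, sensitivity: float, epsilon: float,
                    rng: np.random.Generator) -> float:
    """Release `value` with Laplace noise calibrated to (epsilon, 0)-DP."""
    return value + rng.laplace(loc=0.0, scale=laplace_scale(sensitivity, epsilon))

rng = np.random.default_rng(0)
# Hypothetical counting query (sensitivity 1) released at several budgets.
for eps in (0.01, 0.1, 1.0, 10.0):
    noisy = laplace_release(1000.0, sensitivity=1.0, epsilon=eps, rng=rng)
    print(f"epsilon={eps:<5} scale={laplace_scale(1.0, eps):<6g} noisy count={noisy:.1f}")
```

Going from $\varepsilon = 10$ to $\varepsilon = 0.01$ multiplies the noise scale by 1000, which is why the strict end of the range is the hard regime.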

Imbalance Ratio

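Evaluated on datasets ranging from balanced (imbalance ratio 1) to highly imbalanced (ratio 100, i.e., the most represented class has 100 times as many samples as the least represented one).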
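This section does not say how the imbalanced datasets were constructed. A common convention for long-tailed CIFAR subsamples class $c$ to $n_c = n_{\max} \cdot \rho^{-c/(C-1)}$ examples, so the largest-to-smallest ratio is exactly $\rho$. The sketch below assumes that exponential profile; `long_tail_counts` is a hypothetical helper, not code from the paper.

```python
def long_tail_counts(n_max: int, num_classes: int, imbalance_ratio: float) -> list[int]:
    """Per-class sample counts decaying exponentially from n_max to n_max / imbalance_ratio.

    Assumed profile (common long-tailed CIFAR convention):
    n_c = n_max * rho ** (-c / (C - 1)).
    """
    if num_classes < 1 or imbalance_ratio < 1:
        raise ValueError("need num_classes >= 1 and imbalance_ratio >= 1")
    if num_classes == 1:
        return [n_max]
    return [
        max(1, round(n_max * imbalance_ratio ** (-c / (num_classes - 1))))
        for c in range(num_classes)
    ]

# Hypothetical example: CIFAR-10-sized classes (5000 training images each).
print(long_tail_counts(5000, 10, 1))    # balanced: every class keeps 5000
print(long_tail_counts(5000, 10, 100))  # largest class 5000, smallest 50
```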

Encoder

Tested with ViT-H-14, ViT-L-16, ViT-B-16, and ResNet-50.

Dataset

Tested on CIFAR-10, CIFAR-100, STL-10, and Food-101.

Key Findings

Dramatic Improvements for Underrepresented Classes

Our most striking result is the large improvement on underrepresented classes at strict privacy budgets such as $\varepsilon = 1.0$:

Classic DP-SGD: 0% accuracy on the smallest minority classes
Previous fairness-oriented approaches: 3% accuracy on minority classes
DPPL methods: 60% accuracy on minority classes

This is a drastic increase in accuracy on underrepresented classes, obtained with no degradation on majority classes, and achieves state-of-the-art results.

Figure: Balanced accuracy of the smallest 25% of classes (CIFAR-100, ViT-H-14)
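Balanced accuracy is the unweighted mean of per-class recall, so every class counts equally regardless of its size (the same definition as scikit-learn's `balanced_accuracy_score`). The sketch below computes it and, for the "smallest 25% of classes" metric, averages recall over the quarter of classes with the fewest training samples. That reading of "smallest" and all helper names are our assumptions.

```python
import numpy as np

def per_class_recall(y_true, y_pred, num_classes: int) -> np.ndarray:
    """Recall of each class; NaN for classes absent from y_true."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = y_true == c
        if mask.any():
            recalls[c] = np.mean(y_pred[mask] == c)
    return recalls

def balanced_accuracy(y_true, y_pred, num_classes: int) -> float:
    """Unweighted mean of per-class recall over the classes present in y_true."""
    return float(np.nanmean(per_class_recall(y_true, y_pred, num_classes)))

def smallest_classes_accuracy(y_true, y_pred, train_counts, fraction: float = 0.25) -> float:
    """Balanced accuracy restricted to the `fraction` of classes with the fewest
    training samples (assumed meaning of 'smallest 25% of classes')."""
    train_counts = np.asarray(train_counts)
    k = max(1, int(np.ceil(fraction * len(train_counts))))
    smallest = np.argsort(train_counts, kind="stable")[:k]
    recalls = per_class_recall(y_true, y_pred, len(train_counts))
    return float(np.nanmean(recalls[smallest]))

# Toy example with made-up predictions; class 3 is the rarest in training.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 0, 2, 2, 0, 0]
train_counts = [100, 50, 20, 5]
print(balanced_accuracy(y_true, y_pred, 4))                     # (1 + 0.5 + 1 + 0) / 4 = 0.625
print(smallest_classes_accuracy(y_true, y_pred, train_counts))  # only class 3 -> 0.0
```

Because the rarest classes are averaged on their own, a model that ignores them scores 0 on this metric even when its overall balanced accuracy looks reasonable.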

Privacy-Utility Trade-off

Figure: Balanced accuracy (CIFAR-100, ViT-H-14)

DPPL methods maintain high accuracy even at very strict privacy budgets ($\varepsilon = 0.1$), substantially outperforming DP-SGD in this regime. As $\varepsilon$ increases, the gap narrows, but DPPL methods remain competitive across all privacy settings tested. The results above are for CIFAR-100 with the ViT-H-14 encoder and 10 samples per class.

Performance on Imbalanced Data

Figure: Balanced accuracy (CIFAR-100, ViT-H-14)

DPPL methods are robust to class imbalance. Even at an extreme imbalance ratio of 100:1 between the most and least represented classes, their accuracy drops only slightly relative to the balanced setting. This is an important advantage over traditional methods, which struggle with imbalanced private data. On CIFAR-100 with the ViT-H-14 encoder, DPPL methods maintained over 85% of their accuracy when trained on highly imbalanced data, whereas DP-SGD approaches suffered a substantial drop.
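"Maintained over 85% of their accuracy" reads most naturally as a relative ratio (accuracy on imbalanced data divided by accuracy on balanced data), not as an absolute drop in percentage points. The sketch below makes that interpretation explicit and contrasts the two quantities; the numbers are hypothetical, not results from the paper, and the helper names are illustrative.

```python
def accuracy_retention(acc_imbalanced: float, acc_balanced: float) -> float:
    """Fraction of balanced-data accuracy retained under class imbalance."""
    if acc_balanced <= 0:
        raise ValueError("acc_balanced must be positive")
    return acc_imbalanced / acc_balanced

def accuracy_drop_points(acc_imbalanced: float, acc_balanced: float) -> float:
    """Absolute accuracy drop in percentage points."""
    return 100.0 * (acc_balanced - acc_imbalanced)

# Hypothetical accuracies, NOT results from the paper.
retention = accuracy_retention(acc_imbalanced=0.70, acc_balanced=0.80)
drop = accuracy_drop_points(acc_imbalanced=0.70, acc_balanced=0.80)
print(f"retained {retention:.1%} of balanced accuracy")  # 87.5%
print(f"absolute drop: {drop:.1f} percentage points")    # 10.0
```

Under this reading, a method that falls from 80% to 70% still clears the 85% retention bar, even though it loses 10 points in absolute terms.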