Changdae Oh¹·²*, Sharon Li², Kyungwoo Song³†, Sangdoo Yun¹†, Dongyoon Han¹†
* Work done during an internship at NAVER AI Lab, † corresponding authors
¹NAVER AI Lab, ²University of Wisconsin--Madison, ³Yonsei University

Adapting a pre-trained foundation model to downstream tasks should ensure robustness against distribution shifts without the need to retrain the whole model. Although existing weight interpolation methods are simple and efficient, we argue that their static nature limits downstream performance. In this work, we propose DaWin, a training-free dynamic weight interpolation method that leverages the entropy of individual models on each unlabeled test sample to assess model expertise and computes per-sample interpolation coefficients dynamically. Unlike previous works that typically rely on additional training to learn such coefficients, our approach requires no training. We further propose a mixture modeling approach that greatly reduces the inference overhead introduced by dynamic interpolation. We validate DaWin on large-scale visual recognition benchmarks spanning 14 tasks: robust fine-tuning on ImageNet and its five derived distribution-shift benchmarks, and multi-task learning across eight classification tasks. Results demonstrate that DaWin achieves significant performance gains in the considered settings with minimal computational overhead. We also discuss DaWin's analytic behavior to explain its empirical success.
- (2025/03/14): Our code is now available!
- (2025/01/22): Our manuscript has been accepted at ICLR 2025🎉🎉
- (2024/10/09): A short version of the preprint has been accepted at NeurIPS 2024 Workshop on Adaptive Foundation Models🎉
- (2024/10/03): Code is under internal review.
- (2024/10/03): Preprint has been uploaded.
Set up the environment with conda:

conda env create -f dawin.yaml
- To reproduce robust fine-tuning experiments, refer to `dawin_rft/` and the corresponding `README.md` for detailed instructions.
- To reproduce multi-task learning experiments, refer to `dawin_mtl/` and the corresponding `README.md` for detailed instructions.
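
At its core, DaWin mixes the pre-trained weights $\theta_{\text{pt}}$ and fine-tuned weights $\theta_{\text{ft}}$ per test sample, using each model's exponentiated negative prediction entropy as its expertise. This is the coefficient rule implemented in the snippet below:

$$\lambda(x) = \frac{\exp(-H_{\text{ft}}(x))}{\exp(-H_{\text{pt}}(x)) + \exp(-H_{\text{ft}}(x))}, \qquad \theta(x) = (1 - \lambda(x))\,\theta_{\text{pt}} + \lambda(x)\,\theta_{\text{ft}},$$

where $H_{m}(x)$ denotes the Shannon entropy of model $m$'s softmax prediction on sample $x$.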
import torch
import torch.nn.functional as F
import clip

def interpolation(alpha, weight1, weight2):
    # convex combination of two state dicts with coefficient alpha
    return {key: (1 - alpha) * weight1[key] + alpha * weight2[key] for key in weight1.keys()}

# `args` is assumed to be an argparse-style namespace with
# model / pt_path / ft_path / device / dsname fields
basemodel, _ = clip.load(args.model, 'cpu', jit=False)

# load the state dicts of the individual models to be interpolated
sd_pt = torch.load(args.pt_path, map_location=torch.device(args.device))
sd_ft = torch.load(args.ft_path, map_location=torch.device(args.device))

# get the entropy of each model's outputs for all test samples
# (get_logits and get_model_from_sd are helper functions provided in this repo)
logit_pt = get_logits(basemodel, dataset_name=args.dsname, state_dict=sd_pt)
logit_ft = get_logits(basemodel, dataset_name=args.dsname, state_dict=sd_ft)
ent_pt = -(F.softmax(logit_pt, dim=1) * F.log_softmax(logit_pt, dim=1)).sum(dim=1)
ent_ft = -(F.softmax(logit_ft, dim=1) * F.log_softmax(logit_ft, dim=1)).sum(dim=1)

# exponentiated negative entropy serves as model expertise to weigh the interpolation
expertise_pt = (-ent_pt).exp()
expertise_ft = (-ent_ft).exp()
lambdas = expertise_ft / (expertise_pt + expertise_ft)

# sample-wise interpolation (w/o Beta Mixture Modeling)
eval_dataloader = torch.utils.data.DataLoader(..., batch_size=1, shuffle=False)
correct, n = 0., 0.
for i, (inputs, labels) in enumerate(eval_dataloader):
    inputs, labels = inputs.cuda(), labels.cuda()
    # batch_size=1 with shuffle=False, so batch index i matches the sample index in lambdas
    merged_sd = interpolation(lambdas[i], sd_pt, sd_ft)
    model = get_model_from_sd(merged_sd, basemodel)
    logits = model(inputs)
    preds = logits.argmax(dim=1, keepdim=True)
    correct += preds.eq(labels.view_as(preds)).sum().item()
    n += labels.size(0)
top1_acc = correct / n
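
The snippet above merges weights once per sample, which is costly at inference time. As noted in the abstract, fitting a mixture model to the coefficients lets all samples assigned to the same mixture component share a single merged model, so only a handful of merges are needed. Below is a minimal sketch of this idea, assuming the variables from the snippet above (`lambdas`, `sd_pt`, `sd_ft`, `interpolation`, `get_model_from_sd`, `basemodel`); for simplicity it uses scikit-learn's GaussianMixture as a stand-in for the paper's Beta mixture, and the component count `K` is a hypothetical choice.

from sklearn.mixture import GaussianMixture

K = 3  # hypothetical number of mixture components
lam = lambdas.cpu().numpy().reshape(-1, 1)

# cluster the per-sample coefficients; each component yields one shared coefficient
gmm = GaussianMixture(n_components=K, random_state=0).fit(lam)
assignments = gmm.predict(lam)    # component id for every test sample
centers = gmm.means_.squeeze(1)   # representative coefficient per component

# build only K merged models instead of one per test sample
models = [get_model_from_sd(interpolation(float(c), sd_pt, sd_ft), basemodel) for c in centers]
# at inference time, route sample i to models[assignments[i]]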
@inproceedings{
oh2025dawin,
title={DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation},
author={Changdae Oh and Yixuan Li and Kyungwoo Song and Sangdoo Yun and Dongyoon Han},
booktitle={International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=L8e7tBf4pP}
}
DaWin
Copyright (c) 2025-present NAVER Cloud Corp.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.