TY - GEN
T1 - Exploiting Feature Heterogeneity for Improved Generalization in Federated Multi-task Learning
AU - Liu, Renpu
AU - Yang, Jing
AU - Shen, Cong
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - In this work, we investigate a general federated multi-task learning (FMTL) problem in which each task may be performed at multiple clients, and each client may perform multiple tasks. Although the tasks share a common representation (i.e., feature map) that can aid learning, the distribution of features in the feature space may vary across tasks and clients, which poses a significant challenge to FMTL. While non-independent and identically distributed (non-IID) local datasets at different clients are often considered detrimental to model convergence in federated learning (FL), such statistical heterogeneity in the feature space may benefit generalization performance. We establish the impact of statistical feature heterogeneity on generalization through the lens of a multi-task linear regression model. To leverage the feature distribution heterogeneity, we propose a novel augmented-dataset-based approach and prove that, under certain conditions, FMTL on heterogeneous datasets can outperform its homogeneous counterpart in terms of generalization performance. The theoretical analysis further yields a simple client weighting method based on optimizing an upper bound on the excess risk. Experimental results on a real-world dataset demonstrate that the proposed method improves generalization performance.
UR - http://www.scopus.com/inward/record.url?scp=85171441302&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85171441302&partnerID=8YFLogxK
DO - 10.1109/ISIT54713.2023.10206757
M3 - Conference contribution
AN - SCOPUS:85171441302
T3 - IEEE International Symposium on Information Theory - Proceedings
SP - 180
EP - 185
BT - 2023 IEEE International Symposium on Information Theory, ISIT 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE International Symposium on Information Theory, ISIT 2023
Y2 - 25 June 2023 through 30 June 2023
ER -