This brief introduction gives a high-level description of the PI's research on the generalization of Deep Neural Networks (DNNs) and his existing work in model compression. The proposed research builds on the PI's prior work but introduces substantially novel methods, as detailed in the proposal.
DNNs have achieved remarkable success across a wide range of applications. However, their large size and computational cost make deployment on resource-constrained devices challenging. Model compression techniques aim to reduce model size and computation while preserving performance. Despite their lower deployment cost, compressed models often suffer noticeable performance drops relative to the original models, such as reduced prediction accuracy, which can lead to catastrophic failures in important applications. It is therefore particularly important to reduce the deployment cost of DNNs through model compression while improving the generalization capability of the compressed models for real-world applications, which is the purpose of the PI's work in model compression.
The PI’s prior work covers extensive theoretical and empirical studies on
the generalization of DNNs from two perspectives: an information-theoretic perspective and
a kernel learning perspective. These works include
sharp generalization bounds for regression with neural networks (ICML’25) [1] and transductive
learning (ICML’25) [2] by kernel complexity, Information Bottleneck (IB)-based token merging
(IEEE TPAMI’25) [3] and pruning (ICML’24) [4] for vision transformers (ViTs), and kernel complexity-reduced training for improved
generalization of ViTs (NeurIPS’24) [5] and Graph Neural Networks (GNNs) for transductive learning (TMLR’25) [6].
The PI has extensive prior work in model compression using various techniques, including channel pruning (IJCV/UAI/AAAI-MAKE) [7,8,9],
weight sharing (ICLR/ICML) [10,11], and neural architecture search (NAS) [7,8,10], all aiming to find DNNs with low deployment cost.
This body of work provides a solid methodological foundation, so that the model compression part of the proposed project can be carried out smoothly.
A distinguishing feature of the PI's research is model compression that preserves the in-distribution generalization capability of the compressed models. In particular, the IB-based token merging [3]
and pruning [4] methods develop a distribution-free and computationally efficient variational upper bound for an IB objective, so that the compressed models maintain
in-distribution prediction accuracy. Furthermore, a principled kernel complexity loss is proposed and reduced during training to improve the in-distribution generalization of popular ViTs [5]
and GNNs [6].
The IB-based and kernel complexity-based methods in the PI’s prior work are primarily designed to improve in-distribution generalization; they do not explicitly address the distribution shifts encountered in real-world scenarios for scientific machine learning (SML). Building upon this foundation, the proposed research develops novel methodologies that extend these principles to explicitly model and control out-of-distribution (OOD) generalization and robustness. It does so by introducing new objectives and learning mechanisms that account for distribution shifts in the compression of SML models, going beyond the scope of the PI’s prior work.