TY - GEN
T1 - A Survey on Small Language Models in the Era of Large Language Models
T2 - 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025
AU - Wang, Fali
AU - Lin, Minhua
AU - Ma, Yao
AU - Liu, Hui
AU - He, Qi
AU - Tang, Xianfeng
AU - Tang, Jiliang
AU - Pei, Jian
AU - Wang, Suhang
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2025/8/3
Y1 - 2025/8/3
N2 - Large language models (LLMs) based on the Transformer architecture are powerful but face challenges in deployment, inference latency, and costly fine-tuning. These limitations highlight the emerging potential of small language models (SLMs), which can either replace LLMs through innovative architectures and technologies or assist them as efficient proxy or reward models. Emerging architectures such as Mamba and xLSTM address the quadratic scaling of Transformer inference with context window length by enabling linear scaling. To maximize SLM performance, test-time compute scaling strategies narrow the performance gap with LLMs by allocating extra compute budget at test time. Beyond standalone usage, SLMs can also assist LLMs via weak-to-strong learning, proxy tuning, and guarding, fostering secure and efficient LLM deployment. Lastly, the trustworthiness of SLMs remains a critical yet underexplored research area. Despite these advances, there is a lack of tutorials on cutting-edge SLM technologies, prompting us to conduct one.
AB - Large language models (LLMs) based on the Transformer architecture are powerful but face challenges in deployment, inference latency, and costly fine-tuning. These limitations highlight the emerging potential of small language models (SLMs), which can either replace LLMs through innovative architectures and technologies or assist them as efficient proxy or reward models. Emerging architectures such as Mamba and xLSTM address the quadratic scaling of Transformer inference with context window length by enabling linear scaling. To maximize SLM performance, test-time compute scaling strategies narrow the performance gap with LLMs by allocating extra compute budget at test time. Beyond standalone usage, SLMs can also assist LLMs via weak-to-strong learning, proxy tuning, and guarding, fostering secure and efficient LLM deployment. Lastly, the trustworthiness of SLMs remains a critical yet underexplored research area. Despite these advances, there is a lack of tutorials on cutting-edge SLM technologies, prompting us to conduct one.
UR - https://www.scopus.com/pages/publications/105014313414
UR - https://www.scopus.com/pages/publications/105014313414#tab=citedBy
U2 - 10.1145/3711896.3736563
DO - 10.1145/3711896.3736563
M3 - Conference contribution
AN - SCOPUS:105014313414
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 6173
EP - 6183
BT - KDD 2025 - Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
Y2 - 3 August 2025 through 7 August 2025
ER -