Abstract

Computational methods in drug discovery significantly reduce both time and experimental costs. Nonetheless, some computational tasks in drug discovery remain daunting for classical computing techniques, and quantum computing may help overcome them. A crucial task in this domain is the functional classification of proteins. A key challenge, however, is adequately representing long protein sequences given the limited number of qubits available on existing noisy quantum computers. We show that protein sequences can be treated as sentences in natural language processing and parsed with the existing Quantum Natural Language Processing (QNLP) framework into parameterized quantum circuits with a manageable number of qubits, which can then be trained to solve various protein-related machine-learning problems. We classify proteins by their sub-cellular location, a pivotal task in bioinformatics that is key to understanding biological processes and disease mechanisms. Leveraging quantum-enhanced processing, we demonstrate that Quantum Tensor Networks (QTNs) can effectively handle the complexity and diversity of protein sequences. We present a detailed methodology that adapts QTN architectures to the specific requirements of protein data, supported by comprehensive experimental results. We demonstrate two distinct QTNs, inspired by classical recurrent neural networks (RNNs) and convolutional neural networks (CNNs), to solve this binary classification task. Our top-performing quantum model achieves 94% accuracy, comparable to a classical model that uses embeddings from the ESM2 protein language model. Notably, ESM2 is far larger: its smallest configuration contains 8 million parameters, whereas our best quantum model requires only around 800.
We demonstrate that these hybrid models exhibit promising performance, showcasing their potential to compete with classical models of similar complexity.
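The abstract describes reading a protein sequence as a "sentence" and wiring it into RNN-inspired and CNN-inspired QTNs. The sketch below illustrates only that wiring idea, in plain Python with no quantum simulation. Everything concrete in it is an assumption for illustration, not the paper's method or the QNLP framework's API: the non-overlapping k-mer tokenization with k = 3, the hypothetical helper names `to_words`, `ladder_schedule` and `tree_schedule`, and the node labels. It lists which pairs of nodes a ladder (RNN-like) layout and a binary-tree (CNN-like) layout would merge, and at what depth.

```python
"""Hypothetical sketch, not the authors' implementation.

Treats a protein sequence as a "sentence" of k-mer "words" and lists the
pairwise merge steps of two tensor-network layouts over those words:
a ladder (RNN-like) and a binary tree (CNN-like). The k-mer size, the
non-overlapping split, and the node names are illustrative assumptions.
"""
from typing import List, Tuple

Step = Tuple[str, str, str]  # (left input, right input, output node)


def to_words(sequence: str, k: int = 3) -> List[str]:
    """Split an amino-acid string into non-overlapping k-mers (last one may be shorter)."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.strip().upper()
    return [seq[i:i + k] for i in range(0, len(seq), k)]


def ladder_schedule(n_words: int) -> Tuple[List[Step], int]:
    """RNN-like layout: a running node absorbs one word per step (depth n - 1)."""
    steps: List[Step] = []
    hidden = "w0"
    for i in range(1, n_words):
        out = f"h{i}"
        steps.append((hidden, f"w{i}", out))
        hidden = out
    return steps, len(steps)


def tree_schedule(n_words: int) -> Tuple[List[Step], int]:
    """CNN-like layout: merge adjacent pairs level by level (depth ceil(log2 n))."""
    level = [f"w{i}" for i in range(n_words)]
    steps: List[Step] = []
    depth = 0
    while len(level) > 1:
        depth += 1
        nxt = []
        for j in range(0, len(level) - 1, 2):
            out = f"t{depth}_{j // 2}"
            steps.append((level[j], level[j + 1], out))
            nxt.append(out)
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an unpaired node moves up unchanged
        level = nxt
    return steps, depth


if __name__ == "__main__":
    words = to_words("MKTAYIAKQR")  # hypothetical toy sequence
    print(words)
    for name, schedule in (("ladder", ladder_schedule), ("tree", tree_schedule)):
        steps, depth = schedule(len(words))
        print(name, "depth", depth, steps)
```

Both layouts perform n - 1 merges for n words, but the tree's depth grows only logarithmically with sequence length, while the ladder processes words strictly in order, much as an RNN does.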

Original language: English (US)
Title of host publication: GLSVLSI 2024 - Proceedings of the Great Lakes Symposium on VLSI 2024
Publisher: Association for Computing Machinery
Pages: 132-137
Number of pages: 6
ISBN (Electronic): 9798400706059
State: Published - Jun 12 2024
Event: 34th Great Lakes Symposium on VLSI 2024, GLSVLSI 2024 - Clearwater, United States
Duration: Jun 12 2024 to Jun 14 2024

Publication series

Name: Proceedings of the ACM Great Lakes Symposium on VLSI, GLSVLSI

Conference

Conference: 34th Great Lakes Symposium on VLSI 2024, GLSVLSI 2024
Country/Territory: United States
City: Clearwater
Period: 6/12/24 to 6/14/24

All Science Journal Classification (ASJC) codes

  • General Engineering
