VarMatch: Robust matching of small variant datasets using flexible scoring schemes

Chen Sun, Paul Medvedev

Research output: Contribution to journal (article, peer-reviewed)

9 Scopus citations


Motivation: Small variant calling is an important component of many analyses, and, in many instances, it is important to determine the set of variants that appear in multiple callsets. Variant matching is complicated by variants that have multiple equivalent representations. Normalization and decomposition algorithms have been proposed, but are not robust to different representations of complex variants. Variant matching is also usually done to maximize the number of matches, as opposed to other optimization criteria.

Results: We present the VarMatch algorithm for the variant matching problem. Our algorithm is based on a theoretical result which allows us to partition the input into smaller subproblems without sacrificing accuracy. VarMatch is robust to different representations of complex variants and is particularly effective in low-complexity regions or those dense in variants. VarMatch is able to detect more matches than either the normalization or decomposition algorithms on tested datasets. It also implements different optimization criteria, such as edit distance, that can improve robustness to different variant representations. Finally, the VarMatch software provides summary statistics, annotations and visualizations that are useful for understanding callers' performance.
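To make the notion of "multiple equivalent representations" concrete, the following is a minimal illustrative sketch and not the VarMatch algorithm itself. It assumes a toy reference string and variants given as hypothetical `(position, ref_allele, alt_allele)` tuples with 0-based positions, and treats two variant sets as equivalent when applying them to the reference yields the same sequence. The function names `apply_variants` and `equivalent` are invented for this example.

```python
def apply_variants(reference, variants):
    """Apply non-overlapping variants to a reference string.

    Each variant is a (pos, ref, alt) tuple with a 0-based position.
    This tuple layout is an assumption made for this illustration.
    """
    pieces = []
    cursor = 0
    for pos, ref, alt in sorted(variants):
        if pos < cursor:
            raise ValueError("variants overlap")
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError("ref allele does not match the reference")
        pieces.append(reference[cursor:pos])  # unchanged sequence before the variant
        pieces.append(alt)                    # substituted allele
        cursor = pos + len(ref)
    pieces.append(reference[cursor:])         # unchanged tail
    return "".join(pieces)


def equivalent(reference, set_a, set_b):
    """Two variant sets match if they produce the same altered sequence."""
    return apply_variants(reference, set_a) == apply_variants(reference, set_b)


reference = "GCACACAT"

# The same deletion of one "CA" unit, reported at two different positions
# inside a low-complexity repeat.
deletion_left = [(0, "GCA", "G")]
deletion_right = [(4, "ACA", "A")]

# A complex variant reported as one record versus two separate SNPs.
complex_single = [(1, "CA", "TG")]
complex_split = [(1, "C", "T"), (2, "A", "G")]

print(equivalent(reference, deletion_left, deletion_right))
print(equivalent(reference, complex_single, complex_split))
print(equivalent(reference, complex_single, [(1, "C", "T")]))
```

Comparing whole altered sequences in this way is only practical on short regions, which is why a way to partition the input into smaller subproblems, as the abstract describes, matters in practice.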

Original language: English (US)
Pages (from-to): 1301-1308
Number of pages: 8
Issue number: 9
State: Published - May 1 2017

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

