TY - GEN
T1 - Solving the minimum string cover problem
AU - Canzar, Stefan
AU - Marschall, Tobias
AU - Rahmann, Sven
AU - Schwiegelshohn, Chris
PY - 2012
Y1 - 2012
N2 - A string cover C of a set of strings S is a set of substrings from S such that every string in S can be written as a concatenation of the strings in C. Given costs assigned to each substring from S, the MINIMUM STRING COVER (MSC) problem asks for a cover of minimum total cost. This NP-hard problem has so far only been approached from a purely theoretical perspective. A previous integer linear programming (ILP) formulation was designed for a special case, in which each string in S must be generated by a (small) constant number of substrings. If this restriction is removed, the ILP has an exponential number of variables, for which we show the pricing problem to be NP-hard. We propose an alternative flow-based ILP formulation of polynomial size, whose structure is particularly favorable for a Lagrangian relaxation approach. By making use of the strong bounds obtained through a repeated shortest path computation in a branch-and-bound manner, we show for the first time that non-trivial MSC instances can be solved to provable optimality in reasonable time. We also provide and solve real-world instances derived from the classic text "Alice in Wonderland". On almost all instances, our Lagrangian relaxation approach outperforms a CPLEX-based implementation by an order of magnitude. Our software is available under the terms of the GNU general public license.
AB - A string cover C of a set of strings S is a set of substrings from S such that every string in S can be written as a concatenation of the strings in C. Given costs assigned to each substring from S, the MINIMUM STRING COVER (MSC) problem asks for a cover of minimum total cost. This NP-hard problem has so far only been approached from a purely theoretical perspective. A previous integer linear programming (ILP) formulation was designed for a special case, in which each string in S must be generated by a (small) constant number of substrings. If this restriction is removed, the ILP has an exponential number of variables, for which we show the pricing problem to be NP-hard. We propose an alternative flow-based ILP formulation of polynomial size, whose structure is particularly favorable for a Lagrangian relaxation approach. By making use of the strong bounds obtained through a repeated shortest path computation in a branch-and-bound manner, we show for the first time that non-trivial MSC instances can be solved to provable optimality in reasonable time. We also provide and solve real-world instances derived from the classic text "Alice in Wonderland". On almost all instances, our Lagrangian relaxation approach outperforms a CPLEX-based implementation by an order of magnitude. Our software is available under the terms of the GNU general public license.
UR - http://www.scopus.com/inward/record.url?scp=84882331322&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84882331322&partnerID=8YFLogxK
U2 - 10.1137/1.9781611972924.8
DO - 10.1137/1.9781611972924.8
M3 - Conference contribution
AN - SCOPUS:84882331322
SN - 9781611972122
T3 - Proceedings of the Workshop on Algorithm Engineering and Experiments
SP - 75
EP - 83
BT - Proceedings of the Meeting on Algorithm Engineering and Experiments, ALENEX 2012
PB - Society for Industrial and Applied Mathematics Publications
T2 - 14th Annual Meeting on Algorithm Engineering and Experiments, ALENEX 2012
Y2 - 16 January 2012 through 16 January 2012
ER -