LIS and network flow

19 minute read

Published:

This post discusses Longest Increasing Subsequence (LIS) and network flow problem.

Two Chains Problem

link
Give a $N$-permutation $\sigma = (a_1, \ldots, a_n)$ and a poset $P_{\sigma}$ associated with it. That is, $P_{\sigma} = (\sigma, \prec)$ such that $(a,b)\prec(x,y)$ if $a < x$ and $b < y$. Find two nonintersecting chains $C_1, C_2$ such that $|C_1\sqcup C_2|$ is maximized. Output $|C_1\sqcup C_2|$.

Note that finding a maximal chain first and picking up another longest one does not work. For example, the array is $(2 3 6 1 4 5)$. Then, the longest chain is $(2 3 4 5)$, but the answer is $(2 3 6) \sqcup (1 4 5)$.

Sol 1 - DP

Let $A^m_{xy}$ denote the maximum size of two chain covering in $(a_1, \ldots, a_m)$ such that the two chains end with $x$ and $y$ respectively. Here, to make math concepts reflect the program code, I use superscripts to represent the iteration of algorithm (i.e. loop variables) and subscripts for indices of arrays.

At the $m$-th step, we try to pick $a_m$ to one of the chain, and let another chain remain $x$. Thus, the best length is $g^m_{x} = 1 + \max_{y=1}^{a_i}{A^{m-1}_{xy}}$. Then we can update $A$ as: \(A^m_{a_i,y} = \max(A^{m-1}_{a_i,y}, g^m_y) \\ A^m_{x,a_i} = \max(A^{m-1}_{x,a_i}, g^m_x)\) Other cells do not change.

Note that we can use a segment tree to handle the second dimension of $A$. Thus, the overall complexity is $O(N^2\log(N))$.

Sol 2 - Cost Flow

This problem can be reduced to a min-cost circulation problem as follows. Let node $s$ and $t$ denote the “source” and “destination”. We need two chains, so add a flow $t\to s$ with $(min=2,max=2, cost=0)$ For each element $a_i$, we build two nodes $p_i$ and $q_i$ for it. Add edges $s\to p_i : (0,1,0)$, $p_i\to q_i: (0, 1, -1)$, $q_i\to t: (0, 1, 0)$. Now we need to add edges among elements to represent the chain relationship.

The simplest solution is to add $q_i\to p_j: (0, 1, 0)$ for all $i < j\wedge a_i < a_j$. However, this adds up to $N^2$ edges into the graph. Since $\prec$ is a partial order, we can use the transtivity to optimize the graph. That is, we only link $a_i$ to the antichain covering $a_i$:

\[A_i = \{ j: a_i\prec a_j \wedge \forall k \neg(a_i\prec a_k\prec a_j) \}\]

Add $q_i\to p_j: (0, 1, 0)$ for all $j\in A_i$. To let a flow bypass a node, we can add one of the follows:

  • $p_i\to q_i: (0, \infty, 0)$ for all $i$;
  • $p_i\to p_j: (0, 1, 0)$ for all $j\in A_i$.

In the worst case we can still have $N^2$. For example, $N = 2N’$ and $\sigma = (N’, N’-1, \ldots, 1, 2N’, 2N’-2, \ldots, N’+1)$. But this is more efficient in most cases, especially when chains are long, since any two elements that are connected must be neighbours in a saturated chain.

Two-Chain Covering Problem

Give $P_{\sigma}$ and a saturated chain $C$. Find two disjoint chains $C_1, C_2$ such that $C \subseteq C_1\sqcup C_2$ and the total length is maximized. Output a feasible pair of chains $C_1, C_2$. Of course cost flow works. But the condition that $C$ is saturated gives us more chances.

This problem is proposed by me. I’m not sure whether there are more solutions if only the length is required. I think my DP is the best algorithm if we need the chains.

Sol 1 - DP

A naive thinking is letting $C_1=C$ and $C_2$ be another LIS. This is wrong of course, but we can improve it. (PS: a longer counter example is $(241369578)$ and $C=(23678)$, where $C$ needs to be broken into $\geq 3$ fragments.) Note that $C$ can be segmented into subsequences, where each segment is either in $C_1$ or $C_2$. Suppose $C=(c_1, \ldots, c_m)$ and $C_1 = X = (x_1, \ldots, x_u)$, $C_2 = Y = (y_1, \ldots, y_v)$. For example, assume $(c_l, \ldots, c_r)$ is in $X$. Then, $c_{l-1}$ and $c_{r+1}$ are in $Y$: $y_{l’-1} = c_{l-1}$, $y_{r’+1} = c_{r-1}$. Thus, if we replace $(c_l, \ldots, c_r)$ with $(y_{l’}, \ldots, y_{r’})$ in $C$, $C$ will be still a chain. Based on this observation, consider interval-based dynamic programming (区間DP).

Let $d_i$ denote the maximal length when we only consider $(c_1, \ldots, c_i)$ and $(a_1, \ldots, a_{s(i)})$ where $a_{s(i)}$ is $c_i$. Let $f_{lr}$ denote the how many element in $\sigma\setminus C$ we can choose to make a chain between $a_l$ and $a_r$. Clearly, $d_x = \max_{y=1}^{x-1}{(d_y + f_{s(y-1),s(x)})}$. Let $w_i = 0$ for all $i$ s.t. $a_i$ in $C$ and $w_i = 1$ for all $a_i$ not in $C$. We know that $f_{i,i} = w_i$ and $f_{l,r} = w_l + \max{ f_{i,r} : l+1 \leq i\leq r\wedge a_l < a_i < a_r }$. Here the calculation of $f_{l,r}$ is costly, so we want to optimize it. Similar to the optimization used in cost flow, let $A_l$ be the antichain covering $a_l$:

\[A_l = \{ j: a_l\prec a_j \wedge \forall k \neg(a_l\prec a_k\prec a_j) \}\]

And we only need to consider those $i$ from $A_l$:

\[f_{l,r} = w_l + \max\{ f_{i,r} : i\in A_l \wedge i < r \wedge a_i < a_r \}\]

If this algorithm can calculate $f_{l,r}$ in $O(N^2)$, the overall complexity will become $O(N^2)$. Well, it is false in some rare cases, but true most of the time.

$P_\sigma$ is a graded poset. All $A_l$ are actually part of an antichain decomposition. Therefore, the time complexity is bound by $O(N^3 / h)$ and the worst case is

\[\sigma = (x, x-1, \ldots, 1, 2x, 2x-1, \ldots, x+1, 3x, 3x-1, \ldots, 2x+1)\]

where $h=3$ and our algorithm is $O(N^3)$. However, the shorter $h$ is, the easier we can find another long chain.

Min-Cost Flow/Circulation Problem

Given a digraph $D=(V,E)$ with capacity function $c: E\to\mathbb{R}$, cost function $k: E\to\mathbb{R}$, supply-demand function $d: V\to\mathbb{R}$ A feasible flow is a function $f: E\to\mathbb{R}$ s.t. (capacity constraints) $0 \leq f(e)\leq c(e)$ for all $e\in E$ and (flow conservation) $\sum_{uv\in E}{f(uv)} - \sum_{vu\in E}{f(vu)} = d(u)$ for all $u\in V$. In most of the cases, $d(s)=d$, $d(t) = -d$, $d(u) = 0$ for all $u\in V\setminus{ s,t }$. In the circulation problem, $d(u)=0$ for all $u\in V$.

Negative Circuit Condition

Given a feasible flow $f$, define its residue network to be $D_f=(V,E_f = E\sqcup E^{-1})$, where $E^{-1} = \{ e^{-1}=(v,u) : e=(u,v)\in E \}$, with $c_f(e) = c(e) - f(e)$, $c_f(e^{-1}) = f(e)$, $k_f(e) = k(e)$, $k_f(e^{-1}) = -k(e)$. Then, $f$ is a min-cost flow if and only if there is no negative circuit with non-zero capacity in $D_f$.

Proof: $\Rightarrow$ is clear, as we can always augment on a negative circuit to get a better flow.

To prove $\Leftarrow$, suppose all circuits $C\in D_f$ has a non-negative cost $c_f(C) \geq 0$. Let $f’$ be an arbitrary feasible circulation. Then, by flow conservation, $f’-f$ is the sum of several circuits in $D_f$: $f’-f = \sum {\lambda_i C_i}$. Thus, $cost(f’-f) = cost(f’) - cost(f) = \sum {\lambda_i c_f(C_i)} \geq 0$.

Complementary Slackness Condition

The linear programming equiation is: (here $u\to v$ denotes $(u,v)\in E$)

\[\begin{eqnarray} \min \quad & \sum_{u\to v}{k(uv)f(uv)} \nonumber \\ \text{s.t.} \quad & \sum_{u\to v}f(uv) - \sum_{v\to u}f(vu) &=& d(u) && \forall u\in V \nonumber \\ & -f(uv) &\geq& - c(uv) && \forall u\to v \nonumber \\ & f(uv) &\geq & 0 && \forall u\to v \end{eqnarray}\]

The first constraint can be written as $\mathbf{B}\mathbf{f}=\mathbf{d}$, where $\mathbf{B}$ is the incidence matrix. Recall that a digraph’s incidence matrix is always totally unimodular, so the linear programming automatically gives an integer solution if $c$, $d$ and $k$ are all integers.

Its dual programming is:

\[\begin{eqnarray} \max \quad & \sum_{u}{d(u)\pi(u)} - \sum_{u\to v}{c(uv)z(uv)} \nonumber \\ \text{s.t.} \quad & \pi(u) - \pi(v) - z(uv) &\leq& k(uv) && \forall u\to v \nonumber \\ & z(uv) &\geq & 0 && \forall u\to v \end{eqnarray}\]

Recall how we write dual programming equations:

  • Transpose the matrix, swap coefficients in the goal and right hand side in constraints.
  • For every $\geq$ constraint in $\min$ ($\leq$ constraint in $\max$), let the dual varible be non-negative.
  • For every $=$ constraint, let the dual variable be free.
  • For every non-negative variable, let the dual constraint be $\leq$ in $\min$ ($\geq$ in $\max$).
  • For every free variable, let the dual constraint be $=$.

Look at the dual programming, observe that:

  • the absolute values of $\pi$ (called potential) does not matter. Relative values make sense.
  • Since all $c(uv)$ are positive, we can always take $z(uv) = \max(\pi(u)-\pi(v)-k(uv), 0)$ to get a optimized solution.

By the complementary slackness theorem, if $f$ and $\pi, z$ are feasible solutions to two programmings respectively, then they are optimal if and only if \(\begin{eqnarray} f(uv)[\pi(u) - \pi(v) - z(uv) - k(uv)] &=& 0 \nonumber \\ z(uv)[f(uv) - c(uv)] &=& 0 \end{eqnarray}\)

Substitute for $z(uv)$ and get the the following condition: $f$ is a min-cost flow if and only if there exists a potential function $\pi$ s.t.

  • $\pi(u) - \pi(v) \leq k(uv)$ for all $f(uv) < c(uv)$
  • $\pi(u) - \pi(v) \geq k(uv)$ for all $f(uv) > 0$
  • Note:
    • Some book uses $p(uv) = -\pi(uv)$ so $\leq$ and $\geq$ are interchanged.
    • When $0 < f(uv) < c(uv)$, we have $\pi(v) = \pi(u) - k(uv)$.
    • We can take $\pi(s) = 0$ if we want to calculate $\pi$.

Let $k(vu) = - k(uv)$ for all $u\to v$. Take a cycle $C$ with non-zero capacity from the residue network $D_f$, sum the inequality up, and get: \(\sum_{uv\in C}{k(uv)} \geq 0\) which is exactly the negative circuit condition.

Feasible Tree Theorem

In simplex algorithm of linear programming, one jumps among vertices to get an optimal solution. Then, a natural problem is what a vertex means in a digraph. Consider the edges with $0 < f(uv) < c(uv)$ in the residue graph. If there is a cycle $C$, it must have zero cost. Thus, we can flow in either direction until the cycle breaks. This cycle is corresponding to a hyperplane perpendicular to the target, and the two directions give two optimal vertices. Suppose $D$ is weakly connected. Then, we only need to consider case that all non-tight arcs form a spanning tree.

Theorem: A set of columns of the incidence matrix $\mathbf{B}$ is a basis if and only if its arcs form a spanning tree.

Proof: Since $\mathbf{B}$ is weakly connected, $\mathrm{rank}(\mathbf{B}) = n-1$. Let $B_1$ denote a set of $n-1$ columns. If $B_1$ is a basis, then $\mathrm{rank}(B_1) = n-1$, which means there is no cycle in its arcs. Thus, it must be connected and a spanning tree. If $B_1$ is a spanning tree, then clearly $\mathrm{rank}(B_1) = n-1$. Thus, $B_1$ is a basis.

Pivoting in the (residue) network is:

  • Select a non-tree edge $e=(u,v)$ such that the entering number $k^{\pi}(uv) = k(uv) - \pi(u) + \pi(v) < 0$.
    • $\pi$ is calculated along the tree by setting the root $\pi(s) = 0$.
  • Augment through the cycle it makes until some edge is tight ($f = 0$ or $f = c$)
    • The augment amount is the minimum capacity of all edges on the cycle, which can be zero.
  • Remove the tight edge and move the new edge in.

Similar to Bland Rule, one can prove that this algorithm does not fall into an infinite loop with some rules obeyed.

Lemma: Fix a root node $s$ in a feasible tree. Call a feasible tree strong, if all degenerated edges with $f(uv) = 0$ is pointing away from the root (and similarly edges that $f(uv)=c(uv)$ is pointing towards the root). A strong tree will only pivot into another strong tree. And if one always remove the first edge going from $s$ in pivoting, all strong trees will be different.

Greene-Kleitman Theorem

Theorem: Let $D=(V,E)$ be an DAG, $B\subseteq E$ and $k\in\mathbb{Z}_+$. Then, for any collection of at most $k$ directed cuts $\mathcal{C}$, and any collection of any number of paths $\mathcal{P}$, \(\max_{\mathcal{C}}|B\cap\Gamma| = \min_{\mathcal{P}}[|B \setminus \Pi| + k|\mathcal{P}|]\) where $\Gamma = \bigcup\mathcal{C}$ and $\Pi = \bigcup\mathcal{P}$.

Note:

  • Here, a directed cut is all edges going out of a set $S$ into $T=V\setminus S$, $\delta^{out}(S)=\delta^{in}(T)$, where $T \neq \varnothing$ and $\delta^{out}(T) = \varnothing$.
  • Some books define S-T cut as $\delta^{out}(S)$, $s\in S$ and $t\in T$, without requiring $\delta^{out}(T) = \varnothing$. The minimum S-T cut under this definition equals to the minimal directed cut.

Proof: To see $\max \leq \min$: \(|B\cap\Gamma| \leq |B\setminus\Pi| + |\Pi\cap\Gamma| \leq |B\setminus\Pi| + k|\mathcal{P}|\) because one path can intersect with each cut at most once.

Note: The intersection property does not apply to S-T cut. For example, $V = \{ s,a,b,t \}$, $E=\{ sa,sb,ab,at,bt \}$, $S = \{ s,b \}$ and $T = \{ a, t \}$. Then, the S-T cut contains 2 edges $\{ sa, bt \}$, and path $s\to a\to b\to t$ passes both.

To prove equality, suppose $D$ has a single source $s$ and a single sink $t$, and construct $\mathcal{P}, \mathcal{C}$ that make both sides equal.

Make a network $D’ = (V, E\cup B’\cup \{ (t,s) \})$, where $B’ = \{ b’=(u,v) : b=(u,v)\in B \}$ is a copy of $B$. Assign

  • $c(uv) = \infty, k(uv) = 0$ for $uv\in E$
  • $c(uv) = 1, k(uv) = -1$ for $uv\in B’$
  • $c(ts) = \infty, k(ts) = k$

Let $f$ be a min-cost circulation and $p$ be the negative of potential function $\pi_f$ such that $p(t) = 0$. More specifically,

  • $p(v)\leq p(u)$ for $a=uv\in A$, with equality if $f(a=uv)>0$.
    • This ensures $p(u) \geq 0$ for all $u$.
  • For $a’=uv\in B’$
    • $p(v)\leq p(u) - 1$ if $f(a’=uv) = 0$
    • $p(v)\geq p(u) - 1$ if $f(a’=uv) = 1$
  • $p(s) \leq p(t) + k$, with equality if the flow is not empty $f(ts) > 0$

Take $f$ as the collection of paths $\mathcal{P}$. The negative part of cost of $f$ (contributed by $B’$) is exactly $|B\cap \Pi|$, and the positive part is $k|\mathcal{P}|$. Thus, the right hand side of the equation is $|B| + \mathrm{cost}(f)$.

Note: Not clearly written in the book, but here we use the condition that $D$ is acyclic. Otherwise, $f$ may contain a cycle that does not pass $s$ and $t$.

Take $k$ cuts as follows. For $i= 1,\ldots, v$,let $U_i = \{v\in V : p(v) \geq i\}$. Take $\delta^{out}(U_i)$ as (at most) $k$ cuts. Since $p(u)\geq p(v)$, for all $uv\in E$, $\delta^{in}(U_i) = \varnothing$.

For all $uv\in B’$, if $f(uv) = 0$, then $p(v)\leq p(u) - 1$, so the original arc $uv\in\delta^{out}(U_u)$. That is, an arc in $B$ which is not in a path must be included in one of the cut ($B\setminus\Pi\subseteq B\cap\Gamma$). Thus, $B\setminus \Pi = (B\cap \Gamma)\setminus\Pi$.

On the other hand, for an arc in a cut, it may or may not be included in a path. ($p(v)\leq p(u) \leq p(u+1)$ for $f(a’ = uv) = 1$) If it is included($|B\cap\Gamma\cap\Pi|$), then must be on only exactly one path ($p(u)=p(v)$ for $f(a=uv) > 0$). Actually,

\[\begin{eqnarray} |B\cap\Pi\cap\Gamma| &=& \sum_{a=uv\in B}{(p(u)-p(v))f(a')} \nonumber \\ &=& \sum_{a=uv\in E}{(p(u)-p(v))f(a)} + \sum_{a=uv\in B}{(p(u)-p(v))f(a')} \nonumber \\ &=& - (p(t) - p(s))f(ts) \nonumber \\ &=& k|\mathcal{P}| \end{eqnarray}\]

The last step comes from flow conservation and that a potential function sums up to $0$ on a circuit.

Thus,

\[|B\cap\Gamma| = |B\cap\Gamma\cap\Pi| + |B\cap\Gamma\setminus\Pi| = |B\setminus\Pi| + k|\mathcal{P}|\]

Theorem: (the dual) For any collection of at most $k$ paths $\mathcal{P}$, and any collection of any number of directed cuts $\mathcal{C}$,

\[\max_{\mathcal{P}}|B\cap\Pi| = \min_{\mathcal{C}}[|B \setminus \Gamma| + k|\mathcal{C}|]\]

where $\Gamma = \bigcup\mathcal{C}$ and $\Pi = \bigcup\mathcal{P}$.

Proof: Use the network $D’$ we build before, but change the $ts$ arc to $c(ts)=k$, $k(ts)=0$. Note that this is exactly what we do in the “Two Chains Problem”. The rest of the proof is similar.

Theorem: In a poset $P=(X,\prec)$, the maximum size of the union of $k$ antichains is equal to:

\[\min_{\mathcal{C}}\sum_{C\in\mathcal{C}}{\min(k, |C|)}\]

where $\mathcal{C}$ ranges over disjoint chain partitions.

Proof: Build the network $D=(V,E)$ as follows:

  • $V = X\sqcup X’$, where $X’$ is a copy of $X$. Let $p_i$ and $q_i$ denote the the $x_i$ in $X$ and $X’$, respectively.
  • $E = \{(p_i,q_i): \forall x_i\in X\}\sqcup\{ (q_i, p_j): \forall x_i\prec x_j \}$
    • Recall what we do in the “Two Chains Problem”.
  • Let $B = \{(p_i,q_i): \forall x_i\in X\}$. Apply the theorem above. For each arc $(p_i,q_i)\notin\Pi$, we take a singleton $\{x_i\}$.

Theorem: (the dual) The maximum size of the union of $k$ chains is equal to:

\[\min_{\mathcal{A}}\sum_{A\in\mathcal{A}}{\min(k, |A|)}\]

where $\mathcal{A}$ ranges over disjoint antichain partitions.

Theorem: The difference of numbers of the union of $k$ chains & the union of $k$ antichains are conjugate partitions of $P$.

RSK correspondence

Robinson-Schensted Bijection

Recall how we calculate LIS:

  • Let $P_k^{(i)}$ be the least possible end of an increasing sequence of length $k$, out of first $i$ elements.
    • Initialize this sequence empty $P_k^{(0)}$.
  • When we move to $i$, we try to replace the first element greater than $a_i$ from $P_k^{(i-1)}$
    • Suppose $P_j^{(i-1)} > a_i$ is the first such element, then $P_j^{(i)} = a_i$ and $P_k^{(i)} = P_k^{(i-1)}$ for $k\neq j$.
    • If $a_i$ is larger than all $P_k^{(i-1)}$, append it to the last.

At last, the length of $P_k^{(n)}$ is the length of LIS. We call this operation “inserting $a_n$ into a row”.

Now let us modify this algorithm a bit so no element are dropped:

  • Initialize two empty standard Young tableaux $(P,Q)$.
  • For $i=1$ to $n$, insert $a_i$ into the first row of $P$.
    • If some element gets out, insert it into the next row; repeat this until no element is bumped out.
    • At last, write $i$ in $Q$ at the position we finish the insertion in $P$.

It is easy to prove that $(P,Q)$ are standard Young tableaux. Also, note that each step is invertible: we drop the largest element from $Q$, and pop the corresponding element in $P$ up. Thus, this algorithm gives a correspondence between a permutation and a pair of standard Yound tableaux of the same shape.

\[\sum_{\lambda\vdash n}\#SYT(\lambda)^2 = n!\]

With proof omitted we have the following theorems:

  • Row insertion and column insertion commute.
  • The $P$ of the reversed sequence is the transpose.
  • If $a_i$ enters $P$ at the $j$-th column, then the longest increasing sequence ending at $a_i$ has length $j$.

Knuth Correspondence

Knuth relation: Suppose we have $a_{i-1}< a_{i} >a_{i+1}$. Swap $a_{i}$ with $\min(a_{i-1}, a_{i+1})$, and we will get $a’_{i-1} > a’_{i} < a’_{i+1}$. We can also do this reversely.

Two permutations are Knuth equivalent if one can be flipped to another with a sequence of Knuth relations.

Theorem: Two permutations are Knuth equivalent if and only if they share the $P$ tableaux.

k-LIS

Theorem: The longest total length of $k$-increasing subsequences is the total size of first $k$ rows of $P$.

  • Let $A_P$ be the permutation concatenate the rows of $P$ bottom up. Clearly, the theorem is true for $A_P$.
  • It is proven that Kunth relations preserve $k$-LIS.

References

  • Igor Pak, UCLA MATH206A, Fall 2020.
  • Alexander Schrijver, Combinatorial Optimization, Chapter 14, Springer, 2004.
  • Bruce Sagan, The Symmetric Group, Second Ed., Chapter 3, Springer, 2000.