<p>Xinyu Ma (Ph.D. Candidate in CS), xinyuma@ucla.edu</p> <p><strong>Local and Global in Math</strong> (2022-11-20, https://zjkmxy.github.io/posts/2022/11/local-global)</p> <p>Local vs Global seems to be one of the most important philosophies behind math. People in some fields call it “compactness”, others don’t. In short, it says that <em>the property of an object is related to the property of every smaller subobject</em>.</p> <p>In this post, I list some results I know, basically as personal notes. There are many more theories related to or inspired by this idea.</p> <h1 id="logic">Logic</h1> <h2 id="compactness">Compactness</h2> <p>Topological compactness is probably the first local-global relation that one learns at university. It generalizes the behavior of a closed bounded interval of $\mathbb{R}$. A space is called <strong>compact</strong> if every open cover has a finite subcover. Besides this, there is a concept called <strong>sequential compactness</strong>, stating that every sequence has a convergent subsequence. For a metric space, these two conditions are equivalent. A subset of Euclidean space is compact if and only if it is closed and bounded (<a href="https://en.wikipedia.org/wiki/Heine%E2%80%93Borel_theorem">Heine-Borel Theorem</a>). The product space of compact spaces is compact (<a href="https://en.wikipedia.org/wiki/Tychonoff%27s_theorem">Tychonoff’s theorem</a>).</p> <p>To some degree, first-order logic (FOL) gives a compact space. Hence comes the compactness theorem: a theory has a model iff every finite subset of it has a model. Intuitively, every proof is of finite length, so a contradiction, if there is one, must be derived from finitely many axioms of the theory. And the completeness theorem shows that every consistent theory has a model. 
Compactness and the downward Löwenheim-Skolem theorem are used to characterize FOL.</p> <p>A limit ordinal $\lambda$ is the union of all ordinals below it, so some properties automatically hold at $\lambda$ if they hold for all ordinals below $\lambda$. This helps with transfinite induction.</p> <p>König’s lemma states that every infinite tree of finite width (i.e. with every level finite) has an infinite branch. Unfortunately this fails at $\omega_1$ (<a href="https://en.wikipedia.org/wiki/Aronszajn_tree">Aronszajn tree</a>).</p> <h2 id="reflection-theorem">Reflection Theorem</h2> <p>The <a href="https://en.wikipedia.org/wiki/Reflection_principle#In_ZFC">reflection theorem</a> shows that for the cumulative hierarchy $V_{\alpha}$, a formula is true in $V$ iff it is true in $V_{\alpha}$ for a club of ordinals $\alpha$.</p> <h2 id="forcing">Forcing</h2> <p>It is possible for an (inner) model of set theory to recognize every finite piece of an object, but not understand the object itself.</p> <p>Forcing adjoins this non-existent object to make a desired proposition true.</p> <h1 id="geometry">Geometry</h1> <p>Unsurprisingly, most local-global properties here come from smoothness. A manifold $M$ is defined to be a topological space that is locally $\mathbb{R}^n$, so (via a partition of unity) a smooth function on $M$ can be written as a sum of smooth functions on $\mathbb{R}^n$.</p> <h2 id="residue-theorem">Residue Theorem</h2> <p>Let $C$ be a simple closed, positively oriented contour in the complex plane, and $f$ a function analytic (complex differentiable) on and inside $C$ except at finitely many points inside $C$. Then, the contour integral of $f$ along $C$ is $2\pi i$ times the sum of the residues at those points. We can read this as saying that being analytic is so strong a condition that most information about the function is stored in its exceptional points.</p> <h2 id="stokes-theorem">Stokes Theorem</h2> <p>The Stokes theorem roughly says “taking the boundary is the dual operation of taking the differential”. 
For an oriented $n$-dimensional manifold $M$ with boundary $\partial M$, and an $(n-1)$-form $\omega\in\Omega^{n-1}(M)$ with compact support, we have</p> $\int_{\partial M} \omega = \int_{M} d\omega$ <h3 id="proof">Proof</h3> <p>We use the following convention: suppose that in the local coordinates $x_1, \ldots, x_n$, $M$ is given by $x_n \geq 0$ and the boundary by $x_n = 0$. For the orientation $dx_1\wedge \cdots \wedge dx_n$ of $M$, we set the induced orientation of the boundary to be $(-1)^n dx_1\wedge \cdots \wedge dx_{n-1}$. Using a partition of unity, express $\omega = \sum_{\alpha}\omega_{\alpha}$, where each $\omega_{\alpha}$ is supported within a local coordinate chart $U_{\alpha}$. It suffices to prove the theorem for each $\omega_{\alpha}$. Suppose</p> $\omega_{\alpha} = a_1dx_2\wedge\cdots\wedge dx_n - a_2dx_1\wedge dx_3\wedge\cdots\wedge dx_n + \cdots + (-1)^{n-1} a_ndx_1\wedge\cdots\wedge dx_{n-1}$ <p>Then, we have</p> $d\omega_{\alpha} = \left( \frac{\partial a_1}{\partial x_1} + \cdots + \frac{\partial a_n}{\partial x_n} \right) dx_1\wedge\cdots\wedge dx_{n}$ <p>Since each $a_i$ has compact support, the terms with $i&lt;n$ integrate to zero, and thus</p> $\begin{eqnarray} \int_{M} d\omega_{\alpha} &amp;=&amp; \int_{x_n\geq 0} \left( \frac{\partial a_1}{\partial x_1} + \cdots + \frac{\partial a_n}{\partial x_n} \right) dx_1\cdots dx_{n} \nonumber \\ &amp;=&amp; \int_{\mathbb{R}^{n-1}} a_n\Big|^{x_n=\infty}_{x_n=0} dx_1\cdots dx_{n-1} \nonumber \\ &amp;=&amp; - \int_{\mathbb{R}^{n-1}} a_n(x_1,\ldots, x_{n-1},0) dx_1\cdots dx_{n-1} \nonumber \\ &amp;=&amp; \int_{\partial M} \omega_{\alpha} \end{eqnarray}$ <h3 id="application">Application</h3> <p>The theorem is a generalization of the fundamental theorem of calculus as well as a set of similar formulas in $\mathbb{R}^2$ and $\mathbb{R}^3$.</p> <p>It can also be used to prove one version of the Brouwer fixed-point theorem: on a closed ball $B=\{ x\in\mathbb{R}^n : |x|\leq 1 \}$, every smooth map $F:B\to B$ has a fixed point.</p> <p>Proof: suppose, for contradiction, that $F$ has no fixed point. 
Then we can define a smooth map $f:B\to \partial B$ by letting $f(x)$ be the point where the ray from $F(x)$ through $x$ meets $\partial B$. By construction, $f$ is the identity on the boundary $\partial B = S^{n-1}$. Take a volume form $\omega$ on $\partial B$, normalized so that $\int_{\partial B}\omega = 1$. Now pull back $\omega$ along $f$ and apply the Stokes theorem, noting that $d\omega = 0$ because $\omega$ is a top-degree form on $\partial B$:</p> $1 = \int_{\partial B}\omega = \int_{\partial B}f^*\omega = \int_{B}d(f^*\omega) = \int_{B}f^*(d\omega) = 0$ <p>Contradiction.</p> <!--De Rham Cohomology ----- In $\\mathbb{R}^n$, closed forms are always exact forms. But it is not the case for an arbitrary manifold. The de Rham cohomology basically measures how closed forms can fail to be exact. TBD--> <!--Chern class ------ --> <!--TBD Need to have more understanding on this --> <h1 id="algebra">Algebra</h1> <p>In algebra, primes are analogous to points, so “local” means to focus on one prime.</p> <h2 id="localization">Localization</h2> <p>Localization focuses on a selected set of primes of a ring by adjoining inverse elements to kill the other primes. Specifically, given a multiplicative set $S$, $S^{-1}R$ adjoins $s^{-1}$ for all $s\in S$. Typically, $S$ is the complement of a prime ideal $p$, which gives $R_{p} = (R-p)^{-1}R$. Then, it does something similar to a quotient but in the reverse direction: $R/p$ kills all primes not containing $p$ and makes $p$ the minimal prime ($0$); $R_{p}$ kills all primes not contained in $p$ and makes $p$ the maximal prime ($p_p = pR_p$). Both operations can be applied to modules via extension of scalars: $M/pM = R/p\otimes_R M$, $M_p = R_p\otimes_R M$. Localization is an exact functor, whereas the quotient is only right exact. 
The amazing point is that a lot of properties can be recovered from all localizations (these are called local properties):</p> <ul> <li>$M=0$ if and only if $M_p = 0$ for all (maximal) primes $p$.</li> <li>$M\to N$ is injective/surjective/bijective if and only if the induced map $M_p\to N_p$ is injective/surjective/bijective for all (maximal) primes $p$.</li> <li>$M$ is a torsion-free/flat module if and only if all $M_p$ are so for (maximal) primes $p$.</li> <li>Injectivity and projectivity are not local properties, but they are preserved by localization (for injectivity, at least when $R$ is noetherian).</li> <li>$S^{-1}(M\otimes_R N)$ is isomorphic to $S^{-1}M \otimes_{S^{-1}R} S^{-1}N$.</li> <li>If $M$ is finitely presented, then $S^{-1}\operatorname{Hom}_{R}(M,N)$ is isomorphic to $\operatorname{Hom}_{S^{-1}R}(S^{-1}M,S^{-1}N)$. <ul> <li>When $R$ is noetherian, finitely presented can be replaced by finitely generated.</li> <li>The proof sketch: if $M$ is free, take a basis $\{m_1,\ldots, m_n \}$ of $M$. For any homomorphism $g: S^{-1}M\to S^{-1}N$, take a common denominator $s\in S$ of the images of the basis elements (e.g. the product of their denominators); then $g$ is of the form $f/s$ for some $f:M\to N$. If $M$ is not free, take a finite presentation of it and apply the five lemma.</li> </ul> </li> </ul> <h2 id="ingetral-extension">Integral extension</h2> <p>An integral extension of rings is the analogue of an algebraic extension of fields. In an $A$-algebra $B$, an element $x\in B$ is integral over $A$ if it is a root of a <em>monic</em> polynomial with coefficients in $A$. The ring $B$ is integral over $A$ if every element in $B$ is integral over $A$. $x$ is integral over $A$ if and only if $A[x]$ is finite over $A$ (i.e. finitely generated as an $A$-module). Thus, an integral extension $B$ can be considered as a union of finite extensions. If $B$ is both integral and of finite type over $A$ (i.e. 
finitely generated as an $A$-algebra), then it is finite over $A$.</p> <p>Just as every field extension can be split into an algebraic part and a transcendental part, an algebra over a field can also be split into an integral part and an algebraically independent part. Namely, Noether’s normalization lemma states that for every finitely generated $k$-algebra $A$, we can find algebraically independent elements $y_1,\ldots,y_d$ s.t. $A$ is integral over $k[y_1,\ldots,y_d]$, where $d$ is exactly the dimension of $A$. This leads to Hilbert’s Nullstellensatz: for any algebraically closed field $k$, the algebraic sets of the affine space $\mathbb{A}^n_k$ are in one-to-one correspondence with the radical ideals of the ring $k[X_1, \ldots, X_n]$, and the algebraic varieties are in one-to-one correspondence with the prime ideals. That is to say, the affine space is roughly the same thing as the spectrum of the polynomial ring, with points being maximal ideals and varieties being prime ideals.</p> <h2 id="dedekind-domain">Dedekind domain</h2> <p>The concept of a Dedekind domain arose from the study of algebraic integers in number theory. A Dedekind domain is defined to be a noetherian, integrally closed domain of dimension one. A local Dedekind domain is called a discrete valuation ring (DVR): it has exactly one prime $p$ (up to units), and every non-zero element can be valued by the power of $p$ in it. A Dedekind domain is a global version of a DVR with multiple primes.</p> <h2 id="hasse-minkovski-theorem">Hasse-Minkowski Theorem</h2> <p>If a quadratic form has a non-trivial zero locally at every place (i.e. valuation), then it has one over the rationals (equivalently, by clearing denominators, over the integers). More specifically, a rational quadratic form has a non-trivial rational zero if and only if it has a non-trivial real zero and a non-trivial $p$-adic zero for every prime $p$. Since the ring of integers of the $p$-adic field is the completion of the localization of $\mathbb{Z}$ at the prime $p$, results over the $p$-adic fields and the real numbers are considered “local results”. 
The proof proceeds by analyzing the equivalence classes of quadratic forms.</p> <p>This does not work for cubic forms: for example, $3x^3+4y^3+5z^3=0$ has no non-trivial rational solution, although it has non-trivial solutions over $\mathbb{R}$ and over every $\mathbb{Q}_p$.</p> <h1 id="combinatorics">Combinatorics</h1> <h2 id="eulers-circuit">Euler’s Circuit</h2> <p>A connected graph has an Euler circuit if and only if every node has an even degree. To obtain an Euler circuit, one can simply start a DFS over the edges from an arbitrary node and backtrack when it gets stuck.</p> <h2 id="ulams-reconstruction-conjecture">Ulam’s Reconstruction Conjecture</h2> <p>This conjecture says that a graph can be reconstructed from all of its vertex-deleted subgraphs, which are obtained by deleting one vertex of the original graph. More specifically, if two graphs with $&gt;2$ vertices have pairwise isomorphic vertex-deleted subgraphs, then the two graphs are isomorphic.</p> <h1 id="references">References</h1> <ul> <li>https://math.stackexchange.com/questions/34053/list-of-local-to-global-principles</li> </ul> <p><strong>Rational Points of Elliptic Curves</strong> (2022-05-03, https://zjkmxy.github.io/posts/2022/05/elliptic-curve-point-group)</p> <p>This post briefly proves why the rational points of an elliptic curve form a group. The proof idea comes from MATH 214B, but I tried to use classical language.</p> <h1 id="background">Background</h1> <p>Let $k$ be a field. Then we have the projective plane $\mathbb{P}^2_k$ with homogeneous coordinates $[ x,y,z ]$, where $x,y,z$ are not all zero and $[ wx,wy,wz ]$ represents the same point as $[ x,y,z ]$ for all $w\in k^{\times}$. A <em>smooth curve</em> $X$ can be defined as a smooth projective variety of dimension $1$. 
In the plane $\mathbb{P}^2_k$, it can be defined by an irreducible homogeneous polynomial equation $P(x,y,z) = 0$, for $P\in k[x,y,z]$.</p> <p>To simplify discussion, <em>points</em> on $X$ are defined to be maximal ideals containing the ideal generated by that polynomial. That is, three irreducible polynomials $f_0(x),f_1(y),f_2(z)$ such that the $P$ above is a combination of those $f_i$ with polynomial coefficients. For example, we consider $[2, y^2+3, 1] = (x-2,y^2+3, z-1)$ a point of the unit circle $x^2+y^2=z^2$ when $k=\mathbb{Q}$, because</p> $(x-2)\cdot (x+2) + (y^2+3) - (z-1)\cdot (z+1) = x^2+y^2-z^2$ <p>However, here $y^2+3$ is not a single point over $\mathbb{Q}$ or $\mathbb{C}$, but instead a gluing of the two points $\pm \sqrt{-3}\in \mathbb{C}$. Points with coordinates inside $k$ are called <em>rational points</em>, whose set is denoted by $X(k)$. For example, $[1, 0, 1] = (x-1, y, z-1)$ is both a point and a $\mathbb{Q}$-rational point.</p> <p>A <em>rational function</em> $f$ on $X$ is a non-zero fraction $f=\frac{g}{h}$ of homogeneous polynomials on $X$, with $g,h\in \bar{k}[ x,y,z ]$ homogeneous and $\deg g=\deg h$. We use $K(X)$ to denote the ring of rational functions. In general, $f$ is defined only at some points of $X$, not at all points. Two rational functions are considered equal if they share the same poles, zeros, and values on $X$; for example, $\frac{x}{z}+1$ is the same as $\frac{xz+x^2+y^2}{z^2}$ on the above circle. The <em>order</em> of $f$ at a point $P\in X$, $\operatorname{ord}_P(f)$, is defined as:</p> <ul> <li>If $P$ is a zero of $f$, $\operatorname{ord}_P(f)$ is the order of the zero. For example, $\operatorname{ord}_P(f) = 2$ for $P=[-1,0,1]$, $f=x/z+1 = -\frac{1}{xz-z^2}y^2$. $\{ f=0 \}$ is the tangent line at that point.</li> <li>If $P$ is a pole of $f$, $\operatorname{ord}_P(f)$ is the negative of the order of the pole. For example, $\operatorname{ord}_P(f) = -1$ for $P=[1,i,0]$, $f=x/z+1 = (x+z)z^{-1}$. 
Note that if the base field is $\mathbb{Q}$ and $P = [1,y^2+1, 0]$ is glued, then $\operatorname{ord}_P(f) = -2$. <ul> <li>(Note: this is for the sake of explanation. In practice, people change the way the degree is counted instead of the coefficient.)</li> </ul> </li> <li>Otherwise, $\operatorname{ord}_P(f)$ is zero.</li> </ul> <h1 id="weil-divisor">Weil Divisor</h1> <p>A <em>divisor</em> is an element of the free abelian group generated by all points of $X$: $\operatorname{Div}(X) := \{ \sum_{P\in X} n_P \cdot P \}$ with $n_P\in \mathbb{Z}$ and only finitely many $n_P$ non-zero. For example, $D= 2\cdot [-1,0,1] - [1,i,0]$ is a divisor of the circle above. Note that a divisor itself is only a formal sum with no intrinsic meaning; what matters is the group structure on $\operatorname{Div}(X)$. The <em>degree</em> of a divisor $D = \sum_{P\in X} n_P \cdot P \in \operatorname{Div}(X)$ is the sum of its coefficients: $\deg D = \sum_{P\in X} n_P$. Clearly, this is a homomorphism $\operatorname{Div}(X)\to \mathbb{Z}$. Let $\operatorname{Div}^0(X)$ be the subgroup of divisors of degree zero.</p> <p>For any rational function $f\in K(X)$, we can define the following divisor corresponding to $f$: $(f) := \sum_{P\in X} \operatorname{ord}_P(f)\cdot P$. We call it a <em>principal divisor</em>. We can prove that $\deg (f) = 0$ for all $f\in K(X)$, and $(f) = 0 \iff f\in \bar{k}^\times$. The quotient of $\operatorname{Div}(X)$ by the subgroup of principal divisors is called the <em>divisor class group</em> $\mathcal{Cl}(X)$. Similarly, we set $\mathcal{Cl}^0(X)$ to be the subgroup of all degree-zero classes. 
This makes sense because principal divisors are all of degree zero.</p> <p>For any divisor $D = \sum_{P\in X} n_P \cdot P \in \operatorname{Div}(X)$ and a subset $U\subset X$, we can define a space of rational functions (with the zero function included)</p> $\mathcal{O}(D)(U) := \{ f\in K(X): (\forall P\in U)\ \operatorname{ord}_P(f) + n_P \geq 0 \}$ <p>In fact, $\mathcal{O}(D)(X)$ is always a finite-dimensional $\bar{k}$-vector space (proof omitted). $\mathcal{O}(D)$ is called a <em>line bundle</em> and elements of $\mathcal{O}(D)(X)$ are called its <em>global sections</em>.</p> <h1 id="riemann-roch-theorem">Riemann-Roch Theorem</h1> <p>The Riemann-Roch theorem implies that for every curve $X$ we have a magic number, the <strong>genus</strong> $g=g(X)$, s.t.</p> <ol> <li>$\dim_{\bar{k}}\mathcal{O}(D)(X) \geq 1-g+\deg(D)$ for every divisor $D$.</li> <li>If $\deg D &gt; 2g-2$, then 1 becomes an equality.</li> </ol> <p>(proof omitted)</p> <h1 id="elliptic-curves">Elliptic Curves</h1> <p>An elliptic curve $X$ is a smooth curve of genus $1$, with a fixed rational point $x_0\in X(k)$. To show that the rational points on an elliptic curve form a group, it is enough to give a bijection $\varphi: X(k)\to \mathcal{Cl}^0(X)$. We can simply let $\varphi(p) := p-x_0$, and verify that it is a bijection.</p> <h2 id="surjectivity">Surjectivity</h2> <p>Suppose $D = \sum_{P\in X} n_P P \in \operatorname{Div}^0(X)$. Let $D' = D + x_0$, which is of degree $1$. By the Riemann-Roch theorem (with $g=1$ and $\deg D' = 1 &gt; 2g-2 = 0$), $\mathcal{O}(D')(X)$ has dimension $1$. Let $f\in \mathcal{O}(D')(X)$ be a non-zero global section. Since $(f) + D'$ is effective and $\deg (f) = 0$, it must be of the form $(f) = p - D'$ for some point $p\in X$. We know $p\in X(k)$ because $f$ cannot have order $\pm 1$ at $\bar{k}-k$ points. Then, we have $\varphi^{-1}(D) = p$, since $f$ shows that $p-x_0$ is equivalent to $D$ in $\mathcal{Cl}^0(X)$. 
Also, $p$ is unique, because the non-zero vector $f$ generates the whole space $\mathcal{O}(D')(X)$ via scalar multiplication, which does not change zeros or poles.</p> <h2 id="injectivity">Injectivity</h2> <p>If $\varphi(p) = \varphi(q)$, then there is a rational function $f\in K(X)^{\times}$ s.t. $(f) = p-q$. Then, $f\in \mathcal{O}(q)(X)$. But by Riemann-Roch, $\mathcal{O}(q)(X)$ is of dimension $1$, and since constant functions are clearly in that vector space, $f$ itself is a constant $f\in \bar{k}^{\times}$. Thus $(f) = 0$, and $p=q$.</p> <!--Weierstraß Equation ====================--> <h1 id="references">References</h1> <ul> <li>R. Hartshorne. Algebraic Geometry.</li> <li>J. H. Silverman. The Arithmetic of Elliptic Curves.</li> <li>W. Fulton. Algebraic Curves.</li> </ul> <p><strong>Some Notes on Playing C++ Coroutine</strong> (2022-04-15, https://zjkmxy.github.io/posts/2022/04/cpp-coroutine)</p> <p>This post briefly introduces what I learned about the C++20 coroutine mechanism after I used it to imitate Python generators and asyncio. <a href="https://github.com/zjkmxy/ndn-cpp-cocomo">REPO</a></p> <p>I write this because the <a href="https://en.cppreference.com/w/cpp/coroutine">C++ reference</a> reads too much like a dictionary of functions and <a href="https://devblogs.microsoft.com/oldnewthing/2019/12/page/2">Raymond’s blog</a> is very long and takes time to learn. So I can have this note as a reference.</p> <h1 id="objects">Objects</h1> <p>There are basically four objects involved in a coroutine implementation. The details may vary with the implementation.</p> <ul> <li><strong>Task</strong> or <strong>Generator</strong>. This is the <code class="language-plaintext highlighter-rouge">class</code> we define as the return type of an async function. Suppose we name it <code class="language-plaintext highlighter-rouge">task</code>. 
So the async function may be <code class="language-plaintext highlighter-rouge">task&lt;void&gt; f(){ co_return; }</code>.</li> <li><strong>Handle</strong> with type <code class="language-plaintext highlighter-rouge">coroutine_handle&lt;promise_type&gt;</code>. This is a pointer-like type provided by the compiler and the standard library.</li> <li><strong>Promise</strong>, whose type is <code class="language-plaintext highlighter-rouge">task::promise_type</code> by default. <ul> <li>Rare case: you may use <code class="language-plaintext highlighter-rouge">coroutine_traits</code> to change it.</li> </ul> </li> <li><strong>Awaiter</strong>, whose type is given by <code class="language-plaintext highlighter-rouge">operator co_await()</code>, which is <code class="language-plaintext highlighter-rouge">task</code> by default. <ul> <li>Special case: you may overload <code class="language-plaintext highlighter-rouge">operator co_await()</code> to return a wrapper.</li> <li>Special case: the Promise of the caller (i.e. the current coroutine) may have <code class="language-plaintext highlighter-rouge">await_transform</code> to alter the awaiter of the callee (i.e. the inner coroutine to be awaited). 
This is the <em>only</em> chance for the caller to do something to the callee.</li> </ul> </li> </ul> <p>The traits that these types need to implement:</p> <ul> <li>Task <ul> <li>In the minimum case, it does not need to implement any trait.</li> <li>Generally, it needs to either overload <code class="language-plaintext highlighter-rouge">co_await</code> and return an awaiter, or become an awaiter itself.</li> </ul> </li> <li>Handle (implemented internally) <ul> <li><code class="language-plaintext highlighter-rouge">destroy()</code>: terminates the coroutine and frees its data before it finishes.</li> <li><code class="language-plaintext highlighter-rouge">resume()</code>: resumes execution immediately in the current thread.</li> <li><code class="language-plaintext highlighter-rouge">promise()</code>: returns the Promise.</li> <li><code class="language-plaintext highlighter-rouge">void* address()</code>: returns the pointer value.</li> </ul> </li> <li>Promise <ul> <li>Constructor: if there is a constructor matching the async function’s argument list, that one will be called. Otherwise, the default (no-argument) one will be called. One may want to delete the copy constructor to avoid accidents.</li> <li><code class="language-plaintext highlighter-rouge">task get_return_object()</code>: constructs and returns the Task.</li> <li><code class="language-plaintext highlighter-rouge">initial_suspend()</code>: called immediately after the Promise and the Task are constructed, before the function body runs. Returns an awaiter. Return <code class="language-plaintext highlighter-rouge">suspend_always</code> if you want a lazy start (Python style generator/coroutine).</li> <li><code class="language-plaintext highlighter-rouge">final_suspend()</code>: called when the async function finishes execution. Explained later in the <em>Life Cycle</em> section.</li> <li><code class="language-plaintext highlighter-rouge">return_void()/return_value(T value)</code>: can only have one. 
Handles the <code class="language-plaintext highlighter-rouge">co_return</code> expression in the async function. Note: flowing off the end of the async function body is equivalent to a plain <code class="language-plaintext highlighter-rouge">co_return;</code> only when the Promise defines <code class="language-plaintext highlighter-rouge">return_void()</code>; otherwise it is undefined behaviour.</li> <li><code class="language-plaintext highlighter-rouge">unhandled_exception()</code>: handles the exception raised by the async function. Can use <code class="language-plaintext highlighter-rouge">std::current_exception</code> to capture the exception pointer and <code class="language-plaintext highlighter-rouge">std::rethrow_exception</code> to rethrow it. However, one may not want to do so if the coroutine runs in a different thread.</li> <li><code class="language-plaintext highlighter-rouge">awaiter yield_value(T value)</code>: called when the async function calls <code class="language-plaintext highlighter-rouge">co_yield value</code>. Returns an awaiter to be awaited.</li> </ul> </li> <li>Awaiter: <ul> <li><code class="language-plaintext highlighter-rouge">bool await_ready()</code>: returns whether the result is ready. The control flow takes a shortcut path and skips <code class="language-plaintext highlighter-rouge">await_suspend</code> when this function returns <code class="language-plaintext highlighter-rouge">true</code>.</li> <li><code class="language-plaintext highlighter-rouge">T await_resume()</code>: returns the value of <code class="language-plaintext highlighter-rouge">co_await awaiter</code>.</li> <li><code class="language-plaintext highlighter-rouge">await_suspend(coroutine_handle caller)</code>: called when <code class="language-plaintext highlighter-rouge">await_ready</code> returns <code class="language-plaintext highlighter-rouge">false</code>. Should return <code class="language-plaintext highlighter-rouge">void</code> in general use.</li> </ul> </li> </ul> <h1 id="generator-and-coroutine">Generator and Coroutine</h1> <p>A generator is a function that yields values, similar to a loop. 
For example:</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">math</span>


<span class="k">def</span> <span class="nf">prime_numbers</span><span class="p">():</span>
    <span class="k">def</span> <span class="nf">is_prime</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">math</span><span class="p">.</span><span class="n">floor</span><span class="p">(</span><span class="n">math</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">x</span><span class="p">))</span> <span class="o">+</span> <span class="mi">1</span><span class="p">):</span>
            <span class="k">if</span> <span class="n">x</span> <span class="o">%</span> <span class="n">i</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
                <span class="k">return</span> <span class="bp">False</span>
        <span class="k">return</span> <span class="bp">True</span>

    <span class="n">x</span> <span class="o">=</span> <span class="mi">2</span>
    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">is_prime</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
            <span class="k">yield</span> <span class="n">x</span>
        <span class="n">x</span> <span class="o">+=</span> <span class="mi">1</span>
</code></pre></div></div> <p>A generator can use a normal <code class="language-plaintext highlighter-rouge">return</code> to stop execution, which raises a <code class="language-plaintext highlighter-rouge">StopIteration</code> exception carrying the returned value. 
A generator can also use <code class="language-plaintext highlighter-rouge">yield from</code> to yield results from another generator.</p> <p>A coroutine is a function that runs as a user-space thread. It can wait on other coroutines’ results. For example:</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">asyncio</span> <span class="k">as</span> <span class="n">aio</span>


<span class="k">async</span> <span class="k">def</span> <span class="nf">f</span><span class="p">():</span>
    <span class="k">print</span><span class="p">(</span><span class="s">'Hello, '</span><span class="p">)</span>
    <span class="k">await</span> <span class="n">aio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>  <span class="c1"># Sleep 1 second</span>
    <span class="k">print</span><span class="p">(</span><span class="s">'World'</span><span class="p">)</span>
</code></pre></div></div> <p>Though they use different keywords, Python’s coroutines are implemented on top of the generator mechanism: <code class="language-plaintext highlighter-rouge">await</code> is essentially the same as <code class="language-plaintext highlighter-rouge">yield from</code>. However, C++ goes in the opposite direction: it uses <code class="language-plaintext highlighter-rouge">co_await</code> to implement <code class="language-plaintext highlighter-rouge">co_yield</code>. So <code class="language-plaintext highlighter-rouge">co_yield value</code> is the same as <code class="language-plaintext highlighter-rouge">co_await promise.yield_value(value)</code>. 
C++ also does not have Python’s <code class="language-plaintext highlighter-rouge">StopIteration</code> natively, so one needs to manually handle <code class="language-plaintext highlighter-rouge">co_return</code>.</p> <h1 id="life-cycle">Life Cycle</h1> <p>Both the coroutine handle and the Task are created when the async function is called. For example, suppose we have <code class="language-plaintext highlighter-rouge">task&lt;int&gt; f()</code>. Then, when we call <code class="language-plaintext highlighter-rouge">f()</code>, the compiler-generated code allocates memory for the promise and the coroutine handle, and then calls <code class="language-plaintext highlighter-rouge">promise_type::get_return_object</code> to obtain the Task object, which is returned to the caller.</p> <p>After that, the Task and the Handle/Promise have different life cycles: the Task lives in its scope, but the Handle/Promise is not deallocated until one explicitly calls <code class="language-plaintext highlighter-rouge">destroy()</code> or the coroutine reaches its real end, i.e. falls through <code class="language-plaintext highlighter-rouge">final_suspend()</code>.</p> <p>One important thing is the behavior of <code class="language-plaintext highlighter-rouge">final_suspend()</code>, which is a <code class="language-plaintext highlighter-rouge">noexcept</code> function called when the control flow reaches a <code class="language-plaintext highlighter-rouge">co_return</code> or the end of <code class="language-plaintext highlighter-rouge">f()</code>. It returns some awaiter and the compiler-generated code will await on it. 
Typically people only use <code class="language-plaintext highlighter-rouge">suspend_always</code> to trigger the final suspension, or <code class="language-plaintext highlighter-rouge">suspend_never</code> to let the final suspension fall through.</p> <ul> <li>If it suspends, the Handle/Promise will be preserved until one manually calls <code class="language-plaintext highlighter-rouge">handle.destroy()</code>. However, one should never call <code class="language-plaintext highlighter-rouge">handle.resume()</code> after this point (undefined behaviour). If one uses the Promise to keep the result, this may be the desired implementation.</li> <li>If it does not suspend, the Handle/Promise will be deallocated immediately and one cannot refer to the Handle/Promise any more (undefined memory access). The Task object or the awaiter given to other coroutines can be used to keep the result if it is still in scope.</li> </ul> <p>If there are other coroutines awaiting the current one, it is better to resume them in <code class="language-plaintext highlighter-rouge">final_suspend</code>, because <code class="language-plaintext highlighter-rouge">return_void/return_value</code> is <em>not guaranteed</em> to be called (for example, when the body exits with an exception, <code class="language-plaintext highlighter-rouge">unhandled_exception()</code> is called instead).</p> <p>If the system is complex enough, <code class="language-plaintext highlighter-rouge">suspend_always</code> is recommended, as the destruction of a coroutine becomes explicit and easier to track. In WinRT, all but fire-and-forget coroutines suspend at <code class="language-plaintext highlighter-rouge">final_suspend()</code>.</p> <h1 id="await-path">Await Path</h1> <p>Assume <code class="language-plaintext highlighter-rouge">await_suspend</code> returns <code class="language-plaintext highlighter-rouge">void</code>, which is the normal case. 
Then, the internal flow of <code class="language-plaintext highlighter-rouge">result = co_await awaiter</code> looks like the following:</p> <pre><code class="language-C++">// Pseudocode of the compiler-generated flow
if (!awaiter.await_ready()) {
    save_state();
    awaiter.await_suspend(handle);
    // suspend the current coroutine and return to the caller;
    // the caller is whoever called handle.resume() and thereby ran this coroutine
    // &lt;-- Resumes at this point
    restore_state();
}
result = awaiter.await_resume();
</code></pre> <p>If <code class="language-plaintext highlighter-rouge">await_ready()</code> returns <code class="language-plaintext highlighter-rouge">true</code>, the control flow skips the suspension and obtains the result via <code class="language-plaintext highlighter-rouge">await_resume()</code> immediately. Otherwise, <code class="language-plaintext highlighter-rouge">await_resume()</code> will be called later, after the next resumption, i.e. when <code class="language-plaintext highlighter-rouge">handle.resume()</code> is called again. Note that <code class="language-plaintext highlighter-rouge">await_ready()</code> will only be called once. That is, if the caller coroutine is resumed unexpectedly, there is no guarantee that the inner awaiter has finished, and <code class="language-plaintext highlighter-rouge">awaiter.await_resume()</code> may be unable to give the result.</p> <p>The coroutine may run on a different thread after <code class="language-plaintext highlighter-rouge">await_suspend()</code>. 
For example, WinRT’s <code class="language-plaintext highlighter-rouge">resume_background()</code> returns control to the caller, and later resumes the current coroutine on some background thread pool.</p> <h1 id="async-function">Async Function</h1> <p>The transformed flow of an async function looks like the following:</p> <pre><code class="language-C++">// Pseudocode of the compiler-generated transformation
task&lt;T&gt; f(P param) {
    allocate_frame(std::forward&lt;P&gt;(param), frame_of_f);
    promise_type promise;
    auto return_object = promise.get_return_object();
    handle.resume(); // goes to initial_suspend
    return return_object;
}

frame_of_f {
    try {
        co_await promise.initial_suspend();
        f_body();
    } catch (...) {
        promise.unhandled_exception();
    }
    co_await promise.final_suspend();
    deallocate_frame(promise);
}
</code></pre> <h1 id="a-simple-generator-design">A Simple Generator Design</h1> <p>To have a Python-style generator, we implement the following features:</p> <ul> <li>A generator always runs in a single thread.</li> <li>A generator may use <code class="language-plaintext highlighter-rouge">co_yield</code> to yield some data and return control.</li> <li>A generator may also call <code class="language-plaintext highlighter-rouge">co_await</code> on an inner generator to imitate Python’s <code class="language-plaintext highlighter-rouge">yield from</code>.</li> <li>The <code class="language-plaintext highlighter-rouge">next()</code> function resumes the generator and returns the yielded value if there is one. It returns <code class="language-plaintext highlighter-rouge">std::nullopt</code> if the generator finishes.</li> <li>The <code class="language-plaintext highlighter-rouge">result()</code> function gives the result of a generator after it is finished.</li> </ul> <p>We can do as follows:</p> <ul> <li>A generator keeps an optional pointer to an inner generator. If there is an inner generator running, <code class="language-plaintext highlighter-rouge">next()</code> delegates to the inner generator. 
It resumes the current one after the inner generator finishes.</li> <li>A generator always suspends at both the initial and the final suspension point. Since the outer generator holds the state of the inner generator, we may simply put the result in the promise and destroy the handle in the destructor of a generator.</li> <li>Chaining the outer and the inner generator can be done in either the outer’s <code class="language-plaintext highlighter-rouge">await_transform</code> or the inner’s <code class="language-plaintext highlighter-rouge">await_suspend</code>.</li> </ul> <p><a href="https://github.com/zjkmxy/ndn-cpp-cocomo/blob/main/src/asyncio/generator.hpp">Generator.hpp</a> gives an implementation of such a simple generator.</p> <h1 id="a-simple-coroutine-design">A Simple Coroutine Design</h1> <p>To have an asyncio-style coroutine, we can do the following. The design is very different from Python because in C++ <code class="language-plaintext highlighter-rouge">co_await</code> is the primitive.</p> <ul> <li>To have a simple demo, we don’t suspend on <code class="language-plaintext highlighter-rouge">final_suspend</code>.</li> <li>Therefore, results are stored in Tasks, instead of Promises.</li> <li>In <code class="language-plaintext highlighter-rouge">await_suspend</code>, we register the suspended caller in the callback list of the inner coroutine. 
In <code class="language-plaintext highlighter-rouge">final_suspend</code>, we use the current engine to schedule all registered waiting coroutines in the callback list.</li> <li>The Awaiter holds a reference to the task’s result, so <code class="language-plaintext highlighter-rouge">await_resume</code> is able to work even after the Handle/Promise is destructed.</li> </ul> <p><a href="https://github.com/zjkmxy/ndn-cpp-cocomo/blob/main/src/asyncio/coroutine.hpp">Coroutine.hpp</a> gives an implementation of such a simple coroutine.</p> <h1 id="conclusion">Conclusion</h1> <p>Without any doubt, C++20 coroutines can be a powerful tool in the future. However, their unique semantics need careful handling and more time to learn. I hope we can have a good coroutine library working on Linux/macOS soon.</p> <h1 id="references">References</h1> <ul> <li><a href="https://en.cppreference.com/w/cpp/coroutine">C++ reference</a></li> <li><a href="https://devblogs.microsoft.com/oldnewthing/2019/12/page/2">Raymond’s blog</a> and also <a href="https://devblogs.microsoft.com/oldnewthing/2021/03">this one</a></li> <li><a href="https://docs.microsoft.com/en-us/windows/uwp/cpp-and-winrt-apis/">WinRT</a></li> </ul>Xinyu Maxinyuma@ucla.eduThis post briefly introduces what I learned about the C++20 coroutine mechanism after I used it to imitate Python generators and asyncio. REPOTwo’s complement and 2-adic number2021-11-14T00:00:00-08:002021-11-14T00:00:00-08:00https://zjkmxy.github.io/posts/2021/11/twos-complement-2-adic<p>This post discusses the relation between two’s complement and 2-adic integers in math. I want to show what operations we can have if we ignore overflow.</p> <h1 id="p-adic-number">p-adic number</h1> <p>The <em>p-adic number</em> system is a different extension of the rational number field $\mathbb{Q}$. Consider the base-$p$ representation of a fraction. 
Say $\frac{1}{7}$ in decimal:</p> $\frac{1}{7} = \frac{142857}{10^6-1} = \frac{0.142857}{1-10^{-6}} = \sum_{j=1}^{\infty} 142857\times 10^{-6j} = 0.\overline{142857}\cdots$ <p>This is a power series in $10^{-1}$. Here we use $\frac{1}{1-x} = \sum_{j=0}^{\infty}x^j$. If we accept all possible power series:</p> $\sum_{j=-m}^{\infty} a_j\times 10^{-j}$ <p>we get the real numbers $\mathbb{R}$. (Well, more strictly we need to make $0.\overline{9}=1$)</p> <p>However, we can also expand the same number as a power series in $10^1$:</p> $\frac{1}{7} = - \frac{142857}{1-10^{6}} = - \sum_{j=0}^{\infty} 142857\times 10^{6j} = 1+\sum_{j=0}^{\infty} 857142\times 10^{6j} = \cdots\overline{285714}3.0$ <p>If we accept all possible power series:</p> $\sum_{j=-m}^{\infty} a_j\times 10^{j}$ <p>we get the 10-adic numbers. Such a series does not converge under the usual metric, but if we define a new metric $|10^ab| = 10^{-a}$ with $b$ not divisible by $10$, it converges. Typically we only use a prime $p$, so a $p$-adic number is represented in a reversed base-$p$ numeral system, where one can have infinitely many digits before the radix point but only finitely many after it. The $p$-adic numbers form a field of characteristic $0$, denoted by $\mathbb{Q}_p$. If we limit ourselves to those numbers without negative powers of $p$, i.e. without digits after the radix point, we have the $p$-adic integers $\mathbb{Z}_p$. Formally, we define them as the inverse limit $\mathbb{Z}_p:= \varprojlim_{n=1}^{\infty}\mathbb{Z}/p^n\mathbb{Z}$, with projection $\mathbb{Z}/p^{n+1}\mathbb{Z}\to \mathbb{Z}/p^n\mathbb{Z}$. That is, the limit of the finite rings $\mathbb{Z}/p^n\mathbb{Z}$ when $n\to\infty$. In a computer, we use finitely many bits to store an integer ($\mathbb{Z}/2^n\mathbb{Z}$). 
If we imagine extending it to infinitely many bits, we get the $2$-adic integers $\mathbb{Z}_2$.</p> <h1 id="basic-operations">Basic Operations</h1> <h2 id="addition--subtraction">Addition &amp; Subtraction</h2> <p>The ordinary integers $\mathbb{Z}$ embed in $\mathbb{Z}_2$. More specifically, we have</p> $-1 = \frac{1}{1-2} = \sum_{j=0}^{\infty} 2^j = \cdots\overline{1}$ <p>Thus, $-n$ is equal to $\overline{1}-(n-1)$, i.e. the binary complement of $n-1$, which is the same as the complement of $n$ plus one. This is exactly how we represent negative integers in 2’s complement: take the complement and add one. In this sense, we can take the 2’s complement representation as <em>the last $n$ bits of the 2-adic representation</em>.</p> <p>The 2-adic representation works for every number in the 2-adic ring. The same bit-by-bit addition and subtraction algorithm works for both positive and negative numbers. Therefore, 2’s complement is in the same situation: signed and unsigned integers differ only in how humans interpret them; there is no need to separate them when we perform operations. That’s why in x86 assembly we only have <code class="language-plaintext highlighter-rouge">ADD/ADC</code> and <code class="language-plaintext highlighter-rouge">SUB/SBB</code>.</p> <p>But when we convert an integer of a small word size to a wider word, we do need to tell whether it’s negative or not. For negative integers, we fill the higher bits with 1; for non-negative ones, with 0. Therefore, in x86 we have different instructions to extend an integer: sign extension <code class="language-plaintext highlighter-rouge">CBW/CWDE/CDQE/MOVSX/MOVSXD</code> for signed integers; zero extension <code class="language-plaintext highlighter-rouge">MOVZX</code> for unsigned integers.</p> <h2 id="multiplication--euclidean-division">Multiplication &amp; Euclidean Division</h2> <p>Since the 2-adic ring is closed under multiplication, the 2-adic representation works for both positive and negative numbers. Therefore, signed and unsigned multiplication are the same. 
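</p> <p>As a small worked check of this claim (the 8-bit width and the operands $-3$ and $5$ are chosen here purely for illustration), the same bit pattern $11111101_{2}$ read as signed $-3$ or as unsigned $253$ gives the same low 8 bits after multiplying by $5$:</p> $(-3)\times 5 = -15 \equiv 241 \pmod{2^8},\qquad 253\times 5 = 1265 = 4\times 2^8 + 241 \equiv 241 \pmod{2^8},\qquad 241 = 11110001_{2}$ <p>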
In x86 assembly we have <code class="language-plaintext highlighter-rouge">IMUL</code> vs <code class="language-plaintext highlighter-rouge">MUL</code>, but this is simply because x86 multiplication extends the result to a longer word, and the extended (high) part does depend on signedness. If we ignore the extended part:</p> $(2^na_h +a_l)\times(2^nb_h +b_l) = 2^n(2^na_hb_h + a_hb_l + a_lb_h) + a_lb_l$ <p>which shows that $a\times b$ modulo $2^n$ depends only on the low parts $a_l$ and $b_l$, i.e. it is the product of the two operands taken modulo $2^n$.</p> <p>However, Euclidean division is a different story, because 2-adic numbers do not support comparison. For example, if we divide the $8$-bit $-37(=11011011_{2})$ by $-3(=11111101_2)$, we get $12(=00001100_{2})$ with remainder $-1(=11111111_{2})$. But if we divide $219(=11011011_{2})$ by $253(=11111101_{2})$, we get $0$ with remainder $219$.</p> <h2 id="inversion">Inversion</h2> <p>In the 2-adic integer ring, any number not divisible by 2 has a unique multiplicative inverse. For example,</p> $\frac{1}{45} = -\frac{91}{1-2^{12}} = - \sum_{j=0}^{\infty} 1011011_{2}\times 2^{12j} = 1+\overline{111110100100}.0_{2} = \overline{011111010010}1.0_{2}$ <p>This also works in finite word length if we multiply with overflow. 
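</p> <p>A small worked check (the 12-bit word length is chosen here only for illustration, because it matches the period of the expansion above): truncating $1/45$ to its last 12 bits gives $111110100101_{2} = 4005 = 2^{12}-91$, and</p> $45\times 4005 = 180225 = 44\times 2^{12} + 1 \equiv 1 \pmod{2^{12}}$ <p>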
For example, with $32$-bit integers, $1/45$ is truncated to $-1527099483(=10100100111110100100111110100101_{2})$, and we have <code class="language-plaintext highlighter-rouge">(-1527099483) * 45 == 1</code> in 32-bit wrap-around arithmetic. In C++ the check should be done on <code class="language-plaintext highlighter-rouge">uint32_t</code>, because overflow of a signed <code class="language-plaintext highlighter-rouge">int</code> is formally undefined behaviour.</p> <p>Note that the binary fraction form of $1/45$ is</p> $\frac{1}{45} = \frac{91}{2^{12}-1} = \sum_{j=1}^{\infty} 1011011_{2}\times 2^{-12j} = 0.\overline{000001011011}_{2}$ <p>So the “flip and add one” rule somehow applies here.</p> <h1 id="multiplicative-group">Multiplicative Group</h1> <p>An interesting fact is that the multiplicative group of $\mathbb{Z}_2$ is $\mathbb{Z}^{\times}_2 \cong \mathbb{Z}_2\times \mathbb{Z}/2\mathbb{Z}$, which contains $\mathbb{Z}_2$ itself as a factor.</p> <p>To show this, let $U=\mathbb{Z}_2^{\times}$ and let $\varepsilon_n: U\to (\mathbb{Z}/2^n\mathbb{Z})^{\times}$ be the projection modulo $2^n$. The kernel of this projection is $U_n = 1+2^n\mathbb{Z}_2$. Clearly, we have $U_1 = U$ since every odd number is invertible. Also, the map $1+2^nx\mapsto x\mod 2$ defines an isomorphism $U_n/U_{n+1} \cong \mathbb{Z}/2\mathbb{Z}$, because</p> $(1+2^nx)(1+2^ny) \equiv 1+2^n(x+y) \mod 2^{n+1}$ <p>Now, define $\theta_n: \mathbb{Z}/2^{n}\mathbb{Z}\to U_2/U_{n+2}$ to be $x\mapsto 5^x$. Note that:</p> $5^{2^n} = (1+2^2)^{2^n} = 1 + 2^{n+2} + \cdots \in U_{n+2}\setminus U_{n+3}$ <p>$U_2/U_{n+2}$ is of order $2^n$. In this group we have $5^{2^{n-1}}\neq 1$ and $5^{2^n} = 1$. Thus, $U_2/U_{n+2}$ is cyclic and generated by $5$. Taking the inverse limit, we get</p> $U_2 = \varprojlim_{n=1}^{\infty}U_2/U_{n+2} \cong \varprojlim_{n=1}^{\infty}\mathbb{Z}/2^{n}\mathbb{Z} = \mathbb{Z}_2$ <p>On the other hand, $U_1/U_2 \cong \mathbb{Z}/2\mathbb{Z}$ with $1+2(2z+1) = 3\times(1+2^2(z/3))$. Thus, every invertible element $x\in \mathbb{Z}_2^{\times}$ can be written as $x = 5^zt$, where $z\in\mathbb{Z}_2$ and $t\in \{1, 3\}$ or $t\in \{1, -1\}$.</p> <p>Now let’s move to the finite case $\mathbb{Z}/2^n\mathbb{Z}$. 
We know that $\mathbb{Z}/p^n\mathbb{Z}$ ($n\geq 3$) has a primitive root if and only if $p$ is an <em>odd</em> prime. From the argument above, we further learn that the multiplicative group of $\mathbb{Z}/2^n\mathbb{Z}$ is actually of the form $\{5^z, 3\times 5^z: 0\leq z&lt; 2^{n-2}\}$ or simply $\{\pm 5^z: 0\leq z&lt; 2^{n-2}\}$, when $n\geq 3$.</p> <h2 id="square-numbers">Square numbers</h2> <p>An element $2^px\in \mathbb{Z}_2$ with $x$ odd is a square number if and only if $p$ is even and $x$ is a square number. By the analysis above, the latter is equivalent to $x\equiv 1\ (\mathrm{mod}\ 8)$. The same condition applies to the finite ring $\mathbb{Z}/2^n\mathbb{Z}$.</p> <p>Note that this shows that every odd square number in $\mathbb{Z}/2^n\mathbb{Z}$ ($n\geq 3$) has <strong>four</strong> square roots. For example, the square roots of $1$ are $\{\pm 1, \pm 5^{2^{n-3}} = 2^{n-1}\pm 1 \}$, with two of them divisible by $3$. For 32-bit unsigned int, they are $\{1, 2147483647, 2147483649, 4294967295\}$, which is exactly $\{1, 2^{31}-1, -(2^{31}-1), -1\}$ as a signed int.</p> <p>This does not contradict the theorem that a polynomial of degree $d$ over a domain (e.g. $\mathbb{Z}_2$) can have at most $d$ distinct roots, because the two non-trivial roots of $1$ in $\mathbb{Z}/2^n\mathbb{Z}$ are $2^{n-1}\pm 1$, which change with $n$ and cannot be lifted to an actual solution in $\mathbb{Z}_2$.</p> <h2 id="hilbert-symbol">Hilbert symbol</h2> <p>Define the Hilbert symbol $(a,b)$ to be $1$ if $ax^2+by^2=z^2$ has a non-trivial solution in $\mathbb{Q}_2$, and $-1$ otherwise. Let $(a,b) = (-1)^{ [ a , b ] }$. Then, $[ a , b ]$ is a bilinear form on $\mathbb{Q}_2^{\times}/\mathbb{Q}_2^{\times 2}$. Under the basis $\{2, -1, 5\}$, its matrix is $\begin{bmatrix} 0 &amp; 0 &amp; 1 \\ 0 &amp; 1 &amp; 0 \\ 1 &amp; 0 &amp; 0 \end{bmatrix}$. The proof is omitted as it contains a lot of computation.</p> <!--Quadratic form %====== %Anti-hash test for string %======--> <h1 id="references">References</h1> <ul> <li>Serre, J.-P. (1973). 
A course in arithmetic. Springer.</li> </ul>Xinyu Maxinyuma@ucla.eduCS 219: Network Verification - Course Review2021-04-01T00:00:00-07:002021-04-01T00:00:00-07:00https://zjkmxy.github.io/posts/2021/04/cs-219<p>George is a fantastic teacher with very engaging lectures. I think his secret sauces include the following:</p> <ul> <li>Look at everything from different views: Zoe (ζωή, big picture) versus Bios (βίος, details).</li> <li>Selectively focus on the most important techniques and examples, ignoring unnecessary points. Students can feel that they learned a lot without remembering too many boring concepts.</li> <li>Have his own methodologies for the creative process. Students can experience those “Aha” moments when following his introduction.</li> </ul> <p>The trade-offs may be as follows: (<em>very biased personal view</em>, don’t take it seriously)</p> <ul> <li>(+) Attending lectures is always pleasant.</li> <li>(+) Students can learn things quickly and apply them to their own research.</li> <li>(-) His lectures may give the false impression that creating things is as easy as the fusion of ideas. This is not true because one must have a broad view to know what to borrow, and there are boring times such as trial and error, and non-trivial adaptation and modification of existing methods. He omits these in his lectures.</li> </ul> <p>I would strongly recommend that everyone interested in networking try his 216 and 219.</p> <p>In this post I won’t list all the Bia (details) unless they are interesting. Using well-defined mathematical terms, the Zoai (big pictures) are quite easy to state and thus very short.</p> <h1 id="mathematical-preparation">Mathematical Preparation</h1> <h2 id="poset-topology">Poset Topology</h2> <p>To understand a network we must have a language that describes sets of IP addresses. 
Though every paper we read in 219 uses its own notation, the underlying idea is similar. To have a unified notation for this post, I use the concept of regular open algebra. (See also <a href="https://www.springer.com/gp/book/9780387402932">this book</a>, Ch.10)</p> <p>Suppose $P = (S, \leq)$ is an at most countable poset with a unique maximal element $*$. Intuitively, $S$ should be the set of <em>all prefixes</em> of IP addresses or NDN names. We say $x$ <em>refines</em> or <em>extends</em> $y$ if $x\leq y$. Define the topology $\tau$ as the topology generated by the principal ideals $\{y: y\leq x\}$ for all $x\in S$. Then,</p> <ul> <li>A set is <strong>open</strong> if and only if it is downward closed. For example, the set of all prefixes starting with <code class="language-plaintext highlighter-rouge">192.168.*</code> or <code class="language-plaintext highlighter-rouge">10.0.*</code> is open; it includes <code class="language-plaintext highlighter-rouge">192.168.0.1</code>.</li> <li>A set is <strong>closed</strong> if and only if it is upward closed. For example, the set of all prefixes of <code class="language-plaintext highlighter-rouge">192.168.*</code> is closed; it includes <code class="language-plaintext highlighter-rouge">192.*</code>.</li> <li>For a set $A$, let $A^{\perp}$ denote the complement of its closure, $A^{\perp} = S\setminus\overline{A}$.</li> <li>An open set $A$ is called <strong>regular</strong> if $A^{\perp\perp} = A$. <ul> <li>If $A$ is open, $A^{\perp\perp}$ is the minimal regular set containing $A$.</li> <li>For example, in a binary tree, the ideal generated by <code class="language-plaintext highlighter-rouge">0101*</code> and <code class="language-plaintext highlighter-rouge">0100*</code> is open but not regular. Including <code class="language-plaintext highlighter-rouge">010*</code> makes it regular. 
In plain English, a regular set cannot include all children of $x$ but exclude $x$ itself.</li> </ul> </li> <li>The <strong>meet</strong> $\wedge$ of two <em>regular</em> open sets is their intersection $\cap$.</li> <li>The <strong>join</strong> $\vee$ of two <em>regular</em> open sets is the minimal regular open set containing them. $A\vee B = (A\cup B)^{\perp\perp}$. <ul> <li>For example, the join of <code class="language-plaintext highlighter-rouge">0101*</code> and <code class="language-plaintext highlighter-rouge">0100*</code> is <code class="language-plaintext highlighter-rouge">010*</code>.</li> </ul> </li> <li>The <strong>negation</strong> $\neg$ of a <em>regular</em> open set $A$ is $\neg A = A^{\perp}$.</li> <li>The boolean <em>true</em> and <em>false</em> are the whole set $S$ and the empty set $\varnothing$, resp.</li> <li>The topological space $(P, \tau)$ is countable, T0, and complete.</li> <li>Note that $S$ is not necessarily a tree. <ul> <li>For example, one can combine IP addresses and port numbers to make a space of packets. In this space, <code class="language-plaintext highlighter-rouge">(192.168.*, 8080)</code> refines both <code class="language-plaintext highlighter-rouge">(192.*, 8080)</code> and <code class="language-plaintext highlighter-rouge">(192.168.*, *)</code>, which are incomparable.</li> </ul> </li> <li>For incomparable $x, y$, if there exists $z\leq x, y$, then $x,y$ are called <strong>compatible</strong>.</li> </ul> <p>Every open set $A$ is a union of principal ideals $\{y: y\leq x\}$ for some $x$. Thus, we can use a minimal set of such $x$’s to represent $A$. It is easy to write a program which computes meet, join and negation using this principal representation. 
Depending on $S$, there will be a canonical algorithm and I don’t think I need to describe the details.</p> <p>This complete boolean algebra is sometimes called a <strong>boolean-valued model</strong>, because it supports all operators of propositional logic.</p> <h2 id="modal-logic">Modal Logic</h2> <p>Modal logic includes modal operators to describe necessity $\Box$ and possibility $\Diamond$. In modal logic, we can move from one state to another, and the truth values of propositions may change. Under different contexts there are different definitions of modal operators. Basically, $\Box P$ is true iff $P$ is true for <em>all</em> states that the current state can transition to; $\Diamond P$ is true iff $P$ is true for <em>some</em> future state. One may define $\Diamond P := \neg\Box\neg P$.</p> <p><a href="https://en.wikipedia.org/wiki/Kripke_semantics">Kripke semantics</a> gives a way to model the semantics of many different modal logic systems. A <strong>Kripke model</strong> is a triple $\langle W,R,\Vdash \rangle$. $W$ is the set of possible states. $R$ is a relation on $W$ that defines transitions. $\Vdash$ is the forcing relation, where $w\Vdash P$ if the proposition $P$ is true at $w$. 
Also, $w\Vdash \Box P$ iff $u\Vdash P$ for all $u$ with $w R u$.</p> <p>In a Kripke model, modal axioms are related to the properties of $R$:</p> <table> <thead> <tr> <th>Name</th> <th>Axiom</th> <th>Condition</th> </tr> </thead> <tbody> <tr> <td>K</td> <td>$\Box(A\to B)\to\Box A\to\Box B$</td> <td>-</td> </tr> <tr> <td>T</td> <td>$\Box A\to A$</td> <td>reflexive: $wRw$</td> </tr> <tr> <td>4</td> <td>$\Box A\to \Box\Box A$</td> <td>transitive: $wRv\wedge vRu\to wRu$</td> </tr> <tr> <td>D</td> <td>$\Box A\to \Diamond A$</td> <td>serial: $(\forall w)(\exists v)wRv$</td> </tr> <tr> <td>H</td> <td>$\Box(\Box A\to B)\vee\Box(\Box B\to A)$</td> <td>$(wRu\wedge wRv)\to (vRu\vee uRv)$</td> </tr> <tr> <td>G</td> <td>$\Diamond\Box A\to \Box\Diamond A$</td> <td>convergent: $(wRu\wedge wRv)\to \exists x(vRx\wedge uRx)$</td> </tr> </tbody> </table> <p>Sometimes people use $\bigcirc$ for holding at the next step, and $\Diamond$ for eventually holding (i.e. the closure of $R$).</p> <p><a href="https://en.wikipedia.org/wiki/Linear_temporal_logic">Linear temporal logic (LTL)</a> is a modal logic system frequently used in language design and networks.</p> <h2 id="boolean-algebra-and-forcing">Boolean Algebra and Forcing</h2> <p>A boolean-valued model and a forcing relation can sometimes replace each other.</p> $p\Vdash A \iff p\leq \|A\|$ <p>where $\|A\|$ is the truth value in some boolean-valued model.</p> <p>An example is given in the Symmetries and Surgeries section.</p> <h1 id="introduction">Introduction</h1> <p>If we view routers as programs, they have the following types:</p> <ul> <li>Control plane: <code class="language-plaintext highlighter-rouge">(Config * Env) -&gt; FIB</code></li> <li>Data plane: <code class="language-plaintext highlighter-rouge">(FIB * Packet) -&gt; FwdResult</code></li> </ul> <h1 id="data-plane-verification">Data Plane Verification</h1> <p>Data plane static checking requires a snapshot of the current network. 
Its inputs are the ACLs and FIBs at a specific point in time.</p> <h2 id="header-space-analysis-hsa">Header Space Analysis (HSA)</h2> <p>A router can be abstracted as a transfer function that rewrites the packet and transfers it to the next hop. A function that rewrites a packet will map a regular open set to another. Therefore, we can simply track those sets.</p> <ul> <li>Computing reachability: Inject $*$ at a node A and then propagate.</li> <li>Finding loops: Let $T$ denote the function describing packets that start from A, pass through some nodes and come back to A. Check whether the following descending chain stabilizes at the empty set:</li> </ul> $[A_0 = *] \supseteq [A_1 = A_0\cap T(A_0)] \supseteq \cdots \supseteq [A_{i+1} = A_i\cap T(A_i)] \supseteq \cdots$ <h2 id="netplumber">NetPlumber</h2> <p>If we keep the necessary intermediate results, HSA can work incrementally. Assume the graph from node A to B is a DAG. We can push updates from the node that is modified.</p> <p>This paper also presents a language that describes the graph.</p> <h2 id="atomic-predicates">Atomic Predicates</h2> <p>Regular open sets are generated by principal ideals via $\vee$. Those ideals intersect with each other, and not every one of them is useful in a network. Compute a minimal set $M$ of principal ideals that can generate every regular open set used in HSA. Encode every element of $M$ with a number, and every regular open set as a $\vee$ of elements from $M$. The arithmetic is clear.</p> <h2 id="nod">NoD</h2> <p>NoD utilizes an SMT solver. It adds a new Select-Project operator and new ways of encoding. But it seems that the new encoding only works for binary strings of finite length.</p> <p>Instead of outputting all reachability relations, NoD checks beliefs specified by operators. It can provide all violations.</p> <h2 id="anteater">Anteater</h2> <p>It reduces the problem to SAT. The same logic expressions as in boolean-valued logic can be used.</p> <p>In its implementation, it uses LLVM-IR. 
So the user can write programs in C++, compile them with clang to IR and then with Anteater to SAT.</p> <h2 id="symmetries-and-surgeries">Symmetries and Surgeries</h2> <p>This is designed for fat-trees of data center networks. Since data center networks are symmetric, the network topology can be simplified. For example, if A1 and A2 are connected to exactly the same boxes and configured with exactly the same rules, then they can be reduced to a single box A. Even if boxes are not perfectly symmetric, we can still reduce rules. For example, use a regular open set to “cut” the network.</p> <h3 id="network-logic">Network logic</h3> <p>This work defines a modal logic similar to LTL, but specifically for networks. This is interesting, so I give a brief introduction here with slightly modified notation.</p> <p>In this work, a state is a packet $h$ staying at a specific interface $i$ of a node $n$, denoted as $h@n.i$ or $h@p$ with $p=(n,i)$. $h\Vdash P$ abbreviates $(\forall p)h@p\Vdash P$ (globally true for a packet) and $\Vdash P$ abbreviates $(\forall h)h\Vdash P$ (tautology).</p> <p>There are two kinds of transitions: internal and external. An internal transition $h@n.i\to h'@n'.i'$ holds iff $n$ rewrites $h$ to $h'$, and sends it to an interface connected to the next hop $n'.i'$. An external transition $h@n.i\twoheadrightarrow h'@n.i'$ holds iff $n$ rewrites $h$ to $h'$ and sends it to an external interface $i'$. (The paper writes $n'$ which I believe is a typo)</p> <p>There are two kinds of atomic propositions: $\alpha$ is a proposition describing $h$ and irrelevant to $p$: $h@p\Vdash \alpha$ iff $h\Vdash\alpha$. $@p$ is a proposition that holds for all packets at $p$: $h@p\Vdash @p$. The three modal operators are applied to different relations: $\Box$ and $\Diamond$ are applied to internal transitions, but $\bigcirc$ is applied to external ones only. 
That is</p> <ul> <li>$\Diamond P$ iff $P$ is currently true or true after the packet is forwarded to some internal port.</li> <li>$\Box P$ iff $P$ is currently true and always true in the internal network.</li> <li>$\bigcirc P$ iff the packet is going to be forwarded to an outgoing face, and $P$ is true there.</li> </ul> <p>For example:</p> <ul> <li>$\Vdash \alpha\wedge @p \to \Box @p$ means packets satisfying $\alpha$ are dropped at $p$.</li> <li>$h \Vdash @p \to \Diamond(\neg @p \wedge\Diamond @p)$ means $h$ loops back to $p$, with its header possibly rewritten.</li> <li>$\Vdash \alpha\wedge @p\to \Diamond(\neg @p \wedge\Diamond (\alpha\wedge@p))$ can detect infinite loops.</li> </ul> <p>$\Vdash$ is specific to a network. The paper proves that, for some specific $N, N', P$, $\Vdash_{N} P$ iff $\Vdash_{N'} P$. In that case $N'$ can replace $N$.</p> <h3 id="note-boolean-valued-model">Note: boolean-valued model</h3> <p>This can be turned into a boolean-valued model. For example, given the following network (addresses are 3 bits, $a_0a_1a_2$)</p> <div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                   0*  P1
                  +------
  P0    +-----+   |
 -------| Box |---+
        +-----+   |
                   1*  P2
                  +------
</code></pre></div></div> <p>and the proposition $A = (a_0\oplus a_1)\wedge @P_1$. Then, we have $001@P_1\Vdash \neg A$ and $010@P_1\Vdash A$.</p> <p>On the other hand, we can define a boolean-valued model and let</p> $\|A\| = (01*\vee 10*)\wedge P_1 = \langle 01*,P_1\rangle \vee \langle 10*,P_1\rangle$ <p>And we have $h@p\Vdash A$ iff $h@p$ belongs to the regular open set $\|A\|$.</p> <p>The operator $\Diamond$ now becomes a function on boolean values. For example, we previously had $010@P_0\Vdash \Diamond A$. 
In the boolean-valued model,</p> $\|\Diamond A\| = \Diamond(\|A\|) = \langle 01*,P_1\rangle \vee \langle 10*,P_1\rangle \vee \langle 01*,P_0\rangle$ <p>However, $\Diamond$ can be very hard to compute, so the boolean-valued model is not as useful here as it is in HSA.</p> <h1 id="data-plane-testing">Data Plane Testing</h1> <p>Data plane testing means sending packets with specific headers into the deployed network and verifying whether it works as expected.</p> <h2 id="atpg">ATPG</h2> <p>This work uses HSA to generate an all-pairs reachability table {Header Space, Ingress Port, Egress Port, Rule History}. In this table there are equivalence classes of packets (with ports). It picks one packet from each equivalence class. To further reduce the number of packets, it selects a subset of those packets that covers all rules in the network. The idea of rule coverage comes from path coverage in software verification. If packets are test cases, then rules are exactly <code class="language-plaintext highlighter-rouge">if</code>-branches.</p> <p>I think Atomic Predicates should also work here. We can pick one packet from each principal ideal, and then select a covering subset.</p> <h2 id="software-dataplane-verification">Software Dataplane Verification</h2> <p>This work only verifies the implementation of the software router Click. There is nothing specific to networking in it. I think it should be submitted to a software engineering or a compiler design conference.</p> <h1 id="control-plane">Control Plane</h1> <h2 id="bgp-rcc">BGP-RCC</h2> <p>This work does static checking of BGP. I don’t fully understand this work. Very abstractly, they parse router configurations, and check whether some beliefs (such as route validity and path visibility) hold.</p> <h2 id="batfish">Batfish</h2> <p>Batfish works as follows:</p> <ol> <li>Parse OSPF and BGP configuration from routers.</li> <li>Compute a data plane result from the configuration. 
<ul> <li>LogiQL, a Datalog variant, is used here.</li> </ul> </li> <li>Analyze the data plane.</li> <li>Report the result.</li> </ol> <h2 id="efficient-network-reachability-analysis-era">Efficient Network Reachability Analysis (ERA)</h2> <p>Data plane reachability depends on route reachability. A packet cannot reach B from A unless on every node in the path:</p> <ul> <li>There is a route from B that reaches A. <ul> <li>A route is an abstraction of a route advertisement, encoded as a bitvector.</li> </ul> </li> <li>There is no ACL that drops this packet.</li> </ul> <p>How a router handles routes is a program that works on bitvectors. We can model routes with BDD/ZDD, and let routers act on them.</p> <h2 id="synthesis-propane">Synthesis: Propane</h2> <p>Propane takes user input describing network topology and policies, compiles it into state machines, and generates the BGP configuration for every router. Propane enables centralized configuration and distributed implementation simultaneously.</p> <h2 id="verification-minesweeper">Verification: Minesweeper</h2> <p>Minesweeper parses router configuration and encodes routing protocols with SMT. Then, it can check whether user-specified beliefs are violated. Minesweeper scales but only returns one counterexample.</p> <h1 id="lessons-summarized">Lessons Summarized</h1> <p>As introduced in CS 216, an idea arises when an impacting stream hits the main stream. 
Through these works, we can clearly see the influences (or similarities) from other fields on networking:</p> <ul> <li>Programming languages and formal verification <ul> <li>Use of automated proof tools (Datalog, SMT)</li> <li>Logical modeling (boolean-valued model, LTL)</li> <li>Use of existing compilers (LLVM-IR)</li> <li>Use of existing data structures (BDD)</li> </ul> </li> <li>Software testing <ul> <li>Branch coverage -&gt; rule coverage</li> <li>Source-code annotation -&gt; beliefs</li> </ul> </li> <li>Hardware design <ul> <li>EDA -&gt; Synthesis</li> <li>Programmable hardware (FPGA) -&gt; SDN, P4</li> </ul> </li> <li>Algebra <ul> <li>Exploiting symmetries</li> </ul> </li> </ul> <p>More methods, tools and concepts from many different fields will be applied to networking. Algorithms and data structures can be used to develop routers. Dynamical systems and graph theory can be used to study packet dynamics. Software architecture has similarities with cloud/distributed architecture. Cryptography influences cyber security. I believe the networking field is still far from mature, and it will keep growing and borrowing from other fields.</p> <h1 id="references">References</h1> <ul> <li>Givant, S., and P. R. Halmos, 2009: Introduction to Boolean Algebras. Springer.</li> <li>Kazemian, P., G. Varghese, and N. McKeown, 2012: Header Space Analysis: Static Checking for Networks. 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12).</li> <li>Kazemian, P., et al., 2013: Real Time Network Policy Checking Using Header Space Analysis. 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13).</li> <li>Yang, H., and S. S. Lam, 2013: Real-time verification of network properties using Atomic Predicates. 2013 21st IEEE International Conference on Network Protocols (ICNP).</li> <li>Lopes, N. P., et al., 2015: Checking Beliefs in Dynamic Networks. 
12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15).</li> <li>Mai, H., et al., 2011: Debugging the data plane with anteater. SIGCOMM Comput. Commun. Rev. 41, 4 (August 2011), 290–301.</li> <li>Plotkin, G. D., et al., 2016: Scaling network verification using symmetry and surgery. SIGPLAN Not. 51, 1 (January 2016), 69–83.</li> <li>Zeng, H., P. Kazemian, G. Varghese, and N. McKeown, 2012: Automatic test packet generation. In Proceedings of the 8th international conference on Emerging networking experiments and technologies (CoNEXT ‘12), 241–252.</li> <li>Dobrescu, M. and K. Argyraki, 2014: Software Dataplane Verification. 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14).</li> <li>Feamster, N. and H. Balakrishnan, 2005: Detecting BGP configuration faults with static analysis. In Proceedings of the 2nd conference on Symposium on Networked Systems Design &amp; Implementation - Volume 2 (NSDI’05), 43–56.</li> <li>Fogel, A., et al., 2015: A General Approach to Network Configuration Analysis, 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15).</li> <li>Fayaz, S. K., et al., 2016: Efficient Network Reachability Analysis Using a Succinct Control Plane Representation, 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16).</li> <li>Beckett, R., et al., 2016: Don’t Mind the Gap: Bridging Network-wide Objectives and Device-level Configurations, SIGCOMM 2016.</li> <li>Beckett, R., et al., 2017: A General Approach to Network Configuration Verification. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication (SIGCOMM ‘17), 155–168.</li> </ul>Xinyu Maxinyuma@ucla.eduGeorge is a fantastic teacher with very attractive lectures. I think his secret sauces include the following: Look at everything from different views: Zoe (ζωή, big picture) versus Bios (βίος, details). 
Selectively focus on the most important techniques and examples, ignoring unnecessary points. Students can feel that they learned a lot without remembering too many boring concepts. Have his own methodology for the creative process. Students can experience those “Aha” moments when following his introduction.Authentication and Authorization2021-03-31T00:00:00-07:002021-03-31T00:00:00-07:00https://zjkmxy.github.io/posts/2021/03/authz-authn<p>Authentication (AuthN, 認証) and authorization (AuthZ, 承認) are important pieces in system security. In short, AuthN verifies the identity of the requester, and AuthZ decides whether a specific operation is allowed.</p> <h1 id="overview">Overview</h1> <p>Authentication is the process or action of verifying the identity of a user or process. Current web services usually allow a user to log in in multiple ways: username and password, password reset by email, Single Sign-On (SSO), etc. If 2-Factor Authentication (2FA) is enabled, the second factor can be a phone call, an SMS, a USB security key, or an authentication app. This makes authentication complicated enough to be a standalone component. Typically, there will be a login service that handles users’ login requests and verifies their identities against a user database. The login service passes a token to the authenticated user, which can be verified by other services.</p> <p>Authorization is the process by which an application determines whether the accessing user or service has the necessary permissions to access a resource or perform a given operation. This is the AM part of IAM (Identity and Access Management). Typically people configure it in one of two styles: RBAC (Role-based access control) or ABAC (Attribute-based access control).</p> <h1 id="openid-connect--oauth-20">OpenID Connect / OAuth 2.0</h1> <p>OpenID Connect (OIDC) is an authentication protocol built upon OAuth 2.0. 
It enables a client to verify the identity of a user based on the authentication performed by an authorization server.</p> <p>At a high level, the requester (called Relying Party, RP) and the authorization server (called OpenID Provider, OP) interact as follows:</p> <p><img class="mermaid" src="https://mermaid.ink/svg/eyJjb2RlIjoic2VxdWVuY2VEaWFncmFtXG5SUC0-PitPUDogKDEpIEF1dGhOIFJlcXVlc3Rcbk9QLS0-Pi1SUDogKDMpIEF1dGhOIFJlc3BvbnNlXG5PUC0-RW5kVXNlcjogKDIpIEF1dGhOICYgQXV0aFpcblJQLT4-K09QOiAoNCkgVXNlckluZm8gUmVxdWVzdFxuT1AtLT4-LVJQOiAoNSkgVXNlckluZm8gUmVzcG9uc2UiLCJtZXJtYWlkIjpudWxsfQ" /></p> <p>In this diagram, (1)-(3) are usually done by redirection, so the user is involved. (4) and (5) only involve communication between RP and OP.</p> <h3 id="1-authentication-request">(1) Authentication Request</h3> <p>This is usually done by a redirection. The following example from the OIDC specification is a request in which <code class="language-plaintext highlighter-rouge">client.example.org</code> asks <code class="language-plaintext highlighter-rouge">server.example.com</code> to verify the user.</p> <div class="language-http highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">HTTP</span><span class="o">/</span><span class="m">1.1</span> <span class="m">302</span> <span class="ne">Found</span> <span class="na">Location</span><span class="p">:</span> <span class="s">https://server.example.com/authorize?</span> <span class="s"> response_type=code</span> <span class="s"> &amp;scope=openid%20profile%20email</span> <span class="s"> &amp;client_id=s6BhdRkqt3</span> <span class="s"> &amp;state=af0ifjsldkj</span> <span class="s"> &amp;redirect_uri=https%3A%2F%2Fclient.example.org%2Fcb</span> </code></pre></div></div> <p>In this request:</p> <ul> <li><code class="language-plaintext highlighter-rouge">scope</code> specifies what fields are needed by RP.</li> <li><code class="language-plaintext highlighter-rouge">client_id</code> is the ID of RP recognized by OP, so that OP can tell the user which service is 
requesting.</li> <li><code class="language-plaintext highlighter-rouge">state</code> is a string opaque to OP. It will be bounced back in the response. It is said that RP can combine <code class="language-plaintext highlighter-rouge">state</code> with browser cookie to prevent Cross-Site Request Forgery.</li> <li><code class="language-plaintext highlighter-rouge">redirect_uri</code> is where the response is sent to.</li> <li><code class="language-plaintext highlighter-rouge">response_type</code> is the Authorization Flow used. This post will only introduce <code class="language-plaintext highlighter-rouge">code</code>.</li> </ul> <h3 id="3-authentication-response">(3) Authentication Response</h3> <div class="language-http highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">HTTP</span><span class="o">/</span><span class="m">1.1</span> <span class="m">302</span> <span class="ne">Found</span> <span class="na">Location</span><span class="p">:</span> <span class="s">https://client.example.org/cb?</span> <span class="s"> code=SplxlOBeZQQYbYS6WxSbIA</span> <span class="s"> &amp;state=af0ifjsldkj</span> </code></pre></div></div> <ul> <li><code class="language-plaintext highlighter-rouge">code</code> is some keyword recognized by OP. 
It may include info about the user or be merely a database index, depending on the implementation.</li> </ul> <h3 id="4-userinfo-request">(4) UserInfo Request</h3> <div class="language-http highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">POST</span> <span class="nn">/token</span> <span class="k">HTTP</span><span class="o">/</span><span class="m">1.1</span> <span class="na">Host</span><span class="p">:</span> <span class="s">server.example.com</span> <span class="na">Content-Type</span><span class="p">:</span> <span class="s">application/x-www-form-urlencoded</span> <span class="na">Authorization</span><span class="p">:</span> <span class="s">Basic czZCaGRSa3F0MzpnWDFmQmF0M2JW</span> grant_type=authorization_code&amp;code=SplxlOBeZQQYbYS6WxSbIA &amp;redirect_uri=https%3A%2F%2Fclient.example.org%2Fcb </code></pre></div></div> <ul> <li><code class="language-plaintext highlighter-rouge">Authorization</code> carries RP’s client credentials (client ID and client secret) using HTTP Basic authentication, so OP can verify the identity of RP.</li> <li><code class="language-plaintext highlighter-rouge">code</code> is copied from the previous response, and <code class="language-plaintext highlighter-rouge">redirect_uri</code> must match the one sent in the authentication request.</li> </ul> <h3 id="5-userinfo-response">(5) UserInfo Response</h3> <p>OP will respond with an OAuth 2.0 token.</p> <div class="language-http highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">HTTP</span><span class="o">/</span><span class="m">1.1</span> <span class="m">200</span> <span class="ne">OK</span> <span class="na">Content-Type</span><span class="p">:</span> <span class="s">application/json</span> <span class="na">Cache-Control</span><span class="p">:</span> <span class="s">no-store</span> <span class="na">Pragma</span><span class="p">:</span> <span class="s">no-cache</span> <span class="p">{</span><span class="w"> </span><span class="nl">"access_token"</span><span class="p">:</span><span class="w"> </span><span 
class="s2">"SlAV32hkKG"</span><span class="p">,</span><span class="w"> </span><span class="nl">"token_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Bearer"</span><span class="p">,</span><span class="w"> </span><span class="nl">"refresh_token"</span><span class="p">:</span><span class="w"> </span><span class="s2">"8xLOxBtZp8"</span><span class="p">,</span><span class="w"> </span><span class="nl">"expires_in"</span><span class="p">:</span><span class="w"> </span><span class="mi">3600</span><span class="p">,</span><span class="w"> </span><span class="nl">"id_token"</span><span class="p">:</span><span class="w"> </span><span class="s2">"eyJhbGciOiJSUzI1NiIsImtpZCI6IjFlOWdkazcifQ.ewogImlzc yI6ICJodHRwOi8vc2VydmVyLmV4YW1wbGUuY29tIiwKICJzdWIiOiAiMjQ4Mjg5 NzYxMDAxIiwKICJhdWQiOiAiczZCaGRSa3F0MyIsCiAibm9uY2UiOiAibi0wUzZ fV3pBMk1qIiwKICJleHAiOiAxMzExMjgxOTcwLAogImlhdCI6IDEzMTEyODA5Nz AKfQ.ggW8hZ1EuVLuxNuuIJKX_V8a_OMXzR0EHR9R6jgdqrOOF4daGU96Sr_P6q Jp6IcmD3HP99Obi1PRs-cwh3LO-p146waJ8IhehcwL7F09JdijmBqkvPeB2T9CJ NqeGpe-gccMg4vfKjkM8FcGvnzZUN4_KSP0aAp1tOJ1zZwgjxqGByKHiOtX7Tpd QyHE5lcMiKPXfEIQILVq0pc_E2DzL7emopWoaoZTF_m0_N0YzFC6g6EJbOEoRoS K5hoDalrcvRYLSrQAZZKflyuVCyixEoV9GfNQC3_osjzw2PAithfubEEBLuVVk4 XUVrWOLrLl0nx7RkKU8NXNHq-rvKMzqg"</span><span class="w"> </span><span class="p">}</span><span class="w"> </span></code></pre></div></div> <ul> <li><code class="language-plaintext highlighter-rouge">access_token</code> is a credential used to access protected resources. Usually opaque to RP.</li> <li><code class="language-plaintext highlighter-rouge">expires_in</code> is the expiration time of <code class="language-plaintext highlighter-rouge">access_token</code>. In the example, it is 1 hour.</li> <li><code class="language-plaintext highlighter-rouge">refresh_token</code> is the credential used to obtain a new access token when the current one expires. Usually opaque to RP. 
Unlike access tokens, refresh tokens are intended for use only with authorization servers and are never sent to resource servers. The refresh token also expires, but its lifetime is decided by the authorization server internally. It can vary from minutes to months.</li> <li><code class="language-plaintext highlighter-rouge">id_token</code> is usually a JWT containing some info about the user. The example JWT contains the following info:</li> </ul> <div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w"> </span><span class="nl">"alg"</span><span class="p">:</span><span class="w"> </span><span class="s2">"RS256"</span><span class="p">,</span><span class="w"> </span><span class="err">//</span><span class="w"> </span><span class="err">Signature</span><span class="w"> </span><span class="err">algorithm</span><span class="w"> </span><span class="nl">"kid"</span><span class="p">:</span><span class="w"> </span><span class="s2">"1e9gdk7"</span><span class="w"> </span><span class="err">//</span><span class="w"> </span><span class="err">Key</span><span class="w"> </span><span class="err">ID</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"iss"</span><span class="p">:</span><span class="w"> </span><span class="s2">"http://server.example.com"</span><span class="p">,</span><span class="w"> </span><span class="err">//</span><span class="w"> </span><span class="err">Issuer</span><span class="w"> </span><span class="err">website</span><span class="p">,</span><span class="w"> </span><span class="err">OP</span><span class="w"> </span><span class="nl">"sub"</span><span class="p">:</span><span class="w"> </span><span class="s2">"248289761001"</span><span class="p">,</span><span class="w"> </span><span class="err">//</span><span class="w"> </span><span class="err">Subject</span><span class="p">,</span><span 
class="w"> </span><span class="err">the</span><span class="w"> </span><span class="err">user</span><span class="w"> </span><span class="err">ID</span><span class="w"> </span><span class="nl">"aud"</span><span class="p">:</span><span class="w"> </span><span class="s2">"s6BhdRkqt3"</span><span class="p">,</span><span class="w"> </span><span class="err">//</span><span class="w"> </span><span class="err">Audience</span><span class="p">,</span><span class="w"> </span><span class="err">referring</span><span class="w"> </span><span class="err">to</span><span class="w"> </span><span class="err">RP</span><span class="w"> </span><span class="nl">"nonce"</span><span class="p">:</span><span class="w"> </span><span class="s2">"n-0S6_WzA2Mj"</span><span class="p">,</span><span class="w"> </span><span class="err">//</span><span class="w"> </span><span class="err">Nonce</span><span class="w"> </span><span class="nl">"exp"</span><span class="p">:</span><span class="w"> </span><span class="mi">1311281970</span><span class="p">,</span><span class="w"> </span><span class="err">//</span><span class="w"> </span><span class="err">Expiration</span><span class="w"> </span><span class="err">time:</span><span class="w"> </span><span class="mi">07</span><span class="err">/</span><span class="mi">21</span><span class="err">/</span><span class="mi">2011</span><span class="w"> </span><span class="mi">13</span><span class="err">:</span><span class="mi">59</span><span class="err">:</span><span class="mi">30</span><span class="w"> </span><span class="nl">"iat"</span><span class="p">:</span><span class="w"> </span><span class="mi">1311280970</span><span class="w"> </span><span class="err">//</span><span class="w"> </span><span class="err">Issued</span><span class="w"> </span><span class="err">time:</span><span class="w"> </span><span class="mi">07</span><span class="err">/</span><span class="mi">21</span><span class="err">/</span><span class="mi">2011</span><span class="w"> </span><span 
class="mi">13</span><span class="err">:</span><span class="mi">42</span><span class="err">:</span><span class="mi">50</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="err">//</span><span class="w"> </span><span class="err">Signature</span><span class="w"> </span><span class="err">Value</span><span class="w"> </span><span class="err">...</span><span class="w"> </span></code></pre></div></div> <p>Google Cloud users can use the <a href="https://developers.google.com/oauthplayground/">OAuth 2.0 Playground</a> to go through this procedure.</p> <h1 id="google-cloud-iam-rbac">Google Cloud IAM (RBAC)</h1> <p>Google Cloud IAM is an RBAC system.</p> <p>From the browser UI, we can see two tables: Member and Role. The Role table contains all roles on the server. A role is associated with a collection of permissions, which define what kind of access is granted to which resources. The Member table contains all users that have access to the Google Cloud project. A member is bound to one or more roles by policies. One IAM policy binds one or more members to one role.</p> <p>A member is usually of one of the following types:</p> <ul> <li>Google Account. Identified by its email address.</li> <li>Service Account. Identified by its name, which looks like an email address. For example, <code class="language-plaintext highlighter-rouge">299987776666-compute@developer.gserviceaccount.com</code>. There is no email service for this address. Service accounts are used by machines on the server. They are always authenticated by key pairs and access tokens issued by the server. No login is allowed.</li> <li>Google group. A group is a collection of Google accounts and service accounts. Identified by its email address. Groups are used to manage permissions when the number of users is large. They don’t have login credentials.</li> </ul> <h1 id="aws-iam-abac">AWS IAM (ABAC)</h1> <p>In ABAC, attributes (tags) are attached to identities (users or roles) and resources. 
ABAC policies can be designed to allow operations when the principal’s tag matches the resource tag. ABAC scales more easily because permissions on new resources are granted automatically through their tags.</p> <h1 id="practice-in-kubernetes">Practice in Kubernetes</h1> <h3 id="authentication">Authentication</h3> <p>Kubernetes avoids authentication in a clever way: it does not provide an authentication server. It has an API server that every request must go through. There are two kinds of identities recognized by the API server: users and service accounts. I only discuss the most common use case in this post.</p> <h4 id="users">Users</h4> <p>A Kubernetes cluster is built on some nodes, which are on-prem machines or virtual machines under the control of the network operator. It is reasonable to assume that there already exist some credentials to log in to those nodes. Therefore, Kubernetes asks that human users be authenticated via the existing identity provider and pass the resulting token to the API server.</p> <p>Google Cloud service accounts are treated as “human users” in a Kubernetes cluster.</p> <p><img class="mermaid" src="https://mermaid.ink/svg/eyJjb2RlIjoic2VxdWVuY2VEaWFncmFtXG5Vc2VyLT4-K0lkZW50aXR5UHJvdmlkZXI6IDEuIExvZ2luIHRvIElkUFxuSWRlbnRpdHlQcm92aWRlci0-Pi1Vc2VyOiAyLiBQcm92aWRlIGlkX3Rva2VuXG5Vc2VyLT4-K0t1YmVjdGw6IDMuIENhbGwga3ViZWN0bFxuS3ViZWN0bC0-PitBUElTZXJ2ZXI6IDQuIEF1dGhvcml6YXRpb246IEJlYXJlclxuQVBJU2VydmVyLT4-QVBJU2VydmVyOiA1LiBKV1QgU2lnbiB2YWxpZD9cbkFQSVNlcnZlci0-PkFQSVNlcnZlcjogNi4gSldUIGV4cGlyZWQ_XG5BUElTZXJ2ZXItPj5BUElTZXJ2ZXI6IDcuIFVzZXIgYXV0aG9yaXplZD9cbkFQSVNlcnZlci0-Pi1LdWJlY3RsOiA4LiBBdXRob3JpemVkOiBhY3Rpb25cbkt1YmVjdGwtPj4tVXNlcjogOS4gUmV0dXJuIHJlc3VsdCIsIm1lcm1haWQiOm51bGx9" /></p> <h4 id="service-accounts">Service accounts</h4> <p>Service accounts are usually used internally by Kubernetes pods. 
The API server will issue tokens to pods when it creates them.</p> <p>There is also a signed JWT associated with each service account, stored as a secret in the cluster. If a user wants to use it outside the cluster, they can copy its text. This is not that secure, because a plain-text secret may get leaked. As far as I know, these JWTs have a very long expiration time, so the operator needs to manually revoke leaked credentials.</p> <h3 id="authorization">Authorization</h3> <p>Kubernetes supports both RBAC and ABAC. By default it’s RBAC on Google Cloud. Kubernetes RBAC is very similar to Google Cloud IAM.</p> <p>Cloud native Kubernetes clusters are also managed by Cloud IAM. For example, a user needs to be granted permissions by both the K8s cluster and Google Cloud IAM. Cloud IAM cannot see resources in the cluster, so it works only at the granularity of operations. For example, <code class="language-plaintext highlighter-rouge">container.pods.get</code> allows a user to query the state of all pods, and we cannot specify which pod. One has to use K8s RBAC to grant permissions on a specific pod or namespace.</p> <h1 id="designs-in-ndn">Designs in NDN</h1> <p>NDN requires every Data packet to be signed, so authentication is merely certificate management. But authorization can be done in several styles:</p> <ol> <li>Have a distributed database containing RBAC/ABAC bindings. Keys or certificates only represent identity.</li> <li>A user must create a key pair and certificate for each role it is assigned to.</li> <li>(Hybrid) A user has key pairs associated with the identity only. But when they want to access some resource, they need to obtain a short-lived certificate signed under the namespace of the role.</li> </ol> <p>The trade-offs are obvious:</p> <ul> <li>(1) needs to maintain a distributed database. Sync is one problem. If there are constrained devices, storage is another problem.</li> <li>(2) needs to frequently issue and revoke certificates. 
Key management can be an issue.</li> <li>(3) requires the certificate manager server to have high availability.</li> </ul> <p>One always pays a cost for distribution. The question is how.</p> <h1 id="references">References</h1> <ul> <li><a href="https://openid.net/specs/openid-connect-core-1_0.html">OpenID Connect Core 1.0 incorporating errata set 1</a></li> <li><a href="https://kubernetes.io/docs/reference/access-authn-authz/authentication/">Kubernetes Authenticating</a></li> <li><a href="https://tools.ietf.org/html/rfc6749">RFC6749: The OAuth 2.0 Authorization Framework</a></li> <li><a href="https://developers.google.com/oauthplayground/">Google Cloud OAuth 2.0 Playground</a></li> </ul>Xinyu Maxinyuma@ucla.eduAuthentication (AuthN, 認証) and authorization (AuthZ, 承認) are important pieces in system security. In short, AuthN verifies the identity of the requester, and AuthZ decides whether a specific operation is allowed.MATH 220BC: Mathematical Logic - Course Review2021-01-01T00:00:00-08:002021-01-01T00:00:00-08:00https://zjkmxy.github.io/posts/2021/01/math-220bc<p>This is a review and summary of the course MATH 220BC, given by Professors Artem Chernikov and Andrew Marks.</p> <p>In this post, I list the theorems I consider important, dictionary-style, so that I can recall what I learnt when I read this in the future. 
Only the overview and summary may be useful to other people.</p> <aside class="sidebar__right"> <nav class="toc"> <header><h4 class="nav__title"><i class="fa fa-file-text"></i> On This Page</h4></header> <ul class="toc__menu" id="markdown-toc"> <li><a href="#overview" id="markdown-toc-overview">Overview</a></li> <li><a href="#basic-model-theory" id="markdown-toc-basic-model-theory">Basic Model Theory</a> <ul> <li><a href="#first-order-logic" id="markdown-toc-first-order-logic">First Order Logic</a></li> <li><a href="#completeness-and-soundness" id="markdown-toc-completeness-and-soundness">Completeness and Soundness</a></li> <li><a href="#compactness--up-and-down" id="markdown-toc-compactness--up-and-down">Compactness &amp; Up and Down</a></li> <li><a href="#algebraic-closed-field" id="markdown-toc-algebraic-closed-field">Algebraic Closed Field</a></li> </ul> </li> <li><a href="#recursion-theory" id="markdown-toc-recursion-theory">Recursion Theory</a> <ul> <li><a href="#primitive-recursive-functions" id="markdown-toc-primitive-recursive-functions">Primitive Recursive Functions</a></li> <li><a href="#partial-recursive-functions" id="markdown-toc-partial-recursive-functions">Partial Recursive Functions</a></li> <li><a href="#universal-functions" id="markdown-toc-universal-functions">Universal Functions</a></li> <li><a href="#recursively-enumerable-sets" id="markdown-toc-recursively-enumerable-sets">Recursively Enumerable Sets</a></li> </ul> </li> <li><a href="#models-of-arithmetic" id="markdown-toc-models-of-arithmetic">Models of Arithmetic</a> <ul> <li><a href="#decidability" id="markdown-toc-decidability">Decidability</a></li> <li><a href="#peano-arithmetic" id="markdown-toc-peano-arithmetic">Peano Arithmetic</a></li> <li><a href="#first-incompleteness-theorem" id="markdown-toc-first-incompleteness-theorem">First Incompleteness Theorem</a></li> <li><a href="#second-incompleteness-theorem" id="markdown-toc-second-incompleteness-theorem">Second Incompleteness 
Theorem</a></li> </ul> </li> <li><a href="#set-theory" id="markdown-toc-set-theory">Set Theory</a> <ul> <li><a href="#cumulative-hierarchy" id="markdown-toc-cumulative-hierarchy">Cumulative Hierarchy</a></li> <li><a href="#cardinal-arithmetic" id="markdown-toc-cardinal-arithmetic">Cardinal Arithmetic</a></li> <li><a href="#infinitary-combinatorics" id="markdown-toc-infinitary-combinatorics">Infinitary Combinatorics</a></li> <li><a href="#constructible-hierarchy" id="markdown-toc-constructible-hierarchy">Constructible Hierarchy</a></li> <li><a href="#forcing" id="markdown-toc-forcing">Forcing</a></li> </ul> </li> <li><a href="#summary" id="markdown-toc-summary">Summary</a></li> <li><a href="#references" id="markdown-toc-references">References</a></li> </ul> </nav> </aside> <h1 id="overview">Overview</h1> <p>This is the first math course I took at UCLA. Due to a scheduling conflict, I didn’t take 220A, but I read through an online note and memorized the important conclusions, so I could more or less keep up with parts B and C.</p> <p>220B is based on the textbook <em>A First Journey through Logic</em>. The book is too brief, but the professor’s lectures are clear. This part is more related to computer science, as we discussed computability and Turing machines. I heard that CS 181 covers something similar, but I think 220B is more on the math side.</p> <p>220C is the toughest course I have ever taken. The professors gave really great lectures, but the course material is very complex and counterintuitive. (Well, maybe because I don’t have <em>correct</em> intuitions.) I feel that this part is more related to analysis.</p> <h1 id="basic-model-theory">Basic Model Theory</h1> <p><a href="https://en.wikipedia.org/wiki/Model_theory">Model theory</a> works on formal theories and their models. Most of the following results depend on infinite models. 
For example, compactness does not hold if we require every model to be finite.</p> <h2 id="first-order-logic">First Order Logic</h2> <p><a href="https://en.wikipedia.org/wiki/First-order_logic">First order logic</a> is obtained by augmenting propositional logic with two quantifiers: the existential quantifier and the universal quantifier. These quantifiers range over the elements of the base set, but cannot quantify over formulas or relations. There are redundancies among the logical connectives and quantifiers. A minimal definition of a FOL <em>language</em> $\mathcal{L}$ will contain the following:</p> <ul> <li>Logical symbols (shared by all languages) <ul> <li>Equality relation: $=$</li> <li>Connectives: $\neg$, $\wedge$</li> <li>The existential quantifier: $\exists$</li> <li>The variable set: $v_n$, $n\in\mathbb{N}$</li> </ul> </li> <li>The <em>signature</em> of $\mathcal{L}$ <ul> <li>A set of constant symbols, $\mathcal{C}$.</li> <li>Sets of function symbols, $\mathcal{F}_n$ for $n$-ary functions.</li> <li>Sets of relation symbols, $\mathcal{R}_n$ for $n$-ary relations.</li> </ul> </li> </ul> <p>We can define <em>disjunction</em> $\vee$, <em>implication</em> $\rightarrow$, <em>equivalence</em> $\leftrightarrow$ and the <em>universal quantifier</em> $\forall$ based on these symbols.</p> <p>We can define <em>terms</em> and <em>formulas</em> of this language:</p> <ul> <li>A <em>term</em> is something that can be interpreted as some value. 
It can be one of the following: <ul> <li>A variable.</li> <li>A constant symbol.</li> <li>A function call: $f(t_1,\ldots, t_n)$, where $f\in\mathcal{F}_n$ and each $t_i$ is a term.</li> </ul> </li> <li>An <em>atomic formula</em> is some elementary proposition: <ul> <li>$t_1 = t_2$, with $t_1,t_2$ terms.</li> <li>$R(t_1, \ldots, t_n)$, where $R\in\mathcal{R}_n$ and each $t_i$ is a term.</li> </ul> </li> <li>A <em>formula</em> is built from atomic formulas in <strong>finitely</strong> many steps: <ul> <li>$\phi$, where $\phi$ is an atomic formula.</li> <li>$\neg\phi$, where $\phi$ is a formula.</li> <li>$\phi\wedge\chi$, where $\phi,\chi$ are formulas.</li> <li>$\exists x\phi$, where $\phi$ is a formula and $x$ is a variable.</li> </ul> </li> <li>We can define <em>free occurrence</em>, <em>bound occurrence</em>, <em>simultaneous substitution</em> just as we do in programming languages.</li> <li>A <em>sentence</em> is a formula with no free variables.</li> </ul> <p>A language decides which formulas we can write, but does not specify their meaning. 
An interpretation of this language is an $\mathcal{L}$-<em>structure</em> $\mathfrak{A}$:</p> <ul> <li>$\mathfrak{A}$ has a non-empty <em>base set</em> $A$.</li> <li>$\mathfrak{A}$ maps every constant to an element in $A$: $c^{\mathfrak{A}}\in A$ for all $c\in\mathcal{C}$.</li> <li>$\mathfrak{A}$ maps every $n$-ary function to a (well-defined) $n$-ary function on $A$: $f^{\mathfrak{A}}: A^n\to A$ for all $f\in\mathcal{F}_n$.</li> <li>$\mathfrak{A}$ maps every $n$-ary relation to a (well-defined) $n$-ary relation on $A$: $r^{\mathfrak{A}}\subseteq A^n$ for all $r\in\mathcal{R}_n$.</li> </ul> <p>Then, we can define semantics:</p> <ul> <li>An <em>assignment</em> is a function $\alpha$ that maps variables to the base set.</li> <li>Given a term $t$, a structure $\mathfrak{A}$ and an assignment $\alpha$, we can calculate the value of $t$, denoted $t^{\mathfrak{A}}[\alpha ]$, by substituting each variable with its assigned value.</li> <li>Similarly, we can decide the truth value of a formula. Denote $\mathfrak{A}\models \phi[\alpha ]$ if it is true.</li> <li>A formula is <em>universally valid</em>, i.e. a tautology, if it is true in all structures. <ul> <li>Propositional tautologies, equality axioms and quantifier axioms are always universally valid.</li> </ul> </li> </ul> <h2 id="completeness-and-soundness">Completeness and Soundness</h2> <p>An $\mathcal{L}$-<em>theory</em> $T$ is a set of $\mathcal{L}$-sentences. Its elements are called axioms. A $T$-<em>model</em> is an $\mathcal{L}$-structure where every axiom of $T$ is true.</p> <p>With respect to a theory $T$, any sentence $\phi$ falls into one of <strong>three</strong> cases:</p> <ul> <li>True in all $T$-models. Then, $\phi$ is called a $T$<strong>-theorem</strong>.</li> <li>False in all $T$-models. Equivalent to that $\neg\phi$ is a theorem.</li> <li>True in some models, but false in others. 
Then, $\phi$ is called <strong><a href="https://en.wikipedia.org/wiki/Independence_(mathematical_logic)">independent</a></strong> of $T$.</li> </ul> <p>A <em>formal proof</em> is a <strong>finite</strong> sequence of sentences, where each sentence is obtained by any of the following:</p> <ul> <li>An axiom of $T$.</li> <li>A predicate calculus tautology, an equality axiom, or a quantifier axiom.</li> <li>Modus Ponens (MP) from two previous sentences.</li> <li>Universal generalization from a previous sentence (from $\phi$ to $\forall x\phi$).</li> </ul> <p>If a formal proof from $T$ ends in $\phi$, then $T$ proves $\phi$: $T\vdash \phi$. A theory is <em>consistent</em> if it cannot prove $\phi\wedge\neg\phi$ for any $\phi$. The following theorems show that a sentence has a formal proof iff it’s true in all models.</p> <p><strong>Soundness</strong>: If $T\vdash\phi$, then $\phi$ is true in all $T$-models.</p> <p><a href="https://en.wikipedia.org/wiki/G%C3%B6del%27s_completeness_theorem"><strong>Gödel’s Completeness Theorem</strong></a> or <strong>Model existence theorem</strong>: Every consistent theory has a model. Hence, a sentence that is true in all $T$-models can always be proven from $T$.</p> <p>This is proven by constructing a model for an arbitrary consistent theory. See also: <a href="https://en.wikipedia.org/wiki/Witness_(mathematics)#Henkin_witnesses">Henkin witness</a></p> <p><a href="https://en.wikipedia.org/wiki/Craig_interpolation"><strong>Craig’s Interpolation Theorem</strong></a>: Let $\phi$ be a $\mathcal{L}_1$-sentence and $\psi$ be a $\mathcal{L}_2$-sentence. Suppose $\models \phi\to\psi$. Then, there exists a $\mathcal{L}_0 := \mathcal{L}_1\cap\mathcal{L}_2$ sentence $\rho$ s.t. 
$\models\phi\to\rho$ and $\models\rho\to\psi$.</p> <h2 id="compactness--up-and-down">Compactness &amp; Up and Down</h2> <p><a href="https://en.wikipedia.org/wiki/Compactness_theorem"><strong>Compactness Theorem</strong></a>: A theory is consistent if and only if every <strong>finite</strong> subset of it is consistent.</p> <p>This is equivalent to the completeness theorem, as <em>every formal proof must be finite</em>. On the other hand, we can also use <a href="https://en.wikipedia.org/wiki/Ultraproduct#%C5%81o%C5%9B's_theorem">Łoś’s Theorem</a> to construct a model directly.</p> <ul> <li>Two structures $\mathfrak{M}$ and $\mathfrak{N}$ are <em>elementarily equivalent</em>, $\mathfrak{M}\equiv\mathfrak{N}$, if they satisfy the same sentences.</li> <li>If $\mathfrak{M}\subseteq\mathfrak{N}$ and for every formula $\phi(x_1,\ldots, x_n)$ and $\bar{a}\in M^n$, $\mathfrak{M}\models\phi(\bar{a})$ iff $\mathfrak{N}\models\phi(\bar{a})$, then $\mathfrak{M}$ is an <em>elementary substructure</em> (or <em>submodel</em>) of $\mathfrak{N}$: $\mathfrak{M}\prec\mathfrak{N}$.</li> <li>Clearly, an elementary substructure is elementarily equivalent to its extension. But the converse is not true, i.e. substructure + elementarily equivalent ≠ elementary substructure. <ul> <li>For example, $\mathfrak{M} = (\mathbb{N}_+, &lt;)$ and $\mathfrak{N} = (\mathbb{N}, &lt;)$ are isomorphic and hence equivalent, but $\mathfrak{M}$ is not an elementary substructure of $\mathfrak{N}$. Consider $\phi(y) := \exists x(x&lt; y)$ and $y=1$.</li> </ul> </li> </ul> <p><a href="https://en.wikipedia.org/wiki/Elementary_equivalence#Tarski%E2%80%93Vaught_test"><strong>Tarski-Vaught Test</strong></a>: Suppose $\mathfrak{M}\subseteq \mathfrak{N}$. Consider a formula $\phi(x_1,\ldots, x_n,y)$ and $\bar{a}=(a_1,\ldots,a_n)\in M^n$. 
If, for all such $\phi$ and $\bar{a}$, whenever there exists some $u\in N$ satisfying $\phi(\bar{a},u)$ ($\mathfrak{N}\models\exists y\phi(\bar{a}, y)$), there also exists some $v\in M$ with $\mathfrak{N}\models\phi(\bar{a}, v)$, then $\mathfrak{M}\prec\mathfrak{N}$.</p> <p><strong>Downward <a href="https://en.wikipedia.org/wiki/L%C3%B6wenheim%E2%80%93Skolem_theorem">Löwenheim–Skolem theorem</a></strong>: Let $\mathfrak{M}$ be a $\mathcal{L}$-structure and $A\subseteq M$. Then, for any cardinal $\kappa$ s.t. $\max\{|\mathcal{L}|,|A|\}\leq \kappa\leq |M|$, there is an elementary submodel of $\mathfrak{M}$ of size $\kappa$ containing $A$. <br /> Note: <em>every language is at least countable</em>.</p> <p><em>Proof sketch</em>: For every existence formula $\phi:=\exists y\psi(x_1,\ldots, x_n,y)$, define a Skolem function $f_{\phi}: M^n\to M$ as follows: if there exists some $b$ s.t. $\psi(a_1,\ldots, a_n,b)$ is true, let $f_{\phi}(a_1,\ldots, a_n)$ be any such $b$; if no such $b$ exists, let $f_{\phi}(a_1,\ldots, a_n)$ be any element. Take a superset $S'\supseteq A$ s.t. $|S'| = \kappa$. Take the closure of $S'$ under $f_{\phi}$ for all possible $\phi$ (at most $|\mathcal{L}|$ many). By the Tarski-Vaught test, this closure is an elementary submodel.</p> <p><em>Note</em>: though this proof uses AC, there exists a proof without AC for the countable case.</p> <p><strong>Upward Löwenheim–Skolem theorem</strong>: Let $\mathfrak{M}$ be an infinite $\mathcal{L}$-structure. Then, for any cardinal $\kappa$ s.t. 
$\max\{|\mathcal{L}|,|M|\}\leq \kappa$, there is an elementary extension of $\mathfrak{M}$ of size $\kappa$.</p> <p><em>Proof idea</em>: Use the <a href="https://en.wikipedia.org/wiki/Elementary_diagram">diagram method</a>.</p> <p><strong><a href="https://en.wikipedia.org/wiki/Type_(model_theory)#The_omitting_types_theorem">Omitting Types Theorem</a></strong>:</p> <ul> <li>A <strong>partial n-type</strong> (in theory $T$) is a (usually infinite) set of $n$-variable formulas $\pi$ consistent with $T$. 
Formally, $T\cup\{\exists \bar{v}\bigwedge_{\phi\in\pi_k} \phi\}$ is consistent for every finite subset $\pi_k$.</li> <li>A model $\mathfrak{M}$ <strong>realizes</strong> $\pi$ if some tuple of $\mathfrak{M}$ satisfies all formulas of $\pi$ simultaneously; otherwise, $\mathfrak{M}$ <strong>omits</strong> $\pi$.</li> <li>$\pi$ is <strong>isolated</strong> if there is a single formula $\phi$, consistent with $T$, that implies all of $\pi$, i.e. $T\vdash \forall\bar{v}(\phi\to\psi)$ for all $\psi\in\pi$.</li> <li>A <strong>complete n-type</strong> is a maximal partial n-type $\pi$. That is, for every $n$-formula $\phi(\bar{v})$, either $\phi(\bar{v})\in\pi$ or $\neg\phi(\bar{v})\in\pi$.</li> </ul> <p>The theorem states that for a consistent theory $T$ in a countable language and any countably many non-isolated partial types, there exists a model of $T$ that omits all given types.</p> <p><strong><a href="https://en.wikipedia.org/wiki/Saturated_model">Saturated Model</a></strong>: Suppose there is an $\mathcal{L}$-structure $\mathfrak{M}$.</p> <ul> <li>For $A\subseteq M$, a <strong>partial n-type</strong> (<em>of</em> $\mathfrak{M}$) <strong>over</strong> $A$ is a set of $n$-formulas $\pi(\bar{x})$ in language $\mathcal{L}_A := \mathcal{L}\cup\{c_a: a\in A\}$, that is consistent with the theory of $\mathfrak{M}$. Formally, for every finite subset $\pi_k$, there is some $b\in M$ s.t. 
$\mathfrak{M}\models\pi_k(b)$.</li> <li>A <strong>complete n-type over</strong> $A$ is a maximal partial $n$-type over $A$.</li> <li>We define the terms <strong>realize</strong>, <strong>omit</strong> and <strong>isolate</strong> similarly to types of theories.</li> <li>$\mathfrak{M}$ is $\kappa$<strong>-saturated</strong> if for all $A$ with $|A|&lt; \kappa$, every 1-type over $A$ is realized.</li> <li>$\mathfrak{M}$ is <strong>recursively saturated</strong> if every recursively definable 1-type is realized.</li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Lindstr%C3%B6m%27s_theorem">Lindström’s Theorem</a></strong>: FOL is the strongest logic satisfying both compactness and the Downward Löwenheim–Skolem property.</p> <h2 id="algebraic-closed-field">Algebraically Closed Fields</h2> <p><strong><a href="https://en.wikipedia.org/wiki/Quantifier_elimination">Quantifier elimination</a></strong>: A structure or a theory admits <em>quantifier elimination</em> if every formula is equivalent to a quantifier-free formula. It suffices to check formulas of the form $\exists x\phi$ with $\phi$ quantifier-free: if every such formula is equivalent to a quantifier-free formula, then the theory or structure admits quantifier elimination.</p> <p><strong><a href="https://en.wikipedia.org/wiki/Algebraically_closed_field">Algebraically closed field</a></strong>: A field is <em>algebraically closed</em> if every non-constant polynomial has a root.</p> <p>The theory of algebraically closed fields, ACF, admits quantifier elimination.</p> <p><strong><a href="https://proofwiki.org/wiki/Lefschetz_Principle_(First-Order)">Lefschetz principle</a></strong>: For a $\mathcal{L}_{ring}$-sentence $\phi$, the following are equivalent:</p> <ul> <li>$\phi$ is true in some ACF of characteristic 0.</li> <li>$\phi$ is true in every ACF of characteristic 0.</li> <li>$\phi$ is true in ACFs of characteristic $p$ for arbitrarily large $p$. (i.e. 
for an infinite set of prime numbers)</li> <li>$\phi$ is true in ACFs of characteristic $p$ for sufficiently large $p$. (i.e. for all $p&gt;N$)</li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Ax%E2%80%93Grothendieck_theorem">Ax’s theorem</a></strong>: For all polynomial functions $f:\mathbb{C}^n\to\mathbb{C}^n$, if $f$ is injective, then $f$ is also surjective.</p> <h1 id="recursion-theory">Recursion Theory</h1> <p><a href="https://en.wikipedia.org/wiki/Computability_theory">Recursion theory</a> studies computability and computable functions.</p> <h2 id="primitive-recursive-functions">Primitive Recursive Functions</h2> <p><strong>Primitive recursive functions</strong>: A primitive recursive (p.r.) function is an $n$-ary function on natural numbers obtained by the following:</p> <ul> <li>Basic functions: <ul> <li>Successor $S := \lambda x. x+1$;</li> <li>Constant zero $C^0_0 := \lambda. 0$;</li> <li>Projection $P^n_j:=\lambda x_1\cdots x_n.x_j$</li> </ul> </li> <li>Composition of p.r. functions: <ul> <li>$f := g(h_1,\ldots, h_n)$</li> </ul> </li> <li>Recursion: <ul> <li>$f(x_1,\ldots, x_n, 0) := g(x_1,\ldots, x_n)$;</li> <li>$f(x_1,\ldots, x_n, y+1) := h(x_1,\ldots, x_n, y, f(x_1,\ldots, x_n,y))$</li> </ul> </li> </ul> <p>A subset of $\mathbb{N}^n$ is p.r. iff its characteristic function is p.r.</p> <p>One can prove that p.r. functions and sets are stable under:</p> <ul> <li>Boolean combinations ($\cap$, $\cup$, $\setminus$)</li> <li>Definition by cases</li> <li>Bounded sums and products</li> <li>Bounded $\mu$-operator: $f(\bar{x},z):=(\mu t\leq z)((\bar{x},t)\in X)$ <ul> <li>$f(\bar{x},z)$ is the smallest natural number $t\leq z$ s.t. $(\bar{x},t)\in X$; 0 if no such $t$ exists.</li> </ul> </li> <li>Bounded quantification.</li> </ul> <p>We can also define the following functions:</p> <ul> <li>Combine $n$ numbers into one. <ul> <li>Functions $\alpha_n, \beta^n_i$ s.t. 
$\beta^n_i(\alpha_n(x_1,\ldots, x_n)) = x_i$.</li> </ul> </li> <li>Encode finite sequences of natural numbers (<a href="https://en.wikipedia.org/wiki/G%C3%B6del_numbering">Gödel numbering</a>). <ul> <li>$\langle \cdot \rangle$ s.t. $\langle x_1,\ldots, x_n \rangle$ is a natural number</li> <li>$\text{lg}(\langle x_1,\ldots, x_n \rangle) = n$</li> <li>$(\langle x_1,\ldots, x_n \rangle)_i = x_i$</li> </ul> </li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Ackermann_function">Ackermann function</a></strong>: $\xi:\mathbb{N}^2\to\mathbb{N}$ defined as follows:</p> <ul> <li>$\xi_0(x) = \xi(0, x) := 2^x$;</li> <li>$\xi_y(0) = \xi(y, 0) := 1$;</li> <li>$\xi_{y+1}(x+1) = \xi(y+1, x+1) := \xi(y, \xi(y+1, x))$.</li> </ul> <p>This version is designed for convenience in proofs; it is not the version that people generally use. In this version, $\xi_0$ is exponentiation, $\xi_1$ is (basically) tetration, and $\xi_2$ is (basically) pentation.</p> <p>The finite powers of Ackermann functions, $\xi^k_n = \xi_n\circ\cdots\circ\xi_n$ ($k$ times), give bounds of p.r. functions.</p> <ul> <li>A function $f\in\mathcal{F}_1$ <em>dominates</em> $g\in\mathcal{F}_n$ if $g(\bar{x}) \leq f(\max(\bar{x},N))$ for some $N$.</li> <li>If $f$ is obtained using no more than $n$ applications of recursion, then there exists some $k\in\mathbb{N}$ s.t. 
$\xi_n^k$ dominates $f$.</li> </ul> <p>One can prove that all $\xi_n$ are p.r., but $\xi$ itself is not, because no single $\xi^k_n$ can dominate $\xi$.</p> <p>More generally, we cannot define a notion of “computable functions”, say a set $F$, with all of the following properties:</p> <ol> <li>$F$ is countable: $F = \{f_0, f_1,\ldots\}$.</li> <li>$F$ is closed under desired operations like recursion, composition, etc.</li> <li>There is a “computer”: $u\in F$, $u(n, \bar{x}) = f_n(\bar{x})$.</li> <li>All functions in $F$ are total: $f_i$ terminates on all inputs.</li> </ol> <p>This is because $\lambda n. u(n,n)+1$ is in $F$ by 2 and 3, yet it is not equal to any $f_i$. 
We cannot drop 2 and 3, and 1 is implied by 3. Thus, to properly define the concept of “computability”, we have to drop 4.</p> <h2 id="partial-recursive-functions">Partial Recursive Functions</h2> <p>A <em>partial function</em> from $\mathbb{N}^n\to\mathbb{N}$ is a function $f:A\to \mathbb{N}$ for some $A\subseteq\mathbb{N}^n$. We use $\mathcal{F}^*_n$ to denote the set of $n$-ary partial functions. We say $f$ <em>converges</em> at $p$, $f(p)\downarrow$, for $p\in A$; $f$ <em>diverges</em> at $p$, $f(p)\uparrow$, for $p\notin A$.</p> <p>The set of <em>partial recursive functions</em> is the smallest set $E$ satisfying the following:</p> <ul> <li>$E$ contains the basic functions.</li> <li>$E$ is stable under composition (of partial functions).</li> <li>$E$ is stable under recursion (of partial functions).</li> <li>$E$ is stable under (unbounded) $\mu$-operator: $g(\bar{x}) := \mu y(f(\bar{x},y)=0)$ is defined as <ul> <li>$g(\bar{x}) = z$ if $z$ is the minimal element s.t. $f(\bar{x},z) = 0$ and $f(\bar{x},y)$ is defined for all $y\leq z$.</li> <li>$g(\bar{x})\uparrow$ (undefined) otherwise.</li> </ul> </li> </ul> <p>A total function is <em>(total) recursive</em> if it is partial recursive.</p> <p>Recursive functions are computable functions. (<a href="https://en.wikipedia.org/wiki/Church%E2%80%93Turing_thesis">Church’s Thesis</a>) Every intuitively computable function is recursive.</p> <h2 id="universal-functions">Universal Functions</h2> <p><strong><a href="https://en.wikipedia.org/wiki/Universal_function">Universal Functions</a></strong>: There exists $\varphi^p\in\mathcal{F}^*_{p+1}$ s.t. every $f\in \mathcal{F}^*_{p}$ that is partial recursive equals some $\varphi^p_i = \lambda \bar{x}.\varphi^p(i,\bar{x})$. Moreover, $\varphi^p$ is partial recursive.</p> <p><strong><a href="https://en.wikipedia.org/wiki/Smn_theorem">s-m-n Theorem</a></strong>: For any natural numbers $m$ and $n$, there exists a p.r. 
function $s^m_n\in\mathcal{F}_{n+1}$ s.t.</p> $\varphi^{n+m}_i(x_1,\ldots,x_n,y_1,\ldots,y_m) = \varphi^{m}_{s^m_n(i,x_1,\ldots,x_n)}(y_1,\ldots,y_m)$ <p>In the programming world, this is <a href="https://en.wikipedia.org/wiki/Partial_application">partial application</a>, or <a href="https://en.wikipedia.org/wiki/Currying">currying</a>.</p> <p><strong><a href="https://en.wikipedia.org/wiki/Kleene%27s_recursion_theorem">Kleene’s Fixed Point Theorem</a></strong>: For any total recursive function $\alpha\in\mathcal{F}_1$ and $m &gt; 0$, there exists $i\in\mathbb{N}$ s.t. $\varphi^m_{i} = \varphi^m_{\alpha(i)}$.</p> <p><strong><a href="https://en.wikipedia.org/wiki/Kleene%27s_recursion_theorem#Application_to_elimination_of_recursion">Elimination of Recursion</a></strong>: The set of <em>total</em> recursive functions is the smallest set that</p> <ul> <li>Contains $S$, $C^0_0$, $P^n_i$;</li> <li>Contains $+$, $\cdot$, $1_=$, $1_&lt;$;</li> <li>Is stable under composition;</li> <li>Is stable under total $\mu$-operator. <ul> <li>Total $\mu$-operator is the unbounded $\mu$-operator but only applicable when it always converges.</li> </ul> </li> </ul> <p>In other words, recursion is only needed to define addition, multiplication and comparison.</p> <h2 id="recursively-enumerable-sets">Recursively Enumerable Sets</h2> <p><strong><a href="https://en.wikipedia.org/wiki/Recursively_enumerable_set">Recursively enumerable sets</a></strong> (r.e. sets) have many equivalent definitions. Suppose $S\subseteq\mathbb{N}^n$, then the following are equivalent:</p> <ul> <li>$S$ is either empty or the range of a total recursive function $f:\mathbb{N}\to\mathbb{N}^n$. <ul> <li>That is, there exists a recursive function that enumerates elements of $S$, but not necessarily in order.</li> </ul> </li> <li>$S$ is either empty or the range of a p.r. function.</li> <li>$S$ is a projection of some recursive set $T\subset\mathbb{N}^{n+1}$.</li> <li>$S$ is a projection of some p.r. 
set $T\subset\mathbb{N}^{n+1}$.</li> <li>$S$ is the domain of a partial recursive function $f: \mathbb{N}^n\to\mathbb{N}$.</li> </ul> <p>The set of r.e. sets is stable under intersection, union, projection, bounded universal quantification, and image (replacement) under recursive functions.</p> <p><strong>Theorem of the Complement</strong>: A set $X\subseteq\mathbb{N}^n$ is recursive if and only if both $X$ and $\mathbb{N}^n\setminus X$ are r.e.</p> <p><strong>The Halting Problem</strong>: The domain of $\varphi^n$ is an r.e. set but not recursive. The set of indices of Turing machines that halt on the empty input is r.e. but not recursive.</p> <p><strong><a href="https://en.wikipedia.org/wiki/Rice%27s_theorem">Rice’s Theorem</a></strong>: Suppose $\chi$ is a set of one-variable partial recursive functions that is neither empty nor full (i.e. it does not contain all of them). Then, the index set $I = \{i: \varphi^1_i\in\chi\}$ is not recursive.</p> <p><strong><a href="https://en.wikipedia.org/wiki/Kolmogorov_complexity">Kolmogorov complexity</a></strong>: Given $n\in\mathbb{N}$, let $K(n)$ be the least $i\in\mathbb{N}$ s.t. $\varphi^1_i(0) = n$. One can prove that there does not exist an infinite r.e. set $A\subseteq\mathbb{N}^2$ s.t. $K(n)=i$ for all $(n,i)\in A$.</p> <h1 id="models-of-arithmetic">Models of Arithmetic</h1> <p>In this part, we study the models of natural numbers. Since we can encode all programs into natural numbers, we can bring them into FOL formulas and try to formally define “provability”.</p> <h2 id="decidability">Decidability</h2> <p>If a language $\mathcal{L}$ has a finite signature, then we can assign each symbol a Gödel number, and encode every term, formula, and proof into a natural number.</p> <ul> <li>The sets of codes of formulas and of logical axioms of $\mathcal{L}$ are p.r. sets.</li> <li>There exist p.r. functions that simulate substitution of terms and formulas.</li> <li>The set of formal proofs $\mathrm{Prf}=\{(\#\#d, \#\phi): d\text{ is a formal proof of }\phi\}$ is a p.r. 
set.</li> <li>The set of tautologies $U=\{\#\phi:\ \vdash\phi\}$ is r.e.</li> </ul> <p>Given a $\mathcal{L}$-theory $T$,</p> <ul> <li>$T$ is <em>recursive</em> if the set of codes of axioms of $T$ is recursive.</li> <li>$T$ is <em>recursively</em> (or <em>effectively</em>) <em>axiomatizable</em> if there exists a recursive theory $T'$ that has the same theorems as $T$.</li> <li>$T$ is <em>decidable</em> if the set of codes of theorems $\#\mathrm{Thm}(T)$ of $T$ is recursive.</li> <li>If $T$ is recursive, then the set of formal proofs of $T$, $\mathrm{Prf}(T)$, is recursive.</li> <li>$T$ is effectively axiomatizable iff $\#\mathrm{Thm}(T)$ is r.e.</li> <li>If $T$ is effectively axiomatizable and complete, then $T$ is decidable.</li> <li>If $T$ is decidable, then the theory of $T$ plus a finite set of axioms is also decidable.</li> </ul> <h2 id="peano-arithmetic">Peano Arithmetic</h2> <p>We work in the language of arithmetic $\mathcal{L}_{ar} = \{0,S,+,\cdot, &lt; \}$. In this language, we can express any (standard) natural number $n\in\mathbb{N}$ as $\underline{n}$:</p> <ul> <li>$\underline{0} \equiv 0$;</li> <li>$\underline{n+1} \equiv S(\underline{n})$.</li> </ul> <p>The <strong>weak Peano axioms</strong> form the finite theory $\mathrm{PA}_0$ consisting of the following 8 axioms:</p> <ol> <li>$0$ has no predecessor. <ul> <li>(A1) $\forall x \neg Sx=0$</li> </ul> </li> <li>Every non-zero number has a predecessor. <ul> <li>(A2) $\forall x\exists y(\neg x=0\to Sy=x)$</li> </ul> </li> <li>Two numbers with equal successors are equal. <ul> <li>(A3) $\forall x\forall y(Sx=Sy\to x=y)$</li> </ul> </li> <li>$0$ is the identity of addition. <ul> <li>(A4) $\forall x\ x+0 = x$</li> </ul> </li> <li>Successor commutes with addition. <ul> <li>(A5) $\forall x\forall y\ x+Sy =S(x+y)$</li> </ul> </li> <li>$0$ is the zero of multiplication. <ul> <li>(A6) $\forall x\ x\cdot 0 = 0$</li> </ul> </li> <li>Multiplication is distributive over successor. 
<ul> <li>(A7) $\forall x\forall y\ x\cdot Sy =x\cdot y + x$</li> </ul> </li> <li>“Less” means “subtractable”. <ul> <li>(A8) $\forall x\forall y(x&lt; y\leftrightarrow \neg x=y\wedge (\exists z\ x+z=y))$</li> </ul> </li> </ol> <p>The <strong><a href="https://en.wikipedia.org/wiki/Peano_axioms">Peano axioms</a></strong> form the infinite theory PA, consisting of $\mathrm{PA}_0$ and the <em>axiom (schema) of induction</em>.</p> <ul> <li>There exists a model of $\mathrm{PA}_0$ in which addition and multiplication are neither commutative nor associative, and $&lt;$ is not a total order.</li> <li>The standard natural numbers $\mathbb{N}$ are isomorphic to an initial segment of any $\mathrm{PA}_0$ model.</li> <li>In a model of PA: <ul> <li>Addition and multiplication are commutative and associative.</li> <li>Multiplication is distributive w.r.t. addition.</li> <li>$&lt;$ is a total order.</li> </ul> </li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Arithmetical_hierarchy">Arithmetical Hierarchy</a></strong>: Every FOL formula $\phi$ can be classified into the following hierarchy:</p> <ul> <li>$\phi$ is $\Sigma_0 = \Pi_0 = \Delta_0$ if it is equivalent to a formula with only <em>bounded</em> quantifiers;</li> <li>$\phi$ is $\Sigma_{n+1}$, if it is equivalent to a formula in the form $\exists x_1\cdots\exists x_m\psi$, where $\psi$ is $\Pi_{n}$;</li> <li>$\phi$ is $\Pi_{n+1}$, if it is equivalent to a formula in the form $\forall x_1\cdots\forall x_m\psi$, where $\psi$ is $\Sigma_{n}$;</li> <li>$\phi$ is $\Delta_{n}$ if it is both $\Sigma_{n}$ and $\Pi_{n}$.</li> <li>Sometimes people write $\Sigma^0_n$, $\Pi^0_n$ and $\Delta^0_n$. 
The superscript $0$ means counting <em>number quantifiers</em> instead of <em>function quantifiers</em>.</li> </ul> <p>Any $\Sigma_1$ sentence satisfied in $\mathbb{N}$ is a theorem of $\mathrm{PA}_0$.</p> $\mathbb{N}\models\phi[m_1,\ldots, m_n] \Rightarrow \mathrm{PA}_0\models\phi(\underline{m_1},\ldots, \underline{m_n})$ <p><strong>Representability Theorem</strong>: Every total recursive function is represented by a $\Sigma_1$ formula. A set is r.e. iff there is a $\Sigma_1$ formula defining it in $\mathbb{N}$. Here, $\phi$ “represents” $f$ means that for all $n_1,\ldots,n_p\in\mathbb{N}$:</p> $\mathrm{PA}_0\models \forall y\left[ \phi(\underline{n_1},\ldots,\underline{n_p},y) \leftrightarrow y = \underline{f(n_1,\ldots, n_p)} \right]$ <p><strong><a href="https://en.wikipedia.org/wiki/Overspill">Overspill</a></strong>: Let $\mathfrak{M}$ be a <em>non-standard</em> model of PA and $\phi(x)$ a formula. If for all $n\in\mathbb{N}$, $\mathfrak{M}\models\phi(\underline{n})$, then there exists $c\in M$ non-standard s.t. 
$\mathfrak{M}\models\phi(c)$.</p> <p><strong><a href="https://en.wikipedia.org/wiki/Pigeonhole_principle">Pigeonhole Principle</a></strong>: For any PA model $\mathfrak{M}$, $\mathcal{L}_{ar}(M)$-formula $\theta(v,z)$ and $a\in M$:</p> $\mathfrak{M}\models \left[ \forall x(\exists z&gt;x)(\exists v&lt; a)\theta(v,z) \right] \to (\exists v&lt; a)\forall x(\exists z&gt;x)\theta(v,z)$ <p>In other words, if a predicate maps infinitely many numbers ($z&gt; x$) into a finite number of elements ($v&lt; a$), then there exists a fixed number ($v$) that is the image of infinitely many numbers.</p> <p><strong>MacDowell-Specker Theorem</strong>: Any countable model of PA has an elementary end extension.</p> <h2 id="first-incompleteness-theorem">First Incompleteness Theorem</h2> <p><strong>The Diagonal Argument</strong>: For every formula $\phi(v)$, there exists a sentence $\Delta_{\phi}$ s.t. 
$\phi(\underline{\#\Delta_{\phi}})$ is equivalent to $\neg\Delta_{\phi}$ in $\mathrm{PA}_0$.</p> <p><strong><a href="https://en.wikipedia.org/wiki/Diagonal_lemma">Fixed Point Theorem</a></strong>: For all $\phi(v)$, there exists a sentence $\psi$ s.t.</p> $\mathrm{PA}_0\vdash \phi(\underline{\#\psi})\leftrightarrow\psi$ <p><em>Proof</em>: Take $\psi :\equiv \Delta_{\neg\phi}$.</p> <p><strong><a href="https://en.wikipedia.org/wiki/Tarski%27s_undefinability_theorem">Tarski’s Theorem on the Non-definability of Truth</a></strong>: In any model $\mathfrak{M}$ of $\mathrm{PA}_0$, there does not exist a formula $S(v)$ s.t. for all sentences $\phi$, $\mathfrak{M}\models\phi$ iff $\mathfrak{M}\models S(\underline{\#\phi})$.</p> <p><em>Corollary</em>: The theory of $\mathbb{N}$ is undecidable.</p> <p><strong><a href="https://mathworld.wolfram.com/ChurchsTheorem.html">Church’s Theorem</a></strong>: Any consistent theory containing $\mathrm{PA}_0$ is undecidable.</p> <p><em>Corollary 1</em>: The set of tautologies of the finite language $\mathcal{L}_{ar}$ is undecidable.</p> <p><em>Corollary 2</em>: The set of tautologies of a language that consists of only one binary relation is undecidable.</p> <p><em>Corollary 3</em>: The set of tautologies of a language that consists of a unary predicate and a constant is decidable.</p> <p><strong><a href="https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_theorems#First_incompleteness_theorem">Gödel’s First Incompleteness Theorem</a></strong>: Any consistent and recursive theory containing $\mathrm{PA}_0$ is incomplete.</p> <p><strong><a href="https://en.wikipedia.org/wiki/Rosser%27s_trick">Rosser’s Variant</a></strong>: Rosser writes down an independent sentence that witnesses this theorem. Let $T\supseteq \mathrm{PA}_0$ be consistent and recursive. Let $P_T(x,y)$ represent the set of formal proofs of $T$, i.e. $y$ proves $x$. Let $\nu(x, y)$ represent the negation of a formula, i.e. $y$ is $\neg x$. 
Set:</p> $P^R_T(x,y) :\equiv P_T(x,y)\wedge\neg(\exists z\leq y)\exists u(P_T(u,z)\wedge \nu(x,u))$ <p>That means,</p> <ul> <li>for standard numbers $x$ and $y$, $y$ proves $x$;</li> <li>for non-standard numbers $y$, there does not exist a proof $z$ for the negation of $x$.</li> </ul> <p>Then, $\Delta_T^R :\equiv \Delta_{\exists y P^R_T(x,y)}$ is independent.</p> <p><em>Remark</em>: If we directly use $P_T$, as in $\Delta_T :\equiv \Delta_{\exists y P_T(x,y)}$, then $T$ cannot prove $\Delta_T$, but $T$ may prove $\neg\Delta_T$. $P_T(\#\Delta_T, y)$ does not imply $T\vdash \Delta_T$ when $y$ is non-standard, so there is no contradiction.</p> <h2 id="second-incompleteness-theorem">Second Incompleteness Theorem</h2> <p>Though general satisfiability is undecidable, if we limit the complexity of formulas, we can get better-behaved results. One example is the satisfiability of $\Sigma_1$ formulas.</p> <p><strong>Provably Total $\Sigma_1$ Functions</strong>: Suppose $\chi_f$ represents $f$. If PA proves that for any (even non-standard) numbers $x_1,\ldots, x_n$, there is exactly one $y$ that satisfies $\chi_f$, then $f$ is <em>provably total</em> $\Sigma_1$. Formally,</p> $\text{PA}\vdash \forall x_1,\ldots,x_n\exists !y\ \chi_f(x_1,\ldots, x_n,y)$ <ul> <li>Every p.r. function is provably total $\Sigma_1$.</li> <li>Not every recursive function is provably total $\Sigma_1$ (though representable). <ul> <li>There is a universal provably total $\Sigma_1$ function $g\in\mathcal{F}_3$, in the sense that: for every provably total $\Sigma_1$ function $f$ there exist $a,b\in\mathbb{N}$ s.t. $f = \lambda n.g(a,b,n)$.</li> </ul> </li> </ul> <p><strong>Definability of Satisfiability for $\Sigma_1$ Formulas</strong>: There exists a $\Sigma_1$-formula $\mathrm{Sat}(v)$ s.t. 
for every $\Sigma_1$-sentence $\phi$,</p> $\mathrm{PA}\vdash \phi\leftrightarrow \mathrm{Sat}(\underline{\#\phi})$ <p>The proof uses the <a href="https://en.wikipedia.org/wiki/Certificate_(complexity)">certificate</a> method.</p> <p><strong>Definition of Provability</strong>: For a recursive theory $T$, there exists an operator $\Box_T$ on sentences s.t.</p> <ul> <li>If $\phi$ is a $\Sigma_1$-sentence, $\text{PA}\vdash \phi\to\Box_T\phi$;</li> <li>For any sentence $\psi$, $\mathbb{N}\models \Box_T\psi \iff T\vdash \psi$.</li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/L%C3%B6b%27s_theorem">Löb’s Axioms</a></strong>: for any sentences $\phi,\psi$,</p> <ol> <li>(Necessitation Rule, N) $T\vdash \phi \Rightarrow T\vdash \Box_T\phi$;</li> <li>(Distribution Axiom, K) $T\vdash [\Box_T\phi\wedge \Box_T(\phi\to\psi)]\to \Box_T\psi$;</li> <li>(4) $T\vdash \Box_T\phi\to\Box_T\Box_T\phi$.</li> </ol> <p><em>Corollaries</em>:</p> <ul> <li>If $T\vdash \phi\to\psi$, then $T\vdash \Box_T\phi\to\Box_T\psi$;</li> <li>(GL) $T\vdash \Box_T(\Box_T\phi\to\phi)\to\Box_T\phi$;</li> <li>If $T\vdash \Box_T\phi\to\phi$, then $T\vdash \Box_T\phi$;</li> <li>If $T\vdash \neg\Box_T\phi$, then $T\vdash \Box_T\phi$.</li> </ul> <p>See also: <a href="https://plato.stanford.edu/entries/logic-modal/">Modal Logic</a>, <a href="https://plato.stanford.edu/entries/logic-provability/">Provability Logic</a>, <a href="https://en.wikibooks.org/wiki/Logic_for_Computer_Scientists/Modal_Logic/Kripke_Semantics">Kripke Semantics</a>.</p> <p><strong><a href="https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_theorems#Second_incompleteness_theorem">Gödel’s Second Incompleteness Theorem</a></strong>: Let $T$ be a consistent computable theory containing PA. Then, it is impossible to prove $\neg\Box_T\phi$ in $T$ for any sentence $\phi$. More specifically, let $\text{Con}_T$ denote the consistency of $T$: $\text{Con}_T :\equiv \neg\Box_T(0=1)$. 
Then, $T$ cannot prove $\text{Con}_T$.</p> <p><strong><a href="https://en.wikipedia.org/wiki/Tennenbaum%27s_theorem">Tennenbaum’s Theorem</a></strong>: There is no recursive non-standard model of PA. That is, for any countable non-standard model of PA, there is no way to code the elements of the model as (standard) natural numbers such that either the addition or multiplication operation of the model is computable on the codes.<br /> See also: <a href="https://en.wikipedia.org/wiki/Non-standard_model_of_arithmetic#Structure_of_countable_non-standard_models">Non-standard model of arithmetic</a>, <a href="https://www.lesswrong.com/posts/i7oNcHR3ZSnEAM29X/standard-and-nonstandard-numbers">Standard and Nonstandard Numbers</a>.</p> <h1 id="set-theory">Set Theory</h1> <p>Axiomatic set theory studies models of set theory. Since the whole world of logic is built upon set theory, we cannot find a structure like $\mathbb{N}$ and define it as a “standard model”. However, assuming the consistency of the theory, there are models developed based on the universe that we are in.</p> <h2 id="cumulative-hierarchy">Cumulative Hierarchy</h2> <p><strong>Language of Set Theory</strong>: $\mathcal{L}_{set} = \{\in \}$. Other symbols can be developed as abbreviations of $\mathcal{L}_{set}$-formulas.</p> <p><strong><a href="https://en.wikipedia.org/wiki/Axiom_schema_of_specification">The Axiom Schema of Comprehension</a></strong>: The only axiom for <em>naive</em> set theory is that every formula $\phi$ has a set corresponding to it:</p> $\forall\bar{w}\exists y\forall z\ (z\in y\leftrightarrow \phi(z, \bar{w}))$ <p><strong><a href="https://en.wikipedia.org/wiki/Cantor%27s_diagonal_argument">Cantor’s Diagonal Argument</a></strong>: For every set $X$, there is no surjection $f: X\to\mathcal{P}(X)$.</p> <ul> <li>Cantor’s paradox: Let $V = \{x: x = x\}$ be the collection of all sets. 
Then, $\mathcal{P}(V)\subseteq V$, which contradicts Cantor’s theorem.</li> </ul> <p><strong><a href="https://plato.stanford.edu/entries/russell-paradox/">Russell’s Paradox</a></strong>: Let $D = \{x: x\notin x\}$. Then, $D$ is not a set. This proves that the axiom schema of comprehension is inconsistent.</p> <p><strong><a href="https://en.wikipedia.org/wiki/Zermelo%E2%80%93Fraenkel_set_theory">The Axioms of ZF</a></strong>: To avoid the inconsistency, the construction of sets must be limited. Basic axioms of axiomatic set theory include the following. Models of set theory generally satisfy all or a major part of them:</p> <ul> <li><strong>The axiom of Extensionality</strong>: Every set is determined by its members. <ul> <li>$\forall x\forall y[x=y \leftrightarrow \forall z(z\in x \leftrightarrow z\in y)]$.</li> </ul> </li> <li><strong>The axiom of Foundation</strong>: Every non-empty set contains an $\in$-minimal element. <ul> <li>Let $x\neq \varnothing$ abbreviate $\exists y(y\in x)$.</li> <li>$\forall x[x\neq\varnothing \to \exists y\in x\forall z\in x(z\notin y)]$</li> </ul> </li> <li><strong>The axiom of Pairing</strong>: Given two sets, there is a set containing exactly them. <ul> <li>$\forall x\forall y\exists w[x\in w\wedge y\in w\wedge \forall z(z\in w\to z=x\vee z=y)]$.</li> <li>We can let $\{x,y\}$ denote this set, and let $\{x\}$ denote $\{x,x\}$.</li> <li>We can define the ordered pair $(x,y) = \{\{x\}, \{x, y\}\}$.</li> </ul> </li> <li><strong>The axiom of Union</strong>: Given any set of sets $x$, there is a set containing exactly the elements of these sets. <ul> <li>Let $y=\bigcup x$ denote $\forall z[z\in y\leftrightarrow \exists w\in x(z\in w)]$.</li> <li>$\forall x\exists y[y = \bigcup x]$.</li> </ul> </li> <li><strong>The axiom of Nullset</strong>: There is a set with no elements. <ul> <li>$\exists x[x = \varnothing]$</li> </ul> </li> <li><strong>The axiom of Infinity</strong>: There exists an <strong>inductive</strong> set. 
<ul> <li>$\exists x[\varnothing\in x\wedge\forall y(y\in x\to y\cup\{y\}\in x) ]$.</li> </ul> </li> <li><strong>The axiom of Powerset</strong>: For every set $x$, there is a set containing all the subsets of this set. <ul> <li>Let $y\subseteq x$ abbreviate $\forall z(z\in y\to z\in x)$.</li> <li>Let $y=\mathcal{P}(x)$ abbreviate $\forall z(z\in y\leftrightarrow z\subseteq x)$.</li> <li>$\forall x\exists y[ y=\mathcal{P}(x)]$</li> </ul> </li> <li><strong>The axiom schema of Separation</strong>: Every definable subset of a set exists. <ul> <li>$\forall x,\bar{w}\exists y\forall z[z\in y\leftrightarrow z\in x\wedge\phi(z, \bar{w})]$</li> <li>The collection defined by a formula is a <strong>class</strong> $Y = \{z: \phi(z,\bar{w})\}$.</li> <li>Let $z\in Y$ abbreviate $\phi(z,\bar{w})$.</li> <li>Note that a <em>proper</em> class $Y$ “does not exist”: $\neg \exists y\forall x[x\in y\leftrightarrow x\in Y]$.</li> <li>We can define cartesian products, relations and functions with this axiom schema.</li> </ul> </li> <li><strong>The axiom schema of Collection</strong>: If $x$ is a set, and $\phi$ defines a class from each element of $x$, then there is a set which meets all these classes that are non-empty. <ul> <li>$\forall x,\bar{v}\exists y\forall z\in x[ \exists w\phi(w,z,\bar{v})\to \exists w\in y\phi(w,z,\bar{v})]$.</li> </ul> </li> <li><strong>The axiom schema of Replacement</strong>: The image of a set under a definable class function is a set. <ul> <li>$\forall x,\bar{w}[((\forall a\in x)(\exists! y)\phi(a,y,\bar{w}))\to \exists z \forall y(y\in z\leftrightarrow (\exists a\in x)\phi(a,y,\bar{w})) ]$.</li> <li>This is equivalent to Separation + Collection in ZF-(Sep+Col).</li> </ul> </li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Axiom_of_choice">Axiom of Choice (AC)</a></strong>: Every set of pairwise disjoint nonempty sets has a “choice set”.</p> <p>AC has many equivalent theorems. AC also has some pathological results.
However, without a choice axiom, one cannot prove that a countable union of countable sets is countable. People developed ZF+<a href="https://en.wikipedia.org/wiki/Axiom_of_dependent_choice">DC</a>+<a href="https://en.wikipedia.org/wiki/Axiom_of_determinacy">AD</a> to study models of set theory in which AC fails; AD is incompatible with AC.</p> <p><strong><a href="https://en.wikipedia.org/wiki/Well-order">Wellordering</a></strong>:</p> <ul> <li>A <strong>strict partial order</strong> is an irreflexive and transitive binary relation.</li> <li>A strict partial order is <strong>linear</strong> if any two distinct elements are comparable.</li> <li>A strict partial order is <strong>wellfounded</strong> if every non-empty subset contains a minimal element.</li> <li>A <strong>wellordering</strong> is a linear wellfounded strict partial order.</li> <li><em>Theorem</em>: Given two wellorderings, either the two are isomorphic, or one is isomorphic to an initial segment of the other. Furthermore, the isomorphism is unique.</li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Ordinal_number">Ordinals</a></strong>:</p> <ul> <li>A set $x$ is <strong>transitive</strong> if for all $a\in x$, $b\in a$ implies $b\in x$. <ul> <li>Note: $\in$ is not necessarily a transitive relation on $x$. For example, $\{\varnothing, \{\varnothing\}, \{\{\varnothing\}\}\}$.</li> </ul> </li> <li>An <strong>ordinal</strong> is a transitive set $\alpha$ s.t. $\in$ on $\alpha$ is a wellordering.</li> <li>Let $\text{ORD}$ denote the class of ordinals.</li> <li>Define $\alpha &lt; \beta$ iff $\alpha \in \beta$. Then, $&lt;$ is a wellordering on $\text{ORD}$.</li> <li>Any wellordered set $P$ is isomorphic to a unique ordinal, called the <strong>order type</strong> of $P$.</li> <li>Let $\alpha+1 = \alpha\cup\{\alpha\}$.
<ul> <li>If $\alpha = \beta+1$ for some $\beta$, $\alpha$ is a <strong>successor ordinal</strong>;</li> <li>Otherwise, $\alpha$ is a <strong>limit ordinal</strong>.</li> </ul> </li> <li>Let $\omega$ be the intersection of all inductive sets. Then, $\omega$ is the least non-zero limit ordinal. The elements of $\omega$ are <strong>natural numbers</strong>. Let $0 := \varnothing$.</li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Transfinite_induction">Transfinite Induction</a></strong>:</p> <ul> <li>A class $C$ of ordinals is equal to ORD if it satisfies the following: <ul> <li>$0\in C$;</li> <li>For all $\alpha$, $\alpha\in C\to \alpha+1\in C$;</li> <li>If $\lambda$ is a (non-zero) limit ordinal, $((\forall\alpha&lt;\lambda)\ \alpha\in C)\to\lambda\in C$.</li> </ul> </li> <li>Let $G$ be a class function. Then, there is a unique class function $F$ s.t. for all $\alpha\in\text{ORD}$, $F(\alpha) = G(F\restriction\alpha)$. <ul> <li><em>Note</em>: Generally, people prove uniqueness first, and then existence.</li> </ul> </li> <li>We can define ordinal addition, multiplication and exponentiation by recursion.</li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Ordinal_arithmetic#Cantor_normal_form">Cantor normal form</a></strong>: For any ordinal $\alpha$, there <em>uniquely</em> exist a natural number $n$, positive natural numbers $k_0,\ldots, k_{n-1}$ and ordinals $\alpha \geq \beta_0 &gt; \cdots &gt; \beta_{n-1}$ s.t.</p> $\alpha = \omega^{\beta_0}\cdot k_0 + \cdots + \omega^{\beta_{n-1}}\cdot k_{n-1}$ <p><em>Note</em>: It’s possible that $\alpha = \omega^{\alpha}$.</p> <p><strong><a href="https://en.wikipedia.org/wiki/Von_Neumann_universe">The cumulative hierarchy</a></strong>:</p> <ul> <li>For each ordinal $\alpha$, define the set $V_{\alpha}$ as: <ul> <li>$V_0 = \varnothing$;</li> <li>$V_{\alpha+1} = \mathcal{P}(V_{\alpha})$, for any $\alpha$;</li> <li>$V_{\lambda} = \bigcup_{\beta&lt;\lambda}V_{\beta}$, for $\lambda$ a limit ordinal.</li> </ul> </li> <li>Let the class $V$ be the union of all $V_{\alpha}$.</li>
<li>For $x\in V$, define its <strong>rank</strong> to be <em>the least ordinal</em> $\alpha$ such that $x\in V_{\alpha+1}$ (equivalently, $x\subseteq V_{\alpha}$). <ul> <li>One can verify that $\mathrm{rank}(x) = \sup\{\mathrm{rank}(y)+1: y\in x\}$</li> </ul> </li> <li>The <strong>transitive closure</strong> of $x$ is defined as the smallest transitive set that includes $x$ as a subset. <ul> <li>For any $x$, let $x_0 = x$ and $x_{n+1} = x_n\cup\bigcup x_n$. Then, the transitive closure is $\mathrm{TC}(x)=\bigcup_{n&lt; \omega} x_n$.</li> </ul> </li> <li>Assume $\text{ZF}^-$ (ZF minus Foundation). Then the axiom of foundation is equivalent to $\forall x(x\in V)$. <ul> <li><em>Note</em>: $x\in V$ is actually the shorthand for $\exists \alpha[\alpha\in\mathrm{ORD} \wedge x\in V_{\alpha}]$.</li> </ul> </li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Mostowski_collapse_lemma">Mostowski Collapse</a></strong>: If a relation $R$ on a class $X$ is <em>wellfounded</em>, <em>setlike</em> and <em>extensional</em>, then there exists a unique transitive class $Y$ and a unique isomorphism $(X,R)\cong (Y,\in)$.</p> <ul> <li><strong>wellfounded</strong>: Every non-empty subset of $X$ has an $R$-minimal element;</li> <li><strong>setlike</strong>: $R^{-1}(x)$ is always a set;</li> <li><strong>extensional</strong>: $R^{-1}(x) = R^{-1}(y)$ iff $x=y$.</li> </ul> <h2 id="cardinal-arithmetic">Cardinal Arithmetic</h2> <p><strong>Equivalents of AC</strong>:</p> <ul> <li><strong><a href="https://en.wikipedia.org/wiki/Zorn%27s_lemma">Zorn’s lemma</a></strong>: Given a non-empty strict partial order, if every chain has an upper bound, then there is a maximal element.</li> <li><strong><a href="https://en.wikipedia.org/wiki/Hausdorff_maximal_principle">Hausdorff Maximality principle</a></strong>: Every partially ordered set has a maximal chain.</li> <li><strong><a href="https://en.wikipedia.org/wiki/Well-ordering_theorem">Zermelo’s wellordering theorem</a></strong> or $\mathrm{AC}^+$: Every set can be well-ordered.
<ul> <li><em>Note</em>: Whether there exists a <em>definable</em> well-ordering of $\mathbb{R}$ is independent of ZFC.</li> </ul> </li> <li>The cardinalities of any pair of sets are comparable.</li> <li>For all infinite sets $X,Y$, $|X|+|Y| = |X|\times |Y|$.</li> </ul> <p>AC also implies the <strong>partition principle</strong>: If there is a surjection from $X$ to $Y$, then there is an injection from $Y$ to $X$. But whether this principle implies AC is still open.</p> <p><strong><a href="https://en.wikipedia.org/wiki/Schr%C3%B6der%E2%80%93Bernstein_theorem">Cantor-Schröder-Bernstein theorem</a></strong>:</p> <ul> <li>Define $|X| = |Y|$ if there is a bijection between $X$ and $Y$.</li> <li>Define $|X| \leq |Y|$ if there is an injection from $X$ to $Y$.</li> <li>(ZF) If $|X| \leq |Y|$ and $|Y| \leq |X|$, then $|X| = |Y|$.</li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Cardinal_number">Cardinal</a></strong>:</p> <ul> <li>An ordinal $\alpha$ is a cardinal if for all $\beta&lt;\alpha$, $|\beta| &lt; |\alpha|$.</li> <li>Let $\omega_{\alpha}$ denote the $\alpha$-th infinite cardinal; let $\aleph_{\alpha}$ denote its cardinality.</li> <li>Let $\kappa^+$ denote the least cardinal greater than $\kappa$.
<ul> <li>Similarly, $\kappa^+$ is a <strong>successor cardinal</strong>; a non-zero cardinal not of this form is a <strong>limit cardinal</strong>.</li> </ul> </li> <li>$\aleph_{\alpha} + \aleph_{\beta} = \aleph_{\alpha} \cdot \aleph_{\beta} = \max\{\aleph_{\alpha}, \aleph_{\beta}\}$.</li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Hartogs_number">Hartogs’s theorem</a></strong>:</p> <ul> <li>(ZF) For any set $X$, there is an ordinal that cannot be injected into $X$.</li> <li>The least such ordinal is called the <strong>Hartogs number</strong> $h(X)$.</li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Cofinal_(mathematics)">Cofinality</a></strong>:</p> <ul> <li>Given an ordinal $\alpha$ and $C\subseteq\alpha$, if there does not exist a $\beta\in\alpha$ that is strictly greater than all elements of $C$, then $C$ is <strong>cofinal</strong> in $\alpha$. <ul> <li>Since no one cares about successor ordinals, an equivalent definition is $\sup C = \alpha$.</li> </ul> </li> <li>The <strong>cofinality</strong> $\mathrm{cf}(\alpha)$ is the minimum of the order types of the cofinal subsets of $\alpha$.</li> <li>For all $\alpha$, $\mathrm{cf}(\mathrm{cf}(\alpha)) = \mathrm{cf}(\alpha)$.</li> <li>For all $\alpha$, $\mathrm{cf}(\alpha)$ is a cardinal not greater than $\alpha$.</li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Regular_cardinal">Regular cardinals</a></strong>:</p> <ul> <li>A cardinal $\kappa$ is <strong>regular</strong> if $\mathrm{cf}(\kappa)=\kappa$; otherwise, it is <strong>singular</strong>.</li> <li>(ZFC) Every successor cardinal is regular.</li> <li>(ZFC) For every infinite cardinal $\kappa$, $\kappa &lt; \kappa^{\mathrm{cf}(\kappa)}$.</li> <li>(ZFC) $\mathrm{cf}(2^\kappa) &gt; \kappa$.</li> <li>The <strong><a href="https://en.wikipedia.org/wiki/Gimel_function">gimel function</a></strong> of an infinite cardinal $\kappa$ is: $\gimel(\kappa) = \kappa^{\mathrm{cf}(\kappa)}$.
<ul> <li><em>Note</em>: If $\kappa$ is regular, $\gimel(\kappa) = \kappa^\kappa = 2^\kappa$.</li> </ul> </li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/K%C3%B6nig%27s_theorem_(set_theory)">König’s theorem</a></strong>: Suppose $\kappa_i &lt; \lambda_i$ for all $i\in I$. Then,</p> $\sum_{i\in I} \kappa_i &lt; \prod_{i\in I} \lambda_i$ <p><strong>Cardinal powers</strong>: Suppose $\kappa, \lambda$ are infinite cardinals. Then $\kappa^{\lambda}$ is determined by one of the following cases:</p> <ul> <li>If $\kappa \leq \lambda$, then $\kappa^{\lambda} = 2^{\lambda}$.</li> <li>If $\kappa &gt; \lambda$ and $(\exists \mu&lt;\kappa)\mu^{\lambda} \geq \kappa$, then $\kappa^{\lambda} = {\mu}^{\lambda}$.</li> <li>If $\kappa &gt; \lambda$ and $(\forall \mu&lt;\kappa)\mu^{\lambda} &lt; \kappa$, <ul> <li>if $\mathrm{cf}(\kappa) &gt; \lambda$, then $\kappa^{\lambda} = \kappa$.</li> <li>if $\mathrm{cf}(\kappa) \leq \lambda$, then $\kappa^{\lambda} = \kappa^{\mathrm{cf}(\kappa)}$.</li> </ul> </li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Continuum_hypothesis#The_generalized_continuum_hypothesis">Generalized continuum hypothesis (GCH)</a></strong>: GCH states that $2^{\kappa} = \kappa^+$ for all infinite $\kappa$.</p> <ul> <li>GCH implies that for infinite cardinals $\kappa, \lambda$: <ul> <li>If $\lambda &lt; \mathrm{cf}(\kappa)$, $\kappa^{\lambda} = \kappa$.</li> <li>If $\mathrm{cf}(\kappa)\leq \lambda &lt;\kappa$, $\kappa^{\lambda} = \kappa^+$.</li> <li>If $\kappa \leq \lambda$, $\kappa^{\lambda} = \lambda^+$.</li> </ul> </li> </ul> <p><strong>Cardinal exponentiation</strong>: Define $\kappa^{&lt;\lambda} = \sup_{\mu&lt;\lambda}\kappa^{\mu}$.
Suppose $\kappa$ is an infinite cardinal.</p> <ul> <li>If $\kappa$ is regular, then $2^{\kappa} = \gimel(\kappa)$.</li> <li>If $\kappa$ is singular and $2^{\lambda}$ is eventually constant as $\lambda\to\kappa$ (so $2^{&lt;\kappa}$ is this constant value), then $2^{\kappa} = 2^{&lt;\kappa}$.</li> <li>If $\kappa$ is singular and $2^{\lambda}$ is not eventually constant as $\lambda\to\kappa$, then $2^{\kappa} = \gimel(2^{&lt;\kappa})$.</li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Singular_cardinals_hypothesis">Singular cardinals hypothesis (SCH)</a></strong>: If $\kappa$ is singular and $2^{\mathrm{cf}(\kappa)} &lt; \kappa$, then $\kappa^{\mathrm{cf}(\kappa)} = \kappa^+$.</p> <p><strong><a href="https://en.wikipedia.org/wiki/Inaccessible_cardinal">Inaccessible cardinals</a></strong>:</p> <ul> <li>A (weak) <strong>limit cardinal</strong> is a cardinal that is neither a successor cardinal nor zero.</li> <li>A <strong>strong limit cardinal</strong> $\kappa$ is not reachable by taking powers: for all $\lambda&lt;\kappa$, $2^{\lambda} &lt; \kappa$.</li> <li>A <em>regular limit</em> cardinal is <strong>weakly inaccessible</strong>. <ul> <li>If $\kappa$ is an uncountable weakly inaccessible cardinal, then $L_{\kappa}$ is a model of ZFC.</li> </ul> </li> <li>An uncountable <em>regular</em> cardinal $\kappa$ is <strong>strongly inaccessible</strong> if it is a strong limit: for all $\lambda&lt;\kappa$, $2^{\lambda} &lt; \kappa$. <ul> <li>Then, $V_{\kappa}$ is a model of ZFC.</li> </ul> </li> <li>If ZFC is consistent, it cannot prove the existence of any uncountable inaccessible cardinal.</li> </ul> <h2 id="infinitary-combinatorics">Infinitary Combinatorics</h2> <p><strong><a href="https://en.wikipedia.org/wiki/Filter_(mathematics)">Filters</a></strong>:</p> <ul> <li>A <strong>filter</strong> $F$ on a set $X$ is a collection of subsets of $X$ s.t.
<ul> <li>$F$ is upward closed under $\subseteq$: $(A\in F\wedge B\subseteq X\wedge A\subseteq B)\to B\in F$.</li> <li>$F$ is non-empty: $X\in F$.</li> <li>$F$ is closed under finite intersections: $(A\in F\wedge B\in F)\to A\cap B\in F$.</li> </ul> </li> <li>$F$ is <strong>non-trivial</strong> if $\varnothing\notin F$.</li> <li>Dually, we can define an <strong>ideal</strong> $I$ as a collection of subsets s.t. $I$ is downward closed, non-empty and closed under finite unions.</li> <li>Given a filter $F$, its <strong>dual ideal</strong> is $\{X\setminus A: A\in F\}$.</li> <li>Given an ideal $I$ and its dual filter $F$, a set $A\subseteq X$ is $I$<strong>-positive</strong>/$F$<strong>-positive</strong> if $A\notin I$. <ul> <li>A filter is a collection of “large” sets; an ideal is a collection of “small” sets; $I$-positive sets are “not small”.</li> </ul> </li> <li>For $S\subseteq\mathcal{P}(X)$ closed under finite intersections, $F = \{A\subseteq X: (\exists B\in S)(B\subseteq A)\}$ is the <strong>filter generated by</strong> $S$.</li> <li>A filter is $\kappa$<strong>-complete</strong> if it is closed under intersections of size less than $\kappa$.</li> <li>An <strong><a href="https://en.wikipedia.org/wiki/Ultrafilter">ultrafilter</a></strong> is a filter s.t. for all $A\subseteq X$, exactly one of $A\in F$ or $(X\setminus A)\in F$ holds. <ul> <li>Ultrafilters are exactly the non-trivial filters maximal w.r.t. $\subseteq$.</li> </ul> </li> <li>(The ultrafilter lemma) (ZFC) Every non-trivial filter can be extended to an ultrafilter.</li> <li>An ultrafilter is <strong>principal</strong> if there is $A\subseteq X$ s.t. $B\in F$ iff $A\subseteq B$.</li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Ultralimit">Ultralimits</a></strong>:</p> <ul> <li>Suppose $F$ is a filter on $\omega$, $\langle a_n: n\in\omega \rangle$ is a sequence of real numbers.
Then $\lim_F a_n = x$ iff for all $\varepsilon&gt;0$, $\{n: |a_n - x| &lt; \varepsilon \}\in F$.</li> <li>This limit satisfies all the usual limit laws.</li> <li>If $U$ is a non-principal ultrafilter on $\omega$, then any bounded sequence has an ultralimit $\lim_U a_n$.</li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Measurable_cardinal">Measurable cardinal</a></strong>:</p> <ul> <li>An uncountable cardinal $\kappa$ is <strong>measurable</strong> if there is a non-principal $\kappa$-complete ultrafilter on $\kappa$.</li> <li>A cardinal $\kappa$ is called <strong>real-valued measurable</strong> if there is a $\kappa$-additive probability measure on the power set of $\kappa$ that vanishes on singletons.</li> <li>A measure $\mu$ on $\kappa$ is <strong>atomless</strong> if, for every $A\subseteq\kappa$ with $\mu(A)&gt; 0$, there is $B\subseteq A$ with $0 &lt; \mu(B) &lt; \mu(A)$.</li> <li>(<a href="https://en.wikipedia.org/wiki/Ulam_matrix">Ulam</a>) There are two cases for a real-valued measurable $\kappa$: <ol> <li>$\kappa \leq 2^{\aleph_0}$, and there is an atomless measure on $\kappa$. There is also a measure on the full powerset $\mathcal{P}([0, 1])$ extending Lebesgue measure.</li> <li>$\kappa$ is measurable, and $\kappa &gt; 2^{\aleph_0}$ (actually a strong limit). Every measure on $\kappa$ has an atom, which yields a $\kappa$-complete ultrafilter.</li> </ol> </li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Ultraproduct">Ultraproducts</a></strong>: Suppose $\langle M_i: i\in I \rangle$ are $\mathcal{L}$-structures, and $U$ is an ultrafilter on $I$. Let $X$ be the set of $I$-sequences that pick one element from each structure:</p> $X = \{ f: \mathrm{dom}(f)=I\wedge (\forall i\in I)(f(i) \in M_i) \}$ <p>Let $\sim$ be the equivalence relation on $X$ where $f\sim g$ iff they coincide on a set in the ultrafilter: $\{i: f(i)=g(i)\}\in U$.
Now, let the <strong>ultraproduct</strong> $\prod_U M_i$ be the structure on universe $X/\sim$:</p> <ul> <li>For each constant $c$ of $\mathcal{L}$, let $c^{\prod_U M_i} = [i\mapsto c^{M_i} ]_{\sim}$;</li> <li>For each relation $R$ of $\mathcal{L}$, let $R^{\prod_U M_i}([f_1 ]_{\sim},\ldots,[f_n ]_{\sim})$ be true iff $\{i: R^{M_i}(f_1(i),\ldots,f_n(i))\}\in U$;</li> <li>For each function $g$ of $\mathcal{L}$, let $g^{\prod_U M_i}([f_1 ]_{\sim},\ldots,[f_n ]_{\sim})$ be $[i\mapsto g^{M_i}(f_1(i),\ldots,f_n(i)) ]_{\sim}$.</li> </ul> <p>If $M_i = M$ for all $i\in I$, then $\prod_U M$ is called the <strong>ultrapower</strong> of $M$ by $U$.</p> <p><strong><a href="https://proofwiki.org/wiki/%C5%81o%C5%9B%27s_Theorem">Łoś’s Theorem</a></strong>: For any $\mathcal{L}$-formula $\varphi$ and $[f_1 ]_{\sim},\ldots,[f_n ]_{\sim}\in X/\sim$, $\prod_U M_i\models \varphi([f_1 ]_{\sim},\ldots, [f_n ]_{\sim})$ iff $\{i: M_i\models \varphi(f_1(i),\ldots,f_n(i)) \}\in U$. In particular, the ultraproduct models a sentence iff there is a “large” set of indices whose models satisfy it.</p> <p><strong><a href="https://ncatlab.org/nlab/show/Keisler-Shelah+isomorphism+theorem">Keisler-Shelah isomorphism theorem</a></strong>: Two structures $M$ and $N$ are elementarily equivalent $M\equiv N$ iff there is an ultrafilter $U$ s.t. the two ultrapowers are isomorphic $\prod_U M \cong \prod_U N$.</p> <p><strong>Asymptotic Cone</strong>:</p> <ul> <li>Suppose $\langle X_n, d_n: n\in\omega \rangle$ is a sequence of metric spaces, $x_n\in X_n$ are base points, and $U$ is a non-principal ultrafilter on $\omega$. Let $X$ be the set of sequences $\langle a_n\in X_n: n\in\omega\rangle$ where the sequence $d_n(a_n, x_n)$ is bounded. Let $\sim$ be the equivalence relation that $\langle a_n\rangle \sim \langle b_n\rangle$ if $\lim_U d_n(a_n, b_n) = 0$. Let $d$ be the metric on $X/\sim$ where $d(\langle a_n\rangle, \langle b_n\rangle) = \lim_U d_n(a_n, b_n)$.
We define the <strong>ultraproduct</strong> $\prod_U(X_n, d_n, x_n)$ to be $(X/\sim, d)$.</li> <li>Suppose $(X, d)$ is a given metric space with a base point $p\in X$. The <strong>asymptotic cone</strong> of $(X, d)$ is $\prod_U(X, d/n, p)$.</li> <li>The asymptotic cone is a way of viewing $(X, d)$ from “infinitely far away”.</li> <li>The asymptotic cone of $\mathbb{Z}$ with the usual metric is isometric to $\mathbb{R}$.</li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Club_set">Club</a></strong>:</p> <ul> <li>Suppose $\lambda$ is a limit ordinal (generally a <em>regular uncountable cardinal</em>) and $C\subseteq \lambda$. <ul> <li>$C$ is <strong>closed</strong> if it contains all its limit points less than $\lambda$. Formally, for all limit ordinals $\nu&lt;\lambda$, $C\cap \nu$ being cofinal in $\nu$ implies $\nu\in C$.</li> <li>$C$ is <strong>unbounded</strong> if for all $\alpha&lt;\lambda$, there exists $\beta\in C$ s.t. $\beta &gt; \alpha$.</li> <li>A closed unbounded subset is called a <strong>club set</strong>.</li> </ul> </li> <li>Suppose $\lambda$ is a limit ordinal with $\mathrm{cf}(\lambda)&gt;\omega$. Then, the intersection of finitely many clubs is a club.</li> <li>Suppose $\lambda$ is a cardinal with $\mathrm{cf}(\lambda)&gt;\omega$. Then, for all $\beta&lt; \mathrm{cf}(\lambda)$, the intersection of $\beta$ many clubs $\bigcap_{\alpha&lt; \beta}C_{\alpha}$ is a club.</li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Club_filter">Club filter</a></strong>: Suppose $\kappa$ is a regular uncountable cardinal.</p> <ul> <li>The filter generated by the club sets on $\kappa$ is called the <strong>club filter</strong> on $\kappa$.</li> <li>The club filter is $\kappa$-complete.</li> <li>A subset of $\kappa$ is <strong>stationary</strong> if it intersects every club. In other words, it is positive w.r.t. the club filter.</li> <li>The dual ideal of the club filter is the <strong>nonstationary ideal</strong>.</li> <li><em>Note</em>: a set in the club filter is not necessarily a club.
It is a set that contains a club.</li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Diagonal_intersection">Diagonal intersection</a></strong>: Let $\langle X_{\alpha}: \alpha&lt;\kappa \rangle$ be a sequence of $\kappa$ many subsets of $\kappa$. Their <strong>diagonal intersection</strong> is defined to be</p> $\bigtriangleup_{\alpha&lt; \kappa} X_{\alpha} = \left\{ \beta&lt; \kappa: \beta\in\bigcap_{\alpha&lt; \beta} X_{\alpha} \right\} = \bigcap_{\alpha &lt; \kappa}\left( X_{\alpha}\cup\{\beta: \beta\leq\alpha\} \right)$ <ul> <li>The diagonal intersection of $\kappa$ many clubs is a club.</li> <li>A filter is called <strong>normal</strong> if it’s closed under diagonal intersection. The club filter is normal.</li> <li>A non-trivial normal $\kappa$-complete filter on a regular uncountable cardinal $\kappa$ includes every club of $\kappa$.</li> </ul> <p><strong>Properties of <a href="https://en.wikipedia.org/wiki/Stationary_set">stationary sets</a></strong>:</p> <ul> <li>Given a club $C$ and a stationary set $S$, $C\cap S$ is stationary.</li> <li>A stationary set is unbounded.</li> <li>If $S$ is stationary, then $\{\lambda\in S: \lambda\text{ is a limit ordinal}\}$ is also stationary.</li> <li>If we partition a stationary set $S\subseteq\kappa$ into $\lambda&lt;\kappa$ many sets, then one of them will be stationary.</li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Fodor%27s_lemma">Fodor’s lemma</a></strong>:</p> <ul> <li>An ordinal function $f$ on a set $S$ is <strong>regressive</strong> if $f(\alpha) &lt; \alpha$ for every $\alpha\in S$, $\alpha &gt; 0$.</li> <li>If $f$ is a regressive function on a stationary set $S$ of $\kappa$, then $f$ is constant on some stationary subset. Formally, there exist $\gamma &lt; \kappa$ and a stationary set $T\subseteq S$ s.t. $f(\alpha) = \gamma$ for all $\alpha\in T$.</li> <li><em>Example</em>: Suppose there is a train passing through an infinite number of stations.
At every station, exactly 1 person gets off (if the train is not empty), and then finitely many people get on. We know that after $\omega$ stations there could be arbitrarily many people on the train. Fodor’s lemma shows that after $\omega_1$ stations, there must be 0 people on it.</li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Sunflower_(mathematics)">The ∆-system lemma</a></strong>:</p> <ul> <li>Suppose $X$ is a collection of sets. If for all distinct $A,B\in X$, $A\cap B = r$, we call $X$ a $\Delta$<strong>-system with root</strong> $r$.</li> <li>Suppose $X$ is an <em>uncountable</em> set of <em>finite</em> sets. Then there are an uncountable subset $X'\subseteq X$ and a finite set $r$ s.t. $X'$ is a $\Delta$-system with root $r$.</li> <li>Suppose $\kappa &gt; \lambda$ are infinite regular cardinals s.t. for all $\delta&lt;\kappa$, $\delta^{&lt; \lambda} &lt; \kappa$. Let $X$ be a collection of $\kappa$ many sets of cardinality less than $\lambda$. Then, there is a subset $X'$ of size $\kappa$ which is a $\Delta$-system.</li> </ul> <p><strong>Silver’s Theorem</strong>:</p> <ul> <li>If $\kappa$ is a singular cardinal of uncountable cofinality and $2^{\lambda} = \lambda^+$ for a stationary set of $\lambda &lt; \kappa$, then $2^{\kappa} = \kappa^{+}$.</li> <li>If SCH holds for all singular cardinals of cofinality $\omega$, then it holds for all singular cardinals.</li> </ul> <p><strong>Infinite trees</strong>:</p> <ul> <li>A tree $T$ is a $\kappa$-tree if its height is $\kappa$ and every level has size less than $\kappa$.</li> <li><strong><a href="https://en.wikipedia.org/wiki/K%C5%91nig%27s_lemma">Kőnig’s lemma</a></strong>: An $\omega$-tree has an infinite branch.</li> <li><strong><a href="https://en.wikipedia.org/wiki/Aronszajn_tree">Aronszajn tree</a></strong>: There is an $\omega_1$-tree with no branches of order type $\omega_1$.</li> <li><strong><a href="https://en.wikipedia.org/wiki/Suslin%27s_problem">Suslin line</a></strong> <ul> <li>A linear order $&lt;$
on $X$ is <strong>dense</strong> if for all $a &lt; b$ there exists $c\in X$ s.t. $a &lt; c &lt; b$.</li> <li>A subset $A\subseteq X$ is <strong>dense in</strong> $X$ if for all $a &lt; b$ there exists $c\in A$ s.t. $a &lt; c &lt; b$.</li> <li>A linear order $&lt;$ is <strong>complete</strong> if any non-empty set with an upper bound has a least upper bound, and any non-empty set with a lower bound has a greatest lower bound.</li> <li>An <strong>endpoint</strong> is a maximum or minimum element.</li> <li>Any two countable dense linear orders without endpoints are isomorphic.</li> <li>$\mathbb{R}$ is isomorphic to any complete dense linear order without endpoints that has a countable dense subset.</li> <li>A linear order on $X$ is a <strong>Suslin line</strong> if $&lt;$ is a complete dense linear order without endpoints s.t. every set of disjoint open intervals is countable, but $&lt;$ has no countable dense subset (thus it is not isomorphic to $\mathbb{R}$).</li> </ul> </li> <li><strong><a href="https://en.wikipedia.org/wiki/Suslin_tree">Suslin tree</a></strong> <ul> <li>A <strong>Suslin tree</strong> is an $\omega_1$-tree such that every chain and every antichain is countable.</li> <li>There is a Suslin line iff there is a Suslin tree.</li> <li>The existence of a Suslin tree is independent of ZFC.</li> </ul> </li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Diamond_principle">Diamond</a></strong>:</p> <ul> <li>A $\diamondsuit$<strong>-sequence</strong> is a sequence $\langle A_{\alpha}:\alpha&lt;\omega_1\rangle$ where $A_{\alpha}\subseteq\alpha$ s.t. for all $X\subseteq\omega_1$, $\{\alpha: X\cap\alpha = A_{\alpha}\}$ is stationary.</li> <li>$\diamondsuit$ is the statement that there exists a $\diamondsuit$-sequence.</li> <li>$\diamondsuit$ implies CH.</li> <li>$\diamondsuit$ implies there is a Suslin tree.</li> <li>A $\diamondsuit^+$<strong>-sequence</strong> is a sequence $\langle \mathcal{A}_{\alpha}:\alpha&lt;\omega_1\rangle$ where each $\mathcal{A}_{\alpha}\subseteq\mathcal{P}(\alpha)$ is countable, s.t.
for all $X\subseteq\omega_1$, there exists a club $C\subseteq\omega_1$ s.t. for all $\alpha\in C$, both $X\cap \alpha$ and $C\cap\alpha$ are in $\mathcal{A}_{\alpha}$.</li> <li>A $\diamondsuit^-$<strong>-sequence</strong> is a sequence $\langle \mathcal{A}_{\alpha}:\alpha&lt;\omega_1\rangle$ where each $\mathcal{A}_{\alpha}\subseteq\mathcal{P}(\alpha)$ is countable, s.t. for all $X\subseteq\omega_1$, $\{\alpha: X\cap\alpha \in \mathcal{A}_{\alpha}\}$ is stationary.</li> <li>$\diamondsuit^+$ is the statement that there exists a $\diamondsuit^+$-sequence; $\diamondsuit^-$ is the statement that there exists a $\diamondsuit^-$-sequence.</li> <li>$\diamondsuit^+$ implies $\diamondsuit$; $\diamondsuit^-$ is equivalent to $\diamondsuit$.</li> </ul> <h2 id="constructible-hierarchy">Constructible Hierarchy</h2> <p><strong><a href="https://plato.stanford.edu/entries/paradox-skolem/">Skolem’s paradox</a></strong>:</p> <ul> <li>By the downward L-S theorem, if ZFC is consistent, there exists a countable model of ZFC. However, ZFC proves there exist uncountable ordinals.</li> <li>This is not a real paradox, because given the same sets $A, B$, it is possible that in $V$ there exists a bijection from $A$ to $B$, but a smaller model lacks this bijection.</li> <li>Cardinality is not absolute.</li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Transitive_model">Transitive model</a></strong>:</p> <ul> <li>(<em>Theorem</em>) By compactness, if ZFC is consistent, there exists a model $(M, E)$ of ZFC s.t. $E$ is not well-founded. <ul> <li>Such models are not necessarily recursive.</li> </ul> </li> <li>(<em>Theorem</em>) By Mostowski collapse, every well-founded model $(M, E)$ is isomorphic to some $(N, \in\restriction N)$ with $N$ transitive.</li> <li>A model $(N, \in\restriction N)$ where $N$ is a transitive set is called a <strong>transitive model</strong>.
<ul> <li>A countable transitive model is often abbreviated as <strong>ctm</strong>.</li> </ul> </li> <li><em>Note</em>: ZFC+Con(ZFC) does not imply there is a well-founded set model of ZFC.</li> </ul> <p><strong>Models relation</strong>:</p> <ul> <li>By Tarski’s undefinability of truth, $N\models\phi$ is a well-defined relation only if $N$ is a set.</li> <li>For any sentence $\phi$, let $\phi^N$ denote the sentence where all quantifiers $\forall x$ and $\exists x$ are replaced with $\forall x\in N$ and $\exists x\in N$. <ul> <li>For any transitive set model, $N\models \phi$ iff $\phi^N$ is true.</li> <li>When $N$ is a proper class, we use $N\models \phi$ to abbreviate $\phi^N$.</li> </ul> </li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/L%C3%A9vy_hierarchy">The Lévy hierarchy</a></strong>: Similar to the arithmetical hierarchy, every formula in set theory is categorized into $\Sigma_n$, $\Pi_n$ and $\Delta_n$.</p> <p><strong><a href="https://en.wikipedia.org/wiki/Absoluteness">Absoluteness</a></strong>:</p> <ul> <li>Suppose $\phi(v_1,\ldots, v_n)$ is a $\Delta_0$ formula and $M$ is a transitive class. Then for all $x_1,\ldots,x_n\in M$, we have $\phi(x_1,\ldots, x_n)$ iff $\phi^M(x_1,\ldots, x_n)$.</li> <li>Suppose $N$, $M$ are transitive classes and $N\subseteq M$. <ul> <li>If $\phi$ is $\Sigma_1$, then $\phi^N$ implies $\phi^M$. (upward absoluteness)</li> <li>If $\phi$ is $\Pi_1$, then $\phi^M$ implies $\phi^N$.
(downward absoluteness)</li> <li>If $\phi$ is $\Delta_1$, then $\phi^M\leftrightarrow\phi^N$.</li> </ul> </li> </ul> <p><strong>The model</strong> $V_\alpha$:</p> <ul> <li>If $\alpha$ is a non-zero limit ordinal, then $V_\alpha$ is a model of Extensionality, Foundation, Pairing, Union, Nullset, Separation, and Powerset.</li> <li>If $\alpha&gt;\omega$, then $V_\alpha$ is a model of Infinity.</li> <li>If $\alpha$ is a strongly inaccessible cardinal, then $V_\alpha$ is a model of Collection.</li> </ul> <p><strong>The model</strong> $H_\kappa$:</p> <ul> <li>If $\kappa$ is an infinite cardinal, then $H_\kappa$ is the collection of sets whose transitive closure has size less than $\kappa$.</li> <li>For every regular uncountable cardinal $\kappa$, $H_\kappa$ is a model of ZFC - Powerset.</li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Reflection_principle">The reflection theorem</a></strong></p> <ul> <li>Suppose $\langle M_{\alpha}: \alpha\in\text{ORD}\rangle$ is a <strong>cumulative hierarchy</strong>: <ul> <li>$M_{\alpha}$ is transitive.</li> <li>For all $\alpha &lt; \beta$, $M_{\alpha}\subseteq M_{\beta}$.</li> <li>(Continuous) If $\lambda$ is a limit, $M_{\lambda} = \bigcup_{\alpha &lt; \lambda} M_{\alpha}$.</li> </ul> </li> <li>Let $M = \bigcup M_{\alpha}$. Then for every formula $\varphi$, there exists a closed and unbounded class of ordinals $\alpha$, s.t. for all $\bar{x}\in M_{\alpha}$, $M_{\alpha}\models\varphi(\bar{x})$ iff $M\models\varphi(\bar{x})$.</li> <li><em>Corollary</em>: For every finite set of axioms of ZFC, ZFC proves that there is a ctm of these axioms. <ul> <li><em>Note</em>: One must give the axioms first, and then find the set.</li> <li><em>Note</em>: It is “For all …, ZFC proves that …”, but not “ZFC proves that for all …”.
Therefore, the compactness theorem does not work here.</li> </ul> </li> <li><em>Corollary</em>: ZFC is not finitely axiomatizable.</li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Constructible_universe">Gödel’s</a></strong> $L$:</p> <ul> <li>For a set $M$, let $\mathrm{Def}(M)$ be the set of subsets of $M$ that are definable over $(M, \in)$ with parameters from $M$. The function $M\mapsto\mathrm{Def}(M)$ is $\Delta_1$.</li> <li>Replace $\mathcal{P}$ with $\mathrm{Def}$ in the definition of $V$, and we will get the <strong>constructible universe</strong> $L$. <ul> <li>$L_0 = \varnothing$;</li> <li>$L_{\alpha+1} = \mathrm{Def}(L_{\alpha})$, for any $\alpha$;</li> <li>$L_{\lambda} = \bigcup_{\beta&lt;\lambda}L_{\beta}$, for $\lambda$ a limit ordinal;</li> <li>$L = \bigcup_{\alpha} L_{\alpha}$.</li> </ul> </li> </ul> <p><strong>Properties of</strong> $L$:</p> <ul> <li>(ZF) For all ordinals $\alpha$, <ul> <li>$L_{\alpha} \subseteq V_{\alpha}$.</li> <li>$L_{\alpha}$ and $L$ are transitive. $\langle L_{\alpha}: \alpha\in\mathrm{ORD}\rangle$ is a cumulative hierarchy.</li> <li>$L_{\alpha}\cap\mathrm{ORD} =\alpha$, so $L$ contains all ordinals.</li> </ul> </li> <li>$|L_{\alpha}| = |\alpha|$ for each infinite $\alpha$.</li> <li>(ZF) $L$ is a model of ZF.</li> <li>Define a strict partial order $&lt;_L$ on $L$ as follows: $x &lt;_L y$ if the rank of $x$ in $L$ is less than that of $y$; or they have the same $L$-rank but the least number of a formula defining $x$ is less than that of $y$. Then, $&lt;_L$ is a well-ordering on $L$. <ul> <li>(ZF) $L\models \mathrm{AC}$.</li> </ul> </li> <li>Let $V=L$ abbreviate $\forall x\exists \alpha\in\mathrm{ORD}\ x\in L_{\alpha}$.</li> <li>(The absoluteness of constructibility) If $M$ is a transitive class inner model of ZF that contains all ordinals, then $L^M = L$. 
Further, $(V=L)^L$.</li> <li><em>Corollary</em>: $L$ is the smallest transitive inner model that contains all ordinals.</li> </ul> <p><strong><a href="https://en.wikipedia.org/wiki/Condensation_lemma">Condensation</a></strong>:</p> <ul> <li>There is a finite set $S$ of axioms of ZF-Powerset s.t. if $M$ is a transitive set, $M\models S$ and $M\models V=L$, then $M=L_{\lambda}$ for some limit ordinal $\lambda$.</li> <li>There is a $\Sigma_2$ sentence $\varphi$ s.t. for every transitive set $M$, $M\models\varphi$ iff $M = L_{\lambda}$ for some limit ordinal $\lambda$.</li> <li>For every limit ordinal $\lambda$, if $M\prec L_\lambda$ is an elementary submodel, then the transitive collapse of $M$ is $L_\gamma$ for some limit ordinal $\gamma \leq \lambda$.</li> <li>Assume $V=L$. If $\kappa$ is an infinite cardinal and $x\subseteq\kappa$, then $x\in L_{\lambda}$ for some $\lambda &lt; \kappa^+$. <ul> <li>$L$ models GCH.</li> <li>$\text{Con}(\text{ZF})\to\text{Con}(\text{ZFC})\to\text{Con}(\text{ZFC}+\text{GCH})$.</li> </ul> </li> <li>If $\kappa$ is an uncountable regular cardinal, then $L_{\kappa}$ models ZF-Powerset. <ul> <li>If $V=L$, then $H_{\kappa} = L_{\kappa}$ for all infinite cardinals $\kappa$.</li> </ul> </li> </ul> <p><strong>Other properties of</strong> $L$:</p> <ul> <li>$V=L$ implies $\diamondsuit^+$.</li> <li>If $\kappa$ is weakly inaccessible, then $L_{\kappa}$ models ZFC.</li> <li>$V=L$ implies that there are no measurable cardinals.</li> </ul> <h2 id="forcing">Forcing</h2> <p>I don’t fully understand this topic, so I only put the basic ideas here.</p> <p>Forcing is a secret weapon of the logicians. 
To show that a sentence $\phi$ is consistent with a theory $T$,</p> <ul> <li>First, start with a model $M$ of $T$ in which $\phi$ is not necessarily true.</li> <li>Punch new elements into $M$ and get a structure $N$, where $\phi$ is true.</li> <li>Show that $N$ is still a model of $T$.</li> </ul> <p>Since downward absolute sentences cannot be forced, forcing only works for “complex enough” sentences.</p> <p>Since we must “understand” the elements we want to add into $M$, we cannot choose $M = V$. To have more choices for the missing elements, we want $M$ to be as simple as possible. Therefore, we use a ctm $M$. The consistency of ZFC does not promise the existence of a ctm, but that’s a minor issue that can be fixed.</p> <p>Now, suppose that we want to prove the consistency of the negation of CH. Observe that:</p> <ul> <li>$M$ has $\omega$, since it’s absolute.</li> <li>$M$ does not have all subsets of $\omega$, because otherwise $M$ cannot be countable.</li> <li>In $V$, there is an injection from $\omega_2^M$ to $\mathcal{P}(\omega)$, but $M$ cannot see it.</li> </ul> <p>Now, we want to inject this injection into $M$. We cannot directly take a union, because there is no guarantee that the result is a model. Therefore, we use the idea of compactness here: Let $\mathbb{P}$ be the set of finite partial functions $\omega\times\omega_2^M\to 2$. Define a partial order $&lt;$ to be the reverse of inclusion. (Due to historical reasons, we want small elements to be finer.) Then,</p> <ul> <li>There is a unique maximal element $1_{\mathbb{P}} = \varnothing$.</li> <li>Every element is a finite part of a multimap from $\omega_2$ to $\omega$, i.e. of a subset of $\omega\times\omega_2$. 
<ul> <li>Let $f(\alpha) = \{n: p(n, \alpha) = 1\}$, then $f$ will be a partial function $\omega_2\to\mathcal{P}(\omega)$.</li> </ul> </li> <li>$\mathbb{P}$ is in $M$, because both $\omega$ and $\omega_2^M$ are in $M$.</li> <li>If $p\leq q$, then we say $p$ <strong>extends</strong> $q$ or $p$ <strong>refines</strong> $q$.</li> <li>If there is some $r$ that extends both $p$ and $q$, we say $p$ and $q$ are <strong>compatible</strong>. <ul> <li>Intuitively, two compatible elements look like they lie in a sublattice, though there is no actual lattice. For example, $p = \{(0,0)\mapsto 1, (0,\omega)\mapsto 0\}$ and $q = \{(0,0)\mapsto 1, (1,1)\mapsto 0\}$. Then, $r = p\cup q$ refines both $p$ and $q$.</li> </ul> </li> <li>A subset $F$ of $\mathbb{P}$ is called a <strong>filter</strong> if it’s upward closed and any two elements of $F$ have a common extension in $F$. <ul> <li>This is similar to a lattice filter: nonempty, upward closed, and closed under meet $\wedge$.</li> <li>$f = \bigcup F$ is a partial function $\omega\times\omega_2^M\to 2$.</li> </ul> </li> </ul> <p>Now we want to make sure that</p> <ul> <li>The union of $F$ is a total function $\omega\times\omega_2^M\to 2$.</li> <li>The function $\omega_2\to\mathcal{P}(\omega)$ that the union of $F$ leads to is an injection.</li> <li>By augmenting $M$ with $F$, we can get a ZFC model.</li> </ul> <p>The concept of <em>generics</em> solves these issues:</p> <ul> <li>A subset $X\subseteq\mathbb{P}$ is <strong>dense</strong> if every element of $\mathbb{P}$ has an extension in $X$.</li> <li>$X$ is <strong>dense below</strong> $p$ if every element $q\leq p$ has an extension in $X$.</li> <li>Suppose $M$ is a transitive model containing $\mathbb{P}$. A filter $G\subseteq\mathbb{P}$ is $M$<strong>-generic</strong> if it meets every dense set in $M$. That is, for all dense $D\subseteq\mathbb{P}$ in $M$, we have $G\cap D\neq\varnothing$.</li> <li>For each $(n,\alpha)\in\omega\times\omega_2$, the set $\{p\in\mathbb{P}: (n,\alpha)\in\mathrm{dom}(p)\}$ is dense. 
Thus, $g = \bigcup G$ is a total function.</li> <li>For all distinct $\alpha, \beta&lt; \omega_2$, the set $\{p\in\mathbb{P}: (\exists n)(n,\alpha)\in\mathrm{dom}(p)\wedge (n,\beta)\in\mathrm{dom}(p)\wedge p(n,\alpha)\neq p(n,\beta) \}$ is also dense. Thus, the function $f(\alpha) := \{n: g(n,\alpha) = 1\}$ is an injection from $\omega_2$ to $\mathcal{P}(\omega)$.</li> </ul> <p>Now we go back to augment $M$. We want to get a new model $M[G ]$ s.t. $M\subseteq M[ G]$ and $G\in M[G ]$. Intuitively, consider that we associate each element and each formula with a “probability”. (See <a href="https://en.wikipedia.org/wiki/Boolean-valued_model">Boolean-valued model</a>. Not every textbook discusses this because the devil in the details often makes things more complicated.) For every $m\in M$, we bind a “probability” $p\in\mathbb{P}$ to it as $(m, p)$. However, we cannot use $\{(m, p): m\in M\wedge p\in\mathbb{P}\}$ because it’s clearly not a model: Let $a = (m_1, p_1)$ and $b = (m_2, p_2)$, then $\{a, b\}$ cannot be represented. Therefore, we need to define a cumulative hierarchy:</p> <ul> <li>A set $\tau$ is a $\mathbb{P}$<strong>-name</strong> if every element in $\tau$ is an ordered pair $(\sigma, p)$, where $\sigma$ is a $\mathbb{P}$-name and $p\in\mathbb{P}$.</li> <li>Equivalently, we have the following hierarchy: <ul> <li>$V^{P}_0 = \varnothing$;</li> <li>$V^{P}_{\alpha+1} = \mathcal{P}(V^{P}_{\alpha}\times \mathbb{P})$;</li> <li>$V^{P}_{\lambda} = \bigcup_{\alpha&lt; \lambda} V^{P}_{\alpha}$;</li> <li>$V^{P} = \bigcup_{\alpha} V^{P}_{\alpha}$;</li> <li>Since being a $\mathbb{P}$-name is absolute, the set of $\mathbb{P}$-names in $M$ equals $V^{P}\cap M$.</li> </ul> </li> </ul> <p>What we define is like the language $\mathcal{L}_{M}$, but works on a “fuzzy” structure with “probabilities”. If we cut the “probability” with a “threshold”, then hopefully we will get a definite model.</p> <ul> <li>Let $G\subseteq\mathbb{P}$ be an $M$-generic filter. 
For every $\mathbb{P}$-name $\tau$, the <strong>value of</strong> $\tau$ <strong>under</strong> $G$ is defined by recursion: $\tau[ G] = \{\sigma[ G]: (\sigma, p)\in\tau\wedge p\in G\}$.</li> <li>For example, let $\tau = \{(\varnothing, p), (\{(\varnothing, 1_{\mathbb{P}})\}, g)\}$. Suppose $g\in G$. Since $1_{\mathbb{P}}\in G$, $\{(\varnothing, 1_{\mathbb{P}})\}[ G] = \{\varnothing\} = 1$ is in $\tau[ G]$. Hence, if $p\notin G$, $\tau[ G] = \{ 1\}$; otherwise, $\tau[ G] = \{0, 1\} = 2$.</li> <li>Let $M[ G]$ be the set of $\tau[ G]$ for every $\mathbb{P}$-name $\tau$ that is in $M$.</li> </ul> <p>Intuitively, $M[G ]$ is like a threshold cut, plus Mostowski collapse.</p> <p>Now we have:</p> <ul> <li>$M[ G]$ is transitive. <ul> <li>By the hierarchical definition of values.</li> </ul> </li> <li>$M\subseteq M[ G]$. <ul> <li>Let $\check{x} = \{(\check{y}, 1_{\mathbb{P}}): y\in x\}$ for $x\in M$. Then $\check{x}[G ] = x$, because the maximal element $1_{\mathbb{P}}$ is always in $G$, just like probability 1.</li> </ul> </li> <li>$G\in M[ G]$. <ul> <li>Let $\tau = \{(\check{p}, p): p\in\mathbb{P} \}$. Then, $\tau[ G] = G$.</li> </ul> </li> <li>$\mathrm{ORD}\cap M = \mathrm{ORD}\cap M[ G]$.</li> </ul> <p>Such a $G$ is not constructively defined, but always exists, because of the following lemma:</p> <ul> <li>If $M$ is a ctm, then for every forcing poset $\mathbb{P}\in M$ and $p\in \mathbb{P}$, there is an $M$-generic filter $G$ containing $p$.</li> <li><em>Note</em>: the existence of generic filters comes from the fact that $M$, being countable, can only see countably many dense subsets of $\mathbb{P}$. For example, let $\mathbb{P}$ be an infinite binary tree. If $\mathcal{P}(\mathbb{P})^M = \mathcal{P}(\mathbb{P})^V$, then there does not exist a generic filter, because for every filter $F$, the complement set $\mathbb{P}\setminus F$ is dense.</li> </ul> <p>Now we need to show that $M[ G]$ is a model of ZFC. To show this, we simulate the “threshold cut” process for formulas: for each formula, we define a “probabilistic” value, and then cut it with $G$. 
Formally, we define the <strong>forcing relation</strong> $\Vdash$ s.t.</p> $M[ G]\models\varphi(\tau_1[ G],\ldots, \tau_n[ G]) \iff (\exists p\in G)\ p\Vdash^{\mathbb{P}}\varphi(\tau_1,\ldots, \tau_n)$ <p>Note that $\Vdash$ is well-defined in $M$, since every $\mathbb{P}$-name involved and every $p$ are in $M$. The only thing $M$ does not know is $G$.</p> <p>We define it as follows:</p> <ul> <li>Atomic formula $\tau\in\sigma$: <ul> <li>An element is in a set if it is equal to some element in the set.</li> <li>$\| \tau\in\sigma \| = \bigvee_{(\sigma', p')\in\sigma}\left( I_{p'}\cap \| \tau = \sigma' \| \right)$, where $I_{p'}$ is the principal ideal $\{q\in \mathbb{P}: q\leq p'\}$. <ul> <li><em>Note</em>: in a complete boolean algebra generated by a poset viewed as a topological space, $\wedge$ is $\cap$, but $\vee$ is not $\cup$. Actually, $p\vee q$ is $(p\cup q)^{\bot\bot}$. Here, $\bot$ means the complement set of its closure. And of course, $\neg$ is $\bot$. For example, in a poset of finite binary strings, $I_{00}\cup I_{01}$ does not include the string $0$, but $I_{00}\vee I_{01} = I_{0}$.</li> <li><em>Note</em>: To understand why we use $I_p$ here, consider an event with probability $p$. 
It happens when $p$ is greater than the cut $G$, which is equivalent to $G\leq p$.</li> </ul> </li> <li>$p\Vdash \tau\in\sigma$ iff $\bigcup_{(\sigma', p')\in\sigma} \{q \leq p': q\Vdash \tau = \sigma' \}$ is dense below $p$.</li> <li>$p\Vdash \tau\in\sigma$ iff $\{ q\leq p: \exists (\sigma', p')\in\sigma [q\leq p'\wedge q\Vdash \tau = \sigma']\}$ is dense below $p$.</li> </ul> </li> <li>Atomic formula $\tau = \sigma$: <ul> <li>A set $x$ is equal to $y$ if $z\in x \leftrightarrow z\in y$ for all $z$.</li> <li>$\| \tau = \sigma \| = \bigcap_{(\tau', p')\in\tau}(I_{p'}\rightarrow \|\tau'\in\sigma\|)\cap \bigcap_{(\sigma', p')\in\sigma}(I_{p'}\rightarrow \|\sigma'\in\tau\|)$. <ul> <li>Here, $x \rightarrow y$ means $\neg x\vee y$.</li> </ul> </li> <li>$p\Vdash \tau = \sigma$ iff for all $(\pi, \_)\in \tau\cup\sigma$ and all $q\leq p$, $q\Vdash \pi\in\tau \leftrightarrow q\Vdash \pi\in\sigma$.</li> <li>$p\Vdash \tau = \sigma$ iff for all $(\tau', p')\in\tau$, $\{q\in\mathbb{P}: q\leq p'\to q\Vdash\tau'\in\sigma \}$ is dense below $p$, and for all $(\sigma', p')\in\sigma$, $\{q\in\mathbb{P}: q\leq p'\to q\Vdash\sigma'\in\tau \}$ is dense below $p$.</li> </ul> </li> <li>Formula $\varphi\wedge\psi$: <ul> <li>$\| \varphi\wedge\psi \| = \|\varphi\|\cap \|\psi\|$.</li> <li>$p\Vdash \varphi\wedge\psi$ iff $p\Vdash\varphi$ and $p\Vdash\psi$.</li> </ul> </li> <li>Formula $\varphi\vee\psi$: <ul> <li>$\| \varphi\vee\psi \| = \|\varphi\|\vee \|\psi\|$.</li> <li>$p\Vdash \varphi\vee\psi$ iff $\{q\in\mathbb{P}: q\Vdash\varphi\}\cup\{q\in\mathbb{P}: q\Vdash\psi\}$ is dense below $p$.</li> </ul> </li> <li>Formula $\neg\varphi$: <ul> <li>$\| \neg\varphi \| = \neg \| \varphi \|$.</li> <li>$p\Vdash \neg\varphi$ iff there is no $q\leq p$ s.t. $q\Vdash \varphi$.</li> </ul> </li> <li>Formula $\exists x\ \varphi(x)$: <ul> <li>$\| \exists x\ \varphi(x) \| = \bigvee_{\tau} \| \varphi[ \tau] \|$.</li> <li>$p\Vdash \exists x\ \varphi(x)$ iff the set of $q\leq p$ s.t. 
there exists a $\mathbb{P}$-name $\tau$ satisfying $q\Vdash \varphi[ \tau]$ is dense below $p$.</li> </ul> </li> <li>Formula $\forall x\ \varphi(x)$: <ul> <li>$\| \forall x\ \varphi(x) \| = \bigcap_{\tau} \| \varphi[ \tau] \|$.</li> <li>$p\Vdash \forall x\ \varphi(x)$ iff for all $\mathbb{P}$-names $\tau$, $p\Vdash \varphi[ \tau]$.</li> </ul> </li> </ul> <p>The forcing relation has the following properties:</p> <ul> <li><strong>Force = Truth</strong> lemma. Described above.</li> <li><strong>Definability</strong>. $\Vdash$ is a well-defined relation in $M$.</li> <li><strong>Coherence</strong>. <ul> <li>If $p\Vdash\varphi$, then for all $q\leq p$, $q\Vdash\varphi$.</li> <li>If the set $\{q: q\Vdash\varphi\}$ is dense below $p$, then $p\Vdash\varphi$.</li> </ul> </li> </ul> <p>Now we can prove that $M[ G]$ is a model by exhibiting witnesses as $\mathbb{P}$-names in $M$.</p> <p><em>Lemma</em>: If $M$ is a transitive model of ZFC and $G$ is an $M$-generic filter, then $M[ G]\models\mathrm{ZFC}$.</p> <p>One thing we need to ensure is that in $M[ G]$, $\omega_2$ is the $\omega_2$ in $M$. This is given by ccc.</p> <p><strong><a href="https://en.wikipedia.org/wiki/Countable_chain_condition">Countable chain condition (ccc)</a></strong>:</p> <ul> <li>A poset $\mathbb{P}$ is <em>ccc</em> if every <em>strong</em> <strong>anti</strong><em>chain</em> is countable.</li> <li>A strong antichain is an antichain whose elements are pairwise <em>incompatible</em>.</li> <li>If $M$ satisfies “$\mathbb{P}$ is ccc”, then $M$ and $M[ G]$ have exactly the same cardinals.</li> </ul> <p>At last, let’s handle the existence of a ctm. Not every model of ZFC has a ctm. However, from a metatheoretic point of view, the reflection theorem ensures that arbitrarily large finite subsets of ZFC axioms have a ctm. If the subset is large enough s.t. whatever we used till now is included, then we can force $|\mathcal{P}(\omega)| = \aleph_2$. 
Then, by the compactness theorem, $\neg\mathrm{CH}$ must be consistent with ZFC.</p> <h1 id="summary">Summary</h1> <p>Things I have learnt:</p> <ul> <li>Have a basic understanding of mathematical logic. Know what mathematicians in the last century did.</li> <li>Logic objects are often organized in an infinite hierarchy. There does not exist an ultimate theory.</li> <li>Most useful theories are not complete. There are always unexpected models. <ul> <li>Therefore, the strength of the theory limits what we can prove.</li> </ul> </li> <li>Infinite numbers are often counterintuitive.</li> <li>The existence of something does not imply that it is understandable by human beings.</li> <li>In different universes, people have different views of the same concept.</li> </ul> <h1 id="references">References</h1> <ul> <li>Martin Hils and François Loeser. <a href="https://bookstore.ams.org/stml-89/">A First Journey through Logic</a>. American Mathematical Society, 2019.</li> <li>Andrew Marks. <a href="https://www.math.ucla.edu/~marks/notes/set_theory_notes_2.pdf">Set Theory Notes</a>. 2020. <ul> <li>There are some typos, but none of them will affect the understanding.</li> </ul> </li> <li>Yiannis N. Moschovakis. <a href="https://www.math.ucla.edu/~ynm/lectures/lnl.pdf">Lecture Notes in Logic</a>. 2014.</li> <li>Robert I. Soare. <a href="https://link.springer.com/book/10.1007%2F978-3-642-31933-4">Turing Computability: Theory and Applications (1st ed.)</a>. Springer, 2016.</li> <li>Thomas Jech. <a href="https://link.springer.com/book/10.1007%2F3-540-44761-X">Set Theory: The Third Millennium Edition, revised and expanded</a>. 
Springer, 2003.</li> </ul>Xinyu Maxinyuma@ucla.eduThis is a review and summary for course MATH 220BC, given by Professor Artem Chernikov and Andrew Marks.MATH 206A: Combinatorics - Course Review2020-12-28T00:00:00-08:002020-12-28T00:00:00-08:00https://zjkmxy.github.io/posts/2020/12/math-206a<p>This is a review and summary for course MATH 206A Combinatorics at UCLA, given by Prof. Igor Pak.</p> <p>I try to make the flow more natural, but also want to include brief descriptions of important theorems.</p> <aside class="sidebar__right"> <nav class="toc"> <header><h4 class="nav__title"><i class="fa fa-file-text"></i> On This Page</h4></header> <ul class="toc__menu" id="markdown-toc"> <li><a href="#overview" id="markdown-toc-overview">Overview</a></li> <li><a href="#chains--antichains" id="markdown-toc-chains--antichains">Chains &amp; Antichains</a> <ul> <li><a href="#dilworths-theorem" id="markdown-toc-dilworths-theorem">Dilworth’s theorem</a></li> <li><a href="#boolean-lattice" id="markdown-toc-boolean-lattice">Boolean lattice</a></li> <li><a href="#lym-inequality" id="markdown-toc-lym-inequality">LYM inequality</a></li> <li><a href="#sperner-property" id="markdown-toc-sperner-property">Sperner property</a></li> <li><a href="#bollobáss-theorem" id="markdown-toc-bollobáss-theorem">Bollobás’s theorem</a></li> <li><a href="#applications-of-chains--antichains" id="markdown-toc-applications-of-chains--antichains">Applications of chains &amp; antichains</a></li> </ul> </li> <li><a href="#linear-algebra-methods" id="markdown-toc-linear-algebra-methods">Linear Algebra Methods</a> <ul> <li><a href="#perfect-graphs" id="markdown-toc-perfect-graphs">Perfect graphs</a></li> <li><a href="#equal-subset-sums" id="markdown-toc-equal-subset-sums">Equal subset sums</a></li> </ul> </li> <li><a href="#combinatoric-optimization" id="markdown-toc-combinatoric-optimization">Combinatoric Optimization</a></li> <li><a href="#poset-arithmetic--lattice" 
id="markdown-toc-poset-arithmetic--lattice">Poset Arithmetic &amp; Lattice</a> <ul> <li><a href="#poset-operations" id="markdown-toc-poset-operations">Poset operations</a></li> <li><a href="#distributive-lattice" id="markdown-toc-distributive-lattice">Distributive lattice</a></li> </ul> </li> <li><a href="#linear-extensions" id="markdown-toc-linear-extensions">Linear Extensions</a> <ul> <li><a href="#estimations-of-linear-extensions" id="markdown-toc-estimations-of-linear-extensions">Estimations of linear extensions</a></li> <li><a href="#operations-on-linear-extensions" id="markdown-toc-operations-on-linear-extensions">Operations on linear extensions</a></li> <li><a href="#domino-tableaux" id="markdown-toc-domino-tableaux">Domino tableaux</a></li> </ul> </li> <li><a href="#poset-polytopes" id="markdown-toc-poset-polytopes">Poset Polytopes</a> <ul> <li><a href="#order-polytope" id="markdown-toc-order-polytope">Order polytope</a></li> <li><a href="#chain-polytope" id="markdown-toc-chain-polytope">Chain polytope</a></li> <li><a href="#ehrhart-polynomial" id="markdown-toc-ehrhart-polynomial">Ehrhart polynomial</a></li> <li><a href="#aleksandrov-fenchel-inequality" id="markdown-toc-aleksandrov-fenchel-inequality">Aleksandrov-Fenchel inequality</a></li> <li><a href="#brightwell-tetali-theorem" id="markdown-toc-brightwell-tetali-theorem">Brightwell-Tetali theorem</a></li> </ul> </li> <li><a href="#correlation-results" id="markdown-toc-correlation-results">Correlation Results</a> <ul> <li><a href="#four-functions-theorem" id="markdown-toc-four-functions-theorem">Four functions theorem</a> <ul> <li><a href="#kleitman" id="markdown-toc-kleitman">Kleitman</a></li> <li><a href="#four-functions-theorem-1" id="markdown-toc-four-functions-theorem-1">Four functions theorem</a></li> <li><a href="#fkg-inequality" id="markdown-toc-fkg-inequality">FKG inequality</a></li> </ul> </li> <li><a href="#shepps-xyz-theorem" id="markdown-toc-shepps-xyz-theorem">Shepp’s XYZ Theorem</a></li> 
<li><a href="#winklers-theorem" id="markdown-toc-winklers-theorem">Winkler’s theorem</a></li> <li><a href="#comparisons-via-linear-extensions" id="markdown-toc-comparisons-via-linear-extensions">Comparisons via linear extensions</a> <ul> <li><a href="#winklers-canonical-linear-ordering" id="markdown-toc-winklers-canonical-linear-ordering">Winkler’s canonical linear ordering</a></li> <li><a href="#preferential-ordering" id="markdown-toc-preferential-ordering">Preferential ordering</a></li> <li><a href="#intransitive-dice" id="markdown-toc-intransitive-dice">Intransitive dice</a></li> </ul> </li> <li><a href="#13-23-conjecture" id="markdown-toc-13-23-conjecture">1/3-2/3 Conjecture</a></li> </ul> </li> <li><a href="#summary" id="markdown-toc-summary">Summary</a></li> </ul> </nav> </aside> <h1 id="overview">Overview</h1> <p>The topics of 206A and 206B change every year. This year, the professor decided to focus on <a href="https://en.wikipedia.org/wiki/Partially_ordered_set">posets</a>: sets associated with a (strict) partial order $&lt;$. The lectures become a slice that cuts across every subfield of combinatorics. We use tools from algebra, graph theory, analysis, geometry, the probabilistic method, etc.</p> <h1 id="chains--antichains">Chains &amp; Antichains</h1> <p>The property of a set is highly related to the properties of its subsets. Therefore, when a set is associated with an order, the first thing we want to ask is what properties its subsets may have. A random subset is boring, because when we restrict the order to it we always get another poset. However, there are two special kinds of subsets: one in which every two elements can be compared, and one in which no two elements can. They are called <em>chains</em> and <em>antichains</em>, resp.</p> <h2 id="dilworths-theorem">Dilworth’s theorem</h2> <p>The story starts with the basic properties, height and width, of a poset $P$. 
Height is the maximum length of (maximal) chains $h(P) = \max |\{c_1&lt; c_2&lt; \ldots &lt; c_h\}|$; width is the maximum size of (maximal) antichains $w(P) = \max |\{a_1, \ldots, a_w: a_i,a_j\text{ incomparable} \}|$.</p> <p><a href="https://en.wikipedia.org/wiki/Dilworth%27s_theorem">Dilworth’s theorem</a> states that</p> <ul> <li>Height = size of smallest antichain partition</li> <li>Width = size of smallest chain partition</li> </ul> <p>The first case is simple: we can define the height of an element $x\in P$ to be the height of its principal ideal $\{ y\in P: y&lt; x \}$, or equivalently, the length of the longest chain ending at it. Then, the elements with the same height form an antichain.</p> <p>The second case is not so simple. There does not exist an obvious dual poset, so the width of an element is undefined. To prove it, we can pick a maximal antichain, and induct on the two subposets separated by this antichain.</p> <p>Dilworth’s theorem has several corollaries. One interesting result is <a href="https://en.wikipedia.org/wiki/Hall%27s_marriage_theorem">Hall’s marriage theorem</a>: Given $n$ men and $n$ women, if every set of $k$ women has at least $k$ acceptable partners in total, then there exists an arrangement in which everyone is married, because the width of the corresponding poset is at most $n$.</p> <p>Dilworth’s theorem can be considered as a special case of more general theorems. One by <a href="https://en.wikipedia.org/wiki/Path_cover#:~:text=A%20theorem%20by%20Gallai%20and,from%20each%20path%20in%20P.">Gallai</a> states that every directed graph has a path partition of its vertices whose size is at most the size of the largest independent set.</p> <h2 id="boolean-lattice">Boolean lattice</h2> <p>An important kind of poset is the (<em>finite</em>) <a href="https://en.wikipedia.org/wiki/Boolean_algebra_(structure)">boolean lattice</a>: $B_n = (2^{[n ]}, \subsetneqq)$, where $[n ] = \{0,1,\ldots, n-1\}$. We tried to count the number of chains and antichains using enumerative methods. 
For chains, the generating function is the same as that of the number of surjections. Greene and Kleitman showed that $B_n$ can be partitioned into $w(B_n)$ symmetric saturated chains. As a corollary, the sizes of the levels are unimodal: ${n\choose 0} \leq \cdots \leq {n\choose \lfloor n/2\rfloor} \geq \cdots \geq {n\choose n}$. G-K gives a bracket-sequence representation of these chains.</p> <p>Three properties $B_n$ satisfies are <a href="http://www.thi.informatik.uni-frankfurt.de/~jukna/EC_Book_2nd/katona.html">Bollobás’s Two Families Theorem</a>, <a href="https://en.wikipedia.org/wiki/Lubell%E2%80%93Yamamoto%E2%80%93Meshalkin_inequality">LYM inequality</a>, and <a href="https://en.wikipedia.org/wiki/Sperner_property_of_a_partially_ordered_set">Sperner property</a>.</p> <h2 id="lym-inequality">LYM inequality</h2> <p>Let $A\subset B_n$ be an antichain, then</p> $\sum_{a\in A} {n\choose |a|}^{-1} \leq 1$ <p>In other words, the sum over $a\in A$ of the probability that $a$ is chosen uniformly at random from the rank it lies in is at most 1.</p> <p>This can be proven by double counting the number of maximal chains: For each $a\in A$, there are $|a|!(n- |a|)!$ maximal chains passing through it; each chain cannot pass through two elements of an antichain, so the sum of these counts over $a\in A$ is at most $n!$.</p> <h2 id="sperner-property">Sperner property</h2> <p>The Sperner property states that the largest antichain is the largest rank. This is a corollary of LYM: suppose $A$ is the largest antichain, then</p> $1 \geq \sum_{a\in A}{n\choose |a|}^{-1} \geq \mathrm{width}(B_n){n\choose \lfloor n/2\rfloor}^{-1}$ <h2 id="bollobáss-theorem">Bollobás’s theorem</h2> <p>Suppose $A_1,\ldots, A_m,B_1,\ldots, B_m \subseteq [n ]$ s.t. $A_i\cap B_j = \varnothing$ iff $i=j$. Then</p> $\sum_{i= 1}^{m} {|A_i|+|B_i| \choose |A_i|}^{-1} \leq 1$ <p>LYM inequality is a corollary of Bollobás’s theorem.</p> <h2 id="applications-of-chains--antichains">Applications of chains &amp; antichains</h2> <p>As applications, we learnt Gray codes and universal sequences. 
The latter one contains every subset of $[ n]$ as a contiguous subsequence.</p> <h1 id="linear-algebra-methods">Linear Algebra Methods</h1> <h2 id="perfect-graphs">Perfect graphs</h2> <p>A <a href="https://en.wikipedia.org/wiki/Perfect_graph">perfect graph</a> is a simple graph in which the chromatic number of every induced subgraph equals its clique number.</p> <ul> <li>Comparability graphs and incomparability graphs are always perfect.</li> <li>Weak perfect graph conjecture: a graph is perfect iff its complement graph is perfect.</li> <li>Strong perfect graph conjecture: a graph is perfect iff neither it nor its complement graph contains any $(2n+1)$-cycle as an induced subgraph, for all $n\geq 2$.</li> <li>Replication lemma: If both $G$ and $H$ are perfect, then the result of replacing a vertex in $G$ with $H$ is perfect.</li> <li>A graph is perfect iff the size of every induced subgraph is $\leq$ its clique number times its independence number.</li> </ul> <p>To prove the last lemma, we use some argument on matrix rank.</p> <h2 id="equal-subset-sums">Equal subset sums</h2> <p>Given a set $S$ of $n$ different positive real numbers, we want to count how many subsets of $S$ have elements that sum up to a given $K$. We proved that:</p> <ul> <li>The number is $\leq {n\choose \lfloor n/2\rfloor}$.</li> <li>(Erdős-Moser Conjecture) The number is not larger than in the case where $S = \{1,2,\ldots, n\}$ and $K = \frac{n(n+1)}{4}$, i.e. half the sum of $S$.</li> </ul> <p>The proof idea is: let</p> $M_n = \{(b_1,\ldots, b_n) : 0\leq b_i\leq n \wedge 0 = b_1 = \cdots = b_l &lt; b_{l+1} &lt; \cdots &lt; b_n \}$ <p>and $\leq$ on $M_n$ defined by pointwise $\leq$. Then, $M_n$ is not known to have a symmetric saturated chain decomposition.</p> <p>However, after embedding $M_n$ into a linear space, we can still prove that $M_n$ is rank-unimodal and has the Sperner property. The conjecture is equivalent to the Sperner property, so it holds. 
The operations in $M_n$ are somehow related to the representations of the Lie group $\mathrm{SL}_2$.</p> <h1 id="combinatoric-optimization">Combinatoric Optimization</h1> <p>Dilworth gives one longest chain / antichain, so now we want to generalize it to the largest union of $k$ chains / antichains. This is the Greene-Kleitman theorem:</p> <ul> <li>The size of the largest union of $k$ chains, $\alpha_k$, equals the minimum of $\sum_{A\in\mathcal{A}}\min\{k, |A|\}$, where $\mathcal{A}$ ranges over antichain covers. <ul> <li>The term $\min\{k, |A|\}$ appears because every antichain can contribute at most $k$ elements to a union of $k$ chains.</li> </ul> </li> <li>Dually, $\beta_k$ is the size of the largest union of $k$ antichains.</li> <li>For any poset, $\alpha_k$ and $\beta_k$ are the sums of the lengths of the first $k$ rows and columns of one specific <a href="https://en.wikipedia.org/wiki/Young_tableau">Young diagram</a>. <ul> <li>That is, let $a_k = \alpha_k - \alpha_{k-1}$, $b_k = \beta_k - \beta_{k-1}$. Then, there exists some $\lambda\vdash n$ s.t. the $k$-th row has length $a_k$ and the $k$-th column has length $b_k$.</li> </ul> </li> </ul> <p>This is proven by transforming the poset into a <a href="https://en.wikipedia.org/wiki/Minimum-cost_flow_problem">min-cost circulation problem</a>, which is itself a linear program and has some well-researched properties.</p> <p>On the other hand, <a href="https://en.wikipedia.org/wiki/Robinson%E2%80%93Schensted%E2%80%93Knuth_correspondence">RSK correspondence</a> also gives a bijection between $n$-permutations and pairs of standard Young tableaux of size $n$ with the same shape. The Young tableaux given by RSK also give $a_k$ and $b_k$.</p> <h1 id="poset-arithmetic--lattice">Poset Arithmetic &amp; Lattice</h1> <h2 id="poset-operations">Poset operations</h2> <p>We can define several arithmetic operations on posets:</p> <ul> <li><strong>Sum</strong>: $P+Q$ contains all elements of $P$ and $Q$. Any $p\in P$ and $q\in Q$ are incomparable.</li> <li><strong>Product</strong> (<em>noncommutative</em>): $P\cdot Q$ contains all elements of $P$ and $Q$. 
$p &gt; q$ for all $p\in P$, $q\in Q$. <ul> <li>A poset obtained from single points via sum and product is <em>series-parallel</em>. A poset is series-parallel iff it does not contain $N = \{a&lt; c, a &lt; d, b &lt; d\}$ as an induced subposet.</li> </ul> </li> <li><strong>Cartesian product</strong>: $P\times Q$ contains every pair $(p, q)$, with pointwise order: $(p_1,q_1)\leq (p_2,q_2)$ iff $p_1\leq p_2 \wedge q_1 \leq q_2$. <ul> <li>Boolean lattice $B_n$ is the cartesian product of $n$ copies of $B_1$.</li> </ul> </li> <li><strong>Power poset</strong>: $Q^P$ contains all functions $f:P\to Q$ that preserve the order. The order of functions $f\leq g$ is defined pointwise. <ul> <li>$P\times (Q+R) = P\times Q + P\times R$; $P^{Q+R} = P^Q\times P^R$.</li> </ul> </li> <li>For a poset $P$, let $J(P)$ be the poset of <a href="https://en.wikipedia.org/wiki/Upper_set">lower sets</a> of $P$, with order $\subset$. <ul> <li>$J(P)$ is a <a href="https://en.wikipedia.org/wiki/Distributive_lattice">distributive lattice</a> with $\wedge = \cap$, $\vee = \cup$.</li> </ul> </li> </ul> <h2 id="distributive-lattice">Distributive lattice</h2> <p>A lattice is a poset $L = (X, &lt;)$, closed under operations meet $\wedge$ and join $\vee$:</p> <ul> <li><strong>Meet</strong> $a\wedge b$ is the <em>greatest (universal) lower bound</em>, <em>infimum</em>.</li> <li><strong>Join</strong> $a\vee b$ is the <em>least (universal) upper bound</em>, <em>supremum</em>.</li> </ul> <p>Lattices satisfy many of the same laws as the basic logic $B_1 = \{ True, False \}$:</p> <ul> <li>A <strong>finite</strong> lattice always has one greatest element (<em>top</em>) $\top$ or $1$, and one least element (<em>bottom</em>) $\bot$ or $0$. 
<ul> <li>$a\vee 0 = a \wedge 1 = a$.</li> </ul> </li> <li>Meet and join are associative, commutative, and satisfy the <em>absorption laws</em>: $a\wedge(a\vee b) = a\vee(a\wedge b) = a$</li> <li><em>Idempotent laws</em>: $a\wedge a = a\vee a = a$.</li> </ul> <p>A <em>filter</em> on a lattice is a non-empty <em>upper set</em> that does not contain $\bot$ and is closed under finitely many meets $\wedge$. The dual notion is an <em>ideal</em>. A maximal filter is called an <em>ultrafilter</em>. An ultrafilter of the form $F_y = \{ x\in L: x \geq y \}$ is <em>principal</em>.</p> <p>A lattice is distributive if meet is distributive over join and vice versa.</p> <p><a href="https://en.wikipedia.org/wiki/Birkhoff%27s_representation_theorem"><em>Fundamental theorem of finite distr. lat.</em></a>: Every <strong>finite</strong> distributive lattice $L$ is isomorphic to some $J(P)$.</p> <ul> <li>An element is <em>join-irreducible</em> if it is not the join of two other elements.</li> <li>Every element can be written uniquely as an irredundant join of join-irreducible elements.</li> <li>Let $P$ be the subposet containing all join-irreducible elements, then $L\cong J(P)$.</li> </ul> <p><a href="https://en.wikipedia.org/wiki/Stone%27s_representation_theorem_for_Boolean_algebras"><em>The Stone representation theorem</em></a>: If a (possibly infinite) distributive lattice has well-defined $\top$, $\bot$ and $\neg$ ($a\vee\neg a = \top$, $a\wedge\neg a = \bot$), then it is isomorphic to a field of sets.</p> <ul> <li>Let $S$ be the set of ultrafilters of $L$.</li> <li>Let $f: L\to 2^S$ be defined as $f(x) = \{U\in S: x\in U\}$.
<ul> <li>$f$ is not necessarily surjective if $L$ is infinite.</li> </ul> </li> </ul> <p>A lattice is distributive iff it admits cancellation: for all $x,y,z$, if $z\wedge x = z\wedge y$ and $z\vee x = z\vee y$, then $x = y$.</p> <p>A lattice is distributive iff it does not contain a sublattice isomorphic to the diamond $M_3$ or the pentagon $N_5$.</p> <h1 id="linear-extensions">Linear Extensions</h1> <p>The set of linear extensions of a poset $P$ with $n = |P|$ is the set of order-preserving bijections $P\to [ n]$, denoted by $\alpha(P)$. Let $e(P) = |\alpha(P)|$ denote the number of linear extensions of $P$.</p> <ul> <li>$e(P\cdot Q) = e(P)e(Q)$; $e(P+Q) = e(P)e(Q){|P| + |Q|\choose |P|}$</li> <li>$e(P)$ of a tree is $|P|!$ divided by the product of sizes of all subtrees.</li> <li>$e(P)$ of a $2\times m$ grid is the <a href="https://en.wikipedia.org/wiki/Catalan_number">Catalan number</a> $C_m$.</li> <li>The <a href="https://en.wikipedia.org/wiki/Hook_length_formula">hook length formula</a> computes $e(P)$ for standard Young diagrams.</li> <li>$e(P_{\sigma})$ is the number of permutations less than or equal to $\sigma$ under the weak Bruhat order. <ul> <li>The <a href="https://en.wikipedia.org/wiki/Inversion_(discrete_mathematics)#Weak_order_of_permutations">weak Bruhat order</a> gives a partial order on permutations.</li> </ul> </li> </ul> <h2 id="estimations-of-linear-extensions">Estimations of linear extensions</h2> <p>Computing $e(P)$ is generally <a href="https://en.wikipedia.org/wiki/%E2%99%AFP-complete">♯P-complete</a>.
However, we can estimate it.</p> <ul> <li>$e(P) \geq |A_1|!\cdots|A_n|!$ for any antichain partition $\{A_1,\ldots, A_n\}$.</li> <li>$n!/e(P) \geq |C_1|!\cdots|C_n|!$ for any chain partition $\{C_1,\ldots, C_n\}$.</li> <li>$e(P) \leq \mathrm{width}(P)^n$.</li> </ul> <h2 id="operations-on-linear-extensions">Operations on linear extensions</h2> <ul> <li><a href="https://en.wikipedia.org/wiki/Jeu_de_taquin">Jeu-de-taquin</a> acts on skew Young tableau.</li> <li><a href="https://math.mit.edu/~rstan/papers/evac.pdf">Schützenberger Promotion</a> <ul> <li>$\psi$: <ul> <li>Starting from $1$, replacing every element with its least child; go down the path till the maximal element.</li> <li>Remove the maximal element; shift every element by $1$ and fill in the blank by $n$.</li> </ul> </li> <li>$\psi$ is a bijection on linear extensions.</li> </ul> </li> <li>Evacuation: <ul> <li>$\eta$: similar to promotion</li> <li>$\eta$ is also a bijection. $\eta^2 = 1$.</li> </ul> </li> <li><a href="https://en.wikipedia.org/wiki/Coxeter_group">Coxeter group</a>: <ul> <li>The infinite group generated by $\langle\tau_1, \ldots, \tau_{n-1}\rangle$. <ul> <li>$\tau_i$ means swap $i$ and $i+1$ <em>if possible</em> in $P$.</li> <li>$\tau_i^2 = (\tau_i\tau_j)^2 = 1$, for non-adjacent $i,j$.</li> </ul> </li> <li>$\psi$ and $\eta$ can be expressed in Coxeter group.</li> </ul> </li> </ul> <h2 id="domino-tableaux">Domino tableaux</h2> <ul> <li>There is a bijection between domino diagrams and a pair of standard Young diagrams: $\phi(\lambda) = (\mu, \nu)$.</li> <li>The number of domino tableaux in shape $\lambda$ is ${|\mu|+|\nu|\choose |\mu|}\#SYT(\mu)\#SYT(\nu)$.</li> </ul> <p>We also proved the Hook length formula by poset sorting and $P$-partition theory.</p> <h1 id="poset-polytopes">Poset Polytopes</h1> <p>Poset polytopes are the application of geometry methods on posets. 
For each poset $P$, we can define two kinds of polytopes: the order polytope and the chain polytope.</p> <h2 id="order-polytope">Order polytope</h2> <ul> <li>The order polytope $\mathcal{O}_p$ contains the real functions $f: P\to [0, 1]$ that preserve the order.</li> <li>Facets of $\mathcal{O}_p$ are $f(x) = 0$ for $x$ minimal; $f(y) = 1$ for $y$ maximal; $f(x)=f(y)$ whenever $x$ covers $y$ in the Hasse diagram.</li> <li>Vertices of $\mathcal{O}_p$ are characteristic functions of upper sets.</li> <li>Its volume is $e(P)/n!$.</li> </ul> <h2 id="chain-polytope">Chain polytope</h2> <ul> <li>The chain polytope $\mathcal{C}_p$ contains the real functions $f: P\to [0, 1]$ whose sum over any chain is $\leq 1$.</li> <li>Vertices of $\mathcal{C}_p$ are characteristic functions of antichains.</li> <li>Its volume is also $e(P)/n!$. <ul> <li>Proven by building a continuous, piecewise linear, volume preserving bijection between $\mathcal{O}_p$ and $\mathcal{C}_p$.</li> <li>Corollary: $e(P)$ depends only on the comparability graph.</li> </ul> </li> </ul> <h2 id="ehrhart-polynomial"><a href="https://en.wikipedia.org/wiki/Ehrhart_polynomial">Ehrhart polynomial</a></h2> <p>Given an $n$-dimensional integral polytope $Q$ and $N\in\mathbb{N}$, the number of integral points contained in $NQ$, $L(Q,N) = |NQ\cap \mathbb{Z}^n|$, is a polynomial in $N$ of degree $n$, and its leading coefficient is the volume of $Q$.</p> <p>Let $a_P(m)$ be the number of order preserving functions from $P$ to $[m ]$. Then, $a_P(m)$ is a polynomial in $m$ with leading coefficient $e(P)/n!$, where $n = |P|$.</p> <h2 id="aleksandrov-fenchel-inequality"><a href="https://en.wikipedia.org/wiki/Mixed_volume">Aleksandrov-Fenchel inequality</a></h2> <p>Suppose $Q_0$, $Q_1$ are convex polytopes in $\mathbb{R}^n$. Let $Q = \mathrm{conv}\{Q_0, Q_1\}$ be the polytope that continuously varies from $Q_0$ to $Q_1$.
Then, the volume of $Q_{\lambda}$ is</p> $\mathrm{vol}_{n-1}(Q_{\lambda}) = \sum_{i=0}^{n-1}{n-1 \choose i}V_i(Q_0, Q_1)\lambda^i(1-\lambda)^{n-1-i}$ <p>where $V_i^2(Q_0, Q_1) \geq V_{i-1}(Q_0, Q_1)V_{i+1}(Q_0, Q_1)$.</p> <p>Given a poset $P$, let $\alpha_j(x)$ denote the number of linear extensions that map $x$ to $j$. Then, $\alpha_j(x)^2\geq \alpha_{j-1}(x)\alpha_{j+1}(x)$, i.e. the sequence $\alpha_j(x)$ is <a href="https://en.wikipedia.org/wiki/Logarithmically_concave_sequence">log concave</a>. Log concavity implies unimodality.</p> <p>Let $\beta_i(x,y)$ denote the number of linear extensions where the images of $x$ and $y$ differ by $i$. Then, $\beta_i(x,y)$ is also unimodal.</p> <p>If $a_i/{n\choose i}$ is log concave, then $a_i$ is <em>ultra log concave</em>. The convolution $c_i = \sum_k{n\choose k}a_k b_{i-k}$ of two ultra log concave sequences is ultra log concave. Thus, for a series parallel poset $P$, $a_i$ is ultra log concave.</p> <h2 id="brightwell-tetali-theorem">Brightwell-Tetali theorem</h2> <p>Suppose $h: P\to \mathbb{R}_{+}$ s.t. the sum over any antichain is less than $1$. Then, $e(P)\leq \prod_{x\in P} 1/h(x)$.</p> <p>As a corollary, for all ranked $P$ with LYM property, $e(P) \leq \prod_{k=1}^{h(P)} (r_k)^{r_k}$</p> <h1 id="correlation-results">Correlation Results</h1> <p>Pick $x,y,z\in P$ pairwise incomparable and a random linear extension $\sigma$. Then, conditioned on $\sigma(x) &lt; \sigma(z)$, the probability of $\sigma(y) &lt; \sigma(z)$ is generally higher, because these two events are positively correlated. There are exceptions like trees and series parallel posets, where these events are independent.</p> <p>Generate a random graph $G$. Then, $G$ being planar and being Hamiltonian are negatively correlated. Intuitively, planarity is downward closed but Hamiltonicity is upward closed. These two examples suggest that correlations are widespread in posets.</p> <h2 id="four-functions-theorem">Four functions theorem</h2> <h3 id="kleitman">Kleitman</h3> <p>Let $L,U\subseteq B_n$ s.t.
$L$ is a lower set and $U$ is an upper set. Then, $|L\cap U|\cdot|B_n|\leq |L|\cdot|U|$.</p> <h3 id="four-functions-theorem-1"><a href="https://en.wikipedia.org/wiki/Ahlswede%E2%80%93Daykin_inequality">Four functions theorem</a></h3> <p>Suppose $\alpha,\beta,\gamma,\delta$ are four functions from $2^{[ n]}$ to $\mathbb{R}_+$ s.t. for all $A,B\in 2^{[ n]}$,</p> $\alpha(A)\beta(B) \leq \gamma(A\cup B)\delta(A\cap B)$ <p>Then for all set families $\mathcal{A}, \mathcal{B}\subset 2^{[ n]}$,</p> $\alpha(\mathcal{A})\beta(\mathcal{B}) \leq \gamma(\underline{\mathcal{A}\cup \mathcal{B}})\delta(\underline{\mathcal{A}\cap \mathcal{B}})$ <p>where the underlined intersection and union mean pairwise intersection and union.</p> <p>This can be extended to any finite distributive lattice. For example,</p> $|A|\cdot |B| \leq |A\wedge B|\cdot |A\vee B|$ <h3 id="fkg-inequality"><a href="https://en.wikipedia.org/wiki/FKG_inequality">FKG inequality</a></h3> <p>Let $L$ be a distributive lattice. A nonnegative function $\mu$ is <em>log supermodular</em> if for all $x,y\in L$ $$\mu(x)\mu(y)\leq \mu(x\wedge y)\mu(x\vee y)$$</p> <p>A nonnegative function $f$ is <em>increasing</em> if it preserves the order.</p> <p>The FKG inequality states that given $\mu$ log supermodular and $f,g$ increasing,</p> $\left(\sum_{x\in L}\mu(x)f(x)\right)\left(\sum_{x\in L}\mu(x)g(x)\right) \leq \left(\sum_{x\in L}\mu(x)f(x)g(x)\right)\left(\sum_{x\in L}\mu(x)\right)$ <p>If $\mu$ is a probability measure, such as an ultrafilter, then</p> $\mathbb{E}(f) \mathbb{E}(g) \leq \mathbb{E}(fg)$ <p>indicates that two increasing functions are positively correlated.</p> <h2 id="shepps-xyz-theorem"><a href="https://en.wikipedia.org/wiki/Fishburn%E2%80%93Shepp_inequality">Shepp’s XYZ Theorem</a></h2> <p>Given $x,y,z\in P$ pairwise incomparable, then for $A$ varying over all linear extensions,</p> $\mathbb{P}[A(x)\leq A(y)]\cdot \mathbb{P}[A(x)\leq A(z)] \leq \mathbb{P}[A(x)\leq A(y), A(x)\leq A(z)]$ <p>Or equivalently,</p>
$\mathbb{P}(A(x)\leq A(y)) \leq \mathbb{P}(A(x)\leq A(y)\mid A(x)\leq A(z))$ <h2 id="winklers-theorem">Winkler’s theorem</h2> <p>Suppose $P$ and $Q$ are two posets on the same base set $X$ but with different orders. If for all $x,y\in X$,</p> $\mathbb{P}_P[ A(x)\leq A(y) ] \leq \mathbb{P}_Q[ B(x)\leq B(y) ]$ <p>Then, $Q$ can be obtained by refining $P$.</p> <h2 id="comparisons-via-linear-extensions">Comparisons via linear extensions</h2> <h3 id="winklers-canonical-linear-ordering">Winkler’s canonical linear ordering</h3> <p>Let $h_P(x)$ be the average rank of $x$ over all linear extensions of $P$.</p> <p>Suppose $x,y\in P$ are incomparable, $P' = P\cup (x&gt; y)$ and $P'' = P\cup (x&lt; y)$. Then, $h_{P'}(x) \geq h_{P}(x)$ and $h_{P'}(x) \geq 1 + h_{P''}(x)$.</p> <p>$h_P$ is not always linear, but always well-defined.</p> <h3 id="preferential-ordering">Preferential ordering</h3> <p>Before Winkler, people doing social choice tried $x\triangleright y$ if $x$ being below $y$ happens more often, i.e. $\mathbb{P}[A(x) &lt; A(y)] &gt; \frac{1}{2}$.</p> <p>However, $\triangleright$ is not always a partial order, because in a poset there may exist 3 elements that are amazingly correlated. Fishburn constructs a poset where $x\triangleright y\triangleright z \triangleright x$.</p> <h3 id="intransitive-dice">Intransitive dice</h3> <p>Let three dice $A = [2,2,4,4,9,9 ]$, $B = [1,1,6,6,8,8 ]$, $C = [3,3,5,5,7,7 ]$. Then, $P[A&gt; B ] = P[B&gt; C ] = P[C&gt; A ] = \frac{5}{9}$.</p> <h2 id="13-23-conjecture"><a href="https://en.wikipedia.org/wiki/1/3%E2%80%932/3_conjecture">1/3-2/3 Conjecture</a></h2> <p>This conjecture has not been proven yet.
It states that for every poset $P$ that is not a chain, there exist two elements $x,y\in P$ such that the probability of $x &gt; y$ in a random linear extension is within $[\frac{1}{3}, \frac{2}{3} ]$.</p> <p>Some special cases are proven.</p> <h1 id="summary">Summary</h1> <p>By focusing on posets, this course touched nearly every field of combinatorics:</p> <ul> <li><em>Graph Theory</em>: matching, path covering, perfect graph.</li> <li><em>Probabilistic Method</em>: random graphs, correlation inequalities.</li> <li><em>Combinatoric Optimization</em>: min-cost flow.</li> <li><em>Geometric Combinatorics</em>: poset polytopes, permutohedron.</li> <li><em>Enumerative Combinatorics</em>: standard Young diagrams, evacuation, P-partition.</li> <li><em>Analytic Combinatorics</em>: generating functions, asymptotic methods.</li> <li><em>Algebraic Combinatorics</em>: Coxeter group, RSK.</li> <li><em>Arithmetic Combinatorics</em> <ul> <li><em>Extremal Combinatorics</em>: Sperner property, subsets with equal sums.</li> </ul> </li> <li><em>Discrete Geometry</em>: Integral polytope.</li> <li><em>Complexity</em></li> </ul> <p>What impressed me most is that different branches of math are highly related. After changing to a different point of view, some concepts, ideas, and methods that previously looked totally unrelated can contribute a lot.</p>Xinyu Maxinyuma@ucla.eduThis is a review and summary for course MATH 206A Combinatorics at UCLA, given by Prof. Igor Pak.Polynomial Algorithms2020-12-10T00:00:00-08:002020-12-10T00:00:00-08:00https://zjkmxy.github.io/posts/2020/12/polynomials<p>This post introduces some tricks on polynomials widely used in ICPC. I will try to practice algebraic knowledge as well.</p> <h1 id="polynomial-inverse">Polynomial Inverse</h1> <p><strong>Prop</strong>: Let $R$ be a commutative ring and $I$ be an ideal.
Then $[p ]_I$ is a unit in $R/I$ if and only if $\langle p\rangle$ is coprime with $I$.</p> <p><em>Proof</em>: $1\in Rp+I$ is equivalent to $\langle p\rangle+I = R$.</p> <p><strong>Cor</strong>: For $R$ UFD, $f\in R[ X]/\langle X^n\rangle$ is a unit if and only if $f(0)$ is a unit in $R$.</p> <p><em>Proof</em>: $R$ being a domain implies $X^a$ ($a&lt; n$) are the only factors of $X^n$. Suppose $f(0)$ is a unit, then $f$ is coprime with $X^n$ in $K[ X]$, where $K$ is the fraction field of $R$. Suppose $fg = 1$ in $K[ X]$. Since the content of $f$ is associated to $1$, by Gauß’s lemma, $C(f\hat{g})$ is also associated to $1$. Since $f\hat{g} = h\cdot X^n + f(0)\hat{g}(0)$, $[ f(0)\hat{g}(0)]^{-1}\hat{g}$ is the inverse of $f$ modulo $X^n$. The “only if” part is trivial.</p> <p><strong>Prop</strong>: If $fg = h\cdot X^n + 1$, then $(2-fg)g\cdot f = -h^2\cdot X^{2n} +1$.</p> <p>This proposition gives a binary lifting algorithm to calculate $f^{-1}$ modulo $X^n$, in $O(n\log n)$ when multiplication is done by FFT.</p> <h1 id="newtons-method">Newton’s Method</h1> <p>Let $R$ be a UFD and $f\in R[[ X]]$ be a power series. Suppose $f(a)\in Rp$ for some $a,p\in R$, i.e. $a$ is a root of $f$ modulo $p$. Then, consider $x = a+rp$. By Taylor expansion, $f(x) \equiv f(a) + f'(a)rp$ modulo $p^2$, so we want $f(a) + f'(a)rp \in Rp^2$. Since $p\mid f(a)$, if $f'(a)$ is invertible in $R/p^2R$, we have $p\mid f(a)f'(a)^{-1}$. Thus, $x = a - f(a)f'(a)^{-1}$ is a root of $f$ modulo $p^2$ if $f'(a)$ is invertible.</p> <p>Now, substitute $R$ with $R[ X]$ and $p$ with $X^m$. Consider $g\in R[ X, Y]$, but it can have infinitely many terms in $Y$. If there is some $a\in R$ s.t. $g(0, a) = 0, \frac{\partial g}{\partial Y}(0, a)\in R^{\times}$, then we can calculate the “root” $f(X)$ of $g(X, f(X))$ in $R[ X]/\langle X^n\rangle$ for any $n$:</p> <ul> <li>Let $f_0(X) = a$ and $n_0 = 1$</li> <li>Let $n_i = 2n_{i-1}$ and $f_i(X) = f_{i-1} - g(X, f_{i-1}) \left(\frac{\partial g}{\partial Y}(X, f_{i-1})\right)^{-1}$ modulo $X^{n_i}$. <ul> <li>Note that $X\mid g(X, f_{i-1})$.
Thus, every $f_i$ has constant term $a$, so if the derivative is invertible at the first step, it is invertible at every step.</li> </ul> </li> </ul> <h2 id="examples">Examples</h2> <ul> <li>Inverse of $h(X)$: let $g(X, Y) = \frac{1}{Y} - h(X)$. Get $f_{i+1} = 2f_{i} - f_i^2h$. <ul> <li>Condition: $h(0)$ is invertible.</li> </ul> </li> <li>Sqrt of $h(X)$: let $g(X, Y) = Y^2 - h(X)$. Get $f_{i+1} = \frac{1}{2}(f_i^2+h)f_i^{-1}$. <ul> <li>Condition: there exists $a$ s.t. $a^2 = h(0)$ and $2a\in R^{\times}$.</li> <li>For example, let $R = \mathbb{Z}/9\mathbb{Z}$ and $h(X) = X+1$. Starting from $f_0 = 1$ and using $2\times 5 = 1$, we get that $f_2(X)= -5X^3+X^2+5X+1$ is the solution modulo $X^4$. <ul> <li>Well, $R$ is not a domain, but it works.</li> </ul> </li> </ul> </li> <li>Exp of $h(X)$: let $g(X, Y) = \ln Y - h(X)$. Get $f_{i+1} = f_i(X)(1 - \ln {f_i(X)} + h(X))$. <ul> <li>Log can be calculated by integrating $\frac{f'(X)}{f(X)}$.</li> <li>Condition: $h(0)=0$</li> </ul> </li> </ul> <h1 id="polynomial-modulus">Polynomial Modulus</h1> <p>Let $K$ be a field. For $f\in K[ X]$ of degree $d$, let $f^T(X) := X^df(1/X)$ be the reverse of coefficients of $f$. Suppose $f = gq + r$, where $f, g$ are of degree $n, m$ resp. Then, $f^T(X) = g^T(X)q^T(X) + X^nr(1/X)$. Since $r$ is of degree at most $m-1$, $X^{n-m+1}\mid X^nr(1/X)$. Also, $q^T(X)$ is of degree at most $n-m &lt; n-m+1$. Thus, $q^T = f^T\cdot (g^T)^{-1}$ modulo $X^{n-m+1}$, which determines $q$, and $r = f - gq$.</p> <h1 id="fft">FFT</h1> <p>Let $A$ be an $R$-algebra, and $g\in A^{\times}$ of order $n$. Suppose $\sum_{i=0}^{n-1}g^{ik} = 0$ for all $1\leq k &lt; n$ (generally true if $g^k-1$ is not a zero divisor).
For a sequence $a = (a_0, \ldots, a_{n-1})\in R^n$, its discrete Fourier transform is $F(a) = (F_0(a), \ldots, F_{n-1}(a))\in A^n$ where</p> $F_k(a) = \sum_{i=0}^{n-1}a_ig^{ik}$ <p>If $n$ is invertible in $A$, the inverse of DFT is</p> $a_i = \frac{1}{n}\sum_{k=0}^{n-1} F_k(a)g^{-ik}$ <p>This is given by the zero sum condition.</p> <p>The DFT of the cyclic convolution $a * b$ is the pointwise product of their DFTs:</p> $\begin{eqnarray} F_k(a*b) &amp;=&amp; F_k(a)F_k(b) \nonumber\\ (a*b)_i &amp;=&amp; \sum_{j=0}^{n-1} a_j b_{(i-j)\mod n} \end{eqnarray}$ <p>Thus, DFT can be used to compute convolution. However, directly computing the sum does not reduce the time complexity. Now we try to accelerate this. Suppose $n = 2^m$.</p> $\begin{eqnarray} F_k(a) &amp;=&amp; \sum_{i=0}^{2^m-1}a_ig^{ik} \nonumber\\ &amp;=&amp; \left(\sum_{i=0}^{2^{m-1}-1}a_{2i}g^{i\cdot 2k}\right) + g^k\left(\sum_{i=0}^{2^{m-1}-1}a_{2i+1}g^{i\cdot 2k}\right) \nonumber\\ &amp;=&amp; F_k'(a_{even}) + g^k F_k'(a_{odd}) \end{eqnarray}$ <p>where $F_k'$ is the DFT of length $n/2$ using $g^2$ as the primitive root instead of $g$. Repeating this, we get an $O(n\log n)$ fast Fourier transform (FFT) algorithm.</p> <h2 id="common-settings">Common settings</h2> <ul> <li>$R=\mathbb{Z}$, $A=\mathbb{C}$, $g = e^{\frac{2\pi i}{n}}$</li> <li>$R=A=\mathbb{Z}/p$, for some prime number $p = c\cdot 2^m + 1$. <ul> <li>$p = 0\text{x}\text{C}0000001$, $g = 5$, $m = 30$</li> <li>$p = 0\text{x}78000001$, $g = 22$, $m = 27$</li> <li>$p = 0\text{x}3\text{B}800001$, $g = 3$, $m = 23$</li> <li>$p = 0\text{x}0\text{A}000001$, $g = 3$, $m = 25$</li> </ul> </li> </ul> <h1 id="lagrange-polynomial">Lagrange polynomial</h1> <p>Let $K$ be a field and $f\in K[ X]$ is of degree $d$.
If we know its values $y_i = f(x_i)\in K$ at $d+1$ distinct points $x_0, \ldots, x_d\in K$, then we can recover the coefficients of $f$.</p> <p>Let $l_i\in K[ X]$ be defined as:</p> $l_i(X) := \prod_{j\neq i}\frac{X - x_j}{x_i - x_j}$ <p>Then, $l_i$ is $1$ at $x_i$ and $0$ at the other points $x_j\neq x_i$. Let $l := \sum_{i=0}^{d} y_il_i$, then $f = l$, because otherwise $f-l$ would be a nonzero polynomial of degree at most $d$ with $d+1$ distinct roots.</p> <p>To evaluate $f(x)$, we can directly substitute $x$ into $l_i$, and get an $O(d^2)$ algorithm.</p> <h2 id="example-sum-of-powers">Example: sum of powers</h2> <p>Let $f_k(n) = \sum_{i=1}^{n} i^k$. By induction we can prove $f_k$ is a polynomial of degree $k+1$. To calculate $f_k(n)$, it suffices to compute $f_k(0), \ldots, f_k(k+1)$ and then interpolate.</p> <h1 id="multipoint-evaluation">Multipoint Evaluation</h1> <p>Let $R$ be a commutative ring, $f\in R[ X]$ be a polynomial with degree less than $n$, and $x_1,\ldots, x_n\in R$ be points. Compute $f(x_1), \ldots, f(x_n)$.</p> <p>Let $p_0(X) = \prod_{i=1}^{n/2} (X - x_i)$ and $p_1(X) = \prod_{i=n/2+1}^{n} (X - x_i)$. Define $r_0(X) = f\mod p_0$ and $r_1(X) = f\mod p_1$. Then it suffices to evaluate $r_0$ on $x_1,\ldots, x_{n/2}$ and $r_1$ on $x_{n/2+1},\ldots, x_{n}$. This divide-and-conquer algorithm works in $O(n\log^2 n)$.</p> <p>In special cases there are algorithms that run faster, e.g. $O(n\log n)$, but need more memory.</p> <h2 id="example-factorial">Example: factorial</h2> <p>Let $f(X) := \prod_{i=1}^{n}(X+i)$. Then, $n! = f(0)$. Let $g(X) := \prod_{i=1}^{v}(X+i)$, where $v = \lfloor\sqrt{n}\rfloor$. Then,</p> $n! = \left( \prod_{i=0}^{v-1}g(vi) \right)\cdot \prod_{i=v^2+1}^{n}i$ <p>Thus, it is enough to calculate $g(X)$ on $0, v, \ldots, v(v-1)$.</p> <h1 id="recursive-sequence">Recursive Sequence</h1> <p>Suppose $\{a_i\}_ {\infty}$ is a sequence s.t. $a_i = \sum_{j=1}^{d} c_ja_{i-j}$.
Then, the sequence is determined by its first $d$ elements.</p> <h2 id="matrix-exponentiation">Matrix Exponentiation</h2> <p>Clearly, we have</p> $\begin{pmatrix} a_{n+d-1} \\ a_{n+d-2} \\ a_{n+d-3} \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} c_1 &amp; c_2 &amp; \cdots &amp; c_{d-1} &amp; c_d \\ 1 &amp; 0 &amp; \cdots &amp; 0 &amp; 0 \\ 0 &amp; 1 &amp; \cdots &amp; 0 &amp; 0 \\ \vdots &amp; \vdots &amp; \ddots &amp; \vdots &amp; \vdots \\ 0 &amp; 0 &amp; \cdots &amp; 1 &amp; 0 \end{pmatrix}^n \cdot \begin{pmatrix} a_{d-1} \\ a_{d-2} \\ a_{d-3} \\ \vdots \\ a_0 \end{pmatrix}$ <p>Thus, we can let $M$ denote the recurrence matrix and calculate $M^n$ in $O(d^3\log n)$:</p> <ul> <li>$M^{2m} = (M^m)^2$</li> <li>$M^{2m+1} = (M^m)^2\cdot M$</li> </ul> <h2 id="polynomial-modulus-1">Polynomial Modulus</h2> <p>By the Hamilton-Cayley theorem, the characteristic polynomial of $M$ is an annihilator of $M$. Let $p(X) := X^d - \sum_{i=1}^{d}c_iX^{d-i}$ and $r_n(X) := X^n\mod p$. Then, $M^n = r_n(M)$. Thus, if $r_n(X) = \sum_{i=0}^{d-1} s_iX^i$, then $a_n = \sum_{i=0}^{d-1} s_ia_i$.</p> <p>However, there is a much cleaner way to view this. Recall that given $c\in R$, we have $R[ X]/\langle X-c \rangle\cong R$ via $X\mapsto c$. This somehow gives a solution for $a_i = ca_{i-1}$ as $a_i = c^i$. We want to generalize this, but $a_1 \times a_1 \neq a_2$. This suggests that we should <strong>forget the multiplication</strong>.</p> <p>Consider $R[ X]$ as an abelian group. Then, define $f: R[ X]\to R$ as $f(uX^i) = u\cdot a_i$ for all $u\in R$ and $i\in\mathbb{N}$. By the distributivity of $R$, $f$ is a well-defined <em>group</em> homomorphism. Let $I = \langle p \rangle$ be the ideal generated by $p$. We claim that $I$ is in the kernel of $f$. Since $f$ preserves addition, it suffices to show that $f(uX^ip) = 0$ for all $u\in R$ and $i\in\mathbb{N}$.
This is clearly true since</p> $u(a_{i+d} - c_1a_{i+d-1} - \ldots - c_da_{i}) = u\cdot 0 = 0$ <p>Observe that now $R$ becomes a cocone of the diagram $1\leftarrow I \hookrightarrow R$. And $R/I$ is exactly the coproduct of this diagram, with canonical map $\pi: R\to R/I$ via $f\mapsto (f\mod p)$. By the universality, $f = f\restriction_{R/I}\circ \pi$. Hence, $a_n = f(X^n\mod p) = \sum_{i=0}^{d-1} s_ia_i$, with $s_i$ defined above.</p> <h2 id="berlekamp-massey-algorithm">Berlekamp-Massey Algorithm</h2> <p>This is the inverse problem: given a recurrence sequence $\{a_i\}_ {\infty}$ (first $2d$ elements are enough), can we compute its recurrence relation $a_i = \sum_{j=1}^{d} c_ja_{i-j}$?</p> <p>Again we use a polynomial to represent the recurrence relation. Suppose $p_i\in R[ X]$ s.t. $p_i$ works for $a_j$, $0\neq j &lt; i$. Formally, this means $f(X^j\mod p_i) = a_j$ and $f(X^{j-\deg p_i}p_i) = 0$. Let the error</p> $e_i = a_i + \sum_{j=1}^{\deg p_i} [ X^{\deg p_i-j}]p_i\cdot a_{i-j} = f(X^{i-\deg p_i}p_i)$ <ul> <li>If $e_i = 0$, we can keep it $p_{i+1} = p_i$.</li> <li>If $e_i \neq 0$, we need to fix it. <ul> <li>Suppose $p_k\neq p_i$ is <em>some</em> polynomial we have different from $p_i$. Let $p_{i+1} = p_i + \frac{e_i}{e_k}p_kX^{i-k}$.</li> </ul> </li> </ul> <p>Clearly, $f(X^{i-\deg p_{i+1}} p_{i+1})=0$, so $p_{i+1}$ works for $a_{i}$. One can verify it works for all $a_j$, $0\leq j\leq i$.</p> <p>Now we want to minimize the degree of final $p$. The flexibility we have is $p_k$ when fixing the error. Massey argues that we can use the last $p_k$ s.t. $\deg p_{k+1} &gt; \deg p_k$.</p> <h1 id="references">References</h1> <ul> <li>OI Wiki. <a href="https://oi-wiki.org/math/poly/intro/">Polynomials</a>.</li> <li>Wikipedia. <a href="https://en.wikipedia.org/wiki/Lagrange_polynomial">Lagrange polynomial</a>.</li> <li><a href="https://min-25.hatenablog.com/entry/2017/04/10/215046">Factorial modulo Prime Numbers</a>.</li> <li>Chandan Saha. 
Computational Number Theory and Algebra. <a href="https://www.csa.iisc.ac.in/~chandan/courses/CNT/notes/lec6.pdf">Lecture 6</a>.</li> </ul>Xinyu Maxinyuma@ucla.eduThis post introduces some tricks on polynomials widely used in ICPC. I will try to practice algebraic knowledge as well.Chess Picking Problem2020-11-24T00:00:00-08:002020-11-24T00:00:00-08:00https://zjkmxy.github.io/posts/2020/11/chess-picking<p>Rephrase the chess picking problem in a more formal way.</p> <h1 id="problem-statement">Problem Statement</h1> <p>Let $M$ be the set of $2n\times 2n$ 0-1 matrices such that</p> <ul> <li>All matrices have the same number of $1$’s</li> <li>Any $n\times n$ submatrix has at least one $1$</li> <li>The number of 1’s, $k$, is the minimal possible value</li> </ul> <p>Calculate $k$ and $|M|$.</p> <h1 id="solution">Solution</h1> <h2 id="minimal-number-of-1s">Minimal Number of 1’s</h2> <p>Let $A = S_{2n}\times S_{2n}$ act on $M$ by reordering the rows and columns. Clearly, this group action is well-defined. Now consider $|M/A|$, the number of different orbits. We can use one $S_{2n}$ to make the diagonal all $1$’s, and still have the ability to reorder the columns.</p> <p>For a matrix in $M$, define a graph $G = (N, E)$, where $N = \{1,\ldots,2n\}$ and there is an edge from $x$ to $y$ iff the entry $(x,y)$ of the matrix is $1$. We may assume there is a self loop on each node. Removing those $2n$ self loops, we will get a simple graph with $k-2n$ edges. Let the second $S_{2n}$ act as relabeling the nodes, so it suffices to consider unlabeled graphs. Now, the submatrix condition becomes that:</p> <ul> <li>For any set of $n$ nodes, there exists an edge pointing out.</li> </ul> <p>And $|M/A|$ is equal to the number of such simple graphs, up to isomorphism.</p> <p>Clearly, $n$ edges do not work. But there is a solution with $n+1$ edges: an $(n+1)$-cycle plus $n-1$ isolated nodes.
Thus, $k = 3n+1$.</p> <h2 id="number-of-orbits">Number of Orbits</h2> <p>We can prove this is the only graph.</p> <ul> <li>If there is a cycle of length $c&lt; n+1$, then the remaining $2n-c$ nodes will have only $n-c+1$ edges. Thus, there exist at least $n-1$ nodes that do not have an outgoing edge. Pick the $c$ nodes from the cycle and $n-c$ nodes with 0 outgoing degree, and we get a cut.</li> <li>If there is no cycle at all, e.g. a forest, then we can simply sort the nodes in topological order and take the last $n$ ones.</li> </ul> <p>Thus, $|M/A| = 1$. A somewhat surprising result.</p> <h2 id="number-of-matrices">Number of Matrices</h2> <p>Since there is only one orbit, $|M| = [A: \mathrm{stab}_{A}(x)]$, where $x$ is a solution.</p> <p>Without loss of generality, let $x$ contain the cycle $1\to 2\to\cdots\to (n+1)$ plus isolated points $(n+2,n+2),\ldots,(2n,2n)$. Consider $|\mathrm{stab}_{A}(x)|$:</p> <ul> <li>We can reorder the rows $n+2,\ldots,2n$ by any permutation, as long as we do the same to those columns: $(n-1)!$</li> <li>For the cycle part, the size of the stabilizer is equal to the number of automorphisms of an <em>unlabeled</em> cycle: $2(n+1)$ <ul> <li>It’s hard to describe why verbally.
But you can have a try: pick any row in the cycle to be the first row, and you’ll find there are exactly 2 ways to do the rest.</li> </ul> </li> </ul> <p>Thus, the total number is $|M| = \frac{(2n)!^2}{2(n-1)!(n+1)}$.</p> <p>We can also directly count this:</p> <ul> <li>For the $n-1$ isolated nodes, we can insert them into any columns &amp; rows: ${2n \choose n-1}^2$ <ul> <li>And then, we can choose the position to place 1 at each row: $(n-1)!$</li> </ul> </li> <li>For the $n+1$ cycle, we can shuffle the rows as a free circular permutation: $\frac{1}{2}n!$ <ul> <li>And then each row is different so we can shuffle them: $(n+1)!$</li> </ul> </li> </ul> <p>Thus, the total number is $|M| = \frac{1}{2}(n-1)!n!(n+1)!{2n \choose n-1}^2$.</p> <h1 id="ps-complexity">PS: Complexity</h1> <p>If the answer is $|M|$ modulo $p$, then we have a faster algorithm $O(\sqrt{n}\log n)$ via multipoint evaluation. See <a href="https://min-25.hatenablog.com/entry/2017/04/10/215046">here</a>.</p> <h1 id="reference">Reference</h1> <ul> <li><a href="https://brooksj.com/2019/07/24/%E5%A4%9A%E9%A1%B9%E5%BC%8F%E7%9B%B8%E5%85%B3%E7%AE%97%E6%B3%95%E9%9B%86%E6%88%90/">Polynomial Algorithms</a></li> <li><a href="https://koba-e964.hatenablog.com/entry/2019/05/22/020912">Factorial mod Prime in Rust</a></li> </ul>Xinyu Maxinyuma@ucla.eduRephrase the chess picking problem in a more formal way.
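<p>As a sanity check on the chess picking result, here is a minimal Python sketch (not part of the original solution) that enumerates all $2n\times 2n$ 0-1 matrices for tiny $n$ and compares the outcome with both closed forms for $|M|$. The names <code>brute_force</code>, <code>count_by_stabilizer</code> and <code>count_directly</code> are made up for this illustration, and the enumeration is only feasible for $n\leq 2$.</p>

```python
from itertools import combinations
from math import comb, factorial


def submatrix_masks(n):
    """Bitmasks of the cells of every n x n submatrix of a 2n x 2n matrix.

    A submatrix is any choice of n rows and n columns (not necessarily
    contiguous). Cell (r, c) is stored as bit r * 2n + c.
    """
    size = 2 * n
    masks = []
    for rows in combinations(range(size), n):
        for cols in combinations(range(size), n):
            mask = 0
            for r in rows:
                for c in cols:
                    mask |= 1 << (r * size + c)
            masks.append(mask)
    return masks


def brute_force(n):
    """Return (k, |M|): the minimal number of 1's and how many matrices attain it."""
    size = 2 * n
    masks = submatrix_masks(n)
    best_k, count = None, 0
    for matrix in range(1 << (size * size)):
        # Valid iff every n x n submatrix contains at least one 1.
        if all(matrix & mask for mask in masks):
            k = bin(matrix).count("1")
            if best_k is None or k < best_k:
                best_k, count = k, 1
            elif k == best_k:
                count += 1
    return best_k, count


def count_by_stabilizer(n):
    """|M| = (2n)!^2 / (2 (n-1)! (n+1)), from the orbit-stabilizer argument."""
    return factorial(2 * n) ** 2 // (2 * factorial(n - 1) * (n + 1))


def count_directly(n):
    """|M| = (1/2) (n-1)! n! (n+1)! C(2n, n-1)^2, from the direct count."""
    return (factorial(n - 1) * factorial(n) * factorial(n + 1)
            * comb(2 * n, n - 1) ** 2) // 2


if __name__ == "__main__":
    for n in (1, 2):
        print(n, brute_force(n), count_by_stabilizer(n), count_directly(n))
```

<p>For $n = 2$ the enumeration covers $2^{16}$ matrices and should report $k = 7 = 3n+1$ with $96$ matrices, agreeing with both formulas. Larger $n$ would need a smarter search, since the number of matrices grows as $2^{4n^2}$.</p>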