Final Project Ideas
You are welcome to do any project you like — the lists below are just suggestions. If you are unsure whether a project idea works, you can email any of the course staff.
Easy Projects
Greedy Algorithms & Correctness
Formalize a classic greedy algorithm (e.g., Dijkstra’s algorithm for shortest paths or the greedy set cover algorithm) and prove its correctness using invariants. If choosing set cover, prove the logarithmic approximation guarantee.
Varshamov–Tenengolts (VT) Codes
Formalize VT codes, prove their single-deletion correction property, and show that their redundancy is asymptotically optimal. This involves primarily combinatorial and number-theoretic arguments.
Asymptotic Calculation Simplification Tactic
Develop a Lean 4 tactic to automatically simplify asymptotic expressions (e.g., $O$ and $\Theta$ bounds). This could include normalization rules, dominance comparisons, and the simplification of recurrence-style expressions.
Karp’s NP-Complete Reductions
Formalize a small collection of classical NP-completeness reductions (e.g., SAT $\to$ 3SAT $\to$ Clique). Emphasize the correctness of the reductions rather than building a full formalization of polynomial time complexity.
Johnson-Lindenstrauss Lemma
Formalize the lemma demonstrating that a set of points in a high-dimensional space can be embedded into a space of much lower dimension while nearly preserving distances. Requires formalizing probability bounds like Chernoff or Hoeffding bounds.
Medium Projects
Discrete Brascamp-Lieb Inequality
Formalize the discrete Brascamp-Lieb inequality, which generalizes classical inequalities like Hölder’s and Loomis-Whitney. Involves working with entropy or information-theoretic inequalities over discrete domains.
Karger Cut Counting Bound
Formalize bounds on the number of near-minimum cuts in a graph, which underlies Karger’s randomized min-cut algorithm. This project involves combinatorics and probabilistic reasoning.
Decision Tree Measures
Define deterministic, randomized, and certificate complexity for Boolean functions. Prove basic inequalities between these measures and analyze small example functions.
Query Complexity
Build a formalized framework for query complexity models. Prove relationships such as degree lower bounds or the separations between deterministic, randomized, and nondeterministic query complexity.
Communication Complexity
Formalize communication protocols and rectangle partitions. Prove lower bounds for specific functions using techniques such as fooling sets or rank arguments. Attempt to define randomized and nondeterministic communication complexity.
Streaming Algorithms
Formalize frequency moment estimation in the streaming model, such as the Alon-Matias-Szegedy (AMS) sketch or Count Sketch. Prove their approximation guarantees and error bounds using probabilistic analysis.
BLR Linearity Test
Formalize the Blum–Luby–Rubinfeld linearity test and prove its soundness using Fourier analysis on the Boolean cube. Connect the testing guarantees to the distance from linear functions.
KKL Theorem (Fourier Analysis)
Formalize the Kahn-Kalai-Linial (KKL) theorem, a foundational result stating that every Boolean function has a variable with high influence. Requires setting up the Bonami-Beckner hypercontractivity inequality on the Boolean cube.
BCH Codes Construction
Construct BCH codes using finite fields and minimal polynomials, and prove their designed distance guarantee. Requires foundational development in algebraic coding theory.
Expander Codes
Define expander graphs and construct expander codes. Prove linear distance and the correctness of a simple decoding algorithm based on expansion properties.
Dvir’s Kakeya Set Lower Bound (Solo-friendly)
Formalize Dvir’s polynomial method proof, which establishes a lower bound on finite-field Kakeya sets. Involves polynomial interpolation and combinatorial geometry.
VC Theorem
Formalize VC dimension and shattering, and prove the Sauer–Shelah lemma and the VC generalization bound. Combines combinatorics with learning theory.
Singleton-Type Bound for Locally Recoverable Codes
Formalize locally recoverable codes (LRCs) and prove a Singleton-type bound relating length, dimension, minimum distance, and locality parameter.
Tamo-Barg Construction of LRCs
Formalize the Tamo-Barg construction of optimal locally recoverable codes using polynomial evaluation over finite fields. Prove that this construction achieves the Singleton-type bound for LRCs.
Hard Projects
Proof of Correctness of AKS Primality Algorithm
Formalize the algebraic correctness of the Agrawal-Kayal-Saxena deterministic primality test. Prove the core congruence properties in polynomial rings over finite fields without necessarily formalizing the polynomial-time computational complexity.
Sensitivity Conjecture (Huang’s Proof)
Formalize Huang’s proof resolving the sensitivity conjecture using the spectral properties of subgraphs of the Boolean cube. This should include the reduction from the sensitivity conjecture to the hypercube question (bounding the maximum degree of induced subgraphs of the Boolean cube), as well as the spectral argument that resolves it. Requires the development of eigenvalue arguments and linear algebra machinery.
Evasiveness of Monotone Graph Properties
Formalize the result that every monotone graph property is evasive (using topological or group-action arguments). This will require substantial combinatorial or topological machinery in Lean.
Schaefer’s Dichotomy Theorem
Formalize the dichotomy theorem for Boolean CSPs, proving hardness lower bounds. Use gadget reductions to avoid formally defining polynomial time (skipping the algorithmic tractability side). Requires modeling constraint languages.
Slice Rank and Capset Bounds
Formalize the slice rank method and apply it to bound cap sets in $\mathbb{F}_3^n$. Involves tensor methods and polynomial techniques.
MRRW Bound (Navon–Samorodnitsky Proof)
Formalize the McEliece–Rodemich–Rumsey–Welch linear programming bound using Fourier-analytic techniques in coding theory. Involves inequalities, orthogonality, and harmonic analysis on Hamming space.
Spectral Graph Theory & Cheeger’s Inequality
Formalize the discrete version of Cheeger’s inequality, relating the edge expansion (or conductance) of a graph to the second smallest eigenvalue of its Laplacian matrix. Requires developing the graph Laplacian and foundational spectral bounds.