Computer-Aided Verification of P/NP Proofs: A Survey and Discussion

We survey a collection of proofs towards equality, inequality, or independence of the relation of P to NP. Since the problem has attracted much attention from experts, amateurs, and those in between, this work is intended as a pointer into directions that enable a “self-assessment” of ideas laid out by people interested in the problem. To this end, we identify the most popular approaches to proving equality, inequality, or independence. Since none of the attempts in the latter category appears to follow the necessary proof strategy, we devote a section to an intuitive outline of how independence proofs would work. In the other cases of proving equality or inequality, known barriers like (affine) relativization, algebrization, and others are to be avoided. The most important and powerful technique available in this regard is a formalization of arguments in automated proof assistants, which allows an objective self-check of a proof before presenting it to the scientific community.


I. INTRODUCTION
Among the most famous open questions of computer science is whether the complexity classes P and NP are equal, or whether this relation is provable at all (say, within Zermelo-Fraenkel set theory). Given a decision problem whose size is measured by an integer n, we may informally think of P as the set of all decision problems that are solvable in a number of steps that is at most some polynomial in n, whereas NP only asks for the verifiability of a given answer in a polynomial number of steps (depending on n), based on a certificate string whose size is likewise at most polynomial in n. More formally, let Σ be a finite alphabet, and let L ⊆ Σ* be a formal language. For a given word w ∈ Σ*, let the problem be the decision about whether or not w ∈ L holds. In that context, a language L is in the complexity class P if and only if there is an algorithm A that outputs A(w) = 1 ⇐⇒ w ∈ L (and zero otherwise) after at most p(|w|) many steps, where p is some polynomial (that depends only on L) and |w| is the number of symbols in w. It is important to note that A takes only w and no other input. The class NP is characterized by allowing algorithm A to take a limited amount of auxiliary information for the decision. Formally, let q be another polynomial, and let the algorithm take two strings w, x and output A(w, x) ∈ {0, 1} such that w ∈ L holds if and only if there exists a certificate x with |x| ≤ q(|w|) for which A(w, x) = 1. Herein, x may explicitly depend on w. Other than that, the same constraint on the running time of A holds, i.e., A(w, x) must terminate with 0 or 1 after at most p(|w|) many steps,¹ where the polynomial p again depends on L (only).

The associate editor coordinating the review of this manuscript and approving it for publication was Byung-Gyu Kim.
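To make the roles of w and the certificate x concrete, consider CNF satisfiability (SAT), a language in NP: w is a formula and x is a truth assignment. The minimal sketch below (the formula encoding and the names verify and decide_by_search are illustrative assumptions) implements a linear-time verifier A(w, x) and a decider that simply tries all certificates. The latter is correct but needs up to 2^n verifier calls for n variables, which is exactly the gap between verifying and deciding that the P-vs-NP question is about.

```python
from itertools import product

# A CNF formula w is a list of clauses; a clause is a list of non-zero ints
# (DIMACS-style convention: 3 stands for x3, -3 for "not x3").
CNF = list[list[int]]


def verify(formula: CNF, certificate: dict[int, bool]) -> bool:
    """Verifier A(w, x): w is the formula, x a truth assignment.

    Runs in time linear in the size of the formula; variables missing from
    the certificate are treated as False.
    """
    return all(
        any(certificate.get(abs(lit), False) == (lit > 0) for lit in clause)
        for clause in formula
    )


def decide_by_search(formula: CNF) -> bool:
    """Decides "w in L?" by trying every certificate: correct, but it calls
    verify up to 2^n times for n variables, i.e., it is not polynomial."""
    variables = sorted({abs(lit) for clause in formula for lit in clause})
    for values in product((False, True), repeat=len(variables)):
        if verify(formula, dict(zip(variables, values))):
            return True
    return False


w = [[1, 2], [-1], [-2, 3]]                     # (x1 or x2) and (not x1) and (not x2 or x3)
print(verify(w, {1: False, 2: True, 3: True}))  # True: this x certifies w in L
print(decide_by_search(w))                      # True
print(decide_by_search([[1], [-1]]))            # False: no certificate exists
```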
Despite its conceptual simplicity, the question of whether the inclusion P ⊆ NP is strict has so far escaped all attempts to settle it. Many domain experts believe that the two classes are distinct [1], but there is also a considerable number of papers that claim equality of the two classes.

A. PAPER SEARCH AND SELECTION METHODOLOGY
This work presents summary statistics about the development of proofs around P-vs-NP towards equality, inequality, or unprovability. It starts from a collection by Woeginger [2] and the survey by Aaronson [3], and extends this list by papers that appeared after these references. We do not provide a full account of each solution, but rather seek an overview of the entirety of attempts that have been made. As Gasarch has eloquently put it [1]: ''The practical impact would come not from the result itself, but from the new ideas needed to achieve it.''
In turn, and despite all heuristic arguments against becoming overly excited about yet another proof of P-vs-NP, the problem has attracted many people with many potentially interesting ideas, but the number of proofs coming up simply exceeds the community's capacity to verify them.
Starting from the aforementioned surveys, we extended the list by querying the following digital libraries: for the keywords ''P-vs-NP'', ''Cook's conjecture'' (with and without the apostrophe), ''P/NP question'', ''P equal to NP'', ''P not equal to NP'', and some slight variations thereof, all using full-phrase search (so that words separated by a space are not treated as independent search terms), we used Boolean connectives between these keywords and ''proof'' as the only second keyword (the keyword ''answer'' delivered too many results unrelated to our goal to be useful). We intentionally did not specify additional keywords like ''equal'', ''not equal'', ''unprovable'', or ''independence'', since we were interested in all these cases, so no restriction based on such keywords appeared necessary. Given the relevance of the topic, the resulting set of papers referring to the issue was then narrowed down by excluding papers that matched one or more of the following criteria:
• Reference to the question as an unproven hypothesis without attempting to answer the question itself (for example, speaking about intractability or the hypothesis that some problem is not solvable in polynomial time unless P is equal to NP), except for other surveys about the same topic.
• References that used P and/or NP separately in their arguments, but did not speak about the relation between the two explicitly, except for known conditions (such as the inclusion of P in NP).
• References that we already knew from the previous lists (see above).
To screen the papers of the existing lists (such as Woeginger's [2]), we queried general-purpose (i.e., not scientifically specialized) search engines to locate papers that were published on personal websites, blogs, and elsewhere. Works that we could not locate in this way were searched for using www.archive.org and with the help of the web links provided in Woeginger's list. This let us retrieve all but a few of the papers that have since disappeared from the web.
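The search-and-screening procedure above can be summarized in a small sketch. It assumes that the phrases are combined by a Boolean OR and conjoined with ''proof'' (the exact query syntax of the digital libraries is not specified in the text), and it encodes the three exclusion criteria as hypothetical boolean flags that a human reader would set per paper; the names build_query, Hit, and keep are illustrative.

```python
from dataclasses import dataclass

# Phrases from the text; each is searched as a full phrase (quoted).
PHRASES = [
    "P-vs-NP", "Cook's conjecture", "Cooks conjecture",
    "P/NP question", "P equal to NP", "P not equal to NP",
]


def build_query(phrases: list[str], second_keyword: str = "proof") -> str:
    """Assumed query shape: any of the phrases (OR), conjoined with "proof"."""
    alternatives = " OR ".join(f'"{p}"' for p in phrases)
    return f'({alternatives}) AND "{second_keyword}"'


@dataclass
class Hit:
    """Hypothetical per-paper screening flags, set by a human reader."""
    title: str
    attempts_answer: bool    # tries to settle P-vs-NP itself
    is_survey: bool          # surveys on the topic are kept (criterion 1)
    relates_p_and_np: bool   # discusses the relation, not P or NP in isolation
    already_listed: bool     # already in Woeginger's or Aaronson's list


def keep(hit: Hit) -> bool:
    """Apply the three exclusion criteria from the text."""
    if not (hit.attempts_answer or hit.is_survey):
        return False             # criterion 1: question only mentioned as a hypothesis
    if not hit.relates_p_and_np:
        return False             # criterion 2: P and/or NP used only separately
    return not hit.already_listed  # criterion 3: known from previous lists


print(build_query(PHRASES[:2]))
# ("P-vs-NP" OR "Cook's conjecture") AND "proof"
```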
As of the time of writing, we have collected a total of 126 papers dealing with how P relates to NP, which can roughly be divided into three classes:²
• proofs that P = NP, proposed in a total of 67 papers,
• proofs that P ≠ NP, proposed in a total of 55 papers,
• and proofs that the problem itself is unsolvable, as only 4 papers claim.
This is in considerable contrast to the (third) poll conducted by Gasarch [1], which found that ≈ 88% of respondents believe P ≠ NP, (only) 12% believe P = NP, and nobody believed in independence (presumably from Zermelo-Fraenkel set theory with the axiom of choice (ZFC)) or had no opinion (unlike earlier polls about the same question, also conducted by W. Gasarch³).
Taking these numbers as statistics about the community's belief about the answer to the question, however, could be misleading, since many proof proposals have been contributed, but some lack the required rigor in their definitions and reasoning.
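The contrast between the claims and the poll is easy to quantify. The sketch below merely recomputes the shares of the 126 collected papers from the counts given above and places them next to the poll percentages as quoted (the poll figure for P ≠ NP is approximate); no new data is involved.

```python
# Counts of proof claims reported above, per claimed outcome.
claims = {"P = NP": 67, "P != NP": 55, "independent": 4}
# Shares of respondents in Gasarch's third poll, as quoted above (percent).
poll = {"P = NP": 12, "P != NP": 88, "independent": 0}

total = sum(claims.values())
shares = {k: round(100 * v / total, 1) for k, v in claims.items()}

print(total)  # 126
for outcome in claims:
    print(f"{outcome:12} claims {shares[outcome]:5.1f}%   poll {poll[outcome]:3d}%")
```

The claims split roughly 53% / 44% / 3%, against roughly 12% / 88% / 0% in the poll.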
Nonetheless, the problem remains important and outstanding; however, owing to its apparent simplicity, it may be in danger of going unresolved forever if we think about it as a ''queuing problem'': if λ > 0 papers about P-vs-NP come up per time unit (the unit is left open; a different choice merely rescales λ and µ accordingly), and 0 < µ < λ of them undergo review to identify flaws, then the number of unverified proofs will grow towards infinity. Hence, even if the solution is found someday, there is a decent chance for it to vanish in the vast amount of competing work on the topic. To get some numbers, let us break the above figures down in more detail, comparing the average number of papers appearing with the average number of papers being verified. Figure 1 displays the counts, excluding (sometimes frequent) updates or revisions of papers (and excluding work that escaped our eyes⁴).
We believe that all proof attempts deserve scientific attention and review in the first place, and should not be rejected merely because they appear to come from an ''overly ambitious amateur''. History has many examples of groundbreaking accomplishments made by people from all professions or institutions and at all levels of education. However, time for peer review is usually scarce, and resources need to be dedicated primarily to the most promising proof proposals among the many. To this end, we believe that putting proofs into automated proof assistants can be a first step of ''self-assessment'' to reach a baseline level of scientific rigor before approaching the community for peer review. This may save time in professional peer review and help avoid flaws in the construction of proofs at an early stage. The ambition to ultimately answer P-vs-NP must not be under-appreciated, which makes empowering the community a desirable goal (and motivates this work).

II. TOO MANY PROOFS MAY LEAVE A PROBLEM UNSOLVED TOO
The number of papers having received peer review (or at least some form of scrutiny) is considerably smaller, but it is worth mentioning that some papers indeed went through scientific publication processes. Over the range of years from 1986 until 2023, only 28 papers seem to have received (documented) consideration by the scientific community. Taking an average of the counts in Figure 1, we find λ > 5 papers appearing per year on average. In the 23 years that the collection covers (which excludes the exceptionally quiet period from 1987 until 1995), about 28 papers have received some sort of attention from the scientific community; Table 1 gives an overview of these, with their status as far as we could determine it. This means that µ ≈ 28 papers/23 years ≈ 1.22 papers per year, i.e., fewer than 2 papers are reviewed per year. If we view the incoming new proofs as filling a queue that requires a peer review to handle each paper, the capacity of the scientific peer-review system will necessarily lag behind. Consequently, and based on well-known facts from queuing theory (a queue is unstable whenever λ > µ), the expected number of papers queued for verification will diverge towards infinity in the long run.
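The queuing argument can be made concrete with a deterministic (fluid) approximation, which is a simplification rather than a stochastic queuing model: each year λ new proofs arrive and at most µ are reviewed. The sketch below plugs in λ = 5 (the lower bound reported above) and µ = 28/23; the function name backlog_after is illustrative. Under these assumptions, the backlog grows by λ − µ ≈ 3.78 proofs per year, i.e., 23 · 5 − 28 = 87 unreviewed proofs after 23 years in this model.

```python
def backlog_after(years: int, arrivals_per_year: float, reviews_per_year: float) -> float:
    """Deterministic (fluid) approximation of the review backlog.

    Each year, `arrivals_per_year` new proofs join the queue and at most
    `reviews_per_year` leave it. A stochastic queue with the same rates
    (arrival rate above service rate) also grows without bound; this sketch
    only shows the linear trend.
    """
    backlog = 0.0
    for _ in range(years):
        backlog = max(0.0, backlog + arrivals_per_year - reviews_per_year)
    return backlog


LAMBDA = 5.0   # arrival rate: the text reports lambda > 5 papers per year
MU = 28 / 23   # review rate estimated in the text, about 1.22 papers per year

print(f"mu = {MU:.2f}")  # mu = 1.22
for years in (10, 23, 50):
    print(years, round(backlog_after(years, LAMBDA, MU), 1))
```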
It might be for this (among other) reasons that some venues like the ACM Journal of Computing have adopted a designated ''P/NP policy'' [44], which, once an author submits a proof claim for review, explicitly forbids any further submission about P-vs-NP by the same author for two years (simply to avoid a ''paper overflow''). In light of the above numbers, this appears reasonable, and it also helps to stay away from known routes to failure. Specifically, some ''meta-results'' are known barriers that a serious attempt must avoid, such as relativization, algebrization, natural proofs, and others that we discuss below. These allow us to cross many candidate proofs off the list in a review, but still, a vast amount of work remains unverified as of today, leaving open the possibility that a correct argument and answer may already have been found.
For the particular P-vs-NP problem, a considerable body of research deals with the identification of ''dead-ends'' by classifying certain arguments or proof techniques as incapable of settling the question in the first place. These so-called barriers are primarily used as heuristics for a quick judgment about whether or not a new proof attempt deserves deeper inspection. In light of a critical discussion about the use of such barriers that follows in Section IV-A, we made our screening of proofs agnostic of these heuristic conditions and instead focused on the possibility of formalizing proofs in automated proof assistants like Isabelle/HOL or Coq, since we were interested in exploring how proof assistants can help with the independent verification of the (too many) proofs around P-vs-NP.

A. THE POSSIBILITY TO ''OBJECTIVELY SELF-REVIEW'' ONE'S PAPER
Given the collection of proofs, we believe that the scientific community is simply overwhelmed by the sheer flood of papers coming out. So, why not empower those interested in the problem to run their own objective and independent reviews?
Clearly, a human reader who is also the author is biased, but formal proof assistants like Isabelle/HOL [45] or Coq [46] may help here.
While it is prestigious to present a mathematical proof to the community, the equally important task of independent verification, today almost always done by peer review, is far less ''attractive'' and offers domain experts little incentive to invest much time without any reward. Somewhat ironically, the P-vs-NP question is again special in this regard, since the intuition behind NP mentioned above is that it captures all problems whose given solutions are efficiently verifiable. So, the question is whether a proof about P and NP can itself be verified in reasonable (e.g., polynomial) time by humans or by a machine. A machine verification has the appeal of being objective by construction.
Of course, objectivity only holds to the extent that the human accurately maps the human-made proof into a machine-readable form that allows automated verification. However, with the goal being relief for domain experts from the burden of reviewing P-vs-NP proofs, the author of such a proof has a natural interest in representing the proof accurately to a machine, which can then do an independent verification. This idea of handing the verification back to the author, while obliging the author to provide a machine-verifiable proof, has been investigated in a research project on which this paper partly reports.

B. FORMALIZING PROOFS
''Proof assistants'', hereafter also called interactive theorem provers, are software tools that aid in the construction of mathematical proofs. Unlike automated theorem provers, which try to prove theorems without further instruction from humans, proof assistants show the user, who acts like a programmer applying proof techniques or tactics, which claims or statements remain open as sub-goals, until the list of open goals becomes empty, which finishes the overall proof. These interactive assistants may also opportunistically call automated provers to propose proof strategies or to prove small sub-goals directly.
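As a minimal illustration of the goal-driven workflow just described, the following Lean 4 snippet proves a trivial propositional fact (Lean is used here only for compactness; an Isabelle/Isar proof of the same fact would look different, and the theorem name and_swap is arbitrary). The comments indicate how each tactic changes the list of open goals.

```lean
-- Each tactic transforms the current list of open goals;
-- the proof is complete once no goals remain.
theorem and_swap (p q : Prop) (h : p ∧ q) : q ∧ p := by
  constructor   -- the goal `q ∧ p` splits into two sub-goals: `q` and `p`
  · exact h.2   -- closes the first sub-goal `q`
  · exact h.1   -- closes the second sub-goal `p`; no goals remain
```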
Most modern proof assistants base their foundation on some variant of typed lambda calculus rather than ZFC, as these appear to provide a much more suitable environment for automated solvers. There exist projects that formalize mathematics in ZFC [47], but this is the exception rather than the rule.
Our choice of proof assistant for this project has been Isabelle/HOL [45], as it aims to provide a logic and language similar to that typically seen in publications, in addition to powerful tooling.We believe it to be more pleasant and intuitive to work with than systems like Coq [46] and Lean, which provide powerful dependent type theories and in some cases enable much more concise and elegant definitions, but lag behind in readability and automation.

C. FORMALIZATION FOR AN ''OBJECTIVE SELF-REVIEW''
The motivation to look into proof assistants for verifying arguments has various appealing aspects, such as:
• It is not possible to omit implicit assumptions, since the proof assistant will throw errors if an attempt is made to use an assumption that was not stated in the initial hypotheses, or if some sub-goal is left unproven (for instance, consider an induction whose base case is not established, or an inequality that may be intuitive but is not proven rigorously). And though there exist debugging commands that allow developers to skip the proof of a statement, proof documents containing them are not considered valid, and it is easy to determine whether any such command was used.⁵
• Especially for complexity-theoretic arguments, it is easy to overlook matters like the explicit construction of the Turing machines (TMs) involved if a proof is only done on paper; the proof assistant's requirement to present all arguments fully formally, while leading to significant increases in development time, naturally avoids such pitfalls.
• The computer has no personal interests or bias towards or against certain affiliations or backgrounds and cannot be convinced by authority, reputation, or other subjective (human) factors. This enables anyone (who is willing to learn to work with a proof assistant) to get fast, direct, and objective feedback.
The last two points imply a degree of objectivity when somebody formalizes their own proof about P-vs-NP: even though the person may strongly wish for the work to be correct, the proof assistant will mercilessly point out any error. Even if the author then manipulates the proof's code until the proof assistant accepts it, this will end up either (i) adding tweaks like skipped (sub-)proofs when compared to the paper version, which clearly mark weak spots in the line of argument, or (ii) changing the proof entirely, such that it no longer corresponds to the paper version. The second case is not necessarily problematic, since as long as the deviating proof is correct, the intended result has still been proven. The consistent translation problem then remains, only in the converse direction, since we then need to convert the machine-readable and -checkable proof back into a text version that a human reader can understand and verify. The consistency issue, however, is vital in terms of definitions, axioms, and basic assumptions. Definitions of concepts on paper may differ, even slightly, from the way they are formalized, i.e., ''programmed'', in the proof assistant. Hence, the equivalence of concepts on paper and their counterparts (of the same name) in the proof document needs to be verified manually by humans. Otherwise, the truth asserted by the proof assistant may say little about the correctness of the corresponding proof on paper.
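Detecting such skipped (sub-)proofs can start with a plain text search. The sketch below flags lines containing the skip keywords of three assistants (`sorry` in Isabelle and Lean, the `admit` tactic and `Admitted` command in Coq); the keyword table and the function name find_skipped_proofs are illustrative assumptions. It is deliberately naive and may over-report (e.g., a keyword inside a comment); the assistants' own facilities, such as Lean's `#print axioms` or Coq's `Print Assumptions`, remain the authoritative check.

```python
import re

# Proof-skipping keywords (illustrative, not exhaustive): Isabelle and Lean
# use `sorry`; Coq uses the `admit` tactic and the `Admitted` command.
SKIP_PATTERNS = {
    "isabelle": [r"\bsorry\b"],
    "lean": [r"\bsorry\b"],
    "coq": [r"\badmit\b", r"\bAdmitted\b"],
}


def find_skipped_proofs(source: str, assistant: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for every line with a skip keyword.

    Naive: comments and string literals are not stripped, so a keyword
    mentioned in a comment is reported too (false positives are possible).
    """
    patterns = [re.compile(p) for p in SKIP_PATTERNS[assistant]]
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(source.splitlines(), start=1)
        if any(p.search(line) for p in patterns)
    ]


theory = """lemma helper: "(x::nat) + 0 = x"
  by simp

lemma main_claim: "hypothetical_statement"
  sorry
"""
print(find_skipped_proofs(theory, "isabelle"))  # [(5, 'sorry')]
```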
At least for the P-vs-NP question, the problem of how accurately a textbook proof is mapped into its formalized version is crucial in terms of how the underlying definitions are implemented.Provided that the theoretical and implemented concepts (definitions, axioms, etc.) are verifiably consistent, the match between the argument flow on paper and in the proof assistant becomes secondary, since if the formalized proof is correct, we do have a correct proof.
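To illustrate how a formal definition can deviate from its paper counterpart while still ''looking right'', consider a hypothetical Lean 4 formalization of ''f is polynomially bounded'' (the names PolyBounded, PolyBoundedSwapped, and swapped_is_vacuous are illustrative). Swapping the two quantifiers lets the constants depend on n, and the resulting definition holds for every function, including f(n) = 2^n. Any ''polynomial-time'' result proven against it would be vacuous, although the proof assistant accepts it without complaint.

```lean
-- Intended definition: one bound c * n^k + c that works for all n.
def PolyBounded (f : Nat → Nat) : Prop :=
  ∃ c k : Nat, ∀ n : Nat, f n ≤ c * n ^ k + c

-- Subtly wrong: quantifiers swapped, so c and k may depend on n.
def PolyBoundedSwapped (f : Nat → Nat) : Prop :=
  ∀ n : Nat, ∃ c k : Nat, f n ≤ c * n ^ k + c

-- The swapped definition is satisfied by every f: choose c := f n, k := 0.
theorem swapped_is_vacuous (f : Nat → Nat) : PolyBoundedSwapped f := by
  intro n
  exact ⟨f n, 0, Nat.le_add_left _ _⟩
```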

III. OVERVIEW OF PROOF ATTEMPTS
Many arguments for equality are based on seemingly polynomial-time algorithms for some NP-complete problem, whose existence is known to imply equality of the two classes. Arguments for inequality have various roots: some identify certain properties that all problems in P must have, but which some problem in NP lacks (thus concluding inequality); others extend the known proper inclusions between complexity classes, or lower bounds on the complexity required to solve certain problems, towards a proper inclusion of P in NP (the inclusion P ⊆ NP itself is trivially true).
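A cheap first self-check for a claimed polynomial-time algorithm for an NP-complete problem is to compare it against an exponential brute-force decider on many small instances. The sketch below does this for CNF-SAT. Here, majority_heuristic is a hypothetical stand-in for a claimed algorithm (it is not taken from any surveyed paper), and the instance generator enumerates all formulas with up to three clauses over two variables. Agreement on such tests proves nothing, whereas a single disagreement refutes correctness; a polynomial running time still has to be argued separately.

```python
from itertools import combinations, combinations_with_replacement, product

CNF = list[list[int]]  # clauses of non-zero ints; -3 means "not x3"


def satisfied(formula: CNF, assignment: dict[int, bool]) -> bool:
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in formula)


def brute_force_sat(formula: CNF) -> bool:
    """Exponential-time reference decider: try every assignment."""
    variables = sorted({abs(lit) for clause in formula for lit in clause})
    return any(satisfied(formula, dict(zip(variables, values)))
               for values in product((False, True), repeat=len(variables)))


def majority_heuristic(formula: CNF) -> bool:
    """Hypothetical 'polynomial-time SAT algorithm' under test: give each
    variable the polarity it occurs with most often, then evaluate once."""
    score: dict[int, int] = {}
    for clause in formula:
        for lit in clause:
            score[abs(lit)] = score.get(abs(lit), 0) + (1 if lit > 0 else -1)
    return satisfied(formula, {v: s >= 0 for v, s in score.items()})


def small_instances(num_vars: int = 2, max_clauses: int = 3):
    """All CNFs with up to max_clauses clauses over num_vars variables."""
    literals = [lit for x in range(1, num_vars + 1) for lit in (x, -x)]
    clauses = [list(c) for k in range(1, num_vars + 1)
               for c in combinations(literals, k)
               if len({abs(lit) for lit in c}) == k]  # no variable twice per clause
    for m in range(1, max_clauses + 1):
        for formula in combinations_with_replacement(clauses, m):
            yield list(formula)


def find_counterexample(candidate, instances):
    """Return the first instance on which candidate disagrees with brute force."""
    for formula in instances:
        if candidate(formula) != brute_force_sat(formula):
            return formula
    return None


print(find_counterexample(majority_heuristic, small_instances()))
```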
Arguments for independence, i.e., unprovability of either relation (= or ≠), would require (i) a choice of an axiomatic system relative to which independence is concluded, and (ii) two models of that axiom system, one in which P = NP, and another one in which P ≠ NP. At least the (four) papers [48], [49], [50], [51] mentioned in the above summary statistics do not follow this general line of argument, and applications of rigorous techniques (see Section III-D) seem not to have been tried so far, except to study the barriers [52], [53].

A. FORMALIZED PROOFS AROUND P-VS-NP
A few of the proofs we found have been fully or partly formalized, such as [54] (partly, but verified to the extent that it was formalized, and revised to fix mistakes that the formalization revealed), or [55] and [56] (in both cases, the formalization disclosed flaws).
Probably the most complete formalization of any P ≠ NP proof attempt is due to René Thiemann, who formalized the paper ''On P Versus NP'' by Lev Gordeev using Isabelle [57]. In his paper, Gordeev claims to have shown that no circuit of polynomial size can solve CLIQUE. Gordeev's approach is to generalize ''Razborov's theorem'', which proves this fact for monotone circuits (i.e., those composed only of ∨ and ∧ gates), to non-monotone circuits, which can also contain negation. The attempts to verify the paper in Isabelle uncovered problems with the proof. As a byproduct, a computer-verified Isabelle formalization of Razborov's theorem was published [58].
Among the methods to tackle the problem, linear programming turned out to be popular and is used in, e.g., [9], [25], [26], [86], [87], [96], [100], all of which formulate an NP-complete problem as a linear program. The necessary polynomial size of the resulting linear program contrasts with the results of Yannakakis [10], who proved that any symmetric linear programming formulation of the traveling salesperson problem must be of exponential size. Further methods include modifications of the TMs themselves, such as implementing an oracle query mechanism efficiently [74], converting nondeterminism into determinism while preserving polynomial complexities [100], [101], proving the equality of P and NP to a third class that is newly introduced for this purpose [73], [102], or using category theory [85].
Interestingly, some works do not allude to P-vs-NP at all, except in the title, such as [103]: this work states the equality in its subtitle, but considers a linear optimization problem without any integer constraints and with no obvious relation to either P or NP.

D. APPROACHES TO PROVE UNPROVABILITY (INDEPENDENCE)
Proving that something is not provable in some given axiomatic system is called an independence proof. This is a relatively rare kind of argument in mathematics, but a few notable examples do exist, such as (mentioning only a small sample here):
• The continuum hypothesis, to which half of the answer was contributed by Gödel [145], and the other half is due to Cohen [146], [147],
• The independence of the axiom of choice from the remaining Zermelo-Fraenkel axioms of set theory, which was also proven by Paul Cohen,
• The independence of the parallel axiom from Euclid's other axioms of plane geometry, which was proven by Eugenio Beltrami.
A general proof of independence is a model-theoretic argument formulated relative to a certain set of axioms, for example, Zermelo-Fraenkel (ZF), perhaps including the axiom of choice (ZFC). If ψ is a logical formula of which we seek to prove that ψ can be neither proven nor refuted from a set of axioms A, we need to prove two things:
1) the existence of a model of A in which ψ is true,
2) and the existence of (another) model of A in which ψ is false,
so that in total, since anything provable from A is true in every model of A, we can conclude that there is no way to logically deduce ψ from A, nor is there a way to refute ψ by proving that ¬ψ is implied by A.
Hence, like a proof of logical equivalence, an independence proof has ''two directions'' to show. The argument is easiest to outline by letting ψ be the parallel postulate, stating that ψ: given any straight line and a point not on it, there ''exists one and only one straight line which passes'' through that point and never intersects the first line, no matter how far they are extended [148]. A model in which ψ is true is the Euclidean plane, which is intuitively easy to believe (yet considerably harder to prove formally). A model in which ψ is false is the geometry on the surface of a sphere, taking great circles as the ''straight lines'': any two great circles intersect, so through a point not on a given line, there is no line at all that avoids the given one. (Strictly speaking, antipodal points must be identified so that two distinct points always determine a unique line, which yields the elliptic plane; Beltrami's models of hyperbolic geometry, which admit infinitely many such parallels, serve the same purpose.) The other four of Euclid's postulates, suitably interpreted, remain true in both models. Hence, ψ cannot be logically derived from the other axioms.
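The two-model strategy can be checked mechanically in a much smaller setting. Take A to be the group axioms and ψ the commutativity law a·b = b·a. The integers modulo 3 under addition form a group in which ψ holds, and the permutations of three elements under composition form a group in which ψ fails; hence, commutativity can be neither proven nor refuted from the group axioms alone. The brute-force checker below (all names are illustrative) verifies both models. This works only because the models are finite; models of ZFC in which P = NP or P ≠ NP could not be checked this way.

```python
from itertools import permutations, product


def is_group(elements, op) -> bool:
    """Brute-force check of the group axioms on a finite carrier set."""
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False  # not closed under op
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elements, repeat=3)):
        return False  # not associative
    identities = [e for e in elements
                  if all(op(e, a) == a == op(a, e) for a in elements)]
    if not identities:
        return False  # no identity element
    e = identities[0]
    # every element needs a two-sided inverse
    return all(any(op(a, b) == e == op(b, a) for b in elements)
               for a in elements)


def is_commutative(elements, op) -> bool:
    """The formula psi under test: a * b == b * a for all a, b."""
    return all(op(a, b) == op(b, a) for a, b in product(elements, repeat=2))


# Model 1: Z_3, the integers modulo 3 under addition.
Z3 = [0, 1, 2]


def add_mod3(a, b):
    return (a + b) % 3


# Model 2: S_3, the permutations of {0, 1, 2} under composition.
S3 = list(permutations(range(3)))


def compose(p, q):
    return tuple(p[q[i]] for i in range(3))


for name, carrier, op in [("Z_3", Z3, add_mod3), ("S_3", S3, compose)]:
    print(name, is_group(carrier, op), is_commutative(carrier, op))
# Z_3 True True
# S_3 True False
```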
In total, we found only four papers claiming the unprovability of P-vs-NP in any sense. However, none of these works applied the proof strategy of constructing two models, one in which equality holds and one in which inequality holds. Some works [48], [149] argue that an answer is impossible because the problem itself is ill-posed. In light of rigorous definitions of both classes, P and NP, together with a well-formed axiomatic foundation of set theory (such as ZFC), this argument does not hold. The work of [49] refers to ZFC and presents a reformulation of the question similar to that used for relativization, along with a self-assessment, but was also counter-argued soon after by AMS Mathematical Reviews [2]. The work of [50] modified the Peano axioms for the natural numbers towards two different axiomatic systems, one in which equality of the classes holds, and another in which P and NP are not equal. Being a change of the axiomatic system, this does not count as an independence proof. It is, however, worth noting that relativization (discussed later in Section IV-A) can also be interpreted as an extension of the axiomatic system, via the additional oracle assumption, which allows proving equality or inequality, depending on what the extension (e.g., the oracle) looks like. The authors of [68] and [123] claimed to have identified a logical inconsistency in ZFC itself and stated the invalidity of the continuum hypothesis (CH) [123] or of Cook's theorem that satisfiability is NP-complete [68]. Given that CH has already been proven independent of ZFC by Gödel and Cohen, and that Cook's theorem has undergone countless verifications, the far-reaching consequences drawn in [123], including P ≠ NP and the invalidity of many fundamental results of complexity theory despite manifold verifications and proofs of the opposite, make this work interesting to mention, yet invalid from a scientific perspective. Finally, the work of [51] discusses the problem in a largely philosophical manner, unfortunately neither following the required proof strategy of constructing models, nor being internally consistent: the paper title suggests the impossibility of proving how P relates to NP, but the paper itself concludes with the final line that ''nonetheless P ≠ NP''.

13518 VOLUME 12, 2024
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

IV. FINDINGS
When self-verifying a proof by formalizing it in a language like Isabelle/HOL, it is nonetheless necessary to understand which dead-ends need to be avoided. To ease following the upcoming descriptions, we refer to Table 2 for a list of symbols.

A. BARRIERS: (AFFINE) RELATIVIZATION, ALGEBRIZATION AND NATURAL PROOFS
The earliest barrier against separating or equalizing P and NP is based on the concept of oracle TMs: fix any formal language A ⊆ Σ* over a (fixed) alphabet Σ, and define a modified TM with the ability to decide whether w ∈ A in a constant number of steps for any w that it produces on its tape. We call A an oracle, and a Turing machine M endowed with the capability of querying A is an oracle-TM, denoted as $M^A$. Complexity classes are definable in a canonical way by allowing the defining TM to access A, which naturally leads to generalizations of P and NP as $\mathrm{P}^A$ and $\mathrm{NP}^A$. In such worlds, which we call relativized, a famous result is the following:
Theorem 1 (Baker, Gill, and Solovay [150]): There are oracle sets A, B for which $\mathrm{P}^A = \mathrm{NP}^A$ and $\mathrm{P}^B \neq \mathrm{NP}^B$.
This has several consequences, among them:
1) Oracles, as a proof technique, apparently cannot settle the issue between P and NP, since there are oracles that lead to either possible outcome.
2) Given any PROOF towards some (any) relation between P and NP, one may simply rephrase PROOF to speak about oracle-TMs wherever it uses a (normal) TM. If the proof remains intact under such generalizations, in which case it is said to relativize, it will, no matter what it originally claimed about P-vs-NP, be in contradiction with Theorem 1. Hence, the usual conclusion is that any such relativizing proof cannot be effective against the original question.
More generally (without alluding to P-vs-NP), if a result is such that its proof generalizes towards another setting (e.g., a different space, weaker hypotheses, or similar), but the conclusion is provably wrong in the generalized context, then the original proof must be flawed. Such relativization, for example, applies to arguments based on diagonalization. The technique itself does not hinge on (not) using oracles, so it is straightforward to modify well-known results, such as the deterministic time hierarchy theorem (DTHT).
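A toy rendering of an oracle machine $M^A$ may help: the machine below is ordinary code that receives the oracle as a callable and charges one step per query, mirroring the constant-cost oracle access described above. It accepts w if and only if some prefix of w belongs to A, using at most |w| + 1 queries whatever A is; running the same code relative to different oracles is what relativizing a construction means. The names prefix_machine, A, and EMPTY are hypothetical, and all costs other than oracle queries are ignored.

```python
from typing import Callable

Oracle = Callable[[str], bool]  # characteristic function of a language A over {0,1}


def prefix_machine(w: str, oracle: Oracle) -> tuple[bool, int]:
    """Toy oracle machine M^A: accept w iff some prefix of w is in A.

    Returns (accepted, number of oracle queries). Each query is charged one
    step; all other bookkeeping is ignored in this simplified cost model.
    """
    queries = 0
    for i in range(len(w) + 1):
        queries += 1
        if oracle(w[:i]):
            return True, queries
    return False, queries


def A(s: str) -> bool:      # hypothetical oracle language A = {"01"}
    return s == "01"


def EMPTY(s: str) -> bool:  # the empty oracle: every query is answered "no"
    return False


print(prefix_machine("0110", A))      # (True, 3)
print(prefix_machine("0110", EMPTY))  # (False, 5)
```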
Theorem 2: Let t, T : ℕ → ℕ be such that T is time-constructible, $\liminf_{n\to\infty} t(n)\log t(n)/T(n) = 0$, and t(n) ≥ n for all n. Then
$$\mathrm{Dtime}(T) \not\subseteq \mathrm{Dtime}(t). \tag{1}$$
This theorem, whose proof uses diagonalization, can be modified into stating, under the same hypotheses, that
$$\mathrm{Dtime}^A(T) \not\subseteq \mathrm{Dtime}^A(t) \tag{2}$$
for complexity classes that have access to the oracle A, where the oracle A can be any language.
Proofs have therefore been said to ''relativize'' if they go through just as usual under every possible oracle, i.e.,
$$\text{PROOF holds} \;\Longrightarrow\; \forall O:\ \text{PROOF}^{O} \text{ holds}, \tag{3}$$
or, by contraposition,
$$\exists O:\ \text{PROOF}^{O} \text{ fails} \;\Longrightarrow\; \text{PROOF fails}. \tag{4}$$
No matter what $\text{PROOF}^O$ thus concludes about P-vs-NP, it will contradict Theorem 1 and hence be found wrong in (4) by putting O = A or O = B with the oracles from the Baker-Gill-Solovay theorem. Then, (4) tells us to abandon PROOF for this reason. This is how the relativization barrier is usually applied.
In the past, some results were found not to relativize because their reasoning about Turing machines is so specific that the oracle query mechanism (e.g., modeled by a designated query state ''?'') cannot be accommodated in the proof without substantially changing the argument. An example of such a result was the Cook-Levin theorem. The discovery of more general barriers, such as algebrization [53], [151], however, showed that these results, too, relativize, only under a modified form of oracle. Technically, the oracle was changed from a set of strings to a family of low-degree polynomials that extend the space of possible queries and enable reasoning with techniques like arithmetization, as introduced in the context of proving the famous equality IP = PSpace. Using the resulting generalized concept of algebraic oracles, we have a sibling to the first part of Theorem 1:
Theorem 3 (Aaronson, Wigderson [151]): There exists an algebraic oracle Ã such that $\mathrm{P}^{\tilde{A}} = \mathrm{NP}^{\tilde{A}}$. As a consequence, any proof of P ≠ NP will require non-algebrizing techniques.
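Algebraic oracles replace a Boolean oracle A by low-degree polynomial extensions Ã over finite fields. One standard choice is the multilinear extension, sketched below for small Boolean functions over the prime field F_p with p = 97 (an arbitrary choice). The extension agrees with the original function on {0,1}^n but can also be queried at non-Boolean points such as (2, 3), which is what arithmetization exploits. The function name multilinear_extension is illustrative, and the brute-force sum takes 2^n terms.

```python
from itertools import product
from typing import Callable, Sequence


def multilinear_extension(f: Callable[[tuple[int, ...]], int], n: int, p: int):
    """Return the multilinear extension of f: {0,1}^n -> {0,1} over F_p:

        f~(x) = sum over b in {0,1}^n of f(b) * prod_i (x_i if b_i = 1 else 1 - x_i)

    computed mod p. It agrees with f on {0,1}^n and has degree <= 1 per variable.
    """
    def f_tilde(x: Sequence[int]) -> int:
        total = 0
        for b in product((0, 1), repeat=n):
            term = f(b) % p
            for xi, bi in zip(x, b):
                term = term * (xi if bi else 1 - xi) % p
            total = (total + term) % p
        return total
    return f_tilde


FIELD_SIZE = 97  # an arbitrary prime
AND = multilinear_extension(lambda b: b[0] & b[1], 2, FIELD_SIZE)
XOR = multilinear_extension(lambda b: b[0] ^ b[1], 2, FIELD_SIZE)

# On Boolean inputs, the extension reproduces the original function ...
assert all(AND(b) == (b[0] & b[1]) for b in product((0, 1), repeat=2))
# ... but it can also be queried at non-Boolean points of F_p^2.
print(AND((2, 3)))  # 6:  x1 * x2
print(XOR((2, 3)))  # 90: x1 + x2 - 2*x1*x2 = -7 = 90 (mod 97)
```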
Further refinements and other barriers (not mentioned in chronological order here) are local checkability [152] and natural proofs [153]. The most recent unified account of relativization and algebrization was presented in [52] and [154], which describe how to integrate the oracle assumption into the axiomatic system from which P-vs-NP is to be analyzed. Specifically, introducing the concept of an affine oracle, they call a proof relativizing if it is a theorem of the axiomatic system extended by ''the oracle assumption'' TheOA,
$$\mathrm{ZFC} + [\,O \text{ is a language}\,] + [\,\mathrm{FP} \text{ equals } (\mathrm{TheFP})^{O}\,] =: \mathrm{TheOA}, \tag{5}$$
in which
• ZFC is the usual Zermelo-Fraenkel axiomatization of set theory including the axiom of choice,
• O is a language in the complexity-theoretic sense, considered as a mapping {0, 1}* → {0, 1} (letting the set be represented by its characteristic function acting on strings),
• FP is a symbol of the signature used to formulate propositions, and is taken to be the class ''TheFP'' of polynomial-time computable functions (defined in the standard way using Turing machines or any equivalent thereof), with explicit oracle access to O.
In the notation of [52], the collection of all theorems implied by the axiomatic system (5) depends on the oracle and is therefore denoted as the relativized complexity theory $\mathrm{CT}(O_0)$ for the fixed oracle language $O_0$. The derived symbol $\mathrm{CT}(*)$ then denotes the relativized complexity theory (against all possible oracles), while $\mathrm{CT}(0)$ is the nonrelativized universe, represented by the empty oracle O(x) = 0 for all x ∈ {0, 1}*, in which classes defined with access to the oracle, such as FP, match their conventional counterparts.
In other words, the axiomatic system is extended by redefining the property of “polynomial-time computability” under the additional capability of evaluating the function O on any input, counting time and space complexity for algorithms that have access to O in their “basic instruction set” (which also naturally settles how the oracle tape is used and whether its use counts towards time or space complexity, an issue raised in [155]). One can then rigorously define a statement to relativize against a specific oracle O_0 if it is a theorem of (5) with O_0 substituted for O [52, Def. 3], or simply to relativize if it is a theorem relative to every language [52, Def. 4]. This view allows one to reproduce various past results from the literature that are known to (not) relativize, and to discover new relativizing and non-relativizing results. The appeal of this approach from first principles (i.e., axioms) is that no change to the usual definitions of complexity theory is required: re-interpreting the symbol FP as using (or not using) an oracle naturally endows all computational mechanisms formulated with the symbol FP and/or O with the power to query the oracle.
This naturally covers the unrelativized world by taking the empty oracle, i.e., a function that returns a constant value (zero) for all arguments, and never requires leaving the unrelativized world, since we can simply re-interpret the symbol FP in the statements to be proven. Relativization is thus modeled as a syntactic change that lets FP change its meaning throughout the entire PROOF, from the unrelativized version (e.g., with an empty oracle) to the oracle-enhanced version with an explicit additional capability to evaluate the function O_0, or equivalently, to query the oracle O_0. Continuing this thought, the definition of, for example, Dtime(t) in terms of the symbol FP yields the familiar concept based on Turing machines when FP is interpreted as TheFP, whereas the identical definition yields Dtime^O(t) when FP is interpreted as (TheFP)^O, the oracle O entering through the interpretation of FP.
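The idea of a fixed algorithm text whose meaning changes with the interpretation of its oracle can be illustrated by a loose programming analogy of our own (it is not the formal, axiomatic treatment of [52]): the algorithm below is written once against an abstract oracle, each oracle evaluation counts as one basic instruction, and passing the empty oracle recovers the unrelativized behavior. The class, the functions, and the parity oracle are hypothetical.

```python
from typing import Callable

# An oracle is modelled as the characteristic function {0,1}* -> {0,1} of a language.
Oracle = Callable[[str], int]


def empty_oracle(query: str) -> int:
    """The empty language: O(x) = 0 for all x, i.e., the unrelativized world."""
    return 0


def parity_oracle(query: str) -> int:
    """A hypothetical non-empty oracle: strings containing an odd number of '1's."""
    return query.count("1") % 2


class OracleMachine:
    """Step-counting machine whose basic instruction set includes an oracle call."""

    def __init__(self, oracle: Oracle) -> None:
        self.oracle = oracle
        self.steps = 0

    def tick(self) -> None:
        # One ordinary basic instruction.
        self.steps += 1

    def query(self, x: str) -> int:
        # Evaluating the oracle is a single basic instruction as well.
        self.steps += 1
        return self.oracle(x)


def some_prefix_in_oracle(word: str, m: OracleMachine) -> int:
    """Hypothetical algorithm text: accept iff some non-empty prefix of word is in O."""
    for i in range(1, len(word) + 1):
        m.tick()
        if m.query(word[:i]) == 1:
            return 1
    return 0


if __name__ == "__main__":
    for name, oracle in (("empty", empty_oracle), ("parity", parity_oracle)):
        machine = OracleMachine(oracle)
        print(name, some_prefix_in_oracle("0010", machine), machine.steps)
```

With the empty oracle, the algorithm rejects every input after two basic steps per prefix; with the parity oracle, the same text accepts “0010” as soon as the prefix “001” is queried.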
Applied to a proof that has been suitably modified to take explicit advantage of the oracle, the conceptualization of oracles provided by [52], together with proof assistants, can address earlier criticism of oracle results raised by R. Lipton [156], who asked what the specific predicate would be that decides whether PROOF relativizes or not.
13520 VOLUME 12, 2024

B. OPEN RESEARCH QUESTIONS
The guidance that barriers provide is to avoid known dead-end arguments. One example is the fact that a linear program describing the traveling salesperson problem is necessarily of exponential size [10], so that solving it takes super-polynomial time in the size of the original instance, even with a polynomial-time LP algorithm. Thus, the known barriers to avoid include at least:
1) Affine relativization (including the original relativization barrier and the younger algebrization barrier) [52]
2) Naturalization [153] (which can be bypassed, for example, by diagonalization, although this requires care not to fall victim to the relativization barrier)
3) Linear programming, which is known to fail for at least some NP-complete problems, such as the traveling salesperson problem [10]
There are several recommendations on how to bypass relativization barriers, such as the interpolation technique [52], [151] or methods from communication complexity [52, Sec. 7.2]. Conversely, some methods are explicitly discouraged, such as diagonalization [150], although this particular technique is known to bypass another barrier, namely naturalization [153].
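The arithmetic behind the linear programming barrier can be sketched as follows: a solver that runs in time polynomial in its own input size m is polynomial in the original instance size n only if m is itself polynomially bounded in n. The cubic solver cost and the growth rates n^2 and 2^n below are illustrative assumptions, not the bounds proved in [10].

```python
from typing import Callable


def solver_steps(m: int, k: int = 3) -> int:
    """Hypothetical solver that is polynomial in its own input size m: m**k steps."""
    return m ** k


def poly_encoding(n: int) -> int:
    """Hypothetical reduction whose output size grows polynomially: n**2."""
    return n ** 2


def exp_encoding(n: int) -> int:
    """Hypothetical reduction whose output size grows exponentially: 2**n."""
    return 2 ** n


def total_steps(n: int, encoding: Callable[[int], int]) -> int:
    """Cost of writing down the encoded instance (m steps) plus solving it."""
    m = encoding(n)
    return m + solver_steps(m)


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, total_steps(n, poly_encoding), total_steps(n, exp_encoding))
```

For the polynomial encoding, the total cost n^2 + n^6 stays below 2n^6; for the exponential encoding, it is at least 2^{3n}, which eventually exceeds every fixed polynomial in n, regardless of how efficient the solver is.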
In addition, exact predicates and conditions for recognizing techniques and arguments that fall under the above dead-ends remain an open research issue. More precisely, how can we “automatically”, in the sense of algorithmically, analyze a given series of (logical) arguments to recognize it as algebrizing, relativizing, affinely relativizing, or similar? The power of proof assistants is worth exploring to this end, and past work has partially made a start in this direction. While the above are already explicit routes to failure, a general classification of which techniques could work towards settling P-vs-NP is another challenge for future research.
A potential route to explore further is analogous to oracle extensions, but directed the opposite way: intentionally restricting the underlying axiomatic foundation or model of mathematics, for example, by removing the axiom of choice or by generally demanding purely constructive arguments (diagonalization would then also naturally fall out of such considerations). While we did not explicitly screen the papers here for proofs under such limitations, some works explicitly consider the axiom of choice [49], [134], or its consistency with the P-vs-NP question [85], [123], occasionally with far-reaching claims that fundamental results of complexity theory (including Cook's and Fagin's theorems) are wrong and that ZFC is inconsistent overall [68], [123]. Notwithstanding these claims, the idea of proving equality, inequality, or independence of the question from a reduced axiomatic system or from “nonstandard” models (of mathematics or complexity theory) seems largely unexplored.

C. USING PROOF ASSISTANTS
The actual value of a proof assistant lies in the objectivity of its checking, which is unbiased even against the user of the proof assistant. Some proof assistants, like Isabelle/HOL, provide instructions to skip parts of a proof (in Isabelle/HOL, the sorry keyword) so that the system accepts unproven statements. Using sorry in Isabelle/HOL should be regarded as just as dangerous as using words like “trivial” or “obviously” in mathematics, since what hides behind the skipped part can be arbitrarily small, arbitrarily large, or even impossible to prove. That said, if an entire proof is formalized in an automated proof assistant and verified without any skipped parts (by sorry or comparable constructs), its correctness is strongly certified (up to possible errors in the proof assistant software itself).
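As a first, purely textual self-check before a proof is presented, one may list every sorry that remains in an Isabelle theory file. The following Python sketch is a hypothetical helper (not part of Isabelle): it ignores sorry inside (* ... *) comments, but it does not handle nested comments or text cartouches, and it does not detect comparable constructs such as unchecked axiomatizations, so it complements rather than replaces a full check by Isabelle itself.

```python
import re
from typing import List, Tuple

# Isabelle comments are written (* ... *); nested comments are NOT handled here.
COMMENT = re.compile(r"\(\*.*?\*\)", re.DOTALL)
SORRY = re.compile(r"\bsorry\b")


def _blank_comments(source: str) -> str:
    """Replace every comment by the newlines it contained, preserving line numbers."""
    return COMMENT.sub(lambda m: "\n" * m.group(0).count("\n"), source)


def find_sorrys(source: str) -> List[Tuple[int, str]]:
    """Return (line number, line) for each line containing 'sorry' outside comments."""
    hits = []
    for lineno, line in enumerate(_blank_comments(source).splitlines(), start=1):
        if SORRY.search(line):
            hits.append((lineno, line.strip()))
    return hits


EXAMPLE = """theory Scratch imports Main begin
lemma easy: "rev (rev xs) = xs" by simp
(* a comment that mentions sorry is ignored *)
lemma gap: "length (rev xs) = length xs" sorry
end"""

if __name__ == "__main__":
    for lineno, line in find_sorrys(EXAMPLE):
        print(f"line {lineno}: {line}")
```

Each reported line marks one of the skipped sub-proofs that, as argued here, deserve the reviewers' attention.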
If a counter-argument is made based on barriers like those above, then either the counter-argument is itself wrong, or the gap in the proof should be identifiable as a sorry, as missing details, or as inaccuracies in the underlying definitions (cf. the above discussion about consistency between definitions on paper and in the proof assistant). For example, the size of a problem instance produced by a Karp reduction may have been overlooked. If an NP-complete problem is transformed into a linear program, this can result in a worst-case exponential size, so that the proof fails against the aforementioned barrier; similar considerations apply to relativization and the other barriers.
Leaving “gaps” in the formalized proof is not per se an indication of incorrectness: any part verified by the proof assistant already saves reviewers' time, and the remaining gaps mark the (only) reasonable points at which to counter-argue.
The manual labor left after formalization thus boils down to:
• writing the formalized proof itself,
• checking consistency between the definitions used in the paper version of a proof and those used inside the proof assistant, and
• arguing about the remaining “gaps” in the arguments (skipped sub-proofs).
The last point requires the most effort. In the project underlying this paper, we selected two papers from the set of candidate proofs, one [56] claiming P = NP and the other [54] claiming P ≠ NP (no paper was selected from the “unprovable” category, since no work showed the required structure of an independence proof), and formalized them in Isabelle/HOL.⁶ For [56], the formalization revealed issues with the arguments.⁷ For [54], the formalization was accomplished only partially, covering approximately 30% of its 50 pages at an effort equivalent to 23.6 person-months. Issues were found, but all of them could be fixed, so that this ≈ 30% portion of [54] is formalized with only a few sorrys that require special attention.
All formalizations are openly accessible in a public GitHub organization,⁸ which explicitly invites anyone to add their own formalization as a new repository, continuing the proposal of this work. The two formalizations reported above are included there as examples.

D. CONCLUSION
We believe that automated proof assistants can be one possible way to handle the large number of proposed ideas, relative to the limited capacity of expert peer review. The number of proofs proposed per year has apparently led to a kind of “proof fatigue”, visible at least in the little excitement the community shows about yet another published proof. The Riemann hypothesis is another example of a conjecture whose resolution may depend on the right idea not going unnoticed in a flood of flawed attempts.
Our hope in this work is to empower anyone aiming to contribute solutions to self-check their work at the public GitHub organization,⁸ with some guidance towards known pitfalls and dead-ends that have already been explored in the literature. Equally important, and still open, are exact predicates and conditions to unambiguously and objectively decide about relativization or other barriers.
Automated proof assistants can offer (i) objective checking, independent of affiliation, education, subjective opinion, and other factors, and (ii) help in identifying errors in a seemingly logical argument, much like examples and exercises help a student develop deep knowledge and insight.

FIGURE 1. Average number of papers published on P-vs-NP over time.

TABLE 2. List of symbols.