Introduction
Welcome to the Quip documentation. Quip ("Quick Proofs") is an experimental proof format for first-order and higher-order automatic theorem provers, designed to be easy to produce, and reasonably easy to check.
There are two sides to this documentation:
- the point of view of theorem provers, SMT solvers, etc. that might produce Quip proofs;
- the point of view of proof checkers that aim at validating such proofs.
NOTE: the proof format is experimental and its design will most likely fluctuate a lot for a while, as checkers and proof-producing provers are developed.
Overall design
A Quip proof is organized around a few main notions: terms, substitutions, clauses, and proof steps.
- Terms are expressions in a simple higher-order logic similar to the ones of HOL Light, HOL, Isabelle/HOL, etc. It features prenex polymorphism, applications, equality, variables, and lambda abstractions.
  An example term might be \( p~ x = (\lambda y. f~ x~ y = y) \), where \( p \) is a predicate symbol and \( f \) a function symbol. We use here an ML-like syntax with currying.
- Substitutions are finite mappings from variables to terms, written \( \{ x_1 \mapsto t_1, \ldots, x_n \mapsto t_n \} \).
  Applying a substitution to a term yields a new term, where the variables have been replaced by their image in the substitution (or kept unchanged if they are not bound by the substitution).
- Clauses are sets of literals[^1], where each literal is a pair \( (sign, term) \). The sign of a literal is a boolean that indicates whether the term is positive (`true`) or negative (`false`). A negative literal `(false, t)` fundamentally represents \( \lnot t \), but is distinct from the term \( \lnot t \): the polarity (the sign) is only relevant in the context of a clause. We will denote a negative literal as \( - t \) and a positive literal as \( + t \).
  An example clause might be: \[ \{ - (a=b), - (c=d), + ((f~ a~ c) = (f~ b~ d)) \} \] It represents the basic congruence lemma: \[ a=b, c=d \vdash f~ a~ c = f~ b~ d \]
- Proof steps are the basic building blocks of proofs. A proof step is composed of:
  - a rule, which indicates how to process this step;
  - a conclusion, which is the clause that this proof step produces;
  - a set of premises, which are inputs to the proof step;
  - optionally, some additional arguments such as terms, substitutions, etc.
  A proof step asserts (via its particular rule) that the set of premises implies its conclusion. In practice, not all the elements of a step need to be explicitly written down. For example, \( \text{refl}~ t \) (with \( t \) a term) is a basic proof step with rule "refl", no premises, and conclusion \( \{ + (t = t) \} \).
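To make the substitution notion concrete, here is a minimal Python sketch (purely illustrative, not part of the format) of parallel substitution application, on a toy representation where a term is either a string (variable or constant) or a tuple of sub-terms; binders and capture-avoidance are deliberately ignored:

```python
# Toy illustration (not part of the Quip spec): a term is a string (variable
# or constant) or a tuple of sub-terms; a substitution is a dict.

def apply_subst(subst, term):
    """Apply a parallel substitution: unbound variables are kept as-is."""
    if isinstance(term, str):
        # Parallel semantics: the image is NOT substituted again.
        return subst.get(term, term)
    return tuple(apply_subst(subst, t) for t in term)

# { x -> a, y -> (f x) } applied to (g x y): the x inside the image of y stays.
print(apply_subst({"x": "a", "y": ("f", "x")}, ("g", "x", "y")))
# -> ('g', 'a', ('f', 'x'))
```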
A distinguishing feature of Quip, compared to some other proof formats such as the one of veriT, is that the rules are high-level and require proof checkers to do more work in order to verify them. In exchange, the hope is that we get more concise proofs that are also easier to produce from theorem provers.
TODO: citations for VeriT
[^1]: Clauses can alternatively be seen as classical sequents, with negative elements on the left of \( \vdash \), and positive elements on the right of it.
Example
This is a very simple SMT problem in the category `QF_UF`, representing the problem \( a=b \land f~a = c \vdash f~b = c \) by trying to contradict it:
(set-logic QF_UF)
(declare-sort u 0)
(declare-fun a () u)
(declare-fun b () u)
(declare-fun c () u)
(declare-fun f (u) u)
(assert (= a b))
(assert (= (f a) c))
(assert (not (= (f b) c)))
(check-sat)
(exit)
With Sidekick's experimental proof-producing mode, we solve this problem (it replies "UNSAT"):
$ sidekick example-pb.smt2 -o example-proof.quip
Unsat (0.003/0.000/0.000)
And we get the following proof:
(quip 1
(steps ()
((deft $t1 (f a))
(deft $t2 (= $t1 c))
(deft $t3 (= a b))
(deft $t4 (f b))
(deft $t5 (= c $t4))
(stepc c0 (cl (- $t2) (- $t3) (+ $t5))
(cc-lemma (cl (- $t2) (- $t3) (+ $t5))))
(stepc c1 (cl (- $t5)) (assert (not $t5)))
(stepc c2 (cl (+ $t3)) (assert $t3))
(stepc c3 (cl (+ $t2)) (assert $t2))
(stepc c4 (cl )
(hres (init (ref c0)) (r1 (ref c3)) (r1 (ref c2)) (r1 (ref c1)))))))
NOTE: the proof uses the S-expression format, but I do hope to have a more efficient binary format relatively early on.
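Until a binary format exists, consuming a proof requires little more than an S-expression reader. A minimal sketch (hypothetical, not Quip's official parser):

```python
import re

def parse_sexp(src):
    """Parse a single S-expression into nested Python lists of string atoms."""
    tokens = re.findall(r"\(|\)|[^\s()]+", src)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume ")"
            return items
        return tok  # atom

    return read()

# The header of the example proof:
proof = parse_sexp("(quip 1 (steps () ((deft $t1 (f a)))))")
assert proof[0] == "quip" and proof[1] == "1"
```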
Detailed explanation
- A proof starts with `(quip <version> <proof>)`. For now, `version` is 1.
- `(steps (<assumptions>) (<steps>))` is the main composite proof rule. Let's ignore the assumptions for now. Each step in this rule is either a symbol definition (see below) or a `(stepc <name> <clause> <proof>)`, which introduces a name for an intermediate lemma (the clause) as proved by the sub-proof.
  A side-effect of this is that you can have partially correct[^2] proofs, where a step is used correctly in the remainder of the proof but its own sub-proof is invalid.
- `(deft $t1 (f a))` is a term definition. It defines the new symbol `$t1` as a syntactic shortcut for the term `(f a)`, to be expanded as early as possible by the proof checker. Similarly, `$t5` is short for `(= c (f b))`.
  This kind of definition becomes important in larger proofs, where re-printing a whole term every time it is used would be wasteful and would bloat proof files. Using definitions, we can preserve a lot of sharing in the proofs.
- `(stepc c0 …)` is the first real step of the proof. Here, `c0` is the clause `{ - $t2, - $t3, + $t5 }`, and is proved by the proof `(cc-lemma …)` (a congruence closure lemma, i.e. a tautology of equality). It is, in essence, the proof we seek; the rest of the steps are used to derive a contradiction by deducing "false" from `c0` and our initial assumptions.
- The steps deducing `c1`, `c2`, and `c3` do so by using `assert` (meaning that the clause is actually an assumption of the initial problem), and then "preprocessing" them.
  For `(stepc c1 (cl (- $t5)) (hres (init (assert (not $t5))) (p1 (refl $t5))))` we get the following tree (writing \( t_5 \) for `$t5`): \[ \cfrac{ \cfrac{}{t_5 ~{\tiny (\text{assume})}} \qquad \cfrac{}{t_5 = t_5 ~{\tiny (\text{refl})}} }{t_5 ~{\tiny (\text{para1})}} \]
  In this tree we can see the general shape of preprocessing terms: assume the initial term `± t`, prove `+ (t = u)` (where `u` is the simplified version), and then use boolean paramodulation to obtain `± u`.
  It just so happens that no meaningful simplification occurred here, so `t` and `u` are the same, and the step could equivalently be the shorter `(stepc c1 (cl (- $t5)) (assert (not $t5)))`.

[^2]: Or partially incorrect, depending on your point of view. Glass half-full and all that.
Quip for the Automatic Theorem Prover
Quip is biased towards being easy to produce from theorem provers and SMT solvers, while remaining reasonably efficient to check.
The easiness comes from several aspects:
- Redundancy in rules: many rules have a general form (e.g. a congruence closure lemma, or hyper-resolution with \( n \) steps), and some shorter forms for the common cases (e.g. unary resolution or the reflexivity rule).
- Rules can be quite high-level, requiring the proof checkers to reimplement congruence closure, resolution, etc.
- The proof rules do not always need to specify their result, only enough information that the conclusion can be reconstructed.
- Proofs are based on a proof language ("proof terms") that allows for easy composition of several steps. This way, it is not necessary to name every single clause occurring in the proof.
Quip for the Proof Checker
Quip's design favors the production of proofs, not their checking. However, proof checking should still be implementable in an efficient way.
A proof checker must implement a few core algorithms to be able to verify proofs (not to mention the particular set of theories it might support). These are:
- Congruence Closure: equality is central to Quip, and most equality proofs will be of the shape \( t_1=u_1, \ldots, t_n = u_n \vdash t=u \). These can be checked in \( O(n \log n) \) time using congruence closure (see for example the [EGG paper]).
- Resolution: clause-level reasoning is done via multiple steps of resolution and boolean paramodulation.
  The core rules are resolution:
  \[ \cfrac{C_1 \lor l \qquad C_2 \lor \lnot l}{ C_1 \lor C_2 } \]
  and boolean paramodulation (where \(t\) is a boolean term):
  \[ \cfrac{C_1 \lor (t = u) \qquad C_2 \lor t}{ C_1 \lor C_2 \lor u } \]
  In practice, in the proof format, "hres" is the rule used to perform a series of such steps from an initial clause.
- Instantiation: a clause \( C \) may contain free variables \( \{ x_1, \ldots, x_n \} \). Given a substitution \( \sigma \triangleq \{ x_1 \mapsto t_1, \ldots, x_n \mapsto t_n \} \), we can deduce \( C\sigma \).
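As a rough illustration of the congruence closure requirement, here is a naive Python sketch (quadratic, not the \( O(n \log n) \) algorithm) that checks entailments of the shape above over ground terms, represented as strings (constants) and tuples (applications):

```python
class CongruenceClosure:
    """Naive congruence closure: strings are constants, tuples are applications.
    A sketch only: no incrementality, no asymptotically efficient tricks."""

    def __init__(self):
        self.parent = {}

    def find(self, t):
        self.parent.setdefault(t, t)
        while self.parent[t] != t:
            self.parent[t] = self.parent[self.parent[t]]  # path halving
            t = self.parent[t]
        return t

    def merge(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

    def close(self, terms):
        # Repeatedly merge applications whose arguments are pairwise congruent.
        changed = True
        while changed:
            changed = False
            apps = [t for t in terms if isinstance(t, tuple)]
            for i, s in enumerate(apps):
                for t in apps[i + 1:]:
                    if self.find(s) != self.find(t) and len(s) == len(t) and \
                       all(self.find(x) == self.find(y) for x, y in zip(s, t)):
                        self.merge(s, t)
                        changed = True

def subterms(t, acc):
    acc.add(t)
    if isinstance(t, tuple):
        for s in t:
            subterms(s, acc)
    return acc

def cc_entails(eqs, goal):
    """Check eqs |- goal, where each equation is a pair of terms."""
    cc = CongruenceClosure()
    terms = set()
    for a, b in eqs + [goal]:
        subterms(a, terms)
        subterms(b, terms)
    for a, b in eqs:
        cc.merge(a, b)
    cc.close(terms)
    return cc.find(goal[0]) == cc.find(goal[1])

# The congruence lemma from the overview: a=b, c=d |- f a c = f b d
print(cc_entails([("a", "b"), ("c", "d")],
                 (("f", "a", "c"), ("f", "b", "d"))))  # -> True
```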
Reference
This chapter contains (well, will eventually contain 😇) a detailed specification of the proof language constructs. For the concrete file formats, see the S-expr format and Binary format chapters.
Terms
Terms belong to higher-order logic, with prenex polymorphism. In general we denote terms as \( t \) or \( u \) (with indices if needed).
(TODO: provide some citations)
(TODO: specify types)
Terms can be constructed in the following ways:
- Variables, denoted \( x, y, z \). They can be free or bound. Variables are typed.
- Applications, denoted `(f t1 t2 t3)`. Partial applications are transparent, meaning that `(f t1 t2 t3)` is short for `(((f t1) t2) t3)`.
- Constants, denoted \( a, b, c, f, g, h \) (where, by convention, \(a,b,c\) are of atomic type and \(f, g, h\) are functions).
  Polymorphic constants are applied to type arguments. Quip does not accept a term made of a partially applied constant: polymorphic constants must always be applied to enough type arguments.
  NOTE: this does not include constants introduced by the prover using `(deft <name> <term>)` steps. In that construct, say `(deft c (f a b))`, `c` and `(f a b)` are considered syntactically equal; the proof checker can just expand `c` into `(f a b)` at parse time and then forget entirely about `c`.
- Binders, such as \( \lambda (x:\tau). t \), or \( \forall (x:\tau). t \). The latter is a shortcut for the application \( \text{forall}~ (\lambda (x:\tau). t) \) (where \( \text{forall} \) is a constant of type \( \Pi a. (a \to \text{bool}) \to \text{bool} \)).
  With lambda-abstraction comes \( \beta \)-reduction (more details in the rules section).
- Box is a special term constructor that has no logical meaning. We denote it `(box <clause>)`, and it is equivalent to the boolean term represented by the clause. The use case of box is to hide the clause within, so that it appears as an atomic constant to most rules, except the ones specifically concerned with handling `box`.
  For example, `(box (cl (- (forall (x:τ) (p x))) (+ (p a))))` represents the boolean term \( \lnot (\forall x:τ. p~ x) \lor p~ a \).
  The benefit of `box` becomes apparent when one tries to use clauses to represent conditional proofs. If I want to represent that I have a proof of `A` assuming a proof of `B`, where `B` is a clause and not just a literal, and `A` is the clause `{l_1, …, l_n}`, then the simplest way to do so is: `(cl l_1 … l_n (- (box B)))`.
  See the rules about box.
Clauses
We denote clauses as `(cl <lit>*)`. The empty clause is therefore `(cl)`.
Each literal is `(<sign> <term>)`, with the sign being either `+` or `-` (for a clause-level negation).
For example, `(cl (- true) (+ (= b b)))` represents the clause \( \lnot \top \lor b=b \), or the sequent \( \top \vdash b=b \).
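A checker might represent literals and clauses directly along these lines; a minimal Python sketch (illustrative only, not the official checker):

```python
# A literal is a (sign, term) pair and a clause is a frozenset of literals,
# matching the (cl (<sign> <term>)*) concrete syntax.

def lit(sign, term):
    assert sign in ("+", "-")
    return (sign, term)

def clause(*lits):
    return frozenset(lits)

def negate(l):
    sign, term = l
    return ("-" if sign == "+" else "+", term)

empty = clause()                                          # (cl)
c = clause(lit("-", "true"), lit("+", ("=", "b", "b")))   # (cl (- true) (+ (= b b)))
assert negate(lit("+", "p")) == ("-", "p")
assert len(c) == 2 and len(empty) == 0
```

Using a set type makes clause equality insensitive to literal order and duplicates, matching the "clauses are sets of literals" convention.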
Proof Rules
Rules form an expression language designed to only return valid clauses (i.e. theorems, or assumptions of the problem). Most rules are of the atomic kind (they take arguments but look like a function application); the main structuring rule permits the definition of named intermediate steps. This is also necessary to introduce sharing in a proof, where some clause might be proved once but used many times.
Atomic rules
- ref (`(ref <name>)`): returns a proof previously defined using `stepc` (see composite proofs below), or a unit clause corresponding to a local assumption of `steps`. There is no need to re-prove anything: we assume the step was valid and just reuse its conclusion.
  In other words, `(ref c5)` coming after `(stepc c5 (cl (+ p) (+ q)) <proof>)` is a trivial proof of the clause `(cl (+ p) (+ q))`.
  The serialization might use `@` instead of `ref`.
- refl (`(refl <term>)`): `(refl t)` is a proof of the clause `(cl (+ (= t t)))`.
- assert (`(assert <term>)`): proves a unit clause `(cl (+ t))` (if the term is `t`), but only if it matches exactly an assertion/hypothesis of the original problem.
  (TODO: be able to refer to the original formula by file+name if it was provided)
- hres (`(hres <proof> <hstep>+)`): a fold-like operation on the initial proof's result. It can represent a resolution step, a hyper-resolution, or some boolean paramodulation steps. As opposed to the other rules, which are mostly useful for preprocessing/simplification of the original problem, this is expected to be one of the main inference rules for resolution/superposition provers.
  The proof step `(hres (init p0) h1 h2 … hn)` starts with the clause `C0` obtained by validating `p0`. It then applies each "h-step" in the order they are presented. Assuming the current clause is `C == (cl l1 … lm)`, each h-step can be one of the following:
  - resolution (`(r <term> <proof>)`): `(r t proof)` resolves `proof` into a clause `D`, which must contain a literal `(+ t)` or `(- t)`. Then it performs boolean resolution between the current clause `C` and the clause `D`, with pivot literal `(+ t)`.
    Without loss of generality, let's assume the proof returns `D` where \( D \triangleq D' \lor (+ t) \), and \( C \triangleq C' \lor (- t) \). Then the new clause is \( C' \lor D' \).
  - unit resolution (`(r1 <proof>)`): `(r1 proof)` resolves `proof` into a clause `D`, which must be a unit clause (i.e. have exactly one literal). Since there is no ambiguity about the pivot, it performs unit resolution between `C` and `D` on the unique literal of `D`.
  - boolean paramodulation (`(p <term> <term> <proof>)`): `(p lhs rhs proof)` resolves `proof` into a clause `D`, which must contain a literal `(+ (= lhs rhs))`, where both `lhs` and `rhs` are boolean terms. In other words, \( D \triangleq lhs = rhs \lor D' \).
    The current clause `C` must contain a literal `(± lhs)`; i.e. \( C \triangleq C' \lor \pm lhs \).
    The result is obtained by replacing `lhs` with `rhs` in `C`, and adding `D'` back. Mathematically: \[ \cfrac{ lhs = rhs \lor D' \qquad lhs \lor C' }{ C' \lor D' \lor rhs } \]
  - unit boolean paramodulation (`(p1 <proof>)`): same as `p`, but the proof must return a unit clause `(+ (= lhs rhs))`.
- r (`(r <term> <proof> <proof>)`): resolution on the given pivot between the two clauses.
  The proof term `(r pivot p1 p2)` corresponds to `(hres p1 (r pivot p2))`.
- r1 (`(r1 <proof> <proof>)`): unit resolution. The shortcut `(r1 p1 p2)` allows the user to omit the pivot when one of the two proofs is unit (i.e. has exactly one literal). It is the same as `(hres p1 (r1 p2))`.
- rup (`(rup <clause> <proof>+)`): `(rup c steps)` proves the clause `c` by reverse unit propagation (RUP) on the results of `steps`. This corresponds to linear resolution, but without specifying the pivots or their order.
- cc-lemma (`(ccl <clause>)`): proves a clause `c` if it is a tautology of the theory of equality. There should generally be \( n \) negative literals and one positive literal, all of them equations.
  An example: `(cc-lemma (cl (- (= a b)) (- (= (f b) c)) (- (= c c2)) (- (= (f c2) d)) (+ (= (f (f a)) d))))`
- cc-imply (`(cc-imply (<proof>*) <term> <term>)`): a shortcut step that combines `cc-lemma` and `hres` for more convenient proof production. Internally, the checker should be able to reuse most of the implementation logic of `cc-lemma`.
  `(cc-imply (p1 … pn) t u)` takes `n` proofs, all of which must have unit equations as their conclusions (say, `p_i` proves `(cl (+ (= t_i u_i)))`), and two terms `t` and `u`. It returns a proof of `(cl (+ (= t u)))` if the clause `(cl (- (= t_1 u_1)) (- (= t_2 u_2)) … (- (= t_n u_n)) (+ (= t u)))` is a valid equality lemma (one that `cc-lemma` would validate).
  In other words, `(cc-imply (p1…pn) t u)` could be expanded (using fresh names for steps) to:

      (steps ()
        ((stepc c_1 (cl (+ (= t_1 u_1))) p_1)
         (stepc c_2 (cl (+ (= t_2 u_2))) p_2)
         …
         (stepc c_n (cl (+ (= t_n u_n))) p_n)
         (stepc the_lemma
           (cl (- (= t_1 u_1)) (- (= t_2 u_2)) … (- (= t_n u_n)) (+ (= t u)))
           (cc-lemma
             (cl (- (= t_1 u_1)) (- (= t_2 u_2)) … (- (= t_n u_n)) (+ (= t u)))))
         (stepc res (cl (+ (= t u)))
           (hres (init (ref the_lemma))
             (r1 (ref c_1)) (r1 (ref c_2)) … (r1 (ref c_n))))))
- clause-rw (`(clause-rw <clause> <proof> (<proof>*))`): the term `(clause-rw c p steps)` proves `c` from the result `c0` of `p`, using a series of equations in `steps`. Each equation in `steps` is used to rewrite at least one literal of `c0`; no new literal is added: `c` is obtained purely by rewriting literals of `c0` (or simplifying them away, when a literal rewrites to `false`).
  A possible way to check this step is as follows:
  - Compute `c0` and the results `d1, …, dn` of the steps, which should be positive equations or positive atoms (`a` standing for `a=true`).
  - Given `c == (a1 \/ a2 \/ … \/ a_m)`, assert `¬a1, ¬a2, …, ¬a_m` into a congruence closure (where asserting `a` means asserting `a = true`, and asserting `¬a` means asserting `a = false`).
  - For each literal `b` of `c0` in turn, assert `b` into the congruence closure, query for `true == false`, and then undo the assertion of `b`. If `true == false` can be proved in each case, then the step is valid, because `c0 /\ d1 … dn |= c`.
  This rule is convenient for preprocessing clauses. Each term can be preprocessed (rewritten into a simpler form) individually, possibly leading to the literal being removed (when shown absurd by preprocessing), and `clause-rw` can be used to tie together all these rewriting steps.
  For example, `(cl (+ (ite (x+1 > x) p q)) (+ false) (+ t))` could be simplified into `(cl (+ p) (+ u))`, assuming `t` simplifies to `u`. The proof would look like:

      (stepc c (cl (+ p) (+ u))
        (clause-rw (cl (+ p) (+ u))
          (<proof of input clause>)
          ((<proof of t=u>) (<proof of ite simplification>))))
- nn (`(nn <proof>)`): not-normalization, a normalization step that transforms some literals in a clause. It turns `(- (not a))` into `(+ a)`, and `(+ (not a))` into `(- a)`.
- true is true (`t-is-t`): the lemma `(cl (+ true))`.
- true neq false (`t-ne-f`): the lemma `(cl (- (= true false)))`.
- bool-c (`(bool-c <name> <term>+)`): `(bool-c <name> <terms>)` proves a boolean tautology of depth 1, using the particular sub-rule named `<name>` with some term argument(s). In other words, it corresponds to one of the construction or destruction axioms for the boolean connectives `and`, `or`, `=>`, boolean `=`, `xor`, `not`, `forall`, `exists`.
  The possible axioms are:

  | rule | axiom |
  |------|-------|
  | `(and-i (and A1…An))` | `(cl (- A1) … (- An) (+ (and A1…An)))` |
  | `(and-e (and A1…An) Ai)` | `(cl (- (and A1…An)) (+ Ai))` |
  | `(or-e (or A1…An))` | `(cl (- (or A1…An)) (+ A1) … (+ An))` |
  | `(or-i (or A1…An) Ai)` | `(cl (- Ai) (+ (or A1…An)))` |
  | `(imp-e (=> A1…An B))` | `(cl (- (=> A1…An B)) (- A1) … (- An) (+ B))` |
  | `(imp-i (=> A1…An B) Ai)` | `(cl (+ Ai) (+ (=> A1…An B)))` |
  | `(imp-i (=> A1…An B) B)` | `(cl (- B) (+ (=> A1…An B)))` |
  | `(not-e (not A))` | `(cl (- (not A)) (+ A))` |
  | `(not-i (not A))` | `(cl (- A) (+ (not A)))` |
  | `(eq-e (= A B) A)` | `(cl (- (= A B)) (- A) (+ B))` |
  | `(eq-e (= A B) B)` | `(cl (- (= A B)) (- B) (+ A))` |
  | `(eq-i+ (= A B))` | `(cl (+ A) (+ B) (+ (= A B)))` |
  | `(eq-i- (= A B))` | `(cl (- A) (- B) (+ (= A B)))` |
  | `(xor-e- (xor A B))` | `(cl (- (xor A B)) (- A) (- B))` |
  | `(xor-e+ (xor A B))` | `(cl (- (xor A B)) (+ A) (+ B))` |
  | `(xor-i (xor A B) B)` | `(cl (+ A) (- B) (+ (xor A B)))` |
  | `(xor-i (xor A B) A)` | `(cl (- A) (+ B) (+ (xor A B)))` |
  | `(forall-e P E)` | `(cl (- (forall P)) (+ (P E)))` |
  | `(forall-i P)` | `(cl (- (= (\x. P x) true)) (+ (forall P)))` |
  | `(exists-e P)` | `(cl (- (exists P)) (+ (P (select P))))` |
  | `(exists-i P)` | `(cl (- (P x)) (+ (exists P)))` (where `x` is fresh) |
- bool-eq (`(bool-eq <term> <term>)`): `(bool-eq t u)` proves the clause `(cl (+ (= t u)))` (where `t` and `u` are both boolean terms) if `t` simplifies to `u` via a basic simplification step.
  This rule corresponds to the following axioms (TODO: also give names to the sub-rules here):
  - `(= (not true) false)`
  - `(= (not false) true)`
  - `(= (not (not t)) t)`
  - `(= (= true A) A)`
  - `(= (= false A) (not A))`
  - `(= (xor true A) (not A))`
  - `(= (xor false A) A)`
  - `(= (= t t) true)`
  - `(= (or t1…tn true u1…um) true)`
  - `(= (or false false) false)`
  - `(= (=> true A) A)`
  - `(= (=> false A) true)`
  - `(= (=> A true) true)`
  - `(= (=> A false) (not A))`
  - `(= (and t1…tn false u1…um) false)`
  - `(= (and true true) true)`
- \( \beta \)-reduction (`(beta-red <term> <term>)`): given `(lambda (x:τ) body)` and `u`, returns the clause `(cl (+ (= ((lambda (x:τ) body) u) body[x := u])))`, where `body[x := u]` denotes the term obtained by substituting `x` with `u` in `body` in a capture-avoiding way.
  Mathematically: \[ \vdash (\lambda x:\tau. t)~ u = t[x := u] \]
- substitution (`(subst <subst> <proof>)`): `(subst sigma p)` returns the same clause as `p`, but where free variables have been replaced according to the substitution. A substitution has the form `(x1 e1 x2 e2 … xn en)`, where the `x_i` are names (variables) and the `e_i` are expressions. Bindings are parallel (the substitution does not apply recursively).
  For example, `(subst (x a y (f x)) p)`, when `p` proves `(cl (+ (p x)) (+ (q y)))`, proves `(cl (+ (p a)) (+ (q (f x))))`: `x` is not substituted again in the image of `y`.
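To illustrate how the clause-level rules compose, here is a Python sketch of the `hres` fold restricted to the `r` and `r1` h-steps (helper names are hypothetical, not the official checker; literals are `(sign, term)` pairs and clauses are frozensets):

```python
# Sketch of the hres fold over h-steps (illustrative only).

def negate(lit):
    sign, term = lit
    return ("-" if sign == "+" else "+", term)

def resolve(c, d, pivot):
    """Boolean resolution between clauses c and d on pivot term `pivot`."""
    for l in (("+", pivot), ("-", pivot)):
        if l in d and negate(l) in c:
            return (c - {negate(l)}) | (d - {l})
    raise ValueError("pivot not found with opposite signs")

def resolve1(c, d):
    """Unit resolution: d must have exactly one literal."""
    (l,) = d
    return resolve(c, d, l[1])

def hres(init, hsteps):
    """Fold h-steps ('r', pivot, clause) / ('r1', clause) over the initial clause."""
    cur = init
    for step in hsteps:
        if step[0] == "r":
            cur = resolve(cur, step[2], step[1])
        elif step[0] == "r1":
            cur = resolve1(cur, step[1])
        else:
            raise ValueError("unknown h-step")
    return cur

# Shape of the first example's final step: resolve a 3-literal clause
# against three unit clauses, yielding the empty clause.
c0 = frozenset({("-", "t1"), ("-", "t2"), ("+", "t5")})
out = hres(c0, [("r1", frozenset({("+", "t1")})),
                ("r1", frozenset({("+", "t2")})),
                ("r1", frozenset({("-", "t5")}))])
print(out == frozenset())  # -> True
```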
TODO: lra (in own file) TODO: datatypes (in own file)
TODO: instantiation
TODO: reasoning under quantifiers
Composite rule
The main structuring construct for proofs is `steps`. Its structure is `(steps (<assumption>*) (<step>+))`.
- Each assumption is a pair `(<name> <literal>)`. As a reminder, literals are of the shape `(+ t)` or `(- t)`.
  These assumptions can be used in the steps via `(ref <name>)` (see below), which is a trivial proof of the unit clause `(cl <literal>)`.
- Each step is one of:
  - Term definition (`(deft <name> <term>)`), which introduces an alias for a term. `<name>` must not be in the signature of the original problem.
    Logically speaking, after `(deft c t)`, `c` and `t` are syntactically the same. `c` has no real existence, it is only a shortcut, so a proof of `(cl (+ (= c t)))` can simply be `(refl t)` (or `(refl c)`).
  - Reasoning step (`(stepc <name> <clause> <proof>)`): `(stepc name clause proof)` (where `name` must be fresh, i.e. defined nowhere else) introduces `name` as a shortcut for `proof` in the following steps.
    The result of `proof` must be exactly `clause`, otherwise the step fails. This means we can start using `(ref name)` in the following steps before we validate `proof`, since we know the result ahead of time. In a high-performance proof checker, this is a good opportunity for parallelisation, where `proof` can be checked in parallel with other steps that make use of its result.
- The result of `(steps (A1 … Am) (S1 … Sn))`, where the last step `Sn` has as conclusion a clause `C` with literals `(cl l1 … ln)`, is the clause `(cl l1 … ln ¬A1 … ¬Am)`.
  In particular, if `Sn` proves the empty clause, then `(steps …)` proves that at least one assumption must be false. If both `Sn`'s conclusion and the list of assumptions are empty, then the result is the empty clause.
- The list of assumptions can be useful either for subproofs, or to produce a proof of unsatisfiability for the SMTLIB v2.6 statement `(check-sat-assuming (<lit>+))`.
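The checking loop for a `steps` block can be sketched as follows (hypothetical structure, illustrating how a step's conclusion becomes available to later `(ref <name>)` uses before its proof is validated):

```python
# Sketch (not the official checker): check a list of stepc entries, keeping a
# dictionary from step names to their (already known) conclusions.

def check_steps(steps, check_proof):
    """`steps` is a list of ('stepc', name, clause, proof) entries;
    `check_proof(proof, by_name)` returns the clause a proof term proves."""
    by_name = {}
    last = None
    for kind, name, clause, proof in steps:
        assert kind == "stepc" and name not in by_name, "step names must be fresh"
        by_name[name] = clause            # later steps may (ref name) right away
        got = check_proof(proof, by_name)
        if got != clause:
            raise ValueError(f"step {name}: proof yields {got!r}, expected {clause!r}")
        last = clause
    return last                           # conclusion of the last step

# Tiny demo checker: a 'ref' proof looks up a previous step's conclusion,
# and 'axiom' is a stand-in for a real atomic rule.
def demo_checker(proof, by_name):
    if proof[0] == "ref":
        return by_name[proof[1]]
    if proof[0] == "axiom":
        return proof[1]
    raise ValueError("unknown rule")

c = frozenset({("+", "p")})
last = check_steps([("stepc", "s1", c, ("axiom", c)),
                    ("stepc", "s2", c, ("ref", "s1"))], demo_checker)
print(last == c)  # -> True
```

Registering the conclusion before validating the proof mirrors the parallelisation opportunity mentioned above: every `proof` could be checked concurrently once all conclusions are recorded.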
Toplevel composite rule
The syntax allows the toplevel composite rule (i.e. the main proof, as opposed to a subproof) to be flattened into a list of S-expressions.
It's probably not very useful to have assumptions here[^3].
For example, a proof could look like:
(quip 1)
(step1)
(step2)
(step3)
…
[^3]: Famous last words.
Box rules
Box is a term constructor that allows abstracting a clause into an opaque constant term. This way, a whole clause can become a literal in another clause without having to deal with true nested clauses; see the section on terms for more details.
First, a few remarks:
- `(box C)` and `(box D)` are syntactically the same term if and only if `C` and `D` are the same clause, modulo renaming of free variables and reordering of literals.
- `(box C)` is opaque and will not be traversed by congruence closure.
That said, we have a few rules to deal with boxes:
- box-assume (`(box-assume <clause>)`): `(box-assume c)`, where `c` is `(cl l1 l2 … ln)`, proves the tautology `(cl (- (box c)) l1 l2 … ln)`.
  If one erases box (which is, semantically, transparent), this corresponds to the tautology \( \lnot (\forall \vec{x}. C) \lor \forall \vec{x}. C \), where \( \vec{x} \) is the set of free variables in \( C \).
  The use case for this rule is that we can assume `C` freely and use it in the rest of the proof, provided we keep an assumption `(box C)` around in a negative literal. Once we actually prove `C`, we can discharge `(box C)`.
- box-proof (`(box-proof <proof>)`): given a proof `p` with conclusion `C`, this returns a proof whose conclusion is `(cl (+ (box C)))`. Semantically, it is merely the identity.
An interesting possibility offered by `box` is simplifications in a tactic framework. A simplification rule might take a goal clause `A` and simplify it into a goal clause `B`. To justify this, the theorem prover might produce a proof of `(cl (- (box B)) l1 … ln)` (assuming `A` is `(cl l1 … ln)`), which means \( B \Rightarrow A \). It might do that by starting with `(box-assume B)` and applying the rewrite steps backward to get back to the literals of `A`.
Once the goal `B` is proved, we obtain a proof of `B`, which we can lift to a proof of `(cl (+ (box B)))` using `box-proof`; then we only have to do unit resolution on `(cl (+ (box B)))` and `(cl (- (box B)) l1 … ln)` to obtain `(cl l1 … ln)`, i.e. the original goal `A`.
Another possibility is to box a full clause and use it as an assumption in a sub-proof using `steps`. Then `box-assume` can be used to make use of the assumption, by resolving it with the `(- (box C))` literal that `box-assume` introduces.
Full example
We're going to explore a bigger example: the proof of unsatisfiability of the SMTLIB problem `QF_UF/eq_diamond/eq_diamond2.smt2`.
It starts normally:
(quip 1 (steps () ((deft $t1 (= x0 y0))
(deft $t2 (= y0 x1))
followed by a bundle of term definitions for better sharing (note that the `_tseitin_xxx` symbols are actually introduced by Sidekick during clausification; they're not just proof-printing artefacts):
(deft $t2 (= y0 x1))
(deft _tseitin_and_0 (and $t1 $t2))
(deft $t3 (= x0 z0))
(deft $t4 (= x1 z0))
(deft _tseitin_and_1 (and $t3 $t4))
(deft _tseitin_or_2 (or _tseitin_and_0 _tseitin_and_1))
(deft $t5 (= x0 x1))
(deft $t6 (not $t5))
(deft _tseitin_and_3 (and _tseitin_or_2 $t6))
and then the actual proof:
(stepc c0 (cl (- $t1) (- $t2) (+ $t5))
(ccl (cl (- $t1) (- $t2) (+ $t5))))
(stepc c1 (cl (- _tseitin_and_3) (- $t5))
(nn (bool-c and-e _tseitin_and_3 $t6)))
(stepc c2 (cl (+ _tseitin_and_3)) (assert _tseitin_and_3))
(stepc c3 (cl (- $t5)) (hres (@ c1) ((r1 (@ c2)))))
(stepc c4 (cl (- _tseitin_and_0) (+ $t2))
(nn (bool-c and-e _tseitin_and_0 $t2)))
(stepc c5
(cl (- _tseitin_or_2) (+ _tseitin_and_1) (+ _tseitin_and_0))
(nn (bool-c or-e _tseitin_or_2)))
(stepc c6 (cl (- $t3) (- $t4) (+ $t5))
(ccl (cl (- $t3) (- $t4) (+ $t5))))
(stepc c7 (cl (- $t4) (- $t3)) (hres (@ c6) ((r1 (@ c3)))))
(stepc c8 (cl (- _tseitin_and_1) (+ $t3))
(nn (bool-c and-e _tseitin_and_1 $t3)))
(stepc c9 (cl (- _tseitin_and_1) (+ $t4))
(nn (bool-c and-e _tseitin_and_1 $t4)))
(stepc c10 (cl (- _tseitin_and_1))
(hres (@ c7) ((r $t4 (@ c9)) (r $t3 (@ c8)))))
(stepc c11 (cl (- _tseitin_and_3) (+ _tseitin_or_2))
(nn (bool-c and-e _tseitin_and_3 _tseitin_or_2)))
(stepc c12 (cl (+ _tseitin_or_2))
(hres (@ c11) ((r1 (@ c2)))))
(stepc c13 (cl (+ _tseitin_and_0))
(hres (@ c5) ((r1 (@ c12)) (r1 (@ c10)))))
(stepc c14 (cl (+ $t2)) (hres (@ c4) ((r1 (@ c13)))))
(stepc c15 (cl (- _tseitin_and_0) (+ $t1))
(nn (bool-c and-e _tseitin_and_0 $t1)))
(stepc c16 (cl (+ $t1)) (hres (@ c15) ((r1 (@ c13)))))
(stepc c17 (cl)
(hres (@ c0) ((r1 (@ c16)) (r1 (@ c14)) (r1 (@ c3))))))))
Note that the last step returns the empty clause, which means we did prove the problem to be unsatisfiable:
(stepc c17 (cl)
(hres (@ c0) ((r1 (@ c16)) (r1 (@ c14)) (r1 (@ c3))))))))
Let's examine a few steps.
- `c0`: `(stepc c0 (cl (- $t1) (- $t2) (+ $t5)) (ccl (cl (- $t1) (- $t2) (+ $t5))))`
  This means that \( x_0 = y_0, y_0 = x_1 \vdash x_0 = x_1 \) is a tautology of the theory of equality. Indeed it is: it's the transitivity axiom.
- `c3`: `(stepc c3 (cl (- $t5)) (hres (@ c1) ((r1 (@ c2)))))`
  Here, we have:
  - `c1` is `(cl (- _tseitin_and_3) (- (= x0 x1)))`
  - `c2` is `(cl (+ _tseitin_and_3))`
  - so, by resolution of `c1` and `c2` (note the use of `r1` since `c2` is unit: we do not need to specify a pivot), we obtain `(cl (- (= x0 x1)))`.
- `c4`: `(stepc c4 (cl (- _tseitin_and_0) (+ $t2)) (nn (bool-c and-e _tseitin_and_0 $t2)))`
  where `_tseitin_and_0` is actually a name for `(and $t1 $t2)`.
  The `bool-c` rule is valid since `(cl (+ $t2) (- (and $t1 $t2)))` is one of the basic tautologies on the `and` boolean connective. We use `nn` to make sure we eliminate weird literals like `(- (not a))`.
- `c10`: `(stepc c10 (cl (- _tseitin_and_1)) (hres (@ c7) ((r $t4 (@ c9)) (r $t3 (@ c8)))))`
  We have:
  - `c7` is `(cl (- $t4) (- $t3))`
  - `c9` is `(cl (- _tseitin_and_1) (+ $t4))`
  - `c8` is `(cl (- _tseitin_and_1) (+ $t3))`
  And the hyper-resolution steps therefore go:

  | clause | step |
  |--------|------|
  | `(cl (- $t4) (- $t3))` | start with `c7` |
  | `(cl (- $t3) (- _tseitin_and_1))` | resolve with `c9` on `$t4` |
  | `(cl (- _tseitin_and_1))` | resolve with `c8` on `$t3` |
- `c17`: `(stepc c17 (cl) (hres (@ c0) ((r1 (@ c16)) (r1 (@ c14)) (r1 (@ c3)))))`
  We prove the empty clause! Here we have:
  - `c0` is `(cl (- $t1) (- $t2) (+ $t5))`
  - `c16` is `(cl (+ $t1))`
  - `c14` is `(cl (+ $t2))`
  - `c3` is `(cl (- $t5))`
  So, by starting with `c0` and performing unit resolution steps, we get:

  | clause | step |
  |--------|------|
  | `(cl (- $t1) (- $t2) (+ $t5))` | start with `c0` |
  | `(cl (- $t2) (+ $t5))` | unit-resolve with `c16` |
  | `(cl (+ $t5))` | unit-resolve with `c14` |
  | `(cl)` | unit-resolve with `c3` |
More on boxing
If two clauses `C` and `D` are known to be equivalent, but are not syntactically the same, one can prove `(cl (- (box C)) (+ (box D)))` (and conversely) as follows.
Let's assume `C` is `(cl (+ a) (+ b))`, `D` is `(cl (+ a2) (+ b2))`, and we have some clauses `PA` and `PB` proving `(= a a2)` and `(= b b2)`:
(steps
((hyp_bc (+ (box C))))
(
; first use box-assume
(stepc s1 (cl (- (box C)) (+ a) (+ b)) (box-assume C))
; discharge the box locally using the assumption
(stepc s2 (cl (+ a) (+ b))
(hres (init (ref s1)) (r1 (ref hyp_bc))))
; prove D
(stepc s3 (cl (+ a2) (+ b2))
(hres
(init (ref s2))
(p1 (ref PA)) ; a=a2
(p1 (ref PB)) ; b=b2
))
; box D
(stepc s4 (cl (+ (box D)))
(box-proof (ref s3)))))
Because we add the negation of the assumptions, the result of this sub-proof is thus `(cl (- (box C)) (+ (box D)))`.
We used the assumption mechanism to get rid of `(- (box C))` locally, to avoid boxing it along with the other literals when we apply `box-proof`.
Tools
Here is a list of tools that can produce or consume Quip proofs.
Proof-producing provers and SMT solvers
- sidekick (WIP)
TODO: zipperposition
Proof-checking tools
- Quip (WIP) is a proof checker in OCaml.
Others
TODO: proof-to-dot and others