Introduction

Welcome to the Quip documentation. Quip ("Quick Proofs") is an experimental proof format for first-order and higher-order automatic theorem provers, designed to be easy to produce, and reasonably easy to check.

There are two sides to this documentation:

  • the point of view of theorem provers, SMT solvers, etc. that might produce Quip proofs;
  • the point of view of proof checkers that aim at validating such proofs.

NOTE: the proof format is experimental and its design will most likely fluctuate a lot for a while, as checkers and proof-producing provers are developed.

Overall design

A Quip proof is organized around a few main notions: terms, substitutions, clauses, and proof steps.

  • Terms are expressions in a simple higher-order logic similar to those of HOL Light, HOL, Isabelle/HOL, etc. The logic features prenex polymorphism, applications, equality, variables, and lambda abstractions.

    An example term might be \( p~ x = (\lambda y. f~ x~ y = y) \) where \( p \) is a predicate symbol and \( f \) a function symbol. We use here a ML-like syntax with currying.

  • Substitutions are finite mappings from variables to terms, written \( \{ x_1 \mapsto t_1, \ldots, x_n \mapsto t_n \} \).

    Applying a substitution to a term yields a new term, where the variables have been replaced by their image in the substitution (or kept unchanged if they are not bound by the substitution).

  • Clauses are sets of literals1, where each literal is a pair \( (sign, term) \). The sign of a literal is a boolean that indicates whether the term is positive (true) or negative (false). A negative literal (false, t) fundamentally represents \( \lnot t \), but is distinct from the term \( \lnot t \) because the polarity (the sign) is only relevant in the context of a clause.

    We will denote a negative literal as \( - t \) and a positive literal as \( + t \).

    An example clause might be: \[ \{ - (a=b), - (c=d), + ((f~ a~ c) = (f~ b~ d)) \} \] It represents the basic congruence lemma: \[ a=b, c=d \vdash f~ a~ c = f~ b~ d \]

  • Proof steps are the basic building blocks of proofs. A proof step is composed of:

    • a rule, which indicates how to process this step;
    • a conclusion, which is the clause that this proof step produces;
    • a set of premises, which are inputs to the proof step.
    • optionally, some additional arguments such as terms, substitutions, etc.

    A proof step asserts (via its particular rule) that the set of premises implies its conclusion. In practice not all the elements of a step might be explicitly written down! For example \( \text{refl}~ t \) (with \( t \) a term) is a basic proof step with rule "refl", no premises, and conclusion \( \{ + (t = t) \} \).

A distinguishing feature of Quip compared to some other proof formats such as the one in VeriT is that the rules are high-level and require proof checkers to do more work in order to verify them. In exchange, the hope is that we get more concise proofs that are also easier to produce from theorem provers.

TODO: citations for VeriT

1

Clauses can alternatively be seen as classical sequents, with negative elements on the left of \( \vdash \), and positive elements on the right of it.
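To make these notions concrete, here is a minimal sketch (hypothetical Python, not part of any Quip tool) of a possible data model, with terms as nested tuples, literals as sign/term pairs, and clauses as sets of literals:

```python
from dataclasses import dataclass

# Terms: for this sketch a term is either a string (constant/variable)
# or a tuple (an application), e.g. ("f", "a", "c") for (f a c).

@dataclass(frozen=True)
class Lit:
    sign: bool    # True for +, False for -
    term: object  # a term as described above

def neg(lit: Lit) -> Lit:
    """Clause-level negation: flip the sign, leave the term alone."""
    return Lit(not lit.sign, lit.term)

# Clauses are sets of literals; here is the congruence lemma from the text:
# { - (a=b), - (c=d), + ((f a c) = (f b d)) }
congruence = frozenset({
    Lit(False, ("=", "a", "b")),
    Lit(False, ("=", "c", "d")),
    Lit(True, ("=", ("f", "a", "c"), ("f", "b", "d"))),
})
```

Representing clauses as sets (rather than lists) makes literal order and duplicates irrelevant, which matches the definition above.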

Example

This is a very simple SMT problem in the category QF_UF, representing the problem \( a=b \land f~a = c \vdash f~b = c \) by trying to contradict it:

(set-logic QF_UF)
(declare-sort u 0)
(declare-fun a () u)
(declare-fun b () u)
(declare-fun c () u)
(declare-fun f (u) u)

(assert (= a b))
(assert (= (f a) c))
(assert (not (= (f b) c)))
(check-sat)
(exit)

With Sidekick's experimental proof-producing mode, we solve this problem (it replies "UNSAT"):

$ sidekick example-pb.smt2 -o example-proof.quip
Unsat (0.003/0.000/0.000)

And we get the following proof:

(quip 1
 (steps ()
  ((deft $t1 (f a))
   (deft $t2 (= $t1 c))
   (deft $t3 (= a b))
   (deft $t4 (f b))
   (deft $t5 (= c $t4))
   (stepc c0 (cl (- $t2) (- $t3) (+ $t5))
    (cc-lemma (cl (- $t2) (- $t3) (+ $t5))))
   (stepc c1 (cl (- $t5)) (assert (not $t5)))
   (stepc c2 (cl (+ $t3)) (assert $t3))
   (stepc c3 (cl (+ $t2)) (assert $t2))
   (stepc c4 (cl )
    (hres (init (ref c0)) (r1 (ref c3)) (r1 (ref c2)) (r1 (ref c1)))))))

NOTE: the proof uses the S-expression format, but I do hope to have a more efficient binary format relatively early on.

Detailed explanation

  • proofs start with (quip <version> <proof>). For now, the version is 1.

  • (steps (<assumptions>) (<steps>)) is the main composite proof rule. Let's ignore the assumptions for now. Each step in this rule is either a symbol definition (see below) or a (stepc <name> <clause> <proof>), which introduces a name for an intermediate lemma (the clause), as proved by the sub-proof.

    A side-effect of this is that you can have partially correct1 proofs, where a step is used correctly in the remainder of the proof but its own sub-proof is invalid.

  • (deft $t1 (f a)) is a term definition. It defines the new symbol $t1 as a syntactic shortcut for the term (f a), to be expanded as early as possible by the proof checker.

    Similarly, $t5 is short for (= c (f b)).

    This kind of definition becomes important in larger proofs, where re-printing a whole term every time it is used would be wasteful and would bloat proof files. Using definitions we can preserve a lot of sharing in the proofs.

  • (stepc c0 …) is the first real step of the proof. Here, c0 is the clause { - $t2, - $t3, + $t5 }, and is proved by the proof (cc-lemma …) (a congruence closure lemma, i.e. a tautology of equality). It is, in essence, the proof we seek; the rest of the steps are used to derive a contradiction by deducing "false" from c0 and our initial assumptions.

  • the steps deducing c1, c2, and c3 do so by using assert (meaning that the clause is actually an assumption of the initial problem), and then "preprocessing" them.

    For (stepc c1 (cl (- $t5)) (hres (init (assert (not $t5))) (p1 (refl $t5)))), the unshortened form of step c1, we get the following tree:

    \[ \cfrac{ \cfrac{}{\texttt{\$t5}} ~\tiny{\text{(assume)}} \qquad \cfrac{}{\texttt{\$t5} = \texttt{\$t5}} ~\tiny{\text{(refl)}} }{ \texttt{\$t5} ~\tiny{\text{(para1)}} } \]

    In this tree we can see the general shape of preprocessing terms: assume the initial term ± t, prove + (t = u) (where u is the simplified version), and then use boolean paramodulation to obtain ± u.

    It just so happens that no meaningful simplification occurred, so t and u are the same; this is why Sidekick shortened the step into (stepc c1 (cl (- $t5)) (assert (not $t5)))
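To see hres at work, here is a small sketch (hypothetical Python, not the actual checker) that replays the final step c4 of the proof above on literal sets:

```python
def unit_resolve(clause, unit):
    """Resolve `clause` against a unit clause on the unit's single literal.
    Literals are (sign, term) pairs; the pivot is the negation of that literal."""
    (sign, term), = unit                 # the unit clause has exactly one literal
    pivot = (not sign, term)
    assert pivot in clause, "pivot literal missing from the clause"
    return clause - {pivot}

# The clauses of the example (terms kept as opaque names):
c0 = frozenset({(False, "$t2"), (False, "$t3"), (True, "$t5")})
c1 = frozenset({(False, "$t5")})
c2 = frozenset({(True, "$t3")})
c3 = frozenset({(True, "$t2")})

# (hres (init (ref c0)) (r1 (ref c3)) (r1 (ref c2)) (r1 (ref c1))):
# fold the unit resolutions over c0, in order.
acc = c0
for unit in (c3, c2, c1):
    acc = unit_resolve(acc, unit)

assert acc == frozenset()  # the empty clause: the problem is unsatisfiable
```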

1

or partially incorrect, depending on your point of view. Glass half-full and all that.

Quip for the Automatic Theorem Prover

Quip is biased towards being easy to produce from theorem provers and SMT solvers, while remaining reasonably efficient to check.

The easiness comes from several aspects:

  • redundancy in rules: many rules will have a general form (e.g. a congruence closure lemma, or hyper-resolution with \( n \) steps), and some shorter forms for the common case (e.g. unary resolution or the reflexivity rule).

  • rules can be quite high-level, requiring the proof checkers to reimplement congruence closure, resolution, etc.

  • the proof rules do not need to always specify their result, only enough information that the conclusion can be reconstructed.

  • proofs are based on a proof language ("proof terms") that allows for easy composition of several steps. This way it's not necessary to name every single clause occurring in the proof.

Quip for the Proof Checker

Quip's design favors the production of proofs, not their checking. However, proof checking should still be implementable in an efficient way.

A proof checker must implement a few core algorithms to be able to verify proofs (not to mention the particular set of theories it might support). These are:

  • Congruence Closure: equality is central to Quip, and most equality proofs will be of the shape \( t_1=u_1, \ldots, t_n = u_n \vdash t=u \). These can be checked in \( O(n \log n) \) time using Congruence Closure (see for example the [EGG paper]).

  • Resolution: clause-level reasoning is done via multiple steps of resolution and boolean paramodulation.

    The core rules are resolution:

    \[ \cfrac{C_1 \lor l \qquad C_2 \lor \lnot l}{ C_1 \lor C_2 } \]

    and boolean paramodulation (where \(t\) is a boolean term):

    \[ \cfrac{C_1 \lor (t = u) \qquad C_2 \lor t}{ C_1 \lor C_2 \lor u } \]

    In practice, in the proof format, "hres" is the rule used to perform a series of such steps from an initial clause.

  • Instantiation: given a clause \( C \) with free variables \( \{ x_1, \ldots, x_n \} \) and a substitution \( \sigma \triangleq \{ x_1 \mapsto t_1, \ldots, x_n \mapsto t_n \} \), we can deduce \( C\sigma \).
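As an illustration of the first algorithm, here is a deliberately naive (quadratic rather than \( O(n \log n) \)) congruence closure sketch; the encoding and function names are invented for the example:

```python
class CongruenceClosure:
    """Union-find over ground terms: strings are constants, tuples are
    applications such as ("f", "a") for (f a)."""
    def __init__(self):
        self.parent = {}

    def find(self, t):
        self.parent.setdefault(t, t)
        while self.parent[t] != t:
            self.parent[t] = self.parent[self.parent[t]]  # path halving
            t = self.parent[t]
        return t

    def merge(self, t, u):
        self.parent[self.find(t)] = self.find(u)

    def close(self, terms):
        """Merge applications whose components are pairwise equal, to fixpoint."""
        changed = True
        while changed:
            changed = False
            for a in terms:
                for b in terms:
                    if (isinstance(a, tuple) and isinstance(b, tuple)
                            and len(a) == len(b)
                            and self.find(a) != self.find(b)
                            and all(self.find(x) == self.find(y)
                                    for x, y in zip(a, b))):
                        self.merge(a, b)
                        changed = True

def subterms(t, acc):
    acc.add(t)
    if isinstance(t, tuple):
        for s in t[1:]:
            subterms(s, acc)
    return acc

def cc_entails(equations, goal):
    """Check t1=u1, ..., tn=un |- t=u on ground terms."""
    cc = CongruenceClosure()
    terms = set()
    for l, r in equations + [goal]:
        subterms(l, terms)
        subterms(r, terms)
    for l, r in equations:
        cc.merge(l, r)
    cc.close(terms)
    return cc.find(goal[0]) == cc.find(goal[1])

# a=b, c=d |- f a c = f b d  (the congruence lemma from the introduction)
assert cc_entails([("a", "b"), ("c", "d")],
                  (("f", "a", "c"), ("f", "b", "d")))
```

A real checker would instead use the union-find with explanation tracking described in the literature, but the fixpoint above captures the same entailment relation.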

Reference

This chapter contains1 a detailed specification of the proof language constructs. For the concrete file formats, see the S-expr format and Binary format chapters.

1

will eventually contain. 😇

Terms

Terms belong to higher-order logic, with prenex polymorphism. In general we denote terms as \( t \) or \( u \) (with indices if needed).

(TODO: provide some citations)

(TODO: specify types)

Terms can be constructed in the following ways:

  • Variables, denoted \( x, y, z \). They can be free, or bound. Variables are typed.

  • Applications, denoted (f t1 t2 t3). Partial applications are transparent, meaning that (f t1 t2 t3) is short for (((f t1) t2) t3).

  • Constants, denoted \( a, b, c, f, g, h \) (where \(a,b,c\) are of atomic type and \(f, g, h\) are functions by convention).

    Polymorphic constants are applied to type arguments. Quip does not accept a term made of a partially applied constant: polymorphic constants must always be applied to enough type arguments.

    NOTE: This does not include constants introduced by the prover using (deft <name> <term>) steps. In that construct, say (deft c (f a b)), c and (f a b) are considered syntactically equal; the proof checker can just expand c into (f a b) at parse time and then forget entirely about c.

  • Binders, such as \( \lambda (x:\tau). t \), or \( \forall (x:\tau). t \). The latter is a shortcut for the application \( \text{forall}~ (\lambda (x:\tau). t) \) (where \( \text{forall} \) is a constant of type \( \Pi a. (a \to \text{bool}) \to \text{bool} \) ).

    With lambda-abstraction comes \( \beta\)-reduction (more details in the rules section).

  • Box is a special term constructor that has no logical meaning. We denote it (box <clause>), and it is equivalent to the boolean term represented by the clause. The use case of box is to hide the clause within so that it appears as an atomic constant to most rules but the ones specifically concerned with handling box.

    For example (box (cl (- (forall (x:τ) (p x))) (+ (p a)))) represents the boolean term \( \lnot (\forall x:τ. p~ x) \lor p~ a \).

    The benefit of box becomes apparent when one tries to use clauses to represent conditional proofs. If I want to represent that I have a proof of A assuming a proof of B, where B is a clause and not just a literal, and A is the clause {l_1, …, l_n}, then the simplest way to do so is: (cl l_1 … l_n (- (box B))).

    See the rules about box.

Clauses

We denote clauses as (cl <lit>*). The empty clause is therefore (cl). Each literal is (<sign> <term>) with the sign being either + or - (for a clause-level negation).

For example, (cl (- true) (+ (= b b))) represents the clause \( \lnot \top \lor b=b \), or the sequent \( \top \vdash b=b \).
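A checker first has to read such clauses; here is a hypothetical sketch (the helper names are invented for the example) of parsing a (cl …) S-expression and rendering it as a sequent:

```python
def parse_sexpr(s):
    """A tiny S-expression reader: returns nested lists of string atoms."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    def read(i):
        if tokens[i] == "(":
            out, i = [], i + 1
            while tokens[i] != ")":
                e, i = read(i)
                out.append(e)
            return out, i + 1
        return tokens[i], i + 1
    expr, _ = read(0)
    return expr

def show(t):
    """Print a parsed term back in S-expression syntax."""
    return t if isinstance(t, str) else "(" + " ".join(map(show, t)) + ")"

def clause_as_sequent(text):
    """Render (cl (<sign> <term>)*) as 'negatives |- positives'."""
    cl = parse_sexpr(text)
    assert cl[0] == "cl", "expected a (cl ...) form"
    neg = [show(t) for sign, t in cl[1:] if sign == "-"]
    pos = [show(t) for sign, t in cl[1:] if sign == "+"]
    return ", ".join(neg) + " |- " + ", ".join(pos)

assert clause_as_sequent("(cl (- true) (+ (= b b)))") == "true |- (= b b)"
```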

Proof Rules

Rules form an expression language designed to only return valid clauses (i.e. theorems, or assumptions of the problem). Most rules are of the atomic kind (they take arguments but look like a function application); the main structuring rule permits the definition of named intermediate steps. This is also necessary to introduce sharing in a proof, where some clause might be proved once but used many times.

Atomic rules

  • ref ((ref <name>)): returns a proof previously defined using stepc (see composite proofs below) or a unit clause corresponding to a local assumption of steps. There is no need to re-prove anything, we assume the step was valid and just reuse its conclusion.

    In other words, (ref c5) coming after (stepc c5 (cl (+ p) (+ q)) <proof>) is a trivial proof of the clause (cl (+ p) (+ q)).

    The serialization might use "@" instead of "ref".

  • refl ((refl <term>)): (refl t) is a proof of the clause (cl (+ (= t t))).

  • assert ((assert <term>)): proves a unit clause (cl (+ t)) (if the term is t), but only if it matches exactly an assertion/hypothesis of the original problem.

    (TODO: be able to refer to the original formula by file+name if it was provided)

  • hres ((hres <proof> <hstep>+)): a fold-like operation on the initial proof's result. It can represent a resolution step, or hyper-resolution, or some boolean paramodulation steps. As opposed to the other rules which are mostly useful for preprocessing/simplification of the original problem, this is expected to be one of the main inference rules for resolution/superposition provers.

    The proof step (hres (init p0) h1 h2 … hn) starts with the clause C0 obtained by validating p0. It then applies each "h-step" in the order they are presented. Assuming the current clause is C == (cl l1 … lm), each h-step can be one of the following:

    • resolution ((r <term> <proof>)): (r t proof) resolves proof into a clause D which must contain a literal (+ t) or (- t). Then it performs boolean resolution between the current clause C and the clause D on the pivot t.

      Without loss of generality let's assume the proof returns D where \( D \triangleq D' \lor (+ t) \), and \( C \triangleq C' \lor (- t) \). Then the new clause is \( C' \lor D' \).

    • unit resolution ((r1 <proof>)): (r1 proof) resolves proof into a clause D which must be a unit clause (i.e. have exactly one literal). Since there is no ambiguity about the pivot, it performs unit resolution between C and D on the unique literal of D.

    • boolean-paramodulation ((p <term> <term> <proof>)): (p lhs rhs proof) resolves proof into a clause D which must contain a literal (+ (= lhs rhs)), where both lhs and rhs are boolean terms. In other words, \( D \triangleq lhs = rhs \lor D' \).

      The current clause C must contain a literal (± lhs); i.e. \( C \triangleq C' \lor ± lhs \).

      The result is obtained by replacing lhs with rhs in C and adding D' back. Mathematically:

      \[ \cfrac{ lhs = rhs \lor D' \qquad lhs \lor C' }{ C' \lor D' \lor rhs } \]

    • unit-boolean-paramodulation ((p1 <proof>)): Same as p but the proof must return a unit clause (+ (= lhs rhs)).

  • r ((r <term> <proof> <proof>)): resolution on the given pivot between the two clauses.

    The proof term (r pivot p1 p2) corresponds to (hres p1 (r pivot p2)).

  • r1 ((r1 <proof> <proof>)): unit resolution. The shortcut (r1 p1 p2) allows the user to omit the pivot if one of the two proofs is unit (i.e. has exactly one literal). It is the same as (hres p1 (r1 p2)).

  • rup ((rup <clause> <proof>+)): (rup c steps) proves the clause c by reverse unit propagation (RUP) on the results of steps. This corresponds to linear resolution, but without specifying the pivots or their order.

  • cc-lemma ((ccl <clause>)): proves a clause c if it's a tautology of the theory of equality. There should generally be n negative literals and one positive literal, all of them equations.

    An example:

    (cc-lemma
      (cl
        (- (= a b))
        (- (= (f b) c))
        (- (= c c2))
        (- (= (f c2) d))
        (+ (= (f (f a)) d))))
    
  • cc-imply ((cc-imply (<proof>*) <term> <term>)): a shortcut step that combines cc-lemma and hres for more convenient proof production. Internally the checker should be able to reuse most of the implementation logic of cc-lemma.

    (cc-imply (p1 … pn) t u) takes n proofs, all of which must have unit equations as their conclusions (say, p_i proves (cl (+ (= t_i u_i)))); and two terms t and u; and it returns a proof of (cl (+ (= t u))) if the clause (cl (- (= t_1 u_1)) (- (= t_2 u_2)) … (- (= t_n u_n)) (+ (= t u))) is a valid equality lemma (one that cc-lemma would validate).

    In other words, (cc-imply (p1 … pn) t u) could be expanded (using fresh names for steps) to:

    (steps ()
      ((stepc c_1 (cl (+ (= t_1 u_1))) p_1)
       (stepc c_2 (cl (+ (= t_2 u_2))) p_2)
       …
       (stepc c_n (cl (+ (= t_n u_n))) p_n)
       (stepc the_lemma
        (cl (- (= t_1 u_1)) (- (= t_2 u_2)) … (- (= t_n u_n)) (+ (= t u)))
        (cc-lemma
          (cl (- (= t_1 u_1)) (- (= t_2 u_2)) … (- (= t_n u_n)) (+ (= t u)))))
       (stepc res (cl (+ (= t u)))
        (hres
          (init (ref the_lemma))
          (r1 (ref c_1))
          (r1 (ref c_2))
          …
          (r1 (ref c_n))))))
    
  • clause-rw ((clause-rw <clause> <proof> (<proof>*))): the term (clause-rw c p steps) proves c from the result c0 of p, using a series of equations in steps. Each equation in steps is used to rewrite at least one literal of c0; no new literal is added: c is obtained purely by rewriting literals of c0 (or dropping a literal when it rewrites to false).

    A possible way to check this step is as follows.

    • Compute c0 and the results of steps d1, …, dn which should be positive equations or positive atoms (a standing for a=true).
    • Given c == (a1 \/ a2 \/ … \/ a_m), assert ¬a1, ¬a2, …, ¬a_m into a congruence closure (where asserting a means asserting a = true, and asserting ¬a means asserting a = false).
    • For each literal b of c0 in turn, assert b into the congruence closure and query for true == false before undoing the assertion of b. If true == false can be derived in every case, then the step is valid, because c0 /\ d1 … dn |= c.

    This rule is convenient for preprocessing of clauses. Each term can be preprocessed (rewritten into a simpler form) individually, possibly leading to the literal being removed (when shown absurd by preprocessing), and clause-rw can be used to tie together all these rewriting steps.

    For example, (cl (+ (ite (x+1 > x) p q)) (+ false) (+ t)) could be simplified into (cl (+ p) (+ u)) assuming t simplifies to u. The proof would look like:

    (stepc c (cl (+ p) (+ u))
      (clause-rw (cl (+ p) (+ u))
        (<proof of input clause>)
        ((<proof of t=u>)
         (<proof of ite simplification>))))
    
  • nn ((nn <proof>)): not-normalization: a normalization step that transforms some literals in a clause. It turns (- (not a)) into (+ a), and (+ (not a)) into (- a).

  • true is true (t-is-t): the lemma (cl (+ true)).

  • true neq false (t-ne-f): the lemma (cl (- (= true false))).

  • bool-c ((bool-c <name> <term>+)): (bool-c <name> <terms>) proves a boolean tautology of depth 1 using the particular sub-rule named <name> with some term argument(s). In other words, it corresponds to one of the construction or destruction axioms for the boolean connectives and, or, =>, boolean =, xor, not, forall, exists.

    The possible axioms are:

    rule                        axiom
    (and-i (and A1…An))         (cl (- A1) … (- An) (+ (and A1…An)))
    (and-e (and A1…An) Ai)      (cl (- (and A1…An)) (+ Ai))
    (or-e (or A1…An))           (cl (- (or A1…An)) (+ A1) … (+ An))
    (or-i (or A1…An) Ai)        (cl (- Ai) (+ (or A1…An)))
    (imp-e (=> A1…An B))        (cl (- (=> A1…An B)) (- A1) … (- An) (+ B))
    (imp-i (=> A1…An B) Ai)     (cl (+ Ai) (+ (=> A1…An B)))
    (imp-i (=> A1…An B) B)      (cl (- B) (+ (=> A1…An B)))
    (not-e (not A))             (cl (- (not A)) (+ A))
    (not-i (not A))             (cl (- A) (+ (not A)))
    (eq-e (= A B) A)            (cl (- (= A B)) (- A) (+ B))
    (eq-e (= A B) B)            (cl (- (= A B)) (- B) (+ A))
    (eq-i+ (= A B))             (cl (+ A) (+ B) (+ (= A B)))
    (eq-i- (= A B))             (cl (- A) (- B) (+ (= A B)))
    (xor-e- (xor A B))          (cl (- (xor A B)) (- A) (- B))
    (xor-e+ (xor A B))          (cl (- (xor A B)) (+ A) (+ B))
    (xor-i (xor A B) B)         (cl (+ A) (- B) (+ (xor A B)))
    (xor-i (xor A B) A)         (cl (- A) (+ B) (+ (xor A B)))
    (forall-e P E)              (cl (- (forall P)) (+ (P E)))
    (forall-i P)                (cl (- (= (\x. P x) true)) (+ (forall P)))
    (exists-e P)                (cl (- (exists P)) (+ (P (select P))))
    (exists-i P)                (cl (- (P x)) (+ (exists P))) (where x fresh)
  • bool-eq ((bool-eq <term> <term>)): (bool-eq t u) proves the clause (cl (+ (= t u))) (where t and u are both boolean terms) if t simplifies to u via a basic simplification step.

    This rule corresponds to the axioms:

    TODO: also give names to sub-rules here

    axiom
    (= (not true) false)
    (= (not false) true)
    (= (not (not t)) t)
    (= (= true A) A)
    (= (= false A) (not A))
    (= (xor true A) (not A))
    (= (xor false A) A)
    (= (= t t) true)
    (= (or t1…tn true u1…um) true)
    (= (or false false) false)
    (= (=> true A) A)
    (= (=> false A) true)
    (= (=> A true) true)
    (= (=> A false) (not A))
    (= (and t1…tn false u1…um) false)
    (= (and true true) true)
  • \( \beta \)-reduction ((beta-red <term> <term>)): Given (lambda (x:τ) body) and u, returns the clause (cl (+ (= ((lambda (x:τ) body) u) body[x := u]))) where body[x := u] denotes the term obtained by substituting x with u in body in a capture avoiding way.

    Mathematically: \[ \vdash (\lambda x:\tau. t)~ u = t[x := u] \]

  • substitution ((subst <subst> <proof>)): (subst sigma p) returns the same clause as p, but where free variables have been replaced according to the substitution. A substitution has the form (x1 e1 x2 e2 … xn en) where x_i are names (variables) and e_i are expressions. Bindings are parallel (the substitution does not apply recursively to the images).

    For example, (subst (x a y (f x)) p), when p proves (cl (+ (p x)) (+ (q y))), proves (cl (+ (p a)) (+ (q (f x)))). x is not substituted again in the image of y.
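The parallel-substitution behavior described above can be sketched as follows (hypothetical Python encoding, with tuples as applications):

```python
def apply_subst(sigma, t):
    """Apply a parallel substitution: strings are variables or constants,
    tuples are applications. Images are inserted verbatim (no recursion)."""
    if isinstance(t, str):
        return sigma.get(t, t)
    return tuple(apply_subst(sigma, s) for s in t)

# (subst (x a y (f x)) p) applied to the clause (cl (+ (p x)) (+ (q y))):
sigma = {"x": "a", "y": ("f", "x")}
clause = {(True, ("p", "x")), (True, ("q", "y"))}
result = {(sign, apply_subst(sigma, t)) for sign, t in clause}

# The x inside the image of y is NOT substituted again:
assert result == {(True, ("p", "a")), (True, ("q", ("f", "x")))}
```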

TODO: lra (in own file)

TODO: datatypes (in own file)

TODO: instantiation

TODO: reasoning under quantifiers

Composite rule

The main structuring construct for proofs is steps. Its structure is (steps (<assumption>*) (<step>+)).

  • Each assumption is a pair (<name> <literal>). As a reminder, literals are of the shape (+ t) or (- t).

    These assumptions can be used in the steps by using (ref <name>) (see below), which is a trivial proof of the unit clause (cl <literal>).

  • Each step is one of:

    • Term definition ((deft <name> <term>)), which introduces an alias for a term. <name> must not be in the signature of the original problem.

      Logically speaking, after (deft c t), c and t are syntactically the same. c has no real existence, it is only a shortcut, so a proof of (cl (+ (= c t))) can be simply (refl t) (or (refl c)).

    • Reasoning step ((stepc <name> <clause> <proof>)): (stepc name clause proof) (where name must be fresh, i.e. defined nowhere else) introduces name as a shortcut for proof in the following steps.

      The result of proof must be exactly clause, otherwise the step fails. This means we can start using (ref name) in the following steps before we validate proof, since we know the result ahead. In a high-performance proof checker this is a good opportunity for parallelisation, where proof can be checked in parallel with other steps that make use of its result.

  • The result of (steps (A1 … Am) (S1 … Sn)), where the last step Sn has as conclusion a clause C with literals (cl l1 … ln), is the clause (cl l1 … ln ¬A1 … ¬Am), where ¬Ai is the literal Ai with its sign flipped.

    In particular, if Sn proves the empty clause, then (steps …) proves that at least one assumption must be false. If both Sn's conclusion and the list of assumptions are empty, then the result is the empty clause.

  • The list of assumptions can be useful either for subproofs, or to produce a proof of unsatisfiability for the SMTLIB v2.6 statement (check-sat-assuming (<lit>+)).
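The conclusion computation for steps can be sketched as follows (hypothetical Python; literals are (sign, term) pairs):

```python
def neg(lit):
    """Flip the sign of a (sign, term) literal."""
    sign, term = lit
    return (not sign, term)

def steps_conclusion(assumptions, last_clause):
    """Conclusion of (steps (A1 ... Am) (S1 ... Sn)): the literals of the
    last step's clause plus the negation of each named assumption."""
    return frozenset(last_clause) | {neg(lit) for _name, lit in assumptions}

# A sub-proof assuming (+ (box C)) and ending on (cl (+ (box D)))
# concludes (cl (- (box C)) (+ (box D))):
assumptions = [("hyp_bc", (True, ("box", "C")))]
last = {(True, ("box", "D"))}
assert steps_conclusion(assumptions, last) == \
    frozenset({(False, ("box", "C")), (True, ("box", "D"))})

# No assumptions and an empty last clause: the whole proof is a refutation.
assert steps_conclusion([], set()) == frozenset()
```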

Toplevel composite rule

The syntax allows the toplevel composite rule (i.e. the main proof, as opposed to a subproof) to be flattened into a list of S-expressions.

It's probably not very useful to have assumptions here1.

For example, a proof could look like:

(quip 1)

(step1)
(step2)
(step3)
…
1

famous last words.

Box rules

Box is a term constructor that allows abstracting a clause into an opaque constant term. This way, a whole clause can become a literal in another clause without having to deal with true nested clauses; see the section on terms for more details.

First, a few remarks.

  • (box C) and (box D) are syntactically the same term if and only if C and D are the same clause, modulo renaming of free variables and reordering of literals.
  • (box C) is opaque and will not be traversed by congruence closure.

That said, we have a few rules to deal with boxes:

  • box-assume ((box-assume <clause>)): (box-assume c), where c is (cl l1 l2 … ln), proves the tautology (cl (- (box c)) l1 l2 … ln).

    If one erases box (which is, semantically, transparent), this corresponds to the tautology \( \lnot (\forall \vec{x}. C) \lor (\forall \vec{x}. C) \) where \( \vec{x} \) is the set of free variables in \( C \).

    The use-case for this rule is that we can assume C freely and use it in the rest of the proof, provided we keep an assumption (box C) around in a negative literal. Once we actually prove C we can discharge (box C).

  • box-proof ((box-proof <proof>)): given a proof p with conclusion C, this returns a proof whose conclusion is (cl (+ (box C))). Semantically it is merely the identity.

An interesting possibility offered by box is simplifications in a tactic framework. A simplification rule might take a goal clause A, and simplify it into a goal clause B. To justify this, the theorem prover might produce a proof (cl (- (box B)) l1 … ln) (assuming A is (cl l1 … ln)) which means \( B \Rightarrow A \). It might do that by starting with (box-assume B) and applying the rewrite steps backward to get back to the literals of A.

Once the goal B is proved, we obtain a proof of B which we can lift to a proof of (cl (+ (box B))) using box-proof; then we only have to do unit resolution on (cl (+ (box B))) and (cl (- (box B)) l1 … ln) to obtain (cl l1 … ln), i.e. the original goal A.

Another possibility is to box a full clause, and use it as an assumption in a sub-proof using steps. Then, box-assume can be used to make use of the assumption by resolving it with the (- (box C)) literal from box-assume.
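The first remark above, at least its literal-reordering part, can be sketched by identifying a box with the set of literals of its clause (hypothetical Python; free-variable renaming is left out):

```python
def box(clause):
    """Identify a box by its clause viewed as a SET of literals, so that
    literal order is irrelevant. (Renaming of free variables, which the
    format also allows, is not handled in this sketch.)"""
    return ("box", frozenset(clause))

c = [(False, ("forall", "P")), (True, ("p", "a"))]
d = [(True, ("p", "a")), (False, ("forall", "P"))]  # same literals, reordered

assert box(c) == box(d)                     # same box
assert box(c) != box([(True, ("p", "a"))])  # different clause, different box
```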

Full example

We're going to explore a bigger example: the proof of unsatisfiability of the SMTLIB problem QF_UF/eq_diamond/eq_diamond2.smt2.

It starts normally:

(quip 1 (steps () ((deft $t1 (= x0 y0))
(deft $t2 (= y0 x1))

followed by a bundle of term definitions for better sharing (note that the _tseitin_xxx symbols are actually introduced by Sidekick during clausification; they're not just proof printing artefacts):

(deft _tseitin_and_0 (and $t1 $t2))
(deft $t3 (= x0 z0))
(deft $t4 (= x1 z0))
(deft _tseitin_and_1 (and $t3 $t4))
(deft _tseitin_or_2 (or _tseitin_and_0 _tseitin_and_1))
(deft $t5 (= x0 x1))
(deft $t6 (not $t5))
(deft _tseitin_and_3 (and _tseitin_or_2 $t6))

and then the actual proof:

(stepc c0 (cl (- $t1) (- $t2) (+ $t5))
  (ccl (cl (- $t1) (- $t2) (+ $t5))))
(stepc c1 (cl (- _tseitin_and_3) (- $t5))
  (nn (bool-c and-e _tseitin_and_3 $t6)))
(stepc c2 (cl (+ _tseitin_and_3)) (assert _tseitin_and_3))
(stepc c3 (cl (- $t5)) (hres (@ c1) ((r1 (@ c2)))))
(stepc c4 (cl (- _tseitin_and_0) (+ $t2))
  (nn (bool-c and-e _tseitin_and_0 $t2)))
(stepc c5
  (cl (- _tseitin_or_2) (+ _tseitin_and_1) (+ _tseitin_and_0))
  (nn (bool-c or-e _tseitin_or_2)))
(stepc c6 (cl (- $t3) (- $t4) (+ $t5))
  (ccl (cl (- $t3) (- $t4) (+ $t5))))
(stepc c7 (cl (- $t4) (- $t3)) (hres (@ c6) ((r1 (@ c3)))))
(stepc c8 (cl (- _tseitin_and_1) (+ $t3))
  (nn (bool-c and-e _tseitin_and_1 $t3)))
(stepc c9 (cl (- _tseitin_and_1) (+ $t4))
  (nn (bool-c and-e _tseitin_and_1 $t4)))
(stepc c10 (cl (- _tseitin_and_1))
  (hres (@ c7) ((r $t4 (@ c9)) (r $t3 (@ c8)))))
(stepc c11 (cl (- _tseitin_and_3) (+ _tseitin_or_2))
  (nn (bool-c and-e _tseitin_and_3 _tseitin_or_2)))
(stepc c12 (cl (+ _tseitin_or_2))
  (hres (@ c11) ((r1 (@ c2)))))
(stepc c13 (cl (+ _tseitin_and_0))
  (hres (@ c5) ((r1 (@ c12)) (r1 (@ c10)))))
(stepc c14 (cl (+ $t2)) (hres (@ c4) ((r1 (@ c13)))))
(stepc c15 (cl (- _tseitin_and_0) (+ $t1))
  (nn (bool-c and-e _tseitin_and_0 $t1)))
(stepc c16 (cl (+ $t1)) (hres (@ c15) ((r1 (@ c13)))))
(stepc c17 (cl)
  (hres (@ c0) ((r1 (@ c16)) (r1 (@ c14)) (r1 (@ c3))))))))

Note that the last step returns the empty clause, which means we did prove the problem to be unsatisfiable:

(stepc c17 (cl)
  (hres (@ c0) ((r1 (@ c16)) (r1 (@ c14)) (r1 (@ c3))))))))

Let's examine a few steps.

  • c0:

    (stepc c0 (cl (- $t1) (- $t2) (+ $t5))
      (ccl (cl (- $t1) (- $t2) (+ $t5))))
    

    which means that \( x0 = y0, y0 = x1 \vdash x0 = x1 \) is a tautology of the theory of equality. Indeed it is: it's the transitivity axiom.

  • c3:

    (stepc c3 (cl (- $t5)) (hres (@ c1) ((r1 (@ c2)))))
    

    Here, we have:

    • c1 is (cl (- _tseitin_and_3) (- (= x0 x1)))
    • c2 is (cl (+ _tseitin_and_3))
    • so, by resolution of c1 and c2 (note the use of "r1" since c2 is unit: we do not need to specify a pivot) we obtain (cl (- (= x0 x1))).
  • c4:

    (stepc c4 (cl (- _tseitin_and_0) (+ $t2))
      (nn (bool-c and-e _tseitin_and_0 $t2)))
    

    where _tseitin_and_0 is actually a name for (and $t1 $t2).

    The bool-c rule is valid since (cl (+ $t2) (- (and $t1 $t2))) is one of the basic tautologies for the and boolean connective. We use nn to make sure we eliminate weird literals like (- (not a)).

  • c10:

    (stepc c10 (cl (- _tseitin_and_1))
      (hres (@ c7) ((r $t4 (@ c9)) (r $t3 (@ c8)))))
    

    We have:

    • c7 is (cl (- $t4) (- $t3))
    • c9 is (cl (- _tseitin_and_1) (+ $t4))
    • c8 is (cl (- _tseitin_and_1) (+ $t3))

    And the hyper-resolution steps therefore go:

    clause                              step
    (cl (- $t4) (- $t3))                start with c7
    (cl (- $t3) (- _tseitin_and_1))     resolve with c9 on $t4
    (cl (- _tseitin_and_1))             resolve with c8 on $t3
  • c17:

    (stepc c17 (cl)
      (hres (@ c0) ((r1 (@ c16)) (r1 (@ c14)) (r1 (@ c3))))))))
    

    We prove the empty clause! Here we have:

    • c0 is (cl (- $t1) (- $t2) (+ $t5))
    • c16 is (cl (+ $t1))
    • c14 is (cl (+ $t2))
    • c3 is (cl (- $t5))

    So by starting with c0 and performing unit resolution steps we get:

    clause                              step
    (cl (- $t1) (- $t2) (+ $t5))        start with c0
    (cl (- $t2) (+ $t5))                unit-resolve with c16
    (cl (+ $t5))                        unit-resolve with c14
    (cl )                               unit-resolve with c3

More on boxing

If two clauses C and D are known to be equivalent, but are not syntactically the same, one can prove (cl (- (box C)) (+ (box D))) (and conversely) as follows.

Let's assume C is (cl (+ a) (+ b)), D is (cl (+ a2) (+ b2)), and we have some steps PA and PB proving the unit equations (cl (+ (= a a2))) and (cl (+ (= b b2))):

(steps
 ((hyp_bc (+ (box C))))
 (
  ; first use box-assume
  (stepc s1 (cl (- (box C)) (+ a) (+ b)) (box-assume C))

  ; discharge the box locally using the assumption
  (stepc s2 (cl (+ a) (+ b))
    (hres (init (ref s1)) (r1 (ref hyp_bc))))

  ; prove D
  (stepc s3 (cl (+ a2) (+ b2))
    (hres
      (init (ref s2))
      (p1 (ref PA)) ; a=a2
      (p1 (ref PB)) ; b=b2
      ))

  ; box D
  (stepc s4 (cl (+ (box D)))
    (box-proof (ref s3)))))

Because we add the negation of the assumptions, the result of this sub-proof is thus (cl (- (box C)) (+ (box D))).

We used the assumption mechanism to get rid of (- (box C)) locally to avoid boxing it along with the other literals when we apply box-proof.

Tools

Here is a list of tools that can produce or consume Quip proofs.

Proof-producing provers and SMT solvers

TODO: zipperposition

Proof-checking tools

  • Quip (WIP) is a proof checker in OCaml.

Others

TODO: proof-to-dot and others