A PROOF SYSTEM FOR GRAPH (NON)-ISOMORPHISM VERIFICATION

In order to apply canonical labelling of graphs and isomorphism checking in interactive theorem provers, these checking algorithms must either be mechanically verified or their results must be verifiable by independent checkers. We analyze a state-of-the-art algorithm for canonical labelling of graphs (described by McKay and Piperno) and formulate it in terms of a formal proof system. We provide an implementation that can export a proof that the obtained graph is the canonical form of a given graph. Such proofs are then verified by our independent checker and can be used to confirm that two given graphs are not isomorphic.


Introduction
An isomorphism between two graphs is a bijection between their vertex sets that preserves adjacency. Testing whether two given graphs are isomorphic is a problem that has been studied extensively since the 1970s (so much so that it has even been called a "disease" among algorithm designers). Since then, many efficient algorithms have been proposed and practically used (e.g., [BKL83, MoCS81, MP14, LAC11]). Most state-of-the-art algorithms and tools for isomorphism checking (e.g., nauty, bliss, saucy, and Traces) are based on canonical labellings [MP14]. They assign a unique vertex labelling to each graph, so that two graphs are isomorphic if and only if their canonical labellings are the same. Such algorithms can also compute automorphism groups of graphs. Graph isomorphism checking and the computation of canonical labellings and automorphism groups are useful in many applications. For example, they are used in many algorithms that enumerate and count isomorph-free families of some combinatorial objects [McK98]. Traditional applications of graph isomorphism testing are found in mathematical chemistry (e.g., identifying chemical compounds in chemical databases) and in electronic design automation (e.g., checking whether electronic circuits represented by a schematic and an integrated circuit layout are identical), and there are more recent applications in machine learning (e.g., [NAK16]).
In the past decades, we have witnessed the rise of automated and interactive theorem proving and its application in formalizing many areas of mathematics and computer science [Hal08, ABK+18]. Some of the most famous results in interactive theorem proving had significant combinatorial components. For example, the proof of the four-color theorem [Gon07] required the analysis of several hundred non-isomorphic graph configurations, in some cases involving more than 20 million different cases. The formal proof of Kepler's conjecture about the optimal packing of spheres in three-dimensional Euclidean space [HAB+17] required the identification, enumeration and analysis of several thousand non-isomorphic graphs [Nip11], which was done using a custom formally verified algorithm. Initial steps to verify generic algorithms for the isomorph-free enumeration of combinatorial objects have been taken (e.g., the Faradžev-Read enumeration has been formally verified within Isabelle/HOL [Mar20]). Several other such algorithms (e.g., [McK98]) would benefit from trustworthy graph isomorphism checking, automorphism group computation, and canonical labelling.
Isomorphism checking can be used in theorem proving applications only if it is somehow verified. The simple case is when an isomorphism between two graphs is explicitly constructed, since it is then easy to independently verify that the graphs are isomorphic. However, if an algorithm asserts that the given graphs are not isomorphic, that assertion is not easily independently verifiable. One approach would be to fully formally verify the isomorphism checking algorithm and its implementation in an interactive theorem prover. This would be a very difficult and tedious task. The other approach, which we are investigating in this paper, is to extend the isomorphism checking algorithm so that it can generate certificates for non-isomorphism. These certificates would then be verified by an independent checker, which is much simpler than the isomorphism checking algorithm itself. A similar approach has already been successfully used in SAT and SMT solving [WHJ14, HKM16, Lam20]. In particular, we will extend a canonical labelling algorithm to produce a certificate confirming that the canonical labelling has been correctly computed.
In the rest of the paper we will examine the scheme for canonical labelling and graph isomorphism checking proposed by McKay and Piperno [MP14] and formulate it in the form of a proof system (in the spirit of the proof systems for SAT and SMT [KG07, NOT06, MJ11]). We prove the soundness and completeness of the proof system and implement a proof checker for it. We develop our own implementation of McKay and Piperno's algorithm, which exports canonical labelling certificates (proofs in our proof system). These certificates are then checked independently by our proof checker. We also report experimental results. Finally, we formalize central results (the correctness of McKay and Piperno's abstract scheme and the soundness of our proof system) in Isabelle/HOL.

Background
Notation used in this text is mostly borrowed from McKay and Piperno [MP14]. Let $G = (V, E)$ be an undirected graph with the set of vertices $V = \{1, 2, \ldots, n\}$ and the set of edges $E \subseteq V \times V$. Let $\pi$ be a coloring of the graph's vertices, i.e., a surjective function from $V$ to $\{1, \ldots, m\}$. The set $\pi^{-1}(k)$ ($1 \le k \le m$) is the cell of the coloring $\pi$ corresponding to the color $k$ (i.e., the set of vertices of $G$ colored with the color $k$). We will also call this cell the $k$-th cell of $\pi$, and the coloring $\pi$ can be represented by the sequence $\pi^{-1}(1), \pi^{-1}(2), \ldots, \pi^{-1}(m)$ of its cells. The pair $(G, \pi)$ is called a colored graph.
A coloring $\pi'$ is finer than $\pi$ (denoted by $\pi' \preceq \pi$) if $\pi(u) < \pi(v)$ implies $\pi'(u) < \pi'(v)$ for all $u, v \in V$ (note that this implies that each cell of $\pi'$ is a subset of some cell of $\pi$). A coloring $\pi'$ is strictly finer than $\pi$ (denoted by $\pi' \prec \pi$) if $\pi' \preceq \pi$ and $\pi' \ne \pi$. A coloring $\pi$ is discrete if all its cells are singletons. In that case, the coloring $\pi$ is a permutation of $V$ (i.e., the colors of $\pi$ can be considered as unique labels of the graph's vertices).
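The definitions above translate directly into a few lines of code. The following is a minimal sketch, assuming a coloring is stored as a list of cells (sets of vertices) in color order; the helper names `color_of`, `is_finer` and `is_discrete` are illustrative and do not come from the paper.

```python
def color_of(cells):
    """Map each vertex to its 1-based color, given a coloring as a list of cells."""
    return {v: k for k, cell in enumerate(cells, start=1) for v in cell}


def is_finer(fine, coarse):
    """True if coarse(u) < coarse(v) implies fine(u) < fine(v) for all u, v."""
    cf, cc = color_of(fine), color_of(coarse)
    return all(cf[u] < cf[v] for u in cc for v in cc if cc[u] < cc[v])


def is_discrete(cells):
    """True if every cell is a singleton, so the coloring is a permutation."""
    return all(len(cell) == 1 for cell in cells)


pi = [{1, 2, 3}, {4, 5}]
pi_fine = [{2}, {1, 3}, {4, 5}]      # splits the first cell of pi
pi_other = [{4}, {1, 2, 3}, {5}]     # reorders colors, so it is not finer
```

Here `is_finer(pi_fine, pi)` holds while `is_finer(pi_other, pi)` does not, because vertex 4 would have to receive a larger color than vertices 1, 2 and 3.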

McKay and Piperno's scheme
The graph isomorphism algorithm considered in this text is described by McKay and Piperno in [MP14]. The central idea is to construct the canonical form of a graph $(G, \pi)$ (denoted by $C(G, \pi)$), which is a colored graph obtained by relabelling the vertices of $G$ (i.e., there is a permutation $\sigma \in S_n$ such that $C(G, \pi) = (G, \pi)^\sigma$), such that the canonical forms of two graphs are identical if and only if the graphs are isomorphic. In this way, the isomorphism of two given graphs can be checked by computing the canonical forms of both graphs and then comparing them for equality.
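To make the idea of "compare canonical forms for equality" concrete, here is a deliberately naive sketch. It is not McKay and Piperno's scheme: it assumes uncolored graphs given as edge lists over vertices 1..n and takes, as the canonical form, the lexicographically largest sorted edge list over all n! relabellings. The function names are hypothetical, and the approach is only feasible for very small graphs.

```python
from itertools import permutations


def brute_force_canonical_form(n, edges):
    """Largest sorted edge tuple over all relabellings of the vertices 1..n."""
    best = None
    for perm in permutations(range(1, n + 1)):
        sigma = dict(zip(range(1, n + 1), perm))
        relabelled = tuple(sorted(tuple(sorted((sigma[u], sigma[v])))
                                  for u, v in edges))
        if best is None or relabelled > best:
            best = relabelled
    return best


def isomorphic(n, edges1, edges2):
    """Two graphs are isomorphic iff their canonical forms coincide."""
    return (brute_force_canonical_form(n, edges1)
            == brute_force_canonical_form(n, edges2))


path = [(1, 2), (2, 3), (3, 4)]
relabelled_path = [(2, 1), (1, 4), (4, 3)]
star = [(1, 2), (1, 3), (1, 4)]
```

The search tree described in the rest of this section serves the same purpose, but explores only a small, carefully pruned subset of the relabellings.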
McKay and Piperno describe an abstract framework for computing canonical forms. It is parameterised by several abstract functions that, when fixed, yield a concrete, deterministic algorithm for computing canonical forms. Any choice of these functions that satisfies certain properties (given as axioms) will result in a correct algorithm.
In the abstract framework, the canonical form of a graph is computed by constructing the search tree of a graph $(G, \pi_0)$, where $\pi_0$ is the initial coloring of the graph $G$. This tree is denoted by $\mathcal{T}(G, \pi_0)$. Each node of this tree is associated with a unique sequence $\nu = [v_1, v_2, \ldots, v_k]$ of distinct vertices of $G$. The root of the tree is assigned the empty sequence $[\,]$. The length of a sequence $\nu$ is denoted by $|\nu|$, and the prefix of length $i$ of a sequence $\nu$ is denoted by $[\nu]_i$. The subtree of $\mathcal{T}(G, \pi_0)$ rooted at a node $\nu$ is denoted by $\mathcal{T}(G, \pi_0, \nu)$. A permutation $\sigma \in S_n$ acts on search (sub)trees by acting on each of the sequences associated with its nodes.
Each node $\nu$ of the search tree is also assigned a coloring of $G$, denoted by $R(G, \pi_0, \nu)$. The refinement function $R$ is one of the parameters that must satisfy the following three properties for the algorithm to be correct:
(R1) The coloring $R(G, \pi_0, \nu)$ must be finer than $\pi_0$.
(R2) It must individualize the vertices from $\nu$, i.e., each vertex from $\nu$ must be in a singleton cell of $R(G, \pi_0, \nu)$.
(R3) The function $R$ must be label invariant, that is, it must hold that $R(G, \pi_0, \nu)^\sigma = R(G^\sigma, \pi_0^\sigma, \nu^\sigma)$ for each $\sigma \in S_n$.
If the coloring assigned to a node $\nu$ is discrete, then this node is a leaf of the search tree. Otherwise, we choose some non-singleton cell of the coloring $R(G, \pi_0, \nu)$, which we call the target cell and denote by $T(G, \pi_0, \nu)$. The function $T$ is called the target cell selector and is another parameter that must satisfy the following properties for the algorithm to be correct:
(T1) For leaves of the search tree, $T$ returns an empty cell.
(T2) For inner nodes of the search tree, $T$ returns some non-singleton cell of $R(G, \pi_0, \nu)$.
(T3) The function $T$ must be label invariant, i.e., it must hold that $T(G, \pi_0, \nu)^\sigma = T(G^\sigma, \pi_0^\sigma, \nu^\sigma)$ for each $\sigma \in S_n$.
If $T(G, \pi_0, \nu) = \{w_1, \ldots, w_m\}$, then the children of the node $\nu$ are the nodes $[\nu, w_1], \ldots, [\nu, w_m]$. Note that the coloring of each child of the node $\nu$ will individualize at least one more vertex, which guarantees that the tree $\mathcal{T}(G, \pi_0)$ will be finite. Note also that if the order of the child nodes is fixed (and this can be achieved by assuming that the cell elements $w_1$ to $w_m$ are given in ascending order), then once the functions $R$ and $T$ are fixed, the tree $\mathcal{T}(G, \pi_0)$ is uniquely determined by the original colored graph $(G, \pi_0)$.
Since the coloring $\pi = R(G, \pi_0, \nu)$ is discrete if and only if the node $\nu$ is a leaf, and since discrete colorings can be viewed as permutations of the graph's vertices, a graph $G_\nu = G^\pi$ can be associated to any leaf $\nu$.
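As an illustration of how a leaf yields a graph, the sketch below relabels a graph by a discrete coloring. It assumes that the discrete coloring is given as a list of singleton cells, that vertex v receives the label equal to its color, and that graphs are compared through their adjacency matrices stored as tuples of rows; the names `as_permutation` and `permute_graph` are illustrative only.

```python
def as_permutation(discrete_cells):
    """View a discrete coloring (list of singleton cells) as a map vertex -> label."""
    return {next(iter(cell)): k for k, cell in enumerate(discrete_cells, start=1)}


def permute_graph(n, edges, pi):
    """Adjacency matrix (tuple of rows) of G^pi, where vertex v is relabelled pi[v]."""
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        a, b = pi[u] - 1, pi[v] - 1
        matrix[a][b] = matrix[b][a] = 1
    return tuple(tuple(row) for row in matrix)


edges = [(1, 2), (2, 3)]                       # the path 1 - 2 - 3
leaf_a = as_permutation([[1], [2], [3]])       # identity labelling
leaf_b = as_permutation([[2], [1], [3]])       # the middle vertex becomes vertex 1
g_a = permute_graph(3, edges, leaf_a)
g_b = permute_graph(3, edges, leaf_b)
```

Because the matrices are plain tuples, Python's built-in comparison already orders them lexicographically, which is one possible total order on the graphs attached to leaves.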
Example 3.1. In the example in Figure 1, the target cell is always the first non-singleton cell of $R(G, \pi_0, \nu)$ (in each internal search tree node there is only one non-singleton cell, therefore the target cell selector $T$ is trivial). The coloring $R(G, \pi_0, [\,])$ is equal to $\pi_0$, and $\pi = R(G, \pi_0, [\nu, w])$ introduces new cells with respect to $R(G, \pi_0, \nu)$ only by individualizing $w$. The resulting colorings are shown under each node in Figure 1 (as sequences of cells, which are sets of vertices).
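A search tree of the kind used in this example can be enumerated with a short recursive generator. The sketch below assumes the simplified refinement of the example (individualization only, no further splitting), a three-vertex graph with a single initial cell, and the first non-singleton cell as the target cell; the function names are illustrative.

```python
def individualize(cells, v):
    """Replace the cell W containing v by {v} and W minus {v}, in that order."""
    out = []
    for cell in cells:
        if v in cell and len(cell) > 1:
            out.append([v])
            out.append([u for u in cell if u != v])
        else:
            out.append(list(cell))
    return out


def target_cell(cells):
    """First non-singleton cell, or an empty list for a discrete coloring."""
    return next((c for c in cells if len(c) > 1), [])


def search_tree(cells, nu=()):
    """Yield (nu, coloring) for every node; children in ascending vertex order."""
    yield nu, cells
    for w in sorted(target_cell(cells)):
        yield from search_tree(individualize(cells, w), nu + (w,))


nodes = list(search_tree([[1, 2, 3]]))
leaves = [(nu, cells) for nu, cells in nodes if all(len(c) == 1 for c in cells)]
```

With one initial cell of three vertices the tree has ten nodes, and its six leaves carry the six permutations of the vertex set.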
Each node $\nu$ of $\mathcal{T}(G, \pi_0)$ is also assigned a node invariant, denoted by $\phi(G, \pi_0, \nu)$. Node invariants map tree nodes to elements of some totally ordered set (usually lexicographically ordered sequences of numbers).
The node invariant function $\phi$ is another parameter of the algorithm, and must satisfy the following conditions for the algorithm to be correct:
for some label invariant function $f$ that maps the tree nodes to elements of a totally ordered set. Then $\phi$ trivially satisfies the axioms $\phi$1 and $\phi$3, assuming the lexicographical order. To ensure that the axiom $\phi$2 also holds, for each leaf node $\nu$ we must append $g(G_\nu)$ to this list (i.e., for leaves we define $\phi(G, \pi_0$), where $g$ is some injective function that encodes a graph by an element of some totally ordered set (e.g., it could be some representation of the graph's adjacency matrix).
Figure 1: A search tree that generates all permutations of the vertex set. In each search tree node, the colored graph (each graph node contains the number denoting its color), the sequence $\nu$, and the coloring $R(G, \pi_0, \nu)$ (given as a sequence of its cells) are shown.
To simplify the formal setup and avoid the need of appending the graph representation $g(G_\nu)$ to the list of values of the function $f$, we can slightly change the first two axioms for $\phi$ (the axiom $\phi$3 remains unchanged): for each node $\nu$ and for each permutation $\sigma$. In this setup, we only require that $\phi$ behaves like a lexicographic order, and omit the graph comparison present in the original axiom $\phi$2. Now, we can define the node invariant function $\phi$ uniformly, for all nodes $\nu$, as $\phi(G, \pi_0$ and the new axioms trivially hold.
Let $\phi_{\max} = \max\{\phi(G, \pi_0, \nu) \mid \nu \text{ is a leaf}\}$. A maximal leaf is any leaf $\nu$ such that $\phi(G, \pi_0, \nu) = \phi_{\max}$. Notice that with our new axioms there can be multiple maximal leaves and, contrary to the original axioms, their corresponding graphs may be distinct. Let us assume that there is a total ordering defined on graphs (for instance, we can compare their adjacency matrices lexicographically). Let $G_{\max} = \max\{G_\nu \mid \nu \text{ is a maximal leaf}\}$ (that is, $G_{\max}$ is a maximal graph assigned to some maximal leaf). Again, there may be more than one maximal leaf corresponding to $G_{\max}$. Among them, the one labeled with the lexicographically smallest sequence will be called the canonical leaf of the search tree $\mathcal{T}(G, \pi_0)$, and will be denoted by $\nu^*$. Let $\pi^* = R(G, \pi_0, \nu^*)$ be the coloring of the leaf $\nu^*$. The graph $G^{\pi^*} = G_{\max}$ that corresponds to the canonical leaf $\nu^*$ is the canonical form of $(G, \pi_0)$, that is, we define $C(G, \pi_0) = (G^{\pi^*}, \pi_0^{\pi^*})$. So, in order to find the canonical form of the graph, we must find the canonical leaf of the search tree.
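The three-stage choice of the canonical leaf (maximal invariant, then maximal graph, then lexicographically smallest sequence) can be sketched as follows. The sketch assumes that the leaves have already been collected as triples of a vertex sequence, a node invariant and a graph encoding, all comparable with Python's built-in ordering; the sample data is made up for illustration and does not correspond to any figure in the paper.

```python
def canonical_leaf(leaves):
    """Pick the canonical leaf from (nu, phi, graph) triples."""
    # Step 1: keep only the leaves with the maximal node invariant.
    phi_max = max(phi for _, phi, _ in leaves)
    maximal = [leaf for leaf in leaves if leaf[1] == phi_max]
    # Step 2: among those, keep the leaves whose graph is maximal.
    g_max = max(graph for _, _, graph in maximal)
    candidates = [leaf for leaf in maximal if leaf[2] == g_max]
    # Step 3: break remaining ties by the lexicographically smallest sequence.
    return min(candidates, key=lambda leaf: leaf[0])


# Hypothetical leaves: (sequence nu, invariant phi, graph encoding).
sample_leaves = [
    ((1, 2), [2, 1], (0, 1, 1)),
    ((2, 1), [2, 2], (0, 1, 0)),
    ((2, 3), [2, 2], (1, 1, 0)),
    ((3, 2), [2, 2], (1, 1, 0)),
]
best = canonical_leaf(sample_leaves)
```

In this made-up data the leaf `(1, 2)` is discarded by its smaller invariant, `(2, 1)` by its smaller graph, and the tie between `(2, 3)` and `(3, 2)` is resolved in favor of `(2, 3)`.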
Note that once the node invariant function $\phi$ and the graph ordering are fixed, the canonical leaf $\nu^*$ and the canonical form $C(G, \pi_0)$ are uniquely defined based only on the initial colored graph $(G, \pi_0)$.
Example 3.2. Continuing the previous example, let $f(G, \pi_0, \nu)$ be the maximal vertex degree of the cell $R(G, \pi_0, \nu)^{-1}(1)$ and let $\phi(G, \pi_0, \nu)$ be defined as the sequence . Note that $f$ is label invariant, implying that $\phi$ is a valid node invariant. The node invariants for the nodes of the search tree from Figure 1, the adjacency matrices of the graphs corresponding to the leaves of the tree, and the canonical leaf of the tree are shown in Figure 2.
Note that although the abstract scheme described in this section is slightly modified compared to the original description given in [MP14] (mainly concerning the axioms for the function $\phi$ and the definition of the canonical form), the main result stated in Theorem 3.3 still holds. A proof of the theorem in our context is provided in the appendix of this paper. We have also formalized the described abstract scheme and formally proved Theorem 3.3 in Isabelle/HOL.
3.1. Search tree pruning. Suppose that the search tree $\mathcal{T}(G, \pi_0)$ is constructed. In order to find its canonical leaf efficiently, the search tree is pruned by removing the subtrees for which it is somehow deduced that they cannot contain the canonical leaf. There are two main types of pruning mentioned in [MP14] that are of interest here.
The other type of pruning is based on automorphisms: if there are two nodes $\nu'$ and $\nu''$ such that $|\nu'| = |\nu''|$ and $\nu' <_{lex} \nu''$ (where $<_{lex}$ denotes the lexicographic order relation), and there is an automorphism $\sigma \in Aut(G, \pi_0)$ such that ${\nu'}^\sigma = \nu''$, then the subtree $\mathcal{T}(G, \pi_0, \nu'')$ can be pruned (operation $P_C$ in [MP14]).$^2$
In practical implementations, the search tree is not explicitly constructed, and pruning is done as it is traversed, as soon as possible. When a subtree is marked for pruning, its traversal is completely avoided, which is the main source of efficiency gains. The algorithm keeps track of the current canonical leaf candidate, which is updated whenever a "better" leaf is discovered.
Example 3.4. The search tree in Figure 2 can be pruned with either of the two types of pruning operations. For example, nodes with the node invariant $[2, 1]$ can be pruned, because a node with the node invariant $[2, 2]$ is present in the tree and $[2, 1] < [2, 2]$. Some nodes may also be pruned due to the automorphism $(a\ c)$, as shown in Figure 3.
We have formalized pruning operations in Isabelle/HOL and proved their correctness.
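The side conditions of automorphism-based pruning are simple to check mechanically. The sketch below assumes an undirected graph given as an edge list, a coloring given as a map from vertices to colors, and a candidate permutation given as a dictionary; `is_automorphism` and `can_prune` are illustrative names, and the three-vertex path stands in for the graph of the figures.

```python
def is_automorphism(edges, colors, sigma):
    """True if the bijection sigma preserves both the edge set and the colors."""
    edge_set = {frozenset(e) for e in edges}
    image = {frozenset((sigma[u], sigma[v])) for u, v in edges}
    return image == edge_set and all(colors[sigma[v]] == colors[v] for v in colors)


def can_prune(nu1, nu2, sigma):
    """Condition for pruning the subtree at nu2: |nu1| = |nu2|,
    nu1 is lexicographically smaller than nu2, and sigma maps nu1 to nu2."""
    return (len(nu1) == len(nu2)
            and nu1 < nu2
            and tuple(sigma[v] for v in nu1) == nu2)


edges = [(1, 2), (2, 3)]                 # the path 1 - 2 - 3
colors = {1: 1, 2: 1, 3: 1}              # a single initial color
swap_ends = {1: 3, 2: 2, 3: 1}           # exchanges the two end vertices
not_auto = {1: 2, 2: 1, 3: 3}            # does not preserve the edge set
```

Python compares tuples lexicographically, so `nu1 < nu2` matches the order used in the pruning condition. With `swap_ends` verified as an automorphism, the subtree rooted at `(3,)` may be pruned in favor of the one rooted at `(1,)`.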

Proving canonical labelling and non-isomorphism
To verify that two graphs are indeed non-isomorphic, we can either unconditionally trust the algorithm that computed the canonical forms (which turned out to be different), or extend the algorithm to produce a certificate (a proof) for each canonical form it computes, which can be independently verified by an external tool. A certificate is a sequence of easily verifiable steps that proves that the graph computed by the algorithm is equal to the canonical form $C(G, \pi_0)$ as defined by McKay and Piperno's scheme. The proof system presented in this paper assumes that the scheme is instantiated with a concrete refinement function, target cell selector, and node invariant function, denoted respectively by $R$, $T$, and $\phi$. These functions are specified in the following text, and are similar to the functions used in real implementations. This approach assumes the correctness of McKay and Piperno's abstract scheme. This meta-theory has been well studied in the literature [MoCS81, MP14], and we have formalized it in Isabelle/HOL. It is also assumed that our concrete functions $R$, $T$ and $\phi$ satisfy the corresponding axioms (and this is explicitly proved in the following text and in our Isabelle/HOL formalization). Therefore, the soundness of the approach is based on Theorem 3.3, instantiated for this particular choice of $R$, $T$ and $\phi$.
$^2$ In [MP14], the operation $P_B$ is also considered, which removes a node whose invariant is different from the invariant of the node on the same level on the path to a predefined leaf $\nu_0$. This type of pruning is only used to discover the automorphism group of the graph, but not for the discovery of its canonical form, so it is not considered in this work.
On the other hand, the correctness of the implementation of the algorithm that computes the canonical form of a given graph is neither assumed nor verified. Instead, for the given colored graph $(G, \pi_0)$ it generates a colored graph $(G', \pi')$ and a proof $P$ certifying that $(G', \pi')$ is the canonical form of the graph $(G, \pi_0)$ (i.e., $C(G, \pi_0) = (G', \pi')$). The certificate $P$ is verified by an independent proof checker. Note that the implementation of the proof checker is much simpler than the implementation of the algorithm for computing the canonical form, so it can be much more easily trusted or verified within a proof assistant.
4.1. Refinement function $R$. In this section we will define a specific refinement function $R$ and prove its correctness, i.e., prove that it satisfies axioms R1-R3.
We say that the coloring $\pi$ is equitable if for every two vertices $v_1$ and $v_2$ of the same color, and every color $k$ of $\pi$, the numbers of vertices of color $k$ adjacent to $v_1$ and to $v_2$ are equal. As in [MP14], we define the coloring $R(G, \pi_0, \nu)$ as the coarsest equitable coloring finer than $\pi_0$ that individualizes the vertices from the sequence $\nu$. Such a coloring is obtained by individualizing the vertices from $\nu$, and then making the coloring equitable by an iterative procedure [MP14]. In each iteration we choose a cell $W$ from the current coloring $\pi$ (called a splitting cell), and then partition all the cells of $\pi$ with respect to the number of adjacent vertices from $W$. This process is repeated until an equitable coloring is achieved. It is known [MP14] that such a coloring is unique up to the order of its cells. Thus, in order to fully determine the coloring $R(G, \pi_0, \nu)$, we must fix this order in some way. The order of the cells depends on the order in which the splitting cells are chosen and the order in which the obtained partitions of a split cell are inserted into the resulting coloring. In our setting, we use the following strategy:
• We always choose the splitting cell that is the first in the current coloring $\pi$ (i.e., the cell corresponding to the smallest color of $\pi$) that has some effect (causes splitting of some cell).
• The obtained fragments of a split cell $X$ are ordered by the number of adjacent vertices from the current splitting cell $W$ (ascending). After the fragments are ordered in this way, the first fragment with the maximum possible size is moved to the end of the sequence, preserving the order of the other fragments.
An efficient procedure that computes $R(G, \pi_0, \nu)$ is given in Algorithm 2.
This procedure first constructs the initial equitable coloring, and then individualizes vertices from $\nu$ one by one, making the coloring equitable after each step. Individualization of a vertex $v$ is done by replacing the cell $W$ containing $v$ in the current coloring $\pi$ with two cells $\{v\}$ and $W \setminus \{v\}$ in that order. In order to obtain an equitable coloring finer than the current coloring $\pi$, the procedure make_equitable$(G, \pi, \alpha)$ is invoked (Algorithm 1), where $\alpha$ is the set of splitting cells to use (a subset of the set of cells of $\pi$). In the case of the initial equitable coloring (when $\nu$ is the empty sequence), $\alpha$ contains all the cells of $\pi_0$. Otherwise, $\alpha$ contains only the cell $\{v\}$, where $v$ is the vertex that is individualized last. The procedure make_equitable$(G, \pi, \alpha)$ implements the iterative algorithm given in [MP14], respecting the order of cells that is described in the previous paragraph.
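The definition of an equitable coloring can be checked directly, which is useful as a sanity test for any refinement procedure. The following sketch assumes the graph is given as a dictionary mapping each vertex to the set of its neighbours and the coloring as a list of cells; the name `is_equitable` is illustrative.

```python
def is_equitable(adj, cells):
    """True if any two vertices of the same cell have equally many
    neighbours in every cell of the coloring."""
    for cell in cells:
        for other in cells:
            other_set = set(other)
            counts = {len(adj[v] & other_set) for v in cell}
            if len(counts) > 1:
                return False
    return True


# The path 1 - 2 - 3 as an adjacency dictionary.
adj = {1: {2}, 2: {1, 3}, 3: {2}}
```

For the path, the single-cell coloring `[[1, 2, 3]]` is not equitable (the middle vertex has two neighbours, the end vertices one), whereas `[[2], [1, 3]]` is.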
Algorithm 1 make_equitable$(G, \pi, \alpha)$
Require: $(G, \pi)$ is a colored graph
Require: $\alpha$ is a subset of cells from $\pi$
Ensure: $\pi'$ is the coarsest equitable coloring of $G$ finer than $\pi$ obtained from $\pi$ by partitioning its cells with respect to the cells from $\alpha$
begin
    $\pi' = \pi$
    while $\pi'$ is not discrete and $\alpha$ is not empty do
        {let $W$ be the first cell of $\pi'$ that belongs to $\alpha$}
        remove $W$ from $\alpha$
        for each non-singleton cell $X$ of $\pi'$ do
            partition $X$ into $X_1, X_2, \ldots, X_k$, according to the number of adjacent vertices from $W$
            {assume that the sequence $X_1, X_2, \ldots, X_k$ is sorted by the number of adjacent vertices from $W$ (ascending)}
            {let $j$ be the smallest index such that $|X_j|$ is maximal}
            replace $X$ in $\pi'$ by $X_1, \ldots, X_{j-1}, X_{j+1}, \ldots, X_k, X_j$ {in that order (in situ)}
            for each $X_i$ in $\{X_1, \ldots, X_{j-1}, X_{j+1}, \ldots, X_k\}$ do
                add $X_i$ to $\alpha$
            if $X \in \alpha$ then
                replace $X$ by $X_j$ in $\alpha$
    return $\pi'$
end
The next two lemmas state some properties of the procedure make_equitable$(G, \pi, \alpha)$, needed for proving the correctness of the refinement function itself.
Using the previous two lemmas, we can prove the following lemma, which states that the refinement function $R$ satisfies the properties required by McKay and Piperno's scheme.
The cells that are used for splitting are determined by the set $\alpha$, so the algorithm does not need to perform any significant additional work to find the next splitting cell. The next lemma shows that the cell $W$ chosen as a splitting cell in an arbitrary iteration of the while loop is indeed the first cell of $\pi'$ for which the partitioning of $\pi'$ with respect to $W$ would possibly have some effect on $\pi'$.

Algorithm 2 $R(G, \pi_0, \nu)$
Lemma 4.4. Let $W$ be the cell of $\pi'$ that is chosen as a splitting cell in some arbitrary iteration of the while loop of the procedure make_equitable$(G, \pi, \alpha)$. Let $W'$ be any cell of $\pi'$ that precedes $W$ in $\pi'$ (that is, it corresponds to a smaller color). Then $W'$ does not cause any splitting, i.e., for any two vertices of the same color in $\pi'$ the numbers of their adjacent vertices from $W'$ are equal.
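The splitting strategy (first effective cell, fragments in ascending order of neighbour counts, first largest fragment moved last) can also be realized without the set of pending cells, by simply probing the cells in order. The sketch below is such an unoptimized variant, assuming an adjacency dictionary and colorings as lists of cells; indices are 0-based here, and the names `split` and `make_equitable_naive` are illustrative rather than the authors' code.

```python
def split(adj, cells, i):
    """Partition every cell by the number of neighbours in the i-th cell."""
    w = set(cells[i])
    out = []
    for cell in cells:
        groups = {}
        for v in cell:
            groups.setdefault(len(adj[v] & w), []).append(v)
        frags = [groups[k] for k in sorted(groups)]       # ascending counts
        # max() returns the first maximal element, i.e. the first largest fragment.
        j = max(range(len(frags)), key=lambda t: len(frags[t]))
        out.extend(frags[:j] + frags[j + 1:] + [frags[j]])
    return out


def make_equitable_naive(adj, cells):
    """Probe cells in order; restart from the first cell after every effective split."""
    cells = [list(c) for c in cells]
    i = 0
    while i < len(cells):
        refined = split(adj, cells, i)
        if len(refined) > len(cells):      # some cell was split
            cells, i = refined, 0
        else:
            i += 1
    return cells


adj = {1: {2}, 2: {1, 3}, 3: {2}}          # the path 1 - 2 - 3
```

On the path, `make_equitable_naive(adj, [[1, 2, 3]])` separates the middle vertex from the two end vertices, and the larger fragment `[1, 3]` ends up last.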
A naive implementation of the make_equitable procedure does not use the set $\alpha$ and tries to find the splitting cell by probing cells. It iterates through the cells of $\pi$ (in order), trying to use them for splitting either until splitting on the current cell has some effect (some cell is split), or until all cells are processed with no effect, proving that the current coloring is equitable. We use that simpler procedure within our proof system and proof checker, and we have formally proved its correctness in Isabelle/HOL.
4.2. Target cell selector $T$. We assume that the target cell $T(G, \pi_0, \nu)$ that corresponds to the node $\nu$ of the search tree is the first non-singleton cell of the coloring $R(G, \pi_0, \nu)$ if such a cell exists, or the empty set otherwise (this only happens in leaves). It trivially satisfies the axioms T1 and T2, while the following lemma states that the target cell chosen this way is label invariant, and therefore satisfies the axiom T3.
Lemma 4.5. For each $\sigma \in S_n$ it holds that $T(G^\sigma, \pi_0^\sigma, \nu^\sigma) = T(G, \pi_0, \nu)^\sigma$.
We have formalized $T$ and its properties in Isabelle/HOL.
4.3. Node invariant $\phi$. Node invariants $\phi$ are defined using a suitable function on colored graphs (denoted by $f_{hash}(G, \pi)$) which is label invariant, i.e., $f_{hash}(G^\sigma, \pi^\sigma) = f_{hash}(G, \pi)$ for each permutation $\sigma \in S_n$. Our approach to constructing such a function is based on quotient graphs [MP14]. The quotient graph that corresponds to a colored graph $(G, \pi)$ is the graph whose vertices are the cells of $\pi$, labeled by the cell number and size, and in which the edge that connects any two cells $W_1$ and $W_2$ is labeled by the number of edges between the vertices of $W_1$ and $W_2$ in $G$. It is easy to prove that quotient graphs are label invariant, so any hash function that depends only on the quotient graph will also be label invariant. In our setting, the node invariant $\phi(G, \pi_0, \nu)$ that corresponds to the node $\nu$ is a sequence of hash values computed by $f_{hash}$. The function $f_{hash}$ first constructs the quotient graph, and then applies some fixed hash function to it. The node invariants $\phi$ are ordered lexicographically.
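A quotient graph is easy to compute from an adjacency dictionary, and its label invariance can be observed on a small example. In the sketch below the quotient graph is encoded as a pair of tuples (cell sizes in color order, and edge counts for every pair of cells), and Python's built-in `hash` on that tuple stands in for the "fixed hash function"; both the encoding and the choice of hash are assumptions made for illustration.

```python
def quotient_graph(adj, cells):
    """Encode the quotient graph as (cell sizes, edge counts between cell pairs)."""
    sets = [set(c) for c in cells]
    sizes = tuple(len(s) for s in sets)

    def edges_between(a, b):
        total = sum(len(adj[v] & sets[b]) for v in sets[a])
        # Edges inside one cell are seen from both endpoints, so halve the count.
        return total // 2 if a == b else total

    m = len(sets)
    counts = tuple(edges_between(a, b) for a in range(m) for b in range(a, m))
    return sizes, counts


def f_hash(adj, cells):
    """Stand-in hash: any function of the quotient graph alone is label invariant."""
    return hash(quotient_graph(adj, cells))


adj = {1: {2}, 2: {1, 3}, 3: {2}}                 # the path 1 - 2 - 3
cells = [[2], [1, 3]]
# The same colored graph relabelled by 1 -> 2, 2 -> 3, 3 -> 1.
adj_sigma = {2: {3}, 3: {1, 2}, 1: {3}}
cells_sigma = [[3], [2, 1]]
```

Both colored graphs produce the same quotient graph encoding, and therefore the same hash value, even though their vertex labels differ.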
The following lemma proves that the axioms $\phi$1-$\phi$3 are satisfied by the node invariant function defined in this way.
Lemma 4.6. The following properties of the function $\phi$ hold: for each node $\nu$ and for each permutation $\sigma \in S_n$.
We have formalized the notion of quotient graphs, defined the function $\phi$ and formally proved its properties in Isabelle/HOL.
4.4. The proof system. A proof is a sequence of rule applications, each deriving one or more facts from the premises, which are the facts already derived by the preceding rules in the sequence. The proof can be built during the search tree traversal by exporting rule applications determined by the operations performed during the algorithm execution. However, it is possible to optimize this approach and emit proofs after the search tree has been completely processed; such proofs can be much shorter, avoiding the derivation of many facts that are not relevant for the final result. This will be discussed in more detail in Section 5.2.
The derived facts describe search tree nodes and their properties. We consider the following types of facts (and their associated semantics):
• the coloring $R(G, \pi_0, \nu)$ that corresponds to the node $\nu$ is equal to the coloring obtained by refining the coloring $\pi$, i.e., by invoking make_equitable$(G, \pi, \alpha)$, where $\alpha$ is the list of all cells of $\pi$
• $T(G, \pi_0, \nu) = W$: the target cell $T(G, \pi_0, \nu)$ that corresponds to the node $\nu$ is $W$, which is a non-empty set of vertices from $G$
• $\Omega$ orbits$(G, \pi_0, \nu)$: the set of graph vertices $\Omega$ is a subset of some orbit with respect to $Aut(G, \pi)$, where $\pi = R(G, \pi_0, \nu)$
• $\phi(G, \pi_0, \nu') = \phi(G, \pi_0, \nu'')$: the node invariants that correspond to the nodes $\nu'$ and $\nu''$ (where $|\nu'| = |\nu''|$) are equal
The goal of the proving process is to derive a fact of the form $C(G, \pi_0) = (G', \pi')$ (i.e., the final rule in the proof sequence should derive such a fact).
While some rules may be applied unconditionally, whenever the facts in their premises are already derived, there are also rules that require some additional conditions to be fulfilled in order to be applied. These additional conditions should be easily checkable during the proof verification. The following operations are used for formulating and checking those conditions (colorings are represented by lists of their cells):
• the coloring obtained from $\pi$ by individualizing the vertex $v$, i.e., by replacing the cell $W$ of $\pi$ containing $v$ by two cells $\{v\}$ and $W \setminus \{v\}$ in that order (in situ)
• split$(G, \pi, i)$: the coloring obtained by partitioning the cells of $\pi$ with respect to the number of adjacent vertices from the $i$-th cell of $\pi$. Each cell $X$ of $\pi$ is replaced by the cells $X_1, \ldots, X_k$ obtained by its partitioning, ordered in the same way as in Algorithm 1
• split$(G, \pi, i) \prec \pi$: the coloring obtained by partitioning the cells of $\pi$ with respect to the number of adjacent vertices from the $i$-th cell of $\pi$ is strictly finer than $\pi$ (some cells are split)
• split$(G, \pi, i) = \pi$: the coloring obtained by partitioning the cells of $\pi$ with respect to the number of adjacent vertices from the $i$-th cell of $\pi$ is equal to $\pi$ (no cells are split)
• $f_{hash}(G, \pi)$: the hash value that corresponds to the colored graph $(G, \pi)$ (i.e., to its quotient graph)
• $v^\pi$: the action of the permutation $\pi$ on the graph vertex $v$
• $G^\pi$: the action of the permutation $\pi$ on the graph $G$
• $\pi_0^\pi$: the action of the permutation $\pi$ on the coloring $\pi_0$
• $G^{\pi_1} > G^{\pi_2}$: true if and only if the graph $G$ permuted by the permutation $\pi_1$ is greater than the graph $G$ permuted by the permutation $\pi_2$ in the lexicographic order of their adjacency matrices
• $\nu' <_{lex} \nu''$: true if and only if the list of numbers $\nu'$ is lexicographically smaller than the list of numbers $\nu''$
• $\sigma \in Aut(G, \pi_0)$: true if and only if $\sigma$ is an automorphism of the colored graph $(G, \pi_0)$
Note that most of these operations are also used in the canonical form algorithm implementation, so their implementation might be shared between that implementation and the implementation of the proof checker. However, if the checker is not mechanically verified, it is better to use a less efficient but more straightforward implementation of those operations (since they must be trusted or verified only by manual code inspection). The loss of efficiency is tolerable, because the number of calls to these operations is usually much smaller in the checker than in the canonical form finding algorithm itself.
The rules of our proof system will be described in the following sections. When the rules are printed, premises will be displayed above a horizontal line, derived facts below the line, and additional conditions will be printed below (within where and provided clauses).
4.4.1. Refinement rules. The rules given in Figure 4 are used to verify the correctness of the colorings constructed by the refinement function $R$.
The rule ColoringAxiom formally derives that the empty sequence is a node of the search tree (it is its root), and that the equitable coloring assigned to this node is obtained by applying the function make_equitable (Algorithm 1) to the initial coloring $\pi_0$. This rule is exported once, at the very beginning of the proof construction.
The rule Individualize verifies colorings obtained by individualizing vertices (an instance of this rule can be exported after each individualization in Algorithm 2).
The rule SplitColoring verifies colorings obtained by partitioning with respect to a selected cell (an instance of this rule can be exported in each iteration of the while loop in Algorithm 1).Note that this rule has an additional condition that ensures that the correct splitting cell is used.This is ensured by explicitly checking that the splitting on any previous cell of π has no effect on π (this is in agreement with Lemma 4.4).
The rule Equitable formally derives that an equitable coloring is achieved.This is ensured by checking explicitly that splitting on any cell of π has no effect.This rule can be exported at the end of Algorithm 1.
Figure 4: Refinement rules (ColoringAxiom, Individualize, SplitColoring, Equitable).
Note that SplitColoring rule applications explicitly encode the intermediate colorings obtained at each iteration of the while loop in the make_equitable procedure. The proof format could be changed to make the proof objects much smaller by omitting SplitColoring rule applications. In that case, the checker would have to run the full make_equitable procedure (or at least its naive implementation) when checking the modified Equitable rule (shown in Figure 5).

Figure 5: The modified rule Equitable$(\nu, \pi, \pi')$, where $\alpha$ contains all cells of $\pi$.
The rule given in Figure 6 verifies that the target cell is correctly selected. This is ensured by explicitly checking that all previous cells are singletons, while the selected target cell has more than one element. An instance of this rule can be exported by the algorithm whenever the target cell selector is applied. Note that this rule also derives the facts which state that the sequences of the form $[\nu, v]$, where $v$ belongs to the target cell corresponding to the node $\nu$, are also nodes of the search tree. Such facts are important premises for the application of several other rules.
The rules given in Figure 7 are used for verifying that some nodes at the same level of the search tree have equal node invariants (recall that these are sequences of hash values). Those facts are important for verifying the lexicographic order over the node invariants, and for finding the maximal leaf which yields the canonical form.
The rule InvariantAxiom derives facts using the reflexivity of equality. It can be exported once for each node of the tree.
The rule InvariantsEqualSym derives facts using the symmetry of the equality.It can be exported for any two nodes at the same level of the tree for which the fact of invariant equality has already been derived.
The rule InvariantsEqual verifies that the node invariants (which are sequences of hash values) of two nodes at the same level are equal: their parents should have equal node invariants, and the hash values assigned to the graph G colored by the corresponding colorings of the two nodes should be equal (this is explicitly checked in the checker by computing f_hash values). This rule may be exported whenever two nodes at the same level in the tree have equal node invariants.

The rule OrbitsAxiom states that any singleton set of vertices is a subset of one of the orbits with respect to Aut(G, π) (where π = R(G, π_0, ν)). It may be exported once for each graph vertex and for each tree node.
The rule MergeOrbits is used for merging the orbits Ω_1 and Ω_2 whenever two vertices w_1 ∈ Ω_1 and w_2 ∈ Ω_2 correspond to each other with respect to some automorphism σ ∈ Aut(G, π). It can be exported whenever two orbits are merged at some tree node, after a new automorphism σ has been discovered.

Note that the pruning within the proof (i.e. the application of the pruning rules) has quite a different purpose compared to the pruning during the search. Namely, the sole purpose of pruning during the search (as described in [MP14]) is to make the tree traversal more efficient, by avoiding the traversal of unpromising branches. In that sense, there is no use in pruning a subtree that has already been traversed, or a leaf that has already been visited.
In other words, we can think of pruning during the search as a synonym for not traversing.
On the other hand, pruning within the proof (which will be referred to as formal pruning) has the purpose of proving that some node is not an ancestor of the canonical leaf. This must be done for all such nodes (whether they are traversed during the search or not), in order to be able to prove that the remaining nodes are the ancestors of the canonical leaf, i.e. that they belong to the path from the root of the tree to the canonical leaf. This enables deriving the canonical form, which is done by the rules described in the next section. As a consequence, applications of the pruning rules within the proof may or may not correspond to effective pruning operations during the search. Some formal pruning derivations will be performed retroactively, i.e. after the corresponding subtree has already been traversed. Also, it will be necessary to prune (non-canonical) leaves when they are visited.
The rule PruneInvariant verifies the invariant-based pruning (operation P_A in [MP14]). It justifies pruning of a node ν′ (and its subtree) when another node ν″ with a greater value of the node invariant is found on the same level of the search tree. Note that the pruned node can also be a leaf. Since node invariants are compared lexicographically, it suffices to show that the parent nodes of ν′ and ν″ have equal node invariants, and that ν′ has a smaller hash value than ν″ (this is explicitly checked in the checker by computing and comparing the hash values f_hash). This rule may be exported when such a pruning operation is done during the search. However, this rule may also perform retroactive pruning, in the case when the node ν′ being pruned belongs to the path from the root of the tree to the current canonical node candidate (which has already been traversed), and the node ν″ is visited later but has a greater node invariant (that is the moment when the search algorithm updates the current canonical node candidate).
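The lexicographic argument behind this rule can be made concrete with a small sketch. It assumes that a node invariant is modeled as a tuple of integer hash values, one per tree level; the function name and the sample hash values are hypothetical. In the proof system, the equality of the parents' invariants is a previously derived fact rather than something recomputed, so the prefix comparison below only stands in for that premise.

```python
def justifies_invariant_pruning(inv_pruned, inv_other):
    """Return True if the pruned node may be discarded in favour of the other.

    inv_pruned, inv_other -- node invariants as tuples of hash values,
                             for two nodes on the same tree level
    """
    if len(inv_pruned) != len(inv_other) or not inv_pruned:
        return False
    # Stand-in for the premise "the parents have equal node invariants".
    parents_equal = inv_pruned[:-1] == inv_other[:-1]
    # The last hash value of the pruned node must be strictly smaller.
    return parents_equal and inv_pruned[-1] < inv_other[-1]


pruned = (5, 3, 2)   # hypothetical hash values
other = (5, 3, 7)

print(justifies_invariant_pruning(pruned, other))  # True
print(pruned < other)                              # True
```

Python compares tuples lexicographically, so the second line confirms the conclusion the rule relies on: equal prefixes and a smaller last component give a smaller invariant overall. The converse does not hold; for example, (5, 1, 9) is lexicographically smaller than (5, 3, 7), but the rule does not apply directly because the parents' invariants differ.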
The rule PruneLeaf does not correspond to any type of pruning described in [MP14] as such (since no leaves are pruned during the search). It justifies pruning of a leaf, based on some special cases not covered by the PruneInvariant rule.
• The first case is when a pruned leaf ν′ has a node invariant equal to that of some non-leaf node ν″ such that |ν′| = |ν″|. In that case, it is obvious that all the leaves belonging to the subtree T(G, π_0, ν″) have greater node invariants than the leaf ν′, since node invariants are compared lexicographically.
• The second case is when the node ν″ is also a leaf with a node invariant equal to that of the pruned leaf ν′, but its corresponding graph is greater than the graph that corresponds to ν′ (which is explicitly checked by the checker).
In both cases, we can prune the leaf ν′ since it is not the canonical leaf (recall that the canonical leaf has the maximal value of the node invariant and has the maximal graph among such leaves).
The rules PruneAutomorphism and PruneOrbits verify the automorphism-based pruning (operation P_C in [MP14]). In the PruneAutomorphism rule, the checker needs to verify that the given permutation σ is an automorphism of (G, π_0), that it maps the list ν′ to ν″, and that ν′ is lexicographically smaller than ν″. The connection of the rule PruneOrbits to the automorphisms is more subtle. Namely, the fact Ω orbits(G, π_0, ν) implies that there is […] Therefore, this kind of pruning is also an instance of the P_C operation described in [MP14].
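The three conditions checked for PruneAutomorphism can be sketched as follows. This is an illustration under stated assumptions, not the checker's code: the graph is a dictionary of neighbour sets, the coloring is a dictionary from vertices to colors, tree nodes are lists of vertices, and the function names are hypothetical. It is also assumed here that the source sequence is the lexicographically smaller one and that its image is the node being pruned.

```python
def is_automorphism(adj, coloring, sigma):
    """Check that sigma is a colour-preserving automorphism of the graph."""
    vertices = set(adj)
    # sigma must be a permutation of the vertex set.
    if set(sigma) != vertices or set(sigma.values()) != vertices:
        return False
    # Colours must be preserved.
    if any(coloring[sigma[v]] != coloring[v] for v in vertices):
        return False
    # The image of each neighbourhood must be the neighbourhood of the image.
    return all({sigma[u] for u in adj[v]} == adj[sigma[v]] for v in vertices)


def prune_automorphism_ok(adj, coloring, sigma, source, image):
    """Check the three side conditions on (sigma, source, image)."""
    if not is_automorphism(adj, coloring, sigma):
        return False
    if [sigma[v] for v in source] != list(image):
        return False
    # Python lists of integers compare lexicographically.
    return list(source) < list(image)


# The 4-cycle 0-1-2-3-0 with a trivial colouring.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
trivial = {v: 0 for v in cycle}
reflection = {0: 0, 1: 3, 2: 2, 3: 1}   # swaps vertices 1 and 3

print(prune_automorphism_ok(cycle, trivial, reflection, [0, 1], [0, 3]))  # True
print(prune_automorphism_ok(cycle, trivial, reflection, [0, 3], [0, 1]))  # False
```

The second call fails only on the lexicographic condition: the reflection does map [0, 3] to [0, 1], but the source is not the smaller of the two sequences.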
Finally, the rule PruneParent derives the fact that a node cannot be an ancestor of the canonical leaf because none of its children is an ancestor of the canonical leaf. This rule performs retroactive pruning and plays an essential role in pruning already traversed branches which are shown not to contain the canonical leaf.

4.4.6. Rules for discovering the canonical leaf.
The rules given in Figure 10 are used to formalize the traversal of the remaining path of the search tree in order to reach the canonical leaf, after all the pruning is done. The rule PathAxiom states that the root node of the tree belongs to the path leading to the canonical leaf. It is exported once, when all the pruning is finished. The rule ExtendPath is then exported for each node on the path leading to the canonical leaf, and it states that if the node ν is on that path, and all its children except [ν, w] are pruned, then [ν, w] is also on that path. Finally, the rule CanonicalLeaf is exported at the very end of the proof construction, and it states that any leaf that belongs to the path that leads to the canonical leaf must be the canonical leaf itself (and, therefore, it corresponds to the canonical graph). We have formalized all the facts and their semantics, and all listed rules in Isabelle/HOL.

4.5. The correctness of the proof system.
In this section, we prove the correctness of the presented proof system. More precisely, we want to prove the soundness (i.e. that we can only derive correct canonical forms), and the completeness (i.e. that we can derive the canonical form of any colored graph). Detailed proofs are given in the Appendix, and the soundness proofs are also formalized in Isabelle/HOL. Note that the verified proofs of canonical forms only confirm that the derived canonical forms are correct, without any conclusions about the isomorphism of the given graphs. As already noted, we still rely on the correctness of McKay and Piperno's abstract algorithm itself, that is, that the canonical forms returned by the algorithm are equal if and only if the two graphs are isomorphic (we have also formalized this result in Isabelle/HOL).

4.5.1. Soundness.
For soundness, we need to prove that if our proof derives the fact C(G, π_0) = (G′, π′), then (G′, π′) is indeed the canonical form of (G, π_0) that should be returned by the instance of McKay's algorithm described in the previous sections. We prove the soundness of the proof system by proving the soundness of all its rules. We say that a fact is valid for the graph (G, π_0) if its associated semantics (as defined in Section 4.4) holds in (G, π_0). The soundness of the rules means that they derive valid facts from valid facts, which is the subject of the following lemma.
Lemma 4.7. For each of the rules in our proof system, if all its premises are valid facts for the graph (G, π_0), and all additional conditions required by the rule are fulfilled, then the facts that are derived by the rule are also valid facts for the graph (G, π_0).
Soundness of the rules implies the validity of all derived facts, which is stated by the following lemma.
Lemma 4.8. For any proof that corresponds to the graph (G, π_0), all the facts that it derives are valid facts for (G, π_0).
The immediate consequence of the previous lemma is the following theorem.
Theorem 4.9. Let (G, π_0) be a colored graph. Assume a proof that corresponds to (G, π_0) such that it derives the fact C(G, π_0) = (G′, π′). Then (G′, π′) is the canonical form of the graph (G, π_0).

4.5.2. Completeness.
The completeness of the proof system means that for any colored graph there is a proof that derives its canonical form. First, we prove the following lemmas which claim the completeness of the proof system with respect to particular types of facts.
Lemma 4.10. Let ν be a node of the search tree T(G, π_0) and let π be the coloring assigned to ν. Then the facts ν ∈ T(G, π_0) and R(G, π_0, ν) = π can be derived in our proof system. Furthermore, if π is not discrete, and W is the first non-singleton cell of π, then the fact T(G, π_0, ν) = W can also be derived.
Lemma 4.14. If a node ν is an ancestor of the canonical leaf of the tree T(G, π_0), then there is a proof that derives the fact on_path(G, π_0, ν).
Lemma 4.15. If π* is the coloring that corresponds to the canonical leaf ν* of the tree T(G, π_0), then there is a proof that derives the fact C(G, π_0) = (G^{π*}, π_0^{π*}).

Together, these lemmas imply the main completeness result given by the following theorem.
Theorem 4.16. Let (G, π_0) be a colored graph, and let (G′, π′) be its canonical form. Then there is a proof that corresponds to (G, π_0) deriving the fact C(G, π_0) = (G′, π′).

5. Implementation and evaluation

5.1. Proof format.
The proof is generated by writing the applied rules into a file. Each rule is encoded as a sequence of numbers that contains the parameters of the rule sufficient to reconstruct the rule within the proof checker. The exact sequences for each of the rules are given in Table 1, using notation consistent with that used in the rules' definitions. Note that each sequence starts with a rule code, that is, a number that uniquely determines the type of the encoded rule. Subsequent numbers in the sequence encode the parameters specific to the rule. Vertex sequences are encoded such that the length of the sequence precedes the members of the sequence. Vertices are encoded in a zero-based fashion, i.e. the vertex i is encoded with the number i − 1. Sets of vertices (orbits and cells) are encoded in the same way as vertex sequences, but it is additionally required that the vertices are listed in increasing order. Colorings and permutations are encoded as sequences of values assigned to vertices 1, 2, ..., n in that order (the colors are, like vertices, encoded in a zero-based fashion).
The whole proof is, therefore, represented by a sequence of numbers, where the first number is the number of vertices of the graph, followed by the sequences of numbers encoding the applied rules, in the order of their application. When such a sequence is written to the output file, each number is treated as a 32-bit unsigned integer and encoded as a variable-length UTF-8-style sequence of up to six bytes⁴ for compactness. Thus, the proof format is not human-readable.
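The number-level part of this encoding can be sketched in a few lines. The sketch follows the conventions stated above (length-prefixed sequences, zero-based vertices, sorted sets, vertex count first); the rule code 7 and the choice of parameters are hypothetical and are not taken from Table 1, and the byte-level serialization is not modeled.

```python
def encode_sequence(vertices):
    """Length-prefixed, zero-based encoding of a sequence of vertices 1..n."""
    return [len(vertices)] + [v - 1 for v in vertices]


def encode_set(vertices):
    """Same as a sequence, but members are listed in increasing order."""
    return encode_sequence(sorted(vertices))


def encode_rule(rule_code, *parts):
    """A rule is its code followed by its already-encoded parameters."""
    encoded = [rule_code]
    for part in parts:
        encoded.extend(part)
    return encoded


HYPOTHETICAL_RULE_CODE = 7   # illustrative only, not a code from Table 1

n = 5                        # number of vertices of the graph
rule = encode_rule(HYPOTHETICAL_RULE_CODE,
                   encode_sequence([3, 1]),   # a tree node [3, 1]
                   encode_set({4, 2, 5}))     # a cell or orbit {2, 4, 5}
proof = [n] + rule

print(proof)  # [5, 7, 2, 2, 0, 3, 1, 3, 4]
```

Because every sequence carries its own length, a reader of the number stream can recover rule boundaries without separators, provided it knows the parameter layout of each rule code.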

Vol. 19:1
5.2. Implementation details.
For the purpose of evaluation, we have implemented a prototypical proof checker in the C++ programming language. The implementation contains about 1600 lines of code (excluding the code for printing debug messages), but most of the code is quite simple and can be easily verified and trusted. The main challenge was to provide an efficient implementation of the derived facts database, since the checker must know which facts are already derived in order to check whether a rule application is justified. The facts are, just like rules, encoded and stored in memory as sequences of numbers. The checker offers support for two different implementations of the facts database. The first uses the C++ standard unordered_set container to store the sequences encoding the facts. This implementation is easier to trust, but is less memory efficient. The second implementation is based on radix tries. It uses significantly less memory, since the sequences that encode facts tend to have common prefixes. On the other hand, since it is an in-house solution, it may be considered harder to trust, especially if we take into account that it is the most complex part of the checker. However, our intensive code inspection and testing have not revealed any bugs so far. The implementation of radix tries contains about 150 additional lines of code.
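The memory argument for a prefix-sharing facts database can be illustrated with a minimal sketch. It uses a plain (uncompressed) trie built from Python dictionaries rather than a radix trie, and the class name `FactTrie` and the sample facts are hypothetical; it is not a model of the checker's C++ data structure.

```python
class FactTrie:
    """Stores sequences of integers so that common prefixes are shared."""

    _END = None  # marker key; never collides with integer components

    def __init__(self):
        self.root = {}
        self.node_count = 0  # number of stored components (trie edges)

    def add(self, fact):
        node = self.root
        for x in fact:
            if x not in node:
                node[x] = {}
                self.node_count += 1
            node = node[x]
        node[self._END] = True  # a complete fact ends here

    def __contains__(self, fact):
        node = self.root
        for x in fact:
            node = node.get(x)
            if node is None:
                return False
        return self._END in node


facts = [(1, 2, 0, 3), (1, 2, 0, 4), (1, 2, 1)]   # hypothetical encoded facts

trie = FactTrie()
for fact in facts:
    trie.add(fact)

print((1, 2, 0, 4) in trie)          # True
print((1, 2) in trie)                # False: only a prefix, not a stored fact
print(trie.node_count)               # 6
print(sum(len(f) for f in facts))    # 11
```

The three facts contain 11 numbers in total, but the trie stores only 6 components because the prefix (1, 2) is shared. A set of tuples gives the same membership answers and is simpler to trust, which mirrors the trade-off described above.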
We have also implemented morphi, a prototypical C++ implementation of the canonical form search algorithm, based on McKay and Piperno's scheme, instantiated with the concrete functions R, T and φ, as described in this paper. It is extended with the ability to produce proofs in the previously described format. The algorithm supports two strategies of proof generation.
In the first strategy, the proof is generated during the search, with rules being exported whenever the respective steps of the algorithm are executed.The proof generated in this way essentially represents a formalized step-by-step reproduction of the execution of the search algorithm.
In the second strategy, called the post-search strategy, the rules are exported after the search algorithm has finished and produced a canonical leaf and a set of generators of the automorphism group.The proof is then generated by initiating the search tree traversal once more, this time being able to utilize the pruning operations more extensively since the knowledge of the canonical leaf and discovered automorphisms is available from the start.
The proof checker implementation is much simpler than the implementation of the canonical form search algorithm. Namely, the canonical search implementation uses many highly optimized data structures and algorithms, while almost all proof checker data structures and algorithms are quite straightforward (usually brute-force). For example, our canonical search implementation includes an efficient data structure for representing colorings, a union-find data structure for representing orbits, and an efficient data structure for representing sets of permutations (and finding permutations that stabilize a given vertex set). It adapts to the graph size by using different integer types (8-bit, 16-bit or 32-bit). It uses a specialized memory allocation system (a stack-based memory pool). The refinement algorithm is specialized for cases when cells have 1 or 2 elements. The invariant is calculated incrementally (based on its value in the previous node). All those (and many more) implementation techniques are absent in the proof checker. If they were added, proof checking would become much more complex and harder to implement and verify. Since our experiments (see Subsection 5.3) show that even with the simplest implementation the proof checker is not much slower than the canonical form search, we opted to keep the proof checker implementation as simple as possible (and much simpler than the original algorithm).

5.3. Experimental evaluation.
The graph instances used for the evaluation of the approach are taken from the benchmark library provided by McKay and Piperno⁵, given in DIMACS format. We included all the instances from the library in our evaluation, except those instances whose initial coloring is not trivial (this is because our implementation currently does not support colored graphs as inputs). In total, our benchmark set consists of 1284 instances. The experiments were run on a computer with four AMD Opteron 6168 1.6GHz 12-core processors (that is, 48 cores in total), and with 94GB of RAM.
The evaluation is done using morphi. The first goal of the evaluation was to estimate how our implementation compares to the state-of-the-art isomorphism checking implementations. For this reason, we compared it to the state-of-the-art solver nauty (also based on McKay's general scheme for canonical labelling) on the same set of instances. Both solvers were executed on all 1284 instances, with a 60-second time limit per instance. For the sake of fair comparison, the proof generating capabilities were disabled in our solver during this stage of the evaluation.

Table 2: Results of evaluation of nauty and morphi on the entire benchmark set. The total number of instances in the set was 1284. Times are given in seconds. Time limit per instance was 60s. When the average times were calculated, 120s was assumed for unsolved instances.
The results are given in Table 2. Note that when the average solving times were calculated, twice the time limit was used for unsolved instances (that is, 120 seconds). This is the PAR-2 score, which is often used in SAT competitions. The results show that nauty is significantly faster on average, and it managed to solve 461 more instances in the given time limit. However, on solved instances, the average solving times are much closer to each other, which suggests that our solver is comparable to nauty on a large portion of the benchmark set (at least on those 709 instances that our solver managed to solve in the given time limit). A more detailed, per-instance comparison is given in Figure 11. The figure shows that, although most of the instances are solved faster by nauty, on a significant portion of them our solver morphi is still comparable (less than an order of magnitude slower), and there are several instances on which morphi performed better than nauty. Overall, we may say that morphi is comparable to the state-of-the-art solver nauty on the instances that it manages to solve, although nauty remains faster on average. This is very important, since it confirms that our prototypical implementation did not diverge too much from the modern efficient implementations of McKay's algorithm, making the approach presented in this paper relevant.
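The PAR-2 averaging described above can be written down as a short sketch. The function name and the sample running times are hypothetical and are not measurements from Table 2; unsolved instances are represented by `None`.

```python
def par2_average(times, limit):
    """Average running time where each unsolved instance counts as 2 * limit.

    times -- running times in seconds, or None for an unsolved instance
    limit -- the per-instance time limit in seconds
    """
    penalized = [t if t is not None and t <= limit else 2 * limit
                 for t in times]
    return sum(penalized) / len(penalized)


# Two solved and two unsolved instances with a 60 s limit (made-up numbers).
sample = [1.0, 5.0, None, None]

print(par2_average(sample, 60.0))  # 61.5
```

Here the two unsolved instances contribute 120 s each, so the average is (1 + 5 + 120 + 120) / 4 = 61.5 s, even though the solved instances average only 3 s. This is why the averages over all instances and over solved instances can differ so much.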
In the rest of this section, we provide the evaluation of the proof generation and certification, using our solver morphi (with the proving capabilities enabled), and our prototypical checker, also described in the previous section⁶. We consider two different versions of morphi, implementing the two different proof generating strategies explained in the previous section: morphi-d, which generates the proof during the search, and morphi-p, which uses the post-search strategy.

The instances used in this stage of evaluation are exactly those instances that our solver morphi without proof generating capabilities managed to solve in the time limit of 60 seconds (i.e. the instances selected in the previous stage of evaluation, 709 instances in total). Both versions of the solver morphi were run on all these 709 instances, this time without a time limit, with the proof generation enabled. In both cases, the generated proofs were verified using our prototypical checker.
The main result of the evaluation is that the proof verification was successful for all tested instances, i.e. for all generated proofs our checker confirmed their correctness.
For each instance we also measure several parameters that are important for estimating the impact of the proof generation and checking on the overall performance, as well as for comparing the two variants of the proof generating strategy.The solve time is the time spent in computing the canonical form.The prove time is the time spent in the proof generation.Notice that in case of morphi-d, we can only measure the sum of the solve time and the prove time, since the two phases are intermixed.The check time is the time spent in verifying the proof.We also measure the proof size.
Some interesting relations between the measured parameters on particular instances are depicted by the scatter plots given in Figure 12, and the corresponding minimal, maximal and average ratios are given in Table 3. We discuss these relations in more detail in the following paragraphs.
Solve time vs. prove time. The top-left plot in Figure 12 shows the relation between the solve time and the prove time for morphi-p. The solve time was about 89% of the prove time on average, which means that the proof generation tends to take slightly more time than the search. In other words, the invocation of the algorithm with proof generation enabled on average takes over twice as much time as the canonical form search alone. This was somewhat expected, since the proof is generated by performing the search once more, using the information gathered during the first tree traversal.

From the plots it is clear that the proof verification tends to take significantly more time than the search and the proof generation together, and this is especially noticeable on "harder" instances. The average (solve+prove)/check ratio was 0.558 for morphi-p, with the minimal ratio on all instances being 0.003. The phenomenon is even more prominent for morphi-d (0.219 average, 0.001 minimal). Such behavior can be explained by the fact that the checker is implemented with the simplicity of the code in mind (in order to make it more reliable), and not its efficiency. Most of the conditions in the rule applications are checked by brute force. On the other hand, the search algorithm's implementation is heavily optimized for best performance.
Comparing the proof generating strategies. The bottom-left, bottom-middle and bottom-right plots in Figure 12 compare morphi-p and morphi-d with respect to the solve+prove times, check times and proof sizes, respectively. The solve+prove times were similar on average (morphi-d being slightly faster, with the average solve+prove time equal to 17.7s, while for morphi-p it was 18.9s). Most of the points on the bottom-left plot are very close to the diagonal, almost evenly distributed on both sides of it. The average solve+prove time ratio was very close to 1 (1.07). This suggests that, as far as the time consumed by the algorithm is concerned, it does not matter which proof generation strategy is employed. On the other hand, the size of a morphi-p proof was about 67% of the size of the corresponding morphi-d proof on average, and a morphi-p proof was never greater than the corresponding morphi-d proof. Finally, the check time for proofs generated by morphi-p was about 55.5% of the check time for proofs generated by morphi-d on average. This means that the proof checking will be more efficient if we employ the post-search proof generation strategy. One of the main reasons for that is the significantly smaller average proof size, but this might not be the only reason. Namely, the check time does not depend only on the proof size, but also on the structure of the proof, since some rules are harder to check than others. The most expensive checks are required by the Equitable, InvariantsEqual, PruneInvariant and PruneLeaf rules, since they include time-consuming operations such as verifying the equitability of the coloring, calculating the hash function for colored graphs, or comparing the graphs. These operations tend to be more expensive as we move towards the leaves of the search tree, i.e. as the colorings become closer to discrete. In the post-search proof generating strategy, the prunings tend to happen higher in the search tree during the second tree traversal (when the proof is actually generated), lowering the proportion of these expensive rules in the proof.

False proofs.
It is important to stress that the checker does not only accept correct proofs, but that it indeed rejects false proofs as well. Some issues with the solver were found and resolved during development as a consequence of false proofs that had been rejected by the checker. For example, there were issues with the invariant calculation and the coloring refinement that led to incorrect canonical forms and proofs being produced for some instances. There was also an issue that caused the emission of false facts into the proof on some occasions, specifically during the traversal of a pruned subtree in search of automorphisms. As a part of further testing, several artificial bugs were introduced into the solver, all of which were successfully caught by the checker.⁷

Related work
Certifying algorithms in general. There are many applications in industry or science where a user may benefit from having an algorithm which produces certificates that can confirm the correctness of its outputs. McConnell et al. [MMNS11] go one step further, suggesting that every algorithm should be certifying, i.e. be able to produce certificates. They establish a general theoretical framework for studying and developing certifying algorithms, and provide a number of examples of practical algorithms that could be extended to produce certificates. The authors stress two important properties of a certifying algorithm that must be fulfilled: simplicity, which means that it is easy to prove that the (correct) certificates really certify the correctness of the obtained outputs, and checkability, which means that it is easy to check the correctness of a certificate. In both cases, the property of being easy is not strictly defined.
In the case of simplicity, it is expected that the fact that a certificate really proves the correctness of the corresponding output should be obvious and easily understandable. This may be considered a potential drawback of our approach, since the soundness of our proof system is not obvious and requires a non-trivial proof. However, we did our best to provide a soundness proof that is as precise as possible in the appendix of this paper, with the hope that it is convincing enough to the reader (we also prove completeness, i.e. the existence of a certificate for each input, which McConnell et al. [MMNS11] do not require to be simple to prove). We have also formalized the soundness proof within the Isabelle/HOL theorem prover, and we believe that the existence of a formal soundness proof is a good substitute for the simplicity property stressed by the authors.
As far as checkability is concerned, the authors suggest that an algorithm for certificate checking should either have a simple logical structure, such that its correctness can be easily established, or it should be formally verified [MMNS11]. We believe that our checker indeed has a simple logical structure, since its most complicated parts, such as calculating the split(G, π, i) function, are implemented using brute force, without any sophisticated data structures or programming techniques. Moreover, we have already argued (see Subsection 5.2) that its implementation is much simpler than the implementation of the canonical labelling algorithm itself, and can be more easily verified. In fact, we have already provided and formally verified an abstract proof checker specification in Isabelle/HOL. The remaining step is to refine this abstract specification into efficient executable code, which we plan to do in the future.
McConnell et al. [MMNS11] also suggest that the complexity of a certificate checking algorithm should preferably be linear with respect to the size of its input, and its input is composed of an input/output pair of the certifying algorithm, and of the corresponding certificate.In our case, each rule from the certificate is read and checked exactly once, but checking of some rules may require traversing the graph's adjacency matrix, so the complexity is not exactly linear.However, since the size of the proof is usually much greater than the size of the graph, the complexity may be considered to be almost linear.
Certifying algorithms in SAT solving.Certificate checking has been successfully used, for example, in certifying unsatisfiability results given by SAT or SMT solvers [WHJ14, HKM16, Lam20], and this enabled application of external SAT and SMT solvers from within interactive theorem provers [Web06,Böh12,BBP11].
The proof system used in the case of SAT solvers is very simple and is based on resolution. While early approaches considered simple resolution proofs [ZM03], which were very easy to check but much harder to produce, modern approaches are mostly based on so-called reverse unit propagation (RUP) proofs [Gel08], which are easy to generate but harder to check. While this fact violates the checkability property to some extent, it is widely adopted by the SAT solving community, and proof formats such as DRAT [WHJ14] became an industry standard in the field. Our proof system has a similar trait, since we retained some complex computations within the checker, in order to make our proofs smaller and easier to generate. As with RUP proofs, we consider this a good tradeoff.
Certifying pseudo-boolean reasoning. The SAT problem is naturally generalized to the pseudo-boolean (PB) satisfiability problem [RM21], where it is also useful to have certifying capabilities. Recently, a simple but expressive proof format for proving PB unsatisfiability was proposed by Elffers et al. [EGM+20], based on cutting planes [CCT87]. That proof format was then employed for expressing unsatisfiability proofs for different combinatorial search and optimization problems, such as constraint satisfaction problems involving the alldifferent constraint [EGM+20], the subgraph isomorphism problem [GMN21], and the maximum clique and the maximum common subgraph problem [GMM+20]. The main idea is to express some combinatorial problem P as a pseudo-boolean problem PB(P), and then to express the reasoning of a dedicated solver for P in terms of PB constraints implied by the constraints of PB(P) (such PB constraints are exported by the dedicated solver during the solving process). The obtained PB proof is then checked by an independent checker which relies on a small number of simple rules (such as addition of two PB constraints or multiplying/dividing a PB constraint by a positive constant), as well as on generalized RUP derivations [EGM+20]. The authors advocate the generality of their approach, enabling it to express the reasoning employed in solving many different combinatorial problems, using a simple and general language which does not know anything about the concepts and specific semantics of the considered problems.
In contrast, our proof system is tied to a particular problem, namely the problem of constructing canonical graph labellings, and to a particular class of algorithms, namely those that could be instantiated from the McKay/Piperno scheme [MP14].⁸ Citing McKay, the canonical forms returned by the algorithm considered in this paper are "designed to be efficiently computed, rather than easily described" [McK98]. Since the canonical labelling specification is given in terms of the algorithm, it seems that certification must also be tied to it. This is the main reason that led to our decision to develop an algorithm-specific proof format which is powerful enough to capture all the reasoning that appears in the context of this particular class of labelling algorithms.
The question remains whether the canonical labelling problem, like some other graph problems previously mentioned, may be encoded in terms of PB constraints in a way simple enough to be trusted (as noted by Gocht et al. [GMM+20], an encoding should be simple, since it is not formally verified, and any errors in the encoding may lead to "proving the wrong thing"). Moreover, the encoding should permit expressing the operations made by the canonical labelling algorithm in terms of PB constraints implied by the encoding. To the authors' knowledge, it is not obvious whether such an encoding exists. The main issue that we see in such an approach is that the canonical form of the graph, as defined by McKay and Piperno [MP14], is not simple to describe in advance using PB constraints, since it is defined implicitly by the canonical labelling algorithm itself. For instance, one should capture the notion of the "coarsest equitable coloring finer than a given coloring π", which is especially hard given that such a coloring is not unique, so we must fix one particular ordering of the cells (and that ordering must also be captured using the PB language). The case is similar with describing the hash functions used in calculating node invariants.

Conclusions and further work
We have formulated a proof system for certifying that a graph is a canonical form of another graph, which is a key step in verifying graph isomorphism. The proof system is based on the state-of-the-art scheme of McKay and Piperno, and we proved its soundness and completeness. Our algorithm implementation produces proofs that can be checked by an independent proof checker. The implementation of the proof checker is much simpler than the implementation of the core algorithm, so it is easier to trust.
One of the main problems with our approach is that our proof format is very closely related to the canonical labelling algorithm itself. Therefore, it is very difficult to use our proof format and proof checker for different approaches to isomorphism checking. On the other hand, the McKay/Piperno scheme is broad enough to cover several state-of-the-art algorithms, and we believe that our proofs and checkers can be easily adapted to certify all these algorithms. Our prototype implementation morphi is slower than state-of-the-art implementations such as nauty, but still comparable on many instances. We plan to make further optimizations to morphi. We are also considering adapting nauty's source code to emit proofs in our format, which is not trivial, as nauty has been developed for more than 30 years and is a very complex and highly optimized software system.
The main direction of our further work is to go one step further and fully integrate isomorphism checking into an interactive theorem prover. We have already taken important steps in this direction: we have formally defined McKay and Piperno's abstract scheme in Isabelle/HOL and formally proved its correctness (Theorem 3.3), we have shown that our concrete functions R, T, and φ satisfy the McKay/Piperno axioms, and we have formalized the rules of our proof system and formally proved its soundness. We have also provided and formally verified an abstract proof checker specification in Isabelle/HOL. It remains to refine our abstract proof checker specification into efficient executable code, which we plan to do using the Imperative Refinement Framework [Lam16].
Another direction of our further work is to examine the possibilities of reducing the complexity of our proof system. Currently, our system operates with 9 different types of facts and has 18 different rules. Compared to some other well-known proof systems, such as those used in SAT or PB solving, which rely on a few very simple rules, our system may be much harder to understand and trust, and proof checking becomes a non-trivial task. We tried to fill this gap by providing a formal proof of the soundness of our system in Isabelle/HOL, together with a (verified) abstract specification of our proof checker. However, it would be beneficial to develop a simpler proof system or to find a way to reduce our system to some existing proof system which is well understood and trusted (such as those mentioned above). We are currently not aware whether this is possible for graph canonical labelling algorithms based on McKay and Piperno's abstract scheme, so this remains a great challenge that may be interesting to tackle in further work.

node invariants. Therefore, the sets of maximal leaves of the two trees correspond to each other with respect to $\sigma$. Furthermore, if $R(G, \pi_0, \nu) = \pi$ for some leaf $\nu \in T(G, \pi_0)$, then $R(G^\sigma, \pi_0^\sigma, \nu^\sigma) = R(G, \pi_0, \nu)^\sigma = \pi^\sigma = \sigma^{-1}\pi$, and we have $(G^\sigma)^{\pi^\sigma} = (G^\sigma)^{\sigma^{-1}\pi} = G^{\sigma\sigma^{-1}\pi} = G^\pi$, so the graphs that correspond to the leaves $\nu \in T(G, \pi_0)$ and $\nu^\sigma \in T(G^\sigma, \pi_0^\sigma)$ are identical. Therefore, the maximal graphs that correspond to the maximal leaves in both trees are also identical, so $C(G, \pi_0) = C(G^\sigma, \pi_0^\sigma) = C(G', \pi_0')$ (although the canonical leaves of the trees $T(G, \pi_0)$ and $T(G^\sigma, \pi_0^\sigma)$ do not have to correspond to each other with respect to $\sigma$). Conversely, assume that $C(G, \pi_0) = C(G', \pi_0')$. Since $C(G, \pi_0)$ is a graph assigned to some leaf of $T(G, \pi_0)$, there exists a permutation $\sigma \in S_n$ such that $C(G, \pi_0) = (G^\sigma, \pi_0^\sigma)$. Similarly, there exists a permutation $\sigma' \in S_n$ such that $C(G', \pi_0') = ((G')^{\sigma'}, (\pi_0')^{\sigma'})$, so $(G, \pi_0)$ and $(G', \pi_0')$ are isomorphic.
Proof. Let $\pi'$ and $\pi'_\sigma$ be the current colorings in the invocations make_equitable$(G, \pi, \alpha)$ and make_equitable$(G^\sigma, \pi^\sigma, \alpha^\sigma)$, respectively. Similarly, let $\alpha'$ and $\alpha'_\sigma$ be the current lists of splitting cells in the two invocations. We prove by induction that, for each $k$, after $k$ iterations of the while loop in both invocations it holds that $\pi'_\sigma = (\pi')^\sigma$ and $\alpha'_\sigma = (\alpha')^\sigma$. The statement holds initially, for $k = 0$, since $\pi' = \pi$, $\pi'_\sigma = \pi^\sigma$ and $\alpha'_\sigma = \alpha^\sigma$. Assume that the statement holds for some $k$. First notice that $\pi'$ is a discrete coloring after $k$ iterations if and only if $\pi'_\sigma$ is discrete, since $\pi'_\sigma = (\pi')^\sigma$ by the induction hypothesis. Similarly, $\alpha'$ is empty if and only if $\alpha'_\sigma$ is empty, since $\alpha'_\sigma = (\alpha')^\sigma$. Therefore, either both invocations enter the next iteration of the while loop, or both finish the execution of the algorithm after the $k$-th iteration, returning $\pi'$ and $\pi'_\sigma$, respectively. In the first case, if $W$ is the first cell of $\pi'$ that belongs to $\alpha'$, then $W^\sigma$ is the first cell of $\pi'_\sigma$ belonging to $\alpha'_\sigma$, since $\pi'_\sigma = (\pi')^\sigma$ and $\alpha'_\sigma = (\alpha')^\sigma$ by the induction hypothesis, and the relabelling does not change the order of the cells. Moreover, if a cell $X \in \pi'$ is partitioned into $X_1, X_2, \ldots, X_k$, then the corresponding cell $X^\sigma \in \pi'_\sigma$ is partitioned into $X_1^\sigma, X_2^\sigma, \ldots, X_k^\sigma$, with $|X_i^\sigma| = |X_i|$, since the partitioning (and the order of the parts) depends only on the number of adjacent vertices from $W$, which is not affected by relabelling. This implies that the statement also holds after $k + 1$ iterations of the while loop. This proves the lemma, since both invocations finish after equal numbers of iterations.
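The invariance argument above can be tried out on a small example. The following Python sketch is only an illustration and is not the authors' implementation: the function names, the representation of colorings as ordered lists of frozensets, and the choice to order the parts of a split cell by increasing number of neighbours in $W$ are all assumptions made here. The update of the list of splitting cells follows the behaviour described in this appendix (all parts of a cell that is still in $\alpha$ are kept in it, otherwise all parts except the last are added).

```python
from typing import Dict, FrozenSet, List, Set

Graph = Dict[int, Set[int]]   # adjacency sets of an undirected graph
Cell = FrozenSet[int]
Coloring = List[Cell]         # ordered list of cells


def make_equitable(graph: Graph, coloring: Coloring, alpha: List[Cell]) -> Coloring:
    """Simplified refinement loop; part ordering is an assumption of this sketch."""
    pi = list(coloring)
    active = set(alpha)       # cells still waiting to be used as splitters
    while active and any(len(c) > 1 for c in pi):
        W = next(c for c in pi if c in active)   # first cell of pi that is in alpha
        active.discard(W)
        new_pi: Coloring = []
        for X in pi:
            # Group the vertices of X by their number of neighbours in W.
            groups: Dict[int, Set[int]] = {}
            for v in X:
                groups.setdefault(len(graph[v] & W), set()).add(v)
            parts = [frozenset(groups[k]) for k in sorted(groups)]
            new_pi.extend(parts)
            if len(parts) > 1:
                if X in active:
                    active.discard(X)
                    active.update(parts)        # X was pending: keep all parts
                else:
                    active.update(parts[:-1])   # otherwise skip the last part
        pi = new_pi
    return pi


def relabel_graph(graph: Graph, sigma: Dict[int, int]) -> Graph:
    """Return the graph with every vertex v renamed to sigma[v]."""
    return {sigma[v]: {sigma[u] for u in nbrs} for v, nbrs in graph.items()}


def relabel_cells(cells: Coloring, sigma: Dict[int, int]) -> Coloring:
    """Rename the vertices inside each cell, keeping the cell order."""
    return [frozenset(sigma[v] for v in c) for c in cells]


# Path 0 - 1 - 2 - 3 with the unit coloring, and an arbitrary relabelling.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
unit = [frozenset(path)]
sigma = {0: 2, 1: 0, 2: 3, 3: 1}

direct = make_equitable(path, unit, unit)
relabelled = make_equitable(relabel_graph(path, sigma),
                            relabel_cells(unit, sigma),
                            relabel_cells(unit, sigma))
print(relabelled == relabel_cells(direct, sigma))
```

On this example, refining the relabelled graph gives the same result as relabelling the refined coloring, which is what the lemma states for the actual procedure.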
Proof. The lemma follows from the fact that the only type of transformation that is applied to the given coloring π by the procedure make_equitable is splitting of its cells, which always produces a finer coloring.
Lemma 4.4. Let $W$ be the cell of $\pi'$ that is chosen as a splitting cell in some arbitrary iteration of the while loop of the procedure make_equitable$(G, \pi, \alpha)$. Let $W'$ be any cell of $\pi'$ that precedes $W$ in $\pi'$ (that is, it corresponds to a smaller color). Then $W'$ does not cause any splitting, i.e., for any two vertices of the same color in $\pi'$ the numbers of their adjacent vertices from $W'$ are equal.
Proof. To simplify terminology, we call the cells $W'$ satisfying the condition stated in the last sentence of the lemma neutral. More precisely, we say that a set $X$ of vertices from a graph $G$ is neutral with respect to a coloring $\pi$ if for each cell $W$ of $\pi$ and for each two vertices $v_1$ and $v_2$ from $W$ the numbers of vertices from $X$ adjacent to $v_1$ and $v_2$ are equal. In that terminology, the lemma claims that any cell $W'$ of $\pi'$ preceding $W$ is neutral with respect to $\pi'$. Note that if some set $X$ is neutral with respect to a coloring $\pi$, then it is also neutral with respect to any coloring $\pi'$ finer than $\pi$.
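The definition of neutrality translates directly into a small check. The Python sketch below is an illustration only; the name is_neutral and the representation of graphs and colorings are assumptions made here and are not taken from the authors' implementation.

```python
from typing import Dict, Iterable, List, Set

Graph = Dict[int, Set[int]]   # adjacency sets of an undirected graph
Coloring = List[List[int]]    # ordered list of cells


def is_neutral(graph: Graph, coloring: Coloring, X: Iterable[int]) -> bool:
    """True if any two same-colored vertices have equally many neighbours in X."""
    xs = set(X)
    for cell in coloring:
        counts = {len(graph[v] & xs) for v in cell}
        if len(counts) > 1:   # two vertices of this cell disagree
            return False
    return True


# Path 0 - 1 - 2: the middle vertex distinguishes itself from the endpoints.
path = {0: {1}, 1: {0, 2}, 2: {1}}
unit = [[0, 1, 2]]
finer = [[0, 2], [1]]

print(is_neutral(path, unit, [1]))      # {1} splits the single cell of the unit coloring
print(is_neutral(path, finer, [1]))     # no cell of the finer coloring is split by {1}
print(is_neutral(path, finer, [0, 2]))  # nor by {0, 2}, so the finer coloring is equitable
```

A coloring is equitable exactly when each of its cells is neutral with respect to it, which is why the last two checks together show that the finer coloring of the path is equitable.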
Furthermore, we say that a cell $X$ of $\pi$ is conditionally neutral with respect to $\pi$ if, for each coloring $\pi'$ finer than or equal to $\pi$, whenever all the cells that precede $X$ in $\pi$ are neutral with respect to $\pi'$, the cell $X$ is also neutral with respect to $\pi'$. Note that if a cell $X$ is neutral with respect to $\pi$, it is also conditionally neutral with respect to $\pi$.
We first prove the following two propositions.
Proposition A.1. Let $X$ be a cell of a coloring $\pi$ which is conditionally neutral with respect to $\pi$, and let $\pi'$ be any coloring finer than $\pi$ (i.e., obtained by partitioning some of its cells). Let $X_1, \ldots, X_k$ (in that order) be the cells of $\pi'$ obtained by partitioning the cell $X$ of $\pi$. Then $X_k$ is conditionally neutral with respect to $\pi'$.
To prove this proposition, first notice that $\pi' \preceq \pi$. Let $\pi''$ be any coloring finer than $\pi'$ such that all the cells of $\pi'$ that precede $X_k$ are neutral with respect to $\pi''$. We must prove that $X_k$ is also neutral with respect to $\pi''$. Let $Y$ be any cell of $\pi$ that precedes $X$ in $\pi$, and let $Y_1, \ldots, Y_m$ be the cells of $\pi'$ obtained by partitioning the cell $Y$. Since $Y_1, \ldots, Y_m$ precede $X_k$ in $\pi'$, these cells are neutral with respect to $\pi''$ by assumption. Then it is easy to argue that $Y$ is also neutral with respect to $\pi''$: the number of adjacent vertices from $Y$ to each vertex $v$ is obtained by summing the numbers of adjacent vertices from $Y_1, \ldots, Y_m$, so for any two same-colored vertices (with respect to $\pi''$) such sums must be equal. Now, since $\pi'' \preceq \pi' \preceq \pi$, and all the cells that precede $X$ in $\pi$ are neutral with respect to $\pi''$, the cell $X$ must also be neutral with respect to $\pi''$ (because it is conditionally neutral with respect to $\pi$ by assumption). Furthermore, since the cells $X_1, \ldots, X_{k-1}$ are also neutral with respect to $\pi''$ by assumption (because they precede $X_k$ in $\pi'$), the cell $X_k$ is also neutral with respect to $\pi''$, since $X$ is neutral with respect to $\pi''$, and the number of adjacent vertices from $X_k$ to any vertex $v$ can be obtained by subtracting the sum of the numbers of its adjacent vertices from $X_1, \ldots, X_{k-1}$ from the number of its adjacent vertices from $X$. Again, for the same-colored vertices with respect to $\pi''$, such numbers must be equal. Now it follows that $X_k$ is conditionally neutral with respect to $\pi'$, as stated by the proposition.
Proposition A.2. Let $X_1, \ldots, X_k$ be the first $k$ cells of a coloring $\pi$. If $X_1, \ldots, X_k$ are conditionally neutral with respect to $\pi$, then they are also neutral with respect to $\pi$.

This proposition may be proven by induction on the cell index. For the cell $X_1$ the statement trivially holds, since there are no cells that precede it in $\pi$, so it must be neutral with respect to any coloring finer than or equal to $\pi$, including $\pi$ itself. Assume the statement holds for $X_1, \ldots, X_{i-1}$, and let us prove that it holds for $X_i$ ($i \le k$). Since $X_i$ is conditionally neutral with respect to $\pi$, it must be neutral with respect to any coloring finer than or equal to $\pi$ such that all the cells $X_1, \ldots, X_{i-1}$ are also neutral with respect to it. By the induction hypothesis, $X_1, \ldots, X_{i-1}$ are neutral with respect to $\pi$, and since $\pi \preceq \pi$, the cell $X_i$ must also be neutral with respect to $\pi$.
Proposition A.3. In the procedure make_equitable$(G, \pi, \alpha)$, at the beginning of each iteration of the while loop, any cell $X$ of $\pi'$ that is not present in $\alpha$ is conditionally neutral with respect to $\pi'$.

This claim can be proven by induction on the number of iterations. In the base case, before the first iteration of the loop (when $\pi' = \pi$), we have two cases to consider. The first case is when the procedure is called initially, with $\alpha$ containing all the cells of $\pi$. In that case, the statement trivially holds, since there are no cells of $\pi$ that are not in $\alpha$. The second case is when the procedure is invoked with only one singleton cell $\{v\}$ in $\alpha$, where $v$ is the last individualized vertex. Let $\pi$ be the coloring obtained from $\hat{\pi}$ by partitioning the cell $V$ containing $v$ into $\{v\}$ and $V \setminus \{v\}$ (in that order). Since the coloring $\hat{\pi}$ was already equitable before the individualization of $v$, all the cells of $\hat{\pi}$ (including $V$) are neutral with respect to $\hat{\pi}$. Therefore, all the cells of $\hat{\pi}$ distinct from $V$ (which are also the cells of $\pi$ and are not in $\alpha$) are neutral with respect to $\pi$, and therefore conditionally neutral with respect to $\pi$. On the other hand, since $V$ is neutral with respect to $\hat{\pi}$, it is also conditionally neutral with respect to $\hat{\pi}$. By Proposition A.1, the cell $V \setminus \{v\}$ is conditionally neutral with respect to $\pi$. Thus, all cells of $\pi$ not in $\alpha$ are conditionally neutral with respect to $\pi$.
In order to prove the induction step, let us assume that the statement holds before the $k$-th iteration. Let $W \in \alpha$ be the next splitting cell chosen by the algorithm, and let $\pi''$ be the coloring obtained from $\pi'$ by partitioning its cells with respect to $W$. At the end of the $k$-th iteration, the following cells of $\pi''$ will not be in $\alpha$:
• since $W$ is removed from $\alpha$ at the beginning of the $k$-th iteration, if $W$ is partitioned into $W_1, \ldots, W_k$ (in that order), then $W_k$ will not be in $\alpha$. Since $\pi''$ is obtained from $\pi'$ by partitioning its cells with respect to $W$, the cell $W$ is neutral with respect to $\pi''$, and also with respect to any coloring $\pi'''$ finer than $\pi''$. Therefore, if $W_1, \ldots, W_{k-1}$ are neutral with respect to $\pi'''$, the cell $W_k$ will also be neutral with respect to $\pi'''$. By definition, it follows that $W_k$ is conditionally neutral with respect to $\pi''$.
• for any cell $X$ of $\pi'$ that was not in $\alpha$ at the beginning of the $k$-th iteration, if $X$ is partitioned into $X_1, \ldots, X_k$ (in that order), $X_k$ will not be in $\alpha$. By the induction hypothesis, $X$ is conditionally neutral with respect to $\pi'$, so by Proposition A.1, $X_k$ is conditionally neutral with respect to $\pi''$.
For the rule PruneOrbits, since $w_1$ and $w_2$ belong to the same orbit $\Omega$ with respect to $\mathrm{Aut}(G, \pi)$ (where $\pi = R(G, \pi_0, \nu)$), there exists $\sigma \in \mathrm{Aut}(G, \pi)$ such that $w_2 = w_1^\sigma$. Since $\pi^\sigma = \pi$ and $\pi \prec \pi_0$, it also holds that $\pi_0^\sigma = \pi_0$, so $\sigma \in \mathrm{Aut}(G, \pi_0)$. Moreover, for any vertex ]) correspond to each other with respect to $\sigma$. Similarly as for the rule PruneAutomorphism, we can now prove that the canonical leaf cannot belong to

For the rule PruneParent, we may notice that if none of the children of a node $\nu$ is an ancestor of the canonical leaf, then $\nu$ is also not an ancestor of the canonical leaf.
For the rule PathAxiom, the derived fact $\mathrm{on\_path}(G, \pi_0, [\,])$ is obviously valid, since the canonical leaf always exists, so the root node must be its ancestor.
For the rule ExtendPath, from its first premise it holds that $\nu$ is an ancestor of the canonical leaf. By the second premise of the rule, the children of the node $\nu$ are the nodes $[\nu, w']$, where $w' \in W$, and the rule assumes that $w$ is one of them. According to the rest of the premises, all of the children of $\nu$, except the node $[\nu, w]$, are not ancestors of the canonical leaf. This means that the node $[\nu, w]$ must be an ancestor of the canonical leaf.
Finally, for the rule CanonicalLeaf, by its first premise it holds that the node $\nu$ is an ancestor of the canonical leaf. Since the rule assumes that the coloring $\pi$ (assigned to the node $\nu$ by the second premise) is discrete, the node $\nu$ is a leaf, and therefore it is the canonical leaf, so $(G^\pi, \pi_0^\pi)$ is the canonical form of the graph $(G, \pi_0)$.
Lemma 4.8. For any proof that corresponds to the graph $(G, \pi_0)$, all the facts that it derives are valid facts for $(G, \pi_0)$.
Proof. The lemma can be proven by strong induction on the proof length. That is, we can assume that the statement holds for each fact that can be derived by a proof of length smaller than $k$, and then prove that the statement also holds for each fact that is derived by a proof of length exactly $k$. If a fact $F$ is derived by a proof of length exactly $k$, we can assume that $F$ is obtained by the last rule application in that proof (otherwise there would exist a shorter proof of the same fact). A rule can be applied only if all its premises are already derived (that is, by proofs shorter than $k$, which means that they are valid, by the induction hypothesis), and all the additional conditions required by the rule are fulfilled. According to Lemma 4.7, the derived fact $F$ is then also valid. By proving the induction step, we have proven the lemma.
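The requirement that a rule may be applied only when all its premises have already been derived is exactly what a proof checker enforces step by step. The Python sketch below illustrates this for three of the rules discussed above (PathAxiom, PruneParent and ExtendPath). It is a toy under stated assumptions and not the authors' checker or proof format: the function check_steps, the encoding of nodes as tuples, and the decision to take target cells and some pruned leaves as trusted input (rather than deriving them by the remaining rules) are all hypothetical simplifications.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Node = Tuple[int, ...]   # a search tree node, written as its vertex sequence
Step = Tuple[str, Node]  # (rule name, node the derived fact is about)


class ProofError(Exception):
    """Raised when a step does not follow from previously derived facts."""


def check_steps(targets: Dict[Node, FrozenSet[int]],
                pruned_given: Iterable[Node],
                steps: List[Step]) -> Tuple[Set[Node], Set[Node]]:
    """Replay the steps in order; return the derived on_path and pruned nodes."""
    pruned: Set[Node] = set(pruned_given)  # assumed to be justified elsewhere
    on_path: Set[Node] = set()
    for rule, node in steps:
        if rule == "PathAxiom":
            ok = node == ()                # only the root may be asserted
        elif rule == "PruneParent":
            # Every child [node, w], for w in the target cell, must be pruned.
            ok = node in targets and all(node + (w,) in pruned
                                         for w in targets[node])
        elif rule == "ExtendPath":
            ok = len(node) > 0
            if ok:
                parent, w = node[:-1], node[-1]
                ok = (parent in on_path
                      and parent in targets and w in targets[parent]
                      and all(parent + (u,) in pruned
                              for u in targets[parent] if u != w))
        else:
            ok = False
        if not ok:
            raise ProofError(f"cannot apply {rule} at node {node}")
        (pruned if rule == "PruneParent" else on_path).add(node)
    return on_path, pruned


# Toy tree: the root has children (0,) and (1,); node (0,) has two pruned leaves.
targets = {(): frozenset({0, 1}), (0,): frozenset({2, 3})}
given_pruned = [(0, 2), (0, 3)]
steps = [("PathAxiom", ()), ("PruneParent", (0,)), ("ExtendPath", (1,))]

on_path, pruned = check_steps(targets, given_pruned, steps)
print(sorted(on_path), sorted(pruned))
```

Because the checker only ever adds a fact after confirming its premises among earlier facts, an induction on the number of processed steps mirrors the argument in the proof above. Omitting the PruneParent step from the example makes the ExtendPath step fail with ProofError.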
Proof. Let us define the height $h(\nu)$ of a node $\nu \in T(G, \pi_0)$ as the maximal distance from $\nu$ to any of its descendant leaves. We prove the lemma by induction on the node height. If $h(\nu) = 0$, then $\nu$ is a leaf, and the statement follows from Lemma 4.12. Let us assume that the statement holds for all nodes whose height is smaller than some $l > 0$, and let $\nu$ be an arbitrary node such that $h(\nu) = l$. Assume that $W$ is the target cell assigned to the node $\nu$. Then, by Lemma 4.10, there is a proof of the fact $T(G, \pi_0, \nu) = W$. For each $w \in W$, the height of the node $[\nu, w]$ is smaller than $l$, and the subtree $T(G, \pi_0, [\nu, w])$ also does not contain the canonical leaf, so by the induction hypothesis there is a proof that derives either the fact $\mathrm{pruned}(G, \pi_0, [\nu, w])$, or a fact $\mathrm{pruned}(G, \pi_0, [\nu]_i)$ for some $i \le |\nu|$. If the second case is true for at least one $w \in W$, then the statement also holds for the node $\nu$. Otherwise, for each $w \in W$ there is a proof of the fact $\mathrm{pruned}(G, \pi_0, [\nu, w])$, so we can apply the rule PruneParent to derive the fact $\mathrm{pruned}(G, \pi_0, \nu)$. By proving the induction step, we have proven the lemma.
Lemma 4.14. If a node $\nu$ is an ancestor of the canonical leaf of the tree $T(G, \pi_0)$, then there is a proof that derives the fact $\mathrm{on\_path}(G, \pi_0, \nu)$.
Proof. The lemma will be proved by induction on $k = |\nu|$. For $k = 0$, the node $\nu$ is the root node $[\,]$. We can apply the rule PathAxiom to derive the fact $\mathrm{on\_path}(G, \pi_0, [\,])$. Assume that the statement holds for some $k$, and let $\nu = [\nu', v]$ be an ancestor of the canonical leaf such that $|\nu'| = k$. Since $\nu'$ is also an ancestor of the canonical leaf, by the induction hypothesis there is a proof that derives the fact $\mathrm{on\_path}(G, \pi_0, \nu')$. Let $W$ be the target cell assigned to the node $\nu'$. By Lemma 4.10, there is a proof that derives the fact $T(G, \pi_0, \nu') = W$. Notice that $v \in W$ and that for each $w \in W \setminus \{v\}$ the node $[\nu', w]$ is not an ancestor of the canonical leaf, that is, the subtree $T(G, \pi_0, [\nu', w])$ does not contain the canonical leaf. By Lemma 4.13, either there is a proof that derives the fact $\mathrm{pruned}(G, \pi_0, [\nu', w])$, or a proof deriving the fact $\mathrm{pruned}(G, \pi_0, [\nu']_i)$ for some $i \le k$. But in the second case the pruned node $[\nu']_i$ would be an ancestor of the canonical leaf, which is not possible, by Lemma 4.8. Therefore, there exists a proof that derives the fact $\mathrm{pruned}(G, \pi_0, [\nu', w])$ for each $w \in W \setminus \{v\}$. Thus, we can apply the rule ExtendPath to derive the fact $\mathrm{on\_path}(G, \pi_0, [\nu', v])$. We have proved the induction step, so the lemma holds.

Figure 2: Node invariants and the canonical leaf for the search tree from Figure 1. For all nodes, the coloring $R(G, \pi_0, \nu)$ and the invariant $\varphi(G, \pi_0, \nu)$ are printed. Colored graphs that correspond to leaves are drawn.

Figure 3: The search tree from Figure 2 pruned using the automorphism (a c)

Figure 6: Rule for target cell

Figure 10: Canonical leaf rules

Solver   # solved   Avg. time   Avg. time on solved

Figure 11: A comparison of behaviour of nauty and morphi on particular instances. Times are given in seconds

Figure 12: Behavior of morphi and the checker on particular instances.Times are given in seconds, and proof sizes are given in kilobytes
This work is licensed under the Creative Commons Attribution License. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, 171 Second St, Suite 300, San Francisco, CA 94105, USA, or Eisenacher Strasse 2, 10777 Berlin, Germany