Weight Annotation in Information Extraction

The framework of document spanners abstracts the task of information extraction from text as a function that maps every document (a string) into a relation over the document's spans (intervals identified by their start and end indices). For instance, the regular spanners are the closure under the Relational Algebra (RA) of the regular expressions with capture variables, and the expressive power of the regular spanners is precisely captured by the class of VSet-automata -- a restricted class of transducers that mark the endpoints of selected spans. In this work, we embark on the investigation of document spanners that can annotate extractions with auxiliary information such as confidence, support, and confidentiality measures. To this end, we adopt the abstraction of provenance semirings by Green et al., where tuples of a relation are annotated with the elements of a commutative semiring, and where the annotation propagates through the positive RA operators via the semiring operators. Hence, the proposed spanner extension, referred to as an annotator, maps every string into an annotated relation over the spans. As a specific instantiation, we explore weighted VSet-automata that, similarly to weighted automata and transducers, attach semiring elements to transitions. We investigate key aspects of expressiveness, such as the closure under the positive RA, and key aspects of computational complexity, such as the enumeration of annotated answers and their ranked enumeration in the case of ordered semirings. For a number of these problems, fundamental properties of the underlying semiring, such as positivity, are crucial for establishing tractability.


Introduction
A plethora of paradigms have been developed over the past decades towards the challenge of extracting structured information from text, a task generally referred to as Information Extraction (IE). Common textual sources include natural language from a variety of sources such as scientific publications, customer input and social media, as well as machine-generated activity logs. Instantiations of IE are central components in text analytics and include tasks such as segmentation, named-entity recognition, relation extraction, and coreference resolution [Sar08]. Rules and rule systems have consistently been key components in such paradigms, yet their roles have varied and evolved over time. Systems such as Xlog [SDNR07] and SystemT [CKL+10] use IE rules for materializing relations inside relational query languages. Machine-learning classifiers and probabilistic graphical models (e.g., Conditional Random Fields) use rules for feature generation [LBC04, SM12]. Rules serve as weak constraints (later translated into probabilistic graphical models) in Markov Logic Networks [PD07] (abbrev. MLNs) and in the DeepDive system [SWW+15]. Rules are also used for generating noisy training data ("labeling functions") in the Snorkel system [RBE+17].
The framework of document spanners (spanners for short) provides a theoretical basis for investigating the principles of relational rule systems for IE [FKRV15]. Specifically, a spanner extracts from a document a relation over text intervals, called spans, using either atomic extractors or a relational query on top of the atomic extractors. More formally, by a document we refer to a string d over a finite alphabet, a span of d represents a substring of d by its start and end positions, and a spanner is a function that maps every document d into a relation over the spans of d. The most studied spanner language is that of the regular spanners: atomic extraction is via regex formulas, which are regular expressions with capture variables, and relational manipulation is via the relational algebra: projection, natural join, union, and difference. Equivalently, the regular spanners are the ones expressible as variable-set automata (VSet-automata for short), which are nondeterministic finite-state automata that can open and close variables (playing the role of the attributes of the extracted relation). Interestingly, there has been an independent recent effort to express artificial neural networks for natural language processing by means of finite-state automata [WGY18, MY18, MSV+19].
To date, the research on spanners has focused on their expressive power [FKRV15, PtCFK19, Fre19, SS21], the computational complexity of their main algorithmic problems [ACJR19, ABMN19, FKP18, FRU+18], incompleteness [MRV18, PFKK19], and other system aspects such as cleaning [FKRV16] and distributed query planning [DKM+19]. That research has exclusively adopted a Boolean approach: a tuple is either extracted or not. Nevertheless, when applied to noisy or fuzzy domains such as natural language, modern approaches in artificial intelligence adopt a quantitative approach where each extracted tuple is associated with a level of confidence that the tuple coincides with the intent. When used within an end-to-end IE system, such confidence can be used as a principled way of tuning the balance between precision and recall. For instance, in probabilistic IE models (e.g., Conditional Random Fields), each extraction has an associated probability. In systems of weak constraints (e.g., MLN), every rule has a numerical weight, and the confidence in an extraction is an aggregation of the weights of the invoked rules that lead to the extraction. IE via artificial neural networks typically involves thresholding over a produced score or confidence value [CN16, PPQ+17]. Numerical scores in the extraction process […]
As a specific instantiation of K-annotators, we study the class of K-weighted VSet-automata. Such automata generalize VSet-automata in the same manner as weighted automata and weighted transducers (cf., e.g., the Handbook of Weighted Automata [DKV09]): transitions are weighted by semiring elements, the cost of a run is the product of the weights along the run, and the weight (annotation) of a tuple is the sum of costs of all the runs that produce the tuple. (Again, there has been recent research that studies the connection between models of artificial neural networks in natural language processing and weighted automata [STS18].)
Our investigation answers several fundamental questions about the class of K-weighted VSet-automata: (1) Is this class closed under the positive relational algebra (according to the semantics of provenance semirings [GKT07])? (2) What is the complexity of computing the annotation of a tuple? (3) Can we enumerate the annotated tuples as efficiently as we can do so for ordinary VSet-automata (i.e., regular document spanners)? (4) In the case of ordered semirings, what is the complexity of enumerating the answers in ranked order by decreasing weight?
Our answers are mostly positive and show that K-weighted VSet-automata possess appropriate expressivity and tractability properties. As for the last question, we show that ranked enumeration is intractable and inapproximable for some of the aforementioned semirings (e.g., the probability and counting semirings), but tractable for positively ordered and bipotent semirings, such as the Viterbi semiring.

Preliminaries
Weight annotators read documents and produce annotated relations [GKT07], which are relations in which each tuple is annotated with an element from a commutative semiring. In this section, we revisit the basic definitions and properties of annotated relations.
2.1. Algebraic Foundations. We begin by giving some required background on algebraic structures like monoids and semirings [Gol99].
A commutative monoid (M, *, id) is an algebraic structure consisting of a set M, a binary operation *, and an element id ∈ M, such that: (1) * is associative, i.e., (a * b) * c = a * (b * c) for all a, b, c ∈ M, (2) * is commutative, i.e., a * b = b * a for all a, b ∈ M, and (3) id is an identity, i.e., id * a = a for all a ∈ M. We say that a monoid (M, *, id) is bipotent if a * b ∈ {a, b} for every a, b ∈ M.
A commutative semiring (K, ⊕, ⊗, 0, 1) is an algebraic structure consisting of a set K containing two distinguished elements: the zero element 0 and the one element 1. Furthermore, it is equipped with two binary operations, namely addition ⊕ and multiplication ⊗, such that: (1) (K, ⊕, 0) and (K, ⊗, 1) are commutative monoids, (2) multiplication distributes over addition, that is, (a ⊕ b) ⊗ c = (a ⊗ c) ⊕ (b ⊗ c) for all a, b, c ∈ K, and (3) 0 is absorbing for ⊗, that is, 0 ⊗ a = 0 for all a ∈ K. Furthermore, a semiring is positive if, for all a, b ∈ K, the following conditions hold: (1) 0 ≠ 1, (2) if a ⊕ b = 0, then a = 0 = b, and (3) if a ⊗ b = 0, then a = 0 or b = 0. We call a semiring bipotent if its additive monoid is bipotent. An element a ∈ K is a zero divisor if a ≠ 0 and there is an element b ∈ K with b ≠ 0 and a ⊗ b = 0. Furthermore, an element a ∈ K has an additive inverse if there is an element b ∈ K such that a ⊕ b = 0. In the following, we will also identify a semiring by its domain K if the rest is clear from the context. When we do this for numeric semirings such as Q and N, we always assume the usual addition and multiplication.
Given a semiring (K, ⊕, ⊗, 0, 1) and a set K′ ⊆ K with 0, 1 ∈ K′ such that K′ is closed under addition and multiplication (that is, for all a, b ∈ K′ it holds that a ⊕ b ∈ K′ and a ⊗ b ∈ K′), then (K′, ⊕, ⊗, 0, 1) is a subsemiring of K.
Example 2.1. The following are examples of commutative semirings. […] It is easy to verify that all but the numeric semirings and the Łukasiewicz semiring are positive.
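To make the algebraic notions above concrete, the following minimal Python sketch encodes a few semirings by their operations and neutral elements. The concrete definitions are the usual textbook ones and are an assumption here, not a quotation of the example list; the helper names `Semiring`, `bipotent_on`, and `cancels_on` are hypothetical. The helpers only test properties on a finite sample, so they can refute bipotency or positivity but cannot prove them.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    """A commutative semiring given by its operations and neutral elements."""
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]
    zero: Any
    one: Any


# Textbook definitions, assumed here for illustration only.
BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
COUNTING = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)    # over N
NUMERIC_Z = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)   # over Z
VITERBI = Semiring(max, lambda a, b: a * b, 0.0, 1.0)                # over [0, 1]
TROPICAL = Semiring(min, lambda a, b: a + b, float("inf"), 0)        # over N plus infinity


def bipotent_on(K, sample):
    """Check a (+) b in {a, b} for all pairs drawn from a finite sample."""
    return all(K.add(a, b) in (a, b) for a in sample for b in sample)


def cancels_on(K, sample):
    """True if two nonzero sample elements sum to zero (this violates positivity)."""
    return any(K.add(a, b) == K.zero
               for a in sample for b in sample
               if a != K.zero and b != K.zero)
```

On these samples, the Viterbi and tropical semirings behave bipotently, while the integers exhibit the cancellation (3 + (−3) = 0) that positivity rules out.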
Complexity-wise, we use the RAM model with uniform cost measure and logarithmic word size [AH74] for our complexity results. That is, we assume that addition and multiplication of numbers, represented by a logarithmic number of bits, take constant time. Furthermore, we assume that semiring elements are encoded in binary. That is, the encoding of a semiring K is a function enc : K → {0, 1}*, which assigns a binary encoding to every semiring element. Furthermore, we denote the length of the encoding of an element a ∈ K by ‖a‖. We discuss semiring encodings in more detail in Section 5.
2.2. Annotated Relations. We assume infinite and disjoint sets D and Vars, containing data values (or simply values) and variables, respectively. Let V ⊆ Vars be a finite set of variables. A V-tuple is a function t : V → D that assigns values to variables in V. The arity of t is the cardinality |V| of V. For a subset X ⊆ Vars, we denote the restriction of t to the variables in X by t↾X. We denote the set of all the V-tuples by V-Tup. We sometimes leave V implicit when the precise set is not important. For the rest of this article, we assume that (K, ⊕, ⊗, 0, 1) is a commutative semiring. A (K, D)-relation R over V is a function R : V-Tup → K whose support supp(R) := {t | R(t) ≠ 0} is finite. When D is clear from the context or irrelevant, we also use K-relations to refer to (K, D)-relations.
Example 2.2. The bottom left table in Figure 1 shows an example (K, D)-relation, where K is the Viterbi semiring. The variables are x_pers and x_loc, so the V-tuples are described in the first two columns. The third column contains the element in K associated to each tuple.
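Green et al. [GKT07] defined a set of operators on (K, D)-relations that naturally correspond to relational algebra operators and map K-relations to K-relations. They define the algebraic operators union, projection, natural join, and selection for all finite sets V_1, V_2 ⊆ Vars and for all K-relations R_1 over V_1 and R_2 over V_2, as follows.
• Union: If V_1 = V_2, then the union R_1 ∪ R_2 is the K-relation over V_1 with (R_1 ∪ R_2)(t) := R_1(t) ⊕ R_2(t).
• Projection: For X ⊆ V_1, the projection π_X R_1 is the K-relation over X with (π_X R_1)(t) := ⊕ R_1(t′), where the sum ranges over all t′ with t = t′↾X and R_1(t′) ≠ 0.
• Natural Join: The natural join R_1 ⋈ R_2 is the K-relation over V_1 ∪ V_2 with (R_1 ⋈ R_2)(t) := R_1(t_1) ⊗ R_2(t_2), where t_1 and t_2 are the restrictions t↾V_1 and t↾V_2, respectively.
• Selection: If P : V_1-Tup → {0, 1} is a selection predicate that maps each V_1-tuple to either 0 or 1, then σ_P(R_1) is the K-relation over V_1 with (σ_P(R_1))(t) := R_1(t) ⊗ P(t).
Proposition 2.3 [GKT07]. The above operators preserve the finiteness of the supports and therefore they map K-relations into K-relations.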
Hence, we obtain an algebra on K-relations.
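The four operators on K-relations can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: a K-relation is represented as a dictionary from tuples to annotations (tuples outside the dictionary are annotated with 0), a V-tuple is a frozen set of (variable, value) pairs, and the names `Semiring`, `tup`, `union`, `project`, `join`, and `select` are hypothetical.

```python
from collections import namedtuple

Semiring = namedtuple("Semiring", "add mul zero one")
COUNTING = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)


def tup(**assignment):
    """A V-tuple as a hashable set of (variable, value) pairs."""
    return frozenset(assignment.items())


def _support(K, R):
    """Drop tuples annotated with zero, so the dict stores only the support."""
    return {t: a for t, a in R.items() if a != K.zero}


def union(K, R1, R2):
    """Annotations of equal tuples are added (both relations share one V)."""
    out = dict(R1)
    for t, a in R2.items():
        out[t] = K.add(out.get(t, K.zero), a)
    return _support(K, out)


def project(K, R, X):
    """Annotations of all tuples with the same restriction to X are added."""
    out = {}
    for t, a in R.items():
        s = frozenset((v, d) for v, d in t if v in X)
        out[s] = K.add(out.get(s, K.zero), a)
    return _support(K, out)


def join(K, R1, R2):
    """Annotations of compatible tuples are multiplied."""
    out = {}
    for t1, a in R1.items():
        for t2, b in R2.items():
            d1, d2 = dict(t1), dict(t2)
            if all(d1[v] == d2[v] for v in d1.keys() & d2.keys()):
                out[frozenset({**d1, **d2}.items())] = K.mul(a, b)
    return _support(K, out)


def select(K, R, P):
    """Multiply each annotation by 1 or 0, depending on the predicate P."""
    return _support(K, {t: K.mul(a, K.one if P(t) else K.zero)
                        for t, a in R.items()})


R1 = {tup(x="Ann", y="Paris"): 2, tup(x="Ann", y="Rome"): 3}
R2 = {tup(y="Paris", z="FR"): 5}
```

With the counting semiring, projecting `R1` onto `x` adds the two annotations, and joining `R1` with `R2` multiplies the annotations of the single compatible pair.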

K-Annotators
We start by setting the basic terminology. We fix a finite set Σ, which is disjoint from Vars, that we call the alphabet. A document is a finite string over the alphabet Σ, that is, a finite sequence d = σ_1 ⋯ σ_n with σ_i ∈ Σ for every i ∈ {1, …, n}. We denote the set of all documents by Docs. A span identifies a substring of a document d by specifying its bounding indices, that is, a span of d is an expression of the form [i, j⟩ where 1 ≤ i ≤ j ≤ n + 1. By d_{[i,j⟩} we denote the substring σ_i ⋯ σ_{j−1}. If i = j, it holds that d_{[i,j⟩} is the empty string, which we denote by ε. We denote by Spans(d) the set of all possible spans of a document d and by Spans the set of all possible spans of all possible documents. Since we will be working with relations over spans, we assume that D is such that Spans ⊆ D. A (K, d)-relation over V ⊆ Vars is defined analogously to a (K, D)-relation over V but only uses V-tuples with values from Spans(d).
Definition 3.1. A K-annotator (or annotator for short) is a function S that is associated with a finite set V ⊆ Vars of variables and maps documents d into (K, d)-relations over V. We denote V by Vars(S). We sometimes also refer to an annotator as an annotator over K when we want to emphasize the semiring.
Example 3.2. We provide an example document d in Figure 1 (top). The table at the bottom right depicts a possible (K, d)-relation obtained by a spanner that extracts (person, hometown) pairs from d. Notice that for each span [i, j⟩ occurring in this table, the string d_{[i,j⟩} can be found in the table to the left.
In this naïve example, which is just to illustrate the definitions, we used the Viterbi semiring and annotated each tuple with (0.9)^k, where k is the number of words between the spans associated to x_pers and x_loc. The annotations can therefore be interpreted as confidence scores.
We now lift the relational algebra operators on K-relations to the level of K-annotators. For all documents d and for all annotators S_1 and S_2 associated with V_1 and V_2, respectively, we define the following:
• Union: If V_1 = V_2, then the union S := S_1 ∪ S_2 is defined by S(d) := S_1(d) ∪ S_2(d).⁵
• Projection: For X ⊆ V_1, the projection S := π_X S_1 is defined by S(d) := π_X S_1(d).
• Natural Join: The natural join S := S_1 ⋈ S_2 is defined by S(d) := S_1(d) ⋈ S_2(d).
• String selection: Let R be a k-ary string relation.⁶ The string-selection operator σ^R is parameterized by k variables x_1, …, x_k in V_1 and can be written as σ^R_{x_1,…,x_k}. Then the annotator S := σ^R_{x_1,…,x_k} S_1 is defined as S(d) := σ_P(S_1(d)), where P is a selection predicate with P(t) = 1 if (d_{t(x_1)}, …, d_{t(x_k)}) ∈ R, and P(t) = 0 otherwise.
Due to Proposition 2.3, it follows that the above operators form an algebra on K-annotators.
⁵ Here, ∪ stands for the union of two K-relations as was defined previously. The same is valid also for the other operators.
⁶ Recall that a (k-ary) string relation is a subset of Docs^k.

Weighted Variable-Set Automata
In this section, we define the concept of a weighted VSet-automaton as a formalism to represent K-annotators. This formalism is the natural generalization of VSet-automata [FKRV15] and weighted automata [DKV09].
The weight of a run is obtained by ⊗-multiplying the weights of its constituent transitions. Formally, the weight w_ρ of a run ρ that visits the states q_0, …, q_m via transitions labeled o_1, …, o_m is an element in K given by the expression
w_ρ := I(q_0) ⊗ δ(q_0, o_1, q_1) ⊗ ⋯ ⊗ δ(q_{m−1}, o_m, q_m) ⊗ F(q_m).
We call ρ nonzero if w_ρ ≠ 0. A run is called valid if, for every variable v ∈ V, the following holds: there is exactly one index i for which o_i = v⊢ and exactly one index j > i for which o_j = ⊣v.
For a nonzero and valid run ρ, we define t_ρ as the V-tuple that maps each variable v ∈ V to the span [i_j, i_{j′}⟩ where o_{i_j} = v⊢ and o_{i_{j′}} = ⊣v. We denote the set of all valid and nonzero runs of A on d by Runs(A, d). We say that a weighted VSet-automaton A is functional if all runs of A are valid. Note that this definition is the same as the notion of functionality of VSet-automata [FKRV15]. However, in contrast to VSet-automata, a K-annotator over an arbitrary commutative semiring K might output the empty relation, even though it has multiple runs. This can be the case because the weight of a tuple t is the sum of the weights of all its runs ρ with t_ρ = t, and these weights can cancel each other out. This cannot happen if K is positive (e.g., if K is the Boolean semiring).
Notice that there may be infinitely many nonzero and valid runs of a weighted VSet-automaton on a given document, due to ε-cycles, which are sequences of states q_1, …, q_k such that (q_i, ε, q_{i+1}) is a transition, also referred to as an ε-transition, for every i ∈ {1, …, k − 1} and q_1 = q_k. Similar to much of the standard literature on weighted automata (see, e.g., [ÉK09]), we assume that weighted VSet-automata do not have ε-cycles, unless mentioned otherwise. The reason for this restriction is that automata with such cycles need K to be closed under infinite sums for their semantics to be well-defined.
As such, if A does not have ε-cycles, then the result of applying A on a document d, denoted ⟦A⟧_K(d), is the (K, d)-relation R for which R(t) := ⊕ w_ρ, where the sum ranges over all runs ρ ∈ Runs(A, d) with t = t_ρ. Note that Runs(A, d) only contains runs ρ that are valid and nonzero. If t is a V′-tuple with V′ ≠ V, then R(t) = 0, because we only consider valid runs. In addition, ⟦A⟧_K is well defined since every V-tuple in the support of ⟦A⟧_K(d) is a V-tuple over Spans(d). The size of a weighted VSet-automaton A is defined by […]
We say that a K-annotator S is regular if there exists a weighted VSet-automaton A such that S = ⟦A⟧_K. Note that this is an equality between functions. Furthermore, we say that two weighted VSet-automata A and A′ are equivalent if they define the same K-annotator, that is, ⟦A⟧_K = ⟦A′⟧_K, which is the case if ⟦A⟧_K(d) = ⟦A′⟧_K(d) for every d ∈ Docs. Similar to our terminology on B-annotators, we use the term B-weighted VSet-automata to refer to the "classical" VSet-automata of Fagin et al. [FKRV15], which are indeed weighted VSet-automata over the Boolean semiring.
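The run-based semantics can be illustrated by a brute-force evaluator that enumerates all valid runs and sums their weights per tuple. The sketch below is only meant to mirror the definition and is exponential in general; it assumes an automaton without ε-transitions, represents variable operations as `("open", v)` and `("close", v)`, and uses the hypothetical names `Semiring` and `evaluate`. Spans [i, j⟩ are written as pairs `(i, j)`.

```python
from collections import namedtuple

Semiring = namedtuple("Semiring", "add mul zero one")
COUNTING = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)


def evaluate(K, variables, init, final, delta, doc):
    """Sum, per tuple, the weights of all valid runs on doc.

    init/final map states to weights; delta is a list of transitions
    (p, label, q, weight), where label is a letter or ("open"/"close", v).
    """
    n = len(doc)
    out = {}
    for p, label, q, w in delta:
        out.setdefault(p, []).append((label, q, w))
    result = {}

    def visit(state, pos, weight, opened, closed):
        # A run ends here if the document is consumed and all variables are closed.
        if (pos == n + 1 and not opened
                and len(closed) == len(variables) and state in final):
            t = frozenset(closed.items())
            w = K.mul(weight, final[state])
            result[t] = K.add(result.get(t, K.zero), w)
        for label, q, w in out.get(state, []):
            nw = K.mul(weight, w)
            if isinstance(label, tuple):
                kind, v = label
                if kind == "open" and v not in opened and v not in closed:
                    visit(q, pos, nw, {**opened, v: pos}, closed)
                elif kind == "close" and v in opened:
                    rest = {u: i for u, i in opened.items() if u != v}
                    visit(q, pos, nw, rest, {**closed, v: (opened[v], pos)})
            elif pos <= n and doc[pos - 1] == label:
                visit(q, pos + 1, nw, opened, closed)

    for q0, w0 in init.items():
        visit(q0, 1, w0, {}, {})
    return {t: a for t, a in result.items() if a != K.zero}


# Toy automaton: capture one letter "a" in x; two parallel paths read the "a".
sigma = "ab"
delta = ([(0, c, 0, 1) for c in sigma] + [(0, ("open", "x"), 1, 1),
         (1, "a", 2, 1), (1, "a", 4, 1),
         (2, ("close", "x"), 3, 1), (4, ("close", "x"), 3, 1)]
         + [(3, c, 3, 1) for c in sigma])
answers = evaluate(COUNTING, {"x"}, {0: 1}, {3: 1}, delta, "aba")
```

Over the counting semiring, each of the two extracted spans in the toy example is produced by two distinct runs, so each tuple receives the annotation 2.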
We say that a K-weighted VSet-automaton A is unambiguous if, for every document d and every tuple t ∈ ⟦A⟧_K(d), there exists exactly one valid run ρ of A on d such that t = t_ρ, and there is no valid run for tuples t ∉ ⟦A⟧_K(d). Note that, over some semirings, the class of unambiguous weighted VSet-automata is strictly contained in the class of weighted VSet-automata, as shown in the following proposition. However, over the Boolean semiring, every B-weighted automaton can be determinized (cf. Doleschal et al. [DKM+19, Proposition 4.4]). Therefore, there is also an unambiguous B-weighted automaton A_u which is equivalent to A, as every deterministic B-weighted automaton is also unambiguous.
Proof. Weighted automata can be seen as weighted VSet-automata over the empty set of variables. Thus, the statement follows directly from Kirsten [Kir08, Proposition 3.2], who showed that there is a K-weighted automaton A such that there is no equivalent unambiguous K-weighted automaton A′.
Figure 2: An example weighted VSet-automaton over the Viterbi semiring with initial state q_0 (with weight 1), two final states q_9, q_10 (both with weight 1), and alphabet Σ′ = Σ \ {␣}. Pers and Loc are sub-automata matching person and location names, respectively. All edges, including the edges of the sub-automata, have the weight 1, except for the transition from q_6 to q_5, which has weight 0.9.
Example 4.2. Figure 2 shows an example weighted VSet-automaton over the Viterbi semiring, which is intended to extract (person, hometown)-tuples from a document. Here, "Pers" and "Loc" should be interpreted as sub-automata that test whether a string could be a person name or a location. (Such automata can be compiled from publicly available regular expressions and from deterministic rules and dictionaries as illustrated in SystemT [CKL+10].) The relation extracted by this automaton from the document in Figure 1 is exactly the annotated span relation of the same figure. The weight of a tuple t depends on the number of spaces occurring between the span captured by x_pers and the span captured by x_loc. More specifically, the automaton assigns the weight (0.9)^k to each tuple, where k is the number of words between the two variables.
As we see next, checking equivalence of weighted VSet-automata is undecidable in general.
Proposition 4.3. Given two weighted VSet-automata A_1 and A_2 over the tropical semiring, it is undecidable to test whether ⟦A_1⟧_K = ⟦A_2⟧_K.
Proof. Follows directly from the undecidability of the containment problem of weighted automata over the tropical semiring (cf. Krob [Kro94, Corollary 4.3]).
4.1. Connection to Datalog over Annotated Relations. The semantics of ⟦A⟧_K(d) is similar in spirit to the semantics of Datalog over annotated relations, studied by Deutch et al. [DMRT14], when we view the runs ρ ∈ Runs(A, d) as the derivations of t: we take the product of the items that participate in each derivation, and sum up these products over all derivations. In fact, there is a simple translation of a weighted VSet-automaton into a Datalog program over annotated relations, similarly to the way Peterfreund et al. [PtCFK19] represent ordinary spanners via Datalog. Roughly speaking, in this translation the document d is conventionally represented via relations over the positions such as O_σ(i) and Successor(i, j) that store the positions with the symbol σ and the successor relationship between positions, respectively (in addition to the relations First(i) and Last(i) that represent the first and last positions, respectively). Each tuple in these relations is annotated by 1. In addition, the transitions are represented by relations of the form T_σ(q, q′) annotated with the weight of the corresponding transition. (For simplicity, we ignore the initial and final weight functions.) The runs are derived by simple path rules such as
Path(x, y, q, q′) ← T_σ(q, q′), Successor(x, y), O_σ(y)
Path(x, y, q, q′) ← Path(x, z, q, q″), Path(z, y, q″, q′)
where Path(x, y, q, q′) states that it is possible to reach q′ from q when starting in x and ending with y.
It is easy to show that a translation such as the above preserves the provenance as defined by Deutch et al. [DMRT14] (i.e., the sum of products of tuples in the derivations). We further discuss this translation, as well as the general relationship between our work and that of Deutch et al. [DMRT14], in Section 9.

Semiring Encodings
In order to state complexity results, we need to make some assumptions about the representation and computation of the semiring operations. That is, as mentioned in Section 2.1, we assume that semiring elements are encoded in binary, i.e., there is a function enc : K → {0, 1}*, which assigns a binary encoding to every semiring element. We write the length of the encoding of an element a ∈ K as ‖a‖.
Throughout this article, we sometimes encode computations into matrix multiplications. To this end, we define a matrix multiplication system MMS_K of dimension n ∈ N as a triple MMS_K := (I, M, F), where I, F ∈ K^n are n-dimensional vectors over K and M ∈ K^{n×n} is an n × n matrix. We define the size of a matrix multiplication system as its dimension plus the sum of the encoding lengths of all semiring elements in the system. That is,
|MMS_K| := n + Σ_{1≤i≤n} (‖I_i‖ + ‖F_i‖) + Σ_{1≤i,j≤n} ‖M_{i,j}‖.
For an n × n matrix X ∈ K^{n×n} (resp., a vector X ∈ K^n), we define max(X) to be the maximum of the dimension of X and the largest encoding length of a semiring element in X, that is, max(X) := max(n, max_{x ∈ X} ‖x‖). Furthermore, for a matrix multiplication system MMS_K = (I, M, F), we define max(MMS_K) := max(max(I), max(M), max(F)).
Let F^T be the transpose of vector F. By I × M we denote the matrix multiplication of I and M. We define efficient semiring encodings as follows.
Definition 5.1. Let (K, ⊕, ⊗, 0, 1) be a semiring. The encoding of K is efficient if, for every matrix multiplication system MMS_K and every natural number k, the encodings of the semiring elements […]
Throughout this section, whenever we give complexity bounds, we assume that an efficient encoding of the semiring is used. As we show now, the standard encodings of most of the semirings in Example 2.1 are efficient.
Proof. Let MMS_K = (I, M, F) be a matrix multiplication system of dimension n and k ∈ N be a natural number. Let w_1, …, w_k and w be as in Definition 5.1.
We observe that the computation of w requires a polynomial number of additions and multiplications. However, as the encoding of the semiring elements that are used for the computation might become large, this does not immediately imply that w can be computed in time polynomial in […] We therefore show, for every 1 ≤ i ≤ k, that the semiring elements that are required for the computation of w_i have an encoding of size at most polynomial in […] Recall that max(M) is the maximum of the dimension n of M and the largest representation size of any element in M. We begin by showing that max(I × M) ≤ max(I) + max(M) + n for all vectors I ∈ K^n and matrices M ∈ K^{n×n}. Let x be an element of I × M. Per definition of matrix multiplication, x is the sum of n elements x_1, …, x_n, each of which is the product of an element from I and an element from M. Thus […] and, therefore, ‖x‖ ≤ max(I) + max(M) + n. We conclude that max(I × M) ≤ max(I) + max(M) + n for all vectors I ∈ K^n and matrices M ∈ K^{n×n}.
We now show by induction that, for all i ∈ N, it holds that […] for all vectors I, F ∈ K^n and all matrices M ∈ K^{n×n}. Since max(I) + max(M) + max(F) + n […] For the base case, we observe that w_0 = I × F is the sum of n elements, each of which has size at most max(I) + max(F). As desired, we therefore have that max(I × F) = ‖w_0‖ ≤ max(I) + max(F) + n.
For the inductive step, assume there is an i ∈ N such that […] for all vectors I, F ∈ K^n and all matrices M ∈ K^{n×n}. With I′ := I × M, we have that max(I′) = max(I × M) ≤ max(I) + max(M) + n and, therefore, […] This concludes the proof.
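The computations that a matrix multiplication system stands for can be sketched as follows. The exact formula of Definition 5.1 did not survive in this text, so the sketch assumes w_i = I × M^i × F^T, which agrees with the base case w_0 = I × F used in the proof above; the names `Semiring`, `dot`, `vec_mat`, and `mms_values` are hypothetical. The point of the sketch is that each w_i needs only polynomially many semiring operations, so the remaining cost is governed by the encoding lengths of the intermediate elements.

```python
from collections import namedtuple

Semiring = namedtuple("Semiring", "add mul zero one")
COUNTING = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)
TROPICAL = Semiring(min, lambda a, b: a + b, float("inf"), 0)


def dot(K, u, v):
    """Semiring inner product of two vectors of equal length."""
    acc = K.zero
    for a, b in zip(u, v):
        acc = K.add(acc, K.mul(a, b))
    return acc


def vec_mat(K, u, M):
    """Row vector times square matrix over the semiring K."""
    return [dot(K, u, [row[j] for row in M]) for j in range(len(M))]


def mms_values(K, I, M, F, k):
    """Return [w_0, ..., w_k], assuming w_i = I x M^i x F^T."""
    ws, row = [], list(I)
    for _ in range(k + 1):
        ws.append(dot(K, row, F))
        row = vec_mat(K, row, M)   # row now equals I x M^(i+1)
    return ws
```

Read as an automaton with initial weights I, transition weights M, and final weights F, the value w_i aggregates the weights of all state sequences that take exactly i transitions: a count over the counting semiring, and a cheapest path over the tropical semiring.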
Note that all semirings over a finite domain have an efficient encoding, as each semiring element can be encoded with constant size and all operations can be carried out in constant time via a constant size lookup table.
We observe that, for many semirings, the standard encodings satisfy the conditions of Proposition 5.2. Examples are the numeric semiring (Z, +, ·, 0, 1), the counting semiring, the Boolean semiring, the Viterbi semiring (over the rationals Q), the access control semiring, and the tropical semirings. However, for some semirings, standard encodings of the semiring elements do not satisfy the conditions of Proposition 5.2. For example, consider the numeric semiring (Q, +, ·, 0, 1) and the encoding where every semiring element a = n/d is encoded by its numerator n ∈ Z and its denominator d ∈ N. The problem is that the sum of two rational numbers a/b and c/d is given by (a·d + c·b)/(b·d) and therefore the size of the encoding of […] Even though this only increases the size of the representation by a small margin, we need some further investigation to conclude that this encoding is efficient.
Proof. Let (Q, +, ·, 0, 1) be the numeric semiring. We assume that every semiring element x = a/b is encoded by its numerator a ∈ Z and its denominator b ∈ N. Let all numerators and denominators be encoded in binary, where two's complement encoding is used for the numerators. We observe that Proposition 5.2 holds for both encodings. Furthermore, the encoding of the denominators is monotone, that is, for every x, y ∈ N it holds that ‖x‖ ≤ ‖y‖ if x ≤ y.
For a matrix multiplication system MMS_K = (I, M, F), let D be the set of all denominators of the rationals in I, F, and M. We will compute the least common multiple d_lcm of all denominators in D and expand the representations of all numbers to the denominator d_lcm. Observe that all denominators d ∈ D are natural numbers. Therefore, […] Furthermore, the computation of d_lcm as well as the expansion can be done in polynomial time.¹³ We therefore assume w.l.o.g. that all rationals in I, F, and M have the denominator d_lcm.
Let I_Z, F_Z ∈ Z^n and M_Z ∈ Z^{n×n} be the vectors I, F and the matrix M where all numbers are replaced by their numerator. For all 1 ≤ i ≤ k, we define […] We recall that, due to Proposition 5.2, w_{Z,i} can be computed in time polynomial in […] Per assumption that all rationals in I, F, and M have the denominator d_lcm, we have that […] Furthermore, the denominator can also be computed in time polynomial in […] for the encoding of natural numbers. Thus, for all i ≤ k, the encodings of the w_i can be computed in time polynomial in |MMS_K| · k. Furthermore, w can be computed in time polynomial in |MMS_K| · k by first expanding all w_i to the denominator d_lcm^{k+2} and summing up the expanded fractions. This concludes the proof.
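The idea of the proof can be checked on a small instance: expand all rationals to the common denominator d_lcm, compute with integer numerators only, and divide once at the end. The sketch below assumes w_i = I × M^i × F^T (the defining formula is missing from this text) and relies on the observation that each w_i is a sum of products of exactly i + 2 entries, so its denominator after expansion is d_lcm^(i+2). The function names are hypothetical, and `math.lcm` requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm


def w_direct(I, M, F, i):
    """Compute I x M^i x F^T with ordinary arithmetic on the given entries."""
    row = list(I)
    for _ in range(i):
        row = [sum(row[a] * M[a][b] for a in range(len(M)))
               for b in range(len(M))]
    return sum(x * y for x, y in zip(row, F))


def w_via_numerators(I, M, F, i):
    """Same value, but computed over Z after expanding to a common denominator."""
    entries = list(I) + list(F) + [x for r in M for x in r]
    d = lcm(*(Fraction(x).denominator for x in entries))

    def scale(x):
        # Exact: d is a multiple of the denominator of x.
        return (Fraction(x) * d).numerator

    I_Z, F_Z = [scale(x) for x in I], [scale(x) for x in F]
    M_Z = [[scale(x) for x in r] for r in M]
    # Each summand multiplies i + 2 numerators, hence the denominator d^(i + 2).
    return Fraction(w_direct(I_Z, M_Z, F_Z, i), d ** (i + 2))


I = [Fraction(1, 2), Fraction(1, 3)]
M = [[Fraction(1, 4), Fraction(0)], [Fraction(1, 6), Fraction(1)]]
F = [Fraction(1), Fraction(1, 2)]
```

Both routes agree on this instance, which is the behaviour the proof relies on when it moves the computation to the integers.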

Fundamental Properties
We now study fundamental properties of annotators. Specifically, we show that regular annotators are closed under union, projection, and join. Furthermore, annotators over a semiring K behave the same as document spanners with respect to string selection if K is positive or ⊕ is bipotent and for every a, b ∈ K, a ⊗ b = 1 implies that a = b = 1.
6.1. Epsilon Elimination. We begin the section by showing that every regular K-annotator can be transformed into an equivalent functional regular K-annotator without ε-transitions.
Proposition 6.1. For every weighted VSet-automaton A there is an equivalent weighted VSet-automaton A′ that has no ε-transitions. This automaton A′ can be constructed from A in polynomial time. Furthermore, A′ is functional if and only if A is functional.
Proof.We use a result by Mohri [Moh09, Theorem 7.1] who showed that, given a weighted automaton, one can construct an equivalent weighted automaton without epsilon transitions.
More precisely, let A = (Σ, V, Q, I, F, δ) be a weighted VSet-automaton. Notice that A can also be seen as an ordinary weighted finite state automaton B = (Σ ∪ Γ_V, Q, I, F, δ).
In this automaton, one can remove epsilon transitions by using Mohri's epsilon removal algorithm [Moh09, Theorem 7.1]. The resulting ε-transition-free automaton B′ = (Σ ∪ Γ_V, Q′, I′, F′, δ′) accepts the same strings as B. Therefore, interpreting B′ as a weighted VSet-automaton A′ = (Σ, V, Q′, I′, F′, δ′), we have that ⟦A⟧_K = ⟦A′⟧_K and A′ is functional if and only if A is functional.
Concerning complexity, Mohri shows that this algorithm runs in polynomial time, assuming that weighted ε-closures can be computed in polynomial time. However, in our setting this is obvious as we do not allow ε-cycles. Therefore, the weight of an element of an ε-closure can be computed by at most n matrix multiplications, where n is the number of states¹⁵ in A. Per assumption that K has an efficient encoding, these matrix multiplications can be computed in polynomial time.
¹³ The least common multiple can be computed using the Euclidean algorithm and the expansion of […]
6.2. Functionality. Non-functional VSet-automata are inconvenient to work with, since some of their nonzero runs are not valid and therefore do not contribute to the weight of a tuple. It is therefore desirable to be able to automatically convert weighted VSet-automata into functional weighted VSet-automata.
Proposition 6.2. Let A be a weighted VSet-automaton. Then there is a functional weighted VSet-automaton A_fun that is equivalent to A. If A has n states and uses k variables, then A_fun can be constructed in time polynomial in n and exponential in k.
Proof. The proof follows the idea of a similar result by Freydenberger [Fre19, Proposition 3.9] for unweighted VSet-automata. Like Freydenberger, we associate each state in A_fun with a function s : V → {w, o, c}, where s(x) represents the following:
• w stands for "waiting," meaning x⊢ has not been read,
• o stands for "open," meaning x⊢ has been read, but not ⊣x,
• c stands for "closed," meaning x⊢ and ⊣x have been read.
Let S be the set of all such functions. Observe that |S| = 3^|V|. We now define A_fun := (Σ, V, Q_fun, I_fun, F_fun, δ_fun) as follows: [...] Furthermore, for all (p, s) ∈ Q_fun and x ∈ V we define [...]
Observe that there is a one-to-one correspondence between valid nonzero runs ρ ∈ Runs(A, d) and valid nonzero runs ρ_fun ∈ Runs(A_fun, d) with w_ρ = w_{ρ_fun}. Therefore, ⟦A⟧_K(d) = ⟦A_fun⟧_K(d) must also hold.

¹⁵ As such, the construction also works in a slightly more general setting than ours, where the semiring is complete (closed under taking infinite sums, associativity, commutativity, and distributivity apply for countable sums) and weights of ε-closures can be computed in polynomial time.
The exponential blow-up in Proposition 6.2 cannot be avoided, since it already occurs for VSet-automata over the Boolean semiring.¹⁶ Functionality of VSet-automata can be checked efficiently, as we have the following result.

Proposition 6.3. Given a K-weighted VSet-automaton A with m transitions and k variables, it can be decided whether A is functional in time O(km). Furthermore, A is functional if and only if it is functional when interpreted as a B-weighted VSet-automaton.
Proof. By definition, a weighted VSet-automaton is functional if all runs are valid. Furthermore, a run ρ is valid if for every variable v ∈ V there is exactly one index i for which o_i = v⊢ and exactly one index j > i for which o_j = ⊣v.
Observe that this definition only depends on the labels of the run and not on the semiring of the automaton. Therefore, a K-weighted VSet-automaton A is functional if and only if A is functional when interpreted as a B-weighted VSet-automaton A^B. More formally, let A^B be the B-weighted VSet-automaton obtained by replacing nonzero weights with true, sum by ∨, and multiplication by ∧. The result now follows directly from Freydenberger [Fre19, Lemma 3.5], who showed that it can be verified in O(km) whether a VSet-automaton is functional.
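As a rough illustration of why functionality depends only on transition labels, the following Python sketch propagates a per-state variable status (waiting, open, closed) through the transition graph and reports a conflict as soon as two paths disagree. It is not the algorithm of [Fre19], only a sketch under stated assumptions: the automaton is trimmed (every state lies on some nonzero accepting run), zero-weight transitions have already been dropped, and the encoding of transitions as triples with labels `("open", x)`, `("close", x)`, a letter, or `None` for ε is hypothetical.

```python
from collections import deque

WAIT, OPEN, CLOSED = "w", "o", "c"

def is_functional(variables, initial, final, transitions):
    """Label-only functionality check (sketch; assumes a trimmed automaton).

    transitions: iterable of (p, label, q), where label is ("open", x),
    ("close", x), a letter given as a string, or None for an epsilon move.
    Weights are irrelevant here and are therefore not part of the input.
    """
    succ = {}
    for p, label, q in transitions:
        succ.setdefault(p, []).append((label, q))

    # Every initial state starts with all variables still waiting.
    status = {q: {x: WAIT for x in variables} for q in initial}
    queue = deque(status)
    while queue:
        p = queue.popleft()
        for label, q in succ.get(p, []):
            new = dict(status[p])
            if isinstance(label, tuple):  # a variable operation
                kind, x = label
                if kind == "open":
                    if new[x] != WAIT:    # variable opened twice
                        return False
                    new[x] = OPEN
                else:
                    if new[x] != OPEN:    # closed before opening or twice
                        return False
                    new[x] = CLOSED
            if q not in status:
                status[q] = new
                queue.append(q)
            elif status[q] != new:        # two paths disagree at state q
                return False
    # Every reachable final state must have closed all variables.
    return all(
        all(v == CLOSED for v in status[q].values())
        for q in final if q in status
    )
```

Each state is dequeued once and each transition costs O(k) for copying and comparing the status, which matches the O(km) bound of Proposition 6.3 in spirit.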
6.3. Closure Under Join, Union, and Projection

We will obtain the following result.

Theorem 6.4. Regular annotators are closed under finite union, projection, and finite natural join. Furthermore, if the annotators are given as functional weighted VSet-automata, the construction for a single union, projection, and join can be done in polynomial time. Furthermore, the constructions preserve functionality.
The theorem follows immediately from Lemmata 6.5, 6.6, and 6.9. Whereas the constructions for union and projection are fairly standard, the case of join needs some care in the case that the two automata A_1 and A_2 process variable operations in different orders.¹⁷

Lemma 6.5. Given two K-weighted VSet-automata A_1 and A_2 with V_1 = V_2, one can construct a weighted VSet-automaton A in linear time, such that ⟦A⟧_K = ⟦A_1⟧_K ∪ ⟦A_2⟧_K.

Proof. This lemma follows by the standard construction for the union of two weighted automata. Let Q := Q_1 ⊎ Q_2 be the set of states, and let I, F : Q → K with I(q) = I_i(q) and F(q) = F_i(q) for q ∈ Q_i. Furthermore, let δ(p, a, q) = δ_i(p, a, q) if p, q ∈ Q_i and δ(p, a, q) = 0 if p, q are not from the state set of the same automaton. We observe that this construction can be carried out in linear time. It remains to show the correctness of the construction. To this end, observe that, for every document d and every tuple t, ⟦A⟧_K(d)(t) = ⟦A_1⟧_K(d)(t) ⊕ ⟦A_2⟧_K(d)(t). This concludes the proof that ⟦A⟧_K = ⟦A_1⟧_K ∪ ⟦A_2⟧_K.

Lemma 6.6. Given a K-weighted VSet-automaton A and a subset X ⊆ V of the variables V of A, there exists a weighted VSet-automaton A′ with ⟦A′⟧_K = π_X ⟦A⟧_K. Furthermore, if A is functional, then A′ can be constructed in polynomial time.

Proof. If A is not yet functional, we can assume by Proposition 6.2 that it is, at exponential cost in the number of variables of A. Furthermore, assume that, for every nonzero transition, there is a run ρ which uses the transition. Due to A being functional, we will be able to construct A′ by replacing all transitions labeled with a variable operation o ∈ Γ_{V−X} with an ε-transition of the same weight. More formally, let A′ := (Σ, X, Q, I, F, δ′), such that
• δ′(p, o, q) = δ(p, o, q) for all p, q ∈ Q and o ∈ Σ ∪ {ε} ∪ Γ_X, and
• δ′(p, ε, q) = δ(p, o, q) for all p, q ∈ Q and o ∈ Γ_{V−X}.
We first argue why δ′ is well defined. Towards a contradiction, assume that δ′ is not well defined. This can only happen if A has two transitions δ(p, o_1, q) and δ(p, o_2, q) with o_1, o_2 ∈ Γ_{V−X} ∪ {ε} and o_1 ≠ o_2. Therefore, there are two runs ρ_1, ρ_2 of A which only differ on this transition, that is, ρ_1 uses δ(p, o_1, q) and ρ_2 uses δ(p, o_2, q), respectively. Since o_1 ≠ o_2 and o_1, o_2 ∈ Γ_{V−X} ∪ {ε}, either ρ_1 or ρ_2 is not valid, contradicting functionality of A.
It remains to show that ⟦A′⟧_K = π_X ⟦A⟧_K. To this end, let d ∈ Docs be an arbitrary document. Every run ρ of A selecting t on d corresponds to exactly one run ρ′ of A′ selecting t′ on d such that t′ = π_X t and w_ρ = w_ρ′. Therefore,
⟦A′⟧_K(d)(t′) = ⊕_{ρ′ ∈ Runs(A′,d) and t′ = t_ρ′} w_ρ′ = ⊕_{t with t′ = π_X t} ⊕_{ρ ∈ Runs(A,d) and t = t_ρ} w_ρ = (π_X ⟦A⟧_K)(d)(t′).
Therefore, ⟦A′⟧_K = π_X ⟦A⟧_K.
We will now show that regular annotators are closed under join. Freydenberger et al. [FKP18, Lemma 3.10] showed that, given two functional B-weighted VSet-automata A_1 and A_2, one can construct a functional VSet-automaton A with ⟦A⟧_B = ⟦A_1⟧_B ⋈ ⟦A_2⟧_B in polynomial time. The construction is based on the classical product construction for the intersection of NFAs. However, A_1 and A_2 can process consecutive variable operations in different orders, which must be considered during the construction. To deal with this issue, we adapt and combine multiple constructions from the literature.
To be precise, we adopt so-called extended VSet-automata as defined by Amarilli et al. [ABMN19] by adding weights to the transitions.¹⁸ An extended K-weighted VSet-automaton on alphabet Σ and variable set V is an automaton A_E = (Σ, V, Q, I, F, δ), where Q = Q_v ⊎ Q_ℓ is a disjoint union of variable states Q_v and letter states Q_ℓ. Furthermore, I : Q → K is the initial weight function, such that I(q) = 0 for every q ∈ Q_ℓ. Analogously, F : Q → K is a final weight function, such that F(q) = 0 for every q ∈ Q_v. Finally, we define the (partial) transition function δ : Q × (Σ ∪ 2^{Γ_V}) × Q → K, such that transitions labeled by σ ∈ Σ originate in letter states and terminate in variable states, and transitions labeled by T ⊆ Γ_V lead from variable states to letter states. More formally, for every σ ∈ Σ, it holds that if δ(p, σ, q) ≠ 0 then p ∈ Q_ℓ and q ∈ Q_v. Furthermore, for every T ⊆ Γ_V, if δ(p, T, q) ≠ 0 then p ∈ Q_v and q ∈ Q_ℓ.
The weight w_ρ of a run ρ on an extended weighted VSet-automaton, the annotator ⟦A_E⟧_K, functionality, and unambiguity are defined analogously to the weighted VSet-automata.

Proposition 6.7. For every functional weighted VSet-automaton A, there exists an equivalent functional extended weighted VSet-automaton A_E and vice versa. Given an automaton in one model, one can construct an automaton in the other model in polynomial time. Furthermore, the conversion preserves unambiguity.
Proof. Due to Proposition 6.3, a weighted VSet-automaton A is functional if and only if A, interpreted as a B-weighted VSet-automaton, is functional. For functional VSet-automata it is well known¹⁹ that there is a function s : Q × V → {w, o, c}, where
• s(q, v) = w stands for "waiting," meaning that no run ρ of A reads v⊢ before reaching state q,
• s(q, v) = o stands for "open," meaning that all runs ρ of A read v⊢ but not ⊣v before reaching state q,
• s(q, v) = c stands for "closed," meaning that all runs ρ of A read v⊢ and ⊣v before reaching state q.
Based on s, we define the function S : Q × Q → 2^{Γ_V}, such that S(q, q′) = T if, on every run ρ of A which visits q′ after q, exactly the variable operations T must be read between q and q′. More formally, x⊢ ∈ S(p, q) if and only if s(p, x) = w and s(q, x) ≠ w, and ⊣x ∈ S(p, q) if and only if s(p, x) ≠ c and s(q, x) = c. We assume, w.l.o.g., that the states of A are {1, ..., n} for some n ∈ N. For every state i ∈ Q, we define the vector V_i, where [...] Furthermore, we define the n × n matrix M_{p,q} where [...] We construct the weighted extended functional VSet-automaton [...] be two disjoint copies of the states of A. Furthermore, let [...]

We observe that, by the assumption that K has an efficient encoding, A_E can be constructed in polynomial time. It remains to show that ⟦A⟧_K = ⟦A_E⟧_K. To this end, we define a function f which maps valid runs of A to runs of A_E. More formally, let [...] Let q^1_v ∈ Q_v (resp., q^{n+1}_ℓ ∈ Q_ℓ) be the variable state (resp., letter state) corresponding to q_0 (resp., q_m). Furthermore, for 1 ≤ k ≤ n, let q^k_ℓ, q^{k+1}_v be the states corresponding to the states visited by ρ while reading the symbol d_k. That is, for (q_j, k) →^{d_k} (q_{j+1}, k + 1) in ρ, q^{k−1}_ℓ corresponds to q_j and q^k_v to q_{j+1}. We define f(ρ) as the run ρ_E ∈ Runs(A_E, d) such that [...]

For every valid run ρ_E ∈ Runs(A_E, d), it holds that w_{ρ_E} = ⊕_{ρ ∈ Runs(A,d) with f(ρ) = ρ_E} w_ρ. Therefore, it follows that ⟦A⟧_K = ⟦A_E⟧_K.

It remains to show that A_E is unambiguous if A is unambiguous. To this end, assume that A_E is not unambiguous. Thus, there must be two runs ρ^1_E ≠ ρ^2_E of A_E encoding the same tuple. By construction of A_E, there must be two runs ρ^1 ≠ ρ^2 of A which encode the same tuple. However, this contradicts the unambiguity of A. Therefore, A_E must be unambiguous.
For the other direction, one can construct a weighted VSet-automaton A with ε-transitions²⁰ by replacing every edge δ(p, T, q) = w by a sequence of transitions δ(p, v_1, q_1) = w, δ(q_1, v_2, q_2) = 1, ..., δ(q_{n−1}, v_n, q) = 1, where T = {v_1, ..., v_n} and q_1, ..., q_{n−1} are new states. We observe that only the first transition has weight w, whereas all other transitions have weight 1. This construction also runs in polynomial time, and it is straightforward to verify that ⟦A⟧_K = ⟦A_E⟧_K and that A is unambiguous if A_E is unambiguous.
Proposition 6.8. Let A_1, A_2 be two functional extended K-weighted VSet-automata. One can construct a functional extended K-weighted VSet-automaton A in polynomial time, such that ⟦A⟧_K = ⟦A_1⟧_K ⋈ ⟦A_2⟧_K.

[...] To this end, let I(q_1, q_2) = I_1(q_1) ⊗ I_2(q_2) and F(q_1, q_2) = F_1(q_1) ⊗ F_2(q_2). Furthermore, let [...] We observe that A can be constructed in polynomial time. We have to show that ⟦A⟧_K = ⟦A_1⟧_K ⋈ ⟦A_2⟧_K. Let d ∈ Docs be a document and t be a tuple. Every run ρ ∈ Runs(A, d) with t_ρ = t originates from a set of runs ρ_1 ∈ Runs(A_1, d) selecting π_{V_1} t and a set of runs ρ_2 ∈ Runs(A_2, d) selecting π_{V_2} t. Due to distributivity of ⊗ over ⊕, it holds that w_ρ = w_{ρ_1} ⊗ w_{ρ_2}. Furthermore, every run in A corresponds to exactly one run in A_1 and one run in A_2. It follows directly that ⟦A⟧_K = ⟦A_1⟧_K ⋈ ⟦A_2⟧_K and that the construction preserves unambiguity.
We now show that regular annotators are closed under join.

Lemma 6.9. Given two K-weighted VSet-automata A_1 and A_2, one can construct a weighted functional VSet-automaton A with ⟦A⟧_K = ⟦A_1⟧_K ⋈ ⟦A_2⟧_K. Furthermore, A can be constructed in polynomial time if A_1 and A_2 are functional, and A is unambiguous if A_1 and A_2 are unambiguous.
Proof. If A_1 and A_2 are not yet functional, we can assume that they are, at an exponential cost in their number of variables (cf. Proposition 6.2). By Proposition 6.7, one can construct functional extended weighted VSet-automata that are equivalent to A_1 and A_2, respectively. Furthermore, due to Proposition 6.8, one can construct a functional extended weighted VSet-automaton A_E for the join of these two automata. Thus, again applying Proposition 6.7, one can construct a functional weighted VSet-automaton A that is equivalent to A_E. Note that all constructions run in polynomial time if A_1 and A_2 are functional, and that they preserve unambiguity. This concludes the proof.
The previous lemma also has applications to unambiguous functional VSet-automata over the Boolean semiring.

Corollary 6.10. Given two unambiguous functional VSet-automata A_1, A_2 over the Boolean semiring, one can construct an unambiguous functional VSet-automaton A with ⟦A⟧_B = ⟦A_1⟧_B ⋈ ⟦A_2⟧_B in polynomial time.

6.4. Closure Under String Selection
where each L_i is a regular language over Σ [Sak09]. Let REG_K be the set of regular K-annotators. We say that a k-ary string relation²¹ R is selectable by regular K-annotators if the class of K-annotators is closed under the string selection σ^R. More formally, R is selectable by regular K-annotators if
{σ^R_{x_1,...,x_k}(S) | S ∈ REG_K and x_i ∈ Vars(S) for all 1 ≤ i ≤ k} ⊆ REG_K.
If K = B, we say that R is selectable by document spanners. Fagin et al. [FKRV15] proved that a string relation is recognizable if and only if it is selectable by document spanners. Here, we generalize this result in the context of weights and annotation. Indeed, it turns out that the equivalence is maintained for all positive semirings.

Theorem 6.11. Let (K, ⊕, ⊗, 0, 1) be a positive semiring and R be a string relation. The following are equivalent:
(1) R is recognizable.
(2) R is selectable by document spanners.
(3) R is selectable by K-annotators.
As we will see, the proof of Lemma 6.18 is heavily based on the closure properties from Theorem 6.4 and holds beyond positive semirings. For the proof of Lemma 6.19, we use semiring morphisms to turn K-weighted VSet-automata into B-weighted VSet-automata and need positivity of the semiring. We need some preliminary results in order to give the proofs of these lemmas.

Definition 6.12. Let R be a k-ary string relation. A K-weighted VSet-automaton A^K_R with variables {x_1, ..., x_k} selects R over K if for every document d ∈ Docs and every tuple t it holds that [...]

Lemma 6.13. Let R be a k-ary string relation. Then R is selectable by K-annotators if and only if there is a VSet-automaton A^K_R that selects R over K.
Proof. Assume that R is selectable by K-annotators. Let A be the K-weighted VSet-automaton that assigns weight 1 to all possible tuples for all documents. As R is selectable by K-annotators, σ^R_{x_1,...,x_k}(⟦A⟧_K) must be a regular K-annotator. Thus, the K-weighted VSet-automaton [...], which proves that R is selectable by K-annotators, as K-annotators are closed under join (cf. Theorem 6.4).
We will now define means of transferring the structure of weighted automata between different semirings, that is, we define B-projections and K-extensions of weighted VSet-automata.

Definition 6.14. Let A be a weighted VSet-automaton over K. A B-weighted VSet-automaton A^B is a B-projection of A if, for every document d ∈ Docs, it holds that [...]

Definition 6.15. Let A be a B-weighted VSet-automaton. Then a K-weighted VSet-automaton A^K is called a K-extension of A if, for every document d ∈ Docs and every tuple t, the following are equivalent: [...] Furthermore, A^K has exactly one run for every tuple in ⟦A^K⟧_K(d).
We now show that a B-projection of a K-weighted VSet-automaton A exists if K is positive. Furthermore, a K-extension of a B-weighted VSet-automaton always exists. To this end, let (K, ⊕, ⊗, 0, 1) and (K′, ⊕′, ⊗′, 0′, 1′) be semirings. For a function f : K → K′ and a weighted VSet-automaton A := (Σ, V, Q, I, F, δ) over K, we define the weighted VSet-automaton [...]

Lemma 6.16. Let K be a positive semiring. Then there exists a B-projection A^B of A for every K-weighted VSet-automaton A.
Proof. Let f : K → B be the function with f(a) = false if a = 0 and f(a) = true otherwise. Eilenberg [Eil74, Chapter VI.2] showed that, due to K being positive²², the function f is a semiring morphism, that is, f(0) = false, f(1) = true, f(a ⊕ b) = f(a) ∨ f(b), and f(a ⊗ b) = f(a) ∧ f(b). Observe that these properties ensure that, for every document d ∈ Docs and every tuple t ∈ ⟦A⟧_K, it holds that [...] Therefore, A_f is a B-projection of A.
Proof. Let A := (V, Q, I, F, δ) be a B-weighted VSet-automaton. Doleschal et al. [DKM+19, Proposition 4.4] showed that, for every VSet-automaton A, there is an equivalent deterministic VSet-automaton A_det. Note that a deterministic VSet-automaton has exactly one run for every tuple t ∈ ⟦A_det⟧_B(d). Therefore, w.l.o.g., we can assume that A has this property. Let g : B → K be the function²³ with g(true) = 1 and g(false) = 0. Observe that A_g also has exactly one run for every tuple t ∈ ⟦A_det⟧_K(d). It remains to show that A_g is indeed a K-extension of A. To this end, let d ∈ Docs be a document. We now show the equivalence between (1) and (2).
(1) implies (2): Let t ∈ ⟦A⟧_B(d). By assumption, A has exactly one run ρ on d for t. Let ρ_g := g(ρ) be the run resulting from ρ by replacing all weights w with g(w). Observe that ρ_g must be a run of A_g on d accepting t. By construction, all transitions of A_g have weight 0 or 1. Thus, (2) must hold.
(2) implies (1): Let t ∈ ⟦A_g⟧_K(d) and ⟦A_g⟧_K(d)(t) = 1. Thus, there is a run ρ_g of A_g on d accepting t. Therefore, there must also be a run ρ of A on d accepting t, concluding the proof.
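The role of positivity for the support map f, and the reason why the embedding g is not a morphism in general, can be checked on small examples. The sketch below is only an illustration: it encodes semiring elements as Python integers, uses the natural numbers as an example of a positive semiring and Z/2Z as a non-positive one, and the helper name `is_morphism_to_bool` is hypothetical.

```python
def f(a):
    """Support map K -> B: nonzero elements are sent to True."""
    return a != 0

def g(b):
    """Embedding B -> K: True becomes 1 and False becomes 0."""
    return 1 if b else 0

def is_morphism_to_bool(elements, plus, times):
    """Check the two morphism equations of f on a finite sample of K."""
    return all(
        f(plus(a, b)) == (f(a) or f(b)) and f(times(a, b)) == (f(a) and f(b))
        for a in elements
        for b in elements
    )

# Natural numbers with + and * form a positive semiring: f passes the check
# on the sampled elements.
nat_ok = is_morphism_to_bool(range(5), lambda a, b: a + b, lambda a, b: a * b)

# Z/2Z is not positive, since 1 + 1 = 0: f fails the check.
z2_ok = is_morphism_to_bool(
    range(2), lambda a, b: (a + b) % 2, lambda a, b: (a * b) % 2
)

# In Z/2Z, g(true or true) = 1, whereas g(true) + g(true) = 0, so g does not
# commute with addition there.
g_breaks_in_z2 = g(True or True) != (g(True) + g(True)) % 2
```

The construction of A_g only needs that every tuple has exactly one run, so that no two run weights are ever added. This is why determinism is assumed before g is applied.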
We are now ready to prove the two main results of this section.

Lemma 6.18. Let R be a string relation which is selectable by document spanners. Then R is also selectable by K-annotators.

Proof. Let A be a K-weighted VSet-automaton and R be a relation that is selectable by regular B-annotators. We have to show that every string selection σ^R_{x_1,...,x_k} ⟦A⟧_K is definable by a K-weighted VSet-automaton. Let A^B_R be the VSet-automaton that selects R over B, which exists by Lemma 6.13. Let A^K_R be a K-extension of A^B_R, which exists by Lemma 6.17. Thus, A^K_R selects R over K and therefore, by Lemma 6.13, R is selectable by K-annotators.

Lemma 6.19. Let (K, ⊕, ⊗, 0, 1) be a positive semiring and R be a string relation which is selectable by K-annotators. Then R is also selectable by document spanners.

²² Eilenberg [Eil74, Chapter VI.2] actually showed that f is a semiring morphism if and only if K is positive.
²³ Notice that g is not necessarily a semiring morphism. Depending on K, it may be the case that 1 ⊕ 1 = 0, contradicting the properties of semiring morphisms. Take K = Z/2Z, for instance.
Proof. Let R be a string relation selectable by K-annotators and let A be a B-weighted VSet-automaton. We have to show that R is also selectable over B, i.e., there is a B-weighted VSet-automaton [...] Let A^K be a K-extension of A, which exists by Lemma 6.17. By assumption, R is selectable over K. Therefore, due to Lemma 6.13, there exists a K-weighted VSet-automaton [...] By definition of string selection, it follows that (d_{t(x_1)}, ..., d_{t(x_k)}) ∈ R and t ∈ ⟦A^K⟧_K. By Lemma 6.17, it follows that (d_{t(x_1)}, ..., d_{t(x_k)}) ∈ R and t ∈ ⟦A⟧_B, and therefore t ∈ σ^R_{x_1,...,x_k} ⟦A⟧_B. Observe that all implications in the previous argument were actually equivalences. Therefore, the inclusion σ^R ⟦A⟧_B ⊆ ⟦A^B_R⟧_B also holds.
Observe that for the proof of Theorem 6.11 we only required positivity of the semiring for the implication from (3) to (2). This raises the question whether the equivalence can be generalized even further. We show next that this is indeed the case for some semirings that are not positive, such as the Łukasiewicz semiring.

6.4.1. Beyond Positive Semirings

We provide some insights about the cases where K is not positive. Recall that, by Lemma 6.18, every string relation R which is selectable by document spanners is also selectable by K-annotators. Therefore, the question is: for which semirings K does selectability by K-annotators imply selectability by ordinary document spanners? It turns out that this implication holds for some non-positive semirings, such as the Łukasiewicz semiring L.
Let (K′, ⊕′, ⊗′, 0′, 1′) be a subsemiring of a semiring K.²⁴ This semiring is minimal if there is no subsemiring of (K, ⊕, ⊗, 0, 1) with fewer elements. Recall that a semiring K is bipotent if a ⊕ b ∈ {a, b} for every a, b ∈ K. We begin with some intermediate results.
It remains to show that K_min is isomorphic to B. To this end, let f : K_min → B be the bijection with f(0) = false and f(1) = true. It is straightforward to verify that f is indeed a semiring isomorphism.
It follows directly that K_min is a positive semiring.
Proof. Let A, d, and t be as stated. By definition, the weight assigned to t by A is
⟦A⟧_K(d)(t) = ⊕_{ρ ∈ Runs(A,d) and t = t_ρ} w_ρ.
Therefore, in order to compute the weight ⟦A⟧_K(d)(t), we need to consider the weights of all runs ρ for which t = t_ρ. Furthermore, multiple runs can select the same tuple t but assign variables in a different order.²⁵ Let A_E be the functional extended weighted VSet-automaton corresponding to A, as constructed by Proposition 6.7. It follows that ⟦A⟧_K(d)(t) = ⟦A_E⟧_K(d)(t). Let ρ be a run of A_E on d and t, and let d_t be the document obtained from ρ by concatenating the labels of the transitions of ρ.²⁶ Observe that d_t encodes d and t and is uniquely defined by d and t.
It follows that, if A_E is interpreted as a weighted automaton, the weight ⟦A_E⟧_K(d)(t) is exactly the weight which is assigned by A_E to the input word d_t.
We note that computing the weight assigned by a weighted automaton on an input word w strongly depends on the cost model and the used semiring.We therefore give an explicit proof that the weight can be computed in polynomial time if the semiring has an efficient encoding.
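To make the cost model concrete, the following Python sketch evaluates a weighted automaton on a single input word by propagating a weight vector symbol by symbol, which is the usual forward computation behind such polynomial-time arguments. It is a simplified illustration rather than the construction used in the proof: states are assumed to be numbered 0, ..., n−1, transitions are stored in a dictionary, and the semiring is passed in through the hypothetical parameters `plus`, `times`, and `zero`.

```python
def word_weight(word, n, init, final, delta, plus, times, zero):
    """Weight of `word` in a weighted automaton over a commutative semiring.

    init, final: lists of length n holding initial and final weights.
    delta: dict mapping (p, symbol, q) to a transition weight; missing
           entries are treated as the semiring zero.
    """
    vec = list(init)                      # vec[q]: summed weight of all
    for a in word:                        # partial runs that end in q
        nxt = [zero] * n
        for (p, sym, q), w in delta.items():
            if sym == a:
                nxt[q] = plus(nxt[q], times(vec[p], w))
        vec = nxt
    total = zero
    for q in range(n):                    # combine with the final weights
        total = plus(total, times(vec[q], final[q]))
    return total


# A two-state example (hypothetical weights) with two runs on "ab".
delta = {(0, "a", 0): 1, (0, "a", 1): 2, (0, "b", 1): 3, (1, "b", 1): 1}
init, final = [1, 0], [0, 1]

# Counting-style semiring (N, +, *): run weights 1*3 and 2*1 are added.
total_sum = word_weight("ab", 2, init, final, delta,
                        lambda x, y: x + y, lambda x, y: x * y, 0)
# Semiring with max as addition: only the best run weight survives.
total_max = word_weight("ab", 2, init, final, delta,
                        max, lambda x, y: x * y, 0)
```

The loop performs |word| vector updates, each touching every transition once, so the number of semiring operations is polynomial. Whether each operation is itself cheap is exactly what the efficient-encoding assumption on K provides.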
We define a functional extended weighted VSet-automaton A_t, such that ⟦A_t⟧_K(d)(t) = 1 and ⟦A_t⟧_K(d)(t′) = 0 for all t′ ≠ t. Such an automaton A_t can be defined using a chain of |d_t| + 1 states, which checks that the input document is d and which has exactly one nonzero run ρ, with w_ρ = 1 and t_ρ = t.
By Proposition 6.8, there is a weighted VSet-automaton A′ such that ⟦A′⟧_K = ⟦A_E⟧_K ⋈ ⟦A_t⟧_K. It follows directly from the definition of A′ that ⟦A′⟧_K(d′)(t′) = 0 if d′ ≠ d or t′ ≠ t, and ⟦A′⟧_K(d)(t) = ⟦A⟧_K(d)(t) otherwise. Furthermore, all runs ρ ∈ Runs(A′, d) have length |d_t| + 1. Therefore, the weight ⟦A⟧_K(d)(t) can be obtained by taking the sum of the weights of all runs of length |d_t| + 1 of A′. If we assume, w.l.o.g., that the states of A′ are {1, ..., n} for some n ∈ N, then, due to distributivity of ⊗ over ⊕, this sum can be computed as
v_I × (M_δ)^{|d_t|} × (v_F)^T,
where
• v_I is the vector (I(1), ..., I(n)),
• M_δ is the n × n matrix with M_δ(i, j) = ⊕_{a ∈ Σ ∪ Γ_V} δ(i, a, j), and
• v_F is the vector (F(1), ..., F(n)).
Therefore, by the assumption that K has an efficient encoding (Definition 5.1), the weight can be computed in polynomial time.

²⁵ This may happen when variable operations occur consecutively, that is, without reading an alphabet symbol in between.
²⁶ We note that we assume in the following that every set X ⊆ Γ_V is represented by a unique label that is not in Σ. That is, we can assume w.l.o.g. that no set of the form X ⊆ Γ_V is an element of Σ and just use subsets of Γ_V as labels.

7.2. Best Weight Evaluation

In many semirings, the domain is naturally ordered by some relation. For instance, the domain of the probability semiring is Q+, which is ordered by the ≤-relation. This motivates evaluation problems, where one is interested in some kind of optimization of the weight. We start by giving the definition of an ordered semiring.²⁷

Definition 7.2 (similar to Droste and Kuich [DK09]). A commutative monoid (K, ⊕, 0) is ordered if it is equipped with a linear order ≼ preserved by the ⊕ operation. An ordered monoid is positively ordered if 0 ≼ a for all a ∈ K. A semiring (K, ⊕, ⊗, 0, 1) is (positively) ordered if the additive monoid is (positively) ordered and multiplication with elements 0 ≼ a preserves the order.
We consider the following two problems.

Threshold
Given: Regular annotator A over an ordered semiring, document d ∈ Docs, and a weight w ∈ K.
Question: Is there a tuple t with w ≼ ⟦A⟧_K(d)(t)?

MaxTuple
Given: Regular annotator A over an ordered semiring and a document d ∈ Docs.
Task: Compute a tuple with maximal weight, if it exists.
Notice that, if MaxTuple is efficiently solvable, then so is Threshold.We therefore prove upper bounds for MaxTuple and lower bounds for Threshold.The Threshold problem is sometimes also called the emptiness problem in the weighted automata literature.It turns out that both problems are tractable for positively ordered semirings that are bipotent (that is, for every a, b ∈ K it holds that a ⊕ b ∈ {a, b}).
We first make an observation about positively ordered, bipotent semirings.Notice that Observation 7.3 shows that, for positively ordered, bipotent semirings, the ⊕-operator is the same as the max operator, where the maximum is taken over the linear order .
Theorem 7.5. Let (K, ⊕, ⊗, 0, 1) be a positively ordered semiring, where ⊕ = max is the maximum operator over the linear order ≼. Furthermore, let A be a functional K-weighted VSet-automaton, and let d ∈ Docs be a document. Then MaxTuple for A and d can be solved in polynomial time.
Proof.By Proposition 6.7, we can assume, w.l.o.g., that A is given as an extended functional K-weighted VSet-automaton.Furthermore, as K is bipotent, it must hold that a ⊕ b ∈ {a, b} for every a, b ∈ K. Therefore, the weight of a tuple t ∈ A K (d) is always equal to the weight of one of the runs ρ with t = t ρ .
Let ρ ∈ Runs(A, d) be the run of A on d with maximal weight. Due to ⊕ = max, it must hold that w_{t_ρ} = w_ρ and w_t ≼ w_{t_ρ}, as otherwise ρ would not be the run of A on d with maximal weight. Therefore, in order to find the tuple with maximal weight, we need to find the run of A on d with maximal weight.
To this end, we define a directed acyclic graph (DAG) which is obtained by taking a "product" between A and d.Finding the run with the maximal weight then boils down to finding the path with maximal weight in this DAG.
Assume that A = (V, Q, I, F, δ). Recall that 2^{Γ_V} denotes the power set of Γ_V. We define a weighted, edge-labeled DAG G = (N, E, w), where each edge e is in N × ({ε} ∪ (2^{Γ_V} × {1, ..., |d| + 1})) × N and w assigns a weight w(e) ∈ K to every edge e. We note that an edge (p, (T, i), q) ∈ E encodes that a transition labeled T is reached after reading d_[1,i⟩.
Furthermore, for T ⊆ Γ_V and a ∈ Σ, we define the weight w(e) for all e ∈ E as follows:
w((s, ε, (q, 1))) := I(q)
w(((q, |d| + 1), ε, t)) := F(q)
w(((p, i), (T, i), (q, i))) := δ(p, T, q)
w(((p, i), ε, (q, i + 1))) := δ(p, a, q).
Recall that, in extended weighted VSet-automata, the set of states Q is a disjoint union of letter states and variable states, such that all transitions labeled by σ ∈ Σ originate in letter states and all transitions labeled by T ⊆ Γ_V originate in variable states. Therefore, G must be acyclic, as all edges are either from a node in layer i to a node in layer i + 1 or from a variable state to a letter state within the same layer. Furthermore, there is a path from s to t in G with weight w if and only if there is a tuple t ∈ ⟦A⟧_K(d) with the same weight.
As shown in Mohri [Moh09], a path of maximal weight can be computed in polynomial time. We also give a procedure, Procedure BestWeightEvaluation, for the sake of succinctness.²⁸ The correctness follows directly from K being positively ordered, so that the order is preserved by addition and by multiplication with an element of K.

Procedure BestWeightEvaluation (listing not recovered; if no tuple exists, the procedure outputs Null).
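The search for a maximal-weight path in the layered DAG G can be sketched as a dynamic program over a topological order. The Python code below is an illustrative sketch and not the paper's Procedure BestWeightEvaluation: the edge encoding as 4-tuples, the function name `best_path`, and the parameters `times`, `one`, `zero`, and `key` are assumptions made for the example. It presupposes that ⊕ is max and that multiplication preserves the order, as in Theorem 7.5.

```python
from collections import defaultdict, deque
from fractions import Fraction

def best_path(edges, source, target, times, one, zero, key=lambda w: w):
    """Maximal-weight path in a weighted DAG (sketch).

    edges: list of (u, label, v, weight); label None plays the role of ε.
    Returns (weight, labels along the path) or None if no nonzero path exists.
    """
    out, indeg, nodes = defaultdict(list), defaultdict(int), {source, target}
    for u, label, v, w in edges:
        nodes.update((u, v))
        indeg[v] += 1
        if w != zero:                      # zero-weight edges carry no run
            out[u].append((label, v, w))
        else:
            out[u].append((label, v, None))
    queue = deque(n for n in nodes if indeg[n] == 0)
    best = {source: (one, None)}           # node -> (weight, (pred, label))
    while queue:                           # Kahn-style topological sweep
        u = queue.popleft()
        for label, v, w in out[u]:
            if u in best and w is not None:
                cand = times(best[u][0], w)
                if v not in best or key(cand) > key(best[v][0]):
                    best[v] = (cand, (u, label))
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if target not in best:
        return None
    labels, node = [], target
    while best[node][1] is not None:       # walk predecessors back to source
        pred, label = best[node][1]
        if label is not None:
            labels.append(label)
        node = pred
    return best[target][0], labels[::-1]


# Toy instance over the Viterbi-style semiring ([0, 1], max, *), using exact
# fractions. The labels stand in for variable-operation sets.
F = Fraction
toy_edges = [
    ("s", "A1", "a", F(1, 2)), ("a", "A2", "t", F(4, 5)),
    ("s", "B1", "b", F(9, 10)), ("b", "B2", "t", F(1, 2)),
]
toy_result = best_path(toy_edges, "s", "t", lambda x, y: x * y, F(1), F(0))
```

In the toy instance the two source-to-target paths have weights 2/5 and 9/20, so the second path is returned together with its labels. In the setting of the proof, the collected labels (T, i) determine the tuple that attains the maximal weight.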
If the semiring is not bipotent, however, the Threshold and MaxTuple problems quickly become intractable.
Theorem 7.6. Let (K, ⊕, ⊗, 0, 1) be a semiring such that ⊕_{i=1}^{m} 1 is strictly monotonically increasing for increasing values of m. Furthermore, let A be a functional K-weighted VSet-automaton, let d ∈ Docs be a document, and let k ∈ K be a weight threshold. Then Threshold for such inputs is NP-complete.
Proof. It is obvious that Threshold is in NP, as one can guess a tuple t and test in PTIME whether w ≼ ⟦A⟧_K(d)(t) using Theorem 7.1. For the NP-hardness, we will reduce from the MAX-3SAT problem. Given a 3CNF formula and a natural number k, the decision version of MAX-3SAT asks whether there is a valuation satisfying at least k clauses.
We can assume, w.l.o.g., that no clause has two literals corresponding to the same variable. Observe that, for each clause C_i, there are 2^3 = 8 assignments of the variables corresponding to the literals of C_i, of which exactly 7 satisfy the clause C_i. Formally, let f_{C_i} be the function that maps a variable assignment τ to a number between 1 and 8, depending on the assignments of the literals of the clause C_i. We assume, w.l.o.g., that f_{C_i}(τ) = 8 if and only if C_i is not satisfied by τ.
We define a functional weighted automaton A_ψ over the unary alphabet Σ = {σ} such that ⟦A_ψ⟧_K(σ^n)(t) = ⊕_{i=1}^{m} 1 if and only if the assignment corresponding to t satisfies exactly m clauses in ψ, and ⟦A_ψ⟧_K(d)(t) = 0 if d ≠ σ^n or t does not encode a variable assignment. To this end, each variable x_i of ψ is associated with a corresponding capture variable x_i of A_ψ. With each assignment τ we associate a tuple t_τ, such that [...]
The automaton A_ψ := (Σ, V, Q, I, F, δ) consists of m disjoint branches, where each branch corresponds to a clause of ψ; we call these clause branches. Each clause branch is divided into 7 sub-branches, such that a path in the sub-branch j corresponds to a variable assignment τ if f_{C_i}(τ) = j. Thus, each clause branch has exactly one run ρ with weight 1 for each tuple t_τ associated to a satisfying assignment τ of C_i. More formally, the set of states [...] states for each of the 7 sub-branches of each clause branch. Intuitively, A_ψ has a gadget, consisting of 5 states, for each variable and each of the 7 satisfying assignments of each clause. Figure 4 depicts the three types of gadgets we use here. Note that the weights of the drawn edges are all 1. We use the left gadget if x does not occur in the relevant clause and the middle (resp., right) gadget if the literal ¬x (resp., x) occurs. Furthermore, within the same sub-branch of A_ψ, the last state of each gadget is the same state as the start state of the next variable, i.e., q^{a,5}_{i,j} = q^{a,1}_{i,j+1} for all 1 ≤ i ≤ k, 1 ≤ j < n, 1 ≤ a ≤ 7.
We illustrate the crucial part of the construction on an example. Let ψ = (x_1 ∨ ¬x_2 ∨ x_4) ∧ (x_2 ∨ x_3 ∨ x_4). The corresponding weighted VSet-automaton A_ψ therefore has 14 = 2 × 7 disjoint branches. Figure 3 depicts the sub-branch for clause C_1 that corresponds to all assignments with x_1 = x_2 = 1 and x_4 = 0.
Formally, the initial weight function is I(q^{a,b}_{i,j}) = 1 if j = 1 = b and I(q^{a,b}_{i,j}) = 0 otherwise. The final weight function is F(q^{a,b}_{i,j}) = 1 if j = n and b = 5, and F(q^{a,b}_{i,j}) = 0 otherwise. The transition function δ is defined as follows: [...]
We show that there is a tuple t ∈ ⟦A_ψ⟧_K(σ^n) with weight w_t = ⊕_{i=1}^{k} 1 if and only if the corresponding assignment τ satisfies exactly k clauses of ψ. To this end, let τ be an assignment of the variables x_1, ..., x_n. Thus, there is a run ρ ∈ Runs(A_ψ, σ^n) with weight w_ρ = 1 starting in q^{a,0}_{i,0}, such that a = f_{C_i}(τ), if and only if τ satisfies clause C_i. Due to ⊕_{i=1}^{k} 1 being strictly monotonically increasing, it follows that ⊕_{i=1}^{k} 1 ≼ w_{t_τ} if and only if the assignment τ satisfies at least k clauses. Let w = ⊕_{i=1}^{k} 1. It follows directly that there is an assignment τ of ψ satisfying k clauses if and only if there is a tuple t with w ≼ ⟦A_ψ⟧_K(σ^n)(t).
We note that Theorem 7.5 and Theorem 7.6 give us tight bounds for all semirings we defined in Example 2.1.
Since MAX-3SAT is hard to approximate, we can turn Theorem 7.6 into an even stronger inapproximability result for semirings where approximation makes sense. To this end, we focus on semirings that contain (N, +, ·, 0, 1) as a subsemiring in the following result. Note that this already implies that ⊕_{i=1}^{m} 1 is strictly monotonically increasing for increasing values of m.
Theorem 7.7. Let K be a semiring that contains (N, +, ·, 0, 1) as a subsemiring and let A be a weighted VSet-automaton over K. Unless PTIME = NP, there is no algorithm that approximates the tuple with the best weight within a sub-exponential factor in polynomial time.
Proof.Given a Boolean formula ψ in 3CNF, MAX-3SAT asks for the maximal number of clauses satisfied by a variable valuation.Given a 3CNF formula, a polynomial time approximation algorithm with approximation factor x returns a variable assignment satisfying at least opt x clauses, where opt is the maximal number of clauses which are satisfiable by a single variable assignment.Håstad [Hås01] showed that, for every ε > 0, it is NP-hard to approximate MAX-3SAT with an approximation factor x < 8/7 − ε, even if the input is restricted to satisfiable 3CNF instances.In other words, unless PTIME = NP, for every polynomial time approximation algorithm, there is a satisfiable 3CNF instance such that, for this instance, the algorithm does not return an approximation with approximation factor x < 8/7 − ε.We can leverage this, using the reduction from Theorem 7.6, to show that there is no polynomial time algorithm that approximates the tuple with the best weight with a sub-exponential approximation factor.
Let ψ be a satisfiable 3CNF formula with m clauses and let $A_\psi$ be the weighted VSet-automaton and d ∈ Docs be as constructed from ψ in the proof of Theorem 7.6. Let c be the size of $A_\psi$, which is linear in n. As shown in Theorem 7.6, there is a tuple t in $A_\psi$ with weight j if and only if the variable assignment corresponding to t satisfies exactly j clauses. For $k \in \mathbb{N}$, let $A^k_\psi$ be the weighted VSet-automaton constructed by concatenating k fresh copies of $A_\psi$, each of which operates on a set of n fresh variables, by inserting ε-edges with weight 1 from $q_i$ to $q_{i+1}$, where $q_i$ is a final state of the i-th copy and $q_{i+1}$ is an initial state of the (i+1)-th copy. Observe that $A^k_\psi$ has size $c \cdot k$, has $nk$ variables, and each tuple $t \in \llbracket A^k_\psi \rrbracket_K(d^k)$ encodes k, possibly different, variable assignments for ψ. For the sake of contradiction, assume there is a polynomial-time algorithm approximating the tuple with the best weight within a sub-exponential factor $f(x) \in o(b^x)$ for every $b > 1$ (i.e., for every $b > 1$ and every $C > 0$ there is an $x_0 > 0$ such that $f(x) < C \cdot b^x$ for every $x > x_0$). That is, given a spanner A of size c and a document d of size $|d| \le c$, the approximation algorithm returns a tuple t with $w_t \ge \mathrm{opt}/f(c)$, where opt is the maximal weight assigned to a tuple t over d by A.
Due to Håstad [Hås01, Theorem 6.5], there is a satisfiable 3CNF formula ψ such that this procedure can at best lead to an (8/7 − ε)-approximation of the maximal number of satisfiable clauses. Therefore, it follows that $w_t \le \mathrm{opt}/(8/7-\varepsilon)^k$. The tuple t encodes k variable assignments, and the weight of the tuple is the product of the weights of the variable assignments. Let τ be one of the variable assignments encoded by t that satisfies the most clauses. Combining this upper bound with the guarantee $w_t \ge \mathrm{opt}/f(c \cdot k)$ of the approximation algorithm, it must hold that $\mathrm{opt}/f(c \cdot k) \le w_t \le \mathrm{opt}/(8/7-\varepsilon)^k$. Thus, $(8/7-\varepsilon)^k \le f(c \cdot k)$. Recall that the function f is sub-exponential, that is, $f(x) \in o(b^x)$ for every $b > 1$. Therefore, $(8/7-\varepsilon)^k \in o(b^{c \cdot k})$ for every $b > 1$. However, if $b$ is chosen such that $1 < b^c < 8/7 - \varepsilon$, this does not hold for arbitrarily large k, leading to the desired contradiction.
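The final step of the argument can be illustrated numerically. The sketch below searches for the first k at which the amplified gap $(8/7-\varepsilon)^k$ exceeds a candidate factor $f(c \cdot k)$. The helper name `smallest_violating_k`, the sample choice $f(x) = x^3$, and the values c = 10 and ε = 0.01 are assumptions chosen for illustration only; the comparison is done on logarithms to avoid floating-point overflow.

```python
import math

def smallest_violating_k(f, c, ratio, k_max=1000):
    """Return the least k in [1, k_max] with ratio**k > f(c*k), or None.

    The test ratio**k > f(c*k) is evaluated as
    k * log(ratio) > log(f(c*k)), which assumes f(c*k) > 0.
    """
    log_ratio = math.log(ratio)
    for k in range(1, k_max + 1):
        if k * log_ratio > math.log(f(c * k)):
            return k
    return None

ratio = 8 / 7 - 0.01  # the hardness gap for one copy, with epsilon = 0.01

# A polynomial (hence sub-exponential) factor is eventually overtaken.
k_poly = smallest_violating_k(lambda x: x ** 3, 10, ratio)

# A genuinely exponential factor such as 2^x is never overtaken.
k_exp = smallest_violating_k(lambda x: 2 ** x, 10, ratio)
```

For the polynomial factor the search succeeds, which mirrors the contradiction in the proof; for the exponential factor it returns `None`, which is consistent with the theorem only excluding sub-exponential factors.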

Enumeration Problems
In this section, we consider computing the output of annotators from the perspective of enumeration problems, where we try to enumerate all tuples with nonzero weight, possibly from large to small. Such problems are highly relevant for (variants of) VSet-automata, as witnessed by the recent literature that explicitly focuses on such automata [ABMN19, FRU+18] or on alternative formalisms [BGJR21, DK21].
An enumeration problem P is a (partial) function that maps each input i to a finite or countably infinite set of outputs for i, denoted by P(i). Terminologically, we say that, given i, the task is to enumerate P(i).
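As a toy instance of such a problem, the following sketch takes a finite annotated relation as input and writes every tuple with nonzero weight exactly once, from largest to smallest weight. The dictionary representation, the span strings, and the name `enumerate_nonzero_ranked` are hypothetical. The sketch materializes and sorts the whole relation up front, which is precisely what efficient enumeration algorithms for VSet-automata aim to avoid, so it only illustrates the input/output behaviour.

```python
def enumerate_nonzero_ranked(annotated_relation, zero=0):
    """Yield (tuple, weight) pairs with weight != zero, largest weight first.

    Each qualifying tuple is yielded exactly once because dictionary
    keys are unique.
    """
    nonzero = [(t, w) for t, w in annotated_relation.items() if w != zero]
    # "Preprocessing": sort once by weight, descending.
    nonzero.sort(key=lambda pair: pair[1], reverse=True)
    for t, w in nonzero:
        yield t, w

# Hypothetical annotated relation over spans of some document.
relation = {("[1,3>",): 2, ("[2,4>",): 0, ("[1,2>",): 5}
answers = list(enumerate_nonzero_ranked(relation))
```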
An enumeration algorithm for P is an algorithm that, given input i, writes a sequence of answers to the output such that every answer in P(i) is written precisely once. If A is an enumeration algorithm for an enumeration problem P, we say that A runs in preprocessing
(1) * is associative, i.e., (a * b) * c = a * (b * c) for all a, b, c ∈ M,
(2) * is commutative, i.e., a * b = b * a for all a, b ∈ M, and
Proposition 5.2. Let (K, ⊕, ⊗, 0, 1) be a semiring. Then the encoding of K is efficient if, for all semiring elements a, b ∈ K, the encodings of a ⊕ b and a ⊗ b can be computed in time polynomial in $\|a\| + \|b\|$, and $\|a \oplus b\| \le \max(\|a\|, \|b\|) + 1$ and $\|a \otimes b\| \le \|a\| + \|b\|$.
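As a sanity check of the two size bounds, consider the natural numbers with ordinary addition and multiplication, encoded in binary. The sketch below uses Python's `int.bit_length` as an assumed stand-in for the encoding length; the names `enc_len` and `satisfies_bounds` are hypothetical, and the test over a finite range is an illustration, not a proof.

```python
def enc_len(a: int) -> int:
    """Binary encoding length of a natural number (assumed stand-in for the encoding length)."""
    return a.bit_length()

def satisfies_bounds(a: int, b: int) -> bool:
    """Check both size bounds for the pair (a, b) in (N, +, *, 0, 1)."""
    sum_ok = enc_len(a + b) <= max(enc_len(a), enc_len(b)) + 1
    product_ok = enc_len(a * b) <= enc_len(a) + enc_len(b)
    return sum_ok and product_ok

# Exhaustive check on a small range of naturals.
all_ok = all(satisfies_bounds(a, b) for a in range(200) for b in range(200))
```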
where inequality (1) follows from $d_{\mathrm{lcm}} \le \prod_{d \in D} d$ and the monotonicity of the encoding of the denominators, inequality (2) follows from $\|x \cdot y\| \le \|x\| + \|y\|$, and inequality (3) follows from the definition of |MMS K |. Therefore, x = a b by multiplying the numerator a by b d lcm .
14 Recall that ⊕ is bipotent if a ⊕ b ∈ {a, b} for every a, b ∈ K.
$I_{\mathrm{fun}}(p, s) := I(p)$ if $s(x) = w$ for all $x \in V$, and $0$ otherwise; $F_{\mathrm{fun}}(p, s) := F(p)$ if $s(x) = c$ for all $x \in V$, and $0$ otherwise. Here, $s^o_x$ is defined by $s^o_x(x) := o$ and $s^o_x(y) := s(y)$ for all $y \neq x$, and $s^c_x$ is defined by $s^c_x(x) := c$ and $s^c_x(y) := s(y)$ for all $y \neq x$. Functionality follows analogously to Freydenberger [Fre19, Proposition 3.9]. It remains to show equivalence, i.e., that for every document d ∈ Docs it holds that A K 2 ,d) and t=tρ

Observation 7.3. Let (K, ⊕, ⊗, 0, 1) be a positively ordered, bipotent semiring. Then, for every a, b ∈ K with a ≼ b it holds that a ⊕ b = b.
Proof. As K is positively ordered, it holds that 0 ≼ a. Thus, 0 ⊕ b ≼ a ⊕ b, and therefore b ≼ a ⊕ b. Due to K being bipotent, a ⊕ b ∈ {a, b}. If a ⊕ b = b, there is nothing to show. Otherwise, assume that a ⊕ b = a. Thus, b ≼ a, and it follows with a ≼ b and antisymmetry of ≼ that a = b. Therefore, a ⊕ b = b as claimed.
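The observation can be checked by brute force on small carriers. The sketch below assumes that the order is the natural order of the semiring (a ≼ b if and only if a ⊕ c = b for some c), which is an assumption made for this illustration and need not coincide with the order used elsewhere in the text. It tests min (as in a tropical semiring) and max as bipotent additions, and ordinary addition as a non-bipotent contrast; the names `natural_leq` and `check_observation` are hypothetical.

```python
import math

def natural_leq(a, b, plus, carrier):
    """a ≼ b in the (assumed) natural order: plus(a, c) == b for some c."""
    return any(plus(a, c) == b for c in carrier)

def check_observation(plus, carrier):
    """Check that a ≼ b implies plus(a, b) == b on the finite carrier."""
    return all(
        plus(a, b) == b
        for a in carrier
        for b in carrier
        if natural_leq(a, b, plus, carrier)
    )

# Tropical-style addition: min, with infinity as the additive zero.
min_carrier = list(range(6)) + [math.inf]
holds_for_min = check_observation(min, min_carrier)

# max as addition, with minus infinity as the additive zero.
max_carrier = [-math.inf] + list(range(6))
holds_for_max = check_observation(max, max_carrier)

# Ordinary addition on naturals is not bipotent, and the property fails.
holds_for_sum = check_observation(lambda a, b: a + b, list(range(6)))
```

The failure for ordinary addition (for instance 1 ≼ 2 but 1 + 2 ≠ 2) shows that bipotency is the property doing the work here.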
$w_t \le \mathrm{opt}/(8/7-\varepsilon)^k$. Let $t \in \llbracket A^k_\psi \rrbracket_K(d^k)$ be such an approximation and let $\tau_1, \ldots, \tau_k$ be the corresponding variable assignments of ψ. Recall that $|A^k_\psi| = c \cdot k$ and $|d^k| = n \cdot k \le c \cdot k$. Per assumption, there is an approximation algorithm returning a tuple t with $w_t \ge \mathrm{opt}/f(c \cdot k)$.
$\cdots \sigma_n$, where $\sigma_i \in \Sigma$ for each $i = 1, \ldots, n$. By Docs we denote the set of all documents. A (k-ary) string relation is a subset of $\mathrm{Docs}^k$ for some $k \in \mathbb{N}$.
that every nonzero run ρ of A can only consist of states $q \in Q_1$ or $q \in Q_2$. Let d ∈ Docs be an arbitrary document. The set Runs(A, d) of all valid and nonzero runs of A on d is the union of two sets $P_1(A, d)$ and $P_2(A, d)$, where a run ρ is in $P_i(A, d)$ if it consists of states in $Q_i$. Furthermore, it holds that $\rho \in P_i(A, d)$ if and only if $\rho \in \mathrm{Runs}(A_i, d)$ and therefore,