Semantic Annotation Mini-Language

Authors: Bruce Miller, Deyan Ginev (cribbing ideas from Neil Soiffer, Sam Dooley, David Carlisle)

Abstract

MathML 3 is a W3C recommendation for representing mathematical expressions. It defines two forms: Presentation MathML describes how the math looks and is the most commonly used; Content MathML aims to completely describe the math’s meaning, and is consequently difficult to produce. The purpose of this note is to develop a means of adding at least minimal semantic information to Presentation MathML to support accessibility.

Motivation

Mathematical notations are ambiguous. Some notations are used for different purposes and perhaps should be spoken differently. Conversely, different communities occasionally express the same mathematical concept using different notations; perhaps they should be spoken the same. Consider the variety of purposes for which a superscript is employed. Most commonly, one writes $x^2$ or $x^n$ for powers; abstractly, power is applied to 2 arguments. But superscripts are also used for indexing, such as with tensors or matrices $\Phi^i$. Abstractly, a different operation is applied to the same 2 arguments; one needs to know the subject area or topic to know which operator is intended.

It gets worse: sometimes the superscript itself indicates the intended operator: $z^*$, $A^T$, $A^\dagger$ may signify conjugate, transpose and adjoint operations applied to the base of the superscript. Of course, these operations could also be written in functional notation, $\mathrm{conj}(z)$, $\mathrm{trans}(A)$, $\mathrm{adj}(A)$, so we ought not conflate the meaning “transpose of A” with the specific superscript notation that was used.

But sometimes a cigar is just a cigar. While in some contexts, $x’$ might represent the derivative of $x$, in other contexts the $x’$ as a whole simply represents “a different x”, or perhaps “the frobulator”.

Another notational pattern is seen in the somewhat messy markup commonly used for a binomial coefficient $\binom{n}{m}$, where the arguments $n$ and $m$ are buried in the markup, stacked with either mfrac or mtable, and finally wrapped in parentheses. The fact that very similar markup is also used for Eulerian numbers, Legendre and Jacobi symbols, and conversely, that other notations, such as $C^n_m$, are used for binomial coefficients, suggests decoupling the notation and meaning of the expression.

The common theme is the disambiguation of seemingly identical notations, as well as the variety of patterns that must be recognized by an accessibility agent, which may want to use some combination of the mathematical intent of an expression and the actual notation employed in order to generate effective speech. Here we focus on a minimal annotation of Presentation MathML that could resolve the inherent ambiguities and facilitate more useful accessibility. The focus is not necessarily to achieve the precision necessary for computation.

Proposal

We propose an attribute, tentatively called “intent”, which describes the semantic intent of a Presentation MathML fragment (the node on which the attribute is given) using a simple prefix notation. This mechanism provides for recursively describing the operator and arguments, as well as ascribing a fixed intent to entire subtrees. The mini-language is specified as follows:

intent   ::= literal | selector | intent '(' intent [ ',' intent ]* ')'
literal  ::= [letters|digits|_|-]+
selector ::= argref | argpath
argref   ::= '$' NCName
argpath  ::= '$' [digit]+ [ '/' [digit]+ ]*

where the grammar components have the following significance:

the third form of intent represents some operation applied to arguments.
literal: represents a mathematical concept, being a literal keyword (to be looked up in a to-be-specified dictionary), or a phrase to be spoken as-is (possibly translated, as needed) if not present in the dictionary. In the latter case, hyphens and underscores are treated as spaces. The set of literals is intentionally open-ended.
selector: selects another node whose intent is used in-place-of or as part-of the composed intent.
argref: refers to the intent of another node, being the descendant of the current node with attribute arg having the given NCName (or possibly any node in the document with the NCName as id).
argpath: refers to the intent of a descendant of the current node; for example, a single component such as $2 refers to the 2nd child; multiple components refer to grandchildren or deeper, so $1/1 would refer to the first child of the first child.
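
To make the grammar above concrete, here is a minimal sketch of a recursive-descent parser for it in Python. The class names (Literal, Selector, Apply) and the function parse_intent are illustrative assumptions and not part of the proposal; NCName is approximated with a simple regular expression. The left-recursive third production is handled by iteration, which also admits curried forms such as $op($nu)($z).

```python
import re
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Literal:
    name: str            # e.g. "power" or "x-prime"


@dataclass
class Selector:
    ref: str             # "base" for $base, "1/2" for $1/2


@dataclass
class Apply:
    op: "Intent"
    args: List["Intent"]


Intent = Union[Literal, Selector, Apply]

# argref | argpath | literal | punctuation (NCName is only approximated here)
TOKEN = re.compile(r"\s*(\$[A-Za-z_][\w.-]*|\$\d+(?:/\d+)*|[\w-]+|[(),])")


def tokenize(text):
    text = text.strip()
    pos, tokens = 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_intent(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {tok!r}")
        pos += 1
        return tok

    def primary():
        tok = take()
        if tok in ("(", ")", ","):
            raise ValueError(f"expected a literal or selector, got {tok!r}")
        return Selector(tok[1:]) if tok.startswith("$") else Literal(tok)

    def intent():
        node = primary()
        # intent '(' intent [ ',' intent ]* ')' applied zero or more times
        while peek() == "(":
            take("(")
            args = [intent()]
            while peek() == ",":
                take(",")
                args.append(intent())
            take(")")
            node = Apply(node, args)
        return node

    tree = intent()
    if peek() is not None:
        raise ValueError(f"trailing input starting at {peek()!r}")
    return tree


print(parse_intent("power($base,$exp)"))
```

Note that this sketch follows the grammar literally, so an empty argument slot is rejected with a ValueError; accepting such slots would require extending the grammar.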

The key component here is the literal: it is intended to correspond to some mathematical concept or operator, or to some application-specific quantity or operation; that is, it represents some “meaning”. Ultimately, the goal is to translate this virtual content tree into text or speech (in different languages), braille or some other form. The preferred translation may also depend on context and user profile. On the one hand, the most natural translation of a given expression can depend on the operator. It is thus desirable to have a set of commonly known meanings with translation rules. Thus power(x,2) might be rendered as “the 2nd power of x” or “x squared”, depending on the user and agent.

On the other hand, mathematics encompasses an endless set of concepts, arguing that the set of meanings must be open-ended. We propose that there should be a small set of recommended meaning keywords whose translation can be specialized, while allowing any value for the meaning (an implementation is free to recognize more keywords). In the case where a meaning is not in a dictionary, support for multiple languages may be weak and the translation of such a literal would be generic, such as “the [meaning] of [arg1], [arg2] and [argn] …”. While this result may be less than optimal, it is still functional.
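
The interplay between dictionary keywords and the generic fallback might be sketched as follows in Python. The SPEECH_RULES table, its templates, and the representation of an intent tree as nested (operator, arguments) tuples are all hypothetical choices made for illustration; the actual dictionary is yet to be specified.

```python
# Hypothetical dictionary: (keyword, arity) -> speech template.
SPEECH_RULES = {
    ("power", 2): "{0} to the power {1}",
    ("transpose", 1): "the transpose of {0}",
    ("factorial", 1): "{0} factorial",
}


def speak_literal(name):
    # Literals not in the dictionary are spoken as-is,
    # with hyphens and underscores treated as spaces.
    return name.replace("-", " ").replace("_", " ")


def join_args(spoken):
    if len(spoken) < 2:
        return "".join(spoken)
    return ", ".join(spoken[:-1]) + " and " + spoken[-1]


def speak(tree):
    """tree is either a literal (str) or a tuple (operator, [arguments])."""
    if isinstance(tree, str):
        return speak_literal(tree)
    op, args = tree
    spoken = [speak(arg) for arg in args]
    rule = SPEECH_RULES.get((op, len(args))) if isinstance(op, str) else None
    if rule is not None:
        return rule.format(*spoken)
    # Generic fallback: "the [meaning] of [arg1], [arg2] and [argn]"
    return f"the {speak(op)} of {join_args(spoken)}"


print(speak(("power", ["x", "2"])))
print(speak(("Legendre-symbol", ["n", "p"])))
```

A real agent would presumably select among several templates per keyword according to language, context and user profile; this sketch fixes one template per keyword only to show where the fallback takes over.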

Some Examples

Same notation, different meanings

With this model, the msup examples: $x^n$ as a power; $A^T$ as a transpose; $f’$ as a derivative; and $x’$ as an embellished symbol would be distinguished as follows:

<msup intent="power($base,$exp)">
  <mi arg="base">x</mi>
  <mi arg="exp">n</mi>
</msup>

<msup intent="$op($a)">
  <mi arg="a">A</mi>
  <mi arg="op" intent="transpose">T</mi>
</msup>

(easily adapted to conjugate and adjoint)

<msup intent="derivative($a)">
  <mi arg="a">f</mi>
  <mi>'</mi>
</msup>

<msup intent="x-prime">
  <mi>x</mi>
  <mo>'</mo>
</msup>

Selectors

Use of selectors allows different ways to express the same result, such as:

<msup intent="transpose(A)">
  <mi>A</mi>
  <mi>T</mi>
</msup>

<msup intent="transpose($a)">
  <mi arg="a">A</mi>
  <mi>T</mi>
</msup>

<msup intent="$mop($marg)">
  <mi arg="marg">A</mi>
  <mi arg="mop" intent="transpose">T</mi>
</msup>

<msup intent="transpose($1)">
  <mi>A</mi>
  <mi>T</mi>
</msup>

At one extreme, one could simply encode the entire intent on the math element. However, liberal use of selectors presumably would give the user agent more leeway for highlighting, navigation, etc.
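
As a rough check that the four transpose variants above carry the same intent, the following Python sketch expands selectors textually against the element tree. The helper names (find_arg, follow_path, node_intent) are assumptions for illustration, as is the rule that an un-annotated leaf contributes its text content. For simplicity, any descendant with a matching arg attribute is accepted, and lookup by id is not attempted.

```python
import re
import xml.etree.ElementTree as ET

# Matches an argpath ($1, $1/2) or an argref ($name); NCName is approximated.
SELECTOR = re.compile(r"\$(\d+(?:/\d+)*|[A-Za-z_][\w.-]*)")


def find_arg(node, name):
    """Return the first descendant (in document order) with arg=name."""
    for el in node.iter():
        if el is not node and el.get("arg") == name:
            return el
    raise LookupError(f"no descendant with arg={name!r}")


def follow_path(node, path):
    """Follow 1-based child indices such as '1/2' down from node."""
    for step in path.split("/"):
        node = list(node)[int(step) - 1]
    return node


def node_intent(node):
    intent = node.get("intent")
    if intent is None:
        # Assumption: an un-annotated leaf stands for its own text.
        return (node.text or "").strip()

    def expand(match):
        ref = match.group(1)
        if ref[0].isdigit():
            target = follow_path(node, ref)
        else:
            target = find_arg(node, ref)
        return node_intent(target)

    return SELECTOR.sub(expand, intent)


variants = [
    '<msup intent="transpose(A)"><mi>A</mi><mi>T</mi></msup>',
    '<msup intent="transpose($a)"><mi arg="a">A</mi><mi>T</mi></msup>',
    '<msup intent="$mop($marg)"><mi arg="marg">A</mi>'
    '<mi arg="mop" intent="transpose">T</mi></msup>',
    '<msup intent="transpose($1)"><mi>A</mi><mi>T</mi></msup>',
]
expanded = [node_intent(ET.fromstring(v)) for v in variants]
print(expanded)
```

All four variants expand to the same string, transpose(A); what differs is how much of the link between the intent and the individual nodes remains available to a user agent.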

Different notations, same meaning

A binomial would be marked up as:

<mrow intent="binomial($n,$m)">
  <mo>(</mo>
  <mfrac thickness="0pt">
    <mi arg="n">n</mi>
    <mi arg="m">m</mi>
  </mfrac>
  <mo>)</mo>
</mrow>

However, other notations for binomial are in use, such as:

<msubsup intent="$op($n,$m)">
  <mi arg="op" intent="binomial">C</mi>
  <mi arg="n">n</mi>
  <mi arg="m">m</mi>
</msubsup>

Or as some would prefer

<msubsup intent="$op($n,$m)">
  <mi arg="op" intent="binomial">C</mi>
  <mi arg="m">m</mi>
  <mi arg="n">n</mi>
</msubsup>

Each of the above binomials has the same semantic content, and (presumably) would generate the same translation.

mrow structure

A row of infix operators may seem to be a special case, depending on how mrow is used, but basically it forces the annotator to specify the exact nesting and precedence of operators:

<mrow intent="$p($a,$b,$m($c),$d)">
  <mi arg="a">a</mi>
  <mo arg="p" intent="plus">+</mo>
  <mi arg="b">b</mi>
  <mo arg="m" intent="minus">-</mo>
  <mi arg="c">c</mi>
  <mo>+</mo>
  <mi arg="d">d</mi>
</mrow>

Such expressions can be annotated whether the markup is rich or poor in mrows; one only has to annotate at the appropriate level in the tree.

A variety of infix operators ($\times$, $\cdot$, etc) are handled by adding the appropriate meaning to each.

Prefix and postfix operators are simple, provided they are contained within an mrow:

<mrow intent="$op($a)">
  <mi arg="a">n</mi>
  <mo arg="op" intent="factorial">!</mo>
</mrow>

A little less simple, but still possible, if they are not:

<mrow intent="$p($a,$f($b))">
  <mi arg="a">a</mi>
  <mo arg="p" intent="plus">+</mo>
  <mi arg="b">b</mi>
  <mo arg="f" intent="factorial">!</mo>
</mrow>

Defaulting

“Defaulting” is based on the idea that certain notations are almost always used for the same purpose in a given subject area, and that therefore an author ought not need to tediously annotate every single instance; the system should recognize the usage pattern and generate the intent annotation itself. This document does not directly address the subject of defaulting, although the intent model presented here may very well serve as the target language for a defaulting preprocessor (or processing phase).

However, the possibility of a defaulting process does have implications for how one should annotate. A node without an explicitly given intent would likely be a candidate for a default intent based on some rule set. In a field where a prime is commonly used to denote derivatives, the annotator must make sure to mark those primes that do not denote derivatives, such as the $x’$ example above.
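
Although defaulting is out of scope here, a single rule of such a preprocessor might look like the following Python sketch. The rule itself (an un-annotated msup whose superscript is a prime denotes a derivative), the function name apply_prime_default, and the generated intent derivative($1) are hypothetical; the point is only that explicitly annotated nodes, such as the “x-prime” example, are left alone.

```python
import xml.etree.ElementTree as ET

PRIMES = {"'", "\u2032"}  # ASCII apostrophe and U+2032 PRIME


def apply_prime_default(root):
    """Annotate un-annotated primed msup nodes as derivatives.

    Returns the number of nodes that received a default intent.
    """
    changed = 0
    for msup in root.iter("msup"):
        children = list(msup)
        if msup.get("intent") is not None or len(children) != 2:
            continue  # explicit annotations always win
        if (children[1].text or "").strip() in PRIMES:
            msup.set("intent", "derivative($1)")
            changed += 1
    return changed


doc = ET.fromstring(
    "<math>"
    "<msup><mi>f</mi><mo>'</mo></msup>"
    '<msup intent="x-prime"><mi>x</mi><mo>\'</mo></msup>'
    "<msup><mi>x</mi><mn>2</mn></msup>"
    "</math>"
)
count = apply_prime_default(doc)
print(count, [m.get("intent") for m in doc.iter("msup")])
```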

Summary

We have proposed a simple intent attribute to annotate Presentation MathML with disambiguating information that can be used to support accessibility. The design is open-ended. Further topics that will need to be addressed are:

dictionary: being a catalog of common semantic concepts with special requirements for quality speech;
defaulting: a mechanism to automatically generate intent annotation based on subject area.

Appendix: Examples of notations

This section demonstrates how various awkward constructs might be annotated.

The literals used below are for illustrative purposes only, and are not the final proposal. Choosing such literals must be done in the context of some dictionary of standard keywords. Ultimately, we will need naming conventions that would yield sensible dictionary keywords while providing natural readings for concepts not in the dictionary. Indeed, these two concerns may be in conflict.

Derivatives are an interesting case; certainly common and important enough to be in the dictionary. But there are many subcases: the variable of differentiation is sometimes implied (e.g., when primes are used); the degree should default to 1; the differentiation may be over multiple variables. The functional notation for intent proposed here suggests positional arguments to a derivative literal, so that for example derivative(f,,n) would stand for the $n$th derivative of $f$ with respect to an implied variable. Undoubtedly, some will dislike this approach.
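
One possible reading of the positional convention for derivative, with empty slots standing for implied or default values, is sketched below in Python. The function names and the spoken phrasing are assumptions for illustration only, and the argument splitter handles only flat argument lists without nested parentheses.

```python
def split_flat_args(arg_text):
    """Split a flat argument list on commas, keeping empty slots as ''."""
    return [part.strip() for part in arg_text.split(",")]


def describe_derivative(func, var="", degree=""):
    """Describe derivative(func, var, degree); empty slots take defaults."""
    degree = degree or "1"  # the degree defaults to 1
    if degree == "1":
        text = f"the derivative of {func}"
    else:
        text = f"the order-{degree} derivative of {func}"
    if var:  # an empty slot means the variable is implied
        text += f" with respect to {var}"
    return text


print(describe_derivative(*split_flat_args("f,,n")))
print(describe_derivative(*split_flat_args("f,x,2")))
```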

Notation	Description	Code
infix	arithmetic $a+b-c+d$	`<mrow intent="$op1($arg1,$arg2,$op2($arg3),$arg4)"> <mi arg="arg1">a</mi> <mo arg="op1" intent="plus">+</mo> <mi arg="arg2">b</mi> <mo arg="op2" intent="minus">-</mo> <mi arg="arg3">c</mi> <mo>+</mo> <mi arg="arg4">d</mi> </mrow>`
	inner product $\mathbf{a}\cdot\mathbf{b}$	`<mrow intent="$op($arg1,$arg2)"> <mi arg="arg1" mathvariant="bold">a</mi> <mo arg="op" intent="inner-product">⋅</mo> <mi arg="arg2" mathvariant="bold">b</mi> </mrow>`
	Easily extended to other operators and meanings: cross-product, "by", etc.
prefix	negation $-a$	`<mrow intent="$op($arg)"> <mo arg="op" intent="negative">-</mo> <mi arg="arg">a</mi> </mrow>`
	Laplacian $\nabla^2 f$	`<mrow intent="$op($arg)"> <msup arg="op" intent="laplacian"> <mi>∇</mi> <mn>2</mn> </msup> <mi arg="arg">f</mi> </mrow>`
postfix	factorial $n!$	`<mrow intent="$op($arg)"> <mi arg="arg">n</mi> <mo arg="op" intent="factorial">!</mo> </mrow>`
sup	power $x^n$	`<msup intent="power($base,$exp)"> <mi arg="base">x</mi> <mi arg="exp">n</mi> </msup>`
	iterated function $f^n$ ($=f(f(...f))$)	`<msup intent="functional-power($op,$n)"> <mi arg="op">f</mi> <mi arg="n">n</mi> </msup>`
	inverse $\sin^{-1}$	`<msup intent="functional-power($op,$n)"> <mi arg="op">sin</mi> <mn arg="n">-1</mn> </msup>`
	$n$-th derivative $f^{(n)}$	`<msup intent="derivative($op,,$n)"> <mi arg="op">f</mi> <mrow> <mo>(</mo> <mi arg="n">n</mi> <mo>)</mo> </mrow> </msup>`
sub	indexing $a_i$	`<msub intent="component($array,$index)"> <mi arg="array">a</mi> <mi arg="index">i</mi> </msub>`
sup-operator	transpose $A^T$	`<msup intent="$op($x)"> <mi arg="x">A</mi> <mi arg="op" intent="transpose">T</mi> </msup>`
	Compare to $\mathrm{trans}(A)$	`<mrow intent="$op($x)"> <mi arg="op" intent="transpose">trans</mi> <!-- optionally &#x2061; (invisible function application) --> <mi arg="x">A</mi> </mrow>`
	Or the function $\mathrm{trans}$	`<mi intent="transpose">trans</mi>`
	adjoint $A^\dagger$	`<msup intent="$op($x)"> <mi arg="x">A</mi> <mi arg="op" intent="adjoint">&dagger;</mi> </msup>`
	$2$-nd derivative $f''$	`<msup intent="derivative($op,,$n)"> <mi arg="op">f</mi> <mo arg="n" intent="2">''</mo> </msup>`
Awkward nesting	$x'_i$	`<msubsup intent="derivative(component($array,$index))"> <mi arg="array">x</mi> <mi arg="index">i</mi> <mo>'</mo> </msubsup>`
	or maybe	`<msubsup intent="component(derivative($op),$index)"> <mi arg="op">x</mi> <mi arg="index">i</mi> <mo>'</mo> </msubsup>`
	$\overline{x}_i$ being midpoint of $x_i$	`<msub intent="$op(component($line,$index))"> <mover accent="true"> <mi arg="line">x</mi> <mo arg="op" intent="midpoint">¯</mo> </mover> <mi arg="index">i</mi> </msub>`
	Versus: $\overline{x}_i$ being ith element of $\overline{x}$	`<msub intent="component($arr,$index)"> <mover arg="arr" accent="true" intent="$op($line)"> <mi arg="line">x</mi> <mo arg="op" intent="midpoint">¯</mo> </mover> <mi arg="index">i</mi> </msub>`
base-operator	binomial $C^n_m$	`<msubsup intent="$op($n,$m)"> <mi arg="op" intent="binomial">C</mi> <mi arg="m">m</mi> <mi arg="n">n</mi> </msubsup>`
fenced	absolute value $|x|$	`<mrow intent="absolute-value($x)"> <mo>|</mo> <mi arg="x">x</mi> <mo>|</mo> </mrow>`
	norm $|\mathbf{x}|$	`<mrow intent="norm($x)"> <mo>|</mo> <mi arg="x" mathvariant="bold">x</mi> <mo>|</mo> </mrow>`
	determinant $|\mathbf{X}|$	`<mrow intent="determinant($x)"> <mo>|</mo> <mi arg="x" mathvariant="bold">X</mi> <mo>|</mo> </mrow>`
	sequence $\lbrace a_n\rbrace$	`<mrow intent="sequence($arg)"> <mo>{</mo> <msub arg="arg"> <mi>a</mi> <mi>n</mi> </msub> <mo>}</mo> </mrow>`
	open interval $(a,b)$	`<mrow intent="open-interval($a,$b)"> <mo>(</mo> <mi arg="a">a</mi> <mo>,</mo> <mi arg="b">b</mi> <mo>)</mo> </mrow>`
	open interval $]a,b[$	`<mrow intent="open-interval($a,$b)"> <mo>]</mo> <mi arg="a">a</mi> <mo>,</mo> <mi arg="b">b</mi> <mo>[</mo> </mrow>`
	closed, open-closed, etc. similarly
	inner product $\left<\mathbf{a},\mathbf{b}\right>$	`<mrow intent="inner-product($a,$b)"> <mo>&lt;</mo> <mi arg="a" mathvariant="bold">a</mi> <mo>,</mo> <mi arg="b" mathvariant="bold">b</mi> <mo>&gt;</mo> </mrow>`
	Legendre symbol $(n|p)$	`<mrow intent="Legendre-symbol($n,$p)"> <mo>(</mo> <mi arg="n">n</mi> <mo>|</mo> <mi arg="p">p</mi> <mo>)</mo> </mrow>`
	Jacobi symbol	similarly
	Clebsch-Gordan $(j_1 m_1 j_2 m_2 | j_1 j_2 j_3 m_3)$	`<mrow intent="Clebsch-Gordan($a1,$a2,$a3,$a4,$b1,$b2,$b3,$b4)"> <mo>(</mo> <msub arg="a1"><mi>j</mi><mn>1</mn></msub> <msub arg="a2"><mi>m</mi><mn>1</mn></msub> <msub arg="a3"><mi>j</mi><mn>2</mn></msub> <msub arg="a4"><mi>m</mi><mn>2</mn></msub> <mo>|</mo> <msub arg="b1"><mi>j</mi><mn>1</mn></msub> <msub arg="b2"><mi>j</mi><mn>2</mn></msub> <msub arg="b3"><mi>j</mi><mn>3</mn></msub> <msub arg="b4"><mi>m</mi><mn>3</mn></msub> <mo>)</mo> </mrow>`
fenced-sub	Pochhammer $\left(a\right)_n$	`<msub intent="Pochhammer($a,$n)"> <mrow> <mo>(</mo> <mi arg="a">a</mi> <mo>)</mo> </mrow> <mi arg="n">n</mi> </msub>`
fenced-stacked	binomial $\binom{n}{m}$	`<mrow intent="binomial($n,$m)"> <mo>(</mo> <mfrac thickness="0pt"> <mi arg="n">n</mi> <mi arg="m">m</mi> </mfrac> <mo>)</mo> </mrow>`
	multinomial $\binom{n}{m_1,m_2,m_3}$	`<mrow intent="multinomial($n,$m1,$m2,$m3)"> <mo>(</mo> <mfrac thickness="0pt"> <mi arg="n">n</mi> <mrow> <msub arg="m1"><mi>m</mi><mn>1</mn></msub> <mo>,</mo> <msub arg="m2"><mi>m</mi><mn>2</mn></msub> <mo>,</mo> <msub arg="m3"><mi>m</mi><mn>3</mn></msub> </mrow> </mfrac> <mo>)</mo> </mrow>`
		??? punctuation separates the several arguments?
	Eulerian numbers $\left< n \atop k \right>$	`<mrow intent="Eulerian-numbers($n,$k)"> <mo>&lt;</mo> <mfrac thickness="0pt"> <mi arg="n">n</mi> <mi arg="k">k</mi> </mfrac> <mo>&gt;</mo> </mrow>`
fenced-table	3j symbol $\left(\begin{array}{ccc}j_1& j_2 &j_3 \\ m_1 &m_2 &m_3\end{array}\right)$	`<mrow intent="3j($j1,$j2,$j3,$m1,$m2,$m3)"> <mo>(</mo> <mtable> <mtr> <mtd arg="j1"><msub><mi>j</mi><mn>1</mn></msub></mtd> <mtd arg="j2"><msub><mi>j</mi><mn>2</mn></msub></mtd> <mtd arg="j3"><msub><mi>j</mi><mn>3</mn></msub></mtd> </mtr> <mtr> <mtd arg="m1"><msub><mi>m</mi><mn>1</mn></msub></mtd> <mtd arg="m2"><msub><mi>m</mi><mn>2</mn></msub></mtd> <mtd arg="m3"><msub><mi>m</mi><mn>3</mn></msub></mtd> </mtr> </mtable> <mo>)</mo> </mrow>`
	6j, 9j, ...	similarly
functions	function $A(a,b;z|q)$	`<mrow intent="$op($p1,$p2,$a1,$q)"> <mi arg="op">A</mi> <mo>(</mo> <mi arg="p1">a</mi> <mo>,</mo> <mi arg="p2">b</mi> <mo>;</mo> <mi arg="a1">z</mi> <mo>|</mo> <mi arg="q">q</mi> <mo>)</mo> </mrow>`
	Bessel $J_\nu(z)$	`<mrow intent="$op($nu,$z)"> <msub> <mi arg="op" intent="BesselJ">J</mi> <mi arg="nu">ν</mi> </msub> <mo>(</mo> <mi arg="z">z</mi> <mo>)</mo> </mrow>`
	curried Bessel $J_\nu(z)$	`<mrow intent="$op($nu)($z)"> <msub> <mi arg="op" intent="BesselJ">J</mi> <mi arg="nu">ν</mi> </msub> <mo>(</mo> <mi arg="z">z</mi> <mo>)</mo> </mrow>`
derivatives	$\frac{d^2f}{dx^2}$	`<mfrac intent="derivative($func,$var,$deg)"> <mrow> <msup> <mo>d</mo> <mn>2</mn> </msup> <mi arg="func">f</mi> </mrow> <msup> <mrow> <mo>d</mo> <mi arg="var">x</mi> </mrow> <mn arg="deg">2</mn> </msup> </mfrac>`
integrals	$\int\frac{dr}{r}$	`<mrow intent="$op(divide(1,$r),$bvar)"> <mo arg="op" intent="integral">∫</mo> <mfrac> <mrow> <mi>d</mi> <mi arg="bvar">r</mi> </mrow> <mi arg="r">r</mi> </mfrac> </mrow>`
	One might be tempted to put intent="divide(1,$r)" on the mfrac, but this blocks access to $bvar
continued fractions	$a_0+\displaystyle\frac{1}{a_1+\displaystyle\frac{1}{a_2+\cdots}}$	`<mrow intent="infinite-continued-fraction($a0,$b1,$a1,$b2,$a2)"> <msub arg="a0"><mi>a</mi><mn>0</mn></msub> <mo>+</mo> <mstyle display="true"> <mfrac> <mn arg="b1">1</mn> <mrow> <msub arg="a1"><mi>a</mi><mn>1</mn></msub> <mo>+</mo> <mstyle display="true"> <mfrac> <mn arg="b2">1</mn> <mrow> <msub arg="a2"><mi>a</mi><mn>2</mn></msub> <mo>+</mo> <mo>⋯</mo> </mrow> </mfrac> </mstyle> </mrow> </mfrac> </mstyle> </mrow>`