Authors: Bruce Miller, Deyan Ginev (cribbing ideas from Neil Soiffer, Sam Dooley, David Carlisle)
MathML 3 is a W3C recommendation for representing mathematical expressions. It defines two forms: Presentation MathML describes how the math looks and is the most commonly used; Content MathML aims to completely describe the math’s meaning, and is consequently difficult to produce. The purpose of this note is to develop a means of adding at least minimal semantic information to Presentation MathML to support accessibility.
Mathematical notations are ambiguous. Some notations are used for different purposes and perhaps should be spoken differently. Conversely, different communities occasionally express the same mathematical concept using different notations; perhaps they should be spoken the same. Consider the variety of purposes for which a superscript is employed. Most commonly, one writes $x^2$ or $x^n$ for powers; abstractly, power is applied to 2 arguments. But superscripts are also used for indexing, such as with tensors or matrices $\Phi^i$. Abstractly, a different operation is applied to the same 2 arguments; one needs to know the subject area or topic to know which operator is intended.
It gets worse: sometimes the superscript itself indicates the intended operator: $z^*$, $A^T$ and $A^\dagger$ may signify conjugate, transpose and adjoint operations applied to the base of the superscript. Of course, these operations could also be written in functional notation, $\mathrm{conj}(z)$, $\mathrm{trans}(A)$, $\mathrm{adj}(A)$, so we ought not conflate the meaning “transpose of A” with the specific superscript notation that was used.
But sometimes a cigar is just a cigar. While in some contexts $x'$ might represent the derivative of $x$, in other contexts the $x'$ as a whole simply represents “a different x”, or perhaps “the frobulator”.
Another notational pattern is seen in the somewhat messy markup
commonly used for a binomial coefficient $\binom{n}{m}$,
where the arguments $n$ and $m$ are buried in the markup, stacked with either mfrac or mtable,
and finally wrapped in parentheses.
The fact that very similar markup is also used for Eulerian numbers and for Legendre and Jacobi symbols,
and conversely that other notations, such as $C^n_m$, are used for binomial coefficients,
suggests decoupling the notation from the meaning of the expression.
The common theme is the disambiguation of seemingly identical notations, as well as the variety of patterns that must be recognized by an accessibility agent, which may want to use some combination of the mathematical intent of an expression and the actual notation employed in order to generate effective speech. Here we focus on a minimal annotation of Presentation MathML that could resolve the inherent ambiguities and facilitate more useful accessibility. The focus is not necessarily to achieve the precision necessary for computation.
We propose an attribute, tentatively called “intent”, which describes the semantic intent of a Presentation MathML fragment (the node on which the attribute is given) using a simple prefix notation. This mechanism provides for recursively describing the operator and arguments, as well as ascribing a fixed intent to entire subtrees. The mini-language is specified as follows:
intent ::= literal | selector | intent '(' intent [ ',' intent ]* ')'
literal ::= [letters|digits|_|-]+
selector ::= argref | argpath
argref ::= '$' NCName
argpath ::= '$' [digit]+ [ '/' [digit]+ ]*
where the grammar components have the following significance:
An argref refers to the descendant node whose arg attribute has the given NCName
(or possibly any node in the document with the NCName as id).
An argpath selects a node by position: $2 refers to the 2nd child;
multiple components refer to grandchildren or deeper, so $1/1 would refer to the first grandchild.
The key component here is the literal:
each literal is intended to correspond to some mathematical concept
or operator, or to some application-specific quantity or operation;
that is, it represents some “meaning”.
Ultimately, the goal is to translate this virtual content tree into
text or speech (in different languages), braille or some other form.
The preferred translation may also depend on context and user profile.
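
The grammar above can be read as "a literal or selector, optionally followed by one or more parenthesized argument lists". The following is a minimal sketch of a parser under that reading, not a reference implementation. The function names, the tuple representation of the tree, and the ASCII-only approximation of NCName are all assumptions made for illustration.

```python
import re

# Simplified token pattern; the NCName rule is approximated by an ASCII subset.
TOKEN = re.compile(r"""\s*(?:
      (?P<argpath>\$\d+(?:/\d+)*)        # $2 or $1/1
    | (?P<argref>\$[A-Za-z_][\w.-]*)     # $base
    | (?P<literal>[A-Za-z0-9_-]+)        # power, x-prime, 2
    | (?P<punct>[(),])
    )""", re.VERBOSE)


def tokenize(text):
    """Split an intent string into (kind, value) pairs."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


def _parse(tokens, pos):
    """Parse one intent starting at tokens[pos]; return (tree, next_pos)."""
    if pos >= len(tokens) or tokens[pos][0] == "punct":
        raise ValueError("expected a literal or selector")
    kind, value = tokens[pos]
    if kind == "argpath":
        node = ("argpath", [int(part) for part in value[1:].split("/")])
    elif kind == "argref":
        node = ("argref", value[1:])
    else:
        node = ("literal", value)
    pos += 1
    # Each following '(' ... ')' applies the tree built so far to its arguments.
    while pos < len(tokens) and tokens[pos] == ("punct", "("):
        pos += 1
        args = []
        while True:
            arg, pos = _parse(tokens, pos)
            args.append(arg)
            if pos < len(tokens) and tokens[pos] == ("punct", ","):
                pos += 1
            else:
                break
        if pos >= len(tokens) or tokens[pos] != ("punct", ")"):
            raise ValueError("expected ')'")
        pos += 1
        node = ("apply", node, args)
    return node, pos


def parse_intent(text):
    """Parse a complete intent attribute value into a nested tuple tree."""
    tree, pos = _parse(tokenize(text), 0)
    if pos != len(tokenize(text)):
        raise ValueError("unexpected trailing input")
    return tree


print(parse_intent("power($base,$exp)"))
```

Because the grammar requires at least one argument between parentheses and between commas, this sketch rejects empty arguments with a `ValueError`.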
On the one hand, the most natural translation of a given expression
can depend on the operator. It is thus desirable to have a set of
commonly known meanings with translation rules.
Thus power(x,2)
might be rendered as “the 2nd power of x” or “x squared”,
depending on the user and agent.
On the other hand, mathematics encompasses an endless set of concepts, arguing that the set of meanings must be open-ended. We propose that there should be a small set of recommended meaning keywords whose translation can be specialized, while allowing any value for the meaning (an implementation is free to recognize more keywords). In the case where a meaning is not in a dictionary, support for multiple languages may be weak and the application of such a literal would be rendered generically, as “the [meaning] of [arg1], [arg2] and [argn] …”. While this result may be less than optimal, it is still functional.
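
The dictionary-plus-fallback idea can be sketched as follows. This is only an illustration under stated assumptions: the dictionary entries, the English templates, and the representation of the content tree as a string leaf or a `(meaning, [args])` pair are all hypothetical and not part of the proposal.

```python
# Hypothetical dictionary: (meaning, arity) -> speech template. Entries are illustrative.
KNOWN_MEANINGS = {
    ("power", 2): "{0} to the power {1}",
    ("transpose", 1): "the transpose of {0}",
    ("factorial", 1): "{0} factorial",
}


def join_args(parts):
    """Join spoken arguments as 'a', 'a and b', or 'a, b and c'."""
    if len(parts) < 2:
        return "".join(parts)
    return ", ".join(parts[:-1]) + " and " + parts[-1]


def speak(tree):
    """Render a content tree: a string leaf or a (meaning, [args]) pair."""
    if isinstance(tree, str):
        return tree
    meaning, args = tree
    spoken = [speak(arg) for arg in args]
    template = KNOWN_MEANINGS.get((meaning, len(spoken)))
    if template is not None:
        return template.format(*spoken)
    # Generic fallback for meanings that are not in the dictionary.
    return f"the {meaning} of {join_args(spoken)}"


print(speak(("power", ["x", "n"])))               # x to the power n
print(speak(("frobulator", ["a", "b", "c"])))     # the frobulator of a, b and c
```

Keying the dictionary on both the meaning and the number of arguments lets an unexpected arity fall through to the generic reading instead of failing.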
With this model, the msup examples
$x^n$ as a power;
$A^T$ as a transpose;
$f'$ as a derivative;
and $x'$ as an embellished symbol
would be distinguished as follows:
<msup intent="power($base,$exp)">
<mi arg="base">x</mi>
<mi arg="exp">n</mi>
</msup>
<msup intent="$op($a)">
<mi arg="a">A</mi>
<mi arg="op" intent="transpose">T</mi>
</msup>
(easily adapted to conjugate and adjoint)
<msup intent="derivative($a)">
<mi arg="a">f</mi>
<mo>'</mo>
</msup>
<msup intent="x-prime">
<mi>x</mi>
<mo>'</mo>
</msup>
Use of selectors allows different ways to express the same result, such as:
<msup intent="transpose(A)">
<mi>A</mi>
<mi>T</mi>
</msup>
<msup intent="transpose($a)">
<mi arg="a">A</mi>
<mi>T</mi>
</msup>
<msup intent="$mop($marg)">
<mi arg="marg">A</mi>
<mi arg="mop" intent="transpose">T</mi>
</msup>
<msup intent="transpose($1)">
<mi>A</mi>
<mi>T</mi>
</msup>
At one extreme, one could simply encode the entire intent on the math
element.
However, liberal use of selectors presumably would give the user agent
more leeway for highlighting, navigation, etc.
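
To make the equivalence of these variants concrete, here is a rough sketch that resolves selectors against the MathML tree and builds a content tree. It assumes un-namespaced MathML, treats an element without an intent as its text content, and searches only descendants for arg names. The function names and the `(operator, [args])` tree shape are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

TOKEN = re.compile(r"\s*(\$\d+(?:/\d+)*|\$[A-Za-z_][\w.-]*|[A-Za-z0-9_-]+|[(),])")
PUNCT = ("(", ")", ",")


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"bad intent syntax at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def select(node, token):
    """Find the element that a selector token ($name or $1/2) points to."""
    if token[1].isdigit():
        for index in token[1:].split("/"):
            node = node[int(index) - 1]      # argpath steps are 1-based
        return node
    for candidate in node.iter():
        if candidate is not node and candidate.get("arg") == token[1:]:
            return candidate
    raise KeyError(token)


def build(node, tokens, pos):
    """Build the content tree for the intent starting at tokens[pos]."""
    if pos >= len(tokens) or tokens[pos] in PUNCT:
        raise ValueError("expected a literal or selector")
    token = tokens[pos]
    head = content(select(node, token)) if token.startswith("$") else token
    pos += 1
    while pos < len(tokens) and tokens[pos] == "(":
        pos += 1
        args = []
        while True:
            arg, pos = build(node, tokens, pos)
            args.append(arg)
            if pos < len(tokens) and tokens[pos] == ",":
                pos += 1
            else:
                break
        if pos >= len(tokens) or tokens[pos] != ")":
            raise ValueError("expected ')'")
        pos += 1
        head = (head, args)
    return head, pos


def content(node):
    """Content tree of an element: its resolved intent, else its own text."""
    intent = node.get("intent")
    if intent is None:
        return (node.text or "").strip()
    tokens = tokenize(intent)
    tree, pos = build(node, tokens, 0)
    if pos != len(tokens):
        raise ValueError("unexpected trailing input in intent")
    return tree


variants = [
    '<msup intent="transpose(A)"><mi>A</mi><mi>T</mi></msup>',
    '<msup intent="transpose($a)"><mi arg="a">A</mi><mi>T</mi></msup>',
    '<msup intent="$mop($marg)"><mi arg="marg">A</mi>'
    '<mi arg="mop" intent="transpose">T</mi></msup>',
    '<msup intent="transpose($1)"><mi>A</mi><mi>T</mi></msup>',
]
trees = [content(ET.fromstring(markup)) for markup in variants]
print(trees[0])                                  # ('transpose', ['A'])
print(all(tree == trees[0] for tree in trees))   # True
```

In this sketch the selector forms keep a link from each part of the content tree back to an element, which is the kind of information a user agent could use for highlighting or navigation.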
A binomial would be marked up as:
<mrow intent="binomial($n,$m)">
<mo>(</mo>
<mfrac thickness="0pt">
<mi arg="n">n</mi>
<mi arg="m">m</mi>
</mfrac>
<mo>)</mo>
</mrow>
However, other notations for binomial are in use, such as:
<msubsup intent="$op($n,$m)">
<mi arg="op" intent="binomial">C</mi>
<mi arg="n">n</mi>
<mi arg="m">m</mi>
</msubsup>
Or as some would prefer
<msubsup intent="$op($n,$m)">
<mi arg="op" intent="binomial">C</mi>
<mi arg="m">m</mi>
<mi arg="n">n</mi>
</msubsup>
Each of the above binomials has the same semantic content, and (presumably) would generate the same translation.
A row of infix expressions with multiple operators may seem to be a special case,
depending on how mrow is used, but basically it forces the annotator
to specify the exact nesting and precedence of operators:
<mrow intent="$p($a,$b,$m($c),$d)">
<mi arg="a">a</mi>
<mo arg="p" intent="plus">+</mo>
<mi arg="b">b</mi>
<mo arg="m" intent="minus">-</mo>
<mi arg="c">c</mi>
<mo>+</mo>
<mi arg="d">d</mi>
</mrow>
Such expressions can be annotated whether the markup
is rich or poor in mrows; one only has to annotate at the appropriate
level in the tree.
A variety of infix operators ($\times$, $\cdot$, etc) are handled by adding the appropriate meaning to each.
Prefix and postfix operators are simple, provided they are contained
within an mrow:
<mrow intent="$op($a)">
<mi arg="a">n</mi>
<mo arg="op" intent="factorial">!</mo>
</mrow>
It is a little less simple, but still possible, if they are not:
<mrow intent="$p($a,$f($b))">
<mi arg="a">a</mi>
<mo arg="p" intent="plus">+</mo>
<mi arg="b">b</mi>
<mo arg="f" intent="factorial">!</mo>
</mrow>
“Defaulting” is based on the idea that certain notations are almost always used for the same purpose in a given subject area, and that therefore an author ought not need to tediously annotate every single instance; the system should recognize the usage pattern and generate the intent annotation itself. This document does not directly address the subject of defaulting, although the intent model presented here may very well serve as the target language for a defaulting preprocessor (or processing phase).
However, the possibility of a defaulting process does have implications for how one should annotate. A node without an explicitly given intent would likely be a candidate for a default intent based on some rule set. In a field where a prime is commonly used to denote derivatives, the annotator must make sure to mark those primes that do not denote derivatives, such as the $x'$ example above.
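
As a rough illustration of this interaction, the sketch below applies a single hypothetical default rule: an msup without an intent whose superscript is a prime receives `derivative($1)`. The rule, the function name, and the assumption of un-namespaced MathML are all invented for the example; this document does not define any defaulting rules.

```python
import xml.etree.ElementTree as ET

# Characters treated as a prime by this illustrative rule.
PRIMES = {"'", "\u2032"}


def apply_prime_default(root):
    """Annotate unannotated msup nodes whose superscript is a prime.

    Nodes that already carry an intent are left untouched, so an explicit
    annotation such as intent="x-prime" overrides the default.
    Returns the number of nodes that were annotated.
    """
    changed = 0
    for msup in root.iter("msup"):
        if msup.get("intent") is not None or len(msup) != 2:
            continue
        superscript = msup[1]
        is_leaf = len(superscript) == 0
        if is_leaf and (superscript.text or "").strip() in PRIMES:
            msup.set("intent", "derivative($1)")
            changed += 1
    return changed


doc = ET.fromstring(
    "<math>"
    "<msup><mi>f</mi><mo>'</mo></msup>"
    '<msup intent="x-prime"><mi>x</mi><mo>\'</mo></msup>'
    "<msup><mi>x</mi><mi>n</mi></msup>"
    "</math>"
)
print(apply_prime_default(doc))                           # 1
print([msup.get("intent") for msup in doc.iter("msup")])  # ['derivative($1)', 'x-prime', None]
```

Only the first msup is changed here: the second is protected by its explicit intent, and the third has no prime.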
We have proposed a simple intent attribute to annotate Presentation MathML with disambiguating information that can be used to support accessibility. The design is open-ended. Further topics that will need to be addressed are:
This section demonstrates how various awkward constructs might be annotated.
The literals used below are for illustrative purposes only, and are not the final proposal. Choosing such literals must be done in the context of some dictionary of standard keywords. Ultimately, we will need naming conventions that would yield sensible dictionary keywords while providing natural readings for concepts not in the dictionary. Indeed, these two concerns may be in conflict.
Derivatives are an interesting case;
certainly common and important enough to be in the dictionary.
But there are many subcases:
the variable of differentiation is sometimes implied (e.g. when primes are used);
the degree should default to 1;
the differentiation may be over multiple variables.
The functional notation for intent proposed here suggests positional arguments to a
derivative literal, so that, for example, derivative(f,,n)
would stand for the $n$th derivative of $f$ with respect to an implied variable.
Undoubtedly, some will dislike this approach.
| Notation | Description | Code |
|---|---|---|
| infix | arithmetic $a+b-c+d$ | |
| | inner product $\mathbf{a}\cdot\mathbf{b}$ | |
| | Easily extended to other operators and meanings: cross-product, "by", etc. | |
| prefix | negation $-a$ | |
| | Laplacian $\nabla^2 f$ | |
| postfix | factorial $n!$ | |
| sup | power $x^n$ | |
| | iterated function $f^n$ ($=f(f(...f))$) | |
| | inverse $\sin^{-1}$ | |
| | $n$-th derivative $f^{(n)}$ | |
| sub | indexing $a_i$ | |
| sup-operator | transpose $A^T$ | |
| | Compare to $\mathrm{trans}(A)$ | |
| | Or the function $\mathrm{trans}$ | |
| | adjoint $A^\dagger$ | |
| | $2$-nd derivative $f''$ | |
| Awkward nesting | $x'_i$ | |
| | or maybe | |
| | $\overline{x}_i$ being midpoint of $x_i$ | |
| | Versus: $\overline{x}_i$ being ith element of $\overline{x}$ | |
| base-operator | binomial $C^n_m$ | |
| fenced | absolute value $\vert x\vert$ | |
| | norm $\vert\mathbf{x}\vert$ | |
| | determinant $\vert\mathbf{X}\vert$ | |
| | sequence $\lbrace a_n\rbrace$ | |
| | open interval $(a,b)$ | |
| | open interval $]a,b[$ | |
| | closed, open-closed, etc. similarly | |
| | inner product $\left<\mathbf{a},\mathbf{b}\right>$ | |
| | Legendre symbol $(n\vert p)$ | |
| | Jacobi symbol | similarly |
| | Clebsch-Gordan $(j_1 m_1 j_2 m_2 \vert j_1 j_2 j_3 m_3)$ | |
| fenced-sub | Pochhammer $\left(a\right)_n$ | |
| fenced-stacked | binomial $\binom{n}{m}$ | |
| | multinomial $\binom{n}{m_1,m_2,m_3}$ | |
| | ??? punctuation separates the several arguments? | |
| | Eulerian numbers $\left< n \atop k \right>$ | |
| fenced-table | 3j symbol $\left(\begin{array}{ccc}j_1& j_2 &j_3 \\ m_1 &m_2 &m_3\end{array}\right)$ | |
| | 6j, 9j, ... | similarly |
| functions | function $A(a,b;z\vert q)$ | |
| | Bessel $J_\nu(z)$ | |
| | curried Bessel $J_\nu(z)$ | |
| derivatives | $\frac{d^2f}{dx^2}$ | |
| integrals | $\int\frac{dr}{r}$ | |
| | One might be tempted to put intent="divide(1,$r)" on the mfrac, but this blocks access to $bvar | |
| continued fractions | $a_0+\displaystyle\frac{1}{a_1+\displaystyle\frac{1}{a_2+\cdots}}$ | |