Author: Neil Soiffer

Motivation

These ideas were motivated by a comment by Deyan Ginev at the end of the June 18 MathML meeting. He had some reservations about Bruce Miller’s proposal; maybe he had the following in mind or maybe something different. His concern was that notation didn’t capture the “tail” of what people do. This proposal is more flexible and I believe simpler than Bruce’s proposal in that there is no need to categorize abstractions. It does crib many ideas from Bruce’s thinking.

This document is about an approach to adding semantic markup to Presentation MathML. It is not meant to motivate or justify why semantic markup of Presentation MathML is desirable. However, here is a reminder of the two communities this affects:

Authors – most MathML is generated by tools. Three subgroups might be concerned with semantics:
- WYSIWYG math editors – these tools contain some templates that might be considered semantic (and hence generate some semantic markup), but most WYSIWYG editors focus on how things look, not what they mean. In the future, they might allow authors to add semantic meaning.
- TeX and other translators – TeX authors often make use of macros such as “\binom” to simplify markup. These macros sometimes carry semantic meaning and that meaning can become part of the translation to MathML. PreTeXt is an example of a macro package specifically designed to encourage semantic authoring in TeX.
- Human remediation – accessibility is important to various content producers such as textbook publishers. If a tool generates MathML that reads incorrectly because the reading software doesn’t understand the meaning, then the MathML might need to be marked up by hand to add semantics. Hence, any solution needs to be simple enough for hand authoring.
Consumers – MathML is used in Web pages and is also understood by most mathematical software. When MathML is read by assistive technology, the closer the words it uses are to those used by humans (in particular, teachers), the easier the speech is to understand. To do a good job of reading, the assistive technology needs to know the meaning. For example, $|x|$ could be read as “absolute value of x” or as “determinant of x”; reading it incorrectly can be very confusing to the listener. Mathematical software must likewise choose how to interpret the expression if it is to compute or plot it.

The Basics of Functional Semantic Markup

I’ll start with two simple examples: transpose and binomial coefficient.

$A^T$:

<msup intent="transpose(@matrix)">
  <mi arg="matrix">A</mi>
  <mi>T</mi>
</msup>

$\binom{n}{m}$

<mrow intent="binomial(@n, @m)">
  <mo>(</mo>
  <mfrac linethickness="0pt">
    <mi arg="n">n</mi>
    <mi arg="m">m</mi>    
  </mfrac>
  <mo>)</mo>
</mrow>

The idea is that the intent attribute names a function and its arguments. @xxx means “find the argument whose arg attribute has the value xxx and replace @xxx with it”.

Many notations such as the transpose notation are simple, so this proposal has an alternative method of markup that avoids some work: numbered arguments.

$A^T$:

<msup intent="transpose(@0)">
  <mi>A</mi>
  <mi>T</mi>
</msup>

Here, the number ‘0’ refers to the first child (counting from 0) of the element that carries the intent attribute. Selecting the ith child extends to arbitrary descendants by repeating the step. For example:

$\binom{n}{m}$

<mrow intent="binomial(@1@0, @1@1)">
  <mo>(</mo>
  <mfrac linethickness="0pt">
    <mi>n</mi>
    <mi>m</mi>    
  </mfrac>
  <mo>)</mo>
</mrow>
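
The numbered form can be read as a path of child indices. The following is a minimal Python sketch of that reading, assuming the 0-based counting used in this note and un-namespaced MathML parsed with the standard-library xml.etree.ElementTree. The helper name resolve_numbered is hypothetical and not part of any proposal.

```python
import re
import xml.etree.ElementTree as ET

def resolve_numbered(element, ref):
    """Follow a reference such as '@1@0': each @<digits> step picks a child (0-based)."""
    steps = re.findall(r"@(\d+)", ref)
    # Reject anything that is not purely a chain of @<digits> steps.
    if not steps or "".join("@" + s for s in steps) != ref:
        raise ValueError(f"not a numbered reference: {ref!r}")
    node = element
    for step in steps:
        node = node[int(step)]  # raises IndexError if the child does not exist
    return node

mathml = """<mrow intent="binomial(@1@0, @1@1)">
  <mo>(</mo>
  <mfrac linethickness="0pt"><mi>n</mi><mi>m</mi></mfrac>
  <mo>)</mo>
</mrow>"""
root = ET.fromstring(mathml)
top = resolve_numbered(root, "@1@0")
bottom = resolve_numbered(root, "@1@1")
print(top.text, bottom.text)  # n m
```

Because each step is a plain child index, any tool that inserts or removes a child before the target changes what the reference selects.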

Pros and Cons

Although the arg attribute has similar functionality to id, it has a big advantage over using id in that it does not need to be globally unique. This makes it easier to generate and easier to reuse in a web page versus id.

Note: it is possible that we might be able to reuse an existing HTML attribute instead of creating a new one (arg). On the call, David Carlisle suggested that maybe we could use name. In this note, I’ll use arg.

Numbered attribute values avoid the need for adding an arg attribute and are therefore less work to author by hand. Their disadvantage is that they are fragile – if a polyfill (such as one that replaces mfenced for MathML Core) changes the document, the location of the children might change. A good polyfill would hopefully at least preserve the non-mfenced attributes of the mfenced element by transferring them to the corresponding mrow, so the named approach would still work. Two other potential polyfills are a canonicalization polyfill that “fixes” the mrow structure and a line breaking polyfill that creates new (indented) lines. Both would likely alter the number of children in an mrow.

David Carlisle pointed out that numbered attributes are even more fragile than mentioned above because some normalization may add or remove an mrow. E.g., the contents of msqrt may be a single mrow with children or may omit that mrow. If the mrow has only one child, MathML defines it to be equivalent to just having that child. David also suggests using 1-based counting if we do use numbered attributes because that is what XPath and CSS selectors use.

Conversely, named arguments require more manual markup. Supporting both allows authors to choose what works best for their use case.

Some Details

This idea is not fully fleshed out, but some things can be clarified:

The intent attribute can take any value, but it is likely only some values will be listed as “known”. Typically the value will be either a string representing a constant (e.g., “EulerNumber”) or a string representing a function with arguments as in the examples above.
The arguments to a function have one of the forms:
@<digits>+ or @<letter><alphaChars>* – if digits, then it refers to the ith child (0-based) of the element with the intent attribute. If it starts with a letter, then it refers to the value of an arg attribute. If more than one @ is present, each refers to a child of the match of the previous @.
@@<letter><alphaChars>* – nary match. All children are searched instead of stopping at the first match.
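
The argument forms above can be recognized with a small parser. The following Python sketch is only an illustration: the names ArgRef and parse_ref are hypothetical, and it assumes that a name is a letter followed by any number of letters, digits, hyphens, or underscores, so that @n and @arg1 from the examples are accepted.

```python
import re
from dataclasses import dataclass

# One step of a reference: '@' + digits (child index) or '@' + name (arg value).
# Assumption: a name is a letter followed by letters, digits, '-' or '_'.
STEP = re.compile(r"@(?:(\d+)|([A-Za-z][A-Za-z0-9_-]*))")

@dataclass
class ArgRef:
    nary: bool   # True for the '@@name' form
    steps: list  # ints for numbered steps, strs for named steps

def parse_ref(text):
    """Parse one argument reference such as '@1@0', '@matrix' or '@@all'."""
    nary = text.startswith("@@")
    rest = text[1:] if nary else text
    steps, pos = [], 0
    while pos < len(rest):
        m = STEP.match(rest, pos)
        if m is None:
            raise ValueError(f"malformed reference: {text!r}")
        steps.append(int(m.group(1)) if m.group(1) is not None else m.group(2))
        pos = m.end()
    if not steps:
        raise ValueError(f"empty reference: {text!r}")
    if nary and not (len(steps) == 1 and isinstance(steps[0], str)):
        raise ValueError(f"'@@' is only defined for a single name: {text!r}")
    return ArgRef(nary, steps)

print(parse_ref("@1@0"))   # ArgRef(nary=False, steps=[1, 0])
print(parse_ref("@@all"))  # ArgRef(nary=True, steps=['all'])
```

Keeping numbered and named steps in one list lets a chained reference mix the two, which the description of multiple @s appears to allow.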

It is probably possible to extend the nary notation to work with a number also, but I’m less sure of that. E.g., maybe intent=set(@1,@@2) could mean match the second child, then continue matching all siblings that are offset by two from that. Maybe a slightly different “@>2” would make more sense. Potentially multiple nary picks could be given and the pattern repeated until the children of the element are exhausted. I don’t have a use case for that though. It’s probably best to keep things simple. We should not duplicate XPath or CSS selectors – “that way madness lies”. Named arguments work for nary operators, so there is no pressing need to complicate the simple indexing of children with numbers.

David Carlisle has a proposal that involves gathering up all the non-mo elements and making them arguments to the operator. That suggests that the @@ notation be modified to do the same thing when it is given no argument name. That would allow the following short version of dot product ($\mathbf{a}\cdot\mathbf{b}$) to work without having to use the potentially fragile numbering notation or tag the arguments:

<mrow intent="inner-product(@@)">
  <mi mathvariant="bold">a</mi>
  <mo>&#x22C5;</mo>
  <mi mathvariant="bold">b</mi>
</mrow>
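
One way to read the bare @@ is “every child that is not an mo”. The Python sketch below illustrates that reading only; the helper name operands is hypothetical, and it assumes un-namespaced MathML and leaf operands whose text can be substituted directly.

```python
import xml.etree.ElementTree as ET

def operands(element):
    """Return the children that are not <mo>, in document order."""
    return [child for child in element if child.tag != "mo"]

mathml = """<mrow intent="inner-product(@@)">
  <mi mathvariant="bold">a</mi>
  <mo>&#x22C5;</mo>
  <mi mathvariant="bold">b</mi>
</mrow>"""
root = ET.fromstring(mathml)
names = [child.text for child in operands(root)]
# Substitute the gathered operands for the bare '@@'.
intent = root.get("intent").replace("@@", ", ".join(names))
print(intent)  # inner-product(a, b)
```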

In the case of a named argument, the children are searched depth first. The search stops when:

The element has an arg attribute and the value matches. The element is then searched for a secondary @ arg; if there are no more @ args, the element is returned. If there is no match, the search continues on the next sibling of the current element.
An intent attribute is found. The search continues on the next sibling of the current element. Note that the arg attribute has already been checked for a match.

In step one, if an nary parameter is being matched, the search would continue on the next sibling instead of stopping.

Step two is to prevent searching children of another notation and finding a match there. See the Nested Notation section below for an example.

A few things to note:

although depth first search might needlessly search deep down the tree, non-matching nodes are very likely to be leaf elements like fences or operators, so very little time will be wasted.
potentially all the arguments could be searched for at once, with the matches returned as a dictionary.
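
The one-pass dictionary idea can be sketched as follows. This is an illustration in Python, not a definitive algorithm: the function name collect_args is hypothetical, and the code adopts one reading of the search rules, namely that the search does not look inside an element that carries its own arg or intent attribute.

```python
import xml.etree.ElementTree as ET

def collect_args(element, found=None):
    """Map each arg name to the elements that carry it, in document order."""
    if found is None:
        found = {}
    for child in element:
        name = child.get("arg")
        if name is not None:
            # Tagged element: record it and do not look inside it.
            found.setdefault(name, []).append(child)
        elif child.get("intent") is None:
            # Plain element: keep searching its children.
            collect_args(child, found)
        # Untagged element with its own intent: its subtree is skipped.
    return found

mathml = """<msup intent="nth-derivative-implicit-variable(@function, @n)">
  <mi arg="function">f</mi>
  <mrow> <mo>(</mo> <mi arg="n">n</mi> <mo>)</mo> </mrow>
</msup>"""
args = collect_args(ET.fromstring(mathml))
print({name: [e.text for e in elems] for name, elems in args.items()})
# {'function': ['f'], 'n': ['n']}
```

Under this reading, a single @name would take the first element of a list and @@name would take the whole list.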

Examples of notations

Here are Bruce’s examples with this new proposal’s markup. Some use numbers and others use names; the choice was arbitrary. Neither Bruce nor I (the names are copied from Bruce’s) endorse the specific meaning/notation names used; they are merely meant to be suggestive of the meaning.

Description	Code	XPath version
nary (discussed later) $a+b-c+d$	`<mrow intent="@@all"> <mi arg="all">a</mi> <mo arg="all">+</mo> <mi arg="all">b</mi> <mo arg="all">-</mo> <mi arg="all">c</mi> <mo arg="all">+</mo> <mi arg="all">d</mi> </mrow>`
dot product $\mathbf{a}\cdot\mathbf{b}$	`<mrow intent="inner-product(@0, @2)"> <mi mathvariant="bold">a</mi> <mo>⋅</mo> <mi mathvariant="bold">b</mi> </mrow>`	intent="inner-product(//1., //3)"
negation $-a$	`<mrow intent="unary-minus(@operand)"> <mo>-</mo> <mi arg="operand">a</mi> </mrow>`	intent="unary-minus(//*[@arg="operand"])"
Laplacian $\nabla^2 f$	`<mrow intent="compose(@laplacian, @function)"> <msup arg="laplacian" intent="laplacian(@1)"> <mi>∇</mi> <mn>2</mn> </msup> <mi arg="function">f</mi> </mrow>`
factorial $n!$	`<mrow intent="factorial(@0)"> <mi>n</mi> <mo>!</mo> </mrow>`
power $x^n$	`<msup intent="power(@base,@exp)"> <mi arg="base">x</mi> <mi arg="exp">n</mi> </msup>`
repeated application $f^n$ ($=f(f(...f))$)	`<msup intent="applicative-power(@0,@1)"> <mi>f</mi> <mi>n</mi> </msup>`
inverse $\sin^{-1}$	`<msup intent="applicative-power(@0,@1)"> <mi>sin</mi> <mn>-1</mn> </msup>`
$n$-th derivative $f^{(n)}$	`<msup intent="nth-derivative-implicit-variable(@function, @n)"> <mi arg="function">f</mi> <mrow> <mo>(</mo> <mi arg="n">n</mi> <mo>)</mo> </mrow> </msup>`
indexing $a_i$	`<msub intent="index(@0,@1)"> <mi>a</mi> <mi>i</mi> </msub>`
transpose $A^T$	`<msup intent="transpose(@0)"> <mi>A</mi> <mi>T</mi> </msup>`
adjoint $A^\dagger$	`<msup intent="adjoint(@0)"> <mi>A</mi> <mi>&dagger;</mi> </msup>`
$2$-nd derivative $f''$	`<msup intent="2nd-derivative-implicit-variable(@0)"> <mi>f</mi> <mo>''</mo> </msup>`
binomial $C^n_m$	`<msubsup intent="binomial(@1, @2)"> <mi>C</mi> <mi>m</mi> <mi>n</mi> </msubsup>`
absolute value $|x|$	`<mrow intent="absolute-value(@1)"> <mo>|</mo> <mi>x</mi> <mo>|</mo> </mrow>`
norm $|\mathbf{x}|$	`<mrow intent="norm(@1)"> <mo>|</mo> <mi mathvariant="bold">x</mi> <mo>|</mo> </mrow>`
determinant $|\mathbf{X}|$	`<mrow intent="determinant(@matrix)"> <mo>|</mo> <mi mathvariant="bold" arg="matrix">X</mi> <mo>|</mo> </mrow>`
sequence $\lbrace a_n\rbrace$	`<mrow intent="sequence(@base, @index)"> <mo>{</mo> <msub> <mi arg="base">a</mi> <mi arg="index">n</mi> </msub> <mo>}</mo> </mrow>`
open interval $(a,b)$	`<mrow intent="open-interval(@start, @end)"> <mo>(</mo> <mi arg="start">a</mi> <mo>,</mo> <mi arg="end">b</mi> <mo>)</mo> </mrow>`
open interval $]a,b[$	`<mrow intent="open-interval(@start, @end)"> <mo>]</mo> <mi arg="start">a</mi> <mo>,</mo> <mi arg="end">b</mi> <mo>[</mo> </mrow>`
	closed, open-closed, etc. similarly
inner product $\left<\mathbf{a},\mathbf{b}\right>$	`<mrow intent="inner-product(@arg1, @arg2)"> <mo>&lt;</mo> <mi mathvariant="bold" arg="arg1">a</mi> <mo>,</mo> <mi mathvariant="bold" arg="arg2">b</mi> <mo>&gt;</mo> </mrow>`
Legendre symbol $(n|p)$	`<mrow intent="Legendre-symbol(@arg1, @arg2)"> <mo>(</mo> <mi arg="arg1">n</mi> <mo>|</mo> <mi arg="arg2">p</mi> <mo>)</mo> </mrow>`
Jacobi symbol	similarly
Clebsch-Gordan $(j_1 m_1 j_2 m_2 | j_1 j_2 j_3 m_3)$	`<mrow intent="Clebsch-Gordan([@1,@2,@3,@4], [@6,@7,@8,@9])"> <mo>(</mo> <msub><mi>j</mi><mn>1</mn></msub> <msub><mi>m</mi><mn>1</mn></msub> <msub><mi>j</mi><mn>2</mn></msub> <msub><mi>m</mi><mn>2</mn></msub> <mo>|</mo> <msub><mi>j</mi><mn>1</mn></msub> <msub><mi>j</mi><mn>2</mn></msub> <msub><mi>j</mi><mn>3</mn></msub> <msub><mi>m</mi><mn>3</mn></msub> <mo>)</mo> </mrow>`
Pochhammer $\left(a\right)_n$	`<msub intent="Pochhammer(@x, @n)"> <mrow> <mo>(</mo> <mi arg="x">a</mi> <mo>)</mo> </mrow> <mi arg="n">n</mi> </msub>`
binomial $\binom{n}{m}$	`<mrow intent="binomial(@n, @m)"> <mo>(</mo> <mfrac linethickness="0pt"> <mi arg="n">n</mi> <mi arg="m">m</mi> </mfrac> <mo>)</mo> </mrow>`
multinomial $\binom{n}{m_1,m_2,m_3}$	`<mrow intent="multinomial(@n, [@k1, @k2, @k3])"> <mo>(</mo> <mfrac linethickness="0pt"> <mi arg="n">n</mi> <mrow> <msub arg="k1"><mi>m</mi><mn>1</mn></msub> <mo>,</mo> <msub arg="k2"><mi>m</mi><mn>2</mn></msub> <mo>,</mo> <msub arg="k3"><mi>m</mi><mn>3</mn></msub> </mrow> </mfrac> <mo>)</mo> </mrow>`
Eulerian numbers $\left< n \atop k \right>$	`<mrow intent="Eulerian-numbers(@n, @k)"> <mo>&lt;</mo> <mfrac linethickness="0pt"> <mi arg="n">n</mi> <mi arg="k">k</mi> </mfrac> <mo>&gt;</mo> </mrow>`
3j symbol $\left(\begin{array}{ccc}j_1& j_2 &j_3 \\ m_1 &m_2 &m_3\end{array}\right)$	`<mrow intent="3j([@j1,@j2,@j3], [@m1,@m2,@m3])"> <mo>(</mo> <mtable> <mtr> <mtd><msub arg="j1"><mi>j</mi><mn>1</mn></msub></mtd> <mtd><msub arg="j2"><mi>j</mi><mn>2</mn></msub></mtd> <mtd><msub arg="j3"><mi>j</mi><mn>3</mn></msub></mtd> </mtr> <mtr> <mtd><msub arg="m1"><mi>m</mi><mn>1</mn></msub></mtd> <mtd><msub arg="m2"><mi>m</mi><mn>2</mn></msub></mtd> <mtd><msub arg="m3"><mi>m</mi><mn>3</mn></msub></mtd> </mtr> </mtable> <mo>)</mo> </mrow>`
6j, 9j, ...	similarly

The Good, the bad, and the lost in the fog

Some further remarks…

Two masters

I think we have all been focused on getting semantics out and figured conversion to Content MathML, Speech, Braille, and anything else would just follow. However, here are two cases where speech doesn’t necessarily flow from a function-based version of semantics:

Transpose can be written as $A^T$ or ${}^T\!A$ or as $T(A)$. All would have the value transpose(A) in the above scheme. But it is likely we would want to speak the first as “A transpose”, the second as “transpose A” and the third as “the transpose of A”.
Infix notation seems simple: grab the operands and name the function after the operator, as in plus(a,b,c). However, “a-b+c” is problematic because there are two operators: + and -. Computation systems typically solve this by using a unary minus as in plus(a, times(-1, b), c). The exact same representation would be used for “a+-b+c”, but speech needs to distinguish these two forms. Also, without “good” mrow structure, operators tend to be mixed (e.g., $2x+1$ all in one mrow). This isn’t a problem for speech or braille, but it is one for conversion to Content MathML and computation systems.

Having targets with different needs is a problem for both Bruce’s proposal and this proposal. Potentially the speech problem is solved using the “hack” in the $a+b-c+d$ example above, where both the operands and operators are returned. It isn’t good for conversion to Content MathML though.

Nested notations

All the examples above were “simple” examples in that an intent attribute occurred only once (the Laplacian example is the one exception). Arguably, the examples with subscripted variables such as Clebsch-Gordan should have tagged the msub, but I just followed Bruce’s example.

Here’s an example of nesting $\binom{n^2}{m}$ where both notations use the same argument names:

<mrow intent="binomial(@arg1, @arg2)">
  <mo>(</mo>
  <mfrac linethickness="0pt">
    <msup intent="power(@arg1,@arg2)" arg='arg1'>
      <mi arg="arg1">n</mi>
      <mn arg="arg2">2</mn>
    </msup>
    <mi arg="arg2">m</mi>    
  </mfrac>
  <mo>)</mo>
</mrow>

Because the search for the arguments to “binomial” stops when the intent attribute is found on the msup, the search for its “arg2” will not find the “2” and will instead properly find the “m”.
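
The nested lookup can be traced with a short recursive expansion. The Python sketch below is a rough illustration under stated assumptions: the helper names find_arg and expand are hypothetical, the outer intent is taken to be binomial(@arg1, @arg2) as the explanation above describes, only named references are handled, and the search does not descend into an element that carries its own arg or intent attribute (one reading of the search rules).

```python
import re
import xml.etree.ElementTree as ET

def find_arg(element, name):
    """Depth-first search for arg=name, skipping subtrees of tagged elements."""
    for child in element:
        arg = child.get("arg")
        if arg == name:
            return child
        if arg is None and child.get("intent") is None:
            hit = find_arg(child, name)
            if hit is not None:
                return hit
    return None

def expand(element):
    """Replace each @name in the element's intent by the expansion of its target."""
    intent = element.get("intent")
    if intent is None:
        return (element.text or "").strip()  # leaf elements only in this sketch

    def substitute(match):
        target = find_arg(element, match.group(1))
        if target is None:
            raise LookupError(f"no element with arg={match.group(1)!r}")
        return expand(target)

    return re.sub(r"@([A-Za-z][A-Za-z0-9_-]*)", substitute, intent)

mathml = """<mrow intent="binomial(@arg1, @arg2)">
  <mo>(</mo>
  <mfrac linethickness="0pt">
    <msup intent="power(@arg1,@arg2)" arg="arg1">
      <mi arg="arg1">n</mi>
      <mn arg="arg2">2</mn>
    </msup>
    <mi arg="arg2">m</mi>
  </mfrac>
  <mo>)</mo>
</mrow>"""
result = expand(ET.fromstring(mathml))
print(result)  # binomial(power(n,2), m)
```

The inner arg1 and arg2 are resolved only while expanding the msup, so reusing the same names at both levels does not cause a clash.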

infix, prefix, postfix

At least for Content MathML conversion, “good” mrow structure is needed for both Bruce’s proposal and this proposal. For speech, this proposal can get by with flattened mrows.

The details for nary matches need to be worked out so that one can grab the operands in something like $a \times b \times c \times d$. There is some hand waving in the section that introduces the nary notation, but that part of the section is not thought through.

Other cases Bruce lists:

powers and defaults
sub-, super-, over-, under-scripts
fenced
fenced-stacked
Function calls
Derivatives and Integrals
Continued Fractions

These don’t cause problems in this system. In particular:

any notation for function call is easily supported
finding the $dx$ in integrals, etc., is not a problem
continued fractions just work with intent="ContinuedFraction([@a0, @a1, @a2, @a3])" for a fraction like \[ a_0+\cfrac{1}{a_1+\cfrac{1}{a_2+\cfrac{1}{a_3+\cdots}}} \]

The Elephant in the Room Everyone Knows Wants To Be Fed

As with mathrole and meaning, this proposal will only be useful if we end up standardizing “some” names. This was definitely a problem for Content MathML in the past. Hopefully with the passage of time and also the (maybe) reduction in complexity of this proposal, we can create a larger and more useful list more quickly. We should be able to easily create a list equivalent to pragmatic Content MathML.

Hiding behind the naming problem is the problem of deciding defaults. We can go small and have only very simple defaults. E.g., for msup:

$\mathrm{trigFunc} ^ {-1}$ ⟶ intent="inverse-function(@0)"
$\mathrm{trigFunc} ^ {\mathrm{exp}}(\mathrm{arg})$ ⟶ intent="power(@trigFunc(@arg), @exp)"
everything else ⟶ intent="power(@0, @1)"

or we can go for a more complete set that includes $\log^2(x)$, $ℝ^2$, various calculus notations ($f'$, $d^2/dx^2$, …), $A^T$, etc. Or maybe some of these should only be defaults for a given subject area (yet another naming elephant in the room).

Bottom line: there are a lot of elephants to feed once we get past figuring out how to mark up semantics.

Summary

I believe this proposal is an improvement over using mathrole because it bundles the meaning with its arguments without the addition of tables to figure them out. I also feel it is an improvement over trying to extract patterns of usage and name them as that requires developing (and remembering) two open-ended sets of names and introduces an indirection that doesn’t add any power.

Functional Patterns for Semantic Annotation