Author: Neil Soiffer
These ideas were motivated by a comment by Deyan Ginev at the end of the June 18 MathML meeting. He had some reservations about Bruce Miller’s proposal; maybe he had the following in mind or maybe something different. His concern was that `notation` didn’t capture the “tail” of what people do. This proposal is more flexible and, I believe, simpler than Bruce’s proposal in that there is no need to categorize abstractions. It does crib many ideas from Bruce’s thinking.
This document is about an approach to adding semantic markup to Presentation MathML. It is not meant to motivate or justify why semantic markup of Presentation MathML is desirable. However, here is a reminder of the two communities this affects:
I’ll start with two simple examples, transpose and binomial coefficient:
<msup intent="transpose(@matrix)">
<mi arg="matrix">A</mi>
<mi>T</mi>
</msup>
<mrow intent="binomial(@n, @m)">
<mo>(</mo>
<mfrac linethickness="0pt">
<mi arg="n">n</mi>
<mi arg="m">m</mi>
</mfrac>
<mo>)</mo>
</mrow>
The idea is that a notation attribute (spelled `intent` in the markup examples) names a function and its arguments. `@xxx` is used to mean “find the argument with `arg` attribute value `xxx` and replace `@xxx` with it.”
Many notations such as the transpose notation are simple, so this proposal has an alternative method of markup that avoids some work: numbered arguments.
<msup intent="transpose(@0)">
<mi>A</mi>
<mi>T</mi>
</msup>
Here, the number ‘0’ refers to the first (0-based) child of the element that carries the notation attribute. Finding the ith child can be extended to arbitrary descendants by chaining references: `@1@0` selects child 0 of child 1. For example:
<mrow intent="binomial(@1@0, @1@1)">
<mo>(</mo>
<mfrac linethickness="0pt">
<mi>n</mi>
<mi>m</mi>
</mfrac>
<mo>)</mo>
</mrow>
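The chained lookup can be sketched in a few lines of Python. This is only an illustration, not part of the proposal: the function name `resolve_numbered` is hypothetical, and it assumes 0-based indexes and well-formed markup parsed with the standard library's `xml.etree.ElementTree`.

```python
import re
import xml.etree.ElementTree as ET

def resolve_numbered(element, ref):
    """Follow a chain such as '@1@0' through 0-based child indexes."""
    if not re.fullmatch(r"(@\d+)+", ref):
        raise ValueError(f"not a numbered reference: {ref!r}")
    node = element
    for index in re.findall(r"\d+", ref):
        # Indexing an Element counts child elements only, not whitespace text.
        node = node[int(index)]
    return node

binomial = ET.fromstring(
    '<mrow intent="binomial(@1@0, @1@1)">'
    '<mo>(</mo>'
    '<mfrac linethickness="0pt"><mi>n</mi><mi>m</mi></mfrac>'
    '<mo>)</mo>'
    '</mrow>'
)
print(resolve_numbered(binomial, "@1@0").text)  # n
print(resolve_numbered(binomial, "@1@1").text)  # m
```

Because each step is a plain positional index, any tool that inserts or removes a child changes what the reference resolves to.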
Although the `arg` attribute has similar functionality to `id`, it has a big advantage over using `id` in that it does not need to be globally unique. This makes it easier to generate and easier to reuse in a web page than `id`.
Note: it is possible that we might be able to reuse an existing HTML attribute instead of creating a new one (`arg`). On the call, David Carlisle suggested that maybe we could use `name`. In this note, I’ll use `arg`.
Numbered attribute values avoid the need for adding an `arg` attribute and are therefore less work to author manually. Their disadvantage is that they are fragile: if a polyfill such as one for `mfenced` for MathML Core changes the document, the location of the children might change. A good polyfill hopefully at least preserves the non-`mfenced` attributes of the `mfenced` element by transferring them to the corresponding `mrow`, so the named approach would still work. Two other potential polyfills are a canonicalization polyfill that “fixes” the `mrow` structure and a line breaking polyfill that creates new (indented) lines. Both would likely alter the number of children in an `mrow`.
David Carlisle pointed out that numbered attributes are even more fragile than mentioned above because some normalization may add or remove an `mrow`. E.g., the contents of `msqrt` may be a single `mrow` with children or may omit that `mrow`. If the `mrow` has only one child, MathML defines it to be equivalent to just having that child.
David also suggests using 1-based counting if we do use numbered attributes because that is what XPath and CSS selectors use.
Conversely, named arguments require more manual markup. Supporting both allows authors to choose what works best for their use case.
This idea is not fully fleshed out, but some things can be clarified:
- The notation attribute can take any value, but it is likely only some values will be listed as “known”. Typically the value will be either a string representing a constant (e.g., “EulerNumber”) or a string representing a function with arguments as in the examples above.
- `@<digits>+` or `@<letter><alphaChars>+`: if digits, then it refers to the ith child (0-based) of the element with the notation attr. If it starts with a letter, then it refers to the value of an `arg` attribute. If more than one `@` is present, each refers to a child of the match of the previous `@`.
- `@@<letter><alphaChars>+`: nary match. All children are searched instead of stopping at the first matching child.

It is probably possible to extend the nary notation to work with a number also, but I’m less sure of that. E.g., maybe `intent=set(@1,@@2)` could mean match the second child, then continue matching all siblings that are offset by two from that. Maybe a slightly different “@>2” would make more sense. Potentially multiple nary picks could be given and the pattern repeated until the children of the element are exhausted. I don’t have a use case for that though. It’s probably best to keep things simple. We should not duplicate XPath or CSS selectors: “that way madness lies”. Named arguments work for nary operators, so there is no pressing need to complicate the simple indexing of children with numbers.
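To make the reference forms concrete, here is a rough Python sketch that splits a reference into steps. It is an assumption-laden illustration: the function `classify` and the step labels are hypothetical, names are taken to be a letter followed by zero or more alphanumerics (so that `@n` is accepted), and `@@` followed by digits is labelled even though its meaning is left open above.

```python
import re

# Assumed token shape: '@' or '@@', then digits or a letter-led name.
REF = re.compile(r"(@@?)(\d+|[A-Za-z][A-Za-z0-9]*)")

def classify(ref):
    """Split a reference like '@1@0' into a list of (kind, value) steps."""
    steps, pos = [], 0
    for match in REF.finditer(ref):
        if match.start() != pos:
            break  # a gap means the text between tokens is not a reference
        pos = match.end()
        marker, value = match.groups()
        kind = "index" if value.isdigit() else "name"
        if marker == "@@":
            kind = "nary-" + kind
        steps.append((kind, int(value) if value.isdigit() else value))
    if pos != len(ref) or not steps:
        raise ValueError(f"malformed reference: {ref!r}")
    return steps

print(classify("@1@0"))     # [('index', 1), ('index', 0)]
print(classify("@matrix"))  # [('name', 'matrix')]
print(classify("@@x"))      # [('nary-name', 'x')]
```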
David Carlisle has a proposal that involves gathering up all the non-`mo` elements and making them arguments to the operator. That suggests that the `@@` notation be modified to do the same thing by having no arguments. That would allow the following short version of dot product ($\mathbf{a}\cdot\mathbf{b}$) to work without having to use the potentially fragile numbering notation or tagging the arguments:
<mrow intent="inner-product(@@)">
<mi mathvariant="bold">a</mi>
<mo>⋅</mo>
<mi mathvariant="bold">b</mi>
</mrow>
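One possible reading of a bare `@@` is "every child that is not an `mo`". A minimal Python sketch of that reading follows; the function name `gather_operands` is hypothetical, and it assumes only direct children are considered.

```python
import xml.etree.ElementTree as ET

def gather_operands(element):
    """Return the direct children that are not <mo>, in document order."""
    return [child for child in element if child.tag != "mo"]

dot = ET.fromstring(
    '<mrow intent="inner-product(@@)">'
    '<mi mathvariant="bold">a</mi>'
    '<mo>&#x22C5;</mo>'
    '<mi mathvariant="bold">b</mi>'
    '</mrow>'
)
print([child.text for child in gather_operands(dot)])  # ['a', 'b']
```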
In the case of a named argument, the children would be searched using a depth-first search. The search stops when:

1. An `arg` attribute is found. If the value matches, the element is searched for a secondary `@` arg; if there are no more `@` args, the element is returned. If there is no match, the search continues on the next sibling of the current element.
2. A `notation` attribute is found. The search continues on the next sibling of the current element. Note that the `arg` attribute has already been checked for a match.

In step one, if an nary parameter is being matched, the search would continue on the next sibling instead of stopping.
Step two is to prevent searching children of another notation and finding a match there. See the Nested Notation section below for an example.
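The two stopping rules can be sketched in Python as follows. This is one reading of the rules, not a definitive algorithm: the function name `find_named` is hypothetical, the notation attribute is assumed to be spelled `intent` as in the markup examples, an element with a non-matching `arg` is assumed not to be descended into, and secondary `@` args and nary matches are left out. The test markup is the nested binomial example from the Nested Notation discussion.

```python
import xml.etree.ElementTree as ET

def find_named(parent, name):
    """Depth-first search below `parent` for the first element with arg == name."""
    for child in parent:
        arg = child.get("arg")
        if arg is not None:
            if arg == name:
                return child  # step one: match found
            continue          # step one, no match: go to the next sibling
        if child.get("intent") is not None:
            continue          # step two: never search inside another notation
        found = find_named(child, name)
        if found is not None:
            return found
    return None

nested = ET.fromstring(
    '<mrow intent="binomial(@arg1, @arg2)">'
    '<mo>(</mo>'
    '<mfrac linethickness="0pt">'
    '<msup intent="power(@arg1,@arg2)" arg="arg1">'
    '<mi arg="arg1">n</mi><mn arg="arg2">2</mn>'
    '</msup>'
    '<mi arg="arg2">m</mi>'
    '</mfrac>'
    '<mo>)</mo>'
    '</mrow>'
)
print(find_named(nested, "arg1").tag)   # msup
print(find_named(nested, "arg2").text)  # m
```

The search for `arg2` skips the `mn` inside the `msup` because the `msup` is never entered, which is the behavior step two is meant to guarantee.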
A few things to note:
Here are Bruce’s examples with this new proposal’s markup. Some use numbers and others use names. The choice was arbitrary. Neither Bruce nor I endorse the specific meaning/notation names used (I copied them from Bruce); they are merely meant to be suggestive of the meaning.
Description | Code | XPath version |
---|---|---|
nary (discussed later) $a+b-c+d$ | | |
dot product $\mathbf{a}\cdot\mathbf{b}$ | | `intent="inner-product(//*[1], //*[3])"` |
negation $-a$ | | `intent="unary-minus(//*[@arg='operand'])"` |
Laplacian $\nabla^2 f$ | | |
factorial $n!$ | | |
power $x^n$ | | |
repeated application $f^n$ ($=f(f(...f))$) | | |
inverse $\sin^{-1}$ | | |
$n$-th derivative $f^{(n)}$ | | |
indexing $a_i$ | | |
transpose $A^T$ | | |
adjoint $A^\dagger$ | | |
$2$-nd derivative $f''$ | | |
binomial $C^n_m$ | | |
absolute value $\vert x\vert$ | | |
norm $\vert\mathbf{x}\vert$ | | |
determinant $\vert\mathbf{X}\vert$ | | |
sequence $\lbrace a_n\rbrace$ | | |
open interval $(a,b)$ | | |
open interval $]a,b[$ | | |
closed, open-closed, etc. similarly | | |
inner product $\left<\mathbf{a},\mathbf{b}\right>$ | | |
Legendre symbol $(n\vert p)$ | | |
similarly | | |
Clebsch-Gordan $(j_1 m_1 j_2 m_2 \vert j_1 j_2 j_3 m_3)$ | | |
Pochhammer $\left(a\right)_n$ | | |
binomial $\binom{n}{m}$ | | |
multinomial $\binom{n}{m_1,m_2,m_3}$ | | |
Eulerian numbers $\left< n \atop k \right>$ | | |
3j symbol $\left(\begin{array}{ccc}j_1& j_2 &j_3 \\ m_1 &m_2 &m_3\end{array}\right)$ | | |
6j, 9j, ... | similarly | |
Some further remarks…
I think we have all been focused on getting semantics out and figured conversion to Content MathML, Speech, Braille, and anything else would just follow. However, here are two cases where speech doesn’t necessarily flow from a function-based version of semantics:
- Three different notations for the transpose of A all become `transpose(A)` in the above scheme. But it is likely we would want to speak the first as “A transpose”, the second as “transpose A” and the third as “the transpose of A”.
- An nary sum such as “a+b+c” is naturally represented as `plus(a,b,c)`. However, “a-b+c” is problematic because there are two operators: `+` and `-`. Computation systems typically solve this by using a unary minus as in `plus(a, times(-1, b), c)`. The exact same representation would be used for “1+-2+3”. Speech needs to distinguish these two forms. Also, without “good” `mrow` structure, operators tend to be mixed (e.g., $2x+1$ all in one `mrow`). This isn’t a problem for speech or braille, but it is one for conversion to Content MathML and computation systems.

Having targets with different needs is a problem for both Bruce’s proposal and this proposal. Potentially the speech problem is solved using the “hack” in the $a+b+c+d$ example above where both the operands and operators are returned. It isn’t good for conversion to Content MathML though.
All the examples above were “simple” examples in that `notation` only occurred once. Arguably, the examples with subscripted variables such as Clebsch-Gordan should have tagged the `msub`, but I just followed Bruce’s example.
Here’s an example of nesting $\binom{n^2}{m}$ where both notations use the same argument names:
<mrow intent="binomial(@arg1, @arg2)">
<mo>(</mo>
<mfrac linethickness="0pt">
<msup intent="power(@arg1,@arg2)" arg='arg1'>
<mi arg="arg1">n</mi>
<mn arg="arg2">2</mn>
</msup>
<mi arg="arg2">m</mi>
</mfrac>
<mo>)</mo>
</mrow>
Because the search for arguments to “binomial” stops when `notation` is found on the `msup`, the search for its “arg2” will not find the “2” and will instead properly find the “m”.
At least for Content MathML conversion, “good” `mrow` structure is needed for both Bruce’s proposal and this proposal. For speech, this proposal can get by with flattened `mrow`s.
The details for nary matches need to be worked out so that one can grab the operands in something like $a \times b \times c \times d$. There is some hand waving in the section that introduces the nary notation, but that part of the section is not thought through.
These don’t cause problems in this system. In particular:
intent="ContinuedFraction([@a0, @a1, @a2, @a3])"
for a fraction like
\[
a_0+\cfrac{1}{a_1+\cfrac{1}{a_2+\cfrac{1}{a_3+\cdots}}}
\]

As with `mathrole` and `meaning`, this proposal will only be useful if we end up standardizing “some” names. This was definitely a problem for Content MathML in the past. Hopefully with the passage of time and also the (maybe) reduction in complexity of this proposal, we can create a larger and more useful list more quickly. We should be able to easily create a list equivalent to pragmatic Content MathML.
Hiding behind the naming problem is the problem of deciding defaults. We can go small and have only very simple defaults. E.g., for `msup`:
intent="inverse-function(@0)"
intent=power( @trigFunc(@arg), @exp )
intent=power(@0, @1)
or we can go for a more complete set that includes $\log^2(x)$, $ℝ^2$, various calculus notations ($f'$, $d^2/dx^2$, …), $A^T$, etc. Or maybe some of these should only be defaults for a given subject area (yet another naming elephant in the room).
Bottom line: there are a lot of elephants to feed once we get past figuring out how to mark up semantics.
I believe this proposal is an improvement over using `mathrole` because it bundles the meaning with its arguments without the addition of tables to figure them out. I also feel it is an improvement over trying to extract patterns of usage and name them, as that requires developing (and remembering) two open-ended sets of names and introduces an indirection that doesn’t add any power.