The code uses two classes to parse a MathML string. The first class, MathToString, prepares the string. The second class, ParseML, does the parsing. Similarly, to get MathML code from text, one class prepares the input and another does the conversion.
Introduction
There seem to be few resources for converting MathML code into plain text. One good reason is that there is no consensus on how to format some math expressions. But for the many expressions that involve the operators +, -, /, *, ^ and =, here is one possible converter.
The Classes Engaged
- Class MathToString prepares the string with the MathML code so that class ParseML can perform the detailed parsing.
Preparation
First, spaces are removed and some special characters are replaced. Tags that are not part of the math expression, such as style tags, are removed as well. The code then works from the innermost <mfrac>...</mfrac>, <msup>...</msup>, <mrow>...</mrow>, <msqrt>...</msqrt> tags to the outermost, parsing and replacing each one and enclosing the result between special characters so that the ParseML class can recover it later.
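The innermost-to-outermost pass described above can be sketched in JavaScript as follows. This is only an illustration under stated assumptions, not the article's MathToString class: the function name mathMLToText, the \u0001 and \u0002 marker characters, the output conventions (for example sqrt(...)), and the restriction to tags without attributes are all hypothetical. Unlike the real code, which leaves the marked fragments for ParseML, the sketch resolves them itself at the end.

```javascript
// Hypothetical sketch; names, markers and output format are assumptions.
const CONTAINERS = "mfrac|msup|mrow|msqrt";

// Matches a container element whose content holds no other opening
// container tag, i.e. an innermost one.
const INNERMOST = new RegExp(
  `<(${CONTAINERS})>((?:(?!<(?:${CONTAINERS})>)[\\s\\S])*?)</\\1>`
);

// A child is either a leaf token element or a placeholder that points to an
// already converted fragment: \u0001 + index + \u0002.
const CHILD = /<(?:mi|mn|mo)>([^<]*)<\/(?:mi|mn|mo)>|\u0001(\d+)\u0002/g;

// Add parentheses unless the text is a single identifier or number.
const wrap = (t) => (/^[A-Za-z0-9.]+$/.test(t) ? t : `(${t})`);

function mathMLToText(mathml) {
  const fragments = [];

  // Preparation: drop whitespace and the outer <math> wrapper, then treat
  // the whole expression as one row.
  let s = mathml.replace(/\s+/g, "").replace(/<\/?math[^>]*>/g, "");
  s = `<mrow>${s}</mrow>`;

  let m;
  while ((m = INNERMOST.exec(s)) !== null) {
    const [whole, tag, content] = m;

    // Collect the children as plain text, resolving placeholders.
    const kids = [...content.matchAll(CHILD)].map((k) =>
      k[2] !== undefined ? fragments[Number(k[2])] : k[1]
    );

    let text;
    if (tag === "mfrac") text = `${wrap(kids[0])}/${wrap(kids[1])}`;
    else if (tag === "msup") text = `${wrap(kids[0])}^${wrap(kids[1])}`;
    else if (tag === "msqrt") text = `sqrt(${kids.join("")})`;
    else text = kids.join(""); // mrow

    // Replace the element with a marked placeholder so that outer elements
    // see it as a single child.
    fragments.push(text);
    const placeholder = `\u0001${fragments.length - 1}\u0002`;
    s = s.slice(0, m.index) + placeholder + s.slice(m.index + whole.length);
  }

  // Recover the marked fragments.
  return s.replace(/\u0001(\d+)\u0002/g, (_, i) => fragments[Number(i)]);
}

const sample =
  "<math><mfrac><mrow><mi>a</mi><mo>+</mo><mn>1</mn></mrow>" +
  "<msup><mi>x</mi><mn>2</mn></msup></mfrac></math>";
console.log(mathMLToText(sample)); // prints "(a+1)/(x^2)"
```

Storing each converted fragment behind a placeholder keeps the child list of every outer tag flat, so <mfrac> and <msup> can always pick out their two operands by position.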
Using the Code
To convert, just call the shared method MathToString.convertToString():
Dim converted as String = MathMLToString.convertToString(MathMLcodeToConvert)
To convert text to MathML, call convertStringToMathML().
Basic Principles
The parser uses recursive descent, as described in "Parsing Expressions by Recursive Descent".
Evaluation method E calls T for any addition or subtraction, but T first calls F for any multiplication or division, and F first calls P for any possible power operation. P first calls v to get the next token. If the token is "(", v calls T recursively.
E --> T {( "+" | "-" ) T}
T --> F {( "*" | "/" ) F}
F --> P ["^" F]
P --> v | "(" E ")" | "-" T
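As an illustration, the four productions above map almost one-to-one onto mutually recursive functions. The JavaScript sketch below follows the grammar as listed; the function name evaluate, the tokenizer, and the choice to compute a numeric value (rather than build MathML or text, as the article's ParseML does) are assumptions made for the example.

```javascript
// Sketch of a recursive-descent evaluator for the grammar above.
// The tokenizer and the numeric evaluation are illustrative assumptions.
function evaluate(text) {
  // Numbers, operators and parentheses; any other character is skipped.
  const tokens = text.match(/\d+(?:\.\d+)?|[-+*\/^()]/g) || [];
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  // E --> T {( "+" | "-" ) T}
  function E() {
    let value = T();
    while (peek() === "+" || peek() === "-") {
      const op = next();
      const rhs = T();
      value = op === "+" ? value + rhs : value - rhs;
    }
    return value;
  }

  // T --> F {( "*" | "/" ) F}
  function T() {
    let value = F();
    while (peek() === "*" || peek() === "/") {
      const op = next();
      const rhs = F();
      value = op === "*" ? value * rhs : value / rhs;
    }
    return value;
  }

  // F --> P ["^" F]   (the recursion on F makes "^" right-associative)
  function F() {
    const base = P();
    if (peek() === "^") {
      next();
      return Math.pow(base, F());
    }
    return base;
  }

  // P --> v | "(" E ")" | "-" T
  function P() {
    const tok = next();
    if (tok === "(") {
      const value = E();
      if (next() !== ")") throw new SyntaxError('expected ")"');
      return value;
    }
    if (tok === "-") return -T();
    if (tok === undefined || !/^\d/.test(tok)) {
      throw new SyntaxError(`unexpected token: ${tok}`);
    }
    return parseFloat(tok); // v: a numeric literal
  }

  const result = E();
  if (pos < tokens.length) {
    throw new SyntaxError(`unexpected token: ${tokens[pos]}`);
  }
  return result;
}

console.log(evaluate("1+2*3"));   // prints 7
console.log(evaluate("2^3^2"));   // prints 512
console.log(evaluate("(1+2)*3")); // prints 9
```

Operator precedence falls out of the call order: E only sees complete terms, T only sees complete factors, and F resolves powers before either of them gets a value back.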
History
- 12th May, 2022: Initial version
Version 1.0.3.0
The demo zip file now contains a setup file.
The JavaScript, .NET, and Core applications have been recoded because MathML is an application of XML, and tags can be nested recursively: an element can appear inside itself or at any level below its child elements. For example:
<mfrac>
<mrow>
<mi>-1</mi>
</mrow>
<mn>
<mn>3</mn>
</mn>
</mfrac>
Version 2.0.0.0 (2024-03-25)
The code has been rewritten, improved and expanded. For example, it now supports equations written in LaTeX, such as the following:
\begin{array}{*{20}c} {x = \frac{{ - b \pm \sqrt {b^2 - 4ac} }}{{2a}}} &
{{\text{when}}} & {ax^2 + bx + c = 0} \\ \end{array}
To use this, put the LaTeX equation in the MathML textbox and click 'Convert to Plain Text': the LaTeX code is replaced by its MathML translation, which is then converted to plain text.
Version 2.0.5.0 (2024-04-17)
Some more samples have been added.
Arrangements have been made so that square and curly brackets are reflected in the conversion.
ConvertStringToMathML.EvaluateFromInnerToOuter() is a new method that, as its name indicates, works from the innermost paired parentheses and/or brackets to the outermost.
Version 2.0.5.9 (2024-05-01)
More samples have been added.
Several bugs have been patched and some improvements have been implemented. For example, the <mlabeledtr> tag is now taken into account.
Version 2.0.6.0 (2024-05-17)
Some more samples have been added. The function keys F1, F2 and F12 respectively go to a given sample number, advance to the next sample, and sweep through the samples starting from the current one (the Esc key cancels the sweep).
There are new features. Parentheses can now be preserved by writing them as ⸨ ⸩, which prevents the application from deleting them. For example, when the text "(3)" is converted to MathML, the parentheses are discarded. The text "⸨3⸩", by contrast, keeps them when converted to MathML.
Now, placing the ↓ character (ASCII 25) after an opening parenthesis and before the closing one sets the MathML attribute stretchy='false'. So the text ∫⸨5*x+2*sin⸨↓x↓⸩⸩dx, when translated into MathML, will stretch the outer parentheses but not the inner ones around x.
Also, the user can choose between MathJax versions 2.7.2 and 3. Version 2 is older but supports line breaks, while version 3 does not.
Version 2.0.13.0 (2024-07-11)
- Included some more examples.
- Several fixes and improvements have been made.
Version 2.0.16.0 (2024-08-01)
Some fixes, more examples, and the converter now distinguishes between <msubsup> and <munderover> tags (for instance, example 73).
Version 2.0.17.0 (2024-08-12)
- Several fixes.
- JavaScript code update. (The current version is 2.0.17.2. A meta tag in the HTML indicates the version.)
Version 2.0.20.0 (2024-08-26)
- More fixes.
- Attached a Notes.txt file.
- Updated all three zip files.