
MathML to/from Plain Text Converter

26 Aug 2024, MIT License
Converts MathML coded string to/from plain text string
The code uses two classes to parse the MathML string. The first class, MathToString, prepares the string; the second class, ParseML, does the parsing. Conversion from text to MathML is split the same way: one class prepares the text and the other performs the conversion.


Introduction

There seem to be few resources for converting MathML code into plain text, probably because there is no consensus on how to format some math expressions as text. For the many expressions that involve the +, -, /, *, ^ and = operators, however, here is one possible converter.

The Classes Engaged

  • Class MathToString prepares the string containing the MathML code so that class ParseML can perform the detailed parsing.

Preparation

First, spaces are removed and some special characters are replaced. Tags that play no part in the math expression, such as style tags, are also removed. The code then works outward, starting from the innermost <mfrac>...</mfrac>, <msup>...</msup>, <mrow>...</mrow> and <msqrt>...</msqrt> elements and ending with the outermost ones. Each element is parsed, replaced, and enclosed between special characters so that the ParseML class can recover it later.
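The VB.NET sketch below illustrates the innermost-first pass. It is a minimal illustration, not the article's MathToString code. The module and function names, the «n» placeholder format and the regular expressions are all assumptions, and elements that carry attributes are not handled.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

' Hypothetical sketch of an "innermost first" preparation pass.
' Names, placeholder characters and regexes are assumptions, not the article's code.
Public Module InnermostFirst
    ' Matches <mfrac>, <msup>, <mrow> or <msqrt> whose body contains no further
    ' opening tag of those four kinds, i.e. an innermost element.
    Private ReadOnly Innermost As New Regex(
        "<(mfrac|msup|mrow|msqrt)>((?:(?!<(?:mfrac|msup|mrow|msqrt)>).)*?)</\1>",
        RegexOptions.Singleline)

    Public Function Prepare(mathML As String, parts As List(Of String)) As String
        ' Drop whitespace between tags and remove <mstyle> wrappers.
        Dim s As String = Regex.Replace(mathML, ">\s+<", "><")
        s = Regex.Replace(s, "</?mstyle[^>]*>", "")
        ' Repeatedly replace the leftmost innermost element with an indexed
        ' placeholder such as «0», storing the element so a later pass can expand it.
        Do
            Dim m As Match = Innermost.Match(s)
            If Not m.Success Then Exit Do
            parts.Add(m.Value)
            s = s.Substring(0, m.Index) & "«" & (parts.Count - 1).ToString() & "»" &
                s.Substring(m.Index + m.Length)
        Loop
        Return s
    End Function
End Module
```

Take "<mfrac><mrow><mi>a</mi></mrow><mn>2</mn></mfrac>" as an example. The <mrow> is stored first as «0». The <mfrac>, which now contains «0», is stored next as «1». A later pass can therefore expand the placeholders from the outermost back inward.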

Using the Code

To convert MathML to plain text, call the shared method MathToString.convertToString():

VB.NET
Dim converted As String = MathMLToString.convertToString(MathMLcodeToConvert)

To convert text to MathML, call convertStringToMathML().

Basic Principles

The parser uses recursive descent, as described in "Parsing Expressions by Recursive Descent".

Evaluation method E calls T for each operand of an addition or subtraction. T calls F for each operand of a multiplication or division. F calls P for the base of a possible power operation, and calls itself for the exponent. P calls v to get the next token. If that token is "(", P calls E recursively and then expects ")"; if it is "-", P calls T for the negated operand.

E --> T {( "+" | "-" ) T}
T --> F {( "*" | "/" ) F}
F --> P ["^" F]
P --> v | "(" E ")" | "-" T
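The small VB.NET recursive-descent parser below makes the grammar concrete, with one function per rule (ParseE, ParseT, ParseF, ParseP). It is a hypothetical sketch, not the article's ParseML class. The class name, the tokenizer regex and the MathML it emits are assumptions: <mfrac> for "/", <msup> for "^", and <mrow> elsewhere. Characters the tokenizer does not recognize are simply skipped.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

' Hypothetical sketch (not the article's ParseML): one function per rule of
'   E --> T {( "+" | "-" ) T}
'   T --> F {( "*" | "/" ) F}
'   F --> P ["^" F]
'   P --> v | "(" E ")" | "-" T
' Every function returns exactly one MathML element.
Public Class TinyMathMLParser
    Private ReadOnly tokens As New List(Of String)()
    Private pos As Integer = 0

    Private Sub New(text As String)
        ' Tokens: numbers, identifiers, operators and parentheses; anything else is skipped.
        For Each m As Match In Regex.Matches(text, "\d+(\.\d+)?|[A-Za-z]+|[-+*/^()]")
            tokens.Add(m.Value)
        Next
    End Sub

    Public Shared Function ToMathML(text As String) As String
        Dim parser As New TinyMathMLParser(text)
        Dim body As String = parser.ParseE()
        If parser.pos <> parser.tokens.Count Then
            Throw New FormatException("Unexpected token: " & parser.tokens(parser.pos))
        End If
        Return "<math>" & body & "</math>"
    End Function

    Private Function Peek() As String
        Return If(pos < tokens.Count, tokens(pos), "")
    End Function

    Private Function NextToken() As String
        If pos >= tokens.Count Then Throw New FormatException("Unexpected end of input")
        pos += 1
        Return tokens(pos - 1)
    End Function

    ' E --> T {( "+" | "-" ) T}
    Private Function ParseE() As String
        Dim lhs As String = ParseT()
        While Peek() = "+" OrElse Peek() = "-"
            Dim op As String = NextToken()
            Dim rhs As String = ParseT()
            lhs = "<mrow>" & lhs & "<mo>" & op & "</mo>" & rhs & "</mrow>"
        End While
        Return lhs
    End Function

    ' T --> F {( "*" | "/" ) F}; "/" becomes a fraction.
    Private Function ParseT() As String
        Dim lhs As String = ParseF()
        While Peek() = "*" OrElse Peek() = "/"
            Dim op As String = NextToken()
            Dim rhs As String = ParseF()
            If op = "/" Then
                lhs = "<mfrac>" & lhs & rhs & "</mfrac>"
            Else
                lhs = "<mrow>" & lhs & "<mo>&#x22C5;</mo>" & rhs & "</mrow>"
            End If
        End While
        Return lhs
    End Function

    ' F --> P ["^" F]; recursing on F makes "^" right-associative.
    Private Function ParseF() As String
        Dim baseEl As String = ParseP()
        If Peek() <> "^" Then Return baseEl
        NextToken()
        Return "<msup>" & baseEl & ParseF() & "</msup>"
    End Function

    ' P --> v | "(" E ")" | "-" T
    Private Function ParseP() As String
        Dim tok As String = NextToken()
        If tok = "(" Then
            Dim inner As String = ParseE()
            If NextToken() <> ")" Then Throw New FormatException("Expected closing parenthesis")
            Return "<mrow><mo>(</mo>" & inner & "<mo>)</mo></mrow>"
        ElseIf tok = "-" Then
            Return "<mrow><mo>-</mo>" & ParseT() & "</mrow>"
        ElseIf Char.IsDigit(tok(0)) Then
            Return "<mn>" & tok & "</mn>"
        ElseIf Char.IsLetter(tok(0)) Then
            Return "<mi>" & tok & "</mi>"
        End If
        Throw New FormatException("Unexpected token: " & tok)
    End Function
End Class
```

TinyMathMLParser.ToMathML("a+b/2") returns <math><mrow><mi>a</mi><mo>+</mo><mfrac><mi>b</mi><mn>2</mn></mfrac></mrow></math>. F calls itself after "^", so x^2^3 groups as x^(2^3). P handles "-" T, so -x^2 becomes -(x^2).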

History

  • 12th May, 2022: Initial version

Version 1.0.3.0

The demo zip file now contains a setup file.

The JavaScript, .NET, and Core applications have been rewritten. MathML is an application of XML, so tags can be nested 'recursively': an element can appear inside itself, at any level below its own child elements. For example:

XML
<mfrac>
   <mrow>
     <mi>-1</mi>
   </mrow>
   <mn>
     <mn>3</mn>
   </mn>
</mfrac>
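MathML is XML, so an XML API that already models nesting can do much of this work. The sketch below walks a small MathML subset recursively with System.Xml.Linq. It is not the article's code. The module name, the handled tags and the text format (for instance sqrt(...) for <msqrt>) are assumptions chosen for illustration.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions
Imports System.Xml.Linq

' Hypothetical sketch: XElement models arbitrary nesting, so one recursive
' function can handle elements nested at any depth. Not the article's code.
Public Module MathMLText
    Public Function ToText(el As XElement) As String
        ' Convert the children first, whatever their depth.
        Dim kids As List(Of String) = el.Elements().Select(Function(k) ToText(k)).ToList()
        Select Case el.Name.LocalName
            Case "mfrac"
                Return Wrap(kids(0)) & "/" & Wrap(kids(1))
            Case "msup"
                Return Wrap(kids(0)) & "^" & Wrap(kids(1))
            Case "msqrt"
                Return "sqrt(" & String.Concat(kids) & ")"
            Case Else
                ' mi, mn, mo, mrow, math...: a leaf yields its text,
                ' a container yields its children's text in order.
                Return If(el.HasElements, String.Concat(kids), el.Value.Trim())
        End Select
    End Function

    ' Parenthesize anything that is not a single number or identifier.
    Private Function Wrap(s As String) As String
        Return If(Regex.IsMatch(s, "^[A-Za-z0-9.]+$"), s, "(" & s & ")")
    End Function
End Module
```

Applied to the example above, MathMLText.ToText(XElement.Parse(...)) returns (-1)/3. The outer <mn> simply yields the text of the <mn> nested inside it.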


Version 2.0.0.0 (2024-03-25)

The code has been rewritten, improved, and expanded. For example, it now supports equations written in LaTeX, such as the following:

latex
\begin{array}{*{20}c} {x = \frac{{ - b \pm \sqrt {b^2 - 4ac} }}{{2a}}} &
{{\text{when}}} & {ax^2 + bx + c = 0} \\ \end{array}

To use this feature, put the LaTeX equation in the MathML textbox and click 'Convert to Plain Text'. The LaTeX code is replaced by its MathML translation, which is then converted to plain text.


Version 2.0.5.0 (2024-04-17)

Some more samples have been added.

Square and curly brackets are now reflected in the conversion.

ConvertStringToMathML.EvaluateFromInnerToOuter() is a new method that, as its name indicates, examines paired parentheses and/or brackets from the innermost to the outermost.
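A simple stack can produce this inner-to-outer order, as sketched below in VB.NET. The sketch only illustrates the idea and is not EvaluateFromInnerToOuter() itself. The module and function names are hypothetical, and the function merely lists matched (), [] and {} pairs by index.

```vb
Imports System
Imports System.Collections.Generic

' Hypothetical helper: lists matched bracket pairs so that every pair comes
' before any pair that encloses it. Not the article's EvaluateFromInnerToOuter().
Public Module BracketPairs
    Public Function InnerToOuter(text As String) As List(Of Tuple(Of Integer, Integer))
        Const Opens As String = "([{"
        Const Closes As String = ")]}"
        Dim stack As New Stack(Of Integer)()
        Dim pairs As New List(Of Tuple(Of Integer, Integer))()
        For i As Integer = 0 To text.Length - 1
            Dim c As Char = text(i)
            If Opens.IndexOf(c) >= 0 Then
                stack.Push(i)
            ElseIf Closes.IndexOf(c) >= 0 Then
                ' The closer must match the kind of the most recent opener.
                If stack.Count = 0 OrElse Opens.IndexOf(text(stack.Peek())) <> Closes.IndexOf(c) Then
                    Throw New FormatException("Unbalanced bracket at " & i.ToString())
                End If
                ' A pair is recorded when it closes, so inner pairs always
                ' precede the pairs that contain them.
                pairs.Add(Tuple.Create(stack.Pop(), i))
            End If
        Next
        If stack.Count > 0 Then Throw New FormatException("Unclosed bracket at " & stack.Peek().ToString())
        Return pairs
    End Function
End Module
```

For "a*(b+[c-d])", the function returns (5, 9) for the square brackets before (2, 10) for the parentheses. A pair is recorded only when it closes, and an inner pair always closes before the pair that encloses it.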


Version 2.0.5.9 (2024-05-01)

More samples have been added.

Several bugs have been patched and some improvements have been implemented. For example, the <mlabeledtr> tag is now taken into account.

Version 2.0.6.0 (2024-05-17)

Some more samples have been added. Pressing F1 goes to a given sample number, F2 advances to the next sample, and F12 sweeps through the samples starting from the current one (press Esc to cancel the sweep).

There are also new features. Parentheses written as ⸨ ⸩ are preserved instead of being deleted by the application. For example, when the text "(3)" is converted to MathML the parentheses are discarded, whereas converting "⸨3⸩" keeps them.

Now, placing the ↓ (ASCII 25) character just after an opening parenthesis and just before a closing one produces the MathML attribute stretchy='false'. So the text ∫⸨5*x+2*sin⸨↓x↓⸩⸩dx, when translated into MathML, stretches the outer parentheses but not the inner ones around x.
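The sketch below shows how the ⸨ ⸩ fences and the ASCII 25 stretch marker could map to <mo> elements. It is hypothetical and not the article's code. The module and function names are invented, the marker is assumed to be the character with code 25 as the text states, and only the fences are rewritten; the rest of the text is left for the real converter.

```vb
Imports System

' Hypothetical sketch: rewrites only the fence markers, leaving the rest of
' the text untouched. Assumes the "no stretch" marker is character code 25.
Public Module FenceMarkers
    Private ReadOnly NoStretch As String = ChrW(25).ToString()

    Public Function RewriteFences(text As String) As String
        ' Replace the two-character forms first, so the plain "⸨" and "⸩"
        ' rules cannot consume half of a marked parenthesis.
        Return text.Replace("⸨" & NoStretch, "<mo stretchy='false'>(</mo>") _
                   .Replace(NoStretch & "⸩", "<mo stretchy='false'>)</mo>") _
                   .Replace("⸨", "<mo>(</mo>") _
                   .Replace("⸩", "<mo>)</mo>")
    End Function
End Module
```

Applied to the fences of the example above, the inner pair around x gets stretchy='false' and the outer pair keeps the default stretching. Ordering the replacements from longest to shortest pattern is what keeps the two kinds apart.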

The user can also choose between MathJax versions 2.7.2 and 3. Version 2 is older but handles line breaks, while version 3 does not.

Version 2.0.13.0 (2024-07-11)

- Included some more examples.
- Several fixes and improvements.

Version 2.0.16.0 (2024-08-01)

Some fixes and more examples. The converter now distinguishes between the <msubsup> and <munderover> tags (see, for example, example 73).

Version 2.0.17.0 (2024-08-12)

- Several fixes.
- JavaScript code updated (current version 2.0.17.2; a meta tag in the HTML indicates the version).

Version 2.0.20.0 (2024-08-26)

- More fixes.
- Attached a Notes.txt file.
- Updated all three zip files.

License

This article, along with any associated source code and files, is licensed under The MIT License