Introduction
I'm travelling through Africa in my home-built expedition truck and wanted to publish a diary about the trip. For that I use the Mambo Content
Management System (CMS). The trouble with Mambo, and for example WordPress, is that to publish articles one has to use the rather clumsy publishing interface
that those projects offer. Especially when images must be included in the text, this becomes very labor intensive.
There are other drawbacks, but this is not about complaining about other projects; let's focus on a solution.
What I wanted was to use MS-Word documents as the base for articles that are published using the Mambo CMS.
This has the following advantages:
- Use of the MS-Word spell checker and thesaurus.
- Images can be modified in MS-Word: cropped, scaled, rotated, brightness and contrast control, and much more.
- Documents can be used for multiple purposes, for example, they can also be printed.
- Easy backup of the source of the web-site.
Also, because most of the time I don't have internet access, I have to work offline. Therefore I have an exact copy of the
website running locally. Articles that are meant to be published are first loaded on the local system and then have to be
exported, transported to the hosted site, and imported there. It would be possible to create an import file directly without using
a local installation, but I preferred to have a local preview. The workflow is as follows:
- Create an MS-Word document with article text, including images.
- Save the MS-Word document as a "Filtered HTML" file.
- This software (Text2Mambo) parses that HTML, injects the text into the local Mambo system, copies the images
to the correct localhost location, and changes the image paths accordingly.
- The localhost text is reviewed.
- A selected part of the Mambo system (the part that we just added or edited) is exported to an XML file.
- The XML file is FTP'ed to the remote hosted site including the images that belong to that part of the text.
- The XML file is imported at the remote site.
All of these tasks except the first two and the last are done by Text2Mambo. The first you must do yourself.
For the second task, I created an MS-Word macro that is included in the downloads, and for the last task, I made a PHP import script, which is also provided as a download.
To be able to inject text into a Mambo system, however, the text has to meet some requirements.
Mambo requirements
Articles for Mambo have to have the following minimum characteristics:
- A title
- An author (this must be a registered user on the Mambo system)
- A section
- A category
- A created date
There are additional characteristics:
- Modified date. Initially the same as the created date; after that it is set to the current date/time whenever the text is processed by Text2Mambo. The modified ID is not set in any way.
- Title alias. Text2Mambo makes this the same as the title.
- Expire date. Text2Mambo doesn't implement this, so articles never expire, but it could easily be programmed into Text2Mambo.
- Front page. A toggle that makes an article a front page article or not.
- Version number. This is set to 1 and never changes. To use it, Text2Mambo must be changed, together with the updating of the modified date.
- Ordering. This is determined using the publishing date.
Text2Mambo (the project name) has two systems to assign characteristics to articles:
- The default options (author, section, and category) that are selected in the MamboCtrl interface.
- The possibility to supply the info in an article title (titles are indicated by the h2 style). An article title can look like this:
Driving through Africa[a:John Janssen][c:Side stories][d:June 12, 2011][fp:yes]
This means the following:
- Article title: "Driving through Africa".
- Author: John Janssen.
- Category: Side stories (the corresponding section is automatically found).
- Publishing date: June 12, 2011.
- The article is put on the front page.
A full description of the possibilities with article headers is given in the accompanying manual.
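To make the title format concrete, here is a minimal JavaScript sketch of how such a title could be split into the plain title and its bracketed options. It is not taken from the Text2Mambo source: the function name parseArticleTitle and the shape of the returned object are hypothetical, and only the keys shown in the example above (a, c, d, fp) are handled.

```javascript
// Hypothetical sketch, not the Text2Mambo parser: split an h2 title into the
// plain title and its [key:value] options. Only the keys from the example in
// the article (a, c, d, fp) are mapped; the manual is the authoritative list.
function parseArticleTitle(rawTitle) {
  const options = {};
  // Collect every [key:value] group and remove it from the title text.
  const title = rawTitle
    .replace(/\[([a-z]+):([^\]]*)\]/gi, (match, key, value) => {
      options[key.toLowerCase()] = value.trim();
      return "";
    })
    .trim();

  return {
    title,
    author: options.a ?? null,   // null: fall back to the default option
    category: options.c ?? null, // null: fall back to the default option
    date: options.d ?? null,     // kept as text; date parsing is left out
    frontPage: (options.fp ?? "").toLowerCase() === "yes",
  };
}

const parsedTitle = parseArticleTitle(
  "Driving through Africa[a:John Janssen][c:Side stories][d:June 12, 2011][fp:yes]"
);
console.log(parsedTitle);
```

Returning null for a missing option leaves room for the defaults selected in the MamboCtrl interface to take over.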
Database
Mambo uses MySQL as its database. Text2Mambo injects text into a localhost database. Access to the hosted MySQL database
from a remote machine is usually blocked due to security concerns. To compile the code, the MySQL .NET connector must be installed; it is not included in the source download.
Look at the MySQL download site for the correct download.
For Text2Mambo to work, we need to know a name/password combination for the MySQL instance on which Mambo runs.
The main class for connecting to the Mambo database is static public class MamboConnection. It is a static class with only static
methods because I needed just one connection, so I don't have to maintain an instance of the class. MamboConnection
uses the same flexible table-name scheme as the Mambo source: each table name starts
with "#__", which at runtime is replaced with a prefix, ProgramSettings.MamboDatabasePrefix. This setting can only be modified in the code.
This flexible prefix makes it possible, from a Mambo perspective, for multiple Mambo instances to run on the same database
using separate tables. To let Text2Mambo control multiple Mambo instances, the ProgramSettings.MamboDatabasePrefix
setting would have to be made user changeable.
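The "#__" placeholder scheme can be illustrated with a few lines of JavaScript. This is only a sketch of the idea, not the MamboConnection code: the function name applyTablePrefix and the sample query are hypothetical, and "mos_" is assumed as the prefix because the table names mentioned later in this text start with it.

```javascript
// Hypothetical sketch of the "#__" scheme: every occurrence of the placeholder
// is replaced by the configured prefix before the SQL is sent to MySQL.
function applyTablePrefix(sql, prefix) {
  // Plain text substitution; a "#__" inside a string literal in the SQL
  // would be replaced as well.
  return sql.split("#__").join(prefix);
}

// "mos_" is an assumed prefix value; the query is for illustration only.
const prefixedQuery = applyTablePrefix(
  "SELECT * FROM #__content ORDER BY created",
  "mos_"
);
console.log(prefixedQuery);
```

Making the prefix an argument instead of a fixed setting is what would allow one program to address several Mambo instances in the same database.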
User interface
The user interface is a tabbed control collection. Each tab page is created at runtime and filled at runtime with a single User Control.
These User Controls all derive from class MyUserControl, which is an abstract class with two virtual methods, Closing() and Loading().
The former is called for all MyUserControl-derived UserControls in the tab pages when Text2Mambo
terminates, so that those controls can save their data.
The sequence of the tab pages is the sequence in which I developed them; not very logical, but it works.
- The first tab page is the database login control; once the credentials are entered and correct, Text2Mambo starts by showing the second page.
- The second page is the most important: it controls the import of the HTML document. There is a dropdown list of recent documents, a property page with loads
of settings, more settings, and the magical "Process File" button; just press it to see what happens.
- The "Advanced Setup" page deals with "Document tags"; see the "Document tags" chapter further on in this text.
- The "Text Export" controls deal with creating the .XML export file and filling the "Sync Online" control's file list with the appropriate files.
- The "Other tools" page is not very useful right now.
- The "Sync Online" page is basically an FTP upload, with the difference that it only uploads files that the "Text Export" controls have set.
- The "FTP Login" control is used to enter the FTP login credentials and test them. The "Sync Online" control uses those credentials to log in.
- The "Codeproject text" page is this text. It is a WebBrowser control that reads the text to display from an embedded resource, which is this text.
Security
Actually: none. The system is intended to be used by the owner of the web-site. The only thing required is that you know a name and password of the MySQL account on
which the Mambo system runs. Both the localhost and remote MySQL credentials must be known.
One warning: Text2Mambo stores the MySQL and FTP usernames and passwords unencrypted in the Registry under HKCU\Software\TravelisFun\Text2Mambo.
Coding practices
At the start of this project, I had little C# knowledge, so this project became a kind of training exercise. I used as many C# features as I could find
that would be remotely useful to the project. This means that the project looks a bit messy and overdone.
The MS-Word export macro
I use MS-Word to create the text, but reading the .DOC or .DOCX files directly is difficult. However, MS-Word has the option of exporting a document as HTML,
which is plain text and therefore easy to parse. Exporting a file as HTML in MS-Word involves a lot of mouse clicking, so I made a macro to do the job (see the source below).
An additional advantage of using the "Filtered HTML" output is that images are stored separately in the size and format in which they are visible in the document.
The images are either .JPG or .PNG; the moment the editor does something to the image other than cropping, resizing, or some color changing, MS-Word creates
a .PNG (which is much bigger than a .JPG). The images are stored in a sub-folder of the folder where the HTML export document is stored.
With an English version of MS-Word, this folder is called "<documentname>_files", so if the document is called "Juni2008", then the folder with the images
is called "Juni2008_files". With other language versions, this image folder has a different name; for instance, the Dutch MS-Word version calls this folder
"<documentname>_bestanden". When Text2Mambo tries to find images, it looks for a sub-directory called "<documentname>_*"; if it finds multiple
ones, a selection box is shown. Here is the macro for saving an MS-Word document as "Filtered HTML":
Sub SaveAsHtml()
    ' Nothing to export when no document is open.
    If Application.Documents.Count < 1 Then
        MsgBox "No documents are open"
        Exit Sub   ' "Return" is only valid after GoSub; Exit Sub leaves the macro
    End If

    OrgFileName = ActiveDocument.Name
    OrgFolder = ActiveDocument.Path

    ' Create the "HTML" sub-folder next to the document if it does not exist yet.
    NewPath = OrgFolder & Application.PathSeparator & "HTML"
    Set fs = CreateObject("Scripting.FileSystemObject")
    If Not fs.FolderExists(NewPath) Then
        MkDir NewPath
    End If
    NewPath = NewPath & Application.PathSeparator

    ' Replace the extension with ".html"; InStrRev finds the last dot in the name.
    NewFilename = Left(OrgFileName, InStrRev(OrgFileName, ".") - 1)
    NewFilename = NewFilename & ".html"

    ' Save the original first, then export the filtered HTML copy without prompts.
    ActiveDocument.Save
    ChangeFileOpenDirectory NewPath
    Application.DisplayAlerts = wdAlertsNone
    ActiveDocument.WebOptions.AllowPNG = False
    ActiveDocument.WebOptions.TargetBrowser = msoTargetBrowserIE6
    ActiveDocument.SaveAs FileName:=NewFilename, _
        FileFormat:=wdFormatFilteredHTML, _
        LockComments:=False, _
        Password:="", _
        AddToRecentFiles:=False, _
        WritePassword:="", _
        ReadOnlyRecommended:=False, _
        EmbedTrueTypeFonts:=False, _
        SaveNativePictureFormat:=False, _
        SaveFormsData:=False, _
        SaveAsAOCELetter:=False
    ActiveWindow.View.Type = wdWebView

    ' Close the HTML copy and reopen the original document.
    ActiveDocument.Close
    Application.DisplayAlerts = wdAlertsAll
    ChangeFileOpenDirectory OrgFolder
    RecentFiles(1).Open
End Sub
This macro creates, if not already present, an HTML folder as a sub-directory of the folder where the current document resides, and places the HTML export document in that folder.
Note that the HTML export file does not show up in the "Recent Document" list and the original document is opened again after exporting.
Because Text2Mambo has a command line interface, it is possible to call Text2Mambo from within the macro and have the exported text injected into your Mambo CMS straight away.
Something like this:
RetVal = Shell("D:\Dev\Text2Mambo\bin\Release\Text2Mambo.exe " & NewFilename, vbNormalNoFocus)
In my MS-Word, this macro is assigned to the Alt-F2 keyboard combination.
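The image folder lookup described before the macro (a sub-directory called "<documentname>_*") can be sketched in JavaScript as follows. This is an illustration only: the function name findImageFolders is hypothetical, and the sketch works on a list of sub-directory names that the caller has already read from disk, which is an assumption made to keep the example self-contained.

```javascript
// Hypothetical sketch: pick the candidate image folders for a document from a
// list of sub-directory names. Any name of the form "<documentname>_..."
// qualifies, so "_files" (English) and "_bestanden" (Dutch) both match.
function findImageFolders(documentName, subDirectoryNames) {
  const wanted = documentName.toLowerCase() + "_";
  return subDirectoryNames.filter((name) =>
    name.toLowerCase().startsWith(wanted)
  );
}

const candidates = findImageFolders("Juni2008", [
  "Juni2008_files",
  "Juni2008_bestanden",
  "Juli2008_files",
]);
// More than one candidate is the situation in which a selection box is needed.
console.log(candidates);
```

The comparison ignores case because folder names on Windows are case insensitive; that choice is an assumption of this sketch.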
Document tags
I needed some extra functionality/formatting possibilities in the text, so I added "document tags". This is a system where pieces of text or images can
be given a certain style and that style is recognized by the Text2Mambo parser. I needed these four:
- Object: the possibility to insert pieces of HTML source into the text. For instance, to insert YouTube videos.
- Map: the possibility to insert GPS coordinates and link those to a mapping site, like maps.google.com.
- Link: the possibility to easily hyperlink certain text to external sources, like Wikipedia.org.
- Gallery: the possibility to combine various images into a slideshow.
For example: I'm using the word "nuclear" in my text and I want this word to be a hyperlink to Wikipedia. To do this, I have to set up a document tag style
of type Link in Text2Mambo; I choose the style name to be AMWikiEN. In MS-Word, I have to create a style with the name AMWikiEN that is based on style Normal
and that is of type "Linked". This last setting is especially important; if the style in MS-Word is of type "Paragraph", then we can't give just one word in a paragraph a different style.
Now the word "nuclear" can be selected and given the AMWikiEN style. When the text is parsed by Text2Mambo, it sees the AMWikiEN style and transforms
that word into the hyperlink that has been set up in Text2Mambo.
What the other document tags do:
- Object: Marks a text as HTML source code. We can even inject JavaScript or PHP code into the text using this tag.
- Map: This tag transforms a GPS coordinate in the form "N47 45.558 E54 23.457" into coordinates that maps.google can understand,
e.g.: "47.7593,54.3910", and places those coordinates in a hyperlink.
- Gallery: Any image that is tagged with the Gallery style in an article is collected and placed into a slide show at the bottom of the article.
To do this, some JavaScript code is inserted into the text. The JavaScript code looks like this:
<script type="text/javascript">
var image1=new Image() ;
image1.src="images/April2011_files/image001.jpg";
var image2=new Image();
image2.src="images/April2011_files/image002.jpg";
var image3=new Image();
image3.src="images/April2011_files/image003.jpg";
</script>
<br clear="left">
<img align="left" vspace="5" hspace="5"
src="images/April2011_files/image001.jpg" name="slide"/>
<script type="text/javascript">
var step=1;
function slideit()
{
if (!document.images)
return ;
document.images.slide.src=eval("image"+step+".src");
if (step < 3)
step++;
else
step = 1;
setTimeout("slideit()",2500)
}
slideit()
</script>
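The coordinate conversion done by the Map tag (degrees plus decimal minutes to decimal degrees) can be sketched in JavaScript like this. The sketch is not the Text2Mambo code: the function names are hypothetical, the negative sign for the S and W hemispheres is an assumption (the example in this text only uses N and E), and the URL format in buildMapLink is also an assumption.

```javascript
// Hypothetical sketch: convert "N47 45.558 E54 23.457" (degrees and decimal
// minutes) into decimal degrees. decimal = degrees + minutes / 60.
function gpsToDecimal(text) {
  const pattern =
    /([NS])\s*(\d+)\s+(\d+(?:\.\d+)?)\s+([EW])\s*(\d+)\s+(\d+(?:\.\d+)?)/i;
  const m = pattern.exec(text);
  if (!m) return null; // not a coordinate in the expected form

  const toDecimal = (hemisphere, degrees, minutes) => {
    const value = Number(degrees) + Number(minutes) / 60;
    // Assumption: south and west are written as negative numbers.
    return /[SW]/i.test(hemisphere) ? -value : value;
  };
  return {
    lat: toDecimal(m[1], m[2], m[3]),
    lon: toDecimal(m[4], m[5], m[6]),
  };
}

// Assumed link format; four decimals as in the example in the text.
function buildMapLink(text) {
  const c = gpsToDecimal(text);
  if (!c) return text; // leave the text untouched when it does not parse
  const coords = `${c.lat.toFixed(4)},${c.lon.toFixed(4)}`;
  return `<a href="https://maps.google.com/?q=${coords}">${text}</a>`;
}

console.log(gpsToDecimal("N47 45.558 E54 23.457"));
console.log(buildMapLink("S1 30 W2 15"));
```

For the example coordinate, 45.558 / 60 is about 0.7593 and 23.457 / 60 is about 0.3910, which matches the "47.7593,54.3910" shown in the list above.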
The Link style is the most versatile: not only can you make hyperlinks out of selected text, you can also use it to mark text with a certain style (from the Mambo stylesheet).
For instance: you want words in the text to be marked up with the ".componentheading" style. To do this, make a document tag of type Link in Text2Mambo with style
name "AMComponent" and with the process string set to: "<p class="componentheading">{1}</p>". Then in MS-Word, create the AMComponent style (based on Normal and as a Linked style).
Select the words that must be marked up with .componentheading, give them the AMComponent style, and process the document.
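How a process string could be applied is sketched below in JavaScript. It assumes that "{1}" stands for the text that carries the style, which is how the example above reads but is not confirmed here. The table documentTags, the function applyDocumentTag, and the Wikipedia process string for AMWikiEN are hypothetical; only the AMComponent string comes from this text.

```javascript
// Hypothetical sketch: a table of Link-type document tags, keyed by the
// MS-Word style name, each with its process string.
const documentTags = new Map([
  // Process string taken from the example in the text.
  ["AMComponent", '<p class="componentheading">{1}</p>'],
  // Assumed process string; the real AMWikiEN setup is not shown in the text.
  ["AMWikiEN", '<a href="https://en.wikipedia.org/wiki/{1}">{1}</a>'],
]);

// Replace every "{1}" in the process string with the tagged text.
// Text with an unknown style is returned unchanged.
function applyDocumentTag(styleName, taggedText) {
  const processString = documentTags.get(styleName);
  if (processString === undefined) return taggedText;
  return processString.split("{1}").join(taggedText);
}

console.log(applyDocumentTag("AMComponent", "Border crossing"));
console.log(applyDocumentTag("AMWikiEN", "Nuclear"));
```

Because the process string is plain text with a placeholder, the same mechanism covers both hyperlinks and stylesheet markup, which is why the Link type is so versatile.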
Parsing the source HTML
If we look inside the HTML that MS-Word creates, we see a lot of garbage. Text2Mambo only parses what is between the <body></body> tags.
There is a cleanup of, for example, all the <span lang=> tags that MS-Word throws into the text. The HTMLProcessor class takes care of all the processing.
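The two steps mentioned above, taking only the body and removing the language spans, can be sketched in JavaScript as follows. This is not the HTMLProcessor code: both function names are hypothetical, and the sketch assumes that only spans whose sole attribute is lang should be dropped, so that spans carrying a style class stay in place.

```javascript
// Hypothetical sketch: return only what is between <body ...> and </body>.
function extractBody(html) {
  const m = /<body[^>]*>([\s\S]*?)<\/body>/i.exec(html);
  return m ? m[1] : "";
}

// Hypothetical sketch: remove <span lang=...> wrappers but keep their content.
// A stack remembers, for every open span, whether it was a lang-only span, so
// that the matching </span> is removed even when spans are nested.
function stripLangSpans(html) {
  const stack = [];
  return html.replace(/<span\b([^>]*)>|<\/span\s*>/gi, (tag, attrs) => {
    if (attrs !== undefined) {
      // Opening tag: lang-only when "lang=..." is the single attribute.
      const isLangOnly = /^\s*lang=("[^"]*"|'[^']*'|[^\s>]+)\s*$/i.test(attrs);
      stack.push(isLangOnly);
      return isLangOnly ? "" : tag;
    }
    // Closing tag: drop it only when it closes a lang-only span.
    return stack.pop() ? "" : tag;
  });
}

const sampleHtml =
  "<html><head><title>x</title></head><body lang=EN-US><p>" +
  "<span lang=EN-GB>Hello <span class=AMWikiEN>world</span></span>" +
  "</p></body></html>";
console.log(stripLangSpans(extractBody(sampleHtml)));
```

A single regular expression that matches a whole span from open to close would pair the wrong closing tag as soon as spans are nested, which is why the sketch tracks open spans explicitly.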
Import on remote system
The download includes the PHP script that performs this task.
The script takes the mamboexport.xml file that Text2Mambo has created with part of the Mambo content and imports it into the remote database.
Because this import script does not create users, sections, or categories, we must make sure that the users, sections, and categories that are present in
the exported content already exist on the remote system. The import script matches users on their mos_users.name
property, sections on mos_sections.name, and categories on mos_categories.name, and not on the ID. Therefore the user, section, and category records
that are used in the exported content are included in the export XML.
You have to look in the script to see how things work.
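The idea of matching on name instead of ID can be sketched in JavaScript as follows. This is not the PHP import script: the function names are hypothetical, the records are made-up sample data, and the article field names created_by, sectionid, and catid are assumptions about the content table.

```javascript
// Hypothetical sketch: map the IDs used in the export onto the IDs of the
// records with the same name on the remote site. Nothing is created here, so
// a missing name is an error.
function buildIdMap(kind, exportedRecords, remoteRecords) {
  const remoteIdByName = new Map(remoteRecords.map((r) => [r.name, r.id]));
  const idMap = new Map();
  for (const record of exportedRecords) {
    if (!remoteIdByName.has(record.name)) {
      throw new Error(`${kind} "${record.name}" must exist on the remote site`);
    }
    idMap.set(record.id, remoteIdByName.get(record.name));
  }
  return idMap;
}

// Rewrite the references of one exported article to the remote IDs.
// The field names are assumed for the sake of the example.
function remapArticle(article, maps) {
  return {
    ...article,
    created_by: maps.users.get(article.created_by),
    sectionid: maps.sections.get(article.sectionid),
    catid: maps.categories.get(article.catid),
  };
}

// Made-up sample data: the same names exist on both sides with different IDs.
const maps = {
  users: buildIdMap("User", [{ id: 62, name: "John Janssen" }], [{ id: 70, name: "John Janssen" }]),
  sections: buildIdMap("Section", [{ id: 1, name: "Stories" }], [{ id: 4, name: "Stories" }]),
  categories: buildIdMap("Category", [{ id: 3, name: "Side stories" }], [{ id: 9, name: "Side stories" }]),
};
console.log(remapArticle({ title: "Driving through Africa", created_by: 62, sectionid: 1, catid: 3 }, maps));
```

This also shows why the export XML has to carry the user, section, and category records: without the names, the exported IDs could not be translated.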
The import script presumes it is installed in a sub-folder of the Mambo root. So if Mambo is installed in "/htdocs", then the import script must be in "/htdocs/import".
Calling the import script requires typing in its address, like: http://yoursitename/import/import.php.
The import.php script uses configuration.php from Mambo like this: include_once("../configuration.php");
Using the code
You need to have a Mambo system installed, period.
To run this software, you have to install the MySQL .NET connector and have .NET 3.5 installed.
The project is an MS Visual Studio 2010 project. I'm sorry for users of older MS-VS versions, but I don't know of a way to convert the project back to older MS-VS versions.
There are a lot of parameters involved in the system, especially relative paths. Be smart and
you will figure it out. When you run the software for the first time, a lot of default paths are set; they probably don't match your system, so modify them.
Points of interest
If you browse through the code, you may notice all kinds of coding styles and practices that are maybe overdone or otherwise not appropriate for the situation.
This is on purpose. As this project was/is a study into C#, I used as many controls and techniques as possible. So I used:
- A tools DLL
- User controls
- LINQ statements
- Collections, Dictionaries, and Lists
- WebBrowser with content read from embedded resource
- Delegates
- Multi threading
- Property sheets
- Extension Methods
- Lambda Expressions
- .NET MySQL
- FTP functionality
- Go see and find more
History
Because I'm travelling in Africa, I can't keep updating the software or this text very often. But I'll try.
The software is maintained here: http://software.travelisfun.org/text2mambo.
If you want to read the stories made using this system (Dutch only, but Google can translate), see here: http://stories.travelisfun.org.
- Oct. 23, 2011: Updated source code and added manuals for download.