
Understanding GRML

11 Oct 2004
The development of a markup language.

Introduction

HTML is the primary markup language used on the web. Its first release lacked many of the features now taken for granted, and it took many years for HTML to become what it is today. In fact, almost four years passed between the first attempts at a markup language and HTML 2.0, and HTML has continued to change in the years since 1995. This demonstrates the commitment necessary to develop a markup language.

Before HTML could be developed, software was needed to test its features. That software was the precursor to the first HTML web browser. A markup language cannot be tested without software that reads it, so the software drives the development of the language.

The purpose of this article is to show how General Reuse Markup Language, or GRML, developed into its current format. Examples are given to show the differences between GRML v1.0 and GRML v2.0 formats. The attributes of each markup language are described along with how they are used on the web.

Background

This is one in a series of articles on GRML. Before continuing, read the article, Introducing GRML. It provides an overview of existing file formats and markup languages, and explains why GRML was created.

If you are not interested in markup languages, potential alternative approaches, or web browser technology, this is not the article for you. This discussion is not suitable for anyone who feels HTML is the only way to browse the web.

The Beginning

The process of creating GRML was indirect. It began with a desire to create a front-end to extract content from web pages. The idea was to submit a web page request and retrieve the content in a format usable by a variety of applications. HTML is designed for display in a browser, so its content is not readily usable by other kinds of applications. Since the target web pages used HTML, the retrieved content needed to be made available in another format.

The only way to extract content from an HTML web page is to use an adapter. An adapter reads data in one format and writes it in another. This was the perfect solution, except for one thing: every HTML web page is structured differently. There is no way to extract author information, article text, or product descriptions without creating an adapter for each web page. There had to be a better way.
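
The adapter idea can be sketched in a few lines of Python. This is an illustration only, not the front-end's actual adapter: the class name `HeadlineAdapter`, the function `adapt`, and the page layout (headlines in `<h2 class="headline">` elements) are all assumptions made for the example. It reads HTML and writes plain text lines, and it shows why an adapter written for one page layout extracts nothing from a page that is laid out differently.

```python
from html.parser import HTMLParser


class HeadlineAdapter(HTMLParser):
    """Hypothetical adapter: reads HTML, writes plain text lines.

    It understands only one assumed layout, where each headline
    sits in an <h2 class="headline"> element.
    """

    def __init__(self):
        super().__init__()
        self.lines = []            # adapted output, one line per headline
        self._in_headline = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and ("class", "headline") in attrs:
            self._in_headline = True
            self.lines.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_headline = False

    def handle_data(self, data):
        if self._in_headline:
            self.lines[-1] += data.strip()


def adapt(html):
    """Run the adapter over one page and return the extracted lines."""
    adapter = HeadlineAdapter()
    adapter.feed(html)
    adapter.close()
    return adapter.lines


# A page that uses the layout the adapter was written for.
page_a = ('<h2 class="headline">First story</h2><p>text</p>'
          '<h2 class="headline">Second story</h2>')
# A page that marks up its headline differently: nothing is extracted.
page_b = '<div id="top"><b>First story</b></div>'

print(adapt(page_a))
print(adapt(page_b))
```

Supporting `page_b` would require a second adapter, which is the scaling problem described above.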

Building a web front-end

While trying to find a practical way to extract content from web pages, a front-end was being developed to display the content. A single adapter was developed to format HTML from a single web page into an informal format used by the front-end. This informal format was the initial step toward creating a markup language.

From June 2002 until August 2002, the front-end used a website adapter to convert HTML web pages to text for display. There was no format, other than reading single lines of text from the adapter. As development continued, more adapters were added, until six were available. Web page requests sent from the front-end had to use one of these six adapters. There was no feature for users to enter their own web page request.

The first attempt

While building the web front-end, a form was needed for sending requests using input controls. This required a formal approach for handling requests to and responses from a web page. Using arbitrary lines of text was inefficient. This was the beginning of Personal Markup Language.

The new markup language had form support and provided a structure for formatting web page content. However, the form was limited. At first, the front-end created a form from the first web page request. There was no way to display another form. To allow the markup language to create a form for each web page request, the front-end was updated.

Upgrading the format

With form support, the front-end now sent web page requests from input controls and created a form from web page responses, when necessary. The only feature missing was a way to organize web page content into groups, and display each group of content separately in the front-end. This required a new markup language. It was the beginning of the Simple Markup Language (SML).

Content from a web page that the front-end displays is called a dimension. Splitting content into different groups creates a dimension for each group. The front-end needed to display different dimensions of content for forecasting, logistics, and data analysis. Once this support was added, the front-end displayed multidimensional views.

As the markup language was being developed, there was one constant: the front-end did not allow the user to directly enter a web page request. A user had to choose from the six web page requests built into the front-end or submit a request using a form. Once this limitation was removed, the markup language had to be completely redesigned. This new markup language was the first version of GRML.

GRML version 1.0

Completed in January 2003, GRML 1.0 supported form input controls, columns, and results. It had multidimensional support and used the concept of "web applications". Each web application represented an activity that a user performs on the web. The first GRML web browser had "web applications" for using a search engine, getting news headlines, viewing auction listings, and doing a job search.

"Web applications" were a holdover from the days of the web front-end, when neither submitting a web page request nor opening a file was supported. While the web browser allowed web page requests, each request had to target one of the "web applications" if the web page was to be displayed.

The reason for "web applications" is to use content from HTML web pages in GRML web browsers. Since HTML web pages are abundant and GRML is new, it is advantageous to have the ability to adapt the HTML to GRML. The "web applications" do this.

An example of "web applications" in GRML follows:

<GRML> 
<a class=navi_13 name=AUCT type=title>Auctions</> 
<a class=navi_13 name=JOBS type=title>Job Search</> 
<a class=navi_13 name=SRCH type=title>Search Engine</> 

<a class=navi_13 name=AUCT type=location>127.0.0.1/auc.asp?search2=</> 
<a class=navi_13 name=JOBS type=location>127.0.0.1/jobs.asp?search2=</> 
<a class=navi_13 name=SRCH type=location>127.0.0.1/parse.asp?search2=</> 

<a class=hist_13 type=item>127.0.0.1/startup.asp</> 
<a class=hist_13 type=item>127.0.0.1/over.asp</> 
</GRML> 

GRML was designed to be used by many different browsers. It was not possible to test this capability since only one GRML web browser existed. As other browsers were created and the markup language developed, GRML moved to version 1.1 in the first four months of 2003.

The next major upgrade to GRML occurred when resolving the problem of "web applications."

GRML version 1.2

One limitation of the "web application" approach was the need for a separate web page for each "web application" and each website used. Since there are millions of websites, it was impractical to create a "web application" for each one. Another problem was keeping the "web application" updated if the website changed. If having to support millions of websites was difficult, trying to keep them updated was practically impossible. GRML needed to be changed.

In March 2004, everything related to "web applications" was removed from GRML. This allowed the markup language to focus on form input controls, columns, and results. With the "web applications" removed, it was now possible to read any web page using more generic and consistent web adapters.

An example of GRML version 1.2 follows:

<a class=edit_13 name=url1 type=title>Enter URL:</> 
<a class=edit_13 name=url1 type=location>http://127.0.0.1/links.asp</> 

<a class=column_13 type=item>Title</> 
<a class=column_13 type=item>Result</> 

<a control=result_13 type=item>RIAA, MPAA Ask High Court To Review</> 
<a control=result_13 type=item>The Hobo writes "It's official: Hollywood studios and record 
  companies on Friday asked the United States Supreme Court to overturn a  
  controversial series of recent court decisions that have kept file-swapping  
  software legal." (Previous /. coverage here.)</> 
<a control=result_13 type=link>http://slashdot.org/article.pl?sid=04/10/11/1846208</>
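
As a rough illustration of how a reader for this syntax might work, the Python sketch below parses GRML 1.x text one line at a time. It assumes that each element has the shape `<a key=value ...>content</>` and sits entirely on one line; the regular expressions and the function names `parse_grml1_line` and `parse_grml1` are hypothetical and are not taken from any GRML browser.

```python
import re

# Assumed shape of a GRML 1.x element: <a key=value ...>content</>
LINE_RE = re.compile(r"^<a\s+([^>]*)>(.*)</>\s*$")
ATTR_RE = re.compile(r"(\w+)=(\S+)")


def parse_grml1_line(line):
    """Return (attributes, content) for one GRML 1.x line, or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # no start tag, or the end tag is not on this line
    attrs = dict(ATTR_RE.findall(match.group(1)))
    return attrs, match.group(2)


def parse_grml1(text):
    """Parse every line that holds a complete element; skip the rest."""
    parsed = (parse_grml1_line(line) for line in text.splitlines())
    return [item for item in parsed if item is not None]


sample = """\
<a class=edit_13 name=url1 type=title>Enter URL:</>
<a class=edit_13 name=url1 type=location>http://127.0.0.1/links.asp</>
<a class=column_13 type=item>Title</>
"""

items = parse_grml1(sample)
for attrs, content in items:
    print(attrs, content)
```

A line-based reader like this one silently skips any element whose end tag does not appear on the same line as its start tag.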

GRML version 1.2 was the last of the 1.x releases of GRML. During the next six months of use, it set the stage for another change in the syntax of the markup language.

GRML version 2.0

The initial versions of GRML worked well on the web and the local file system. They allowed the development of many different web browsers that use the form and column/results approach. Other than removing "web applications", the syntax for GRML did not change much from version 1.0 to version 1.2. Issues of speed, control, and reliability were not considered. However, this changed with GRML version 2.0.

This version of GRML was designed to create small file sizes, handle file and web page content using fewer browser resources, and allow more options for arranging file and web page content. The old syntax was completely abandoned in favor of smaller tags and more specific tag keywords. The sample GRML from version 1.2 looks as follows in 2.0:

<edit url1> 
<title>Enter URL: 
<location>http://127.0.0.1/links.asp 
</edit> 

<column> 
<Title> 
<Description> 
<Link> 
</column> 

<result> 
<Title>RIAA, MPAA Ask High Court To Review 
<Description>The Hobo writes "It's official: Hollywood studios and record 
   companies on Friday asked the United States Supreme Court to overturn a 
   controversial series of recent court decisions that have kept 
   file-swapping software legal." (Previous /. coverage here.) 
<Link>http://slashdot.org/article.pl?sid=04/10/11/1846208 
</result>

Using the GRML 2.0 syntax, tags drop to a third of their size from version 1.2. In addition, content is no longer cut off when a web page has a very long line. In version 1.2, the end tag had to be on the same line as the start tag. With very long content, the end tag would appear on the following line with the remaining content, so the content was not read. Version 2.0 solves this problem because item tags have no end tag; only the enclosing tags, such as <result> ... </result>, are closed.
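
The Python sketch below shows one way a reader could handle the version 2.0 layout, including content that wraps across lines. The grammar is inferred from the sample above rather than taken from a GRML specification: it assumes that a tag outside any group opens a group, a tag inside a group starts an item, and a line without a leading tag continues the previous item. The function name `parse_grml2` and the dictionary layout are hypothetical.

```python
import re

OPEN_RE = re.compile(r"^<(\w+)(?:\s+([^>]*))?>(.*)$")
CLOSE_RE = re.compile(r"^</(\w+)>$")


def parse_grml2(text):
    """Parse GRML 2.0-style text into a list of groups.

    Each group is a dict with the keys "tag", "arg", and "items",
    where "items" is a list of [name, content] pairs.
    """
    groups = []
    current = None  # the group being filled, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if CLOSE_RE.match(line):
            current = None  # a group end tag such as </result>
            continue
        opened = OPEN_RE.match(line)
        if opened and current is None:
            # A tag outside any group starts a new group, e.g. <edit url1>.
            current = {"tag": opened.group(1), "arg": opened.group(2), "items": []}
            groups.append(current)
        elif opened:
            # A tag inside a group starts a new item; it has no end tag.
            current["items"].append([opened.group(1), opened.group(3)])
        elif current and current["items"]:
            # No leading tag: the line continues the previous item's content.
            current["items"][-1][1] += " " + line
    return groups


sample = """\
<column>
<Title>
<Description>
<Link>
</column>

<result>
<Title>RIAA, MPAA Ask High Court To Review
<Description>The Hobo writes "It's official: Hollywood studios and record
   companies on Friday asked the United States Supreme Court to overturn a
   controversial series of recent court decisions that have kept
   file-swapping software legal." (Previous /. coverage here.)
<Link>http://slashdot.org/article.pl?sid=04/10/11/1846208
</result>
"""

groups = parse_grml2(sample)
for group in groups:
    print(group["tag"], group["items"])
```

Because an item ends only when the next tag begins, the four-line description is read as a single item instead of being dropped.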

Version 2.0 allows columns and results to be organized in ways that are not possible with version 1.2. In the GRML above, the column display order follows the order of the items inside the column tag: the top item is displayed first and the bottom item last. It does not matter how the results are ordered. If there are five column items, and the third should be displayed first, then place it at the top of the column tag items.

When a result item has a column specified in its tag, it is only displayed if that column appears between the <column> ... </column> tags. To display only one column of results, specify only that column. To display several, specify any number of columns, and only those columns are displayed from the results. This was not possible with previous versions of GRML.
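
The ordering and filtering rules can be sketched in Python as follows. The function name `arrange` and the list-of-pairs representation of a result are assumptions made for this illustration, the description text is a placeholder, and column names are matched exactly, including case.

```python
def arrange(columns, result_items):
    """Order one result's items by the column list and drop unlisted ones.

    columns      -- column names in display order, as listed in <column>
    result_items -- (name, content) pairs in the order they appear in <result>
    """
    by_name = dict(result_items)
    return [(name, by_name[name]) for name in columns if name in by_name]


result = [
    ("Title", "RIAA, MPAA Ask High Court To Review"),
    ("Description", "(story summary)"),  # placeholder text
    ("Link", "http://slashdot.org/article.pl?sid=04/10/11/1846208"),
]

# Link is listed first, so it is displayed first; Description is not
# listed, so it is not displayed at all.
row = arrange(["Link", "Title"], result)
print(row)
```

The order of the items inside the result has no effect on the output; only the order of the column list matters.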

Conclusion

GRML has moved through many versions since its first release in January 2003. It has moved from a "web application" markup language to a web page markup language. With version 2.0, it has the smallest, fastest, and most flexible syntax of any version released.

With its support for form input controls, columns, and results, GRML is able to support many web browsers and provide a view of its content that is appropriate regardless of the browser used.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
