Semantic-Action files

When liblouisxml (or xml2brl) processes an xml document, it needs to be told how to use the information in that document to produce a properly translated and formatted braille document. These instructions are provided by a semantic-action file, so called because it explains the meaning, or semantics, of the various specifications in the xml document. To understand how this works, it is necessary to have a basic knowledge of the organization of an xml document.

An xml document is organized like a book, but with much finer detail. First there is the title of the whole book. Then there are various sections, such as author, copyright, table of contents, dedication, acknowledgements, preface, various chapters, bibliography, index, and so on. Each chapter may be divided into sections, and these in turn can be divided into subsections, subsubsections, etc. In a book the parts have names or titles distinguished by capitalization, type fonts, spacing, and so forth. In an xml document the names of the parts are enclosed in angle brackets (<>). For example, if liblouisxml encounters <html> at the beginning of a document, it knows it is dealing with a document that conforms to the standards of the extensible hypertext markup language (xhtml) - at least we hope it does. When you see a book, you know it's a book. The computer can know only by being told.

Something enclosed in angle brackets is called an ``element'' (more properly, a ``tag'') in xml parlance. (There may be more between the angle brackets than just the name of the element. More of this later.) The first ``element'' in a document thus tells liblouisxml what kind of document it is dealing with. This element is called the ``root element'' because the document is visualized as branching out from it like a tree. Some examples of root elements are <html>, <math>, <docbook>, <dtbook3> and <wordDocument>.

Whenever liblouisxml encounters a root element that it doesn't know about, it creates a new file called a semantic-action file. The name of this file is formed by stripping the angle brackets from the root element and adding a period plus the letters s e m. If you look in a directory containing semantic-action files you will see names like html.sem, dtbook3.sem, math.sem, and so on.
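The naming rule just described can be illustrated with a short Python sketch. It is not part of liblouisxml; the function name semantic_file_name is hypothetical, and the handling of xml namespaces is an assumption made only so that the example runs with Python's standard library.

```python
import xml.etree.ElementTree as ET


def semantic_file_name(xml_text):
    """Return the semantic-action file name implied by the root element.

    Hypothetical helper for illustration; liblouisxml itself is written in C.
    """
    root = ET.fromstring(xml_text)
    # ElementTree reports a namespaced tag as "{uri}name"; keep only the name.
    # (Assumption: the namespace plays no part in the file name.)
    local_name = root.tag.rsplit("}", 1)[-1]
    # Root element name, a period, and the letters s e m.
    return local_name + ".sem"


if __name__ == "__main__":
    print(semantic_file_name("<html><body><p>Hello</p></body></html>"))
```

For a document whose root element is <html>, the sketch yields html.sem, matching the file names listed above.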

liblouisxml records the names of all elements found in the document in the semantic-action file. The document has a multitude of elements, which can be thought of as describing the headings of various parts of the document. One element is used to denote a chapter heading. Another is used to denote a paragraph, still another to denote text in bold type, and so on. In other words, the elements take the place of the capitalization, changes in type font, spacing, etc. in a book. However, the computer still does not know what to do when it encounters an element. The semantic-action file tells it that.
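To make the idea of recording element names concrete, here is a minimal Python sketch that lists each element name once, in the order it first appears. The function name element_names is hypothetical, and the sketch deliberately ignores attributes; it only shows the kind of information gathered, not how liblouisxml gathers it.

```python
import xml.etree.ElementTree as ET


def element_names(xml_text):
    """List each element name once, in order of first appearance."""
    names = []
    seen = set()
    # iter() visits the root element and all of its descendants in document order.
    for element in ET.fromstring(xml_text).iter():
        name = element.tag.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix
        if name not in seen:
            seen.add(name)
            names.append(name)
    return names


if __name__ == "__main__":
    sample = "<html><body><h1>Title</h1><p>Some <em>text</em></p><p>More</p></body></html>"
    for name in element_names(sample):
        print(name)
```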

Consider html.sem. A copy is included as part of this documentation. It may differ from the file that liblouisxml is currently using. You will see that it begins with some lines about copyrights. Each line begins with a number sign (#). This indicates that it is a ``comment,'' intended for the human reader, and that the computer should ignore it. Then there is a blank line. Finally, there are two other comments explaining that the file must be edited to get proper output. This is because a human being must tell the computer what to do with each element. The semantic files for common types of documents have already been edited, so you generally don't have to worry about this. But if you encounter a new type of document you may have to edit the semantic-action file or send it to the maintainer for editing. In any case the rest of this section is essential for understanding how liblouisxml handles documents and for making changes if the way it does so is not correct.

After another blank line you will see a table consisting of two columns. The first column contains a word which tells the computer to do something. For example, the first entry in the table is:

    include math.sem

This tells liblouisxml to include the information in the math.sem file when it is deciphering an html (actually xhtml) document.

The second row of the table is:

    no hr

Here hr is an element with the angle brackets removed. It means nothing in itself. However, the first column contains the word ``no''. This tells liblouisxml ``no do'', that is, do nothing.

After a few more lines with ``no'' in the first column, we see one that says:

    softreturn br

This means that when the element <br> is encountered, liblouisxml is to do a soft return, that is, start a new line without starting a new paragraph.

The next line says:

    heading1 h1

This tells liblouisxml that when it encounters the element <h1> it is to format the text which follows as a first-level braille heading, that is, the text will be centered and preceded and followed by blank lines. (You can change this by changing the definition of the heading1 style.)

The next line says:

    italicx em

This tells liblouisxml that when it encounters the element <em> it is to enclose the text which follows in braille italic indicators. The x at the end of the semantic action name is there to prevent conflicts with names elsewhere in the software. Just where the italic indicators will be placed is controlled by the liblouis translation table in use.

The next line says:

    skip style

This tells liblouisxml to simply skip ahead until it encounters the element </style>. Nothing in between will have any effect on the braille output. Note the slash (/) before the ``style''. This means the end of whatever the <style> element was referring to. Actually, it was referring to specifications of how things should be printed. If liblouisxml had not been told to skip these specifications, the braille output would have contained a lot of gobbledygook.
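The effect of skip can be sketched in Python as follows. This is only an illustration under stated assumptions: the names collect_text and visible_text are hypothetical, the actions are passed as a plain dictionary from element name to action name, and every action other than skip is treated as "keep the text", which is a simplification.

```python
import xml.etree.ElementTree as ET


def collect_text(element, actions, out):
    """Append the text of element and its descendants to out, honoring skip."""
    name = element.tag.rsplit("}", 1)[-1]
    if actions.get(name) == "skip":
        return  # everything up to the matching end tag is ignored
    if element.text:
        out.append(element.text)
    for child in element:
        collect_text(child, actions, out)
        if child.tail:  # text that follows the child's end tag
            out.append(child.tail)


def visible_text(xml_text, actions):
    out = []
    collect_text(ET.fromstring(xml_text), actions, out)
    return " ".join("".join(out).split())  # normalize whitespace


if __name__ == "__main__":
    doc = ("<html><head><style>p { color: red }</style></head>"
           "<body><p>Some <em>text</em>.</p></body></html>")
    print(visible_text(doc, {"style": "skip"}))
    print(visible_text(doc, {}))
```

With the skip entry the style specifications vanish from the result; without it they appear in front of the paragraph text, which is the gobbledygook mentioned above.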

The next line says:

    italicx strong

This tells liblouisxml to also use the italic braille indicators for the text between the <strong> and </strong> elements.

After a few more lines with ``no'' in the first column we come to the line:

    document html

This tells liblouisxml that everything between <html> and </html> is an entire document. <html> was the root element of this document, so this is logical.

After another ``no'' line we come to:

    para p

liblouisxml will consider everything between <p> and </p> to be a normal body text paragraph.

The next line is:

    heading1 title

This causes the title of the document to also be treated as a braille level 1 heading.

Next we have the line:

    list li

The xhtml <li> and </li> pair of elements is used to enclose an item in a list. liblouisxml will format this with its own list style. That is, the first line will begin at the left margin and subsequent lines will be indented two cells.

Next we have:

    table table

You will note that the names of actions and elements are often identical. This is because they are both mnemonic. In any case, this line tells liblouisxml to format the table contained in the xhtml document according to the table formatting rules it has been given for braille output.

Next we have the line:

    heading2 h2

This means that the text between <h2> and </h2> is to be formatted according to the liblouisxml style heading2. A blank line will be left before the heading and the first line will be indented four spaces.

After a few more lines we come to:

    no table,cellpadding

Note the comma in the second column. This divides the column into two subcolumns. The first is the table element name. The second is called an ``attribute'' in xml. It gives further instructions about the material enclosed between the starting and ending ``tags'' of the element (<table> and </table>). Full information requires three subcolumns. The third is called the value and gives the actual information. The attribute is merely the name of the information.

Much further down we find:

    no table,border,0

Here the element is table, the attribute is border and the value is 0. If liblouisxml were to interpret this, it would mean that the table was to have a border of 0 width. It is not told to do so because tables in braille do not have borders.
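The two-column layout described so far, with its comment lines, blank lines and comma-separated subcolumns, can be read with a few lines of Python. This is a sketch for illustration, not the parser used by liblouisxml: the names Entry and parse_entries are hypothetical, columns are assumed to be separated by whitespace, and anything after the second column is ignored here.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    action: str                      # first column, e.g. "no" or "heading1"
    element: str                     # first subcolumn of the second column
    attribute: Optional[str] = None  # second subcolumn, if present
    value: Optional[str] = None      # third subcolumn, if present


def parse_entries(text: str) -> List[Entry]:
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no instructions
        columns = line.split()
        if len(columns) < 2:
            raise ValueError("expected at least two columns: " + line)
        action, target = columns[0], columns[1]
        # For "include" the second column is a file name rather than an element.
        parts = target.split(",", 2) + [None, None]
        entries.append(Entry(action, parts[0], parts[1], parts[2]))
    return entries


if __name__ == "__main__":
    sample = """# a comment line

include math.sem
no hr
heading1 h1
no table,cellpadding
no table,border,0
"""
    for entry in parse_entries(sample):
        print(entry)
```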

Now let's look at the file which is included at the beginning of the html.sem file. This is math.sem. It illustrates several more things about how liblouisxml uses semantic-action files.

The first thing you will notice is that for quite a few lines the first and second columns are identical. This is because the MathML element and attribute names are part of a standard, and it was simplest to use the element names for the semantic actions as well.

The first line of real interest is:

    math math

Every mathematical expression begins with the element <math> (which may have attributes and values) and ends with </math>. This is therefore the root element of a mathematical expression. However, mathematical expressions are usually part of a document, so it is not given the semantic action document. The math semantic action causes liblouisxml to carry out special interpretation actions. These will become clearer as we continue to look at the math.sem file.

After another uninteresting line we come to two that illustrate several more facts about semantic-action files:

    mfrac mfrac ^?,/,^#
    mfrac mfrac,linethickness,0 ^(,^;%,^)

Unlike all previous examples, the first line has three columns. While the first two columns must always be present, the third column is optional. Here, it is also divided into subcolumns by commas. The element <mfrac> indicates a fraction. A fraction has two parts, a numerator and a denominator. In xml, we call these parts children of <mfrac>. They may be represented in various ways, which need not concern us here. What is of real importance is that the third column tells liblouisxml to put the characters ``^?'' before the numerator, ``/'' between the numerator and denominator, and ``^#'' after the denominator. Later on, liblouis will translate these characters into the proper representation of a fraction in the Nemeth Code of Braille Mathematics. (For other mathematical codes, see the section ``Implementing Braille Mathematical Codes''.)

The second line is of even greater interest. The first column is again the semantic action mfrac. The second column contains three subcolumns, an element name, an attribute name and an attribute value. The attribute linethickness specifies the thickness of the line separating the numerator and denominator. Here it is 0, so there is no line. This is how the binomial coefficient is represented in print. The third column tells how to represent it in braille. liblouisxml will supply ``^('', upper number, ``^;%'', lower number, ``^)'' to liblouis, which will then produce the proper braille representation for the binomial coefficient.
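The way the third column surrounds and separates the children of an element can be sketched in Python. The function name wrap_children is hypothetical, the children are assumed to be already reduced to plain strings, and splitting the column on every comma is an assumption of this sketch rather than a statement about how liblouisxml parses it.

```python
def wrap_children(third_column, children):
    """Interleave the comma-separated pieces of a third column with children.

    The first piece goes before the first child, the next between the first
    and second child, and so on; leftover pieces follow the last child.
    """
    pieces = third_column.split(",")
    out = []
    for index, child in enumerate(children):
        if index < len(pieces):
            out.append(pieces[index])
        out.append(child)
    out.extend(pieces[len(children):])
    return "".join(out)


if __name__ == "__main__":
    # An ordinary fraction with numerator 1 and denominator 2.
    print(wrap_children("^?,/,^#", ["1", "2"]))
    # A binomial coefficient (linethickness of 0) with upper n and lower k.
    print(wrap_children("^(,^;%,^)", ["n", "k"]))
```

For the fraction the sketch produces ^?1/2^# and for the binomial coefficient ^(n^;%k^), which is the kind of string that is then handed to liblouis for translation.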

For further discussion of how the third column is used, see the section ``Implementing Braille Mathematical Codes''.

Here is a complete list of the semantic actions which liblouisxml recognizes. Many of them are also the names of styles. These are listed first, preceded by an asterisk. For a discussion of these, see the section Configuring liblouisxml and xml2brl.

Greg Kearney 2007-05-30