Just a Theory

Trans rights are human rights

Posts about Markdown

Plain Text Figures

A couple weeks ago, I implemented JSON Feed for Just a Theory (subscribe here). A nice feature of the format is that support for plain text content in addition to the expected HTML content. It reminds me of the Daring Fireball plain text feature: just append .text to any post to see its Markdown representation, like this. I’m a sucker for plain text, so followed suit. Now you can read the wedding anniversary post in plain text simply by appending .text to the URL (or via the JSON Feed).

Markdowners will notice something off about the formatting: the embedded image looks nothing like Markdown. Here it is:


{{% figure
  src     = "dance.jpg"
  title   = "dance.jpg"
  alt     = "First Dance"
  caption = "First dance, 28 May 1995."
%}}

This format defines an HTML figure in the Hugo figure shortcode format. It’s serviceable for writing posts, but not beautiful. In Markdown, it would look like this:

![First Dance](dance.jpg "First Dance")

Which, sadly, doesn’t allow for a caption. Worse, it’s not great to read: it’s too similar to the text link format, and doesn’t look much like an image, let alone a figure. Even Markdown creator John Gruber doesn’t seem to use the syntax much, preferring the HTML <img> element, as in this example. But that’s not super legible, either; it hardly differs from the shortcode format. I’d prefer a nicer syntax for embedded images and figures, but alas, Markdown hasn’t one.

Fortunately, the .text output needn’t be valid Markdown. It’s a plain text output intended for reading, not for parsing into HTML. This frees me to make figures and images appear however I like.

Framed

Still, I appreciate the philosophy behind Markdown, which is best summarized by this bit from the docs:

The overriding design goal for Markdown’s formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions.

So how do you make an embedded image look like an image without any obvious tags? How about we frame it?

        {~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}
        {                                                          }
        {                      [First Dance]                       }
        {  https://justatheory.com/2018/05/twenty-three/dance.jpg  }
        {                                                          }
        {~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}
        {  First dance, 28 May 1995.                               }
        {~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}

Think of the braces and tildes like a gilded frame. In the top section, we have the bracketed alt text like a descriptive card, followed by the image URL. Below the image area, separated by another line of tildes, we have the caption. If you squint, it looks like an image in a frame, right? If you want to include a link, just add it below the image URL. Here’s an example adapted from this old post:

  {~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}
  {                                                                         }
  {                       [*Vogue* on the new iPad]                         }
  {   https://justatheory.com/2012/03/conde-nast-ipad/vogue-ipad-retina.jpg }
  {      (https://www.flickr.com/photos/theory/7007813933/sizes/l/)         }
  {                                                                         }
  {~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}
  {  Image content from *Vogue* on the new iPad. Not shown: the second that }
  {  that it's blurry while the image engine finishes loading and           }
  {  displaying the image.                                                  }
  {~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}

The link appears in parentheses (just like in the text link format). The format also preserves the alt text and caption Markdown formatting. Want to include multiple images in a figure? Just add them, as long as the caption, if there is one, appears in the last “box” in the “frame”:

  {~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}
  {                                                                            }
  {                   [*The New Yorker* on the 1st gen iPad]                   }
  {      http://localhost:1313/2012/03/conde-nast-ipad/new-yorker-ipad-1.jpg   }
  {         (https://www.flickr.com/photos/theory/6861697774/sizes/o/)         }
  {                                                                            }
  {~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}
  {                                                                            }
  {         [*The New Yorker* on the 3rd gen iPad with retina display]         }
  {   http://localhost:1313/2012/03/conde-nast-ipad/new-yorker-ipad-retina.jpg }
  {         (https://www.flickr.com/photos/theory/7007813821/sizes/o/)         }
  {                                                                            }
  {~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}
  {  Text content from *The New Yorker* on the first generation iPad (top)     }
  {  and the third generation iPad with retina display (bottom). Looks great   }
  {  because it's text.                                                        }
  {~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}

You can tell I like to center the images, though not the caption. Maybe you don’t need a caption or much else. It could be quite minimal: just an image and alt text:

        {                      [First Dance]                       }
        {  https://justatheory.com/2018/05/twenty-three/dance.jpg  }

Here I’ve eschewed the blank lines and tildes; the dont’ feel necessary without the caption.

This format could be parsed reasonably well, but that’s not really the goal. The point is legible figures that stand out from the text. I think this design does the trick, but let’s take it a step further. Because everything is framed in braces, we might decide to put whatever we want in there. Like, I dunno, replace the alt text with an ASCII art1 version of the image generated by an conversion interface? Here’s my wedding photo again:

{~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}
{                                                                                    }
{  NNNmmdsyyyyyyysssssdhyssooo+++++++++++++ooymNNNNyo+/::::-----------------------:  }
{  NNmNmdssyyyyyssssssdhyssooo++++///++++++ooymNmmmyo+/:::--------...-------------:  }
{  mmddddsyyyyyyssssssdhyssoo++++/////++++osydmmmmNyo+/::--------.....-------------  }
{  Nmddmmsyyyyyyssssssdhysooo+++///////++osso+oymNNyo+/::--------.......-----------  }
{  mmmmmmyyyyyyysssssshhysooo+++//////+ys+//:///sdmho+/::-------..........---------  }
{  mmmmmmyyyyyyssssosshhysooo+++////+ydmy/:/:/+ossydo+/::------...........---------  }
{  mmmmmmyyyyyysssoosshhysosshdy+/+odmNNmdyddmmNNmdmms+::------............--------  }
{  mmmmmmyyyyyyssooosshdhyso:/ymhhdmNNNNNmyhNNNNNNNNNmmo:------.............-------  }
{  mdddmmhyyyysssooossdmdmho.-hmmNNNNNNNmdyhmNNNNNNNNNmh+/+/--..............-------  }
{  mmmmddhyyyysssoooymmNmNmo--yNNmNmmmmmhhyhdhydNNNNmmmdysshy:..............-------  }
{  mmmddmhyyyssssoosdNNNNNmssydmNddhssossyyhs::+ssyhmmh+///ohh-..............------  }
{  Nmmmmmdyyyssssoohhdmddhs:-:hdhyhdso+++///--::/:::+o/://oosy:-.............------  }
{  NNNmmmdyyyssssosdhhyyh+//oohdmmmh///+/::::::---:++/://+hddmdho:...........------  }
{  NNNmmmdyyyssssosmmdmdy+.-/mmdmho+//////::::::/sddddhs/.:sdmmmmy-..........------  }
{  NNmmmmdhyyssssooydmmd+/+sydNmmh+/+yddyo/://oydmmmmmmdy:..:ymds:............-----  }
{  mmmmmmdhyysssoooo+oo+ohdmmmmmddhhsyddddysyddddmmmdddho-..`./h/.............-----  }
{  mmNNNNmhyysssoooo:-/.-ymmmmmmmmmmmNmdddmmmdmdddhhhhs-```````/h-............-----  }
{  NNNNNNmhyysssooymddmddmmmdddddmdmmNmyddmddddhhhhhy:`   ` ```.oo............-----  }
{  NNNNNNmhyyssssymmmNNmmmddhhddddmddddhddmdhddhhhhs-     ` ```.:y............-----  }
{  NNmdmNmhyysssydNmmNmmmddddddddddddhdddhddhhhhhho.      ``````.y............-----  }
{  mmmddmmdhyssssdmmhdmmmddddddhhdhhhhhddhhhdhhho-`       `` ```.s...........------  }
{  ddddhdddhyysssymNmmmmmddddddddhhdhhhhdddddy+.``        `` ```.s............-----  }
{  NNNmmddhyyssssosdddmmmmdddddddhhdddmmmmd+..```        `` ````-s............-----  }
{  NNNmmhysssssssooooymmmmdddmmmdhhdmdmdddd/``.`        ``  ```.-o...........------  }
{  mNNmddhhysssssssssydmmmmdmmmmmddmmddddhdd+``        `` `````.-/...........------  }
{  NNNmmddhhhhyyyyyyyyhmmmmmmmmmmddddddddhhy.``    ` ``` ``````./-...........------  }
{  NNNmmmysssssssssssssdmmmmmmmddddddddddh+.`     ` ```   `````.+...........-------  }
{  mmmNmmhyyyyyhhhhhhhhdmmmmmmddddddddddy:`````` `````   ``````-:...........-------  }
{  mmmmmmmmmmmmmmmmmmddhdmmmddddddddddy/.`````` ````       ```./...........-------:  }
{  mmmmmmmmmmmmmmmmmmhyssdmmmmdddddho:.``````````-```  ``````./:...........-------:  }
{  mmmmmNmmmmmmmmmddddhysydmmddho:-...`````````:oh/```` ````.-:............-------:  }
{  mmmmNNmmmmmmmmmmmmmddyoydo:.``.`````````.:+ydddh-```````-/--............-------:  }
{  NNNNNNNmmmmmmmmmmmmdyyyoo.````...`````:ohdmdddddh+oosyyhdmo--..........--------:  }
{  NNNNNNNNmmmmmmmNmmmmhys+//-```...`.-+yddddmmmddddmmmmmmmmmm+--.......----------:  }
{  NNNNNNNNNmmmmNNNNNmmsyysohdyosyhyyhddddddddmmmmmdmmmmmmmmmmh--.......---------::  }
{  NNNNNNNNNNNNNNNNNNNNyyhhdmmdddmmdddddmddddmmmmmmmmmmmmmmmmmd-----------------:::  }
{  NNNNNNNNNNNNNNNNNNNNmmmmmmmmdddddddmmmmmmmmmmmmmmmmmmmmmmmmy-----------------:::  }
{  NNNNNNNNNNNNNNNNNNNNmmmmmmmmddddddmmmmmmmmmmmmmmmmmmmmmmmmm+-----------------:::  }
{  NNNNNNNNNNNNNNNNmNNNmmmmmmmmmdmdmmmmmmmmmmmmmmmmmmmmmmmmmmh:----------------::::  }
{  NNNNNNNNNNNNNNNNNNNmmmmmmdmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmo----------------:::::  }
{  NNNNNNNNNNNNNNNNNNNmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm/---------------::::::  }
{  NNNNNNNNNNNNNNNNNNNNNNmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmNNmy::::::::--:-:::::::://  }
{  NNNNNNNNNNNNNNNNNNNNNNmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmNmmo///:::::::::::::::////  }
{  NNNNNNNNNNNNNNNNNNNNNNmdddmmmmmmmmmmmmmmmmmmmmmmmmmmmNmmy///////////////////////  }
{  NNNNNNNNNNNNNNNNNNNNNNmdddddddddddddmmmmmmmmmmmmmmmmmNmm/::::::::::::::::::::::/  }
{  NNNNNNNNNNNNNNNNNNNNNNmdddddddddhddddddmmmmmmmmmmmmmmmmy::::::::::--:::::::::///  }
{  NNNNNNNNNNNNNNNNNNNNNNmmdddddddddddddddmmmmmmmmmmmNNNmNyo++++/////////////++++oo  }
{  NNNNNNNNNNNNNNNNNNNNNNmmdddddddddddddddmmmmmmmmmmNNNmmNhyyyyyyssssssssssssssssss  }
{  NNNNNNNNNNNNNNNNNNNNNmmmdddddddddddddddmmmmmmmmmNmNNmmmysssssooooo+++++/////////  }
{  NNNNNNNNNNNNNNNNNNNNNmmmmddddddddddddddmmmmmmmNNNNNmmmd/::::::::::::::://///////  }
{  NNNNNNNNNNNNNNNNNNNNNmmmmdmddddddddddddmmmmmmNNNmNmmmmd::::::::::::::://////////  }
{  NNNNNNNNNNNNNNNNNNNNNmmmmdmmmdddddddddmmmmmmNmmmNmmmmmh::::::::://///////++os+++  }
{  NNNNNNNNNNNNNNNNNNNNNmmmmdmmmmddddddddmmNmNNmmmNmNNmmNh/::::///////////+oo+++++o  }
{  NNNNNNNNNNNNNNNNNNNNmmmmmmmddddddddddmmNmmmmmmmNNNNmmNy////////+oossyyhhhdddmmmN  }
{                                                                                    }
{               https://justatheory.com/2018/05/twenty-three/dance.jpg               }
{                                                                                    }
{~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}
{  First dance, 28 May 1995.                                                         }
{~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}

Silly? Maybe. But I’m having fun with it. I expect to wrangle Hugo into emitting something like this soon.

Update 2023-08-05: Changed the location of plain text files for each post from $slug/copy.text to $slug.text, and updated the links in this post to reflect that change. In other words, the plain text file for this post is now plain-text-figures.text.


  1. Surely someone has come up with a way to improve on ASCII art by using box elements or something? ↩︎

Encoding is a Headache

I have to spend way too much of my programming time worrying about character encodings. Take my latest module, Text::Markup for example. The purpose of the module is very simple: give in the name of a file, and it will figure out the markup it uses (HTML, Markdown, Textile, whatever) and return a string containing the HTML generated from the file. Simple, right?

But, hang on. Should the HTML it returns be decoded to Perl’s internal form? I’m thinking not, because the HTML itself might declare the encoding, either in a XML declaration or via something like

<meta http-equiv="Content-type" content="text/html;charset=Big5" />

And as you can see, it’s not UTF-8. So decoded it would be lying. So it should be encoded, right? Parsers like XML::LibXML::Parser are smart enough to see such declarations and decode as appropriate.

But wait a minute! Some markup languages, like Markdown, don’t have XML declarations or headers. They’re HTML fragments. So there’s no wait to tell the encoding of the resulting HTML unless it’s decoded. So maybe it should be decoded. Or perhaps it should be decoded, and then given an XML declaration that declares the encoding as UTF-8 and encoded it as UTF-8 before returning it.

But, hold the phone! When reading in a markup file, should it be decoded before it’s passed to the parser? Does Text::Markdown know or care about encodings? And if it should be decoded, what encoding should one assume the source file uses? Unless it uses a BOM, how do you know what its encoding is?

Text::Markup is a dead simple idea, but virtually all of my time is going into thinking about this stuff. It drives me nuts. When will the world cease to be this way?

Oh, and you have answers to any of these questions, please do feel free to leave a comment. I hate having to spend so much time on this, but I’d much rather do so and get things right (or close to right) than wrong.

Looking for the comments? Try the old layout.

RFC: A Simple Markdown Table Format

I’ve been thinking about markdown tables a bit lately. I’ve had in mind to follow up on my definition list proposal with a second proposal for the creation and editing of simple tables in Markdown. For better or for worse, an aside on the markdown-discuss mail list led to a longish thread about a syntax for continuing lines in tables (not to mention a long aside on the use of monospaced fonts, but I digress), wherein I realized, after an open-minded post from MultiMarkdown’s Fletcher Penney, that I needed to set to working up this request for comments sooner rather than later.

Requirements

All of which is to say that this blog entry is a request for comments on a proposed syntax for simple tables in Markdown. The requirements for such a feature, to my thinking, are:

  • Simple tables only
  • Formatting should be implicit
  • Support for a simple caption
  • Support for representing column headers
  • Support for left, right, and center alignment
  • Support for multicolumn cells
  • Support for empty cells
  • Support for multiline (wrapping) cells
  • Support for multiple table bodies
  • Support for inline Markdown (spans, lists, etc.)
  • Support for all features (if not syntax) of MultiMarkdown tables.

By “simple tables” in that first bullet, I mean that they should look good in 78 character-wide monospaced plain text. Anything more complicated should just be done in XHTML. My goal is to be able to handle the vast majority of simple cases, not to handle every kind of table. That’s not to say that one won’t be able to use the syntax to create more complicated tables, just that it might not be appropriate to do so, and many more advanced features of tables will just have to be done in XHTML.

And by “implicit formatting” in the second bullet, I mean that the syntax should use the bare minimum number of punctuation characters to provide hints about formatting. Another way to think about it is that formatting hints should be completely invisible to a casual reader of the Markdown text.

Most of the rest of the requirements I borrowed from MultiMarkdown, with the last bullet thrown in just to cover anything I might have missed. The MultiMarkdown syntax appears to be a superset of the PHP Markdown Extra syntax, so that’s covered, too.

Prior Art: Databases

When I think about the display of tables in plain text, the first piece of prior art I think of is the output from command-line database clients. Database developers have been thinking about tables since, well, the beginning, so it makes sense to see what they’re doing. So I wrote a bit of SQL and ran it in three databases. The SQL builds a table with an integer, a short name, a textual description, and a decimal number. Here’s the code:

CREATE TEMPORARY TABLE widgets (
    id          integer,
    name        text,
    description text,
    price       numeric(6,2)
);

INSERT INTO widgets VALUES( 1, 'gizmo', 'Takes care of the doohickies', 1.99);
INSERT INTO widgets VALUES( 2, 'doodad', 'Collects *gizmos*', 23.8);
INSERT INTO widgets VALUES( 10, 'dojigger', 'Handles:
*   gizmos
*   doodads
*   thingamobobs', 102.98);
INSERT INTO widgets VALUES(1024, 'thingamabob', 'Self-explanatory, no?', 0.99);

SELECT * FROM widgets;

My goal here was to see how the database client would format a variety of data formats, as well as a textual column (“description”) with newlines in it (and a Markdown list, no less!), as the newlines will force the output to appear on multiple lines for a single row. This is one of the features that is missing from the existing Markdown implementations, which all require that the text all be on a single line.

The first database client in which I ran this code was psql 8.3, the interactive terminal for PostgreSQL 8.3. Its output looks like this:

  id  |    name     |         description          | price  
------+-------------+------------------------------+--------
    1 | gizmo       | Takes care of the doohickies |   1.99
    2 | doodad      | Collects *gizmos*            |  23.80
   10 | dojigger    | Handles:                     | 102.98
                    : * gizmos                       
                    : * doodads                      
                    : * thingamobobs                 
 1024 | thingamabob | Self-explanatory, no?        |   0.99

As you can see, PostgreSQL properly right-aligned the integer and numeric columns. It also has a very nice syntax for demonstrating continuing lines for a given column: the colon. The colon is really nice here because it looks kind of like a broken pipe character, which is an excellent mnemonic for a string of text that breaks over multiple lines. Really, this is just a very nice output format overall.

The next database client I tried was mysql 5.0, the command-line client for MySQL 5.0. Its output looks like this:

+------+-------------+--------------------------------------------+--------+
| id   | name        | description                                | price  |
+------+-------------+--------------------------------------------+--------+
|    1 | gizmo       | Takes care of the doohickies               |   1.99 | 
|    2 | doodad      | Collects *gizmos*                          |  23.80 | 
|   10 | dojigger    | Handles:
* gizmos
* doodads
* thingamobobs | 102.98 | 
| 1024 | thingamabob | Self-explanatory, no?                      |   0.99 | 
+------+-------------+--------------------------------------------+--------+

Once again we have very good alignment of the numeric data types. Furthermore, MySQL uses exactly the same syntax as PostgreSQL to represent the separation between column headers and column rows, although the PostgreSQL version is a bit more minimalist. The MySQL version just hast a little more stuff in it

Where the MySQL version fails, however, is in the representation of the continuing lines for the “dojigger” row. First of all, it set the width of the “description” column to the longest value in that column, but since that longest value includes newlines, it actually ends up being much too long—much longer than PostgreSQL’s representation of the same column. And second, as a symptom of that problem, nothing special is done with the wrapped lines. The newlines are simply output like any other character, with no attempt to line up the column. This has the side effect of orphaning the price for the “dojigger” after the last line of the continuing description. So its alignment is shot, too.

To be fair, PostgreSQL’s display featured almost exactly the same handling of continuing columns prior to version 8.2. But I do think that their solution featuring the colons is a good one.

The last database client I tried was SQLite 3.6. This client is the most different of all. I set .header ON and .mode column and got this output:

id          name        description                   price     
----------  ----------  ----------------------------  ----------
1           gizmo       Takes care of the doohickies  1.99      
2           doodad      Collects *gizmos*             23.8      
10          dojigger    Handles:
* gizmos
* doodads
  102.98    
1024        thingamabo  Self-explanatory, no?         0.99      

I don’t think this is at all useful for Markdown.

Prior Art: MultiMarkdown

Getting back to Markdown now, here is the MultiMarkdown syntax, borrowed from the documentation:

|             |          Grouping           ||
First Header  | Second Header | Third Header |
 ------------ | :-----------: | -----------: |
Content       |          *Long Cell*        ||
Content       |   **Cell**    |         Cell |

New section   |     More      |         Data |
And more      |            And more          |
[Prototype table]

There are a few interesting features to this syntax, including support for multiple lines of headers, multicolumn cells alignment queues, and captions. I like nearly everything about this syntax, except for two things:

  1. There is no support for multiline cell values.
  2. The explicit alignment queues are, to my eye, distracting.

The first issue can be solved rather nicely with PostgreSQL’s use of the colon to indicate continued lines. I think it could even optionally use colons to highlight all rows in the output, not just the continuing one, as suggested by Benoit Perdu on the markdown-discuss list:

  id  |    name     |         description          | price  
------+-------------+------------------------------+--------
    1 | gizmo       | Takes care of the doohickies |   1.99
    2 | doodad      | Collects *gizmos*            |  23.80
   10 | dojigger    | Handles:                     | 102.98
      :             : * gizmos                     : 
      :             : * doodads                    : 
      :             : * thingamobobs               : 
 1024 | thingamabob | Self-explanatory, no?        |   0.99

I think I prefer the colon only in front of the continuing cell, but see no reason why both couldn’t be supported.

The second issue is a bit more subtle. My problem with the alignment hints, embodied by the colons in the header line, is that to the reader of the plain-text Markdown they fill no obvious purpose, but are provided purely for the convenience of the parser. In my opinion, if there is some part of the Markdown syntax that provides no obvious meaning to the user, it should be omitted. I take this point of view not only for my own selfish purposes, which are, of course, many and rampant, but from John Gruber’s original design goal for Markdown, which was:

The overriding design goal for Markdown’s formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions. While Markdown’s syntax has been influenced by several existing text-to-HTML filters, the single biggest source of inspiration for Markdown’s syntax is the format of plain text email.

To me, those colons are formatting instructions. So, how else could we support alignment of cells but with formatting instructions? Why, by formatting the cells themselves, of course. Take a look again at the PostgreSQL and MySQL outputs. both simply align values in their cells. There is absolutely no reason why a decent parser couldn’t do the same on a cell-by-cell basis if the table Markdown follows these simple rules:

  • For a left-aligned cell, the content should have no more than one space between the pipe character that precedes it, or the beginning of the line.
  • For a right-aligned cell, the content should have no more than one space between itself and the pipe character that succeeds it, or the end of the line.
  • For a centered cell, the content should have at least two characters between itself and both its left and right borders.
  • If a cell has one space before and one space after its content, it is assumed to be left-aligned unless the cell that precedes it or, in the case of the first cell, the cell that succeeds it, is right-aligned.

What this means, in effect, is that you can create tables wherein you line things up for proper display with a proportional font and, in general, the Markdown parser will know what you mean. A quick example, borrowing from the PostgreSQL output:

  id  |    name     |         description          |  price  
------+-------------+------------------------------+--------
    1 | gizmo       | Takes care of the doohickies |   1.99 
    2 | doodad      | Collects *gizmos*            |  23.80 
   10 | dojigger    | Handles stuff                | 102.98 
 1024 | thingamabob | Self-explanatory, no?        |   0.99 

The outcome for this example is that:

  • The table headers are all center-aligned, because they all have 2 or more spaces on each side of their values
  • The contents of the “id” column are all right-aligned. This includes 1024, which ambiguously has only one space on each side of it, so it makes the determination based on the preceding line.
  • The contents of the “name” column are all left-aligned. This includes “thingamabob”, which ambiguously has only one space on each side of it, so it makes the determination based on the preceding line.
  • The contents of the “description” column are also all left-aligned. This includes first row, which ambiguously has only one space on each side of it, so it makes the determination based on the succeeding line.
  • And finally, the contents of the “price” column are all right-aligned. This includes 102.98, which ambiguously has only one space on each side of it, so it makes the determination based on the preceding line.

And that’s it. The alignments are perfectly clear to the parser and highly legible to the reader. No further markup is required.

Proposed Syntax

So, with this review, I’d like to propose the following syntax. It is inspired largely by a combination of PostgreSQL and MySQL’s output, as well as by MultiMarkdown’s syntax.

  • A table row is identifiable by the use of one or more pipe (|) characters in a line of text, aside from those found in a literal span (backticks).
  • Table headers are identified as a table row with the immediately-following line containing only -, |, +, :or spaces. (This is the same as the MultiMarkdown syntax, but with the addition fo the plus sign.)
  • Columns are separated by |, except on the header underline, where they may optionally be separated by +, and on continuing lines (see next point).
  • Lines that continue content from one or more cells from a previous line must use : to separate cells with continued content. The content of such cells must line up with the cell width on the first line, determined by the number of spaces (tabs won’t work). They may optionally demarcate all cells on continued lines, or just the cells that contain continued content.
  • Alignment of cell content is to be determined on a cell-by-cell basis, with reference to the same cell on the preceding or succeeding line as necessary to resolve ambiguities.
  • To indicate that a cell should span multiple columns, there should be additional pipes (|) at the end of the cell, as in MultiMarkdown. If the cell in question is at the end of the row, then of course that means that pipes are not optional at the end of that row.
  • You can use normal Markdown markup within the table cells, including multiline formats such as lists, as long as they are properly indented and denoted by colons on succeeding lines.
  • Captions are optional, but if present must be at the beginning of the line immediately preceding or following the table, start with [ and end with ], as in MultiMarkdown. If you have a caption before and after the table, only the first match will be used.
  • If you have a caption, you can also have a label, allowing you to create anchors pointing to the table, as in MultiMarkdown. If there is no label, then the caption acts as the label.
  • Cells may not be empty, except as represented by the appropriate number of space characters to match the width of the cell in all rows.
  • As in MultiMarkdown. You can create multiple <tbody> tags within a table by having a single empty line between rows of the table.

Sound like a lot? Well, if you’re acquainted with MultiMarkdown’s syntax, it’s essentially the same, but with these few changes:

  • Implicit cell alignment
  • Cell content continuation
  • Stricter use of space, for proper alignment in plain text (which all of the MultiMarkdown examples I’ve seen tend to do anyway)
  • Allow + to separate columns in the header-demarking lines
  • A table does not have to start right at the beginning of a line

I think that, for purposes of backwards compatibility, we could still allow the use of : in the header lines to indicate alignment, thus also providing a method to override implicit alignment in those rare cases where you really need to do so. I think that the only other change I would make is to eliminate the requirement that the first row be made the table header row if now header line is present. But that’s a gimme, really.

Taking the original MultiMarkdown example and rework it with these changes yields:

|               |            Grouping            ||
+---------------+---------------------------------+
| First Header  |  Second Header  |  Third Header |
+---------------+-----------------+---------------+
| Content       |           *Long Cell*          ||
: continued     :                                ::
: content       :                                ::
| Content       |    **Cell**     |          Cell |
: continued     :                 :               :
: content       :                 :               :

| New section   |      More       |          Data |
| And more      |             And more           ||
 [Prototype table]

Comments?

I think I’ve gone on long enough here, especially since it ultimately comes down to some refinements to the MultiMarkdown syntax. Ultimately, what I’m trying to do here is to push MultiMarkdown to be just a bit more Markdownish (by which I mean that it’s more natural to read as plain text), as well as to add a little more support for some advanced features. The fact that I’ll be able to cut-and-paste the output from my favorite database utilities is a handy bonus.

As it happens, John Gruber today posted a comment to the markdown-discuss mail list in which he says (not for the first time, I expect):

A hypothetical official table syntax for Markdown will almost certainly look very much, if not exactly, like Michel’s table syntax in PHP Markdown Extra.

I hope that he finds this post in that vein, as my goals here were to embrace the PHP Markdown Extra and MultiMarkdown formats, make a few tweaks, and see what people think, with an eye toward contributing toward a (currently hypothetical) official table syntax.

So what do you think? Please leave a comment, or comment on the markdown-discuss list, where I’ll post a synopsis of my proposal and a link to this entry. What have I missed? What mistakes have I made? What do you like? What do you hate? Please do let me know.

Thanks!

Looking for the comments? Try the old layout.

A Modest Proposal for Markdown Definition Lists

I realize that greater minds than mind have likely given a lot of thought to how best to implement a “natural” syntax for definition lists in Markdown. The best I’ve seen, however, is that implemented by PHP Markdown Extra, which is also supported by MultiMarkdown. Given the prevalence of these two libraries, I’m assuming that the syntax become the de-facto standard for definition lists in Markdown. But, to my mind at least, it leaves something to be desired. Here’s an extended example, taken from the PHP Markdown Extra documentation, including multiple definitions, multiple paragraphs, and nested formatting (lists, code block):

Term 1

  : This is a definition with two paragraphs. Lorem ipsum 
    dolor sit amet, consectetuer adipiscing elit. Aliquam 
    hendrerit mi posuere lectus.

    Vestibulum enim wisi, viverra nec, fringilla in, laoreet
    vitae, risus.

  : Second definition for term 1, also wrapped in a paragraph
    because of the blank line preceding it.

Term 2

  : This definition has a code block, a blockquote and a list.

        code block.

    > block quote
    > on two lines.

    1.  first list item
    2.  second list item

This format has a lot going for it, in that it covers most of the requirements for definition lists. In particular, it allows a term to have multiple definitions, or for multiple terms to share a definition, and for a definition to have multiple paragraphs, nested lists, code blocks, and other formatting. There’s only one problem with it, as far as I’m concerned: I would never write a definition list like this in an email.

I started thinking about alternates, first by thinking about how I would write a definition list in plain text. It would likely be something like this:

Term 1
&#xz002d;&#xz002d;&#xz002d;&#xz002d;&#xz002d;&#xz002d;
  This is a definition with two paragraphs. Lorem ipsum 
  dolor sit amet, consectetuer adipiscing elit. Aliquam 
  hendrerit mi posuere lectus.

  Vestibulum enim wisi, viverra nec, fringilla in, laoreet
  vitae, risus.

  Second definition for term 1, also wrapped in a paragraph
  because of the blank line preceding it.

Term 2
&#xz002d;&#xz002d;&#xz002d;&#xz002d;&#xz002d;&#xz002d;
  This definition has a code block, a blockquote and a list.

      code block.

  > block quote
  > on two lines.

  1.  first list item
  2.  second list item

This is much more like what I’ve actually written in the past. From the point of view of Markdown, however, there are precedents that make it problematic, namely:

  1. The underline for the terms is already used for secondary headers
  2. Lists with multiple paragraphs need to be indented four spaces or one tab (never mind that this can cause conflicts with code blocks following lists)
  3. There is no way to tell whether the paragraphs for a given term constitute a single definition with multiple paragraphs, multiple definitions, or some combination.

This last item never would have occurred to me, since I have never used more than one definition per term, but have often used multiple paragraphs in a single definition. However, when I think about the literal use of definitions–you know, to define a term, I think about a dictionary, which of course will offer many definitions for a single term. So clearly, there needs to be a way to distinguish definitions from paragraphs.

So I started thinking about it some more, trying to figure out why I don’t like the PHP Markdown Extra syntax, since it solves this problem by using “:” to identify a term. But then it hit me: Definitions are just a list, and the “:” is the bullet that identifies a list item. PHP Markdown Extra actually reinforces this interpretation, since it in all ways makes definitions conform with basic list syntax. This is a good symmetry and easy to remember.

So why do I hate the syntax? Once I realized this bit about the list, I immediately knew what I hated about it: “:” is a shitty bullet. As a native speaker and writer of American English, I no doubt bring my cultural biases to the table, but I would never use a colon at the beginning of something, only at the end. It just doesn’t belong there, hanging out in space. It’s too subtle, conveys no meaning that I can see, and thus have no obvious mnemonics to make it memorable.

So I started hunting around my keyboard for an alternate, and stumbled almost at once on the tilde, “~”. Consider this example, which in all ways is just like the PHP Markdown Extra syntax, except that the colon is replaced with a tilde:

Term 1:

  ~ This is a definition with two paragraphs. Lorem ipsum 
    dolor sit amet, consectetuer adipiscing elit. Aliquam 
    hendrerit mi posuere lectus.

    Vestibulum enim wisi, viverra nec, fringilla in, laoreet
    vitae, risus.

  ~ Second definition for term 1, also wrapped in a paragraph
    because of the blank line preceding it.

Term 2:

  ~ This definition has a code block, a blockquote and a list.

        code block.

    > block quote
    > on two lines.

    1.  first list item
    2.  second list item

Well, okay I did add the trailing colon to the terms, but that’s just more natural to me, and could be optional. But otherwise, it’s the tilde that’s different. This to me is much more natural. I’d be perfectly willing to write a definition list in email messages this way (and I think I will from now on). It works well with shorter definition lists, too, of course:

Apple:
  ~ Pomaceous fruit of plants of the genus Malus in the family Rosaceae.
  ~ An american computer company.

Orange:
  ~ The fruit of an evergreen tree of the genus Citrus.

See how nice that is? So, you might ask, why the tilde rather than the colon? As I said before, the colon doesn’t look right out there in front, it’s not a “natural” way to write definitions because it’s not a natural choice for a bullet. The tilde, however, is perfectly comfortable hanging out at the beginning of a line as a bullet, resembling as it does the dash, already used for unordered lists in Markdown. Furthermore, it’s already used in dictionaries. According to Wikipedia, “The swung dash is often used in dictionaries to represent a word that was mentioned before and is understood, to save space.” Not an exact parallel, but at least the tilde’s cousin the swung dash has to do with definitions! Not only that, but in mathematics, according to Wikipedia again, the tilde “is often used to denote an equivalence relation between two objects.” That works: a definition is a series of words that define a term; that is, they are a kind of equivalent!

So I would like to modestly propose to the Markdown community that the PHP Markdown Extra definition list syntax be adopted as a standard with this one change. What do you think?

Looking for the comments? Try the old layout.

Now with Markdown!

Lately I’ve been fiddling a bit with Markdown, John Gruber’s minimalist plain text markup syntax. I’ve become more and more attracted to Markdown after I’ve had to spend some time using Trac and, to a lesser degree, Twiki and MediaWiki. The plain-text markup syntax in these projects is…how shall I put this?…gawdawful. Why do I hate these wiki syntaxes? Becaus they’re unnatural. Maybe it’s just because I’m most familiar with it, but Trac’s syntax is just completely random and inconsistent. Trying to get anything other than simple paragraphs formatted just right is just a giant pain in the ass. Just try have multiple paragraphs in a hierarchical bulleted list and you’ll see what I mean. If I wanted to worry about space this much I’d hack Python! I mean, seriously, there’s a reason I write my blog entries in pure HTML. It’s not so user-friendly, but at least I know exactly how something will be formatted when I’m done.

But Markdown is different. It’s syntax is almost exactly like what I’ve been using in lain-text email messages since the mid-1990s. It’s humane in a way that Textile only approaches in its inline markup (as long as you don’t use attributes, of course). There are a few oddities, such as the definition list syntax used by PHP Markdown Extra and MultiMarkdown is a bit unnatural. But overall, it’s quite close to what I type anyway. I’ve been writing the pgTAP documentation in Markdown, using Discount to generate the HTML you see on the Web site (plus my own custom hack to create the table of contents), and it’s just a thrill that it’s so easy to maintain: I can easily read and edit the README file like any other text file, and then generate the HTML for the Web site with a simple make target. It has been such a great experience that I’m tempted to stop writing documentation in POD!

So in my next app, I’ll likely be making use of MultiMarkdown for the end-user management of content. It has nearly everything I want, formatting-wise, and I can likely get used to the few cases where its syntax seems a bit weird to me. Plus, I can then use the generated HTML to output PDFs and other formats from the same document. I expect it to be a dream to work with. (Oh, and thanks to Aristotle Pagaltzis for patiently putting up with my questions about markdown in private email messages; they’ll help keep me from saying anything too embarrassing on the Markdown mail list!)

In the meantime, I’ve modified the comment system on this blog to support Markdown. You can still use HTML in comments, same as always, as Markdown passes HTML through unmolested. But few of you ever did that, and I was always adding HTML tags to the comments. Now maybe I won’t have to: Markdown is so easy and natural to use, that the vast majority of commenters will just leave paragraphs and they’ll look beautiful.

At any rate, you now have one less reason not to leave a comment!

Looking for the comments? Try the old layout.