Paragraph formatting and text indentation
=========================================

General formatting: after expanding Texinfo commands, makeinfo simply
fills the text to column 72 (by default), breaking at whitespace.

The biggest trick that comes to mind is that it's necessary
to insert a second space when a sentence ends at the end of an input
line.  That is, given this input:

  Blah some sentence foo.
  Another sentence bar.

The output would be:

  Blah some sentence foo.  Another sentence bar.

with two spaces, not just one.  (Ordinarily a newline would just become
a single space when filled.)  That is, unless @frenchspacing is in
effect.  See around line 2745 in makeinfo.c and the end_of_sentence_p
function.

However, we don't try to generalize this.  If the input is:

  Sentence foo. Sentence bar.

we don't recognize that period as the end of a sentence, so we don't
insert the extra space.  It might actually be desirable to do that in
theory, but in practice it's never been implemented.  The reasoning is
that the author should have used two spaces in the Texinfo input file in
the first place.  (Whereas the author is not in control of how spaces
are used in the first case above.)
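
As a rough illustration of the two cases above, here is a minimal Python
sketch.  It is not a transcription of end_of_sentence_p in makeinfo.c:
the names join_lines and fill are hypothetical, and the regular
expression used to detect a sentence end is an assumption made for the
example.

```python
import re

# Assumption for this sketch: a sentence ends with ., ? or !, optionally
# followed by closing quotes or brackets.  The real test may differ.
SENTENCE_END = re.compile(r'[.?!]["\')\]]*$')

def join_lines(lines, frenchspacing=False):
    """Join input lines; a newline after a sentence end becomes two spaces."""
    out = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if out:
            two = not frenchspacing and SENTENCE_END.search(out)
            out += "  " if two else " "
        out += line
    return out

def fill(text, width=72):
    """Break at whitespace, keeping the space runs inside each output line."""
    pieces = re.split(r"( +)", text)   # words at even indexes, gaps at odd
    words = pieces[0::2]
    gaps = [""] + pieces[1::2]
    lines, current = [], ""
    for gap, word in zip(gaps, words):
        if not word:
            continue
        if current and len(current) + len(gap) + len(word) > width:
            lines.append(current)      # the gap is dropped at a line break
            current = word
        else:
            current = current + gap + word if current else word
    if current:
        lines.append(current)
    return "\n".join(lines)

joined = join_lines(["Blah some sentence foo.", "Another sentence bar."])
print(fill(joined))
```

Spaces within a single input line are passed through untouched, so
"Sentence foo. Sentence bar." keeps its single space, matching the
behavior described above.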

In general, whitespace handling is another case where the precise rules
have never been written down.  So if we see chances to improve things,
or end up with slight variations, that's OK.


Info output
===========

Well, let's see.  Looking at the info.info and texinfo Info files, the
general idea is as follows.  Maybe we can turn this into a real
definition eventually, but hopefully this will be enough to get started.
I trust my metasyntax will be obvious.

<info_file> ::=
<preamble>
<node>* | <indirect-table>
<tag table>

# The preamble is text at the beginning of each output file (if split).
# It is ignored by Info readers.
<preamble> ::=
  <identification line>   # "This is <filename>, produced by ..."
  <copying text>          # expansion of the @copying text
  <dir entries>           # from @dircategory and @direntry

# The indirect table is used for the main file in the case of split
# output.  It specifies the starting byte position of each output file.
# The positions are consecutive.  We'll have to figure out the exact way
# they are computed (e.g., when the preamble bytes are counted and when
# they're not); I don't have all that in my head.
# 
<indirect-table> ::=
"^_
Indirect:"
( <filename>: <bytepos> )*

# Regular nodes.  The next and prev pointers are omitted at the
# last/first node.
<node> ::=
^_
File: <filename>,  Node: <nodeid>,  Next: <nodeid>,  Prev: <nodeid>,  Up: <nodeid>

<then comes arbitrary text until the next ^_>
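
A node header line of the shape shown above could be taken apart roughly
as follows.  This is only a sketch: parse_node_header is a hypothetical
name, the file and node names in the example are made up, and the code
assumes that node names contain no commas.

```python
def parse_node_header(line):
    """Split 'File: f,  Node: n,  Next: x, ...' into a dict of fields."""
    fields = {}
    for part in line.split(","):
        key, sep, value = part.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

header = "File: texinfo.info,  Node: Top,  Next: Overview,  Up: (dir)"
fields = parse_node_header(header)
# Pointers omitted from the header are simply absent from the dict.
print(fields["Node"], fields.get("Prev"))
```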


# The tag table specifies the starting byte pos of each node and anchor
# in the file.  It appears last in the (main) output file.
# 
# A split output file is the same as the main output file except it
# omits the tag table.  In particular, split files do include the
# preamble text.
# 
<tag table> ::=
"^_
Tag Table:"
"(Indirect)"              # this literal text appears only with split output
( "Node" | "Ref" )":" <nodeid>"^?"<bytepos>
"^_
End Tag Table"

# E.g., Node: Top^?1647  says that the node named "Top" starts at byte 1647
# while Ref: Overview-Footnote-1^?30045 says that the anchor named
#   "Overview-Footnote-1" starts at byte 30045

# The ^_ and ^? characters that I wrote out above are really one-byte
# control characters, but you probably guessed that :).
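
To make the tag table layout concrete, here is a minimal Python sketch
that reads the entries back out of a file's bytes.  The function name
parse_tag_table and the sample data are hypothetical; the sketch assumes
the layout described above, with ^_ as byte 0x1f and ^? as byte 0x7f,
and it simply skips the "(Indirect)" line if present.

```python
import re

# One entry: "Node: <nodeid>^?<bytepos>" or "Ref: <nodeid>^?<bytepos>".
TAG_LINE = re.compile(rb"^(Node|Ref): (.*)\x7f(\d+)$")

def parse_tag_table(data):
    """Return {(kind, name): byte position} from the bytes of an Info file."""
    start = data.rfind(b"\x1f\nTag Table:\n")
    end = data.find(b"\x1f\nEnd Tag Table", start)
    if start < 0 or end < 0:
        return {}                      # no tag table, e.g. a split file
    tags = {}
    for line in data[start:end].split(b"\n"):
        m = TAG_LINE.match(line)
        if m:
            kind, name, pos = m.groups()
            tags[(kind.decode(), name.decode())] = int(pos)
    return tags

sample = (b"\x1f\nTag Table:\n"
          b"Node: Top\x7f1647\n"
          b"Ref: Overview-Footnote-1\x7f30045\n"
          b"\x1f\nEnd Tag Table\n")
print(parse_tag_table(sample))
```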
