diff --git a/chunky-base-b.md b/chunky-base-b.md
index 10b4c5d..af673d2 100644
--- a/chunky-base-b.md
+++ b/chunky-base-b.md
@@ -1,105 +1,133 @@
---
layout: page
title: Chunky Base b Encodings
author: T. J. M. Makarios
katex: True
---
Introduction
===
There's often a reason to encode an arbitrary sequence of bytes as
a sequence of symbols from an alphabet of fewer than 256 characters.
ASCII-based alphabets are popular,
but the precise choice of alphabet often differs, depending on
the constraints of each particular situation.
Having chosen an alphabet with $$b$$ characters,
it's tempting to say "now we just encode it in base $$b$$".
But a number of implementation details still remain,
and if they aren't precisely specified,
different implementations might not be interoperable.
For example:
* Are the input and encoded data to be
treated as big-endian or little-endian numbers?
* Is the input data to be
treated as a single large integer, as in variants of [base58], or
broken into chunks as in [ascii85]?
* How are leading or trailing `0x00` bytes to be encoded?
This document provides a way to concisely specify
a relatively efficient scheme for
encoding arbitrary sequences of bytes
into sequences of symbols from
any alphabet of between two and 256 characters (inclusive).
It's compatible with the unpadded
[base16, base32, and base64 encodings][rfc4648],
and, to a certain extent, with Ascii85
(though the `y` and `z` shorthands aren't supported).
Finally, a base 26 alphabetic encoding is suggested,
which can assist with the efficient encoding of data
in valid URIs stored in [Aztec codes][aztec],
which allow for compact encoding of
long single-case alphabetic sequences.
Design choices
===
For compatibility with the most popular existing encodings,
the encoding scheme defined in this document uses
the big-endian convention,
treating earlier bytes or symbols as more significant digits.
Each byte of input is treated in the usual way as
an unsigned integer (between 0 and 255);
a string of bytes $$A_0, A_1, \ldots, A_{k-1}$$ is
treated as a string of bits $$a_0, a_1, \ldots, a_{8k-1}$$,
where the bits of $$A_\iota$$ are
$$a_{8\iota}, a_{8\iota+1}, \ldots, a_{8\iota+7}$$,
arranged in order from the most significant bit --- $$a_{8\iota}$$ ---
to the least significant --- $$a_{8\iota+7}$$.
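The bit-numbering convention above can be illustrated with a minimal Python sketch. The function name `bytes_to_bits` is hypothetical and chosen only for this example.

```python
def bytes_to_bits(data: bytes) -> list[int]:
    """Return the bits a_0, ..., a_{8k-1} of `data`.

    Within each byte, the most significant bit comes first,
    so byte A_i contributes bits a_{8i}, ..., a_{8i+7}.
    """
    bits = []
    for byte in data:
        # Walk from bit 7 (most significant) down to bit 0.
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)
    return bits


# The byte 0x80 has only its most significant bit set, so a_0 = 1.
print(bytes_to_bits(b"\x80\x01"))
```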
Unless the chosen base is a power of two,
encodings that treat the input
(say, $$a_0, a_1, \ldots, a_{8k-1}$$)
as a single large integer ---
$$\sum_{0 \leq i < 8k}{2^{8k-i-1} a_i}$$ ---
can't determine the first output symbol until
the entire input has been read (or at least its length is known),
and, with straightforward conversion algorithms,
the time-complexity of computing the output is
quadratic in the length of the input.
Therefore, such encodings are undesirable in cases where
large streams of data are to be encoded.
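As a rough illustration of the single-large-integer approach, here is a minimal Python sketch. The name `encode_as_integer` is hypothetical, and the sketch deliberately has no special rule for leading `0x00` bytes, so it is not any particular base58 variant.

```python
def encode_as_integer(data: bytes, alphabet: str) -> str:
    """Encode `data` as one big-endian integer written in base len(alphabet).

    Illustrative only: leading 0x00 bytes are lost, and the whole input
    must be available before the first symbol can be produced.
    """
    b = len(alphabet)
    value = int.from_bytes(data, "big")
    digits = []
    while value > 0:
        value, remainder = divmod(value, b)
        digits.append(alphabet[remainder])
    # Digits were produced least significant first.
    return "".join(reversed(digits))


decimal = "0123456789"
# Changing only the last input byte changes the first output symbol.
print(encode_as_integer(b"\x03\xe7", decimal))  # 999
print(encode_as_integer(b"\x03\xe8", decimal))  # 1000
```

In this sketch, `b"\x00\x01"` and `b"\x01"` produce the same output, which is one way that information about leading `0x00` bytes can be lost.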
An alternative is possible in which the input is treated as a fraction
--- $$\sum_{0 \leq i < 8k}{2^{-(i+1)} a_i}$$;
in fact, the base16, base32, and base64 encodings can be
viewed as this type of encoding system.
This type of encoding allows the output stream to
begin before the entire input has been read, but again,
if the chosen base is not a power of two, then
the average time required to compute the next output symbol grows
as the encoding algorithm processes more and more input data.
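The fraction view can be sketched in Python using exact rational arithmetic. The function `encode_as_fraction` and its `num_symbols` parameter are assumptions made for this example and are not part of the scheme defined in this document.

```python
from fractions import Fraction


def encode_as_fraction(data: bytes, alphabet: str, num_symbols: int) -> str:
    """Treat `data` as a binary fraction in [0, 1) and emit base-b digits.

    Each step multiplies the fraction by b and peels off the integer part,
    so symbols are produced from most significant to least significant.
    """
    b = len(alphabet)
    x = Fraction(int.from_bytes(data, "big"), 2 ** (8 * len(data)))
    out = []
    for _ in range(num_symbols):
        x *= b
        digit = int(x)  # x is non-negative, so this is the floor
        out.append(alphabet[digit])
        x -= digit
    return "".join(out)


# With a 16-symbol alphabet this reproduces ordinary hexadecimal.
print(encode_as_fraction(b"\xab\xcd", "0123456789abcdef", 4))  # abcd
```

When the base is not a power of two, the numerator and denominator of `x` in this sketch keep growing as more input is taken into account, which reflects the growing per-symbol cost described above.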
Therefore, in order to be suitable for as many purposes as possible,
this document adopts the practice of
dividing the input into chunks of specific numbers of bits.
Each full chunk is encoded into a fixed number of output symbols,
and any final partial chunk is encoded into a number of symbols that
allows the decoder to reconstruct the exact length of the original input;
in this way, this scheme ensures that there's
no loss of information regarding leading or trailing `0x00` bytes.
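A minimal Python sketch of the chunking step alone follows; it shows how a byte stream might be cut into chunks of a chosen number of bits without reading the whole input first. The generator name `iter_chunks` and the `(value, width)` tuple format are assumptions made for this example. How each chunk, and in particular the final partial chunk, is turned into symbols is not shown here.

```python
from collections.abc import Iterable, Iterator


def iter_chunks(data: Iterable[int], n: int) -> Iterator[tuple[int, int]]:
    """Yield (value, width) pairs: full n-bit chunks, then any partial chunk.

    Bits are consumed big-endian, so earlier bytes supply the more
    significant bits of each chunk.
    """
    acc = 0      # bits read but not yet emitted
    nbits = 0    # number of valid bits in acc
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= n:
            nbits -= n
            yield acc >> nbits, n
            acc &= (1 << nbits) - 1
    if nbits:
        # Final partial chunk: fewer than n bits remain.
        yield acc, nbits


# 24 bits of input cut into 10-bit chunks leave a 4-bit partial chunk.
print(list(iter_chunks(b"\xab\xcd\xef", 10)))
```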
+
+Parameters
+===
+
Given integers $$n$$ and $$b$$ satisfying
$$n \geq 1$$ and $$2 \leq b \leq 256$$, and
distinct symbols $$c_0, c_1, \ldots, c_{b-1}$$,
this document defines
the $$n$$-bit chunky base $$b$$ encoding using the alphabet
$$c_0, c_1, \ldots, c_{b-1}$$.
+Each full chunk of $$n$$ bits of the input is encoded in a full chunk
+of $$m$$ symbols, where $$m = \left\lceil \frac{n}{\lg{b}} \right\rceil$$, $$\lceil\cdot\rceil$$
+represents the ceiling function, and $$\lg$$ is the base-two
+logarithm.
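As a hedged illustration, the number of symbols per full chunk can be computed without floating-point logarithms, because $$m$$ is the smallest integer satisfying $$b^m \geq 2^n$$. The function name `symbols_per_chunk` is hypothetical.

```python
def symbols_per_chunk(n: int, b: int) -> int:
    """Return m = ceil(n / lg(b)), the smallest m with b**m >= 2**n.

    Exact integer arithmetic avoids rounding problems when n / lg(b)
    is an integer or very close to one.
    """
    if n < 1 or not 2 <= b <= 256:
        raise ValueError("require n >= 1 and 2 <= b <= 256")
    m = 0
    power = 1
    while power < (1 << n):
        power *= b
        m += 1
    return m


print(symbols_per_chunk(32, 85))  # 5, as in Ascii85
print(symbols_per_chunk(24, 64))  # 4, as in base64
```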
+
+Typically, a user of this specification will first choose an alphabet
+that satisfies their constraints;
+the size of this alphabet determines the parameter $$b$$.
+
+The parameter $$n$$ can be any positive integer, but for a given value
+of $$b$$, some values of $$n$$ will be more suitable than others.
+Certainly, $$n$$ should be at least $$\lg{b}$$; otherwise, part of the
+chosen alphabet will remain entirely unused.
+
+The encoding will be most space-efficient when $$\frac{n}{\lg{b}}$$ is
+equal to, or only slightly less than, an integer.
+When $$b$$ isn't a power of 2, the continued fraction expansion of
+$$\lg{b}$$ can be useful in identifying an appropriate value of $$n$$.
+Every second convergent will be a rational number $$\frac{n}{m}$$ that
+slightly underestimates $$\lg{b}$$ (so that $$2^n \leq b^m$$),
+with each successive convergent being a
+closer approximation;
+the numerators of these rational approximations (when written in their
+simplest forms) are good candidates for $$n$$.
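The search for candidate values of $$n$$ can be sketched in Python as follows. The function name `chunk_size_candidates` and the `max_terms` parameter are hypothetical. The sketch expands $$\lg{b}$$ using floating-point arithmetic, so only the first few convergents should be trusted. It keeps a convergent $$\frac{n}{m}$$ only if the exact integer test $$2^n \leq b^m$$ holds, meaning that $$n$$ bits fit into $$m$$ symbols.

```python
import math


def chunk_size_candidates(b: int, max_terms: int = 6) -> list[tuple[int, int]]:
    """Return (n, m) pairs taken from convergents n/m of lg(b) with 2**n <= b**m."""
    x = math.log2(b)
    h_prev, k_prev = 1, 0        # conventional convergent before the first
    h, k = math.floor(x), 1      # zeroth convergent: floor(lg b) / 1
    remainder = x - math.floor(x)
    candidates = []
    for _ in range(max_terms):
        # Exact check that n = h bits fit into m = k symbols.
        if 2 ** h <= b ** k:
            candidates.append((h, k))
        if remainder < 1e-9:
            break  # lg(b) is (numerically) rational; the expansion ends
        x = 1 / remainder
        a = math.floor(x)
        remainder = x - a
        # Standard recurrence for continued fraction convergents.
        h, h_prev = a * h + h_prev, h
        k, k_prev = a * k + k_prev, k
    return candidates


print(chunk_size_candidates(85))
```

For $$b = 85$$, the result includes the pair $$(32, 5)$$, which corresponds to the 32 bits per 5 symbols used by Ascii85.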
+
[ascii85]: https://en.wikipedia.org/wiki/Ascii85
[aztec]: https://en.wikipedia.org/wiki/Aztec_Code
[base58]: https://docs.rs/bs58/0.4.0/src/bs58/encode.rs.html#334-370
[rfc4648]: https://tools.ietf.org/html/rfc4648