When working on a large project, take the time to document the internals.

So I’m working on a very large project which I intend to discuss elsewhere at a later date. But it involves a lot of code–and is quite easily the largest thing I’ve ever considered taking on to date.

And I find I go for a few weeks making serious headway–then find myself spending less and less time coding and more and more time surfing the web. I think it’s because I’ve hit a roadblock; the project has gotten so large I can no longer see the forest for the trees. (And hell, at the level some of this code exists, I can’t see the damned trees for the individual leaves.)

I’ve discovered a secret, however.

Document.

When I start forgetting what’s there, it’s good to stop at a good stopping point, and take a survey of what’s there. Nothing fancy; just a simple iteration of what’s where at a directory level helps worlds to get your bearings.

Because otherwise, it’s easy to get lost when you’re looking at several months worth of work, with over a hundred source files and tens of thousands of lines of code, wondering “what’s next.”

And in the documentation you even find things that perhaps you forgot to deal with. Corner cases you thought “I’ll deal with you tomorrow”–only for tomorrow to never come.


So today’s a documentation day. It’s healthy because it reminds me of what I’ve done. It also reminds me of what I still need to do. And it reminds me of things I could do in the future.

Bad User Interfaces

Hawaii missile alert: How one employee ‘pushed the wrong button’ and caused a wave of panic

Around 8:05 a.m., the Hawaii emergency employee initiated the internal test, according to a timeline released by the state. From a drop-down menu on a computer program, he saw two options: “Test missile alert” and “Missile alert.” He was supposed to choose the former; as much of the world now knows, he chose the latter, an initiation of a real-life missile alert.

Well, there’s your problem right there.

Can we require more user interface designers to read Designing with the Mind in Mind before building interfaces like this one?

You can learn a lot about something by trying to implement the specification.

One thing I never really understood about CSS and HTML was how display layout actually works. All of the tutorials and documents I’ve encountered felt to me like the end of a long game of telephone: how things work was garbled, and replaced with a list of half-baked recipes that sort of worked some of the time.

But after spending the last few weeks in a deep dive in the CSS specifications I’ve figured out a few things which I wanted to share here.


At the top of the visual formatting model is a trio of CSS properties which control how a piece of HTML is presented to the user–including if the contents are in a block or is wrapped like text, if it floats to the left or to the right, or if it floats around on the screen, are the following properties:

  • Display which indicates how a piece of HTML is displayed.
  • Position which indicates how a box is positioned within its parent box or within the page.
  • Float which indicates if a cell should float to the left or the right of its parent box.

These three properties are probably the most important properties in determining how a piece of HTML is displayed on the screen. In fact, an entire chapter of the CSS2 spec (on which CSS3 is based) is devoted to these three properties and how they interact.

In fact, they’re so important that there is an algorithm for resolving conflicts between these three properties.

From these three properties we determine what other HTML properties are valid and how they work.

Now on top of this model we also have the box dimensions. Everything within HTML ultimately renders to a box: an inline box (which may represent a run of text that can be word wrapped) or into a block–a rectangular region like an image or a table.

All boxes including inline text is subject to the box dimension properties. This has the interesting side effect that you could (for example) extend a box around a bunch of words and have those boxes subject to word wrapping.

When considering display properties, there are only a handful which are significant. (The rest revolve around table formatting.)

  • Block causes a chunk of HTML to be presented as a box. A box can contain inline flowed text (such as a paragraph), or it can contain a bunch of other boxes, which are generally stacked top to bottom.
  • Inline which represents inline objects positioned horizontally on the screen, such as inline text. If you want to show a small image inline with text, you’d set ‘display:inline’ on the image tag.
  • None causes an object not to appear on the screen.
  • inline-block allows you to format a block of content and present that content inline when formatting text. For example, you can use an inline block to display a fraction.

The float property is relatively self-evident. But the position property less so.

As far as I can figure, the position properties are:

  • static uses normal formatting; the box is laid out according to the default rules.
  • fixed positions the box on the screen in absolute coordinates. (A fixed box can be fixed to the top of the screen, for example.)
  • absolute positions the box at a specified location relative to the parent box.
  • relative marks an object as being repositionable after it is placed according to normal layout rules.
  • The “absolute”, “relative” and “fixed” rules have a second property in that they define the container block for children with position: absolute. Absolute positioned elements and relative positioned elements are positioned relative to their location inside the position flow; the only difference is that absolute positioned elements are not first laid out using normal formatting. (Source)

    Putting all this together, this allows you to do things like float text relative to other text in a paragraph, such as:

    This is a test.

    It’s interesting that these three properties are the root properties in the layout algorithm, yet they seem to be treated as if they were just incidental (and apparently not well understood) properties in everything I’ve read.

CSS3 is a hot mess.

Stupid me, I thought this would be relatively easy for someone of my skill set.

So I want to build an application which can display ePub files. Step 1, of course, is to look at the ePub specification, which is simple enough: an ePub file is basically a zip file with a particular format.

Building a zip file reader is relatively straightforward, and parsing through the ePub file specification really doesn’t take a lot. As verbose as the ePub file specification is, really you only need to do a handful of things to get at the contents of the file:

1. First, you build a piece of code which allows you to randomly access the contents of the zip file. If you are doing this in Java that’s pretty easy; in Objective C that requires a little more finesse. But at the end of the day you build a zip file scanner which scans for the table of contents, load that into a structure, and use the table of contents to find the data in the file containing the compressed file structure.

There are plenty of sources and examples of reading the zip file structure; Wikipedia gives a good overview, the file format is documented fairly well, and there are examples on how to parse and decompress the file.

2. You need to load the META-INF/container.xml file from the zip file, and parse the contents.

The contents are well-defined, and generally contain a single top level .opf file reference which you can then use to parse the contents of the ePub book. The file generally looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container" version="1.0">
    <rootfiles>
        <rootfile full-path="OPS/epb.opf" media-type="application/oebps-package+xml"/>
    </rootfiles>
</container>

So just find the rootfile object in the rootfiles object, grab the full-path attribute, and away you go. (Now ePub books can have multiple oebps package files, and there is a method for parsing through each of them and offering the “version” of the book to display to the user. This may be used, for example, with comic books where you may want a text version with images, and a SGML version with illustrator images.) But parsing and displaying that is beyond the scope of this post.

3. You then parse the .opf file, which is a <package> file that contains metadata, a manifest, a spine (that is, a list of the chapters in the book), and potentially a table of contents of some form.

For ePub 3 (which is what I’m interested in parsing), you can grab the author and title of the book from the metadata. (Generally the name of the book is under metadata/dc:title, and the author under metadata/dc:creator, though there are exceptions; [see the specs](http://www.idpf.org/epub/31/spec/epub-packages.html#sec-metadata-elem) on how to handle multiple authors or subtitles.)

For ePub 3, the table of contents can be found by scanning the manifest for an <item> which has the property “nav” set. So, for our example file with a manifest:

    <manifest>
        <item href="toc.xhtml" id="toc" media-type="application/xhtml+xml" properties="nav"/>
        <item href="cover.xhtml" id="cover" media-type="application/xhtml+xml"/>
        <item href="chapter-1.xhtml" id="chapter-1" media-type="application/xhtml+xml"/>
        <item href="images/IMG_0087.jpg" id="dataItem1" media-type="image/jpeg"/>
        <item href="css/book.css" id="stylesheet" media-type="text/css"/>
        <item href="epb.ncx" id="ncx" media-type="application/x-dtbncx+xml"/>
    </manifest>

The table of contents is the file toc.xhtml, which, while it is in XHTML format, must honor the format specified in the ePub specification. This means the xhtml table of contents can be parsed easily into a table of contents data structure and presented as, for example, a drop-down menu, rather than being forced to display the table of contents as an HTML page.

The spine XML contains the actual contents of the ePub book, and presents a list of item references referring to files in the manifest for display. A spine looks like:

    <spine toc="ncx">
        <itemref idref="cover" linear="no"/>
        <itemref idref="chapter-1" linear="yes"/>
    </spine>

Each <itemref> item contains an idref attribute which refers to an id of an item in the manifest. The spine is in the presentation order of the book. And notice that each manifest item contains an href; that is a relative reference within the zip file for the contents of each chapter. The manifest also contains references for all the images and other contents, again, relative to the location of the original .opf file.

There are other elements of the ePub file, including an .ncx file and other non-standard parts; they exist either for ePub 2 compatibility or to provide extra metadata for certain readers, such as iBooks specific information.


At the bottom of the stack, an ePub file is essentially an HTML web site in compressed forms, where individual pages (generally, chapters) should be presented in a certain order. You would not be wrong, by the way, to unzip all of this into a directory, find the table of contents and open that in a browser. If you want to get fancy, you can also add a “previous chapter”/”next chapter” button somewhere in your UI to progress through the web pages in the spine, or you can synthesize a single page that the user can use for navigation.

However, suppose you’re me and you don’t want to display your book as a series of very long web pages the user scrolls through, but as a series of pages–like the Kindle app or the iBooks app.

Well, that’s where my adventure down the rabbit hole comes into play.

As it turns out, on iOS the UIWebView has the ability to display web sites in “page” mode. But this doesn’t quite do the right thing. You get formatting errors all over the place, and fundamentally you want to have greater control over the layout of individual pages than UIWebView (which is deprecated in favor of a new web browser technology which does not do page-level layout), you need to lay the pages out yourself.

And ePub says that its pages are XHTML, using the HTML 5 specification (meaning you can parse the XHTML pages using an XML parser), and formatted with CSS 3.

And thus, my adventure into the land of understanding CSS 3 begins.


If you’re like me, the first thing you do when you scan the CSS 3 specs is you look for a document which describes the syntax of the file, and you implement it.

Which is what I did: I found the CSS 3 syntax specification, implemented the tokenizer or lexical parser (which I did by hand and was no more complex than most tokenizers), and started in on the syntax parser.

It started out fine. Section 4 of the Syntax document turned out to be incredibly easy to implement: section 4.3 pretty much holds your hand and tells you in prose exactly what to implement in code. (I’m sure there are some optimizations that can be applied here, but if you’re like me, you want to see something work before you go back and optimize.)

Then I got to Section 5, and did the same thing.

And that’s where the ground started to give way into quicksand.

First, notice the multiple entry points. Okay, this makes sense since CSS can be in its own separate file, embedded in a <style> tag, and snippets of CSS can be in the style attribute of the HTML file.

Sure, fine. I can deal with that.

But then you start building the parser and you notice things like this:

Note: Despite the name, this actually parses a mixed list of declarations and at-rules, as CSS 2.1 does for @page. Unexpected at-rules (which could be all of them, in a given context) are invalid and should be ignored by the consumer.

An “at-rule,” by the way, is sort of like a preprocessor macro in C: it can be just about anything, though what it is legally allowed to be at any point in the parse tree is context-dependent.

And then you see things like this: Consume a component value:

  • Consume the next input token.
  • If the current input token is a <{-token>, <[-token>, or <(-token>, consume a simple block and return it.
  • Otherwise, if the current input token is a , consume a function and return it.
  • Otherwise, return the current input token.

Hang on a minute, Sparky; did I just see what I thought I saw?

We’re returning tokens?


At this point you realize the syntax parser algorithm does not actually fully specify the CSS language. It basically finds components of the CSS file, but then at later points you need separate parsers to parse the language in order to find out what’s going on. That is, at the end of this parsing session you’ll see things like “qualified rules” which contain a list of tokens; you later have to parse those tokens to figure out you have a list of declarations–and so forth.

You are not, in other words, left with an abstract syntax tree. You’re left with a partially parsed file which still needs further work.

And how do you deal with a component being either a “function”, a “block” or a “token?” (Personally I reached back into the token object and added synthesized token values for “block” and “function”; that way I could use a recursive parser for further parsing tasks.)

So why the hell don’t you get a full abstract syntax tree?

Well…


First, let me note that the CSS 3 specification does not exist.

By that I mean the CSS 3 specification is not a single specification like CSS 1 or 2. There is no single document stamped “CSS 3” which is a complete description of the CSS 3 specification.

Instead, CSS 3 is essentially a series of “deltas”; smaller documents which describe changes to the CSS 2 specification which make the specification “CSS 3” compliant. The standards committe calls this a “modular approach”:

However for CSS beyond Level 2, the CSS Working Group chose to adopt a modular approach, where each module defines a part of CSS, rather than to define a single monolithic specification. This breaks the specification into more manageable chunks and allows more immediate, incremental improvement to CSS.

In practice, what this means is that to understand CSS 3, you must first understand CSS 2, then update any of the “out of date” chapters in CSS 2 with their relevant CSS 3 documents.

Think that’s bullshit? Well, here’s the official definition.


In practice, that means the syntax parser specification cannot fully parse the CSS 3 language, since future modules may supersede the current specification.

It also means in practice there is no definitive list of CSS 3 attributes anywhere in the document. Now you are not required to implement all attributes; with the introduction of media types in CSS 2 you may want to only implement a subset of the attributes relevant for your media type. And some media types introduce attributes that are only relevant for that media; for example, paged media (that is, web sites rendered to printers or to eBooks with flippy pages) introduce formatting attributes only relevant to print media.

This, by the way, also means to fully parse CSS 3 you must understand which attributes you want to implement (and a full list of CSS 2 attributes which form the foundation of CSS 3 are listed here), then you must build specialized parsers to parse the tokens in the relevant declarations.

And it means you must expand supported shorthand properties; properties which are shorthand for multiple other properties, some of which can have some mind-bending syntax of their own.

And you must expand shorthand properties, because in the following:

border: black;
border-top: red;

This should expand to:

border-top: red;
border-right: black;
border-bottom: black;
border-left: black;
border-image: none;

Meaning a shorthand property is exactly the same as if you wrote out the full properties–and later in the CSS you can override the individual properties without resetting all of the other components of the shorthand property.


So which properties are shorthand properties and what do they expand to? Well, that requires wading through the specification. And ignoring properties you don’t understand–which can be added later to the CSS specification, as it’s modular.

Once you realize the CSS specification is a hot mess, you find yourself writing a hell of a lot of code to handle all the various properties.

But wait, it gets worse! Not all properties promulgate. And that’s a good thing from a user’s perspective; just because you indicate a <div> tag should sport a nice sexy light-gray border doesn’t mean you want all the contained content to also display borders. On the other hand, if you specify the font of a <div> tag, you probably want the same font to be used inside every element of that tag unless specified otherwise.

But from an implementation standpoint that requires a table somewhere of all your properties and which ones promulgate and which ones don’t.

Resolving which attributes are associated with which elements is also well defined. But there are no good strategies given for how to do this quickly; a naive approach would be pretty brute force. (I’ve elected to build a cache of all selectors who have a final selector value that matches a particular cache key of the tag name (i.e., “<p>”), the id attribute and the class attribute. The theory is that the majority of HTML elements will share similar attributes, so the cache should be hit repeatedly. And if a tag hasn’t been seen yet, I construct a subset of the selectors which match the key. My hope is that for most HTML we’ll have zero lookups beyond the basic selection process.)

It just gives me a headache.

And I haven’t even gotten into page layout yet.

*sigh*

NSURLProtocol

So here’s something fun you can do with the whole family. (Okay, I have a weird family. YMMV.)

So I want to allow my UIWebView to handle URLs of the form “myscheme://myhost/mypath”, and intercept these requests to (for example) pull the data for each of these requests from a zip archive.

At first I thought “hey, just use the UIWebViewDelegate”, but that turns out not to work very well.

NSURLProtocol to the rescue!

There are plenty of tutorials out there, but they all seem to cover the idea of using NSURLProtocol as a sort of caching system. There are so many other things you can do with it, though!

Like, in my case, create a brand new way to pull data that does not rely on a network connection.


NSURLProtocol is an abstract class which allows you to insert a custom mechanism for loading URLs. What you do, you see, is build a new NSURLProtocol that handles some new protocol (like, oh, say, handling request to myscheme), and insert it into the networking stack so when, anywhere in your app, you see a request for “myscheme://blahblahblah”, it is handled by your custom code.

So here’s how you use the class.

Step 1

Create a new class which inherits from NSURLProtocol. (We’ll call this “MySchemeProtocol.”)

Define the glass method canInitWithRequest: which indicates that your class needs to be used to process your custom request. (Note: your protocols will be examined first when looking for a protocol to handle a request, so in theory you could intercept file:/// and http:// requests. Probably best not to do this.)

+ (BOOL)canInitWithRequest:(NSURLRequest *)request
{
    if (![request.URL.scheme isEqualToString:@"myscheme"]) return NO;
    return YES;
}

Note that if this returns true, a new instance of an NSURLProtocol class will be created for each request. You can thus use the NSURLProtocol class to track any local state associated with the specific request.

Step 2

The documentation also says you must implement canInitWithTask: and canonicalRequestForRequest:. You can read the documentation to understand what these methods do, but in my case (and I suspect, in yours), you really don’t need to do much. The former can examine the request behind the task, the latter can just pass the URL back.

Honestly I don’t know the consequences of defining canInitWithTask: the way I did, so beware.

+ (BOOL)canInitWithTask:(NSURLSessionTask *)task
{
    return [self canInitWithRequest:task.originalRequest];
}

+ (NSURLRequest *)canonicalRequestForRequest:(NSURLRequest *)request
{
    return request;
}

Step 3

Once you’ve added the boilerplate above, you must implement startLoading and stopLoading.

Now in my case, I assume that the contents from each request are loaded immediately after startLoading, though you can also kick off a thread to obtain the data. (If you do kick off a thread or a task or some other asynchronous mechanism for obtaining data, you must halt that process when stopLoading is called.)

So in my case, with data loaded immediately on startLoading, our required stopLoading method is easy:

- (void)stopLoading
{
    // Does nothing, since I satisfy the request synchronously in startLoading.
    // Otherwise, stop the background task or thread here.
}

Step 4

You start loading (or in my case, synchronously load) when startLoading is called.

Now here’s the thing I had to discover on my own. If you are loading data yourself (rather than just fiddling with the caching of data), you wind up having to interact with the NSURLProtocolClient object that is stored in your self.client field. This is the client that made the request, and it is where the data needs to go once you obtain it.

While loading data, if the cached data passed to your protocol is valid (for some definition of “valid” you get to define yourself), you can call the URLProtocol:cachedResponseIsValid: method, and return.

- (void)startLoading
{
    // In my case I assume the cached response is always valid. So if
    // we have a cached response, simply pass it up. You can add logic,
    // such as "if this is more than 5 minutes old, it's not valid."

    if (self.cachedResponse) {
        [self.client URLProtocol:self cachedResponseIsValid:self.cachedResponse];
    } else {

Now if we have to load our data, we then respond to the NSURLProtocolClient, first by calling URLProtocol:didReceiveResponse:cacheStoragePolicy: to indicate we are receiving something. Then we call URLProtocol:didLoadData: one or more times with the data we receive. (If this is being loaded asynchronously, we can call didLoadData multiple times as we receive our data.) And finally, once all the data is loaded, we call URLProtocolDidFinishLoading.

So in our case, as we’re loading all our data from an archive (which I don’t describe here how it works, other than it’s synchronous) is:

        // Load data from our internal file and pass the results
        NSData *data = [[MyArchive shared] loadDataFromURL:self.request.URL];
        NSString *mimeType = [[MyArchive shared] mimeTypeForURL:self.request.URL];

        NSURLResponse *response = [[NSURLResponse alloc] initWithURL:self.request.URL
                                                            MIMEType:mimeType
                                               expectedContentLength:data.length
                                                    textEncodingName:"utf-8"];
        [self.client URLProtocol:self 
              didReceiveResponse:response 
              cacheStoragePolicy:NSURLCacheStorageAllowed];

        [self.client URLProtocol:self didLoadData:data];

        [self.client URLProtocolDidFinishLoading:self];
    }
}

Step 5

Now that you’re done, you simply need to register this when your application starts up:

- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions
{
    // Other stuff goes here...

    [NSURLProtocol registerClass:CSBookURLProtocol.class];

    return YES;
}

And now you can invoke your protocol when loading a specialized URL, such as:

NSString *myURL = @"myscheme://myhost/mypath";
NSURL *url = [NSURL URLWithString:myURL];
NSURLRequest *req = [NSURLRequest requestWithURL:url];
[self.myWebView loadRequest:req];

Things to remember: files in test bundles.

For a program I’m working on, I want to store a file in the test bundle of an app for unit testing, then load that file to test a parser.

Specifically I want to create a class of XCTestCase, and create a file which is included in the unit test object. I then want to be able to load a URL from the file using NSBundle’s URLForResource:withExtension: method, so that I can test an internal parser using the test file, but without the test file being built as a part of the final application.

To do this, you need to load the URL from the test class’s bundle. So in the end, I wound up doing the following:

(1) Including the file as a target of the unit tests app.

(2) Loading the file with:

- (void)testZip
{
    NSBundle *b = [NSBundle bundleForClass:self.class];
    NSURL *url = [b URLForResource:@"MyFile" withExtension:@"ext"];
    XCTAssert(url != nil);
    ... Do my other tests with the file here...
}