CSS3 is a hot mess.

Stupid me, I thought this would be relatively easy for someone of my skill set.

So I want to build an application which can display ePub files. Step 1, of course, is to look at the ePub specification, which is simple enough: an ePub file is basically a zip file with a particular format.

Building a zip file reader is relatively straightforward, and parsing through the ePub file specification really doesn’t take a lot. As verbose as the ePub file specification is, really you only need to do a handful of things to get at the contents of the file:

1. First, you build a piece of code which allows you to randomly access the contents of the zip file. If you are doing this in Java that’s pretty easy; in Objective-C that requires a little more finesse. But at the end of the day you build a zip file scanner which scans for the zip file’s table of contents (the central directory), loads that into a structure, and uses it to locate each compressed file’s data within the archive.

There are plenty of sources and examples of reading the zip file structure; Wikipedia gives a good overview, the file format is documented fairly well, and there are examples on how to parse and decompress the file.
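As a rough illustration of step 1, here is a minimal sketch in Python (chosen only for brevity; the same shape applies in Java or Objective-C). It uses the standard `zipfile` module, which reads the zip's table of contents for you. The sample archive and the helper names `build_sample_epub`, `list_entries` and `read_entry` are made up for this example.

```python
import io
import zipfile


def build_sample_epub() -> bytes:
    """Build a tiny in-memory zip so the example is self-contained.

    The entries are hypothetical stand-ins for a real ePub's contents.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("META-INF/container.xml", "<container/>")
        zf.writestr("OPS/chapter-1.xhtml", "<html/>")
    return buf.getvalue()


def list_entries(data: bytes) -> list:
    """Return the entry names recorded in the zip's table of contents."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()


def read_entry(data: bytes, name: str) -> bytes:
    """Randomly access one entry by name and return its decompressed bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.read(name)


sample = build_sample_epub()
print(list_entries(sample))
print(read_entry(sample, "mimetype"))
```

If you write the scanner by hand, these two functions are roughly the interface you end up with: list the entries, then pull one out by name.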

2. You need to load the META-INF/container.xml file from the zip file, and parse the contents.

The contents are well-defined, and generally contain a single top level .opf file reference which you can then use to parse the contents of the ePub book. The file generally looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container" version="1.0">
    <rootfiles>
        <rootfile full-path="OPS/epb.opf" media-type="application/oebps-package+xml"/>
    </rootfiles>
</container>

So just find the rootfile element in the rootfiles element, grab the full-path attribute, and away you go. (ePub books can have multiple OEBPS package files, and there is a method for parsing through each of them and offering each “version” of the book to the user. This may be used, for example, with comic books, where you may want a text version with images and an SVG version with Illustrator images. But parsing and displaying that is beyond the scope of this post.)
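A minimal Python sketch of that lookup, using the standard `xml.etree.ElementTree` parser on the sample container file shown above. The function name `find_rootfile_paths` is made up for illustration; it returns every full-path it finds, so a reader that only wants the default rendition can take the first one.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the sample container.xml above.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

CONTAINER_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container" version="1.0">
    <rootfiles>
        <rootfile full-path="OPS/epb.opf" media-type="application/oebps-package+xml"/>
    </rootfiles>
</container>
"""


def find_rootfile_paths(container_xml: bytes) -> list:
    """Return the full-path attribute of each <rootfile>, in document order."""
    root = ET.fromstring(container_xml)
    rootfiles = root.findall("c:rootfiles/c:rootfile", CONTAINER_NS)
    return [rf.get("full-path") for rf in rootfiles if rf.get("full-path")]


print(find_rootfile_paths(CONTAINER_XML))
```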

3. You then parse the .opf file, which is a <package> file that contains metadata, a manifest, a spine (that is, a list of the chapters in the book), and potentially a table of contents of some form.

For ePub 3 (which is what I’m interested in parsing), you can grab the author and title of the book from the metadata. (Generally the name of the book is under metadata/dc:title, and the author under metadata/dc:creator, though there are exceptions; [see the specs](http://www.idpf.org/epub/31/spec/epub-packages.html#sec-metadata-elem) on how to handle multiple authors or subtitles.)

For ePub 3, the table of contents can be found by scanning the manifest for an <item> which has the property “nav” set. So, for our example file with a manifest:

    <manifest>
        <item href="toc.xhtml" id="toc" media-type="application/xhtml+xml" properties="nav"/>
        <item href="cover.xhtml" id="cover" media-type="application/xhtml+xml"/>
        <item href="chapter-1.xhtml" id="chapter-1" media-type="application/xhtml+xml"/>
        <item href="images/IMG_0087.jpg" id="dataItem1" media-type="image/jpeg"/>
        <item href="css/book.css" id="stylesheet" media-type="text/css"/>
        <item href="epb.ncx" id="ncx" media-type="application/x-dtbncx+xml"/>
    </manifest>

The table of contents is the file toc.xhtml, which, while it is in XHTML format, must honor the format specified in the ePub specification. This means the xhtml table of contents can be parsed easily into a table of contents data structure and presented as, for example, a drop-down menu, rather than being forced to display the table of contents as an HTML page.

The spine lists the actual contents of the ePub book in reading order, as a list of item references which each refer to a file in the manifest. A spine looks like:

    <spine toc="ncx">
        <itemref idref="cover" linear="no"/>
        <itemref idref="chapter-1" linear="yes"/>
    </spine>

Each <itemref> item contains an idref attribute which refers to an id of an item in the manifest. The spine is in the presentation order of the book. And notice that each manifest item contains an href; that is a relative reference within the zip file for the contents of each chapter. The manifest also contains references for all the images and other contents, again, relative to the location of the original .opf file.
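Putting the manifest and spine together, here is a rough Python sketch that finds the “nav” document and resolves the spine into zip-relative paths. It assumes the package uses the usual OPF namespace and that hrefs are plain relative paths (it does not handle percent-encoding); `read_package` is a hypothetical helper name.

```python
import posixpath
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

OPF_XML = b"""<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
    <manifest>
        <item href="toc.xhtml" id="toc" media-type="application/xhtml+xml" properties="nav"/>
        <item href="cover.xhtml" id="cover" media-type="application/xhtml+xml"/>
        <item href="chapter-1.xhtml" id="chapter-1" media-type="application/xhtml+xml"/>
        <item href="epb.ncx" id="ncx" media-type="application/x-dtbncx+xml"/>
    </manifest>
    <spine toc="ncx">
        <itemref idref="cover" linear="no"/>
        <itemref idref="chapter-1" linear="yes"/>
    </spine>
</package>
"""


def read_package(opf_xml: bytes, opf_path: str):
    """Return (nav_path, spine_paths), both relative to the root of the zip."""
    root = ET.fromstring(opf_xml)
    base = posixpath.dirname(opf_path)  # hrefs are relative to the .opf file

    manifest = {}
    nav_path = None
    for item in root.findall("opf:manifest/opf:item", OPF_NS):
        path = posixpath.normpath(posixpath.join(base, item.get("href")))
        manifest[item.get("id")] = path
        # "properties" is a space-separated list; look for the "nav" entry.
        if "nav" in item.get("properties", "").split():
            nav_path = path

    # The spine gives presentation order; each idref points into the manifest.
    spine_paths = [
        manifest[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
    ]
    return nav_path, spine_paths


nav, spine = read_package(OPF_XML, "OPS/epb.opf")
print(nav)
print(spine)
```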

There are other elements of the ePub file, including an .ncx file and other non-standard parts; they exist either for ePub 2 compatibility or to provide extra metadata for certain readers, such as iBooks specific information.


At the bottom of the stack, an ePub file is essentially an HTML web site in compressed form, where individual pages (generally, chapters) should be presented in a certain order. You would not be wrong, by the way, to unzip all of this into a directory, find the table of contents and open that in a browser. If you want to get fancy, you can also add a “previous chapter”/“next chapter” button somewhere in your UI to progress through the web pages in the spine, or you can synthesize a single page that the user can use for navigation.

However, suppose you’re me and you don’t want to display your book as a series of very long web pages the user scrolls through, but as a series of pages–like the Kindle app or the iBooks app.

Well, that’s where my adventure down the rabbit hole comes into play.

As it turns out, on iOS the UIWebView has the ability to display web sites in “page” mode. But this doesn’t quite do the right thing. You get formatting errors all over the place, and fundamentally, if you want greater control over the layout of individual pages than UIWebView gives you (and UIWebView is deprecated in favor of a new web browser technology which does not do page-level layout), you need to lay the pages out yourself.

And ePub says that its pages are XHTML, using the HTML 5 specification (meaning you can parse the XHTML pages using an XML parser), and formatted with CSS 3.

And thus, my adventure into the land of understanding CSS 3 begins.


If you’re like me, the first thing you do when you scan the CSS 3 specs is you look for a document which describes the syntax of the file, and you implement it.

Which is what I did: I found the CSS 3 syntax specification, implemented the tokenizer or lexical parser (which I did by hand and was no more complex than most tokenizers), and started in on the syntax parser.

It started out fine. Section 4 of the Syntax document turned out to be incredibly easy to implement: section 4.3 pretty much holds your hand and tells you in prose exactly what to implement in code. (I’m sure there are some optimizations that can be applied here, but if you’re like me, you want to see something work before you go back and optimize.)

Then I got to Section 5, and did the same thing.

And that’s where the ground started to give way into quicksand.

First, notice the multiple entry points. Okay, this makes sense since CSS can be in its own separate file, embedded in a <style> tag, and snippets of CSS can be in the style attribute of the HTML file.

Sure, fine. I can deal with that.

But then you start building the parser and you notice things like this:

Note: Despite the name, this actually parses a mixed list of declarations and at-rules, as CSS 2.1 does for @page. Unexpected at-rules (which could be all of them, in a given context) are invalid and should be ignored by the consumer.

An “at-rule,” by the way, is sort of like a preprocessor macro in C: it can be just about anything, though what it is legally allowed to be at any point in the parse tree is context-dependent.

And then you see things like this: Consume a component value:

  • Consume the next input token.
  • If the current input token is a <{-token>, <[-token>, or <(-token>, consume a simple block and return it.
  • Otherwise, if the current input token is a <function-token>, consume a function and return it.
  • Otherwise, return the current input token.

Hang on a minute, Sparky; did I just see what I thought I saw?

We’re returning tokens?


At this point you realize the syntax parser algorithm does not actually fully specify the CSS language. It basically finds components of the CSS file, but then at later points you need separate parsers to parse the language in order to find out what’s going on. That is, at the end of this parsing session you’ll see things like “qualified rules” which contain a list of tokens; you later have to parse those tokens to figure out you have a list of declarations–and so forth.

You are not, in other words, left with an abstract syntax tree. You’re left with a partially parsed file which still needs further work.

And how do you deal with a component being either a “function”, a “block” or a “token?” (Personally I reached back into the token object and added synthesized token values for “block” and “function”; that way I could use a recursive parser for further parsing tasks.)
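To make the “consume a component value” steps quoted above concrete, here is a minimal Python sketch. It is not the parser described in this post: the `Token`, `Block`, `Function` and `TokenStream` types are invented for the example, and it peeks at the next token where the spec text says to “reconsume” the current one. What it does show is the odd shape of the result: a component value comes back as a block, a function, or just a plain token.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    kind: str        # e.g. "ident", "number", "colon", "function", "{", "}", "(", ")"
    value: str = ""


@dataclass
class Block:
    opener: str      # "{", "[" or "("
    value: list = field(default_factory=list)


@dataclass
class Function:
    name: str
    value: list = field(default_factory=list)


CLOSER = {"{": "}", "[": "]", "(": ")"}


class TokenStream:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        if tok is not None:
            self.pos += 1
        return tok


def consume_component_value(stream):
    """Return a Block, a Function, a plain Token, or None at end of input."""
    tok = stream.next()
    if tok is None:
        return None
    if tok.kind in CLOSER:
        return _consume_until(stream, CLOSER[tok.kind], Block(tok.kind))
    if tok.kind == "function":
        return _consume_until(stream, ")", Function(tok.value))
    return tok


def _consume_until(stream, closer, container):
    """Fill a block or function with component values up to its closing token."""
    while True:
        tok = stream.peek()
        if tok is None:          # ran out of input: return what we have
            return container
        if tok.kind == closer:
            stream.next()
            return container
        container.value.append(consume_component_value(stream))


# Tokens for: { color : rgb( 1 ) }
tokens = [
    Token("{"), Token("ident", "color"), Token("colon"),
    Token("function", "rgb"), Token("number", "1"), Token(")"), Token("}"),
]
result = consume_component_value(TokenStream(tokens))
print(result)
```

Note that the block’s contents are still a flat list of tokens and functions; nothing here knows that `color : rgb(1)` is a declaration. That is the later, separate parsing pass.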

So why the hell don’t you get a full abstract syntax tree?

Well…


First, let me note that the CSS 3 specification does not exist.

By that I mean the CSS 3 specification is not a single specification like CSS 1 or 2. There is no single document stamped “CSS 3” which is a complete description of the CSS 3 specification.

Instead, CSS 3 is essentially a series of “deltas”: smaller documents which describe changes to the CSS 2 specification which make the specification “CSS 3” compliant. The standards committee calls this a “modular approach”:

However for CSS beyond Level 2, the CSS Working Group chose to adopt a modular approach, where each module defines a part of CSS, rather than to define a single monolithic specification. This breaks the specification into more manageable chunks and allows more immediate, incremental improvement to CSS.

In practice, what this means is that to understand CSS 3, you must first understand CSS 2, then update any of the “out of date” chapters in CSS 2 with their relevant CSS 3 documents.

Think that’s bullshit? Well, here’s the official definition.


In practice, that means the syntax parser specification cannot fully parse the CSS 3 language, since future modules may supersede the current specification.

It also means in practice there is no definitive list of CSS 3 attributes anywhere in the document. Now you are not required to implement all attributes; with the introduction of media types in CSS 2 you may want to only implement a subset of the attributes relevant for your media type. And some media types introduce attributes that are only relevant for that media; for example, paged media (that is, web sites rendered to printers or to eBooks with flippy pages) introduce formatting attributes only relevant to print media.

This, by the way, also means that to fully parse CSS 3 you must decide which attributes you want to implement (a full list of the CSS 2 attributes which form the foundation of CSS 3 is listed here), then build specialized parsers to parse the tokens in the relevant declarations.

And it means you must expand supported shorthand properties; properties which are shorthand for multiple other properties, some of which can have some mind-bending syntax of their own.

And you must expand shorthand properties because of cases like the following:

border: black;
border-top: red;

This should expand to:

border-top: red;
border-right: black;
border-bottom: black;
border-left: black;
border-image: none;

Meaning a shorthand property is exactly the same as if you wrote out the full properties–and later in the CSS you can override the individual properties without resetting all of the other components of the shorthand property.
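Here is a small Python sketch of that expand-then-override behavior. It follows the simplified one-level expansion used in the example above; in the real specification border-top is itself a shorthand for width, style and color, which this sketch deliberately ignores. The names `expand` and `cascade` are made up.

```python
BORDER_SIDES = ("border-top", "border-right", "border-bottom", "border-left")


def expand(name, value):
    """Expand a shorthand into (property, value) pairs.

    Only "border" is handled, mirroring the simplified example in the text.
    Anything else is passed through unchanged.
    """
    if name == "border":
        return [(side, value) for side in BORDER_SIDES] + [("border-image", "none")]
    return [(name, value)]


def cascade(declarations):
    """Apply declarations in source order; later ones override earlier ones."""
    resolved = {}
    for name, value in declarations:
        for prop, expanded_value in expand(name, value):
            resolved[prop] = expanded_value
    return resolved


result = cascade([("border", "black"), ("border-top", "red")])
for prop, value in result.items():
    print(f"{prop}: {value};")
```

Because the shorthand is expanded before it is stored, the later border-top declaration only replaces the one entry it names and leaves the other three sides alone.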


So which properties are shorthand properties and what do they expand to? Well, that requires wading through the specification. And ignoring properties you don’t understand–which can be added later to the CSS specification, as it’s modular.

Once you realize the CSS specification is a hot mess, you find yourself writing a hell of a lot of code to handle all the various properties.

But wait, it gets worse! Not all properties are inherited. And that’s a good thing from a user’s perspective; just because you indicate a <div> tag should sport a nice sexy light-gray border doesn’t mean you want all the contained content to also display borders. On the other hand, if you specify the font of a <div> tag, you probably want the same font to be used inside every element of that tag unless specified otherwise.

But from an implementation standpoint that requires a table somewhere of all your properties, recording which ones are inherited and which ones aren’t.
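A minimal Python sketch of such a table and how it gets used. The four properties and the `computed_style` helper are illustrative only; the initial values for font-family and color are placeholders, since the real ones depend on the user agent, and the explicit `inherit` keyword is not handled.

```python
# name: (inherited by default?, initial value)
PROPERTIES = {
    "font-family": (True, "serif"),       # placeholder initial value
    "color": (True, "black"),             # placeholder initial value
    "border-top-style": (False, "none"),
    "margin-top": (False, "0"),
}


def computed_style(specified, parent=None):
    """Resolve every known property for one element.

    specified: properties set directly on the element by the cascade.
    parent:    the parent's already-computed style, or None for the root.
    """
    style = {}
    for name, (inherited, initial) in PROPERTIES.items():
        if name in specified:
            style[name] = specified[name]
        elif inherited and parent is not None:
            style[name] = parent[name]
        else:
            style[name] = initial
    return style


div = computed_style({"font-family": "Georgia", "border-top-style": "solid"})
p = computed_style({}, parent=div)
print(p["font-family"], p["border-top-style"])
```

The paragraph inside the div picks up the font but not the border, which is exactly the behavior described above.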

Resolving which attributes are associated with which elements is also well defined. But there are no good strategies given for how to do this quickly; a naive approach would be pretty brute force. (I’ve elected to build a cache of all selectors whose final selector value matches a particular cache key made up of the tag name (e.g., “<p>”), the id attribute and the class attribute. The theory is that the majority of HTML elements will share similar attributes, so the cache should be hit repeatedly. And if a tag hasn’t been seen yet, I construct a subset of the selectors which match the key. My hope is that for most HTML we’ll have zero lookups beyond the basic selection process.)
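A rough Python sketch of that caching idea, under heavy simplifying assumptions: each rule is reduced to the tag, id and classes of its rightmost compound selector, and the cache only narrows down candidates. Anything with ancestors in the selector (such as “div p.note”) still needs a full match against the document tree afterwards. The `Rule` and `SelectorCache` names are invented here and are not the code described in this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    selector: str                      # full selector text, kept for display
    tag: Optional[str] = None          # rightmost compound selector's tag
    id: Optional[str] = None           # ...its #id, if any
    classes: frozenset = frozenset()   # ...its .classes, if any


class SelectorCache:
    def __init__(self, rules):
        self.rules = list(rules)
        self.cache = {}
        self.misses = 0

    def candidates(self, tag, id=None, classes=()):
        """Return rules whose rightmost compound selector could match."""
        key = (tag, id, frozenset(classes))
        if key not in self.cache:
            # First time we see this kind of element: scan every rule once.
            self.misses += 1
            self.cache[key] = [r for r in self.rules if self._matches(r, key)]
        return self.cache[key]

    @staticmethod
    def _matches(rule, key):
        tag, id, classes = key
        return (
            (rule.tag is None or rule.tag == tag)
            and (rule.id is None or rule.id == id)
            and rule.classes <= classes
        )


rules = [
    Rule("p", tag="p"),
    Rule("div p.note", tag="p", classes=frozenset({"note"})),
    Rule("#title", id="title"),
    Rule("h1", tag="h1"),
]
cache = SelectorCache(rules)
plain = [r.selector for r in cache.candidates("p")]
noted = [r.selector for r in cache.candidates("p", classes=["note"])]
cache.candidates("p")  # same key as the first call: served from the cache
print(plain, noted, cache.misses)
```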

It just gives me a headache.

And I haven’t even gotten into page layout yet.

*sigh*

NSURLProtocol

So here’s something fun you can do with the whole family. (Okay, I have a weird family. YMMV.)

So I want to allow my UIWebView to handle URLs of the form “myscheme://myhost/mypath”, and intercept these requests to (for example) pull the data for each of these requests from a zip archive.

At first I thought “hey, just use the UIWebViewDelegate”, but that turns out not to work very well.

NSURLProtocol to the rescue!

There are plenty of tutorials out there, but they all seem to cover the idea of using NSURLProtocol as a sort of caching system. There are so many other things you can do with it, though!

Like, in my case, create a brand new way to pull data that does not rely on a network connection.


NSURLProtocol is an abstract class which allows you to insert a custom mechanism for loading URLs. What you do, you see, is build a new NSURLProtocol that handles some new protocol (like, oh, say, handling requests to myscheme), and insert it into the networking stack so when, anywhere in your app, you see a request for “myscheme://blahblahblah”, it is handled by your custom code.

So here’s how you use the class.

Step 1

Create a new class which inherits from NSURLProtocol. (We’ll call this “MySchemeProtocol.”)

Define the class method canInitWithRequest: which indicates that your class needs to be used to process your custom request. (Note: your protocols will be examined first when looking for a protocol to handle a request, so in theory you could intercept file:/// and http:// requests. Probably best not to do this.)

+ (BOOL)canInitWithRequest:(NSURLRequest *)request
{
    if (![request.URL.scheme isEqualToString:@"myscheme"]) return NO;
    return YES;
}

Note that if this returns true, a new instance of an NSURLProtocol class will be created for each request. You can thus use the NSURLProtocol class to track any local state associated with the specific request.

Step 2

The documentation also says you must implement canInitWithTask: and canonicalRequestForRequest:. You can read the documentation to understand what these methods do, but in my case (and I suspect, in yours), you really don’t need to do much. The former can examine the request behind the task, the latter can just pass the URL back.

Honestly I don’t know the consequences of defining canInitWithTask: the way I did, so beware.

+ (BOOL)canInitWithTask:(NSURLSessionTask *)task
{
    return [self canInitWithRequest:task.originalRequest];
}

+ (NSURLRequest *)canonicalRequestForRequest:(NSURLRequest *)request
{
    return request;
}

Step 3

Once you’ve added the boilerplate above, you must implement startLoading and stopLoading.

Now in my case, I assume that the contents from each request are loaded immediately after startLoading, though you can also kick off a thread to obtain the data. (If you do kick off a thread or a task or some other asynchronous mechanism for obtaining data, you must halt that process when stopLoading is called.)

So in my case, with data loaded immediately on startLoading, our required stopLoading method is easy:

- (void)stopLoading
{
    // Does nothing, since I satisfy the request synchronously in startLoading.
    // Otherwise, stop the background task or thread here.
}

Step 4

You start loading (or in my case, synchronously load) when startLoading is called.

Now here’s the thing I had to discover on my own. If you are loading data yourself (rather than just fiddling with the caching of data), you wind up having to interact with the NSURLProtocolClient object that is stored in your self.client field. This is the client that made the request, and it is where the data needs to go once you obtain it.

While loading data, if the cached data passed to your protocol is valid (for some definition of “valid” you get to define yourself), you can call the URLProtocol:cachedResponseIsValid: method, and return.

- (void)startLoading
{
    // In my case I assume the cached response is always valid. So if
    // we have a cached response, simply pass it up. You can add logic,
    // such as "if this is more than 5 minutes old, it's not valid."

    if (self.cachedResponse) {
        [self.client URLProtocol:self cachedResponseIsValid:self.cachedResponse];
    } else {

Now if we have to load our data, we then respond to the NSURLProtocolClient, first by calling URLProtocol:didReceiveResponse:cacheStoragePolicy: to indicate we are receiving something. Then we call URLProtocol:didLoadData: one or more times with the data we receive. (If this is being loaded asynchronously, we can call URLProtocol:didLoadData: multiple times as we receive our data.) And finally, once all the data is loaded, we call URLProtocolDidFinishLoading:.

So in our case, where we load all our data from an archive (I don’t describe how the archive works here, other than to say it’s synchronous), the rest of startLoading is:

        // Load data from our internal file and pass the results
        NSData *data = [[MyArchive shared] loadDataFromURL:self.request.URL];
        NSString *mimeType = [[MyArchive shared] mimeTypeForURL:self.request.URL];

        NSURLResponse *response = [[NSURLResponse alloc] initWithURL:self.request.URL
                                                            MIMEType:mimeType
                                               expectedContentLength:data.length
                                                    textEncodingName:@"utf-8"];
        [self.client URLProtocol:self 
              didReceiveResponse:response 
              cacheStoragePolicy:NSURLCacheStorageAllowed];

        [self.client URLProtocol:self didLoadData:data];

        [self.client URLProtocolDidFinishLoading:self];
    }
}

Step 5

Now that you’re done, you simply need to register this when your application starts up:

- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions
{
    // Other stuff goes here...

    [NSURLProtocol registerClass:[MySchemeProtocol class]];

    return YES;
}

And now you can invoke your protocol when loading a specialized URL, such as:

NSString *myURL = @"myscheme://myhost/mypath";
NSURL *url = [NSURL URLWithString:myURL];
NSURLRequest *req = [NSURLRequest requestWithURL:url];
[self.myWebView loadRequest:req];

Abstracting my papers in one place.

Over the years I’ve described several algorithms and written several “papers” which describe those techniques, which I’ve abstracted on a new page on my blog. This includes a detailed description of the LR(1) algorithm used by the OCYacc tool I recently open-sourced, as well as papers on computer graphics and computational geometry. Hopefully they will be of use to someone out there.

Designing a User Interface? First, create a visual language.

Reading through the comments in this Slashdot article, I’ve noticed a few things.

Apple Is Really Bad At Design

Everyone seems to agree where Apple went off the rails was with iOS 7.

But why–no-one seems to be able to agree.

In the original article linked to by Slashdot, “Apple Is Really Bad At Design,” the author seems to hinge his argument on the fact that the new operating system looks ugly:

In 2013 I wrote about the confusing and visually abrasive turn Apple had made with the introduction of iOS 7, the operating system refresh that would set the stage for almost all of Apple’s recent design. The product, the first piece of software overseen by Jony Ive, was confusing, amateur, and relatively unfinished upon launch. While Ive had utterly revamped what the company had been doing thematically with its user interface — eschewing the original iPhone’s tactility of skeuomorphic, real-world textures for a more purely “digital” approach — he also ignored more grounded concepts about user experience, systematic cohesion, and most surprisingly, look and feel.

Now it almost sounds like the original author was about to stumble on the truth.

But then, he fails:

“It’s not just that the icons on the homescreen feel and look like the work of a lesser designer. They also vary across the system. For instance, the camera icon is a different shape in other sections of the OS, like the camera app or the lockscreen,” I wrote at the time. “Shouldn’t there be some consistency?” While this may seem like obsessive nit-picking, these are the kinds of details that Apple in its previous incarnation would never have gotten wrong.

And, at the bottom of the stack, the essay seems inspired by the iPhone X, which is described repeatedly in the article as “ungainly and unnatural”, “bad design”, “a visually disgusting element.” And by extension the entire Apple environment is described as “fucking crazy.”

The comments in the Slashdot article also riff on the visually consistent or visually pleasing aspects of design:

The result is what you see now in Apple products – a muddled mess of different ideas that just don’t fit together right, and very little actual customer value.


And you know what? We see this focus on the beautiful in many other applications. You can see it in how we describe UX jobs and who gets hired for UI design: UI, UX: Who Does What? A Designer’s Guide To The Tech Industry.

UX designers are primarily concerned with how the product feels. A given design problem has no single right answer. UX designers explore many different approaches to solving a specific user problem. The broad responsibility of a UX designer is to ensure that the product logically flows from one step to the next. One way that a UX designer might do this is by conducting in-person user tests to observe one’s behavior. By identifying verbal and non-verbal stumbling blocks, they refine and iterate to create the “best” user experience. An example project is creating a delightful onboarding flow for a new user.

It’s worth reading the entire thing to understand the state of the industry, or the fact that sometimes

The boundary between UI and UX designers is fairly blurred and it is not uncommon for companies to opt to combine these roles.


You know what is missing here?

From the movie Objectified, a movie anyone who is a designer or interested in design must watch:

I think there are really three phases of modern design. One of those phases, or approaches, if you like, is looking at the design in a formal relationship, the formal logic of the object–the act of form-giving: form begets form.

The second way to look at it is in terms of the symbolism and content of what you’re dealing with: the little rituals that make up making coffee or using a fork and knife, or the cultural symbolisms of a particular object. Those come back to a habit and gives form, helps give guidance to the designer about how that form should be or how it should look.

The third phase, really, is looking at Design in a contextual sense, in a much bigger picture scenario. It’s looking at the technological context for that object. It’s looking at the human and object relationship.

The first phase you might have something fairly new, like Karim Rashid’s Kone vacuum, which is for Dirt Devil. The company sells this basically, “so beautiful, you can put it on display.” In other words you can leave it on your counter, it doesn’t look like a piece of crap.

Conversely you can look at Dyson and his vacuum cleaners. He approaches the design of his vacuum in a very functionalist manner. But if you look at the form of it it’s really expressing the symbolism of function. The color introduced into it is–he’s not a frivolous person–so it’s really there to articulate the various components of the vacuum.

Or you can look at, in a more recent manifestation in a kind of contextual approach, would be something like the Roomba. There, the relationship to the vacuum is very different. First of all, there is no more human interaction relationship. The relationship is to the room it’s cleaning.

I think it’s even more interesting the company has kits that are available in the marketplace called “Create”. It’s essentially the Roomba vacuum cleaner kit that’s made for hacking. You put a really wacky–I mean, you can create things like “Bionic Hamster” which is attaching the kind of play wheel or dome a hamster uses as a driving device for the Roomba. So it is the ultimate revenge on the vacuum cleaner.

How I think about it myself is that design is the search for form. What form should this object take.

– Andrew Blauvelt, Design Curator, Walker Art Center (around the 20 minute mark)


First, if you are really interested in design, do yourself the favor of watching Objectified.

But the most important part about the section above–to me, the most valuable two and a half minutes of that movie (outside of the opening sequence, of course)–is the notion that design is not just making things “look pretty.”

When you really look at it, the Dyson vacuum is an ugly looking contraption of a machine.

What is important is considering design as a formal process. And much of this “form-giving” really involves the visual language used to express the design of an object, since it is that visual language which helps us understand the object, our relationship to the object, and how we interact with that object.

Look again at the Dyson vacuum. The use of color is deliberate. Specifically, in the photo linked, the yellow parts are all components which either articulate, or which are components that can be moved, taken apart or put back together. The brush, for example, is the yellow thing at the bottom–and can be disassembled easily and reassembled for cleaning. Even the bucket which stores dirt can be disassembled by separating the yellow and gray components, as can the pathway which brings air in from the hose. (If you look carefully you see a small yellow thing just above the wheel; this is a lever which is used to disassemble the components which bring air into the storage bucket, and periodically need cleaning.)

This use of yellow expresses not just the form or function of the device–but clearly articulates how to use the vacuum: how to remove the cord, how to turn it on, how to take it apart and put it together for maintenance.

Yellow, in this example, is part of the visual language, and the consistent use of yellow is used to express function–to guide the user, to make using the vacuum easy.


Perhaps you’ve seen the poster. Perhaps not. But it says:

A user interface is like a joke.

If you have to explain it, it’s not that good.

The Dyson Vacuum, through the consistent use of the color yellow, and through the carefully considered shapes of the yellow components, does not need to explain itself. It’s clear how to take apart the air pathway. It’s clear how to remove and clean the basket. It’s even clear how to disassemble down to the replaceable filter. Just consider the yellow parts, and twist, press or pull as the shape suggests.

This lesson reveals a larger one: if you want your user to understand how to use your interface, design and use a consistent visual language.

This includes the consistent use of shape and of color, so that the same shape or visual design performs the same way consistently across your entire application. Further, the same shape or visual design does not have multiple behaviors, and behaviors are not hidden behind different shapes or colors.

The best user interfaces are like the Dyson Vacuum. Good design does not have to be beautiful, but it should be suggestive of functionality. Good design should not hide functionality, though it does not need to be overly obtrusive.

And this is where things start falling apart on later versions of iOS. 3D Touch, for example, allows you to ‘hard press’ on an icon on the home screen of a later iOS device (like the iPhone 7) and have a pop-up preview or menu appear.

But how do you know to ‘hard press’ that icon or image?

How many of you who own an iPhone 7 even know you can do this?

You see the same thing with Android with the “long press”, which was used in prior user interface guidelines to show a pop-up menu to reveal further functions. (The new Material design tries to get away from this gesture–but how many people know the way they delete e-mails on earlier versions of Android was to long-tap the e-mail?)

But hell, even Apple knows these “hidden” gestures create a problem:

Adopt Peek and Pop consistently. If you support Peek and Pop in some places but not others, people won’t know where they can use the feature and may think there’s a problem with your app or their device.

And yet Apple never answers that most fundamental question: how does the user even know they can hard-press something in the first place? How do they know they can swipe left? How do they know they can swipe down? How do they know what they can do?

Now on iOS 6, at least some of these questions were answered: buttons had rings around them or were icons arrayed in a row.

But in the tension between visual clarity and being visually clean and visually beautiful, beauty won over understanding. We’ve moved away from the Dyson Vacuum–from the “functionalist” approach to deconstructing design and creating a consistent visual language (where checkboxes, radio buttons, drop down lists and the like are easily understood because we’ve consistently used similar art to express the same functionality), towards a flat design which dispenses with any hints of functionality.

Today’s interfaces are as beautiful as users are befuddled.


But it gets worse. In dispensing with the logical deconstruction of interface and the creation of consistent user interface “languages” (such as consistently using the same gesture to mean the same thing regardless of the context in which we operate), modern design is less about creating design manuals which express the visual language being used and how to consistently apply that language to solve design problems–and has become more about creating beautiful interfaces, but with absolutely no respect for usability. Usability has become, in today’s world, an afterthought, the equivalent of cargo cult thinking where we echo form without understanding what gives form.

Just look at the first of the core values listed on Apple’s human interface guidelines page, which focuses on Aesthetic Integrity, Consistency and Feedback–but without ever describing what these mean other than from an aesthetic perspective.

It’s no wonder people in the original Slashdot article claim Apple can no longer design stuff.

Because to us “design” is no longer about the formal relationship of the design, or about the symbolism and content–but about if it looks pretty.

And thus, that notch on top of the iPhone X is considered “bad design” because it “looks ugly” without ever considering if the design (and the design language) serves a purpose or provides functional usability.

Because we don’t have the tools to describe the formal relationship between a human and a computer, all we’re left with is the artistic qualities of that object. So we look at iOS 7 and don’t understand why it fails. We don’t see that in moving towards the “print page” look and feel of user design, iOS 7 has eschewed the very visual hints we need to know if a red line of text is a button, or just a highlighted passage.

We’re left not knowing if the yellow knob on the vacuum cleaner is a lever, or an immovable bit of plastic added for style that we will break if we attempt to twist it.

We don’t understand why iOS 7 (and later versions of iOS) fails, even though the rather ugly Dyson Vacuum succeeds.

And sadly we are left with the ultimate message of the Slashdot article: that iOS 7 (released in 2013) fails because the iPhone X (which has yet to ship) has a notch at the top.


Weirdly, in the process, our design becomes more artistically radical and yet more conservative, more entrenched. Design patterns we barely understand (such as the inverted “L” of web design, or the bottom tab buttons of a mobile device, or using grids to lay web pages out) are reused without understanding what motivates those choices, because they seem familiar to us. We no longer have the tools to create truly revolutionary designs because we no longer know what makes them work.

Worse, in our conservatism mobile device design becomes web page design but on a smaller scale. Web page design becomes mobile device design but on a larger scale. Desktop applications become mobile web page design but with menu bars and multiple windows.

All of it has become ritual without understanding why.

We’re dancing naked around a fire hoping the Gods will deliver rain to our crops.


You want to create good design?

Then first, you must create a consistent visual language. You must not start with a blank page and start drawing stuff on the page until it looks pleasing. Instead, you must start with a “design manual” for your application.

And you must answer some questions–because in today’s age we no longer have design guidance like we used to.

Questions like “how should I decompose my application?” “How should I consistently present printed information, photos, lists?” “How should I separate sections of information?”

And even more basically, “what does a button look like?” “How does that button behave?” “How can I tell the user that my icon is tappable?” (and that could be simply a question of consistent placement), or “how do I tell the user that this button will reveal multiple options?”

Even deeper than this, we must answer fundamental questions such as “what are the nouns–the objects–in my application?” “What are the verbs?” (That is, the actions which operate on those nouns.) “What are the adjectives?” (The modifiers which modify nouns, like color to suggest a value is negative or bad or out of bounds.)

In many ways, because your users have a formal relationship with your application, and because your goal is to clearly communicate how to use what is probably a much more complex and feature-rich application than “pick up vacuum, press button, suck up dirt”, you probably want to eschew the beautiful in favor of the formal. (But even the Kone suggests how it is to be used: the seam bisecting the cone shows the point where the base separates from the enclosed vacuum, the flat top reveals the on/off button.)

And if that means you have little gray rectangles and small dots in places which make your application look more cluttered than you’d like–consider if your user is then able to use your application.

Beyond this, while you should respect the media and the established conventions where your design appears, so long as you are consistent in your designs, you can explore new ideas and new gestures. And you can consider new problems few designers are considering–such as the problem that on larger mobile devices, users can no longer hold the device in one hand and use it with their thumb if your controls are placed at the top of the screen where they are no longer in reach.

Because that’s the bottom line, isn’t it? Not if your application is pretty, but if your users can use it–and continue to use it, and continue to help you generate revenue.

OhMyGoodness, getting rid of affordances makes it harder on users? Who knew?

As seen on /.: It’s official: Users navigate flat UI designs 22 per cent slower

Have you ever bonked your head on a glass door because you had no clue how to open the door–because the architects decided to make the design “clean” by getting rid of anything that ruined the clean lines of the door?

Yeah, that’s our modern UI environment.

[Image: Door]

I promise you this is a picture of a door. Do you see how to open it?


I mean, look at the examples provided here.

First, let’s dispense with the stupid items listed as “features of flat design”. They list, amongst the supposed advantages of a flat design, “functionality”, “close attention to details” and “clear and strict visual hierarchy”–because before the invention of flat design, none of us wanted to deliver functionality, and most of us slopped our user designs together in the same way we slop pigs. (*eye roll*)

And let’s look at the supposed “advantages”: “simplicity of shapes and elements”, “minimalism” and “avoiding textures, gradients and complex forms.”

Which suggests to me the problem with the photo of my door above is that it contains a complex shape and unclear hierarchy which distracts from the functionality of the door.

Here. Let me fix that.

[Image: Door2]

I know the difference is subtle, but to the purist, it makes the door much better looking. No more distractions from the pure essence of a door, one that has a single unitary shape, a minimalist door free of visual distractions.

Right up until you face-plant yourself because you can’t open the god-damned thing.

I mean, look at the animated example they give:

[Animated image]

Setting aside the cute (and distracting) animation of the weather icon to the side, how does the user know that by tapping and dragging he expands and shrinks a region? How does he know that it doesn’t scroll the page instead? Or that tapping (instead of swiping) would expand or shrink an area? Or that tapping instead pulls up an hourly prediction?

How does he know that swiping left and right gives the previous and next day’s weather prediction?

And notice the design is not entirely free from complex shapes. The two icons in the upper right? Is that two separate buttons, or a single toggle (as the shading suggests)?

Or notice the location in Ukraine. Is the location tappable? Can we pick a new location?

The key here is that the user does not have a fucking clue. And let’s be honest: there is no delight in a “discovery” which seems more designed to make the user feel like a stupid idiot.

I’m not going to even address the complex and superfluous animations which, while cute (and perhaps even demanded in some markets), exist only to say how great the application is, and provide absolutely no aid to user comprehension.


Look, I’m not asking for buttons and checkboxes and the like.

It’s not like you have to beat your users over the head; you can have clean lines and still use affordances which subtly guide the user on how to use your application. Just create a consistent visual language so that, for example, all shapes with a small dot in the corner can be resized by dragging.

But I am suggesting if the user needs to spend time figuring out how to open the door, they’re less likely to go through the door.

And you lose users. And revenue.

Some thoughts on designing a computer language.

Designing a computer language is an interesting exercise.

Remember first, the target of a computer language is a microprocessor or a microcontroller. And microprocessors or microcontrollers are stupid: they only understand integers, memory addresses (which are just like integers; think of memory as organized as an array of bytes), and if you’re lucky, floating point numbers. (And even there, they’re handled like integers but with a decimal point position. Stored, of course, as an integer.)

Because of that, most modern computer languages rely on a run-time library. Even C, which is as close to writing binary code for microprocessors as most of us will ever get, relies on a run-time library to handle certain abstract constructs the microprocessor can’t. For example, a ‘long’ integer in C is generally assumed to be at least 32 bits wide–but if you’re on a processor that only understands 16-bit integers, any 32-bit operation on a long integer must be handled with a subroutine call into a run-time library. And heck, some microcontrollers don’t even know how to multiply numbers, which means a * b has to translate internally into __multiply(a,b).
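To make that concrete, here is a rough sketch of what such a helper might do, using nothing but shifts, adds and bit tests. The name softwareMultiply is made up for this example (it stands in for the __multiply(a,b) above), and a real run-time library would more likely be hand-tuned assembly than anything like this:

```java
final class SoftMul {
    // Hypothetical stand-in for a run-time library's __multiply(a, b).
    // Uses only shifts, adds and bit tests; the result wraps to 32 bits
    // exactly like Java's built-in int multiplication.
    static int softwareMultiply(int a, int b) {
        int result = 0;
        while (b != 0) {
            if ((b & 1) != 0) {
                result += a;   // add the shifted multiplicand for each set bit of b
            }
            a <<= 1;           // multiplicand times the next power of two
            b >>>= 1;          // unsigned shift, so a negative b still reaches zero
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(softwareMultiply(6, 7));   // 42
        System.out.println(softwareMultiply(-3, 5));  // -15
    }
}
```

The loop runs at most 32 times, once per bit of b, which is why a missing multiply instruction is slow but not fatal.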

For most general-purpose programming languages (like C, C#, C++, Java, Objective-C, Swift, Ada and the like), the question becomes “procedural programming” or “object-oriented programming.” That is, which paradigm will you support: procedures (like C) or objects (like Java)?

Further, how will you handle strings? How will you handle text like “Hello world?” Remember: your microprocessor only handles integers–not strings. And under the hood, every string is basically just an array of integers: “Hello world?” is stored in ASCII as the array of numbers [ 72, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 63 ], either marked somewhere with a length, or terminated with an end of string marker, 0.

In C, a string is simply an array of bytes. The same in C++, though C++ provides the std::string class which helps manage the array. In Java, all strings are translated internally into an array of bytes which is then immediately wrapped into a java.lang.String object. (It’s why in Java you can write:

"Hello world?".length()

since the construct “Hello world?” is turned into an object.) Objective-C turns the string declaration @"Hello world?" into an NSString, and Swift into a String type, which is related to NSString.
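If you want to see the “array of integers” hiding under a string, a small Java sketch like the following will print it. The class and method names here are mine, picked for illustration; getBytes and StandardCharsets.US_ASCII are standard library calls:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class StringBytes {
    // Returns the ASCII codes that make up the text, one byte per character.
    static byte[] asciiCodes(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] codes = asciiCodes("Hello world?");
        // Prints [72, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 63]
        System.out.println(Arrays.toString(codes));
        // Rebuild the string from the raw numbers.
        System.out.println(new String(codes, StandardCharsets.US_ASCII));
    }
}
```

Note that the Java array carries its own length, so there is no trailing 0 marker as there would be in C.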

Declarations also become interesting. In C, C++ and Objective-C, you have headers, which force your language to provide a mechanism for representing external linkage. Those three languages also provide a mechanism for representing abstract types, meaning for every variable declaration:

int *a;

which represents the variable a that points to an integer, you must be able to write:

int *

which represents the abstraction of a variable which points to an integer.

And for every function:

int foo(int a, int *b, char c[5]) {...}

You need:

extern int foo(int, int *, char[5]);

But Java does not provide headers, so it has less need for header declarations–but then adds the need to mark methods as “public”, “protected” or “private” so we know the scope of methods and variables which can be hidden in C by simply omitting the declaration from the header.

This means Java’s type declaration system can be far simpler than C’s.

And while we’re at it, what types are you going to allow? Most languages have integer types, floating point types, structure or object types (which basically represent a record containing multiple different internal values), array types, and pointer or reference types. But even here there are differences:

C allows the use of unsigned values, like unsigned integers. Java, however, does not–but really, the only effective differences in performing math operations between signed and unsigned integers are in right-shift, compare, divide and remainder operations (addition, subtraction and multiplication produce the same bits either way). And Java works around the first with the unsigned right shift (‘>>>’) operator.
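A quick sketch of how this plays out in Java. The shift operators are part of the language; Integer.compareUnsigned and Integer.toUnsignedString are library helpers added in Java 8, which the discussion above does not mention, and the class and method names are made up for this example:

```java
final class UnsignedDemo {
    // Compares two ints as if they were unsigned 32-bit values.
    static boolean unsignedGreater(int a, int b) {
        return Integer.compareUnsigned(a, b) > 0;
    }

    public static void main(String[] args) {
        int x = -8; // bit pattern 0xFFFFFFF8

        System.out.println(x >> 1);   // arithmetic shift keeps the sign: -4
        System.out.println(x >>> 1);  // logical shift fills with zero: 2147483644

        System.out.println(x > 1);                  // signed compare: false
        System.out.println(unsignedGreater(x, 1));  // unsigned compare: true

        // The same 32 bits printed as an unsigned number: 4294967288
        System.out.println(Integer.toUnsignedString(x));
    }
}
```

The bits never change; only the interpretation applied by the shift, the comparison and the printing does.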

C also represents arrays as simply a chunk of memory; C is very low level this way. But Java represents arrays as a distinct fundamental type, alongside basic types (like integers or floating point values) and objects.

And pointers or references can be explicit or implicit: C++ makes this explicit by requiring you to indicate in a function if an object or structure is passed by value (that is, the entire object is copied onto the stack), or by reference (that is, a pointer is passed on the stack). This makes a difference because updating an object passed by value has no effect on the caller. But when passed by reference, changes to the object can affect the caller’s copy–since there really is only one copy in memory.

Java, on the other hand, passes objects and arrays by reference, always.

This passing by reference makes the ‘const’ keyword (or its equivalent) very important: it can forbid the function being called from modifying the object passed to it by reference.

On the flip side, Java does not have the concept of a ‘pointer’.
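Here is a small sketch of what that means in practice for Java. Strictly speaking, what lands on the stack is a copy of the reference, so a method can change the object the caller sees but cannot make the caller’s variable point somewhere else. The names below are made up for illustration:

```java
final class PassingDemo {
    // Writes through the reference: the caller sees the change,
    // because there is only one array in memory.
    static void mutate(int[] values) {
        values[0] = 99;
    }

    // Rebinds the local copy of the reference only: the caller's
    // variable still refers to the original array.
    static void reassign(int[] values) {
        values = new int[] { -1 };
    }

    public static void main(String[] args) {
        int[] data = { 1, 2, 3 };
        mutate(data);
        System.out.println(data[0]); // 99
        reassign(data);
        System.out.println(data[0]); // still 99
    }
}
```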

And let’s consider for(...) loops. The C language introduces the three-part for construct:

for (initializer; comparator; incrementer) statement

which translates into:

        initializer
loop:   if (!comparator) goto exit;
        statement
        incrementer
        goto loop;
exit:

But Java and Objective C also introduce different for loop constructs, such as Java’s

for (type variable: container) statement

which iterates the variable across the contents of the container. Internally it is implemented using Java’s Iterator interface, and translates the for loop above as:

        Iterator<type> iterator = container.iterator();
loop:   if (!iterator.hasNext()) goto exit;
        type variable = iterator.next();
        statement
        goto loop;
exit:

Of course this assumes container implements the Iterable interface. (Pro-tip: If you want to create a custom class which can be used as the container in a for loop, implement the Iterable interface.)
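As a minimal sketch of that pro-tip, here is a made-up IntRange class (not part of the Java library) which implements Iterable so it can sit on the right-hand side of the colon:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical container: the integers from start (inclusive) to end (exclusive).
final class IntRange implements Iterable<Integer> {
    private final int start;
    private final int end;

    IntRange(int start, int end) {
        this.start = start;
        this.end = end;
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int next = start;

            @Override
            public boolean hasNext() {
                return next < end;
            }

            @Override
            public Integer next() {
                if (next >= end) {
                    throw new NoSuchElementException();
                }
                return next++;
            }
        };
    }

    // The for-each loop below calls iterator(), hasNext() and next() for us.
    static int sum(Iterable<Integer> container) {
        int total = 0;
        for (int value : container) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new IntRange(1, 5))); // 1 + 2 + 3 + 4 = 10
    }
}
```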

While we’re at it, if your language is object oriented, do you allow multiple inheritance, like C++ where an object can be the child of two or more parent objects? Or do you implement an “interface” or “protocol” (which specifies methods that are required to be implemented but provides no code), and have single inheritance, where objects can have no more than one parent object but can have one or more interfaces, such as in Java or Objective C?

Do you make exceptions a first-class citizen of your language, such as in Java or C++? Or is it a library, such as C’s setjmp/longjmp calls? Or is it even available? Early versions of Pascal did not provide for exception handling: instead, you must either explicitly handle problems yourself, or you must check to make sure that things don’t go haywire: that you don’t divide by zero, for example.

And we haven’t even dived into more advanced features. We’ve just stuck with the stuff that most general purpose languages implement. Ada has built-in support for parallel processing by making threads and synchronization part of the language. (Languages like C or Swift require a library–generally based on POSIX Threads–for parallel processing, though the availability of multi-threaded programming in those languages is optional.)

Other languages have built-in handling of mathematical vectors and matrices, or of string comparison and regular expressions. Some languages (like Java or LISP) provide support for lambda functions. And other languages combine domain-specific features with general purpose computing–such as PHP, which allows general-purpose programs to be written, but is designed for web pages.

Pushing farther afield, we have languages such as Prolog, a declarative language which defines the formal logic rules of a program without declaring the control flow to execute the rules.

(Prolog defines the relationships between a collection of rules, and performs a search through the rules in response to a query. Such a language is useful if we wish to, for example, provide a list of conditions that may be symptoms of a disease; a Prolog query would then list the symptoms, and after execution provide a list of diseases which correspond to those symptoms.)

But let’s ignore stuff like this for now, since my interest here is either procedural or object-oriented programming. (One could consider object-oriented programming as basically procedural programming performed on objects.)


The design of a programming language is quite interesting.

And how you answer questions like this (and other questions that may come up) really determines the simplicity of learning versus the expressive power of the language. Sadly, expressive power can become confusing and harm learning: just look at the initial promise of Swift as an easy and painless language to learn. A promise that has since been retracted, since Swift is neither a stable language (Swift 1 does not look like Swift 4), nor simple. Things like the type safety constructs ? (optional) or ! (forced) are hard to understand, since they rely on the concept of “pointers” and the safety (or lack thereof) of dealing with null pointers (that is, pointers to memory address 0, which typically means “not initialized” or “undefined”).

Or just look at how confusing the C type system can become to a beginner. I mean, it’s easy for a beginner to understand:

int foo[5];

That’s an array of 5 integers.

But what about:

char *(*(**foo[][8])())[];

What the hell???

Often you find C programmers avoiding the “expressive power” of C by using typedefs instead; declaring each component of the above as an individual type.

It is in large part C’s “expressive power” (combined with terse syntax) that allows contests like the International Obfuscated C Code Contest to exist: notice we don’t see an “obfuscated Java contest”.

Behold, a runner up in that contest.

But at least it isn’t APL, a language once described to me as a “write-only programming language” because of how hard it is to read, making use of special symbols rarely found on older computers:

(~R∊R∘.×R)/R←1↓⍳R

This is the Wikipedia example of an APL program which finds all prime numbers from 1 to R.

No, I have no clue how it works, or what the squiggly marks mean.

Simplicity, it seems to me, forgoes expressive power. Java, for example, cannot express the idea of an array of pointers to functions returning pointers to arrays–since Java does not have the concept of a pointer to a function (that’s handled by the reflection API), nor does Java have the concept of pointers. Further, Java does not permit the declaration of complex anonymous structures: first, everything is a class, and second, classes are either explicitly named or implicitly named as part of an anonymous declaration. It’s hard to declare something like the following C++ declaration; you’re forced to break down each component into its own declaration.

struct Thing {
    struct {
        int x;
        int y;
    } loc;
    struct {
        int w;
        int h;
    } size;
};

And it’s just as well; this makes more sense if you were to write:

struct Point {
    int x;
    int y;
};

struct Size {
    int w;
    int h;
};

struct Thing {
    Point loc;
    Size size;
};

It becomes clear that “Thing” is a rectangle with a location and a size.

But then, people often complain that Java requires a lot more typing to express the same concept.
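For comparison, here is one way the same three types might look in Java. This is only a sketch: I’ve assumed immutable fields with constructors, which is a common Java habit but not the only way to write it, and that choice accounts for much of the extra typing:

```java
final class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

final class Size {
    final int w;
    final int h;

    Size(int w, int h) {
        this.w = w;
        this.h = h;
    }
}

// A rectangle with a location and a size, built from the two named parts.
final class Thing {
    final Point loc;
    final Size size;

    Thing(Point loc, Size size) {
        this.loc = loc;
        this.size = size;
    }

    public static void main(String[] args) {
        Thing thing = new Thing(new Point(10, 20), new Size(300, 200));
        System.out.println(thing.loc.x + "," + thing.loc.y + " "
                + thing.size.w + "x" + thing.size.h); // 10,20 300x200
    }
}
```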


It’s a balance. It’s what makes all this so fascinating.