User Interface Truisms.

Fundamentally how hard it is to use a user interface depends on the cognitive load of that interface–that is, how much thinking you have to do in order to use the interface.

Now “thinking” is one of those fuzzy things that really needs to be quantified.

As a proxy for user interface complexity (or rather, how hard it is to use an interface), some people use the number of buttons that one has to press in order to accomplish a given task. But that is only a proxy: clearly, even though it takes 19 button presses (including the shift key) to type “Hello World!”, one would never argue that those 19 button presses are as hard to accomplish as navigating through 19 unknown interface menus.

That’s because the button press is a proxy for a decision point. The real complexity, in other words, comes from making a decision–which goes right back to cognitive load, the amount of thinking you have to do to accomplish a task.

So decision points are clearly a sign of cognitive load.

Now think back when you first saw a computer keyboard, and how mystified you were just to type a single letter: search, search, search, ah! the ‘h’ key! Success! Now, where is that stupid ‘e’ key? Oh, there it is, next to the ‘w’ key–why is it next to the ‘w’ key? Weird. Now, if I were the ‘l’ key, where would… oh, there it is, all the way on the other side of the keyboard. Weird. At least I know to press it twice. And now the ‘o’–ah, right there, next to the ‘l’ key.

Ooops. I have a lower case ‘h’. How do I back up? …

Clearly, then, along with decision points we have familiarity with the interface as a factor in how hard a user interface is to use: if we know how to touch type, typing “Hello World!” doesn’t even require thought beyond thinking the words and typing them. But for the uninitiated to the mysteries of the computer keyboard, hunting and pecking each of those keys is quite difficult.

Complexity, then, is cognitive load. And cognitive load comes from the difficulty of making the decisions needed to accomplish a task, combined with the unfamiliarity of the interface one has to navigate in order to accomplish that task.

* * *

Now, of course, from a user interface design perspective there are a few things that can be done to reduce cognitive load, by attacking the familiarity problem.

One trick is to have a common design language for the interface. By “design language” I’m referring, of course, to what you have to do to manipulate a particular control on the screen. If you always manipulate a thing that looks like a square by clicking on it, and clicking on it causes an ‘x’ to appear in that square (or disappear if it was already there), then you know that squares on the screen can be clicked on.

And further, if you know that a square with an ‘x’ in it means the item is somehow “selected” or “checked” or “enabled” or whatever, and you know that unchecking something means it’s “unselected” or “un-checked” or “disabled”, then suddenly you have some familiarity: you can quickly see boxes and realize they are check-boxes, and checking them means “turn this on” and unchecking means “turn this off.”

This idea of a design language can even extend to interfaces built strictly using text-only screens: if you see text that looks like [_________], and a blinking square on the first underscore, and typing types in that field, and hitting the tab key moves to the next [_________] symbol on the screen, then you know all you need to know to navigate through a form. Other text symbols can have other meanings as well: perhaps (_) acts like our checkbox example above, or acts like a radio button (a round thing you can select or unselect which has the side effect of unselecting all other related round button thingies), or whatever.

The point is consistency.

And this consistency extends beyond the simple controls. For example, if you have a type of record in your database that the user can add or remove from a screen, having the “add” and “delete” and “edit” buttons in the same place as on other screens where other records are added or deleted helps the user understand that yes, this is a list of records, and immediately he knows how to add, delete, and edit them.

Visual language provides a way for a user to understand the unfamiliar landscape of a user interface.

* * *

The other trick is selective revelation of the interface: revealing, in a guided and orderly way, the decisions that need to be made.

For example, imagine an order entry system where the type of order must first be selected, then the product being ordered, then product-specific information needs to be entered. This could be implemented by selectively showing the controls that need to be filled out at each step of the process, such as with a wizard. And notice that unneeded information (such as the size of a clothing item, unneeded when ordering a purse) can continue to be hidden if not needed.

The goal with this is to help guide the decision making process, to help gather the information in the order needed by the system. And by guiding the decision making process you reduce cognitive load: you ask only the questions that are needed rather than overwhelm the user with a bewildering array of interrelated choices, some of which (such as the clothing size of a purse) are nonsensical.
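
As a sketch of the idea (the class and field names here are hypothetical, not from any particular system), the logic deciding which controls to reveal at each step might look something like this:

import java.util.ArrayList;
import java.util.List;

public class OrderWizard
{
    enum OrderType { CLOTHING, ACCESSORY }
    enum Field { ORDER_TYPE, PRODUCT, SIZE, COLOR, QUANTITY }

    // Given what the user has decided so far, return only the controls
    // that should be visible for the next decision.
    static List<Field> visibleFields(OrderType type, boolean productChosen)
    {
        List<Field> fields = new ArrayList<Field>();
        fields.add(Field.ORDER_TYPE);
        if (type == null) return fields;        // step 1: pick the type of order

        fields.add(Field.PRODUCT);
        if (!productChosen) return fields;      // step 2: pick the product

        if (type == OrderType.CLOTHING) {
            fields.add(Field.SIZE);             // size stays hidden for a purse
        }
        fields.add(Field.COLOR);                // step 3: product-specific details
        fields.add(Field.QUANTITY);
        return fields;
    }
}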

* * *

The problem with all these user interface tricks (and there are plenty of others: arranging the information on the screen, tips and hints that dynamically come up, on-line help for the first time user, making the interface reflect a consistent cognitive model, reducing short-term memory load by segregating items into 7 +/- 2 items or groups, etc.) is that they all go towards tackling the familiarity problem of the interface. In other words, they only go towards reducing the cognitive load of the interface itself.

And, honestly, most of these design patterns are pretty well known–and only go towards reducing the cognitive load of the first-time user. Once someone has gained familiarity with an interface–even a very bad one–the cognitive load imposed by a poorly designed interface is like the cognitive load imposed by a computer keyboard: eventually you just know how to navigate through the interface to do the job. (To be clear, reducing familiarity cognitive load reduces training costs for an internal interface, and reduces consumer friction and dissatisfaction with an external interface–so it’s still important not to design a bad interface.)

Ultimately the cognitive load of a system comes from the decision points imposed by the interface. And a user interface can only present the information from the underlying system: ultimately it cannot make those decisions on behalf of the operator. (If the user interface could, then the decision wouldn’t need to be made by the operator–and the decision point really isn’t a decision point but an artifact of a badly designed system.)

What this means is that in order to simplify a product, the number of operator decision points must ultimately be reduced–either by prioritizing those decision points (noting which decisions are optional or less important to capture), or through redesigning the entire product.

* * *

Remember: a user interface is not how a product looks. It’s how the product works.

What’s wrong with business reporting of the Computer Industry

You read through a report on computer industry jobs, or you take a test in high school which leads you to believe you may have a future as a “Computer Terminal Operator” or a “Computer Software Analyst.” And the stuff you read makes no sense whatsoever–things like “A Computer Programmer converts symbolic statements of business, scientific or engineering problems to detailed logic flow charts into a computer program using a computer language”, and you think “what?” Or you see something like:

Disappearing Jobs:

Plus, the work of computer programmers requires little localized or specialized knowledge. All you have to know is the computer language.

And you think “WTF?!?”

Really?

Here’s the problem. All of these descriptions are based on an industry classification scheme first created by the Bureau of Labor Statistics, part of the United States Department of Labor. And the descriptions are hopelessly out of date.

In the world of the Bureau of Labor Statistics, this is how a computer program is created, executed, and the results understood:

* * *

First a business (such as a WalMart) decides that it has a business reason to create a new computer program. For example, they decide that they need a computer system in order to determine which products are selling better in one geographic region, so they can adjust their orders and make sure those products flow to that area.

This problem was probably identified by a Management Analyst (B026), whose job is to “analyze business or operating procedures to advise the most efficient methods of accomplishing work.” Or it was identified by an Operations Analyst (A065).

So they work with a Systems Analyst (A064) in order to restate the problem (“find the areas where different SKUs are selling better, compare against current logistical shipping patterns, and adjust future orders to make sure stores are well stocked for future demand”) into a detailed flow chart and requirements documents outlining how this process should work.

Once this flow chart has been agreed upon by the analysts and operations engineers and scientists, they turn the problem over to a Computer Programmer (A229) in order to convert the flow charts describing the problem into a computer program. This computer program is generally written using automated data processing equipment, such as a punch card reader.

The computer programmer verifies his program by working with a Computer Operator (D308), who submits his punched cards to the mainframe and, after a program run completes, returns the printout to the computer programmer’s “in box”, a wooden box used to hold the printouts designated for a specific programmer. (Computer operators “select and load input and output units with materials, such as tapes or disks and printout forms for operating runs.”)

Once the Computer Programmer has verified that his punch card deck is properly functioning, he will then eventually request (depending on the nature of the program) either to have his job run on a regular basis or to have it loaded into the business mainframe so that inputs to his program may be submitted by a Data Entry Keyer (also known as a Computer Terminal Operator, D385), or the process may be batch run by management, depending on whether the Computer Systems Administrator (B022) permits it.

* * *

Does this sound like the software industry you’re familiar with? No?

Here’s the problem. This is what every organization outside of the computer industry thinks goes on at places like Google, Apple, or Microsoft. Government decisions on education, the Bureau of Labor Statistics reports, and even reporting by places like CNBC are all driven by this image of the computer industry–an image that is about 30 to 40 years out of date.

And no-one in the government has figured it out, because every time they send out a survey on jobs to the computer industry, generally some guy somewhere goes “well, hell, none of this sounds like my guys. So I’m going to guess they’re all in the A064 category, because they need to think about what they’re writing–so they can’t be in the A229 category.”

Until someone tells the Bureau of Labor Statistics their designations are garbage, we’re just going to continue to get garbage out from the BLS, from government-managed school textbooks (which still advise people about professions such as “Systems Analyst”, a profession that doesn’t actually exist as such), and from reporting outlets like CNBC–all of which get their understanding from the BLS occupational classification system.

* * *

Like fish who don’t notice the water they swim in, we don’t really know how much the government and government classification systems affect our thinking in this country.

* * *

Addendum: Even job survey sites tend to use the same BLS classifications for classifying job and salaries, which is why most job sites talk about “Systems Analyst III” or “Computer Programmer II” job designations–which you will never see on a Google or an Apple job listing. It’s why figuring out salary requirements is such a royal pain in the ass for the computer industry as well–because everything is getting classified into buckets that are 30 years out of date.

Really?

Sorry I’m picking on this guy, but he brings up two things that really irritate the living hell out of me about our industry.

Having worked in software development for over 15 years and developing software for nearly 30 years,…

So… you’re counting coding when you were 5?

Look, for all I know this guy was a lawyer who started writing software out of college, but waited to jump ship until he was 35 and he’s now in his 50’s. But most developers I know who want to talk about how much programming experience they have count all the way back to the time when their mommy gave them a toy laptop with a BASIC interpreter.

I’m sorry, but I’d like to call “bullshit” on that.

In every other industry I’m aware of, experience is counted as professional experience, where you had a fucking job (or a volunteer job) where you actually went somewhere and did something for money (or to help with a volunteer organization). It involved regular hours and following directions from another managing supervisor, and it involved delivering stuff on a schedule.

Fucking around when you’re 8 on daddy’s computer with the copy of GW-Basic that came on the computer no more counts (in my mind) as “experience” than wrapping duct tape over a leaking air duct counts as experience as an HVAC contractor, or fixing a leaky pipe counts as experience as a plumber, or helping a friend nail boards together to tack up a broken piece of trim on their house counts as experience as a finish contractor. Renting a U-Haul doesn’t count as truck driver experience, helping your younger sister with her homework doesn’t count as teaching experience, and convincing a friend he should buy a new phone doesn’t count as sales experience.

So why in the name of Cthulhu does tinkering around with a BASIC interpreter after school in the 2nd grade count as development experience?

Really?!?

To me, experience is professional experience. College education can be counted if called out separately as college, in the same way that taking technical courses to become a plumber can count on your resumé towards being a plumber, if called out separately. Fucking around sniffing pipe glue doesn’t count.

By this metric, I graduated from Caltech in 1988 with a degree in Mathematics, with experience in computer graphics, computational theory, a touch of VLSI design and hardware design, and a touch of mathematical optimization theory. From 1987 to 1988 I worked for a company doing computer graphics for the Macintosh (I needed to take one class to finish my degree, so worked full time while finishing up), and since graduating I’ve worked non-stop as a software developer in a variety of fields.

Which means I have 23 years of professional experience and a 4 year degree from a pretty good school.

I don’t count the time when I got a TRS-80 back in the 70’s, or the time when I was 12 learning Z80 assembler to hack my computer, or the time I spent in high school learning LISP. If I did, I could say things like “well, I have been developing software for nearly 33 years”–or is that longer, since I started tinkering with programmable calculators and electronic circuits well before then?

I call bullshit on the practice. Sorry.

In my opinion development experience starts accumulating the first time you get a real fucking job, either post-college, post-high school, or after dropping out of college. Not from the time your elementary school teacher allowed you to play with the old Commodore 64’s after school if you finished your homework first.

You’ve guessed it, Scrum adresses all of these resulting in 99% – 100% on target delivery. So it’s not due to bad programmers if an agile process can fix this.

I also call bullshit on the idea that a one-size-fits-all management style fixes all the problems in the computer industry.

Excuse me, but Agile will fix all the ills of wrong estimation, wrong status updates, scope creep, and the like?

Really?

Because Agile will–what, exactly? Reduce the problem set into manageable chunks that can be fed to our steady stream of interchangeable cogs of programmers?

Don’t get me wrong; I think Agile is a reasonable tool for certain problems in a managerial toolkit, along with good bug tracking, a well organized QA process, motivated developers, and a good project plan. But Agile doesn’t fix a damned thing: it simply creates a regular communications channel between individual contributors and the managers who need to keep track of the bottom line.

It helps, in other words, if you have good managers and good developers. But it won’t do squat if you have dysfunction in the overall team. And while I’ve seen people say that this dysfunction isn’t Agile, I’d argue that if you use a business label to label not just the tool but the overall team result, then you’re no longer describing a tool–you’re describing a condition and attributing it to a tool.

And I despise circular thinking.

Software shipped well before the latest management fad came down the pike, and the introduction of this fad did not make managers good leaders or developers better programmers. And I’d even go so far as to argue that the practice simply altered the workplace, making it harder, not easier, to ship new development efforts which are more properly handled with a waterfall method with constant feedback from the development team.

There is a reason why large companies generally bring new products to market by buying the small companies who develop those products, rather than doing it in-house: all of these managerial fads can never replace a well motivated small team of people doing something either out of love or out of greed. Large companies take away motivation by greed, and they tend to marginalize those who have love for a project.

And Agile will never replace passion.

Thanks for reading my rant. And apologies to Stephan Schmidt whose post just set me off this morning–I actually agree with 90% of what he wrote about the whole “bad programmer” debate, which is honestly a different rant for a different time. Though I will note that just because I need a C compiler to be more effective than entering a program via a toggle switch panel, does that mean I’m a bad programmer because I need a tool to help me figure out which bits in program space should be set to 1, and which ones should be set to 0?

Sometimes you get what you want by accident.

A year ago I had thought to go back and get my Ph.D., thinking that eventually I’d like to turn this into either a teaching gig or into a research gig. I’m nearly 45, you see, and while I probably could go on and do the single contributor thing pretty much the rest of my life (since I love to learn new things all the time), at some point it’d be nice to think of the next generation.

So I talked to a couple of college professors about Computer Science Ph.D.s, and–meh. The problem is most of the interesting research is being driven by corporate need rather than by theoretical considerations. Which makes a certain degree of sense: the Internet and modern CPUs have introduced all sorts of interesting problems that really need solving far more than refining Turing Machines. And university research is often funded by–you guessed it–those corporations generating the problems that need to be solved.

Besides, you do anything for 20+ years, constantly striving to stretch your own talents the entire way, and chances are you probably have quite a bit you could be teaching the professors.

So I punted.

And instead I took a management gig thinking that perhaps I could use the opportunity to act like a college professor–teaching a group of graduate students. Except they’re professional developers with a decade and a half less experience than I, and the problems we’re working on aren’t as theoretical.

But I was hoping no-one would notice that while I was spending all my time with one publicly stated purpose–execution–I had one secret purpose all the while: developing my direct reports.

Now I’m reading The Extraordinary Leader as part of a management training program at AT&T. And something interesting stood out: one of the most important things a leader can do is develop and advance the people he leads.

Hmmmm…

I guess my secret agenda doesn’t need to be quite so secret.

So here we go: I think what I’m going to do is (a) when there are tight deadlines, keep the schedule by pushing back on unreasonable requests, setting a schedule that may be aggressive but can be kept, getting the resources we need, protecting my team, and helping out (very selectively) in areas where I can do the most good while leaving most of the work to my team. But (b) when our schedule is not quite so tight, help by creating team building and learning exercises–such as tomorrow’s brainstorming sessions, where I’m going to turn over the design of a new component to my team and have them come up with several alternative architectural designs, then debate the pros and cons of each.

My theory–and we’ll see if it works–is that by having such exercises, both as teaching exercises and as team building through constructive discussion of alternate ways of designing something, my team will learn new things through practice and become better developers and, if they wish, better leaders.

Wish me luck.

Fortunately management pays far more than a professorship.

Goodbye Far Clipping Plane.

I really wanted to write this up as a paper, perhaps for SIGGRAPH. But I’ve never submitted a paper before, and I don’t know how worthy this would be of a SIGGRAPH paper to begin with. So instead, I thought I’d write this up as a blog post–and we’ll see where this goes.

Introduction

This came from an observation that I remember making when I first learned about the perspective transformation matrix in computer graphics. The problem basically is this: the perspective transformation matrix converts from model space to screen space, where the visible region of screen space goes from -1 to 1 in the X, Y and Z coordinates.

In order to map from model space to screen space, typically the following transformation matrix is used:

    | fovy/aspect   0      0             0         |
    | 0             fovy   0             0         |
    | 0             0      (f+n)/(n-f)   2fn/(n-f) |
    | 0             0      -1            0         |

(Where fovy is the cotangent of the field of view angle over 2, aspect is the aspect ratio between the vertical and horizontal of the viewscreen, n is the distance to the near clipping plane, and f is the distance to the far clipping plane.)

As objects in the right handed coordinate space move farther away from the eye, the value of z decreases towards -∞, and after being transformed by this matrix, as our object approaches the far clipping plane f, zs approaches 1.0.

Now one interesting aspect of the transformation is that the user must be careful to select the near and far clipping planes: the greater the ratio between far and near, the less effective the depth buffer will be.

If we examine how z is transformed into zs screen space:

    zs = ((f+n)/(n-f)·z + 2fn/(n-f)) / (-z)
       = -(f+n)/(n-f) - 2fn/((n-f)·z)

And if we were to plot values of negative z to see how they land in zs space, for values of n = 1 and f = 5 we get:

[Graph: zs plotted against negative z for n = 1 and f = 5]

That is, as a point moves closer to the far clipping plane, zs moves closer to 1, the screen space far clipping plane.

Notice the relationship: as we move closer to the far clipping plane, the screen space depth acts like 1/z. This is significant when characterizing how accurately zs represents an object’s distance for drawing purposes.
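
As a quick sanity check of that behavior, here is a small snippet (the class name and loop are illustrative only) that evaluates the mapping above for n = 1 and f = 5, the same values used for the graph:

public class DepthMapping
{
    public static void main(String[] args)
    {
        double n = 1, f = 5;
        // Walk the model space depth from the near plane to the far plane.
        for (double z = -1; z >= -5; z -= 0.5) {
            double zs = -(f + n) / (n - f) - 2 * f * n / ((n - f) * z);
            System.out.println(z + " -> " + zs);   // -1.0 maps to -1.0, -5.0 maps to 1.0
        }
    }
}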

If we wanted to eliminate the far clipping plane, we could, of course, derive the terms of the above matrix as f approaches ∞. In that case:

    (f+n)/(n-f) → -1    and    2fn/(n-f) → -2n    as f → ∞

And we have the perspective matrix:

    | fovy/aspect   0      0    0   |
    | 0             fovy   0    0   |
    | 0             0      -1   -2n |
    | 0             0      -1   0   |

And the transformation from z to zs looks like:

[Graph: zs plotted against negative z with the far clipping plane at infinity]

IEEE-754

There are two ways we can represent a fractional numeric value: as a fixed point value, or as a floating point value. I’m not interested here in a fixed point representation, only in a floating point representation of numbers in the system. Of course, not all implementations of OpenGL support floating point mathematics for representing values in the system.

An IEEE 754 floating point representation of a number is done by representing the fractional significand of a number, along with an exponent.

    value = (-1)^sign × (1 + fraction) × 2^exponent

Thus, the number 0.125 may be represented with the fraction 0 and the exponent -3:

    0.125 = (1 + 0) × 2^-3

What is important to remember is that the IEEE-754 representation of a floating point number is not exact, but contains a representational error, since the fractional component contains a fixed number of bits. (23 bits for a 32-bit single-precision value, and 52 bits for a 64-bit double-precision value.)

For values approaching 1, the error in a floating point value is determined by the number of bits in the fraction. For a single-precision floating point value, the difference between 1 and the next adjacent floating point value is 1.1920929E-7, which means that as numbers approach 1, the error is of order 1.1920929E-7.
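
A quick way to see this gap, using nothing but the standard Java library:

public class UlpDemo
{
    public static void main(String[] args)
    {
        // Math.ulp returns the distance from a value to the next representable value.
        System.out.println(Math.ulp(1.0f));   // 1.1920929E-7 for single precision
        System.out.println(Math.ulp(1.0d));   // 2.220446049250313E-16 for double precision
        System.out.println(Math.ulp(0.0f));   // 1.4E-45: the gaps shrink as we approach zero
    }
}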

We can characterize the error in model space given the far clipping plane by reworking the formula to find the model space z based on zs:

    z = -2fn / ((n-f)·zs + (f+n))

We can then plot the error by the far clipping plane. If we assume n = 1 and zs = 1, then the error in model space zε for objects that are at the far clipping plane can be represented by:

[Equation: the model space error zε for objects at the far clipping plane, as a function of the far clipping plane distance f, with n = 1 and zs = 1]

Graphing for a single precision value, we get:

[Graph: model space error zε at the far clipping plane versus the far clipping plane distance, single precision]

Obviously we are restricted in the size of the far clipping plane, since as we approach 10^9, the error in model space grows to the same size as the model itself for objects at the far clipping plane.

Clearly, of course, setting the far clipping plane to ∞ means almost no accuracy at all as objects move farther and farther out.

The reason for the error, of course, has to do with the representation of the number 1 in IEEE-754 mathematics. Effectively the exponent value for the IEEE-754 representation is fixed to a scale of 2^-1 = 0.5, meaning as values approach 1, the significand approaches 2: the number is effectively a fixed-point representation with 24 bits of accuracy (for a single-precision value) over the range from 0.5 to 1.0.

(At the near clipping plane the same can be said for values approaching -1.)
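
To illustrate, the following snippet steps downward from 1.0 and prints the unbiased exponent field of each single-precision value; just below 1.0 the exponent stays pinned at -1, so only the 23 fraction bits distinguish neighboring values:

public class ExponentDemo
{
    public static void main(String[] args)
    {
        float x = 1.0f;
        for (int i = 0; i < 4; ++i) {
            x = Math.nextAfter(x, 0.0);                   // step down to the previous float
            int bits = Float.floatToIntBits(x);
            int exponent = ((bits >> 23) & 0xFF) - 127;   // unbias the 8-bit exponent field
            System.out.println(x + "  exponent = " + exponent);
        }
    }
}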

* * *

All values in the representable range of IEEE-754 have the same feature: as we approach the value, the representation behaves as if we had picked a fixed-point representation with 24 (or 53) bits. The only value in the IEEE-754 range which actually exhibits declining representational error as we approach it is zero.

In other words, for values 1-ε, accuracy is fixed to the number of bits in the fractional component. However, for values of ε approaching 0, the exponent can decrease, allowing the full range of bits in the fractional component to maintain the accuracy of values as we approach zero.

With this observation we could in theory construct a transformation matrix which can set the far clipping plane to ∞. We can characterize the error for a hypothetical algorithm that approaches 1 (1-1/z) and one that approaches 0 (1/z):

[Graph: relative error err/z versus distance for the 1-1/z and 1/z mappings, in single and double precision]

Overall, the error in model space of 1-1/z approaches the same size as the actual distance itself in model space as the distance grows larger: err/z approaches 1 as z grows larger. And the error grows quickly: the error is as large as the position in model space for single precision values as the distance approaches 10^7, and the error ratio approaches 1 for double precision values as z approaches 10^15.

For 1/z, however, the ratio of the error to the overall distance remains relatively constant at around 10^-7 for single precision values, and around 10^-16 for double-precision values. This suggests we could do away with the far clipping plane entirely; we simply need to modify the transformation matrix to approach zero instead of 1 as an object goes to ∞.

Source code:

The source code for the above graph is:

public class Error
{
    public static void main(String[] args)
    {
        double z = 1;
        int i;
        
        for (i = 0; i < 60; ++i) {
            z = Math.pow(10, i/3.0d);
            
            for (;;) {
                double zs = 1/z;
                double zse = Double.longBitsToDouble(Double.doubleToLongBits(zs) - 1);
                double zn = 1/zse;
                double ze = zn - z;

                float zf = (float)z;
                float zfs = 1/zf;
                float zfse = Float.intBitsToFloat(Float.floatToIntBits(zfs) - 1);
                float zfn = 1/zfse;
                float zfe = zfn - zf;

                double zs2 = 1 - 1/z;
                double zse2 = Double.longBitsToDouble(Double.doubleToLongBits(zs2) - 1);
                double z2 = 1/(1-zse2);
                double ze2 = z - z2;

                float zf2 = (float)z;
                float zfs2 = 1 - 1/zf2;
                float zfse2 = Float.intBitsToFloat(Float.floatToIntBits(zfs2) - 1);
                float zf2n = 1/(1-zfse2);
                float zfe2 = zf2 - zf2n;
                
                if ((ze == 0) || (zfe == 0)) {
                    z *= 1.00012;   // some delta to make this fit
                    continue;
                }

                // Print the relative error for each representation, tab separated:
                // double 1/z, float 1/z, double 1-1/z, float 1-1/z.
                System.out.println((ze/z) + "\t" +
                        (zfe/zf) + "\t" +
                        (ze2/z) + "\t" +
                        (zfe2/zf));
                break;
            }
        }
        
        for (i = 1; i < 60; ++i) {
            // Emit the quoted axis labels ("1e0","1e0",...) used for the graph.
            System.out.print("\"1e" + (i/3) + "\",");
        }
    }
}

We use the expression Double.longBitsToDouble(Double.doubleToLongBits(x)-1) to move to the previous double precision value (and the equivalent with Float for single-precision values), repeating (with a minor adjustment) in the event that floating point error prevents us from properly calculating the error ratio at a particular value.
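
For what it’s worth, the same step can be taken with Math.nextAfter (available since Java 6); a quick check that the two agree for positive finite values:

public class NextDownCheck
{
    public static void main(String[] args)
    {
        double zs = 0.25;
        // Math.nextAfter steps to the adjacent representable value in the given
        // direction; for positive finite doubles it matches the bit trick above.
        double viaBits = Double.longBitsToDouble(Double.doubleToLongBits(zs) - 1);
        double viaMath = Math.nextAfter(zs, Double.NEGATIVE_INFINITY);
        System.out.println(viaBits == viaMath);   // prints true
    }
}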

A New Perspective Matrix

We need to formulate an equation for zs that crosses -1 as z crosses -n, and approaches 0 as z approaches -∞. We can easily do this by the observation from the graph above: instead of calculating

    zs = 1 + 2n/z

We can simply omit the 1 constant and change the scale of the 2n/z term:

    zs = n/z

This has the correct property that we cross -1 at z = -n, and approach 0 as z approaches -∞.

[Graph: zs = n/z plotted against negative z, crossing -1 at z = -n and approaching 0 as z approaches -∞]

From visual inspection, this suggests the appropriate matrix to use would be:

    | fovy/aspect   0      0    0  |
    | 0             fovy   0    0  |
    | 0             0      0    -n |
    | 0             0      -1   0  |
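
As a rough sketch, here is what this matrix looks like built as the column-major float[16] array OpenGL expects (the helper name and parameter choices are illustrative only; the test program below uses my own Matrix class instead):

public class InfinitePerspective
{
    /**
     * fov    - vertical field of view in radians
     * aspect - width divided by height of the viewport
     * near   - distance to the near clipping plane
     */
    public static float[] perspectiveToZero(double fov, double aspect, double near)
    {
        float cot = (float)(1.0 / Math.tan(fov / 2.0));   // the "fovy" term above
        float[] m = new float[16];                        // all elements start at 0
        m[0]  = (float)(cot / aspect);                    // x scale
        m[5]  = cot;                                      // y scale
        m[11] = -1.0f;                                    // w' = -z
        m[14] = (float)(-near);                           // z' = -n, so zs = n/z
        return m;
    }
}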

Testing the new matrix

The real test, of course, would be to create a simple program that uses both matrices and compares the difference. I have constructed a simple program which renders two very large, very distant spheres, and a small polygon in the foreground. The large background sphere has a radius of 4×10^12 units, at a distance of 5×10^12 units from the observer. The smaller sphere is only 1.3×10^12 units in radius, embedded into the larger sphere to show proper z order and clipping. Both the front and back of each sphere are drawn.

The foreground polygon, by contrast, is approximately 20 units from the observer.

I have constructed a z-buffer rendering engine which renders depth using 32-bit single-precision IEEE-754 floating point numbers to represent zs. Using the traditional perspective matrix, the depth values become indistinguishable from each other, as their values approach 1. This results in the following image:

[Image: the rendering produced with the traditional perspective matrix, showing depth buffer errors]

Notice the bottom half of the sphere is incorrectly rendered, as are large chunks of the smaller red sphere.

Using the new perspective matrix, this error does not occur in the final rendered product:

[Image: the rendering produced with the new perspective matrix; the depth errors do not occur]

The code to render each is precisely the same; the only difference is the perspective matrix:

import java.io.IOException;

public class Main
{
    /**
     * @param args
     */
    public static void main(String[] args)
    {
        // Traditional perspective matrix, with the far clipping plane at infinity.
        Matrix m = Matrix.perspective1(0.8, 1, 1);
        renderTest(m,"image_err.png");

        // New perspective matrix, which converges to 0 instead of 1.
        m = Matrix.perspective2(0.8, 1, 1);
        renderTest(m,"image_ok.png");
    }

    private static void renderTest(Matrix m, String fname)
    {
        ImageBuffer buf = new ImageBuffer(450,450);
        m = m.multiply(Matrix.scale(225,225,1));
        m = m.multiply(Matrix.translate(225, 225, 0));
        
        // Background sphere: radius 4e12 units, centered 5e12 units from the eye.
        Sphere sp = new Sphere(0,0,-5000000000000d,4000000000000d,0x0080FF);
        sp.render(m, buf);
        
        // Smaller red sphere embedded in the background sphere.
        sp = new Sphere(700000000000d,100000000000d,-1300000000000d,300000000000d,0xFF0000);
        sp.render(m, buf);
        
        // Small green polygon in the foreground, roughly 20 units from the eye.
        Polygon p = new Polygon();
        p.color = 0xFF00FF00;
        p.poly.add(new Vector(-10,-3,-20));
        p.poly.add(new Vector(-10,-1,-19));
        p.poly.add(new Vector(0,0.5,-22));
        p = p.transform(m);
        p.render(buf);
        
        try {
            buf.writeJPEGFile(fname);
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Notice in the call to main(), we first get the traditional perspective matrix with the far clipping plane set to infinity, then we get the alternate matrix.

The complete sources for the rendering test which produced the above images, including custom polygon renderer, can be found here.

With this technique it would be possible to correctly render large landscapes with very distant objects without having to render the scene twice: once for distant objects and once for near objects. To use this with OpenGL would require adjusting the OpenGL pipeline to allow the far clipping plane to be set to 0 instead of 1 in zs space. This could be done with the glClipPlane call.

Conclusion

For modern rendering engines which represent the depth buffer using IEEE-754 (or similar) floating point representations, using a perspective matrix which converges to 1 makes little sense: as values converge to 1, the magnitude of the error is similar to that of a fixed-point representation. However, because of the nature of the IEEE-754 floating point representation, convergence to 0 has much better error characteristics.

Because of this, a perspective matrix different from the one commonly used should give better rendering accuracy, especially if we move the far clipping plane to ∞.

By using this new perspective matrix we have demonstrated a rendering environment using 32-bit single-precision floating point values for a depth buffer which is capable of representing in the same scene two objects whose size differs by 11 orders of magnitude. We have further shown that the error in the zs representation of an object’s distance should remain proportional to that distance–allowing us to have even greater orders of magnitude difference in the size of objects. (Imagine rendering an ant in the foreground, a tree in the distance, and the moon in the background–all represented at the correct size in the rendering system, rather than using the painter’s algorithm to draw the objects in order from back to front.)

Using this matrix in a system such as OpenGL, for rendering environments that support floating point depth buffers, would be a matter of creating your own matrix (rather than using the built in matrix in the GLU library), and setting a far clipping plane to zs = 0 instead of 1.

By doing this, we can effectively say goodbye to the far clipping plane.

Addendum:

I’m not sure, but I haven’t seen this anywhere else in the literature before. If anyone thinks this sort of stuff is worthy of SIGGRAPH and wants to give me any pointers on cleaning it up and publishing, I’d be grateful.

Thanks.

There is nothing new under the sun.

Well, the rant on TechCrunch has gone global: Tech’s Dark Secret, It’s All About Age.

Excuse me while I throw in my two cents, as a 44 year old software developer.

  1. Pretty much all of the useful stuff in Computer Science was invented by the 1960’s or 1970’s. Very little is out there today that is really “new”: MacOS X, for example, is based on Unix–whose underpinnings can be traced back to 1969, with most of the core concepts in place by the early 1980’s.

    Even things like design patterns and APIs and object oriented programming stem from the 70’s and 80’s. Sure, the syntax and calling conventions may have changed over the years, but the principles have stayed the same.

    For example, take the MVC model first discussed formally 20 years ago. The ideas behind that weren’t “invented” then; the MVC papers from Taligent simply codify common practices that were evolving in the industry well before then. One can find traces of the ideas of separating business logic from presentation logic in things like Curses, or in Xerox PARC’s work from the 1970’s. I remember writing (in LISP) calendrical software for Xerox as a summer intern at Caltech, using the principles of MVC (though not quite called that back then) in 1983.

    Or even take the idea of the view model itself. The idea of a view as a rectangular region in display space, represented by an object which has draw, resize, move, mouse click, and keyboard focus handler events, can be found in InterLisp, in NextStep, in Microsoft Windows, on the Macintosh in PowerPoint; hell, I even wrote a C++ wrapper for MacOS 6 called “YAAF” which held the same concepts. The specific names of the specific method calls have changed over the years, but generally there is a draw method (doDraw, -drawRect:, paint, paintComponent, or the like), a mouse down/move/up handler, a resize handler (or message sent on resize), and the like.

    The idea never changes; only the implementation.

    Or hell, the Java JVM itself is not new: from P-machines running a virtual machine interpreter for Pascal to the D machine interpreter running InterLisp, virtual machine interpreters have been around longer than I’ve been on this Earth. Hell, Zork ran on a virtual machine interpreter.

  2. I suspect one reason why you don’t see a lot of older folks in the computer industry is because of self-selection. Staying in an industry populated by Nihilists who have to reinvent everything every five years or so (do we really need Google Go?) means that you have to be constantly learning. For some people, the addiction to learning something new is very rewarding. For others, it’s stressful and leads to burnout.

    Especially for those who are smart enough to constantly question why we have to be reinventing everything every five years, but who don’t like the constant stress of it–I can see deciding to punt it all and getting into a job where the barbarians aren’t constantly burning the structures to the ground just because they can.

    I know for a fact that I don’t see a lot of resumes for people in their 40’s and 50’s. I’m more inclined to hire someone in their 40’s as a developer than someone in their 20’s, simply because you pay less per year of experience for someone who is older. (Where I work, there is perhaps an 80% or 90% premium for someone with 4 or 5 times the experience–a great value.)

    But I also know quite a few very smart, bright people who decided they just couldn’t take the merry-go-round another time–and went off to get their MBA so they could step off and into a more lucrative career off the mental treadmill.

    I have to wonder, as well, where I would be if I had children. Would I have been able to devote as much time reading about the latest and greatest trends in Java development or Objective C or the like, if I had a couple of rug-rats running around requiring full-time care? (I probably would have, simply because I’d rather, on the whole, read a book on some new technology than read the morning paper. I would have probably sacrificed my reading on history and politics for time with my children.)

  3. There is also this persistent myth that older people have familial obligations and are less likely to want to work the extra hours “needed to get the job done.” They’re less likely to want to pull the all-nighters needed to get something out the door.

    But in my experience, I have yet to see a development death march with constant overnighters paid off in pizza that didn’t come about because of mismanagement. I don’t know of another industry in the world where mismanaging resource sizing, then demanding your workers work overtime to compensate for that failure to do proper resource sizing and advance development planning, is seen as a “virtue.”

    And I suspect the older you get, the less likely you are to put up with the bullshit.

    Having seen plenty of products make it to market–and plenty not make it to market–and having lived through several all-nighters and product death marches, I can see a common theme: either a product’s sizing requirements were mismanaged, or (far more commonly) upper management is incapable of counting days backwards from a ship date and properly assessing what can be done.

    The project I’m on, for example, was given nearly a year to complete. And Product Management pissed away 7 of those months trying to figure out what needed to be done.

    The younger you are, the less likely you are to understand that three months is not forever, and that if you need to have something in customer hands by December, you have to have it in QA’s hands by September or October–which means you have to have different modules done by July. If you don’t have the experience to understand how quickly July becomes December, it’s easy to simply piss away the time.

    So I can’t say that it’s a matter of older people not being willing to do what it takes–if upper management also was willing to do what it takes, projects would be properly sized and properly planned. No, it’s more a matter of “younger people don’t have the experience to do proper long-term planning to hit deadlines without working overtime,” combined with “younger people don’t have the experience to call ‘bullshit’.”

  4. There is also, as an aside, a persistent myth that it takes a certain type of intelligence or a certain level of intelligence to be successful in the software industry.

    I’m inclined to believe more in the 10,000 hour rule: if you practice something for 10,000 hours, you will become successful at that thing.

    Intelligence and personality could very well help you gain that 10,000 hours: the first few hours of learning how to write software or learning a new API or a new interface can be quite annoying and stressful. But if you persist, you will get good at it.

    Which means IQ and personality, while perhaps providing a leg up, doesn’t guarantee success.

    It’s why I’m inclined also to want to favor more experienced and older developers who have persisted with their craft. If we assume 6 hours of actual development work a day (with the other 2 spent on administrative stuff), then a work year has only about 1,500 hours–meaning 10,000 hours takes about 7 years to accumulate. Assuming you start out of college at 21, this means that anyone under the age of 28 will not have sufficient experience to be good at their craft.

    And that assumes they practiced their craft rather than just going through the motions.

The whole “it’s all about ageism” in the tech industry is an interesting meme–simply because it’s far more complicated than that.

It’s not done until you document your code.

I remember the original “Inside Macintosh.” I actually still have the loose-leaf binder version of “Inside Macintosh” that shipped with System v1.

The original Inside Macintosh documented the “Macintosh Toolkit” (the acronym “API” wasn’t in common use then), and, aside from two introductory chapters–one which described the OS and one which documented a sample application–each chapter followed the same formula. The first part of the chapter, consisting of anywhere from one to a dozen pages, would provide an overview of that toolkit. For example, the “Resource Manager” overview describes what resources are, why resources are important, and how resources are stored on disk. The second part of the chapter would always be “Using the XXX Manager”–giving examples which generally followed the pattern of how you initialized that manager, how to create or manipulate its fundamental objects, how to dispose of those objects, and how to shut the manager down. This made up the bulk of the chapter. And the end of the chapter would be a summary–generally a summary of the header file for that manager.

It always struck me that such a model was a great way to handle documentation. Start with a 1 to 3 page introduction to whatever fundamental module you are documenting–a “module” consisting of a logical unit of functionality, which could consist of several classes. Then launch into a 20 page document showing how you use that module: how you start it up, how you use the major features, how you configure the major features, how you shut it down. And give snippets of code showing how these are done.

And the summary should point to either the generated JavaDocs or HeaderDoc generated documentation giving the specific calls and specific parameters for each call.

What is interesting about such a model is that it should be fairly easy to write for the technical person creating the toolset: he knows how he wants his class set to be used, so he should be able to craft documentation describing how to use it. For a class which presents a table, for example, the developer has a mental model of how tables are displayed: a delegate or data source is created which responds to certain calls and is attached to the table.
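
For example, the kind of snippet such a chapter would include might look like this (the names here, Table and TableDataSource, are purely hypothetical, standing in for whatever the real toolkit provides):

public class TableExample
{
    // The data source the table asks for its contents as it draws.
    interface TableDataSource
    {
        int rowCount();                     // how many rows to display
        String valueAt(int row, int col);   // the value for a given cell
    }

    static class Table
    {
        private TableDataSource source;
        void setDataSource(TableDataSource s) { source = s; }
    }

    public static void main(String[] args)
    {
        Table table = new Table();
        // Attach a data source; the table pulls its rows and cell values from it.
        table.setDataSource(new TableDataSource() {
            public int rowCount() { return 3; }
            public String valueAt(int row, int col) { return "cell " + row + "," + col; }
        });
    }
}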

It has always been my opinion that writing code is simply one part of a multi-step process which ultimately results in your code being used. After all, isn’t the whole point of creating code to get people to use it? Developers rail against “lusers” who refuse to learn how to use their computers–but I suspect it’s because developers know their fellow developers are more likely to work through the source kit than the average person, and it allows them the excuse not to write the documentation they should write.

Your code isn’t finished until it is well documented. And complaining about people who are confused without good documentation is simply shifting the blame.