OCTools

OCTools is a suite of tools which serve as a plug-in replacement for yacc and lex. The goal
is to provide a (roughly) source compatible tool which can convert yacc and lex
grammars into Objective C output for building parsers that run on MacOS and iOS.

The source kit can be found on GitHub.

Both tools can be compiled on the Macintosh using Xcode, and generate a command-line tool which can be run from the terminal or included into an Xcode project. (Both tools are built in C, so they should be portable to other platforms; however, I haven’t done the port so I’m not sure how successful porting would be.)

The goal of this project was to create Yacc and Lex analogs which generate re-entrant Objective C classes which implement the parsers, and for the Lex analog to use an Objective-C protocol definition for the input file, for maximum flexibility.


More information can be found here: OCTools.

Quiet Insanity and YACC.

One of the things I wanted to do involves having a parser generated from a grammar, similar to YACC.

But I need the code generated in Objective C. And I need a parser that is re-entrant, so it can be run in a separate thread.

Now there are a number of solutions out there. But what I want is an LR(1) or GLR based parser built via a state machine which can be incorporated into Xcode and which generates Objective C code that can be used in an iPhone or iPad.

And let’s be honest, a lot of advise out there is really fucking stupid. Like this:

Code generation is not the “true way” in dynamic languages like Objective-C. Anything that can be achieved by a parser generator can be achieved at runtime. So, I’d suggest you try something like ParseKit, which will take a BNF-like grammar, and give you various delegate hooks you can implement to construct your parser.

That sound you just heard was my eyes rolling.

The reason, by the way, why you may wish to precompile a grammar rather than compile it at runtime is because generally (a) your grammar won’t change, and (b) the more computational time you can use evaluating the grammar, the more compact the grammar can be that you generate.

So without any real good solutions I thought how hard can it be to roll my own?

Well, fairly hard, mostly because the documentation out there for constructing LR(1) grammars sorta sucks.

So I started writing a document that attempts to describe the LR parsing algorithms out there in sufficient detail to roll my own YACC-like parser.


I haven’t quite figured out the GLR grammar parsing techniques in “Efficient Parsing for Natural Language, so it hasn’t been included yet.

But the rest should be there.

And sadly the whole thing grew to 66 pages in length, even without the GLR stuff.

This is a preliminary version, of course. Eventually I plan to upload all this to my GitHub account.

Elaborating on that bug from yesterday.

The code which caused me problems which I traced to a rather popular third-party networking library–a networking library that was (in my opinion) rendered obsolete with the introduction of Grand Central Dispatch and NSURLConnection’s sendSynchronousRequest:returningResponse:error: method–which, by the way, is two generations deprecated from today’s NSURLSession class–was a retain cycle that the developer of that library choose to ignore rather than fix in the most obvious way.

The offending code was something like this:

self.callbackBlock = ^{
    [self doSomething];
};

In order to understand why this is a problem we need to look into how Objective C blocks work.

Generally blocks are created on the stack. In order to allow a block (essentially a form of lambda expression for Objective C) to exist beyond the current execution scope, you must make a copy of the block. In creating a copy we create an Objective-C object in memory which must strongly retain any values outside the scope of the block for use inside the scope.

That is, at the end of the assignment to callbackBlock we get in memory something like this:

The retain cycle should be obvious here:

Basically the block holds a pointer to self, and self holds a point to the block. If we then free self, the reference count from the reference holding self is decremented by 1, but the self object and the block object–because they hold references to each other–are never freed:

Memory is leaked–and if self is something complex (like a UIViewController), a lot of memory is leaked.

Oh, but I break the retain cycle in my code by expressly deleting the reference to my block when I’m done with it!

Yeah, because your code is bug free. (Sheesh)


Look, “I deal with it in code so it’s no big deal” is a fucking excuse, not a reason.

And it’s an excuse that is dependent on a misunderstanding of the nature of ARC and the retain process. So if you say “but I break the retain cycle in code”, you are a fucking idiot. Well, not a complete fucking idiot–just smart enough to shoot yourself in the foot.

It’s simple, really. The assumption behind ARC is that all allocations are in a tree-like structure of memory references. Things hold other things in a tree-like hierarchy: your UIViewController holds the root view it manages. The root view holds the other views in the hierarchy. The NSDocument object holds the root of the data structures representing your document. And so forth.

By holding references in a strict tree-like structure, it is clear what objects need to be freed when the retain count for a particular object hits zero: everything below it needs to be released. When a view goes away the children views also need to go away. When a block goes away the objects held by that block can be released.

Now of course sometimes we need back references up the tree. And those back references should be handled with weak references; references which do not increment the reference count, but are intelligent enough to be set to nil if the contents are deallocated. (I imagine a linked list of weak references associated with each object, though I’m not entirely sure how weak references are handled.)

This way we don’t get retain cycles.


So what do you do instead of writing

#pragma clang diagnostic ignored "-Warc-retain-cycles"

above your block where the compiler is complaining about a retain cycle?

Simple. Don’t break the cycle in code; that’s just an invitation to a memory leak, and depending on how you try to break the cycle, the cycle may never be broken. (I’m looking at you, AFNetworking.) Break the cycle using a weak reference.

In the example above, this means writing:

__weak MyObject *this = self;
self.callbackBlock = ^{
    [this doSomething];
};

When we create a weak reference cycle what we wind up constructing in memory is this:

That is, while self holds the block, the block holds a weak reference to self. This breaks the retain cycle by preserving the strict tree-like ordering to the hierarchy.

Then, as we free the reference to self:

And as the retain cycle reaches zero in self, it gets freed:

and then the block goes away itself.


So what happens if we have a separate reference to the block?

Well, again, notice the weak reference to self in block, which gets zeroed out.

So if you ever want to be extra cautious you could write:

__weak MyObject *this = self;
self.callbackBlock = ^{
    if (this) [this doSomething];
};

But remember a method invocation on a nil reference is not illegal in Objective C; it simply invokes a nil method which returns nil. So the

if (this)...

is not strictly necessary.


Today’s tl;dr: always break retain cycles when writing block code.

Never use “-Warc-retain-cycles”–even if you think you know what you’re doing. Because the very fact that you’re contemplating breaking a retain cycle in code clearly demonstrates you do not know what you’re doing.


Addendum: One thing I’ve heard developers say is that sometimes you have to break the rules in exceptional cases.

Sure. I’ll buy that. I’ve even done that.

However, your lack of understanding is not an exceptional case. It’s just stupidity.

Retain cycles, or why your iOS app sucks.

One of the reasons why I hate the over-use of third party libraries is that, while in theory it’s “the more eyeballs the better the code”, in practice none of us who use a third party library ever bother to trace bugs into the third party library. It’s because we think that third party library (especially one that has been around for a long time) is somehow “bug free”–when, as we all know, “too many cooks spoil the broth”.

(That is, we’ve seen in practice the more people who work on a software project in a corporate setting, the more likely there are bugs in the source kit, as different developers with different levels of aptitude work on the same source kit to varying results, often unchecked by his peers as most software projects are just too big for everyone to understand. So why is this different with third party libraries?)

In my case I managed to trace down a bug.

Let’s put it this way:

If, in your source kit, you have something like this:

#pragma clang diagnostic ignored "-Warc-retain-cycles"

You have a really bad memory leak bug which affects Every. Single. Fucking. App. that uses your code.

Now I’d forgive this if we’re talking about a small project and the person working on it wasn’t all that established in the community.

But this is a big honking library which is used by a lot of companies, where there are tutorials floating around on sites like NSHipster and Ray Wenderlich, and is often the very first library everyone wants to include that may engage in any sort of networking operations.


The good news is that 3.0 of AFNetworking did away with the problem.

The bad news is that 2.0 of AFNetworking has the problem in spades.

Things to remember: opening an SSL socket using CFStream

This was a pain in the neck, but I finally figured out the answer.

So I was attempting to open a connection to a server from iOS which may have a self-signed certificate installed.

The specific steps I used to open the connection (from my test harness) was:

1. Set up the variables

	NSString *host = @"127.0.0.1";
	NSInteger port = 12345;
	CFReadStreamRef readStream;
	CFWriteStreamRef writeStream;

2. Set debugging diagnostics. Noted here so I can remember this trick.

	setenv("CFNETWORK_DIAGNOSTICS","3",1);

3. Open the sockets.

	CFStreamCreatePairWithSocketToHost(kCFAllocatorDefault, (__bridge CFStringRef)host, (uint32_t)port, &readStream, &writeStream);

	NSInputStream *inStream = (__bridge_transfer NSInputStream *)readStream;
	NSOutputStream *outStream = (__bridge_transfer NSOutputStream *)writeStream;

4. Set up SSL. The part that tripped me up: you must set the properties for kCFStreamPropertySSLSettings after setting NSStreamSocketSecurityLevelKey. It appears the NSStreamSocketSecurityLevelKey setting overwrites the kCFStreampropertySSLSettings parameter.

	// Note: inStream and outStream are linked by an underlying object, so
	// parameters only need to be set on one of the two streams.

	NSDictionary *d = @{ (NSString *)kCFStreamSSLValidatesCertificateChain: (id)kCFBooleanFalse };
	[inStream setProperty:NSStreamSocketSecurityLevelNegotiatedSSL forKey:NSStreamSocketSecurityLevelKey];
	[inStream setProperty:d forKey:(id)kCFStreamPropertySSLSettings];

5. Open and initalize as needed. Here I’m just opening (because I don’t care if this blocks; this is test code). If using blocking API, use threads. Otherwise use the runloop and delegate APIs.

	[inStream open];
	[outStream open];

On the Java (server) side, the way I set up my server socket for listening to incoming connections was:

1. Set up the variables. Note that my Config class is an internal class that reads properties, and is beyond the scope of this exercise.

	Properties p = Config.get();
	String keystore = p.getProperty("keystorefile");
	String password = p.getProperty("keystorepassword");

	int port = 12345;

2. Load the keystore and key manager. This can be a signed or (in my case) self-signed certificate.

	FileInputStream keyFile = new FileInputStream(keystore); 
	KeyStore keyStore = KeyStore.getInstance(KeyStore.getDefaultType());
	keyStore.load(keyFile, password.toCharArray());

	KeyManagerFactory keyManagerFactory = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
	keyManagerFactory.init(keyStore, password.toCharArray());

	KeyManager keyManagers[] = keyManagerFactory.getKeyManagers();

3. Create an SSLContext. Note that you cannot use “Default” for getInstance below, because that returns an already initialized context, and we want to initialize it with our parameters above. Also note that iOS 9 prefers TLS 1.2.

	SSLContext sslContext = SSLContext.getInstance("TLSv1.2");
	sslContext.init(keyManagers, null, new SecureRandom());

4. Open a ServerSocket class to listen for incoming connections. Note the constant 50 below is arbitrary.

	SSLServerSocketFactory socketFactory = sslContext.getServerSocketFactory();

	socket = socketFactory.createServerSocket(port, 50);

Note that the way I loaded the keystore is sort of the “hard way” to do this; my eventual goal is to have the Java startup code generate a self-signed certificate internally if a keystore is not provided, but I haven’t figured out how to do that yet. (There are plenty of pages out there that show how, but most of them rely on internal Java APIs, and I’m sort of allergic to using undocumented stuff.)

Creating a custom Key in Objective C

When creating a custom key in Objective C for NSDictionary or NSCache or the like, you need to create an object which does the following:

Implements <NSCopying> protocol.

If your key is invariant, you can implement the method copyWithZone: as follows:

- (id)copyWithZone:(NSZone *)zone
{
    return self;
}

Of course if your key is invariant, ideally you would create the key entirely using a custom init function, and mark all the properties (readonly).

Implement the isEqual: method.

This is part of the NSObject protocol. Note that any class (or nil) could be passed in as the argument to isEqual: method, so you may want to use the method isKindOfClass: to verify that you got what you expected as the parameter.

Implement the hash method.

This is also part of the NSObject protocol, and returns a NSUInteger value which provides a hash function of the data you passed in.

The hash function doesn’t need to be complicated. For example, if your key is three integers, your hash function could be as simple as:

- (NSUInteger)hash
{
    return (self.a << 8) ^ (self.b << 4) ^ self.c;
}

What is important is that two keys passed into your system are unlikely to have a similar value.

Also note that many of the classes that you see used routinely as keys (such as NSString or NSNumber) also follow this protocol. Meaning if your custom key has a string in it, you can use the NSString’s hash function as one of the inputs to your own hashing function:

- (NSUInteger)hash
{
    return (self.intVal << 16) ^ [self.stringVal hash];
}