Big-O: Not so Big…

I’m trying to learn kotlin, and my standard procedure for picking up a new language is to try and implement your basic CS101-ish data structures. Binary search trees, linked lists, etc. Generally works pretty well – it seems everyone who knows what a compiler is has written a tutorial or article on those, so it’s easy enough to find the basics when I forget how to do them.

In looking into details on linked lists, I came across this post about why linked lists may not be as good for in-depth processing as they otherwise appear to be. The post is well worth the read – if nothing else, he’s got a decent writing style that makes it a little easier to grasp.

The gist of it, if you don’t feel like heading over there right now, is that the mathematic complexity of an algorithm or data type is really only part of the story when it comes to gauging possible performance, and it might not even be the most important part. He’s got graphs and benchmarks and lots of other stuff to back it up on the article, and there’s tons of debate in the comments – and in the comments on the CodeProject mirror. He dives in on the specific use case(s) of linked lists vs. contiguous data (i.e. arrays), and while he has a related article dealing with the topic in Java land, the original article is focused on C++, where memory management is much more in the programmer’s control.

For the most part, I’m a web developer; easily 90% of the performance bottlenecks I’ve dealt with have been due to database queries or cross-application communications. Dealing with stuff like this is virtually unnecessary in most of the work I’ve done; performance optimization consists of database organization or throwing more servers at the problem. That said, however, this article put things in a different perspective.

Big-O For Fun and Profit

So, mathematic complexity – as I’m (mis?)using it here, at least – is what’s more familiar in programming circles as Big-O notation. It tells you at a glance how complicated an algorithm is, and what you can generally expect for performance as your input grows. It normally looks something like ‘O(n)’ or ‘O(n2)’, where n is the number of items you’re processing in your algorithm. Some common ones are:

  • O(n): indicates the algorithm is linear; if it takes 1 second to process 1 item, it’ll take 20 seconds to process 20 items, etc.
  • O(n2): indicates the algorithm is quadratic; if it takes 1 second to process 1 item, it’ll take 400 seconds to process 20 items, etc.
  • O(log n): indicates the algorithm is logarithmic; processing those 20 items again will take about 1.3 seconds. (For math nerds: I used log10 to calculate that because the calculator on my Mac has that button. In big O notation, constant factors – such as the base of the log function – are dropped, so log2 and log10 would be equivalent.)
  • O(1): indicates constant time – those 20 items will take the same time as 1 item or 2,000,000 items.

There’s a crap ton more examples at Wikipedia if you’re interested, but I think we’ve got enough here to work with.

You’d be hard pressed to argue those aren’t useful in the general case; they give a great quickie view of how an algorithm operates – something that’s O(n) runs faster than something that’s O(n2) and slower than O(log n). And they hold true when it comes to the logical complexity involved. A brute-force, linear search of a sorted collection has to search every item in the collection until it finds the target, whereas a binary search cuts the number of possible items it needs to search in half every iteration. Linear search is O(n), binary search is O(log n); in the long run, binary search will generally outperform linear search.

And Now the Rest of the Story

Like I said above though: Big-O complexity is only part of the story when it comes to performance. The rest of the story is in the hardware, and this is where it gets fun. Well, fun-ish.

How fast your program works with data depends on where that data resides. If it’s in a CPU register, then you have it before you can even finish asking for it. If it’s in a file cabinet down the hall from your server room, then you either have an awesome program or someone forgot a data entry step. Regardless, it’s probably slower than getting it from the CPU register.

In between those, you have the CPU cache (with various levels, depending on the architecture), RAM, disk, network stores, etc. as places your data can reside. In Java-land, it’s kinda hard to ensure (or even really know) if your data fits in the CPU cache; if you know of a way to do so, by all means shoot for that. That’s (as the kids say these days) hella fast. RAM’s not terrible, and that’s where most stuff is going to go by default though; the important point made in Kjellkod’s blog is based on where the data gets stored in memory.

When you create an arbitrary object instance – regardless of whether you’re in Java or C++ or whatever the flavor of the week is – it gets assigned a chunk of memory. When you create another object, it gets assigned a different chunk. These two chunks may be right next to each other, or they may be at opposite ends of the memory pool; there’s no way to know or guarantee. In the case of a linked list, when you reference one node and then move on to reference the next node, the CPU needs to take data from separate blocks of memory.

If, however, you use a contiguous data structure – like an array, for example, or at least an array-based structure – then that data structure contains places for X items in one large block of memory. Moving from one item to the next in a contiguous memory block is much faster for the CPU than bouncing around RAM.

As a quick note, for primitives in Java (int, char, double, etc.) this holds completely true. However, for Objects it’s not quite perfect – your array just holds a reference to the object, which means the CPU still needs to locate the object in memory – where ever it may be – before you can work with it. You still do have the upshot of having those references in a single RAM block.

Stealing a quote from the Java Battle Royal article:

LinkedList insertion is: search time + O(1).
ArrayList insertion is: search time + O(n).

By mathematic complexity – Big-O notation – at a glance it looks like LinkedList insertion should be faster than ArrayList insertion; however, all of his examples show otherwise. He goes into a little more detail as to why, but the main idea is pretty simple: it’s based on where the data is stored in the computer (RAM or cache). With the LinkedList, it’s all over the place; with the ArrayList, it’s in a single clump.

Moral of the Story

As a programmer with 15+ years under my belt, I’ve spent a lot of time looking at algorithms and making decisions based mostly on their Big-O complexity. If we were working with simple data structures on a simple Turing machine, eliminating all the ins-and-outs of CPU cache and memory management, Big-O would probably be sufficient. We aren’t though, and it isn’t.

Computers are complex beasts, and while worrying about hardware structure and cache levels has been abstracted out pretty well (at least in the realm of Java web development), it’s still important enough to keep in mind and remember that Big-O isn’t the end-all-and-be-all of performance.


And what might you think?

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s