But I wasn’t just interested in making a cheater for a word game – I wanted to explore what it would be like to try to write a computation-heavy service in Node.js.
Many programmers – even Node.js aficionados – would say that’s a ridiculous thing to want to do.
They would say it goes “against the grain” of the platform, which is just a way for inexperienced programmers to whip up simple servers that perform pretty well.
After doing this project my tentative conclusion is that Node.js can be a great choice for computation-heavy services. I’m not suggesting it should be your tool of choice for writing a movie encoder, but for the occasional CPU-bound task you might have when creating web-facing services, it’s got a pretty good toolbox. And it offers some scaling advantages you can’t get in other languages.
- cluster is the simplest way
- However, threads will force you to refactor into different services
- Which can run on other servers
- …or even the browser
LetterPwn is pretty fast – most results come back in a third of a second or less. But Node processes requests one at a time. Once you have a dozen or so people using the site simultaneously, those thirds of a second add up, and suddenly everyone starts cursing the spinning wheel on their screen.
And it is possible to craft a request to LetterPwn that would make it spin for even longer, absolutely killing performance for everyone else:
Most web apps don’t have this problem, simply because they don’t do much computation. They spend most of their time waiting for some other service to get back to them, like a database. The brilliance of Node is to exploit that fact, and use a very fast event loop which allows it to put a request on ice whenever it’s waiting for input. Meanwhile it goes off and serves other users.
But not every website is like this. What if we had an web service which applied 1970s filters to photos of overpriced beverages? There’s no input or output (I/O) there; the CPU is just grinding away at the pixels, probably for a few seconds. From Node’s perspective there’s no opportunity to switch tasks. Such tasks are called CPU-bound, versus the average website, whose tasks are I/O-bound. LetterPwn is just an example of such a CPU-bound service.
This model of sharing the resources of a computer is straight out of the 1980s, called cooperative multi-tasking. Every process has to voluntarily give up full control of the computer, or nobody else gets their work done. Every operating system you use today shares resources among tasks automatically.
In Node it’s even worse since that cooperation is usually implicit, based on a guess that input or output (I/O) is likely to happen soonish. So if you do real computation in Node, you suddenly become a very uncooperative single-tasker.
It turns out that you can solve these problems, with a number of techniques.
Being more cooperative
Typically, Node.js waits for some asynchronous I/O to yield to the next waiting process. But you can force that with
process.nextTick. (Update: as of Node 0.10 (changelog), you should
almost always use setImmediate instead.)
This is surprisingly efficient. I even got it working within
forEach loops, and abstracted that into an
The price: if you thought Node programming had too many callbacks already, get ready to use them just to iterate through arrays, call a method, add some numbers together….
Even if you’re willing to accept that, you don’t always have a chance to even use
At one point in my code, I call
sort on an array which could have a million or more items. And as far
as I know, there’s nothing I can do to make that
So it looks like just cooperation1 isn’t going to work for us.
Getting more help
So if we can’t make our single process more cooperative, we’re going to have to get multiple ‘workers’ to attack the problem.
We can’t magically obtain more computing resources, but we can pop out of Node’s little world into the rest of the operating system, where processes can share resources without having to know about each other. That way, the Node event loop can just pass off a complicated calculation to one of these workers, and from its perspective, it can forget about it until it’s done.
Forking is the traditional solution for web servers like Apache. This creates the illusion that there are multiple isolated copies of the program running. Where secretly, the operating system ensures that they mostly reuse the same areas of memory. This makes it easy to have hundreds of forks open on the same machine, and if one gets stuck or dies, the rest chug along merrily. The downside is that communication between these forks is cumbersome, more like network communication.
In languages like Java, there’s a tradition of threading, where there are multiple paths of execution managed right in the same program, and data structures held in common. This approach has considerably more power, but even more opportunities to crash the program or put it into deadlock.
The twist: neither of these solutions are available in Node. It’s life, Jim, but not as we know it.
In Node, processes and threads share many characteristics. They have some of each other’s advantages and disadvantages.
Node taunts us by having a
but it is all shameless deception. This is a fork and exec. So that means we will incur
the penalty of starting up an entire new Node process, with no shared resources at all (apart from filehandles and other aspects of the environment).
However, we will get a worker, in a separate process, managed by the operating system. Exactly what we want. There are quite a few libraries out there that can even help us manage pools of workers. I happened to use backgrounder.
Unfortunately, I was trying to do this whole project inside a single free Heroku dyno. That meant I got 500MB of space to work with. With the dictionary and all libraries loaded, including some function calls memoized with my lru-memoize library, my Node process was already pushing 200MB. So I could spawn exactly one worker process before maxing out, and that’s not even counting the usual fluctuations in memory use you get with a loaded Node process.
At best we would be able to serve two requests at once. Better than one, but it would still be completely blocked if it got two difficult requests at once.
cluster module allows to really fork our service. All we have to do is wrap the creation of the http server inside
a very traditional forking structure (if parent, then do nothing; if child, start service). This allows them all to share and respond on the same port, and the
operating system takes care of dividing CPU time among the processes. But, crucially, if there’s a section of memory that is merely read from, and not written to,
then all processes will share it. So Node itself, all its libraries, and the dictionary, should all be shared!
This turns out to work beautifully. While the amount of shared memory wasn’t as large as I hoped, it was enough to run five to seven workers, more or less reliably. So this entire discussion could end right here.
There are a few issues; you can’t scale a simple forked server across machines. You’d still need some kind of load balancing proxy to direct requests to different machines.
So I continued to try a few more things, just to see…
Worker pools, hybrid approach
I realized that there already were two parts to my application:
- Finding words was data-bound; it needed to be close to a giant dictionary in memory, but it could be done in nearly constant time.
- Ranking moves was CPU-bound: it didn’t need to be close to data, or use so much memory, but required long processing times.2
So there was an obvious way to decompose the app; do the word finding in the main “loop” of the application. Dispatch to workers to calculate the best moves.
This also worked wonderfully. I could spawn at least five different workers within the free Heroku instance with room to spare. The workers can transiently balloon up to large sizes when they have to process huge dictionaries, but they quickly go back to normal after garbage collection.
I verified I could send it one of those 10-second nightmare boards in one browser tab, and still get quick results for easier boards in the other tabs.
There is still a lot of serialization and deserialization to manage as the server shuttles information in and out of the workers. We lost the advantage of simply accessing that giant data structure directly.
But there is one benefit; if we wanted to move these workers to a completely different server, or pool of servers, our code would barely change. So while Node forces the refactorings needed for this kind of scale out early, it will have benefits later on, if we need it.
Threads, hybrid approach
The worker pool approach was working well, but the memory fluctuations were still disturbing. Heroku will bring your application to a complete halt if you grab more memory than you are allocated, for some amount of time. I determined this was mostly related to the large serialization and deserializations involved. Also, the backgrounder library seemed to have some slow memory leaks, which I couldn’t track down. So I was looking for a slightly better solution.
These threads are created and called in slightly bizarre ways. Even though they are still in the same process, you still communicate using serialized data, typically JSON, and you may
You cannot use
require to load in any code. If you need to load in libraries, you’re going to have to use a tool like
or onejs. Ironically, these are normally used to bundle up code from libraries, to be used in browsers.
So we have some of the same difficulties as processes. We can’t load in a lot of code or data, because it will be duplicated for each thread. But overall my testing shows that it uses far less memory, and it has a much more stable memory footprint, probably because the same memory is holding serialized JSON as it enters and leaves the threads.
There’s only one downside I can think of to using threads; if one of those threads went haywire, you couldn’t kill it with a simple
kill command. You’d probably have to stop the entire web server.
The ultimate solution: code running on the server or the client
But now that we’ve got move ranking working inside of a separate process, with JSON-serialized messges, on a V8 instance, and no shared code or memory, another possibility arises; what if we simply did ran this on the client ?
What we need is an abstraction that allows us to have the same kind of parallelism that we have on the server, with processes and threads, on the client. The solution: Web Workers.
Most modern browsers can run Web Workers, but we still need to be ready for the ones that can’t. So here’s the plan:
- We test for the presence of Web Workers in the browser.
- If we have Web Workers, we ask the server to give us just the words that match our board, and we’ll figure out the moves locally.
- If not, we ask the server to do it all.
This works beautifully. The big dictionary stays where it is, and we serve requests for word-finding in tiny fractions of a second. More complex work is farmed out to our threads, and for many browsers, we can dispatch work off to the browser itself.
It’s not a perfect solution, since we incur the overhead of more I/O. In LetterPwn’s case, it might be sending a lot of JSON down to the client, which the client then boils down to just a few recommended moves. But the general pattern is a good one, especially if we can use it to avoid making subsequent requests to the server at all. We can apply this to any sort of ‘last-minute’ massaging of database search results, particularly pagination or sorting.
I wanted to see if Node.js was a practical solution, and I was more than satisfied. In the world of web apps, we often use highly dynamic languages with simple programming models.
But now Node.js is bringing a few other things to the party: easy background tasks, without really giving up our original simple programming model. And we can also keep giant data complex structures persistent in our request process, something which you can’t do at all in languages like PHP.
The real revelation is that we can now offload a lot of processing to the client, without any extra development time. Add a few automated scripts to recompile the ‘browserified’ scripts and you’re done.
Now we don’t have to think in terms of “server” and “client”, we now should think of what needs to be close to the server’s data, or is deeply entwined with it. Frameworks that blur the client-server distinction, like Meteor, Hoodie and more are cropping up on a daily basis.
The title of this blog post is a little provocative. Of course Node.js still has some pretty large defects, as a young platform, based on a language not known for its friendliness to programmers looking for computational performance.
But I’m not only sticking to the title’s claim, I’m even starting to believe it myself!
In theory, the complexity of this task is O(n!), where n is the number of words on the board, since combinations are involved. But in practice, this never happens, because if you can play one word almost everywhere on the board, that means other words can’t be played in as many ways. So it’s more like O(n), although with a large constant (a maximum of about 20n). I don’t think there are boards with more than 1.5 million moves to consider.