A small drawback of 64-bit machines

September 10, 2007

It used to be that on a large-memory 32-bit compute server, no single process could run away and exhaust all of the machine's memory. On an eight or sixteen gigabyte machine, a process ran into the limit on per-process virtual address space (3 gigabytes or so at most) well before it could run the machine itself into the ground.

(On a large enough machine you could survive a couple of such processes.)

This is no longer true on 64-bit large-memory compute servers, as I noticed today; it is now possible for a single runaway process to take even a 32 gigabyte machine into an out-of-memory situation. I am now a bit nervous about what the kernel's OOM handling will do to us, since these are shared machines that can be running jobs for several people at once.

(Adding more swap space is probably not the solution.)

I have to say that the kernel OOM log messages are a beautiful case of messages being logged for developers instead of sysadmins. As a sysadmin, I would like a list of the top few processes by OOM score, with information like their start time, total memory usage, and their recent growth in memory usage if that information is available.

(And on machines with lots of CPUs, the kernel OOM messages get rather verbose. I hate to think what they will be like on our 16-core machine.)
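As a rough illustration of the sort of summary I have in mind, here is a minimal sketch that ranks processes by OOM score. It assumes a Linux /proc that exposes /proc/<pid>/oom_score and a VmRSS line in /proc/<pid>/status; the function names are made up for this example, and it leaves out start time and memory growth entirely. This is something you would run from user space, not what the kernel logs.

```python
from pathlib import Path


def read_text(path):
    """Return the file's text, or None if the process vanished or is unreadable."""
    try:
        return path.read_text()
    except OSError:
        return None


def parse_status(status_text):
    """Extract (name, rss_kib) from the text of /proc/<pid>/status.

    VmRSS is missing for kernel threads, so it defaults to 0.
    """
    name, rss = "?", 0
    for line in status_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue
        if parts[0] == "Name:":
            name = parts[1].strip()
        elif parts[0] == "VmRSS:":
            rss = int(parts[1].split()[0])
    return name, rss


def top_by_oom_score(proc_root="/proc", count=5):
    """Return up to `count` (oom_score, pid, name, rss_kib) tuples, worst first."""
    rows = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a per-process directory
        score_text = read_text(entry / "oom_score")
        status_text = read_text(entry / "status")
        if score_text is None or status_text is None:
            continue  # process exited, or we lack permission
        try:
            score = int(score_text)
        except ValueError:
            continue
        name, rss = parse_status(status_text)
        rows.append((score, int(entry.name), name, rss))
    # Highest OOM score first; ties broken by lower pid.
    rows.sort(key=lambda row: (-row[0], row[1]))
    return rows[:count]
```

Calling `top_by_oom_score()` on a live machine and printing the rows gives a short, sysadmin-readable list of the likeliest OOM victims; the `proc_root` parameter is only there so the function can be pointed at a test directory.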
