Reliability is not only about overload: RANDOMize your system state

I never used mnemonics. Never say never... Here is my own mnemonic. Double RANDOM stands for ReadRights, AlternativeAccess, NoNetwork, DesynchronizedData, OutOf(whatever resource), MinMax(installation). Depending on context (e.g. reliability requirements), I spend more or less time trying to emulate system states unexpected by the developer and thus unhandled by the code.

Detailed explanation of the mnemonic
So there are six items, ordered from the one I would start with down to the ones I seldom use myself. Each comes with a short example of how to test, and why bugs found by such tests need to be taken into account.
RR – read rights. How to test: set read-only rights on whatever part of the file system you are accessing: the folder the application is installed to, temporary file storage; try to install the application onto a CD, etc. Why important: you have to decide yourself which tests matter: do you want the installer to report to the user that it can't write to the destination, or is it OK that the user gets whatever operating system error appears? Do you expect files to be copied from a CD (with the read-only flag set by default)? Do you expect the software to work in a secured environment shared by multiple users? (This partially overlaps with the next item.)
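A minimal sketch of the RR idea in Python (the function and file names here are my own illustration, not from any real product): the application should translate a low-level permission error into a message the user can act on. The chmod trick emulates a read-only destination on POSIX systems; Windows directory permissions behave differently.

```python
import os
import stat
import tempfile

def try_write_report(folder: str) -> str:
    """Attempt to write a file; turn raw OS errors into a
    user-readable status instead of letting them escape."""
    path = os.path.join(folder, "report.txt")
    try:
        with open(path, "w") as f:
            f.write("data")
        return "ok"
    except OSError as e:  # PermissionError, missing path, read-only FS, ...
        return f"cannot write to {folder}: {e.strerror}"

# Emulate the read-only destination the RR test calls for.
with tempfile.TemporaryDirectory() as d:
    os.chmod(d, stat.S_IRUSR | stat.S_IXUSR)  # read + traverse, no write
    try:
        print(try_write_report(d))
    finally:
        os.chmod(d, stat.S_IRWXU)  # restore so cleanup can delete the folder
```

The point of the test is exactly the difference between the friendly message and an unhandled traceback.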
AA – alternative access. How to test: use a network path, ideally on a different operating system, instead of a local disk to: install from/to, store system/temp data, etc. Install with one user, use with another. Use the application without administrator rights for: the local operating system, the database, etc. Why/when is it important: systems meant to be used in a company/organization local network typically need to support all of these things, and the customer expects by default that our software will support them.
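One defensive pattern behind the "install with one user, use with another" test can be sketched like this (the directory names and the fallback location are hypothetical, chosen just for illustration):

```python
import os
import tempfile

def pick_data_dir(install_dir: str) -> str:
    """Prefer the install directory; fall back to a per-user writable
    location when the current user lacks write access (installed by
    an administrator, run by a plain user)."""
    if os.access(install_dir, os.W_OK):
        return install_dir
    # Hypothetical fallback: a per-user folder under the temp root.
    fallback = os.path.join(tempfile.gettempdir(), "myapp-data")
    os.makedirs(fallback, exist_ok=True)
    return fallback

print(pick_data_dir("/usr/lib/myapp"))  # typically not writable for a plain user
```

Software without such a fallback is exactly what the AA tests tend to catch.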
NN – no network, for a limited period of time. How to test: simply unplug the network cable for a few seconds and plug it back in; do it on the client and on the server, and try to perform some actions in between. Set up a firewall to require manual confirmation and delay the confirmation for a minute or so. Why/when important: networks are becoming more reliable nowadays, but people are getting used to notebooks and WiFi, and there may be areas in any office where the network is not accessible. Not to mention that users expect plug-and-play.
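The behavior you hope to see when the cable comes back is a retry rather than a hard failure. A minimal sketch, with the flaky server simulated in-process so the example is self-contained (all names are mine):

```python
import time

def call_with_retry(fn, attempts=3, delay=0.01):
    """Retry a flaky call a few times before giving up, so a few
    seconds of unplugged cable does not fail the whole operation."""
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except ConnectionError as e:
            last_error = e
            time.sleep(delay)
    raise last_error

# Simulate a server that is unreachable for the first two calls.
calls = {"n": 0}
def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("network unreachable")
    return "response"

print(call_with_retry(flaky_request))  # → response
```

The NN tests probe whether anything like this exists in the system at all, or whether a one-second outage loses the user's work.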
DD – desynchronized data. The client and server, or several server hosts, have different: local time, regional settings, custom data (for example, the client may store some unique ID that is no longer unique after a temporary loss of synchronization; this partially overlaps with the previous item).
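The "no longer unique ID" example often comes from identifiers derived from the host clock. A sketch of the fragile scheme next to a clock-independent one (the scheme itself is my illustration, not taken from any particular system):

```python
import time
import uuid

def time_based_id() -> int:
    # Fragile: two hosts with desynchronized clocks, or one host whose
    # clock jumps backwards after an NTP resync, can produce duplicates.
    return int(time.time() * 1000)

def robust_id() -> str:
    # uuid4 does not depend on the host clock at all.
    return str(uuid.uuid4())

print(time_based_id())
print(robust_id())
```

A DD test deliberately skews clocks between hosts to find out which scheme the developers actually used.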
OO – out of (disk space, memory, tablespace (Oracle), licensed/custom limits: connection count, files in a single folder, items in a predefined array). How to test: use a tool (see below) or manually bring some system resource close to its limits (e.g. copy fake data to your local hard drive so that it is almost full) and then try to run your application. Why/when is it important: these things may happen under unexpected load or after a certain period of time (months, years). You may want to make sure the system does not corrupt existing data in this case (and maybe even test that it shuts down gracefully).
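The graceful-shutdown half of the OO item implies the application checks the resource before committing to the write. A minimal sketch, assuming Python's standard `shutil.disk_usage` and a safety reserve I picked arbitrarily:

```python
import shutil

def can_write(path: str, needed_bytes: int,
              reserve: int = 64 * 1024 * 1024) -> bool:
    """Refuse a write that would push free space below a safety
    reserve -- degrade gracefully instead of corrupting data
    half-way through."""
    free = shutil.disk_usage(path).free
    return free - needed_bytes >= reserve

print(can_write(".", 1024))
```

The OO test fills the disk first and then checks whether any such guard exists, and what happens to already-stored data when it does not.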
MM – min/max installation: install only the minimal set of software required (only the OS and prerequisites). Install multiple versions of the required 3rd-party software, etc. (e.g. two or more Oracle homes, Diskeeper, etc.). You had better read this post and its comments. My experience was with having multiple Oracle homes and an application using a non-default algorithm to find the one to use... Why/when is it important? Again, refer to the link above; I can only add that sometimes the application installer misses some DLL files that the developer has in his environment...
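The "missing DLL" failure mode generalizes to any dependency present on the developer's machine but absent from a minimal installation. A sketch of a startup self-check in Python terms (the module names are placeholders):

```python
import importlib.util

def missing_modules(required):
    """Return the required modules absent from this installation --
    catching 'works on the developer's machine' problems at startup
    instead of at first use."""
    return [name for name in required
            if importlib.util.find_spec(name) is None]

# 'json' ships with Python; the second name is deliberately bogus.
print(missing_modules(["json", "no_such_package_xyz"]))  # → ['no_such_package_xyz']
```

An MM test on a bare OS image reveals whether the product performs any such check, or simply crashes at the first missing piece.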

Why I like "double RANDOM"
As a perfectionist and a poet at heart, I was unable to resist the desire to create a meaningful mnemonic. So I came up with RANDOM. No, not because I want to do random testing. Random is a way to emulate something that is impossible to predict.
This is what you need to test: situations unpredicted (unexpected) by developers. For example, a developer may have predicted that users will sometimes enter wrong values, but what if they enter them so frequently that the error log file grows enormously and eats all the hard drive space? What if the developer believes he catches all the errors and throws exceptions, but there is an error type that is not caught?
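The runaway-log example above has a standard guard: bounded log rotation. A sketch using Python's standard `logging.handlers.RotatingFileHandler` (the logger name, file location, and size limits are arbitrary choices of mine):

```python
import logging
import logging.handlers
import os
import tempfile

log_path = os.path.join(tempfile.gettempdir(), "app-errors.log")

# Keep at most ~30 KB on disk: one active file plus two backups.
handler = logging.handlers.RotatingFileHandler(
    log_path, maxBytes=10_000, backupCount=2)
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.ERROR)

# Even a flood of bad input cannot eat the whole drive.
for i in range(5_000):
    logger.error("wrong value entered: %d", i)

print(os.path.getsize(log_path))  # stays at or below maxBytes
```

A developer who predicted "wrong values" but not "wrong values at this rate" is exactly the gap this kind of testing is after.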
That's why it is double random: you want to test situations that go beyond the "predicted unexpected" cases. You want to cause system states the developer never imagined could happen. Like a bad memory block...

Supporting tools
There are tools that support this type of testing. For example, I once tried Holodeck. I can recommend it, but... well, only if you have the time and good hardware. I prefer to do the simple tests described by the mnemonic first, and only continue with tools if either the system/technology appears to resist poorly when "denied the resources", or it is a critical aspect of the system.

The scope of so-called stress testing?
Years ago I used to call this type of testing stress testing. Then I figured out that the term "stress testing" is already reserved in this industry for testing a system "beyond the limits of normal operation".
Yet a slightly older definition I've found (I assume it is by Boris Beizer) is almost the same: "an unreasonable load while denying it the resources needed to process that load" - if we stress "load" and ignore "while denying it the resources". On the other hand, we could just as well deny the system some resources without involving any load. This gives a whole set of tests that already go by different names... Whatever the names are, I give you the mnemonic to remember all the tests you can do without needing to write any automated script to run under load. The good thing about such tests: they are easy to repeat, and it is easy to diagnose the problem if it manifests. It is not so simple when the system is running under heavy load.