Subject: Large database and large number of documents
You are going to be pushing the envelope, but there is no fundamental reason why this won’t work. I’ve seen databases this big that work well. I’ve also seen much smaller databases that were dogs. Try it and see; I’m sure everybody would be interested in the results.
Personally, I wouldn’t do it on 6.5.4 FP3. If I were going to push the envelope like that, I would want to be on something up to date: either 7.0.3 or 8. Why? Two reasons. First, because there are thousands of bug fixes between 6.5.4 and today, and when you push the envelope on scale, your chances of hitting the sort of boundary-condition bugs that survive even in code as mature as Domino surely go up. Second, because if you do run into problems, one of the first things IBM support will want you to do is upgrade; and it’s better to plan that upgrade on your own schedule now than to face an urgent need to do it later.
Transaction logging is an absolute must. The fixup time on a database this size will kill you if your server crashes. And be sure to follow IBM’s recommendation and put the transaction logs on a separate mirrored pair of disks.
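For reference, transaction logging can be enabled in the server document or directly in notes.ini. The setting names below are the standard ones; the path is just an example and should point at that dedicated mirrored pair, not the volume holding the NSF:

```
TRANSLOG_Status=1
TRANSLOG_Path=E:\translog
TRANSLOG_Style=0
```

TRANSLOG_Status=1 turns logging on; TRANSLOG_Style=0 is circular logging, and you would switch to archive style (1) only if your backup tool knows how to archive and prune the logs.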
Clearly, your hardware capabilities will be a factor as well. You’ll want the fastest I/O you can afford. Spreading the NSF across a striped array with as many spindles as you can support would be a good idea.
Frequent backups are a must. One of the things that tends to happen in very large databases is the occasional mysterious loss of one or two documents. I have no explanation; just experience to tell me that it does sometimes happen.
If you can batch your updates and apply them during off-hours, or at least avoid immediate index updates during working hours, performance will be better. Also, if updates are posted to a separate database and held until the batch run, that gives you an easy way to implement your own incremental backup scheme.
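As a sketch of that staging-database idea (the filename "bigdb.nsf" and the scheduling are made up; the agent would run off-hours in the staging database), a LotusScript agent might look something like:

```lotusscript
' Scheduled agent in the staging database: copies queued update
' documents into the large target NSF, then marks them processed.
Sub Initialize
    Dim session As New NotesSession
    Dim staging As NotesDatabase
    Dim target As NotesDatabase
    Dim col As NotesDocumentCollection
    Dim doc As NotesDocument

    Set staging = session.CurrentDatabase
    ' "bigdb.nsf" is a placeholder for the big database's file path
    Set target = session.GetDatabase(staging.Server, "bigdb.nsf")

    Set col = staging.UnprocessedDocuments
    Set doc = col.GetFirstDocument
    While Not doc Is Nothing
        Call doc.CopyToDatabase(target)
        ' Flag the document so the next scheduled run skips it
        Call session.UpdateProcessedDoc(doc)
        Set doc = col.GetNextDocument(doc)
    Wend
End Sub
```

Because the staging database holds exactly the documents changed since the last batch run, backing it up just before each run gives you the incremental backup for free.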
The last thing I would say is that JY Riverin is quite right about the Readers fields, but it all depends on what you put in them. If they contain individual names, and each user sees only a small percentage of the documents, then you will undoubtedly have big performance problems. But if they contain one or more roles, and each user who holds a role will normally have access to all (or a large subset) of those documents, then performance should be OK.
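For example, stamping documents with a role rather than individual names looks like this in LotusScript (the field name "DocReaders" and the role "[ClaimsTeam]" are hypothetical; use whatever role you define in the ACL):

```lotusscript
' Give the document a Readers field containing an ACL role,
' not a list of individual user names.
Dim item As NotesItem
Set item = doc.ReplaceItemValue("DocReaders", "[ClaimsTeam]")
item.IsReaders = True    ' mark the item as a Readers field
Call doc.Save(True, False)
```

With one role shared across most documents, view indexing doesn’t have to filter document-by-document per user, which is where the name-per-document approach hurts at this scale.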
-rich