Large database and large number of documents

Hi all,

we would like to merge the contents of several old databases into one new database:

New size: ~40 GB

#docs: ~300,000 main docs and ~200,000 response docs

The main docs will not contain any rich text field. Each response doc will consist of one plain text field and one rich text field containing one attachment (a scanned PDF).

~20% of the response docs must be protected by reader fields and by roles from the ACL.

~ 500 docs/day will be changed or created.

We would like to create a full-text index (FTI), but without indexing the attachments.

Does anyone have experience with such a large db?

Any positive or negative info is welcome!

regards axel

Subject: Large database and large number of documents

Hi Axel,

we had databases with several hundred thousand docs.

My experience here: you should turn on the “don’t maintain unread marks” database property, otherwise it will be painfully slow.

You should also be careful about modifying the readers field after the docs have been created: the agent doing that will probably run for quite a while.

robert

Subject: RE: Large database and large number of documents

Hello Robert,

thanks for your reply.

That’s a good hint, as the read/unread marks are not necessary within this application.

The reader fields are only computed when the doc is saved.
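For reference, computing a reader field at save time can be sketched like this in LotusScript. The role name `[Confidential]` and field name `DocReaders` are placeholders, not names from this application:

```lotusscript
' Hedged sketch: set a Readers field at save time.
' "[Confidential]" and "DocReaders" are placeholder names.
Sub SaveWithReaders(doc As NotesDocument)
    Dim item As NotesItem
    ' Grant access via an ACL role instead of individual user names
    Set item = doc.ReplaceItemValue("DocReaders", "[Confidential]")
    item.IsReaders = True
    Call doc.Save(True, False)
End Sub
```

Using roles rather than user names keeps the number of distinct reader values small, which matters for view performance with this many docs.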

regards axel

Subject: RE: Large database and large number of documents

Hi,

you could get into trouble when you need to display documents your users have access to. If you have, for example, 100,000 documents, Domino has to “scan” every document to validate whether the current user is authorized to see it. So if the only document you can see is in position 99,999, opening the view will take very long.

About reader fields, three very interesting documents:

Performance Considerations for Domino Applications

http://www.redbooks.ibm.com/redbooks/pdfs/sg245602.pdf

Notes from Support: Reader Names fields can impact performance

and

Technote: Use of ReaderNames fields slows view performance in Notes

Product: Lotus Notes > Lotus Notes > Version 7.0, 6.5, 6.0, 5.0

Platform(s): Mac OS, Windows

Doc Number: 1097609

Published 2006-05-22

http://www-1.ibm.com/support/docview.wss?uid=swg21247611

Excellent article

Great links here:

http://www-10.lotus.com/ldd/bpmpblog.nsf/dx/search.htm?opendocument&q=performance

Notes/Domino Best Practices: Performance

http://www-1.ibm.com/support/docview.wss?rs=463&uid=swg27008849

I hope for your sake that you have good servers.

Subject: RE: Large database and large number of documents

You will certainly want to be very thoughtful in the construction of your database. For example, I would put a lot of thought into:

  1. How many views does this db need to carry? Because of the large number of documents, it helps if you can reduce overall indexing size/time by reducing the complexity and number of views. A simple example: limit the use of categorized columns.

  2. When is the data updated? If it’s possible to shift some of the updates to off hours, that’s great in terms of keeping view indexing minimized during the day.

  3. Think about how users access their data. With a large database and large views, you might not want to encourage scrolling. With a small db, scrolling might be preferred. So think about what data users need to see and how to provide that data with minimal view accesses.

I suspect this will be a case where you’ll need to think for 40 hours and code for 4 hours, if that makes sense.

regards,

raphael

Subject: Large database and large number of documents

You are going to be pushing the envelope, but there is no fundamental reason why this won’t work. I’ve seen databases this big that work well. I’ve also seen much smaller databases that were dogs. Try it and see. I’m sure everybody would be interested in the results.

Personally, I wouldn’t do it on 6.5.4 FP3. If I were going to push the envelope like that, I would want to be on something that is up to date. Either 7.0.3 or 8. Why? Two reasons. First, because there are thousands of bug fixes between 6.5.4 and today, and when you push the envelope in scale, your chances of running into the sort of boundary conditions that are found in code as mature as Domino surely go up. Second, because if you do run into problems, one of the first things IBM support is going to want you to do is upgrade; and it’s better to be able to plan upgrades on your own schedule now than to be faced with an urgent need to do it later.

Transaction logging is an absolute must. The fixup time on a database of this size will kill you if your server crashes. And be sure to follow IBM’s recommendation and put the transaction logs on a separate mirrored pair of disks.

Clearly, your hardware capabilities will be a factor as well. You’ll want the fastest I/O you can afford. Spreading the NSF across a striped array with as many spindles as you can support would be a good idea.

Frequent backups are a must. One of the things that can tend to happen in databases that get very large is occasional mysterious loss of one or two documents. I have no explanation; just experience to tell me that it does sometimes happen.

If you can batch your updates and apply them during off-hours, or avoid immediate index updates during working hours, then you will be better off with performance. Also, if updates are posted to a separate database and held until the batch update occurs, that gives you an easy way to implement your own incremental backup system.

The last thing I would say is that JY Riverin is quite right about the reader name fields, but it all depends on what you are putting in those fields. If they contain individual names, and each user sees only a small percentage of documents, then you will undoubtedly have big performance problems. But if they contain one or more roles, and each user who has the role will normally have access to all of these documents (or a large subset), then performance should be OK.

-rich

Subject: RE: Large database and large number of documents

Hi all,

thanks for your suggestions. As it’s not my “own” db but a customer’s db, I’ll first try to convince him to upgrade to Rel. 7.x. All the other HW aspects and transaction logging are already covered.

Deeper analysis of the reader field problem shows:

  • no usernames in reader fields

  • only ACL roles used

  • 99% of the main docs and ~80% of the response docs will be readable by everyone: those will have no reader field at all

I will post further info as soon as possible.

Best regards axel

Subject: Large database and large number of documents

One new finding: a simple-action agent that is supposed to delete old docs (db: >500,000 docs; the view used as the agent’s data source: <100,000) is unable to run: the max. number of docs for a temporary FTI is reached.
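One possible workaround, as a hedged LotusScript sketch: walk the view directly instead of using the simple-action agent’s document selection, so no temporary full-text index is needed. The view name "OldDocs" is a placeholder:

```lotusscript
' Hedged sketch: delete docs by walking a view directly,
' avoiding the simple-action agent's temporary full-text index.
' "OldDocs" is a placeholder view name.
Sub Initialize
    Dim s As New NotesSession
    Dim db As NotesDatabase
    Dim view As NotesView
    Dim doc As NotesDocument
    Dim nextDoc As NotesDocument
    Set db = s.CurrentDatabase
    Set view = db.GetView("OldDocs")
    Set doc = view.GetFirstDocument
    While Not doc Is Nothing
        ' Fetch the next entry before removing the current one,
        ' so navigation is not broken by the deletion
        Set nextDoc = view.GetNextDocument(doc)
        Call doc.Remove(True)
        Set doc = nextDoc
    Wend
End Sub
```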

robert

Subject: RE: Large database and large number of documents

Hi all,

bad news first:

Administration and operation of my customer’s IT infrastructure is outsourced to a third party.

This provider refuses to operate such a big database, as they are afraid the SLAs cannot be met in case of a crash.

good news:

They are going to upgrade to 7.x ASAP.

I’ve set up the database on my own local server, just to find out: does it work?

  • I’ve created an empty database from the “-Blank-” template

  • imported all documents with an external script agent and built the response structures on the fly

  • changed the db design to the correct template

  • built all views (this took about 6 hours!)

  • created the FTI (this took another 6 hours)
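For what it’s worth, an FTI that skips the attachments can also be created programmatically. A hedged sketch, assuming R6+ behavior of `NotesDatabase.CreateFTIndex`, where passing 0 as the options argument leaves attached files out of the index:

```lotusscript
' Hedged sketch: create a full-text index without indexing attachments.
' Options = 0 omits FTINDEX_ATTACHED_FILES, so the attached files
' (the scanned PDFs) are not indexed.
Dim s As New NotesSession
Dim db As NotesDatabase
Set db = s.CurrentDatabase
Call db.CreateFTIndex(0, True)  ' recreate = True rebuilds an existing index
```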

Remember: this all happened on my laptop (no transaction logging, just one disk, but no read/unread marks for the db).

Summary: response times when accessing documents and FTI searches seem to be the same as for smaller databases.

Problems:

Agents triggered by “when documents are created or modified”:

When such an agent runs for the first time, it has to process ALL docs in the database. Building up the “unprocessed” documents list by the agent manager takes more than half an hour.
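For such triggered agents, the usual pattern, as a minimal sketch, is to iterate `UnprocessedDocuments` and mark each doc processed, so that subsequent runs only see new or modified docs:

```lotusscript
' Hedged sketch of a "when documents are created or modified" agent.
Sub Initialize
    Dim s As New NotesSession
    Dim coll As NotesDocumentCollection
    Dim doc As NotesDocument
    Set coll = s.CurrentDatabase.UnprocessedDocuments
    Set doc = coll.GetFirstDocument
    While Not doc Is Nothing
        ' ... process the doc here ...
        ' Mark it processed so the next run skips it
        Call s.UpdateProcessedDoc(doc)
        Set doc = coll.GetNextDocument(doc)
    Wend
End Sub
```

On the very first run the collection contains every doc in the db, so one option is to let the agent run once during off hours before users start working with the database.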

Take care with @DbLookup and @DbColumn: remember the 64 KB size limit on the returned data.
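Where that limit bites, a view lookup in LotusScript avoids it, since NotesView lookups are not subject to the formula return-value limit. A minimal sketch; the view name "ByKey" and the key are placeholders:

```lotusscript
' Hedged sketch: replace a large @DbLookup with a view lookup,
' which is not subject to the formula 64 KB return-value limit.
' "ByKey" is a placeholder view name.
Dim s As New NotesSession
Dim db As NotesDatabase
Dim view As NotesView
Dim coll As NotesDocumentCollection
Set db = s.CurrentDatabase
Set view = db.GetView("ByKey")
Set coll = view.GetAllDocumentsByKey("some key", True)  ' True = exact match
```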

That’s all for the moment.

axel