Attachments and mail nsf sizes

Hi, we have a large amount of duplicated data in the form of attachments that have been sent to other people in the company and repeated across multiple emails. Probably over 300GB worth.

Does anyone know of a script, in Java or LotusScript (or another language via COM), that can pull the names of all attachments, find their sizes, compute an MD5 hash for each, and finally save them to a file system with NTFS ACLs matching the user’s mail file?

I am willing to write this script so that we can save the attachments to a file server, but I don’t want to reinvent the wheel.
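To size the problem before building anything, a minimal sketch in Python is shown here. It assumes the attachments have already been exported to a directory tree (for example, one folder per mail file); the export step, the Notes API calls, and the NTFS ACL handling are not covered, and the function names are hypothetical. It only groups files by size and MD5 and estimates how much space a single-copy store would reclaim.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(export_root):
    """Group files under export_root by (size, MD5); keep groups with more than one file."""
    groups = defaultdict(list)
    for path in sorted(Path(export_root).rglob("*")):
        if path.is_file():
            groups[(path.stat().st_size, md5_of_file(path))].append(path)
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


def reclaimable_bytes(duplicates):
    """Bytes saved if each duplicate group were reduced to a single stored copy."""
    return sum(size * (len(paths) - 1) for (size, _), paths in duplicates.items())
```

Calling `reclaimable_bytes(find_duplicates(root))` on the export folder gives a rough figure to compare against the 300GB estimate.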

I know about QuickR and various commercial products.

I am open to all suggestions, and will post back the final solution if it will benefit anyone.

matt

Subject: Coming soon: DAOS (8.5)

You may not be aware of it, but you just described (almost exactly) the new Domino Attachment and Object Service coming in Domino 8.5.

More information from the (beta) Admin 8.5 help: http://infocenters.lotus.com/help7/topic/com.ibm.help.domino.admin85.doc/DOC/H_ATTACHMENT_CONSOLIDATION_OVER.html

Subject: I personally don’t like DAOS

I personally don’t like DAOS since it leaves plenty of files in the file system… there may be servers that receive thousands of emails per day, so in a few months you may get millions of files in the DAOS subdirectories. Not good technically speaking. On top of that, it is insecure: I tested sending an email with Huffman compression disabled, and in the file system I saw that the txt file can be read directly in the DAOS folder! What about backup? If you want to back up/restore only 1 DB or a couple of DBs, you must use a dedicated Domino backup utility, and you can no longer use what you may already have, since you don’t know which files belong to which DBs!

Also, you cannot copy the DBs via the file system, and you need to enable transaction logging, which you may not want to enable.

Subject: What’s not to like?

“I personally don’t like DAOS since it leaves plenty of files in the file system… there may be servers that receive thousands of emails per day, so in a few months you may get millions of files in the DAOS subdirectories. Not good technically speaking.”

What’s not good about it? Do you think your operating system has trouble with large numbers of individual files? Do you think you’re better off letting Domino manage hundreds of copies of those millions of files inside NSF object references?

“On top of that, it is insecure: I tested sending an email with Huffman compression disabled, and in the file system I saw that the txt file can be read directly in the DAOS folder!”

Really? Because DAOS doesn’t write the files to the drive with their original extensions. It doesn’t even write the exact original file, as it also maintains a hash of the file so that it can uniquely identify different versions of the same filename.

Oh yeah, and it locally encrypts the file when it writes it to the DAOS subfolder.

So I’m quite curious what you actually tested.

“What about backup? If you want to back up/restore only 1 DB or a couple of DBs, you must use a dedicated Domino backup utility, and you can no longer use what you may already have, since you don’t know which files belong to which DBs!”

Backup is where DAOS really shines. If you’re running regular Notes mail today, consider that NSF backup is making a complete daily backup of the entire history of a user’s email. If I send a 1MB attachment to 100 people, that attachment is duplicated 100 times and consumes 100MB. But if I then take backups of those mail NSFs for 100 days, the attachment is backed up 100 times FOR EACH of the 100 users, and therefore consumes 10GB of backup space.

With DAOS, the file is backed up exactly ONCE. If the file in the attachment object folder doesn’t change, then the backup system doesn’t need to back it up again, does it? That is a 4-orders-of-magnitude improvement (10GB down to 1MB) in the archival requirement for that file.
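Working through the figures from the 1MB example makes the ratio explicit. This is a back-of-the-envelope sketch only: it assumes every daily backup is a full copy, takes 1GB as 1000MB, and ignores the small reference each NSF still carries under DAOS. The function names are illustrative.

```python
import math


def backup_mb_without_dedup(size_mb, recipients, days):
    """Every daily full backup copies the attachment once per recipient mail file."""
    return size_mb * recipients * days


def backup_mb_with_dedup(size_mb):
    """An unchanged single-copy object only needs to be backed up once."""
    return size_mb


without = backup_mb_without_dedup(size_mb=1, recipients=100, days=100)
with_dedup = backup_mb_with_dedup(size_mb=1)
ratio = without / with_dedup

print(f"without de-duplication: {without} MB")
print(f"with de-duplication:    {with_dedup} MB")
print(f"ratio: {ratio:.0f}x, about {round(math.log10(ratio))} orders of magnitude")
```

With these inputs the totals are 10,000MB (10GB) against 1MB, a factor of 10,000.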

"Also you cannot copy the DBs via file system,

you need to enable transaction logging and you may not want to enable it."

If you’re presently moving NSFs from server to server via the operating system, you’re not properly managing your Domino servers. If you’re NOT using transaction logs, you’re not properly managing your Domino servers.

I’ll admit that “different” doesn’t always mean “better,” but if you’re saying that you just don’t want to bother using the new capabilities of the Domino server since R5, it would certainly seem that you’re leaving a lot of scaling and reliability improvements on the table.

Personally, I wish my job was so easy that I could put an extra drive in a server, flip a few settings in the server doc and run a command line – and suddenly discover that my servers were faster, cheaper, had more capacity and ran more reliably than ever before. But if you prefer to just waste your time and money, that’s up to you. I hope your employer is aware of what you’re doing.

Subject: DAOS sounds like SCOS

DAOS sounds a lot like Single Copy Object Store (SCOS). I had a client with lots of email attachments and implemented SCOS. It caused many problems and was abandoned. Search the Domino 6 and 7 forums for SCOS.

Subject: Still it has the potential to serve way better

  • it uses the file system, not databases (works faster, easier for backup systems)

  • DAOS files stay in the file system for some time, even after the last document referring to them is deleted => a lot of backup scenarios don’t need to care about it (and if I understood correctly, this was a big issue with SCOS).

  • it simply looks like the simpler approach (start with attachments now, not with ‘everything’, and see later where the next steps go)
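The delayed-removal point in the second bullet can be sketched as follows. This is only an illustration of the idea, not how DAOS itself tracks or prunes objects: the `prunable` function, the dictionary of "last reference dropped" timestamps, and the 14-day retention interval are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def prunable(last_dereferenced, now, retention):
    """Return keys of objects whose last reference was dropped at least `retention` ago."""
    return sorted(
        key for key, dropped in last_dereferenced.items() if now - dropped >= retention
    )


# Hypothetical objects and the time their last referring document was deleted.
last_dereferenced = {
    "object-a": datetime(2008, 6, 1),
    "object-b": datetime(2008, 6, 25),
}

# Only object-a has been unreferenced for longer than the retention interval.
print(prunable(last_dereferenced, now=datetime(2008, 6, 30), retention=timedelta(days=14)))
```

As long as a database backup is restored within the retention interval, the objects it refers to are still present on disk, which is why many restore scenarios need only the database.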

To turn it the other way around: from what you describe, it does not look like you are going to be an early adopter anyhow …

Subject: DAOS is a nightmare! I would advise not to touch it

  • Cluster servers (you need one DAOS repository per server)

  • DAOS requires transaction logging (transaction logs should be on another server, so it doubles your number of servers) and will reduce your server performance by one third.

  • Security: you can directly access the attached files in the DAOS file system repository (even when they are in Huffman format, a program can easily decode them). It is very easy for a hacker to run a program that scans all the attached files of all the users for particular keywords (EVEN THE ONES IN DELETED EMAILS, because they are not removed automatically) and copies only the ones needed onto a USB flash drive.

  • Consistency / backups. You cannot back up an NSF file by doing a file copy. The attached files are no longer in it.

  • What about millions of files inside a Windows file system??? Is it only supported???

It is simply crazy to see IBM providing such an unprofessional answer to volume issues.

It is well known by any senior Notes/database administrator that moving files outside databases (and particularly for email databases) and replacing them by links is the source of tons of problems because the consistency and security of the databases is lost.

These kinds of “features”, like the previous SCOS “Single copy Object Store” that has NEVER worked, will kill Notes in time.

Subject: Some comments on your assessment of DAOS

  • Cluster servers (you need one DAOS repository per server)

This is not a cluster-specific requirement. That is the case for any server on which DAOS is enabled. The Domino clustering model is a “shared nothing” model, so why would you expect DAOS to be shared?

  • DAOS requires transaction logging (transaction logs should be on another server, so it doubles your number of servers) and will reduce your server performance by one third.

Transaction logs have never been remote or on another server. Do you mean another disk?

  • Security: you can directly access the attached files in the DAOS file system repository (even when they are in Huffman format, a program can easily decode them). It is very easy for a hacker to run a program that scans all the attached files of all the users for particular keywords (EVEN THE ONES IN DELETED EMAILS, because they are not removed automatically) and copies only the ones needed onto a USB flash drive.

You can do the same with any Domino data if the server itself is compromised. This does not relieve you of the responsibility for ensuring all access methods to the server in question are secured, not just from a Notes client or browser, but physical, remote network share, ftp, etc.

  • Consistency / backups. You cannot back up an NSF file by doing a file copy. The attached files are no longer in it.

You can absolutely back up the NSF using a file copy, but you also have to back up the DAOS content. The good news is that the data volume is much lower with DAOS than if you back up a traditional set of mail data.

  • What about millions of files inside a Windows file system??? Is it only supported???

No idea what this means. Are you questioning how many files can be stored in the file system?

It is simply crazy to see IBM providing such an unprofessional answer to volume issues.

It is well known by any senior Notes/database administrator that moving files outside databases (and particularly for email databases) and replacing them by links is the source of tons of problems because the consistency and security of the databases is lost.

These kinds of “features”, like the previous SCOS “Single copy Object Store” that has NEVER worked, will kill Notes in time.

The reason DAOS was simplified into a file system repository is because of the problems with SCOS.

So if you don’t want SCOS and you don’t want the files in a file system, what is your solution for de-duplicating attachment storage?

Subject: You know about this from all your vast DAOS experience?

DAOS is not SCOS.

I can’t find one sentence in your reply that’s supported by evidence or experience.

Subject: Not sure if this leads anywhere - still my comments

DAOS is a nightmare! I would advise not to touch it

=> Nobody is forcing you to do so, and one of the very best admins I know did detailed testing even before it reached public beta and was impressed by it.

  • Cluster servers (you need one DAOS repository per server)

=> If it were not so, I’d be asking for at least an option to get it this way (cluster failover: with one server down, I still want all data accessible).

  • DAOS requires transaction logging (transaction logs should be on another server, so it doubles your number of servers) and will reduce your server performance by one third.

=> From all I know, if transaction logging makes a noticeable change in server performance at all, it is as likely to improve performance as to cost anything. (This is based on high-load servers with dedicated hard drives for the log; I have never seen any other numbers.) If there are different numbers out there, I would like to learn about them. (Do you happen to have any links?)

  • Security: you can directly access the attached files in the DAOS file system repository (even when they are in Huffman format, a program can easily decode them). It is very easy for a hacker to run a program that scans all the attached files of all the users for particular keywords (EVEN THE ONES IN DELETED EMAILS, because they are not removed automatically) and copies only the ones needed onto a USB flash drive.

=> From what I know, it can be switched on and off database by database. Everyone is free to decide whether and where to use it.

=> From what I know, the files are removed, but only after a delay of something like 2 or 4 weeks. This way, for many restore procedures it is enough to restore the database, as the files are likely to still be there. To me this looks like a good compromise between deleting immediately and never deleting; there are good reasons to ask for both.

=> Encrypted file systems are available, often enough out of the box from the OS manufacturer. And I guess we live in a tough world, with lots of other problems too, if we can’t expect hard drives in carefully monitored servers in server rooms to be safe in general.

  • Consistency / backups. You cannot back up an NSF file by doing a file copy. The attached files are no longer in it.

=> True. Not storing an attachment sent to 12 people 12 times in 12 different databases was the intended goal from the very beginning. If it is a requirement to store it 12 times (for example, to have it on tape 12 times), I guess it is the wrong tool. But I honestly don’t see how something that works better for this purpose would have to be designed.

  • What about millions of files inside a Windows file system??? Is it only supported???

=> Sorry, I’m not sure what you are trying to say. A limit on the number of files on the hard drive? Or in one single folder on the hard drive? I honestly don’t know where the limits are, whether they would apply, or whether the proper measurements have been done.

It is simply crazy to see IBM providing such an unprofessional answer to volume issues.

=> If you care to do so, can you share your idea of what something better, from your point of view, would have to look like?

It is well known by any senior Notes/database administrator that moving files outside databases (and particularly for email databases) and replacing them by links is the source of tons of problems because the consistency and security of the databases is lost.

=> Not sure if a file in an nsf is more secure than outside in general, but I take your point.

These kinds of “features”, like the previous SCOS “Single copy Object Store” that has NEVER worked, will kill Notes in time.

=> I take this point too, but for me, going through the SCOS concept was enough to convince me that it is not what I want, while DAOS is, from a high-level view, different and to me worth trying. However, if you got bitten by SCOS and decide to stay away from DAOS, at least for now, I can understand that very well.

Subject: According to IBM, DAOS does not share a single line of code with SCOS

SCOS had several fundamental problems…

  1. it tried to single-copy store all non-summary compound document structures, instead of just file attachments.

  2. it stored them in an NSF structure with special rules, rather than simply on the file system.

  3. because there were multiple SCOS NSFs on a server, it was possible to lose track of the references in the event of a server failure.

DAOS is a much simpler and more atomic approach. It’s being tested by a number of major customers and partners and is performing reliably.
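The general idea of keeping one copy of each attachment on the file system, identified by a hash of its content, can be illustrated with a toy store. This is a sketch of the concept only and does not reflect DAOS’s actual file naming, hashing, encryption, or reference tracking; the `SingleCopyStore` class, the choice of SHA-256, and the in-memory reference counts are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


class SingleCopyStore:
    """Toy content-addressed store: one file per distinct attachment, named by its hash."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.refcounts = {}  # content hash -> number of documents referencing it

    def put(self, data):
        """Store data once and return the hash a document would keep as its reference."""
        key = hashlib.sha256(data).hexdigest()
        target = self.root / key
        if not target.exists():
            target.write_bytes(data)  # only the first copy is written to disk
        self.refcounts[key] = self.refcounts.get(key, 0) + 1
        return key

    def get(self, key):
        """Return the stored bytes for a reference."""
        return (self.root / key).read_bytes()

    def release(self, key):
        """Drop one reference; return True when no document references the object any more."""
        self.refcounts[key] -= 1
        return self.refcounts[key] == 0
```

Storing the same bytes for two different mail files yields the same key and a single file on disk, and the object only becomes a candidate for removal once `release` has been called for every referencing document.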

Subject: DAOS works great

I have tested DAOS in 8.5 beta 1 and I think it is a great solution to a long-standing problem.

Not only does it solve the redundant attachment problem but it also harmonizes well with usual backup scenarios (e.g. Tivoli Storage Manager).

Performance was good and it is truly transparent on the NRPC level. That means clients and other servers simply see the NSFs as they always were.

I personally almost can’t wait to get 8.5 to have DAOS on production servers.

Those counter arguments don’t count for me.

What’s the problem with transaction logging? ANY Domino server I set up gets it. ANY Domino server SHOULD have transaction logging. Running one without it is like gambling.

Security of the file system is not really an issue. As admins, you have to restrict and secure access to the file systems hosting Domino servers anyway (think about the server ID files lying there …).

And there is no problem with a DAOS repository on every server of a network (or cluster). Without DAOS, every server that has replicas of databases with attachments has those attachment copies, too. So DAOS on every server simply saves storage on every server. And the DAOS repository on every server will only hold those attachments that are needed on it.

My $0.02

Kai Uwe