Has anyone attempted to do any “HTML Scraping” or “Screen Scraping” using Lotus Notes? Basically, I just want to capture the data in this website:
http://www.fms.treas.gov/c570/c570_a-z.html
Then I want to do some things with the data once I’ve captured it. In Excel this can be done with a query table and xlWebQuery, but I have no idea how I would actually “capture” the data using Notes.
Any suggestions?
Subject: Use this class.
I posted the class I wrote on my blog: http://planetlotus.org/9e57a0
Makes it very easy to read the HTML code of a webpage and then you can parse it any way you like.
Subject: Success!
Many thanks for your response. I tried the method and all works well!
Subject: Ugly, but it works
Here’s some code I wrote in an agent as an experiment to get hold of exchange rates.
Dim ws As New NotesUIWorkspace
Dim session As New NotesSession
Dim db As NotesDatabase
Dim wdoc As NotesDocument
Dim url As String
Dim USD_rate As Double, EUR_rate As Double, JPY_rate As Double
Dim pagetext As String
Dim bodyitem As NotesItem
url = |http://www.x-rates.com/d/GBP/table.html|
Set db = session.CurrentDatabase
' Retrieve the web page as a Notes document
Set wdoc = db.GetDocumentByURL(url, 2, False)
If wdoc Is Nothing Then
	Print "Could not retrieve " & url
	Exit Sub
End If
' Need to access the text of the Body item (rich text)
Set bodyitem = wdoc.GetFirstItem("Body")
pagetext = bodyitem.Text
Then I parse pagetext to lift out the rates I want, using StrLeft, StrRight, etc. Note that ‘pagetext’ holds the plain text of the page, not the HTML.
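As a rough sketch of that StrLeft/StrRight parsing (assuming Notes 6 or later, where both functions are available), a small helper can return whatever text sits between two markers. The function name ExtractBetween is hypothetical, and the markers you pass it depend entirely on how the retrieved page text is laid out.

```lotusscript
' Hypothetical helper: returns the trimmed text found between the first
' occurrence of startMarker and the next occurrence of endMarker.
' Returns "" if either marker is missing.
Function ExtractBetween(source As String, startMarker As String, endMarker As String) As String
	Dim afterStart As String
	' StrRight gives everything after startMarker ("" if it is not found)
	afterStart = StrRight(source, startMarker)
	' StrLeft gives everything before endMarker ("" if it is not found)
	ExtractBetween = Trim$(StrLeft(afterStart, endMarker))
End Function
```

A possible use inside the agent, after pagetext has been filled. The marker "US Dollar" and the line-break end marker are assumptions, not taken from the actual page:

```lotusscript
Dim rateText As String
rateText = ExtractBetween(pagetext, "US Dollar", Chr$(10))
If IsNumeric(rateText) Then USD_rate = CDbl(rateText)
```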
Tangentially, I just ran it and discovered that the pound is worth 5 more Euro cents than yesterday!!
Phil
Subject: Notes Error
I tried the method you mentioned and got a “The Web Navigator Retrieval process is not running” Notes error. I’m not too familiar with the admin side of things, but from my research it looks like this is something that needs to be running on the server. Can you verify this before I go to our Notes Admin and ask them to “turn it on”? Again, many thanks for your help on this!
Subject: Ah 
I’ve only ever run this code on my local machine, rather than on a server. I’m sure a quick word with a friendly Admin person will resolve this.
Subject: Thank you!
Thanks so much for the quick response! This looks like exactly what I need.
Subject: Another approach you might want to consider as well…
What I’ve done in the past is drive MSIE from within Notes through COM, e.g. Set ie = CreateObject("InternetExplorer.Application"). You can then navigate to the URL and use methods such as getElementById and getElementsByTagName on the resulting document to find what you want.
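A minimal sketch of that approach, assuming the agent runs on a Windows machine where Internet Explorer is installed and can be automated via COM. The URL is the one from the original question; dumping every table row is only illustrative, since which rows hold the wanted data depends on the page.

```lotusscript
Sub Initialize
	' COM objects are late-bound, so they are declared as Variant
	Dim ie As Variant
	Dim htmlDoc As Variant
	Dim rows As Variant
	Dim i As Long

	Set ie = CreateObject("InternetExplorer.Application")
	ie.Visible = False
	ie.Navigate "http://www.fms.treas.gov/c570/c570_a-z.html"

	' Poll until the page has finished loading (READYSTATE_COMPLETE = 4)
	Do While ie.Busy Or ie.ReadyState <> 4
		Sleep 1
	Loop

	Set htmlDoc = ie.Document
	' Collect every <tr> element and print its visible text
	Set rows = htmlDoc.getElementsByTagName("tr")
	For i = 0 To rows.length - 1
		Print rows.item(i).innerText
	Next

	' Close the hidden browser instance
	ie.Quit
End Sub
```

Unlike GetDocumentByURL, this does not depend on the Web Navigator retrieval process, but it ties the agent to a Windows machine with IE available.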