Parse HTML

I have a field on a form that contains an HTML dump. I would like to create a new document for each row in the HTML field. The data in the field is shown below. The new document will contain fields called country, exchange type, and exchange rate.

What is the best way to do this?

Here’s my HTML

"

European Central Bank

Date: 08 Nov 2007

US dollar (USD) = 1.4666

Japanese yen (JPY) = 165.9

Bulgarian lev (BGN) = 1.9558

Cyprus pound (CYP) = 0.5842

Czech koruna (CZK) = 26.894

Danish krone (DKK) = 7.4548

Estonian kroon (EEK) = 15.6466

Pound sterling (GBP) = 0.69625

Hungarian forint (HUF) = 253.34

Lithuanian litas (LTL) = 3.4528

Latvian lats (LVL) = 0.7017

Maltese lira (MTL) = 0.4293

Polish zloty (PLN) = 3.637

New Romanian leu (RON) = 3.403

Swedish krona (SEK) = 9.262

Slovak koruna (SKK) = 33.122

Swiss franc (CHF) = 1.6601

Icelandic krona (ISK) = 87

Norwegian krone (NOK) = 7.741

Croatian kuna (HRK) = 7.3365

Russian rouble (RUB) = 35.882

New Turkish lira (TRY) = 1.7359

Australian dollar (AUD) = 1.5783

Canadian dollar (CAD) = 1.3629

Chinese yuan renminbi (CNY) = 10.8832

Hong Kong dollar (HKD) = 11.3896

Indonesian rupiah (IDR) = 13396.66

South Korean won (KRW) = 1332.7

Malaysian ringgit (MYR) = 4.8845

New Zealand dollar (NZD) = 1.8923

Philippine peso (PHP) = 63.284

Singapore dollar (SGD) = 2.1117

Thai baht (THB) = 46.271

South African rand (ZAR) = 9.508

"

Subject: PARSE HTML

You did not specify:

  1. how the process is to be initiated

  2. which database is to receive the new documents

  3. which form should be used when creating the new documents

  4. the name of the field containing the HTML

so I made a few assumptions below.

Note that there is little error checking, and the parsing relies on a few assumptions: that the country name will not contain parentheses, and that the format of your sample will not change, for example in how the break tag is written. You should be able to take this and modify it if need be.

Sub Click(Source As Button)
	Const FIELD_WITH_HTML = "MyHTMLfield" 'Whatever the field name actually is
	Const NEWDOC_FORM = "My Most Excellent Form" 'Whatever the form name is for the new documents you create

	Dim uiws As New NotesUIWorkspace
	Dim uidoc As NotesUIDocument
	Dim doc As NotesDocument
	Dim db As NotesDatabase
	Dim newDoc As NotesDocument
	Dim strHTML As String
	Dim strCurrencies As String
	Dim varCurrencies As Variant
	Dim strCountry As String
	Dim strCurrencyCode As String
	Dim dblExchangeRate As Double
	Dim crlf_chars(1) As String

	crlf_chars(0) = Chr(10)
	crlf_chars(1) = Chr(13)

	Set uidoc = uiws.CurrentDocument
	Set doc = uidoc.Document
	Set db = doc.ParentDatabase 'Are you creating the new documents in the same database???

	strHTML = doc.GetItemValue(FIELD_WITH_HTML)(0) 'But for testing purposes, I will just hard-code in the next line

	strHTML = {<body>

European Central Bank

Date: 08 Nov 2007

US dollar (USD) = 1.4666

Japanese yen (JPY) = 165.9

Bulgarian lev (BGN) = 1.9558

Cyprus pound (CYP) = 0.5842

Czech koruna (CZK) = 26.894

Danish krone (DKK) = 7.4548

Estonian kroon (EEK) = 15.6466

Pound sterling (GBP) = 0.69625

Hungarian forint (HUF) = 253.34

Lithuanian litas (LTL) = 3.4528

Latvian lats (LVL) = 0.7017

Maltese lira (MTL) = 0.4293

Polish zloty (PLN) = 3.637

New Romanian leu (RON) = 3.403

Swedish krona (SEK) = 9.262

Slovak koruna (SKK) = 33.122

Swiss franc (CHF) = 1.6601

Icelandic krona (ISK) = 87

Norwegian krone (NOK) = 7.741

Croatian kuna (HRK) = 7.3365

Russian rouble (RUB) = 35.882

New Turkish lira (TRY) = 1.7359

Australian dollar (AUD) = 1.5783

Canadian dollar (CAD) = 1.3629

Chinese yuan renminbi (CNY) = 10.8832

Hong Kong dollar (HKD) = 11.3896

Indonesian rupiah (IDR) = 13396.66

South Korean won (KRW) = 1332.7

Malaysian ringgit (MYR) = 4.8845

New Zealand dollar (NZD) = 1.8923

Philippine peso (PHP) = 63.284

Singapore dollar (SGD) = 2.1117

Thai baht (THB) = 46.271

South African rand (ZAR) = 9.508

}
	strCurrencies = Trim$(Strleft(Strright(strHTML, "</h1>"), "</body>"))

	varCurrencies = Split(strCurrencies, "<br />")

	Forall currencyLine In varCurrencies
		currencyLine = Replace(currencyLine, crlf_chars, "")

		If currencyLine <> "" Then
			strCountry = Trim$(Strleft(currencyLine, "("))
			strCurrencyCode = Strrightback(Strleft(currencyLine, ")"), "(")
			dblExchangeRate = Cdbl(Trim$(Strrightback(currencyLine, "=")))

			'Print strCountry & " | " & strCurrencyCode & " | " & dblExchangeRate

			Set newDoc = db.CreateDocument
			With newDoc
				.ReplaceItemValue "Form", NEWDOC_FORM
				.ReplaceItemValue "country", strCountry
				.ReplaceItemValue "exchange_type", strCurrencyCode
				.ReplaceItemValue "exchange_rate", dblExchangeRate
				.Save True, False
			End With
		End If
	End Forall
End Sub
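
The parsing assumptions mentioned before the code can be checked line by line before any document is created. What follows is only a minimal sketch: ParseCurrencyLine is a hypothetical helper name, not part of the code above, and it assumes every valid line has the shape "name (CODE) = rate".

```lotusscript
' Hypothetical helper: parses one line such as "US dollar (USD) = 1.4666".
' Returns True and fills the three output arguments only when the line has
' the expected "name (CODE) = rate" shape; otherwise returns False.
Function ParseCurrencyLine(Byval rawLine As String, countryName As String, _
currencyCode As String, exchangeRate As Double) As Boolean
	Dim ratePart As String

	ParseCurrencyLine = False

	' Skip lines without the "(", ")" and "=" markers (title, date, blanks)
	If Instr(rawLine, "(") = 0 Or Instr(rawLine, ")") = 0 Or Instr(rawLine, "=") = 0 Then Exit Function

	' Cdbl raises an error on non-numeric text, so test the rate first
	ratePart = Trim$(Strrightback(rawLine, "="))
	If Not Isnumeric(ratePart) Then Exit Function

	countryName = Trim$(Strleft(rawLine, "("))
	currencyCode = Trim$(Strrightback(Strleft(rawLine, ")"), "("))
	exchangeRate = Cdbl(ratePart)
	ParseCurrencyLine = True
End Function
```

Inside the Forall loop it could be called as `If ParseCurrencyLine(Cstr(currencyLine), strCountry, strCurrencyCode, dblExchangeRate) Then ...`, with the document creation moved inside that If. Lines that do not match, such as the date line, are then skipped instead of raising an error.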

Subject: One other thing…

Presumably you are getting updates from past exchange rate quotations. What are you planning on doing with the older documents?

Subject: RE: One other thing…

I haven’t thought about it yet.

Thank you for all of your help. I modified your code to include the info that I had left out (form name, etc.), then tried to run it and received this error message:

“only plain text can be used in this type of field”

I tried to debug to see where the error was occurring, but it comes up as soon as I click the button. Any ideas?

Subject: RE: One other thing…

Hmmm - the error must be with how you are getting the HTML into the field, I would guess…

http://www-10.lotus.com/ldd/nd6forum.nsf/Search?SearchView&Query="only%20pla%3Fn%20text%20can%20be%20used%20in%20this%20type%20of%20field"&SearchOrder=0&Start=1&Count=100

The code ran fine when I tried strHTML hard-coded to the sample value you gave in your original posting, so as long as the text is in that field, the parsing and document creation part should work fine.

Subject: RE: One other thing…

That fixed it.

Thank you again for your help.

Subject: PARSE HTML

Here’s the easiest way I know of. This code is set up to run from a button on the existing document; if you want it to run as a scheduled agent (or any kind of agent), you’ll need to make a few changes.

Sub Click(Source As Button)
	On Error Goto errhand

	Dim w As New NotesUIWorkspace
	Dim db As NotesDatabase
	Dim doc As NotesDocument
	Dim newDoc As NotesDocument
	Dim uidoc As NotesUIDocument
	Dim htmlField As NotesItem
	Dim parsedLines As Variant

	Set db = w.CurrentDatabase.Database
	Set uidoc = w.CurrentDocument
	Set doc = uidoc.Document
	Set htmlField = doc.GetFirstItem("htmlField")

	'Split the field text into one entry per line
	parsedLines = Split(htmlField.Text, "<br />")

	Forall x In parsedLines
		'Only lines containing "=" hold a currency and a rate
		If Instr(x, "=") > 0 Then
			'create new document here
			Set newDoc = db.CreateDocument
			Call newDoc.ReplaceItemValue("form", "yourFormName")
			Call newDoc.ReplaceItemValue("exchangeRate", Trim$(Strrightback(x, "=")))
			Call newDoc.ReplaceItemValue("currency", Trim$(Strleft(x, "=")))
			Call newDoc.Save(True, True)
		End If
	End Forall

	Exit Sub

errhand:
	Msgbox "error at line " & Erl & ": " & Error
	Exit Sub
End Sub
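
For the agent case mentioned above, the main change is that the UI classes are not available, so the documents have to come from the agent's target set instead of the open document. This is only a rough sketch under assumptions: the agent is set to act on new and modified documents, and the field and form names ("htmlField", "yourFormName") are the same placeholders as in the button code.

```lotusscript
Sub Initialize
	' Agent variant: no NotesUIWorkspace, documents come from the agent's target set
	Dim session As New NotesSession
	Dim db As NotesDatabase
	Dim dc As NotesDocumentCollection
	Dim doc As NotesDocument
	Dim newDoc As NotesDocument
	Dim htmlField As NotesItem
	Dim parts As Variant

	Set db = session.CurrentDatabase
	Set dc = db.UnprocessedDocuments
	Set doc = dc.GetFirstDocument

	Do Until doc Is Nothing
		Set htmlField = doc.GetFirstItem("htmlField")
		' Documents without the HTML field (including the ones created below) are skipped
		If Not htmlField Is Nothing Then
			parts = Split(htmlField.Text, "<br />")
			Forall x In parts
				If Instr(x, "=") > 0 Then
					Set newDoc = db.CreateDocument
					Call newDoc.ReplaceItemValue("Form", "yourFormName")
					Call newDoc.ReplaceItemValue("currency", Trim$(Strleft(x, "=")))
					Call newDoc.ReplaceItemValue("exchangeRate", Trim$(Strrightback(x, "=")))
					Call newDoc.Save(True, True)
				End If
			End Forall
		End If
		' Mark the document as handled so the next run does not pick it up again
		Call session.UpdateProcessedDoc(doc)
		Set doc = dc.GetNextDocument(doc)
	Loop
End Sub
```

UpdateProcessedDoc only has an effect when the agent runs on new and modified documents; with another target the agent would need its own way of remembering which documents it has already parsed.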

Subject: PARSE HTML

Use the GetFormattedText method (of NotesRichTextItem, if the field is rich text) to extract the contents into a string. Then you can parse the string, extract the elements, and create the new documents.
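
A minimal sketch of that idea, assuming the field may be either rich text or plain text. GetHtmlText is a hypothetical helper name, and the field name passed in is whatever your form actually uses.

```lotusscript
' Hypothetical helper: returns the contents of the named field as a plain string,
' or an empty string when the document has no such item.
Function GetHtmlText(doc As NotesDocument, Byval fieldName As String) As String
	Dim item As Variant

	GetHtmlText = ""
	Set item = doc.GetFirstItem(fieldName)
	If item Is Nothing Then Exit Function

	If item.Type = RICHTEXT Then
		' Rich text: keep tabs (False) and use the default line wrap (0)
		GetHtmlText = item.GetFormattedText(False, 0)
	Else
		' Plain text field: the Text property is enough
		GetHtmlText = item.Text
	End If
End Function
```

GetFormattedText can wrap long lines, so it is probably safest to strip the CR and LF characters before splitting on the break tag, as the first reply in this thread does.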