ISO: Someone using the new enhanced MAT2STR and STR2MAT

Post by **Susan Smith** » Sun Feb 03, 2013 11:39 am

Hi everybody,

I'm looking for somebody to show us (at the conference) the new enhanced STR2MAT and MAT2STR functions, as they relate to parsing CSV files. Many of you are using CSV files for all sorts of things, so this should be quite applicable to many of you. The same concepts would apply to parsing XML files.

Are any of you using the new BR 4.3 enhanced functions and would you like to come up with a few short VERY SIMPLE and CLEAR examples and tell us about it at the conference? (the simpler, the better - just to demonstrate the concept in a bare bones way)

Let me know! Thanks

-- Susan

Here is the applicable section of the BR 4.3 release notes:

CSV AND XML PARSING ENHANCEMENTS
The string to mat and mat to string functions have been extended to ease parsing of CSV and XML data.

STR2MAT( str$, MAT zzz$ [, [MAT] Sep$ [, flags$]] )
Where Sep$ may be an array and flags$ is in the format:
[ quote-type ] [ :LTRM ] | [ :TRIM ] | [ :RTRM ]
Where quote-type can be Q, QUOTES, ('), or ("), case insensitive. Q and QUOTES denote standard BR quote processing. The trim flags denote post processing of extracted elements and the leading colon is only present when quote-type is specified.
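
As a rough illustration of the flags format above, here is a small Python model (not BR code) that splits a flags string into its quote-type and trim parts. The function name `parse_flags` is hypothetical, and treating the trim flags as case insensitive is an assumption based on the "Q:trim" usage in the CSV example further down.

```python
TRIMS = {"LTRM", "TRIM", "RTRM"}
QUOTE_TYPES = {"Q", "QUOTES", "'", '"'}

def parse_flags(flags):
    """Return (quote_type, trim) for a flags string; None where a part is absent."""
    quote, colon, trim = flags.upper().partition(":")
    if not colon and quote in TRIMS:
        # A trim flag on its own carries no leading colon.
        quote, trim = "", quote
    if colon and not quote:
        raise ValueError("leading colon is only allowed after a quote-type")
    if quote and quote not in QUOTE_TYPES:
        raise ValueError("unknown quote-type: " + quote)
    if trim and trim not in TRIMS:
        raise ValueError("unknown trim flag: " + trim)
    return (quote or None, trim or None)
```

For example, `parse_flags("QUOTES:TRIM")` yields `("QUOTES", "TRIM")` and `parse_flags("TRIM")` yields `(None, "TRIM")`.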
When Sep$ is an array, any or all of the specified values are deemed to represent a single separator, with the qualification that any one separator cannot occur more than once in a string of adjacent separators. To restate this, when elements of a Sep$ array occur adjacent to each other within the source string, they are grouped as a separator substring.
Sep$ elements cannot occur more than once in a separator substring. When they do, it denotes the specification of a null element. e.g. two successive commas or two successive occurrences of CR+LF both denote null elements. Essentially when Sep$ elements are 'consumed' by their recognition within the source string, then they cannot be re-recognized without inserting a null element into the output array.
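
The grouping rule can be easier to follow with a model in front of you. The Python sketch below (not BR code) imitates the behavior as described above: adjacent distinct separators form one group, and a separator that repeats inside a group inserts a null element. The name `split_grouped` is hypothetical. Producing no element after a trailing separator group is an assumption, chosen to match the two-element XML output shown later in these notes.

```python
def split_grouped(text, seps):
    """Split text on any of seps, grouping adjacent distinct separators."""
    seps = sorted(seps, key=len, reverse=True)  # prefer the longest match
    out, current = [], ""
    used = set()        # separators already consumed in the current group
    in_group = False
    i = 0
    while i < len(text):
        sep = next((s for s in seps if text.startswith(s, i)), None)
        if sep is None:
            # Ordinary data character: any separator group has ended.
            in_group = False
            used.clear()
            current += text[i]
            i += 1
            continue
        if not in_group:
            out.append(current)   # first separator of a group closes the element
            current = ""
            in_group = True
        elif sep in used:
            out.append("")        # repeated separator: a null element
            used.clear()
        used.add(sep)
        i += len(sep)
    if not in_group:
        out.append(current)       # assumption: no element after a trailing group
    return out
```

With separators "," and CR+LF, `split_grouped("a,b\r\nc,,d", [",", "\r\n"])` gives `["a", "b", "c", "", "d"]`. Note that under this rule a comma immediately followed by CR+LF counts as one separator, so "1,2,\r\n3" yields three elements rather than four.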
7a) Quote Processing
Quotation marks suppress the recognition of separators in accordance with the following rules.
Standard BR Quote Processing
When examining str$ left to right, the first character (and the first character after each separator) is checked to see if it is either (') or ("). If it is either of those, it activates quotation processing, which suppresses the recognition of separators until quotation processing is deactivated. The first character thus becomes the governing quote type until quotation processing is deactivated.

The string is copied until it ends or until an odd number of successive occurrences of the governing quote type is encountered. During this processing, two adjacent occurrences of the governing quote character denote an embedded occurrence of the quote character.
Examples
"abc,def"      ->  abc,def     the comma is not recognized as a separator and is part of the data
abc"def        ->  abc"def     naturally embedded quotes may occur anywhere within a string after the first character
"abc"def"      ->  abcdef"     quotation processing is deactivated by the center quote mark
"abcdef"       ->  abcdef      normal data
"abc'def"      ->  abc'def     the single quote is treated like any other character while double quotes govern
'abc"def'      ->  abc"def     double quotes are treated like any other character while single quotes govern
"abc""def"     ->  abc"def     pairs of governing quotes denote a single embedded quote
"abc"""def"    ->  abc"def"    the third successive occurrence deactivates quote processing

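The quote-processing examples above can be reproduced with a short Python model (not BR code). It works on a single, already isolated field, so it shows only how quotes are unpacked, not how separators are suppressed. The name `unquote` is hypothetical.

```python
def unquote(field):
    """Model of standard BR quote processing for one isolated field."""
    if not field or field[0] not in "'\"":
        return field                       # no leading quote: copy verbatim
    q = field[0]                           # governing quote type
    out = []
    i = 1
    while i < len(field):
        if field[i] == q:
            if field.startswith(q, i + 1):
                out.append(q)              # a pair denotes one embedded quote
                i += 2
                continue
            # A lone governing quote deactivates quote processing;
            # whatever follows is copied as ordinary data.
            out.append(field[i + 1:])
            break
        out.append(field[i])
        i += 1
    return "".join(out)
```

Feeding it the eight inputs from the examples returns the eight outputs listed there, for instance `"abc"""def"` becomes `abc"def"`.
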
MAT2STR( MAT zzz$, str$ [, sep$ [, flags$]] )
Where flags$ is in the format:
[ quote-type ] [ :LTRM ] | [ :TRIM ] | [ :RTRM ]
Where quote-type can be Q, QUOTES, ('), or ("), case insensitive. Quote-type denotes that each element should be enclosed in quotation marks. The trim flags denote pre-processing of array elements and the leading colon is only present when quote-type is specified.
If Q or QUOTES is specified then BR automatically determines which quote type to apply as follows:
First the element is unpacked. That is, if it is contained in quotes, the quotes are stripped and embedded pairs are singled. Next the element is scanned left to right for either type of quote character ( single or double ). If a quote character is encountered the element is enclosed in the alternate quote type and embedded occurrences of that quote type are doubled. If no quote character is encountered then double quotes are applied.
Examples
Quote Type is Q or QUOTES
abcdef -> "abcdef"
abc'def -> "abc'def"
abc"def -> 'abc"def'
abc""def -> 'abc""def' embedded quotes are left intact when quotes are not active
'abcdef -> "'abcdef"

Quote Type is ' ( quote type single )
abcdef -> 'abcdef'
'abcdef -> '''abcdef' single quotes get doubled when embedded in single quotes
"abcdef -> '"abcdef' leading double quote is treated normally

Quote type double mirrors quote type single.
MAT2STR and STR2MAT trim outside of quotes but not inside of quotes. Also MAT2STR always adds quotes when quotes are present in the data.
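
For anyone who wants to check the MAT2STR quoting examples by hand, here is a Python model (not BR code) of the selection rule as described. The names `unpack`, `enclose` and `quote_element` are hypothetical. Applying the unpack step only for Q or QUOTES is an assumption, since the notes describe it only in that context. Trimming is left out.

```python
def unpack(element):
    """Strip enclosing quotes and single the embedded pairs, if quoted."""
    if len(element) >= 2 and element[0] in "'\"" and element[-1] == element[0]:
        q = element[0]
        return element[1:-1].replace(q + q, q)
    return element

def enclose(element, q):
    """Wrap element in quote character q, doubling embedded occurrences of q."""
    return q + element.replace(q, q + q) + q

def quote_element(element, quote_type="Q"):
    if quote_type.upper() in ("Q", "QUOTES"):
        element = unpack(element)
        # The first quote character found decides: use the alternate type.
        first = next((c for c in element if c in "'\""), None)
        q = "'" if first == '"' else '"'
        return enclose(element, q)
    return enclose(element, quote_type)    # explicit ' or " quote type
```

With the default quote type, `quote_element('abc"def')` gives `'abc"def'` in single quotes, and with quote type single, `quote_element("'abcdef", "'")` gives `'''abcdef'`, matching the examples above.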
When using MAT2STR on a two-dimensional array, the first delimiter is used between individual elements and the second delimiter is used at the end of each row. The same principle applies to arrays of three to seven dimensions.
Example
Given the following two dimensional array zzz$ containing the values-
1 2
3 4

The following statements-
10 Sep$(1)=","
20 Sep$(2)=hex$("0D0A") ! CRLF
30 MAT2STR( MAT zzz$, str$, MAT Sep$ )
40 PRINT str$

Will produce-
1,2
3,4

7b) CSV Parsing Example
Parsing CSV data files is now quite easy. The following code snippet demonstrates how to open a CSV or tab-delimited file, read the field names from the header, and then loop through the records.
01000 dim CSV_LINE$*999,CSV_FILE$*256, CSV_DELIM$*1, CSV_HEADER$*999, CSV_FIELDS$(1)*40, CSV_DATA$(1)*60
01020 form C," "
01040 let CSV_FILE$="Sample_File.tab" : let TAB$=CHR$(9)
01060 open #(CSV_HANDLE:=10): "name="&CSV_FILE$&",shr",display,input
01080 linput #CSV_HANDLE: CSV_HEADER$
01100 let CSV_DELIM$=TAB$
01120 if POS(CSV_HEADER$,TAB$) <= 0 then
01140 let CSV_DELIM$=","
01160 end if
01180 let STR2MAT(CSV_HEADER$, MAT CSV_FIELDS$, CSV_DELIM$, "QUOTES:TRIM")
01200 print using 1020: MAT CSV_FIELDS$
01220 do
01240 linput #CSV_HANDLE: CSV_LINE$ eof Exit_Csv
01260 let STR2MAT(CSV_LINE$,MAT CSV_DATA$,CSV_DELIM$,"Q:trim")
01280 print using 1020: MAT CSV_DATA$
01300 loop
01320 Exit_Csv: !

You might wish to copy any CSV file to Sample_File.tab and run this program to view the content.

7c) XML Parsing Examples

STR2MAT may also be used to parse XML data.
This is a bit more complex than parsing CSV files, but it remains a powerful tool.
The following example will parse XML$ into MAT XML_LINE$:

10 DIM XML$*999999,XML_LINE$(1)*32000
20 XML$="<XML><NODE><ITEM>ITEM VALUE</ITEM></NODE></XML>"
100 LET Str2mat(XML$,Mat XML_LINE$,">","TRIM")

This makes the parsing of XML a bit more convenient. The following sample shows how the function splits the data:

Input:
<XML><NODE><ITEM>ITEM VALUE</ITEM></NODE></XML>

Output:
<XML
<NODE
<ITEM
ITEM VALUE</ITEM
</NODE
</XML

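To see what can be done with elements in this shape, here is a Python sketch (not BR code) that performs the same split on ">" and then separates each element into its text and its tag. The names `split_on_gt` and `text_and_tag` are hypothetical. Dropping the empty piece after the final ">" is an assumption made to match the six-line output above.

```python
def split_on_gt(xml):
    """Split on '>' and trim each piece, mirroring the STR2MAT call above."""
    parts = [p.strip() for p in xml.split(">")]
    if parts and parts[-1] == "":
        parts.pop()        # assumption: no element after the final '>'
    return parts

def text_and_tag(part):
    """Return (text before the tag, tag name without its '<')."""
    text, _, tag = part.partition("<")
    return text, tag

xml = "<XML><NODE><ITEM>ITEM VALUE</ITEM></NODE></XML>"
for part in split_on_gt(xml):
    print(text_and_tag(part))
```

Each element carries at most one tag, so the element `ITEM VALUE</ITEM` comes apart as the text `ITEM VALUE` and the tag `/ITEM`.
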
While the above technique is useful, a more complete result is possible if the node names are known: you may pass an array of SEP$ values, one per closing tag, to parse the data. Take the following example:

100 dim XML$*999999,XML_LINE$(1)*32000,SEP$(4)*32
110 let XML$="<XML><NODE><ITEM>ITEM VALUE</ITEM><ITEM2>ITEM2 VALUE</ITEM2></NODE></XML>"
120 read MAT SEP$
130 data </XML>,</NODE>,</ITEM>,</ITEM2>
140 let STR2MAT(XML$,MAT XML_LINE$,MAT SEP$,"TRIM")
150 print MAT XML_LINE$

This program would return the following results:

<XML><NODE><ITEM>ITEM VALUE
<ITEM2>ITEM2 VALUE

Notice that the enclosing ("nested") node tags are listed before the data in each element; they may be used to identify the node.
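
As a sketch of how that identification might work, the Python below (not BR code) splits on runs of the known closing tags with a regular expression and then reads the node name from the last opening tag in front of each value. This is a simpler stand-in for the Sep$ array rule: it treats any run of closing tags as one separator and ignores the null-element case, which does not arise in this input. The names `split_on_closing_tags` and `node_and_value` are hypothetical.

```python
import re

def split_on_closing_tags(xml, closing_tags):
    """Split xml wherever one or more of the closing tags occur in a row."""
    pattern = "(?:" + "|".join(re.escape(t) for t in closing_tags) + ")+"
    parts = [p.strip() for p in re.split(pattern, xml)]
    if parts and parts[-1] == "":
        parts.pop()        # assumption: no element after the trailing tags
    return parts

def node_and_value(part):
    """Return (innermost node name, value) for one split element."""
    head, _, value = part.rpartition(">")
    return head.rpartition("<")[2], value

xml = ("<XML><NODE><ITEM>ITEM VALUE</ITEM>"
       "<ITEM2>ITEM2 VALUE</ITEM2></NODE></XML>")
tags = ["</XML>", "</NODE>", "</ITEM>", "</ITEM2>"]
for part in split_on_closing_tags(xml, tags):
    print(node_and_value(part))
```

The two elements come out as the pairs `('ITEM', 'ITEM VALUE')` and `('ITEM2', 'ITEM2 VALUE')`.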