STR2MAT - Quote Processing

General development discussion.

Moderators: Susan Smith, admin, Gabriel

Post Reply
Susan Smith
Posts: 717
Joined: Sun Aug 10, 2008 4:24 am
Location: Southern California

STR2MAT - Quote Processing

Post by Susan Smith »

Can I have embedded quotes in strings where I use STR2MAT to parse? I do not understand the information on the wiki for MAT2STR Quote Processing. I am using BR 4.2 and I am parsing CSV file records into an array. One of the fields in the CSV file is a name with an embedded comma - in the format of LASTNAME, FIRSTNAME
In order to keep that embedded comma, I have encapsulated (surrounded) the name with double quotes as you can see directly below in the sample csv file. STR2MAT is ignoring the quotes and separating the line at the comma.

CSVFILE contents (the columns are account number, name, zipcode):
12345,"Anderson, Chloe",91105
12346,"Brown, Tom",91104

00010 DIM PART$(3)*100,QLINE$*200
00012 open #1,"Name=csvfile.csv",display,input
00020 LINPUT #CSVFILE: QLINE$ eof endfile
00030 LET STR2MAT(QLINE$,mat part$,",")
00040 pause
00050 goto 20
00060 endfile: pause

print part$(2)
"Anderson

print part$(3)
Chloe"

And there is a part$(4) - even though I dimmed it at (3) - which contains the zipcode 91105

Why wouldn't part$(2) be
Anderson, Chloe

and part$(3) be
91105

Perhaps I don't understand the documentation in terms of quote processing. Or is it that I cannot deal with quote processing in 4.2? If the latter is the case, I will use David Blankenship's FNPARSETEXT utility to parse them instead.

-- Susan
Gabriel
Posts: 412
Joined: Sun Aug 10, 2008 7:37 am
Location: Arlington, TX
Contact:

Re: STR2MAT - Quote Processing

Post by Gabriel »

You want to use the flag$ parameter to tell BR to handle the quotes for you.

http://brwiki2.brulescorp.com/index.php?title=STR2MAT

Try this:

Code: Select all

00030 LET STR2MAT(QLINE$,mat part$,",","Q")
or

Code: Select all

00030 LET STR2MAT(QLINE$,mat part$,",","Q:TRIM")
However, it does look like that ability is available in BR 4.3 only.

Gabriel
Gabriel
Posts: 412
Joined: Sun Aug 10, 2008 7:37 am
Location: Arlington, TX
Contact:

Re: STR2MAT - Quote Processing

Post by Gabriel »

You will still have problems with embedded Carriage Returns though, because LINPUT will only read up to the first Carriage Return.


If you're trying to read a CSV file, you can use this function if you like. Add line numbers using Lexi.

If people wanted, I was thinking of including this functionality into FileIO or something like it. So you can just "open" a CSV file using a function and it automatically reads the first line and uses that to make the subscripts. Then you can call "fnReadCSVFile" and it will automatically read a line at a time of a CSV file, including embedded CR's and handling Quotes properly, and you get back an Array which you can then use the CSV file subscripts to access.

Code: Select all

 
 dim CSVImportString$*30000
 dim CSV_TempImport$(1)*10000
 dim CSV_Temp$(1)*10000

 def library fnReadCSVFile(CsvFile,mat CsvFile$;Delimiter$,Quote$,___,Index,Length,TopIndex,EndOfFile)
    if Delimiter$="" then let Delimiter$=","
    if Quote$="" then let Quote$=""""
    mat CsvFile$=("")

    linput #CsvFile: CSVImportString$ eof Ignore

    ! if CSVImportString$(1:3)="463" then pause
    
    if file(CsvFile)=0 then
       mat Csv_TempImport$=("")

       if CSVImportString$="" then let CSVImportString$=" " ! Need at least one space.
       str2mat(CSVImportString$,mat Csv_TempImport$,Delimiter$)

       ! go through all of them, strip the quotes. If one has a front quote but no end quote, we need to
          ! put it together with the next one. If there is no next one, read another line and put it together with the next one.

       let Index=0
       do while Index<udim(mat Csv_TempImport$)
          let Index+=1

          let Csv_TempImport$(Index)=trim$(Csv_TempImport$(Index))

          if Csv_TempImport$(Index)(1:1)=Quote$ then
             let Length=len(Csv_TempImport$(Index))
             if Csv_TempImport$(Index)(Length:Length)=Quote$ and Length>1 then
                ! Strip the Quote$ off, and we're good.
                let Csv_TempImport$(Index)=Csv_TempImport$(Index)(1+len(Quote$):Length-len(Quote$))
             else
                ! We need to read the next one.
                if Index<udim(mat Csv_TempImport$) then
                   ! There's a next one, slide it into this one and keep going
                   let Csv_TempImport$(Index)=Csv_TempImport$(Index)&","&Csv_TempImport$(Index+1)
                   ! Go through the rest of them and slide them up one

                   let fnMoveUpOne(Index+1,mat Csv_TempImport$)
                   let Index-=1 ! Slide back one to retest now that they're put together
                else
                   ! We need to read the next line and put it together and keep going.
                   linput #CsvFile: CSVImportString$ eof Ignore
                   if file(CsvFile)=0 then
                      mat CSV_Temp$=("")
                      if CSVImportString$="" then let CSVImportString$=" " ! Need at least one space.
                      str2mat(CSVImportString$,mat CSV_Temp$,Delimiter$)

                      let TopIndex=udim(mat Csv_TempImport$)
                      mat Csv_TempImport$(TopIndex+udim(mat Csv_Temp$))
                      mat Csv_TempImport$(TopIndex+1 : udim(mat Csv_TempImport$))=Csv_Temp$

                      let Csv_TempImport$(Index)=Csv_TempImport$(Index)&hex$("0A0D")&Csv_TempImport$(Index+1)
                      let fnMoveUpOne(Index+1,mat Csv_TempImport$)

                      let Index-=1 ! Slide back one to retest now that they're put together
                   else
                      ! We hit the end of the file, and we're still in open quote territory .. what do we do here?
                      let EndOfFile=1
                   end if
                end if
             end if
          else
             ! We have no quotes, not a quoted string, keep going.
          end if
       loop until EndOfFile

       if ~EndOfFile then
          mat CsvFile$(udim(mat Csv_TempImport$))=Csv_TempImport$
          let fnReadCSVFile=1
       end if
    end if
 fnend
 
 def fnMoveUpOne(Index,Mat Array$;___,Jndex)
    for Jndex=Index to udim(mat Array$)-1
       let Array$(Jndex)=Array$(Jndex+1)
    next Jndex
    mat Array$(udim(mat Array$)-1)
 fnend
 
Susan Smith
Posts: 717
Joined: Sun Aug 10, 2008 4:24 am
Location: Southern California

Re: STR2MAT - Quote Processing

Post by Susan Smith »

Thanks Gabe. Before I saw your reply, I changed my program to use David's FNPARSETEXT routine. It handles all of the quotes beautifully so I just had to drop (one line) it in where I had previously had the STR2MAT command. It wasn't apparent to me, from the wiki description, exactly WHAT was in BR 4.3 and what was in BR 4.2 in terms of this command. So I couldn't tell if it didn't work because I didn't have the FLAG$ parameter set up, or I made some other mistake, or it wasn't in my BR version at all. Since I don't know what my data could like (there could be an embedded comma in a CSV file), it seems that it's probably not a good idea to try to use STR2MAT in BR 4.2 at all then...just in case it stumbles over an unexpected comma.

-- Susan
gordon
Posts: 358
Joined: Fri Apr 24, 2009 6:02 pm

Re: STR2MAT - Quote Processing

Post by gordon »

To elaborate a tag on Gabriel's comments.

I find it highly unusual to have newlines or CRLF combinations embedded in valid CSV data. However, the operative word is there is "valid". If you think your users will have CRLF characters in their data and you don't want such characters to denote end-of-input-line, then using Gabriel's function may be your best bet. It is possible to still use STR2MAT for such data, but it requires a bit of special processing and why bother if there is a tool for it.

Otherwise, for the normal case, the 4.3 "Q" flag is sufficient to handle things like embedded quotes and other embedded delimiters such as commas.
Susan Smith
Posts: 717
Joined: Sun Aug 10, 2008 4:24 am
Location: Southern California

Re: STR2MAT - Quote Processing

Post by Susan Smith »

Gordon,

In this particular case, I won't have to worry about embedded CR or LF because it's an export from another software package. But what I DO have to deal with is that I am using BR 4.2, so the "Q" flag is not available to me. Luckily, my alternate solution is working beautifully now, so I'm good. But for those who ARE using BR 4.3, they would have the ability to handle the parsing with STR2MAT and the "Q" flag.

-- Susan
GomezL
Posts: 258
Joined: Wed Apr 29, 2009 5:51 am
Contact:

Re: STR2MAT - Quote Processing

Post by GomezL »

CRLF in CSV files are used when a "Memo or Note Record" is included in the data. It does add an extra complication for processing CSV files.

BR 4.3, really does make it much easier to parse CSV data.

http://brwiki2.brulescorp.com/index.php ... _.284.3.29
http://brwiki2.brulescorp.com/index.php ... Processing

My recommendation is to create a function in BR4.2 that uses similar processing, so that one day, you can just "Update the function" to leverage the BR 4.3 functionality.
Gabriel
Posts: 412
Joined: Sun Aug 10, 2008 7:37 am
Location: Arlington, TX
Contact:

Re: STR2MAT - Quote Processing

Post by Gabriel »

Memo fields are common in exports from other programs, including Access and Excel (the two most common programs people export from in CSV format.)

In order to support CRLF you have to do complicated logic during which its very easy to also support Quotes.

The problem with CRLF is that it requires complicated logic around your read statement. Gordon could perhaps address this by updating LINPUT to intelligently check quotes but he probably has more important things to fix first, such as Local Variables, graphical glitches, unhandled exceptions, and many others.

So once you write your function to handle CRLF, you might as well just use that function everywhere.

The function I posted in this thread yesterday supports CRLF, supports Quotes and Embedded Commas, even under BR 4.2.

But its not a simple replacement for Str2mat because it can't be, because embedded CRLF requires special logic around the read statement.

Gabriel
gordon
Posts: 358
Joined: Fri Apr 24, 2009 6:02 pm

Re: STR2MAT - Quote Processing

Post by gordon »

I like the suggestion of changing LINPUT (with a retro OPTION of course) to read and append the next "line" if quotes are unbalanced.

That would make STR2MAT sufficient for all cases that I'm aware of.
GomezL
Posts: 258
Joined: Wed Apr 29, 2009 5:51 am
Contact:

Re: STR2MAT - Quote Processing

Post by GomezL »

There probably needs to be some type of option in the open statement, and of course the crlf needs to be retained in the line that is being read.
Post Reply