Accurate "non-deleted" record count

Mikhail
Posts: 87
Joined: Tue Jul 07, 2009 10:26 am
Location: Ukraine

Postby Mikhail » Mon Jan 11, 2016 8:52 am

I want to write a function that gives me an ACCURATE record count for a keyed BR file, NOT counting deleted records. We know that LREC counts deleted records, and we can't always run a COPY -D to purge them in a multiuser environment, because the file might be locked. So instead, here is a function that counts the entries in an INDEX file using KEYONLY. The number of entries in the index file is also the number of active records in the corresponding master file. My question is: is there an easier and, more importantly, FASTER way to do this?
(This function uses Gabriel Bakker's FileIO)

Code: Select all

dim form$(1)*255
!--------------------------------------------------------------------
def library fnrecordsinfile (layout$;___,keyform$,datafile,nextkey$*100,nextrec)
   datafile = fnopen(layout$, Mat Datarec$, Mat Datarec, mat form$,1,1)
   keyform$ = cform$("form C "&str$(kln(datafile))&",B 4")
   restore #datafile,search>='':
   for index=1 to lrec(datafile)
      read #datafile,using keyform$,keyonly: nextkey$,nextrec eof reccountdone
   next index
reccountdone:!
   close #datafile:
   fnrecordsinfile = index - 1
fnend
!--------------------------------------------------
def Fnopen(Filename$*255, Mat F$, Mat F, Mat Form$; Inputonly, Keynum, Dont_Sort_Subs, Path$*255, Mat Descr$, Mat Field_Widths,Supress_Prompt,Ignore_Errors,Suppress_Log,___,Index)
   library 'fileio' : Fnopenfile
   dim _Fileiosubs$(1)*800, _Loadedsubs$(1)*80
   Fnopen=Fnopenfile(Filename$, Mat F$, Mat F, Mat Form$, Inputonly, Keynum, Dont_Sort_Subs, Path$, Mat Descr$, Mat Field_Widths, Mat _Fileiosubs$, Supress_Prompt,Ignore_Errors,Program$,Suppress_Log)
   if Srch(_Loadedsubs$,Uprc$(Filename$))<=0 then : mat _Loadedsubs$(Udim(_Loadedsubs$)+1) : _Loadedsubs$(Udim(_Loadedsubs$))=Uprc$(Filename$) : for Index=1 to Udim(Mat _Fileiosubs$) : execute (_Fileiosubs$(Index)) : next Index
fnend
! --------------------------------------------------------------------

Gabriel
Posts: 371
Joined: Sun Aug 10, 2008 7:37 am
Location: Arlington, TX

Re: Accurate "non-deleted" record count

Postby Gabriel » Mon Jan 11, 2016 8:57 am

Awesome!

I don't know of a faster way. But I like your function.

It looks like you're only using FileIO to open the file, so that you can specify a layout name instead of a file name when you call the function. I should probably include your function directly in FileIO in future releases.

But it occurs to me that, since you're not actually using any of the special functionality of FileIO, it would be really easy to modify your function so that you can give it a file name instead, and then it's no longer dependent on FileIO.

It's also a lot faster to open a file directly than it is to open it through FileIO, in situations like this where you're not using the subscripts anyway. FileIO can be slow to open files, but it's usually worth it (when you're really reading the file and using the file subscripts) because of the programming, maintenance and support capabilities that it provides.

Gabriel

Mikhail
Posts: 87
Joined: Tue Jul 07, 2009 10:26 am
Location: Ukraine

Re: Accurate "non-deleted" record count

Postby Mikhail » Mon Jan 11, 2016 9:54 am

You mean like this? It DID turn out to be faster...

Code: Select all

!--------------------------------------------------------------------
def library fnrecordsinfile (flname$,kfname$;___,keyform$,datafile,nextkey$*100,nextrec)
   library 'fileio' : Fngetfilenumber
   open #datafile:=Fngetfilenumber:'name='&flname$&',kfname='&kfname$&',shr',internal,input,keyed
   keyform$ = cform$("form C "&str$(kln(datafile))&",B 4")
   restore #datafile,search>='':
   for index=1 to lrec(datafile)
      read #datafile,using keyform$,keyonly: nextkey$,nextrec eof reccountdone
   next index
reccountdone:!
   close #datafile:
   fnrecordsinfile = index - 1
fnend
!--------------------------------------------------

Gabriel
Posts: 371
Joined: Sun Aug 10, 2008 7:37 am
Location: Arlington, TX

Re: Accurate "non-deleted" record count

Postby Gabriel » Mon Jan 11, 2016 9:56 am

Exactly.

Though I see why you wanted to do it using the layout name instead of having to pass both a key and a master file name, and you also still need fnGetFileNumber.

When I add it to FileIO, I'll add both versions.

Gabriel

Mikhail
Posts: 87
Joined: Tue Jul 07, 2009 10:26 am
Location: Ukraine

Re: Accurate "non-deleted" record count

Postby Mikhail » Mon Jan 11, 2016 10:22 am

There are 2 problems with this function:
1) It only works with keyed files (but that's OK for me because most files I work with are keyed).
2) The way I generate the form statement assumes that the key spec is 'C', so if the key is actually N or B, or some other spec, I don't think it will work.

Code: Select all

"form C "&str$(kln(datafile))&",B 4"

John
Posts: 523
Joined: Sun Apr 26, 2009 8:27 am
Location: West Orange, NJ

Re: Accurate "non-deleted" record count

Postby John » Mon Jan 11, 2016 10:28 am

Wouldn't something like this be faster, simply because it isn't using the keyed file at all?

Code: Select all

open #h:=1: 'name=that_file,shr',input,internal,relative
real_record_count=0
lrec_it=lrec(h)
for item=1 to lrec_it
  read #h,rec,rec=item: norec whoops eof eof_1
  real_record_count+=1
  whoops: !
next item
eof_1: !
print real_record_count
John Bowman

Gabriel
Posts: 371
Joined: Sun Aug 10, 2008 7:37 am
Location: Arlington, TX

Re: Accurate "non-deleted" record count

Postby Gabriel » Mon Jan 11, 2016 10:46 am

John,

Mikhail was reading ONLY the key, not the primary file. Notice his use of the KEYONLY specification, which I must admit I didn't know about before this week.

You would think that would be faster, considering key files are much smaller than data files.

However .....


Mikhail,

Are you saying it really is faster to read the data file instead of the key? Or did you mean that it's faster to open the file without FileIO?


If it is indeed faster to read the data file than to read the key, you can make your function faster still by opening it "relative" instead of "keyed". You're not using the key and you're not changing the data file, so you don't need the key. Then your function becomes simpler to call, and also faster.

By the way, both FileIO and ScreenIO open files relative when they're trying to read the entire file, regardless of whether there is a key or not, because it's MUCH faster to do it that way (maybe twice as fast).


Also, I notice you're saying:

Code: Select all

restore #datafile,search>="":


why not just say:

Code: Select all

Restore #datafile:


Doesn't that do the same thing?


Also:

You're not actually using the LREC value. You might as well make that a DO loop with a counter that ends when you hit EOF.

Also, you're not actually doing anything with nextkey$, so change the read to: read #datafile, using "form X 1": eof reccountdone

Or if you use John's method, use "restore, rec=" instead of reading the file at all; it will be even faster still.

Gabriel


Gabriel
Posts: 371
Joined: Sun Aug 10, 2008 7:37 am
Location: Arlington, TX

Re: Accurate "non-deleted" record count

Postby Gabriel » Mon Jan 11, 2016 10:47 am

PS. I haven't tried the trick of reading nothing, but I assume it would be faster. If that doesn't work, try reading a single character or something, which would still be faster than reading a bunch of characters.

gordon
Posts: 309
Joined: Fri Apr 24, 2009 6:02 pm

Re: Accurate "non-deleted" record count

Postby gordon » Mon Jan 11, 2016 11:25 am

I suggest opening the master file INPUT relative or sequential, reading it with no form statement and no variables in the IO list, and counting records until end of file. The record length is irrelevant to the speed.
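
A minimal sketch of the sequential-read count described above (not code from the thread, and untested here). The function name fnCountActive and the file name in the usage line are hypothetical; fnGetFileNumber is borrowed from FileIO in the same way Mikhail's second version uses it, and the READ with an empty IO list follows the description above.

```br
! Sketch only: count active (non-deleted) records by reading the
! master file sequentially with no form and no variables.
def fnCountActive(flname$*256;___,datafile,reccount)
   library 'fileio' : Fngetfilenumber
   ! Open the data file itself, input only and shared; no key file needed
   open #datafile:=Fngetfilenumber:'name='&flname$&',shr',internal,input,sequential
   do
      ! Empty IO list: nothing is transferred, we only advance one record
      read #datafile: eof countdone
      reccount+=1
   loop
countdone: !
   close #datafile:
   fnCountActive = reccount
fnend
```

It would be called with just the data file name, for example: print fnCountActive("customer.dat"). Because only the master file is opened, the caller does not need to know the key file name or the key spec.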

Mikhail
Posts: 87
Joined: Tue Jul 07, 2009 10:26 am
Location: Ukraine

Re: Accurate "non-deleted" record count

Postby Mikhail » Mon Jan 11, 2016 11:52 am

I tried both John's approach and Gordon's approach.
They both work and they both took about 0.04 seconds for a 50 MB file with 35000 records.

Both approaches are MUCH faster than reading it the way I was doing it with KEYONLY - that took 0.4 seconds, 10 times slower!

I guess using KEYONLY makes sense if I need the values of the keys and the record numbers associated with those keys. But since this function needs neither the keys nor the record numbers, KEYONLY is not the way to go...

GomezL
Posts: 232
Joined: Wed Apr 29, 2009 5:51 am

Re: Accurate "non-deleted" record count

Postby GomezL » Mon Jan 11, 2016 4:47 pm

Just for fun, I wrote this function.

(I hard coded the file handle to 99.)
(I also hard coded the file name on line 120; substitute your own, e.g. "Your_File.Int".)


Total Records 1,797,879
Total Time 7.374938 seconds

Comes out to 243,782 records per second.

One thing about timing tests: you only get an accurate result the 1st time. The 2nd time I ran it, it took 2.8321784 seconds. (This result was consistent.)

Code: Select all

00100   PRINT Newpage
00110   LET Stime=Timer
00120   LET Total_Records=Fnrecordsinfile("ACTIVE.INT//6")
00130   LET Etime=Timer
00135   PRINT "Total Records:",Total_Records
00140   PRINT "Total Time:",Etime-Stime
01000   DEF Fnrecordsinfile(Flname$*256)
01010     DIM Temp_Index$*256
01020     LET Temp_Index$=Env$("TEMP")&"\RecordsInFile.[session]"
01030     EXECUTE "*Index "&Flname$&" "&Temp_Index$&" 1 1 REPLACE ISAM DUPKEYS SHR -n"
01040     OPEN #99: "NAME="&Temp_Index$,INTERNAL,OUTIN,RELATIVE
01050     LET Fnrecordsinfile=Lrec(99)
01060     CLOSE #99,FREE:
01070   FNEND
Attachment: F.DEV.CLSINC.DEV.FNRECORDSINFILE.wb-572.brs (598 Bytes)
Last edited by GomezL on Mon Jan 11, 2016 4:55 pm, edited 1 time in total.

