04 June 2008
JavaCSV for creating large CSV and other delimited files with ColdFusion

In an effort to resolve memory and performance problems when generating large CSV and tab-delimited files in an application I wrote at Duke, I started hunting around for solutions.

Initially, I was building the output with a Java StringBuffer, but it's really hard to be sure that CF isn't creating String objects along the way, especially when calling out to an external function to perform formatting (i.e., if the field is a string, surround it with double quotes and escape any internal double quotes).
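For reference, here is a minimal sketch of what such a formatting function does, written in plain Java rather than CFML. The class and method names are hypothetical, and it assumes the usual CSV convention of escaping an embedded double quote by doubling it; it is not the function from my application.

```java
// Hypothetical helper, for illustration only.
final class CsvQuote {
    // Wraps a field in double quotes and doubles any embedded double quotes
    // (assumed escaping convention: " becomes "").
    static String quote(String field) {
        StringBuilder sb = new StringBuilder(field.length() + 2);
        sb.append('"');
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c == '"') {
                sb.append('"'); // emit the extra quote that escapes this one
            }
            sb.append(c);
        }
        sb.append('"');
        return sb.toString();
    }
}
```

Calling something like this once per field is cheap in Java, but from CFML every call crosses the function boundary and hands back a new string, which is where the time went.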

A simple file drop of 7200 rows and 140 columns took 68 seconds and sucked a lot of memory.  And no, it wasn't the file writing that caused the problem, it was the call out to the formatting function.

If I performed the same drop using the tab-delimited format, I didn't have to call out to that function, but the drop still took 30 seconds.  I needed it to be faster because some of the drops my users perform are much, much larger.

So I started hunting around for a Java-based solution and found the JavaCSV library.

After installing this library in my C:\Jrun4\servers\myInstance\cfusion.ear\cfusion.ear\WEB-INF\cfusion\lib directory and restarting ColdFusion, I was able to use the following code to generate my CSV files:

<cfset var fileOutput = createObject("java","com.csvreader.CsvWriter")>

<cfset fileOutput.init("#expandPath("..")#\drops\#filename#")>

<cfif format eq "TAB">
     <cfset fileOutput.setDelimiter( javacast("char", chr(9)) )>
</cfif>

<!--- write header --->
<cfloop from="1" to="#numFields#" index="i" step="1">
     <cfset fileOutput.write( fieldsArray[i] ) >
</cfloop>
<!--- end of header row --->
<cfset fileOutput.endRecord()>
<!--- loop through results --->
<cfloop query="resultSet">
     <!--- write record --->
     <cfloop from="1" to="#numFields#" index="i" step="1">
          <cfset fileOutput.write( resultSet[fieldsArray[i]][resultSet.currentRow].toString() )>
     </cfloop>
     <!--- write end of record --->
     <cfset fileOutput.endRecord()>
</cfloop>
<cfset fileOutput.close()>

The same drop which had previously taken 68 seconds now only took 18 seconds - AND used considerably less memory.

As you can see, the code handles both CSV and tab-delimited formats AND handles the proper escaping of strings containing delimiters as well.

Posted by rickroot at 8:03 AM | Link | 1 comment
03 June 2008
cfsavecontent vs. cfset for performance improvement

Many CF programmers know that ColdFusion usually stores its variables as Java String objects.  And since Java strings are "immutable", every time you change one, a new string is created.

If you find yourself doing huge numbers of string concatenations, you'll often see people suggest using the Java StringBuffer object instead.  That lets you append to a single StringBuffer object rather than creating a million String objects.
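To make the difference concrete, here is a small sketch of the two approaches in plain Java. The class and method names are made up for illustration, and it says nothing about how ColdFusion implements either tag internally.

```java
// Hypothetical demo class, for illustration only.
final class ConcatDemo {
    // Repeated concatenation: every pass builds a brand-new String
    // and copies everything accumulated so far.
    static String withString(int reps) {
        String result = "";
        for (int i = 1; i <= reps; i++) {
            result = result + i;
        }
        return result;
    }

    // StringBuffer: appends go into one growable buffer,
    // and a String is created only once at the end.
    static String withBuffer(int reps) {
        StringBuffer sb = new StringBuffer();
        for (int i = 1; i <= reps; i++) {
            sb.append(i);
        }
        return sb.toString();
    }
}
```

Both methods return the same text; only the amount of copying and garbage created along the way differs.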

But there's another solution, apparently.

CFSAVECONTENT is so ridiculously fast compared to the old string concatenation method with CFSET that it has got to be using a StringBuffer behind the scenes.  At least, that's what I'm thinking.

Take the following code, for example.  On my local machine, the CFSET method took 64 seconds to complete.  The CFSAVECONTENT method completed in a mere 203ms.

Also, the memory consumption of the CFSET method was significant, while the CFSAVECONTENT method was hardly noticeable.

<cfsetting enablecfoutputonly="yes">
<cfsetting requesttimeout="600">
<cfset reps = 100000>

<!--- change 1 to 0 to time the CFSAVECONTENT branch instead --->
<cfif 1>
     <cfset start = now().gettime()>
     <cfset result = "">
     <cfloop from="1" to="#reps#" step="1" index="i">
          <cfset result = result & i>
     </cfloop>
     <cfset end = now().gettime()>
     <cfoutput><p>#end-start#ms : #len(result)#</p></cfoutput>
<cfelse>
     <cfset start = now().gettime()>
     <cfsavecontent variable="result">
     <cfloop from="1" to="#reps#" step="1" index="i">
          <cfoutput>#i#</cfoutput>
     </cfloop>
     </cfsavecontent>
     <cfset end = now().gettime()>
     <cfoutput><p>#end-start#ms : #len(result)#</p></cfoutput>
</cfif>

Posted by rickroot at 1:23 PM | Link | 1 comment
18 July 2006
Asynchronous HTTP from ColdFusion
Ever wanted to make an HTTP request but you didn't really care whether or not it returned anything successfully?

You can't do that with CFHTTP... well, you can, but you would still have to wait for it to finish before your page continued - it's not asynchronous.

Well, your prayers are answered.  Mark Mandel just posted his AsyncHTTP package, which uses existing Java classes to perform asynchronous HTTP GET and POST operations.

You can find out more here:
http://www.compoundtheory.com/?action=asynchttp.index
Posted by rickroot at 6:06 PM | Link | 0 comments
12 July 2006
Reading large files with java versus CFFILE

A question was posted on the cf-talk list about problems reading large files with CFFILE.

I suggested using Java to read the large file line by line, and I posted the following code:

<cfsetting showdebugoutput="Yes">
<cfscript>
 cnt = 0;
 // large text file, 4MB, 80,000+ lines
 srcFile = "E:\Inetpub\wwwroot\tools\mass_email\list.dat";
 // create a FileReader object
 fr = createObject("java","java.io.FileReader");
 // call the constructor with the source file path
 fr.init(srcFile);
 // create a BufferedReader object
 br = createObject("java","java.io.BufferedReader");
 // call the constructor with the FileReader as the arg
 br.init(fr);
 // read the first line
 str = br.readLine();
 // loop ... str will be undefined if there are no more lines
 while (isDefined("str")) {
  // do stuff with the string
  cnt = cnt + 1;
  // read the next line so we can continue the loop
  str = br.readLine();
 }
 // close the buffered reader object
 br.close();
 writeOutput(cnt);
</cfscript>

The code above was tested on CFMX 7 and it does work.  On my server, it consistently returns the results in about 400ms (ranging between 350ms and 500ms).

In order to compare, I wrote some CFML code that does essentially the same thing using CFFILE and looping through the file content as a list with chr(10) as the delimiter.

The CFFILE route was slower and much more erratic, ranging from 450ms to over 2000ms - probably averaging 1400ms in the 20-30 times I reloaded the page.

So if you're reading a large file and doing line-by-line processing - consider using native Java rather than CFFILE.
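For anyone more comfortable reading Java than CFML, the same loop looks roughly like this in plain Java. It is only a sketch: the class and method names are hypothetical. In Java, readLine() returns null at end of file, which is why the CFML version tests isDefined("str").

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper, for illustration only.
final class LineCounter {
    // Reads the file one line at a time and returns the number of lines.
    static int countLines(String path) throws IOException {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            int cnt = 0;
            while (br.readLine() != null) {
                // do stuff with the line here
                cnt++;
            }
            return cnt;
        }
    }
}
```

Only one line is held in memory at a time, whereas the CFFILE approach I compared against loads the whole file into a variable before looping over it.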

Posted by rickroot at 2:15 PM | Link | 1 comment