I'm presenting tomorrow at CFinNC in Raleigh, and I've uploaded the presentation and examples here:
In an effort to resolve memory and performance problems with generating large CSV and tab delimited files in an application I wrote at Duke, I started hunting around for solutions.
Initially, I was using the java stringbuffer method, but found that it's really hard to be sure that CF doesn't use String objects, especially when doing things like calling out to an external function to perform formatting (ie, if the field is a string, then surround it with double quotes and escape any internal double quotes).
A simple file drop of 7200 rows and 140 columns took 68 seconds and sucked a lot of memory. And no, it wasn't the file writing that caused the problem, it was the call out to the formatting function.
If I performed the same drop using the tab delimited format, I didn't have to call out to that function, but the drop still took 30 seconds. I needed it to be faster because some of the drops my users perform are much much larger.
so I started hunting around for a java-based solution and found the JavaCSV library:
- Java CSV Homepage:
http://www.csvreader.com/java_csv.php - JavaCSV Sourceforge Project Page:
http://sourceforge.net/projects/javacsv/
After installing this library in my C:\Jrun4\servers\myInstance\cfusion.ear\cfusion.ear\WEB-INF/cfusion/lib directory and restarting Coldfusion, I was able to use the following code to generate my CSV files:
<cfset var fileOutput = createObject("java","com.csvreader.CsvWriter")>
<cfset fileOutput.init("#expandPath("..")#\drops\#filename#")>
<cfif format eq "TAB">
<cfset fileOutput.setDelimiter( javacast("char", " ") )>
</cfif>
<!--- write header --->
<cfloop from="1" to="#numFields#" index="i" step="1">
<cfset fileOutput.write( fieldsArray[i] ) >
</cfloop>
<!--- end of header row --->
<cfset fileOutput.endRecord()>
<!--- loop through results --->
<cfloop query="resultSet">
<!--- write record --->
<cfloop from="1" to="#numFields#" index="i" step="1">
<cfset fileOutput.write( resultSet[fieldsArray[i]][resultSet.currentRow].toString() )>
</cfloop>
<!--- write end of record --->
<cfset fileOutput.endRecord()>
</cfloop>
<cfset fileOutput.close()>
The same drop which had previously taken 68 seconds now only took 18 seconds - AND used considerably less memory.
As you can see, the code handles both CSV and tab-delimited formats AND handles the proper escaping of strings containing delimiters as well.
I was trying to speed up a process of mine that involves a large amount of string concatenation.
In other words, <cfset str = str & someOtherString>
Turns out there's a much faster way. In my case, I was dynamically generating an data file from a query. Each row had let's say 50 fields. Generation of each row required 3 string concatenations at a time (maybe more, depending on how it worked when compiled to the servlet). Do a drop of 5000 records, and well, that's a lot of string concats! 750,000 at a minimum.
Someone on cf-talk suggested using a StringBuffer might be wise because Coldfusion probably creates a new string object every time you do <cfset foo = foo & foo2>. So I decided to write some code to test it out.
Sure enough, using the java StringBuffer class is quite a lot faster. On my local machine, I ran the following code (see below). The CFML method completed in 89 seconds, while the java StringBuffer method completed in only 28 seconds - about 1/3 of the time!
Here's the code, if you're interested:
<cfset a = "">
<cfset starttime = now()>
<cfset cnt = 0>
<cfloop index="i" from="1" to="2000000">
<cfset a = a & "This is some crazy stuff. ">
<cfset cnt = cnt + 26>
<cfif i mod 1000 is 0>
<!--- write to file and reset string to empty --->
<cfset a = "">
</cfif>
</cfloop>
<cfoutput><p>Done with #cnt# bytes in #abs(dateDiff('s',Now(),starttime))# seconds.</p></cfoutput>
<cfset a = createObject("java","java.lang.StringBuffer")>
<cfset starttime = now()>
<cfset cnt = 0>
<cfloop index="i" from="1" to="2000000">
<cfset a.append("This is some crazy stuff. ")>
<cfset cnt = cnt + 26>
<cfif i mod 10000 is 0>
<!--- write to file and reset string to empty --->
<cfset a = createObject("java","java.lang.StringBuffer")>
</cfif>
</cfloop>
<cfoutput><p>Done with #cnt# bytes in #abs(dateDiff('s',Now(),starttime))# seconds.</p></cfoutput>
A question was posted on the cf-talk list (thread) about reading large files with CFFILE and problems they were having.
I suggested trying java to read the large file line by line and I posted the following code:
<cfsetting showdebugoutput="Yes">
<cfscript>
cnt = 0;
// large text file, 4MB, 80,000+ lines
srcFile = "E:\Inetpub\wwwroot\tools\mass_email\list.dat";
// create a FileReader object
fr = createObject("java","java.io.FileReader");
// Call the constructure with the source file path
fr.init(srcFile);
// create a BufferedReader object
br = createObject("java","java.io.BufferedReader");
// call the constructor with the FileReader as the arg
br.init(fr);
// read the first line
str = br.readLine();
// loop ... str will be undefined if there are no more lines
while (isDefined("str")) {
// do stuff with the string
cnt = cnt + 1;
// read the next line so we can continue the loop
str = br.readLine();
}
// close the buffered reader object
br.close();
writeOutput(cnt);
</cfscript>
The code above was tested on CFMX 7 and it does work. On my server, it consistently returns the results in about 400ms (ranging between 350ms and 500ms).
In order to compare, I wrote some CFML code that does essentially the same thing using CFFILE and looping through the file content as a list with chr(10) as the delimiter.
The CFFILE route was slower and much more erratic, ranging from 450ms to over 2000ms - probably averaging 1400ms in the 20-30 times I reloaded the page.
So if you're reading a large file and doing line by line processing - consider using native java rather than CFFILE.
