28 November 2006
Using ColdFusion to strip some or all HTML tags from a string

Years ago (in 1999), I wrote a custom tag called CF_TAGSTRIPPER that strips HTML tags from a source string, optionally preserving specific tags.

A few years ago, I converted it to a UDF, and I've used it in a variety of my applications to strip out all but a few tags, for example when I wanted to allow users to use bold and italic tags.

Recently, I wanted to strip out only certain tags, so I modified my tagStripper function to do that as well. Here's the code. Please let me know if you see anything wrong with it!

There are two versions of this UDF - one for CFMX and one for CF 5.  They are functionally the same, so use whichever you prefer.

CFMX Version using CFFUNCTION tag:

<cffunction name="tagStripper" access="public" output="no" returntype="string">
    <cfargument name="source" required="YES" type="string">
    <cfargument name="action" required="No" type="string" default="strip">
    <cfargument name="tagList" required="no" type="string" default="">
   
<!---
    source = string variable
        This is the string to be modified
       
    action = "preserve" or "strip"
        This function will either strip all tags except
        those specified in the tagList argument, or it will
        preserve all tags except those in the taglist argument.
        The default action is "strip"

    tagList = string variable
        This argument contains a comma separated list of tags to be excluded from
        the action.  If the action is "strip", then these tags won't be stripped.
        If the action is "preserve", then these tags won't be preserved (ie, only
        these tags will be stripped)
       
    EXAMPLE
   
    tagStripper(myString,"strip","b,i")
   
    This invocation will strip all html tags except for
    <b></b> and <i></i>
--->
    <cfscript>
    var str = arguments.source;
    var i = 1;
    // declare tag locally so it does not leak into the caller's variables scope
    var tag = "";
   
    if (trim(lcase(action)) eq "preserve")
    {
        // strip only the exclusions
        for (i=1;i lte listlen(arguments.tagList); i = i + 1)
        {
            tag = listGetAt(tagList,i);
            str = REReplaceNoCase(str,"</?#tag#.*?>","","ALL");
        }
    } else {
        // if there are exclusions, mark them with NOSTRIP
        if (tagList neq "")
        {
            for (i=1;i lte listlen(tagList); i = i + 1)
            {
                tag = listGetAt(tagList,i);
                str = REReplaceNoCase(str,"<(/?#tag#.*?)>","___TEMP___NOSTRIP___\1___TEMP___ENDNOSTRIP___","ALL");
            }
        }
        str = reReplaceNoCase(str,"</?[A-Z].*?>","","ALL");
        // convert excluded tags back to normal
        str = replace(str,"___TEMP___NOSTRIP___","<","ALL");
        str = replace(str,"___TEMP___ENDNOSTRIP___",">","ALL");
    }
   
    return str;   
    </cfscript>
</cffunction>
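
Here is a quick usage sketch covering the three common calls. The sample string and the variable names are made up for illustration, and the results shown in the comments come from tracing the regular expressions by hand, so treat them as expected rather than guaranteed output.

```cfml
<cfscript>
sample = '<p>Hello <b>bold</b> and <i>italic</i> <a href="x.html">link</a></p>';

// strip everything except <b> and <i>
keepBoldItalic = tagStripper(sample, "strip", "b,i");
// -> Hello <b>bold</b> and <i>italic</i> link

// strip only <a> tags and keep everything else
noLinks = tagStripper(sample, "preserve", "a");
// -> <p>Hello <b>bold</b> and <i>italic</i> link</p>

// defaults (action "strip", empty tagList): strip every tag
plainText = tagStripper(sample);
// -> Hello bold and italic link
</cfscript>
```

When writing the results to a page for inspection, wrap them in htmlEditFormat() so the surviving tags display as text instead of being rendered.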

ColdFusion 5 Compatible Version:

<cfscript>
function tagStripper(str)
{
    var i = 1;
    var action = 'strip';
    var tagList = '';
    // declare tag locally so it does not leak into the caller's variables scope
    var tag = '';
   
    if (ArrayLen(arguments) gt 1 and lcase(arguments[2]) eq 'preserve')
    {
        action = 'preserve';
    }
    if (ArrayLen(arguments) gt 2)
    {
        tagList = arguments[3];
    }

    if (trim(lcase(action)) eq "preserve")
    {
        // strip only those tags in the tagList argument
        for (i=1;i lte listlen(tagList); i = i + 1)
        {
            tag = listGetAt(tagList,i);
            str = REReplaceNoCase(str,"</?#tag#.*?>","","ALL");
        }
    } else {
        // strip all, except those in the tagList argument
        // if there are exclusions, mark them with NOSTRIP
        if (tagList neq "")
        {
            for (i=1;i lte listlen(tagList); i = i + 1)
            {
                tag = listGetAt(tagList,i);
                str = REReplaceNoCase(str,"<(/?#tag#.*?)>","___TEMP___NOSTRIP___\1___TEMP___ENDNOSTRIP___","ALL");
            }
        }
        // strip all remaining tags. This does NOT strip comments
        str = reReplaceNoCase(str,"</?[A-Z].*?>","","ALL");
        // convert unstripped back to normal
        str = replace(str,"___TEMP___NOSTRIP___","<","ALL");
        str = replace(str,"___TEMP___ENDNOSTRIP___",">","ALL");
    }
   
    return str;   
}
</cfscript>
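
One caveat with the per-tag patterns in both versions: `</?#tag#.*?>` matches any tag whose name merely starts with the listed name. Listing "b" therefore also matches `<br>` and `<body>`, and listing "i" also matches `<img>` and `<iframe>`. The sketch below shows one possible way to tighten the match. The helper name exactTagPattern is hypothetical and not part of the original UDF, and the `\b` word boundary assumes the Perl-style regular expression support in CFMX or later.

```cfml
<cfscript>
// Hypothetical helper (not part of the original UDF): builds a pattern that
// matches only the named tag. \b requires a word boundary right after the
// name, so "b" no longer matches <br> and "i" no longer matches <img>.
function exactTagPattern(tagName)
{
    return "</?" & trim(tagName) & "\b[^>]*>";
}

sample = '<b>bold</b><br><i>it</i><img src="x.gif">';

// original pattern: the prefix match also removes <br>
loose = REReplaceNoCase(sample, "</?b.*?>", "", "ALL");
// -> bold<i>it</i><img src="x.gif">

// bounded pattern: only <b> and </b> are removed
exact = REReplaceNoCase(sample, exactTagPattern("b"), "", "ALL");
// -> bold<br><i>it</i><img src="x.gif">
</cfscript>
```

The trim() call is there only so that a list written as "b, i" does not carry a leading space into the pattern.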

 

Posted by rickroot at 3:45 PM | 12 comments

Re: Using coldfusion to strip some or all HTML tags from a string
I had originally posted code that also stripped (or preserved) comments. However, because HTML comments can be nested, it would be unreliable. Not dangerous, but unreliable.

The same applies for quoted HTML attributes containing > characters - however I'm less concerned about that.

Neither issue would allow unwanted tags to get through the stripping process, but either would probably cause undesirable output.
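
To illustrate the quoted-attribute case, here is a small sketch using the same catch-all pattern as the function. The sample string is made up, and the result in the comment comes from tracing the pattern by hand.

```cfml
<cfscript>
// an attribute value containing ">" ends the lazy match early
sample = '<a title="a > b" href="x.html">link</a>';
result = REReplaceNoCase(sample, "</?[A-Z].*?>", "", "ALL");
// ->  b" href="x.html">link
// no tag survives, but the rest of the attribute text is left behind
</cfscript>
```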

Thanks to Ben Nadel for pointing out the nested comment issue.
Posted by rickroot on November 28, 2006 at 5:27 PM

Re: Using coldfusion to strip some or all HTML tags from a string
In order to ensure thread safety, I suggest adding the following to the top of the method:

var tag = "";
Posted by dshuck on November 23, 2007 at 11:28 PM

Re: Using coldfusion to strip some or all HTML tags from a string
Thanks!
Posted by rickroot on November 28, 2007 at 8:49 AM

Re: Using coldfusion to strip some or all HTML tags from a string
One other thing I might mention. We are using this on the new version of InstantSpot that we will be releasing in a few weeks. In addition to that threading concern, I found that the function was leaving closing HTML tags behind, which allowed the content to break out of containers and kill formatting. I added an additional line to remove them. Here is what ours looks like now:
// strip all remaining tags. This does NOT strip comments
str = reReplaceNoCase(str,"<[A-Z].*?>","","ALL");
str = reReplaceNoCase(str,"<\ */\ *[A-Z].*?>","","ALL");
Posted by dshuck on November 28, 2007 at 9:08 AM

Re: Using coldfusion to strip some or all HTML tags from a string
oops... sorry that should be:
str = reReplaceNoCase(str,"<\ *[A-Z].*?>","","ALL");
str = reReplaceNoCase(str,"<\ */\ *[A-Z].*?>","","ALL");

You can see that I modded the first line as well in case people put some whitespace between the opening of the tag and the actual tag text.
Posted by dshuck on November 28, 2007 at 9:12 AM

Re: Using coldfusion to strip some or all HTML tags from a string
That's interesting, but I don't think any browsers will render "< b>" as an html tag.
Posted by rickroot on November 28, 2007 at 1:05 PM

Re: Using coldfusion to strip some or all HTML tags from a string
Hmm... looks like you are right. Please let it be noted that I just made my first mistake ever! :)

So, first line stays the same, and second line is added in ours without accounting for a space(s) after the "<". Thanks for the HTML for noobs lesson.
Posted by dshuck on November 28, 2007 at 1:15 PM

Re: Using coldfusion to strip some or all HTML tags from a string
We all have to face the time where we make our first mistake ever at some point ;)

I'm dreading that day!!
Posted by rickroot on November 29, 2007 at 6:24 AM

Re: Using coldfusion to strip some or all HTML tags from a string
Hi,

This is great, but it's still not working for me with the closing HTML tags. I did add your code:
str = reReplaceNoCase(str,"<\ *[A-Z].*?>","","ALL");
str = reReplaceNoCase(str,"<\ */\ *[A-Z].*?>","","ALL");
but still no luck.
Posted by whoey on February 17, 2008 at 7:09 PM

Re: Using coldfusion to strip some or all HTML tags from a string
You definitely don't want to use those two lines of code that dshuck suggested, because they would cause the editor to strip things that would NEVER be treated as HTML anyway, like

if (3 < a > 5 ) {
print 'what math world are you living in?';
}

So you're saying tagStripper isn't stripping the tag?

Hmmm....
Posted by rickroot on February 17, 2008 at 8:11 PM

Re: Using coldfusion to strip some or all HTML tags from a string
(it certainly seems to be!) Cuz I typed it in there between "the" and "tag", and I think blogcfm uses the tagStripper function.
Posted by rickroot on February 17, 2008 at 8:12 PM

Re: Using coldfusion to strip some or all HTML tags from a string
Oh hell, never mind, here's the fix.

// strip all remaining tags. This does NOT strip comments
str = reReplaceNoCase(str,"","","ALL");
Posted by rickroot on February 17, 2008 at 8:17 PM
