Years ago (in 1999), I wrote a custom tag called CF_TAGSTRIPPER that would strip HTML tags from a source string, optionally preserving specific tags.
A few years ago, I converted it to a UDF, and I've used it in a variety of my applications to strip out all but a few tags, for example when I want to allow users to use bold and italic tags.
Recently, I've had a desire to strip out only certain tags, so I modified my tagStripper function to perform this task as well. Here's the code. Please let me know if you see anything wrong with it!
There are two versions of this UDF - one for CFMX and one for CF 5. They are functionally the same, so use whichever you prefer.
CFMX version, using the CFFUNCTION tag:
<cffunction name="tagStripper" access="public" output="no" returntype="string">
<cfargument name="source" required="YES" type="string">
<cfargument name="action" required="No" type="string" default="strip">
<cfargument name="tagList" required="no" type="string" default="">
<!---
source = string variable
This is the string to be modified
action = "preserve" or "strip"
This function will either strip all tags except
those specified in the tagList argument, or it will
preserve all tags except those in the taglist argument.
The default action is "strip"
tagList = string variable
This argument contains a comma-separated list of tags to be excluded from
the action. If the action is "strip", then these tags won't be stripped.
If the action is "preserve", then these tags won't be preserved (i.e., only
these tags will be stripped)
EXAMPLE
tagStripper(myString,"strip","b,i")
This invocation will strip all html tags except for
<b></b> and <i></i>
--->
<cfscript>
  var str = arguments.source;
  var i = 1;
  var tag = "";
  if (trim(lcase(arguments.action)) eq "preserve")
  {
    // strip only the tags in the tagList argument; the tag name must be followed
    // by ">" or a non-alphanumeric character, so "b" does not match <br> or <blockquote>
    for (i=1;i lte listlen(arguments.tagList); i = i + 1)
    {
      tag = listGetAt(arguments.tagList,i);
      str = REReplaceNoCase(str,"</?#tag#([^A-Za-z0-9>][^>]*)?>","","ALL");
    }
  } else {
    // if there are exclusions, mark them with NOSTRIP
    if (arguments.tagList neq "")
    {
      for (i=1;i lte listlen(arguments.tagList); i = i + 1)
      {
        tag = listGetAt(arguments.tagList,i);
        str = REReplaceNoCase(str,"<(/?#tag#([^A-Za-z0-9>][^>]*)?)>","___TEMP___NOSTRIP___\1___TEMP___ENDNOSTRIP___","ALL");
      }
    }
    // strip all remaining tags. This does NOT strip comments
    str = reReplaceNoCase(str,"</?[A-Z].*?>","","ALL");
    // convert excluded tags back to normal
    str = replace(str,"___TEMP___NOSTRIP___","<","ALL");
    str = replace(str,"___TEMP___ENDNOSTRIP___",">","ALL");
  }
  return str;
</cfscript>
</cffunction>
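One thing to watch in the per-tag patterns: a bare `</?#tag#.*?>` also matches any longer tag name that begins with the listed name, so `b` catches `<br>` and `<blockquote>` too. The sketch below compares that loose pattern with a stricter one that only accepts `>` or a non-alphanumeric character (a space, a `/`) right after the name. The sample string and variable names are made up for illustration, and the stricter pattern is offered as one possible fix rather than the only way to do it.

```cfml
<cfscript>
// Hypothetical input: one real <b> pair plus two tags whose names start with "b"
sample = "<b>x</b><br><blockquote>y</blockquote>";

// Loose pattern: "b" followed by anything up to the next ">"
looseResult = reReplaceNoCase(sample, "</?b.*?>", "", "ALL");

// Stricter pattern: after the name, allow only ">" directly, or a
// non-alphanumeric character followed by the rest of the tag
strictResult = reReplaceNoCase(sample, "</?b([^A-Za-z0-9>][^>]*)?>", "", "ALL");
</cfscript>
<cfoutput>
loose: #HTMLEditFormat(looseResult)#<br>
strict: #HTMLEditFormat(strictResult)#
</cfoutput>
```

The loose pattern reduces the sample to `xy`, removing `<br>` and the `<blockquote>` pair as well; the stricter pattern removes only the `<b>` pair and leaves `x<br><blockquote>y</blockquote>`.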
ColdFusion 5 compatible version:
<cfscript>
function tagStripper(str)
{
  var i = 1;
  var tag = '';
  var action = 'strip';
  var tagList = '';
  if (ArrayLen(arguments) gt 1 and lcase(arguments[2]) eq 'preserve')
  {
    action = 'preserve';
  }
  if (ArrayLen(arguments) gt 2)
  {
    tagList = arguments[3];
  }
  if (trim(lcase(action)) eq "preserve")
  {
    // strip only those tags in the tagList argument; the tag name must be followed
    // by ">" or a non-alphanumeric character, so "b" does not match <br> or <blockquote>
    for (i=1;i lte listlen(tagList); i = i + 1)
    {
      tag = listGetAt(tagList,i);
      str = REReplaceNoCase(str,"</?#tag#([^A-Za-z0-9>][^>]*)?>","","ALL");
    }
  } else {
    // strip all, except those in the tagList argument
    // if there are exclusions, mark them with NOSTRIP
    if (tagList neq "")
    {
      for (i=1;i lte listlen(tagList); i = i + 1)
      {
        tag = listGetAt(tagList,i);
        str = REReplaceNoCase(str,"<(/?#tag#([^A-Za-z0-9>][^>]*)?)>","___TEMP___NOSTRIP___\1___TEMP___ENDNOSTRIP___","ALL");
      }
    }
    // strip all remaining tags. This does NOT strip comments
    str = reReplaceNoCase(str,"</?[A-Z].*?>","","ALL");
    // convert unstripped back to normal
    str = replace(str,"___TEMP___NOSTRIP___","<","ALL");
    str = replace(str,"___TEMP___ENDNOSTRIP___",">","ALL");
  }
  return str;
}
</cfscript>
The same applies to quoted HTML attributes containing > characters, though I'm less concerned about that.
Neither issue would allow unwanted tags to get through the stripping process, but either would probably cause undesirable output.
Thanks to Ben Nadel for pointing out the nested comment issue.
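Here is a small sketch of the quoted-attribute case. It runs the function's catch-all pattern (`</?[A-Z].*?>`) on a made-up link whose `title` attribute contains a `>`. The lazy match stops at that first `>`, inside the attribute value, so part of the attribute text leaks into the output, but no complete tag survives. The sample string and variable names are hypothetical.

```cfml
<cfscript>
// Hypothetical input: the title attribute contains a ">" character
sample = '<a title="x > y" href="page.cfm">link</a>';

// Same catch-all pattern the function uses for its final pass
attrResult = reReplaceNoCase(sample, "</?[A-Z].*?>", "", "ALL");
</cfscript>
<cfoutput>#HTMLEditFormat(attrResult)#</cfoutput>
```

`attrResult` ends up as ` y" href="page.cfm">link` (with a leading space): undesirable output, but there is no `<` left in it.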
var tag = "";
// strip all remaining tags. This does NOT strip comments
str = reReplaceNoCase(str,"<[A-Z].*?>","","ALL");
str = reReplaceNoCase(str,"<\ */\ *[A-Z].*?>","","ALL");
str = reReplaceNoCase(str,"<\ *[A-Z].*?>","","ALL");
str = reReplaceNoCase(str,"<\ */\ *[A-Z].*?>","","ALL");
You can see that I modified the first line as well, in case people put whitespace between the opening "<" and the tag name.
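To see what the whitespace-tolerant pair of patterns changes, this sketch runs them (opening-tag pattern first, then closing-tag pattern) on a made-up string with spaces after `<` and `/`, alongside the original single pattern `</?[A-Z].*?>`. The sample string and variable names are hypothetical.

```cfml
<cfscript>
// Hypothetical input with whitespace after "<" (and after "/")
sample = "< b>bold< / b>";

// Original single pattern: needs a letter right after "<" or "</"
original = reReplaceNoCase(sample, "</?[A-Z].*?>", "", "ALL");

// Whitespace-tolerant pair: opening tags first, then closing tags
tolerant = reReplaceNoCase(sample, "<\ *[A-Z].*?>", "", "ALL");
tolerant = reReplaceNoCase(tolerant, "<\ */\ *[A-Z].*?>", "", "ALL");
</cfscript>
<cfoutput>
original: #HTMLEditFormat(original)#<br>
tolerant: #HTMLEditFormat(tolerant)#
</cfoutput>
```

The original pattern leaves the string untouched, while the tolerant pair reduces it to `bold`.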
So in ours, the first line stays the same and the second line is added, without accounting for spaces after the "<". Thanks for the HTML-for-noobs lesson.
I'm dreading that day!!
This is great, but it's still not working for me with closing HTML tags. I did add your code:
str = reReplaceNoCase(str,"<\ *[A-Z].*?>","","ALL");
str = reReplaceNoCase(str,"<\ */\ *[A-Z].*?>","","ALL");
but still no luck.
if (3 < a > 5 ) {
print 'what math world are you living in?';
}
So you're saying tagStripper isn't stripping the tag?
Hmmm....
// strip all remaining tags. This does NOT strip comments
str = reReplaceNoCase(str,"</?[A-Z].*?>","","ALL");
str = REReplaceNoCase(str,"<[^>]*>","","ALL");
This should delete all HTML tags and comments. Where you have content you want output as text, as in rickRoot's example with "< a >" in an equation, the user should be entering the &lt; and &gt; HTML entities, since the symbols < and > are now deprecated in HTML unless they are used as part of a tag.
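The trade-off between these patterns shows up on plain-text comparisons like the ones in this thread. This sketch runs the original catch-all pattern, the whitespace-tolerant opening-tag pattern and the generic `<[^>]*>` pattern on two made-up sample strings; the variable names are hypothetical.

```cfml
<cfscript>
// Two plain-text comparisons (hypothetical samples), not markup
mathA = "3 < a > 5";
mathB = "3 < 5 > 4";

// 1. Original catch-all: needs a letter right after "<" or "</"
origA = reReplaceNoCase(mathA, "</?[A-Z].*?>", "", "ALL");
origB = reReplaceNoCase(mathB, "</?[A-Z].*?>", "", "ALL");

// 2. Whitespace-tolerant opening-tag pattern: "< a >" now looks like a tag
wsA = reReplaceNoCase(mathA, "<\ *[A-Z].*?>", "", "ALL");
wsB = reReplaceNoCase(mathB, "<\ *[A-Z].*?>", "", "ALL");

// 3. Generic pattern: removes anything from "<" to the next ">"
anyA = reReplaceNoCase(mathA, "<[^>]*>", "", "ALL");
anyB = reReplaceNoCase(mathB, "<[^>]*>", "", "ALL");
</cfscript>
<cfoutput>
[#HTMLEditFormat(origA)#] [#HTMLEditFormat(origB)#]<br>
[#HTMLEditFormat(wsA)#] [#HTMLEditFormat(wsB)#]<br>
[#HTMLEditFormat(anyA)#] [#HTMLEditFormat(anyB)#]
</cfoutput>
```

Only the original pattern leaves both strings alone. The whitespace-tolerant pattern turns `3 < a > 5` into `3  5`, and the generic pattern additionally turns `3 < 5 > 4` into `3  4`.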
And asking users to type in an HTML entity isn't very user-friendly. Most people on the internet don't know what an HTML entity is, but perhaps usability isn't your concern.
(Hmm... does this work?)
3 < 5 > 4
