In this article, we will explore how to safely strip and render HTML while preserving special characters like the ampersand (&), using strip_tags and Loofah.

Recently, I was working on a legacy project, displaying some old data stored in the db in some new views.

The data stored in the db came from a legacy WYSIWYG editor and had not been sanitised before saving. It contained a mixture of escaped HTML tags (such as &lt;script&gt;) and raw tags.

To render the content safely, I had to come up with a custom sanitizer built on top of the ActionView helpers and Loofah.
For example, take the following text:

text = "<p class='test'>Someone hacked Terms & Conditions with <script>alert('hello')</script> &amp; &lt;script&gt;alert('hi')&lt;script&gt;</p>"

If we use strip_tags from ActionView, we get the following:

strip_tags(text)
# => "Someone hacked Terms &amp; Conditions with  &amp; &lt;script&gt;alert('hi')&lt;script&gt;"

But notice that the & also got escaped to &amp; here, and I needed a way to get the & back unescaped.

Loofah to the rescue

After doing some research, I figured out that Loofah can be used. So for the same example:

text = "<p class='test'>Someone hacked Terms & Conditions with <script>alert('hello')</script> &amp; &lt;script&gt;alert('hi')&lt;script&gt;</p>"

Loofah.fragment(text).text(encode_special_chars: false)
# => "Someone hacked Terms & Conditions with alert('hello') & <script>alert('hi')<script>"

Works like a charm! Or does it?

At first glance it seems to do what I wanted, but there is a serious XSS issue with the above. Setting encode_special_chars to false means that no other special characters are encoded either, so markup that was safely escaped in the input comes back as live markup.

e.g. &lt;script&gt; in the input comes back as a real <script> tag.
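
To make the risk concrete, here is a minimal sketch of what decoding entities does to escaped markup. It uses CGI.unescapeHTML from Ruby's standard library as a stand-in for the decoding step; that is an assumption made for illustration, not what Loofah runs internally, although the effect on these entities is the same.

```ruby
require "cgi/escape"

# Escaped markup is inert: a browser shows it as literal text.
escaped = "&lt;script&gt;alert('hi')&lt;/script&gt;"

# Decoding the entities turns the same characters back into a real tag.
decoded = CGI.unescapeHTML(escaped)

puts decoded
```

If a string like this is later marked as HTML-safe and rendered, the browser would treat it as a script element rather than as text.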

Solution

strip_tags from ActionView stripped out the HTML tags perfectly, and the text helper from Loofah gave back the unescaped text perfectly. If only there was a way to combine both.

You wish!

The solution I came up with was to alternate strip_tags and the Loofah helper, running each of them twice.

So for our original text:

text = "<p class='test'>Someone hacked Terms & Conditions with <script>alert('hello')</script> &amp; &lt;script&gt;alert('hi')&lt;script&gt;</p>"

# to_text(text) is shorthand for Loofah.fragment(text).text(encode_special_chars: false)
text = strip_tags(text)
# => "Someone hacked Terms &amp; Conditions with  &amp; &lt;script&gt;alert('hi')&lt;script&gt;"
text = to_text(text)
# => "Someone hacked Terms & Conditions with  & <script>alert('hi')<script>"
text = strip_tags(text)
# => "Someone hacked Terms &amp; Conditions with  &amp; "
text = to_text(text)
# => "Someone hacked Terms & Conditions with  & "

While I am not sure it's the most elegant solution, it does the trick for me.

To make it a bit more reusable, I wrapped it in a class:

module Extensions
  class RecursiveSanitizer
    include ActionView::Helpers::SanitizeHelper

    attr_accessor :text

    def initialize(text)
      @text = text
    end

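    # Two rounds of strip_tags + to_text: the second round removes any tags
    # that the first round's entity decoding turned back into live markup.
    def sanitize!
      to_text(strip_tags(to_text(strip_tags(text))))
    end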

    private

    def to_text(text)
      Loofah.fragment(text).text(encode_special_chars: false)
    end
  end
end

You can then call it from your code:

custom_sanitizer = Extensions::RecursiveSanitizer.new(text)
custom_sanitizer.sanitize!
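
The class above hard-codes two rounds. Input that was escaped more than once, such as &amp;lt;script&amp;gt;, would likely need a third round before the tag is gone. One possible generalization is to repeat the round until the output stops changing. The sketch below is only an illustration: sanitize_until_stable, max_passes and naive_pass are hypothetical names, and naive_pass is a standard-library stand-in for one strip_tags + to_text round so that the example runs without Rails or Loofah. It is not a real sanitizer.

```ruby
require "cgi/escape"

# Hypothetical helper (not part of Rails or Loofah): run one sanitizing
# round, supplied as a block, until a round leaves the text unchanged.
def sanitize_until_stable(text, max_passes: 10)
  max_passes.times do
    cleaned = yield(text)
    return cleaned if cleaned == text
    text = cleaned
  end
  # Fail closed rather than return text that is still changing.
  raise ArgumentError, "text still changing after #{max_passes} passes"
end

# Stand-in for a single strip_tags + to_text round, for illustration only:
# drop anything tag-shaped, then decode entities one level.
naive_pass = ->(t) { CGI.unescapeHTML(t.gsub(/<[^>]*>/, "")) }

doubly_escaped = "Terms &amp;amp; Conditions &amp;lt;b&amp;gt;bold&amp;lt;/b&amp;gt;"
puts sanitize_until_stable(doubly_escaped, &naive_pass)
# prints "Terms & Conditions bold"
```

Inside RecursiveSanitizer, the block would presumably be { |t| to_text(strip_tags(t)) }, so that sanitize! returns only once another round changes nothing.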

That's it. Enjoy, and happy coding!