How to implement a full-text search on HTML documents with Microsoft SQL Server

This scenario is quite common:

We want to run a linguistic full-text search on text data stored in a database.

The only problem: the text data is HTML-formatted.

Therefore a stored text such as <strong>f</strong>oo will not be matched by the search term foo, because the markup splits the word in two.
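A quick way to see the mismatch is to compare a plain substring search on the raw HTML with the same search on the text left after removing the tags. The sketch below is only an illustration of the problem, not part of SQL Server; the helper name strip_tags is hypothetical, and its regex-based tag removal is deliberately naive.

```python
import re

def strip_tags(html: str) -> str:
    # Hypothetical, naive helper: drop anything that looks like a tag.
    # Fine for this illustration, not a real HTML parser.
    return re.sub(r"<[^>]*>", "", html)

stored = "<strong>f</strong>oo"

print("foo" in stored)              # False: the tag splits the word
print("foo" in strip_tags(stored))  # True: without tags, "foo" is contiguous
```

Note that removing every tag without inserting a space treats all markup as inline; block elements such as <p> would really need to act as word separators.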

So how can we implement an "HTML-insensitive" search with Microsoft SQL Server?

While searching for a solution, I could not find a complete guide or working sample showing how to do this, even though it is quite easy.

The trick is to create a full-text index using an HTML filter.
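Conceptually, a filter extracts the plain text from a document before the words are indexed, so tags never reach the index. As far as I know, in SQL Server the filter is selected per row by a TYPE COLUMN that holds a file extension such as .html for a varbinary(max) column, but treat that as an assumption to check against the documentation. The Python sketch below only emulates the idea (extract text, then build a word-to-document index); build_index and search are hypothetical names, and this is not how SQL Server implements it.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects only the character data of an HTML fragment, ignoring tags."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)

def build_index(docs):
    # Hypothetical inverted index: lowercase word -> set of document ids.
    index = defaultdict(set)
    for doc_id, html in docs.items():
        for word in re.findall(r"\w+", html_to_text(html).lower()):
            index[word].add(doc_id)
    return index

def search(index, term):
    return index.get(term.lower(), set())

docs = {1: "<p><strong>f</strong>oo bar</p>", 2: "<p>baz</p>"}
idx = build_index(docs)
print(search(idx, "foo"))     # {1}: matched despite the <strong> tag
print(search(idx, "strong"))  # set(): tag names are not indexed
```

Because only extracted text is indexed, a search for foo finds document 1, while markup such as the tag name strong is never matched.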

Here is a quick summary.
