This scenario is quite common:
We want to run a linguistic full-text search on text data stored in a database.
The only problem: the text data is HTML-formatted.
Therefore, a target text like <strong>f</strong>oo will not be matched by the pattern foo.
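To make the problem concrete, here is a minimal T-SQL sketch of a naive substring match against the raw markup. The variable names @html and @pattern are hypothetical and only serve the illustration; the inline DECLARE initialization assumes SQL Server 2008 or later.

```sql
-- Illustration only: @html and @pattern are hypothetical names.
DECLARE @html    NVARCHAR(100) = N'<strong>f</strong>oo';
DECLARE @pattern NVARCHAR(100) = N'foo';

-- LIKE compares against the stored characters, markup included,
-- so the tags split the word and the pattern is not found.
SELECT CASE WHEN @html LIKE N'%' + @pattern + N'%'
            THEN 'match'
            ELSE 'no match'
       END AS NaiveResult;  -- returns 'no match'
```

The rendered text reads foo, but the stored string never contains those three characters next to each other, which is why a plain comparison fails.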
So how can we implement an "HTML-insensitive" search with Microsoft SQL Server?
While searching for a solution, I could not find a complete guide or working sample showing how to do this, even though it is quite easy.
The trick is to create a full-text index using an HTML filter.
Here is a quick summary.