Full Text Search

In this post, we are going to discuss a little about database searches and how to do a full text search on your SQL Azure database.

Explaining about the Properties of a SQL Azure Database

Entities in a database table usually have atomic attributes, like nvarchar, datetime or decimal. This means that they cannot be divided into smaller distinct pieces. Some consider this a requirement in order for the data model to be normalised. There some cases however where the existence and properties of a certain attribute can be dependant on another value of the entity or on a value of a related entity. In order to avoid over-granularization of the data (e.g. by creating an arbitrarily large number of distinct and somewhat dynamic tables) one can encapsulate these values into an object model. This object model can be stored in an nvarchar, JSON (see PostgreSQL) or XML format column.

There are some benefits to this technique:

  • The data will be stored in a semantic manner
  • Any two entities can not only have different values, but also different set of attributes, with different types
  • The structure of the data can easily accommodate even significant changes
  • The hierarchy can be represented between the attributes
Using This Model are there Limitations Running Full Text Search with SQL Azure?

Of course, there are a number of limitations when we use a model like this:

  • Increased query complexity
  • Simple SQL functions will be denied to execute on them
  • No sorting or grouping is possible (at least for now in SQL Azure).

 

Because of these reasons, running generic SQL queries in this column type can really grind even powerful machines to a halt. Even such small operations as a SELECT with a simple LIKE statement means that the server has to parse thousands of XML files, and do string interpolation on them. This is an operation that is fairly expensive and time-consuming in every scenario as well.

In one solution we have a number of business processes that share a large amount of generic information on their respective records. These are reference numbers, specific dates, and permissions.

There are some significant differences however in the details. Therefore, we decided on using an XML entity to represent these values. Over time the users wanted to be able to run queries on the data contained in these attributes as well. With Full-Text Search (FTS) made available on SQL Azure, we could fulfill their request.

Step by Step How to Run a Full Text Search on SQL Azure

In the case of Full-text Search, an engine examines all the words in every document.It returns only those that match the given search criteria.

 

To achieve this:

Firstly, create indexes for keywords on the given column in the given data table. These indexes are stored in objects called Catalogs. They are built based on the string terms found in the XML documents.

Secondly, build and populate these catalogs. For this, the server needs to index every row it already has in the given column. Processing them may take some time. Fortunately, it will automatically run as a background task. This means it should not interfere with the actual performance of the database.

FTS can be set up on views as well as tables. The only restriction is that the view must be schema-bound.

CREATE VIEW [dbo].[FullTextSearchView] WITH SCHEMABINDING AS (SELECT Id, Data FROM dbo.FullTextSearchTable)

Thirdly, create the catalog, and wire it up to the Data field. Please notice that these DDL statements can not be run in a transactional scope. Pay extra care setting up!

CREATE FULLTEXT CATALOG dataViewCatalog AS DEFAULT;
CREATE FULLTEXT INDEX ON [dbo].[FullTextSearchView](Data) KEY INDEX id_FullTextDataView ON dataViewCatalog;

From here there are only 2 steps left. These are to start the population of the catalog, and enable the FTS index.

ALTER FULLTEXT INDEX ON [dbo].[FullTextSearchView] ENABLE;
ALTER FULLTEXT INDEX ON [dbo].[FullTextSearchView] START FULL POPULATION;

Now the time is up for a cup of tea while the server is applying the indexes. The following query gives some insight on the status of the population process:

SELECT
OBJECTPROPERTYEX(object_id(N'dbo.FullTextSearchView'), N'TableFulltextCatalogId') AS [Catalog Id],
OBJECTPROPERTYEX(object_id(N'dbo.FullTextSearchView'), N'TableHasActiveFulltextIndex') AS [Active],
CASE OBJECTPROPERTYEX(object_id(N'dbo.FullTextSearchView'), N'TableFulltextPopulateStatus')
    WHEN 0 THEN 'Idle'
    WHEN 1 THEN 'Full population in progress'
    WHEN 2 THEN 'Incremental population is in progress'
    WHEN 3 THEN 'Propagation of tracked changes is in progress'
    WHEN 4 THEN 'Background update index is in progress'
    WHEN 5 THEN 'Paused or throttled'
    WHEN 6 THEN 'Error'
    END AS [Populate status],
OBJECTPROPERTYEX(object_id(N'dbo.FullTextSearchView'), N'TableFullTextBackgroundUpdateIndexOn') AS [Background update index is on],
OBJECTPROPERTYEX(object_id(N'dbo.FullTextSearchView'), N'TableFullTextChangeTrackingOn') AS [Change tracking on],
OBJECTPROPERTYEX(object_id(N'dbo.FullTextSearchView'), N'TableFulltextDocsProcessed') AS [Docs processed],
OBJECTPROPERTYEX(object_id(N'dbo.FullTextSearchView'), N'TableFulltextFailCount') AS [Failure count],
OBJECTPROPERTYEX(object_id(N'dbo.FullTextSearchView'), N'TableFulltextItemCount') AS [Item count]

The number of successfully indexed documents can be found in the item count column.

The Completed Full Text Search SQL Azure Database

Once the population has finished, we can start querying the data. The two main functions that can be used with FTS (Full Text Search) are CONTAINS and FREETEXT. They serve different purposes and are capable of doing different tasks.

To extend our searching features we use CONTAINS. This is because we are interested in looking up some exact values that are contained in the documents. NHibernate does not support FTS natively so we have to format our queries as SQL statements. We then add them as extra criteria when needed to our underlying query engine.

It was at this time that the following problem came up. If we had more than one phrase to search for (and the result set should consist of elements containing all of them in the given order) the query discarded results where the phrases were glued together.

For example, if our input is “AAA BBB CCC”, then the

CONTAINS(Data, '"AAA BBB CCC"')

clause will not return any row that has “AAABBBCCC”.

This might be a problem if a search is directed to getting any combination back, and should disregard any whitespace. Also, it did not return any rows where we had any other strings between our terms, e.g. “AAA free BBB pizza CCC”.

How to Search For Possible Solutions

To search for all the possible solutions we realized that we must search the following clauses:

  • Exact search “AAA BBB CCC”
  • Disregard whitespaces “AAABBBCCC*”
  • Proximity search “NEAR((AAA*, BBB*, CCC*))”

Fortunately, logical operators are allowed inside the CONTAINS function call. This means that these three options can be translated into just one statement:

CONTAINS(Data, '"AAA BBB CCC" OR "AAABBBCCC*" OR NEAR(("AAA*", "BBB*", "CCC*"))')

Furthermore, let’s assume that we are only interested in records where “AAA BBB CCC” appears as the value of the attribute called “name” under “items”. If we have that information then we can chain the FTS query with some clever XPath, resulting in:

CONTAINS(Data, '"AAA BBB CCC" OR "AAABBBCCC*" OR NEAR(("AAA*", "BBB*", "CCC*"))')
    AND Data.exist('/Record/items/name') > 0
    AND (Data.value('(/Record/items/name)[1]', 'nvarchar(255)') LIKE '%AAA%'
        OR Data.value('(/Record/items/name)[1]', 'nvarchar(255)') LIKE '%BBB%'
        OR Data.value('(/Record/items/name)[1]', 'nvarchar(255)') LIKE '%CCC%')

Get Coding!

 

Azure Licks: How to Run a Full-Text Search on SQL Azure is the second in 3 ( so far ) of our Azure Licks posts. In them, we try to improve your understanding of working with Azure and its many features. At Greenfinch Technologies we are well known for our expertise in working with Azure, especially (though not limited) with .NET Software Development.

 

If you have enjoyed this post about working with SQL Azure you might enjoy our other 2 Azure Licks posts. These are:

  • AZURE LICKS: USING REDIS CACHE WITH SIGNALR
  • AZURE LICKS: EXPLAINING USES OF AZURE LOGIC APPS & FUNCTION APPS

.Net Software Development

If you are in need of  .Net software development you can reach us using our Contact Form or by calling us on +353 (0)1 818 2949

A brief note about where the expression ‘Azure Licks’ comes from.

For anyone wondering where the expression ‘Azure Licks’ origionated, it is a Greenfinch Technology expression. It was coined by Greenfinch software developers to show the similarities between ‘Lick music‘ and small repeatable, reusable bits and pieces of works on Azure stack.

Greenfinch Technology

Greenfinch Technology