Take a peek under the hood of the incredibly advanced search engine built into Shopp.

Search – Shopp’s Secret Weapon

A good on-site search engine can make or break profitability on your storefront. Unlike other ecommerce plugins for WordPress, search in Shopp does not use the built-in WordPress search system at all. Instead, Shopp has a custom search engine built for ecommerce and designed to be amazingly tweakable, allowing results to be sculpted by developers and merchants.

To understand how search engines work, you need to be familiar with the concepts of precision and recall. Precision in search refers to the relevancy of results. Perfect precision means every result returned by the search is relevant, but not all of the relevant entries that exist may be provided. Recall refers to how many of the relevant entries that exist are actually returned. Perfect recall means that every relevant result was returned, but some of the results might be irrelevant to the search.
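To make the two ideas concrete, here is a minimal sketch that measures precision and recall for a single search. The function name and the product IDs are made up for illustration; nothing here is part of Shopp.

```php
<?php
// Hypothetical helper, not part of Shopp: scores one search result set.
// $returned: unique product IDs the search returned.
// $relevant: unique product IDs that really are relevant to the query.
function search_precision_recall(array $returned, array $relevant) {
    $hits = count(array_intersect($returned, $relevant));
    return array(
        // Share of the returned results that are relevant
        'precision' => count($returned) ? $hits / count($returned) : 0.0,
        // Share of the relevant products that were returned
        'recall'    => count($relevant) ? $hits / count($relevant) : 0.0,
    );
}

// 4 results returned, 3 of them relevant, out of 6 relevant products in the catalog
$quality = search_precision_recall(array(1, 2, 3, 9), array(1, 2, 3, 4, 5, 6));
printf("precision: %.2f, recall: %.2f\n", $quality['precision'], $quality['recall']);
```

In this made-up case three of the four results are relevant (precision 0.75), but only half of the relevant products were found (recall 0.50).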

Shopp has a complete search system engineered with a query engine and full content indexer for the products in your catalog. It was designed to provide absolute control over precision and recall in your store search.

Search in Shopp offers far more than any other ecommerce solution for WordPress can: full product content indexing, natural and boolean search, logic settings, query rewriting, price search, word stemming, index weighting, and customizable stop words.

That’s all well and good, but what does all that fancy language really mean? It means Shopp’s search can be perfectly tuned to deliver useful results to your customers to help you land more sales.

To learn how to bend search to your will, you’re going to need to know what some of that technobabble does. What follows is an in-depth look at each of the Shopp search engine’s features.

Full Product Content Indexing

Content indexing is a procedure that analyzes and optimizes a piece of content to make it easier for the search query system to match the content and score how relevant it is. Shopp’s indexer does this by copying the full text of the content into the index, then creating a second copy that is processed through the word stemmer.

So what is this content that gets indexed? Simply put, all of it. All of a product’s text content is indexed for search including:

  • name
  • summary
  • description
  • specs (names and values)
  • tags
  • category names
  • variant labels
  • add-on labels
  • pricing

Each type of content is kept in its own index so that the type of content can be assigned an index factor. These factors are a way of adding weight to the terms in a specific kind of content. They allow you to fine-tune search precision and alter how matches are scored to change the products shown in the search results.

Index Factors

The index factors allow granular adjustment of search relevance scoring based on what kind of content the search terms match on. The product content types include: name, full description, summary, spec names and values, tags, category names, variant labels, add-on labels, and prices. Each of these categories has an index factor, a number that modifies the match scoring. The factors are percentage multipliers on the match scoring returned from the natural language search. Search results are then sorted against the adjusted score to determine the display order from most relevant match to least relevant result.

The default index factors for these content categories are as follows:

  • product name: 200%
  • product prices: 160%
  • summary: 100%
  • description: 100%
  • specs: 75%
  • categories: 50%
  • tags: 50%

As shown above, a match on terms in the product name will be twice as relevant as a match in the full description, or four times as relevant as a match on the product tags.
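Here is a rough sketch of how percentage factors like these can reshape a ranking. The array keys, product names and raw scores are invented for the example, and each product matches on only one kind of content to keep things short. Shopp's real scoring is more involved.

```php
<?php
// The default factors listed above, in percent. The keys are illustrative.
$factors = array(
    'name' => 200, 'prices' => 160, 'summary' => 100, 'description' => 100,
    'specs' => 75, 'categories' => 50, 'tags' => 50,
);

// Hypothetical raw match scores, one per product, with the index that matched
$matches = array(
    array('product' => 'Bunny Slippers', 'index' => 'tags',        'score' => 3.0),
    array('product' => 'Pink Slippers',  'index' => 'name',        'score' => 1.0),
    array('product' => 'Slipper Socks',  'index' => 'description', 'score' => 1.2),
);

// Multiply each raw score by its index factor, then sort best match first
function apply_index_factors(array $matches, array $factors) {
    foreach ($matches as &$match) {
        $match['adjusted'] = $match['score'] * $factors[$match['index']] / 100;
    }
    unset($match); // break the reference left over from the loop
    usort($matches, function ($a, $b) {
        return $b['adjusted'] <=> $a['adjusted'];
    });
    return $matches;
}

$ranked = apply_index_factors($matches, $factors);
foreach ($ranked as $match) {
    echo $match['product'], ': ', $match['adjusted'], "\n";
}
```

The tag match has the highest raw score, but the name match ends up first once the factors are applied.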

Each of these factors can be altered by using the shopp_index_factors filter to adjust the values in the default index factors array. To make tag matching more powerful and reduce the power of matches in the full description you would add PHP code to your theme functions.php file or a custom WordPress plugin:

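// Receives the default index factors (percentages) and returns the adjusted set
function adjust_shopp_index_factors ($factors) {
    $factors['tags'] = 150;        // boost matches on product tags
    $factors['description'] = 60;  // dampen matches in the full description
    return $factors;
}
add_filter('shopp_index_factors', 'adjust_shopp_index_factors');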

This code adjusts tag scoring to 1.5 times (150%) normal relevancy, and description matching to three-fifths (60%) of normal relevance scoring.

Natural & Boolean Queries

It sounds scarier than it is. In a nutshell, Shopp uses MySQL’s full-text search features in not one, but two modes simultaneously. Instead of just relying on the natural language full-text search alone, Shopp also adds in boolean full-text search to improve recall, providing more results than natural language search would on its own.

If you’re running a large storefront (like a Target or Wal-mart) with many thousands of products, more recall isn’t necessarily desirable. For small storefronts with a hundred or so products, more recall is a necessary evil to offer more to customers than just what they are currently searching for.
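As a loose illustration of the idea, the sketch below builds one query that uses both MySQL full-text modes: boolean mode decides which rows are included, natural language mode supplies the relevance score. The table and column names are assumptions made up for this example, and this is not the SQL Shopp actually generates.

```php
<?php
// Hypothetical table and column names; Shopp's real schema and SQL differ.
// The two %s placeholders stand for the search terms and are meant to be
// bound by the database layer (for example with $wpdb->prepare()).
function build_dual_mode_search_sql($table = 'product_index', $column = 'terms') {
    return "SELECT product, "
         . "MATCH($column) AGAINST (%s IN NATURAL LANGUAGE MODE) AS score "
         . "FROM $table "
         . "WHERE MATCH($column) AGAINST (%s IN BOOLEAN MODE) "
         . "ORDER BY score DESC";
}

echo build_dual_mode_search_sql(), "\n";
```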

From a conversion strategy perspective, more precision means providing just what the customer is looking for. The search action pre-qualifies the sales lead, so you’re more likely to see a sale (a conversion) when a customer finds exactly what they’re looking for. On smaller stores, there are fewer options and it is more likely that the store won’t have just the right pink fluffy bunny slippers.

Ordinarily this could result in few or no results from a search. For an online storefront, you might as well have lost the customer. So offering “something” – even something close to relevant – might just be enough to pique interest and delight the customer. It encourages discovery and that can be a secondary route to customer conversion.

While Shopp doesn’t re-invent the wheel here, it uses techniques that work better for an e-commerce application than those designed for just blog content.

Search Logic

Now, what if you actually are trying to start up the next Amazon.com? You actually want to limit recall and improve precision? Shopp makes its boolean search tweakable too by changing the search logic used.

By default, Shopp uses OR boolean search logic. As we just discussed above, that has the benefit of including more results in the result set. But in large catalogs, this can cause a lot of irrelevant results to appear. For example, consider this search:

apple computer

With OR boolean matching, any product that matches either apple or computer would get returned. If your catalog is large enough to have products related to the fruit, in addition to the computer company, both sets of products will appear. The products with both apple and computer will be scored with higher relevance, but all of the computer and fruit products will appear in the results.

To improve search results on storefronts with large catalogs, you can switch the boolean search to AND searching. Add the following configuration constant to the WordPress wp-config.php file to reduce the noise in multiple term search queries on large catalog storefronts:

define('SHOPP_SEARCH_LOGIC', 'AND');

Now the same search performed earlier will require that both terms, apple and computer, are part of the content of the product for that product to get included in the search results. That’s a much narrower search, which naturally improves precision.
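In MySQL boolean mode, a word with no operator is optional and a word prefixed with + is required. The sketch below shows how the two logic settings could translate into a boolean-mode search string. The helper is hypothetical and only illustrates the difference; it is not Shopp's own code.

```php
<?php
// Hypothetical helper: turns search terms into a MySQL boolean-mode string.
// OR logic leaves terms bare (optional); AND logic marks each one as required.
function boolean_mode_query(array $terms, $logic = 'OR') {
    $prefix = (strtoupper($logic) === 'AND') ? '+' : '';
    $parts = array();
    foreach ($terms as $term) {
        $parts[] = $prefix . $term;
    }
    return implode(' ', $parts);
}

echo boolean_mode_query(array('apple', 'computer'), 'OR'), "\n";
echo boolean_mode_query(array('apple', 'computer'), 'AND'), "\n";
```

The first call produces apple computer, where either word is enough to match. The second produces +apple +computer, where both words must be present.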

Query Rewriting

Another trick Shopp uses to improve search for store catalogs is some advanced query rewriting techniques. It’s a process that tunes search queries entered by customers into a strategy that is more conducive to index searching.

Without digging too deep into the mechanics, query rewriting in Shopp reformats the search query using these techniques:

  • sanitizing it of dangerous stuff (like programming markup)
  • handling accent characters
  • collapsing acronyms, hyphenated prefixed words and contractions
  • identifying price searches
  • identifying short words
  • identifying stop words
  • word stemming

All of this is critical to improving the successful match of potentially relevant results for better recall. It’s the job of the natural full-text search to handle scoring relevant results for precision.
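To give a feel for what rewriting involves, here is a deliberately simplified sketch covering only a few of the steps: sanitizing markup and collapsing acronyms, contractions and hyphenated words. The function name and the exact rules are assumptions for illustration; Shopp's own rewriting is more thorough.

```php
<?php
// Hypothetical, simplified query rewriter; not Shopp's implementation.
function rewrite_search_query($query) {
    // Sanitize: drop any HTML or programming markup
    $query = strip_tags($query);

    // Collapse acronyms: "U.S.A." becomes "USA"
    $query = preg_replace_callback('/\b(?:[A-Za-z]\.){2,}/', function ($m) {
        return str_replace('.', '', $m[0]);
    }, $query);

    // Collapse contractions: "men's" becomes "mens"
    $query = preg_replace("/(?<=[A-Za-z])'(?=[A-Za-z])/", '', $query);

    // Collapse hyphenated words: "t-shirts" becomes "tshirts"
    // (simplification: every letter-hyphen-letter pair is joined)
    $query = preg_replace('/(?<=[A-Za-z])-(?=[A-Za-z])/', '', $query);

    // Normalize whitespace and letter case
    return strtolower(trim(preg_replace('/\s+/', ' ', $query)));
}

echo rewrite_search_query("<b>Men's</b> U.S.A.  T-Shirts"), "\n";
```

The example query comes out as mens usa tshirts, a form that is much easier to match against an index that was normalized the same way.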

A few of these strategies really separate Shopp’s search engine from any other solution available, and it is worth reviewing how they work and the benefits they provide to storefronts: price matching, short word searches, stop words and word stemming.

Price Matching

If ever there was a search feature specific to e-commerce, this is it. Shopp’s search engine can identify currency figures and use those as terms for searching by price. This lets customers search the store catalog to find relevant products in the price range they’re looking for.

For example, if you wanted to find a watch on the site priced near $50 you could search for:

watches $50 

Or, if you’re looking for rings that are priced over $99:

ring >$99 

It even supports price ranges. If you’re looking for PlayStation 3 games between $10 and $35 you could search like this:

ps3 games $10-$35

Price matching in Shopp’s search engine is optimized for ecommerce and is a clear example of the advantages Shopp has for running your store.
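For the curious, here is one way the three price formats shown above could be picked out of a query with a regular expression. This is a sketch under assumptions: the function name, the returned array layout and the dollar-only pattern are invented for the example and are not how Shopp does it internally.

```php
<?php
// Hypothetical parser for the three formats above: $50, >$99 and $10-$35.
// Returns the remaining search terms plus a description of the price part.
function extract_price_search($query) {
    $pattern = '/(>)?\$(\d+(?:\.\d+)?)(?:-\$(\d+(?:\.\d+)?))?/';
    $price = null;
    if (preg_match($pattern, $query, $m)) {
        if (isset($m[3])) {
            // Two amounts joined by a hyphen: a price range
            $price = array('type' => 'range', 'min' => (float) $m[2], 'max' => (float) $m[3]);
        } elseif ($m[1] === '>') {
            // A leading > sign: priced over the amount
            $price = array('type' => 'over', 'min' => (float) $m[2]);
        } else {
            // A bare amount: priced near the amount
            $price = array('type' => 'near', 'target' => (float) $m[2]);
        }
        // Remove the price figure so only the word terms remain
        $query = str_replace($m[0], '', $query);
    }
    $terms = trim(preg_replace('/\s+/', ' ', $query));
    return array('terms' => $terms, 'price' => $price);
}

print_r(extract_price_search('ps3 games $10-$35'));
```

How close "near" should be is left open here; the sketch only records the target amount.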

Stop Words

Both search queries and content indexes are filtered against a set of English stop words. Stop words are words that are so common they don’t add any degree of relevance or specificity to searching such as a, and, and the. They are part of natural language to such a degree that most every text uses them, so they’re everywhere. If left in and searchable you’d get a lot of irrelevant results polluting your search results.

There is a default set of stop words used by the Shopp search engine:

a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with
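To see what that filtering does to a query, here is a small sketch that strips the default stop words listed above from a search phrase. The function is hypothetical and only illustrates the idea.

```php
<?php
// The default stop words listed above
$stopwords = array(
    'a', 'an', 'and', 'are', 'as', 'at', 'be', 'but', 'by', 'for', 'if',
    'in', 'into', 'is', 'it', 'no', 'not', 'of', 'on', 'or', 'such', 'that',
    'the', 'their', 'then', 'there', 'these', 'they', 'this', 'to', 'was',
    'will', 'with',
);

// Hypothetical helper: lowercases the query, splits it into words and
// drops every word that appears in the stop word list.
function remove_stop_words($query, array $stopwords) {
    $words = preg_split('/\s+/', strtolower(trim($query)), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_diff($words, $stopwords));
}

echo implode(' ', remove_stop_words('The slippers for the bunny and the bear', $stopwords)), "\n";
```

Only slippers, bunny and bear survive, which are the words that actually say something about what the customer wants.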

By now it should be no surprise that you can alter the list by using the shopp_index_stopwords filter hook. This will obviously take a little programming know-how to exploit, but when used properly can have a profound effect on search results. For example it can be really helpful if you need a product to match on a specific stop word, or if you are getting noisy results for a specific search term common to the industry of the store.

When filtering the stop words, care should be taken to rebuild the Product Search Index. Log in to the WordPress admin, navigate to Shopp → System → Advanced and click the Rebuild Product Search Index button to regenerate the index against the new stop words.

The stop words are supplied to the search engine in a PHP array that is easily manipulated:

function custom_shopp_stop_words ($words) {
    // Add "those" to the stop words
    $words[] = "those";

    // Remove "these" from the stop words
    $words = array_diff($words,array("these"));

    return $words;
}
add_filter('shopp_index_stopwords', 'custom_shopp_stop_words');

The example above adds the word those to the stop words list and removes the word these. This specific example isn’t particularly useful except to show you the right techniques to use to add and remove a stop-word from the list. Again, remember to rebuild the search index after you install the code to see how it affects search results.

MySQL Stop Words

MySQL has its own stop-words list that is used. Some hosts may or may not allow you to alter this list in any way, so bear in mind that the stop words in Shopp are there to optimize the index more than the search queries. If you do have access to configure your own MySQL server, you’ll want to review “Fine-Tuning MySQL Full-Text Search” from the MySQL manual for how to change the stop words list used by MySQL.

Short Word Searches

Since Shopp still relies on the full-text matching of MySQL to carry out natural language and boolean searches, it is limited by the MySQL minimum word length setting. The minimum word length setting for MySQL typically defaults to 4, meaning MySQL only indexes words that are 4 or more letters in length. It is adjustable, but requires changing the MySQL server configuration and rebuilding the table index (not the Shopp search index). For details on this, take a look at the article “Fine-Tuning MySQL Full-Text Search”.
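The sketch below simply shows which words in a query would fall under a minimum length of 4. It is a hypothetical helper for checking your own search terms and says nothing about how Shopp treats the short words it identifies.

```php
<?php
// Hypothetical helper: separates words MySQL would index from words that
// are too short, given a minimum word length (4 is the typical default).
function split_by_word_length(array $words, $min_length = 4) {
    $result = array('indexed' => array(), 'short' => array());
    foreach ($words as $word) {
        $key = (strlen($word) >= $min_length) ? 'indexed' : 'short';
        $result[$key][] = $word;
    }
    return $result;
}

print_r(split_by_word_length(array('ps3', 'games', 'red', 'shoes')));
```

With the default of 4, ps3 and red land in the short group, so a full-text index built with that setting would not contain them.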

Word Stemmers

One more thing. The last weapon in Shopp’s search arsenal is a built-in English word stemmer. A word stemmer reduces words to the simplest base form. For example, predicts, prediction and predicted are reduced to the stem word predict. The words beauty, beautiful and beauties are given the stem beauti.

Word stemmers provide fuzzy search, increasing the number of matching hits on similar, but not exact, terms. It improves recall to a degree beyond what the search built into WordPress is designed to do.

The built-in word stemmer in Shopp is an implementation of the Porter Stemming Algorithm, a proven and popular stemming algorithm for English.
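To show the general idea of suffix stripping, here is a toy stemmer. It is emphatically not the Porter algorithm and not Shopp's stemmer: it applies a handful of hand-picked rules chosen only so that the example words from this article come out right.

```php
<?php
// Toy suffix stripper for illustration only. NOT the Porter algorithm.
// The first rule whose suffix matches wins, provided at least three
// letters of the word remain in front of the suffix.
function toy_stem($word) {
    $word = strtolower($word);
    $rules = array(
        'ies' => 'i',  // beauties  -> beauti
        'ful' => '',   // beautiful -> beauti
        'ion' => '',   // prediction -> predict
        'ed'  => '',   // predicted -> predict
        's'   => '',   // predicts  -> predict
        'y'   => 'i',  // beauty    -> beauti
    );
    foreach ($rules as $suffix => $replacement) {
        $stem_length = strlen($word) - strlen($suffix);
        if ($stem_length >= 3 && substr($word, -strlen($suffix)) === $suffix) {
            return substr($word, 0, $stem_length) . $replacement;
        }
    }
    return $word;
}

foreach (array('predicts', 'prediction', 'predicted', 'beauty', 'beautiful', 'beauties') as $word) {
    echo $word, ' => ', toy_stem($word), "\n";
}
```

A real stemmer like Porter's applies many more rules in several passes, with conditions on what the remaining stem looks like, which is why it handles far more of the language than a short rule table can.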

There are cases where a different stemmer is desired, or even necessary. For example, other languages need their own stemming rules, and some stores may need a custom stemmer for industry-specific words.

Although it requires significant development experience, it is possible to extend Shopp and implement an entirely custom stemmer without modifying any core files. A complete technical description of the approach for implementing a custom stemmer is outside the scope of this article.

The basic approach to implementing a custom stemmer requires the following:

  • A stemmer implementation in either a class or available as a function call
  • Unregister the built-in StemFilter() call from the shopp_index_content and shopp_boolean_search filter hooks
  • Register a function that implements the custom stemmer to both filters: shopp_index_content and shopp_boolean_search
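In outline, those steps might look something like the sketch below. Treat it as a rough shape rather than working integration code: the callback name passed to remove_filter(), the default priority, and the assumption that both filters hand over plain text are all guesses that need to be checked against how Shopp registers its built-in stemmer. The stemming rule itself is a placeholder.

```php
<?php
// Hypothetical custom stemmer. Placeholder rule only: strips a trailing
// "s" from words of four or more characters. Swap in real stemming rules.
function my_custom_stemmer($text) {
    return preg_replace('/\b(\w{3,})s\b/', '$1', $text);
}

// The hook functions only exist inside WordPress, so guard the registration.
if (function_exists('add_filter')) {
    foreach (array('shopp_index_content', 'shopp_boolean_search') as $hook) {
        // ASSUMPTION: the built-in stemmer is attached under the callback
        // name 'StemFilter' at the default priority. Verify this in Shopp.
        remove_filter($hook, 'StemFilter');
        add_filter($hook, 'my_custom_stemmer');
    }
}
```

Removing a filter only works after it has been added, so in practice this registration would need to run after Shopp has set up its own search filters.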

Conclusion

WordPress search is not designed with product search and conversion optimization principles in mind. Still, the “good enough” content search in WordPress is what most e-commerce solutions built on WordPress provide. It may be serviceable, but it certainly isn’t optimized.

Search in Shopp is clearly designed for e-commerce goals and engineered for tunability to craft results that give customers what they’re looking for, give merchants the opportunity for more sales, and give developers a stack of technologies that makes life easier. A win, win, win scenario.

By Jonathan Davis

Jonathan was born at an early age and began designing and developing shortly after. He is the founder of Ingenesis Limited and Project Lead on the Shopp e-commerce plugin for WordPress. He lives and works in the heart of the midwest US with his family. He fancies himself a designer of code, and is only slightly addicted to coffee.

© Ingenesis Limited. Shopp™ is a registered trademark of Ingenesis Limited.
