Alex Klibisz
Machine learning research intern at ORNL.

A quick and dirty approach to multi-property fuzzy search/filtering

Alex Klibisz, 3/15/2016

The Problem

Let’s say you’re implementing some client-side filtering for a large list of objects, and you want to be able to match a query against multiple string-like properties on those objects.

For example…

You have 1000 restaurants loaded into your SPA with a format like this:


const restaurants = [
  {
    "name": "Copper Cellar",
    "food_type": "Burgers",
    "neighborhood": "UT Campus"
  },
  {
    "name": "Soccer Taco",
    "food_type": "Mexican",
    "neighborhood": "Market Square"
  }
];

You want to provide a three-field filter area where a user can filter by restaurant name, food type, and neighborhood:

But you also need to allow for some fuzziness or typos in the filters. For example, a query with food type Mxican and neighborhood marketsquare should still bring up Soccer Taco.

One way to do it...

So we need a multi-property string filter with some room for fuzziness. First, let’s use a library to handle the fuzziness because that’s actually difficult, and we don’t want to re-invent the wheel. My favorite has been string_score. After you import it, it works like so:


const score = "string one".score("string two");
// score is a value between 0 and 1.

Let’s create a query object from our filter form:


const filterQuery = {
  "name": "",
  "food_type": "Mxican",
  "neighborhood": "marketsquare"
};

Let’s declare a function that will take the query and all of the restaurant objects to find matching ones by some threshold for fuzziness:


function matchFilter(allItems, query, threshold) {
  return [];
}

And now let's fill in the guts:


function matchFilter(allItems, query, threshold) {
   // Create an array of properties that are defined in the query.
   // For the example, it will be [food_type, neighborhood]
   const properties = Object.keys(query)
   .filter(key => query[key].trim().length > 0);
   // Create a comparison string for the query item.
   // For the example, it will be “Mxicanmarketsquare”
   const queryComp = properties.map(p => query[p]).join();
   // Filter down to get the matching items.
   const matchingItems = allItems.filter(item => {
     // Create a comparison string for the current
     // item that consists of the property values
     // that are included in the query.
     const itemComp = properties.map(p => item[p]).join();
     // Compare itemComp string to the queryComp.
     // Statement evaluates to true, then the item matches!
     return itemComp.score(queryComp) >= threshold;
   });
  return matchingItems;
}

So you end up comparing two strings at a time: one is the query comparison string, and the other is an item comparison string which gets created for each of the items. You use these comparison strings to filter down to only the items which have a comparison string above the specified threshold when compared to the query string.

For the example, this would go as follows:

  1. The properties we care about are [“food_type”, “neighborhood”], because the name property was left blank by the user.
  2. Based on the properties, the query string becomes “Mxicanmarketsquare”
  3. Look at item 0: compare “Mxicanmarketsquare” to “BurgersUT Campus”. This is likely not a match.
  4. Compare “Mxicanmarketsquare” to “MexicanMarket Square”. This is likely a match!
  5. Your only match is indeed Soccer Taco! You win and you get Tacos!!

Demo time:

Here’s a very bare-bones demo on JSFiddle: https://jsfiddle.net/aklibisz/m2emywc4/2/

I used it in real life!

This isn’t intended to be a perfect solution, but it can be an adequate solution if you’re restricted to client-side filtering. For example, I used it when building the client-side course filtering for the StudyLoop web app:

Potential Variations

  1. Converting the two comparison strings to be all lowercase.
  2. Incrementally decreasing the score threshold until a certain number of matches are found.

Let me know what you think in the comments!

Related Posts

Comments