4/7/2022, Thursday

Multi Match


When users call our search endpoint, we match the input from the q parameter against both the title and the text of the news articles in our Elasticsearch cluster (by default). We do not use a multi_match query for that. Instead, we use the copy_to mapping parameter to index both values into one field, which is then searched.
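A minimal sketch of that setup, written as plain Python dicts that could be sent to Elasticsearch. The combined field name search_text and the helper names are assumptions made for illustration; only the title and text fields and the copy_to parameter come from the description above.

```python
def build_mapping(combined_field="search_text"):
    """Index mapping in which title and text are both copied into one field.

    "search_text" is a hypothetical name for the combined field.
    """
    return {
        "mappings": {
            "properties": {
                "title": {"type": "text", "copy_to": combined_field},
                "text": {"type": "text", "copy_to": combined_field},
                combined_field: {"type": "text"},
            }
        }
    }


def build_search_body(q, combined_field="search_text"):
    """A single match query on the combined field, in place of a multi_match."""
    return {"query": {"match": {combined_field: q}}}


mapping = build_mapping()
body = build_search_body("brown fox")
```

The search then needs only one match clause, because both source fields were already merged at index time.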


The Excel VLOOKUP function by default finds only a single match and returns the value from the selected column of the corresponding row. What if you want VLOOKUP to find multiple matches, not just the first one? In this post let us explore this more complicated scenario. Instead of VLOOKUP, however, we will use INDEX and MATCH.

  • Multi-Match is played twice a week, on Monday and Thursday. 6 main balls are drawn from a drum of balls numbered from 1 to 43. To win the Multi-Match Jackpot, you need to match all 6 main balls drawn. The odds of winning the Jackpot in Multi-Match are 1 in 6,096,454.
  • Multi-Match is a Maryland Lottery game with a lotto play style and a $500,000 minimum jackpot that rolls over and increases until someone wins it. With four ways to win and two weekly draws, your only question soon could be 'cash or annuity?'
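The jackpot odds quoted above follow from counting the ways to choose 6 balls out of 43 when order does not matter. A short Python check (the function name is illustrative):

```python
from math import comb


def jackpot_odds(pool_size=43, balls_drawn=6):
    """Number of distinct unordered draws, i.e. the '1 in N' jackpot odds."""
    return comb(pool_size, balls_drawn)


print(f"1 in {jackpot_odds():,}")  # 1 in 6,096,454
```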

Using the MATCH Function in Excel

To find the first MATCH of the “A” value in column B:B we use the following formula as shown on the image below:

See the scenario below:

Finding multiple matches in Excel

Now say we want to find all matches of “A” in column B:B as seen below.
The formulas in cells E2-E4 are shown below. In E2 we find the row of the first “A”; in the subsequent cells (E3-E4) we look for the row of each next “A”. You can drag this formula down as many times as needed.
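The same idea, restarting the search just after the previous hit, can be sketched outside Excel. This is a Python illustration of the logic, not the worksheet formula itself; the sample column and the assumption that data starts in row 1 are made up for the example.

```python
def match_rows(column, lookup_value):
    """Return the 1-based row numbers of every cell equal to lookup_value."""
    rows = []
    start = 0
    while True:
        try:
            # Search only the part of the column after the previous match.
            idx = column.index(lookup_value, start)
        except ValueError:
            break  # no further matches
        rows.append(idx + 1)  # convert 0-based index to a row number
        start = idx + 1
    return rows


column_b = ["A", "B", "A", "C", "A"]  # hypothetical contents of column B
print(match_rows(column_b, "A"))  # [1, 3, 5]
```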

VLOOKUP Multiple Matches

To do a multiple match VLOOKUP we simply need to expand on the above Multiple MATCH example and add the INDEX function like so:

Using VBA to do a VLOOKUP Multi Match


In case you want a more sophisticated approach to doing a multi match INDEX MATCH / VLOOKUP, you can also use the VBA Dictionary to record all instances of all lookup values along with selected columns. A simple version of this approach can be found in my post about using VLOOKUP in VBA. Below, however, I expanded this example by using a VBA Collection inside the VBA Dictionary to store the values associated with each match of every lookup value (basically creating a very simple tree-like structure).

Based on the “A1:B10” table above the VBA code below will create my dictionary dict object.

After creating the dictionary I can now print all values from column “A:A” for any value of column “B:B”:
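For readers who do not use VBA, the dictionary-of-collections structure described above can be sketched in Python as a dict of lists. The table contents here are hypothetical, and this is an analogue of the approach rather than the VBA code from the post.

```python
from collections import defaultdict


def build_lookup(rows):
    """Map each column-B value to the list of column-A values that share it."""
    lookup = defaultdict(list)
    for value_a, key_b in rows:
        lookup[key_b].append(value_a)
    return lookup


# Hypothetical (column A, column B) pairs standing in for the worksheet table.
table = [(1, "A"), (2, "B"), (3, "A"), (4, "C"), (5, "A")]
lookup = build_lookup(table)
print(lookup["A"])  # [1, 3, 5]
```

Building the structure once makes every later lookup a single dictionary access, instead of a fresh scan of the column.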


The multi_match query builds on the match query to allow multi-field queries:


The query string.

The fields to be queried.
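A basic request body with these two parts might look like the following Python sketch. The helper name and the query string are illustrative; subject and message are used as example field names, and the dict would still need to be sent to the search endpoint by a client of your choice.

```python
import json


def multi_match(query, fields, **options):
    """Build a search body containing a multi_match query."""
    body = {"query": query, "fields": list(fields)}
    body.update(options)  # e.g. type="best_fields", tie_breaker=0.3
    return {"query": {"multi_match": body}}


request_body = multi_match("this is a test", ["subject", "message"])
print(json.dumps(request_body, indent=2))
```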

fields and per-field boosting

Fields can be specified with wildcards, eg:

Individual fields can be boosted with the caret (^) notation:

The query multiplies the subject field’s score by three but leaves the message field’s score unchanged.
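Both notations can be sketched as Python dicts. The boosted helper, the query strings, and the title and *_name field patterns are assumptions chosen for illustration; the subject^3 boost mirrors the description above.

```python
def boosted(field, boost):
    """Append the caret boost notation to a field name, e.g. subject^3."""
    return f"{field}^{boost}"


# Wildcard pattern: *_name would cover fields such as first_name and last_name.
wildcard_query = {
    "query": {
        "multi_match": {"query": "Will Smith", "fields": ["title", "*_name"]}
    }
}

# Per-field boost: subject counts three times as much as message.
boosted_query = {
    "query": {
        "multi_match": {
            "query": "this is a test",
            "fields": [boosted("subject", 3), "message"],
        }
    }
}
```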

If no fields are provided, the multi_match query defaults to the index.query.default_field index setting, which in turn defaults to *. The * wildcard extracts all fields in the mapping that are eligible for term queries and filters out the metadata fields. All extracted fields are then combined to build a query.

There is a limit on the number of fields that can be queried at once. It is defined by the indices.query.bool.max_clause_count search setting, which defaults to 1024.

Types of multi_match query:

The way the multi_match query is executed internally depends on the type parameter, which can be set to:


best_fields

(default) Finds documents which match any field, but uses the _score from the best field. See best_fields.

most_fields

Finds documents which match any field and combines the _score from each field. See most_fields.

cross_fields

Treats fields with the same analyzer as though they were one big field. Looks for each word in any field. See cross_fields.

phrase

Runs a match_phrase query on each field and uses the _score from the best field. See phrase and phrase_prefix.

phrase_prefix

Runs a match_phrase_prefix query on each field and uses the _score from the best field. See phrase and phrase_prefix.

bool_prefix

Creates a match_bool_prefix query on each field and combines the _score from each field. See bool_prefix.

The best_fields type is most useful when you are searching for multiple words best found in the same field. For instance, “brown fox” in a single field is more meaningful than “brown” in one field and “fox” in the other.

The best_fields type generates a match query for each field and wraps them in a dis_max query, to find the single best matching field. For instance, this query:

would be executed as:
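As a rough illustration, the rewrite from a best_fields multi_match into a dis_max query can be sketched in Python. The helper is hypothetical and only mirrors the description above (it assumes plain field names without boosts or wildcards); it is not Elasticsearch's internal code. The query string and tie_breaker value are example choices.

```python
def expand_best_fields(multi_match):
    """Rewrite a best_fields multi_match body as the equivalent dis_max query."""
    dis_max = {
        "queries": [
            {"match": {field: multi_match["query"]}}
            for field in multi_match["fields"]
        ]
    }
    if "tie_breaker" in multi_match:
        dis_max["tie_breaker"] = multi_match["tie_breaker"]
    return {"dis_max": dis_max}


original = {
    "query": "brown fox",
    "type": "best_fields",
    "fields": ["subject", "message"],
    "tie_breaker": 0.3,
}
expanded = expand_best_fields(original)
```

Here expanded holds one match clause per field, wrapped in a single dis_max with the same tie_breaker.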

Normally the best_fields type uses the score of the single best matching field, but if tie_breaker is specified, then it calculates the score as follows:

  • the score from the best matching field
  • plus tie_breaker * _score for all other matching fields
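The two bullet points above amount to a small formula, sketched here in Python. The per-field scores are made-up numbers, and the function is only an arithmetic illustration of the rule, not the engine's scoring code.

```python
def best_fields_score(field_scores, tie_breaker=0.0):
    """Best field score plus tie_breaker times the scores of the other fields."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie_breaker * others


scores = [2.0, 1.0, 0.5]  # hypothetical _score values from three matching fields
print(best_fields_score(scores))                   # 2.0
print(best_fields_score(scores, tie_breaker=0.5))  # 2.75
```

With tie_breaker at 0.0 only the best field counts; at 1.0 the result is the plain sum of all field scores.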

It also accepts analyzer, boost, operator, minimum_should_match, fuzziness, lenient, prefix_length, max_expansions, fuzzy_rewrite, zero_terms_query, auto_generate_synonyms_phrase_query and fuzzy_transpositions, as explained in match query.


operator and minimum_should_match

The best_fields and most_fields types are field-centric: they generate a match query per field. This means that the operator and minimum_should_match parameters are applied to each field individually, which is probably not what you want.

Take this query for example:

This query is executed as:

In other words, all terms must be present in a single field for a document to match.
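A toy Python model of this field-centric behaviour with operator set to and. It assumes lowercase whitespace tokenization, which is a simplification of real analysis, and the document and function name are made up for illustration.

```python
def matches_field_centric_and(doc, query, fields):
    """True only if a single field contains every query term."""
    terms = query.lower().split()
    for field in fields:
        tokens = set(str(doc.get(field, "")).lower().split())
        if all(term in tokens for term in terms):
            return True
    return False


doc = {"first_name": "Will", "last_name": "Smith"}
print(matches_field_centric_and(doc, "Will Smith", ["first_name", "last_name"]))  # False
```

The document is rejected even though both terms occur in it, because neither field contains both of them.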

See cross_fields for a better solution.

The most_fields type is most useful when querying multiple fields that contain the same text analyzed in different ways. For instance, the main field may contain synonyms, stemming and terms without diacritics. A second field may contain the original terms, and a third field might contain shingles. By combining scores from all three fields we can match as many documents as possible with the main field, but use the second and third fields to push the most similar results to the top of the list.

This query:

would be executed as:

The score from each match clause is added together, then divided by the number of match clauses.
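The most_fields expansion can be sketched in Python as one match clause per field inside a bool query's should list. The helper name, query string, and the title, title.original and title.shingles field names are assumptions that follow the main/original/shingles scenario described above; this is a sketch of the rewrite, not the engine's implementation.

```python
def expand_most_fields(multi_match):
    """Rewrite a most_fields multi_match body as a bool query of match clauses."""
    return {
        "bool": {
            "should": [
                {"match": {field: multi_match["query"]}}
                for field in multi_match["fields"]
            ]
        }
    }


original = {
    "query": "quick brown fox",
    "type": "most_fields",
    "fields": ["title", "title.original", "title.shingles"],
}
expanded = expand_most_fields(original)
```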

It also accepts analyzer, boost, operator, minimum_should_match, fuzziness, lenient, prefix_length, max_expansions, fuzzy_rewrite, and zero_terms_query.

The phrase and phrase_prefix types behave just like best_fields, but they use a match_phrase or match_phrase_prefix query instead of a match query.

This query:

would be executed as:
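A hedged Python sketch of that rewrite: the structure is the same dis_max wrapper as for best_fields, with only the inner query kind swapped. The helper, the lookup table, and the example query string are illustrative assumptions.

```python
# Which per-field query each multi_match type uses.
PHRASE_QUERY_BY_TYPE = {
    "phrase": "match_phrase",
    "phrase_prefix": "match_phrase_prefix",
}


def expand_phrase_type(multi_match):
    """Rewrite a phrase or phrase_prefix multi_match body as a dis_max query."""
    clause = PHRASE_QUERY_BY_TYPE[multi_match["type"]]
    return {
        "dis_max": {
            "queries": [
                {clause: {field: multi_match["query"]}}
                for field in multi_match["fields"]
            ]
        }
    }


original = {
    "query": "quick brown f",
    "type": "phrase_prefix",
    "fields": ["subject", "message"],
}
expanded = expand_phrase_type(original)
```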

It also accepts analyzer, boost, lenient and zero_terms_query as explained in Match, as well as slop, which is explained in Match phrase. The phrase_prefix type additionally accepts max_expansions.

phrase, phrase_prefix and fuzziness

The fuzziness parameter cannot be used with the phrase or phrase_prefix type.

The cross_fields type is particularly useful with structured documents where multiple fields should match. For instance, when querying the first_name and last_name fields for “Will Smith”, the best match is likely to have “Will” in one field and “Smith” in the other.

A field-centric type such as most_fields runs into two problems here: operator and minimum_should_match are applied per field instead of per term, and term frequencies differ between the first_name and last_name fields, which can distort the ranking.

One way of dealing with these types of queries is simply to index the first_name and last_name fields into a single full_name field. Of course, this can only be done at index time.

The cross_fields type tries to solve these problems at query time by taking a term-centric approach. It first analyzes the query string into individual terms, then looks for each term in any of the fields, as though they were one big field.

A query like:

is executed as:

In other words, all terms must be present in at least one field for a document to match. (Compare this to the logic used for best_fields and most_fields.)
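A toy Python model of the term-centric check with operator set to and. It assumes lowercase whitespace tokenization in place of real analysis, and the document and function name are made up for illustration.

```python
def matches_term_centric_and(doc, query, fields):
    """True if every query term appears in at least one of the fields."""
    terms = query.lower().split()
    tokens = set()
    for field in fields:
        # Pool the tokens of all fields, as though they were one big field.
        tokens.update(str(doc.get(field, "")).lower().split())
    return all(term in tokens for term in terms)


doc = {"first_name": "Will", "last_name": "Smith"}
print(matches_term_centric_and(doc, "Will Smith", ["first_name", "last_name"]))  # True
```

Each term is free to match in a different field, so a document with “Will” in first_name and “Smith” in last_name is accepted.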

That solves one of the two problems. The problem of differing term frequencies is solved by blending the term frequencies for all fields in order to even out the differences.

In practice, first_name:smith will be treated as though it has the same frequencies as last_name:smith, plus one. This will make matches on first_name and last_name have comparable scores, with a tiny advantage for last_name since it is the most likely field that contains smith.


Note that cross_fields is usually only useful on short string fields that all have a boost of 1. Otherwise boosts, term frequencies and length normalization contribute to the score in such a way that the blending of term statistics is not meaningful anymore.

If you run the above query through the Validate API, it returns this explanation:

It also accepts analyzer, boost, operator, minimum_should_match, lenient and zero_terms_query.

The cross_fields type can only work in term-centric mode on fields that have the same analyzer. Fields with the same analyzer are grouped together as in the example above. If there are multiple groups, they are combined with a bool query.

For instance, if we have a first and last field which have the same analyzer, plus a first.edge and last.edge which both use an edge_ngram analyzer, this query:

would be executed as:

In other words, first and last would be grouped together and treated as a single field, and first.edge and last.edge would be grouped together and treated as a single field.

Having multiple groups is fine, but when combined with operator or minimum_should_match, it can suffer from the same problem as most_fields or best_fields.

You can easily rewrite this query yourself as two separate cross_fields queries combined with a bool query, and apply the minimum_should_match parameter to just one of them:

Either will or smith must be present in either of the first or last fields.
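One way this manual rewrite might look, as a Python dict. The helper name and the "50%" value for minimum_should_match are assumptions chosen so that, with two query terms, at least one of them must appear in the first or last fields; the field names come from the example above.

```python
def cross_fields(query, fields, **options):
    """Build a single cross_fields multi_match clause."""
    body = {"query": query, "type": "cross_fields", "fields": list(fields)}
    body.update(options)
    return {"multi_match": body}


combined = {
    "query": {
        "bool": {
            "should": [
                # Constraint applied only to the first/last group.
                cross_fields("Will Smith", ["first", "last"], minimum_should_match="50%"),
                # The edge_ngram group is queried without that constraint.
                cross_fields("Will Smith", ["first.edge", "last.edge"]),
            ]
        }
    }
}
```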

You can force all fields into the same group by specifying the analyzer parameter in the query.

which will be executed as:

By default, each per-term blended query will use the best score returned by any field in a group, then these scores are added together to give the final score. The tie_breaker parameter can change the default behaviour of the per-term blended queries. It accepts:


0.0

Take the single best score out of (e.g.) first_name:will and last_name:will (default for all multi_match query types except bool_prefix and most_fields)

1.0

Add together the scores for (e.g.) first_name:will and last_name:will (default for the bool_prefix and most_fields multi_match query types)

0.0 < n < 1.0

Take the single best score plus tie_breaker multiplied by each of the scores from other matching fields.

cross_fields and fuzziness

The fuzziness parameter cannot be used with the cross_fields type.

The bool_prefix type’s scoring behaves like most_fields, but using a match_bool_prefix query instead of a match query.

The analyzer, boost, operator, minimum_should_match, lenient, zero_terms_query, and auto_generate_synonyms_phrase_query parameters as explained in match query are supported. The fuzziness, prefix_length, max_expansions, fuzzy_rewrite, and fuzzy_transpositions parameters are supported for the terms that are used to construct term queries, but do not have an effect on the prefix query constructed from the final term.

The slop parameter is not supported by this query type.
