Multi Match
When users call our search endpoint, we match the input from the q parameter against both the title and the text of the news articles in our Elasticsearch cluster (by default). We do not use a multi-match query for that. Instead, we use the copy_to parameter to index both values into a single field, which is then searched.
By default, the Excel VLOOKUP function finds only a single match and returns the value from the selected column of the matching row. What if you want to find multiple matches, not just the first one? In this post, let us explore this more complicated scenario. Instead of VLOOKUP, however, we will use INDEX and MATCH.
- Multi-Match is played twice a week, on Monday and Thursday. Six main balls are drawn from a drum of balls numbered from 1 to 43. To win the Multi-Match Jackpot, you need to match all 6 main balls drawn. The odds of winning the Jackpot in Multi-Match are 1 in 6,096,454.
- Multi-Match is a Maryland Lottery game with a lotto play style and a $500,000 minimum jackpot that rolls over and increases until someone wins it. With four ways to win and two weekly draws, your only question soon could be 'cash or annuity?'
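The quoted jackpot odds follow from counting the ways to choose 6 balls from 43 when order does not matter. A quick check using only the Python standard library (the variable name is illustrative):

```python
import math

# Number of distinct 6-ball combinations that can be drawn from 43 balls.
jackpot_combinations = math.comb(43, 6)

print(f"1 in {jackpot_combinations:,}")  # 1 in 6,096,454
```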
Using the MATCH Function in Excel
To find the first MATCH of the “A” value in column B:B we use the following formula as shown on the image below:
See the scenario below:
Finding multiple matches in Excel
Now say we want to find all matches of “A” in column B:B as seen below.
The formulas in cells E2-E4 are shown below. In E2 we find the row of the first “A”; in the subsequent cells (E3-E4) we look for the rows of the next “A” values found. You can drag this formula down as many times as needed.
VLOOKUP Multiple Matches
To do a multiple match VLOOKUP we simply need to expand on the above Multiple MATCH example and add the INDEX function like so:
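For readers who want to see the logic outside a spreadsheet, here is a rough Python analogue of the repeated MATCH plus INDEX approach. The sample rows and the function names are assumptions made for illustration; they are not taken from the worksheet in this post.

```python
# Hypothetical two-column table: (column A value, column B key).
rows = [("Tom", "A"), ("Ann", "B"), ("Joe", "A"), ("Sue", "C"), ("Max", "A")]

def match_all(rows, key):
    """Return the 1-based row numbers whose column B equals key (repeated MATCH)."""
    return [i for i, (_, b) in enumerate(rows, start=1) if b == key]

def lookup_all(rows, key):
    """Return the column A value of every matching row (INDEX on each MATCH)."""
    return [rows[i - 1][0] for i in match_all(rows, key)]

print(match_all(rows, "A"))   # [1, 3, 5]
print(lookup_all(rows, "A"))  # ['Tom', 'Joe', 'Max']
```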
Using VBA to do a VLOOKUP Multi Match
In case you want a more sophisticated approach to a multi-match INDEX MATCH / VLOOKUP, you can also use the VBA Dictionary to record all instances of all lookup values along with selected columns. A simple version of this approach can be found in my post about using VLOOKUP in VBA. Below, however, I expanded this example by using a VBA Collection inside the VBA Dictionary to store the values associated with each match of every lookup value (basically creating a very simple tree-like structure).
Based on the “A1:B10” table above the VBA code below will create my dictionary dict object.
After creating the dictionary I can now print all values from column “A:A” for any value of column “B:B”:
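The same dictionary-of-collections idea can be sketched in Python, where a dict of lists plays the role of the VBA Dictionary holding Collections. This is only an analogue with made-up sample rows, not a translation of the VBA code from this post.

```python
from collections import defaultdict

# Hypothetical rows standing in for the table: (column A value, column B key).
rows = [("Tom", "A"), ("Ann", "B"), ("Joe", "A"), ("Sue", "C"), ("Max", "A")]

def build_lookup(rows):
    """Group every column A value under its column B key."""
    lookup = defaultdict(list)
    for value_a, key_b in rows:
        lookup[key_b].append(value_a)
    return lookup

lookup = build_lookup(rows)
print(lookup["A"])  # ['Tom', 'Joe', 'Max']
```

Building the structure once makes each later lookup a single dictionary access instead of a fresh scan of the table.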
The multi_match query builds on the match query to allow multi-field queries:
- query: The query string.
- fields: The fields to be queried.
fields and per-field boosting
Fields can be specified with wildcards, eg:
Individual fields can be boosted with the caret (^) notation:
The query multiplies a boosted field's score by its boost factor and leaves the scores of the other fields unchanged.
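As a hedged illustration of wildcards and caret boosts, the sketch below writes two request bodies as Python dicts. The field names (title, *_name, subject, message) and the query strings are assumptions chosen for the example.

```python
# Wildcard: "*_name" would expand to every field whose name ends in "_name".
wildcard_body = {
    "query": {
        "multi_match": {
            "query": "Will Smith",
            "fields": ["title", "*_name"],
        }
    }
}

# Boost: "subject^3" asks for the subject field's score to count three times
# as much as the unboosted message field.
boosted_body = {
    "query": {
        "multi_match": {
            "query": "this is a test",
            "fields": ["subject^3", "message"],
        }
    }
}
```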
If no fields are provided, the multi_match query defaults to the index.query.default_field index setting, which in turn defaults to *. * extracts all fields in the mapping that are eligible for term queries and filters the metadata fields. All extracted fields are then combined to build a query.
There is a limit on the number of fields that can be queried at once. It is defined by the indices.query.bool.max_clause_count search setting, which defaults to 1024.
Types of multi_match query:
The way the multi_match query is executed internally depends on the type parameter, which can be set to:
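- best_fields: (default) Finds documents which match any field, but uses the _score from the best field.
- most_fields: Finds documents which match any field and combines the _score from each field.
- cross_fields: Treats fields with the same analyzer as though they were one big field. Looks for each word in any field.
- phrase: Runs a match_phrase query on each field and uses the _score from the best field.
- phrase_prefix: Runs a match_phrase_prefix query on each field and uses the _score from the best field.
- bool_prefix: Creates a match_bool_prefix query on each field and combines the _score from each field.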
The best_fields type is most useful when you are searching for multiple words best found in the same field. For instance “brown fox” in a single field is more meaningful than “brown” in one field and “fox” in the other.
The best_fields type generates a match query for each field and wraps them in a dis_max query, to find the single best matching field. For instance, this query:
would be executed as:
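A sketch of this rewrite, with both forms built as Python dicts: the multi_match request and the dis_max form it is described as equivalent to. The field names subject and message, the query text, and the helper names are assumptions for illustration; plain (unboosted, non-wildcard) field names are assumed.

```python
def best_fields_query(query, fields, tie_breaker=0.0):
    """multi_match request body using the (default) best_fields type."""
    return {"query": {"multi_match": {
        "query": query,
        "type": "best_fields",
        "fields": fields,
        "tie_breaker": tie_breaker,
    }}}

def as_dis_max(query, fields, tie_breaker=0.0):
    """The dis_max form: one match query per field."""
    return {"query": {"dis_max": {
        "queries": [{"match": {field: query}} for field in fields],
        "tie_breaker": tie_breaker,
    }}}

original = best_fields_query("brown fox", ["subject", "message"], tie_breaker=0.3)
rewritten = as_dis_max("brown fox", ["subject", "message"], tie_breaker=0.3)
```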
Normally the best_fields type uses the score of the single best matching field, but if tie_breaker is specified, then it calculates the score as follows:
- the score from the best matching field
- plus tie_breaker * _score for all other matching fields
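The two bullet points above amount to a small formula. The function below is a minimal sketch of that arithmetic on plain numbers; it is not Elasticsearch code, and it assumes at least one matching field score is supplied.

```python
def best_fields_score(field_scores, tie_breaker=0.0):
    """Best field score plus tie_breaker times every other field's score."""
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie_breaker * others

# With the default tie_breaker of 0.0 only the best field counts.
print(best_fields_score([2.0, 1.0, 0.5]))  # 2.0
```

Setting tie_breaker to 1.0 turns the result into a plain sum of all field scores.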
Also accepts analyzer, boost, operator, minimum_should_match, fuzziness, lenient, prefix_length, max_expansions, fuzzy_rewrite, zero_terms_query, auto_generate_synonyms_phrase_query and fuzzy_transpositions, as explained in match query.
operator and minimum_should_match
The best_fields and most_fields types are field-centric: they generate a match query per field. This means that the operator and minimum_should_match parameters are applied to each field individually, which is probably not what you want.
Take this query for example:
This query is executed as:
In other words, all terms must be present in a single field for a document to match.
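To make the field-centric behaviour concrete, here is a small simulation in plain Python. It assumes a simple lowercase, whitespace-split analysis and hypothetical first_name and last_name fields; real analyzers and scoring are not modelled.

```python
def field_centric_explanation(query, fields):
    """Render the per-field form of an 'and' query, one group per field."""
    terms = query.lower().split()
    groups = ["(" + " ".join(f"+{field}:{t}" for t in terms) + ")" for field in fields]
    return " | ".join(groups)

def matches_field_centric(doc, query, fields):
    """True only if some single field contains every query term."""
    terms = query.lower().split()
    return any(all(t in doc.get(f, "").lower().split() for t in terms) for f in fields)

fields = ["first_name", "last_name"]
print(field_centric_explanation("Will Smith", fields))
# (+first_name:will +first_name:smith) | (+last_name:will +last_name:smith)

# The terms are split across two fields, so this document does not match.
print(matches_field_centric({"first_name": "Will", "last_name": "Smith"}, "Will Smith", fields))  # False
```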
See cross_fields for a better solution.
The most_fields type is most useful when querying multiple fields that contain the same text analyzed in different ways. For instance, the main field may contain synonyms, stemming and terms without diacritics. A second field may contain the original terms, and a third field might contain shingles. By combining scores from all three fields we can match as many documents as possible with the main field, but use the second and third fields to push the most similar results to the top of the list.
This query:
would be executed as:
The score from each match clause is added together, then divided by the number of match clauses.
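A sketch of the most_fields rewrite as Python dicts: the multi_match request and a bool query with one match clause per field. The field names title, title.original and title.shingles are assumptions mirroring the main, original and shingle fields described above.

```python
def most_fields_query(query, fields):
    """multi_match request body using the most_fields type."""
    return {"query": {"multi_match": {
        "query": query,
        "type": "most_fields",
        "fields": fields,
    }}}

def as_bool_should(query, fields):
    """The bool form: one match clause per field under should."""
    return {"query": {"bool": {
        "should": [{"match": {field: query}} for field in fields],
    }}}

fields = ["title", "title.original", "title.shingles"]
original = most_fields_query("quick brown fox", fields)
rewritten = as_bool_should("quick brown fox", fields)
```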
Also accepts analyzer, boost, operator, minimum_should_match, fuzziness, lenient, prefix_length, max_expansions, fuzzy_rewrite, and zero_terms_query.
The phrase and phrase_prefix types behave just like best_fields, but they use a match_phrase or match_phrase_prefix query instead of a match query.
This query:
would be executed as:
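The same kind of sketch for phrase_prefix: the request body and a dis_max form with one match_phrase_prefix query per field. Field names, query text and helper names are illustrative assumptions.

```python
def phrase_prefix_query(query, fields):
    """multi_match request body using the phrase_prefix type."""
    return {"query": {"multi_match": {
        "query": query,
        "type": "phrase_prefix",
        "fields": fields,
    }}}

def as_phrase_prefix_dis_max(query, fields):
    """The dis_max form: one match_phrase_prefix query per field."""
    return {"query": {"dis_max": {
        "queries": [{"match_phrase_prefix": {field: query}} for field in fields],
    }}}

original = phrase_prefix_query("quick brown f", ["subject", "message"])
rewritten = as_phrase_prefix_dis_max("quick brown f", ["subject", "message"])
```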
Also accepts analyzer, boost, lenient and zero_terms_query as explained in Match, as well as slop which is explained in Match phrase. Type phrase_prefix additionally accepts max_expansions.
phrase, phrase_prefix and fuzziness
The fuzziness parameter cannot be used with the phrase or phrase_prefix type.
The cross_fields type is particularly useful with structured documents where multiple fields should match. For instance, when querying the first_name and last_name fields for “Will Smith”, the best match is likely to have “Will” in one field and “Smith” in the other.
One way of dealing with these types of queries is simply to index the first_name and last_name fields into a single full_name field. Of course, this can only be done at index time.
The cross_fields type tries to solve these problems (operator and minimum_should_match being applied per field, and term frequencies differing between fields) at query time by taking a term-centric approach. It first analyzes the query string into individual terms, then looks for each term in any of the fields, as though they were one big field.
A query like:
is executed as:
In other words, all terms must be present in at least one field for a document to match. (Compare this to the logic used for best_fields and most_fields.)
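The term-centric logic can be simulated in a few lines of plain Python. As before, this assumes a simple lowercase, whitespace-split analysis and hypothetical first_name and last_name fields; it illustrates the matching rule only, not scoring.

```python
def term_centric_explanation(query, fields):
    """Render the per-term form of an 'and' query, one group per term."""
    terms = query.lower().split()
    groups = ["+(" + " ".join(f"{field}:{t}" for field in fields) + ")" for t in terms]
    return " ".join(groups)

def matches_term_centric(doc, query, fields):
    """True if every query term appears in at least one of the fields."""
    terms = query.lower().split()
    return all(any(t in doc.get(f, "").lower().split() for f in fields) for t in terms)

fields = ["first_name", "last_name"]
print(term_centric_explanation("Will Smith", fields))
# +(first_name:will last_name:will) +(first_name:smith last_name:smith)

# Each term is found in some field, so the document matches.
print(matches_term_centric({"first_name": "Will", "last_name": "Smith"}, "Will Smith", fields))  # True
```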
That solves one of the two problems. The problem of differing term frequencies is solved by blending the term frequencies for all fields in order to even out the differences.
In practice, first_name:smith will be treated as though it has the same frequencies as last_name:smith, plus one. This will make matches on first_name and last_name have comparable scores, with a tiny advantage for last_name since it is the most likely field that contains smith.
Note that cross_fields is usually only useful on short string fields that all have a boost of 1. Otherwise boosts, term frequencies and length normalization contribute to the score in such a way that the blending of term statistics is not meaningful anymore.
If you run the above query through the Validate API, it returns this explanation:
Also accepts analyzer, boost, operator, minimum_should_match, lenient and zero_terms_query.
The cross_fields type can only work in term-centric mode on fields that have the same analyzer. Fields with the same analyzer are grouped together as in the example above. If there are multiple groups, they are combined with a bool query.
For instance, if we have a first and last field which have the same analyzer, plus a first.edge and last.edge which both use an edge_ngram analyzer, this query:
would be executed as:
In other words, first and last would be grouped together and treated as a single field, and first.edge and last.edge would be grouped together and treated as a single field.
Having multiple groups is fine, but when combined with operator or minimum_should_match, it can suffer from the same problem as most_fields or best_fields.
You can easily rewrite this query yourself as two separate cross_fields queries combined with a bool query, and apply the minimum_should_match parameter to just one of them:
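One possible shape for such a rewrite, sketched as a Python dict. The query text, the 50% threshold and the *.edge wildcard are assumptions for illustration; the field names first and last follow the example fields mentioned above.

```python
def cross_fields(query, fields, **options):
    """Build a single cross_fields multi_match clause (hypothetical helper)."""
    return {"multi_match": {
        "query": query,
        "type": "cross_fields",
        "fields": fields,
        **options,
    }}

body = {"query": {"bool": {"should": [
    # minimum_should_match applies only to the first/last group.
    cross_fields("Will Smith", ["first", "last"], minimum_should_match="50%"),
    # The edge_ngram sub-fields form their own clause without that constraint.
    cross_fields("Will Smith", ["*.edge"]),
]}}}
```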
Either |
You can force all fields into the same group by specifying the analyzer parameter in the query.
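A minimal sketch of such a request as a Python dict. The query text and the choice of the built-in standard analyzer are assumptions; any analyzer name set at query level would play the same role.

```python
body = {"query": {"multi_match": {
    "query": "Jon",
    "type": "cross_fields",
    # A query-level analyzer means every listed field is analyzed the same
    # way, so all of them fall into a single group.
    "analyzer": "standard",
    "fields": ["first", "last", "*.edge"],
}}}
```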
which will be executed as:
By default, each per-term blended query will use the best score returned by any field in a group, then these scores are added together to give the final score. The tie_breaker parameter can change the default behaviour of the per-term blended queries. It accepts:
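- 0.0: Take the single best score out of (eg) first_name:will and last_name:will (default)
- 1.0: Add together the scores for (eg) first_name:will and last_name:will
- 0.0 < n < 1.0: Take the single best score plus tie_breaker multiplied by each of the scores from other matching fields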
cross_fields and fuzziness
The fuzziness parameter cannot be used with the cross_fields type.
The bool_prefix type’s scoring behaves like most_fields, but using a match_bool_prefix query instead of a match query.
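For completeness, a bool_prefix request might look like the following Python dict. The field names and the query text, whose final term f is meant to be treated as a prefix, are assumptions for illustration.

```python
body = {"query": {"multi_match": {
    "query": "quick brown f",
    "type": "bool_prefix",
    "fields": ["subject", "message"],
}}}
```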
The analyzer, boost, operator, minimum_should_match, lenient, zero_terms_query, and auto_generate_synonyms_phrase_query parameters as explained in match query are supported. The fuzziness, prefix_length, max_expansions, fuzzy_rewrite, and fuzzy_transpositions parameters are supported for the terms that are used to construct term queries, but do not have an effect on the prefix query constructed from the final term.
The slop parameter is not supported by this query type.