Full-Text Search: the NEAR parameter and the WEIGHT parameter

1) How near to each other do the two words/phrases have to be to be considered near enough to satisify the nearness critera of the NEAR parameter?

2) How is the WEIGHT parameter different from using the OR parameter? Example:

WHERE CONTAINS(Description, 'ISABOUT (performance weight (.8), comfortable weight (.4), smooth weight (.2) )' );

WHERE CONTAINS(Description, ' "performance" OR "comfortable" OR "smooth" ')

Description contains:

The ride was smooth. I was not comfortable in the car but the ride was smooth. The performance was poor but the ride was smooth.

Both WHERE clauses above would get a match on the previous three records. How does WEIGHT provide something extra?
more ▼

asked Jul 28, 2010 at 06:31 AM in Default

hardingr2000 gravatar image

11 2 2 2

(comments are locked)
10|1200 characters needed characters left

3 answers: sort newest

After taking what I learned above and running a query using CONTAINSTABLE to generate the rankings, I find that I am still confused about how the Rank is calculated.

I have a column containing "7323|Dvorak 10-1 H". When I run my query the following rankings are created:

Rank Search Parameter

84 ISABOUT ("TYSVER" weight (0.9), "10" weight (0.8), "1" weight (0.6), "WIW" weight (0.4))

66 ISABOUT ("DVORAK" weight (0.9), "1-31H" weight (0.8))

53 ISABOUT ("DVORAK" weight (0.9), "2-22H" weight (0.8))

53 ISABOUT ("DVORAK" weight (0.9), "10-1H" weight (0.8))

53 ISABOUT ("DVORAK" weight (0.9), "11-4H" weight (0.8))

As you can see the best and most accurate match is the fourth one down and ranked at 53. Why wasn't this ranked the highest because it matched exactly on the first and the second search parameters?
more ▼

answered Jul 30, 2010 at 12:54 PM

hardingr2000 gravatar image

11 2 2 2

(comments are locked)
10|1200 characters needed characters left

As is usually the case, when I really focus on and think about the documentation I will find my answer. The answer was in front of me all the time.

From the Microsoft link http://msdn.microsoft.com/en-us/library/ms187787.aspx here is part of the documentation on CONTAINS. If I now understand NEAR and WEIGHT correctly, their real power is the effect they have on the calculation of RANK when using CONTAINSTABLE.

Specifies a match of words or phrases that must be in the document that is being searched. Like the AND operator, requires both the search terms to exist in the document being searched.

NEAR | ~ Indicates that the word or phrase on each side of the NEAR or ~ operator must occur in a document for a match to be returned. Several proximity terms can be chained, as in a NEAR b NEAR c or a ~ b ~ c. Chained proximity terms must all be in the document for a match to be returned.

When used in the CONTAINSTABLE function, the proximity of the search terms affects the ranking of each document. The nearer the matched search terms are in a document, the higher the ranking of the document. If matched search terms are >50 terms apart, the rank returned on the document is 0.

For example, CONTAINS (column_name, 'fox NEAR chicken') and CONTAINSTABLE (table_name, column_name, 'fox ~ chicken') would both return any documents in the specified column that contain both "fox" and "chicken". In addition, CONTAINSTABLE returns a rank for each document based on the proximity of "fox" and "chicken". For example, if a document contains the sentence, "The fox ate the chicken," its ranking would be high.

NEAR indicates the logical distance between terms, rather than the absolute distance between them. For example, terms within different phrases or sentences within a paragraph are treated as farther apart than terms in the same phrase or sentence, regardless of their actual proximity, on the assumption that they are less related. Likewise, terms in different paragraphs are treated as being even farther apart.

Specifies that the matching rows (returned by the query) match a list of words and phrases, each optionally given a weighting value.

ISABOUT Specifies the keyword.

WEIGHT(weight_value) Specifies a weight value, which is a number from 0.0 through 1.0. Each component in may include a weight_value. weight_value is a way to change how various portions of a query affect the rank value assigned to each row matching the query. WEIGHT does not affect the results of CONTAINS queries, but WEIGHT impacts rank in CONTAINSTABLE queries.

Thank you for your help.
more ▼

answered Jul 29, 2010 at 07:59 AM

hardingr2000 gravatar image

11 2 2 2

(comments are locked)
10|1200 characters needed characters left
more ▼

answered Jul 28, 2010 at 01:31 PM

Fatherjack gravatar image

Fatherjack ♦♦
42.8k 75 79 108

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here



Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.



asked: Jul 28, 2010 at 06:31 AM

Seen: 3279 times

Last Updated: Jul 28, 2010 at 06:31 AM