question

hardingr2000 avatar image
hardingr2000 asked

Full-Text Search: the NEAR parameter and the WEIGHT parameter

1) How near to each other do the two words/phrases have to be to be considered near enough to satisify the nearness critera of the NEAR parameter? 2) How is the WEIGHT parameter different from using the OR parameter? Example: WHERE CONTAINS(Description, 'ISABOUT (performance weight (.8), comfortable weight (.4), smooth weight (.2) )' ); WHERE CONTAINS(Description, ' "performance" OR "comfortable" OR "smooth" ') Description contains: The ride was smooth. I was not comfortable in the car but the ride was smooth. The performance was poor but the ride was smooth. Both WHERE clauses above would get a match on the previous three records. How does WEIGHT provide something extra?
full-textor
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Fatherjack avatar image
Fatherjack answered
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

hardingr2000 avatar image
hardingr2000 answered
As is usually the case, when I really focus on and think about the documentation I will find my answer. The answer was in front of me all the time. From the Microsoft link http://msdn.microsoft.com/en-us/library/ms187787.aspx here is part of the documentation on CONTAINS. If I now understand NEAR and WEIGHT correctly, their real power is the effect they have on the calculation of RANK when using CONTAINSTABLE. Specifies a match of words or phrases that must be in the document that is being searched. Like the AND operator, requires both the search terms to exist in the document being searched. NEAR | ~ Indicates that the word or phrase on each side of the NEAR or ~ operator must occur in a document for a match to be returned. Several proximity terms can be chained, as in a NEAR b NEAR c or a ~ b ~ c. Chained proximity terms must all be in the document for a match to be returned. **When used in the CONTAINSTABLE function, the proximity of the search terms affects the ranking of each document. The nearer the matched search terms are in a document, the higher the ranking of the document. If matched search terms are >50 terms apart, the rank returned on the document is 0.** For example, CONTAINS (column_name, 'fox NEAR chicken') and CONTAINSTABLE (table_name, column_name, 'fox ~ chicken') would both return any documents in the specified column that contain both "fox" and "chicken". In addition, CONTAINSTABLE returns a rank for each document based on the proximity of "fox" and "chicken". For example, if a document contains the sentence, "The fox ate the chicken," its ranking would be high. NEAR indicates the logical distance between terms, rather than the absolute distance between them. For example, terms within different phrases or sentences within a paragraph are treated as farther apart than terms in the same phrase or sentence, regardless of their actual proximity, on the assumption that they are less related. Likewise, terms in different paragraphs are treated as being even farther apart. Specifies that the matching rows (returned by the query) match a list of words and phrases, each optionally given a weighting value. ISABOUT Specifies the keyword. WEIGHT(weight_value) Specifies a weight value, which is a number from 0.0 through 1.0. Each component in may include a weight_value. weight_value is a way to change how various portions of a query affect the rank value assigned to each row matching the query. **WEIGHT does not affect the results of CONTAINS queries, but WEIGHT impacts rank in CONTAINSTABLE queries.** Thank you for your help.
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

hardingr2000 avatar image
hardingr2000 answered
After taking what I learned above and running a query using CONTAINSTABLE to generate the rankings, I find that I am still confused about how the Rank is calculated. I have a column containing "7323|Dvorak 10-1 H". When I run my query the following rankings are created: Rank Search Parameter 84 ISABOUT ("TYSVER" weight (0.9), "10" weight (0.8), "1" weight (0.6), "WIW" weight (0.4)) 66 ISABOUT ("DVORAK" weight (0.9), "1-31H" weight (0.8)) 53 ISABOUT ("DVORAK" weight (0.9), "2-22H" weight (0.8)) 53 ISABOUT ("DVORAK" weight (0.9), "10-1H" weight (0.8)) 53 ISABOUT ("DVORAK" weight (0.9), "11-4H" weight (0.8)) As you can see the best and most accurate match is the fourth one down and ranked at 53. Why wasn't this ranked the highest because it matched exactly on the first and the second search parameters?
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.