question

Rafael Colucci avatar image
Rafael Colucci asked

Sql Server 2008 - Line Break and fulltext queries

I have a database and a fulltext indexed table. Lets call this table test. This table has one field called testfield. Now, lets insert only one record as follow: insert into test values ('word' + Char(13) + Char(10) + 'test') This query inserts a word with a LINE_BREAK. Now, lets query this table using fulltext: select * from test where contains(testfield, '"word test"') In that case, this query returns nothing at all. Also: select * from test where contains(testfield, '"wordtest"') Returns nothing (that was expected do be this way) Now, lets query the table again modifying the search word: select * from test where contains(testfield, '"word' + Char(13) + Char(10) + 'test"') In that case the query returns the correct line. The question is: Why does this happen? I mean, SQL should ignore line breaks when searching for a word and it does not. I do not believe that this is the default behavior of a fulltext engine. At least it is not acceptable. What happens if my users insert lines with line breaks on my table (and this is usual, since they can write anything they want because the field is an memo field)? Is there any way of correcting this? **EDIT** It only happens when I choose brazilian as the language for FT. If I choose english, none of the problems I mentioned happens. **EDIT** On Sql Server Denali CTP3 neither english nor portuguese works. Maybe this is bug in the english steamer.
sql-server-2008full-text
6 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image Oleg commented ·
@Rafael Colucci Who knows, maybe SQL Server's full text search ignores the line breaks only if they are proper. In your case, data which contains char(13) does not really have line breaks. char(13) is a **carriage return** and has nothing to do with the line break. Try to issue a select from you table with results to text. Copy the value of that column in question and paste it into notepad to see that char(13) is not a line break. A proper line break in Windows is actually **carriage return** plus **linefeed**, meaning **char(13) + char(10)**. This is the only one which makes perfect sense if you remember the good old days when typewriters were used. Back then the secretary would type some text, grab the handle and move the caret to the left (carriage return). Upon completion of the move, the carriage would automatically drop one line down (linefeed). So, the only true line break is char(13) + char(10). Admittedly, many modern editors are programmed to forgive the lack of one of the characters. For instance SSMS itself handles char(10) as a line break. You can try to restate your inserts to have either char(13) + char(10) or at least char(10) in place of your original char(13).
0 Likes 0 ·
Rafael Colucci avatar image Rafael Colucci commented ·
@Oleg the same thing happens with Char(13) + Char(10).
0 Likes 0 ·
Oleg avatar image Oleg commented ·
@Rafael Colucci This is a mystery. I just tested on my end and the behaviour is different, that is the line breaks are completely ignored, just like you expected:
-- insert 4 records: 1 with a fullblown linebreak
-- 2 with rubbish linebreaks and one - no break at all
insert into dbo.test(testfield) 
values 
    ('word test'), 
    ('word' + char(13) + 'test'), 
    ('word' + char(13) + char(10 )+ 'test'), 
    ('word' + char(10 )+ 'test');


-- this returns 4 records
declare @z varchar(20) = '"word' + char(13) + char(10) + 'test"';
select * from test where contains(testfield, @z);

-- this returns 4 records, so the breaks are indeed ignored
set @z = '"word test"';
select * from test where contains(testfield, @z);

-- no records returned (as expected)
set @z = '"wordtest"';
select * from test where contains(testfield, @z);
go

-- the results of first 2 statements:
recordID    testfield
----------- ---------
1           word test

2           word
            test

3           word
            test

4           word
            test
0 Likes 0 ·
Rafael Colucci avatar image Rafael Colucci commented ·
What language did you use?
0 Likes 0 ·
Rafael Colucci avatar image Rafael Colucci commented ·
Wow! It did not happened when I changed the language do english. This is happening only when I use the brazilian language!!!! The problem is I really need to use the brazilian ft.
0 Likes 0 ·
Show more comments

1 Answer

·
Fatherjack avatar image
Fatherjack answered
Can you add the Char(13) to your noise word list or stop list to see if this resolves your condition?
3 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image Oleg commented ·
Still char(13) by itself is meaningless and it is best to avoid it if at all possible because it makes the front ends suffer for no good reason (necessitates addition of a silly replace code) :)
0 Likes 0 ·
Rafael Colucci avatar image Rafael Colucci commented ·
No, i cant add Char(13) as noise word. Sql does not allow that.
0 Likes 0 ·
Oleg avatar image Oleg commented ·
@Rafael Colucci It looks like the linefeed and carriage return are already included in the noise word list out of the box. Maybe your problem is somehow related to the index itself or the language settings. The sample I posted under your question works as expected, so I am puzzled :(
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.