How do we lower latency in transactional replication from publisher when running a recurring large batch job that grows the transaction log?
We just implemented a new job on a medium traffic database that uses transactional replication, that will process inputs from an Excel spreadsheet and call a stored procedure to create records within various tables in that DB. One of the fields involved is a text data type that can range from 10k-30k characters. Within the job, a custom sql server function parses out a url from the large html body and writes it to a table. Due to the intensive nature, or what seems to be that of the process, we only process 35 records or rows from the spreadsheet per job execution every half hour. What we are seeing is that the transaction log is growing at an alarming rate and is thus causing considerable latency from the publisher to the subscriber because the log reader agent scans the entire transaction log and finds 0 records for replication. To put it into perspective, the log sits around 30GB in size, and cannot be much smaller because running the job 5-6 times will grow past that size. We are doing frequent backups of the distribution database (every 5 mins) and the tran log every 15. We are in dire need of a solution where we could decrease the latency, or decrease the amount of reading (I/O) from the log. I realize that there could be work that needs to be done on the app/job side, but is there anything within the replication configuration that can be done to help with this latency and wasted scanning from the log? Any help would be much appreciated. Thank you.
First, Transactional replication is not the best for application that requires low latency, especially when you have text columns in the replication (true for all replication) and when you do bulk inserts. If the Insert on this table is only from the batch jobs, then you should remove this articles from the publication and consider SSIS for the data load. This SSIS package should be executed simultaneously at each publisher and subscriber. If this is a transactional table and the inserts are from different sources, you should consider a staging table to load your data initially and later you can move the records to the original table with small batches, this method limits latency. I doubt you can use system procedures like sp_repldone, sp_replcmds, sp_repltrans you can do some tweaks to mark the huge transactions as replicated and later you can export these marked transactions to the subscribers. I haven't tried this method, needed ASKSSC experts advice here.
Thank you for your quick response. To further the discussion, we actually are not replicating the large data types across, and we have to use transactional because it is needed for real-time reporting. I actually do not think that the traffic is very high at all through the replication pipeline from this database in terms of the number of transactions; however, because the job seems to be growing the transaction log a lot, which we are not sure why, the logreader agent gives the following message on all of the published articles: The Log Reader Agent is scanning the transaction log for commands to be replicated. Approximately 6000000 log records have been scanned in pass # 3, 0 of which were marked for replication, elapsed time 72448 (ms). This is indicative that it is scanning over the large 30GB file, but finds nothing to replicate. I think that the excessive overhead of the log growing is straining the logreader and it is basically doing a table scan operation. Consequently, when there is data to replicate, it takes longer to find it and commit it. The process effectively would perform a lot of small updates to various tables in the DB, but the large data types are not being used in the replication; they may be, however, taking a lot of processing power and or log space to parse out what is needed. I think exploring SSIS and staging tables is a great idea, and we will look into that route. Within an SSIS package though, I am not sure how we would get around the need to call that function in SQL to parse out the string we need from the text data type, and then we would still need to call a custom procedure sp_createJob to process the staged data to various tables. Thanks again!