In the world of heterogeneous information systems many applications and business processes require to synchronize databases. The most challenge of sync procedure is to update modified data in near to real time way since both source and destination databases may be in use constantly.
Classical approach to database synchronization is based on full scanning the original and target databases to find the data being changed. Then necessary actions are applied over this data:
The problem of such method is long overall duration of synchronization process. For example, there are two databases - SQL Server and MySQL, both having over one million of rows. Average performance of classical synchronization is about 100 rows per second. It cannot be improved by using high speed database access components or techniques since this method does not allow bulk manipulations over the data. Consequently, each pass of classical synchronization takes more than 2,5 hours for considered databases that makes impossible to run it on hourly basis.
Another approach to the problem is to iterate only those rows that has been modified since the previous run of synchronization. It can be implemented according to these algorithm:
This technique is known as incremental or trigger-based database synchronization. Let us consider how incremental method may improve timeframe of database synchronization on the example above. Even with record-by-record processing, incremental synchronization can be run in near to real time mode if it is scheduled properly. But it can also implement bulk rows processing technique since all modified data is collected in the single place grouped by type of operation: insert, delete or update. Bulk rows processing gives more than 10 times higher performance of the synchronization process.
Incremental or trigger-based database synchronization applies the following requirements:
In order to evaluate benefits of incremental synchronization Microsoft SQL - MySQL Sync tool can be used.