Table of Contents
This User Guide will introduce both basic and advanced concepts in the configuration of SymmetricDS. By the end of this chapter, you will have a better understanding of SymmetricDS' capabilities, and many of its basic concepts.
SymmetricDS is an asynchronous data replication software package that supports multiple subscribers and bi-directional synchronization. It uses web and database technologies to replicate tables between relational databases, in near real time if desired. The software was designed to scale for a large number of databases, work across low-bandwidth connections, and withstand periods of network outage. The software can be installed as a standalone process, as a web application in a Java application server, or it can be embedded into another Java application.
A single installation of SymmetricDS attached to a target database is called a node. A node is initialized by a properties file and is configured by inserting configuration data into a series of database tables. It then creates database triggers on the application tables to be synchronized so that database events are captured for delivery to other SymmetricDS nodes.
In most databases, the transaction id is also captured by the database triggers so that the insert, update, and delete events can be replicated transactionally via the transport layer to other nodes. The transport layer is typically a CSV protocol over HTTP or HTTPS.
SymmetricDS supports synchronization across different database platforms through the concept of Database Dialects. A Database Dialect is an abstraction layer that SymmetricDS interacts with to insulated the main synchronization logic from database-specific implementation details.
SymmetricDS is extendable through extension points. Extension points are custom, reusable Java code that are configured via XML. Extension points hook into key points in the life-cycle of a synchronization to allow custom behavior to be injected. Extension points allow custom behavior such as: publishing data to other sources, transforming data, and taking different actions based on the content or status of a synchronization.
The idea of SymmetricDS was born from a real-word need. Several of the original developers were, several years ago, implementing a commercial Point of Sale (POS) system for a large retailer. The development team came to the conclusion that that the software available for trickling back transactions to corporate headquarters (frequently known as the 'central office' or 'general office') did not meet the project needs. The list of project requirements made finding the ideal solution difficult:
Sending and receiving data with up to 2000 stores during peak holiday loads.
Supporting one database platform at the store and a different one at the central office.
Synchronizing some data in one direction, and other data in both directions.
Filtering out sensitive data and re-routing it to a protected database.
Preparing the store database with an initial load of data from the central office.
The team ultimately created a custom solution that met the requirements and led to a successful project. From this work came the knowledge and experience that SymmetricDS benefits from today.
At a high level, SymmetricDS comes with a number of features that you are likely to need or want when doing data synchronization. A majority of these features were created as a direct result of real-world use of SymmetricDS in production settings.
After a change to the database is recorded, the SymmetricDS nodes interested in the change are notified. Change notification is configured to perform either a push (trickle-back) or a pull (trickle-poll) of data. When several nodes target their changes to a central node, it is efficient to push the changes instead of waiting for the central node to pull from each source node. If the network configuration protects a node with a firewall, a pull configuration could allow the node to receive data changes that might otherwise be blocked using push. The frequency of the change notification is configurable and defaults to once per minute.
In practice, much of the data in a typical synchronization requires synchronization in just one direction. For example, a retail store sends its sales transactions to a central office, and the central office sends its stock items and pricing to the store. Other data may synchronize in both directions. For example, the retail store sends the central office an inventory document, and the central office updates the document status, which is then sent back to the store. SymmetricDS supports bi-directional or two-way table synchronization and avoids getting into update loops by only recording data changes outside of synchronization.
SymmetricDS supports the concept of channels of data. Data synchronization is defined at the table (or table subset) level, and each managed table can be assigned to a channel that helps control the flow of data. A channel is a category of data that can be enabled, prioritized and synchronized independently of other channels. For example, in a retail environment, users may be waiting for inventory documents to update while a promotional sale event updates a large number of items. If processed in order, the item updates would delay the inventory updates even though the data is unrelated. By assigning changes to the item tables to an item channel and inventory tables' changes to an inventory channel, the changes are processed independently so inventory can get through despite the large amount of item data.
Channels are discussed in more detail in Section 3.5, “Choosing Data Channels”.Many databases provide a unique transaction identifier associated with the rows that are committed together as a transaction. SymmetricDS stores the transaction identifier, along with the data that changed, so it can play back the transaction exactly as it occurred originally. This means the target database maintains the same transactional integrity as its source. Support for transaction identification for supported databases is documented in the appendix of this guide.
Using SymmetricDS, data can be filtered as it is recorded, extracted, and loaded.
Data routing is accomplished by assigning a router type to a ROUTER configuration.
Routers are responsible for identifying what target nodes captured changes should be delivered to. Custom
routers are possible by providing a class implementing IDataRouter
.
As data changes are loaded in the target database, a class implementing IDataLoaderFilter can change the data in a column or route it somewhere else. One possible use might be to route credit card data to a secure database and blank it out as it loads into a centralized sales database. The filter can also prevent data from reaching the database altogether, effectively replacing the default data loading process.
Columns can be excluded from synchronization so they are never recorded when the table is changed. As
data changes are loaded into the target database, a class implementing
IColumnFilter
can remove a column altogether from the synchronization. For example, an employee table may be
synchronized to a retail store database, but the employee's password is only synchronized on the
initial insert.
As data changes are extracted from the source database, a class implementing the
IExtractorListener
interface is called to filter data or route it somewhere else. By default, SymmetricDS provides a
handler that transforms and streams data as CSV. Optionally, an alternate implementation may be
provided to take some other action on the extracted data.
By default, SymmetricDS uses web-based HTTP or HTTPS in a style called Representation State Transfer (REST). It is
lightweight and easy to manage. A series of filters are also provided to enforce authentication and to restrict
the number of simultaneous synchronization streams. The
ITransportManager
interface allows other transports to be implemented.
Administration functions are exposed through Java Management Extensions (JMX) and can be accessed from the Java JConsole or through an application server. Functions include opening registration, reloading data, purging old data, and viewing batches. A number of configuration and runtime properties are available to be viewed as well.
SymmetricDS also provides functionality to send SQL events through the same synchronization mechanism that is used to send data. The data payload can be any SQL statement. The event is processed and acknowledged just like any other event type.
SymmetricDS is written in Java 5 and requires a Java SE Runtime Environment (JRE) or Java SE Development Kit (JDK) version 5.0 or above.
Any database with trigger technology and a JDBC driver has the potential to run SymmetricDS. The database is abstracted through a Database Dialect in order to support specific features of each database. The following Database Dialects have been included with this release:
MySQL version 5.0.2 and above
Oracle version 8.1.7 and above
PostgreSQL version 8.2.5 and above
Sql Server 2005
HSQLDB 1.8
H2 1.x
Apache Derby 10.3.2.1 and above
IBM DB2 9.5
Firebird 2.0 and above
See Appendix C, Database Notes, for compatibility notes and other details for your specific database.
SymmetricDS 2 builds upon the existing SymmetricDS 1.x software base and incorporates a number of architectural changes and performance improvements. If you are brand new to SymmetricDS, you can safely skip this section. If you have used SymmetricDS 1.x in the past, this section summarizes the key differences you will encounter when moving to SymmetricDS 2.
The first significant architectural change involves SymmetricDS's use of triggers. In 1.x, triggers capture and record data changes as well as the nodes to which the changes must be applied as row inserts into the DATA_EVENT table. Thus, the number of row-inserts grows linearly with the number of client nodes. This can lead to an obvious performance issue as the number of nodes increases. In addition, the problem is made worse at times due to synchronizing nodes updating the same DATA_EVENT table as part of the batching process while the row-inserts are being created.
In SymmetricDS 2, triggers capture only data changes, not the node-specific details. The node-specific row-inserts are replaced with a new routing mechanism that does both the routing and the batching of data on one thread. Thus, the real-time inserts into DATA_EVENT by applications using synchronized tables have been eliminated, and database performance is therefore improved. The database contention on DATA_EVENT has also been eliminated, since the router job is the only thread inserting data into that table. The only other access to the DATA_EVENT table is from selects by synchronizing nodes.
As a result of these changes, we gain the following benefits:
data_event
table, unlike the possible contention in 1.X.
Because routing no longer takes place in the SymmetricDS database triggers, a new mechanism for routing was needed. In
SymmetricDS 1.x, the node_select
expression was used for specifying the desired data routing. It was a SQL
expression that qualified the insert into DATA_EVENT from the SymmetricDS triggers. In SymmetricDS 2 there is a
new extension point called the data router. Data routers are configured in the router table with a router_type
and
a router_expression
. Several different routers have been provided to serve the majority of users' routing
needs, but the framework is in place for a SymmetricDS programmer to develop domain- or application-specific
routers. See Section 4.6.2, “Router” for a complete list of provided routers.
Since the routing and capturing of data are now performed with two separate mechanisms, the two concepts have been separated into separate configuration tables in the database, with a join table (TRIGGER_ROUTER) specifying the relationships between routing (ROUTER) and capturing of data (TRIGGER). This solves a long standing issue with some databases which only allow one trigger per table. On those database platforms, we can now route data in multiple directions since we only require one SymmetricDS trigger to capture data. This also helps performance in those scenarios, since we only capture the data once instead of once per routing instance.
As part of the new routing job, we have introduced another new extension point to allow more flexibility in the way data events get batched. A batch is the unit by with captured data is sent and committed on target nodes. In SymmetricDS 2, batching is now configured on the channel configuration table. This provides additional flexibility for batching:
Another significant change to note in SymmetricDS 2 is the removal of the incoming and outgoing batch history tables. This change was made because it was found that over 95% of the time the statistics the end user truly wanted to see were those for the most recent synchronization attempt, not to mention that the outgoing batch history table was difficult to query. The most valuable information in the batch history tables, the batch statistics, have been moved over to the batch tables themselves. The statistics in the batch tables now always represent the latest synchronization attempt.