Database and Datascience: Overview of Microsoft SQL Server Replication

Overview of Microsoft SQL Server Replication

Donatien Mbadi, Database Expert – dmbadi@blogspot.com

1. Introduction

Availability of databases and data relevance are critical for business, making backup and replication of databases a necessity.

The aim of this document is to present an overview of replication using Microsoft SQL Server. With Microsoft SQL Server replication, it is possible to create an identical copy of primary database or synchronize changes between multiple databases and maintain data consistency and integrity.

The replication procedure is quite the same event if you are using 2008, 2012, 2014, 2016 or 2019 Versions and SQL Server Management Studio (SSMS). We are covering:

- Snapshot Replication

- Transactional Replication

- Merge Replication

- Peer-to-peer Replication

2. SQL Server Replication Terminology

Before driving configuration and setting up Microsoft SQL Server replication, let’s first introduce the main terms and replication models.

- Articles are the basic units to be replicated, such as tables, procedures, functions or views. Multiple articles can be created for the same object and article cam be scaled vertically or horizontally.

- A publication is a logical collection of articles. This is the final set of entities from the database that are designated for replication.

- A filter is a set of conditions for an article. With SQL Server replication, you can use filters and select custom entities for replication. For instance, by filters, you can select only the most critical tables and fields and replicate only this data.

- A distributor is and Database instance configured for collecting transactions from publications and distribute them to subscribers.

- A publisher is the main database copy on which publication is configured, making data available to other servers that are configured to be used in the replication process.

- A subscriber is a database that receives the replicated data from a publication. One subscriber can receive data from one or more publishers and publication.

Note: A distributor acts as a database for storing replicated transactions. A distributor database can be considered as the publisher and the distributor at once. In the local model, a single instance runs both publisher and distributor. In the remote model, a remote distributor can be used when you want subscribers to get different publications.

- Agents are components that can act as background services and used to schedule automated execution jobs, such as Database backup or Database replication. There are five types of agents in Microsoft SQL Server:

o Snapshot Agent: It’s an executable file that prepares snapshot files containing schema and data of published tables and database objects.

o Log Reader Agent: it’s used for transactional replication which moves transactions marked for replication from the transaction log on the publisher to the distributor database.

o Distribution Agent: it’s used with snapshot and transactional replication. It applies the initial snapshot to the subscriber and moves transitions held in the distribution database to subscribers. It tuns at either distributor for push subscriptions or at the subscriber for pull subscriptions.

o Merge Agent: it’s used with merge replication. It applies the initial snapshot to the subscriber and moves incremental data changes that occur.

o Queue Reader Agent: it’s used with transactional replication with the queued updating option. It runs at the distributor and moves changes made at the subscriber bac to the publisher.

- Subscription is a request for a copy of a publication that must be delivered to the subscriber. It’s used to define the publication data that must be received and where and when data will be received. There are two types of subscription, push and pull.

o Push subscription: changed data is forcibly transmitted from distributor to the subscriber database; no request from the subscriber is needed.

o Pull subscription: changed data made on the publisher is requested by a subscriber, the agent runs on the subscriber side.

- Metadata is the data used to describe entities of the database.

3. SQL Server Replication Types

Replication is a technology for copying and synchronizing data between database continuously or regularly at scheduled intervals. Microsoft SQL Server replication can be:

- One way

- One-to-many

- Bidirectional

- Many-to-one

As mentioned in introduction, there are four Microsoft SQL Server replication types:

- Snapshot Replication

- Transactional Replication

- Merge Replication

- Peer-to-peer Replication (from 2016…)

We are going to describe in detail, each type of replication.

3.1. Snapshot replication

Snapshot replication distributes data exactly as it appears at a particular moment in time and it does not monitor updates to the data. During synchronization of the database, the entire snapshot is generated and then sent to the subscribers. The snapshot is simply an image of all the database object in the publication. No change tracking is performed for snapshot replication.

Two agents are used in this type of replication:

- Snapshot Agent

- Distribution Agent

The snapshot agent is responsible for creating files with the schema of the publication and the data. The files are temporarily stored in the snapshot folder on the distribution server, and the distribution tasks are recorded in the distribution database.

The work of the distribution agent is to move the schema and the data from the distributor to the subscribers.

The use of snapshot replication is most appropriate under any of the following circumstances:

- When data changes infrequently

- When it’s acceptable to have data copies that are out of date with the publisher for some time

- When you are replicating small volumes of data

- When a large volume of changes happens within a short period

For example, snapshot replication can be used when a sales organization maintains a product price list and the prices are all updated at the same time once or twice each year, replicating the entire snapshot of data after it has changed is recommended.

How it works?

1. Snapshot Agent establish a connection from distributor to publisher and generates fresh snapshot into snapshot folder by placing locks

2. Snapshot agent writes copy of the table schema for each article to .sch file. If other database objects are published, such as indexes, constraints, stored procedures, views and so on, additional scripts files are generated.

3. Copies data from published table at the publisher and writes data to the snapshot folder in the form of .bcp (Bulk copy Program) file

4. The snapshot agent appends rows to the MSrepl_commands and MSrepl_transactions tables in the distribution database. The entries in the MSrepl_commands table are commands indicating the location of .sch and .bcp files and any other snapshot files and references. The entries in the MSrepl_transactions table are relevant to synchronizing the Subscriber.

5. Releases any locks on published tables.

3.2. Transactional replication

Transactional replication is the periodic automated replication when the data is distributed from a master database to a database replica in real time or near-real-time. Transactional replication is more complex than snapshot replication because not only is the final state of database replicated, but all made transactions are also replicated, which makes it possible to monitor the entire transaction history on the database replica.

Transactional replication starts with a snapshot of the publication database objects and data. As soon the initial snapshot is taken, subsequent data changes and schema modifications made at the publisher are usually delivered to the subscriber as they occur (in near real time). The data changes are applied to the subscriber in the same order and within the same transaction boundaries as they are occurred at the publisher; therefore, within a publication, transactional consistency is guaranteed.

Three agents are used in this type of replication:

- Snapshot Agent

- Log Reader Agent

- Distribution Agent

The Log Reader Agent runs at the distributor; it typically runs continuously but can also run according to a schedule you establish. When executing, the Log Reader Agent first reads the publication transaction log and identifies any INSERT, UPDATE and DELETE statements, or other modifications made to the data in transactions that have been marked for replication. Next, the agent copies those transactions in batches to the distribution database at the distributor. The Log Reader Agent use internal stored procedure called sp_replcmds to get the next set of commands marked for replication from the log. The distribution database then becomes the store-and-forward queue from which changes are sent to subscribers, only committed transactions are sent to the distribution database. After the entire batch of transactions has been written successfully to the distribution database, it is committed. Following the commit of each batch of commands to the distributor, the Log Reader Agent calls sp_repldone procedure to mark where replication was last completed. Finally, the agent marks the rows in the transaction log that are ready to be purged. Rows still waiting to be replicated are not purged.

The Distribution Agent runs at the distributor for push subscriptions and at the subscriber for pull subscriptions. The agent moves transactions from the distributions database to the subscriber. Il a subscription is marked for validation, the Distribution Agent also checks whether data at the publisher and subscriber match.

The transaction replication is typically used in each of the following cases:

- You want incremental changes to be propagated to subscribers as they occur.

- The application requires low latency between the time changes are made at the publisher and the changes arrive at the subscriber

- The application requires access to intermediate data states. For example, if a row changes five times, transactional replication allows an application to respond to each change not simply the net data change to the row.

- The publisher has a very high volume insert, update and delete activity

- The publisher or subscriber is a non-SQL Server database, such as Oracle

By default, subscribers to transactional publications should be treated as READ-ONLY, because change are not propagated back to the publisher. However, transactional replication does offer options that allow updates at the subscriber.

How it works?

1. The snapshot Agent prepares snapshot files containing schema and data of published tables and other database objects, stores the files in the snapshot folder and records synchronization jobs in the distribution database on the distributor.

2. The Log Reader Agent monitors the transaction log of each database configured for transactional replication and copies the transactions marked for replication from the transaction log into the distribution database which acts as a reliable store-and-forward queue.

3. The Distribution Agent copies the initial snapshot files from the snapshot folder and the transactions held in the distribution database tables to subscribers.

4. Incremental changes made at the publisher flow to subscribers according to the schedule of the Distribution Agent, which can run continuously for minimal latency or at scheduled intervals.

3.3. Merge replication

Merge Replication is a method for copying and distributing data and database objects from one SQL Server to one or more SQL Servers and synchronizing the database for consistency. It’s a good solution to the problem of moving data between a central server and mobile clients that are only occasionally connected.

Merge replication is then a type of bidirectional replication that is usually used in server-to-client environments for synchronizing data across database servers when they cannot be connected continuously. When the network connection is established between both database servers, merge replication agents detect changes made on both databases and modify databases to synchronize and update their state. Merge replication is similar to transactional replication, but data is replicated from the Publisher to the Subscriber and inversely.

Two agents are used in this type of replication:

- Snapshot Agent

- Merge Agent

The Snapshot Agent takes one snapshot and put it in replica data folder the Merge Agent applies snapshot to the subscriber.

The Merge Agent merges any data changes that occurred at the publisher or subscriber after initial snapshot was taken.

Merge Replication is typically used in server-to-client environments. It’s appropriate in any of the following situations:

- Multiple subscribers might update the same date at various times and propagate those changes to the publisher and to other subscribers.

- Subscribers need to receive data, make changes offline and later synchronize changes with the publisher and other subscribers.

- Each subscriber requires a different partition of data

- Conflicts might occur and when they do, you need the ability to detect and resolve them.

- The application requires net data change rather than access to intermediate data states.

For example, merge replication can be used by multiple peer stores that work with a shared warehouse. Each store is permitted to change the information in the warehouse database and at the same time all stores must have the updated state of their databases after the shipment of goods or delivery of supplies to the warehouse. Merge replication can be used in cases where the updated information must be available for the main (or central) database and branch databases simultaneously.

How it works?

1. Data gets added from the publisher’s or the subscriber’s end

2. The tracking table tracks the data that has been changed

3. Merge Agent sends this data to the distribution database

4. The distribution database sends this data back to the Merge Agent

5. Merge Agent then sends this data to the publisher and subscriber(s)

3.4. Peer-to-peer replication

Peer to peer replication is a replication type where the publisher server replicates data to multiple subscriber servers at the same time. Peer to peer replication is useful for multiple data center locations across the globe. One centralized data center manages the data on the other data center data. It provides a scale-out and high availability solution by maintaining copies of data across multiple server instances, also referred as nodes. Peer to peer replication can help scale out an application that uses a database.

The concept of Peer to peer replication is based on transactional replication that circulates transaction data across multiple nodes. However, in Peer to peer replication, each node is treated as a publisher as well as subscriber that means, they can send and receive transaction to each other to synchronize data across all the nodes.

The main difference between Peer to peer replication and Merge replication is that for merge replication there is only one publisher and one and more subscribers, but on Peer to peer replication, all nodes are both publishers and subscribers. Secondly, Peer to peer replication is transactional which means it transmits transactionally consistent changes. In contrast, merge replication is trigger based. In the background implementation they also use different agents.

Requirements for configuring Peer to peer replication:

- Peer to peer replication is supported in the Enterprise Edition of SQL Server only

- Row level and column level filtering cannot ne used, which means each and every database needs to have all the data rows and columns

- There should be identical publication name on all participating nodes

- Each node should have its distribution database; this eliminates the risk of single point of failure

- Peer to peer subscriptions cannot be reinitialized; Restoration of backup should be done in that case

- We cannot add tables in multiple peers to peer publications in the same publication database.

4. Requirements for Microsoft SQL Server

- The following ports must be opened for inbound traffic:

TCP 1433, 1434, 2382, 2383, 135, 80, 443

UDP 1434

- Don’t forget to configure firewall and enable the appropriate ports for inbound traffic on each host before installing Microsoft SQL Server.

- Hosts attended in SQL Server replication must resolve each other by hostname.

- Before configuring SQL Server replication, the following software must be installed for Microsoft SQL Server:

o .NET Framework – A set of libraries

o Microsoft SQL Server – The database server software

o Microsoft SQL Server Management Studio (SSMS) – Software for managing SQL Server databases using Graphical User Interface (GUI)

Very Important:

If you want to configure SQL Server replication, you can use the second database server (where the subscriber is configured) of a version within two versions on the source database server on which the publisher is configured. For example, I you install Microsoft SQL Server 2016 on the first machine where the source database is located, replication will not work if you install Microsoft SQL Server 2008 on the second machine. The distributor can be configured on 2016, 2017 and 2019 and the subscriber can be configured on 2012 or more.

Next articles:

1. Step by Step to configure Snapshot replication

2. Step by Step to configure Transactional replication

3. Step by Step to configure Merge replication

4. Step by Step to configure Peer to peer replication

Database and Datascience

mercredi 2 décembre 2020

Overview of Microsoft SQL Server Replication

Aucun commentaire:

Enregistrer un commentaire

oracle

oracle