October 2015 – Data Architecture

DATABASE BACKUP CONSIDERATIONS

Posted on 2015-10-15
/Under backups, storage

INTRODUCTION

Below is an overview of database backups. OS or user workstation file backups are not considered. The below information may be useful for non-database backups, but databases (which are data focused), are a special case of backups. This blog is meant to assist in creating a process. Don’t expect a suggestion of exactly how to implement your specific database backup process.

WHY BACKUP

I’ve heard Enterprise IT people that should know better say that database backups are not important because we have a cluster with the primary database in one data center, and a hot standby in another data center. Without backups, there would be no recovery if a disaster took out booth data centers, no recovery if a release to production failed and lost data, no recovery if a malicious hacker wiped the data, no recovery if a bug happened to delete the wrong data… That run-on sentence could run-on for as far as the imagination can see.

My latest catch phrases regarding database backups are “HA is not a backup”… “DR is not a backup”… “Remember ransomware (which just happens to be recently making the national news. Insert any malware, the paradigm still works)”.

With all my 35+ years of IT experience, I will tell you that if it’s a computer, and your data is important to you, you need to manage a backup process that involves multiple backups and off-site copies of that backup, regardless of HA (High Availability), DR (Disaster Recovery), period.

ENGINE VERSIONS REFERENCED

Database engine versions referenced below are versions published by the vendor within the last 10 years or so (ie, explicitly excluding Microsoft Server 2000 and prior, excluding MySQL 4x and prior, excluding Oracle 9x and prior, excluding Informix 7x and prior, excluding DB2 8x and prior). Amazon RDS (MySQL clone via SaaS (Software as a Service)) automated backups and DB Snapshots are also excluded from the discussion below as this style of backup is very rigid.

DATABASE BACKUP STRATEGY

Please note, I’m not all-knowing. I’ll not be able to design a backup system that will meet your needs. There will be some remaining questions that only someone that is working day-to-day with your business can answer. Things like, “how long do we need to keep this data”, or “how many minutes of data loss are we willing to risk given the effort involved in backing up”. I do provide some examples that have worked for me. See below.

GENERAL BACKUP INFORMATION

There is much ambiguity in the IT industry regarding the definition of ‘Incremental’ & ‘Differential’ backups. The below seems to be the most common definition, and the one I use below. Backup Types…

FULL Backup is a backup of all information (data, stored procedures, objects in general) in the database
DIFFERENTIAL backup is a backup of the information that has changed since the last Full backup
INCREMENTAL backup is a backup of the data that has changed since the last Full, Differential, or Incremental backup
TRANSACTION LOG (TLOG) backup is a backup of the transactions in the transaction log since the last Transaction Log backup, Incremental, Differential, or Full backup.
Informix refers to a FULL backup as a level 0 backup, DIFFERENTIAL as level 1 backups, etc.

Backup types…

Logical – records SQL & DDL, and the order that the SQL is to be applied
Physical – records database blocks

BACKUP INFORMATION SPECIFIC TO ENGINE

SQL Server is somewhat unique with these items…

All backups in SQL Server are Physical and “On-line” (the database can be used while users are active).
To back up Agent jobs and similar server (vs database) objects, backup the ‘msdb’ database
To back up users, backup the ‘master’ database. (note, restoring ‘master’ must be done from single user mode)
The Recovery Model is an option for each database. It indicates how the database handles transaction logs. The default for a new database is the value from the ‘model’ database. Recovery Models…

SIMPLE Recovery Model indicates that Point In Time restores cannot be accomplished.
FULL Recovery Model enables the Point In Time recovery.
BULK LOGGED Recovery Model means that Point In Time recovery may be available if a load operation (select into, BCP, etc) has not been performed since the last Full/Differential backup. Frequently databases will move between this Recovery Type and Full Recovery as needed. This recovery type will be ignored below. If this Recovery Model is used, the assumption is that the database will be changed back to FULL Recovery and a FULL backup then will be taken.

LOCKS

SQL Server does not lock DML nor DDL while backup is running. DROP DATABASE does lock though, applicable when restoring. Some engines will have concerns with DML and/or DDL locking as described above (specifically MySQL & MariaDB). Be sure you understand any locking issues with your specific engine. Test & monitor to ‘tune’ backups at your site.

PERFORMANCE

BACKUPS, in general, are designed to be as quick, efficient, and as unobtrusive as possible. Unless your server is way underpowered, backups will have no (or a very minimal) impact (some estimate between 5% – 20%) on the system, and only for the duration of the backup (which may be less than a minute). The IO subsystem may be affected slightly more than the CPU or RAM, even with encryption. With compression, you will see a slight CPU increase, but even so, backups in general will cause very minimal CPU & IO impact. Note, some compression algorithms are SAN de-duplication friendly (will compress during backups, and will dedup on the SAN as if the data has not been compressed), but in general, don’t turn off backup compression because someone tells you that your SAN dedups. SAN administrators like to see their dedup rate as high as possible, without taking into consideration that backup compression has the advantage of significantly reducing the size of the data transferred to the SAN (frequently a major IO bottleneck).

RESTORES, on the other hand, may take longer than the backups as there are many components that may need to be brought together and checked once restored. An example of what will need to be brought together is the relationship of TLOG backups to FULLS / DIFFERENTIALS / INCREMENTALS, DIFFERENTIALS to FULLS, INCREMENTALS to FULLS / INCREMENTALS, and having all the related backups available and specified to the restore. This means that you’ll need to restore a Full backup, then a specific Incremental, and a specific set of Tlogs. Most of the engines maintain that relationship, but the files need to be made available when the engine needs them. Third party software can make that task easier. See below.

To reduce the impact on the IO subsystem, it’s best to put the backup storage on its own device. In the era of SAN, NAS, RAID, this is not always possible. If it’s possible to dedicate a device to the backups, moving the backup’s storage would reduce the IO on the database IO subsystem by up to 50%. If the database containers are on SAN, consider placing the backups on NAS. This is a minor point most of the time, but is something to consider… especially if becomes IO an issue.

When storage is network based (NAS), separating the IO is best accomplished by adding a NIC (Network Interface Card) to the involved systems and creating a separate VLAN (Virtual Local Area Network) just for backups.

BACKUP RETENTION

This is another one of the items that is site specific. it addresses the question: how long should the backups be available? This needs to be balanced with 1) how much storage is available, 2) how long will the current version of the database engine be able to read the backups, 3) how long will a problem go unnoticed, 4) users always want their backups kept forever, 5) cost of adding backup storage, 6) if on tape, how long will the tape drives be available to read the tapes, etc.

If backups are to be retained beyond approximately a few years, be sure to keep copies of the database engine installer, and the operating system installer for the operating system on which the db engine would run. Some engines will not let you restore from a version of the engine older than one major release. Some are more liberal at three major releases. For example, SQL Server 2000 backups cannot be restored to SQL Server 2012 or SQL Server 2014, but can be restored to SQL Server 2005, SQL Server 2008, or SQL Server 2008R2.

MONITOR & AUTOMATED RERUN

See ‘SOFTWARE’ below. This is noted here to highlight the need.

COMPRESS & ENCRYPT

This is a site specific consideration. Unless the server is severely underpowered with respect to CPU, and does not affect the IO subsystem, always compress database backups (see above). Seriously consider encrypting all database backups, especially if any of your data are subject to the following standards: HIPAA, PCI DSS, FISMA, GLBA, ISO 2701, FACTA, PIPEDA, ECHR, ECPA, etc.

CATEGORIES

Categorize your backups into a few site specific categories. For example: DATA WAREHOUSE, TRANSACTIONAL PROD, and NON-PROD. When categorizing, consider the following…

Size of data as it affects backup times of FULL BACKUPS
How often the data changes as it affects DIFFERENTIAL/INCREMENTAL/TLOG backup times
How important is the data (main credit card processing system vs internal development version of the report writing system). This may affect retention, Recovery Model, etc.

OFF SITE

Consider always having a copy of your backups moved off-site after a more current backup has completed successfully.

Note, consider encrypting your backups prior to moving off site. See above.

BACKUPS

By definition, FULL backups back up the entire database. They serve as the basis for any RESTORE. After the FULL is taken, DIFFERENTIALS may be taken. Each DIFFERENTIAL backup is associated with one and only one FULL backup.

To use that DIFFERENTIAL backup to RESTORE, the FULL backup must first be restored, then the DIFFERENTIAL is optionally ‘applied’. Transaction Log (Tlog) backups that occur after the most recently restored FULL or DIFFERENTIAL backup may be used to roll the database forward to a specific Point In Time. A database schedule considers what’s involved in a RESTORE when determining a BACKUP schedule. See below.

Incremental vs Differential

The above assumes that just FULLS & DIFFERENTIAL were taken. If INCREMENTALS are also taken, the backup/restore process would look like this… After the FULL is taken, DIFFERENTIALS may be taken, then INCREMENTALS may be taken. Each INCREMENTAL backup is associated with either: one and only one FULL backup or the prior INCREMENTAL ‘chain’. To use the INCREMENTALS to restore, the FULL BACKUP must first be restored, the DiFFERENTIAL restored, and then each of the INCREMENTALS in the backup ‘chain’ would need to be restored in order.

BACKUP SCHEDULE

This is another site specific consideration. The team that is responsible for the data should, for each of the CATEGORIES, formulate a BALANCE of the following that should accommodate near 100% of the needs…

Recovery Model
Backup type (FULL, DIFFERENTIAL, INCREMENTAL, TLOG)
Retention period
Scheduled time when the backup should be run. It should be a time when write activity to the database is low, or just after a significant change.
Effort involved in restoring vs resources available for the backup (ie, a FULL backup usually uses more resources than a DIFFERENTIAL).

There may be other considerations as to when to run backups. For example, one may wish to back up a data warehouse immediately after the monthly load. These ‘abnormally scheduled’ (one-off) database backups should be broken out into separate jobs that can be submitted or triggered by a successful load.

The schedule should accommodate near 100% of the needs. There may be a few database that have additional backup needs beyond those broad needs defined here. Those should be a separately scheduled job to backup the specific database or server. That job most likely will simply preform an additional Differential or a Tlog backup.

With a backup schedule, a sample may be worth a thousand words, or at least provide a suggestion. See below for a SQL Server example.

SAMPLE SCHEDULE

Here is a sample schedule for database backups.

Category=Production Transactional Important

Full=Weekly

Differential=Daily

Tlogs=10 mins*

Retain=weekly for three months, daily for two weeks, and Tlogs for five days

Category=Production Transactional Not-so-important

Full=Weekly

Differential=Daily

Tlogs=1 hour*

Retain=weekly for three months, daily for two weeks

Category=Non-Production Transactional

Full=2x/Month

Differential=None

Tlogs=none

Retain=2x/Month for two months

Category=Data Warehouses / Big Data

Full=Monthly

Differential=Weekly per ETL

Tlogs=none

Retain=Monthly for three months, weekly for five weeks

Generally Monthly backups will be submitted the first Friday of the month at 7:00 PM and will have the weekend to run.
Generally weekly backups will be submitted after 7:00 PM Friday and will have the weekend to run (exclude the 1st Fri if conflict).
Generally Daily backups will be submitted Mon – Thu after 7:00 PM.
Generally Tlog backups will be submitted on TBD schedule, as required to fulfill SLA, not to exceed 2 hours.

* with Tlog (Transaction Log) backups, the database will use Full Recovery Model. All other databases will use Simple Recovery Model

TEST

Backups should be tested by restoring a database from each category to another server. The test should be implemented any time the database engine is upgraded, the backup/restore program is upgraded, or other major changes are made to the backup software. At a minimum, test annually.

SOFTWARE

In the above, there was no mention of software to assist the backup process. That evaluation is left as an excessive for the reader. Backup software will implement and manage many of the decisions made using the above as a guide. Some things to contemplate…

Backup software should not only handle the backup of databases (the easy part), but be able to automatically…

schedule backups
verify backups to insure an exact copy was obtained or that an exact copy was restored
compress and/or encrypt backups prior to hitting the wire, and the converse when restoring
resubmit a failed backup automatically
parallelize backups (ie, mysqldumper)
report backup failures (via SNMP, email, etc)
manage backup dependencies (tlog related to incremental, which is related to the full)
provide GUI/Wizards for managing backups & restores, interactive help & other prompts/information
central administration of backups & restores
ability to kill connections to database prior to backups or restores (especially restores)
work with different medias, tape, cloud, SAN, NAS, local disk, etc
be HA aware

and most importantly, be able to effectively assist in restoring all or part of a database!

A prior employer had software that, at the time, was not so easy to use to restore a database. It frequently could not find backup sets required for restores. There was a running joke… “they don’t call it NetRestore for a reason.”

REFERENCE

SAN Deduplication, http://www.brentozar.com/archive/2009/11/why-dedupe-is-a-bad-idea-for-sql-server-backups/

Troy Frericks.

blog 14-Oct-2015

HELLO WORLD

Posted on 2015-10-02
/Under DataArchitecture

Hello world. Can you hear me now? What other trite expressions can I use to start a new blog?

Moving past that, probably the best way to start a new blog is with a quick introduction of self and the blog. So, here goes…

I’m an IT professional and have been in the field since high school. I’ve been interested in electronics and computers prior to high school, but it was my first day that really launched a career. With anticipation, I sought out the “computer room” that first day. From that point forward, every morning before school, every afternoon after school, and every study hall was spent there. There was a small room, about the size of a large walk-in closet; maybe 1/3 the size of an average classroom. There were three terminals in the room that connected to computers at AEA 10, I believe. One of the teletypes was an Teletype 33. It connected to a PDP 8 by a 110 baud modem. The other two terminals were were DECwriters. Those were connected to an HP 3000 by a 300 baud modem.

So, what’s this have to do with launching a career? Well, one morning, on one of the first days in high school, I was in the “computer room” working on a Basic program to calculate how much one could make in a month if they had a job that paid a penny on the first day, and each day the pay doubled. Mr Kibby copied from a scrap of paper onto the chalkboard “Help Wanted. Need someone with computer skills to help with IBM 360 mainframe. Call Marilyn at…”. I signed out of the HP 3000. Waited for the modem to disconnect, picked up the phone and dialed. I had a geometry class, then a study hall. I was 16 years old, so I could drive… and I had a car! I drove the 10 blocks, interviewed for 45 minutes, and had a part-time evening job starting that evening. My first real IT job!

While obtaining a degree at Iowa State University, I worked for the Computation Center as an operator, then as a programmer. I absolutely loved the VAX and VAX VMS. VMS was the first C2 certified operating system. As I was in a position that enforced security on the VAX, I became very interested in the internals of VMS and security in general.

Enough about me for now. I promised ‘quick’, so to honor that, let me conclude by saying that this blog is for techies. It’ll get in depth with enterprise databases… Things like redo logs, internals, backups, etc. I’ll touch on many engines, but will limit to: Oracle, DB2, SQL Server, MariaDB/MySQL, Informix, Sybase, and PostgreSQL, even Data Warehouses/Marts, Hadoop/DataLakes, ETLs. There will be information about Cyber-security, Database Administration, Data Architecture, Sec/DevOps. Given that I’m also subject to an occasional tangent, I’ll occasionally write about Linux, Windows, Android, IOS, Java, C, PowerBuilder, encryption, data in general and especially data security, etc.

My goal… for me is to document for later reference… for you is to inform. I’m open to constructive replies/comments as we all have something to learn.

Troy Frericks.

blog 2-Oct-2015