Introduction
Many times when working with customers, we hear "We were told never to use Oracle GoldenGate (OGG) to do initial loads, so how do we load our non-Oracle data to Oracle?" It is true that native database utilities can be used to make copies of large databases faster than OGG in a homogeneous (like-to-like) implementation. However, when loading data of dissimilar structure OGG may be the best, and possibly only, data load solution. The key is to use some of the more powerful, and advanced features of the product to perform parallel data extraction, conversion, and apply. In this article we shall present examples of these features to perform an Oracle Database to MySQL initial load. Although the database queries and OGG configuration are specific for these two databases, the concepts presented may be used for any initial load process.The scripts and information provided in this article are for educational purposes only. They are not supported by Oracle Development or Support, and come with no guarantee or warrant for functionality in any environment other than the test system used to prepare this article.
Main Article
Initial load is the cornerstone of all successful OGG implementations. As such, sufficient time for planning and testing must be allotted to determine the correct target database table definitions and optimal OGG settings.Source to Target Table DDL
This important task can be cumbersome because there are no utilities provided by Oracle for converting DDL between the various types of databases. Things to double check include primary key or unique index definitions, foreign key constraints, data type support, reserved words, and default precisions. For example, in our source Oracle 11g database we have several tables with columns defined as TIMESTAMP. The default for Oracle TIMESTAMP includes a six digit fractional seconds precision value, which correlates to TIMESTAMP(6). The default for MySQL is to not include the fractional seconds precision, which correlates to TIMESTAMP(0). To ensure the Oracle TIMESTAMP is correctly applied the target MySQL columns must be specified as TIMESTAMP(6), otherwise the fractional seconds data will be truncated resulting in a potential out of sync condition between the source and target databases.Source Data Size
Query the database to get the source data size; that is; the number of rows per table, average row length, and total bytes per table. This information will be used to determine the number of OGG initial load data extraction processes required. To get this information for my source database I ran the following query (table statistics must be current for this query to work):set lin 300; COL TABLE_NAME FORMAT A30; COL NUM_ROWS FORMAT 999,999,999; COL AVG_ROW_LEN FORMAT 999,999,999; COL TOTAL_BYTES FORMAT 999,999,999,999,999; SELECT a.table_name, a.num_rows, a.avg_row_len, b.total_bytes FROM (SELECT table_name, num_rows, avg_row_len FROM user_tables) a, (SELECT SUM (BYTES) total_bytes, segment_name AS table_name FROM user_extents WHERE segment_type = 'TABLE' GROUP BY segment_name) b WHERE a.table_name = b.table_name order by num_rows;Which returns the following:
TABLE_NAME NUM_ROWS AVG_ROW_LEN TOTAL_BYTES ------------------------------ ------------ ------------ -------------------- NEXT_ORDER 1 8 65,536 NEXT_CUST 1 7 65,536 ORDERS_TOTAL 3 54 65,536 CATEGORIES 21 44 65,536 CATEGORIES_DESCRIPTION 21 18 65,536 PRODUCTS 28 86 65,536 PRODUCTS_DESCRIPTION 28 562 65,536 PRODUCTS_TO_CATEGORIES 28 9 65,536 CUSTOMERS_LKUP 36 46 65,536 CUSTOMERS_INFO 1,089 37 131,072 CUSTOMERS 1,089 74 131,072 ORDERS 30,001 311 11,534,336 ORDERS_STATUS_HISTORY 30,002 63 3,145,728 ORDERS_PRODUCTS 389,733 59 28,311,552 NOPKTBLE 500,000 30 18,874,368 MILLITBLE 1,999,833 408 965,738,496 MILLTBLEA 2,000,000 1,188 3,019,898,880 17 rows selected.
TCP/IP Bandwidth Delay Product
The TCP/IP Bandwidth Delay Product (BDP) is used to determine the optimal setting of the OGG TCP Buffer. Use the "ping" utility to send 10 data packets to the target. This will give us the average round trip time (RTT)[lpenton@oelr5u7 ~]$ ping oelmysql -c 10 PING oelmysql (192.168.106.30) 56(84) bytes of data. 64 bytes from oelmysql (192.168.106.30): icmp_seq=1 ttl=64 time=0.667 ms 64 bytes from oelmysql (192.168.106.30): icmp_seq=2 ttl=64 time=0.550 ms 64 bytes from oelmysql (192.168.106.30): icmp_seq=3 ttl=64 time=0.805 ms 64 bytes from oelmysql (192.168.106.30): icmp_seq=4 ttl=64 time=0.811 ms 64 bytes from oelmysql (192.168.106.30): icmp_seq=5 ttl=64 time=0.656 ms 64 bytes from oelmysql (192.168.106.30): icmp_seq=6 ttl=64 time=0.873 ms 64 bytes from oelmysql (192.168.106.30): icmp_seq=7 ttl=64 time=0.892 ms 64 bytes from oelmysql (192.168.106.30): icmp_seq=8 ttl=64 time=0.684 ms 64 bytes from oelmysql (192.168.106.30): icmp_seq=9 ttl=64 time=0.868 ms 64 bytes from oelmysql (192.168.106.30): icmp_seq=10 ttl=64 time=0.894 ms --- oelmysql ping statistics --- 10 packets transmitted, 10 received, 0% packet loss, time 9006ms rtt min/avg/max/mdev = 0.550/0.770/0.894/0.115 msCompute the network bandwidth between the two servers, in bytes and multiply the quotient by the Average RTT to get the BDP value. The link between my servers is 1,000 megabits, so my BDP value is computed as follows.
BDP = (1,000,000,000 / 8) * .00077
BDP = 96,250 bytes
The optimal OGG TCP Buffer setting is three times the BPD:
96,250 * 3 = 288,750 bytes.
It should be noted that for non-Windows systems, the source and target server Network Stack default settings may not be sufficient to support this buffer size. Additional tuning at the server OS level may be required to improve TCP/IP performance, which is beyond the scope of this article.
Initial Load Using OGG Trails
Since the majority of initial load problems are target side related, for recoverability, the best practice is to use OGG Trail files to hold the initial load data. In the the following sections we shall discuss configuration options for this load method.Source Side OGG Extract Configuration
Since we have all of the basic information we need about the source side and communications bandwidth to the target, we can now do an initial OGG configuration. Based upon the number of rows and row length data of each table, we are going to create four Extracts that will read directly from the tables. The Extracts will be set to read from the tables as follows:Extract | Table Read |
---|---|
TMYSQL | CATEGORIES, CATEGORIES_DESCRIPTION, CUSTOMERS, CUSTOMERS_INFO, CUSTOMERS_LKUP, NEXT_CUST, NEXT_ORDER, NOPKTBLE, ORDERS, ORDERS_PRODUCTS, ORDERS_STATUS_HISTORY, ORDERS_TOTAL, PRODUCTS, PRODUCTS_DESCRIPTION, PRODUCTS_TO_CATEGORIES |
TMYSQLA | MILLTBLE |
TMYSQLC | MILLTBLEA |
TMYSQLD | MILLTBLEA |
userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750 reportcount every 60 seconds, rate
In the above settings, the clear text password is displayed for demonstration purposes only. Best practice dictates that database access passwords be encrypted. The RMTHOST designation also contains settings for the Extract's TCP buffer size. REPORTCOUNT is set for demonstration purposes only. If set too low this parameter will have a negative performance impact because it induces a slight delay while each Extract process writes statistics to its report file.
Initial load is the only time OGG Extracts should be configured to immediately transmit data to the target via TCP/IP. Primary Extract processes that read database transaction logs must write to a local OGG Extract Trail, which is then read by an Extract Data Pump for transmission to the target. Configuring the Primary Extract to transmit captured data over TCP/IP will result in sever performance issues and other failures if your network is not up 100% of the time.
We will use an OGG RMTFILE to store records to be applied to the target table by Replicat. To determine the size requirements of each RMTFILE, compute the size requirements for the OGG record header by multiplying the number of rows by 50. Add the product to the total bytes value gathered for each table previously.
The RMTFILE parameter setting for each Extract will be:
Extract | Parameter Setting |
---|---|
TMYSQL | rmtfile ./dirdat/tm, maxfiles 999999, megabytes 190, append |
TMYSQLA | rmtfile ./dirdat/tn, maxfiles 999999, megabytes 1100, append |
TMYSQLC | rmtfile ./dirdat/tq, maxfiles 999999, megabytes 1500, append |
TMYSQLD | rmtfile ./dirdat/tr, maxfiles 999999, megabytes 1500, append |
TMYSQL | TMYSQLA |
---|---|
extract tmysql userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750 rmtfile ./dirdat/tm, maxfiles 999999, megabytes 190, append reportcount every 60 seconds, rate tableexclude east.millitble tableexclude east.milltblea table east.*; | extract tmysqla userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750 rmtfile ./dirdat/tn, maxfiles 999999, megabytes 1100, append reportcount every 60 seconds, rate table east.millitble; |
SQL> select median (cola) as COLA from milltblea;
COLA ------------------------ 499,289,425,723,612,740
TMYSQLC | TMYSQLD |
---|---|
extract tmysqlc userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750 rmtfile ./dirdat/tq, maxfiles 999999, megabytes 1500, append reportcount every 60 seconds, rate table east.milltblea, filter (cola < 499289425723612740); | extract tmysqld userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750 rmtfile ./dirdat/tr, maxfiles 999999, megabytes 1500, append reportcount every 60 seconds, rate table east.milltblea, filter (cola >= 499289425723612740); |
GGSCI (oelr5u7) 37> add extract tmysql, sourceistable EXTRACT added. GGSCI (oelr5u7) 38> add extract tmysqla, sourceistable EXTRACT added. GGSCI (oelr5u7) 39> add extract tmysqlc, sourceistable EXTRACT added. GGSCI (oelr5u7) 40> add extract tmysqld, sourceistable EXTRACT added.
Test The Extract Configuration
Use the GGSCI command "START EXTRACT" to start each group, test the configuration, and get the data read rates. The GGSCI commands "INFO * TASK" and "STATUS * TASK" maybe used to get process status information. Since we specified the REPORTCOUNT parameter for each Extract, the command "VIEW REPORT [extract name]" maybe used to see real-time processing rates. This initial configuration results in the following runtime results:Extract | Total Runtime (secs) | Total Bytes | Bytes Per Second |
---|---|---|---|
TMYSQL | 29 | 182,975,913 | 6,309,514.24 |
TMYSQLA | 70 | 1,078,996,362 | 15,414,233.74 |
TMYSQLC | 531 | 1,319,636,259 | 2,485,190.69 |
TMYSQLD | 529 | 1,318,984,455 | 2,493,354.36 |
Complete The Source Side Configuration
To complete the source side set-up: (1) use the OGG DEFGEN utility to create source database defines, (2) copy the DEFGEN output to the target server, (3) configure the Primary Extract, and (4) configure the Extract Data Pump(s). Prior to starting the initial load be sure to terminate any long running transactions and start the Primary Extract and Extract Data Pumps.Configure The Target Side OGG Replicats
Because Replicat must build and execute DML statements from the data read by Extract, it is much slower. Therefore, we will need multiple Replicat processes to load the target database quickly. To illustrate this lets look at one load stream. The source TMYSQL Extract reads all of the smaller tables:TABLE_NAME NUM_ROWS AVG_ROW_LEN TOTAL_BYTES ------------------------------ ------------ ------------ -------------------- NEXT_ORDER 1 8 65,536 NEXT_CUST 1 7 65,536 ORDERS_TOTAL 3 54 65,536 CATEGORIES 21 44 65,536 CATEGORIES_DESCRIPTION 21 18 65,536 PRODUCTS 28 86 65,536 PRODUCTS_DESCRIPTION 28 562 65,536 PRODUCTS_TO_CATEGORIES 28 9 65,536 CUSTOMERS_LKUP 36 46 65,536 CUSTOMERS_INFO 1,089 37 131,072 CUSTOMERS 1,089 74 131,072 ORDERS 30,001 311 11,534,336 ORDERS_STATUS_HISTORY 30,002 63 3,145,728 ORDERS_PRODUCTS 389,733 59 28,311,552 NOPKTBLE 500,000 30 18,874,368These sixteen tables contain a total of 952,109 rows of data; which the Extract was able to read and send to the target in 29 seconds. A single Replicat took 4 minutes and 13 seconds to apply the data, or 8.7 times the total read time. Using this figure as a baseline, we can extrapolate that our larger tables will require multiple Replicats to apply the data in a timely fashion. As we saw with the Extracts, this is where the MAP parameter option FILTER and column conversion function @RANGE comes to our aid. Using this information, we can set a test target side configuration. The Replicat configuration used to apply all of the smaller table data is:
LMYSQL |
---|
replicat lmysql sourcedefs ./dirdef/ora11_east.def targetdb east@oelmysql, userid gguser, password Oracle1 discardfile ./dirrpt/lmysql.dsc, purge reportcount every 60 seconds, rate overridedups end runtime map east.*, target east.*; |
LMYSQLA | LMYSQLB |
---|---|
replicat lmysqla sourcedefs ./dirdef/ora11_east.def targetdb east@oelmysql, userid gguser, password Oracle1 discardfile ./dirrpt/lmysqla.dsc, purge reportcount every 60 seconds, rate overridedups end runtime MAP EAST.MILLITBLE, TARGET east.MILLITBLE, filter (@RANGE (1,2)); | replicat lmysqlb sourcedefs ./dirdef/ora11_east.def targetdb east@oelmysql, userid gguser, password Oracle1 discardfile ./dirrpt/lmysqlb.dsc, purge reportcount every 60 seconds, rate overridedups end runtime MAP EAST.MILLITBLE, TARGET east.MILLITBLE, filter (@RANGE (2,2)); |
LMYSQLC | LMYSQLD |
---|---|
replicat lmysqlc sourcedefs ./dirdef/ora11_east.def targetdb east@oelmysql, userid gguser, password Oracle1 discardfile ./dirrpt/lmysqlc.dsc, purge reportcount every 60 seconds, rate overridedups end runtime MAP EAST.MILLTBLEA, TARGET east.MILLTBLEA, FILTER (COLA < 249644712861806000) ; | replicat lmysqld sourcedefs ./dirdef/ora11_east.def targetdb east@oelmysql, userid gguser, password Oracle1 discardfile ./dirrpt/lmysqld.dsc, purge reportcount every 60 seconds, rate overridedups end runtime MAP EAST.MILLTBLEA, TARGET east.MILLTBLEA, FILTER (COLA >= 249644712861806000) ; |
LMYSQLE | LMYSQLF |
---|---|
replicat lmysqle sourcedefs ./dirdef/ora11_east.def targetdb east@oelmysql, userid gguser, password Oracle1 discardfile ./dirrpt/lmysqle.dsc, purge reportcount every 60 seconds, rate overridedups end runtime MAP EAST.MILLTBLEA, TARGET east.MILLTBLEA, FILTER (COLA < 748934138585419000) ; | replicat lmysqlf sourcedefs ./dirdef/ora11_east.def targetdb east@oelmysql, userid gguser, password Oracle1 discardfile ./dirrpt/lmysqlf.dsc, purge reportcount every 60 seconds, rate overridedups end runtime MAP EAST.MILLTBLEA, TARGET east.MILLTBLEA, filter (COLA >= 748934138585419000) ; |
GGSCI (oelmysql) 1> dblogin sourcedb east@oelmysql, userid gguser, password Oracle1 Successfully logged into database. GGSCI (oelmysql) 2> info checkpointtable No checkpoint table specified, using GLOBALS specification (east.checkpoint)... Checkpoint table east.checkpoint created 2013-11-06 04:44:30. GGSCI (oelmysql) 3> add replicat lmysql, exttrail ./dirdat/tm REPLICAT added. GGSCI (oelmysql) 4> add replicat lmysqla, exttrail ./dirdat/tn REPLICAT added. GGSCI (oelmysql) 5> add replicat lmysqlb, exttrail ./dirdat/tn REPLICAT added. GGSCI (oelmysql) 6> add replicat lmysqlc, exttrail ./dirdat/tq REPLICAT added. GGSCI (oelmysql) 7> add replicat lmysqld, exttrail ./dirdat/tq REPLICAT added. GGSCI (oelmysql) 8> add replicat lmysqle, exttrail ./dirdat/tr REPLICAT added. GGSCI (oelmysql) 9> add replicat lmysqlf, exttrail ./dirdat/tr REPLICAT added.
Test The Replicat Configuration
Use the GGSCI command "START EXTRACT" to start each source side group, then on the target side use the command "START REPLICAT" to start each of the Replicat load processes to test the configuration, and get the data read rates. Remember not to start any Replicats where @RANGE or FILTER have been specified until after the source side Extracts complete their processing. The GGSCI commands "INFO L*" and "STATUS L*" maybe used to get process status information. Since we specified the REPORTCOUNT parameter for each Replicat, the command "VIEW REPORT [replicat name]" maybe used to see real-time processing rates. This initial configuration results in the following runtime results:Replicat | Total Runtime (secs) | Total Records | Records Per Second |
---|---|---|---|
LMYSQL | 124 | 502,258 | 4,050.47 |
LMYSQLA | 2,177 | 1,002,367 | 460.44 |
LMYSQLB | 2,158 | 997,466 | 462.22 |
LMYSQLC | 1,360 | 500,331 | 367.89 |
LMYSQLD | 1,354 | 499,669 | 369.03 |
LMYSQLE | 1,389 | 497,912 | 358.47 |
LMYSQLF | 1,390 | 502,088 | 361.21 |
OGG Direct Load
Using OGG Direct Load, where Extract reads from the source tables and sends this data via TCP/IP directly to the target Replicat, may be faster for smaller database loads. However, there are limitations with the Direct Load method:- LOB data types are not supported.
- There is no recoverability. Any failure in the load mechanism requires a truncate of the target table and reload from the source.
- TCPBUFSIZE and TCPFLUSHBYTES settings cannot be configured in the Extract. The default setting of 30,000 bytes will be used.
- Because there is a one-to-one relationship between the Extract and Replicat, the FILTER and @RANGE options may only be used in Extract.
Extract | Replicat | Total Bytes | Load Time (secs) | Bytes Per Second |
---|---|---|---|---|
extract tmysql userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600 rmttask replicat, group lmysql tableexclude east.millitble tableexclude east.milltblea tableexclude east.nopktble table east.*; | replicat lmysql sourcedefs ./dirdef/ora11_east.def targetdb east@oelmysql, userid gguser, password Oracle1 discardfile ./dirrpt/lmysql.dsc, purge overridedups map east.*, target east.*; | 112,297,398 | 174 | 408,752.71 |
extract tmysqla userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600 rmttask replicat, group lmysqla table east.nopktble; | replicat lmysqla sourcedefs ./dirdef/ora11_east.def targetdb east@oelmysql, userid gguser, password Oracle1 discardfile ./dirrpt/lmysqla.dsc, purge overridedups map east.*, target east.*; | 70,678,515 | 159 | 444,518.96 |
extract tmysqlb userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600 rmttask replicat, group lmysqlb table east.millitble, filter (@RANGE (1,4)); | replicat lmysqlb sourcedefs ./dirdef/ora11_east.def targetdb east@oelmysql, userid gguser, password Oracle1 discardfile ./dirrpt/lmysqlb.dsc, purge overridedups map east.*, target east.*; | 271,046,102 | 902 | 300,494.57 |
extract tmysqlc userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600 rmttask replicat, group lmysqlc table east.millitble, filter (@RANGE (2,4)); | replicat lmysqlc sourcedefs ./dirdef/ora11_east.def targetdb east@oelmysql, userid gguser, password Oracle1 discardfile ./dirrpt/lmysqlc.dsc, purge overridedups map east.*, target east.*; | 269,056,311 | 898 | 299,617.27 |
extract tmysqld userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600 rmttask replicat, group lmysqld table east.millitble, filter (@RANGE (3,4)); | replicat lmysqld sourcedefs ./dirdef/ora11_east.def targetdb east@oelmysql, userid gguser, password Oracle1 discardfile ./dirrpt/lmysqld.dsc, purge overridedups map east.*, target east.*; | 269,854,784 | 863 | 312,693.84 |
extract tmysqle userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600 rmttask replicat, group lmysqle table east.millitble, filter (@RANGE (4,4)); | replicat lmysqle sourcedefs ./dirdef/ora11_east.def targetdb east@oelmysql, userid gguser, password Oracle1 discardfile ./dirrpt/lmysqle.dsc, purge overridedups map east.*, target east.*; | 269,039,165 | 831 | 323,753.51 |
Change The Extract Output
As demonstrated previously, Extract reads data directly from the source tables quickly, the bottleneck is Replicat. The Extract options FORMATASCII, FORMATSQL, and FORMATXML maybe used to change Extract's output so data load mechanisms other than Replicat may be used. When using the output formatter option, the Extract configuration would be similar to the following:FORMATASCII | FORMATSQL | FORMATXML |
---|---|---|
extract tmysqlb userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750 formatascii rmtfile ./dirdat/tn, maxfiles 999999, megabytes 1100, purge reportcount every 60 seconds, rate table east.millitble; | extract tmysqlb userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750 formatsql rmtfile ./dirdat/tn, maxfiles 999999, megabytes 1100, purge reportcount every 60 seconds, rate table east.millitble; | extract tmysqlb userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750 formatxml rmtfile ./dirdat/tn, maxfiles 999999, megabytes 1100, purge reportcount every 60 seconds, rate table east.millitble; |
Target Database Instantiation
No matter which initial load procedure is chosen, the process for performing a target database instantiation will remain the same:- Make database changes required to support OGG transactional data capture.
- Configure the OGG source and target components.
- Stop all long running transactions.
- Start the Primary Extract and Extract Data Pump.
- If the source database is Oracle get the current SCN. If the source database is DB2, SQL Server, or Sybase get the current LSN.
- Start the Primary Extract and Extract Data Pump.
- Perform the initial load.
- Wait for the initial load to complete.
- If the source database is Oracle, DB2, SQL Server, or Sybase start each Replicat specifying the ATCSN option and the CSN/LSN value obtained previously from the source database.
- Ensure the parameters HANDLECOLLISIONS and END RUNTIME exists in each Replicat parameter file.
- Start each Replicat.
- Monitor the Replicat activity.
- As each Replicat enters the STOPPED state, remove the parameters HANDLECOLLISIONS and END RUNTIME from its parameter file.
- Restart the Replicat.
Summary
Oracle GoldenGate may be configured to perform the initial load of heterogeneous data efficiently via the use of advanced configuration options.All content listed on this page is the property of Oracle Corp. Redistribution not allowed without written permission