Blog do projektu Open Source JavaHotel

sobota, 26 września 2015

MSSQL -> DB2 migration, useful awk scripts

Database migration is a painstaking task, mostly manual, but with a little help of simple awk scripts it can be alleviated. As an example, I'm using Adventure Works Databases 2012 sample data and by means of "Generate scripts" tool I'm creating text scripts containing all SQL objects.
First task
Output scripts are encoded as UTF-16, therefore one can spend a lot of time running grep command without any result. Thus, the first thing to do is to transform scripts to UTF-8 encoding and change CRLF line break to LF.
iconv -f UTF-16 $1 | dos2unix >`basename $1`
List of objects
Before starting any activity, it is a good idea listing all objects to migrate and prepare a working spreadsheet to track migration progress (listobjects.awk)
awk -v objtype=ALL -f listobjects.awk iscript.sql  >list.csv

For tables migration very useful is free tool DCW (Data Conversion Workbench). This tool extracts table schema directly from MSSQL and transforms them to DB2 format. Important: the tool does not handle foreigns key correctly, but it can be improved by a Python tool.
Split objects into files
Although MSSQL "Generate scripts" tool allows exporting objects into separate files, very often we have the objects in one big text file.  Nevertheless, during migration it is very convenient to have a single object in a single file and migrate them one after one. obj.awk script executes this task.
awk -v extracttype=ALLOTHERS -f obj.awk iscript.sql 
As a result, all objects except tables, are put into directories: views, triggers, types, procedures and function.
Example: view directory
sbartkowski@linuxmachine:~/work/testb$ ls views
humanresources_vemployeedepartmenthistory.sql  production_vproductanddescription.sql           sales_vindividualcustomer.sql
humanresources_vemployeedepartment.sql         production_vproductcatalogimages.sql            sales_vpersondemographics.sql
humanresources_vemployee.sql                   production_vproductcatalog.sql                  sales_vsalespersonsalesbyfiscalyears.sql
humanresources_vjobcandidateeducation.sql      production_vproductmodelcatalogdescription.sql  sales_vsalesperson.sql
humanresources_vjobcandidateemployment.sql     production_vproductmodelinstructions.sql        sales_vstorewithaddresses.sql
humanresources_vjobcandidate.sql               production_vworkorderrouting.sql                sales_vstorewithcontacts.sql
person_vadditionalcontactinfo.sql              purchasing_vvendorwithaddresses.sql             sales_vstorewithdemographics.sql
person_vstateprovincecountryregion.sql         purchasing_vvendorwithcontacts.sql              sales_vterritorysalesdrilldown.sql
production_vmanufacturinginstructions.sql      sales_vcompanysales.sql

Tables are split using another parameter. As an input file, the result of DCW tool is used.
awk -v extracttype=TABLES -f obj.awk tables.sql
Objects are put into tables and constraint directories.
sbartkowski@linuxmachine:~/work/testb$ ls tables
humanresources_department.sql                 production_location.sql                               purchasing_purchaseorderheader.sql
humanresources_employeedepartmenthistory.sql  production_productcategory.sql                        purchasing_shipmethod.sql
humanresources_employeepayhistory.sql         production_productcosthistory.sql                     purchasing_vendor.sql
humanresources_employee.sql                   production_productdescription.sql                     sales_countryregioncurrency.sql
humanresources_jobcandidate.sql               production_productdocument.sql                        sales_creditcard.sql
humanresources_shift.sql                      production_productinventory.sql                       sales_currencyrate.sql
person_address.sql                            production_productlistpricehistory.sql                sales_currency.sql
person_addresstype.sql                        production_productmodelillustration.sql               sales_customer.sql
person_businessentityaddress.sql              production_productmodelproductdescriptionculture.sql  sales_personcreditcard.sql
person_businessentitycontact.sql              production_productmodel.sql                           sales_salesorderdetail.sql
person_businessentity.sql                     production_productphoto.sql                           sales_salesorderheadersalesreason.sql
person_contacttype.sql                        production_productproductphoto.sql                    sales_salesorderheader.sql
person_countryregion.sql                      production_productreview.sql                          sales_salespersonquotahistory.sql
person_emailaddress.sql                       production_product.sql                                sales_salesperson.sql
person_password.sql                           production_productsubcategory.sql                     sales_salesreason.sql
person_personphone.sql                        production_scrapreason.sql                            sales_salestaxrate.sql
person_person.sql                             production_transactionhistoryarchive.sql              sales_salesterritoryhistory.sql
person_phonenumbertype.sql                    production_transactionhistory.sql                     sales_salesterritory.sql
person_stateprovince.sql                      production_unitmeasure.sql                            sales_shoppingcartitem.sql
production_billofmaterials.sql                production_workorderrouting.sql                       sales_specialofferproduct.sql
production_culture.sql                        production_workorder.sql                              sales_specialoffer.sql
production_document.sql                       purchasing_productvendor.sql                          sales_store.sql
The obj.awk script also performs simple MSSQL->DB2 transformation (modifmsobj.awk)


function outputline(line) {
  if (PRINT == 1) print line

/^@/ { 
     PRINT = 0
     print $0

  line = gensub(/\[|\]/,"","g",$0)   
  line = gensub(/@/,"v","g",line)   
  line = gensub(/+/,"||","g",line)   
  line = gensub(/ISNULL/,"COALESCE","g",line)   

  • Removes []
  • Replaces @ variable marker with v character
  • MSSQL uses + for string concatenation. The script replaces + with ||. Important: all + are replaced, also + used for artihtmetic computation.
  • Replaces ISNULL with COALESCE
Also for types a simple tranformation is conducted (modiftypes.awk).
Although MSSQL->DB2 migration is still manual and mundane process by applying very simple automation we can speed it up and make it more organized. Very interesting object splitting combine with transformation. This transformation (take a look into the body of obj.awk script) can be enhanced regarding the needs.