Blog do projektu Open Source JavaHotel

poniedziałek, 26 grudnia 2011

XML or not XML

Problem
For some time I have been working on adding invoice making to javahotel application. I had to resolve a problem how to keep invoice data in the database. One invoice contains a lot of data of different data types and it is possible to design a table with a set of columns reflecting invoice data.
But I found the following problems:
  1. There are a lot of different invoice types and every invoice type contains a set of data common to all invoices and some data specific to the invoice type given. So I have to design a different relational table for every invoice type or one common table containing the superset of all possible data. The second approach means that probably a lot of columns would be empty for  a particular invoice.
  2. Invoice contains also a list of invoice details (different  for every invoice). So I have to design the second table with invoices details ("details" could be different for every invoice type) and connect them by one-to-many relation.
  3. I cannot predict at this moment  all possible invoice types. Probably in the future it will be necessary to add new data to the invoice or redesign existing. But modifying relational data in a production environment is a complicated task and should be avoided as much as possible.
Solution idea
The solution is to keep invoice data as XML string in relational table. But XML format is convenient at server-side and less convenient at client-side. Although GWT provides tools for handling XML types (link) it requires some additional effort and not reads well. Designing a POJO class for keeping invoice data and deserialize XML into POJO object is having the same disadvantages as described above.
But I found that very flexible and useful entity for keeping data is simple Map class. Instead of :
 POJO container;
 Object val =  container.getInvoiceDate();
just use:
Map container;
Object val = container("InvoiceData");
Map does not require redesigning if invoice structure changes and could be the same for any invoice type.
So the solution is to keep invoice data as XML string in the database and transform it to the Map class before transporting to client side and opposite.
Solution
So I created a general solution for transforming XML to Map (and opposite) at server side and use Map as a transient object. Client is receiving and using Map and is sending Map back to the server. General sequence of actions is as follows.
  1. Client request
  2. Server retrieves XML string from database and transforms to Map
  3. Map is sent back to client
  4. Client sends Map to server
  5. Server transforms Map to XML and XML string is stored in the database.
Common data structure
This package is common for client and server.

It is an interface (Map) for keeping data. Should be extended for concrete implementation.  It is a subset of Map interface. Keys are XPath selector for the node containing data.

package com.gwtmodel.table.mapxml;

import java.util.Set;

/**
 * @author hotel
 *
 */
public interface IDataContainer {
   
    public Set getKeys();

    public Object get(Object key);

    public Object put(String key, Object val);

}
It is an abstract template class for the whole data (invoice). It contains main body (as IDataContainer) and list of invoice details (as list of IDataContainer). This class should extended for concrete implementation.
package com.gwtmodel.table.mapxml;

import java.io.Serializable;
import java.util.List;

/**
 * @author hotel Template class for keeping IDataContainer data.
 */
@SuppressWarnings("serial")
abstract public class DataMapList implements
        Serializable {

    /** Main body of data. */
    private T dFields;
    /** List of lines in shape of IDataContainer. */
    private List dLines;

    public DataMapList(T dFields, List dLines) {
        this.dFields = dFields;
        this.dLines = dLines;
    }

    /**
     * @return the dFields
     */
    public T getdFields() {
        return dFields;
    }

    /**
     * @return the dLines
     */
    public List getdLines() {
        return dLines;
    }

    /**
     * Adds next line to the dLines and returns IDataContainer just added
     * 
     * @return IDataContainer added
     */
    abstract public T addToLines();
}

It is limited to one list of data but can be easily extended.
Solution for server-side
The package is available here.
The solution requires to define pattern XML. An example is available here. Pattern is an XML file with empty content (only tag structure is defined) and type attribute which describes the content of the tag. Interpreting types and transforming between Map object and XML texts is done by IXMLTypeFactory interface (look below).
So creating XML string from IDataContainer map is done by means of filling the pattern XML with text content. Map keys are XPath pointers for XML document.
And opposite - deserialize XML string into IDataContainer is done by scanning XML file, taking 'type' info from pattern XML and filling the IDataContainer with key values.
For concrete implementation it is necessary to customize IXMLTypeFactory interface.

package com.gwtmodel.mapxml;

/**
 * @author hotel Factory interface providing user specific data
 */
public interface IXMLTypeFactory {

    /** Common types. */
    String DATE = "date";
    String DECIMAL = "decimal";
    String INT = "int";
    String INTEGER = "integer";
    String LONG = "long";

    /** Name of type attribute. */
    String TYPE = "type";

    /** Returns name of 'lines' element. */
    String getLinesTag();

    /** Returns name of single line in 'lines' section. */
    String getLineTag();

    /**
     * Creates object from string (element text). Object will be inserted into
     * IDataContainer map.
     * 
     * @param xType
     *            Type string (or null). Values is taken from 'type' attribute
     * @param s
     *            String value
     * @return Object
     */
    Object contruct(String xType, String s);

    /**
     * Opposite to construct. Transforms object to string
     * 
     * @param xType
     *            Type string (or null).
     * @param o
     *            Object taken from IDataContainer map
     * @return String to be inserted into XML string
     */
    String toS(String xType, Object o);
}

There is also available default implementation of this interface. It can be used "as is" or extended if customization is necessary.
package com.gwtmodel.mapxml;

import java.math.BigDecimal;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

import com.gwtmodel.table.common.CUtil;
import com.javahotel.common.command.PaymentMethod;

/**
 * @author hotel Default implementation of IXMLType Factory. Can be used 'as is'
 *         or extended
 */
public class SimpleXMLTypeFactory implements IXMLTypeFactory {

    protected final SimpleDateFormat fo = new SimpleDateFormat("yyyy-MM-dd");

    public final static String PAYMENT = "pay";

    protected boolean isInt(String xType) {
        return xType.equals(INT) || xType.equals(INTEGER);
    }

    @Override
    public Object contruct(String xType, String s) {
        if (CUtil.EmptyS(xType)) {
            return s;
        }
        if (xType.equals(PAYMENT)) {
            return PaymentMethod.valueOf(s);
        }
        if (xType.equals(DATE)) {
            try {
                return fo.parse(s);
            } catch (ParseException e) {
                e.printStackTrace();
                return null;
            }
        }
        if (xType.equals(DECIMAL)) {
            return new BigDecimal(s);
        }
        if (isInt(xType)) {
            return new Integer(s);
        }
        if (xType.equals(LONG)) {
            return new Long(s);
        }
        return s;
    }

    @Override
    public String toS(String xType, Object o) {
        if (CUtil.EmptyS(xType)) {
            return (String) o;
        }
        if (xType.equals(DATE)) {
            Date d = (Date) o;
            return fo.format(d);
        }
        if (xType.equals(PAYMENT)) {
            PaymentMethod pa = (PaymentMethod) o;
            return pa.toString();
        }
        if (xType.equals(DECIMAL)) {
            BigDecimal b = (BigDecimal) o;
            return b.toString();
        }
        if (xType.equals(LONG)) {
            Long l = (Long) o;
            return l.toString();
        }
        if (isInt(xType)) {
            Integer i = (Integer) o;
            return i.toString();
        }
        return (String) o;
    }

    @Override
    public String getLinesTag() {
        return "//Lines";
    }

    @Override
    public String getLineTag() {
        return "Line";
    }

}
Additional classes description

Example implementation

Implementation of IDataContainer.

Implementation of DataMapList.

Entity bean for keeping invoice data: link. Invoice data is stored as:
@Basic(optional = false)
private Text invoiceXML;
Transient object used at client side: link. All keys (XPath selector) are defined as constants

Source code for Map->XML transformation.


      sou.getInvoiceD().getdFields()
                .put(InvoiceP.INVOICENUMBER, sou.getName());
        String xml = HotelCreateXML.constructXMLFile(iC, IMess.INVOICEPATTERN,
                sou.getInvoiceD());
        boolean ok = false;
        if (xml != null && HotelVerifyXML.verify(iC, xml, IMess.INVOICEXSD)) {
            ok = true;
        }
        if (!ok) {
            iC.getLog().getL().info(xml);
            String mess = iC.logEvent(IMessId.INPROPERINVOICEXML,
                    dest.getName(), IMess.INVOICEXSD);
            throw new HotelException(mess);
        }
        dest.setInvoiceXML(xml);

Code for opposite XML->Map transformation.
        HotelChangeXMLToMap.constructMapFromXML(iC, dest.getInvoiceD(),
                sou.getInvoiceXML());
        String i = (String) dest.getInvoiceD().getdFields()
                .get(InvoiceP.INVOICENUMBER);

Additional information
There is also additional disadvantage for XML - searching and query running. Because it is not stored as a regular relational columns standard SQL statement cannot be used. So searching in XML requires additional effort - for instance in Google App Engine there is no standard method for scanning through XML. In order to find something one has to scan through table, retrieve XML string and by means of Java programming search through XML.
But there is good news for DB2 users. DB2  (available also for DB2-Express free edition) provides built-in engine (pureXML) for querying XML data without extracting them to the application.
For instance:
Finding all invoices greater then 10000$ can be as easy as running a query:

select * from invoice

where xmlexists('$c/Invoice/Total/Total > "10000"'

passing invoice.invoiceXML as "c")

wtorek, 6 grudnia 2011

Oracle, awk, tablespace

Problem
Assume we have the Oracle database schema export and want to deploy this schema to another database.
CREATE TABLE "SCOTT"."BONUS" 
( "ENAME" VARCHAR2(10), 
"JOB" VARCHAR2(9), 
"SAL" NUMBER, 
"COMM" NUMBER 
) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING 
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT) 
TABLESPACE "T_SPACE1" 
Probably after deploying this table definition we receive the error message ORA-00959 (Table space T_SPACE1 does not exist).  Of course - no problem to create T_SPACE1 table space manually but what to do if we have thousands of tables with tens or even hundreds different table spaces ?
Solution
I found the following awk script useful. It simply extracts all table spaces names from table schema export and creates a script for creating all table spaces used. Datafile name is the same as table space name.
BEGIN {  }

/TABLESPACE/ {
   for (i=1; i < NF; i++) {
     if ($i == "TABLESPACE") {
       TSPACENAME=toupper($(i+1))
       gsub(",.*","",TSPACENAME);
       namespaces[TSPACENAME] = 1;
     }   
   }
}
END { 
  for (i in namespaces) {
    TSPACENAME=i;
    TFILENAME=tolower(TSPACENAME); gsub("\"","",TFILENAME); 
    TFILENAME = TFILENAME ".dbf";

    print "CREATE TABLESPACE " TSPACENAME
    print "DATAFILE '/home/oracle/app/oracle/oradata/test/" TFILENAME "'"
    print "size 1m autoextend on next 1m maxsize 2048m EXTENT MANAGEMENT LOCAL;"
    print
  }
}
You can catch the standard output and modify it before deployment - for instance remove creating TEMP and USERS table space. An example of usage:
awk -f namespace.awk < MY_USERS_TABLES.SQL >create_tablespace.sql
Additional remarks

  1. This script assumes that TABLESPACE clause and the following table space name is in the same line. It does not hold true every time. It is possible to extend this script but it requires more coding.
  2. The predicat i<NF (not i<=NF) is on purpose. Just skips if TABLESPACE clause is the last in the line. Probably it makes sense to add a warning message on that.
  3. The gsub(",.*","",TSPACENAME); regards the case when table space name is followed by , (coma). For instance  TABLESPACE "T_SPACE", PARTITION .. Unfortunately awk does not allow to specify more than one field deliminator, so coma is included as a part of table space name and should be removed after that.