https://www.ibm.com/support/knowledgecenter/en/SSCRJT_5.0.3/com.ibm.swg.im.bigsql.doc/doc/admin_monitor-bigsql-query.html
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.admin.wlm.doc/doc/c0055265.html
GitHub project
Inspiration
BigSQL is nothing more than DB2 running on top of Hadoop. It inherits a lot of goodies from DB2, including a sophisticated set of monitoring metrics. But the metrics are only numbers, and by themselves they do not tell you anything meaningful unless you are deeply versed in DB2 internals.
Metrics are cumulative and grow constantly. Instead of looking at the raw values, it is much more interesting to observe how the values change over time and to look for patterns or trends. For instance, if a metric suddenly starts to surge, it can indicate that BigSQL is under a heavy workload.
The metrics could also be used for prediction: for instance, a recognizable pattern might warn of a risk of delays or prolonged response times.
Solution
I developed a simple solution; it can be downloaded here.
The purpose of the tool is to retain historical data and to get easy access to the difference between consecutive metric values for analysis.
The solution contains the following elements:
- A database schema to keep historical data. The metrics are pivoted: instead of storing a single row of metrics, the row is broken down into a series of metric id/value records.
- A supporting view that provides the difference between consecutive metric values instead of the raw value.
- Several stored procedures to harvest and extract collected metrics.
- Two methods of collecting metrics: as a Linux crontab job or as a DB2 scheduled task.
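To make the pivoting and the difference view more concrete, here is a minimal Python sketch of the same idea. It is not the code from the project, which does this with DB2 tables, a view and stored procedures. The function names `pivot` and `deltas`, the sample metric names and the snapshot values are all assumptions made up for illustration.

```python
def pivot(snapshot_time, row):
    """Break one wide row of metrics into (time, metric id, value) records."""
    return [(snapshot_time, metric_id, value) for metric_id, value in row.items()]


def deltas(records):
    """Return (time, metric id, increase since the previous snapshot) records.

    The first snapshot of each metric has no predecessor, so it yields no delta.
    """
    last_seen = {}
    result = []
    for snapshot_time, metric_id, value in sorted(records):
        if metric_id in last_seen:
            result.append((snapshot_time, metric_id, value - last_seen[metric_id]))
        last_seen[metric_id] = value
    return result


# Hypothetical cumulative snapshots taken at times 1, 2 and 3 (illustrative values).
snapshots = [
    (1, {"ROWS_READ": 100, "TOTAL_CPU_TIME": 40}),
    (2, {"ROWS_READ": 150, "TOTAL_CPU_TIME": 45}),
    (3, {"ROWS_READ": 400, "TOTAL_CPU_TIME": 90}),
]

# Pivoted history: one record per metric per snapshot.
history = [record for t, row in snapshots for record in pivot(t, row)]

# Differences between consecutive snapshots of the same metric.
metric_deltas = deltas(history)
for record in metric_deltas:
    print(record)
```

In this made-up data the jump between snapshots 2 and 3 stands out in the deltas, while the cumulative values alone only show ever-growing numbers.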
Simple analysis
I also developed a simple tool for heavy workload prediction. Although DB2 exposes several hundred different metrics, only a subset of them seems to be related to the workload specific to BigSQL. The idea is to pin down a period of normal workload and to compare the current period against it. If the averages of several observed metrics significantly exceed their averages from the normal period, an alarm is raised that a heavy workload is underway.
But in my testing, the solution did not prove usable. More tuning is necessary, or a different, more effective approach should be developed.
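The comparison described above could be sketched in Python roughly as follows. This is only an illustration of the idea, not the tool itself: the function name `heavy_workload`, the `threshold` factor of 2.0, the `min_metrics` count and the sample numbers are assumptions, since the post does not say how "significantly" or "several" were defined.

```python
from statistics import mean


def heavy_workload(baseline, current, threshold=2.0, min_metrics=2):
    """Compare per-interval metric deltas of the current period with a normal period.

    baseline, current: dict mapping metric id -> list of per-interval deltas.
    threshold and min_metrics are assumed tuning knobs, not values from the tool.
    Returns (alarm, list of metric ids whose average exceeded the threshold).
    """
    exceeded = []
    for metric_id, normal_values in baseline.items():
        current_values = current.get(metric_id)
        if not normal_values or not current_values:
            continue  # nothing to compare for this metric
        normal_avg = mean(normal_values)
        current_avg = mean(current_values)
        if normal_avg > 0 and current_avg > threshold * normal_avg:
            exceeded.append(metric_id)
    return len(exceeded) >= min_metrics, exceeded


# Hypothetical deltas collected during a normal period (illustrative values).
baseline = {"ROWS_READ": [50, 60, 40], "TOTAL_CPU_TIME": [5, 5, 5]}

# Two hypothetical current periods: one busy, one quiet.
busy = {"ROWS_READ": [250, 300], "TOTAL_CPU_TIME": [45, 50]}
quiet = {"ROWS_READ": [55, 45], "TOTAL_CPU_TIME": [6, 4]}

busy_alarm, busy_metrics = heavy_workload(baseline, busy)
quiet_alarm, quiet_metrics = heavy_workload(baseline, quiet)
print(busy_alarm, busy_metrics)
print(quiet_alarm, quiet_metrics)
```

Requiring more than one metric to exceed its baseline is meant to keep a single noisy counter from raising the alarm. Both knobs would need tuning against real data.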