Monday, 23 January 2012

Minority Report

Ranking is a very usual requirement of BI reporting. People tends to sort things in order and focus on the top performers. Which sales region is generating the most of revenue? What are my most popular products? Who are my best performing employees? All these type of queries can be easily answered via invoking the Rank() function on relevant fact measure.
What about "the other 80%" (Pareto Principle)? We normally are not interested in the details of "insignificant" individual contributors. But the overall collective contribution of "the others" can be very valuable business information. One example can be risk control. From the management's perspective, the bigger contribution "the others" generate, the lower risks the business will have.

Here I would like to share with you two different apporaches for composing this kind of minority report in OBIEE 11g.

 I will be using the data and subject area "Sample Sales" from OBIEE SampelApp.

The first approach leverage CASE function and attribute column.
1, Create a new analysis with subject area "Sample Sales', drag in the column "P1 Product" and "1 - Revenue" twice.

2, Put the following code in the column formular of the second "P1 Product". Change the column title to "Top Individuals".

case when rank("Base Facts"."1- Revenue")<=5 then "Products"."P1  Product" else 'All other Non Toppers' end

3, Put the following code in the column formular of the second "1 - Revenue". Change the column title to "Rank". Add "sort acending" on this column.

case when rank("Base Facts"."1- Revenue") <= 5 then rank("Base Facts"."1- Revenue") else 6 end

4, Open the column formula for "Rank", tick the box for "treat as attribute column". This setting specify the column to be handled as an attribute column, which prevents the aggregation of the values. For the detail explanation of "attribute column", you can click on here and go one page up.

5, Go to the result tab, and add a new pivot table view. Click on the pencil button to modify the view and make it's defined like below.

6, Then you get your "minority report". You can also create a chart to get some intuitive views.

7, The only flaw of this apporach is Table view wouldn't give us the result we want. Once column "P1 Product" is exclude from the layout, the table aggregate everything into one record. This is because of the limit of "attribute column" function. More explanation can also be find from the document link mentioned in step 4.

The second approach is more straightforward in my point view. Instead of retrieving everything from database and then rely on the CASE function to categorize records on the presentation layer, This approach will retrieve "Top 5" and "Non Top 5" as two separate data set from database directly. Then uses the "combine function" in Answers to combine the data sets together as report result.

1, Create a new analysis with subject area "Sample Sales', drag in the column "P1 Product" and "1 - Revenue" twice.

2, Put the code below as the column formular of the second " 1 - Revenue" column. Change the column title to "Rank".

Rank("Base Facts"."1- Revenue")

3, Add a filter on column "Rank".

4, Below the save button, click on "combine results based on ..." button to add another set of criteria. Chose the same "Sample Sales' subject and add same columns in criteria.

5, Put the code below as the column formular of the second "P1 Product". Change the column tilte to "All other Non Toppers".

'All other Non Toppers'

6, Put the code below as the column formular of the second " 1 - Revenue" column. Change the column title to "Rank".

Rank("Base Facts"."1- Revenue")

7, Add a filter on column "Rank".

8, Click on "Result Column" and add sort ascending on column "Rank".

9, Click on the result tab and click on pencil button to modify the table view. Exclude column "P1 Product" from the table layout.

This approach works well with both table view and pivot table view.

In summary, I think the key learning from this practices is:
 1, Attribute Column is your friend when you want to avoid the aggregation on certain column;
 2, Don't forget about the "combine result" button on top right of your Answers interface. It helps a lot in certain circumstances.

Sunday, 15 January 2012

Miss is as good as a mile

A few days ago, I was reviewing my prior post and something unusual caught my attention.

How come those densely populated suburbs in Sydney are not covered by the thematic view for postal areas??? In order to further confirm the issue, I moved the map around to another state and I found some thematic views are put on Tasman sea (see below)!!! I've got an alignment issue.
This is a critical issue for a BI Mapview as Business Intelligence are not accurately aligned with the real location on map. A few centimeters shifting of map theme (geometries that represents regions) on Google Map will make users look at the business information that is actually relevant with area/spot a few kilometers away. This makes the whole spatial analysis meaningless.

After spending some time searching and studying, I found the root cause of this problem - the inconsistency between SRIDs for Google Map and spatial data in database. SRID represents "Spatial Reference system IDentifier" which is a unique value used to unambiguously identify projected, unprojected, and local spatial coordinate system definitions. These coordinate systems form the heart of all GIS applications. Older map uses old reference system while newer maps use more accurate reference system made using satellites and sophisticated electronic equipment. The coordinates for a point on the earth can vary significantly depending on the SRID used. Consequently, if all spatial data is not in the same spatial reference system, the geographic layers will not overlay and the analysis will become misleading.

In our case, the map service provider Google is using SRID 3785.
 However,when import spatial data from shapefiles into Oracle database, the SRID is set to 8307 by default. It can be retrieved by running the SQL Script below:

select table_name, column_name, srid from user_sdo_geom_metadata
where table_name = 'NSW_ELB_REGION';

Because it's not practical for us to change the SRID of Google Map, we need to do something on our spatial data in Oracle database so that it is compatible with SRID 3785.

In Oracle Spatial Developer's guide, three approaches are suggested for fixing this issue:
1, Use SRID 4055 data;
2,   Declare an EPSG rule between SRID 8307 and 3785;
3,   Call oracle stored procedures to convert spatial data that physically stored in database;

I've tested the first two approaches in my environment. Either of them fixed the alignment issue perfectly for me.

For the first approach, all you have to do is specify the SRID value as 4055 while import the shapefile data with Oracle Mapbuilder (step in my last post). Then the data will be loaded into Oracle database as SRID 4055 which is compatible with Google's SRID 3785.

However, if reloading the data is not a feasible option for you, you can run the following SQL script to declare a transformation rule between SRID 8307 and 3785. As the result, when Oracle retrieve the spatial data that you loaded, it will apply the rule to transfer it to SRID 3785 and send it to Oracle Mapviewer for rendering.
CALL sdo_cs.create_pref_concatenated_op(830723785,
TFM_PLAN(SDO_TFM_CHAIN(8307, 1000000000, 4055, 19847, 3785)),

Now, the SRIDs between the spatial data and Google Map are consistent and I can have my Business Intelligence perfectly overlay with Google Map. No more CBD region displayed as middle of no where or resident postal area sitting on the surface of sea.