Users of Essbase have some control over the performance of a database and how responsive it is when retrieving data.  With a basic understanding of how Essbase stores data, users can optimize performance by changing the order of the dimensions and members in a report.

It might be helpful to read our article on sparse and dense dimensions.  Here is a brief overview:

An Essbase database is comprised of thousands, if not millions or billions, of data blocks.  Each block of data, and its size, is defined by the dense dimensions in the Essbase outline.  The volume of blocks is dictated by the unique combinations of sparse dimension members.  If Time and Accounts are dense, each block created would hold all the months for every account.  If Organization and Product are sparse dimensions, there would be a block for each unique combination of Organization and Product.  A block would exist for Center 10 / Product A, as well as Total Organization / Total Product.  If the outline has 20 members in Organization and 15 members in Products, the database could have up to 300 independent blocks.

If a report is written to show an entire income statement for all 12 months for Total Product and Total Organization, how many blocks would have to be queried?  Remember, there is a block for each unique member combination of Organization and Product.  The answer is one, because there is a block for Total Organization/Total Product that includes every account and every member in the time dimension.

How many blocks would be accessed if a report pulled Total Sales (a member in the Accounts dimension) in January for every product?  Since the Product dimension is sparse and there are 15 products, 15 blocks would have to be opened to return the results.

Here is where your understanding of what sparse and dense represents will help you improve your reports.  Opening a data block, reading the contents, and closing it, is similar to opening, reading, and closing a spreadsheet.  It is much faster to open one spreadsheet, or block, than 15 spreadsheets.  So, if the database retrieves are written in such a way to minimize the number of blocks that need to be accessed, or the order in which they are accessed, performance can improve.

I will agree that if data for all 15 products is needed for the report, all 15 blocks have to be opened.  There is no way around that.  That said, often times users will build one worksheet for income statement and one worksheet for balance sheet.  This means that the report is making two passes on the same blocks.  In theory, it takes twice as long to open/read/close a data block 2 times than it does once.  It is faster to have the income statement and the balance sheet accounts in one worksheet, which only makes one pass on the required blocks.  One worksheet for Income Statement and one for Balance Sheet can be created with cell references to the worksheet that has the retrieved data, if 2 separate reports are required.

I frequently see another example of a report requiring multiple passes to the same data block.  Using our example dimensions above, assume product information is required in a report for multiple accounts.

 

 

Jan

Feb

Mar

Income

Product A

 

 

 

Income

Product B

 

 

 

Income

Product C

 

 

 

Income

Product D

 

 

 

Expense

Product A

 

 

 

Expense

Product B

 

 

 

Expense

Product C

 

 

 

Expense

Product D

 

 

 

The Essbase retrieve above would start from the top of the spreadsheet and move down the rows to retrieve the data from Essbase.  This cycle would open the Product A block, then B, C, and D, and retrieve the associated income for each.  It would then have to reopen the same 4 blocks to access expenses. 

The following example, again going from top to bottom, would access both income and expense while the block is open.  The way this retrieve is setup, it eliminates the need to access the same block multiple times, yet still pulls the required information.

 

 

Jan

Feb

Mar

Income

Product A

 

 

 

Expense

Product A

 

 

 

Income

Product B

 

 

 

Expense

Product B

 

 

 

Income

Product C

 

 

 

Expense

Product C

 

 

 

Income

Product D

 

 

 

Expense

Product D

 

 

 

These examples are very small.  In a real world example, a report of this size would not produce significant variances in the time it takes to retrieve them.  Users often have spreadsheets that are hundreds of rows long and take minutes to retrieve.  In these situations, eliminating the need to access the same block multiple times can produce notable improvements in the time it takes to retrieve data from Essbase.

With a basic understanding of how your database is setup, users of Essbase can help themselves with some simple changes to the format of the retrieve worksheet.  If access to the dimension properties in your database is unavailable, ask your system administrator to supply them for you.

 

Executing calculations that only run on blocks that have changed is a great feature in Essbase.  It enables administrators to calculate the database in a fraction of the time and is referred to as calculating “dirty” blocks, or an update calc.  This is awesome.  “Why shouldn’t I use it all the time?” you might ask.  Understanding how the Essbase calc engine works is critical to answering this question.

The Essbase calc engine calculates each block in a specific order (see figure 1).  The first block it calculates is the first level 0 block of the first sparse dimension.  It then traverses to higher levels and moves through the dimension from top to bottom until the entire dimension is consolidated.

When a level 0 block is changed, it and all of its parents, are tagged as dirty (it needs to be calculated again). When a calculation is executed on just dirty blocks, the process is the same except that it skips all the “clean” blocks.  Once the block is calculated the dirty tag is changed to clean.  So far, so good!

Revisit figure 1, which is a very simple example.  It shows a very simple hierarchy with the order in which the blocks are calculated, 1 through 10.

Figure 2 shows what happens if New York is updated.  Blocks 5, 6, and 10 are tagged as dirty.  The next calculation, if set to calculate only the dirty blocks, would only calculate blocks 5, 6, and 10, in that order.

Here is where things get a little ugly.  When an application has write access, as a planning or forecasting application would, it is very possible that users are updating data DURING the calculation process.  The timing of these events is critical to understand why calculating only dirty blocks can cause inconsistencies.

When a calculation has started, it identifies which blocks need calculated (5, 6, and 10 in this example).  Immediately after that, it starts calculating block 5.  If Texas is updated while block 5 is being calculated, what happens?

Figure 3 shows the state of the clean/dirty blocks when the calculation is finished with block 5.  It is exactly what you might expect at this point.  Blocks 6 and 10 are still dirty.  The update of Texas caused Blocks 1, 3, and 10 to be tagged dirty.

This is the critical piece.  Keep in mind how the calculation engine works.  It will continue to calculate blocks 6 and 10.  Also note that the calculation running does NOT reevaluate what needs calculated.  It will not calculate blocks 1 and 3. 

Figure 4 shows the state of the blocks after the calculation finishes.  Only blocks 1 and 3 are dirty at this point because 10 was included in the calculation.

When the next calculation is executed, the only blocks that are dirty are 1 and 3.  Can you see the problem now?  After blocks 1 and 3 are calculated, is block 10 accurate?  Does U.S. equal the total of South, East, and West?  Unfortunately, it does not. 

One could argue that it will get updated the next time data is changed.  In a very simple example with 3 levels, this would probably correct itself rather quickly, if the problem happened at all.  In a more realistic example where a company has 10 or 20 levels in their organization dimension, the problem is likely to be a reoccurring problem and may not be corrected until a full calculation is executed.  In most situations, it is not acceptable to have a database where it consolidates correctly only some of the time without any warning that it is not accurate.  Reporting can be incorrect, and bad management decisions can result.

Using the dirty calc feature is a great tool to have in your arsenal.  It can save hours of processing time.  It can make you look like a genius.  Without understanding its pitfalls, it can be the source of countless wasted hours trying to figure out why a cube isn’t consolidating correctly.  A worst case scenario is when a cost center manager updated their budget, it never gets consolidated correctly, and the problem isn't identified until it is too late.

 

 

Get notified when a new post is published.