This section will talk about how to dive into debugging critical issues with Oracle EPM.
Start a Problem Log
The most useful habit to develop during issue resolution is to start a detailed log about the issue. Some problems can take days or weeks to resolve and require trying hundreds of different prospective resolution attempts. It is easy for a “small” problem to become a long winded issue. Consequently, it is hard to foresee when the issue will resemble the analogous onion: keep peeling off layers and finding more and more to fix. If the problem log is created initially, all the important details can be captured. Additionally, it is much easier to bring others up to speed (management) and create support tickets when all of the information is documented. This log should include the error as the end user sees it, the error from any logs you are able to capture, screenshots, timestamps, and things that you have tried along with the results.
Reproduce the Issue
The first thing to find out is whether the issue is reproducible. It is very difficult to solve an issue that is not reproducible. Many errors are simply ‘glitches’ and may have been caused by a very improbable event, such as a database hiccup. For instance, a database problem propagates into the Oracle EPM system, forcing it into a bad state. Such a problem may never produce itself again. Consequently, an initial step toward resolution is to restart the Oracle EPM services to bring them back into a ‘known state’. If the problem is not immediately reproducible after the restart, go back to the problem log and record everything you can. This type of issue will need to be profiled over a period of time to try and discover patterns if it occurs again.
The Numerous Logs
Once it is discovered that the issue is not a simple glitch, it is time to start digging. As mentioned previously, the first place to track down the cause of an issue is in the logs. The logs come in various forms. Here is a general breakdown of the log types:
|Windows Event Viewer||This is helpful for general system related messages. Also some modules built on Windows Technology (DCOM) will log messages here. For example, Financial Management (HFM) and Financial Data Quality Management (FDQM).|
|Application Logs||The application logs are actually generated by the Hyperion code itself. These often contain the most useful information.|
|Application Server Logs||This type of log pertains to a Java based Web Application. Most of the Hyperion modules with a web based front end have Application Server Logs. The Application Server Logs run within the WebLogic, Tomcat, or WebSphere container.|
|Web Server logs||The web server controls the handoff of web requests between the Hyperion Modules. The best way to use this log is to look for error codes (404, 401… etc) in the web log and review the corresponding URL that was used to ensure it is correct. Sometimes it might be obvious that the URL in the web log has the wrong domain, points to the wrong server, or cannot resolve the context.|
Start by reviewing the log for the product where the error is occurring. The Application Logs and Application Server Logs will be most useful at first. The goal is to find a useful error message that can be used in the next process to find a resolution to the problem.
Common Log Locations:
Unfortunately, the actual log locations change drastically between recent versions of Oracle/Hyperion products. As stated before, searching for *.log might be useful.
Example Application Server Logs:
|Essbase Admin Services||Svr2||/Oracle/Middleware/user_projects/domains/EPMSystem/servers/EssbaseAdminServices0/logs|
|Analytic Provider Services||Svr2||/Oracle/Middleware/user_projects/domains/EPMSystem/servers/AnalyticProviderServices0/logs|
Example Application Logs
|Reporting and Analysis Core||Svr3||/Oracle/Middleware/user_projects/epmsystem1/diagnostics/logs/ReportingAnalysis/
Sifting Through the Logs:
It helps to know which modules depend on each other in order quickly pick out the respective log files to analyze. The basic idea is to determine which products are interacting and to review each log in detail for messages. It is important to review the logs of the product not only during runtime (as it is happening), but also during startup. Sometimes the fastest way to cut out the fluff is to stop the services, move or delete all the existing logs and start the environment back up. This ensures any log messages are relevant to the issue. Alternatively, one has to sift through potentially large logs looking for timestamps to ensure relevance, which can be daunting.
|Shared Services||Relational Database, MSAD/LDAP|
|Lifecycle Management (LCM)||Shared Services, LCM Source/Target applications|
|Hyperion Planning||Shared Services, Essbase, Business Rules, Relational Database per App|
|Business Rules||Shared Services, Hyperion Planning, Essbase, Relational Database (single database)|
|Hyperion Financial Management||Shared Services, Relational Database (single database), DCOM (Event Viewer)|
|Financial Data Quality Management (FDM)||Shared Services, Relational Database per App, Adapters for Essbase, Planning, HFM…etc, DCOM (Event Viewer)|
|Strategic Finance||Shared Services, Relational Database (optional)|
|Data Relationship Management||Shared Services, Database Client, Adapters, DCOM (Event Viewer)|
Found an Error Message!
After discovering the error message, the first thing to ask is does this message make any sense? Try to use it within the context of your problem to solve the issue. Often, it is necessary to use external resources to resolve the issue. Use resources like Google, the Oracle Support Knowledgebase, and the Oracle Forums to further research the issue. Most often there will be information regarding your issue.
Note: If possible do not searching using end user messages, i.e. what the user sees when encountering the error. Rather, find a detailed message in the logs. The end user messages are usually very generalized and can provide misleading information because of the vast number of issues which might match the general error message.
If there is still a struggle to discover a useful error message, most of the Hyperion modules use a logging mechanism that can be changed into debug mode. The actual method will differ based on product, for instance, most modules use log4j and there is often a .properties file you can change the logging level from “ERROR” or “WARN” to “DEBUG”. For instance, to enable debugging in Hyperion Planning: log into a Planning application, go to Administration -> Manage Properties, Select the System tab, Add the property DEBUG_ENABLED with a value of True. After changing the logging level, the service will need to be restarted to reflect the changes. Turning on application debugging should provide more context clues around what the product is doing at the time and help pinpoint the error.
If these resources do not help, an Oracle Support Ticket may be required. Additionally, the Oracle Forums can be a good place to post a question. When creating a support ticket and posting to a forum, please include as much information as possible. This is where the Problem Log will come in handy.
This is a good time to look for updates and patches to the product. Check for patches and updates on http://support.oracle.com. Read the release notes for anything matching your problem. Even if there is nothing coming up, some obscure errors can be solved by simply applying a patch. Not all bugs will be in the release notes for the patch. Oracle’s hpatch process is pretty straight forward, but older environments might take some time to apply the patch. Always read the entire release notes and installation instructions before applying a patch. Also, sometimes patches are not as proven as the initial installers. This is because some patches may have just been release and only tested with a handful of clients. So ensure there is a good backup process in case the patch causes unintended problems. The Oracle hpatch process has a back out feature, but it is not always useful if the patch is half way installed and failed.
Finally, the last part of troubleshooting is intuition. As more problems are encountered and resolved, one can become more confident in resolving upcoming issues. There is no way to have encountered every issue and know the resolution, so the best that you can do is arm yourself with a good knowledge of the architecture, have a set of best practices, and lots of patience for problem solving.