This article will discuss some best practices around troubleshooting and debugging your Oracle EPM environment.
Oracle EPM (Hyperion) is a complex system that is composed many modules that work together to perform different functions. Typically, in design, creating a modular approach is a best practice. However, the Oracle EPM modules were created with disparate architectures because they acquired over time through purchasing technology from many smaller companies. As a result, the modules interact only at a high level, through network APIs. The result is a complex system with many breaking points, and error messages that can be less than informative. Consequently, troubleshooting and debugging Hyperion is an art form.
The fastest way to debug a critical problem is to have a good understanding of what components represent a known, “good” state. Only by identifying the abnormal element in the system can one start to resolve a critical issue.
Below Are Best Practices to Follow to Document a Known, Good State
What Should Be Running?
The first question you should ask is, “is everything up?” Typically you would start by checking the URLs you use for each product – ensuring you can login. These URLs would typically include the load balancing name.
Example of End User/Administrative URLs
|Essbase Admin Services||http://hyperion.svr.com/easconsole/console.html|
Set Up Port Monitoring Software
You can also get a ton of info in a quick glance by setting up port monitoring software. Remember to include the relational database in your monitoring. Sometimes the DBA will take the database down unexpectedly, or the database user IDs will expire. Additionally, it is easy to have 20-30 Hyperion related services or processes running per environment distributed among multiple hosts. It’s too time consuming and error prone to check manually.
An Example of a Custom Java Based Port Monitor
To gather a list of ports within the environment there are a few aids to use for reference. Oracle EPM Version 11 includes a diagnostics web form that will show the status of some of the products. This can be found under the Windows Start Menu, under Programs -> Foundation. This might be a good place to get started.
Also, refer to the Oracle EPM Documentation, Install Start Here Document for typical port reference.
An Example of Some Common JVM Ports
|Analytic Provider Services||svr04||13080|
|Essbase Admin Services||svr04||10080|
Stopping and Starting the Environment
Operational procedures are important too. This means having a reliable start and stop procedure for the environment. Given the complexity of the Hyperion environment, there is no excuse for problems occurring while bringing an environment back up after routine maintenance. One common issue that comes up is a service did not start completely, which can be found by using the port monitoring method above. Also, sometimes the services do not fully stop – causing residual processes to mess up the restart procedure. This might include extra ESSVR processes indicating the Essbase application did not stop properly, or a JVM process which is hung. As a precaution, check the port monitoring software after stopping the environment to ensure all the components are indeed down. It is time consuming when this type of issue is encountered. In a Windows environment you might take the environment down through your normal process, then kill any remaining “Java” and “Hyperion” related processes using task manager. In a Unix environment, after stopping the environment, “kill -9 -1” can be used as a last resort. A reboot will always solve this type of issue, though not often actually needed; it might be faster as a last resort for the unskilled admin needing to fix unexplainable issues.
Functional Validation Script
Functional validation is a set of very simple actions that can be performed within a few minutes to validate the functionality of the environment. This is absolutely necessary to perform before handing the system back over to end users after a routine maintenance. This task can be either performed by the Hyperion Admin or handed over to the Help Desk. It should, at a minimum, include running a few types reports (HFM, FR, WA), log into Essbase, Planning, HFM, etc… As you discover other reoccurring issues you may want to include more specific checks to ensure common “gotchas” are ironed out before handing over to the end users.
Knowledge of the Architecture
The Hyperion administrator has to be technologically savvy. However, many times Hyperion will tell you exactly what the problem is. The first place to look when encountering an issue is the logs! Take time to find and document the log files for every product in your environment. Most administrators are a bit intimidated by the number of logs generated by the system, but this is the first place to look. Familiarize yourself with the various logs in the environment. This includes WebLogic logs, JVM logs, Hyperion logs, and operating system logs. If you are lost, a good place to start is search for *.log, sort by date, and look at the directory for the product name. You might even keep a record of each log after starting the service successfully so that you can compare to a good state if there is a problem.
Additionally, it is necessary to have context around how the environment works. This can be achieved by reading the product manuals, looking at architecture diagrams, and attending training sessions. Get involved in the environment. Try to understand both the technical details (ports, logs, different components…etc) and the functional basics (create a planning form, edit an outline, load data into Essbase, run a consolidation) of each Oracle Module.
A Good State: Create a Knowledge Base
Hyperion issues commonly come up more than once. Create a shared, searchable, knowledgebase to track issues and their resolutions. This will help you train your team as it expands and changes over time.
The focus of this article was to provide the Oracle EPM Administrator with best practices on how to document the current state of the Oracle EPM System. This is the most important part of being able to understand complex issues as they come up. More technical detail will be provided in the Oracle EPM Troubleshooting and Debugging Guide (Part 2 of 2).