Saturday, November 10, 2007

SSO for Oracle Grid Control

Although there is a lot happening, it seems I can almost never pull myself together to write a post here.

This time I'll share a solution I came up with for building a Single Sign-On module for Oracle Grid Control. I assume you already know what Grid Control is all about, but if you don't, you can read about it on Oracle's web site.

About a year ago I first heard about this product, and some of the people I work with and I immediately agreed that we should do a POC of it in our network... A few weeks later we had it installed, with agents deployed on over 50 servers. Apart from a few minor problems, as with any other product we have integrated into our IT environment, it worked and pretty much kept its promises.

It took a while until we actually put it into our production environment, and we now use the latest version (Grid Control R3).

All this time there was one very annoying thing about this product: as you guessed, the logon screen that shows up each time you want to use the application.
I picked up the phone and asked the Oracle support team whether they had any SSO solution for Grid Control, because there is no logic in asking a user for his credentials if he has already authenticated against a domain. Oracle's local support team said that there is no SSO solution, and when I asked them why, they said that this product is supposed to serve a small group of admin users, so there is no justification for creating an SSO module for it. This is where I come into the picture, and in the next paragraphs I will explain how I built an SSO module for Oracle Grid Control.

The Management Server is a normal J2EE application running on a standard Oracle Application Server, and as with any J2EE application we can change it relatively easily. The idea was to add MOD_NTLM to the HTTP Server (OAS is based on Apache) and change the logon module to take the remote user id and log that user in with the right permissions.

So I'll start from the bottom: the DB.
SSO at its most basic level is all about logging someone into our application using his domain authentication and giving him the right permissions. This means that we need two tables:
The first is a table with our local application users (id, username and password).
The second is a table which maps a domain user to a local user/role (domain username, local user id from the first table).
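
To make the two tables concrete, here is a minimal sketch in Python with an in-memory SQLite database. It is not the schema I used in the Grid Control repository; the table names, column names and sample users (local_users, domain_user_map, gc_admin, CORP\alice) are all made up for illustration.

```python
import sqlite3


def create_schema(conn):
    """Create the two hypothetical SSO tables described above."""
    conn.executescript("""
        CREATE TABLE local_users (
            id       INTEGER PRIMARY KEY,
            username TEXT NOT NULL UNIQUE,
            password TEXT NOT NULL  -- should be stored encrypted, not in clear text
        );
        CREATE TABLE domain_user_map (
            domain_username TEXT PRIMARY KEY,
            local_user_id   INTEGER NOT NULL REFERENCES local_users(id)
        );
    """)


def lookup_local_user(conn, domain_username):
    """Return (username, password) of the mapped local user, or None if unmapped."""
    return conn.execute(
        "SELECT u.username, u.password "
        "FROM domain_user_map m JOIN local_users u ON u.id = m.local_user_id "
        "WHERE m.domain_username = ?",
        (domain_username,),
    ).fetchone()


# Illustrative data only: one local user and one domain user mapped to it.
conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute(
    "INSERT INTO local_users (id, username, password) VALUES (?, ?, ?)",
    (1, "gc_admin", "secret"),
)
conn.execute(
    "INSERT INTO domain_user_map (domain_username, local_user_id) VALUES (?, ?)",
    ("CORP\\alice", 1),
)
```

Several domain users can point to the same local user id, which is how a whole group of admins ends up sharing one local role.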

On top of this we should create a simple page which takes the remote username, asks the DB which local user it should be logged on as, puts these values into the form and adds some JavaScript that submits the logon form for the user.

This turned out to be a problem, because I was counting on the logon page being built with some kind of standard technology (JSP or servlet) but…
The logon screen is a .uix page (Oracle's tag-based server-side pages) called logon.uix, located in the OC4J_EM directory with all the other application files… There is no way I am learning another language for this, I told myself, and so I came up with the next idea:
I'll put some JavaScript code in the uix logon screen (there is a place to put raw HTML in those pages) that opens an HTTP request to another page, one that is filtered by MOD_NTLM and does the SSO logic (selecting the right user from our tables). This second page will be a JSP that writes back an XML response with this information. The JavaScript in the uix page then parses the response, puts the values into the username and password fields and submits the form (yes, you are right, this is known as AJAX).
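
The exchange between the two pages can be sketched as follows. In my setup the server side was a JSP and the client side was JavaScript inside logon.uix; the Python below only illustrates the shape of the XML round trip, and the element names (sso, username, password) are assumptions, not the ones I actually used.

```python
import xml.etree.ElementTree as ET


def build_sso_response(local_username, local_password):
    """Server side (the JSP's job): wrap the looked-up credentials in XML."""
    root = ET.Element("sso")
    ET.SubElement(root, "username").text = local_username
    ET.SubElement(root, "password").text = local_password
    # ElementTree escapes characters such as < and & in the text for us.
    return ET.tostring(root, encoding="unicode")


def parse_sso_response(xml_text):
    """Client side (the uix JavaScript's job): pull out the form field values."""
    root = ET.fromstring(xml_text)
    return root.findtext("username"), root.findtext("password")
```

The values returned by parse_sso_response are what would be written into the logon form fields before the form is submitted.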

This is it. SSO is actually very easy to create.
'Till next time,
Good luck!

*You should put some encryption on the first table because saving DB passwords in clear text is not recommended.

*I never tried it, but this solution should also work as an SSO for the Oracle Enterprise Manager (OMS) of any Oracle Forms Server, Oracle Discoverer or any other Oracle Application Server (R3 and above).

Tuesday, May 15, 2007

IIS based Web Service - Delayed Response

After a long break from updating my blog, I'm back.
I have had some interesting issues in the last few months and I hope I'll have the time to post them here....

I'll begin with the oldest one.
A few months back one of the developers in the development team was complaining about very slow response times from a web service he had developed and was running in our Test environment.
He told me that the problem only occurs on the first request, or on the first request after a long time (30 mins or more), but he could not be exact and could not reproduce it.
The normal response time is about 1-2 seconds, except for the first request, which takes almost 30!
He also told me that there is nothing 'heavy' or complex in the initialization process and that he suspected the problem was in IIS or the CLR...
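
A symptom like this is easier to discuss with numbers in hand. The sketch below times a series of calls and reports how much slower the first one is than the average of the rest. It is only an illustration, not a tool we used at the time; call_service simply issues an HTTP GET, and any URL passed to it is up to the reader.

```python
import time
import urllib.request


def call_service(url):
    """Issue one GET request and read the whole response."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        resp.read()


def measure_latencies(call, count=5):
    """Run `call` `count` times in a row and return each duration in seconds."""
    latencies = []
    for _ in range(count):
        start = time.perf_counter()
        call()
        latencies.append(time.perf_counter() - start)
    return latencies


def first_call_penalty(latencies):
    """Seconds by which the first call exceeds the average of the later calls."""
    first, rest = latencies[0], latencies[1:]
    return first - sum(rest) / len(rest)
```

For a real service one would run something like measure_latencies(lambda: call_service(url)) right after the suspected trigger and pass the result to first_call_penalty.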

I followed his lead, and the first thing that came to my mind was the "Idle timeout" that can be configured for IIS Application Pools. Sure enough, it was configured to 20 mins! I turned it off (so that an idle application pool is never shut down).

After a couple of hours he came back complaining that the problem still occurred.
My next step was to look for relevant ASP.NET configuration options in web.config and machine.config, but I found nothing really related...

I had no direction, and the only thing on my mind was that maybe something was wrong with the server (IIS or the .NET Framework), so I set up a virtual server with a different OS version (Windows 2003 Web Edition) and tried to make the problem appear again.

I noticed that the only way to reproduce the behavior was to reboot the server. Killing the worker process or restarting IIS didn't trigger it.

Now, with a way to reproduce the problem in my hands, I could really start digging. I started (Microsoft's) Sysinternals Process Explorer and watched the w3wp.exe of the web service (w3wp.exe is the running instance of the "Application Pool" you see in IIS). I checked whether it was actually working and using CPU, rather than just waiting or hanging on something. I drilled down into its threads, trying to detect where the problem was. If the problem had been in his code I would have seen it, but it was happening long before his code was invoked. I saw some threads waiting on a method named CompareAssemblyIdentity (a method that "compares two assembly identities to determine whether they are equivalent"). A few moments later I noticed some new child processes running under the worker process, named csc.exe (the C# compiler), and they in turn created another process as well!! Only then did I realize what the problem was: ASP.NET compiles the classes the first time the web service is used! It is that simple.

A reboot is not the only trigger for ASP.NET recompilation, and it was not what was hitting the poor developer either. The other trigger is much more relevant to his case: changes in the code. He just forgot to tell me that he was still working on it and publishing his code to the test server relatively frequently...

Tuesday, February 20, 2007

Kernel Memory leaks - Part 2

After a while of not being able to reproduce the case, I've managed to get it again and copy all the data from Poolmon. The Paged-pool total shown in the upper-right corner says 207MB, but when I summed the allocations of every tag (only Paged, of course) I got 270MB!!
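
The comparison I did by hand can be sketched in a few lines of Python. This assumes the copied Poolmon data has first been reduced to three whitespace-separated columns (tag, pool type, bytes); that layout and the sample numbers below are made up for illustration and are not my actual data. With the real figures above, the gap is 270 - 207 = 63MB.

```python
MB = 1024 * 1024

# Made-up sample in the assumed "tag type bytes" layout.
SAMPLE = """
MmSt Paged 104857600
Ntff Paged 52428800
File Nonp 20971520
"""


def parse_rows(text):
    """Turn 'tag type bytes' lines into (tag, pool_type, size) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        tag, pool_type, size = parts
        rows.append((tag, pool_type, int(size)))
    return rows


def paged_total(rows):
    """Sum the bytes of all Paged allocations, ignoring Nonp rows."""
    return sum(size for _, pool_type, size in rows if pool_type == "Paged")


def discrepancy_mb(rows, reported_total_mb):
    """Per-tag Paged sum minus the total Poolmon reports, in MB."""
    return paged_total(rows) / MB - reported_total_mb
```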

Still have no answer.

Thursday, January 11, 2007

Kernel Memory Leaks

For the last couple of years we have been getting a lot of server hangs due to both Paged and Non-Paged Pool depletion. I've managed to trace the leaking applications, and whether it is an application we wrote or a third-party one, closing it (killing its process) frees only part of the leaked handles.

When monitoring with poolmon I saw the handles getting freed, but not as they should have been.
Let's say that the total memory those handles were taking is 100MB of paged pool and the total paged pool in use was 150MB. Killing the process that created them should free them all, right? Wrong. Killing the process freed only 90MB of the handles, and even then the total should have dropped to 60MB, but it did not! There seem to be unlisted handles, either on purpose or because of a mistake/bug in the OS.
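
The arithmetic of that example, spelled out as a tiny Python sketch (the function names are made up, and the figures are the illustrative ones from the paragraph above, not measurements):

```python
def expected_total_after_kill(total_mb, freed_mb):
    """Paged pool total we would expect once the freed memory is subtracted."""
    return total_mb - freed_mb


def still_held_after_kill(handles_mb, freed_mb):
    """Memory of the process's handles that was not released by killing it."""
    return handles_mb - freed_mb


# 150MB in use, 100MB of it in the process's handles, 90MB freed on kill.
expected_total = expected_total_after_kill(150, 90)
leftover = still_held_after_kill(100, 90)
```

So even in the best case 10MB of the process's handles survive it, and on top of that the reported total does not come down to the expected 60MB.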

I intend to get to the bottom of this.