Brian Koehmstedt
2014-09-17 03:23:00 UTC
After some extended heap analysis sessions, I think I've narrowed down
my Shib IdP 2.4.0 memory leak to the default Rhino engine. The original
observation of a leak and corresponding Tomcat crashes was recently
discussed in:
http://marc.info/?t=140926984900011&r=1&w=2
"Problem with tomcat hanging on Shib 2.4 "
I'll describe my analysis in a moment, but besides sharing what I've
learned, I'm still scratching my head over why this problem isn't more
widely felt, since on the surface it looks like a generalized problem
with Rhino (as described below). This started happening when we
migrated from Solaris to RedHat, as discussed in the previously
mentioned thread, so maybe it's environmental, but I'm not ready to
blame RedHat or OpenJDK just yet, although that possibility certainly
remains. (The next step is to try to reproduce this consistently in a
test environment and see whether the problem goes away with the Oracle
JDK.) The other possibility is that we have some odd quirk in our
configuration that is triggering this, possibly introduced when we did
the migration.
At any rate -- on to describing what I found.
First, the obvious stuff: I turned on GC stat logging and took heap
dumps at intervals, and the evidence clearly showed heap usage steadily
growing. At the very top of the overall size chart in the jhat
histogram were character arrays (189MB at the last heap dump, and
growing). Further investigation led to the conclusion that these were
mostly character arrays containing strings from the InCommon metadata
file, leading me to the early hypothesis that old InCommon data wasn't
being entirely garbage collected.
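(For reference, the standard way to gather that sort of evidence looks
something like the lines below; the log/dump filenames and the Tomcat
PID are placeholders, and exact flag spellings vary a bit by JDK:)

    JVM flags: -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log
    Heap dump: jmap -dump:live,format=b,file=idp-heap.hprof <tomcat-pid>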
Further digging led me to the following. In java-shib-common,
ScriptMatchFunctor.java (which contains the script execution code)
instantiates a SimpleScriptContext in getScriptContext() and sets the
filterContext attribute to an instance of ShibbolethFilteringContext.
Normally, it seems, this context should come and go with each script
execution and thus be garbage collected.
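For reference, that method is shaped roughly like this; this is a
paraphrase from memory, so treat the exact signature as approximate
rather than the actual source:

    protected ScriptContext getScriptContext(ShibbolethFilteringContext filterContext) {
        // A fresh context per script execution; in principle nothing
        // here should outlive the script run.
        SimpleScriptContext scriptContext = new SimpleScriptContext();
        scriptContext.setAttribute("filterContext", filterContext,
                ScriptContext.ENGINE_SCOPE);
        return scriptContext;
    }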
But when the filterContext is added to the ScriptContext, internally in
the JDK this seems to trigger the Rhino engine to analyze the
filterContext class (ShibbolethFilteringContext) for the methods
belonging to that object and cache what it finds. It seems to cache the
field members and methods in an instance of
sun.org.mozilla.javascript.ClassCache (via
sun.org.mozilla.javascript.JavaMembers and
sun.org.mozilla.javascript.NativeJavaMethod).
When it does this, the cache seems to result in an indirect reference
to the original ShibbolethFilteringContext object being kept. (See
below for the nitty-gritty reference chain from jhat.)
And that filteringContext holds a ref to an SSORequestContext instance,
which in turn holds a ref to an EntityDescriptorImpl instance, which in
turn holds a ref to an EntitiesDescriptorImpl instance, which holds
refs to all the other entities in the InCommon metadata file.
This seems to be causing memory leaks for my deployment. It isn't just
one stale filterContext that escapes garbage collection; over time this
builds up to several stale filterContext objects, all holding stale
refs to old metadata loads (in EntitiesDescriptorImpls).
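To make the suspected pattern concrete outside the IdP, here is a
minimal standalone sketch of the kind of usage that appears to trigger
the caching. This is my own construction for illustration, not IdP
code, and whether it actually leaks will depend on the JDK's bundled
Rhino version:

    import javax.script.ScriptContext;
    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;
    import javax.script.SimpleScriptContext;

    public class RhinoCacheRepro {
        public static void main(String[] args) throws Exception {
            ScriptEngine engine =
                new ScriptEngineManager().getEngineByName("JavaScript");
            for (int i = 0; i < 10000; i++) {
                // A fresh "context" object per evaluation, analogous to
                // ShibbolethFilteringContext; each one should become
                // garbage once eval() returns.
                Object filterContext = new java.util.ArrayList<String>();
                SimpleScriptContext ctx = new SimpleScriptContext();
                ctx.setAttribute("filterContext", filterContext,
                        ScriptContext.ENGINE_SCOPE);
                // Calling a Java method from script forces Rhino to
                // reflect on the object's class and populate its
                // ClassCache.
                engine.eval("filterContext.size();", ctx);
            }
            // Take a heap dump here: if the ClassCache pins a wrapper
            // whose parent scope is the last ScriptContext, a stale
            // filterContext will still be reachable.
        }
    }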
The jhat histogram shows the following instance counts:

edu.internet2.middleware.shibboleth.common.attribute.filtering.provider.ShibbolethFilteringContext: 11
org.opensaml.saml2.metadata.impl.EntitiesDescriptorImpl: 9
The ref chain info was found by using jhat to find the paths from GC
roots to a particular object that seemed to be stale:
select heap.livepaths(heap.findObject(0xca99c500));
(This took a long time to run on my 1GB heap dump. I had to let it run
overnight, and jhat needed a 5GB heap to run this query: -J-Xmx5000M.)
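(In case anyone wants to repeat this: the invocation was along the
lines below, with the dump filename as a placeholder, and the OQL query
above is run from jhat's http://localhost:7000/oql/ page.)

    jhat -J-Xmx5000M idp-heap.hprof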
Here is the relevant part of the chain, showing the ClassCache holding
a reference to a ShibbolethFilteringContext:
...
->sun.org.mozilla.javascript.ClassCache@0xc4fa9560 (field classTable)
->java.util.HashMap@0xc8d2c558 (field table)
->[Ljava.util.HashMap$Entry;@0xc8d2c588 (Element 12 of
[Ljava.util.HashMap$Entry;@0xc8d2c588)
->java.util.HashMap$Entry@0xcedcdea8 (field value)
->sun.org.mozilla.javascript.JavaMembers@0xcedcdec8 (field members)
->java.util.HashMap@0xcedcdef0 (field table)
->[Ljava.util.HashMap$Entry;@0xcedcdf30 (Element 198 of
[Ljava.util.HashMap$Entry;@0xcedcdf30)
->java.util.HashMap$Entry@0xcededef0 (field value)
->sun.org.mozilla.javascript.NativeJavaMethod@0xcededf10 (field
parentScopeObject)
->com.sun.script.javascript.ExternalScriptable@0xcedce6d8 (field context)
->javax.script.SimpleScriptContext@0xcedce6f8 (field engineScope)
->javax.script.SimpleBindings@0xcedcea18 (field map)
->java.util.HashMap@0xcedcea28 (field table)
->[Ljava.util.HashMap$Entry;@0xcedcea58 (Element 12 of
[Ljava.util.HashMap$Entry;@0xcedcea58)
->java.util.HashMap$Entry@0xcedceb58 (field value)
->edu.internet2.middleware.shibboleth.common.attribute.filtering.provider.ShibbolethFilteringContext@0xcedb3af0
(field attributeRequestContext)
->edu.internet2.middleware.shibboleth.idp.profile.saml2.SSOProfileHandler$SSORequestContext@0xcedb3b10
(field peerEntityMetadata)
->org.opensaml.saml2.metadata.impl.EntityDescriptorImpl@0xce946bb8
(field parent)
->org.opensaml.saml2.metadata.impl.EntitiesDescriptorImpl@0xcbaeed78
(field signature)
...