Memory leak hunting - Rhino script woes leading to stale metadata not being garbage collected

Discussion:

Brian Koehmstedt

2014-09-17 03:23:00 UTC

After some extended heap analysis sessions, I think I've narrowed down
my Shib IdP 2.4.0 memory leak to the default Rhino engine. The original
observation of a leak and corresponding Tomcat crashes was recently
discussed in:
http://marc.info/?t=140926984900011&r=1&w=2
"Problem with tomcat hanging on Shib 2.4 "

I'll describe my analysis in a moment, but besides sharing what I've
learned, I'm still scratching my head over why this problem isn't being
more widely felt as it seems to be, on the surface, a generalized
problem with Rhino (as described below). This started happening when we
migrated from Solaris to RedHat, as discussed in previously mentioned
thread, so maybe it's environmental, but I'm not quite yet ready to
blame RedHat or OpenJDK just yet, although that possibility certainly
remains. (And the next step is to try and reproduce this consistently
in a test environment and see if the problem goes away with Oracle
JDK.) The other possibility is we have some odd quirk in our
configuration that is triggering this, possibly introduced when we did
the migration.

At any rate -- on to describing what I found.

First, the obvious stuff: I turned on GC stat logging and took heap
dumps at intervals, and the evidence was clear of heap usage steadily
growing. At the very top of the overall size usage chart in the jhat
histogram were character arrays (189MB at last heap dump and growing).
Further investigations led to the conclusion this was mostly character
arrays containing strings from the Incommon metadata file...leading me
to the early hypothesis that the old Incommon data wasn't entirely being
garbage collected.

Further digging led me to:
In java-shib-common, ScriptMatchFunctor.java (contains script exec
code), in getScriptContext(), it instantiates a SimpleScriptContext and
sets the filterContext attribute to an instance of
ShibbolethFilteringContext. Normally, it seems this context should come
and go with each script execution and thus be garbage collected.

But, when adding the filterContext to the ScriptContext, internally in
the JDK this seems to trigger the Rhino engine to analyze the
filterContext class (ShibbolethFilteringContext) for methods belonging
to that object and cache what it finds.

It seems to cache the field members and methods in an instance of
sun.org.mozilla.javascript.ClassCache (via
sun.org.mozilla.javascript.JavaMembers and
sun.org.mozilla.javascript.NativeJavaMethod).

When it does this, this cache seems to result an in indirect reference
to the original ShibbolethFilteringContext object being kept. (See below
for the nitty gritty reference chain from jhat.)

And that filteringContext holds a ref to a SSORequestContext instance,
which in turn holds a ref to an EntityDescriptorImpl instance, which in
turn holds a ref to an EntitiesDescriptorImpl instance, which holds a
ref to all the other entities in the Incommon metadata file.

This seems to be causing memory leaks for my deployment. It seems that
not just one stale filterContext is not-garbage-collected, but over
time, this builds up to be several stale filterContext objects, all
holding stale refs to old metadata loads (in EntitiesDescriptorImpls).

The jhat Histogram shows the following instance counts:
edu.internet2.middleware.shibboleth.common.attribute.filtering.provider.ShibbolethFilteringContext
11
org.opensaml.saml2.metadata.impl.EntitiesDescriptorImpl 9

The ref chain info was found by using jhat to find the paths, from root,
to a particular object that seemed to be stale:
select heap.livepaths(heap.findObject(0xca99c500));
(This took a long time to run on my 1gb heap dump. I had to let it run
overnight and jhat needed a 5gb heap to run this query, -J-Xmx5000M.)

Relevant part of the chain showing ClassCache holding reference to a
ShibbolethFilterContext.

...
->***@0xc4fa9560 (field classTable)
->***@0xc8d2c558 (field table)
->[Ljava.util.HashMap$Entry;@0xc8d2c588 (Element 12 of
[Ljava.util.HashMap$Entry;@0xc8d2c588)
->java.util.HashMap$***@0xcedcdea8 (field value)
->***@0xcedcdec8 (field members)
->***@0xcedcdef0 (field table)
->[Ljava.util.HashMap$Entry;@0xcedcdf30 (Element 198 of
[Ljava.util.HashMap$Entry;@0xcedcdf30)
->java.util.HashMap$***@0xcededef0 (field value)
->***@0xcededf10 (field
parentScopeObject)
->***@0xcedce6d8 (field context)
->***@0xcedce6f8 (field engineScope)
->***@0xcedcea18 (field map)
->***@0xcedcea28 (field table)
->[Ljava.util.HashMap$Entry;@0xcedcea58 (Element 12 of
[Ljava.util.HashMap$Entry;@0xcedcea58)
->java.util.HashMap$***@0xcedceb58 (field value)
->***@0xcedb3af0
(field attributeRequestContext)
->edu.internet2.middleware.shibboleth.idp.profile.saml2.SSOProfileHandler$***@0xcedb3b10
(field peerEntityMetadata)
->***@0xce946bb8
(field parent)
->***@0xcbaeed78
(field signature)
...

--
To unsubscribe from this list send an email to users-unsubscribe-***@public.gmane.org

Cantor, Scott

2014-09-17 03:29:38 UTC

Permalink

Post by Brian Koehmstedt
I'll describe my analysis in a moment, but besides sharing what I've
learned, I'm still scratching my head over why this problem isn't being
more widely felt as it seems to be, on the surface, a generalized
problem with Rhino (as described below).

Just a quick point, I think very few (possibly almost no-one) uses the
script match functor in the filtering engine. Scripts are used in the
resolver much more often. So could be some bug specific to the use of the
script engine in the filter.

But I, again, wouldn't use OpenJDK if you paid me, so I would definitely
reproduce it on Oracle's first to rule it out.

-- Scott

--
To unsubscribe from this list send an email to users-unsubscribe-***@public.gmane.org

Cantor, Scott

2014-09-17 15:23:21 UTC

Permalink

Not a surprising outcome, but I wanted to note that the more common case
of attribute resolution scripts also creates a SimpleScriptContext that
contains a direct reference to a SAMLProfileRequestContext, which is the
same object being referenced in the filter case, so the attribute resolver
should also be creating this problem if it exists.

That seems to point to an environmental issue since it's not being
observed more widely.

-- Scott

--
To unsubscribe from this list send an email to users-unsubscribe-***@public.gmane.org

Brian Koehmstedt

2014-09-17 15:49:19 UTC

Permalink

Thanks Scott. I'll work towards trying to duplicate in test with
OpenJDK and then switch out with Oracle JDK to compare.

I think your point though about nobody else using filter scripts is
still interesting. I haven't done an exhaustive analysis on all the
stale contexts in the heap, but whenever I was perusing the heap, it did
seem like the stale stuff was associated with Rhino engines related to
the filter scripts. We also have a couple scripts in
attribute-resolver.xml, and I didn't see any refs to those, so there may
be still some filter quirk in play. (But just because I didn't see stale
resolver engine stuff doesn't mean there aren't some there..may have
been luck of the draw.)

Post by Cantor, Scott
Not a surprising outcome, but I wanted to note that the more common case
of attribute resolution scripts also creates a SimpleScriptContext that
contains a direct reference to a SAMLProfileRequestContext, which is the
same object being referenced in the filter case, so the attribute resolver
should also be creating this problem if it exists.
That seems to point to an environmental issue since it's not being
observed more widely.
-- Scott

--
To unsubscribe from this list send an email to users-unsubscribe-***@public.gmane.org

Brian Koehmstedt

2014-09-18 04:00:36 UTC

Permalink

I was able to duplicate this issue in test with RedHat/OpenJDK by
utilizing the following trick:

- Wrote a quick and dirty script to every minute update the
<ds:Reference URI="..."> in a metadata file copied from the Incommon file.
(May not be necessary to randomize URI. Changing the last modified time
stamp may be sufficient. Not sure.)
This was so that the test Shib IdP will recognize a new metadata file.
- Run a web server that serves up this metadata file
- Configure the FileBackedHTTPMetadataProvider in relying-party.xml to
download this changed file every minute.
- Create a dummy attribute (or just use an existing one) in LDAP
- Edit attribute-filter.xml and add 5 to 10
<afp:AttributeFilterPolicy><afp:PolicyRequirementRule/><afp:AttributeRule/></afp:AttributeFilterPolicy>
sections that reference the dummy attribute and for each one, provide a
different matching value. (Example below.)
- Then test by setting the dummy attribute value to, say, 1 and clearing
all cookies, and logging into a test SP. (Also, of course, release the
attribute to the SP that you have in afp:AttributeRule.)
- Allow the FileBackedHTTPMetadataProvider to download a new version of
the metadata (it should do this every minute.)
- Grab the PID of your IDP JVM
- Do jmap -histo:live <PID> | grep EntitiesDescriptorImpl and note the
count.
- Then repeat the above 4 steps, except increment your dummy value to be
something different now so that a new <afp:AttributeFilterPolicy> is
triggered (matched) on your next login.

As you repeat those steps, it accumulates cached
ShibbolethFilteringContexts and EntitiesDescriptorImpl objects in the heap.

Once I was able to duplicate the memory leak in test like this, it
became easy to test with a different JVM.

So I tested with Oracle JDK 1.7.0_67 Linux x64 instead of the OpenJDK
that ships with RedHat 6.5.

The jmap -histo:live shows that the EntitiesDescriptorImpl count does
*not* grow with this JVM (and no stale ShibbolethFilteringContexts).

Now I'm ready to blame OpenJDK. :)

I'll still need to verify by deploying a new Oracle JVM to production
and confirming the memory leak is gone, but based on the above testing,
it looks likely that switching to Oracle JDK and avoiding the JVM that
ships with RedHat will solve the issue.

Too bad because it would be nice to use the package management system on
RedHat for software updates like the JVM. That was one of the reasons
why we migrated to RedHat in the first place.

A couple side notes:

- I found a few things on the web that suggests using
scriptContext.setOptimizationLevel(-1) will prevent Rhino from using the
ClassCache. (Perhaps the Oracle JDK has this nonoptimization level by
default while OpenJDK simply has it set to optimize by default?)
- One potential fix to try for OpenJDK deployments is add
scriptContext.setOptimizationLevel(-1) to the Shibboleth code to see if
it avoids the ClassCaching. And if so, I wonder if it's worth
considering adding to the code. (Is this nonoptimization level the
default in Oracle JDK?)
- I saw postings on the web that indicate Rhino has been removed from
JDK 8 in favor of a "Nashorn." I wonder what the implications of this
are for anybody who is migrating shibboleth to Java 8 and who has
existing Javascript scripts in their Shibboleth configs?

Example attribute-filter.xml config using a dummy attribute with
incrementing values:


<afp:AttributeFilterPolicy>
<afp:PolicyRequirementRule xsi:type="basic:AttributeValueString"
attributeID="MatchingAttribute"
value="1"/>
<afp:AttributeRule attributeID="givenName">
<afp:DenyValueRule xsi:type="basic:Script">
<basic:Script>
<![CDATA[
<SOME JAVASCRIPT HERE, DOESN'T MATTER WHAT IT IS>
]]>
</basic:Script>
</afp:DenyValueRule>
</afp:AttributeRule>
</afp:AttributeFilterPolicy>


<afp:AttributeFilterPolicy>
<afp:PolicyRequirementRule xsi:type="basic:AttributeValueString"
attributeID="MatchingAttribute"
value="2"/>
<afp:AttributeRule attributeID="givenName">
<afp:DenyValueRule xsi:type="basic:Script">
<basic:Script>
<![CDATA[
<SOME JAVASCRIPT HERE, DOESN'T MATTER WHAT IT IS>
]]>
</basic:Script>
</afp:DenyValueRule>
</afp:AttributeRule>
</afp:AttributeFilterPolicy>

--
To unsubscribe from this list send an email to users-unsubscribe-***@public.gmane.org

Ted Fisher

2014-09-18 11:43:58 UTC

Permalink

Off topic here, but you can use Oracle JDK 1.7.0_67 Linux x64 and still have Redhat package management keep it up to date, etc. Redhat has additional repositories including one called "Oracle Java for RHEL". Subscribe to that repo and you can load directly from it with yum and it will get updated when you do yum updates. We use nothing but Oracle/Sun JVM on all RHEL systems and all are managed this way.

Ted F. Fisher
Information Technology Services
BGSU, Bowling Green, Ohio

-----Original Message-----
From: users-bounces-***@public.gmane.org [mailto:users-bounces-***@public.gmane.org] On Behalf Of Brian Koehmstedt
Sent: Thursday, September 18, 2014 12:01 AM
To: Shib Users
Subject: Re: Memory leak hunting - Rhino script woes leading to stale metadata not being garbage collected

I was able to duplicate this issue in test with RedHat/OpenJDK by utilizing the following trick:

- Wrote a quick and dirty script to every minute update the <ds:Reference URI="..."> in a metadata file copied from the Incommon file.
(May not be necessary to randomize URI. Changing the last modified time stamp may be sufficient. Not sure.) This was so that the test Shib IdP will recognize a new metadata file.
- Run a web server that serves up this metadata file
- Configure the FileBackedHTTPMetadataProvider in relying-party.xml to download this changed file every minute.
- Create a dummy attribute (or just use an existing one) in LDAP
- Edit attribute-filter.xml and add 5 to 10 <afp:AttributeFilterPolicy><afp:PolicyRequirementRule/><afp:AttributeRule/></afp:AttributeFilterPolicy>
sections that reference the dummy attribute and for each one, provide a different matching value. (Example below.)
- Then test by setting the dummy attribute value to, say, 1 and clearing all cookies, and logging into a test SP. (Also, of course, release the attribute to the SP that you have in afp:AttributeRule.)
- Allow the FileBackedHTTPMetadataProvider to download a new version of the metadata (it should do this every minute.)
- Grab the PID of your IDP JVM
- Do jmap -histo:live <PID> | grep EntitiesDescriptorImpl and note the count.
- Then repeat the above 4 steps, except increment your dummy value to be something different now so that a new <afp:AttributeFilterPolicy> is triggered (matched) on your next login.

As you repeat those steps, it accumulates cached ShibbolethFilteringContexts and EntitiesDescriptorImpl objects in the heap.

Once I was able to duplicate the memory leak in test like this, it became easy to test with a different JVM.

So I tested with Oracle JDK 1.7.0_67 Linux x64 instead of the OpenJDK that ships with RedHat 6.5.

The jmap -histo:live shows that the EntitiesDescriptorImpl count does
*not* grow with this JVM (and no stale ShibbolethFilteringContexts).

Now I'm ready to blame OpenJDK. :)

I'll still need to verify by deploying a new Oracle JVM to production and confirming the memory leak is gone, but based on the above testing, it looks likely that switching to Oracle JDK and avoiding the JVM that ships with RedHat will solve the issue.

Too bad because it would be nice to use the package management system on RedHat for software updates like the JVM. That was one of the reasons why we migrated to RedHat in the first place.

A couple side notes:

- I found a few things on the web that suggests using
scriptContext.setOptimizationLevel(-1) will prevent Rhino from using the ClassCache. (Perhaps the Oracle JDK has this nonoptimization level by default while OpenJDK simply has it set to optimize by default?)
- One potential fix to try for OpenJDK deployments is add
scriptContext.setOptimizationLevel(-1) to the Shibboleth code to see if it avoids the ClassCaching. And if so, I wonder if it's worth considering adding to the code. (Is this nonoptimization level the default in Oracle JDK?)
- I saw postings on the web that indicate Rhino has been removed from JDK 8 in favor of a "Nashorn." I wonder what the implications of this are for anybody who is migrating shibboleth to Java 8 and who has existing Javascript scripts in their Shibboleth configs?

Example attribute-filter.xml config using a dummy attribute with incrementing values:


<afp:AttributeFilterPolicy>
<afp:PolicyRequirementRule xsi:type="basic:AttributeValueString"
attributeID="MatchingAttribute"
value="1"/>
<afp:AttributeRule attributeID="givenName">
<afp:DenyValueRule xsi:type="basic:Script">
<basic:Script>
<![CDATA[
<SOME JAVASCRIPT HERE, DOESN'T MATTER WHAT IT IS>
]]>
</basic:Script>
</afp:DenyValueRule>
</afp:AttributeRule>
</afp:AttributeFilterPolicy>


<afp:AttributeFilterPolicy>
<afp:PolicyRequirementRule xsi:type="basic:AttributeValueString"
attributeID="MatchingAttribute"
value="2"/>
<afp:AttributeRule attributeID="givenName">
<afp:DenyValueRule xsi:type="basic:Script">
<basic:Script>
<![CDATA[
<SOME JAVASCRIPT HERE, DOESN'T MATTER WHAT IT IS>
]]>
</basic:Script>
</afp:DenyValueRule>
</afp:AttributeRule>
</afp:AttributeFilterPolicy>

--
To unsubscribe from this list send an email to users-unsubscribe-***@public.gmane.org

--
To unsubscribe from this list send an email to users-unsubscribe-***@public.gmane.org

Brian Koehmstedt

2014-09-18 14:57:28 UTC

Permalink

Good to know Ted. Thanks.

Post by Ted Fisher
Off topic here, but you can use Oracle JDK 1.7.0_67 Linux x64 and still have Redhat package management keep it up to date, etc. Redhat has additional repositories including one called "Oracle Java for RHEL". Subscribe to that repo and you can load directly from it with yum and it will get updated when you do yum updates. We use nothing but Oracle/Sun JVM on all RHEL systems and all are managed this way.
Ted F. Fisher
Information Technology Services
BGSU, Bowling Green, Ohio
-----Original Message-----
Sent: Thursday, September 18, 2014 12:01 AM
To: Shib Users
Subject: Re: Memory leak hunting - Rhino script woes leading to stale metadata not being garbage collected
- Wrote a quick and dirty script to every minute update the <ds:Reference URI="..."> in a metadata file copied from the Incommon file.
(May not be necessary to randomize URI. Changing the last modified time stamp may be sufficient. Not sure.) This was so that the test Shib IdP will recognize a new metadata file.
- Run a web server that serves up this metadata file
- Configure the FileBackedHTTPMetadataProvider in relying-party.xml to download this changed file every minute.
- Create a dummy attribute (or just use an existing one) in LDAP
- Edit attribute-filter.xml and add 5 to 10 <afp:AttributeFilterPolicy><afp:PolicyRequirementRule/><afp:AttributeRule/></afp:AttributeFilterPolicy>
sections that reference the dummy attribute and for each one, provide a different matching value. (Example below.)
- Then test by setting the dummy attribute value to, say, 1 and clearing all cookies, and logging into a test SP. (Also, of course, release the attribute to the SP that you have in afp:AttributeRule.)
- Allow the FileBackedHTTPMetadataProvider to download a new version of the metadata (it should do this every minute.)
- Grab the PID of your IDP JVM
- Do jmap -histo:live <PID> | grep EntitiesDescriptorImpl and note the count.
- Then repeat the above 4 steps, except increment your dummy value to be something different now so that a new <afp:AttributeFilterPolicy> is triggered (matched) on your next login.
As you repeat those steps, it accumulates cached ShibbolethFilteringContexts and EntitiesDescriptorImpl objects in the heap.
Once I was able to duplicate the memory leak in test like this, it became easy to test with a different JVM.
So I tested with Oracle JDK 1.7.0_67 Linux x64 instead of the OpenJDK that ships with RedHat 6.5.
The jmap -histo:live shows that the EntitiesDescriptorImpl count does
*not* grow with this JVM (and no stale ShibbolethFilteringContexts).
Now I'm ready to blame OpenJDK. :)
I'll still need to verify by deploying a new Oracle JVM to production and confirming the memory leak is gone, but based on the above testing, it looks likely that switching to Oracle JDK and avoiding the JVM that ships with RedHat will solve the issue.
Too bad because it would be nice to use the package management system on RedHat for software updates like the JVM. That was one of the reasons why we migrated to RedHat in the first place.
- I found a few things on the web that suggests using
scriptContext.setOptimizationLevel(-1) will prevent Rhino from using the ClassCache. (Perhaps the Oracle JDK has this nonoptimization level by default while OpenJDK simply has it set to optimize by default?)
- One potential fix to try for OpenJDK deployments is add
scriptContext.setOptimizationLevel(-1) to the Shibboleth code to see if it avoids the ClassCaching. And if so, I wonder if it's worth considering adding to the code. (Is this nonoptimization level the default in Oracle JDK?)
- I saw postings on the web that indicate Rhino has been removed from JDK 8 in favor of a "Nashorn." I wonder what the implications of this are for anybody who is migrating shibboleth to Java 8 and who has existing Javascript scripts in their Shibboleth configs?

<afp:AttributeFilterPolicy>
<afp:PolicyRequirementRule xsi:type="basic:AttributeValueString"
attributeID="MatchingAttribute"
value="1"/>
<afp:AttributeRule attributeID="givenName">
<afp:DenyValueRule xsi:type="basic:Script">
<basic:Script>
<![CDATA[
<SOME JAVASCRIPT HERE, DOESN'T MATTER WHAT IT IS>
]]>
</basic:Script>
</afp:DenyValueRule>
</afp:AttributeRule>
</afp:AttributeFilterPolicy>

<afp:AttributeFilterPolicy>
<afp:PolicyRequirementRule xsi:type="basic:AttributeValueString"
attributeID="MatchingAttribute"
value="2"/>
<afp:AttributeRule attributeID="givenName">
<afp:DenyValueRule xsi:type="basic:Script">
<basic:Script>
<![CDATA[
<SOME JAVASCRIPT HERE, DOESN'T MATTER WHAT IT IS>
]]>
</basic:Script>
</afp:DenyValueRule>
</afp:AttributeRule>
</afp:AttributeFilterPolicy>
--

--
To unsubscribe from this list send an email to users-unsubscribe-***@public.gmane.org

Mark Boyce

2014-09-18 15:57:12 UTC

Permalink

Ted,

Thanks! Although I would be exceedingly cautious of allowing Java to update via YUM or any other automated process. Could end up with "unintended consequences".

Cheers,

Mark

Senior Identity Management Analyst
Universiity of California, Office of the President

-----Original Message-----
From: users-bounces-***@public.gmane.org [mailto:users-bounces-***@public.gmane.org] On Behalf Of Ted Fisher
Sent: Thursday, September 18, 2014 4:44 AM
To: Shib Users
Subject: RE: Memory leak hunting - Rhino script woes leading to stale metadata not being garbage collected

Off topic here, but you can use Oracle JDK 1.7.0_67 Linux x64 and still have Redhat package management keep it up to date, etc. Redhat has additional repositories including one called "Oracle Java for RHEL". Subscribe to that repo and you can load directly from it with yum and it will get updated when you do yum updates. We use nothing but Oracle/Sun JVM on all RHEL systems and all are managed this way.

Ted F. Fisher
Information Technology Services
BGSU, Bowling Green, Ohio

-----Original Message-----
From: users-bounces-***@public.gmane.org [mailto:users-bounces-***@public.gmane.org] On Behalf Of Brian Koehmstedt
Sent: Thursday, September 18, 2014 12:01 AM
To: Shib Users
Subject: Re: Memory leak hunting - Rhino script woes leading to stale metadata not being garbage collected

I was able to duplicate this issue in test with RedHat/OpenJDK by utilizing the following trick:

- Wrote a quick and dirty script to every minute update the <ds:Reference URI="..."> in a metadata file copied from the Incommon file.
(May not be necessary to randomize URI. Changing the last modified time stamp may be sufficient. Not sure.) This was so that the test Shib IdP will recognize a new metadata file.
- Run a web server that serves up this metadata file
- Configure the FileBackedHTTPMetadataProvider in relying-party.xml to download this changed file every minute.
- Create a dummy attribute (or just use an existing one) in LDAP
- Edit attribute-filter.xml and add 5 to 10 <afp:AttributeFilterPolicy><afp:PolicyRequirementRule/><afp:AttributeRule/></afp:AttributeFilterPolicy>
sections that reference the dummy attribute and for each one, provide a different matching value. (Example below.)
- Then test by setting the dummy attribute value to, say, 1 and clearing all cookies, and logging into a test SP. (Also, of course, release the attribute to the SP that you have in afp:AttributeRule.)
- Allow the FileBackedHTTPMetadataProvider to download a new version of the metadata (it should do this every minute.)
- Grab the PID of your IDP JVM
- Do jmap -histo:live <PID> | grep EntitiesDescriptorImpl and note the count.
- Then repeat the above 4 steps, except increment your dummy value to be something different now so that a new <afp:AttributeFilterPolicy> is triggered (matched) on your next login.

As you repeat those steps, it accumulates cached ShibbolethFilteringContexts and EntitiesDescriptorImpl objects in the heap.

Once I was able to duplicate the memory leak in test like this, it became easy to test with a different JVM.

So I tested with Oracle JDK 1.7.0_67 Linux x64 instead of the OpenJDK that ships with RedHat 6.5.

The jmap -histo:live shows that the EntitiesDescriptorImpl count does
*not* grow with this JVM (and no stale ShibbolethFilteringContexts).

Now I'm ready to blame OpenJDK. :)

I'll still need to verify by deploying a new Oracle JVM to production and confirming the memory leak is gone, but based on the above testing, it looks likely that switching to Oracle JDK and avoiding the JVM that ships with RedHat will solve the issue.

Too bad because it would be nice to use the package management system on RedHat for software updates like the JVM. That was one of the reasons why we migrated to RedHat in the first place.

A couple side notes:

- I found a few things on the web that suggests using
scriptContext.setOptimizationLevel(-1) will prevent Rhino from using the ClassCache. (Perhaps the Oracle JDK has this nonoptimization level by default while OpenJDK simply has it set to optimize by default?)
- One potential fix to try for OpenJDK deployments is add
scriptContext.setOptimizationLevel(-1) to the Shibboleth code to see if it avoids the ClassCaching. And if so, I wonder if it's worth considering adding to the code. (Is this nonoptimization level the default in Oracle JDK?)
- I saw postings on the web that indicate Rhino has been removed from JDK 8 in favor of a "Nashorn." I wonder what the implications of this are for anybody who is migrating shibboleth to Java 8 and who has existing Javascript scripts in their Shibboleth configs?

Example attribute-filter.xml config using a dummy attribute with incrementing values:


<afp:AttributeFilterPolicy>
<afp:PolicyRequirementRule xsi:type="basic:AttributeValueString"
attributeID="MatchingAttribute"
value="1"/>
<afp:AttributeRule attributeID="givenName">
<afp:DenyValueRule xsi:type="basic:Script">
<basic:Script>
<![CDATA[
<SOME JAVASCRIPT HERE, DOESN'T MATTER WHAT IT IS>
]]>
</basic:Script>
</afp:DenyValueRule>
</afp:AttributeRule>
</afp:AttributeFilterPolicy>


<afp:AttributeFilterPolicy>
<afp:PolicyRequirementRule xsi:type="basic:AttributeValueString"
attributeID="MatchingAttribute"
value="2"/>
<afp:AttributeRule attributeID="givenName">
<afp:DenyValueRule xsi:type="basic:Script">
<basic:Script>
<![CDATA[
<SOME JAVASCRIPT HERE, DOESN'T MATTER WHAT IT IS>
]]>
</basic:Script>
</afp:DenyValueRule>
</afp:AttributeRule>
</afp:AttributeFilterPolicy>

--
To unsubscribe from this list send an email to users-unsubscribe-***@public.gmane.org

--
To unsubscribe from this list send an email to users-unsubscribe-***@public.gmane.org
--
To unsubscribe from this list send an email to users-unsubscribe-***@public.gmane.org

C R

2014-10-01 07:44:23 UTC

Permalink

For what it's worth, we (KU Leuven University) recently migrated our
production IdP install to Oracle JDK 7 from Redhat's OpenJDK 7. We
experienced problems where Tomcat wasn't reachable from Apache resulting in
many 503 errors. We experienced this twice in a week with very high
load/connections (start academic years).

With Oracle's JDK we haven't seen the problem yet.

Claudio

Post by Mark Boyce
Ted,
Thanks! Although I would be exceedingly cautious of allowing Java to
update via YUM or any other automated process. Could end up with
"unintended consequences".
Cheers,
Mark
Senior Identity Management Analyst
Universiity of California, Office of the President
-----Original Message-----
On Behalf Of Ted Fisher
Sent: Thursday, September 18, 2014 4:44 AM
To: Shib Users
Subject: RE: Memory leak hunting - Rhino script woes leading to stale
metadata not being garbage collected
Off topic here, but you can use Oracle JDK 1.7.0_67 Linux x64 and still
have Redhat package management keep it up to date, etc. Redhat has
additional repositories including one called "Oracle Java for RHEL".
Subscribe to that repo and you can load directly from it with yum and it
will get updated when you do yum updates. We use nothing but Oracle/Sun
JVM on all RHEL systems and all are managed this way.
Ted F. Fisher
Information Technology Services
BGSU, Bowling Green, Ohio
-----Original Message-----
On Behalf Of Brian Koehmstedt
Sent: Thursday, September 18, 2014 12:01 AM
To: Shib Users
Subject: Re: Memory leak hunting - Rhino script woes leading to stale
metadata not being garbage collected
I was able to duplicate this issue in test with RedHat/OpenJDK by
- Wrote a quick and dirty script to every minute update the <ds:Reference
URI="..."> in a metadata file copied from the Incommon file.
(May not be necessary to randomize URI. Changing the last modified time
stamp may be sufficient. Not sure.) This was so that the test Shib IdP
will recognize a new metadata file.
- Run a web server that serves up this metadata file
- Configure the FileBackedHTTPMetadataProvider in relying-party.xml to
download this changed file every minute.
- Create a dummy attribute (or just use an existing one) in LDAP
- Edit attribute-filter.xml and add 5 to 10
<afp:AttributeFilterPolicy><afp:PolicyRequirementRule/><afp:AttributeRule/></afp:AttributeFilterPolicy>
sections that reference the dummy attribute and for each one, provide a
different matching value. (Example below.)
- Then test by setting the dummy attribute value to, say, 1 and clearing
all cookies, and logging into a test SP. (Also, of course, release the
attribute to the SP that you have in afp:AttributeRule.)
- Allow the FileBackedHTTPMetadataProvider to download a new version of
the metadata (it should do this every minute.)
- Grab the PID of your IDP JVM
- Do jmap -histo:live <PID> | grep EntitiesDescriptorImpl and note the count.
- Then repeat the above 4 steps, except increment your dummy value to be
something different now so that a new <afp:AttributeFilterPolicy> is
triggered (matched) on your next login.
As you repeat those steps, it accumulates cached
ShibbolethFilteringContexts and EntitiesDescriptorImpl objects in the heap.
Once I was able to duplicate the memory leak in test like this, it became
easy to test with a different JVM.
So I tested with Oracle JDK 1.7.0_67 Linux x64 instead of the OpenJDK that
ships with RedHat 6.5.
The jmap -histo:live shows that the EntitiesDescriptorImpl count does
*not* grow with this JVM (and no stale ShibbolethFilteringContexts).
Now I'm ready to blame OpenJDK. :)
I'll still need to verify by deploying a new Oracle JVM to production and
confirming the memory leak is gone, but based on the above testing, it
looks likely that switching to Oracle JDK and avoiding the JVM that ships
with RedHat will solve the issue.
Too bad because it would be nice to use the package management system on
RedHat for software updates like the JVM. That was one of the reasons why
we migrated to RedHat in the first place.
- I found a few things on the web that suggests using
scriptContext.setOptimizationLevel(-1) will prevent Rhino from using the
ClassCache. (Perhaps the Oracle JDK has this nonoptimization level by
default while OpenJDK simply has it set to optimize by default?)
- One potential fix to try for OpenJDK deployments is add
scriptContext.setOptimizationLevel(-1) to the Shibboleth code to see if it
avoids the ClassCaching. And if so, I wonder if it's worth considering
adding to the code. (Is this nonoptimization level the default in Oracle
JDK?)
- I saw postings on the web that indicate Rhino has been removed from JDK
8 in favor of a "Nashorn." I wonder what the implications of this are for
anybody who is migrating shibboleth to Java 8 and who has existing
Javascript scripts in their Shibboleth configs?

<afp:AttributeFilterPolicy>
<afp:PolicyRequirementRule xsi:type="basic:AttributeValueString"
attributeID="MatchingAttribute"
value="1"/>
<afp:AttributeRule attributeID="givenName">
<afp:DenyValueRule xsi:type="basic:Script">
<basic:Script>
<![CDATA[
<SOME JAVASCRIPT HERE, DOESN'T MATTER WHAT IT IS>
]]>
</basic:Script>
</afp:DenyValueRule>
</afp:AttributeRule>
</afp:AttributeFilterPolicy>

<afp:AttributeFilterPolicy>
<afp:PolicyRequirementRule xsi:type="basic:AttributeValueString"
attributeID="MatchingAttribute"
value="2"/>
<afp:AttributeRule attributeID="givenName">
<afp:DenyValueRule xsi:type="basic:Script">
<basic:Script>
<![CDATA[
<SOME JAVASCRIPT HERE, DOESN'T MATTER WHAT IT IS>
]]>
</basic:Script>
</afp:DenyValueRule>
</afp:AttributeRule>
</afp:AttributeFilterPolicy>
--
To unsubscribe from this list send an email to
--
To unsubscribe from this list send an email to
--
To unsubscribe from this list send an email to

Cantor, Scott

2014-09-18 13:45:51 UTC

Permalink

Post by Brian Koehmstedt
- One potential fix to try for OpenJDK deployments is add
scriptContext.setOptimizationLevel(-1) to the Shibboleth code to see if
it avoids the ClassCaching. And if so, I wonder if it's worth
considering adding to the code. (Is this nonoptimization level the
default in Oracle JDK?)

It would be good if you could file a bug with your test case. Even if we
don't get around to it, it gets it recorded, and if it's relevant to V3
anyway, we'll have to verify the situation there.

Post by Brian Koehmstedt
- I saw postings on the web that indicate Rhino has been removed from
JDK 8 in favor of a "Nashorn." I wonder what the implications of this
are for anybody who is migrating shibboleth to Java 8 and who has
existing Javascript scripts in their Shibboleth configs?

https://wiki.shibboleth.net/confluence/display/SHIB2/IdPJava1.8

Rod did a lot of work on the issue already.

Thanks,
-- Scott

--
To unsubscribe from this list send an email to users-unsubscribe-***@public.gmane.org

Brian Koehmstedt

2014-09-18 21:36:20 UTC

Permalink

Post by Cantor, Scott
It would be good if you could file a bug with your test case. Even if we
don't get around to it, it gets it recorded, and if it's relevant to V3
anyway, we'll have to verify the situation there.

https://issues.shibboleth.net/jira/browse/SC-195

--
To unsubscribe from this list send an email to users-unsubscribe-***@public.gmane.org

Cantor, Scott

2014-09-18 14:06:38 UTC

Permalink

Post by Brian Koehmstedt
setOptimizationLevel(-1)

That appears to be a Rhino compiler setting that isn't a method on any
standard classes, so I don't think it's settable in our code. Perhaps it
can be set with a system property.

-- Scott

--
To unsubscribe from this list send an email to users-unsubscribe-***@public.gmane.org