I'm running Shib 2.4 in Tomcat 6 on RHEL5 in VMware with no problems. I'm
using Oracle Java (available from RH) and the latest Tomcat 6 source, not
the RH version. We rebuild once a week for new SPs and have never
(knock on wood) had a hang.
We also allocate 8GB RAM per VM and a larger footprint for the JVM. We load
InCommon and almost 900 other pieces of metadata at a go.
export JAVA_OPTS=""
export JAVA_OPTS="$JAVA_OPTS -server -d64 -XX:+PrintCommandLineFlags"
# only create a huge JVM if the operation is 'start'
if [[ "$1" == 'start' ]]; then
export JAVA_OPTS="$JAVA_OPTS -XX:+UseParallelOldGC
-XX:MaxGCPauseMillis=5000"
export JAVA_OPTS="$JAVA_OPTS -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:-TraceClassUnloading"
export JAVA_OPTS="$JAVA_OPTS -Xmx6144m -Xms4096m"
export JAVA_OPTS="$JAVA_OPTS -XX:MaxNewSize=512m -XX:NewSize=256m"
export JAVA_OPTS="$JAVA_OPTS -XX:MaxPermSize=2048m -XX:PermSize=512m"
# was "-XX:+CMSPermGenSweepingEnabled"
export JAVA_OPTS="$JAVA_OPTS -XX:+CMSClassUnloadingEnabled"
fi
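For context, that block is sourced before catalina.sh runs. A minimal sketch
of where it sits, assuming a stock Tomcat 6 layout (the paths and JAVA_HOME
below are examples, not our exact values; adjust to your install):

# bin/setenv.sh -- sourced by catalina.sh if it exists; "$1" here is the
# catalina.sh action (start/stop/...), which is why the block above checks it
export JAVA_HOME=/usr/java/latest          # example Oracle JDK location
export CATALINA_PID=/var/run/tomcat6.pid
# ... JAVA_OPTS block from above goes here ...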
As also mentioned, the environment needs to be solid. I love my
infrastructure crew! <3 <3
Regards,
-Charles
-----Original Message-----
From: users-bounces-***@public.gmane.org [mailto:users-bounces-***@public.gmane.org] On
Behalf Of Brian Koehmstedt
Sent: Friday, August 29, 2014 11:41 AM
To: users-***@public.gmane.org
Subject: Re: Problem with tomcat hanging on Shib 2.4
I work with John Kamminga, the original poster. I'm out of the office right
now, so the team is looking into it in my absence, but I've taken a peek
while I've been out. I don't have all the details yet, but I do believe this
is a memory problem, as Matthew has suggested and observed at his site. From
Matthew's description, it sounds like we may be hitting the same problem.
Even the timeline is right. (He said every couple of weeks, which is about
what we're seeing.)
In a previous "hang" a few weeks ago (not the latest one John is
describing), I noticed an Out of Memory error in the log file. John should
check for this in the latest hang-up logs, but I am definitely suspecting
either:
- A memory leak
- An unexplained GC problem, as Matthew said. (Although the GCs of JVMs
should be so thoroughly tested and rock solid that I doubt it is a JVM GC
bug. A standard memory leak is much more likely.)
- The JVM just flat out running out of memory due to growing Incommon
metdata file, but it seems like -Xmx1024M should be sufficient even when the
current size of the metadata file. Matthew, I'd be curious to know what you
had your -Xmx parameter set at when you were experiencing the hang-ups.
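For that heap-occupancy question, roughly what I have in mind (a sketch;
assumes the JDK's jstat tool is on the PATH and <tomcat-pid> is Tomcat's
process id):

# Print heap utilization percentages (eden/old/perm) plus GC counts and
# cumulative times every 10 seconds. An old-gen (O) column that keeps
# climbing and never drops after full GCs points at a leak or an
# undersized -Xmx.
jstat -gcutil <tomcat-pid> 10000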
I've already begun taking heap dumps and analyzing them with jhat.
Analyzing the heap isn't always straightforward, but there is a "tremendous"
amount of char[], String, HashMapEntry, and various XML objects in the heap.
I put "tremendous" in quotes because I don't yet know whether it's a normal
or an abnormal amount; you can't tell just by looking at a heap. Most of
these objects look related to storing data from the InCommon metadata file.
Since that file is growing quite large, the data in the heap could be
perfectly normal, in which case -Xmx1024M may simply no longer be
sufficient. (The commands I'm using are below.)
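For anyone who wants to repeat the exercise, this is roughly what I'm
running (a sketch; assumes the JDK tools are installed, <tomcat-pid> is
Tomcat's process id, and the dump file path is just an example):

# Dump the live objects in the running Tomcat JVM's heap to a binary file
jmap -dump:live,format=b,file=/tmp/idp-heap.hprof <tomcat-pid>

# Browse the dump at http://localhost:7000/ ; give jhat a generous heap of
# its own or it will struggle to load a dump this size
jhat -J-Xmx2g /tmp/idp-heap.hprof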
One thing I was definitely meaning to do when I got back was add
-XX:+HeapDumpOnOutOfMemoryError.
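Something along these lines in JAVA_OPTS (the dump path is just an example;
make sure the Tomcat user can write there and the disk has room for a
heap-sized file):

# Write a heap dump automatically the moment an OutOfMemoryError is thrown
export JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/tomcat6/"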
On 8/29/2014 7:29 AM, Matthew Slowe wrote:
On Thu, Aug 28, 2014 at 11:49:11PM +0000, John Kamminga wrote:
We've migrated our production Shibboleth environment from Solaris 10 to
Redhat 6 and are now experiencing problems with the app becoming
unresponsive every couple of weeks. A Tomcat restart fixes it, but we'd
like to find out what is causing it. Has anyone else experienced issues
migrating to or running on Redhat 6? Or, does anyone see any potential
problems with our setup?
Here is our environment setup on a Redhat VM.

Redhat Linux version: 2.6.32-431.20.3.el6.x86_64
Shibboleth IdP 2.4
Tomcat 6.0.24
JAVA_OPTS=" -Xmx1024M -XX:MaxPermSize=512M -server
-Djava.library.path=/usr/lib64 -Djavax.net.ssl.trustStore=/jdk/cacerts"

java -version:
java version "1.7.0_55"
OpenJDK Runtime Environment (rhel-2.4.7.1.el6_5-x86_64 u55-b13)
OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)
First I'm going to refer to a thread on the JISC-SHIBBOLETH mailing list
last year on the subject (no sign-in required):
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1310&L=jisc-shibboleth&F=&S=&P=60
We have three IdPs running in a very similar setup to yours (3 RHEL VMs on
VMware, each with 2 CPUs and 4GB RAM), running, at the time, 1.7.0_25 (now
_55), each servicing up to 330,000 authentications per day.

Anywhere from a few days to a week or two after startup, the JVM will go
into a weird state and stop responding to practically anything. It appears
to get stuck doing some massive garbage collection which we've not been
able to tune out (which is what that thread is about).
Having sunk days of time into it, we bailed and scheduled rolling
overnight tomcat restarts :-(
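(The restart itself is nothing clever; something like the following in
root's crontab, assuming the RHEL tomcat6 init script is in use, with the
time staggered on each IdP node so the cluster is never fully down at once:)

# Restart Tomcat at 04:30 every night; stagger the minute per node
30 4 * * * /sbin/service tomcat6 restart >> /var/log/tomcat6-restart.log 2>&1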
Take a look at the GC logs (which you may need to turn on) to see if
you're hitting long GCs (hint, not recommendation):

-verbose:gc -XX:+PrintGCDetails -XX:+PrintHeapAtGC
-XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
-Xloggc:/var/log/tomcat6/gc.log
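Once that log is being written, a quick way to spot the worst pauses (a
sketch; assumes the log path above and standard grep/awk/sort on the box):

# -XX:+PrintGCApplicationStoppedTime writes lines ending in
# "... application threads were stopped: N.NNNNNNN seconds";
# pull out the pause length (second-to-last field) and show the 20 longest
grep "application threads were stopped" /var/log/tomcat6/gc.log \
    | awk '{print $(NF-1)}' | sort -n | tail -20

# How many full collections so far (logged by -XX:+PrintGCDetails)
grep -c "Full GC" /var/log/tomcat6/gc.log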
Good luck!