
110,338 Views 70 Replies Last post: Jun 3, 2009 8:44 AM by bretm
bretm Novice 69 posts since
Sep 29, 2008

Mar 13, 2009 8:09 AM

Low effectiveness percentages for caches... a problem?

We're seeing poor performance across the board for our internal installation, and have a number of fronts we think we can make improvements on... but one question I had was what to expect for cache effectiveness values on:

 

/clearspace/admin/system-cache.jsp

 

I'm seeing numbers all over the board, but only rarely does Clearspace mark any of them w/ the red background.  The documentation seemed to suggest 90% cache effectiveness as a target; some of our caches are at ~17%, etc.  But oftentimes, the caches aren't even full.

 

Thoughts?

Tags: cache, performance, cache_effectiveness, cache_size
Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 13, 2009 8:48 AM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

Hi bretm,

 

There are a few things that can cause a document cache to be ineffective:

 

  1. The server is just starting up and the caches are empty, so most requests will be for data that results in a call to the database
  2. Not a lot of activity on the site, so things fall out of the cache before they are accessed again
  3. A fair amount of activity on the site, but short timespans for the caches

 

You can increase the lifespan for caches by editing the applicable value in the jive_startup.xml (requires restart).  The values are stored as the number of milliseconds content should live in the cache.  It may be appropriate in your case to bump them up for the caches that you see are ineffective.  You also want to make sure that your caches have plenty of room--50 to 60% full is a good target to shoot for.  Clearspace will highlight caches that are both ineffective and nearly full.

 

Would you mind if I made this into a public case? I'm sure other customers would benefit from this knowledge as well. 

 

Thanks!

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 13, 2009 9:51 AM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

Here's the exact formula that we use:

 

lowEffec = (hits > 500 && hitValue < 85.0 && freeMem < 20.0);

 

So, if a cache has over 500 hits and is less than 85% effective and is over 80% full, it will be flagged.  If these conditions don't hold true, then you don't need to adjust the allocated memory for the cache.  However, you may still want to increase the lifespan for your caches if you have plenty of memory to spare and don't want things to fall out of the cache. 
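
For anyone who wants to check their own numbers against that rule, here is a minimal Python sketch of the same condition. The function and parameter names are made up for illustration, and it assumes hitValue and freeMem are percentages (0-100), as the explanation above implies. It is not Clearspace's actual code.

```python
def is_flagged(hits: int, hit_pct: float, free_pct: float) -> bool:
    """Return True when a cache would be highlighted under the quoted rule.

    Assumption: hit_pct and free_pct are percentages in the range 0-100.
    """
    return hits > 500 and hit_pct < 85.0 and free_pct < 20.0


# A 17%-effective cache that is only half full is not flagged...
print(is_flagged(hits=1000, hit_pct=17.0, free_pct=50.0))
# ...but the same cache with only 10% free memory is.
print(is_flagged(hits=1000, hit_pct=17.0, free_pct=10.0))
```

This is why a cache can sit at ~17% effectiveness without a red background: all three conditions have to hold at once.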

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 16, 2009 9:46 AM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

Unfortunately, there isn't any documentation for this, but it should be fairly straightforward.  For instance, if you want to modify the Document cache, search for this in your jive_startup.xml (please note, size may be different from what is shown below):

    <DocumentCache>
      <size>33554432</size>
      <maxLifetime>43200000</maxLifetime>
    </DocumentCache>

To increase the lifespan of the cache, you would need to increase the value of maxLifetime.  The current setting is for 12 hours (12h * 60min/h * 60s/min * 1000ms/s = 43200000 milliseconds).
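
If you want to work out other values, the arithmetic above can be sketched in a few lines of Python. The helper name is hypothetical; only the hours-to-milliseconds conversion comes from the post.

```python
MS_PER_HOUR = 60 * 60 * 1000  # 60 min/h * 60 s/min * 1000 ms/s


def lifetime_ms(hours: float) -> int:
    """Convert a lifetime in hours to the millisecond value used by maxLifetime."""
    return round(hours * MS_PER_HOUR)


print(lifetime_ms(12))  # 43200000, the 12-hour value shown above
print(lifetime_ms(24))  # a 24-hour lifetime
```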

 

Thanks,

Austen

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 16, 2009 1:12 PM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

It looks like you haven't changed the default settings at all, so here's what you need to do:

 

  1. Log in to the admin console
  2. Navigate to System > Settings > Caches
  3. Edit the caches
  4. Change the size for one of them (doesn't matter which one)
  5. Save your changes

 

Once you have customized the cache settings, all of the settings will be written to your jive_startup.xml and you should be able to locate the appropriate section for updating the cache lifetime. 

 

Thanks!

Austen

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 17, 2009 8:15 AM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

Bret,

 

I'm reluctant to provide this because the sizes on my caches almost certainly vary from what you have, but here is a copy of what I have locally.  You must copy/paste the entire <cache> section into the jive_startup.xml on each node.  Please let me know if you have any more questions on this. 

 

Thanks,

Austen

Attachments:
LG Novice 88 posts since
Feb 16, 2006
Mar 17, 2009 11:45 AM in response to: Austen Rustrum
Re: Low effectiveness percentages for caches... a problem?

Hi Austen,

 

I always wonder why there is a maxLifetime property for cached objects. As far as I know, Clearspace / SBS invalidates objects that were modified, and this works cluster-wide. So I'd like to set the cache lifetime to -1 /maxInt. I wonder whether this is a bad idea.

 

Do you use weak hashmaps for the caches? This would allow one to set also the size to very big values without getting memory problems.

 

LG

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 17, 2009 2:31 PM in response to: LG
Re: Low effectiveness percentages for caches... a problem?
LG wrote: "So I'd like to set the cache lifetime to -1 /maxInt. I wonder whether this is a bad idea."

 

A value of zero or less means unlimited.  Not a bad idea, but it will mean that your caches consume more memory over time, so you'll have to plan accordingly. 

 

LG wrote: "Do you use weak hashmaps for the caches? This would allow one to set also the size to very big values without getting memory problems."

 

We use Oracle's Coherence for our distributed caching layer.  Once the cache is full, items will be evicted.  The main concern when setting unlimited cache lifespans is the amount of memory that they will consume over time. 

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 17, 2009 2:36 PM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

Are you sure that the caching is really the problem here?  Have you done any thread dumps to confirm that the DB traffic is the bottleneck? 

 

The only way to determine what caches are used for a given request would be to trace through each line of code and identify which caches are being used.  Cache management is primarily done in the Manager layer, though there are a few other places you'll find this logic in the application.  One action may use several managers and each manager may have zero to many caches, not all of which may be used depending on the nature of the request. 

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 18, 2009 8:33 AM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

Bret,

 

Not a lot jumping out from what you have provided.  The thing that would help the most here would be to have some thread dumps from your application server at the time you are experiencing the slowness.  If you can get 3-5 thread dumps during a period where you are replicating the problem and upload them to this case, that will give us the information that we need to identify what is causing the performance problem.  If you can provide a correlated thread level top output with each thread dump, that would be even better.  Since the load on the DB is light, I have a feeling that your performance problems are not with the caching layer, but that remains to be determined.

 

Thanks,

Austen

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 18, 2009 9:48 AM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

I guess I need some more information:

 

  • Have you witnessed slow performance or are you relying on the results from your testing tools?
  • Are you not able to replicate the performance problem when going directly to the main index.jspa page?  Is this an intermittent thing? 
  • Any chance this is running in a VM? 
  • Do you have specific documents that are slow?  If so, are they large documents? 

 

A series of thread dumps over the time that you are seeing the performance problem will give us a snapshot of what is happening in the system.  Without these, we are forced to guess as to what the problem might be.  From what I understand about the behavior of your system, caching doesn't appear to be your problem.  One additional thing that could help would be to identify long running queries in the DB.  Can you grab a report of the top 10 longest running queries?

 

Thanks,

Austen

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 19, 2009 7:31 AM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

Definitely sounds like you have powerful machines, so something else is happening here and I don't think it is a caching problem.  Do you get this same sort of performance when hitting the application servers directly?  Do you know the IDs of the 287 documents that have performance problems?  If so, can you take some thread dumps while accessing one of them?  Do you have any customizations, and if so, what kinds?

LG Novice 88 posts since
Feb 16, 2006
Mar 19, 2009 8:45 AM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

Hi,

 

the thread dumps do not look like standard Sun JVM dumps. I wonder which JVM you are using.

20 seconds to load a page is really bad. Did you see high CPU usage?

Does it perform better when you shut down one server (no active cluster)?

 

LG

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 19, 2009 9:13 AM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

bretm wrote: "It's a Sun JDK, the dumps were generated through the jmx-console and run through links -dump.

20s to load a page *is* really bad.  And that's not even the worst of it."

 

Agreed, that is abnormal and needs to be fixed.  Can you please use the standard kill -3 method to take thread dumps?  I have tools that can analyze thread dumps in this format and easily give me a view into long running threads between sets of thread dumps.  In poring through these so far I haven't found much besides a quiet server sitting around waiting for work.  There is occasionally 1 thread that is actually doing something in Jive code, but other than that, everything is waiting for work.  Very odd that you would see 20s for the application server to perform one request.

 

bretm wrote: "Performance doesn't appear to be very different w/ just one node vs two.  For a long time, we were running just one node, and performance seemed to be just as bad (we don't have the same degree of data from that timeperiod though)."

 

Adding an additional node won't give you any performance benefit due to the overhead in running the cluster.  The only thing that you'll get is failover.  It takes 3 or more nodes before you start to see a performance benefit.

LG Novice 88 posts since
Feb 16, 2006
Mar 19, 2009 10:57 AM in response to: Austen Rustrum
Re: Low effectiveness percentages for caches... a problem?

Hi Austen,

 

As far as I can tell, "the standard kill -3 method to take thread dumps" is outdated. Today one wants to use "jstack", which is included in the JDK.

http://www.igniterealtime.org/community/docs/DOC-1033 is the JVM document used to identify Openfire/JVM issues.

 

Maybe one needs to install the JDK if only the JRE is installed. So one can also use JConsole: http://java.sun.com/javase/6/docs/technotes/guides/management/jconsole.html - this could be helpful.

https://visualvm.dev.java.net/ may be more useful as it makes better use of the MXBeans and may display blocked or waiting threads. Anyhow I did not use it yet.

 

LG

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 19, 2009 11:01 AM in response to: LG
Re: Low effectiveness percentages for caches... a problem?
LG wrote: "as far as i can tell "the standard kill -3 method to take thread dumps" is out-dated. Today one wants to use "jstack" which is included in the JDK."

 

One would think that the Sun provided tools are better, but kill -3 actually provides more information and can be correlated with a thread level top output or ps output.  Not the case with jstack.  Plus, great tools like TDA have been built around the format provided by kill -3:

 

https://tda.dev.java.net/

 

Perhaps your experience is different, but I've always been able to identify the source of the problem with a kill -3 thread dump.
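
The correlation with thread-level top output mentioned above works because HotSpot thread dumps print each thread's native ID as a hex `nid=0x...` field, while `top -H` and `ps -L` show the same ID in decimal. Here is a small Python sketch of that lookup. The function names and the sample header line are hypothetical, purely for illustration.

```python
from typing import Optional


def tid_to_nid(tid: int) -> str:
    """Format a decimal thread ID (as shown by top -H) as the dump's nid token."""
    return f"nid={tid:#x}"


def find_thread_header(dump_text: str, tid: int) -> Optional[str]:
    """Return the thread header line whose nid matches tid, or None."""
    marker = tid_to_nid(tid)
    for line in dump_text.splitlines():
        # Compare whole whitespace-separated tokens so 0x1a2 never matches 0x1a2b.
        if marker in line.split():
            return line
    return None


# Hypothetical header line in the usual HotSpot layout.
SAMPLE = (
    '"ajp-8009-12" daemon prio=10 tid=0x00002aaaf4a1c800 nid=0x1a2b runnable\n'
    '   java.lang.Thread.State: RUNNABLE\n'
)
print(find_thread_header(SAMPLE, 6699))
```

So if top shows thread 6699 burning CPU, you would look for `nid=0x1a2b` in the dump taken at the same moment.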

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 19, 2009 9:19 AM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

Two additional things:

 

  1. Can you please follow the instructions here and run the bi-directional datagram test?
    http://wiki.tangosol.com/display/COH34UG/Performing+a+Datagram+Test+for+Network+Performance
    The coherence.jar can be found in the WEB-INF/lib of the exploded war.
  2. How many communities do you have?
Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 23, 2009 1:55 PM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

Thanks, looking forward to the results!  If you could make a note of the specific URLs that were slow during your thread dumps, that will help me analyze them.  Also, I did mean spaces.  We tend to use "communities" and "spaces" synonymously around here, so sometimes I forget the correct term to use.

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 24, 2009 2:59 PM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

There are a lot of thread dumps to pore through and analyze here.  Were they all taken during a time when there was a performance problem or were some of them taken during normal load?  I'll have a look through them and get back to you on this tomorrow. 

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 25, 2009 7:35 AM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

Here's my analysis from looking at the first thread dumps on application server #1:

 

 

console.log.dump1.1-4: Nothing happening
console.log.dump1.5-10: View communities action
console.log.dump1.11: ActivityManagerProxy.getRecentActivity
console.log.dump1.12: RecentContentWidget and ActivityManagerProxy.getRecentActivity (same thread)
console.log.dump1.13: RecentContentWidget (same thread)
console.log.dump1.14: YourStatusUpdatesWidget and ActivityManagerProxy.getRecentActivity
console.log.dump1.15-19: ActivityManagerProxy.getRecentActivity (same thread)
console.log.dump1.19: CommunityProxy.getParentContainer
console.log.dump1.20: Execute Freemarker result
console.log.dump2.1-5: Nothing happening
console.log.dump2.6: Execute Freemarker result
console.log.dump2.7-19: Nothing happening
console.log.dump2.20: Downloading image


 

Thus far, the thread dumps on application server #2 seem to have a similar pattern emerging:

 

 

console.log.dump1.1: RenderUtils.renderSubjectToText, popularity determination worker (background task)
console.log.dump1.2: CommunityAction.getDocumentCount, RenderUtils.renderSubjectToText (same thread?), popularity determination worker
console.log.dump1.3-5: ActivityManagerProxy.getRecentActivity, popularity determination worker
console.log.dump1.6: CommunityActionsWidget, execute Freemarker result, popularity determination worker
console.log.dump1.7: PopularBlogPostsWidget, ActivityManagerProxy.getRecentActivity, popularity determination worker
console.log.dump1.8: PopularBlogPostsWidget (same thread), ActivityManagerProxy.getRecentActivity, popularity determination worker
console.log.dump1.9: IteratorProxy, ActivityManagerProxy.getRecentActivity, popularity determination worker
console.log.dump1.10: IteratorProxy, ActivityProxy.getJiveObject, popularity determination worker


 

After reviewing the first 50 thread dumps, my main area of concern is with the DB.  It seems as if there must be quite a few queries running without indexes.  There are a few things that will help us troubleshoot this:

 

  1. Did the data in your system grow organically or was it loaded by some sort of batch process?
  2. If the data was batched in, have you rebuilt your indexes since performing the upload (i.e. quick repair on the tables that were loaded)?
  3. Would it be possible to get a local copy of your database for testing purposes?  I'd be very interested in setting this up locally to see if I can reproduce the slowness issues.  If so, the dump can be uploaded to ftp.jivesoftware.com (jive_customer / hummbopp).  Just let me know the name of the file you uploaded. 
  4. Do the problems go away when you drop to a single node, unclustered?  From the thread dumps, it looks like you'll have these problems regardless of whether the clustering is enabled.  This will provide a good double-check of that assessment.
  5. What kinds of customizations do you have? 
  6. Can you please provide a report of the 10 longest running queries from MySQL?

 

Thanks!

Austen

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 25, 2009 10:31 AM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?
bretm wrote: "Have you been able to make any headway understanding the cache effectiveness data I posted, or coming up w/ some recommended changes there?  I'm looking to do a point release in the next week or so, so I really need to pull the trigger on any changes there."

As LG pointed out below, you need to increase the size for two of the caches to the size I have recommended below (document ID cache and the document versions cache).  This can be done in the admin console under System > Settings > Caches > Edit Caches.  I don't think we need to change the lifespans for any of the caches or run the datagram test, given the historical success stats.

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 30, 2009 9:40 AM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

Hi Bret,

 

I think it is pretty clear from these query times that there are some problems with the DB.  Have you shown these queries to your DBA?  Here's one that really grabs my attention:

 

Count: 1  Time=26.09s (26s)  Lock=0.00s (0s)  Rows=0.0 (0), clearspace_user[clearspace_user]@app02
  INSERT INTO jiveWidgetFrmProp (frameID, name, propValue) VALUES (N,'S','S')

 

A simple insert took 26 seconds to complete.  Several of the other queries that should be returning very quickly are also taking an exceedingly long time.  Consider this one:

 

Count: 4  Time=22.36s (89s)  Lock=0.00s (0s)  Rows=1.0 (4), clearspace_user[clearspace_user]@2hosts
  SELECT id FROM jiveID WHERE idType=N

 

The jiveID table should have less than 100 records in it and this query is returning only 1 of them, so there is no way that it should ever take this long for the query to complete.  Can you please work with your DBA to validate these numbers and, if they are accurate, to correct the DB problems? 

 

Thanks,

Austen

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 30, 2009 2:36 PM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?
bretm wrote: "Can you provide what queries you'd like to ensure are functioning within a given range?  I can then run some ad-hoc testing of these manually, in addition to the reports from the dba's."

 

There are thousands of queries in the application, so giving a few sample queries will be a bit like looking for the needle in the haystack.  Can you please upload your entire slow queries log to this case so I can do some analysis?  I did see that the counts were low, but they are concerning nonetheless.  The other queries that show up in that list should likewise be very quick to complete. 

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Apr 1, 2009 8:29 AM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

These numbers should be of great concern.  All of those queries should be returning in sub-millisecond times, not multiple seconds and certainly not an average of 10-20s.  Has your DBA done an analysis of the DB to see what the problem might be?  The lock time on the queries is really low, so it doesn't appear to be a transactional issue.
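
To scan a long slow-query report for entries like these, something along the following lines could help. It is only a sketch: it assumes the report uses the same summary-line layout as the two entries quoted earlier in this thread (which looks like mysqldumpslow output), and the function name and the 1-second threshold are arbitrary choices for illustration.

```python
import re
from typing import Dict, Optional

# Matches e.g. "Count: 4  Time=22.36s (89s)  Lock=0.00s (0s)  Rows=1.0 (4), ..."
SUMMARY = re.compile(
    r"Count:\s*(?P<count>\d+)\s+"
    r"Time=(?P<avg>[\d.]+)s\s+\((?P<total>\d+)s\)\s+"
    r"Lock=(?P<lock>[\d.]+)s"
)


def parse_summary(line: str) -> Optional[Dict[str, float]]:
    """Extract count, average time, total time and lock time from a summary line."""
    m = SUMMARY.search(line)
    if m is None:
        return None
    return {
        "count": int(m["count"]),
        "avg_s": float(m["avg"]),
        "total_s": int(m["total"]),
        "lock_s": float(m["lock"]),
    }


# The two summary lines quoted earlier in this thread.
LINES = [
    "Count: 1  Time=26.09s (26s)  Lock=0.00s (0s)  Rows=0.0 (0), clearspace_user[clearspace_user]@app02",
    "Count: 4  Time=22.36s (89s)  Lock=0.00s (0s)  Rows=1.0 (4), clearspace_user[clearspace_user]@2hosts",
]

for line in LINES:
    entry = parse_summary(line)
    # Slow on average but with negligible lock time: not explained by lock waits.
    if entry and entry["avg_s"] > 1.0 and entry["lock_s"] < 0.01:
        print(entry)
```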

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Apr 15, 2009 10:49 AM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

What is your experience when running these queries directly against the DB?  Many of them are filtering only on the primary key, which is indexed and should have a constant time access.  Is the DB running on a VM?  What kind of memory/CPU/network stats do you see on the DB server?  Is this running on MySQL 5.0.x?  Which version of the connector are you running?

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Apr 16, 2009 8:38 AM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

Two things that I would like you to try:

 

  1. Try the 5.1 version of the MySQL connector so we can rule that out (http://dev.mysql.com/downloads/connector/j/5.1.html)
  2. Set up an ssh tunnel so you can hit the application server directly to rule out any connector problems (e.g., if your app server runs on port 8080, ssh -L 8080:localhost:8080 <username>@<server>, then hit http://localhost:8080 and see if you can still replicate the performance problem)
Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Apr 16, 2009 9:00 AM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

Can you not replicate the problem with the same data in your staging environment?  From the thread dumps, it doesn't look like it takes any load to reproduce the issue.  The point of this exercise is to eliminate as many parts of the equation as possible so we can identify the true culprit.  By removing Apache from the equation, we can definitively say whether it has anything to do with the problem.  MySQL connectors have been known to have bugs and it could be something in the JDBC layer causing a problem, so switching the connector would help us rule out a problem with the specific connector that you are using.

LG Novice 88 posts since
Feb 16, 2006
Mar 31, 2009 12:08 PM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

Hi Bret,


if you really want to take a look at the SQL queries you may use the very simple one: "SELECT id FROM jiveID WHERE idType=N"
Austen did write: "The jiveID table should have less than 100 records in it and this query is returning only 1 of them, so there is no way that it should ever take this long for the query to complete."

http://dev.mysql.com/tech-resources/articles/4.1/prepared-statements.html and http://dev.mysql.com/doc/refman/5.1/en/sql-syntax-prepared-statements.html show examples of how to run prepared statements from the command line.

 

You may return all valid IDs (less than 100) and then pick a random one; I assume that the ID 1 exists:

mysql> SELECT id FROM jiveID;
mysql> PREPARE stmt_name FROM "SELECT id FROM jiveID WHERE idType=?";
mysql> SET @test_parm = 1;
mysql> EXECUTE stmt_name USING @test_parm;
mysql> DEALLOCATE PREPARE stmt_name;

 

Anyhow I still wonder whether Clearspace has the same issues when you use a JBoss/Tomcat HTTP connector. Which Java options are you using (PermSize, GC settings, ...)? Long-running garbage collections because of a small PermSize can cause such trouble. However, the JVM should then report OutOfMemory errors sooner or later, which does not seem to be the case for you.

 

LG

LG Novice 88 posts since
Feb 16, 2006
Mar 25, 2009 1:20 PM in response to: Austen Rustrum
Re: Low effectiveness percentages for caches... a problem?

Hi Austen,

 

how would "Can you please provide a report of the 10 longest running queries from MySQL?" help? Looking at the stack traces I see only a few "com.mysql.jdbc.Connection.execSQL()" calls, so I would say that the SQL is executed very fast.

 

-------

 

There are 80 "ajp-xxx-8009-NN" threads, I wonder whether there is a bottleneck within Apache or mod_jk. Do you encounter the same performance problems when you use the HTTP connector of JBoss?

 

LG

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 25, 2009 1:33 PM in response to: LG
Re: Low effectiveness percentages for caches... a problem?

The AJP threads are coming from the Apache connector and are mostly sitting around waiting for work to do and are nothing to be concerned with.  In fact, the majority of the ~300 threads are pretty boring in the thread dumps, just sitting around and waiting for something to do.  The threads that are actually doing something will show a stack that is somewhere in the Jive code.  If you look at thread dumps over a period of time, you can search for long running threads (same thread ID in the same section of code).  There were a few areas of concern from what I could see, with ActivityManagerProxy and ViewCommunitiesAction being the worst offenders.  The problem isn't necessarily in the DB--it could simply be in the number of calls that are being made--but it definitely could be a DB problem, so we need to rule that out.  By analyzing the long running queries, we'll be able to determine if there is anything in that layer that is cause for concern.
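
As a rough illustration of the "same thread in the same section of code" search described above, here is a Python sketch that compares the first application-level frame of each thread across several dumps. It assumes the usual HotSpot layout (a quoted thread name on the header line, frames starting with `at `), uses the thread name as the identity, and takes the package prefix as a parameter. The `com.example.` prefix and the sample dump are hypothetical, not taken from the real dumps.

```python
import re
from typing import Dict, List

HEADER = re.compile(r'^"(?P<name>[^"]+)"')


def first_app_frame(dump_text: str, prefix: str) -> Dict[str, str]:
    """Map thread name -> first stack frame whose class starts with prefix."""
    frames: Dict[str, str] = {}
    current = None
    for line in dump_text.splitlines():
        m = HEADER.match(line)
        if m:
            current = m.group("name")
            continue
        stripped = line.strip()
        if (current is not None and current not in frames
                and stripped.startswith("at " + prefix)):
            frames[current] = stripped[3:]
    return frames


def long_running(dumps: List[str], prefix: str) -> Dict[str, str]:
    """Threads whose first application frame is identical in every dump."""
    per_dump = [first_app_frame(d, prefix) for d in dumps]
    if not per_dump:
        return {}
    return {
        name: frame
        for name, frame in per_dump[0].items()
        if all(d.get(name) == frame for d in per_dump[1:])
    }


# Hypothetical dump: one busy thread, one idle thread.
DUMP = (
    '"ajp-8009-1" daemon prio=10 tid=0x1 nid=0x10 runnable\n'
    '   java.lang.Thread.State: RUNNABLE\n'
    '\tat java.net.SocketInputStream.socketRead0(Native Method)\n'
    '\tat com.example.ActivityManagerProxy.getRecentActivity(ActivityManagerProxy.java:42)\n'
    '\n'
    '"ajp-8009-2" daemon prio=10 tid=0x2 nid=0x11 in Object.wait()\n'
    '\tat java.lang.Object.wait(Native Method)\n'
)
print(long_running([DUMP, DUMP], "com.example."))
```

Idle threads never show an application frame, so they drop out automatically. With only one dump every busy thread is reported, so this is only meaningful across several dumps taken a few seconds apart.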

LG Novice 88 posts since
Feb 16, 2006
Mar 25, 2009 9:46 AM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

Hi,

 

just my two cents about cache sizes:

Document ID Cache              0.12 MB   0.12 MB 98.0%
Document Versions Cache        0.12 MB   0.12 MB 92.4%

 

I would double the size of both, and if they fill again >90% I'd double them again and again. Even setting them to 10 MB should not be a problem within your environment.

 

LG

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 25, 2009 10:54 AM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

One more thing:

 

Since you have the memory capacity to do so, you can switch to the Large Site preset in the Edit Caches page.  This will bump up the sizes on all of your caches to a total size of ~575MB and will increase the sizes of the two problem caches to 0.5MB.  If you haven't had a lot of traffic go against the site and you expect to have some decent load, this would be a good idea.

LG Novice 88 posts since
Feb 16, 2006
Mar 26, 2009 9:54 AM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

bretm wrote: "the threading on this is getting whacked... is this something fixable on your side, or a preference I need to set?"

Going to preferences and selecting "flat view" for threads improves things a lot.

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 25, 2009 10:20 AM in response to: LG
Re: Low effectiveness percentages for caches... a problem?

I'd agree with that, except I would bump them each up to 1MB, especially if you use documents heavily.  Also, once your caches start to get about 60% full, you should start thinking about increasing their size as they can roll over quickly.
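
The sizing rule of thumb discussed here (double a cache once it fills past a threshold) can be written down as a tiny Python helper. This is just a sketch: the function name is hypothetical, sizes are in bytes as in jive_startup.xml, and the threshold is a parameter so you can use either the >90% figure or the more conservative 60% mentioned above.

```python
def suggest_size(size_bytes: int, fill_pct: float, threshold_pct: float = 90.0) -> int:
    """Double the cache size when it is fuller than threshold_pct; otherwise keep it."""
    return size_bytes * 2 if fill_pct > threshold_pct else size_bytes


# Illustrative numbers only: a 128 KB cache that is 98% full gets doubled.
print(suggest_size(131072, 98.0))
# With the 60% threshold, a 1 MB cache at 65% full is doubled as well.
print(suggest_size(1048576, 65.0, threshold_pct=60.0))
```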

LG Novice 88 posts since
Feb 16, 2006
Mar 18, 2009 9:46 AM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

Hi,

 

your graphs are not very helpful. To see if you have a bottleneck you need to run "vmstat 1" on your server and look for peaks - "vmstat 60" does not provide enough detail to see why one request took so long. Also the memory usage does not show how often a GC occurs. You should start Clearspace's JVM with additional parameters like

-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-Xloggc:/tmp/gc.log

and monitor also the /tmp/gc.log file. It could help to use another garbage collector, but one should look at this file before trying this.

 

LG

 

@Austen: Thanks for the info. If the cache uses a weak hashmap then one does not really need to care about memory. The JVM will purge the cache when needed - of course that's not a very good solution but better than an OutOfMemory error.

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 26, 2009 10:52 AM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

Bret,

 

It certainly won't hurt.  On the long running ViewCommunitiesAction, I did see it engaged in various counting operations for subcommunities (doc counts, blog post counts, etc.), but that section of code specifically did not surface in the thread dumps I examined.  As far as the cache effectiveness is concerned, have you tried running a load test against your staging instance?  I'm guessing you have fairly low effectiveness because you have fairly low usage at the moment?  Do you expect that to change in production or will the use be fairly limited?  If the use is limited and you want to prevent the initial hit to the DB, increasing the lifespan for your caches makes sense.  However, if not, your caches could fill up at a much quicker rate since objects won't be expiring from them as quickly.  Just something to be aware of.  Have you run any tests since increasing the sizes of the ineffective document ID and version caches?  Also, how is your community tree structured?  Is it relatively flat or is it fairly deep?  Do you use a lot of negative permissions?

 

Thanks,

Austen

LG Novice 88 posts since
Feb 16, 2006
Mar 26, 2009 11:05 AM in response to: bretm
Re: Low effectiveness percentages for caches... a problem?

Hi Bret,

 

I'd still like to know whether you encounter the same performance problems when you use the HTTP connector of JBoss.

 

LG

 

@Austen: As there are no locks in the thread dumps and no long-running SQL statements, I wonder whether it helps to change the JVM and the application server ...

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 26, 2009 11:31 AM in response to: LG
Re: Low effectiveness percentages for caches... a problem?

@LG: We can see in the thread dumps that there are long running threads on the application server, and these problems need to be resolved.  From my analysis, the HTTP connector does not appear to be a factor in the performance problem.  Can you explain what you are seeing that is causing you to think this?  Also, what makes you think that there are not long running SQL queries?  As I mentioned before, it is possible that it is not a problem, but in my experience, it is difficult to make that assessment by looking at thread dumps alone, especially if they were taken several seconds apart.  DB metrics are a far more reliable source of information in this regard.

LG Novice 88 posts since
Feb 16, 2006
Mar 26, 2009 12:48 PM in response to: Austen Rustrum
Re: Low effectiveness percentages for caches... a problem?

Hi Austen,

grep -i mysql console.log.dump* | grep -v "MySQL Statement Cancellation Timer"

shows that there are a few SQL statements executed, anyhow max. 1 per dump. So I would say there are no long-running transactions.


"From my analysis, the HTTP connector does not appear to be a factor in the performance problem."

I assume you mean the AJP connector. One thing I notice is that ~65 AJP threads are "RUNNABLE" and ~15 are "WAITING (on object monitor)" - I have no idea why. Anyhow, this may be completely irrelevant if Apache is the bottleneck. So testing this with an HTTP connector (port 808x) simply makes sure that Apache/AJP is not causing trouble.

 

LG

Austen Rustrum Jive Employee 6,752 posts since
Feb 19, 2008
Mar 27, 2009 6:03 AM in response to: LG
Re: Low effectiveness percentages for caches... a problem?

Hi LG,

 

Here's what a connection to the DB looks like in a stack trace:

 

   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:113)
    at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:160)
    at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:188)
    - locked <0x00002aaad308da50> (a com.mysql.jdbc.util.ReadAheadInputStream)
    at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1994)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2411)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2916)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1631)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1723)
    at com.mysql.jdbc.Connection.execSQL(Connection.java:3250)
    - locked <0x00002aaad2f89030> (a java.lang.Object)
    at com.mysql.jdbc.Connection.setTransactionIsolation(Connection.java:5704)
    - locked <0x00002aaad2fed148> (a com.mysql.jdbc.Connection)
    at sun.reflect.GeneratedMethodAccessor98.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.logicalcobwebs.proxool.WrappedConnection.invoke(WrappedConnection.java:162)
    at org.logicalcobwebs.proxool.WrappedConnection.intercept(WrappedConnection.java:87)
    at $java.io.Serializable$$EnhancerByProxool$$b37f4c28.setTransactionIsolation(<generated>)
    at com.jivesoftware.base.database.ConnectionPool.createCon(ConnectionPool.java:257)
    at com.jivesoftware.base.database.ConnectionPool.getConnection(ConnectionPool.java:164)
    at com.jivesoftware.base.database.DefaultConnectionProvider.getConnection(DefaultConnectionProvider.java:106)
    at com.jivesoftware.base.database.dao.JiveDataSource.getConnection(JiveDataSource.java:34)
    at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:113)
    at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:79)
    at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:577)
    at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:641)
    at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:670)
    at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:678)
    at org.springframework.jdbc.core.JdbcTemplate.queryForObject(JdbcTemplate.java:721)
    at org.springframework.jdbc.core.simple.SimpleJdbcTemplate.queryForObject(SimpleJdbcTemplate.java:169)
    at com.jivesoftware.base.database.dao.JiveJdbcOperationsTemplate.queryForObject(JiveJdbcOperationsTemplate.java:174)
    at com.jivesoftware.community.impl.dao.UserContainerDAOImpl.getContainer(UserContainerDAOImpl.java:80)
    at com.jivesoftware.community.impl.UserContainerManagerImpl.getUserContainer(UserContainerManagerImpl.java:49)
    at com.jivesoftware.community.JiveContainerManagerImpl.getJiveContainer(JiveContainerManagerImpl.java:111)
    at com.jivesoftware.community.impl.DbBlog.getJiveContainer(DbBlog.java:1894)
    at com.jivesoftware.community.util.JiveContainerPermHelper.isContainerModerator(JiveContainerPermHelper.java:67)
    at com.jivesoftware.community.util.CommentPermHelper.getCanModerateComments(CommentPermHelper.java:46)
    at com.jivesoftware.community.proxy.CommentManagerProxy.getCommentCount(CommentManagerProxy.java:58)
    at com.jivesoftware.community.proxy.CommentManagerProxy.getCommentCount(CommentManagerProxy.java:49)

 

I'm not sure what the MySQL Statement Cancellation Timer is, but it looks like maybe a background thread to kill off connections that are not responding.  In any case, it appears that it misses DB activity.  Also, as I pointed out above, thread dumps are a snapshot in time.  A query that runs for 3 seconds is still a long-running query, especially if it is run multiple times per page request.  Thread dumps that are spaced out over time may or may not catch that.
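
To make the snapshot point a bit more concrete, here is a minimal Python sketch that lists, for one dump, the threads whose stacks pass through the MySQL driver the way the trace above does. It assumes HotSpot-style dumps (a quoted thread name on the header line, frames prefixed with "at"). The function names and the sample dump are made up for illustration; they are not part of Clearspace or the MySQL driver.

```python
import re

# Assumption: HotSpot-style dump, where each thread starts with a quoted name.
THREAD_HEADER = re.compile(r'^"([^"]+)"')


def parse_threads(dump_text):
    """Return a list of (thread_name, frames) pairs from one thread dump."""
    threads = []
    frames = None
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            # A new thread block begins; collect its frames in a fresh list.
            frames = []
            threads.append((header.group(1), frames))
        elif frames is not None and line.strip().startswith("at "):
            frames.append(line.strip()[3:])
    return threads


def threads_in_mysql_driver(dump_text):
    """Names of threads with at least one com.mysql.jdbc frame on the stack."""
    return [
        name
        for name, frames in parse_threads(dump_text)
        if any(frame.startswith("com.mysql.jdbc.") for frame in frames)
    ]


# Hypothetical two-thread dump, shortened for illustration.
SAMPLE_DUMP = '''"ajp-10.5.30.13-8009-8" daemon prio=10 tid=0x01 nid=0x02 runnable
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1994)
    at com.jivesoftware.base.database.ConnectionPool.getConnection(ConnectionPool.java:164)

"MySQL Statement Cancellation Timer" daemon prio=10 tid=0x03 nid=0x04 in Object.wait()
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.util.TimerThread.mainLoop(Timer.java)
'''

if __name__ == "__main__":
    print(threads_in_mysql_driver(SAMPLE_DUMP))
```

Matching on stack frames rather than on the word "mysql" keeps a thread that only has MySQL in its name out of the count. Even so, running something like this over every dump only samples the DB activity: a query that starts and finishes between two dumps never shows up, which is why DB-side metrics are the better source.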

 

"I assume you mean the AJP connector. One thing I notice is that ~65 AJP threads are "RUNNABLE" and ~15 are "WAITING (on object monitor)" - I have no idea why. Anyhow, this may be completely irrelevant if Apache is the bottleneck. So testing this with an HTTP connector (port 808x) simply makes sure that Apache/AJP is not causing trouble."

 

Yes, thank you for catching that, I did mean AJP.  It would be a good test to see if hitting the app server directly helps any, but empirical evidence thus far suggests that the problem is on the app server.  As for the difference in the runnable vs. waiting--perhaps only a few connections are actively waiting for work and the rest remain dormant until the load on the app server increases?  Just a thought...
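
One thing that may help with the runnable vs. waiting question: as the DB trace earlier in this thread shows, a thread sitting in a native socket read is reported as RUNNABLE, so the state alone does not say whether an AJP thread is busy. Here is a rough Python sketch that groups AJP threads by state and top stack frame. It assumes HotSpot-style dumps and thread names starting with "ajp-"; `ajp_state_summary` and the sample dump are hypothetical.

```python
import re
from collections import Counter

# Assumption: the state line directly follows the thread header line.
AJP_THREAD = re.compile(
    r'^"(ajp-[^"]*)"[^\n]*\n'                    # header line with the thread name
    r'\s+java\.lang\.Thread\.State: ([^\n]+)\n'  # e.g. WAITING (on object monitor)
    r'\s+at ([^\n(]+)',                          # top frame, without "(File:line)"
    re.MULTILINE,
)


def ajp_state_summary(dump_text):
    """Count AJP threads grouped by (thread state, top stack frame)."""
    return Counter(
        (match.group(2).strip(), match.group(3).strip())
        for match in AJP_THREAD.finditer(dump_text)
    )


# Hypothetical dump with three AJP threads, shortened for illustration.
SAMPLE_DUMP = '''"ajp-10.5.30.13-8009-1" daemon prio=10 tid=0x01 nid=0x11 runnable
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)

"ajp-10.5.30.13-8009-2" daemon prio=10 tid=0x02 nid=0x12 runnable
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)

"ajp-10.5.30.13-8009-3" daemon prio=10 tid=0x03 nid=0x13 in Object.wait()
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
'''

if __name__ == "__main__":
    for (state, frame), count in ajp_state_summary(SAMPLE_DUMP).most_common():
        print(count, state, frame)
```

If most of the RUNNABLE AJP threads turn out to have socketRead0 as their top frame, they are probably just waiting on Apache for the next request rather than doing work.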

LG Novice 88 posts since
Feb 16, 2006
Mar 28, 2009 12:25 AM in response to: Austen Rustrum
Re: Low effectiveness percentages for caches... a problem?

Hi Austen,

 

I really wonder what "WAITING" means. I took a look at app01/console.log.dump.1.5 - 1.20 and searched for "ajp-10.5.30.13-8009-8". This seems to be an active or passive AJP thread which takes ages to complete. I have no idea whether this is a single request or not, but it seems that in dumps 1.5-1.10 it works/hangs in "com.jivesoftware.community.action.ViewCommunitiesAction.execute(ViewCommunitiesAction.java:53-55)".

In the dumps 1.11-1.17 it seems that "com.jivesoftware.community.web.struts.FreemarkerResult.doExecute" works/hangs with "WAITING" while in 1.18-1.20 the thread is RUNNABLE.

It's quite unusual to get dumps which look so similar, so one may really wonder whether the JVM has a problem. I am also surprised that the line numbers are always displayed. For Jive SBS you have a setup script which sets ulimits and other system parameters. I wonder whether this could help.
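
Following one thread by hand through a series of dumps, as described above, can be sketched in a few lines of Python. This is only an illustration: it assumes HotSpot-style dumps with thread blocks separated by blank lines, and the helper names, dump labels and sample text are hypothetical.

```python
import re


def thread_block(dump_text, thread_name):
    """Return the lines of the block for thread_name, or None if absent."""
    # Assumption: thread blocks in a dump are separated by blank lines.
    for block in re.split(r"\n\s*\n", dump_text):
        lines = block.strip().splitlines()
        # Match the quoted name exactly so "...-8" does not match "...-80".
        if lines and lines[0].startswith('"%s"' % thread_name):
            return lines
    return None


def summarize(dump_text, thread_name, package="com.jivesoftware."):
    """Return (state, first frame inside `package`) for one thread in one dump."""
    lines = thread_block(dump_text, thread_name)
    if lines is None:
        return None
    state = None
    app_frame = None
    for line in lines[1:]:
        text = line.strip()
        if text.startswith("java.lang.Thread.State:"):
            state = text.split(":", 1)[1].strip()
        elif app_frame is None and text.startswith("at " + package):
            app_frame = text[3:]
    return state, app_frame


def follow_thread(dumps, thread_name):
    """Map each dump label to the summary of thread_name in that dump."""
    return {label: summarize(text, thread_name) for label, text in dumps.items()}


# Hypothetical, heavily shortened dumps keyed by their file suffix.
SAMPLE_DUMPS = {
    "1.5": '''"ajp-10.5.30.13-8009-8" daemon prio=10 tid=0x01 nid=0x02 runnable
   java.lang.Thread.State: RUNNABLE
    at java.util.HashMap.get(HashMap.java)
    at com.jivesoftware.community.action.ViewCommunitiesAction.execute(ViewCommunitiesAction.java:53)
''',
    "1.11": '''"ajp-10.5.30.13-8009-8" daemon prio=10 tid=0x01 nid=0x02 in Object.wait()
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at com.jivesoftware.community.web.struts.FreemarkerResult.doExecute(FreemarkerResult.java)
''',
}

if __name__ == "__main__":
    for label, summary in follow_thread(SAMPLE_DUMPS, "ajp-10.5.30.13-8009-8").items():
        print(label, summary)
```

Seeing the same application frame in several consecutive dumps suggests the thread is stuck or looping there, but it cannot show whether it was one request or several in a row.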

 

LG

 

PS: Also http://markmail.org/message/tzq5pyt476yzh5xo#query:org.apache.tomcat.util.net.JIoEndpoint%24Worker.await(JIoEndpoint.java%3A416)+page:1+mid:qwox7atferka7kma+state:results describes a problem which sounds similar: "... a POST request ... can take 250s ... while this same request can pass OK in hundreds of ms. ..." There the situation could be improved by raising the open-files ulimit (nofile) from 1024 to 4096:

" ... to persist on reboot edit your /etc/security/limits.conf and add follow lines:

*       soft    nofile          4096

*       hard    nofile          10240

..."
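
Before changing limits.conf, it may be worth confirming which limit a process actually sees. Below is a small sketch using Python's standard `resource` module (Unix only). Note the assumption: it reports the limits of the Python process itself, which match the app server's only if both are started by the same user in the same way. On Linux, the limits of an already running process can be read from /proc/<pid>/limits instead. The function names are made up for illustration.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("open files: soft=%s hard=%s" % (describe(soft), describe(hard)))
```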

Senthil Vaiyapuri Novice 206 posts since
Oct 23, 2008
Mar 29, 2009 9:58 PM in response to: LG
Re: Low effectiveness percentages for caches... a problem?

Hi,

 

  This thread piqued my interest as well.  Here are some suggestions.

 

  1.  A considerable number of AJP processors are waiting for work from the web server.

       Please consider setting the connectionTimeout  and connection_pool_timeout if applicable,

       to recycle the connections.

       http://tomcat.apache.org/connectors-doc/generic_howto/timeouts.html

 

        Also, please review the following link; JBoss's default mod_jk configuration seems

        to be inadequate

        http://www.jboss.org/community/docs/DOC-11543        /* yay, using clearspace */

 

  2.  Also, it may be worthwhile to check the Min(Max)Servers configuration set (if any) in

       Apache configuration as well.

 

  3.  Are you using the Apache Tomcat APR Native Library?  If so, please try taking that out and

       see whether it gives any relief.

       http://tomcat.apache.org/tomcat-6.0-doc/apr.html

 

  4.  MySQL: are any queries showing up in the slow query log (if it is set up to log)?  If so, please check the access paths;

       it may be time to optimize some heavily accessed tables.

 

Best Regards,

-senthil
