Tuesday, October 9, 2007

Broken pipes with Tomcat and DBCP connection pooling

Getting "broken pipe" errors showing up in your tomcat logs? Happening every 8 hours or so?I had this problem with MySQL 4.1.21 and I've managed to fix it.

(NB. if you are running 5.0.45 MySQL then you'll get a "com.mysql.jdbc.CommunicationsException: Communications link failure" error which is a symptom of the same problem)

A quick search of the web revealed that lots of people are getting these and there are LOTS of suggested solutions. The most common explanation is that your MySQL connections are getting terminated by MySQL before they have been released by the connection pool. Then the next time you try to get a connection from the pool, you'll get a dead connection which will give a "broken pipe" error.

The first step is to try and reproduce it, without waiting around for 8 hours!

I checked the MySQL timeout setting via "mysql> select @@global.wait_timeout". Mine was set to 28,800 which is, not suprisingly, 8 hours. I then reduced this to one minute by:
  • stopping MySQL ("/usr/local/bin/mysqladmin shutdown"),
  • modifying the /etc/my.inf file and adding a "wait_timeout=60" under the "[mysqld]" heading and saving,
  • restarting MySQL ("/usr/local/mysql/mysqld_safe")
Don't worry if your session timeout hasn't changed ("mysql> select @@session.wait_timeout") - it doesn't seem to affect your terminal session. Your global session will have changed, and new connections will use the global setting.

Next I started my application and saw my connections appear (via "mysql> show processlist;") and then disappear at the 60 second mark. Trying to use the app again resulted in the broken pipe error HOORAY! Retrying the app resulted in one broken pipe error per connection in the pool, then it all works OK.

Now the fix.

Appending ?autoconnect=true to the JDBC driver connection string didn't work.

Killing the idle session before MySQL did was the solution. With the MySQL timeout still set at 60 seconds, I decided to evict idle session after 30 seconds by adding the following to the data source:

minEvictableIdleTimeMillis=30000 (evict after 30 seconds inactivty)
timeBetweenEvictionRunsMillis=10000 (run the evicter every 10 seconds)
numTestsPerEviction=-1 (check every connection)

Now, running the same test above, the connections disappear from the list after about 40 seconds. And retrying the ap just creates a new connection. HOORAY! Fixed!! If you are not happy with your connection pool getting wiped out then you can set "minIdle=n" against the data source. When the items from the pool are evicted, new ones are crested to keep the pool full.

In the end I went for a 6 hour idle eviction, checking very 30 mins.