Learn how to use the thread dump facility in IBM® WebSphere® Application Server V6.1 to examine your system environment, investigate whether a deadlock is happening, and extract information to help you avoid or resolve deadlock situations with your own applications.
A deadlock occurs when two or more threads form a cyclical dependency on each other. For example, a deadlock occurs if threadA is in a wait state waiting on threadB, while threadB is in a wait state waiting on threadA. Once this condition is established, neither threadA nor threadB will ever make any progress, since these two threads are now hanging indefinitely. Why would anyone ever create such a system? Well, you wouldn't intentionally, but a large volume of threads and complex transactions makes it easy for this situation to occur.
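The threadA/threadB cycle described above can be reproduced in a few lines. The following is a minimal sketch, not code from the application discussed in this article: the class name and lock objects are hypothetical, and it relies on the standard ThreadMXBean.findMonitorDeadlockedThreads() call to confirm that the JVM sees the cycle. Note that this call only reports cycles of threads blocked on monitor entry; a deadlock that involves a thread parked in Object.wait(), like the one examined later in this article, would likely not be reported this way.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class TwoThreadDeadlock {
    // Builds a daemon thread that locks 'first', waits until both threads hold
    // their first lock, and then tries to lock 'second'.
    static Thread locker(String name, Object first, Object second, CountDownLatch bothReady) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothReady.countDown();
                try {
                    bothReady.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // Never reached: the other thread owns 'second' and wants 'first'.
                }
            }
        }, name);
        t.setDaemon(true); // daemon threads let the JVM exit despite the deadlock
        return t;
    }

    // Returns the number of threads the JVM reports as deadlocked on monitors
    // (0 if none were reported within about five seconds).
    static int createAndDetect() {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothReady = new CountDownLatch(2);
        locker("threadA", lockA, lockB, bothReady).start();
        locker("threadB", lockB, lockA, bothReady).start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + 5000;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + createAndDetect());
    }
}
```

The latch only makes the timing deterministic for the demonstration; in a real system the same cycle forms whenever two threads happen to take the same pair of locks in opposite order.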
This article describes how to use the thread dump facility of IBM WebSphere Application Server V6.1 to investigate a system and determine if a deadlock is happening. A real-world example is referred to and described here for the purposes of illustration. In this example, an enterprise application runs in the Web container and an OSGi bundle provides access to a protocol. When traffic is ramped up sufficiently, the application running on the server hangs up so thoroughly that the only task that can be performed is to issue a kill -9 command to finally abort the process. This article describes how this typical deadlock condition was discovered and resolved with methods you can apply to avoid or debug similar conditions in your own applications.
In many protocols, state machines are used to manage the state of each protocol connection. These state machines are sometimes referred to as Connection Finite State Machines (CFSM). A state machine requires atomic actions. This means that different parts of the state machine cannot be updated at different times; when an update occurs, the whole machine must be updated in that operation so that the next operation applies to the new state, and so on. To make this happen, the state machine is structured so that only one thread can perform an operation on the state machine at a time. The access methods to the state machine are all synchronized. Threads wishing to access the state machine are blocked until the thread currently accessing it completes its method call.
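As a rough illustration of this structure, here is a minimal sketch of a state machine whose access methods are all synchronized. The class name, states, and methods are hypothetical; the source of the article's actual ProtocolCfsm_Initiator class is not shown, so this should not be read as its implementation.

```java
// Hypothetical sketch: names and states are illustrative only.
class SimpleCfsm {
    enum State { CLOSED, OPEN }

    private State state = State.CLOSED;
    private int packetsSent = 0;

    // Every access method is synchronized on the same monitor (this), so only
    // one thread at a time can read or change the state. Other callers block
    // until the current method call returns.
    synchronized void start() {
        state = State.OPEN;
    }

    synchronized void stop() {
        state = State.CLOSED;
    }

    synchronized boolean send(String packet) {
        if (state != State.OPEN) {
            return false; // cannot transmit on a closed connection
        }
        packetsSent++;
        return true;
    }

    synchronized State getState() {
        return state;
    }

    synchronized int getPacketsSent() {
        return packetsSent;
    }

    public static void main(String[] args) {
        SimpleCfsm cfsm = new SimpleCfsm();
        System.out.println(cfsm.send("p1")); // false: connection not started
        cfsm.start();
        System.out.println(cfsm.send("p2")); // true
    }
}
```

The important property is that a thread inside any of these methods holds the monitor for the whole call. If that call blocks on something else, every other thread that needs the state machine blocks behind it.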
In this example protocol, any time a packet is received, transmitted, or times out, the state can be altered. Other conditions, like starts, stops, and disconnects, can also force the state machine to be altered. These inputs to the state machine make it a high risk area for deadlocks, especially when any number of threads can cause these events to happen.
When this problem was first encountered in this application, the application server was so locked up that no logging was occurring, and even the console could not be brought up. Indeed, no TCP traffic was flowing to or from the WebSphere Application Server process. These are classic symptoms of a deadlock: CPU utilization goes to zero, nothing happens, and no requests are received. The only available tool to use for debugging was a thread dump of the WebSphere Application Server process.
Thread dumps, also called javacore files, are generated with names in this format:
javacore.date.time.id.txt
For example: javacore.20070919.204717.27050.txt
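If you collect many dumps, it can help to pull the date, time, and id out of the file name. The sketch below assumes the date is eight digits (yyyyMMdd) and the time is six digits (HHmmss), which matches the example above and the timestamp printed inside that dump; the class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JavacoreName {
    // javacore.<date>.<time>.<id>.txt, following the format shown above.
    private static final Pattern NAME =
            Pattern.compile("javacore\\.(\\d{8})\\.(\\d{6})\\.(\\d+)\\.txt");

    final String date;
    final String time;
    final int id;

    private JavacoreName(String date, String time, int id) {
        this.date = date;
        this.time = time;
        this.id = id;
    }

    // Splits a javacore file name into its parts, or throws if it does not match.
    static JavacoreName parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a javacore file name: " + fileName);
        }
        return new JavacoreName(m.group(1), m.group(2), Integer.parseInt(m.group(3)));
    }

    public static void main(String[] args) {
        JavacoreName n = parse("javacore.20070919.204717.27050.txt");
        System.out.println(n.date + " " + n.time + " id=" + n.id); // 20070919 204717 id=27050
    }
}
```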
There are some conditions where a thread dump is created automatically for your Java™ Virtual Machine (JVM), such as when WebSphere Application Server is stopped by some means other than a normal stopServer request. Thread dumps can also be triggered by issuing a signal to the WebSphere Application Server process. For example, to produce a thread dump in a UNIX® environment, you can run this command:
kill -3 process_id
where process_id is the process ID of the WebSphere Application Server JVM. A thread dump can also be created using wsadmin. To force a thread dump using the wsadmin command prompt, issue these commands:
wsadmin
wsadmin>set jvm [$AdminControl completeObjectName type=JVM,process=server1,*]
wsadmin>$AdminControl invoke $jvm dumpThreads
This creates a file such as javacore.20071012.080508.4252.txt in the was_profile_root directory.
If you don't know what to expect when you open a thread dump, here is what you'll find: a stack trace of every single thread in WebSphere Application Server. More importantly, there is also a list of all of the locks in the system. Voila!
The thread dump is a simple text file that can be opened with any text editor. In it, you will find quite a bit of interesting and useful information, as described in the next sections.
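To get a feel for the "stack trace of every thread" part without triggering a real javacore, you can take a rough in-process snapshot with the standard Thread.getAllStackTraces() call. This is only a sketch with a hypothetical class name: it is not the javacore format, and unlike a javacore it does not list the locks in the system.

```java
import java.util.Map;

class InProcessThreadSnapshot {
    // Builds a plain-text listing of every live thread, its state, and its stack frames.
    static String snapshot() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state:")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
    }
}
```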
The dump begins with some information about the environment in which the WebSphere Application Server process was running (Listing 1). It describes the OS Level, JRE Level, number of processors and so on. This is good information to know, but is not really relevant for the deadlock.
NULL ------------------------------------------------------------------------
0SECTION TITLE subcomponent dump routine
NULL ===============================
1TISIGINFO Dump Event "user" (00004000) received
1TIDATETIME Date: 2007/09/19 at 20:47:17
1TIFILENAME Javacore filename:
/usr/IBM/WebSphere/AppServer/profiles/AppSrv01/javacore.20070919.204717.27050.txt
NULL ------------------------------------------------------------------------
0SECTION GPINFO subcomponent dump routine
NULL ================================
2XHOSLEVEL OS Level : AIX 5.3
2XHCPUS Processors -
3XHCPUARCH Architecture : ppc
3XHNUMCPUS How Many : 4
NULL
1XHERROR2 Register dump section only produced for SIGSEGV, SIGILL or SIGFPE.
NULL
NULL ------------------------------------------------------------------------
0SECTION ENVINFO subcomponent dump routine
NULL =================================
1CIJAVAVERSION J2RE 5.0 IBM J9 2.3 AIX ppc-32 build j9vmap3223-20070426
1CIVMVERSION VM build 20070420_12448_bHdSMR
1CIJITVERSION JIT enabled - 20070419_1806_r8
1CIRUNNINGAS Running as a standalone JVM
You might not have an immediate need for memory information, but the heap information is interesting anyway. The values in Listing 2 are byte counts in hexadecimal: 0x31c6fb90 bytes free out of 0x40000000 bytes (one gigabyte) allocated. That large amount of free space is an indicator that there are no memory leaks present -- or at least that a memory leak is not a pressing problem at the moment. The free space is a little over three quarters (roughly 78%) of the available one gigabyte.
0SECTION MEMINFO subcomponent dump routine
NULL =================================
1STHEAPFREE Bytes of Heap Space Free: 31c6fb90
1STHEAPALLOC Bytes of Heap Space Allocated: 40000000
NULL
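Because the heap figures in this section are printed in hexadecimal, a small conversion makes the "three quarters free" reading easy to check. This is a minimal sketch with a hypothetical class name, using the two values from the listing above.

```java
class HeapFree {
    // MEMINFO values in the listing are hexadecimal byte counts.
    static long parseHexBytes(String hex) {
        return Long.parseLong(hex, 16);
    }

    // Fraction of the allocated heap that is free, between 0.0 and 1.0.
    static double freeFraction(String freeHex, String allocatedHex) {
        return (double) parseHexBytes(freeHex) / parseHexBytes(allocatedHex);
    }

    public static void main(String[] args) {
        long free = parseHexBytes("31c6fb90");      // 835124112 bytes
        long allocated = parseHexBytes("40000000"); // 1073741824 bytes
        System.out.println(free + " of " + allocated + " bytes free");
        System.out.println("Percent free (rounded): "
                + Math.round(100.0 * freeFraction("31c6fb90", "40000000")));
    }
}
```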
The lock information is the key in the quest of finding a deadlock. The lock is a resource that can only be owned by one thread at a time. Other threads waiting for that lock are blocked until the thread that owns it releases it. The protocol example state machine is a monitor or lock, and is invoked on every packet transmitted or received on a protocol connection. In this particular installation, there are three protocol threads and three protocol state machines. This means that there are three protocol connections up and running. The three threads are: ProtocolThreadPool:0, ProtocolThreadPool:1, and ProtocolThreadPool:2 (the thread is taken from a pool of threads called the "ProtocolThreadPool," hence the thread names).
In Listing 3, the first thing you will notice is that ProtocolThreadPool:0 and ProtocolThreadPool:2 are sitting there, waiting to be notified. This is typical, as the protocol thread blocks waiting to read an incoming packet. However, ProtocolThreadPool:1 is processing a received packet. From the trace, you can see that ProtocolCfsm_Initiator (a protocol state machine) is a monitor that is in use by ProtocolThreadPool:1. This is a special situation in that the protocol thread has received a packet and is processing it while in the state machine. The amount of processing time is normally quite small, as the only thing that is actually performed is the lookup of where to pass this packet. But the list of threads waiting to get access to the ProtocolCfsm_Initiator monitor is huge! There are 50 WorkManager threads, 34 WebContainer threads, and 1 Non-deferrable Alarm:0 thread. At this point, you don't know exactly why they are waiting, but any time a thread wishes to transmit a protocol packet, it must gain access to the state machine of a particular connection. Therefore, these threads can be queued up waiting to transmit a packet.
NULL
NULL ------------------------------------------------------------------------
0SECTION LOCKS subcomponent dump routine
NULL ===============================
NULL
1LKPOOLINFO Monitor pool info:
2LKPOOLTOTAL Current total number of monitors: 968
2LKMONINUSE sys_mon_t:0x3461897C infl_mon_t: 0x346189A4:
3LKMONOBJECT java/lang/Object@A05B49B8/A05B49C4:
3LKNOTIFYQ Waiting to be notified:
3LKWAITNOTIFY "ProtocolThreadPool: 0" (0x345FD800)
2LKMONINUSE sys_mon_t:0x346189D8 infl_mon_t: 0x34618A00:
3LKMONOBJECT java/lang/Object@A05AE930/A05AE93C:
3LKNOTIFYQ Waiting to be notified:
3LKWAITNOTIFY "ProtocolThreadPool: 2" (0x3464C000)
2LKMONINUSE sys_mon_t:0x355C4E94 infl_mon_t: 0x355C4EBC:
3LKMONOBJECT com/ibm/protocol/cfsm/ProtocolCfsm_Initiator@A0456A00/A0456A0C: owner
"ProtocolThreadPool: 1" (0x3464BC00), entry count 1
3LKWAITERQ Waiting to enter:
3LKWAITER "WebContainer : 60" (0x32E01B00)
3LKWAITER "WebContainer : 61" (0x33045F00)
<... 32 more of these ...>
3LKWAITER "Non-deferrable Alarm : 0" (0x33846900)
3LKWAITER "WorkManager.ProtocolWorkManager : 0" (0x35895600)
3LKWAITER "WorkManager.ProtocolWorkManager : 1" (0x3580EB00)
<... 48 more of these ...>
2LKMONINUSE sys_mon_t:0x35815D30 infl_mon_t: 0x35815D58:
3LKMONOBJECT java/lang/Object@A0A3CF90/A0A3CF9C:
3LKNOTIFYQ Waiting to be notified:
3LKWAITNOTIFY "ProtocolThreadPool : 1" (0x3464BC00)
Also, notice that ProtocolThreadPool:1 is in the ProtocolCfsm_Initiator monitor but that it also is waiting to be notified.
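Counting waiters by hand becomes tedious when a dump contains hundreds of monitors. The following sketch tallies the "Waiting to enter" entries for each monitor object in a LOCKS section. It assumes each record sits on a single line, as it would in the raw javacore file (the listing above is wrapped and abbreviated for display), and the class name and sample excerpt are for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LockWaiterCounter {
    // Abbreviated excerpt of the LOCKS section shown above, one record per line.
    static final String[] SAMPLE = {
        "2LKMONINUSE sys_mon_t:0x3461897C infl_mon_t: 0x346189A4:",
        "3LKMONOBJECT java/lang/Object@A05B49B8/A05B49C4:",
        "3LKNOTIFYQ Waiting to be notified:",
        "3LKWAITNOTIFY \"ProtocolThreadPool: 0\" (0x345FD800)",
        "2LKMONINUSE sys_mon_t:0x355C4E94 infl_mon_t: 0x355C4EBC:",
        "3LKMONOBJECT com/ibm/protocol/cfsm/ProtocolCfsm_Initiator@A0456A00/A0456A0C: "
            + "owner \"ProtocolThreadPool: 1\" (0x3464BC00), entry count 1",
        "3LKWAITERQ Waiting to enter:",
        "3LKWAITER \"WebContainer : 60\" (0x32E01B00)",
        "3LKWAITER \"WebContainer : 61\" (0x33045F00)",
        "3LKWAITER \"Non-deferrable Alarm : 0\" (0x33846900)"
    };

    // Returns monitor object -> number of threads waiting to enter that monitor.
    static Map<String, Integer> countWaiters(String[] lines) {
        Map<String, Integer> waiters = new LinkedHashMap<String, Integer>();
        String current = null;
        for (String raw : lines) {
            String[] parts = raw.trim().split("\\s+");
            if (parts[0].equals("3LKMONOBJECT") && parts.length > 1) {
                // Second token is the monitor object; drop the trailing colon.
                current = parts[1].endsWith(":")
                        ? parts[1].substring(0, parts[1].length() - 1)
                        : parts[1];
                waiters.put(current, 0);
            } else if (parts[0].equals("3LKWAITER") && current != null) {
                // Exact tag match, so the 3LKWAITERQ heading is not counted.
                waiters.put(current, waiters.get(current) + 1);
            }
        }
        return waiters;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : countWaiters(SAMPLE).entrySet()) {
            System.out.println(e.getValue() + " waiting to enter " + e.getKey());
        }
    }
}
```

Run against the full dump rather than this excerpt, the ProtocolCfsm_Initiator entry would show the 85 waiters described above, which makes the contended monitor stand out immediately.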
The next step is to determine why all of the threads are waiting. The beauty of the thread dump is that it takes a snapshot of all of the thread stacks and gives you a stack trace for every thread in the system. This area of the dump has the following header:
NULL
NULL ------------------------------------------------------------------------
0SECTION THREADS subcomponent dump routine
NULL =================================
NULL
1XMCURTHDINFO Current Thread Details
NULL ----------------------
NULL
1XMTHDINFO All Thread Details
NULL ------------------
Based on the lock information above, you can check where the threads are blocked, waiting to be notified, or waiting to be allowed into a monitor. The first waiting thread shown here is WebContainer:62. You can see that this thread is processing a Web service request, getCCServicePriceEnquiry(), which ultimately invokes sendProtocolPacket(). If you look up the line reported at the top of the stack trace (line 1196 of ProtocolBaseApiHelper.java) in the source code, you will see that the line being executed is:
cfsm.send(packet, realmName, packetCallback);
Thus, the thread is blocked on the CFSM waiting to transmit. WebContainer:61 is blocked on the exact same line of code. The only difference is that this is processing a sendCCRefund() Web service request. Nevertheless, the task ultimately stops when it is trying to transmit a packet. If you look through all of the WebContainer threads (Listings 5 and 6), they are all blocked on the exact same line.
3XMTHREADINFO "WebContainer : 62" (TID:0x33C13400, sys_thread_t:0x32FB4BA8, state:B,
native ID:0x0000C0B3) prio=5
4XESTACKTRACE at com/ibm/protocol/base/ProtocolBaseApiHelper.sendProtocolPacket
(ProtocolBaseApiHelper.java:1196(Compiled Code))
4XESTACKTRACE at com/ibm/protocol/base/ProtocolBaseApiImpl.sendProtocolPacket
(ProtocolBaseApiImpl.java:649(Compiled Code))
4XESTACKTRACE at com/ibm/protocol/ro/ProtocolInterface.transmitPacket
(ProtocolInterface.java:68(Compiled Code))
4XESTACKTRACE at com/ibm/protocol/ro/CreditControlRequest.transmitPacket
(CreditControlRequest.java:390(Compiled Code))
4XESTACKTRACE at com/ibm/protocol/ro/CreditControlRequest.sendChargingInfo
(CreditControlRequest.java:147(Compiled Code))
4XESTACKTRACE at com/ibm/protocol/ro/ProtocolRoService.sendCCR
(ProtocolRoService.java:143(Compiled Code))
4XESTACKTRACE at com/ibm/protocol/ro/ProtocolRoService.getCCServicePriceEnquiry(
ProtocolRoService.java:368(Compiled Code))
4XESTACKTRACE at sun/reflect/GeneratedMethodAccessor26.invoke(Bytecode PC:40(Compiled
Code))
4XESTACKTRACE at sun/reflect/DelegatingMethodAccessorImpl.invoke
(DelegatingMethodAccessorImpl.java:43(Compiled Code))
4XESTACKTRACE at java/lang/reflect/Method.invoke(Method.java:615(Compiled Code))
3XMTHREADINFO "WebContainer : 61" (TID:0x33045F00, sys_thread_t:0x32BB9B28, state:B,
native ID:0x0001399B) prio=5
4XESTACKTRACE at com/ibm/protocol/base/ProtocolBaseApiHelper.sendProtocolPacket
(ProtocolBaseApiHelper.java:1196(Compiled Code))
4XESTACKTRACE at com/ibm/protocol/base/ProtocolBaseApiImpl.sendProtocolPacket
(ProtocolBaseApiImpl.java:649(Compiled Code))
4XESTACKTRACE at com/ibm/protocol/ro/ProtocolInterface.transmitPacket
(ProtocolInterface.java:68(Compiled Code))
4XESTACKTRACE at com/ibm/protocol/ro/CreditControlRequest.transmitPacket
(CreditControlRequest.java:390(Compiled Code))
4XESTACKTRACE at com/ibm/protocol/ro/CreditControlRequest.sendChargingInfo
(CreditControlRequest.java:147(Compiled Code))
4XESTACKTRACE at com/ibm/protocol/ro/ProtocolRoService.sendCCR
(ProtocolRoService.java:143(Compiled Code))
4XESTACKTRACE at com/ibm/protocol/ro/ProtocolRoService.sendCCRefund(ProtocolRoService.
java:329(Compiled Code))
4XESTACKTRACE at sun/reflect/GeneratedMethodAccessor33.invoke(Bytecode PC:40(Compiled
Code))
4XESTACKTRACE at sun/reflect/DelegatingMethodAccessorImpl.invoke
(DelegatingMethodAccessorImpl.java:43(Compiled Code))
4XESTACKTRACE at java/lang/reflect/Method.invoke(Method.java:615(Compiled Code))
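Checking that many threads are blocked on the same line can be automated. The sketch below groups threads marked state:B (the threads shown as blocked in these listings) by the first frame of their stack trace. It assumes each 3XMTHREADINFO and 4XESTACKTRACE record is on a single line, as in the raw javacore file rather than the wrapped listings shown here; the class name and sample data are for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlockedFrameGrouper {
    // Captures the state code from a thread header line.
    private static final Pattern THREAD =
            Pattern.compile("^3XMTHREADINFO\\s+\"[^\"]+\".*state:(\\w+)");

    static final String[] SAMPLE = {
        "3XMTHREADINFO \"WebContainer : 62\" (TID:0x33C13400, sys_thread_t:0x32FB4BA8, state:B, native ID:0x0000C0B3) prio=5",
        "4XESTACKTRACE at com/ibm/protocol/base/ProtocolBaseApiHelper.sendProtocolPacket(ProtocolBaseApiHelper.java:1196(Compiled Code))",
        "4XESTACKTRACE at com/ibm/protocol/base/ProtocolBaseApiImpl.sendProtocolPacket(ProtocolBaseApiImpl.java:649(Compiled Code))",
        "3XMTHREADINFO \"WebContainer : 61\" (TID:0x33045F00, sys_thread_t:0x32BB9B28, state:B, native ID:0x0001399B) prio=5",
        "4XESTACKTRACE at com/ibm/protocol/base/ProtocolBaseApiHelper.sendProtocolPacket(ProtocolBaseApiHelper.java:1196(Compiled Code))",
        "3XMTHREADINFO \"ProtocolThreadPool : 0\" (TID:0x345FD800, sys_thread_t:0x345E3FA8, state:CW, native ID:0x0000F6B3) prio=5",
        "4XESTACKTRACE at java/lang/Object.wait(Native Method)"
    };

    // Returns top stack frame -> number of state:B threads whose stack starts there.
    static Map<String, Integer> groupBlockedByTopFrame(String[] lines) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        boolean awaitingTopFrame = false;
        for (String raw : lines) {
            String line = raw.trim();
            Matcher m = THREAD.matcher(line);
            if (m.find()) {
                // Only the first frame of a blocked thread is of interest.
                awaitingTopFrame = "B".equals(m.group(1));
            } else if (awaitingTopFrame && line.startsWith("4XESTACKTRACE")) {
                String frame = line.substring("4XESTACKTRACE".length()).trim();
                if (frame.startsWith("at ")) {
                    frame = frame.substring(3);
                }
                Integer old = counts.get(frame);
                counts.put(frame, old == null ? 1 : old + 1);
                awaitingTopFrame = false;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : groupBlockedByTopFrame(SAMPLE).entrySet()) {
            System.out.println(e.getValue() + " blocked at " + e.getKey());
        }
    }
}
```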
The WorkManager threads (Listing 7) are slightly different. In the protocol framework, WorkManager threads are used to perform asynchronous Web service and database requests in response to a request received from the protocol server. The WorkManager thread must also respond to the protocol server with an acknowledgement that it has received the request. All of these threads are also stopped while trying to send a packet. If you look at the line of code on which they are blocked:
cfsm.send(packet, null, packetCallback);
you see that it is the act of transmitting the acknowledgement packet back to the protocol server that is blocked. Also, notice that there are exactly 50 of these WorkManager threads that are stuck in this same situation. In this example, WebSphere Application Server was configured to have a pool of 50 WorkManager threads.
3XMTHREADINFO "WorkManager.ProtocolWorkManager : 0" (TID:0x35895600, sys_thread_t:
0x357ECAD8, state:B, native ID:0x0000B187) prio=5
4XESTACKTRACE at com/ibm/protocol/base/ProtocolBaseApiHelper.sendProtocolPacket
(ProtocolBaseApiHelper.java:1239(Compiled Code))
4XESTACKTRACE at com/ibm/protocol/base/ProtocolBaseApiImpl.sendProtocolPacket
(ProtocolBaseApiImpl.java:686(Compiled Code))
4XESTACKTRACE at com/ibm/protocol/ro/ProtocolInterface.transmitPacket(ProtocolInterface.
java:122(Compiled Code))
4XESTACKTRACE at com/ibm/protocol/ro/ReAuthRequest.createAndSendAnswerPacket
(ReAuthRequest.java:391(Compiled Code))
4XESTACKTRACE at com/ibm/protocol/ro/ReAuthRequest.run(ReAuthRequest.java:197(Compiled
Code))
4XESTACKTRACE at com/ibm/ws/asynchbeans/J2EEContext.run(J2EEContext.java:1114(Compiled
Code))
4XESTACKTRACE at com/ibm/ws/asynchbeans/WorkWithExecutionContextImpl.go
(WorkWithExecutionContextImpl.java:195(Compiled Code))
4XESTACKTRACE at com/ibm/ws/asynchbeans/CJWorkItemImpl.run(CJWorkItemImpl.java:150
(Compiled Code))
4XESTACKTRACE at com/ibm/ws/util/ThreadPool$Worker.run(ThreadPool.java:1469(Compiled
Code))
The Non-deferrable Alarm:0 thread is blocked in a slightly different way. This thread is running because the administrator is trying to shut down the server. You can see that it is going through and shutting down the application. The shutdown process attempts to remove all connections by shutting them down. This results in a call to ProtocolConnection.stopConnection. If you look at the line of code indicated in the stack trace, you will find:
this.getProtocolCfsmInitiator().stop();
This is another synchronized call to the same CFSM, so the shutdown thread also gets stuck on the deadlock.
3XMTHREADINFO "Non-deferrable Alarm : 0" (TID:0x33846900, sys_thread_t:0x3380B550, state:
B, native ID:0x00017469) prio=5
4XESTACKTRACE at com/ibm/protocol/cfsm/ProtocolConnection.stopConnection
(ProtocolConnection.java:520)
4XESTACKTRACE at com/ibm/protocol/base/ProtocolBaseApiHelper.confRemovePeer
(ProtocolBaseApiHelper.java:719)
4XESTACKTRACE at com/ibm/protocol/base/ProtocolBaseApiHelper.unRegisterApplicationId
(ProtocolBaseApiHelper.java:339)
4XESTACKTRACE at com/ibm/protocol/base/ProtocolBaseApiImpl.unRegisterApplicationId
(ProtocolBaseApiImpl.java:448)
4XESTACKTRACE at com/ibm/protocol/util/DiamInitBaseServlet.unRegister(DiamInitBaseServlet.
java:1037)
4XESTACKTRACE at com/ibm/protocol/util/DiamInitBaseServlet.destroy(DiamInitBaseServlet.
java:1059)
4XESTACKTRACE at com/ibm/protocol/servlet/ProtocolInitServlet.destroy(ProtocolInitServlet.
java:215)
4XESTACKTRACE at com/ibm/ws/webcontainer/servlet/ServletWrapper.doDestroy(ServletWrapper.
java:801)
4XESTACKTRACE at com/ibm/ws/wswebcontainer/servlet/ServletWrapper.doDestroy
(ServletWrapper.java:677)
4XESTACKTRACE at com/ibm/ws/webcontainer/servlet/ServletWrapper.destroy(ServletWrapper.
java:880)
4XESTACKTRACE at com/ibm/ws/webcontainer/webapp/WebApp.destroy(WebApp.java:2594)
4XESTACKTRACE at com/ibm/ws/wswebcontainer/webapp/WebApp.destroy(WebApp.java:1078)
4XESTACKTRACE at com/ibm/ws/container/AbstractContainer.destroy(AbstractContainer.java:82)
4XESTACKTRACE at com/ibm/ws/webcontainer/webapp/WebGroup.destroy(WebGroup.java:194)
4XESTACKTRACE at com/ibm/ws/webcontainer/webapp/WebGroup.removeWebApplication(WebGroup.
java:232)
4XESTACKTRACE at com/ibm/ws/webcontainer/VirtualHost.removeWebApplication(VirtualHost.
java:282)
4XESTACKTRACE at com/ibm/ws/wswebcontainer/VirtualHost.removeWebApplication(VirtualHost.
java:181)
4XESTACKTRACE at com/ibm/ws/wswebcontainer/WebContainer.removeWebApplication(WebContainer.
java:735)
4XESTACKTRACE at com/ibm/ws/webcontainer/component/WebContainerImpl.uninstall
(WebContainerImpl.java:359)
4XESTACKTRACE at com/ibm/ws/webcontainer/component/WebContainerImpl.stop(WebContainerImpl.
java:562)
4XESTACKTRACE at com/ibm/ws/runtime/component/ApplicationMgrImpl.stop(ApplicationMgrImpl.
java:1324)
4XESTACKTRACE at com/ibm/ws/runtime/component/DeployedApplicationImpl.
fireDeployedObjectStop(DeployedApplicationImpl.java:1143)
4XESTACKTRACE at com/ibm/ws/runtime/component/DeployedModuleImpl.stop(DeployedModuleImpl.
java:602)
Finally, look at ProtocolThreadPool:0, ProtocolThreadPool:1, and ProtocolThreadPool:2. ProtocolThreadPool:0 and ProtocolThreadPool:2 are both in the ProtocolChannelReader waiting for the protocol channel to pass a protocol packet up for processing, so they are completely idle. They are not the culprits; ProtocolThreadPool:1 is.
3XMTHREADINFO "ProtocolThreadPool : 0" (TID:0x345FD800, sys_thread_t:0x345E3FA8, state:CW,
native ID:0x0000F6B3) prio=5
4XESTACKTRACE at java/lang/Object.wait(Native Method)
4XESTACKTRACE at java/lang/Object.wait(Object.java:231(Compiled Code))
4XESTACKTRACE at com/ibm/protocol/cfsm/ProtocolChannelReader.run(ProtocolChannelReader.
java:253)
4XESTACKTRACE at com/ibm/ws/util/ThreadPool$Worker.run(ThreadPool.java:1469)
3XMTHREADINFO "ProtocolThreadPool : 2" (TID:0x3464C000, sys_thread_t:0x345E44D8, state:CW,
native ID:0x0000E419) prio=5
4XESTACKTRACE at java/lang/Object.wait(Native Method)
4XESTACKTRACE at java/lang/Object.wait(Object.java:231(Compiled Code))
4XESTACKTRACE at com/ibm/protocol/cfsm/ProtocolChannelReader.run(ProtocolChannelReader
.java:253)
4XESTACKTRACE at com/ibm/ws/util/ThreadPool$Worker.run(ThreadPool.java:1469)
Listing 11 shows ProtocolThreadPool:1, which is doing quite a lot: it has received a packet and is attempting to hand that packet to a higher layer for processing. Eventually, it attempts to hand the packet over to a WorkManager thread, and here is where you will see that it blocks invoking an Object.wait() method. The enqueue operation is blocked awaiting an available thread. And so, the deadlock is complete.
3XMTHREADINFO "ProtocolThreadPool : 1" (TID:0x3464BC00, sys_thread_t:0x345E4240, state:CW,
native ID:0x0000E6EF) prio=5
4XESTACKTRACE at java/lang/Object.wait(Native Method)
4XESTACKTRACE at java/lang/Object.wait(Object.java:231(Compiled Code))
4XESTACKTRACE at com/ibm/ws/util/BoundedBuffer.waitPut_(BoundedBuffer.java:211(Compiled
Code))
4XESTACKTRACE at com/ibm/ws/util/BoundedBuffer.put(BoundedBuffer.java:323(Compiled Code))
4XESTACKTRACE at com/ibm/ws/util/ThreadPool.execute(ThreadPool.java:1135(Compiled Code))
4XESTACKTRACE at com/ibm/ws/util/ThreadPool.execute(ThreadPool.java:1014(Compiled Code))
4XESTACKTRACE at com/ibm/ws/asynchbeans/WorkItemImpl$PoolExecuteProxy.run(WorkItemImpl.
java:197(Compiled Code))
4XESTACKTRACE at com/ibm/ws/asynchbeans/WorkItemImpl.executeOnPool(WorkItemImpl.java:211
(Compiled Code))
4XESTACKTRACE at com/ibm/ws/asynchbeans/WorkManagerImpl.queueWorkItemForDispatch
(WorkManagerImpl.java:400(Compiled Code))
4XESTACKTRACE at com/ibm/ws/asynchbeans/WorkManagerImpl.schedule(WorkManagerImpl.java:951
(Compiled Code))
4XESTACKTRACE at com/ibm/ws/asynchbeans/WorkManagerImpl.schedule(WorkManagerImpl.java:771
(Compiled Code))
4XESTACKTRACE at com/ibm/protocol/servlet/ProtocolInitServlet.packetEvent
(ProtocolInitServlet.java:287(Compiled Code))
4XESTACKTRACE at com/ibm/protocol/cfsm/ProtocolState_RI_Open.handleRequestDataPacket
(ProtocolState_RI_Open.java:611(Compiled Code))
4XESTACKTRACE at com/ibm/protocol/cfsm/ProtocolState_RI_Open.packetReceived
(ProtocolState_RI_Open.java:176(Compiled Code))
4XESTACKTRACE at com/ibm/protocol/cfsm/ProtocolCfsm.packetReceived(ProtocolCfsm.java:95
(Compiled Code))
4XESTACKTRACE at com/ibm/protocol/cfsm/ProtocolChannelReader.run(ProtocolChannelReader.
java:204)
4XESTACKTRACE at com/ibm/ws/util/ThreadPool$Worker.run(ThreadPool.java:1469)
The ProtocolThreadPool:1 thread -- while holding the CFSM lock -- is waiting for a WorkManager thread to become available. The WorkManager threads are all in use, waiting to obtain the lock to the CFSM and to transmit a packet. None of the threads can make progress. Figure 1 illustrates this situation:
- 34 WebContainer threads are awaiting access to the CFSM monitor to transmit a packet.
- 50 WorkManager threads are also waiting to access the CFSM monitor to transmit a packet.
- The Reader thread, which is the one thread with access to the CFSM monitor, is receiving an inbound packet that it wants to hand over to a WorkManager thread for processing.
- Because all of the WorkManager threads are in use, the Reader thread blocks waiting for one to free up -- but they will never free up because they are all waiting to use the CFSM to transmit a packet.
This particular deadlock is a pretty easy one to resolve: you simply change the method that invokes the WorkManager threads so that it throws an exception if no WorkManager threads are available, rather than blocking. The packet is then discarded and an error message is logged, which breaks the deadlock cycle. The ProtocolThreadPool:1 thread can return and release the CFSM lock, and then the WorkManager and WebContainer threads can transmit their packets.
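The idea behind this fix can be sketched with the standard java.util.concurrent classes. This is not the WebSphere WorkManager API and not the article's actual code: it uses a ThreadPoolExecutor with a bounded queue as a stand-in for the WorkManager pool, and the class and method names are hypothetical. With the executor's default rejection policy, execute() throws RejectedExecutionException when every thread is busy and the queue is full, so the synchronized receive path can drop the packet and return instead of blocking while it holds the monitor.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class NonBlockingHandoff {
    private final ThreadPoolExecutor workers;
    private int dropped = 0;

    NonBlockingHandoff(int threads, int queueCapacity) {
        // The default rejection policy (AbortPolicy) throws instead of blocking
        // when all threads are busy and the bounded queue is full.
        workers = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity));
    }

    // Stand-in for the synchronized CFSM receive path.
    synchronized boolean packetReceived(Runnable work) {
        try {
            workers.execute(work);
            return true;
        } catch (RejectedExecutionException e) {
            dropped++;
            System.err.println("No worker available; dropping packet");
            return false; // returning releases the monitor so senders can proceed
        }
    }

    synchronized int getDropped() {
        return dropped;
    }

    void shutdown() {
        workers.shutdownNow();
    }

    public static void main(String[] args) {
        NonBlockingHandoff cfsm = new NonBlockingHandoff(1, 1);
        CountDownLatch release = new CountDownLatch(1);
        Runnable busy = () -> {
            try {
                release.await();
            } catch (InterruptedException ignored) {
                // interrupted during shutdown; nothing to clean up
            }
        };
        System.out.println(cfsm.packetReceived(busy)); // true: occupies the only worker
        System.out.println(cfsm.packetReceived(busy)); // true: fills the queue
        System.out.println(cfsm.packetReceived(busy)); // false: rejected, not blocked
        release.countDown();
        cfsm.shutdown();
    }
}
```

The trade-off is the one described above: under overload a packet is lost and an error is logged, but the thread holding the state machine always returns, so the threads waiting to transmit can make progress.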
To illustrate how to track down a typical deadlock problem, this article used the locks and thread stack traces to find a single resource (CFSM) on which many of the threads were blocked. The thread that was holding that resource (ProtocolThreadPool:1) was found, along with the resource for which it was waiting (the availability of a WorkManager thread), completing the cycle that caused the deadlock. Resolving the deadlock involved finding a way to release resources (drop the packet and free up the CFSM) so that the cyclical dependency would not occur again. Hopefully, you will be able to apply the principles and methods described in this basic example so you can avoid or resolve deadlock conditions in your own applications.