There is a lot to learn from working through classic software engineering problems step by step, thinking about each decision and its consequences. I tried this with the producer-consumer problem and found the experiment very interesting.
Producer-consumer is a classic example of a shared resource. A producer produces something that consumers are waiting to consume. So why can't the producer just produce and hand it directly to the consumer, job done? Because producers and consumers are usually different threads of execution. The producer can be messages coming from some backend of the system, and the consumer is your application, which is waiting for them to do some processing. So there should be some common place where these messages can be queued (or handled by whatever policy applies) until the consumer uses them. That common place is the shared resource. And we are not talking about just one producer thread and one consumer thread.
There are usually several of each. The concerns are how a producer informs consumers that a new message has arrived, and how good the logic is at making a consumer react as quickly as possible.
What should the producer do when the buffer is full?
What should the consumer do when there is no message in the buffer?
What should the consumer do after processing a message?
What if processing a message itself takes too long while messages keep arriving in the buffer? We can't wait for each message to finish processing, because the buffer will fill up; we may lose information and increase response time, which is definitely a failure, at least for a real-time system.
What about the case when the buffer is updated by more than two threads at the same time? What do we have to lock, and how long should we hold the lock, considering the processing time?
First of all, let's draw up some requirements:
Here we have 5 producers that each produce something after some time delay and append it to the queue. There are also 5 consumers running which take those items off the queue and process them. If the buffer is full, a producer waits to allow some consumer to take something out of the queue. If there is nothing to consume, a consumer waits for some producer to produce something.
Solution:
I have the following classes:
Producer (Runnable)
Consumer (Runnable)
ThreadPool (which starts a number of threads that run the passed Runnable)
MessageBuffer (the data structure shared by producers and consumers; this is where we have the locks, e.g. when the buffer is empty and a consumer is trying to read)
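The non-buffer participants might be sketched as follows. This is a hypothetical sketch, not the blog's original code: the constructor signatures, the per-thread message counts, and the use of `java.util.concurrent.BlockingQueue` as a stand-in for the MessageBuffer are all my assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a BlockingQueue stands in for the shared MessageBuffer.
class Producer implements Runnable {
    private final BlockingQueue<String> buffer;
    private final int count; // how many messages this producer creates (assumed parameter)

    Producer(BlockingQueue<String> buffer, int count) {
        this.buffer = buffer;
        this.count = count;
    }

    @Override
    public void run() {
        try {
            for (int i = 0; i < count; i++) {
                buffer.put("message-" + i); // blocks while the buffer is full
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class Consumer implements Runnable {
    static final AtomicInteger consumed = new AtomicInteger(); // for demonstration only
    private final BlockingQueue<String> buffer;
    private final int count;

    Consumer(BlockingQueue<String> buffer, int count) {
        this.buffer = buffer;
        this.count = count;
    }

    @Override
    public void run() {
        try {
            for (int i = 0; i < count; i++) {
                buffer.take(); // blocks while the buffer is empty
                consumed.incrementAndGet();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

// Minimal "pool": it simply starts one thread per submitted Runnable.
// A real pool would reuse a fixed set of worker threads.
class ThreadPool {
    private final List<Thread> threads = new ArrayList<>();

    void submit(Runnable task) {
        Thread t = new Thread(task);
        threads.add(t);
        t.start();
    }

    void joinAll() {
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

As long as the producers and consumers agree on the total number of messages, all threads eventually finish regardless of the buffer capacity.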
The other participants are straightforward, but the critical part is the message buffer. I can write the following simple code for the addMessage and removeMessage methods.
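A minimal sketch of what such a MessageBuffer could look like, assuming a fixed-capacity queue of strings (everything beyond the addMessage/removeMessage names is my assumption):

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch: one shared lock (the MessageBuffer instance itself) for all threads.
class MessageBuffer {
    private final Queue<String> queue = new ArrayDeque<>();
    private final int capacity;

    MessageBuffer(int capacity) {
        this.capacity = capacity;
    }

    // Called by the producer group of threads.
    public synchronized void addMessage(String msg) throws InterruptedException {
        while (queue.size() == capacity) {
            wait(100); // timed wait inside a loop: re-check the condition on every wake-up
        }
        queue.add(msg);
        notify(); // wakes one waiting thread; it may be a producer or a consumer
    }

    // Called by the consumer group of threads.
    public synchronized String removeMessage() throws InterruptedException {
        while (queue.isEmpty()) {
            wait(100);
        }
        String msg = queue.remove();
        notify();
        return msg;
    }
}
```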
Things to be noticed:
i) The add method is called by the producer group of threads, and the remove method by the consumer group. These are the methods where the shared data structure is updated, and therefore the place where we need locking.
ii) We use the same lock (the same MessageBuffer instance) for all threads, producers and consumers alike.
These are the common points I find everywhere when I search for the producer-consumer problem.
What is conceptually more important is this:
i) The wait is inside a while loop. A thread waits because some condition does not hold, but when it is woken up, it is not guaranteed that the condition now holds: it may have been interrupted by another thread, notified by mistake (which should not be the case), woken spuriously, or simply timed out. Keeping the wait inside the loop solves this, because the thread immediately re-checks the condition and continues only if it is fulfilled.
ii) I have used wait(n) and notify(), and there is a reason for this combination.
Using one makes the other necessary. When notify() is used to wake a waiting thread, it wakes only one thread and keeps the others waiting.
So imagine that at some point in time there are 2 producer threads waiting on the buffer-full condition and 2 consumer threads waiting on the buffer-empty condition. Yes, this scenario is possible because I used notify(). Now suppose a consumer calls notify() to signal that there is space in the buffer, so that one producer can try to add something to the queue. Since all threads (producers and consumers) synchronize on the same MessageBuffer instance, the notification may go to another consumer thread instead. When that consumer wakes up, the queue is already empty because of the last read, so it checks the buffer-empty condition and goes straight back to waiting. The notification is wasted, and no message makes it into the queue even though 2 producer threads are still waiting.
This could be solved with notifyAll(), but that introduces unnecessary context-switching overhead, because all waiting threads (producers and consumers) leave the waiting state and become runnable. It is also pointless: reading one message frees space for exactly one new message in the queue, so waking up, say, all 4 waiting producer threads is wasted work.
So to solve this I have to:
a) Wait using wait(n) inside the while loop. This makes sure that even if we miss a notification, the thread will re-check the condition after n milliseconds. But of course, we shouldn't have missed the notification in the first place.
b) Use different locks for readers and writers. When readers find the buffer empty, they wait on a lock that only a writer notifies (after adding something to the buffer), and vice versa. This is an important performance gain, because a notification from a reader is meaningless for other readers and is only meant for writers.
This can be coded like this:
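Along those lines, one possible sketch (the two extra lock objects, their names, and the timeout value are my choices; note the small window between releasing the buffer lock and starting to wait, which is exactly what the timed wait in point (a) guards against):

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch: separate lock objects so that readers only wake writers and vice versa.
class MessageBuffer {
    private final Queue<String> queue = new ArrayDeque<>();
    private final int capacity;
    private final Object notFull = new Object();  // producers wait here, consumers notify it
    private final Object notEmpty = new Object(); // consumers wait here, producers notify it

    MessageBuffer(int capacity) {
        this.capacity = capacity;
    }

    public void addMessage(String msg) throws InterruptedException {
        while (true) {
            synchronized (this) { // guard the queue itself
                if (queue.size() < capacity) {
                    queue.add(msg);
                    break;
                }
            }
            // A notification sent in this window would be missed,
            // which is why the wait is timed and sits in a loop.
            synchronized (notFull) {
                notFull.wait(100);
            }
        }
        synchronized (notEmpty) {
            notEmpty.notify(); // wakes one waiting consumer, never a producer
        }
    }

    public String removeMessage() throws InterruptedException {
        String msg;
        while (true) {
            synchronized (this) {
                if (!queue.isEmpty()) {
                    msg = queue.remove();
                    break;
                }
            }
            synchronized (notEmpty) {
                notEmpty.wait(100);
            }
        }
        synchronized (notFull) {
            notFull.notify(); // wakes one waiting producer, never a consumer
        }
        return msg;
    }
}
```

For what it's worth, java.util.concurrent expresses the same idea without the missed-notification window: a ReentrantLock with two Condition objects (one for "not full", one for "not empty"), which is how ArrayBlockingQueue is built.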
This code solves the problem of unnecessary and wasted notifications: readers notify only writers, and writers notify only readers.
This walk through producer-consumer made me appreciate the trickiness of multithreading. That's why I like it: it makes me think like a genius :)
There are many other concepts that I read about in the referenced documents:
i) How to determine and optimize the size of a thread pool?
ii) How to use the happens-before relationship to avoid memory inconsistency in multithreaded code?
iii) How multi-core CPUs support concurrency and the response times required by real-time systems, and which JIT compiler options optimize response time for real-time processing systems.
iv) How does Java's garbage collection increase worst-case response time under concurrency?
v) Support for common threading issues in the java.util.concurrent package.
References:

