We briefly interviewed Zlatko Michailov, author of the Guide to Implementing Custom TPL Dataflow Blocks.
InfoQ: What types of applications do you think are especially suited for TPL Dataflow? Which do you see as inappropriate?
TPL Dataflow is a stream processing platform, suited to workloads such as streams of audio/video frames or streams of price quotes. It is especially useful when the messages arrive at a high frequency. That’s when the difference between an efficient platform and an inefficient one becomes visible.
An additional benefit of using a dataflow platform in general is that the topology of the dataflow network takes part in the processing. Thus the application consists of small, narrowly focused delegates. That makes the application easier to maintain.
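To illustrate how the network topology carries part of the logic, here is a minimal sketch (not taken from the interview) that links three built-in blocks into a small price-quote pipeline. The block names `parse`, `format`, and `collect` and the sample inputs are illustrative assumptions. It assumes a runtime where `System.Threading.Tasks.Dataflow` is available (on .NET Framework it ships as a NuGet package).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

var results = new List<string>();

// Each delegate does one narrow job; the links between blocks define the flow.
var parse = new TransformBlock<string, decimal>(
    text => decimal.Parse(text, CultureInfo.InvariantCulture));
var format = new TransformBlock<decimal, string>(
    price => "quote: " + price.ToString("F2", CultureInfo.InvariantCulture));
// ActionBlock runs one delegate at a time by default, so List.Add is safe here.
var collect = new ActionBlock<string>(line => results.Add(line));

// PropagateCompletion lets Complete() on the head flow through to the tail.
var link = new DataflowLinkOptions { PropagateCompletion = true };
parse.LinkTo(format, link);
format.LinkTo(collect, link);

foreach (var raw in new[] { "101.5", "99.25", "100" })
{
    parse.Post(raw);
}
parse.Complete();
await collect.Completion;

Console.WriteLine(string.Join(", ", results));
// prints "quote: 101.50, quote: 99.25, quote: 100.00"
```

Changing what the application does is then often a matter of relinking blocks or swapping one delegate, rather than editing a large method.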
InfoQ: Do you see TPL Dataflow as an advanced technique that is only going to be used by a few? Or do you think it will largely replace working directly with Tasks much like Tasks superseded threads?
Neither. TPL Dataflow doesn’t replace tasks. (I don’t think tasks replaced threads either. Tasks filled in a gap in the concurrent programming space.) TPL Dataflow implements patterns using tasks. While the main pattern is stream processing, each block is very general and could be used for other purposes. For instance, the WriteOnce block was designed to be used as a request-response mechanism: a WriteOnce block is instantiated upon a request, and once the response data is written to it, it automatically completes, so that the requestor can continue asynchronously. Another example is ActionBlock in combination with the MaxDegreeOfParallelism option. It could be used as a throttling mechanism that prevents more than a given number of processing tasks from executing at the same time. A third example is the BufferBlock in combination with the BoundedCapacity option, which throttles the data feed. So I see TPL Dataflow as generally applicable.
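The three uses mentioned in this answer can be sketched roughly as follows. This is an illustration under assumptions, not code from the interview: the variable names, the 20 ms delay, and the capacity and parallelism values are all made up for the example.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// 1. Request-response with WriteOnceBlock: only the first value is accepted.
var response = new WriteOnceBlock<string>(null);      // null = no cloning function
Task<string> pending = response.ReceiveAsync();       // requestor waits asynchronously
bool firstAccepted = response.Post("first reply");
bool secondAccepted = response.Post("second reply");  // declined: a value is already set
string reply = await pending;

// 2. Throttling with ActionBlock + MaxDegreeOfParallelism.
int running = 0, peak = 0;
var gate = new object();
var worker = new ActionBlock<int>(async item =>
{
    lock (gate) { running++; peak = Math.Max(peak, running); }
    await Task.Delay(20);                              // stand-in for real work
    lock (gate) { running--; }
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 2 });

for (int i = 0; i < 10; i++)
{
    worker.Post(i);
}
worker.Complete();
await worker.Completion;                               // peak never exceeds 2

// 3. Throttling the feed with BufferBlock + BoundedCapacity.
var buffer = new BufferBlock<int>(new DataflowBlockOptions { BoundedCapacity = 2 });
bool posted1 = buffer.Post(1);
bool posted2 = buffer.Post(2);
bool posted3 = buffer.Post(3);                         // false: full, and Post does not wait
Task<bool> sending = buffer.SendAsync(3);              // waits for room instead
int firstItem = buffer.Receive();                      // frees one slot
bool sent = await sending;

Console.WriteLine($"{reply}, second accepted: {secondAccepted}, peak: {peak}, posted3: {posted3}, sent: {sent}");
```

The practical distinction in the third case is that `Post` returns `false` immediately when the buffer is full, while `SendAsync` lets the producer wait without blocking a thread.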
InfoQ: For a novice just starting out with TPL Dataflow, what would you say are the most important things for them to learn?
This is purely my opinion, but the most important thing is to realize that threads are expensive and that the OS should not be pressured to create unnecessary threads. Developers should focus on the dependencies among tasks and should rely on the OS/framework to schedule those tasks.
Specifically about TPL Dataflow, I’d advise developers to experiment with each block individually. Chances are you’ll discover a block implements a pattern you frequently use. If you see a pattern that is close to the one you use but not quite like it, consider encapsulating multiple built-in blocks to make up that pattern. If that still doesn’t do it, you may be able to write a simple synchronous block that will fill in the gap.
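One way to encapsulate multiple built-in blocks as a single block is `DataflowBlock.Encapsulate`, which pairs a target with a source and returns them as one propagator. The sketch below is an illustrative assumption about how that might look; the helper `CreateTrimmedLengthBlock` is hypothetical and not from the guide.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical helper: wires trim -> measure and exposes the pair as one block.
IPropagatorBlock<string, int> CreateTrimmedLengthBlock()
{
    var trim = new TransformBlock<string, string>(s => s.Trim());
    var measure = new TransformBlock<string, int>(s => s.Length);
    trim.LinkTo(measure, new DataflowLinkOptions { PropagateCompletion = true });

    // Callers post to 'trim' and receive from 'measure' through one object.
    return DataflowBlock.Encapsulate<string, int>(trim, measure);
}

var block = CreateTrimmedLengthBlock();
block.Post("  hello ");
block.Post("dataflow");
block.Complete();

int firstLength = await block.ReceiveAsync();
int secondLength = await block.ReceiveAsync();
await block.Completion;

Console.WriteLine($"{firstLength}, {secondLength}"); // prints "5, 8"
```

Consumers of the returned block see only its input and output types, so the internal wiring can change later without affecting them.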
InfoQ: Would you recommend mixing TPL Dataflow and Windows Workflow together?
WF’s goal is to enable persistent flows that usually take days or even months to complete. Its focus is on reliability, not on performance. TPL Dataflow targets performance alone. Its goal is to utilize the available hardware cores in the most efficient way possible. So technically you can mix the two technologies. My guess would be that you can use TPL Dataflow within a WF step.