I have observed consistently faster execution times for my batch dataflows. I am running them on the same cluster, and the source datasets are roughly the same size as when I was modeling. Is there a reason why they are faster? Do you perform any optimization, and if so, what kinds?
Yes, we optimize batch dataflows, so it is expected that they run faster than they did during modeling. Our optimizer applies several techniques; for example, it optimizes based on fields of interest.
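To illustrate the general idea behind optimizing on fields of interest, here is a minimal Python sketch of column pruning (also called projection pushdown). It is a generic illustration under stated assumptions, not the product's actual optimizer. The `Step` class and the `fields_of_interest` and `read_source` functions are hypothetical. The dataflow is assumed to be a simple linear sequence of steps, and each step declares which fields it reads and writes. The sketch walks backward from the fields the sink needs, drops derivations whose outputs are never used, and reads only the remaining source fields.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """Hypothetical dataflow step that reads some fields and may write new ones."""
    name: str
    reads: set[str]
    writes: set[str] = field(default_factory=set)


def fields_of_interest(steps: list[Step], sink_fields: set[str]) -> set[str]:
    """Walk the steps backward from the sink to find the source fields that are needed."""
    needed = set(sink_fields)
    for step in reversed(steps):
        # A step with no outputs (e.g. a filter) changes which rows survive,
        # so it is always kept. A derivation is kept only if a downstream
        # step or the sink uses one of its outputs.
        if not step.writes or step.writes & needed:
            # Fields this step produces no longer come from upstream,
            # but the fields it reads now do.
            needed = (needed - step.writes) | step.reads
    return needed


def read_source(rows: list[dict], columns: set[str]) -> list[dict]:
    """Project each source row down to the needed columns before any processing."""
    return [{c: r[c] for c in columns if c in r} for r in rows]


if __name__ == "__main__":
    # Hypothetical pipeline: filter rows, derive a total, derive an unused label.
    steps = [
        Step("filter_active", reads={"status"}),
        Step("derive_total", reads={"price", "qty"}, writes={"total"}),
        Step("derive_label", reads={"description"}, writes={"label"}),
    ]
    cols = fields_of_interest(steps, {"customer_id", "total"})
    print(sorted(cols))  # ['customer_id', 'price', 'qty', 'status']
```

In this example, `description` is never read from the source, because the only step that uses it produces a field (`label`) that nothing downstream consumes. Reading and moving fewer fields is one plausible reason a dataflow can run faster than a version that processes every column.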