I am not the author, so I cannot claim to speak for him. But when latency is a concern (say, <10 ms), the lack of per-record processing is a problem, because processing records in batches will have higher latencies.
I think that is the main point of this article.
The only major point raised here is that Apache Spark does not have "record-by-record processing". I'm not sure why I'd prefer that in every case, and would appreciate some perspective.
There is very little in this that tells me why I should care about the distinction. The biggest reason seems to be sub-second latency. If you drop the need for REAL real time, though, I don't see a practical difference.