Same set of tasks is repeated in multiple stages in a Spark job


A group of tasks consisting of filters and maps appears in the DAG visualization of multiple stages. Does this mean the same transformations are recomputed in each of those stages? If so, how do I resolve this?

For every action performed on a DataFrame, all of its transformations are recomputed. This is because transformations are lazy: they are not computed until an action is performed.

If you have only a single action, there is nothing you can do. However, in the case of multiple actions one after another, cache() can be used after the last transformation. With this method, Spark will save the DataFrame in RAM after the first computation, making subsequent actions faster.
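To make the recomputation concrete, here is a toy model in plain Python. It is not Spark's API: `LazyDataset`, `expensive`, and `build` are hypothetical names invented for this sketch, and it only mimics the two behaviours described above, namely that transformations run when an action is called and that `cache()` lets later actions reuse the first result.

```python
class LazyDataset:
    """Toy stand-in for a DataFrame (hypothetical, not Spark code)."""

    def __init__(self, source=None, parent=None, step=None):
        self._source = source          # raw rows, only set on the root dataset
        self._parent = parent          # dataset this one was derived from
        self._step = step              # transformation to apply to the parent's rows
        self._cache_requested = False  # set by cache(), filled by the first action
        self._cached_rows = None

    # Transformations: only record what to do, nothing is executed here.
    def map(self, func):
        return LazyDataset(parent=self, step=lambda rows: [func(r) for r in rows])

    def filter(self, pred):
        return LazyDataset(parent=self, step=lambda rows: [r for r in rows if pred(r)])

    def cache(self):
        self._cache_requested = True
        return self

    def _compute(self):
        if self._cached_rows is not None:
            return self._cached_rows   # reuse the stored result
        if self._parent is None:
            rows = list(self._source)
        else:
            rows = self._step(self._parent._compute())
        if self._cache_requested:
            self._cached_rows = rows   # stored on the first computation only
        return rows

    # Actions: trigger the actual computation.
    def count(self):
        return len(self._compute())

    def collect(self):
        return list(self._compute())


calls = {"n": 0}

def expensive(x):
    calls["n"] += 1                    # count how often the map function runs
    return x * 2

def build():
    return LazyDataset(source=range(5)).map(expensive).filter(lambda x: x > 2)

# Two actions without cache(): the map runs once per row for EACH action.
uncached = build()
uncached.count()
uncached.collect()
uncached_calls = calls["n"]            # 10

# Two actions with cache() after the last transformation.
calls["n"] = 0
cached = build().cache()
cached.count()
cached.collect()
cached_calls = calls["n"]              # 5

print(uncached_calls, cached_calls)    # prints: 10 5
```

In this sketch the second action on the cached dataset never touches `expensive` again, which is the same effect the answer describes for calling cache() on the DataFrame before running several actions.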

