hadoop - What is Lineage In Spark?


How does lineage help to recompute data?

For example, I have several nodes, each computing data for 30 minutes. If one of them fails after 15 minutes, can we use lineage to recompute the data processed in those 15 minutes without spending the 15 minutes again?

Everything you need to understand about lineage is in the definition of RDDs.

So let's review it:

RDDs are immutable distributed collections of elements of your data that can be stored in memory or on disk across a cluster of machines. The data is partitioned across the machines in your cluster and can be operated on in parallel with a low-level API that offers transformations and actions. RDDs are fault tolerant because they track data lineage information, which lets them rebuild lost data automatically on failure.
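To make the definition above more concrete, here is a minimal sketch of the idea in plain Python. It is a toy model, not Spark's API: the `ToyRDD` class and its single in-memory "partition" are hypothetical and exist only to show that transformations record a chain of parents (the lineage) lazily, while an action walks that chain to compute the result.

```python
from typing import Callable, List, Optional


class ToyRDD:
    """Hypothetical single-partition stand-in for an RDD (not Spark's API)."""

    def __init__(self,
                 source: Optional[List[int]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[List[int]], List[int]]] = None,
                 name: str = "source"):
        self.source = source  # only set on the root node
        self.parent = parent  # link to the node this one was derived from
        self.fn = fn          # how to derive this node's data from its parent
        self.name = name

    # Transformations are lazy: they only add a node to the lineage.
    def map(self, f: Callable[[int], int]) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda xs: [f(x) for x in xs], name="map")

    def filter(self, p: Callable[[int], bool]) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda xs: [x for x in xs if p(x)], name="filter")

    def lineage(self) -> List[str]:
        """Return the chain of steps from the source down to this node."""
        node, chain = self, []
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain[::-1]

    # Action: replay the lineage from the source to produce the data.
    def collect(self) -> List[int]:
        if self.parent is None:
            return list(self.source)
        return self.fn(self.parent.collect())


numbers = ToyRDD(source=[1, 2, 3, 4, 5])
evens_squared = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

lineage_chain = evens_squared.lineage()  # ['source', 'map', 'filter']
result = evens_squared.collect()         # [4, 16]
```

Because every node knows its parent and the function that produced it, nothing but the source data has to be stored for the result to be rebuilt. That is the property the definition calls fault tolerance through lineage.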

So there are two things to understand:

Unfortunately, these topics are too long to discuss in a single answer. I recommend taking the time to read about them, along with the following article on data lineage.

And to answer your question and doubts:

If an executor fails while computing your data, after 15 minutes, it will go back to your last checkpoint, whether that is the source or a cache in memory and/or on disk.

So that will not save you the 15 minutes you mentioned!
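The recovery behaviour described in this answer can be sketched in plain Python. This is a toy model under stated assumptions, not Spark code: the `recompute` helper, the stage names and the `cached` dictionary are all hypothetical. It shows that a lost result is rebuilt by replaying the lineage from the last surviving point, which is the source unless some intermediate output was cached or checkpointed.

```python
from typing import Callable, Dict, List, Tuple

# A stage is a (name, function) pair; the ordered list of stages is the lineage.
Stage = Tuple[str, Callable[[List[int]], List[int]]]


def recompute(source: List[int],
              stages: List[Stage],
              cached: Dict[int, List[int]]) -> Tuple[List[int], List[str]]:
    """Rebuild the final result after a failure (hypothetical helper).

    `cached` maps a stage index to that stage's surviving output.
    Replay starts after the latest cached stage, or from the source
    when nothing was cached. Returns (result, names of replayed stages).
    """
    if cached:
        last = max(cached)
        data, start = list(cached[last]), last + 1
    else:
        data, start = list(source), 0

    replayed = []
    for name, fn in stages[start:]:
        data = fn(data)
        replayed.append(name)
    return data, replayed


source = [1, 2, 3, 4]
stages: List[Stage] = [
    ("parse", lambda xs: [x * 10 for x in xs]),
    ("enrich", lambda xs: [x + 1 for x in xs]),
    ("aggregate", lambda xs: [sum(xs)]),
]

# Nothing cached: every stage is replayed from the source.
result_a, replayed_a = recompute(source, stages, cached={})

# Output of "parse" (index 0) survived: only the later stages are replayed.
result_b, replayed_b = recompute(source, stages, cached={0: [10, 20, 30, 40]})
```

In both cases the final result is the same; only the amount of work replayed differs. With no cache the whole chain runs again, which is why the 15 minutes in the question are not saved unless an intermediate result was persisted.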

