How does Kafka guarantee sequential disk access?
I'm new to Kafka. While reading the Kafka documentation, I saw that Kafka performs well because of sequential disk access.
But how is that possible? In Java (or any other language), if I use file I/O, the OS handles the details. I can't know whether the OS will store my files in scattered sectors or in contiguous ones. So, in my opinion, Kafka cannot guarantee that sequential disk access occurs.
Am I right or not?
Kafka does not always access the disk sequentially, but it does certain things that make sequential access much more likely.

Kafka messages are stored in large segment files (1 GB each by default). Since Kafka messages are not deleted when they are consumed (as they are in other message brokers), Kafka does not end up fragmenting the filesystem over time by continuously creating and deleting many variable-length files. Instead, it creates a segment file and only appends to it until it reaches 1 GB (a configurable limit, `log.segment.bytes`). When all the messages in a segment have expired, it deletes the entire 1 GB segment. This means these 1 GB sections of disk are most likely laid out as contiguous blocks.

It is also a recommended best practice to keep the Kafka commit log files on a dedicated filesystem, so that it is not fragmented by other applications reading and writing variable-length files on the same filesystem.

More importantly, reads and writes to these segment files are sequential and go through the OS page cache, which reduces disk I/O further by caching the most recently accessed pages in memory. This is why there is a recommendation to tune the kernel and set swappiness (`vm.swappiness`) to 1, to reduce the likelihood that these cached pages are swapped out of memory.
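
To make the append-only segment pattern concrete, here is a minimal sketch in Python. It is a toy illustration, not Kafka's actual implementation: the `SegmentedLog` class, its method names, and the 4-byte length prefix are all assumptions made for this example. Only the idea (append to the active segment, roll at a size limit, delete whole segments) and the zero-padded base-offset file naming mirror what Kafka does.

```python
import os


class SegmentedLog:
    """Toy append-only log that rolls to a new segment file at a size limit.

    Hypothetical helper for illustration only; not Kafka's code.
    """

    def __init__(self, directory, segment_bytes):
        self.directory = directory
        self.segment_bytes = segment_bytes  # plays the role of log.segment.bytes
        self.next_offset = 0                # offset the next record will receive
        self._open_segment()

    def _segment_path(self, base_offset):
        # Name each segment after the offset of its first record.
        return os.path.join(self.directory, f"{base_offset:020d}.log")

    def _open_segment(self):
        self.base_offset = self.next_offset
        self.active = open(self._segment_path(self.base_offset), "ab")

    def append(self, payload):
        # Assumed record format: 4-byte big-endian length, then the payload.
        record = len(payload).to_bytes(4, "big") + payload
        size = self.active.tell()
        # Roll to a fresh segment if this record would exceed the limit.
        if size > 0 and size + len(record) > self.segment_bytes:
            self.active.close()
            self._open_segment()
        self.active.write(record)  # always lands at the end of the file
        offset = self.next_offset
        self.next_offset += 1
        return offset

    def segments(self):
        return sorted(f for f in os.listdir(self.directory) if f.endswith(".log"))

    def delete_oldest_segment(self):
        # Retention removes a whole segment file, never individual records.
        names = self.segments()
        if len(names) > 1:  # never delete the active segment
            os.remove(os.path.join(self.directory, names[0]))

    def close(self):
        self.active.close()
```

Because every write goes to the end of one active file and retention removes whole files, the filesystem only ever sees a few large allocations and deallocations. That gives the OS a good chance of placing each segment in contiguous blocks, although nothing in the code (or in Kafka) can force it to.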