Concurrent inference with TensorFlow


Looking at the CUDA backend in TensorFlow, it looks like computation is synchronized on a single CUDA stream (which I think makes sense).

This suggests that one could concurrently run inference on two different TensorFlow models sharing the same GPU: each would have its own CUDA stream and execute independently.
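
For concreteness, here is a minimal sketch of the kind of setup I have in mind (assuming TF 2.x, two Keras SavedModels at the placeholder paths `model_a/` and `model_b/`, and a placeholder input shape):

```python
import threading
import numpy as np
import tensorflow as tf

# Let both models grow into GPU memory as needed instead of one of them
# reserving (nearly) all of it up front.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

# Placeholder paths; I'm assuming both were exported as Keras SavedModels.
model_a = tf.keras.models.load_model("model_a/")
model_b = tf.keras.models.load_model("model_b/")

def run_inference(model, batch, results, key):
    # Each thread drives one model; whether the GPU kernels from the two
    # models actually overlap (i.e. run on separate CUDA streams) is
    # exactly the part I'm unsure about.
    results[key] = model(batch, training=False)

# Placeholder input shape.
batch = tf.constant(np.random.rand(8, 224, 224, 3).astype(np.float32))
results = {}
threads = [
    threading.Thread(target=run_inference, args=(model_a, batch, results, "a")),
    threading.Thread(target=run_inference, args=(model_b, batch, results, "b")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```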

Is this in fact a supported use case? In particular, I'm curious (1) whether it is expected to work, and (2) whether there are performance concerns.

Obviously latency may be a little higher if two models are sharing the same compute resources, but I'm curious whether there are assumptions in TensorFlow that would cause more significant performance penalties.
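
One assumption I do know about and can work around myself: by default a TensorFlow process tries to reserve most of the GPU's memory, which would be an issue if the two models end up in separate processes on the same device. A sketch of capping each process explicitly (the 4096 MB limit is just an example value):

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    # Cap this process at ~4 GB so the other model's process can fit too.
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=4096)],
    )
```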

Thanks!

