Concurrent inference with TensorFlow


Looking at the CUDA backend in TensorFlow, it appears that computation is synchronized onto a single CUDA stream (which I think makes sense).

This suggests that one could concurrently run inference on two different TensorFlow models sharing the same GPU: each model would have its own CUDA stream and execute independently.

Is this in fact a supported use case? In particular, I'm curious whether (1) it is expected to work, and (2) there are performance concerns.

Obviously latency may be a little higher when two models share the same compute resources, but I'm curious whether there are assumptions in TensorFlow that would cause more significant performance penalties.
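For concreteness, here is a minimal sketch of the kind of setup I have in mind: two models loaded in one process, each driven from its own Python thread on the same GPU. The model paths and input shapes are just placeholders.

```python
import threading

import numpy as np
import tensorflow as tf

# Allocate GPU memory on demand so both models can coexist on the device.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

model_a = tf.keras.models.load_model("model_a")  # placeholder path
model_b = tf.keras.models.load_model("model_b")  # placeholder path


def run_inference(model, batch, results, key):
    # Each thread issues its own inference calls; whether these end up on
    # separate CUDA streams or get serialized onto one is exactly what
    # I'm asking about.
    results[key] = model(batch, training=False)


results = {}
batch_a = np.random.rand(32, 224, 224, 3).astype("float32")  # placeholder input
batch_b = np.random.rand(32, 224, 224, 3).astype("float32")  # placeholder input

threads = [
    threading.Thread(target=run_inference, args=(model_a, batch_a, results, "a")),
    threading.Thread(target=run_inference, args=(model_b, batch_b, results, "b")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

(Running the two models in two separate processes instead of two threads would be fine too, if that changes the answer.)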

thanks!

