Concurrent inference with TensorFlow


Looking at the CUDA backend of TensorFlow, it looks like computation is synchronized on a single CUDA stream (which I think makes sense).

This suggests that one could concurrently run inference on two different TensorFlow models sharing the same GPU: each would have its own CUDA stream and execute independently.

Is this in fact a supported use case? In particular, I'm curious whether (1) this is expected to work and (2) there are any performance concerns.

Obviously latency may be a little higher if two models are sharing the same compute resources, but I'm curious whether there are assumptions in TensorFlow that would cause more significant performance penalties.
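For concreteness, here is roughly the kind of setup I have in mind (a minimal sketch using TF 2.x Keras APIs; the toy models, shapes, and the threading approach are just placeholders for illustration):

    import threading
    import numpy as np
    import tensorflow as tf

    # Let TensorFlow grow GPU memory on demand instead of reserving it all,
    # so both models can coexist on the same device.
    for gpu in tf.config.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)

    # Two stand-in models; in my actual use case these are unrelated networks.
    model_a = tf.keras.Sequential([tf.keras.layers.Dense(256, activation="relu"),
                                   tf.keras.layers.Dense(10)])
    model_b = tf.keras.Sequential([tf.keras.layers.Dense(512, activation="relu"),
                                   tf.keras.layers.Dense(10)])

    def run_inference(model, batch, n_iters=100):
        # Each thread repeatedly runs inference on its own model.
        for _ in range(n_iters):
            model(batch, training=False)

    batch_a = np.random.rand(32, 128).astype(np.float32)
    batch_b = np.random.rand(32, 128).astype(np.float32)

    # Drive both models concurrently from separate host threads; they share the GPU.
    t1 = threading.Thread(target=run_inference, args=(model_a, batch_a))
    t2 = threading.Thread(target=run_inference, args=(model_b, batch_b))
    t1.start(); t2.start()
    t1.join(); t2.join()

(Whether the two models actually end up on separate CUDA streams, or whether running them as two separate processes would be preferable, is part of what I'm asking.)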

Thanks!

