Concurrent inference with TensorFlow


Looking at the CUDA backend in TensorFlow, it looks like computation is synchronized on a single CUDA stream (which I think makes sense).

This suggests one could concurrently run inference on two different TensorFlow models sharing the same GPU: each would have its own CUDA stream and execute independently.
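For concreteness, here is a minimal sketch of the kind of setup I have in mind (TF 2.x style; the stand-in Keras models, input shape, and iteration count are placeholders, not the actual models I'm running):

```python
import threading
import numpy as np
import tensorflow as tf

# Let both models share the GPU without one of them pre-allocating all memory.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

# Stand-in models; in practice these would be two different trained models.
model_a = tf.keras.applications.MobileNetV2(weights=None)
model_b = tf.keras.applications.ResNet50(weights=None)

def run_inference(model, batch, n_iters=100):
    # Each thread repeatedly calls its model; the question is whether these
    # calls can overlap on the GPU or end up serialized on a single stream.
    for _ in range(n_iters):
        model(batch, training=False)

batch = np.random.rand(8, 224, 224, 3).astype("float32")

t1 = threading.Thread(target=run_inference, args=(model_a, batch))
t2 = threading.Thread(target=run_inference, args=(model_b, batch))
t1.start(); t2.start()
t1.join(); t2.join()
```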

Is this in fact a supported use case? In particular, I'm curious whether (1) it is expected to work, and (2) there are performance concerns.

Obviously, latency may be a little higher if the two models share the same compute resources, but I'm curious whether there are assumptions in TensorFlow that would cause more significant performance penalties.

Thanks!

