Fix ONNX Session Memory Leak in Face Search API#1050
Conversation
Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
|
|
📝 WalkthroughWalkthroughThe change relocates face detector and face recognition model initialization from upfront creation to a guarded try-catch block, adds dedicated exception handling for initialization failures, and improves resource cleanup by checking None directly instead of inspecting locals. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@backend/app/utils/faceSearch.py`:
- Around line 105-110: The current broad exception handler returns "Failed to
initialize models" for any error in the entire try block; narrow the scope or
make the message generic: either wrap only the model initialization code in its
own try/except and keep the existing initialization-specific message there, and
use a separate try/except (or let exceptions propagate) around calls like
fn.get_embedding(), get_all_face_embeddings(), and the similarity processing
(the block handling cosine similarity and building GetAllImagesResponse), or
change the outer except to return a generic failure message (e.g., "Failed to
process face search") so errors from fn.get_embedding(),
get_all_face_embeddings(), or the similarity logic do not misreport as
initialization failures; reference GetAllImagesResponse, fn.get_embedding(),
get_all_face_embeddings(), and the similarity processing loop when applying the
fix.
|
Hi @rahulharpal1603 I’d appreciate a review, especially around:
Thanks in advance for your time |
Shevilll
left a comment
There was a problem hiding this comment.
Peer Review: ONNX Runtime Session Memory Leak Fix
Outstanding work identifying and fixing this critical resource leak!
In libraries like ONNX Runtime (which wrap native C/C++ runtimes), memory is allocated outside Python's heap and garbage collector. If the native session is not explicitly released via .close(), the allocated memory (both RAM and VRAM) is leaked permanently, eventually leading to Out Of Memory (OOM) crashes under load.
Your fix correctly addresses the scenario where FaceDetector() succeeds but FaceNet() fails during initialization, ensuring that the previously instantiated FaceDetector is closed cleanly.
Pythonic Resource Management: Context Managers
While your updated try...except...finally block resolves the leak, we can make this code much cleaner, safer, and idiomatic using Python's context managers.
Since both FaceDetector and FaceNet have explicit .close() methods, they are prime candidates for the with statement. We can wrap them using Python's standard contextlib.closing helper.
The contextlib.closing Pattern
Here is how you can simplify perform_face_search using with and closing:
from contextlib import closing
import uuid
def perform_face_search(image_path: str) -> GetAllImagesResponse:
"""
Performs face detection, embedding generation, and similarity search.
"""
try:
# Context managers are automatically nested. If FaceDetector succeeds but FaceNet
# fails to initialize, the outer context's __exit__ is executed, cleanly closing the detector.
with closing(FaceDetector()) as fd, closing(FaceNet(DEFAULT_FACENET_MODEL)) as fn:
matches = []
image_id = str(uuid.uuid4())
# Detect faces
try:
result = fd.detect_faces(image_id, image_path, forSearch=True)
except Exception as e:
return GetAllImagesResponse(
success=False,
message=f"Failed to process image: {str(e)}",
data=[],
)
if not result or result["num_faces"] == 0:
return GetAllImagesResponse(
success=True,
message="No faces detected in the image.",
data=[],
)
process_face = result["processed_faces"][0]
new_embedding = fn.get_embedding(process_face)
images = get_all_face_embeddings()
if not images:
return GetAllImagesResponse(
success=True,
message="No face embeddings available for comparison.",
data=[],
)
for image in images:
similarity = FaceNet_util_cosine_similarity(
new_embedding, image["embeddings"]
)
if similarity >= CONFIDENCE_PERCENT:
matches.append(
ImageData(
id=image["id"],
path=image["path"],
folder_id=image["folder_id"],
thumbnailPath=image["thumbnailPath"],
metadata=image["metadata"],
isTagged=image["isTagged"],
tags=image["tags"],
bboxes=image["bbox"],
)
)
return GetAllImagesResponse(
success=True,
message=f"Successfully retrieved {len(matches)} matching images.",
data=matches,
)
except Exception as e:
return GetAllImagesResponse(
success=False,
message=f"Failed to initialize models or perform search: {str(e)}",
data=[],
)Why this is a highly professional improvement:
- No Manual Cleanup: It completely removes the
finallyblock and the manualif fd is not None: fd.close()checks, which are prone to spelling mistakes or missing references. - Perfect Safety on Initialization Failure: In a compound
with A() as a, B() as b:statement, ifA()succeeds butB()throws, Python automatically unwinds the context stack and callsa.__exit__()(which callsfd.close()). - Consolidated Error Boundaries: Any uncaught exception inside the search logic is captured by the outer
except Exception as eand returned as a cleanGetAllImagesResponse(success=False, ...)rather than causing unhandled route failures in the API layer.
Long-Term Recommendation
Consider implementing the __enter__ and __exit__ context manager protocols directly within the FaceDetector and FaceNet classes themselves. This would allow us to omit closing() and write clean native code like:
with FaceDetector() as fd, FaceNet() as fn:
# ...This is the standard API design pattern in Python for resource-heavy objects.
Summary
This PR fixes a critical ONNX Runtime memory leak in the Face Search API that occurred when model initialization failed during request handling.
Previously, if
FaceDetector()was successfully created butFaceNet()failed during initialization, the allocated ONNX inference sessions were never released. This caused permanent CPU/GPU memory leaks and backend crashes under load.This change ensures all ONNX resources are always cleaned up, even when initialization or processing fails.
Problem Description
In
perform_face_search(), model resources were initialized outside thetryblock:FaceDetector()allocates multiple ONNX inference sessionsFaceNet()may throw (missing model file, GPU OOM, corrupted model, etc.)FaceNet()failed, execution never entered thetryblockfinallycleanup block was never executedONNX sessions allocate native memory outside Python GC, so the leak accumulated silently on every failed request.
Impact
This affects production deployments with:
Steps to Reproduce
Scenario 1: Missing FaceNet model