Using Python, OpenCV, and MediaPipe to create a real-time gesture recognition system.
I wanted to build something fun that combined computer vision with interactive media. The result is OpenCV MNKY, a gesture-controlled avatar meme system that swaps images based on your hand gestures in real time.
The system uses three main technologies: Python, OpenCV (video capture and display), and MediaPipe (face and hand detection):
```python
import cv2
import mediapipe as mp

# MediaPipe solution modules for face and hand tracking
mp_face = mp.solutions.face_detection
mp_hands = mp.solutions.hands

# Short-range face model (model_selection=0) suits webcam distances
face_detection = mp_face.FaceDetection(
    model_selection=0,
    min_detection_confidence=0.6
)

# Track up to two hands
hands = mp_hands.Hands(
    max_num_hands=2,
    min_detection_confidence=0.6
)
```
The avatar changes based on what's detected:
| Detection | Avatar | Meaning |
|-----------|--------|---------|
| Face only | mon3.jpg | Neutral state |
| Face + hand | mon2.jpg | Active/waving |
| Finger near mouth | mon1.jpg | "Shh" gesture |
| No detection | Black screen | Nobody home |
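The mapping in the table above boils down to a priority chain: the "shh" gesture wins over hand detection, which wins over face-only. A minimal sketch (the `pick_avatar` helper and its boolean inputs are my own illustration, not code from the repo):

```python
def pick_avatar(face_detected, hand_detected, finger_near_mouth):
    """Map the current detection state to an avatar image.

    Returns the avatar filename, or None for the 'nobody home'
    black screen, mirroring the table above.
    """
    if finger_near_mouth:              # "Shh" gesture takes priority
        return "mon1.jpg"
    if face_detected and hand_detected:
        return "mon2.jpg"              # Active/waving
    if face_detected:
        return "mon3.jpg"              # Neutral state
    return None                        # No detection: black screen

# Example: face and hand visible, finger not at the mouth
print(pick_avatar(True, True, False))  # mon2.jpg
```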
Laptop webcams are usually index 0, but external cameras (Camo, OBS Virtual Camera) have different indices. I implemented auto-detection:
```python
# Probe indices 1-5 for an external/virtual camera
for cam_index in range(1, 6):
    cap = cv2.VideoCapture(cam_index)
    if cap.isOpened():
        print(f"✓ Camera index {cam_index} found!")
        break
    cap.release()  # free the handle before trying the next index
```
People expect to see themselves mirrored (like a real mirror). The solution:
```python
frame = cv2.flip(frame, 1)  # Horizontal flip: mirror the image
```
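For intuition, `cv2.flip(frame, 1)` simply reverses the column order of the image array, which NumPy can express directly (the tiny one-row "frame" here is just illustrative data):

```python
import numpy as np

# A tiny 1x3 single-channel "frame", for illustration only
frame = np.array([[1, 2, 3]])

# cv2.flip(frame, 1) reverses column order; in NumPy that's frame[:, ::-1]
mirrored = frame[:, ::-1]
print(mirrored.tolist())  # [[3, 2, 1]]
```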
This required calculating the distance between finger landmarks and face landmarks:
```python
# Simplified logic: compare vertical positions in pixel space.
# MediaPipe landmarks are normalized to [0, 1], so scale by frame height.
mouth_y = face_landmarks[0].y * frame_height
finger_y = hand_landmarks[8].y * frame_height  # Index finger tip (landmark 8)

if abs(mouth_y - finger_y) < threshold:
    show_shh_avatar()
```
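The simplified check above compares only vertical positions; a fuller version would use the 2D Euclidean distance so a fingertip off to the side doesn't trigger the gesture. A sketch, assuming normalized MediaPipe coordinates (the `finger_near_point` helper and the 40-pixel threshold are my own illustrative choices):

```python
import math

def finger_near_point(finger_xy, target_xy, frame_w, frame_h, threshold_px=40):
    """True if two normalized landmarks are within threshold_px pixels.

    MediaPipe landmark coordinates are normalized to [0, 1], so scale
    by the frame size before measuring distance.
    """
    fx, fy = finger_xy[0] * frame_w, finger_xy[1] * frame_h
    tx, ty = target_xy[0] * frame_w, target_xy[1] * frame_h
    return math.hypot(fx - tx, fy - ty) < threshold_px

# Fingertip almost on top of the mouth point in a 640x480 frame
print(finger_near_point((0.50, 0.62), (0.51, 0.60), 640, 480))  # True
```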
Real-time computer vision is demanding, and keeping the loop responsive took several optimizations.
The final output shows the camera feed and the avatar side by side:

```python
import numpy as np

# Stack the (resized) camera frame and avatar horizontally
combined = np.hstack([frame_resized, avatar_resized])
cv2.imshow('OpenCV MNKY', combined)
```
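Note that `np.hstack` only works when both images share the same height and channel count, which is why the frame and avatar are resized first. A quick shape check with dummy arrays (the 640x480 and 480x480 sizes are assumptions for illustration):

```python
import numpy as np

# Two dummy BGR "images": same height (480) and channels (3), different widths
frame_resized = np.zeros((480, 640, 3), dtype=np.uint8)
avatar_resized = np.zeros((480, 480, 3), dtype=np.uint8)

# Widths add up; height and channel count must already match
combined = np.hstack([frame_resized, avatar_resized])
print(combined.shape)  # (480, 1120, 3)
```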
Ideas for version 2.0:
Repository: github.com/Airyshtoteles/OpenCV_MNKY
MIT Licensed — feel free to fork and create your own gesture-controlled memes!