Pepper Facial Recognition

Part of the Pepper series

Pepper can recognise me!

A Demo:


import qi
from naoqi import ALBroker
from naoqi import ALProxy
import face_recognition
import cv2
import time
import os
import sys
import numpy as np
import pickle
import argparse
import vision_definitions
import traceback
from PIL import Image
IP = ""
PORT = 9559

if __name__ == "__main__":
    # (in the actual script this guard block sits below the idPersons
    # definition, so the function exists before it is called)
    parser = argparse.ArgumentParser()
    parser.add_argument("--ip", type=str, default=IP,
                        help="Robot IP address. On robot or Local Naoqi: use ''.")
    parser.add_argument("--port", type=int, default=PORT,
                        help="Naoqi port number: use default '9559'")
    args = parser.parse_args()
    session = qi.Session()
    try:
        session.connect("tcp://" + args.ip + ":" + str(args.port))
    except RuntimeError:
        print("Can't connect to Naoqi at ip \"" + args.ip + "\" on port " +
              str(args.port) + ".\nPlease check your script arguments. "
              "Run with -h option for help.")
        sys.exit(1)
    idPersons(session, args.ip, args.port)
def idPersons(session, ip=IP, port=PORT):
    # begin idPersons block
    # Obtain the ALVideoDevice service
    videoService = session.service('ALVideoDevice')
    # Subscribe to the top camera
    SID = "pepper_face_recognition"
    resolution = vision_definitions.kQVGA
    colorSpace = vision_definitions.kRGBColorSpace
    nameId = videoService.subscribe(SID, resolution, colorSpace, 10)
  • Obtain the Naoqi video service.
  • Subscribe to the video service for the top camera with the given SID. vision_definitions is a module in the Naoqi SDK. It basically sets the camera to use QVGA (320x240) with RGB colors (3 channels).
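If the vision_definitions module is only available on the robot, it helps to know that its constants are plain integers. A minimal stand-in with the values documented for NAOqi (the helper function and dictionary names are my own):

```python
# Stand-ins for the NAOqi vision_definitions resolution constants
kQQVGA = 0   # 160x120
kQVGA = 1    # 320x240
kVGA = 2     # 640x480
k4VGA = 3    # 1280x960
kRGBColorSpace = 11   # 3-channel RGB

RESOLUTION_SIZES = {kQQVGA: (160, 120), kQVGA: (320, 240),
                    kVGA: (640, 480), k4VGA: (1280, 960)}

def frame_size(resolution):
    """Return (width, height) in pixels for a NAOqi resolution constant."""
    return RESOLUTION_SIZES[resolution]
```

This is also a reminder that the width/height variables used later in the demo (320 and 240) must agree with the resolution requested here.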
    # Obtain the ALTextToSpeech service
    tts = session.service('ALTextToSpeech')
    known_face_encodings = []
    known_face_names = []
    # Load known face names
    with open('encodings_names', 'rb') as fp:
        known_face_names = pickle.load(fp)
    # Load known face encodings
    with open('encodings', 'rb') as fp:
        known_face_encodings = pickle.load(fp)
    # Initialize some variables
    face_locations = []
    face_encodings = []
    face_names = []
    process_this_frame = 0
    width = 320
    height = 240
    scale = 0.25      # shrink factor applied before face detection
    revscale = 4      # scales detected boxes back up to the full frame
    # Note: numpy image arrays are indexed (rows, cols) = (height, width)
    blank_image = np.zeros((height, width, 3), np.uint8)
    image = np.zeros((height, width, 3), np.uint8)
    greeted = []
    while True:
        try:
            # Grab a single frame of video
            result = videoService.getImageRemote(nameId)
            image = None
            if result is None:
                print('cannot capture.')
            elif result[6] is None:
                print('no image data string.')
            else:
                image_string = str(result[6])
                im = Image.frombytes("RGB", (width, height), image_string)
                image = np.asarray(im)
  • try to obtain an image from Pepper
  • initialize the final image variable to None. The None value is tested later to determine whether we have a valid image.
  • the attempt to obtain an image from Pepper can return None or no image string; in that case just print a message to that effect.
  • if an image was obtained, convert the image string to bytes and finally to a numpy array.
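The conversion step can be exercised without a robot. getImageRemote returns a list whose slots 0 and 1 hold width and height and whose slot 6 holds the raw pixel buffer; everything in this sketch is fabricated except that index layout:

```python
from PIL import Image
import numpy as np

# Fake getImageRemote result: width and height at indices 0 and 1,
# raw RGB bytes at index 6; the other fields are irrelevant here.
width, height = 4, 2
raw = bytes(bytearray(range(width * height * 3)))
result = [width, height, None, None, None, None, raw]

im = Image.frombytes("RGB", (result[0], result[1]), result[6])
image = np.asarray(im)
# Note the (rows, cols) order: the resulting array shape is (height, width, 3).
```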
            if image is not None:
                # Only process every other frame of video to save time
                if process_this_frame == 0:
  • test to see if we should process this frame. We only process half of the captured frames.
                    # begin processing-the-frame block
                    # Resize frame of video to 1/4 size for faster face recognition processing
                    small_frame = cv2.resize(image, (0, 0), fx=scale, fy=scale)
                    # Find all the faces and face encodings in the current frame of video
                    face_locations = face_recognition.face_locations(small_frame)
                    face_encodings = face_recognition.face_encodings(small_frame, face_locations)
                    face_names = []
                    name = "none"
                    for face_encoding in face_encodings:
                        # See if the face is a match for the known face(s)
                        distances = face_recognition.face_distance(known_face_encodings, face_encoding)
                        name = "Unknown"
                        blank_image[:, :] = (0, 0, 255)
                        distance_min_index = np.argmin(distances)
                        distance_min = np.amin(distances)
                        if distance_min < 0.53:
                            name = known_face_names[distance_min_index]
                            blank_image[:, :] = (0, 255, 0)
                            if name not in greeted:
                                tts.say("Hi " + name + "! Nice to see you.")
                                greeted.append(name)
                        face_names.append(name)

                    # end of the per-face for loop
  • resize the image to 1/4 size for faster processing
  • get all the locations and encodings for the face(s) in the grabbed image.
  • initialize an empty face_names list and set name to "none" (as opposed to "Unknown"). If name remains "none" later, no faces were found in the image.
  • loop through all the found face encodings
  • now set name to "Unknown", since we found at least one face.
  • initialize a red blank image (OpenCV uses BGR channel order)
  • get the face_recognition distances. A distance is a measure of how far a face is from a known face.
  • get the index of the minimum value in distances
  • get the minimum value in distances
  • if the minimum distance is less than a threshold value, in this case 0.53, then do the following:
  • 1. Obtain the name associated with the face
  • 2. Set the blank image to green
  • 3. If the name does not already exist in the greeted list, greet the user by name and add the name to the greeted list.
  • add the name to the face_names list
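The matching rule described above (nearest known encoding wins, but only below the 0.53 threshold) can be isolated into a plain-numpy sketch. The Euclidean distance here mirrors what face_recognition.face_distance computes; the function name and toy 3-d vectors are my own, standing in for real 128-d encodings:

```python
import numpy as np

def best_match(known_encodings, known_names, encoding, threshold=0.53):
    """Return the name of the closest known encoding, or "Unknown"."""
    distances = np.linalg.norm(np.asarray(known_encodings) - encoding, axis=1)
    i = np.argmin(distances)
    return known_names[i] if distances[i] < threshold else "Unknown"

# Toy 3-d "encodings" instead of the real 128-d vectors
known = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
names = ["Alice", "Bob"]
```

A probe near Alice's vector matches her; a probe equidistant from both (distance about 0.87 to each) falls outside the threshold and stays "Unknown".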
                    if name == "none":
                        blank_image[:, :] = (0, 0, 255)
                # end of the "process this frame" block
                process_this_frame += 1
                process_this_frame = process_this_frame % 2
  • if name is "none" then make the blank image red
  • increment the process_this_frame by one and mod it with 2
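The counter behaves like a flip-flop: frames are processed only when it is 0, so exactly every other frame does the heavy detection work. A quick check of that claim:

```python
process_this_frame = 0
processed = []
for frame_number in range(6):
    processed.append(process_this_frame == 0)   # would this frame be processed?
    process_this_frame += 1
    process_this_frame = process_this_frame % 2
# processed alternates True, False, True, False, ...
```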
                # Display the results
                # Convert the image to BGR color (which OpenCV uses) from RGB color (which face_recognition uses)
                frame = image[:, :, ::-1].copy()
                font = cv2.FONT_HERSHEY_DUPLEX
                for (top, right, bottom, left), name in zip(face_locations, face_names):
                    # Scale back up face locations since the frame we detected in was scaled to 1/4 size
                    top *= revscale
                    right *= revscale
                    bottom *= revscale
                    left *= revscale
                    # Red box with white text for unknown faces, green box with black text for known faces
                    if name == "Unknown":
                        color = (0, 0, 255)
                        write_color = (255, 255, 255)
                    else:
                        color = (0, 255, 0)
                        write_color = (0, 0, 0)
                    # Draw a box around the face
                    cv2.rectangle(frame, (left, top), (right, bottom), color, 2)
                    # Draw a label with a name below the face
                    cv2.rectangle(frame, (left, bottom + 70), (right, bottom), color, cv2.FILLED)
                    cv2.putText(frame, name, (left + 6, bottom + 29), font, 1.0, write_color, 1)
                # Display the resulting image reduced to 3/4 size
                frame_resized = cv2.resize(frame, (0, 0), fx=0.75, fy=0.75)
                cv2.imshow('Video', frame_resized)
                cv2.imshow('Access', blank_image)
            # end of the "if image is not None" block
            key = cv2.waitKey(1) & 0xFF
            # Hit 'q' on the keyboard to quit!
            if key == ord('q'):
                break
            # Hit 'r' to reload the encoding files
            if key == ord('r'):
                with open('encodings_names', 'rb') as fp:
                    known_face_names = pickle.load(fp)
                with open('encodings', 'rb') as fp:
                    known_face_encodings = pickle.load(fp)
                print('reread encodings')
# End of while loop block
  • resize the image
  • draw a bounding box around each face (red for unknown, green for known)
  • draw the label (red with white text for unknown, green with black text for known)
  • display the resulting image reduced to 3/4 size
  • if the 'q' key is pressed, break out of the while loop
  • if the 'r' key is pressed, reload the initial encoding files.
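The box-coordinate bookkeeping is worth double-checking: detection runs on a frame shrunk by scale = 0.25, so the (top, right, bottom, left) values it reports must be multiplied by revscale = 4 before drawing on the full 320x240 image. A standalone sketch (the sample coordinates are made up):

```python
scale = 0.25
revscale = int(round(1 / scale))   # 4

# A face box reported by the detector on the shrunk (80x60) frame
top, right, bottom, left = 10, 70, 40, 30

# Scale it back up to full-frame (320x240) coordinates
top, right, bottom, left = (v * revscale for v in (top, right, bottom, left))
```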
        # end of the top-most try block
        except Exception:
            traceback.print_exc()
    # Clean up after leaving the loop
    videoService.unsubscribe(nameId)
    cv2.destroyAllWindows()
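The demo assumes the files 'encodings' and 'encodings_names' already exist. As used above, they are a pair of parallel pickled lists: 128-d face encodings in one file and the matching names in the other. A minimal sketch of writing and reading them — the sample names and vectors are invented; in practice the vectors come from face_recognition.face_encodings on reference photos of each person:

```python
import pickle

# Toy stand-ins for real 128-d face encodings
known_face_encodings = [[0.0] * 128, [1.0] * 128]
known_face_names = ["Alice", "Bob"]

with open('encodings', 'wb') as fp:
    pickle.dump(known_face_encodings, fp)
with open('encodings_names', 'wb') as fp:
    pickle.dump(known_face_names, fp)

# Reload, exactly as the demo does at startup (and on the 'r' key)
with open('encodings_names', 'rb') as fp:
    names = pickle.load(fp)
with open('encodings', 'rb') as fp:
    encodings = pickle.load(fp)
```

Keeping the two files in the same order is essential: the index of the minimum distance is used to look up the name.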





Thoughts about emerging technologies and some of the challenges related to them. The technology itself usually is not the problem (or the solution).
