I am starting with the pose estimation TFLite model for detecting keypoints on humans:
https://www.tensorflow.org/lite/models/pose_estimation/overview
I have started by feeding in a single image of a person and invoking the model:
```python
import cv2 as cv
import numpy as np
import tensorflow as tf

img = cv.imread('photos/standing/3.jpg')
img = tf.reshape(tf.image.resize(img, [257, 257]), [1, 257, 257, 3])

model = tf.lite.Interpreter('models/posenet_mobilenet_v1_100_257x257_multi_kpt_stripped.tflite')
model.allocate_tensors()
input_details = model.get_input_details()
output_details = model.get_output_details()

# Floating-point models expect inputs normalized to [-1, 1]
floating_model = input_details[0]['dtype'] == np.float32
if floating_model:
    img = (np.float32(img) - 127.5) / 127.5

model.set_tensor(input_details[0]['index'], img)
model.invoke()

output_data = model.get_tensor(output_details[0]['index'])  # heatmaps
offset_data = model.get_tensor(output_details[1]['index'])  # offsets
results = np.squeeze(output_data)
offsets_results = np.squeeze(offset_data)
print("output shape: {}".format(output_data.shape))
np.savez('sample3.npz', results, offsets_results)
```
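One detail I'm unsure about: `cv.imread` returns channels in BGR order, and my assumption is that the model expects RGB, so the channels may need flipping before normalization. A minimal sketch of that preprocessing on a dummy array (standing in for the real image):

```python
import numpy as np

# Dummy 257x257 BGR image standing in for cv.imread's output
# (assumption: the model was trained on RGB input, so flip channel order)
bgr = np.random.randint(0, 256, (257, 257, 3), dtype=np.uint8)
rgb = bgr[..., ::-1]                       # BGR -> RGB
inp = (np.float32(rgb) - 127.5) / 127.5    # same [-1, 1] scaling as above
inp = inp[np.newaxis, ...]                 # add batch dimension
print(inp.shape)  # (1, 257, 257, 3)
```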
but I am struggling with parsing the output correctly to get the coordinates/confidences of each body part. Does anyone have a Python example for interpreting this model's results? (For example: using them to map keypoints back to the original image.)
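To make the decoding I'm attempting concrete, here is my current understanding sketched on dummy arrays (assumptions, based on my reading of the PoseNet docs: the squeezed heatmap is `(9, 9, 17)` raw scores that need a sigmoid to become confidences, and the squeezed offsets are `(9, 9, 34)` with y-offsets in channels 0–16 and x-offsets in channels 17–33):

```python
import numpy as np

# Dummy model outputs standing in for the real heatmaps/offsets
rng = np.random.default_rng(0)
heatmaps = rng.standard_normal((9, 9, 17))
offsets = rng.standard_normal((9, 9, 34)) * 10

height, width, num_keypoints = heatmaps.shape
coords = []
for k in range(num_keypoints):
    # argmax over the 9x9 grid for this keypoint's channel
    row, col = np.unravel_index(np.argmax(heatmaps[..., k]), (height, width))
    confidence = 1.0 / (1.0 + np.exp(-heatmaps[row, col, k]))  # sigmoid
    # scale the grid cell to 257x257 image coordinates, then add the offset
    y = row / (height - 1) * 257 + offsets[row, col, k]
    x = col / (width - 1) * 257 + offsets[row, col, k + num_keypoints]
    coords.append((y, x, confidence))

print(len(coords))  # 17
```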
My code (a snippet from a class which essentially takes the np array directly from the model output):
```python
def get_keypoints(self, data):
    height, width, num_keypoints = data.shape
    keypoints = []
    for keypoint in range(num_keypoints):
        maxval = data[0][0][keypoint]
        maxrow = 0
        maxcol = 0
        for row in range(height):
            for col in range(width):
                if data[row][col][keypoint] > maxval:
                    maxrow = row
                    maxcol = col
                    maxval = data[row][col][keypoint]
        keypoints.append(KeyPoint(keypoint, maxrow, maxcol, maxval))
    return keypoints

def get_image_coordinates_from_keypoints(self, offsets):
    height, width, depth = (257, 257, 3)
    # [(x, y, confidence)]
    coords = [{
        'point': k.body_part,
        'location': (k.x / (width - 1) * width + offsets[k.y][k.x][k.index],
                     k.y / (height - 1) * height + offsets[k.y][k.x][k.index]),
        'confidence': k.confidence,
    } for k in self.keypoints]
    return coords
```
After matching the indexes to the body parts, my output is:
Some of the coordinates here are negative, which can't be correct. Where is my mistake?