Human: 3D Face Detection, Body Pose, Hand & Finger Tracking, Iris Tracking and Age & Gender Prediction

URL: https://github.com/vladmandic/human

Suggestions are welcome!

Credits

This is an amalgamation of multiple existing models:

Face Detection: MediaPipe BlazeFace
Facial Spatial Geometry: MediaPipe FaceMesh
Eye Iris Details: MediaPipe Iris
Hand Detection & Skeleton: MediaPipe HandPose
Body Pose Detection: PoseNet
Age & Gender Prediction: SSR-Net

Install

npm install @vladmandic/human

All pre-trained models are included in folder /models (25MB total)

Demo

Demo is included in /demo

Requirements

Human library is based on TensorFlow/JS (TFJS), but does not package it, to allow for independent version management. Import tfjs before importing Human.

Usage

Human library does not require special initialization. All configuration is done in a single JSON object, and model weights are loaded dynamically on first use: Human never loads weights for modules that the configuration does not need.
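The load-on-first-use behavior can be pictured with a small sketch. This is only an illustration of the idea, not the library's actual loader: getModel, fakeLoad and the loaded cache are hypothetical names, and a real loader would fetch the weights from modelPath asynchronously.

```javascript
// Illustrative lazy loader: weights are loaded the first time an enabled module needs them.
const loaded = {};   // hypothetical cache of already loaded models
let loadCount = 0;   // counts how often weights were actually loaded

// Stand-in for a real weight loader (assumption: real code would fetch modelPath).
function fakeLoad(modelPath) {
  loadCount += 1;
  return { modelPath };
}

function getModel(name, moduleConfig) {
  if (!moduleConfig.enabled) return null; // disabled modules never load weights
  if (!loaded[name]) loaded[name] = fakeLoad(moduleConfig.modelPath); // first use only
  return loaded[name];
}

const bodyConfig = { enabled: true, modelPath: '/models/human/posenet/model.json' };
const irisConfig = { enabled: false, modelPath: '/models/human/iris/model.json' };

getModel('body', bodyConfig); // loads the weights
getModel('body', bodyConfig); // reuses the cached model
getModel('iris', irisConfig); // disabled, nothing is loaded
console.log(loadCount); // 1
```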

There is only ONE method you need:

import * as tf from '@tensorflow/tfjs';
import human from '@vladmandic/human';

// 'image': any image-like input: HTMLImage, HTMLVideo, HTMLMedia, Canvas, Tensor4D
// 'options': optional object used to override any options present in the default configuration
const results = await human.detect(image, options);

Additionally, Human library exposes two objects:

human.defaults // default configuration object
human.models   // dynamically maintained object of any loaded models

Configuration

Below is the output of the human.defaults object.
Any property can be overridden by passing a user object to human.detect().
Note that the user object and the default configuration are combined using deep-merge, so you do not need to redefine the entire configuration.

human.defaults = {
  face: {
    enabled: true,
    detector: {
      modelPath: '/models/human/blazeface/model.json',
      maxFaces: 10,
      skipFrames: 5,
      minConfidence: 0.8,
      iouThreshold: 0.3,
      scoreThreshold: 0.75,
    },
    mesh: {
      enabled: true,
      modelPath: '/models/human/facemesh/model.json',
    },
    iris: {
      enabled: true,
      modelPath: '/models/human/iris/model.json',
    },
    age: {
      enabled: true,
      modelPath: '/models/human/ssrnet-imdb-age/model.json',
      skipFrames: 5,
    },
    gender: {
      enabled: true,
      modelPath: '/models/human/ssrnet-imdb-gender/model.json',
    },
  },
  body: {
    enabled: true,
    modelPath: '/models/human/posenet/model.json',
    maxDetections: 5,
    scoreThreshold: 0.75,
    nmsRadius: 20,
  },
  hand: {
    enabled: true,
    skipFrames: 5,
    minConfidence: 0.8,
    iouThreshold: 0.3,
    scoreThreshold: 0.75,
    detector: {
      anchors: '/models/human/handdetect/anchors.json',
      modelPath: '/models/human/handdetect/model.json',
    },
    skeleton: {
      modelPath: '/models/human/handskeleton/model.json',
    },
  },
};
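Because the user object is deep-merged with the defaults, an override only needs to list the properties that change. The sketch below illustrates that behavior with a hypothetical deepMerge helper; it is not the library's own merge code, and the defaults object here is a trimmed-down stand-in.

```javascript
// Hypothetical helper: recursively merge `overrides` into a copy of `defaults`.
// Illustration only, not the library's actual merge implementation.
function deepMerge(defaults, overrides) {
  const isObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);
  const merged = { ...defaults };
  for (const [key, value] of Object.entries(overrides || {})) {
    merged[key] = isObject(value) && isObject(defaults[key])
      ? deepMerge(defaults[key], value) // descend into nested sections
      : value;                          // primitives and arrays replace the default
  }
  return merged;
}

// Trimmed-down stand-in for human.defaults (values copied from the listing above).
const defaults = {
  face: { enabled: true, detector: { maxFaces: 10, skipFrames: 5 }, age: { enabled: true } },
  body: { enabled: true, maxDetections: 5 },
};

// User object: only the properties that differ from the defaults.
const userConfig = {
  face: { detector: { maxFaces: 1 }, age: { enabled: false } },
  body: { enabled: false },
};

const config = deepMerge(defaults, userConfig);
console.log(config.face.detector); // { maxFaces: 1, skipFrames: 5 }
```

An object shaped like userConfig is what would be passed as the second argument of human.detect().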

Where:

enabled: controls whether the specified module is enabled (note: a module is not loaded until it is required)
modelPath: path to the specific pre-trained model weights
maxFaces, maxDetections: maximum number of faces or people to analyze; limiting the number in busy scenes results in higher performance
skipFrames: how many frames to skip before re-running bounding box detection (e.g., face position does not move fast within a video, so it's ok to use previously detected face position and just run face geometry analysis)
minConfidence: threshold for discarding a prediction
iouThreshold: threshold for deciding whether boxes overlap too much in non-maximum suppression
scoreThreshold: threshold for deciding when to remove boxes based on score in non-maximum suppression
nmsRadius: radius for deciding points are too close in non-maximum suppression
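To make iouThreshold and scoreThreshold more concrete, here is a minimal sketch of greedy non-maximum suppression over boxes in [x, y, width, height] form. It is a generic illustration under assumed names (iou, nonMaxSuppression, maxResults), not the code the models run internally, and the candidate boxes are made up.

```javascript
// Intersection-over-union of two boxes given as [x, y, width, height].
function iou(a, b) {
  const x1 = Math.max(a[0], b[0]);
  const y1 = Math.max(a[1], b[1]);
  const x2 = Math.min(a[0] + a[2], b[0] + b[2]);
  const y2 = Math.min(a[1] + a[3], b[1] + b[3]);
  const intersection = Math.max(0, x2 - x1) * Math.max(0, y2 - y1);
  const union = a[2] * a[3] + b[2] * b[3] - intersection;
  return union > 0 ? intersection / union : 0;
}

// Greedy non-maximum suppression (illustrative sketch).
function nonMaxSuppression(candidates, { iouThreshold, scoreThreshold, maxResults }) {
  const sorted = candidates
    .filter((c) => c.score >= scoreThreshold) // drop low-scoring boxes first
    .sort((a, b) => b.score - a.score);       // best candidates first
  const kept = [];
  for (const candidate of sorted) {
    if (kept.length >= maxResults) break;
    // keep a box only if it does not overlap an already kept box too much
    if (kept.every((k) => iou(k.box, candidate.box) <= iouThreshold)) kept.push(candidate);
  }
  return kept;
}

// Made-up candidates for illustration.
const candidates = [
  { box: [10, 10, 100, 100], score: 0.95 },
  { box: [15, 12, 100, 100], score: 0.90 }, // near-duplicate of the first box
  { box: [300, 40, 80, 80], score: 0.85 },  // a separate detection
  { box: [500, 500, 50, 50], score: 0.40 }, // below scoreThreshold
];

const kept = nonMaxSuppression(candidates, { iouThreshold: 0.3, scoreThreshold: 0.75, maxResults: 10 });
console.log(kept.map((c) => c.score)); // [ 0.95, 0.85 ]
```

Raising iouThreshold lets more overlapping boxes survive, while raising scoreThreshold discards more weak candidates before overlap is even considered.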

Outputs

Result of human.detect() is a single object that includes data for all enabled modules and all detected objects:

result = {
  face: // <array of detected objects>
  [
    {
      confidence:  // <number>
      box:         // <array [x, y, width, height]>
      mesh:        // <array of points [x, y, z]> (468 base points & 10 iris points)
      annotations: // <list of object { landmark: array of points }> (32 base annotated landmarks & 2 iris annotations)
      iris:        // <number> (relative distance of iris to camera, multiply by focal length to get actual distance)
      age:         // <number> (estimated age)
      gender:      // <string> (male or female)
    }
  ],
  body: // <array of detected objects>
  [
    {
      score:       // <number>,
      keypoints:   // <array of landmarks [ score, landmark, position [x, y] ]> (17 annotated landmarks)
    }
  ],
  hand:            // <array of detected objects>
  [
    {
      confidence:  // <number>,
      box:         // <array [x, y, width, height]>,
      landmarks:   // <array of points [x, y, z]> (21 points)
      annotations: // <array of landmarks [ landmark: <array of points> ]> (5 annotated landmarks)
    }
  ]
}
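As a rough sketch of how the result object might be consumed, the snippet below walks the three arrays and builds a one-line summary per detection. The summarize helper is hypothetical, and the mock result uses made-up values that merely follow the structure described above.

```javascript
// Hypothetical helper: turn a detect() result into short text lines.
function summarize(result) {
  const lines = [];
  for (const face of result.face || []) {
    const [x, y, width, height] = face.box;
    lines.push(`face at (${x}, ${y}) ${width}x${height}, ${face.gender}, age ~${Math.round(face.age)}`);
  }
  for (const body of result.body || []) {
    lines.push(`body score ${body.score.toFixed(2)}`);
  }
  for (const hand of result.hand || []) {
    lines.push(`hand confidence ${hand.confidence.toFixed(2)}, ${hand.landmarks.length} landmarks`);
  }
  return lines;
}

// Mock result with made-up values, shaped like the structure above.
const mockResult = {
  face: [{ confidence: 0.98, box: [120, 80, 200, 220], mesh: [], annotations: {}, iris: 0.02, age: 31.4, gender: 'female' }],
  body: [{ score: 0.91, keypoints: [] }],
  hand: [], // no hands detected in this mock frame
};

const lines = summarize(mockResult);
lines.forEach((line) => console.log(line));
```

A module that is disabled or detects nothing simply contributes no lines, so the same loop works for any configuration.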

Performance

Performance varies with your hardware, the number of enabled modules, and their parameters.
For example, on a low-end NVIDIA GTX 1050, face detection alone runs at 50+ FPS, but the rate drops to <5 FPS when all modules are enabled.
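To compare configurations on your own hardware, a simple timing loop is usually enough. The sketch below is one possible approach: measureFps is a hypothetical helper, and fakeDetect is a stand-in that waits about 5 ms per call so the example runs without a model or GPU.

```javascript
// Hypothetical benchmark helper: calls an async detect function `runs` times
// and reports the average time per frame and the resulting frames per second.
async function measureFps(detectFn, input, runs = 10) {
  const now = () => (globalThis.performance ? globalThis.performance.now() : Date.now());
  const start = now();
  for (let i = 0; i < runs; i++) await detectFn(input); // frames are processed one at a time
  const avgMs = (now() - start) / runs;
  return { runs, avgMs, fps: avgMs > 0 ? 1000 / avgMs : Infinity };
}

// Stand-in for human.detect: resolves with an empty result after roughly 5 ms.
const fakeDetect = () => new Promise((resolve) => {
  setTimeout(() => resolve({ face: [], body: [], hand: [] }), 5);
});

const benchmark = measureFps(fakeDetect, null, 5);
benchmark.then(({ avgMs, fps }) => {
  console.log(`${avgMs.toFixed(1)} ms/frame, ${fps.toFixed(1)} FPS`);
});
```

Replacing fakeDetect with a function such as (image) => human.detect(image, options) would let you time different option sets, for example face detection only versus all modules enabled.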

Todo

Improve detection of smaller faces, add BlazeFace back model
Create demo, host it on gitpages
Implement draw helper functions
Sample Images
Rename human to human
