# Models

## Default Models in Human Library

Default models in the `Human` library are:
- Face Detection: MediaPipe BlazeFace Back variation
- Face Mesh: MediaPipe FaceMesh
- Face Iris Analysis: MediaPipe Iris
- Face Description: HSE FaceRes
- Emotion Detection: Oarriaga Emotion
- Body Analysis: MoveNet Lightning variation
- Hand Analysis: HandTrack combined with MediaPipe Hands
- Object Detection: MB3 CenterNet (not enabled by default)
- Body Segmentation: Google Selfie (not enabled by default)
- Face Anti-Spoofing: Real-or-Fake (not enabled by default)
- Face Live Detection: Liveness (not enabled by default)
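The models marked *not enabled by default* above can be turned on via configuration. A minimal sketch follows; the property names assume the `Human` configuration schema and may differ between library versions, so verify them against the library's configuration reference:

```javascript
// Sketch: enabling the optional, default-disabled models listed above.
// Property names assume the Human configuration schema -- treat them as
// illustrative if your library version differs.
const humanConfig = {
  face: {
    enabled: true,
    antispoof: { enabled: true }, // Face Anti-Spoofing: Real-or-Fake
    liveness: { enabled: true },  // Face Live Detection: Liveness
  },
  object: { enabled: true },       // Object Detection: MB3 CenterNet
  segmentation: { enabled: true }, // Body Segmentation: Google Selfie
};

// Typically passed when constructing the library instance, e.g.:
// const human = new Human(humanConfig);
```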
## Optional Models in Human Library

`Human` ships with its default models, but also supports a number of additional models and variations of existing models. Additional models can be accessed in two ways.

To use alternative models from a local host:
- download them from either GitHub or npmjs, and either
- set the per-model configuration value `modelPath` for each model, or
- set the global configuration value `baseModelPath` to the location of the downloaded models

To use alternative models from a CDN, use the location prefix `https://www.jsdelivr.com/package/npm/@vladmandic/human-models/models/` for either the `modelPath` or `baseModelPath` configuration value.
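The two options can be sketched as configuration fragments. The `baseModelPath` and `modelPath` keys come from the text above; the exact nesting (e.g. `modelPath` under `body`) is an assumption to be checked against the Human configuration reference:

```javascript
// Option A: global override -- point every model at one location,
// here the CDN prefix given above (a local folder path works the same way).
const cdnConfig = {
  baseModelPath: 'https://www.jsdelivr.com/package/npm/@vladmandic/human-models/models/',
};

// Option B: per-model override via modelPath (nesting under `body` is an
// assumption; the filename is resolved relative to baseModelPath).
const perModelConfig = {
  body: { modelPath: 'movenet-thunder.json' },
};
```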
## Changes

All models are modified from their original implementations in the following manner:
- Input pre-processing: image enhancements, normalization, etc.
- Caching: custom caching operations that bypass specific model runs when no changes are detected
- Output parsing: custom analysis of heatmaps into regions, normalization of output values, etc.
- Output interpolation: custom smoothing operations
- Model modifications:
  - Model definition: reformatted for readability, with conversion notes and correct signatures added
  - Model weights: quantized to 16-bit float values for size reduction

Models are not re-trained, so any bias present in the original models is also present in `Human`.
For any possible bias notes, see the specific model cards.
## Using Alternatives

`Human` includes implementations for several alternative models, which can be switched on-the-fly while keeping a standardized input and `results` object structure.
Switching a model also automatically switches the implementation used inside `Human`, so it is critical to keep model filenames in their original form.
`Human` bundles all default models, while alternative models are kept in a separate repository due to size considerations and must be downloaded manually from https://github.com/vladmandic/human-models
Body detection can be switched from `PoseNet` to `BlazePose`, `EfficientPose` or `MoveNet` depending on the use case:

- `PoseNet`: works with multiple people in frame and with only partially visible people; best described as works-anywhere, but not with great precision
- `MoveNet-Lightning`: works with a single person in frame and with only partially visible people; a modernized and optimized version of `PoseNet` with a different model architecture
- `MoveNet-Thunder`: a variation of `MoveNet` with higher precision but slower processing
- `EfficientPose`: works with a single person in frame and with only partially visible people; an experimental model that shows future promise but is not ready for widespread usage due to performance
- `BlazePose`: works with a single person in frame, and that person should be fully visible; if those conditions are met, it returns far more detail (39 vs 17 keypoints), is far more accurate, and returns a 3D approximation of each point instead of 2D
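Because the implementation is selected from the model filename, switching body models amounts to pointing `modelPath` at a different (unrenamed) model file. A sketch, assuming `modelPath` nests under `body` in the Human configuration schema:

```javascript
// Sketch: one config per body-model alternative; filenames must stay in
// their original form so Human can pick the matching implementation.
const bodyAlternatives = {
  posenet:   { body: { enabled: true, modelPath: 'posenet.json' } },
  movenet:   { body: { enabled: true, modelPath: 'movenet-lightning.json' } },
  blazepose: { body: { enabled: true, modelPath: 'blazepose-lite.json' } },
};

// Pick one, e.g. for a single fully visible person with 3D keypoints:
const chosen = bodyAlternatives.blazepose;
```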
Face description can be switched from the default combined model `FaceRes` to individual models:

- Gender Detection: `Oarriaga Gender`
- Age Detection: `SSR-Net Age IMDB`
- Face Embedding: `BecauseofAI MobileFace Embedding`

Object detection can be switched from `centernet` to `nanodet`.
Hand detection can be switched from `handdetect` to `handtrack`.
Body segmentation can be switched from `rvm` to `selfie` or `meet`.
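The same filename-based switching applies to the remaining modules. A sketch, with nesting assumed from the Human configuration schema (`object.modelPath`, `hand.detector.modelPath`, `segmentation.modelPath`); the filenames come from the model table in this page:

```javascript
// Sketch: alternative models for object detection, hand detection and
// body segmentation, selected by model filename.
const altConfig = {
  object:       { enabled: true, modelPath: 'nanodet.json' },                 // instead of centernet
  hand:         { enabled: true, detector: { modelPath: 'handtrack.json' } }, // instead of handdetect
  segmentation: { enabled: true, modelPath: 'selfie.json' },                  // instead of rvm
};
```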
## List of All Models Included in Human Library
Model Name | Model Definition Size | Model Definition | Weights Size | Weights Name | Num Tensors | Resolution |
---|---|---|---|---|---|---|
Anti-Spoofing | 8K | antispoof.json | 834K | antispoof.bin | 11 | |
BecauseofAI MobileFace | 33K | mobileface.json | 2.1M | mobileface.bin | 75 | 112x112 |
EfficientPose | 134K | efficientpose.json | 5.6M | efficientpose.bin | 217 | 368x368 |
FaceBoxes | 212K | faceboxes.json | 2.0M | faceboxes.bin | 350 | 0x0 |
FaceRes | 70K | faceres.json | 6.7M | faceres.bin | 128 | 224x224 |
FaceRes (Deep) | 62K | faceres.json | 13.9M | faceres.bin | 128 | 224x224 |
GEAR Predictor (Gender/Emotion/Age/Race) | 28K | gear.json | 1.5M | gear.bin | 25 | 198x198 |
Google Selfie | 82K | selfie.json | 208K | selfie.bin | 136 | 256x256 |
Hand Tracking | 605K | handtrack.json | 2.9M | handtrack.bin | 619 | 320x320 |
Liveness | 17K | liveness.json | 580K | liveness.bin | 23 | 32x32 |
MB3-CenterNet | 197K | nanodet.json | 1.9M | nanodet.bin | 267 | 128x128 |
MediaPipe BlazeFace (Front) | 51K | blazeface-front.json | 323K | blazeface-front.bin | 73 | 128x128 |
MediaPipe BlazeFace (Back) | 78K | blazeface-back.json | 527K | blazeface-back.bin | 112 | 256x256 |
MediaPipe BlazePose (Lite) | 132K | blazepose-lite.json | 2.6M | blazepose-lite.bin | 177 | 256x256 |
MediaPipe BlazePose (Full) | 145K | blazepose-full.json | 6.6M | blazepose-full.bin | 193 | 256x256 |
MediaPipe BlazePose (Heavy) | 305K | blazepose-heavy.json | 27.0M | blazepose-heavy.bin | 400 | 256x256 |
MediaPipe BlazePose Detector (2D) | 129K | blazepose-detector2d.json | 7.2M | blazepose-detector2d.bin | 180 | 224x224 |
MediaPipe BlazePose Detector (3D) | 132K | blazepose-detector3d.json | 5.7M | blazepose-detector3d.bin | 181 | 224x224 |
MediaPipe FaceMesh | 94K | facemesh.json | 1.5M | facemesh.bin | 120 | 192x192 |
MediaPipe FaceMesh with Attention | 889K | facemesh-attention.json | 2.3M | facemesh-attention.bin | 1061 | 192x192 |
MediaPipe Hand Landmark (Full) | 81K | handlandmark-full.json | 5.4M | handlandmark-full.bin | 112 | 224x224 |
MediaPipe Hand Landmark (Lite) | 82K | handlandmark-lite.json | 2.0M | handlandmark-lite.bin | 112 | 224x224 |
MediaPipe Hand Landmark (Sparse) | 88K | handlandmark-sparse.json | 5.3M | handlandmark-sparse.bin | 112 | 224x224 |
MediaPipe HandPose (HandDetect) | 126K | handdetect.json | 6.8M | handdetect.bin | 152 | 256x256 |
MediaPipe HandPose (HandSkeleton) | 127K | handskeleton.json | 5.3M | handskeleton.bin | 145 | 256x256 |
MediaPipe Iris | 120K | iris.json | 2.5M | iris.bin | 191 | 64x64 |
MediaPipe Meet | 94K | meet.json | 364K | meet.bin | 163 | 144x256 |
MediaPipe Selfie | 82K | selfie.json | 208K | selfie.bin | 136 | 256x256 |
MoveNet-Lightning | 158K | movenet-lightning.json | 4.5M | movenet-lightning.bin | 180 | 192x192 |
MoveNet-MultiPose | 235K | movenet-thunder.json | 9.1M | movenet-thunder.bin | 303 | 256x256 |
MoveNet-Thunder | 158K | movenet-thunder.json | 12M | movenet-thunder.bin | 178 | 256x256 |
NanoDet | 255K | nanodet.json | 7.3M | nanodet.bin | 229 | 416x416 |
Oarriaga Emotion | 18K | emotion.json | 802K | emotion.bin | 23 | 64x64 |
Oarriaga Gender | 30K | gender.json | 198K | gender.bin | 39 | 64x64 |
HSE-AffectNet | 47K | affectnet-mobilenet.json | 6.7M | affectnet-mobilenet.bin | 64 | 224x224 |
PoseNet | 47K | posenet.json | 4.8M | posenet.bin | 62 | 385x385 |
Sirius-AI MobileFaceNet | 125K | mobilefacenet.json | 5.0M | mobilefacenet.bin | 139 | 112x112 |
SSR-Net Age (IMDB) | 93K | age.json | 158K | age.bin | 158 | 64x64 |
SSR-Net Gender (IMDB) | 92K | gender-ssrnet-imdb.json | 158K | gender-ssrnet-imdb.bin | 157 | 64x64 |
Robust Video Matting | 600K | rvm.json | 3.6M | rvm.bin | 425 | 512x512 |
Note: all model definition JSON files are formatted for human readability.
## Credits
- Age & Gender Prediction: SSR-Net
- Anti-Spoofing: Real-or-Fake
- Body Pose Detection: BlazePose
- Body Pose Detection: EfficientPose
- Body Pose Detection: MoveNet
- Body Pose Detection: PoseNet
- Body Segmentation: MediaPipe Meet
- Body Segmentation: MediaPipe Selfie
- Body Segmentation: Robust Video Matting
- Emotion Prediction: Oarriaga
- Emotion Prediction: HSE-AffectNet
- Eye Iris Details: MediaPipe Iris
- Face Description: HSE-FaceRes
- Face Detection: MediaPipe BlazeFace
- Face Embedding: BecauseofAI MobileFace
- Face Embedding: DeepInsight InsightFace
- Facial Spatial Geometry: MediaPipe FaceMesh
- Facial Spatial Geometry with Attention: MediaPipe FaceMesh Attention Variation
- Gender, Emotion, Age, Race Prediction: GEAR Predictor
- Hand Detection & Skeleton: MediaPipe HandPose
- Hand Tracking: HandTracking
- Image Filters: WebGLImageFilter
- Object Detection: MB3-CenterNet
- Object Detection: NanoDet
- Pinto Model Zoo: Pinto
Included models are provided under the license inherited from the original model source.
The model code has changed substantially from its source, such that it is considered a derivative work and not simple re-publishing.