Model Specification (Deprecated)
The model specification is deprecated and will be replaced by the PlayTorch API for the JavaScript Interface (JSI).
A PlayTorch model consists of two components: (1) a model file saved in the PyTorch "lite" interpreter format; and (2) a JSON file describing the model's input and output types. The JSON file is embedded in the model file itself as an extra file named model/live.spec.json.
Example of preparing a model with an embedded specification:
from pathlib import Path
import torch
import torchvision
from torch.utils.mobile_optimizer import optimize_for_mobile
# Get the original PyTorch model and convert it to mobile-optimized
# TorchScript.
model = torchvision.models.mobilenet_v3_small(pretrained=True)
model.eval()
script_model = torch.jit.script(model)
script_model_opt = optimize_for_mobile(script_model)
# Read the live.spec.json file and embed it into the model file.
spec = Path("live.spec.json").read_text()
extra_files = {}
extra_files["model/live.spec.json"] = spec
script_model_opt._save_for_lite_interpreter("model_with_spec.ptl", _extra_files=extra_files)
The model/live.spec.json file is a valid JSON file containing two top-level objects: pack and unpack. It may also contain other root-level objects that are used by both the pack (input preprocessing) and unpack (model output post-processing) steps.
The JavaScript side invokes the model's forward method with a plain JavaScript object whose members have predefined types (Image, double, integer, string). model/live.spec.json contains "$key" stubs that are replaced at run time with the corresponding values from that JavaScript object.
Example:
{
"pack": {
"type": "tuple",
"items": [
{
"type": "tensor_from_image",
"image": "image",
"transforms": [
{
"type": "image_to_image",
"name": "center_crop",
"width": "$cropWidth",
"height": "$cropHeight"
},
{
"type": "image_to_image",
"name": "scale",
"width": "$scaleWidth",
"height": "$scaleHeight"
},
{
"type": "image_to_tensor",
"name": "rgb_norm",
"mean": [0.0, 0.0, 0.0],
"std": [1.0, 1.0, 1.0]
}
]
},
{
"type": "tensor",
"dtype": "float",
"sizes": [1, 3],
"items": [
"$scaleWidth",
"$scaleHeight",
"$scale"
]
},
{
"type": "tensor",
"dtype": "float",
"sizes": [
1
],
"items": [
"$should_run_track"
]
},
{
"type": "tensor",
"dtype": "float",
"sizes": ["$rois_n", 4],
"items": "$rois"
}
]
},
"unpack": {
"type": "tensor",
"dtype": "float",
"key": "scores"
}
}
The corresponding JavaScript for this spec:
const {
result: {scores: scores},
inferenceTime: time,
} = await MobileModel.execute(modelInfo.model, {
image: image,
cropWidth: 448,
cropHeight: 448,
scaleWidth: 224,
scaleHeight: 224,
scale: 1.0,
rois_n: 3,
rois: [0, 0, 20, 20, 10, 10, 50, 50, 30, 30, 60, 60], // 3 ROIs with 4 coordinates each, matching sizes ["$rois_n", 4]
should_run_track: 0.0
});
Pack - Input preprocessing
The input preprocessing required by the model is specified by the pack object. Every object in pack has a type field; all other fields are specific to that type. A sketch combining several of these types follows the list of supported types below.
Types supported for "pack"
- tuple (currently supported on Android only)
  - items: array of the tuple items
- scalar_bool (currently supported on Android only)
  - value: true or false
- scalar_long (currently supported on Android only)
  - value: long value
- scalar_double (currently supported on Android only)
  - value: double value
- tensor (currently supported on Android only)
  - dtype: data type of the tensor ("float" or "long")
  - items: array of tensor data of the specified dtype
- tensor_from_image
  - image: JavaScript image object
  - transforms: array of chained transformations of type ImageTransform (see below) applied to the input image
- tensor_from_string
  - tokenizer:
    - bert: Prepares a tensor of dtype long containing token ids, using a BERT vocabulary. The vocabulary used to encode inputs must be stored in the top-level key vocabulary_bert in the spec JSON object. It should be a string of BERT tokens separated by \n.
    - gpt2: Prepares a tensor of dtype long containing token ids, using a GPT-2 vocabulary. The vocabulary used to encode inputs must be stored in the top-level key vocabulary_gpt2 in the spec JSON object. It should be a JSON object mapping vocabulary terms to the corresponding tokenId.
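Here is the sketch referenced above, combining several of these types. The spec fragment and the JavaScript values are illustrative assumptions (a model taking an image tensor, a bool flag, and a long count), and it is assumed here that scalar value fields accept "$key" stubs the same way the tensor fields in the earlier example do:
"pack": {
  "type": "tuple",
  "items": [
    {
      "type": "tensor_from_image",
      "image": "image",
      "transforms": [
        {
          "type": "image_to_tensor",
          "name": "rgb_norm",
          "mean": [0.0, 0.0, 0.0],
          "std": [1.0, 1.0, 1.0]
        }
      ]
    },
    {"type": "scalar_bool", "value": "$useTracking"},
    {"type": "scalar_long", "value": "$topK"}
  ]
}
The matching JavaScript call would then pass plain values for the stubs:
const {result} = await MobileModel.execute(modelInfo.model, {
  image: image,
  useTracking: true,
  topK: 5,
});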
Type ImageTransform
- type: "image_to_image" or "image_to_tensor"
- name: the name of the transformation
- additional parameters specific to the particular type and name
image_to_image type:
- name: center_crop
  Crops the center part of the image with the specified width and height.
  Parameters:
  - width: width of the resulting cropped image
  - height: height of the resulting cropped image
- name: scale
  Scales the input image to the specified width and height.
  Parameters:
  - width: width of the resulting scaled image
  - height: height of the resulting scaled image
image_to_tensor type:
- name: rgb_norm
  The output is an NCHW tensor built from the input image, normalized by the specified mean and std.
  Parameters:
  - mean: array of 3 float numbers with the mean for normalization (one value per channel)
  - std: array of 3 float numbers with the std for normalization (one value per channel)
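For instance, for the mobilenet_v3_small model prepared at the top of this page (and most torchvision classification models), rgb_norm would typically use the standard ImageNet statistics rather than the zero-mean, unit-std placeholder values used in the example above:
{
  "type": "image_to_tensor",
  "name": "rgb_norm",
  "mean": [0.485, 0.456, 0.406],
  "std": [0.229, 0.224, 0.225]
}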
Unpack - Output post-processing
The result of model output post-processing is a plain JavaScript object, referred to below as output_jsmap.
The unpack object is a recursive structure of objects of predefined types.
Types supported for "unpack"
- tuple (currently supported on Android only)
  - items: an array of unpack objects, one per tuple item to unpack (see the sketch after this list)
- list (currently supported on Android only)
  - items: an array of unpack objects, one per list item to unpack
- dict_string_key (currently supported on Android only)
  - items: an array of objects of the form {"dict_key": <string value>}, where each dict_key is a string key into a dictionary returned by the model. The unpacked values are the dictionary entries selected by each dict_key.
- tensor
  - key: key in output_jsmap for the array of the specified data type containing the tensor items in NCHW format
  - dtype: data type of the tensor ("float" or "long")
- scalar_long (currently supported on Android only)
  - key: key of the long value in output_jsmap
- scalar_float (currently supported on Android only)
  - key: key of the double value in output_jsmap
- scalar_bool (currently supported on Android only)
  - key: key of the bool value in output_jsmap
- string
  - key: key of the string in output_jsmap
- tensor_to_string (currently supported on Android only)
  - key: key of the result string in output_jsmap
  - decoder:
    - gpt2: Expects a tensor of long data type containing tokenIds. The vocabulary used to decode results must be stored in the top-level key vocabulary_gpt2 in the spec JSON object. It should be a JSON object mapping vocabulary terms to the corresponding tokenId.
- bert_decode_qa_answer
  - key: key of the result string in output_jsmap. The vocabulary used to decode results must be stored in the top-level key vocabulary_bert in the spec JSON object. It should be a string of BERT tokens separated by \n.
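As referenced in the list above, a minimal sketch of nesting composite unpack types, assuming a hypothetical model that returns a (boxes, scores) tuple; the key names are illustrative:
"unpack": {
  "type": "tuple",
  "items": [
    {"type": "tensor", "dtype": "float", "key": "boxes"},
    {"type": "tensor", "dtype": "float", "key": "scores"}
  ]
}
With this spec, both tensors would be available on the JavaScript side as result.boxes and result.scores.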
Examples
A BERT question-answering spec:
{
"vocabulary_bert": "[PAD]\n[unused0]\n[unused1]\n[unused2]\n[unused3]\n[unused4]\n[unused5]\n...",
"pack": {
"type": "tensor_from_string",
"tokenizer": "bert",
"string": "$string",
"model_input_length": "$model_input_length"
},
"unpack": {
"type": "bert_decode_qa_answer",
"key": "bert_answer"
}
}
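A corresponding JavaScript call might look as follows. This is a sketch: the input string format (question and context joined with BERT separator tokens) and the model_input_length value are illustrative assumptions for a question-answering model:
const {
  result: {bert_answer: answer},
} = await MobileModel.execute(modelInfo.model, {
  string: '[CLS] Where is Paris? [SEP] Paris is the capital of France. [SEP]',
  model_input_length: 360,
});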
A GPT-2 text-generation spec:
{
"vocabulary_gpt2": { "!": 0, "\"": 1, "#": 2, "$": 3, "%": 4, "&": 5, ... ,"<|endoftext|>": 50256},
"pack": {
"type": "tensor_from_string",
"tokenizer": "gpt2",
"string": "$string"
},
"unpack": {
"type": "tensor_to_string",
"decoder": "gpt2",
"key": "text"
}
}
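And a sketch of a corresponding call for the GPT-2 spec; the prompt string is an illustrative assumption:
const {
  result: {text: generatedText},
} = await MobileModel.execute(modelInfo.model, {
  string: 'The meaning of life is',
});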