# Using Custom Compute Images
In the Quickstart tutorial on compute jobs, we ran a very simple job, relying on a pre-existing Docker image (`bash`).
There are many common tasks that cannot be easily shoehorned into this format. Running a Python, Nodejs, or Java program typically requires at least the interpreter as a dependency, if not other packages as well. Many programs also depend on various static data files.
In this section, we will look at how to use containers to bundle up a more complex compute job with dependencies and run it on Parcel.
For experienced Docker users: The Parcel compute environment is
OCI-compatible, and using custom Docker images is easy. Specify any Docker image
in the `image` field of the `submitJob({ ... })` call. Since the rest of this
tutorial mostly covers Docker basics, you can skip at least the next section.
# Background: Containers
Containers are an easy way to bundle applications and data, and then run them
reliably across a variety of systems. Docker is the best-known tool for building
and running containers, but it is not the only one. OCI prescribes a widely
adopted set of standards for encoding and running containers. Docker is
OCI-compatible, and so is Parcel. In other words, you can run most Docker
containers on Parcel, with transparent access to Parcel Documents. As a
reminder, documents are mounted inside the container under `/parcel/data/in`.
# Sample App: Skin Cancer "detector"
In this tutorial, we will build a compute job that runs a TensorFlow classifier on user-supplied images of skin lesions (i.e. parts of the skin with unusual color/markings), predicts the type of lesion (melanoma, benign keratosis, ...), and reports the results back to the user.
We will pretend to be Acme, a company that provides privacy-conscious skin lesion diagnosis. Although Acme runs the ML model and computes the diagnoses, Acme doesn't get access to the inputs (skin images) or outputs (diagnoses) at any time. Only the model gains access in the narrow context of the Parcel Compute job.
# Building the Classification Program
We will use TensorFlow on Python with a pre-trained model to classify input images. While the model is trained on actual medical images, we use it here for demonstration purposes only and make no claims about its accuracy or usefulness.
Without much ado, below is the program with inline comments. You do not need to fully understand it to continue with this tutorial. The key take-aways are that:
- it's a Python/TensorFlow program, so it needs the Python interpreter and TensorFlow libraries to run, and
- it depends on a problem-specific resource (the `model.h5` file containing the ML model).
```python
import sys
from functools import partial

import tensorflow as tf

# Read cmdline parameters. We skip error checking in this simplified example.
input_path = sys.argv[1]
output_path = sys.argv[2]

# Load the model from 'model.h5'. The model is a data dependency: The code
# assumes that the file is present in the current workdir.
top_k = tf.keras.metrics.top_k_categorical_accuracy
model = tf.keras.models.load_model(
    "./model.h5",
    custom_objects={
        'top_2_accuracy': partial(top_k, k=2),
        'top_3_accuracy': partial(top_k, k=3),
    },
)

# Load the user image.
im = tf.image.decode_jpeg(tf.io.read_file(input_path))

# Preprocess the image to match expected model input format.
im = tf.image.resize(im, [224, 224])
im = (im - 127.5) / 127.5

# Create a batch containing a single element: our `im`.
ims = tf.expand_dims(im, 0)

# Run the model, get predictions. yhats[0] is a vector of classification scores
# for each of the 7 output classes.
yhats = model.predict(ims)

# In a typical data-processing setting, the `yhats` array would be the final
# output. For demo purposes, we will instead output a human-readable description
# of the highest-scoring class.
max_class = tf.math.argmax(yhats[0])
LABELS = [
    'Actinic Keratoses and Intraepithelial Carcinoma',
    'Basal Cell Carcinoma',
    'Benign Keratosis',
    'Dermatofibroma',
    'Melanoma',
    'Melanocytic Nevi',
    'Vascular Lesions',
]
with open(output_path, 'w') as f:
    f.write(f"This might be an image of {LABELS[max_class]}.\n")
```
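The preprocessing step above rescales 8-bit pixel values into the [-1, 1] range the model expects. A quick sanity check of that arithmetic, using plain Python in place of TensorFlow tensors:

```python
def normalize(pixel):
    """Map an 8-bit pixel value (0..255) to the model's [-1, 1] input range,
    mirroring the (im - 127.5) / 127.5 step in predict.py."""
    return (pixel - 127.5) / 127.5

print(normalize(0))      # -1.0 (black maps to the lower bound)
print(normalize(127.5))  # 0.0  (mid-gray maps to the center)
print(normalize(255))    # 1.0  (white maps to the upper bound)
```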
WARNING
Parcel workers currently do not provide execution on a GPU so this TensorFlow job runs on a CPU. Please contact us at feedback@oasislabs.com if your use case requires GPUs.
# Building the Container
We will base our Docker image on the official TensorFlow image. This is
convenient because it has Python and TensorFlow preinstalled. Then we will also
bundle in our program from above (`predict.py`) and the ML model (`model.h5`).
Here is the Dockerfile:
```dockerfile
FROM tensorflow/tensorflow:2.4.1

# Prepare work directory.
RUN mkdir /acme_skin
WORKDIR /acme_skin

# Add the ML model. The `model.h5` file we need is inside a zip.
RUN apt -qq update \
    && apt -qq install unzip \
    && curl -sL https://github.com/uyxela/Skin-Lesion-Classifier/raw/3ca96c925cc140e6391d5cdfeb1e1ab026ee670f/model.zip > model.zip \
    && unzip model.zip \
    && rm model.zip

# Add our python script.
COPY predict.py .

# The prescribed way to use this image is to invoke predict.py with arbitrary parameters.
ENTRYPOINT ["python", "predict.py"]
```
EXAMPLE
The Parcel team has already built the image above and pushed it to Docker Hub
as `oasislabs/acme-derma-demo`. The image is world-readable so you can pull it
and run this example as-is.
If you want to repeat that particular step yourself (building the image), or if
you want to make changes to any of the files in the image, you will need to
register your own image with Docker Hub. For example, if you register your
modified image as `acme/derma-demo`, move to the folder where your Dockerfile
is located, and run the following:
```shell
docker build -t acme/derma-demo .
docker push acme/derma-demo
```
Inside Parcel, the image will run on Linux on AMD64. If you are building and
testing your Docker image on a different platform (e.g. Windows, macOS, or
Apple M1), make sure to build for (at least) the `linux/amd64` platform. See
the Docker docs and blog post for details on multi-platform builds.
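For example, on an arm64 host (such as an Apple M1 machine), a cross-platform build and push might look like the following sketch, assuming Docker's `buildx` plugin is set up on your machine:

```shell
# Build the image for linux/amd64 even though the host is arm64,
# and push the result to the registry in one step.
docker buildx build --platform linux/amd64 -t acme/derma-demo --push .
```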
TIP
You can use any other Docker registry and not just Docker Hub. The only requirement is that the image is world-readable so the Parcel compute workers can access it. Support for private images is on our roadmap. Please contact us at feedback@oasislabs.com if your use case requires them.
# Local Testing
One of Docker's great strengths is that it provides a highly reproducible
environment regardless of where the docker image is running. Therefore, to test
our solution above, we do not need to use Parcel (yet). We can run locally; we
only need to simulate the /parcel/data/in
and /parcel/data/out
directories
that Parcel would otherwise create for us.
To do so, we can prepare a local directory structure:
```
test_parcel_workdir/
└── data/
    ├── in/
    │   └── basal_cell_carcinoma_example.jpg
    └── out/
```
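If you are following along, the directory skeleton can be created with a one-liner; afterwards, copy a sample skin-lesion JPEG of your own into `test_parcel_workdir/data/in/`:

```shell
# Create the simulated Parcel work directory (input and output subdirectories).
mkdir -p test_parcel_workdir/data/in test_parcel_workdir/data/out
```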
We then mount this directory into our container using the `-v` flag when running the program:
```shell
docker run --rm \
  -v $PWD/test_parcel_workdir:/parcel,noexec \
  oasislabs/acme-derma-demo \
  /parcel/data/in/basal_cell_carcinoma_example.jpg /parcel/data/out/prediction.txt
```
The above
- creates a container from the `oasislabs/acme-derma-demo` image,
- mounts the `./test_parcel_workdir` directory as `/parcel` inside the container, and
- runs our program (`python ...`; implied by the `ENTRYPOINT`) inside the container with custom arguments. We chose `prediction.txt` as the output path.
You might have noticed the `noexec` keyword. This tells Docker that files inside
the workdir cannot be executed. The Parcel Worker imposes the same restriction;
this makes it harder for job owners to sneak malicious/unaudited code into the
job.
Our program does not output to the console, so the `docker run` output is not
very informative either. We instead need to look into our local directory to
see the output file:
```shell
cat test_parcel_workdir/data/out/prediction.txt
```
```
This might be an image of Basal Cell Carcinoma.
```
Success!
This is much faster than going through Parcel (or any other cloud solution), and especially helpful during development.
# Running in Parcel
Once our image is working locally, running it as a Parcel job is straightforward, and much like the Running Compute Jobs tutorial. We first upload a mock user document; if Acme were a real company with real customers, they would upload such documents themselves:
```typescript
const parcelBob = new Parcel({
  clientId: process.env.BOB_SERVICE_CLIENT_ID!,
  privateKey: {
    kid: 'bob-service-client',
    use: 'sig',
    kty: 'EC',
    crv: 'P-256',
    alg: 'ES256',
    x: 'kbhoJYKyOgY645Y9t-Vewwhke9ZRfLh6_TBevIA6SnQ',
    y: 'SEu0xuCzTH95-q_-FSZc-P6hCSnq6qH00MQ52vOVVpA',
    d: '10sS7lgM_YWxf79x21mWalCkAcZZOmX0ZRE_YwEXcmc',
  },
});
const bobId = (await parcelBob.getCurrentIdentity()).id;

// Upload a document and give Acme access to it.
console.log('Uploading input document as Bob.');
const skinDocument = await parcelBob.uploadDocument(
  await fs.promises.readFile('docker/test_workdir/data/in/basal_cell_carcinoma_example.jpg'),
  { details: { title: 'User-provided skin image' }, toApp: undefined },
).finished;
await parcelBob.createGrant({
  grantee: process.env.ACME_APP_ID! as AppId,
  condition: {
    $and: [
      { 'document.id': { $eq: skinDocument.id } },
      { 'job.spec.image': { $eq: 'oasislabs/acme-derma-demo' } },
    ],
  },
});
```
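The grant condition above is a conjunction: Acme's app may access the document only when both the document id and the job's image match. To illustrate the matching logic of this MongoDB-style condition format (an illustration only, not Parcel's actual implementation), here is a toy evaluator supporting just the `$and` and `$eq` operators used above:

```python
def matches(condition, context):
    """Return True if `context` (a flat dict of attribute paths to values)
    satisfies `condition`. Supports only the $and and $eq operators."""
    if '$and' in condition:
        return all(matches(sub, context) for sub in condition['$and'])
    # A leaf condition has the shape {'attribute.path': {'$eq': value}}.
    (attr, op), = condition.items()
    return context.get(attr) == op['$eq']

condition = {
    '$and': [
        {'document.id': {'$eq': 'DOC123'}},
        {'job.spec.image': {'$eq': 'oasislabs/acme-derma-demo'}},
    ],
}

# Matching document and image: the grant applies.
print(matches(condition, {'document.id': 'DOC123',
                          'job.spec.image': 'oasislabs/acme-derma-demo'}))  # True
# Same document, different image: the grant does not apply.
print(matches(condition, {'document.id': 'DOC123',
                          'job.spec.image': 'someone/other-image'}))  # False
```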
Then we submit the job. In the snippet below we use the pre-built
`oasislabs/acme-derma-demo` image. If you built and deployed your own Docker
image in the previous section, feel free to use it instead:
```typescript
// Define the job.
const jobSpec: JobSpec = {
  name: 'skin-prediction',
  image: 'oasislabs/acme-derma-demo',
  inputDocuments: [{ mountPath: 'skin.jpg', id: skinDocument.id }],
  outputDocuments: [{ mountPath: 'prediction.txt', owner: bobId }],
  cmd: ['python', 'predict.py', '/parcel/data/in/skin.jpg', '/parcel/data/out/prediction.txt'],
  memory: '2G',
};
```
Notice how the information contained in the job request is almost identical to
what we passed to `docker` in the Local Testing section; only the syntax is
different.
EXAMPLE
The full Node.js code that submits the job and all accompanying files using the
`oasislabs/acme-derma-demo` image can be found in the Parcel Examples
repository.