# Using Custom Compute Images

In the Quickstart tutorial on compute jobs, we ran a very simple job, relying on a pre-existing Docker image (bash).

Many common tasks cannot easily be shoehorned into this format. Running a Python, Node.js, or Java program typically requires at least the interpreter or runtime as a dependency, and often other packages as well. Many programs also depend on static data files.

In this section, we will look at how to use containers to bundle up a more complex compute job with dependencies and run it on Parcel.

For experienced Docker users: the Parcel compute environment is OCI-compatible, so using custom Docker images is easy. Specify any Docker image in the `image` field of the `submitJob({ ... })` call. The rest of this tutorial deals mostly with Docker; at the very least, you can skip the next section.

# Background: Containers

Containers are an easy way to bundle applications and data, and then run them reliably across a variety of systems. Docker is the best-known tool for building and running containers, but it is not the only one. OCI prescribes a widely adopted set of standards for encoding and running containers. Docker is OCI-compatible, and so is Parcel. In other words, you can run most Docker containers on Parcel, with transparent access to Parcel Documents. As a reminder, documents are mounted inside the container under /parcel/data/in.

# Sample App: Skin Cancer "detector"

In this tutorial, we will build a compute job that runs a TensorFlow classifier on user-supplied images of skin lesions (i.e. parts of the skin with unusual color/markings), predicts the type of lesion (melanoma, benign keratosis, ...), and reports the results back to the user.

We will pretend to be Acme, a company that provides privacy-conscious skin lesion diagnosis. Although Acme runs the ML model and computes the diagnoses, Acme doesn't get access to the inputs (skin images) or outputs (diagnoses) at any time. Only the model gains access in the narrow context of the Parcel Compute job.

# Building the Classification Program

We will use TensorFlow on Python with a pre-trained model to classify input images. While the model is trained on actual medical images, we use it here for demonstration purposes only and make no claims about its accuracy or usefulness.

Without further ado, here is the program with inline comments. You do not need to fully understand it to continue with this tutorial. The key takeaways are that:

  • it's a Python/TensorFlow program, so it needs the Python interpreter and TensorFlow libraries to run, and
  • it depends on a problem-specific resource (the model.h5 file containing the ML model).
```python
import sys
from functools import partial

import tensorflow as tf

# Read cmdline parameters. We skip error checking in this simplified example.
input_path = sys.argv[1]
output_path = sys.argv[2]

# Load the model from 'model.h5'. The model is a data dependency: the code
# assumes that the file is present in the current working directory.
top_k = tf.keras.metrics.top_k_categorical_accuracy
model = tf.keras.models.load_model(
    "./model.h5",
    custom_objects={
        'top_2_accuracy': partial(top_k, k=2),
        'top_3_accuracy': partial(top_k, k=3)
    }
)

# Load the user image. Force 3 (RGB) channels so that grayscale JPEGs
# also match the model's expected input shape.
im = tf.image.decode_jpeg(tf.io.read_file(input_path), channels=3)

# Preprocess the image to match expected model input format:
# resize to 224x224 and scale pixel values from [0, 255] to [-1, 1].
im = tf.image.resize(im, [224, 224])
im = (im - 127.5) / 127.5

# Create a batch containing a single element: our `im`.
ims = tf.expand_dims(im, 0)

# Run the model, get predictions. yhats[0] is a vector of classification scores for
# each of the 7 output classes.
yhats = model.predict(ims)

# In a typical data-processing setting, the `yhats` array would be the final
# output. For demo purposes, we will instead output a human-readable description
# of the highest-scoring class. int() turns the scalar tensor into a list index.
max_class = int(tf.math.argmax(yhats[0]))
LABELS = [
    'Actinic Keratoses and Intraepithelial Carcinoma',
    'Basal Cell Carcinoma',
    'Benign Keratosis',
    'Dermatofibroma',
    'Melanoma',
    'Melanocytic Nevi',
    'Vascular Lesions'
]
with open(output_path, 'w') as f:
    f.write(f"This might be an image of {LABELS[max_class]}.\n")
```
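Two steps of the program are plain arithmetic: scaling pixels from [0, 255] to [-1, 1], and picking the label of the highest-scoring class. The sketch below reimplements just those two steps in plain Python so they can be checked without TensorFlow or the model. The helper names `scale_pixel` and `top_label` are hypothetical and are not part of `predict.py`, and the scores in the usage lines are made-up example values, not real model output.

```python
from typing import Sequence

# Same label order as in predict.py.
LABELS = [
    'Actinic Keratoses and Intraepithelial Carcinoma',
    'Basal Cell Carcinoma',
    'Benign Keratosis',
    'Dermatofibroma',
    'Melanoma',
    'Melanocytic Nevi',
    'Vascular Lesions',
]


def scale_pixel(value: float) -> float:
    """Map a pixel value from [0, 255] to [-1, 1], as (im - 127.5) / 127.5 does."""
    return (value - 127.5) / 127.5


def top_label(scores: Sequence[float]) -> str:
    """Return the label of the highest-scoring class (the role of tf.math.argmax)."""
    if len(scores) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} scores, got {len(scores)}")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return LABELS[best]


print(scale_pixel(0), scale_pixel(127.5), scale_pixel(255))  # -1.0 0.0 1.0
print(top_label([0.05, 0.70, 0.10, 0.02, 0.08, 0.03, 0.02]))  # Basal Cell Carcinoma
```

The length check guards against a model with a different number of output classes, which would otherwise silently map scores to the wrong labels.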

WARNING

Parcel workers currently do not provide GPUs, so this TensorFlow job runs on the CPU. Please contact us at feedback@oasislabs.com if your use case requires GPUs.

# Building the Container

We will base our Docker image on the official TensorFlow image, which is convenient because it comes with Python and TensorFlow preinstalled. On top of it, we bundle our program from above (`predict.py`) and the ML model (`model.h5`).

Here is the Dockerfile:

```dockerfile
FROM tensorflow/tensorflow:2.4.1

# Prepare the work directory (WORKDIR creates it if it does not exist).
WORKDIR /acme_skin

# Add the ML model. The `model.h5` file we need is inside a zip.
# -y keeps apt-get non-interactive; curl -f fails the build on HTTP errors.
RUN apt-get -qq update \
    && apt-get -qq install -y unzip \
    && rm -rf /var/lib/apt/lists/* \
    && curl -fsSL https://github.com/uyxela/Skin-Lesion-Classifier/raw/3ca96c925cc140e6391d5cdfeb1e1ab026ee670f/model.zip > model.zip \
    && unzip model.zip \
    && rm model.zip

# Add our Python script.
COPY predict.py .

# Running the image invokes predict.py; arguments given to the container are appended.
ENTRYPOINT ["python", "predict.py"]
```

EXAMPLE

The Parcel team has already built the image above and pushed it to Docker Hub as oasislabs/acme-derma-demo. The image is world-readable so you can pull it and run this example as-is.

If you want to repeat the build step yourself, or if you want to change any of the files in the image, you will need to push the image to a Docker Hub repository of your own. For example, if your repository is `acme/derma-demo`, change to the directory that contains your `Dockerfile` and `predict.py`, log in with `docker login`, and run the following:

```shell
docker build -t acme/derma-demo .
docker push acme/derma-demo
```

Inside Parcel, the image runs on Linux on AMD64 (x86-64). If you are building and testing your Docker image on a different platform (e.g., Windows, macOS, or an ARM-based machine such as an Apple M1), make sure to build for (at least) the `linux/amd64` platform, for example by passing `--platform linux/amd64` to `docker build`. See the Docker docs and blog post on multi-platform builds for details.
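The build-context and platform requirements above can be folded into a small pre-build helper. This is a sketch under stated assumptions: it only checks for the two files the Dockerfile above needs in its build context (`Dockerfile` and `predict.py`; `model.zip` is downloaded during the build), it treats `platform.machine()` values of `x86_64`/`AMD64` as native amd64 hosts, and it adds `--platform linux/amd64` otherwise. `docker_build_command` is a hypothetical name; the helper returns the command instead of running it.

```python
import platform
from pathlib import Path
from typing import List, Optional

# Files the Dockerfile above expects in its build context.
REQUIRED_FILES = ("Dockerfile", "predict.py")
# Lower-cased platform.machine() values of native amd64 hosts (Linux/macOS vs. Windows spelling).
AMD64_MACHINES = {"x86_64", "amd64"}


def docker_build_command(context: Path, tag: str, machine: Optional[str] = None) -> List[str]:
    """Return a `docker build` argv for `context`, cross-building for linux/amd64 if needed."""
    missing = [name for name in REQUIRED_FILES if not (context / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing from build context {context}: {', '.join(missing)}")
    machine = (machine or platform.machine()).lower()
    cmd = ["docker", "build", "-t", tag]
    if machine not in AMD64_MACHINES:
        # Parcel runs images on linux/amd64, so e.g. Apple M1 hosts must cross-build.
        cmd += ["--platform", "linux/amd64"]
    return cmd + [str(context)]
```

For example, `" ".join(docker_build_command(Path("."), "acme/derma-demo"))` on an Apple M1 machine yields a command that includes `--platform linux/amd64`, which you can then run before `docker push`.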

TIP

You are not limited to Docker Hub; any Docker registry works. The only requirement is that the image be world-readable so that the Parcel compute workers can pull it. Support for private images is on our roadmap. Please contact us at feedback@oasislabs.com if your use case requires them.

# Local Testing

One of Docker's great strengths is that it provides a highly reproducible environment regardless of where the Docker image runs. Therefore, we do not need Parcel (yet) to test our image. We can run it locally; we only need to simulate the `/parcel/data/in` and `/parcel/data/out` directories that Parcel would otherwise create for us.

To do so, we can prepare a local directory structure:

```
test_parcel_workdir/
└── data/
    ├── in/
    │   └── basal_cell_carcinoma_example.jpg
    └── out/
```

and mount it into our container using the -v flag when running the program:

```shell
docker run --rm \
  -v "$PWD/test_parcel_workdir:/parcel" \
  oasislabs/acme-derma-demo \
  /parcel/data/in/basal_cell_carcinoma_example.jpg /parcel/data/out/prediction.txt
```

The above command

  • creates a container from the oasislabs/acme-derma-demo image (and removes it when it exits, thanks to `--rm`),
  • mounts the local ./test_parcel_workdir directory as /parcel inside the container, and
  • runs our program (`python predict.py`, implied by the ENTRYPOINT) inside the container with custom arguments. We chose prediction.txt as the output path.

One difference from the real environment: the Parcel Worker mounts job data so that files inside it cannot be executed (noexec). This makes it harder for job owners to sneak malicious or unaudited code into the job. The local mount above does not enforce this restriction, so make sure your program never tries to execute anything under /parcel.
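If you run this test often, the setup and invocation above can be scripted. The Python sketch below creates the `data/in` and `data/out` layout, copies a local image into `data/in`, and builds the same `docker run` argument list as the command above. The function names are hypothetical, and the sketch only builds the command; running it (e.g. with `subprocess.run(cmd, check=True)`) requires Docker to be installed locally.

```python
import shutil
from pathlib import Path
from typing import List


def make_test_workdir(root: Path, image: Path) -> str:
    """Create root/data/in and root/data/out, copy `image` into data/in.

    Returns the path under which the container will see the image.
    """
    in_dir = root / "data" / "in"
    in_dir.mkdir(parents=True, exist_ok=True)
    (root / "data" / "out").mkdir(parents=True, exist_ok=True)
    shutil.copyfile(image, in_dir / image.name)
    return f"/parcel/data/in/{image.name}"


def docker_run_command(root: Path, image_name: str, container_input: str, output_name: str) -> List[str]:
    """Build the same `docker run` argv as the shell command above."""
    return [
        "docker", "run", "--rm",
        # Mount the local workdir where Parcel would provide its data: /parcel.
        "-v", f"{root.resolve()}:/parcel",
        image_name,
        # Arguments appended to the image's ENTRYPOINT (python predict.py).
        container_input,
        f"/parcel/data/out/{output_name}",
    ]
```

For example, `docker_run_command(root, "oasislabs/acme-derma-demo", make_test_workdir(root, Path("basal_cell_carcinoma_example.jpg")), "prediction.txt")` with `root = Path("test_parcel_workdir")` reproduces the command above, assuming the sample JPEG sits in the current directory.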

Our program does not output to the console, so docker run output is not very informative either. We instead need to look into our local directory to see the output file:

```shell
$ cat test_parcel_workdir/data/out/prediction.txt
This might be an image of Basal Cell Carcinoma.
```

Success!

This is much faster than going through Parcel (or any other cloud solution), and especially helpful during development.
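During development, you may want to check the result automatically instead of reading it by eye. The sketch below reads the output file and extracts the predicted label. It assumes the exact sentence format that `predict.py` above writes (`This might be an image of <label>.`), and `read_prediction` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Matches the single line predict.py writes, e.g. "This might be an image of Melanoma."
PREDICTION_RE = re.compile(r"^This might be an image of (?P<label>.+)\.$")


def read_prediction(path: Path) -> str:
    """Return the label from a prediction file written by predict.py."""
    line = path.read_text().strip()
    match = PREDICTION_RE.match(line)
    if match is None:
        raise ValueError(f"unexpected output in {path}: {line!r}")
    return match.group("label")
```

For the sample image above, `read_prediction(Path("test_parcel_workdir/data/out/prediction.txt"))` should return `"Basal Cell Carcinoma"`; if you change the sentence in `predict.py`, update the pattern too.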

# Running in Parcel

Once our image works locally, running it as a Parcel job is straightforward and much like the Running Compute Jobs tutorial. First, Bob (our mock user) uploads a document and grants Acme access to it; if Acme were a real company with real customers, the customers would upload such documents themselves:

```typescript
import * as fs from 'fs';
import Parcel, { AppId, JobSpec } from '@oasislabs/parcel';

// Bob's Parcel client, authenticated with his service client's key.
const parcelBob = new Parcel({
  clientId: process.env.BOB_SERVICE_CLIENT_ID!,
  privateKey: {
    kid: 'bob-service-client',
    use: 'sig',
    kty: 'EC',
    crv: 'P-256',
    alg: 'ES256',
    x: 'kbhoJYKyOgY645Y9t-Vewwhke9ZRfLh6_TBevIA6SnQ',
    y: 'SEu0xuCzTH95-q_-FSZc-P6hCSnq6qH00MQ52vOVVpA',
    d: '10sS7lgM_YWxf79x21mWalCkAcZZOmX0ZRE_YwEXcmc',
  },
});
const bobId = (await parcelBob.getCurrentIdentity()).id;

// Upload a document and give Acme access to it.
console.log('Uploading input document as Bob.');
const skinDocument = await parcelBob.uploadDocument(
  await fs.promises.readFile('docker/test_workdir/data/in/basal_cell_carcinoma_example.jpg'),
  { details: { title: 'User-provided skin image' }, toApp: undefined },
).finished;
// The grant applies only to this document, and only to jobs running this image.
await parcelBob.createGrant({
  grantee: process.env.ACME_APP_ID! as AppId,
  condition: {
    $and: [
      { 'document.id': { $eq: skinDocument.id } },
      { 'job.spec.image': { $eq: 'oasislabs/acme-derma-demo' } },
    ],
  },
});
```
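To see what this condition expresses, here is a deliberately simplified Python model of evaluating a condition built from `$and` and `$eq` against a job that requests a document. This is not Parcel's implementation: the flat context keyed by dotted selectors such as `'job.spec.image'`, the support for only these two operators, and the name `matches` are all assumptions made for illustration. It shows the intent: the grant applies only when both the document ID and the job's image match.

```python
from typing import Any, Dict


def matches(condition: Dict[str, Any], context: Dict[str, Any]) -> bool:
    """Simplified model of the grant condition above (NOT Parcel's evaluator).

    `context` maps dotted selectors (e.g. 'job.spec.image') to their values.
    """
    if "$and" in condition:
        return all(matches(sub, context) for sub in condition["$and"])
    # Otherwise expect {selector: {'$eq': value}}, possibly with several selectors.
    for selector, predicate in condition.items():
        if set(predicate) != {"$eq"}:
            raise ValueError(f"unsupported predicate for {selector}: {predicate}")
        if context.get(selector) != predicate["$eq"]:
            return False
    return True


condition = {
    "$and": [
        {"document.id": {"$eq": "DOC_ID"}},  # placeholder for skinDocument.id
        {"job.spec.image": {"$eq": "oasislabs/acme-derma-demo"}},
    ]
}
```

In this model, a job for `DOC_ID` running `oasislabs/acme-derma-demo` matches, while the same job running a self-built image such as `acme/derma-demo` does not, which is why the image in the grant and in the job spec have to agree.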

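Next, we define the job, which Acme then submits with `submitJob`. The snippet below uses the pre-built oasislabs/acme-derma-demo image. If you built and pushed your own Docker image in the previous section, feel free to use it instead, but also change the `job.spec.image` condition in Bob's grant to match; otherwise the job will not be allowed to access the document: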

```typescript
// Define the job.
const jobSpec: JobSpec = {
  name: 'skin-prediction',
  image: 'oasislabs/acme-derma-demo',
  inputDocuments: [{ mountPath: 'skin.jpg', id: skinDocument.id }],
  outputDocuments: [{ mountPath: 'prediction.txt', owner: bobId }],
  cmd: ['python', 'predict.py', '/parcel/data/in/skin.jpg', '/parcel/data/out/prediction.txt'],
  memory: '2G',
};
```

Notice how the information contained in the job request is almost identical to what we passed to docker in the Local Testing section; only the syntax is different.
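One way to make this correspondence concrete is a small consistency check. In this example, an input document with `mountPath: 'skin.jpg'` is read from `/parcel/data/in/skin.jpg`, and the output with `mountPath: 'prediction.txt'` is written to `/parcel/data/out/prediction.txt`, so both paths must appear in `cmd`. The Python sketch below mirrors the job spec as a dict and lists mounted paths that `cmd` never mentions. The path convention is inferred from this tutorial's example, the IDs are placeholders, and `unused_mounts` is a hypothetical helper, not part of the Parcel SDK.

```python
from typing import Any, Dict, List

IN_DIR = "/parcel/data/in"
OUT_DIR = "/parcel/data/out"


def unused_mounts(job_spec: Dict[str, Any]) -> List[str]:
    """List container paths of input/output documents that the job's cmd never references."""
    expected = [f"{IN_DIR}/{d['mountPath']}" for d in job_spec.get("inputDocuments", [])]
    expected += [f"{OUT_DIR}/{d['mountPath']}" for d in job_spec.get("outputDocuments", [])]
    cmd = job_spec.get("cmd", [])
    return [path for path in expected if path not in cmd]


job_spec = {
    "name": "skin-prediction",
    "image": "oasislabs/acme-derma-demo",
    "inputDocuments": [{"mountPath": "skin.jpg", "id": "DOC_ID"}],  # placeholder ID
    "outputDocuments": [{"mountPath": "prediction.txt", "owner": "BOB_ID"}],  # placeholder owner
    "cmd": ["python", "predict.py", "/parcel/data/in/skin.jpg", "/parcel/data/out/prediction.txt"],
    "memory": "2G",
}
print(unused_mounts(job_spec))  # []
```

The check only matches whole arguments, so a path embedded in a flag such as `--input=/parcel/data/in/skin.jpg` would be reported as unused.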

EXAMPLE

The full Node.js code that submits the job using the oasislabs/acme-derma-demo image, along with all accompanying files, can be found in the Parcel Examples repository.