# Running Compute Jobs

In this tutorial, we will learn to securely analyze sensitive data using the Parcel SDK.

# Parcel Computation Framework

Private data analysis is performed in a compute job which we define and submit to Parcel. The basic components of a job specification are:

- the program(s) to run, in the form of a Docker image
- program inputs, in the form of Parcel documents
- some metadata about program outputs

This job is then dispatched to a Parcel worker running inside a secure enclave. The worker will execute your job as follows:

1. Load and decrypt your input documents, and mount them as regular files into your program's container. (This step succeeds only if your app has permission to access those input documents.)
2. Run your executable with the parameters provided in the job specification.
3. When the execution finishes, upload your job's output files as Parcel documents.

Note how your program needs to make almost no accommodations for Parcel: its inputs and outputs are regular files, and the program itself is a regular Docker container.
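To make the point concrete, here is a minimal sketch of what such a program could look like if it were written in TypeScript instead of shell. It only reads a regular file and writes a regular file. The function names, the default file names recipe.txt and count.txt, and the idea of passing the directories as parameters (so the code can be exercised outside Parcel) are illustrative assumptions; the /parcel/data/in and /parcel/data/out directories are the ones described later in this tutorial.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Directories where, per this tutorial, inputs are mounted and outputs are collected.
// They are parameters below so the program can also be tried on a local machine.
const DEFAULT_IN_DIR = '/parcel/data/in';
const DEFAULT_OUT_DIR = '/parcel/data/out';

/** Counts whitespace-separated words, similar in spirit to `wc -w`. */
function countWords(text: string): number {
  return text.split(/\s+/).filter((word) => word.length > 0).length;
}

/**
 * Reads `inputName` from `inDir`, counts its words, and writes a one-line
 * report to `outputName` in `outDir`. The file names are hypothetical examples.
 */
function runWordCount(
  inDir: string = DEFAULT_IN_DIR,
  outDir: string = DEFAULT_OUT_DIR,
  inputName = 'recipe.txt',
  outputName = 'count.txt',
): number {
  const text = fs.readFileSync(path.join(inDir, inputName), 'utf-8');
  const count = countWords(text);
  fs.mkdirSync(outDir, { recursive: true });
  fs.writeFileSync(path.join(outDir, outputName), `Document has ${count} words\n`);
  return count;
}
```

Inside a job container, calling runWordCount() with no arguments would use the default directories; locally, you can point it at any pair of scratch directories. Nothing in the sketch refers to the Parcel SDK.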

# Sample Compute Job

To illustrate the steps above, we will write a simple compute job for the following scenario in the remainder of this tutorial:

- Bob owns a confidential document in the form of a Parcel document.
- Bob would like to compute the number of words in his document, but doesn't know how. (Here, word counting is a mock substitute for more complex real-world tasks.) Company Acme has the expertise to compute the number of words.
- Bob gives Acme access to the document for the exclusive purpose of counting the words.
- Acme runs a confidential compute job over the document and reports the number of words to Bob, without Acme ever seeing the document or the number of words.

# Uploading the Input Document

Input documents for the compute job are ordinary Parcel documents.

We begin our example by instantiating Parcel and uploading a simple document. We will reuse the Acme and Bob demo identities from the Managing User Data tutorial.

const parcelBob = new Parcel({
  clientId: process.env.BOB_SERVICE_CLIENT_ID!,
  privateKey: {
    kid: 'bob-service-client',
    use: 'sig',
    kty: 'EC',
    crv: 'P-256',
    alg: 'ES256',
    x: 'kbhoJYKyOgY645Y9t-Vewwhke9ZRfLh6_TBevIA6SnQ',
    y: 'SEu0xuCzTH95-q_-FSZc-P6hCSnq6qH00MQ52vOVVpA',
    d: '10sS7lgM_YWxf79x21mWalCkAcZZOmX0ZRE_YwEXcmc',
  },
});
const bobId = (await parcelBob.getCurrentIdentity()).id;

// Upload a document and give Acme access to it.
console.log('Uploading input document as Bob.');
const recipeDocument = await parcelBob.uploadDocument(
  '14g butter; 15g chicken sausage; 18g feta; 20g green pepper; 1.5min baking',
  { toApp: undefined },
).finished;
await parcelBob.createGrant({
  grantee: process.env.ACME_APP_ID! as AppId,
  condition: { 'document.id': { $eq: recipeDocument.id } },
});

# Specifying the Job

Next we specify our compute job. Jumping straight into code:

// Define the job.
const jobSpec: JobSpec = {
  name: 'word-count',
  image: 'bash',
  inputDocuments: [{ mountPath: 'recipe.txt', id: recipeDocument.id }],
  outputDocuments: [{ mountPath: 'count.txt', owner: bobId }],
  cmd: [
    '-c',
    'echo "Document has $(wc -w </parcel/data/in/recipe.txt) words" >/parcel/data/out/count.txt',
  ],
};

The syntax should be largely self-explanatory, but some details are worth calling out:

- name is a human-readable description of the job. It helps with debugging and doesn't affect how your job behaves.
- image is the AMD64 Docker image in which the compute job should run. If you are not familiar with Docker, you can ignore this field for now and just think of the job as running on a typical Linux machine.

  Here, we used the bash image, not to be confused with the shell of the same name. This is a lightweight Linux image that takes an arbitrary number of arguments (which we pass as cmd) and runs them in the context of a bash shell. In a more realistic setting, you will typically specify a custom Docker image. To learn about creating and configuring your own images for Parcel, read the Using Custom Compute Images chapter.
- inputDocuments is the list of input documents that will be made available to your job as plain files. Each is described by:
  - id: Parcel ID of the document.
  - mountPath: The path inside the Docker image where the document should be mounted. This path is always interpreted relative to /parcel/data/in, so our value of recipe.txt means the document will be made available at /parcel/data/in/recipe.txt.

    We can choose any path here, as long as our program knows to reference it.
- outputDocuments is the list of output documents that the job will generate as files. Each output is described by:
  - mountPath: The path inside the Docker image to the output file generated by the job. The path is interpreted relative to the /parcel/data/out folder, so our program will have to write the result into /parcel/data/out/count.txt. Once the job completes, this file will be uploaded as a document.

    Again, any path is valid as long as our program can produce the file at that path.
  - owner: Parcel ID of the owner for the newly uploaded document. In our example, we want Bob, not Acme, to own the computed value.
- cmd is the command-line invocation of the program. The actual command that will run in our compute job is the concatenation of the entrypoint of the Docker image and the cmd.

For the bash image that we're using, the entrypoint is ['/bin/bash'], so our job will run /bin/bash -c 'echo "Document has $(wc -w </parcel/data/in/recipe.txt) words" >/parcel/data/out/count.txt'.

/bin/bash and wc in the command are binaries baked into the bash Docker image. The two paths in the command match our mountPath specifications.
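The two rules above (mountPath values are resolved relative to fixed directories, and the final command is the image entrypoint followed by cmd) can be sketched in a few lines. This is only an illustration of the description in this section, not Parcel's actual implementation: the helper names resolveMountPath and effectiveCommand are hypothetical, and the sketch does no validation of the paths it is given.

```typescript
import * as path from 'node:path';

// Root directories for job inputs and outputs, as stated in this tutorial.
const INPUT_ROOT = '/parcel/data/in';
const OUTPUT_ROOT = '/parcel/data/out';

/** Resolves a mountPath from the job spec against one of the root directories. */
function resolveMountPath(root: string, mountPath: string): string {
  return path.posix.join(root, mountPath);
}

/** Docker-style argument vector: the image entrypoint followed by the job's cmd. */
function effectiveCommand(entrypoint: string[], cmd: string[]): string[] {
  return [...entrypoint, ...cmd];
}

const inputPath = resolveMountPath(INPUT_ROOT, 'recipe.txt');
const outputPath = resolveMountPath(OUTPUT_ROOT, 'count.txt');

// Rebuild the tutorial's command from the resolved paths.
const argv = effectiveCommand(
  ['/bin/bash'],
  ['-c', `echo "Document has $(wc -w <${inputPath}) words" >${outputPath}`],
);

console.log(argv);
```

Building the shell command from the resolved paths, rather than typing the paths twice, is one way to keep cmd and the mountPath values from drifting apart.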

# Running the Job

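With the job spec in hand, we call submitJob() to submit the job to Parcel for execution. We then poll the job status until the job finishes. Here, parcelAcme is assumed to be a Parcel client instantiated with Acme's credentials, in the same way as parcelBob above: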

// Submit the job.
console.log('Running the job as Acme.');
const jobId = (await parcelAcme.submitJob(jobSpec)).id;

// Wait for job to finish.
let jobReport: JobStatusReport;
do {
  await new Promise((resolve) => setTimeout(resolve, 5000)); // eslint-disable-line no-promise-executor-return
  jobReport = await parcelAcme.getJobStatus(jobId);
  console.log(`Job status is ${JSON.stringify(jobReport.status)}`);
} while (
  jobReport.status.phase === JobPhase.PENDING ||
  jobReport.status.phase === JobPhase.RUNNING
);

const job = await parcelAcme.getJob(jobId);

console.log(
  `Job ${jobId} completed with status ${job.status?.phase} and ${job.io.outputDocuments.length} output document(s).`,
);

A job can be in one of four phases:

- PENDING if the job is yet to be executed,
- RUNNING if the job is currently running,
- SUCCEEDED if the custom command exited with status code 0 and the worker uploaded all output documents,
- FAILED if the job is no longer running but something went wrong. The message field of the job status might contain more information.

In the snippet above, we wait until the job reaches one of the latter two phases.
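The do/while loop above polls indefinitely. If you would rather give up after some time, the polling can be factored into a small generic helper. The sketch below is one possible shape, not part of the Parcel SDK: pollUntil, its parameters, and the default interval and timeout are all assumptions made for illustration.

```typescript
/** Resolves after `ms` milliseconds. */
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Repeatedly calls `fetchStatus` until `isDone` accepts the result.
 * Throws if the next poll would start after `timeoutMs` has elapsed.
 */
async function pollUntil<T>(
  fetchStatus: () => Promise<T>,
  isDone: (value: T) => boolean,
  pollIntervalMs = 5000,
  timeoutMs = 10 * 60 * 1000,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await fetchStatus();
    if (isDone(value)) return value;
    if (Date.now() + pollIntervalMs > deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms`);
    }
    await sleep(pollIntervalMs);
  }
}

// Hypothetical usage with the client and types from the snippet above (not executed here):
// const jobReport = await pollUntil(
//   () => parcelAcme.getJobStatus(jobId),
//   (r) => r.status.phase !== JobPhase.PENDING && r.status.phase !== JobPhase.RUNNING,
// );
```

Keeping the helper independent of the SDK types means the stopping condition stays next to the call site, where it is expressed in terms of the same phases the loop above checks.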

# Collecting Job Outputs

If the job completed successfully, we can now collect the output document using the job.io.outputDocuments field.

Acme, who ran the job, can learn the document's ID but cannot access the document unless the owner (Bob) grants the permission. We will therefore read the output as Bob:

console.log('Downloading output document as Bob.');
const outputDownload = parcelBob.downloadDocument(job.io.outputDocuments[0].id);
const outputSaver = fs.createWriteStream(`/tmp/output_document`);
await outputDownload.pipeTo(outputSaver);
const output = fs.readFileSync('/tmp/output_document', 'utf-8');
console.log(`Here's the computed result: "${output}"`);

# Next Steps

In this tutorial, we saw how to analyze sensitive user data using a simple prebuilt Docker image. The next two tutorials show how to constrain data access more robustly with Parcel's Grant Specification Language and how to design effective custom Docker images for use within Parcel.

EXAMPLE

You can view the full example in the Parcel Examples repository.
