Document AI Technical FAQ

Most common technical questions about the Document AI service in Google Cloud
Author

Rafa Sanchez

Published

August 28, 2022

Online request with file in GCS

Files in GCS can NOT yet be used for online requests. GCS input is only available for batch requests.

For online requests there are therefore two options:

  1. Use the file locally (for example, within a notebook) and upload its content as part of the request.
  2. Mount the GCS directory locally (on the notebook). This is slower, because the file has to be downloaded first and then uploaded with the request.

Option #1 is therefore the fastest.
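As a minimal sketch of option #1, the snippet below builds the URL and JSON body for the REST `processors.process` method, with the file content sent inline as base64 in `rawDocument.content`. The project, location, and processor IDs and the sample bytes are placeholders, and the helper name `build_online_process_request` is hypothetical. Sending the request (with an OAuth access token) is intentionally left out.

```python
import base64
import json


def build_online_process_request(file_bytes, mime_type, project_id, location, processor_id):
    """Build the URL and JSON body for an online (synchronous) process call.

    The document travels inline as base64 in rawDocument.content,
    so no GCS URI is involved.
    """
    url = (
        f"https://{location}-documentai.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/"
        f"processors/{processor_id}:process"
    )
    body = {
        "rawDocument": {
            "content": base64.b64encode(file_bytes).decode("ascii"),
            "mimeType": mime_type,
        }
    }
    return url, json.dumps(body)


# Placeholder values, not real resources. In a notebook the bytes would
# typically come from open(local_path, "rb").read().
url, payload = build_online_process_request(
    b"%PDF-1.4 placeholder bytes",
    "application/pdf",
    project_id="my-project",
    location="us",
    processor_id="my-processor-id",
)
```

The same payload shape applies whichever HTTP client or client library is used to send it. The point is only that the file bytes are part of the request itself.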

Online processing with GCS input has been filed as a feature request (FR). As mentioned above, GCS is supported in batch, but batch is intended for inference on several dozen documents, not just one.

Metrics available in Custom Document Splitter (CDS)

Custom Document Splitter (CDS) shows four types of metrics:

  • Document-level
  • First page
  • Interior page
  • Split only

Document-level vs page-level: The document level corresponds to the entities in the Document object. Each entity represents one (split) logical document. The page level corresponds to the pages in a Document object entity. Each page can be thought of as having a <document type> + <BI tag> label, with the BI tag being one of {Begin, Inside} and representing whether the page is a first page (Begin) or a middle/last page (Inside) of the logical document.

Document-level metrics are important for understanding overall model performance but may be lower than page-level metrics, since an incorrect prediction on a single page will result in one or more incorrect document predictions. Page-level metrics are more granular and may help identify more specific issues, e.g. the first page of a certain document type A being frequently classified as the first page of document type B.
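To make the <document type> + <BI tag> labelling concrete, here is a small illustrative sketch. It is not how Document AI computes its metrics. The helper names `page_labels` and `to_documents` are hypothetical, and the grouping ignores the document type on Inside pages for simplicity. It shows how a single wrong page tag leaves most pages correct while breaking a whole logical document.

```python
def page_labels(documents):
    """Expand (doc_type, page_count) pairs into per-page (doc_type, tag) labels."""
    labels = []
    for doc_type, page_count in documents:
        for i in range(page_count):
            labels.append((doc_type, "B" if i == 0 else "I"))
    return labels


def to_documents(labels):
    """Group per-page labels into (doc_type, first_page, last_page) spans.

    A new span starts at every 'B' tag; 'I' pages extend the current span.
    """
    docs = []
    for page, (doc_type, tag) in enumerate(labels):
        if tag == "B" or not docs:
            docs.append([doc_type, page, page])
        else:
            docs[-1][2] = page
    return {tuple(d) for d in docs}


# Ground truth: a 3-page document of type A followed by a 2-page document of type B.
truth = page_labels([("A", 3), ("B", 2)])

# Prediction: identical, except the third page is wrongly tagged as a first page.
predicted = list(truth)
predicted[2] = ("A", "B")

pages_correct = sum(t == p for t, p in zip(truth, predicted))
docs_correct = to_documents(truth) & to_documents(predicted)

print(f"{pages_correct} of {len(truth)} pages correct")
print(f"{len(docs_correct)} of {len(to_documents(truth))} documents recovered")
```

In this toy case four of five pages match, yet only the type B document is recovered intact, which is the effect described above.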

Split-only metrics ignore the predicted document type (whether right or wrong) and only measure whether the model splits the documents correctly. For example, take 5 pages, AAABB (the first three belong to document type A and the last two to document type B). If the model predicts CCCDD, the split-only metric is 1, because the model correctly predicts the split after the third page even though the predicted document types are different.
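The AAABB vs CCCDD example can be checked with a short sketch. This is an illustration of the idea only, not the service's actual metric computation. The helper name `split_points` is hypothetical, and it infers splits from changes in document type, so it would miss a split between two adjacent documents of the same type (which the Begin tag would catch).

```python
def split_points(page_types):
    """Return the number of pages preceding each split in a page-type sequence."""
    return [
        i + 1
        for i in range(len(page_types) - 1)
        if page_types[i] != page_types[i + 1]
    ]


truth_splits = split_points("AAABB")      # split after the third page
predicted_splits = split_points("CCCDD")  # wrong types, same split
shifted_splits = split_points("CCDDD")    # split one page too early

print(truth_splits == predicted_splits)   # same boundaries despite wrong types
print(truth_splits == shifted_splits)     # boundaries differ
```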

Interior page metrics measure whether the model predicts the ‘Inside’ pages correctly (both the document type and the ‘I’ Inside tag).

First page metrics measure whether the model predicts the first pages correctly (both the document type and the ‘B’ Begin tag).

Metrics available in Custom Document Extractor (CDE)

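The metrics available in Custom Document Extractor (CDE) are described in the evaluation documentation: https://cloud.google.com/document-ai/docs/workbench/evaluate#all-labels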

Batch prediction: joining outputs with inputs

Once the operation is done, its result contains metadata with an individualProcessStatuses list. Each entry holds the processing status of one document in the batch, including its inputGcsSource, which can be used to link the individual result back to the source input document.

This code uses a Google Cloud Workflow to handle the batchProcess result.
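Independently of the Workflow-based approach, the join itself can be sketched in plain Python over the operation result represented as a dict in REST/JSON form. The helper name `join_outputs_with_inputs`, the bucket paths, and the error message are hypothetical sample data. The sketch assumes each entry carries `inputGcsSource`, `status`, and `outputGcsDestination`, and that a missing status code means success.

```python
def join_outputs_with_inputs(operation):
    """Map each input GCS URI to its processing outcome and output location."""
    statuses = operation.get("metadata", {}).get("individualProcessStatuses", [])
    joined = {}
    for item in statuses:
        status = item.get("status", {})
        joined[item["inputGcsSource"]] = {
            # A status code of 0 (or an omitted code) is treated as success.
            "ok": status.get("code", 0) == 0,
            "message": status.get("message", ""),
            "output": item.get("outputGcsDestination"),
        }
    return joined


# Hypothetical finished operation, shaped like the JSON response.
sample_operation = {
    "done": True,
    "metadata": {
        "individualProcessStatuses": [
            {
                "inputGcsSource": "gs://example-input/invoice-1.pdf",
                "status": {},
                "outputGcsDestination": "gs://example-output/run-1/0",
            },
            {
                "inputGcsSource": "gs://example-input/invoice-2.pdf",
                "status": {"code": 3, "message": "sample error message"},
            },
        ]
    },
}

results = join_outputs_with_inputs(sample_operation)
for source, outcome in results.items():
    print(source, "->", outcome["output"] if outcome["ok"] else outcome["message"])
```

Keying the result by input URI makes it straightforward to look up, for any source document, where its output was written or why it failed.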

DeepMind collaboration and Document AI

DeepMind has provided enhancements that reduce the training data needed to parse utility bills and purchase orders by 50-70%, and is continuing internal evaluation of the method on other document types. DeepMind is also collaborating with Google Cloud on improving the solution’s performance across languages with smaller datasets. More information is available here.