com.google.android.gms.vision

The vision package provides common functionality for working with visual object detectors. Specific detector implementations are provided in the following subpackages:

  • Barcode detector: com.google.android.gms.vision.barcode
  • Face detector: com.google.android.gms.vision.face
  • Text recognizer: com.google.android.gms.vision.text

Basic Usage

The easiest way to start is to operate on a single frame only. First initialize a Frame from a Bitmap:

Frame frame = new Frame.Builder()
  .setBitmap(myBitmap)
  .build();

Next, create a detector instance, call its detect method with the Frame, and examine the detection results.

For example, the face detector can be used like this:

FaceDetector faceDetector = new FaceDetector.Builder(context)
  .setTrackingEnabled(false)
  .build();

SparseArray<Face> faces = faceDetector.detect(frame);
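
The detect method returns the results as a SparseArray keyed by an opaque ID for each item. As a minimal sketch (assuming the faces array from above), the results can be examined like this:

for (int i = 0; i < faces.size(); i++) {
  Face face = faces.valueAt(i);
  // Top-left corner position and size of the detected face, in image coordinates.
  PointF position = face.getPosition();
  float width = face.getWidth();
  float height = face.getHeight();
}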

Similarly, the code for using the barcode detector or text recognizer follows the same structure:

BarcodeDetector barcodeDetector = new BarcodeDetector.Builder(context)
  .build();

SparseArray<Barcode> barcodes = barcodeDetector.detect(frame);
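
The text recognizer follows the same pattern; as a sketch (assuming the same frame as above), it returns a SparseArray of TextBlock results:

TextRecognizer textRecognizer = new TextRecognizer.Builder(context)
  .build();

SparseArray<TextBlock> textBlocks = textRecognizer.detect(frame);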

Event-Driven Pipeline

For video applications, the vision package also provides a high-performance, yet easy-to-use, video pipeline for detecting and tracking visual objects. This pipeline builds upon the detectors described above, but adds sources for supplying a series of images and processors for receiving detection results.

The vision framework includes a general-purpose CameraSource for working with the video camera in conjunction with one or more detectors. It also includes processor implementations that are sufficient for building most apps (see LargestFaceFocusingProcessor and MultiProcessor).

Apps can receive detection events from the pipeline by supplying a Tracker, as illustrated in the example below.
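
For example, to track every detected item independently, a MultiProcessor can create a new tracker for each item via a factory. This is a sketch assuming a hypothetical BarcodeTracker class that extends Tracker<Barcode>:

barcodeDetector.setProcessor(
  new MultiProcessor.Builder<Barcode>(new MultiProcessor.Factory<Barcode>() {
    @Override
    public Tracker<Barcode> create(Barcode barcode) {
      // A new tracker instance is created for each barcode that appears.
      return new BarcodeTracker();  // hypothetical Tracker<Barcode> subclass
    }
  }).build());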

Example: Tracking the User's Face with the Front-Facing Camera

In this example, we will create a face detector pipeline that locates the user with the front-facing camera, tracks the user over time, and notes when the user is smiling. We start by creating a face detector with the appropriate options:

FaceDetector faceDetector = new FaceDetector.Builder(context)
    .setProminentFaceOnly(true)
    .setClassificationType(FaceDetector.ALL_CLASSIFICATIONS)
    .build();

In this example, the prominent face option configures the detector to track only a single face (which makes face tracking faster), and classification is enabled so that the detector can report whether the face is smiling.

We associate a LargestFaceFocusingProcessor with the detector; it receives the detection results, selects the largest face, and sends events to an associated FaceTracker (described below):

faceDetector.setProcessor(
  new LargestFaceFocusingProcessor.Builder(faceDetector, new FaceTracker())
    .build());

In this example, FaceTracker is the app's tracker implementation for receiving events about the user's face:

class FaceTracker extends Tracker<Face> {
  @Override
  public void onNewItem(int id, Face face) {
    Log.i(TAG, "Awesome person detected.  Hello!");
  }

  @Override
  public void onUpdate(Detector.Detections<Face> detections, Face face) {
    if (face.getIsSmilingProbability() > 0.75) {
      Log.i(TAG, "I see a smile.  They must really enjoy your app.");
    }
  }

  @Override
  public void onDone() {
    Log.i(TAG, "Elvis has left the building.");
  }
}

The last step in setting up the pipeline is to create a camera source to stream video images into the face detector:

mCameraSource = new CameraSource.Builder(context, faceDetector)
  .setFacing(CameraSource.CAMERA_FACING_FRONT)
  .setRequestedPreviewSize(320, 240)
  .build();

See the discussion below on the recommended approach for managing the camera source lifecycle.

Pipeline Lifecycle in an Activity

A typical structure for an activity using a detector pipeline is to create the pipeline in onCreate(Bundle), resume the pipeline in onResume(), pause the pipeline in onPause(), and dispose of the pipeline in onDestroy(). For example:

public class MyPipelineActivity extends Activity {
  ...

  @Override
  protected void onCreate(Bundle bundle) {
    super.onCreate(bundle);
    ... create the detector, processor, and camera source as described above ...
  }

  @Override
  protected void onResume() {
    super.onResume();
    startCameraSource();
  }

  @Override
  protected void onPause() {
    super.onPause();
    mCameraSource.stop();
  }

  @Override
  protected void onDestroy() {
    super.onDestroy();
    mCameraSource.release();
  }

  private void startCameraSource() {
    try {
      mCameraSource.start();
    } catch (IOException e) {
      ... your error handling here, in case there is a problem with the camera ...
    }
  }
}

Add the Vision Dependency to your Android Manifest

Adding the vision functionality dependency to your project's AndroidManifest.xml indicates to the installer that it should download the dependency at app install time. Although this is not strictly required, it can improve the user experience when your app is first run.

For example, adding the following to AndroidManifest.xml (in the application section) will indicate that both the barcode and face detection dependencies should be downloaded at app install time:

<meta-data android:name="com.google.android.gms.vision.DEPENDENCIES" android:value="barcode,face" />

Valid vision dependency values are:
  • barcode
  • face
  • ocr
However, even if this is supplied, in some cases the dependencies required to run the detectors may be downloaded on demand when your app is run for the first time rather than at install time. See isOperational() and detectorIsOperational() for more information on checking the dependency download status in your app.
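
As a minimal sketch of such a check (assuming the faceDetector from the earlier example), an app can verify that a detector's dependencies are available before relying on its results:

if (!faceDetector.isOperational()) {
  // The detector's native dependencies are not yet available; detection
  // will return no results until the download completes.
  Log.w(TAG, "Detector dependencies are not yet available.");
}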

Interfaces

CameraSource.PictureCallback Callback interface used to supply image data from a photo capture. 
CameraSource.ShutterCallback Callback interface used to signal the moment of actual image capture. 
Detector.Processor<T> Interface for defining a post-processing action to be executed for each detection, when using the detector as part of a pipeline (see the class-level docs above). 
MultiProcessor.Factory<T> Factory for creating new tracker instances. 

Classes

CameraSource Manages the camera in conjunction with an underlying Detector. 
CameraSource.Builder Builder for configuring and creating an associated camera source. 
Detector<T> Detector is the base class for implementing specific detector instances, such as a barcode detector or face detector. 
Detector.Detections<T> Detection result object containing both detected items and the associated frame metadata. 
FocusingProcessor<T> Base class for implementing a processor which filters the set of detection results, consistently delivering a single detected item to an associated Tracker. 
Frame Image data with associated metadata. 
Frame.Builder Builder for creating a frame instance. 
Frame.Metadata Frame metadata, describing the image dimensions, rotation, and sequencing information. 
MultiDetector A multi-detector combines multiple detectors, so that they can be used together on a frame or frames received from a source within a pipeline (see the sketch after this list). 
MultiDetector.Builder Builder for creating MultiDetector instances. 
MultiProcessor<T> Detection processor which distributes the items of a detection result among individual trackers. 
MultiProcessor.Builder<T> Builder for creating a multiprocessor instance. 
Tracker<T> A tracker is used to receive notifications for a detected item over time.
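
As a sketch of combining detectors (assuming the faceDetector and barcodeDetector instances from the examples above), a MultiDetector can be built and used wherever a single detector is expected; each underlying detector typically has its own processor set to receive its results:

MultiDetector multiDetector = new MultiDetector.Builder()
    .add(faceDetector)
    .add(barcodeDetector)
    .build();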