Building a React Native Core ML Computer Vision App with Expo and YOLOv8

Julius Hietala
Oct 2, 2024

In this tutorial, we’ll create an Expo module for image classification using YOLOv8 and build a simple app to showcase how a Core ML model can be run using React Native. This module can be integrated into any Expo or React Native app, making it easy for developers with React and JavaScript experience to add AI capabilities to their projects. The full example code can be found here.

Overview

We’ll build an Expo module that uses YOLOv8 for image classification on iOS devices. The module exposes a view component that handles both camera preview and model inference.
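
On the JavaScript side, using the module comes down to rendering that view and listening for classification events. Roughly, and assuming the package is installed under the name yolov8-classify (the full example app is built step by step below):

import { Yolov8ClassifyView } from "yolov8-classify";

export function Classifier() {
  return (
    <Yolov8ClassifyView
      style={{ flex: 1 }}
      // Each event carries the top label predicted for the current camera frame.
      onResult={(event) => console.log(event.nativeEvent.classification)}
    />
  );
}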

A screenshot from the demo app

Prerequisites

  • Xcode installed on your Mac
  • Basic knowledge of React Native and Expo
  • Node.js and npm installed
  • Python installed, for exporting the YOLOv8 model to Core ML

Creating the Expo Module

First, let’s create the Expo module for YOLOv8 classification.

1. Set up the module structure

npx create-expo-module yolov8-classify

You can accept all the defaults when prompted.

2. Find the ios/Yolov8ClassifyView.swift file and modify it as follows

import ExpoModulesCore
import Vision
import UIKit
import AVFoundation

enum Yolov8ClassifyViewError: Error {
  case mlModelNotFound
  case mlModelLoadingFailed(Error)
  case videoDeviceInputCreationFailed
  case cannotAddVideoInput
  case cannotAddVideoOutput
  case failedToLockVideoDevice(Error)
  case pixelBufferUnavailable
  case requestProcessingFailed(Error)
}

class Yolov8ClassifyView: ExpoView, AVCaptureVideoDataOutputSampleBufferDelegate {
  private let previewView = UIView()
  private var previewLayer: AVCaptureVideoPreviewLayer?
  private let onResult = EventDispatcher()
  private let session = AVCaptureSession()
  private var bufferSize: CGSize = .zero
  private var requests = [VNRequest]()

  required init(appContext: AppContext? = nil) {
    super.init(appContext: appContext)
    setupCaptureSession()
  }

  // Configure the capture pipeline and start streaming off the main thread.
  private func setupCaptureSession() {
    do {
      try setupCapture()
      try setupOutput()
      try setupVision()
      setupPreviewLayer()
      DispatchQueue.global(qos: .userInitiated).async { [weak self] in
        self?.session.startRunning()
      }
    } catch {
      print("Error setting up capture session: \(error)")
    }
  }

  // Add the back wide-angle camera as the session's video input.
  private func setupCapture() throws {
    guard let videoDevice = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: .back),
          let deviceInput = try? AVCaptureDeviceInput(device: videoDevice) else {
      throw Yolov8ClassifyViewError.videoDeviceInputCreationFailed
    }

    session.beginConfiguration()

    guard session.canAddInput(deviceInput) else {
      throw Yolov8ClassifyViewError.cannotAddVideoInput
    }

    session.addInput(deviceInput)
    setupBufferSize(for: videoDevice)
    session.commitConfiguration()
  }

  // Deliver camera frames to this class on a dedicated background queue.
  private func setupOutput() throws {
    let videoDataOutput = AVCaptureVideoDataOutput()
    let videoDataOutputQueue = DispatchQueue(
      label: "VideoDataOutput",
      qos: .userInitiated,
      attributes: [],
      autoreleaseFrequency: .workItem
    )

    guard session.canAddOutput(videoDataOutput) else {
      throw Yolov8ClassifyViewError.cannotAddVideoOutput
    }

    session.addOutput(videoDataOutput)
    videoDataOutput.alwaysDiscardsLateVideoFrames = true
    videoDataOutput.videoSettings = [
      kCVPixelBufferPixelFormatTypeKey as String:
        Int(kCVPixelFormatType_420YpCbCr8BiPlanarFullRange)
    ]
    videoDataOutput.setSampleBufferDelegate(self, queue: videoDataOutputQueue)
  }

  // Load the compiled Core ML model and wrap it in a Vision request.
  private func setupVision() throws {
    guard let modelURL = Bundle.main.url(
      forResource: "yolov8x-cls",
      withExtension: "mlmodelc"
    ) else {
      throw Yolov8ClassifyViewError.mlModelNotFound
    }

    do {
      let visionModel = try VNCoreMLModel(for: MLModel(contentsOf: modelURL))
      let classificationRequest = VNCoreMLRequest(
        model: visionModel,
        completionHandler: handleClassification
      )
      self.requests = [classificationRequest]
    } catch {
      throw Yolov8ClassifyViewError.mlModelLoadingFailed(error)
    }
  }

  private func setupPreviewLayer() {
    let layer = AVCaptureVideoPreviewLayer(session: session)
    layer.videoGravity = .resizeAspectFill
    previewLayer = layer
    previewView.layer.addSublayer(layer)
    addSubview(previewView)
  }

  private func setupBufferSize(for videoDevice: AVCaptureDevice) {
    do {
      try videoDevice.lockForConfiguration()
      let dimensions = CMVideoFormatDescriptionGetDimensions(
        videoDevice.activeFormat.formatDescription
      )
      bufferSize.width = CGFloat(dimensions.width)
      bufferSize.height = CGFloat(dimensions.height)
      videoDevice.unlockForConfiguration()
    } catch {
      print("Failed to lock video device for configuration: \(error)")
    }
  }

  override func layoutSubviews() {
    super.layoutSubviews()
    previewView.frame = bounds
    previewLayer?.frame = previewView.bounds
  }

  // Pick the highest-confidence observation and dispatch it to JavaScript.
  private func handleClassification(request: VNRequest, error: Error?) {
    if let results = request.results as? [VNClassificationObservation],
       let topResult = results.max(by: { $0.confidence < $1.confidence }) {
      DispatchQueue.main.async { [weak self] in
        self?.onResult(["classification": topResult.identifier])
      }
    }
  }

  // Called for every camera frame; runs the Vision request on the pixel buffer.
  func captureOutput(
    _ output: AVCaptureOutput,
    didOutput sampleBuffer: CMSampleBuffer,
    from connection: AVCaptureConnection
  ) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
      print("Could not get image buffer from sample buffer.")
      return
    }

    let imageRequestHandler = VNImageRequestHandler(
      cvPixelBuffer: pixelBuffer,
      orientation: .right,
      options: [:]
    )
    do {
      try imageRequestHandler.perform(self.requests)
    } catch {
      print("Failed to perform image request: \(error)")
    }
  }
}

This Swift code sets up the camera session, runs the YOLOv8 Core ML model on each frame through the Vision framework, and dispatches the top classification result to the JavaScript side via the onResult event.

3. Modify ios/Yolov8ClassifyModule.swift

import ExpoModulesCore

public class Yolov8ClassifyModule: Module {
  public func definition() -> ModuleDefinition {
    Name("Yolov8Classify")

    View(Yolov8ClassifyView.self) {
      Events("onResult")
    }
  }
}

This is the Expo module definition. The onResult event is how the predicted classifications are returned to JavaScript.

4. Modify ios/Yolov8Classify.podspec

require 'json'

package = JSON.parse(File.read(File.join(__dir__, '..', 'package.json')))

Pod::Spec.new do |s|
  s.name = 'Yolov8Classify'
  s.version = package['version']
  s.summary = package['description']
  s.description = package['description']
  s.license = package['license']
  s.author = package['author']
  s.homepage = package['homepage']
  s.platforms = { :ios => '13.4', :tvos => '13.4' }
  s.swift_version = '5.4'
  s.source = { git: 'https://github.com/hietalajulius/yolov8-classify-rn' }
  s.static_framework = true

  s.dependency 'ExpoModulesCore'

  # Swift/Objective-C compatibility
  s.pod_target_xcconfig = {
    'DEFINES_MODULE' => 'YES',
    'SWIFT_COMPILATION_MODE' => 'wholemodule'
  }

  s.source_files = "**/*.{h,m,swift}"
  s.resources = "**/*.mlmodelc"
end

The line s.resources = "**/*.mlmodelc" ensures that the compiled YOLOv8 Core ML model is bundled when the module is built.

5. Export the YOLOv8 model as a Core ML model

First, install the ultralytics pip package and run the model export:

pip install ultralytics && python -c "from ultralytics import YOLO; model = YOLO('yolov8x-cls.pt'); model.export(format='coreml', nms=True, imgsz=(640, 480))"

Next, compile the exported .mlpackage into an .mlmodelc and place it in the ios directory:

xcrun coremlcompiler compile yolov8x-cls.mlpackage/ ios

6. Ensure camera permissions using a config plugin

The config plugin ensures that every app using the module includes an NSCameraUsageDescription entry in its Info.plist. You need to create the following files:

plugin/src/index.ts

import { withInfoPlist, ConfigPlugin } from "expo/config-plugins";

const withCameraUsageDescription: ConfigPlugin<{
  cameraUsageDescription?: string;
}> = (config, { cameraUsageDescription }) => {
  config = withInfoPlist(config, (config) => {
    config.modResults["NSCameraUsageDescription"] =
      cameraUsageDescription ??
      "The camera is used to stream video for classification.";
    return config;
  });

  return config;
};

export default withCameraUsageDescription;

plugin/tsconfig.json

{
  "extends": "expo-module-scripts/tsconfig.plugin",
  "compilerOptions": {
    "outDir": "build",
    "rootDir": "src"
  },
  "include": ["./src"],
  "exclude": ["**/__mocks__/*", "**/__tests__/*"]
}

app.plugin.js

module.exports = require("./plugin/build");
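
With the plugin built, a consuming app registers it in its Expo config so the permission string is injected at prebuild time. Here is a minimal sketch, assuming the app uses a TypeScript config file (app.config.ts) and the module is installed under the package name yolov8-classify; the name and slug values are placeholders:

import { ExpoConfig } from "expo/config";

const config: ExpoConfig = {
  name: "yolov8-classify-example", // placeholder app name
  slug: "yolov8-classify-example", // placeholder slug
  plugins: [
    // The second tuple element is passed to the config plugin as its options.
    ["yolov8-classify", { cameraUsageDescription: "The camera is used to classify what it sees." }],
  ],
};

export default config;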

7. Build the JavaScript side of the module

This code creates a React component that wraps the native view and exposes the onResult event as a typed prop.

src/Yolov8ClassifyView.tsx

import { requireNativeViewManager } from "expo-modules-core";
import * as React from "react";
import { ViewProps } from "react-native";

export type OnResultEvent = {
  classification: string;
};

export type Props = {
  onResult?: (event: { nativeEvent: OnResultEvent }) => void;
} & ViewProps;

const NativeView: React.ComponentType<Props> =
  requireNativeViewManager("Yolov8Classify");

export default function Yolov8ClassifyView(props: Props) {
  return <NativeView {...props} />;
}
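
If you want to keep the event-handling boilerplate out of your screens, the callback can be wrapped in a small hook. This helper is not part of the module, just a sketch of one way to consume it, assuming the package re-exports the OnResultEvent type from its index:

import { useCallback, useState } from "react";
import type { OnResultEvent } from "yolov8-classify";

// Hypothetical helper: keeps the latest classification in React state.
export function useLatestClassification() {
  const [classification, setClassification] = useState<string | null>(null);

  const onResult = useCallback(
    (event: { nativeEvent: OnResultEvent }) =>
      setClassification(event.nativeEvent.classification),
    []
  );

  return { classification, onResult };
}

The returned onResult can be passed straight to Yolov8ClassifyView and the classification rendered in an overlay, much like the example app below does with plain useState.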

Creating the Example App

Now, let’s create an example app to showcase the module’s functionality.

example/App.tsx

import {
  StyleSheet,
  View,
  Text,
  ScrollView,
  Image,
  TouchableOpacity,
  Linking,
} from "react-native";
import { Yolov8ClassifyView } from "yolov8-classify";
import { useState } from "react";

const formatClassification = (classification: string) => {
  return classification
    .split("_") // split the string into words
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1).toLowerCase()) // capitalize the first letter of each word
    .join(" "); // join the words back into a string with spaces
};

export default function App() {
  const [classification, setClassification] = useState<string | null>(null);

  const openSourceCode = () => {
    const url = "https://www.juliushietala.com/"; // replace with your source code URL
    Linking.canOpenURL(url).then((supported) => {
      if (supported) {
        Linking.openURL(url);
      } else {
        console.log(`Don't know how to open URL: ${url}`);
      }
    });
  };

  return (
    <View style={styles.container}>
      <Yolov8ClassifyView
        style={styles.camera}
        onResult={(result) =>
          setClassification(result.nativeEvent.classification)
        }
      >
        <View style={styles.overlay}>
          {classification && (
            <Text style={styles.classification}>
              {formatClassification(classification)}
            </Text>
          )}
        </View>
      </Yolov8ClassifyView>
      <View style={styles.menuContainer}>
        <ScrollView horizontal showsHorizontalScrollIndicator={false}>
          <TouchableOpacity style={styles.menu} onPress={openSourceCode}>
            <Image
              style={styles.menuInner}
              source={require("./assets/logo.webp")}
            />
          </TouchableOpacity>
        </ScrollView>
      </View>
    </View>
  );
}

const styles = StyleSheet.create({
  container: {
    flex: 1,
  },
  camera: {
    flex: 1,
  },
  overlay: {
    flex: 1,
    backgroundColor: "rgba(0, 0, 0, 0.1)",
    justifyContent: "center",
    alignItems: "center",
  },
  classification: {
    color: "white",
    fontSize: 24,
  },
  menuContainer: {
    position: "absolute",
    top: 50,
    left: 10,
    right: 10,
    height: 50,
    flexDirection: "row",
  },
  menu: {
    width: 50,
    height: 50,
    borderRadius: 25,
    backgroundColor: "white",
    marginHorizontal: 5,
    justifyContent: "center",
    alignItems: "center",
  },
  menuInner: {
    width: 46,
    height: 46,
    borderRadius: 23,
    backgroundColor: "grey",
  },
});

Running the App

To run the app on an iOS simulator or device:

  1. In a new terminal window, run npm run build
  2. In another terminal window, run npm run build plugin
  3. To start the example app, run cd example && npx expo run:ios --device (you’ll need to connect your iPhone via USB to be able to select it)

A screenshot from the demo app

Conclusion

We’ve successfully created an Expo module for image classification using YOLOv8 and built a simple app to demonstrate its functionality. This module can be easily integrated into any Expo or React Native app, allowing developers to add AI capabilities to their projects with minimal machine learning knowledge.

The YOLOv8 classification model is just the beginning. Similar approaches can be used to implement object detection, segmentation, or other computer vision tasks in your mobile apps.

For the complete source code and more detailed instructions, please refer to the GitHub repository.

Remember, this implementation is currently for iOS only. Android support could be added in the future using a similar approach with the appropriate native code.

By leveraging tools like Expo and pre-trained models, we can bridge the gap between frontend development and AI, enabling a wider range of developers to create intelligent mobile applications.

Written by Julius Hietala

I'm Julius Hietala, an experienced software engineer with a passion for bridging the gap between machine learning and high-quality software engineering.
