
Augmented Reality Video Conference

20 Apr 2018
In this article, we will integrate ARKit in a video conference scenario.


Last year at WWDC 2017, Apple launched ARKit. With it, developers can quickly build mixed reality applications on the iOS platform that use the device’s camera to bring augmented reality to life.

In this article, we will integrate ARKit into a video conference scenario. The implementation covers two parts:

  • Integrating ARKit with live video streaming
  • Rendering the live video stream onto an AR plane using Agora’s Video SDK

We will use ARKit to detect a plane in the room and then use the Custom Video Source and Renderer functions, included in Agora.io Video SDK v2.1.1, to render the live video stream onto that plane. The result gives the video call a holographic feel, just like you see in Star Wars! The source code for this demo is linked at the end of the article. Just add your Agora.io App ID to the ViewController.swift file and run the app on your device!

Video stream rendered in an AR plane

Basic AR Preparation

First, we will use ARKit to create a simple plane-aware application as the basis for development. Create a new project in Xcode using the Augmented Reality App template and select SceneKit as the Content Technology.

Start plane detection

In the ViewController, configure the ARSession to detect horizontal planes:

override func viewDidLoad() {
    super.viewDidLoad()
    sceneView.delegate = self
    sceneView.session.delegate = self
    sceneView.showsStatistics = true
}

override func viewWillAppear(_ animated: Bool) {
    super.viewWillAppear(animated)
    // Run world tracking with horizontal plane detection enabled
    let configuration = ARWorldTrackingConfiguration()
    configuration.planeDetection = .horizontal
    sceneView.session.run(configuration)
}

Display the identified plane

To add a red background to the identified plane, implement the ARSCNViewDelegate callback method, renderer:didAddNode:forAnchor:

func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
    guard let planeAnchor = anchor as? ARPlaneAnchor else {
        return
    }
    // The anchor's extent describes the detected plane (extent.y is 0 for a flat plane)
    let plane = SCNBox(width: CGFloat(planeAnchor.extent.x),
                       height: CGFloat(planeAnchor.extent.y),
                       length: CGFloat(planeAnchor.extent.z),
                       chamferRadius: 0)
    plane.firstMaterial?.diffuse.contents = UIColor.red
    let planeNode = SCNNode(geometry: plane)
    // Center the overlay on the anchor, then fade it out
    planeNode.position = SCNVector3(planeAnchor.center.x, 0, planeAnchor.center.z)
    node.addChildNode(planeNode)
    planeNode.runAction(SCNAction.fadeOut(duration: 1))
}

You have now completed a very simple AR application. When a plane in the environment is identified, a red rectangle is added to it and fades out.

Once a plane is identified, a red rectangle appears.

Interactive Broadcasting Preparation

Now, we will use the Agora SDK to add live video calling capabilities to the app. Download the latest SDK package from the official website and add it to the Xcode project. Next, create an instance of AgoraRtcEngineKit in the ViewController and apply the live-broadcast settings:

let agoraKit: AgoraRtcEngineKit = {
    // Replace the placeholder with your Agora.io App ID
    let engine = AgoraRtcEngineKit.sharedEngine(withAppId: <#YourAppId#>, delegate: nil)
    engine.setChannelProfile(.liveBroadcasting)
    engine.setClientRole(.broadcaster)
    engine.enableVideo()
    return engine
}()

Finally, in the viewDidLoad function, set the delegate for agoraKit to the view controller (self) and join an Agora channel.

agoraKit.delegate = self
// A uid of 0 tells Agora to assign a user ID automatically
agoraKit.joinChannel(byToken: nil, channelId: "agoraar", info: nil, uid: 0, joinSuccess: nil)

At this point, the preparations are complete: we have an AR application that can recognize planes and also make audio and video calls. The next step is to combine these two functions.

Broadcast the ARKit screen

Since ARKit already occupies the device camera, we cannot start our own AVCaptureSession for video capture. Fortunately, the capturedImage property of ARFrame exposes the image captured by the camera for us to use.

Add custom video source

In order to transmit video data, we need to create a class (ARVideoSource) that implements the AgoraVideoSourceProtocol, in which bufferType should return AgoraVideoBufferType.pixelBuffer.

class ARVideoSource: NSObject, AgoraVideoSourceProtocol {
    // The SDK assigns this consumer; we push our video frames into it
    var consumer: AgoraVideoFrameConsumer?

    func shouldInitialize() -> Bool { return true }
    func shouldStart() { }
    func shouldStop() { }
    func shouldDispose() { }
    func bufferType() -> AgoraVideoBufferType {
        return .pixelBuffer
    }
}

Add a method to transmit the video frames to the ARVideoSource class:

func sendBuffer(_ buffer: CVPixelBuffer, timestamp: TimeInterval) {
    let time = CMTime(seconds: timestamp, preferredTimescale: 10000)
    consumer?.consumePixelBuffer(buffer, withTimestamp: time, rotation: .rotationNone)
}

Next, instantiate an ARVideoSource in the ViewController and pass it to the Agora SDK via the setVideoSource method in viewDidLoad().

let videoSource = ARVideoSource()

override func viewDidLoad() {
    // ...
    agoraKit.setVideoSource(videoSource)
    // ...
}

With this in place, video frames are passed to the Agora SDK whenever we call the videoSource’s sendBuffer:timestamp: method.

Send Camera Data

We can receive each ARFrame through the ARSession delegate callback, read the camera image from it, and send it out through the videoSource.

In the viewDidLoad method, set the ARSession delegate to the ViewController and add the callback function:

override func viewDidLoad() {
    // ...
    sceneView.session.delegate = self
    // ...
}

extension ViewController: ARSessionDelegate {
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        videoSource.sendBuffer(frame.capturedImage, timestamp: frame.timestamp)
    }
}

Send ARSCNView data

ARFrame’s capturedImage property holds the raw data from the camera. If we want to send a picture that already includes the virtual objects, we must capture the ARSCNView’s rendered output instead. Here’s a simple idea: on a timer, snapshot the SCNView into a UIImage, convert it to a CVPixelBuffer, and provide it to the videoSource. The sample logic is below, followed by a sketch of the helper functions it assumes:

func startCaptureView() {
    // timer is a DispatchSourceTimer firing every 0.1 second
    timer.schedule(deadline: .now(), repeating: .milliseconds(100))
    timer.setEventHandler { [unowned self] in
        // Snapshot the sceneView into a UIImage
        let sceneImage: UIImage = self.image(ofView: self.sceneView)
        // Convert to CVPixelBuffer off the main thread and hand it to the Agora SDK
        self.videoSourceQueue.async { [unowned self] in
            let buffer: CVPixelBuffer = self.pixelBuffer(ofImage: sceneImage)
            self.videoSource.sendBuffer(buffer, timestamp: Double(mach_absolute_time()))
        }
    }
    timer.resume()
}
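For completeness, here is one possible implementation of the image(ofView:) and pixelBuffer(ofImage:) helpers used above. These are sketches of our own, not necessarily the demo’s exact code: the snapshot uses UIGraphicsImageRenderer, and the conversion ignores screen scale for brevity.

// Sketches of the helpers assumed by startCaptureView(); the demo's versions may differ.
func image(ofView view: UIView) -> UIImage {
    let renderer = UIGraphicsImageRenderer(bounds: view.bounds)
    return renderer.image { _ in
        view.drawHierarchy(in: view.bounds, afterScreenUpdates: false)
    }
}

func pixelBuffer(ofImage image: UIImage) -> CVPixelBuffer {
    let width = Int(image.size.width)    // ignores screen scale for brevity
    let height = Int(image.size.height)
    let attrs = [kCVPixelBufferCGImageCompatibilityKey: true,
                 kCVPixelBufferCGBitmapContextCompatibilityKey: true] as CFDictionary
    var pixelBuffer: CVPixelBuffer?
    CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                        kCVPixelFormatType_32BGRA, attrs, &pixelBuffer)
    guard let buffer = pixelBuffer else { fatalError("CVPixelBufferCreate failed") }

    CVPixelBufferLockBaseAddress(buffer, [])
    defer { CVPixelBufferUnlockBaseAddress(buffer, []) }

    // Draw the image directly into the pixel buffer's memory
    let context = CGContext(data: CVPixelBufferGetBaseAddress(buffer),
                            width: width, height: height,
                            bitsPerComponent: 8,
                            bytesPerRow: CVPixelBufferGetBytesPerRow(buffer),
                            space: CGColorSpaceCreateDeviceRGB(),
                            bitmapInfo: CGImageAlphaInfo.premultipliedFirst.rawValue
                                | CGBitmapInfo.byteOrder32Little.rawValue)!
    context.draw(image.cgImage!, in: CGRect(x: 0, y: 0, width: width, height: height))
    return buffer
}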

Rendering the live video stream to the AR scene

Add virtual display

First, we need to create a virtual display for rendering the remote video and add it to the AR scene where the user taps.

Add a UITapGestureRecognizer to the ARSCNView in the Storyboard. When the user taps the screen, find the plane at that position using ARSCNView’s hitTest method and place a virtual display there.

@IBAction func doSceneViewTapped(_ recognizer: UITapGestureRecognizer) {
    let location = recognizer.location(in: sceneView)
    guard let result = sceneView.hitTest(location, types: .existingPlane).first else {
        return
    }
    // Load the virtual display model and place it where the tap hit the plane
    let scene = SCNScene(named: "art.scnassets/displayer.scn")!
    let rootNode = scene.rootNode
    rootNode.simdTransform = result.worldTransform
    sceneView.scene.rootNode.addChildNode(rootNode)

    // Hold on to the screen node until a remote stream can be rendered onto it
    let displayer = rootNode.childNode(withName: "displayer", recursively: false)!
    let screen = displayer.childNode(withName: "screen", recursively: false)!
    unusedScreenNodes.append(screen)
}

Users may add multiple display screens by tapping repeatedly; the screen nodes stay in the unusedScreenNodes array until a remote video stream is rendered onto them. The declaration of this array is shown below.
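The unusedScreenNodes array is not declared in the snippets above; a minimal version, assuming it lives as a stored property on the ViewController, is simply:

// Screen nodes waiting for a remote video stream (stored on the ViewController)
var unusedScreenNodes = [SCNNode]()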

Add custom video renderer

In order to obtain remote video data from the Agora SDK, we need to create an ARVideoRenderer class that implements the AgoraVideoSinkProtocol.

class ARVideoRenderer: NSObject {
    var renderNode: SCNNode?
}

extension ARVideoRenderer: AgoraVideoSinkProtocol {
    func shouldInitialize() -> Bool { return true }
    func shouldStart() { }
    func shouldStop() { }
    func shouldDispose() { }
    func bufferType() -> AgoraVideoBufferType {
        return .rawData
    }
    func pixelFormat() -> AgoraVideoPixelFormat {
        return .I420
    }
    func renderRawData(_ rawData: UnsafeMutableRawPointer, size: CGSize, rotation: AgoraVideoRotation) {
        // ...
    }
}

The renderRawData:size:rotation: method receives the remote video data, which is then rendered onto the SCNNode using the Metal framework. The full Metal rendering code can be found in the demo at the end of this article; a simplified, non-Metal sketch follows.
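To make the data flow concrete without the full Metal pipeline, here is a deliberately simplified stand-in for renderRawData(_:size:rotation:). It is illustrative only, not the demo’s renderer: it copies just the Y (luma) plane of the incoming I420 frame into a grayscale CGImage (assuming no row padding) and assigns it to the screen node’s material.

// Simplified sketch: grayscale preview from the Y plane only.
// The demo's actual renderer converts the full I420 frame on the GPU with Metal.
func renderRawData(_ rawData: UnsafeMutableRawPointer, size: CGSize, rotation: AgoraVideoRotation) {
    let width = Int(size.width)
    let height = Int(size.height)

    // Copy the luma plane so the bytes outlive this callback
    let yPlane = Data(bytes: rawData, count: width * height)
    guard let provider = CGDataProvider(data: yPlane as CFData),
          let lumaImage = CGImage(width: width, height: height,
                                  bitsPerComponent: 8, bitsPerPixel: 8,
                                  bytesPerRow: width,
                                  space: CGColorSpaceCreateDeviceGray(),
                                  bitmapInfo: CGBitmapInfo(rawValue: CGImageAlphaInfo.none.rawValue),
                                  provider: provider, decode: nil,
                                  shouldInterpolate: false, intent: .defaultIntent) else {
        return
    }

    // SceneKit material updates should happen on the main thread
    DispatchQueue.main.async { [weak self] in
        self?.renderNode?.geometry?.firstMaterial?.diffuse.contents = lumaImage
    }
}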

Set custom renderer to Agora SDK

By implementing the rtcEngine:didJoinedOfUid:elapsed: callback of the AgoraRtcEngineDelegate protocol, you can tell when a remote broadcaster joins the channel. In the callback, create an instance of ARVideoRenderer, assign it one of the virtual screen nodes added earlier by the user’s taps, and register the custom renderer with the Agora SDK via the setRemoteVideoRenderer:forUserId: method.

func rtcEngine(_ engine: AgoraRtcEngineKit, didJoinedOfUid uid: UInt, elapsed: Int) {
    guard !unusedScreenNodes.isEmpty else {
        return
    }
    let screenNode = unusedScreenNodes.removeFirst()
    let renderer = ARVideoRenderer()
    renderer.renderNode = screenNode
    agoraKit.setRemoteVideoRenderer(renderer, forUserId: uid)
}

Now, when another user joins the channel, their video is displayed on the AR plane, creating the effect of a virtual conference room.

Using the Agora SDK’s custom video source and custom video renderer features, it’s easy to combine AR and live video scenarios. This demo runs on Agora’s software-defined real-time network and can support up to 17 simultaneous video streams. It is clear that AR technology will bring a whole new experience to real-time video streaming.

Where to take this from here:

  • Challenge a friend in Pokémon Go
  • Bring your friends, family, and colleagues closer to you in a video call
  • Create a mixed reality fitness app to connect trainers with their clients

For the full source code, check out the GitHub repo here.

Please feel free to reach out on our Developer Slack Channel if you have any questions! If you’d like to be a part of our Slack Community, please fill out this form and we’ll send the invite out!

Sign up for Agora
