Last year at WWDC 2017, Apple launched ARKit. With it, developers can quickly create mixed reality applications on the iOS platform that use the device’s cameras to bring augmented reality to life.
In this article, we will integrate ARKit into a video conferencing scenario and cover two parts of the implementation:
- Integrate ARKit with live video streaming
- Render the live video stream to the AR plane using Agora’s Video SDK
We will be using ARKit to detect a plane in the room and then use the Custom Video Source and Renderer feature, included in Agora.io Video SDK v2.1.1, to render the live video stream onto the plane. This gives the video call a holographic feel, just like you see in Star Wars! The source code for this demo is included at the end of the article. Just add your Agora.io App ID to the ViewController.swift file and run the app on your device!
Video stream rendered in an AR plane
Basic AR Preparation
First, we will use ARKit to create a simple plane-aware application as the basis for development. Create a new project in Xcode using the Augmented Reality App template and select SceneKit as the Content Technology.
Start plane detection
Set up an ARWorldTrackingConfiguration with plane detection enabled in the ViewController.
override func viewDidLoad() {
    super.viewDidLoad()
    sceneView.delegate = self
    sceneView.session.delegate = self
    sceneView.showsStatistics = true
}

override func viewWillAppear(_ animated: Bool) {
    super.viewWillAppear(animated)
    let configuration = ARWorldTrackingConfiguration()
    configuration.planeDetection = .horizontal
    sceneView.session.run(configuration)
}
Display the identified plane
To add a red background to the identified plane, implement the ARSCNViewDelegate callback method renderer(_:didAdd:for:):
func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
    guard let planeAnchor = anchor as? ARPlaneAnchor else {
        return
    }

    // Create a flat box matching the detected plane's extent
    // (extent.y is effectively zero for a plane anchor)
    let plane = SCNBox(width: CGFloat(planeAnchor.extent.x),
                       height: CGFloat(planeAnchor.extent.y),
                       length: CGFloat(planeAnchor.extent.z),
                       chamferRadius: 0)
    plane.firstMaterial?.diffuse.contents = UIColor.red

    // Attach the red indicator to the anchor's node and fade it out
    let planeNode = SCNNode(geometry: plane)
    node.addChildNode(planeNode)
    planeNode.runAction(SCNAction.fadeOut(duration: 1))
}
You have now completed a very simple AR application. When a plane in the environment is identified, a red rectangle is added to it and fades out.
Once a plane is identified, a red rectangle appears.
Interactive Broadcasting Preparation
Now, we will use the Agora SDK to add live video calling capabilities to the app. Download the latest SDK package from the official website and add it to the Xcode project. Next, create an instance of AgoraRtcEngineKit in the View Controller and add the following live-video-related settings.
let agoraKit: AgoraRtcEngineKit = {
    let engine = AgoraRtcEngineKit.sharedEngine(withAppId: <#YourAppId#>, delegate: nil)
    engine.setChannelProfile(.liveBroadcasting)
    engine.setClientRole(.broadcaster)
    engine.enableVideo()
    return engine
}()
Finally, in the viewDidLoad function, set the delegate for agoraKit to the view controller (self) and join an Agora channel.
agoraKit.delegate = self
agoraKit.joinChannel(byToken: nil, channelId: "agoraar", info: nil, uid: 0, joinSuccess: nil)
At this point, all the preparations have been completed. We have an AR application that can recognize planes and can also make audio and video calls. The next step is to combine these two functions.
Broadcast the ARKit screen
Since ARKit already uses the device camera, we cannot start an AVCaptureSession for video capture. Fortunately, the capturedImage property of ARFrame provides the image captured by the camera for us to use.
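For reference, capturedImage is a CVPixelBuffer, and the most recent frame can also be read on demand from the session. A minimal sketch, assuming sceneView is the ARSCNView created by the template:

// Sketch only: inspect the latest camera frame on demand.
// frame.capturedImage is a CVPixelBuffer straight from the camera.
if let pixelBuffer = sceneView.session.currentFrame?.capturedImage {
    print("Camera frame: \(CVPixelBufferGetWidth(pixelBuffer)) x \(CVPixelBufferGetHeight(pixelBuffer))")
}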
Add custom video source
In order to transmit video data, we need to create a class (ARVideoSource) that implements the AgoraVideoSourceProtocol, in which bufferType() should return .pixelBuffer.
class ARVideoSource: NSObject, AgoraVideoSourceProtocol {
    var consumer: AgoraVideoFrameConsumer?

    func shouldInitialize() -> Bool { return true }
    func shouldStart() { }
    func shouldStop() { }
    func shouldDispose() { }

    func bufferType() -> AgoraVideoBufferType {
        return .pixelBuffer
    }
}
Add a method to the ARVideoSource class that transmits video frames:
func sendBuffer(_ buffer: CVPixelBuffer, timestamp: TimeInterval) {
    let time = CMTime(seconds: timestamp, preferredTimescale: 10000)
    consumer?.consumePixelBuffer(buffer, withTimestamp: time, rotation: .rotationNone)
}
Next, instantiate an ARVideoSource in the View Controller and pass the instance to the Agora SDK via the setVideoSource interface in viewDidLoad().
let videoSource = ARVideoSource()

override func viewDidLoad() {
    ……
    agoraKit.setVideoSource(videoSource)
    ……
}
This allows us to pass video frames to the Agora SDK whenever we call videoSource’s sendBuffer(_:timestamp:) method.
Send Camera Data
We can get each ARFrame through the ARSession delegate callback, read the camera data from it, and use the videoSource to send it out.
In the viewDidLoad method, set the ARSession delegate to the View Controller and add the callback function.
override func viewDidLoad() {
    ……
    sceneView.session.delegate = self
    ……
}

extension ViewController: ARSessionDelegate {
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        videoSource.sendBuffer(frame.capturedImage, timestamp: frame.timestamp)
    }
}
Send ARSCNView data
ARFrame’s capturedImage provides the raw data from the camera. If we want to send a picture that already includes the virtual objects, we need to capture the ARSCNView contents instead. Here’s a simple idea: set a timer, snapshot the ARSCNView into a UIImage, convert it to a CVPixelBuffer, and provide it to videoSource. The sample logic code is provided below:
func startCaptureView() {
    // timer is a DispatchSourceTimer; videoSourceQueue is a dedicated DispatchQueue
    timer.schedule(deadline: .now(), repeating: .milliseconds(100))
    timer.setEventHandler { [unowned self] in
        // Snapshot the ARSCNView (camera image plus virtual content) as a UIImage
        let sceneImage: UIImage = self.image(ofView: self.sceneView)
        self.videoSourceQueue.async { [unowned self] in
            // Convert the snapshot to a CVPixelBuffer and push it to the Agora SDK
            let buffer: CVPixelBuffer = self.pixelBuffer(ofImage: sceneImage)
            self.videoSource.sendBuffer(buffer, timestamp: Double(mach_absolute_time()))
        }
    }
    timer.resume()
}
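The image(ofView:) and pixelBuffer(ofImage:) helpers referenced above are not SDK calls; they are small utilities in the View Controller. Below is a minimal sketch of how they might look, assuming an SCNView snapshot and a 32BGRA pixel buffer; the demo project may implement them differently.

// Hypothetical helpers for startCaptureView() above. Sketch only; the demo may differ.
private func image(ofView view: UIView) -> UIImage {
    // For an ARSCNView/SCNView, snapshot() returns the rendered frame,
    // including the camera image and the virtual content
    if let scnView = view as? SCNView {
        return scnView.snapshot()
    }
    // Fallback: snapshot an ordinary UIView hierarchy
    let renderer = UIGraphicsImageRenderer(bounds: view.bounds)
    return renderer.image { _ in
        view.drawHierarchy(in: view.bounds, afterScreenUpdates: false)
    }
}

private func pixelBuffer(ofImage image: UIImage) -> CVPixelBuffer {
    let cgImage = image.cgImage!
    let width = cgImage.width
    let height = cgImage.height

    // Create an empty 32BGRA buffer (sketch assumes creation succeeds)
    var buffer: CVPixelBuffer?
    CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                        kCVPixelFormatType_32BGRA, nil, &buffer)
    let pixelBuffer = buffer!

    CVPixelBufferLockBaseAddress(pixelBuffer, [])
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, []) }

    // Draw the image into the buffer through a BGRA bitmap context
    let context = CGContext(data: CVPixelBufferGetBaseAddress(pixelBuffer),
                            width: width, height: height,
                            bitsPerComponent: 8,
                            bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer),
                            space: CGColorSpaceCreateDeviceRGB(),
                            bitmapInfo: CGImageAlphaInfo.premultipliedFirst.rawValue
                                | CGBitmapInfo.byteOrder32Little.rawValue)!
    context.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))

    return pixelBuffer
}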
Rendering the live streaming video to the AR scene
Add virtual display
First, we need to create a virtual display for rendering the remote video and add it to the AR scene where the user taps.
Add a UITapGestureRecognizer to the ARSCNView in the Storyboard. When the user taps the screen, get the position on the plane through ARSCNView’s hitTest method and place a virtual display at the tapped position.
@IBAction func doSceneViewTapped(_ recognizer: UITapGestureRecognizer) {
    let location = recognizer.location(in: sceneView)
    guard let result = sceneView.hitTest(location, types: .existingPlane).first else {
        return
    }

    let scene = SCNScene(named: "art.scnassets/displayer.scn")!
    let rootNode = scene.rootNode
    rootNode.simdTransform = result.worldTransform
    sceneView.scene.rootNode.addChildNode(rootNode)

    let displayer = rootNode.childNode(withName: "displayer", recursively: false)!
    let screen = displayer.childNode(withName: "screen", recursively: false)!
    unusedScreenNodes.append(screen)
}
Users may add multiple display screens by tapping the screen; they remain in the unusedScreenNodes array until they are used and video is rendered onto them.
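Here, unusedScreenNodes is just a stored property on the View Controller. A one-line sketch of how it might be declared (the demo may organize this differently):

// Screen nodes placed by taps, waiting for a remote stream to be rendered onto them
var unusedScreenNodes = [SCNNode]()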
Add custom video renderer
In order to obtain remote video data from the Agora SDK, we need to construct an ARVideoRenderer object that implements the AgoraVideoSinkProtocol.
class ARVideoRenderer: NSObject {
    var renderNode: SCNNode?
}

extension ARVideoRenderer: AgoraVideoSinkProtocol {
    func shouldInitialize() -> Bool { return true }
    func shouldStart() { }
    func shouldStop() { }
    func shouldDispose() { }

    func bufferType() -> AgoraVideoBufferType {
        return .rawData
    }

    func pixelFormat() -> AgoraVideoPixelFormat {
        return .I420
    }

    func renderRawData(_ rawData: UnsafeMutableRawPointer, size: CGSize, rotation: AgoraVideoRotation) {
        ……
    }
}
The renderRawData(_:size:rotation:) method receives the remote video data, which we then render onto the SCNNode using the Metal framework. The full Metal rendering code can be found in the final version of the demo.
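Purely to illustrate what renderRawData(_:size:rotation:) receives, here is a simplified sketch that ignores the U/V chroma planes, builds a grayscale image from the Y (luma) plane of the I420 buffer, and assigns it to the node’s material. It assumes the row stride equals the frame width and is not how the demo actually renders; see the repo for the Metal path.

func renderRawData(_ rawData: UnsafeMutableRawPointer, size: CGSize, rotation: AgoraVideoRotation) {
    // Simplified sketch: use only the Y (luma) plane of the I420 frame as a
    // grayscale image. Assumes the row stride equals the frame width.
    let width = Int(size.width)
    let height = Int(size.height)
    let lumaData = Data(bytes: rawData, count: width * height)

    guard let provider = CGDataProvider(data: lumaData as CFData),
          let cgImage = CGImage(width: width, height: height,
                                bitsPerComponent: 8, bitsPerPixel: 8,
                                bytesPerRow: width,
                                space: CGColorSpaceCreateDeviceGray(),
                                bitmapInfo: CGBitmapInfo(rawValue: CGImageAlphaInfo.none.rawValue),
                                provider: provider, decode: nil,
                                shouldInterpolate: false, intent: .defaultIntent) else {
        return
    }

    // SceneKit material contents must be set on the main thread
    DispatchQueue.main.async { [weak self] in
        self?.renderNode?.geometry?.firstMaterial?.diffuse.contents = cgImage
    }
}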
Set custom renderer to Agora SDK
By implementing the rtcEngine(_:didJoinedOfUid:elapsed:) callback of the AgoraRtcEngineDelegate protocol, you can tell when a remote broadcaster joins the channel. In the callback, create an instance of ARVideoRenderer, set its render node to one of the virtual screen nodes created earlier by the user’s taps, and register the custom renderer with the Agora SDK via the setRemoteVideoRenderer(_:forUserId:) interface.
func rtcEngine(_ engine: AgoraRtcEngineKit, didJoinedOfUid uid: UInt, elapsed: Int) {
    guard !unusedScreenNodes.isEmpty else {
        return
    }

    let screenNode = unusedScreenNodes.removeFirst()
    let renderer = ARVideoRenderer()
    renderer.renderNode = screenNode
    agoraKit.setRemoteVideoRenderer(renderer, forUserId: uid)
}
This way, when another user joins the channel, their video is displayed on the AR plane, creating the effect of a virtual conference room.
Using the Agora SDK’s custom video source and custom video renderer features, it’s easy to combine AR and live video scenarios. This demo runs on the Agora SDK over Agora’s software-defined real-time network and can support up to 17 simultaneous video streams. It’s clear that AR technology will bring a whole new experience to real-time video streaming.
Where to take this from here:
- Challenge a friend in Pokemon Go
- Bring your friends, family, and colleagues closer to you in a video call
- Create a mixed reality fitness app to connect trainers to their clients
For the full source code, check out the GitHub repo here.
Please feel free to reach out on our Developer Slack Channel if you have any questions! If you’d like to be a part of our Slack Community, please fill out this form and we’ll send the invite out!
Sign up for Agora