How to Make Picture-in-Picture Mode on Android With Code Examples

This is what Picture-in-Picture mode looks like

In recent years, smartphones have become increasingly close to computers in functionality, and many people already use one instead of a PC as their primary work tool. One advantage personal computers held over them was multi-window capability, which long remained unavailable on smartphones. With the release of Android 7.0, this began to change: multi-window support appeared.

It’s hard to overestimate the convenience of a small floating window with the interlocutor’s video when the call is minimized – you can continue the dialogue and simultaneously take notes or look something up. Android has two options for implementing this functionality: support for the application in a floating window and picture-in-picture mode. Ideally, an application should support both approaches, but the floating window is more difficult to develop and imposes certain restrictions on the overall application design, so let’s consider picture-in-picture (PiP) as a relatively simple way to bring multi-window support into your application.

PiP mode for video calls on Android

Switching to PIP mode

Picture-in-picture mode is supported on most devices running Android 8 and above. Accordingly, if you support lower system versions, all PiP-related calls should be wrapped in a system version check:

if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
    // Something related to PiP
}

The entire `Activity` is converted to PiP, so first you need to declare PiP support for this `Activity` in `AndroidManifest.xml`. It is also worth handling configuration changes yourself, so the activity isn’t recreated when the window resizes:

<activity
    ...
    android:supportsPictureInPicture="true"
    android:configChanges="screenSize|smallestScreenSize|screenLayout|orientation" />

Before using picture-in-picture, it is necessary to make sure that the user’s device supports this mode. To do this, we turn to the `PackageManager`:

val isPipSupported = context.packageManager.hasSystemFeature(PackageManager.FEATURE_PICTURE_IN_PICTURE)

After that, in its simplest form, the transition to picture-in-picture mode takes literally one line:

this.enterPictureInPictureMode()

But to switch to it, you need to know when it is convenient for the user. You could add a dedicated button and enter PiP when it is pressed, but the most common approach is an automatic switch when the user minimizes the application during a call. To track this event, there is a handy method, `Activity.onUserLeaveHint`, called whenever the user intentionally leaves the `Activity` – whether via the Home or Recents button.

override fun onUserLeaveHint() {
    ...
    if (isPipSupported && imaginaryCallManager.isInCall)
        this.enterPictureInPictureMode()
}

Interface adaptation

Great, now our call screen automatically goes into PiP mode! But call screens often have “end call” or “change camera” buttons, and they will not work in this mode, so it’s better to hide them when transitioning.

To track the transition to and from PiP mode, `Activity` and `Fragment` have the `onPictureInPictureModeChanged` method. Let’s override it and hide the unnecessary interface elements:

override fun onPictureInPictureModeChanged(
    isInPictureInPictureMode: Boolean,
    newConfig: Configuration?
) {
    super.onPictureInPictureModeChanged(isInPictureInPictureMode, newConfig)
    // The controls should be visible only when we are NOT in PiP mode
    setIsUiVisible(!isInPictureInPictureMode)
}

The PiP window is quite small, so it makes sense to hide everything except the interlocutor’s video, including the local user’s video preview – it would be too small to see anything in it anyway.
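As an illustration, `setIsUiVisible` could be a simple helper that toggles the call controls and the local preview. This is a minimal sketch; the view names (`endCallButton`, `switchCameraButton`, `localVideoView`) are hypothetical placeholders for whatever your actual layout contains:

// A minimal sketch of setIsUiVisible. The binding view names here are
// hypothetical placeholders for your own call screen layout.
private fun setIsUiVisible(isVisible: Boolean) {
    val visibility = if (isVisible) View.VISIBLE else View.GONE
    binding.endCallButton.visibility = visibility
    binding.switchCameraButton.visibility = visibility
    binding.localVideoView.visibility = visibility
}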

How to implement picture-in-picture mode in an Android app?

Customization

The PiP window can be customized further by passing `PictureInPictureParams` to the `enterPictureInPictureMode` call. There are not many customization options, but the ability to add buttons to the bottom of the window deserves special attention. It is a nice way to keep the screen interactive despite the fact that the regular buttons stop working in PiP mode.

The maximum number of buttons you can add depends on many factors, but you can always add at least three. All buttons over the limit simply won’t be shown, so place the especially important ones at the beginning. You can find out the exact limit for the current configuration through an `Activity` property:

this.maxNumPictureInPictureActions

Let’s add an end-call button to our PiP window. To start with, just like with notifications, we need a `PendingIntent`, which will be responsible for telling our application that the button has been pressed. If this is the first time you’ve heard of `PendingIntent`, you can learn more about them in our last article.
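For completeness, `getPendingIntent()` might return a broadcast `PendingIntent` that a `BroadcastReceiver` in our app handles; the action string and request code below are assumptions for this sketch:

// A sketch of getPendingIntent(): a broadcast that our own BroadcastReceiver
// handles. The action string and request code are arbitrary for this example.
private fun getPendingIntent(): PendingIntent =
    PendingIntent.getBroadcast(
        this,
        0,
        Intent("com.example.action.END_CALL"),
        PendingIntent.FLAG_IMMUTABLE
    )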

After that, we can create the actual button description, namely a `RemoteAction`.

val endCallPendingIntent = getPendingIntent()
val endCallAction = RemoteAction(
    // The button icon. Its color will be ignored and replaced with a system color
    Icon.createWithResource(this, R.drawable.ic_baseline_call_end_24),
    // The button title; not shown in the PiP window itself
    "End call",
    // ContentDescription for screen readers
    "End call button",
    // Our PendingIntent that'll be fired when the button is pressed
    endCallPendingIntent
)

Our “action” is ready; now we need to add it to the PiP parameters and, subsequently, pass those to the mode transition call.

Let’s start by creating a Builder for our customization parameters:

val pipParams = PictureInPictureParams.Builder()
    .setActions(listOf(endCallAction))
    .build()

this.enterPictureInPictureMode(pipParams)

How to customize picture-in-picture mode?

In addition to the buttons, the parameters let you set the aspect ratio of the PiP window and a source rect hint that the system uses to animate the transition into this mode.
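For instance, here is a sketch of parameters for a 16:9 window; `binding.remoteVideoView` is assumed to be the view showing the interlocutor’s video (the name is a placeholder):

// A sketch: 16:9 aspect ratio plus a source rect hint that the system uses
// for a smoother enter animation. remoteVideoView is a placeholder name.
val sourceRect = Rect()
binding.remoteVideoView.getGlobalVisibleRect(sourceRect)

val pipParams = PictureInPictureParams.Builder()
    .setAspectRatio(Rational(16, 9))
    .setSourceRectHint(sourceRect)
    .setActions(listOf(endCallAction))
    .build()

this.enterPictureInPictureMode(pipParams)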

Other articles about calls on Android

WebRTC on Android

How to Make a Custom Call Notification on Android? With Code Examples

What Every Android App With Calls Should Have

How to Implement Audio Output Switching During the Call on Android App?

How to Implement Foreground Service and Deep Links for Android apps with calls? With Code Examples

Conclusion

We have considered a fairly simple but very handy way of using the multi-window feature to improve the user experience, learned how to add buttons to the PiP window on Android, and adapted our interface when switching to and from this mode.


How to Create Video Chat on Android? WebRTC Guide For Beginners


Briefly about WebRTC

WebRTC is a technology for building video chat and conferencing applications. It allows you to create a peer-to-peer connection between mobile devices and browsers to transmit media streams. You can find more details on how it works and its general principles in our article about WebRTC in plain language.

2 ways to implement video communication with WebRTC on Android

  • The easiest and fastest option is to use one of the many commercial projects, such as Twilio or LiveSwitch. They provide their own SDKs for various platforms and implement functionality out of the box, but they have drawbacks: they are paid, and the functionality is limited to the features they offer, not any you can think of.
  • Another option is to use one of the existing libraries. This approach requires more code but will save you money and give you more flexibility in implementing functionality. In this article, we will take the second path and use the native WebRTC library for Android (https://webrtc.github.io/webrtc-org/native-code/android/).

Creating a connection

Creating a WebRTC connection consists of two steps: 

  1. Establishing a logical connection – devices must agree on the data format, codecs, etc.
  2. Establishing a physical connection – devices must know each other’s addresses

To begin with, note that when initiating a connection, a signaling mechanism is used to exchange data between the devices. The signaling mechanism can be any channel for transmitting data, such as sockets.
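The snippets below use a `signaling` object. As a sketch, it can be any transport that implements an interface like this one; the interface itself is an assumption (its method names simply mirror the calls used later), not part of the library:

// A hypothetical signaling abstraction: any channel (WebSocket, Socket.IO,
// plain sockets) works, as long as both peers can exchange these messages.
interface Signaling {
    fun sendSdpOffer(sdpOffer: SessionDescription)
    fun sendSdpAnswer(sdpAnswer: SessionDescription)
    fun sendIceCandidate(iceCandidate: IceCandidate)
}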

Suppose we want to establish a video connection between two devices. To do this we need to establish a logical connection between them.

A logical connection

A logical connection is established using the Session Description Protocol (SDP). For this, one peer:

  1. Creates a PeerConnection object.
  2. Forms an SDP offer, which contains data about the upcoming session, and sends it to the interlocutor using the signaling mechanism.

val peerConnectionFactory: PeerConnectionFactory
lateinit var peerConnection: PeerConnection

fun createPeerConnection(iceServers: List<PeerConnection.IceServer>) {
  val rtcConfig = PeerConnection.RTCConfiguration(iceServers)
  peerConnection = peerConnectionFactory.createPeerConnection(
      rtcConfig,
      object : PeerConnection.Observer {
          ...
      }
  )!!
}

fun sendSdpOffer() {
  peerConnection.createOffer(
      object : SdpObserver {
          override fun onCreateSuccess(sdpOffer: SessionDescription) {
              peerConnection.setLocalDescription(sdpObserver, sdpOffer)
              signaling.sendSdpOffer(sdpOffer)
          }

          ...

      }, MediaConstraints()
  )
}
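Note that `setLocalDescription` above receives an `sdpObserver` that the snippet doesn’t show. A minimal implementation, assumed here since the original omits it, only needs to satisfy the `SdpObserver` interface:

// A minimal SdpObserver for the set*Description calls; in a real app the
// failure callbacks should at least be logged.
val sdpObserver = object : SdpObserver {
    override fun onCreateSuccess(sdp: SessionDescription?) {}
    override fun onSetSuccess() {}
    override fun onCreateFailure(error: String?) {}
    override fun onSetFailure(error: String?) {}
}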

In turn, the other peer:

  1. Also creates a PeerConnection object.
  2. Using the signaling mechanism, receives the SDP offer sent by the first peer and saves it.
  3. Forms an SDP answer and sends it back, again via the signaling mechanism.

// Saving the received SDP offer
fun onSdpOfferReceive(sdpOffer: SessionDescription) {
  peerConnection.setRemoteDescription(sdpObserver, sdpOffer)
  sendSdpAnswer()
}

// Forming and sending the SDP answer
fun sendSdpAnswer() {
  peerConnection.createAnswer(
      object : SdpObserver {
          override fun onCreateSuccess(sdpAnswer: SessionDescription) {
              peerConnection.setLocalDescription(sdpObserver, sdpAnswer)
              signaling.sendSdpAnswer(sdpAnswer)
          }
          ...
      }, MediaConstraints()
  )
}

The first peer, having received the SDP answer, saves it:

fun onSdpAnswerReceive(sdpAnswer: SessionDescription) {
  peerConnection.setRemoteDescription(sdpObserver, sdpAnswer)
}

After successful exchange of SessionDescription objects, the logical connection is considered established. 

Physical connection 

We now need to establish the physical connection between the devices, which is most often a non-trivial task. Typically, devices on the Internet do not have public addresses, since they are located behind routers and firewalls. To solve this problem WebRTC uses ICE (Interactive Connectivity Establishment) technology.

STUN and TURN servers are an important part of ICE. They serve one purpose: to establish a connection between devices that do not have public addresses.

STUN server

A device makes a request to a STUN server and receives its public address in response. Then, using the signaling mechanism, it sends that address to the interlocutor. After the interlocutor does the same, the devices know each other’s network location and are ready to transmit data to each other.

TURN server

In some cases, the router may impose a “Symmetric NAT” restriction that won’t allow a direct connection between the devices. In this case, a TURN server is used: it serves as an intermediary, and all data goes through it. Read more in Mozilla’s WebRTC documentation.

As we have seen, STUN and TURN servers play an important role in establishing a physical connection between devices. It is for this purpose that, when creating the PeerConnection object, we pass a list of available ICE servers.
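For illustration, such a list can be built with `PeerConnection.IceServer.builder`; the server URLs and credentials below are placeholders for your own infrastructure:

// Placeholder ICE servers: a public Google STUN server and a hypothetical
// TURN server with credentials. Replace these with your own deployment.
val iceServers = listOf(
    PeerConnection.IceServer.builder("stun:stun.l.google.com:19302")
        .createIceServer(),
    PeerConnection.IceServer.builder("turn:turn.example.com:3478")
        .setUsername("user")
        .setPassword("password")
        .createIceServer()
)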

To establish a physical connection, one peer generates ICE candidates – objects containing information about how the device can be found on the network – and sends them to the other peer via the signaling mechanism:

lateinit var peerConnection: PeerConnection

fun createPeerConnection(iceServers: List<PeerConnection.IceServer>) {

  val rtcConfig = PeerConnection.RTCConfiguration(iceServers)

  peerConnection = peerConnectionFactory.createPeerConnection(
      rtcConfig,
      object : PeerConnection.Observer {
          override fun onIceCandidate(iceCandidate: IceCandidate) {
              signaling.sendIceCandidate(iceCandidate)
          }
          ...
      }
  )!!
}

Then the second peer receives the first peer’s ICE candidates via the signaling mechanism and saves them. It also generates its own ICE candidates and sends them back:

fun onIceCandidateReceive(iceCandidate: IceCandidate) {
  peerConnection.addIceCandidate(iceCandidate)
}

Now that the peers have exchanged their addresses, you can start transmitting and receiving data.

Receiving data

After the logical and physical connections with the interlocutor are established, the library calls the onAddTrack callback and passes into it a MediaStream object containing the interlocutor’s VideoTrack and AudioTrack:

fun createPeerConnection(iceServers: List<PeerConnection.IceServer>) {

   val rtcConfig = PeerConnection.RTCConfiguration(iceServers)

   peerConnection = peerConnectionFactory.createPeerConnection(
       rtcConfig,
       object : PeerConnection.Observer {

           override fun onIceCandidate(iceCandidate: IceCandidate) { ... }

           override fun onAddTrack(
               rtpReceiver: RtpReceiver?,
               mediaStreams: Array<out MediaStream>
           ) {
               onTrackAdded(mediaStreams)
           }
           ...
       }
   )!!
}

Next, we must retrieve the VideoTrack from the MediaStream and display it on the screen. 

private fun onTrackAdded(mediaStreams: Array<out MediaStream>) {
   val videoTrack: VideoTrack? = mediaStreams.mapNotNull {                                                            
       it.videoTracks.firstOrNull() 
   }.firstOrNull()

   displayVideoTrack(videoTrack)

   ...
}

To display the VideoTrack, you need to pass it an object that implements the VideoSink interface. For this purpose, the library provides the SurfaceViewRenderer class:

fun displayVideoTrack(videoTrack: VideoTrack?) {
   videoTrack?.addSink(binding.surfaceViewRenderer)
}
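One caveat: `SurfaceViewRenderer` must be initialized with an EGL context before it can draw frames. A rough setup, assuming a single shared `EglBase` instance across the app:

// SurfaceViewRenderer needs an EGL context before it can render frames.
// eglBase is assumed to be a single shared EglBase instance in the app.
val eglBase: EglBase = EglBase.create()

fun initRenderer() {
    binding.surfaceViewRenderer.init(eglBase.eglBaseContext, null)
    binding.surfaceViewRenderer.setMirror(false)
}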

To hear the interlocutor, we don’t need to do anything extra – the library does everything for us. Still, if we want to fine-tune the sound, we can get the AudioTrack object and use it to change the audio settings:

var audioTrack: AudioTrack? = null
private fun onTrackAdded(mediaStreams: Array<out MediaStream>) {
   ...

   audioTrack = mediaStreams.mapNotNull { 
       it.audioTracks.firstOrNull() 
   }.firstOrNull()
}

For example, we could mute the interlocutor like this:

fun muteAudioTrack() {
   audioTrack?.setEnabled(false)
}

Sending data

Sending video and audio from your device also begins with creating a PeerConnection object and exchanging ICE candidates. But unlike the receiving case, where the library hands us a MediaStream, here we must first create a MediaStream object that includes our own AudioTrack and VideoTrack.

To send our audio and video streams, we create a PeerConnection object and then use the signaling mechanism to exchange IceCandidate and SDP packets, just as before. The difference is that we capture the media stream from our device and pass it to the library so that it delivers it to our interlocutor.

fun createLocalConnection() {

   // rtcConfig is the same RTCConfiguration with the ICE server list as before
   localPeerConnection = peerConnectionFactory.createPeerConnection(
       rtcConfig,
       object : PeerConnection.Observer {
            ...
       }
   )!!

   val localMediaStream = getLocalMediaStream()
   localPeerConnection.addStream(localMediaStream)

   localPeerConnection.createOffer(
       object : SdpObserver {
            ...
       }, MediaConstraints()
   )
}

Now we need to create the MediaStream object and add the AudioTrack and VideoTrack objects to it:

val context: Context
private fun getLocalMediaStream(): MediaStream? {
   val stream = peerConnectionFactory.createLocalMediaStream("user")

   val audioTrack = getLocalAudioTrack()
   stream.addTrack(audioTrack)

   val videoTrack = getLocalVideoTrack(context)
   stream.addTrack(videoTrack)

   return stream
}

Receiving the audio track:

private fun getLocalAudioTrack(): AudioTrack {
   val audioConstraints = MediaConstraints()
   val audioSource = peerConnectionFactory.createAudioSource(audioConstraints)
   return peerConnectionFactory.createAudioTrack("user_audio", audioSource)
}

Receiving a VideoTrack is a tiny bit more difficult. First, get a list of all the device’s cameras:

lateinit var capturer: CameraVideoCapturer

private fun getLocalVideoTrack(context: Context): VideoTrack {
   val cameraEnumerator = Camera2Enumerator(context)
   val camera = cameraEnumerator.deviceNames.firstOrNull {
       cameraEnumerator.isFrontFacing(it)
   } ?: cameraEnumerator.deviceNames.first()
   
   ...

}

Next, create a CameraVideoCapturer object, which will capture the image:

private fun getLocalVideoTrack(context: Context): VideoTrack {

   ...

   capturer = cameraEnumerator.createCapturer(camera, null)
   val surfaceTextureHelper = SurfaceTextureHelper.create(
       "CaptureThread",
       EglBase.create().eglBaseContext
   )
   // isScreencast is false for camera capture
   val videoSource =
       peerConnectionFactory.createVideoSource(capturer.isScreencast)
   capturer.initialize(surfaceTextureHelper, context, videoSource.capturerObserver)

   ...

}

Now, having obtained the CameraVideoCapturer, start capturing the image and add the resulting track to the MediaStream:

private fun getLocalMediaStream(): MediaStream? {
  ...

  val videoTrack = getLocalVideoTrack(context)
  stream.addTrack(videoTrack)

  return stream
}

private fun getLocalVideoTrack(context: Context): VideoTrack {
    ...

  capturer.startCapture(1024, 720, 30) // width, height, frames per second

  return peerConnectionFactory.createVideoTrack("user0_video", videoSource)

}

After creating a MediaStream and adding it to the PeerConnection, the library forms an SDP offer, and the SDP packet exchange described above takes place through the signaling mechanism. When this process is complete, the interlocutor will begin to receive our video stream. Congratulations, at this point the connection is established.

Many to Many

We have considered a one-to-one connection. WebRTC also allows you to create many-to-many connections. In its simplest form, this is done in exactly the same way as a one-to-one connection; the difference is that the PeerConnection object, as well as the SDP packet and ICE candidate exchange, is created not once but for each participant. This approach has disadvantages:

  • The device is heavily loaded because it needs to send the same data stream to each interlocutor
  • The implementation of additional features such as video recording, transcoding, etc. is difficult or even impossible

In such cases, WebRTC can be used in conjunction with a media server that takes care of the above tasks. For the client side, the process is exactly the same as for a direct connection to the interlocutors’ devices, except that the media stream is sent only to the media server, which retransmits it to the other participants.

Conclusion

We have considered the simplest way to create a WebRTC connection on Android. If after reading this you still don’t understand it, just go through all the steps again and try to implement them yourself – once you have grasped the key points, using this technology in practice will not be a problem. 

You can also refer to the following resources for a better understanding of WebRTC:

WebRTC documentation by Mozilla

Fora Soft article on WebRTC security