Categories
Uncategorized

How to Make Picture-in-Picture Mode on Android With Code Examples

This is what Picture-in-Picture mode looks like

In recent years, smartphones have come increasingly close to computers in terms of functionality, and for many people they are already replacing the PC as the primary work tool. One advantage personal computers kept was multi-window capability, which smartphones lacked. With the release of Android 7.0, this began to change: multi-window support appeared.

It’s hard to overestimate the convenience of a small floating window with the interlocutor’s video when the call is minimized: you can continue the conversation while taking notes or looking something up. Android offers two options for implementing this: running the application in its own resizable window (multi-window mode) and picture-in-picture mode. Ideally, an application should support both, but the resizable window is harder to develop and imposes certain restrictions on the overall application design, so let’s consider picture-in-picture (PiP) on Android as a relatively simple way to bring multi-window support into your application.

Switching to PIP mode

Picture-in-picture mode is supported on most devices running Android 8.0 (API level 26) and above. Accordingly, if you support older system versions, wrap all PIP-related calls in a system version check:

if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
    // Something PIP-related
}

The entire `Activity` is switched to PIP mode, so first you need to declare PIP support for this `Activity` in `AndroidManifest.xml`:

<activity
    ...
    android:supportsPictureInPicture="true" />

Before using picture-in-picture, make sure that the user’s device supports this mode. To do this, we turn to the `PackageManager`:

val isPipSupported = context.packageManager.hasSystemFeature(PackageManager.FEATURE_PICTURE_IN_PICTURE)

After that, in its simplest form, the transition to picture-in-picture mode is done literally with one line:

this.enterPictureInPictureMode()

But to switch to it, you need to know when the switch is convenient for the user. You can add a separate button and switch when it is tapped. The most common approach is to switch automatically when the user minimizes the application during a call. To track this event, there is a handy method, `Activity.onUserLeaveHint`, which is called whenever the user intentionally leaves the `Activity`, whether via the Home or the Recents button.

override fun onUserLeaveHint() {
    ...
    if (isPipSupported && imaginaryCallManager.isInCall)
        this.enterPictureInPictureMode()
}

Interface adaptation

Great, now our call screen automatically goes into PIP mode on Android! But call screens often have “end call” or “switch camera” buttons, and they will not work in this mode. It’s better to hide them during the transition.

To track the transition to and from PIP mode, `Activity` and `Fragment` have the `onPictureInPictureModeChanged` method. Let’s override it and hide unnecessary interface elements:

override fun onPictureInPictureModeChanged(
    isInPictureInPictureMode: Boolean,
    newConfig: Configuration
) {
    super.onPictureInPictureModeChanged(isInPictureInPictureMode, newConfig)
    // Show the regular controls only outside of PIP mode
    setIsUiVisible(!isInPictureInPictureMode)
}

The PIP window is quite small, so it makes sense to hide everything except the interlocutor’s video, including the local user’s video: it would be too small to see anything anyway.

Customization

The PIP window can be customized further by passing `PictureInPictureParams` to `enterPictureInPictureMode`. There are not many customization options, but the option to add buttons to the window deserves special attention. It’s a nice way to keep the screen interactive even though the regular buttons stop working in PIP mode.

The maximum number of buttons you can add depends on many factors, but you can always add at least three. Buttons over the limit simply won’t be shown, so it’s better to put the most important ones first. You can find out the exact limit for the current configuration through the `Activity` method `getMaxNumPictureInPictureActions()`, which Kotlin exposes as a property:

this.maxNumPictureInPictureActions
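Because buttons beyond the limit are simply dropped, it can help to sort actions by importance before building the parameters. Below is a minimal, platform-independent sketch of that ordering. `PipAction` and its `priority` field are hypothetical stand-ins for `RemoteAction`; in a real `Activity`, the limit would come from `maxNumPictureInPictureActions`.

```kotlin
// Hypothetical stand-in for RemoteAction: a title plus an importance rank
// (a lower number means a more important action)
data class PipAction(val title: String, val priority: Int)

// Returns the actions that fit into the PIP window, most important first.
// In an Activity, maxActions would be this.maxNumPictureInPictureActions.
fun actionsToShow(actions: List<PipAction>, maxActions: Int): List<PipAction> =
    actions.sortedBy { it.priority }.take(maxActions.coerceAtLeast(0))
```

With four actions and a limit of three, `actionsToShow` keeps the three with the lowest `priority` values. `sortedBy` is stable, so actions with equal priority keep their original order.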

Let’s add an end call button to our PIP window. First, just like with notifications, we need a `PendingIntent`, which will tell our application that the button has been pressed. If `PendingIntent` is new to you, you can learn more about it in our previous article.

After that, we can create the button description itself, a `RemoteAction`:

val endCallPendingIntent = getPendingIntent()
val endCallAction = RemoteAction(
    // Button icon; its color will be ignored and replaced with the system color
    Icon.createWithResource(this, R.drawable.ic_baseline_call_end_24),
    // Button title, which will not be shown
    "End call",
    // Content description for screen readers
    "End call button",
    // Our PendingIntent, which will be sent when the button is pressed
    endCallPendingIntent
)

Our “action” is ready. Now we need to add it to the PIP parameters and pass them to the mode transition call.

Let’s start by creating a Builder for our customization parameters:

val pipParams = PictureInPictureParams.Builder()
    .setActions(listOf(endCallAction))
    .build()

this.enterPictureInPictureMode(pipParams)

Besides buttons, these parameters let you set the aspect ratio of the PIP window and the animation of switching to this mode.
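The aspect ratio needs some care: `PictureInPictureParams.Builder.setAspectRatio` takes a `Rational`, and the system rejects ratios that are too extreme (by default, anything outside the range from 1:2.39 to 2.39:1; devices can configure these limits). Below is a minimal, platform-independent sketch that clamps a video's size into the default range; `pipAspectRatio` is a hypothetical helper that returns a numerator and denominator for `Rational(w, h)`.

```kotlin
// Default PIP aspect-ratio limits (device configurations may differ)
const val MAX_PIP_RATIO = 2.39
const val MIN_PIP_RATIO = 1.0 / 2.39

// Returns (numerator, denominator) that stays within the default PIP limits
fun pipAspectRatio(width: Int, height: Int): Pair<Int, Int> {
    require(width > 0 && height > 0) { "Video size must be positive" }
    val ratio = width.toDouble() / height
    return when {
        ratio > MAX_PIP_RATIO -> 239 to 100 // widest allowed: 2.39:1
        ratio < MIN_PIP_RATIO -> 100 to 239 // tallest allowed: 1:2.39
        else -> width to height
    }
}
```

In the activity, the result would be passed as `setAspectRatio(Rational(w, h))` on the same builder that holds the actions.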


Conclusion

We have considered a fairly simple but very handy way of using the multi-window feature to improve the user experience, learned how to add buttons to the PIP window on Android, and learned how to adapt our interface when switching to and from this mode.


WebRTC in Android


Briefly about WebRTC

WebRTC is a video chat and conferencing development technology. It allows you to create a peer-to-peer connection between mobile devices and browsers to transmit media streams. You can find more details on how it works and its general principles in our article about WebRTC in plain language.

2 ways to implement video communication with WebRTC on Android

  • The easiest and fastest option is to use one of the many commercial services, such as Twilio or LiveSwitch. They provide their own SDKs for various platforms and implement the functionality out of the box, but they have drawbacks: they are paid, and the functionality is limited. You can only build the features they offer, not any feature you can think of.
  • Another option is to use one of the existing libraries. This approach requires more code but will save you money and give you more flexibility in functionality implementation. In this article, we will look at the second option and use https://webrtc.github.io/webrtc-org/native-code/android/ as our library.

Creating a connection

Creating a WebRTC connection consists of two steps: 

  1. Establishing a logical connection – devices must agree on the data format, codecs, etc.
  2. Establishing a physical connection – devices must know each other’s addresses

Note that while a connection is being initiated, the devices exchange data through a signaling mechanism. It can be any data transmission channel, such as sockets.
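Since WebRTC leaves the signaling channel to the developer, the message format is up to you as well. Here is a minimal sketch of one possible format; the `SignalingMessage` class, its type tags, and the Base64 text encoding are hypothetical choices, not part of the library. The candidate fields mirror the library's `IceCandidate(sdpMid, sdpMLineIndex, sdp)` constructor.

```kotlin
import java.util.Base64

// Hypothetical messages sent over any channel (for example, a socket)
sealed class SignalingMessage {
    data class Offer(val sdp: String) : SignalingMessage()
    data class Answer(val sdp: String) : SignalingMessage()
    data class Candidate(val sdpMid: String, val sdpMLineIndex: Int, val sdp: String) : SignalingMessage()
}

private fun enc(s: String): String = Base64.getEncoder().encodeToString(s.toByteArray(Charsets.UTF_8))
private fun dec(s: String): String = String(Base64.getDecoder().decode(s), Charsets.UTF_8)

// SDP text contains line breaks and colons, so text fields are Base64-encoded
// and joined with ':', which never appears in Base64 output
fun SignalingMessage.encode(): String = when (this) {
    is SignalingMessage.Offer -> "offer:${enc(sdp)}"
    is SignalingMessage.Answer -> "answer:${enc(sdp)}"
    is SignalingMessage.Candidate -> "candidate:${enc(sdpMid)}:$sdpMLineIndex:${enc(sdp)}"
}

fun decodeSignalingMessage(text: String): SignalingMessage {
    val parts = text.split(':')
    return when (parts[0]) {
        "offer" -> SignalingMessage.Offer(dec(parts[1]))
        "answer" -> SignalingMessage.Answer(dec(parts[1]))
        "candidate" -> SignalingMessage.Candidate(dec(parts[1]), parts[2].toInt(), dec(parts[3]))
        else -> throw IllegalArgumentException("Unknown message type: ${parts[0]}")
    }
}
```

Any other encoding (JSON, protobuf) works just as well, as long as both peers agree on it.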

Suppose we want to establish a video connection between two devices. To do this we need to establish a logical connection between them.

A logical connection

A logical connection is established using the Session Description Protocol (SDP). To do this, one peer:

  1. Creates a PeerConnection object.
  2. Forms an SDP offer object, which contains data about the upcoming session, and sends it to the interlocutor through the signaling mechanism.

// Created elsewhere, e.g. with PeerConnectionFactory.builder()
lateinit var peerConnectionFactory: PeerConnectionFactory
lateinit var peerConnection: PeerConnection

fun createPeerConnection(iceServers: List<PeerConnection.IceServer>) {
  val rtcConfig = PeerConnection.RTCConfiguration(iceServers)
  peerConnection = peerConnectionFactory.createPeerConnection(
      rtcConfig,
      object : PeerConnection.Observer {
          ...
      }
  )!!
}

// sdpObserver (receives the results of set*Description calls) and
// signaling (our signaling channel) are defined elsewhere
fun sendSdpOffer() {
  peerConnection.createOffer(
      object : SdpObserver {
          override fun onCreateSuccess(sdpOffer: SessionDescription) {
              peerConnection.setLocalDescription(sdpObserver, sdpOffer)
              signaling.sendSdpOffer(sdpOffer)
          }

          ...

      }, MediaConstraints()
  )
}

In turn, the other peer:

  1. Also creates a PeerConnection object.
  2. Receives the SDP offer sent by the first peer through the signaling mechanism and stores it.
  3. Forms an SDP answer and sends it back, also through the signaling mechanism.

// Saving the received SDP offer
fun onSdpOfferReceive(sdpOffer: SessionDescription) {
  peerConnection.setRemoteDescription(sdpObserver, sdpOffer)
  sendSdpAnswer()
}

// Forming and sending the SDP answer
fun sendSdpAnswer() {
  peerConnection.createAnswer(
      object : SdpObserver {
          override fun onCreateSuccess(sdpAnswer: SessionDescription) {
              peerConnection.setLocalDescription(sdpObserver, sdpAnswer)
              signaling.sendSdpAnswer(sdpAnswer)
          }
           …
      }, MediaConstraints()
  )
}

The first peer, having received the SDP answer, stores it:

fun onSdpAnswerReceive(sdpAnswer: SessionDescription) {
  peerConnection.setRemoteDescription(sdpObserver, sdpAnswer)
}

After successful exchange of SessionDescription objects, the logical connection is considered established. 
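The order of these calls matters. The library tracks each peer's progress as `PeerConnection.SignalingState`, and the peer that sent the offer only stores the answer; it never creates an answer itself. Below is a simplified, platform-independent model covering just the states used here (no provisional answers); `OfferAnswerTracker` is a hypothetical helper for illustration.

```kotlin
// Subset of the WebRTC signaling states used by a plain offer/answer exchange
enum class SignalingState { STABLE, HAVE_LOCAL_OFFER, HAVE_REMOTE_OFFER }

enum class SdpType { OFFER, ANSWER }

// Hypothetical tracker that rejects out-of-order description calls
class OfferAnswerTracker {
    var state = SignalingState.STABLE
        private set

    fun setLocalDescription(type: SdpType) {
        state = when {
            type == SdpType.OFFER && state == SignalingState.STABLE -> SignalingState.HAVE_LOCAL_OFFER
            type == SdpType.ANSWER && state == SignalingState.HAVE_REMOTE_OFFER -> SignalingState.STABLE
            else -> error("Cannot set local $type in state $state")
        }
    }

    fun setRemoteDescription(type: SdpType) {
        state = when {
            type == SdpType.OFFER && state == SignalingState.STABLE -> SignalingState.HAVE_REMOTE_OFFER
            type == SdpType.ANSWER && state == SignalingState.HAVE_LOCAL_OFFER -> SignalingState.STABLE
            else -> error("Cannot set remote $type in state $state")
        }
    }
}
```

The first peer goes STABLE, HAVE_LOCAL_OFFER, STABLE; the second goes STABLE, HAVE_REMOTE_OFFER, STABLE. Both ending in STABLE corresponds to the established logical connection.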

Physical connection 

We now need to establish the physical connection between the devices, which is most often a non-trivial task. Typically, devices on the Internet do not have public addresses, since they are located behind routers and firewalls. To solve this problem WebRTC uses ICE (Interactive Connectivity Establishment) technology.

STUN and TURN servers are an important part of ICE. They serve one purpose: to establish connections between devices that do not have public addresses.

STUN server

A device sends a request to a STUN server and receives its public address in response. Then it sends this address to the interlocutor through the signaling mechanism. Once the interlocutor does the same, the devices know each other’s network location and are ready to transmit data to each other.

TURN server

In some cases, the router uses symmetric NAT, which doesn’t allow a direct connection between the devices. In this case, a TURN server is used: it serves as an intermediary, and all data goes through it. Read more in Mozilla’s WebRTC documentation.

As we have seen, STUN and TURN servers play an important role in establishing a physical connection between devices. That’s why, when creating the PeerConnection object, we pass it a list of available ICE servers.
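ICE gathers several kinds of possible addresses (candidates) and tries the most direct ones first, falling back to a TURN relay only when nothing else works. RFC 8445 expresses this ordering as a numeric priority per candidate. The sketch below implements that formula with the type preferences the RFC recommends; the library computes these values itself, so this only illustrates the ranking, and the names are hypothetical.

```kotlin
// Candidate types with the type preferences recommended by RFC 8445, section 5.1.2.2
enum class CandidateType(val typePreference: Int) {
    HOST(126),             // the device's own address
    PEER_REFLEXIVE(110),   // address learned from the peer during connectivity checks
    SERVER_REFLEXIVE(100), // public address reported by a STUN server
    RELAYED(0)             // address allocated on a TURN server
}

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
fun candidatePriority(type: CandidateType, localPreference: Int = 65535, componentId: Int = 1): Long {
    require(localPreference in 0..65535) { "Local preference must fit in 16 bits" }
    require(componentId in 1..256) { "Component ID must be between 1 and 256" }
    return (1L shl 24) * type.typePreference + (1L shl 8) * localPreference + (256 - componentId)
}
```

With the defaults, a host candidate gets 2130706431 (a value often seen in SDP candidate lines), while a relayed TURN candidate gets 16777215, so the relay is tried last.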

To establish a physical connection, one peer generates ICE candidates (objects containing information about how the device can be reached on the network) and sends them to the other peer through the signaling mechanism:

lateinit var peerConnection: PeerConnection

fun createPeerConnection(iceServers: List<PeerConnection.IceServer>) {

  val rtcConfig = PeerConnection.RTCConfiguration(iceServers)

  peerConnection = peerConnectionFactory.createPeerConnection(
      rtcConfig,
      object : PeerConnection.Observer {
          override fun onIceCandidate(iceCandidate: IceCandidate) {
              signaling.sendIceCandidate(iceCandidate)
          }
          …
      }
  )!!
}

The second peer then receives the first peer’s ICE candidates through the signaling mechanism and stores them. It also generates its own ICE candidates and sends them back:

fun onIceCandidateReceive(iceCandidate: IceCandidate) {
  peerConnection.addIceCandidate(iceCandidate)
}
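In practice, candidates can arrive through the signaling channel before the remote SDP has been set, and the library refuses to add a candidate without a remote description (the synchronous `addIceCandidate` returns `false`). A common workaround is to queue early candidates and apply them once the remote description is in place. The sketch below is platform-independent: `PendingCandidateQueue` is a hypothetical helper, the `apply` callback stands in for `peerConnection.addIceCandidate(candidate)`, and all calls are assumed to come from one thread.

```kotlin
// Hypothetical helper that holds back ICE candidates until the remote
// description is set; apply() would call peerConnection.addIceCandidate()
class PendingCandidateQueue<T>(private val apply: (T) -> Unit) {
    private val pending = mutableListOf<T>()
    private var remoteDescriptionSet = false

    // Call for every candidate received through the signaling mechanism
    fun onCandidateReceived(candidate: T) {
        if (remoteDescriptionSet) apply(candidate) else pending.add(candidate)
    }

    // Call from SdpObserver.onSetSuccess() after setRemoteDescription()
    fun onRemoteDescriptionSet() {
        remoteDescriptionSet = true
        pending.forEach(apply)
        pending.clear()
    }
}
```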

Now that the peers have exchanged their addresses, you can start transmitting and receiving data.

Receiving data

After establishing the logical and physical connections with the interlocutor, the library calls the onAddTrack callback and passes it the MediaStream objects containing the interlocutor’s VideoTrack and AudioTrack:

fun createPeerConnection(iceServers: List<PeerConnection.IceServer>) {

   val rtcConfig = PeerConnection.RTCConfiguration(iceServers)

   peerConnection = peerConnectionFactory.createPeerConnection(
       rtcConfig,
       object : PeerConnection.Observer {

           override fun onIceCandidate(iceCandidate: IceCandidate) { … }

           override fun onAddTrack(
               rtpReceiver: RtpReceiver?,
               mediaStreams: Array<out MediaStream>
           ) {
               onTrackAdded(mediaStreams)
           }
           … 
       }
   )!!
}

Next, we must retrieve the VideoTrack from the MediaStream and display it on the screen. 

private fun onTrackAdded(mediaStreams: Array<out MediaStream>) {
   val videoTrack: VideoTrack? = mediaStreams.mapNotNull {                                                            
       it.videoTracks.firstOrNull() 
   }.firstOrNull()

   displayVideoTrack(videoTrack)

   … 
}

To display a VideoTrack, you need to pass it an object that implements the VideoSink interface. The library provides the SurfaceViewRenderer class for this purpose.

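fun displayVideoTrack(videoTrack: VideoTrack?) {
   // SurfaceViewRenderer implements VideoSink; it must be initialized once,
   // e.g. with init(eglBase.eglBaseContext, null), before it can render frames
   videoTrack?.addSink(binding.surfaceViewRenderer)
}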

We don’t need to do anything extra to hear the interlocutor: the library plays the audio for us. Still, if we want to fine-tune the sound, we can get the AudioTrack object and use it to control playback:

var audioTrack: AudioTrack? = null
private fun onTrackAdded(mediaStreams: Array<out MediaStream>) {
   … 

   audioTrack = mediaStreams.mapNotNull { 
       it.audioTracks.firstOrNull() 
   }.firstOrNull()
}

For example, we could mute the interlocutor like this:

fun muteAudioTrack() {
   // audioTrack is nullable, so a safe call is needed
   audioTrack?.setEnabled(false)
}

Sending data

Sending video and audio from your device also begins with creating a PeerConnection object and then exchanging ICE candidates and SDP packets through the signaling mechanism. The difference is that, instead of receiving a media stream from the library, we must first create a MediaStream object containing an AudioTrack and a VideoTrack from our device and pass it to the library, which will send it to our interlocutor. This has to happen before the SDP offer is created.

fun createLocalConnection() {

   localPeerConnection = peerConnectionFactory.createPeerConnection(
       rtcConfig,
       object : PeerConnection.Observer {
            ...
       }
   )!!

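   val localMediaStream = getLocalMediaStream()
   // addStream() works only with Plan B SDP semantics; with Unified Plan (the
   // default in recent library versions), add each track with addTrack() instead
   localPeerConnection.addStream(localMediaStream)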

   localPeerConnection.createOffer(
       object : SdpObserver {
            ...
       }, MediaConstraints()
   )
}

Now we need to create a MediaStream object and add the AudioTrack and VideoTrack objects to it:

val context: Context
private fun getLocalMediaStream(): MediaStream? {
   val stream = peerConnectionFactory.createLocalMediaStream("user")

   val audioTrack = getLocalAudioTrack()
   stream.addTrack(audioTrack)

   val videoTrack = getLocalVideoTrack(context)
   stream.addTrack(videoTrack)

   return stream
}

Getting the AudioTrack:

private fun getLocalAudioTrack(): AudioTrack {
   val audioConstraints = MediaConstraints()
   val audioSource = peerConnectionFactory.createAudioSource(audioConstraints)
   return peerConnectionFactory.createAudioTrack("user_audio", audioSource)
}

Getting the VideoTrack is a bit more involved. First, get the list of the device’s cameras and pick the front-facing one, falling back to the first camera available:

lateinit var capturer: CameraVideoCapturer

private fun getLocalVideoTrack(context: Context): VideoTrack {
   val cameraEnumerator = Camera2Enumerator(context)
   val camera = cameraEnumerator.deviceNames.firstOrNull {
       cameraEnumerator.isFrontFacing(it)
   } ?: cameraEnumerator.deviceNames.first()
   
   ...

}

Next, create a CameraVideoCapturer object, which will capture the image:

private fun getLocalVideoTrack(context: Context): VideoTrack {

   ...


   capturer = cameraEnumerator.createCapturer(camera, null)
   val surfaceTextureHelper = SurfaceTextureHelper.create(
       "CaptureThread",
       EglBase.create().eglBaseContext
   )
   val videoSource =
       peerConnectionFactory.createVideoSource(capturer.isScreencast)
   capturer.initialize(surfaceTextureHelper, context, videoSource.capturerObserver)

   ...

}

Now that we have the CameraVideoCapturer, start capturing and add the resulting VideoTrack to the MediaStream:

private fun getLocalMediaStream(): MediaStream? {
  ...

  val videoTrack = getLocalVideoTrack(context)
  stream.addTrack(videoTrack)

  return stream
}

private fun getLocalVideoTrack(context: Context): VideoTrack {
    ...

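  // Requested width, height, and frame rate; the capturer picks the closest format the camera supports
  capturer.startCapture(1024, 720, 30)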

  return peerConnectionFactory.createVideoTrack("user0_video", videoSource)

}

After creating a MediaStream and adding it to the PeerConnection, the library forms an SDP offer, and the SDP packet exchange described above takes place through the signaling mechanism. When this process is complete, the interlocutor will begin to receive our video stream. Congratulations, at this point the connection is established.

Many to Many

We have considered a one-to-one connection. WebRTC also allows you to create many-to-many connections. In its simplest form, this is done exactly like a one-to-one connection, except that a separate PeerConnection object is created, and the SDP packet and ICE candidate exchange is performed, for each participant rather than once. This approach has disadvantages:

  • The device is heavily loaded because it needs to send the same data stream to each interlocutor
  • The implementation of additional features such as video recording, transcoding, etc. is difficult or even impossible

In this case, WebRTC can be used together with a media server that takes care of these tasks. On the client side, the process is exactly the same as for a direct connection to the interlocutors’ devices, but the media stream is sent only to the media server instead of to every participant. The media server then retransmits it to the other participants.
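The difference in load is easy to quantify. The sketch below counts, for one device, the streams it uploads and downloads and the connections it maintains in both setups. It assumes every participant publishes one stream, watches everyone else, and holds a single connection to the media server; the names are hypothetical.

```kotlin
// Per-device load in a call with n participants
data class ClientLoad(val uploadedStreams: Int, val downloadedStreams: Int, val peerConnections: Int)

// Mesh: every device connects to and sends its stream to every other device
fun meshLoad(participants: Int): ClientLoad {
    require(participants >= 2) { "A call needs at least two participants" }
    val others = participants - 1
    return ClientLoad(uploadedStreams = others, downloadedStreams = others, peerConnections = others)
}

// Media server: every device sends one stream to the server, which forwards the others
fun mediaServerLoad(participants: Int): ClientLoad {
    require(participants >= 2) { "A call needs at least two participants" }
    return ClientLoad(uploadedStreams = 1, downloadedStreams = participants - 1, peerConnections = 1)
}

// Total number of peer-to-peer connections in the whole mesh: n(n - 1) / 2
fun meshConnectionsTotal(participants: Int): Int = participants * (participants - 1) / 2
```

For five participants, a mesh device uploads 4 copies of its stream over 4 connections (10 connections in the whole call), while with a media server it uploads a single copy.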

Conclusion

We have considered the simplest way to create a WebRTC connection on Android. If after reading this you still don’t understand it, just go through all the steps again and try to implement them yourself – once you have grasped the key points, using this technology in practice will not be a problem. 

You can also refer to the following resources for a better understanding of WebRTC:

WebRTC documentation by Mozilla

Fora Soft article on WebRTC in simple terms

Fora Soft article on WebRTC security


Software Development for In-Sync Music Jamming Online by Video Chat

music collaboration software with video chat

Musicians can survive the pandemic with WorldCastLive.com. The band connects in a video call, invites the fans to watch, and they jam online remotely at a live concert. 100% in sync, with less than a second delay.


Features for virtual music jam by a video chat

🎶 Audio quality

Synchronization

Why not make music online with friends and strangers in any video chat? I take the guitar, call Joe on drums, add Sarah on piano, and we all play. It doesn’t work because the participants’ sound is not in perfect sync. That’s fine for a conversation but not for a real-time music collaboration app.

We develop video conferences for musicians to play together, learn and teach, and hold concerts.

Sync for listeners

Each musician produces an audio track. Our software marks the tracks, detects the delay of each one, and syncs them into one audio file on the server, which is streamed to the audience.
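The server-side step can be pictured as shifting each track by its measured delay and mixing the results. Below is a minimal sketch, assuming the delays have already been measured in samples and that all tracks share one sample rate; `mixAligned` is a hypothetical function for illustration, not our production code.

```kotlin
// Each track is paired with its measured delay in samples (how late it started)
fun mixAligned(tracks: List<Pair<FloatArray, Int>>): FloatArray {
    // Drop each track's leading delay so that all tracks start together
    val aligned = tracks.map { (samples, delay) ->
        require(delay in 0..samples.size) { "Delay exceeds track length" }
        samples.copyOfRange(delay, samples.size)
    }
    val length = aligned.minOfOrNull { it.size } ?: return FloatArray(0)
    // Sum the aligned samples and clip the result to the valid [-1, 1] range
    return FloatArray(length) { i ->
        aligned.sumOf { it[i].toDouble() }.toFloat().coerceIn(-1f, 1f)
    }
}
```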

But if this happens afterward on the server, how can the musicians perform together? They have to hear each other in sync right now to play together.

Sync for the musicians

We calibrate the audio tracks: we play a sound at one node and detect when it reaches the other node, which gives us the delay. Let’s imagine we have a drummer, a guitarist, and a singer. The drummer starts playing, and his audio track goes to the guitarist. The guitarist plays along, and both audio tracks go to the singer in sync: the drummer’s track carries the same delay it had on its way to the guitarist. It snowballs from there: each musician hears only the musicians before him. The singer hears the guitarist and the drummer, the guitarist hears only the drummer, and the drummer hears no one.
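The chain can be modeled as a running sum of the measured link delays: each musician plays along with what he hears, so each position in the chain is offset from the first musician by all the delays before it. A simplified sketch, assuming the calibration step has already measured one-way delays between neighbouring musicians; `chainOffsets` is hypothetical.

```kotlin
// latenciesMs[i] is the measured one-way delay from musician i to musician i + 1.
// Returns how far behind the first musician each position in the chain plays.
fun chainOffsets(latenciesMs: List<Int>): List<Int> {
    require(latenciesMs.all { it >= 0 }) { "Latencies cannot be negative" }
    return latenciesMs.runningFold(0) { offset, latency -> offset + latency }
}
```

With a 40 ms delay from the drummer to the guitarist and 60 ms from the guitarist to the singer, `chainOffsets(listOf(40, 60))` returns `[0, 40, 100]`: the singer plays 100 ms behind the drummer while hearing both earlier tracks aligned.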

Clear sound: right audio codec with right settings

Audio quality in music software

In WebRTC the developer picks an audio codec. Some are better for voice, some for music. Choose Opus: the best sound quality plus low latency – Mozilla thinks so too.

Picking Opus is not enough. By default, WebRTC is tuned for voice calls to make the voice sound clearer and louder. So, for a real-time online music jam, we make 3 adjustments:

  • Background noise removal distorts sound when playing music. We switch it off.
  • ~40 kb/s is a standard bitrate for a voice call. Music needs 128 kb/s minimum. Opus supports up to 510 kb/s – so we increase it.
  • We increase the number of audio channels from 1 to 2: from mono to stereo.
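For Opus, the bitrate and channel settings are usually expressed through its SDP format parameters, defined in RFC 7587 (`stereo`, `sprop-stereo`, `maxaveragebitrate`). Below is a sketch that rewrites the Opus `a=fmtp` line of an SDP string before it is applied and sent. The function name and the 128,000 bit/s default are assumptions, and turning off noise suppression is a separate audio-processing setting not covered here.

```kotlin
// Opus "a=rtpmap" line, e.g. "a=rtpmap:111 opus/48000/2"
private val OPUS_RTPMAP = Regex("""^a=rtpmap:(\d+) opus/48000""", RegexOption.IGNORE_CASE)

// Rewrites the Opus "a=fmtp" line so the codec runs in stereo at a higher
// average bitrate. SDPs without Opus, or without an Opus fmtp line, are
// returned unchanged.
fun tuneOpusForMusic(sdp: String, maxAverageBitrate: Int = 128_000): String {
    require(maxAverageBitrate in 6_000..510_000) { "Opus supports 6 to 510 kbit/s" }
    val lines = sdp.split("\r\n")
    val payloadType = lines.firstNotNullOfOrNull { OPUS_RTPMAP.find(it)?.groupValues?.get(1) }
        ?: return sdp

    val overrides = linkedMapOf(
        "stereo" to "1",       // we prefer to receive stereo
        "sprop-stereo" to "1", // we are likely to send stereo
        "maxaveragebitrate" to maxAverageBitrate.toString(),
    )
    val prefix = "a=fmtp:$payloadType "
    return lines.joinToString("\r\n") { line ->
        if (!line.startsWith(prefix)) {
            line
        } else {
            // Parse "key=value;key=value", then replace or append our parameters
            val params = linkedMapOf<String, String>()
            for (pair in line.removePrefix(prefix).split(';')) {
                val kv = pair.trim().split('=', limit = 2)
                if (kv[0].isNotEmpty()) params[kv[0]] = kv.getOrElse(1) { "" }
            }
            params.putAll(overrides)
            prefix + params.entries.joinToString(";") { "${it.key}=${it.value}" }
        }
    }
}
```

Applied to an SDP containing `a=fmtp:111 minptime=10;useinbandfec=1`, the line becomes `a=fmtp:111 minptime=10;useinbandfec=1;stereo=1;sprop-stereo=1;maxaveragebitrate=128000`.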

🚀 Real-time streaming

Jam online with no latency

Subsecond latency is the norm in video chats; otherwise, people couldn’t talk to each other. In video broadcasts to thousands of people, a latency of a few seconds is the norm. When musicians jam online together in a video chat, latency must stay under a second even though thousands are watching. Read how we do it in our article.

Monitoring to prevent latency: Internet connection, sound card, audio output

online rehearsal with no latency

See the sound quality of each musician in real-time: green for good, yellow for decent, and red for unacceptable. Quality parameters: Internet connection, sound card latency, audio output.

For example, if one participant has a slow Internet connection, the video conference won’t be real-time, and low-latency music collaboration is not possible. His Internet indicator shows red, so he knows that he needs to fix the problem.
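The traffic-light logic itself is simple. The sketch below classifies each measured parameter and shows the worst of them. `MusicianStats` is hypothetical, and the thresholds are placeholders chosen only for illustration, not values from our product; they would need tuning against real measurements.

```kotlin
// Ordered from best to worst, so maxOf() returns the worst rating
enum class Quality { GREEN, YELLOW, RED }

// Placeholder thresholds in milliseconds: below `good` is green,
// below `acceptable` is yellow, anything higher is red
data class Thresholds(val good: Int, val acceptable: Int)

fun classify(valueMs: Int, t: Thresholds): Quality = when {
    valueMs < t.good -> Quality.GREEN
    valueMs < t.acceptable -> Quality.YELLOW
    else -> Quality.RED
}

// Hypothetical per-musician measurements
data class MusicianStats(val networkRttMs: Int, val soundCardLatencyMs: Int, val outputLatencyMs: Int)

// The overall indicator shows the worst of the three parameters
fun overallQuality(s: MusicianStats): Quality = maxOf(
    classify(s.networkRttMs, Thresholds(good = 50, acceptable = 150)),
    classify(s.soundCardLatencyMs, Thresholds(good = 10, acceptable = 25)),
    classify(s.outputLatencyMs, Thresholds(good = 10, acceptable = 25)),
)
```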

🎸 Connect professional musical equipment

instrument settings in music software

Output: sound cards and audio interfaces

For sound output, connect a sound card or an audio interface for professional sound. Show a volume bar for that output that displays volume in real-time.

Input: professional microphones, musical instruments, and amplifiers

For sound input, connect a guitar or other electronic instruments directly to the video conference. Or connect the instrument to an amplifier to boost the signal and plug the amplifier into the conference. See the input’s signal level change in real time.

🥁 Professional tools for musicians

musician’s personal sound settings

Crossfader

Set different volumes for different audio channels. Make your own instrument louder than the call, set it to the same volume, or listen to the call louder than your instrument. Mute audio channels.

See the volume set for each instrument in the online jam. The range is from -12 dB to +12 dB in 1 dB steps.
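Under the hood, a decibel setting becomes a linear gain factor through gain = 10^(dB/20), so +12 dB multiplies the signal amplitude by about 4 and -12 dB by about 0.25. A minimal sketch, assuming the -12 to +12 dB range and 1 dB step described above; the function names are hypothetical.

```kotlin
import kotlin.math.pow
import kotlin.math.roundToInt

// Snap a slider value to the allowed range and the 1 dB step
fun snapDb(db: Double): Int = db.roundToInt().coerceIn(-12, 12)

// Convert a decibel setting to a linear amplitude multiplier
fun dbToGain(db: Int): Double = 10.0.pow(db / 20.0)

// Apply a channel's volume to a block of samples, clipping to [-1, 1]
fun applyVolume(samples: FloatArray, db: Int): FloatArray {
    val gain = dbToGain(db).toFloat()
    return FloatArray(samples.size) { i -> (samples[i] * gain).coerceIn(-1f, 1f) }
}
```

Muting a channel is a separate switch rather than a dB value, since the slider's lowest setting (-12 dB) still passes about a quarter of the amplitude.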

Equalizer

Sliders raise and lower the volume of each frequency band: 32 Hz, 64 Hz, 125 Hz, 250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz, 8 kHz, and 16 kHz. Change how everything sounds: the volume, the noise, the effect of moving closer or farther away. Save the settings and apply them to other calls.

Metronome

Add a metronome and choose BPM for better music collaboration.
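The metronome's click interval follows directly from the tempo: 60,000 ms divided by the BPM, so 120 BPM means a click every 500 ms. A minimal sketch; the function names and the 30 to 300 BPM bounds are assumptions.

```kotlin
// Interval between clicks in milliseconds for a tempo in beats per minute
fun beatIntervalMs(bpm: Double): Double {
    require(bpm in 30.0..300.0) { "BPM outside the supported range" }
    return 60_000.0 / bpm
}

// Times (ms from the start) of the first `count` clicks. Each time is computed
// from the start rather than by adding intervals repeatedly, which avoids drift.
fun clickTimesMs(bpm: Double, count: Int): List<Double> {
    val interval = beatIntervalMs(bpm)
    return List(count) { beat -> beat * interval }
}
```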

💬 Communication tools

Talkback

The band coordinator gives feedback to musicians during the live concert. Speak privately to one musician to not disturb the others. Push to talk, release the button to mute yourself.

Text chat

Let the audience talk without interrupting the performance. Send text messages, emojis, images, even documents. See the participant list.

Recording

Record the concert and let those who haven’t seen it live watch it. Record lessons to re-watch.

Use cases when a video conference with music in sync comes in handy

  • A platform for online virtual concert live
  • Online band or choir rehearsal with remote musicians from different locations
  • E-learning for music
  • Virtual karaoke party online

Devices that Fora Soft develops for

music apps and software development
  • Web browsers – use without download
  • Smartphones and tablets – iOS and Android
  • Desktop PCs and laptops
  • Smart TVs – Samsung, LG, Android-based STBs, Apple TV
  • Virtual reality (VR) headsets

💵 How much development of a conference with synchronized music costs

We develop custom applications tailored to your needs. That is why we first plan the project, draw a wireframe, and then estimate. Approximate figures:

The simplest 1-on-1 video chat component adjusted for music 

  • 2-4 weeks 
  • About $7,000
  • Could be useful for teaching music lessons

It is not a fully functioning system with login, payment, etc. – just the video chat component. You can integrate it into your application. 

The simplest video conference component for musicians from different places to perform for audiences of thousands of people

  • 1.5-2.5 months
  • around $28,000

Not a fully functional system with registration, payments, etc. – just the video conference component. You can integrate it into your solution.

Simplest fully functional e-learning system with 1-on-1 video chat adjusted for music 

  • 3-4 months 
  • About $36,000

A fully functioning system with registration, teacher list, payment. Applicable for 1 platform, e.g. web, or iOS, or Android.

The simplest fully functional video conference system with music in sync

  • 4-5 months 
  • around $54,000

It is built from the ground up for one platform, such as web, iOS, or Android. Users register, pay, and play music together for audiences of thousands of people.

Big musical video conferencing solutions 

We assign a dedicated team and work on an ongoing basis. These are products that have proved their success and generated profit.

Send us your requirements – we’ll get back with an approximate estimation. Or let’s have a call to clarify what you need.


WebRTC Security in Plain Language for Business People


Let’s say you are a businessman and you want to develop a video conference or add a video chat to your program. How do you know what the developer has done is safe? What kind of protection can you promise your users? There are a lot of articles, but they are technical – it’s hard to figure out the specifics of security. Let’s explain in simple words.

WebRTC security measures consist of 3 parts: those offered by WebRTC, those provided by the browser, and those programmed by the developer. Let’s discuss the measures of each kind, how they are circumvented (WebRTC security vulnerabilities), and how to protect against them.

What is WebRTC?

WebRTC – Web Real-Time Communications – is an open standard that describes the transmission of streaming audio, video, and content between browsers or other supporting applications in real-time.

WebRTC is an open-source project, so anyone can test its code for security issues.

WebRTC works on all Internet-connected devices:

  • in all major browsers
  • in applications for mobile devices – e.g. iOS, Android
  • on desktop applications for computers – e.g., Windows and Mac
  • on smartwatches
  • on smart TV
  • on virtual reality headsets
WebRTC-supported devices

To make WebRTC work on these different devices, the WebRTC library was created.

What kind of security does WebRTC offer?

Data encryption other than audio and video: DTLS

The WebRTC library incorporates the DTLS protocol. DTLS stands for Datagram Transport Layer Security. It encrypts data in transit, including the keys used to encrypt audio and video. The official DTLS specification is published by the IETF, the Internet Engineering Task Force.

DTLS does not need to be enabled or configured beforehand because it is built in. The video application developer doesn’t need to do anything – DTLS in WebRTC works by default.

DTLS is a variant of the Transport Layer Security (TLS) protocol adapted for datagram transport, and like TLS it relies on asymmetric encryption. Let’s use the example of a paper letter and a parcel to understand what symmetric and asymmetric encryption are.

We exchange letters. A postal worker can open an ordinary letter, or it can be stolen and read. We want nobody but us to be able to read the letters. You come up with a way to encrypt them, for example by swapping letters in the text. For me to decipher your letters, you have to describe how to decipher your cipher and send that description to me. This is symmetric encryption: both of us encrypt letters, and both of us have the decryption method, the key.

The weakness of symmetric encryption is the transmission of the key: the letter carrier can read it too, or the letter with the key can be stolen.

The invention of asymmetric encryption was a major mathematical breakthrough. It uses one key to encrypt and another key to decrypt, and it is practically impossible to derive the decryption key from the encryption key. That’s why the encryption key is called a public key: you can safely give it to anyone, since it can only encrypt a message. The decryption key is called a private key, and it’s not shared with anyone.

Instead of encrypting the letter and sending me the key, you send me an open lock and keep the key. I write you a letter, put it in a box, put my open lock in the same box, and latch your lock on the box. I send it to you, and you open the box with your key, which has not passed to anyone else.
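The lock-and-key idea maps directly onto RSA, one classic asymmetric algorithm. The toy example below uses the textbook parameters p = 61 and q = 53 (so the modulus is 3233), public exponent 17, and private exponent 2753. These numbers are far too small to be secure; they only show that anyone can encrypt with the public key while only the holder of the private key can decrypt. Real systems use keys thousands of bits long, plus padding.

```kotlin
import java.math.BigInteger

// Textbook RSA key pair: insecure toy values, for illustration only
val modulus: BigInteger = BigInteger.valueOf(61L * 53L)     // 3233, part of both keys
val publicExponent: BigInteger = BigInteger.valueOf(17)     // the "open lock"
val privateExponent: BigInteger = BigInteger.valueOf(2753)  // the key only the owner keeps

// Anyone who knows the public key can encrypt: c = m^e mod n
fun encrypt(message: BigInteger): BigInteger = message.modPow(publicExponent, modulus)

// Only the holder of the private exponent can decrypt: m = c^d mod n
fun decrypt(cipher: BigInteger): BigInteger = cipher.modPow(privateExponent, modulus)
```

Encrypting the number 65 gives 2790, and decrypting 2790 gives back 65.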

With symmetric encryption, keys are now disposable. For example, when we make a call, the keys are created specifically for that call and deleted as soon as we hang up. Therefore, once the connection is established and the keys are exchanged, asymmetric and symmetric encryption are equally secure. The only weakness of symmetric encryption is that the decryption key has to be transferred.

But asymmetric encryption is much slower than symmetric encryption. The mathematical algorithms are more complicated, requiring more steps. That’s why asymmetric encryption is used in DTLS only to securely exchange symmetric keys. The data itself is encrypted with symmetric encryption. 
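The same split between a slow asymmetric step and fast symmetric encryption can be shown with standard JVM crypto classes. This sketch uses RSA-OAEP to protect a one-time AES key and AES-GCM for the data itself. It illustrates the general principle only: DTLS in WebRTC negotiates its keys through its own handshake (typically an elliptic-curve Diffie-Hellman exchange), not this exact scheme, and all names here are hypothetical.

```kotlin
import java.security.KeyPair
import java.security.KeyPairGenerator
import java.security.SecureRandom
import javax.crypto.Cipher
import javax.crypto.KeyGenerator
import javax.crypto.spec.GCMParameterSpec
import javax.crypto.spec.SecretKeySpec

// What travels over the network: the protected session key, a nonce, and the data
class Envelope(val wrappedKey: ByteArray, val iv: ByteArray, val ciphertext: ByteArray)

private const val RSA = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding"
private const val AES = "AES/GCM/NoPadding"

fun newRecipientKeyPair(): KeyPair =
    KeyPairGenerator.getInstance("RSA").apply { initialize(2048) }.generateKeyPair()

fun seal(message: ByteArray, recipient: KeyPair): Envelope {
    // Fast symmetric part: a fresh, single-use AES key encrypts the data
    val sessionKey = KeyGenerator.getInstance("AES").apply { init(256) }.generateKey()
    val iv = ByteArray(12).also { SecureRandom().nextBytes(it) }
    val aes = Cipher.getInstance(AES).apply { init(Cipher.ENCRYPT_MODE, sessionKey, GCMParameterSpec(128, iv)) }
    val ciphertext = aes.doFinal(message)
    // Slow asymmetric part: only the small session key is encrypted with the public key
    val rsa = Cipher.getInstance(RSA).apply { init(Cipher.ENCRYPT_MODE, recipient.public) }
    return Envelope(rsa.doFinal(sessionKey.encoded), iv, ciphertext)
}

fun openEnvelope(envelope: Envelope, recipient: KeyPair): ByteArray {
    // The private key recovers the session key...
    val rsa = Cipher.getInstance(RSA).apply { init(Cipher.DECRYPT_MODE, recipient.private) }
    val sessionKey = SecretKeySpec(rsa.doFinal(envelope.wrappedKey), "AES")
    // ...which then decrypts (and authenticates) the data
    val aes = Cipher.getInstance(AES).apply { init(Cipher.DECRYPT_MODE, sessionKey, GCMParameterSpec(128, envelope.iv)) }
    return aes.doFinal(envelope.ciphertext)
}
```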

What data DTLS encrypts in WebRTC: all except video and audio

How to bypass DTLS?

Cracking the DTLS cipher is a complex mathematical problem. It is not considered feasible in a reasonable time without a supercomputer, and probably not with one either. It’s more profitable for hackers to look for other WebRTC security vulnerabilities.

The only way to bypass DTLS is to steal the private key, for example by stealing your laptop or guessing the password to the server.

In the case of video calls through a media server, the server is a separate computer that stores its private key. If you access it, you can eavesdrop and spy on the call. 

It is also possible to access your computer. For example, you have gone out to lunch and left your computer on in your office. An intruder enters your office and downloads a file on your computer that will give him your private key. 

But first of all, it’s like stealing gas: to steal gas, you have to be connected to the gas line. Likewise, the intruder has to have access to the wires that carry your traffic, or be on the same Wi-Fi network, for instance sitting in the same office. But why go through all that trouble? He can simply install a program on your computer that records your screen and sound and sends them to him. You may even download such a malicious program from the Internet yourself by accident if you download unverified programs from unverified sites.

Second, this is not hacking DTLS encryption, but hacking your computer.

How to protect yourself from a DTLS vulnerability?

  • Don’t leave your computer unlocked when you step away.
  • Keep your computer’s password safe. If you own a video application, keep the password to the server where it is installed safe as well. Change your passwords regularly, and don’t reuse a password you use elsewhere.
  • Don’t download untested programs.
  • Don’t download anything from unverified sites.

Audio and video encryption: SRTP

DTLS encrypts everything but the video and audio. DTLS is secure, but because of this, it is slow. Video and audio are “heavy” types of data, so DTLS is not used to encrypt them in real time: it would cause lag. Instead, they are encrypted by SRTP, the Secure Real-time Transport Protocol, which is faster but therefore less secure. SRTP is specified by the Internet Engineering Task Force (IETF) in RFC 3711.

What data SRTP encrypts in WebRTC: video and audio

How to bypass SRTP?

SRTP has 2 security vulnerabilities:

  1. Packet headers are not encrypted

    SRTP encrypts the contents of RTP packets, but not the header. Anyone who sees SRTP packets will be able to tell whether the user is currently speaking: for example, the header can carry an audio-level extension that shows how loud each packet is. The speech itself is not revealed, but this can still be used against the speaker. Law enforcement officials, say, could figure out whether the user was communicating with a criminal.
  2. Cipher keys can be intercepted

    Suppose users A and B are exchanging video and audio. They want to make sure that no one is eavesdropping, so the video and audio must be encrypted: then, if they are intercepted, the intruder will not understand anything. User A encrypts his video and audio. Now no one can understand them, not even B. A needs to give B the key so that B can decrypt the video and audio on his side. But the key can also be intercepted, and that’s the vulnerability of SRTP.
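To see what an eavesdropper gets from the first vulnerability, here is a simplified TypeScript (Node.js) sketch. It builds a 12-byte RTP header in the RFC 3550 layout, encrypts only the payload with AES-128 in counter mode (SRTP’s default cipher family), and then reads the header back without any key. This is not real SRTP: actual SRTP derives the counter from the SSRC and packet index and appends an authentication tag, both omitted here. The function names are illustrative.

```typescript
import { createCipheriv, randomBytes } from "node:crypto";

// Minimal 12-byte RTP header (RFC 3550 layout, no CSRCs, no extensions).
function rtpHeader(seq: number, timestamp: number, ssrc: number, payloadType = 111): Buffer {
  const h = Buffer.alloc(12);
  h[0] = 0x80;               // version 2, no padding, no extension, CSRC count 0
  h[1] = payloadType & 0x7f; // marker bit 0
  h.writeUInt16BE(seq, 2);
  h.writeUInt32BE(timestamp, 4);
  h.writeUInt32BE(ssrc, 8);
  return h;
}

// Simplified "SRTP-like" protection: payload encrypted, header left in the clear.
function protect(key: Buffer, iv: Buffer, header: Buffer, payload: Buffer): Buffer {
  const cipher = createCipheriv("aes-128-ctr", key, iv);
  const encrypted = Buffer.concat([cipher.update(payload), cipher.final()]);
  return Buffer.concat([header, encrypted]);
}

// Everything an eavesdropper can read without the key.
function readHeader(packet: Buffer) {
  return {
    version: packet[0] >> 6,
    payloadType: packet[1] & 0x7f,
    seq: packet.readUInt16BE(2),
    timestamp: packet.readUInt32BE(4),
    ssrc: packet.readUInt32BE(8),
    payloadBytes: packet.length - 12,
  };
}

const key = randomBytes(16);
const iv = randomBytes(16);
const packet = protect(key, iv, rtpHeader(42, 960, 0x1234abcd), Buffer.from("voice frame"));
console.log(readHeader(packet)); // sequence number, timestamp, stream ID and size are all visible
```

A steady stream of packets with growing sequence numbers and timestamps, and their sizes, is exactly the kind of metadata that reveals when someone is talking.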

How to defend against SRTP attacks?

  1. Packet headers are not encrypted

    There is a proposed standard for encrypting packet headers in SRTP. As of October 2021, it is not yet part of SRTP: its status is still “proposed standard”. When it is included in SRTP, its status will change. You can check its current status on its specification page, under the Status heading.
  2. Cipher keys can be intercepted

    There are 2 methods of key exchange:
    1) via SDES – Session Description Protocol Security Descriptions
    2) via DTLS encryption

1) SDES doesn’t support end-to-end encryption. If there is an intermediary between A and B, such as a proxy, the key has to be given to the proxy. The proxy receives the video and audio, decrypts them, encrypts them again and passes them on to B. Key transmission through SDES is therefore not secure: the video and audio can be intercepted at the intermediary at the moment when they are decrypted but not yet encrypted again. On top of that, SDES puts the key itself into the session description, so anyone who can read the signaling messages can read the key.

2) A key, unlike video and audio, is not “heavy” data, so reliable DTLS can protect it quickly, without lag. This method is called DTLS-SRTP: the SRTP keys are produced during the DTLS handshake, so they never travel unprotected. Use this method instead of SDES to protect yourself.
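Because SDES puts the key into the SDP, you can tell from a session description which method a connection uses. A small TypeScript sketch, assuming you have the SDP text (for example, a peer connection’s localDescription.sdp): SDES keys appear in `a=crypto:` lines (RFC 4568), while DTLS-SRTP publishes only a certificate fingerprint in `a=fingerprint:` lines. The helper name and the sample SDP fragments are hypothetical.

```typescript
// Hypothetical helper: report how an SDP offer or answer negotiates its SRTP keys.
type KeyExchange = "sdes" | "dtls-srtp" | "none";

function srtpKeyExchange(sdp: string): KeyExchange {
  const lines = sdp.split(/\r?\n/);
  // SDES: the SRTP master key itself sits in "a=crypto:" lines, readable by anyone
  // who can read the signaling. Report it first if present.
  if (lines.some((line) => line.startsWith("a=crypto:"))) return "sdes";
  // DTLS-SRTP: only a certificate fingerprint is published; keys come from the handshake.
  if (lines.some((line) => line.startsWith("a=fingerprint:"))) return "dtls-srtp";
  return "none";
}

// Example SDP fragments; the key material is a placeholder, not a real key.
const sdesOffer = "v=0\r\nm=audio 9 RTP/SAVPF 111\r\na=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:PLACEHOLDER\r\n";
const dtlsOffer = "v=0\r\nm=audio 9 UDP/TLS/RTP/SAVPF 111\r\na=fingerprint:sha-256 AB:CD:EF\r\na=setup:actpass\r\n";
console.log(srtpKeyExchange(sdesOffer)); // sdes
console.log(srtpKeyExchange(dtlsOffer)); // dtls-srtp
```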

IP Address Protection – IP Location Privacy

The IP address is the address of a computer on the Internet.

What an IP address looks like

What is the danger if an intruder finds out your IP address?

Think of an IP address as your home address. A thief who steals your passport finds out where you live and can come to break into your home.

Once hackers know your IP address, they can start looking for vulnerabilities in your computer: for example, run a port scan to find out which programs on it accept connections from the Internet.

Suppose one of those programs is a messenger, and there is information online that this messenger has a vulnerability that lets an attacker into your computer. A hacker can then do the same as in the case above, where you downloaded an unverified program that recorded your screen and sound and sent them to him. Only this time you didn’t install anything yourself: you were careful. The hacker installed the program on your computer through the messenger’s vulnerability. The messenger is just an example; any vulnerable program on your computer can be used.

The other danger is that a hacker can use your IP address to find out where you are physically, usually approximately, down to the city or the Internet provider. It’s the same reason negotiators in movies keep a terrorist talking: to get a fix on his location.

How do I protect my IP address from intruders?

It’s impossible to be completely protected from this. But there are two ways to reduce the risks:

  • Postpone the IP address exchange until the user picks up the phone. If you don’t take the call, the other party won’t learn your address; if you do pick up, they will. This is done by holding back the ICE exchange in your JavaScript code until the user accepts the call.

    ICE (Interactive Connectivity Establishment) describes the protocols and routes WebRTC needs to connect to the remote device. Read more about ICE in our article WebRTC in plain language.

    The downside:
    Remember, social networks and Skype show you who’s online and who’s not? You can’t do that.
  • Don’t use p2p communication; use an intermediary server instead. In this case, the other party will only know the IP address of the intermediary, not yours.

    The disadvantage:
    All traffic will go through the intermediary. This creates other security problems like the one above about SDES.

    If the intermediary is a media server and it’s installed on your server, it’s as secure as your server because it’s under your control. For measures to protect your server, see the SOP section below.
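A minimal TypeScript sketch of the first approach, assuming trickle ICE (candidates, which contain IP addresses, are sent one by one through your signaling channel) and assuming the callee creates its answer only after pressing “accept”. The class is hypothetical: you would feed it candidates from RTCPeerConnection’s icecandidate event and pass your own send function. For the second approach, the standard RTCConfiguration option iceTransportPolicy: "relay" makes the browser use only TURN relay candidates, so the other side sees the relay server’s address instead of yours.

```typescript
// Hypothetical helper: hold back local ICE candidates until the user accepts the call.
type GateState = "ringing" | "accepted" | "rejected";

class CandidateGate<C> {
  private state: GateState = "ringing";
  private queue: C[] = [];
  private readonly send: (candidate: C) => void;

  constructor(send: (candidate: C) => void) {
    this.send = send;
  }

  // Feed every locally gathered candidate here.
  onLocalCandidate(candidate: C): void {
    if (this.state === "accepted") this.send(candidate);
    else if (this.state === "ringing") this.queue.push(candidate);
    // "rejected": drop it, so the caller never learns this address.
  }

  // The user pressed "accept": release everything gathered so far.
  accept(): void {
    if (this.state !== "ringing") return;
    this.state = "accepted";
    for (const c of this.queue) this.send(c);
    this.queue = [];
  }

  // The user declined: discard queued candidates without sending them.
  reject(): void {
    if (this.state !== "ringing") return;
    this.state = "rejected";
    this.queue = [];
  }
}

const sent: string[] = [];
const gate = new CandidateGate<string>((c) => { sent.push(c); });
gate.onLocalCandidate("candidate-1");
console.log(sent.length); // 0: nothing leaves while the phone is ringing
gate.accept();
console.log(sent);        // [ 'candidate-1' ]
```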

What security methods do browsers offer?

These methods apply only to web applications running in a browser. For example, they don’t apply to native mobile applications built on WebRTC.

SOP – Same Origin Policy

When you open a website, the pages and scripts needed to run it are downloaded to your computer. A script is a program that runs inside the browser. Every page is loaded from somewhere, and the protocol, domain and port it is loaded from form its origin, for example https://example.com. One site may combine content from different origins, for instance a widget shown in a frame from another company’s server. SOP means that content from different origins cannot access each other’s data.

For example, you have a video chat site. It has your own code, stored on your server. And it has third-party components: for example, a widget that checks whether the contact form is filled out correctly, shown in a frame from its developer’s server. Your developer used it so he didn’t have to write it from scratch. You have no control over the third-party component. Someone could hack it (gain access to the server where it is stored) and make it, for example, request access to the camera and microphone of users on all sites where it is used. Attacks that slip malicious scripts into sites users trust are called XSS: cross-site scripting.

If there were no SOP, the third-party script would simply gain access to your users’ cameras and microphones. Their conversations could be viewed and listened to or recorded by an intruder.

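But the SOP is there. The third-party script isn’t on your server; it’s at another origin. Therefore, it doesn’t have access to your site’s data, and it can’t use the camera and microphone access your user gave to your site. This only holds while the third-party code runs in its own origin, for example inside a frame. A third-party script added directly to your page with a script tag runs with your page’s origin, and SOP does not separate it from your own code.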

But it can show the user its own request for access to the camera and microphone. The user will see the “Grant access to camera and microphone?” prompt again, even though he has already granted access. This will look strange, but the user may agree, thinking that he’s giving access to your site. Then the attacker would still be able to watch and listen to his conversations. SOP’s protection here is that without it, access would not even be requested again.

Access to the camera and microphone is just the most obvious example. The same goes for screen sharing, for example.
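The origin comparison SOP relies on can be reproduced with the standard URL class, available in browsers and in Node.js: two addresses share an origin only when protocol, host and port all match. A minimal TypeScript sketch; the helper name and the example.com addresses are placeholders.

```typescript
// Two URLs share an origin only if protocol, host and port are all the same.
function sameOrigin(a: string, b: string): boolean {
  return new URL(a).origin === new URL(b).origin;
}

const page = "https://chat.example.com/room/42";
console.log(sameOrigin(page, "https://chat.example.com/api/messages")); // true: only the path differs
console.log(sameOrigin(page, "http://chat.example.com/room/42"));      // false: protocol differs
console.log(sameOrigin(page, "https://widgets.example.com/form.js"));   // false: host differs
console.log(sameOrigin(page, "https://chat.example.com:8443/"));        // false: port differs
```

Note that the default port is normalized away, so https://chat.example.com:443 is the same origin as https://chat.example.com.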

It’s even worse with text chat. An attacker could send a malicious script to the chat room as a message. Scripts aren’t displayed in chat: the user would see a blank message. But if the chat inserts messages into the page as markup, the script would be executed as part of your site, with your site’s origin, and the attacker could watch, listen to and record conversations. SOP cannot stop this, because the script is now “inside” your site. What stops it is filtering user messages for scripts.
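The usual defense against a script hidden in a chat message is to treat every message as plain text. A minimal TypeScript sketch of HTML escaping (the function name is illustrative), which turns `<script>` into harmless visible characters instead of markup. In the browser, assigning a message to an element’s textContent instead of innerHTML achieves the same. This covers text placed inside HTML elements and quoted attributes, not other contexts such as URLs or inline JavaScript.

```typescript
// Replace the characters that let text turn into markup. "&" must go first.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(escapeHtml('<script>alert("x")</script>'));
// &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```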

How to bypass SOP and how to protect yourself

There are 3 ways to get around SOP: errors in CORS, replacing the server your site connects to via WebSocket, and hacking the server.
  1. Errors in CORS – Cross-Origin Resource Sharing

    Complex web applications cannot work comfortably under strict SOP. Even components of the same website can be stored on different servers, that is, in different origins, and they still need to exchange data.

    This is why developers are given the ability to add exceptions to the SOP: Cross-Origin Resource Sharing (CORS). The developer lists the allowed origins, and the server names the allowed origin in the Access-Control-Allow-Origin header of its responses, or puts “*” there to allow all.
    During development, there are often several versions of the site: production (available to real users), pre-production (available to the site owner for final testing before release), test (for the testers) and the developer’s own version. Each version has its own URL, so the programmer has to change the SOP exception URLs each time he moves the code from one version to another. There is a temptation to put “*” to save time. He can forget to replace the “*” in the production version, and then the SOP for your site will not work: it will become vulnerable to any third-party scripts.

    How to protect against errors in CORS

    For developers: check for XSS vulnerabilities, and list specific SOP exceptions instead of “disabling” SOP with “*”.

    For users: revoke camera and microphone access that is no longer needed. The browser stores a list of permissions; to revoke one, uncheck its box.
  2. Replacing the server your site connects to via WebSocket

    What is WebSocket?

    Remember CORS, the SOP exception that you have to set manually? There is another exception that is always in effect by default. This is WebSocket.

    Why use such an insecure technology, you ask? For real-time communication. Ordinary requests, which SOP covers, don’t allow real-time communication because they are one-way: the client asks and the server answers.

    Imagine you’re driving a car with a child in the back seat. You are the server side; the child is the client side. The child periodically asks: are we there yet? You answer “no.” With ordinary requests, when you finally arrive, you can’t tell the child “we’ve arrived” on your own: you have to wait for the child to ask. WebSocket lets you say “we’ve arrived” yourself, without waiting for the question.

    Examples from programming: video and text chats. Without WebSocket, the client side would have to ask periodically, “do I have incoming calls?”, “do I have messages?” Even asking once every 5 seconds already means a delay of up to 5 seconds. You can ask more often, say once a second, but then the load on the server increases, and the server has to be significantly more powerful, that is, more expensive. This is inefficient, and this is why WebSocket was invented.

    What is the vulnerability of WebSocket

    WebSocket is a direct connection to the server. But which one? Well, normally yours. But what if the intruder replaces your server address with his own? Yes, his server address would not be at your origin. But the connection is through WebSocket, so the SOP won’t check it and won’t protect it.

    What can happen because of this substitution? On the client-side, your text or video chat will receive a new message or an incoming call. It will appear to be one person writing or calling, but in fact, it will be an intruder. You may receive a message from your boss, such as “urgently send… my Gmail account password, the monthly earnings report” – whatever. You might get a call from an intruder pretending to be your boss, asking you to do something. If the voices are similar, you won’t even think that it might not be him – because the call is displayed as if it was from him.

    How this can be done is a creative question: the attacker has to look for vulnerabilities in the site. An example is XSS. You have a site with a video chat and a contact form whose messages are displayed in the site’s admin panel. A hacker sends the script “replace the server address with this one” through the contact form. The script appears in the admin panel along with all the other messages from the contact form. Now it’s “inside” your site: it has the same origin, so SOP will not stop it. The script runs and replaces the server address with the attacker’s.

    How to protect against replacement of the server your site connects to via WebSocket
  • Filter all data that users send for scripts

    If the developer had programmed the site not to accept scripts from users, the message from the contact form in the example above would have been rejected, and the intruder could not have replaced your server with his own this way. Always filter user messages for scripts: this protects against server replacement in WebSocket and against many other problems.
  • Program a check that the connection through WebSocket is made to the correct origin

    For example, generate a unique codeword for each WebSocket connection. The codeword is not sent over the WebSocket but through an ordinary request, so SOP covers it: a script from a third-party origin cannot read the codeword, because SOP blocks access to the response.
  • Code obfuscation

    To obfuscate code is to make it incomprehensible while keeping it working. Programmers write code clearly (at least they should 🙂) so that if another developer takes over the code, he can figure out which part does what. For example, programmers give variables clear names. The address of the server to connect to via WebSocket is also a variable and will be named clearly, e.g. “server address for WebSocket connection”. After obfuscation, this variable might be called, for example, “C”. An intruder will not understand which variable is responsible for what.

    The mechanism of codeword generation is stored in the code. Cracking it takes extra effort, but it is possible. If you make the code unreadable, it will be much harder for the intruder to find this mechanism.
  3. Server hacking

    If your server gets hacked, a malicious script can be placed on your server. SOP will not help: your server is now the origin of this script. The script will be able to use the camera and microphone access that the user has already given to your site. It doesn’t even need to send the recording to a third-party server: the attacker has access to your server and can simply take the recording from there.

    How a server can be hacked is not among WebRTC security issues, so it’s beyond the scope of this article. For example, an attacker could simply steal your server username and password.
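The developer-side protections for CORS and WebSocket above both come down to comparing against an explicit allowlist. A TypeScript sketch of two hypothetical helpers: one computes the Access-Control-Allow-Origin value on the server and never falls back to “*”; the other refuses to open a WebSocket unless the address is encrypted (wss:) and points at a known signaling host. All URLs and hostnames are placeholders. The client check helps when an attacker manages to change data, such as a server address stored in settings; code already running on your page could bypass it, which is why filtering user input still matters.

```typescript
// Hypothetical per-environment allowlists; every URL here is a placeholder.
const CORS_ALLOWED: Record<string, string[]> = {
  production: ["https://video.example.com"],
  staging: ["https://staging.video.example.com"],
  development: ["http://localhost:3000"],
};
const TRUSTED_SOCKET_HOSTS = new Set(["signal.example.com"]);

// Server side (CORS): value for the Access-Control-Allow-Origin header,
// or null to send no CORS header at all. Never falls back to "*".
function corsAllowOrigin(env: string, requestOrigin: string | undefined): string | null {
  if (!requestOrigin) return null;
  const allowed = CORS_ALLOWED[env] ?? [];
  return allowed.includes(requestOrigin) ? requestOrigin : null;
}

function parseUrl(url: string): URL | null {
  try {
    return new URL(url);
  } catch {
    return null;
  }
}

// Client side (WebSocket): refuse to connect unless the URL is encrypted (wss:)
// and points at a trusted signaling host. Returns the normalized URL.
function checkSocketUrl(url: string): string {
  const parsed = parseUrl(url);
  if (!parsed) throw new Error(`Malformed WebSocket URL: ${url}`);
  if (parsed.protocol !== "wss:") throw new Error(`Refusing unencrypted WebSocket: ${url}`);
  if (!TRUSTED_SOCKET_HOSTS.has(parsed.hostname)) {
    throw new Error(`Untrusted WebSocket host: ${parsed.hostname}`);
  }
  return parsed.href;
}

console.log(corsAllowOrigin("production", "https://video.example.com")); // https://video.example.com
console.log(corsAllowOrigin("production", "http://localhost:3000"));     // null
// In the browser: const socket = new WebSocket(checkSocketUrl(configuredUrl));
```

When the header value depends on the request’s Origin, the server should also send Vary: Origin so caches don’t reuse one origin’s answer for another. On the WebSocket server, the matching check is to reject handshakes whose Origin header is not on the list.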

How to protect yourself from a server hack

The most obvious thing is to protect the username and password.

If your server is hacked, you can’t protect yourself from the consequences. But there are ways to make life difficult for the attacker.

  1. Store all user content, such as video conference recordings, in encrypted form on the server. The server itself has to be able to decrypt it, so it stores the decryption key. If the server is hacked, the attacker can find the key, but that takes time. He won’t be able to just drop by the server, copy the conversations and leave: he will have to spend more time on the compromised server. This gives the server owner time to react, for example, find the intruder’s active session, revoke his administrator rights and change the server password.
  2. Ideally, don’t store user content on the server at all. For example, allow recording conferences, but instead of saving recordings on the server, let the user download the file. Once the file is downloaded, only the user has it.
  3. Give users more ways to protect themselves: add notifications to your program’s interface. We don’t recommend this method for everyone because it’s inconvenient for the user. But if you are developing video calls for a bank or a medical institution, security is more important than convenience:
    1) Ask for access to the camera and microphone before each call.

    If your site gets hacked and they want to call someone on behalf of the user without their permission, the user will get a notification: “Do you want the camera and microphone access for the call?” He didn’t initiate that call, so it’s likely to keep the user safe: he’ll click “no.” It’s safe, but it’s inconvenient. What percentage of users will go to a competitor instead of clicking “allow” before every call?
    2) Ask for access to the camera and microphone to call specific users.

    The first time you call a user, you see a notification: “Allow camera and microphone access for calls to Chris Baker?” (for example). This is less inconvenient for users who often call the same people, but it still loses in convenience to programs that ask for access only once.
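The trade-off between these options can be sketched as a small TypeScript class that decides when your app shows its own confirmation dialog. The class and policy names are hypothetical; the browser’s own permission prompt works independently of this app-level dialog.

```typescript
// Hypothetical policies: ask once, before every call, or once per contact.
type ConsentPolicy = "once" | "per-call" | "per-contact";

class MediaConsent {
  private readonly policy: ConsentPolicy;
  private grantedOnce = false;
  private readonly grantedContacts = new Set<string>();

  constructor(policy: ConsentPolicy) {
    this.policy = policy;
  }

  // Should the app show its own "Allow camera and microphone for this call?" dialog?
  mustAsk(contactId: string): boolean {
    if (this.policy === "per-call") return true;
    if (this.policy === "per-contact") return !this.grantedContacts.has(contactId);
    return !this.grantedOnce; // "once"
  }

  // The user clicked "allow" for a call with this contact.
  recordGrant(contactId: string): void {
    this.grantedOnce = true;
    this.grantedContacts.add(contactId);
  }
}

const consent = new MediaConsent("per-contact");
consent.recordGrant("chris-baker");
console.log(consent.mustAsk("chris-baker")); // false: already allowed for this contact
console.log(consent.mustAsk("new-contact")); // true: first call to this person
```

A call started by a hijacked site to someone the user never approved would then trigger a dialog the user did not expect, which is exactly the signal these options aim for.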

Use a known browser from a trusted source

What is it?

The program you use to visit websites. Video conferencing works in the browser. When you use it, you assume the browser is secure.

How do attackers use the browser?

By injecting malicious code that does what the hacker wants.

How to protect yourself?

  • Don’t download browsers from untrusted sources.
    Download browsers only from their developers’ official sites.
  • Don’t use unknown browsers.
    Treat them like unknown links: if a browser looks suspicious, don’t download it.
    You can give the users of your web application a list of safe browsers. Although, if they are on your site, they already use some browser… 🙂

What security measures should the developer think about?

WebRTC was built with security in mind. But not everything depends on WebRTC, because it’s only the part of your program that handles calls. If someone steals a user’s password, WebRTC won’t protect the account, no matter how secure the technology is. Let’s break down how to make your application more secure.

Signaling Layer

The signaling layer is responsible for exchanging the data needed to establish a connection. The developer programs how the connection is established, and this happens before WebRTC and all its encryption come into play. Simply put: when you’re on a video call site and a pop-up appears (“Call for you, accept/reject?”), everything before you hit “accept” is the signaling layer establishing the connection.

How can attackers use the signaling layer, and how can you protect yourself?

There are many ways. Let’s look at the 3 main ones: the Man-in-the-Middle attack, the replay attack and session hijacking.

Attack on signalling layer: 2 people in process of establishing a connection, the intruder connects in the middle
  • MitM (Man-in-the-Middle) attack

In the context of WebRTC, this is the interception of traffic before the connection is established, that is, before the DTLS and SRTP encryption described above comes into effect. The attacker sits between the callers. He can eavesdrop on conversations or, for example, send a pornographic picture to your conference: this is called zoombombing.

This can be any intruder connected to the same Wi-Fi or wired network as you: he can see all the traffic going through your Wi-Fi network or your wire.

How to protect yourself?

Use HTTPS instead of HTTP. HTTPS encrypts the whole session with SSL/TLS; for WebSocket signaling, the equivalent is wss:// instead of ws://. A man-in-the-middle will still be able to intercept your traffic, but it will be encrypted and he won’t understand it. He can save it and try to decrypt it, but he won’t understand it right away.

SSL (Secure Sockets Layer) is the predecessor of TLS. It turns HTTP into HTTPS, securing the site. Users used to visit HTTP and HTTPS sites without noticing the difference. Now HTTPS is the mandatory standard: developers have to protect their sites with SSL/TLS certificates. Otherwise, browsers warn users away from the site with the dreaded “your connection is not secure” message, and only by clicking “more” can the user choose “go to the site anyway”. Not all users will click “go anyway”, which is why developers now add certificates to their sites.

  • Replay attack

You have protected yourself from the man-in-the-middle with HTTPS. Now the attacker sees your messages but does not understand them. But he sees them! And so he can repeat (replay) them. For example, you gave the command “transfer 100 dollars”. The attacker, though he does not understand it, repeats “transfer 100 dollars”, and without additional protection the command will be executed. 100 dollars will be debited from your account twice, and the second 100 dollars will go to the same recipient as the first.

How to protect yourself?

Add a random key that is valid for one session and cannot be used twice: “Transfer $100. ABC”. If an intruder repeats “Transfer $100. ABC”, it becomes clear that the message is a repeat, and it is not executed. This is exactly what we did in the NextHuddle project, a video conferencing service for educational events designed for an audience of 5,000 users and 25 streamers.
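A minimal TypeScript (Node.js) sketch of the idea, using a random one-time value (a nonce) attached to each command rather than one key per session; it is not the NextHuddle implementation, and the names are hypothetical. It assumes commands travel over HTTPS, so the attacker cannot change the nonce, only repeat the whole message.

```typescript
import { randomUUID } from "node:crypto";

interface Command {
  nonce: string;  // random, used exactly once
  action: string; // e.g. "transfer 100 dollars"
}

// Client side: every command gets a fresh random nonce.
function makeCommand(action: string): Command {
  return { nonce: randomUUID(), action };
}

// Server side: a nonce that has been seen before means the message is a replay.
class ReplayGuard {
  private readonly seen = new Set<string>();

  accept(cmd: Command): boolean {
    if (this.seen.has(cmd.nonce)) return false;
    this.seen.add(cmd.nonce);
    return true;
  }
}

const guard = new ReplayGuard();
const transfer = makeCommand("transfer 100 dollars");
console.log(guard.accept(transfer)); // true: executed
console.log(guard.accept(transfer)); // false: the replayed copy is rejected
```

In production, the set of seen nonces should expire, for example by also attaching a timestamp and rejecting old messages, so it does not grow forever.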

  • Session hijacking

Session hijacking is when a hacker takes over your Internet session. For example, you call the bank. You say who you are, your date of birth, or a secret word. “Okay, we recognize you. What do you want?” – and then the intruder takes the phone receiver from you and tells them what he wants.

How do you protect yourself?

Use HTTPS. To hijack a session, the attacker has to be a man-in-the-middle, so what protects against man-in-the-middle attacks also protects against session hijacking.

Selecting the DTLS Encryption Key Length

DTLS is an encryption protocol. It uses encryption algorithms such as AES, which comes with a 128-bit key or a longer, more secure 256-bit key. Depending on the platform, the developer can influence which one is used. Make sure that the key length selected for AES is the one that gives the highest security: 256 bits.

AES-256 encryption compared to AES-128 as a bigger lock against a smaller one

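You can read about the related browser API in the Mozilla documentation, for example: a certificate is generated with RTCPeerConnection.generateCertificate() and passed in the certificates option when you create a peer connection. This certificate is used in the DTLS handshake, and its algorithm (for example, ECDSA or RSA) is chosen when it is generated.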

Authentication and member tracking

The task of the developer is to make sure that everyone who enters the video conference room is authorized to do so.

Example 1 – private rooms: for example, a paid video lesson with a teacher. The developer should program a check: has the user paid for the lesson? If he has paid, let him in, and if he hasn’t, don’t let him in.

This seems obvious, but we have encountered many cases where you could copy the URL of such a paid conference, send it to anyone, and that person could join the conference without paying for the lesson.
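One way to stop a forwarded link from working is to bind access to a specific user. A TypeScript sketch for Node.js: after your server has confirmed the payment (not shown), it issues a ticket signed with HMAC-SHA256, and the conference server checks the signature, the user, the room and the expiry time before letting anyone in. The secret, the function names and the ticket format are hypothetical, and user and room IDs are assumed not to contain a colon.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical server-side secret; load it from configuration, never ship it to clients.
const TICKET_SECRET = "replace-with-a-long-random-secret";

function sign(payload: string): string {
  return createHmac("sha256", TICKET_SECRET).update(payload).digest("hex");
}

// Issued only after the server has confirmed that this user paid for this room.
function issueTicket(userId: string, roomId: string, expiresAtMs: number): string {
  const payload = `${userId}:${roomId}:${expiresAtMs}`;
  return `${payload}:${sign(payload)}`;
}

// Checked by the conference server before letting anyone in.
function canJoin(ticket: string, userId: string, roomId: string, nowMs: number): boolean {
  const parts = ticket.split(":");
  if (parts.length !== 4) return false;
  const [u, r, exp, sig] = parts as [string, string, string, string];
  // A forwarded link fails here: the ticket names a different user.
  if (u !== userId || r !== roomId) return false;
  const expiresAt = Number(exp);
  if (!Number.isFinite(expiresAt) || nowMs > expiresAt) return false;
  const given = Buffer.from(sig, "hex");
  const expected = Buffer.from(sign(`${u}:${r}:${exp}`), "hex");
  // Constant-time comparison; timingSafeEqual requires equal lengths.
  return given.length === expected.length && timingSafeEqual(given, expected);
}

const ticket = issueTicket("alice", "lesson-42", Date.now() + 60 * 60 * 1000);
console.log(canJoin(ticket, "alice", "lesson-42", Date.now())); // true
console.log(canJoin(ticket, "bob", "lesson-42", Date.now()));   // false: link was forwarded
```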

Example 2 – open rooms: for example, business video conferences of the “join without registration” type. This is done for convenience, when you don’t want a business partner to waste time registering. You just send him a link, he follows it and joins the conference.

If there are only a few participants, the owner will notice if someone extra joins. But if there are many, the owner won’t. One solution is for the developer to let the conference owner approve new participants manually.
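Manual approval can be modeled as a waiting room: people who open the link are parked until the owner admits them, and the signaling server connects only admitted guests. A TypeScript sketch; the class and method names are hypothetical.

```typescript
type GuestState = "waiting" | "admitted" | "denied";

class WaitingRoom {
  private readonly guests = new Map<string, GuestState>();

  // Someone opened the invite link: put them on the waiting list, not into the call.
  knock(guestId: string): GuestState {
    const existing = this.guests.get(guestId);
    if (existing) return existing; // a denied guest stays denied
    this.guests.set(guestId, "waiting");
    return "waiting";
  }

  // The owner clicks "admit" or "deny"; only waiting guests can be decided on.
  decide(guestId: string, admit: boolean): void {
    if (this.guests.get(guestId) === "waiting") {
      this.guests.set(guestId, admit ? "admitted" : "denied");
    }
  }

  // Guests the owner still has to look at.
  pending(): string[] {
    return Array.from(this.guests.entries())
      .filter(([, state]) => state === "waiting")
      .map(([id]) => id);
  }

  // The signaling code should call this before connecting anyone to the call.
  mayJoin(guestId: string): boolean {
    return this.guests.get(guestId) === "admitted";
  }
}

const room = new WaitingRoom();
room.knock("partner");
room.knock("stranger");
room.decide("partner", true);
room.decide("stranger", false);
console.log(room.mayJoin("partner"), room.mayJoin("stranger")); // true false
```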

Example 3 – helping the user to protect his login and password. If an intruder gets hold of a user’s login and password, he will be able to log in with it.

Program login through third-party services: for example, social networks, Google login, or Apple login on mobile devices. You can also skip the password altogether and send a login code to the user’s email or phone. This reduces the number of passwords a user has to keep. A thief would then have to steal not a password from your program but the password to a third-party service, such as a social network, mobile account, or email.

You can use two methods at once: for example, the username and password from your program plus a confirmation code sent to the user’s phone. Then, to hack the account, an attacker would need to steal two passwords instead of one.
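One widely used way to produce such confirmation codes is TOTP (RFC 6238), the algorithm behind authenticator apps; codes sent by SMS work differently. A TypeScript sketch for Node.js: HOTP (RFC 4226) turns a shared secret and a counter into a short code, and TOTP uses the number of 30-second intervals since the Unix epoch as that counter. The function names are illustrative.

```typescript
import { createHmac } from "node:crypto";

// HOTP (RFC 4226): HMAC-SHA1 over an 8-byte big-endian counter, then "dynamic truncation".
function hotp(secret: Buffer, counter: number, digits = 6): string {
  const msg = Buffer.alloc(8);
  msg.writeUInt32BE(Math.floor(counter / 2 ** 32), 0); // high 32 bits
  msg.writeUInt32BE(counter >>> 0, 4);                  // low 32 bits
  const mac = createHmac("sha1", secret).update(msg).digest();
  const offset = mac[mac.length - 1] & 0x0f;
  const binary = mac.readUInt32BE(offset) & 0x7fffffff;
  return String(binary % 10 ** digits).padStart(digits, "0");
}

// TOTP (RFC 6238): the counter is the number of `step`-second intervals since the epoch.
function totp(secret: Buffer, unixSeconds: number, digits = 6, step = 30): string {
  return hotp(secret, Math.floor(unixSeconds / step), digits);
}

// Server side: accept the current code, allowing one interval of clock drift either way.
function verifyTotp(secret: Buffer, code: string, unixSeconds: number, step = 30): boolean {
  for (const drift of [-1, 0, 1]) {
    if (totp(secret, unixSeconds + drift * step, code.length, step) === code) return true;
  }
  return false;
}

// RFC 4226 test secret; real secrets are random and unique per user.
const secret = Buffer.from("12345678901234567890", "ascii");
console.log(totp(secret, 59)); // 287082
```

A real verifier should also refuse a code that has already been used and compare codes in constant time.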

Phone in a hand with a verification code input screen to protect login

Not every user will want such a long, complicated login just to make a call. Give them a choice: one login method or two. Those who care about security will choose two, and be grateful.

Access Settings

Let’s be honest – we don’t always read the access settings dialogs. If the user is used to clicking OK, the application may get permissions he didn’t want to give. 

At the other extreme, the user may delete the app if they don’t immediately understand why they’re being asked for access.

Good and bad example of permission request in apps

The solution is simple – show care. Write clearly what permissions the user gives and why.

For example, in mobile applications: before showing the standard pop-up requesting access to geolocation, show an explanation like “People in our chat room call others nearby. Allow geolocation access so we can show you the people near you.”

Screen sharing

Any app with a screen sharing feature should warn users about exactly what they are showing.

For example, before a screen sharing session, when the user selects the area of the screen to show, display a reminder so that the user doesn’t accidentally show a part of the screen with data they don’t want to share: “What do you want to share?” with options such as “entire screen” or “one application only (for example, just the browser)”.

Choosing which app to share during screen sharing

If you gave a site permission to share your screen and the site gets hacked, the hacker can send you a script that opens a web page in your browser while you’re sharing your screen. For example, he knows how links to social network conversations are formed, and he builds a link to your conversation with a particular person. He isn’t logged in to your social network account, so if he follows that link himself, he won’t see your messages. But if he has hacked a site that you’ve allowed to share your screen, the next time you share your screen there, he’ll run a script that opens that conversation in your browser. You will rush to close it, but too late: the screen share has already shown it to the intruder. The protection against this is the same as against server hacking: keep your passwords safe. But that is hard to guarantee. And an even easier attack than hacking the site is to send the user a fake link that requests screen sharing.

Where to read more about WebRTC security

There are many articles on the internet about security in WebRTC. There are 2 problems with them:

  1. They merely express someone’s subjective opinion. Our article is no exception. The opinion may be wrong.
  2. Most articles are technical: it might be difficult for somebody who’s not a programmer to understand.

How to solve these problems?

  1. Use the scientific method: read primary sources, that is, publications vetted by a recognized authority. In science, these are publications in Higher Attestation Commission (HAC) journals: before a paper is published there, it must be approved by another scientist recognized by the HAC. In IT, these are the W3C (World Wide Web Consortium) and the IETF (Internet Engineering Task Force): their documents are reviewed by technical experts from Google, Mozilla and similar companies before publication.
    WebRTC security considerations from the W3C specification – in brief
    WebRTC security considerations from the IETF – details on threats, a bit about protecting against them
    IETF’s WebRTC security architecture – more on WebRTC threat protection
  2. The documentation above is correct but written in such technical language that a non-technical person can’t figure it out. Most of the articles on the internet are the same way. That’s why we wrote this one. After reading it:
    – The basics will become clear to you (hopefully). Maybe this will be enough to make a decision.
    – If not, the primary sources will be easier for you to understand. Cooperate with your programmer – or reach out to us for advice.

Conclusion

Security of a WebRTC application: WebRTC alone = 1 shield of 3, WebRTC + good developer = 3 shields of 3

WebRTC itself is secure. But if the developer of a WebRTC-based application doesn’t take care of security, his users will not be safe.

For example, in WebRTC all data except video and audio is encrypted by DTLS, while audio and video are encrypted by SRTP. But many WebRTC security settings are chosen by the developer of the video application: for example, whether SRTP keys are exchanged through highly secure DTLS or not.

Furthermore, WebRTC is only a way to transmit data when the connection is already established. What happens to users before the connection is established is entirely up to the developer: as he programs it, so it will be. What SOP exceptions to set, how to let users in a conference, whether to use HTTPS – all this is up to the developer.

Write to us, we’ll check your video application for security. 

Check out our Instagram – we post projects there, most of which were made on WebRTC.


How to Protect Your Online Business from Internet Shutdown? [2021]


On October 4, 2021, a large part of the internet went down. Facebook, Instagram, and WhatsApp were unreachable for several hours, which in turn caused problems with other services. People called it the Monday Blackout.

Building on ready-made PaaS (Platform-as-a-Service: third-party tools and platforms you can combine into your own product) is awesome. It may cut your time-to-market in half and probably saves quite a few bucks while you rush to a quick release. But sometimes it hurts, just like it did on Monday, when half the web went out.

Let’s find out what happened, and throw in a couple of battle-proven tips on how to avoid downtime even when things hit you that hard.

What has happened?

In short, DNS, the internet’s address book, partially failed. As a metaphor, imagine half of all the world’s streets going missing from all the maps and navigation apps. At the very moment when you’re on vacation. On your way to the airport. Running out of time. In a country where nobody speaks your language. Ouch.

To give this ouch a more techy flavor, let’s add some detail. The core of the problem was that a considerable number of domain names failed to resolve. This means that when your browser asked how to reach, say, Facebook, it got no answer. The same went for all the third-, fourth- and twentieth-level subdomains used for third-party apps, all the authentication services, all the ad trackers, and zillions of other services of every kind.

How to minimize the risks of going down for your online project?

1. Don’t use third-party solutions to provide critical features

The Monday Blackout is a good reason never to rely on third-party solutions for critical features, and to be skeptical about using them as the single source of any feature. Using one global authentication provider like Google IS convenient, but any downtime on their side would totally ruin the experience for your customers. And even when it is up and running, there is always a chance that politics kicks in: Middle Eastern or Asian governments sanctioning global tech corporations is, unfortunately, not unheard of.

2. Have native iOS and Android apps for a reliable experience

We suggest building native mobile apps for both platforms instead of relying solely on web apps. While web applications naturally depend on DNS (you have to type in a URL to get where you want), mobile software is already on your customer’s phone. This means it can carry fallback mechanisms: for example, in case of a DNS failure, it can store several backup domain names, or even raw IP addresses, to reach out to, retaining some level of service even when others fail.
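A minimal sketch of that fallback idea, assuming the app ships with a hard-coded, ordered list of endpoints (the hostnames and the IP address below are hypothetical placeholders) and receives an `isReachable` probe from the caller, so the selection logic can be tested without any network access:

```kotlin
// Hypothetical endpoints baked into the app build, in order of preference.
val defaultEndpoints = listOf(
    "https://api.example.com",        // primary domain
    "https://api.example-backup.net", // backup domain, ideally on a different DNS provider
    "https://203.0.113.10"            // raw IP (documentation range), usable even if DNS fails entirely
)

/**
 * Returns the first endpoint that passes the probe, or null if none does.
 * The probe is injected, so a real app can plug in a network health check.
 */
fun pickEndpoint(endpoints: List<String>, isReachable: (String) -> Boolean): String? =
    endpoints.firstOrNull { isReachable(it) }
```

In a real app, `isReachable` would wrap a short-timeout health-check request. Reaching a server by raw IP over HTTPS also requires TLS to be set up to accept it, which is worth planning in advance rather than during an outage.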

3. Use peer-to-peer (p2p) communication

The more the merrier, right? The DNS failure was the root cause of the shutdown, but its effect was much wider. When some social networks became unreachable, users rushed to their competitors, who, in turn, were not all ready to handle that spike.

That’s why we’re so into WebRTC’s p2p capabilities. Live multimedia is traffic- and resource-intensive, so peer-to-peer communication is a budget saver at all times, and a business saver in cases like this one. Even if a secondary service running somewhere in the cloud becomes unreachable for a while, the key feature stays available, because the load is spread across the devices directly involved in each call.

4. Set up auto-scaling to handle spike load

Another critical thing is scalability. Design your platforms to scale up and down using different strategies, automatically or manually, so that your high-load solutions respond to traffic spikes without service degradation. For example, we architect, develop, and test them to match the target criteria and even outdo them considerably.
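One common automatic strategy is the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current × observed load / target load). The sketch below implements only that rule; the function name and the min/max bounds are hypothetical, and the real autoscaler adds a tolerance band and stabilization windows to avoid flapping, which are omitted here:

```kotlin
import kotlin.math.ceil

/**
 * Proportional scaling rule: scale the replica count by the ratio of
 * observed load to target load, then clamp it to [minReplicas, maxReplicas].
 */
fun desiredReplicas(
    currentReplicas: Int,
    currentLoadPerReplica: Double, // e.g. average CPU utilization, 0.0..1.0
    targetLoadPerReplica: Double,  // e.g. 0.5 = aim for 50% utilization
    minReplicas: Int = 2,          // hypothetical floor kept for redundancy
    maxReplicas: Int = 50          // hypothetical budget ceiling
): Int {
    require(currentReplicas > 0 && targetLoadPerReplica > 0.0)
    val raw = ceil(currentReplicas * currentLoadPerReplica / targetLoadPerReplica).toInt()
    return raw.coerceIn(minReplicas, maxReplicas)
}
```

For example, 4 servers running at 75% against a 50% target would be scaled to 6, while a quiet period never drops the fleet below the 2-server floor.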

5. Keep your code portable

And now back to PaaS, since that’s what turned the Monday blackout into a blackout. When users rushed to competing social networks, they did reach different domains. But under the hood, many of those were hosted on the same cloud computing platforms, known for their quality and massive resource pools. And those platforms started to crack under the load, making the internet go down.

That is why we recommend delivering software as source code: it’s not only about ownership rights. With the code at your disposal, you don’t depend on a premade setup in one cloud. If your cloud provider goes down, you simply deploy the whole thing in a different availability zone, with a competing provider, or even on-premises, whichever serves you better.

So, here’s the summary of why the internet went down.

What happened? A chain reaction.

  • DNS failure made a number of massively used resources unreachable
  • Their audience rushed to alternative destinations
  • Many of those destinations ran on the same cloud platforms, so the traffic snowballed onto them
  • Many side services also failed, as they used third-party solutions provided by big tech companies affected by the DNS failure

How we mitigate these risks:

  • Critical features independent of third parties
  • Mobile apps and PWAs for less DNS dependency
  • P2P to avoid bottlenecks
  • Auto-scaling to handle spike load
  • Code availability for quick redeployment

Blackouts like this one are definitely rare. But when one arrives, you are either struggling to minimize your losses or welcoming the discontented customers of your less reliable competitors. Need a team capable of keeping your business ready for an event like that? Request a quote.


Kurento Media Server: Everything You Need To Know In 2021

A short review of Kurento in plain language for business people and those who aren’t into all that techy stuff.

Imagine that a programmer approached you and said that he needs a media server for development, and he recommends that you use Kurento. How do you know whether it’s the best choice? There’s a lot of information but digesting it might be difficult as it’s all deeply technical.

We’ll try our best to give you enough information to decide whether or not to use Kurento in your video app. We’ll explain why a media server matters, then cover Kurento’s license, its architecture, and the main functionality it offers through its modules. We’ll finish with a summary of when it’s best to use a low-level media server such as Kurento, and when an out-of-the-box solution is the better fit.

WebRTC and why we need a media server

Kurento is an open-source WebRTC media streaming server with many built-in video conferencing modules, released under the Apache license. WebRTC is a standardized, low-latency method of real-time browser-to-browser transmission that needs no third-party plugins or extensions. Since WebRTC is designed to run entirely on the clients, why would we need a media server?

The main reason is the load on clients when there are many participants. In a peer-to-peer (mesh) call, every participant connects to every other one, so the number of connections grows quadratically with the number of participants, while video quality drops and the load on traffic and system resources rises. In our experience, plain P2P WebRTC works well for 2 to 6 participants; with more, it makes more sense to use a media server. In addition, saving a recording to a separate file or processing video on the fly is difficult, because all the work falls on the client.
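To see why the load rises so quickly, here is a small sketch that counts connections in a full-mesh P2P call, where every participant connects directly to every other one. The function names are illustrative and not part of any WebRTC API:

```kotlin
/** Peer connections in a full-mesh call: one link per pair of participants, n(n-1)/2. */
fun meshConnections(participants: Int): Int = participants * (participants - 1) / 2

/** Copies of its own video each client must upload in a mesh: one per other participant. */
fun meshUploadsPerClient(participants: Int): Int = participants - 1
```

Going from 4 to 10 participants raises the number of connections from 6 to 45, and each client has to upload its video 9 times instead of 3. A media server cuts each client's upload back to a single stream.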

A short history of Kurento

Kurento was developed in 2010 in Madrid as a separate open-source project. The main language that Kurento uses is C++, which helps to optimize the resources of the system.

The media server has more than 2,500 stars on GitHub and a few hundred forks, i.e. copies of the project that community members develop separately.

The Kurento development team has since joined Twilio. Kurento itself now mostly receives minor versions with small patches, but new versions are still being released.

Apache, the license of Kurento

Kurento is released under the Apache license, which gives developers broad freedom when working with the code: you mainly have to state which changes you made and keep the original authors’ copyright and license notices.

Software under the Apache license can be used in commercial products: you can use Kurento in your products for free and earn income from them, with no royalties to pay.

You can find the full text of the Apache license here.

Kurento architecture: MCU and SFU

There are two main types of media server architectures: MCU and SFU.

MCU (Multipoint Control Unit) is a collage-like video architecture: the streams of multiple users make up one big seamless picture in which each participant’s position is fixed. The MCU takes the outgoing video stream from each participant, and the media server stitches all the streams into one with a fixed layout. Thus, despite the many participants in the conference, each client receives only one stream. This saves CPU and traffic on the client side, but increases the load on the server and limits how much the video chat layout can be customized. Since the client receives a single mixed picture, it can’t tell exactly which participant is where in the video. Mixing streams requires a lot of processing power, which increases the cost of running the server. This architecture is mostly suitable for meetings with a large number of participants (more than 40). MCU is also a good solution for weak devices, such as phones, because the heavy processing happens on the server.

SFU (Selective Forwarding Unit) is a popular architecture in modern WebRTC solutions that lets a video conference client receive only the video streams it needs at the moment. SFU is more like a mosaic: you assemble the pieces yourself, in whatever order you like. In SFU, each participant also sends their stream to the server, but receives the other participants’ streams separately. This architecture distributes the load better between server and client and gives full control over the video chat interface. Unlike an MCU, an SFU server doesn’t have to decode and transcode incoming streams, which significantly reduces the load on the server CPU. SFU is well suited for broadcasting (one-to-many streaming) thanks to its ability to scale dynamically with the number of streams. At the same time, this type of server requires more outgoing bandwidth, because it has to send more streams to clients.

Let’s compare these 2 types using a 4-person video conference as an example.

On the client:

Differences between SFU and MCU for the client

On the server:

Differences between SFU and MCU for the server
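The same comparison can be expressed as stream counts. A small sketch, assuming every participant publishes exactly one video stream and wants to see everyone else, with audio and simulcast ignored; the types are illustrative and not part of Kurento:

```kotlin
enum class Architecture { MCU, SFU }

/** Stream counts for an n-person call where everyone publishes one video. */
data class StreamCounts(
    val clientUploads: Int,
    val clientDownloads: Int,
    val serverIncoming: Int,
    val serverOutgoing: Int,
    val serverMixes: Boolean // true if the server must decode, mix, and re-encode
)

fun streamCounts(arch: Architecture, participants: Int): StreamCounts = when (arch) {
    // MCU: every client sends 1 stream and gets back 1 composited stream
    Architecture.MCU -> StreamCounts(1, 1, participants, participants, serverMixes = true)
    // SFU: every client sends 1 stream and receives everyone else's separately
    Architecture.SFU -> StreamCounts(
        1, participants - 1, participants, participants * (participants - 1), serverMixes = false
    )
}
```

For 4 participants, an SFU forwards 12 outgoing streams without touching their content, while an MCU sends only 4 but has to decode and mix all of them.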

There are also systems that use hybrid architectures to achieve the best result depending on the current number of users and the needs of a particular client. For example, if the client is a weak mobile device, it can receive a single stream from the media server, as in MCU. Browser users, on the other hand, receive streams separately and can arrange them in their own way, as in SFU. Another case is interaction with SIP devices (mostly IP telephony), which support only MCU. Some hybrids switch from SFU to MCU on the fly when the number of participants reaches a certain threshold.

There is also the XDN (Experience Delivery Network) architecture from Red5 Pro, which uses cloud technologies to solve WebRTC scaling problems. It has clusters consisting of different types of nodes: source, relay, and edge nodes. Within this topology, a source node receives incoming streams and exchanges data with several edge nodes to support thousands of participants. For larger cases, source nodes pass the streams to relay nodes, which pass them on to multiple edge nodes to scale the cluster even further.
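The hybrid decision itself can be small. Below is a hypothetical policy for the SFU/MCU hybrid approach: the client categories are illustrative, the 40-participant threshold reuses the rough MCU figure given earlier, and none of these names come from Kurento or Red5 Pro:

```kotlin
enum class ClientKind { BROWSER, WEAK_MOBILE, SIP }
enum class Delivery { SEPARATE_STREAMS_SFU, COMPOSITE_MCU }

/**
 * Hypothetical hybrid policy: SIP endpoints and weak devices always get one
 * mixed stream; other clients get separate streams until the room reaches
 * [mcuThreshold] participants, after which everyone gets the composite.
 */
fun chooseDelivery(kind: ClientKind, roomSize: Int, mcuThreshold: Int = 40): Delivery = when {
    kind == ClientKind.SIP -> Delivery.COMPOSITE_MCU         // SIP devices only handle a single stream
    kind == ClientKind.WEAK_MOBILE -> Delivery.COMPOSITE_MCU // offload decoding and layout to the server
    roomSize >= mcuThreshold -> Delivery.COMPOSITE_MCU       // large room: switch to mixing
    else -> Delivery.SEPARATE_STREAMS_SFU
}
```

Keeping the rule in one pure function makes it easy to tune the threshold later without touching the media pipeline code.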

Kurento supports both types and uses SFU by default. MCU architecture can be achieved using the Composite element, and you can mix SFU and MCU to obtain a hybrid.

Description of the main modules and functionality

What are modules: main principles of Kurento design

Kurento includes several basic modules designed to work not only with WebRTC but also with video recording, computer vision, and AR filters. The media server is a good choice if you want to work directly with plain WebRTC without additional wrappers. It can also be useful if you want to integrate your video chat with native Android and iOS apps. Kurento does not include a signaling mechanism, so you can choose whatever works best for your project, be it WebSockets or something else. And since Kurento is open source, it significantly reduces the cost of using a media server.

Unlike off-the-shelf paid media servers, Kurento is a rather low-level solution that lets you customize how modules interact in the pipeline. Also, unlike Jitsi, for example, which is a boxed solution with a ready-made interface, Kurento makes it much easier to choose and implement the interface you want. And of course, you can write your own module to extend the standard media server functionality.

The basis of Kurento’s architecture is modularity: each media element is a program block that performs a certain task and can interact with other elements. In Kurento jargon, these are called the Media Pipeline and Media Elements. Media Elements are simply modules connected to each other within a Pipeline. Note, however, that the topology of the Pipeline is chosen by the developer; it does not have to be a linear sequence of elements.

Let’s talk about the main modules now.

WebRTC video conference

The basis of Kurento for video conferencing is the WebRtcEndpoint. With it, we can transfer WebRTC streams and interconnect them within a single pipeline. In a video conference, a pipeline can usually be thought of as one room, isolated from other similar rooms, and each WebRtcEndpoint as a user who wants to broadcast their own video or watch someone else’s.

Recording video and audio

The media server uses the RecorderEndpoint for recording. We can connect a WebRTC stream to the recorder, and the recorder will write the result to the file system. Once the elements are connected and the video is flowing, just call the recorder’s record method from the Kurento API. We can also set restrictions when recording: for example, record only audio or only video.

Playing third-party media

Kurento has a built-in PlayerEndpoint module that allows you to play third-party audio and video files as well as RTMP and RTSP streams. This means that you can add a stream from the IP camera and broadcast it to other participants of the video conference. You can use the player to play music for users, which is good for music platforms, online training, or even text-to-speech. Thanks to the flexibility of the pipeline, it’s possible to play media not only to all participants but also to a specific user depending on your needs. You can do this simply by creating a new PlayerEndpoint and connecting it to the desired WebRtcEndpoint using the connect method. Once you need to play a media file, just call the player’s play method.

Computer vision and filters

The media server allows you to use many filters. For example, ZBarFilter is used to recognize QR codes, FaceOverlayFilter can recognize a face in a video stream and highlight it in real-time. And with GStreamerFilter, you can create your own custom filter. Also, Kurento has several experimental modules, including filters which are installed separately. There is kms-crowddetector that can detect a crowd, and there is a kms-platedetector that can detect a vehicle’s license plate number.

No support for multi-streaming with different resolutions

There is no support for Simulcast (sending multiple copies of the same stream in different quality) and SVC (sending a stream in low quality with the ability to add additional layers to improve quality if needed).

There is a workaround: create several streams with different resolutions and switch between them depending on packet loss. However, this is far from ideal, as it puts a heavy load on the server. So, to be honest, there’s no truly good solution here.
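The switching part of that workaround can be sketched as a small decision function. Everything here is an assumption for illustration: the three resolutions, the 10% and 2% packet-loss thresholds, and the idea that "switching" means reconnecting the viewer to a different one of the separately created streams. The gap between the two thresholds keeps the client from flip-flopping between layers:

```kotlin
/** Hypothetical quality ladder: the same video published at several resolutions. */
enum class Layer(val height: Int) { LOW(180), MEDIUM(360), HIGH(720) }

/**
 * Picks the next layer from the measured packet loss (0.0..1.0).
 * Steps down one layer when loss exceeds [downAt] and up one layer when it
 * falls below [upAt]; in between, the current layer is kept.
 */
fun nextLayer(current: Layer, packetLoss: Double, downAt: Double = 0.10, upAt: Double = 0.02): Layer {
    val layers = Layer.values()
    val i = current.ordinal
    return when {
        packetLoss > downAt && i > 0 -> layers[i - 1]
        packetLoss < upAt && i < layers.lastIndex -> layers[i + 1]
        else -> current
    }
}
```

Moving one step at a time, rather than jumping straight to the lowest layer, also limits how often the server has to rewire the pipeline.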

Codec support in Kurento

In order to stream over a network, computers use codecs. Codecs are programs that encode (and decode) data or signals. They allow you to compress video or audio to a size acceptable for network streaming and playback.

Kurento Media Server supports codecs such as H.264 and VP8/9 for video and Opus for audio. They achieve a high degree of compression while maintaining high quality, and they are the current WebRTC standards. There is no support for AV1 and H.265, newer standards that need a lower bitrate than their predecessors for the same quality. The media server itself compresses the received video stream to a quality that matches the client’s bandwidth at the moment and, if necessary, transcodes the stream into the codec the client needs.

API, documentation, Typescript support

The Kurento API can be accessed via the kurento-client library and various additional libraries, e.g. kurento-utils. The client library supports TypeScript types. You can find detailed documentation on the official Kurento website, with a description of every module and usage examples for Java and JavaScript: you can go either here or here.

The site covers not only the media server itself but also related WebRTC topics, such as configuring a TURN (Traversal Using Relays around NAT) server, which is needed to get around the connectivity limitations of users behind a symmetric NAT. When NAT restrictions prevent a direct connection, TURN provides a relay IP address and port through which the WebRTC traffic can flow. Symmetric NATs are stricter than other NAT types: in addition to the internal address, their mapping table stores 2 more parameters, the IP and port of the remote host. Packets from any other external host are discarded because their source doesn’t match the data recorded in the table.
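A toy model may make the symmetric-NAT rule concrete. This is a simplified illustration, not a real NAT implementation; the class, the addresses, and the starting port 40000 are all hypothetical. Each (internal, remote) pair gets its own public port, and inbound packets are accepted only from the exact remote endpoint the mapping was created for:

```kotlin
/** A transport address: IP plus port. */
data class Endpoint(val ip: String, val port: Int)

/** Toy symmetric NAT: mappings are keyed by BOTH the internal and the remote endpoint. */
class SymmetricNat(private val publicIp: String, private var nextPort: Int = 40000) {
    // (internal, remote) -> public port
    private val outbound = mutableMapOf<Pair<Endpoint, Endpoint>, Int>()
    // public port -> (internal, remote)
    private val inbound = mutableMapOf<Int, Pair<Endpoint, Endpoint>>()

    /** Returns the public endpoint the remote side sees when [internal] sends to [remote]. */
    fun send(internal: Endpoint, remote: Endpoint): Endpoint {
        val port = outbound.getOrPut(internal to remote) {
            val p = nextPort++
            inbound[p] = internal to remote
            p
        }
        return Endpoint(publicIp, port)
    }

    /** Returns the internal destination, or null if the packet is dropped. */
    fun receive(from: Endpoint, toPublicPort: Int): Endpoint? {
        val (internal, expectedRemote) = inbound[toPublicPort] ?: return null
        return if (from == expectedRemote) internal else null
    }
}
```

The model also shows why discovering your public address is not enough behind such a NAT: the port seen by one server (say, a STUN server) is useless to a different peer, whose packets to that port get dropped. Relaying through a TURN server avoids the problem, because the client only ever talks to the relay.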

OpenVidu

Based on Kurento, there is the OpenVidu framework, designed to simplify development if all you need is straightforward video conferencing. There’s no need to use a server to communicate with the framework: just connect the client library. The library has wrappers for all popular frontend frameworks, as well as for Android and iOS. OpenVidu is free, but there is a paid version with additional features for monitoring and scaling the WebRTC platform. There are also plans to add Simulcast, SVC, and automatic switching to P2P sessions for one-to-one calls.

The number of participants in a video conference

Kurento developers have their own benchmark to determine the maximum number of sessions on one machine. Although it’s originally intended for OpenVidu, it’s also suitable for Kurento. Below you can see a table for the different machines on AWS.

Video conferencing benchmarks for AWS machines

For example, a single AWS c5.large instance with two CPU cores and 4 GB of RAM can handle 4 group meetings with 7 participants each. In each meeting, users show and watch each other’s streams (many-to-many).

Summary: when is Kurento suitable?

  • for a video conferencing platform
  • for integration with AR
  • for video recording
  • for playing media files for conference participants
  • for working with pure RTP, for example, for live streaming

Kurento is a low-level media server, so development with it may take a bit longer than with out-of-the-box solutions, unless your developer specializes in Kurento.

However, Kurento allows you to implement any functionality you want. With ready-made solutions, it often happens that a customer asks for something else and that functionality simply isn’t supported.

If you’re interested in Kurento media server installation for your project, request a free quote, and we’ll get back to you ASAP!


How to Make a Custom Call Notification on Android? With Code Examples

How to create a custom Android call notification

In this article, you will learn how to make incoming call notifications on Android, from basic to advanced layouts. Customize the notification screen with our examples.

Last time, we told you what any Android app with calls should have and promised to show you how to implement it. Today we’ll deal with notifications for incoming calls: we’ll start with the simplest and most minimalistic ones and end with full-screen notifications with a custom, non-system design. Let’s get started!

Channel creation (API 26+)

Since Android 8.0, each notification must have a notification channel to which it belongs. Before this version of the system, the user could either allow or disallow the app to show notifications, without being able to turn off only a certain category, which was not very convenient. With channels, on the other hand, the user can turn off annoying notifications from the app, such as ads and unnecessary reminders, while leaving only the ones he needs (new messages, calls, and so on).

If we don’t specify a channel ID (that is, if we use the deprecated builder constructor), or if we don’t create a channel with that ID, the notification will not be displayed on Android 8 or later.

We need the androidx.core library which you probably already have hooked up. We write in Kotlin, so we use the version of the library for that language:

dependencies {
    implementation("androidx.core:core-ktx:1.5.0")
}

All work with notifications is done through the system service NotificationManager. For backward compatibility, it is always better to use the Compat version of Android classes if you have them, so we will use NotificationManagerCompat. To get the instance:

val notificationManager = NotificationManagerCompat.from(context)

Let’s create our channel. You can set many parameters for a channel, such as a general sound for notifications and a vibration pattern. We will set only the basic ones; you can find the full list here.

val INCOMING_CALL_CHANNEL_ID = "incoming_call"

// Creating an object with channel data
val channel = NotificationChannelCompat.Builder(
    // Channel ID, it must be unique within the package
    INCOMING_CALL_CHANNEL_ID,
    // The importance affects whether the notification makes a sound, is shown immediately, and so on.
    // We set it to high; it's a call after all.
    NotificationManagerCompat.IMPORTANCE_HIGH
)
    // The channel name, displayed in the app's system notification settings
    .setName("Incoming calls")
    // The channel description, displayed in the same place
    .setDescription("Incoming audio and video call alerts")
    .build()

// Creating the channel. If a channel with this ID already exists, nothing happens,
// so this method can safely be called before posting each notification to the channel.
notificationManager.createNotificationChannel(channel)

How to create notification channel on Android

Displaying a notification

Wonderful, now we can create the notification itself. Let’s start with the simplest example:

val notificationBuilder = NotificationCompat.Builder(
    // A Context, e.g. the Activity or Service posting the notification
    this,
    // Channel ID again
    INCOMING_CALL_CHANNEL_ID
)
    // A small icon that will be displayed in the status bar
    .setSmallIcon(R.drawable.icon)
    // Notification title
    .setContentTitle("Incoming call")
    // Notification text, usually the caller's name
    .setContentText("James Smith")
    // Large image, usually a photo / avatar of the caller
    .setLargeIcon(BitmapFactory.decodeResource(resources, R.drawable.logo))
    // For an incoming call notification, it's wise to make it impossible to swipe away
    .setOngoing(true)

So far we’ve only created a sort of “description” of the notification, but it’s not yet shown to the user. To display it, let’s turn to the manager again:

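// An arbitrary Int, unique among this app's notifications; reuse it to update or cancel this notification
val INCOMING_CALL_NOTIFICATION_ID = 1

// Let's build our notification
val notification = notificationBuilder.build()

// We ask the system to display it
notificationManager.notify(INCOMING_CALL_NOTIFICATION_ID, notification)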

How to display a notification for Android

The INCOMING_CALL_NOTIFICATION_ID is a notification identifier that can be used to find and interact with a notification that is already displayed.

For example, if the user doesn’t answer for a long time and the caller gets tired of waiting and cancels the call, we can cancel the notification:

notificationManager.cancel(INCOMING_CALL_NOTIFICATION_ID)

Or, in a conferencing application, if more people join the caller, we can update our notification. To do this, just post a new notification with the same ID in the notify call: the old notification will be updated with the new data, without the appearance animation of a new notification. We can reuse the old notificationBuilder by simply replacing the part that changed:

notificationBuilder.setContentText("James Smith, George Watson")
notificationManager.notify(
    INCOMING_CALL_NOTIFICATION_ID,
    notificationBuilder.build()
)
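When the list of callers changes often, it can help to build that text in one place. Below is a hypothetical helper (`callersText` is not part of any Android API, and the two-name limit is an arbitrary choice) whose result would be passed to `setContentText()` before calling `notify()` again with the same ID:

```kotlin
/**
 * Builds the notification text for a group call: lists up to [maxNames]
 * callers and summarizes the rest, so the line stays short in the shade.
 */
fun callersText(names: List<String>, maxNames: Int = 2): String {
    require(names.isNotEmpty()) { "A call needs at least one caller" }
    val shown = names.take(maxNames).joinToString(", ")
    val rest = names.size - maxNames
    return if (rest > 0) "$shown and $rest more" else shown
}
```

Because the helper is plain Kotlin with no Android dependencies, it can be unit-tested without an emulator.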

Button actions upon clicking

A simple notification of an incoming call, after which the user has to find our application himself and accept or reject the call is not a very useful thing. Fortunately, we can add action buttons to our notification!

To do this, we add one or more actions when creating the notification. Creating them will look something like this:

val action = NotificationCompat.Action.Builder(
    // The icon displayed on the button (or not, depending on the Android version)
    IconCompat.createWithResource(applicationContext, R.drawable.icon_accept_call),
    // The text on the button
    getString(R.string.accept_call),
    // The action itself, a PendingIntent
    acceptCallIntent
).build()

Wait a minute, what is this PendingIntent? It’s a very broad topic, worthy of its own article, but put simply, it’s a description of how to launch a component of our application (such as an activity or service) later, on our behalf. In its simplest form, it goes like this:

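// Intent actions are Strings; any value unique within the app works
const val ACTION_ACCEPT_CALL = "ACTION_ACCEPT_CALL"
// The request code distinguishes this PendingIntent from the app's other PendingIntents
const val REQUEST_CODE_ACCEPT_CALL = 101

// We create a normal intent, just like when we start a new Activity
val intent = Intent(applicationContext, MainActivity::class.java).apply {
    action = ACTION_ACCEPT_CALL
}

// But we don't run it ourselves; we pass it to a PendingIntent, which fires later when the button is pressed.
// FLAG_IMMUTABLE (API 23+) or FLAG_MUTABLE is mandatory when targeting Android 12 (API 31) and above.
val acceptCallIntent = PendingIntent.getActivity(
    applicationContext,
    REQUEST_CODE_ACCEPT_CALL,
    intent,
    PendingIntent.FLAG_UPDATE_CURRENT or PendingIntent.FLAG_IMMUTABLE
)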

Accordingly, we need to handle this action in the activity itself.

To do this, in `onCreate()` (and in `onNewIntent()` if you use the `FLAG_ACTIVITY_SINGLE_TOP` flag for your activity), read the `action` from the `intent` and act on it:

override fun onNewIntent(intent: Intent?) {
    super.onNewIntent(intent)
    if (intent?.action == ACTION_ACCEPT_CALL) {
        imaginaryCallManager.acceptCall()
    }
}

Now that we have everything ready for our action, we can add it to our notification via `Builder`

notificationBuilder.addAction(action)

How to add notification buttons on Android

In addition to the buttons, we can assign an action to a click on the notification itself, outside the buttons. Opening the incoming call screen seems like the best choice. To do this, we create another PendingIntent in the same way, but with a different action instead of `ACTION_ACCEPT_CALL`, pass it to `notificationBuilder.setContentIntent()`, and handle that `action` with navigation in `MainActivity.onCreate()` and `onNewIntent()`:

override fun onNewIntent(intent: Intent?) {
    // …
    if (intent?.action == ACTION_SHOW_INCOMING_CALL_SCREEN) {
        imaginaryNavigator.navigate(IncomingCallScreen())
    }
}

You can also use `service` instead of `activity` to handle events.
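Since the same actions have to be handled in both `onCreate()` (cold start) and `onNewIntent()` (activity already running), one option is to keep the mapping in a single pure function that both entry points call with `intent?.action`. A minimal sketch; the action string values and the `CallUiCommand` type are hypothetical:

```kotlin
// Hypothetical action strings, matching the constants used in the article's examples.
const val ACTION_ACCEPT_CALL = "ACTION_ACCEPT_CALL"
const val ACTION_SHOW_INCOMING_CALL_SCREEN = "ACTION_SHOW_INCOMING_CALL_SCREEN"

/** What the activity should do in response to a notification click. */
enum class CallUiCommand { ACCEPT_CALL, SHOW_INCOMING_CALL_SCREEN, NONE }

/**
 * Pure mapping from Intent.action to a command, so onCreate() and
 * onNewIntent() cannot drift apart in how they interpret notification clicks.
 */
fun commandFor(action: String?): CallUiCommand = when (action) {
    ACTION_ACCEPT_CALL -> CallUiCommand.ACCEPT_CALL
    ACTION_SHOW_INCOMING_CALL_SCREEN -> CallUiCommand.SHOW_INCOMING_CALL_SCREEN
    else -> CallUiCommand.NONE // e.g. a normal launch from the home screen
}
```

In the activity, both callbacks would then do something like `when (commandFor(intent?.action)) { ... }`, calling `imaginaryCallManager.acceptCall()` or `imaginaryNavigator.navigate(...)` as before.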

Notifications with their own design

Notifications themselves are part of the system interface, so they will be displayed in the same system style. However, if you want to stand out, or if the standard arrangement of buttons and other notification elements doesn’t suit you, you can give your notifications their own unique style.

DISCLAIMER: Due to the huge variety of Android devices with different screen sizes and aspect ratios, combined with the limited positioning of elements in notifications (compared to regular application screens), custom content notifications are much more difficult to support.

The notification will still be rendered by the system, that is, outside of our application process, so we need to use RemoteViews instead of the regular View. Note that this mechanism does not support all the familiar elements, in particular, the `ConstraintLayout` is not available.

A simple example is a custom notification with one button for accepting a call:

<!-- notification_custom.xml -->
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <Button
        android:id="@+id/button_accept_call"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_centerHorizontal="true"
        android:layout_alignParentBottom="true"
        android:backgroundTint="@color/green_accept"
        android:text="@string/accept_call"
        android:textColor="@color/fora_white" />

</RelativeLayout>

The layout is ready. Now we need to create a RemoteViews instance and pass it to the notification builder:

val remoteView = RemoteViews(packageName, R.layout.notification_custom)

// Set the PendingIntent (here, the acceptCallIntent created earlier) that will "fire" when the button is clicked.
// A normal onClickListener won't work here: again, the notification lives outside our process.
remoteView.setOnClickPendingIntent(R.id.button_accept_call, acceptCallIntent)

// Add it to our long-suffering builder
notificationBuilder.setCustomContentView(remoteView)

How to create a custom notification on Android

Our example is as simplistic as possible and, of course, a bit jarring. Usually, a customized notification is done in a style similar to the system notification, but in a branded color scheme, like the notifications in Skype, for example.

In addition to .setCustomContentView, which sets the layout of the regular (collapsed) notification, we can separately specify a layout for the expanded state with .setCustomBigContentView and for the heads-up state with .setCustomHeadsUpContentView.

Full-screen notifications

Now our custom notification layouts match the design inside the app, but they’re still small notifications with small buttons. And what happens when you get a normal incoming call? Our eyes are presented with a beautiful screen that takes up all the available space. Fortunately, this functionality is available to us! And we don’t have to worry about the limitations of RemoteViews, as we can show a full `activity`.

First of all, we have to add a permission to `AndroidManifest.xml`:

<uses-permission android:name="android.permission.USE_FULL_SCREEN_INTENT" />

After creating an `activity` with the desired design and functionality, we initialize the PendingIntent and add it to the notification:

val intent = Intent(this, FullscreenNotificationActivity::class.java)
val pendingIntent = PendingIntent.getActivity(
    applicationContext,
    0,
    intent,
    PendingIntent.FLAG_UPDATE_CURRENT or PendingIntent.FLAG_IMMUTABLE
)

// The second argument is highPriority. We set it to true: what deserves high priority if not an incoming call?
// (Kotlin doesn't allow named arguments when calling Java methods, so it is passed positionally.)
notificationBuilder.setFullScreenIntent(pendingIntent, true)

Yes, and that’s it! Although this functionality is so easy to add, for some reason not all call-related applications use it. However, giants like WhatsApp and Telegram implement incoming call notifications this way!

How to create a full screen notification on Android

Bottom line

The incoming call notification on Android is a very important part of the application. There are a lot of requirements: it should be prompt, eye-catching, but not annoying. Today we learned about the tools available to achieve all these goals. Let your notifications be always beautiful!

Categories
Uncategorized

Video conference and text chat software development


5 Russian government agencies and both major telecom operators are clients of imind.com. We developed a new version of their video conference and chat for businesses. Agencies meet there and telecom operators resell it to businesses under their brands. Read a reference from the client on Clutch – search for “Intermind”.

Features

Industries 

Devices 

Technologies 

Costs

Features for video, audio, and text communication software

🎦 WebRTC videoconference

We develop for any number of participants:

  • One-on-one video chats
  • Video conferences with an unlimited number of participants

50 live videos on one screen at the same time is the maximum we’ve done. For comparison, Zoom supports 100 live video participants, though it shows 25 live videos on one screen; to see the others, you switch between screens.

Other functions include custom backgrounds, enlarging the videos of particular participants, picking a camera and microphone from a list, muting the camera and microphone, and a video preview of how you look.

🎬 Conference recording

Record the whole screen of the conference, and set how long recordings are stored on the server. For example, on imind.com we keep videos for 30 days on the free plan and forever on the most advanced one.

The recording isn't interrupted if the person recording drops off. In Zoom, if the recorder leaves, the recording stops. On imind.com, it continues.

💻 Screen sharing and sharing multiple screens simultaneously

Show your screen instead of your video. Choose to show everything or just 1 application, so you don't show private data by accident.

Let all video participants share their screens at the same time. This helps when comparing things: users don't have to stop one share to start another. See it in action at imind.com.

☎️ Join a conference from a landline phone

This is for people in the countryside without an Internet connection. Dial a phone number from a landline or mobile phone and join the conference with audio only, without video. SIP technology with Asterisk and FreeSWITCH servers powers this function.

💬 Text chat

Send text messages and emoticons. React with emojis. Send pictures and documents. Go to a private chat with one participant. See a list of participants.

✒️ Document editing and signing

Share a document on the conference screen. Scroll through it together and make changes. Sign it by uploading an image of your signature or drawing it by hand. This is convenient for signing contracts remotely during the pandemic.

📋 Polls

Create polls with open and closed questions. View statistics. Make the collective decision-making process faster!

🎙 Webinars

In broadcast mode, show a presentation full-screen to the audience along with the presenter's video. Add guest speakers' videos. Record the whole session to share with participants afterward.

⌚️ Everlasting rooms with custom links

Create a room and give it a custom link like videoconference.com/dailymeeting. This is convenient for regular meetings: ask participants to bookmark the link and join at the agreed time each time.

👥 User management

Assign administrators and delegate to them creating rooms and adding and deleting users.

🔐 Security

  • One-time codes instead of passwords
  • Host approves guests before they enter the conference
  • See a picture of a guest before approving them
  • Encryption: we enable AES-256 encryption in WebRTC
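
To illustrate the first point: a one-time code is a short random code that is sent to the user, works once, and expires soon after. Below is a minimal Kotlin sketch of the idea. The class name, the 6-digit length, and the 5-minute lifetime are hypothetical choices. Delivering the code by email or SMS is left out.

```kotlin
import java.security.SecureRandom
import java.time.Duration
import java.time.Instant

// Hypothetical in-memory store of one-time login codes, one active code per user
class OneTimeCodeStore(private val lifetime: Duration = Duration.ofMinutes(5)) {
    private data class Entry(val code: String, val expiresAt: Instant)

    private val random = SecureRandom()
    private val entries = mutableMapOf<String, Entry>()

    // Generates a random 6-digit code (leading zeros kept) and replaces any previous one
    fun issue(userId: String, now: Instant = Instant.now()): String {
        val code = random.nextInt(1_000_000).toString().padStart(6, '0')
        entries[userId] = Entry(code, now.plus(lifetime))
        return code
    }

    // Every check consumes the code, so it works at most once and only before it expires
    fun verify(userId: String, code: String, now: Instant = Instant.now()): Boolean {
        val entry = entries.remove(userId) ?: return false
        return entry.code == code && now.isBefore(entry.expiresAt)
    }
}
```

A check consumes the code even when the guess is wrong, which limits brute-force attempts. A production version would also need persistent, thread-safe storage.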

🎨 Custom branding

Change color schemes, use your logo, change backgrounds to corporate images.

🗣 Speech-to-text and translation

Users' speech is recognized and shown on the screen as text. It can also be translated into another language.

📺 Watch videos together online

Watch a movie or a sports game together with friends. Show an employee onboarding video to the new staff members. Chat by video, voice, and text.

📝 Subscription plans

Free plans with basic functionality, advanced ones for pro and business users.

Industries Fora Soft has developed real-time communication tools for

  • 👨‍💼 Businesses – corporate communication tools
  • 🧑‍⚕️ Telemedicine – HIPAA-compliant, with EMR, visit scheduling, and payments
  • 👨‍🎓 E-learning – with whiteboards, LMS, teacher reviews, lesson booking, and payments
  • 👩‍🎤 Entertainment: online cinemas, messengers
  • 🏋️ Fitness and training
  • 🛍 Ecommerce and marketplaces – text chats, demonstrations of goods and services by live video calls

Devices Fora Soft develops for

  • Web browsers
    Chrome, Firefox, Safari, Opera, Edge – applications that require no download
  • Phones and tablets on iOS and Android
    Native applications that you download from the App Store and Google Play
  • Desktop and laptop computers
    Applications that you download and install
  • Smart TVs
    JavaScript applications for Samsung and LG, Kotlin apps for Android-based STBs, Swift apps for Apple TV
  • Virtual reality (VR) headsets
    Meetings in virtual rooms

🛠 What technologies to develop a custom video conference on

Basic technology to transmit video

Different technologies are best suited to different tasks:

  • for video chats and conferences – WebRTC
  • for broadcasting to a big audience – HLS
  • for streaming to third-party products like YouTube and Facebook – RTMP
  • for calling to phone numbers – SIP
  • for connecting IP cameras – RTSP and RTP

A freelancer or an agency that doesn't specialize in video software may pick the technology they know best, which might not be the best one for your tasks. In the worst case, you'll have to throw the work away and redo it.

We know all of these video technologies well, so we choose the one that's best for your goal. If you need several of these features in one project, a mix of these technologies should be used.
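
As a rough illustration, the list above works like a lookup table, and a project with several features needs the union of the technologies for each one. In this hypothetical Kotlin sketch, `Task` and `protocolsFor` are names made up for the example. A real choice also depends on details the list doesn't capture.

```kotlin
// Hypothetical task types, mirroring the list above
enum class Task { VIDEO_CHAT, BROADCAST, THIRD_PARTY_STREAMING, PHONE_CALLS, IP_CAMERAS }

// The technology each task usually calls for, per the list above
fun protocolsFor(task: Task): Set<String> = when (task) {
    Task.VIDEO_CHAT -> setOf("WebRTC")
    Task.BROADCAST -> setOf("HLS")
    Task.THIRD_PARTY_STREAMING -> setOf("RTMP")
    Task.PHONE_CALLS -> setOf("SIP")
    Task.IP_CAMERAS -> setOf("RTSP", "RTP")
}

// A project with several features needs a mix: the sorted union of all required technologies
fun protocolsFor(tasks: Set<Task>): Set<String> =
    tasks.flatMap { protocolsFor(it) }.toSortedSet()
```

For example, a telemedicine app that also lets patients dial in from a phone would need both WebRTC and SIP.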

Still, WebRTC is the main technology, used for video conferences almost always. It streams media in real time and works across all the browsers and mobile devices people use today. Google, Apple, and Microsoft support and develop it.

WebRTC supports VP8, VP9, and the H.264 Constrained Baseline profile for video, and Opus and G.711 (PCMA and PCMU) for audio. It allows sending video up to 8,192 x 4,320 pixels, which is more than 4K. So the limits on WebRTC video quality are the end user's internet speed and device power.
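
To put “more than 4K” in numbers, compare the pixel count of that maximum frame with a 4K UHD frame of 3,840 x 2,160 pixels:

```latex
8192 \times 4320 = 35\,389\,440 \text{ pixels}, \qquad
3840 \times 2160 = 8\,294\,400 \text{ pixels}, \qquad
\frac{35\,389\,440}{8\,294\,400} \approx 4.27
```

So the maximum frame holds roughly 4.3 times as many pixels as a 4K UHD frame.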

A study by an Indonesian university shows that WebRTC video quality is better than that of SIP-based video chats. See Figure 6, “Video test results”, on page 9 and the reasoning below it.

Is a media server needed for video conferencing software development?

For video chats with 2-6 participants, we develop p2p solutions, so you don't pay for heavy video traffic through your servers.

For video conferences with 7 or more people, we use media servers and bridges. Kurento is our first choice.

For “quick and dirty” prototypes, we can integrate third-party solutions: ready-made video chat implementations with media servers that allow slight customization.

  • p2p video chats

P2p means video and audio go directly from the sender to the receivers. Streams don't have to pass through a powerful server first. The computers, smartphones, and tablets people use nowadays are powerful enough to handle 2-6 streams without delays.

Many businesses don't need more participants than that. Telemedicine usually means just 2 participants: a doctor and a patient. Developing a video chat with a media server would be a mistake here. The business would pay for the traffic going through the server without getting any benefit.

  • Video conferences with a media server

Today, users' devices can't send more than 5 outgoing video streams without lag. People's computers, smartphones, and tablets aren't powerful enough, because they also receive incoming streams while sending their own video. So with more than 6 people in a video chat, each participant sends just 1 outgoing stream to a media server. The server is powerful enough to forward that stream to every other participant.
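
The arithmetic behind this threshold fits in a few lines of Kotlin. This is an illustration, not production code. `StreamLoad`, `meshLoad`, and `mediaServerLoad` are hypothetical names. The media server is assumed to forward streams (an SFU) rather than mix them into a single stream.

```kotlin
// Hypothetical model of how many video streams one participant's device handles
data class StreamLoad(val outgoing: Int, val incoming: Int)

// P2P mesh: every participant sends its video directly to every other participant
fun meshLoad(participants: Int): StreamLoad {
    require(participants >= 2) { "A call needs at least 2 participants" }
    return StreamLoad(outgoing = participants - 1, incoming = participants - 1)
}

// Forwarding media server (SFU): one upload; the server relays everyone else's video
fun mediaServerLoad(participants: Int): StreamLoad {
    require(participants >= 2) { "A call needs at least 2 participants" }
    return StreamLoad(outgoing = 1, incoming = participants - 1)
}
```

With 6 participants, `meshLoad(6)` gives each device 5 outgoing streams, the limit mentioned above. At 7 participants, a mesh would need 6 uploads per device, while `mediaServerLoad(7)` keeps it at 1.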

Kurento is our first choice of media servers now for 3 reasons:

  • It is reliable.

    It was one of the first media servers to appear, so it gained the biggest developer community. The more developers use a technology, the faster issues get solved and the quicker you find answers to your questions. This makes development quicker and easier, so you pay less for it.

    Twilio bought Kurento technology for $8.5 million. In our experience, Twilio now provides the most reliable paid third-party video chat solution.

    As of 2021, based on our experience and impression, other media servers have smaller developer and contributor communities or are backed by smaller companies. They are either less reliable than Kurento or don't allow developing as many functions.
  • It allows adding the widest range of custom features.

    From screen sharing to face recognition and more, we haven't yet met a feature a client wanted that couldn't be developed with Kurento. To make this possible, Kurento contributors had to develop each capability separately and polish it into a well-working solution. Other media servers haven't had as much time and resources to offer the same.
  • It is free.

    Kurento is open source, which means you may use it in your products legally and for free. You don't have to pay royalties to the technology owner.

We also work with other media servers and bridges when fewer functions are needed or when an existing product already uses another media server:

We compare media servers and bridges regularly as all of them develop. Knowing your needs, we recommend the optimal choice.

  • Integration of third-party solutions

Third-party solutions are paid: you pay for each minute of usage. Developing a custom video chat is cheaper in the long run.

Their features are also limited to what their vendors have developed.

They are, however, quicker to integrate, so you get a working prototype sooner. If you need to impress investors, we can integrate one. You get your app faster and cheaper than with custom development.

However, if you later want to replace it with a custom video chat, you'll have to throw away the existing implementation and develop a custom one. So you'll pay twice for the video component.
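
Whether custom development pays off depends on how many minutes your users spend in calls. Here is a hypothetical break-even sketch. Every input is a placeholder you would replace with your own quotes and usage figures. None of them are real prices from any provider.

```kotlin
import kotlin.math.ceil

/**
 * Hypothetical estimate of how many months it takes for a custom video chat to become
 * cheaper than a third-party service billed per minute.
 * Returns null if the third-party service never costs more per month than self-hosting.
 */
fun breakEvenMonths(
    customDevelopmentCost: Double,    // one-time cost of building the custom solution
    customMonthlyHosting: Double,     // your own servers and traffic per month
    thirdPartyPricePerMinute: Double, // provider's per-minute rate
    minutesPerMonth: Double,          // total call minutes your users generate
): Int? {
    val thirdPartyMonthly = thirdPartyPricePerMinute * minutesPerMonth
    val monthlySavings = thirdPartyMonthly - customMonthlyHosting
    if (monthlySavings <= 0.0) return null
    return ceil(customDevelopmentCost / monthlySavings).toInt()
}
```

The sketch ignores the one-time cost of integrating the third-party service. That cost only makes the custom option pay off sooner.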

We use these 3, which are the most reliable ones in our experience:

Write to us, and we'll help you pick the optimal technologies for your video conference.

💵 How much the development of a video conference costs

If you're here, ready-made solutions sold as is for integration into your existing software probably don't suit you, and you need a custom one. The cost of a custom solution depends on its features and their complexity, so we can't name a price before knowing them.

Take even the login function as an example. A simple one works with just an email and password. A complex one may let users log in through Facebook, Google, and others. Each method requires extra effort to implement, so the cost may differ several times over. And login is one of the simplest functions, taking a few work hours. Imagine how much added complexity affects the cost of more complex functions, and you'll probably have quite a lot of them.

Still, we can give some rough indications.

✅ The simplest video chat component takes us 2-4 weeks and costs USD 8,000. It is not a fully functioning system with login, subscriptions, booking, etc. It is just a video chat with a text chat and screen sharing. You'd integrate it into your website or app, and it would receive user info from there.

✅ The simplest fully functional video chat system takes us about 4-5 months and around USD 56,000. It is built from the ground up for one platform, for example web, iOS, or Android. Users register, pick a plan, and use the system.

✅ Developing a big video conferencing solution is ongoing work. The first release takes about 7 months and USD 280,000.

Reach out to us, and let's discuss your project. After the first call, you'll get an approximate estimate.

What Every Android App With Calls Should Have

In today’s world, mobile communication is everything. We are surrounded by apps for audio and video calls, meetings, and broadcasts. With the pandemic, it’s not just business meetings that have moved from meeting rooms to calling apps. Calls to family, concerts, and even consultations with doctors are all now available on apps.

In this article, we'll cover the features every communication app should have, whether it's a small program for calls or a platform for business meetings and webinars. In the following articles, we'll show you some examples of how to implement them.

Incoming call notification

Apps use notifications to tell you about something important. For a communication app, nothing is more important than an incoming call or a scheduled conference that the user forgot about.

So any app with calls should use this mechanism. Of course, we can show the caller's name and photo. For the user's convenience, we can also add buttons to answer or reject the call without extra taps and without opening the app.

You can go even further and change the notification design provided by the system.

However, the options on Android don't end there: you can show a full-screen notification with your own design even when the screen is locked! Read the guide on how to make your Android call notification here.

A notification that keeps the process from being closed

A call may last a long time, so the user may decide to do something else meanwhile, for example, open a text document in another app. At this moment, an unpleasant surprise awaits us. If the system doesn't have enough resources to display that app, it may simply close ours without warning! The call then ends, leaving the user very confused.

Fortunately, the Foreground Service mechanism lets us avoid this. It marks our application as actively used even when it is minimized. After that, the system closes the application only in the most extreme case: when it runs out of resources even for the most crucial processes.

For security reasons, the system requires a small persistent notification that lets the user know the application is performing work in the background.

It is essentially a normal notification, with one difference: it can't be swiped away. So you don't need to worry that the user will accidentally dismiss it and leave the application defenseless against the all-optimizing system once again.

You can get by with a very small notification.

It appears quietly in the notification panel, without popping up in front of the user the way an incoming call notification does.

Nevertheless, it is still a notification, so all the techniques described in the previous section apply to it: you can add buttons and customize the design.

Picture-in-picture for video calls

Now the user can take part in a call or conference and go about their own business without worrying that the call will end abruptly. But we can go even further in supporting multitasking!

If your app has a video call feature, you can show a small video call window (picture-in-picture) for the user’s convenience, even if they go to other app screens. And, starting from Android 8.0, we can show such a window not only in our application but also on top of other applications!

You can also add controls to this window, such as camera switching or pause buttons.
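
One practical detail when setting up such a window: Android only accepts picture-in-picture aspect ratios between 1:2.39 and 2.39:1, and camera frames can come in any shape. The plain-Kotlin helper below has a hypothetical name and no Android dependency. It adjusts a video frame's ratio into that range before the ratio is handed to Android.

```kotlin
import kotlin.math.ceil
import kotlin.math.floor

// Android's documented picture-in-picture range: from 1:2.39 (tallest) to 2.39:1 (widest)
val MIN_PIP_RATIO = 1 / 2.39
val MAX_PIP_RATIO = 2.39

/**
 * Hypothetical helper: returns a width/height pair whose ratio Android will accept,
 * keeping the height and adjusting the width when the frame is too narrow or too wide.
 */
fun pipAspectSize(videoWidth: Int, videoHeight: Int): Pair<Int, Int> {
    require(videoWidth > 0 && videoHeight > 0) { "Video size must be positive" }
    val ratio = videoWidth.toDouble() / videoHeight
    return when {
        // Round up so the result doesn't drop below the minimum ratio
        ratio < MIN_PIP_RATIO -> Pair(ceil(videoHeight * MIN_PIP_RATIO).toInt(), videoHeight)
        // Round down so the result doesn't exceed the maximum ratio
        ratio > MAX_PIP_RATIO -> Pair(floor(videoHeight * MAX_PIP_RATIO).toInt(), videoHeight)
        else -> Pair(videoWidth, videoHeight)
    }
}
```

On Android, the result would be wrapped as `Rational(width, height)` and passed to `PictureInPictureParams.Builder().setAspectRatio(...)`.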

Ability to switch audio output devices

An integral part of any application with calls, video conferences, or broadcasts is audio playback. But how do we know which audio output device the user wants to hear the sound from? We can, of course, try to guess for them, but it's always better to both guess and provide a choice. For example, with this feature, the user won't have to turn off their Bluetooth headphones to switch to the speakerphone.

So if you give the user the ability to switch the audio output device at any point in the call, they will be grateful.

The implementation often depends on the specific application, but there is a method that works in almost all cases. You will learn about it in one of the next articles in this series.
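
The “guess, but let the user choose” rule from this section fits in a small, platform-independent Kotlin function. `AudioOutput`, the priority order, and `selectOutput` are all hypothetical. Routing audio to the chosen device is done with Android's audio APIs and is not shown here.

```kotlin
// Hypothetical audio outputs an app might offer during a call
enum class AudioOutput { BLUETOOTH, WIRED_HEADSET, SPEAKER, EARPIECE }

// Assumed guessing order for a video call; tune it for your app
val defaultPriority = listOf(
    AudioOutput.BLUETOOTH, AudioOutput.WIRED_HEADSET, AudioOutput.SPEAKER, AudioOutput.EARPIECE
)

/**
 * Keeps the user's explicit choice while that device is still available;
 * otherwise guesses the first available output in priority order (null if none).
 */
fun selectOutput(available: Set<AudioOutput>, userChoice: AudioOutput?): AudioOutput? =
    if (userChoice != null && userChoice in available) userChoice
    else defaultPriority.firstOrNull { it in available }
```

If the chosen device disconnects mid-call, calling the function again with the new set of available outputs falls back to the next best guess.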

A deep link to quickly join a conference or a call

For both app distribution and UX, the ability to share a broadcast or invite someone to a call or conference is useful. But it may happen that the person invited is not yet a user of your app.

Well, that won’t be for long. You can generate a special link that will take those who already have the app directly to the call to which they were invited and those who don’t have the app installed to their platform’s app store. iPhone owners will go to the App Store, and Android users will go to Google Play. 

Moreover, with this link, the app launches right after it is installed, and the new user goes straight into the call they were invited to!
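
On the app side, the link eventually reaches your code as a plain URL. On Android, it arrives as the data of the `Intent` that opens your Activity. The app then has to work out which call to join. Here is a minimal, platform-independent Kotlin sketch of that step. The `https://example.com/join/<roomId>` format and the function name are hypothetical. Sending users without the app to the App Store or Google Play is handled by the link service and is not shown.

```kotlin
import java.net.URI
import java.net.URISyntaxException

// Hypothetical invite link format: https://example.com/join/<roomId>
val INVITE_HOST = "example.com"

// Returns the room ID from an invite link, or null if the link is not a valid invite
fun roomIdFromInviteLink(link: String): String? {
    val uri = try {
        URI(link)
    } catch (e: URISyntaxException) {
        return null
    }
    if (uri.scheme != "https" || uri.host != INVITE_HOST) return null
    val segments = uri.path.orEmpty().split('/').filter { it.isNotEmpty() }
    if (segments.size != 2 || segments[0] != "join") return null
    // Room IDs are assumed to contain only letters, digits, and dashes
    val roomId = segments[1]
    return roomId.takeIf { id -> id.all { it.isLetterOrDigit() || it == '-' } }
}
```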

Bottom line

We covered the main system features that help improve the user experience of audio and video apps. They range from protecting the app from being shut down by the system in the middle of a call to UX conveniences like picture-in-picture mode.

Of course, every app is unique, with its own tasks and nuances, so these tips aren't strict rules. Nevertheless, if something from this list fits a particular application, it's worth implementing.

Video surveillance, CCTV and video management software development with IP cameras

2,000 IP cameras stream in our video surveillance system ipivs.com. It works at 450 US police departments, as well as medical education and child advocacy centers.

What features we develop for closed-circuit television (CCTV) software

📹Live video streaming from internet protocol (IP) cameras

Full HD quality, like most videos on YouTube and television. Watch it on a cinema-size screen in an auditorium, and students will see the details.

Video and audio are so well synchronized that speech pathologists work with these streams.

🔎Pan-tilt-zoom – PTZ

Pan means the camera moves left and right to show the whole room.

Tilt means movement up and down.

Zoom means enlarging the view. For example, zoom from observing the whole room to a sheet of paper on the table to see what a patient is writing.

To use PTZ, you need to buy PTZ-enabled IP cameras.

🎬Video recording from digital IP cameras

Hit the record button whenever you want, or schedule a recording, and the stream will start recording automatically.

Set a recurrence, e.g. daily or weekly, and decide how long to repeat it: end after N repetitions, repeat endlessly, or stop on a certain date. For PTZ cameras, set the position to start recording from.
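
The three ways a recurring recording can end map naturally onto a small Kotlin model. In this minimal sketch, `RecurrenceEnd` and `recordingStarts` are hypothetical names. Time zones and PTZ positions are left out.

```kotlin
import java.time.LocalDateTime
import java.time.Period

// How a scheduled recording ends, mirroring the three options above
sealed class RecurrenceEnd {
    data class AfterCount(val count: Int) : RecurrenceEnd()
    data class Until(val lastStart: LocalDateTime) : RecurrenceEnd()
    object Never : RecurrenceEnd()
}

/**
 * Hypothetical scheduler helper: lists upcoming recording start times.
 * `step` is the repeat interval, e.g. Period.ofDays(1) for daily or Period.ofWeeks(1) for weekly.
 * `limit` caps the output so an endless schedule still returns a finite list.
 */
fun recordingStarts(
    first: LocalDateTime,
    step: Period,
    end: RecurrenceEnd,
    limit: Int = 10,
): List<LocalDateTime> =
    generateSequence(first) { it.plus(step) }
        .take(if (end is RecurrenceEnd.AfterCount) minOf(end.count, limit) else limit)
        .takeWhile { end !is RecurrenceEnd.Until || !it.isAfter(end.lastStart) }
        .toList()
```

A scheduler could call this periodically and start the stream recording whenever the next start time arrives.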

Save recordings in any popular format, e.g. MP4, and convert between formats.

💻Software-as-a-Service

No hardware is needed on site except the IP cameras. We program all the functions that Digital Video Recorders (DVR) and Network Video Recorders (NVR) have, so video is processed and stored on a server. The servers are usually in the cloud, rented from providers like Amazon. However, they can also be in your local server room.

🗣Talking CCTV

Some IP cameras have speakers, so you can speak into your laptop mic and people near the IP camera will hear you. Scare intruders away 🙂

📼Marks on video 

Add comments to live or recorded video while watching it. Police officers mark confessions so they don't have to re-watch the whole interrogation to find them again.

📸Closed-circuit digital photography (CCDP)

Take pictures with your IP cameras and save them.

❌Permission control

Video of people is sensitive content and subject to legal regulations. For example, doctors must access only their own patients' interviews, not their colleagues'. We develop software with as many user roles as you need. When users log in, they only have access to the content permitted to them.
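
The doctor example above boils down to a role check applied to every recording. In this minimal Kotlin sketch, the roles and types are hypothetical. A real system would load them from its database and enforce the check on the server.

```kotlin
// Hypothetical roles; a real system would have as many as the client needs
enum class Role { ADMIN, DOCTOR }

data class User(val id: String, val role: Role)

// A recorded interview and the doctor responsible for that patient
data class Recording(val id: String, val patientId: String, val ownerDoctorId: String)

// Admins see everything; doctors see only recordings of their own patients
fun canView(user: User, recording: Recording): Boolean = when (user.role) {
    Role.ADMIN -> true
    Role.DOCTOR -> recording.ownerDoctorId == user.id
}

// What a user gets after logging in: only the recordings they are permitted to see
fun visibleRecordings(user: User, all: List<Recording>): List<Recording> =
    all.filter { canView(user, it) }
```

Using `when` over an enum means that adding a new role won't compile until its access rule is defined, so no role gets access by accident.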

🕹Operate with hardware buttons

Push a real button on the wall to start streaming and recording, and an “In Use” sign lights up. Stop the recording from your computer or with the same button.

👋Movement and object recognition

Some IP cameras have motion detectors. When they detect movement, they can start recording or play a warning sound to scare intruders away, and you get an SMS or push notification about it.

Define “suspicious objects”, and the system will warn you when a camera spots one. We developed an app where military drones monitor land for enemy soldiers and vehicles this way. Neural networks help the app recognize them better and better. We recognize objects on live video with OpenCV.

👂Voice recognition

Type a word, and get every spot in the video where it is spoken marked. Police officers search through interrogation recordings this way. For speech recognition, we use Amazon Transcribe, one of the Amazon Web Services (AWS) products.
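
This search works because speech recognition returns each recognized word together with its timestamp. The sketch below assumes that result has already been converted into a simple list. `TranscriptWord` and `findWord` are hypothetical; this is not the actual Amazon Transcribe output format.

```kotlin
// Hypothetical shape of one recognized word and when it starts in the video, in seconds
data class TranscriptWord(val text: String, val startSeconds: Double)

// Returns the start time of every occurrence of `query`, ignoring case and punctuation
fun findWord(transcript: List<TranscriptWord>, query: String): List<Double> {
    fun normalize(s: String) = s.lowercase().filter { it.isLetterOrDigit() }
    val target = normalize(query)
    if (target.isEmpty()) return emptyList()
    return transcript.filter { normalize(it.text) == target }.map { it.startSeconds }
}
```

A video player can then turn each returned start time into a marker on its timeline.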

🎬Create clips

Trim videos and save short clips. Delete unneeded parts.

Burn videos to CDs. Yep, the police still use them in 2020.

Devices for which we develop VMS and video surveillance software

What IP camera to pick for a video surveillance application

Start with cameras that support the ONVIF standard, and program your software to support it.

Most IP cameras support the ONVIF standard. It's a standard API (application programming interface): a “language” a program can use to talk to IP cameras.

Axis, Bosch, and Sony founded ONVIF. Most of their cameras should support it, but it isn't guaranteed. Other manufacturers want their cameras to sell well, so they are interested in supporting ONVIF. However, not every camera supports it, because the standard is voluntary.

If a camera doesn't support ONVIF, support for it has to be programmed separately. So you can't write the code once and have it work with every camera.

So, a safe bet among IP camera brands is starting with Axis. Axis has the largest market share – your software will support more cameras than it would with any other choice. Many Bosch and Sony cameras will work too as they support ONVIF.
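
To make the ONVIF “language” more concrete: ONVIF requests are SOAP (XML) messages sent over HTTP. As a minimal sketch, here is how a Kotlin program could ask a camera for its manufacturer, model, and firmware with ONVIF's `GetDeviceInformation` operation. This rests on two assumptions. First, the device service address: `/onvif/device_service` is a common path, but a camera may use another one. Second, authentication: many cameras require it (ONVIF uses WS-Security UsernameToken), and it is omitted here.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// SOAP 1.2 envelope for ONVIF's GetDeviceInformation (manufacturer, model, firmware, ...)
fun deviceInformationRequest(): String = """
    <?xml version="1.0" encoding="UTF-8"?>
    <s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope"
                xmlns:tds="http://www.onvif.org/ver10/device/wsdl">
      <s:Body>
        <tds:GetDeviceInformation/>
      </s:Body>
    </s:Envelope>
""".trimIndent()

// Posts the request to the camera's device service and returns the raw XML response.
// The URL is an assumption, and no authentication is added.
fun getDeviceInformation(serviceUrl: String): String {
    val request = HttpRequest.newBuilder(URI.create(serviceUrl))
        .header("Content-Type", "application/soap+xml; charset=utf-8")
        .POST(HttpRequest.BodyPublishers.ofString(deviceInformationRequest()))
        .build()
    return HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
        .body()
}
```

For example, `getDeviceInformation("http://192.168.1.64/onvif/device_service")` would work on a camera at that placeholder address that doesn't require authentication for this call. Because every ONVIF camera understands the same envelope, this code doesn't change between brands.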

What industries we developed video surveillance and management software for

👮‍♂️Police interviews

🔎Forensic interviews

🔬Clinical observation and recording

🧨Military drone observation

🎰 Poker: recognition of chips in real casinos, recognition of cards in online casinos

How much it costs to develop video surveillance software

The initial working version of a video surveillance website takes us about 3 months and costs around USD 24,800. It lets users add IP cameras, watch live streams, and record.

However, custom software needs individual planning and estimation.

For ipivs.com, we work on an ongoing basis and provide a dedicated team.

Send us a message through Request a quote. We’ll estimate the time and price for your project.