Screen sharing is one of the basic features of video call platforms: Skype, WhatsApp, Telegram, Teams, and Google Meet all have it.
You can enable screen sharing right when you create a new video call, in advance, before it actually starts.
However, we will look at the most common case, when screen sharing starts after the call itself has begun.
To simplify the description of the screen sharing implementation, let’s say that we already have a ready-made application with WebRTC calls. Read more about the implementation of the WebRTC video call.
Steps for implementation will be the following:
Accessing screen content
Creating a video track with a screen image
Replacing the camera video track with the screen video track
Displaying a notification of an ongoing screen sharing
Now each one in detail:
Accessing screen content
First we get access to capturing the screen content and device sound with Media Projection API:
val screenSharingPermissionLauncher = registerForActivityResult(
    ActivityResultContracts.StartActivityForResult()
) { result ->
    // Handle the request result
    val screenSharingIntent = result.data
    if (screenSharingIntent != null) {
        // The request succeeded
    }
}
val mediaProjectionManager = getSystemService(Context.MEDIA_PROJECTION_SERVICE) as MediaProjectionManager
val intent = mediaProjectionManager.createScreenCaptureIntent()
screenSharingPermissionLauncher.launch(intent)
When screenSharingPermissionLauncher.launch(intent) is called, a dialog window appears. It tells the user that media projection will get access to all the information displayed on the screen.
As a result of successfully getting access to the screen content, we receive screenSharingIntent.
Creating a video track with a screen image
Create videoCapturer, which will capture the image from the screen:
val mediaProjectionCallback = object : MediaProjection.Callback() {
    override fun onStop() {
        // Screen capture stopped
    }
}

val videoCapturer = ScreenCapturerAndroid(screenSharingIntent, mediaProjectionCallback)
Then create localVideoTrack:
val surfaceTextureHelper = SurfaceTextureHelper.create("CaptureThread", eglBase.eglBaseContext)
val videoSource = peerConnectionFactory.createVideoSource(/* isScreencast = */ true)
videoCapturer.initialize(surfaceTextureHelper, context, videoSource.capturerObserver)
videoCapturer.startCapture(displayWidth, displayHeight, fps)
val localVideoTrack = peerConnectionFactory.createVideoTrack(VIDEO_TRACK_ID, videoSource)
Replacing the camera video track with the screen video track
To replace the video track correctly, implement the renegotiation logic for both call participants. When local media tracks change, WebRTC calls onRenegotiationNeeded, which restarts the SDP exchange process:
val peerConnectionObserver = object : PeerConnection.Observer {
    ...
    override fun onRenegotiationNeeded() {
        // Launch the SDP exchange
        peerConnection.createOffer(...)
    }
}
val peerConnection = peerConnectionFactory.createPeerConnection(iceServers, peerConnectionObserver)
Now to replacing the video track itself. Remove the camera video track from the local media stream:
localMediaStream.removeTrack(cameraVideoTrack)
Stop capturing the camera video:
cameraVideoCapturer.stopCapture()
Add screen sharing video track:
localMediaStream.addTrack(screenVideoTrack)
Displaying a notification about ongoing screen sharing
At the start of the screen sharing, it’s necessary to run the Foreground Service with the notification that the demonstration has started.
Create a ScreencastService and add it to AndroidManifest.xml. Also specify the foregroundServiceType parameter:
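A minimal sketch of what the manifest entries might look like (the ScreencastService class name matches this example; on Android 9+ the FOREGROUND_SERVICE permission is also required, and on Android 14+ a dedicated FOREGROUND_SERVICE_MEDIA_PROJECTION permission exists as well):
<uses-permission android:name="android.permission.FOREGROUND_SERVICE" />

<service
    android:name=".ScreencastService"
    android:exported="false"
    android:foregroundServiceType="mediaProjection" />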
Before replacing the camera video track with the screen sharing video track, launch ScreencastService:
val intent = Intent(this, ScreencastService::class.java)
ContextCompat.startForegroundService(this, intent)
Then, in ScreencastService (e.g. in onStartCommand()), call the startForeground method:
startForeground(NOTIFICATION_ID, notification)
Common issues with implementation
The app crashes on Android 10+ devices with the “Media projections require a foreground service of type ServiceInfo.FOREGROUND_SERVICE_TYPE_MEDIA_PROJECTION” error
Foreground Service ensures that the system will not “kill” the app during screen sharing. The Foreground Service notification informs the user that screen sharing is running and lets them quickly return to the application.
How to solve: do not forget to display the notification about the started screen sharing 🙂
The camera video track is not replaced with the screen one
This might occur if the renegotiation logic is not implemented (correctly) on one or both callers’ sides.
How to solve: override the onRenegotiationNeeded method in PeerConnection.Observer (the method name may differ on other platforms). When onRenegotiationNeeded is called, the SDP exchange process must be started.
Conclusion
In this article we covered the implementation of screen sharing in a video call and how you can:
Access screen content with MediaProjection API
Capture screen content with ScreenCapturerAndroid
Create a local video track with screen image
Replace the camera video track with the screen video track
Implement a Foreground Service for displaying the screen sharing notification
The phrase “augmented reality” or AR has long been on everyone’s lips and is used in many areas of life. AR is being actively implemented in mobile applications as well. A large part of the AR market is occupied by entertainment applications. Remember the PokemonGo fever of 2016? However, entertainment is not the only area with AR. Tourism, medicine, education, healthcare, retail, and other areas also actively use AR. According to studies, by the end of 2020, there were almost 600 million active users of mobile apps with AR. By 2024, a nearly three-fold growth (1.7 billion) is predicted, and the amount of revenue from such applications is estimated at $ 26 billion. The future is very close!
That’s why in this article we’ll consider several popular tools for Android mobile app development with AR functionality, their pros and cons.
History of AR
It’s been quite a long time since the advent of AR technology and its implementation in smartphones. It was originally part of VR. In 1961, Philco Corporation (USA) developed the first Headsight virtual reality helmets. Like most inventions, they were first used for the needs of the Department of Defense. Then the technology evolved: there were various simulators, virtual helmets, and even goggles with gloves. Their distribution was not widespread, but these technologies interested NASA and the CIA. In 1990, Tom Caudell coined the term “augmented reality”. We can say that from that moment on, AR became separate from VR. In the ’90s, there were many interesting inventions: an exoskeleton that allowed the military to virtually control cars, and gaming platforms. In 1993, Sega announced a VR headset for its Genesis console. However, this product did not become mass-market: users reported nausea and headaches during games. The high cost of devices, scarce technical equipment, and side effects forced people to forget about VR and AR technologies in the mass segment for a while. In 1994, AR made its way into the arts for the first time with a theater production called Dancing in Cyberspace, in which acrobats danced in virtual space.
In 2000, in the popular game Quake, thanks to the virtual reality helmet, it became possible to chase monsters in the street. This may have inspired the future creators of the game Pokemon Go. Until the 2010s, attempts to bring AR to the masses were not very successful.
In the 2010s, quite successful projects appeared: MARTA (an application from Volkswagen that gives step-by-step recommendations on car repair and maintenance) and Google Glass glasses. At the same time, the introduction of AR in mobile applications begins: Pokemon Go, IKEA Place, the integration of AR in various Google applications (Translate, Maps, etc.), the introduction of filters in Instagram, etc. Currently, there are more and more mobile applications with AR and their use is spreading not only in the field of entertainment.
What is AR and how it works on a smartphone
Essentially, AR is based on computer vision technology. It all starts with a device that has a camera on it. The camera scans an image of the real world. That’s why when you run most AR apps, you’re first asked to move the camera around in space for a while. Then the pre-installed AR engine analyzes this information and builds a virtual world based on it, in which it places an AR object or several objects (picture, 3D model, text, video) on the background of the original image. AR objects can be pre-stored in the phone memory or can be downloaded from the Internet in real-time. The application remembers the location of the objects, so the position of the objects does not change when the smartphone moves unless it is specifically provided by the application functionality. Objects are fixed in space with special markers – identifiers. There are 3 main methods for AR technology to work:
Natural markers. A virtual grid is superimposed on the surrounding world. On this grid, the AR engine identifies anchor points, which determine the exact location to which the virtual object will be attached in the future. Benefit: Real-world objects serve as natural markers. No need to create markers programmatically.
Artificial markers. The appearance of the AR object is tied to some specific marker created artificially, such as the place where the QR code was scanned. This technology works more reliably than with natural markers.
Spatial technology. In this case, the position of the AR object is attached to certain geographical coordinates. GPS/GLONASS, gyroscope, and compass data embedded in the smartphone are used.
Tools for AR in Android
AR tools comparison table
Google ARCore
The first thing that comes to mind is Google’s ARCore. ARCore isn’t an SDK, but a platform for working with AR. So you have to additionally implement the graphical elements that the user interacts with. This means that we can’t do everything with ARCore alone, and we need to implement technologies to work with graphics.
There are several solutions for this.
If you want to use Kotlin:
Until recently, you could use Google’s dedicated Sceneform SDK. But in 2020, Google moved Sceneform to the archive and withdrew further support for it. Currently, the Sceneform repository is maintained by enthusiasts and is available here. It must be said that the repository is updated quite frequently. However, there is still a risk of using technology that is not supported by Google.
Integrate OpenGL into the project. OpenGL is a library written in C++ specifically to work with graphical objects. Android provides an SDK to work with OpenGL to run on Kotlin and Java. This option is suitable if your developers know how to work with OpenGL or can figure it out quickly (which is a non-trivial task).
If you want to use something that isn’t Kotlin:
Android NDK. If your developers know C++, they can use the Android NDK for development. However, they will also need to deal with graphics there. The OpenGL library already mentioned will be suitable for this task.
Unreal Engine. There is nothing better for dealing with graphics than game engines. Unfortunately, the dedicated ARCore SDK for Unity is no longer supported, but Unreal Engine developers can still develop AR applications.
Vuforia
Another popular tool for developing AR applications is Vuforia, developed by PTC. Unlike ARCore, Vuforia can work with ordinary 2D and 3D objects as well as video and audio. You can create virtual buttons, change the background, and control occlusion, a state where one object is partially hidden by another.
Fun fact: using Vuforia, a developer can turn on ARCore under the hood. Furthermore, the official Vuforia documentation recommends that you do this. That is, while running the application, Vuforia will check to see if it is possible to use ARCore on the device and if so, it will do so.
Unfortunately, bad news again for Kotlin fans: Vuforia can only be used with C or Unity. Another downside is that if you plan to publish your application for commercial purposes, you will have to buy a paid version of Vuforia (see Vuforia prices).
ARToolKit
This library is completely free. However, the documentation leaves a lot to be desired, and the official website does not respond to clicks on menu items. Apparently, ARToolKit supports Android development via Unity. Using this library is quite risky.
MAXST
A popular solution from Korea with very detailed documentation. There is an SDK for working with 2D and 3D objects, available for Java and Unity. In Java, you need to additionally implement the work with graphics. The official website states that the SDK works on Android starting from version 4.3, which is a huge plus for those who want to cover the maximum number of devices. However, this SDK is paid if you plan to publish the app. The prices are here.
Wikitude
A development by an Austrian company that was recently acquired by Qualcomm. It allows you to recognize and track 2D and 3D objects, images, and scenes, work with geodata, and integrate with smart glasses. There is a Java SDK (you need to additionally implement the work with graphics), as well as Unity and Flutter SDKs. This solution is paid, but you can try the free version for 45 days.
Conclusion
Now there is a choice of frameworks to develop AR applications for Android. Of course, there are many more, but I have tried to collect the most popular ones. I hope this will help you with your choice. May Android be with you.
Let’s take a look at two more UX conveniences for an Android calling application. First, let’s make sure that the app continues to function normally after being minimized or after the screen is locked, using Android Foreground Services. After that, let’s see how we can implement direct links to a call or conference with Deep Links: by clicking on them, smartphone users will be taken directly to the call.
How to create a Foreground Service on Android
Today’s smartphones and their operating systems have many built-in optimizations aimed at extending battery life. And mobile app developers need to keep in mind the potential actions the system can take on the app.
A prime example is freeing up resources and closing apps that the user is not actively interacting with at the moment. In this case, the system considers only the app that is currently displayed on the user’s screen to be “actively used”. All other running applications can be closed at any time if the system does not have enough resources for the actively used one. Thanks to this, we can open an infinite number of applications and not explicitly close them — the system will close the old ones, and when we return to them, the application will run again.
In general, this mechanism is convenient and necessary on mobile devices. But we want to bypass this restriction so that the call is protected from sudden closure by the system. Fortunately, it is possible to “mark” a part of the application as actively used, even if it is not displayed anymore. To do this, we use the Foreground Service. Note that even this does not give full protection from the system — but it increases the “priority” of the application in the eyes of the system and also allows you to keep some objects in memory even if `Activity` is closed.
Let’s implement the service itself. In its simplest form, it’s just a subclass of Service that holds a reference to our `CallManager` (so it won’t be cleaned up by the garbage collector):
class OngoingCallService : Service() {
@Inject
lateinit var abstractCallManager: AbstractCallManager
// Implementation of an abstract method; we won’t use Bind so just return null
override fun onBind(intent: Intent): IBinder? = null
}
Service is an application component and, like Activity, must be specified in `AndroidManifest.xml`:
<!-- Declare our service. exported="false" means that other applications can’t start it,
     and foregroundServiceType declares what the service is used for -->
<service
    android:name=".OngoingCallService"
    android:enabled="true"
    android:exported="false"
    android:foregroundServiceType="microphone|camera|phoneCall" />
Our Android Foreground Service starts up a bit differently than regular services:
private fun startForegroundService() {
val intent = Intent(this, OngoingCallService::class.java)
ContextCompat.startForegroundService(this, intent)
}
On Android versions above 8, the Foreground Service must call the startForeground method within a few seconds, otherwise, the application is considered to be hung (ANR). It is necessary to pass a notification to this method because, for security reasons, the presence of such services should be visible to the user (if you do not know or have forgotten how to create notifications, you can refresh your memory in one of our previous articles about call notifications on Android):
val notification = getNotification()
startForeground(ONGOING_NOTIFICATION_ID, notification)
Everything that we wrote in the previous article about notifications applies to this notification — you can update it with the list of call participants, add buttons to it, or change its design completely. The only difference is that this notification will be `ongoing` by default and users won’t be able to “swipe” it.
On Android 13 and above POST_NOTIFICATIONS permission is required to display notifications. Declare it in the manifest:
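A minimal sketch of the declaration:
<uses-permission android:name="android.permission.POST_NOTIFICATIONS" />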
You also need to request this permission at runtime, for example when entering the application. To learn more about requesting permissions, read the documentation.
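A minimal sketch of such a request with the Activity Result API (where exactly you launch it is up to your app; the launcher name is just an example):
private val notificationPermissionLauncher = registerForActivityResult(
    ActivityResultContracts.RequestPermission()
) { isGranted ->
    // React to the user's decision here if needed
}

private fun requestNotificationPermissionIfNeeded() {
    if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.TIRAMISU) {
        notificationPermissionLauncher.launch(Manifest.permission.POST_NOTIFICATIONS)
    }
}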
If the user denies the notification permission, they still see notices related to foreground services in the Foreground Services (FGS) Task Manager but don’t see them in the notification drawer.
When the call is over, the service must be stopped; otherwise, the application can be completely closed only through the settings, which is very inconvenient for users. Our service is stopped in the same way as regular Android services:
private fun stopForegroundService() {
val intent = Intent(this, OngoingCallService::class.java)
stopService(intent)
}
Starting and stopping a service is very convenient to implement if CallManager has a reactive field to monitor the status of the call, for example:
abstractCallManager.isInCall
.collect { if (it) startForegroundService() else stopForegroundService() }
This is the whole implementation of the service, which will, to some extent, protect our minimized application from being closed by the system.
Android Deep Links Tutorial
An extremely user-friendly feature that helps grow the user base of an app is links to a certain place in the app. If the user doesn’t have the app, the link opens its page on Google Play. In the context of call apps, the most successful use case is the ability to share a link to a call / meeting / room. The user wants to talk to someone, sends the link to the other person, that person downloads the app and then gets right into the call. What could be more convenient?
The links themselves to a particular location in the application are supported by the system without any additional libraries. But in order for the link to “survive” the installation of the application, we need to ask for help from Firebase Dynamic Links.
Let’s concentrate on the implementation of links handling in the application and leave their creation to backend developers.
So, on to Android deep links with code examples. First, let’s add the library:
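A sketch of the Gradle dependency, assuming the Firebase BoM is already set up in the project (the BoM version is only an example):
dependencies {
    implementation(platform("com.google.firebase:firebase-bom:32.7.0"))
    implementation("com.google.firebase:firebase-dynamic-links-ktx")
}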
To the user, deep links are ordinary links that they click on. But before opening a link in the browser, the system looks through the registry of applications and finds those that have declared that they handle links for this domain. If such an application is found, then instead of opening the browser, the system launches that application and passes the link to it. If there is more than one such application, a system window will be shown with a list where the user can choose which application to open the link with. If you own the link domain, you can prevent other applications from opening such links while yours is installed.
To declare the links that our app can handle, we need to add an intent-filter to our `Activity` in `AndroidManifest.xml`:
<activity ...>
    <intent-filter>
        <!-- This action and category tell the system that we can “display” links -->
        <action android:name="android.intent.action.VIEW"/>
        <category android:name="android.intent.category.DEFAULT"/>
        <category android:name="android.intent.category.BROWSABLE"/>
        <!-- Description of the links we can handle; in this case, links starting with calls://forasoft.com -->
        <data
            android:host="forasoft.com"
            android:scheme="calls"/>
    </intent-filter>
</activity>
When the user clicks the Dynamic Link and installs the application (or clicks on the link with the app already installed), the Activity indicated as this link’s handler will launch. In this Activity, we can get the link this way:
Firebase.dynamicLinks
.getDynamicLink(intent)
.addOnSuccessListener(this) { data ->
val deepLink: Uri? = data?.link
}
With regular deep links (without Firebase Dynamic Links), getting the data is a bit simpler:
val deepLink = intent?.data
That’s all, now all we have left is getting the parameters that interest us from the link and carrying out the actions in your application that are necessary to connect to the call:
val meetingId = deepLink?.getQueryParameter("meetingid")
if (meetingId != null) abstractCallManager.joinMeeting(meetingId)
Conclusion
In the final article of our series “what each application with calls should have”, we’ve gone through keeping the application alive after it is minimized and using deep links as a convenient way to invite people to a call. Now you know all the mechanisms that make the user experience better not only inside the application but also at the system level.
Automatically change your audio output on Android app
Seamless and timely switching between sound output devices on Android is a feature that is usually taken for granted, but its absence (or problems with it) is very annoying. Today we will analyze how to implement such switching in Android calling apps, from manual switching by the user to automatic switching when a headset is connected. At the same time, let’s talk about pausing the rest of the system audio for the duration of the call. This implementation is suitable for almost all calling applications since it operates at the system level rather than the call engine level, e.g., WebRTC.
Audio output device management
All management of Android sound output devices is implemented through the system’s `AudioManager`. To work with it you need to add permission to `AndroidManifest.xml`:
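A minimal sketch of the permission declaration:
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />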
First of all, when a call starts in our app, it is highly recommended to capture the audio focus — let the system know that the user is now communicating with someone, and it is best not to be distracted by sounds from other apps. For example, if the user was listening to music, but received a call and answered — the music will be paused for the duration of the call.
There are two mechanisms of audio focus request — the old one is deprecated, and the new one is available since Android 8.0. We implement for all versions of the system:
// Receiving an AudioManager sample
val audioManager = context.getSystemService(Context.AUDIO_SERVICE) as AudioManager
// We need a "request" for the new approach. Let's generate it for versions >=8.0 and leave null for older ones
@RequiresApi(Build.VERSION_CODES.O)
private fun getAudioFocusRequest() =
    AudioFocusRequest.Builder(AudioManager.AUDIOFOCUS_GAIN)
        // Describe our audio as speech used for communication
        .setAudioAttributes(
            AudioAttributes.Builder()
                .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
                .setUsage(AudioAttributes.USAGE_VOICE_COMMUNICATION)
                .build()
        )
        .build()
// Focus request
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
// Use the generated request
audioManager.requestAudioFocus(getAudioFocusRequest())
} else {
audioManager.requestAudioFocus(
// Listener of receiving focus. Let's leave it empty for the sake of simpleness
{ },
// The legacy API takes a stream type, so we request focus for the voice call stream
AudioManager.STREAM_VOICE_CALL,
AudioManager.AUDIOFOCUS_GAIN
)
}
It is important to specify the most appropriate `ContentType` and `Usage` — based on these, the system determines which of the custom volume settings to use (media volume or ringer volume) and what to do with the other audio sources (mute, pause, or allow to run as before).
val savedAudioMode = audioManager.mode
val savedIsSpeakerOn = audioManager.isSpeakerphoneOn
val savedIsMicrophoneMuted = audioManager.isMicrophoneMute
Great, we’ve got audio focus. It is highly recommended to save the original AudioManager settings, as shown above, before changing anything; this will allow us to restore them to the previous state when the call is over. You should agree that it would be very inconvenient if one application’s volume control affected all the others.
Now we can start setting our defaults. They may depend on the type of call (usually audio calls go to the earpiece and video calls to the speakerphone), on the user settings in the application, or just on the last used output. Our conditional app is a video app, so we’ll set up the speakerphone right away:
// Moving AudioManager to the "call" state
audioManager.mode = AudioManager.MODE_IN_COMMUNICATION
// Enabling speakerphone
audioManager.isSpeakerphoneOn = true
Great, we have applied the default settings. If the application design provides a button to toggle the speakerphone, we can now very easily implement its handling:
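A minimal sketch of such a handler (the function name and the place it is called from are up to your UI):
fun onToggleSpeakerphoneClick() {
    // Simply invert the current state
    audioManager.isSpeakerphoneOn = !audioManager.isSpeakerphoneOn
}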
We’ve learned how to implement hands-free switching, but what happens if you connect headphones? Nothing, because `audioManager.isSpeakerphoneOn` is still `true`! And the user, of course, expects that when headphones are plugged in, the sound will start playing through them. And vice versa — if we have a video call, then when we disconnect the headphones the sound should start playing through the speakerphone.
There is no way out, we have to monitor the connection of the headphones. Let me tell you right away, the connection of wired and Bluetooth headphones is tracked differently, so we have to implement two mechanisms at once. Let’s start with wired ones and put the logic in a separate class:
class HeadsetStateProvider(
    private val context: Context,
    private val audioManager: AudioManager
) {
    // The current state of the wired headset; true means connected
    val isHeadsetPlugged = MutableStateFlow(getHeadsetState())

    // Create a BroadcastReceiver to track headset connection and disconnection events
    private val receiver = object : BroadcastReceiver() {
        override fun onReceive(context: Context?, intent: Intent) {
            if (intent.action == AudioManager.ACTION_HEADSET_PLUG) {
                when (intent.getIntExtra("state", -1)) {
                    // 0 -- the headset is disconnected, 1 -- the headset is connected
                    0 -> isHeadsetPlugged.value = false
                    1 -> isHeadsetPlugged.value = true
                }
            }
        }
    }

    init {
        val filter = IntentFilter(AudioManager.ACTION_HEADSET_PLUG)
        // Register our BroadcastReceiver
        context.registerReceiver(receiver, filter)
    }

    // Returns the current headset state. It's only used to initialize the starting value.
    fun getHeadsetState(): Boolean {
        val audioDevices = audioManager.getDevices(AudioManager.GET_DEVICES_OUTPUTS)
        return audioDevices.any {
            it.type == AudioDeviceInfo.TYPE_WIRED_HEADPHONES
                || it.type == AudioDeviceInfo.TYPE_WIRED_HEADSET
        }
    }
}
In our example, we use `StateFlow` to implement a subscription to the connection state, but we could instead implement, for example, a `HeadsetStateProviderListener` interface.
Now just initialize this class and observe the `isHeadsetPlugged` field, turning the speaker on or off when it changes:
headsetStateProvider.isHeadsetPlugged
// If the headset isn't on, speakerphone is.
.onEach { audioManager.isSpeakerphoneOn = !it }
.launchIn(someCoroutineScope)
Bluetooth headphones connection monitoring
Now we implement the same monitoring mechanism for such Android sound output devices as Bluetooth headphones:
class BluetoothHeadsetStateProvider(
    private val context: Context,
    private val bluetoothManager: BluetoothManager
) {
    val isHeadsetConnected = MutableStateFlow(getHeadsetState())

    init {
        // Get the adapter from BluetoothManager and install our ServiceListener
        bluetoothManager.adapter.getProfileProxy(context, object : BluetoothProfile.ServiceListener {
            // This method is called when the headset profile connects
            override fun onServiceConnected(profile: Int, proxy: BluetoothProfile?) {
                // Checking that it is the headset profile that became active
                if (profile == BluetoothProfile.HEADSET)
                    // Refreshing the state
                    isHeadsetConnected.value = true
            }

            // This method is called when the headset profile disconnects
            override fun onServiceDisconnected(profile: Int) {
                if (profile == BluetoothProfile.HEADSET)
                    isHeadsetConnected.value = false
            }
            // Requesting the ServiceListener for headsets
        }, BluetoothProfile.HEADSET)
    }

    // Returns the current state of the Bluetooth headset. It's only used to initialize the starting value
    private fun getHeadsetState(): Boolean {
        val adapter = bluetoothManager.adapter
        // Checking if there are connected headsets
        return adapter?.getProfileConnectionState(BluetoothProfile.HEADSET) == BluetoothProfile.STATE_CONNECTED
    }
}
Just as with the wired headset, observe the `isHeadsetConnected` field and switch the speakerphone when it changes.
To work with Bluetooth, we need another permission. For Android 12 and above, you need to declare the following permission in the manifest file and request it at runtime:
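A minimal sketch of the manifest declaration:
<uses-permission android:name="android.permission.BLUETOOTH_CONNECT" />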
Finally, when the call is over, we release the audio focus. Again, the implementation depends on the system version:
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
audioManager.abandonAudioFocusRequest(getAudioFocusRequest())
} else {
// Let's leave it empty for simplicity
audioManager.abandonAudioFocus { }
}
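At the same moment it makes sense to restore the AudioManager settings we saved at the beginning of the call; a minimal sketch:
audioManager.mode = savedAudioMode
audioManager.isSpeakerphoneOn = savedIsSpeakerOn
audioManager.isMicrophoneMute = savedIsMicrophoneMuted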
Limitations
In the app you can switch the sound output between three device types:
speaker
earpiece or wired
Bluetooth device
However, you cannot switch between two Bluetooth devices. On Android 11, though, there is a feature to add a device switcher to the notification. The switcher displays all available devices that support volume control, so it simply won’t show users the devices they can’t switch to from the one they’re currently using as an output.
To add the switcher, use a notification with the Notification.MediaStyle style and a MediaSession connected to it:
val mediaSession = MediaSession(this, MEDIA_SESSION_TAG)
val style = Notification.MediaStyle()
.setMediaSession(mediaSession.sessionToken)
val notification = Notification.Builder(this, CHANNEL_ID)
.setStyle(style)
.setSmallIcon(R.drawable.ic_launcher_foreground)
.build()
But how does Spotify have that quick and easy device switcher?
Our reader noticed that Spotify does have a feature where you can switch between any devices you need. We cannot know for sure how they do it, but most likely Spotify implemented audio device switching with the MediaRouter API, which is used for seamless data exchange between devices.
Great, here we have implemented a solid UX for switching between Android sound output devices in our app. The main advantage of this approach is that it is almost independent of the specific implementation of calls: in any case, the played audio is controlled by `AudioManager`, and that is exactly the level at which we control it!
In recent years, smartphones have become increasingly close to computers in terms of functionality, and many are already replacing the PC as their primary tool for work. The advantage of personal computers was multi-window capability, which remained unavailable on smartphones. But with the release of Android 7.0, this began to change and multi-window support appeared.
It’s hard to overestimate the convenience of a small floating window with the video of the interlocutor when the call is minimized — you can continue the dialogue and simultaneously take notes or clarify some information. Android has two options for implementing this functionality: support for the application in a floating window and a picture-in-picture mode. Ideally, an application should support both approaches, but the floating window is more difficult to develop and imposes certain restrictions on the overall application design, so let’s consider picture-in-picture (PiP) on Android as a relatively simple way to bring multi-window support into your application.
Switching to PIP mode
Picture-in-picture mode is supported on most devices with Android 8 and above. Accordingly, if you support system versions lower than this, all PIP mode-related calls should be wrapped in the system version check:
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
// Something related to PiP
}
The entire `Activity` is converted to PIP, so first you need to declare PIP support and that the `Activity` handles configuration changes in `AndroidManifest.xml`:
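A minimal sketch of such a declaration (the Activity class name is just an example):
<activity
    android:name=".CallActivity"
    android:supportsPictureInPicture="true"
    android:configChanges="screenSize|smallestScreenSize|screenLayout|orientation" />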
Before using picture-in-picture, it is necessary to make sure that the user’s device supports this mode. To do this, we turn to the `PackageManager`:
val isPipSupported = context.packageManager.hasSystemFeature(PackageManager.FEATURE_PICTURE_IN_PICTURE)
After that, in its simplest form, the transition to picture-in-picture mode is done literally with one line:
this.enterPictureInPictureMode()
But to switch to it, you need to know when it is convenient for the user. You can make a separate button and switch when it is tapped. However, the most common approach is an automatic switch when the user minimizes the application during a call, for example using the Home or Recents button.
Starting with Android 12, this behavior can be implemented by setting PictureInPictureParams with the setAutoEnterEnabled flag on the Activity:
val pipParams = PictureInPictureParams.Builder()
.setAutoEnterEnabled(true)
.build()
setPictureInPictureParams(pipParams)
On devices with Android 11 or lower, an activity must explicitly call enterPictureInPictureMode() in `Activity.onUserLeaveHint`:
override fun onUserLeaveHint() {
...
if (isPipSupported && imaginaryCallManager.isInCall)
this.enterPictureInPictureMode()
}
Interface adaptation
Great, now our call screen automatically goes into picture-in-picture mode on Android! But call screens often have “end call” or “switch camera” buttons, and they will not work in this mode, so it’s better to hide them for the transition.
To track the transition to / from PIP mode, `Activity` and `Fragment` have the `onPictureInPictureModeChanged` method. Let’s override it and hide the unnecessary interface elements, as in the sketch below.
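A minimal sketch of such an override in an Activity (hideCallControls and showCallControls are hypothetical helpers of your call screen):
override fun onPictureInPictureModeChanged(
    isInPictureInPictureMode: Boolean,
    newConfig: Configuration
) {
    super.onPictureInPictureModeChanged(isInPictureInPictureMode, newConfig)
    if (isInPictureInPictureMode) {
        // Leave only the interlocutor's video visible
        hideCallControls()
    } else {
        showCallControls()
    }
}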
The PIP window is quite small, so it makes sense to hide everything except the interlocutor’s video, including the local user’s video; it will be too small to see anything there anyway.
How to implement picture-in-picture mode on Android app?
Customization
The PIP window can be further customized by passing `PictureInPictureParams` in a call to `enterPictureInPictureMode`. There are not many customization options, but the option to add buttons to the bottom of the window deserves special attention. This is a nice way to keep the screen interactive despite the fact that the regular buttons stop working when the user activates the PIP mode.
The maximum number of buttons you can add depends on many factors, but you can always add at least three. All buttons over the limit simply won’t be shown, so it’s better to place the especially important ones at the beginning. You can find out the exact limit in the current configuration through an `Activity` method:
this.maxNumPictureInPictureActions
Let’s add an End call button to our PIP window. To start with, just like with notifications, we need a `PendingIntent`, which will be responsible for telling our application that the button has been pressed. If this is the first time you’ve heard of `PendingIntent`, you can learn more about it in our last article.
After that, we can start creating the actual button description, namely `RemoteAction`.
val endCallPendingIntent = getPendingIntent()
val endCallAction = RemoteAction(
// An icon for a button. The color will be ignored and changed to a system color
Icon.createWithResource(this, R.drawable.ic_baseline_call_end_24),
// Text of the button that won't be shown
"End call",
// ContentDescription for screen readers
"End call button",
// Our PendingIntent that'll be launched upon pressing the button
endCallPendingIntent
)
Our “action” is ready, now we need to add it to the PIP parameters and, subsequently, to the mode transition call.
Let’s start by creating a Builder for our customization parameters:
val pipParams = PictureInPictureParams.Builder()
.setActions(listOf(endCallAction))
.build()
this.enterPictureInPictureMode(pipParams)
How to customize picture-in-picture mode?
Starting with Android 8, you can define the area of the screen that will be displayed when Activity switches to PIP. You can do this with setSourceRectHint method:
val sourceRectHint = Rect()
visibleView.getGlobalVisibleRect(sourceRectHint)
val params = PictureInPictureParams.Builder()
.setSourceRectHint(sourceRectHint)
.build()
setPictureInPictureParams(params)
In addition to the buttons and the visible area of the screen, through the parameters you can set the aspect ratio of the PIP window or the animation of switching to this mode.
Conclusion
We have considered a fairly simple but very handy variant of using the multi-window feature to improve the user experience, learned how to add buttons to the PIP window on Android and adapt our interface when switching to and from this mode. In the next article we’ll cover audio output switching for your Android calling apps on WebRTC.
WebRTC is a video chat and conferencing development technology. It allows you to create a peer-to-peer connection between mobile devices and browsers to transmit media streams. You can find more details on how it works and its general principles in our article about WebRTC in plain language.
2 ways to implement video communication with WebRTC on Android
The easiest and fastest option is to use one of the many commercial projects, such as Twilio or LiveSwitch. They provide their own SDKs for various platforms and implement functionality out of the box, but they have drawbacks: they are paid, and the functionality is limited to the features they provide, so you can’t build just anything you can think of.
Another option is to use one of the existing libraries. This approach requires more code but will save you money and give you more flexibility in functionality implementation. In this article, we will look at the second option and use https://webrtc.github.io/webrtc-org/native-code/android/ as our library.
Creating a connection
Two steps in creating a WebRTC connection
Creating a WebRTC connection consists of two steps:
Establishing a logical connection – devices must agree on the data format, codecs, etc.
Establishing a physical connection – devices must know each other’s addresses
To begin with, note that when initiating a connection, a signaling mechanism is used to exchange data between the devices. The signaling mechanism can be any channel for transmitting data, such as sockets.
Suppose we want to establish a video connection between two devices. To do this we need to establish a logical connection between them.
A logical connection
A logical connection is established using the Session Description Protocol (SDP). For this, one peer:
Creates a PeerConnection object.
Forms an SDP offer object, which contains data about the upcoming session, and sends it to the interlocutor using the signaling mechanism.
val peerConnectionFactory: PeerConnectionFactory
lateinit var peerConnection: PeerConnection
fun createPeerConnection(iceServers: List<PeerConnection.IceServer>) {
val rtcConfig = PeerConnection.RTCConfiguration(iceServers)
peerConnection = peerConnectionFactory.createPeerConnection(
rtcConfig,
object : PeerConnection.Observer {
...
}
)!!
}
fun sendSdpOffer() {
peerConnection.createOffer(
object : SdpObserver {
override fun onCreateSuccess(sdpOffer: SessionDescription) {
peerConnection.setLocalDescription(sdpObserver, sdpOffer)
signaling.sendSdpOffer(sdpOffer)
}
...
}, MediaConstraints()
)
}
In turn, the other peer:
Also creates a PeerConnection object.
Using the signaling mechanism, receives the SDP offer sent by the first peer and stores it
Forms an SDP-answer and sends it back, also using the signal mechanism
fun onSdpOfferReceive(sdpOffer: SessionDescription) {
    // Saving the received SDP offer
    peerConnection.setRemoteDescription(sdpObserver, sdpOffer)
    sendSdpAnswer()
}
// Forming and sending the SDP answer
fun sendSdpAnswer() {
    peerConnection.createAnswer(
        object : SdpObserver {
            override fun onCreateSuccess(sdpAnswer: SessionDescription) {
                peerConnection.setLocalDescription(sdpObserver, sdpAnswer)
                signaling.sendSdpAnswer(sdpAnswer)
            }
            ...
        }, MediaConstraints()
    )
}
The first peer, having received the SDP answer, saves it:
fun onSdpAnswerReceive(sdpAnswer: SessionDescription) {
    peerConnection.setRemoteDescription(sdpObserver, sdpAnswer)
}
After successful exchange of SessionDescription objects, the logical connection is considered established.
Physical connection
We now need to establish the physical connection between the devices, which is most often a non-trivial task. Typically, devices on the Internet do not have public addresses, since they are located behind routers and firewalls. To solve this problem WebRTC uses ICE (Interactive Connectivity Establishment) technology.
STUN and TURN servers are an important part of ICE. They serve one purpose: to establish connections between devices that do not have public addresses.
STUN server
A device makes a request to a STUN server and receives its public address in response. Then, using the signaling mechanism, it sends it to the interlocutor. After the interlocutor does the same, the devices know each other’s network location and are ready to transmit data to each other.
TURN server
In some cases, the router may have a “Symmetric NAT” limitation. This restriction won’t allow a direct connection between the devices. In this case, a TURN server is used: it serves as an intermediary and all data goes through it. Read more in Mozilla’s WebRTC documentation.
As we have seen, STUN and TURN servers play an important role in establishing a physical connection between devices. It is for this purpose that we pass a list of available ICE servers when creating the PeerConnection object.
To establish a physical connection, one peer generates ICE candidates (objects containing information about how the device can be found on the network) and sends them via the signaling mechanism to the other peer:
lateinit var peerConnection: PeerConnection
fun createPeerConnection(iceServers: List<PeerConnection.IceServer>) {
val rtcConfig = PeerConnection.RTCConfiguration(iceServers)
peerConnection = peerConnectionFactory.createPeerConnection(
rtcConfig,
object : PeerConnection.Observer {
override fun onIceCandidate(iceCandidate: IceCandidate) {
signaling.sendIceCandidate(iceCandidate)
} …
}
)!!
}
Then the second peer receives the ICE candidates of the first peer via the signaling mechanism and saves them. It also generates its own ICE candidates and sends them back:
fun onIceCandidateReceive(iceCandidate: IceCandidate) {
peerConnection.addIceCandidate(iceCandidate)
}
Now that the peers have exchanged their addresses, you can start transmitting and receiving data.
Receiving data
After establishing the logical and physical connections with the interlocutor, the library calls the onAddTrack callback and passes into it a MediaStream object containing the interlocutor’s VideoTrack and AudioTrack:
fun createPeerConnection(iceServers: List<PeerConnection.IceServer>) {
val rtcConfig = PeerConnection.RTCConfiguration(iceServers)
peerConnection = peerConnectionFactory.createPeerConnection(
rtcConfig,
object : PeerConnection.Observer {
override fun onIceCandidate(iceCandidate: IceCandidate) { … }
override fun onAddTrack(
rtpReceiver: RtpReceiver?,
mediaStreams: Array<out MediaStream>
) {
onTrackAdded(mediaStreams)
}
…
}
)!!
}
Next, we must retrieve the VideoTrack from the MediaStream and display it on the screen.
private fun onTrackAdded(mediaStreams: Array<out MediaStream>) {
val videoTrack: VideoTrack? = mediaStreams.mapNotNull {
it.videoTracks.firstOrNull()
}.firstOrNull()
displayVideoTrack(videoTrack)
…
}
To display a VideoTrack, you need to pass it an object that implements the VideoSink interface. For this purpose, the library provides the SurfaceViewRenderer class:
fun displayVideoTrack(videoTrack: VideoTrack?) {
videoTrack?.addSink(binding.surfaceViewRenderer)
}
To get the interlocutor’s sound we don’t need to do anything extra; the library does everything for us. Still, if we want to fine-tune the sound, we can get the AudioTrack object and use it to change the audio settings:
var audioTrack: AudioTrack? = null
private fun onTrackAdded(mediaStreams: Array<out MediaStream>) {
…
audioTrack = mediaStreams.mapNotNull {
it.audioTracks.firstOrNull()
}.firstOrNull()
}
For example, we could mute the interlocutor, like this:
fun muteAudioTrack() {
    audioTrack?.setEnabled(false)
}
Sending data
Sending video and audio from your device also begins with creating a PeerConnection object and sending ICE candidates. But unlike the case of receiving a video stream from the interlocutor, where we only create an SDP offer, here we must first create a MediaStream object, which includes an AudioTrack and a VideoTrack.
To send our audio and video streams, we need to create a PeerConnection object, and then use a signaling mechanism to exchange IceCandidate and SDP packets. But instead of getting the media stream from the library, we must get the media stream from our device and pass it to the library so that it will pass it to our interlocutor.
Now we need to create a MediaStream object and pass the AudioTrack and VideoTrack objects into it
val context: Context
private fun getLocalMediaStream(): MediaStream? {
val stream = peerConnectionFactory.createLocalMediaStream("user")
val audioTrack = getLocalAudioTrack()
stream.addTrack(audioTrack)
val videoTrack = getLocalVideoTrack(context)
stream.addTrack(videoTrack)
return stream
}
Receive audio track:
private fun getLocalAudioTrack(): AudioTrack {
val audioConstraints = MediaConstraints()
val audioSource = peerConnectionFactory.createAudioSource(audioConstraints)
return peerConnectionFactory.createAudioTrack("user_audio", audioSource)
}
Receiving a VideoTrack is a tiny bit more difficult. First, get a list of all the cameras of the device:
lateinit var capturer: CameraVideoCapturer
private fun getLocalVideoTrack(context: Context): VideoTrack {
val cameraEnumerator = Camera2Enumerator(context)
val camera = cameraEnumerator.deviceNames.firstOrNull {
cameraEnumerator.isFrontFacing(it)
} ?: cameraEnumerator.deviceNames.first()
...
}
Next, create a CameraVideoCapturer object, which will capture the image
private fun getLocalVideoTrack(context: Context): VideoTrack {
...
capturer = cameraEnumerator.createCapturer(camera, null)
val surfaceTextureHelper = SurfaceTextureHelper.create(
"CaptureThread",
EglBase.create().eglBaseContext
)
val videoSource =
peerConnectionFactory.createVideoSource(capturer.isScreencast)
capturer.initialize(surfaceTextureHelper, context, videoSource.capturerObserver)
...
}
Now, after getting CameraVideoCapturer, start capturing the image and add it to the MediaStream
private fun getLocalMediaStream(): MediaStream? {
...
val videoTrack = getLocalVideoTrack(context)
stream.addTrack(videoTrack)
return stream
}
private fun getLocalVideoTrack(context: Context): VideoTrack {
...
capturer.startCapture(1024, 720, 30)
return peerConnectionFactory.createVideoTrack("user0_video", videoSource)
}
After creating a MediaStream and adding it to the PeerConnection, the library forms an SDP offer, and the SDP packet exchange described above takes place through the signaling mechanism. When this process is complete, the interlocutor will begin to receive our video stream. Congratulations, at this point the connection is established.
Many to Many
We have considered a one-to-one connection. WebRTC also allows you to create many-to-many connections. In its simplest form, this is done in exactly the same way as a one-to-one connection. The difference is that the PeerConnection object, as well as the SDP packet and ICE-candidate exchange, is not done once but for each participant. This approach has disadvantages:
The device is heavily loaded because it needs to send the same data stream to each interlocutor
The implementation of additional features such as video recording, transcoding, etc. is difficult or even impossible
In this case, WebRTC can be used in conjunction with a media server that takes care of the tasks above. For the client side, the process is exactly the same as for a direct connection to the interlocutors’ devices, but the media stream is sent only to the media server, which retransmits it to the other participants.
Conclusion
We have considered the simplest way to create a WebRTC connection on Android. If after reading this you still don’t understand it, just go through all the steps again and try to implement them yourself – once you have grasped the key points, using this technology in practice will not be a problem.
And this is the video chat for you! Android also allows you to create a custom call notification. After reading our guide on it, even those who are really new to coding will be able to do it!
Not an Android guy? We’ve got you covered with our WebRTC on iOS guide.
You can also refer to the following resources for a better understanding of WebRTC:
You will learn how to make incoming call notifications on Android from basic to advanced layouts from this article. Customize the notification screen with our examples.
Last time, we told you what any Android app with calls should have and promised to show you how to implement it. Today we’ll deal with notifications for incoming calls: we’ll start with the simplest and most minimalistic ones, and end with full-screen notifications with an off-system design. Let’s get started!
Channel creation (api 26+)
Since Android 8.0, each notification must have a notification channel to which it belongs. Before this version of the system, the user could either allow or disallow the app to show notifications, without being able to turn off only a certain category, which was not very convenient. With channels, on the other hand, the user can turn off annoying notifications from the app, such as ads and unnecessary reminders, while leaving only the ones he needs (new messages, calls, and so on).
If we don’t specify a channel ID, we have to use a deprecated Builder constructor, and if no channel with that ID has been created, the notification will simply not be displayed on Android 8 and later versions.
We need the androidx.core library which you probably already have hooked up. We write in Kotlin, so we use the version of the library for that language:
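A sketch of the dependency (the version is only an example):
dependencies {
    implementation("androidx.core:core-ktx:1.12.0")
}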
All work with notifications is done through the system service NotificationManager. For backward compatibility, it is always better to use the Compat version of Android classes if you have them, so we will use NotificationManagerCompat. To get the instance:
val notificationManager = NotificationManagerCompat.from(context)
Let’s create our channel. You can set a lot of parameters for the channel, such as a general sound for notifications and a vibration pattern. We will set only the basic ones; the full list you can find here.
val INCOMING_CALL_CHANNEL_ID = "incoming_call"

// Creating an object with the channel data
val channel = NotificationChannelCompat.Builder(
    // Channel ID, it must be unique within the package
    INCOMING_CALL_CHANNEL_ID,
    // The importance of the notification affects whether the notification makes a sound, is shown immediately, and so on. We set it to maximum, it's a call after all.
    NotificationManagerCompat.IMPORTANCE_HIGH
)
    // The name of the channel, which will be displayed in the system notification settings of the application
    .setName("Incoming calls")
    // Channel description, will be displayed in the same place
    .setDescription("Incoming audio and video call alerts")
    .build()
// Creating the channel. If such a channel already exists, nothing happens, so this method can be used before sending each notification to the channel.
notificationManager.createNotificationChannel(channel)
Notification channel in application settings
Notification runtime permission (api 33+):
If your app targets Android 13+ you should declare the following permission in AndroidManifest:
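A minimal sketch of the declaration:
<uses-permission android:name="android.permission.POST_NOTIFICATIONS" />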
You should also request POST_NOTIFICATIONS permission from a user at runtime. Learn more about permission requesting.
Displaying a notification
Now we can start creating the notification itself, let’s start with the simplest example:
val notificationBuilder = NotificationCompat.Builder(
this,
// channel ID again
INCOMING_CALL_CHANNEL_ID
)
// A small icon that will be displayed in the status bar
.setSmallIcon(R.drawable.icon)
// Notification title
.setContentTitle("Incoming call")
// Notification text, usually the caller's name
.setContentText("James Smith")
// Large image, usually a photo / avatar of the caller
.setLargeIcon(BitmapFactory.decodeResource(resources, R.drawable.logo))
// For notification of an incoming call, it’s wise to make it so that it can’t be “swiped”
.setOngoing(true)
So far we’ve only created a sort of “description” of the notification, but it’s not yet shown to the user. To display it, let’s turn to the manager again:
// Let’s get to building our notification
val notification = notificationBuilder.build()
// We ask the system to display it
notificationManager.notify(INCOMING_CALL_NOTIFICATION_ID, notification)
Simple notification
The INCOMING_CALL_NOTIFICATION_ID is a notification identifier that can be used to find and interact with an already displayed notification.
For example, the user wasn’t answering the call for a long time, the caller got tired of waiting and canceled the call. Then we can cancel notification:
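A minimal sketch, using the same notification ID:
notificationManager.cancel(INCOMING_CALL_NOTIFICATION_ID)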
Or, in the case of a conferencing application, if more than one person has joined the caller, we can update our notification. To do this, just create a new notification and pass the same notification ID in the notify call — then the old notification will just be updated with the data, without animating the appearance of the new notification. To do this, we can reuse the old notificationBuilder by simply replacing the changed part in it:
notificationBuilder.setContentText("James Smith, George Watson")
notificationManager.notify(
INCOMING_CALL_NOTIFICATION_ID,
notificationBuilder.build()
)
Button actions upon clicking
A simple notification of an incoming call, after which the user has to find our application themselves and accept or reject the call there, is not very useful. Fortunately, we can add action buttons to our notification!
To do this, we add one or more actions when creating the notification. Creating them will look something like this:
val action = NotificationCompat.Action.Builder(
// The icon that will be displayed on the button (or not, depends on the Android version)
IconCompat.createWithResource(applicationContext, R.drawable.icon_accept_call),
// The text on the button
getString(R.string.accept_call),
// The action itself, PendingIntent
acceptCallIntent
).build()
Wait a minute, what is this PendingIntent? It’s a very broad topic, worthy of its own article, but simplistically, it’s a description of how to run an element of our application (such as an activity). In its simplest form it goes like this:
// Intent.action is a String, so we use a string constant
const val ACTION_ACCEPT_CALL = "action_accept_call"
// We create a normal intent, just like when we start a new Activity
val intent = Intent(applicationContext, MainActivity::class.java).apply {
action = ACTION_ACCEPT_CALL
}
// But we don’t run it ourselves, we pass it to PendingIntent, which will be called later when the button is pressed
val acceptCallIntent = PendingIntent.getActivity(applicationContext, REQUEST_CODE_ACCEPT_CALL, intent, PendingIntent.FLAG_UPDATE_CURRENT or PendingIntent.FLAG_IMMUTABLE)
Accordingly, we need to handle this action in the activity itself.
To do this, in `onCreate()` (and in `onNewIntent()` if you use the flag `FLAG_ACTIVITY_SINGLE_TOP` for your activity), take `action` from `intent` and take the action:
override fun onNewIntent(intent: Intent?) {
super.onNewIntent(intent)
if (intent?.action == ACTION_ACCEPT_CALL)
imaginaryCallManager.acceptCall()
}
Now that we have everything ready for our action, we can add it to our notification via `Builder`:
notificationBuilder.addAction(action)
Notification with action buttons
In addition to the buttons, we can assign an action by clicking on the notification itself, outside of the buttons. Going to the incoming call screen seems like the best solution — to do this, we repeat all the steps of creating an action, but use a different action id instead of `ACTION_ACCEPT_CALL`, and in `MainActivity.onCreate()` handle that `action` with navigation:
override fun onNewIntent(intent: Intent?) {
…
if (intent?.action == ACTION_SHOW_INCOMING_CALL_SCREEN)
imaginaryNavigator.navigate(IncomingCallScreen())
}
Notification.CallStyle (API 31+)
Android 12 introduced a dedicated call notification style, Notification.CallStyle. It helps the system distinguish and highlight call notifications among others. Starting with Android 12, you should use Notification.CallStyle for call notifications instead of fully custom ones.
// Split the notification-building logic for Android 12+ and below
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.S) {
// Creating a notification with Notification.CallStyle
val icon = Icon.createWithResource(this, R.drawable.user_avatar)
val caller = Person.Builder()
// Caller icon
.setIcon(icon)
// Caller name
.setName("Chuck Norris")
.setImportant(true)
.build()
// Creating the call notification style
val notificationStyle = Notification.CallStyle.forIncomingCall(caller, declineIntent, answerIntent)
Notification.Builder(this, CHANNEL_ID)
.setSmallIcon(R.drawable.ic_launcher_foreground)
.setContentTitle("Incoming call")
.setContentText("Incoming call from Chuck Norris")
.setStyle(notificationStyle)
// Intent that will be fired when the notification itself is tapped
.setContentIntent(contentIntent)
.setFullScreenIntent(contentIntent, true)
.setOngoing(true)
// notification category that describes this Notification. May be used by the system for ranking and filtering
.setCategory(Notification.CATEGORY_CALL)
.build()
} else {
// Creating the custom notification
...
}
Notification with Notification.CallStyle
Notifications with their own design
Notifications themselves are part of the system interface, so by default they are displayed in the standard system style. However, if you want to stand out, or if the standard arrangement of buttons and other notification elements doesn't suit you, you can give your notifications a unique style of their own.
DISCLAIMER: Due to the huge variety of Android devices with different screen sizes and aspect ratios, combined with the limited positioning options for elements in notifications (compared to regular application screens), custom content notifications are much more difficult to support.
The notification will still be rendered by the system, that is, outside of our application process, so we need to use RemoteViews instead of the regular View. Note that this mechanism does not support all the familiar elements, in particular, the `ConstraintLayout` is not available.
A simple example is a custom notification with one button for accepting a call:
The layout is ready; now we need to create a RemoteViews instance and pass it to the notification builder:
val remoteView = RemoteViews(packageName, R.layout.notification_custom)
// Set the PendingIntent that will “shoot” when the button is clicked. A normal onClickListener won’t work here – again, the notification will live outside our process
remoteView.setOnClickPendingIntent(R.id.button_accept_call, pendingIntent)
// Add to our long-suffering builder
notificationBuilder.setCustomContentView(remoteView)
Notification with custom layout
Our example is as simplistic as possible and, of course, a bit jarring. Usually, a customized notification is done in a style similar to the system notification, but in a branded color scheme, like the notifications in Skype, for example.
In addition to .setCustomContentView, which sets the layout for the normal collapsed notification, we can separately specify the layout for the expanded state with .setCustomBigContentView and for the heads-up state with .setCustomHeadsUpContentView.
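A quick sketch of how the three setters combine – the remoteViewCollapsed, remoteViewExpanded, and remoteViewHeadsUp variables are hypothetical RemoteViews built the same way as above:
notificationBuilder
    // Layout for the normal collapsed state
    .setCustomContentView(remoteViewCollapsed)
    // Layout for the expanded state
    .setCustomBigContentView(remoteViewExpanded)
    // Layout for the heads-up popup
    .setCustomHeadsUpContentView(remoteViewHeadsUp)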
Full-screen notifications
Now our custom notification layouts match the design inside the app, but they’re still small notifications, with small buttons. And what happens when you get a normal incoming call? Our eyes are presented with a beautiful screen that takes up all the available space. Fortunately, this functionality is available to us! And we’re not afraid of any limitations associated with RemoteViews, as we can show the full `activity`.
First of all, we have to add the android.permission.USE_FULL_SCREEN_INTENT permission to `AndroidManifest.xml`.
After creating an `activity` with the desired design and functionality, we initialize the PendingIntent and add it to the notification:
val intent = Intent(this, FullscreenNotificationActivity::class.java)
val pendingIntent = PendingIntent.getActivity(applicationContext, 0, intent, PendingIntent.FLAG_UPDATE_CURRENT or PendingIntent.FLAG_IMMUTABLE)
// At the same time we set highPriority to true – what is high priority if not an incoming call?
notificationBuilder.setFullScreenIntent(pendingIntent, /* highPriority = */ true)
And that's it! Even though this functionality is so easy to add, for some reason not all call-related applications use it. However, giants like WhatsApp and Telegram show incoming call notifications exactly this way!
Custom incoming call screen
Bottom line
The incoming call notification on Android is a very important part of the application. There are a lot of requirements: it should be prompt and eye-catching, but not annoying. Today we learned about the tools available to achieve all these goals. May your notifications always be beautiful!
The video conferencing market volume is $4.66 billion (TrueList), and the global video conferencing market size is projected to grow to $22.5 billion by 2026, according to Video Conferencing Statistics 2022.
We specialise in developing video and multimedia software and apps, and we have been doing it since 2005. Along the way we created messengers like Speakk and conferencing systems like ProVideoMeeting. A freelancer or an agency that does not specialize in video software may pick the technology they are most familiar with. We will build a custom product tailored to your needs.
Features for video, audio, and text communication software
WebRTC videoconference
We develop for any number of participants:
One-on-one video chats
Video conferences with an unlimited number of participants
50 live videos on one screen at the same time was the maximum we’ve done. For example, Zoom has 100 live video participants, though it shows 25 live videos on one screen. To see the others, you switch between screens.
Some other functions: custom backgrounds, enlarging videos of particular participants, picking a camera and microphone from the list, muting a camera and microphone, and a video preview of how you look.
Conference recording
Record the whole screen of the conference. Set the time to store recordings on the server. For example, on imind.com we keep videos for 30 days on the free plan and forever on the most advanced one.
Do not interrupt the recording if the recorder dropped off. In Zoom, if the recorder leaves, the recording stops. In imind.com it continues.
Screen sharing and sharing multiple screens simultaneously
Show your screen instead of your video. Choose to show everything or just one application – so you don't accidentally reveal private data.
Make all video participants share screens at the same time. It helps to compare something. Users don’t have to stop one sharing and start another one. See it in action at imind.com.
Join a conference from a landline phone
For those in the countryside without an Internet connection. Dial a phone number on a wired telephone or your mobile and enter the conference with audio, without a video. SIP technology with Asterisk and FreeSWITCH servers powers this function.
Text chat
Send text messages and emoticons. React with emojis. Send pictures and documents. Go to a private chat with one participant. See a list of participants.
Document editing and signing
Share a document on the conference screen. Scroll through it together, make changes. Sign: upload your signature image or draw it manually. Convenient for remote contract signing in the pandemic.
Polls
Create polls with open and closed questions. View statistics. Make the collective decision-making process faster!
Webinars
In the broadcast mode, display a presentation full-screen to the audience, plus the presenter’s video. Add guest speakers’ videos. Record the whole session to share with participants afterward.
Everlasting rooms with custom links
Create a room and set a custom link to it like videoconference.com/dailymeeting. It’s convenient for regular meetings. Ask participants to add the link to bookmarks and enter at the agreed time each time.
User management
Assign administrators and delegate them the creation of rooms, addition, and deletion of users.
Security
One-time codes instead of passwords
Host approves guests before they enter the conference
See a picture of the guest before approving them
Encryption: we enable AES-256 encryption in WebRTC
Custom branding
Change color schemes, use your logo, change backgrounds to corporate images.
Speech-to-text and translation
User speech is recognized and shown on the screen. It can be in another language for translation.
Watch videos together online
Watch a movie or a sports game together with friends. Show an employee onboarding video to the new staff members. Chat by video, voice, and text.
Subscription plans
Free plans with basic functionality, advanced ones for pro and business users.
Industries we developed real-time communication tools for
Businesses – corporate communication tools
Telemedicine – HIPAA-compliant, with EMR, visit scheduling, and payments
E-learning – with whiteboards, LMS, teacher reviews, lesson booking, and payments
Entertainment: online cinemas, messengers
Fitness and training
Ecommerce and marketplaces – text chats, demonstrations of goods and services by live video calls
Devices we develop for
software development for devices
Web browsers (Chrome, Firefox, Safari, Opera, Edge) – applications that require no download
Phones and tablets on iOS and Android – native applications that you download from the App Store and Google Play
Desktop and laptop computers – applications that you download and install
Smart TVs – JavaScript applications for Samsung and LG, Kotlin apps for Android-based STBs, Swift apps for Apple TV
Virtual reality (VR) headsets – meetings in virtual rooms
What technologies to choose
Technologies for video chat development
Basic technology to transmit video
Different technologies suit best for different tasks:
for streaming to third-party products like YouTube and Facebook – RTMP
for calling to phone numbers – SIP
for connecting IP cameras – RTSP and RTP
A freelancer or an agency that does not specialize in video software may pick the technology they are most familiar with. It might not be the best for your tasks. In the worst case, you'll have to throw the work away and redo it.
We know all the video technologies well. So we choose what’s best for your goal. If you need several of these features in one project – a mix of these technologies should be used.
WebRTC is the main technology almost always used for video conferences though. This is the technology for media streaming in real-time that works across all browsers and mobile devices people now use. Google, Apple, and Microsoft support and develop it.
WebRTC supports VP8, VP9 and H264 Constrained Baseline profile for video and OPUS, G.711 (PCMA and PCMU) for audio. It allows sending video up to 8,192 x 4,320 pixels – more than 4K. So the limitations to video stream quality on WebRTC are the internet speed and device power of the end-user.
WebRTC video quality is better than in SIP-based video chats, as a study of an Indonesian university shows. See Figure 6 on page 9: Video test results and read the reasoning below it.
Is a media server needed for video conferencing software development?
For video chats with 2-6 participants, we develop p2p solutions. You don’t pay for the heavy video traffic on your servers.
For video conferences with 7 and more people, we use media servers and bridges – Kurento is the 1st choice.
For “quick and dirty” prototypes we can integrate third-party solutions – ready implementations of video chats with media servers that allow slight customization.
p2p video chats
P2p means video and audio go directly from sender to receivers. Streams do not have to go to a powerful server first. Computers, smartphones, and tablets people use nowadays are powerful enough to handle 2-6 streams without delays.
Many businesses do not need more people in a video conference. Telemedicine usually means just 2 participants: a doctor and a patient. Developing a video chat with a media server would be a mistake here: businesses would have to pay for the traffic going through the server without getting any benefit from it.
Video conferences with a media server
Users' devices currently cannot handle sending more than 5 outgoing video streams without lags – people's computers, smartphones, and tablets are not powerful enough, and while sending their own video they also receive incoming streams. So for more than 6 people in a video chat, each participant sends just 1 outgoing stream to a media server, which is powerful enough to forward this stream to every other participant.
Kurento is our first choice of media servers now for 3 reasons:
It is reliable.
It was one of the first media servers to appear, so it gained the biggest community of developers. The more developers use a technology, the faster issues get solved and the quicker you find answers to your questions. This makes development quicker and easier, so you pay less for it.
As of 2021, other media servers have smaller developer and contributor communities or are backed by not-so-big companies, based on our experience and impressions. They are either not as reliable as Kurento or do not allow developing as many functions.
It allows adding the widest number of custom features.
From screen sharing to face recognition and more – we have not yet come across a feature our clients wanted that was impossible to build with Kurento. To give developers this possibility, the Kurento contributors had to develop each capability separately and polish it into a well-working solution. Other media servers have not had as much time and resources to offer the same.
It is free.
Kurento is open-source. It means you may use it in your products legally for free. You don’t have to pay royalties to the technology owner.
We work with other media servers and bridges – when not that many functions are needed, or it is an existing product already using another media server:
We compare media servers and bridges regularly as all of them develop. Knowing your needs, we recommend the optimal choice.
Integration of third-party solutions
Third-party solutions are paid: you pay for minutes of usage. The development of a custom video chat is cheaper in the long run.
Their features are also limited to what their developers developed.
They are quicker to integrate and get a working prototype though. If you need to impress investors – we can integrate them. You get your app quicker and cheaper compared to the custom development.
However, to replace it with a custom video chat later – you’ll have to throw away the existing implementation and develop a custom one. So, you’ll pay twice for the video component.
We use these 3 – they are the most reliable ones based on our experience:
Write to us: we’ll help to pick optimal technologies for your video conference.
How much the development of a video conference costs
If you're here, ready-made solutions sold as-is to integrate into your existing software probably do not suit you, and you need a custom one. The cost of a custom solution depends on the features and their complexity, so we can't name a price before knowing them.
Take even the login function as an example. A simple one works just through email and password. A complex one may offer login through Facebook, Google, and others. Each option requires extra effort to implement, so the cost may differ several times over. And login is the simplest function, worth a few work hours. Imagine how much added complexity will influence the cost of more complex functions – and you'd probably have quite a lot of them.
Though we can give some indications.
✅ The simplest video chat component takes us 2-4 weeks and costs USD 8000. It is not a fully functioning system with login, subscriptions, booking, etc. – just the video chat with a text chat and screen sharing. You’d integrate it into your website or app and it would receive user info from there.
✅ The simplest fully functional video chat system takes us about 4-5 months and around USD 56 000. It is built from the ground up for one platform – either web or iOS or Android for example. Users register, pick a plan, and use the system.
✅ A big video conferencing solution development is an ongoing work. The 1st release takes about 7 months and USD 280 000. Reach us, let’s discuss your project. After the 1st call, you get an approximate estimation.
In today’s world, mobile communication is everything. We are surrounded by apps for audio and video calls, meetings, and broadcasts. With the pandemic, it’s not just business meetings that have moved from meeting rooms to calling apps. Calls to family, concerts, and even consultations with doctors are all now available on apps.
In this article we’ll cover the features every communication app should have, whether it’s a small program for calls or a platform for business meetings and webinars, and in the following articles, we’ll show you some examples of how to implement them.
Incoming call notification
Apps can send notifications to notify you of something important. There’s nothing more important for a communication app than an incoming call or a scheduled conference that the user forgot about.
So any app with call functionality has to use this mechanism to notify. Of course, we can show the name and the photo of the caller. Also, for the user’s convenience, we can add buttons to answer or reject the call without unnecessary clicks and opening the app.
Incoming call default notification on Android
You can go even further and change the notification design provided by the system.
Incoming call custom notification on Android
However, options for Android devices don’t end here. Show a full-screen notification with your design even if the screen is locked! Read the guide on how to make your Android call notification here.
A notification that keeps the system from closing the process
The call may take a long time, and the user may decide to do something else at the same time – open another application, for example, a text document. At this moment an unpleasant surprise awaits us: if the system does not have enough resources to display that application, it may simply close ours without warning! The call will then be terminated, leaving the user very confused.
Fortunately, there is a way to avoid this by using the Foreground Service mechanism. We mark our application as being actively used by the user even if it is minimized. After that, the application might get closed only in the most extreme case, if the system runs out of resources even for the most crucial processes.
The system, for security reasons, requires a persistent small notification, letting the user know that the application is performing work in the background.
It is essentially a normal notification, albeit with one difference: it can't be swiped away. So you don't need to worry that the user accidentally dismisses it and leaves the application defenseless against the all-optimizing system.
You can make do with a very small notification:
Little notification in the panel
It appears quietly in the notification panel, without popping up in front of the user the way an incoming call notification does.
Nevertheless, it is still a notification, and all the techniques described in the previous paragraph apply to it – you can add buttons and customize the design.
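As a rough sketch of this mechanism (the service class, channel ID, notification ID, and icon below are placeholder names, not from a real project), marking the ongoing call as foreground work could look like this:
class OngoingCallService : Service() {

    override fun onStartCommand(intent: Intent?, flags: Int, startId: Int): Int {
        // The persistent notification the system requires for a foreground service
        val notification = NotificationCompat.Builder(this, ONGOING_CALL_CHANNEL_ID)
            .setSmallIcon(R.drawable.ic_call)
            .setContentTitle("Ongoing call")
            .setOngoing(true)
            .build()
        // Promote the service to the foreground and attach the notification to it
        startForeground(ONGOING_CALL_NOTIFICATION_ID, notification)
        return START_NOT_STICKY
    }

    override fun onBind(intent: Intent?): IBinder? = null
}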
Picture-in-picture for video calls
Now the user can participate in a call or conference and mind their own business without fearing that the call will end abruptly. However, we can go even further in supporting multitasking!
If your app has a video call feature, you can show a small video call window (picture-in-picture) for the user’s convenience, even if they go to other app screens. And, starting from Android 8.0, we can show such a window not only in our application but also on top of other applications!
You can also add controls to this window, such as camera switching or pause buttons. Read our guide on PiP here.
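As a minimal sketch (assuming it is called from the video call activity on API 26+), entering picture-in-picture mode can look like this:
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
    val params = PictureInPictureParams.Builder()
        // Keep the floating window in a video-friendly aspect ratio
        .setAspectRatio(Rational(16, 9))
        .build()
    enterPictureInPictureMode(params)
}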
Ability to switch audio output devices
An integral part of any application with calls, video conferences, or broadcasts is audio playback. But how do we know which audio output device the user wants to hear the sound from? We can, of course, try to guess for them, but it's always better not to guess and to provide a choice instead. For example, with this feature the user won't have to turn off their Bluetooth headphones just to switch to the speakerphone.
So if you give the user the ability to switch the audio output device at any point in the call, they will be grateful.
The implementation often depends on the specific application, but there is a method that works in almost all cases. We’ve described it here.
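One simple illustration (a sketch only, not the full routing method mentioned above) is toggling between the earpiece and the loudspeaker with AudioManager:
val audioManager = getSystemService(Context.AUDIO_SERVICE) as AudioManager
// The mode used for VoIP / WebRTC calls
audioManager.mode = AudioManager.MODE_IN_COMMUNICATION
// true – loudspeaker, false – earpiece (or a connected headset)
audioManager.isSpeakerphoneOn = true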
A deep link to quickly join a conference or a call
For both app distribution and UX, the ability to share a broadcast or invite someone to a call or conference is useful. But it may happen that the person invited is not yet a user of your app.
Well, that won’t be for long. You can generate a special link that will take those who already have the app directly to the call to which they were invited and those who don’t have the app installed to their platform’s app store. iPhone owners will go to the App Store, and Android users will go to Google Play.
In addition, with this link, once the application is installed, it will launch immediately, and the new user will immediately get into the call to which he was invited! Create your own deep links using our code examples.
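As an illustration, handling such a link in the launched activity might look like this – the "callId" query parameter and the joinCall method are hypothetical, and imaginaryCallManager follows the naming used earlier in this article:
override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    setContentView(R.layout.activity_main)

    // The deep link arrives as the data Uri of the launching intent
    val callId = intent?.data?.getQueryParameter("callId")
    if (callId != null) {
        // joinCall is a hypothetical helper that brings the user into the call
        imaginaryCallManager.joinCall(callId)
    }
}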
Bottom line
We covered the main system features that allow us to improve the user experience in audio/video apps, from protecting the app from being shut down by the system in the middle of a call to UX conveniences like picture-in-picture mode.
Of course, every app is unique, with its own tasks and nuances, so these tips are not clear-cut rules. Nevertheless, if something from this list seems appropriate for a particular application, it's worth implementing.
Over the last 10 years, the term "neural networks" has spread far beyond the scientific and professional community. The theory of neural network organization emerged in the middle of the last century, but only around 2012 did computing power reach levels sufficient to train neural networks, and that is when their widespread use began.
Neural networks are increasingly being used in mobile application development. The Deloitte report indicates that more than 60% of the applications installed by adults in developed countries use neural networks. According to statistics, Android has been ahead of its competitors in popularity for several years.
Neural networks are used:
to recognize and process voices (modern voice assistants),
to recognize and process objects (computer vision),
to recognize and process natural languages (natural language processing),
to find malicious programs,
to automate apps and make them more efficient. For example, there are healthcare applications that detect diabetic retinopathy by analyzing retinal scans.
What are neural networks and how do they work?
Mankind adopted the idea of neural networks from nature, with scientists taking the animal and human nervous systems as an example. A natural neuron consists of a nucleus, dendrites, and an axon. The axon branches out at its end, forming synapses (connections) with the dendrites of other neurons.
Brain neural network
The artificial neuron has a similar structure. It consists of a nucleus (processing unit), several dendrites (similar to inputs), and one axon (similar to outputs), as shown in the following picture:
Artificial neuron connections scheme
Connections of several neurons form layers, and connections of layers form a neural network. There are three main types of neurons: input (receives information), hidden (processes information), and output (presents results of calculations). Take a look at the picture.
Neural network connections scheme
Neurons in different layers are connected through synapses. As a signal passes through a synapse, it can either strengthen or weaken. The parameter of a synapse is its weight – a coefficient (any real number) that changes the information passing through. Input numbers (signals) are multiplied by their weights (each signal has its own weight) and summed. The activation function then calculates the output signal and sends it to the output (see the picture).
Neural network function
Imagine the situation: you have touched a hot iron. Depending on the signal that comes from your finger through the nerve endings to the brain, the brain decides whether to pass the signal on through the neural connections and pull your finger away, or not to pass it if the iron is cold and you can leave the finger on it. The mathematical analog, the activation function, has the same purpose: it lets signals pass or not pass from neuron to neuron depending on the information they carry. If the information is important, the function passes it through; if it is insignificant or unreliable, it does not let it pass on.
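As a toy illustration of the idea (not tied to any particular library), a single artificial neuron can be sketched in a few lines of Kotlin:
// Sigmoid activation: squashes any real number into the range (0, 1)
fun sigmoid(x: Double): Double = 1.0 / (1.0 + kotlin.math.exp(-x))

// Multiply each input signal by its weight, sum everything up, and apply the activation function
fun neuronOutput(inputs: List<Double>, weights: List<Double>, bias: Double): Double {
    val weightedSum = inputs.zip(weights).sumOf { (x, w) -> x * w } + bias
    return sigmoid(weightedSum)
}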
How to prepare neural networks for usage?
How neural network algorithm works
Work with neural nets goes through several stages:
Preparation of a neural network, which includes the choice of architecture (how neurons are organized), topology (the structure of their location relative to each other and the outside world), the learning algorithm, etc.
Loading the input data into a neural network.
Training a neural network. This is a very important stage, without which the neural network is useless. This is where all the magic happens: along with the input data, the network receives information about the expected result. The result obtained in the output layer is compared with the expected one; if they do not coincide, the network determines which neurons affected the final value the most and adjusts the weights on the connections with those neurons (the so-called error backpropagation algorithm; a toy sketch of the weight update follows after this list). This is a very simplified explanation – we suggest reading this article to dive deeper into neural network training. Training is a very resource-intensive process, so it is not done on smartphones. The training time depends on the task, the architecture, and the input data volume.
Checking training adequacy. A network does not always learn exactly what its creator wanted it to learn. There was a case where the network was trained to recognize images of tanks from photos. But since all the tanks were on the same background, the neural network learned to recognize this type of background, not the tanks. The quality of neural network training must be tested on examples that were not involved in its training.
Using a neural network – developers integrate the trained model into the application.
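Returning to the training step above, here is a toy sketch (purely illustrative, building on the neuronOutput function sketched earlier; expected and learningRate are placeholder inputs) of the weight update for a single sigmoid neuron:
// One gradient-descent step for a single sigmoid neuron with a squared-error loss
fun trainStep(inputs: List<Double>, weights: MutableList<Double>, expected: Double, learningRate: Double) {
    val output = neuronOutput(inputs, weights, bias = 0.0)
    // Error scaled by the derivative of the sigmoid – how much each weight contributed
    val delta = (output - expected) * output * (1 - output)
    // Shift each weight against the error gradient
    for (i in weights.indices) {
        weights[i] -= learningRate * delta * inputs[i]
    }
}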
Limitations of neural networks on mobile devices
RAM limitations
Most mid-range and low-end mobile devices on the market have between 2 and 4 GB of RAM, and usually about a third of that is reserved by the operating system. As the RAM limit approaches, the system can "kill" running applications, including those with neural networks.
The size of the application
Complex deep neural networks often weigh several gigabytes. When integrating a neural network into mobile software there is some compression, but it is still not enough to work comfortably. The main recommendation for the developers is to minimize the size of the application as much as possible on any platform to improve the UX.
Runtime
Simple neural networks often return results almost instantly and are suitable for real-time applications. However, deep neural networks can take dozens of seconds to process a single set of input data. Modern mobile processors are not yet as powerful as server processors, so processing results on a mobile device can take several hours.
To develop a mobile app with neural networks, you first need to create and train a neural network on a server or PC, and then implement it in the mobile app using off-the-shelf frameworks.
Working with a single app on multiple devices
As an example, a facial recognition app is installed on the user's phone and tablet. It won't be able to transfer what it has learned to other devices, so neural network training will happen separately on each of them.
Overview of neural network development libraries for Android
TensorFlow
TensorFlow is an open-source library from Google that creates and trains deep neural networks. With this library, we store a neural network and use it in an application.
The library can train and run deep neural networks to classify handwritten numbers, recognize images, embed words, and process natural languages. It works on Ubuntu, macOS, Android, iOS, and Windows.
To make learning TensorFlow easier, the development team has produced additional tutorials and improved getting started guides. Some enthusiasts have created their own TensorFlow tutorials (including InfoWorld). You can read several books on TensorFlow or take online courses.
As mobile developers, we should take a look at TensorFlow Lite, a lightweight TensorFlow solution for mobile and embedded devices. It allows you to run machine learning inference on the device (but not training) with low latency and a small binary size. TensorFlow Lite also supports hardware acceleration via the Android Neural Networks API. TensorFlow Lite models are compact enough to run on mobile devices and can be used offline.
TensorFlow architecture
TensorFlow Lite runs fairly small neural network models on Android and iOS devices, even when they are offline.
The basic idea behind TensorFlow Lite is to train a TensorFlow model and convert it to the TensorFlow Lite format. The converted file can then be used in a mobile app.
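As a minimal sketch of that last step (assuming the org.tensorflow:tensorflow-lite and tensorflow-lite-support dependencies; the model file name, INPUT_SIZE, and NUM_CLASSES are placeholders):
// Load a converted .tflite model bundled in the app's assets
val modelBuffer = FileUtil.loadMappedFile(context, "model.tflite")
val interpreter = Interpreter(modelBuffer)

// Shapes depend on the concrete model – these are placeholders
val input = Array(1) { FloatArray(INPUT_SIZE) }
val output = Array(1) { FloatArray(NUM_CLASSES) }

// Run inference on the device
interpreter.run(input, output)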
TensorFlow Lite consists of two main components:
TensorFlow Lite interpreter – runs specially optimized models on cell phones, embedded Linux devices, and microcontrollers.
TensorFlow Lite converter – converts TensorFlow models into an efficient form for usage by the interpreter, and can make optimizations to improve performance and binary file size.
TensorFlow Lite is designed to simplify machine learning on mobile devices themselves instead of sending data back and forth from the server. For developers, machine learning on the device offers the following benefits:
response time: the request is not sent to the server, but is processed on the device
privacy: the data does not leave the device
Internet connection is not required
the device consumes less energy because it does not send requests to the server
Firebase ML Kit
TensorFlow Lite makes it easier to implement and use neural networks in applications. However, developing and training models still requires a lot of time and effort. To make life easier for developers, the Firebase ML Kit library was created.
The library lets you use already trained deep neural networks in applications with minimal code. Most of the models offered are available both on the device and in Google Cloud. Developers can use models for computer vision: character recognition, barcode scanning, object detection (a small barcode-scanning sketch follows the examples below). The library is quite popular. For example, it is used in:
Yandex.Money (a Russian payment service) to recognize QR codes;
FitNow, a fitness application that recognizes text on food labels for calorie counting;
TurboTax, a tax preparation application that recognizes document barcodes.
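For illustration, on-device barcode scanning with the current standalone ML Kit artifact (com.google.mlkit:barcode-scanning, the successor of the Firebase ML Kit vision APIs) can be sketched like this – the bitmap variable is assumed to hold a camera frame:
val image = InputImage.fromBitmap(bitmap, /* rotationDegrees = */ 0)
val scanner = BarcodeScanning.getClient()
scanner.process(image)
    .addOnSuccessListener { barcodes ->
        // Each detected barcode carries its raw value and format
        for (barcode in barcodes) {
            Log.d("MLKit", "Barcode value: ${barcode.rawValue}")
        }
    }
    .addOnFailureListener { e ->
        Log.e("MLKit", "Barcode scanning failed", e)
    }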
ML Kit also has:
language detection of written text;
translation of texts on the device;
smart message response (generating a reply sentence based on the entire conversation).
In addition to methods out of the box, there is support for custom models.
What's important is that you don't need any additional services, APIs, or backend for this. Everything can be done directly on the device – no user traffic is consumed, and developers don't need to handle errors when there is no internet connection. Moreover, it works faster on the device. The downside is increased power consumption.
Developers don’t need to publish the app every time after updates, as ML Kit will dynamically update the model when it goes online.
The ML Kit team decided to invest in model compression. They are experimenting with a feature that allows you to upload a full TensorFlow model along with training data and get a compressed TensorFlow Lite model in return. Developers are looking for partners to try out the technology and get feedback from them. If you’re interested, sign up here.
Since this library is available through Firebase, you can also take advantage of other services on that platform. For example, Remote Config and A/B testing make it possible to experiment with multiple user models. If you already have a trained neural network loaded into your application, you can add another one without republishing it to switch between them or use two at once for the sake of experimentation – the user won’t notice.
Problems of using neural networks in mobile development
Developing Android apps that use neural networks is still a challenge for mobile developers. Training neural networks can take weeks or months since the input information can consist of millions of elements. Such a serious workload is still out of reach for many smartphones.
Consider whether you can avoid using a neural network in a mobile app if:
there are no specialists in your company who are familiar with neural networks;
your task is quite non-trivial, and to solve it you need to develop your own model, i.e. you cannot use ready-made solutions from Google, because this will take a lot of time;
the customer needs a quick result – training neural networks can take a very long time;
the application will be used on devices with an old version of Android (below 9). Such devices do not have enough power.
Conclusion
Neural networks became popular a few years ago, and more and more companies are using this technology in their applications. Mobile devices impose their own limitations on neural network operation. If you decide to use them, the best choice would be a ready-made solution from Google (ML Kit) or the development and implementation of your own neural network with TensorFlow Lite.