Categories
Uncategorized

Why should Android developers start building AR apps before 2024?


The phrase “augmented reality” (AR) has long been on everyone’s lips and is used in many areas of life, including mobile applications. A large part of the AR market is occupied by entertainment apps: remember the Pokemon Go fever of 2016? However, entertainment is not the only area where AR is used. Tourism, medicine, education, healthcare, retail, and other industries actively use it too. According to studies, by the end of 2020 there were almost 600 million active users of mobile apps with AR. By 2024, nearly threefold growth (to 1.7 billion) is predicted, with revenue from such applications estimated at $26 billion. The future is very close!

That’s why in this article we’ll look at several popular tools for developing Android apps with AR functionality, along with their pros and cons.

History of AR

AR technology has come a long way since its advent and its arrival in smartphones. It was originally part of VR. In 1961, Philco Corporation (USA) developed Headsight, the first virtual reality helmet. Like most inventions, it was first used for the needs of the Department of Defense. Then the technology evolved: there were various simulators, virtual helmets, and even goggles with gloves. These devices never became widespread, but they interested NASA and the CIA. In 1990, Tom Caudell coined the term “augmented reality”, and we can say that from that moment on, AR became separate from VR. The ’90s brought many interesting inventions: an exoskeleton that allowed the military to control vehicles virtually, and gaming platforms. In 1993, Sega showed a VR headset for its Genesis game console. However, the product never reached the mass market: testers reported nausea and headaches during games. The high cost of devices, limited technical capabilities, and side effects made people forget about VR and AR in the mass market for a while. In 1994, AR made its way into the arts for the first time with a theater production called Dancing in Cyberspace, in which acrobats danced in virtual space.

In 2000, ARQuake, an augmented reality version of the popular game Quake, let players wearing a head-mounted display chase monsters in the street. This may have inspired the future creators of Pokemon Go. Until the 2010s, attempts to bring AR to the masses were not very successful.

In the 2010s, several quite successful projects appeared: MARTA (an application from Volkswagen that gives step-by-step instructions for car repair and maintenance) and Google Glass. At the same time, AR began to appear in mobile applications: Pokemon Go, IKEA Place, AR features in various Google apps (Translate, Maps, and others), Instagram filters, and more. Today, more and more mobile applications use AR, and not only for entertainment.

What AR is and how it works on a smartphone

Essentially, AR is based on computer vision. It all starts with a device that has a camera. The camera captures images of the real world, which is why most AR apps first ask you to move the camera around for a while. The AR engine then analyzes this information and builds a virtual model of the surroundings, in which it places one or more AR objects (a picture, 3D model, text, or video) over the original camera image. AR objects can be stored in the phone’s memory in advance or downloaded from the Internet in real time. The application remembers where the objects are, so their position does not change when the smartphone moves unless the app’s functionality specifically provides for it. Objects are fixed in space using special markers (identifiers). There are three main methods for AR to work:

  • Natural markers. A virtual grid is superimposed on the surrounding world. On this grid, the AR engine identifies anchor points, which determine exactly where the virtual object will be attached. Benefit: real-world objects serve as natural markers, so there’s no need to create markers programmatically.
  • Artificial markers. The AR object is tied to a specific, artificially created marker, such as the spot where a QR code was scanned. This approach works more reliably than natural markers.
  • Spatial technology. The AR object’s position is tied to specific geographic coordinates, using data from the smartphone’s GPS/GLONASS receiver, gyroscope, and compass.
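For the spatial approach, the core arithmetic is working out where a geo-anchored object lies relative to the user. Below is a minimal sketch in plain Kotlin, not tied to ARCore or any other SDK: it computes the great-circle distance and the bearing to a target with the haversine formula, then checks whether that bearing falls within the camera’s horizontal field of view for a given compass heading. `GeoPoint`, `isInView`, and the 60° default field of view are illustrative assumptions; a real app would take the coordinates from the device’s location services and the heading from its sensors.

```kotlin
import kotlin.math.PI
import kotlin.math.abs
import kotlin.math.asin
import kotlin.math.atan2
import kotlin.math.cos
import kotlin.math.sin
import kotlin.math.sqrt

// Hypothetical type: a point on Earth in decimal degrees
data class GeoPoint(val lat: Double, val lon: Double)

// Mean Earth radius in metres
val EARTH_RADIUS_M = 6_371_000.0

fun Double.toRad() = this * PI / 180.0
fun Double.toDeg() = this * 180.0 / PI

// Great-circle distance in metres (haversine formula)
fun distanceMeters(a: GeoPoint, b: GeoPoint): Double {
    val dLat = (b.lat - a.lat).toRad()
    val dLon = (b.lon - a.lon).toRad()
    val h = sin(dLat / 2) * sin(dLat / 2) +
        cos(a.lat.toRad()) * cos(b.lat.toRad()) * sin(dLon / 2) * sin(dLon / 2)
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))
}

// Initial bearing from a to b, in degrees clockwise from true north, in [0, 360)
fun bearingDegrees(a: GeoPoint, b: GeoPoint): Double {
    val phi1 = a.lat.toRad()
    val phi2 = b.lat.toRad()
    val dLon = (b.lon - a.lon).toRad()
    val y = sin(dLon) * cos(phi2)
    val x = cos(phi1) * sin(phi2) - sin(phi1) * cos(phi2) * cos(dLon)
    return (atan2(y, x).toDeg() + 360.0) % 360.0
}

// True if the target lies inside the camera's horizontal field of view.
// headingDeg is where the camera points (compass), fovDeg is an assumed FOV.
fun isInView(user: GeoPoint, target: GeoPoint, headingDeg: Double, fovDeg: Double = 60.0): Boolean {
    // Signed angle between the camera direction and the target, normalized to (-180, 180]
    var delta = (bearingDegrees(user, target) - headingDeg) % 360.0
    if (delta > 180.0) delta -= 360.0
    if (delta <= -180.0) delta += 360.0
    return abs(delta) <= fovDeg / 2
}
```

For example, a target one degree of longitude east of a user on the equator is about 111 km away at a bearing of 90°, so it is drawn when the camera faces roughly east and hidden when it faces west.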

Tools for AR in Android

Google ARCore

The first thing that comes to mind is Google’s ARCore. ARCore is a platform for working with AR rather than a complete SDK: it handles motion tracking, environmental understanding, and light estimation, but it doesn’t render anything. So you have to implement the graphical elements the user interacts with separately. This means ARCore alone isn’t enough, and we need additional technology to work with graphics.

There are several solutions for this. 

If you want to use Kotlin:

  • Until recently, you could use Google’s dedicated Sceneform SDK. But in 2020, Google archived Sceneform and stopped supporting it. The Sceneform repository is now maintained by enthusiasts on GitHub, and it must be said that it is updated quite frequently. Still, there is a risk in relying on technology that Google no longer supports.
  • Integrate OpenGL into the project. OpenGL (OpenGL ES on Android) is a cross-platform API for rendering 2D and 3D graphics, and Android provides an SDK for using it from Kotlin and Java. This option suits you if your developers know OpenGL or can learn it quickly, which is a non-trivial task.

If you want to use something other than Kotlin:

  • Android NDK. If your developers know C++, they can use the Android NDK for development. However, they will also need to deal with graphics there. The OpenGL library already mentioned will be suitable for this task.
  • Unreal Engine. There is nothing better for dealing with graphics than game engines. Unfortunately, Google has deprecated its own ARCore SDK for Unity (Unity now reaches ARCore through its AR Foundation package), but Unreal Engine developers can still build ARCore applications.

Vuforia

Another popular tool for developing AR applications is Vuforia, developed by PTC. Unlike ARCore, Vuforia can work with ordinary 2D and 3D objects as well as video and audio. You can create virtual buttons, change the background, and control occlusion, that is, the situation where one object is partially hidden behind another.

Fun fact: Vuforia can use ARCore under the hood, and the official Vuforia documentation recommends enabling this. While the application is running, Vuforia checks whether ARCore is available on the device and, if so, uses it.

Unfortunately, more bad news for Kotlin fans: Vuforia can only be used from C or Unity. Another downside is that if you plan to publish your application for commercial purposes, you will have to buy a paid Vuforia license (see the pricing on Vuforia’s website).

It works with Android 6 and up, and there is a list of recommended devices.

ARToolKit

ARToolKit is a completely free open-source library for working with AR. Its features are:

  • support for Unity3D and OpenSceneGraph graphics libraries
  • support for single and dual cameras simultaneously
  • GPS support
  • ability to create real-time applications
  • integration with smart glasses
  • multi-language support
  • automatic camera calibration

Although the library is free, the documentation leaves a lot to be desired, and at the time of writing, the menu items on the official website did not respond to clicks. ARToolKit appears to support Android development via Unity. Overall, using this library is quite risky.

MAXST 

A popular solution from Korea with very detailed documentation. MAXST offers an SDK for working with 2D and 3D objects, available for Java and Unity; with Java, you need to implement the graphics yourself. The official website states that the SDK works on Android 4.3 and higher, which is a huge plus if you want to cover as many devices as possible. However, the SDK is paid if you plan to publish the app (pricing is listed on the MAXST website).

Wikitude 

Developed by an Austrian company that was recently acquired by Qualcomm, Wikitude can recognize and track 2D and 3D objects, images, and scenes, work with geodata, and integrate with smart glasses. There is a Java SDK (you need to implement the graphics yourself), as well as Unity and Flutter SDKs. This solution is paid, but you can try the free version for 45 days.

Conclusion

There is now a choice of frameworks for developing AR applications for Android. Of course, there are many more, but I have tried to collect the most popular ones. To make it easier to compare the solutions listed above, I summarized their key characteristics in a table. I hope this will help you with your choice. May Android be with you! Fora Soft develops VR/AR applications: have a look at our portfolio, including Super Power FX, Anime Power FX, and UniMerse. Want your own AR app? Contact us, and our tech-savvy sales team will be happy to answer all your questions.


Anthony from Speakk, ‘You are so much more competent than other developers.’

We created Speakk, a chat and voice messenger for South Africa that doesn’t consume internet data. Here’s our interview with Anthony, Speakk CEO. He says why he chose Fora Soft over many other companies, how we overcame difficulties with the app, and how Fora solves issues.


Tell me about Speakk. What is it?

Speakk is an innovative app in South Africa. People chat and send messages and voicemails there. Similar to WhatsApp. However, users don’t pay for any of the mobile data used. Data is very expensive. Millions of South Africans walk around with smartphones but they can’t afford to use them just because of how expensive data is. That’s why we developed Speakk. For them to not pay for data used.

How it works is a simple chat app like WhatsApp. Sign up, send text messages, voice messages. We have ads, so every 8 messages or so you’ll see an ad popping up. So we pay for the data and we make money through advertisements.

How many users?

When we signed up, we got nearly 100k users over a month. Then we had a slight change in business plans. We built another app using that technology. This new app was for the educational market. We had great opportunities with COVID. Fortunate negatives, so to speak. There were many public schools in South Africa where kids had no way of communicating with teachers in lockdown. So we used our existing technology for that market as well. Fora Soft helped us do that.

Was Fora Soft your first choice?

No, it wasn’t. We considered many software development companies, both locally in South Africa and overseas. What was very interesting for us: we gave a very simple brief to developers. We wanted to see how they understood the brief and, based on that, what technologies they’d recommend and how much it would cost.

Out of all the companies we’ve contacted, Fora was the only company to really grasp the challenge of what we’re doing and give us an accurate quote.

That’s why you ended up with us, right?

Yeah. Hundred percent. So, it was really because of the technical competence, we landed up being impressed by a number of other aspects working with them, but initially, you were much more technically competent than any of the other developers we’d spoken to. We had a look at your portfolio of existing work, which was not only quite wide and quite varied across different industries, but it did overlap somewhat in what we were doing as well. So, you had the experience in the space that we are in as well, which helped.

Can you please share your “before” and “after” working with us?

We don’t really have a before and after, because we’ve been partnering with Fora Soft from the beginning of this project. So really we worked with Fora Soft for the minimum viable product. We worked with them to create something that would be as light as possible but would still work. And that was the first version of the product that we created with Fora. We then evolved the product and we moved on to new products. The relationship has evolved as the products have evolved, but there wasn’t any before Fora Soft. They were really the beginning of the project for us.

Are there any measurable figures that you could share with us that can be disclosed? Like we’ve talked about a number of users, maybe a number of crashes, revenue numbers?

Because our business has changed slightly, it’s difficult to share a lot of that, but we could say that there have been millions of messages sent in our app since we started. We accumulated that really quickly. We were a trending app on Google Play Store for quite a while. We had, at one stage, many thousands of downloads every single day. I can’t think of any other metrics that we could share. The app grew very quickly at the beginning.

Were there any difficulties while developing the program? If so, were you able to overcome them with Fora Soft?

Yeah. Like any project, we had a number of difficulties. One of the big challenges we had to overcome was that the app uses this reverse billing technology, which has very, very specific technical requirements. It led to a number of issues at the beginning of the project which were unforeseen on our side and on the Fora Soft side. But it’s something that we did work with Fora Soft over a number of months to get through.

We ended up with a much more sustainable, robust product at the end that we are quite proud of. That version works better. 

I don’t know if you want me to talk about this, but the other issue we had is that we did have some issues with Fora Soft in terms of the initial specifications of some new projects that we worked on. Some of the features were underspecified, which led to us underbudgeting the project. It had a number of ramifications for the business. Fora Soft was very apologetic about that. They rectified the billing and put us on some kind of an agreement to help us through some of the budgetary constraints. Then going forward, we didn’t have that problem again. I think you realized where the error came from and you were very careful about that going forward.

Professionalism, determination, and communication are very important when it comes to any IT project. With us working with you and you working with us, could you rate us on a scale of 1 to 10 on those criteria and maybe add some other criteria that you deem necessary?

So in terms of professionalism, Fora Soft was very professional throughout.

We’ve dealt with many different people in the organization and everyone we’ve dealt with has been great, very professional. Obviously, in a long-term working relationship like we’ve had there, there have been one or two issues and those issues have been resolved very, very quickly. I’d say Fora has been incredibly professional in terms of communication as well. We were concerned that our company is based in South Africa. Fora is based in Russia. There would be language constraints and language issues in dealing with Fora. This didn’t prove to be a problem at all. We set up a Slack channel to communicate with our project manager when we’re in the middle of the big dev cycles. We were speaking to those project managers throughout the day on a daily basis over Slack.

The communication was pretty much flawless.

When we did need to have a face-to-face meeting, we hopped on Skype and we were able to look each other in the eye and to speak a bit more casually. Communication hasn’t been an issue at all. You know, I think the only communication issues that were introduced every now and again was just due to some of the technical requirements that we had on our side that are very unique to the mobile environment that we are working in. I must say, just on that point, I was very impressed that South Africa has a very unique set of mobile users. We have people walking around with the latest and greatest smartphones. And we have people walking around with cell phones that are many, many years old.

Would you turn to Fora Soft with any other project or maybe recommend us to somebody else?

Yeah, yeah, definitely. And we have done other projects with Fora Soft and we have recommended Fora Soft to others as well. So not only would we do it, but we have done it already.


How to Implement Foreground Service and Deep Links for Android apps with calls? With Code Examples


Let’s take a look at two more UX conveniences for an Android calling app. First, using Android Foreground Services, let’s make sure the app keeps working normally after it is minimized or the screen is locked. After that, let’s see how to implement direct links to a call or conference with Deep Links: tapping such a link takes the user straight to the call.

How to create a Foreground Service on Android

Today’s smartphones and their operating systems have many built-in optimizations aimed at extending battery life. And mobile app developers need to keep in mind the potential actions the system can take on the app. 

A prime example is freeing up resources by closing apps that the user is not actively interacting with at the moment. Here, the system considers only the app currently displayed on the user’s screen to be “actively used”. All other running applications can be closed at any time if the system runs short of resources for the active one. Thanks to this, we can open as many applications as we like without explicitly closing them: the system closes the old ones, and when we return to them, they are started again.

In general, this mechanism is convenient and necessary on mobile devices. But we want to bypass this restriction so that the call is protected from sudden closure by the system. Fortunately, it is possible to “mark” a part of the application as actively used, even if it is not displayed anymore. To do this, we use the Foreground Service. Note that even this does not give full protection from the system — but it increases the “priority” of the application in the eyes of the system and also allows you to keep some objects in memory even if `Activity` is closed.

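First, add the permission to run foreground services (on Android 14 and higher, each declared foreground service type also needs its own permission, such as FOREGROUND_SERVICE_MICROPHONE, FOREGROUND_SERVICE_CAMERA, or FOREGROUND_SERVICE_PHONE_CALL):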

<uses-permission android:name="android.permission.FOREGROUND_SERVICE" />

Now let’s implement the service itself. In its simplest form, it’s just a subclass of `Service` that holds a reference to our `CallManager` (so that it won’t be cleaned up by the garbage collector):

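import android.app.Service
import android.content.Intent
import android.os.IBinder
import javax.inject.Inject

class OngoingCallService : Service() {

    // Injected by your DI framework (with Hilt, the class also needs @AndroidEntryPoint);
    // holding this reference keeps the CallManager from being garbage-collected
    @Inject
    lateinit var abstractCallManager: AbstractCallManager

    // Implementation of an abstract method; we don't use binding, so just return null
    override fun onBind(intent: Intent): IBinder? = null

}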

A Service is an application component and, like an Activity, must be declared in `AndroidManifest.xml`:

<!-- android:name is the class name of our service -->
<!-- android:exported="false" means other applications can't start this service -->
<!-- android:foregroundServiceType declares the types of work our service does -->
<service
    android:name=".OngoingCallService"
    android:enabled="true"
    android:exported="false"
    android:foregroundServiceType="microphone|camera|phoneCall" />

Our Android Foreground Service starts up a bit differently than regular services:

private fun startForegroundService() {
    val intent = Intent(this, OngoingCallService::class.java)
    ContextCompat.startForegroundService(this, intent)
}

On Android 8.0 (API level 26) and higher, a Foreground Service must call the `startForeground` method within a few seconds of being started; otherwise, the application is considered to be hung (ANR). This method requires a notification because, for security reasons, the user must be able to see that such a service is running (if you don’t know or have forgotten how to create notifications, you can refresh your memory in one of our previous articles about call notifications on Android):

// Typically called from onStartCommand() of OngoingCallService
val notification = getNotification()
startForeground(ONGOING_NOTIFICATION_ID, notification)

Everything that we wrote in the previous article about notifications applies to this notification — you can update it with the list of call participants, add buttons to it, or change its design completely. The only difference is that this notification will be `ongoing` by default and users won’t be able to “swipe” it.

When the call ends, the service must be stopped; otherwise, the user will be able to close the application completely only through the settings, which is very inconvenient. Our service is stopped in the same way as regular Android services:

private fun stopForegroundService() {
    val intent = Intent(this, OngoingCallService::class.java)
    stopService(intent)
}

Starting and stopping a service is very convenient to implement if CallManager has a reactive field to monitor the status of the call, for example:

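// collect() suspends, so launch it in a coroutine, e.g. lifecycleScope in an Activity
lifecycleScope.launch {
    abstractCallManager.isInCall.collect { if (it) startForegroundService() else stopForegroundService() }
}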
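If `isInCall` is a `StateFlow`, it already skips values equal to the previous one. If your CallManager exposes the state through callbacks or a `SharedFlow` instead, the same value can arrive several times, and you may want to react only to real changes. Here is a minimal, framework-free sketch of that edge-triggered logic; `ServiceCommand` and `CallServiceController` are hypothetical names, and in the app you would call `onInCallChanged` from the collector above.

```kotlin
// Hypothetical commands; in the app, START would map to startForegroundService()
// and STOP to stopForegroundService()
enum class ServiceCommand { START, STOP }

// Hypothetical helper: forwards a command only when the call state actually changes,
// so a repeated "true" does not start the service again
class CallServiceController(private val onCommand: (ServiceCommand) -> Unit) {
    // null until the first state arrives, so the first value always produces a command
    private var lastInCall: Boolean? = null

    fun onInCallChanged(inCall: Boolean) {
        if (inCall == lastInCall) return // same state as before: nothing to do
        lastInCall = inCall
        onCommand(if (inCall) ServiceCommand.START else ServiceCommand.STOP)
    }
}
```

Stopping a service that isn’t running is harmless, so the sketch does not special-case the very first `false`.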

That’s the whole implementation of the service: it will, to some extent, protect our minimized application from being closed by the system.

Android Deep Links Tutorial

Links to a specific place in the app are an extremely user-friendly feature that also helps grow the app’s user base. If the user doesn’t have the app, the link opens its page on Google Play. For calling apps, the most successful use case is sharing a link to a call, meeting, or room. The user wants to talk to someone, sends them the link, that person downloads the app, and then gets right into the call. What could be more convenient?

The system supports links to a specific location in the application without any additional libraries. But for the link to “survive” the installation of the application, we need help from Firebase Dynamic Links. Note that Google has since deprecated Firebase Dynamic Links and announced that the service will shut down in August 2025.

Let’s concentrate on the implementation of links handling in the application and leave their creation to backend developers.

Now for the Android deep links code. First, let’s add the library:

dependencies {
    implementation 'com.google.firebase:firebase-dynamic-links:20.1.1'
}

To the user, deep links are ordinary links they tap. But before opening a link in the browser, the system checks the registry of installed applications for those that have declared that they handle links of this domain. If such an application is found, the system launches it instead of the browser and passes the link to it. If there is more than one such application, the system shows a dialog where the user chooses which application to open the link with. If you own the link’s domain, you can verify it with Android App Links so that, while your app is installed, such links open directly in it rather than in other applications.

To declare the links that our app can handle, we need to add an intent-filter to our `Activity` in `AndroidManifest.xml`:

<activity ...>
    <intent-filter>
        <!-- This action and these categories tell the system that we can "display" links -->
        <action android:name="android.intent.action.VIEW"/>
        <category android:name="android.intent.category.DEFAULT"/>
        <category android:name="android.intent.category.BROWSABLE"/>
        <!-- Description of the links we handle: here, links starting with calls://forasoft.com -->
        <data
            android:host="forasoft.com"
            android:scheme="calls"/>
    </intent-filter>
</activity>
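The `<data>` element in this filter makes the system offer our Activity only for links whose scheme is `calls` and whose host is `forasoft.com`. The plain-Kotlin sketch below models just that part of the matching, so you can check which links the filter would accept. `LinkFilter` and `accepts` are hypothetical names; the real resolution done by Android also takes actions, categories, paths, and ports into account.

```kotlin
import java.net.URI

// Hypothetical, simplified model of the <data> part of an intent-filter
data class LinkFilter(val scheme: String, val host: String)

fun LinkFilter.accepts(link: String): Boolean {
    // A malformed link cannot be matched at all
    val uri = runCatching { URI(link) }.getOrNull() ?: return false
    // Android matches scheme and host case-sensitively, so declare both in lowercase
    return uri.scheme == scheme && uri.host == host
}
```

For example, `calls://forasoft.com/join?meetingid=42` matches, while an `https://forasoft.com/...` link does not, because its scheme differs from the one declared in the manifest.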

When the user taps the Dynamic Link and installs the application (or taps the link with the app already installed), the Activity declared as this link’s handler is launched. In this Activity, we can get the link this way:

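import android.net.Uri
import com.google.firebase.dynamiclinks.FirebaseDynamicLinks

FirebaseDynamicLinks.getInstance()
    .getDynamicLink(intent)
    .addOnSuccessListener(this) { data ->
        // data is null if the Activity was not opened from a Dynamic Link
        val deepLink: Uri? = data?.link
    }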

With regular deep links, getting the data is even simpler:

val deepLink = intent?.data

Now all that’s left is to extract the parameters we need from the link and perform the actions in the application that are required to join the call:

val meetingId = deepLink?.getQueryParameter("meetingid")
if (meetingId != null) abstractCallManager.joinMeeting(meetingId)
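`android.net.Uri` belongs to the Android framework, so its methods are not available in plain JVM unit tests unless you use Robolectric or mocks. If you want to unit-test the link-parsing logic, one option is to keep it in a small pure-Kotlin helper. Here is a minimal sketch, assuming the meeting ID travels in a `meetingid` query parameter as in the snippet above; `queryParameter` is a hypothetical helper, not part of any library.

```kotlin
import java.net.URI
import java.net.URLDecoder

// Hypothetical JVM-only helper: returns the decoded value of the first query parameter
// called [name], "" if it has no value, or null if the link is malformed or lacks it
fun queryParameter(link: String, name: String): String? {
    // rawQuery keeps percent-escapes, so encoded "&" and "=" inside values don't break the split
    val rawQuery = runCatching { URI(link).rawQuery }.getOrNull() ?: return null
    return rawQuery.split('&')
        .map { it.split('=', limit = 2) }
        .firstOrNull { URLDecoder.decode(it[0], "UTF-8") == name }
        ?.let { URLDecoder.decode(it.getOrElse(1) { "" }, "UTF-8") }
}
```

In the Activity you could then call `queryParameter(deepLink.toString(), "meetingid")` and cover the helper with ordinary JUnit tests.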

Conclusion

In the final article of our series “What every application with calls should have”, we’ve covered keeping the application alive after it is minimized and using deep links as a convenient way to invite people to a call. Now you know all the mechanisms that improve the user experience not only inside the application but also at the system level.

Read other articles from this series:

What Every Android App With Calls Should Have

How to Make a Custom Call Notification on Android? With Code Examples

How to Make Picture-in-Picture Mode on Android With Code Examples

How to Implement Audio Output Switching During the Call on Android App?

Reach out to us to develop your app with calls, either for Android or other platforms 🙂


Fora Soft’s CEO Nikolay Sapunov Interview to GoodFirms: Focusing on Narrow Specialization in Video and Multimedia Software


Incorporated in 2005, Fora Soft develops e-learning, telemedicine, and video surveillance software. They augment reality, launch Internet TV, identify objects on video, and do not build anything else.

What is Fora Soft: narrow specialization in video and multimedia software since the very beginning

From multimedia avatar-based text chats, they moved on to video communication, where they were pioneers at a time when video was still the future. Fora Soft developed the first video chat for vk.com, the largest social network in Russia, when even Facebook wasn’t thinking about video chat. Back then, the team worked with the Cirrus and Stratus technologies, which are no longer in use. The chat, named Webca, was popular, with more than 1 million users.

Fora Soft entered the international market in 2010 with a healthcare video project for a U.S. entrepreneur. Since then, they’ve been working on all things multimedia. Imagine thousands of people using your video software, knowing it can even help save a human life: it’s a fantastic feeling.

Fora Soft is a one-stop shop where people receive services of a complete software development cycle. Customers do not have to explore different contractors to develop their web, mobile, and desktop applications. The team helps customers with everything from graphic design to programming.

The GoodFirms team interviewed Nikolay Sapunov, the CEO of Fora Soft, to learn more about the company and its services. Nikolay says, “As a CEO, I’m accountable for strategic planning and coordinating all the departments. I also carry out all final job interviews before we send an offer.”

How Fora Soft started

Talking about the company’s origins, Nikolay says it started with a passion for computers and technology rather than a wish to earn money. “As a child, I dreamed of a laptop and was delighted when I bought one with my first salary from a plant where I worked in the summers during my school years. From that time on, I started learning how to administer it, then how to code.”

Nikolay started as a .NET software developer and then began organizing project teams as a project manager. “While still studying at university, my friend and I started offering software development services. In 2005, we developed the first multimedia chat: a cartoon world where you pick a character, walk around, and text chat with whoever you meet. We count Fora Soft’s history from that moment.”

In 2010, Fora Soft entered the international market with its first order from the US: they helped an entrepreneur move his business online. He used to send interpreters to hospitals in person when a patient did not speak English, and there were only a few dozen interpreters in one city in Wisconsin. Fora Soft developed a video chat for him, and he started employing interpreters from all over the world. Now his business has 740 delegates serving 670 doctor offices.

Fora Soft now: among the leaders in mobile and web development on Goodfirms

Fora Soft’s experts design mobile apps using the extensive experience in video apps they have gained over 16 years. They ensure full compliance with iOS and Android guidelines so that your users get the quality they expect.

Before development of your app begins, business analysts and project managers devise a detailed engineering plan. This ensures that your app will meet your exact needs, budget, and desired timeframe.

Delivering the pixel-perfect product customers envisioned, and making sure it complies with the highest standards of all the leading app stores, has made Fora Soft one of the leading mobile app development service providers in Russia on GoodFirms.

The review obtained at GoodFirms reflects the potential of developers at Fora Soft.

Nikolay notes that the value of a business presence on the Internet is indisputable, yet the presence itself doesn’t guarantee anything: “We help to gain success with a clear plan, best practices, and the most advanced technology.”

As a top web development company, Fora Soft has the ability to meet the customer needs of any industry. They are a multifaceted business solutions provider and are proud to serve clients from a vast range of industry verticals.

Aligning clients’ business goals with the right technology, timeline, and budget has made Fora Soft one of Russia’s flourishing website development companies on GoodFirms.

The review given by Anastasia to GoodFirms proves the quality of service offerings rendered by Fora Soft.

In conclusion, Nikolay mentions that they embrace change. “In the industry we are a vital part of, change occurs every day. We work with small, functional teams for each part of the process, which react quickly to the changes happening in the industry.”

Moreover, he mentions that Fora Soft gives employees insight into the organization’s direction and trusts them with the power to make decisions. Mistakes are a necessary part of this process: unless you make mistakes, you never discover anything.

Lastly, Nikolay shares that Fora Soft’s vision is to become synonymous with video and multimedia software development worldwide. Whenever someone needs a video or multimedia application, Fora Soft should be the first name that comes to mind.

For more details beyond this summary, read the full interview published on GoodFirms.

About GoodFirms

Washington, D.C.-based GoodFirms is a maverick B2B research and reviews firm that aligns its efforts in finding web development and mobile app development service agencies delivering unparalleled services to its clients. GoodFirms’ extensive research process ranks the companies, boosts their online reputation and helps service seekers pick the right technology partner that meets their business needs.


How to Implement Audio Output Switching During the Call on Android App?


Seamless and timely switching between sound output devices on Android is a feature that is usually taken for granted, but its absence (or problems with it) is very annoying. Today we will look at how to implement such switching in Android calling apps, from manual switching by the user to automatic switching when headsets are connected. Along the way, we’ll talk about pausing the rest of the audio system for the duration of the call. This implementation suits almost all calling applications, since it works at the system level rather than at the level of the call engine (e.g., WebRTC).
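The automatic part of such switching usually comes down to a priority rule applied whenever the set of connected devices changes. Here is a framework-free sketch of such a rule under stated assumptions: the `AudioOutput` enum, `chooseOutput`, and the priority order (Bluetooth, then wired headset, then earpiece, then speaker) are hypothetical choices for illustration, not part of the Android API. On a device, you would build the set of available outputs from the `AudioDeviceInfo` entries returned by `AudioManager.getDevices(AudioManager.GET_DEVICES_OUTPUTS)`.

```kotlin
// Hypothetical set of output routes; real code would map AudioDeviceInfo types onto these
enum class AudioOutput { EARPIECE, SPEAKER, WIRED_HEADSET, BLUETOOTH }

// Assumed automatic priority: connected headsets first, then the earpiece, then the speaker
val autoPriority = listOf(
    AudioOutput.BLUETOOTH,
    AudioOutput.WIRED_HEADSET,
    AudioOutput.EARPIECE,
    AudioOutput.SPEAKER
)

// A manual choice wins while that device is still available;
// otherwise fall back to the highest-priority available output (null if none)
fun chooseOutput(available: Set<AudioOutput>, userChoice: AudioOutput?): AudioOutput? {
    if (userChoice != null && userChoice in available) return userChoice
    return autoPriority.firstOrNull { it in available }
}
```

Keeping the rule as a pure function makes it easy to re-run on every device-connection event and to test without a phone.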

Audio output device management

All management of Android sound output devices is implemented through the system’s `AudioManager`. To work with it, you need to add a permission to `AndroidManifest.xml`:

<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />

First of all, when a call starts in our app, it is highly recommended to request audio focus: this lets the system know that the user is now talking to someone and should not be distracted by sounds from other apps. For example, if the user was listening to music, then received a call and answered it, the music will be paused for the duration of the call.

There are two ways to request audio focus: the old one is deprecated, and the new one is available starting with Android 8.0. We will support both to cover all system versions:

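// Obtaining an AudioManager instance
val audioManager = context.getSystemService(Context.AUDIO_SERVICE) as AudioManager

// Listener of focus changes. Let's leave it empty for the sake of simplicity.
// Keep the instance: the same one must be passed to abandonAudioFocus() later
private val focusChangeListener = AudioManager.OnAudioFocusChangeListener { }

// The new approach needs a "request". We create it only on versions >= 8.0 and cache it,
// because abandonAudioFocusRequest() must later receive the same instance
private var audioFocusRequest: AudioFocusRequest? = null

@RequiresApi(Build.VERSION_CODES.O)
private fun getAudioFocusRequest(): AudioFocusRequest =
    audioFocusRequest ?: AudioFocusRequest.Builder(AudioManager.AUDIOFOCUS_GAIN)
        .setAudioAttributes(
            AudioAttributes.Builder()
                // Tell the system this is voice communication
                .setUsage(AudioAttributes.USAGE_VOICE_COMMUNICATION)
                .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
                .build()
        )
        .build()
        .also { audioFocusRequest = it }

// Focus request
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
    // Use the cached request
    audioManager.requestAudioFocus(getAudioFocusRequest())
} else {
    @Suppress("DEPRECATION")
    audioManager.requestAudioFocus(
        focusChangeListener,
        // The old API takes a stream type: request focus for the voice call stream
        AudioManager.STREAM_VOICE_CALL,
        AudioManager.AUDIOFOCUS_GAIN
    )
}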

It is important to specify the most appropriate `ContentType` and `Usage`: based on them, the system determines which of the user’s volume settings to apply (media volume or ringer volume, for example) and what to do with other audio sources (mute them, pause them, or let them keep playing).

Great, we’ve got audio focus. It is highly recommended to save the original AudioManager settings right away, before changing anything: this will allow us to restore them when the call is over. After all, it would be very inconvenient if one application’s audio settings affected all the others.

val savedAudioMode = audioManager.mode
val savedIsSpeakerOn = audioManager.isSpeakerphoneOn
val savedIsMicrophoneMuted = audioManager.isMicrophoneMute

Now we can apply our defaults. They may depend on the type of call (audio calls usually go to the earpiece and video calls to the speakerphone), on the user’s settings in the app, or simply on the last used output device. Our example app is a video app, so we’ll turn on the speakerphone right away:

// Moving AudioManager to the "call" state
audioManager.mode = AudioManager.MODE_IN_COMMUNICATION
// Enabling speakerphone
audioManager.isSpeakerphoneOn = true

Great, we have applied the default settings. If the app design includes a button to toggle the speakerphone, its handling is now very easy to implement:

audioManager.isSpeakerphoneOn = !audioManager.isSpeakerphoneOn

Monitoring the connection of headphones

We’ve learned how to switch the speakerphone manually, but what happens if the user connects headphones? Nothing, because `audioManager.isSpeakerphoneOn` is still `true`! And the user, of course, expects the sound to start playing through the headphones once they are plugged in. And vice versa: in a video call, when the headphones are disconnected, the sound should switch back to the speakerphone.

There is no way around it: we have to monitor the headphone connection. Note right away that wired and Bluetooth headphone connections are tracked differently, so we have to implement two mechanisms. Let’s start with the wired ones and put the logic in a separate class:

class HeadsetStateProvider(
    private val context: Context,
    private val audioManager: AudioManager
) {
    // The current state of wired headphones; true means plugged in
    val isHeadsetPlugged = MutableStateFlow(getHeadsetState())

    // BroadcastReceiver that tracks headset connection and disconnection events
    private val receiver = object : BroadcastReceiver() {
        override fun onReceive(context: Context?, intent: Intent) {
            if (intent.action == AudioManager.ACTION_HEADSET_PLUG) {
                when (intent.getIntExtra("state", -1)) {
                    // 0 -- the headset is unplugged, 1 -- the headset is plugged in
                    0 -> isHeadsetPlugged.value = false
                    1 -> isHeadsetPlugged.value = true
                }
            }
        }
    }

    init {
        val filter = IntentFilter(AudioManager.ACTION_HEADSET_PLUG)
        // Register our BroadcastReceiver
        context.registerReceiver(receiver, filter)
    }

    // Unregister the receiver when the call is over so it doesn't leak
    fun release() {
        context.unregisterReceiver(receiver)
    }

    // Returns the current headset state. Used to initialize the starting value.
    fun getHeadsetState(): Boolean {
        val audioDevices = audioManager.getDevices(AudioManager.GET_DEVICES_OUTPUTS)
        return audioDevices.any {
            it.type == AudioDeviceInfo.TYPE_WIRED_HEADPHONES
                    || it.type == AudioDeviceInfo.TYPE_WIRED_HEADSET
        }
    }
}

In our example, we use `StateFlow` to let other components subscribe to the connection state, but a listener interface, for example `HeadsetStateProviderListener`, would work just as well.

Now just initialize this class and observe the `isHeadsetPlugged` field, turning the speakerphone on or off when it changes:

headsetStateProvider.isHeadsetPlugged
    // If the headset isn't on, speakerphone is.
    .onEach { audioManager.isSpeakerphoneOn = !it }
    .launchIn(someCoroutineScope)

Bluetooth headphones connection monitoring

Now let’s implement the same monitoring for Bluetooth headphones. Their connections are not reported through `ACTION_HEADSET_PLUG`, which only covers wired headsets, so we listen to `BluetoothHeadset.ACTION_CONNECTION_STATE_CHANGED` instead:

class BluetoothHeadsetStateProvider(
    private val context: Context,
    private val bluetoothManager: BluetoothManager
) {
    // The current state of Bluetooth headsets; true means a headset is connected
    val isHeadsetConnected = MutableStateFlow(getHeadsetState())

    // The system sends this broadcast when a headset connects or disconnects.
    // Note: BluetoothProfile.ServiceListener.onServiceConnected() only reports that the
    // profile proxy is ready, not that a headset has connected, so it can't be used here
    private val receiver = object : BroadcastReceiver() {
        override fun onReceive(context: Context?, intent: Intent) {
            if (intent.action == BluetoothHeadset.ACTION_CONNECTION_STATE_CHANGED) {
                when (intent.getIntExtra(BluetoothProfile.EXTRA_STATE, -1)) {
                    BluetoothProfile.STATE_CONNECTED -> isHeadsetConnected.value = true
                    BluetoothProfile.STATE_DISCONNECTED -> isHeadsetConnected.value = false
                }
            }
        }
    }

    init {
        val filter = IntentFilter(BluetoothHeadset.ACTION_CONNECTION_STATE_CHANGED)
        context.registerReceiver(receiver, filter)
    }

    // Unregister the receiver when the call is over
    fun release() {
        context.unregisterReceiver(receiver)
    }

    // Returns the current state of the Bluetooth headset. Only used to initialize the starting state
    private fun getHeadsetState(): Boolean {
        // The adapter is null on devices without Bluetooth
        val adapter = bluetoothManager.adapter
        // Checking if there are active headsets
        return adapter?.getProfileConnectionState(BluetoothProfile.HEADSET) == BluetoothProfile.STATE_CONNECTED
    }
}

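To work with Bluetooth, we need another permission. If your app targets Android 12 (API 31) or higher, devices running Android 12+ require `BLUETOOTH_CONNECT` instead, and it is a runtime permission that must also be requested from the user:

<uses-permission android:name="android.permission.BLUETOOTH" android:maxSdkVersion="30" />
<uses-permission android:name="android.permission.BLUETOOTH_CONNECT" />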

Now let’s automatically turn on the speakerphone when no headset is connected, and turn it off when any headset is connected:

combine(
    headsetStateProvider.isHeadsetPlugged,
    bluetoothHeadsetStateProvider.isHeadsetConnected
) { wiredConnected, bluetoothConnected ->
    // Speakerphone is on only when no headset of either kind is connected
    audioManager.isSpeakerphoneOn = !wiredConnected && !bluetoothConnected
}
    .launchIn(someCoroutineScope)
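The routing rule can also be kept in one pure function, which makes it easy to unit-test. A minimal sketch, assuming a hypothetical `AudioOutput` enum and the common defaults of earpiece for audio calls and speakerphone for video calls. The code in this article only toggles `isSpeakerphoneOn`; the preference for Bluetooth over a wired headset is an assumption of this sketch, and actually routing audio to a Bluetooth headset may need extra calls not covered here.

```kotlin
// Where call audio should go (hypothetical model for illustration)
enum class AudioOutput { EARPIECE, SPEAKERPHONE, WIRED_HEADSET, BLUETOOTH_HEADSET }

// Picks the output from the current headset states and the call type.
// Assumption: a connected Bluetooth headset takes priority over a wired one.
fun chooseAudioOutput(
    isWiredHeadsetPlugged: Boolean,
    isBluetoothHeadsetConnected: Boolean,
    isVideoCall: Boolean
): AudioOutput = when {
    isBluetoothHeadsetConnected -> AudioOutput.BLUETOOTH_HEADSET
    isWiredHeadsetPlugged -> AudioOutput.WIRED_HEADSET
    // No headset: video calls use the speakerphone, audio calls use the earpiece
    isVideoCall -> AudioOutput.SPEAKERPHONE
    else -> AudioOutput.EARPIECE
}

// The value to assign to audioManager.isSpeakerphoneOn
fun shouldEnableSpeakerphone(output: AudioOutput): Boolean = output == AudioOutput.SPEAKERPHONE
```

Inside the `combine` lambda, the app would compute the output with `chooseAudioOutput(wiredConnected, bluetoothConnected, isVideoCall)` and assign `shouldEnableSpeakerphone(output)` to `audioManager.isSpeakerphoneOn`.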

Tidying up after ourselves

When the call is over, we no longer need the audio focus and have to release it. But first, let’s restore the settings we saved at the beginning:

audioManager.mode = savedAudioMode
audioManager.isMicrophoneMute = savedIsMicrophoneMuted
audioManager.isSpeakerphoneOn = savedIsSpeakerOn
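The save and restore steps can be paired in one small type, so that restoring is a single call and no setting is forgotten. A minimal sketch, assuming a hypothetical `AudioSettings` interface that mirrors the `mode`, `isSpeakerphoneOn`, and `isMicrophoneMute` properties of `AudioManager` so that it runs on a plain JVM; in the app, its implementation would simply delegate to the real `AudioManager`.

```kotlin
// Hypothetical stand-in for the AudioManager properties we change during a call
interface AudioSettings {
    var mode: Int
    var isSpeakerphoneOn: Boolean
    var isMicrophoneMute: Boolean
}

// Immutable copy of the settings taken before the call starts
data class AudioSettingsSnapshot(
    val mode: Int,
    val isSpeakerphoneOn: Boolean,
    val isMicrophoneMute: Boolean
) {
    // Writes every saved value back in one go
    fun restoreTo(settings: AudioSettings) {
        settings.mode = mode
        settings.isSpeakerphoneOn = isSpeakerphoneOn
        settings.isMicrophoneMute = isMicrophoneMute
    }
}

// Captures the current values
fun AudioSettings.snapshot(): AudioSettingsSnapshot =
    AudioSettingsSnapshot(mode, isSpeakerphoneOn, isMicrophoneMute)

// In-memory implementation for trying the pattern out without Android
class FakeAudioSettings(
    override var mode: Int = 0,
    override var isSpeakerphoneOn: Boolean = false,
    override var isMicrophoneMute: Boolean = false
) : AudioSettings
```

Taking the snapshot when the call starts and calling `restoreTo` when it ends keeps the two steps symmetric.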

Now we can release the audio focus. Again, the implementation depends on the system version:

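if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
    // Must be the same AudioFocusRequest instance that was used to request focus
    audioManager.abandonAudioFocusRequest(getAudioFocusRequest())
} else {
    // Pass the same listener instance that was given to requestAudioFocus();
    // a new empty lambda would not match it
    @Suppress("DEPRECATION")
    audioManager.abandonAudioFocus(focusChangeListener)
}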


Bottom line

Great, we have implemented a smooth UX for switching between Android sound output devices in our app. The main advantage of this approach is that it is almost independent of the specific call implementation: whatever it is, the played audio is controlled by `AudioManager`, and that is exactly the level we work at!


Naseem from MobyTap, ‘You guys know what you’re doing.’

Our copywriter Nikita talked to Naseem, the CEO & Founder of MobyTap, a video review platform. On MobyTap, businesses can show how much they love their customers, and how much their customers love them back. Naseem came to us in 2016, and after 5 years of cooperation, he definitely has something to tell us!

Software Development Reviews – MobyTap

Was Fora Soft your first choice?

Amazingly, it was Vadim (the Sales Head at Fora Soft) who found me. I was on an online auction platform; I had lots of offers, and I had already chosen a company. But when Vadim approached me, I was like, okay, let’s chat. His personality and everything he did was professional. I thought, ‘even though we’ve signed the contract with somebody else, why not.’ He asked me to give Fora Soft an opportunity, and I did. And that’s how the ball got rolling.

So, it was Vadim who swooped in and won you with his personality, right?

Yeah, he’s shown me some examples of the work you’ve done, I was really impressed with that.

Can you share what it was like before and after working with us?

It was my first time developing an app. I’ve been in the recycling business for the past 15 years. So when the idea came to me, I thought recycling and video apps were worlds apart. But since my customers love what we do, video is the easiest way to get feedback and reviews. And I thought, yeah, let’s make an app. Sounds simple, but when you get into it, you realize that it’s not as simple.

Can you share any measurable figures, such as revenue, number of crashes, etc?

Not many crashes. It’s just been a continuous improvement. So all we’ve been doing is improving the app, making it better. And your team is fabulous. It was all about communication, and getting my message to you. You guys made life very easy, and that’s what the business is about. It took a couple of years but the job was well done, compared to the other company I had used. I literally had to cancel them and let you do the whole app.

Hands down, you guys know what you’re doing.

Those guys took a lot of time and messed up. And you guys clearly showed that you’re professional in what you do. It makes me very happy.

Thank you for your kind words. Were there any difficulties while working with us?

The only difficulty was that the technology wasn’t there, so we were doing things that were 4-5 years ahead. Things that Google was catching up with. We got stuck where we needed to make it simpler for the users to input the domain name of any company in the world. If you’re doing a review, the app would find where you’re located and find the local business you’re reviewing. Nobody in the world had done that. And Google, just that year, had finished their SDK for Android. We were stuck for 4-5 months. How do we get the whole world’s domain names into the app? Then the idea came. Well, Google’s doing an SDK, could we use it? And then your team said, “We could do it”. That saved several months of work.

You guys literally saved me a fortune by coming up with an idea of how to get the whole world’s domain names into the app.

And it works today! Because of you, guys.

Can you rate us in terms of professionalism, communication, and dedication?

Not even thinking about the score. It’s 10/10 straight up.

Do you have anything else to add?

The best thing with Fora Soft is aftercare.

You’ve raised the bar so high now, I expect it from every company.

The aftercare was so great, and the communication was amazing, it blew me away. Even in the UK here, in England, we don’t get that aftercare.

You guys looked after me and literally held my hand. Any issues that arose, you fixed them instantly, it’s like waving a magic wand. You don’t get that much in the tech world.

You’ve proved that you can find a good company online that will do what you want within your budget. Whether you have a small budget or a big one, you’ve managed it perfectly. I think it’s been 5 years now.

So, if somebody needs a video app, I instantly think of you and pass them your number and details.

Thank you very much! Even though recycling and video apps are, as you said, worlds apart, we do wish you all the best.

Got a project idea of your own? Maybe you’ve tried to bring it to fruition but are dissatisfied with the results? Contact us using the form on this website, and we’d be happy to review your case and offer the best solution.

Kindly follow us on Instagram as we share a lot of information regarding projects. You can also DM us if that’s your preferred method of communication!


How to Make Picture-in-Picture Mode on Android With Code Examples

This is what Picture-in-Picture mode looks like

In recent years, smartphones have come increasingly close to computers in terms of functionality, and many people already use them instead of a PC as their primary work tool. The advantage of personal computers was multi-window capability, which remained unavailable on smartphones. But with the release of Android 7.0, this began to change, and multi-window support appeared.

It’s hard to overestimate the convenience of a small floating window with the interlocutor’s video when the call is minimized: you can continue the conversation while taking notes or looking up some information. Android offers two options for this: running the application in a floating window and picture-in-picture mode. Ideally, an application should support both, but the floating window is more difficult to develop and imposes certain restrictions on the overall application design, so let’s consider picture-in-picture (PiP) on Android as a relatively simple way to bring multi-window support into your application.

PIP mode for video calls on Android

Switching to PIP mode

Picture-in-picture mode is supported on most devices running Android 8.0 (API 26) and above. Accordingly, if you support older system versions, all PIP-related calls should be wrapped in a system version check:

if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
    // Something related to PIP
}

PIP mode applies to the entire `Activity`, so first you need to declare PIP support for that `Activity` in `AndroidManifest.xml`:

<activity
    ...
    android:supportsPictureInPicture="true" />

Before using picture-in-picture, make sure that the user’s device supports this mode. To do this, we turn to the `PackageManager`:

val isPipSupported = context.packageManager.hasSystemFeature(PackageManager.FEATURE_PICTURE_IN_PICTURE)

After that, in its simplest form, the transition to picture-in-picture mode is done literally with one line:

this.enterPictureInPictureMode()

But you also need to decide when switching is convenient for the user. You can add a separate button that enters PIP when tapped. The most common approach, though, is to switch automatically when the user minimizes the app during a call. To track this event, there is a handy method, `Activity.onUserLeaveHint`, which is called whenever the user intentionally leaves the `Activity`, whether via the Home or the Recents button.

override fun onUserLeaveHint() {
    ...
    if (isPipSupported && imaginaryCallManager.isInCall)
        this.enterPictureInPictureMode()
}

Interface adaptation

Great, now our call screen automatically goes into PIP mode on Android! But the screen often has “end call” or “switch camera” buttons, and they will not work in this mode, so it’s better to hide them during the transition.

To track the transition to and from PIP mode, `Activity` and `Fragment` have the `onPictureInPictureModeChanged` method. Let’s override it and hide the unnecessary interface elements:

override fun onPictureInPictureModeChanged(
    isInPictureInPictureMode: Boolean,
    newConfig: Configuration
) {
    super.onPictureInPictureModeChanged(isInPictureInPictureMode, newConfig)
    // Hide the controls in PIP mode and show them again when returning to full screen
    setIsUiVisible(!isInPictureInPictureMode)
}

The PIP window is quite small, so it makes sense to hide everything except the interlocutor’s video, including the local user’s video: it would be too small to see anything there anyway.

How to implement picture-in-picture mode on Android app?

Customization

The PIP window can be further customized by passing `PictureInPictureParams` in a call to `enterPictureInPictureMode`. There are not many customization options, but the option to add buttons to the bottom of the window deserves special attention. This is a nice way to keep the screen interactive despite the fact that the regular buttons stop working in PIP mode.

The maximum number of buttons you can add depends on many factors, but you can always add at least three. Buttons over the limit simply won’t be shown, so it’s better to put the most important ones first. You can find out the exact limit for the current configuration through the `Activity` method `getMaxNumPictureInPictureActions()`:

this.maxNumPictureInPictureActions
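Since buttons over the limit are silently dropped, it can help to sort them by importance before building the `RemoteAction` list. A minimal sketch with a hypothetical `PipButton` type and priority scheme; only the limit itself would come from `maxNumPictureInPictureActions`.

```kotlin
// Hypothetical description of a PIP button; in the app it would become a RemoteAction
data class PipButton(val title: String, val priority: Int)

// Orders buttons by priority (lower number = more important) and keeps only as many
// as the system will show. coerceAtLeast guards against a negative limit.
fun visiblePipButtons(buttons: List<PipButton>, maxActions: Int): List<PipButton> =
    buttons.sortedBy { it.priority }.take(maxActions.coerceAtLeast(0))
```

With a limit of three, a list of "End call", "Mute", "Switch camera", and "Share" keeps the first three by priority, so "End call" is never the one that disappears.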

Let’s add an end call button to our PIP window. To start with, just like with notifications, we need a `PendingIntent`, which will tell our application that the button has been pressed. If this is the first time you’ve heard of `PendingIntent`, you can learn more about it in our previous article.

After that, we can create the button description itself, namely a `RemoteAction`:

val endCallPendingIntent = getPendingIntent()
val endCallAction = RemoteAction(
    // Button icon; its color will be ignored and replaced with the system one
    Icon.createWithResource(this, R.drawable.ic_baseline_call_end_24),
    // Button title, which will not be shown
    "End call",
    // ContentDescription for screen readers
    "End call button",
    // Our PendingIntent, which will be fired when the button is pressed
    endCallPendingIntent
)

Our “action” is ready; now we need to add it to the PIP parameters and then pass them to the mode transition call. Let’s start by creating a Builder for our customization parameters:

val pipParams = PictureInPictureParams.Builder()
    .setActions(listOf(endCallAction))
    .build()

this.enterPictureInPictureMode(pipParams)
How to customize picture-in-picture mode?

In addition to the buttons, the parameters let you set the aspect ratio of the PIP window (`setAspectRatio`) and a source rectangle hint used for the animation of switching to this mode (`setSourceRectHint`).


Conclusion

We have considered a fairly simple but very handy way of using the multi-window feature to improve the user experience, learned how to add buttons to the PIP window on Android, and adapted our interface when switching to and from this mode.


WebRTC in Android


Briefly about WebRTC

WebRTC is a video chat and conferencing development technology. It allows you to create a peer-to-peer connection between mobile devices and browsers to transmit media streams. You can find more details on how it works and its general principles in our article about WebRTC in plain language.

2 ways to implement video communication with WebRTC on Android

  • The easiest and fastest option is to use one of the many commercial solutions, such as Twilio or LiveSwitch. They provide their own SDKs for various platforms and implement the functionality out of the box, but they have drawbacks: they are paid, and you are limited to the features they provide rather than any feature you can think of.
  • Another option is to use one of the existing libraries. This approach requires more code but will save you money and give you more flexibility in implementing features. In this article, we will look at the second option and use the native WebRTC library for Android (https://webrtc.github.io/webrtc-org/native-code/android/).

Creating a connection

Creating a WebRTC connection consists of two steps: 

  1. Establishing a logical connection – devices must agree on the data format, codecs, etc.
  2. Establishing a physical connection – devices must know each other’s addresses

Note that while the connection is being initiated, devices exchange data through a signaling mechanism. It can be any channel for transmitting data, such as sockets.
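Because the signaling channel is left to the developer, it helps to define the messages it carries. A minimal sketch with hypothetical message types and a deliberately naive text format (in practice, JSON over a WebSocket is a common choice). The three candidate fields mirror the `sdpMid`, `sdpMLineIndex`, and `sdp` fields of the library’s `IceCandidate`.

```kotlin
// Hypothetical signaling messages; the transport (sockets, etc.) is up to you
sealed class SignalingMessage {
    data class Offer(val sdp: String) : SignalingMessage()
    data class Answer(val sdp: String) : SignalingMessage()
    data class Candidate(val sdpMid: String, val sdpMLineIndex: Int, val sdp: String) : SignalingMessage()
}

// Naive text encoding: the type, then the fields, separated by '|'.
// Assumes only the last field may contain '|'; SDP text itself is kept as-is.
fun encode(message: SignalingMessage): String = when (message) {
    is SignalingMessage.Offer -> "offer|${message.sdp}"
    is SignalingMessage.Answer -> "answer|${message.sdp}"
    is SignalingMessage.Candidate ->
        "candidate|${message.sdpMid}|${message.sdpMLineIndex}|${message.sdp}"
}

fun decode(text: String): SignalingMessage {
    val type = text.substringBefore('|')
    val body = text.substringAfter('|')
    return when (type) {
        "offer" -> SignalingMessage.Offer(body)
        "answer" -> SignalingMessage.Answer(body)
        "candidate" -> {
            // Split into at most 3 parts so the candidate string stays intact
            val parts = body.split('|', limit = 3)
            require(parts.size == 3) { "Malformed candidate: $text" }
            SignalingMessage.Candidate(parts[0], parts[1].toInt(), parts[2])
        }
        else -> throw IllegalArgumentException("Unknown message type: $type")
    }
}
```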

Suppose we want to establish a video connection between two devices. To do this we need to establish a logical connection between them.

A logical connection

A logical connection is established using the Session Description Protocol (SDP). To do this, one peer:

  1. Creates a PeerConnection object.
  2. Forms an SDP offer, which contains data about the upcoming session, and sends it to the interlocutor using the signaling mechanism.

val peerConnectionFactory: PeerConnectionFactory
lateinit var peerConnection: PeerConnection

fun createPeerConnection(iceServers: List<PeerConnection.IceServer>) {
  val rtcConfig = PeerConnection.RTCConfiguration(iceServers)
  peerConnection = peerConnectionFactory.createPeerConnection(
      rtcConfig,
      object : PeerConnection.Observer {
          ...
      }
  )!!
}

fun sendSdpOffer() {
  peerConnection.createOffer(
      object : SdpObserver {
          override fun onCreateSuccess(sdpOffer: SessionDescription) {
              peerConnection.setLocalDescription(sdpObserver, sdpOffer)
              signaling.sendSdpOffer(sdpOffer)
          }

          ...

      }, MediaConstraints()
  )
}

In turn, the other peer:

  1. Also creates a PeerConnection object.
  2. Receives the SDP offer sent by the first peer through the signaling mechanism and stores it.
  3. Forms an SDP answer and sends it back, also using the signaling mechanism.

// Saving the received SDP offer
fun onSdpOfferReceive(sdpOffer: SessionDescription) {
  peerConnection.setRemoteDescription(sdpObserver, sdpOffer)
  sendSdpAnswer()
}

// Forming and sending the SDP answer
fun sendSdpAnswer() {
  peerConnection.createAnswer(
      object : SdpObserver {
          override fun onCreateSuccess(sdpAnswer: SessionDescription) {
              peerConnection.setLocalDescription(sdpObserver, sdpAnswer)
              signaling.sendSdpAnswer(sdpAnswer)
          }
          …
      }, MediaConstraints()
  )
}

The first peer, having received the SDP answer, stores it. Nothing is sent back from this side:

fun onSdpAnswerReceive(sdpAnswer: SessionDescription) {
  peerConnection.setRemoteDescription(sdpObserver, sdpAnswer)
}

After successful exchange of SessionDescription objects, the logical connection is considered established. 
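The order of operations is easy to get wrong, in particular the rule that only the answering side creates an answer. A simplified in-memory model of the exchange makes the sequence explicit and testable. It uses hypothetical `ModelPeer` and `Description` types, not the library’s `PeerConnection`, and skips the asynchronous `SdpObserver` callbacks.

```kotlin
// Simplified model of the SDP offer/answer exchange; not the WebRTC library API
enum class SdpType { OFFER, ANSWER }

data class Description(val type: SdpType, val sdp: String)

class ModelPeer(private val name: String) {
    var localDescription: Description? = null
        private set
    var remoteDescription: Description? = null
        private set

    // Caller side: create an offer and keep it as the local description
    fun createOffer(): Description =
        Description(SdpType.OFFER, "offer from $name").also { localDescription = it }

    // Callee side: store the offer, then create and keep the answer
    fun acceptOffer(offer: Description): Description {
        require(offer.type == SdpType.OFFER) { "Expected an offer" }
        remoteDescription = offer
        return Description(SdpType.ANSWER, "answer from $name").also { localDescription = it }
    }

    // Caller side: store the answer; nothing is sent back
    fun acceptAnswer(answer: Description) {
        require(answer.type == SdpType.ANSWER) { "Expected an answer" }
        check(localDescription?.type == SdpType.OFFER) { "No offer was sent" }
        remoteDescription = answer
    }

    // The logical connection is set up once both descriptions are in place
    val isNegotiated: Boolean
        get() = localDescription != null && remoteDescription != null
}
```

In the real code, each assignment corresponds to a `setLocalDescription` or `setRemoteDescription` call, and the returned descriptions travel over the signaling channel.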

Physical connection 

We now need to establish the physical connection between the devices, which is most often a non-trivial task. Typically, devices on the Internet do not have public addresses, since they are located behind routers and firewalls. To solve this problem WebRTC uses ICE (Interactive Connectivity Establishment) technology.

STUN and TURN servers are an important part of ICE. They serve one purpose: to establish connections between devices that do not have public addresses.

STUN server

A device makes a request to a STUN server and receives its public address in response. Then, using the signaling mechanism, it sends this address to the interlocutor. After the interlocutor does the same, the devices know each other’s network location and are ready to transmit data to each other.

TURN server

In some cases, the router may have a “Symmetric NAT” restriction, which doesn’t allow a direct connection between the devices. In this case, a TURN server is used. It serves as an intermediary, and all data goes through it. Read more in Mozilla’s WebRTC documentation.
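Each ICE candidate records how it was obtained: `host` for a local interface address, `srflx` (server reflexive) for a public address learned from a STUN server, and `relay` for an address allocated on a TURN server. The type follows the `typ` keyword in the candidate string (the `sdp` field of `IceCandidate`), so a small helper can tell, for example, whether a call is going through TURN. A sketch; the helper names are hypothetical.

```kotlin
// Extracts the candidate type that follows the "typ" keyword in an ICE candidate line,
// e.g. "host", "srflx", "prflx", or "relay". Returns null if it is missing.
fun candidateType(candidate: String): String? {
    val tokens = candidate.trim().split(Regex("\\s+"))
    val index = tokens.indexOf("typ")
    return if (index >= 0 && index + 1 < tokens.size) tokens[index + 1] else null
}

// True for candidates allocated on a TURN server, i.e. the traffic would be relayed
fun isRelayed(candidate: String): Boolean = candidateType(candidate) == "relay"
```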

As we have seen, STUN and TURN servers play an important role in establishing a physical connection between devices. This is why we pass a list of available ICE servers when creating the PeerConnection object.

To establish a physical connection, one peer generates ICE candidates (objects containing information about how the device can be found on the network) and sends them to the other peer via the signaling mechanism:

lateinit var peerConnection: PeerConnection

fun createPeerConnection(iceServers: List<PeerConnection.IceServer>) {

  val rtcConfig = PeerConnection.RTCConfiguration(iceServers)

  peerConnection = peerConnectionFactory.createPeerConnection(
      rtcConfig,
      object : PeerConnection.Observer {
          override fun onIceCandidate(iceCandidate: IceCandidate) {
              signaling.sendIceCandidate(iceCandidate)
          }
          …
      }
  )!!
}

Then the second peer receives the first peer’s ICE candidates via the signaling mechanism and adds them to its PeerConnection. It also generates its own ICE candidates and sends them back:

fun onIceCandidateReceive(iceCandidate: IceCandidate) {
  peerConnection.addIceCandidate(iceCandidate)
}
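In practice, ICE candidates can arrive through signaling before the remote description has been set, and adding a candidate at that point typically fails. A common workaround is to buffer them until `setRemoteDescription` succeeds. A generic sketch, assuming a hypothetical `PendingCandidates` helper; the `add` callback would wrap `peerConnection.addIceCandidate`.

```kotlin
// Buffers ICE candidates until the remote description is set, then forwards them.
// T stands in for IceCandidate; `add` would call peerConnection.addIceCandidate.
class PendingCandidates<T>(private val add: (T) -> Unit) {
    private val pending = mutableListOf<T>()
    private var isRemoteDescriptionSet = false

    // Call for every candidate received via signaling
    @Synchronized
    fun onCandidate(candidate: T) {
        if (isRemoteDescriptionSet) add(candidate) else pending += candidate
    }

    // Call from the SdpObserver once setRemoteDescription succeeds
    @Synchronized
    fun onRemoteDescriptionSet() {
        isRemoteDescriptionSet = true
        pending.forEach(add)
        pending.clear()
    }
}
```

The methods are synchronized because signaling messages and `SdpObserver` callbacks may arrive on different threads.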

Now that the peers have exchanged their addresses, you can start transmitting and receiving data.

Receiving data

After establishing the logical and physical connections with the interlocutor, the library calls the onAddTrack callback and passes it the MediaStream objects containing the interlocutor’s VideoTrack and AudioTrack:

fun createPeerConnection(iceServers: List<PeerConnection.IceServer>) {

   val rtcConfig = PeerConnection.RTCConfiguration(iceServers)

   peerConnection = peerConnectionFactory.createPeerConnection(
       rtcConfig,
       object : PeerConnection.Observer {

           override fun onIceCandidate(iceCandidate: IceCandidate) { … }

           override fun onAddTrack(
               rtpReceiver: RtpReceiver?,
               mediaStreams: Array<out MediaStream>
           ) {
               onTrackAdded(mediaStreams)
           }
           … 
       }
   )!!
}

Next, we must retrieve the VideoTrack from the MediaStream and display it on the screen. 

private fun onTrackAdded(mediaStreams: Array<out MediaStream>) {
   val videoTrack: VideoTrack? = mediaStreams.mapNotNull {                                                            
       it.videoTracks.firstOrNull() 
   }.firstOrNull()

   displayVideoTrack(videoTrack)

   … 
}

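To display a VideoTrack, you need to pass it an object that implements the VideoSink interface. For this purpose, the library provides the SurfaceViewRenderer class. Before it can draw frames, the renderer has to be initialized with an EGL context, e.g. `surfaceViewRenderer.init(eglBase.eglBaseContext, null)`, where `eglBase` is an `EglBase` instance shared by your video components.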

fun displayVideoTrack(videoTrack: VideoTrack?) {
   videoTrack?.addSink(binding.surfaceViewRenderer)
}

To hear the interlocutor, we don’t need to do anything extra: the library does everything for us. Still, if we want to fine-tune the sound, we can get the AudioTrack object and use it to control the interlocutor’s audio:

var audioTrack: AudioTrack? = null
private fun onTrackAdded(mediaStreams: Array<out MediaStream>) {
   … 

   audioTrack = mediaStreams.mapNotNull { 
       it.audioTracks.firstOrNull() 
   }.firstOrNull()
}

For example, we could mute the interlocutor like this:

fun muteAudioTrack() {
   // audioTrack is nullable, so use a safe call
   audioTrack?.setEnabled(false)
}

Sending data

Sending video and audio from your device follows the same pattern: we create a PeerConnection object and then use the signaling mechanism to exchange SDP packets and ICE candidates. The difference is the source of the media. Instead of receiving a media stream from the library, we capture one on our device (a MediaStream containing an AudioTrack and a VideoTrack) and hand it to the library, which passes it on to the interlocutor. So, before creating the SDP offer, we must first create that MediaStream.

fun createLocalConnection() {

   localPeerConnection = peerConnectionFactory.createPeerConnection(
       rtcConfig,
       object : PeerConnection.Observer {
            ...
       }
   )!!

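   val localMediaStream = getLocalMediaStream() ?: return
   // addStream() is not available with Unified Plan SDP semantics, so each track is added
   // separately; "user" is the same id that was passed to createLocalMediaStream()
   val streamIds = listOf("user")
   localMediaStream.audioTracks.forEach { localPeerConnection.addTrack(it, streamIds) }
   localMediaStream.videoTracks.forEach { localPeerConnection.addTrack(it, streamIds) }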

   localPeerConnection.createOffer(
       object : SdpObserver {
            ...
       }, MediaConstraints()
   )
}

Now we need to create a MediaStream object and add the AudioTrack and VideoTrack objects to it:

val context: Context
private fun getLocalMediaStream(): MediaStream? {
   val stream = peerConnectionFactory.createLocalMediaStream("user")

   val audioTrack = getLocalAudioTrack()
   stream.addTrack(audioTrack)

   val videoTrack = getLocalVideoTrack(context)
   stream.addTrack(videoTrack)

   return stream
}

Creating the audio track:

private fun getLocalAudioTrack(): AudioTrack {
   val audioConstraints = MediaConstraints()
   val audioSource = peerConnectionFactory.createAudioSource(audioConstraints)
   return peerConnectionFactory.createAudioTrack("user_audio", audioSource)
}

Creating the VideoTrack is a bit more involved. First, we get the list of the device’s cameras and pick the front-facing one, falling back to the first available camera:

lateinit var capturer: CameraVideoCapturer

private fun getLocalVideoTrack(context: Context): VideoTrack {
   val cameraEnumerator = Camera2Enumerator(context)
   val camera = cameraEnumerator.deviceNames.firstOrNull {
       cameraEnumerator.isFrontFacing(it)
   } ?: cameraEnumerator.deviceNames.first()
   
   ...

}

Next, create a CameraVideoCapturer object, which will capture the image:

private fun getLocalVideoTrack(context: Context): VideoTrack {

   ...


   capturer = cameraEnumerator.createCapturer(camera, null)
   val surfaceTextureHelper = SurfaceTextureHelper.create(
       "CaptureThread",
       EglBase.create().eglBaseContext
   )
   val videoSource =
       peerConnectionFactory.createVideoSource(capturer.isScreencast)
   capturer.initialize(surfaceTextureHelper, context, videoSource.capturerObserver)

   ...

}

Now, after getting the CameraVideoCapturer, start capturing the image and add the resulting track to the MediaStream:

private fun getLocalMediaStream(): MediaStream? {
  ...

  val videoTrack = getLocalVideoTrack(context)
  stream.addTrack(videoTrack)

  return stream
}

private fun getLocalVideoTrack(context: Context): VideoTrack {
    ...

  // Width, height, and frame rate; 1280x720 is the standard 720p resolution
  capturer.startCapture(1280, 720, 30)

  return peerConnectionFactory.createVideoTrack("user0_video", videoSource)

}

After creating a MediaStream and adding it to the PeerConnection, the library forms an SDP offer, and the SDP packet exchange described above takes place through the signaling mechanism. When this process is complete, the interlocutor will begin to receive our video stream. Congratulations, at this point the connection is established.

Many to Many

We have considered a one-to-one connection. WebRTC also allows you to create many-to-many connections. In its simplest form, this is done exactly like a one-to-one connection, except that a PeerConnection object is created, and SDP packets and ICE candidates are exchanged, separately for each participant. This approach has disadvantages:

  • The device is heavily loaded because it needs to send the same data stream to each interlocutor
  • The implementation of additional features such as video recording, transcoding, etc. is difficult or even impossible

In this case, WebRTC can be used together with a media server that takes care of these tasks. On the client side, the process is the same as for a direct connection to the other participants’ devices, except that the media stream is sent only to the media server instead of to every participant. The media server then retransmits it to the other participants.
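The difference in load is easy to quantify. Below is a minimal Kotlin sketch. It assumes every participant sends one stream at the same bitrate and ignores signaling and protocol overhead. The function names and the 1,500 kbps figure below are illustrative, not from the article:

```kotlin
// Peer connections one participant maintains in a full mesh: one per other participant.
fun meshConnectionsPerParticipant(participants: Int): Int {
    require(participants >= 1) { "need at least one participant" }
    return participants - 1
}

// Upload bandwidth one participant needs, in kbps.
// Mesh: the same stream is sent separately to every remote peer.
// Media server: the stream is sent once, and the server forwards it to everyone else.
fun uploadKbps(participants: Int, streamKbps: Int, viaMediaServer: Boolean): Int =
    if (viaMediaServer) streamKbps
    else meshConnectionsPerParticipant(participants) * streamKbps

// Total number of peer connections in the whole call.
fun totalConnections(participants: Int, viaMediaServer: Boolean): Int =
    if (viaMediaServer) participants            // each client connects to the server once
    else participants * (participants - 1) / 2  // every pair of participants has its own connection
```

For example, in a five-person call at 1,500 kbps per stream, each mesh participant uploads 6,000 kbps and the call needs 10 peer connections. With a media server, each participant uploads 1,500 kbps over a single connection.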

Conclusion

We have considered the simplest way to create a WebRTC connection on Android. If something is still unclear after reading, go through all the steps again and try to implement them yourself: once you have grasped the key points, using this technology in practice will not be a problem.

You can also refer to the following resources for a better understanding of WebRTC:

WebRTC documentation by Mozilla

Fora Soft article on WebRTC in simple terms

Fora Soft article on WebRTC security

Software Development for In-Sync Music Jamming Online by Video Chat

Musicians can survive the pandemic with WorldCastLive.com. The band connects in a video call, invites fans to watch, and jams online remotely at a live concert. Everything is 100% in sync, with less than a second of delay.

Features

Use cases

Devices

How much?

Features for a virtual music jam over video chat

🎶 Audio quality

Synchronization

Why not make music online with friends and strangers in any video chat? I take my guitar, call Joe on drums, add Sarah on piano, and we all play. The answer is that it doesn’t work: the participants’ sound is not in perfect sync. That’s fine for a conversation, but not for a real-time music collaboration app.

We develop video conferences for musicians to play together, learn and teach, and hold concerts.

Sync for listeners

Each musician produces an audio track. Our software marks the tracks, detects the delay of each one, and syncs them into a single audio track on the server. That track is streamed to the audience.
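The server-side alignment can be sketched as shifting each track back by its measured delay and summing the results. This is a minimal illustration that assumes the delays have already been measured in samples and that all tracks share one sample rate. `Track` and `alignAndMix` are hypothetical names, and a production mixer would also handle resampling and chunked streaming:

```kotlin
// One musician's audio as received by the server: float PCM samples, plus how many
// samples late this track arrived compared with the reference (the earliest track).
class Track(val samples: FloatArray, val delaySamples: Int)

// Removes each track's measured delay, then sums the aligned tracks into one mix.
fun alignAndMix(tracks: List<Track>): FloatArray {
    // Length of the mix once every track has been shifted back by its delay.
    val length = tracks.maxOfOrNull { maxOf(0, it.samples.size - it.delaySamples) } ?: 0
    val mix = FloatArray(length)
    for (track in tracks) {
        for (i in 0 until length) {
            val src = i + track.delaySamples // skip the first delaySamples samples
            if (src in track.samples.indices) mix[i] += track.samples[src]
        }
    }
    // Hard-limit the sum to [-1, 1], the valid range for float PCM.
    for (i in mix.indices) mix[i] = mix[i].coerceIn(-1f, 1f)
    return mix
}
```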

But if this happens afterward on the server, how can the musicians perform together? They have to hear each other in sync right now to play together.

Sync for the musicians

We calibrate the audio tracks: we play a sound at one node and detect it when it reaches another node, which measures the delay between them. Let’s imagine we have a drummer, a guitarist, and a singer. The drummer starts, and his audio track goes to the guitarist. The guitarist starts playing along, and both audio tracks reach the singer in sync: the drummer’s track is delayed by the same amount it took to reach the guitarist. It snowballs from there: each musician hears only the musicians before them. The singer hears the guitarist and the drummer, the guitarist hears only the drummer, and the drummer hears no one.
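The arithmetic behind this chain can be sketched as follows. This is an illustration under stated assumptions, not Fora Soft’s implementation. It assumes calibration has produced two kinds of figures in milliseconds: the one-way latency between neighbouring musicians, and the direct latency from each earlier musician to the listener. `performanceOffsets` and `extraDelayMs` are hypothetical names:

```kotlin
// hopLatencyMs[i] = measured one-way latency from musician i to musician i + 1.
// Returns how far behind the first musician (the drummer) each musician effectively plays.
fun performanceOffsets(hopLatencyMs: List<Int>): List<Int> =
    hopLatencyMs.runningFold(0) { acc, hop -> acc + hop }

// Extra buffering the listener must add to an earlier musician's track so that it lines up
// with what the listener's immediate predecessor played.
// directLatencyMs = measured latency of that track from the source to the listener.
fun extraDelayMs(offsets: List<Int>, source: Int, listener: Int, directLatencyMs: Int): Int {
    require(source < listener) { "a musician only hears earlier musicians" }
    val needed = offsets[listener] - offsets[source]
    require(directLatencyMs <= needed) { "track arrives too late to be aligned" }
    return needed - directLatencyMs
}
```

Suppose the drummer-to-guitarist hop takes 40 ms and the guitarist-to-singer hop takes 30 ms. Then the singer effectively plays 70 ms behind the drummer. If the drummer’s track reaches the singer directly in 50 ms, it has to be held back another 20 ms to line up with the guitar.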

Clear sound: the right audio codec with the right settings

In WebRTC the developer picks an audio codec. Some are better for voice, some for music. Choose Opus: the best sound quality plus low latency – Mozilla thinks so too.

Picking Opus is not enough. By default, WebRTC is tuned for voice calls, to make speech sound clearer and louder. So, for a real-time online music jam, we make 3 adjustments:

  • Background noise removal distorts sound when playing music. We switch it off.
  • ~40 kb/s is a typical bitrate for a voice call, but music needs at least 128 kb/s. Opus supports up to 510 kb/s, so we raise the bitrate.
  • We increase the number of audio channels from 1 to 2: from mono to stereo.
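The bitrate and channel changes can be expressed through Opus parameters in the SDP (RFC 7587). `maxaveragebitrate` is set in bits per second, and `stereo=1` / `sprop-stereo=1` request two channels. Noise suppression is an audio-processing setting, not an SDP parameter, so it is switched off separately. The sketch below shows one common way to apply the SDP changes, not necessarily how Fora Soft does it. It assumes CRLF line endings and an `a=fmtp` line for Opus already generated by WebRTC, and `tuneOpusForMusic` is a hypothetical name:

```kotlin
// Matches the Opus mapping line, e.g. "a=rtpmap:111 opus/48000/2", and captures the payload type.
private val OPUS_RTPMAP = Regex("""^a=rtpmap:(\d+) opus/48000/2$""", RegexOption.IGNORE_CASE)

// Adds music-friendly Opus parameters: stereo in both directions and a higher
// average bitrate (in bits per second). Other SDP lines are left untouched.
fun tuneOpusForMusic(sdp: String, bitrateBps: Int = 128_000): String {
    val lines = sdp.split("\r\n")
    val opusPt = lines.firstNotNullOfOrNull { OPUS_RTPMAP.find(it)?.groupValues?.get(1) }
        ?: return sdp // no Opus in this description: leave it unchanged
    val fmtpPrefix = "a=fmtp:$opusPt "
    val wanted = mapOf(
        "stereo" to "1",
        "sprop-stereo" to "1",
        "maxaveragebitrate" to bitrateBps.toString(),
    )
    return lines.joinToString("\r\n") { line ->
        if (!line.startsWith(fmtpPrefix)) {
            line
        } else {
            // Keep the existing key=value pairs in order, then add or override ours.
            val params = LinkedHashMap<String, String>()
            for (pair in line.removePrefix(fmtpPrefix).split(";")) {
                val parts = pair.trim().split("=", limit = 2)
                if (parts[0].isNotEmpty()) params[parts[0]] = parts.getOrElse(1) { "" }
            }
            params.putAll(wanted)
            fmtpPrefix + params.entries.joinToString(";") { "${it.key}=${it.value}" }
        }
    }
}
```

A function like this is typically applied to the locally created SDP before it is set as the local description and sent to the other side.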

🚀 Real-time streaming

Jam online with no latency

Subsecond latency is the norm in video chats: otherwise, people could not talk to each other. In video broadcasts to thousands of viewers, a latency of a few seconds is the norm. When remote musicians jam online over video chat, latency must stay subsecond even though thousands are watching. Read how we do it in the article.

Monitoring to prevent latency: Internet connection, sound card, audio output

See the sound quality of each musician in real-time: green for good, yellow for decent, and red for unacceptable. Quality parameters: Internet connection, sound card latency, audio output.

For example, if one participant has a slow Internet connection, the video conference won’t be real-time, and low-latency music collaboration isn’t possible. So that participant’s Internet indicator turns red, and they know they need to fix the problem.
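The traffic-light indicator boils down to comparing each measured parameter with two thresholds and showing the worst result. A minimal sketch follows. The threshold values used with it are invented for illustration and would have to be tuned per product, and all names are hypothetical:

```kotlin
enum class Quality { GREEN, YELLOW, RED } // declared from best to worst

// Two cut-off points for one parameter where a higher value is worse (e.g. latency in ms).
data class Thresholds(val yellowAt: Double, val redAt: Double)

fun classify(value: Double, limits: Thresholds): Quality = when {
    value >= limits.redAt -> Quality.RED
    value >= limits.yellowAt -> Quality.YELLOW
    else -> Quality.GREEN
}

// A musician's overall indicator is the worst of the individual parameters
// (Internet connection, sound card latency, audio output).
fun overall(vararg parts: Quality): Quality = parts.maxOrNull() ?: Quality.GREEN
```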

🎸 Connect professional musical equipment

Output: sound cards and audio interfaces

For sound output, connect a sound card or an audio interface for professional sound. A volume bar for that output shows the level in real time.

Input: professional microphones, musical instruments, and amplifiers

For sound input, connect a guitar or another electronic instrument directly to the video conference. Or connect the instrument to an amplifier to boost the signal, and plug the amplifier into the conference. See the input signal level change in real time.

🥁 Professional tools for musicians

Crossfader

Set different volumes for different audio channels. Make your own instrument louder than the call, set it to the same volume, or listen to the call louder than your instrument. Mute audio channels.

See the volume set for each instrument in the online jam. The range is from -12 dB to +12 dB, in steps of 1 dB.
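Fader positions in dB are usually turned into a sample multiplier with the standard amplitude formula gain = 10^(dB/20). At +6 dB this roughly doubles the amplitude, and at -12 dB it cuts it to about a quarter. A minimal sketch, assuming float PCM buffers; the function names are hypothetical:

```kotlin
import kotlin.math.pow

const val MIN_DB = -12
const val MAX_DB = 12

// Converts a fader position in dB to a linear amplitude multiplier: 10^(dB / 20).
fun dbToGain(db: Int): Double {
    require(db in MIN_DB..MAX_DB) { "fader range is $MIN_DB..$MAX_DB dB" }
    return 10.0.pow(db / 20.0)
}

// Applies one channel's fader (or mute) to a buffer of float PCM samples.
fun applyFader(samples: FloatArray, db: Int, muted: Boolean = false): FloatArray {
    val gain = if (muted) 0.0 else dbToGain(db)
    return FloatArray(samples.size) { i -> (samples[i] * gain).toFloat() }
}
```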

Equalizer

A set of sliders raises or lowers the volume of each frequency band: 32 Hz, 64 Hz, 125 Hz, 250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz, 8 kHz, and 16 kHz. Change how everything sounds: volume, noise, and the effect of moving closer or farther away. Save the settings and apply them to other calls.
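The article doesn’t say how the equalizer is built. One common approach, shown here only as an assumption, is one peaking biquad filter per band, with coefficients from Robert Bristow-Johnson’s Audio EQ Cookbook. The Q of 1.41 (about one octave wide) and the names are illustrative choices:

```kotlin
import kotlin.math.PI
import kotlin.math.cos
import kotlin.math.pow
import kotlin.math.sin

// Centre frequencies of the 10 bands listed above, in Hz.
val EQ_BANDS_HZ = listOf(32, 64, 125, 250, 500, 1_000, 2_000, 4_000, 8_000, 16_000)

// Biquad coefficients normalised so that a0 = 1.
data class Biquad(val b0: Double, val b1: Double, val b2: Double, val a1: Double, val a2: Double)

// Peaking-EQ coefficients (RBJ Audio EQ Cookbook) for one band.
fun peakingEq(centerHz: Double, gainDb: Double, sampleRate: Double, q: Double = 1.41): Biquad {
    val a = 10.0.pow(gainDb / 40.0)
    val w0 = 2 * PI * centerHz / sampleRate
    val alpha = sin(w0) / (2 * q)
    val cosW0 = cos(w0)
    val a0 = 1 + alpha / a
    return Biquad(
        b0 = (1 + alpha * a) / a0,
        b1 = (-2 * cosW0) / a0,
        b2 = (1 - alpha * a) / a0,
        a1 = (-2 * cosW0) / a0,
        a2 = (1 - alpha / a) / a0,
    )
}

// Runs one band's filter over a buffer (Direct Form I).
fun applyBand(filter: Biquad, input: DoubleArray): DoubleArray {
    var x1 = 0.0; var x2 = 0.0; var y1 = 0.0; var y2 = 0.0
    return DoubleArray(input.size) { i ->
        val x0 = input[i]
        val y0 = filter.b0 * x0 + filter.b1 * x1 + filter.b2 * x2 - filter.a1 * y1 - filter.a2 * y2
        x2 = x1; x1 = x0; y2 = y1; y1 = y0
        y0
    }
}
```

A 10-band equalizer runs ten such filters in series, one per slider. At 0 dB the numerator and denominator coefficients are equal, so a band left at 0 dB passes the signal unchanged.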

Metronome

Add a metronome and choose BPM for better music collaboration.
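The click interval follows directly from the tempo: 60,000 ms divided by the BPM, so 120 BPM gives 500 ms. The sketch below also computes where clicks fall inside an audio buffer, so the click can be mixed into the audio sample-accurately. The names and this buffer-based design are assumptions:

```kotlin
import kotlin.math.ceil
import kotlin.math.roundToLong

// Time between two clicks at a given tempo: 60,000 ms per minute divided by beats per minute.
fun clickIntervalMs(bpm: Double): Double {
    require(bpm > 0) { "BPM must be positive" }
    return 60_000.0 / bpm
}

// Positions (in samples since the start of the session) of the clicks that fall inside
// one audio buffer [startSample, startSample + length). Placing clicks by sample count
// ties the metronome to the audio clock rather than to a UI timer.
fun clickPositions(bpm: Double, sampleRate: Int, startSample: Long, length: Int): List<Long> {
    require(bpm > 0 && sampleRate > 0 && length >= 0)
    val samplesPerBeat = sampleRate * 60.0 / bpm
    val end = startSample + length
    val result = mutableListOf<Long>()
    var beat = ceil(startSample / samplesPerBeat).toLong()
    while (true) {
        val position = (beat * samplesPerBeat).roundToLong()
        if (position >= end) break
        if (position >= startSample) result += position
        beat++
    }
    return result
}
```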

💬 Communication tools

Talkback

The band coordinator gives feedback to musicians during the live concert. Speak privately to one musician without disturbing the others. Push to talk; release the button to mute yourself.

Text chat

Let the audience talk without interrupting the performance. Send text messages, emojis, images, even documents. See the participant list.

Recording

Record the concert and let those who haven’t seen it live watch it. Record lessons to re-watch.

Use cases when a video conference with music in sync comes in handy

  • A platform for live online virtual concerts
  • Online band or choir rehearsal with remote musicians from different locations
  • E-learning for music
  • Virtual karaoke party online

Devices that Fora Soft develops for

  • Web browsers – use without download
  • Smartphones and tablets – iOS and Android
  • Desktop PCs and laptops
  • Smart TVs – Samsung, LG, Android-based STBs, Apple TV
  • Virtual reality (VR) headsets

💵 How much it costs to develop a conference with synchronized music

We develop custom applications tailored to your needs. That is why we first plan the project and draw wireframes, and only then estimate it. Approximate figures:

The simplest 1-on-1 video chat component adjusted for music 

  • 2-4 weeks 
  • About $7,000
  • Could be useful for teaching music lessons

It is not a fully functioning system with login, payment, etc. – just the video chat component. You can integrate it into your application. 

The simplest video conference component for musicians from different places to perform for audiences of thousands of people

  • 1.5-2.5 months
  • around $28,000

Not a fully functional system with registration, payments, etc. – just the video conference component. You can integrate it into your solution.

Simplest fully functional e-learning system with 1-on-1 video chat adjusted for music 

  • 3-4 months 
  • About $36,000

A fully functioning system with registration, a teacher list, and payments, for one platform, e.g. web, iOS, or Android.

The simplest fully functional video conference system with music in sync

  • 4-5 months 
  • around $54,000

It is built from the ground up for one platform, such as web, iOS, or Android. Users register, pay, and play music together for audiences of thousands of people.

Big musical video conferencing solutions 

We assign a dedicated team and work on an ongoing basis. These are products that have already proven successful and generate profit.

Send us your requirements – we’ll get back with an approximate estimation. Or let’s have a call to clarify what you need.

WebRTC Security in Plain Language for Business People

Let’s say you are a businessman and you want to develop a video conference or add a video chat to your program. How do you know that what the developer has built is safe? What kind of protection can you promise your users? There are many articles on the subject, but they are technical, and it’s hard to figure out the specifics of security from them. Let’s explain it in simple words.

WebRTC security measures consist of 3 parts: those offered by WebRTC, those provided by the browser, and those programmed by the developer. Let’s discuss the measures of each kind, how they are circumvented (WebRTC security vulnerabilities), and how to protect against that.

What is WebRTC?

WebRTC – Web Real-Time Communications – is an open standard that describes the transmission of streaming audio, video, and content between browsers or other supporting applications in real-time.

WebRTC is an open-source project, so anyone can do WebRTC code security testing, like here.

WebRTC works on a wide range of Internet-connected devices:

  • in all major browsers
  • in applications for mobile devices – e.g. iOS, Android
  • on desktop applications for computers – e.g., Windows and Mac
  • on smartwatches
  • on smart TV
  • on virtual reality helmets

To make WebRTC work on these different devices, the WebRTC library was created.

What kind of security does WebRTC offer?

Data encryption other than audio and video: DTLS

The WebRTC library incorporates the DTLS protocol. DTLS stands for Datagram Transport Layer Security. It encrypts data in transit, and it is also used to exchange the keys for encrypted audio and video. Here you can find the official DTLS documentation from the IETF (Internet Engineering Task Force).

DTLS does not need to be enabled or configured beforehand because it is built in. The video application developer doesn’t need to do anything – DTLS in WebRTC works by default.

DTLS is a version of the Transport Layer Security (TLS) protocol adapted for datagrams, i.e. UDP traffic, and it relies on asymmetric encryption. Let’s take the example of a paper letter and a parcel to understand what symmetric and asymmetric encryption are.

Suppose we exchange letters. A postal worker can open an ordinary letter, and a letter can also be stolen and read. We want nobody but us to be able to read our letters. You come up with a way to encrypt them, for example by swapping letters in the text. For me to decipher your letters, you have to describe how to decipher your cipher and send that description to me. This is symmetric encryption: we both encrypt our letters and we both have the decryption algorithm, the key.

The weakness of symmetric encryption lies in transmitting the key: the letter carrier can read it too, or the letter containing the key can be stolen.

The invention of asymmetric encryption was a major mathematical breakthrough. It uses one key to encrypt and another key to decrypt, and knowing the encryption key does not let anyone work out the decryption key. That’s why the encryption key is called a public key: you can safely give it to anyone, because it can only encrypt a message. The decryption key is called a private key, and it’s not shared with anyone.

Instead of encrypting the letter and sending me the key, you send me an open lock and keep the key. I write you a letter, put it in a box, put my own open lock in the same box so you can reply, and snap your lock shut on the box. I send it to you, and you open the box with your key, which has never left your hands.

Today, symmetric keys are disposable. For example, when we make a call, keys are created specifically for that call and deleted as soon as we hang up. Therefore, once the connection is established and the keys are exchanged, symmetric encryption is as secure as asymmetric encryption. Its only weakness is that the key has to be transferred.

But asymmetric encryption is much slower than symmetric encryption. The mathematical algorithms are more complicated, requiring more steps. That’s why asymmetric encryption is used in DTLS only to securely exchange symmetric keys. The data itself is encrypted with symmetric encryption. 
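The hybrid scheme can be illustrated with the JDK’s standard crypto classes. This sketches the idea, not DTLS itself, which has its own handshake, certificates and key derivation. It also swaps the lock-and-box analogy for ECDH key agreement, a common asymmetric technique: each side combines its private key with the other side’s public key, and both arrive at the same secret. The function names are hypothetical:

```kotlin
import java.security.KeyPair
import java.security.KeyPairGenerator
import java.security.MessageDigest
import java.security.PrivateKey
import java.security.PublicKey
import java.security.SecureRandom
import javax.crypto.Cipher
import javax.crypto.KeyAgreement
import javax.crypto.spec.GCMParameterSpec
import javax.crypto.spec.SecretKeySpec

private const val IV_BYTES = 12

// Each side creates an elliptic-curve key pair; only the public half is ever sent.
fun newKeyPair(): KeyPair =
    KeyPairGenerator.getInstance("EC").apply { initialize(256) }.generateKeyPair()

// Asymmetric step (slow, done once): my private key + your public key -> shared secret.
// Both sides compute the same value without it ever crossing the network.
fun sharedAesKey(myPrivate: PrivateKey, theirPublic: PublicKey): SecretKeySpec {
    val agreement = KeyAgreement.getInstance("ECDH")
    agreement.init(myPrivate)
    agreement.doPhase(theirPublic, true)
    // Hashing the raw secret is a simplified stand-in for a proper key-derivation function.
    val keyBytes = MessageDigest.getInstance("SHA-256").digest(agreement.generateSecret())
    return SecretKeySpec(keyBytes, "AES")
}

// Symmetric step (fast, used for all the data): AES-GCM. Output = random IV + ciphertext.
fun encrypt(key: SecretKeySpec, plaintext: ByteArray): ByteArray {
    val iv = ByteArray(IV_BYTES).also { SecureRandom().nextBytes(it) }
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, GCMParameterSpec(128, iv))
    return iv + cipher.doFinal(plaintext)
}

fun decrypt(key: SecretKeySpec, message: ByteArray): ByteArray {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, key, GCMParameterSpec(128, message, 0, IV_BYTES))
    return cipher.doFinal(message, IV_BYTES, message.size - IV_BYTES)
}
```

Only public keys cross the network. The AES key is computed independently on both sides and then does the fast work of encrypting the data.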

What data DTLS encrypts in WebRTC: all except video and audio

How to bypass DTLS?

Cracking the DTLS cipher is a complex mathematical problem. It is not considered feasible in a reasonable time without a supercomputer, and probably not with one either. It’s more profitable for hackers to look for other WebRTC security vulnerabilities.

The only way to bypass DTLS is to steal the private key, for example by stealing your laptop or guessing the password to the server.

In the case of video calls through a media server, the server is a separate computer that stores its own private key. Whoever gains access to it can eavesdrop on the call.

It is also possible to get into your computer. For example, you have gone out to lunch and left your computer on in your office. An intruder enters your office and copies a file onto your computer that gives him your private key.

But first of all, having the key is not enough. It’s like stealing gas: to steal gas, you have to tap into the gas line. The intruder also needs access to the wires that carry your data, or has to be on the same Wi-Fi network, for instance by sitting in the same office. But why go to all that trouble? It’s simpler to put a file on your computer that records your screen and sound and sends them to the intruder. You may even download such a malicious file yourself by accident if you install unverified programs from unverified sites.

Second, this is not hacking DTLS encryption, but hacking your computer.

How to protect yourself from a DTLS vulnerability?

  • Don’t leave your computer on and unlocked.
  • Keep your computer’s password safe. If you own a video program, keep the password to the server where it runs safe. Change your password regularly. Don’t reuse a password that you use elsewhere.
  • Don’t download untested programs.
  • Don’t download anything from unverified sites.

Audio and video encryption: SRTP

DTLS encrypts everything except the video and audio. DTLS is secure, but that security comes at the cost of speed. Video and audio are “heavy” types of data, so DTLS is not used to encrypt them in real time: the call would lag. Instead, they are encrypted with SRTP (Secure Real-time Transport Protocol), which is faster but less secure. The official SRTP documentation (RFC 3711) is published by the IETF.

What data SRTP encrypts in WebRTC: video and audio

How to bypass SRTP?

2 SRTP security vulnerabilities:

  1. Packet headers are not encrypted

    SRTP encrypts the contents of RTP packets, but not the header. Anyone who sees SRTP packets will be able to tell if the user is currently speaking. The speech itself is not revealed, but it can still be used against the speaker. For example, law enforcement officials would be able to figure out if the user was communicating with a criminal.
  2. Cipher keys can be intercepted

    Suppose users A and B are exchanging video and audio. They want to make sure that no one is eavesdropping, so the video and audio must be encrypted: then, even if they are intercepted, the intruder will not understand anything. User A encrypts his video and audio. Now no one can understand them, not even B. A needs to give B the key so that B can decrypt the video and audio on his side. But the key can also be intercepted: that’s the vulnerability of SRTP.
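To see what an eavesdropper gets from unencrypted headers, here is a sketch that parses the fixed 12-byte RTP header defined in RFC 3550. The `RtpHeader` type and `parseRtpHeader` function are hypothetical. Everything they read is also visible in SRTP packets, because SRTP encrypts only the payload:

```kotlin
// Fields of the fixed RTP header (RFC 3550). In SRTP these travel in clear text.
data class RtpHeader(
    val version: Int,
    val marker: Boolean,
    val payloadType: Int,
    val sequenceNumber: Int,
    val timestamp: Long,
    val ssrc: Long, // identifies the stream, i.e. which participant is sending
)

fun parseRtpHeader(packet: ByteArray): RtpHeader {
    require(packet.size >= 12) { "the fixed RTP header is 12 bytes" }
    fun u8(i: Int) = packet[i].toInt() and 0xFF
    fun u32(i: Int) = (u8(i).toLong() shl 24) or (u8(i + 1).toLong() shl 16) or
        (u8(i + 2).toLong() shl 8) or u8(i + 3).toLong()
    return RtpHeader(
        version = u8(0) ushr 6,              // top 2 bits of byte 0
        marker = (u8(1) and 0x80) != 0,      // top bit of byte 1
        payloadType = u8(1) and 0x7F,        // remaining 7 bits of byte 1
        sequenceNumber = (u8(2) shl 8) or u8(3),
        timestamp = u32(4),
        ssrc = u32(8),
    )
}
```

Sequence numbers and timestamps show when packets flow and how many there are. When the sender stops sending audio during silence, or uses the audio-level header extension (RFC 6464), the traffic pattern reveals when someone is talking.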

How to defend against SRTP attacks?

  1. Packet headers are not encrypted

    There is a proposed standard for encrypting packet headers in SRTP. As of October 2021, it is not yet part of SRTP: its status is still “proposed standard”, and it will change once the solution is adopted. You can check the status here, under the Status heading.
  2. Cipher keys can be intercepted

    There are 2 methods of key exchange:
    1) via SDES – Session Description Protocol Security Descriptions
    2) via DTLS encryption

1) SDES doesn’t support end-to-end encryption. That is, if there is an intermediary between A and B, such as a proxy, you have to give the key to the proxy. The proxy receives the video and audio, decrypts them, re-encrypts them, and passes them to B. Transmission through SDES is not secure: the video and audio can be intercepted at the intermediary at the moment they have been decrypted but not yet re-encrypted.

2) A key is not “heavy” like video or audio, so it can be encrypted with reliable DTLS, which handles it quickly, without lag. This hybrid method is called DTLS-SRTP. Use it instead of SDES to protect yourself.
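Which of the two key exchanges a call uses is visible in the SDP that travels through signaling. SDES puts the key itself into an `a=crypto:` line, while DTLS-SRTP publishes only a certificate fingerprint in an `a=fingerprint:` line. Below is a minimal sketch of a check a signaling server could run to reject SDES offers; the names are hypothetical:

```kotlin
enum class SrtpKeying { DTLS_SRTP, SDES, UNKNOWN }

// Classifies the SRTP key exchange an SDP description asks for.
fun srtpKeying(sdp: String): SrtpKeying {
    val lines = sdp.lines().map { it.trim() }
    // SDES: the SRTP master key itself is written into the SDP ("a=crypto:... inline:<key>"),
    // so anyone who can read the signaling messages can read the key.
    val hasInlineKey = lines.any { it.startsWith("a=crypto:") }
    // DTLS-SRTP: only a certificate fingerprint is published; the keys are derived
    // inside the DTLS handshake and never appear in the SDP.
    val hasFingerprint = lines.any { it.startsWith("a=fingerprint:") }
    return when {
        hasInlineKey -> SrtpKeying.SDES // conservative: an inline key is exposed either way
        hasFingerprint -> SrtpKeying.DTLS_SRTP
        else -> SrtpKeying.UNKNOWN
    }
}
```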

IP Address Protection – IP Location Privacy

The IP address is the address of a computer on the Internet.

What is the danger if an intruder finds out your IP address?

Think of your IP address as your home address: a thief who steals your passport can find out where you live and come to break in.

Once they know your IP address, a hacker can start looking for vulnerabilities in your computer. For example, they can scan your ports to find out which network-facing programs you run.

Say one of them is a messenger, and it is publicly known that this messenger has a vulnerability that can be used to get into your computer. A hacker can use it as in the case above, where you downloaded an unverified program that recorded your screen and sound and sent them to the hacker. Only this time you didn’t install anything yourself; you were careful. The hacker installed the program on your computer through the messenger’s vulnerability. The messenger is just an example: any vulnerable program on your computer can be used.

Another danger is that a hacker can use your IP address to find out roughly where you are physically. It’s like the movies, where negotiators keep a terrorist talking on the phone to trace the call.

How do I protect my IP address from intruders?

It’s impossible to be completely protected from this. But there are two ways to reduce the risks:

  • Postpone the IP address exchange until the user picks up. If you don’t take the call, the other party won’t learn your address; if you do pick up, they will. This is done by holding back the ICE exchange in JavaScript until the user answers.

    ICE (Interactive Connectivity Establishment) describes the protocols and routes WebRTC needs to communicate with the remote device. Read more about ICE in our article WebRTC in plain language.

    The downside:
    Remember, social networks and Skype show you who’s online and who’s not? You can’t do that.
  • Don’t use p2p communication; use an intermediary server instead. Then the other party only knows the IP address of the intermediary, not yours.

    The disadvantage:
    All traffic will go through the intermediary. This creates other security problems like the one above about SDES.

    If the intermediary is a media server and it’s installed on your server, it’s as secure as your server because it’s under your control. For measures to protect your server, see the SOP section below.
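Both ideas can be sketched in a few lines of Kotlin. `CandidateGate` and its `send` callback are hypothetical stand-ins for your signaling code. The callee holds back its ICE candidates, which contain its IP addresses, until the user answers. `isRelayCandidate` recognises relay (TURN) candidates by the `typ relay` field of the candidate line. In the Android WebRTC library, a relay-only policy is normally configured by setting `iceTransportsType = PeerConnection.IceTransportsType.RELAY` on `RTCConfiguration`, rather than by filtering strings:

```kotlin
// Holds local ICE candidates (which contain IP addresses) until the user accepts the call.
// `send` stands in for whatever pushes a message through your signaling channel.
class CandidateGate(private val send: (String) -> Unit) {
    private val pending = mutableListOf<String>()
    private var accepted = false

    // Called for every local candidate the WebRTC stack gathers.
    fun onLocalCandidate(candidate: String) {
        if (accepted) send(candidate) else pending += candidate
    }

    // Called when the user picks up: release everything gathered so far.
    fun onCallAccepted() {
        accepted = true
        pending.forEach(send)
        pending.clear()
    }

    // Called when the user declines: the caller never learns these addresses.
    fun onCallDeclined() = pending.clear()
}

// True for relay (TURN) candidates, whose address belongs to the relay server, not the user.
fun isRelayCandidate(candidate: String): Boolean {
    val parts = candidate.trim().split(Regex("\\s+"))
    val i = parts.indexOf("typ")
    return i >= 0 && parts.getOrNull(i + 1) == "relay"
}
```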

What security methods do browsers offer?

These methods apply only to web applications running in a browser. For example, they don’t apply to mobile WebRTC applications.

SOP – Same Origin Policy

When you open a website, the scripts needed to run that site are downloaded to your computer. A script is a program that runs inside the browser. Each script is downloaded from somewhere – the server where it is physically stored. This is its origin. One site may have scripts from different origins. SOP means that scripts downloaded from different origins do not have access to each other.
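Browsers compare origins as a triple of scheme, host and port. The rule can be sketched with `java.net.URI`. `originOf` and `sameOrigin` are hypothetical helpers, and the example URLs are placeholders:

```kotlin
import java.net.URI

// An origin is the scheme, host and port of a URL taken together.
data class Origin(val scheme: String, val host: String, val port: Int)

fun originOf(url: String): Origin {
    val uri = URI(url)
    val scheme = requireNotNull(uri.scheme) { "absolute URL expected" }.lowercase()
    val host = requireNotNull(uri.host) { "URL has no host" }.lowercase()
    // When no port is written, the scheme's default port applies.
    val port = when {
        uri.port != -1 -> uri.port
        scheme == "http" -> 80
        scheme == "https" -> 443
        else -> -1
    }
    return Origin(scheme, host, port)
}

// Two URLs are same-origin only if all three parts match.
fun sameOrigin(a: String, b: String): Boolean = originOf(a) == originOf(b)
```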

For example, you have a video chat site. It has your own scripts, stored on your server. It also has third-party scripts, for example a script that checks whether the contact form is filled out correctly: your developer used it so he didn’t have to write it from scratch. You have no control over the third-party script. Someone could hack it: gain access to the server where it is stored and make the script, for example, request access to the camera and microphone of users on every site that uses it. Attacks that get a malicious script to run on someone else’s site are called XSS (cross-site scripting).

If there were no SOP, the third-party script would simply gain access to your users’ cameras and microphones. Their conversations could be viewed and listened to or recorded by an intruder.

But the SOP is there. The third-party script isn’t on your server – it’s at another origin. Therefore, it doesn’t have access to the data on your server. It can’t access your user’s camera and microphone. 

But it can show the user a request to grant it access to the camera and microphone. The user will see the "Grant access to camera and microphone?" prompt again, even though he has already granted access. This looks strange, but the user may allow it, thinking he is giving access to your site. Then the attacker would still be able to watch and listen to his conversations. What SOP protects against is silent access: without SOP, access would not be requested again at all.

Access to the camera and microphone is just the most obvious example. The same goes for screen sharing, for example.

It’s even worse with text chat. If there were no SOP, it would be possible to send this malicious script to the chat room. Scripts aren’t displayed in chat: the user would see a blank message. But the script would be executed – and the attacker could watch and listen to his conversations and record them. With SOP the script will not run – because it is not on your server, but in another origin.
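Whatever protections the browser provides, a chat should display messages as text, never as markup. Below is a minimal sketch of HTML-escaping a message before it is inserted into a page. The function name is hypothetical; in a web front end, setting an element’s text content instead of its HTML achieves the same:

```kotlin
// Escapes the characters that let text be interpreted as HTML or script.
// Apply it to every user-supplied message before inserting it into a page.
fun escapeHtml(text: String): String = buildString(text.length) {
    for (c in text) {
        when (c) {
            '&' -> append("&amp;")
            '<' -> append("&lt;")
            '>' -> append("&gt;")
            '"' -> append("&quot;")
            '\'' -> append("&#39;")
            else -> append(c)
        }
    }
}
```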

How to bypass SOP and how to protect yourself

3 SOP vulnerabilities: errors in CORS, WebSocket connections, and server hacking
  1. Errors in CORS – Cross-Origin Resource Sharing

    Complex web applications cannot work comfortably in an SOP environment. Even components of the same website can be stored on different servers – in different origins. Asking the user for permission every time would be annoying.

    This is why developers can add exceptions to SOP: Cross-Origin Resource Sharing (CORS). The developer lists the allowed origins, and the server answers each request only if its origin is on the list. Alternatively, the developer puts "*" to allow all origins.

    During development there are often several versions of the site: production, available to real users; pre-production, where the site owner does final testing before release; test, for the testers; and the developer’s own version. Each version has a different URL, so the programmer has to update the list of SOP exceptions every time code moves from one version to another. It is tempting to put "*" to save time. If he forgets to replace the "*" in the production version, SOP will not protect your site, and it becomes vulnerable to any third-party script.

    How to protect against errors in CORS

    For the developer: check for XSS vulnerabilities, and list explicit SOP exceptions instead of "disabling" SOP with "*".

    For the user: revoke camera and microphone access that is no longer needed. The browser keeps a list of permissions; to revoke one, uncheck its box.
  2. Replacing the server your site connects to via WebSocket

    What is WebSocket?

    Remember the CORS, the SOP exception that you have to set manually? There is another exception that is always in effect by default. This is WebSocket.

    Why allow such an insecure technology, you ask? For real-time communication. The ordinary request technology that SOP covers can’t provide it, because it’s one-way: only the client can start an exchange, and the server can only answer.

    Imagine you’re driving a car with a child in the back seat. You are the server side, and the child is the client side. The child periodically asks: are we there yet? You answer "no." With request technology, when you finally arrive, you can’t tell the child "we’ve arrived" on your own: you have to wait for the child to ask. WebSocket lets you say "we’ve arrived" yourself without waiting for the question.

    Examples from the field of programming: video and text chats. If WebSocket didn’t exist, the client side would have to periodically ask, “do I have incoming calls?”, “do I have messages?” Even if you ask once every 5 seconds, it’s already a delay. You can ask more often – once a second, for example. But then the load on the server increases, the server must be significantly more powerful, that is, more expensive. This is inefficient and this is why WebSocket was invented.

    What is the vulnerability of WebSocket

    WebSocket is a direct connection to the server. But which one? Well, normally yours. But what if the intruder replaces your server address with his own? Yes, his server address would not be at your origin. But the connection is through WebSocket, so the SOP won’t check it and won’t protect it.

    What can happen because of this substitution? On the client-side, your text or video chat will receive a new message or an incoming call. It will appear to be one person writing or calling, but in fact, it will be an intruder. You may receive a message from your boss, such as “urgently send… my Gmail account password, the monthly earnings report” – whatever. You might get a call from an intruder pretending to be your boss, asking you to do something. If the voices are similar, you won’t even think that it might not be him – because the call is displayed as if it was from him.

    How this can be done is a creative question: the attacker has to look for vulnerabilities in the site. XSS is one example. You have a site with a video chat and a contact form, and messages from the form are displayed in the site’s admin panel. A hacker sends a "replace the server address with this one" script through the contact form. The script appears in the admin panel along with all the other messages from the contact form. Now it’s "inside" your site and has the same origin, so SOP will not stop it. The script is executed, and the server address is changed to the attacker’s.

    How to protect against spoofing of the server your site connects to via WebSocket
  • Filter all user input for scripts

    If the developer had programmed the site not to accept scripts from users, the contact-form message in the example above would have been rejected, and the intruder could not have swapped your server for his own in the WebSocket connection this way. Always filter user messages for scripts: this protects against server spoofing in WebSocket as well as many other problems.
  • Program a check that the connection through WebSocket is made to the correct origin

    For example, generate a unique codeword for each WebSocket connection. This codeword is not sent over the WebSocket, which means the SOP works. If a request for a codeword is sent to a third-party source, SOP will not allow it to be sent – because the third-party server is of a different origin.
  • Code obfuscation

    To obfuscate code is to make it incomprehensible while keeping it working. Programmers write code clearly (at least they should 🙂), so that if another developer takes over the code, he can work out which part does what. For example, programmers give variables clear names. The address of the server to connect to via WebSocket is also a variable and will be named clearly, e.g. "server address for WebSocket connection". After obfuscation, this variable will be called, for example, "C". An outside programmer with bad intentions will not understand which variable is responsible for what.

    The codeword generation mechanism is stored in the code. Cracking it takes extra effort, but it is possible. If the code is unreadable, it becomes much harder for the intruder to find this mechanism.
  3. Server hacking

    If your server gets hacked, a malicious third-party script can be "put" on your server. SOP will not help: your server is now the origin of this script. This script will be able to use the camera and microphone access that the user has already given to your site. The script still won’t be able to send the recording to a third-party server, but it doesn’t need to. The attacker has access to your server, so he can simply take the recording from there.

    How a server can be hacked is not among WebRTC security issues, so it’s beyond the scope of this article. For example, an attacker could simply steal your server username and password.
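Two of the checks described above can be sketched in Kotlin. The first is a server-side CORS check that echoes back only an origin from an explicit list. The second is a client-side check that the WebSocket address still points at your own host before connecting. All origins and host names below are placeholders, the function names are hypothetical, and in a browser app the client-side check would live in the page’s JavaScript:

```kotlin
import java.net.URI

// Explicit list of your own sites (placeholders): production, pre-production, test.
// Listing them is the alternative to the tempting "*" wildcard.
val allowedOrigins = setOf(
    "https://app.example.com",
    "https://preprod.example.com",
    "https://test.example.com",
)

// Server side, CORS: Access-Control-Allow-Origin carries a single origin (or "*"), so the
// server echoes the caller's Origin only if it is on the list. Null means "send no header",
// and the browser then blocks the cross-origin response.
fun corsAllowOrigin(requestOrigin: String?): String? =
    requestOrigin?.takeIf { it in allowedOrigins }

// Client side, WebSocket: refuse to connect if the address was swapped for a host you
// don't own. Only encrypted wss:// to an expected host is accepted.
val allowedSocketHosts = setOf("signal.example.com")

fun isTrustedSocketUrl(url: String): Boolean {
    val uri = runCatching { URI(url) }.getOrNull() ?: return false
    val scheme = uri.scheme ?: return false
    val host = uri.host?.lowercase() ?: return false
    return scheme.equals("wss", ignoreCase = true) && host in allowedSocketHosts
}
```

Because the list names each environment explicitly, moving code from test to production never requires the "*" wildcard.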

How to protect yourself from the server hack

The most obvious thing is to protect the username-password. 

If your server is hacked, you can’t protect yourself from the consequences. But there are ways to make life difficult for the attacker.

  1. Store all user content, such as video conference recordings, in encrypted form on the server. The server itself has to be able to decrypt it, so it stores the decryption method, and an attacker who hacks the server can find it. But that takes time: he can’t just swing by the server, copy the conversations, and leave. The longer he has to stay on the compromised server, the more time the owner has to react, for example by finding the intruder’s active session, revoking his administrator rights, and changing the server password.
  2. Ideally, do not store user content on the server. For example, allow recording conferences, but don’t save the recordings on the server: let the user download the file. Once the file is downloaded, only the user has it; it’s not on the server.
  3. Give the user more options to protect himself by building notifications into your program’s interface. We don’t recommend this method for everyone, because it’s inconvenient for the user. But if you are developing video calls for a bank or a medical institution, security is more important than convenience:
    1) Ask for access to the camera and microphone before each call.

    If your site gets hacked and they want to call someone on behalf of the user without their permission, the user will get a notification: “Do you want the camera and microphone access for the call?” He didn’t initiate that call, so it’s likely to keep the user safe: he’ll click “no.” It’s safe, but it’s inconvenient. What percentage of users will go to a competitor instead of clicking “allow” before every call?
    2) Ask for access to the camera and microphone to call specific users.

    When the user calls someone for the first time, they see a notification such as "Allow camera and microphone access for calls to Chris Baker?". This is less inconvenient for users who often call the same people, but it is still less convenient than programs that ask for access only once.

Use a known browser from a trusted source

What is it?

The program you use to visit websites. Video conferencing works in the browser. When you use it, you assume the browser is secure.

How do attackers use the browser?

By injecting malicious code that does what the hacker wants.

How to protect yourself?

  • Don’t download browsers from untrusted sources.
    Here’s a list of official sites for the most popular browsers:
  • Don’t use unknown browsers.
    Just as with links: if a browser looks suspicious, don’t download it.
    You can give the users of your web application a list of safe browsers. Although if they are already on your site, they are already using some browser… 🙂

What security measures should the developer think about?

WebRTC was built with security in mind. But not everything depends on WebRTC: it is only the part of your program that handles the calls. If someone steals the user’s password, WebRTC won’t protect the account, no matter how secure the technology is. Let’s look at how to make your application more secure.

Signaling Layer

The signaling layer is responsible for exchanging the data needed to establish a connection. The developer decides how this exchange works, and it happens before WebRTC and its encryption come into play. Simply put: you are on a video call site and a pop-up appears: “Incoming call, accept/reject?” Everything that happens before you click “accept” is the signaling layer establishing the connection.

How can attackers exploit the signaling layer, and how can you protect against it?

There are many ways. Let’s look at the three main ones: man-in-the-middle attacks, replay attacks, and session hijacking.

Attack on the signaling layer: two people are establishing a connection, and an intruder connects in the middle
  • MitM (Man-in-the-Middle) attack

In the context of WebRTC, this is the interception of traffic before the connection is established, that is, before the DTLS and SRTP encryption described above takes effect. The attacker sits between the callers. He can eavesdrop on the conversation or, for example, send a pornographic picture to your conference; this is called zoombombing.

The attacker can be anyone connected to the same Wi-Fi or wired network as you: he can watch and listen to all the traffic on that network.

How to protect yourself?

Use HTTPS instead of HTTP. HTTPS encrypts the whole session with SSL/TLS. A man in the middle will still be able to intercept your traffic, but it will be encrypted and he won’t be able to read it. He can save it and try to decrypt it later, but he won’t understand it right away.

SSL (Secure Sockets Layer) is the predecessor of TLS. An SSL/TLS certificate turns HTTP into HTTPS and secures the site. Users used to visit HTTP and HTTPS sites without noticing the difference. Now HTTPS is effectively mandatory: developers have to protect their sites with SSL certificates. Otherwise, browsers show the dreaded “your connection is not secure” warning, and only after clicking “more” can the user choose to go to the site anyway. Not all users will click through, which is why developers now add SSL certificates to their sites. For video calls there is one more reason: browsers give websites access to the camera and microphone only over HTTPS.
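On the server side, HTTPS comes down to a TLS configuration. Below is a minimal sketch using Python’s standard ssl module, assuming your signaling server is written in Python; the function name and the certificate paths are placeholders. The same context also covers WebSocket signaling, since wss:// is simply WebSocket over TLS.

```python
import ssl
from typing import Optional


def make_signaling_tls_context(certfile: Optional[str] = None,
                               keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Server-side TLS settings for an HTTPS or WSS signaling endpoint."""
    # Purpose.CLIENT_AUTH gives a context for the *server* side of TLS.
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Refuse outdated protocol versions (SSL 3.0, TLS 1.0, TLS 1.1).
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:
        # Certificate chain and private key issued for your domain.
        ctx.load_cert_chain(certfile, keyfile)
    return ctx
```

With the standard library alone, you could then wrap a listening socket with ctx.wrap_socket(sock, server_side=True).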

  • Replay attack

You have protected yourself from man-in-the-middle attacks with HTTPS. Now the attacker hears your messages but does not understand them. But he still hears them, and therefore he can repeat, or replay, them. For example, you send the command “transfer $100”. The attacker doesn’t understand it, but he repeats “transfer $100”, and without additional protection the command will be executed again. $100 will be debited from your account twice, and the second $100 will go to the same recipient as the first.

How to protect yourself?

Add a random session key to each message. The key is valid only within one session and cannot be used twice: “Transfer $100. ABC”. If an intruder repeats “Transfer $100. ABC”, the server sees that the key has already been used, recognizes the message as a repeat, and does not execute it. This is exactly what we did in the NextHuddle project, a video conferencing service for educational events. NextHuddle is designed for an audience of 5,000 users and 25 streamers.
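The idea can be sketched in Python with the standard hmac and secrets modules. This is an illustration, not the NextHuddle code, and the names are hypothetical. Assumptions: both ends of the session share a secret (here one object plays both roles for brevity), and each message carries a random one-time key plus an HMAC tag. The tag matters: without it, an attacker could simply attach a fresh key to a recorded message.

```python
import hashlib
import hmac
import secrets
from typing import Set, Tuple


class ReplayGuard:
    """Signs outgoing messages and rejects repeated or altered ones."""

    def __init__(self) -> None:
        # Fresh secret per session, so keys from old sessions are worthless.
        self.session_secret = secrets.token_bytes(32)
        self._seen: Set[str] = set()  # one-time keys already accepted

    def _tag(self, nonce: str, body: str) -> str:
        # Binds the one-time key to the message text.
        data = f"{nonce}|{body}".encode()
        return hmac.new(self.session_secret, data, hashlib.sha256).hexdigest()

    def sign(self, body: str) -> Tuple[str, str, str]:
        nonce = secrets.token_hex(16)  # random one-time key, like "ABC" above
        return body, nonce, self._tag(nonce, body)

    def accept(self, body: str, nonce: str, tag: str) -> bool:
        if not hmac.compare_digest(self._tag(nonce, body), tag):
            return False  # forged or modified message
        if nonce in self._seen:
            return False  # key already used: this is a replay
        self._seen.add(nonce)
        return True
```

In use, guard.accept(*guard.sign("transfer $100")) returns True the first time, and submitting the same signed message again returns False.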

  • Session hijacking

Session hijacking is when a hacker takes over your internet session. For example, you call the bank and confirm who you are with your date of birth or a secret word. “Okay, we recognize you. What can we do for you?” Then the intruder grabs the phone from you and tells them what he wants.

How do you protect yourself?

Use HTTPS. To hijack a session, the attacker usually has to be a man in the middle, so whatever protects against man-in-the-middle attacks also protects against session hijacking.
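HTTPS protects the session only if the session token never travels without it. Here is a minimal sketch in Python (3.8 or later, which supports the SameSite attribute) of a session cookie set up that way; the cookie name “session” is an arbitrary choice for illustration.

```python
import secrets
from http.cookies import SimpleCookie


def new_session_cookie() -> str:
    """Build a Set-Cookie header value for a fresh login session."""
    cookie = SimpleCookie()
    # 32 random bytes: infeasible to guess, unlike sequential session IDs.
    cookie["session"] = secrets.token_urlsafe(32)
    cookie["session"]["secure"] = True        # sent only over HTTPS
    cookie["session"]["httponly"] = True      # hidden from page JavaScript
    cookie["session"]["samesite"] = "Strict"  # not sent on cross-site requests
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()
```

Issuing a new token at every login, rather than reusing an old one, also means a token captured earlier is useless after the user logs in again.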

Selecting the DTLS Encryption Key Length

DTLS is an encryption protocol. It uses encryption algorithms such as AES, which comes with 128-bit keys or with longer, more secure 256-bit keys. In WebRTC, the developer can influence these settings. Make sure the configuration you choose gives the highest security: AES-256.

AES-256 encryption compared to AES-128 as a bigger lock against a smaller one

The Mozilla (MDN) documentation, for example, describes how to do this: you generate a certificate with RTCPeerConnection.generateCertificate() and pass it in the certificates option when you create a peer connection.

Authentication and member tracking

The task of the developer is to make sure that everyone who enters the video conference room is authorized to do so.

Example 1 – private rooms, such as a paid video lesson with a teacher. The developer should program a check: has the user paid for the lesson? If yes, let him in; if not, don’t.

This seems obvious, but we have seen many cases where anyone could copy the URL of such a paid conference, send it to someone else, and that person could join the conference without paying for the lesson.
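The fix is to check who the user is, not whether they have the link, on every join request. A minimal Python sketch, assuming the server already knows the user’s identity from their login session; PaidRoom and can_join are hypothetical names, and a real app would query its payment records instead of an in-memory set.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class PaidRoom:
    """A paid lesson: who teaches it and who has paid for it."""
    room_id: str
    teacher: str
    paid_students: Set[str] = field(default_factory=set)


def can_join(rooms: Dict[str, PaidRoom], room_id: str, user_id: str) -> bool:
    """Server-side check run on every join request."""
    room = rooms.get(room_id)
    if room is None:
        return False  # unknown room
    # Knowing the room URL proves nothing; the user must be on the list.
    return user_id == room.teacher or user_id in room.paid_students
```

The important design choice is where user_id comes from: it should be taken from the authenticated session on the server, never from the URL or anything else the browser can edit.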

Example 2 – open rooms, such as business video conferences of the “join without registration” type. This is done for convenience, when you don’t want a business partner to waste time registering. You just send him a link, he follows it, and he is in the conference.

If there are only a few participants, the owner will notice if someone extra has joined. But with many participants, the owner won’t notice. One solution is for the developer to let the conference owner manually approve new participants.
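Such a waiting room can be sketched in Python as follows. The Lobby class and its methods are hypothetical; a real app would also notify the owner and tell the waiting person’s browser when to start the WebRTC connection.

```python
from typing import List, Set


class Lobby:
    """Waiting room: newcomers join only after the owner approves them."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.participants: Set[str] = {owner}
        self.waiting: List[str] = []

    def request_join(self, user: str) -> str:
        if user in self.participants:
            return "joined"
        if user not in self.waiting:
            self.waiting.append(user)  # shown to the owner for approval
        return "waiting"

    def approve(self, by: str, user: str) -> bool:
        if by != self.owner or user not in self.waiting:
            return False  # only the owner can let people in
        self.waiting.remove(user)
        self.participants.add(user)
        return True

    def reject(self, by: str, user: str) -> bool:
        if by != self.owner or user not in self.waiting:
            return False
        self.waiting.remove(user)
        return True
```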

Example 3 – helping users protect their login and password. If an intruder gets hold of a user’s login and password, he can log in as that user.

Implement login through third-party services, for example, social networks, Google login, or Apple login on mobile devices. You can also skip passwords altogether and send a login code to the user’s email or phone. This reduces the number of passwords a user has to keep. A thief would then need to steal a password not from your program but from a third-party service such as a social network, mobile account, or email.

You can combine two methods, for example, the username and password from your program plus a confirmation code sent to the user’s phone. Then, to hack the account, an attacker would need to steal two secrets instead of one.
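The confirmation code step can be sketched in Python with the standard secrets and hmac modules. Assumptions: the password has already been checked, sending the code by SMS or email happens elsewhere, and codes are kept in memory; LoginCodes is a hypothetical name. Each code is random, expires after five minutes by default, and works only once. The optional now parameter exists only to make the sketch easy to test.

```python
import hmac
import secrets
import time
from typing import Dict, Optional, Tuple


class LoginCodes:
    """One-time confirmation codes for the second login step."""

    def __init__(self, ttl_seconds: int = 300) -> None:
        self.ttl = ttl_seconds
        self._pending: Dict[str, Tuple[str, float]] = {}  # user -> (code, expiry)

    def issue(self, user: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        code = f"{secrets.randbelow(10**6):06d}"  # 6 cryptographically random digits
        self._pending[user] = (code, now + self.ttl)
        return code  # the caller sends it by SMS or email

    def verify(self, user: str, code: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        entry = self._pending.pop(user, None)  # removed on any attempt: single use
        if entry is None:
            return False
        expected, expires_at = entry
        return now <= expires_at and hmac.compare_digest(expected, code)
```

Because a wrong guess also discards the code, an attacker can’t try all million combinations against a single code.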

Phone in a hand with a verification code input screen to protect login

Not every user will want such a long and complicated login just to make a call. Give them a choice: one login method or two. Those who care about security will choose two and be grateful.

Access Settings

Let’s be honest: we don’t always read permission dialogs. If the user is used to clicking OK, the application may get permissions he didn’t want to give.

At the other extreme, the user may delete the app if he doesn’t immediately understand why it is asking for access.

Good and bad examples of permission requests in apps

The solution is simple – show care. Write clearly what permissions the user gives and why.

For example, in mobile applications: before showing the standard pop-up requesting access to geolocation, show an explanation like “People in our chat call others nearby. Allow access to your location so we can show you the people nearby.”

Screen sharing

Any app with a screen sharing feature should warn the user about exactly what they are showing.

For example, when the user starts a screen sharing session and selects the area of the screen to show, display a reminder so they don’t accidentally show a part of the screen with data they want to keep private: “What do you want to share?” with the options “Entire screen” or “A single application” (choose which one, for example, only the browser).

Choosing which app to share when screen sharing

If you gave a site permission to share your screen and the site gets hacked, the hacker can plant a script that opens a web page in your browser while you’re sharing your screen. For example, he knows how links to conversations on a social network are formed, and he builds a link to your conversation with a particular person he wants to see. He is not logged in to your social network account, so if he opens that link himself, he won’t see your conversation. But if he has hacked a site where you’ve allowed screen sharing, the next time you share your screen there, his script will open the page with the conversation in your browser. You will rush to close it, but it’s too late: the screen share has already sent it to the intruder. The protection against this is the same as against server hacking: keep your passwords safe. But hacking a site is hard; it’s easier for an attacker to send you a fake link that requests screen sharing.

Where to read more about WebRTC security

There are many articles on the internet about security in WebRTC. There are 2 problems with them:

  1. They merely express someone’s subjective opinion. Our article is no exception: our opinion may be wrong too.
  2. Most articles are technical: it might be difficult for somebody who’s not a programmer to understand.

How to solve these problems?

  1. Use the scientific method: read primary sources, that is, publications vetted by a recognized authority. In science, these are publications in journals approved by the Higher Attestation Commission (HAC): before a paper is published there, it must be approved by another scientist. In IT, these are the W3C (World Wide Web Consortium) and the IETF (Internet Engineering Task Force): their documents are reviewed by technical experts from Google, Mozilla, and similar companies before publication.
    WebRTC security considerations from the W3C specification – in brief
    WebRTC security considerations from the IETF – details on threats, a bit about protecting against them
    IETF’s WebRTC security architecture – more on WebRTC threat protection
  2. The documentation above is accurate, but it is written in such technical language that a non-technical person can’t make sense of it. Most articles on the internet are the same. That’s why we wrote this one. After reading it:
    – The basics will (hopefully) become clear to you. Maybe this will be enough to make a decision.
    – If not, the primary sources will be easier to understand. Work with your programmer, or reach out to us for advice.

Conclusion

Security of a WebRTC app: WebRTC alone = 1 shield of 3; WebRTC + a good developer = 3 shields of 3

WebRTC itself is secure. But if the developer of a WebRTC-based application doesn’t take care of security, his users will not be safe.

For example, in WebRTC all data except video and audio is encrypted with DTLS, and audio and video are encrypted with SRTP. But many WebRTC security settings are chosen by the developer of the video application: for example, whether SRTP keys are exchanged over DTLS, which gives the highest level of security, or in some other way.

Furthermore, WebRTC only transmits data once the connection is established. What happens to users before that is entirely up to the developer: it will work exactly the way he programs it. Which SOP (same-origin policy) exceptions to set, how to let users into a conference, whether to use HTTPS: all of this is up to the developer.

Write to us, we’ll check your video application for security. 

Check out our Instagram – we post projects there, most of which were made on WebRTC.