Elevate Your Contact Center Experience with Krisp Background Voice Cancellation (BVC)

In the energetic environment of a contact center, maintaining clear and focused communication with customers is critical. Agents often face the challenge of background noise and overlapping voices, which not only distract customers but can also lead to inadvertent disclosure of sensitive information. Traditional headsets and hardware solutions fall short of addressing these issues effectively. Krisp’s Background Voice Cancellation (BVC) is a game-changer for contact center operations, materially improving Average Handle Time (AHT), Customer Satisfaction (CSAT), and Employee Satisfaction (ESAT).

What is Krisp Background Voice Cancellation?

Krisp BVC is an advanced AI noise-canceling technology that eliminates all background noise and competing voices nearby, including the voices of other agents. The technology is enabled as soon as an agent plugs in their headset, without requiring individual voice enrollment or training. It integrates smoothly with both native applications and browser-based calling applications via WebAssembly JavaScript (WASM JS), ensuring high performance and efficiency.

Why Choose Krisp BVC for Your Contact Center?

1. Enhanced Customer Experience

Customers often struggle to understand agents when there’s background chatter, leading to frustration and reduced satisfaction. With Krisp BVC, all extraneous voices and noises are filtered out, allowing customers to focus solely on the agent they are speaking with. This ensures a smooth and professional interaction every time, which directly contributes to higher CSAT scores.

2. Privacy and Confidentiality

In a contact center, the risk of customers overhearing personal information from other calls is a significant concern, especially for financial and healthcare customers. Krisp BVC addresses this by completely isolating the agent’s voice from the background, ensuring that sensitive information remains confidential.

3. Hardware Independence

While headsets and other hardware solutions provide some noise reduction, they do not eliminate background voices. Krisp BVC works independently of hardware, offering superior noise and background voice cancellation without the need for additional devices or complicated setups.

4. Plug-and-Play Functionality

Once the agent’s headset is plugged in, Krisp BVC is activated automatically. There’s no need for agents to enroll their voice or go through any training process, making it an effortless solution that saves time and resources.

5. Versatility Across Platforms

Krisp BVC is uniquely available for both native applications and browser-based calling applications through WASM JS. This means it can be integrated effortlessly into various platforms, ensuring consistent performance and reliability.

6. Efficient Performance

Krisp BVC is designed to run efficiently in the browser, making it an ideal solution for Contact Center as a Service (CCaaS) platforms. Its high-performance capabilities ensure minimal latency and a smooth user experience.

7. Improved CSAT Metrics

With the enhanced clarity of communication provided by Krisp BVC, customers are more likely to have positive interactions with agents. This leads to increased satisfaction, as reflected in improved CSAT metrics reported to us by a number of customers. Clear and effective communication is crucial in resolving issues promptly and accurately, which in turn boosts customer loyalty and satisfaction.

Integration Made Easy

Integrating Krisp BVC into your contact center application is straightforward. Here’s a sample code snippet to demonstrate how simple it is to get started:
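The sketch below illustrates the typical pattern: initialize the SDK, create a filter node, and insert it into a Web Audio graph. The package name, the KrispSDK class, and the createNoiseFilter method are illustrative placeholders rather than the verbatim API; Krisp’s Developer Portal has the actual integration guide.

```js
// Illustrative sketch only — names below are placeholders for the real
// Krisp JS SDK API documented on the Developer Portal.
import KrispSDK from '@krisp/web-sdk'; // hypothetical package name

async function enableBVC(rawStream) {
  const sdk = new KrispSDK({ models: { bvc: '/models/bvc' } }); // placeholder config
  await sdk.init();

  const context = new AudioContext();
  const source = context.createMediaStreamSource(rawStream);
  const destination = context.createMediaStreamDestination();

  // The Krisp filter is an AudioNode that removes noise and background voices.
  const krispNode = await sdk.createNoiseFilter(context);
  source.connect(krispNode);
  krispNode.connect(destination);

  // Feed the cleaned stream into your calling application (e.g., WebRTC).
  return destination.stream;
}
```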

Visualizing the Difference

The graphical representation above illustrates the clarity and focus achieved by using Krisp BVC. Notice how the agent’s speech is clear and distinct, free from background distractions.

Hear the Difference

Experience the transformative power of Krisp BVC with this audio comparison:

Without BVC – Competing Agent Voices

 

With BVC – Clear communication

 

Conclusion

Integrating Krisp BVC into your contact center solutions can significantly enhance the quality of interactions and customer satisfaction. Its ease of integration, combined with superior performance and versatility, makes Krisp BVC a must-have feature for modern contact centers. Upgrade your communication systems today with Krisp Background Voice Cancellation and experience the difference it makes, including improved CSAT metrics.

Ready to get started? Visit Krisp’s Developer Portal for more information and comprehensive integration guides.

Enhancing Browser App Experiences: Krisp JS SDK Pioneers In-browser AI Voice Processing for Desktop and Mobile

In today’s connected world, where web browsers serve as gateways to an assortment of online experiences, ensuring a seamless and productive user experience is paramount. One crucial aspect often overlooked in browser-based communication applications is voice quality, especially in scenarios where clarity of communication is essential. 

 

Diverse Applications of Noise Cancellation on the Web

From virtual meetings and online classes to contact center operations, the demand for clear audio communication has grown ever stronger, making AI voice processing with noise and background voice cancellation an expected and highly sought-after feature. While standalone applications have provided this functionality, integrating it directly into browser-based applications has proven to be a challenge.

The need for noise and background voice cancellation extends beyond conventional communication platforms. In telehealth, for instance, where accurate communication is vital for call-based diagnosis and consultation, background noise and voices can hinder effective communication. Another interesting example is insurance companies taking calls from customers at the scene of an incident: eliminating background noise ensures that critical information is accurately conveyed, leading to smoother claims processing and higher customer satisfaction. These and many other use cases often involve one-click web sessions for the calls.

 

Overcoming Challenges for Mobile Browser Integration

The growing demand for quality communications in browser-based applications extends to both desktop and mobile devices. Until recently, achieving compatibility with mobile devices, particularly iOS Safari, posed significant difficulties. Limitations within Apple’s WebKit framework and the inherently CPU-intensive nature of JavaScript solutions hindered efforts to bring the power of Krisp’s technologies to mobile browser applications.

The introduction of Single Instruction, Multiple Data (SIMD) support marked a significant opening for Krisp to deliver its market-leading technology into Safari specifically, and mobile browsers generally. SIMD enables parallel processing of data, significantly boosting performance and efficiency, particularly on mobile devices with limited computational resources.

By leveraging SIMD, the Krisp JS SDK has achieved low CPU usage, making its market-leading noise cancellation available to users of mobile browser applications. This breakthrough not only enhances the user experience but also opens up new possibilities for web-based applications across various industries.
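In practice, an application can detect SIMD support at load time and pick the right build. The sketch below uses the open-source wasm-feature-detect package; the module file names are placeholders, since Krisp’s actual loader is not shown here.

```js
// Detect WASM SIMD support before choosing which module to load.
import { simd } from 'wasm-feature-detect';

const simdSupported = await simd();
const modulePath = simdSupported
  ? '/krisp-simd.wasm'      // placeholder: SIMD-optimized build
  : '/krisp-fallback.wasm'; // placeholder: non-SIMD fallback
console.log(`Loading ${modulePath}`);
```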

As Krisp’s technologies continue to evolve and extend into new territories, the ability to make AI voice features available to all users across desktop and mobile browser-based applications is fundamental, giving users seamless access to the best voice processing technologies on the market.

 

Try next-level audio and voice technologies

Krisp licenses its SDKs to embed directly into applications and devices. Learn more about Krisp’s SDKs and begin your evaluation today.

 

Scaling WebRTC Voice Quality Across Platforms

Real-time communications have become increasingly popular with the rise of remote work and the need for virtual meetings. WebRTC is a widely used technology that enables real-time communication through web browsers. It allows for video, audio, and data transmission without the need for plugins or downloads. WebRTC libraries are also available outside of the browser, with native libraries supported on the major native platforms. WebRTC handles the sound stream from the microphone and delivers it to the peer, taking care of encoding, decoding, and data encryption along the way. The user does not have to know anything about encryption or audio codecs to use WebRTC, which is the reason it is so popular for developing real-time communication systems. With the availability of WebRTC, writing communication applications became easy.

WebRTC ships with an integrated, open-source, baseline noise-suppression component that can be toggled by the user. However, it is not nearly as advanced as Krisp’s noise cancellation and voice processing technologies. Applications requiring robust noise cancellation with natural voice quality opt for Krisp, while applications with lower voice quality requirements opt for the open-source option.
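For browser applications, this built-in suppression is exposed through standard getUserMedia constraints; the sketch below shows the baseline behavior that Krisp improves upon.

```js
// Toggling the browser's built-in, baseline noise suppression (not Krisp).
// Call from within an async function.
const stream = await navigator.mediaDevices.getUserMedia({
  audio: {
    noiseSuppression: true, // WebRTC's integrated noise suppression
    echoCancellation: true,
  },
});
```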

Modifying the Audio Stream in WebRTC

Challenges

WebRTC does not provide a consistent way to modify the sound stream in the application across different platforms, which is a prerequisite for getting the Krisp SDK to function. There is a way to modify the sound stream in WebRTC for Android; however, the technique is inconsistent with other platforms and does not represent a high standard of coding practice. It also assumes creating the AudioDeviceModule in the C++ layer, returning its pointer in the form of a long variable, and passing it to the Java-level module. Clearly, this is not an elegant coding standard we want our customers to follow.

Solution

Krisp has introduced the Audio Hook feature into WebRTC, which represents a consistent approach to modifying the audio in the client application on all platforms.

The Audio Hook allows modification of the sound stream of the microphone in the client app before the sound is sent to the other peer over the network. The Krisp Application Engineering team has developed and outlined a step-by-step approach for integrating Krisp into your WebRTC application.

Stream Flow Diagram

The Version and Relation to Chrome

The Google WebRTC project is closely tied to Google Chrome. For the past few years, the two projects have used the same branch names so that their releases match. At the time of writing (March 6, 2023), the stable version of Chrome is 110 and Chrome 111 is in beta. We have chosen branch 111 to apply our modifications on top of. The Chromium project publishes the mapping between Chrome versions and WebRTC branch versions.

Implementing Krisp Audio Hook on iOS

Implementing the Krisp Audio Hook on iOS involves several steps. First, you need to create a new class that implements the RTCAudioProcessorDelegate abstraction and is responsible for implementing the audio modification logic. This class should include the frameProcess method, where you can write code to process the audio frames with Krisp’s noise-canceling technology.

Here is an example implementation of the RTCAudioProcessorDelegate abstraction in the iOS sample app:
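A minimal sketch of such a processor follows. The RTCAudioProcessorDelegate interface ships with Krisp’s modified WebRTC build, so the exact frameProcess signature and the krispProcessFrame() call below are illustrative assumptions rather than the verbatim sample-app code.

```objc
// KrispAudioProcessor.h — illustrative sketch only.
#import <Foundation/Foundation.h>
#import "RTCAudioProcessorDelegate.h" // ships with the modified WebRTC build

@interface KrispAudioProcessor : NSObject <RTCAudioProcessorDelegate>
@end

// KrispAudioProcessor.mm
@implementation KrispAudioProcessor

// Called by WebRTC for every captured audio frame before it is encoded
// and sent to the peer; the exact signature is defined by the Audio Hook.
- (void)frameProcess:(float *)buffer
          numSamples:(size_t)numSamples
          sampleRate:(int)sampleRate {
    // Placeholder for the Krisp SDK call that cleans the frame in place.
    krispProcessFrame(buffer, numSamples, sampleRate);
}

@end
```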

You can customize the audio processing logic inside the frameProcess method according to your needs. For example, you can apply noise reduction, echo cancellation, or any other audio processing technique offered by the Krisp SDK; the method is called for each audio frame before the frame is encoded and sent over the network.

After building WebRTC for iOS with the Audio Hook modifications and implementing your processor class, the final step is to inject it into the WebRTC system. The modified WebRTC for iOS introduces a new static setup function in the RTCPeerConnectionFactory class, which should be used to register your RTCAudioProcessorDelegate implementation.

Here’s an example of how you can inject the MyAudioProcessor into the WebRTC system in an iOS application:

By calling the setupAudioProcessorDelegate function with your audio processor instance, you are now able to modify the audio frames before they are sent to the other peer over the network.

Build WebRTC for Other Platforms

The instructions provided so far are specifically for building WebRTC with the Audio Hook feature for iOS. The Krisp Application Engineering team is actively working to implement the Audio Hook on Android, and this document will be updated once the feature becomes available.

Conclusion

Integrating Krisp into WebRTC can be a powerful solution for web application builders who want to implement the world’s best noise reduction and voice processing technologies in their WebRTC applications. By introducing the Audio Hook and modifying the WebRTC audio stream, you can take advantage of Krisp’s powerful audio processing capabilities to enhance the audio quality of your WebRTC applications.

In this article, we have discussed the challenges of modifying the WebRTC audio stream and introduced the Krisp Audio Hook as a solution. We have provided step-by-step instructions on how to build WebRTC with the Audio Hook modifications for iOS.

We hope this article and the detailed documentation have been helpful and you can now enhance the audio quality of your WebRTC applications and deliver a seamless communication experience for your users.

 

Try next-level audio and voice technologies  

Krisp licenses its SDKs to embed directly into applications and devices. Learn more about Krisp’s SDKs and begin your evaluation today.


 


This article was written by Aram Tatalyan, BS in Applied Mathematics and Informatics, Staff Engineer at Krisp.

Breaking the audio processing barrier on the web with Krisp JS SDK

In today’s world, browsers have become an integral part of our daily lives. Although Krisp has built native libraries for most platforms in use (Windows, Mac, Linux, Android, iOS), there were still many use cases where the Krisp SDK was needed in the context of web applications. This article summarizes our journey of creating world-class noise cancellation SDKs for web usage. We believe this is a unique usage of WebAssembly, one that showcases the unlimited possibilities WASM has to offer. We put a lot of time and effort into this project and have delivered a game-changing JS SDK, enabling many products to level up their web-based real-time communication.

Challenges

Processing audio in the browser is more challenging than it may seem at first glance. We faced a variety of challenges: some were well-known and had established technical solutions, while others were particular to our use case, and we had to get creative. Let’s review the main challenges and discuss the solutions we came up with.

Run C++ directly on the web

Our first challenge when working on Krisp’s JS SDK was that our core SDK is C++ based, and we needed a way to use its functionality inside a browser. We found a well-known solution for that: WASM (WebAssembly). This binary instruction format allows running programming languages such as C and C++ in the browser, achieving near-native performance on the web. To compile our C++ library into a WASM module, we used Emscripten: a complete compiler toolchain to WebAssembly, with a primary focus on performance on web platforms. After compiling C++ into a WASM module, we can import that module into our JavaScript/TypeScript project and use it (see Figure 1).


Figure 1. WASM module usage
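As a rough illustration of the bindings layer, Emscripten’s embind exposes C++ functions to JavaScript; the function and module names below are placeholders, not Krisp’s actual bindings.

```cpp
// bindings.cpp — illustrative embind sketch (placeholder names).
#include <emscripten/bind.h>

// Stand-in for a call into the Krisp core SDK.
int processFrame(int sessionId) { return sessionId; }

EMSCRIPTEN_BINDINGS(krisp_module) {
  emscripten::function("processFrame", &processFrame);
}

// Compiled with: emcc bindings.cpp --bind -O3 -o krisp_module.js
```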

Using our NC models

After understanding how to use our C++ code, we faced another problem: using our noise cancellation models. Up until this point, we had been using our models in a native context and didn’t have to worry much about their size. But everything changes in the context of web applications, where you can’t load 30 or 40 MB of data every time the application opens. To solve this problem, our research team worked on smaller yet fully functional models focused on performance. In Figure 2, you can see the comparison between our regular and small models.


Figure 2. Model size comparison

128 samples per frame limitation

With native development, you have flexibility and control over your audio flow. However, there are some limitations you have to consider while working with audio inside a browser. Audio circulates in frames (see Figure 3). Each frame consists of 128 samples (a sample is just a number describing the sound). There is also a term called sample rate – the number of samples we will receive per second. The sample rate may vary based on the device (you may have already seen the sample rate characteristic in audio devices’ descriptions).


Figure 3. Audio frames in web

Now, with this information, it is reasonably easy to understand how many seconds/milliseconds of audio is stored in the given number of samples based on our sample rate. The sample count divided by the sample rate will provide us with the number of seconds stored in that sample count. To make the calculation even more helpful, we can multiply the result by 1000 to get the answer in milliseconds.

Example: Let’s say our sample rate is 48000 Hz, and we need to understand the duration of audio stored in one frame. One frame holds 128 samples; therefore, the calculation is as follows:

(128 / 48000) * 1000 ≈ 2.67 ms.

So what exactly was our problem? Our SDK needs at least 10 milliseconds of audio to process, while the browser gives and expects to receive 128 sample frames. Our solution was to create a buffering system that will:

a) Accumulate 10 milliseconds of audio and send that chunk to our WASM module for processing
b) Take the processed audio from the WASM module, split it into 128-sample frames, and return it to the browser (see Figure 4)


Figure 4. A simplified diagram of our buffering system operations
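A minimal sketch of this idea follows, assuming a 48 kHz stream where 10 ms equals 480 samples; wasmModule.clean() stands in for the actual call into our WASM module.

```js
// Simplified buffering sketch — not the SDK's actual implementation.
class FrameBuffer {
  constructor(chunkSize = 480) {      // 10 ms at 48 kHz
    this.chunkSize = chunkSize;
    this.pending = [];                // raw samples awaiting processing
    this.cleaned = [];                // processed samples awaiting output
  }

  push(frame) {                       // frame: Float32Array of 128 samples
    this.pending.push(...frame);
    while (this.pending.length >= this.chunkSize) {
      const chunk = this.pending.splice(0, this.chunkSize);
      this.cleaned.push(...wasmModule.clean(chunk)); // placeholder call
    }
  }

  pull() {                            // returns a 128-sample frame, or null
    if (this.cleaned.length < 128) return null;
    return new Float32Array(this.cleaned.splice(0, 128));
  }
}
```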

 

After handling these initial challenges, nothing stopped us from creating our first version of Krisp JS SDK.

The first version of Krisp JS SDK


Figure 5. Simplified diagram of our first version of Krisp JS SDK

In Figure 5, you can see the simplified architecture diagram of our prototype of JS SDK.

C++ side

Krisp Audio SDK is the C++ library we needed to run in JavaScript. We also added a 128-sample state machine to it to track the state of the audio data and ensure audio consistency. Krisp WASM Processor is middleware that creates the Emscripten bindings, allowing us to call methods from C++ on the JavaScript side. As you can see, a buffering system is included here, which works with the state machine. Everything is compiled from this point, and we get our WASM Module.

JS Side and Web Audio API

We needed to work with the Web Audio API to process the audio on the JavaScript side. The Web Audio API provides a robust system for processing audio on the web. Three concepts were important in our use case: AudioContext, AudioWorkletProcessor, and AudioWorkletNode. Let’s go over each of them briefly. AudioContext is essentially an audio graph that lets you manipulate audio sources and process the audio with various effects and processors before passing it to the audio destination (see Figure 6).


Figure 6. The basic structure of AudioContext

AudioWorkletProcessor is capable of receiving input audio and setting the output audio. The interesting part is that you can manipulate audio in its process() method in between. In our case, we imported the WASM Module to the same krisp.processor.js file and started using it in the process() method, taking the cleaned audio and setting it as output.
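A simplified processor might look like the following; wasmModule.clean() is again a placeholder for the call into our WASM module, and a same-length output is assumed.

```js
// krisp.processor.js — simplified sketch of the worklet processor.
class KrispProcessor extends AudioWorkletProcessor {
  process(inputs, outputs) {
    const input = inputs[0][0];   // first channel of the first input
    const output = outputs[0][0];
    if (input && output) {
      // The real SDK routes the frame through the buffering system and
      // the WASM module; wasmModule.clean() is a placeholder for that.
      output.set(wasmModule.clean(input));
    }
    return true; // keep the processor alive
  }
}
registerProcessor('krisp-processor', KrispProcessor);
```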

After writing the necessary code for the processor, the final step would be to integrate that processor into the AudioContext graph somehow. For that, in our krispsdk.js file, we created our filter node using AudioWorkletNode, which takes our processor and adds it to the audio context’s graph.
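The wiring itself uses only standard Web Audio API calls; a simplified sketch (with the processor name assumed to be 'krisp-processor') is shown below.

```js
// krispsdk.js — connecting the processor into the AudioContext graph.
// Call from within an async function.
const context = new AudioContext();
await context.audioWorklet.addModule('krisp.processor.js');

const source = context.createMediaStreamSource(stream);
const krispNode = new AudioWorkletNode(context, 'krisp-processor');
const destination = context.createMediaStreamDestination();

source.connect(krispNode);
krispNode.connect(destination);
```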


Simplified look at the connection between AudioWorkletNode, AudioWorkletProcessor, and AudioContext

So, at this point, we had our first working prototype, but there were several issues with it.

Issues with our first version

Buffering system on the C++ side

Although our initial version worked, we encountered some issues with our buffering system. First of all, we implemented all the buffering logic on the C++ side, and as a result, that system was compiled into the WASM Module. And although we implemented logic to keep the audio consistent, the buffer’s placement was impractical: there was still a chance of losing audio data between processing calls, and the buffering system amounted to an additional layer of middleware.

Terminating AudioWorkletNode

Another issue was that we couldn’t entirely terminate our AudioWorkletNode, which resulted in multiple memory leaks. Because of the same problem, we also couldn’t handle device changes based on sample rate: ideally, the SDK should be re-initialized with the NC model optimal for the new sample rate to ensure the best quality.

Providing NC models separately

Our last issue was that we were providing our NC models separately in a folder, so that they could be hosted on the client’s side and imported into the JS SDK with static URLs. Seeing all these issues, we started working on solutions and came up with a new version of the JS SDK.

The new version of Krisp JS SDK


Figure 7. The new architecture of Krisp JS SDK

We developed core architectural improvements to eliminate the problems mentioned above (see Figure 7). Let’s go over each of them.

  • We moved the entire buffering system to the JS side so that our core SDK receives data in the expected form, which resulted in a more straightforward implementation and more consistent buffer behavior. We collect audio and check the accumulated size; once we have our target 10 ms of audio in the buffer, we provide that 10 ms chunk directly to the SDK (so we are not sending every audio frame separately, which is a huge optimization). With the same logic, our custom buffer class takes the cleaned audio and delivers it back to the browser, split into 128-sample frames.
  • We moved all the audio processing to a new, separate web worker thread to handle the memory leak issues, which worked perfectly: after terminating the worker, everything inside it gets terminated as well. Although an open thread remains, we solved the main issue of loading and keeping multiple WASM Modules. Thanks to this change, clients can implement device-change handling logic to achieve the best possible audio quality.
  • We decided to modify our Emscripten method bindings to achieve more flexibility in the JS SDK. In the first version, we had only three methods, each a combination of methods called on the C++ side: init(), toggle(), and dispose(). With the new version, we moved to 1-to-1 method bindings from C++ to the JS side, allowing us to work with audio at a more advanced level while in JS. For example, the old init() called multiple methods internally, including session creation and its ID. With the new bindings, we get the session-creation method separately, which means that, together with our in-worker processing logic, we can run multiple NC sessions at once (for example, for both the microphone and the speaker).
  • Our last improvement was related to the delivery of our NC models. Instead of handling them separately, we packed them into a single Emscripten .data file. Later, we also added options to choose which NC models should be preloaded and which should be loaded only when needed.

As you can see in the new architecture, all the communication goes through krispsdk.js, which is our main thread. We initialize the worker and the worklet from it, and to make them communicate with each other, we used PostMessages. But there was room for optimization here: to pass one chunk of audio to the SDK and back, we were sending 4 PostMessages. To solve this problem, we came up with two solutions.

  1. The first option was using SharedArrayBuffer so the worker and worklet could communicate directly. This solution works, but it is not practical enough to be the main mode of operation, as SharedArrayBuffer comes with security requirements. To be able to use it, you must serve the page with specific headers:
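    These are the standard cross-origin isolation headers browsers require before exposing SharedArrayBuffer:

    Cross-Origin-Opener-Policy: same-origin
    Cross-Origin-Embedder-Policy: require-corp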

    We implemented this solution and created an optional flag, which you can enable during the initialization.

  2. The final solution we came up with was to share a port between the worker and the worklet and send post messages directly over that port. This optimization cut PostMessage traffic by 50%: to process one chunk of audio, we now use only 2 PostMessages (see Figure 8 and the sketch after it).


    Figure 8. The improved architecture of the new version of Krisp JS SDK
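A minimal sketch of the port sharing, using only standard MessageChannel and AudioWorkletNode APIs (variable names are illustrative):

```js
// krispsdk.js (main thread): create a channel and hand one port to the
// worker and the other to the worklet, so audio chunks no longer have to
// be relayed through the main thread.
const channel = new MessageChannel();

worker.postMessage({ type: 'port' }, [channel.port1]);          // to worker
krispNode.port.postMessage({ type: 'port' }, [channel.port2]);  // to worklet

// Inside the worklet processor, frames are then posted straight to the
// worker over the received port, and the worker replies with the cleaned
// frame on the same port.
```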

Summary

In conclusion, developing a noise cancellation SDK for web usage comes with its own set of unique challenges. Through hard work and ingenuity, the team was able to create a game-changing JS SDK that allows for web-based real-time communication with top-tier noise cancellation. The use of WebAssembly showcased the endless possibilities of this technology. The team’s efforts demonstrate that, with the right approach, web-based noise cancellation can be achieved with near-native performance.

Try next-level audio and voice technologies  

Krisp licenses its SDKs to embed directly into applications and devices. Learn more about Krisp’s SDKs and begin your evaluation today.




This article was written by Arman Jivanyan, BSc in Computer Science, Software Engineer at Krisp.

How to Integrate CoreML Models Into C/C++ Codebase

Apple released the M1 processor in November 2020. Since the hardware shares similarities with iPhone processors (both being ARM-based and having a neural engine), some software components also started to support MacBooks. One of those components is Apple’s CoreML, a framework that lets you run ML/AI model inference on the CPU, the GPU, or the Apple Neural Engine (ANE). Running inference on the GPU or the ANE is not as straightforward as running it on the CPU, but those differences are out of the scope of this article. In this article, we will go through all the steps needed to integrate a simple CoreML model into a C/C++-based app or SDK.

Generating a CoreML model

Usually, neural network model training is implemented in Python using frameworks such as TensorFlow or PyTorch. CoreML uses its own custom model formats; as of right now there are two: .mlmodel and .mlpackage. From Apple’s documentation, it looks like they are going to move forward with .mlpackage and slowly drop support for .mlmodel, so I suggest generating a .mlpackage if you are just starting. CoreML model generation is done via coremltools, an open-source Python package written by Apple. After generating a .mlpackage or a .mlmodel, we can move on to the integration phase. As an example, this article uses a dummy model with one input and one output, both multidimensional arrays with float32 data types.
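As a rough sketch of the generation step, the snippet below converts a traced PyTorch module with coremltools; the model, names, and shapes are stand-ins for the dummy model described above.

```python
# Illustrative sketch — generating a CoreML model with coremltools.
import numpy as np
import torch
import coremltools as ct

model = torch.nn.Linear(128, 128).eval()            # dummy 1-in/1-out model
traced = torch.jit.trace(model, torch.rand(1, 128))

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="input", shape=(1, 128), dtype=np.float32)],
    convert_to="mlprogram",                          # produces an .mlpackage
)
mlmodel.save("CoreMLModel.mlpackage")
```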

Integration

CoreML models need to be compiled first, to be used by the CoreML interface. The extension for a compiled model is .mlmodelc. Compilation can be done in two ways.

  1. By dragging and dropping your CoreML model into your Xcode project. This adds your model as a source file, and Xcode automatically invokes the CoreML compiler for it.
  2. By calling the CoreML compiler explicitly from the command line:
    mkdir output
    xcrun coremlc compile YourModel.mlmodel output

Both approaches are essentially the same; the first one is convenient if you are working on an iOS or macOS application. Since our use case is C/C++-based software, we will go with the second approach.

Now, in order to use the CoreML API for model loading and inference, we again have two options: use the API directly, or take the indirect approach, by which I mean using the CoreML compiler-generated classes that do the heavy lifting for us. Since the indirect approach is much more developer-friendly, we will explore it in this article.

So, to generate the classes, we can either drag and drop the model into an Xcode project, as we did for model compilation, or call the CoreML compiler explicitly:

  • mkdir wrappers
  • xcrun coremlc generate YourModel.mlmodel wrappers

The compiler will generate three classes: one for the model input, one for the model output, and one for the model itself. They can be generated in either Objective-C or Swift; the language is chosen via the compiler option --language. The names of the classes depend on the .mlmodel (or .mlpackage) file name. In our case, the file name is CoreMLModel.mlmodel, so we get:

  • CoreMLModelInput
  • CoreMLModelOutput
  • CoreMLModel

And they have this structure.
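A simplified sketch of the generated interfaces (the real generated files contain more initializers and prediction variants):

```objc
// Simplified sketch of the CoreML compiler-generated headers.
#import <CoreML/CoreML.h>

@interface CoreMLModelInput : NSObject <MLFeatureProvider>
@property (readwrite, nonatomic, strong) MLMultiArray *input; // one property per model input
@end

@interface CoreMLModelOutput : NSObject <MLFeatureProvider>
@property (readwrite, nonatomic, strong) MLMultiArray *output;
@end

@interface CoreMLModel : NSObject
// Loading: modelURL is the path to the compiled .mlmodelc.
- (nullable instancetype)initWithContentsOfURL:(NSURL *)modelURL
                                         error:(NSError **)error;
// Inference: one of several prediction variants.
- (nullable CoreMLModelOutput *)predictionFromFeatures:(CoreMLModelInput *)input
                                                 error:(NSError **)error;
@end
```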

The input/output classes will usually have a property for each model input or output. The biggest class is the model itself; the sketch above shows just two methods, one for loading and one for inference, but the actual class has many methods, each suited to a different use case. The modelURL argument of the initWithContentsOfURL method is the path to our compiled model.

Objective-C++

So far we have looked at the features of using Xcode and the CoreML framework. The languages that the CoreML compilers support are Objective-C and Swift. Since we are trying to integrate CoreML into a C/C++ codebase a little bit of hacking is needed, and the name of our hack is Objective-C++.

Objective-C++ is a language that allows you to mix C++ and Objective-C in your source code. We can just define methods and mix the two languages as much as we want, but it will not be maintainable code. Instead, what we will do is write a class in C++ and use it as a bridge for calling our wrapper methods.
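A minimal sketch of such a bridge class (the explicit length parameter on predict is an addition for this sketch):

```cpp
// CPPCoreMLModel.h — pure C++, safe to include from a C/C++ codebase.
#include <CoreFoundation/CoreFoundation.h>
#include <cstddef>
#include <string>

class CPPCoreMLModel {
public:
    explicit CPPCoreMLModel(const std::string& compiledModelPath);
    ~CPPCoreMLModel();

    // Takes a float32 input buffer and returns a float32 output buffer.
    float* predict(float* input, size_t length);

private:
    CFTypeRef _coreMLModel; // opaque handle to the Objective-C model object
};
```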

CFTypeRef is just a typedef of const void*; the lifetime of the object a CFTypeRef points to is managed manually through the Core Foundation API. To initialize the _coreMLModel object, you would write something like this:
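The constructor and destructor below are a sketch of that initialization, following the CFBridgingRetain/CFRelease pattern described next:

```objc
// CPPCoreMLModel.mm — Objective-C++ implementation (illustrative sketch).
#import <CoreML/CoreML.h>
#include "CPPCoreMLModel.h"

CPPCoreMLModel::CPPCoreMLModel(const std::string& compiledModelPath) {
    NSURL* modelURL = [NSURL fileURLWithPath:
        [NSString stringWithUTF8String:compiledModelPath.c_str()]];
    NSError* error = nil;
    CoreMLModel* model = [[CoreMLModel alloc] initWithContentsOfURL:modelURL
                                                              error:&error];
    // Move ownership out of ARC; the reference count is 1 from here on.
    _coreMLModel = CFBridgingRetain(model);
}

CPPCoreMLModel::~CPPCoreMLModel() {
    CFRelease(_coreMLModel); // balances the CFBridgingRetain above
}
```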

As we know, all Objective-C objects are maintained by the Objective-C runtime through their reference count (controlled with the retain and release methods). What we are doing here with CFBridgingRetain is casting the CoreMLModel object to a Core Foundation object so we can control its lifetime manually. At the point of casting, the reference count is one; in order to decrement the reference count of a Core Foundation object and destroy it, we need to call CFRelease on it. In this case, we do that in our class’s destructor.

Inference

Moving on to model inference. In this example, our model’s input and output are both multidimensional float32 arrays. That is why CPPCoreMLModel takes in a float* and outputs a float*. If your case is different, you can extend it pretty easily by adding more buffers, encapsulating them in a data structure, and so on. In order to execute inference, you have to wrap your preprocessed buffer in one of CoreML’s input types. The most common of those is MLMultiArray, a multidimensional array that holds elements of type MLMultiArrayDataType. We can initialize an MLMultiArray with our buffer by using the initWithDataPointer initializer.

Besides the input shape, initWithDataPointer also takes the strides of your buffer, which can be calculated from your input shape and the buffer’s layout.

After creating the necessary CoreML inputs you can create an instance of the CoreML compiler-generated CoreMLModelInput class. Usually, the generated class will have an initializer that will take model inputs as arguments. In our case, it will be something like this.
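A sketch of the full inference path, assuming a contiguous 1-D float32 buffer; the generated initializer name (initWithInput:) depends on the model’s input feature name:

```objc
// Inference path of CPPCoreMLModel::predict (illustrative sketch).
float* CPPCoreMLModel::predict(float* input, size_t length) {
    NSError* error = nil;

    // Wrap the preprocessed buffer without copying it.
    MLMultiArray* inArray = [[MLMultiArray alloc]
        initWithDataPointer:input
                      shape:@[ @(length) ]
                   dataType:MLMultiArrayDataTypeFloat32
                    strides:@[ @1 ]
                deallocator:nil
                      error:&error];

    CoreMLModelInput* modelInput =
        [[CoreMLModelInput alloc] initWithInput:inArray];

    CoreMLModel* model = (__bridge CoreMLModel*)_coreMLModel;
    CoreMLModelOutput* result = [model predictionFromFeatures:modelInput
                                                        error:&error];

    // The output buffer is owned by the MLMultiArray inside `result`.
    return (float*)result.output.dataPointer;
}
```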

After inference, we can just return the data pointer of our output. Keep in mind that the output buffer is owned by the MLMultiArray object.

Summary

Using Objective-C++ is not the only way to run inference with CoreML. The files generated by the CoreML compiler are also available in Swift, so another route is to write some other bridging mechanism in Swift that links our C/C++ codebase with CoreML. The Objective-C++ solution, however, is easy to implement, since all we have to do is change some file extensions from .c/.cpp to .mm and give them to clang; mixing Swift in would also require bringing the Swift compiler into our build pipeline. I highly recommend reading clang’s documentation on ARC for memory management. It has saved a lot of time that I would have otherwise spent debugging memory leaks.


Try next-level audio and voice technologies  

Krisp licenses its SDKs to embed directly into applications and devices. Learn more about Krisp’s SDKs and begin your evaluation today.


This article was written by Sero Mirzakhanyan, MSc in Computer Science, Software Engineer at Krisp.
