How to Use Computer Vision in Android to Create an Image Recognition App?


This guest article on the Applozic Blog was written by Mr. Dennis Muasya. He specializes in designing and developing advanced applications for the Android platform and WordPress, and in writing technical articles. He loves to code and to study new and innovative topics. You can find him on his Website, Twitter, and LinkedIn.

Dennis Muasya

This project’s GitHub page can be found here: ComputerVision

Computer vision is the technique of converting a video or picture feed into a series of data points. The computer then uses these data points to identify and understand what is happening in the video or image. To do that, a computer vision system must be able to recognize and evaluate picture properties such as shapes, colors, textures, and patterns. When creating an image recognition app with computer vision, two factors must be considered: quality and quantity. High-quality photos with appropriate lighting improve the algorithm’s performance and help it achieve high accuracy. However, for the algorithm to make any predictions or analyses, there must also be an adequate quantity of images to work with.

To begin, it’s critical to understand what image recognition entails. Image recognition is a set of computer vision and artificial intelligence methods for detecting and analyzing images in order to automate a specific task. TensorFlow Lite (TFLite) allows us to run deep learning models on mobile devices. In fact, TFLite models are specifically designed for mobile and edge deployment because they are efficient to process. Developers can use the TensorFlow Lite Converter to convert a deep learning model created in TensorFlow into a mobile-friendly version.


In this tutorial, we’ll create a TensorFlow model for identifying photographs on Android using a custom dataset and a convolutional neural network (CNN). Rather than developing a CNN from scratch, we’ll take a pre-trained model and adapt it to our new dataset via transfer learning. The model will then be used in an Android app that recognizes photos captured by the camera. The project developed as a result of this tutorial is available on GitHub. It has passed all of the necessary tests and is now operational.


What exactly is Computer Vision?

It’s a branch of artificial intelligence concerned with how computers understand image content. Image identification, video surveillance, digital pathology, medical imaging, robotics, and many other fields use computer vision algorithms.
It’s not easy to create an image recognition program. First, you need specific computer vision skills. Second, you must be proficient in a programming language such as Python or C++. Finally, you’ll need to know the fundamentals of machine learning methods such as neural networks.

What is Image Recognition?

Image recognition is a computer vision task that recognizes and categorizes various elements of pictures and videos. Image recognition models are trained to take an image as input and assign it one or more labels as output. The set of possible output labels is known as the target classes. Image recognition models can also produce a confidence score, which indicates how certain the model is that an image belongs to a specific class.
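The relationship between target classes, labels, and confidence scores can be sketched in a few lines of plain Kotlin. This is only an illustration: `LabeledScore` and `topLabels` are hypothetical names, and the threshold and result count simply mirror the values used by the classifier later in this article.

```kotlin
// Hypothetical type for illustration only; it is not part of any library.
data class LabeledScore(val label: String, val confidence: Float)

/**
 * Pairs each target class with its confidence score, drops scores at or below
 * the threshold, and returns the best `maxResults` entries, highest first.
 */
fun topLabels(
    targetClasses: List<String>,
    confidences: FloatArray,
    threshold: Float = 0.1f,
    maxResults: Int = 3
): List<LabeledScore> {
    require(targetClasses.size == confidences.size) { "one confidence per target class" }
    return targetClasses
        .mapIndexed { i, label -> LabeledScore(label, confidences[i]) }
        .filter { it.confidence > threshold }
        .sortedByDescending { it.confidence }
        .take(maxResults)
}
```

Calling `topLabels` with four classes and `maxResults = 2` returns the two most confident labels, and it returns an empty list when no score clears the threshold.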


For example, to determine whether or not an image contains a cow, the pipeline may look like this:

  • An image recognition model that has been trained on “cow” and “not cow” photographs.
  • Model input: an image or video frame.
  • Model output: a confidence score showing the likelihood that the image contains that class of item (i.e., a cow).
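The last step of that pipeline can be sketched as follows, assuming a quantized model (like the MobileNet model used later in this article) that emits one unsigned byte per class. The function names and the 0.5 cut-off are assumptions made for this sketch.

```kotlin
/** Maps one unsigned output byte (0..255) of a quantized model to a confidence in 0.0..1.0. */
fun byteToConfidence(raw: Byte): Float = (raw.toInt() and 0xFF) / 255.0f

/**
 * Decides "cow" vs "not cow" from the raw byte the model emitted for the "cow" class.
 * The 0.5 cut-off is an arbitrary choice for this sketch, not a recommended value.
 */
fun isCow(rawCowScore: Byte, cutoff: Float = 0.5f): Boolean =
    byteToConfidence(rawCowScore) >= cutoff
```

The `and 0xFF` mask matters because Kotlin bytes are signed: without it, raw values above 127 would be read as negative numbers.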

Convolutional neural networks are built from layers that perform mathematical calculations on images. These typically include convolutional, pooling, and fully connected layers. With transfer learning, you can reuse pre-trained network architectures that were trained on standard image datasets. You can start by building your own network, but you’ll quickly see that pre-trained networks outperform it, so use one of the pre-trained models to get started.
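To make the “mathematical calculations” concrete, here is a minimal sketch of the operation at the heart of a convolutional layer: sliding a small kernel over an image and summing the element-wise products. It assumes a single channel, no padding, and a stride of 1; real layers learn their kernels during training and handle many channels at once. The function name is hypothetical.

```kotlin
/**
 * Slides `kernel` over `image` (no padding, stride 1) and returns the weighted sums.
 * This is the multiply-and-add step a convolutional layer applies with learned kernels.
 */
fun convolve2d(image: Array<FloatArray>, kernel: Array<FloatArray>): Array<FloatArray> {
    val outRows = image.size - kernel.size + 1
    val outCols = image[0].size - kernel[0].size + 1
    require(outRows > 0 && outCols > 0) { "kernel must fit inside the image" }
    return Array(outRows) { r ->
        FloatArray(outCols) { c ->
            var sum = 0f
            for (kr in kernel.indices) {
                for (kc in kernel[0].indices) {
                    sum += image[r + kr][c + kc] * kernel[kr][kc]
                }
            }
            sum
        }
    }
}
```

A 3×3 image convolved with a 2×2 kernel yields a 2×2 output, since the kernel fits in two positions along each axis.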


Now I’ll take you step by step through the process of constructing your app. I’m assuming you already have Android Studio installed and are comfortable with the basics of Kotlin.
I’ve set up a GitHub repository for the sample app. You can either follow these instructions or simply clone the repository. This code may be used in any way you see fit.

Step 1: TensorFlow Requirements

The components necessary for image recognition on Android are listed below.

  • Dependency: a preview build of TensorFlow Lite (you can also use the most recent version, although the preview is adequate)
  • TensorFlow initialization (a method in ClassifierActivity)
  • TensorFlowImageClassifier
  • The Classifier interface

The Classifier interface and the classifier activity are simple to duplicate. Follow this URL to see these files. I’ll give you more thorough instructions later in this guide. So, let’s get started.

Step 2: Including the necessary dependencies

Starting with the most important dependencies is always a smart approach. For this use case, we’ll require the following items:
the TensorFlow Lite Android library and CameraKit.
Add the following dependencies to the build.gradle file:

apply plugin: ''

android {
  compileSdkVersion 28
  defaultConfig {
      applicationId "com.dennis.ktflite"
      minSdkVersion 21
      targetSdkVersion 28
      versionCode 1
      versionName "1.0"
      testInstrumentationRunner ""
  }
  buildTypes {
      release {
          minifyEnabled false
          proguardFiles getDefaultProguardFile('proguard-android.txt'), ''
      }
  }
  // Keep the model file uncompressed so it can be memory-mapped at runtime
  aaptOptions {
      noCompress "tflite"
      noCompress "lite"
  }
}

dependencies {
  implementation fileTree(dir: 'libs', include: ['*.jar'])
  implementation ''
  implementation ''

  implementation 'com.wonderkiln:camerakit:0.13.0'

  implementation 'org.tensorflow:tensorflow-lite:+'
  testImplementation 'junit:junit:4.12'
  androidTestImplementation ''
  androidTestImplementation ''
  implementation "org.jetbrains.kotlin:kotlin-stdlib-jdk7:$kotlin_version"

  implementation ''
}

apply plugin: 'kotlin-android'
apply plugin: 'kotlin-android-extensions'


CameraKit is an open-source library from WonderKiln that makes building camera apps a breeze.
It provides a set of useful APIs for interfacing with the device’s camera, which makes development a little easier across the many different types of camera hardware found in the notoriously fragmented Android ecosystem. With CameraKit, integrating a reliable camera into your program is straightforward: it aims to offer dependable capture results, scalability, and compatibility with a wide range of cameras.

Set up

The next stage will be to design a layout.
I’ll assume for the purposes of this article that your app only has one activity and one goal in life: to function as a camera preview. So, open your activity’s layout file (for example, “C:\Users\AndroidStudioProjects\app\src\main\res\layout\activity_main.xml”) and add the following view.

  android:layout_gravity="center" />

(Assume the root tag of your layout file is ConstraintLayout)
As you may expect, this is the view that will display the camera preview on the user’s screen. To make things simple, we’ll keep the screen orientation in portrait mode for now. Locate the activity tag in the AndroidManifest.xml file and add the screenOrientation attribute to it:

<activity android:name="com.dennis.ktflite.AppActivity"
    android:screenOrientation="portrait"

Now that we’ve completed the challenging part, let’s move on to the activities.

Step 3: Tensorflow initialization in classifier activity

The methods shown below belong to the Classifier class. The classifier must be created from the activity’s “onCreate” method so that TensorFlow is initialized at the start of your program. To do so, copy and paste the code below into the Classifier class. Please leave a comment if you have any questions about the code.

package com.dennis.ktflite

import android.content.res.AssetManager
import android.graphics.Bitmap
import org.tensorflow.lite.Interpreter
import java.io.BufferedReader
import java.io.FileInputStream
import java.io.InputStreamReader
import java.lang.Float
import java.nio.ByteBuffer
import java.nio.ByteOrder
import java.nio.MappedByteBuffer
import java.nio.channels.FileChannel
import java.util.*

class Classifier(
      var interpreter: Interpreter? = null,
      var inputSize: Int = 0,
      var labelList: List<String> = emptyList()
) : IClassifier {

  companion object {
      private const val MAX_RESULTS = 3
      private const val BATCH_SIZE = 1
      private const val PIXEL_SIZE = 3
      private const val THRESHOLD = 0.1f

      // Loads the model and the labels from the app's assets
      fun create(assetManager: AssetManager,
                          modelPath: String,
                          labelPath: String,
                          inputSize: Int): Classifier {

          val classifier = Classifier()
          classifier.interpreter = Interpreter(classifier.loadModelFile(assetManager, modelPath))
          classifier.labelList = classifier.loadLabelList(assetManager, labelPath)
          classifier.inputSize = inputSize

          return classifier
      }
  }

  override fun recognizeImage(bitmap: Bitmap): List<IClassifier.Recognition> {
      val byteBuffer = convertBitmapToByteBuffer(bitmap)
      // The quantized model writes one byte per label
      val result = Array(1) { ByteArray(labelList.size) }
      interpreter!!.run(byteBuffer, result)
      return getSortedResult(result)
  }

  override fun close() {
      interpreter?.close()
      interpreter = null
  }

  // Memory-maps the model file stored in the assets folder
  private fun loadModelFile(assetManager: AssetManager, modelPath: String): MappedByteBuffer {
      val fileDescriptor = assetManager.openFd(modelPath)
      val inputStream = FileInputStream(fileDescriptor.fileDescriptor)
      val fileChannel = inputStream.channel
      val startOffset = fileDescriptor.startOffset
      val declaredLength = fileDescriptor.declaredLength
      return fileChannel.map(FileChannel.MapMode.READ_ONLY, startOffset, declaredLength)
  }

  // Reads one label per line
  private fun loadLabelList(assetManager: AssetManager, labelPath: String): List<String> {
      val labelList = ArrayList<String>()
      val reader = BufferedReader(InputStreamReader(assetManager.open(labelPath)))
      while (true) {
          val line = reader.readLine() ?: break
          labelList.add(line)
      }
      reader.close()
      return labelList
  }

  // Packs the bitmap's pixels as R, G, B bytes; the bitmap must already be inputSize x inputSize
  private fun convertBitmapToByteBuffer(bitmap: Bitmap): ByteBuffer {
      val byteBuffer = ByteBuffer.allocateDirect(BATCH_SIZE * inputSize * inputSize * PIXEL_SIZE)
      byteBuffer.order(ByteOrder.nativeOrder())
      val intValues = IntArray(inputSize * inputSize)
      bitmap.getPixels(intValues, 0, bitmap.width, 0, 0, bitmap.width, bitmap.height)
      var pixel = 0
      for (i in 0 until inputSize) {
          for (j in 0 until inputSize) {
              val `val` = intValues[pixel++]
              byteBuffer.put((`val` shr 16 and 0xFF).toByte())
              byteBuffer.put((`val` shr 8 and 0xFF).toByte())
              byteBuffer.put((`val` and 0xFF).toByte())
          }
      }
      return byteBuffer
  }

  private fun getSortedResult(labelProbArray: Array<ByteArray>): List<IClassifier.Recognition> {

      // Highest confidence first, so poll() returns the best remaining match
      val pq = PriorityQueue(
              MAX_RESULTS,
              Comparator<IClassifier.Recognition> { (_, _, confidence1), (_, _, confidence2) -> Float.compare(confidence2, confidence1) })

      for (i in labelList.indices) {
          val confidence = (labelProbArray[0][i].toInt() and 0xff) / 255.0f
          if (confidence > THRESHOLD) {
              pq.add(IClassifier.Recognition("" + i,
                      if (labelList.size > i) labelList[i] else "Unknown",
                      confidence))
          }
      }

      val recognitions = ArrayList<IClassifier.Recognition>()
      val recognitionsSize = Math.min(pq.size, MAX_RESULTS)
      for (i in 0 until recognitionsSize) {
          recognitions.add(pq.poll())
      }

      return recognitions
  }
}

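The line “classifier.interpreter = Interpreter(classifier.loadModelFile(assetManager, modelPath))” in the create method is what loads the model. Because loading reads the model file, the app runs it on a background thread, as shown in the next step. The modelPath, labelPath, and inputSize arguments are the parameters that our model requires; their values are defined further down.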
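The pixel packing that the classifier performs before inference can be tried outside Android. The following is a minimal sketch, assuming pixels arrive as ARGB integers (the layout `Bitmap.getPixels` produces); `packRgb` is a hypothetical helper, not part of TensorFlow Lite.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

/**
 * Packs ARGB pixels into a direct buffer of R, G, B bytes, dropping the alpha channel.
 * A quantized image model of this kind expects one byte per color channel.
 */
fun packRgb(argbPixels: IntArray): ByteBuffer {
    val buffer = ByteBuffer.allocateDirect(argbPixels.size * 3)
    buffer.order(ByteOrder.nativeOrder())
    for (pixel in argbPixels) {
        buffer.put((pixel shr 16 and 0xFF).toByte()) // red
        buffer.put((pixel shr 8 and 0xFF).toByte())  // green
        buffer.put((pixel and 0xFF).toByte())        // blue
    }
    buffer.rewind() // reset the position so a reader starts at the first byte
    return buffer
}
```

For a 224 × 224 input this yields 224 × 224 × 3 = 150,528 bytes, which matches the BATCH_SIZE * inputSize * inputSize * PIXEL_SIZE allocation in the classifier.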

Step 4: Incorporating the App Activity

We’ll start by adding an executor that will run the image processing code off the main thread. At the top of the activity, add the following line:

  private val executor = Executors.newSingleThreadExecutor()

When the classifier is no longer needed, we’ll also close it on that executor in the activity’s onDestroy method:

override fun onDestroy() {
  super.onDestroy()
  executor.execute { classifier.close() }
}
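Submitting the close call to the same executor matters because a single-thread executor runs its tasks one at a time, in the order they were submitted, so closing cannot overtake work that was queued earlier. Here is a small sketch of that ordering guarantee in plain Kotlin; `runInOrder` and the step names are made up for the illustration.

```kotlin
import java.util.concurrent.CopyOnWriteArrayList
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

/**
 * Runs the given steps on one background thread and returns the order in which they ran.
 * A single-thread executor processes tasks sequentially in submission order.
 */
fun runInOrder(steps: List<String>): List<String> {
    val executor = Executors.newSingleThreadExecutor()
    val log = CopyOnWriteArrayList<String>()
    for (step in steps) {
        executor.execute { log.add(step) }
    }
    executor.shutdown()                            // accept no new tasks, finish queued ones
    executor.awaitTermination(5, TimeUnit.SECONDS) // wait for the queue to drain
    return log.toList()
}
```

Passing the steps “load model”, “classify”, and “close” returns them in exactly that order.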

Perhaps you’ve noticed, dear reader, that we haven’t yet started the camera anywhere in our code. Don’t worry, I haven’t forgotten about it. But first, let’s talk about something very different.

Image analyzer

Unless you consider showing a camera preview amusing enough, we’ll need a mechanism to process the photographs we obtain from the camera before we can do anything interesting with them. The Classifier interface is used at this point.

In the App Activity class’s “onCreate” method, you must call the method “initTensorFlowAndLoadModel” so that TensorFlow is initialized at the start of your program. To do so, copy and paste the code below into the App Activity class. If you have any questions, please feel free to leave a comment and I will be glad to address it.

private fun initTensorFlowAndLoadModel() {
  executor.execute {
      try {
          classifier = Classifier.create(assets, MODEL_PATH, LABEL_PATH, INPUT_SIZE)
          makeButtonVisible()
      } catch (e: Exception) {
          throw RuntimeException("Error initializing TensorFlow!", e)
      }
  }
}

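Because model loading runs in the background, the activity declares “Executors.newSingleThreadExecutor()” at the top of the class and submits the loading work to it. MODEL_PATH, LABEL_PATH, and INPUT_SIZE, defined in the activity’s companion object, are the parameters that our model requires. The complete activity looks like this: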

package com.dennis.ktflite

import android.app.Dialog
import android.graphics.Bitmap
import android.os.Bundle
import android.text.method.ScrollingMovementMethod
import android.view.LayoutInflater
import android.view.View
import android.view.Window
import android.widget.Button
import android.widget.ImageView
import android.widget.TextView
import com.wonderkiln.camerakit.*
import java.util.concurrent.Executors
// Also import AppCompatActivity from the support library your project uses.

class AppActivity : AppCompatActivity() {
    lateinit var classifier: Classifier
    private val executor = Executors.newSingleThreadExecutor()
    lateinit var textViewResult: TextView
    lateinit var btnDetectObject: Button
    lateinit var btnToggleCamera: Button
    lateinit var imageViewResult: ImageView
    lateinit var cameraView: CameraView

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)

        // The R.id names in this method are assumed to match the variable names; adjust them to your layouts.
        cameraView = findViewById(R.id.cameraView)
        imageViewResult = findViewById<ImageView>(R.id.imageViewResult)
        textViewResult = findViewById(R.id.textViewResult)
        textViewResult.movementMethod = ScrollingMovementMethod()

        btnToggleCamera = findViewById(R.id.btnToggleCamera)
        btnDetectObject = findViewById(R.id.btnDetectObject)

        // Dialog that shows the captured image and the classification results
        val resultDialog = Dialog(this)
        val customProgressView = LayoutInflater.from(this).inflate(R.layout.result_dialog_layout, null)
        resultDialog.requestWindowFeature(Window.FEATURE_NO_TITLE)
        resultDialog.setContentView(customProgressView)

        val ivImageResult = customProgressView.findViewById<ImageView>(R.id.ivImageResult)

        val tvLoadingText = customProgressView.findViewById<TextView>(R.id.tvLoadingText)

        val tvTextResults = customProgressView.findViewById<TextView>(R.id.tvTextResults)

        // The Loader Holder is used due to a bug in the Avi Loader library
        val aviLoaderHolder = customProgressView.findViewById<View>(R.id.aviLoaderHolder)

        cameraView.addCameraKitListener(object : CameraKitEventListener {
            override fun onEvent(cameraKitEvent: CameraKitEvent) { }

            override fun onError(cameraKitError: CameraKitError) { }

            override fun onImage(cameraKitImage: CameraKitImage) {

                // Scale the captured image to the input size the model expects
                var bitmap = cameraKitImage.bitmap
                bitmap = Bitmap.createScaledBitmap(bitmap, INPUT_SIZE, INPUT_SIZE, false)

                aviLoaderHolder.visibility = View.GONE
                tvLoadingText.visibility = View.GONE

                val results = classifier.recognizeImage(bitmap)
                ivImageResult.setImageBitmap(bitmap)
                tvTextResults.text = results.toString()

                tvTextResults.visibility = View.VISIBLE
                ivImageResult.visibility = View.VISIBLE
            }

            override fun onVideo(cameraKitVideo: CameraKitVideo) { }
        })

        btnToggleCamera.setOnClickListener { cameraView.toggleFacing() }

        btnDetectObject.setOnClickListener {
            cameraView.captureImage()
            resultDialog.show()
            tvTextResults.visibility = View.GONE
            ivImageResult.visibility = View.GONE
        }

        resultDialog.setOnDismissListener {
            tvLoadingText.visibility = View.VISIBLE
            aviLoaderHolder.visibility = View.VISIBLE
        }

        initTensorFlowAndLoadModel()
    }

    override fun onResume() {
        super.onResume()
        cameraView.start()
    }

    override fun onPause() {
        cameraView.stop()
        super.onPause()
    }

    override fun onDestroy() {
        super.onDestroy()
        executor.execute { classifier.close() }
    }

    private fun initTensorFlowAndLoadModel() {
        executor.execute {
            try {
                classifier = Classifier.create(assets, MODEL_PATH, LABEL_PATH, INPUT_SIZE)
                makeButtonVisible()
            } catch (e: Exception) {
                throw RuntimeException("Error initializing TensorFlow!", e)
            }
        }
    }

    private fun makeButtonVisible() {
        runOnUiThread { btnDetectObject.visibility = View.VISIBLE }
    }

    companion object {
        private const val MODEL_PATH = "mobilenet_quant_v1_224.tflite"
        private const val LABEL_PATH = "labels.txt"
        private const val INPUT_SIZE = 224
    }
}
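The activity displays the raw output of results.toString(). If you would rather show one readable line per match, a small formatter along these lines could be used. This is only a sketch: `DisplayResult` is a hypothetical stand-in for IClassifier.Recognition, whose definition is not shown in this article, and its field names are assumptions.

```kotlin
import java.util.Locale

// Hypothetical stand-in for IClassifier.Recognition; the field names are assumptions.
data class DisplayResult(val id: String, val title: String, val confidence: Float)

/** Renders one result per line as the title followed by its confidence as a percentage. */
fun formatResults(results: List<DisplayResult>): String =
    results.joinToString(separator = "\n") { r ->
        String.format(Locale.US, "%s (%.1f%%)", r.title, r.confidence * 100f)
    }
```

A fixed locale keeps the decimal separator consistent regardless of the device’s language settings.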


Perspectives for the Future 

Naturally, the code described in this article only scratches the surface. You can do a lot more with image detection, such as using the bounding box of a detected item to find its position on the screen and build augmented reality apps. By modifying just the image analyzer code, you can also take a slightly different path and use the various ML Kit APIs. In this tutorial, the model built through transfer learning was converted into a TensorFlow Lite model, which was then used in the Android Studio project to predict the class labels of new photographs. With that in place, TensorFlow Lite’s built-in capabilities are set up and ready to use effectively. This post demonstrated how to run an image recognition model on an Android device using TensorFlow Lite. Thank you for reading this far. I hope you found the article useful; if you did, and you’re interested in more tutorials on this subject, please leave a comment!

Don’t forget to look at the code on GitHub.

