Dialogflow, formerly known as API.AI, is Google’s chatbot development framework. It uses machine learning for Natural Language Processing. More interestingly, Dialogflow is a SaaS product, so you don’t have to worry about infrastructure, and it can easily scale to millions of users.
Features of Dialogflow
Multi-channel support : Dialogflow supports one-click integration with more than 20 platforms, including Slack, Facebook Messenger, Twitter, Kik, Line etc.
Best NLP : Dialogflow’s machine learning compares well with that of its competitors, even with less training data.
Price : Dialogflow is free to use if you are on the Standard edition.
Multi-language support : Dialogflow supports more than 14 languages worldwide, and more features are coming.
Building blocks of Dialogflow
Agent : An agent is the app that we create on Dialogflow. It holds the definitions of intents, entities, knowledge base, fulfillment etc. We can also export the agent as a zip file, which is really cool.
Intent : What the user wants to do is called an intent, i.e. the intention of the user. An intent captures what the user wants to do, not what we do with that information or how we reply. Intents contain many things, such as contexts, events, training phrases and responses.
Training phrases : We can say the same thing in numerous ways while meaning a single task. For example, if we want to set a location mark, we can say [set my location mark], [i want to set a location mark], [can you please set location mark for me] etc. These are called training phrases, and they are used to match what the user wants to do.
Entities : To complete a task, we might need some data from the user. For example, for flight booking we need the source, destination, date etc. These are called entities.
Fulfillment : Sometimes we need additional logic to complete a task, and we can use fulfillment for this. For example, if the user wants to book a flight, we match the intent, collect the required data and then call some flight booking API.
Response : After all the processing, we need to reply to the user. We do this using responses. We can configure multiple responses in the Dialogflow console, as well as platform-specific responses.
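To make the fulfillment step more concrete, here is a minimal sketch of the logic a webhook might run once an intent has been matched. It assumes the Dialogflow ES (v2) webhook JSON layout, where the matched intent is under queryResult.intent.displayName, the collected entities are under queryResult.parameters, and the reply is returned as fulfillmentText. The intent name book_flight, the parameter names and the fake_book_flight helper are hypothetical and only stand in for a real agent and a real booking API.

```python
def fake_book_flight(source, destination, date):
    """Hypothetical stand-in for a call to a real flight booking API."""
    return "DEMO-{}-{}-{}".format(source[:3].upper(), destination[:3].upper(), date)


def handle_webhook(request_json):
    """Build a webhook response dict from a Dialogflow ES (v2) request body."""
    query_result = request_json.get("queryResult", {})
    intent_name = query_result.get("intent", {}).get("displayName", "")
    params = query_result.get("parameters", {})

    if intent_name == "book_flight":  # hypothetical intent name
        source = params.get("source")
        destination = params.get("destination")
        date = params.get("date")
        if not (source and destination and date):
            # Some entities are still missing, so ask the user for them.
            return {"fulfillmentText": "I still need the source, destination and date."}
        booking_id = fake_book_flight(source, destination, date)
        reply = "Booked flight {} from {} to {} on {}.".format(
            booking_id, source, destination, date)
    else:
        reply = "Sorry, I can't help with that yet."

    # Dialogflow sends the fulfillmentText value back to the user.
    return {"fulfillmentText": reply}


example_request = {
    "queryResult": {
        "intent": {"displayName": "book_flight"},
        "parameters": {"source": "Paris", "destination": "Tokyo", "date": "2024-05-01"},
    }
}
print(handle_webhook(example_request)["fulfillmentText"])
```

In a real deployment this function would sit behind an HTTPS endpoint that the agent calls; it is kept as a plain function here so that it can be tried without a web framework.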
In this post, I will explain object detection and Faster R-CNN, which is a machine learning algorithm. We shall start from the beginner’s level and work up to the state of the art in object detection, understanding the intuition, approach and salient features of each method.
Faster R-CNN was originally published at NIPS in 2015. It was presented by Shaoqing Ren, Kaiming He, Ross Girshick and Jian Sun.
It is one of the best-known object detection architectures built on convolutional neural networks, alongside YOLO (You Only Look Once) and SSD (Single Shot Detector).
Everything started with “Rich feature hierarchies for accurate object detection and semantic segmentation” (R-CNN) in 2014, which used an algorithm called Selective Search to propose possible regions of interest and a standard Convolutional Neural Network (CNN) to classify and adjust them.
It quickly evolved into Fast R-CNN, published in early 2015, where a technique called Region of Interest Pooling allowed for sharing expensive computations and made the model much faster.
Finally came Faster R-CNN, where the first fully differentiable model was proposed. The Faster R-CNN architecture is complex because it has several moving parts. It all starts with an image, from which we want to obtain:
a list of bounding boxes.
a label assigned to each bounding box.
a probability for each label and bounding box.
The input images are represented as Height × Width × Depth tensors (multidimensional arrays), which are passed through a pre-trained CNN. We use this CNN as a feature extractor for the next part.
This technique is very commonly used in the context of Transfer Learning, mainly to train a classifier on a small data set using the weights of a network trained on a bigger data set.
Next comes the Region Proposal Network (RPN, for short). Using the features that the CNN computed, the RPN finds up to a predefined number of regions (bounding boxes) which may contain objects.
The hardest issue with using Deep Learning (DL) for object detection is generating a variable-length list of bounding boxes. When modeling deep neural networks, the last block is usually a fixed-size tensor output.
The variable-length problem is solved in the RPN by using anchors: fixed-size reference bounding boxes which are placed uniformly throughout the original image. Instead of having to detect where objects are, we split the problem into two parts. For every anchor, we ask whether it contains a relevant object, and how we would adjust the anchor to better fit that object.
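The idea of anchors placed uniformly over the image can be sketched with NumPy. This is only an illustration: the function name generate_anchors is my own, and the stride of 16 pixels, the three sizes and the three aspect ratios are assumed defaults in the spirit of the Faster R-CNN paper, not values taken from the code in this post.

```python
import numpy as np


def generate_anchors(image_height, image_width, stride=16,
                     sizes=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return an (N, 4) array of anchors as (x_min, y_min, x_max, y_max)."""
    # One anchor center every `stride` pixels, in both directions.
    centers_x = np.arange(stride // 2, image_width, stride)
    centers_y = np.arange(stride // 2, image_height, stride)

    anchors = []
    for cy in centers_y:
        for cx in centers_x:
            for size in sizes:
                for ratio in ratios:
                    # ratio is height / width; the area stays close to size ** 2.
                    width = size / np.sqrt(ratio)
                    height = size * np.sqrt(ratio)
                    anchors.append([cx - width / 2, cy - height / 2,
                                    cx + width / 2, cy + height / 2])
    return np.array(anchors, dtype=np.float32)


anchors = generate_anchors(image_height=64, image_width=96)
print(anchors.shape)  # (216, 4): 4 x 6 centers, 9 anchors per center
```

The number of anchors depends only on the image size and these settings, never on how many objects are in the image, which is what lets the network keep a fixed-size output.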
After having a list of possible relevant objects and their locations in the original image, it becomes a more straightforward problem to solve. Using the features extracted by the CNN and the bounding boxes with relevant objects, we apply Region of Interest (RoI) Pooling and extract those features which would correspond to the relevant objects into a new tensor.
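RoI Pooling can be illustrated with a small NumPy sketch. It assumes that the region is already expressed in feature-map cells (with exclusive maximum coordinates) and that it is not empty; the function name roi_max_pool and the 2 × 2 output size are choices made for this example, not part of any library.

```python
import numpy as np


def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one region of an (H, W, C) feature map to a fixed size.

    roi is (x_min, y_min, x_max, y_max) in feature-map cells, max exclusive.
    """
    x_min, y_min, x_max, y_max = roi
    region = feature_map[y_min:y_max, x_min:x_max, :]
    out_h, out_w = output_size

    # Split the region into out_h x out_w roughly equal bins.
    row_edges = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    col_edges = np.linspace(0, region.shape[1], out_w + 1).astype(int)

    pooled = np.zeros((out_h, out_w, feature_map.shape[2]), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Make sure every bin covers at least one cell.
            r0, r1 = row_edges[i], max(row_edges[i + 1], row_edges[i] + 1)
            c0, c1 = col_edges[j], max(col_edges[j + 1], col_edges[j] + 1)
            pooled[i, j] = region[r0:r1, c0:c1].max(axis=(0, 1))
    return pooled


feature_map = np.arange(16).reshape(4, 4, 1)
print(roi_max_pool(feature_map, (0, 0, 4, 4))[:, :, 0])
print(roi_max_pool(feature_map, (1, 0, 4, 3)).shape)
```

Regions of different sizes all come out with the same shape, so they can be stacked into a single tensor for the next stage.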
Finally comes the R-CNN module, which uses that information to:
Classify the content in the bounding box (or discard it, using “background” as a label).
Adjust the bounding box coordinates (so it better fits the object).
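The two tasks of the R-CNN module can be sketched as follows. The sketch assumes the usual Faster R-CNN box parameterization, where the network predicts a shift of the box center relative to the box size and a log-scale change of width and height. The helper names classify and apply_deltas and the three-class label list are hypothetical.

```python
import numpy as np


def classify(logits, labels, background="background"):
    """Return (label, probability) for a box, or None if it is background."""
    logits = np.asarray(logits, dtype=np.float64)
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    best = int(np.argmax(probs))
    if labels[best] == background:
        return None  # discard the proposal
    return labels[best], float(probs[best])


def apply_deltas(box, deltas):
    """Adjust an (x_min, y_min, x_max, y_max) box with (dx, dy, dw, dh)."""
    x_min, y_min, x_max, y_max = box
    dx, dy, dw, dh = deltas
    width, height = x_max - x_min, y_max - y_min
    center_x, center_y = x_min + 0.5 * width, y_min + 0.5 * height

    # Shift the center relative to the box size, rescale width and height.
    new_cx = center_x + dx * width
    new_cy = center_y + dy * height
    new_w = width * np.exp(dw)
    new_h = height * np.exp(dh)
    return np.array([new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
                     new_cx + 0.5 * new_w, new_cy + 0.5 * new_h])


labels = ["background", "cat", "dog"]
print(classify([0.1, 2.0, 0.3], labels))
print(apply_deltas((0, 0, 10, 10), (0.1, 0.0, 0.0, 0.0)))
```

A zero delta leaves the box unchanged, and a proposal whose most likely class is the background is simply dropped.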
Obviously, some major bits of information are missing, but that’s basically the general idea of how Faster R-CNN works. Next, we’ll go over the details on both the architecture and loss/training for each of the components.
By now, you should have a clear idea of how Faster R-CNN works. Faster R-CNN is one of the models that proved that it is possible to solve complex computer vision problems with the same principles that showed such amazing results at the start of this new deep learning revolution.
New models are currently being built, not only for object detection, but for semantic segmentation, 3D-object detection, and more, that are based on this original model. Some borrow the RPN, some borrow the R-CNN, others just build on top of both. This is why it is important to fully understand what is under the hood so we are better prepared to tackle future problems.
The label file was downloaded from the official TensorFlow GitHub repo; the link can be found here.
If you can’t find it, I’ve kept a copy of the code, the frozen graph and the label file on my OneDrive, which can be found here.
# Note: this script uses the TensorFlow 1.x API (tf.GraphDef, tf.gfile, tf.Session).
import os
import sys
import tarfile

import cv2
import numpy as np
import six.moves.urllib as urllib
import tensorflow as tf
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image

# Open the default camera (device 0).
cap = cv2.VideoCapture(0)

# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")

# ## Object detection imports
# Here are the imports from the object detection module.
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util
# # Model preparation
# ## Variables
# Any model exported using the `export_inference_graph.py` tool can be loaded here simply by changing `PATH_TO_CKPT` to point to a new .pb file.
# By default we use an "SSD with Mobilenet" model here. See the [detection model zoo](https://github.com/tensorflow/models/blob/master/object_detection/g3doc/detection_model_zoo.md) for a list of other models that can be run out-of-the-box with varying speeds and accuracies.
# What model to download.
MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = 'C:/Users/Automation/Documents/TensorFlow/workspace/tfdemo/frozen_inference_graph.pb'
# List of the strings that are used to add the correct label to each box.
PATH_TO_LABELS = 'C:/Users/Automation/Documents/TensorFlow/workspace/tfdemo/mscoco_label_map.pbtxt'
NUM_CLASSES = 90
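# ## Download Model
opener = urllib.request.URLopener()
opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
    file_name = os.path.basename(file.name)
    if 'frozen_inference_graph.pb' in file_name:
        # Extract only the frozen graph into the current working directory.
        # PATH_TO_CKPT must point at the extracted file.
        tar_file.extract(file, os.getcwd())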
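# Load a (frozen) Tensorflow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        # Parse the serialized graph and import it into detection_graph.
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')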
# Loading label map
# Label maps map indices to category names, so that when our convolution network predicts `5`, we know that this corresponds to `airplane`. Here we use internal utility functions, but anything that returns a dictionary mapping integers to appropriate string labels would be fine
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
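# Helper code
def load_image_into_numpy_array(image):
    # Convert a PIL image into a (height, width, 3) uint8 array.
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)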
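with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        while True:
            # Read frame from camera
            ret, image_np = cap.read()
            if not ret:
                # Stop if the camera did not return a frame.
                break
            # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            # Extract image tensor
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            # Extract detection boxes
            boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            # Extract detection scores
            scores = detection_graph.get_tensor_by_name('detection_scores:0')
            # Extract detection classes
            classes = detection_graph.get_tensor_by_name('detection_classes:0')
            # Extract number of detections
            num_detections = detection_graph.get_tensor_by_name(
                'num_detections:0')
            # Actual detection.
            (boxes, scores, classes, num_detections) = sess.run(
                [boxes, scores, classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            # Visualization of the results of a detection.
            # Drawing the boxes is an assumed step; vis_util is imported above for it.
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=8)
            # Count the detections whose score is above 0.5.
            final_score = np.squeeze(scores)
            count = 0
            for i in range(len(final_score)):
                if final_score[i] > 0.5:
                    count = count + 1
            mytxt = "No of Objects:" + str(count)
            cv2.putText(image_np, mytxt, (0, 130), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2, cv2.LINE_AA)
            # Display output
            cv2.imshow('object detection', cv2.resize(image_np, (800, 600)))
            # Press 'q' to quit.
            if cv2.waitKey(25) & 0xFF == ord('q'):
                break

# Release the camera and close the display window.
cap.release()
cv2.destroyAllWindows()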
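The counting step and the label lookup can also be tried without a camera or a model. The sketch below assumes that the scores and classes arrays have the shape [1, N] returned by the detection graph, and that category_index has the form {id: {'id': ..., 'name': ...}} built by label_map_util.create_category_index. The function name summarize_detections and the tiny two-entry index are made up for this example.

```python
import numpy as np


def summarize_detections(scores, classes, category_index, min_score=0.5):
    """Return (count, names) for the detections scoring above min_score."""
    scores = np.asarray(scores).reshape(-1)
    classes = np.asarray(classes).reshape(-1).astype(np.int32)
    keep = scores > min_score
    # Map each class id to its name, falling back to 'unknown'.
    names = [category_index.get(int(c), {"name": "unknown"})["name"]
             for c in classes[keep]]
    return int(keep.sum()), names


# Hypothetical excerpt of a category index and one frame of fake model output.
demo_index = {1: {"id": 1, "name": "person"}, 5: {"id": 5, "name": "airplane"}}
demo_scores = np.array([[0.9, 0.6, 0.3]])
demo_classes = np.array([[1.0, 5.0, 1.0]])

count, names = summarize_detections(demo_scores, demo_classes, demo_index)
print("No of Objects:", count, names)
```

Counting with a boolean mask avoids hard-coding the number of detections per frame, so the same function works for models that return more or fewer than 100 boxes.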