Computer Vision

NLG using similar videos

For a given query video, the system retrieves similar videos based on the cosine similarity of their features. Features are extracted frame by frame using ResNet_152 and compared with those of the videos in the database to retrieve near-duplicate videos. The main purpose is to generate captions for untagged videos from their near-duplicates. At present, retrieval of similar videos is implemented, and its desktop application is shown below. In future, the description of the query video will be generated from the descriptions of the retrieved videos.
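The retrieval step can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: it assumes that per-frame feature vectors have already been extracted, and that a video is summarised by averaging its frame features. The mean pooling and the names `video_descriptor`, `cosine_similarity` and `retrieve_similar` are assumptions made for this example.

```python
import math

def video_descriptor(frame_features):
    """Average per-frame feature vectors into one video-level vector (assumed pooling)."""
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[i] for f in frame_features) / n for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve_similar(query_frames, database, top_k=3):
    """Return the top_k (video_id, similarity) pairs, most similar first."""
    query = video_descriptor(query_frames)
    scored = [(vid, cosine_similarity(query, video_descriptor(frames)))
              for vid, frames in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy 3-dimensional vectors stand in for real per-frame network features.
database = {
    "video_a": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "video_b": [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]],
}
query_frames = [[0.8, 0.2, 0.0], [1.0, 0.0, 0.1]]
results = retrieve_similar(query_frames, database, top_k=1)
```

With these toy vectors the query is closest to `video_a`. A real system would likely replace the linear scan with an approximate nearest-neighbour index once the database grows.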

Signature Verification

The offline signature forgery detection system classifies signatures on an anonymous document as forged or original. A user's original signatures are supplied together with the signatures under test, and the system checks whether the same person produced both. This is important in banks and other settings where handwritten signatures carry weight and can be forged.
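One simple way to frame this comparison is sketched below. It assumes each signature has already been converted into a feature vector, and it uses a distance threshold derived from the spread of the user's own reference signatures. The decision rule, the `margin` parameter and the function names are illustrative assumptions; the source does not describe the classifier actually used.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def verify_signature(reference_features, test_features, margin=1.5):
    """Label a test signature 'original' or 'forged' relative to one user's references.

    The threshold is `margin` times the largest distance between any two
    reference signatures (an assumed, illustrative decision rule).
    """
    if len(reference_features) < 2:
        raise ValueError("at least two reference signatures are needed")
    pairwise = [euclidean(a, b)
                for i, a in enumerate(reference_features)
                for b in reference_features[i + 1:]]
    threshold = margin * max(pairwise)
    nearest = min(euclidean(ref, test_features) for ref in reference_features)
    return "original" if nearest <= threshold else "forged"

# Toy 2-dimensional features for one user's genuine signatures.
references = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9]]
close_result = verify_signature(references, [1.05, 2.0])
far_result = verify_signature(references, [3.0, 0.5])
```

A test signature that lies within the user's own variation is accepted, while one far from every reference is flagged.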

Person Identification and Re-identification

Occupation-based person-of-interest re-identification classifies and tracks people of different occupations on the basis of their appearance. The system has three main functions:

  • Identification of person
  • Re-identification
  • Person tracking and data-logging

A person is identified by clothing and assigned to the occupation that the clothing indicates. People are often unaware of an anomalous person in a scene, or of someone who is not supposed to be there; such identification helps to recognize these people and manage them accordingly. The project targets five categories: army, doctor, imam, police and molvi.

a)- Person Re-Identification


Once a person is identified, there is still no record of where that person goes. The system needs to keep track of the person and store their appearance in a database for later identification. This helps in assigning an ID to a specific person and tracking them across a network of cameras.
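The idea of storing appearances and assigning IDs can be sketched as a small gallery lookup. This is a minimal sketch under stated assumptions: each detection is already summarised as an appearance vector, matching uses cosine similarity, and the `ReIdGallery` class and its `match_threshold` value are hypothetical names chosen for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

class ReIdGallery:
    """Stores one appearance vector per known person and hands out integer IDs."""

    def __init__(self, match_threshold=0.9):
        self.match_threshold = match_threshold
        self.entries = {}   # person_id -> stored appearance vector
        self.next_id = 1

    def identify(self, appearance):
        """Return the ID of the best match, or register a new ID if nothing matches."""
        best_id, best_score = None, -1.0
        for person_id, stored in self.entries.items():
            score = cosine_similarity(appearance, stored)
            if score > best_score:
                best_id, best_score = person_id, score
        if best_id is not None and best_score >= self.match_threshold:
            return best_id
        new_id = self.next_id
        self.entries[new_id] = list(appearance)
        self.next_id += 1
        return new_id

gallery = ReIdGallery()
first = gallery.identify([1.0, 0.0, 0.0])      # new person
second = gallery.identify([0.0, 1.0, 0.0])     # different appearance, new person
again = gallery.identify([0.99, 0.05, 0.0])    # close to the first appearance
```

Here the third detection is matched back to the first person, which is the behaviour needed when the same person reappears in another camera.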

b)- Person Tagging


Person tagging is useful when you find someone suspicious and want to keep an eye on them for further investigation and surveillance. It tracks that specific person across the whole network of cameras and maintains a time log of their appearances.
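The time log could be organised along the following lines. The sketch assumes that an upstream re-identification step already supplies a person ID for each detection; the `TagLog` class, its methods, and the camera names are hypothetical and only illustrate the data structure.

```python
from collections import defaultdict

class TagLog:
    """Keeps a per-person list of (timestamp, camera_id) sightings for tagged people."""

    def __init__(self):
        self.tagged = set()
        self.sightings = defaultdict(list)  # person_id -> [(timestamp, camera_id)]

    def tag(self, person_id):
        """Mark a person as being under surveillance."""
        self.tagged.add(person_id)

    def record(self, person_id, camera_id, timestamp):
        """Store a sighting only for tagged people; return True if it was stored."""
        if person_id not in self.tagged:
            return False
        self.sightings[person_id].append((timestamp, camera_id))
        return True

    def timeline(self, person_id):
        """Return the person's sightings ordered by timestamp."""
        return sorted(self.sightings.get(person_id, []))

log = TagLog()
log.tag(7)
log.record(7, "cam_2", 30.0)
log.record(7, "cam_1", 10.0)
ignored = log.record(8, "cam_1", 12.0)  # person 8 was never tagged
```

Calling `log.timeline(7)` then returns the sightings of person 7 in chronological order, regardless of the order in which the cameras reported them.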

Virtual Cloth Fitting Try-on Network

This problem targets the online shopping trend and the photo shoots of models for magazines and new volumes of outlets. A customer can visualize himself in the desired clothes to gain confidence before buying. Given an input image of a person and a desired cloth, the system outputs an image of the person wearing the new cloth. It involves a person representation in the form of a body shape heat map, pose points and image parsing. Deforming the cloth according to the body pose yields a natural visualization.

a)- Person Representation

For a given image, a pose map with key points, the body shape, and the face and hair regions are extracted.
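A pose map of this kind is often encoded as one heat-map channel per key point. The sketch below assumes the key points have already been detected and renders each as a Gaussian bump; the Gaussian encoding, the `sigma` value and the key point names are assumptions for illustration, since the source does not specify the encoding.

```python
import math

def keypoint_heatmap(width, height, keypoint, sigma=2.0):
    """Return a height x width grid with a Gaussian bump centred on keypoint (x, y)."""
    kx, ky = keypoint
    return [[math.exp(-((x - kx) ** 2 + (y - ky) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]

def pose_map(width, height, keypoints, sigma=2.0):
    """Build one heat-map channel per named key point."""
    return {name: keypoint_heatmap(width, height, xy, sigma)
            for name, xy in keypoints.items()}

# Hypothetical key points on a tiny 8 x 6 grid, given as (x, y) pixel positions.
channels = pose_map(8, 6, {"nose": (4, 1), "left_shoulder": (2, 3)})
```

Each channel peaks at its key point and decays with distance, which gives a network a smooth spatial cue rather than a single hot pixel.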

Sketch to Image Generation

This project generates photo-realistic images from sketches. The input is a hand-drawn sketch, specifically the kind made by kindergarten students. The model outputs the realistic image that best fits the characteristics of the input sketch, such as angle, position and size. Conditional generative adversarial networks are used to address this problem.
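For reference, the standard conditional GAN objective can be written as below, with the sketch $s$ as the condition, $x$ the real image paired with it, $z$ a noise vector, $G$ the generator and $D$ the discriminator. The symbols are chosen for this illustration, and the source does not state which variant of the loss the project uses.

```latex
\min_G \max_D \;
\mathbb{E}_{s,x}\big[\log D(s, x)\big]
+ \mathbb{E}_{s,z}\big[\log\big(1 - D(s, G(s, z))\big)\big]
```

The discriminator sees the sketch together with either the real or the generated image, so it can penalize outputs that look realistic but do not match the sketch.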

Text to Image Generation

This project generates images from a description provided as text. The input text describes the content that is to be rendered as an image. The input text is restricted to a specific scope; for example, a model trained on birds can only take text that specifies the characteristics of the birds to be depicted. To accomplish this, we use techniques based on generative models.

Image to Video Generation

This project generates activity videos from a scene provided as an image. The image consists of one foreground and a background. To generate action videos, only the foreground undergoes changes. We are working on generating videos with generative adversarial networks from an input image that describes the environment setup and the subject.
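The statement that only the foreground changes can be illustrated with a masked composition of each frame. This is one common formulation and an assumption here, not a description of the project's model: a per-frame foreground and mask are taken as given (in practice a generator would produce them), and the static background is reused for every frame. The function names are hypothetical.

```python
def composite_frame(foreground, background, mask):
    """Blend one frame: mask 1.0 keeps the foreground pixel, mask 0.0 keeps the background."""
    return [[m * f + (1.0 - m) * b
             for f, b, m in zip(f_row, b_row, m_row)]
            for f_row, b_row, m_row in zip(foreground, background, mask)]

def composite_video(foreground_frames, masks, background):
    """Reuse the static background for every frame; only foreground and mask vary in time."""
    return [composite_frame(fg, background, m)
            for fg, m in zip(foreground_frames, masks)]

# Tiny 2 x 2 grayscale example with two frames.
background = [[0.2, 0.2], [0.2, 0.2]]
foreground_frames = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
masks = [
    [[1.0, 0.0], [0.0, 0.0]],   # subject at top-left in frame 0
    [[0.0, 1.0], [0.0, 0.0]],   # subject moves to top-right in frame 1
]
video = composite_video(foreground_frames, masks, background)
```

Pixels outside the mask are copied from the background unchanged, so the scene stays fixed while the subject moves.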

Automatic Invigilation

The system automatically invigilates an examination center and is used to implement smart classrooms. It works on the neck movement of the students present in the invigilation room. The neck movement is monitored, and on the basis of that movement the system decides whether a particular student is involved in cheating. The face ROI is also merged with the detected neck to establish the identity of the person. The system generates a timestamped statistical report, which shows how far a particular person's behavior tends towards cheating.
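The statistical report could be computed roughly as follows. The sketch assumes that neck movement has already been reduced to a per-frame head yaw angle in degrees for each identified student, and that a frame counts as suspicious when the absolute yaw exceeds a threshold. The yaw representation, the 30-degree default and the function name are assumptions for illustration only.

```python
def cheating_report(yaw_by_student, yaw_threshold_deg=30.0):
    """Map each student ID to the fraction of frames with |yaw| above the threshold."""
    report = {}
    for student_id, yaws in yaw_by_student.items():
        if not yaws:
            # No observations for this student, so nothing can be flagged.
            report[student_id] = 0.0
            continue
        suspicious = sum(1 for yaw in yaws if abs(yaw) > yaw_threshold_deg)
        report[student_id] = suspicious / len(yaws)
    return report

# Hypothetical per-frame yaw angles (degrees) for three students.
report = cheating_report({
    "s1": [0.0, 5.0, -3.0, 2.0],
    "s2": [45.0, -50.0, 10.0, 40.0],
    "s3": [],
})
```

A higher fraction means the student spent more of the session looking away, which an invigilator could then review against the recorded footage.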