Research Projects

Groundwater Monitoring / Water Analysis for Lahore, Pakistan

This project analyzes groundwater levels for the city of Lahore based on current groundwater conditions, which depend on the underlying geology and aquifer system. By examining past and current groundwater patterns, it also produces predictions for the future. The work includes designing and developing hydro databases. The core part is the design and development of automated ETL workflows with fully optimized queries, including automation of GIS-related business workflows. We also deployed telemetry modules for continuous logging and transmission of data, and deployed a server and set up all of its infrastructure.


Urdu has witnessed the development of significant applications such as email spam detection, genre identification, product review analysis, news categorization, fake news detection, text classification, and many more. Urdu is a linguistically rich and morphologically complex language; thus, state-of-the-art natural language processing libraries such as Gensim, spaCy, NLTK, and CoreNLP cannot process Urdu text at all.

SETMOKE API is a language processing toolkit based on machine learning and deep learning that provides a variety of modules to manage and process Urdu text. It may help researchers and practitioners develop smart applications.

Contribution of Frameworks

SETMOKE API provides the following modules:

1- Pre-processing
2- Urdu Text Classification
3- Urdu Text Summarization
4- Urdu Stemmer
5- Urdu Question Classification

Preprocessing enables better extraction of non-trivial knowledge from unstructured text. The preprocessing pipeline comprises tokenization, stemming, part-of-speech (POS) tagging, and named entity recognition. The SETMOKE API preprocessing module provides robust algorithms for these steps, which are considered indispensable for sentiment analysis and recommendation systems.

Pre-Processing Interface


Urdu Text Classification

Text classification plays an important role in the development of diverse applications such as email spam detection, gender identification, product review analysis, news categorization, and fake news detection. The SETMOKE API text classification module employs ten filter-based feature selection methods, two feature representation approaches, and two machine learning classifiers to classify Urdu text into one of the predefined categories.
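As an illustration of what a filter-based feature selection method does, the sketch below ranks terms by the chi-square statistic against a target class and keeps the top k. The source does not name its ten methods, so chi-square is only an assumed example, and the function names and toy documents are hypothetical.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class without the term
    n00: documents outside the class without the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_features(docs, labels, target, k):
    """Rank vocabulary terms by chi-square against `target`; return the top k."""
    vocab = {term for doc in docs for term in doc}
    n_pos = sum(1 for y in labels if y == target)
    n_neg = len(labels) - n_pos
    scores = {}
    for term in vocab:
        n11 = sum(1 for d, y in zip(docs, labels) if y == target and term in d)
        n10 = sum(1 for d, y in zip(docs, labels) if y != target and term in d)
        scores[term] = chi_square(n11, n10, n_pos - n11, n_neg - n10)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus with placeholder tokens (not taken from the DSL or CLE datasets).
docs = [{"match", "goal"}, {"goal", "team"}, {"market", "price"}, {"price", "bank"}]
labels = ["sports", "sports", "business", "business"]
top_terms = select_features(docs, labels, target="sports", k=2)
```

Chi-square is symmetric, so a term that reliably signals the absence of the class scores as high as one that signals its presence. Both kinds of term are useful to a classifier.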

Urdu Text Classification Methodology

Urdu Text Classification Interface

Urdu Text Summarization

Automatic text summarization is extensively used for widely spoken languages (e.g., English, Chinese) in order to generate precise and fluent summaries. The SETMOKE API text summarization module exploits five state-of-the-art extractive summarization methods to generate an effective summary of a single document.
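To make the idea of extractive summarization concrete, here is a minimal sketch that scores each sentence by the average corpus frequency of its words and keeps the highest-scoring sentences in their original order. It is not claimed to be one of the five methods in the module. The scoring rule, the function name, and the sentence terminators handled (including the Urdu full stop U+06D4 and question mark U+061F) are assumptions for illustration.

```python
import re
from collections import Counter


def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, kept in document order."""
    # Split on Latin terminators plus the Urdu full stop and question mark.
    parts = re.split(r"[.!?\u06D4\u061F]", text)
    sentences = [s.strip() for s in parts if s.strip()]
    # Word frequencies over the whole document.
    freq = Counter(word for s in sentences for word in s.split())

    def score(sentence):
        words = sentence.split()
        return sum(freq[w] for w in words) / len(words)

    ranked = sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))
    keep = sorted(ranked[:n_sentences])  # restore original order
    return [sentences[i] for i in keep]


# Placeholder tokens stand in for real words.
summary = summarize("w1 w2 w3. w1 w2. w4 w5 w6.", n_sentences=2)
```

Averaging by sentence length keeps long sentences from winning purely because they contain more words.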

Urdu Text Summarization Methodology

Urdu Text Summarization Interface

Urdu Stemmer

Stemming plays a vital role in alleviating data sparsity problems by converting inflected forms of words to their base forms, thus reducing the dimensionality of the data to a great extent. The SETMOKE Urdu stemmer works as follows:

Urdu Question Classification

Question classification refers to the process of classifying questions into predefined categories. It plays an effective role in the performance of information retrieval: the search span of a search engine can be reduced to a great extent, because the engine has to search for the answer to the query only in a certain domain and context. The SETMOKE question classification module classifies questions into three, six, and seven classes based on difficulty, general properties, and subjectivity, respectively.

Methodology of Urdu Question Classification

Urdu Question Classification Interface

Named Entity Recognition

Named entity recognition plays a vital role in the development of numerous applications based on speech recognition, information retrieval, and machine translation. The SETMOKE API named entity recognition module can detect person names, organizations, locations, dates, times, numbers, and designations in Urdu text.

Methodology of NER

NER Interface

Information Retrieval

Information retrieval (IR) refers to the process of finding and acquiring specific data or documents from large collections in response to a particular user query. IR has revolutionized search engines by providing robust methodologies to extract the most relevant documents or information from unstructured text. The SETMOKE API IR module can index, store, and query documents based on relevance. It implements several similarity measures such as BM25, TF-IDF, Frequency, Dfree, and PL2.
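Of the similarity measures listed, BM25 is the easiest to show in a few lines. The sketch below scores tokenized documents against a query. It assumes the common defaults k1 = 1.2 and b = 0.75 and the non-negative IDF variant, neither of which is confirmed by the source, and the function name and toy documents are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        total = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            total += idf * tf[term] * (k1 + 1) / norm
        scores.append(total)
    return scores


# Placeholder documents; a real index would hold tokenized Urdu text.
docs = [
    ["wheat", "crop", "price"],
    ["cricket", "match", "score"],
    ["wheat", "wheat", "harvest"],
]
scores = bm25_scores(["wheat"], docs)
# The third document mentions the query term twice, so it ranks first;
# the second document does not contain it and scores zero.
```

The term-frequency component saturates: a second occurrence of a term adds less than the first, which is what separates BM25 from a raw frequency count.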

Methodology of IR

Information Retrieval Interface

Urdu Sentiment Analysis

Sentiment analysis, or opinion mining, is about identifying people's perceptions of a certain organization, person, place, product, or service. Customers' perceptions and feedback are usually acquired through focus groups, surveys, observation, and other labor-intensive methods. SETMOKE sentiment analysis employs deep learning models and an existing Urdu sentiment dictionary comprising 4000 positive and 2000 negative expressions to classify sentiments. We also extended the Urdu sentiment dictionary with 4000 neutral expressions in order to better classify sentiments expressed in Nastaleeq Urdu.
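The dictionary side of such a system can be sketched as a simple lookup: count positive and negative dictionary hits in a sentence and compare. This shows only the lexicon idea, not the deep learning models, and how SETMOKE combines the two is not described in the source. The four dictionary entries and the function name are hypothetical stand-ins for the real sentiment dictionary.

```python
# Hypothetical miniature lexicon; the real dictionary holds thousands of entries.
POSITIVE = {"اچھا", "خوش"}
NEGATIVE = {"برا", "خراب"}


def lexicon_sentiment(sentence):
    """Classify a whitespace-tokenized sentence by counting lexicon hits."""
    tokens = sentence.split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


label = lexicon_sentiment("یہ کھانا اچھا ہے")
```

A pure lookup like this ignores negation and word order, which is one reason a learned model is normally layered on top of the dictionary.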

Methodology of Urdu Sentiment Analysis

Urdu Sentiment Analysis Interface

Datasets and Results (As per Need)

Contribution of Datasets

Urdu Stemmer

This dataset has 4162 base words and 9743 words with possible variations of the base words. Accuracy is 97%.

Urdu Text Classification Statistics of DSL and CLE datasets

Results of Urdu Text Classification

1- DSL Dataset Results

Results of Text Classification

CLE Dataset Results

2- Urdu Text Summarization Dataset and Statistics

3- Results of Urdu Text Summarization

4- Urdu Question Classification

Subjectivity Based E-Learning Dataset Classes and Number of Questions

Biology: 274
Chemistry: 52
Computer: 166
Education: 124
Environment: 13
Pakistan Studies: 39
Physics: 137

Algorithm: CNN-based Model
Accuracy: N/A

Difficulty Based E-Learning Dataset Classes and Number of Questions

Hard: 274
Medium: 52
Easy: 166

General Urdu Question Classification Dataset

Description: 801
Entity: 1004
Abbreviation: 35
Number: 408
Location: 193
Other: 76

Urdu Named Entity Recognition (Available Tags)


Algorithm : Bidirectional-LSTM

Accuracy : 93.35%

Dataset Statistics

Total Documents: 633
Total Sentences: 3232
Total Words: 109816
Total unique words : 12527
Words with no Tags: 88.75%
Location tags: 1.92%
Person tags: 3.44%
Time tags: 0.36%
Organization tags: 1.48%
Number tags: 2.09%
Designation tags: 0.66%
DATE tags: 1.3%

Nastaliq Urdu Sentiment Classes and Training Words

Positive: 2633 Words

Negative: 4754 Words
Neutral: 2000 Words
Total 109 sentences for testing
Positive: 51 Sentences
Negative: 48 Sentences
Neutral: 10 Sentences
Accuracy : 97%

Roman Urdu Sentiment Analysis

Total Documents: 12099
Training: 8711
Validation : 2178
Testing : 1210
Accuracy : 71.4%
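The split sizes above are consistent with holding out 10% of the documents for testing and then 20% of the remainder for validation. The source does not state how the split was made, so these fractions are an assumption that happens to reproduce the reported counts. The function name is hypothetical.

```python
def split_counts(total, test_frac=0.10, val_frac=0.20):
    """Return (train, validation, test) sizes for a two-stage hold-out split."""
    test = round(total * test_frac)       # hold out the test set first
    remaining = total - test
    val = round(remaining * val_frac)     # then carve validation from the rest
    train = remaining - val
    return train, val, test


train, val, test = split_counts(12099)
```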

Urdu Information Retrieval

We used 500 documents from different classes such as agriculture, business, entertainment, news, and sports. We used 30 queries relevant to the document classes to build a gold standard dataset.

Applications:

1- SETMOKE Spam Complaints Filtering System

2- Criminal Case Log Aggregation & Summarization

3- Ontology based Urdu Car Advertisement Search Engine

Tumor segmentation in Liver

Segmenting the liver and its lesions in medical images enables oncologists to diagnose liver cancer correctly and to evaluate a patient's response to therapy. A fully automatic method to segment the liver and locate its unhealthy tissue is a useful instrument for diagnosing hepatic illnesses and evaluating their response to treatment. We aim to propose an automatic method to segment the liver and its lesions in Computed Tomography (CT) scans using Convolutional Neural Networks (CNNs).

Tumor segmentation in Liver


Spine Curvature Estimation

Adolescent Idiopathic Scoliosis (AIS) presents in adolescents as an abnormal curvature of the spine. Precise automated assessment of the Cobb angle, which quantitatively evaluates scoliosis, plays a significant part in the diagnosis and therapy of scoliosis. Inspired by the architecture and popularity of Convolutional Neural Networks (CNNs) in the field of deep learning, we aim to propose a novel automated method to find the Cobb angles for spine curvature estimation that will help in the assessment of scoliosis.
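Once the endplates of the two most tilted vertebrae have been located, the Cobb angle itself is plain geometry: it is the angle between the two endplate lines. The sketch below covers only that final step and assumes the landmark coordinates are already available, for example from a CNN. The function names and coordinates are hypothetical, and the result is folded into the 0 to 90 degree range.

```python
import math


def endplate_tilt(p_left, p_right):
    """Tilt in degrees of an endplate line through two (x, y) landmarks."""
    dx = p_right[0] - p_left[0]
    dy = p_right[1] - p_left[1]
    return math.degrees(math.atan2(dy, dx))


def cobb_angle(upper_endplate, lower_endplate):
    """Angle in degrees between the upper and lower endplate lines."""
    a = endplate_tilt(*upper_endplate)
    b = endplate_tilt(*lower_endplate)
    angle = abs(a - b) % 180
    # Lines have no direction, so report the smaller of the two angles.
    return min(angle, 180 - angle)


# Hypothetical pixel coordinates for the two end vertebrae.
upper = ((100, 200), (160, 188))
lower = ((105, 420), (165, 438))
angle = cobb_angle(upper, lower)
```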

Spine Curvature Estimation

Pneumonia Identification

Globally, pneumonia accounts for over 15% of deaths among children under age 5. We trained a deep learning model for the identification and localization of pneumonia in chest X-ray (CXR) images. Our identification model is based on Mask R-CNN, a deep neural network approach that incorporates local and global features for pixel-wise segmentation.

Pneumonia Identification

Segmentation Tasks

Rapid developments in the field of medical imaging are radically changing medicine. Determining the existence or severity of a disease affects a patient's clinical care or outcome status in research. During radiotherapy planning, accurate segmentation of medical images is a main step in contouring. We performed the following segmentation tasks in this regard:

  • Lung Segmentation

Correctly identifying the lung boundaries is one of the significant steps in automatic assessment of chest X-ray images. We extracted the lung boundary and performed lung segmentation in chest X-rays using rule-based methods such as adaptive thresholding.

  • Brain Tumor Segmentation

For brain tumor segmentation, we devised different rule-based image processing approaches, including adaptive thresholding and K-Means clustering. The thresholding was performed on Magnetic Resonance Imaging (MRI) scans.
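The two rule-based techniques named in the tasks above can be sketched in pure Python on a tiny grayscale image. Adaptive thresholding compares each pixel with the mean of its local window, and K-Means groups pixel intensities into clusters. The window size, the evenly spaced initial centers, and the function names are assumptions for illustration; a real pipeline would use an image processing library on full-size scans.

```python
def adaptive_threshold(image, window=3, offset=0):
    """Mark a pixel 1 if it is brighter than its local mean minus `offset`."""
    h, w = len(image), len(image[0])
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clip the window at the image borders.
            rows = range(max(0, y - r), min(h, y + r + 1))
            cols = range(max(0, x - r), min(w, x + r + 1))
            vals = [image[j][i] for j in rows for i in cols]
            local_mean = sum(vals) / len(vals)
            out[y][x] = 1 if image[y][x] > local_mean - offset else 0
    return out


def kmeans_1d(values, k=2, iters=20):
    """Cluster intensities with k-means (k >= 2); return the cluster centers."""
    lo, hi = min(values), max(values)
    # Deterministic start: centers evenly spaced over the intensity range.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers


# A bright spot on a dark background, standing in for a lesion.
image = [[10, 10, 10],
         [10, 200, 10],
         [10, 10, 10]]
mask = adaptive_threshold(image)
centers = kmeans_1d([10, 12, 11, 200, 210, 205], k=2)
```

Because the threshold follows the local mean, the same rule can cope with uneven brightness across a scan, where a single global threshold would not.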


Segmentation Tasks

Police Training System Using Virtual Reality

This system trains police officers in how to engage in dangerous situations, and lets us monitor their motion and responses. It consists of two modules: virtual reality (VR) and supportive hardware. Using VR, we can place officers in different environments with different tasks. The difficulty of the tasks can also be controlled to help them learn gradually. The hardware module ensures that the environment and the situation feel authentic: if an officer is hit by something during a mission, the supportive hardware delivers a small jolt. The system is currently under development.

Police Training System Using Virtual Reality

Scoliosis Detection

This application detects the possible presence of scoliosis (a condition that causes abnormal curvature of the spine) in a subject based on their standing and/or walking posture. Depth sensor technology is used to detect a person's body frame and calculate angles between the shoulders, neck, and spinal column. These angles are then used to decide the presence or absence of scoliosis using machine learning.
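The angle features can be computed directly from the 3D joint positions that a depth sensor reports. The sketch below measures the angle at one joint formed by two neighbouring joints. The joint names, coordinates, and function name are hypothetical, and which angles the application actually feeds to its classifier is not stated in the source.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at joint b between segments b->a and b->c (3D points)."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_angle))


# Hypothetical joint positions in metres (x right, y up, z towards the sensor).
left_shoulder = (-0.20, 1.40, 2.00)
neck = (0.00, 1.45, 2.00)
spine_base = (0.02, 0.90, 2.00)
shoulder_spine_angle = joint_angle(left_shoulder, neck, spine_base)
```

Angles like this one, computed per frame, would form the feature vector passed to the machine learning model.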

Self-Learning Gait Recognition for Security Installations and Access Control Systems

After facial recognition, gait recognition for identifying a person is being explored and implemented in some places. We have developed a hybrid system combining these two technologies that is also capable of learning and improving its accuracy on the go. It can be deployed in any existing surveillance or access control system.