SMS SPAM DETECTION USING DEEP LEARNING.

 


 

PROBLEM STATEMENT:

The growth in the number of mobile phone users has led to a dramatic increase in spam SMS. Although SMS mobile messaging is considered clean in most parts of the world, the problem of spam messaging is increasing dramatically year by year in the Middle East and Asia. Many people fall for spam messages and provide personal as well as financial information, which can lead to financial loss through online fraud. It has therefore become important to analyze the content of mobile messages and classify each SMS as spam or not. In this report we analyze the data and propose a model that classifies an SMS as either spam or ham.

 



Fig. 1: Visualization of model working. 

INTRODUCTION:

SMS (Short Message Service), commonly referred to as text messaging, is the service of sending short messages of around 160 characters to electronic devices that have a unique cell phone number. SMS is used as an alternative to email or voice calls where a voice call is not possible or internet access is lacking. It is one of the most used services and generates millions of dollars for mobile operators. One report indicates that billions of SMS are sent across different devices in a single day, which has grown operator revenue over the years. According to a report by Portio, the turnover in 2010 was 179 billion, which rose to 200 billion in 2011 and has crossed 250 billion since September 2014. Nowadays operators offer "unlimited" messaging plans, which have a high limit of nearly a hundred SMS per day.


SPAM MESSAGES:

Spam messages are undesirable and unwelcome messages. They are sent by spammers for various malicious purposes, such as getting hold of personal data or tricking recipients into subscriptions or payment gateways in order to commit financial fraud. They also pose a threat to one's integrity.

 

WORKING ARCHITECTURE OF SMS:


Fig. 2: Working Architecture of SMS.

ROOT CAUSE:

Today we use phones for everyday communication, and mobile applications and financial transactions increasingly rely on mobile phones. As a result, people become targets of fraud and are drawn into it.

So, we will use machine learning algorithms to analyze the message content and warn the user while reading, or when the user opens, the message. Such algorithms work by constructing a model from inputs and using that information to make predictions or decisions.

 

PROJECT DEVELOPMENT STRATEGIES:

 

Classification is a primary subject in machine learning and data mining. When performing classification, the main aim of the algorithm is to assign inputs to the desired labels, here spam or ham. In machine learning there are different classifiers, which can be chosen on the basis of the following characteristics:

1. computational cost

2. expected output

 

CLASSIFIERS THAT CAN BE USED FOR CLASSIFICATION:

  1. Naïve Bayes:

Naïve Bayes is one of the most efficient and effective inductive learning approaches for classification in machine learning. Its reasonable performance is surprising, because the independence assumption it is based upon seldom holds in real-world applications.

  2. Support Vector Machine (SVM):

The Support Vector Machine (SVM) attains a significant improvement over the Naïve Bayes approach and performs robustly across a variety of tasks.

  3. Natural Language Processing (NLP):

Natural Language Processing (NLP) is a subfield of computer science and artificial intelligence concerned with the interaction between computers and human language. NLP techniques are well suited to classifying human-language text and can help build a model with high accuracy.
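To make the first classifier concrete, the following is a minimal sketch of a multinomial Naïve Bayes classifier with Laplace (add-one) smoothing, written with the Python standard library only. The toy messages, the function names and the whitespace tokenization are assumptions made for illustration; they are not taken from the project's actual code or data.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(messages):
    """Count classes and per-class words from (label, text) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for label, text in messages:
        words = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary

def predict(text, class_counts, word_counts, vocabulary):
    """Return the label with the highest log-probability score."""
    total_messages = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Log prior: how common this class is in the training data.
        score = math.log(class_counts[label] / total_messages)
        words_in_class = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Laplace smoothing keeps unseen (word, class) pairs from scoring zero.
            numerator = word_counts[label][word] + 1
            denominator = words_in_class + len(vocabulary)
            score += math.log(numerator / denominator)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy training data, for illustration only.
toy_messages = [
    ("spam", "win a free prize now"),
    ("spam", "free entry claim your prize"),
    ("ham", "are we meeting for lunch today"),
    ("ham", "call me when you are free"),
]
model = train_naive_bayes(toy_messages)
prediction = predict("claim your free prize", *model)
print(prediction)
```

The "naïve" part is visible in the inner loop: each word contributes to the score independently of every other word in the message.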

 

STEPS FOR CLASSIFYING USING THE DEEP LEARNING:

i. Load and explore the spam data

ii. Prepare the train and test data

iii. Train the spam detection model using the deep learning approach

iv. Evaluate the model

v. Use the final trained classifier to classify new messages

 

Fig. 3: Pre-Process steps involved.

STEP 1: LOAD DATA AND EXPLORE THE DATA:

Download a dataset specific to the target country. We can use sources such as the UCI repository or Kaggle to obtain data for training our model. Once the data is obtained, load the dataset to start the modelling, and assign the labels ham or spam so that the model can learn accordingly.
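As a rough sketch of the loading step, the snippet below reads labelled messages and counts how many fall under each label. It assumes a tab-separated layout with the label first and the message second; that layout, the sample rows and the function name `load_messages` are assumptions for illustration, so adjust them to the dataset actually downloaded.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in "label<TAB>message" form. In practice, open the
# downloaded file instead of this in-memory string.
SAMPLE = (
    "ham\tAre we still meeting for lunch today?\n"
    "spam\tCongratulations! You have won a free prize. Reply WIN to claim.\n"
    "ham\tCall me when you get home.\n"
)

def load_messages(handle):
    """Read (label, text) pairs from a tab-separated file object."""
    # QUOTE_NONE stops quote characters inside messages from being
    # interpreted as CSV quoting.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(row[0], row[1]) for row in reader if len(row) >= 2]

messages = load_messages(io.StringIO(SAMPLE))
label_counts = Counter(label for label, _ in messages)
print(label_counts)
```

Checking the label counts early is useful because spam datasets are often imbalanced, with far more ham than spam.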

 

STEP 2 A: VISUALIZATION:

To better understand the data, we need to visualize it. To visualize text data, we can create a word cloud for the spam and ham messages, which highlights the words that occur most frequently in each class.
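A word cloud sizes each word by how often it occurs, so the underlying computation is a word-frequency count per class. The sketch below shows only that counting step with the standard library; the sample texts, the small stop-word list and the function name `top_words` are assumptions for illustration, and rendering the cloud itself is left to a plotting library.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real lists are much longer.
STOP_WORDS = frozenset({"a", "the", "to", "you", "your", "for", "is", "are", "we", "me"})

def top_words(texts, n=3, stop_words=STOP_WORDS):
    """Return the n most frequent non-stop-words across the given texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts.most_common(n)

# Hypothetical sample messages, for illustration only.
spam_texts = ["Win a free prize now", "Free entry, claim your free prize"]
ham_texts = ["Are we meeting for lunch today", "Lunch is on me today"]

spam_top = top_words(spam_texts, n=2)
ham_top = top_words(ham_texts, n=2)
print("spam:", spam_top)
print("ham:", ham_top)
```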

 

Fig. 4: Visualization of data using wordcloud.

STEP 2 B: PRE-PROCESS, TEST AND TRAIN DATA:

The text must be pre-processed before it can be used for model training. Pre-processing includes tokenization, sequencing and padding.

i. Tokenization:

As deep learning models do not understand text, we need to convert the text into a numerical representation. To do this we tokenize the text using the Tokenizer API from TensorFlow Keras, which splits the text into words and converts the words into integers.

ii. Padding:

After tokenization, each sentence is represented by a sequence of numbers. We then pad the sequences so that every sequence has the same length.

Example:

Suppose sentence one has 25 words and sentence two has 52 words. With a maximum length of 50, sentence one is padded with 25 zeros and sentence two is truncated by 2 tokens, so that both sequences have a length of 50.
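The tokenization and padding steps above can be sketched in plain Python as follows. This is an illustrative stand-in, not the Keras implementation: the function names are hypothetical, words are numbered in order of first appearance, and padding and truncation are applied at the end of each sequence, all of which are choices made for this sketch.

```python
def fit_word_index(texts):
    """Give each distinct word an integer, starting at 1 (0 is kept for padding)."""
    word_index = {}
    for text in texts:
        for word in text.lower().split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1
    return word_index

def texts_to_sequences(texts, word_index):
    """Replace each known word with its integer; unknown words are dropped."""
    return [
        [word_index[w] for w in text.lower().split() if w in word_index]
        for text in texts
    ]

def pad_to_length(sequences, max_len):
    """Truncate long sequences and right-pad short ones with zeros."""
    return [seq[:max_len] + [0] * (max_len - len(seq)) for seq in sequences]

# Hypothetical sample texts, for illustration only.
texts = ["free prize now", "call me now please"]
word_index = fit_word_index(texts)
sequences = texts_to_sequences(texts, word_index)
print(sequences)

# The 25-word and 52-word example, with stand-in token ids.
short_sequence = list(range(1, 26))   # 25 tokens
long_sequence = list(range(1, 53))    # 52 tokens
padded = pad_to_length([short_sequence, long_sequence], max_len=50)
print([len(seq) for seq in padded])
```

Reserving 0 for padding lets the model distinguish filler positions from real words.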

 

STEP 3: TRAINING OUR MODEL:

We train our model using a dense sequential model from Keras. We use the Adam optimizer, which helps the model avoid getting stuck in local minima and losing accuracy; categorical cross-entropy to calculate the loss; ReLU as the activation function; and a dropout layer to avoid overfitting.
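To show what two of these building blocks compute, here is a small pure-Python sketch of ReLU, softmax and categorical cross-entropy for a single message with the two classes ham and spam. It is an illustration of the formulas only, not the Keras training code; the input numbers and function names are made up for the example, and the softmax output layer is an assumption, since the text does not name the output activation.

```python
import math

def relu(values):
    """ReLU keeps positive values and replaces negative ones with zero."""
    return [max(0.0, v) for v in values]

def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(v - largest) for v in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(one_hot_target, probabilities):
    """Negative log of the probability assigned to the true class."""
    return -sum(
        t * math.log(p)
        for t, p in zip(one_hot_target, probabilities)
        if t > 0
    )

# Hypothetical hidden-layer pre-activations and output scores.
hidden = relu([-1.5, 0.0, 2.0])
probs = softmax([2.0, 0.0])  # scores ordered as [ham, spam]

loss_if_ham = categorical_cross_entropy([1, 0], probs)   # true label is ham
loss_if_spam = categorical_cross_entropy([0, 1], probs)  # true label is spam
print(hidden, probs, loss_if_ham, loss_if_spam)
```

The loss is small when the model puts high probability on the true class and large otherwise, which is the signal the optimizer uses to adjust the weights.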

 

STEP 4: EVALUATE THE MODEL:

Now that we have compiled and trained our model, we need to evaluate it. We calculate the loss using categorical cross-entropy. If the loss is high, we can increase the number of epochs to gain more accuracy.
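Alongside the loss, it helps to compare predicted labels with true labels on held-out messages. The sketch below computes accuracy, plus precision and recall for the spam class, since accuracy alone can look good when ham greatly outnumbers spam. The label lists and the function name `evaluate` are hypothetical and chosen only for illustration.

```python
def evaluate(true_labels, predicted_labels, positive="spam"):
    """Compute accuracy, and precision and recall for the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Of the messages flagged as spam, how many really were spam.
        "precision": true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0,
        # Of the real spam messages, how many were caught.
        "recall": true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0,
    }

# Hypothetical labels for five held-out messages.
true_labels = ["spam", "ham", "ham", "spam", "ham"]
predicted_labels = ["spam", "ham", "spam", "ham", "ham"]

metrics = evaluate(true_labels, predicted_labels)
print(metrics)
```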

 

STEP 5: DEPLOY THE MODEL INTO PRODUCTION:

Take the final trained model, integrate it into the system, and check the model's accuracy on unseen data.

 

BENEFITS:

  1. Helps maintain user integrity by preventing users from handing over their data for malicious purposes
  2. Protects against viruses, as spam messages may contain malicious links which, when opened, let a virus enter the device
  3. Keeps hackers at bay
  4. Saves time
  5. Helps prevent one from becoming a victim of online fraud

 

CONCLUSION:

We are now living in the age of information and technology, and people are moving away from traditional ways of communicating. As a result, we face a loss of information integrity through spam messages. We have proposed a system that can detect spam messages beforehand and save users from becoming victims.
