Sunday, November 8, 2009

How to Use Bayes' Theorem in Software Testing

Software testing is essentially an exercise in continuous exploration, discovery, and questioning. This exercise becomes very interesting, and at times challenging, when the application under test is as complex as a maps application. You have probably used applications such as Google Maps and Yahoo Maps; their main purpose is to help users find a route. As input, the user gives a source and a destination, and based on this information the application gives directions from origin to destination. From this description you might think that the application is simple, but it poses many challenges.

As a tester, you need to identify the relevant queries and assess the quality of the results the system produces for them.

During beta testing of the application, we collected thousands of queries and input data that were used by end users. To give an idea of how much data we had, there were over 8,000 queries for each city, for example Mumbai Hotels, Mumbai Escorts, Mumbai Taj, and so on. Finding the relevant data among these queries is very difficult and time-consuming.

This data can be analyzed for relevant queries in two different ways: either people analyze it by hand, or we use artificial intelligence and write a clever tool. Given that human resources are too expensive to get :) we decided to develop a tool to classify the input data.

After looking at the various possible solutions, we decided to use a Bayesian classifier. For people who are interested in learning more about Bayes' theorem, this is what Wikipedia says about it.

Bayes' theorem (also known as Bayes' rule or Bayes' law) is a result in probability theory which relates the conditional and marginal probability distributions of random variables. In some interpretations of probability, Bayes' theorem tells how to update or revise beliefs in light of new evidence a posteriori.

The probability of event A conditional on another event B is generally different from the probability of B conditional on A. However, there is a definite relationship between the two, and Bayes' theorem is a statement of that relationship.
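
As a small illustration of that relationship, the sketch below applies Bayes' theorem, P(A|B) = P(B|A) * P(A) / P(B), to a made-up search-query example. The method name and every probability in it are assumptions chosen for illustration, not figures from our beta data.

```ruby
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# A = "the query is good", B = "the query contains the word 'hotel'".
# All probabilities below are invented numbers for illustration only.

def posterior(likelihood, prior, evidence)
  likelihood * prior / evidence
end

p_good             = 0.6   # P(A): assumed share of queries that are good
p_hotel_given_good = 0.3   # P(B|A): assumed share of good queries with "hotel"
p_hotel_given_bad  = 0.05  # P(B|not A): assumed share of bad queries with "hotel"

# P(B), obtained with the law of total probability
p_hotel = p_hotel_given_good * p_good + p_hotel_given_bad * (1 - p_good)

p_good_given_hotel = posterior(p_hotel_given_good, p_good, p_hotel)
puts format("P(good | 'hotel') = %.3f", p_good_given_hotel)
```

With these invented numbers, P('hotel' | good) is only 0.3, yet P(good | 'hotel') comes out at about 0.9, which shows how different the two conditional probabilities can be.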

The use of classifiers based on Bayes' theorem is well known in the filtering of spam email. Spam filters generally have a large dataset of good and spam email. They work on the probability that certain words appear in spam messages rather than in normal mail. A spam filtering system also learns from its users every time a user clicks the Report spam or Not spam button.

So we wrote our own tool based on Bayes' theorem, with the ability to learn which data is good and which is bad. The tool learns to classify data based on how it is trained. In simple terms, the input to the tool is a definition of what is good, a definition of what is bad, and the sample data. Based on this, it sorts the data into good or bad, as simple as that.

Normally, to classify a set of text, we must teach the tool what is good and what is bad. During training, the classifier tracks how often words occur in each category, good or bad.
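
To make the idea of tracking word counts per category concrete, here is a minimal naive Bayes sketch written from scratch in Ruby. It is not the code of any existing library: the class name TinyBayes, its methods, the add-one smoothing, and the training phrases are all assumptions made for this illustration.

```ruby
# A tiny naive Bayes text classifier, for illustration only.
# TinyBayes and everything in it are hypothetical names.
class TinyBayes
  def initialize(*categories)
    # @word_counts[category][word] = times the word was seen in that category
    @word_counts = categories.to_h { |c| [c, Hash.new(0)] }
    # @doc_counts[category] = number of training texts in that category
    @doc_counts  = categories.to_h { |c| [c, 0] }
  end

  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each { |word| @word_counts[category][word] += 1 }
  end

  # Returns the category with the highest score for the given text.
  def classify(text)
    words = tokenize(text)
    @word_counts.keys.max_by { |category| score(category, words) }
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end

  # log P(category) + sum of log P(word | category), with add-one smoothing
  # so that unseen words do not produce a probability of zero.
  def score(category, words)
    counts     = @word_counts[category]
    total      = counts.values.sum
    vocabulary = @word_counts.values.flat_map(&:keys).uniq.size
    prior = Math.log((@doc_counts[category] + 1.0) /
                     (@doc_counts.values.sum + @doc_counts.size))
    words.sum(prior) do |word|
      Math.log((counts[word] + 1.0) / (total + vocabulary + 1))
    end
  end
end

bayes = TinyBayes.new(:good, :bad)
bayes.train(:good, "mumbai hotels")
bayes.train(:good, "mumbai taj hotel")
bayes.train(:bad,  "asdf qwerty")
bayes.train(:bad,  "free lottery winner")
puts bayes.classify("hotels near taj")
```

Summing logarithms rather than multiplying raw probabilities is a common choice because it avoids numeric underflow on longer texts.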

Implementation

This tool was written in Ruby, since Lucas Carlson's classifier library is available as a gem. This library provides a naive Bayesian classifier. More information about it can be found in the gem's documentation.

In our application, the following code reads three files:

* good.yml

* not_good.yml

* Input File
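
The contents of the two YAML files are not shown in this post. Because the tool loops over each loaded file and trains on every entry, one plausible layout, assumed here, is a plain YAML list of sample queries. The helper name load_samples and the sample entries are invented for this sketch.

```ruby
require 'yaml'

# Hypothetical helper: parse YAML text into a Ruby Array of training entries.
def load_samples(yaml_text)
  YAML.load(yaml_text)
end

# Assumed contents of good.yml: a YAML sequence of example queries.
good_yaml = <<~DOC
  - mumbai hotels
  - mumbai taj
  - restaurants near gateway of india
DOC

# A YAML sequence of plain strings loads as an Array of Strings,
# which is the shape an `each` loop over training entries expects.
good = load_samples(good_yaml)
good.each { |entry| puts "train as good: #{entry}" }
```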

To run the tool, we must give two arguments on the command line: the city name and the input file name. Then, on the basis of the definitions of good and bad, it creates a directory named after the city and puts good.txt and nogood.txt in that directory, containing the data classified as good or bad.

    require 'yaml'
    require 'stemmer'
    require 'classifier'

    if ARGV.empty?
      puts "*** You must provide the city name and the input file name to the script ***"
    elsif ARGV[1]
      puts "I am searching for the city #{ARGV[0]}"
      puts "The input file is #{ARGV[1]}"

      input_file = ARGV[1].to_s.downcase
      pwd = Dir.getwd
      city = ARGV[0].to_s.downcase
      Dir.mkdir(city) unless Dir.exist?(city)

      # Load the training data
      good = YAML.load_file('good.yml')
      not_good = YAML.load_file('not_good.yml')

      data = File.open(input_file, "r")
      goody = File.open(File.join(pwd, city, "good.txt"), "a")
      nogood = File.open(File.join(pwd, city, "nogood.txt"), "a")

      classifier = Classifier::Bayes.new('good', 'not good')

      # Train the classifier
      not_good.each { |entry| classifier.train_not_good(entry) }
      good.each { |entry| classifier.train_good(entry) }

      # Classify each input line and append it to the matching output file
      while (line = data.gets)
        if classifier.classify(line) == "Good"
          goody.write(line)
        else
          nogood.write(line)
        end
      end

      [data, goody, nogood].each(&:close)
    else
      puts "*** The second argument, the input file name, is required ***"
    end

Quality of results

The quality of the result depends on the amount of training we have given to the classifier. It is a kind of learning system, so the better the training, the better the results. The main advantage of this approach is the reduction in the human effort needed to sort the data. There are many other applications where human intervention is needed to decide what is good and what is bad, and a properly trained classifier like this one can be useful in those situations too.
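
One way to put a number on that quality, sketched below, is to keep a small hand-labelled sample out of the training data and measure how often the classifier agrees with it. The accuracy method, the stand-in classifier, and the sample data are all assumptions for this illustration; in practice the callable would wrap the trained classifier.

```ruby
# Fraction of hand-labelled samples that a classifier labels correctly.
# `classify` is any callable taking a String and returning a label;
# `labelled` is an Array of [text, expected_label] pairs.
def accuracy(classify, labelled)
  return 0.0 if labelled.empty?

  correct = labelled.count { |text, expected| classify.call(text) == expected }
  correct.to_f / labelled.size
end

# Stand-in classifier for the demo: anything mentioning "hotel" is "Good".
toy_classifier = ->(text) { text.include?("hotel") ? "Good" : "Not good" }

# Invented held-out sample, labelled by hand.
held_out = [
  ["mumbai hotels", "Good"],
  ["asdf qwerty",   "Not good"],
  ["mumbai taj",    "Good"] # the toy classifier gets this one wrong
]

puts accuracy(toy_classifier, held_out)
```

Re-running such a check after adding more training entries gives a rough indication of whether the extra training is actually helping.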

We hope you found this article interesting and that you will be able to use this approach whenever you need to classify data for your application.
