Data Frames and Transactions

Transactions are a very useful data structure in data mining: they provide a way to mine frequent itemsets or association rules from a dataset.

In R, the arules package mines itemsets and rules from data stored as a transactions object. If the data is only available in a data.frame, the researcher can coerce the data frame to transactions with as(). This example uses the “Adult” dataset available in the arules package, which originates from the “Census Income” database. These data, AdultUCI, can be coerced to transactions using the following commands:


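library("arules")

data("AdultUCI")

# as() only accepts logical or factor columns, so keep just the factor columns
# (numeric columns such as age and hours-per-week must be discretized to be kept)
AdultFactors = AdultUCI[, sapply(AdultUCI, is.factor)]

Adult = as(AdultFactors, "transactions")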
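Coercion with as() only works on logical or factor columns, so the numeric columns of AdultUCI (age, fnlwgt, education-num, capital-gain, capital-loss and hours-per-week) have to be dropped or discretized before conversion. A minimal sketch of the discretizing route is below, loosely following the preparation described in the arules help page for the Adult data. The cut points and labels are judgment calls, not fixed rules, and bin_money() is a hypothetical helper introduced only for this example.

```r
library("arules")
data("AdultUCI")

# Drop numeric columns that have no natural grouping for rule mining
AdultUCI[["fnlwgt"]] <- NULL
AdultUCI[["education-num"]] <- NULL

# Bin age and weekly hours into ordered factors (cut points are an assumption)
AdultUCI[["age"]] <- ordered(cut(AdultUCI[["age"]], c(15, 25, 45, 65, 100)),
  labels = c("Young", "Middle-aged", "Senior", "Old"))
AdultUCI[["hours-per-week"]] <- ordered(
  cut(AdultUCI[["hours-per-week"]], c(0, 25, 40, 60, 168)),
  labels = c("Part-time", "Full-time", "Over-time", "Workaholic"))

# Hypothetical helper: zero -> "None", otherwise split at the median of positive values
bin_money <- function(x) {
  cut_point <- median(x[x > 0])
  ordered(cut(x, c(-Inf, 0, cut_point, Inf)), labels = c("None", "Low", "High"))
}
AdultUCI[["capital-gain"]] <- bin_money(AdultUCI[["capital-gain"]])
AdultUCI[["capital-loss"]] <- bin_money(AdultUCI[["capital-loss"]])

# Every column is now a factor, so the coercion succeeds
Adult <- as(AdultUCI, "transactions")
summary(Adult)
```

Discretizing keeps information that simply dropping the numeric columns would lose, at the cost of choosing bin boundaries by hand.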

A transactions file read with read.transactions() can be in either a flat (basket) form or a normalized (single) form. In basket form, each record is one transaction, and the items in that basket appear as separate fields on the line. In single form, each record holds exactly one item together with the ID of the transaction it belongs to. The following snippet of code shows the read.transactions() function reading a small file in basket form.


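# Three transactions in basket form: {1,2}, {1} and {2,3}
my_data = paste("1,2","1","2,3", sep="\n")

write(my_data, file = "my_basket")

# Each line of the file is one transaction; items are separated by commas
trans = read.transactions("my_basket", format = "basket", sep=",")

inspect(trans)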
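For comparison, the same three transactions can be stored in single form, with one (transaction ID, item) pair per line. The sketch below writes them to a temporary file (an assumption made so the example leaves nothing behind) and reads it with format = "single"; the cols argument tells read.transactions() which columns hold the transaction ID and the item.

```r
library("arules")

# Same transactions as the basket example, one (transaction ID, item) pair per line
my_single <- paste("1,1", "1,2", "2,1", "3,2", "3,3", sep = "\n")
single_file <- tempfile()
write(my_single, file = single_file)

# cols = c(1, 2): column 1 is the transaction ID, column 2 is the item
trans_single <- read.transactions(single_file, format = "single",
                                  sep = ",", cols = c(1, 2))
inspect(trans_single)
```

Single form is often more convenient when the data comes from a database table with one row per purchased item.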

Once the data has been coerced to transactions, it is ready for mining itemsets or rules. Association Rule Learning in arules works directly on these transactions objects, and a very popular algorithm for finding association rules is the Apriori algorithm. I have also discussed the use of Association Rule Learning and the Apriori Algorithm.
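As a minimal sketch of that next step, the code below runs apriori() on Adult, the pre-built transactions version of the Adult data that ships with arules. The support and confidence thresholds are illustrative assumptions, not recommended values; minlen = 2 drops rules with an empty left-hand side.

```r
library("arules")
data("Adult")  # transactions version of AdultUCI included in arules

# Mine rules that cover at least 50% of records and hold at least 90% of the time
rules <- apriori(Adult,
  parameter = list(support = 0.5, confidence = 0.9, minlen = 2),
  control = list(verbose = FALSE))

summary(rules)

# Show the three rules with the highest lift
inspect(sort(rules, by = "lift")[1:3])
```

Lowering support (for example to 0.01, as in the comment below) finds many more, rarer rules, so it is usually worth sorting or filtering the result before inspecting it.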

 


4 Comments

  1. Eric Kureck

     /  September 18, 2013

    Hi,

    Thanks for sharing but I have a problem when I call this part:
    Adult = as(AdultUCI, "transactions");
    Error in asMethod(object) :
    column(s) 1, 3, 5, 11, 12, 13 not logical or a factor. Use as.factor or categorize first.

    You know what it is? Thanks in advance.

    • JM

       /  November 6, 2013

      Well, you can’t have non-factor columns in your transaction object.
      Here are the step-by-step instructions I used to do it (not optimized):
      **********
      # Retrieve the Type of columns
      typeCols <- sapply(AdultUCI, class)

      # Retrieve list of columns that are qualitative / categorical variables
      factCols <- grep('factor', typeCols)
      subAdultUCI <- AdultUCI[,factCols]

      Adult <- as(subAdultUCI, "transactions");
      rules <- apriori(Adult, parameter=list(support=0.01, confidence=0.5, minlen = 2));
      ********
      Hope this gets you going.
      JM

  2. vct

     /  December 20, 2013

    I ran this code, but another error occurred when I used it with my dataset “MED”:
    Error in apriori(MED, parameter = list(support = 0.01, confidence = 0.5, :
    internal error in trio library

    typeCols <- sapply(MED, class)
    factCols <- grep('factor', typeCols)
    subM <- MED[,factCols]
    MED <- as(subM, "transactions");
    rules <- apriori(MED, parameter=list(support=0.01, confidence=0.5, minlen = 2));
    Error in apriori(MED, parameter = list(support = 0.01, confidence = 0.5, :
    internal error in trio library

  1. Association Rule Learning and the Apriori Algorithm | Statistical Research
