Data Frames and Transactions

Transactions are a very useful tool in data mining.  They provide a way to mine itemsets or rules from datasets.

In R the data must be in transactions form.  If the data is only available as a data.frame, the researcher can coerce the data frame to transactions with the code below.  The coercion expects columns to be logical or factor; as the comments below show, numeric columns can trigger an error and may need to be converted first.  This example uses the “AdultUCI” dataset available in the arules package, which originates from the “Census Income” database.  These data can be coerced to transactions using the following commands:


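# Load the arules package and the AdultUCI data frame that ships with it
library("arules")
data("AdultUCI")

# Coerce the data frame to a transactions object
Adult <- as(AdultUCI, "transactions")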
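As a minimal sketch of the coercion on a smaller scale, the following uses a hypothetical toy data frame (the column names and values are made up for illustration and are not from the original post).  It assumes that binning a numeric column with base R's cut() is an acceptable way to turn it into a factor before calling as().

```r
library(arules)

# Hypothetical toy data frame: one numeric, one factor, one logical column
df <- data.frame(
  age    = c(23, 45, 31, 62),
  sex    = factor(c("F", "M", "F", "M")),
  smoker = c(TRUE, FALSE, FALSE, TRUE)
)

# Bin the numeric column into a factor so every column is logical or factor
df$age <- cut(df$age, breaks = c(0, 30, 50, Inf),
              labels = c("young", "middle", "senior"))

# Each row becomes one transaction; factor levels become items like "age=young"
toy_trans <- as(df, "transactions")
inspect(toy_trans)
```

The break points chosen here are arbitrary; in practice they should reflect how the variable is meant to be interpreted.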

The input file can be in either a normalized (single) form or a flat-file (basket) form.  In basket form, each record represents one transaction, and the items in that transaction are listed in the record's columns.  In ‘single’ form, each record represents one single item together with the ID of the transaction it belongs to.  The following snippet of code writes a small file in basket form and reads it with the read.transactions() function.


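# Three transactions, one per line: {1,2}, {1} and {2,3}
my_data <- paste("1,2", "1", "2,3", sep = "\n")
write(my_data, file = "my_basket")

# Read the file as basket-format transactions and print them
trans <- read.transactions("my_basket", format = "basket", sep = ",")
inspect(trans)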
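For comparison, here is a sketch of the same three transactions stored in ‘single’ form, where each line holds a transaction ID and one item.  The IDs T1 to T3, the temporary file, and the variable names are assumptions made for illustration; the cols argument tells read.transactions() which columns hold the transaction ID and the item.

```r
library(arules)

# Hypothetical single-format data: one (transaction ID, item) pair per line
single_data <- paste("T1,1", "T1,2", "T2,1", "T3,2", "T3,3", sep = "\n")
single_file <- tempfile()
write(single_data, file = single_file)

# Column 1 is the transaction ID, column 2 is the item
trans_single <- read.transactions(single_file, format = "single",
                                  sep = ",", cols = c(1, 2))
inspect(trans_single)
```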

Once the data has been coerced to transactions, it is ready for mining itemsets or rules.  Association Rule Learning in R operates on these transaction objects.  A very popular algorithm for association rules is the Apriori algorithm.  I have discussed approaches to using Association Rule Learning and the Apriori algorithm.
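As a rough sketch of the mining step, the following runs apriori() on the “Adult” transactions that ship with arules (already in transactions form, so no coercion is needed).  The support and confidence thresholds are illustrative assumptions, not recommendations from this post.

```r
library(arules)

# "Adult" is the ready-made transactions version of the census data
data("Adult")

# Mine association rules; thresholds here are illustrative only
rules <- apriori(Adult,
                 parameter = list(support = 0.5, confidence = 0.9,
                                  target = "rules"))

# Show the three rules with the highest lift
inspect(head(sort(rules, by = "lift"), 3))
```

Lower thresholds return many more rules, so it can help to start high and relax them gradually.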

 


5 replies on “Data Frames and Transactions”

  1. Hi,

    Thanks for sharing, but I have a problem when I call this part:
    Adult = as(AdultUCI, "transactions");
    Error in asMethod(object) :
    column(s) 1, 3, 5, 11, 12, 13 not logical or a factor. Use as.factor or categorize first.

    Do you know what the problem is? Thanks in advance.

    1. Well, you can’t have non-factor columns in your transaction object.
      Here are the step-by-step instructions I used to do it (not optimized):
      **********
      # Retrieve the Type of columns
      typeCols <- sapply(AdultUCI, class)

      # Retrieve list of columns that are qualitative / categorical variables
      factCols <- grep('factor', typeCols)
      subAdultUCI <- AdultUCI[,factCols]

      # Coerce the factor-only subset and mine rules from it
      Adult <- as(subAdultUCI, "transactions")
      rules <- apriori(Adult, parameter = list(support = 0.01, confidence = 0.5, minlen = 2))
      ********
      Hope this gets you going.
      JM

  2. I ran this code, but another error occurred when I used it with my dataset “MED”:

    typeCols <- sapply(MED, class)
    factCols <- grep('factor', typeCols)
    subM <- MED[,factCols]
    MED <- as(subM, "transactions");
    rules <- apriori(MED, parameter=list(support=0.01, confidence=0.5, minlen = 2));
    Error in apriori(MED, parameter = list(support = 0.01, confidence = 0.5, :
    internal error in trio library

  3. Hi Sir,

    Thanks for the association rule explanation.

    Can we use the appearance option (i.e. default="lhs", rhs="race.White"), something like this?

    Thanks,
    Prashant
