Pruning method
Fotios Xystrakis
Posts: 2
Joined: 10/6/08
Pruning method | 10/14/08 12:34 PM
Dear developers/users

First of all, congratulations on the software. It is indeed helpful!

I would like to ask which method is used for pruning the grown tree to an optimal size.
I suppose it is a "minimum error rate" method, since the methods for growing the tree are based on validation sets (split sample, N-fold cross-validation, etc.). I am not sure, though, because I could not find any documentation on this in the guide (maybe I did not check thoroughly enough).
A citation or two dealing with the specific method that is implemented would also be great, if possible.

I would appreciate your reply!
Thank you in advance!
Fotis
Compumine Support
Posts: 19
Joined: 9/8/06
RE: Pruning method | 11/5/08 9:42 PM as a reply to Fotios Xystrakis.
Dear Fotios,

Pruning of an individual tree model is based on minimizing the expected error on a validation set, i.e., a randomly selected part of the training examples that is not used for growing the tree. Note that this happens independently of the selected validation method (e.g., cross-validation), since the latter only determines the experimental design, i.e., how training and test examples are obtained or generated. The test examples are used in neither the growing nor the pruning of trees.
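
To illustrate the idea, here is a minimal sketch of pruning against a held-back part of the training examples. It is not the RDS implementation: it uses the simple technique commonly called reduced-error pruning, which counts errors on the held-back examples directly, and all names (Node, split_grow_prune, prune) are hypothetical. The tree is built by hand here; in practice it would be grown from the growing part of the split.

```python
import random

class Node:
    """Binary decision node; a node with feature=None is a leaf."""
    def __init__(self, label, feature=None, threshold=None, left=None, right=None):
        self.label = label          # majority class of growing examples at this node
        self.feature = feature      # index of the feature tested (None for a leaf)
        self.threshold = threshold
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.feature is None

def split_grow_prune(train, prune_fraction=0.3, seed=0):
    """Randomly hold back part of the training examples for pruning (assumed fraction)."""
    shuffled = list(train)
    random.Random(seed).shuffle(shuffled)
    n_prune = max(1, int(len(shuffled) * prune_fraction))
    return shuffled[n_prune:], shuffled[:n_prune]   # (grow set, prune set)

def predict(node, x):
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label

def errors(node, examples):
    return sum(1 for x, y in examples if predict(node, x) != y)

def prune(node, prune_set):
    """Bottom-up: replace a subtree by a leaf whenever that does not
    increase the error on the prune-set examples reaching it."""
    if node.is_leaf():
        return node
    left_ex = [(x, y) for x, y in prune_set if x[node.feature] <= node.threshold]
    right_ex = [(x, y) for x, y in prune_set if x[node.feature] > node.threshold]
    node.left = prune(node.left, left_ex)
    node.right = prune(node.right, right_ex)
    leaf = Node(node.label)
    # Ties favour the smaller tree (this also collapses nodes no example reaches).
    if errors(leaf, prune_set) <= errors(node, prune_set):
        return leaf
    return node

# Hand-built tree: the right subtree has an extra split that does not help.
tree = Node("b", feature=0, threshold=5,
            left=Node("a"),
            right=Node("b", feature=1, threshold=2,
                       left=Node("b"), right=Node("a")))

# Held-back examples as (features, class); these were not used for growing.
prune_set = [((1, 0), "a"), ((7, 1), "b"), ((8, 3), "b")]

pruned = prune(tree, prune_set)
print(pruned.is_leaf(), pruned.right.is_leaf(), errors(pruned, prune_set))
```

The test examples produced by the validation method never appear in either function: split_grow_prune only ever sees the training examples.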

Hoping this helps.

/The Compumine team
Fotios Xystrakis
Posts: 2
Joined: 10/6/08
RE: Pruning method | 11/17/08 5:24 PM as a reply to Compumine Support.
Thank you for the answer.

In some more detail:
When selecting, e.g., the 3-fold validation method, 3 groups (better called subsets) are formed from the original data set, and each time the analysis runs with 2 of them acting as the training subset and one as the validation set.
The analysis is repeated until every subset (fold) has been used once as the validation set.
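
The fold rotation described above can be sketched as follows. This is only an illustration of generic k-fold splitting, not of how RDS assigns examples to folds; the function name k_fold_splits and the round-robin assignment are assumptions.

```python
def k_fold_splits(n_examples, k):
    """Yield (training indices, held-out indices) once per fold."""
    indices = list(range(n_examples))
    # Round-robin assignment of examples to k folds (an assumed scheme).
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        training = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield sorted(training), folds[held_out]

# With 6 examples and 3 folds, each example is held out exactly once.
splits = list(k_fold_splits(6, 3))
for training, held_out in splits:
    print("train on", training, "hold out", held_out)
```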

That means that as many trees are formed as there are folds, and the tree finally chosen as the optimum is the one with the minimum expected error on the validation set.

Am I right, or lost in space?

Thank you again for your comments!
Yours, Fotis