Christos Begleris
Why does sorting the same data set result in...
7/11/07 5:58 PM
I am using a data set with 3 categorical variables and 1 numeric response (regression) variable. I am using a tree model to establish how important each variable is for predicting the response value.
My data looks as follows:

Captain                Port               Vessel    Amount
ALGIANNAKIS GEORGIOS   ALGECIRAS, SPAIN   Suezmax   142.23
KATSANTONIS NIKOS      ASHKELON, ISRAEL   Aframax   188.1
EMPENADO ORATIO        AUGUSTA, GEORGIA   Suezmax   1300
CADRON ROMAN           BOSPORUS, TURKEY   Suezmax   35.155
There are 800 entries in total, and the same captains, ports and vessel types recur throughout the data.
I am sorting the data in three different ways:
a. By captain (alphabetically, with the captain data in column A)
b. By port (alphabetically, with the port data in column A)
c. By vessel type (alphabetically, with the type data in column A)
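For reference, the three orderings can be sketched in Python using the four sample rows. This is only an illustration: the names `ROWS`, `COLUMNS` and `sort_with_key_first` are hypothetical, and moving the sort key to the front stands in for putting that data "in column A".

```python
# The four sample rows from the post; the real data set has 800 rows.
ROWS = [
    {"captain": "ALGIANNAKIS GEORGIOS", "port": "ALGECIRAS, SPAIN", "vessel": "Suezmax", "amount": 142.23},
    {"captain": "KATSANTONIS NIKOS", "port": "ASHKELON, ISRAEL", "vessel": "Aframax", "amount": 188.1},
    {"captain": "EMPENADO ORATIO", "port": "AUGUSTA, GEORGIA", "vessel": "Suezmax", "amount": 1300.0},
    {"captain": "CADRON ROMAN", "port": "BOSPORUS, TURKEY", "vessel": "Suezmax", "amount": 35.155},
]
COLUMNS = ["captain", "port", "vessel", "amount"]


def sort_with_key_first(rows, key):
    """Sort rows alphabetically by `key` and move that column to the front
    (mimicking 'with the key data in column A'). Returns a list of tuples."""
    order = [key] + [c for c in COLUMNS if c != key]
    ordered = sorted(rows, key=lambda r: r[key])  # stable: ties keep input order
    return [tuple(r[c] for c in order) for r in ordered]


by_captain = sort_with_key_first(ROWS, "captain")  # a. by captain
by_port = sort_with_key_first(ROWS, "port")        # b. by port
by_vessel = sort_with_key_first(ROWS, "vessel")    # c. by vessel type
```

All three orderings hold exactly the same records; only the row order and the column positions differ.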
In all three cases above I am getting a different tree and different values for variable importance.
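To make the question concrete, here is a hypothetical Python sketch of one way the order could matter. It assumes a simple regression-tree split search, which is not necessarily what my software does: each variable is scanned in column order, every "one category vs. the rest" split is scored by the within-node sum of squared errors, and a tie keeps whichever candidate was found first. The names `ROWS`, `sse` and `best_split` are illustrative only.

```python
from statistics import fmean

# The four sample rows from the post (hypothetical stand-in for the full data).
ROWS = [
    {"captain": "ALGIANNAKIS GEORGIOS", "port": "ALGECIRAS, SPAIN", "vessel": "Suezmax", "amount": 142.23},
    {"captain": "KATSANTONIS NIKOS", "port": "ASHKELON, ISRAEL", "vessel": "Aframax", "amount": 188.1},
    {"captain": "EMPENADO ORATIO", "port": "AUGUSTA, GEORGIA", "vessel": "Suezmax", "amount": 1300.0},
    {"captain": "CADRON ROMAN", "port": "BOSPORUS, TURKEY", "vessel": "Suezmax", "amount": 35.155},
]


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, columns, target="amount"):
    """Return (column, category) of the best 'category vs. rest' split.

    Columns are scanned in the given order and ties keep the first candidate
    found (strict '<'), so the result can depend on column order.
    """
    best, best_score = None, float("inf")
    for col in columns:
        for cat in sorted({r[col] for r in rows}):
            left = [r[target] for r in rows if r[col] == cat]
            right = [r[target] for r in rows if r[col] != cat]
            if not right:
                continue  # a split must send rows to both sides
            score = sse(left) + sse(right)
            if score < best_score:
                best, best_score = (col, cat), score
    return best


print(best_split(ROWS, ["captain", "port", "vessel"]))
print(best_split(ROWS, ["port", "captain", "vessel"]))
```

With these four rows each captain appears at exactly one port, so the captain split and the port split that isolate the 1300 row score identically, and whichever variable is scanned first wins. If something similar happens in the full data, a different first split would change the rest of the tree, and the importance values with it.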
Would you be able to explain why?
How should the data be sorted to get the 'truest' results possible?
Many thanks,
Christos Begleris