HTW Berlin Fotopedia, cc-by-nc, Andrea Kirkby, 2008

HTW Berlin
Fachbereich 4
Internationaler Studiengang
Internationale Medieninformatik (Master)
Semantic Modeling
Summer Term 2016

Lab 9: Classification

  1. Start Weka and load the segment challenge dataset. Can you do better than the 78.8% that Ian achieved on the test set in video 1? Spend only 10 minutes on this!
  2. Now load the weather data. Switch to the Classify panel.
  3. The C4.5 algorithm for building decision trees is implemented in Weka as a classifier called J48. Select it by clicking the Choose button near the top of the Classify tab. Select the trees entry to reveal its subentries, and click J48 to choose that classifier. Choose Use training set from the Test options part of the Classify panel. Once the test strategy has been set, the classifier is built and evaluated by pressing the Start button. This processes the training set using the currently selected learning algorithm. Then it classifies all the instances in the training data and outputs performance statistics.
  4. What does the output mean? How do you interpret it?
  5. How would this instance be classified using the decision tree?
    outlook = sunny, temperature = cool, humidity = high, windy = TRUE
  6. Load the iris data using the Preprocess panel. Evaluate C4.5 on this data twice: once using the training set and once using cross-validation. What is the estimated percentage of correct classifications in each case? Which estimate is more realistic?
  7. Use the Visualize classifier errors function to find the wrongly classified test instances for the cross-validation performed in Exercise 6. What can you say about the location of the errors? Play around with the visualizations until you feel comfortable with what you can do with the system.
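
The two test options used in Exercises 3 and 6 can also be sketched outside Weka. The following Python sketch is only an illustration under stated assumptions: it is not Weka's code, it uses a simple nearest-neighbour rule instead of C4.5/J48, the data set and all function names are hypothetical, and the folds are formed by plain shuffling, which may differ from how Weka forms its folds. It evaluates the same learner once on its own training data and once with k-fold cross-validation.

```python
import random

# Hypothetical toy data: (feature value, class label). Values 3 and 10 are
# deliberately labelled against their neighbours to act as noise.
DATA = [(1, "a"), (2, "a"), (3, "b"), (4, "a"), (5, "a"), (6, "a"),
        (7, "b"), (8, "b"), (9, "b"), (10, "a"), (11, "b"), (12, "b")]


def predict(train, x):
    """1-nearest-neighbour: return the label of the closest training value."""
    nearest = min(train, key=lambda item: abs(item[0] - x))
    return nearest[1]


def accuracy(train, test):
    """Fraction of test instances whose predicted label is correct."""
    hits = sum(1 for x, label in test if predict(train, x) == label)
    return hits / len(test)


def cross_validate(data, k, seed=1):
    """Plain (unstratified) k-fold cross-validation accuracy."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    hits = 0
    for i, test_fold in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        hits += sum(1 for x, label in test_fold if predict(train, x) == label)
    return hits / len(data)


if __name__ == "__main__":
    print("Use training set:  ", accuracy(DATA, DATA))
    print("4-fold cross-val.: ", cross_validate(DATA, k=4))
```

Run the script and compare the two printed numbers with what you observe in Weka for the iris data; the seed and the number of folds are free parameters you can vary.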

Prepare a report detailing what you did. You should work in groups of 2 or 3. Submit your written report to the Moodle area by 22:00 on the evening before the session in which it is due; every group member should submit their own copy.

This exercise is taken from Witten, Frank & Hall: Data Mining, 3rd Edition.


Some rights reserved. CC-BY-NC Prof. Dr. Debora Weber-Wulff
Questions or comments: <weberwu@htw-berlin.de>