Attachment 'sheet09.m'

function sheet09

% load the data prepared by sheet09.py (the script data.m defines
% word_counts, doc_counts, doc_class and words)
data

% features: raw word counts, term frequencies, and tf-idf
word_counts = word_counts;
term_freqs = tf(word_counts);  % not named tf, so the local function tf stays callable below
tfidf = tf_idf(word_counts, doc_counts);

% plot similarities (scalar products) for all three
% features
figure(1)
S = similarities(word_counts);
plot_sim(S, doc_class);
title('similarities using the raw word counts');

figure(2)
S = similarities(term_freqs);
plot_sim(S, doc_class);
title('similarities using the term frequencies');

figure(3)
S = similarities(tfidf);
plot_sim(S, doc_class);
title('similarities using the tf-idf scores');
caxis([0, 1])

% collect word counts to compare classes
wc1 = collect_word_counts(word_counts, doc_class == 1);
wc2 = collect_word_counts(word_counts, doc_class == -1);

% first, rank words by term frequency
wc1 = tf(wc1);
wc2 = tf(wc2);

fprintf('Top 20 words for the positive and negative classes using\n');
fprintf('term frequencies\n');

fprintf('Top 20 positive words:\n');
show_top_words(20, wc1, words)
fprintf('Top 20 negative words:\n');
show_top_words(20, wc2, words)

% now, weight the term frequencies by the inverse document
% frequency
tfidf1 = tf_idf(wc1, doc_counts);
tfidf2 = tf_idf(wc2, doc_counts);

fprintf('Top 20 words for the positive and negative classes using\n');
fprintf('tf-idf scores\n');

fprintf('Top 20 positive words:\n');
show_top_words(20, tfidf1, words)
fprintf('Top 20 negative words:\n');
show_top_words(20, tfidf2, words)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Your solution below!
%

% 1. compute term frequencies from word counts
function word_counts = tf(word_counts)
% ...

% 2. compute the TF-IDF statistic
function score = tf_idf(word_counts, doc_counts)
% ...

% 3. compute linear similarities (scalar products) between
% all *rows* of matrix feats.
function S = similarities(feats)
% ...

% 4. plot similarities. Also plot boxes around the classes. You can
% assume that the documents with doc_class == 1 come first.
function plot_sim(S, doc_class)
% ...

% 5. print the top n entries in each *row* of feats. Use the words
% cell array to print the actual words.
function show_top_words(n, feats, words)
% ...

% 6. sum the rows of word_counts selected by the index I
function wc = collect_word_counts(word_counts, I)
% ...
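
Stubs 1 and 2 are left for the exercise. For orientation only, here is a minimal sketch of the common textbook definitions (term frequency = count divided by the document's total word count; idf = log of the number of documents over the document frequency), written as standalone anonymous functions so it does not replace the stubs. It assumes rows are documents and columns are words, that `doc_counts(j)` is the number of documents containing word j, and that the total number of documents is passed in as `n_docs`. The stub `tf_idf` does not receive that number, so how the sheet expects it to be obtained is not stated here. `tf_sketch`, `tf_idf_sketch` and `n_docs` are hypothetical names, and other tf-idf variants (e.g. smoothed idf) are equally common.

```matlab
% Minimal sketch, NOT the sheet's tf/tf_idf: the *_sketch names and n_docs
% are hypothetical. Assumes rows = documents, columns = words, and
% doc_counts(j) = number of documents that contain word j.

% term frequency: divide each row by its total word count
% (max(..., 1) guards against empty documents)
tf_sketch = @(wc) bsxfun(@rdivide, wc, max(sum(wc, 2), 1));

% tf-idf: term frequency times idf = log(n_docs / doc_counts)
% (max(..., 1) guards against words that occur in no document)
tf_idf_sketch = @(wc, doc_counts, n_docs) ...
    bsxfun(@times, tf_sketch(wc), log(n_docs ./ max(doc_counts, 1)));

% toy example: 3 documents, 4 words
wc = [2 1 0 1;
      0 3 1 0;
      1 1 1 1];
doc_counts = sum(wc > 0, 1);            % [2 3 2 2]
T = tf_sketch(wc)                       % every row sums to 1
W = tf_idf_sketch(wc, doc_counts, size(wc, 1))
```

Word 2 occurs in every toy document, so its idf is log(3/3) = 0 and its tf-idf column is zero. This is why tf-idf pushes words that appear everywhere down the ranking.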

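Stubs 3, 5 and 6 reduce to a matrix product, a descending sort, and a row sum. The sketch below illustrates them on a made-up 4-document example. It assumes that `feats` holds one document per row, `doc_class` holds +1/-1 labels, and `words` is a cell array of strings with one entry per column. `similarities_sketch` and `collect_sketch` are hypothetical names, not the sheet's functions. The top-word printing is shown for a single row vector, and plotting (stub 4) is left out.

```matlab
% Minimal sketch; similarities_sketch and collect_sketch are hypothetical
% names. Assumes one document per row, doc_class in {+1, -1}, and a cell
% array words with one entry per column.

% linear similarities: scalar products between all pairs of rows
similarities_sketch = @(feats) feats * feats';

% sum the rows selected by the logical index I into one row vector
collect_sketch = @(wc, I) sum(wc(I, :), 1);

% toy example: 4 documents (two per class), 3 words
words = {'good', 'bad', 'movie'};
wc = [3 0 1;
      2 0 2;
      0 4 1;
      0 2 2];
doc_class = [1; 1; -1; -1];

S = similarities_sketch(wc)                  % 4x4 and symmetric
wc1 = collect_sketch(wc, doc_class == 1)     % [5 0 3]
wc2 = collect_sketch(wc, doc_class == -1)    % [0 6 3]

% top n words of one row: sort descending, print the first n entries
n = 2;
[vals, idx] = sort(wc1, 'descend');
for k = 1:n
  fprintf('%s: %g\n', words{idx(k)}, vals(k));   % prints good: 5, then movie: 3
end
```

Because the documents of class +1 come first, the within-class similarities form the upper-left and lower-right blocks of `S`, which is where the boxes of stub 4 would go.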
Attached Files

  • bioinf.pdf (2010-06-17 12:21:00, 2873.9 KB)
  • cca_lecture.pdf (2010-05-04 11:11:44, 3591.3 KB)
  • data.tar.gz (2010-06-08 09:37:55, 209.6 KB)
  • full_sheet01.pdf (2010-04-19 09:59:41, 65.6 KB)
  • full_sheet02.pdf (2010-04-20 09:18:53, 61.3 KB)
  • full_sheet03.pdf (2010-04-27 09:42:10, 70.0 KB)
  • full_sheet04.pdf (2010-05-04 10:48:12, 75.0 KB)
  • full_sheet05.pdf (2010-05-11 08:22:55, 91.1 KB)
  • full_sheet06.pdf (2010-05-18 10:01:06, 61.9 KB)
  • full_sheet07.pdf (2010-05-27 10:02:14, 76.7 KB)
  • full_sheet08.pdf (2010-06-01 08:38:57, 70.8 KB)
  • full_sheet09.pdf (2010-06-08 09:37:48, 58.2 KB)
  • full_sheet10.pdf (2010-06-15 10:05:24, 120.8 KB)
  • full_sheet11.pdf (2010-06-22 08:07:29, 71.3 KB)
  • full_sheet12.pdf (2010-06-29 09:14:44, 76.2 KB)
  • full_sheet13.pdf (2010-07-06 10:08:39, 83.4 KB)
  • kld-tutorial.pdf (2010-06-01 08:40:12, 1391.7 KB)
  • lect-ids.pdf (2010-05-27 06:38:11, 2850.3 KB)
  • lect-struct.pdf (2010-05-20 13:07:56, 2099.2 KB)
  • mnist_train.mat (2010-04-20 09:19:25, 26591.3 KB)
  • optim-intro.pdf (2010-07-06 10:08:16, 192.5 KB)
  • sheet02.m (2010-04-20 09:19:00, 1.0 KB)
  • sheet05.m (2010-05-11 08:23:01, 0.6 KB)
  • sheet07.m (2010-05-27 10:02:41, 4.3 KB)
  • sheet08.m (2010-06-01 08:39:07, 0.9 KB)
  • sheet09.m (2010-06-08 09:38:00, 2.2 KB)
  • sheet09.py (2010-06-08 09:38:06, 2.3 KB)
  • sheet11.m (2010-06-22 08:07:55, 1.1 KB)
  • splice-test-data.txt (2010-06-22 08:07:51, 129.6 KB)
  • splice-test-label.txt (2010-06-22 08:09:26, 5.4 KB)
  • splice-train-data.txt (2010-06-22 08:07:41, 59.6 KB)
  • splice-train-label.txt (2010-06-22 08:07:47, 2.5 KB)
  • ssa_data.mat (2010-04-27 08:49:35, 1515.8 KB)
  • ssa_lecture.pdf (2010-04-27 08:49:39, 585.7 KB)
  • ssa_simple.m (2010-04-27 08:49:50, 7.4 KB)
  • stud-data.mat.gz (2010-05-27 06:34:10, 1217.5 KB)
  • textmining.pdf (2010-06-08 09:39:24, 1013.6 KB)
  • tkcca_example.m (2010-05-04 10:49:39, 1.0 KB)
  • tkcca_simple.m (2010-05-04 10:48:19, 4.1 KB)
  • tkcca_toy_data.mat (2010-05-04 10:48:24, 150.9 KB)