training_data_Weka_AMZ4
Script started on 2021-12-01 05:43:48+00:00 [TERM="screen" TTY="/dev/pts/5" COLUMNS="154" LINES="81"]
-----------------1+2+3:DOWNLOAD WEKA, UPDATE CLASSPATH, CONVERT TXT FILE TO ARFF FILE------------------
--------------------------------------------------------------------------------------------------------
luong@f6linux12:~$ mkdir a5
luong@f6linux12:~$ cd weka-3-8-5/
luong@f6linux12:~/weka-3-8-5$ export CLASSPATH=$CLASSPATH:`pwd`/weka.jar:`pwd`/libsvm.jar
luong@f6linux12:~/weka-3-8-5$ java weka.core.converters.TextDirectoryLoader -dir BODYandTWEET > BODYandTWEET.arff
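TextDirectoryLoader derives the class labels from the directory layout: it expects one subdirectory per class, with one plain-text file per instance. A minimal sketch of that layout (the `demo` names below are illustrative, not the actual contents of BODYandTWEET):

```shell
# TextDirectoryLoader layout: one subdirectory per class label,
# one text file per instance. (All names here are hypothetical.)
mkdir -p demo/REVIEWS.HELPFUL demo/REVIEWS.UNHELPFUL
echo "great book loved it" > demo/REVIEWS.HELPFUL/r1.txt
echo "waste of money"      > demo/REVIEWS.UNHELPFUL/r2.txt
# Then, with weka.jar on CLASSPATH:
#   java weka.core.converters.TextDirectoryLoader -dir demo > demo.arff
ls demo
```

The subdirectory names become the values of the `@@class@@` attribute in the generated ARFF file.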
-------------------4. ARFF FILE----------------------------
-----------------------------------------------------------
luong@f6linux12:~/weka-3-8-5$ vi BODYandTWEET.arff
"BODYandTWEET.arff" 207L, 888872C
@relation _home_luong_weka-3-8-5_BODYandTWEET
@attribute text string
@attribute @@class@@ {REVIEWS.UNHELPFUL,REVIEWS.HELPFUL}
@data
'shortly before untimely death vince lombardi jerry kramer set out collect storie those knew coach professionally includ book frank gifford bart starr paul hornung willie davi max mcgee large collection through eye those best knew football geniu kramer retell tale give best insight personality coach packer fan long cherish thi book thi book far biography serv more collection anecdote those seek biography lombardi recommend pride still matter david maranis thi collection storie light certain make packer fan smile\n',REVIEWS.UNHELPFUL
............................
'please note thi review concern new publicationthe chronicle narnia perfect book wonderful children adult read again again lewi brilliant author theologian competent read book young enough pick book horrifi found out reprint chronological order publisher decid tamper order read book chronological order spoil surprise magic out first visit narnia lion witch wardrobe already know youre suppo know lightpole professor th dont alway put chronological order youre read please read correct order 1 lion witch wardrobe 2 prince caspian 3the voyage dawn treader 4 silver chair 5 horse boy 6 magician nephew 7 last battle\nwednesday my b-day n don\'t know what 2 do! \nI activated my Selfcontrol block early, meaning I can\'t check out the new QC. Regularizing my internal clock is might be difficult. #fb\nWhere did u move to? I thought u were already in sd. ?? Hmmm. Random u found me. Glad to hear yer doing well.\nwednesday my b-day! don\'t know what 2 do!! \n',REVIEWS.HELPFUL
luong@f6linux12:~/weka-3-8-5$
----------------5.CONVERT ARFF FILE TO WORD VECTOR--------------
----------------------------------------------------------------
luong@f6linux12:~/weka-3-8-5$ java -Xmx1024m weka.filters.unsupervised.attribute.StringToWordVector -i BODYandTWEET.arff -o BODYandTWEET_training.arff -M 2
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by weka.core.WekaPackageClassLoaderManager (file:/home/luong/weka-3-8-5/weka.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of weka.core.WekaPackageClassLoaderManager
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
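The `-M 2` option sets StringToWordVector's minimum term frequency: tokens seen fewer than 2 times are pruned from the vocabulary. The core of that cut can be imitated with a plain shell pipeline (a sketch on a toy string, not Weka's actual tokenizer or per-class counting):

```shell
# Keep only tokens occurring at least twice, mimicking the -M 2 cut.
# (Weka tokenizes and counts per class; this shows just the core idea.)
printf 'the book the end\n' | tr ' ' '\n' | sort | uniq -c | \
  awk '$1 >= 2 { print $2 }'
# prints: the
```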
-----------------6.ARFF FILE AFTER CONVERSION (WORD VECTOR FORM)--------------
-----------------------------------------------------------------
luong@f6linux12:~/weka-3-8-5$ vi BODYandTWEET_training.arff
"BODYandTWEET_training.arff" 1214L, 373967C
@relation '_home_luong_weka-3-8-5_BODYandTWEET-weka.filters.unsupervised.attribute.StringToWordVector-R1-W1000-prune-rate-1.0-N0-stemmerweka.core.stemmers.NullStemmer-stopwords-handlerweka.core.stopwords.Null-M2-tokenizerweka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"'
@attribute @@class@@ {REVIEWS.UNHELPFUL,REVIEWS.HELPFUL}
@attribute #fb numeric
@attribute #therapyfail numeric
@attribute " numeric
@attribute *sigh* numeric
@attribute +15 numeric
@attribute - numeric
@attribute //apps numeric
@attribute //is numeric
@attribute //tr numeric
@attribute //www numeric
@attribute 0 numeric
@attribute 1 numeric
@attribute 100th numeric
@attribute 10th numeric
@attribute 12 numeric
@attribute 1st numeric
@attribute 2 numeric
@attribute 3 numeric
@attribute 30 numeric
@attribute 3rd numeric
@attribute 4 numeric
@attribute '50\%' numeric
@attribute 536-page numeric
@attribute 5am numeric
@attribute 6 numeric
@attribute 8 numeric
@attribute 9th& numeric
@attribute @Alliana07 numeric
@attribute @BatManYNG numeric
@attribute @BridgetsBeaches numeric
@attribute @ColinDeMar numeric
@attribute @FakerPattyPattz numeric
@attribute @HibaNick numeric
@attribute @Hollywoodheat numeric
@attribute @JonathanRKnight numeric
@attribute @Kenichan numeric
@attribute @LOLTrish numeric
@attribute @LettyA numeric
@attribute @MissXu numeric
@attribute @Starrbby numeric
@attribute @Viennah numeric
@attribute @alielayus numeric
@attribute @alydesigns numeric
@attribute @andywana numeric
@attribute @ashleyac numeric
@attribute @caregiving numeric
@attribute @cocomix04 numeric
@attribute @dannyvegasbaby numeric
@attribute @fleurylis numeric
@attribute @grum numeric
@attribute @jacobsummers numeric
@attribute @jdarter numeric
@attribute @jonathanchard numeric
@attribute @julieebaby numeric
@attribute @kpreyes numeric
@attribute @localtweeps numeric
@attribute @machineplay numeric
@attribute @makeherfamous numeric
@attribute @mangaaa numeric
@attribute @marykatherine_q numeric
@attribute @mercedesashley numeric
@attribute @nationwideclass numeric
@attribute @naughtyhaughty numeric
@attribute @ninjen numeric
@attribute @octolinz16 numeric
@attribute @penndbad numeric
@attribute @robluketic numeric
@attribute @rumblepurr numeric
@attribute @smarrison numeric
@attribute @stark numeric
@attribute @statravelAU numeric
@attribute @tea numeric
@attribute @thecoolestout numeric
@attribute @twista202 numeric
@attribute A numeric
@attribute ADDED numeric
............................
{301 1,385 1,399 1,454 1,557 1,735 1,779 1,884 1,890 1,936 1}
............................
{365 1,376 1,434 1,461 1,491 1,496 1,520 1,521 1,580 1,601 1,614 1,617 1,657 1,668 1,681 1,700 1,749 1,765 1,845 1,884 1,931 1,998 1}
............................
{305 1,334 1,643 1,844 1,845 1,884 1,974 1}
{0 REVIEWS.HELPFUL,1 1,12 1,17 1,21 1,25 1,115 1,124 1,128 1,171 1,174 1,175 1,181 1,209 1,220 1,224 1,231 1,234 1,261 1,268 1,277 1,298 1,301 1,321 1,336 1,339 1,346 1,389 1,393 1,398 1,402 1,403 1,406 1,416 1,429 1,464 1,472 1,520 1,553 1,561 1,562 1,577 1,580 1,620 1,623 1,631 1,646 1,651 1,654 1,658 1,681 1,684 1,700 1,704 1,726 1,735 1,752 1,775 1,860 1,874 1,878 1,884 1,891 1,900 1,919 1,960 1,962 1,963 1,965 1,999 1,1005 1}
luong@f6linux12:~/weka-3-8-5$
------------
------------
====> Notice:
- The file lists every attribute together with its type (numeric)
- The data instances use Weka's sparse ARFF format: the numbers come in "index value" pairs, where the index identifies the attribute
- (a value of 1 means that word occurs in the instance; attributes not listed are 0)
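The pairs above are Weka's sparse ARFF encoding, where each entry names an attribute index and its value, and any unlisted attribute is implicitly 0. One instance can be decoded with a short pipeline (the instance string here is illustrative):

```shell
# Sparse ARFF: "{1 1,3 1}" means attribute 1 = 1 and attribute 3 = 1;
# every attribute not listed is 0. Decode one instance with awk.
instance='{1 1,3 1}'
echo "$instance" | tr -d '{}' | tr ',' '\n' | \
  awk '{printf "attribute %s = %s\n", $1, $2}'
# prints: attribute 1 = 1
#         attribute 3 = 1
```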
--------------7.RUN ClassificationViaRegression Classifier---------------
------------------------------------------------------------------------
luong@f6linux12:~/weka-3-8-5$ java -Xmx1024m weka.classifiers.meta.ClassificationViaRegression -W weka.classifiers.trees.M5P -num-decimal-places 4 -t BODYandTWEET_training.arff -d BODYandTWEET_training.model -c 1
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by weka.core.WekaPackageClassLoaderManager (file:/home/luong/weka-3-8-5/weka.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of weka.core.WekaPackageClassLoaderManager
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Dec 01, 2021 5:47:54 AM com.github.fommil.netlib.BLAS <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
Dec 01, 2021 5:47:54 AM com.github.fommil.netlib.BLAS <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
Dec 01, 2021 5:47:54 AM com.github.fommil.netlib.LAPACK <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
Dec 01, 2021 5:47:54 AM com.github.fommil.netlib.LAPACK <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK
Options: -W weka.classifiers.trees.M5P -num-decimal-places 4
=== Classifier model (full training set) ===
Classification via Regression
Classifier for class with index 0:
M5 pruned model tree:
(using smoothed linear models)
LM1 (200/13.884%)
LM num: 1
@@class@@ =
-0.9969 * I
+ 0.0344 * book
+ 0.9638
Number of Rules : 1
Classifier for class with index 1:
M5 pruned model tree:
(using smoothed linear models)
LM1 (200/13.884%)
LM num: 1
@@class@@ =
0.9969 * I
- 0.0344 * book
+ 0.0362
Number of Rules : 1
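Note that the two per-class models are exact complements: ClassificationViaRegression fits one regression per class indicator, and here the coefficients differ only in sign while the intercepts sum to 1, so the two predicted scores always add up to 1. A quick arithmetic check (the attribute values I=2, book=3 are arbitrary):

```shell
# The two linear models mirror each other, so their outputs sum to 1
# for any attribute values. (I=2, book=3 chosen arbitrarily.)
awk 'BEGIN {
  I = 2; book = 3
  lm0 = -0.9969*I + 0.0344*book + 0.9638   # class index 0
  lm1 =  0.9969*I - 0.0344*book + 0.0362   # class index 1
  printf "%.4f\n", lm0 + lm1
}'
# prints: 1.0000
```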
Time taken to build model: 1.08 seconds
Time taken to test model on training data: 0.04 seconds
=== Error on training data ===
Correctly Classified Instances 199 99.5 %
Incorrectly Classified Instances 1 0.5 %
Kappa statistic 0.99
Mean absolute error 0.0103
Root mean squared error 0.0693
Relative absolute error 2.0599 %
Root relative squared error 13.8526 %
Total Number of Instances 200
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
1.000 0.010 0.990 1.000 0.995 0.990 0.999 0.998 REVIEWS.UNHELPFUL
0.990 0.000 1.000 0.990 0.995 0.990 0.999 0.998 REVIEWS.HELPFUL
Weighted Avg. 0.995 0.005 0.995 0.995 0.995 0.990 0.999 0.998
=== Confusion Matrix ===
a b <-- classified as
100 0 | a = REVIEWS.UNHELPFUL
1 99 | b = REVIEWS.HELPFUL
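The 99.5 % summary figure follows directly from the confusion matrix: 199 of the 200 instances sit on the diagonal. A one-line check (shell arithmetic, not Weka output):

```shell
# Accuracy straight from the confusion matrix above:
# (100 + 99) correct predictions out of 200 instances.
awk 'BEGIN { printf "%.1f%%\n", 100 * (100 + 99) / (100 + 0 + 1 + 99) }'
# prints: 99.5%
```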
Time taken to perform cross-validation: 4.1 seconds
=== Stratified cross-validation ===
Correctly Classified Instances 199 99.5 %
Incorrectly Classified Instances 1 0.5 %
Kappa statistic 0.99
Mean absolute error 0.0116
Root mean squared error 0.0723
Relative absolute error 2.3221 %
Root relative squared error 14.4682 %
Total Number of Instances 200
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
1.000 0.010 0.990 1.000 0.995 0.990 0.991 0.978 REVIEWS.UNHELPFUL
0.990 0.000 1.000 0.990 0.995 0.990 0.991 0.995 REVIEWS.HELPFUL
Weighted Avg. 0.995 0.005 0.995 0.995 0.995 0.990 0.991 0.986
=== Confusion Matrix ===
a b <-- classified as
100 0 | a = REVIEWS.UNHELPFUL
1 99 | b = REVIEWS.HELPFUL
luong@f6linux12:~/weka-3-8-5$
luong@f6linux12:~/weka-3-8-5$
luong@f6linux12:~/weka-3-8-5$
luong@f6linux12:~/weka-3-8-5$ ls
BODYandTWEET a5_bodyonly_test.arff data text_example weka.ico
BODYandTWEET.arff a5_bodyonly_test_training.arff doc text_example.arff weka.jar
BODYandTWEET_training.arff a5_bodyonly_test_training.model documentation.css text_example1.arff weka.sh
BODYandTWEET_training.model a5test documentation.html text_example1_training.arff wekaexamples.zip
COPYING a5test.arff jre text_example_training.arff
README a5test_training.arff remoteExperimentServer.jar text_example_training.model
WekaManual.pdf a5test_training.model result.txt weka-src.jar
a5_bodyonly_test changelogs text_binary_classify.sh weka.gif
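The listing shows the serialized BODYandTWEET_training.model written by `-d`. To apply it later to unseen data, Weka's standard classifier options `-l` (load model), `-T` (test set) and `-p` (print predictions) would be used — a sketch only, since no held-out `heldout.arff` exists in this session:

```shell
# Sketch: -l loads a saved model, -T names a test ARFF with the same
# attributes, -p 0 prints per-instance predictions.
# heldout.arff is hypothetical; weka.jar must be on CLASSPATH.
java -Xmx1024m weka.classifiers.meta.ClassificationViaRegression \
  -l BODYandTWEET_training.model -T heldout.arff -p 0
```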
luong@f6linux12:~/weka-3-8-5$ cd ~
---------------8.SCRIPT FOR TRAINING DATA------------------------------------------------
-----------------------------------------------------------------------------------------
luong@f6linux12:~$ cat text_binary_classify.sh
#!/bin/bash
cd $1 #go to weka dir
export CLASSPATH=$CLASSPATH:`pwd`/weka.jar:`pwd`/libsvm.jar #update CLASSPATH
java weka.core.converters.TextDirectoryLoader -dir $2 > $2.arff #convert txt file to arff file in weka (need to check the structure of this dir)
java -Xmx1024m weka.filters.unsupervised.attribute.StringToWordVector -i $2.arff -o $2_training.arff -M 2 #convert the .arff file from the previous step to a word vector
java -Xmx1024m weka.classifiers.meta.ClassificationViaRegression -W weka.classifiers.trees.M5P -num-decimal-places 4 -t $2_training.arff -d $2_training.model -c 1 #run ClassificationViaRegression classifier for this sample dataset (can try other classifier)
echo "done"
-------------9.TRAINING FOR 2 CASES------------------------------------------------------
-------Training files that contain both the Amazon review_body and tweets---------------
-----------------------------------------------------------------------------------------
luong@f6linux12:~$ ./text_binary_classify.sh ~/weka-3-8-5 ~/weka-3-8-5/BODYandTWEET
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by weka.core.WekaPackageClassLoaderManager (file:/home/luong/weka-3-8-5/weka.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of weka.core.WekaPackageClassLoaderManager
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by weka.core.WekaPackageClassLoaderManager (file:/home/luong/weka-3-8-5/weka.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of weka.core.WekaPackageClassLoaderManager
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Dec 01, 2021 5:52:16 AM com.github.fommil.netlib.BLAS <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
Dec 01, 2021 5:52:16 AM com.github.fommil.netlib.BLAS <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
Dec 01, 2021 5:52:16 AM com.github.fommil.netlib.LAPACK <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
Dec 01, 2021 5:52:16 AM com.github.fommil.netlib.LAPACK <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK
Options: -W weka.classifiers.trees.M5P -num-decimal-places 4
=== Classifier model (full training set) ===
Classification via Regression
Classifier for class with index 0:
M5 pruned model tree:
(using smoothed linear models)
LM1 (200/13.884%)
LM num: 1
@@class@@ =
-0.9969 * I
+ 0.0344 * book
+ 0.9638
Number of Rules : 1
Classifier for class with index 1:
M5 pruned model tree:
(using smoothed linear models)
LM1 (200/13.884%)
LM num: 1
@@class@@ =
0.9969 * I
- 0.0344 * book
+ 0.0362
Number of Rules : 1
Time taken to build model: 1.08 seconds
Time taken to test model on training data: 0.05 seconds
=== Error on training data ===
Correctly Classified Instances 199 99.5 %
Incorrectly Classified Instances 1 0.5 %
Kappa statistic 0.99
Mean absolute error 0.0103
Root mean squared error 0.0693
Relative absolute error 2.0599 %
Root relative squared error 13.8526 %
Total Number of Instances 200
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
1.000 0.010 0.990 1.000 0.995 0.990 0.999 0.998 REVIEWS.UNHELPFUL
0.990 0.000 1.000 0.990 0.995 0.990 0.999 0.998 REVIEWS.HELPFUL
Weighted Avg. 0.995 0.005 0.995 0.995 0.995 0.990 0.999 0.998
=== Confusion Matrix ===
a b <-- classified as
100 0 | a = REVIEWS.UNHELPFUL
1 99 | b = REVIEWS.HELPFUL
Time taken to perform cross-validation: 4.08 seconds
=== Stratified cross-validation ===
Correctly Classified Instances 199 99.5 %
Incorrectly Classified Instances 1 0.5 %
Kappa statistic 0.99
Mean absolute error 0.0116
Root mean squared error 0.0723
Relative absolute error 2.3221 %
Root relative squared error 14.4682 %
Total Number of Instances 200
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
1.000 0.010 0.990 1.000 0.995 0.990 0.991 0.978 REVIEWS.UNHELPFUL
0.990 0.000 1.000 0.990 0.995 0.990 0.991 0.995 REVIEWS.HELPFUL
Weighted Avg. 0.995 0.005 0.995 0.995 0.995 0.990 0.991 0.986
=== Confusion Matrix ===
a b <-- classified as
100 0 | a = REVIEWS.UNHELPFUL
1 99 | b = REVIEWS.HELPFUL
done
-------Training files that contain only the Amazon review_body text---------------
----------------------------------------------------------------------------------
luong@f6linux12:~$ ./text_binary_classify.sh ~/weka-3-8-5 ~/weka-3-8-5/a5_bodyonly_test
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by weka.core.WekaPackageClassLoaderManager (file:/home/luong/weka-3-8-5/weka.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of weka.core.WekaPackageClassLoaderManager
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Dec 01, 2021 5:52:59 AM com.github.fommil.netlib.BLAS <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
Dec 01, 2021 5:52:59 AM com.github.fommil.netlib.BLAS <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
Dec 01, 2021 5:52:59 AM com.github.fommil.netlib.LAPACK <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
Dec 01, 2021 5:52:59 AM com.github.fommil.netlib.LAPACK <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK
Options: -W weka.classifiers.trees.M5P -num-decimal-places 4
=== Classifier model (full training set) ===
Classification via Regression
Classifier for class with index 0:
M5 pruned model tree:
(using smoothed linear models)
LM1 (200/13.358%)
LM num: 1
@@class@@ =
-1.0082 * //app
+ 0.0793 * experience
+ 0.0367 * thi
+ 0.8922
Number of Rules : 1
Classifier for class with index 1:
M5 pruned model tree:
(using smoothed linear models)
LM1 (200/13.358%)
LM num: 1
@@class@@ =
1.0082 * //app
- 0.0793 * experience
- 0.0367 * thi
+ 0.1078
Number of Rules : 1
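Classification via Regression fits one regression model per class (here M5P, which in both cases collapsed to a single linear model) and predicts the class whose model produces the largest output. A minimal sketch of that decision rule, with the coefficients copied from the output above and assuming the attributes ("//app", "experience", "thi" -- stemmed tokens) are word-frequency features produced by a StringToWordVector-style filter:

```python
# Smoothed linear models printed by M5P above, one per class.
# Feature values are assumed to be word presence/frequency scores.
def score_unhelpful(f):
    return -1.0082 * f["//app"] + 0.0793 * f["experience"] + 0.0367 * f["thi"] + 0.8922

def score_helpful(f):
    return 1.0082 * f["//app"] - 0.0793 * f["experience"] - 0.0367 * f["thi"] + 0.1078

def classify(features):
    # Classification via Regression picks the class whose regression
    # model yields the highest output for this instance.
    scores = {"REVIEWS.UNHELPFUL": score_unhelpful(features),
              "REVIEWS.HELPFUL": score_helpful(features)}
    return max(scores, key=scores.get)

# A review containing the "//app" token is pushed toward HELPFUL
# (scores 1.116 vs -0.116); without it, the intercepts dominate
# and the instance defaults to UNHELPFUL (0.8922 vs 0.1078).
print(classify({"//app": 1.0, "experience": 0.0, "thi": 0.0}))
print(classify({"//app": 0.0, "experience": 0.0, "thi": 0.0}))
```

Note how heavily both models lean on the single "//app" attribute; the other two terms only nudge the score.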
Time taken to build model: 1.23 seconds
Time taken to test model on training data: 0.08 seconds
=== Error on training data ===
Correctly Classified Instances 199 99.5 %
Incorrectly Classified Instances 1 0.5 %
Kappa statistic 0.99
Mean absolute error 0.0118
Root mean squared error 0.0666
Relative absolute error 2.3533 %
Root relative squared error 13.3231 %
Total Number of Instances 200
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
1.000 0.010 0.990 1.000 0.995 0.990 1.000 1.000 REVIEWS.UNHELPFUL
0.990 0.000 1.000 0.990 0.995 0.990 1.000 1.000 REVIEWS.HELPFUL
Weighted Avg. 0.995 0.005 0.995 0.995 0.995 0.990 1.000 1.000
=== Confusion Matrix ===
a b <-- classified as
100 0 | a = REVIEWS.UNHELPFUL
1 99 | b = REVIEWS.HELPFUL
Time taken to perform cross-validation: 5.2 seconds
=== Stratified cross-validation ===
Correctly Classified Instances 199 99.5 %
Incorrectly Classified Instances 1 0.5 %
Kappa statistic 0.99
Mean absolute error 0.0133
Root mean squared error 0.075
Relative absolute error 2.652 %
Root relative squared error 15.0086 %
Total Number of Instances 200
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
1.000 0.010 0.990 1.000 0.995 0.990 0.997 0.995 REVIEWS.UNHELPFUL
0.990 0.000 1.000 0.990 0.995 0.990 0.997 0.997 REVIEWS.HELPFUL
Weighted Avg. 0.995 0.005 0.995 0.995 0.995 0.990 0.997 0.996
=== Confusion Matrix ===
a b <-- classified as
100 0 | a = REVIEWS.UNHELPFUL
1 99 | b = REVIEWS.HELPFUL
done
-------------10--------------------------
- Although the overall accuracy from both runs is the same (which was not expected), the "Error on training data" and "Stratified cross-validation" statistics listed above did change. Thus, there are some differences in the training statistics between using just the Amazon review_body text and combining the review_body text with the Twitter tweets.
- Combining the review_body text with the Twitter tweets gives slightly better results (lower mean absolute and root mean squared errors in cross-validation).
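The comparison above can be checked against the stratified cross-validation statistics reported by the two runs (values copied verbatim from the transcripts; "better" here means lower cross-validation error, since accuracy is 99.5 % in both):

```python
# Stratified-CV error statistics copied from the two runs above.
body_plus_tweets = {"MAE": 0.0116, "RMSE": 0.0723}   # review_body + tweets
body_only        = {"MAE": 0.0133, "RMSE": 0.0750}   # review_body only

for metric in ("MAE", "RMSE"):
    # The combined training text yields lower error on both measures.
    assert body_plus_tweets[metric] < body_only[metric]
print("combined review_body + tweets has lower CV error")
```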
luong@f6linux12:~$ exit
Script done on 2021-12-01 05:53:26+00:00 [COMMAND_EXIT_CODE="0"]