plot_cluster_top_terms raises AttributeError on p.getA().flatten #12

Open

elena-sharova opened this issue Oct 26, 2018 · 1 comment

@elena-sharova
Hello,

Perhaps I am not feeding the data to the model in the right format, but when I call plot_cluster_top_terms, I get an AttributeError:

'numpy.ndarray' object has no attribute 'getA'

The model fits without any issues.
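For reference, .getA() is a numpy.matrix method that a plain numpy.ndarray does not have, which is exactly what the traceback says. A minimal illustration of that mismatch, using toy data rather than the coclust code path:

import numpy as np

m = np.asmatrix(np.eye(2))   # numpy.matrix has .getA()
print(m.getA().flatten())    # works

a = np.eye(2)                # plain ndarray does not
try:
    a.getA()
except AttributeError as e:
    print(e)                 # 'numpy.ndarray' object has no attribute 'getA'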

@Syncrossus
Contributor

Syncrossus commented Nov 16, 2018

Hello,
I can't seem to reproduce the issue. Could you show me a relevant snippet of your code?
Here's an example of my (functional) code for comparison:

from coclust.visualization import plot_cluster_top_terms
from coclust.coclustering import CoclustMod
from gensim.matutils import corpus2csc
from gensim.corpora import Dictionary

# loading the corpus
corpus = <load a corpus of sentences that can be iterated on multiple times>

# creating the standard Dictionary representation of the corpus and the doc-term matrix
dct = Dictionary(corpus)
bow_corpus = [dct.doc2bow(doc) for doc in corpus]
doc_term_mat = corpus2csc(bow_corpus).T

# the term list must follow the Dictionary's id order so that it lines up
# with the columns of doc_term_mat
vocab = [dct[i] for i in range(len(dct))]

model = CoclustMod(n_clusters=4)

# model.fit works fine on the scipy sparse matrix returned by corpus2csc
model.fit(doc_term_mat)
plot_cluster_top_terms(in_data=doc_term_mat,
                       all_terms=vocab,
                       nb_top_terms=5,
                       model=model)

If you can't get this sample to work, the most likely explanation is that your corpus doesn't meet some of the obscure requirements of model.fit. I actually can't get this to work with toy examples, but I have no issues with real corpora. The catch is that you typically can't hold a real corpus in memory, so you want to read it with a generator, but then you can't iterate over it multiple times. The best workaround I've found is to create a dedicated class that defines __iter__ (and __next__ if needed) to wrap a generator, as sketched below.
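A minimal version of that wrapper, assuming the corpus lives in a plain text file with one whitespace-tokenized document per line (the class name and file path are placeholders, not anything from coclust or gensim; only __iter__ is needed here because it hands back a fresh generator on every pass):

class ReiterableCorpus:
    """A corpus that can be iterated over any number of times."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # a new generator is created on each call, so Dictionary(corpus),
        # the doc2bow list comprehension and any later loop all see the
        # full corpus again
        with open(self.path) as f:
            for line in f:
                yield line.split()

# usage with the snippet above:
# corpus = ReiterableCorpus("my_corpus.txt")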
