plot_cluster_top_terms raises AttributeError on p.getA().flatten #12

Open

elena-sharova opened this issue Oct 26, 2018 · 1 comment

@elena-sharova
Hello,

Perhaps I am not feeding the data to the model in the right format, but when I call plot_cluster_top_terms, I get an AttributeError:

'numpy.ndarray' object has no attribute 'getA'

The model fits without any issues.
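For reference, .getA() is a numpy.matrix method that a plain numpy.ndarray does not have, which is exactly what the traceback says. A minimal illustration of that mismatch, using toy data rather than the coclust code path:

import numpy as np

m = np.asmatrix(np.eye(2))   # numpy.matrix has .getA()
print(m.getA().flatten())    # works

a = np.eye(2)                # plain ndarray does not
try:
    a.getA()
except AttributeError as e:
    print(e)                 # 'numpy.ndarray' object has no attribute 'getA'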

@Syncrossus
Contributor

Syncrossus commented Nov 16, 2018

Hello,
I can't seem to reproduce the issue. Could you show me a relevant snippet of your code?
Here's an example of my (functional) code for comparison:

from coclust.visualization import plot_cluster_top_terms
from coclust.coclustering import CoclustMod
from gensim.matutils import corpus2csc
from gensim.corpora import Dictionary

# loading the corpus
corpus = <load a corpus of sentences that can be iterated on multiple times>

# creating the standard Dictionary representation of the corpus and the doc-term matrix
dct = Dictionary(corpus)
bow_corpus = [dct.doc2bow(doc) for doc in corpus]
doc_term_mat = corpus2csc(bow_corpus).T

# the term list must follow the Dictionary's id order so that it lines up
# with the columns of doc_term_mat
vocab = [dct[i] for i in range(len(dct))]

model = CoclustMod(n_clusters=4)

# model.fit works fine on the scipy sparse matrix returned by corpus2csc
model.fit(doc_term_mat)
plot_cluster_top_terms(in_data=doc_term_mat,
                       all_terms=vocab,
                       nb_top_terms=5,
                       model=model)

If you can't get this sample to work, the most likely explanation is that your corpus doesn't meet some of the obscure requirements of model.fit. I actually can't get this to work with toy examples, but I have no issues with real corpora. The catch is that you typically can't hold a real corpus in memory, so you want to read it with a generator, but then you can't iterate over it multiple times. The best workaround I've found is to create a dedicated class that defines __iter__ (and __next__ if needed) to wrap a generator, as sketched below.
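A minimal version of that wrapper, assuming the corpus lives in a plain text file with one whitespace-tokenized document per line (the class name and file path are placeholders, not anything from coclust or gensim; only __iter__ is needed here because it hands back a fresh generator on every pass):

class ReiterableCorpus:
    """A corpus that can be iterated over any number of times."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # a new generator is created on each call, so Dictionary(corpus),
        # the doc2bow list comprehension and any later loop all see the
        # full corpus again
        with open(self.path) as f:
            for line in f:
                yield line.split()

# usage with the snippet above:
# corpus = ReiterableCorpus("my_corpus.txt")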
