Bugs of running own codes for imputation #8

HelloWorldLTY · 2024-03-09T23:45:53Z

Hi, I tried to impute my own spatial dataset (mouse) by following the imputation tutorial. However, imputation fails with this error:

ValueError: None of AnnData.var.index found in pre-trained gene set. In case the input gene names are gene symbols, please enable `ensembl_auto_conversion`, or manually convert gene symbols to ensembl ids in the input dataset.

I checked that my dataset is indexed by gene names (the gene names are all upper-case because I tried to use orthologous genes).


wehos · 2024-03-11T00:07:14Z

Sorry for the inconvenience. Our method uses Ensembl IDs as the gene index. We provide an automatic method, based on mygene, to map gene names to Ensembl IDs here.
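For illustration only, and not the helper referred to above: a minimal sketch of a symbol-to-Ensembl mapping built directly on the mygene client. The function names `parse_hits` and `symbols_to_ensembl` are hypothetical, and `species="human"` is an assumption that fits upper-cased ortholog symbols. Symbols that mygene cannot resolve are left out of the mapping rather than given a placeholder ID.

```python
def parse_hits(hits):
    """Build {symbol: ensembl_gene_id} from mygene-style query hits.

    Hits flagged as not found, or lacking an Ensembl gene ID, are skipped.
    When a symbol has several candidate IDs, the first one seen is kept.
    """
    mapping = {}
    for hit in hits:
        if hit.get("notfound"):
            continue
        ens = hit.get("ensembl")
        # mygene returns a dict for one Ensembl entry and a list for several.
        if isinstance(ens, list):
            ens = ens[0] if ens else None
        if isinstance(ens, dict) and "gene" in ens:
            mapping.setdefault(hit["query"], ens["gene"])
    return mapping


def symbols_to_ensembl(symbols, species="human"):
    """Query mygene for Ensembl gene IDs (needs the mygene package and network access)."""
    import mygene  # imported lazily so parse_hits can be used without it

    client = mygene.MyGeneInfo()
    hits = client.querymany(
        list(symbols), scopes="symbol", fields="ensembl.gene", species=species
    )
    return parse_hits(hits)
```

Genes absent from the returned mapping can then be dropped from the dataset before the index is replaced.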

HelloWorldLTY · 2024-03-12T14:27:46Z

Hi, thanks. After converting the data with this method, I hit a new error in this call:

pipeline.fit(train_data, # An AnnData object
            pipeline_config, # The config dictionary we created previously, optional
            split_field = 'split', #  Specify a column in .obs that contains split information
            train_split = 'train',
            valid_split = 'valid',
            batch_gene_list = batch_gene_list, # Specify genes that are measured in each batch, see previous section for more details
            device = DEVICE,
            )

     43 g2id = dict(zip(self.gene_list, list(range(len(self.gene_list)))))
     44 for batch in batch_gene_list:
---> 45     idx = torch.LongTensor([g2id[g] for g in batch_gene_list[batch]])
     46     self.batch_gene_mask[batch] = torch.zeros(len(g2id)).bool()
     47     self.batch_gene_mask[batch][idx] = True

KeyError: '0'

I think the reason is that, after converting the gene names, some strange gene IDs appear:

  'ENSG00000137547',
  'ENSG00000120992',
  'ENSG00000187735',
  'ENSG00000047249',
  'ENSG00000023287',
  '0',
  'ENSG00000168300',
  '0-1',
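
One way to confirm this before calling `pipeline.fit` is to list, per batch, the IDs that the model's gene list does not contain. This is a minimal sketch based only on the traceback above, where `batch_gene_list` maps each batch to its gene IDs and the lookup table is built from the model's gene list. The helper name `missing_genes_per_batch` is hypothetical, and how to obtain the model's gene list depends on the CellPLM version.

```python
def missing_genes_per_batch(batch_gene_list, gene_list):
    """Return {batch: [gene IDs not present in gene_list]} for batches with misses."""
    known = set(gene_list)
    report = {}
    for batch, genes in batch_gene_list.items():
        missing = [g for g in genes if g not in known]
        if missing:
            report[batch] = missing
    return report


# Toy data mirroring the IDs above; real inputs come from the dataset and model.
gene_list = ["ENSG00000137547", "ENSG00000120992", "ENSG00000168300"]
batch_gene_list = {"batch0": ["ENSG00000137547", "0", "ENSG00000168300", "0-1"]}
print(missing_genes_per_batch(batch_gene_list, gene_list))
```

Any ID reported here would raise the same `KeyError` in the `g2id[g]` lookup.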

wehos · 2024-03-12T15:53:04Z

This is generally the same issue as here. Did you follow the tutorial? It should automatically remove gene IDs that are not in the pretrained gene list.

HelloWorldLTY · 2024-03-13T04:10:20Z

Yes, I followed the tutorial but used my own dataset. The dataset I used is from Tangram: https://github.com/broadinstitute/Tangram/blob/master/tutorial_tangram_with_squidpy.ipynb

I will try removing all the genes whose ID is '0' or of the form '0-1' and then try again 🤔
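
A rough sketch of that filtering step, assuming valid IDs follow the usual Ensembl gene pattern ("ENS", an optional species prefix, "G", then 11 digits) and carry no version suffix. The helper names are hypothetical and this is not part of CellPLM.

```python
import re

# "ENS" + optional species prefix (e.g. "MUS") + "G" + 11 digits.
ENSEMBL_GENE_ID = re.compile(r"ENS[A-Z]*G\d{11}")


def is_ensembl_gene_id(gene_id):
    """True if gene_id looks like an unversioned Ensembl gene ID."""
    return ENSEMBL_GENE_ID.fullmatch(str(gene_id)) is not None


def filter_batch_gene_list(batch_gene_list):
    """Drop placeholder IDs such as '0' or '0-1' from every batch."""
    return {
        batch: [g for g in genes if is_ensembl_gene_id(g)]
        for batch, genes in batch_gene_list.items()
    }


# Boolean mask over the converted gene index (toy values taken from the list above).
var_names = ["ENSG00000137547", "0", "ENSG00000168300", "0-1"]
keep = [is_ensembl_gene_id(g) for g in var_names]
```

With an AnnData object, a mask like `keep` could be used to subset the variables (for example `adata[:, keep].copy()`), and `batch_gene_list` should be filtered the same way so that both stay consistent.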

wehos · 2024-03-28T20:24:09Z

Hello, I have updated the code so that it should now work more smoothly. If you previously installed CellPLM with pip, please run pip install -U cellplm to update it. Thanks!
