Error: Not an arrow file #30

Open
kdbchau opened this issue Nov 30, 2022 · 5 comments

kdbchau commented Nov 30, 2022

I'm having issues downloading and reading the Drosophila feather file. I have no idea what the error message means, and when I look it up I can't find any answers online.

My code and the error message:

library(RcisTarget)
library(curl)  # provides curl_fetch_memory()
# Load transcriptome DEGs and use the row names as the gene name list
gene <- read.csv("DEG_data.csv", check.names=FALSE, header=TRUE, row.names=1)
geneSet <- rownames(gene)
class(geneSet) # it is a character

# Load databases (1) gene-motif rankings and (2) annotation of motifs to TFs
feather_database_url='https://resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6/flybase_r6.02/mc8nr/gene_based/dm6-5kb-upstream-full-tx-11species.mc8nr.genes_vs_motifs.rankings.feather'
curl_fetch_memory(feather_database_url)

$status_code
[1] 200

$type
[1] NA

$headers
  [1] 48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d 0a 44 61 74 65 3a 20 57 65 64 2c 20 33 30 20 4e 6f 76 20 32 30
 [38] 32 32 20 31 38 3a 33 38 3a 30 31 20 47 4d 54 0d 0a 53 65 72 76 65 72 3a 20 41 70 61 63 68 65 0d 0a 72 65 66 65
 [75] 72 72 65 72 2d 70 6f 6c 69 63 79 3a 20 73 74 72 69 63 74 2d 6f 72 69 67 69 6e 0d 0a 78 2d 63 6f 6e 74 65 6e 74
[112] 2d 74 79 70 65 2d 6f 70 74 69 6f 6e 73 3a 20 6e 6f 73 6e 69 66 66 0d 0a 78 2d 66 72 61 6d 65 2d 6f 70 74 69 6f
[149] 6e 73 3a 20 73 61 6d 65 6f 72 69 67 69 6e 0d 0a 4c 61 73 74 2d 4d 6f 64 69 66 69 65 64 3a 20 57 65 64 2c 20 32
[186] 30 20 4a 75 6c 20 32 30 32 32 20 30 38 3a 33 36 3a 34 37 20 47 4d 54 0d 0a 45 54 61 67 3a 20 22 32 65 39 65 30
[223] 35 33 32 2d 35 65 34 33 38 38 30 36 35 61 35 63 30 22 0d 0a 41 63 63 65 70 74 2d 52 61 6e 67 65 73 3a 20 62 79
[260] 74 65 73 0d 0a 43 6f 6e 74 65 6e 74 2d 4c 65 6e 67 74 68 3a 20 37 38 32 31 30 37 39 35 34 0d 0a 56 69 61 3a 20
[297] 31 2e 31 20 72 65 73 6f 75 72 63 65 73 2e 61 65 72 74 73 6c 61 62 2e 6f 72 67 0d 0a 58 2d 52 50 2d 48 6f 73 74
[334] 3a 20 74 69 65 72 33 2d 70 2d 72 65 76 70 72 6f 78 2d 32 0d 0a 78 2d 78 73 73 2d 70 72 6f 74 65 63 74 69 6f 6e
[371] 3a 20 31 3b 20 6d 6f 64 65 3d 62 6c 6f 63 6b 0d 0a 53 65 74 2d 43 6f 6f 6b 69 65 3a 20 4e 53 43 5f 4a 4f 6a 64
[408] 69 32 6b 63 64 34 6c 77 7a 65 61 64 30 67 71 76 67 33 63 7a 64 72 70 7a 69 65 54 3d 35 63 63 62 61 33 64 38 38
[445] 30 37 35 66 62 39 62 63 34 62 61 30 35 35 39 64 34 62 30 35 36 39 36 66 30 32 63 64 38 35 61 66 38 63 63 66 36
[482] 61 38 39 39 62 36 62 66 34 64 33 33 66 32 35 34 31 37 63 32 35 33 36 37 63 34 3b 65 78 70 69 72 65 73 3d 54 68
[519] 75 2c 20 30 31 2d 44 65 63 2d 32 30 32 32 20 30 32 3a 33 38 3a 30 31 20 47 4d 54 3b 70 61 74 68 3d 2f 3b 73 65
[556] 63 75 72 65 3b 68 74 74 70 6f 6e 6c 79 0d 0a 0d 0a

$modified
[1] "2022-07-20 04:36:47 EDT"

$times
     redirect    namelookup       connect   pretransfer starttransfer         total 
     0.000000      0.000984      0.105118      0.330360      0.468158    630.694841 

$content
   [1] 41 52 52 4f 57 31 00 00 ff ff ff ff 00 c6 0d 00 10 00 00 00 00 00 0a 00 0c 00 06 00 05 00 08 00 0a 00 00 00 00
  [38] 01 04 00 0c 00 00 00 08 00 08 00 00 00 04 00 08 00 00 00 04 00 00 00 4f 43 00 00 98 c5 0d 00 4c c5 0d 00 14 c5
  [75] 0d 00 e0 c4 0d 00 a0 c4 0d 00 60 c4 0d 00 30 c4 0d 00 fc c3 0d 00 bc c3 0d 00 7c c3 0d 00 3c c3 0d 00 fc c2 0d
 [112] 00 bc c2 0d 00 7c c2 0d 00 3c c2 0d 00 fc c1 0d 00 bc c1 0d 00 80 c1 0d 00 4c c1 0d 00 1c c1 0d 00 e8 c0 0d 00
 [149] b4 c0 0d 00 80 c0 0d 00 4c c0 0d 00 18 c0 0d 00 e4 bf 0d 00 b0 bf 0d 00 6c bf 0d 00 28 bf 0d 00 e4 be 0d 00 a4
 [186] be 0d 00 70 be 0d 00 30 be 0d 00 f0 bd 0d 00 b0 bd 0d 00 70 bd 0d 00 34 bd 0d 00 f8 bc 0d 00 bc bc 0d 00 80 bc
 [223] 0d 00 44 bc 0d 00 08 bc 0d 00 cc bb 0d 00 90 bb 0d 00 54 bb 0d 00 18 bb 0d 00 dc ba 0d 00 a0 ba 0d 00 64 ba 0d
 [260] 00 28 ba 0d 00 ec b9 0d 00 b0 b9 0d 00 74 b9 0d 00 38 b9 0d 00 fc b8 0d 00 c0 b8 0d 00 84 b8 0d 00 48 b8 0d 00
 [297] 0c b8 0d 00 d0 b7 0d 00 94 b7 0d 00 58 b7 0d 00 1c b7 0d 00 e0 b6 0d 00 a4 b6 0d 00 68 b6 0d 00 2c b6 0d 00 f0
 [334] b5 0d 00 b4 b5 0d 00 78 b5 0d 00 3c b5 0d 00 00 b5 0d 00 c4 b4 0d 00 88 b4 0d 00 4c b4 0d 00 10 b4 0d 00 d4 b3
 [371] 0d 00 98 b3 0d 00 5c b3 0d 00 20 b3 0d 00 e4 b2 0d 00 a8 b2 0d 00 6c b2 0d 00 30 b2 0d 00 f4 b1 0d 00 b8 b1 0d
 [408] 00 7c b1 0d 00 40 b1 0d 00 04 b1 0d 00 c8 b0 0d 00 8c b0 0d 00 50 b0 0d 00 14 b0 0d 00 d8 af 0d 00 9c af 0d 00
 [445] 60 af 0d 00 24 af 0d 00 e8 ae 0d 00 ac ae 0d 00 70 ae 0d 00 34 ae 0d 00 f8 ad 0d 00 bc ad 0d 00 80 ad 0d 00 44
 [482] ad 0d 00 08 ad 0d 00 cc ac 0d 00 90 ac 0d 00 54 ac 0d 00 18 ac 0d 00 dc ab 0d 00 a0 ab 0d 00 64 ab 0d 00 28 ab
 [519] 0d 00 ec aa 0d 00 b0 aa 0d 00 74 aa 0d 00 38 aa 0d 00 fc a9 0d 00 c0 a9 0d 00 84 a9 0d 00 48 a9 0d 00 0c a9 0d
 [556] 00 d0 a8 0d 00 94 a8 0d 00 58 a8 0d 00 1c a8 0d 00 e0 a7 0d 00 a4 a7 0d 00 68 a7 0d 00 2c a7 0d 00 f0 a6 0d 00
 [593] c0 a6 0d 00 84 a6 0d 00 48 a6 0d 00 14 a6 0d 00 e4 a5 0d 00 b0 a5 0d 00 7c a5 0d 00 4c a5 0d 00 18 a5 0d 00 e4
 [630] a4 0d 00 b0 a4 0d 00 7c a4 0d 00 48 a4 0d 00 14 a4 0d 00 e0 a3 0d 00 ac a3 0d 00 78 a3 0d 00 44 a3 0d 00 10 a3
 [667] 0d 00 e0 a2 0d 00 ac a2 0d 00 78 a2 0d 00 44 a2 0d 00 10 a2 0d 00 d8 a1 0d 00 a4 a1 0d 00 70 a1 0d 00 3c a1 0d
 [704] 00 08 a1 0d 00 d4 a0 0d 00 9c a0 0d 00 64 a0 0d 00 30 a0 0d 00 f8 9f 0d 00 c0 9f 0d 00 8c 9f 0d 00 54 9f 0d 00
 [741] 20 9f 0d 00 ec 9e 0d 00 b8 9e 0d 00 84 9e 0d 00 54 9e 0d 00 24 9e 0d 00 f0 9d 0d 00 bc 9d 0d 00 88 9d 0d 00 54
 [778] 9d 0d 00 20 9d 0d 00 ec 9c 0d 00 b4 9c 0d 00 7c 9c 0d 00 48 9c 0d 00 14 9c 0d 00 e0 9b 0d 00 ac 9b 0d 00 74 9b
 [815] 0d 00 40 9b 0d 00 08 9b 0d 00 d0 9a 0d 00 98 9a 0d 00 5c 9a 0d 00 24 9a 0d 00 f0 99 0d 00 bc 99 0d 00 88 99 0d
 [852] 00 54 99 0d 00 24 99 0d 00 f4 98 0d 00 c0 98 0d 00 8c 98 0d 00 5c 98 0d 00 28 98 0d 00 f4 97 0d 00 c0 97 0d 00
 [889] 8c 97 0d 00 5c 97 0d 00 28 97 0d 00 f8 96 0d 00 c8 96 0d 00 90 96 0d 00 60 96 0d 00 2c 96 0d 00 f4 95 0d 00 bc
 [926] 95 0d 00 88 95 0d 00 54 95 0d 00 20 95 0d 00 ec 94 0d 00 b8 94 0d 00 84 94 0d 00 50 94 0d 00 1c 94 0d 00 e4 93
 [963] 0d 00 ac 93 0d 00 74 93 0d 00 40 93 0d 00 0c 93 0d 00 d8 92 0d 00 a4 92 0d 00 70 92 0d 00 3c 92 0d 00 08 92 0d
[1000] 00
 [ reached getOption("max.print") -- omitted 782106954 entries ]


motifRankings <- importRankings("dm6-5kb-upstream-full-tx-11species.mc8nr.genes_vs_motifs.rankings.feather")

Error: Invalid: Not an Arrow file

That's it...that one line is the whole error and I don't know what an arrow file is?
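A note on the output above: the `$content` dump starts with the bytes `41 52 52 4f 57 31`, which is ASCII for `ARROW1`, the magic string that opens an Arrow IPC file (the format also known as Feather V2). `curl_fetch_memory()` only holds those bytes in memory and does not write a file, so `importRankings()` is reading whatever file of that name is already on disk. A minimal base-R sketch for checking what that on-disk file starts and ends with might look like this; `file_magic` and `ends_with_arrow_magic` are hypothetical helper names, not part of RcisTarget or arrow.

```r
# Hypothetical helpers (not part of RcisTarget or arrow).

# Report which known magic string, if any, a file starts with.
file_magic <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  head_bytes <- readBin(con, what = "raw", n = 6L)
  if (length(head_bytes) >= 6L && identical(head_bytes, charToRaw("ARROW1"))) {
    return("ARROW1")  # Arrow IPC file format, also known as Feather V2
  }
  if (length(head_bytes) >= 4L && identical(head_bytes[1:4], charToRaw("FEA1"))) {
    return("FEA1")    # legacy Feather V1
  }
  "unknown"
}

# An Arrow IPC file also ends with "ARROW1"; a truncated download would
# normally be missing this trailing marker.
ends_with_arrow_magic <- function(path) {
  size <- file.size(path)
  if (is.na(size) || size < 6) return(FALSE)
  con <- file(path, open = "rb")
  on.exit(close(con))
  seek(con, where = size - 6, origin = "start")
  identical(readBin(con, what = "raw", n = 6L), charToRaw("ARROW1"))
}
```

Calling both helpers on the `.feather` file in the working directory would show whether the file is recognisable as Arrow at all and whether its tail is intact.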

Author

kdbchau commented Nov 30, 2022

Update - okay, so I have changed my code a bit and instead used the following to download the file I needed. But now I get a new error with importRankings():

getOption('timeout')
options(timeout=3600)
download.file(feather_database_url, destfile=basename(feather_database_url)) # this took about 10 mins for me
motifRankings <- importRankings("dm6-5kb-upstream-full-tx-11species.mc8nr.genes_vs_motifs.rankings.feather")

Error: IOError: Verification of flatbuffer-encoded Footer failed.

Something is wrong with the way importRankings is opening the downloaded feather file. It gives an error no matter what method I use to download the file.
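One thing that may be worth ruling out, as an assumption that nothing in this thread confirms: on Windows, `download.file()` writes in text mode unless `mode = "wb"` is given (or the URL ends in one of a few extensions it treats as binary, and `.feather` is not one of them), and text mode can alter a binary file. The response headers in the first post decode to `Content-Length: 782107954`, so the size on disk can be compared against that number. The wrapper below is a sketch with a hypothetical name, `download_binary`.

```r
# Hypothetical wrapper around download.file() that forces a binary transfer
# and then checks the size on disk against an expected byte count.
download_binary <- function(url, destfile = basename(url),
                            expected_bytes = NA_real_, timeout_sec = 3600) {
  old <- options(timeout = timeout_sec)  # raise the timeout for large files
  on.exit(options(old))                  # restore the previous setting
  status <- download.file(url, destfile = destfile, mode = "wb")
  if (status != 0) stop("download.file() returned status ", status)
  actual <- file.size(destfile)
  if (!is.na(expected_bytes) && actual != expected_bytes) {
    stop("size mismatch: got ", actual, " bytes, expected ", expected_bytes)
  }
  invisible(actual)
}

# Usage with the values from this thread (commented out: roughly 780 MB):
# download_binary(feather_database_url, expected_bytes = 782107954)
```

A matching size does not prove the content is intact, so a checksum comparison is still the stronger test.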

Member

s-aibar commented Dec 1, 2022

Dear @kdbchau,

I don't recognize the error message, but the most likely reason is a failure on the download. The files are quite big, and the attempt to download them directly in R usually results in incomplete files.

Please confirm whether the download was successful by comparing the shasums (they are available in the same location as the database file: https://resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6/flybase_r6.02/mc8nr/gene_based/dm6-5kb-upstream-full-tx-11species.mc8nr.genes_vs_motifs.rankings.feather.sha1sum.txt ).

If the download is incomplete, we suggest trying zsync (there is some example code and instructions in the header of the download page: https://resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6/flybase_r6.02/mc8nr/gene_based/).

I hope this helps,
Sara
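The checksum comparison suggested above can be sketched in R. This assumes the `digest` package is installed and that the `.sha1sum.txt` file uses the usual `sha1sum` layout (hex digest, whitespace, file name); neither is confirmed in this thread, and the helper names are hypothetical.

```r
# Hypothetical helpers; assumes the 'digest' package is installed.

# Extract the hex digest from a line in the usual sha1sum layout:
# "<40 hex chars>  <file name>" (layout assumed, not confirmed in this thread).
parse_sha1sum_line <- function(line) {
  tolower(strsplit(trimws(line), "[[:space:]]+")[[1]][1])
}

# TRUE when the SHA-1 of the file on disk equals the expected hex digest.
sha1_matches <- function(path, expected_sha1) {
  actual <- digest::digest(path, algo = "sha1", file = TRUE)
  identical(actual, tolower(expected_sha1))
}

# Usage sketch with the URL from the post above (commented out: needs network
# access and the downloaded database file in the working directory):
# sha1_url <- "https://resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6/flybase_r6.02/mc8nr/gene_based/dm6-5kb-upstream-full-tx-11species.mc8nr.genes_vs_motifs.rankings.feather.sha1sum.txt"
# expected <- parse_sha1sum_line(readLines(sha1_url, n = 1))
# sha1_matches("dm6-5kb-upstream-full-tx-11species.mc8nr.genes_vs_motifs.rankings.feather", expected)
```

The checksum file is published on the server and is not produced by the download itself, so it has to be fetched separately as in the commented usage.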





We normally recommend to use zsync (you can see the exact code here), but any other comm

@s-aibar s-aibar added the solved? label Dec 1, 2022
Author

kdbchau commented Dec 1, 2022

Do incomplete downloads always produce a shasum file? I see the file in the location I downloaded it to, but no shasum files. The reason I had to change directory was that I could not find any access to a database folder.

Author

kdbchau commented Dec 1, 2022

So I used zsync_curl and after a while I still got this error trying to use importRankings:

# On my terminal
$ zsync_curl https://resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6/flybase_r6.02/mc8nr/gene_based/dm6-5kb-upstream-full-tx-11species.mc8nr.genes_vs_motifs.rankings.feather.zsync
.
.
.
#################### 100.0% 1560260.6 kBps DONE

verifying download...checksum matches OK
used 1972224 local, fetched 780134400

Seems like everything downloaded fine.

But now I have two database files in my directory, one ending with .feather (from my original code, it is 763,776 KB) and the one from zsync which for some reason ends with .feather.zs-old (it is 766,015 KB). Do these sizes make sense?

When I try importRankings on either file, the .feather file gives me the original Error: Invalid: Not an Arrow file. If I try it with the .zs-old file, it gives me this error: Error in getColumnNames(dbFile) : object 'ret' not found

Should the zsync downloaded file have the .zs-old ending, or did I somehow mess this up and download the incorrect file by chance?
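The byte counts zsync printed can be set against the two file sizes quoted above. The following is arithmetic only, assuming the file manager's "KB" means 1024 bytes. As a further hedge: zsync's documented behaviour is to move a pre-existing target file aside with the `.zs-old` suffix and write the finished download under the original name, so if that applies here, `.zs-old` would be the earlier download and `.feather` the zsync result.

```r
# Numbers copied from the zsync output and the file sizes in the post above.
kb            <- 1024        # assumption: "KB" means 1024 bytes
zsync_local   <- 1972224     # "used 1972224 local"
zsync_fetched <- 780134400   # "fetched 780134400"
feather_kb    <- 763776      # size reported for the .feather file
zs_old_kb     <- 766015      # size reported for the .feather.zs-old file

zsync_total    <- zsync_local + zsync_fetched   # 782106624 bytes
zsync_total_kb <- zsync_total / kb              # 763776 KB

# The zsync total equals the size reported for .feather, not .zs-old.
zsync_total_kb == feather_kb                    # TRUE

# How many bytes larger the .zs-old file is than the zsync total.
extra_bytes <- (zs_old_kb - feather_kb) * kb    # 2292736 bytes
```

If this reading is right, the checksum-verified file is the one named `.feather`, and the error from that file would need a cause other than an incomplete download.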

Author

kdbchau commented Dec 2, 2022

I have another question - how do I even download the sha256sum.txt for the database I downloaded? The link you provide is what would be the shasum textfile for the database for me to compare with - but I redownloaded the database on a different cluster a few times with no sha256sum.txt file produced anywhere.

Once again, on a different cluster with the latest RcisTarget installed and the feather file downloaded (746kb; I downloaded it two ways and the size was identical), the error persists:


> library(RcisTarget)
> motifRankings <- importRankings("dm6-5kb-upstream-full-tx-11species.mc8nr.genes_vs_motifs.rankings.feather")
Error in openFeather(path) : Invalid: Not a feather file
