Skip to content

Commit

Permalink
#1401 update readmes
Browse files Browse the repository at this point in the history
  • Loading branch information
mvolikas authored and rzo1 committed Nov 13, 2024
1 parent d8776b4 commit 54b7697
Show file tree
Hide file tree
Showing 2 changed files with 3 additions and 10 deletions.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ This will not only create a fully formed project containing a POM with the depen

Alternatively if you can't or don't want to use the Maven archetype above, you can simply copy the files from [archetype-resources](https://github.com/apache/incubator-stormcrawler/tree/master/archetype/src/main/resources/archetype-resources).

Have a look at the code of the [CrawlTopology class](https://github.com/apache/incubator-stormcrawler/blob/master/archetype/src/main/resources/archetype-resources/src/main/java/CrawlTopology.java), the [crawler-conf.yaml](https://github.com/apache/incubator-stormcrawler/blob/master/archetype/src/main/resources/archetype-resources/crawler-conf.yaml) file as well as the files in [src/main/resources/](https://github.com/apache/incubator-stormcrawler/tree/master/archetype/src/main/resources/archetype-resources/src/main/resources), they are all that is needed to run a crawl topology : all the other components come from the core module.
Have a look at [crawler.flux](https://github.com/apache/incubator-stormcrawler/blob/master/archetype/src/main/resources/archetype-resources/crawler.flux), the [crawler-conf.yaml](https://github.com/apache/incubator-stormcrawler/blob/master/archetype/src/main/resources/archetype-resources/crawler-conf.yaml) file as well as the files in [src/main/resources/](https://github.com/apache/incubator-stormcrawler/tree/master/archetype/src/main/resources/archetype-resources/src/main/resources), they are all that is needed to run a crawl topology : all the other components come from the core module.

## Getting help

Expand Down
11 changes: 2 additions & 9 deletions archetype/src/main/resources/archetype-resources/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,20 +34,13 @@ where _seeds.txt_ is a file containing URLs to inject, with one URL per line.

# Running the crawl

You can now submit the topology using the storm command:

``` sh
storm local target/${artifactId}-${version}.jar --local-ttl 60 ${package}.CrawlTopology -- -conf crawler-conf.yaml
```

This will run the topology in local mode for 60 seconds. Simply use the 'storm jar' to start the topology in distributed mode, where it will run indefinitely.

You can also use Flux to do the same:
You can now submit the flux topology using the storm command:

``` sh
storm local target/${artifactId}-${version}.jar org.apache.storm.flux.Flux crawler.flux --local-ttl 3600
```

Note that in local mode, Flux uses a default TTL for the topology of 20 secs. The command above runs the topology for 1 hour.

Alternatively, you can use `storm jar` to start the topology in distributed mode, where it will run indefinitely.
It is best to run the topology with `storm jar` to benefit from the Storm UI and logging. In that case, the topology runs continuously, as intended.

0 comments on commit 54b7697

Please sign in to comment.