More typos
Signed-off-by: Maroun Touma <[email protected]>
touma-I committed Nov 15, 2024
1 parent ef7c57d commit 8c55ad8
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions transforms/universal/web2parquet/README.md
@@ -11,15 +11,15 @@ For configuring the crawl, users need to identify the follow parameters:

| parameter:type | Description |
| --- | --- |
-| urls:list | list of seeds URL (i.e. ['https://thealliance.ai'] or ['https://www.apache.org/projects','https://www.apache.org/foundation']). The list can include any number of valid urls that are not configured to block web crawlers |
+| urls:list | list of seeds URL (i.e., ['https://thealliance.ai'] or ['https://www.apache.org/projects','https://www.apache.org/foundation']). The list can include any number of valid urls that are not configured to block web crawlers |
|depth:int | control crawling depth |
-| downloads:int | number of downloads that are stored to the download folder. Since the crawler operations happen asyncrhonous, the process can result in any 10 of the visited URLs being retrieved (i.e. consecutive runs can result in different files being downloaded) |
-| folder:str | folder where downloaded files are stored. If the folder is not empty, new files are added or replace existing ones with the same URLs |
+| downloads:int | number of downloads that are stored to the download folder. Since the crawler operations happen asynchronously, the process can result in any 10 of the visited URLs being retrieved (i.e. consecutive runs can result in different files being downloaded) |
+| folder:str | folder where downloaded files are stored. If the folder is not empty, new files are added or replace the existing ones with the same URLs |

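For illustration, the four parameters in the table above could be collected and sanity-checked before a crawl starts. This is a minimal sketch only: `CrawlConfig` and its validation rules (for example, that `depth` must be non-negative) are assumptions made here and are not part of the transform's API.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class CrawlConfig:
    """Hypothetical container for the crawl parameters (not the transform's API)."""

    urls: list      # seed URLs
    depth: int      # crawling depth
    downloads: int  # number of downloads stored to the download folder
    folder: str     # folder where downloaded files are stored

    def __post_init__(self):
        if not self.urls:
            raise ValueError("urls must contain at least one seed URL")
        for url in self.urls:
            parts = urlparse(url)
            # Assumption: only absolute http(s) URLs are useful as seeds.
            if parts.scheme not in ("http", "https") or not parts.netloc:
                raise ValueError(f"not a valid seed URL: {url!r}")
        # Assumption: a negative depth or zero downloads is a mistake.
        if self.depth < 0 or self.downloads < 1:
            raise ValueError("depth must be >= 0 and downloads must be >= 1")


config = CrawlConfig(
    urls=["https://thealliance.ai"],
    depth=2,
    downloads=10,
    folder="downloads",
)
print(config)
```

Checking the seeds up front catches a malformed URL before any network activity, which is easier to debug than a failure partway through an asynchronous crawl.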

## Invoking the transform from a notebook

-In order to invoke the transfrom from the notebook, users must enable nested asynchronous io as follow:
+In order to invoke the transfrom from a notebook, users must enable nested asynchronous io as follows:
import nest_asyncio
nest_asyncio.apply()

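As background on why nested asynchronous io has to be enabled, the sketch below uses only the standard library to reproduce the situation a notebook creates: code that is already running inside an event loop tries to start another one. The names `fetch_stub` and `notebook_cell` are hypothetical stand-ins, not part of the transform; the crawler's real internals are not shown here.

```python
import asyncio


async def fetch_stub():
    # Hypothetical stand-in for asynchronous crawler work.
    await asyncio.sleep(0)
    return "done"


async def notebook_cell():
    # A notebook cell executes while an event loop is already running.
    # Starting a second loop from inside it is rejected by asyncio.
    coro = fetch_stub()
    try:
        asyncio.run(coro)
    except RuntimeError as err:
        coro.close()  # the coroutine never started; close it to avoid a warning
        return str(err)
    return "no error"


message = asyncio.run(notebook_cell())
print(message)
```

The `nest_asyncio.apply()` call shown in the README patches asyncio so that this kind of nested use is permitted, which is why it has to run before the transform is invoked from a notebook.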