diff --git a/transforms/universal/web2parquet/README.md b/transforms/universal/web2parquet/README.md
index 6fc31ca5b..2bbcaa9cd 100644
--- a/transforms/universal/web2parquet/README.md
+++ b/transforms/universal/web2parquet/README.md
@@ -11,15 +11,15 @@ For configuring the crawl, users need to identify the follow parameters:
 | parameter:type | Description |
 | --- | --- |
-| urls:list | list of seeds URL (i.e. ['https://thealliance.ai'] or ['https://www.apache.org/projects','https://www.apache.org/foundation']). The list can include any number of valid urls that are not configured to block web crawlers |
+| urls:list | list of seeds URL (i.e., ['https://thealliance.ai'] or ['https://www.apache.org/projects','https://www.apache.org/foundation']). The list can include any number of valid urls that are not configured to block web crawlers |
 |depth:int | control crawling depth |
-| downloads:int | number of downloads that are stored to the download folder. Since the crawler operations happen asyncrhonous, the process can result in any 10 of the visited URLs being retrieved (i.e. consecutive runs can result in different files being downloaded) |
-| folder:str | folder where downloaded files are stored. If the folder is not empty, new files are added or replace existing ones with the same URLs |
+| downloads:int | number of downloads that are stored to the download folder. Since the crawler operations happen asynchronously, the process can result in any 10 of the visited URLs being retrieved (i.e. consecutive runs can result in different files being downloaded) |
+| folder:str | folder where downloaded files are stored. If the folder is not empty, new files are added or replace the existing ones with the same URLs |
 
 ## Invoking the transform from a notebook
 
-In order to invoke the transfrom from the notebook, users must enable nested asynchronous io as follow:
+In order to invoke the transfrom from a notebook, users must enable nested asynchronous io as follows:
 
 import nest_asyncio
 nest_asyncio.apply()
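
Note on the last hunk: the README asks notebook users to enable nested asynchronous io, and the reason can be illustrated with the standard library alone. The sketch below is not the transform's code. `fetch_stub`, `run_crawl`, and `notebook_cell` are hypothetical names, and it assumes the crawler entry point starts its own event loop through `asyncio.run`. A Jupyter kernel already runs an event loop, which `notebook_cell` simulates here.

```python
import asyncio


async def fetch_stub(url: str) -> str:
    """Hypothetical stand-in for a crawler download; performs no network I/O."""
    await asyncio.sleep(0)
    return f"fetched {url}"


def run_crawl(url: str) -> str:
    """Hypothetical synchronous entry point that starts its own event loop."""
    coro = fetch_stub(url)
    try:
        return asyncio.run(coro)
    except RuntimeError:
        # asyncio.run refused to start; close the coroutine so it is not
        # reported as "never awaited".
        coro.close()
        raise


async def notebook_cell() -> str:
    """Simulates a notebook cell: code here executes inside a running loop."""
    try:
        return run_crawl("https://thealliance.ai")
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"


# From a plain script no loop is running, so the call succeeds.
outside = run_crawl("https://thealliance.ai")

# From inside a running loop the same call is rejected.
inside = asyncio.run(notebook_cell())

print(outside)
print(inside)
```

Calling `nest_asyncio.apply()` before the crawl, as the README instructs, patches asyncio so that this kind of nested call is permitted.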