Error in SmartScraperGraph: TypeError: unsupported operand type(s) for -: 'str' and 'int' #876

Closed
d-sutariya opened this issue Jan 10, 2025 · 5 comments
Labels
bug Something isn't working

Comments

d-sutariya commented Jan 10, 2025

I encountered a TypeError when running the SmartScraperGraph pipeline in the ScrapeGraphAI library. The error message is as follows:

WARNING! token is not default parameter.
                    token was transferred to model_kwargs.
                    Please make sure that token is what you intended.
--- Executing Fetch Node ---
--- (Fetching HTML from: https://higher.gs.com/campus?&page=1&sort=RELEVANCE) ---
--- Executing ParseNode Node ---
Traceback (most recent call last):
  File "/home/deeps/deep/projects/job scrapper/app.py", line 38, in <module>
    result = smart_scraper_graph.run()
  File "/home/deeps/miniconda3/envs/job_scrapper/lib/python3.13/site-packages/scrapegraphai/graphs/smart_scraper_graph.py", line 292, in run
    self.final_state, self.execution_info = self.graph.execute(inputs)
                                            ~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "/home/deeps/miniconda3/envs/job_scrapper/lib/python3.13/site-packages/scrapegraphai/graphs/base_graph.py", line 358, in execute
    return self._execute_standard(initial_state)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
  File "/home/deeps/miniconda3/envs/job_scrapper/lib/python3.13/site-packages/scrapegraphai/graphs/base_graph.py", line 303, in _execute_standard
    raise e
  File "/home/deeps/miniconda3/envs/job_scrapper/lib/python3.13/site-packages/scrapegraphai/graphs/base_graph.py", line 276, in _execute_standard
    result, node_exec_time, cb_data = self._execute_node(
                                      ~~~~~~~~~~~~~~~~~~^
        current_node, state, llm_model, llm_model_name
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/deeps/miniconda3/envs/job_scrapper/lib/python3.13/site-packages/scrapegraphai/graphs/base_graph.py", line 200, in _execute_node
    result = current_node.execute(state)
  File "/home/deeps/miniconda3/envs/job_scrapper/lib/python3.13/site-packages/scrapegraphai/nodes/parse_node.py", line 98, in execute
    chunk_size=self.chunk_size - 250,
               ~~~~~~~~~~~~~~~~^~~~~
TypeError: unsupported operand type(s) for -: 'str' and 'int'

This error suggests that chunk_size is expected to be an integer but actually holds a string, so the subtraction `self.chunk_size - 250` in parse_node.py fails.
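
The failing expression can be reproduced on its own, outside the library. The sketch below is not ScrapeGraphAI code: `chunk_size` is a local stand-in variable holding a placeholder string, and it only mimics the `self.chunk_size - 250` expression shown in the traceback.

```python
# Standalone reproduction of the failing expression from the traceback.
# chunk_size is a local stand-in; the string value is a placeholder.
chunk_size = "YOUR_HUGGINGFACE_API_KEY"

try:
    chunk_size - 250  # str minus int is not defined in Python
except TypeError as exc:
    message = str(exc)

print(message)
```

The printed message matches the last line of the traceback, which points to a string reaching ParseNode rather than a problem with the page being scraped.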

Steps to Reproduce:
The issue occurs on both Windows 11 and Ubuntu 24.04 with the following code snippet:

from scrapegraphai.graphs import SmartScraperGraph
from langchain_huggingface.llms.huggingface_endpoint import HuggingFaceEndpoint
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings

# Define your HuggingFace models
llm_model_instance = HuggingFaceEndpoint(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2",
    token="YOUR_HUGGINGFACE_API_KEY"
)

embedder_model_instance = HuggingFaceInferenceAPIEmbeddings(
    api_key="YOUR_HUGGINGFACE_API_KEY",
    model_name="sentence-transformers/all-MiniLM-l6-v2"
)

# Modify the graph configuration
graph_config = {
    "llm": {
        "model_instance": llm_model_instance,
        "model_tokens": "YOUR_HUGGINGFACE_API_KEY"
    },
    "embeddings": {
        "model_instance": embedder_model_instance
    },
    "verbose": True
}

# Create an instance of SmartScraperGraph
smart_scraper_graph = SmartScraperGraph(
    prompt="Extract useful information from the webpage.",
    source="https://higher.gs.com/campus?&page=1&sort=RELEVANCE",
    config=graph_config,
)

# Run the scraper
result = smart_scraper_graph.run()
print(result)

Environment:

OS: Windows 11, Ubuntu 24.04
Python version: Python 3.13.1
ScrapeGraphAI version: 1.35.0
langchain-huggingface version: 0.1.2

dosubot bot commented Jan 10, 2025

Hello, @d-sutariya! I'm here to help you with any bugs, questions, or contributions.

To resolve the TypeError in the SmartScraperGraph related to the chunk_size being treated as a string instead of an integer, you need to ensure that chunk_size is explicitly converted to an integer. This can be done by modifying the ParseNode class's __init__ method to convert chunk_size to an integer when it is retrieved from the node_config. Here's how you can do it:

def __init__(
    self,
    input: str,
    output: List[str],
    node_config: Optional[dict] = None,
    node_name: str = "ParseNode",
):
    super().__init__(node_name, "node", input, output, 1, node_config)

    self.verbose = (
        False if node_config is None else node_config.get("verbose", False)
    )
    self.parse_html = (
        True if node_config is None else node_config.get("parse_html", True)
    )
    self.parse_urls = (
        False if node_config is None else node_config.get("parse_urls", False)
    )

    self.llm_model = node_config.get("llm_model")
    self.chunk_size = int(node_config.get("chunk_size", 0))  # Ensure chunk_size is an integer

This change ensures that chunk_size is always treated as an integer, preventing any TypeError that might occur if it is mistakenly set as a string in the configuration [1].
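
One caveat worth illustrating: a bare int() conversion only helps when the string is numeric, such as "8192". A non-numeric string would raise a ValueError instead of the TypeError. The sketch below uses a hypothetical helper, `coerce_chunk_size`, which is not part of ScrapeGraphAI, to show both cases under that assumption.

```python
def coerce_chunk_size(value):
    """Hypothetical helper: convert a config value to int or fail clearly."""
    try:
        return int(value)
    except (TypeError, ValueError) as exc:
        raise ValueError(
            f"chunk_size must be an integer number of tokens, got {value!r}"
        ) from exc


# A numeric string converts cleanly.
print(coerce_chunk_size("8192"))

# A non-numeric string (for example an API key) still fails, but with a clearer message.
try:
    coerce_chunk_size("YOUR_HUGGINGFACE_API_KEY")
except ValueError as exc:
    print(exc)
```

So the conversion alone would not make the configuration from this report work; it would only change which exception is raised.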


@dosubot dosubot bot added the bug Something isn't working label Jan 10, 2025
Collaborator

PeriniM commented Jan 10, 2025

Hey @d-sutariya, I noticed that your code sets model_tokens to the Hugging Face API key. Please try adding an integer chunk size instead. This is the issue in your code:

graph_config = {
    "llm": {
        "model_instance": llm_model_instance,
        "model_tokens": "YOUR_HUGGINGFACE_API_KEY"  #<-- ERROR HERE 
    },
    "embeddings": {
        "model_instance": embedder_model_instance
    },
    "verbose": True
}
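
A sketch of what the corrected configuration might look like, assuming model_tokens is meant to be an integer token count that ends up as the chunk size in ParseNode. The value 8192 is purely illustrative, not a documented limit for this model, and the two `object()` placeholders stand in for the HuggingFaceEndpoint and embeddings instances from the original snippet so that the block runs on its own.

```python
# Placeholders for the model instances built in the original snippet.
llm_model_instance = object()
embedder_model_instance = object()

graph_config = {
    "llm": {
        "model_instance": llm_model_instance,
        # Integer token count, not the API key. 8192 is an illustrative value.
        "model_tokens": 8192,
    },
    "embeddings": {
        "model_instance": embedder_model_instance
    },
    "verbose": True,
}

# The subtraction that failed in the traceback works once the value is an int.
chunk_size = graph_config["llm"]["model_tokens"]
print(chunk_size - 250)
```

The API key itself stays where it was in the original code, on the model instance, and no longer appears in the graph configuration.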

@d-sutariya
Author

@PeriniM Can you tell me what to enter in the model_tokens?

@d-sutariya
Author

@PeriniM In the example documentation, model_tokens is not provided.
Can you please explain what this parameter means?
https://scrapegraph-ai.readthedocs.io/en/latest/scrapers/llm.html#hugging-face-hub
https://scrapegraph-doc.onrender.com/docs/Graphs/smart_scraper_graph#example-usage

@VinciGit00
Collaborator

You can also remove the embeddings; they are not required anymore.
