Does anyone have a recommendation for how I can scrape a website and extract unique names, such as product names?

I was thinking of using a website scraping tool, then a local LLM to find the unique product names.

  • nomad@infosec.pub
    15 hours ago

    You are probably talking about scraping a website. There are usually existing tools that make this easy. The last time I had to do something like that, I used Scrapy.

    • Lysergid@lemmy.ml
      15 hours ago

      Scrapy was created for exactly this use case. I used to work on a product-info scraping project before LLMs existed, so you don't really need an LLM. It's usually semi-structured data. Your biggest pain will likely be SPAs whose JavaScript has to run before the content loads. If you need to render SPAs, check out Selenium WebDriver or similar.
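
To illustrate the "semi-structured data, no LLM needed" point, here is a rough sketch using only Python's standard library. It assumes the HTML is already downloaded and that product names sit in elements carrying a CSS class; the class name `product-name`, the helper `unique_product_names`, and the sample markup are all made up for the example, so inspect the real site and adjust. It will not help with SPA content that only appears after JavaScript runs.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ProductNameParser(HTMLParser):
    """Collects the text of every element whose class list contains target_class."""

    def __init__(self, target_class="product-name"):  # hypothetical class name
        super().__init__()
        self.target_class = target_class
        self._depth = 0    # > 0 while inside a matching element
        self._chunks = []  # text pieces of the element being read
        self.names = []    # every name found, duplicates included

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1  # nested tag inside a matching element
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Join the pieces and collapse runs of whitespace.
            name = " ".join("".join(self._chunks).split())
            if name:
                self.names.append(name)

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def unique_product_names(html, target_class="product-name"):
    """Return names in first-seen order, ignoring case and spacing differences."""
    parser = ProductNameParser(target_class)
    parser.feed(html)
    parser.close()
    seen = set()
    unique = []
    for name in parser.names:
        key = name.casefold()
        if key not in seen:
            seen.add(key)
            unique.append(name)
    return unique


# Made-up sample markup standing in for a downloaded product listing.
SAMPLE_HTML = """
<ul>
  <li><span class="product-name">Blue Widget</span> <span class="price">$5</span></li>
  <li><span class="product-name">blue  widget</span></li>
  <li><h3 class="title product-name">Red <b>Gadget</b></h3></li>
</ul>
"""

if __name__ == "__main__":
    print(unique_product_names(SAMPLE_HTML))
```

On the sample markup this prints `['Blue Widget', 'Red Gadget']`. A real project would more likely let Scrapy handle the crawling and selectors; the deduplication step stays about the same either way.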