Does anyone have a recommendation for how I can scrap a website and extract unique names, such as product names?

I was thinking of using some website scrapping tool, then a local LLM to find unique product names.

  • dudesss@lemmy.caOP
    13 hours ago

    I was thinking of doing it once a day, even if I have to initiate it manually for it to be legal. It would only be for personal use, neither public nor commercial.

    It would save me the time of manually copying the HTML over to an LLM or something.

    • exuA
      12 hours ago

      I was joking about your use of scrap and scrapping, as in to remove or to cancel :)

      Web scraping only has one p.

    • hendrik@palaver.p3x.de
      13 hours ago

      Just read the robots.txt and obey the rules. Also set your user agent string properly. We've had crawlers on the internet forever, and that's the long-accepted way for website owners to give or revoke consent. Either you match a disallow directive and need to stop, or you're completely fine to scrape it.
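
      For what it's worth, here is a rough sketch of that check using Python's standard-library `urllib.robotparser`. The user agent string, the sample robots.txt, and the example.com URLs are all made up for illustration; in practice you would download the site's real robots.txt and send the same user agent with every request.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user agent: pick a name that identifies your script and,
# ideally, a way for the site owner to contact you.
USER_AGENT = "personal-product-scraper/0.1 (contact: you@example.com)"

# Made-up robots.txt content, only for this example.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def can_scrape(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if robots_txt does not disallow url for user_agent."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/products",
                "https://example.com/private/list"):
        verdict = "ok to fetch" if can_scrape(EXAMPLE_ROBOTS, url) else "disallowed, stop"
        print(url, "->", verdict)
```

      When you actually fetch pages, pass the same string as the `User-Agent` header, for example via `urllib.request.Request(url, headers={"User-Agent": USER_AGENT})`.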