So, in the era of increasingly good AI-powered tools and general search engines full of SEO spam, last week I started creating something a little old school and against the trends.

For now it’s a have-fun-and-find-out project whose main aim is to provide good search results for general web development queries, with a special focus on independent blog authors.

The thesis is that no SEO spam website gets into the index, which already filters out most of the annoying noise you get on Google/Bing.

Search results are grouped per type: docs, blogs and magazines (e.g. blog platforms or bigger websites).
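
Roughly, the grouping can be pictured like this. It is a simplified Python sketch, not the project's actual code: the `Result` class, its fields and the sample URLs are made up for illustration.

```python
from dataclasses import dataclass

# The three result types named in the post.
KINDS = ("docs", "blogs", "magazines")


@dataclass
class Result:
    """Hypothetical search hit; the field names are assumptions."""
    title: str
    url: str
    kind: str  # one of KINDS


def group_by_kind(results):
    """Return a dict mapping each kind to its results, keeping the original order."""
    grouped = {kind: [] for kind in KINDS}
    for result in results:
        # Hits with an unknown kind are skipped in this sketch.
        if result.kind in grouped:
            grouped[result.kind].append(result)
    return grouped


hits = [
    Result("Fetch API reference", "https://docs.example/fetch", "docs"),
    Result("Notes on caching", "https://blog.example/caching", "blogs"),
    Result("Service workers guide", "https://docs.example/sw", "docs"),
]
grouped = group_by_kind(hits)
```

Each group can then be rendered as its own section of the results page, and an empty group simply shows nothing.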

For now it’s far from being done in terms of having a full index, but in most cases it already replaces my go-to search engine when I’m looking up some stuff during work.

I’m looking forward to hearing what y’all think, and if you think it makes sense overall, I can only encourage you to post links to blogs or docs that are still missing from the index. I’m more than happy to add them to the crawler.

Responses like “nei, total shit, who would need that” are also accepted, but constructive critique is more appreciated ;)

EDIT: many thanks, everyone, for all your voices and comments. I’m super grateful for all of them and happy that we have a place like Lemmy!

  • Kissaki@feddit.de
    11 months ago

    I think the main issue as well as my main question is around scope.

    You say it targets web developers, but the current index is quite narrow. So will you accept significant expansion of that, as long as it may be relevant to web developers? Where would you draw lines on mixed content or technologies?

    ASP.NET docs are definitely docs for web developers, but maybe not what you had in mind. Would they qualify? The docs are hosted on a platform with a lot of other docs from the dotnet space. Some may be relevant to “web developers”, others not. And the line is subjective and dynamic.

    My website has some technological development resources and blog posts. But also very different things. Would that fit into scope or not?

    How narrow or broad would you make the index?

    I guess it’s an index for search, so noise shouldn’t be a problem as long as there are gains through/of quality content.

    • sznowicki@lemmy.worldOP
      11 months ago

      It’s still an MVP and a work in progress, hence the index is not “full”.

      For me “web development” is everything that we might need for, well, the web. Servers, mongo docs, it all goes into the index (I’m adding to it basically every day, but it also takes some time to index stuff, and I’m observing how this whole thing works as the index grows).

      ASP.NET goes into the index of course. If your website has dev resources and blog posts that would go into it as well. Recently one person suggested tons of Haskell blogs and they are being indexed as we speak.

      I also have a different problem: dev.to has a lot of good resources but also tons of SEO spam and low quality content. It’s also freaking huge, and while it was in the index for some time, I had to remove it and think about it some more.

      Where would you draw lines on mixed content or technologies

      For now the line is: does this website have anything that web devs would need? Yes? Then it might get in.

      If it’s a blog about locomotive CPU programming then maybe not, although mostly due to infrastructure costs. Indexing costs money in the end, but having some unrelated stuff in the index should not hurt the results.

      All of what I wrote is the state for today. I’m changing my mind often as it’s still in “having fun” state.

      PS. also thanks for the feedback!

      • Kissaki@feddit.de
        11 months ago

        I also have a different problem: dev.to has a lot of good resources but also tons of SEO spam and low quality content. It’s also freaking huge, and while it was in the index for some time, I had to remove it and think about it some more.

        Yeah, a public platform is unlikely to provide consistent content. If curation is not an explicit goal and practice there, I would not include them for the reasons you mentioned.

        If indexing could happen not per domain but with more granular filters, such as URL base paths, that may be viable. For example, indexing only specific authors on dev.to.
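
        To make the idea concrete, here is a minimal Python sketch of a path-prefix allowlist. It is only an illustration under my own assumptions, not how the crawler actually works: `ALLOWED_PREFIXES`, `is_allowed`, the author paths and `example-blog.org` are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: host -> path prefixes that may be indexed.
# A single "/" prefix allows the whole host.
ALLOWED_PREFIXES = {
    "dev.to": ("/some-author/", "/another-author/"),
    "example-blog.org": ("/",),
}


def is_allowed(url: str) -> bool:
    """Return True if the URL's host is listed and its path falls under an allowed prefix."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    if host.startswith("www."):
        host = host[4:]
    prefixes = ALLOWED_PREFIXES.get(host)
    if prefixes is None:
        return False
    # Normalise to a trailing slash so "/some-author" matches "/some-author/",
    # while "/some-author-2/..." does not.
    path = parts.path.rstrip("/") + "/"
    return any(path.startswith(prefix) for prefix in prefixes)
```

        With this, `is_allowed("https://dev.to/some-author/my-post")` passes while a post by any other dev.to author is skipped, so curation happens per author instead of per domain.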

        • sznowicki@lemmy.worldOP
          11 months ago

          Good idea. I once had the thought of doing some narrow indexing of websites. Stack Overflow, for example, is a big issue: indexing all of it is crazy, while picking out specific tags feels like tons of work. In the end I adjust the whole project as it grows, hoping that it gets better after every tuning.

          As long as I have fun with it I’ll continue :D

          • Kissaki@feddit.de
            11 months ago

            Of course - cutting scope is a good call to keep it manageable and fun, and not end up with creep and what you wanted to evade in the first place. :)

  • Kissaki@feddit.de
    11 months ago

    Index categories are blog, docs, magazines. Have you considered indexing source code websites?

    I thought I would remember a second one, but I can’t recall right now.

    Subpaths on GitHub and GitLab would work in a similar fashion but would require more specific filters, unless the projects are hosted on dedicated instances.

    Project issue tickets may also be very relevant to developer searches!?