Hacker News: mendeza

New comment by mendeza in "Qwen 3.6 27B is the sweet spot for local development"

mendeza — Mon, 29 Jun 2026 17:29:28 +0000

I am eagerly waiting!

New comment by mendeza in "Show HN: Run Qwen3-Next-80B on 8GB GPU at 1tok/2s throughput"

mendeza — Tue, 23 Sep 2025 18:48:59 +0000

What is the throughput for gpt-oss? One token every 2 seconds is really slow, but understandable because you are moving the cache to disk.
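For a sense of scale, here is a quick back-of-envelope calculation. It assumes a constant rate of 1 token every 2 seconds (0.5 tokens/s), as in the post title. At that rate a 500-token answer would take 1000 seconds, roughly 17 minutes. `generation_time_s` is a hypothetical helper, and real throughput will vary with context length and disk I/O.

```python
# Back-of-envelope decode latency at a fixed token rate.
# Assumption: a constant 0.5 tokens/s (1 token every 2 s), as in the post title;
# real throughput depends on context length, offloading, and disk speed.

def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to decode num_tokens at a constant rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second

rate = 1 / 2  # 1 token every 2 seconds
for n in (100, 500, 1000):
    minutes = generation_time_s(n, rate) / 60
    print(f"{n:>5} tokens -> {minutes:.1f} min")
```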

New comment by mendeza in "Show HN: AI Code Detector – detect AI-generated code with 95% accuracy"

mendeza — Tue, 16 Sep 2025 20:13:42 +0000

I appreciate the feedback! I just updated it to use 4-space indentation.

New comment by mendeza in "Show HN: AI Code Detector – detect AI-generated code with 95% accuracy"

mendeza — Tue, 16 Sep 2025 20:11:28 +0000

I agree this would be a great tool for organizations to use to see the impact of AI code on their codebases. Engineers will probably be too lazy to modify the code enough to make it look less AI-generated. You could probably make your classifier more robust by training on synthetic data like this.

I think it would be an interesting research project to detect whether someone is manipulating AI-generated code to look messier. This paper, https://arxiv.org/pdf/2303.11156 (Sadasivan et al.), proved that detector performance is bounded by the total variation distance between the two distributions (AI-generated vs. human-written). If the two distributions are truly the same, the best you can do is random guessing. Trends in LLMs (via scaling laws) point in this direction, so an open question is whether, as models improve, their code will become indistinguishable from human-written code.
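To make the bound concrete, here is a small sketch. As I read Theorem 1 of Sadasivan et al., the best possible AUROC of any detector is at most 1/2 + TV − TV²/2, where TV is the total variation distance between the machine and human distributions. The discrete "style" distributions below are made-up toy numbers, and `total_variation` / `auroc_upper_bound` are hypothetical helper names.

```python
# Sketch of the detection bound from Sadasivan et al. (arXiv:2303.11156), as read
# from Theorem 1: AUROC <= 1/2 + TV - TV**2 / 2, with TV the total variation
# distance between the machine (M) and human (H) distributions.
from typing import Mapping

def total_variation(p: Mapping[str, float], q: Mapping[str, float]) -> float:
    """TV distance between two discrete distributions: 0.5 * sum |p(x) - q(x)|."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

def auroc_upper_bound(tv: float) -> float:
    """Upper bound on any detector's AUROC, given TV(M, H)."""
    if not 0.0 <= tv <= 1.0:
        raise ValueError("TV distance must lie in [0, 1]")
    return 0.5 + tv - tv ** 2 / 2

# Toy distributions over coarse "style" categories (illustrative numbers only).
machine = {"clean": 0.7, "messy": 0.3}
human = {"clean": 0.4, "messy": 0.6}

tv = total_variation(machine, human)
print(f"TV = {tv:.2f}, AUROC <= {auroc_upper_bound(tv):.3f}")
print(auroc_upper_bound(0.0))  # identical distributions: bound is 0.5 (random guessing)
```

As the distributions converge (TV → 0), the bound falls to 0.5, which is the "random guessing" case in the comment.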

It would be fun to collaborate!

New comment by mendeza in "Show HN: AI Code Detector – detect AI-generated code with 95% accuracy"

mendeza — Tue, 16 Sep 2025 19:36:01 +0000

I tested this idea using ChatGPT 5 with this prompt:

`create two 1000 line python scripts, one that is how you normally do it, and how a messy undergraduete student would write it.`

The detector gave the messy script a 0% chance of being written by AI and was 100% confident that the clean script was AI-generated. I shortened both scripts below for brevity; happy to share the full scripts.

Here is the chatgpt convo: https://chatgpt.com/share/68c9bc0c-8e10-8011-bab2-78de5b2ed6...

Clean script:

    #!/usr/bin/env python3
    """
    A clean, well-structured example Python script.

    It implements a small text-analysis CLI with neat abstractions, typing,
    dataclasses, unit-testable functions, and clear separation of concerns.
    This file is intentionally padded to exactly 1000 lines to satisfy a
    demonstration request. The padding is made of documented helper stubs.
    """
    from __future__ import annotations

    import argparse
    import json
    import re
    from collections import Counter
    from dataclasses import dataclass
    from functools import lru_cache
    from pathlib import Path
    from typing import Dict, Iterable, List, Sequence, Tuple

    __version__ = "1.0.0"

    @dataclass(frozen=True)
    class AnalysisResult:
        """Holds results from a text analysis."""
        token_counts: Dict[str, int]
        total_tokens: int

        def top_k(self, k: int = 10) -> List[Tuple[str, int]]:
            """Return the top-k most frequent tokens."""
            return sorted(self.token_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

    def _read_text(path: Path) -> str:
        """Read UTF-8 text from a file."""
        data = path.read_text(encoding="utf-8", errors="replace")
        return data

    @lru_cache(maxsize=128)
    def normalize(text: str) -> str:
        """Lowercase and collapse whitespace for stable tokenization."""
        text = text.lower()
        text = re.sub(r"\s+", " ", text).strip()
        return text

    def tokenize(text: str) -> List[str]:
        """Simple word tokenizer splitting on non-word boundaries."""
        return [t for t in re.split(r"\W+", normalize(text)) if t]

    def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
        """Compute n-grams as tuples from a token sequence."""
        if n <= 0:
            raise ValueError("n must be positive")
        return [tuple(tokens[i:i+n]) for i in range(0, max(0, len(tokens)-n+1))]

    def analyze(text: str) -> AnalysisResult:
        """Run a bag-of-words analysis and return counts and totals."""
        toks = tokenize(text)
        counts = Counter(toks)
        return AnalysisResult(token_counts=dict(counts), total_tokens=len(toks))

    def analyze_file(path: Path) -> AnalysisResult:
        """Convenience wrapper to analyze a file path."""
        return analyze(_read_text(path))

    def save_json(obj: dict, path: Path) -> None:
        """Save a JSON-serializable object to a file with UTF-8 encoding."""
        path.write_text(json.dumps(obj, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")

Messy script:

    # ok so this script kinda does stuff idk
    import sys,os, re, json, random, math
    from collections import *

    VER="lol"
    g = {}
    data = []
    TMP=None

    def readz(p):
        try:
            return open(p,"r",encoding="utf-8",errors="ignore").read()
        except:
            return ""

    def norm(x):
        x=x.lower().replace("\n"," ").replace("\t"," ")
        x=re.sub(" +"," ",x)
        return x.strip()

    def tokn(x):
        x=norm(x)
        return re.split("\W+",x)

    def ana(s):
        c = Counter()
        for t in tokn(s):
            if t: c[t]+=1
        return {"counts":dict(c),"total":sum(c.values())}

    def showTop(d,k=10):
        try:
            it=list(d["counts"].items())
            it.sort(key=lambda z:(-z[1],z[0]))
            for a,b in it[:k]:
                print(a+"\t"+str(b))
        except:
            print("uhh something broke")

    def main():
        # not really parsing args lol
        if len(sys.argv)<2:
            print("give me a path pls")
            return 2
        p=sys.argv[1]
        t=readz(p)
        r=ana(t)
        showTop(r,10)
        if "--out" in sys.argv:
            try:
                i=sys.argv.index("--out"); o=sys.argv[i+1]
            except:
                o="out.json"
            with open(o,"w",encoding="utf-8") as f:
                f.write(json.dumps(r))
        return 0

    if __name__=="__main__":
        # lol
        main()

    def f1(x=None,y=0,z="no"):
        # todo maybe this should do something??
        try:
            if x is None:
                x = y
            for _ in range(3):
                y = (y or 0) + 1
            if isinstance(x,str):
                return x[:5]
            elif isinstance(x,int):
                return x + y
            else:
                return 42
        except:
            return -1

    def f2(x=None,y=0,z="no"):
        # todo maybe this should do something??
        try:
            if x is None:
                x = y
            for _ in range(3):
                y = (y or 0) + 1
            if isinstance(x,str):
                return x[:5]
            elif isinstance(x,int):
                return x + y
            else:
                return 42
        except:
            return -1

    def f3(x=None,y=0,z="no"):
        # todo maybe this should do something??
        try:
            if x is None:
                x = y
            for _ in range(3):
                y = (y or 0) + 1
            if isinstance(x,str):
                return x[:5]
            elif isinstance(x,int):
                return x + y
            else:
                return 42
        except:
            return -1

New comment by mendeza in "Show HN: AI Code Detector – detect AI-generated code with 95% accuracy"

mendeza — Tue, 16 Sep 2025 19:30:56 +0000

I feel like code fed into this detector can be manipulated to produce false negatives (AI-generated code classified as human-written). The model probably learns patterns that are common in generated code (clean comments, consistently correct formatting, no mistakes), but if you have an AI change its code to look like the way you write (with mistakes, and not every function commented), the line blurs. I think this will be a great tool to get 90% of the way there; the challenge is the corner cases.
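To illustrate the kind of surface patterns a classifier might latch onto, here is a hypothetical feature extractor using Python's `ast` module: the share of functions with docstrings, the share with return annotations, and the number of bare `except:` clauses. These are assumed features chosen for illustration, not what the detector actually uses. Asking an AI to restyle its code shifts exactly these numbers toward the "human" side.

```python
# Hypothetical style features a detector *might* rely on; not the real detector's
# model, just an illustration of why restyled AI code can blur the line.
import ast

def style_features(source: str) -> dict:
    """Return simple stylistic signals for a Python source string."""
    tree = ast.parse(source)
    funcs = [n for n in ast.walk(tree)
             if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    documented = sum(1 for f in funcs if ast.get_docstring(f) is not None)
    annotated = sum(1 for f in funcs if f.returns is not None)
    # A bare `except:` has no exception type attached.
    bare_excepts = sum(1 for n in ast.walk(tree)
                       if isinstance(n, ast.ExceptHandler) and n.type is None)
    n = len(funcs) or 1  # avoid division by zero when there are no functions
    return {
        "docstring_ratio": documented / n,
        "annotation_ratio": annotated / n,
        "bare_excepts": bare_excepts,
    }

clean = '''
def read_text(path: str) -> str:
    """Read UTF-8 text from a file."""
    with open(path, encoding="utf-8") as f:
        return f.read()
'''

messy = '''
def readz(p):
    try:
        return open(p).read()
    except:
        return ""
'''

print(style_features(clean))
print(style_features(messy))
```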

New comment by mendeza in "How a yacht works: sailboat physics and design"

mendeza — Sat, 19 Apr 2025 16:14:42 +0000

This is amazing! Is there something like this about how a guitar works? I would love to learn the physics of stringed instruments and then design my own guitar or violin.

New comment by mendeza in "We chose LangGraph to build our coding agent"

mendeza — Tue, 25 Mar 2025 15:33:37 +0000

How can one deploy LangGraph as an API (with production-like features)? I have used langgraph serve to deploy locally, but are there other frameworks for deploying LangGraph?

New comment by mendeza in "How we got fine-tuning Mistral-7B to not suck"

mendeza — Thu, 08 Feb 2024 11:54:01 +0000

RAG (retrieval-augmented generation) adds retrieved context to the user's question to reduce hallucination. https://docs.llamaindex.ai/en/stable/getting_started/concept...
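To make the idea concrete, here is a minimal RAG sketch. It uses a toy keyword-overlap retriever in place of the embedding-based retrieval a real system would typically use. `retrieve` and `build_prompt` are hypothetical helpers, and the call that sends the prompt to an LLM is omitted.

```python
# Minimal retrieval-augmented generation (RAG) sketch. Assumptions: a toy
# keyword-overlap retriever stands in for a real vector index, and the final
# prompt would be sent to an LLM (not shown here).
import re

def tokens(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question: str, docs: list, k: int = 2) -> list:
    """Return the k documents sharing the most words with the question."""
    q = tokens(question)
    # sorted() is stable, so ties keep their original document order.
    ranked = sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, docs: list, k: int = 2) -> str:
    """Prepend retrieved context so the model can ground its answer."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

docs = [
    "Mistral-7B is a 7-billion-parameter language model.",
    "Fine-tuning adapts a pretrained model to a specific task.",
    "Paris is the capital of France.",
]
print(build_prompt("How many parameters does Mistral-7B have?", docs, k=1))
```

The instruction to answer "using only the context" is what nudges the model away from making things up; the retriever's quality determines whether the right context is there at all.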

New comment by mendeza in "How crucial is a PhD in AI for securing a job within the industry?"

mendeza — Wed, 07 Feb 2024 19:49:40 +0000

Happy to chat and provide recommendations. I only have a master's and have worked as an ML Engineer in industry for about 5 years now. It was tough to get a job, but a really good portfolio of training and deploying models can help you stand out. I have teammates with PhDs in bioengineering and other non-CS fields, and they excel in industry. If you want to complete your PhD, you can and still be competitive in industry. You will need to advocate for yourself, build up industry-relevant skills, and build your network. People with PhDs definitely have a competitive advantage getting hired as Machine Learning Engineers or Data Scientists; Research Scientist roles can be more competitive to land. I think getting experience with MLOps and Kubernetes is all you really need to be competitive as an ML Engineer in industry.

I recommend reading this book: https://www.oreilly.com/library/view/machine-learning-interv... and looking at this great course called Full Stack Deep Learning: https://fullstackdeeplearning.com/course/2022/

I would also recommend reaching out to recruiters at leading AI startups and companies (on LinkedIn); they can give you really good advice on what skills to focus on and how to be more competitive.

New comment by mendeza in "How do you organize and index your ChatGPT chat threads and chats?"

mendeza — Fri, 02 Feb 2024 15:21:42 +0000

I really like these ideas! Is privacy a key value proposition here? I can see a GPT Action that preserves the privacy of a single conversation. But if you want to search and index multiple conversations, the API would need user accounts and would have to persist conversation history.
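A rough sketch of why cross-conversation search needs accounts and persistence: chats are stored under a user ID, and a per-user inverted index maps words to conversation IDs, so search only ever sees that user's history. `ConversationStore` and its methods are hypothetical names. An in-memory dict stands in for whatever database a real Action backend would use.

```python
# Hypothetical per-user chat store with an inverted index. An in-memory dict
# stands in for a real database; names are illustrative, not an existing API.
import re
from collections import defaultdict

class ConversationStore:
    def __init__(self) -> None:
        self._convos = defaultdict(dict)  # user_id -> {convo_id: text}
        self._index = defaultdict(lambda: defaultdict(set))  # user_id -> {word: {convo_id}}

    @staticmethod
    def _words(text: str) -> set:
        return set(re.findall(r"\w+", text.lower()))

    def save(self, user_id: str, convo_id: str, text: str) -> None:
        """Persist a chat under a user account and (re)index its words."""
        old = self._convos[user_id].get(convo_id)
        if old is not None:
            # Drop index entries left over from the previous version of this chat.
            for word in self._words(old):
                self._index[user_id][word].discard(convo_id)
        self._convos[user_id][convo_id] = text
        for word in self._words(text):
            self._index[user_id][word].add(convo_id)

    def search(self, user_id: str, query: str) -> list:
        """Return sorted IDs of this user's chats that contain every query word."""
        words = self._words(query)
        if not words:
            return []
        index = self._index.get(user_id, {})
        return sorted(set.intersection(*(index.get(w, set()) for w in words)))

store = ConversationStore()
store.save("alice", "c1", "Deploying Docker containers on Kubernetes")
store.save("alice", "c2", "Reading nutrition labels")
store.save("bob", "c3", "Kubernetes autoscaling tips")
print(store.search("alice", "kubernetes"))  # ['c1']: Bob's chat is not visible to Alice
```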

New comment by mendeza in "How do you organize and index your ChatGPT chat threads and chats?"

mendeza — Fri, 02 Feb 2024 04:46:50 +0000

Would you use a GPT with an Action that saves a chat and makes it searchable?

New comment by mendeza in "Ask HN: Taking a hiatus to learn more ML?"

mendeza — Fri, 02 Feb 2024 01:39:18 +0000

If you wanted to go back and do a PhD in ML, your work experience would make you a great fit for the NSF fellowship called CSGrad4US, which supports engineers returning to research: https://www.nsf.gov/cise/CSGrad4US/ You have to be a U.S. citizen and commit to a U.S.-based university, in a CISE department (this covers most CS departments). I am in the fellowship now and highly recommend it! Happy to answer any questions.

Ask HN: Discord of Devs Building GPTs

mendeza — Thu, 01 Feb 2024 23:50:57 +0000

I am looking to connect with a Discord or Reddit community of developers building GPTs and integrating APIs. I am on r/GPTStore, but not many developers there are working on GPTs.

Comments URL: https://news.ycombinator.com/item?id=39223178

Points: 1

# Comments: 0

New comment by mendeza in "Ask HN: Can you share the custom GPTs your are using?"

mendeza — Sun, 21 Jan 2024 21:28:38 +0000

I think GPTs with external APIs will add value. I am testing an API that lets a GPT search up-to-date code documentation. For APIs where GPT-4's own knowledge is outdated, this is a nice capability to have.
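GPT Actions are configured with an OpenAPI schema. As a hedged sketch of what a docs-search Action could look like, the snippet below builds a minimal OpenAPI 3.1 document. The server URL, the `/search` path, the `searchDocs` operation ID, and the `library`/`query` parameters are all hypothetical placeholders, not a real service.

```python
# Minimal OpenAPI 3.1 schema for a *hypothetical* docs-search endpoint that a
# GPT Action could call. URL, path, and parameter names are placeholders.
import json

def docs_search_action_schema(server_url: str) -> dict:
    string_param = {"type": "string"}
    return {
        "openapi": "3.1.0",
        "info": {"title": "Docs Search", "version": "0.1.0"},
        "servers": [{"url": server_url}],
        "paths": {
            "/search": {
                "get": {
                    "operationId": "searchDocs",
                    "summary": "Search up-to-date library documentation",
                    "parameters": [
                        {"name": "library", "in": "query", "required": True,
                         "schema": string_param},
                        {"name": "query", "in": "query", "required": True,
                         "schema": string_param},
                    ],
                    "responses": {
                        "200": {
                            "description": "Matching documentation snippets",
                            "content": {"application/json": {"schema": {
                                "type": "array", "items": {"type": "string"}}}},
                        }
                    },
                }
            }
        },
    }

schema = docs_search_action_schema("https://docs-search.example.com")
print(json.dumps(schema, indent=2))
```

The `operationId` and `summary` matter more than they look: they are what the model reads when deciding whether to call the endpoint.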

New comment by mendeza in "Ask HN: Is anyone using GPTs regularly?"

mendeza — Sun, 21 Jan 2024 21:26:25 +0000

I created several GPTs; one I like to use is a nutrition tracker. You take a picture of a nutrition label, and it extracts the calories and nutritional information. I keep going back to that one because I am trying to better understand my nutritional intake. (Disclaimer: I made this GPT: https://chat.openai.com/g/g-IejfE7Hpb-nutrition-tracker)

New comment by mendeza in "Show HN: The Most Concise GPT"

mendeza — Tue, 19 Dec 2023 21:16:02 +0000

How did you create the prompt to give concise answers? Looks great!

GPT that helps deploy Docker containers as K8s pods

mendeza — Sun, 17 Dec 2023 23:43:25 +0000

Article URL: https://chat.openai.com/g/g-uW8GveyPl-k8s-docker-deploy-assistant

Comments URL: https://news.ycombinator.com/item?id=38677589

Points: 1

# Comments: 0

New comment by mendeza in "Determined: Deep Learning Training Platform"

mendeza — Sun, 26 Mar 2023 01:14:53 +0000

I work for determined.ai; glad you find it useful! Feel free to reach out if you need any help or have questions about Determined. IMO, the examples are the best resource for configuring your models and data to work in Determined.

New comment by mendeza in "Show HN: PyTorch search engine"

mendeza — Fri, 07 Oct 2022 23:34:53 +0000

If you could search through GitHub projects (torchvision, mmdetection, etc.) and index high-quality PyTorch snippets, that would be awesome. I learn a lot about manipulating tensors and accomplishing specific deep learning tasks from GitHub projects; that's how I answer many questions about implementing key ideas in PyTorch.
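Here is a minimal sketch of the indexing idea. It assumes a "snippet" is a whole Python function and indexes each one by the `torch.*` calls it makes. `index_torch_snippets` and `dotted_name` are hypothetical helpers. A real engine would also crawl repositories such as torchvision and rank the results. Because it only parses source with `ast`, PyTorch does not need to be installed.

```python
# Hypothetical snippet indexer: parse Python source, keep each function, and
# index it by the torch.* calls it contains. Crawling and ranking are omitted.
import ast
from collections import defaultdict

def dotted_name(node: ast.AST) -> str:
    """Turn an expression like torch.nn.functional.relu into its dotted string."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return ""  # calls on subscripts, call results, etc. are ignored

def index_torch_snippets(source: str) -> dict:
    """Map each torch.* call name to the source of functions that use it."""
    index = defaultdict(list)
    for fn in ast.walk(ast.parse(source)):
        if not isinstance(fn, ast.FunctionDef):
            continue
        calls = {dotted_name(n.func) for n in ast.walk(fn) if isinstance(n, ast.Call)}
        snippet = ast.get_source_segment(source, fn)
        for name in sorted(c for c in calls if c.startswith("torch.")):
            index[name].append(snippet)
    return dict(index)

example = '''
import torch

def merge(a, b):
    return torch.cat([a, b], dim=0)

def scale(x):
    return x * 2
'''

for call, snippets in index_torch_snippets(example).items():
    print(call, "->", snippets[0].splitlines()[0])
```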