<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: cchianel</title><link>https://news.ycombinator.com/user?id=cchianel</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 13 Jun 2026 08:08:46 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=cchianel" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by cchianel in "Using OR-Tools CP-SAT for Scheduling Problems"]]></title><description><![CDATA[
<p>You can find a list of quickstarts at <a href="https://github.com/TimefoldAI/timefold-quickstarts" rel="nofollow">https://github.com/TimefoldAI/timefold-quickstarts</a>.<p>Examples include:<p>- School Timetabling<p>- Employee Scheduling<p>- Conference Scheduling<p>- Flight Crew Scheduling<p>Metaheurstics are also very useful for puzzle games; you can quickly run a metaheuristic to generate a difficult but solvable puzzle in less than a second, while only being about 20 lines of code without libraries (but as your scale increases to hundreds of different pieces, you probably want a library so you can use their incremental calculation).</p>
]]></description><pubDate>Wed, 13 May 2026 17:09:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=48124625</link><dc:creator>cchianel</dc:creator><comments>https://news.ycombinator.com/item?id=48124625</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48124625</guid></item><item><title><![CDATA[New comment by cchianel in "Using OR-Tools CP-SAT for Scheduling Problems"]]></title><description><![CDATA[
<p>Currently, no language ports of Timefold Solver are planned. Unfortunately, FFI (foreign function interface) have a terrible performance penalty, and since we would be doing multiple FFI calls for moves, it can easily become 100x slower just from FFI overhead.<p>This basically means you have two choices:<p>1. Translate the constraints from the new language to Java bytecode at runtime.
2. Translate the entire solver to a new language.<p>We did (1) for a bit for CPython, but since CPython bytecode constantly change and break (and is so poorly documented) it was a nightmare to maintain. You can find a blog post of me explaining it a bit more here: <a href="https://timefold.ai/blog/java-vs-python-speed" rel="nofollow">https://timefold.ai/blog/java-vs-python-speed</a>. The CPython port is no longer maintained, and has quite a few missing features.<p>That being said, we have a wide range of ready made models that you can access via an API, which might fit your use case (you can see a list at <a href="https://docs.timefold.ai/" rel="nofollow">https://docs.timefold.ai/</a>).</p>
]]></description><pubDate>Wed, 13 May 2026 16:59:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=48124505</link><dc:creator>cchianel</dc:creator><comments>https://news.ycombinator.com/item?id=48124505</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48124505</guid></item><item><title><![CDATA[New comment by cchianel in "Using OR-Tools CP-SAT for Scheduling Problems"]]></title><description><![CDATA[
<p>Although this post discusses Constraint Programming - Satisfiability (CP-SAT) Solvers and Mixed Integer Problem (MIP) Solvers, it does not discuss Metaheuristic Solvers.<p>Metaheuristic solvers are different in that you don't need to model your problem as a mixed integer problem. Instead, all it cares about is having a function that returns something you can compare. This allows you to model your problem however you like. Some metaheurstic techniques include Simulated Annealing, Late Acceptance Local Search, and Tabu Search.<p>Metaheuristic solvers may not generate optimal solutions (after all, by their nature, they don't know the structure of the problem), but they generate "good enough" and "close to optimal" solutions. Metaheurstic solvers tends to beat MIP and CP-SAT for VRP, whereas MIP and CP-SAT are better for bin packing.<p>If you want to try using a Metaheuristic solver, I can recommend Timefold, which allows you to define your constraints using your domain objects in an incremental matter (it has SQL/Java-Streams like syntax, which in my opinion, is more readable than formulas) (disclosure: I work for Timefold).</p>
]]></description><pubDate>Wed, 13 May 2026 15:12:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=48123007</link><dc:creator>cchianel</dc:creator><comments>https://news.ycombinator.com/item?id=48123007</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48123007</guid></item><item><title><![CDATA[New comment by cchianel in "Algorithms for Optimization [pdf]"]]></title><description><![CDATA[
<p>I haven't; from a quick reading, InfoBax is for when you have an expensive function and want to do limited evaluations.
Timefold works with cheap functions and does many evaluations.
Timefold does this via Constraint Streams, so a function like:<p><pre><code>    var score = 0;
    for (var shiftA : solution.getShifts()) {
        for (var shiftB : solution.getShifts()) {
            if (shiftA != shiftB && shiftA.getEmployee() == shiftB.getEmployee() && shiftA.overlaps(shiftB)) {
                score -= 1;
            }
        }
    }
    return score
</code></pre>
usually takes shift * shift evaluations of overlaps, we only check the shifts affected by the change (changing it from O(N^2) to O(1) usually).<p>That being said, it might be useful for a move selector. I need to give it a more in depth reading.</p>
]]></description><pubDate>Mon, 01 Dec 2025 06:58:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=46104357</link><dc:creator>cchianel</dc:creator><comments>https://news.ycombinator.com/item?id=46104357</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46104357</guid></item><item><title><![CDATA[New comment by cchianel in "Algorithms for Optimization [pdf]"]]></title><description><![CDATA[
<p>I thought "No Free Lunch Theorem" was a joke from its name (although I should know better since "Hairy Ball Theorem" exists).<p>So if I understand correct, it states that all optimization algorithm must choose an order to evaluate solutions, and the faster it evaluates the "best" solution, the "better" the algorithm.<p>For example, say the search space is {"a", "b", "c"} and the best solution is "c". Then there are 3! (6) ways to evaluate all solutions:<p>"a", "b", "c"<p>"a", "c", "b"<p>"b", "a", "c"<p>"b", "c", "a"<p>"c", "a", "b"<p>"c", "b", "a"<p>(of course, in a real problem, the search space is much, MUCH larger).<p>The way Timefold tackles this is move selectors. In particular, you can design custom moves that uses knowledge of your particular problem which in turn affect when certain solutions are evaluated. Additionally, you can design custom phases that runs your own algorithm and then pass the solution found to local search. One of the projects we are working on currently is the neighborhood API, to make it much easier to write your own custom moves. That being said, for most problems that humans deal with, you don't need to configure Timefold and just work with the defaults (which is best-fit followed by a late acceptance local search with change and swap moves).<p>For what it's worth: metaheuristics solvers tend to be better than linear solvers for shift scheduling and vehicle routing, but worse for bin packing. But it still works, and is still fast.</p>
]]></description><pubDate>Mon, 01 Dec 2025 06:41:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=46104261</link><dc:creator>cchianel</dc:creator><comments>https://news.ycombinator.com/item?id=46104261</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46104261</guid></item><item><title><![CDATA[New comment by cchianel in "Algorithms for Optimization [pdf]"]]></title><description><![CDATA[
<p>By 1% of optimal, I was giving an example percentage to clarify there are solutions that exists, that are almost as good as optimal, that can be found in reasonable amount of time.<p>There cannot be a guarantee to find a solution to a given percentage worse than optimal for a fully general problem, since you would need to know optimal to give such a guarantee (and since the problem fully general, you cannot use the structure of the problem to reduce it).<p>Most constraint problems have many feasible solutions, and have a way to judge how much worse or better one solution is to another.<p>There are good and bad way to write constraints.<p>One bad way to write constraints is score traps, where between one clearly better solution has the same score as a clearly worse solution.<p>For example, for shift scheduling, a solution with only 1 overlapping shift with the same employee is better than a solution with 2 overlapping shifts with the same employee.<p>A bad score function would penalize both solutions by 1, meaning a solver have no idea which of the two solutions are better.<p>A good score function would penalize the schedule with 1 overlapping shift with the same employee by 1, and the schedule with 2 overlapping shifts with the same employee by 2.<p>The class of problems I am talking about is the class of problems where you can assign a score to a possible solution, with limited score traps.<p>Timefold has no guarantees about finding a solution in reasonable time (but unless you done something terribly wrong or have a truly massive dataset, it finds a good solution really quickly 99.99% of the time). Instead, you set the termination condition of the solver; it could be time-based (say 60 minutes), unimproved time spent (solve until no new better solutions are found after 60 minutes), or the first feasible solution (there are other termination conditions that can be set).</p>
]]></description><pubDate>Mon, 01 Dec 2025 05:36:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=46103859</link><dc:creator>cchianel</dc:creator><comments>https://news.ycombinator.com/item?id=46103859</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46103859</guid></item><item><title><![CDATA[New comment by cchianel in "Algorithms for Optimization [pdf]"]]></title><description><![CDATA[
<p>That depends; do you want the optimal solution?<p>If so, I agree it is impossible for a fully general problem solver to find the optimal solution to a problem in a reasonable amount of time (unless P = NP, which is unlikely).<p>However, if a "good enough" solution that is only 1% worse than optimal works, then a fully general solver can do the job in a reasonable amount of time.<p>One such example of a fully general solver is Timefold; you express your constraints using plain old Java objects, so you can in theory do whatever you want in your constraint functions (you can even do network calls, but that is extremely ill-advised since that will drastically slow down score calculation speeds).<p>Disclosure: I work for Timefold.</p>
]]></description><pubDate>Mon, 01 Dec 2025 04:19:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=46103429</link><dc:creator>cchianel</dc:creator><comments>https://news.ycombinator.com/item?id=46103429</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46103429</guid></item><item><title><![CDATA[New comment by cchianel in "Algorithms for Optimization [pdf]"]]></title><description><![CDATA[
<p>Some additional optimization resources (for metaheuristics, where you only have the objective/score function and no derivative):<p>- "Essentials of Metaheuristics" by Sean Luke <a href="https://cs.gmu.edu/~sean/book/metaheuristics/" rel="nofollow">https://cs.gmu.edu/~sean/book/metaheuristics/</a><p>- "Clever Algorithms" by Jason Brownlee <a href="https://cleveralgorithms.com/" rel="nofollow">https://cleveralgorithms.com/</a><p>Timefold uses the metaheuristic algorithms in these books (Tabu Search, Late Acceptance, Simulated Annealing, etc.) to find near-optimal solutions quickly from a score function (typically defined in a Java stream-like/SQL-like syntax so score calculation can be done incrementally to improve score calculation speed).<p>You can see simplified diagrams of these algorithms in action in Timefold's docs: <a href="https://docs.timefold.ai/timefold-solver/latest/optimization-algorithms/local-search" rel="nofollow">https://docs.timefold.ai/timefold-solver/latest/optimization...</a>.<p>Disclosure: I work for Timefold.</p>
]]></description><pubDate>Mon, 01 Dec 2025 04:12:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=46103399</link><dc:creator>cchianel</dc:creator><comments>https://news.ycombinator.com/item?id=46103399</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46103399</guid></item><item><title><![CDATA[New comment by cchianel in "A Thousand Tiny Optimisations"]]></title><description><![CDATA[
<p>Games on old consoles such as Nintendo 64 and the Playstation One use a variety of tricks to be playable on limited hardware. For the specific example of Final Fantasy VII, there is one trick that allows you to skip almost the entirely of Disk 1 once you reach the open world. In particular, the trick involves a use-after-free in the world collision cache (fun fact: a lot of speedrun techniques are security flaws!). This video explains it without spoilers: <a href="https://www.youtube.com/watch?v=llQnrUHS0yA" rel="nofollow">https://www.youtube.com/watch?v=llQnrUHS0yA</a> ; a better video is <a href="https://www.youtube.com/watch?v=JHn6jpKsYCk" rel="nofollow">https://www.youtube.com/watch?v=JHn6jpKsYCk</a>, but that might have minor spoilers IIRC (Note: this trick is relatively new, and probably not used in that video).</p>
]]></description><pubDate>Wed, 11 Jun 2025 00:53:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=44243162</link><dc:creator>cchianel</dc:creator><comments>https://news.ycombinator.com/item?id=44243162</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44243162</guid></item><item><title><![CDATA[New comment by cchianel in "A Thousand Tiny Optimisations"]]></title><description><![CDATA[
<p>Related: Archipelago (<a href="https://archipelago.gg/" rel="nofollow">https://archipelago.gg/</a>), a framework that randomizes multiple games, across multiple clients. For example, Player A is playing Mario 64, and Player B is playing Pokemon Red. Some items from Player B's Pokemon Red (such as the Rock Smash HM) would be in Player A's Mario 64 world (for example, from collecting a Star). As a result, the two players need to coordinate and help each other progress.<p>Archipelago GitHub: <a href="https://github.com/ArchipelagoMW/Archipelago">https://github.com/ArchipelagoMW/Archipelago</a></p>
]]></description><pubDate>Wed, 11 Jun 2025 00:40:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=44243082</link><dc:creator>cchianel</dc:creator><comments>https://news.ycombinator.com/item?id=44243082</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44243082</guid></item><item><title><![CDATA[New comment by cchianel in "Solving the local optima problem – NQueens"]]></title><description><![CDATA[
<p>N-Queens can be solved (find a single valid solution) in polynomial time [1]. That being said, Local Search is a powerful technique that can solve a lot of problems such as employee scheduling, vehicle routing, or maintenance scheduling. You might want to take a look at Timefold [2], which is a local search solver that can be configured to use simulated annealing, tabu search, late acceptance and other local search algorithms. Note: I work for Timefold.<p>[1] <a href="https://stackoverflow.com/a/13109557" rel="nofollow">https://stackoverflow.com/a/13109557</a><p>[2] <a href="https://solver.timefold.ai/" rel="nofollow">https://solver.timefold.ai/</a></p>
]]></description><pubDate>Tue, 20 May 2025 17:09:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=44043727</link><dc:creator>cchianel</dc:creator><comments>https://news.ycombinator.com/item?id=44043727</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44043727</guid></item><item><title><![CDATA[New comment by cchianel in "Show HN: I built a hardware processor that runs Python"]]></title><description><![CDATA[
<p>CPython bytecode changes behaviour for no reason and very suddenly, so you will be vulnerable to changes in Python language versions. A few from the top of my head:<p>- In Python 3.10, jumps changed from absolute indices to relative indices<p>- In Python 3.11, cell variables index is calculated differently for cell variables corresponding to parameters and cell variables corresponding to local variables<p>- In Python 3.11, MAKE_FUNCTION has the code object at the TOS instead of the qualified name of the function<p>For what it's worth, I created a detailed behaviour of each opcode (along with example Python sources) here: <a href="https://github.com/TimefoldAI/timefold-solver/blob/main/python/jpyinterpreter/developer-docs/src/modules/ROOT/pages/opcodes/opcodes.adoc">https://github.com/TimefoldAI/timefold-solver/blob/main/pyth...</a> (for up to Python 3.11).</p>
]]></description><pubDate>Mon, 28 Apr 2025 17:19:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=43823776</link><dc:creator>cchianel</dc:creator><comments>https://news.ycombinator.com/item?id=43823776</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43823776</guid></item><item><title><![CDATA[New comment by cchianel in "Show HN: I built a hardware processor that runs Python"]]></title><description><![CDATA[
<p>The primary reason, in my opinion, is the vast majority of Python libraries lack type annotations (this includes the standard library). Without type annotations, there is very little for a non-JIT compiler to optimize, since:<p>- The vast majority of code generation would have to be dynamic dispatches, which would not be too different from CPython's bytecode.<p>- Types are dynamic; the methods on a type can change at runtime due to monkey patching. As a result, the compiler must be able to "recompile" a type at runtime (and thus, you cannot ship optimized target files).<p>- There are multiple ways every single operation in Python might be called; for instance `a.b` either does a __dict__ lookup or a descriptor lookup, and you don't know which method is used unless you know the type (and if that type is monkeypatched, then the method that called might change).<p>A JIT compiler might be able to optimize some of these cases (observing what is the actual type used), but a JIT compiler can use the source file/be included in the CPython interpreter.</p>
]]></description><pubDate>Mon, 28 Apr 2025 13:15:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=43821211</link><dc:creator>cchianel</dc:creator><comments>https://news.ycombinator.com/item?id=43821211</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43821211</guid></item><item><title><![CDATA[New comment by cchianel in "Show HN: I built a hardware processor that runs Python"]]></title><description><![CDATA[
<p>As someone who did a CPython Bytecode → Java bytecode translator (<a href="https://timefold.ai/blog/java-vs-python-speed" rel="nofollow">https://timefold.ai/blog/java-vs-python-speed</a>), I strongly recommend against the CPython Bytecode → PySM Assembly step:<p>- CPython Bytecode is far from stable; it changes every version, sometimes changing the behaviour of existing bytecodes. As a result, you are pinned to a specific version of Python unless you make multiple translators.<p>- CPython Bytecode is poorly documented, with some descriptions being misleading/incorrect.<p>- CPython Bytecode requires restoring the stack on exception, since it keeps a loop iterator on the stack instead of in a local variable.<p>I recommend instead doing CPython AST → PySM Assembly. CPython AST is significantly more stable.</p>
]]></description><pubDate>Mon, 28 Apr 2025 12:54:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=43820991</link><dc:creator>cchianel</dc:creator><comments>https://news.ycombinator.com/item?id=43820991</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43820991</guid></item><item><title><![CDATA[New comment by cchianel in "I built an AI company to save my open source project"]]></title><description><![CDATA[
<p>(Disclosure: I work for Timefold)
OR Tools is a MIPS/SAT solver, while Timefold is a meta-heuristic solver.
As a result, the developer experience is quite different:<p>- In OR Tools, you need to write your constraints as mathematical equations. You can only provide your own function in a few scenarios, such as calculating the distance between two points. This allows OR Tools to eliminate symmetries and calculate gradients to improve its solution search. In Timefold, your constraints are treated like a black box, so your constraints can use any function and call any library you like (although you shouldn't do things like IO in them). However, since said constraints are a black box, Timefold is unable to eliminate symmetries or calculate gradients.<p>- In OR Tools, you have very little control in how it does its search. At most, you can select what algorithm/strategy it uses. In Timefold, you have a ton of control in how it does its search: from what moves its tries to adding your own custom phases. If none are configured, it will use sensible defaults. That being said, certain problems strongly benefit from custom moves.<p>- In OR Tools, you do not need to create custom classes; you create variables using methods on their domain model classes. In Timefold, you need to define your domain model by creating your own classes. This adds initial complexity, but it makes the code more readable: instead of your variable being an int, it is an Employee or a Visit. That being said, it can be difficult for someone to design an initial model, since it requires an understanding of the problem they are solving.<p>All in all, when using Timefold, you need to think in a more declarative approach (think Prolog, SQL, etc). For example, let say you have a room assignment problem where a teacher can only teach at a single room at once, and minimize room changes. In Timefold, it may look like this:<p><pre><code>    # Typically would be stored in a parameterization object, but a global var
    # is used for brevity
    teacher_count = 3
    
    @planning_entity
    @dataclass
    class Room:
        id: Annotated[int, PlanningId]
        date: int
        name: str
        # This can also be a Teacher object, but str is used for brevity here
        teacher: Annotated[str, PlanningVariable] = field(default=None)
    
    
    @planning_solution
    @dataclass
    class Timetable:
        rooms: Annotated[list[Room], PlanningEntityCollectionProperty]
        teachers: Annotated[list[str], ValueRangeProvider]
        score: Annotated[HardSoftScore, PlanningScore] = field(default=None)


    @constraint_provider
    def timetable_constraints(cf: ConstraintFactory):
        return [
            cf.for_each_unique_pair(Room, Joiners.equal(lambda room: room.date), Joiners.equal(lambda room: room.teacher))
              .penalize(HardSoftScore.ONE_HARD)
              .as_constraint('Teacher time conflict'),
            cf.for_each_unique_pair(Room, Joiners.equal(lambda room: room.teacher))
              .filter(lambda a, b: a.name != b.name)
              .penalize(HardSoftScore.ONE_SOFT)
              .as_constraint('Minimize room change')
        ]


    solver_config = SolverConfig(
        solution_class=Timetable,
        entity_class_list=[Room],
        score_director_factory_config=ScoreDirectorFactoryConfig(
            constraint_provider_function=timetable_constraints
        ),
        termination_config=TerminationConfig(
            spent_limit=Duration(seconds=5)
        )
    )

    solver_factory = SolverFactory.create(solver_config)
    solver = solver_factory.build_solver()
    problem: Timetable = build_problem()
    solver.solve(problem)
</code></pre>
Compared to OR Tools:<p><pre><code>    teacher_count = 3
    problem: Timetable = build_problem()
    model = cp_model.CpModel()
    objective = []
    room_vars = [cp_model.model.new_int_var(0, teacher_count - 1, f'room_assignment_{i}' for i in range(len(problem.rooms)))]
    room_vars_by_date = {date: [room_vars[i] for i in range(len(problem.rooms)) if problem.rooms[i].date == date] for date in {room.date for room in problem.rooms}}
    room_vars_by_name = {name: [room_vars[i] for i in range(len(problem.rooms)) if problem.rooms[i].name == name] for name in {room.name for room in problem.rooms}}
    for date, date_vars in room_vars_by_date.items():
        model.AddAllDifferent(date_vars)
    for name, name_vars in room_vars_by_name.items():
        for assignment_1 in names_vars:
            for assignment_2 in names_vars:
                if assignment_1 is not assignment_2:
                    objective.add(assignment_1 != assignment_2)
    
    model.minimize(sum(objective))
    solver = cp_model.CpSolver()
    status = solver.solve(model)</code></pre></p>
]]></description><pubDate>Fri, 14 Feb 2025 18:00:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=43051112</link><dc:creator>cchianel</dc:creator><comments>https://news.ycombinator.com/item?id=43051112</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43051112</guid></item><item><title><![CDATA[New comment by cchianel in "Tiny JITs for a Faster FFI"]]></title><description><![CDATA[
<p>IPC methods were actually used when constructing the foreign API prototype, since if you do not use JPype, the JVM must be launched in its own process. The IPC methods were used on the API level, with the JVM starting its own CPython interpreter, with CPython and Java using `cloudpickle` to send each other functions/objects.<p>Using IPC for all internal calls would probably take significant overhead; the user functions are typically small (think `lambda shift: shift.date in employee.unavailable_dates` or `lambda lesson: lesson.teacher`). Depending on how many constraints you have and how complicated your domain model is, there could be potentially hundreds of context switches for a single score calculation. It might be worth prototyping though.</p>
]]></description><pubDate>Thu, 13 Feb 2025 13:36:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=43035656</link><dc:creator>cchianel</dc:creator><comments>https://news.ycombinator.com/item?id=43035656</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43035656</guid></item><item><title><![CDATA[New comment by cchianel in "Tiny JITs for a Faster FFI"]]></title><description><![CDATA[
<p>JNI/the new Foreign FFI communicate with CPython via CPython's C API. The primary issue is getting the garbage collectors to work with each other. The Java solver works by repeatedly calling user defined functions when calculating the score. As a result:<p>- The Java side needs to store opaque Python pointers which may have no references on the CPython side.<p>- The CPython side need to store generated proxies for some Java objects (the result of constraint collectors, which are basically aggregations of a solution's data).<p>Solving runs a long time, typically at least a hour (although you can modify how long it runs for). If we don't free memory (by releasing the opaque Python Pointer return values), we will quickly run out of memory after a couple of minutes. The only way to free memory on the Java side is to close the arena holding the opaque Python pointer. However, when that arena is closed, its memory is zeroed out to prevent use-after-free. As a result, if CPython haven't garbage collected that pointer yet, it will cause a segmentation fault on the next CPython garbage collection cycle.<p>JPype (a CPython -> Java bridge) does dark magic to link the JVM's and CPython's garbage collector, but has performance issues when calling a CPython function inside a Java function, since its proxies have to do a lot of work. Even GraalPy, where Python is ran inside a JVM, has performance issues when Python calls Java code which calls Python code.</p>
]]></description><pubDate>Thu, 13 Feb 2025 13:21:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=43035519</link><dc:creator>cchianel</dc:creator><comments>https://news.ycombinator.com/item?id=43035519</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43035519</guid></item><item><title><![CDATA[New comment by cchianel in "Tiny JITs for a Faster FFI"]]></title><description><![CDATA[
<p>I had to deal with a lot of FFI to enable a Java Constraint Solver (Timefold) to call functions defined in CPython. In my experience, most of the performance problems from FFI come from using proxies to communicate between the host and foreign language.<p>A direct FFI call using JNI or the new foreign interface is fast, and has roughly the same speed as calling a Java method directly. Alas, the CPython and Java garbage collectors do not play nice, and require black magic in order to keep them in sync.<p>On the other hand, using proxies (such as in JPype or GraalPy) cause a significant performance overhead, since the parameters and return values need to be converted, and might cause additional FFI calls (in the other direction). The fun thing is if you pass a CPython object to Java, Java has a proxy to the CPython object. And if you pass that proxy back to CPython, a proxy to that proxy is created instead of unwrapping it. The result: JPype proxies are 1402% slower than calling CPython directly using FFI, and GraalPy proxies are 453% slower than calling CPython directly using FFI.<p>What I ultimately end up doing is translating CPython bytecode into Java bytecode, and generating Java data structures corresponding to the CPython classes used. As a result, I got a 100x speedup compared to using proxies. (Side note: if you are thinking about translating/reading CPython bytecode, don't; it is highly unstable, poorly documented, and its VM has several quirks that make it hard to map directly to other bytecodes).<p>For more details, you can see my blog post on the subject: <a href="https://timefold.ai/blog/java-vs-python-speed" rel="nofollow">https://timefold.ai/blog/java-vs-python-speed</a></p>
]]></description><pubDate>Thu, 13 Feb 2025 05:51:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=43033064</link><dc:creator>cchianel</dc:creator><comments>https://news.ycombinator.com/item?id=43033064</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43033064</guid></item><item><title><![CDATA[New comment by cchianel in "Decorator JITs: Python as a DSL"]]></title><description><![CDATA[
<p>I had the misfortune of translating CPython bytecode to Java bytecode, and I do not wish that experience on anyone:<p>- CPython's bytecode is <i>extremely</i> unstable. Not only do new opcodes are added/removed each release, the meaning of <i>existing</i> opcodes can change. For instance the meaning for the argument to <i>JUMP_IF_FALSE_OR_POP</i> changes depending on the CPython version; in CPython 3.10 and below, it is an absolute address, in CPython 3.11 and above, it a relative address.<p>- The documentation for the bytecode in <i>dis</i> tends to be outdated or outright wrong. I often had to analyze the generated bytecode to figure out what each opcode means (and then submit a corresponding PR to update said documentation). Moreover, it assumes you know how the inner details of CPython work, from the descriptor protocol to how binary operations are implemented (each of which are about 30 lines functions when written in Python).<p>- CPython's bytecode is extremely atypical. For instance, a for-loop keeps its iterator on the stack instead of storing it in a synthetic variable. As a result, when an exception occurs inside a for-loop, instead of the stack containing only the exception, it will also contain the for-loop iterator.<p>As for why I did this, I have Java calling CPython in a hot loop. Although direct FFI is fast, it causes a memory leak due to Java's and Python's Garbage collectors needing to track each other's objects. When using JPype or GraalPy, the overhead of calling Python in a Java hot-loop is massive; I got a 100x speedup from translating the CPython bytecode to Java bytecode with identical behaviour (details can be found in my blog post: <a href="https://timefold.ai/blog/java-vs-python-speed" rel="nofollow">https://timefold.ai/blog/java-vs-python-speed</a>).<p>I <i>strongly</i> recommend using the AST instead (although there are no backward comparability guarantees with AST, it is far less likely to break between versions).</p>
]]></description><pubDate>Tue, 04 Feb 2025 04:38:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=42928027</link><dc:creator>cchianel</dc:creator><comments>https://news.ycombinator.com/item?id=42928027</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42928027</guid></item><item><title><![CDATA[New comment by cchianel in "What did Ada Lovelace's program actually do? (2018)"]]></title><description><![CDATA[
<p>John Walker built a virtual machine for the Babbage's instruction set, and it has a web emulator: <a href="https://fourmilab.ch/babbage/emulator.html" rel="nofollow">https://fourmilab.ch/babbage/emulator.html</a>.<p>I don't think Ada program is available as an example though, so you'll need to input it manually.<p>Fun fact: my compiler course project was creating a C compiler targeting the emulator <a href="https://github.com/Christopher-Chianelli/ccpa">https://github.com/Christopher-Chianelli/ccpa</a> (warning, said code is terrible).</p>
]]></description><pubDate>Mon, 16 Dec 2024 19:55:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=42434702</link><dc:creator>cchianel</dc:creator><comments>https://news.ycombinator.com/item?id=42434702</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42434702</guid></item></channel></rss>