<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: zebproj</title><link>https://news.ycombinator.com/user?id=zebproj</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 06 May 2026 20:36:50 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=zebproj" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by zebproj in "Virtual violin produces realistic sounds"]]></title><description><![CDATA[
<p>The article makes it sound like this is a very new idea, but physical models of musical instruments, including the violin, have been around for over 40 years. Daisy Bell, one of the earliest pieces of computer music, was sung in 1961 by a physical model of the human singing voice built from measurements of a human vocal tract.<p>Julius Smith wrote a pretty comprehensive textbook on building physical models of musical instruments, available online. Here, for example, is the chapter on modeling bowed strings: <a href="https://ccrma.stanford.edu/~jos/pasp/Bowed_Strings.html" rel="nofollow">https://ccrma.stanford.edu/~jos/pasp/Bowed_Strings.html</a></p>
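<p>To show how little code a basic string physical model needs, here's a minimal plucked-string (Karplus-Strong) sketch in Rust. To be clear, this is a toy of my own, not the bowed-string waveguide from that chapter: a bowed string needs a nonlinear bow-friction model, while this is just a delay line with a lowpass in the feedback loop.</p>
<pre><code>// Minimal Karplus-Strong plucked string: a noise burst circulating
// in a delay line, with a two-point average as the loop's lowpass.
fn pluck(sample_rate: f32, freq: f32, seconds: f32) -&gt; Vec&lt;f32&gt; {
    let period = (sample_rate / freq) as usize;
    let total = (sample_rate * seconds) as usize;
    // Fill the delay line with a noise burst (cheap LCG, no deps).
    let mut seed: u32 = 0x1234_5678;
    let mut delay: Vec&lt;f32&gt; = (0..period)
        .map(|_| {
            seed = seed.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
            (seed &gt;&gt; 16) as f32 / 32768.0 - 1.0
        })
        .collect();
    let mut out = Vec::with_capacity(total);
    let mut idx = 0;
    for _ in 0..total {
        let cur = delay[idx];
        let next = delay[(idx + 1) % period];
        // Averaging neighbors damps the highs; 0.996 sets the decay time.
        delay[idx] = 0.996 * 0.5 * (cur + next);
        out.push(cur);
        idx = (idx + 1) % period;
    }
    out
}

fn main() {
    let samples = pluck(44_100.0, 440.0, 1.0);
    println!("rendered {} samples; first = {:.4}", samples.len(), samples[0]);
}
</code></pre>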
]]></description><pubDate>Wed, 06 May 2026 11:55:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=48035182</link><dc:creator>zebproj</dc:creator><comments>https://news.ycombinator.com/item?id=48035182</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48035182</guid></item><item><title><![CDATA[New comment by zebproj in "How Brian Eno Created Ambient 1: Music for Airports (2019)"]]></title><description><![CDATA[
<p>I often use the general algorithm behind "2/1" as my "hello world" when I'm building new generative music systems. You don't need many ingredients to set it up, and it yields surprisingly decent-sounding results.<p>The most recent one [0] was built while I was playing around with Rust, WASM, and WebAudio. (You'll need to click or tap somewhere to start the sound.)<p>0: <a href="https://pbat.ch/isorhythms/" rel="nofollow">https://pbat.ch/isorhythms/</a></p>
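<p>For the curious, the core of the recipe fits in a few lines: several voices, each looping a single note at its own period, with the periods chosen so the voices drift in and out of phase. Below is a rough Rust sketch that just prints the resulting event timeline; the note names, periods, and offsets are placeholders, not the actual loop lengths from "2/1".</p>
<pre><code>// Each voice loops one note at its own period; the periods are
// chosen so the voices slowly drift in and out of phase.
struct Voice { name: &amp;'static str, period: f32, offset: f32 }

fn main() {
    let voices = [
        Voice { name: "Ab", period: 17.8, offset: 0.0 },
        Voice { name: "C",  period: 20.1, offset: 3.1 },
        Voice { name: "Db", period: 31.8, offset: 9.0 },
        Voice { name: "F",  period: 19.6, offset: 6.2 },
    ];
    // Collect every note-on event in the first two minutes.
    let mut events: Vec&lt;(f32, &amp;str)&gt; = Vec::new();
    for v in &amp;voices {
        let mut t = v.offset;
        while t &lt; 120.0 {
            events.push((t, v.name));
            t += v.period;
        }
    }
    events.sort_by(|a, b| a.0.partial_cmp(&amp;b.0).unwrap());
    for (t, name) in events {
        println!("{t:7.2}s  {name}");
    }
}
</code></pre>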
]]></description><pubDate>Tue, 02 Dec 2025 12:17:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=46120494</link><dc:creator>zebproj</dc:creator><comments>https://news.ycombinator.com/item?id=46120494</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46120494</guid></item><item><title><![CDATA[Ask HN: How many of you are working in tech without a STEM degree?]]></title><description><![CDATA[
<p>As someone without a STEM degree and who is largely self-taught, I'm interested in hearing about similar experiences. What is your story? What are you doing now? How long have you been doing it? etc, etc.</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44658213">https://news.ycombinator.com/item?id=44658213</a></p>
<p>Points: 53</p>
<p># Comments: 82</p>
]]></description><pubDate>Wed, 23 Jul 2025 11:58:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=44658213</link><dc:creator>zebproj</dc:creator><comments>https://news.ycombinator.com/item?id=44658213</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44658213</guid></item><item><title><![CDATA[New comment by zebproj in "Show HN: A singing synthesizer for the browser with automatic 3-part harmony"]]></title><description><![CDATA[
<p>Greetings,<p>What a beautiful idea. Sadly, I do not think I currently have the skills required to build such a tool.<p>The underlying algorithms and vocal models I'm using here are just good enough to get some singing vowels working. You'd need a far more complex model to simulate the turbulent airflow required for a cough.<p>If you suspend disbelief and allow for more abstract sounds, I believe you can craft sounds with a similar emotional impact. A few years ago, I made some non-verbal goblin sounds [0] from very simple synthesizer components and some well-placed control curves. Even though they don't sound realistic, the character definitely comes through.<p>0: <a href="https://pbat.ch/gestlings/goblins" rel="nofollow">https://pbat.ch/gestlings/goblins</a></p>
]]></description><pubDate>Fri, 27 Dec 2024 02:17:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=42519444</link><dc:creator>zebproj</dc:creator><comments>https://news.ycombinator.com/item?id=42519444</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42519444</guid></item><item><title><![CDATA[New comment by zebproj in "Show HN: A singing synthesizer for the browser with automatic 3-part harmony"]]></title><description><![CDATA[
<p>Thanks everyone for the suggestions and kind words.<p>Some details:<p>The source code for this project can be found on GitHub [0].<p>I am using an AudioWorklet node with custom DSP written in Rust and compiled to WebAssembly. Graphics are just done with the Canvas API. The voice leading is done algorithmically, using a state machine with some heuristics.<p>The underlying DSP algorithm is a physical model of the human voice, similar to the model you'd find in Pink Trombone [1], but with some added improvements. The DSP code lives in a small crate [2] for singing synthesis that I've been building on top of my previous work.<p>0: <a href="https://github.com/paulBatchelor/trio">https://github.com/paulBatchelor/trio</a><p>1: <a href="https://dood.al/pinktrombone/" rel="nofollow">https://dood.al/pinktrombone/</a><p>2: <a href="https://github.com/PaulBatchelor/voxbox">https://github.com/PaulBatchelor/voxbox</a></p>
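<p>For anyone curious what the Rust-to-WASM boundary in a setup like this roughly looks like, here is a hypothetical sketch (not the actual trio/voxbox code; every name in it is made up): the JS side allocates a buffer in WASM memory and calls an exported render function once per quantum.</p>
<pre><code>// Hypothetical Rust/WASM DSP kernel; every name here is invented,
// not taken from the trio or voxbox sources. The JS side would
// allocate `out_ptr` in WASM memory, call `render` once per
// 128-frame quantum, and carry the returned phase between calls.
#[no_mangle]
pub extern "C" fn render(
    out_ptr: *mut f32,
    frames: usize,
    freq: f32,
    srate: f32,
    mut phase: f32,
) -&gt; f32 {
    let out = unsafe { std::slice::from_raw_parts_mut(out_ptr, frames) };
    for s in out.iter_mut() {
        // Placeholder voice: a bare sine. A real vocal model would
        // run a glottal source through a waveguide tract here.
        *s = (std::f32::consts::TAU * phase).sin() * 0.2;
        phase += freq / srate;
        if phase &gt;= 1.0 {
            phase -= 1.0;
        }
    }
    phase // hand the phase back for the next quantum
}

fn main() {
    // Native smoke test of the same export.
    let mut buf = [0.0f32; 8];
    let next = render(buf.as_mut_ptr(), buf.len(), 440.0, 48_000.0, 0.0);
    println!("first samples: {:?}, next phase: {next:.5}", &amp;buf[..4]);
}
</code></pre>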
]]></description><pubDate>Thu, 26 Dec 2024 19:10:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=42517060</link><dc:creator>zebproj</dc:creator><comments>https://news.ycombinator.com/item?id=42517060</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42517060</guid></item><item><title><![CDATA[New comment by zebproj in "Show HN: A singing synthesizer for the browser with automatic 3-part harmony"]]></title><description><![CDATA[
<p>Thanks, I'll look into it</p>
]]></description><pubDate>Thu, 26 Dec 2024 19:05:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=42517028</link><dc:creator>zebproj</dc:creator><comments>https://news.ycombinator.com/item?id=42517028</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42517028</guid></item><item><title><![CDATA[New comment by zebproj in "Show HN: A singing synthesizer for the browser with automatic 3-part harmony"]]></title><description><![CDATA[
<p>A tutorial would be helpful.<p>Holding down a note and waiting will cause a second, then a third note to appear. When you move your held note to another pitch, the other voices will follow, but with a bit of delay. This is known as staggered voice leading, and it produces interesting "in-between" chords (see the sketch below).</p>
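<p>In code terms, the staggering amounts to giving each follower voice its own lag before it re-targets. A toy Rust sketch, with invented timings and intervals:</p>
<pre><code>// When the lead voice moves, each follower re-targets only after
// its own lag, so "in-between" chords sound during the transition.
struct Follower { pitch: i32, lag: f32 } // lag in seconds

fn step(lead: i32, followers: &amp;mut [Follower], since_move: f32) {
    for f in followers.iter_mut() {
        if since_move &gt;= f.lag {
            // Snap to a chord tone above the lead (a 3rd or a 5th).
            f.pitch = lead + if f.lag &lt; 0.5 { 4 } else { 7 };
        }
    }
}

fn main() {
    // MIDI note numbers; the lead just moved from C (60) to D (62).
    let mut fs = [Follower { pitch: 64, lag: 0.3 },
                  Follower { pitch: 67, lag: 0.8 }];
    let lead = 62;
    for t in [0.0f32, 0.4, 1.0] {
        step(lead, &amp;mut fs, t);
        println!("t={t:.1}s  chord = {} {} {}", lead, fs[0].pitch, fs[1].pitch);
    }
}
</code></pre>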
]]></description><pubDate>Thu, 26 Dec 2024 19:04:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=42517027</link><dc:creator>zebproj</dc:creator><comments>https://news.ycombinator.com/item?id=42517027</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42517027</guid></item><item><title><![CDATA[Show HN: A singing synthesizer for the browser with automatic 3-part harmony]]></title><description><![CDATA[
<p>Article URL: <a href="https://pbat.ch/recurse/demos/trio/">https://pbat.ch/recurse/demos/trio/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42513276">https://news.ycombinator.com/item?id=42513276</a></p>
<p>Points: 208</p>
<p># Comments: 34</p>
]]></description><pubDate>Thu, 26 Dec 2024 05:14:00 +0000</pubDate><link>https://pbat.ch/recurse/demos/trio/</link><dc:creator>zebproj</dc:creator><comments>https://news.ycombinator.com/item?id=42513276</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42513276</guid></item><item><title><![CDATA[Show HN: A singing synthesizer for the browser with algorithmic 3-part harmony]]></title><description><![CDATA[
<p>Hello HN!<p>This is a demo I built during my batch at the Recurse Center. It's built using WebAudio, WebAssembly (via Rust), and the Canvas API. The source code for this demo, along with my other vocal-synthesis-related RC demos [0], can be found in the monorepo [1] where I've been dumping all my RC work and logs.<p>The sound is generated by a physical model of the human singing voice, via a work-in-progress project called VoxBox [2].<p>The harmonization is done using something that kind of resembles a Markov chain. Only, instead of weighted probabilities and randomness, I use a selection heuristic that chooses a chord based on how often it has been used and how much voice movement is required (sketched below).<p>Thanks for reading! Happy to answer any other questions.<p>0: <a href="https://pbat.ch/recurse/demos/" rel="nofollow">https://pbat.ch/recurse/demos/</a><p>1: <a href="https://github.com/PaulBatchelor/Recurse/tree/main/scratch/trio">https://github.com/PaulBatchelor/Recurse/tree/main/scratch/t...</a><p>2: <a href="https://github.com/paulBatchelor/voxbox">https://github.com/paulBatchelor/voxbox</a></p>
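<p>Here's a toy Rust sketch of the shape of that heuristic, with invented weights and candidate chords (the real thing lives in the repo above):</p>
<pre><code>// Score each candidate chord by total voice movement plus a
// penalty for how often it has been chosen before; take the minimum.
fn voice_movement(from: &amp;[i32; 3], to: &amp;[i32; 3]) -&gt; i32 {
    from.iter().zip(to).map(|(a, b)| (a - b).abs()).sum()
}

fn pick_chord&lt;'a&gt;(
    current: &amp;[i32; 3],
    candidates: &amp;'a [[i32; 3]],
    usage: &amp;[u32],
) -&gt; &amp;'a [i32; 3] {
    candidates
        .iter()
        .enumerate()
        .min_by_key(|(i, c)| voice_movement(current, c) + 3 * usage[*i] as i32)
        .map(|(_, c)| c)
        .unwrap() // assumes a non-empty candidate list
}

fn main() {
    let current = [60, 64, 67]; // C major, as MIDI notes
    let candidates = [[57, 60, 64], [59, 62, 67], [60, 65, 69]];
    let usage = [5, 0, 2]; // how often each candidate was picked so far
    println!("next chord: {:?}", pick_chord(&amp;current, &amp;candidates, &amp;usage));
}
</code></pre>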
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=41152848">https://news.ycombinator.com/item?id=41152848</a></p>
<p>Points: 3</p>
<p># Comments: 1</p>
]]></description><pubDate>Sun, 04 Aug 2024 11:35:06 +0000</pubDate><link>https://pbat.ch/recurse/demos/trio/</link><dc:creator>zebproj</dc:creator><comments>https://news.ycombinator.com/item?id=41152848</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41152848</guid></item><item><title><![CDATA[New comment by zebproj in "Csound"]]></title><description><![CDATA[
<p>Oh yeah, I made that.<p>Sporth is a stack-based language I wrote a few years ago. Stack-based languages are a great way to build up sound structures, and I highly recommend giving it a try (see the sketch below).<p>Chorth may need some fixes before it can run again. I haven't looked at it in a while, but I had a lot of fun using it when I was in SLOrk.</p>
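<p>To illustrate why stacks map so nicely onto patching, here is a toy Rust sketch of the idea (not Sporth's actual implementation): numbers push values, and each op pops its inputs and pushes a signal, so an expression reads like wiring a patch from left to right.</p>
<pre><code>// Numbers push values; each op pops its inputs and pushes a signal,
// so "440 0.5 sine" reads like wiring a patch from left to right.
enum Val {
    Num(f32),
    Sig(String), // a textual stand-in for a signal-graph node
}

fn pop_num(stack: &amp;mut Vec&lt;Val&gt;) -&gt; Option&lt;f32&gt; {
    match stack.pop()? {
        Val::Num(n) =&gt; Some(n),
        Val::Sig(_) =&gt; None,
    }
}

fn eval(src: &amp;str) -&gt; Option&lt;Val&gt; {
    let mut stack: Vec&lt;Val&gt; = Vec::new();
    for tok in src.split_whitespace() {
        match tok {
            "sine" =&gt; {
                // sine(freq, amp): pop args in reverse order.
                let amp = pop_num(&amp;mut stack)?;
                let freq = pop_num(&amp;mut stack)?;
                stack.push(Val::Sig(format!("sine({freq}, {amp})")));
            }
            _ =&gt; stack.push(Val::Num(tok.parse().ok()?)),
        }
    }
    stack.pop()
}

fn main() {
    if let Some(Val::Sig(s)) = eval("440 0.5 sine") {
        println!("patch: {s}"); // patch: sine(440, 0.5)
    }
}
</code></pre>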
]]></description><pubDate>Sat, 02 Sep 2023 22:44:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=37366143</link><dc:creator>zebproj</dc:creator><comments>https://news.ycombinator.com/item?id=37366143</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37366143</guid></item><item><title><![CDATA[New comment by zebproj in "Csound"]]></title><description><![CDATA[
<p>If you compare codebases, SuperCollider is definitely the more "modern" of the two. SC is written in a reasonably modern version of C++ and has gone through significant refactoring over the years. Csound is mostly implemented in C, with some of the newer bits written in C++, and many parts of it have been virtually untouched since the 90s.<p>Syntax-wise, Csound very closely resembles the MUSIC-N languages used by early computer musicians in the 60s. "Trapped in Convert" by Richard Boulanger dates back to 1979, and to this day it runs on the latest version of Csound.<p>Csound and SC are both very capable DSP engines with a good core set of DSP algorithms. You can get a "good" sound out of either if you know what you are doing.<p>I find that people who are more CS-inclined tend to prefer SuperCollider over Csound because it's actually a programming language you can be expressive in. While there have been significant syntax improvements in Csound 6, I'd still call Csound a "text-based synthesizer" rather than a "programming language".<p>That being said, I also think Csound lends itself to those with a more formal background in music. Making an instrument in an Orchestra is just like making a synthesizer patch, and creating events in a Csound score is just like composing notes for an instrument to play.<p>FWIW, I've never managed to get SuperCollider to stick for me. The orchestra/score paradigm of Csound just seems to fit better with how I think about music. It's also easier to render WAV files offline in Csound, which was quite helpful for me.</p>
]]></description><pubDate>Sat, 02 Sep 2023 13:42:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=37361469</link><dc:creator>zebproj</dc:creator><comments>https://news.ycombinator.com/item?id=37361469</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37361469</guid></item><item><title><![CDATA[New comment by zebproj in "Csound"]]></title><description><![CDATA[
<p>You might enjoy my project sndkit [0]: a collection of DSP algorithms implemented in C, written in a literate programming style, and presented as a static wiki. It also includes a tiny Tcl-like scripting language for building up patches. This track [1] was made entirely using sndkit.<p>0: <a href="https://pbat.ch/sndkit/" rel="nofollow noreferrer">https://pbat.ch/sndkit/</a><p>1: <a href="https://soundcloud.com/patchlore/synthwave" rel="nofollow noreferrer">https://soundcloud.com/patchlore/synthwave</a></p>
]]></description><pubDate>Sat, 02 Sep 2023 12:48:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=37361039</link><dc:creator>zebproj</dc:creator><comments>https://news.ycombinator.com/item?id=37361039</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37361039</guid></item><item><title><![CDATA[New comment by zebproj in "Csound"]]></title><description><![CDATA[
<p>I actually met BT and asked him about this track.<p>While it's mostly written in Csound, he "cheated" with the guitar track, which was a recorded sample brought into Csound.</p>
]]></description><pubDate>Sat, 02 Sep 2023 12:41:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=37360983</link><dc:creator>zebproj</dc:creator><comments>https://news.ycombinator.com/item?id=37360983</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37360983</guid></item><item><title><![CDATA[New comment by zebproj in "An IBM computer learned to sing in 1961"]]></title><description><![CDATA[
<p>See my other comments here for more info about the underlying technology.<p>It <i>is</i> pretty incredible that sophisticated digital physical models of the human vocal tract were being built in the early 60s. This was possible largely thanks to the deep pockets of Bell Labs: a <i>lot</i> of R&amp;D went into the voice and voice transmission.</p>
]]></description><pubDate>Sat, 13 May 2023 00:20:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=35924148</link><dc:creator>zebproj</dc:creator><comments>https://news.ycombinator.com/item?id=35924148</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35924148</guid></item><item><title><![CDATA[New comment by zebproj in "An IBM computer learned to sing in 1961"]]></title><description><![CDATA[
<p>Many of the simpler vocal tract physical models are very similar to the cascaded allpass filter topologies found in LPC speech synthesizers.<p>In general, tract physical models have never sounded all that realistic. The one big thing they have going for them is control. Compared to other speech synthesis techniques, they can be quite malleable. Pink Trombone [1] uses a physical model under the hood. While it's not realistic sounding, the interface is quite compelling.<p>1: <a href="https://dood.al/pinktrombone/" rel="nofollow">https://dood.al/pinktrombone/</a></p>
]]></description><pubDate>Sat, 13 May 2023 00:14:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=35924108</link><dc:creator>zebproj</dc:creator><comments>https://news.ycombinator.com/item?id=35924108</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35924108</guid></item><item><title><![CDATA[New comment by zebproj in "An IBM computer learned to sing in 1961"]]></title><description><![CDATA[
<p>Sort of. Both use articulatory synthesis, which attempts to model speech by breaking it up into components and using coordinated, multi-dimensional, continuous control to perform phonemes (the articulation aspect). The Voder uses analog electronics, while Daisy does it digitally (and without a human performer).<p>The underlying signal processing differs between the two, but both use a source-filter mechanism.</p>
]]></description><pubDate>Fri, 12 May 2023 17:37:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=35919356</link><dc:creator>zebproj</dc:creator><comments>https://news.ycombinator.com/item?id=35919356</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35919356</guid></item><item><title><![CDATA[New comment by zebproj in "An IBM computer learned to sing in 1961"]]></title><description><![CDATA[
<p>> Does that mean that the similarity in sound to formant-based speech synthesis is because they're both using a sawtooth wave, noise, or other relatively simple sound as the raw input?<p>Essentially, yes. Both are known as "source-filter" models. A sawtooth, narrow pulse, or impulse wave is a good approximation of the glottal excitation for the source signal, though many articulatory speech models use a more specialized source model that's analytically derived from real waveforms produced by the glottis. The Liljencrants-Fant derivative glottal waveform model is the most common, but a few others exist.<p>In formant synthesis, the formant frequencies are known ahead of time and are explicitly added to the spectrum using some kind of peak filter. With waveguides, those formants are implicitly created by the shape of the vocal tract, which is approximated as a series of cylindrical tubes with varying diameters (see the sketch below).</p>
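<p>Here's a small Rust sketch of that scattering idea, with made-up tube areas: each junction between adjacent sections gets a reflection coefficient from the change in cross-sectional area, and the traveling waves scatter there on every sample.</p>
<pre><code>// Each junction between adjacent tube sections gets a reflection
// coefficient from the change in cross-sectional area; the traveling
// pressure waves scatter there on every sample.
fn reflection_coeffs(areas: &amp;[f32]) -&gt; Vec&lt;f32&gt; {
    areas.windows(2).map(|w| (w[0] - w[1]) / (w[0] + w[1])).collect()
}

// One Kelly-Lochbaum junction: `right_in` and `left_in` are the
// right-going and left-going waves entering the junction.
fn scatter(k: f32, right_in: f32, left_in: f32) -&gt; (f32, f32) {
    let right_out = (1.0 + k) * right_in - k * left_in;
    let left_out = k * right_in + (1.0 - k) * left_in;
    (right_out, left_out)
}

fn main() {
    // A crude 8-section "tract": narrow at the glottis, open at the lips.
    // These areas are arbitrary, not measured vocal tract data.
    let areas = [0.6, 0.8, 1.0, 1.4, 1.9, 2.2, 2.6, 3.0];
    let ks = reflection_coeffs(&amp;areas);
    println!("reflection coefficients: {ks:?}");
    let (r, l) = scatter(ks[0], 1.0, 0.0); // unit pulse at the first junction
    println!("transmitted: {r:.3}, reflected: {l:.3}");
}
</code></pre>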
]]></description><pubDate>Fri, 12 May 2023 17:31:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=35919284</link><dc:creator>zebproj</dc:creator><comments>https://news.ycombinator.com/item?id=35919284</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35919284</guid></item><item><title><![CDATA[New comment by zebproj in "An IBM computer learned to sing in 1961"]]></title><description><![CDATA[
<p>The singing synthesizer used a surprisingly sophisticated physical model of the human voice [1].<p>The music was most likely created using some variant of MUSIC-N [2], the first family of computer music languages. The syntax and design of Csound [3] were based on MUSIC-N, and I believe the older Csound opcodes were either ported from or based on the unit generators found there.<p>Apparently the sources for MUSIC-V (the last major iteration of the MUSIC language) can be found on GitHub [4], though I haven't tried to run it yet.<p>1: <a href="https://ccrma.stanford.edu/~jos/pasp/Singing_Kelly_Lochbaum_Vocal_Tract.html" rel="nofollow">https://ccrma.stanford.edu/~jos/pasp/Singing_Kelly_Lochbaum_...</a><p>2: <a href="https://en.wikipedia.org/wiki/MUSIC-N" rel="nofollow">https://en.wikipedia.org/wiki/MUSIC-N</a><p>3: <a href="https://en.wikipedia.org/wiki/Csound" rel="nofollow">https://en.wikipedia.org/wiki/Csound</a><p>4: <a href="https://github.com/vlazzarini/MUSICV">https://github.com/vlazzarini/MUSICV</a></p>
]]></description><pubDate>Fri, 12 May 2023 15:46:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=35917840</link><dc:creator>zebproj</dc:creator><comments>https://news.ycombinator.com/item?id=35917840</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35917840</guid></item><item><title><![CDATA[New comment by zebproj in "An IBM computer learned to sing in 1961"]]></title><description><![CDATA[
<p>The neat thing about this particular singing synthesizer is that it used a surprisingly sophisticated (especially for the 60s) physical model of the human vocal tract [1], and was perhaps the first use of physical modeling sound synthesis. Vowel shapes were obtained through physical measurements of an actual vocal tract via x-rays. In this case, they were Russian vowels, but they were close enough for English.<p>While this particular kind of speech synthesis [2] isn't really used anymore, it's still fun to play around with. Pink Trombone [3] is a good example of a fun toy that uses a waveguide physical model, similar to the Kelly-Lochbaum model above. I've adapted some of the DSP in Pink Trombone a few times [4][5][6], and used it in some music [7] and projects [8] of mine.<p>For more in-depth information specifically about singing synthesis (as opposed to general speech synthesis) using waveguide physical models, Perry Cook's dissertation [9] is still considered a seminal work. In the early 2000s, a handful of follow-ups on physically based singing synthesis were done at CCRMA; Hui-Ling Lu's dissertation [10] on glottal source modelling for singing comes to mind.<p>1: <a href="https://ccrma.stanford.edu/~jos/pasp/Singing_Kelly_Lochbaum_Vocal_Tract.html" rel="nofollow">https://ccrma.stanford.edu/~jos/pasp/Singing_Kelly_Lochbaum_...</a><p>2: <a href="https://en.wikipedia.org/wiki/Articulatory_synthesis" rel="nofollow">https://en.wikipedia.org/wiki/Articulatory_synthesis</a><p>3: <a href="https://dood.al/pinktrombone/" rel="nofollow">https://dood.al/pinktrombone/</a><p>4: <a href="https://pbat.ch/proj/voc/" rel="nofollow">https://pbat.ch/proj/voc/</a><p>5: <a href="https://pbat.ch/sndkit/tract/" rel="nofollow">https://pbat.ch/sndkit/tract/</a><p>6: <a href="https://pbat.ch/sndkit/glottis/" rel="nofollow">https://pbat.ch/sndkit/glottis/</a><p>7: <a href="https://soundcloud.com/patchlore/sets/looptober-2021" rel="nofollow">https://soundcloud.com/patchlore/sets/looptober-2021</a><p>8: <a href="https://pbat.ch/wiki/vocshape/" rel="nofollow">https://pbat.ch/wiki/vocshape/</a><p>9: <a href="https://www.cs.princeton.edu/~prc/SingingSynth.html" rel="nofollow">https://www.cs.princeton.edu/~prc/SingingSynth.html</a><p>10: <a href="https://web.archive.org/web/20080725195347/http://ccrma-www.stanford.edu/~vickylu/thesis/index.html" rel="nofollow">https://web.archive.org/web/20080725195347/http://ccrma-www....</a></p>
]]></description><pubDate>Fri, 12 May 2023 15:39:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=35917750</link><dc:creator>zebproj</dc:creator><comments>https://news.ycombinator.com/item?id=35917750</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35917750</guid></item><item><title><![CDATA[New comment by zebproj in "Aubio, a C library for analyzing songs"]]></title><description><![CDATA[
<p>From the website:<p>> Note: aubio is not MIT or BSD licensed. Contact the author if you need it in your commercial product.</p>
]]></description><pubDate>Tue, 21 Sep 2021 22:10:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=28610637</link><dc:creator>zebproj</dc:creator><comments>https://news.ycombinator.com/item?id=28610637</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=28610637</guid></item></channel></rss>