Back in the early 90s I experimented with extremely primitive translations of various data sets into sound. The most basic translation of all is taking a computer file of any provenance (plain text, image, etc.), adding a standard wave file header and ‘calling it a sound file’. For instance, whilst programming and debugging C code during my student days in California, I often ended up staring at core dumps ¹ trying to figure out what had gone wrong. Here’s what one of these sounds like (turn down the volume for safety):

Whilst working on a new piece for Trio Abstrakt this year, and prompted by the lovely photographs of the musicians on their website, I wanted to try this again in a more ‘intelligent’ and flexible manner. Including pictures of the trio transformed into sound files binds the players to the electronics, albeit in an admittedly obscure(d) manner, but composers play such games all the time, most often with numbers, which is also the case here, at the micro-level at least.

algorithm summary

I’ve included some sound files below, using the image at the top of this post in the generation of the first four examples. The Common Lisp (with slippery chicken and CLM) code is included below and has detailed comments which explain the process, but to summarise, the synthesis approach is a form of frequency modulation (hello John!) which functions as follows:

First of all, the image is resized. This affects the sound file’s final duration and speed of spectral development, as the overall sound file duration is determined by the number of pixels in the image. The pixels are then read row by row starting at the top, but rather than restarting the even rows at the left (after reaching the previous odd row’s last pixel on the right) we zig-zag through the rows left-to-right then right-to-left, etc., so as not to create any sudden jumps. Thus we always proceed to an adjacent pixel, even when moving down a row.
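The reading order described above can be sketched in plain Common Lisp, with no dependency on slippery chicken, CLM, or imago. The function names `zig-zag-order` and `pixels->seconds` are hypothetical and not part of image-snd.lsp; the duration helper assumes one sample per pixel, as in the posted code, and a default sampling rate of 48000.

```lisp
;;; Hypothetical helper: list the (row column) visiting order for an image of
;;; the given height and width.
(defun zig-zag-order (height width)
  (loop for row below height
        append (loop for column below width
                     collect (list row
                                   ;; even rows run left-to-right, odd rows
                                   ;; right-to-left
                                   (if (evenp row)
                                       column
                                       (- width 1 column))))))

;;; Hypothetical helper: one sample per pixel, so the duration in seconds is
;;; the pixel count divided by the sampling rate.
(defun pixels->seconds (height width &optional (srate 48000))
  (/ (* height width) (float srate)))
```

For a 2x3 image, `(zig-zag-order 2 3)` returns `((0 0) (0 1) (0 2) (1 2) (1 1) (1 0))`, so the last pixel of the first row is followed by the pixel directly beneath it. Because the posted code applies the square root of its `scale` argument to both width and height, the pixel count, and with it the duration, changes roughly in proportion to `scale`.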

Various standard oscillator types (sine, sawtooth, square, triangle, pulse train), with a user-defined starting frequency, can be used. The red, green, and blue (RGB) values of each pixel, inverted if desired, are summed and passed as the fm-input to the oscillator function; the same sum also scales the amplitude of the resulting sample within, by default, a certain decibel range. The resultant sample vector, after oscillation pixel-by-pixel, is also, by default, normalised and made symmetrical around zero.
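As a rough illustration of the sine-wave case, here is a plain Common Lisp stand-in for the CLM oscillator. It is a sketch under stated assumptions, not the code used for the examples: `db->amp`, `hz->radians-per-sample`, and `fm-sine-from-sums` are hypothetical names, the oscillator is a simple phase accumulator, and the defaults (20Hz base frequency, -12 to 0 dB, maximum RGB sum of 765) mirror those in the listing further down.

```lisp
;;; Hypothetical helper: convert decibels to a linear amplitude.
(defun db->amp (db)
  (expt 10.0 (/ db 20.0)))

;;; Hypothetical helper: convert a frequency in Hz to a phase increment in
;;; radians per sample.
(defun hz->radians-per-sample (hz srate)
  (/ (* 2 pi hz) srate))

;;; Return one sample per RGB sum in SUMS. Each sum is treated as a frequency
;;; offset in Hz added to BASE-FREQ, and also sets the amplitude somewhere
;;; between DB-MIN (black) and DB-MAX (white).
(defun fm-sine-from-sums (sums &key (base-freq 20) (srate 48000)
                                 (db-min -12) (db-max 0) (max-sum 765))
  (let ((phase 0.0d0)
        (amp-min (db->amp db-min))
        (amp-max (db->amp db-max)))
    (loop for sum in sums
          for amp = (+ amp-min (* (/ sum max-sum) (- amp-max amp-min)))
          ;; output the current phase first, then advance it
          collect (* amp (sin phase))
          do (incf phase (hz->radians-per-sample (+ base-freq sum) srate)))))
```

A call such as `(fm-sine-from-sums '(0 765 382 100))` returns four samples. With these defaults a run of black pixels oscillates at the base frequency, whereas white pixels push the instantaneous frequency up by 765Hz, so the spectrum follows the brightness contour of the image.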

Also possible is a ‘basic’ translation, i.e. without an oscillator, that merely maps the RGB sum onto sample values ranging from -1.0 to 1.0. As this means that all black pixels result in sample values of -1.0 (which, because of its absolute value, is essentially the same as the white pixel value of 1.0), another quite basic but more flexible translation is offered, where black correlates to 0 (silence, with no DC offset) and all other RGB sums tend, through interpolation, towards the extremes of 1.0 and -1.0 as they move towards white. In both cases, though, what matters most for the sounding result is how much nearby pixels change colour, as a DC offset (i.e. constant sample value) of 1.0, -1.0, or anything in between will result in silence at the sound output stage.
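Both oscillator-free mappings can be sketched in a few lines of plain Common Lisp. The names `linear-map`, `basic-sample`, and `breakpoint-sample` are hypothetical, and the breakpoint list in the usage note is a made-up illustration rather than the envelope used in the listing further down.

```lisp
;;; Hypothetical helper: linear map of X from [in-min, in-max] to
;;; [out-min, out-max].
(defun linear-map (x in-min in-max out-min out-max)
  (+ out-min (* (/ (- x in-min) (- in-max in-min))
                (- out-max out-min))))

;;; The 'basic' translation: black (0) becomes -1.0, white (765) becomes 1.0.
(defun basic-sample (sum &optional (max-sum 765))
  (linear-map sum 0 max-sum -1.0 1.0))

;;; The more flexible translation: linear interpolation through a flat list of
;;; breakpoints (x1 y1 x2 y2 ...) with ascending x values. Sums outside the
;;; breakpoint range are clamped to the first or last y value.
(defun breakpoint-sample (sum breakpoints)
  (loop for (x1 y1 x2 y2) on breakpoints by #'cddr
        when (or (null x2) (<= sum x2))
          return (if (or (null x2) (<= sum x1))
                     y1
                     (linear-map sum x1 x2 y1 y2))))
```

With an illustrative envelope such as `'(0 0.0 100 0.5 200 -0.5 765 1.0)`, black maps to 0.0 and the output swings between positive and negative values of growing magnitude as the sum approaches white. By contrast, `(mapcar #'basic-sample '(765 765 765))` yields three identical samples of 1.0: a run of unchanging pixels is a constant value and therefore silent, whichever mapping is used.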

Next up will be an investigation of these methods’ potential for live synthesis, using breath control data from a MIDI wind controller to manipulate the speed of ‘pixel reading’. This might make its way into my next piece for EW-4, an electronic wind quartet based in Switzerland, to be premiered in Spring 2023.

examples

Basic translation algorithm (no oscillators), image scaled by 0.02:

66Hz sine wave translation, image scaled by 0.05:

4205Hz sine wave translation, image scaled by 0.03:

1051Hz sawtooth wave translation, image scaled by 0.1:

Various image translations at work in an early demo mix of the beginning of in competence. Mixed with the translations is a 1960s recording of a speech from Shakespeare’s Richard II.

common lisp code

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; File:             image-snd.lsp
;;;
;;; Project:          in competent for trio abstract: translate image to sndfile
;;;
;;; Author:           Michael Edwards: m@michael-edwards.org
;;;
;;; Creation date:    February 5th 2022
;;;
;;; $$ Last modified:  12:36:12 Wed Sep 21 2022 CEST
;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(in-package :slippery-chicken)
(require :asdf)
(asdf:load-system :imago)
(load-from-same-dir "abstrakt-config.lsp")
(unless (fboundp 'clm::samps2sf)
  (compile-from-same-dir "samps2sf.ins"))
(shadowing-import 'imago:flip)
(shadowing-import 'imago:copy)
(shadowing-import 'imago:scale)
(use-package :imago)

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; An alternative to using an oscillator: do waveshaping for the rgb pixel sum,
;;; mapping black (0) to sample value 0.0 and zig-zagging up/down from there,
;;; increasing the absolute value of the sample along the way. Returns one
;;; (float) sample value for the given pixel-sum.
(defun pixel-sum-shaper1 (pixel-sum &optional (max 765))
  (assert
   (and (integerp pixel-sum)
        (>= pixel-sum 0)
        (<= pixel-sum max))
   (pixel-sum)
   "pixel-sum-shaper1: argument (~a) should be an integer >= 0 and <= ~a"
   pixel-sum max)
  (interpolate pixel-sum
               '(0. 0.0 29.577 0.04 49.296 -0.022 84.062 0.1 115.715 -0.094
                 166.048 0.177 220.533 -0.207 318.605 0.379 401.11 -0.426
                 525.127 0.715 631.502 -0.664 713.488 1. 755. -1.)))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; This is the main method, using either a basic interpolation mapping (sim. to
;;; digital waveshaping) or a CLM oscillator to convert an rgb-pixel into a
;;; single sample value. We can pass an oscillator (which requires a method) or
;;; a method without an oscillator (see below) or neither. Returns one (float)
;;; sample value for the given rgb-pixel. 'osc' and 'method' are the oscillator
;;; unit generator and the oscillator function that operates upon it. If 'osc'
;;; is nil then basic interpolation is applied: either a simple mapping of the
;;; sum of the rgb pixels to values from -1.0 to 1.0, or if a 'method' is given
;;; (e.g. #'pixel-sum-shaper1) any arbitrary mapping the user desires. 'db-min'
;;; and 'db-max' are used to create an amplitude scaler for the returned
;;; samples; their values are of course in decibels, and the actual amplitude
;;; used is dependent on the rgb-pixel value. 'invert' turns white into black
;;; and vice-versa, inverting all colours inbetween.
(defun pixel2sample (pixel &key osc method (db-min -12) (db-max 0) invert)
  (declare (type rgb-pixel pixel))
  ;; should be able to do this but imago is throwing type errors: invert-color
  ;; returns negative numbers, so try something else below
  ;; (when invert (setf pixel (invert-color pixel)))
  ;; the pixel is a 32 bit unsigned-byte with the alpha value in the first byte
  ;; so we need to discard that and get the rgb values
  (multiple-value-bind (r g b)
      (color-rgb pixel) ; each of r g b is an unsigned byte so 0-255
    ;; invert color, of sorts, 'by hand'
    (when invert
      (setq r (- 255 r)
            b (- 255 b)
            g (- 255 g)))
    (let* ((max (* 255 3))
           (sum (+ r g b))
           ;; we'll also use the pixel to determine an amplitude scaler
           (amp (rescale sum 0 max (db2amp db-min) (db2amp db-max))))
      ;; in generating float samples from -1.0 to 1.0 we don't just want to move
      ;; from black (0) being -1.0 to white being 1.0, rather we want black to
      ;; be around 0.0 and the higher/closer to white a pixel becomes, then the
      ;; higher its absolute value and the more frequently it changes
      (* amp
         (if osc
             (if method
                 (funcall method osc (clm:hz->radians sum))
                 (error "pixel2sample: :method required if :osc given"))
             (if method
                 ;; a function was provided (but not oscillator) so this method
                 ;; has to map an integer from 0 to max (765) to a float between
                 ;; -1.0 and 1.0
                 (funcall method sum)
                 (rescale sum 0 max -1.0 1.0)))))))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; This is the main workhorse. 'image' is the path to a PNG file. 'base-freq'
;;; is the frequency from which we do our frequency modulation. 'scale' is a
;;; scaler for the image; the lower the value, the shorter the output sound file
;;; becomes; some experimentation will be necessary here as depending upon the
;;; resolution of the image file, the developments in the resultant sound file
;;; may become too slow or too quick to be musically useful. 'method' selects
;;; the oscillator and may be one of the symbols osc (sine), saw, tri, square,
;;; or pulse (pulse train); it may also be nil or a function (see
;;; pixel2sample). 'invert' is a colour inverter (see pixel2sample).
;;; 'symmetrical' T processes the resulting samples so that they
;;; are symmetrical around 0.0--similar to but not the same as removing DC
;;; offset. 'normalise' T will scale the samples so that they have a maximum
;;; absolute value of 1.0.  NB 'normalise' only works if symmetrical is t also
(defun image2samples (image &optional (base-freq 20) scale (method 'osc) invert
                              (symmetrical t) (normalise t))
  (let* ((img (read-image image))
         (iw (progn
               (when scale
                 (let ((s (sqrt scale)))
                   ;; must use the square root of the scale argument as we scale
                   ;; both the width and the height
                   (setq img (scale img s s))))
               (image-width img)))
         (iw-1 (1- iw))
         (pixels '())
         ;; in order to avoid dc offset we'll use a e.g. 20Hz sine wave to
         ;; represent black but with the pixel value used as fm-input /
         ;; glissando
         (osc (case method              ; can also be nil
                (osc (clm:make-oscil base-freq))
                (saw (clm:make-sawtooth-wave base-freq))
                (tri (clm:make-triangle-wave base-freq))
                (square (clm:make-square-wave base-freq))
                (pulse (clm:make-pulse-train base-freq))))
         (osc-fun (case method          ; can also be nil
                    (osc #'clm:oscil)
                    (saw #'clm:sawtooth-wave)
                    (tri #'clm:triangle-wave)
                    (square #'clm:square-wave)
                    (pulse #'clm:pulse-train))))
    (when (and method (not (functionp method)) (not osc))
      (error "image2samples: unknown method: ~a" method))
    (loop for row below (image-height img) do
      (loop for column below iw do
        (push (pixel2sample
               (aref (image-pixels img) row
                     ;; so we'll process rows in order but the columns
                     ;; will go back and forth so that we don't jump at
                     ;; the start of the next row
                     (if (evenp row)
                         column
                         (- iw-1 column)))
               :invert invert :osc osc :method (if osc-fun osc-fun method))
              pixels)))
    (when symmetrical
      (setq pixels (force-symmetrical-and-normalise
                    pixels :verbose t :min (if normalise 1.0 nil))))
    pixels))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; This is the main routine to render a sound file from an image (usually
;;; PNG). The first two arguments are the paths to the existing image and the
;;; sound file to be created. The keyword arguments are described in
;;; image2samples. 
(defun image2sndfile (image sndfile &key (base-freq 20) (srate 48000) scale
                                      (symmetrical t) (normalise t) invert
                                      (method 'osc))
  (let ((samples (image2samples image base-freq scale method invert symmetrical
                                normalise)))
    ;; (print (average samples))
    (clm:with-sound (:output sndfile :channels 1 :srate srate 
                             :reverb nil :statistics t)
      (clm::samps2sf samples))))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; EOF

¹ Wikipedia: “a core dump, memory dump, crash dump … consists of the recorded state of the working memory of a computer program at a specific time, generally when the program has crashed”
